Research article
DOI: 10.1145/3449726.3463197

EBIC.JL: an efficient implementation of evolutionary biclustering algorithm in Julia

Published: 08 July 2021

Abstract

Biclustering is a data mining technique that searches for local patterns in numeric tabular data, with its main application in bioinformatics. The technique has shown promise in multiple areas, including the development of cancer biomarkers, disease subtype identification, and the study of gene-drug interactions. In this paper we introduce EBIC.JL, an implementation of one of the most accurate biclustering algorithms in Julia, a modern, highly parallelizable programming language for data science. We show that the new version maintains accuracy comparable to that of its predecessor, EBIC, while converging faster on the majority of problems. We hope that this open-source software, written in a high-level programming language, will foster research in this promising field of bioinformatics and expedite the development of new biclustering methods for big data.


Published In

GECCO '21: Proceedings of the Genetic and Evolutionary Computation Conference Companion
July 2021
2047 pages
ISBN:9781450383516
DOI:10.1145/3449726

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. biclustering
  2. data mining
  3. evolutionary computation
  4. machine learning
  5. parallel algorithms

Qualifiers

  • Research-article

Conference

GECCO '21
Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
