Abstract
As a high-throughput detection technology, gene chips produce huge amounts of gene expression data, and analyzing these data effectively has become an urgent need. Biclustering techniques have become important tools for finding local patterns in gene expression data. Biclustering seeks submatrices in which a subset of the genes shows “highly correlated behavior in a subset of conditions”. However, most existing biclustering algorithms cannot find biclusters with contiguous columns. Because time-series data carry an important internal sequential relationship, these methods are not suitable for analyzing time-series data. To explore the potential biological information in contiguous time points and to find co-expression relationships among genes, this paper presents an efficient and accurate algorithm, named k-CCC, that searches for contiguous coherent evolution biclusters in time-series data. The algorithm first transforms the original matrix into a difference matrix; then, starting from column patterns consisting of k contiguous columns, it gradually assembles them into patterns composed of more columns. A pattern update strategy is adopted to improve the efficiency of the algorithm. In simulated tests, the algorithm finds all the embedded biclusters and shows good scalability. Experimental results on real datasets show that the algorithm finds biclusters with statistical significance and strong biological relevance.
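The abstract outlines two steps: converting the expression matrix into a difference matrix, then growing patterns over k contiguous columns into longer ones. The Python sketch below illustrates that idea under explicit assumptions. The U/D/N discretization, the symmetric `threshold`, the `min_genes` cutoff and the function names `difference_matrix` and `contiguous_biclusters` are all hypothetical. The one-column-at-a-time right extension is a simplified stand-in for the paper's assembly step, and the pattern update strategy is not modeled. It is not the authors' k-CCC implementation.

```python
from collections import defaultdict


def difference_matrix(data, threshold=0.0):
    """Discretize consecutive-column differences into 'U' (up), 'D' (down), 'N' (no change).

    Assumption: a symmetric `threshold`; the paper's actual discretization may differ.
    """
    diff = []
    for row in data:
        symbols = []
        for a, b in zip(row, row[1:]):
            delta = b - a
            if delta > threshold:
                symbols.append("U")
            elif delta < -threshold:
                symbols.append("D")
            else:
                symbols.append("N")
        diff.append(symbols)
    return diff


def contiguous_biclusters(diff, k=2, min_genes=2):
    """Return maximal (genes, first_col, last_col, pattern) tuples on the difference matrix.

    Hypothetical sketch: seeds are patterns over k contiguous difference columns, and each
    seed is grown one column to the right at a time by splitting its genes on the next symbol.
    """
    n_cols = len(diff[0]) if diff else 0
    found = []
    for start in range(n_cols - k + 1):
        # Group every gene by its length-k pattern beginning at column `start`.
        seeds = defaultdict(list)
        for gene, row in enumerate(diff):
            seeds[tuple(row[start:start + k])].append(gene)
        stack = [(genes, start + k - 1, pattern)
                 for pattern, genes in seeds.items() if len(genes) >= min_genes]
        while stack:
            genes, end, pattern = stack.pop()
            # If all genes also agree on column start-1, this bicluster (and every
            # extension of it) is covered by a seed starting further left.
            if start > 0 and len({diff[g][start - 1] for g in genes}) == 1:
                continue
            if end + 1 < n_cols:
                # Split the genes by their symbol in the next difference column.
                children = defaultdict(list)
                for g in genes:
                    children[diff[g][end + 1]].append(g)
                for symbol, sub in children.items():
                    if len(sub) >= min_genes:
                        stack.append((sub, end + 1, pattern + (symbol,)))
                if len(children) == 1:
                    continue  # all genes extend together, so this one is not right-maximal
            found.append((sorted(genes), start, end, "".join(pattern)))
    return found


data = [
    [1.0, 2.0, 3.0, 2.5, 4.0],
    [0.5, 1.5, 2.0, 1.0, 0.0],
    [3.0, 4.0, 5.0, 4.5, 6.0],
    [2.0, 1.0, 1.0, 2.0, 3.0],
]
print(contiguous_biclusters(difference_matrix(data), k=2, min_genes=2))
```

On this toy matrix the sketch reports genes 0, 1 and 2 sharing the pattern `UUD` on difference columns 0 to 2 (time points 1 to 4), and genes 0 and 2 sharing `UUDU` across all five time points. Difference column j compares time points j and j+1, so a bicluster on difference columns s..e spans original columns s..e+1.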
Acknowledgments
The authors gratefully thank the colleagues who participated in this work and provided technical support. This work is supported by the National Natural Science Foundation of China (No. 71272084), the PCSIRT (Grant No. IRT1243), and the Scientific Research Foundation of the Graduate School of South China Normal University (2015lkxm37).
Ethics declarations
Conflict of interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
About this article
Cite this article
Xue, Y., Zhang, M., Liao, Z. et al. A contiguous column coherent evolution biclustering algorithm for time-series gene expression data. Int. J. Mach. Learn. & Cyber. 9, 441–453 (2018). https://doi.org/10.1007/s13042-015-0487-6
DOI: https://doi.org/10.1007/s13042-015-0487-6