A contiguous column coherent evolution biclustering algorithm for time-series gene expression data

Yun Xue¹,
Meizhen Zhang¹,
Zhengling Liao¹,
Meihang Li¹,
Jie Luo¹ &
Xiaohui Hu¹

Abstract

As a high-throughput detection technology, gene chips produce huge amounts of gene expression data, and analyzing these data effectively has become an urgent need. Biclustering techniques are important tools for finding local patterns in gene expression data: they search for submatrices in which a subset of the genes shows a “highly correlated behavior in a subset of conditions”. However, most existing biclustering algorithms cannot find biclusters with contiguous columns. Because time-series data carry important internal sequential relationships, these methods are not suitable for analyzing them. To explore the potential biological information in contiguous time points and to find co-expression relationships among genes, this paper presents an efficient and accurate algorithm, named the k-CCC algorithm, that searches for contiguous coherent evolution biclusters in time-series data. The algorithm first transforms the original matrix into a difference matrix; then, starting from column patterns consisting of k contiguous columns, it gradually assembles them into patterns composed of more columns. A pattern update strategy is adopted to improve the efficiency of the algorithm. In simulated tests, the algorithm finds all the embedded biclusters and shows good scalability. Experimental results on real datasets show that the algorithm can find biclusters with statistical significance and strong biological relevance.
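The two steps named in the abstract (a difference-matrix transform, then growing patterns from windows of k contiguous columns) can be illustrated with a minimal Python sketch. This is not the authors' k-CCC implementation: the abstract gives no details, so the up/down/flat encoding, the `threshold` and `min_rows` parameters, the rightward-only extension, and all function names are assumptions made for illustration. The pattern update strategy mentioned in the abstract is not modelled, and patterns that are sub-windows of a longer pattern are not filtered out.

```python
from collections import defaultdict

def difference_matrix(matrix, threshold=0.0):
    """Encode each adjacent-column change as +1 (up), -1 (down) or 0 (flat).

    Assumption: a sign encoding with a tolerance; the paper may differ.
    """
    diff = []
    for row in matrix:
        signs = []
        for a, b in zip(row, row[1:]):
            delta = b - a
            signs.append(1 if delta > threshold else -1 if delta < -threshold else 0)
        diff.append(signs)
    return diff

def seed_patterns(diff, k, min_rows):
    """Group rows that share a pattern over each window of k contiguous difference columns."""
    n_cols = len(diff[0]) if diff else 0
    seeds = []
    for start in range(n_cols - k + 1):
        groups = defaultdict(list)
        for r, signs in enumerate(diff):
            groups[tuple(signs[start:start + k])].append(r)
        for pattern, rows in groups.items():
            if len(rows) >= min_rows:
                seeds.append((start, pattern, rows))
    return seeds

def extend_right(diff, seed, min_rows):
    """Grow a pattern one column at a time, splitting rows by the next symbol."""
    start, pattern, rows = seed
    end = start + len(pattern)
    results = []
    if end < len(diff[0]):
        groups = defaultdict(list)
        for r in rows:
            groups[diff[r][end]].append(r)
        for symbol, sub_rows in groups.items():
            if len(sub_rows) >= min_rows:
                longer = (start, pattern + (symbol,), sub_rows)
                results.extend(extend_right(diff, longer, min_rows))
    # Keep the current pattern unless some extension retains every one of its rows.
    if not any(len(res[2]) == len(rows) for res in results):
        results.append(seed)
    return results

def find_biclusters(matrix, k=2, min_rows=2, threshold=0.0):
    """Return (rows, original columns, pattern) triples with contiguous columns."""
    diff = difference_matrix(matrix, threshold)
    found = []
    for seed in seed_patterns(diff, k, min_rows):
        for start, pattern, rows in extend_right(diff, seed, min_rows):
            # A pattern over m difference columns spans m + 1 original columns.
            cols = list(range(start, start + len(pattern) + 1))
            found.append((rows, cols, pattern))
    return found
```

Calling `find_biclusters(expression_matrix, k=2, min_rows=2)` on a list of per-gene time series returns groups of genes that rise and fall together over a run of contiguous time points.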

Acknowledgments

The authors gratefully thank the colleagues who participated in this work and provided technical support. This work was supported by the National Natural Science Foundation of China (No. 71272084), the PCSIRT (Grant No. IRT1243), and the Scientific Research Foundation of Graduate School of South China Normal University (2015lkxm37).

Author information

Authors and Affiliations

Guangdong Provincial Key Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou, 510006, China
Yun Xue, Meizhen Zhang, Zhengling Liao, Meihang Li, Jie Luo & Xiaohui Hu

Corresponding author

Correspondence to Xiaohui Hu.

Ethics declarations

Conflict of interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

About this article

Cite this article

Xue, Y., Zhang, M., Liao, Z. et al. A contiguous column coherent evolution biclustering algorithm for time-series gene expression data. Int. J. Mach. Learn. & Cyber. 9, 441–453 (2018). https://doi.org/10.1007/s13042-015-0487-6

Received: 25 February 2015
Accepted: 22 December 2015
Published: 07 January 2016
Issue Date: March 2018
DOI: https://doi.org/10.1007/s13042-015-0487-6
