A contiguous column coherent evolution biclustering algorithm for time-series gene expression data

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

As a high-throughput detection technology, gene chips produce huge amounts of gene expression data, and analyzing these data effectively has become an urgent need. Biclustering techniques are important tools for finding local patterns in gene expression data. Biclustering seeks submatrices in which a subset of the genes shows a “highly correlated behavior in a subset of conditions”. However, most existing biclustering algorithms cannot find biclusters with contiguous columns. Because time-series data carry important internal sequential relationships, these methods are not suitable for analyzing them. To explore the potential biological information in contiguous time points and to find co-expression relationships among genes, this paper presents an efficient and accurate algorithm, named k-CCC, that searches for contiguous coherent evolution biclusters in time-series data. The algorithm first transforms the original matrix into a difference matrix. Then, starting from column patterns consisting of k contiguous columns, it gradually assembles them into patterns composed of more columns. A pattern update strategy is adopted to improve the efficiency of the algorithm. In simulated tests, the algorithm finds all the embedded biclusters and shows good scalability. Experimental results on real datasets show that the algorithm can find biclusters with statistical significance and strong biological relevance.
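The two steps named in the abstract (difference matrix, then growth from k contiguous columns) can be illustrated with a small sketch. This is not the authors' k-CCC implementation: the abstract does not give the details, so the up/down/no-change discretization, the `eps` tolerance, the `min_rows` threshold, the right-only extension, and all function names below are assumptions made for illustration. The pattern update strategy mentioned in the abstract is not modeled.

```python
from collections import defaultdict


def difference_matrix(matrix):
    """Adjacent-column differences: D[i][j] = A[i][j+1] - A[i][j]."""
    return [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in matrix]


def to_symbols(diff, eps=0.0):
    """Assumed discretization: 'U' (up), 'D' (down), 'N' (change within eps)."""
    def sym(d):
        if d > eps:
            return "U"
        if d < -eps:
            return "D"
        return "N"
    return [[sym(d) for d in row] for row in diff]


def seed_patterns(symbols, k, min_rows):
    """Group rows by their symbols over each window of k contiguous difference columns."""
    seeds = {}
    for start in range(len(symbols[0]) - k + 1):
        groups = defaultdict(list)
        for i, row in enumerate(symbols):
            groups[tuple(row[start:start + k])].append(i)
        for pattern, rows in groups.items():
            if len(rows) >= min_rows:
                seeds[(start, pattern)] = rows
    return seeds


def grow(symbols, start, pattern, rows, min_rows):
    """Extend a pattern one column to the right at a time.

    Returns (start, pattern, rows) triples that cannot be extended
    to the right without losing at least one row.
    """
    nxt = start + len(pattern)
    results = []
    extended_all = False
    if nxt < len(symbols[0]):
        groups = defaultdict(list)
        for i in rows:
            groups[symbols[i][nxt]].append(i)
        for s, sub in groups.items():
            if len(sub) >= min_rows:
                results.extend(grow(symbols, start, pattern + (s,), sub, min_rows))
                if len(sub) == len(rows):
                    extended_all = True
    if not extended_all:
        results.append((start, pattern, rows))
    return results


def find_biclusters(matrix, k, min_rows, eps=0.0):
    """Seed with k-column patterns, then grow each seed to the right."""
    symbols = to_symbols(difference_matrix(matrix), eps)
    found = []
    for (start, pattern), rows in seed_patterns(symbols, k, min_rows).items():
        found.extend(grow(symbols, start, pattern, rows, min_rows))
    return found


if __name__ == "__main__":
    # Hypothetical toy data: 4 genes (rows) by 5 time points (columns).
    data = [
        [1, 2, 3, 2, 5],
        [2, 4, 5, 1, 0],
        [0, 1, 4, 3, 9],
        [5, 4, 3, 6, 7],
    ]
    for start, pattern, rows in find_biclusters(data, k=2, min_rows=2):
        print(start, "".join(pattern), rows)
```

Each result is a triple of the first difference column, the shared trend pattern, and the row indices that follow it. Because seeds are taken at every start column, the sketch also reports patterns that are suffixes of longer ones; a fuller implementation would filter those out.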

Acknowledgments

The authors gratefully thank the colleagues who participated in this work and provided technical support. This work was supported by the National Natural Science Foundation of China (No. 71272084), the PCSIRT (Grant No. IRT1243), and the Scientific Research Foundation of Graduate School of South China Normal University (2015lkxm37).

Author information

Corresponding author

Correspondence to Xiaohui Hu.

Ethics declarations

Conflict of interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

About this article

Cite this article

Xue, Y., Zhang, M., Liao, Z. et al. A contiguous column coherent evolution biclustering algorithm for time-series gene expression data. Int. J. Mach. Learn. & Cyber. 9, 441–453 (2018). https://doi.org/10.1007/s13042-015-0487-6

  • DOI: https://doi.org/10.1007/s13042-015-0487-6
