DOI: 10.1007/978-3-540-77046-6_47

Data Analysis and Bioinformatics

Published: 18 December 2007


Data analysis methods and techniques are revisited for biological data sets, with particular emphasis on clustering and data mining. Clustering remains a subject of active research in several fields, including statistics, pattern recognition, and machine learning. Data mining adds to clustering the complications of very large data sets with many attributes of different types, a situation typical of biology. Some case studies are also described.







Published In

Pattern Recognition and Machine Intelligence: Second International Conference, PReMI 2007, Kolkata, India, December 18-22, 2007. Proceedings
Dec 2007
596 pages



Berlin, Heidelberg


Author Tags

  1. Clustering
  2. data mining
  3. bio-informatics
  4. Kernel methods
  5. Hidden Markov Models
  6. Multi-Layers Model

