DOI: 10.1007/978-3-540-77046-6_47
Article

Data Analysis and Bioinformatics

Published: 15 March 2023

Abstract

Data analysis methods and techniques are revisited in the case of biological data sets. Particular emphasis is given to clustering and mining issues. Clustering is still a subject of active research in several fields, such as statistics, pattern recognition, and machine learning. Data mining adds to clustering the complications of very large data sets with many attributes of different types, a situation that is typical in biology. Some case studies are also described.

Published In

Pattern Recognition and Machine Intelligence: Second International Conference, PReMI 2007, Kolkata, India, December 18-22, 2007. Proceedings
Dec 2007
596 pages
ISBN: 978-3-540-77045-9
DOI: 10.1007/978-3-540-77046-6

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. Clustering
2. data mining
3. bio-informatics
4. Kernel methods
5. Hidden Markov Models
6. Multi-Layers Model
