Abstract
Much research on storing XML data has been carried out, and some of it has proposed methods that improve database performance by reducing the joins between tables. Those methods are very efficient at deriving and optimizing tables from a DTD or XML schema in which elements and attributes are defined. Nevertheless, they are not effective for an XML schema for biological information such as microarray data: although microarray data have complex hierarchies, only a few core values appear repeatedly within those hierarchies. In this paper, we propose a new algorithm that extracts the core features that occur repeatedly in an XML schema for biological information, and we explain how using a decision tree rather than pattern matching to classify structural similarities improves classification speed and efficiency. We designed a database for storing biological information using the features extracted by our algorithm. Experiments showed that the proposed classification algorithm also reduced the number of joins between tables.
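To make the idea of repeatedly occurring structures concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a toy XML schema (the type names `Sample`, `Protocol`, and `Array` are hypothetical) and groups complex types by the set of child element names they declare; any group with more than one member is a candidate for a shared table. The decision-tree classification step described above is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by ElementTree for XML Schema elements.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Toy schema, invented for illustration; it is not taken from the paper.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Sample">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="value" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Protocol">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="value" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Array">
    <xs:sequence>
      <xs:element name="identifier" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def child_signature(complex_type):
    """Return the sorted tuple of element names declared inside a complexType."""
    names = (e.get("name") for e in complex_type.iter(XS + "element"))
    return tuple(sorted(n for n in names if n))


def repeated_structures(schema_text):
    """Group complexType names by child signature; keep groups of two or more."""
    root = ET.fromstring(schema_text)
    groups = {}
    for ct in root.iter(XS + "complexType"):
        groups.setdefault(child_signature(ct), []).append(ct.get("name"))
    return {sig: names for sig, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    for signature, type_names in repeated_structures(SCHEMA).items():
        print(signature, "shared by", type_names)
```

In this toy schema, `Sample` and `Protocol` declare the same children and could in principle be stored in one table, which is the kind of join reduction the abstract refers to. Exact matching of name sets is a deliberate simplification chosen for brevity.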
An erratum to this chapter can be found at http://dx.doi.org/10.1007/11915034_125.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jeong, J., Shin, D., Cho, C., Shin, D. (2006). Structural Similarity Mining in Semi-structured Microarray Data for Efficient Storage Construction. In: Meersman, R., Tari, Z., Herrero, P. (eds) On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops. OTM 2006. Lecture Notes in Computer Science, vol 4277. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11915034_96
DOI: https://doi.org/10.1007/11915034_96
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-48269-7
Online ISBN: 978-3-540-48272-7
eBook Packages: Computer Science, Computer Science (R0)