Analytical processing of XML documents: opportunities and challenges

Published: 01 June 2005

Abstract

Online Analytical Processing (OLAP) has been a valuable tool for analyzing trends in business information. While the multi-dimensional cube model used by OLAP is ideal for analyzing structured business data, it is not suitable for representing and analyzing complex semi-structured data such as XML documents. The need to analyze XML documents is becoming urgent, as XML has become the language of choice for data representation across a wide range of application domains. This paper describes a proposal for analyzing XML documents using the abstract XML tree model. We argue that OLAP's multi-dimensional aggregation operators cannot express structurally complex analytical operations on XML documents. Hence, we outline new extensions to XQuery for supporting such complex analytical operations. Finally, we discuss various challenges in implementing XML analysis in a real system.

    Information

    Published In

    ACM SIGMOD Record, Volume 34, Issue 2
    June 2005
    91 pages
    ISSN: 0163-5808
    DOI: 10.1145/1083784

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published in SIGMOD Volume 34, Issue 2

    Citations

    • (2020) A Landscape of XML Data from Analytics Perspective. Procedia Computer Science, 173:392-402. DOI: 10.1016/j.procs.2020.06.046. Online publication date: 2020.
    • (2019) Privacy Preserving OLAP over Distributed XML Data. Journal of Computer and System Sciences, 77(6):965-987. DOI: 10.1016/j.jcss.2011.02.004. Online publication date: 1-Jan-2019.
    • (2018) Topological XML data cube construction. International Journal of Web Engineering and Technology, 8(4):347-368. DOI: 10.1504/IJWET.2013.059104. Online publication date: 20-Dec-2018.
    • (2018) Finding an application-appropriate model for XML data warehouses. Information Systems, 35(6):662-687. DOI: 10.1016/j.is.2009.12.002. Online publication date: 29-Dec-2018.
    • (2014) Multidimensional Data Analysis Based on Links. In Systems and Software Development, Modeling, and Analysis, pages 212-281. DOI: 10.4018/978-1-4666-6098-4.ch009. Online publication date: 2014.
    • (2014) Privacy Preserving OLAP Data Cubes. In Encyclopedia of Business Analytics and Optimization, pages 1886-1897. DOI: 10.4018/978-1-4666-5202-6.ch169. Online publication date: 2014.
    • (2014) OLAP over XLM Data. In Encyclopedia of Business Analytics and Optimization, pages 1680-1688. DOI: 10.4018/978-1-4666-5202-6.ch150. Online publication date: 2014.
    • (2012) Analytical Processing Over XML and XLink. International Journal of Data Warehousing and Mining, 8(1):52-92. DOI: 10.4018/jdwm.2012010103. Online publication date: Jan-2012.
    • (2012) A practical application of our MDD approach for modeling secure XML data warehouses. Decision Support Systems, 52(4):899-925. DOI: 10.1016/j.dss.2011.11.008. Online publication date: Mar-2012.
    • (2010) Research on Index Technology for Group-by Aggregation Query in XML Cube. Information Technology Journal, 9(1):116-123. DOI: 10.3923/itj.2010.116.123. Online publication date: 1-Jan-2010.