Abstract
A massive amount of extensible markup language (XML) data from various areas is available on the Web. Answering structural queries against XML data is important, as it is the core of information retrieval systems for XML data. Labeling scheme has been suggested for rapid query processing of massive XML data. Interval-based, prefix-based, and prime number labeling scheme exist. Of these, the prime number labeling scheme has the advantage of query processing by arithmetic operations. Recently, the repetitive prime number labeling scheme was proposed; this scheme produces a smaller label size than conventional prime number labeling using prime numbers repetitively. However, a parallel algorithm for the repetitive prime number labeling scheme does not exist; therefore, this scheme is difficult to apply to massive XML data. In this paper, a dynamic and parallel approach of XML labeling algorithm that works with MapReduce is proposed for, particularly, the repetitive prime number labeling scheme. Two optimization techniques are devised: the label assignment order adjustment to further reduce the label size and the upper tree compressing technique to reduce the memory requirements during the labeling process. Experiments over real-world XML data confirmed that the techniques are effective than the previous works.
Similar content being viewed by others
References
Choi H, Lee KH, Lee YJ (2014) Parallel labeling of massive XML data with MapReduce. J Supercomput 67(2):408–437
Christophides V, Karvounarakis G, Plexousakis D, Scholl M, Tourtounis S (2004) Optimizing taxonomic semantic web queries using labeling schemes. Web Semant Sci Serv Agents World Wide Web 1(2):207–228
Clark J, DeRose S et al (2004) XML Path Language (XPath) Version 1.0, W3C Recommendation, 1999
Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
Klaib A, Joan L (2014) Investigation into indexing XML data techniques. In: Proceedings. The International Conference on Internet Computing (ICOMP), 21–24 July 2014, Las Vegas, USA
Leonardi E, Bhowmick SS, Madria S (2005) Xandy: detecting changes on large unordered XML documents using relational databases. In: Zhou L, Ooi BC, Meng X (eds) Database systems for advanced applications. Springer, Berlin, pp 711–723
Li C, Ling TW (2005) QED: a novel quaternary encoding to completely avoid re-labeling in XML updates. In: CIKM. ACM
Lin RR, Chang YH, Chao KM (2013) A compact and efficient labeling scheme for XML documents. In: Meng W, Feng L, Bressan S, Winiwarter W, Song W(eds) Database systems for advanced applications. Springer, Berlin, pp 269–283
Lu J, Meng X, Ling TW (2011) Indexing and querying XML using extended Dewey labeling scheme. Data Knowl Eng 70(1):35–59
Morozov S, Saiedian H, Wang H (2014) Reusable prime number labeling scheme for hierarchical data representation in relational databases. J Comput Inf Technol 22(1):31–43
Subramaniam S, Haw SC, Soon LK (2014) Relab: a subtree based labeling scheme for efficient XML query processing. In: IEEE 2nd International Symposium on Telecommunication Technologies (ISTT). IEEE, pp 121–125
Sun DH, Hwang SC (2014) A labeling methods for keyword search over large XML documents. J KIISE 41(9):699–706
Tatarinov I, Viglas SD, Beyer K, Shanmugasundaram J, Shekita E, Zhang C (2002) Storing and querying ordered XML using a relational database system. In: Proceedings of the 2002 ACM SIGMOD international conference on management of data, pp 204–215. ACM
Wang Y, DeWitt DJ, Cai JY (2003) X-diff: an effective change detection algorithm for XML documents. In: ICDE, 2003. IEEE, pp 519–530
Wu X, Lee ML, Hsu W (2004) A prime number labeling scheme for dynamic ordered XML trees. In: ICDE
Xu L, Bao Z, Ling TW (2007) A dynamic labeling scheme using vectors. In: Wagner R, Revell N, Pernul G (eds) Database and expert systems applications. Springer, Berlin, pp 130–140
Xu L, Ling TW, Wu H (2012) Labeling dynamic XML documents: an order-centric approach. IEEE Trans Knowl Data Eng 24(1):100–113
Xu L, Ling TW, Wu H, Bao Z (2009) DDE: from dewey to a fully dynamic XML labeling scheme. In: SIGMOD. ACM
Acknowledgments
This work was supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0101-16-0054, WiseKB: Big data based self-evolving knowledge base and reasoning platform) and Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2014R1A1A1002236) and ETRI R&D Program (“Development of Big Data Platform for Dual Mode Batch-Query Analytics, 16ZS1410”) funded by the Government of Korea.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Ahn, J., Im, DH., Lee, T. et al. A dynamic and parallel approach for repetitive prime labeling of XML with MapReduce. J Supercomput 73, 810–836 (2017). https://doi.org/10.1007/s11227-016-1803-y
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-016-1803-y