Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

FoXtrot: Distributed structural and value XML filtering

Published: 02 October 2012 Publication History

Abstract

Publish/subscribe systems have emerged in recent years as a promising paradigm for offering various popular notification services. In this context, many XML filtering systems have been proposed to efficiently identify XML data that matches user interests expressed as queries in an XML query language like XPath. However, in order to offer XML filtering functionality on an Internet-scale, we need to deploy such a service in a distributed environment, avoiding bottlenecks that can deteriorate performance. In this work, we design and implement FoXtrot, a system for filtering XML data that combines the strengths of automata for efficient filtering and distributed hash tables for building a fully distributed system. Apart from structural-matching, performed using automata, we also discuss different methods for evaluating value-based predicates. We perform an extensive experimental evaluation of our system, FoXtrot, on a local cluster and on the PlanetLab network and demonstrate that it can index millions of user queries, achieving a high indexing and filtering throughput. At the same time, FoXtrot exhibits very good load-balancing properties and improves its performance as we increase the size of the network.

References

[1]
Aberer, K., Cudr_e-Mauroux, P., Datta, A., Despotovic, Z., Hauswirth, M., Punceva, M., and Schmidt, R. 2003. P-Grid: A self-organizing structured P2P system. SIGMOD Record 32, 3, 29--33.
[2]
Abiteboul, S., Manolescu, I., Polyzotis, N., Preda, N., and Sun, C. 2008. XML processing in DHT networks. In Proceedings of the 24th IEEE International Conference on Data Engineering (ICDE'08). IEEE, Los Alamitos, CA, 606--615.
[3]
Aekaterinidis, I. and Triantafillou, P. 2006. PastryStrings: A comprehensive content-based publish/subscribe DHT network. In Proceedings of the 26th IEEE International Conference on Distributed Computing Systems (ICDCS'06). IEEE, Los Alamitos, CA, 23--.
[4]
Altinel, M. and Franklin, M. J. 2000. Efficient filtering of XML documents for selective dissemination of information. In Proceedings of the 26th International Conference on Very Large Data Bases (VLDB'00). Morgan Kaufmann, San Francisco, CA, 53--64.
[5]
Aspnes, J. and Shah, G. 2003. Skip graphs. In Proceedings of the14th Annual ACM-SIAM Symposium on Discrete algorithms (SODA'03). SIAM, Philadelphia, PA, 384--393.
[6]
Balakrishnan, H., Kaashoek, M. F., Karger, D., Morris, R., and Stoica, I. 2003. Looking up data in p2p systems. Comm. ACM 46, 43--48.
[7]
Barbosa, D., Mignet, L., and Veltri, P. 2006. Studying the XML Web: Gathering statistics from an XML sample. World Wide Web 9, 2, 187--212.
[8]
Bonifati, A., Matrangolo, U., Cuzzocrea, A., and Jain, M. 2004. XPath lookup queries in P2P networks. In Proceedings of the 6th Annual ACM International Workshop on Web Information and Data Management (WIDM'04). ACM, New York, 48--55.
[9]
Bruno, N., Gravano, L., Koudas, N., and Srivastava, D. 2003. Navigation- vs. index-based XML multiquery processing. In Proceedings of the 19th International Conference on Data Engineering (ICDE'03). IEEE, Los Alamitos, CA, 139--150.
[10]
Chan, C. Y., Felber, P., Garofalakis, M. N., and Rastogi, R. 2002. Efficient Filtering of XML documents with XPath expressions. In Proceedings of the 18th International Conference on Data Engineering (ICDE'02). IEEE, Los Alamitos, CA, 235.
[11]
Chan, C. Y. and Ni, Y. 2007. Efficient XML Data dissemination with piggybacking. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'07). ACM, New York, 737--748.
[12]
Chand, R. and Felber, P. 2008. Scalable distribution of XML content with XNet. IEEE Trans. Parallel Distrib. Syst. 19, 4, 447--461.
[13]
Chand, R. and Felber, P. A. 2003. A scalable protocol for content-based routing in overlay networks. In Proceedings of the 2nd IEEE International Symposium on Network Computing and Applications (NCA'03). IEEE, Los Alamitos, CA, 123--.
[14]
Clark, J. and DeRose, S. J. 1999. XML path language (XPath). Version 1.0. World Wide Web Consortium, Recommendation. http://www.w3.org/TR/xpath.
[15]
Consens, M. P. and Milo, T. 1994. Optimizing queries on files. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'94). ACM, New York, 301--312.
[16]
Diao, Y., Altinel, M., Franklin, M. J., Zhang, H., and Fischer, P. 2003. Path sharing and predicate evaluation for high-performance XML ffltering. ACM Trans. Datab. Syst. 28, 4, 467--516.
[17]
Diao, Y., Rizvi, S., and Franklin, M. J. 2004. Towards an internet-scale XML dissemination service. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'04). VLDB Endowment, 612--623.
[18]
Felber, P., Chan, C., Garofalakis, M., and Rastogi, R. 2003. Scalable filtering of XML data for Web services. IEEE Internet Comput 7, 1, 49--57.
[19]
Fenner, W., Rabinovich, M., Ramakrishnan, K. K., Srivastava, D., and Zhang, Y. 2005. XTreeNet: Scalable overlay networks for XML content dissemination and querying (synopsis). In Proceedings of the 10th International Workshop on Web Content Caching and Distribution (WCW'05). IEEE, Los Alamitos, CA, 41--46.
[20]
FreePastry release 2009. FreePastry 2.1 release. http://www.freepastry.org/FreePastry/.
[21]
Galanis, L., Wang, Y., Jeffery, S., and DeWitt, D. J. 2003. Locating data sources in large distributed systems. In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB'03). VLDB Endowment,874--885.
[22]
Gong, X., Qian, W., Yan, Y., and Zhou, A. 2005. Bloom filter-based XML packets filtering for millions of path queries. In Proceedings of the 21st International Conference on Data Engineering (ICDE'05). IEEE, Los Alamitos, CA, 890--901.
[23]
Gupta, A. K. and Suciu, D. 2003. Stream processing of XPath queries with predicates. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'03). ACM, New York, 419--430.
[24]
Hopcroft, J. E., Motwani, R., Rotwani, and Ullman, J. D. 2000. Introduction to Automata Theory, Languages and Computability. Addison-Wesley, Boston, MA.
[25]
Hou, S. and Jacobsen, H. A. 2006. Predicate-based filtering of XPath expressions. In Proceedings of the 22nd International Conference on Data Engineering (ICDE'06). IEEE, Los Alamitos, CA, 53--.
[26]
IBM XML.1999. Generator 1999. IBM XML Generator. http://www.alphaworks.ibm.com/xmlgenerator.
[27]
Jagadish, H. V., Ooi, B. C., Tan, K., and Vu, Q. H. 2005. BATON:A balanced tree structure for peer-to-peer networks. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB'05). VLDB Endowment, 661--672.
[28]
Jagadish, H. V., Ooi, B. C., Tan, K., Vu, Q. H., and Zhang, R. 2006. Speeding up search in peer-to-peer networks with a multi-way tree structure. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'06). ACM, New York, 1--12.
[29]
Kannan, J., Yang, B., Shenker, S., Sharma, P., Banerjee, S., Basu, S., and Ju Lee, S. 2006. Smartseer: Using a dht to process continuous queries over peer-to-peer networks. In Proceedings of the IEEE INFOCOM.
[30]
Koloniari, G. and Pitoura, E. 2004. Content-based routing of path queries in peer-to-peer systems. In Proceedings of the Advances in Database Technology (EDBT'04). Springer, 29--47.
[31]
Liarou, E., Idreos, S., and Koubarakis, M. 2006. Evaluating conjunctive triple pattern queries over large structured overlay networks. In Proceedings of the International Semantic Web Conference. 399--413.
[32]
Lua, E. K., Crowcroft, J., Pias, M., Sharma, R., and Lim, S. 2005. A survey and comparison of peer-to-peer overlay network schemes. IEEE Comm. Surv. Tutorials, 7, 2, 72--93.
[33]
Manola, F. and Miller, E. 2004. RDF primer: W3c recommendation. Decision Support Systems.
[34]
Miliaraki, I. 2011. Distributed filtering and dissemination of XML data in peer-to-peer systems. Ph.D. dissertations, Department of Informatics and Telecommunications, National and Kapodistrian, University of Athens.
[35]
Miliaraki, I., Kaoudi, Z., and Koubarakis, M. 2008. XML data dissemination using automata on top of structured overlay networks. In Proceedings of the 17th International World Wide Web Conference (WWW'08). ACM, New York, 865--874.
[36]
Miliaraki, I. and Koubarakis, M. 2010. Distributed structural and value XML filtering. In Proceedings of the 4th ACM International Conference on Distributed Event-Based Systems (DEBS'10). ACM, New York, 2--13.
[37]
Moro, M. M., Bakalov, P., and Tsotras, V. J. 2007. Early profile pruning on XML-aware publish/subscribe systems. In Proceedings of the 33rd International Conference on Very large Data Bases (VLDB'07). VLDB Endowment, 866--877.
[38]
Papaemmanouil, O. and Cetintemel, U. 2005. SemCast: Semantic multicast for content-based data dissemination. In Proceedings of the 21st International Conference on Data Engineering (ICDE'05). IEEE, Los Alamitos, CA, 242--253.
[39]
Perez, J., Arenas, M., and Gutierrez, C. 2010. nSPARQL: A navigational language for RDF. Web Semant. 8, 255--270.
[40]
Ramabhadran, S., Ratnasamy, S., Hellerstein, J. M., and Shenker, S. 2004. Brief announcement: Prefix hash tree. In Proceedings of the 23rd Annual ACM Symposium on Principles of Distributed Computing (PODC'04). ACM, New York, 368--368.
[41]
Ramasubramanian, V., Peterson, R., and Sirer, E. G. 2006. Corona: A high performance publish/subscribe system for the World Wide Web. In Proceedings of the 3rd Conference on Networked Systems Design & Implementation (NSDI'06). Vol. 3. USENIX Association, Berkeley, CA, 2--2.
[42]
Rao, P. R. and Moon, B. 2009. Locating XML documents in a peer-to-peer network using distributed hash tables. IEEE Trans. Knowl. Data Eng. 21, 12, 1737--1752.
[43]
Ratnasamy, S., Francis, P., Handley, M., Karp, R., and Shenker, S. 2001. A scalable content addressable network. SIGCOMM Comput. Commun. Rev. 31, 161--172.
[44]
Rhea, S., Chun, B.-G., Kubiatowicz, J., and Shenker, S. 2005. Fixing the embarrassing slowness of OpenDHT on PlanetLab. In Proceedings of the 2nd Conference on Real, Large Distributed Systems (WORLDS'05). Vol. 2, USENIX Association, Berkeley, CA, 25--30.
[45]
Rowstron, A. and Druschel, P. 2001. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Proceedings of the IFIP/ACM International Conference on Distributed System Platforms (Middleware '01). Springer, Berlin, 329--350.
[46]
Skobeltsyn, G., Hauswirth, M., and Aberer, K. 2005. Efficient processing of XPath queries with structured overlay networks. In Proceedings of the OTM Conferences. 1243--1260.
[47]
Snoeren, A. C., Conley, K., and Gifford, D. K. 2001. Mesh-based content routing using XML. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP'01). ACM, New York, 160--173.
[48]
Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., and Balakrishnan, H. 2001. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM'01). ACM, New York, 149--160.
[49]
Tian, F., Reinwald, B., Pirahesh, H., Mayr, T., and Myllymaki, J. 2004. Implementing a scalable XML publish/subscribe system using relational database systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'04). ACM, New York, 479--490.
[50]
Tryfonopoulos, C., Idreos, S., and Koubarakis, M. 2005. Publish/subscribe functionality in IR environments using structured overlay networks. In Proceedings of the SIGIR. 322--329.
[51]
Uchiyama, H., Onizuka, M., and Honishi, T. 2005. Distributed XML stream filtering system with high scalability. In Proceedings of the 21st International Conference on Data Engineering (ICDE'05). IEEE, Los Alamitos, CA, 968--977.
[52]
Voulgaris, S., Riviere, E., Kermarrec, A.-M., and van Steen, M. 2006. Sub-2-sub: Self-organizing content-based publish/subscribe for dynamic large scale collaborative networks. In Proceedings of the IPTPS.
[53]
XMark 2001. XMark: An XML benchmark project. http://www.xml-benchmark.org/.
[54]
YFilter release. 2004. YFilter 1.0 release. http://yfilter.cs.umass.edu/code_release.htm.
[55]
Zhang, C., Krishnamurthy, A., and Wang, R. Y. 2005. Brushwood: Distributed trees in peer-to-peer systems. In Peer-to-Peer Systems IV, 4th International Workshop (IPTPS'05). Lecture Notes in Computer Science, Vol. 3640, Springer, Berlin, 47--57.
[56]
Zhou, A., Qian, W., Gong, X., and Zhou, M. 2007. Sonnet: An efficient distributed content-based dissemination broker (poster paper). In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'07). ACM, New York, 1094--1096.
[57]
Zhou, M. and Wu, Y. 2010. XML-based RDF data management for efficient query processing. In Proceedings of the 13th International Workshop on the Web and Databases (WebDB'10). ACM, New York, 3:1--3:6.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on the Web
ACM Transactions on the Web  Volume 6, Issue 3
September 2012
133 pages
ISSN:1559-1131
EISSN:1559-114X
DOI:10.1145/2344416
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 October 2012
Accepted: 01 May 2012
Revised: 01 March 2012
Received: 01 May 2011
Published in TWEB Volume 6, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. XML filtering
  2. automata
  3. distributed hash tables
  4. load-balancing

Qualifiers

  • Research-article
  • Research
  • Refereed

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)1
Reflects downloads up to 25 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2017)Multi-query processing of XML data streams on multicoreThe Journal of Supercomputing10.1007/s11227-016-1919-073:6(2339-2368)Online publication date: 1-Jun-2017
  • (2016)Enhancing applications with filtering of XML message streamsProceedings of the 20th International Database Engineering & Applications Symposium10.1145/2938503.2938509(322-327)Online publication date: 11-Jul-2016
  • (2013)DeltaProceedings of the VLDB Endowment10.14778/2732240.27322417:4(217-228)Online publication date: 1-Dec-2013
  • (2013)HopeProceedings of the 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications10.1109/TrustCom.2013.169(1399-1406)Online publication date: 16-Jul-2013
  • (2013)Effectively Delivering XML Information in Periodic Broadcast EnvironmentsProceedings of the 24th International Conference on Database and Expert Systems Applications - Volume 805510.1007/978-3-642-40285-2_16(165-179)Online publication date: 26-Aug-2013

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media