Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Analysis and optimization for boolean expression indexing

Published: 04 July 2013 Publication History

Abstract

BE-Tree is a novel dynamic data structure designed to efficiently index Boolean expressions over a high-dimensional discrete space. BE Tree-copes with both high-dimensionality and expressiveness of Boolean expressions by introducing an effective two-phase space-cutting technique that specifically utilizes the discrete and finite domain properties of the space. Furthermore, BE-Tree employs self-adjustment policies to dynamically adapt the tree as the workload changes. Moreover, in BE-Tree, we develop two novel cache-conscious predicate evaluation techniques, namely, lazy and bitmap evaluations, that also exploit the underlying discrete and finite space to substantially reduce BE-Tree's matching time by up to 75%
BE-Tree is a general index structure for matching Boolean expression which has a wide range of applications including (complex) event processing, publish/subscribe matching, emerging applications in cospaces, profile matching for targeted web advertising, and approximate string matching. Finally, the superiority of BE-Tree is proven through a comprehensive evaluation with state-of-the-art index structures designed for matching Boolean expressions.

Supplementary Material

a8-sadoghi-apndx.pdf (sadoghi.zip)
Supplemental movie, appendix, image and software files for, Analysis and optimization for boolean expression indexing

References

[1]
Agrawal, R., Ailamaki, A., et al. 2008. The Claremont report on database research. SIGMOD Rec. 37, 3, 9--19.
[2]
Aguilera, M. K., Strom, R. E., Sturman, D. C., Astley, M., and Chandra, T. D. 1999. Matching events in a content-based subscription system. In Proceedings of the 18th Annual ACM Symposium on Principles of Distributed Computing (PODC'99). ACM, New York, 53--61.
[3]
Arumugam, S., Dobra, A., Jermaine, C. M., Pansare, N., and Perez, L. 2010. The DataPath system: A datacentric analytic processing engine for large data warehouses. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'10). ACM, New York, 519--530.
[4]
Beckmann, N., Kriegel, H.-P., Schneider, R., and Seeger, B. 1990. The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'90). ACM, New York, 322--331.
[5]
Berchtold, S., Keim, D. A., and Kriegel, H.-P. 1996. The X-tree: An index structure for high-dimensional data. In Proceedings of the 22th International Conference on Very Large Data Bases (VLDB'96). Morgan Kaufmann Publishers Inc., San Francisco, CA, 28--39.
[6]
Berg, M. D., Cheong, O., Kreveld, M. V., and Overmars, M. 2008. Computational Geometry: Algorithms and Applications 3rd Ed., Springer.
[7]
Brenna, L., Demers, A., Gehrke, J., Hong, M., Ossher, J., Panda, B., Riedewald, M., Thatte, M., and White, W. 2007. Cayuga: a high-performance event processing engine. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'07). ACM, New York, 1100--1102.
[8]
Campailla, A., Chaki, S., Clarke, E., Jha, S., and Veith, H. 2001. Efficient filtering in publish-subscribe systems using binary decision diagrams. In Proceedings of the 23rd International Conference on Software Engineering (ICSE'01). IEEE, 443--452.
[9]
Candan, K. S., Hsiung, W.-P., Chen, S., Tatemura, J., and Agrawal, D. 2006. AFilter: adaptable XML filtering with prefix-caching suffix-clustering. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB'06). VLDB Endowment, 559--570.
[10]
Carzaniga, A. and Wolf, A. L. 2003. Forwarding in a content-based network. In Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications ( SIGCOMM'03). ACM, New York, 163--174.
[11]
Chan, C.-Y., Felber, P., Garofalakis,M., and Rastogi, R. 2002. Efficient filtering of XML documents with XPath expressions. VLDB J. 11, 4, 354--379.
[12]
Chandel, A., Hassanzadeh, O., Koudas, N., Sadoghi, M., and Srivastava, D. 2007. Benchmarking declarative approximate selection predicates. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'07). ACM, New York, 353--364.
[13]
Chandy, K. M., Charpentier, M., and Capponi, A. 2007. Towards a theory of events. In Proceedings of the Inaugural International Conference on Distributed Event-Based Systems (DEBS'07). ACM, New York, 180--187.
[14]
Chaudhuri, S., Ganjam, K., Ganti, V., and Motwani, R. 2003. Robust and efficient fuzzy match for online data cleaning. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'03). ACM, New York, 313--324.
[15]
Cugola, G. and Margara, A. 2012. High-performance location-aware publish-subscribe on GPUs. In Proceedings of the ACM/IFIP/USENIX 13th International Middleware Conference. Lecture Notes in Computer Science, vol. 7662, Springer, 312--331.
[16]
Diao, Y., Altinel, M., Franklin, M. J., Zhang, H., and Fischer, P. 2003. Path sharing and predicate evaluation for high-performance XML filtering. ACM Trans. Datab. Syst. 28, 4, 467--516.
[17]
Fabret, F., Jacobsen, H. A., Llirbat, F., Pereira, J., Ross, K. A., and Shasha, D. 2001. Filtering algorithms and implementation for very fast publish/subscribe systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'01). ACM, New York, 115--126.
[18]
Farroukh, A., Sadoghi, M., and Jacobsen, H.-A. 2011. Towards vulnerability-based intrusion detection with event processing. In Proceedings of the 5th ACM International Conference on Distributed Event-Based System (DEBS'11). ACM, New York, 171--182.
[19]
Fellegi, I. P. and Sunter, A. B. 1969. A theory for record linkage. J. Amer. Statist. Assoc. 64, 328, 1183--1210.
[20]
Fischer, P. M. and Kossmann, D. 2005. Batched processing for information filters. In Proceedings of the 21st International Conference on Data Engineering (ICDE'05). IEEE, 902--913.
[21]
Fontoura, M., Sadanandan, S., Shanmugasundaram, J., Vassilvitski, S., Vee, E., Venkatesan, S., and Zien, J. 2010. Efficiently evaluating complex Boolean expressions. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'10). ACM, New York, 3--14.
[22]
Forgy, C. L. 1990. Rete: A fast algorithm for the many pattern/many object pattern match problem. In Expert Systems, P. G. Raeth, Ed., IEEE, 324--341.
[23]
Freeston, M. 1995. A general solution of the n-dimensional B-tree problem. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'95). ACM, New York, 80--91.
[24]
Gaede, V. and Günther, O. 1998. Multidimensional access methods. ACM Comput. Surv. 30, 2, 170--231.
[25]
Giarratano, J. C. and Riley, G. 1989. Expert Systems: Principles and Programming. Brooks/Cole Publishing Co., Pacific Grove, CA.
[26]
Guttman, A. 1984. R-trees: a dynamic index structure for spatial searching. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'84). ACM, New York, 47--57.
[27]
Hanson, E. N., Carnes, C., Huang, L., Konyala, M., Noronha, L., Parthasarathy, S., Park, J. B., and Vernon, A. 1999. Scalable trigger processing. In Proceedings of the 15th International Conference on Data Engineering. M. Kitsuregawa, M. P. Papazoglou, and C. Pu, Eds., IEEE, 266--275.
[28]
Hanson, E. N., Chaabouni, M., Kim, C.-H., and Wang, Y.-W. 1990. A predicate matching algorithm for database rule systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'90). ACM, New York, 271--280.
[29]
Hull, R. 2008. Artifact-centric business process models: Brief survey of research results and challenges. In Proceedings of the OTM Conferences, Part II. Lecture Notes in Computer Science, vol. 5332, Springer, 1152--1163.
[30]
Jerzak, Z. and Fetzer, C. 2008. Bloom filter based routing for content-based publish/subscribe. In Proceedings of the 2nd International Conference on Distributed Event-Based Systems (DEBS'08). ACM, New York, 71--81.
[31]
Kale, S., Hazan, E., Cao, F., and Singh, J. P. 2005. Analysis and algorithms for content-based event matching. In Proceedings of the 4th International Workshop on Distributed Event-Based Systems (DEBS'05). IEEE, 363--369.
[32]
Machanavajjhala, A., Vee, E., Garofalakis, M., and Shanmugasundaram, J. 2008. Scalable ranked publish/subscribe. Proc. VLDB Endow. 1, 1, 451--462.
[33]
Margara, A. and Cugola, G. 2013. High performance publish-subscribe matching using parallel hardware. IEEE Trans. Parallel Distrib Syst 99, PrePrints, 1.
[34]
Ooi, B. C., Tan, K. L., and Tung, A. 2010. Sense the physical, walkthrough the virtual, manage the co (existing) spaces: A database perspective. SIGMOD Rec. 38, 3, 5--10.
[35]
Rjaibi,W., Dittrich, K. R., and Jaepel, D. 2002. Event matching in symmetric subscription systems. In Proceedings of the Conference of the Centre for Advanced Studies on Collaborative Research (CASCON'02). IBM Press.
[36]
Sadoghi, M. 2012. Towards an extensible efficient event processing kernel. In Proceedings of the SIGMOD/PODS PhD Symposium (PhD'12). ACM, 3--8.
[37]
Sadoghi, M., Burcea, I., and Jacobsen, H.-A. 2011. GPX-Matcher: A generic Boolean predicate-based XPath expression matcher. In Proceedings of the 14th International Conference on Extending Database Technology (EDBT/ICDT'11). ACM, New York, 45--56.
[38]
Sadoghi, M. and Jacobsen, H.-A. 2011. BE-Tree: an index structure to efficiently match Boolean expressions over high-dimensional discrete space. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'11). ACM, New York, 637--648.
[39]
Sadoghi, M. and Jacobsen, H.-A. 2012. Relevance matters: Capitalizing on less (top-k matching in publish/subscribe). In Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE'12). IEEE, 786--797.
[40]
Sadoghi, M., Jacobsen, H.-A., Labrecque, M., Shum, W., and Singh, H. 2010. Efficient event processing through reconfigurable hardware for algorithmic trading. Proc. VLDB Endow. 3, 2, 1525--1528.
[41]
Sadoghi, M., Singh, H., and Jacobsen, H.-A. 2011. Towards highly parallel event processing through reconfigurable hardware. In Proceedings of the 7th International Workshop on Data Management on New Hardware (DaMoN'11). ACM, New York, 27--32.
[42]
Saita, C.-A. and Llirbat, F. 2004. Clustering multidimensional extended objects to speed up execution of spatial queries. In Proceedings of the 9th International Conference on Extending Database Technology. Lecture Notes in Computer Science, vol. 2992, Springer, 623--624.
[43]
Sellis, T. K., Roussopoulos, N., and Faloutsos, C. 1987. The R+-tree: A dynamic index for multi-dimensional objects. In Proceedings of the 13th International Conference on Very Large Data Bases (VLDB'87). Morgan Kaufmann Publishers Inc., San Francisco, CA, 507--518.
[44]
Triantafillou, P. and Economides, A. A. 2002. Subscription summaries for scalability and efficiency in publish/subscribe systems. In Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCSW'02). IEEE, 619--624.
[45]
Whang, S. E., Garcia-Molina, H., Brower, C., Shanmugasundaram, J., Vassilvitskii, S., Vee, E., and Yerneni, R. 2009. Indexing Boolean expressions. Proc. VLDB Endow. 2, 1, 37--48.
[46]
Yan, T. W. and García-Molina, H. 1994. Index structures for selective dissemination of information under the Boolean model. ACM Trans. Datab. Syst. 19, 2, 332--364.

Cited By

View all
  • (2022)Parallel Ensemble Matching Based on Subscription Partitioning for Content-Based Publish/Subscribe SystemsInternational Journal of Software Engineering and Knowledge Engineering10.1142/S0218194022500619(1-20)Online publication date: 25-Oct-2022
  • (2021)Demo: Attribute-Stream-Based Access Control (ASBAC) with the Streaming Attribute Policy Language (SAPL)Proceedings of the 26th ACM Symposium on Access Control Models and Technologies10.1145/3450569.3464397(95-97)Online publication date: 11-Jun-2021
  • (2021)In-Memory Policy Indexing for Policy Retrieval Points in Attribute-Based Access ControlProceedings of the 26th ACM Symposium on Access Control Models and Technologies10.1145/3450569.3463562(59-70)Online publication date: 11-Jun-2021
  • Show More Cited By

Index Terms

  1. Analysis and optimization for boolean expression indexing

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Transactions on Database Systems
      ACM Transactions on Database Systems  Volume 38, Issue 2
      June 2013
      245 pages
      ISSN:0362-5915
      EISSN:1557-4644
      DOI:10.1145/2487259
      Issue’s Table of Contents
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 04 July 2013
      Accepted: 01 January 2013
      Revised: 01 August 2012
      Received: 01 February 2012
      Published in TODS Volume 38, Issue 2

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. Boolean expressions
      2. complex event processing
      3. data structure
      4. publish/subscribe

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)14
      • Downloads (Last 6 weeks)0
      Reflects downloads up to 13 Nov 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2022)Parallel Ensemble Matching Based on Subscription Partitioning for Content-Based Publish/Subscribe SystemsInternational Journal of Software Engineering and Knowledge Engineering10.1142/S0218194022500619(1-20)Online publication date: 25-Oct-2022
      • (2021)Demo: Attribute-Stream-Based Access Control (ASBAC) with the Streaming Attribute Policy Language (SAPL)Proceedings of the 26th ACM Symposium on Access Control Models and Technologies10.1145/3450569.3464397(95-97)Online publication date: 11-Jun-2021
      • (2021)In-Memory Policy Indexing for Policy Retrieval Points in Attribute-Based Access ControlProceedings of the 26th ACM Symposium on Access Control Models and Technologies10.1145/3450569.3463562(59-70)Online publication date: 11-Jun-2021
      • (2021)A-Tree: A Dynamic Data Structure for Efficiently Indexing Arbitrary Boolean ExpressionsProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457266(817-829)Online publication date: 9-Jun-2021
      • (2021)Lap: A latency‐aware parallelism framework for content‐based publish/subscribe systemsConcurrency and Computation: Practice and Experience10.1002/cpe.664035:17Online publication date: 21-Sep-2021
      • (2019)PhSIHProceedings of the 48th International Conference on Parallel Processing10.1145/3337821.3337859(1-10)Online publication date: 5-Aug-2019
      • (2019)A fast and anti-matchability matching algorithm for content-based publish/subscribe systemsComputer Networks10.1016/j.comnet.2018.12.001149(213-225)Online publication date: Feb-2019
      • (2018)PS-tree-based efficient boolean expression matching for high-dimensional and dense workloadsProceedings of the VLDB Endowment10.14778/3291264.329127012:3(251-264)Online publication date: 1-Nov-2018
      • (2018)Analysis and Development of Boolean Expression Matching on Survey Data Validation : (Case Study: Survey and Census of Statistics Indonesia)2018 International Conference on Information Technology Systems and Innovation (ICITSI)10.1109/ICITSI.2018.8696000(300-305)Online publication date: Oct-2018
      • (2017)Toward Efficient Ranked-key Algorithm for the Web notification of Big Data SystemsProceedings of the 2nd international Conference on Big Data, Cloud and Applications10.1145/3090354.3090386(1-4)Online publication date: 29-Mar-2017
      • Show More Cited By

      View Options

      Get Access

      Login options

      Full Access

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media