Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/375551.375610acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

Two-dimensional substring indexing

Published: 01 May 2001 Publication History

Abstract

As databases have expanded in scope to storing string data (XML documents, product catalogs), it has become increasingly important to search databases based on matching substrings, often on multiple, correlated dimensions. While string B-trees are I/O optimal in one dimension, no index structure with non-trivial query bounds is known for two-dimensional substring indexing.
In this paper, we present a technique for two-dimensional substring indexing based on a reduction to the geometric problem of identifying common colors in two ranges containing colored points. We develop an I/O efficient algorithm for solving the common colors problem, and use it to obtain an I/O efficient (poly-logarithmic query time) algorithm for the two-dimensional substring indexing problem. Our techniques result in a family of secondary memory index structures that trade space for time, with no loss of accuracy. We show how our technique can be practically realized using a combination of string B-trees and R-trees.

References

[1]
L. Arge, V. Samoladas, and J. S. Vitter. On two-dimensional indexability and optimal range search indexing. Proceedings of ACM PODS, 1999.]]
[2]
P. Agarwal and J. Erickson. Geometric range searching and its relatives. Advances in Discrete and Computational Geometry, AMS, 1998.]]
[3]
R. Bayer mad K. Unterauer. Prefix B-trees. ACM Transactions on Database Systems, 2(1):11-26, 1977.]]
[4]
N. Beckmann, H.-P. Kriegel, R. Sdmeider, and B. Seeger. The R*-tree: An efficient and robust access method for points and rectangles. Proceedings of ACM SIGMOD, pages 220-231, 1990.]]
[5]
P. Ferragina and R. Grossi. A fully dynamic data structure for external substring search. Proceedings of ACM STOC, pages 693-702, 1995.]]
[6]
P. Ferragina and R. Grossi. Fast string searching in secondary storage: Theoretical developments and experimental results. Proceedings of ACM SODA, pages 373-382, 1996.]]
[7]
P. Ferragina and R. Grossi. The string B-tree: A new data structure for string search in external memory and its appfications. Journal of ACM, 46(2):237-280, 1999.]]
[8]
V. Gaede and O. Gfinther. Multidimensional access methods. ACM Computing Surveys, 30(2):170-231, 1998.]]
[9]
G. Gonnet, R. Baeza-Yates, mad T. Snider. New indices for text: PAT trees and PAT arrays. hfformation Retrieval: Data Structures mad Algorithms, Prentice Hall, 1992.]]
[10]
P. Gupta, R. Janardhan, mad M. Smid. Further results on generalized intersection search problems: Counting, reporting mad dynamization. J. Algorithms, 282-317, 1995.]]
[11]
A. Guttman. R-trees : A dynamic index structure for spatial searctfing. Proceedings of ACM SIGMOD, pages 47-57, 1984.]]
[12]
H. V. Jagadish, N. Koudas, and D. Srivastava. On effective multi-dimensional indexhag for strings. Proceedings of ACM SIGMOD, 2000.]]
[13]
I. Kamel mad C. Faloutsos. Hilbert R-tree: An improved R-tree using fractals. Proceedings of VLDB, pages 500-510, 1994.]]
[14]
U. Manber and G. Myers. Suffix arrays: A new method for onfine string searches. SIAM Journal On Computing, 22(5):935-948, 1993.]]
[15]
Y. Matias, S. Muthulwishnan, S. Satfinalp, and J. Ziv. Augmenting suffix trees with appfications. Proc. European Symp. on Algorithms, 1998.]]
[16]
E. M. McCreight. A space-economical suffix tree construction algorithm. Journal of the ACM 23:262-272, 1976.]]
[17]
D. R. Morrison. PATRICIA: Practical algorithm to retrieve information coded in alphanumeric. Journal of ACM 15(4):514-534, 1968.]]
[18]
J. Robinson. The K-D-B-tree: A search structure for large multidimensional dynamic indexes. Proceedings of ACM SIGMOD, pages 10-18, 1981.]]
[19]
H. Samet. The design and analysis of spatial data structures. Addison Wesley, 1990.]]
[20]
T. Sellis, N. Roussopotflos, mad C. Faloutsos. The R+-tree: A dynamic index for mtflti-dimensional data. Proceedings of VLDB, pages 507-518, 1987.]]
[21]
K. C. Sevcik and N. Koudas. Filter trees for managing spatial data over a range of size granularities. Proceedings of VLDB, pages 16-27, 1996.]]
[22]
J. S. Vitter and E. A. M. Shriver. Optimal disk I/O with parallel block transfer. Proceedings of STOC, 1990.]]
[23]
J. S. Vitter. External memory algorithms mad data structures: Dealing with massive data. ACM Computing Surveys, 2000.]]

Cited By

View all
  • (2019)Autocompletion for Prefix-Abbreviated InputProceedings of the 2019 International Conference on Management of Data10.1145/3299869.3319858(211-228)Online publication date: 25-Jun-2019
  • (2010)Fast set intersection and two-patterns matchingTheoretical Computer Science10.1016/j.tcs.2010.06.002411:40-42(3795-3800)Online publication date: 1-Sep-2010
  • (2010)Fast set intersection and two-patterns matchingProceedings of the 9th Latin American conference on Theoretical Informatics10.1007/978-3-642-12200-2_22(234-242)Online publication date: 19-Apr-2010
  • Show More Cited By

Index Terms

  1. Two-dimensional substring indexing

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    PODS '01: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
    May 2001
    301 pages
    ISBN:1581133618
    DOI:10.1145/375551
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 May 2001

    Permissions

    Request permissions for this article.

    Check for updates

    Qualifiers

    • Article

    Conference

    SIGMOD/PODS01
    Sponsor:

    Acceptance Rates

    PODS '01 Paper Acceptance Rate 26 of 99 submissions, 26%;
    Overall Acceptance Rate 642 of 2,707 submissions, 24%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)1
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 23 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2019)Autocompletion for Prefix-Abbreviated InputProceedings of the 2019 International Conference on Management of Data10.1145/3299869.3319858(211-228)Online publication date: 25-Jun-2019
    • (2010)Fast set intersection and two-patterns matchingTheoretical Computer Science10.1016/j.tcs.2010.06.002411:40-42(3795-3800)Online publication date: 1-Sep-2010
    • (2010)Fast set intersection and two-patterns matchingProceedings of the 9th Latin American conference on Theoretical Informatics10.1007/978-3-642-12200-2_22(234-242)Online publication date: 19-Apr-2010
    • (2008)Approximate colored range and point enclosure queriesJournal of Discrete Algorithms10.1016/j.jda.2007.10.0016:3(420-432)Online publication date: 1-Sep-2008
    • (2005)Approximate colored range queriesProceedings of the 16th international conference on Algorithms and Computation10.1007/11602613_37(360-369)Online publication date: 19-Dec-2005
    • (2004)Expressing and optimizing sequence queries in database systemsACM Transactions on Database Systems10.1145/1005566.100556829:2(282-318)Online publication date: 1-Jun-2004
    • (2002)Efficient algorithms for document retrieval problemsProceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms10.5555/545381.545469(657-666)Online publication date: 6-Jan-2002
    • (2002)Range Searching in Categorical Data: Colored Range Searching on GridAlgorithms — ESA 200210.1007/3-540-45749-6_6(17-28)Online publication date: 29-Aug-2002

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media