Dotted Suffix Trees: A Structure for Approximate Text Indexing

Luís Pedro Coelho & Arlindo L. Oliveira

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4209)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

In this work, we address the problem of text indexing for approximate matching. A text $\mathcal{T}$ is preprocessed to generate an index, which can later be queried to identify the places where a string occurs with up to k errors (under edit distance). The indexing structure occupies $\mathcal{O}(n\log^k n)$ space in the average case, independent of alphabet size. It can be used to report the existence of a match with k errors in $\mathcal{O}(3^k m^{k+1})$ time and to report the occurrences in $\mathcal{O}(3^k m^{k+1} + \mathit{ed})$ time, where m is the length of the pattern and ed is the number of matching edit scripts. Construction of the structure takes time bounded by $\mathcal{O}(kN|\Sigma|)$, where N is the number of nodes in the index and |Σ| is the alphabet size.
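
To make the query semantics concrete, the following is a minimal sketch of matching with up to k errors under edit distance. It is not the dotted suffix tree described above: it is the classical index-free dynamic-programming scan over the whole text, which takes time proportional to the text length times the pattern length per query and is the kind of cost an index aims to avoid. The function name `approx_occurrences` and the convention of reporting end positions are illustrative assumptions, not taken from the paper.

```python
def approx_occurrences(text: str, pattern: str, k: int) -> list[int]:
    """Return every end position j (exclusive) such that some substring
    of text ending at j is within edit distance k of pattern.

    Baseline online scan for illustration only; no index is built.
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any
    # substring of text ending at the current text position.
    col = list(range(m + 1))
    hits = []
    if col[m] <= k:
        hits.append(0)
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # value of col[i-1] in the previous column
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(
                prev_diag + cost,  # match or substitution
                old + 1,           # extra character in the text
                col[i - 1] + 1,    # pattern character left unmatched
            )
            prev_diag = old
        if col[m] <= k:
            hits.append(j)
    return hits


if __name__ == "__main__":
    # With k = 0 the scan degenerates to exact matching.
    print(approx_occurrences("abracadabra", "abra", 0))
```

With k = 0 only the exact occurrences are reported; raising k admits substrings that differ from the pattern by insertions, deletions, or substitutions.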

Author information

Authors and Affiliations

INESC-ID/IST
Luís Pedro Coelho & Arlindo L. Oliveira

Editor information

Editors and Affiliations

Department of Computer and Information Science, University of Strathclyde, Scotland
Fabio Crestani
Dipartimento di Informatica, University of Pisa, Largo B. Pontecorvo 3, 56127, Pisa, Italy
Paolo Ferragina
Department of Information Studies, University of Sheffield, Sheffield, UK
Mark Sanderson

About this paper

Cite this paper

Coelho, L.P., Oliveira, A.L. (2006). Dotted Suffix Trees: A Structure for Approximate Text Indexing. In: Crestani, F., Ferragina, P., Sanderson, M. (eds) String Processing and Information Retrieval. SPIRE 2006. Lecture Notes in Computer Science, vol 4209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880561_27

DOI: https://doi.org/10.1007/11880561_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45774-9
Online ISBN: 978-3-540-45775-6
eBook Packages: Computer Science, Computer Science (R0)
