Abstract
In this work, the problem we address is text indexing for approximate matching. Given a text \(\mathcal{T}\) which undergoes some preprocessing to generate an index, we can later query this index to identify the places where a string occurs up to a certain number of errors k (edition distance). The indexing structure occupies space \(\mathcal{O}(n\log^kn)\) in the average case, independent of alphabet size. This structure can be used to report the existence of a match with k errors in \(\mathcal{O}(3^k m^{k+1})\) and to report the occurrences in \(\mathcal{O}(3^k m^{k+1} + \mbox{\it ed})\) time, where m is the length of the pattern and ed and the number of matching edit scripts. The construction of the structure has time bound by \(\mathcal{O}(kN|\Sigma|)\), where N is the number of nodes in the index and |Σ| the alphabet size.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Weiner, P.: Linear pattern matching algorithms. In: FOCS, pp. 1–11. IEEE, Los Alamitos (1973)
Navarro, G.: A guided tour to approximate string matching. ACM Computing Surveys 33 (2001)
Maaß, M.G., Nowak, J.: Text indexing with errors. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 21–32. Springer, Heidelberg (2005)
Chattaraj, A., Parida, L.: An inexact-suffix-tree-based algorithm for detecting extensible patterns. Theor. Comput. Sci. 335, 3–14 (2005)
Cole, R., Gottlieb, L.A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: STOC, pp. 91–100 (2004)
McCreight, E.: A space-economical suffix tree construction algorithm. J. ACM 23, 262–272 (1976)
Apostolico, A., Szpankowski, W.: Self-alignments in words and their applications. J. Algorithms 13, 446–467 (1992)
Gusfield, D.: Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, New York (1997)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Coelho, L.P., Oliveira, A.L. (2006). Dotted Suffix Trees A Structure for Approximate Text Indexing. In: Crestani, F., Ferragina, P., Sanderson, M. (eds) String Processing and Information Retrieval. SPIRE 2006. Lecture Notes in Computer Science, vol 4209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880561_27
Download citation
DOI: https://doi.org/10.1007/11880561_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45774-9
Online ISBN: 978-3-540-45775-6
eBook Packages: Computer ScienceComputer Science (R0)