Abstract
In this paper we describe a factorial language, denoted by L(S,k,r), that contains all words that occur in a string S up to k mismatches every r symbols. Then we give some combinatorial properties of a parameter, called repetition index and denoted by R(S,k,r), defined as the smallest integer h≥ 1 such that all strings of this length occur at most in a unique position of the text S up to k mismatches every r symbols. We prove that R(S,k,r) is a non-increasing function of r and a non-decreasing function of k and that the equation r=R(S,k,r) admits a unique solution.
The repetition index plays an important role in the construction of an indexing data structure based on a trie that represents the set of all factors of L(S,k,r) having length equal to R(S,k,r). For each word x∈ L(S,k,r) this data structure allows us to find the list occ(x) of all occurrences of the word x in a text S up to k mismatches every r symbols in time proportional to |x|+|occ(x)|.
Partially supported by MIUR National Project PRIN “Linguaggi Formali e Automi: teoria ed applicazioni”
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Arratia, R., Waterman, M.: The Erdös-Rényi strong law for pattern matching with given proportion of mismatches. Annals of Probability 4, 200–225 (1989)
Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of Annual ACM Symposium on Theory of Computing, STOC 2004 (2004)
Crochemore, M., Hancart, C., Lecroq, T.: Algorithmique du texte. Vuibert, 347 pages (2001)
Gabriele, A., Mignosi, F., Restivo, A., Sciortino, M.: Indexing structure for approximate string matching. In: Petreschi, R., Persiano, G., Silvestri, R. (eds.) CIAC 2003. LNCS, vol. 2653, pp. 140–151. Springer, Heidelberg (2003)
Gabriele, A., Mignosi, F., Restivo, A., Sciortino, M.: Approximate string matching: indexing and the k-mismatch problem. Technical Report 244, University of Palermo, Department of Mathematics and Applications (2004)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge, 534 pages. (1997), ISBN 0 521 58519 8 hardback
Huynh, T.N.D., Hon, W.K., Lam, T.W., Sung, W.K.: Approximate string matching using compressed suffix arrays. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 434–444. Springer, Heidelberg (2004)
Lothaire, M.: Algebraic Combinatorics on Words. Cambridge University Press, Cambridge (2002), ISBN 0-521-81220-8 hardback
Muthukrishnan, S.: Efficient algorithms for document retrieval problems. In: Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 657–666 (2002)
Pelfrêne, J., Abdeddaïm, S., Alexandre, J.: Extracting approximate patterns. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 328–347. Springer, Heidelberg (2003)
Pisanti, N., Crochemore, M., Grossi, R., Sagot, M.-F.: A basis of tiling motifs for generating repeated patterns and its complexity for higher quorum. In: Rovan, B., Vojtáš, P. (eds.) MFCS 2003. LNCS, vol. 2747, pp. 622–631. Springer, Heidelberg (2003)
Pisanti, N., Crochemore, M., Grossi, R., Sagot, M.-F.: A comparative study of bases for motif inference. In: Iliopoulos, C., Lecroq, T. (eds.) String Algorithmics, King’s College London Publications (2004)
Szpankowski, W.: Average Case Analysis of Algorithms on Sequences. John Wiley & Sons, Chichester (2001)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Epifanio, C., Gabriele, A., Mignosi, F. (2005). Languages with Mismatches and an Application to Approximate Indexing. In: De Felice, C., Restivo, A. (eds) Developments in Language Theory. DLT 2005. Lecture Notes in Computer Science, vol 3572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11505877_20
Download citation
DOI: https://doi.org/10.1007/11505877_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26546-7
Online ISBN: 978-3-540-31682-4
eBook Packages: Computer ScienceComputer Science (R0)