Nothing Special   »   [go: up one dir, main page]

Skip to main content

Languages with Mismatches and an Application to Approximate Indexing

  • Conference paper
Developments in Language Theory (DLT 2005)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 3572))

Included in the following conference series:

Abstract

In this paper we describe a factorial language, denoted by L(S,k,r), that contains all words that occur in a string S up to k mismatches every r symbols. Then we give some combinatorial properties of a parameter, called repetition index and denoted by R(S,k,r), defined as the smallest integer h≥ 1 such that all strings of this length occur at most in a unique position of the text S up to k mismatches every r symbols. We prove that R(S,k,r) is a non-increasing function of r and a non-decreasing function of k and that the equation r=R(S,k,r) admits a unique solution.

The repetition index plays an important role in the construction of an indexing data structure based on a trie that represents the set of all factors of L(S,k,r) having length equal to R(S,k,r). For each word xL(S,k,r) this data structure allows us to find the list occ(x) of all occurrences of the word x in a text S up to k mismatches every r symbols in time proportional to |x|+|occ(x)|.

Partially supported by MIUR National Project PRIN “Linguaggi Formali e Automi: teoria ed applicazioni”

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Arratia, R., Waterman, M.: The Erdös-Rényi strong law for pattern matching with given proportion of mismatches. Annals of Probability 4, 200–225 (1989)

    Google Scholar 

  2. Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of Annual ACM Symposium on Theory of Computing, STOC 2004 (2004)

    Google Scholar 

  3. Crochemore, M., Hancart, C., Lecroq, T.: Algorithmique du texte. Vuibert, 347 pages (2001)

    Google Scholar 

  4. Gabriele, A., Mignosi, F., Restivo, A., Sciortino, M.: Indexing structure for approximate string matching. In: Petreschi, R., Persiano, G., Silvestri, R. (eds.) CIAC 2003. LNCS, vol. 2653, pp. 140–151. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  5. Gabriele, A., Mignosi, F., Restivo, A., Sciortino, M.: Approximate string matching: indexing and the k-mismatch problem. Technical Report 244, University of Palermo, Department of Mathematics and Applications (2004)

    Google Scholar 

  6. Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge, 534 pages. (1997), ISBN 0 521 58519 8 hardback

    Google Scholar 

  7. Huynh, T.N.D., Hon, W.K., Lam, T.W., Sung, W.K.: Approximate string matching using compressed suffix arrays. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 434–444. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  8. Lothaire, M.: Algebraic Combinatorics on Words. Cambridge University Press, Cambridge (2002), ISBN 0-521-81220-8 hardback

    MATH  Google Scholar 

  9. Muthukrishnan, S.: Efficient algorithms for document retrieval problems. In: Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 657–666 (2002)

    Google Scholar 

  10. Pelfrêne, J., Abdeddaïm, S., Alexandre, J.: Extracting approximate patterns. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 328–347. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  11. Pisanti, N., Crochemore, M., Grossi, R., Sagot, M.-F.: A basis of tiling motifs for generating repeated patterns and its complexity for higher quorum. In: Rovan, B., Vojtáš, P. (eds.) MFCS 2003. LNCS, vol. 2747, pp. 622–631. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  12. Pisanti, N., Crochemore, M., Grossi, R., Sagot, M.-F.: A comparative study of bases for motif inference. In: Iliopoulos, C., Lecroq, T. (eds.) String Algorithmics, King’s College London Publications (2004)

    Google Scholar 

  13. Szpankowski, W.: Average Case Analysis of Algorithms on Sequences. John Wiley & Sons, Chichester (2001)

    MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Epifanio, C., Gabriele, A., Mignosi, F. (2005). Languages with Mismatches and an Application to Approximate Indexing. In: De Felice, C., Restivo, A. (eds) Developments in Language Theory. DLT 2005. Lecture Notes in Computer Science, vol 3572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11505877_20

Download citation

  • DOI: https://doi.org/10.1007/11505877_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26546-7

  • Online ISBN: 978-3-540-31682-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics