Abstract
In this paper we present a foundational basis for optimal and information theoretic syntactic pattern recognition. We do this by developing a rigorous model,
, for channels which permit arbitrarily distributed substitution, deletion and insertion syntactic errors. More explicitly, if A is any finite alphabet and A* the set of words over A, we specify a stochastically consistent scheme by which a string U ∈ A* can be transformed into any Y ∈ A* by means of arbitrarily distributed substitution, deletion and insertion operations. The scheme is shown to be Functionally Complete and stochastically consistent. Apart from the synthesis aspects, we also deal with the analysis of such a model and derive a technique by which Pr[Y¦U], the probability of receiving Y given that U was transmitted, can be computed in cubic time using dynamic programming. Experimental results which involve dictionaries with strings of lengths between 7 and 14 with an overall average noise of 39.75 % demonstrate the superiority of our system over existing methods. The model also has applications in speech and uni-dimensional signal processing.
The work of this author was supported in part by the Natural Sciences and Engineering Research Council of Canada.
The work of this author was supported in part by the Office of Naval Research and BMD under Contract ONR N00014-91-J-4126.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Abridged List of References
R. L. Bahl and F. Jelinek, Decoding with channels with insertions, deletions and substitutions with applications to speech recognition, IEEE T. Inf. Th., IT-21:404–411 (1975).
Bunke, H. and Csirik, J., Parametric string edit distance and its application to pattern Recognition, IEEE T. Syst., Man and Cybern., SMC-25:202–206 (1993).
L. Devroye, W. Szpankowski and B. Rais, A note on the height of suffix trees, SIAM J. of Computing, 21:48–54, (1992).
G. Dewey, Relative Frequency of English Speech Sounds, Harvard Univ. Press, (1923).
R. O. Duda, P.E. Hart. Pattern Classification and Scene Analysis. Wiley & Sons, 1973.
G.D. Forney, The Viterbi Algorithm, Proceedings of the IEEE, Vol. 61. (1973).
K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1972.
P. A. V. Hall and G.R. Dowling, Approximate string matching, Comp. Sur., 12:381–402 (1980).
P. Jacquet and W. Szpankowski, Analysis of digital tries with markovian dependencies, IEEE T. Inf. Th., IT-37:1470–1475 (1991).
R. L. Kashyap and B. J. Oommen, A common basis for similarity and dissimilarity measures involving two strings, Int. J. Comp. Math., 17–40 (1983).
R. L. Kashyap and B. J. Oommen, An effective algorithm for string correction using generalized edit distances-I. Description of the algorithm and its optimality, Inf. Sci., 23(2):123–142 (1981).
R. L. Kashyap, and B. J. Oommen, String correction using probabilistic methods, Pattern Recognition Letters, 147–154 (1984).
R. Lowrance and R. A. Wagner, An extension of the string to string correction problem, J. Assoc. Comput. Mach., 22:177–183 (1975).
A. Levenshtein, Binary codes capable of correcting deletions, insertions and reversals, Soviet Phys. Dokl., 10:707–710 (1966).
W. J. Masek and M. S. Paterson, A faster algorithm computing string edit distances, J. Comput. System Sci., 20:18–31 (1980).
D. L. Neuhoff, The Viterbi algorithm as an aid in text recognition, IEEE T. Inf. Th., 222–226 (1975).
T. Okuda, E. Tanaka, and T. Kasai, A method of correction of garbled words based on the Levenshtein metric, IEEE T. Comput., C-25:172–177 (1976).
B. J. Oommen, Recognition of noisy subsequences using constrained edit distances, IEEE T. on Pattern Anal. and Mach. Intel., PAMI-9:676–685 (1987).
B. J. Oommen and R. L. Kashyap, A Formal Theory for Optimal and Information Theoretic Syntactic Pattern Recognition. Unabridged Version of the present paper. (Submitted for Publication).
D. Sankoff and J. B. Kruskal, Time Warps,String Edits and Macromolecules: The Theory and practice of Sequence Comparison, Addison-Wesley (1983).
R. Shinghal, and G. T. Toussaint, Experiments in text recognition with modified Viterbi algorithm, IEEE T. on Pat. Anal. and Mach. Intel., 184–192 (1979).
S. Srihari, Computer Text Recognition and Error Correction, IEEE Computer Press, (1984).
A. J. Viterbi, Error bounds for convolutional codes and an asymptotically optimal decoding algorithm, IEEE T. on Inf. Th., 260–26 (1967).
R. A. Wagner and M. J. Fisher, The string to string correction problem, J. Assoc. Comput. Mach., 21:168–173 (1974).
Author information
Authors and Affiliations
Editor information
Additional information
We would like to dedicate this paper to the memory of the late Prof. K. S. Fu who pioneered the field of Syntactic Pattern Recognition. Both authors remember their friend and colleague with respect. We are also very indebted to Richard Loke for helping us prepare the final manuscript and for assisting us obtain the experimental results.
Rights and permissions
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oommen, B.J., Kashyap, R.L. (1996). Optimal and information theoretic syntactic pattern recognition for traditional errors. In: Perner, P., Wang, P., Rosenfeld, A. (eds) Advances in Structural and Syntactical Pattern Recognition. SSPR 1996. Lecture Notes in Computer Science, vol 1121. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61577-6_2
Download citation
DOI: https://doi.org/10.1007/3-540-61577-6_2
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61577-4
Online ISBN: 978-3-540-70631-1
eBook Packages: Springer Book Archive