Abstract
The intelligibility of a speech output device is an important predictor of user acceptability. The Diagnostic Rhyme Test (DRT) is an ANSI standard for measuring speech intelligibility (ANSI S3.2-1989). In the DRT, respondents hear a word and choose its equivalent from two visually presented words. The two words differ only in their initial consonant (e.g., veal-feel), and the two consonants differ only in a single distinctive acoustic-phonetic feature (e.g., voicing). To define “distinctive feature”, the DRT uses a minimal distinctive feature system, loosely based on the work of Jakobson et al. (1963) and Miller and Nicely (1955). These studies carefully analyzed natural speech errors in various noise environments. Whether their findings can be freely applied to two-alternative forced-choice tests of coded or synthesized speech is an empirical issue. In the present study, the results of a Consonant Identification (CI) task were compared to a previously conducted DRT using the same coding algorithms. The CI data indicated that the low-bit-rate coded speech yielded significantly more multifeature confusions than the uncoded speech. Moreover, the multifeature confusions could not be easily predicted from the single-feature confusions. A fundamental assumption of the DRT is that speech errors are adequately diagnosed by testing single-feature confusions. The results of the present study contradict that assumption. In conclusion, we argue that the application of the DRT (and more generally, any closed-response choice procedure) to coded or synthesized speech is questionable.
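The distinction between single-feature and multifeature confusions can be made concrete with a short sketch. The Python below is a hypothetical illustration only: the feature table, the feature names (`voicing`, `nasality`, `frication`, `place`), and the functions `feature_distance` and `summarize_confusions` are assumptions introduced here, not the DRT's actual feature system and not the analysis used in the study. It counts how many features separate the presented consonant from the reported one and tallies the responses accordingly.

```python
from collections import Counter

# Simplified, illustrative feature table (an assumption for this sketch).
# It is NOT the feature system used by the DRT or by the study.
FEATURES = {
    "p": {"voicing": 0, "nasality": 0, "frication": 0, "place": 0},
    "b": {"voicing": 1, "nasality": 0, "frication": 0, "place": 0},
    "f": {"voicing": 0, "nasality": 0, "frication": 1, "place": 0},
    "v": {"voicing": 1, "nasality": 0, "frication": 1, "place": 0},
    "t": {"voicing": 0, "nasality": 0, "frication": 0, "place": 1},
    "d": {"voicing": 1, "nasality": 0, "frication": 0, "place": 1},
    "s": {"voicing": 0, "nasality": 0, "frication": 1, "place": 1},
    "z": {"voicing": 1, "nasality": 0, "frication": 1, "place": 1},
    "m": {"voicing": 1, "nasality": 1, "frication": 0, "place": 0},
    "n": {"voicing": 1, "nasality": 1, "frication": 0, "place": 1},
}


def feature_distance(presented, reported):
    """Count the features on which two consonants differ."""
    a, b = FEATURES[presented], FEATURES[reported]
    return sum(a[name] != b[name] for name in a)


def summarize_confusions(responses):
    """Tally (presented, reported) pairs as correct, single- or multifeature."""
    counts = Counter()
    for presented, reported in responses:
        distance = feature_distance(presented, reported)
        if distance == 0:
            counts["correct"] += 1
        elif distance == 1:
            counts["single_feature"] += 1
        else:
            counts["multi_feature"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical open-response data: (presented, reported) consonants.
    data = [("v", "v"), ("v", "f"), ("v", "t"), ("b", "m")]
    print(summarize_confusions(data))
```

Under these assumptions, a two-alternative test built only from pairs such as veal-feel can register the `single_feature` category but offers no response option for the `multi_feature` one, which is why an open-response task such as CI is needed to observe it.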
References
American National Standards Institute. (1960). American standard method for measurement of monosyllabic word intelligibility (ANS S3.2-1960). New York: American Standards Association.
American National Standards Institute. (1989). Method for measuring the intelligibility of speech over communication systems (ANSI S3.2-1989). New York: American Standards Association.
Bronson, E., Carlone, D., Kleijn, W.B., O'Dell, K., Picone, J., and Thomson, J. (1987). Harmonic coding of speech at 4.8 Kb/s. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 2213–2216.
Campbell, G.A. (1910, cited in Schmidt-Nielsen, 1994). Telephonic intelligibility, Phil. Mag. January.
Chomsky, N. and Halle, M. (1968). The Sound Pattern of English. New York: Harper and Row.
Egan, J.P. (1948). Articulation testing. Laryngoscope, 58:955–991.
Greenspan, S.L., Nusbaum, H.C., and Pisoni, D.B. (1988). Perceptual learning of synthetic speech produced by rule. Journal of Experimental Psychology: Human Learning and Performance, 14(3):421–433.
House, A.S., Williams, C.E., Hecker, M.H.L., and Kryter, K.D. (1965). Articulation testing methods: Consonantal differentiation with a closed-response set. Journal of the Acoustical Society of America, 37:158–166.
Jakobson, R., Fant, C.G.M., and Halle, M. (1963). Preliminaries to Speech Analysis: The Distinctive Features and their Correlates. Cambridge, MA: MIT.
Luce, P.A. (1987). Structural distinctions between high and low frequency words in auditory word recognition. Unpublished doctoral dissertation, Indiana University.
McAuley, R.J. and Quatieri, T.E. (1985). Mid-rate coding based on a sinusoidal representation of speech. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 945–948.
Miller, G.A. and Nicely, P. (1955). An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America, 27:338–352.
Nusbaum, H.C., Dedina, M.J., and Pisoni, D.B. (1984). Perceptual confusions of consonants in natural and synthetic CV syllables. Research on Speech Perception: Progress Report No. 10, Speech Research Laboratory, Indiana University, Bloomington, Indiana, pp. 409–422.
Nusbaum, H.C., Francis, A.L., and Henly, A.S. (1995). Measuring the naturalness of synthetic speech. International Journal of Speech Technology, 1:7–19.
Ralston, J.V., Pisoni, D.B., and Mullenix, J.W. (1994). Perception and comprehension of speech. In A. Syrdal, R. Bennett, and S. Greenspan (Eds.), Applied Speech Technology. Boca Raton, FL: CRC Press.
Salasoo, A. and Pisoni, D.B. (1985). Sources of knowledge in spoken word identification. Journal of Verbal Learning and Verbal Behavior, 24:210–234.
Schmidt-Nielsen, A. (1994). Intelligibility and acceptability testing for speech technology. In A. Syrdal, R. Bennett, and S. Greenspan (Eds.), Applied Speech Technology. Boca Raton, FL: CRC Press.
Syrdal, A. (1987). Methods for a detailed analysis of Dynastat DRT results. AT&T Bell Laboratories Technical Memorandum.
Voiers, W.D. (1977). Diagnostic acceptability measure for speech communication systems. In M.E. Hawley (Ed.), Speech Intelligibility and Speaker Recognition, vol. 2. Stroudsburg, PA: Dowden, Hutchinson, and Ross.
Voiers, W.D. (1983). Evaluating processed speech using the diagnostic rhyme test. Speech Technology, 30–39.
Wang, M.D. and Bilger, R.C. (1973). Consonant confusions in noise: A study of perceptual features. Journal of the Acoustical Society of America, 54:1248–1266.
Wickelgren, W.A. (1966). Distinctive features and errors in short-term memory for English consonants. Journal of the Acoustical Society of America, 39(2):388–398.
Cite this article
Greenspan, S.L., Bennett, R.W. & Syrdal, A.K. An evaluation of the diagnostic rhyme test. Int J Speech Technol 2, 201–214 (1998). https://doi.org/10.1007/BF02111208
DOI: https://doi.org/10.1007/BF02111208