An evaluation of the diagnostic rhyme test

Steven L. Greenspan¹,
Raymond W. Bennett² &
Ann K. Syrdal³


Abstract

The intelligibility of a speech output device is an important predictor of user acceptability. The Diagnostic Rhyme Test (DRT) is an ANSI standard for measuring speech intelligibility (ANSI S3.2-1989). In the DRT, respondents hear a word and choose its equivalent from two visually presented words. The two words differ only in their initial consonant (e.g., veal-feel), and the two consonants differ only in a single distinctive acoustic-phonetic feature (e.g., voicing). To define “distinctive feature”, the DRT uses a minimal distinctive feature system, loosely based on the work of Jakobson et al. (1963) and Miller and Nicely (1955). These studies carefully analyzed natural speech errors in various noise environments. Whether their findings can be freely applied to alternative forced-choice tests of coded or synthesized speech is an empirical issue. In the present study, the results of a Consonant Identification (CI) task were compared to a previously conducted DRT using the same coding algorithms. The CI data indicated that the low-bit-rate coded speech yielded significantly more multifeature confusions than the uncoded speech. Moreover, the multifeature confusions could not be easily predicted from the single-feature confusions. A fundamental assumption of the DRT is that speech errors are adequately diagnosed by testing single-feature confusions. The results of the present study contradict that assumption. In conclusion, we argue that the application of the DRT (and more generally, any closed-response choice procedure) to coded or synthesized speech is questionable.
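The distinction between single-feature and multifeature confusions can be illustrated with a minimal sketch. It assumes a toy three-feature table (voicing, nasality, place) for six consonants; this table, the function names, and the sample responses are hypothetical and are neither the DRT feature system nor data from the study.

```python
from collections import Counter

# Illustrative feature table (an assumption, NOT the DRT feature system).
# Each consonant maps to (voicing, nasality, place), with place coded
# 0 = labial and 1 = alveolar.
FEATURES = {
    "p": (0, 0, 0),
    "b": (1, 0, 0),
    "m": (1, 1, 0),
    "t": (0, 0, 1),
    "d": (1, 0, 1),
    "n": (1, 1, 1),
}


def feature_distance(spoken, heard):
    """Count the features on which two consonants differ."""
    return sum(a != b for a, b in zip(FEATURES[spoken], FEATURES[heard]))


def tally_confusions(responses):
    """Classify (spoken, heard) pairs by how many features were confused."""
    counts = Counter()
    for spoken, heard in responses:
        distance = feature_distance(spoken, heard)
        if distance == 0:
            counts["correct"] += 1
        elif distance == 1:
            counts["single_feature"] += 1
        else:
            counts["multifeature"] += 1
    return counts


# Hypothetical open-response data, as a consonant identification task
# might produce; a two-alternative test with single-feature pairs could
# never record the multifeature errors below.
responses = [("p", "p"), ("p", "b"), ("p", "d"), ("m", "t"), ("n", "m")]
tally = tally_confusions(responses)
print(dict(tally))
```

In a tally of this kind, responses that differ from the target on two or more features are exactly the errors that a test restricted to single-feature word pairs has no way of registering.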


References

American National Standards Institute. (1960). American standard method for measurement of monosyllabic word intelligibility (ANS S3.2-1960). New York: American Standards Association.
American National Standards Institute. (1989). Method for measuring the intelligibility of speech over communication systems (ANSI S3.2-1989). New York: American Standards Association.
Bronson, E., Carlone, D., Kleijn, W.B., O'Dell, K., Picone, J., and Thomson, J. (1987). Harmonic coding of speech at 4.8 Kb/s. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 2213–2216.
Campbell, G.A. (1910, cited in Schmidt-Nielsen, 1994). Telephonic intelligibility. Phil. Mag., January.
Chomsky, N. and Halle, M. (1968). The Sound Pattern of English. New York: Harper and Row.
Egan, J.P. (1948). Articulation testing. Laryngoscope, 58:955–991.
Greenspan, S.L., Nusbaum, H.C., and Pisoni, D.B. (1988). Perceptual learning of synthetic speech produced by rule. Journal of Experimental Psychology: Human Learning and Performance, 14(3):421–433.
House, A.S., Williams, C.E., Hecker, M.H.L., and Kryter, K.D. (1965). Articulation testing methods: Consonantal differentiation with a closed-response set. Journal of the Acoustical Society of America, 37:158–166.
Jakobson, R., Fant, C.G.M., and Halle, M. (1963). Preliminaries to Speech Analysis: The Distinctive Features and their Correlates. Cambridge, MA: MIT.
Luce, P.A. (1987). Structural distinctions between high and low frequency words in auditory word recognition. Unpublished doctoral dissertation, Indiana University.
McAuley, R.J. and Quatieri, T.E. (1985). Mid-rate coding based on a sinusoidal representation of speech. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 945–948.
Miller, G.A. and Nicely, P. (1955). An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America, 27:338–352.
Nusbaum, H.C., Dedina, M.J., and Pisoni, D.B. (1984). Perceptual confusions of consonants in natural and synthetic CV syllables. Research on Speech Perception: Progress Report No. 10, Speech Research Laboratory, Indiana University, Bloomington, Indiana, pp. 409–422.
Nusbaum, H.C., Francis, A.L., and Henly, A.S. (1995). Measuring the naturalness of synthetic speech. International Journal of Speech Technology, 1:7–19.
Ralston, J.V., Pisoni, D.B., and Mullenix, J.W. (1994). Perception and comprehension of speech. In A. Syrdal, R. Bennett, and S. Greenspan (Eds.), Applied Speech Technology. Boca Raton, FL: CRC Press.
Salasoo, A. and Pisoni, D.B. (1985). Sources of knowledge in spoken word identification. Journal of Verbal Learning and Verbal Behavior, 24:210–234.
Schmidt-Nielsen, A. (1994). Intelligibility and acceptability testing for speech technology. In A. Syrdal, R. Bennett, and S. Greenspan (Eds.), Applied Speech Technology. Boca Raton, FL: CRC Press.
Syrdal, A. (1987). Methods for a detailed analysis of Dynastat DRT results. AT&T Bell Laboratories Technical Memorandum.
Voiers, W.D. (1977). Diagnostic acceptability measure for speech communication systems. In M.E. Hawley (Ed.), Speech Intelligibility and Speaker Recognition, vol. 2. Stroudsburg, PA: Dowden, Hutchinson, and Ross.
Voiers, W.D. (1983). Evaluating processed speech using the diagnostic rhyme test. Speech Technology, 30–39.
Wang, M.D. and Bilger, R.C. (1973). Consonant confusions in noise: A study of perceptual features. Journal of the Acoustical Society of America, 54:1248–1266.
Wickelgren, W.A. (1966). Distinctive features and errors in short-term memory for English consonants. Journal of the Acoustical Society of America, 39(2):388–398.


Author information

Authors and Affiliations

AT&T Labs-Research, 180 Florham Park, 07932, NJ
Steven L. Greenspan
Ameritech, Hoffman Estates, 60196, IL
Raymond W. Bennett
AT&T Labs-Research, 180 Florham Park, 07932, NJ
Ann K. Syrdal



About this article

Cite this article

Greenspan, S.L., Bennett, R.W. & Syrdal, A.K. An evaluation of the diagnostic rhyme test. Int J Speech Technol 2, 201–214 (1998). https://doi.org/10.1007/BF02111208


Received: 30 January 1998
Revised: 27 February 1998
Accepted: 27 February 1998
Issue Date: September 1998
DOI: https://doi.org/10.1007/BF02111208
