Multiword expressions in spoken language: An exploratory study on pronunciation variation

Published: 01 October 2005

Abstract

The study presented in this paper was aimed at exploring the possibilities of modelling specific pronunciation characteristics of multiword expressions (MWEs) for both automatic speech recognition (ASR) and automatic phonetic transcription (APT). For this purpose, we first drew up an inventory of frequently found N-grams extracted from orthographic transcriptions of spontaneous speech contained in a large corpus of spoken Dutch. These N-grams were filtered and subsequently assigned to linguistic categories. For a small selection of these N-grams we examined the phonetic transcriptions contained in the corpus. We found that the pronunciation of these N-grams differed to a large extent from the canonical form. In order to determine whether this is a general characteristic of spontaneous speech or rather the effect of the specific status of these N-grams, we analysed the pronunciations of the individual words composing the N-grams in two context conditions: (1) in the N-gram context and (2) in any other context. We found that words in N-grams do indeed have peculiar pronunciation patterns. This seems to suggest that the N-grams investigated may be considered as MWEs that should be treated as lexical entries in the pronunciation lexicons used in ASR and APT, with their own specific pronunciation variants.
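
The two analysis steps described in the abstract (collecting frequent N-grams from orthographic transcriptions, then comparing how the component words are pronounced inside the N-gram versus in any other context) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the frequency threshold, and the toy corpus with placeholder words (w1, w2, ...) and placeholder pronunciation labels are all assumptions made for illustration, and the filtering and linguistic categorisation steps are left out.

```python
from collections import Counter


def frequent_ngrams(utterances, n, min_count):
    """Count word N-grams inside each utterance and keep those seen at least min_count times."""
    counts = Counter()
    for words in utterances:
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


def variants_by_context(utterances, ngram):
    """Tally pronunciation variants of the words in `ngram` in two conditions:
    (1) tokens occurring inside the N-gram, (2) tokens in any other context."""
    n = len(ngram)
    inside, outside = {}, {}
    for utt in utterances:  # utt is a list of (word, pronunciation) pairs
        words = [word for word, _ in utt]
        in_gram = [False] * len(utt)
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == ngram:
                for j in range(i, i + n):
                    in_gram[j] = True
        for (word, pron), flag in zip(utt, in_gram):
            if word in ngram:
                target = inside if flag else outside
                target.setdefault(word, Counter())[pron] += 1
    return inside, outside


# Hypothetical toy corpus: placeholder words with placeholder pronunciation labels.
# "W1" stands for a canonical form, "1" for a reduced variant (invented for this sketch).
corpus = [
    [("w1", "1"), ("w2", "2"), ("w3", "W3")],
    [("w3", "W3"), ("w1", "1"), ("w2", "W2")],
    [("w2", "W2"), ("w4", "W4")],
    [("w1", "W1"), ("w3", "W3")],
]

# Step 1: frequent bigrams from the orthographic (word) tier only.
word_tier = [[word for word, _ in utt] for utt in corpus]
frequent = frequent_ngrams(word_tier, n=2, min_count=2)

# Step 2: per-word variant counts in the two context conditions.
inside, outside = variants_by_context(corpus, ("w1", "w2"))

for word in ("w1", "w2"):
    print(word, "in N-gram:", dict(inside.get(word, {})),
          "elsewhere:", dict(outside.get(word, {})))
```

Comparing the two tallies per word is the point of the second step: if the variant distribution inside the N-gram differs clearly from the distribution elsewhere, that supports listing the N-gram as its own lexicon entry with its own pronunciation variants.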

Cited By

  • (2006) A bio-inspired approach for multi-word expression extraction. Proceedings of the COLING/ACL on Main conference poster sessions, pp. 176-182. doi:10.5555/1273073.1273096. Online publication date: 17-Jul-2006

Published In

Computer Speech and Language, Volume 19, Issue 4
October, 2005
183 pages

Publisher

Academic Press Ltd.
United Kingdom