In this dissertation I present a model for the determination of intonation contours from context and provide two implemented systems which apply this theory to the problem of generating spoken language with appropriate intonation from high-level semantic representations. The theory and implementations presented here are based on an information structure framework that mediates between intonation and discourse, and encodes the proper level of semantic information to account for both contextually bound accentuation patterns and intonational phrasing. The structural similarities among these linguistic levels of representation are the basis for selecting Combinatory Categorial Grammar (CCG, Steedman 1985, 1990a) as the model for spoken language production. This model licenses congruent syntactic, prosodic and information structural constituents and consequently represents a simplification over models of prosody developed in syntactically more traditional frameworks.

The previous mention heuristic, which has been widely used as a model for determining intonation contours, is shown to be inadequate for handling a broad range of examples involving semantic contrasts, which require pitch accents to be allocated based on their ability to discriminate among available entities in the discourse model. To address this problem, I introduce a model that determines accentual patterns based on sets of alternative entities in the knowledge base. The algorithms for building the information structural representations that encode the semantics of intonation supply the foundation for two computational implementations. These implementations demonstrate how the theoretical model applies to the problem of producing contextually appropriate spoken output in a natural language generation framework and provide a platform for incrementally testing and refining the underlying theory.
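
The difference between the previous mention heuristic and accent assignment over sets of alternatives can be illustrated with a toy example. The sketch below is not the dissertation's algorithm: the function names, the attribute-value representation of entities, and the example data are all assumptions made for illustration. It only shows why a property that has already been mentioned may still need an accent when it distinguishes the referent from an alternative.

```python
from typing import Dict, List, Set

# Hypothetical toy representation: an entity is a mapping from attribute to value.
Entity = Dict[str, str]


def previous_mention_accents(referent: Entity, mentioned: Set[str]) -> Set[str]:
    """Previous mention heuristic: accent only values not yet mentioned."""
    return {attr for attr, value in referent.items() if value not in mentioned}


def contrastive_accents(
    referent: Entity, alternatives: List[Entity], order: List[str]
) -> Set[str]:
    """Accent each attribute whose value rules out at least one alternative
    that earlier attributes have not already ruled out (illustrative only)."""
    accented: Set[str] = set()
    remaining = list(alternatives)
    for attr in order:
        # Alternatives that differ from the referent on this attribute.
        ruled_out = [alt for alt in remaining if alt.get(attr) != referent.get(attr)]
        if ruled_out:
            accented.add(attr)
            # Keep only the alternatives this attribute fails to exclude.
            remaining = [alt for alt in remaining if alt.get(attr) == referent.get(attr)]
    return accented


# Assumed discourse: a British and an American amplifier have both been mentioned.
referent = {"origin": "British", "kind": "amplifier"}
alternatives = [{"origin": "American", "kind": "amplifier"}]
mentioned = {"British", "American", "amplifier"}

print(previous_mention_accents(referent, mentioned))                 # set()
print(contrastive_accents(referent, alternatives, ["origin", "kind"]))  # {'origin'}
```

In this assumed discourse the previous mention heuristic leaves the whole phrase unaccented because every word is given, whereas the alternative-set approach accents the origin, the one property that separates the referent from the competing entity.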
Index Terms
- A semantics of contrast and information structure for specifying intonation in spoken language generation