Hon et al., 1998 - Google Patents

Automatic generation of synthesis units for trainable text-to-speech systems

Hon et al., 1998

Document ID: 7488096140853669849
Author: Hon H; Acero A; Huang X; Liu J; Plumpe M
Publication year: 1998
Publication venue: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP'98 (Cat. No. 98CH36181)

External Links

Cited by

Snippet

The Whistler text-to-speech engine was designed so that we can automatically construct the model parameters from training data. This paper describes in detail the design issues of constructing the synthesis unit inventory automatically from speech databases. The …

Continue reading at citeseerx.ist.psu.edu (PDF) (other versions)

230000002194 synthesizing 0 title abstract description 40

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
- G10L13/07—Concatenation rules
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G10L2021/0135—Voice conversion or morphing
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification

Similar Documents

Publication	Publication Date	Title
Hon et al.	1998	Automatic generation of synthesis units for trainable text-to-speech systems
Huang et al.	1997	Recent improvements on Microsoft's trainable text-to-speech system-Whistler
Huang et al.	1996	Whistler: A trainable text-to-speech system
US7567896B2 (en)	2009-07-28	Corpus-based speech synthesis based on segment recombination
JP4176169B2 (en)	2008-11-05	Runtime acoustic unit selection method and apparatus for language synthesis
US6163769A (en)	2000-12-19	Text-to-speech using clustered context-dependent phoneme-based units
Donovan et al.	1999	A hidden Markov-model-based trainable speech synthesizer
US5905972A (en)	1999-05-18	Prosodic databases holding fundamental frequency templates for use in speech synthesis
Donovan	1996	Trainable speech synthesis
CA2351988C (en)	2007-07-24	Method and system for preselection of suitable units for concatenative speech
US6665641B1 (en)	2003-12-16	Speech synthesis using concatenation of speech waveforms
JP3910628B2 (en)	2007-04-25	Speech synthesis apparatus, speech synthesis method and program
Plumpe et al.	1998	HMM-based smoothing for concatenative speech synthesis.
Hon	1992	Vocabulary-independent speech recognition: The VOCIND system
Marino et al.	2000	The demiphone: an efficient contextual subword unit for continuous speech recognition
Mullah	2015	A comparative study of different text-to-speech synthesis techniques
Yin	2018	An overview of speech synthesis technology
Nock	2001	Techniques for modelling phonological processes in automatic speech recognition
Bunnell	2022	Speech synthesis: Toward a “Voice” for all
Ng	1998	Survey of data-driven approaches to Speech Synthesis
i Barrobes	2006	Voice Conversion applied to Text-to-Speech systems
Jouvet et al.	1999	Hypothesis dependent threshold setting for improved out-of-vocabulary data rejection
Huan et al.	0	RECENT IMPROVEMENTS ON MICHAEL’S TRAINABLE SAMPLE PAPER SYSTEM-WHISTLE
Wiener	2003	Database Driven Speech Synthesis Systems
EP1589524A1 (en)	2005-10-26	Method and device for speech synthesis