EP1271469A1 - Method for generating personality patterns and for synthesizing speech - Google Patents
Method for generating personality patterns and for synthesizing speech
- Publication number
- EP1271469A1 (application EP01115216A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech
- features
- anyone
- acoustical
- synthesizing
- Prior art date
- Legal status
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
Abstract
To mimic the speaking behavior of a given speaker, a method for generating personality patterns, in particular for synthesizing speech, is proposed in which acoustical as well as non-acoustical speech features (SF) are extracted from a given speech input (SI).
Description
The present invention relates to a method for generating personality patterns
and to a method for synthesizing speech.
Nowadays, a large variety of equipment and appliances employ man-machine dialogue systems to ensure easy and reliable use by a human user. These man-machine dialogue systems are able to receive and consider users' utterances, in particular orders and/or inquiries, and to react and respond in an appropriate way. Nevertheless, current speech synthesis systems involved in such man-machine dialogue systems suffer from a lack of personality and naturalness. Although the systems are able to deal with the context of the situation in an appropriate way, the prepared and output speech of the dialogue system often sounds monotonic, machine-like, and not embedded in the particular situation.
It is an object of the present invention to provide a method for generating personality patterns, in particular for synthesizing speech, and a method for synthesizing speech in which naturalness of the speech and of its features can be achieved.
The object is achieved by a method for generating personality patterns, in particular for synthesizing speech, with the features of claim 1. Furthermore, the object is achieved by a method for synthesizing speech according to the characterizing features of claim 11. A system and a computer program product for carrying out the inventive methods are the subject-matter of claims 14 and 15, respectively. Preferred embodiments of the inventive methods are within the scope of the dependent subclaims.
In the inventive method for generating personality patterns, in particular for
synthesizing speech, a speech input is received and/or preprocessed. From the
speech input acoustical and/or non-acoustical speech features are extracted.
Based on the extracted speech features and/or on models and/or parameters
thereof, a personality pattern is generated and/or stored.
It is therefore a basic idea of the present invention to extract acoustical and, alternatively or simultaneously, non-acoustical speech features from a received speech input. The speech features are then directly or indirectly used to construct a personality pattern which can later on be used to reconstruct a speech output that mimics the speech input and its speaker. The speech features are therefore parameterized or modeled and included or described in certain models or units.
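As a concrete illustration of how such a personality pattern might be held in software, the following sketch defines a minimal container for the extracted acoustical and non-acoustical features; the class and field names are illustrative assumptions, not terms defined by the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PersonalityPattern:
    """Minimal container for a speaker's personality pattern (illustrative only).

    Acoustical features (prosody, voice quality) are kept as simple
    name -> value maps; non-acoustical features hold word/phrase usage
    statistics derived from recognized text.
    """
    speaker_id: str
    prosody: Dict[str, float] = field(default_factory=dict)        # e.g. mean pitch, pitch range, speaking rate
    voice_quality: Dict[str, float] = field(default_factory=dict)  # e.g. spectral timbre proxies
    word_usage: Dict[str, int] = field(default_factory=dict)       # word -> count from recognized utterances
    preferred_phrases: List[str] = field(default_factory=list)     # most frequent phrases

    def merge(self, other: "PersonalityPattern") -> None:
        """Fold another partial pattern (e.g. from a new utterance) into this one."""
        self.prosody.update(other.prosody)
        self.voice_quality.update(other.voice_quality)
        for word, count in other.word_usage.items():
            self.word_usage[word] = self.word_usage.get(word, 0) + count
```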
According to an embodiment of the inventive method for generating personality patterns, online input speech and/or speech of a speech data base for at least one given speaker are used for receiving said speech input. Using a speech data base enables a system involving the inventive method to generate the personality patterns in advance of an application. That means that, before the system is applied, for example in a speech synthesizing unit, a speech model for a single speaker or for a variety of speakers can be constructed. Within the application of the inventive method it is also possible to construct the personality patterns during the application in a speech synthesizing unit in a real-time or online manner, so as to adapt a speech output generated in a dialogue system during the application and/or during the dialogue with the user.
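One way to realize the online variant is to update the stored pattern incrementally after every utterance, for example with an exponential moving average so that recent speech gradually dominates. The update rule below is a minimal sketch under that assumption; the adaptation rate and the dictionary-based representation are illustrative choices, not prescribed by the method.

```python
def update_online(pattern: dict, new_features: dict, alpha: float = 0.1) -> dict:
    """Exponentially weighted update of numeric personality features.

    pattern      -- previously stored feature values (name -> float)
    new_features -- features extracted from the latest utterance
    alpha        -- adaptation rate; higher values track the speaker faster
    """
    for name, value in new_features.items():
        if name in pattern:
            pattern[name] = (1.0 - alpha) * pattern[name] + alpha * value
        else:
            pattern[name] = value  # first observation of this feature
    return pattern
```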
It is an aspect of the present invention to use a large variety of features from the speech input so as to model the personality patterns as well as possible and to achieve, in an application of a dialogue system, a particularly natural responsive speech output.
It is therefore an aspect of a further embodiment of the present invention to use prosodic features, voice quality features, global statistical and/or spectral properties, and/or the like as acoustical features.
Within the class of prosodic features, pitch, pitch range, intonation attitude, loudness, speaking rate, phone duration, speech element duration features, and/or the like can be employed.
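A rough sketch of how some of these prosodic quantities could be estimated from a mono waveform is given below, assuming the librosa library is available; the sampling rate, pitch bounds, and the crude word-per-second speaking-rate proxy are illustrative choices, not values taken from the patent.

```python
import numpy as np
import librosa

def extract_prosodic_features(wav_path: str, transcript: str) -> dict:
    """Estimate simple prosodic features: pitch statistics, loudness, speaking rate."""
    y, sr = librosa.load(wav_path, sr=16000, mono=True)

    # Fundamental frequency track (NaN for unvoiced frames).
    f0, voiced_flag, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                      fmax=librosa.note_to_hz("C6"), sr=sr)
    voiced_f0 = f0[~np.isnan(f0)]

    # Frame-wise RMS energy as a loudness proxy.
    rms = librosa.feature.rms(y=y)[0]

    duration_s = len(y) / sr
    words = transcript.split()

    return {
        "pitch_mean_hz": float(np.mean(voiced_f0)) if voiced_f0.size else 0.0,
        "pitch_range_hz": float(np.ptp(voiced_f0)) if voiced_f0.size else 0.0,
        "loudness_rms": float(np.mean(rms)),
        "speaking_rate_wps": len(words) / duration_s if duration_s > 0 else 0.0,
    }
```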
Within the class of voice quality features, phonation type, articulation manner,
voice timbre features, and/or the like can be employed.
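Phonation type and articulation manner are hard to measure directly; a common work-around is to fall back on coarse spectral statistics as timbre proxies. The sketch below, again assuming librosa, is such a stand-in and does not claim to implement the voice quality analysis the patent has in mind.

```python
import numpy as np
import librosa

def extract_voice_quality_proxies(wav_path: str) -> dict:
    """Crude spectral stand-ins for voice quality / timbre features."""
    y, sr = librosa.load(wav_path, sr=16000, mono=True)

    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]    # brightness
    bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)[0]  # spectral spread
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)             # overall timbre envelope

    return {
        "spectral_centroid_mean": float(np.mean(centroid)),
        "spectral_bandwidth_mean": float(np.mean(bandwidth)),
        **{f"mfcc_{i}_mean": float(np.mean(mfcc[i])) for i in range(mfcc.shape[0])},
    }
```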
In the class of non-acoustical features, contextual features and/or the like may be important in accordance with a further advantageous embodiment of the present invention. In particular, syntactical, grammatical, semantical features, and/or the like can be used as contextual features.
As a human speaker has distinct preferences in constructing sentences, phrases, word combinations, and/or the like, according to a further preferred embodiment of the present invention, statistical features on the usage, distribution, and/or probability of speech elements - such as words, subword units, syllables, phonemes, phones, and/or the like - and/or combinations of them within said speech input can be used within the class of non-acoustical features. Additionally, sentence, phrase, and word combination preferences can be evaluated and included in said personality pattern.
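Such usage statistics can be accumulated directly from the recognizer output; a minimal sketch using plain word and bigram counts is shown below, with the cut-off for "preferred" items being an arbitrary illustrative choice.

```python
from collections import Counter
from typing import List

def word_usage_statistics(transcripts: List[str], top_n: int = 20) -> dict:
    """Count word and bigram usage over the recognized utterances of one speaker."""
    words = Counter()
    bigrams = Counter()

    for utterance in transcripts:
        tokens = utterance.lower().split()
        words.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs

    return {
        "frequent_words": words.most_common(top_n),
        "frequent_bigrams": bigrams.most_common(top_n),
        "vocabulary_size": len(words),
    }
```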
To prepare for the extraction of contextual features or the like, a process of speech recognition is preferably carried out within the inventive method. Alternatively or additionally, a process of speaker identification and/or adaptation can be performed, in particular so as to increase the matching rate of the feature extraction and/or the recognition rate of the process of speech recognition.
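Speaker identification can be realized in many ways; a deliberately simple stand-in is to compare the mean MFCC vector of an utterance against stored reference vectors with cosine similarity, as sketched below. The embedding choice and the decision threshold are illustrative assumptions, not the identification scheme of the patent.

```python
import numpy as np
import librosa

def utterance_embedding(wav_path: str) -> np.ndarray:
    """Mean MFCC vector as a very simple speaker representation."""
    y, sr = librosa.load(wav_path, sr=16000, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

def identify_speaker(wav_path: str, references: dict, threshold: float = 0.85):
    """Return the best-matching known speaker, or None if no reference is close enough."""
    emb = utterance_embedding(wav_path)
    best_name, best_score = None, -1.0
    for name, ref in references.items():
        score = float(np.dot(emb, ref) / (np.linalg.norm(emb) * np.linalg.norm(ref)))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```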
In the inventive method for synthesizing speech, in particular for a man-machine
dialogue system, the inventive method for generating personality patterns
is employed.
According to a further embodiment of the inventive method for synthesizing
speech, the method for generating personality patterns is essentially carried
out in a preprocessing step, in particular based on a speech data base or the
like.
Alternatively or additionally, the method for generating personality patterns can be carried out and/or continued in a continuous, real-time, or online manner. This enables a system involving said method for synthesizing speech to adapt its speech output in accordance with the received input during the dialogue.
Both the method for generating personality patterns and the method for synthesizing speech can be configured to create a personality pattern or a speech output which is in some sense complementary to the personality pattern or character assigned to the speaker of the speech input. That means, for instance, that in the case of an emergency call system for activating ambulance or fire alarm services the speaker of the speech input might be excited and/or confused. It might therefore be necessary to calm down the speaking person, and this can be achieved by creating a personality pattern for the speech synthesis reflecting a strong, confident, and safe character. Additionally, it might also be possible to construct personality patterns for the synthesized speech output which reflect a gender complementary to the gender of the speaker of the speech input, i.e. in the case of a male speaker, the system might respond as a female speaker so as to make the dialogue as convenient as possible for the speaking person.
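A complementary personality could be chosen by a simple rule table that maps the characteristics detected in the caller's speech to an opposing synthesis profile; the sketch below illustrates such a rule for the emergency-call example, with all profile keys and values being invented for illustration.

```python
def complementary_profile(caller: dict) -> dict:
    """Pick synthesis settings that counterbalance the caller's detected state.

    caller -- e.g. {"arousal": "excited", "gender": "male", "speaking_rate_wps": 3.4}
    """
    profile = {"voice_gender": "female" if caller.get("gender") == "male" else "male"}

    if caller.get("arousal") == "excited":
        # Respond with a calm, steady persona: slower rate, narrower pitch range.
        profile.update({"speaking_rate_scale": 0.8, "pitch_range_scale": 0.6, "style": "calm"})
    else:
        profile.update({"speaking_rate_scale": 1.0, "pitch_range_scale": 1.0, "style": "neutral"})

    return profile
```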
It is a further aspect of the present invention to provide a system, an apparatus, a device, and/or the like for generating personality patterns and/or for synthesizing speech which is in each case capable of performing and/or realizing the inventive methods for generating personality patterns and/or for synthesizing speech and/or their steps.
According to a further aspect of the present invention, a computer program
product is provided, comprising computer program means which is adapted to
perform and/or to realize the inventive method for generating personality patterns
and/or for synthesizing speech and/or the steps thereof when it is executed
on a computer, a digital signal processing means, and/or the like.
The aspects of the present invention will become clearer by taking into account the following remarks:
After the identification of a speaker, both his relevant voice quality features and his speech itself - as described by any units, such as words, syllables, diphones, sentences, and/or the like - are automatically extracted according to the invention. Also, information about preferred sentence structure and word usage is extracted and used to create a speech synthesis system with those characteristics in a completely unsupervised way.
The starting point for these inventive concepts is the lack of personality of current speech synthesis systems. Prior art systems are developed with text-to-speech (TTS) operation in mind, where intelligibility and naturalness of speech are most important. For dialogue systems, however, the personality of the dialogue partner is essential, too. Depending on the personality of the artificial dialogue partner, the speaker may or may not be interested in continuing the dialogue. Thus, adding a personality pattern to the speech generated by the device may be crucial for the success of the dialogue device.
Therefore, it is proposed to collect and store all information about the speaking style of the person making conversation with the system or device and to use said information to modify the speaking style of the device.
The proposed methods can be used not only to mimic the actual speaker talking to the device but also to equip the device with different personalities, e.g. gathered from the speaking style of famous people, movie stars, or the like. This can be very attractive for potential customers. The proposed system can be used not only to mimic the speaker's behavior but, more generally, to control the dialogue depending on the changing speaking style and emotions of the human partner.
The collection of features describing the speaker's personality can be done on different levels during the conversation of the human with the dialogue unit. In order to mimic the speaker's voice, the speech signal has to be recorded and segmented into phones, diphones, and/or other speech units or speech elements, depending on the speech synthesis method used in the system.
Prosodic features like pitch, pitch range, attitude of sentence intonation (monotonous or affected), loudness, speaking rate, durations of phones, and/or the like can be collected to characterize the speaker's prosody.
Voice quality features like phonation type, articulation manner, voice timbre,
and/or the like can be automatically extracted from the collected speech data.
Speaker identification or a speaker identification module is necessary for the proper functioning of the system.
The system can also collect all the words recognized from the utterances spoken by the speaker and generate and evaluate statistics on their usage. This can be used to find the most frequent phrases and words used by a given speaker, and/or the like. Also, syntactic information gathered from the recognized phrases can enhance the quality of the personality description.
After all necessary information has been collected, the dialogue system can adjust parameters and units of acoustic output - for example the synthesized waveforms or the like - and modes of text generation to suit the recognized speaker's characteristics.
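How the collected pattern steers the acoustic output naturally depends on the synthesizer in use. The sketch below assumes a generic parametric TTS engine that accepts pitch, rate, and volume controls and simply drives those controls from the stored personality pattern; the engine class, its method signature, and the default values are hypothetical.

```python
class ParametricTTS:
    """Placeholder for whatever synthesis engine the dialogue device uses."""
    def synthesize(self, text: str, pitch_hz: float, rate_wps: float, volume: float) -> bytes:
        raise NotImplementedError  # provided by the actual TTS backend

def synthesize_with_personality(tts: ParametricTTS, text: str, pattern: dict) -> bytes:
    """Render a response so that its prosody follows the stored personality pattern."""
    return tts.synthesize(
        text,
        pitch_hz=pattern.get("pitch_mean_hz", 120.0),   # fallback values are arbitrary
        rate_wps=pattern.get("speaking_rate_wps", 2.5),
        volume=pattern.get("loudness_rms", 0.1),
    )
```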
The parameterized personality can be stored for future use or can be preprogrammed in the dialogue device. The information can be used to recognize speakers and to change the personality of the system depending on the user's preference or mood, for example in the case of a system with a built-in emotion recognition engine.
The personality can be changed according to the user's wish, according to a preprogrammed sequence, or depending on the changing style and emotions of the speaker.
The main advantage of such a system is the possibility of adapting the dialogue to the given speaker, making the dialogue more attractive, and/or the like. The possibility to mimic certain speakers or to switch between different personalities or speaking styles can be very entertaining and attractive for the user.
In the following, further advantages and aspects of the present invention will be described with reference to the accompanying figure.
- Fig. 1
- is a schematic block diagram describing a preferred embodiment of a method for synthesizing speech employing an embodiment of the inventive method for generating personality patterns.
The schematic block diagram of Fig. 1 shows a preferred embodiment of the inventive method for synthesizing speech employing an embodiment of the inventive method for generating personality patterns from a given received speech input SI.
In step S1, speech input SI is received. In a first section S10 of the inventive method for synthesizing speech, non-acoustical features are extracted from the received speech input SI. In a second section S20 of the inventive method for synthesizing speech, acoustical features are extracted from the received speech input SI. The sections S10 and S20 can be performed in parallel or sequentially on a given device or apparatus.
In the first section S10 for extracting non-acoustical features from the speech input SI, in a first step S11, speech parameters are extracted from said speech input SI. In a second step S12, the speech input SI is fed into a speech recognizer to analyze the content and the context of the received speech input SI. Based on the recognition result, in a following step S13, contextual features are extracted from said speech input SI; in particular, syntactical, semantical, grammatical, and statistical information on particular speech elements is obtained.
In the embodiment of Fig. 1, the second section S20 of the inventive method for synthesizing speech consists of three steps S21, S22, and S23 which can be performed independently of each other.
In the first step S21 of the second section S20 for extracting acoustical features, prosodic features are extracted from the received speech input SI. Said prosodic features may comprise features of pitch, pitch range, intonation attitude, loudness, speaking rate, speech element duration, and/or the like.
In a second step S22, voice quality features are extracted from the given received
speech input SI, for instance phonation type, articulation manner, voice
timbre features, and/or the like.
Finally, in a third and final step S23 of the second section S20, statistical/spectral
features are extracted from the given speech input SI.
The non-acoustical features and the acoustical features obtained from sections
S10 and S20 are merged in a following postprocessing step S30 to detect,
model, and store a personality pattern PP for the given speaker.
The data describing the personality pattern PP for the current speaker are fed into a following step S40, which includes the steps of speech synthesis, text generation, and dialogue management, from which a responsive speech output SO is generated and then output in a final step S50.
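Read as software, the steps S1 to S50 of Fig. 1 form a straightforward pipeline. The skeleton below mirrors that structure; each callable stands for one block of the figure and is supplied by the caller, so the helper roles are illustrative rather than defined by the patent.

```python
from typing import Callable

def dialogue_turn(
    speech_input: bytes,
    recognize: Callable[[bytes], str],
    extract_acoustical: Callable[[bytes], dict],
    extract_contextual: Callable[[str], dict],
    build_pattern: Callable[[dict, dict], dict],
    respond: Callable[[dict, str], bytes],
) -> bytes:
    """One pass through the pipeline of Fig. 1 (S1 -> S10/S20 -> S30 -> S40/S50)."""
    transcript = recognize(speech_input)              # S12 within section S10
    contextual = extract_contextual(transcript)       # S11/S13: non-acoustical features
    acoustical = extract_acoustical(speech_input)     # S20: S21 prosody, S22 voice quality, S23 spectral
    pattern = build_pattern(acoustical, contextual)   # S30: merge into personality pattern PP
    return respond(pattern, transcript)               # S40/S50: generate and output speech SO
```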
Claims (15)
- Method for generating personality patterns, in particular for synthesizing speech, wherein: speech input (SI) is received and/or preprocessed; acoustical and/or non-acoustical speech features (SF) are extracted from said speech input (SI); and, based on the extracted speech features (SF) or on models or parameters thereof, a personality pattern (PP) is generated and/or stored.
- Method according to claim 1, wherein online input speech and/or speech of a speech data base for at least one given speaker are used for receiving said speech input (SI).
- Method according to any one of the preceding claims, wherein prosodic features, voice quality features, global statistical and/or spectral properties, and/or the like are used as acoustical features.
- Method according to claim 3, wherein pitch, pitch range, intonation attitude, loudness, speaking rate, phone duration, speech element duration features, and/or the like are used as prosodic features.
- Method according to any one of claims 3 or 4, wherein phonation type, articulation manner, voice timbre features, and/or the like are used as voice quality features.
- Method according to any one of the preceding claims, wherein contextual features and/or the like are used as said non-acoustical features.
- Method according to claim 6, wherein syntactical, grammatical, semantical features, and/or the like are used as contextual features.
- Method according to any one of claims 6 or 7, wherein statistical features on the usage, distribution, and/or probability of speech elements - such as words, subword units, syllables, phonemes, phones, and/or the like - and/or combinations of them within said speech input (SI) are used as non-acoustical features.
- Method according to any one of the preceding claims, wherein a process of speech recognition is carried out, in particular to prepare the extraction of contextual features and/or the like.
- Method according to any one of the preceding claims, wherein a process of speaker identification and/or adaptation is performed, in particular so as to increase the matching rate of the feature extraction and/or the recognition rate of the process of speech recognition.
- Method for synthesizing speech, in particular for a man-machine dialogue system, wherein the method for generating personality patterns according to any one of claims 1 to 10 is employed.
- Method according to claim 11, wherein the method for generating personality patterns is essentially carried out in a preprocessing step, in particular based on a speech data base or the like.
- Method according to any one of claims 11 or 12, wherein the method for generating personality patterns is carried out and/or continued in a continuous, real-time, or online manner.
- System for generating personality patterns and/or for synthesizing speech which is capable of performing and/or realizing the method for generating personality patterns according to any one of claims 1 to 10 and/or the method for synthesizing speech according to any one of claims 11 to 13 and/or the steps thereof.
- Computer program product, comprising computer program means adapted to perform and/or to realize the method for generating personality patterns according to any one of claims 1 to 10 and/or the method for synthesizing speech according to any one of claims 11 to 13 and/or the steps thereof when it is executed on a computer, a digital signal processing means, and/or the like.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP01115216A EP1271469A1 (en) | 2001-06-22 | 2001-06-22 | Method for generating personality patterns and for synthesizing speech |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1271469A1 (en) | 2003-01-02 |
Family
ID=8177799
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP01115216A Withdrawn EP1271469A1 (en) | 2001-06-22 | 2001-06-22 | Method for generating personality patterns and for synthesizing speech |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP1271469A1 (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5278943A (en) * | 1990-03-23 | 1994-01-11 | Bright Star Technology, Inc. | Speech animation and inflection system |
WO1999012324A1 (en) * | 1997-09-02 | 1999-03-11 | Jack Hollins | Natural language colloquy system simulating known personality activated by telephone card |
US6144938A (en) * | 1998-05-01 | 2000-11-07 | Sun Microsystems, Inc. | Voice user interface with personality |
Non-Patent Citations (2)
Title |
---|
JANET E. CAHN: "The Generation of Affect in Synthesized Speech", JOURNAL OF THE AMERICAN VOICE I/O SOCIETY, vol. 8, July 1990 (1990-07-01), pages 1 - 19, XP002183399, Retrieved from the Internet <URL:http://www.media.mit.edu/~cahn/masters-thesis.htm> [retrieved on 20011120] * |
KLASMEYER ET AL: "The perceptual importance of selected voice quality parameters", ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 1997. ICASSP-97., 1997 IEEE INTERNATIONAL CONFERENCE ON MUNICH, GERMANY 21-24 APRIL 1997, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 21 April 1997 (1997-04-21), pages 1615 - 1618, XP010226301, ISBN: 0-8186-7919-0 * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7873390B2 (en) | 2002-12-09 | 2011-01-18 | Voice Signal Technologies, Inc. | Provider-activated software for mobile communication devices |
WO2004068466A1 (en) * | 2003-01-24 | 2004-08-12 | Voice Signal Technologies, Inc. | Prosodic mimic synthesis method and apparatus |
US8768701B2 (en) | 2003-01-24 | 2014-07-01 | Nuance Communications, Inc. | Prosodic mimic method and apparatus |
CN1742321B (en) * | 2003-01-24 | 2010-08-18 | 语音信号科技公司 | Prosodic mimic method and apparatus |
WO2005081508A1 (en) * | 2004-02-17 | 2005-09-01 | Voice Signal Technologies, Inc. | Methods and apparatus for replaceable customization of multimodal embedded interfaces |
US8285549B2 (en) | 2007-05-24 | 2012-10-09 | Microsoft Corporation | Personality-based device |
US8131549B2 (en) * | 2007-05-24 | 2012-03-06 | Microsoft Corporation | Personality-based device |
AU2008256989B2 (en) * | 2007-05-24 | 2012-07-19 | Microsoft Technology Licensing, Llc | Personality-based device |
EP2147429A4 (en) * | 2007-05-24 | 2011-10-19 | Microsoft Corp | Personality-based device |
EP2147429A1 (en) * | 2007-05-24 | 2010-01-27 | Microsoft Corporation | Personality-based device |
WO2014024399A1 (en) * | 2012-08-10 | 2014-02-13 | Casio Computer Co., Ltd. | Content reproduction control device, content reproduction control method and program |
US9363378B1 (en) | 2014-03-19 | 2016-06-07 | Noble Systems Corporation | Processing stored voice messages to identify non-semantic message characteristics |
US9865281B2 (en) | 2015-09-02 | 2018-01-09 | International Business Machines Corporation | Conversational analytics |
US9922666B2 (en) | 2015-09-02 | 2018-03-20 | International Business Machines Corporation | Conversational analytics |
US11074928B2 (en) | 2015-09-02 | 2021-07-27 | International Business Machines Corporation | Conversational analytics |
CN110751940A (en) * | 2019-09-16 | 2020-02-04 | 百度在线网络技术(北京)有限公司 | Method, device, equipment and computer storage medium for generating voice packet |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7355306B2 (en) | Text-to-speech synthesis method, device, and computer-readable storage medium using machine learning | |
KR100811568B1 (en) | Method and apparatus for preventing speech comprehension by interactive voice response systems | |
Shichiri et al. | Eigenvoices for HMM-based speech synthesis. | |
US20200251104A1 (en) | Content output management based on speech quality | |
JP4884212B2 (en) | Speech synthesizer | |
JPH10507536A (en) | Language recognition | |
JP5507260B2 (en) | System and technique for creating spoken voice prompts | |
US7454348B1 (en) | System and method for blending synthetic voices | |
WO2007148493A1 (en) | Emotion recognizer | |
CA2167200A1 (en) | Multi-language speech recognition system | |
WO2002080140A1 (en) | Text selection and recording by feedback and adaptation for development of personalized text-to-speech systems | |
EP1280137B1 (en) | Method for speaker identification | |
WO2006106182A1 (en) | Improving memory usage in text-to-speech system | |
JP2006517037A (en) | Prosodic simulated word synthesis method and apparatus | |
US20230146945A1 (en) | Method of forming augmented corpus related to articulation disorder, corpus augmenting system, speech recognition platform, and assisting device | |
JP2011186143A (en) | Speech synthesizer, speech synthesis method for learning user's behavior, and program | |
EP1271469A1 (en) | Method for generating personality patterns and for synthesizing speech | |
Levinson et al. | Speech synthesis in telecommunications | |
O'Shaughnessy | Modern methods of speech synthesis | |
Pols | Flexible, robust, and efficient human speech processing versus present-day speech technology | |
US20230148275A1 (en) | Speech synthesis device and speech synthesis method | |
Westall et al. | Speech technology for telecommunications | |
JP3706112B2 (en) | Speech synthesizer and computer program | |
Carlson | Synthesis: Modeling variability and constraints | |
Nthite et al. | End-to-End Text-To-Speech synthesis for under resourced South African languages |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
 | AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
 | AX | Request for extension of the European patent | Free format text: AL; LT; LV; MK; RO; SI |
 | AKX | Designation fees paid | |
 | REG | Reference to a national code | Ref country code: DE; Ref legal event code: 8566 |
 | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
 | 18D | Application deemed to be withdrawn | Effective date: 20030703 |