KR101089184B1 - Method and system for providing a speech and expression of emotion in 3D charactor - Google Patents
- Publication number
- KR101089184B1 KR20100000837A
- Authority
- KR
- South Korea
- Prior art keywords
- expression
- speech
- lip
- lip shape
- character
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Processing Or Creating Images (AREA)
Abstract
The present invention relates to a system and method for providing the speech and emotion expression of a character, in which a three-dimensional character appearing in 3D animation, a 3D virtual space, advertisement content delivery, and the like executes an utterance operation that expresses the content of speech simultaneously with an emotion expression operation such as crying or laughing, so that story delivery, advertisement delivery, and content delivery can be made clearly.
A system for providing speech and emotion expression of a character according to the present invention includes: a situation recognition unit for recognizing the surrounding situation; a speech sentence selection unit for selecting a speech sentence according to the recognized surrounding situation; an utterance image selection unit for selecting the lip shapes required to express the selected speech sentence; an expression selection unit for selecting a facial expression corresponding to the emotion expression according to the recognized surrounding situation; a sound source generation unit for generating a sound source corresponding to the selected speech sentence; a syntax analysis unit for extracting the consonant and vowel information necessary for generating the lip shapes from the speech sentence and generating time information indicating when the consonants and vowels that change the lip shape are pronounced; a controller configured to control the facial expression, the lip shape, and the sound source to be synchronized; and an emotion expression unit for expressing the synchronized facial expression, lip shape, and sound source.
According to the present invention, it is possible to provide a 2D or 3D character capable of displaying a facial expression and speech content simultaneously. Accordingly, various emotion expressions can be provided through the character's facial expressions and utterances.
Description
The present invention relates to a system and method for simultaneously providing the utterance motion and the emotion expression motion of a three-dimensional character. More specifically, a three-dimensional character appearing in 3D animation, a 3D virtual space, advertisement content delivery, and the like performs an utterance operation that expresses the content of speech while performing an emotion expression operation such as crying or laughing, so that stories, advertisements, and content can be communicated clearly through the 3D character.
The main direction of conventional facial animation research has been to find efficient ways to handle emotions and lip movements. Many studies on facial expression behavior have been conducted at home and abroad, yet it is still hard to say that the characters appearing in 3D games and animations produce natural facial expressions. Nevertheless, face modeling and animation have advanced dramatically in recent years.
Computer graphics technology for 3D animation production is growing and developing globally, and research is under way on expanding and improving the expressive range, improving performance to shorten production time and reduce production cost, and improving interfaces for user convenience.
In addition, voice recognition and speaker authentication technology has been steadily developed around the world and shows very satisfactory performance in limited environments. To improve the performance of a voice recognition or speaker authentication system, it is essential to extract clear phoneme boundaries from continuous speech. The most important consideration for the natural facial expression of animated characters is the synchronization of the voice signal and the lip movement.
When producing animation, a voice actor first records the dialogue, and the character animation is created accordingly. Conventional text-based mouth-shape synchronization and facial-expression animation methods are therefore difficult to use at actual production sites, and techniques that generate animation by extracting phonemes directly from the recorded voice data have been studied.
However, although much research has been done on facial expressions and on the movements of facial parts themselves, including in medicine and art, the three-dimensional face models actually in use are mainly animated frame by frame by an animator, by hand or with three-dimensional software, so even when animation was produced, its quality was low relative to the working time.
In addition, when emotion expression and utterance motion were applied to a 3D character, they proceeded as separate, sequential operations: for example, the character first made a smiling expression with its lips and then moved its lips to speak, or spoke first and then performed a crying motion. Therefore, a technology is required that allows the utterance operation to be performed simultaneously with an emotion expressing operation such as crying or laughing, in order to improve the content delivery or story delivery of the 3D character's motion.
An object of the present invention for solving the above-described problems is to provide a system and method for providing speech and emotion expression of a character in which a three-dimensional character appearing in 3D animation, a 3D virtual space, advertisement content delivery, and the like performs an utterance operation that expresses the delivered content in words while performing an emotion expression operation such as crying or laughing, so that story delivery, advertisement delivery, and content delivery can be performed clearly through the three-dimensional character.
According to an aspect of the present invention, a system for providing speech and emotion expression of a character includes: a situation recognition unit for recognizing a surrounding situation; a speech sentence selection unit for selecting a speech sentence according to the recognized surrounding situation; an utterance image selection unit for selecting the lip shapes required to express the selected speech sentence; an expression selection unit for selecting a facial expression corresponding to the emotion expression according to the recognized surrounding situation; a sound source generation unit for generating a sound source corresponding to the selected speech sentence; a syntax analysis unit for extracting the consonant and vowel information necessary for generating the lip shapes from the speech sentence and generating time information indicating when the consonants and vowels that change the lip shape are pronounced; a controller configured to control the facial expression, the lip shape, and the sound source to be synchronized; and an emotion expression unit for expressing the synchronized facial expression, lip shape, and sound source.
The system may further include: a facial expression database for storing the facial expressions as images; an utterance image database for storing the lip shapes as utterance images; an utterance sentence database for storing data corresponding to the utterance sentences; and an emotion adding unit for changing the tone of the generated sound source to add emotion information.
The emotion expression unit may include a display unit for displaying the synchronized facial expression and lip shape on a screen, and a sound source output unit for outputting the sound source synchronized with the facial expression and lip shape.
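As a rough illustration of how these units could fit together, the following Python sketch wires the pipeline end to end. It is ours, not the patent's: every function body, name, and placeholder value is an assumption, since the patent defines the units' roles rather than their algorithms.

```python
def run_pipeline(surroundings: str) -> dict:
    """Toy end-to-end flow of the units listed above; all bodies are stand-ins."""
    # situation recognition unit: classify the surroundings
    situation = "greeting" if "visitor" in surroundings else "idle"
    # speech sentence selection unit: pick a sentence for the situation
    sentence = {"greeting": "안녕하세요", "idle": ""}[situation]
    # expression selection unit: pick an emotional expression
    expression = {"greeting": "smile", "idle": "neutral"}[situation]
    # utterance image selection unit + syntax analysis unit:
    # lip keyframes paired with the times their phonemes are pronounced
    lip_track = [(shape, 0.15 * i) for i, shape in enumerate(["ㅏ", "ㅕ", "ㅏ", "ㅔ", "ㅛ"])]
    # sound source generation unit: synthesize audio for the sentence (stubbed)
    sound = ("tts", sentence)
    # control unit: hand the synchronized tracks to the emotion expression unit
    return {"expression": expression, "lips": lip_track, "sound": sound}

print(run_pipeline("a visitor approaches"))
```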
In addition, the controller analyzes the consonants and vowels of the speech sentence and controls the lip shape based on the vowel that changes the lip shape the most; for consonants pronounced with the lips closed, it controls the lips to close before the next vowel is expressed.
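Because the lip shape is driven by vowels, with closed-lip consonants forcing the mouth shut, the consonant/vowel analysis can be sketched directly on Korean text. The sketch below is our assumption of one way to do it: it uses the standard Unicode Hangul syllable decomposition and treats the bilabials ㅁ, ㅂ, ㅃ, and ㅍ as the closed-lip consonants.

```python
# Standard Hangul decomposition: syllables occupy U+AC00..U+D7A3.
CHO = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")            # 19 onset consonants
JUNG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")        # 21 vowels
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 27 codas + none

BILABIAL = {"ㅁ", "ㅂ", "ㅃ", "ㅍ"}   # consonants pronounced with the lips closed

def lip_events(sentence):
    """Yield (jamo, event) pairs: 'close' for bilabial consonants,
    'vowel' for the vowel that determines the key lip shape."""
    for ch in sentence:
        idx = ord(ch) - 0xAC00
        if not 0 <= idx <= 11171:        # skip anything that is not a Hangul syllable
            continue
        cho, jung, jong = CHO[idx // 588], JUNG[idx % 588 // 28], JONG[idx % 28]
        if cho in BILABIAL:
            yield cho, "close"           # close the lips before the vowel opens them
        yield jung, "vowel"              # the vowel drives the dominant lip keyframe
        if jong in BILABIAL:
            yield jong, "close"

print(list(lip_events("안녕하세요")))     # five vowel events, no bilabial closures
```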
For lip movement, the controller may provide connection lines ("bones"), corresponding to human bones, in the lip-shaped graphic objects of the upper and lower lips, and control the lip shape to move like joints according to the movement of the connection lines.
The controller may control, for the upper lip, a plurality of connection lines, a plurality of rotation control points on the connection lines, and a plurality of position control points at the tips of the lips, and, for the lower lip, a plurality of connection lines and a plurality of position control points.
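A data layout consistent with those counts might look as follows; the structure and every name here are illustrative assumptions, since the patent only specifies which control elements each lip has.

```python
# Hypothetical rig layout mirroring the counts in the text; all names are invented.
UPPER_LIP_RIG = {
    "bones": ["upper_l", "upper_c", "upper_r"],        # connection lines ("bones")
    "rotation_points": ["rot_l", "rot_r"],             # rotation control points on the bones
    "position_points": ["tip_l", "tip_c", "tip_r"],    # position control points at the lip tips
}
LOWER_LIP_RIG = {
    "bones": ["lower_l", "lower_c", "lower_r"],        # connection lines, plus...
    "position_points": ["tip_l", "tip_c", "tip_r"],    # ...position points (no rotation points)
}
```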
In addition, the controller controls the lip shape by moving or rotating a control point, or controls the motion of the lip shape by applying acceleration/deceleration to the object connecting two control points.
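The acceleration/deceleration between control points can be realized with any easing curve; the patent does not name one, so the minimal sketch below assumes a smoothstep curve.

```python
def ease_in_out(t: float) -> float:
    """Smoothstep curve: accelerates, then decelerates, over t in [0, 1]."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)

def move_control_point(start, end, t):
    """Interpolate a lip control point from start to end with
    acceleration/deceleration applied along the way."""
    s = ease_in_out(t)
    return tuple(a + (b - a) * s for a, b in zip(start, end))

print(move_control_point((0.0, 0.0), (1.0, 2.0), 0.5))   # -> (0.5, 1.0)
```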
The controller controls the lip shape by applying a weight to the control points of the lip shape within the facial expression according to the emotional state.
The controller may further control the facial expression, the lip shape, and the sound source to be synchronized according to a synchronization function composed of pairs of facial expression and expression time, pairs of speech sentence and speech time, and the difference between expression time and speech time.
On the other hand, in order to achieve the above object, a method of providing speech and emotion expression of a character according to the present invention includes: (a) recognizing a surrounding situation; (b) selecting a speech sentence according to the recognized surrounding situation; (c) selecting the lip shapes needed to express the selected speech sentence; (d) selecting a facial expression corresponding to the emotion expression according to the recognized surrounding situation; (e) generating a sound source corresponding to the selected speech sentence; (f) extracting the consonant and vowel information necessary for lip shape generation from the speech sentence and generating time information indicating when the consonants and vowels that change the lip shape are pronounced; and (g) expressing the facial expression, the lip shape, and the sound source in synchronization.
In step (c), the consonants and vowels of the utterance are analyzed, the lip shape is selected based on the vowel that changes the lip shape the most, and, for consonants pronounced with the lips closed, a closed lip shape is selected before the next vowel is expressed.
In step (g), the facial expression, the lip shape, and the sound source are expressed in synchronization according to a synchronization function composed of pairs of facial expression and expression time, pairs of speech sentence and speech time, and the difference between expression time and speech time.
Also, in step (c), connection lines ("bones") corresponding to human bones are provided in the graphic objects of the upper and lower lips so that they move like joints, and a lip shape formed according to the movement of the connection lines is selected.
Also, in step (c), a changed lip shape is selected by moving or rotating a control point, or a lip shape to which acceleration/deceleration is applied on the object connecting two control points is selected.
In step (c), a lip shape is selected by applying a weight to the control points of the lip shape within the facial expression according to the emotional state.
According to the present invention, it is possible to provide a 2D or 3D character capable of displaying a facial expression and speech content simultaneously.
Accordingly, various emotion expressions may be provided according to facial expressions and utterances of the character.
FIG. 1 is a block diagram schematically illustrating the functional blocks of a system for providing speech and emotion expression of a character according to an exemplary embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method of providing speech and emotion expression of a character according to an exemplary embodiment of the present invention.
FIG. 3 is a view showing an example of a lip shape provided with bones according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of synchronizing facial expression and speech information based on time information.
FIG. 5 is a view showing an example in which the facial expression and the lip shape are expressed simultaneously according to an embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings, wherein like reference numerals refer to like elements throughout.
FIG. 1 is a block diagram schematically illustrating the functional blocks of a system for providing speech and emotion expression of a character according to an exemplary embodiment of the present invention.
Referring to FIG. 1, the character speech and emotion expression providing system 100 includes a situation recognition unit 102, a speech sentence selection unit 104, an utterance image selection unit 106, an expression selection unit 108, a sound source generation unit 110, a syntax analysis unit 112, a control unit 114, an emotion expression unit 116, a facial expression DB 118, an utterance image DB 120, an utterance sentence DB 122, and an emotion adding unit 124.
The situation recognition unit 102 recognizes the surrounding situation.
The speech sentence selection unit 104 selects a speech sentence according to the recognized surrounding situation.
In addition, a user input unit may be provided so that a user may arbitrarily input emotions and speech sentences.
The utterance image selection unit 106 selects the lip shapes required to express the selected speech sentence.
The expression selection unit 108 selects a facial expression corresponding to the emotion expression according to the recognized surrounding situation.
The sound source generation unit 110 generates a sound source corresponding to the selected speech sentence.
The syntax analysis unit 112 extracts the consonant and vowel information necessary for generating the lip shapes from the speech sentence, and generates time information indicating when the consonants and vowels that change the lip shape are pronounced.
The control unit 114 controls the facial expression, the lip shape, and the sound source to be synchronized.
The emotion expression unit 116 expresses the synchronized facial expression, lip shape, and sound source, and may include a display unit for displaying the synchronized facial expression and lip shape on a screen and a sound source output unit for outputting the synchronized sound source.
The facial expression DB 118 stores the facial expressions as images.
The utterance image DB 120 stores the lip shapes as utterance images.
The utterance sentence DB 122 stores data corresponding to the utterance sentences.
The emotion adding unit 124 changes the tone of the generated sound source to add emotion information.
In addition, the control unit 114 analyzes the consonants and vowels of the speech sentence and controls the lip shape based on the vowel that changes the lip shape the most; for consonants pronounced with the lips closed, it controls the lips to close before the next vowel is expressed.
In addition, the control unit 114 provides connection lines ("bones"), corresponding to human bones, in the lip-shaped graphic objects of the upper and lower lips, and controls the lip shape to move like joints according to the movement of the connection lines.
In addition, the control unit 114 controls, for the upper lip, a plurality of connection lines, a plurality of rotation control points on the connection lines, and a plurality of position control points at the tips of the lips, and, for the lower lip, a plurality of connection lines and a plurality of position control points.
In addition, the control unit 114 controls the lip shape by moving or rotating a control point, or controls the motion of the lip shape by applying acceleration/deceleration to the object connecting two control points.
In addition, the control unit 114 controls the lip shape by applying a weight to the control points of the lip shape within the facial expression according to the emotional state.
Then, the control unit 114 controls the facial expression, the lip shape, and the sound source to be synchronized according to a synchronization function composed of Tai, Tbi, and Tci.
Here, Tai consists of facial expression i and its expression time tai, Tbi consists of speech sentence i and its speech time tbi, and Tci represents the difference between the expression time and the speech time for index i.
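In code, that synchronization function might be realized as below. The dictionary layout and the decision to keep Tci as an offset for the expression track are our assumptions; the patent defines only the three inputs.

```python
def synchronize(expressions, sentences):
    """Build a combined timeline from Tai = (expression i, time tai) and
    Tbi = (speech sentence i, time tbi); Tci = tai - tbi is the offset that
    aligns the expression track with the speech track."""
    timeline = []
    for (expression, ta), (sentence, tb) in zip(expressions, sentences):
        tc = ta - tb                                   # Tci for index i
        timeline.append({"start": tb, "sentence": sentence,
                         "expression": expression, "expression_offset": tc})
    return timeline

print(synchronize([("smile", 0.4)], [("안녕하세요", 0.0)]))
# -> [{'start': 0.0, 'sentence': '안녕하세요', 'expression': 'smile', 'expression_offset': 0.4}]
```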
FIG. 2 is a flowchart illustrating a method of providing speech and emotion expression of a character according to an exemplary embodiment of the present invention.
Referring to FIG. 2, the character speech and emotion expression providing system 100 first recognizes the surrounding situation through the situation recognition unit 102.
Here, the user may also arbitrarily input an emotion and a speech sentence through the user input unit in place of the recognized situation.
Subsequently, the character utterance and emotion expression providing system 100 selects a speech sentence according to the recognized surrounding situation.
Subsequently, the character utterance and emotion expression providing system 100 selects the lip shapes required to express the selected speech sentence.
At this time, the character utterance and emotion expression providing system 100 analyzes the consonants and vowels of the speech sentence and selects the lip shape based on the vowel that changes the lip shape the most; for consonants pronounced with the lips closed, it selects a closed lip shape before the next vowel is expressed.
In addition, the character utterance and emotion expression providing system 100 provides connection lines ("bones"), corresponding to human bones, in the lip-shaped graphic objects of the upper and lower lips, and selects a lip shape formed according to the movement of the connection lines.
In addition, the character utterance and emotion expression providing system 100 selects a changed lip shape by moving or rotating a control point, or selects a lip shape to which acceleration/deceleration is applied on the object connecting two control points.
When selecting the lip shape within the facial expression according to the emotional state, the character utterance and emotion expression providing system 100 applies a weight k to the control points of the lip shape.
Here, the value k represents a weight for determining the final lip shape.
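One simple way to realize such a weight, assuming (as the patent does not specify) a linear blend between the expression's lip control points and the speech lip control points:

```python
def final_lip_point(expression_point, speech_point, k):
    """Linear blend of a lip control point from the emotional expression with
    the point required for speech; k in [0, 1] weights the speech shape."""
    return tuple(e * (1.0 - k) + s * k
                 for e, s in zip(expression_point, speech_point))

print(final_lip_point((0.0, 1.0), (0.0, 0.0), 0.5))   # -> (0.0, 0.5)
```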
Subsequently, the character utterance and emotion expression providing system 100 selects a facial expression corresponding to the emotion expression according to the recognized surrounding situation.
Subsequently, the character speech and emotion expression providing system 100 generates a sound source corresponding to the selected speech sentence.
Subsequently, the character utterance and emotion expression providing system 100 extracts the consonant and vowel information necessary for generating the lip shapes from the speech sentence, and generates time information indicating when the consonants and vowels that change the lip shape are pronounced.
Subsequently, the character speech and emotion expression providing system 100 expresses the facial expression, the lip shape, and the sound source in synchronization according to the synchronization function based on the generated time information.
That is, the character speech and emotion expression providing system 100 displays the synchronized facial expression and lip shape on the screen through the display unit, and outputs the synchronized sound source through the sound source output unit.
Therefore, it is possible to provide users with a three-dimensional character capable of expressing a lip shape and a sound source simultaneously with facial expressions such as a smiling face.
As described above, according to the present invention, a three-dimensional character appearing in 3D animation, a 3D virtual space, advertisement content delivery, and the like executes an utterance operation that expresses the delivered content in words simultaneously with an emotion expression operation such as crying or laughing, thereby realizing a system and method for providing speech and emotion expression of a character in which story delivery, advertisement delivery, and content delivery can be made clearly through the three-dimensional character.
Since those skilled in the art to which the present invention pertains may implement the present invention in other specific forms without changing its technical spirit or essential features, the embodiments described above are illustrative in all respects and should not be construed as limiting. The scope of the present invention is defined by the following claims rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.
100: character utterance and emotion expression providing system 102: situation recognition unit
104: speech sentence selection unit 106: utterance image selection unit
108: facial expression selection unit 110: sound source generation unit
112: syntax analysis unit 114: control unit
116: emotion expression unit 118: facial expression DB
120: utterance image DB 122: utterance sentence DB
124: emotion adding unit
Claims (15)
A system for providing speech and emotion expression of a character, the system comprising: a situation recognition unit for recognizing a surrounding situation;
A speech sentence selection unit for selecting a speech sentence according to the recognized surrounding situation;
An utterance image selection unit for selecting the lip shapes required to express the selected speech sentence;
An expression selection unit for selecting a facial expression corresponding to the emotion expression according to the recognized surrounding situation;
A sound source generator for generating a sound source corresponding to the selected speech sentence;
A syntax analysis unit for extracting the consonant and vowel information necessary for generating the lip shapes from the speech sentence and for generating time information indicating when the consonants and vowels that change the lip shape are pronounced;
A controller configured to synchronize the facial expression, the lip shape, and the sound source according to a synchronization function based on the generated time information; And
An emotion expression unit for expressing the synchronized facial expression, lip shape, and sound source.
The character speech and emotion expression providing system further comprising:
An expression database for storing the facial expressions as images;
An utterance image database for storing the lip shapes as utterance images;
An utterance sentence database for storing data corresponding to the utterance sentences; And
An emotion adding unit for changing the tone of the generated sound source to add emotion information.
The character speech and emotion expression providing system wherein the emotion expression unit includes a display unit for displaying the synchronized facial expression and lip shape on a screen, and a sound source output unit for outputting the sound source synchronized with the facial expression and lip shape.
The character speech and emotion expression providing system wherein the control unit analyzes the consonants and vowels of the speech sentence, controls the lip shape based on the vowel that changes the lip shape the most, and, for consonants pronounced with the lips closed, controls the lips to close before the next vowel is expressed.
The character speech and emotion expression providing system wherein the control unit provides connection lines ("bones"), corresponding to human bones, in the lip-shaped graphic objects of the upper and lower lips, and controls the lip shape to move like joints according to the movement of the connection lines.
The character speech and emotion expression providing system wherein the control unit controls, for the upper lip, a plurality of connection lines, a plurality of rotation control points on the connection lines, and a plurality of position control points at the tips of the lips, and, for the lower lip, a plurality of connection lines and a plurality of position control points.
The character speech and emotion expression providing system wherein the controller controls the motion of the lip shape by moving or rotating a control point, or by applying acceleration/deceleration to the object connecting two control points.
The character speech and emotion expression providing system wherein the control unit controls the lip shape by applying a weight to the control points of the lip shape within the facial expression according to the emotional state.
The character speech and emotion expression providing system wherein the control unit controls the facial expression, the lip shape, and the sound source to be synchronized according to a synchronization function composed of pairs of facial expression and expression time, pairs of speech sentence and speech time, and the difference between expression time and speech time.
A method of providing speech and emotion expression of a character, the method comprising:
(a) recognizing a surrounding situation;
(b) selecting a speech sentence according to the recognized surrounding situation;
(c) selecting the lip shapes needed to express the selected speech sentence;
(d) selecting a facial expression corresponding to the emotional expression according to the recognized surrounding situation;
(e) generating a sound source corresponding to the selected speech sentence;
(f) extracting the consonant and vowel information necessary for lip shape generation from the speech sentence, and generating time information indicating when the consonants and vowels that change the lip shape are pronounced; And
(g) expressing the facial expression, the lip shape, and the sound source in synchronization according to a synchronization function based on the generated time information.
The character speech and emotion expression providing method wherein, in step (c), the consonants and vowels of the utterance are analyzed, the lip shape is selected based on the vowel that changes the lip shape the most, and, for consonants pronounced with the lips closed, a closed lip shape is selected before the next vowel is expressed.
The character speech and emotion expression providing method wherein, in step (g), the facial expression, the lip shape, and the sound source are expressed in synchronization according to a synchronization function composed of pairs of facial expression and expression time, pairs of speech sentence and speech time, and the difference between expression time and speech time.
The character speech and emotion expression providing method wherein, in step (c), connection lines ("bones") corresponding to human bones are provided in the lip-shaped graphic objects of the upper and lower lips so that they move like joints, and a lip shape formed according to the movement of the connection lines is selected.
The character speech and emotion expression providing method wherein, in step (c), a changed lip shape is selected by moving or rotating a control point, or a lip shape to which acceleration/deceleration is applied on the object connecting two control points is selected.
The character speech and emotion expression providing method wherein, in step (c), a lip shape is selected by applying a weight to the control points of the lip shape within the facial expression according to the emotional state.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR20100000837A KR101089184B1 (en) | 2010-01-06 | 2010-01-06 | Method and system for providing a speech and expression of emotion in 3D charactor |
PCT/KR2011/000071 WO2011083978A2 (en) | 2010-01-06 | 2011-01-06 | System and method for providing utterances and emotional expressions of a character |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR20100000837A KR101089184B1 (en) | 2010-01-06 | 2010-01-06 | Method and system for providing a speech and expression of emotion in 3D charactor |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20110081364A KR20110081364A (en) | 2011-07-14 |
KR101089184B1 true KR101089184B1 (en) | 2011-12-02 |
Family
ID=44305944
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR20100000837A KR101089184B1 (en) | 2010-01-06 | 2010-01-06 | Method and system for providing a speech and expression of emotion in 3D charactor |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR101089184B1 (en) |
WO (1) | WO2011083978A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102045761B1 (en) | 2019-09-26 | 2019-11-18 | 미디어젠(주) | Device for changing voice synthesis model according to character speech context |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9165404B2 (en) | 2011-07-14 | 2015-10-20 | Samsung Electronics Co., Ltd. | Method, apparatus, and system for processing virtual world |
KR101358999B1 (en) * | 2011-11-21 | 2014-02-07 | (주) 퓨처로봇 | method and system for multi language speech in charactor |
KR102522867B1 (en) * | 2017-12-18 | 2023-04-17 | 주식회사 엘지유플러스 | Method and apparatus for communication |
JP6776409B1 (en) * | 2019-06-21 | 2020-10-28 | 株式会社コロプラ | Programs, methods, and terminals |
CN112669420A (en) * | 2020-12-25 | 2021-04-16 | 江苏匠韵文化传媒有限公司 | 3D animation production method and calculation production device |
CN114928755B (en) * | 2022-05-10 | 2023-10-20 | 咪咕文化科技有限公司 | Video production method, electronic equipment and computer readable storage medium |
CN115222856B (en) * | 2022-05-20 | 2023-09-26 | 一点灵犀信息技术(广州)有限公司 | Expression animation generation method and electronic equipment |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100856786B1 (en) * | 2006-07-27 | 2008-09-05 | 주식회사 와이즌와이드 | System for multimedia naration using 3D virtual agent and method thereof |
KR20080018408A (en) * | 2006-08-24 | 2008-02-28 | 한국문화콘텐츠진흥원 | Computer-readable recording medium with facial expression program by using phonetic sound libraries |
KR100912877B1 (en) * | 2006-12-02 | 2009-08-18 | 한국전자통신연구원 | A mobile communication terminal having a function of the creating 3d avata model and the method thereof |
2010
- 2010-01-06: Application KR20100000837A filed in KR; patent granted as KR101089184B1 (active, IP right grant)
2011
- 2011-01-06: Application PCT/KR2011/000071 filed as WO2011083978A2 (active, application filing)
Also Published As
Publication number | Publication date |
---|---|
WO2011083978A3 (en) | 2011-11-10 |
WO2011083978A2 (en) | 2011-07-14 |
KR20110081364A (en) | 2011-07-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022048403A1 (en) | Virtual role-based multimodal interaction method, apparatus and system, storage medium, and terminal | |
US8224652B2 (en) | Speech and text driven HMM-based body animation synthesis | |
KR101089184B1 (en) | Method and system for providing a speech and expression of emotion in 3D charactor | |
KR102035596B1 (en) | System and method for automatically generating virtual character's facial animation based on artificial intelligence | |
CN106653052B (en) | Virtual human face animation generation method and device | |
US20120130717A1 (en) | Real-time Animation for an Expressive Avatar | |
EP1269465B1 (en) | Character animation | |
KR102116309B1 (en) | Synchronization animation output system of virtual characters and text | |
Naert et al. | A survey on the animation of signing avatars: From sign representation to utterance synthesis | |
CN111145777A (en) | Virtual image display method and device, electronic equipment and storage medium | |
CN113781610A (en) | Virtual face generation method | |
US20150187112A1 (en) | System and Method for Automatic Generation of Animation | |
Fernández-Baena et al. | Gesture synthesis adapted to speech emphasis | |
CN112734889A (en) | Mouth shape animation real-time driving method and system for 2D character | |
KR20080018408A (en) | Computer-readable recording medium with facial expression program by using phonetic sound libraries | |
Massaro et al. | A multilingual embodied conversational agent | |
Karpov et al. | Multimodal synthesizer for Russian and Czech sign languages and audio-visual speech | |
KR100813034B1 (en) | Method for formulating character | |
JP2003058908A (en) | Method and device for controlling face image, computer program and recording medium | |
Verma et al. | Animating expressive faces across languages | |
Lacerda et al. | Enhancing Portuguese Sign Language Animation with Dynamic Timing and Mouthing | |
Yang et al. | A multimodal approach of generating 3D human-like talking agent | |
Wang et al. | A real-time text to audio-visual speech synthesis system. | |
Fagel | Merging methods of speech visualization | |
Safabakhsh et al. | AUT-Talk: a farsi talking head |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| A201 | Request for examination | |
| E902 | Notification of reason for refusal | |
| E701 | Decision to grant or registration of patent right | |
| GRNT | Written decision to grant | |
| FPAY | Annual fee payment | Payment date: 20141128; year of fee payment: 4 |
| FPAY | Annual fee payment | Payment date: 20151127; year of fee payment: 5 |
| FPAY | Annual fee payment | Payment date: 20161128; year of fee payment: 6 |
| FPAY | Annual fee payment | Payment date: 20171124; year of fee payment: 7 |
| FPAY | Annual fee payment | Payment date: 20181128; year of fee payment: 8 |
| FPAY | Annual fee payment | Payment date: 20191128; year of fee payment: 9 |