JP2003058908A - Method and device for controlling face image, computer program and recording medium

Info

Publication number: JP2003058908A
Application number: JP2001244176A
Authority: JP
Inventor: Osamu Toyama (遠山 修)
Original assignee: Minolta Co Ltd
Current assignee: Minolta Co Ltd
Priority date: 2001-08-10
Filing date: 2001-08-10
Publication date: 2003-02-28

Abstract

PROBLEM TO BE SOLVED: To reduce the amount of shape data compared with conventional methods while realizing realistic lip-sync animation through control of a face image. SOLUTION: First shape data describing the shape of the mouth when a vowel is uttered is stored for each type of vowel. Consonant types whose mouth shapes share common features when uttered are classified into the same group, and second shape data describing the shape of the mouth when the consonants of a group are uttered is stored for each group (#4). The sounds of the words are segmented into individual vowels and consonants (#5), and for each segmented vowel or consonant the motion of the face image is controlled (#8) on the basis of the first shape data corresponding to that vowel or the second shape data corresponding to the group into which that consonant is classified.

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a method and a device for controlling a face image whose mouth moves in time with words, that is, so-called lip-sync animation.

[0002]

[Description of the Related Art] Techniques have previously been proposed that move the mouth of a face image in time with words so that the face appears to be speaking those words.

[0003] For example, one method controls the face image so that the mouth simply opens and closes repeatedly for the length of the words. This method makes the face image easy to control. However, the mouth takes the same shape for every sound, so the result lacks realism.

[0004] Lip synchronization animation (lip-sync animation) techniques, which change the shape of the mouth to match the words, have therefore been proposed. For example, the invention disclosed in Japanese Published Patent Translation (Tokuhyo) No. 2000-507377 controls the shape of the mouth according to the combination of two adjacent phonetic symbols in a word, in particular the combination of a vowel and a consonant.

[0005] To achieve more realistic animation, shape data is also acquired from actually filmed video by a technique such as inverse kinematics.

[0006]

[Problems to be Solved by the Invention] However, although the invention of Tokuhyo No. 2000-507377 is more realistic than simply opening and closing the mouth repeatedly, the number of phonetic-symbol combinations is large, so the amount of shape data becomes large. The inverse kinematics approach likewise yields a very large number of shapes from the video, so the amount of shape data also becomes large.

[0007] As the amount of data grows, a faster network, a higher-performance computer, and the like are needed to realize lip-sync animation. To widen the range of applications of lip-sync animation, the amount of data it requires must therefore be reduced.

[0008] In view of these problems, an object of the present invention is to provide a face image control method and a face image control device that use less shape data than conventional techniques and can realize realistic lip-sync animation through control of a face image.

[0009]

[Means for Solving the Problems] A face image control method according to the present invention controls the motion of a face image so that its mouth moves in time with words. The method stores, for each type of vowel, first shape data describing the shape of the mouth when that vowel is uttered; classifies consonant types whose mouth shapes share common features when pronounced into the same group; stores, for each group, second shape data describing the shape of the mouth when the consonants classified into that group are uttered; segments the sounds of the words into individual vowels and consonants; and, for each segmented vowel or consonant, controls the motion of the face image on the basis of the first shape data corresponding to that vowel or the second shape data corresponding to the group into which that consonant is classified.

[0010] Preferably, in a face image control method that controls the motion of a face image so that its mouth moves in time with words, first shape data describing the shape of the mouth when a vowel is uttered is stored for each type of vowel. Second shape data describing the characteristic mouth shape is stored for each of three groups: a first group, characterized by the lips being pressed together during pronunciation; a second group, characterized by the mouth being slightly open; and a third group, characterized by the mouth keeping the shape of the immediately preceding phoneme. Each consonant type is classified into the group whose characteristic is closest to the shape of the mouth when that consonant is uttered. The sounds of the words are segmented into individual vowels and consonants, and for each segmented vowel or consonant the motion of the face image is controlled on the basis of the first shape data corresponding to that vowel or the second shape data corresponding to the group into which that consonant is classified.

[0011] Alternatively, when the sounds of the words contain a predetermined sequence of two consecutive vowels or consonants, the latter of the two is replaced with a predetermined vowel or consonant, and the control is performed on the sounds after replacement.

[0012] Alternatively, the control may be performed so that the mouth shape for a consonant is held for a shorter time than the mouth shape for a vowel. Alternatively, when the sounds of the words contain a predetermined sequence of two consecutive vowels or consonants, the control may be performed after changing the default time for which the mouth shape is held for either one or both of the two.
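The timing rule of paragraph [0012] can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_timeline` and the hold times (0.12 s for vowels, 0.06 s for consonants) are hypothetical and do not come from the source, which states only that consonant shapes may be held for less time than vowel shapes.

```python
VOWELS = set("aeiou")

def build_timeline(phonemes, vowel_hold=0.12, consonant_hold=0.06):
    """Return (phoneme, start, end) tuples.

    The hold times are hypothetical defaults; the only property taken
    from the text is that consonants are held for less time than vowels.
    """
    timeline = []
    t = 0.0
    for p in phonemes:
        hold = vowel_hold if p in VOWELS else consonant_hold
        timeline.append((p, t, t + hold))
        t += hold
    return timeline
```

Calling `build_timeline(["k", "a"])` yields a short interval for "k" followed by a longer one for "a". Per-pair adjustments of the default hold time, which the paragraph also allows, could be applied to the returned tuples afterwards.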

[0013] A face image control device according to the present invention comprises shape data storage means and motion control means. The shape data storage means stores, for each type of vowel, first shape data describing the shape of the mouth when that vowel is uttered; stores second shape data describing the characteristic mouth shape for each of a first group, characterized by the lips being pressed together during pronunciation, a second group, characterized by the mouth being slightly open, and a third group, characterized by the mouth keeping the shape of the immediately preceding phoneme; and stores each consonant type classified into the group whose characteristic is closest to the shape of the mouth when that consonant is uttered. The motion control means segments the sounds of the words into individual vowels and consonants and, for each segmented vowel or consonant, controls the motion of the face image on the basis of the first shape data corresponding to that vowel or the second shape data corresponding to the group into which that consonant is classified.

[0014] A computer program according to the present invention is used in a computer that controls the motion of a face image so that its mouth moves in time with words. The computer has shape data storage means that stores first shape data describing the shape of the mouth when a vowel is uttered, for each type of vowel, and second shape data describing the shape of the mouth when the consonants of a group are uttered, for each group into which consonant types whose mouth shapes share common features when pronounced are classified. The program causes the computer to execute a step of segmenting the sounds of the words into individual vowels and consonants, and a step of controlling, for each segmented vowel or consonant, the motion of the face image on the basis of the first shape data corresponding to that vowel or the second shape data corresponding to the group into which that consonant is classified. A recording medium according to the present invention has this computer program recorded on it.

[0015]

[Embodiments of the Invention] FIG. 1 illustrates the configuration of a face image control device 1 according to the present invention, FIG. 2 shows the programs and data stored in a magnetic storage device 12, and FIG. 3 illustrates the functional configuration of the face image control device 1.

[0016] As shown in FIG. 1, the face image control device 1 comprises a processing device 10, a display device 11, a magnetic storage device 12, a keyboard 13, a mouse 14, a microphone 15, a speaker 16, and so on.

[0017] The processing device 10 comprises a CPU 10a, a RAM 10b, a ROM 10c, various input/output ports 10d, various controllers 10e, and so on. As shown in FIG. 2, the magnetic storage device 12 stores programs such as an operating system (OS) 12a, a face image control program 12b, and a modeling program 12c, together with data used in the various processes described later.

[0018] The programs and data stored in the magnetic storage device 12 are loaded into the RAM 10b as needed, and the loaded programs are executed by the CPU 10a. The face image control device 1 may also be connected to another computer via a network 6N to download programs or data. Alternatively, programs or data may be loaded from removable disks (recording media) such as a floppy disk 19a, a CD-ROM 19b, or a magneto-optical disk (MO) 19c.

[0019] The display device 11 displays the results of processing by the processing device 10. For example, the face image HF of a person is controlled so that its mouth moves in time with the input words, and the result of this control is displayed as lip-sync animation on the display screen HG of the display device 11. The speaker 16 outputs the words as speech in time with the motion (lip-sync animation) of the face image HF. This makes the user perceive the face image HF as if it were speaking the words. The keyboard 13 or the microphone 15 is used, among other things, to input the words that the face image HF is to speak.

[0020] A workstation or a personal computer, for example, is used as the face image control device 1. With this configuration, the face image control device 1 is provided, as shown in FIG. 3, with a face model data generation unit 101, a face model data storage unit 102, a text data acquisition unit 103, a voice data acquisition unit 104, a voice-to-text conversion unit 105, a face image control unit 106, a shape change data storage unit 107, a shape group storage unit 108, a voice output unit 109, and so on.

[0021] The face image HF of a person is obtained by projecting a three-dimensional shape model of that person onto two dimensions from a predetermined direction. That is, the motion of the face image HF is controlled by deforming the three-dimensional shape model to match the words.

[Preparation of the three-dimensional shape model]

FIG. 4 is a flowchart of the processing that generates the three-dimensional shape model, FIG. 5 shows an example of a standard model DS, FIG. 6 is a flowchart of the deformation processing, FIG. 7 schematically shows a surface S of the standard model DS and a point P of the three-dimensional measurement data, and FIG. 8 illustrates the virtual springs that prevent abnormal deformation of the standard model DS.

[0022] The face model data generation unit 101 of FIG. 3 generates the three-dimensional shape model on which the face image HF described above is based. The three-dimensional shape model is generated by the procedure shown in the flowchart of FIG. 4.

[0023] First, the standard model DS shown in FIG. 5 is roughly aligned with three-dimensional measurement data of a person (for example, the user of the face image control device 1) (#101). The standard model DS is structured three-dimensional data covering the entire circumference of a head with a standard face size and shape. The three-dimensional measurement data is three-dimensional data of the user's face consisting of a point cloud. That is, in step #101 the orientation, size, and position of the standard model DS are changed so that the distance between the standard model DS and the three-dimensional measurement data is minimized. In general, the standard model DS and the three-dimensional measurement data both represent an expressionless face. The three-dimensional measurement data is prepared in advance, for example by capturing the user with a three-dimensional measuring device.
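The rough alignment of step #101 can be illustrated with a deliberately simplified sketch. It assumes the model and the measurement are both given as lists of (x, y, z) tuples and adjusts only size and position by matching centroids and RMS radii. The orientation change mentioned in the text is omitted, and the helper names are hypothetical rather than taken from the source.

```python
def centroid(points):
    """Mean position of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def rms_radius(points, center):
    """Root-mean-square distance of the points from `center`."""
    n = len(points)
    total = sum(sum((p[i] - center[i]) ** 2 for i in range(3)) for p in points)
    return (total / n) ** 0.5

def rough_align(model_pts, measured_pts):
    """Scale and translate model_pts so that its centroid and RMS size
    match those of measured_pts (rotation is not handled in this sketch)."""
    cm = centroid(model_pts)
    cd = centroid(measured_pts)
    scale = rms_radius(measured_pts, cd) / rms_radius(model_pts, cm)
    return [tuple(cd[i] + scale * (p[i] - cm[i]) for i in range(3))
            for p in model_pts]
```

A full implementation would also search over rotations, for instance with an iterative closest point scheme, before the finer fitting described in the following paragraphs.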

[0024] Contours and feature points are extracted (#102). Contours and feature points that should be located at the same positions as the contours RK and feature points TT of the standard model DS are placed on the three-dimensional measurement data or on the corresponding two-dimensional image.

[0025] The feature points chosen are, for example, parts that are genuinely distinctive, such as the corners of the eyes and mouth, the tip of the nose, and the bottom of the chin, or parts that are not distinctive in themselves but are easy to locate, such as the midpoints between them. The contours chosen are, for example, the chin line, the lip lines, and the eyelid lines.

[0026] To reduce the amount of computation and the error, the amount of three-dimensional measurement data is reduced (#103). The standard model DS is then deformed (#104). That is, using an energy function defined in terms of the distance between each point of the three-dimensional measurement data and the surface of the standard model DS, an energy function defined to avoid excessive deformation, and the like, the surface of the standard model DS is deformed so that these functions are minimized.

[0027] The target energy function and control points are then changed, and deformation processing similar to step #104 is repeated (#105). The deformation processing of step #104 is described next.

[0028] In FIG. 7, one point of the point cloud constituting the three-dimensional measurement data is denoted Pk. On the surface S of the standard model DS, the point closest to Pk is denoted Qk. The point Qk is the foot of the perpendicular dropped from Pk onto the surface S.
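The relationship between Pk and Qk can be illustrated for the simplest case, in which the local patch of the surface S is approximated by the plane through three points a, b, and c. That approximation, and the function names, are assumptions made for this sketch; the source does not state how S is represented.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def foot_of_perpendicular(p, a, b, c):
    """Project p onto the plane through a, b, c.

    Returns (q, distance), where q is the foot of the perpendicular.
    The result is not clamped to the triangle a-b-c.
    """
    normal = cross(sub(b, a), sub(c, a))
    length = dot(normal, normal) ** 0.5
    unit = tuple(x / length for x in normal)
    signed = dot(sub(p, a), unit)          # signed distance from the plane
    q = tuple(pi - signed * ni for pi, ni in zip(p, unit))
    return q, abs(signed)
```

For a point two units above the plane z = 0, the function returns the point directly beneath it and a distance of 2.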

[0029] The surface S is fitted to the point cloud as follows. General fitting is described here. For a point Pk in the point cloud, its corresponding point Qk, and the set of corresponding point pairs T = {(Pk, Qk), k = 1...n}, the fitting energy function Ff(U) is set as in the following equation (1).

[0030]

[Math. 1]

[0031] Here, Qk(U) indicates that Qk is a function of U. To prevent excessive deformation of the surface S, the virtual springs (elastic bars) KB shown in FIG. 8 are introduced. A stabilization energy function for stabilizing the shape of the surface S is derived from the constraints imposed by the virtual springs KB.

[0032] That is, FIG. 8 shows part of the surface (curved surface) S of the standard model DS that is to be fitted. The surface S is formed by the control point group U = {ui, i = 1...n}. A virtual spring KB is placed between each pair of adjacent control points. The virtual springs KB constrain the control points by tensile force and serve to prevent abnormal deformation of the surface S.

[0033] In other words, when the spacing between adjacent control points u increases, the tensile force of the virtual spring KB increases accordingly. For example, if the spacing between control points u increases as the point Qk moves toward the point Pk, the tensile force of the virtual spring KB increases. If the spacing between control points u does not change when Qk moves, that is, if the relative positions of the control points u are unchanged, the tensile force of the virtual spring KB does not change. The stabilization energy is defined as the tensile force of the virtual springs KB averaged over the whole surface S. The stabilization energy therefore increases when part of the surface S protrudes and deforms, and it is zero when the whole surface S moves uniformly.
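The qualitative behaviour described above (zero energy for a uniform shift, growing energy for a local protrusion) can be reproduced with a simple stand-in. The form used here, the spring constant c times the mean squared change in spring length, is an assumption chosen for illustration and is not the patent's equation (2), which is not reproduced in this text. The function and parameter names are likewise hypothetical.

```python
import math

def stabilization_energy(rest_pts, moved_pts, springs, c=1.0):
    """Assumed stand-in for Fs(U): c times the mean squared change in
    length of each virtual spring.

    rest_pts / moved_pts: control point positions before and after deformation
    springs: list of (i, j) index pairs, one per virtual spring
    """
    total = 0.0
    for i, j in springs:
        rest_len = math.dist(rest_pts[i], rest_pts[j])
        new_len = math.dist(moved_pts[i], moved_pts[j])
        total += (new_len - rest_len) ** 2
    return c * total / len(springs)
```

With three collinear control points joined by two springs, shifting all three points by the same vector gives an energy of zero, whereas lifting only the middle point stretches both springs and gives a positive energy.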

[0034] The stabilization energy function Fs(U) is given by the following equation (2).

[0035]

[Math. 2]

[0036] Here,

[0037]

[Math. 3]

[0038] are, respectively, the initial end points of a virtual spring KB and its end points after deformation. c is the spring constant, and M is the number of virtual springs KB. The following relationship also holds.

[0039]

[Math. 4]

[0040] Therefore, increasing the spring constant c makes the virtual springs KB stiffer and harder to deform. Introducing the stabilization energy function Fs(U) places a certain constraint on changes in the shape of the surface S and prevents the surface S from deforming excessively.

[0041] Using the fitting energy function Ff(U) and the stabilization energy function Fs(U) described above, the evaluation function F(U) for the fitting is defined by the following equation (3).

[0042] F(U) = Wf Ff(U) + Ws Fs(U)   ... (3)

Here, Wf and Ws are weighting coefficients for normalization. Deformation of the surface S and the search for corresponding points are repeated until the evaluation function F(U) of equation (3) becomes sufficiently small, thereby fitting the surface. For example, the fitting proceeds in the direction in which the derivative of F(U) with respect to U approaches zero.

[0043] In the deformation processing of FIG. 6, the point Qk corresponding to each point Pk is first computed, and pairs of Pk and Qk are created (#111). The surface S is deformed (#112), and the evaluation function F(U) after deformation is calculated (#113). The processing is repeated until the evaluation function F(U) converges (Yes in #114).

[0044] Convergence of the evaluation function F(U) can be judged by known methods, such as treating F(U) as converged when it falls below a predetermined value, or when its rate of change relative to the previous calculation falls to or below a predetermined value.
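The loop of steps #111 to #114, together with the two convergence tests just mentioned, can be sketched generically. The function below is a hypothetical skeleton: `evaluate` stands in for F(U), `step` stands in for one round of correspondence search and surface deformation, and the tolerances are arbitrary illustrative values.

```python
def fit_until_converged(evaluate, step, state,
                        abs_tol=1e-6, rel_tol=1e-4, max_iter=1000):
    """Repeat `step` until `evaluate(state)` is small or stops changing.

    Returns (state, iterations, reason), where reason is "absolute",
    "relative", or "max_iter".
    """
    prev = evaluate(state)
    for iteration in range(1, max_iter + 1):
        state = step(state)               # deform (#112)
        cur = evaluate(state)             # evaluate F(U) (#113)
        if cur < abs_tol:                 # F(U) below a predetermined value
            return state, iteration, "absolute"
        if prev > 0 and abs(prev - cur) / prev <= rel_tol:
            return state, iteration, "relative"   # change rate small enough
        prev = cur
    return state, max_iter, "max_iter"
```

For example, with `evaluate = lambda x: x * x` and `step = lambda x: 0.5 * x`, the loop stops on the absolute test once x squared drops below the tolerance.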

[0045] Through this processing, the standard model DS is deformed to produce a three-dimensional shape model in the shape of the user's face.

[Definition of muscles]

FIG. 9 shows an example of the configuration of the face model 3M, FIG. 10 shows an example of muscle placement data 71, FIG. 11 shows an example of node influence data 72, and FIG. 12 illustrates an example of the range affected by movement of a node N.

[0046] The three-dimensional shape model of the user's face generated by the face model data generation unit 101 is stored in the face model data storage unit 102 shown in FIG. 3. The three-dimensional shape model is hereinafter referred to as the "face model 3M". The face model data storage unit 102 also stores the muscle placement data 71 shown in FIG. 10 and the node influence data 72 shown in FIG. 11.

[0047] In FIG. 9(a), the intersections of the thin straight lines indicate the constituent vertices (model vertices) V of the face model 3M. The position of the face surface, that is, the skin, is determined by the constituent vertices V.

[0048] The thick straight lines indicate edges E, which represent the muscles of the face model 3M. The black circles indicate nodes N, which represent the end points of the muscles. That is, the position of a muscle (edge E) is determined by two different nodes N. The nodes N (N1, N2, ...) correspond to the control points u of the three-dimensional shape model and are placed at the end points of the muscles across the whole face. FIG. 9(b) omits the constituent vertices V of FIG. 9(a) to make the relationship between the nodes N and the edges E easier to see. FIGS. 9(a) and 9(b) omit the nodes N and edges E of the right half of the face, but in practice nodes N and edges E exist there just as in the left half.

[0049] The position of a node N is expressed relative to the constituent vertices V, as in the following equation (4).

[0050]

[Math. 5]

[0051] As shown in FIG. 10, the muscle placement data 71 describes the structure of each muscle (edges E1, E2, ...). The first parameter of an edge E gives the two nodes N that are its end points. The second parameter gives the proportion (weight) by which each end point (node N) moves when that edge E (muscle) is displaced. For example, the second parameter "0.7, 0.3" of edge E3 indicates that node N4 and node N3 move in the ratio 7 to 3.
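The role of the two weights can be illustrated with a small sketch. The rule used here, in which an edge is shortened by a given fraction of its length and the shortening is split between its two end points in proportion to their weights, is an assumption made for illustration; it is not the patent's equation (5), which is not reproduced in this text. The function name `contract_edge` is hypothetical.

```python
def contract_edge(pos_a, pos_b, weight_a, weight_b, ratio):
    """Shorten the edge a-b by `ratio` of its length (assumed rule).

    weight_a and weight_b (summing to 1) decide how much of the
    shortening is taken up by each end point.
    """
    direction = tuple(b - a for a, b in zip(pos_a, pos_b))   # vector a -> b
    new_a = tuple(a + weight_a * ratio * d for a, d in zip(pos_a, direction))
    new_b = tuple(b - weight_b * ratio * d for b, d in zip(pos_b, direction))
    return new_a, new_b
```

With weights 0.7 and 0.3, as in the E3 example, an edge of length 10 contracted by 50% ends up with length 5, the first end point having moved 3.5 and the second 1.5.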

[0052] When an edge E is displaced, the position to which a node N moves is obtained from the following equation (5).

[0053]

[Math. 6]

[0054] In practice, however, some nodes N are associated with more than one edge E, so the position of a node N after movement is obtained by an iterative (convergence) calculation or by solving simultaneous equations. As shown in FIG. 11, the node influence data 72 describes the influence that movement of a node N has on the constituent vertices V. The second parameter of the node influence data 72 lists the constituent vertices V affected when each node N moves, that is, the range of influence of that node's movement. The constituent vertices V affected by movement of a node N are concentrated around that node. For example, in FIG. 12 the constituent vertices V affected by movement of the node N shown as a large black circle are the nine constituent vertices V shown as small black circles.

[0055] The first parameter gives the degree of influence (intensity) that a node N exerts on the constituent vertices V when it moves. The larger this value, the larger the movement (displacement) of the constituent vertices V that accompanies movement of the node N.

[0056] The position to which a constituent vertex V moves as a node N moves is obtained from the following equation (6).

[0057]

[Math. 7]
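How the node influence data might be applied can be sketched as follows. The rule assumed here, in which each affected vertex is shifted by the node's displacement multiplied by the node's intensity, is an illustration only and is not the patent's equation (6), which is not reproduced in this text. The function name and data layout are hypothetical.

```python
def move_vertices(vertices, node_delta, intensity, affected):
    """Shift the affected vertices by intensity * node displacement (assumed rule).

    vertices:   dict mapping vertex id -> (x, y, z)
    node_delta: displacement (dx, dy, dz) of the node N
    intensity:  degree of influence of the node (first parameter)
    affected:   ids of the vertices in the node's range (second parameter)
    """
    moved = dict(vertices)
    for vid in affected:
        moved[vid] = tuple(v + intensity * d
                           for v, d in zip(vertices[vid], node_delta))
    return moved
```

Vertices outside the node's range are returned unchanged, which mirrors the localized influence shown in FIG. 12.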

[0058]

[Data for controlling the shape of the face model]

FIG. 13 shows the shape groups to which vowels and consonants normally belong, FIG. 14 shows an example of shape group data 75, FIG. 15 shows an example of shape change data 74, and FIG. 16 shows examples of the shape of the face model when sounds belonging to each shape group are uttered.

[0059] In this embodiment, as shown in FIG. 13, to reduce the amount of data describing changes in the shape of the face model 3M, there are five vowel groups (shape groups A, E, I, O, and U) and three groups (shape groups 1 to 3) for classifying each consonant type according to the characteristics of the mouth shape during pronunciation.

[0060] Shape groups A, E, I, O, and U each contain one vowel: "a", "e", "i", "o", and "u", respectively. Shape group 1 is the group of consonants pronounced with the lips brought together, shape group 2 is the group of consonants pronounced with the mouth in a predetermined shape without bringing the lips together, and shape group 3 is the group of consonants pronounced with the mouth kept in the shape of the previously uttered sound. Under this classification, shape group 1 normally contains the five consonants "b, f, m, p, v", shape group 2 contains the eleven consonants "d, g, j, k, l, n, r, s, t, w, z", and shape group 3 contains the two consonants "h, y".
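
The classification above can be written down as a small lookup table. The following Python sketch is illustrative only: the names `NORMAL_GROUP` and `normal_group` are hypothetical, and the group labels are plain strings chosen for this example rather than a format taken from the patent.

```python
# Normal (context-free) shape groups, following the classification in the text.
# Vowels map to their own groups A, E, I, O, U.
NORMAL_GROUP = {v: v.upper() for v in "aeiou"}
# Group 1: consonants pronounced with the lips brought together.
NORMAL_GROUP.update({c: "1" for c in "bfmpv"})
# Group 2: consonants pronounced with a predetermined mouth shape, lips apart.
NORMAL_GROUP.update({c: "2" for c in "dgjklnrstwz"})
# Group 3: consonants that keep the mouth shape of the preceding sound.
NORMAL_GROUP.update({c: "3" for c in "hy"})


def normal_group(phoneme: str) -> str:
    """Return the normal shape group of a single romanized phoneme.

    Raises KeyError for letters that the text does not classify.
    """
    return NORMAL_GROUP[phoneme.lower()]
```

With five vowel groups and three consonant groups, only eight group labels are needed for the 23 phonemes listed.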

[0061] The shape change data storage unit 107 in FIG. 3 stores the shape change data 74 shown in FIG. 15. The shape change data 74 is data for changing each muscle (edges E1, E2, ...) of the face model 3M so that the face model 3M takes the shapes shown in FIGS. 16(a) to 16(g) when sounds belonging to shape groups A, E, I, O, U, 1, and 2, respectively, are uttered. Each value of the shape change data 74 indicates the degree of contraction of a muscle, where "0" denotes the uncontracted state and "20" the most contracted state. For example, a deformation amount (degree of contraction) of "15.0" means that the muscle (edge E) contracts by 75%. For shape group 3, the face model 3M keeps the shape of the previously uttered sound (phoneme), so this group has no deformation values.
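
As a rough illustration of the 0 to 20 scale, the sketch below converts a deformation amount into a contraction ratio. The table `HYPOTHETICAL_SHAPE_CHANGE_DATA` and its values are invented for this example, since the actual contents of FIG. 15 are not reproduced in this text; the function names are likewise assumptions.

```python
from typing import Dict, Optional

MAX_CONTRACTION = 20.0  # value the text assigns to the most contracted state


def contraction_ratio(value: float) -> float:
    """Convert a deformation amount on the 0..20 scale to a ratio in 0..1."""
    if not 0.0 <= value <= MAX_CONTRACTION:
        raise ValueError("deformation amount must lie between 0 and 20")
    return value / MAX_CONTRACTION


# Hypothetical rows: shape group -> {edge name: deformation amount}.
# The real values are those of FIG. 15, which is not available here.
HYPOTHETICAL_SHAPE_CHANGE_DATA: Dict[str, Dict[str, float]] = {
    "A": {"E1": 15.0, "E2": 5.0},
    "1": {"E1": 0.0, "E2": 12.0},
}


def edge_ratios(group: str) -> Optional[Dict[str, float]]:
    """Return per-edge contraction ratios, or None for shape group 3,
    which has no values because the previous shape is simply kept."""
    if group == "3":
        return None
    row = HYPOTHETICAL_SHAPE_CHANGE_DATA[group]
    return {edge: contraction_ratio(v) for edge, v in row.items()}
```

Returning `None` for group 3 is one possible way to signal "keep the previous shape"; a caller would then reuse the last applied values.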

[0062] The shape of the face model 3M is controlled, as described later, by dividing the words to be spoken by the face model 3M into phonemes and changing the edges E on the basis of the shape change data 74 corresponding to the shape group to which each phoneme belongs. For example, the word "館 (kan)" is divided into "k, a, n", and the face model 3M is controlled so as to take the shapes of shape groups 2, A, and 2 in that order.

[0063] In general, however, some words are not pronounced as written (spelled) because of the influence of the immediately preceding sound (phoneme). For example, in the word "公 (kou)", the "u" is affected by the preceding "o", and the word is pronounced "koo". In this case the mouth also takes the "o" shape shown in FIG. 16(d) rather than the "u" shape shown in FIG. 16(e).

[0064] As shown in FIG. 14, the shape group data 75 stored in the shape group storage unit 108 is data on phonemes that, as in the above example, are pronounced with a mouth shape different from the usual one under the influence of the immediately preceding phoneme. Specifically, the shape group data 75 indicates which shape group each phoneme changes to when it is influenced by the immediately preceding phoneme. For example, "g" normally belongs to shape group 2, but when it is immediately preceded by "n" it is changed to shape group 3, as shown in bold in FIG. 14.

[Control of the face image (execution of animation)] Next, control of the motion of the user's face image HF will be described.

[0065] FIG. 17 shows an example of the shape groups to which the phonemes of "kouminkan" belong, FIG. 18 illustrates the durations of vowels, consonants, and continuous vowels, and FIG. 19 shows an example of a timetable.

[0066] The text data acquisition unit 103 in FIG. 3 acquires, as text data 73, the words on which the deformation of the face model 3M is based (that is, the words to be spoken by the face model 3M). For example, it acquires a character string that the user has entered with the keyboard 13 as the text data 73. Alternatively, a message written in an e-mail received from another user may be used as the text data 73.

[0067] The words may be input by voice instead of as text data. The voice data acquisition unit 104 acquires the voice of the user speaking into the microphone 15 as voice data 74. The voice data 74 is converted into text data 73 by the voice-to-text conversion unit 105.

[0068] The face image control unit 106 is composed of a shape acquisition unit 161, a time allocation unit 162, a shape interpolation unit 163, and a moving image generation unit 164, and controls the face image HF so that it moves in accordance with the input words.

[0069] The shape acquisition unit 161 divides the text data 73 into phonemes, that is, into individual vowels and consonants. For example, when the text data 73 is "公民館 (こうみんかん)", it is divided into the nine phonemes "k, o, u, m, i, n, k, a, n". The unit then determines which shape group each phoneme belongs to on the basis of the immediately preceding phoneme and the shape group data 75 shown in FIG. 14. That is, for each phoneme, it determines from the shape group data 75 whether the shape group changes under the influence of the immediately preceding phoneme. For example, as shown in FIG. 17, the third phoneme "u" is changed to shape group O because it is immediately preceded by "o". The other phonemes are not affected by the immediately preceding phoneme and remain in their normal shape groups. In this way, as shown in FIG. 17, the shape groups for controlling the shape of the face model 3M in accordance with the phonemes "k, o, ..., n" are obtained. In FIG. 17, "G1" and "G2" denote shape groups 1 and 2, and "A", "I", and "O" denote shape groups A, I, and O, respectively.
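
The lookup described above, with the preceding phoneme taken into account, might be sketched as follows. This is a simplified illustration, not the patent's implementation: it assumes that each romanized letter is one phoneme, and the `OVERRIDES` table contains only the two context rules mentioned in the text ("u" after "o", and "g" after "n"); the full table is that of FIG. 14, which is not reproduced here. All names are hypothetical.

```python
from typing import List, Optional

# Normal shape groups (see the classification of vowels and consonants).
NORMAL_GROUP = {v: v.upper() for v in "aeiou"}
NORMAL_GROUP.update({c: "1" for c in "bfmpv"})
NORMAL_GROUP.update({c: "2" for c in "dgjklnrstwz"})
NORMAL_GROUP.update({c: "3" for c in "hy"})

# (preceding phoneme, phoneme) -> changed shape group.
# Partial table: only the two examples given in the text.
OVERRIDES = {("o", "u"): "O", ("n", "g"): "3"}


def shape_groups(romanized: str) -> List[str]:
    """Split a romanized word into one-letter phonemes and return the
    shape group of each, applying context overrides where they exist."""
    groups: List[str] = []
    previous: Optional[str] = None
    for phoneme in romanized.lower():
        group = OVERRIDES.get((previous, phoneme), NORMAL_GROUP[phoneme])
        groups.append(group)
        previous = phoneme
    return groups


print(shape_groups("kouminkan"))
```

For "kouminkan" this yields groups 2, O, O, 1, I, 2, 2, A, 2, which uses only the groups G1, G2, A, I, and O named for FIG. 17.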

[0070] The time allocation unit 162 sets the timing with which the shape of the face model 3M is deformed for each phoneme. These settings are made according to the following rules.

[0071] As shown in FIG. 18, the time taken to change from a given shape (for example, an expressionless shape) to the shape for uttering the next phoneme (rise time T1) and the time taken for that shape to return to the expressionless shape (end time T3) are made extremely short, for example 0.1 second or less. T1 and T3 may be set equal (T1 = T3).

[0072] The time for which the shape of an uttered vowel is held (duration Tb2), the time for which the shape of an uttered consonant is held (duration Ts2), and the time for which the shape of a vowel is held when the immediately preceding phoneme is also a vowel (a continuous vowel; duration Tr2) satisfy the relation Ts2 < Tr2 < Tb2. At a normal conversational speed, Ts2 is about 0.1 second and Tb2 is about 0.4 second. These durations can be changed according to the speaking speed.

[0073] As shown in FIG. 19, the timing of two adjacent phonemes is set so that the rise of the later phoneme is completed when the earlier phoneme ends (t = tb). In other words, the later phoneme is placed so that it starts to rise earlier than tb by the end time T3 (t = ta).
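
The placement rule can be sketched as a small timetable builder. This is only an illustration under stated assumptions: `Slot` and `build_timetable` are hypothetical names, the hold times are passed in by the caller, and the rise and end times are both set to 0.1 second (T1 = T3), which the text allows but does not require.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Slot:
    phoneme: str
    rise_start: float  # t = ta for this phoneme
    hold_start: float  # rise completed
    hold_end: float    # fall begins
    end: float         # fall completed (t = tb for the next phoneme)


def build_timetable(holds: List[Tuple[str, float]],
                    t_rise: float = 0.1,
                    t_fall: float = 0.1) -> List[Slot]:
    """Place phonemes so that each rise completes exactly when the
    previous phoneme ends; the previous fall and the next rise overlap."""
    slots: List[Slot] = []
    start = 0.0
    for phoneme, hold in holds:
        hold_start = start + t_rise
        hold_end = hold_start + hold
        end = hold_end + t_fall
        slots.append(Slot(phoneme, start, hold_start, hold_end, end))
        # The next phoneme starts rising t_rise before this one ends.
        start = end - t_rise
    return slots


table = build_timetable([("k", 0.1), ("a", 0.4), ("n", 0.1)])
```

With T1 = T3, each phoneme starts rising at the moment the previous one starts to fall, so the overlap is exactly the interval over which the shapes need to be interpolated.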

[0074] According to the above rules, the shape groups corresponding to the phonemes of "koominkan", for example, are arranged as in the timetable shown in FIG. 19. Returning to FIG. 3, the shape interpolation unit 163 interpolates the shape data of the face model 3M for the transition from one phoneme to the next. That is, for simplicity, the shape of the face model 3M in the portion where the end time of one phoneme overlaps the rise time of the next is approximated linearly, and the position of each constituent vertex V of the face model 3M is obtained from the following equation (7).

[0075]

[Equation 8]

[0076] The moving image generation unit 164 moves the face image HF and generates the lip-sync animation by projecting the face model 3M onto a two-dimensional plane from a predetermined direction while changing its shape, that is, the positions of the constituent vertices V obtained by the shape acquisition unit 161 and the shape interpolation unit 163, in accordance with the timetable shown in FIG. 19.

[0077] The voice output unit 109 converts the words into speech and outputs the speech in synchronization with the lip-sync animation. For example, it outputs the speech sequentially in response to a signal (trigger) issued by the face image control unit 106 when a given phoneme rises. A known speech synthesis technique is used to convert the text data into speech.

[0078] Next, the flow of processing for moving the user's face image HF in accordance with the input words to generate a lip-sync animation will be described with reference to flowcharts. FIG. 20 is a flowchart of the overall processing of the face image control apparatus 1, FIG. 21 is a flowchart of the shape acquisition processing, and FIG. 22 is a flowchart of the time allocation processing.

[0079] As shown in FIG. 20, the face model 3M on which the user's face image is based is first generated (#1), and the positions of the edges E (muscles) are set (#2). The range of constituent vertices V affected by a change in the position of each node N is set, which determines how the skin is affected when the muscles of the face model 3M move (#3). The positions of the edges E are set for each shape group (#4). Steps #1 to #4 are preparation for generating the face image, so they may be omitted when the face model 3M has already been generated and the settings have already been made.

[0080] The words to be spoken by the user's face image are input as text data (#5). Words input by voice are converted into text. The shape of the face model 3M is acquired for each phoneme contained in the input text data (#6). That is, as shown in FIG. 21, the text data is divided into phonemes (#121), the shape group to which each phoneme belongs is determined on the basis of the shape group data 75 shown in FIG. 14 (#122), and the shape of the face model 3M corresponding to each shape group thus determined is acquired on the basis of the shape change data 74 shown in FIG. 15 (#123).

[0081] Each of the divided phonemes is placed in the timetable (#7). That is, as shown in FIG. 22, when the phoneme is a consonant (No in #131), its duration is set to Ts2 (#132); when it is a continuous vowel (Yes in #133), its duration is set to Tr2 (#134); and when it is a normal vowel (No in #133), its duration is set to Tb2 (#135).
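
The branching of FIG. 22 can be sketched as below. The values of `T_S2` and `T_B2` follow the approximate figures given for normal conversational speed; `T_R2` is an assumed value, since the text only requires Ts2 < Tr2 < Tb2. The function names are hypothetical, and each romanized letter is treated as one phoneme.

```python
from typing import List, Optional

VOWELS = set("aeiou")

T_S2 = 0.1   # consonant hold time (approximate value from the text)
T_B2 = 0.4   # normal vowel hold time (approximate value from the text)
T_R2 = 0.25  # continuous vowel hold time: assumed, must satisfy Ts2 < Tr2 < Tb2


def hold_time(phoneme: str, previous: Optional[str]) -> float:
    """Choose the hold time of one phoneme, mirroring steps #131 to #135."""
    if phoneme not in VOWELS:        # #131: not a vowel
        return T_S2                  # #132
    if previous in VOWELS:           # #133: preceded by a vowel
        return T_R2                  # #134
    return T_B2                      # #135


def hold_times(phonemes: str) -> List[float]:
    """Return the hold time of every phoneme in a romanized word."""
    result: List[float] = []
    previous: Optional[str] = None
    for phoneme in phonemes.lower():
        result.append(hold_time(phoneme, previous))
        previous = phoneme
    return result


print(hold_times("kou"))
```

Scaling all three constants by a common factor would be one simple way to follow a faster or slower speaking speed.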

[0082] For the portions where adjacent phonemes placed in the timetable overlap, the shape is interpolated (#8). The shape of the face model 3M is then deformed on the basis of the data obtained by the above processing, and the lip-sync animation is generated (#9).
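
The interpolation of step #8 is described as a linear approximation over the overlapping portion. The sketch below shows a generic linear blend of vertex positions between two shapes; it is an assumption made for illustration and is not a transcription of equation (7), which is not reproduced in this text. The names `blend_vertices`, `t_a`, and `t_b` are hypothetical.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]


def blend_vertices(prev_shape: List[Vertex],
                   next_shape: List[Vertex],
                   t: float,
                   t_a: float,
                   t_b: float) -> List[Vertex]:
    """Linearly blend vertex positions between two phoneme shapes.

    At t = t_a the result equals prev_shape, at t = t_b it equals
    next_shape; times outside [t_a, t_b] are clamped to the nearer end.
    """
    if t_b <= t_a:
        raise ValueError("t_b must be later than t_a")
    w = min(max((t - t_a) / (t_b - t_a), 0.0), 1.0)
    return [
        tuple(p + w * (q - p) for p, q in zip(v_prev, v_next))
        for v_prev, v_next in zip(prev_shape, next_shape)
    ]


midpoint = blend_vertices([(0.0, 0.0, 0.0)], [(2.0, 4.0, 6.0)], 0.5, 0.0, 1.0)
```

Evaluating this once per animation frame inside the overlap interval gives the intermediate shapes between the two phonemes.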

[0083] According to this embodiment, the sound of the words is divided into phonemes, that is, into individual vowels and consonants, and the motion of the face image is controlled for each phoneme. A realistic lip-sync animation can therefore be achieved with less data than before. Furthermore, this embodiment provides five vowel shape groups and three consonant shape groups, and classifies each consonant type into one of the three consonant shape groups. Only eight kinds of shape data therefore need to be prepared, which further reduces the amount of data.

[0084] In this embodiment, the change in the shape of the face image is controlled with the consonant duration Ts2 set shorter than the vowel duration Tb2. This makes the lip-sync animation more realistic. In addition, because the shape data is interpolated between two adjacent phonemes, the shape changes between phonemes are blended, and a lip-sync animation that changes smoothly as a whole can be achieved.

[0085] In this embodiment, the face model is obtained by fitting a standard model to three-dimensional measurement data of the user, but it may instead be obtained by fitting the standard model to a two-dimensional image of the user. Alternatively, the face model may be created with any of various computer graphics (CG) programs.

[0086] In addition to moving the mouth of the face image in accordance with the words, the eyes, eyebrows, or facial expression may also be changed. For example, data representing changes in the eyes, eyebrows, or facial expression over time is prepared, and the muscles (edges E) of the face model are controlled on the basis of that data.

[0087] The face image control apparatus 1 of this embodiment can be used as a means of communication among a plurality of users. For example, when user α and user β talk on mobile phones equipped with an image processing function, users α and β each send their own face model to the other party's mobile phone in advance. The mobile phone of user β divides the received voice of user α into phonemes, performs steps #6 to #9 shown in FIG. 20, and displays the lip-sync animation of user α. Similarly, the mobile phone of user α displays the lip-sync animation of user β.

[0088] In this embodiment, the face image is controlled using a three-dimensional face model, but a two-dimensional face model may be used instead. The consonant types are classified into three shape groups in this embodiment, but they may be classified into four or more shape groups. For languages such as English, the text data may be converted into phonetic symbols, and the face image may be controlled on the basis of the phonetic symbols.

[0089] In addition, the configuration of the whole or of each part of the face image control apparatus 1, the content of the processing, the order of the processing, and so on may be changed as appropriate within the spirit of the present invention.

[0090]

[Effects of the Invention] According to the present invention, the amount of shape data can be made smaller than before, and a realistic lip-sync animation can be achieved by controlling a face image.

[Brief description of drawings]

FIG. 1 is a diagram illustrating the configuration of a face image control apparatus according to the present invention.

FIG. 2 is a diagram showing the programs and data stored in a magnetic storage device.

FIG. 3 is a diagram illustrating the functional configuration of the face image control apparatus.

FIG. 4 is a flowchart illustrating the flow of processing for generating a three-dimensional shape model.

FIG. 5 is a diagram showing an example of a standard model.

FIG. 6 is a flowchart illustrating the flow of the deformation processing.

FIG. 7 is a diagram schematically showing a surface S of the standard model and a point P of the three-dimensional measurement data.

FIG. 8 is a diagram illustrating virtual springs for preventing abnormal deformation of the standard model.

FIG. 9 is a diagram showing an example of the configuration of a face model.

FIG. 10 is a diagram showing an example of muscle arrangement data.

FIG. 11 is a diagram showing an example of node influence data.

FIG. 12 is a diagram illustrating an example of the range affected by the movement of a node.

FIG. 13 is a diagram showing the shape groups to which vowels and consonants normally belong.

FIG. 14 is a diagram showing an example of shape group data.

FIG. 15 is a diagram showing an example of shape change data.

FIG. 16 is a diagram showing examples of the shape of the face model when a sound belonging to each shape group is uttered.

FIG. 17 is a diagram showing an example of the shape groups to which the phonemes of "kouminkan" belong.

FIG. 18 is a diagram illustrating the durations of vowels, consonants, and continuous vowels.

FIG. 19 is a diagram showing an example of a timetable.

FIG. 20 is a flowchart illustrating the overall processing flow of the face image control apparatus.

FIG. 21 is a flowchart illustrating the flow of the shape acquisition processing.

FIG. 22 is a flowchart illustrating the flow of the time allocation processing.

[Explanation of symbols]

1 Face image control apparatus
106 Face image control unit (motion control means)
107 Shape change data storage unit (shape data storage means)
19a to 19c Removable disk (recording medium)
74 Shape change data (shape data)
HF Face image

Claims

[Claims]

1. A face image control method for controlling the motion of a face image so that its mouth moves in accordance with words, the method comprising: storing, for each type of vowel, first shape data relating to the shape of the mouth when the vowel is uttered; classifying types of consonants that have common features in the shape of the mouth when pronounced into the same group, and storing, for each group, second shape data relating to the shape of the mouth when the consonants classified into that group are uttered; dividing the sound of the words into individual vowels and consonants; and controlling the motion of the face image, for each of the divided vowels and consonants, on the basis of the first shape data corresponding to the vowel or the second shape data corresponding to the group into which the consonant is classified.

2. A face image control method for controlling the motion of a face image so that its mouth moves in accordance with words, the method comprising: storing, for each type of vowel, first shape data relating to the shape of the mouth when the vowel is uttered; storing, for each of a first group characterized by a mouth shape during pronunciation in which the lips are brought together, a second group characterized by a mouth shape in which the mouth is slightly open, and a third group characterized by a mouth shape that retains the mouth shape of the immediately preceding phoneme, second shape data relating to the mouth shape of the respective characteristic; classifying each type of consonant into the group whose characteristic is closest to the shape of the mouth when the consonant is uttered; dividing the sound of the words into individual vowels and consonants; and controlling the motion of the face image, for each of the divided vowels and consonants, on the basis of the first shape data corresponding to the vowel or the second shape data corresponding to the group into which the consonant is classified.

3. The face image control method according to claim 1, wherein, when the sound of the words includes two predetermined consecutive vowels or consonants, the latter of the two consecutive vowels or consonants is replaced with a predetermined vowel or consonant, and the control is performed on the sound after the replacement.

4. The face image control method according to claim 1 or 3, wherein the control is performed such that the time for which the mouth shape is held for a consonant is shorter than the time for which the mouth shape is held for a vowel.

5. The face image control method according to claim 1, wherein, when the sound of the words includes two predetermined consecutive vowels or consonants, the control is performed by changing the time for which the predetermined mouth shape is held for one or both of the two consecutive vowels or consonants.

6. A face image control apparatus for controlling the motion of a face image so that its mouth moves in accordance with words, the apparatus comprising: shape data storage means for storing, for each type of vowel, first shape data relating to the shape of the mouth when the vowel is uttered, and for storing, for each group into which types of consonants having common features in the shape of the mouth when pronounced are classified, second shape data relating to the shape of the mouth when the consonants of that group are uttered; and motion control means for dividing the sound of the words into individual vowels and consonants and controlling the motion of the face image, for each of the divided vowels and consonants, on the basis of the first shape data corresponding to the vowel or the second shape data corresponding to the group into which the consonant is classified.

7. A face image control apparatus for controlling the motion of a face image so that its mouth moves in accordance with words, the apparatus comprising: shape data storage means for storing, for each type of vowel, first shape data relating to the shape of the mouth when the vowel is uttered, for storing, for each of a first group characterized by a mouth shape during pronunciation in which the lips are brought together, a second group characterized by a mouth shape in which the mouth is slightly open, and a third group characterized by a mouth shape that retains the mouth shape of the immediately preceding phoneme, second shape data relating to the mouth shape of the respective characteristic, and for storing each type of consonant classified into the group whose characteristic is closest to the shape of the mouth when the consonant is uttered; and motion control means for dividing the sound of the words into individual vowels and consonants and controlling the motion of the face image, for each of the divided vowels and consonants, on the basis of the first shape data corresponding to the vowel or the second shape data corresponding to the group into which the consonant is classified.

8. A computer program for use in a computer that has shape data storage means for storing, for each type of vowel, first shape data relating to the shape of the mouth when the vowel is uttered and for storing, for each group into which types of consonants having common features in the shape of the mouth when pronounced are classified, second shape data relating to the shape of the mouth when the consonants of that group are uttered, and that controls the motion of a face image so that its mouth moves in accordance with words, the computer program causing the computer to execute: a step of dividing the sound of the words into individual vowels and consonants; and a step of controlling the motion of the face image, for each of the divided vowels and consonants, on the basis of the first shape data corresponding to the vowel or the second shape data corresponding to the group into which the consonant is classified.

9. A computer-readable recording medium in which the computer program according to claim 8 is recorded.