JP4182656B2

JP4182656B2 - Terminal device, transmission method, and computer program

Info

Publication number: JP4182656B2
Application number: JP2001304909A
Authority: JP
Inventors: 修遠山
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2001-10-01
Filing date: 2001-10-01
Publication date: 2008-11-19
Anticipated expiration: 2021-10-01
Also published as: JP2003109036A

Description

【０００１】
【発明の属する技術分野】
本発明は、メッセージに合わせて動作するアニメーションのためのデータの送受信を行う通信システムおよびこれに用いられる端末装置に関する。
【０００２】
【従来の技術】
従来より、メッセージの送信者のアニメーションがメッセージに合わせて動作するように受信者側の端末装置の表示画面に表示する技術が提案されている。例えば、特開平８−３０７８４１号公報には、送信側からの音声信号に基づいて擬似動画（アニメーション）を生成し表示するＴＶ電話装置が開示されている。このＴＶ電話装置によると、送信側から画像データを受けることなく、アニメーションを表示することができる。
【０００３】
ところで、最近は、メッセージのやり取りを行うための多くの種類の端末装置が多く提案されまたは普及している。例えば、パーソナルコンピュータ、ＰＤＡ（Personal Digital Assistant）、携帯電話装置、またはＰＨＳ端末装置などがある。さらに、これらの装置は、各メーカまたは各通信会社から様々な機種が発売されている。
【０００４】
【発明が解決しようとする課題】
これらの装置を用いて、特開平８−３０７８４１号のＴＶ電話装置のように、送信側のメッセージに基づいて受信側の端末装置にアニメーションを表示することが考えられる。
【０００５】
しかし、これらの端末装置は互いに仕様が異なる場合が多い。例えば、メッセージの送信側および受信側の端末装置がともにアニメーションの実行が可能であっても、アニメーションを実行するためのデータ形式が互いに異なる場合がある。このような場合は、一方の端末装置から他方の端末装置へアニメーションのためのデータを送信しても、他方の端末装置においてアニメーションを実行することはできない。
【０００６】
特に、ＰＤＡまたは携帯電話などのモバイル機器は、各メーカおよび各通信会社が新機種および新規格の開発を行い、数多くの新旧の端末装置が混在しているので、データ形式を統一することは実際には難しい。
【０００７】
本発明は、このような問題点に鑑み、メッセージの送信先の端末装置の機種に関わらず、メッセージに合わせてアニメーションの実行が可能なデータを当該端末装置に送信する通信システムおよび端末装置を提供することを目的とする。
【０００８】
本発明の一形態に係る端末装置は、ユーザが入力したメッセージを他の端末装置に通信回線を介して送信する端末装置であって、前記他の端末装置がモデルの形状を変化させてアニメーションを生成するための動作制御データに対応しているか否かを判別する、動作制御データ対応判別手段と、前記動作制御データ対応判別手段によって前記他の端末装置が前記動作制御データに対応していると判別された場合は、前記モデルを用いたアニメーションのためのアニメーションデータとして、前記メッセージの音韻ごとに前記モデルの形状を変化させる動作制御データを生成して当該他の端末装置に送信し、対応していないと判別された場合は、当該アニメーションデータとして、前記メッセージに合わせて前記モデルを変化させたときの各タイミングでの画像データを送信する、アニメーションデータ送信手段と、を有してなる。
【０００９】
本発明の一形態に係る送信方法は、ユーザが入力したメッセージを他の端末装置に通信回線を介して送信する端末装置における送信方法であって、前記端末装置に、前記他の端末装置がモデルの形状を変化させてアニメーションを生成するための動作制御データに対応しているか否かを判別する処理を実行させ、前記他の端末装置が前記動作制御データに対応していると判別された場合は、前記モデルを用いたアニメーションのためのアニメーションデータとして、前記メッセージの音韻ごとに前記モデルの形状を変化させる動作制御データを生成して当該他の端末装置に送信する処理を実行させ、前記他の端末装置が前記動作制御データに対応していないと判別された場合は、前記アニメーションデータとして、前記メッセージに合わせて前記モデルを変化させたときの各タイミングでの画像データを送信する処理、を実行させる。
【００１０】
本発明の一形態に係るコンピュータプログラムは、ユーザが入力したメッセージを他の端末装置に通信回線を介して送信するコンピュータに用いられるコンピュータプログラムであって、前記他の端末装置がモデルの形状を変化させてアニメーションを生成するための動作制御データに対応しているか否かを判別する手段と、前記他の端末装置が前記動作制御データに対応していると判別された場合は、前記モデルを用いたアニメーションのためのアニメーションデータとして、前記メッセージの音韻ごとに前記モデルの形状を変化させる動作制御データを生成して当該他の端末装置に送信する手段と、前記他の端末装置が前記動作制御データに対応していないと判別された場合は、前記アニメーションデータとして、前記メッセージの出力中の各タイミングに合わせて前記モデルを変化させたときの各タイミングでの画像データを送信する手段として、前記コンピュータを機能させる。
【００１４】
【発明の実施の形態】
図１は本発明に係るアニメーション通信システム１の構成を説明する図、図２は端末装置２α、２βの構成を説明する図、図３は送信側の端末装置２αの記憶装置２２に記憶されるプログラムおよびデータを示す図、図４は受信側の端末装置２βの記憶装置２２に記憶されるプログラムおよびデータを示す図、図５は送信側の端末装置２αの機能的構成を説明する図、図６は受信側の端末装置２βの機能的構成を説明する図である。
【００１５】
本発明に係るアニメーション通信システム１は、図１に示すように、複数の端末装置２によって構成される。端末装置２として、携帯電話装置、パーソナルコンピュータ（パソコン）、またはＰＤＡ（Personal Digital Assistant）など、通信機能を有する種々の端末装置が用いられる。
【００１６】
これらの端末装置２は、通信回線４を介して互いに接続し、データの送受信を行うことが可能である。通信回線４として、アナログ回線またはＩＳＤＮなどの公衆回線、携帯電話回線、専用線、またはインターネットなどが用いられる。
【００１７】
ユーザは、自分の端末装置２から他のユーザの端末装置２にメッセージを送信することができる。以下、メッセージの送信側の端末装置と受信側の端末装置とを区別するために、送信側、受信側の端末装置２をそれぞれ端末装置２α、２βと記載する。
【００１８】
図２（ａ）は端末装置２α、２βがパソコンである場合の構成を示し、図２（ｂ）は端末装置２α、２βが携帯電話装置である場合の構成を示す。図２（ａ）（ｂ）に示すように、端末装置２α、２βは、処理装置２０、表示装置２１、記憶装置２２、文字入力装置２３、音声入力装置２４、および音声出力装置２５などによって構成される。
【００１９】
処理装置２０は、ＣＰＵ２０ａ、ＲＡＭ２０ｂ、ＲＯＭ２０ｃ、各種の入出力ポート２０ｄ、および各種のコントローラ２０ｅなどによって構成される。
端末装置２α、２βがパソコンである場合は、記憶装置２２として、磁気記憶装置などが用いられる。携帯電話装置である場合は、ＥＥＰＲＯＭなどの書き換え可能な記憶素子が用いられる。
【００２０】
端末装置２αの記憶装置２２には、図３に示すように、オペレーティングシステム（ＯＳ）２２ａ、動作制御データ生成プログラム２２ｂ、モデリングプログラム２２ｃ、および後に説明する種々の処理のためのプログラムおよびデータなどが記憶されている。
【００２１】
端末装置２βの記憶装置２２には、図４に示すように、オペレーティングシステム（ＯＳ）２２ｄ、アニメーション実行プログラム２２ｅ、および後に説明する種々の処理のためのプログラムおよびデータなどが記憶されている。ただし、標準モデル情報７１または顔形状情報７２は、記憶されていない場合がある。端末装置２βには、機種Ｘ、Ｙ、およびＺの３種類の機種があり、図４（ａ）〜（ｃ）に示すように、機種によって記憶装置２２に記憶されるプログラムなどの内容が異なる。機種Ｘ〜Ｚの相違については、後に説明する。
【００２２】
記憶装置２２に記憶されているプログラムおよびデータは、必要に応じてＲＡＭ２０ｂにロードされる。ロードされたプログラムは、ＣＰＵ２０ａによって実行される。通信回線４を介して端末装置２α、２βを他のコンピュータに接続し、プログラムまたはデータをダウンロードしてもよい。または、フロッピディスク２９ａ、ＣＤ−ＲＯＭ２９ｂ、または光磁気ディスク（ＭＯ）２９ｃなどの各種リムーバブルディスク（記録媒体）からプログラムまたはデータをロードしてもよい。
【００２３】
端末装置２βの表示装置２１には、処理装置２０による処理結果が表示される。例えば、端末装置２αから受信したメッセージに合わせて口が動くようにメッセージの送信者（端末装置２αのユーザ）の顔画像ＨＦが表示装置２１の表示画面ＨＧにアニメーションとして表示される。音声出力装置２５は、アニメーションに合わせてメッセージを音声として出力する。これにより、送信者の顔画像ＨＦがメッセージを読み上げているかのように端末装置２βのユーザに認識させることができる。
【００２４】
顔画像ＨＦは、送信者の頭部の３次元形状を示す３次元形状モデル（顔モデル）を所定の方向から２次元上に投影することによって得られる。つまり、顔画像ＨＦが動作するアニメーションを生成するには、顔モデルの形状を変化させながら所定の方向から２次元上に投影すればよい。顔モデルに関するデータおよび顔モデルの制御については、後に説明する。
【００２５】
このような構成によって、端末装置２αには、図５に示すように、データ生成部２０１、顔モデルデータ生成部２０２、音声テキスト変換部２０３、形状指定部２０４、通信状況判別部２０５、データ送信部２０６、およびデータ記憶部２０７などが設けられる。端末装置２βには、図６に示すような機能が設けられる。ただし、図６（ａ）〜（ｃ）に示すように、機種ごとに機能的構成がそれぞれ異なる。
〔顔画像の生成のためのデータ〕
図５において、端末装置２αのデータ記憶部２０７は、標準モデル情報７１、顔形状情報７２、および符号形状情報７３などを記憶する。
【００２６】
図７は標準モデルＤＳまたは顔モデルＤＳ' の構成の例を示す図、図８はエッジＥとノードＮとの対応関係を示す図、図９はノードＮの影響を受ける構成頂点Ｖを示す図、図１０はノードＮが影響を与える範囲を説明する図である。
【００２７】
標準モデル情報７１は、図７（ａ）に示す標準モデルＤＳの構成頂点（Model Vertex）Ｖ、ポリゴンＰｇ、ノード（Node）Ｎ、およびエッジ（Edge）Ｅなどに関する情報である。標準モデルＤＳは、標準的な顔のサイズおよび形状を有した、頭部の全周を構造化した３次元モデルである。図７（ａ）において、複数の細い直線同士の交点は、構成頂点Ｖを示す。各構成頂点Ｖの位置は、ｘ、ｙ、ｚの３次元座標（位置データ）によって決まる。各ポリゴンＰｇは、同一平面上にある複数の構成頂点Ｖの集合すなわち位相（topology）によって定義される。これら位置データおよび位相データによってジオメトリデータ（Geometry Data ）が構成される。太い直線は、筋肉を意味するエッジ（Edge）Ｅを示す。黒い丸印は筋肉の端点を意味するノード（Node）Ｎを示す。
【００２８】
ノードＮの位置は、次に示す式（１）のように構成頂点Ｖの相対的位置として表される。
【００２９】
【数１】

【００３０】
エッジＥ（Ｅ１、Ｅ２、…）の位置は、図８のエッジＥの第一のパラメータに示すように、異なる２つのノードＮによって決められる。ノードＮ（Ｎ１、Ｎ２、…）は、顔全体の各筋肉の端点となる位置に配置されている。なお、図７（ｂ）は、ノードＮとエッジＥとの関係を分かりやすくするために図７（ａ）から構成頂点Ｖを省略して示している。図７（ａ）（ｂ）は、顔の右半分のノードＮおよびエッジＥを省略して示しているが、実際には、左半分と同様にノードＮおよびエッジＥが存在する。
【００３１】
エッジＥの第二のパラメータは、そのエッジＥ（筋肉）を変位させた場合に、どちらの端点（ノードＮ）をどれだけの割合（ウェイト）で移動させるかを示す。例えば、エッジＥ３の第二のパラメータ「０．７，０．３」は、エッジＥ３が変位したときに、ノードＮ４とノードＮ３とを７対３の割合でそれぞれ移動させるということを示している。エッジＥの変位量は、筋肉の収縮の度合によって表される。筋肉が収縮していない状態を「０」、最も収縮した状態を「２０」とする。例えば、変位量（収縮の度合）が「１５．０」であれば、その筋肉（エッジＥ）が７５％収縮することを示す。
【００３２】
エッジＥが変位するとき、ノードＮが移動する位置は、次に示す式（２）によって求められる。
【００３３】
【数２】

【００３４】
ただし、実際には複数のエッジＥに関係するノードＮが存在するため、収束演算または連立演算によってノードＮの移動後の位置が求められる。
各ノードＮが移動したときに影響を受ける構成頂点Ｖは、図９の第二のパラメータのように示される。つまり、この第二のパラメータは、ノードＮが移動したときの影響の範囲を示している。ノードＮの移動による影響を受ける構成頂点Ｖは、そのノードＮの周辺に集中している。例えば、図１０において、大きい黒丸が示すノードＮの移動による影響を受ける構成頂点Ｖは、小さい黒丸が示す９つの構成頂点Ｖである。
【００３５】
図９の第一のパラメータは、ノードＮが移動したときに構成頂点Ｖに対して与える影響の度合（Intensity ）を示している。この値が大きいと、ノードＮの移動に伴う構成頂点Ｖの移動量（変位量）が大きくなる。
【００３６】
ノードＮが移動するのに伴って構成頂点Ｖが移動する位置は、次に示す式（３）によって求められる。
【００３７】
【数３】

【００３８】
このように、標準モデル情報７１は、図８、図９、および式（１）〜式（３）に示すように、標準モデルＤＳの構成頂点Ｖ、ノードＮ、およびエッジＥの位置および関係を表している。標準モデル情報７１を制御することにより、標準モデルＤＳのエッジＥ（筋肉）を動かして標準モデルＤＳを任意の形状に変化させることができる。例えば、標準モデルＤＳの右目を閉じる（ウィンクさせる）には、右目の周辺の所定のエッジＥ（筋肉）をそれぞれ所定の値だけ変位（収縮）させればよい。すると、式（２）に従って各エッジＥに関連する各ノードＮの位置が移動し、式（３）に従って各ノードＮの影響を受ける各構成頂点Ｖの位置が移動し、これにより標準モデルＤＳはウィンクした形状となる。
【００３９】
顔形状情報７２は、端末装置２αのユーザすなわちメッセージの送信者の顔の３次元形状モデル（顔モデル）の構成頂点に関する情報である。顔モデルは、標準モデルＤＳをユーザの３次元計測データにフィッティングすることによって生成される。顔モデルの生成は、後に説明する顔モデルデータ生成部２０２によって行われる。
【００４０】
つまり、顔モデルの各構成頂点は、フィッティングの処理によって移動した標準モデルＤＳの各構成頂点Ｖに対応する。標準モデル情報７１の各構成頂点Ｖを顔モデルの各構成頂点に置き換えると、式（１）および図８の関係に従って顔モデルのノードＮおよびエッジＥの位置が求められる。したがって、標準モデルＤＳの場合と同様に、エッジＥを変位させることによって顔モデルの形状を変化させることができる。以下、標準モデルＤＳを端末装置２αのユーザの３次元計測データにフィッティングして得られた顔モデルを「顔モデルＤＳ' 」と記載する。図１１は符号形状情報７３の例を示す図、図１２は標準モデルＤＳまたは顔モデルＤＳ' を各形状グループの形状に変化させた場合の例を示す図である。
【００４１】
ところで、顔モデルＤＳ' または標準モデルＤＳの形状を連続して変化させる場合は、各エッジＥに対して与える値の数が増えるので、全体のデータ量が増える。例えば、ある言葉に合わせて顔モデルＤＳ' の形状を変化させる場合は、その言葉に含まれる音の数だけその音を発しているかのように口を開閉するように各エッジＥの変位量を設定しなければならない。
【００４２】
しかし、一般に、互いに異なる音韻であっても発音するときの口の形状の特徴が同一でありまたは類似するものがある。例えば、子音「ｍ」および子音「ｎ」は、ともに唇を合わせて発音されるという点で類似する。
【００４３】
符号形状情報７３は、このように、発音するときの口の形状の特徴が同一または類似の音韻をグループ化し、グループごとに各筋肉（エッジＥ１、Ｅ２、…）の変位量を定めている。本実施形態では、図１１（ａ）に示すように、５つの母音のグループ（形状グループＡ、Ｅ、Ｉ、Ｏ、Ｕ）および３つの子音のグループ（形状グループ１〜３）が設けられている。
【００４４】
形状グループＡ、Ｅ、Ｉ、Ｏ、Ｕには、それぞれ「ａ」、「ｅ」、「ｉ」、「ｏ」、「ｕ」の１種類ずつの母音が属する。形状グループ１は唇を合わせて発音する子音のグループ、形状グループ２は唇を合わせずに口を所定の形状にして発音する子音のグループ、形状グループ３は前に発した音の口の形状のまま発音する子音のグループである。係る分類によると、通常、形状グループ１には「ｂ、ｆ、ｍ、ｐ、ｖ」の５種類の子音が属し、形状グループ２には「ｄ、ｇ、ｊ、ｋ、ｌ、ｎ、ｒ、ｓ、ｔ、ｗ、ｚ」の１１種類の子音が属し、形状グループ３には「ｈ、ｙ」の２種類の子音が属する。
【００４５】
すなわち、符号形状情報７３は、各形状グループＡ、Ｅ、Ｉ、Ｏ、Ｕ、１、２に属する音を発するときに、顔モデルＤＳ' がそれぞれ図１２（ａ）〜（ｇ）に示す形状になるように顔モデルＤＳ' の各筋肉（エッジＥ１、Ｅ２、…）を変位させるためのデータである。ただし、形状グループ３の場合は、顔モデルＤＳ' は前に発した音（音韻）の形状のまま保たれるので、変位量の値を持たない。
【００４６】
さらに、符号形状情報７３は、図１１（ｂ）に示すように、「ウィンク」（形状グループ１１）、「驚き」（形状グループ１２）、および「喜び」（形状グループ１３）などの表情をしたときの顔モデルＤＳ' の形状についての各エッジＥの変位量を有している。図１１（ｂ）に示す各値によると、図１２（ｈ）〜（ｊ）に示す形状を得ることができる。
【００４７】
符号形状情報７３を用いると、形状グループ名を順次指定するだけで顔モデルＤＳ' の形状を連続して変化させることができる。例えば、「かん（ｋａｎ）」という言葉に合わせて顔モデルＤＳ' の形状を変化させる場合は、形状グループ２、Ａ、１と指定すればよい。
〔顔モデルの生成（標準モデルのフィッティング）〕
図５に戻って、顔モデルデータ生成部２０２は、メッセージに合わせて動作する顔画像ＨＦすなわちアニメーションの基となる顔モデルＤＳ' を生成する。次に、顔モデルすなわち３次元形状モデルを生成する方法について、フローチャートを参照して説明する。
【００４８】
図１３は３次元形状モデルの生成の処理の流れを説明するフローチャート、図１４は標準モデルＤＳの例を示す図、図１５は変形処理の流れを説明するフローチャート、図１６は標準モデルＤＳの面Ｓと３次元計測データの点Ｐとを模式的に示す図、図１７は標準モデルＤＳの異常変形を防ぐための仮想バネを説明するための図である。
【００４９】
図１３において、まず、図１４に示す標準モデルＤＳと人物（例えば端末装置２αのユーザ）の３次元計測データとの概略の位置合わせを行う（＃１０１）。標準モデルＤＳは、標準的な顔のサイズおよび形状を有した、頭部の全周を構造化した３次元データである。３次元計測データは、点群からなるユーザの顔の３次元データである。すなわち、ステップ＃１０１では、標準モデルＤＳと３次元計測データとの距離が最小となるように、標準モデルＤＳの向き、大きさ、および位置を変更する。一般に、標準モデルＤＳおよび３次元計測データとして、無表情の状態のものが用いられる。なお、３次元計測データは、３次元計測装置でユーザを撮影するなどして予め用意されている。
【００５０】
輪郭および特徴点を抽出する（＃１０２）。標準モデルＤＳについての輪郭ＲＫおよび特徴点ＴＴと同じ位置に配置されるべき輪郭および特徴点を、３次元計測データ上に、またはそれに対応する２次元画像上に配置する。
【００５１】
特徴点として、例えば、目や口の端部、鼻の頂部、顎の下端部のように実際に特徴のある部分、または、それらの中間のようなそれ自体では特徴はないが位置的に特定し易い部分などが選ばれる。輪郭として、顎のライン、唇のライン、または瞼のラインなどが選ばれる。
【００５２】
計算量および誤差を削減するために、３次元計測データについてデータの削減を行う（＃１０３）。
標準モデルＤＳの変形を行う（＃１０４）。すなわち、３次元計測データの各点と標準モデルＤＳの面との間の距離に関連して定義されたエネルギー関数、または過剰な変形を回避するために定義されたエネルギー関数などを用い、それらが最小となるように標準モデルＤＳの面を変形させる。
【００５３】
そして、対象とするエネルギー関数および制御点を変更し、ステップ＃１０４と同様な変更のための処理を繰り返す（＃１０５）。
次に、ステップ＃１０４の変形処理について説明する。
【００５４】
図１６において、３次元計測データを構成する点群の１つが点Ｐｋで示されている。標準モデルＤＳの面Ｓにおいて、点Ｐｋに最も近い点がＱｋで示されている。点Ｑｋは、点Ｐｋから面Ｓに垂線を下ろしたときの交点である。
【００５５】
点群に面Ｓをフィッティングする方法は次の通りである。ここでは、一般的なフィッティングについて説明する。
点群の中の１つの点Ｐｋ、それに対応する点Ｑｋ、および対応点群Ｔ＝｛（Ｐｋ，Ｑｋ），ｋ＝１…ｎ｝について、フィッティングエネルギー（Fitting Energy) 関数Ｆｆ（Ｕ）を、次の式（４）のように設定する。
【００５６】
【数４】

【００５７】
ただし、Ｑｋ（Ｕ）は、ＱｋがＵの関数であることを示す。
また、面Ｓの過度の変形を防ぐために、図１７に示す仮想バネ(elastic bar) ＫＢを導入する。仮想バネＫＢの制約に基づいて、面Ｓの形状安定化のための安定化エネルギー関数を導く。
【００５８】
すなわち、図１７において、フィッティング対象である標準モデルＤＳの面（曲面）Ｓの一部が示されている。面Ｓは、制御点群Ｕ＝｜ｕｉ，ｉ＝１…ｎ｜で形成されている。隣接する制御点間には、仮想バネＫＢが配置されている。仮想バネＫＢは、制御点間に引っ張り力による拘束を与え、面Ｓの異常変形を防ぐ働きをする。
【００５９】
つまり、隣接する制御点ｕの間隔が大きくなった場合に、それに応じて仮想バネＫＢによる引っ張り力が大きくなる。例えば、点Ｑｋが点Ｐｋに近づく場合に、その移動にともなって制御点ｕの間隔が大きくなると、仮想バネＫＢによる引っ張り力が増大する。点Ｑｋが移動しても制御点ｕの間隔が変わらなければ、つまり制御点ｕ間の相対位置関係に変化がなければ、仮想バネＫＢによる引っ張り力は変化しない。仮想バネＫＢによる引っ張り力を面Ｓの全体について平均化したものを、安定化エネルギーとして定義する。したがって、面Ｓの一部が突出して変形した場合に安定化エネルギーは増大する。面Ｓの全体が平均して移動すれば安定化エネルギーは零である。
【００６０】
安定化エネルギー関数Ｆｓ（Ｕ）は、次の式（５）で示される。
【００６１】
【数５】

【００６２】
ここで、
【００６３】
【数６】

【００６４】
は、それぞれ、仮想バネＫＢの初期端点、変形後の仮想バネＫＢの端点である。ｃはバネ係数であり、Ｍは仮想バネＫＢの本数である。また、次の関係が成り立つ。
【００６５】
【数７】

【００６６】
したがって、バネ係数ｃを大きくすると、仮想バネＫＢは硬くなって変形し難くなる。
このような安定化エネルギー関数Ｆｓ（Ｕ）を導入することにより、面Ｓの形状変化に一定の拘束を設けることとなり、面Ｓの過度の変形を防ぐことができる。
【００６７】
上に述べたフィッティングエネルギー関数Ｆｆ（Ｕ）、および安定化エネルギー関数Ｆｓ（Ｕ）を用い、フィッティングの評価関数Ｆ（Ｕ）を次の式（６）のように定義する。
【００６８】
Ｆ（Ｕ）＝ＷｆＦｆ（Ｕ）＋ＷｓＦｓ（Ｕ） ……（６）
ここで、Ｗｆ，Ｗｓは、それぞれ正規化のための重み係数である。
式（６）の評価関数Ｆ（Ｕ）が十分小さくなるように、面Ｓの変形および対応点の探索を繰り返し、面のフィッティングを行う。例えば、Ｆ（Ｕ）のＵに関する微分が０に近づく方向にフィッティングを行う。
【００６９】
図１５において、変形処理では、まず、点Ｐｋに対応する点Ｑｋを計算で求め、点Ｐｋと点Ｑｋの組みを作成する（＃１１１）。
面Ｓを変形し（＃１１２）、変形後の評価関数Ｆ（Ｕ）を計算する（＃１１３）。評価関数Ｆ（Ｕ）が収束するまで（＃１１４でＹｅｓ）、処理を繰り返す。
【００７０】
評価関数Ｆ（Ｕ）の収束を判定する方法として、評価関数Ｆ（Ｕ）が所定の値よりも小さくなったときを収束とする方法、前回の計算と比べた変化の割合が所定値以下となったときに収束とする方法など、公知の方法を用いることが可能である。
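The deformation loop of FIG. 15 (steps #112 to #114), the evaluation function of equation (6), and the two convergence criteria named in paragraph 0070 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the names `evaluation`, `has_converged`, and `fit`, the tolerance values, and the iteration cap are hypothetical. The energy terms Ff and Fs are passed in as plain callables because equations (4) and (5) are not reproduced in this text, and the corresponding-point search of step #111 is assumed to happen inside `deform`.

```python
def evaluation(ff, fs, wf=1.0, ws=1.0):
    """Equation (6): F(U) = Wf * Ff(U) + Ws * Fs(U)."""
    return wf * ff + ws * fs


def has_converged(current, previous, abs_tol, rel_tol):
    """Apply the two criteria named in the text (threshold values are assumed)."""
    if current < abs_tol:  # F(U) fell below a predetermined value
        return True
    if previous is not None and previous != 0:
        # rate of change compared with the previous calculation
        return abs(previous - current) / abs(previous) <= rel_tol
    return False


def fit(state, deform, fitting_energy, stabilizing_energy,
        abs_tol=1e-6, rel_tol=1e-4, max_iter=1000):
    """Repeat deformation (#112) and evaluation (#113) until convergence (#114)."""
    previous = None
    for _ in range(max_iter):
        state = deform(state)
        current = evaluation(fitting_energy(state), stabilizing_energy(state))
        if has_converged(current, previous, abs_tol, rel_tol):
            break
        previous = current
    return state
```

In a real fitting run, `state` would hold the control points U of the surface S. The toy callables used to exercise the loop only need to shrink F(U) on each pass.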
【００７１】
このような処理によって標準モデルＤＳを変形し、ユーザの顔の形状をした３次元形状モデル（顔モデルＤＳ' ）を生成することができる。なお、ユーザの２次元画像に標準モデルをフィッティングして顔モデルを取得してもよい。または、種々のコンピュータグラフィック（ＣＧ）プログラムを用いて顔モデルを作成してもよい。
〔顔モデルの動作の制御のためのデータ〕
図１８はエッジ変位データ７６および符号データ７５の例を示す図、図１９は端末装置２βの機種の相違を説明する図である。
【００７２】
前に述べたように、顔モデルＤＳ' の形状を変化させるには、顔モデルＤＳ' の各エッジＥに対して変位量を直接与える方法と形状グループ名を指定する方法とがある。
【００７３】
例えば、「こうみんかん」の言葉に合わせて口を動かした後にウィンクをするように顔モデルＤＳ' の形状を連続して変化させる場合において、前者の方法であれば、図１８（ａ）のエッジ変位データ７６のように、顔モデルＤＳ' の各エッジＥの変位量を時間ごとに指定して直接的に顔モデルＤＳ' の動作を指定する。よって、エッジ変位データ７６を「動きデータ」と呼ぶことができる。
【００７４】
後者の方法であれば、図１８（ｂ）の符号データ７５のように、「１」、「Ａ」、「１１」などの符号を用いて形状グループを指定して間接的に顔モデルＤＳ' の動作（各エッジＥの変位量）を指定する。顔モデルＤＳ' の形状を変化させる際に、この符号データ７５は、図１１に示す符号形状情報７３に基づいてエッジ変位データ７６に変換される。つまり、符号データ７５は符号形状情報（動きデータ）７３を得るためのパラメータを意味するので、符号データ７５を「動きパラメータ」と呼ぶことができる。
【００７５】
図５に戻って、通信状況判別部２０５は、通信に関する状況を判別し、端末装置２βに送信するためのデータの種類を決める。具体的には、メッセージの送信相手である端末装置２βがどのような機種または機能であるか、端末装置２βにどのようなデータが記憶されているか、または通信回線４がどれくらいの通信速度であるかのいずれかを判別し、標準モデル情報７１、顔形状情報７２、符号データ（動きパラメータ）７５、およびエッジ変位データ（動きデータ）７６などのうち、いずれのデータを送信するかを選択する。通信に関する状況の判別は、実際に端末装置２βとの間で通信を開始した後に行われる。なお、ユーザが端末装置２αを操作して、通信に関する状況を入力してもよいし、端末装置２βに送信するデータを選択してもよい。
【００７６】
ここで、端末装置２βの機種Ｘ〜Ｚについて図１９を参照して説明する。機種Ｘは、符号データ７５に基づいて顔モデルＤＳ' の形状を変化させ、アニメーションを生成し表示することができる。例えば、「２、Ａ、１」という符号データ７５が与えられると、符号形状情報７３に基づいて符号データ７５をエッジ変位データ７６に変換し、順次、顔モデルＤＳ' の形状を図１２（ｇ）（ａ）（ｆ）のように変化させることができる。したがって、次に説明する機種Ｙのように、エッジ変位データ７６を直接与えられた場合であっても、アニメーションを生成することができる。
【００７７】
機種Ｙは、符号データ７５に対応しておらず、符号データ７５に基づいて顔モデルＤＳ' を制御することができない。したがって、顔モデルＤＳ' の形状を変化させるには、直接、エッジ変位データ７６が与えられなければならない。
【００７８】
機種Ｚは、符号データ７５およびエッジ変位データ７６のいずれにも対応しておらず、顔モデルＤＳ' の形状を変化させてアニメーションを生成することができない。ただし、入力された画像データに基づいてアニメーションを表示することが可能である。
【００７９】
さらに、機種Ｘ、Ｙは、標準モデルＤＳの標準モデル情報７１を有する場合と有しない場合とがある。このように、機種および標準モデル情報７１の有無によって、端末装置２βに対して送信されるアニメーションの生成のために必要なデータの量の多さは図１９に示すような順番になる。
【００８０】
図５のデータ生成部２０１は、符号生成部２１１、変位データ生成部２１２、および画像データ生成部２１３などからなり、通信状況判別部２０５による判別結果に従って端末装置２βに送信するためのデータを生成する。
【００８１】
端末装置２βが機種Ｘであると判別された場合は、原則として、符号データ７５を生成する。すなわち、符号生成部２１１によって、キーボードなどの文字入力装置２３から入力された端末装置２αのユーザのメッセージであるテキストデータ７４を音韻ごとに区切り、区切られた各音韻がいずれの形状グループに属するかを符号形状情報７３に基づいて求める。例えば、テキストデータ７４が「こうみんかん（ｋｏｕｍｉｎｋａｎ）」である場合は、「ｋ、ｏ、ｕ、ｍ、ｉ、ｎ、ｋ、ａ、ｎ」の９つの音韻に区切られ、「２、Ｏ、Ｏ、１、Ｉ、１、２、Ａ、１」という符号データ７５が得られる。
【００８２】
ただし、３番目の音韻「ｕ」は、形状グループＯに置き換えられる。なぜなら、一般に、「こう（ｋｏｕ）」のように「ｕ」の直前に「ｏ」がある場合は、「ｕ」は「ｏ」と発音されるからである。このように、直前の音（音韻）の影響を受けて記述（スペル）通りに発音しない場合は、適宜、その音韻の属する形状グループを変更する。
【００８３】
顔の形状を「ウィンクをする」または「驚き」などの動作または表情に変化させたい場合は、図１１（ｂ）に示すように予め定義されている文字列を用いればよい。例えば、「こうみんかん」という言葉を発した後にウィンクをさせたい場合は、「こうみんかん（＾＿−）」のようにウィンクを示す顔文字を用いてテキストデータ７４を入力すればよい。この場合の符号データ７５は、図１８（ｂ）に示すように、「２、Ｏ、Ｏ、１、Ｉ、１、２、Ａ、１、１１」となる。
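The encoding described in paragraphs 0081 to 0083 (splitting the message into phonemes, mapping each phoneme to a shape group, replacing "u" after "o" with group O, and appending group 11 for a wink emoticon) might be sketched as below. The function name `encode`, the romanized input, and the ASCII spelling `(^_-)` of the emoticon are assumptions made for illustration. One further assumption is needed to reproduce the document's own examples: paragraph 0044 lists "n" under shape group 2, yet the examples code the "n" of "kouminkan" as 1, so this sketch treats an "n" that is not followed by a vowel as group 1.

```python
VOWELS = {"a": "A", "e": "E", "i": "I", "o": "O", "u": "U"}
GROUP_1 = set("bfmpv")          # lips closed
GROUP_2 = set("dgjklnrstwz")    # lips open, mouth in a set shape
GROUP_3 = set("hy")             # keeps the previous mouth shape
EMOTICONS = {"(^_-)": "11"}     # wink; ASCII spelling is an assumption


def encode(romaji):
    """Convert romanized text into a list of shape group codes."""
    codes = []
    i = 0
    while i < len(romaji):
        emoticon = next((e for e in EMOTICONS if romaji.startswith(e, i)), None)
        if emoticon is not None:
            codes.append(EMOTICONS[emoticon])
            i += len(emoticon)
            continue
        ch = romaji[i]
        prev = romaji[i - 1] if i > 0 else ""
        nxt = romaji[i + 1] if i + 1 < len(romaji) else ""
        if ch in VOWELS:
            # "u" directly after "o" is pronounced as "o" (paragraph 0082)
            codes.append("O" if ch == "u" and prev == "o" else VOWELS[ch])
        elif ch == "n" and nxt not in VOWELS:
            codes.append("1")   # assumption: syllabic "n" closes the lips
        elif ch in GROUP_1:
            codes.append("1")
        elif ch in GROUP_2:
            codes.append("2")
        elif ch in GROUP_3:
            codes.append("3")
        # any other character (spaces, punctuation) is skipped
        i += 1
    return codes
```

With these assumptions, `encode("kouminkan(^_-)")` yields the sequence given in the text for FIG. 18(b).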
【００８４】
符号データ７５の基となるテキストデータ７４を音声データから取得してもよい。すなわち、マイクなどの音声入力装置２４から入力された音声データを音声テキスト変換部２０３によってテキストデータ７４に変換する。
【００８５】
符号生成部２１１によって生成された符号データ７５は、顔モデルＤＳ' を制御するための動作制御データとしてテキストデータ７４とともにデータ送信部２０６によって相手先の端末装置２βに送信される。符号データ７５をエッジ変位データ７６に変換するための符号形状情報７３が端末装置２βにない場合は、符号形状情報７３も送信される。
【００８６】
また、端末装置２βが標準モデル情報７１を有しないと判別された場合は、その端末装置２βに標準モデル情報７１および端末装置２αのユーザの顔形状情報７２をモデル情報７として送信する。標準モデル情報７１を有するが顔形状情報７２を有しないと判別された場合は、顔形状情報７２をモデル情報７として送信する。次に説明する機種Ｙの場合も同様である。
【００８７】
端末装置２βが機種Ｙであると判別された場合は、変位データ生成部２１２によって符号生成部２１１で生成された符号データ７５が示す各形状グループに対応する顔モデルＤＳ' の各エッジＥの変位量を求め、図１８（ａ）に示すようなエッジ変位データ７６を生成する。生成されたエッジ変位データ７６は、顔モデルＤＳ' を制御するための動作制御データとしてテキストデータ７４とともに端末装置２βに送信される。ただし、通信状況判別部２０５によって通信回線４の通信速度が遅くデータ通信に時間を要すると判別された場合は、エッジ変位データ７６を間引いてもよい。例えば、エッジ変位データ７６のうち子音を発音する形状に該当するデータを間引いてもよい。
【００８８】
端末装置２βが機種Ｚであると判別された場合は、画像データ生成部２１３によって変位データ生成部２１２で生成されたエッジ変位データ７６に基づいて顔モデルＤＳ' の形状を変化させ、端末装置２βにおいてメッセージの出力に合わせてアニメーションを表示するための画像データ７７を生成する。画像データ７７の生成については、後に説明する端末装置２βの機能と重複するので、ここでは説明を省略する。画像データ７７は、テキストデータ７４とともに端末装置２βに送信される。
【００８９】
なお、端末装置２βが機種Ｘであっても、端末装置２αのユーザは、端末装置２βに送信する動作制御データとして符号データ７５の代わりにエッジ変位データ７６を選ぶことができる。
【００９０】
符号データ７５はエッジ変位データ７６よりもデータ量が少ないので、通信時間などの点に鑑みると、符号データ７５を端末装置２βに送信するほうが望ましい。しかし、符号データ７５を用いた場合は、図１１の符号形状情報７３に定められた形状グループの形状以外には顔モデルＤＳ' を変形させることができない。そこで、符号形状情報７３に定められた形状に限られず、より細かな動きを顔モデルＤＳ' に与えたい場合は、形状指定部２０４にて顔モデルＤＳ' の各エッジＥの変位量を自由に設定しエッジ変位データ７６を作成すればよい。ただし、通信回線４の通信速度が遅いと判別された場合は、ユーザの選択に関わらず、符号データ７５が端末装置２βに送信されるようにしてもよい。
〔端末装置２βの機能〕
次に、端末装置２βの機能的構成について説明する。図６（ａ）に示すように、端末装置２βの機種Ｘには、データ受信部２３１、変位データ生成部２３２、画像生成部２３３、データ記憶部２３４、および音声合成部２３５などが設けられる。
【００９１】
データ受信部２３１は、端末装置２αからテキストデータ７４および符号データ７５などのデータを受信する。標準モデル情報７１、顔形状情報７２、または符号形状情報７３を受信した場合は、これらの情報はデータ記憶部２３４に記憶され保存される。
【００９２】
変位データ生成部２３２は、端末装置２αの変位データ生成部２１２と同様に、受信した符号データ７５に基づいてエッジ変位データ７６を生成する。
画像生成部２３３は、標準モデル情報７１および顔形状情報７２に基づいて標準モデルＤＳの構成頂点Ｖを置き換えて顔モデルＤＳ' を取得し、エッジ変位データ７６に基づいて顔モデルＤＳ' の形状を変化させて顔画像ＨＦのアニメーションを生成する。
【００９３】
顔モデルＤＳ' の形状は、次のように変化させる。図２０はタイムテーブルの例を示す図である。
まず、エッジ変位データ７６に示される時刻ごとの形状（形状グループ）をタイムテーブルに配置する。例えば、顔モデルＤＳ' に「こうみんかん」と発音させる場合は、各形状グループ（２、Ｏ、…、１）を図２０（ａ）に示す台形のように配置する。
【００９４】
図２０（ｂ）に示すように、形状グループを示す台形の上辺の長さは、その形状グループの形状を保っている継続時間を意味する。継続時間は、母音よりも子音のほうが短く、母音が０．４秒、子音が０．１秒程度である。母音が連続する場合は、後の母音の継続時間を通常よりも短めに設定してもよい。
【００９５】
立ち上がり時間Ｔ１は、ある形状（例えば無表情の形状）からその形状に変化するまでの時間を意味する。終息時間Ｔ３は、その形状が無表情の形状に戻るまでの時間を意味する。立ち上がり時間Ｔ１および終息時間Ｔ３は、ともに極めて短い時間であり、０．１秒以下である。
【００９６】
図２０（ａ）に戻って、隣り合う２つの台形は、前の台形の形状が終息したとき（ｔ＝ｔｂ）に後の台形の形状の立ち上がりが完了するように配置される。つまり、後の音韻は、ｔｂよりも終息時間Ｔ３だけ前に立ち上がりはじめるように配置される（ｔ＝ｔａ）。
【００９７】
このように、各形状グループを配置したタイムテーブルに従って、顔モデルＤＳ' の形状を変化させる。ただし、２つの形状グループの間（ｔａ〜ｔｂ）の顔モデルＤＳ' の形状すなわち各構成頂点Ｖの位置は、次の式（７）に基づいて直線近似して補間する。
【００９８】
【数８】

【００９９】
そして、顔モデルＤＳ' の形状を変化させながら所定の方向から顔モデルＤＳ' を２次元上に投影して画像ＨＦのアニメーションを生成する。
図６（ａ）の音声合成部２３５は、テキストデータ７４に示される端末装置２αのユーザのメッセージを音声化し、アニメーションと同期して出力する。例えば、顔モデルＤＳ' が所定の形状に変化をはじめる（立ち上がる）ときに画像生成部２３３から発せられる信号（トリガー）に合わせて順次音声を出力する。テキストデータを音声化する方法として、公知の音声合成技術が用いられる。
【０１００】
端末装置２βの機種Ｙには、図６（ｂ）に示すように、データ受信部２４１、画像生成部２４２、データ記憶部２４３、および音声合成部２４４などが設けられる。
【０１０１】
データ受信部２４１は、端末装置２αからテキストデータ７４およびエッジ変位データ７６などのデータを受信する。
図６（ｂ）を図６（ａ）と比較すると、端末装置２βの機種Ｙには機種Ｘの変位データ生成部２３２に相当するものがないことが分かる。つまり、機種Ｙの画像生成部２４２は、機種Ｘのように自ら生成したエッジ変位データ７６に基づいて画像ＨＦのアニメーションを生成するのではなく、端末装置２αから受信したエッジ変位データ７６に基づいて画像ＨＦを生成する。その他の機能は、機種Ｘの場合と同様である。
【０１０２】
端末装置２βの機種Ｚには、図６（ｃ）に示すように、データ受信部２５１、データ記憶部２５２、画像出力部２５３、および音声合成部２５４などが設けられる。
【０１０３】
データ受信部２５１は、端末装置２αからテキストデータ７４および画像データ７７を受信する。これらのデータは、データ記憶部２５２に記憶される。画像出力部２５３は、画像データ７７に基づいて顔画像ＨＦのアニメーションを表示画面ＨＧに出力する。音声合成部２５４は、アニメーションに合わせて音声を出力する。
【０１０４】
次に、アニメーション通信システム１における処理の流れをフローチャートを参照して説明する。図２１は送信側の端末装置２αの処理の流れを説明するフローチャート、図２２は受信側の端末装置２βの処理の流れを説明するフローチャートである。
【０１０５】
図２１に示すように、送信側の端末装置２αにおいて、端末装置２αのユーザのメッセージを図１１に示す符号形状情報７３に基づいて符号化し、符号データ７５を生成する（＃１１）。
【０１０６】
メッセージの送信先である端末装置２βに問い合わせるなどして、その端末装置２βの機種、保有するデータ、および通信回線４の通信速度など、通信に関する状況を判別する（＃１２）。端末装置２βが顔モデルに非対応の機種すなわち機種Ｚであると判別された場合は（＃１３でＮｏ）、符号データ７５に基づいてアニメーションを作成し（＃１４）、適当な長さの時間ごとにアニメーションを画像データ化し、適当なタイミングで画像データ７７を端末装置２βに送信する（＃１５）。ただし、一般に画像データはサイズが大きいので、通信速度などに応じてデータを間引いてもよい。なお、ステップ＃１４の処理は、後に説明する図２２のステップ＃３２〜＃３４の一連の処理と同じである。
【０１０７】
端末装置２βが顔モデルに対応した機種と判別された場合は（＃１３でＹｅｓ）、その端末装置２βが標準モデル情報７１を有するか否かを判別する（＃１６）。標準モデル情報７１を有しない場合は（＃１６でＮｏ）、標準モデル情報７１および端末装置２αのユーザの顔形状情報７２を端末装置２βに送信する（＃１７、＃１８）。
【０１０８】
標準モデル情報７１を有する場合は（＃１６でＹｅｓ）、端末装置２βが顔形状情報７２を有するか否かを判別し（＃１９）、有しないと判別された場合は端末装置２βにその顔形状情報７２を送信する（＃１８）。
【０１０９】
端末装置２βが符号データに対応した機種か否かを判別する（＃２０）。端末装置２βが符号データ非対応の機種すなわち機種Ｙである場合は（＃２０でＮｏ）、符号データ７５より顔モデルＤＳ' の形状変化データ（エッジ変位データ７６）を生成し（＃２１）、端末装置２βに送信する（＃２２）。
【０１１０】
端末装置２βが符号データに対応した機種である場合は（＃２０でＹｅｓ）、さらに、送信モードが符号モードであるか形状モードであるかを判別する（＃２３）。符号モードとは、端末装置２βに送信する顔モデルＤＳ' の動作制御データとして符号データ７５が選択されていることを意味する。形状モードとは、形状変化データ（エッジ変位データ７６）が選択されていることを意味する。
【０１１１】
符号モードの場合は（＃２３でＹｅｓ）、ステップ＃１１で生成した符号データ７５を端末装置２βに送信する（＃２４）。形状モードの場合は（＃２３でＮｏ）、符号データ７５より顔モデルＤＳ' の形状変化データ（エッジ変位データ７６）を生成し（＃２１）、端末装置２βに送信する（＃２２）。なお、ステップ＃２１において、より細かな動きのアニメーションを実現するために、各エッジＥの変位量を調整してもよい。
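The sender-side decision flow of FIG. 21 (steps #13 to #24) can be summarized in a short sketch. The class and function names (`Model`, `Receiver`, `plan_transmission`) and the string labels are hypothetical. The sketch only decides which data items are sent and in what order; generation of the data itself, the text data 74 that accompanies each transmission, and any thinning for slow lines are left out.

```python
from dataclasses import dataclass
from enum import Enum


class Model(Enum):
    X = "X"   # supports code data 75 and edge displacement data 76
    Y = "Y"   # supports edge displacement data 76 only
    Z = "Z"   # cannot drive the face model; displays image data only


@dataclass
class Receiver:
    model: Model
    has_standard_model: bool = False   # standard model information 71
    has_face_shape: bool = False       # face shape information 72


def plan_transmission(receiver, code_mode=True):
    """Return the data items to send, in order, following FIG. 21."""
    if receiver.model is Model.Z:                      # step 13: No
        return ["image data 77"]                       # steps 14 and 15
    items = []
    if not receiver.has_standard_model:                # step 16: No
        items += ["standard model information 71",     # step 17
                  "face shape information 72"]         # step 18
    elif not receiver.has_face_shape:                  # step 19: No
        items.append("face shape information 72")      # step 18
    if receiver.model is Model.X and code_mode:        # steps 20 and 23: Yes
        items.append("code data 75")                   # step 24
    else:                                              # model Y, or shape mode
        items.append("edge displacement data 76")      # steps 21 and 22
    return items
```

For instance, a model Y receiver that already holds the standard model but not the sender's face shape would receive the face shape information followed by edge displacement data.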
【０１１２】
一方、図２２に示すように、受信側の端末装置２β（機種Ｘ）において、端末装置２αから受信したデータが符号データ７５である場合は（＃３１でＹｅｓ）、受信した符号データ７５に基づいてエッジ変位データ７６を生成する（＃３２）。端末装置２αから受信しまたはステップ＃３２で生成したエッジ変位データ７６について形状補間の処理を行い（＃３３）、順次顔モデルＤＳ' を所定の方向から２次元上に投影して顔画像ＨＦを生成し、アニメーションを実行する（＃３４）。
【０１１３】
機種Ｙの場合は、ステップ＃３１および＃３２の処理が省略される。機種Ｚの場合は、ステップ＃３１ないし＃３３の処理が省略され、端末装置２αから受信した画像データ７７に基づいてアニメーションを実行する。
【０１１４】
本実施形態によると、メッセージの送信先である端末装置２βの機種に関わらず、メッセージに合わせてアニメーションの実行が可能なデータを端末装置２βに送信することができる。
【０１１５】
端末装置２βが機種Ｘのように複数の動作制御データに対応している場合は、目的または通信の状況に適応した動作制御データを選択することができる。例えば、細かい動きのアニメーションを実行したい場合は、動作制御データとしてエッジ変位データ７６を選択し、エッジＥごとに変位量を設定することができる。通信のデータ量を減らして通信時間を短縮したい場合は、符号データ７５を選ぶことができる。
【０１１６】
端末装置２についてはメッセージの送信側と受信側とに分けて機能を説明したが、端末装置２に両方の機能を設けてもよい。これにより、互いに相手の顔のアニメーションを表示しながら双方向にメッセージのやり取りを行うことができる。
【０１１７】
本実施形態では、端末装置２αのユーザのメッセージをテキストデータとして端末装置２βに送信したが、音声データとして送信してもよい。この場合は、端末装置２βにおいて、受信した音声データの出力とアニメーションの実行とのタイミングを図ればよい。
【０１１８】
本実施形態では、「こうみんかん」のような１単語をメッセージとして送信する例について説明したが、電子メールなどのように長い文書をメッセージとして送信する場合は、メッセージを適当な長さに区切って複数の符号データなどの動作制御データを生成してもよいし、全文について１つの動作制御データを生成してもよい。電話による会話またはパソコンによるチャットなどのようにリアルタイムでメッセージのやり取りを行う場合は、１音または１語ごとに動作制御データを生成してもよいし、メッセージを短い時間ごとに区切って動作制御データを生成してもよい。
【０１１９】
端末装置２βの機種は、対応可能な動作制御データの種類に応じて３つの機種を例示したが、もっと多くの機種があってもよい。例えば、同じ動作制御データに対応した機種であっても、メーカ、通信会社、または処理速度、メモリ容量、または通信速度などの性能などに応じて別々の機種として判別するようにしてもよい。また、本実施形態では、動作制御データとして符号データおよびエッジ変位データの２種類を例示したが、その他の形式の動作制御データがあってもよい。
【０１２０】
顔モデルとして３次元形状モデルを用いたが、２次元の形状モデルであってもよい。
その他、アニメーション通信システム１、端末装置２α、２βの全体または各部の構成、処理内容、処理順序などは、本発明の趣旨に沿って適宜変更することができる。
【０１２１】
【発明の効果】
本発明によると、メッセージの送信先の端末装置の機種に関わらず、メッセージに合わせてアニメーションの実行が可能なデータを当該端末装置に送信することができる。
【図面の簡単な説明】
【図１】本発明に係るアニメーション通信システムの構成を説明する図である。
【図２】端末装置の構成を説明する図である。
【図３】送信側の端末装置の記憶装置に記憶されるプログラムおよびデータを示す図である。
【図４】受信側の端末装置の記憶装置に記憶されるプログラムおよびデータを示す図である。
【図５】送信側の端末装置の機能的構成を説明する図である。
【図６】受信側の端末装置の機能的構成を説明する図である。
【図７】標準モデルまたは顔モデルの構成の例を示す図である。
【図８】エッジとノードとの対応関係を示す図である。
【図９】ノードの影響を受ける構成頂点を示す図である。
【図１０】ノードが影響を与える範囲を説明する図である。
【図１１】符号形状情報の例を示す図である。
【図１２】標準モデルまたは顔モデルを各形状グループの形状に変化させた場合の例を示す図である。
【図１３】３次元形状モデルの生成の処理の流れを説明するフローチャートである。
【図１４】標準モデルの例を示す図である。
【図１５】変形処理の流れを説明するフローチャートである。
【図１６】標準モデルの面Ｓと３次元計測データの点Ｐとを模式的に示す図である。
【図１７】標準モデルの異常変形を防ぐための仮想バネを説明するための図である。
【図１８】エッジ変位データおよび符号データの例を示す図である。
【図１９】端末装置の機種の相違を説明する図である。
【図２０】タイムテーブルの例を示す図である。
【図２１】送信側の端末装置の処理の流れを説明するフローチャートである。
【図２２】受信側の端末装置の処理の流れを説明するフローチャートである。
【符号の説明】
１アニメーション通信システム（通信システム）
２α 端末装置（第二の端末装置）
２β 端末装置（第一の端末装置）
２１表示装置（表示手段）
２２ｂ動作制御データ生成プログラム（コンピュータプログラム）
２９ａ〜２９ｃ記録媒体
２０５通信状況判別部（動作制御データ対応判別手段、機能判別手段、モデルデータ有無判別手段、標準モデルデータ有無判別手段、エッジ変位データ対応判別手段）
２０６データ送信部（アニメーションデータ送信手段、モデルデータ送信手段、標準モデルデータ送信手段）
２１１符号生成部（符号データ生成手段）
２１２変位データ生成部（形状変化データ生成手段）
２３３、２４２画像生成部（表示手段）
４通信回線
７モデル情報（モデルデータ）
７１標準モデル情報（標準モデルデータ）
７２顔形状情報（モデル変形データ）
７３符号形状情報（位置決めデータ）
７４テキストデータ（メッセージ）
７５符号データ（アニメーションデータ）
７６エッジ変位データ（アニメーションデータ）
ＤＳ' 顔モデル（モデル）

[0001]
[Technical field of the invention]
The present invention relates to a communication system that transmits and receives data for an animation that moves in accordance with a message, and to a terminal device used in such a system.
[0002]
[Prior art]
Conventionally, techniques have been proposed for displaying an animation of a message sender on the display screen of the recipient's terminal device so that the animation moves in accordance with the message. For example, Japanese Patent Laid-Open No. 8-307841 discloses a TV telephone device that generates and displays a pseudo moving image (animation) based on an audio signal from the transmitting side. This TV telephone device can display an animation without receiving image data from the transmitting side.
[0003]
Recently, many types of terminal devices for exchanging messages have been proposed or have come into widespread use. Examples include personal computers, PDAs (Personal Digital Assistants), mobile phone devices, and PHS terminal devices. Furthermore, manufacturers and communication companies sell many different models of these devices.
[0004]
[Problems to be solved by the invention]
It is conceivable to use these devices to display an animation on the receiving-side terminal device based on a message from the transmitting side, as in the TV telephone device of Japanese Patent Laid-Open No. 8-307841.
[0005]
However, these terminal devices often have different specifications. For example, even if both the terminal device on the message transmission side and the reception side can execute the animation, the data formats for executing the animation may be different from each other. In such a case, even if data for animation is transmitted from one terminal device to the other terminal device, the animation cannot be executed in the other terminal device.
[0006]
In particular, for mobile devices such as PDAs and mobile phones, each manufacturer and each communication company develops new models and new standards, so many new and old terminal devices coexist, and unifying the data format is difficult in practice.
[0007]
In view of these problems, an object of the present invention is to provide a communication system and a terminal device that transmit, to the destination terminal device, data with which an animation can be executed in accordance with a message, regardless of the model of that terminal device.
[0008]
  A terminal device according to one aspect of the present invention is a terminal device that transmits a message input by a user to another terminal device via a communication line, and comprises: motion control data support determining means for determining whether the other terminal device supports motion control data for generating an animation by changing the shape of a model; and animation data transmitting means for, when the motion control data support determining means determines that the other terminal device supports the motion control data, generating, as animation data for an animation using the model, motion control data that changes the shape of the model for each phoneme of the message and transmitting it to the other terminal device, and, when it determines that the other terminal device does not support the motion control data, transmitting, as the animation data, image data at each timing when the model is changed in accordance with the message.
[0009]
  A transmission method according to one aspect of the present invention is a transmission method in a terminal device that transmits a message input by a user to another terminal device via a communication line. The method causes the terminal device to execute: a process of determining whether the other terminal device supports motion control data for generating an animation by changing the shape of a model; a process of, when the other terminal device is determined to support the motion control data, generating, as animation data for an animation using the model, motion control data that changes the shape of the model for each phoneme of the message and transmitting it to the other terminal device; and a process of, when the other terminal device is determined not to support the motion control data, transmitting, as the animation data, image data at each timing when the model is changed in accordance with the message.
[0010]
  A computer program according to one aspect of the present invention is a computer program used in a computer that transmits a message input by a user to another terminal device via a communication line. The program causes the computer to function as: means for determining whether the other terminal device supports motion control data for generating an animation by changing the shape of a model; means for, when the other terminal device is determined to support the motion control data, generating, as animation data for an animation using the model, motion control data that changes the shape of the model for each phoneme of the message and transmitting it to the other terminal device; and means for, when the other terminal device is determined not to support the motion control data, transmitting, as the animation data, image data at each timing when the model is changed in accordance with each timing during output of the message.
[0014]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 is a diagram illustrating the configuration of an animation communication system 1 according to the present invention, FIG. 2 is a diagram illustrating the configuration of terminal devices 2α and 2β, FIG. 3 is a diagram showing programs and data stored in the storage device 22 of the transmitting-side terminal device 2α, FIG. 4 is a diagram showing programs and data stored in the storage device 22 of the receiving-side terminal device 2β, FIG. 5 is a diagram illustrating the functional configuration of the transmitting-side terminal device 2α, and FIG. 6 is a diagram illustrating the functional configuration of the receiving-side terminal device 2β.
[0015]
As shown in FIG. 1, the animation communication system 1 according to the present invention includes a plurality of terminal devices 2. Various terminal devices having a communication function, such as mobile phone devices, personal computers (PCs), and PDAs (Personal Digital Assistants), are used as the terminal devices 2.
[0016]
These terminal devices 2 can be connected to each other via a communication line 4 to transmit and receive data. As the communication line 4, a public line such as an analog line or ISDN, a mobile phone line, a dedicated line, or the Internet is used.
[0017]
A user can send a message from their own terminal device 2 to another user's terminal device 2. Hereinafter, to distinguish the terminal device on the message transmitting side from the one on the receiving side, the transmitting-side and receiving-side terminal devices 2 are referred to as terminal devices 2α and 2β, respectively.
[0018]
FIG. 2A shows the configuration when the terminal devices 2α and 2β are personal computers, and FIG. 2B shows the configuration when they are mobile phone devices. As shown in FIGS. 2A and 2B, the terminal devices 2α and 2β each include a processing device 20, a display device 21, a storage device 22, a character input device 23, a voice input device 24, a voice output device 25, and the like.
[0019]
The processing device 20 includes a CPU 20a, a RAM 20b, a ROM 20c, various input / output ports 20d, various controllers 20e, and the like.
When the terminal devices 2α and 2β are personal computers, a magnetic storage device or the like is used as the storage device 22. In the case of a mobile phone device, a rewritable storage element such as an EEPROM is used.
[0020]
As shown in FIG. 3, the storage device 22 of the terminal device 2α stores an operating system (OS) 22a, a motion control data generation program 22b, a modeling program 22c, and programs and data for various processes described later.
[0021]
As shown in FIG. 4, the storage device 22 of the terminal device 2β stores an operating system (OS) 22d, an animation execution program 22e, and programs and data for various processes described later. However, the standard model information 71 or the face shape information 72 may not be stored. There are three models of the terminal device 2β, namely models X, Y, and Z, and as shown in FIGS. 4A to 4C, the programs and other contents stored in the storage device 22 differ depending on the model. The differences between models X to Z will be described later.
[0022]
Programs and data stored in the storage device 22 are loaded into the RAM 20b as necessary. The loaded program is executed by the CPU 20a. The terminal devices 2α and 2β may be connected to other computers via the communication line 4 to download programs or data. Alternatively, a program or data may be loaded from various removable disks (recording media) such as a floppy disk 29a, a CD-ROM 29b, or a magneto-optical disk (MO) 29c.
[0023]
The results of processing by the processing device 20 are displayed on the display device 21 of the terminal device 2β. For example, the face image HF of the message sender (the user of the terminal device 2α) is displayed as an animation on the display screen HG of the display device 21 so that its mouth moves in accordance with the message received from the terminal device 2α. The voice output device 25 outputs the message as voice in time with the animation. This lets the user of the terminal device 2β perceive the sender's face image HF as if it were reading the message aloud.
[0024]
The face image HF is obtained by projecting a three-dimensional shape model (face model), which represents the three-dimensional shape of the sender's head, onto two dimensions from a predetermined direction. In other words, to generate an animation in which the face image HF moves, it suffices to project the face model onto two dimensions from the predetermined direction while changing its shape. The data related to the face model and the control of the face model will be described later.
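As a rough illustration of projecting a three-dimensional model onto two dimensions from a fixed direction, the following sketch uses an orthographic projection. The text does not say which kind of projection is used, so the orthographic choice, the function name `project`, and the way the image axes are chosen are all assumptions.

```python
import math


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _unit(v):
    n = math.sqrt(_dot(v, v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def project(vertices, view_direction):
    """Orthographically project 3D vertices onto the plane facing view_direction."""
    d = _unit(view_direction)
    # Pick a helper axis that is not parallel to the viewing direction.
    helper = (0.0, 1.0, 0.0) if abs(d[1]) < 0.9 else (1.0, 0.0, 0.0)
    right = _unit(_cross(helper, d))   # horizontal image axis
    up = _cross(d, right)              # vertical image axis
    return [(_dot(v, right), _dot(v, up)) for v in vertices]
```

Re-running `project` on the vertex positions after each shape change would give one frame of the face image per call; the depth along the viewing direction is simply discarded.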
[0025]
With this configuration, as shown in FIG. 5, the terminal device 2α is provided with a data generation unit 201, a face model data generation unit 202, a voice-to-text conversion unit 203, a shape designation unit 204, a communication status determination unit 205, a data transmission unit 206, a data storage unit 207, and the like. The terminal device 2β is provided with the functions shown in FIG. 6. However, as shown in FIGS. 6A to 6C, the functional configuration differs for each model.
[Data for generating face images]
In FIG. 5, the data storage unit 207 of the terminal device 2α stores standard model information 71, face shape information 72, code shape information 73, and the like.
[0026]
FIG. 7 is a diagram showing an example of the configuration of the standard model DS or the face model DS ′, FIG. 8 is a diagram showing the correspondence between the edges E and the nodes N, FIG. 9 is a diagram showing the constituent vertices V affected by each node N, and FIG. 10 is a diagram for explaining the range that a node N affects.
[0027]
The standard model information 71 is information regarding the constituent vertices (Model Vertex) V, polygons Pg, nodes N, and edges E of the standard model DS shown in FIG. 7. The standard model DS is a three-dimensional model that has a standard face size and shape and is structured around the entire circumference of the head. In FIG. 7A, each intersection of thin straight lines indicates a constituent vertex V. The position of each constituent vertex V is given by three-dimensional coordinates x, y, and z (position data). Each polygon Pg is defined by a set of constituent vertices V lying on the same plane, that is, by topology data. The position data and the topology data together constitute the geometry data. A thick straight line indicates an edge E, which represents a muscle. A black circle indicates a node N, which represents an end point of a muscle.
[0028]
The position of the node N is expressed as a relative position of the constituent vertex V as shown in the following equation (1).
[0029]
[Expression 1]

[0030]
The position of each edge E (E1, E2, ...) is determined by two different nodes N, as given by the first parameter of the edge E in FIG. 8. The nodes N (N1, N2, ...) are arranged at positions that serve as the end points of the muscles of the entire face. FIG. 7B omits the constituent vertices V of FIG. 7A so that the relationship between the nodes N and the edges E is easier to see. FIGS. 7A and 7B also omit the nodes N and edges E in the right half of the face, but they actually exist there just as in the left half.
[0031]
The second parameter of the edge E indicates at what ratio (weight) each end point (node N) is moved when the edge E (muscle) is displaced. For example, the second parameter “0.7, 0.3” of the edge E3 indicates that when the edge E3 is displaced, the node N4 and the node N3 are moved at a ratio of 7 to 3, respectively. The displacement amount of the edge E is expressed as the degree of muscle contraction, where “0” is the state in which the muscle is not contracted and “20” is the state in which it is most contracted. For example, a displacement amount (degree of contraction) of “15.0” indicates that the muscle (edge E) is contracted by 75%.
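The edge parameters described above can be pictured with a small data structure. The following is only a minimal sketch: the class name `Edge`, the function `contraction_ratio`, and the constant `MAX_CONTRACTION` are hypothetical names introduced for illustration, and the sketch covers only the 0-to-20 contraction scale and the weight pair, not the node movement itself.

```python
from dataclasses import dataclass
from typing import Tuple

# "20" is the most contracted state according to the description.
MAX_CONTRACTION = 20.0


@dataclass
class Edge:
    """One muscle: its two end nodes and the share of movement each one takes."""
    nodes: Tuple[str, str]
    weights: Tuple[float, float]


def contraction_ratio(displacement: float) -> float:
    """Return the degree of contraction as a fraction between 0 and 1."""
    if not 0.0 <= displacement <= MAX_CONTRACTION:
        raise ValueError("displacement must lie between 0 and 20")
    return displacement / MAX_CONTRACTION


# Edge E3 from the example: nodes N4 and N3 move at a ratio of 7 to 3.
e3 = Edge(nodes=("N4", "N3"), weights=(0.7, 0.3))
print(contraction_ratio(15.0))
```

A displacement amount of 15.0 corresponds to a ratio of 0.75, which matches the 75% contraction given in the example.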
[0032]
When the edge E is displaced, the position where the node N moves is obtained by the following equation (2).
[0033]
[Expression 2]

[0034]
However, since there are actually nodes N related to a plurality of edges E, the position after movement of the node N is obtained by convergence calculation or simultaneous calculation.
The constituent vertices V that are affected when each node N moves are given as the second parameter in FIG. 9. That is, this second parameter indicates the range of influence when the node N moves. The constituent vertices V affected by the movement of a node N are concentrated around that node. For example, in FIG. 10, the constituent vertices V affected by the movement of the node N indicated by the large black circle are the nine constituent vertices V indicated by small black circles.
[0035]
The first parameter in FIG. 9 indicates the degree of influence (Intensity) exerted on the constituent vertex V when the node N moves. When this value is large, the movement amount (displacement amount) of the constituent vertex V accompanying the movement of the node N increases.
[0036]
The position at which the constituent vertex V moves as the node N moves is obtained by the following equation (3).
[0037]
[Equation 3]

[0038]
As described above, the standard model information 71 represents the positions of, and relationships among, the constituent vertices V, the nodes N, and the edges E of the standard model DS, as shown in FIGS. 8 and 9 and equations (1) to (3). By controlling the standard model information 71, that is, by moving the edges E (muscles) of the standard model DS, the standard model DS can be changed to an arbitrary shape. For example, to close the right eye of the standard model DS (wink), the predetermined edges E (muscles) around the right eye are displaced (contracted) by predetermined values. The position of each node N related to those edges E is then moved according to equation (2), the position of each constituent vertex V affected by those nodes N is moved according to equation (3), and the model takes on a winking shape.
[0039]
The face shape information 72 is information regarding the constituent vertices of the three-dimensional shape model (face model) of the face of the user of the terminal device 2α, that is, the message sender. The face model is generated by fitting the standard model DS to the user's three-dimensional measurement data. The face model is generated by the face model data generation unit 202 described later.
[0040]
That is, each constituent vertex of the face model corresponds to each constituent vertex V of the standard model DS moved by the fitting process. When the constituent vertices V of the standard model information 71 are replaced with the constituent vertices of the face model, the positions of the node N and the edge E of the face model are obtained in accordance with the relationship of Expression (1) and FIG. Therefore, as in the case of the standard model DS, the shape of the face model can be changed by displacing the edge E. Hereinafter, the face model obtained by fitting the standard model DS to the three-dimensional measurement data of the user of the terminal device 2α is referred to as “face model DS ′”. FIG. 11 is a diagram showing an example of the code shape information 73, and FIG. 12 is a diagram showing an example when the standard model DS or the face model DS ′ is changed to the shape of each shape group.
[0041]
Incidentally, when the shape of the face model DS ′ or the standard model DS is changed continuously, the number of values given to the edges E increases, and so does the total amount of data. For example, when the shape of the face model DS ′ is changed to match a certain word, the displacement amount of each edge E must be set once for every sound contained in the word so that the mouth opens and closes as if uttering that sound.
[0042]
However, in general, even if the phonemes are different from each other, there are those having the same or similar mouth shape characteristics when pronounced. For example, the consonant “m” and the consonant “n” are similar in that both are pronounced with their lips together.
[0043]
For this reason, the code shape information 73 groups phonemes that have the same or similar mouth shape when pronounced, and defines the displacement amount of each muscle (edges E1, E2, ...) for each group. In the present embodiment, as shown in FIG. 11A, five vowel groups (shape groups A, E, I, O, U) and three consonant groups (shape groups 1 to 3) are provided.
[0044]
Each of the shape groups A, E, I, O, and U contains a single vowel: “a”, “e”, “i”, “o”, and “u”, respectively. Shape group 1 is a group of consonants pronounced with the lips closed, shape group 2 is a group of consonants pronounced with the mouth in a predetermined shape without closing the lips, and shape group 3 is a group of consonants pronounced while keeping the mouth shape of the previously emitted sound. Under this classification, normally, the five consonants “b, f, m, p, v” belong to shape group 1, the consonants “d, g, j, k, l, n, r, s, t, w, z” belong to shape group 2, and the two consonants “h, y” belong to shape group 3.
[0045]
That is, the code shape information 73 is data for displacing each muscle (edges E1, E2, ...) of the face model DS ′ so that it takes the shapes shown in FIGS. 12A to 12G when sounds belonging to the shape groups A, E, I, O, U, 1, and 2 are emitted. Shape group 3, however, has no displacement values, because the face model DS ′ is kept in the shape of the sound (phoneme) emitted immediately before.
[0046]
Further, as shown in FIG. 11B, the code shape information 73 also includes the displacement amount of each edge E for the shape that the face model DS ′ takes when it makes an expression such as “wink” (shape group 11), “surprise” (shape group 12), or “joy” (shape group 13). The values shown in FIG. 11B yield the shapes shown in FIGS. 12H to 12J.
[0047]
If the code shape information 73 is used, the shape of the face model DS ′ can be continuously changed by simply designating the shape group names sequentially. For example, when the shape of the face model DS ′ is changed in accordance with the word “kan”, the shape groups 2, A, and 1 may be designated.
[Generation of face model (standard model fitting)]
Returning to FIG. 5, the face model data generation unit 202 generates the face model DS ′ on which the face image HF that moves in accordance with the message, that is, the animation, is based. Next, a method for generating a face model, that is, a three-dimensional shape model, will be described with reference to flowcharts.
[0048]
FIG. 13 is a flowchart for explaining the flow of processing for generating a three-dimensional shape model, FIG. 14 is a diagram showing an example of the standard model DS, FIG. 15 is a flowchart for explaining the flow of deformation processing, FIG. 16 is a diagram schematically illustrating the surface S of the standard model DS and a point P of the three-dimensional measurement data, and FIG. 17 is a diagram for explaining a virtual spring for preventing abnormal deformation of the standard model DS.
[0049]
In FIG. 13, first, rough alignment of the standard model DS shown in FIG. 14 and the three-dimensional measurement data of a person (for example, the user of the terminal device 2α) is performed (# 101). The standard model DS is three-dimensional data structured around the entire circumference of the head, having a standard face size and shape. The three-dimensional measurement data is three-dimensional data of the user's face made up of point clouds. That is, in step # 101, the orientation, size, and position of the standard model DS are changed so that the distance between the standard model DS and the three-dimensional measurement data is minimized. In general, as the standard model DS and the three-dimensional measurement data, those without an expression are used. Note that the three-dimensional measurement data is prepared in advance by photographing the user with a three-dimensional measurement device.
[0050]
Next, contours and feature points are extracted (# 102). Contours and feature points that should be located at the same positions as the contours RK and feature points TT of the standard model DS are placed on the three-dimensional measurement data or on the corresponding two-dimensional image.
[0051]
As feature points, parts that have an actual feature, such as the corners of the eyes or mouth, the tip of the nose, and the lower end of the chin, are selected, as are parts that have no feature of their own but whose location is easy to specify, such as the midpoints between them. As contours, the chin line, the lip line, the eyelid line, or the like is selected.
[0052]
To reduce the amount of calculation and the error, the three-dimensional measurement data is thinned out (# 103).
The standard model DS is then deformed (# 104). That is, the surface of the standard model DS is deformed so as to minimize an energy function defined in terms of the distance between each point of the three-dimensional measurement data and the surface of the standard model DS, together with an energy function defined to avoid excessive deformation.
[0053]
Then, the target energy function and the control points are changed, and a deformation process similar to that of step # 104 is repeated (# 105).
Next, the deformation process in step # 104 will be described.
[0054]
In FIG. 16, one of the point groups constituting the three-dimensional measurement data is indicated by a point Pk. On the surface S of the standard model DS, the point closest to the point Pk is indicated by Qk. The point Qk is an intersection when a perpendicular is drawn from the point Pk to the surface S.
[0055]
A method of fitting the surface S to the point group is as follows. Here, general fitting will be described.
For a point Pk in the point group, its corresponding point Qk, and the set of corresponding point pairs T = {(Pk, Qk), k = 1 ... N}, the fitting energy function Ff (U) is defined by the following equation (4).
[0056]
[Expression 4]

[0057]
However, Qk (U) indicates that Qk is a function of U.
In order to prevent excessive deformation of the surface S, a virtual spring (elastic bar) KB shown in FIG. 17 is introduced. A stabilization energy function for stabilizing the shape of the surface S is derived based on the constraints of the virtual spring KB.
[0058]
FIG. 17 shows a part of the surface (curved surface) S of the standard model DS to be fitted. The surface S is formed by the control point group U = {ui, i = 1 ... N}. A virtual spring KB is disposed between adjacent control points. The virtual spring KB constrains the control points by exerting a pulling force between them, and thereby prevents abnormal deformation of the surface S.
[0059]
That is, when the interval between adjacent control points u increases, the pulling force by the virtual spring KB increases accordingly. For example, when the point Qk approaches the point Pk, if the distance between the control points u increases with the movement, the pulling force by the virtual spring KB increases. Even if the point Qk moves, if the distance between the control points u does not change, that is, if the relative positional relationship between the control points u does not change, the pulling force by the virtual spring KB does not change. A value obtained by averaging the pulling force by the virtual spring KB over the entire surface S is defined as stabilization energy. Therefore, the stabilization energy increases when a part of the surface S protrudes and deforms. If the entire surface S moves on average, the stabilization energy is zero.
[0060]
The stabilization energy function Fs (U) is expressed by the following equation (5).
[0061]
[Equation 5]

[0062]
Here,
[0063]
[Formula 6]

[0064]
are the initial end point of the virtual spring KB and the end point of the virtual spring KB after deformation, respectively. c is a spring coefficient, and M is the number of virtual springs KB. In addition, the following relationship holds.
[0065]
[Expression 7]

[0066]
Therefore, when the spring coefficient c is increased, the virtual spring KB becomes hard and is not easily deformed.
By introducing such a stabilization energy function Fs (U), a constant constraint is provided for the shape change of the surface S, and excessive deformation of the surface S can be prevented.
[0067]
Using the fitting energy function Ff (U) and the stabilization energy function Fs (U) described above, the fitting evaluation function F (U) is defined as the following equation (6).
[0068]
F (U) = WfFf (U) + WsFs (U) (6)
Here, Wf and Ws are weighting factors for normalization, respectively.
The surface S is fitted by repeatedly deforming it and searching for corresponding points until the evaluation function F (U) of equation (6) becomes sufficiently small. For example, the fitting proceeds in the direction in which the derivative of F (U) with respect to U approaches zero.
[0069]
In FIG. 15, in the deformation process, first, a point Qk corresponding to the point Pk is obtained by calculation, and a set of the point Pk and the point Qk is created (# 111).
The surface S is deformed (# 112), and the evaluation function F (U) after deformation is calculated (# 113). The process is repeated until the evaluation function F (U) converges (Yes in # 114).
[0070]
To determine whether the evaluation function F (U) has converged, a publicly known method can be used, such as regarding it as converged when F (U) becomes smaller than a predetermined value, or when its rate of change from the previous calculation becomes equal to or less than a predetermined value.
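The loop of steps # 111 to # 114 and the two convergence tests mentioned above can be sketched as follows. This is an illustrative sketch only: the function name `minimize_until_converged`, its parameters, and the tolerance values are assumptions, and the one-dimensional quadratic used in the usage example is a toy stand-in for the evaluation function F (U), not the function defined in this description.

```python
def minimize_until_converged(step, evaluate, state,
                             abs_tol=1e-6, rel_tol=1e-4, max_iter=1000):
    """Repeat a deformation step until the evaluation value converges.

    step     -- returns a new state (stands in for deforming the surface S)
    evaluate -- returns the evaluation value for a state (stands in for F(U))
    Convergence is declared when the value falls below abs_tol, or when the
    relative change from the previous value is at most rel_tol.
    """
    previous = evaluate(state)
    current = previous
    for iteration in range(1, max_iter + 1):
        state = step(state)
        current = evaluate(state)
        if current < abs_tol:
            return state, current, iteration
        if previous > 0 and abs(previous - current) / previous <= rel_tol:
            return state, current, iteration
        previous = current
    return state, current, max_iter


# Toy usage: minimise (u - 3)^2 by gradient descent with a fixed step size.
final_u, final_f, n_iter = minimize_until_converged(
    step=lambda u: u - 0.1 * 2.0 * (u - 3.0),
    evaluate=lambda u: (u - 3.0) ** 2,
    state=0.0,
)
print(final_u, final_f, n_iter)
```

The `max_iter` guard is an addition of this sketch so that the loop terminates even if neither criterion is met.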
[0071]
By such processing, the standard model DS can be transformed to generate a three-dimensional shape model (face model DS ′) having the shape of the user's face. Note that a face model may be acquired by fitting a standard model to a user's two-dimensional image. Alternatively, the face model may be created using various computer graphic (CG) programs.
[Data for controlling the movement of the face model]
FIG. 18 is a diagram illustrating an example of the edge displacement data 76 and the code data 75, and FIG.
[0072]
As described above, to change the shape of the face model DS ′, there are a method of directly giving a displacement amount to each edge E of the face model DS ′ and a method of specifying a shape group name.
[0073]
For example, to change the shape of the face model DS ′ continuously so that it winks after moving the mouth in accordance with the word “Kominkan”, the former method designates the displacement amount of each edge E of the face model DS ′ for each time, as in the edge displacement data 76 shown in FIG. 18A, and thereby specifies the motion of the face model DS ′ directly. The edge displacement data 76 can therefore be called “motion data”.
[0074]
In the latter method, a shape group is designated using a code such as “1”, “A”, or “11”, as in the code data 75 of FIG. 18, and the motion of the face model DS ′ (the displacement amount of each edge E) is thereby specified indirectly. When the shape of the face model DS ′ is changed, the code data 75 is converted into edge displacement data 76 based on the code shape information 73 shown in FIG. 11. That is, since the code data 75 serves as a parameter for obtaining the edge displacement data (motion data) 76, the code data 75 can be called a “motion parameter”.
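The conversion from code data to edge displacement data amounts to a table lookup. The sketch below illustrates the idea under stated assumptions: the table `CODE_SHAPE_INFO`, the edge names, and every displacement value in it are hypothetical placeholders (the actual values of FIG. 11 are not reproduced in this text), and the function name `to_edge_displacements` is introduced only for illustration.

```python
# Hypothetical stand-in for the code shape information:
# shape group -> {edge name: displacement amount on the 0..20 scale}.
# All numbers are made up for illustration.
CODE_SHAPE_INFO = {
    "A": {"E1": 12.0, "E2": 3.0},
    "1": {"E1": 0.0, "E2": 15.0},
    "2": {"E1": 4.0, "E2": 6.0},
    "3": None,  # shape group 3 has no values: the previous shape is kept
}


def to_edge_displacements(codes, table=CODE_SHAPE_INFO):
    """Expand a sequence of shape group codes into per-step edge displacements."""
    frames = []
    current = {}  # before the first sound, no displacement is assumed
    for code in codes:
        entry = table[code]
        if entry is not None:
            current = dict(entry)
        frames.append(dict(current))
    return frames


frames = to_edge_displacements(["2", "A", "3", "1"])
for frame in frames:
    print(frame)
```

Because shape group 3 carries no displacement values, the third frame in this usage example simply repeats the frame of shape group A.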
[0075]
Returning to FIG. 5, the communication status determination unit 205 determines the status of the communication and decides the type of data to be transmitted to the terminal device 2β. Specifically, it determines what model the terminal device 2β that is the message destination is and what functions it has, what data is stored in the terminal device 2β, and what communication speed the communication line 4 provides, and on that basis selects among the standard model information 71, the face shape information 72, the code data (motion parameter) 75, the edge displacement data (motion data) 76, and the like. The communication status is determined after communication with the terminal device 2β has actually started. Note that the user may instead operate the terminal device 2α to enter the communication status or to select the data to be transmitted to the terminal device 2β.
[0076]
Here, the models X to Z of the terminal device 2β will be described with reference to FIG. The model X can generate and display an animation by changing the shape of the face model DS ′ based on the code data 75. For example, when the code data 75 “2, A, 1” is given, the model X converts the code data 75 into the edge displacement data 76 based on the code shape information 73 and changes the shape of the face model DS ′ in sequence to the shapes shown in FIGS. 12G, 12A, and 12F. The model X can therefore also generate an animation when the edge displacement data 76 is given directly, as with the model Y described below.
[0077]
The model Y does not correspond to the code data 75, and the face model DS ′ cannot be controlled based on the code data 75. Therefore, in order to change the shape of the face model DS ′, the edge displacement data 76 must be directly given.
[0078]
The model Z does not correspond to either the code data 75 or the edge displacement data 76 and cannot generate an animation by changing the shape of the face model DS ′. However, an animation can be displayed based on the input image data.
[0079]
Further, the models X and Y may or may not have the standard model information 71 of the standard model DS. Thus, depending on the model and the presence / absence of the standard model information 71, the amount of data necessary for generating the animation transmitted to the terminal device 2β is in the order shown in FIG.
[0080]
The data generation unit 201 in FIG. 5 includes a code generation unit 211, a displacement data generation unit 212, an image data generation unit 213, and the like, and generates the data to be transmitted to the terminal device 2β according to the determination result of the communication status determination unit 205.
[0081]
When the terminal device 2β is determined to be the model X, the code data 75 is generated in principle. That is, the code generation unit 211 divides the text data 74, which is the message of the user of the terminal device 2α entered from the character input device 23 such as a keyboard, into phonemes, and determines which shape group each phoneme belongs to based on the code shape information 73. For example, when the text data 74 is “kouminkan”, it is divided into the nine phonemes “k, o, u, m, i, n, k, a, n”, and the code data “2, O, O, 1, I, 1, 2, A, 1” is obtained.
[0082]
However, the third phoneme “u” is replaced with the shape group O. This is because, in general, “u” is pronounced “o” when “o” is immediately before “u”, such as “kou”. As described above, when the sound is not generated according to the description (spell) due to the influence of the immediately preceding sound (phoneme), the shape group to which the phoneme belongs is changed as appropriate.
[0083]
In order to change the face shape to an action or expression such as “wink” or “surprise”, a predefined character string may be used as shown in FIG. For example, if it is desired to wink after uttering the word “Kominkan”, text data 74 may be input using an emoticon indicating wink such as “Kominkan (^ _−)”. The code data 75 in this case is “2, O, O, 1, I, 1, 2, A, 1, 11” as shown in FIG.
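The encoding described in the preceding paragraphs can be sketched as a small function. This is a hedged illustration, not the embodiment's implementation: the name `encode` and the table names are hypothetical, the emoticon table contains only the wink example given in the text, and two rules are assumptions of the sketch. First, a “u” directly after “o” is assigned to shape group O, as explained above. Second, the listing of shape groups places “n” in shape group 2, whereas the worked examples map the “n” in “kan” and “kouminkan” to shape group 1; the sketch assumes that an “n” not followed by a vowel is treated as shape group 1.

```python
VOWELS = {"a": "A", "e": "E", "i": "I", "o": "O", "u": "U"}
GROUP_1 = set("bfmpv")        # lips closed
GROUP_2 = set("dgjklnrstwz")  # mouth in a predetermined shape, lips open
GROUP_3 = set("hy")           # keeps the previous mouth shape
EMOTICONS = {"(^_-)": "11"}   # wink; the only emoticon given in the text


def encode(text):
    """Convert romanized text into a list of shape group codes."""
    # Normalise case, spaces, and the Unicode minus sign used in some emoticons.
    text = text.lower().replace(" ", "").replace("\u2212", "-")
    codes = []
    i = 0
    while i < len(text):
        emoticon = next((e for e in EMOTICONS if text.startswith(e, i)), None)
        if emoticon is not None:
            codes.append(EMOTICONS[emoticon])
            i += len(emoticon)
            continue
        ch = text[i]
        following = text[i + 1] if i + 1 < len(text) else ""
        if ch in VOWELS:
            # "u" right after "o" is pronounced like "o" (as in "kou").
            if ch == "u" and i > 0 and text[i - 1] == "o":
                codes.append("O")
            else:
                codes.append(VOWELS[ch])
        elif ch == "n" and following not in VOWELS:
            codes.append("1")  # assumption: "n" with no following vowel
        elif ch in GROUP_1:
            codes.append("1")
        elif ch in GROUP_2:
            codes.append("2")
        elif ch in GROUP_3:
            codes.append("3")
        # Any other character is ignored in this sketch.
        i += 1
    return codes


print(encode("kouminkan (^_-)"))
```

With these assumptions the sketch reproduces the code data given in the text for “kouminkan”, with the wink code “11” appended when the emoticon is present.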
[0084]
The text data 74 on which the code data 75 is based may also be obtained from voice data. That is, voice data input from the voice input device 24 such as a microphone is converted into the text data 74 by the voice text conversion unit 203.
[0085]
The code data 75 generated by the code generation unit 211 is transmitted to the counterpart terminal device 2β by the data transmission unit 206 together with the text data 74 as operation control data for controlling the face model DS ′. When the code shape information 73 for converting the code data 75 into the edge displacement data 76 is not in the terminal device 2β, the code shape information 73 is also transmitted.
[0086]
When it is determined that the terminal device 2β does not have the standard model information 71, the standard model information 71 and the face shape information 72 of the user of the terminal device 2α are transmitted to the terminal device 2β as model information 7. When it is determined that the terminal device 2β has the standard model information 71 but not the face shape information 72, the face shape information 72 is transmitted as the model information 7. The same applies to the model Y described below.
[0087]
When the terminal device 2β is determined to be the model Y, the displacement data generation unit 212 obtains the displacement amount of each edge E of the face model DS ′ for each shape group indicated by the code data 75 generated by the code generation unit 211, and generates edge displacement data 76 as shown in FIG. 18A. The generated edge displacement data 76 is transmitted to the terminal device 2β together with the text data 74 as motion control data for controlling the face model DS ′. However, when the communication status determination unit 205 determines that the communication line 4 is slow and data communication will take time, the edge displacement data 76 may be thinned out. For example, the data in the edge displacement data 76 that corresponds to consonant shapes may be omitted.
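The thinning mentioned above might be pictured as a simple filter. The sketch assumes, purely for illustration, that each entry of the edge displacement data is kept together with the shape group it was derived from; that pairing, the function name `thin_out`, and the sample displacement values are all hypothetical.

```python
CONSONANT_GROUPS = {"1", "2", "3"}


def thin_out(frames, line_is_slow):
    """Drop consonant-shape frames when the communication line is slow.

    frames is a list of (shape_group, displacements) pairs. Vowel shapes and
    expression shapes (such as shape group 11) are always kept.
    """
    if not line_is_slow:
        return list(frames)
    return [frame for frame in frames if frame[0] not in CONSONANT_GROUPS]


# Made-up displacement values for the sequence "2, A, 1, 11".
sample = [
    ("2", {"E1": 4.0}),
    ("A", {"E1": 12.0}),
    ("1", {"E1": 0.0}),
    ("11", {"E1": 7.0}),
]
print([group for group, _ in thin_out(sample, line_is_slow=True)])
```

How slow the line must be before thinning is applied is left to the caller here, since the description does not give a threshold.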
[0088]
When the terminal device 2β is determined to be the model Z, the image data generation unit 213 changes the shape of the face model DS ′ based on the edge displacement data 76 generated by the displacement data generation unit 212, and generates image data 77 for displaying the animation on the terminal device 2β in step with the output of the message. Since the generation of the image data 77 overlaps with the function of the terminal device 2β described later, its description is omitted here. The image data 77 is transmitted to the terminal device 2β together with the text data 74.
[0089]
Even if the terminal device 2β is the model X, the user of the terminal device 2α can select the edge displacement data 76 instead of the code data 75 as the operation control data to be transmitted to the terminal device 2β.
[0090]
Since the code data 75 is smaller than the edge displacement data 76, transmitting the code data 75 to the terminal device 2β is preferable in terms of communication time and the like. With the code data 75, however, the face model DS ′ can be deformed only into the shape group shapes defined in the code shape information 73 of FIG. 11. Therefore, when the user wants to give the face model DS ′ finer movements than the shapes defined in the code shape information 73, the user can create the edge displacement data 76 by freely setting the displacement amount of each edge E of the face model DS ′ with the shape designation unit 204. However, if the communication speed of the communication line 4 is determined to be low, the code data 75 may be transmitted to the terminal device 2β regardless of the user's selection.
[Function of terminal device 2β]
Next, a functional configuration of the terminal device 2β will be described. As shown in FIG. 6A, the model X of the terminal device 2β is provided with a data reception unit 231, a displacement data generation unit 232, an image generation unit 233, a data storage unit 234, a voice synthesis unit 235, and the like.
[0091]
The data receiving unit 231 receives data such as text data 74 and code data 75 from the terminal device 2α. When the standard model information 71, the face shape information 72, or the code shape information 73 is received, these pieces of information are stored in the data storage unit 234.
[0092]
The displacement data generation unit 232 generates edge displacement data 76 based on the received code data 75 in the same manner as the displacement data generation unit 212 of the terminal device 2α.
The image generation unit 233 acquires the face model DS ′ by replacing the constituent vertex V of the standard model DS based on the standard model information 71 and the face shape information 72, and determines the shape of the face model DS ′ based on the edge displacement data 76. The animation of the face image HF is generated by changing.
[0093]
The shape of the face model DS ′ is changed as follows. FIG. 20 is a diagram illustrating an example of a time table.
First, the shape (shape group) for each time indicated in the edge displacement data 76 is arranged in the time table. For example, when the face model DS ′ is to pronounce “Kominkan”, the shape groups (2, O, ..., 1) are arranged as the trapezoids shown in FIG. 20A.
[0094]
As shown in FIG. 20B, the length of the upper side of the trapezoid indicating the shape group means the duration of maintaining the shape of the shape group. The duration is shorter for consonants than for vowels, 0.4 seconds for vowels and 0.1 seconds for consonants. When vowels are continuous, the duration of subsequent vowels may be set shorter than usual.
[0095]
The rise time T1 is the time taken to change from a certain shape (for example, an expressionless shape) to the shape of the shape group. The end time T3 is the time taken to return to the expressionless shape. Both the rise time T1 and the end time T3 are extremely short, 0.1 seconds or less.
[0096]
Returning to FIG. 20A, two adjacent trapezoids are arranged so that the rise of the subsequent trapezoid is completed at the moment the preceding trapezoid ends (t = tb). That is, the subsequent phoneme starts rising at the time (t = ta) that precedes tb by the end time T3.
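The arrangement of the trapezoids can be sketched as a schedule of hold intervals. This is a minimal sketch under assumptions: the function name `build_timetable` is hypothetical, a single transition time of 0.05 seconds is assumed for both the rise time T1 and the end time T3 (the text only states that both are 0.1 seconds or less), and shape groups other than vowels, including expressions, are given the consonant duration for simplicity.

```python
VOWEL_GROUPS = {"A", "E", "I", "O", "U"}
VOWEL_HOLD = 0.4      # seconds a vowel shape is maintained
CONSONANT_HOLD = 0.1  # seconds a consonant shape is maintained


def build_timetable(codes, transition=0.05):
    """Return (code, hold_start, hold_end) for each shape group.

    The fall of one trapezoid and the rise of the next share the same
    interval [ta, tb], where ta is the previous hold_end and
    tb = ta + transition is the next hold_start.
    """
    table = []
    hold_start = transition  # the first shape finishes rising after T1
    for code in codes:
        duration = VOWEL_HOLD if code in VOWEL_GROUPS else CONSONANT_HOLD
        hold_end = hold_start + duration
        table.append((code, hold_start, hold_end))
        hold_start = hold_end + transition
    return table


for code, start, end in build_timetable(["2", "A", "1"]):
    print(code, round(start, 2), round(end, 2))
```

The shortening of a vowel that follows another vowel, which the description permits, is not modelled in this sketch.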
[0097]
In this way, the shape of the face model DS ′ is changed according to the time table in which each shape group is arranged. However, the shape of the face model DS ′ between the two shape groups (ta to tb), that is, the position of each constituent vertex V is interpolated by linear approximation based on the following equation (7).
[0098]
[Equation 8]
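Equation (7) itself is not reproduced in this text. As a hedged illustration of interpolation by linear approximation, the sketch below blends the vertex positions of two shapes between the times ta and tb. The function name `interpolate_vertices` and its argument layout are assumptions of the sketch and are not taken from the equation.

```python
def interpolate_vertices(v_prev, v_next, t, ta, tb):
    """Linearly interpolate vertex positions between two shapes.

    v_prev and v_next are lists of (x, y, z) tuples for the shape that ends
    at tb and the shape that has finished rising at tb, respectively.
    """
    if tb <= ta:
        raise ValueError("tb must be later than ta")
    # Fraction of the transition completed, clamped to the range [0, 1].
    s = min(max((t - ta) / (tb - ta), 0.0), 1.0)
    return [
        tuple(a + s * (b - a) for a, b in zip(p, q))
        for p, q in zip(v_prev, v_next)
    ]


# Halfway through the transition, each vertex is at the midpoint.
print(interpolate_vertices([(0.0, 0.0, 0.0)], [(2.0, 4.0, 6.0)], 0.5, 0.0, 1.0))
```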

[0099]
Then, the face model DS ′ is projected two-dimensionally from a predetermined direction while changing the shape of the face model DS ′ to generate an animation of the image HF.
The voice synthesizer 235 in FIG. 6A voices the user's message of the terminal device 2α indicated in the text data 74 and outputs it in synchronization with the animation. For example, sound is sequentially output in accordance with a signal (trigger) issued from the image generation unit 233 when the face model DS ′ starts to change (rises) into a predetermined shape. As a method for converting text data into speech, a known speech synthesis technique is used.
[0100]
As shown in FIG. 6B, the model Y of the terminal device 2β is provided with a data receiving unit 241, an image generating unit 242, a data storing unit 243, a voice synthesizing unit 244, and the like.
[0101]
The data receiving unit 241 receives data such as text data 74 and edge displacement data 76 from the terminal device 2α.
Comparing FIG. 6B with FIG. 6A shows that the model Y of the terminal device 2β has nothing equivalent to the displacement data generation unit 232 of the model X. That is, unlike the model X, the image generation unit 242 of the model Y does not generate the animation of the face image HF from edge displacement data 76 that the terminal generated itself, but from the edge displacement data 76 received from the terminal device 2α. The other functions are the same as those of the model X.
[0102]
As shown in FIG. 6C, the model Z of the terminal device 2β is provided with a data reception unit 251, a data storage unit 252, an image output unit 253, a voice synthesis unit 254, and the like.
[0103]
The data receiving unit 251 receives text data 74 and image data 77 from the terminal device 2α. These data are stored in the data storage unit 252. The image output unit 253 outputs the animation of the face image HF to the display screen HG based on the image data 77. The voice synthesizer 254 outputs voice in accordance with the animation.
[0104]
Next, the flow of processing in the animation communication system 1 will be described with reference to a flowchart. FIG. 21 is a flowchart for explaining the processing flow of the terminal device 2α on the transmission side, and FIG. 22 is a flowchart for explaining the processing flow of the terminal device 2β on the reception side.
[0105]
As shown in FIG. 21, in the terminal device 2α on the transmission side, the message of the user of the terminal device 2α is encoded based on the code shape information 73 shown in FIG. 11 to generate code data 75 (# 11).
[0106]
The terminal device 2β that is the message destination is queried to determine the communication status, such as the model of the terminal device 2β, the data it holds, and the communication speed of the communication line 4 (# 12). When the terminal device 2β is determined to be a model that does not support the face model, that is, the model Z (No in # 13), an animation is created based on the code data 75 (# 14), and each time an appropriate length of the animation has been converted into image data, the image data 77 is transmitted to the terminal device 2β at an appropriate timing (# 15). However, since image data is generally large, the data may be thinned out according to the communication speed. Note that the process of step # 14 is the same as the series of processes of steps # 32 to # 34 of FIG. 22.
[0107]
When it is determined that the terminal device 2β is a model that supports the face model (Yes in # 13), it is determined whether or not the terminal device 2β has the standard model information 71 (# 16). When it does not have the standard model information 71 (No in # 16), the standard model information 71 and the face shape information 72 of the user of the terminal device 2α are transmitted to the terminal device 2β (# 17, # 18).
[0108]
When it has the standard model information 71 (Yes in # 16), it is determined whether or not the terminal device 2β has the face shape information 72 (# 19). If it is determined that it does not have the face shape information 72, the face shape information 72 is transmitted to the terminal device 2β (# 18).
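The decisions of steps # 16 to # 19 can be summarized in a small sketch. The function name `model_data_to_send` and the boolean flags are hypothetical names chosen for illustration; only the branching follows the flowchart as described.

```python
STANDARD_MODEL_INFO = "standard model information 71"
FACE_SHAPE_INFO = "face shape information 72"


def model_data_to_send(has_standard_model, has_face_shape):
    """Return the model data that the transmission side still has to send.

    Hypothetical helper mirroring steps # 16 to # 19 of the description.
    """
    if not has_standard_model:
        # No in # 16: both are transmitted (# 17, # 18).
        return [STANDARD_MODEL_INFO, FACE_SHAPE_INFO]
    if not has_face_shape:
        # Yes in # 16, face shape information missing in # 19: send it (# 18).
        return [FACE_SHAPE_INFO]
    # The destination already holds everything it needs.
    return []
```

Note that, as in the description, a destination without the standard model information receives the face shape information as well, without a separate check.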
[0109]
Next, it is determined whether or not the terminal device 2β is a model that supports the code data (# 20). If the terminal device 2β is a model that does not support the code data, that is, the model Y (No in # 20), shape change data (edge displacement data 76) of the face model DS ′ is generated from the code data 75 (# 21) and transmitted to the terminal device 2β (# 22).
[0110]
If the terminal device 2β is a model that supports code data (Yes in # 20), it is further determined whether the transmission mode is the code mode or the shape mode (# 23). The code mode means that the code data 75 is selected as the operation control data of the face model DS ′ transmitted to the terminal device 2β. The shape mode means that shape change data (edge displacement data 76) is selected.
[0111]
In the case of the code mode (Yes in # 23), the code data 75 generated in step # 11 is transmitted to the terminal device 2β (# 24). In the shape mode (No in # 23), shape change data (edge displacement data 76) of the face model DS ′ is generated from the code data 75 (# 21) and transmitted to the terminal device 2β (# 22). In step # 21, the displacement amount of each edge E may be adjusted in order to realize a finer motion animation.
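The selection of the data to transmit in steps # 13, # 20, and # 23 can be sketched as below. The enum `Payload`, the function `select_payload`, and its boolean parameters are assumed names for illustration only; the branching follows the flow described above.

```python
from enum import Enum


class Payload(Enum):
    IMAGE_DATA = "image data 77"
    EDGE_DISPLACEMENT = "edge displacement data 76"
    CODE_DATA = "code data 75"


def select_payload(supports_face_model, supports_code_data, code_mode):
    """Choose what the transmission side sends (hypothetical helper)."""
    if not supports_face_model:
        # No in # 13: model Z receives rendered image data (# 14, # 15).
        return Payload.IMAGE_DATA
    if not supports_code_data:
        # No in # 20: model Y receives edge displacement data (# 21, # 22).
        return Payload.EDGE_DISPLACEMENT
    # Yes in # 20: model X; the transmission mode decides (# 23).
    if code_mode:
        return Payload.CODE_DATA          # # 24
    return Payload.EDGE_DISPLACEMENT      # # 21, # 22
```

In this sketch the transmission mode only matters for a destination that supports both kinds of operation control data, which matches the model X case in the text.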
[0112]
On the other hand, as shown in FIG. 22, in the receiving terminal device 2β (model X), when the data received from the terminal device 2α is the code data 75 (Yes in # 31), edge displacement data 76 is generated based on the received code data 75 (# 32). A shape interpolation process is performed on the edge displacement data 76 received from the terminal device 2α or generated in step # 32 (# 33), and the face model DS ′ is sequentially projected onto a two-dimensional plane from a predetermined direction to generate the face image HF and execute the animation (# 34).
[0113]
In the case of model Y, steps # 31 and # 32 are omitted. In the case of the model Z, the processing of steps # 31 to # 33 is omitted, and animation is executed based on the image data 77 received from the terminal device 2α.
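The difference between the three models on the reception side can be sketched as a function that lists the steps each model runs. The function name `receiver_steps`, the string labels, and the label used for the model Z are assumptions made for this illustration.

```python
def receiver_steps(model, received_code_data=False):
    """List the reception-side steps run by each model (hypothetical helper).

    model is "X", "Y", or "Z"; received_code_data is only relevant for X.
    """
    if model == "X":
        steps = ["#31"]                  # check whether code data 75 arrived
        if received_code_data:
            steps.append("#32")          # generate edge displacement data 76
        steps += ["#33", "#34"]          # interpolate, then render and animate
        return steps
    if model == "Y":
        # Steps # 31 and # 32 are omitted: edge displacement data is received.
        return ["#33", "#34"]
    if model == "Z":
        # Steps # 31 to # 33 are omitted: the received image data is shown.
        return ["animate from image data 77"]
    raise ValueError(f"unknown model: {model!r}")
```

For example, `receiver_steps("X", received_code_data=True)` yields all four steps, while the model Z only plays back what it receives.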
[0114]
According to this embodiment, regardless of the model of the terminal device 2β that is the message transmission destination, data with which an animation can be executed in accordance with the message can be transmitted to the terminal device 2β.
[0115]
When the terminal device 2β supports a plurality of operation control data like the model X, the operation control data suitable for the purpose or the communication situation can be selected. For example, when it is desired to execute an animation of fine movement, the edge displacement data 76 can be selected as the operation control data, and the displacement amount can be set for each edge E. If it is desired to shorten the communication time by reducing the amount of communication data, the code data 75 can be selected.
[0116]
The functions of the terminal device 2 have been described separately for the message transmission side and the reception side, but both functions may be provided in the terminal device 2. As a result, messages can be exchanged bidirectionally while displaying each other's face animation.
[0117]
In the present embodiment, the message of the user of the terminal device 2α is transmitted as text data to the terminal device 2β, but it may be transmitted as voice data. In this case, the terminal device 2β may match the timing of outputting the received voice data with the timing of executing the animation.
[0118]
In the present embodiment, an example in which a single word such as “Kominkan” is transmitted as a message has been described. However, when a long document such as an e-mail is transmitted as a message, the message may be divided into appropriate lengths and a plurality of pieces of operation control data such as code data may be generated, or one piece of operation control data may be generated for the entire text. When messages are exchanged in real time, such as in a telephone conversation or a chat on a personal computer, operation control data may be generated for each sound or each word.
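One way to divide a long message into appropriate lengths is sketched below. The embodiment does not specify a division rule; splitting at sentence-ending punctuation and packing sentences up to a length limit, as well as the names `split_message` and `max_len`, are assumptions for illustration.

```python
import re


def split_message(text, max_len=40):
    """Split a message into chunks of whole sentences (hypothetical rule).

    Each chunk would then be turned into one piece of operation control data.
    A single sentence longer than max_len is kept whole.
    """
    # Split after sentence-ending punctuation (ASCII or full-width).
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    sentences = [p.strip() for p in parts if p.strip()]
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_len:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


chunks = split_message("Hello there. How are you? Fine.", max_len=20)
```

A single word such as "Kominkan" stays in one chunk, which corresponds to generating one piece of operation control data for the whole message.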
[0119]
As the model of the terminal device 2β, three models are illustrated according to the types of operation control data that can be handled, but there may be more models. For example, even models that correspond to the same operation control data may be identified as different models according to the manufacturer, communication company, or performance such as processing speed, memory capacity, or communication speed. In the present embodiment, two types of code data and edge displacement data are exemplified as the motion control data. However, other types of motion control data may be present.
[0120]
Although a three-dimensional shape model is used as the face model, a two-dimensional shape model may be used.
In addition, the overall configuration of the animation communication system 1 and the terminal devices 2α and 2β, the configuration of each unit, the processing content, the processing order, and the like can be changed as appropriate within the spirit of the present invention.
[0121]
[Effect of the invention]
According to the present invention, regardless of the model of a terminal device to which a message is transmitted, data capable of executing animation in accordance with the message can be transmitted to the terminal device.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a configuration of an animation communication system according to the present invention.
FIG. 2 is a diagram illustrating a configuration of a terminal device.
FIG. 3 is a diagram illustrating a program and data stored in a storage device of a transmission-side terminal device.
FIG. 4 is a diagram illustrating programs and data stored in a storage device of a terminal device on the receiving side.
FIG. 5 is a diagram illustrating a functional configuration of a terminal device on a transmission side.
FIG. 6 is a diagram illustrating a functional configuration of a terminal device on the reception side.
FIG. 7 is a diagram illustrating an example of a configuration of a standard model or a face model.
FIG. 8 is a diagram illustrating a correspondence relationship between edges and nodes.
FIG. 9 is a diagram illustrating constituent vertices affected by a node.
FIG. 10 is a diagram illustrating a range in which a node affects.
FIG. 11 is a diagram illustrating an example of code shape information.
FIG. 12 is a diagram illustrating an example when a standard model or a face model is changed to a shape of each shape group.
FIG. 13 is a flowchart illustrating a processing flow for generating a three-dimensional shape model.
FIG. 14 is a diagram illustrating an example of a standard model.
FIG. 15 is a flowchart illustrating a flow of deformation processing.
FIG. 16 is a diagram schematically illustrating a surface S of a standard model and a point P of three-dimensional measurement data.
FIG. 17 is a diagram for explaining a virtual spring for preventing abnormal deformation of a standard model.
FIG. 18 is a diagram illustrating an example of edge displacement data and code data.
FIG. 19 is a diagram for explaining a difference in model of a terminal device.
FIG. 20 is a diagram illustrating an example of a time table.
FIG. 21 is a flowchart illustrating a process flow of a transmission-side terminal device.
FIG. 22 is a flowchart for explaining the processing flow of the terminal device on the receiving side.
[Explanation of symbols]
1 Animation communication system (communication system)
2α terminal device (second terminal device)
2β terminal device (first terminal device)
21 Display device (display means)
22b Operation control data generation program (computer program)
29a-29c recording medium
205 Communication status discriminator (motion control data correspondence determining means, function determination means, model data presence/absence determination means, standard model data presence/absence determination means, edge displacement data correspondence determination means)
206 Data transmission unit (animation data transmission means, model data transmission means, standard model data transmission means)
211 Code generation unit (code data generation means)
212 Displacement data generator (shape change data generator)
233, 242 Image generation unit (display means)
4 Communication line
7 Model information (model data)
71 Standard model information (standard model data)
72 Face shape information (model deformation data)
73 Code shape information (positioning data)
74 Text data (message)
75 Code data (animation data)
76 Edge displacement data (animation data)
DS ′ Face model (model)

Claims

A terminal device that transmits a message input by a user to another terminal device via a communication line, the terminal device being characterized by having:
motion control data correspondence determining means for determining whether or not the other terminal device supports motion control data for generating an animation by changing the shape of a model; and
animation data transmission means for, when the motion control data correspondence determining means determines that the other terminal device supports the motion control data, generating, as animation data for an animation using the model, motion control data that changes the shape of the model for each phoneme of the message and transmitting it to the other terminal device, and, when it is determined that the other terminal device does not support the motion control data, transmitting, as the animation data, image data at each timing when the model is changed in accordance with the message.

A transmission method in a terminal device that transmits a message input by a user to another terminal device via a communication line, the transmission method being characterized by causing the terminal device to execute:
a process of determining whether or not the other terminal device supports motion control data for generating an animation by changing the shape of a model;
a process of, when it is determined that the other terminal device supports the motion control data, generating, as animation data for an animation using the model, motion control data that changes the shape of the model for each phoneme of the message and transmitting it to the other terminal device; and
a process of, when it is determined that the other terminal device does not support the motion control data, transmitting, as the animation data, image data at each timing when the model is changed in accordance with the message.

A computer program used in a computer that transmits a message input by a user to another terminal device via a communication line, the computer program being characterized by causing the computer to function as:
means for determining whether or not the other terminal device supports motion control data for generating an animation by changing the shape of a model;
means for, when it is determined that the other terminal device supports the motion control data, generating, as animation data for an animation using the model, motion control data that changes the shape of the model for each phoneme of the message and transmitting it to the other terminal device; and
means for, when it is determined that the other terminal device does not support the motion control data, transmitting, as the animation data, image data at each timing when the model is changed in accordance with each timing during the output of the message.