JP2019215502A - Server, sound data evaluation method, program, and communication system

Info

Publication number: JP2019215502A
Application number: JP2018113949A
Authority: JP
Inventors: 惇平三神; Jumpei Mikami; 真夕計田; Mayu Hakata; 和博寺山; Kazuhiro Terayama
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2018-06-14
Filing date: 2018-06-14
Publication date: 2019-12-19
Anticipated expiration: 2038-06-14
Also published as: JP7119615B2

Abstract

To provide a server that allows one teacher to teach speaking to a plurality of students in parallel.

SOLUTION: A server 50 can communicate with a plurality of communication terminals 10, 70A to 70C via a network, and comprises: communication means that transmits first sound data respectively to the plurality of second communication terminals 70A to 70C according to an instruction from the first communication terminal 10, and receives a plurality of pieces of second sound data respectively uttered by second users of the plurality of second communication terminals; and generation means that, on the basis of the first sound data and the second sound data acquired respectively from the plurality of second communication terminals, generates evaluations for the respective plurality of pieces of second sound data. The communication means transmits the evaluations for the respective plurality of pieces of second sound data obtained from the generation means to the first communication terminal of the first user different from the second users.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a server, a sound data evaluation method, a program, and a communication system.

In foreign language classes such as English conversation, and in announcer training classes, many students want to learn correct pronunciation and vocalization by actually speaking aloud and receiving feedback from a teacher. Such learning typically takes the form of the teacher and the students gathering in the same place, such as a classroom, where the teacher gives guidance on the sounds the students produce, using examples.

In recent years, arrangements in which a teacher and students at remote locations study pronunciation and vocalization over the Internet have also been put to practical use. For example, for foreign language classes, a sound education system that provides one-to-one instruction using a communication system of the kind used for video conferencing is already known.

A sound education system that assumes a plurality of students receive instruction from one teacher at the same time is also known (see, for example, Patent Document 1). Patent Document 1 discloses a system in which students answer set questions via their terminals, and the teacher checks the answers and instructs the plurality of students.

However, the conventional technique has a problem in that it is difficult for one teacher to instruct a plurality of students in parallel. When the teacher and the student are one-to-one, the teacher can listen to the student's utterance and give detailed guidance. When there are a plurality of students, however, having each student speak in turn while the teacher gives advice takes time. If the students speak at the same time, the time can be shortened, but because the object of evaluation is sound, it is difficult for the teacher to evaluate the utterances of a plurality of students simultaneously, and detailed guidance becomes difficult.

In view of the above problem, an object of the present invention is to provide a server that allows one teacher to instruct a plurality of students in vocalization in parallel.

In view of the above problem, the present invention is a server capable of communicating with a plurality of communication terminals via a network, the server comprising: communication means that, in accordance with an instruction from a first communication terminal operated by a first user, transmits first sound data to each of a plurality of second communication terminals, and receives a plurality of pieces of second sound data respectively uttered by second users of the plurality of second communication terminals; and generation means that generates an evaluation for each of the plurality of pieces of second sound data on the basis of the first sound data and the second sound data acquired from each of the plurality of second communication terminals, wherein the communication means transmits the evaluations obtained by the generation means to the first communication terminal.

A server can be provided that allows one teacher to instruct a plurality of students in vocalization in parallel.

An example of a diagram outlining the process by which the communication system evaluates students' voices.
An example of a schematic diagram of the communication system.
An example of a hardware configuration diagram of a terminal.
An example of a hardware configuration diagram of the management system.
An example of a software configuration diagram of a terminal.
An example of a functional block diagram of the terminals and the management system that constitute part of the communication system.
A diagram showing an example of an authentication management table, a terminal management table, and a group management table.
A diagram showing an example of a recording management table and a result management table.
An example of a sequence diagram showing the process by which a terminal logs in to the management system.
An example of a sequence diagram showing the procedure by which the management system evaluates students' voice data (a recorded file is output).
A diagram showing an example of teacher screen T1.
A diagram showing an example of teacher screens T2a and T2b.
A diagram showing an example of teacher screen T3.
A diagram showing an example of student screen S1.
A diagram showing an example of student screen S2.
A diagram showing an example of student screen S3.
An example of a sequence diagram showing the procedure by which the management system evaluates students' voice data (the teacher speaks).
An example of a diagram schematically illustrating speech recognition.
Another example of a schematic diagram of the communication system.
Another example of a schematic diagram of the communication system.
An example of a sequence diagram showing the process by which a terminal logs in to the management system.
A diagram showing a configuration example of a Web application on a terminal.

Hereinafter, as an example of an embodiment for carrying out the present invention, a communication system and a sound data evaluation method performed by the communication system will be described with reference to the drawings.

<Outline of processing>
FIG. 1 is an example of a diagram outlining the process by which the communication system 1 of the present embodiment evaluates students' voices. In FIG. 1, a terminal 10 used by a teacher and terminals 70A to 70C used by students (all of which correspond to the communication terminals described later) are connected to a management system 50 via a communication network so that they can communicate.
(1) First, the teacher utters a model voice (alternatively, a model voice registered in advance may be used). The model voice is transmitted to the terminals 70A to 70C via the management system 50, and the students can listen to it.
(2) Next, the students, following the teacher's voice as a model, speak almost simultaneously (overlapping for at least part of the time). The students' voices are transmitted to the management system 50.
(3) The management system 50 compares the model voice with each student's voice and evaluates how close each student's voice is to the model.
(4) The management system 50 transmits the evaluation results of all the students to the teacher's terminal 10. Because the terminal 10 displays the evaluation results of all the students, the teacher can check each student's result and give detailed guidance that improves each student's vocalization.
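
Steps (3) and (4) can be illustrated with a minimal sketch. It is not the implementation described in this publication: the names `Evaluation`, `text_similarity`, and `evaluate_all` are hypothetical, and the comparison metric (string similarity between recognized transcripts, computed with Python's `difflib`) is an assumption chosen only to make the example runnable. The sketch shows one model utterance being compared with every student's attempt, with all results collected for the teacher's terminal.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List


@dataclass
class Evaluation:
    terminal_id: str  # e.g. "70A" (hypothetical identifier format)
    score: float      # 0.0 (nothing in common) to 1.0 (identical)


def text_similarity(model: str, attempt: str) -> float:
    """Assumed placeholder metric: similarity of two recognized transcripts."""
    return SequenceMatcher(None, model.lower(), attempt.lower()).ratio()


def evaluate_all(
    model: str,
    attempts: Dict[str, str],
    metric: Callable[[str, str], float] = text_similarity,
) -> List[Evaluation]:
    """Step (3): score each student's attempt against the model.
    Step (4): the returned list is what would be sent to terminal 10."""
    return [
        Evaluation(terminal_id, round(metric(model, attempt), 2))
        for terminal_id, attempt in sorted(attempts.items())
    ]


if __name__ == "__main__":
    model_text = "She sells seashells"
    student_texts = {
        "70A": "she sells seashells",
        "70B": "she sells sea shelves",
        "70C": "he sells shells",
    }
    for evaluation in evaluate_all(model_text, student_texts):
        print(evaluation.terminal_id, evaluation.score)
```

Passing the metric as a parameter keeps the sketch independent of how the sounds are actually compared; any function that maps a model and an attempt to a score could be substituted.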

As described above, in the communication system 1 of the present embodiment, even when a teacher at a remote location instructs a plurality of students in vocalization, the management system 50 compares the teacher's voice with each student's voice, so one teacher can instruct many students in vocalization in parallel.

<Terminology>
A conference originally means meeting to deliberate or gathering to talk, but in the present embodiment the participants talk over the Internet, so they do not need to gather in the same place. A session held for sound evaluation and instruction is also a form of conference.

Sound data may be anything obtained by converting air vibration into an electric signal. The present embodiment is described using voice data uttered by a person as an example. The communication system 1 is described as teaching English vocalization as the predetermined language, but it can also be applied to predetermined languages other than English (Chinese, French, Spanish, Russian, German, Arabic, and so on). The sound data is not limited to voice and may be sound produced by a musical instrument, for example in a music class.

Strictly speaking, sound is the state in which air vibrates, whereas sound data is information or data; in the present embodiment, however, the two are not strictly distinguished.

Instructing in parallel means that, even when a plurality of students participate in one video conference and the time periods in which they produce sound overlap, instruction equivalent to a one-to-one lesson is possible. Instructing in parallel may also be referred to as "instructing simultaneously."

<System configuration example>
FIG. 2 is a schematic diagram of the communication system 1 according to the present embodiment. As shown in FIG. 2, the communication system 1 is made up of a plurality of communication terminals 10a, 10b, 10c, of which a video conference terminal is one example, a plurality of communication terminals 70x, 70y, of which a smartphone is one example, a communication management system 50 that manages each communication terminal, and a relay device 30. Hereinafter, a "communication terminal" is referred to as a "terminal", and the "communication management system" is referred to as the "management system".

FIG. 2 shows three terminals 10a, 10b, 10c and two terminals 70x, 70y, but the numbers are not limited to these. An arbitrary one of the terminals 70x, 70y is referred to as "terminal 70", and an arbitrary one of the terminals 10a, 10b, 10c is referred to as "terminal 10". The terminals 10 and 70 may each be a general-purpose computer having a communication function, an electronic whiteboard, a car navigation terminal, an electronic signboard (digital signage), or the like.

The management system 50 is an information processing apparatus having a server function, that is, a so-called server (corresponding to the server in the claims). The terminals 10 and 70 are computers having the corresponding client function. The terminals 10 and 70, the relay device 30, and the management system 50 can communicate with one another via a communication network 2 such as the Internet, a mobile phone network, a LAN (Local Area Network), WiFi (Wireless Fidelity), or Bluetooth (registered trademark). The communication network 2 also includes a base station 2a at the edge of the mobile phone network. Although FIG. 2 shows one base station 2a, the number is not limited to this.

The relay device 30 relays content data such as sound data (including voice data), video (image) data, and text data between the terminals 10 and 70.

The terminal 10 is placed, as an example of a site, in the room where the teacher is (a home, a classroom, or the like), and the terminals 70 are placed, as an example of a site, in each student's home or the like. The teacher's site and the students' sites may be anywhere. The teacher may use a terminal 70, and a student may use a terminal 10. For convenience of explanation, the present embodiment is described on the assumption that the teacher uses the terminal 10 and the students use the terminals 70.

The management system 50, the relay device 30, and the terminals 10 and 70 may be located in the same country or region, or in different countries or regions. The user of the terminal 10 is, for example, a teacher, and the user of a terminal 70 is, for example, a student.

The communication system 1 includes a data providing system that transmits content data in one direction from one terminal to another via the management system 50, and a communication system that conveys information, emotions, and the like mutually among a plurality of terminals via the management system. Examples of the latter, a system for mutually conveying information, emotions, and the like among a plurality of terminals via the management system, include a video conference system and a videophone system.

<Example of hardware configuration>
<< Example of hardware configuration of terminal >>
FIG. 3A is a hardware configuration diagram of the terminal 10 according to one embodiment. The terminal 10 includes a CPU 101 (Central Processing Unit) that controls the overall operation of the terminal 10, a ROM 102 (Read Only Memory) that stores programs used to drive the CPU 101, such as an IPL (Initial Program Loader), a RAM 103 (Random Access Memory) used as a work area for the CPU 101, and a flash memory 104 that stores various data such as the program for the terminal 10, image data, and sound data. The terminal 10 also includes an SSD 105 (Solid State Drive) that controls reading and writing of various data from and to the flash memory 104 under the control of the CPU 101, and a media I/F 107 (Interface) that controls reading and writing (storage) of data from and to a recording medium 106 such as a flash memory or an IC card (Integrated Circuit Card). It further includes operation buttons 108 that are operated, for example, when selecting a destination of the terminal 10, a power switch 109 for switching the power of the terminal 10 on and off, and a network I/F 111 for transmitting data over the communication network 2.

The terminal 10 also includes a built-in camera 112 that captures an image of a subject under the control of the CPU 101 to obtain image data, an image sensor I/F 113 that controls driving of the camera 112, a built-in microphone 114 that inputs sound, a built-in speaker 115 that outputs sound, and a sound input/output I/F 116 that processes input and output of sound signals to and from the microphone 114 and the speaker 115 under the control of the CPU 101. It further includes a display I/F 117 that transmits image data to an external display 120 under the control of the CPU 101, an external device connection I/F 118 for connecting various external devices, an alarm lamp 119 that indicates abnormalities in the various functions of the terminal 10, and a bus line 110, such as an address bus and a data bus, for electrically connecting the above components as shown in FIG. 3A.

The display 120 is a display device that displays images of the subject and the like. Examples of the display 120 include liquid crystal and organic EL (Organic Electroluminescence) displays. The display 120 is connected to the display I/F 117 by a cable 120c. The cable 120c may be a cable for analog RGB (VGA) signals, a cable for component video, or a cable for HDMI (registered trademark) (High-Definition Multimedia Interface) or DVI (Digital Visual Interface) signals.

The camera 112 includes a lens and a solid-state image sensor that converts light into electric charge to digitize an image (video) of the subject. A CMOS (Complementary Metal Oxide Semiconductor) sensor, a CCD (Charge Coupled Device) sensor, or the like is used as the solid-state image sensor.

External devices such as an external camera, an external microphone, and an external speaker can each be electrically connected to the external device connection I/F 118 by a USB (Universal Serial Bus) cable or the like inserted into the connection port 1132 of the housing 1100. When an external camera is connected, the external camera is driven under the control of the CPU 101. Likewise, when an external microphone or an external speaker is connected, the external microphone or external speaker is driven under the control of the CPU 101.

The recording medium 106 is removable from the terminal 10. Any non-volatile memory that reads or writes data under the control of the CPU 101 may be used; it is not limited to the flash memory 104, and an EEPROM (Electrically Erasable and Programmable ROM) or the like may be used instead.

FIG. 3B is a hardware configuration diagram of the terminal 70. As shown in FIG. 3B, the terminal 70 includes a CPU 701 that controls the overall operation of the terminal 70, a ROM (Read Only Memory) 702 that stores programs, a RAM 703 used as a work area for the CPU 701, and an EEPROM (Electrically Erasable and Programmable ROM) 704 from and to which data is read and written under the control of the CPU 701. It also includes a media I/F 707 that controls reading and writing (storage) of data from and to a recording medium 706 such as a flash memory, and a CMOS (Complementary Metal Oxide Semiconductor) sensor 712 that captures an image of a subject under the control of the CPU 701 to obtain image data.

The EEPROM 704 stores the operating system (OS) executed by the CPU 701, other programs, and various data. The CMOS sensor 712 is a solid-state image sensor that converts light into electric charge to digitize an image of the subject. As long as the subject can be imaged, the CMOS sensor may be replaced with a CCD (Charge Coupled Device) sensor.

The terminal 70 further includes a microphone 714 that converts sound into a sound signal, a speaker 715 that converts a sound signal into sound, an antenna 711a, a communication unit 711 that uses the antenna 711a to communicate with the nearest base station 2a by wireless communication signals, a display 720, such as a liquid crystal or organic EL display, that displays images of the subject, various icons, and the like, a touch panel 721 that is mounted on the display 720, is formed of a pressure-sensitive or electrostatic panel, and detects the position touched on the display 720 by a finger, a stylus, or the like, and a bus line 710, such as an address bus and a data bus, for electrically connecting the above units.

<< Hardware configuration example of management system >>
FIG. 4 is a hardware configuration diagram of the management system 50 according to one embodiment. The management system 50 includes a CPU 201 that controls the overall operation of the management system 50, a ROM 202 that stores programs used to drive the CPU 201, such as an IPL, a RAM 203 used as a work area for the CPU 201, and an HD 204 that stores various data such as the program for the management system 50. It also includes an HDD 205 (Hard Disk Drive) that controls reading and writing of various data from and to the HD 204 under the control of the CPU 201, a media I/F 207 that controls reading and writing (storage) of data from and to a recording medium 206 such as a flash memory, and a display 208 that displays various information such as a cursor, menus, windows, characters, and images. It further includes a network I/F 209 for data communication over the communication network 2, a keyboard 211 having a plurality of keys for inputting characters, numerical values, various instructions, and the like, a mouse 212 for selecting and executing various instructions, selecting a processing target, moving the cursor, and the like, and a CD-ROM drive 214 that controls reading and writing of various data from and to a CD-ROM 213 (Compact Disc Read Only Memory) as an example of a removable recording medium. Finally, it includes a bus line 210, such as an address bus and a data bus, for electrically connecting the above components as shown in FIG. 4.

The relay device 30 has the same hardware configuration as the management system 50, so its description is omitted.

<Example of software configuration>
FIG. 5A is a software configuration diagram of the terminal 10 according to one embodiment. A "communication application A1" is installed on the terminal 10 as a client application. Here, "application" means application software. As shown in FIG. 5A, the OS 1020 and the communication application A1 run in the work area 1010 of the RAM 103 of the terminal 10. Of these, the OS 1020 is basic software that provides basic functions and manages the terminal 10 as a whole. The communication application A1 is an application that runs on the terminal 10 used by the teacher and communicates with the other terminals 70.

FIG. 5B is a software configuration diagram of the terminal 70 according to one embodiment. A "communication application A7" is installed on the terminal 70 as a client application. As shown in FIG. 5B, the OS 7020 and the communication application A7 run in the work area 7010 of the RAM 703. Of these, the OS 7020 is basic software that provides basic functions and manages the terminal 70 as a whole. The communication application A7 is an application that runs on the terminal 70 used by a student and communicates with the terminal 10.

Examples of the communication protocol of the communication applications A1 and A7 include (1) SIP (Session Initiation Protocol), (2) H.323, (3) a protocol that extends SIP, (4) an instant messenger protocol, (5) a protocol that uses the SIP MESSAGE method, (6) the Internet Relay Chat protocol (IRC), and (7) a protocol that extends an instant messenger protocol. Of these, (4) the instant messenger protocol is, for example, (4-1) XMPP (Extensible Messaging and Presence Protocol), or (4-2) a protocol used by ICQ (registered trademark), AIM (registered trademark), Skype (registered trademark), or the like. An example of (7), a protocol that extends an instant messenger protocol, is Jingle.

<About communication system functions>
Next, the functional configuration of the communication system 1 of the present embodiment will be described with reference to FIG. 6. FIG. 6 is a functional block diagram of the terminals 10 and 70 and the management system 50, which constitute part of the communication system 1. In FIG. 6, the terminals 10 and 70 and the management system 50 are connected so that they can perform data communication via the communication network 2.

<Functional configuration of terminal>
The terminals 10 and 70 include transmission/reception units 11 and 71, operation input reception units 12 and 72, activation units 13 and 73, output control units 14 and 74, and storage/readout units 19 and 79. Each of these units is a function realized when any of the components shown in FIG. 3 operates in response to an instruction from the CPU 101 or 701 in accordance with the communication application A1 or A7 (program) loaded from the flash memory 104 or the EEPROM 704 onto the RAM 103 or 703.

The terminals 10 and 70 also include storage units 1000 and 7000, respectively, which are built from the ROMs 102 and 702, the RAMs 103 and 703, and the flash memory 104 or the EEPROM 704 shown in FIG. 3.

端末１０における各機能構成について詳細に説明する。送受信部１１，７１は、ＣＰＵ１０１，７０１からの命令、及びネットワークＩ／Ｆ１１１又は通信部７１１によって実現され、通信ネットワーク２を介して、通信相手の端末、各装置又はシステム等と各種データ（又は情報）の送受信を行う。 Each functional configuration in the terminal 10 will be described in detail. The transmission / reception units 11 and 71 are realized by instructions from the CPUs 101 and 701 and by the network I / F 111 or the communication unit 711, and transmit and receive various data (or information) to and from the communication partner's terminal, each device, system, or the like via the communication network 2.

操作入力受付部１２，７２は、ＣＰＵ１０１，７０１からの命令、並びに操作ボタン１０８又はタッチパネル７２１、並びに電源スイッチ１０９によって実現され、ユーザによる各種入力又は各種選択を受け付ける。 The operation input receiving units 12 and 72 are realized by commands from the CPUs 101 and 701, the operation buttons 108 or the touch panel 721, and the power switch 109, and receive various inputs or various selections by the user.

起動部１３，７３は、ＣＰＵ１０１，７０１からの命令によって実現され、通信アプリＡ１，Ａ７の動作を起動する。 The activation units 13 and 73 are implemented by instructions from the CPUs 101 and 701, and activate the operations of the communication applications A1 and A7.

出力制御部１４，７４は、ＣＰＵ１０１，７０１からの命令、並びに、ディスプレイＩ／Ｆ１１７及び音入出力Ｉ／Ｆ１１６によって実現され、画像データ、及び音データの出力を制御する。 The output control units 14 and 74 are realized by instructions from the CPUs 101 and 701, the display I / F 117 and the sound input / output I / F 116, and control output of image data and sound data.

記憶・読出部１９，７９は、ＣＰＵ１０１，７０１からの命令によって実現され、記憶部１０００，７０００に各種データを記憶したり、記憶部１０００，７０００に記憶された各種データを読み出したりする処理を行う。 The storage / readout units 19 and 79 are realized by instructions from the CPUs 101 and 701, and perform processing of storing various data in the storage units 1000 and 7000 and reading various data stored in the storage units 1000 and 7000. .

＜＜管理システムの機能構成＞＞
管理システム５０は、送受信部５１、認証部５２、管理部５３、録音管理部５４、解析・結果管理部５５、セッション制御部５８、及び記憶・読出部５９を有している。これら各部は、図４に示されている各構成要素のいずれかが、ＨＤ２０４からＲＡＭ２０３上に展開された管理システム５０用のプログラムに従ったＣＰＵ２０１からの命令によって動作することで実現される機能である。また、管理システム５０は、ＨＤ２０４により構築される記憶部５０００を有している。更に、記憶部５０００には、以下に示すような各テーブルによって各ＤＢが構築される。 << Functional configuration of management system >>
The management system 50 includes a transmission / reception unit 51, an authentication unit 52, a management unit 53, a recording management unit 54, an analysis / result management unit 55, a session control unit 58, and a storage / readout unit 59. Each of these units is a function realized when any of the components shown in FIG. 4 operates according to instructions from the CPU 201, following the program for the management system 50 loaded from the HD 204 onto the RAM 203. The management system 50 also has a storage unit 5000 constructed by the HD 204. Further, in the storage unit 5000, each DB is constructed by the tables described below.

（認証管理テーブル）
図７（ａ）は、認証管理テーブルを示す概念図である。記憶部５０００には、図７（ａ）に示されているような認証管理テーブルによって認証管理ＤＢ５００１が構築されている。この認証管理テーブルでは、管理システム５０によって管理される全ての端末１０，７０の各通信ＩＤに対して、認証用のパスワードが関連付けられて管理される。 (Authentication management table)
FIG. 7A is a conceptual diagram illustrating an authentication management table. In the storage unit 5000, an authentication management DB 5001 is constructed by an authentication management table as shown in FIG. 7A. In this authentication management table, a password for authentication is managed in association with the communication ID of every terminal 10, 70 managed by the management system 50.

（端末管理テーブル）
図７（ｂ）は、端末管理テーブルを示す概念図である。記憶部５０００には、図７（ｂ）に示されているような端末管理テーブルによって端末管理ＤＢ５００２が構築されている。この端末管理テーブルでは、各端末１０，７０の通信ＩＤ毎に、各端末１０，７０のＩＰアドレス及びＴＹＰＥが関連付けられて管理される。各端末１０，７０の通信ＩＤは各端末１０，７０のログインにより把握される。ＴＹＰＥについては後述されるが、ＴＹＰＥはログイン時などに登録されてもよいし、予め登録されていてもよい。 (Terminal management table)
FIG. 7B is a conceptual diagram illustrating a terminal management table. In the storage unit 5000, a terminal management DB 5002 is constructed by a terminal management table as shown in FIG. 7B. In this terminal management table, the IP address and TYPE of each terminal 10, 70 are managed in association with the communication ID of that terminal. The communication ID of each terminal 10, 70 becomes known when the terminal logs in. The TYPE will be described later; it may be registered at login or the like, or may be registered in advance.

（グループ管理テーブル）
図７（ｃ）は、グループ管理テーブルを示す概念図である。記憶部５０００には、図７（ｃ）に示されているようなグループ管理テーブルによってグループ管理ＤＢ５００５が構築されている。このグループ管理テーブルでは、グループＩＤに対応付けて、各端末１０，７０の通信ＩＤが関連付けられて管理される。グループとは、１人の教師、及び、この教師が指導する複数の生徒の集まりである。グループＩＤはグループを識別する識別情報である。１つのグループは同じ会議（コミュニケーションの一例）に参加して、グループの端末１０，７０の間でコンテンツデータが相互に送受信される。 (Group management table)
FIG. 7C is a conceptual diagram showing a group management table. In the storage unit 5000, a group management DB 5005 is constructed by a group management table as shown in FIG. 7C. In this group management table, the communication IDs of the terminals 10 and 70 are managed in association with a group ID. A group consists of one teacher and the plurality of students taught by that teacher. The group ID is identification information for identifying a group. The members of one group participate in the same conference (an example of communication), and content data is mutually transmitted and received between the terminals 10 and 70 of the group.

グループ管理テーブルの作成方法としては、会議が始まる前に例えば教師が管理システム５０にアクセスして登録しておく方法がある。あるいは、予めグループ管理テーブルを作る必要がない方法として、同じ会議に参加する生徒に管理システム５０が電子メールなどで同じ招待コードを配布しておき、生徒がログイン時に招待コードを入力し、管理システム５０が同じ招待コードの生徒を同じグループに分類してもよい（事後的にグループ管理テーブルを作成する）。あるいは、予め生徒を言語能力でレベル分けしておき（レベルは認証管理テーブルに登録されているものとする）、管理システム５０が同じレベルの生徒をログイン順に同じグループに分類してもよい。例えば１つのグループの生徒数を３人とした場合、同じレベルの３人がログインするごとにグループＩＤが割り当てられる。この場合、各グループに教師が割り当てられ、会議が開始する。 One method of creating the group management table is for the teacher, for example, to access the management system 50 and register the group before the conference starts. Alternatively, as a method that does not require creating the group management table in advance, the management system 50 may distribute the same invitation code by e-mail or the like to the students participating in the same conference, each student may enter the invitation code at login, and the management system 50 may classify students having the same invitation code into the same group (the group management table is created after the fact). Alternatively, the students may be divided into levels by language ability in advance (the levels are assumed to be registered in the authentication management table), and the management system 50 may classify students of the same level into the same group in login order. For example, if the number of students in one group is three, a group ID is assigned every time three students of the same level log in. In this case, a teacher is assigned to each group, and the conference starts.
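The two after-the-fact grouping methods described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `G001`-style group ID format, and the sample IDs are assumptions introduced here, and the assignment of a teacher to each group is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_invitation_code(logins: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group students who entered the same invitation code at login.

    logins: (communication ID, invitation code) pairs in login order.
    Returns invitation code -> communication IDs.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for communication_id, code in logins:
        groups[code].append(communication_id)
    return dict(groups)


def group_by_level(logins: List[Tuple[str, str]], group_size: int = 3) -> Dict[str, List[str]]:
    """Assign a group ID each time `group_size` students of one level have logged in.

    logins: (communication ID, level) pairs in login order.
    Returns group ID -> communication IDs; students still waiting are not returned.
    """
    waiting: Dict[str, List[str]] = defaultdict(list)
    groups: Dict[str, List[str]] = {}
    for communication_id, level in logins:
        waiting[level].append(communication_id)
        if len(waiting[level]) == group_size:
            group_id = f"G{len(groups) + 1:03d}"  # hypothetical ID format
            groups[group_id] = waiting.pop(level)
    return groups
```

With a group size of three, a fourth student of the same level simply waits until two more students of that level log in.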

（録音管理テーブル）
図８（ａ）は、録音管理テーブルを示す概念図である。記憶部５０００には、図８（ａ）に示されているような録音管理テーブルによって録音管理ＤＢ５００３が構築されている。録音管理テーブルは録音された音声データに関する情報が登録されるテーブルである。 (Recording management table)
FIG. 8A is a conceptual diagram showing a recording management table. In the storage unit 5000, a recording management DB 5003 is constructed by a recording management table as shown in FIG. 8A. The recording management table is a table in which information on recorded audio data is registered.

録音管理テーブルには、録音ＩＤ、ＴＹＰＥ、通信ＩＤ、及び、録音ファイルの各項目が関連付けられている。録音ＩＤは教師又は生徒の音声データの録音ファイルを一意に特定又は識別する識別情報であり、録音のたびに管理システム５０が採番する。図８（ａ）では通信ＩＤ、会議ＩＤ、問題番号（後述される）、及び、リトライ番号（後述される）を連結したものが録音ＩＤである。 Each item of the recording ID, the TYPE, the communication ID, and the recording file is associated with the recording management table. The recording ID is identification information for uniquely specifying or identifying a recording file of voice data of a teacher or a student, and is assigned by the management system 50 every time recording is performed. In FIG. 8A, a recording ID is a combination of a communication ID, a conference ID, a problem number (described later), and a retry number (described later).

ＴＹＰＥはその録音ファイルが、生徒（STUDENT）が発声した音声、教師(TEACHER)が発声した音声、又は、事前に録音された(PRESETされた)音声のいずれであるかを示す録音ファイルの属性である。例えば通信アプリＡ１，Ａ７がＴＹＰＥを管理システム５０に送信することにより判断される。通信ＩＤは上記と同様であるが、事前に録音された音声の場合は通信ＩＤがＮＵＬＬ（値がないという意味）となる。録音ファイル名は実際に録音された音声データのファイル名である。ファイル名は管理システム５０が自動的に付与するが、図８（ａ）では録音ＩＤをファイル名としている。なお、録音管理テーブルの１行をレコードという。 TYPE is an attribute of a recording file indicating whether the recording file is a voice uttered by a student (STUDENT), a voice uttered by a teacher (TEACHER), or a voice recorded in advance (PRESET). The TYPE is determined, for example, by the communication applications A1 and A7 transmitting the TYPE to the management system 50. The communication ID is the same as described above, but for a voice recorded in advance the communication ID is NULL (meaning there is no value). The recording file name is the file name of the actually recorded audio data. The file name is assigned automatically by the management system 50; in FIG. 8A, the recording ID is used as the file name. One row of the recording management table is called a record.
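As a rough sketch of one record of the recording management table, the structure below joins the communication ID, conference ID, question number, and retry number into a recording ID and reuses it as the file name. The underscore separator, the field names, and the sample values are assumptions made for illustration, since the exact format of FIG. 8A is not reproduced in the text.

```python
from dataclasses import dataclass
from typing import Optional


def make_recording_id(communication_id: str, conference_id: str,
                      question_no: int, retry_no: int) -> str:
    """Concatenate the four parts into a recording ID (separator is an assumption)."""
    return f"{communication_id}_{conference_id}_{question_no}_{retry_no}"


@dataclass
class RecordingRecord:
    recording_id: str
    type: str                        # "STUDENT", "TEACHER", or "PRESET"
    communication_id: Optional[str]  # None (NULL) for pre-recorded audio
    file_name: str


def new_student_record(communication_id: str, conference_id: str,
                       question_no: int, retry_no: int) -> RecordingRecord:
    """Build the record for a student's answer; the file name reuses the recording ID."""
    recording_id = make_recording_id(communication_id, conference_id, question_no, retry_no)
    return RecordingRecord(recording_id, "STUDENT", communication_id, recording_id)
```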

（結果管理テーブル）
図８（ｂ）は、結果管理テーブルを示す概念図である。記憶部５０００には、図８（ｂ）に示されているような結果管理テーブルによって結果管理ＤＢ５００４が構築されている。結果管理テーブルは、お手本の音声と比較して生徒の音声がどのように評価されたかをまとめたテーブルである。 (Result management table)
FIG. 8B is a conceptual diagram illustrating a result management table. In the storage unit 5000, a result management DB 5004 is constructed by a result management table as shown in FIG. 8B. The result management table is a table summarizing how each student's voice was evaluated in comparison with the model voice.

結果管理テーブルには、手本録音ＩＤ、回答録音ＩＤ、会議ＩＤ、問題番号、リトライ番号、手本認識結果、回答認識結果、及び、採点結果の各項目が関連付けられている。手本録音ＩＤと回答録音ＩＤは録音管理ＤＢ５００３に登録されているＩＤが転用される。手本録音ＩＤのＴＹＰＥはPRESET又はTEACHERのいずれかであり、回答録音ＩＤのＴＹＰＥはSTUDENTである。会議ＩＤはその会議が終了するまでを一単位として一意にどの会議であるかを決定する識別情報である。会議ＩＤは会議開始時に管理システム５０が採番する。問題番号は、１つの会議で教師が出題した問題を識別する識別情報である。問題番号は出題のたびに管理システム５０が採番する。リトライ番号は、同じ問題が再度、出題された場合の問題に対する枝番である。手本認識結果は、お手本の音声データの認識結果である。図８（ｂ）では一例として発音記号に変換されたものが認識結果となっている。回答認識結果は、生徒の音声データの認識結果である。図８（ｂ）では一例として発音記号に変換されたものが認識結果となっている。採点結果は、手本認識結果と回答認識結果の比較結果を単語ごと「○」「×」で示す。なお、結果管理テーブルの１行をレコードという。 In the result management table, items of sample recording ID, answer recording ID, conference ID, question number, retry number, sample recognition result, answer recognition result, and scoring result are associated. As the sample recording ID and the answer recording ID, the ID registered in the recording management DB 5003 is diverted. The type of the sample recording ID is either PRESET or TEACHER, and the type of the answer recording ID is STUDENT. The conference ID is identification information for uniquely determining which conference is a unit until the conference ends. The conference ID is assigned by the management system 50 at the start of the conference. The question number is identification information for identifying a question given by a teacher in one meeting. The question number is assigned by the management system 50 every time the question is asked. The retry number is a branch number for a question when the same question is re-issued. The model recognition result is a recognition result of the model voice data. In FIG. 8B, as an example, the result of recognition is converted to phonetic symbols. The answer recognition result is a recognition result of the voice data of the student. In FIG. 8B, as an example, the result of recognition is converted to phonetic symbols. In the scoring result, the comparison result between the model recognition result and the answer recognition result is indicated by “O” or “X” for each word. 
One row of the result management table is called a record.
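The per-word scoring result could be produced along the following lines. This is a hedged sketch only: it assumes the two recognition results are already split into per-word phonetic strings and compares them position by position, whereas the actual comparison method is described later in the specification. The function name and sample strings are hypothetical.

```python
from typing import List


def score_words(model_words: List[str], answer_words: List[str]) -> List[str]:
    """Return "○" or "×" for each word of the model recognition result.

    A model word with no corresponding answer word is marked "×".
    """
    marks: List[str] = []
    for index, expected in enumerate(model_words):
        actual = answer_words[index] if index < len(answer_words) else None
        marks.append("○" if actual == expected else "×")
    return marks
```

A positional comparison like this is the simplest choice; an alignment-based comparison would be more tolerant of inserted or dropped words.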

＜＜管理システムの各機能構成＞＞
次に、管理システム５０の各機能構成について詳細に説明する。送受信部５１は、ＣＰＵ２０１からの命令、及びネットワークＩ／Ｆ２０９によって実現され、通信ネットワーク２を介して各端末、装置又はシステムと各種データ（又は情報）の送受信を行う。 << Functional Configuration of Management System >>
Next, each functional configuration of the management system 50 will be described in detail. The transmission / reception unit 51 is realized by a command from the CPU 201 and the network I / F 209, and transmits / receives various data (or information) to / from each terminal, device, or system via the communication network 2.

認証部５２は、ＣＰＵ２０１からの命令によって実現され、送受信部５１で受信された通信ＩＤ及びパスワードを検索キーとして認証管理テーブルを検索し、この認証管理テーブルに同一の通信ＩＤ及びパスワードが管理されているかを判断することによって端末の認証を行う。 The authentication unit 52 is realized by instructions from the CPU 201. It searches the authentication management table using the communication ID and password received by the transmission / reception unit 51 as a search key, and authenticates the terminal by determining whether the same communication ID and password are managed in the authentication management table.
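The lookup performed by the authentication unit 52 can be illustrated with a small sketch. The in-memory dictionary stands in for the authentication management table, and the sample IDs and passwords are invented; storing plain-text passwords is done here only to mirror the table as described and is not a recommendation.

```python
from typing import Dict

# Hypothetical stand-in for the authentication management table:
# communication ID -> password (sample values are invented).
AUTH_TABLE: Dict[str, str] = {"T001": "aaaa", "S001": "bbbb"}


def authenticate(communication_id: str, password: str) -> bool:
    """Return True only if the same ID/password pair is managed in the table."""
    stored = AUTH_TABLE.get(communication_id)
    return stored is not None and stored == password
```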

管理部５３は、ＣＰＵ２０１からの命令によって実現され、端末管理テーブルにログイン中の端末１０、７０を登録することで各端末１０，７０を管理する。 The management unit 53 is realized by an instruction from the CPU 201, and manages the terminals 10, 70 by registering the logged-in terminals 10, 70 in the terminal management table.

録音管理部５４は、ＣＰＵ２０１からの命令によって実現され、教師の操作を契機にして録音を開始し、教師又は生徒の操作を契機にして録音を終了する。開始から終了の間の音声データを録音し、録音管理ＤＢ５００３に各項目を登録する。 The recording management unit 54 is realized by a command from the CPU 201, and starts recording upon an operation of a teacher, and ends recording upon an operation of a teacher or a student. Voice data from the start to the end is recorded, and each item is registered in the recording management DB 5003.

解析・結果管理部５５は、ＣＰＵ２０１からの命令によって実現され、録音された教師と生徒の音声データを比較・解析し、その結果を結果管理ＤＢ５００４に記録する。 The analysis / result management unit 55 is realized by an instruction from the CPU 201, compares and analyzes the recorded teacher and student voice data, and records the result in the result management DB 5004.

セッション制御部５８は、ＣＰＵ２０１からの命令によって、端末１０，７０間でコンテンツデータを送信するためのセッションを制御する。この制御としては、セッションを確立するための制御、確立されたセッションに端末１０，７０参加させる制御、セッションから退出する制御等が含まれる。 The session control unit 58 controls a session for transmitting content data between the terminals 10 and 70 according to a command from the CPU 201. This control includes control for establishing a session, control for joining the terminals 10 and 70 to the established session, control for exiting from the session, and the like.

記憶・読出部５９は、ＣＰＵ２０１からの命令及びＨＤＤ２０５によって実現され、又はＣＰＵ２０１からの命令によって実現され、記憶部５０００に各種データを記憶したり、記憶部５０００に記憶された各種データを抽出したりする処理を行う。 The storage / readout unit 59 is realized by instructions from the CPU 201 and the HDD 205, or by instructions from the CPU 201 alone, and performs processing of storing various data in the storage unit 5000 and extracting various data stored in the storage unit 5000.

＜通信システムの処理・動作＞
続いて、通信システム１における処理及び動作について説明する。 <Processing and operation of communication system>
Subsequently, processing and operation in the communication system 1 will be described.

＜＜ログイン時の処理＞＞
まず、図９を用いて、端末１０，７０が管理システム５０へログインする処理を説明する。図９は、端末１０，７０が管理システム５０へログインする処理を示すシーケンス図の一例である。 << Login processing >>
First, a process in which the terminals 10 and 70 log in to the management system 50 will be described with reference to FIG. 9. FIG. 9 is an example of a sequence diagram illustrating a process in which the terminals 10 and 70 log in to the management system 50.

端末７０のユーザが電源スイッチをＯＮにすると、操作入力受付部７２が電源ＯＮを受け付けて、端末７０を起動させる（ステップＳ１）。端末７０が起動すると、起動部７３は、端末７０にインストールされている通信アプリＡ７を起動させる（ステップＳ２）。以下、端末７０における処理は、通信アプリＡ７の命令により実行される。 When the user of the terminal 70 turns on the power switch, the operation input receiving unit 72 receives the power ON and activates the terminal 70 (step S1). When the terminal 70 is activated, the activation unit 73 activates the communication application A7 installed in the terminal 70 (Step S2). Hereinafter, the process in the terminal 70 is executed by a command of the communication application A7.

端末７０の送受信部７１は、通信ネットワーク２を介して管理システム５０に、ログイン要求を送信する（ステップＳ３）。このログイン要求には、ログイン要求元である自端末を識別するための通信ＩＤ、及びパスワードが含まれている。 The transmission / reception unit 71 of the terminal 70 transmits a login request to the management system 50 via the communication network 2 (Step S3). The login request includes a communication ID and a password for identifying the terminal that is the login request source.

管理システム５０の送受信部５１はログイン要求を受信する。端末７０から管理システム５０へログイン要求が送信されることで、受信側である管理システム５０は、送信側である端末７０のＩＰアドレスを取得することができる。 The transmission / reception unit 51 of the management system 50 receives the login request. By transmitting the login request from the terminal 70 to the management system 50, the management system 50 on the receiving side can acquire the IP address of the terminal 70 on the transmitting side.

次に、管理システム５０の認証部５２は、ログイン要求に含まれている通信ＩＤ及びパスワードを検索キーとして、記憶部５０００の認証管理テーブルを検索し、この認証管理テーブルに同一の通信ＩＤ及びパスワードが管理されているかを判断することによって認証を行う（ステップＳ４）。 Next, the authentication unit 52 of the management system 50 searches the authentication management table of the storage unit 5000 using the communication ID and password included in the login request as a search key, and performs authentication by determining whether the same communication ID and password are managed in the authentication management table (step S4).

認証部５２によって、正当な利用権限を有する端末７０からのログイン要求であると認証された場合には、管理部５３は、端末管理テーブルに、ログイン要求元の端末７０の通信ＩＤ、及びＩＰアドレスを関連付けて記憶する（ステップＳ５）。これにより端末管理テーブルには、ログイン中の端末７０へアクセスするための情報が管理される。また、管理部５３は通信アプリＡ７であることを示す情報を受信して、端末管理テーブルにＴＹＰＥを登録する。 When the authentication unit 52 authenticates the login request as coming from a terminal 70 having legitimate usage rights, the management unit 53 stores the communication ID and the IP address of the login request source terminal 70 in association with each other in the terminal management table (step S5). As a result, information for accessing the logged-in terminal 70 is managed in the terminal management table. Further, the management unit 53 receives information indicating that the application is the communication application A7 and registers the TYPE in the terminal management table.
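Step S5 can be pictured as a small registration routine that runs after authentication succeeds. The sketch below is illustrative only: the `TerminalRecord` structure, the mapping from application name to TYPE value, and the sample IP address are assumptions, since the actual contents of the terminal management table appear only in FIG. 7B.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TerminalRecord:
    ip_address: str
    terminal_type: str  # TYPE column; the values used here are assumed


# Hypothetical stand-in for the terminal management table, keyed by communication ID.
terminal_table: Dict[str, TerminalRecord] = {}

# Assumed mapping from the reported communication application to a TYPE value.
APP_TO_TYPE = {"A1": "TEACHER", "A7": "STUDENT"}


def register_login(communication_id: str, ip_address: str, app_name: str) -> TerminalRecord:
    """Store the IP address and TYPE of an authenticated terminal under its communication ID."""
    record = TerminalRecord(ip_address, APP_TO_TYPE[app_name])
    terminal_table[communication_id] = record
    return record
```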

管理システム５０の送受信部５１は、認証部５２によって得られた認証結果が示された認証結果情報を、通信ネットワーク２を介して、ログイン要求元の端末７０へ送信する（ステップＳ６）。これにより、端末７０の送受信部７１は、認証結果情報を受信する。以下、生徒が操作するログイン要求元の端末７０（複数ある）の認証に成功し、端末７０が管理システム５０にログインした場合について説明する。 The transmission / reception unit 51 of the management system 50 transmits the authentication result information indicating the authentication result obtained by the authentication unit 52 to the terminal 70 as the login request source via the communication network 2 (step S6). Thereby, the transmission / reception unit 71 of the terminal 70 receives the authentication result information. Hereinafter, a case will be described in which the terminal 70 (the plurality of terminals) of the login request source operated by the student has been successfully authenticated and the terminal 70 has logged in to the management system 50.

一方、教師側の端末１０の教師が電源スイッチ１０９をＯＮにすると、操作入力受付部１２が電源ＯＮを受け付けて、端末１０を起動させる（ステップＳ１１）。端末１０が起動すると、起動部１３は、端末１０にインストールされている通信アプリＡ１を起動させる（ステップＳ１２）。以下、端末１０における処理は、通信アプリＡ１の命令により実行される。 On the other hand, when the teacher of the terminal 10 on the teacher side turns on the power switch 109, the operation input receiving unit 12 receives the power ON and starts the terminal 10 (step S11). When the terminal 10 is activated, the activation unit 13 activates the communication application A1 installed on the terminal 10 (Step S12). Hereinafter, the process in the terminal 10 is executed by a command of the communication application A1.

端末１０は、管理システム５０へログイン要求を送信して、管理システム５０へログインする（ステップＳ１３−２，Ｓ１４，Ｓ１５，Ｓ１６）。この処理は、端末７０と管理システム５０との間のステップＳ３，Ｓ４，Ｓ５，Ｓ６の処理と同様であるので、説明を省略する。 The terminal 10 transmits a login request to the management system 50, and logs in to the management system 50 (steps S13-2, S14, S15, S16). This processing is the same as the processing in steps S3, S4, S5, and S6 between the terminal 70 and the management system 50, and thus the description is omitted.

管理システム５０が端末１０、７０を操作するユーザが、教師であるか生徒であるかを判断する方法としては、ステップＳ５で説明したように、通信アプリＡ１、Ａ７を識別する方法がある。この方法では、例えばログイン時に端末１０、７０が通信アプリＡ１、Ａ７の区別する情報を管理システム５０に送信すればよい。あるいは、予め通信ＩＤを区別して配布する方法もある。例えば、教師に配布される通信ＩＤは「Ｔ００１」のようにＴで始まり、生徒に配布される通信ＩＤは「Ｓ００１」のようにＳで始まる。これらの方法で、管理システム５０は容易に教師か生徒かを判断できる。あるいは、上記の招待コードが利用されている場合、生徒には特定の招待コードが配布されるようにして生徒か教師かを判断できる。 As a method for the management system 50 to determine whether the user operating the terminals 10 and 70 is a teacher or a student, there is a method of identifying the communication applications A1 and A7 as described in step S5. In this method, for example, the terminals 10 and 70 may transmit information for distinguishing the communication applications A1 and A7 to the management system 50 at the time of login. Alternatively, there is a method of distributing the communication ID in advance. For example, a communication ID distributed to a teacher starts with T as “T001”, and a communication ID distributed to students starts with S as “S001”. With these methods, the management system 50 can easily determine whether it is a teacher or a student. Alternatively, when the above invitation code is used, it is possible to determine whether a student or a teacher by distributing a specific invitation code to a student.
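The prefix-based method of distinguishing teachers from students can be sketched in a few lines. The function name and the returned role strings are assumptions; only the `T` and `S` prefixes come from the description.

```python
def role_from_communication_id(communication_id: str) -> str:
    """Infer the user's role from the first letter of the communication ID."""
    if communication_id.startswith("T"):
        return "TEACHER"
    if communication_id.startswith("S"):
        return "STUDENT"
    raise ValueError(f"unrecognized communication ID: {communication_id!r}")
```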

端末１０，７０がログインすると、セッション制御部５８は例えば負荷の少ない中継装置３０を決定して、この中継装置３０に端末１０，７０が送受信するコンテンツデータの中継を開始させる。中継装置３０には端末１０，７０の通信ＩＤ（端末１０，７０を特定できる情報であればよい）とグループＩＤが対応付けて送信される。これにより、端末１０，７０が通信ＩＤ（端末１０，７０を特定できる情報であればよい）と共に送信したコンテンツデータは同じグループ内の端末１０，７０に転送される。 When the terminals 10 and 70 log in, the session control unit 58 determines, for example, the relay device 30 with a small load, and causes the relay device 30 to start relaying the content data transmitted and received by the terminals 10 and 70. The communication ID of the terminals 10 and 70 (the information may be any information that can identify the terminals 10 and 70) and the group ID are transmitted to the relay device 30 in association with each other. As a result, the content data transmitted by the terminals 10 and 70 together with the communication ID (the information may be any information that can identify the terminals 10 and 70) is transferred to the terminals 10 and 70 in the same group.
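The forwarding decision made by the relay device 30 might look like the sketch below, which looks up the sender's group and returns the other members of that group. The mapping contents are invented, and excluding the sender from the destinations is an assumption made here rather than something the description states.

```python
from typing import Dict, List

# Hypothetical association of communication IDs with group IDs,
# as sent to the relay device 30 by the session control unit 58.
group_of: Dict[str, str] = {
    "T001": "G001",
    "S001": "G001",
    "S002": "G001",
    "S003": "G002",
}


def relay_targets(sender_id: str) -> List[str]:
    """Return the communication IDs that should receive the sender's content data."""
    group_id = group_of[sender_id]
    return [cid for cid, gid in group_of.items() if gid == group_id and cid != sender_id]
```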

＜＜音声データの評価＞＞
次に、図１０を用いて、管理システム５０が生徒の音声データを評価する処理について説明する。ここでは、英会話のオンライン教室が開講された状態をイメージされたい。以下の、英会話のオンライン教室では、教師が生徒と会話しながら、適宜、出題するという形態が想定されている。 << Evaluation of audio data >>
Next, a process in which the management system 50 evaluates student voice data will be described with reference to FIG. 10. Here, imagine that an online English conversation class is in session. In the online English conversation class described below, it is assumed that the teacher poses questions as appropriate while conversing with the students.

図１０は、管理システム５０が生徒の音声データを評価する手順を示すシーケンス図の一例である。図１０の処理は、端末１０，７０がログインして中継装置３０が決定された状態から説明される。また、図１１〜図１６に示す、教師用の端末１０と生徒用の端末７０が表示する画面例を適宜、参照して説明する。なお、図１０では中継装置３０が図示されていないが、図１０では管理システム５０が中継装置３０と一体と見なしている。実際に音声を評価するのは管理システム５０でも中継装置３０でもよい。 FIG. 10 is an example of a sequence diagram illustrating a procedure in which the management system 50 evaluates student voice data. The process of FIG. 10 is described starting from the state in which the terminals 10 and 70 have logged in and the relay device 30 has been determined. The description also refers, as appropriate, to the example screens displayed by the teacher terminal 10 and the student terminal 70 shown in FIGS. 11 to 16. Although the relay device 30 is not shown in FIG. 10, the management system 50 is regarded as integral with the relay device 30 in FIG. 10. Either the management system 50 or the relay device 30 may actually evaluate the voice.

M1：管理システム５０の録音管理部５４は会議が始まると会議ＩＤを採番する。これにより音声データの管理が可能になる。 M1: The recording management unit 54 of the management system 50 assigns a conference ID when the conference starts. This enables management of audio data.

M2−1,M2−2：管理システム５０のセッション制御部５８は会議の準備が整ったため会議開始要求を端末１０，７０に送信する。これにより、テレビ会議が開始される。つまり、コンテンツデータの送受信が開始される。 M2-1, M2-2: The session control unit 58 of the management system 50 transmits a conference start request to the terminals 10 and 70 because the conference is ready. Thereby, the video conference is started. That is, transmission / reception of content data is started.

T1,S1：端末１０の通信アプリＡ１、端末７０の通信アプリＡ７は会議開始の処理を行う。具体的には通信アプリＡ１、通信アプリＡ７が画面を表示したり、中継装置３０と通信を開始したり、管理システム５０から必要な情報を取得したりする。 T1, S1: The communication application A1 of the terminal 10 and the communication application A7 of the terminal 70 perform conference start processing. Specifically, the communication applications A1 and A7 display a screen, start communication with the relay device 30, and acquire necessary information from the management system 50.

S2：生徒側の端末７０の出力制御部７４は生徒画面Ｓ１（第３の画面の一例）を表示する。生徒画面Ｓ１を図１４に示す。生徒画面Ｓ１が表示されている間、生徒は原則的に教師の発声を聞き取るが任意に発言することも可能である。 S2: The output control unit 74 of the student terminal 70 displays a student screen S1 (an example of a third screen). FIG. 14 shows the student screen S1. While the student screen S1 is displayed, the student listens to the teacher's voice in principle, but can also speak arbitrarily.

M3,M4：一方、管理システム５０の送受信部５１は教師側の端末１０に事前に録音されている録音ファイルのリストを送信する。すなわち、録音管理ＤＢ５００３からＴＹＰＥがＰＲＥＳＥＴの録音ファイルの録音ＩＤ及び録音ファイル名を送信する。教師側の端末１０であることは端末管理ＤＢに登録されている。なお、リストは構造化文書で構成されており、構造化文書の例としては、ＨＴＭＬやＸＭＬ（Extensible Markup Language）等がある。 M3, M4: Meanwhile, the transmission / reception unit 51 of the management system 50 transmits a list of pre-recorded recording files to the teacher terminal 10. That is, it reads from the recording management DB 5003 the recording IDs and recording file names of the recording files whose TYPE is PRESET and transmits them. The fact that the terminal 10 is the teacher terminal is registered in the terminal management DB. The list is a structured document; examples of structured documents include HTML and XML (Extensible Markup Language).
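One way to build such a list as an XML document is sketched below with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`recordings`, `recording`, `id`, `file`) and the dictionary keys are hypothetical; the description only says that the list is a structured document such as HTML or XML.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def preset_list_xml(records: List[Dict[str, str]]) -> str:
    """Serialize the recording ID and file name of every PRESET record as XML."""
    root = ET.Element("recordings")
    for record in records:
        if record["type"] == "PRESET":
            ET.SubElement(root, "recording",
                          id=record["recording_id"], file=record["file_name"])
    return ET.tostring(root, encoding="unicode")
```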

T2：端末１０の送受信部１１は事前に録音されている録音ファイルのリストを受信する。これにより、端末１０の出力制御部１４は教師画面Ｔ１（第２の画面の一例）を表示する。教師画面Ｔ１を図１１に示す。教師画面Ｔ１は、教師が任意に発言し、必要に応じて出題用の音声データを選択又は入力する画面である。 T2: The transmitting / receiving unit 11 of the terminal 10 receives a list of recording files recorded in advance. Thereby, the output control unit 14 of the terminal 10 displays the teacher screen T1 (an example of the second screen). FIG. 11 shows the teacher screen T1. The teacher screen T1 is a screen on which the teacher arbitrarily speaks and selects or inputs question sound data as needed.

T3：図１０では教師が教師画面Ｔ１で事前に録音されている音声（録音ファイル名）を選択したものとする。教師が音声を入力する場合の処理については図１７にて説明する。端末１０の操作入力受付部１２は操作を受け付け、出力制御部１４は教師画面Ｔ２ａを表示する。教師画面Ｔ２ａを図１２（ａ）に示す。教師画面Ｔ２ａは、教師が音声データを選択した状態の画面である。 T3: In FIG. 10, it is assumed that the teacher has selected a pre-recorded voice (recording file name) on the teacher screen T1. The process for the case where the teacher inputs a voice will be described with reference to FIG. 17. The operation input receiving unit 12 of the terminal 10 receives the operation, and the output control unit 14 displays the teacher screen T2a. The teacher screen T2a is shown in FIG. 12A. The teacher screen T2a is the screen in the state where the teacher has selected audio data.

T4：端末１０の送受信部１１は、教師が選択した録音ファイル名の録音ＩＤを管理システム５０に送信する。この録音ＩＤが録音ファイルの送信要求となる。 T4: The transmitting / receiving unit 11 of the terminal 10 transmits the recording ID of the recording file name selected by the teacher to the management system 50. This recording ID becomes a transmission request of the recording file.

M5：管理システム５０の送受信部５１は録音ＩＤを受信する。録音管理部５４は録音管理ＤＢ５００３から録音ＩＤに該当する音声データを取得する。 M5: The transmission / reception unit 51 of the management system 50 receives the recording ID. The recording management unit 54 acquires audio data corresponding to the recording ID from the recording management DB 5003.

M6：管理システム５０の送受信部５１は取得した音声データを生徒の端末７０に送信する。図１０では生徒の端末７０は１台であるが、複数の端末７０があるものとする。 M6: The transmission / reception unit 51 of the management system 50 transmits the acquired audio data to the student terminal 70. In FIG. 10, there is one student terminal 70, but it is assumed that there are a plurality of terminals 70.

S3：生徒の端末７０の送受信部７１は音声データを受信し、音声データを受信したことを契機に、出力制御部７４が生徒画面Ｓ２（第４の画面の一例）を表示する。生徒画面Ｓ２は音声データの受信開始、受信完了、又は、受信開始から終了まで、のいずれかのタイミングで表示されてよいが、生徒画面Ｓ１は教師の発声の出力中に表示される画面なので、受信完了（受信と共にリアルタイムに出力されるため出力完了ともいう）を契機に生徒画面Ｓ１から生徒画面Ｓ２に遷移するとよい。なお、生徒画面Ｓ１から生徒画面Ｓ２に遷移させる信号を管理システム５０が端末７０に明示的に送信してもよい。生徒画面Ｓ２を図１５に示す。生徒画面Ｓ２は、管理システム５０から送信された音声データがスピーカから出力され、これを手本にして生徒が音声を発声する画面である。このように、教師が出題することで、自動的に生徒の端末７０の画面が遷移するため、生徒は生徒画面Ｓ２を表示するために通信アプリＡ７を操作する必要がない。換言すると、教師が生徒の端末７０の画面を制御する主導権を持っている。 S3: The transmission / reception unit 71 of the student terminal 70 receives the audio data, and upon receiving the audio data, the output control unit 74 displays the student screen S2 (an example of a fourth screen). The student screen S2 may be displayed at any timing from the start of reception of audio data, the completion of reception, or from the start to the end of reception. However, since the student screen S1 is a screen displayed during output of the teacher's utterance, The transition from the student screen S1 to the student screen S2 may be triggered by the completion of reception (also referred to as output completion because the data is output in real time upon reception). Note that the management system 50 may explicitly transmit a signal for transitioning from the student screen S1 to the student screen S2 to the terminal 70. FIG. 15 shows the student screen S2. The student screen S2 is a screen in which the audio data transmitted from the management system 50 is output from the speaker, and the student speaks using the audio data as an example. As described above, the screen of the student terminal 70 automatically changes when the teacher gives a question, so that the student does not need to operate the communication application A7 to display the student screen S2. In other words, the teacher has the initiative to control the screen of the student terminal 70.

S4：生徒画面Ｓ２を表示した端末７０の送受信部７１は通信ＩＤとＴＹＰＥを管理システム５０に送信する。音声を録音するためである。ＴＹＰＥに関しては端末管理ＤＢ５００２に登録されているため送信されなくてもよい。 S4: The transmitting / receiving unit 71 of the terminal 70 displaying the student screen S2 transmits the communication ID and the TYPE to the management system 50. This is for recording voice. Since TYPE is registered in the terminal management DB 5002, it need not be transmitted.

M7：管理システム５０の送受信部５１は通信ＩＤとＴＹＰＥを受信することで、録音管理部５４が録音を開始する。 M7: The transmission / reception unit 51 of the management system 50 receives the communication ID and TYPE, and the recording management unit 54 starts recording.

S5：生徒画面Ｓ２に遷移した端末７０では音声データの送信が開始されているので、生徒が発声した音声データを送受信部７１は随時（リアルタイムに）管理システム５０に送信する。管理システム５０の録音管理部５４は引き続き、録音する。 S5: Since the terminal 70 that has transitioned to the student screen S2 has started transmitting voice data, the transmitting / receiving unit 71 transmits the voice data uttered by the student to the management system 50 as needed (in real time). The recording management unit 54 of the management system 50 continues recording.

S6：発声が終わると生徒は生徒画面Ｓ２でエンドボタン３０６を押下する。 S6: When the utterance ends, the student presses the end button 306 on the student screen S2.

S7：操作入力受付部７２は生徒の操作を受け付け、送受信部７１が通信ＩＤを再度、送信する。発声が終了した管理システム５０に知らせるためである。 S7: The operation input receiving unit 72 receives the operation of the student, and the transmitting / receiving unit 71 transmits the communication ID again. This is for notifying the management system 50 that the utterance has ended.

M8：管理システム５０の送受信部５１は通信ＩＤを受信し、録音管理部５４が音声データの録音を終了する。録音管理部５４は録音ファイルの識別子となる録音ＩＤを採番する。上記のように、通信ＩＤ、会議ＩＤ、問題番号、及び、リトライ番号が録音ＩＤとなる。なお、別のパターンとして、録音は端末７０側で実行し、録音終了後に録音ファイルを管理システム５０に送付してもよい。 M8: The transmission / reception unit 51 of the management system 50 receives the communication ID, and the recording management unit 54 ends the recording of the audio data. The recording management unit 54 assigns a recording ID as an identifier of the recording file. As described above, the communication ID, the conference ID, the question number, and the retry number are the recording ID. As another pattern, the recording may be executed on the terminal 70 side, and the recording file may be sent to the management system 50 after the recording is completed.

M9：録音管理部５４は録音ＩＤ、ＴＹＰＥ、通信ＩＤ、及び録音ファイルを録音管理ＤＢ５００３に登録する。 M9: The recording management unit 54 registers the recording ID, TYPE, communication ID, and recording file in the recording management DB 5003.
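Steps M7 to M9 can be summarized as a small server-side recording session: recording starts when the communication ID arrives, audio chunks are appended as they are received, and the record is registered when the communication ID arrives again. The class below is a sketch under assumptions: the class and method names, the underscore-joined recording ID, and the in-memory dictionary standing in for the recording management DB 5003 are all hypothetical.

```python
from typing import Dict, List, Tuple


class RecordingManager:
    """Illustrative stand-in for the recording management unit 54."""

    def __init__(self, conference_id: str) -> None:
        self.conference_id = conference_id
        self._active: Dict[str, List[bytes]] = {}
        # recording ID -> (TYPE, communication ID, audio data)
        self.recording_db: Dict[str, Tuple[str, str, bytes]] = {}

    def start(self, communication_id: str) -> None:
        """M7: begin recording for the terminal that sent its communication ID."""
        self._active[communication_id] = []

    def append(self, communication_id: str, chunk: bytes) -> None:
        """S5: keep recording audio data as it arrives in real time."""
        self._active[communication_id].append(chunk)

    def stop(self, communication_id: str, rec_type: str,
             question_no: int, retry_no: int) -> str:
        """M8/M9: end recording, number the recording ID, and register the record."""
        audio = b"".join(self._active.pop(communication_id))
        recording_id = f"{communication_id}_{self.conference_id}_{question_no}_{retry_no}"
        self.recording_db[recording_id] = (rec_type, communication_id, audio)
        return recording_id
```

Keeping one buffer per communication ID lets several students record in parallel within the same conference.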

M10：そして、解析・結果管理部５５が、出力した（出力とは再生することをいう）音声データと端末７０から送信された音声データを比較して評価する。詳細は後述する。 M10: Then, the analysis / result management unit 55 compares and evaluates the audio data output (output means to reproduce) with the audio data transmitted from the terminal 70. Details will be described later.

M11：解析・結果管理部５５は、手本とした録音ファイルの録音ＩＤを手本録音ＩＤとして、生徒の録音ファイルの録音ＩＤを回答録音ＩＤとして、結果管理ＤＢ５００４に登録する。同様に、会議ＩＤ、この会議の何問目の問題であるか（問題番号）、同一問題の何回目のリトライであるかというリトライ番号、手本認識結果、回答認識結果を結果管理ＤＢ５００４に登録する。リトライについては教師画面Ｔ３（第１の画面の一例）で詳細に説明する。 M11: The analysis / result management unit 55 registers, in the result management DB 5004, the recording ID of the recording file used as the model as the model recording ID and the recording ID of the student's recording file as the answer recording ID. Similarly, it registers in the result management DB 5004 the conference ID, the question number (which question of this conference it is), the retry number (which retry of the same question it is), the model recognition result, and the answer recognition result. Retries are described in detail with the teacher screen T3 (an example of the first screen).

M12−1、M12−2：管理システム５０の送受信部５１は、評価結果を教師側の端末１０と生徒側の端末７０にそれぞれ送信する。また、教師の音声データの録音ＩＤ（又は音声データそのもの）を生徒の端末７０のそれぞれに送信すると共に、生徒が発声した音声データの録音ＩＤ（又は音声データそのもの）をその音声データを発声した端末７０に個別に送信する。教師の音声データの録音ＩＤ（又は音声データそのもの）と、各生徒が発声した全ての音声データの録音ＩＤ（又は音声データそのもの）を端末１０に送信する。 M12-1, M12-2: The transmitting / receiving unit 51 of the management system 50 transmits the evaluation result to the terminal 10 on the teacher side and to the terminal 70 on the student side. In addition, it transmits the recording ID of the teacher's voice data (or the voice data itself) to each of the student terminals 70, and individually transmits the recording ID of the voice data uttered by each student (or the voice data itself) to the terminal 70 that uttered it. It transmits to the terminal 10 the recording ID of the teacher's voice data (or the voice data itself) and the recording IDs of all the voice data uttered by the students (or the voice data itself).

T5：端末１０の送受信部１１は評価結果を受信し、出力制御部１４が教師画面Ｔ３を表示する。教師画面Ｔ３を図１３に示す。教師画面Ｔ３は、教師が各生徒の評価結果を確認する画面である。 T5: The transmitting / receiving unit 11 of the terminal 10 receives the evaluation result, and the output control unit 14 displays the teacher screen T3. FIG. 13 shows the teacher screen T3. The teacher screen T3 is a screen where the teacher checks the evaluation result of each student.

S8：端末７０の送受信部７１は評価結果を受信し、評価結果を受信したことを契機に、出力制御部７４が生徒画面Ｓ２から生徒画面Ｓ３（第５の画面の一例）に遷移する。なお、生徒画面Ｓ２から生徒画面Ｓ３に遷移させる信号を管理システム５０が端末７０に明示的に送信してもよい。生徒画面Ｓ３を図１６に示す。生徒画面Ｓ３は、生徒が自分の評価結果を確認する画面である。生徒画面Ｓ３への遷移時も、生徒は生徒画面Ｓ３を表示するために通信アプリＡ７を操作する必要がない。 S8: The transmitting / receiving unit 71 of the terminal 70 receives the evaluation result, and upon receiving the evaluation result, the output control unit 74 transits from the student screen S2 to the student screen S3 (an example of a fifth screen). Note that the management system 50 may explicitly transmit a signal for transitioning from the student screen S2 to the student screen S3 to the terminal 70. FIG. 16 shows the student screen S3. The student screen S3 is a screen on which the student confirms his / her own evaluation result. At the time of transition to the student screen S3, the student does not need to operate the communication application A7 to display the student screen S3.

＜画面例＞ <Screen example>
図１１〜図１６を用いて教師の端末１０と生徒の端末７０が表示する画面例を説明する。図１１は、教師画面Ｔ１の一例を示す。教師画面Ｔ１は、会議を開始した当初、端末１０が表示する画面である。教師画面Ｔ１は、教師画像表示欄３０１，生徒画像表示欄３０２、事前録音データリスト欄３０３、アップロードボタン３０４、スタートボタン３０５、及び、エンドボタン３０６を有する。 Examples of screens displayed by the teacher terminal 10 and the student terminal 70 will be described with reference to FIGS. 11 to 16. FIG. 11 shows an example of the teacher screen T1. The teacher screen T1 is the screen displayed by the terminal 10 at the beginning of the conference. The teacher screen T1 has a teacher image display field 301, a student image display field 302, a pre-recorded data list field 303, an upload button 304, a start button 305, and an end button 306.

教師画面Ｔ１の画面情報は、ＨＴＭＬを少なくとも含み、更に、ＣＳＳ（Cascading Style Sheets）及びJavaScript（登録商標）等を含むWebAPP（Web Application）で作成されてよい。後述する教師画面Ｔ２、Ｔ３、生徒画面Ｓ１〜Ｓ３についても同様である。 The screen information of the teacher screen T1 includes at least HTML, and may be created as a Web application (WebAPP) that further includes CSS (Cascading Style Sheets), JavaScript (registered trademark), and the like. The same applies to the teacher screens T2 and T3 and the student screens S1 to S3 described later.

教師画像表示欄３０１は、教師が使用する端末１０が撮像した画像データが表示される欄である。生徒画像表示欄３０２は、生徒が使用する端末７０が撮像した画像データが表示される欄である。図１１では４つの生徒画像表示欄３０２があるが、生徒画像表示欄３０２の数は生徒の数によって増減する。 The teacher image display column 301 is a column in which image data captured by the terminal 10 used by the teacher is displayed. The student image display column 302 is a column in which image data captured by the terminal 70 used by the student is displayed. In FIG. 11, there are four student image display columns 302, but the number of student image display columns 302 increases or decreases depending on the number of students.

事前録音データリスト欄３０３は、管理システム５０が端末１０に送信した録音ファイル名のリストが表示される欄である。図１１では５つの録音ファイル名が表示されているが、数は一例である。表示しきれない場合はスクロールボタンなどが表示される。それぞれの録音ファイル名はユーザの押下（選択）を受け付けることができる。アップロードボタン３０４は教師が選択した録音ファイル名の録音ＩＤを端末１０が管理システム５０に送信するためのボタンである。スタートボタン３０５とエンドボタン３０６は事前に録音された録音ファイルでなく、教師がリアルタイムに発声した音声データを入力し、端末１０が管理システム５０に送信する際に使用される。すなわち、教師がスタートボタン３０５を押下してからエンドボタン３０６を押下するまでに教師が発声した音声が管理システム５０に送信される。詳細は図１７にて説明する。 The pre-recorded data list field 303 displays the list of recording file names that the management system 50 has transmitted to the terminal 10. Five recording file names are displayed in FIG. 11, but this number is only an example. If the list does not fit in the field, a scroll button or the like is displayed. Each recording file name can be pressed (selected) by the user. The upload button 304 is a button with which the terminal 10 transmits the recording ID of the recording file name selected by the teacher to the management system 50. The start button 305 and the end button 306 are used when, instead of a pre-recorded file, voice data uttered by the teacher in real time is input and the terminal 10 transmits it to the management system 50. That is, the voice uttered by the teacher between pressing the start button 305 and pressing the end button 306 is transmitted to the management system 50. Details will be described with reference to FIG. 17.

図１２（ａ）は、教師画面Ｔ２ａの一例を示す。教師画面Ｔ２ａは、教師画面Ｔ１から遷移する。なお、教師画面Ｔ２ａに関しては教師画面Ｔ１との相違を説明する。教師画面の構成は教師画面Ｔ１と同様であるが、教師画面Ｔ２ａでは事前録音データリスト欄３０３の録音ファイル名が１つ選択された状態である。このように、教師が選択した録音ファイル名は強調され、アップロードボタン３０４による管理システム５０への録音ＩＤの送信が可能になる。 FIG. 12A shows an example of the teacher screen T2a. The teacher screen T2a transits from the teacher screen T1. The difference between the teacher screen T2a and the teacher screen T1 will be described. The configuration of the teacher screen is the same as that of the teacher screen T1, but in the teacher screen T2a, one recording file name in the pre-recorded data list column 303 is selected. As described above, the recording file name selected by the teacher is emphasized, and the transmission of the recording ID to the management system 50 by the upload button 304 becomes possible.

教師画面Ｔ２ａでは教師は録音ファイルを選択すれば、お手本の音声データを生徒に聴かせることができるので、いつも同じ音声データを聴かせることができる。なお、録音ファイルは教師本人（同一人物）が録音しておく必要はなく、使い回してよい。 In the teacher screen T2a, if the teacher selects a recording file, the teacher can make the sample audio data available to the student, so that the same audio data can always be heard. The recording file need not be recorded by the teacher (same person), and may be reused.

図１２（ｂ）は、教師画面Ｔ２ｂの一例を示す。教師画面Ｔ２ｂは、教師画面Ｔ１から遷移する。なお、教師画面Ｔ２ｂに関しては教師画面Ｔ１との相違を説明する。教師画面の構成は教師画面Ｔ１と同様であるが、教師画面Ｔ２ｂではスタートボタン３０５が押下された状態である。したがって、教師が音声を発声中であり、この音声データがリアルタイムに管理システム５０に送信されている。 FIG. 12B shows an example of the teacher screen T2b. The teacher screen T2b is reached by a transition from the teacher screen T1. Only the differences between the teacher screen T2b and the teacher screen T1 will be described. The configuration of the teacher screen is the same as that of the teacher screen T1, except that on the teacher screen T2b the start button 305 has been pressed. The teacher is therefore speaking, and this voice data is being transmitted to the management system 50 in real time.

教師画面Ｔ２ｂでは教師はリアルタイムにお手本の音声データを生徒に聴かせることができるので、任意のセンテンス（英語の文又は文章）を生徒に練習させることができる。また、教師画面Ｔ１，Ｔ２ａ、Ｔ２ｂの構成から明らかなように、教師は事前に録音された音声データとリアルタイムの音声データを適宜、組み合わせながら授業できる。 In the teacher screen T2b, the teacher can make the students listen to the sample audio data in real time, so that the students can practice arbitrary sentences (English sentences or sentences). Also, as is clear from the configuration of the teacher screens T1, T2a, and T2b, the teacher can teach while appropriately combining pre-recorded audio data and real-time audio data.

図１３は、教師画面Ｔ３の一例を示す。教師画面Ｔ３は、正解音声再生ボタン３１１、リトライボタン３１２、終了ボタン３１３、生徒画像表示欄３１４、発音記号欄３１６ａ、３１６ｂ、詳細評価欄３１７、点数欄３１８、及び、回答音声再生ボタン３１５を有する。正解音声再生ボタン３１１は、お手本の録音ファイルを端末１０、７０で出力させるためのボタンである。お手本の録音ファイルの録音ＩＤが管理システム５０に送信される。この場合、単に音声データが流れるだけでリトライ番号は増えない。リトライボタン３１２は、同じお手本の録音ファイルの録音ＩＤを再度、管理システム５０に送信して生徒たちに発声させるためのボタンである。 FIG. 13 shows an example of the teacher screen T3. The teacher screen T3 has a correct answer voice play button 311, a retry button 312, an end button 313, a student image display field 314, phonetic symbol fields 316a and 316b, a detailed evaluation field 317, a score field 318, and an answer voice play button 315. The correct answer voice play button 311 is a button for outputting the model recording file on the terminals 10 and 70. When it is pressed, the recording ID of the model recording file is transmitted to the management system 50. In this case, the voice data is merely played back and the retry number does not increase. The retry button 312 is a button for transmitting the recording ID of the same model recording file to the management system 50 again and having the students speak again.

再度、同一の問題を出題する場合、教師が端末１０のリトライボタン３１２を押下することで、録音管理部５４と解析・結果管理部５５は図１０のステップＴ４以降を再実行し、同一のフローを繰り返す。まず、お手本の録音ファイルの録音ＩＤとリトライである旨が管理システム５０に送信される。録音管理部５４はリトライである旨を端末７０に送信することで、生徒画面Ｓ３から生徒画面Ｓ１に遷移させる（これが、生徒画面Ｓ３から生徒画面Ｓ１に遷移させる信号となる）。次に、録音管理部５４は録音ＩＤの録音ファイルを端末７０に送信することで、端末７０は録音ファイルの例えば受信完了時に生徒画面Ｓ１から生徒画面Ｓ２に遷移する。生徒画面Ｓ１から生徒画面Ｓ２に遷移させる信号を管理システム５０が端末７０に明示的に送信してもよい。録音管理部５４は、再度、端末７０からの音声データを録音して録音ＩＤを採番し、録音管理ＤＢ５００３に登録する。次に、解析・結果管理部５５は音声を比較する。解析・結果管理部５５は結果管理ＤＢ５００４の手本録音ＩＤをステップＴ４の録音ＩＤで検索し、ヒットしたレコードの中から最も大きいリトライ番号を有するレコードを特定し、このレコードのリトライ番号を１つ大きくする。そして、手本録音ＩＤ（Ｔ４で送信されている録音ＩＤ）、回答録音ＩＤ（新たに採番した端末７０の音声データの録音ＩＤ）、会議ＩＤ、問題番号（録音ＩＤでヒットしたレコードの問題番号と同じ）、１つ大きくしたリトライ番号、手本解析結果、回答認識結果、及び、採点結果を結果管理ＤＢ５００４に登録する。送受信部５１は評価結果を端末１０，７０に送信する。以上により、端末１０には、再度、図１３の教師画面Ｔ３が表示される。また、端末７０には、再度、図１６の生徒画面Ｓ３が表示される。 When the same question is to be asked again, the teacher presses the retry button 312 on the terminal 10, whereupon the recording management unit 54 and the analysis / result management unit 55 re-execute step T4 and the subsequent steps in FIG. 10, repeating the same flow. First, the recording ID of the model recording file and an indication that this is a retry are transmitted to the management system 50. The recording management unit 54 transmits the retry indication to the terminal 70, thereby causing a transition from the student screen S3 to the student screen S1 (this serves as the signal for transitioning from the student screen S3 to the student screen S1). Next, the recording management unit 54 transmits the recording file of the recording ID to the terminal 70, and the terminal 70 transitions from the student screen S1 to the student screen S2, for example, when reception of the recording file is completed. The management system 50 may explicitly transmit to the terminal 70 a signal for transitioning from the student screen S1 to the student screen S2. The recording management unit 54 again records the voice data from the terminal 70, assigns a recording ID, and registers it in the recording management DB 5003. Next, the analysis / result management unit 55 compares the voices. The analysis / result management unit 55 searches the model recording IDs in the result management DB 5004 for the recording ID of step T4, identifies the record having the largest retry number among the hit records, and increments that retry number by one. It then registers, in the result management DB 5004, the model recording ID (the recording ID transmitted in T4), the answer recording ID (the newly assigned recording ID of the voice data of the terminal 70), the conference ID, the question number (the same as the question number of the record hit by the recording ID), the incremented retry number, the model analysis result, the answer recognition result, and the scoring result. The transmission / reception unit 51 transmits the evaluation result to the terminals 10 and 70. As a result, the teacher screen T3 of FIG. 13 is displayed again on the terminal 10, and the student screen S3 of FIG. 16 is displayed again on the terminal 70.
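
The retry-number lookup described above can be sketched in Python as follows. This is only an illustration: the list-of-dicts stand-in for the result management DB 5004, the key names `model_recording_id` and `retry_number`, and the choice of 0 when no earlier record exists are assumptions, not details given in the specification.

```python
from typing import Dict, List


def next_retry_number(result_db: List[Dict], model_recording_id: str) -> int:
    """Return the retry number to register for a new attempt.

    Looks up every record whose model recording ID matches, takes the
    largest retry number among the hits, and adds one.
    """
    hits = [
        record["retry_number"]
        for record in result_db
        if record["model_recording_id"] == model_recording_id
    ]
    # Assumption: a question with no earlier record starts at retry number 0.
    return max(hits) + 1 if hits else 0


# Hypothetical contents of the result management DB.
result_db = [
    {"model_recording_id": "R1", "retry_number": 0},
    {"model_recording_id": "R1", "retry_number": 1},
    {"model_recording_id": "R2", "retry_number": 0},
]
print(next_retry_number(result_db, "R1"))
```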

終了ボタン３１３は現在のお手本の録音ファイルの出題を終了するためのボタンである。終了ボタン３１３が押下されると、教師の端末１０は教師画面Ｔ１に戻り、生徒の端末７０は生徒画面Ｓ１に戻る。したがって、通常のビデオ会議の状態となる。この場合も生徒は生徒画面Ｓ１に戻るための操作が不要である。 The end button 313 is a button for ending the question based on the current model recording file. When the end button 313 is pressed, the teacher terminal 10 returns to the teacher screen T1 and the student terminal 70 returns to the student screen S1, so that the normal video conference state is restored. In this case as well, the student does not need to perform any operation to return to the student screen S1.

生徒画像表示欄３１４は図１１と同様である。発音記号欄３１６ａは教師の音声データから変換された発音記号を示し、発音記号欄３１６ｂは生徒の音声データから変換された発音記号を示す。詳細評価欄３１７は管理システム５０がお手本の音声データと生徒の音声データを比較して評価した結果である。図１３では、音声データの各単語が発音記号に変換され、発音記号ごとに「○」「×」が添付されている。「○」はお手本の音声データと生徒の音声データの発音記号が一致したことを示し、「×」はお手本の音声データと生徒の音声データの発音記号が一致しないことを示す。また、点数欄３１８には一致の程度に応じて点数が付与されている。解析・結果管理部５５は１回の出題を１センテンスとして、１センテンスに含まれる単語のうち発音記号が一致した割合を点数とする。文字レベルで採点してもよいし、採点方法はこれらに限られない。 The student image display field 314 is the same as in FIG. The phonetic symbol column 316a shows phonetic symbols converted from teacher voice data, and the phonetic symbol column 316b shows phonetic symbols converted from student voice data. The detailed evaluation column 317 is a result of the evaluation by the management system 50 comparing the sample audio data and the student audio data. In FIG. 13, each word of the audio data is converted into a phonetic symbol, and "O" and "X" are attached to each phonetic symbol. “○” indicates that the pronunciation symbols of the sample voice data and the student's voice data match, and “x” indicates that the pronunciation symbols of the sample voice data and the student's voice data do not match. Further, points are given to the point column 318 according to the degree of coincidence. The analysis / result management unit 55 treats one question as one sentence, and sets a score of the proportion of words included in one sentence in which pronunciation symbols match. The scoring may be performed at the character level, and the scoring method is not limited to these.
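
A minimal Python sketch of the word-level scoring described above is given below. It assumes that each word has already been converted to a phonetic-symbol string, that model and answer words are aligned by position, and that the score is expressed as a percentage; the function name `score_sentence` and these alignment and scaling choices are hypothetical.

```python
from typing import List, Tuple


def score_sentence(model_words: List[str], answer_words: List[str]) -> Tuple[List[bool], int]:
    """Compare phonetic symbols word by word and return (marks, score).

    marks[i] is True ("○") when the i-th word matches and False ("×")
    otherwise. The score is the percentage of matching words.
    """
    marks = []
    for i, model in enumerate(model_words):
        # Assumption: a word missing from the answer counts as a mismatch.
        answer = answer_words[i] if i < len(answer_words) else None
        marks.append(model == answer)
    score = round(100 * sum(marks) / len(marks)) if marks else 0
    return marks, score


model = ["ðɪs", "ɪz", "ə", "pɛn"]
answer = ["ðɪs", "ɪz", "ə", "pæn"]
marks, score = score_sentence(model, answer)
print(["○" if m else "×" for m in marks], score)
```

Positional alignment is the simplest choice; if a student skips or inserts a word, an edit-distance alignment would likely give fairer marks.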

回答音声再生ボタン３１５は、生徒の音声データを出力するためのボタンである。回答音声再生ボタン３１５の押下により録音ファイルの録音ＩＤが管理システム５０に送信される。生徒の音声データは管理システム５０が教師の端末１０にのみ送信してもよいし、この音声データを発声した端末７０と端末１０にのみ送信してもよいし、全ての端末１０，７０に送信してもよい。なお、録音管理ＤＢ５００３では録音ＩＤには通信ＩＤが対応付けられているので、管理システム５０はこの通信ＩＤの端末７０にのみ送信できる。 The answer voice play button 315 is a button for outputting a student's voice data. When the answer voice play button 315 is pressed, the recording ID of the recording file is transmitted to the management system 50. The management system 50 may transmit the student's voice data only to the teacher's terminal 10, only to the terminal 10 and the terminal 70 that uttered the voice data, or to all of the terminals 10 and 70. Since each recording ID is associated with a communication ID in the recording management DB 5003, the management system 50 can transmit the data only to the terminal 70 having that communication ID.

図１３に示すように、教師画面Ｔ３には各生徒の評価結果が一覧で表示されるので、各生徒が一度に発声しても各生徒の音声データを個別に評価できる。また、教師は各生徒の音声データを任意に出力できるので、自分自身が生徒の音声データを評価することもできる。 As shown in FIG. 13, since the evaluation result of each student is displayed in a list on the teacher screen T3, even if each student utters at once, the voice data of each student can be individually evaluated. In addition, since the teacher can arbitrarily output the voice data of each student, the teacher can evaluate the voice data of the student.

図１４は、生徒画面Ｓ１の一例を示す。生徒画面Ｓ１は、会議を開始した当初、端末７０が表示する画面である。生徒画面Ｓ１は、教師画像表示欄３２１，及び、生徒画像表示欄３２２、を有する。教師画像表示欄３２１と生徒画像表示欄３２２については教師画面Ｔ１と同様でよい。生徒画面Ｓ１には「Listen in class」というメッセージ３２３が表示されている。これは授業内容を聞くという意味であり、生徒画面Ｓ１は教師の音声の出力中に表示される。生徒画面Ｓ１では教師が生徒と会話したり、教師が生徒に指示を出したりする。生徒が教師に質問することも可能である。 FIG. 14 shows an example of the student screen S1. The student screen S1 is the screen displayed by the terminal 70 at the beginning of the conference. The student screen S1 has a teacher image display field 321 and a student image display field 322, which may be the same as the corresponding fields of the teacher screen T1. A message 323 reading "Listen in class" is displayed on the student screen S1. This means that the student should listen to the lesson, and the student screen S1 is displayed while the teacher's voice is being output. On the student screen S1, the teacher converses with the students or gives them instructions. Students can also ask the teacher questions.

図１５は、生徒画面Ｓ２の一例を示す。生徒画面Ｓ２は、各生徒が手本の音声データをまねて音声を発声する画面である。生徒画面Ｓ２は、教師画像表示欄３２１，生徒画像表示欄３２２、及び、エンドボタン３２５を有する。教師画像表示欄３２１と生徒画像表示欄３２２については教師画面Ｔ１と同様でよい。生徒画面Ｓ２には「Repeat!!」というメッセージ３２４が表示されている。これは手本の音声データをまねて発声しなさいという意味である。生徒画面Ｓ２が表示されると、端末７０ではお手本の音声データがスピーカから出力され、通信ＩＤとＴＹＰＥが送信されたタイミングで管理システム５０が音声データの録音を開始する。エンドボタン３２５は発声が終わった生徒が録音の終了を管理システム５０に通知するためのボタンである。 FIG. 15 shows an example of the student screen S2. The student screen S2 is the screen on which each student speaks in imitation of the model voice data. The student screen S2 has a teacher image display field 321, a student image display field 322, and an end button 325. The teacher image display field 321 and the student image display field 322 may be the same as the corresponding fields of the teacher screen T1. A message 324 reading "Repeat!!" is displayed on the student screen S2. This instructs the student to speak in imitation of the model voice data. When the student screen S2 is displayed, the terminal 70 outputs the model voice data from its speaker, and the management system 50 starts recording voice data at the timing when the communication ID and the TYPE are transmitted. The end button 325 is a button with which a student who has finished speaking notifies the management system 50 of the end of the recording.

図１６は、生徒画面Ｓ３の一例を示す。生徒画面Ｓ３は、生徒の評価結果が表示される画面である。生徒画面Ｓ３は、正解音声再生ボタン３３１、教師画像表示欄３３２、発音記号欄３３３ａ、３３３ｂ、詳細評価欄３３４、点数欄３３６、及び、回答音声再生ボタン３３５を有する。これらの機能は教師画面Ｔ３と同様でよい。 FIG. 16 shows an example of the student screen S3. The student screen S3 is the screen on which the student's evaluation results are displayed. The student screen S3 has a correct answer voice play button 331, a teacher image display field 332, phonetic symbol fields 333a and 333b, a detailed evaluation field 334, a score field 336, and an answer voice play button 335. Their functions may be the same as those of the teacher screen T3.

生徒画面Ｓ３ではリトライして発声した音声データごとに評価結果が表示されるので、生徒は各音声データでどこが悪かったのか、改善されたのか、どこが間違いやすいのかなどを把握できる。また、お手本の音声データを任意に出力できる。また、自分の音声データを出力して確認できる。 In the student screen S3, the evaluation result is displayed for each voice data retried and uttered, so that the student can grasp what is wrong with each voice data, whether the voice data has been improved, what is likely to be wrong, and the like. In addition, sample audio data can be arbitrarily output. In addition, the user can output his own voice data and check it.

＜録音済みの音声データでなく教師が発声する場合の処理手順＞ <Processing procedure when the teacher utters voice instead of recorded voice data>
図１０では録音済みの音声データをお手本として生徒が発声する処理の流れを説明したが、図１７を用いて教師が発声した音声データをお手本として生徒が発声する処理の手順を説明する。図１７は、管理システム５０が生徒の音声データを評価する手順を示すシーケンス図の一例である。なお、図１７の説明では図１０との相違を主に説明する。 FIG. 10 described the flow of the process in which the students speak using recorded voice data as the model. The procedure in which the students speak using voice data uttered by the teacher as the model will now be described with reference to FIG. 17. FIG. 17 is an example of a sequence diagram showing a procedure in which the management system 50 evaluates student voice data. The description of FIG. 17 mainly covers the differences from FIG. 10.

まず、ステップＴ１〜Ｔ２、ステップＭ１〜Ｍ４、ステップＳ１〜Ｓ３は図１０と同様でよい。 First, steps T1 to T2, steps M1 to M4, and steps S1 to S3 may be the same as those in FIG.

T3：教師はリアルタイムに発声した音声データをお手本とするため教師画面Ｔ１でスタートボタン３０５を押下する。 T3: The teacher presses the start button 305 on the teacher screen T1 to model the voice data uttered in real time.

T4：端末１０のＴＹＰＥと通信ＩＤを通知するため、端末１０の送受信部１１は通信ＩＤとＴＹＰＥを管理システム５０に送信する。 T4: The transmitting / receiving unit 11 of the terminal 10 transmits the communication ID and the TYPE to the management system 50 in order to notify the TYPE of the terminal 10 and the communication ID.

M4：管理システム５０の送受信部５１はＴＹＰＥと通信ＩＤを受信し、これにより録音管理部５４は教師の音声データの録音を開始する。 M4: The transmission / reception unit 51 of the management system 50 receives the TYPE and the communication ID, whereby the recording management unit 54 starts recording the teacher's voice data.

T5：教師が発声を終えると教師画面Ｔ２ｂ（図１２（ｂ））でエンドボタン３０６を押下する。操作入力受付部１２は押下を受け付ける。 T5: When the teacher finishes speaking, he presses the end button 306 on the teacher screen T2b (FIG. 12B). The operation input receiving unit 12 receives the press.

T6：教師の端末１０の送受信部１１は録音を終了させるため通信ＩＤを管理システム５０に送信する。 T6: The transmission / reception unit 11 of the teacher's terminal 10 transmits the communication ID to the management system 50 to end the recording.

M5：管理システム５０の送受信部５１は通信ＩＤを受信し、これにより録音管理部５４は録音を終了し、録音ＩＤを採番する。 M5: The transmission / reception unit 51 of the management system 50 receives the communication ID, whereby the recording management unit 54 ends the recording and assigns the recording ID.

M6：管理システム５０の録音管理部５４は録音ＩＤ、ＴＹＰＥ、通信ＩＤ、及び、録音ファイルを録音管理ＤＢ５００３に登録する。 M6: The recording management unit 54 of the management system 50 registers the recording ID, the TYPE, the communication ID, and the recording file in the recording management DB 5003.

M7：管理システム５０の送受信部５１は、録音管理ＤＢ５００３に登録した音声データを端末７０に送信する。これにより、端末７０は生徒画面Ｓ２を表示するので、以降の処理は図１０と同様になる。 M7: The transmission / reception unit 51 of the management system 50 transmits the audio data registered in the recording management DB 5003 to the terminal 70. As a result, the terminal 70 displays the student screen S2, and the subsequent processing is the same as in FIG.

このように、本実施形態の通信システム１は、お手本の音声データを教師が発声しても、多数の生徒の音声データと比較した評価及び細やかな指導が可能になる。 In this manner, the communication system 1 of the present embodiment enables evaluation and detailed guidance in comparison with the voice data of many students, even if the teacher utters the voice data of the model.

＜音声データの比較・評価＞ <Comparison and evaluation of audio data>
本実施形態では教師の音声データと生徒の音声データを発音記号で比較し、単語毎に発音がどの程度一致するか否かを比較した。また、１つのセンテンスのうち発音記号が一致する単語の数の割合を点数とした。このように音声データを発音記号に変換する技術としては音声認識技術を利用できる。 In the present embodiment, the teacher's voice data and the student's voice data are compared in terms of phonetic symbols, and the degree to which the pronunciation matches is checked for each word. The score is the proportion of words in one sentence whose phonetic symbols match. Speech recognition technology can be used to convert voice data into phonetic symbols in this way.

図１８は、音声認識を模式的に説明する図の一例である。図１８は機械学習により音声を発音記号に変換するニューラルネットワーク４０１を示す。特に、ニューラルネットワーク４０１の階層が深いものをディープラーニングという。また、音声データのような時系列データを分類するにはＲＮＮ（Recurrent Neural Network）が適していることが知られている。 FIG. 18 is an example of a diagram schematically illustrating speech recognition. FIG. 18 shows a neural network 401 that converts speech into phonetic symbols by machine learning. In particular, learning with a neural network 401 that has many layers is called deep learning. An RNN (Recurrent Neural Network) is known to be suitable for classifying time-series data such as voice data.

一例として２０ミリ秒ごとにスライスした音声データをＭＦＣＣ（Mel−frequency cepstral coefficients）に変換し、入力層４０２の各ノードに入力する。出力層４０３の各ノードには各発音記号が対応する。 As an example, audio data sliced every 20 milliseconds is converted into MFCC (Mel-frequency cepstral coefficients) and input to each node of the input layer 402. Each node of the output layer 403 corresponds to each phonetic symbol.
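
The slicing step can be sketched in Python as follows. The snippet only cuts a sample sequence into consecutive 20 ms frames; the MFCC conversion itself is not shown. Non-overlapping frames and dropping a trailing partial frame are assumptions, since the specification does not say how frame boundaries are handled, and the function name `slice_frames` is hypothetical.

```python
from typing import List, Sequence


def slice_frames(samples: Sequence[float], sample_rate: int, frame_ms: int = 20) -> List[Sequence[float]]:
    """Cut audio samples into consecutive frames of frame_ms milliseconds."""
    frame_len = sample_rate * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("sample_rate and frame_ms must give at least one sample per frame")
    # Assumption: frames do not overlap and a trailing partial frame is dropped.
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]


one_second = [0.0] * 16000  # one second of silence at 16 kHz
frames = slice_frames(one_second, sample_rate=16000)
print(len(frames), len(frames[0]))
```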

学習フェーズにおいて入力層４０２にはＭＦＣＣに変換された音声データが入力され、出力層４０３には、音声データに含まれる音声の発音記号に対応するノードに「１」が割り当てられる（その他のノードは「０」）。そして、入力層４０２のノード、ニューラルネットワーク４０１の各階層のノード、及び、出力層４０３のノードの間の重みが誤差逆伝播法で学習される。 In the learning phase, voice data converted to MFCC is input to the input layer 402, and “1” is assigned to the node corresponding to the phonetic symbol of the voice included in the voice data to the output layer 403 (the other nodes are "0"). Then, the weight between the node of the input layer 402, the node of each layer of the neural network 401, and the node of the output layer 403 is learned by the back propagation method.

認識フェーズでは、入力された音声データが含む発音記号に近い出力層４０３のノードほど大きい値（１に近い値）を出力する。解析・結果管理部５５は閾値以上の値を出力したノードに対応する発音記号を取り出す。この発音記号が複数の場合は、発音記号のつながりやすさを確率で示す辞書データを参照し、前後の発音記号との組み合わせごとにつながりやすさの確率を算出し、確率が最も高くなる組み合わせになるように発音記号を特定する。 In the recognition phase, a node of the output layer 403 outputs a larger value (a value closer to 1) the closer its phonetic symbol is to one contained in the input voice data. The analysis / result management unit 55 extracts the phonetic symbols corresponding to the nodes that output a value equal to or greater than a threshold. When there are several such phonetic symbols, it refers to dictionary data that expresses, as probabilities, how readily phonetic symbols follow one another, calculates that probability for each combination with the preceding and following phonetic symbols, and selects the phonetic symbols that give the combination with the highest probability.
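
The selection step of the recognition phase can be illustrated with the following Python sketch. It assumes that each time step yields a mapping from phonetic symbol to output value, that the dictionary data is a table of pairwise (bigram) probabilities, and that the best sequence is found by exhaustive search; the names `candidates` and `pick_sequence`, the threshold of 0.5, and the example probabilities are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def candidates(outputs: Dict[str, float], threshold: float) -> List[str]:
    """Phonetic symbols whose output-layer value is at or above the threshold."""
    return [symbol for symbol, value in outputs.items() if value >= threshold]


def pick_sequence(frames: List[Dict[str, float]],
                  bigram_prob: Dict[Tuple[str, str], float],
                  threshold: float = 0.5) -> List[str]:
    """Choose one symbol per frame so that the product of the pairwise
    connection probabilities is as large as possible."""
    # Assumption: frames with no symbol above the threshold are skipped.
    options = [c for c in (candidates(f, threshold) for f in frames) if c]
    best, best_p = [], -1.0
    for seq in product(*options):  # exhaustive search, fine for a small sketch
        p = 1.0
        for prev, cur in zip(seq, seq[1:]):
            p *= bigram_prob.get((prev, cur), 0.0)
        if p > best_p:
            best, best_p = list(seq), p
    return best


frames = [{"k": 0.9, "g": 0.6}, {"æ": 0.8}, {"t": 0.7, "d": 0.55}]
bigram_prob = {("k", "æ"): 0.6, ("g", "æ"): 0.3, ("æ", "t"): 0.5, ("æ", "d"): 0.2}
print(pick_sequence(frames, bigram_prob))
```

For longer utterances the exhaustive search grows exponentially; a dynamic-programming search such as the Viterbi algorithm would be the usual replacement.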

なお、種々の音声認識方法があり、図１８の説明は一例に過ぎない。例えば隠れマルコフモデルを使用して音声認識してもよい。また、本実施形態ではこの他の手法を採用しても何ら支障がない。 Note that there are various voice recognition methods, and the description in FIG. 18 is merely an example. For example, speech recognition may be performed using a hidden Markov model. In the present embodiment, there is no problem even if other methods are adopted.

また、本実施形態では音声データを発音記号に変換したが、音声データを単語（アルファベット）に変換してもよい。音声データの単語（アルファベット）への変換には公知の方法を採用できる。一般的な英語の辞書には単語の発音記号が掲載されているので、単語が決まれば発音記号も一意に決まる。したがって、音声データを発音記号に変換できる。ただし、発音が悪い音声も正しい単語に認識するような優秀な識別器が生成されている場合、生徒の発音が悪くても正しい発音記号であると判断されるおそれがある。 In the present embodiment, the voice data is converted into phonetic symbols, but it may instead be converted into words (alphabetic text). A known method can be used for this conversion. General English dictionaries list the phonetic symbols of each word, so once a word is determined its phonetic symbols are uniquely determined as well. The voice data can therefore be converted into phonetic symbols by this route. However, if the classifier is good enough to recognize poorly pronounced speech as the correct word, a student's utterance may be judged to have the correct phonetic symbols even when the pronunciation is poor.

楽器などの音データの評価では、例えば教師と生徒の音データをそれぞれフーリエ変換して、周波数ごとの強度の差異を比較して数値化するなどの方法がある。 For evaluation of sound data of musical instruments and the like, for example, there is a method of performing Fourier transform on sound data of a teacher and a student, comparing the difference in intensity for each frequency, and digitizing the difference.
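
One way to read the Fourier-based comparison is sketched below in Python. The snippet computes a plain discrete Fourier transform of two equal-length sample sequences and sums the absolute differences of the magnitudes per frequency bin. The function names, the direct O(n²) transform, and the use of a summed absolute difference as the numeric value are assumptions for illustration; the specification does not fix a particular metric.

```python
import cmath
import math
from typing import List, Sequence


def dft_magnitudes(samples: Sequence[float]) -> List[float]:
    """Magnitude per frequency bin (0 .. n/2) of a real signal, normalized by n."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))) / n
        for k in range(n // 2 + 1)
    ]


def spectral_difference(teacher: Sequence[float], student: Sequence[float]) -> float:
    """Sum of per-frequency magnitude differences; 0.0 means identical spectra."""
    if len(teacher) != len(student):
        raise ValueError("this sketch assumes both recordings have the same length")
    a, b = dft_magnitudes(teacher), dft_magnitudes(student)
    return sum(abs(x - y) for x, y in zip(a, b))


n = 64
teacher = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]  # tone in bin 4
student = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]  # tone in bin 5
print(spectral_difference(teacher, teacher), spectral_difference(teacher, student))
```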

＜まとめ＞ <Summary>
以上説明したように、本実施形態の通信システム１は、遠隔地の教師が複数の生徒の発声を指導する形態においても、管理システム５０が教師の音声と各生徒の音声を比較するので、１人の教師が多数の生徒の発声を並行して指導することが可能になる。１対１の授業と同等以上に細やかな指導が可能である。 As described above, in the communication system 1 of the present embodiment, the management system 50 compares the teacher's voice with each student's voice even when a teacher at a remote location is coaching the speech of a plurality of students, so that one teacher can coach many students in parallel. Instruction at least as detailed as in a one-on-one lesson is possible.

＜その他の適用例＞ <Other application examples>
以上、本発明を実施するための最良の形態について実施例を用いて説明したが、本発明はこうした実施例に何等限定されるものではなく、本発明の要旨を逸脱しない範囲内において種々の変形及び置換を加えることができる。 The best mode for carrying out the present invention has been described above using embodiments. However, the present invention is in no way limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

＜＜システム構成の別の例＞＞ << Another example of system configuration >>
図２では管理システムが１台であったが管理システム５０は通信システム１内に複数存在してもよい。図１９は、本実施形態に係る通信システム１の概略図である。図１９では、複数の管理システム５０がネットワークに接続されている。このように通信システム１に含まれる管理システム５０は複数台でもよく、どの管理システム５０に機能を備えさせてもよい。 Although FIG. 2 shows a single management system, a plurality of management systems 50 may exist in the communication system 1. FIG. 19 is a schematic diagram of the communication system 1 according to the present embodiment. In FIG. 19, a plurality of management systems 50 are connected to a network. The communication system 1 may thus include a plurality of management systems 50, and the functions may be provided in any of them.

また、本実施形態で説明する端末１０と管理システム５０とが接続されたシステム構成は一例であり、用途や目的に応じて様々なシステム構成例があることは言うまでもない。 Further, the system configuration in which the terminal 10 and the management system 50 described in the present embodiment are connected is merely an example, and it goes without saying that there are various system configuration examples according to applications and purposes.

また、図２０に示すように、管理システム５０と中継装置３０は１つのサーバ装置６０として存在してもよい。図２０のサーバ装置６０は、管理システム５０の機能と中継装置３０の機能を併せ持っている。また、このサーバ装置６０が複数存在してもよい。 As shown in FIG. 20, the management system 50 and the relay device 30 may exist as one server device 60. The server device 60 in FIG. 20 has both the function of the management system 50 and the function of the relay device 30. Further, a plurality of server devices 60 may exist.

＜＜メールアドレスによるユーザの識別＞＞ << User identification by email address >>
本実施形態では通信ＩＤで端末１０、７０が識別されていたが、端末１０を識別する識別情報として、ユーザＩＤが用いられてもよい。また、このユーザＩＤとしてメールアドレスが用いられてよい。この場合、端末１０、７０のユーザはメールアドレスで管理システム５０にログインする。 In the present embodiment, the terminals 10 and 70 are identified by the communication ID, but a user ID may be used as the identification information for identifying the terminal 10. A mail address may be used as this user ID. In this case, the users of the terminals 10 and 70 log in to the management system 50 with their mail addresses.

図２１は、端末１０，７０が管理システム５０へログインする処理を示すシーケンス図の一例である。なお、図２１の説明では図９との相違を主に説明する。図２１に示すように、ステップＳ３、Ｓ１３−２では通信ＩＤでなく、メールアドレスが送信されている。管理システム５０はメールアドレスとパスワードでユーザを認証する。 FIG. 21 is an example of a sequence diagram illustrating a process in which the terminals 10 and 70 log in to the management system 50. In the description of FIG. 21, differences from FIG. 9 will be mainly described. As shown in FIG. 21, in steps S3 and S13-2, a mail address is transmitted instead of a communication ID. The management system 50 authenticates the user with the mail address and the password.

同様に、図１０で説明した生徒の音声データを評価する手順では、ステップＳ４、Ｓ７で通信ＩＤでなくメールアドレスが送信される。図１７についても同様に、通信ＩＤの代わりにメールアドレスが使用される。 Similarly, in the procedure for evaluating student voice data described with reference to FIG. 10, a mail address is transmitted instead of a communication ID in steps S4 and S7. Similarly, in FIG. 17, a mail address is used instead of the communication ID.

＜＜アプリでなく汎用的なブラウザアプリによるテレビ会議＞＞ << Video conference using a general-purpose browser application instead of a dedicated application >>
上記の実施形態では、端末１０、７０で通信アプリＡ１、Ａ７が動作すると説明したが、同様の処理をＷｅｂアプリによっても実現できる。Ｗｅｂアプリは、Ｗｅｂブラウザ上で動作する例えばJavaScript（登録商標）によるプログラムとＷｅｂサーバ側のプログラムが協調することによって動作し、ユーザはそれをブラウザ上で使用する。 In the embodiment described above, the communication applications A1 and A7 run on the terminals 10 and 70, but the same processing can also be realized by a Web application. A Web application works through cooperation between a program that runs on the Web browser, written in JavaScript (registered trademark) for example, and a program on the Web server side, and the user uses it in the browser.

図２２は、端末１０，７０におけるＷｅｂアプリの構成例を示す図である。端末１０は管理システム５０から、教師用のプログラム１０３０（HTML5＋JAVASCRIPT＋CSS等）をダウンロードしブラウザアプリ１０４０上で実行する。端末７０は管理システム５０から、生徒用のプログラム７０３０（HTML5＋JAVASCRIPT＋CSS等）をダウンロードしブラウザアプリ７０４０上で実行する。この場合は例えば通信ＩＤで教師か生徒かが判断される。 FIG. 22 is a diagram illustrating a configuration example of a Web application in the terminals 10 and 70. The terminal 10 downloads the teacher program 1030 (HTML5 + JAVASCRIPT + CSS, etc.) from the management system 50 and executes it on the browser application 1040. The terminal 70 downloads the student program 7030 (HTML5 + JAVASCRIPT + CSS, etc.) from the management system 50 and executes it on the browser application 7040. In this case, for example, a communication ID is used to determine whether a teacher or a student.

なお、図２２においても、通信ＩＤでなく端末１０を識別する識別情報としてユーザＩＤが用いられてよい。このユーザＩＤとしてメールアドレスが用いられてよい。 In FIG. 22, a user ID may be used as identification information for identifying the terminal 10 instead of the communication ID. A mail address may be used as the user ID.

端末１０，７０はHTTP又はHTTPS等のプロトコルを用いて管理システム５０とデータを送受信することによって、管理システム５０が提供しているサービスを利用できる。このような利用形態では、予め端末１０，７０に通信アプリＡ１、Ａ７をダウンロードしておく必要がない。 The terminals 10 and 70 can use services provided by the management system 50 by transmitting and receiving data to and from the management system 50 using a protocol such as HTTP or HTTPS. In such a usage form, there is no need to download the communication applications A1 and A7 to the terminals 10 and 70 in advance.

<Others>
For example, the present embodiment has been described using participation in a conference room via a video conference terminal (communication terminal) as an example, but the invention can also be applied to various devices that have the function of a communication terminal. For example, it can be suitably applied to an electronic blackboard that has the function of a communication terminal.

Electronic blackboards include not only those equipped with a large touch panel but also those in which a projector projects an image and the position of an electronic pen is detected by sound waves or the like and a camera.

The configuration examples in FIG. 6 and other figures divide the processing according to main functions in order to facilitate understanding of the processing by the management system 50, the terminals 10 and 70, and the relay device 30. The present invention is not limited by how the processing units are divided or by their names. The processing of the management system 50, the terminals 10 and 70, and the relay device 30 can be divided into a larger number of processing units according to the processing content. It can also be divided so that one processing unit includes more processes.

Each function of the embodiment described above can be realized by one or more processing circuits. Here, the "processing circuit" in this specification includes a processor programmed to execute each function by software, such as a processor implemented by an electronic circuit, and devices designed to execute each function described above, such as an ASIC (Application Specific Integrated Circuit), a DSP (digital signal processor), an FPGA (field programmable gate array), an SOC (System on a chip), a GPU (Graphics Processing Unit), and conventional circuit modules.

In the present embodiment, the lesson proceeds while the teacher and the students converse. However, a student may instead download and play a recorded file on their own, utter speech using it as a model, and transmit the resulting voice data to the management system 50. The management system 50 evaluates the student's voice data in advance, and the teacher downloads it at any time. This saves time for both parties and allows students to do homework.
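
The asynchronous variation above can be sketched as a small store that evaluates each submission on arrival and lets the teacher fetch the results later. This is only a sketch under stated assumptions: the class names `Submission` and `HomeworkStore`, their methods, and the `evaluate(model_audio, student_audio)` callback signature are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Submission:
    student_id: str
    model_id: str      # identifies the recorded model (first sound data)
    audio: bytes       # the student's voice data (second sound data)
    evaluation: float  # computed when the submission arrives


@dataclass
class HomeworkStore:
    # Hypothetical evaluator: evaluate(model_audio, student_audio) -> score.
    evaluate: Callable[[bytes, bytes], float]
    models: Dict[str, bytes] = field(default_factory=dict)
    submissions: List[Submission] = field(default_factory=list)

    def download_model(self, model_id: str) -> bytes:
        """Student side: fetch the recorded model to play back and imitate."""
        return self.models[model_id]

    def submit(self, student_id: str, model_id: str, audio: bytes) -> None:
        """Student side: upload an utterance; it is evaluated immediately."""
        score = self.evaluate(self.models[model_id], audio)
        self.submissions.append(Submission(student_id, model_id, audio, score))

    def download_results(self) -> List[Submission]:
        """Teacher side: fetch everything submitted so far, at any time."""
        return list(self.submissions)
```

Because the evaluation is stored with each submission, the teacher and the students do not need to be connected at the same time.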

Further, not only audio data but also image data may be used for the evaluation. For example, a similar evaluation is possible by using machine learning to convert mouth movements into phonetic symbols.
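
Once both the model and the student's attempt are available as phonetic symbols, whether derived from audio or from mouth movements, a comparison could look like the sketch below. The function name `phonetic_score`, the use of Python's `difflib.SequenceMatcher`, and the 0-to-1 score are assumptions for illustration; the specification does not state which comparison measure the management system 50 uses.

```python
from difflib import SequenceMatcher
from typing import Sequence


def phonetic_score(model: Sequence[str], attempt: Sequence[str]) -> float:
    """Return a similarity in [0, 1] between two phonetic-symbol sequences.

    Each element is one phonetic symbol. The score is SequenceMatcher's
    ratio: twice the number of matched symbols divided by the total number
    of symbols in both sequences.
    """
    if not model and not attempt:
        return 1.0  # two empty sequences are treated as identical
    matcher = SequenceMatcher(None, list(model), list(attempt), autojunk=False)
    return matcher.ratio()


# Example: one substituted symbol out of four.
model_symbols = ["h", "ə", "l", "oʊ"]
attempt_symbols = ["h", "ə", "r", "oʊ"]
score = phonetic_score(model_symbols, attempt_symbols)  # 0.75
```

A score of this kind can be computed independently for each student, which fits the per-student evaluation the embodiment describes.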

The transmission/reception unit 51 is an example of the communication means or the second communication means, the analysis/result management unit 55 is an example of the generating means, the model voice data is an example of the first sound data, the voice data uttered by a student is an example of the second sound data, the teacher is an example of the first user, a student is an example of the second user, the terminal 10 is an example of the first communication terminal, the terminal 70 is an example of the second communication terminal, the transmission/reception unit 11 is an example of the first communication means, and the output control unit 14 is an example of the display control means.
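
The division of roles listed above can be illustrated with a short sketch of one exercise round. All names here (`run_exercise`, the `students` mapping of terminal IDs to callables, and the `evaluate` callback) are hypothetical. The loop is sequential for brevity, whereas an actual server would presumably communicate with the student terminals concurrently.

```python
from typing import Callable, Dict


def run_exercise(
    first_sound: bytes,
    students: Dict[str, Callable[[bytes], bytes]],
    evaluate: Callable[[bytes, bytes], float],
) -> Dict[str, float]:
    """Run one round and return one evaluation per student terminal.

    first_sound: the model voice data chosen by the teacher.
    students:    terminal ID -> stand-in for a student terminal; calling it
                 with the model data returns that student's utterance.
    evaluate:    stand-in for the generating means.
    """
    evaluations: Dict[str, float] = {}
    for terminal_id, terminal in students.items():
        # Communication means: send the model, receive the student's utterance.
        second_sound = terminal(first_sound)
        # Generating means: evaluate the utterance against the model.
        evaluations[terminal_id] = evaluate(first_sound, second_sound)
    # The caller would transmit this mapping to the teacher's terminal.
    return evaluations
```

The returned mapping corresponds to the set of evaluations that the server sends to the first communication terminal for display.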

1 Communication system
10, 70 Terminal
30 Relay device
50 Management system

JP-T-2009-511964

Claims

A server capable of communicating with a plurality of communication terminals via a network, the server comprising:
communication means for transmitting first sound data to each of a plurality of second communication terminals according to an instruction from a first communication terminal operated by a first user, and for receiving a plurality of pieces of second sound data respectively uttered by second users of the plurality of second communication terminals; and
generating means for generating an evaluation for each of the plurality of pieces of second sound data based on the first sound data and the second sound data obtained from each of the plurality of second communication terminals,
wherein the communication means transmits the evaluation obtained by the generating means to the first communication terminal.

The server according to claim 1, wherein the communication means transmits, to the first communication terminal, the plurality of pieces of second sound data respectively uttered by the second users of the plurality of second communication terminals or identification information of the second sound data, and
screen information of a first screen having buttons for individually accepting the plurality of pieces of second sound data.

The server according to claim 2, wherein the communication means transmits the first sound data or identification information of the first sound data to the first communication terminal, and
the first screen further includes a button for accepting output of the first sound data.

The server according to claim 3, wherein the first screen has a retry button for causing the plurality of second communication terminals to output the first sound data again,
when the communication means receives a notification that the retry button has been pressed,
the communication means transmits the first sound data that the generating means used to generate the evaluation to each of the plurality of second communication terminals, and receives a plurality of pieces of second sound data respectively uttered by the second users of the plurality of second communication terminals,
the generating means compares the first sound data with the second sound data obtained from each of the plurality of second communication terminals and again generates an evaluation for each piece of second sound data, and
the communication means transmits the evaluation obtained by the generating means to the first communication terminal.

The server according to any one of claims 1 to 4, wherein the first sound data is sound data of a predetermined language,
the generating means converts the first sound data and the plurality of pieces of second sound data respectively uttered by the second users of the plurality of second communication terminals into phonetic symbols, and
compares the phonetic symbols converted from the first sound data with the phonetic symbols converted from the second sound data to evaluate each of the plurality of pieces of second sound data.

The server according to any one of claims 1 to 5, wherein the communication means transmits, to the first communication terminal, screen information of a second screen including a list of the first sound data recorded in advance, and
upon receiving identification information of the first sound data selected from the list displayed on the second screen,
transmits the first sound data specified by the identification information to each of the plurality of second communication terminals.

The server according to claim 6, wherein, when a button for accepting input of the first sound data, included in the second screen displayed on the first communication terminal, is pressed,
the communication means transmits the first sound data that the first user input to the first communication terminal and that was received from the first communication terminal to each of the plurality of second communication terminals.

The server according to claim 7, wherein the communication means transmits, to the second communication terminal:
the first sound data identified by the identification information of the first sound data received from the first communication terminal of the first user, or the first sound data received from the first communication terminal;
screen information of a third screen indicating, while the first sound data is being output, that the voice of the first user is being output; and
screen information of a fourth screen that is displayed when the output of the first sound data is completed and that accepts input of the second sound data.

The server according to claim 8, wherein the communication means individually transmits the evaluation of each of the plurality of pieces of second sound data obtained by the generating means to the second communication terminal that transmitted that second sound data, so that the second communication terminal displays the evaluation,
further individually transmits, to the second communication terminal that transmitted the second sound data, the first sound data or identification information of the first sound data, and the second sound data or identification information of the second sound data, and
transmits, to the second communication terminal, screen information of a fifth screen to which the display transitions from the fourth screen, the fifth screen being displayed when the second communication terminal receives the evaluation, the first sound data or its identification information, and the second sound data or its identification information, and including the evaluation, a button for accepting output of the first sound data, and a button for accepting output of the second sound data.

A sound data evaluation method performed by a server capable of communicating with a plurality of communication terminals via a network, the method comprising:
a step in which communication means transmits first sound data to each of a plurality of second communication terminals according to an instruction from a first communication terminal operated by a first user, and receives a plurality of pieces of second sound data respectively uttered by second users of the plurality of second communication terminals;
a step in which generating means generates an evaluation for each of the plurality of pieces of second sound data based on the first sound data and the second sound data obtained from each of the plurality of second communication terminals; and
a step in which the communication means transmits the evaluation obtained by the generating means to the first communication terminal.

A program for causing a server capable of communicating with a plurality of communication terminals via a network to function as:
communication means for transmitting first sound data to each of a plurality of second communication terminals according to an instruction from a first communication terminal operated by a first user, and for receiving a plurality of pieces of second sound data respectively uttered by second users of the plurality of second communication terminals; and
generating means for generating an evaluation for each of the plurality of pieces of second sound data based on the first sound data and the second sound data obtained from each of the plurality of second communication terminals,
wherein the communication means transmits the evaluation obtained by the generating means to the first communication terminal.

A communication system having a first communication terminal and a server capable of communicating with each other via a network, wherein
the first communication terminal includes:
first communication means for transmitting, to the server, a transmission request to transmit first sound data to each of a plurality of second communication terminals,
the server includes:
second communication means for transmitting the first sound data to each of the plurality of second communication terminals in response to the transmission request, and for receiving a plurality of pieces of second sound data respectively uttered by second users of the plurality of second communication terminals; and
generating means for generating an evaluation for each of the plurality of pieces of second sound data based on the first sound data and the second sound data obtained from each of the plurality of second communication terminals,
the second communication means transmits the evaluation obtained by the generating means to the first communication terminal, and
the first communication terminal includes display control means for displaying, on a display device, the plurality of evaluations received by the first communication means from the server.