JP2001142482A

JP2001142482A - Device for converting voice to caption

Info

Publication number: JP2001142482A
Application number: JP32003199A
Authority: JP
Inventors: Nobumasa Seiyama (清山信正); Atsushi Goto (後藤淳); Toru Imai (今井亨); Toru Tsugi (都木徹); Akio Ando (安藤彰男)
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 1999-11-10
Filing date: 1999-11-10
Publication date: 2001-05-25

Abstract

PROBLEM TO BE SOLVED: To present speech in predetermined units at an appropriate timing. SOLUTION: A speech recognition unit 102 performs speech recognition on the basis of feature parameters and sends the resulting recognition result to a recognition result correction unit 104. At the same time, it sends the recognition result output timing. A speech presentation unit 103 receives speech divided into appropriate units from an acoustic analysis/division unit 1 and the recognition result output timing from the speech recognition unit 102. It then presents the speech of the corresponding unit in synchronization with the recognition result output timing.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to an audio captioning apparatus that recognizes speech and converts it into captions in real time.

[0002]

2. Description of the Related Art: Conventionally, when speech such as news speech is converted into captions in real time by speech recognition, the recognition results are corrected by hand. To assist the person making these corrections (the corrector), the original speech that was fed to the recognizer is presented to the corrector. One known way of doing this is to play the original speech back with a fixed time delay.

[0003]

Problem to Be Solved by the Invention: With this method, however, the time taken by speech recognition varies, so the output timing of individual recognition results shifts. As a result, the recognition results and the speech are presented to the corrector at uneven, mismatched times. The speech therefore cannot be presented to the corrector at an appropriate timing, and in practice the corrector could not be assisted.
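The mismatch described above can be made concrete with a small simulation. This sketch is illustrative only and is not taken from the disclosure. The unit boundaries, the recognition latencies, and the 3-second fixed delay are made-up values, and `fixed_delay_offsets` is a hypothetical helper. For each unit, it measures how far the conventional fixed-delay playback start lies from the moment that unit's recognition result is output.

```python
def fixed_delay_offsets(units, latencies, fixed_delay):
    """units: (start, end) times of each unit in the input speech, in seconds.
    latencies: recognition processing time for each unit, in seconds.
    Returns playback_start - result_output_time for each unit: positive means
    the speech is heard after its text appears, negative means before it."""
    offsets = []
    for (start, end), latency in zip(units, latencies):
        result_time = end + latency            # text is output after the unit ends
        playback_start = start + fixed_delay   # conventional method: constant delay
        offsets.append(round(playback_start - result_time, 3))
    return offsets

# Hypothetical data: three units with uneven recognition latency.
offsets = fixed_delay_offsets([(0.0, 2.0), (2.0, 5.0), (5.0, 7.0)],
                              [0.5, 1.5, 0.2], fixed_delay=3.0)
print(offsets)  # [0.5, -1.5, 0.8]
```

The offsets differ in sign and size from unit to unit, so no single delay value aligns all of them.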

[0004] Viewers, on the other hand, prefer captions to appear with as little delay as possible relative to the picture. The correction of recognition results, and the speech presentation that supports it, therefore had to be performed in real time so as to keep pace with the picture.

[0005] An object of the present invention is to solve the problems described above and to provide an audio captioning apparatus capable of presenting speech in predetermined units at an appropriate timing.

[0006]

Means for Solving the Problem: The invention of claim 1 is an audio captioning apparatus having speech recognition/correction means. These means include speech recognition means for recognizing speech in predetermined units and correction means for correcting the recognition results of the speech recognition means. The speech recognition/correction means comprise speech presentation means that present the speech of each predetermined unit in synchronization with the timing at which the speech recognition means output the recognition result.

[0007] In claim 1, the speech recognition/correction means may further comprise delay means for delaying the speech presented by the speech presentation means by a predetermined delay time.

[0008] In claim 1, the speech recognition/correction means may further comprise speech rate changing means for changing the speaking rate of the speech presented by the speech presentation means.

[0009] In claim 1, the correction means may comprise stop/resume means for stopping and resuming, at an arbitrary timing, the presentation of speech by the speech presentation means.

[0010] In claim 1, the correction means may comprise designation means and first resumption means. The designation means designate a portion of the speech to be presented repeatedly. The first resumption means can resume speech presentation from the beginning of the repeated portion designated by the designation means.

[0011] In claim 1, the correction means may comprise designation means and second resumption means. The designation means designate a portion of the speech to be presented repeatedly. The second resumption means can resume speech presentation from a point a predetermined time before the beginning of the repeated portion designated by the designation means.

[0012] In claim 1, the correction means may comprise designation means and third resumption means. The designation means designate a portion of the speech to be presented repeatedly. The third resumption means can resume speech presentation from a point a predetermined time after the end of the repeated portion designated by the designation means.

[0013] In any one of claims 4 to 7, the speech recognition/correction means may further comprise speech rate increasing means. From the time speech presentation is resumed until the original presentation timing is restored, these means raise the speaking rate so that the speech is presented faster.

[0014] The invention of claim 9 is an audio captioning apparatus comprising:
- division means for dividing input speech into predetermined units;
- distribution means for distributing the different predetermined units obtained by the division means;
- at least two identical or different speech recognition/correction means selected from those of claims 1 to 8; and
- integration means for integrating the correction results obtained from the at least two speech recognition/correction means.

[0015] A control program recorded on a computer-readable recording medium causes a computer to execute a speech recognition procedure and a correction procedure. The speech recognition procedure recognizes speech in predetermined units, and the correction procedure corrects its recognition results. The program also causes the computer to execute a speech presentation procedure that presents the speech of each predetermined unit in synchronization with the timing at which the speech recognition procedure outputs the recognition result.

[0016] In claim 10, the control program recorded on the computer-readable recording medium further causes the computer to execute a delay procedure for delaying the speech presented by the speech presentation procedure by a predetermined delay time.

[0017] In claim 10, the control program recorded on the computer-readable recording medium further causes the computer to execute a speech rate changing procedure for changing the speaking rate of the speech presented by the speech presentation procedure.

[0018]

Embodiments of the Invention: Embodiments of the present invention are described in detail below with reference to the drawings.

[0019] <First Embodiment> FIG. 1 shows a first embodiment of the present invention.

In FIG. 1, reference numeral 1 denotes an acoustic analysis/division unit. It calculates feature parameters from the input speech and divides the input speech into appropriate units.

Reference numeral 10 denotes a speech recognition/correction unit. It comprises a speech recognition unit 102, a speech presentation unit 103, and a recognition result correction unit 104:
- The speech recognition unit 102 performs speech recognition on the basis of the feature parameters from the acoustic analysis/division unit 1.
- The speech presentation unit 103 receives two inputs: the speech divided into appropriate units by the acoustic analysis/division unit 1, and the recognition result output timing from the speech recognition unit 102. It presents the speech to the recognition result correction unit 104 at the appropriate timing.
- The recognition result correction unit 104 compares the recognition result from the speech recognition unit 102 with the speech presented by the speech presentation unit 103. It transmits the result of the manual correction made to the recognition result (the correction result).

[0020] Next, the operation is described.

The acoustic analysis/division unit 1 calculates feature parameters from the input speech and sends them to the speech recognition unit 102. It also divides the input speech into appropriate units and sends them to the speech presentation unit 103.

On receiving the feature parameters, the speech recognition unit 102 performs speech recognition on them and sends the resulting recognition result to the recognition result correction unit 104. At the same time, it sends the recognition result output timing.

The speech presentation unit 103 receives the speech divided into appropriate units from the acoustic analysis/division unit 1 and the recognition result output timing from the speech recognition unit 102. It presents the speech of the corresponding unit in synchronization with the recognition result output timing.
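The synchronization performed by the speech presentation unit 103 can be sketched as a small scheduler. This is a minimal sketch under stated assumptions, not the patented implementation:
- The `Unit` class, the `(unit_id, text, output_time)` result tuples, and the `synchronized_schedule` function are hypothetical.
- The handling of a unit that is still playing when the next result arrives is an assumption. The source does not address this case; here the next unit is queued behind the current one.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    unit_id: int
    start: float  # start of the unit in the input speech (s)
    end: float    # end of the unit in the input speech (s)

def synchronized_schedule(units, results):
    """Start each unit's speech when its recognition result is output.

    units:   Unit objects produced by the division step (unit 1).
    results: (unit_id, text, output_time) tuples from the recognizer (unit 102).
    Returns (unit_id, text, play_start, play_end) tuples in presentation order.
    """
    by_id = {u.unit_id: u for u in units}
    schedule = []
    busy_until = 0.0
    for unit_id, text, output_time in sorted(results, key=lambda r: r[2]):
        u = by_id[unit_id]
        # Assumption (not in the source): if the previous unit is still playing,
        # start this one when it finishes rather than overlapping the two.
        play_start = max(output_time, busy_until)
        play_end = play_start + (u.end - u.start)
        schedule.append((unit_id, text, play_start, play_end))
        busy_until = play_end
    return schedule

units = [Unit(0, 0.0, 2.0), Unit(1, 2.0, 5.0), Unit(2, 5.0, 7.0)]
results = [(0, "unit zero", 2.5), (1, "unit one", 6.5), (2, "unit two", 9.8)]
for row in synchronized_schedule(units, results):
    print(row)
```

Each unit's speech starts when its text reaches the corrector, however long recognition took. A fixed playback delay cannot guarantee this when latency varies.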

[0021] The corrector compares the speech presented by the speech presentation unit 103 with the recognition result sent from the speech recognition unit 102 to the recognition result correction unit 104. By operating the recognition result correction unit 104, the corrector can correct that recognition result manually. When the correction is confirmed, the recognition result correction unit 104 transmits the confirmed correction result.

[0022] <Second Embodiment> FIG. 2 shows a second embodiment of the present invention. This embodiment differs from the first embodiment in the configuration of the speech recognition/correction unit.

[0023] In the speech recognition/correction unit 10 of the first embodiment, the speech presentation unit 103 receives the speech divided into appropriate units from the acoustic analysis/division unit 1. It also receives the recognition result output timing from the speech recognition unit 102, and presents the speech of each unit in synchronization with that timing.

[0024] In the speech recognition/correction unit 20 of this embodiment, by contrast, the speech of each unit is not presented as soon as the speech presentation unit 103 releases it in synchronization with the recognition result output timing. Instead, a speech delay unit 205 delays it by a predetermined time before it is presented.
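A delay of this kind can be modeled as a time-ordered release queue. The sketch below is hypothetical. The `SpeechDelay` class, its method names, and the 1.5-second delay in the test are illustrative choices; the source gives neither a delay value nor an implementation.

```python
import heapq

class SpeechDelay:
    """Hypothetical model of speech delay unit 205: each unit handed over by
    speech presentation unit 103 is released `delay` seconds later."""

    def __init__(self, delay):
        self.delay = delay
        self._queue = []  # min-heap of (release_time, push_order, unit_id)
        self._order = 0   # tie-breaker so equal release times keep push order

    def push(self, unit_id, presented_at):
        """Accept a unit at the time unit 103 presents it."""
        heapq.heappush(self._queue, (presented_at + self.delay, self._order, unit_id))
        self._order += 1

    def pop_due(self, now):
        """Return the ids of units whose release time has been reached, oldest first."""
        due = []
        while self._queue and self._queue[0][0] <= now:
            due.append(heapq.heappop(self._queue)[2])
        return due
```

Because every unit is shifted by the same amount, the units stay in the same relative order as the recognition output timing.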

[0025] <Third Embodiment> FIG. 3 shows a third embodiment of the present invention. This embodiment differs from the first embodiment in the configuration of the speech recognition/correction unit.

[0026] In the speech recognition/correction unit 10 of the first embodiment, the speech presentation unit 103 receives each unit of speech from the acoustic analysis/division unit 1. It also receives the recognition result output timing from the speech recognition unit 102, and presents the speech of that unit in synchronization with that timing.

[0027] In the speech recognition/correction unit 30 of this embodiment, by contrast, the speech presentation unit 103 still releases the speech in synchronization with the recognition result output timing. A speech rate variable unit 305 then changes that speech to a predetermined speaking rate before it is presented.
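The source does not say how the speech rate variable unit 305 changes the speaking rate. Resampling would also shift the pitch. One textbook way to change tempo without that side effect is overlap-add (OLA) time-scale modification, and the sketch below uses it as a stand-in. The function name and the frame and hop sizes are hypothetical. Plain OLA can produce phasing artifacts on voiced speech; waveform-aligned variants such as WSOLA reduce them.

```python
import math

def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def ola_time_scale(x, rate, frame=1024, hop_out=256):
    """Return x played `rate` times faster (rate > 1) or slower (rate < 1).

    Frames are read every hop_out * rate samples and written every hop_out
    samples, so the output is about len(x) / rate samples long and the pitch
    is roughly preserved.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    hop_in = hop_out * rate
    w = hann(frame)
    n_frames = max(1, int((len(x) - frame) / hop_in) + 1)
    out_len = (n_frames - 1) * hop_out + frame
    y = [0.0] * out_len
    norm = [0.0] * out_len
    for k in range(n_frames):
        start = int(round(k * hop_in))  # analysis position in the input
        base = k * hop_out              # synthesis position in the output
        for i in range(frame):
            j = start + i
            s = x[j] if j < len(x) else 0.0  # zero-pad past the end of the input
            y[base + i] += s * w[i]
            norm[base + i] += w[i]
    # Divide by the summed window so overlapping frames do not change the level.
    return [v / n if n > 1e-8 else 0.0 for v, n in zip(y, norm)]
```

At a rate of 2.0, one second of 16 kHz input (16,000 samples) yields roughly 8,000 output samples, so it is presented in about half the time.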

[0028] <Fourth Embodiment> FIG. 4 shows a fourth embodiment of the present invention. This embodiment differs from the first embodiment in the configuration of the speech recognition/correction unit.

[0029] In the speech recognition/correction unit 10 of the first embodiment, the speech presentation unit 103 receives each unit of speech from the acoustic analysis/division unit 1. It also receives the recognition result output timing from the speech recognition unit 102, and presents the speech of that unit in synchronization with that timing.

[0030] In the speech recognition/correction unit 40 of this embodiment, by contrast, the recognition result correction unit 404 can issue stop and resume requests for speech presentation at any time. When the speech presentation unit 403 receives such a request from the recognition result correction unit 404, it stops or resumes presenting speech to the recognition result correction unit 404 accordingly.
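The stop/resume behavior can be sketched as a playback cursor that advances only while presentation is running. This is a hypothetical model; the class and method names are illustrative. It assumes presentation starts at wall-clock time 0. For brevity, it does not clamp the cursor to the speech actually received.

```python
class PausablePresenter:
    """Hypothetical model of speech presentation unit 403 obeying stop/resume
    requests from recognition result correction unit 404. Times are in seconds;
    presentation is assumed to start at wall-clock time 0."""

    def __init__(self):
        self.position = 0.0  # seconds of speech presented so far
        self.playing = True
        self._last = 0.0     # wall-clock time of the last update

    def tick(self, now):
        """Advance the presentation position up to wall-clock time `now`."""
        if self.playing:
            self.position += now - self._last
        self._last = now

    def stop(self, now):     # stop request from unit 404
        self.tick(now)
        self.playing = False

    def resume(self, now):   # resume request from unit 404
        self.tick(now)
        self.playing = True

    def backlog(self, now):
        """How far presentation lags the live input because of pauses (s)."""
        self.tick(now)
        return now - self.position

p = PausablePresenter()
p.stop(3.0)              # corrector pauses at t = 3 s
p.resume(8.0)            # and resumes at t = 8 s
print(p.backlog(10.0))   # 5.0
```

After resumption, presentation continues from where it stopped. Every pause therefore adds to the backlog relative to the live input.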

[0031] <Fifth Embodiment> FIG. 5 shows a fifth embodiment of the present invention. This embodiment differs from the fourth embodiment in the configuration of the speech recognition/correction unit.

[0032] In the speech recognition/correction unit 40 of the fourth embodiment, the recognition result correction unit 404 can issue stop and resume requests for speech presentation at any time. When the speech presentation unit 403 receives such a request from the recognition result correction unit 404, it stops or resumes presenting speech to the recognition result correction unit 404 accordingly.

[0033] The speech recognition/correction unit 50 of this embodiment, by contrast, lets the corrector operate the recognition result correction unit 504 to designate a portion to be presented repeatedly. The speech presentation unit 503 can then resume speech presentation from one of three points:
(1) the beginning of the designated portion;
(2) a point a predetermined time before the beginning of the designated portion; or
(3) a point a predetermined time after the end of the designated portion.
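The three resumption points can be written as a single function of the designated portion. In this sketch, the `ResumeMode` names, the 1-second default margin (the source only says "a predetermined time"), and the clamping to the available speech are hypothetical choices.

```python
from enum import Enum

class ResumeMode(Enum):
    FROM_START = 1    # first resumption means: beginning of the portion
    BEFORE_START = 2  # second resumption means: margin before the beginning
    AFTER_END = 3     # third resumption means: margin after the end

def resume_position(seg_start, seg_end, mode, margin=1.0, total=None):
    """Return the playback position (s) at which presentation restarts for a
    designated repeated-presentation portion [seg_start, seg_end]."""
    if seg_end < seg_start:
        raise ValueError("portion end precedes its start")
    if mode is ResumeMode.FROM_START:
        pos = seg_start
    elif mode is ResumeMode.BEFORE_START:
        pos = seg_start - margin
    else:
        pos = seg_end + margin
    pos = max(0.0, pos)        # cannot rewind before the start of the speech
    if total is not None:
        pos = min(pos, total)  # cannot skip past the speech received so far
    return pos
```

Rewinding a little before the portion gives the corrector context leading into it. Resuming after it skips material that has already been checked.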

[0034] <Sixth Embodiment> FIG. 6 shows a sixth embodiment of the present invention. This embodiment differs from the fifth embodiment in the configuration of the speech recognition/correction unit.

[0035] In the speech recognition/correction unit 50 of the fifth embodiment, the corrector operates the recognition result correction unit 504 to designate a portion to be presented repeatedly. The speech presentation unit 503 can then resume speech presentation from one of three points:
- the beginning of that portion;
- a point a predetermined time before its beginning; or
- a point a predetermined time after its end.

[0036] The speech recognition/correction unit 60 of this embodiment works as follows. The point from which speech presentation resumes is designated with the recognition result correction unit 604, essentially as in the fifth embodiment. When the speech presentation unit 503 then presents the speech, the speech rate variable unit 305 receives it from the speech presentation unit 503 and raises its speaking rate. In other words, the speech is presented faster than normal.
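The source specifies neither the speed-up factor nor how long the faster presentation lasts. A simple relation under stated assumptions shows the trade-off. Assume the live input keeps arriving in real time and presentation lags the original timing by L seconds. Played at r times normal speed (r > 1), each wall-clock second presents r seconds of speech while the original timeline advances 1 second. The lag therefore shrinks by (r - 1) seconds per second, and the original timing is restored after L / (r - 1) seconds. The helper names below are hypothetical.

```python
def catch_up_time(lag, rate):
    """Wall-clock seconds needed to remove `lag` seconds of presentation delay
    when speech is played `rate` times faster than normal (rate > 1)."""
    if rate <= 1.0:
        raise ValueError("rate must exceed 1 to catch up")
    return lag / (rate - 1.0)

def rate_for_deadline(lag, deadline):
    """Speed-up factor that removes `lag` seconds of delay within `deadline` seconds."""
    if deadline <= 0:
        raise ValueError("deadline must be positive")
    return 1.0 + lag / deadline

print(catch_up_time(3.0, 1.5))      # 6.0
print(rate_for_deadline(3.0, 6.0))  # 1.5
```

A higher rate restores the original timing sooner but makes the speech harder for the corrector to follow, so the factor is a tuning choice.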

[0037] <Seventh Embodiment> The audio captioning apparatus of this embodiment uses at least two of the speech recognition/correction units 10, 20, 30, 40, 50, and 60 of the first to sixth embodiments. FIG. 7 shows three of them: unit 10 of the first embodiment, unit 20 of the second embodiment, and unit 60 of the sixth embodiment. The at least two speech recognition/correction units may have the same configuration or different configurations.

[0038] The acoustic analysis/speech division and distribution unit 701 calculates feature parameters from the input speech. It sends them to the speech recognition unit of each speech recognition/correction unit. It also divides the input speech into appropriate units and distributes them among the speech presentation units of the speech recognition/correction units. The correction result integration unit 703 then integrates the correction results produced by the recognition result correction units of the speech recognition/correction units and outputs captions.
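The source does not specify a distribution policy for unit 701 or a merge rule for unit 703. One plausible scheme is sketched below. Consecutive units are dealt out round-robin, so each corrector gets extra time per unit. Corrected results are then released in input order through a reorder buffer. Both the policy and the function names are hypothetical.

```python
def distribute(units, n_correctors):
    """Assign consecutive units to correctors round-robin (an assumed policy).
    Each unit is tagged with its sequence number for later reordering."""
    lanes = [[] for _ in range(n_correctors)]
    for seq, unit in enumerate(units):
        lanes[seq % n_correctors].append((seq, unit))
    return lanes

def integrate_stream(arrivals):
    """arrivals: (seq, corrected_text) pairs in the order correctors finish.
    Yields caption text in input order, releasing each item as soon as all
    earlier items have arrived."""
    pending = {}
    next_seq = 0
    for seq, text in arrivals:
        pending[seq] = text
        while next_seq in pending:
            yield pending.pop(next_seq)
            next_seq += 1

lanes = distribute(["u0", "u1", "u2", "u3"], 2)
print(lanes)  # [[(0, 'u0'), (2, 'u2')], [(1, 'u1'), (3, 'u3')]]
print(list(integrate_stream([(1, "B"), (0, "A"), (3, "D"), (2, "C")])))  # ['A', 'B', 'C', 'D']
```

With N correctors working in parallel, each one can spend up to roughly N unit durations on a unit while the captions still come out in order.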

[0039] The object of the present invention can also be achieved by supplying a system or apparatus with a storage medium that records the program code of software implementing the functions of the above embodiments. A computer of that system or apparatus, or its CPU (central processing unit) or MPU (microprocessor unit), then reads out and executes the program code stored on the storage medium.

[0040] In this case, the program code read from the storage medium itself implements the novel functions of the present invention, and the storage medium storing the program code constitutes the present invention.

[0041] Examples of storage media for supplying the program code include a floppy disk, hard disk, magnetic tape, optical disk, magneto-optical disk, CD-ROM (compact disk ROM), CD-R (compact disk recordable), nonvolatile memory card, and ROM (read-only memory).

[0042] The functions of the above embodiments may be realized by the computer executing the program code it has read. They may also be realized when an OS (operating system) or the like running on the computer performs part or all of the actual processing according to the instructions of the program code. Both cases are included.

[0043] The program code read from the storage medium may also be written into memory on a function expansion board inserted into the computer, or in a function expansion unit connected to it. A CPU or the like on that board or unit may then perform part or all of the actual processing according to the program code's instructions, and that processing may realize the functions of the above embodiments. This case is also included.

[0044]

Effects of the Invention: As described above, the present invention is configured so that speech can be presented in predetermined units at an appropriate timing. It can therefore assist correction.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a first embodiment of the present invention.

FIG. 2 is a block diagram showing a second embodiment of the present invention.

FIG. 3 is a block diagram showing a third embodiment of the present invention.

FIG. 4 is a block diagram showing a fourth embodiment of the present invention.

FIG. 5 is a block diagram showing a fifth embodiment of the present invention.

FIG. 6 is a block diagram showing a sixth embodiment of the present invention.

FIG. 7 is a block diagram showing a seventh embodiment of the present invention.

[Explanation of symbols]

1 Acoustic analysis/speech division unit
10, 20, 30, 40, 50, 60 Speech recognition/correction unit
102 Speech recognition unit
103, 403, 503 Speech presentation unit
104, 404, 504, 604 Recognition result correction unit
205 Speech delay unit
305 Speech rate variable unit
701 Acoustic analysis/speech division and distribution unit
703 Correction result integration unit

Continued from the front page:
(72) Inventor: Toru Imai (今井亨), c/o Science and Technology Research Laboratories, Japan Broadcasting Corporation (NHK), 1-10-11 Kinuta, Setagaya-ku, Tokyo
(72) Inventor: Toru Tsugi (都木徹), c/o Science and Technology Research Laboratories, Japan Broadcasting Corporation (NHK), 1-10-11 Kinuta, Setagaya-ku, Tokyo
(72) Inventor: Akio Ando (安藤彰男), c/o Science and Technology Research Laboratories, Japan Broadcasting Corporation (NHK), 1-10-11 Kinuta, Setagaya-ku, Tokyo
F-terms (reference): 5D015 KK02 LL04 LL05; 9A001 HH17 HH33 KK60

Claims

1. An audio captioning apparatus having speech recognition/correction means, the speech recognition/correction means including speech recognition means for recognizing speech in predetermined units and correction means for correcting the speech recognition results of the speech recognition means, wherein the speech recognition/correction means comprise speech presentation means for presenting the speech of each predetermined unit in synchronization with the timing at which the speech recognition means output the speech recognition result.

2. The audio captioning apparatus according to claim 1, wherein the speech recognition/correction means further comprise delay means for delaying the speech presented by the speech presentation means by a predetermined delay time.

3. The audio captioning apparatus according to claim 1, wherein the speech recognition/correction means further comprise speech rate changing means for changing the speaking rate of the speech presented by the speech presentation means.

4. The audio captioning apparatus according to claim 1, wherein the correction means comprise stop/resume means for stopping and resuming, at an arbitrary timing, the presentation of speech by the speech presentation means.

5. The audio captioning apparatus according to claim 1, wherein the correction means comprise designation means for designating a portion of the speech to be presented repeatedly, and first resumption means capable of resuming speech presentation from the beginning of the repeated portion designated by the designation means.

6. The audio captioning apparatus according to claim 1, wherein the correction means comprise designation means for designating a portion of the speech to be presented repeatedly, and second resumption means capable of resuming speech presentation from a point a predetermined time before the beginning of the repeated portion designated by the designation means.

7. The audio captioning apparatus according to claim 1, wherein the correction means comprise designation means for designating a portion of the speech to be presented repeatedly, and third resumption means capable of resuming speech presentation from a point a predetermined time after the end of the repeated portion designated by the designation means.

8. The audio captioning apparatus according to any one of claims 4 to 7, wherein the speech recognition/correction means further comprise speech rate increasing means for raising the speaking rate so that the speech is presented faster, from when speech presentation is resumed until the original speech presentation timing is restored.

9. An audio captioning apparatus comprising: division means for dividing input speech into predetermined units; distribution means for distributing the different predetermined units obtained by the division means; at least two identical or different speech recognition/correction means selected from the speech recognition/correction means according to any one of claims 1 to 8; and integration means for integrating the correction results obtained from the at least two speech recognition/correction means.

10. A computer-readable recording medium on which a control program is recorded, the control program causing a computer to execute a speech recognition procedure for recognizing speech in predetermined units and a correction procedure for correcting the speech recognition results of the speech recognition procedure, and further causing the computer to execute a speech presentation procedure for presenting the speech of each predetermined unit in synchronization with the timing at which the speech recognition procedure outputs the speech recognition result.

11. The computer-readable recording medium according to claim 10, wherein the control program further causes the computer to execute a delay procedure for delaying the speech presented by the speech presentation procedure by a predetermined delay time.

12. The computer-readable recording medium according to claim 10, wherein the control program further causes the computer to execute a speech rate changing procedure for changing the speaking rate of the speech presented by the speech presentation procedure.