JP2011170103A

JP2011170103A - Advertisement display system, advertisement display method, and advertisement display program

Info

Publication number: JP2011170103A
Application number: JP2010033813A
Authority: JP
Inventors: Seiichi Miki; 清一三木
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2010-02-18
Filing date: 2010-02-18
Publication date: 2011-09-01

Abstract

PROBLEM TO BE SOLVED: To improve advertising effectiveness by presenting an advertising word in a natural manner to a user of speech recognition processing.
SOLUTION: An advertisement display system 100 includes: a speech recognition unit 102 that calculates an acoustic score, a language score, and a total score for each word that is a candidate speech recognition result for input voice data, and selects a word with a high total score as the speech recognition result; an advertising word storage unit 120 that stores advertising words for which an advertising fee is paid as consideration for being made easier to select as the speech recognition result; and an output adjustment unit 104 that, when the similarity between an advertising word and the voice data is within a predetermined similarity tolerance range, adjusts the result so that the advertising word is selected as the speech recognition result for the voice data, and displays and outputs the adjusted result as the speech recognition result for the voice data. The similarity tolerance range is set wider as the rank of the advertising fee of the advertising word is higher.
COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to an advertisement display system, an advertisement display method, and an advertisement display program.

Patent Document 1 (Japanese Patent Application Laid-Open No. 11-133994) describes a voice input device that selects the best of a plurality of recognition result candidates obtained for an input voice and uses it as the recognition result for that voice. The device includes: an adaptive score storage unit that stores adaptive scores for a plurality of words that can become recognition result candidates; means for selecting the recognition result from the plurality of candidates obtained for the input voice while also taking into account the adaptive score of each candidate stored in the adaptive score storage unit; and a user correction unit that corrects the recognition result selected by that means in accordance with the user's instruction and updates the contents of the adaptive score storage unit so that the corrected recognition result becomes more likely to be selected by that means.

Patent Document 2 (Japanese Patent Application Laid-Open No. 2008-170820) describes a content providing system including: means for storing words and content in association with each other; word extraction means for extracting a word from conversational speech; content reading means for reading the content stored in association with the word extracted by the word extraction means; and content transmission means for sending the read content to content reproduction means. According to that document, this makes it possible to select advertisements and other content to be displayed on the basis of the speaker's emotion.

Patent Document 1: JP-A-11-133994
Patent Document 2: JP 2008-170820 A

However, in conventional advertisement display methods, such as the technique described in Patent Document 2, the content associated with a word is presented to the user separately from the speech recognition result. Content therefore appears even though the user did not ask for it. When that content is an advertisement, the user can plainly tell that it is an advertisement, and its presentation feels unnatural.

An object of the present invention is to provide an advertisement display system, an advertisement display method, and an advertisement display program that solve the problem described above, namely that presenting an advertisement to a user comes across as unnatural.

According to the present invention, there is provided an advertisement display system including:
speech recognition means for calculating, on the basis of an acoustic model and a language model, an acoustic score, a language score, and a total score based on the acoustic score and the language score for each word that is a candidate speech recognition result for input voice data, and for selecting a word with a high total score as the speech recognition result; and
output adjustment means that includes advertising word storage means for storing an advertising word for which an advertising fee is paid as consideration for being made easier to select as the speech recognition result, and that, when the similarity between the advertising word and the voice data is within a predetermined similarity tolerance range, adjusts the result so that the advertising word is selected as the speech recognition result for the voice data and displays and outputs the adjusted result as the speech recognition result for the voice data,
wherein the similarity tolerance range is set to be wider as the rank of the advertising fee of the advertising word is higher.

According to the present invention, there is provided an advertisement display method using a computer system that includes advertising word storage means for storing an advertising word for which an advertising fee is paid as consideration for being made easier to select as a speech recognition result, the method including:
a speech recognition step of calculating, on the basis of an acoustic model and a language model, an acoustic score, a language score, and a total score based on the acoustic score and the language score for each word that is a candidate speech recognition result for input voice data, and selecting a word with a high total score as the speech recognition result; and
an output adjustment step of, when the similarity between the advertising word and the voice data is within a predetermined similarity tolerance range, adjusting the result so that the advertising word is selected as the speech recognition result for the voice data, and displaying and outputting the adjusted result as the speech recognition result for the voice data,
wherein the similarity tolerance range is set to be wider as the rank of the advertising fee of the advertising word is higher.

According to the present invention, there is provided an advertisement display program that causes a computer to function as:
speech recognition means for calculating, on the basis of an acoustic model and a language model, an acoustic score, a language score, and a total score based on the acoustic score and the language score for each word that is a candidate speech recognition result for input voice data, and for selecting a word with a high total score as the speech recognition result; and
output adjustment means that includes advertising word storage means for storing an advertising word for which an advertising fee is paid as consideration for being made easier to select as the speech recognition result, and that, when the similarity between the advertising word and the voice data is within a predetermined similarity tolerance range, adjusts the result so that the advertising word is selected as the speech recognition result for the voice data and displays and outputs the adjusted result as the speech recognition result for the voice data,
wherein the similarity tolerance range is set to be wider as the rank of the advertising fee of the advertising word is higher.

Any combination of the above constituent elements, and any conversion of the expression of the present invention among a method, an apparatus, a system, a recording medium, a computer program, and the like, is also valid as an aspect of the present invention.

According to the present invention, an advertising word can be presented in a natural manner to a user of speech recognition processing, thereby enhancing the advertising effect.

FIG. 1 is a block diagram showing an example of the configuration of an advertisement display system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the configuration of an advertising word storage unit according to an embodiment of the present invention.
FIG. 3 is a flowchart showing a processing procedure of the advertisement display system according to an embodiment of the present invention.
FIG. 4 is a schematic diagram showing a processing procedure of a score adjustment unit according to an embodiment of the present invention.
FIG. 5 is a schematic diagram showing a processing procedure of the score adjustment unit according to an embodiment of the present invention.
FIG. 6 is a diagram showing another example of the configuration of the advertising word storage unit according to an embodiment of the present invention.
FIG. 7 is a block diagram showing an example of the configuration of an advertisement display system according to an embodiment of the present invention.
FIG. 8 is a diagram showing an example of the configuration of an advertising word storage unit according to an embodiment of the present invention.
FIG. 9 is a diagram showing another example of the configuration of the advertising word storage unit according to an embodiment of the present invention.
FIG. 10 is a block diagram showing an example of the configuration of an advertisement display system according to an embodiment of the present invention.
FIG. 11 is a diagram showing an example of the configuration of an advertising word storage unit according to an embodiment of the present invention.
FIG. 12 is a block diagram showing another example of the configuration of the advertisement display system according to an embodiment of the present invention.
FIG. 13 is a flowchart showing a processing procedure of the advertisement display system according to an embodiment of the present invention.
FIG. 14 is a block diagram showing a network structure that includes the advertisement display system according to an embodiment of the present invention.
FIG. 15 is a block diagram showing another example of the configuration of the advertisement display system according to an embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings. In all the drawings, like components are denoted by like reference numerals, and their description is omitted as appropriate.

In the following embodiments, the advertisement display system includes:
speech recognition means for calculating, on the basis of an acoustic model and a language model, an acoustic score, a language score, and a total score based on the acoustic score and the language score for each word that is a candidate speech recognition result for input voice data, and for selecting a word with a high total score as the speech recognition result; and
output adjustment means that includes advertising word storage means for storing an advertising word for which an advertising fee is paid as consideration for being made easier to select as the speech recognition result, and that, when the similarity between the advertising word and the voice data is within a predetermined similarity tolerance range, adjusts the result so that the advertising word is selected as the speech recognition result for the voice data and displays and outputs the adjusted result as the speech recognition result for the voice data,
wherein the similarity tolerance range is set to be wider as the rank of the advertising fee of the advertising word is higher.

In the following embodiments, the advertisement display system is configured so that, when an advertising word and the input voice data sound similar to some degree, the advertising word is displayed and output as the speech recognition result even if the total score that ordinary speech recognition processing gives the advertising word is lower than the total scores of other words. How similar the advertising word and the input voice data must sound for this processing to apply can be specified by setting the similarity tolerance range. The degree of sound similarity between the advertising word and the input voice data can be judged on the basis of the total score or the acoustic score produced by the speech recognition processing.

(First Embodiment)
FIG. 1 is a block diagram showing an example of the configuration of the advertisement display system according to this embodiment.
The advertisement display system 100 includes a speech recognition unit 102 (speech recognition means), an output adjustment unit 104 (output adjustment means), an acoustic model storage unit 110, and a language model storage unit 112.

The acoustic model storage unit 110 stores an acoustic model. The language model storage unit 112 stores a language model. Commonly used acoustic and language models can be employed.

On the basis of the acoustic model stored in the acoustic model storage unit 110 and the language model stored in the language model storage unit 112, the speech recognition unit 102 calculates an acoustic score, a language score, and a total score based on the acoustic score and the language score for each word that is a candidate speech recognition result for the input voice data. The speech recognition unit 102 selects a word with a high total score as the speech recognition result for the input voice data. In this embodiment, the speech recognition unit 102 can select a plurality of high-scoring words, each associated with its total score, in descending order of total score.
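
The candidate ranking performed by the speech recognition unit 102 can be sketched as follows. The source only says that the total score is "based on" the acoustic and language scores, so the plain sum used here is an assumption, and the names `Candidate` and `select_top_candidates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    word: str
    acoustic_score: float
    language_score: float

    @property
    def total_score(self) -> float:
        # Assumption: total = acoustic + language. The source does not
        # state the actual combination rule.
        return self.acoustic_score + self.language_score


def select_top_candidates(candidates, n=5):
    """Return up to n (word, total_score) pairs, highest total score first."""
    ranked = sorted(candidates, key=lambda c: c.total_score, reverse=True)
    return [(c.word, c.total_score) for c in ranked[:n]]
```

Keeping the total score attached to each selected word matters because the later adjustment step operates on those totals.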

When the similarity between an advertising word and the input voice data is within the predetermined similarity tolerance range, the output adjustment unit 104 adjusts the result so that the advertising word is selected as the speech recognition result for that voice data. The output adjustment unit 104 displays and outputs the adjusted result as the speech recognition result for the voice data. Here, "display output" means output in a form that is displayed on, for example, the display of another terminal connected via a network, as described later.

A specific example follows. In this embodiment, the output adjustment unit 104 includes a score adjustment unit 106 (score adjustment means), a display output unit 108 (display output means), and an advertising word storage unit 120 (advertising word storage means).

The advertising word storage unit 120 stores advertising words for which an advertising fee is paid as consideration for being made easier to select as the speech recognition result. In this embodiment, as the parameter that defines the similarity tolerance range, the advertising word storage unit 120 stores, in association with each advertising word, a score correction value that is higher as the rank of the advertising fee of that word is higher. As a result, the similarity tolerance range is wider as the rank of the advertising fee of the advertising word is higher.

For each advertising word, the score adjustment unit 106 adjusts the total score calculated by the speech recognition unit 102 by adding the score correction value stored in the advertising word storage unit 120.
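
A minimal sketch of this adjustment, assuming the recognizer's totals and the stored correction values are both available as plain dictionaries (the function name and data layout are hypothetical, not taken from the source):

```python
def adjust_scores(total_scores, correction_values):
    """Add each advertising word's correction value to its total score.

    total_scores:      word -> total score from speech recognition
    correction_values: advertising word -> score correction value
    Words that are not advertising words keep their original score.
    """
    return {
        word: score + correction_values.get(word, 0)
        for word, score in total_scores.items()
    }
```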

FIG. 2 is a diagram showing an example of the configuration of the advertising word storage unit 120.
The advertising word storage unit 120 includes an advertising word column, an advertising fee column, and a score correction value column. The advertising word column stores the words that advertisers want to advertise, such as "NEC" and "PaPeRo".

The advertising fee column stores information indicating the advertising fee for each advertising word, such as "20 yen/output" or "10 yen/output". "20 yen/output" means that an advertising fee of 20 yen per output is charged when the advertising word would not have been displayed and output as the result of ordinary speech recognition processing but is displayed and output because of the adjustment processing of the advertisement display system 100 according to this embodiment.
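
The per-output charging rule can be sketched as below. This is one reading of the rule under stated assumptions: `original_top` and `adjusted_top` are hypothetical names for the word chosen before and after adjustment, and fees are held in a dictionary keyed by advertising word.

```python
def fee_for_output(original_top, adjusted_top, fee_per_output):
    """Return the advertising fee (in yen) owed for one output.

    A fee is due only when the output word is an advertising word that
    ordinary recognition alone would not have produced.
    """
    is_ad_word = adjusted_top in fee_per_output
    caused_by_adjustment = adjusted_top != original_top
    if is_ad_word and caused_by_adjustment:
        return fee_per_output[adjusted_top]
    return 0
```

Under this reading, an advertising word that wins on its unadjusted score incurs no fee.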

The score correction value column stores score correction values such as "20" and "10". In this embodiment, the score correction value is set as the parameter that defines the similarity tolerance range. The score correction value can be a correction applied to the total score.
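
The three columns of FIG. 2 can be modelled as a small lookup table. The class and field names below are hypothetical; the two rows use the fee and correction values that the description gives for the "NEC" and パペロ (written "PaPeRo" here) entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdWordEntry:
    word: str                # advertising word column
    fee_yen_per_output: int  # advertising fee column
    score_correction: int    # score correction value column


# Rows corresponding to the example of FIG. 2.
AD_WORD_TABLE = {
    entry.word: entry
    for entry in (
        AdWordEntry("NEC", 20, 20),
        AdWordEntry("PaPeRo", 10, 10),
    )
}
```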

The method of setting the score correction value is described next. An upper limit for the score correction value can be determined by considering, for example, how large the difference may be between the total score of the top-ranked word (the word with the highest total score for the input voice data) and the total score of the advertising word for that voice data while a person still perceives the two as sounding similar. If the score correction value is too large, the advertising word will be selected as the speech recognition result even when the voice data input to the speech recognition unit 102 and the advertising word sound completely different. Displaying an advertising word that sounds nothing like the input voice data as the speech recognition result would feel unnatural to the user. In this embodiment, therefore, the upper limit of the score correction value is determined as described above, and the score correction value is controlled so that it does not exceed that upper limit.
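
The upper-limit control can be illustrated with a simple clamp. The function name is hypothetical, and the lower bound of zero is an assumption added for the sketch; the source only specifies the upper limit.

```python
def clamp_correction(requested, upper_limit):
    """Keep a score correction value within [0, upper_limit]."""
    if upper_limit < 0:
        raise ValueError("upper_limit must be non-negative")
    # Assumption: corrections never go below zero; the source only
    # requires that they not exceed the upper limit.
    return max(0, min(requested, upper_limit))
```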

In this embodiment, the score correction value can also be set so that it is higher as the rank of the advertising fee of the advertising word is higher. For example, the advertising word "NEC", whose unit advertising fee is "20 yen/output", has a higher advertising fee rank than the advertising word "PaPeRo", whose unit fee is "10 yen/output". In the advertising word storage unit 120, the score correction value for "NEC" is therefore set to "20", which is higher than the score correction value "10" for "PaPeRo". Here a score correction value is set for each unit fee, but the unit fees can instead be grouped into ranks and a score correction value set for each rank. For example, unit fees from "6 yen/output" to "10 yen/output" can be rank "C (low rank)", unit fees from "11 yen/output" to "15 yen/output" rank "B (middle rank)", and unit fees from "16 yen/output" to "20 yen/output" rank "A (high rank)", with a score correction value set for each rank.
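
The example rank bands can be written out as a small classifier. The function name is hypothetical, and returning `None` for fees outside the three example bands is an assumption, since the source does not say how such fees are handled.

```python
def fee_rank(unit_fee_yen):
    """Map a unit advertising fee (yen per output) to an example rank."""
    if 16 <= unit_fee_yen <= 20:
        return "A"  # high rank
    if 11 <= unit_fee_yen <= 15:
        return "B"  # middle rank
    if 6 <= unit_fee_yen <= 10:
        return "C"  # low rank
    return None     # outside the bands given in the example
```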

In this embodiment, the score correction value for each rank can be set according to the advertising fee rank of the advertising word, within the range that does not exceed the upper limit of the score correction value. For example, if words can be judged similar when the difference between the total score of the top-ranked word (the word with the highest total score for the input voice data) and the total score of the advertising word for that voice data is 30 or less, the upper limit of the score correction value can be set to 30, and score correction values of 30, 25, 20, and so on can be assigned in descending order of advertising fee rank.
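
One way to generate the 30, 25, 20 sequence is to step down from the upper limit by a fixed amount per rank. The constant step of 5 is inferred from the example values and is an assumption, as are the function and parameter names.

```python
def corrections_by_rank(ranks_high_to_low, upper_limit=30, step=5):
    """Assign correction values that descend from upper_limit.

    ranks_high_to_low: rank labels ordered from highest fee rank to lowest.
    """
    return {
        rank: max(0, upper_limit - index * step)
        for index, rank in enumerate(ranks_high_to_low)
    }
```

For ranks `["A", "B", "C"]` this yields 30, 25, and 20, so no rank exceeds the upper limit and higher ranks always receive the wider tolerance.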

Although this embodiment shows an example in which the advertising word storage unit 120 has an advertising fee column, the advertising word storage unit 120 may also be configured without one.

Returning to FIG. 1, the display output unit 108 displays and outputs, on the basis of the scores adjusted by the score adjustment unit 106, a word with a high total score as the speech recognition result for the input voice data. In this embodiment, the display output unit 108 can display and output only the single word with the highest total score.

FIG. 14 is a block diagram showing a network structure that includes the advertisement display system 100 according to this embodiment.
This network structure includes the advertisement display system 100 and a user terminal device 200 connected to the advertisement display system 100 via a network 150 such as the Internet. The user terminal device 200 can be, for example, the user's personal computer, and can be provided with voice input means such as a microphone and display means such as a display. When voice data is input through the voice input means of the user terminal device 200, the voice data is sent over the network 150 to the speech recognition unit 102 (see FIG. 1) of the advertisement display system 100. When the display output unit 108 of the advertisement display system 100 outputs a speech recognition result, the result is sent over the network 150 to the user terminal device 200 and displayed on its display means. The voice data of the user of the user terminal device 200 may also be input to the speech recognition unit 102 of the advertisement display system 100 over a network other than the network 150, such as a telephone line, provided that the voice data can be associated with the user terminal device 200.

Next, the procedure in this embodiment from the input of voice data to the advertisement display system 100 until the speech recognition result is displayed and output is described. FIG. 3 is a flowchart showing the processing procedure of the advertisement display system 100 according to this embodiment.

When voice data is input (step S100), the speech recognition unit 102 performs speech recognition processing (step S102). This can be ordinary speech recognition processing. Specifically, the speech recognition unit 102 calculates the scores of the words that are candidate speech recognition results.

Next, the score adjustment unit 106 of the output adjustment unit 104 adjusts the scores in the speech recognition result produced by the speech recognition unit 102, applying to each advertising word the score correction value stored in the advertising word storage unit 120 (step S104). The display output unit 108 then outputs, on the basis of the scores after adjustment by the score adjustment unit 106, a word with a high score as the speech recognition result (step S106).
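
Steps S100 to S106 can be strung together as in the sketch below. The `recognize` argument is a hypothetical stand-in for the speech recognition unit 102: any callable that takes voice data and returns a mapping from candidate word to total score. Returning only the single best word follows the case in which one word is output.

```python
def recognize_with_ads(recognize, voice_data, correction_values):
    """Run recognition, adjust advertising-word scores, return the top word."""
    # S102: ordinary speech recognition yields candidate totals.
    total_scores = recognize(voice_data)

    # S104: add the stored correction value to each advertising word.
    adjusted = {
        word: score + correction_values.get(word, 0)
        for word, score in total_scores.items()
    }

    # S106: output the word with the highest adjusted score.
    return max(adjusted, key=adjusted.get)
```

Passing the recognizer in as a callable keeps the sketch independent of any particular speech recognition library.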

The processing procedure of the output adjustment unit 104 in this embodiment is described with reference to FIGS. 4 and 5. Here, the display output unit 108 is assumed to display and output only the single word with the highest total score as the speech recognition result.

As shown in FIG. 4(a), suppose that the input speech is "ennichi ni ikimashita" (えんにちにいきました, "I went to the fair"). Suppose that the candidate words obtained as speech recognition results for the portion "ennichi" include "縁日" (ennichi, a temple or shrine fair day), ..., "NEC", and so on (FIG. 4(b)), and that the total scores from ordinary speech recognition processing by the speech recognition unit 102 are 30 for "縁日" and 16 for "NEC". In FIG. 4(b), the candidates are listed in descending order of total score. In ordinary speech recognition processing, "縁日", which has the highest total score, would be displayed and output as the speech recognition result.

In this embodiment, the result of the speech recognition processing by the speech recognition unit 102 is not output as it is; the result is output after the score adjustment unit 106 has adjusted the scores of the advertising words. The score adjustment unit 106 refers to the advertising word storage unit 120 and, for each advertising word, adds the score correction value to the total score produced by the speech recognition processing of the speech recognition unit 102. As shown in FIG. 2, "NEC" is stored as an advertising word in the advertising word storage unit 120 with a score correction value of 20. Adding this correction value of 20 to the total score of 16 for "NEC" shown in FIG. 4(b) gives an adjusted total score of 36 for "NEC". As a result, the total score of "NEC" exceeds the total score of 30 for "縁日" and becomes the highest (FIG. 4(c)).

The display output unit 108 therefore selects "NEC", which has the highest total score after the adjustment processing, as the speech recognition result for the input speech data "えんにち", and displays and outputs, for example, "NECに行きました" ("I went to NEC") as the speech recognition result (FIG. 4(d)).

On the other hand, as shown in FIG. 5(a), suppose that the input speech is "えいがにいきました" (eiga ni ikimashita, meaning "I went to the movies"). Suppose that, for the portion "えいが", the speech recognition candidates include "映画" (eiga, movie), ..., "NEC", and so on (FIG. 5(b)). Suppose further that the total scores produced by the normal speech recognition processing of the speech recognition unit 102 are 30 for "映画" and 8 for "NEC". In FIG. 5(b) as well, the candidates are listed in descending order of total score.

As described with reference to FIG. 4, the score adjustment unit 106 refers to the advertisement word storage unit 120 and, for each advertisement word, adds the score correction value to the total score obtained from the speech recognition processing of the speech recognition unit 102. In this case, adding the correction value 20 to the total score 8 of "NEC" shown in FIG. 5(b) gives an adjusted score of 28. In this example, the score 30 of "映画" is still higher even after the score of "NEC" has been adjusted (FIG. 5(c)). Therefore, in this example, the score adjustment unit 106 selects "映画", which has the highest score after the adjustment processing, as the speech recognition result for the input speech data "えいが", and outputs, for example, "映画に行きました" ("I went to the movies") as the speech recognition result (FIG. 5(d)).
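
The two worked examples of FIG. 4 and FIG. 5 can be sketched as follows. This is a minimal illustration in Python, not the implementation described in this document: the function name `adjust_and_select` and the dictionary layout are assumptions, and only the scores (30, 16, 8) and the correction value 20 come from the description.

```python
# Hypothetical table standing in for the advertisement word storage unit 120:
# advertisement word -> score correction value (cf. FIG. 2).
AD_CORRECTIONS = {"NEC": 20}


def adjust_and_select(total_scores, corrections=AD_CORRECTIONS):
    """Add each advertisement word's correction value to its total score,
    then return the best candidate together with the adjusted scores."""
    adjusted = {
        word: score + corrections.get(word, 0)  # non-ad words get +0
        for word, score in total_scores.items()
    }
    best = max(adjusted, key=adjusted.get)
    return best, adjusted


# FIG. 4: "NEC" rises from 16 to 36 and overtakes "縁日" (30).
print(adjust_and_select({"縁日": 30, "NEC": 16}))
# FIG. 5: "NEC" rises from 8 to 28, which is still below "映画" (30).
print(adjust_and_select({"映画": 30, "NEC": 8}))
```

Because the correction value is a fixed offset, it acts as a bound on how far below the top candidate an advertisement word may be and still be selected.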

In the present embodiment, the advertisement display system 100 makes an adjustment so that an advertisement word is selected as the speech recognition result of the speech data when the advertisement word sounds similar to the input speech data. For this reason, rather than adjusting the acoustic score, which indicates the similarity of the sound itself, the language score can be adjusted so that the advertisement word is more readily displayed and output when it sounds similar to the input speech data.

As another example, the following describes the case where the score correction value applies to the language score. In this case, the score correction value column of the advertisement word storage unit 120 stores a correction value for the language score. For an advertisement word, the score adjustment unit 106 adds the score correction value stored in the advertisement word storage unit 120 to the language score calculated by the speech recognition unit 102. When the score adjustment unit 106 has added the correction value to the language score of an advertisement word, the speech recognition unit 102 can calculate the total score from the corrected language score and the acoustic score.
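
A minimal sketch of this variant follows. The description does not state how the total score is formed from the acoustic and language scores, so the simple sum used here is purely an assumption, as are the function names and the split of each total into acoustic and language parts.

```python
def total_score(acoustic, language):
    # Assumption: the total score is the plain sum of the two scores.
    # The actual combination used by the speech recognition unit 102
    # is not specified in the description.
    return acoustic + language


def adjusted_total(word, acoustic, language, lang_corrections):
    """Add the advertisement word's correction to the language score only,
    then recompute the total score from the corrected language score."""
    corrected_language = language + lang_corrections.get(word, 0)
    return total_score(acoustic, corrected_language)


lang_corrections = {"NEC": 20}  # hypothetical language-score corrections
# Hypothetical splits chosen so the unadjusted totals are 16 and 30.
print(adjusted_total("NEC", acoustic=10, language=6, lang_corrections=lang_corrections))
print(adjusted_total("縁日", acoustic=18, language=12, lang_corrections=lang_corrections))
```

The acoustic score is left untouched, so the adjustment never changes how similar the sound itself is judged to be.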

In this case, the correction value for the language score can be determined, for example, as follows. First, an upper limit for the correction value of the total score is determined by considering how large the difference may be between the total score of the top-ranked word (the word with the highest total score for the input speech data) and the total score of the advertisement word for that speech data while a person still perceives the two as similar. The upper limit of the language score correction value that corresponds to this upper limit is then calculated.

As described above, according to the advertisement display system 100 of the present embodiment, even when the original total score of an advertisement word is lower than the total scores of other words, the advertisement word is displayed and output as the speech recognition result of the speech data as long as the similarity between the advertisement word and the speech data is within a predetermined similarity allowable range. In the present embodiment, the score correction value is set as the parameter that defines the similarity allowable range. The upper limit of the score correction value is determined by considering how large the difference may be between the total score of the top-ranked word for the input speech data and the total score of the advertisement word for that speech data while the two can still be judged similar. Because the advertisement word is displayed and output only when it is similar to the input speech data, the advertisement word can be presented to the user in a natural manner, which enhances the advertising effect.

For example, in the examples described with reference to FIG. 4 and FIG. 5, when the input speech data is "えんにち", the advertisement word "NEC" is regarded as similar to the speech data and is displayed and output as the speech recognition result, whereas when the input speech data is "えいが", the advertisement word "NEC" is not displayed and output as the speech recognition result. Because "えんにち" and "NEC" sound similar, the user may perceive the displayed "NEC" as a misrecognition and may not notice that "NEC" is an advertisement word. This allows the word "NEC" to catch the user's eye in a natural manner and to leave an impression on the user.

Although FIG. 2 shows an example in which score correction values are stored in the advertisement word storage unit 120, the advertisement word storage unit 120 may initially store, in association with each advertisement word, only information indicating the advertisement fee of that advertisement word.

This configuration is shown in FIG. 6(a). In this case, the output adjustment unit 104 can include correction value setting means that sets the score correction value, based on the information indicating the advertisement fee stored in the advertisement word storage unit 120, so that the value is higher the higher the advertisement fee rank of the advertisement word. In the present embodiment, the score adjustment unit 106 can also serve as the correction value setting means. The score adjustment unit 106 can accept input of an upper limit for the score correction value and can set the score correction value according to the advertisement fee rank of each advertisement word within the range not exceeding that upper limit.

As shown in FIG. 6(b), the score adjustment unit 106 can store the score correction values it has set in the advertisement word storage unit 120 in association with the respective advertisement words. In another example, the score adjustment unit 106 can store the score correction values, in association with the respective advertisement words, in a storage unit separate from the advertisement word storage unit 120.
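
One possible way to derive correction values from fee ranks is sketched below. The description only requires that a higher fee rank yield a higher correction value within the upper limit; the linear scaling, the integer rank encoding (a larger number meaning a higher fee rank), and the placeholder advertisement words are all assumptions made for illustration.

```python
def set_correction_values(fee_ranks, upper_limit):
    """Map each advertisement word's fee rank to a score correction value.

    fee_ranks: advertisement word -> positive integer rank, where a larger
               number means a higher advertisement fee rank (assumption).
    upper_limit: the largest correction value that may be assigned.
    """
    if not fee_ranks:
        return {}
    max_rank = max(fee_ranks.values())
    # Linear scaling (assumption): the top rank receives the full upper limit.
    return {
        word: upper_limit * rank // max_rank
        for word, rank in fee_ranks.items()
    }


# "ad_word_B" and "ad_word_C" are hypothetical placeholder advertisement words.
ranks = {"NEC": 3, "ad_word_B": 2, "ad_word_C": 1}
print(set_correction_values(ranks, upper_limit=20))
```

The resulting mapping could then be stored alongside each advertisement word, as in FIG. 6(b), or in a separate table.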

The subsequent processing can be the same as that of the advertisement display system 100 described with reference to FIGS. 1 to 5. This configuration also provides the same effects as the advertisement display system 100 described with reference to FIGS. 1 to 5.

(Second Embodiment)
FIG. 7 is a block diagram showing an example of the configuration of the advertisement display system in the present embodiment.
The present embodiment differs from the first embodiment in that the output adjustment unit 104 includes a similarity determination unit 130 in place of the score adjustment unit 106 shown in FIG. 1. The content that the advertisement word storage unit 120 stores as the parameter defining the similarity allowable range also differs from that of the first embodiment.

In the present embodiment, when an advertisement word sounds similar to the input speech data, the advertisement display system 100 displays and outputs the advertisement word as the result in place of the actual speech recognition result for the speech data. Whether the advertisement word is similar to the input speech data can therefore be determined based on the acoustic score or the total score rather than the language score.

As one example, the advertisement word storage unit 120 stores, in association with each advertisement word, an allowable score range as the parameter defining the similarity allowable range. The allowable score range takes a higher value the higher the advertisement fee rank of the advertisement word. For the speech data, the similarity determination unit 130 determines whether the difference between the acoustic score or total score of the top-ranked word (the word with the highest acoustic score or total score calculated by the speech recognition unit 102) and the acoustic score or total score of the advertisement word is within the allowable score range. If it is, the similarity determination unit 130 determines that the similarity between the advertisement word and the speech data is within the similarity allowable range. When the similarity determination unit 130 makes that determination, the display output unit 108 displays and outputs the advertisement word as the speech recognition result of the speech data. In other words, the wider the allowable score range, the more likely the advertisement word is to be judged similar to the input speech data, and the more readily it is displayed and output.

FIG. 8 is a diagram showing an example of the configuration of the advertisement word storage unit 120 in the present embodiment.
The advertisement word storage unit 120 includes an advertisement word column, an advertisement fee column, and an allowable score range column. The advertisement word column and the advertisement fee column can be the same as in the configuration shown in FIG. 2. The allowable score range column stores an allowable score range such as 20 or 10. Depending on the processing of the similarity determination unit 130, the allowable score range can be either a range for the total score or a range for the acoustic score.

For example, as described with reference to FIG. 4 in the first embodiment, suppose that the input speech is "えんにちにいきました". Suppose that, for the portion "えんにち", the speech recognition candidates include "縁日", ..., "NEC", and so on (FIG. 4(b)), and that the total scores produced by the normal speech recognition processing of the speech recognition unit 102 are 30 for "縁日" and 16 for "NEC". The difference between the total score 30 of "縁日", which is the highest, and the total score 16 of the advertisement word "NEC" is 14, which is within the allowable score range 20 of the advertisement word "NEC" shown in FIG. 8. The similarity determination unit 130 therefore determines that the similarity between the advertisement word "NEC" and the speech data is within the similarity allowable range, and the display output unit 108 outputs, for example, "NECに行きました" ("I went to NEC") as the speech recognition result.
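
The score-difference check can be sketched as below. The function names are hypothetical, and the rule for choosing among several qualifying advertisement words (taking the highest-scoring one) is an assumption that the description does not address. The second call reuses the FIG. 5 scores purely for illustration.

```python
def within_score_tolerance(scores, ad_word, tolerance):
    """True if the ad word's score is within `tolerance` of the top score."""
    top_score = max(scores.values())
    return top_score - scores[ad_word] <= tolerance


def select_output(scores, tolerances):
    """Return the word to display.

    scores: candidate word -> total (or acoustic) score.
    tolerances: advertisement word -> allowable score range (cf. FIG. 8).
    """
    qualifying = [
        word for word, tol in tolerances.items()
        if word in scores and within_score_tolerance(scores, word, tol)
    ]
    if qualifying:
        # Assumption: if several ad words qualify, take the best-scoring one.
        return max(qualifying, key=scores.get)
    return max(scores, key=scores.get)  # ordinary recognition result


tolerances = {"NEC": 20}
print(select_output({"縁日": 30, "NEC": 16}, tolerances))  # difference 14
print(select_output({"映画": 30, "NEC": 8}, tolerances))   # difference 22
```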

As another example, the advertisement word storage unit 120 stores, in association with each advertisement word, an allowable rank range as the parameter defining the similarity allowable range. The allowable rank range takes a higher value the higher the advertisement fee rank of the advertisement word. For the speech data, the similarity determination unit 130 arranges the speech recognition candidates in descending order of the acoustic score or total score calculated by the speech recognition unit 102 and determines whether the rank of the advertisement word is within the allowable rank range. If it is, the similarity determination unit 130 determines that the similarity between the advertisement word and the speech data is within the similarity allowable range. When the similarity determination unit 130 makes that determination, the display output unit 108 displays and outputs the advertisement word as the speech recognition result of the speech data. In other words, the wider the allowable rank range, the more likely the advertisement word is to be judged similar to the input speech data, and the more readily it is displayed and output.

FIG. 9 is a diagram showing an example of the configuration of the advertisement word storage unit 120 in the present embodiment.
The advertisement word storage unit 120 includes an advertisement word column, an advertisement fee column, and an allowable rank range column. The advertisement word column and the advertisement fee column can be the same as in the configuration shown in FIG. 2. The allowable rank range column stores an allowable rank range such as 6 or 3. Depending on the processing of the similarity determination unit 130, the allowable rank range can be either a range for ranks based on the total score or a range for ranks based on the acoustic score.

Here again, suppose that the input speech is "えんにちにいきました", and that, for the portion "えんにち", the speech recognition candidates include "縁日", ..., "NEC", and so on. Suppose that, when the candidates are arranged in descending order of total score, "縁日" is ranked first and "NEC" is ranked third. This rank is within the allowable rank range 6 of the advertisement word "NEC" shown in FIG. 9. The similarity determination unit 130 therefore determines that the similarity between the advertisement word "NEC" and the speech data is within the similarity allowable range, and the display output unit 108 outputs, for example, "NECに行きました" ("I went to NEC") as the speech recognition result.
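
The rank-based check can be sketched in the same way. The helper names are hypothetical. The description does not name the second-ranked candidate or give scores for this example, so "英字" with a score of 20 is a placeholder chosen only to put "NEC" in third place.

```python
def rank_of(scores, word):
    """1-based rank of `word` when candidates are sorted by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(word) + 1


def within_rank_tolerance(scores, ad_word, allowable_rank):
    """True if the ad word is a candidate and ranks within `allowable_rank`."""
    return ad_word in scores and rank_of(scores, ad_word) <= allowable_rank


# Placeholder scores: only the ranks (縁日 first, NEC third) come from the text.
scores = {"縁日": 30, "英字": 20, "NEC": 16}
print(rank_of(scores, "NEC"))
print(within_rank_tolerance(scores, "NEC", allowable_rank=6))
```

Unlike the score-difference check, this test ignores how far apart the scores are and depends only on how many candidates outrank the advertisement word.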

In the present embodiment as well, as with the score correction value described in the first embodiment, the upper limits of the allowable score range and the allowable rank range can be determined by considering how large the difference from the input speech data may be while a person still perceives the words as similar. The allowable score range and the allowable rank range can then be set, within the upper limits so determined, to take a higher value the higher the advertisement fee rank of the advertisement word. The present embodiment also provides the same effects as the first embodiment.

(Third Embodiment)
FIG. 10 is a block diagram showing an example of the configuration of the advertisement display system in the present embodiment.
The present embodiment differs from the second embodiment in that the output adjustment unit 104 includes a similar word extraction unit 132 in addition to the configuration shown in FIG. 7. The content that the advertisement word storage unit 120 stores as the parameter defining the similarity allowable range also differs from that of the second embodiment.

FIG. 11(a) is a diagram showing an example of the configuration of the advertisement word storage unit 120 in the present embodiment. As the parameter defining the similarity allowable range, the advertisement word storage unit 120 stores, in association with each advertisement word, the registrable number of similar words (words similar to that advertisement word). The registrable number takes a higher value the higher the advertisement fee rank of the advertisement word. Here, for example, the registrable number for the advertisement word "NEC" is set to 20.

The similar word extraction unit 132 extracts similar words that resemble the advertisement word, up to the registrable number. In the same manner as the speech recognition unit 102 performs speech recognition processing, the similar word extraction unit 132 can perform speech recognition processing on the assumption that speech data of the advertisement word has been input, using, for example, the acoustic model storage unit 110 and the language model storage unit 112. The similar word extraction unit 132 can then extract the words selected as results of that speech recognition processing, in descending order of total score or acoustic score and up to the registrable number, as similar words that resemble the advertisement word.

The similar word extraction unit 132 stores the extracted similar words in the advertisement word storage unit 120 in association with the respective advertisement words. An example is shown in FIG. 11(b). Here, for example, "縁日" (ennichi, festival day), "英字" (eiji, English letters), and so on are stored as similar words of the advertisement word "NEC". In the above procedure, the similar word extraction unit 132 can also request the speech recognition unit 102 to perform the speech recognition processing and receive the result from the speech recognition unit 102.
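
The extraction step can be sketched as a top-N selection over the candidates obtained by recognizing the advertisement word's own speech. The function name and the candidate scores below are hypothetical; in the described system the candidates would come from speech recognition processing rather than a hard-coded table.

```python
def extract_similar_words(candidates, ad_word, registrable_number):
    """Return up to `registrable_number` similar words for `ad_word`.

    candidates: word -> total (or acoustic) score obtained when speech data
                of the advertisement word itself is recognized.
    """
    ranked = sorted(
        (word for word in candidates if word != ad_word),  # skip the ad word
        key=candidates.get,
        reverse=True,
    )
    return ranked[:registrable_number]


# Hypothetical recognition candidates for an utterance of "NEC".
candidates = {"NEC": 40, "縁日": 30, "英字": 25, "映画": 5}
print(extract_similar_words(candidates, "NEC", registrable_number=2))
```

A larger registrable number admits lower-scoring, less similar words, which is how a higher advertisement fee rank widens the similarity allowable range in this embodiment.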

In the present embodiment, the similarity determination unit 130 determines that the similarity between the advertisement word and the speech data is within the similarity allowable range when the word selected by the speech recognition unit 102 as the speech recognition result of the speech data is the advertisement word or one of its similar words. When the similarity determination unit 130 makes that determination, the display output unit 108 displays and outputs the advertisement word as the speech recognition result of the speech data. In other words, the more similar words are registered, the more likely the advertisement word is to be judged similar to the input speech data, and the more readily it is displayed and output.
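
This determination amounts to a table lookup, sketched below. The table contents follow the "NEC" example of FIG. 11(b); the function name and the behavior when a word is registered under more than one advertisement word (the first match wins) are assumptions.

```python
# Hypothetical stand-in for the advertisement word storage unit 120:
# advertisement word -> registered similar words (cf. FIG. 11(b)).
SIMILAR_WORDS = {"NEC": {"縁日", "英字"}}


def output_word(recognized, similar_words=SIMILAR_WORDS):
    """Return the word to display for a recognized word.

    If the recognized word is an advertisement word or one of its registered
    similar words, the advertisement word is displayed instead.
    """
    for ad_word, similars in similar_words.items():
        if recognized == ad_word or recognized in similars:
            return ad_word  # assumption: the first matching ad word wins
    return recognized


print(output_word("縁日"))  # replaced by the advertisement word
print(output_word("映画"))  # not registered, so displayed unchanged
```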

In another example, similar words of each advertisement word can be registered in the advertisement word storage unit 120 in advance, in association with the advertisement word, as the parameter defining the similarity allowable range. The number of similar words registered is higher the higher the advertisement fee rank of the advertisement word. In this example, the advertisement word storage unit 120 has, for example, the configuration shown in FIG. 11(b).

In this example, the similar words can be selected by, for example, the following procedure. First, using the function of the speech recognition unit 102 or another speech recognition system, speech recognition processing is performed on speech data in which the advertisement word is uttered, and the total score or acoustic score is calculated for each of the words that are candidates for the recognition result. Next, a plurality of words are extracted, either in descending order of total score or acoustic score, or by taking the words whose total score or acoustic score is at or above a predetermined value. At this point, exactly as many words as the number of similar words to be registered can be extracted and registered in the advertisement word storage unit 120 as similar words. Alternatively, more words than that number can be extracted, after which the advertiser selects the desired words, up to that number, and those words are registered in the advertisement word storage unit 120 as similar words.

In this case, the output adjustment unit 104 of the advertisement display system 100 need not include the similar word extraction unit 132. That is, the configuration of the advertisement display system 100 can be the same as the configuration shown in FIG. 7.

The similarity determination unit 130 determines that the similarity between the advertisement word and the speech data is within the similarity allowable range when the word selected by the speech recognition unit 102 as the speech recognition result of the speech data is the advertisement word or one of its similar words. When the similarity determination unit 130 makes that determination, the display output unit 108 displays and outputs the advertisement word as the speech recognition result of the speech data.

In the present embodiment as well, as with the score correction value described in the first embodiment, the upper limit of the registrable number can be determined by considering how large the difference from the input speech data may be while a person still perceives the words as similar. The registrable number can then be set, within the upper limit so determined, to take a higher value the higher the advertisement fee rank of the advertisement word. The present embodiment also provides the same effects as the first embodiment.

(Fourth Embodiment)
FIG. 12 is a block diagram showing an example of the configuration of the advertisement display system in the present embodiment.
The advertisement display system 100 differs from the configuration described in the first embodiment in that it further includes a mode setting unit 134 in addition to the configuration shown in FIG. 1. In the present embodiment, based on the setting of the mode setting unit 134, the output adjustment unit 104 selects and executes either normal mode processing, in which the speech recognition result selected by the speech recognition unit 102 is output as is, or advertisement mode processing, in which the result adjusted by the output adjustment unit 104 is output.

For example, in the procedure described in the first embodiment, when the similarity between an advertisement word and the input speech data is within the similarity allowable range, the advertisement word is displayed and output as the speech recognition result even if its total score from the speech recognition processing is not the highest among the candidates. This reduces usability for a user who wants highly accurate speech recognition results. In the present embodiment, therefore, the advertisement display system 100 can be configured so that normal mode processing, in which the result of normal speech recognition processing is output, can also be selected.

The mode setting unit 134 can include a setting storage unit that stores which of the normal mode, in which normal speech recognition processing is performed, and the advertisement mode is set. When the speech recognition unit 102 has performed speech recognition processing, the score adjustment unit 106 determines which of the advertisement mode and the normal mode is set. When the normal mode is set, the score adjustment unit 106 causes the display output unit 108 to display and output the result of the speech recognition processing of the speech recognition unit 102 as is. When the advertisement mode is set, the score adjustment unit 106 performs the adjustment processing described above so that advertisement words are more readily displayed and output, and causes the display output unit 108 to display and output the result after the adjustment processing.

FIG. 13 is a flowchart showing the processing procedure of the advertisement display system 100 in the present embodiment.
The processing of steps S101 and S102 can be the same as the procedure described with reference to FIG. 3 in the first embodiment. The score adjustment unit 106 then determines which of the advertisement mode and the normal mode is set (step S103). When the advertisement mode is set (YES in step S103), the score adjustment unit 106 adjusts the scores of advertisement words according to their advertisement fees (step S104). The display output unit 108 outputs the word with the highest score after the adjustment by the score adjustment unit 106 as the speech recognition result (step S106).

On the other hand, when the advertisement mode is not set in step S103, that is, in the normal mode (NO in step S103), the score adjustment unit 106 does not perform the adjustment processing, and the display output unit 108 outputs the word with the highest score from the speech recognition processing of the speech recognition unit 102 as the speech recognition result (step S106).
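
The branch at step S103 can be sketched as below. The function signature and the boolean mode flag are assumptions, and the step labels in the comments refer to the flowchart of FIG. 13 as described above.

```python
def recognize(total_scores, ad_mode, corrections):
    """Return the word to output for one utterance.

    total_scores: candidate word -> total score from normal recognition.
    ad_mode: True for advertisement mode, False for normal mode (S103).
    corrections: advertisement word -> score correction value.
    """
    if ad_mode:
        # S104: adjust advertisement word scores before selection.
        total_scores = {
            word: score + corrections.get(word, 0)
            for word, score in total_scores.items()
        }
    # S106: output the highest-scoring word (adjusted or not).
    return max(total_scores, key=total_scores.get)


scores = {"縁日": 30, "NEC": 16}
print(recognize(scores, ad_mode=True, corrections={"NEC": 20}))
print(recognize(scores, ad_mode=False, corrections={"NEC": 20}))
```

Both branches end in the same selection step, so the mode only decides whether the adjustment is applied beforehand.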

Note that the advertisement display systems 100 described in the second and third embodiments can likewise be configured to include the mode setting unit 134, so that normal-mode processing, in which the result of ordinary speech recognition processing is output, can also be selected.

Although not shown in the drawings, the advertisement display system 100 can have a function of identifying a user by, for example, a user ID, and can decide whether to use the normal mode or the advertisement mode depending on the user. For example, for a user who has not yet registered and is using the speech recognition function of the speech recognition unit 102 as a trial version, the result of speech recognition processing in the advertisement mode may be displayed and output. For a user who has registered and, for example, pays a service usage fee, the result of speech recognition processing in the normal mode may be displayed and output.
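A per-user mode decision of this kind could look like the sketch below. The `User` fields and the function `mode_for_user` are hypothetical names introduced for illustration. The specification does not state how a registered user who pays no service fee is treated; this sketch assumes the advertisement mode for that case.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: str
    registered: bool
    pays_service_fee: bool


def mode_for_user(user):
    """Pick the recognition mode from the user's account status."""
    if user.registered and user.pays_service_fee:
        # Registered, paying users get unadjusted recognition results.
        return "normal"
    # Trial users (and, by assumption, registered non-paying users)
    # get advertisement-mode results.
    return "advertisement"


print(mode_for_user(User("u001", registered=False, pays_service_fee=False)))  # advertisement
print(mode_for_user(User("u002", registered=True, pays_service_fee=True)))    # normal
```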

Embodiments of the present invention have been described above with reference to the drawings, but these are merely examples of the present invention, and various configurations other than those described above can also be adopted.

As shown in FIG. 15, the advertisement display system 100 can further include a billing processing unit 140 and a billing information storage unit 142. The billing processing unit 140 counts the number of times an advertisement word is displayed and output from the display output unit 108 through the adjustment processing of the output adjustment unit 104 even though, according to the result of the speech recognition processing of the speech recognition unit 102, the advertisement word would not originally have been output; that is, even though the original total score of the advertisement word for the input voice data was lower than the total score of another word. Based on this output count and the advertisement fee set in the advertisement word storage unit 120, the billing processing unit 140 calculates the charge for each advertisement word.

The billing processing unit 140 does not charge when the voice data input to the speech recognition unit 102 was the advertisement word from the outset. The billing processing unit 140 can count the outputs in which the advertisement word was displayed as a result of the adjustment processing of the output adjustment unit 104 by, for example, comparing the result of the speech recognition processing output from the speech recognition unit 102 with the result output from the display output unit 108. The billing processing unit 140 can store the output count and the charge for each advertisement word in the billing information storage unit 142. Although the configuration shown here is one in which the advertisement display system 100 described in the first embodiment includes the billing processing unit 140 and the billing information storage unit 142, the advertisement display systems 100 of the second to fourth embodiments can also be configured to include the billing processing unit 140 and the billing information storage unit 142.
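The counting rule can be sketched as below. The class `BillingProcessor` and its methods are hypothetical names for illustration only. The specification says only that the charge is based on the output count and the advertisement fee; computing it as count multiplied by a per-output fee is an assumption of this sketch.

```python
from collections import Counter


class BillingProcessor:
    def __init__(self, fee_per_output):
        # advertisement word -> fee per counted output (assumed pricing model)
        self.fee_per_output = fee_per_output
        self.output_counts = Counter()

    def record(self, recognized_word, displayed_word):
        """Compare the unadjusted recognition result with the displayed result."""
        is_ad_word = displayed_word in self.fee_per_output
        # Count only outputs produced by the adjustment; if the user actually
        # said the advertisement word, the two results match and nothing is billed.
        if is_ad_word and displayed_word != recognized_word:
            self.output_counts[displayed_word] += 1

    def charge(self, ad_word):
        return self.output_counts[ad_word] * self.fee_per_output[ad_word]


billing = BillingProcessor({"cola": 5})
billing.record("koala", "cola")   # adjusted into the advertisement word: counted
billing.record("cola", "cola")    # spoken as the advertisement word: not counted
billing.record("koala", "koala")  # no advertisement word displayed: not counted
print(billing.output_counts["cola"], billing.charge("cola"))  # 1 5
```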

In the embodiments above, the display output unit 108 displays and outputs only the single word with the highest total score. However, the display output unit 108 can also display and output a plurality of words with high total scores, in descending order of total score, and let the user choose among them.
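As a rough illustration of this variation, the candidates can be sorted by their (possibly adjusted) total score and truncated to a fixed length. The function name `top_candidates` and the list length `n` are hypothetical; the specification does not state how many candidates are shown.

```python
def top_candidates(scores, n=3):
    """Return up to n words in descending order of total score."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:n]]


adjusted_scores = {"cola": 0.72, "koala": 0.70, "collar": 0.40, "color": 0.35}
print(top_candidates(adjusted_scores))  # ['cola', 'koala', 'collar']
```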

In another possible configuration, after the advertisement word is output from the display output unit 108 and shown on the display of the user terminal 200, the user can, for example, click the advertisement word to display the original speech recognition result, that is, the word that had the highest total score in ordinary speech recognition processing before the adjustment processing by the output adjustment unit 104.

In yet another possible configuration, after the advertisement word is output from the display output unit 108 and shown on the display of the user terminal 200, the user can, for example, click the advertisement word to access a site provided by the advertiser of that advertisement word.

The components of the advertisement display system 100 shown in FIG. 1, FIG. 7, FIG. 10, FIG. 12, and FIG. 15 represent blocks in functional units, not configurations in hardware units. Each component of the advertisement display system 100 is realized by an arbitrary combination of hardware and software, centered on the CPU and memory of an arbitrary computer, a program loaded into the memory that realizes the components shown in the figures, a storage unit such as a hard disk that stores the program, and a network connection interface. Those skilled in the art will understand that there are many variations of the implementation method and apparatus.

DESCRIPTION OF SYMBOLS
100 Advertisement display system
102 Speech recognition unit
104 Output adjustment unit
106 Score adjustment unit
108 Display output unit
110 Acoustic model storage unit
112 Language model storage unit
120 Advertisement word storage unit
130 Similarity determination unit
132 Similar word extraction unit
134 Mode setting unit
140 Billing processing unit
142 Billing information storage unit
150 Network
200 User terminal

Claims

An advertisement display system comprising:
speech recognition means for calculating, based on an acoustic model and a language model, an acoustic score, a language score, and a total score based on the acoustic score and the language score for each word that is a candidate for the speech recognition result of input voice data, and for selecting a word having a high total score as the speech recognition result; and
output adjustment means that includes advertisement word storage means for storing an advertisement word for which an advertisement fee is paid as consideration for making the word more likely to be selected as a speech recognition result, that performs adjustment so that the advertisement word is selected as the speech recognition result of the voice data when the similarity between the advertisement word and the voice data is within a predetermined similarity allowable range, and that displays and outputs the adjusted result as the speech recognition result of the voice data,
wherein the similarity allowable range is set wider as the rank of the advertisement fee of the advertisement word is higher.

The advertisement display system according to claim 1, wherein
the advertisement word storage means stores, in association with the advertisement word and as a parameter defining the similarity allowable range, a score correction value that is higher as the rank of the advertisement fee of the advertisement word is higher, and
the output adjustment means includes:
score adjustment means for performing, for the advertisement word, an adjustment that adds the score correction value to the total score or the language score calculated by the speech recognition means; and
display output means for displaying and outputting, based on the scores adjusted by the score adjustment means, a word having a high total score as the speech recognition result of the voice data.

The advertisement display system according to claim 1, wherein
the advertisement word storage means stores information indicating the advertisement fee of the advertisement word in association with the advertisement word, and
the output adjustment means includes:
correction value setting means for setting, based on the information indicating the advertisement fee of the advertisement word and as a parameter defining the similarity allowable range, a score correction value that is higher as the rank of the advertisement fee of the advertisement word is higher;
score adjustment means for performing, for the advertisement word, an adjustment that adds the score correction value to the total score or the language score calculated by the speech recognition means; and
display output means for displaying and outputting, based on the scores adjusted by the score adjustment means, a word having a high total score as the speech recognition result of the voice data.

The advertisement display system according to claim 2 or 3, wherein
the score correction value is a language score correction value, and
the score adjustment means performs, for the advertisement word, an adjustment that adds the language score correction value to the language score calculated by the speech recognition means.

The advertisement display system according to claim 1, wherein
the advertisement word storage means stores, in association with the advertisement word and as a parameter defining the similarity allowable range, a score allowable range whose value is higher as the rank of the advertisement fee of the advertisement word is higher, and
the output adjustment means includes:
similarity determination means for determining whether the difference between the acoustic score or total score of the top word, which is the word with the highest acoustic score or total score calculated by the speech recognition means for the voice data, and the acoustic score or total score of the advertisement word is within the score allowable range, and for determining, when the difference is within the score allowable range, that the similarity between the advertisement word and the voice data is within the similarity allowable range; and
display output means for displaying and outputting the advertisement word as the speech recognition result of the voice data when the similarity determination means determines that the similarity between the advertisement word and the voice data is within the similarity allowable range.

The advertisement display system according to claim 1, wherein
the advertisement word storage means stores, in association with the advertisement word and as a parameter defining the similarity allowable range, a rank allowable range whose value is higher as the rank of the advertisement fee of the advertisement word is higher, and
the output adjustment means includes:
similarity determination means for determining, when the speech recognition result candidates for the voice data are arranged in descending order of the acoustic score or total score calculated by the speech recognition means, whether the rank of the advertisement word is within the rank allowable range, and for determining, when the rank is within the rank allowable range, that the similarity between the advertisement word and the voice data is within the similarity allowable range; and
display output means for displaying and outputting the advertisement word as the speech recognition result of the voice data when the similarity determination means determines that the similarity between the advertisement word and the voice data is within the similarity allowable range.

The advertisement display system according to claim 1, wherein
the advertisement word storage means stores, in association with the advertisement word and as a parameter defining the similarity allowable range, a registrable number of similar words similar to the advertisement word, the registrable number being set larger as the rank of the advertisement fee of the advertisement word is higher, and
the output adjustment means includes:
similar word extraction means for extracting similar words similar to the advertisement word, up to the registrable number;
similarity determination means for determining that the similarity between the advertisement word and the voice data is within the similarity allowable range when the advertisement word or one of the similar words is listed as a word selected by the speech recognition means as the speech recognition result of the voice data; and
display output means for displaying and outputting the advertisement word as the speech recognition result of the voice data when the similarity determination means determines that the similarity between the advertisement word and the voice data is within the similarity allowable range.

The advertisement display system according to claim 1, wherein
the advertisement word storage means stores, in association with the advertisement word and as a parameter defining the similarity allowable range, similar words similar to the advertisement word, the number of the similar words being larger as the rank of the advertisement fee of the advertisement word is higher, and
the output adjustment means includes:
similarity determination means for determining that the similarity between the advertisement word and the voice data is within the similarity allowable range when the advertisement word or one of the similar words is listed as a word selected by the speech recognition means as the speech recognition result of the voice data; and
display output means for displaying and outputting the advertisement word as the speech recognition result of the voice data when the similarity determination means determines that the similarity between the advertisement word and the voice data is within the similarity allowable range.

The advertisement display system according to any one of claims 1 to 8, wherein
the output adjustment means selects, based on a setting, either normal mode processing, in which the speech recognition result selected by the speech recognition means is output as it is, or advertisement mode processing, in which the result adjusted by the output adjustment means is output, and displays and outputs the result of the selected processing.

The advertisement display system according to any one of claims 1 to 9,
further comprising billing processing means for counting the number of times the advertisement word is displayed and output as the speech recognition result as a result of the adjustment by the output adjustment means even though the total score of the advertisement word calculated by the speech recognition means is lower than that of another word, and for calculating the advertisement fee according to the counted number.

An advertisement display method using a computer system that includes advertisement word storage means for storing an advertisement word for which an advertisement fee is paid as consideration for making the word more likely to be selected as a speech recognition result, the method comprising:
a speech recognition step of calculating, based on an acoustic model and a language model, an acoustic score, a language score, and a total score based on the acoustic score and the language score for each word that is a candidate for the speech recognition result of input voice data, and selecting a word having a high total score as the speech recognition result; and
an output adjustment step of performing adjustment so that the advertisement word is selected as the speech recognition result of the voice data when the similarity between the advertisement word and the voice data is within a predetermined similarity allowable range, and displaying and outputting the adjusted result as the speech recognition result of the voice data,
wherein the similarity allowable range is set wider as the rank of the advertisement fee of the advertisement word is higher.

An advertisement display program for causing a computer to function as:
speech recognition means for calculating, based on an acoustic model and a language model, an acoustic score, a language score, and a total score based on the acoustic score and the language score for each word that is a candidate for the speech recognition result of input voice data, and for selecting a word having a high total score as the speech recognition result; and
output adjustment means that includes advertisement word storage means for storing an advertisement word for which an advertisement fee is paid as consideration for making the word more likely to be selected as a speech recognition result, that performs adjustment so that the advertisement word is selected as the speech recognition result of the voice data when the similarity between the advertisement word and the voice data is within a predetermined similarity allowable range, and that displays and outputs the adjusted result as the speech recognition result of the voice data,
wherein the similarity allowable range is set wider as the rank of the advertisement fee of the advertisement word is higher.