JP4854032B2 - Acoustic likelihood parallel computing device and program for speech recognition

Info

Publication number: JP4854032B2
Application number: JP2007254642A
Authority: JP
Inventors: 良一八木
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2007-09-28
Filing date: 2007-09-28
Publication date: 2012-01-11
Anticipated expiration: 2027-09-28
Also published as: JP2009086202A

Description

The present invention relates to an acoustic likelihood parallel computing device for speech recognition and a program therefor. In particular, it relates to an acoustic likelihood computing device and program that shorten the processing time of likelihood calculation by parallelizing the product-sum operations on acoustic features in speech recognition.

In speech recognition, speech digitally sampled from a microphone is converted into a time series of acoustic features, which are pattern-matched against an acoustic model for each recognition unit, such as a vowel or a consonant. An acoustic model expressed as a Hidden Markov Model (HMM) consists of several hundred states, and each state has a multidimensional normal distribution that outputs a likelihood (a log probability) for an input of acoustic features (38 dimensions).

This multidimensional normal distribution is expressed by the following expression (1).

Here, μ is the mean, σ is the variance, and χ denotes 38 mutually independent parameters (38 dimensions).

Taking the logarithm of expression (1) to obtain the likelihood gives a sum of quadratic functions, as in the following expression (2).

Here, the first term inside the braces { } of expression (2) is a constant and can be computed in advance, so evaluating expression (2) reduces to 38 product-sum operations on the second term inside the braces. This product-sum operation on the acoustic features is called the acoustic likelihood calculation.
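Expressions (1) and (2) are images in the original publication and do not survive in this text. The following is a hedged reconstruction from the surrounding definitions only, assuming the standard diagonal-covariance normal distribution. It treats σ_n as the per-dimension standard deviation, because the text calls σ the variance but later divides by σ × σ. The factor 1/2 in the second term is also an assumption.

```latex
% Reconstruction under stated assumptions; not copied from the original figures.
P(\chi) = \prod_{n=1}^{38} \frac{1}{\sqrt{2\pi\sigma_n^{2}}}
          \exp\!\left(-\frac{(\chi_n-\mu_n)^{2}}{2\sigma_n^{2}}\right) \tag{1}

\log P(\chi) = \sum_{n=1}^{38} \left\{ -\frac{1}{2}\log\!\left(2\pi\sigma_n^{2}\right)
               - \frac{(\chi_n-\mu_n)^{2}}{2\sigma_n^{2}} \right\} \tag{2}
```

In this form the first term in the braces depends only on the model parameter σ_n and can be tabulated ahead of time, while the second term must be evaluated for every incoming feature vector.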

Patent Document 1 below discloses a method of achieving high-speed execution by decomposing the likelihood calculation expression (expression (2)) into simple product-sum expressions and lowering the computational cost of each product-sum operation. Patent Document 2 below discloses a method of achieving high-speed execution by setting a threshold on the amount of change in the acoustic features and thereby limiting the number of times the acoustic likelihood calculation is executed. Patent Document 3 below discloses a method of achieving high-speed execution by tabulating the constant part of a known likelihood calculation expression to reduce the amount of computation.

Non-Patent Document 1 below discloses a method of achieving high-speed execution by preselecting the states of the acoustic model to reduce unnecessary acoustic likelihood calculations, and by applying, to the states that were not selected, approximate values calculated in advance from another acoustic model.

Patent Document 1: JP 2005-031151 A
Patent Document 2: JP 2000-250580 A
Patent Document 3: JP 2000-322081 A
Non-Patent Document 1: IPSJ Journal (Information Processing Society of Japan), IPSJ-JNL4307023, Akinobu Lee and two others, "Reduction of acoustic likelihood computation by Gaussian mixture selection using phoneme-context-independent HMMs"

When the conventional techniques described in Patent Documents 1 to 3 and Non-Patent Document 1 are applied in order to execute the acoustic likelihood calculation at high speed, the following problems arise.

(a) In the method described in Patent Document 1, as the number of acoustic model states or the number of acoustic feature dimensions grows, the degree of accumulation of the individual decomposed product-sum operations increases, which makes parallelization difficult.

(b) In the methods described in Patent Document 2 and Patent Document 3, as the number of acoustic model states or the number of acoustic feature dimensions grows, the data storage area needed for the acoustic features and constants increases, which makes implementation difficult on devices with a limited data storage area.

(c) In the method described in Non-Patent Document 1, as the number of acoustic model states or the number of acoustic feature dimensions grows, the data storage area needed for the approximate values calculated for the unselected acoustic model states increases, which makes implementation difficult on devices with a limited data storage area.

The present invention has been made in view of the above problems of the prior art, and its object is to provide an acoustic likelihood parallel computing device for speech recognition, and a program therefor, that can shorten the processing time of the acoustic likelihood calculation.

To achieve this object, the present invention comprises: an acoustic feature conversion unit that analyzes input speech and converts it into acoustic features; an acoustic likelihood parallel execution procedure registration unit that registers a parallel execution procedure for the acoustic likelihood calculation; an acoustic feature holding unit that holds the acoustic features of the input speech and the states of the acoustic model; an acoustic likelihood parallel calculation unit that executes the likelihood calculation using the acoustic features of the input speech and the states of the acoustic model, in accordance with the procedure registered by the acoustic likelihood parallel execution procedure registration unit; an acoustic likelihood parallel calculation control unit that causes the acoustic likelihood parallel calculation unit to output the likelihood calculation result; and an acoustic likelihood calculation result acquisition unit that acquires the acoustic likelihood calculation result from the acoustic likelihood parallel calculation unit. The invention is characterized in that the acoustic likelihood parallel calculation unit uses a GPGPU (General Purpose Graphic Processing Unit), which generally performs graphics processing, and executes the acoustic likelihood calculation by cascading parallel operations of the pixel shaders of the GPGPU. For each of the n-dimensional (n is a positive integer greater than 1) μ and χ held in a data storage area, the pixel shader executes in a batch the operation (χ − μ) of the second term of the following expression (2) to calculate an n-dimensional result z, and then, for z and σ, executes in a batch the operation (z × z) / (σ × σ) of the second term of expression (2) to calculate an n-dimensional result P′.

Here, μ is the mean, σ is the variance, and χ denotes n mutually independent parameters (n dimensions).

Another feature is that the acoustic likelihood parallel calculation unit uses a GPGPU (General Purpose Graphic Processing Unit), which generally performs graphics processing, and executes the acoustic likelihood calculation by cascading the respective parallel operations of the pixel shaders of the GPGPU.

A further feature is the provision of an acoustic likelihood parallel calculation program that executes the acoustic likelihood calculation by cascading parallel operations of the pixel shaders of a GPGPU. The program causes a computer to realize, with the pixel shader, a function of executing in a batch the operation (χ − μ) of the second term of the following expression (2) for each of the n-dimensional (n is a positive integer greater than 1) μ and χ held in a data storage area to calculate an n-dimensional result z, and a function of executing in a batch the operation (z × z) / (σ × σ) of the second term of expression (2) for z and σ to calculate an n-dimensional result P′.

Here, μ is the mean, σ is the variance, and χ denotes n mutually independent parameters (n dimensions).

According to the acoustic likelihood parallel computing device of the present invention, a variable data storage area that does not depend on the number of acoustic model states or the number of acoustic feature dimensions is used, so the conventional likelihood calculation method can be executed in parallel as it is. Expression (2) can therefore be evaluated in a short time.

In addition, because the variable data storage area accommodates the number of acoustic model states and the number of acoustic feature dimensions, the frequency with which the data storage area is used before and after the acoustic likelihood parallel calculation, and the total amount of data stored, can be kept low.

First, the principle of the present invention is described. In a typical PC or mobile phone terminal, the processing capability of the CPU is limited. The acoustic likelihood calculation, which accounts for a particularly large share of the processing in speech recognition, is therefore offloaded to a GPGPU (General Purpose Graphic Processing Unit), which normally performs graphics processing. This allows the enormous volume of computation to be carried out with high efficiency and high accuracy, so that speech recognition with a large-vocabulary dictionary can run in real time. Specifically, the matrix-parallel operations of vertex processing (vertex shader) and texture processing (pixel shader) used in 3D graphics are treated as vector processing, several such parallel operations are cascaded, and the parallel and sequential steps of the acoustic likelihood calculation are mapped efficiently onto this arrangement to increase speed. According to the invention, the large amount of memory previously required on the CPU side can also be absorbed on the GPGPU side.

The present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the hardware configuration of the acoustic likelihood parallel computing device.

As shown in the figure, the hardware of the acoustic likelihood parallel computing device consists of a CPU 1, a memory 2, a GPGPU (General Purpose Graphic Processing Unit) 3, a display 4, and a bus 5 connecting them. The GPGPU 3 consists of a GPU 31 and a GPU memory 32 having a variable data storage area. As the figure shows, the GPU 31 can access the GPU memory 32 directly to read and write data, but the CPU 1 cannot. The CPU 1 therefore accesses the GPU memory 32 via the GPU 31.

FIG. 2 is a functional block diagram of the acoustic likelihood parallel computing device. The device consists of: an acoustic feature conversion unit 11 that analyzes input speech and converts it into acoustic features; an acoustic likelihood parallel execution procedure registration unit 12 that registers the parallel execution procedure for the acoustic likelihood calculation; an acoustic feature holding unit 14 that holds the acoustic features of the input speech and the states of the acoustic model; an acoustic likelihood parallel calculation unit 15 that executes the likelihood calculation on the acoustic features and acoustic model states held in the acoustic feature holding unit 14, in accordance with the procedure registered in advance in the acoustic likelihood parallel execution procedure registration unit 12; and an acoustic likelihood parallel calculation control unit 13 that causes the acoustic likelihood parallel calculation unit 15 to output the likelihood calculation result.

Here, the acoustic feature conversion unit 11 and the acoustic likelihood parallel calculation control unit 13 are functions performed by the CPU 1 of FIG. 1, and the acoustic likelihood parallel execution procedure registration unit 12 corresponds to the memory 2. The acoustic feature holding unit 14 corresponds to the GPU memory 32, and the acoustic likelihood parallel calculation unit 15 is a function performed by the GPU 31.

Next, the operation of the acoustic likelihood parallel computing device is described. The GPGPU 3 was originally designed to perform graphics processing at high speed, for example rotating the pixels of an image or changing their colors. To use it as a computation accelerator for the CPU, the size of the data areas for the computation, the definitions of the variables, and the processing procedure (program) must first be set or loaded into the GPU 31 from the CPU 1 side.

Accordingly, as shown in FIG. 3(a), the CPU 1 prepares, in the GPU memory 32 via the GPU 31, storage areas (textures) for the n independent parameters χ, the variance σ of the speech features, and the mean μ, and sets the size of each storage area to, for example, 512 × 512 (pixels). The CPU 1 also loads into the GPU 31 the definitions of the variables and a processing procedure such as that shown in FIG. 3(b).

That is, in FIG. 2, the acoustic likelihood parallel execution procedure registration unit 12 performs the following initial settings on the acoustic likelihood parallel calculation unit 15 as the initial setting phase.
<Initial setting phase>

・The speech features are n-dimensional (n values).

・The number of acoustic model states is m.

・Let μ be the n means, σ the n variances, and χ the n independent parameters.

・Let p be the likelihood calculation result obtained from μ, σ, and χ (the likelihood of each acoustic model state).

・The size of the data storage area for holding each of μ, σ, χ, and p is set to l = m × n, and the data storage areas are reserved.
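The initial settings above can be pictured with a minimal Python sketch. It assumes a flat layout of l = m × n floats per buffer, and it checks that size against the 512 × 512 example texture size mentioned for the GPU memory. The function name `allocate_buffers`, the use of `array`, and the capacity check itself are illustrative assumptions and are not described in the specification.

```python
from array import array


def allocate_buffers(m: int, n: int, texture_side: int = 512) -> dict:
    """Reserve one flat buffer of l = m * n floats for each of mu, sigma, chi, p.

    texture_side models the example 512 x 512 storage area (an assumption
    about how the limit would be checked).
    """
    l = m * n
    capacity = texture_side * texture_side
    if l > capacity:
        raise ValueError(f"need {l} elements but the area holds only {capacity}")
    # One zero-initialised buffer per quantity named in the initial setting phase.
    return {name: array("f", [0.0]) * l for name in ("mu", "sigma", "chi", "p")}


# 546 states x 38 dimensions, the figures used in the embodiment.
buffers = allocate_buffers(m=546, n=38)
print(len(buffers["mu"]))  # 20748
```

With these figures a single state-by-dimension block occupies 20,748 of the 262,144 elements in a 512 × 512 area, which leaves room for batching several blocks together.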

When the initial setting phase is finished, processing moves to the parallel computation phase. In this phase, the second term on the right-hand side of expression (2), that is, the 38 product-sum operations, is computed. Specifically, the acoustic feature conversion unit 11 of FIG. 2 converts the input speech into a time series of 38 (digital) acoustic features, and these 38 acoustic features are stored in the acoustic feature holding unit 14 via the acoustic likelihood parallel calculation unit 15. Preferably, with 25 ms of speech taken as one frame, the acoustic features for 38 acoustic features × 546 acoustic model states × an arbitrary number of combinations of acoustic model states × 1 frame are stored in the acoustic feature holding unit 14 in a single batch, which further improves processing speed.

Since the 38-dimensional (in general, n-dimensional) speech feature means μ_n, variances σ_n, and independent parameters χ_n have been obtained as the acoustic features, the acoustic likelihood parallel calculation unit 15 uses them to perform the following parallel computation phase processing.
<Parallel processing phase>

・For each μ_n and χ_n held in the data storage area (acoustic feature holding unit 14), the operation (χ_n − μ_n) of the second term of expression (2) is executed in a batch, and the result is taken as z_n.

・Next, for z_n and σ_n, the operation (z_n × z_n) / (σ_n × σ_n) of the second term of expression (2) is executed in a batch. The result is taken as P_n and is set in the data storage area P′ (acoustic feature holding unit 14) that holds P_n.

FIG. 4 is a conceptual diagram of the parallel processing phase. This phase is carried out by cascading the respective parallel operations of the pixel shaders of the GPGPU. For each of the 38-dimensional μ and χ held in the data storage area, the pixel shader executes in a batch the operation (χ − μ) of the second term of expression (2) to calculate a 38-dimensional result z. The pixel shader then executes in a batch the operation (z × z) / (σ × σ) of the second term of expression (2) for z and σ to calculate a 38-dimensional result P′.
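The two cascaded passes can be sketched on the CPU as follows. This is only an illustration of the data flow: plain Python list comprehensions stand in for the pixel shader, and every element is computed independently of the others, which is the property that lets a GPU process them in parallel. The function names and the three-element sample data are hypothetical.

```python
def pass_subtract(chi, mu):
    """First pass: z = chi - mu, element by element."""
    return [c - u for c, u in zip(chi, mu)]


def pass_scaled_square(z, sigma):
    """Second pass: P' = (z * z) / (sigma * sigma), element by element."""
    return [(v * v) / (s * s) for v, s in zip(z, sigma)]


# Hypothetical 3-dimensional example (the embodiment uses 38 dimensions).
chi = [1.0, 2.0, 3.0]
mu = [0.0, 2.0, 5.0]
sigma = [1.0, 2.0, 0.5]

z = pass_subtract(chi, mu)              # output of pass 1 feeds pass 2
p_prime = pass_scaled_square(z, sigma)
print(z)        # [1.0, 0.0, -2.0]
print(p_prime)  # [1.0, 0.0, 16.0]
```

Feeding the output buffer of the first pass into the second pass is what "cascading" amounts to in this sketch: the intermediate result z never needs to leave the buffer that holds it.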

When the parallel processing phase ends, processing moves to the end phase. In the end phase, the acoustic likelihood parallel calculation control unit 13 accesses the acoustic feature holding unit 14 via the acoustic likelihood parallel calculation unit 15, reads out the n values of p, and performs the following end phase processing.
<End phase>

・A summation (Σ operation) is executed over the n-dimensional (n) values P′_n in the data storage area (acoustic feature holding unit 14) to calculate the total likelihood of the acoustic features.

That is, the acoustic likelihood parallel calculation control unit 13 reads the 38 values of P′ from the storage area for P′ and evaluates the expression shown in FIG. 5 (the acoustic likelihood calculation). Here, C is the constant part.
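The end phase can be sketched as below. FIG. 5 is not reproduced in this text, so the sign and scaling are assumptions: the sketch takes the total log likelihood to be C − ½ Σ P′_n, with C computed from the standard diagonal Gaussian log density and σ treated as a standard deviation. The function names `constant_term` and `total_log_likelihood` are hypothetical.

```python
import math


def constant_term(sigma):
    """Constant part C (assumed form): -1/2 * sum(log(2*pi*sigma_n^2)).

    Depends only on the model, so it can be computed once ahead of time.
    """
    return -0.5 * sum(math.log(2.0 * math.pi * s * s) for s in sigma)


def total_log_likelihood(p_prime, c):
    """Sigma operation on the CPU: add up P'_n and combine with C (assumed form)."""
    return c - 0.5 * sum(p_prime)


# One-dimensional check against the standard normal density at x = 1.
sigma = [1.0]
p_prime = [1.0]  # (1 - 0)^2 / 1^2
ll = total_log_likelihood(p_prime, constant_term(sigma))
print(round(ll, 6))  # -1.418939
```

Leaving the final sum on the CPU keeps the GPU passes purely element-wise, at the cost of reading n values back per state.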

As described above, according to this embodiment, the n features of the input speech, or the features for n × 1 frame, are stored in the GPU memory 32 in a single batch, and the parallel processing described for the parallel processing phase is performed, so that the operation (z_n × z_n) / (σ_n × σ_n) of the second term of expression (2) can be executed in a batch. The conventional likelihood calculation method can therefore be executed in parallel as it is, and expression (2) can be evaluated in a short time.

The present invention is not limited to the embodiment described above. Speed can be improved further by raising the number of simultaneous vector operations of the pixel shader so that the operations for a plurality of acoustic models (38 acoustic features × 546 acoustic model states × an arbitrary number of superpositions of acoustic model states × M, where M is a positive integer greater than 1) are executed simultaneously. For example, when speech is processed in parallel frame by frame, M is the maximum number of speech frames processed in parallel in that embodiment.
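One way to picture the M-fold batching is to lay M frames out in a single flat buffer so that one element-wise pass covers every frame, state, and dimension. The sketch below assumes a frame-major, then state-major layout and fuses the subtraction and scaling steps into one expression for brevity. The layout, the function name `batch_terms`, and the sample data are illustrative assumptions.

```python
def batch_terms(frames, mu, sigma):
    """Compute (chi - mu)^2 / sigma^2 for M frames x m states x n dims at once.

    frames: list of M feature vectors, each of length n
    mu, sigma: flat lists of length m * n (state-major)
    Returns a flat list of length M * m * n (frame-major, then state-major).
    """
    n = len(frames[0])
    m = len(mu) // n
    # Repeat each frame once per state so it lines up with the model buffers.
    chi = [x for frame in frames for _ in range(m) for x in frame]
    # Tile the model parameters once per frame.
    mu_tiled = mu * len(frames)
    sigma_tiled = sigma * len(frames)
    return [((c - u) * (c - u)) / (s * s)
            for c, u, s in zip(chi, mu_tiled, sigma_tiled)]


# Hypothetical toy sizes: M = 2 frames, m = 2 states, n = 2 dimensions.
frames = [[1.0, 2.0], [3.0, 4.0]]
mu = [0.0, 0.0, 1.0, 1.0]
sigma = [1.0, 1.0, 1.0, 1.0]
terms = batch_terms(frames, mu, sigma)
print(terms)  # [1.0, 4.0, 0.0, 1.0, 9.0, 16.0, 4.0, 9.0]
```

Each run of n consecutive outputs belongs to one (frame, state) pair, so the per-state sums can be taken over fixed-size slices afterwards.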

FIG. 1 is a block diagram showing the hardware configuration of an embodiment of the present invention.
FIG. 2 is a block diagram showing the functions of an embodiment of the present invention.
FIG. 3 is a diagram explaining the concept of the initial setting phase.
FIG. 4 is a diagram explaining the concept of the parallel processing phase.
FIG. 5 is a diagram explaining the concept of the end phase.

Explanation of symbols

1: CPU, 2: memory, 3: GPGPU, 4: display, 5: bus, 31: GPU, 32: GPU memory, 11: acoustic feature conversion unit, 12: acoustic likelihood parallel execution procedure registration unit, 13: acoustic likelihood parallel calculation control unit, 14: acoustic feature holding unit, 15: acoustic likelihood parallel calculation unit.

Claims

An acoustic likelihood parallel computing device comprising:
an acoustic feature conversion unit that analyzes input speech and converts it into acoustic features;
an acoustic likelihood parallel execution procedure registration unit that registers a parallel execution procedure for acoustic likelihood calculation;
an acoustic feature holding unit that holds the acoustic features of the input speech and the states of an acoustic model;
an acoustic likelihood parallel calculation unit that executes likelihood calculation using the acoustic features of the input speech and the states of the acoustic model, in accordance with the acoustic likelihood parallel execution procedure registered by the acoustic likelihood parallel execution procedure registration unit;
an acoustic likelihood parallel calculation control unit that causes the acoustic likelihood parallel calculation unit to output a likelihood calculation result; and
an acoustic likelihood calculation result acquisition unit that acquires the acoustic likelihood calculation result from the acoustic likelihood parallel calculation unit,
wherein the acoustic likelihood parallel calculation unit uses a GPGPU (General Purpose Graphic Processing Unit), which generally performs graphics processing, and executes the acoustic likelihood calculation by cascading parallel operations of pixel shaders of the GPGPU, and
wherein the pixel shader, for each of the n-dimensional (n is a positive integer greater than 1) μ and χ held in a data storage area, executes in a batch the operation (χ − μ) of the second term of the following expression (2) to calculate an n-dimensional result z, and subsequently, for z and σ, executes in a batch the operation (z × z) / (σ × σ) of the second term of expression (2) to calculate an n-dimensional result P′.

The acoustic likelihood parallel computing device according to claim 1,
wherein the number of simultaneous vector operations of the pixel shader is set to (n dimensions × M) (M is a positive integer greater than 1), and processing speed is improved by simultaneously executing the operations for a plurality (M) of acoustic models.

The acoustic likelihood parallel computing device according to claim 1 or 2,
wherein the processing that adds the calculation results of the individual dimensions (Σ operation) is executed by a CPU.

The acoustic likelihood parallel computing device according to any one of claims 1 to 3,
wherein the n dimensions are 38 dimensions.

An acoustic likelihood parallel calculation program for executing acoustic likelihood calculation by cascading parallel operations of pixel shaders of a GPGPU,
the program causing a computer to realize, with the pixel shader:
a function of executing in a batch, for each of the n-dimensional (n is a positive integer greater than 1) μ and χ held in a data storage area, the operation (χ − μ) of the second term of the following expression (2) to calculate an n-dimensional result z; and
a function of executing in a batch, for z and σ, the operation (z × z) / (σ × σ) of the second term of expression (2) to calculate an n-dimensional result P′.

The acoustic likelihood parallel calculation program according to claim 5,
wherein the n dimensions are 38 dimensions.