JP4854032B2 - Acoustic likelihood parallel computing device and program for speech recognition - Google Patents

Publication number
JP4854032B2
Authority
JP
Japan
Prior art keywords
acoustic
parallel
calculation
likelihood
acoustic likelihood
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP2007254642A
Other languages
Japanese (ja)
Other versions
JP2009086202A (en)
Inventor
良一 八木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
KDDI Corp
Original Assignee
KDDI Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by KDDI Corp
Priority to JP2007254642A
Publication of JP2009086202A
Application granted
Publication of JP4854032B2
Legal status: Expired - Fee Related
Anticipated expiration

Description

The present invention relates to an acoustic likelihood parallel computing device for speech recognition and a program therefor, and in particular to an acoustic likelihood computing device and program that shorten the processing time of the likelihood calculation by parallelizing the product-sum operations on acoustic features in speech recognition.

In speech recognition, speech digitally sampled from a microphone is converted into a time series of acoustic features and pattern-matched against an acoustic model for each recognition unit, such as a vowel or a consonant. An acoustic model expressed as a hidden Markov model (HMM) consists of several hundred states, and each state has a multidimensional normal distribution that outputs a likelihood (a log probability) for an input of acoustic features (38 dimensions).

This multidimensional normal distribution is expressed by the following equation (1).

[Equation (1): shown as an image in the original publication]

Here, μ is the mean, σ is the variance, and χ denotes the 38 mutually independent parameters (one per dimension).
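The image for equation (1) is not reproduced in this text. A plausible reconstruction, assuming the 38 dimensions are independent (a diagonal-covariance Gaussian) and reading σ_i as a per-dimension standard deviation so that σ_i × σ_i matches the (σ × σ) factor used later in the description, would be the following. The notation of the original figure may differ.

```latex
P(\chi) = \prod_{i=1}^{38} \frac{1}{\sqrt{2\pi\sigma_i^{2}}}
          \exp\!\left( -\frac{(\chi_i - \mu_i)^{2}}{2\sigma_i^{2}} \right)
```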

Taking the logarithm of equation (1) to obtain the likelihood yields a sum of quadratic functions, as in the following equation (2).

[Equation (2): shown as an image in the original publication]
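Equation (2) is likewise only available as an image. Under the assumption of independent dimensions, with σ_i read as a standard deviation, the logarithm of a diagonal Gaussian has the form below. It matches the description of equation (2) in that the braces hold a constant first term and a quadratic second term, but it is a reconstruction rather than a copy of the original figure.

```latex
\log P(\chi) = -\frac{1}{2} \sum_{i=1}^{38}
  \left\{ \log\!\left( 2\pi\sigma_i^{2} \right)
        + \frac{(\chi_i - \mu_i)^{2}}{\sigma_i^{2}} \right\}
```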

Here, the first term inside the braces { } of equation (2) is a constant and can be computed in advance, so evaluating equation (2) reduces to 38 product-sum operations on the second term inside the braces. This product-sum operation on the acoustic features is called the acoustic likelihood calculation.
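The split between a precomputed constant and a per-frame product-sum can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, σ is treated as a standard deviation, and the −1/2 factor follows the usual log of a diagonal Gaussian, which the text does not state explicitly.

```python
import math

def precompute_constant(sigma):
    """Constant term: depends only on the model, so it is computed once."""
    return -0.5 * sum(math.log(2.0 * math.pi * s * s) for s in sigma)

def log_likelihood(x, mu, sigma, constant):
    """Constant plus the per-dimension quadratic terms (the product-sum part)."""
    quad = sum(((xi - mi) ** 2) / (si * si) for xi, mi, si in zip(x, mu, sigma))
    return constant - 0.5 * quad

# Toy 3-dimensional state (the patent uses 38 dimensions).
mu = [0.0, 1.0, -1.0]
sigma = [1.0, 2.0, 0.5]
c = precompute_constant(sigma)
at_mean = log_likelihood(mu, mu, sigma, c)
off_mean = log_likelihood([1.0, 1.0, -1.0], mu, sigma, c)
```

At the mean the quadratic part vanishes and only the constant remains; any other input adds a negative contribution from the product-sum.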

Patent Document 1 below discloses a method that speeds up execution by decomposing the likelihood calculation formula (equation (2)) into simple product-sum expressions and lowering the cost of each product-sum operation. Patent Document 2 below discloses a method that speeds up execution by setting a threshold on the amount of change in the acoustic features and thereby reducing the number of times the acoustic likelihood calculation is executed. Patent Document 3 below discloses a method that speeds up execution by tabulating the constant part of a known likelihood calculation formula to reduce the amount of computation.

Non-Patent Document 1 below discloses a method that speeds up execution by preselecting acoustic-model states to avoid unnecessary acoustic likelihood calculations, and by applying to the unselected states approximate values computed in advance from another acoustic model.

Patent Document 1: JP 2005-031151 A
Patent Document 2: JP 2000-250580 A
Patent Document 3: JP 2000-322081 A
Non-Patent Document 1: Lee Akinobu (李晃伸) and two others, "Reduction of acoustic likelihood computation by Gaussian mixture selection using phoneme-context-independent HMMs," IPSJ Journal (Information Processing Society of Japan), IPSJ-JNL4307023

Applying the conventional techniques of Patent Documents 1 to 3 and Non-Patent Document 1 to speed up the acoustic likelihood calculation raises the following problems.

(a) With the method of Patent Document 1, as the number of acoustic-model states or the number of acoustic feature dimensions grows, the degree of accumulation across the decomposed product-sum operations increases, which makes parallelization difficult.

(b) With the method of Patent Document 2 or Patent Document 3, as the number of acoustic-model states or the number of acoustic feature dimensions grows, the data storage area needed to hold the acoustic features and constants increases, which makes implementation difficult on devices with limited data storage.

(c) With the method of Non-Patent Document 1, as the number of acoustic-model states or the number of acoustic feature dimensions grows, the data storage area needed to hold the approximate values computed for the unselected acoustic-model states increases, which makes implementation difficult on devices with limited data storage.

The present invention has been made in view of the above problems of the prior art, and its object is to provide an acoustic likelihood parallel computing device for speech recognition, and a program therefor, that can shorten the processing time of the acoustic likelihood calculation.

To achieve this object, the present invention comprises: an acoustic feature conversion unit that analyzes input speech and converts it into acoustic features; an acoustic likelihood parallel execution procedure registration unit that registers a parallel execution procedure for the acoustic likelihood calculation; an acoustic feature holding unit that holds the acoustic features of the input speech and the states of the acoustic model; an acoustic likelihood parallel calculation unit that executes the likelihood calculation using the acoustic features of the input speech and the states of the acoustic model, in accordance with the procedure registered by the acoustic likelihood parallel execution procedure registration unit; an acoustic likelihood parallel calculation control unit that causes the acoustic likelihood parallel calculation unit to output the likelihood calculation result; and an acoustic likelihood calculation result acquisition unit that acquires the acoustic likelihood calculation result from the acoustic likelihood parallel calculation unit. The acoustic likelihood parallel calculation unit uses a GPGPU (General Purpose Graphic Processing Unit), which ordinarily performs graphics processing, and executes the acoustic likelihood calculation by cascading the parallel operations of the GPGPU's pixel shaders. The invention is characterized in that the pixel shader, for each μ and χ of the n dimensions (n is a positive integer greater than 1) reserved in the data storage area, executes the (χ − μ) operation of the second term of equation (2) below in a single batch to compute the n-dimensional result z, and then, for z and σ, executes the (z × z)/(σ × σ) operation of the second term of equation (2) in a single batch to compute the n-dimensional result P'.

[Equation (2): shown as an image in the original publication]

Here, μ is the mean, σ is the variance, and χ denotes the n mutually independent parameters (one per dimension).

Another feature is that the acoustic likelihood parallel calculation unit uses a GPGPU (General Purpose Graphic Processing Unit), which ordinarily performs graphics processing, and executes the acoustic likelihood calculation by cascading the parallel operations of each of the GPGPU's pixel shaders.

Yet another feature is the provision of an acoustic likelihood parallel calculation program that executes the acoustic likelihood calculation by cascading the parallel operations of GPGPU pixel shaders. The program causes a computer to realize, in the pixel shader, a function of executing, for each μ and χ of the n dimensions (n is a positive integer greater than 1) reserved in the data storage area, the (χ − μ) operation of the second term of equation (2) below in a single batch to compute the n-dimensional result z, and a function of executing, for z and σ, the (z × z)/(σ × σ) operation of the second term of equation (2) in a single batch to compute the n-dimensional result P'.

[Equation (2): shown as an image in the original publication]

Here, μ is the mean, σ is the variance, and χ denotes the n mutually independent parameters (one per dimension).

According to the acoustic likelihood parallel computing device of the present invention, using a variable data storage area that does not depend on the number of acoustic-model states or the number of acoustic feature dimensions allows the conventional likelihood calculation method to be executed in parallel as is. Equation (2) can therefore be evaluated in a short time.

In addition, holding the number of acoustic-model states and the number of acoustic feature dimensions in the variable data storage area keeps both the frequency of use of the data storage area and the total amount of stored data low before and after the parallel acoustic likelihood calculation.

First, the principle of the present invention is described. In a typical PC or mobile phone terminal the CPU's processing capacity is limited. The acoustic likelihood calculation, which accounts for a particularly large share of the processing in speech recognition, is therefore offloaded to a GPGPU (General Purpose Graphic Processing Unit) that ordinarily performs graphics processing. This allows the very large volume of arithmetic to be carried out with high efficiency and high precision, so that speech recognition with a large-vocabulary dictionary can run in real time. Specifically, the matrix-parallel operations of vertex processing (vertex shaders) and texture processing (pixel shaders) used in 3D graphics are treated as vector processing, a cascade of multiple parallel processing stages is constructed, and the parallel and sequential steps of the acoustic likelihood calculation are mapped onto it efficiently to increase speed. According to the invention, the large amount of memory previously required on the CPU side can also be absorbed on the GPGPU side.

The present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the hardware configuration of the acoustic likelihood parallel computing device.

As shown, the hardware of the acoustic likelihood parallel computing device consists of a CPU 1, a memory 2, a GPGPU (General Purpose Graphic Processing Unit) 3, a display 4, and a bus 5 connecting them. The GPGPU 3 consists of a GPU 31 and a GPU memory 32 that has a variable data storage area. As the figure shows, the GPU 31 can access the GPU memory 32 directly to read and write data, but the CPU 1 cannot. The CPU 1 therefore accesses the GPU memory 32 through the GPU 31.
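The access path described above, where the CPU reaches GPU memory only through the GPU, can be pictured with a toy Python model. The class and method names are hypothetical and no real graphics API is involved; the sketch only illustrates that all reads and writes go through the GPU object.

```python
class Gpu:
    """Toy stand-in for GPU 31 together with its private GPU memory 32."""

    def __init__(self):
        self._memory = {}  # never touched directly by the CPU-side code

    def upload(self, name, values):
        """CPU-side write: the data is copied into GPU memory via the GPU."""
        self._memory[name] = list(values)

    def download(self, name):
        """CPU-side read: the data is copied back out via the GPU."""
        return list(self._memory[name])

gpu = Gpu()
gpu.upload("mu", [0.0, 1.0])    # CPU 1 writes through the GPU
roundtrip = gpu.download("mu")  # and reads back the same way
```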

FIG. 2 is a functional block diagram of the acoustic likelihood parallel computing device. The device comprises: an acoustic feature conversion unit 11 that analyzes input speech and converts it into acoustic features; an acoustic likelihood parallel execution procedure registration unit 12 that registers the parallel execution procedure for the acoustic likelihood calculation; an acoustic feature holding unit 14 that holds the acoustic features of the input speech and the states of the acoustic model; an acoustic likelihood parallel calculation unit 15 that executes the likelihood calculation on the acoustic features and acoustic-model states held in the acoustic feature holding unit 14, following the procedure registered in advance in the acoustic likelihood parallel execution procedure registration unit 12; and an acoustic likelihood parallel calculation control unit 13 that causes the acoustic likelihood parallel calculation unit 15 to output the likelihood calculation result.

Here, the acoustic feature conversion unit 11 and the acoustic likelihood parallel calculation control unit 13 are functions performed by the CPU 1 of FIG. 1, and the acoustic likelihood parallel execution procedure registration unit 12 corresponds to the memory 2. The acoustic feature holding unit 14 corresponds to the GPU memory 32, and the acoustic likelihood parallel calculation unit 15 is a function performed by the GPU 31.

Next, the operation of the acoustic likelihood parallel computing device is described. The GPGPU 3 was originally designed to perform graphics processing at high speed, for example rotating the pixels of an image or changing their colors. To use it as an arithmetic accelerator for the CPU, the CPU 1 must first set or load into the GPU 31 the size of the data areas used for the computation, the definitions of the operation variables, and the processing procedure (program).

Accordingly, as shown in FIG. 3(a), the CPU 1 prepares in the GPU memory 32, via the GPU 31, storage areas (textures) for the n independent parameters χ, the variance σ of the speech features, and the mean μ, and sets the size of each storage area to, for example, 512 × 512 pixels. The CPU 1 also loads into the GPU 31 the definitions of the operation variables and the processing procedure shown in FIG. 3(b).
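To give a feel for how model parameters might occupy a 512 × 512 texture, the sketch below maps a (state, dimension) pair to a texel. The row-major packing and the helper name `texel_for` are assumptions made for illustration; the text does not specify a layout. The figures of 38 dimensions and 546 states are the ones quoted elsewhere in the description.

```python
TEX_W = TEX_H = 512  # texture size given as an example in the text

def texel_for(state, dim, n_dims):
    """Map (state, dimension) to an (x, y) texel using row-major packing."""
    flat = state * n_dims + dim
    if flat >= TEX_W * TEX_H:
        raise ValueError("parameters do not fit in one texture")
    return flat % TEX_W, flat // TEX_W

n_dims, n_states = 38, 546
used = n_dims * n_states  # number of texels occupied by one parameter set
first = texel_for(0, 0, n_dims)
last = texel_for(n_states - 1, n_dims - 1, n_dims)
```

With these figures one parameter set occupies 20,748 of the 262,144 available texels, so a single 512 × 512 texture has ample room.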

In other words, in FIG. 2 the acoustic likelihood parallel execution procedure registration unit 12 performs the following initial settings on the acoustic likelihood parallel calculation unit 15 as the initial setting phase.
<Initial setting phase>

・The speech features have n dimensions (n values).

・The acoustic model has m states.

・Let μ be the mean and σ the variance over the n dimensions, and let χ be the n independent parameters.

・Let p be the likelihood calculation result obtained from μ, σ, and χ (the likelihood of each acoustic-model state).

・Set the size of the data storage area holding each of μ, σ, χ, and p to l = m × n, and reserve the data storage areas.
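The initial setting phase above can be sketched as a buffer allocation step. The function names are hypothetical, plain Python lists stand in for GPU textures, and replicating one frame's features next to every state's parameters is an assumed layout chosen so that all four buffers share the size l = m × n.

```python
def init_buffers(m_states, n_dims):
    """Reserve mu, sigma, chi and p buffers, each of size l = m * n."""
    size = m_states * n_dims
    return {name: [0.0] * size for name in ("mu", "sigma", "chi", "p")}

def load_frame(buffers, frame, m_states):
    """Copy one frame's n features alongside every state (assumed layout)."""
    buffers["chi"][:] = list(frame) * m_states

buffers = init_buffers(m_states=3, n_dims=2)
load_frame(buffers, [1.0, 2.0], m_states=3)
```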

When the initial setting phase ends, processing moves to the parallel operation phase. This phase performs the operation of the second term on the right-hand side of equation (2), that is, the 38 product-sum operations. Specifically, the acoustic feature conversion unit 11 of FIG. 2 converts the input speech into a time series of 38 (digital) acoustic features, which are stored in the acoustic feature holding unit 14 via the acoustic likelihood parallel calculation unit 15. Preferably, when 25 ms of speech is taken as one frame, the processing speed can be improved further by storing in the acoustic feature holding unit 14, in a single batch, the acoustic features for (38 acoustic features) × (546 acoustic-model states) × (an arbitrary number of combinations of acoustic-model states) × (one frame), for example 38 × 546 × (the number of combinations of acoustic-model states) values.

The 38-dimensional (in general, n-dimensional) speech feature mean μ_n, variance σ_n, and independent parameters χ_n have thus been obtained as the acoustic features, so the acoustic likelihood parallel calculation unit 15 uses them to perform the following parallel operation phase.
<Parallel operation processing phase>

・For each μ_n and χ_n reserved in the data storage area (acoustic feature holding unit 14), the (χ_n − μ_n) operation of the second term of equation (2) is executed in a single batch, and the result is z_n.

・Next, for z_n and σ_n, the (z_n × z_n)/(σ_n × σ_n) operation of the second term of equation (2) is executed in a single batch. The result is P_n, which is written to the data storage area P'_n (acoustic feature holding unit 14) reserved for holding it.
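The two batched steps above form a two-stage cascade in which the output of the first pass feeds the second. The sketch below emulates each pass as an element-wise operation over plain Python lists. On a GPU each element would be handled by a separate shader invocation; the function names are hypothetical.

```python
def pass_subtract(chi, mu):
    """Pass 1: z = chi - mu for every element, as one batched operation."""
    return [c - m for c, m in zip(chi, mu)]

def pass_normalize(z, sigma):
    """Pass 2: P' = (z * z) / (sigma * sigma), fed by the output of pass 1."""
    return [(zi * zi) / (si * si) for zi, si in zip(z, sigma)]

chi = [1.0, 3.0, 0.0]
mu = [0.0, 1.0, 0.0]
sigma = [1.0, 2.0, 4.0]
z = pass_subtract(chi, mu)
p_prime = pass_normalize(z, sigma)
```

Neither pass contains a loop-carried dependency, which is what makes each one suitable for a single batched dispatch.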

FIG. 4 is a conceptual diagram of the parallel operation processing phase. This phase is carried out by cascading the parallel operations of each of the GPGPU's pixel shaders. For each of the 38-dimensional μ and χ reserved in the data storage area, the pixel shader executes the (χ − μ) operation of the second term of equation (2) in a single batch to compute the 38-dimensional result z. The pixel shader then executes, for z and σ, the (z × z)/(σ × σ) operation of the second term of equation (2) in a single batch to compute the 38-dimensional result P'.

When the parallel operation processing phase ends, processing moves to the end phase. In the end phase, the acoustic likelihood parallel calculation control unit 13 accesses the acoustic feature holding unit 14 via the acoustic likelihood parallel calculation unit 15, reads the n values of p, and performs the following end phase processing.
<End phase>

・A summation (Σ operation) is executed over the n values of P'_n in the data storage area (acoustic feature holding unit 14), and the total likelihood of the acoustic features is computed.

That is, the acoustic likelihood parallel calculation control unit 13 reads the 38 values of P' from the P' storage area and evaluates the expression shown in FIG. 5 (the acoustic likelihood calculation). Here, C is the constant part.
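The expression in FIG. 5 is not reproduced in this text. Assuming it has the form C − ½ ΣP', which is what the logarithm of a diagonal Gaussian would give, the CPU-side end phase could be sketched as follows. The function name, the −½ factor, and the per-state block layout are all assumptions.

```python
def end_phase(p_prime, n_dims, constants):
    """Sum each state's n per-dimension terms and add that state's constant C."""
    scores = []
    for state, c in enumerate(constants):
        block = p_prime[state * n_dims:(state + 1) * n_dims]
        scores.append(c - 0.5 * sum(block))
    return scores

# Two states with two dimensions each, read back from the P' storage area.
scores = end_phase([1.0, 1.0, 0.0, 4.0], n_dims=2, constants=[0.0, -1.0])
```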

As described above, according to this embodiment, the features of the input speech (n values, or n × 1 frame) are stored in the GPU memory 32 in a single batch and processed in parallel as described for the parallel operation processing phase, so the (z_n × z_n)/(σ_n × σ_n) operation of the second term of equation (2) can be executed in a single batch. The conventional likelihood calculation method can therefore be executed in parallel as is, and equation (2) can be evaluated in a short time.

The present invention is not limited to the embodiment described above. Speed can be improved further by raising the number of simultaneous vector operations of the pixel shader and executing the operations for a plurality of acoustic models at once, that is, (38 acoustic features) × (546 acoustic-model states) × (an arbitrary number of superpositions of acoustic-model states) × M, where M is a positive integer greater than 1. For example, when speech is processed in parallel frame by frame, M is the maximum number of speech frames processed in parallel in that embodiment.
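The effect of processing M frames at once can be illustrated with a small sketch. It evaluates the quadratic terms for several frames against one state in a single call and counts how many per-dimension terms one batch would contain. The names are hypothetical, and the count ignores the additional "number of superpositions" factor mentioned in the text.

```python
def batch_size(n_dims, n_states, m_frames):
    """Number of per-dimension terms evaluated in one batched call."""
    return n_dims * n_states * m_frames

def batched_terms(frames, mu, sigma):
    """(x - mu)^2 / sigma^2 for every frame against one state, in one call."""
    return [
        [((x - m) ** 2) / (s * s) for x, m, s in zip(frame, mu, sigma)]
        for frame in frames
    ]

frames = [[1.0, 2.0], [0.0, 0.0]]  # M = 2 frames, n = 2 dimensions
terms = batched_terms(frames, mu=[0.0, 0.0], sigma=[1.0, 2.0])
total = batch_size(n_dims=38, n_states=546, m_frames=4)
```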

FIG. 1 is a block diagram showing the hardware configuration of one embodiment of the present invention.
FIG. 2 is a block diagram showing the functions of one embodiment of the present invention.
FIG. 3 is a diagram explaining the concept of the initial setting phase.
FIG. 4 is a diagram explaining the concept of the parallel processing phase.
FIG. 5 is a diagram explaining the concept of the end phase.

Explanation of symbols

1: CPU; 2: memory; 3: GPGPU; 4: display; 5: bus; 31: GPU; 32: GPU memory; 11: acoustic feature conversion unit; 12: acoustic likelihood parallel execution procedure registration unit; 13: acoustic likelihood parallel calculation control unit; 14: acoustic feature holding unit; 15: acoustic likelihood parallel calculation unit.

Claims (6)

1. An acoustic likelihood parallel computing device comprising:
an acoustic feature conversion unit that analyzes input speech and converts it into acoustic features;
an acoustic likelihood parallel execution procedure registration unit that registers a parallel execution procedure for acoustic likelihood calculation;
an acoustic feature holding unit that holds the acoustic features of the input speech and the states of an acoustic model;
an acoustic likelihood parallel calculation unit that executes likelihood calculation using the acoustic features of the input speech and the states of the acoustic model, in accordance with the acoustic likelihood parallel execution procedure registered by the acoustic likelihood parallel execution procedure registration unit;
an acoustic likelihood parallel calculation control unit that causes the acoustic likelihood parallel calculation unit to output a likelihood calculation result; and
an acoustic likelihood calculation result acquisition unit that acquires the acoustic likelihood calculation result from the acoustic likelihood parallel calculation unit,
wherein the acoustic likelihood parallel calculation unit uses a GPGPU (General Purpose Graphic Processing Unit), which ordinarily performs graphics processing, and executes the acoustic likelihood calculation by cascading the parallel operations of pixel shaders of the GPGPU, and
wherein the pixel shader, for each μ and χ of n dimensions (n is a positive integer greater than 1) reserved in a data storage area, executes the (χ − μ) operation of the second term of equation (2) below in a single batch to compute an n-dimensional result z, and then, for z and σ, executes the (z × z)/(σ × σ) operation of the second term of equation (2) in a single batch to compute an n-dimensional result P'.
[Equation (2): shown as an image in the original publication]
Here, μ is the mean, σ is the variance, and χ denotes the n mutually independent parameters (one per dimension).

2. The acoustic likelihood parallel computing device according to claim 1, wherein the number of simultaneous vector operations of the pixel shader is set to (n dimensions × M) (M is a positive integer greater than 1), and speed is improved by simultaneously executing the operations for a plurality (M) of acoustic models.

3. The acoustic likelihood parallel computing device according to claim 1 or 2, wherein the processing that adds up the calculation results of the individual dimensions (Σ operation) is executed by a CPU.

4. The acoustic likelihood parallel computing device according to any one of claims 1 to 3, wherein n is 38.

5. An acoustic likelihood parallel calculation program that executes acoustic likelihood calculation by cascading the parallel operations of pixel shaders of a GPGPU, the program causing a computer to realize, in the pixel shader:
a function of executing, for each μ and χ of n dimensions (n is a positive integer greater than 1) reserved in a data storage area, the (χ − μ) operation of the second term of equation (2) below in a single batch to compute an n-dimensional result z; and
a function of executing, for z and σ, the (z × z)/(σ × σ) operation of the second term of equation (2) in a single batch to compute an n-dimensional result P'.
[Equation (2): shown as an image in the original publication]
Here, μ is the mean, σ is the variance, and χ denotes the n mutually independent parameters (one per dimension).

6. The acoustic likelihood parallel calculation program according to claim 5, wherein n is 38.
JP2007254642A 2007-09-28 2007-09-28 Acoustic likelihood parallel computing device and program for speech recognition Expired - Fee Related JP4854032B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2007254642A JP4854032B2 (en) 2007-09-28 2007-09-28 Acoustic likelihood parallel computing device and program for speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2007254642A JP4854032B2 (en) 2007-09-28 2007-09-28 Acoustic likelihood parallel computing device and program for speech recognition

Publications (2)

Publication Number Publication Date
JP2009086202A (en) 2009-04-23
JP4854032B2 (en) 2012-01-11

Family

ID=40659724

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2007254642A Expired - Fee Related JP4854032B2 (en) 2007-09-28 2007-09-28 Acoustic likelihood parallel computing device and program for speech recognition

Country Status (1)

Country Link
JP (1) JP4854032B2 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5059928B2 (en) 2010-10-28 2012-10-31 みずほ第一フィナンシャルテクノロジー株式会社 Parallelization of random number generation processing using GPU
JP6346893B2 (en) * 2012-09-07 2018-06-20 Carnegie Mellon University Hybrid GPU / CPU data processing method
CN104538033A (en) * 2014-12-29 2015-04-22 江苏科技大学 Parallelized voice recognizing system based on embedded GPU system and method
CN109087630B (en) * 2018-08-29 2020-09-15 深圳追一科技有限公司 Method and related device for speech recognition
WO2021033889A1 (en) 2019-08-20 2021-02-25 Samsung Electronics Co., Ltd. Electronic device and method for controlling the electronic device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0535923A (en) * 1991-02-28 1993-02-12 Toshiba Corp Pattern discrimination circuit
WO2006033044A2 (en) * 2004-09-23 2006-03-30 Koninklijke Philips Electronics N.V. Method of training a robust speaker-dependent speech recognition system with speaker-dependent expressions and robust speaker-dependent speech recognition system
JP2006171185A (en) * 2004-12-14 2006-06-29 Asahi Kasei Corp Speech recognition device and method
JP2006201265A (en) * 2005-01-18 2006-08-03 Matsushita Electric Ind Co Ltd Voice recognition device
JP4621076B2 (en) * 2005-06-21 2011-01-26 株式会社アドバンテスト Electron beam exposure system
JP2007078943A (en) * 2005-09-13 2007-03-29 Hitachi Ltd Acoustic score calculating program

Also Published As

Publication number Publication date
JP2009086202A (en) 2009-04-23

Similar Documents

Publication Publication Date Title
CN108351984B (en) Hardware-efficient deep convolutional neural network
CN111488985B (en) Deep neural network model compression training method, device, equipment and medium
CN110659725B (en) Neural network model compression and acceleration method, data processing method and device
US12067075B1 (en) Solving optimization problems using a hybrid computer system
JP4854032B2 (en) Acoustic likelihood parallel computing device and program for speech recognition
CN112509600A (en) Model training method and device, voice conversion method and device and storage medium
CN111476138B (en) Construction method, identification method and related equipment for building drawing component identification model
Brugger et al. A quantitative cross-architecture study of morphological image processing on CPUs, GPUs, and FPGAs
CN111709530A (en) Visual display method for learning of quantum machine
CN107977980B (en) Target tracking method, device and readable medium
Khujayorov et al. Parallel signal processing based-on graphics processing units
Poli et al. Voice command recognition with dynamic time warping (dtw) using graphics processing units (gpu) with compute unified device architecture (cuda)
JP7111177B2 (en) LEARNING APPARATUS, LEARNING METHOD, AND LEARNING PROGRAM
CN114299204B (en) Three-dimensional cartoon character model generation method and device
CN116362301A (en) Model quantization method and related equipment
CN105654527A (en) Magnetic resonance imaging reconstruction method and device based on structural dictionary learning
Li et al. A laplacian pyramid based generative h&e stain augmentation network
US11288534B2 (en) Apparatus and method for image processing for machine learning
CN113450764A (en) Text voice recognition method, device, equipment and storage medium
JP3709158B2 (en) Partial selection conversion apparatus, partial selection conversion method, and partial selection conversion program
JP7468650B2 (en) Information processing device, information processing method, and program
WO2023220891A1 (en) Resolution-switchable segmentation networks
JP2005031151A (en) Device and method for likelihood calculation
WO2023220892A1 (en) Expanded neural network training layers for convolution
DeSantis et al. AI SoC-Based Accelerator for Speech Classification Accélérateur de classification de la parole basé sur un AI SoC

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20100128

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20110610

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20110727

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20110926

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20111019

A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20111021

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20141104

Year of fee payment: 3

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

LAPS Cancellation because of no payment of annual fees