JP3209197B2

JP3209197B2 - Character recognition device and recording medium storing character recognition program

Info

Publication number: JP3209197B2
Application number: JP33037298A
Authority: JP
Inventors: 孝文越仲
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1998-07-03
Filing date: 1998-11-20
Publication date: 2001-09-17
Anticipated expiration: 2018-11-20
Also published as: JP2000082115A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to optical character recognition technology, in which characters written on paper or the like are captured by an optical sensor and read. In particular, it relates to a character recognition device and method for recognizing character strings in which multiple characters are arranged, such as words and sentences.

[0002]

[Prior Art] Conventional character recognition devices of this kind read a character string by combining character segmentation, which identifies the boundaries between characters in the string, with character recognition, which reads each segmented character.

[0003] One example of conventional character recognition technology is described in Su Liang et al., "Segmentation of Touching Characters in Printed Document Recognition," Pattern Recognition, Vol. 27, No. 6, pp. 825-840, 1994.

[0004] The method described in that paper uses the shape of the projection histogram, together with information derived from it, to extract candidate character boundaries. It then extracts, as character candidates, every part of the character string that lies between any two character boundaries (character segmentation).

[0005] Next, character recognition is performed on every character candidate, and a recognition result and its likelihood (score) are computed for each.

[0006] Finally, character candidates are selected so that the score is maximized when they are concatenated into a character string, which at the same time determines the segmentation positions of the string that are presumed to be correct.

[0007] Various other methods have been proposed, but most of them differ only in the information used for character segmentation, or perform character recognition exhaustively at every part of the character string without prior segmentation in order to determine the optimal segmentation positions, or differ only in the features extracted from the character image or in the method used to classify characters.

[0008] The example above targets printed characters, but the same holds for methods that target handwritten characters. In most cases, character segmentation and character recognition are built as separate modules, and the character string is read by combining them.

[0009]

[Problem to Be Solved by the Invention] However, in the conventional techniques described above, the character segmentation and character recognition subsystems are built and used separately. As a result, particularly when recognizing handwritten character strings, they cannot cope with the deformations of character patterns that are peculiar to character strings, and misrecognition often occurs.

[0010] For example, in a character string written in connected script, such as cursive English, the pen is at the bottom when the writer finishes "a" but at the top when the writer finishes "o". Consequently, even the same character changes shape depending on whether it is written after "a" or after "o". Referring to FIG. 6, in the connected-script strings "ab" and "ob", the "b" that follows "a" starts from the lower left of the rectangle enclosing "b", whereas the "b" that follows "o" starts from roughly the middle of the left side of that rectangle.

[0011] This kind of deformation is peculiar to character strings and cannot occur in isolated characters.

[0012] A conventional character recognition system built by learning only isolated characters cannot handle such deformation, which is a frequent cause of misrecognition.

[0013] The same problem arises not only with letters but also with numerals. For example, the pen at the end of writing "5" moves in the same direction as the character string, so the character following "5" is often written joined to "5". As a result, as shown in FIG. 7, the character written after "5" is deformed so that it connects smoothly to "5".

[0014] For this reason, conventional character recognition systems that do not take string-specific deformation into account are prone to misrecognition.

[0015] In other words, adjacent characters in a character string generally depend on each other and tend to deform under each other's influence.

[0016] One conceivable way to address the deformation of a character that depends on its neighbor is to treat two adjacent characters as a single pattern and to build a dictionary by learning a number of templates equal to the square of the number of character classes. However, a sequence of two characters exhibits an extremely wide variety of deformations as a pattern, so an enormous amount of training data is required. Moreover, because the number of templates required is the square of the number of character classes, the shortage of training data becomes severe.

[0017] Thus, the approach of treating two characters as one pattern and learning a number of templates equal to the square of the number of character classes is not suitable for practical use.

[0018] The present invention was therefore devised in recognition of the above technical problem. Its object is to provide a character recognition device and method that are little affected by the deformation of character shapes arising from dependencies between adjacent characters, that is, that are robust to touching characters and connected writing.

[0019]

[Means for Solving the Problem] To achieve the above object, the present invention comprises: image storage means for receiving and storing a character string image; character segmentation means for detecting, in the character string image obtained from the image storage means, segmentation position candidates used to obtain partial patterns each corresponding to one character; character string reading means for generating individual character pattern candidates, which are partial patterns each corresponding to one character, based on the segmentation position candidates obtained from the character segmentation means, having them recognized, and outputting the optimal reading result for the character string; character recognition means for recognizing, at the request of the character string reading means, the individual character pattern candidates generated by the character string reading means, and outputting a character recognition result and a character recognition score representing the likelihood of that result; one-character dictionary storage means for storing a dictionary that the character recognition means uses to classify and score a single character pattern candidate; and two-character dictionary storage means for storing an adjacent two-character dictionary that the character recognition means uses to classify an individual character using the character pattern candidates of two adjacent characters. When the character recognition means receives character pattern candidates from the character string reading means and performs character recognition, it receives the character pattern candidate to be recognized and the character pattern candidate immediately preceding it. Assuming that the candidate to be recognized belongs to a given character class, it uses the probability that the given candidate and its immediately preceding candidate occur together, and the probability that the immediately preceding character pattern occurs, to obtain a score representing the likelihood that the candidate to be recognized belongs to that character class.

[0020]

[Embodiments of the Invention] Embodiments of the present invention are described below. First, the principle of the character recognition device of the present invention is explained. In one embodiment, the present invention (a) uses character strings as training data when building the character recognition system, so that characters are learned directly from character strings, and (b) when computing the character recognition score by comparing the i-th character pattern X_i in the input character string with a dictionary pattern w, computes the score as the conditional probability P(X_i | X_{i-1}, w), which adds the condition that the pattern X_{i-1} occurs as the immediately preceding (i-1)-th character. This achieves robust character recognition that is little affected by the deformation of character shapes arising from dependencies between adjacent characters, such as touching characters and connected writing.

[0021] Here, the value of the conditional probability P(X_i | X_{i-1}, w) is obtained by computing the ratio of the score P(X_{i-1}, X_i | w), taken over the adjacent two-character unit formed by a character and the character immediately preceding it, to the one-character score P(X_{i-1} | w),

P(X_{i-1}, X_i | w) / P(X_{i-1} | w) ... (1)

or its simplified form,

P(X_{i-1}, X_i | w) / P(X_{i-1}) ... (2)
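As a brief aside, the ratio in (1) is simply the definition of conditional probability applied to the pair of patterns. The step from (1) to (2) is shown below under an assumption that the text does not state explicitly, namely that the preceding pattern X_{i-1} is treated as independent of the class w of the current character:

```latex
\begin{align}
P(X_i \mid X_{i-1}, w)
  &= \frac{P(X_{i-1}, X_i \mid w)}{P(X_{i-1} \mid w)}
  && \text{(definition of conditional probability)} \\
  &\approx \frac{P(X_{i-1}, X_i \mid w)}{P(X_{i-1})}
  && \text{(assuming } P(X_{i-1} \mid w) \approx P(X_{i-1}) \text{)}
\end{align}
```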

[0022] In another embodiment, the present invention is realized by (a) using character strings as training data when building the character recognition system, so that characters are learned directly from character strings, and (b) when computing the character recognition score by comparing the i-th character pattern X_i in the input character string with a dictionary pattern w, computing the score as the conditional probability P(X_i | X_{i-1}, w', w), which adds the conditions that the pattern X_{i-1} occurs as the (i-1)-th character and that the (i-1)-th character pattern X_{i-1} belongs to the character category represented by the dictionary pattern w'.

[0023] Here, the value of the conditional probability P(X_i | X_{i-1}, w', w) is obtained by computing the ratio of the score P(X_{i-1}, X_i | w', w), taken over the adjacent two-character unit formed by a character and the character immediately preceding it, to the one-character score P(X_{i-1} | w', w),

P(X_{i-1}, X_i | w', w) / P(X_{i-1} | w', w) ... (3)

or its simplified form,

P(X_{i-1}, X_i | w', w) / P(X_{i-1} | w') ... (4)

The invention is described in detail below with reference to an example.

[0024]

[Example] FIG. 1 is a block diagram showing the configuration of one example of the present invention. Referring to FIG. 1, this example has: image storage means 1, which captures an input character string image with an optical sensor and stores it; character segmentation means 2, which detects candidate boundaries between adjacent characters in the character string image received from image storage means 1 as segmentation position candidates; character string reading means 3, which invokes character recognition on each individual character pattern obtained when the character string image is divided at a selection of the segmentation position candidates, computes a recognition score for the string as a whole, and outputs the optimal segmentation and recognition result as the reading result for the string; character recognition means 4, which applies recognition to a character pattern at the request of character string reading means 3 and returns one character class and a recognition score; one-character dictionary storage means 5, which is used to compute a score for how likely a single character pattern is to appear; and adjacent two-character dictionary storage means 6, which is used to compute the character class and recognition score of the second character from a pattern corresponding to two adjacent characters. The functions of image storage means 1, character segmentation means 2, character string reading means 3, character recognition means 4, one-character dictionary storage means 5, and adjacent two-character dictionary storage means 6 can be realized under the control of a program executed on a computer.

[0025] When character recognition is performed in character recognition means 4, the input image is generally preprocessed. Known preprocessing steps include, for example: binarization, which converts a multi-level image into a binary image that is easier to handle; normalization, which adjusts character size, stroke spacing, slant, and the like; noise removal, which removes small stains and faint strokes from the image; and feature extraction, which converts the input pattern into a smaller number of quantities useful for classification. In this example as well, these steps may of course be incorporated into character recognition means 4 as needed. These preprocessing steps may be applied in any order.

[0026] Furthermore, if the features extracted by feature extraction are of a kind that can be divided along with the division of the image by character segmentation (that is, a feature value is computed for each local region of the input image), these preprocessing steps, together with feature extraction, can be performed in character segmentation means 2 or in image storage means 1, so that the features are extracted from the input character string image in a single pass.

[0027] As an example of feature extraction, a process that computes the strength of character strokes in each direction as features is described using the concrete input image shown in FIG. 3.

[0028] Stroke direction is quantized into four directions: vertical (90°), horizontal (0°), and diagonal (45° and 135°). For each direction, and for each of the small regions obtained by dividing the image into 4 sections vertically and 63 sections horizontally, the stroke length is measured.

[0029] Here, the stroke length can be measured as the number of black pixels connected in the corresponding direction. In a region that contains no black pixels, the length is taken to be 0. In this way, a feature pattern based on stroke direction, as shown in FIG. 4, is obtained from the input image of FIG. 3, in which "02062" is written.

[0030] In FIG. 4, a darker shade means a larger feature value corresponding to stroke length. The feature pattern is divided vertically into 16 regions; taken four at a time from the top, they correspond to the feature values for the 0°, 45°, 90°, and 135° directions, respectively.
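The directional stroke feature described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the image is a binary grid of 0/1 values, the per-region value is taken to be the longest run of connected black pixels passing through any pixel of that region (the text does not say how run lengths are aggregated within a region), and the names `run_length` and `directional_features` are hypothetical.

```python
# Steps (d_row, d_col) for the four stroke directions; rows grow downward.
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def run_length(img, r, c, dr, dc):
    """Count the connected black pixels through (r, c) along the line +/-(dr, dc)."""
    if not img[r][c]:
        return 0
    h, w = len(img), len(img[0])
    length = 1
    for sign in (1, -1):  # walk both ways along the direction
        rr, cc = r + sign * dr, c + sign * dc
        while 0 <= rr < h and 0 <= cc < w and img[rr][cc]:
            length += 1
            rr += sign * dr
            cc += sign * dc
    return length


def directional_features(img, n_rows, n_cols):
    """Return {angle: grid}; grid[a][b] is the longest run found in region (a, b).

    Assumption: "longest run" is one plausible way to summarize a region.
    """
    h, w = len(img), len(img[0])
    feats = {}
    for angle, (dr, dc) in DIRECTIONS.items():
        grid = [[0] * n_cols for _ in range(n_rows)]
        for r in range(h):
            for c in range(w):
                a = r * n_rows // h  # region row containing pixel row r
                b = c * n_cols // w  # region column containing pixel column c
                grid[a][b] = max(grid[a][b], run_length(img, r, c, dr, dc))
        feats[angle] = grid
    return feats
```

With `n_rows=4` and `n_cols=63`, stacking the four grids in the order 0°, 45°, 90°, 135° would give a 16-row pattern laid out like the one described for FIG. 4.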

[0031] With a feature like this one, whose values are defined for small regions of the image, once the segmentation positions of the characters in the input image are determined, the feature pattern can be divided accordingly in units of small regions. Feature extraction can therefore also be performed upstream of character segmentation means 2.

[0032] The feature extraction described above can also be placed downstream of character segmentation means 2. Alternatively, feature extraction may be omitted and the input image itself used as a kind of feature.

[0033] FIG. 2 is a flowchart explaining the processing flow of one example of the present invention. The operation of this example is described in detail with reference to FIGS. 1 and 2.

[0034] The image is input optically by a scanner or the like, stored in image storage means 1, and then sent to character segmentation means 2 (step 10 in FIG. 2).

[0035] Character segmentation means 2 detects several segmentation position candidates in the character string image and sends their coordinates, together with either the character string image or the feature pattern obtained by converting it through feature extraction, to character string reading means 3 (step 11 in FIG. 2).

[0036] Some kind of graphical information is used to detect segmentation position candidates. For example, a projection histogram of the character string is computed: the string is projected vertically if it is written horizontally, and horizontally if it is written vertically. Positions where the count is below a preset threshold are taken as segmentation position candidates.
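The projection-histogram rule can be sketched as below for a horizontally written string. This is a minimal illustration, assuming a binary image given as rows of 0/1 values; the function names and the choice of returning raw column indices (rather than, say, one cut per run of low columns) are assumptions, not something the text specifies.

```python
def projection_histogram(img):
    """Vertical projection: the number of black pixels in each column."""
    return [sum(column) for column in zip(*img)]


def cut_candidates(img, threshold):
    """Column indices whose black-pixel count is below the preset threshold."""
    hist = projection_histogram(img)
    return [x for x, count in enumerate(hist) if count < threshold]
```

For a vertically written string, the same functions could be applied to the transposed image.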

[0037] As another segmentation technique based on graphical information, the contour of the character string may be traced to measure its concavities and convexities, and positions where a concavity exceeds a threshold may be stored as segmentation position candidates.

[0038] A method that combines several graphical features to find segmentation position candidates may also be used.

[0039] Segmentation position candidates can also be detected without using graphical information. In that case, for example, the span from the start coordinate to the end coordinate of the character string image is divided at equal intervals, and all of the division points are stored as segmentation position candidates. A fairly large number of candidates is stored, for example several times the expected number of characters.
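The equal-interval alternative can be sketched as follows. The parameter `oversample` (how many times the expected character count to use) and the function name are hypothetical; the text only says "several times" the expected number of characters.

```python
def uniform_cut_candidates(start, end, expected_chars, oversample=3):
    """Equally spaced interior cut positions between start and end.

    Assumption: the span is split into expected_chars * oversample pieces.
    """
    n_segments = expected_chars * oversample
    step = (end - start) / n_segments
    # Interior division points only; the two ends of the string are not cuts.
    return [round(start + k * step) for k in range(1, n_segments)]
```

For instance, a string spanning x = 0 to 120 with two expected characters and `oversample=3` is split into six pieces, giving five candidate cuts.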

[0040] Character string reading means 3 performs character string image reconstruction (step 12 in FIG. 2), character string recognition (step 16 in FIG. 2), and result comparison and evaluation (step 17 in FIG. 2).

[0041] That is, it receives from character segmentation means 2 the character string image, or the feature pattern obtained from it by feature extraction, together with the coordinates of the segmentation position candidates. It then performs character recognition, using character recognition means 4, on every character pattern candidate cut out at the segmentation positions, and stores the recognition results and recognition scores.

[0042] It then selects and outputs, as the reading result for the string, the recognition result of the sequence of character pattern candidates that has the highest recognition score for the string as a whole and that neither overlaps nor skips any part of the string.

[0043] The procedure for searching for the optimal reading result is described later.

[0044] When character string reading means 3 sends a character pattern candidate to character recognition means 4, it sends the immediately preceding character pattern candidate along with the candidate in question.

[0045] Character recognition means 4 performs character recognition on the candidate in question, taking these two adjacent character pattern candidates into account.

[0046] Character recognition means 4 receives two adjacent character pattern candidates from character string reading means 3, performs character recognition on the latter, computes the character recognition result (character class) and the character recognition score, and returns them to character string reading means 3.

[0047] Suppose that character recognition means 4 receives two adjacent character pattern candidates X_{i-1} and X_i from character string reading means 3. The character recognition result w_i for X_i is then determined as the w that maximizes P(X_i | X_{i-1}, w), the probability that the character pattern X_i occurs given that the character class is w and the immediately preceding character pattern is X_{i-1}.

[0048] The character recognition score in that case is computed as P(X_i | X_{i-1}, w_i).

[0049] In practice, the probability P(X_i | X_{i-1}, w) is not computed directly. Instead, the approximation

P(X_{i-1}, X_i | w) / P(X_{i-1}) ... (5)

is computed.

[0050] In this approximation, the numerator (dividend) P(X_{i-1}, X_i | w) is the probability that the adjacent two-character pattern occurs as X_{i-1}, X_i given that the second character of the pair has class w. It is computed from the dictionary of adjacent two-character patterns stored in adjacent two-character dictionary storage means 5, as the result of matching patterns in two-character units. This processing corresponds to the adjacent two-character evaluation of step 13 in FIG. 2.

[0051] The denominator (divisor) P(X_{i-1}), on the other hand, is the probability that the character pattern X_{i-1} is observed when nothing is known in advance. It is computed from one-character dictionary storage means 6 by matching patterns in one-character units. This processing corresponds to the one-character evaluation of step 14 in FIG. 2.

[0052] Character recognition means 4 obtains the character recognition score as the ratio of the values obtained from adjacent two-character dictionary storage means 5 and one-character dictionary storage means 6:

P(X_i | X_{i-1}, w_i) ≈ P(X_{i-1}, X_i | w_i) / P(X_{i-1}) ... (6)

This processing corresponds to the character recognition of step 15 in FIG. 2.
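Steps 13 to 15 can be sketched as a single function. This is an illustrative sketch only: `pair_likelihood` and `single_likelihood` are hypothetical callbacks standing in for matching against the adjacent two-character dictionary and the one-character dictionary, whose internal form the text does not specify here.

```python
def recognize(x_prev, x, classes, pair_likelihood, single_likelihood):
    """Return (w_i, score) using the approximation of equations (5) and (6).

    pair_likelihood(x_prev, x, w) -> P(x_prev, x | w)   (step 13, hypothetical)
    single_likelihood(x_prev)     -> P(x_prev)          (step 14, hypothetical)
    """
    denominator = single_likelihood(x_prev)
    scores = {w: pair_likelihood(x_prev, x, w) / denominator for w in classes}
    best = max(scores, key=scores.get)  # the class w that maximizes the ratio
    return best, scores[best]           # step 15: class and recognition score
```

Because the denominator does not depend on w, it does not change which class is chosen; it matters when scores of different character pattern candidates are later compared across segmentations.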

[0053] The operation of character string reading means 3 is now described in more detail. Using the character string image, or the feature pattern obtained from it by feature extraction, and the coordinates of the segmentation position candidates received from character segmentation means 2, character string reading means 3 enumerates every way of dividing the character string into character pattern candidates. This processing corresponds to the character string image reconstruction of step 12 in FIG. 2.

[0054] For example, if four segmentation position candidates have been obtained from the input image, the input image can be divided into five partial patterns: pattern 1, pattern 2, pattern 3, pattern 4, and pattern 5.

[0055] Assuming two characters, four divisions are possible: (1 | 2, 3, 4, 5), (1, 2 | 3, 4, 5), (1, 2, 3 | 4, 5), and (1, 2, 3, 4 | 5).

[0056] Assuming three characters, a total of six divisions are possible: (1 | 2 | 3, 4, 5), (1 | 2, 3 | 4, 5), (1 | 2, 3, 4 | 5), (1, 2 | 3 | 4, 5), (1, 2 | 3, 4 | 5), and (1, 2, 3 | 4 | 5).

[0057] Here, "|" denotes a division position in the input image.

[0058] For example, (1, 2 | 3 | 4, 5) means that the input image is divided (grouped) so that partial patterns 1 and 2 are assigned to the first character, partial pattern 3 to the second character, and partial patterns 4 and 5 to the third character.
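The enumeration of divisions can be sketched with the standard library. The function name `segmentations` is hypothetical; the idea is that choosing which of the interior cut positions to use is the same as choosing a combination of them.

```python
from itertools import combinations


def segmentations(n_parts, n_chars):
    """Yield every grouping of partial patterns 1..n_parts, kept in order,
    into n_chars contiguous non-empty groups."""
    parts = list(range(1, n_parts + 1))
    # Pick n_chars - 1 of the n_parts - 1 interior boundaries as actual cuts.
    for cuts in combinations(range(1, n_parts), n_chars - 1):
        bounds = (0, *cuts, n_parts)
        yield [parts[a:b] for a, b in zip(bounds, bounds[1:])]
```

With five partial patterns this yields four groupings for two characters and six for three, in the same order as listed above; for example, `[[1, 2], [3], [4, 5]]` corresponds to (1, 2 | 3 | 4, 5).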

[0059] In this way, for each assumed number of characters, sequences of character pattern candidates covering every way of dividing the string are generated, and the reading score for the whole string is computed for each. This processing corresponds to the character string recognition of step 16 in FIG. 2.

【００６０】読み取りスコアは、各文字パタン候補の認
識スコアの積、すなわち、Ｐ（Ｘ１｜ｗ１）×Ｐ（Ｘ２｜Ｘ１，ｗ２）×Ｐ（Ｘ３
｜Ｘ２，ｗ３）×…×Ｐ（Ｘｎ｜Ｘｎ−１，ｗｎ）と計算する。ここで、ｎは文字数である。The reading score is computed as the product of the recognition scores of the character pattern candidates, that is, P(X1|w1) × P(X2|X1,w2) × P(X3|X2,w3) × … × P(Xn|Xn−1,wn), where n is the number of characters.

【００６１】想定される文字数及び字種について、それ
ぞれ読み取りスコアを計算し、読み取りスコアが最大と
なる認識結果ｗ１、ｗ２、…、ｗｎが、読み取り結果と
して出力される。この処理は、図２のステップ１７の結
果比較評価処理に相当する。A reading score is calculated for each assumed number of characters and combination of character types, and the recognition result w1, w2, …, wn with the maximum reading score is output as the reading result. This corresponds to the result comparison and evaluation processing in step 17 of FIG. 2.

【００６２】最初の文字のスコアＰ（Ｘ１｜ｗ１）につ
いては、直前に文字パタン候補が存在しないので、文字
認識手段４が１文字辞書を用いて計算する。As for the score P (X1 | w1) of the first character, since there is no character pattern candidate immediately before, the character recognizing means 4 calculates using the one-character dictionary.

【００６３】なお、ここでは、読み取りスコアは、確率
として扱っているので、各文字パタン候補の認識スコア
の積を全体のスコアとしているが、確率とみなせないス
コア（例えば対数確率や、テンプレートからの距離）を
扱う場合には、積ではなく、和を用いてもよい。Because the reading score is treated here as a probability, the overall score is the product of the recognition scores of the character pattern candidates. When the scores cannot be regarded as probabilities (for example, log probabilities or distances from templates), a sum may be used instead of a product.
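
As a small illustration of the product-versus-sum remark, the following Python sketch compares the two forms for probability-valued scores. The function names and the numeric values are made up for the example. Taking logarithms turns the product into a sum without changing which candidate ranks highest.

```python
import math


def product_score(probs):
    """Overall score as the product of per-character probabilities."""
    score = 1.0
    for p in probs:
        score *= p
    return score


def log_sum_score(probs):
    """Overall score as the sum of per-character log probabilities."""
    return sum(math.log(p) for p in probs)


# Two hypothetical candidate readings with per-character recognition scores.
candidate_a = [0.9, 0.8, 0.7]
candidate_b = [0.6, 0.9, 0.5]
print(product_score(candidate_a) > product_score(candidate_b))
print(log_sum_score(candidate_a) > log_sum_score(candidate_b))
```

Working with log scores also avoids numerical underflow when many small probabilities are multiplied.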

【００６４】また、文字の並びに言語的な制約がある場
合には、適宜この制約を利用する。例えば、文字Ａの直
後に文字Ｂが続く確率Ｐ（Ｂ｜Ａ）が、統計的な分析か
ら既知であるような場合には、この確率を読み取りスコ
アに反映させ、Ｐ（Ｘ１｜ｗ１）Ｐ（ｗ１）×Ｐ（Ｘ２｜Ｘ１，ｗ２）
Ｐ（ｗ２｜ｗ１）×Ｐ（Ｘ３｜Ｘ２，ｗ３）Ｐ（ｗ３｜
ｗ２）×…×Ｐ（Ｘｎ｜Ｘｎ−１，ｗｎ）Ｐ（ｗｎ｜ｗ
ｎ−１）というようにスコアを計算する。When there are linguistic constraints on the sequence of characters, those constraints are used as appropriate. For example, if the probability P(B|A) that character B immediately follows character A is known from statistical analysis, it is reflected in the reading score, which is then computed as P(X1|w1)P(w1) × P(X2|X1,w2)P(w2|w1) × P(X3|X2,w3)P(w3|w2) × … × P(Xn|Xn−1,wn)P(wn|wn−1).
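
A minimal sketch of how such character-transition probabilities could be folded into the reading score is shown below, working in the log domain. The function name `string_log_score` and the dictionary-based representation of P(w1) and P(wk|wk−1) are assumptions made for illustration; the text does not specify how these statistics are stored.

```python
import math


def string_log_score(recognition_probs, labels, unigram, bigram):
    """Log of the reading score with linguistic constraints.

    recognition_probs[k] is the recognition score of the k-th character
    pattern candidate, labels[k] its hypothesised character type,
    unigram[w] stands for P(w) and bigram[(a, b)] for P(b | a).
    """
    total = math.log(recognition_probs[0]) + math.log(unigram[labels[0]])
    for k in range(1, len(labels)):
        total += math.log(recognition_probs[k])
        total += math.log(bigram[(labels[k - 1], labels[k])])
    return total


# Hypothetical two-character reading "AB".
log_score = string_log_score([0.9, 0.8], ["A", "B"],
                             {"A": 0.5}, {("A", "B"): 0.25})
print(math.exp(log_score))
```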

【００６５】あるいは、文字列が限られた何種類かの単
語のうちの１つであることがわかっている場合は、それ
ぞれの単語の文字並びのみを想定して、読み取りスコア
を計算すればよい。Alternatively, if the character string is known to be one of a limited set of words, the reading score need only be calculated for the character sequences of those words.

【００６６】文字列読み取り手段３は、動的計画法（Ｄ
ynamic Ｐrogramming）に基づき、効率的に、最適解を
得るようにしてもよい。この動的計画法を用いた例につ
いて説明する。ここでは、Ｔ−１個の切り出し位置候補
が検出され、入力文字列画像をＴ個の部分パタンに分割
することができるものとする。The character string reading means 3 may obtain the optimal solution efficiently by dynamic programming. An example using dynamic programming is described here. Assume that T−1 cutout position candidates have been detected, so that the input character string image can be divided into T partial patterns.

【００６７】また１番目の部分パタンからｉ番目の部分
パタンまでを１文字目からｋ文字目までに対応させ、か
つ１番目の部分パタンからｊ番目の部分パタンまでを１
文字目から（ｋ−１）文字目までに対応させた場合の、
ｋ文字分の読み取りスコアをＡ（ｋ，ｉ，ｊ）とする。Let A(k, i, j) denote the reading score for k characters when the first through i-th partial patterns are assigned to the first through k-th characters and the first through j-th partial patterns are assigned to the first through (k−1)-th characters.

【００６８】このとき、最初の１文字目に関するスコア
Ａ（１，ｉ，ｊ）は、文字認識手段４により、Ｐ（部分
パタン１〜ｉ｜ｗ）のｗに関する最大値として計算でき
る。At this time, the score A (1, i, j) relating to the first character can be calculated by the character recognizing means 4 as the maximum value relating to w of P (partial patterns 1 to i | w).

【００６９】２文字目以降のスコアＡ（ｋ，ｉ，ｊ）
（ｋ＞１）については式（７）に示す漸化式で順次計算
できる。The scores A(k, i, j) for the second and subsequent characters (k > 1) can be calculated successively using the recurrence formula shown in Expression (7).

【００７０】ただし、Ｘ（ｊ＋１，ｉ）はｊ＋１番目の
部分パタンからｉ番目の部分パタンまでを合わせて作ら
れた部分パタンである。またＢ（ｋ，ｉ，ｊ）及びＣ
（ｋ，ｉ，ｊ）はそれぞれｊ＋１番目の部分パタンから
ｉ番目の部分パタンまでをｋ文字目として使用した場合
の、ｋ−１文字目の開始位置及びｋ文字目の字種であ
る。Here, X(j+1, i) is the partial pattern formed by combining the (j+1)-th through i-th partial patterns. B(k, i, j) and C(k, i, j) are, respectively, the start position of the (k−1)-th character and the character type of the k-th character when the (j+1)-th through i-th partial patterns are used as the k-th character.

【００７１】上記漸化式によって、ひとたび、最大スコ
アＡ（ｎ，Ｔ，ｊ_max）＝ｍａｘ_jＡ（ｎ，Ｔ，ｊ）が求
められれば、ｎ文字目の字種は、ｗｎ＝Ｃ（ｎ，Ｔ，ｊ
_max）、ｎ文字目の開始位置はｊ_maxとなる。Once the maximum score A(n, T, j_max) = max_j A(n, T, j) has been obtained from the above recurrence formula, the character type of the n-th character is wn = C(n, T, j_max) and the start position of the n-th character is j_max.

【００７２】またｎ−１文字目の開始位置は、Ｂ（ｎ，
Ｔ，ｊ_max）、ｎ−１文字目の字種はＣ（ｎ−１，
ｊ_max，Ｂ（ｎ，Ｔ，ｊ_max））というように、後方へと
順次求められる。Likewise, the start position of the (n−1)-th character is B(n, T, j_max) and the character type of the (n−1)-th character is C(n−1, j_max, B(n, T, j_max)); the remaining characters are obtained successively by tracing backward in the same way.
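
The dynamic programming search and backtrace can be sketched as follows. Expression (7) is not reproduced in this text, so the recurrence used here, A(k, i, j) = max over l and w of A(k−1, j, l) × P(X(j+1, i) | X(l+1, j), w), is an assumption chosen to be consistent with the definitions of A, B, and C above, not the patent's exact formula. The callables `first_score` and `pair_score` are hypothetical stand-ins for the character recognition means, and in this sketch B stores the index l of the last partial pattern before the (k−1)-th character, so that character begins at l+1.

```python
def read_string_dp(T, n, char_types, first_score, pair_score):
    """Best reading of n characters over T partial patterns (1-based, n <= T).

    first_score(i, w)      -> score of pattern X(1, i) as first character w
    pair_score(l, j, i, w) -> score of X(j+1, i) as character w, given that
                              the previous character pattern is X(l+1, j)
    """
    A, B, C = {}, {}, {}
    # First character: best character type for each possible end position i.
    for i in range(1, T + 1):
        w_best = max(char_types, key=lambda w: first_score(i, w))
        A[1, i, 0] = first_score(i, w_best)
        C[1, i, 0] = w_best
    # Later characters: extend the best (k-1)-character readings ending at j.
    for k in range(2, n + 1):
        for i in range(k, T + 1):
            for j in range(k - 1, i):
                best = None
                for l in range(0, j):
                    if (k - 1, j, l) not in A:
                        continue
                    for w in char_types:
                        s = A[k - 1, j, l] * pair_score(l, j, i, w)
                        if best is None or s > best[0]:
                            best = (s, l, w)
                if best is not None:
                    A[k, i, j], B[k, i, j], C[k, i, j] = best
    # Backtrace from the best final state, as described in the text.
    finals = [j for j in range(0, T) if (n, T, j) in A]
    j = max(finals, key=lambda jj: A[n, T, jj])
    score, i, labels, spans = A[n, T, j], T, [], []
    for k in range(n, 0, -1):
        labels.append(C[k, i, j])
        spans.append((j + 1, i))  # partial patterns used by character k
        if k > 1:
            i, j = j, B[k, i, j]
    return score, labels[::-1], spans[::-1]
```

Each state is filled once from already computed states, so the search cost grows polynomially in T rather than with the number of possible segmentations.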

【００７３】切り出し位置候補を少数に限定せず、等間
隔に多数設定する場合には、このようにして、最適な読
み取り結果を効率よく検索できる。When a large number of cutout position candidates are set at equal intervals without being limited to a small number, an optimum reading result can be efficiently searched in this way.

【００７４】この場合、図２のステップ１２の文字列画
像再構成、ステップ１５の文字認識、ステップ１６の文
字列認識、ステップ１７の結果比較評価、及び、ステッ
プ１３の隣接２文字評価、ステップ１４の１文字評価が
並行して処理されるため、効率よく読み取り結果を検索
できる。In this case, the character string image reconstruction of step 12, the character recognition of step 15, the character string recognition of step 16, the result comparison and evaluation of step 17, the adjacent two-character evaluation of step 13, and the one-character evaluation of step 14 in FIG. 2 are processed in parallel, so the reading result can be searched for efficiently.

【００７５】次に、隣接２文字辞書格納手段５に格納さ
れる隣接２文字辞書の構成手順について説明する。Next, the procedure for constructing an adjacent two-character dictionary stored in the adjacent two-character dictionary storage means 5 will be described.

【００７６】隣接２文字辞書は、文字列画像データから
抽出された隣接する２文字の画像データを学習データと
した事前学習により構成される。The adjacent two-character dictionary is formed by pre-learning using image data of adjacent two characters extracted from character string image data as learning data.

【００７７】まず、隣接２文字画像データを、１文字目
の字種が何であるかにかかわらず、２文字の字種で分類
して、字種数に等しい数のデータセットを作成する。First, the adjacent two-character image data are classified by the character type of the second character, regardless of the character type of the first character, producing as many data sets as there are character types.

【００７８】２文字目の字種がａである隣接２文字画像
を集めたデータセットに正解ａを、２文字目の字種がｂ
である隣接２文字画像を集めたデータセットに正解ｂ
を、という具合に、すべてのデータに、２文字目の字種
を、正解として付与する。以降は、通常の１文字のデー
タと同様にパタンの学習を行う。The data set of adjacent two-character images whose second character is of type a is given the correct answer a, the data set whose second character is of type b is given the correct answer b, and so on, so that every item of data is labeled with the character type of its second character. Thereafter, pattern learning is performed in the same manner as for ordinary one-character data.
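
The grouping step described above might look like the following sketch. The sample representation (an image placeholder plus the two character labels) and the function name `build_second_char_datasets` are assumptions for illustration only.

```python
from collections import defaultdict


def build_second_char_datasets(samples):
    """Group adjacent two-character samples by their second character.

    samples: iterable of (image, first_char, second_char) tuples.
    Returns {second_char: [image, ...]}; the key serves as the correct
    answer assigned to every image in that data set.
    """
    datasets = defaultdict(list)
    for image, _first_char, second_char in samples:
        datasets[second_char].append(image)
    return dict(datasets)


# Placeholder strings stand in for two-character image data.
samples = [("img0", "a", "b"), ("img1", "c", "b"), ("img2", "b", "a")]
datasets = build_second_char_datasets(samples)
print(sorted(datasets))
```

Each resulting data set would then be passed to the same training routine used for ordinary one-character data.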

【００７９】例えば、文字認識手段４に、隠れマルコフ
モデル（Hidden Markov Model；ＨＭＭ）を用いる場
合には、例えば文献「１９９５年、ローレンス・ラビナ
ー他著、古井監訳、音声認識の基礎（下）、ＮＴＴアド
バンステクノロジ株式会社、第１２８〜１３８頁」に示
されているように、Ｂａｕｍ−Ｗｅｌｃｈアルゴリズム
によって、字種ａ、ｂ、…それぞれについて１つのＨＭ
Ｍのパラメータを推定して辞書を構成する。For example, when a hidden Markov model (HMM) is used in the character recognition means 4, the dictionary is constructed by estimating the parameters of one HMM for each of the character types a, b, … with the Baum-Welch algorithm, as described in, for example, Lawrence Rabiner et al., "Fundamentals of Speech Recognition (Vol. 2)," translated under the supervision of Furui, NTT Advanced Technology Corporation, 1995, pp. 128-138.

【００８０】１文字辞書格納手段６に格納される１文字
辞書の構成手順について説明する。１文字辞書は、事前
知識なしで部分パタンＸが生起する確率Ｐ（Ｘ）を計算
するための辞書と、字種ｗから文字が現れるという条件
の下にＸというパタンが生起する確率Ｐ（Ｘ｜ｗ）を計
算するための辞書と、を含む。The procedure for constructing the one-character dictionary stored in the one-character dictionary storage means 6 will now be described. The one-character dictionary includes a dictionary for calculating the probability P(X) that a partial pattern X occurs without any prior knowledge, and a dictionary for calculating the probability P(X|w) that the pattern X occurs given that the character is of character type w.

【００８１】まず、Ｐ（Ｘ）を計算する辞書について
は、１文字ずつ切り出された個別文字画像を字種によら
ず、すべて集めたデータセットを作成し、それにより１
つの辞書を作成する。First, for the dictionary that calculates P(X), a data set is created by collecting all individually segmented character images regardless of character type, and a single dictionary is created from it.

【００８２】そして、前述と同様、隠れマルコフモデル
を用いる場合は、作成したデータセットを用いて、Ｂａ
ｕｍ−Ｗｅｌｃｈ（バウム・ウェルチ）アルゴリズムを
実行して、１つのＨＭＭのパラメータを推定して辞書を
構成する。As above, when a hidden Markov model is used, the Baum-Welch algorithm is run on this data set to estimate the parameters of a single HMM, and the dictionary is constructed from it.

【００８３】次に、Ｐ（Ｘ｜ｗ）を計算するための辞書
は、直前の文字パタンが存在しない１文字目の認識スコ
アを計算するための辞書であるが、これはＰ（Ｘ）を計
算する辞書の学習に使った個別文字画像のデータセット
を字種別に分類し、各々の字種についてＨＭＭのパラメ
ータを推定し、字種数分のＨＭＭを構成することにより
辞書を作成する。Next, the dictionary for calculating P(X|w) is used to compute the recognition score of the first character, for which no preceding character pattern exists. It is created by sorting the data set of individual character images used to train the P(X) dictionary by character type, estimating HMM parameters for each character type, and thus building as many HMMs as there are character types.

【００８４】隣接２文字辞書及び１文字辞書は、正解付
けされた任意文字数の文字列画像を学習データとして、
自動的に構成するようにしてもよい。この手順について
説明する。The adjacent two-character dictionary and the one-character dictionary may also be constructed automatically, using character string images of arbitrary length labeled with their correct answers as learning data. This procedure is described below.

【００８５】まず、隣接２文字辞書及び１文字辞書を特
徴づけるパラメータの初期値を適当に定める。文字切り
出し手段２を用いて学習用の文字列画像データから切り
出し位置候補を検出し、文字列読み取り手段３、文字認
識手段４及び初期辞書を用いて、最適な切り出し位置を
求める。First, initial values of parameters characterizing the adjacent two-character dictionary and the one-character dictionary are appropriately determined. A candidate cutout position is detected from the character string image data for learning using the character cutout means 2, and an optimum cutout position is obtained using the character string reading means 3, the character recognition means 4, and the initial dictionary.

【００８６】この際、学習用の文字列画像データにはす
でに正解が付与されているので、ｗ１、ｗ２、…、ｗｎ
に関しては固定で最適なスコアを探索すればよい。これ
によって、暫定的な文字切り出し位置が定まり、個別に
文字切り出され正解付けされたデータが得られる。これ
を用いて、１文字辞書及び隣接２文字辞書を前述の手順
に従って構成すればよい。そして、これ以降、新しく構
成された辞書を用いて、再び、文字切り出し手段２、文
字列読み取り手段３、文字認識手段４を起動して、個別
に切り出された文字データを得、これらを用いて辞書を
再構成する、という一連の処理の繰り返しを任意回数行
えばよい。Because the learning character string image data have already been labeled with correct answers, the optimal score need only be searched for with w1, w2, …, wn held fixed. This determines provisional character cutout positions and yields individually segmented, correctly labeled character data, which are then used to construct the one-character dictionary and the adjacent two-character dictionary by the procedure described above. Thereafter, using the newly constructed dictionaries, the character cutout means 2, the character string reading means 3, and the character recognition means 4 are run again to obtain individually segmented character data, and the dictionaries are reconstructed from these data; this cycle may be repeated any number of times.

【００８７】なお、ここでは、初期辞書のパラメータを
適当に定め、次に個別文字データを生成するという手順
について説明したが、これを逆の順序で開始してもよ
い。すなわち、最初に適当な切り出し位置で文字を切り
出し、これら個別文字データを学習データとして、初期
辞書を構成してもよい。ひとたび辞書が構成されれば、
以降の手続きは同様である。Although the procedure of appropriately defining the parameters of the initial dictionary and then generating the individual character data has been described, the procedure may be started in the reverse order. That is, first, characters may be cut out at an appropriate cut-out position, and these individual character data may be used as learning data to form an initial dictionary. Once a dictionary has been constructed,
The subsequent procedure is the same.

【００８８】次に、本発明の第二の実施例について説明
する。図１を参照すると、本実施例は、入力された文字
列画像を光学センサで取り込んで格納する画像記憶手段
１と、画像記憶手段１より受け取った文字列画像より隣
接文字間の境界の候補を切り出し位置候補として検出す
る文字切り出し手段２と、いくつかの切り出し位置候補
を選んで文字列画像を分割した際の個々の個別文字パタ
ンについて文字認識処理を呼び出して文字列全体として
の認識スコアを計算し、最適な切り出し及び認識結果を
文字列の読み取り結果として出力する文字列読み取り手
段３と、文字列読み取り手段３の要求に応じて文字パタ
ンに認識処理を施し、１つの字種と認識スコアを返す文
字認識手段４と、単一文字パタンの出現しやすさのスコ
アを計算する１文字辞書格納手段５と、隣り合う２文字
に相当するパタンを用いて２文字目の字種と認識スコア
を計算する隣接２文字辞書格納手段６とを備えている。
各々の手段はそれぞれ計算機上の主記憶装置に記憶され
たプログラムを実行させることによって実現可能であ
る。Next, a second embodiment of the present invention will be described. Referring to FIG. 1, this embodiment comprises: an image storage means 1 that captures an input character string image with an optical sensor and stores it; a character cutout means 2 that detects candidates for boundaries between adjacent characters in the character string image received from the image storage means 1 as cutout position candidates; a character string reading means 3 that, for each individual character pattern obtained when the character string image is divided at selected cutout position candidates, calls the character recognition processing, calculates a recognition score for the character string as a whole, and outputs the optimal segmentation and recognition result as the reading result of the character string; a character recognition means 4 that performs recognition on a character pattern at the request of the character string reading means 3 and returns one character type and a recognition score; a one-character dictionary storage means 5 that calculates a score for the likelihood of occurrence of a single character pattern; and an adjacent two-character dictionary storage means 6 that calculates the character type and recognition score of the second character using a pattern corresponding to two adjacent characters. Each means can be realized by executing a program stored in the main storage device of a computer.

【００８９】なお、初期辞書のパラメータを適当に定
め、次に個別文字データを生成するという手順を述べた
が、これを逆の順序で開始してもよい。すなわち、最初
に適当な切り出し位置で文字を切り出し、それら個別文
字データを学習データとして初期辞書を構成してもよ
い。ひとたび辞書が構成されれば、以降の手続きは同様
である。また、入力文字列画像を読み取りに適した特徴
パタンに変換する特徴抽出処理を文字列読み取り処理過
程に挿入してもよい点についても、本発明の第一の実施
例で述べた通りである。Although the procedure has been described in which the parameters of the initial dictionary are appropriately determined, and then the individual character data is generated, the procedure may be started in the reverse order. That is, first, characters may be cut out at an appropriate cut-out position, and the initial dictionary may be configured using the individual character data as learning data. Once the dictionary is constructed, the subsequent procedures are the same. Further, as described in the first embodiment of the present invention, a feature extraction process for converting an input character string image into a feature pattern suitable for reading may be inserted into the character string reading process.

【００９０】本発明の第２の実施例について、図２の流
れ図を参照しながら、段階を追って説明する。The second embodiment of the present invention will be described step by step with reference to the flowchart of FIG.

【００９１】図２において、ステップ１０、１１の画像
読み込み及び文字切り出しの動作は、前記第一の実施例
と同様である。すなわち、画像記憶手段１及び文字切り
出し手段２により、文字列画像の入力及び文字パタンの
抽出が行われる。In FIG. 2, the operations of reading images and extracting characters in steps 10 and 11 are the same as those in the first embodiment. That is, the input of the character string image and the extraction of the character pattern are performed by the image storage unit 1 and the character cutout unit 2.

【００９２】文字列読み取り手段３は、図２の流れ図で
は、ステップ１２の文字列画像再構成処理１２、ステッ
プ１６の文字列認識、ステップ１７の結果比較評価処理
の各処理を実行するものであり、文字切り出し手段２よ
り受け取った文字列画像または文字列画像を特徴抽出処
理により変換した特徴パタン、及び切り出し位置候補の
座標情報を受け取り、切り出し位置で切り出されたあら
ゆる文字パタン候補について、文字認識手段４を用いて
文字認識を行い、その認識結果と、認識スコアを記憶す
る。In the flowchart of FIG. 2, the character string reading means 3 executes the character string image reconstruction processing of step 12, the character string recognition of step 16, and the result comparison and evaluation processing of step 17. It receives from the character cutout means 2 the character string image (or the feature pattern obtained by converting that image through feature extraction) and the coordinate information of the cutout position candidates, performs character recognition with the character recognition means 4 on every character pattern candidate cut out at the cutout positions, and stores the recognition results and recognition scores.

【００９３】そして、文字列全体として、最も認識スコ
アが高く、かつ重複や読み飛ばしのない文字パタン候補
列の認識結果を、文字列の読み取り結果として選び出し
て出力する。最適な読み取り結果の検索手順については
後述する。Then, the recognition result of the character pattern candidate string having the highest recognition score as a whole and having no duplication or skipping is selected and output as the character string reading result. The search procedure for the optimum reading result will be described later.

【００９４】文字列読み取り手段３が文字認識手段４に
文字パタン候補を送る際、該当する文字パタン候補に加
えて、その直前の文字パタン候補も送る。When the character string reading means 3 sends a character pattern candidate to the character recognizing means 4, in addition to the corresponding character pattern candidate, it also sends the immediately preceding character pattern candidate.

【００９５】文字認識手段４は、これら隣接する２つの
文字パタン候補を考慮して、該当する文字パタン候補の
文字認識処理を行う。文字認識手段４では、文字列読み
取り手段３より、隣接する２つの文字パタン候補を受け
取り、あらゆる２文字の文字カテゴリの組合せを仮定し
て、１番目の文字パタンの発生を考慮した場合の２番目
の文字パタンの認識スコアを計算し、文字列読み取り手
段３に返す。The character recognition means 4 performs character recognition on the character pattern candidate in question, taking these two adjacent character pattern candidates into account. It receives the two adjacent character pattern candidates from the character string reading means 3, assumes every combination of character categories for the two characters, calculates the recognition score of the second character pattern given the occurrence of the first character pattern, and returns it to the character string reading means 3.

【００９６】ここで、文字認識手段４が文字列読み取り
手段３から、隣接する２つの文字パタン候補Ｘｉ−１、
Ｘｉを受け取ったとすると、文字認識手段４は、パタン
Ｘｉの属する文字カテゴリｗｉと、パタンは、字種パタ
ンＸｉ−１の属する文字カテゴリｗｉ−１のあらゆる組
合せについて、直前の文字パタンがカテゴリｗｉ−１に
属するＸｉ−１であり、かつ、着目する文字パタンがカ
テゴリｗｉに属する確率Ｐ（Ｘｉ｜Ｘｉ−１，ｗｉ−
１，ｗｉ）を計算する。Suppose the character recognition means 4 receives two adjacent character pattern candidates Xi−1 and Xi from the character string reading means 3. For every combination of the character category wi to which pattern Xi belongs and the character category wi−1 to which pattern Xi−1 belongs, the character recognition means 4 calculates the probability P(Xi|Xi−1, wi−1, wi) that the character pattern of interest is Xi belonging to category wi, given that the immediately preceding character pattern is Xi−1 belonging to category wi−1.

【００９７】実際の確率の計算では、Ｐ（Ｘｉ｜Ｘｉ−
１，ｗｉ−１，ｗｉ）を直接計算せずに、Ｐ（Ｘｉ−１，Ｘｉ｜ｗｉ−１，ｗｉ）／Ｐ（Ｘｉ−１｜ｗｉ−１） …(8) という近似値を求める。In practice, P(Xi|Xi−1, wi−1, wi) is not calculated directly; instead, the approximate value P(Xi−1, Xi|wi−1, wi) / P(Xi−1|wi−1) … (8) is obtained.

【００９８】この近似値の計算において、分子Ｐ（Ｘｉ
−１，Ｘｉ｜ｗｉ−１，ｗｉ）は、隣接する２文字パタ
ンのうちの１文字目の字種がｗｉ−１、２文字目の字種
がｗｉであるという条件で隣接する２文字のパタンがＸ
ｉ−１，Ｘｉとして生起する確率であり、これは隣接２
文字辞書格納手段５に記憶された隣接２文字パタンの辞
書から、２文字単位のパタンのマッチング結果として計
算される。この処理が図２のステップ１３の隣接２文字
評価処理に相当する。In this approximation, the numerator P(Xi−1, Xi|wi−1, wi) is the probability that the two adjacent character patterns occur as Xi−1, Xi given that the character type of the first character is wi−1 and that of the second character is wi. It is calculated from the dictionary of adjacent two-character patterns stored in the adjacent two-character dictionary storage means 5, as the result of pattern matching in units of two characters. This corresponds to the adjacent two-character evaluation processing in step 13 of FIG. 2.

【００９９】一方、分母Ｐ（Ｘｉ−１｜ｗｉ−１）は、
文字カテゴリｗｉ−１を仮定した場合に文字パタンＸｉ
−１が観測される確率であり、これは１文字辞書格納手
段６より、１文字単位のパタンのマッチングとして計算
される。これは図２のステップ１４の１文字評価処理に
相当する。The denominator P(Xi−1|wi−1), on the other hand, is the probability that the character pattern Xi−1 is observed given the character category wi−1. It is calculated by the one-character dictionary storage means 6 as pattern matching in units of one character. This corresponds to the one-character evaluation processing in step 14 of FIG. 2.

【０１００】文字認識手段４は、隣接２文字辞書格納手
段５、及び１文字辞書格納手段６より得られたそれぞれ
の数値の比として、文字認識スコア、Ｐ（Ｘｉ｜Ｘｉ−１，ｗｉ）≒Ｐ（Ｘｉ−１，Ｘｉ｜ｗ）／Ｐ（Ｘｉ−１） …(9) を得る。この処理は、図２のステップ１５の文字認識処
理に相当する。The character recognizing means 4 calculates a character recognition score, P (Xi | Xi-1, Wi) ≒, as a ratio of respective numerical values obtained from the adjacent two-character dictionary storing means 5 and the one-character dictionary storing means 6. P (Xi-1, Xi | w) / P (Xi-1) (9) is obtained. This processing corresponds to the character recognition processing in step 15 in FIG.
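
The ratio in Expression (8) can be illustrated with a short sketch that evaluates every pair of character categories. The dictionaries `joint` and `single` are hypothetical stand-ins for the values returned by the adjacent two-character dictionary and the one-character dictionary; their names and the numbers below are made up for the example.

```python
def pair_recognition_scores(joint, single, categories):
    """Approximate P(Xi | Xi-1, wi-1, wi) for every category pair.

    joint[(w_prev, w)] stands for P(Xi-1, Xi | wi-1 = w_prev, wi = w)
    single[w_prev]     stands for P(Xi-1 | wi-1 = w_prev)
    Returns the score of every pair and the best-scoring pair.
    """
    scores = {}
    for w_prev in categories:
        for w in categories:
            # Two-character match divided by one-character match.
            scores[(w_prev, w)] = joint[(w_prev, w)] / single[w_prev]
    best_pair = max(scores, key=scores.get)
    return scores, best_pair


joint = {("0", "0"): 0.02, ("0", "1"): 0.3, ("1", "0"): 0.05, ("1", "1"): 0.01}
single = {"0": 0.5, "1": 0.1}
scores, best_pair = pair_recognition_scores(joint, single, ["0", "1"])
print(best_pair)
```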

【０１０１】文字列読み取り手段３の動作についてより
詳しく説明する。文字列読み取り手段３は、文字切り出
し手段２より受け取った文字列画像または文字列画像を
特徴抽出処理により変換した特徴パタン、及び切り出し
位置候補の座標情報を用いて、文字列の文字パタン候補
へのあらゆる分割の仕方を列挙する。これは図２のステ
ップ１２の文字列画像再構成処理に相当する。The operation of the character string reading means 3 will be described in more detail. The character string reading unit 3 converts the character string into a character pattern candidate using the character string image received from the character cutting unit 2 or the characteristic pattern obtained by converting the character string image by the characteristic extraction processing and the coordinate information of the cutout position candidate. List all ways of division. This corresponds to the character string image reconstruction processing in step 12 in FIG.

【０１０２】例えば、入力画像から４つの切り出し位置
候補が得られているとすると、入力画像はパタン１，パ
タン２，パタン３，パタン４，パタン５という５つの部
分パタンに分割できる。For example, assuming that four cutout position candidates are obtained from the input image, the input image can be divided into five partial patterns of pattern 1, pattern 2, pattern 3, pattern 4, and pattern 5.

【０１０３】これに対して、文字数２を仮定すると、
（１｜２，３，４，５）、（１，２｜３，４，５）、
（１，２，３｜４，５）、（１，２，３，４｜５）とい
う４通りの分割があり得る。If the number of characters is then assumed to be 2, four divisions are possible: (1|2,3,4,5), (1,2|3,4,5), (1,2,3|4,5), and (1,2,3,4|5).

【０１０４】また文字数３を仮定すると、（１｜２｜３，４，５）、（１｜２，３｜４，５）、
（１｜２，３，４｜５）、（１，２｜３｜４，５）、
（１，２｜３，４｜５）、（１，２，３｜４｜５）という６通りの分割があり得る。ただしここでは入力画
像の分割位置を“｜”で表している。If the number of characters is assumed to be 3, six divisions are possible: (1|2|3,4,5), (1|2,3|4,5), (1|2,3,4|5), (1,2|3|4,5), (1,2|3,4|5), and (1,2,3|4|5). Here, "|" denotes a division position in the input image.

【０１０５】例えば（１，２｜３｜４，５）は、部分パ
タン１，２が１文字目に、部分パタン３が２文字目に、
部分パタン４，５が３文字目に割り当てられるように入
力画像を分割（グループ分け）することを意味する。For example, (1,2|3|4,5) means that the input image is divided (grouped) so that partial patterns 1 and 2 are assigned to the first character, partial pattern 3 to the second character, and partial patterns 4 and 5 to the third character.

【０１０６】このようにして想定される文字数につい
て、あらゆる分割の仕方を網羅して文字パタン候補の列
を生成し、それぞれについて文字列全体での読み取りス
コアを計算する。これは図２のステップの文字列認識処
理に相当する。In this way, for each assumed number of characters, sequences of character pattern candidates covering every possible division are generated, and the reading score of the whole character string is calculated for each. This corresponds to the character string recognition processing (step 16) in FIG. 2.

【０１０７】読み取りスコアは、各文字パタン候補の認
識スコアの積、すなわちＰ（Ｘ１｜ｗ１）×Ｐ（Ｘ２｜Ｘ１，ｗ１，ｗ２）×Ｐ
（Ｘ３｜Ｘ２，ｗ２，ｗ３）×…×Ｐ（Ｘｎ｜Ｘｎ−
１，ｗｎ−１，ｗｎ）と計算する。ここで、ｎは文字数である。The reading score is computed as the product of the recognition scores of the character pattern candidates, that is, P(X1|w1) × P(X2|X1,w1,w2) × P(X3|X2,w2,w3) × … × P(Xn|Xn−1,wn−1,wn), where n is the number of characters.

【０１０８】想定される文字数及び字種について、それ
ぞれ読み取りスコアを計算し、読み取りスコアが最大と
なる認識結果ｗ１，ｗ２，…，ｗｎが読み取り結果とし
て出力される。この処理は、図２のステップの結果比較
評価処理に相当する。A reading score is calculated for each assumed number of characters and combination of character types, and the recognition result w1, w2, …, wn with the maximum reading score is output as the reading result. This corresponds to the result comparison and evaluation processing (step 17) in FIG. 2.

【０１０９】最初の文字のスコアＰ（Ｘ１｜ｗ１）につ
いては、直前に文字パタン候補が存在しないので、文字
認識手段４が１文字辞書を用いて計算する。As for the score P (X1 | w1) of the first character, since there is no character pattern candidate immediately before, the character recognizing means 4 calculates using the one-character dictionary.

【０１１０】なお、読み取りスコアは、ここでは、確率
として扱っているので、各文字パタン候補の認識スコア
の積を全体のスコアとしているが、確率とみなせないス
コア（対数確率やテンプレートからの距離）を扱う場合
は、積ではなく、和を用いてもよい。Because the reading score is treated here as a probability, the overall score is the product of the recognition scores of the character pattern candidates. When the scores cannot be regarded as probabilities (log probabilities or distances from templates), a sum may be used instead of a product.

【０１１１】また、文字の並びに言語的な制約がある場
合には、適宜この制約を利用する。例えば、文字Ａの直
後に文字Ｂが続く確率Ｐ（Ｂ｜Ａ）が、統計的な分析か
ら既知であるような場合には、これを読み取りスコアに
反映させて、Ｐ（Ｘ１｜ｗ１）Ｐ（ｗ１）×Ｐ（Ｘ２｜Ｘ１，ｗ２）
Ｐ（ｗ２｜ｗ１）×Ｐ（Ｘ３｜Ｘ２，ｗ３）Ｐ（ｗ３｜
ｗ２）×…×Ｐ（Ｘｎ｜Ｘｎ−１，ｗｎ）Ｐ（ｗｎ｜ｗ
ｎ−１）というようにスコアを計算する。When there are linguistic constraints on the sequence of characters, those constraints are used as appropriate. For example, if the probability P(B|A) that character B immediately follows character A is known from statistical analysis, it is reflected in the reading score, which is then computed as P(X1|w1)P(w1) × P(X2|X1,w2)P(w2|w1) × P(X3|X2,w3)P(w3|w2) × … × P(Xn|Xn−1,wn)P(wn|wn−1).

【０１１２】あるいは、文字列が限られた何種類かの単
語のうちの１つであることがわかっている場合には、そ
れぞれの単語の文字並びのみを想定して読み取りスコア
を計算すればよい。Alternatively, if the character string is known to be one of a limited set of words, the reading score need only be calculated for the character sequences of those words.

【０１１３】文字列読み取り手段３の動作については、
前記第一の実施例と同様、動的計画法に基づいて効率的
に最適解を得るようにしてもよい。ここでは、Ｔ−１個
の切り出し位置候補が検出され、入力文字列画像をＴ個
の部分パタンに分割することができるとする。As in the first embodiment, the character string reading means 3 may obtain the optimal solution efficiently by dynamic programming. Assume that T−1 cutout position candidates have been detected, so that the input character string image can be divided into T partial patterns.

【０１１４】また１番目の部分パタンからｉ番目の部分
パタンまでを１文字目からｋ文字目までに対応させ、か
つ１番目の部分パタンからｊ番目の部分パタンまでを１
文字目から（ｋ−１）文字目までに対応させ、かつｋ文
字目の文字カテゴリをｗとした場合の、ｋ文字分の読み
取りスコアをＡ（ｋ，ｉ，ｊ，ｗ）とする。Let A(k, i, j, w) denote the reading score for k characters when the first through i-th partial patterns are assigned to the first through k-th characters, the first through j-th partial patterns are assigned to the first through (k−1)-th characters, and the character category of the k-th character is w.

【０１１５】このとき、最初の１文字目に関するスコア
Ａ（１，ｉ，ｊ，ｗ）は、文字認識手段４により、Ｐ
（部分パタン１〜ｉ｜ｗ）のｗに関する最大値として計
算できる。The score A(1, i, j, w) for the first character can then be calculated by the character recognition means 4 as the maximum over w of P(partial patterns 1 to i|w).

【０１１６】また２文字目までに関するスコアＡ（２，
ｉ，ｊ，ｗ）は、文字認識手段４と隣接２文字辞書格納
手段６により、Ｐ（部分パタン１〜ｊ，部分パタンｊ＋
１〜ｉ｜ｗ′，ｗ）のｗ′に関する最大値として計算で
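きる。The score A(2, i, j, w) for the first two characters can be calculated by the character recognition means 4 and the adjacent two-character dictionary storage means 6 as the maximum over w′ of P(partial patterns 1 to j, partial patterns j+1 to i|w′, w).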

【０１１７】３文字目以降のスコアＡ（ｋ，ｉ，ｊ）
（ｋ＞２）については、次式（１０）に示す漸化式で順
次計算できる。The scores A(k, i, j) for the third and subsequent characters (k > 2) can be calculated successively using the recurrence formula shown in Expression (10) below.

【０１１８】 [0118]

【０１１９】ただし、Ｘ（ｊ＋１，ｉ）は、（ｊ＋１）
番目の部分パタンからｉ番目の部分パタンまでを合わせ
て作られた部分パタンである。Here, X(j+1, i) is the partial pattern formed by combining the (j+1)-th through i-th partial patterns.

【０１２０】また、式（１０）において、ｍａｘは、ｌ
やｗ′など指定した変数に関する最大値を表し、ａｒｇ
_maxはｍａｘの操作を行って最大値が得られたときの変
数の値を表す。In Expression (10), max denotes the maximum value over the indicated variable, such as l or w′, and argmax denotes the value of that variable at which the maximum is attained.

【０１２１】また、Ｂ（ｋ，ｉ，ｊ，ｗ）及びＣ（ｋ，
ｉ，ｊ，ｗ）は、それぞれ（ｊ＋１）番目の部分パタン
からｉ番目の部分パタンまでをｋ文字目として使用し、
かつ、ｋ文字目に相当するパタンの属する文字カテゴリ
をｗとした場合の、ｋ−２文字目の終端位置及び（ｋ−
１）文字目の字種である。B(k, i, j, w) and C(k, i, j, w) are, respectively, the end position of the (k−2)-th character and the character type of the (k−1)-th character when the (j+1)-th through i-th partial patterns are used as the k-th character and the character category of the pattern corresponding to the k-th character is w.

【０１２２】上記漸化式によって、ひとたび、最大スコ
アＡ（ｎ，Ｔ，ｊ_max，ｗ_max）＝ｍａｘ_jｍａｘ_wＡ
（ｎ，Ｔ，ｊ，ｗ）が求められれば、ｎ文字目の字種
は、ｗｎ＝Ｃ（ｎ，Ｔ，ｊ_max，ｗ_max）、ｎ文字目の開始位置はｊ_maxとなる。Once the maximum score A(n, T, j_max, w_max) = max_j max_w A(n, T, j, w) has been obtained from the above recurrence formula, the character type of the n-th character is wn = C(n, T, j_max, w_max) and the start position of the n-th character is j_max.

【０１２３】また（ｎ−１）文字目の開始位置は、Ｂ（ｎ，Ｔ，ｊ_max，ｗ_max）＋１、（ｎ−１）文字目の字種は、Ｃ（ｎ−１，ｊ_max，Ｂ（ｎ，Ｔ，ｊ_max，ｗ_max））というように、後方へと順次求められる。Likewise, the start position of the (n−1)-th character is B(n, T, j_max, w_max) + 1 and the character type of the (n−1)-th character is C(n−1, j_max, B(n, T, j_max, w_max)); the remaining characters are obtained successively by tracing backward in the same way.

【０１２４】切り出し位置候補を少数に限定せず、等間
隔に多数設定する場合には、この形態で最適な読み取り
結果を効率よく検索できる。この場合、図２のステップ
１２の文字列画像再構成処理、ステップ１５の文字認識
処理、ステップ１６の文字列認識処理、ステップ１７の
結果比較評価処理、及びステップ１３の隣接２文字評価
処理、ステップ１４の１文字評価処理が並行して処理さ
れるため、効率よく読み取り結果を検索できる。When many cutout position candidates are set at equal intervals rather than being limited to a small number, the optimal reading result can be searched for efficiently in this way. In this case, the character string image reconstruction processing of step 12, the character recognition processing of step 15, the character string recognition processing of step 16, the result comparison and evaluation processing of step 17, the adjacent two-character evaluation processing of step 13, and the one-character evaluation processing of step 14 in FIG. 2 are performed in parallel, so the reading result can be searched for efficiently.

【０１２５】隣接２文字辞書格納手段５に格納される隣
接２文字辞書の構成手順について説明する。The procedure for constructing an adjacent two-character dictionary stored in the adjacent two-character dictionary storage means 5 will be described.

【０１２６】隣接２文字辞書は、文字列画像データから
抽出された隣接する２文字の画像データを学習データと
した事前学習により構成される。The adjacent two-character dictionary is formed by pre-learning using image data of two adjacent characters extracted from character string image data as learning data.

【０１２７】まず、隣接２文字画像データを、それらを
構成する各文字の字種によりいくつかのクラスに分類す
る。例えば数字を扱う場合には、００、０１、０２、
…、９９という１００通りの組合せがあり得るので、そ
れぞれの組合せで画像データを分類する。０１と１０は
異なるクラスに分ける。First, the adjacent two-character image data are classified into classes according to the character types of the characters that make them up. For example, when handling digits, there are 100 possible combinations, 00, 01, 02, …, 99, and the image data are classified by combination. 01 and 10 are placed in different classes.
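
For illustration, the ordered pair classes for a given set of character types can be generated as below. The function name `pair_classes` is hypothetical; the example uses the ten digits mentioned in the text.

```python
from itertools import product


def pair_classes(char_types):
    """Return every ordered pair of character types as a class label.

    Order matters, so "01" and "10" are distinct classes.
    """
    return ["".join(pair) for pair in product(char_types, repeat=2)]


classes = pair_classes("0123456789")
print(len(classes), classes[:3], classes[-1])
```

The number of classes is the square of the number of character types, so for large character sets the amount of training data needed per class becomes a practical consideration.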

【０１２８】この結果、字種数の二乗に等しい数のクラ
ス（データのセット）ができる。以降は、通常の１文字
のデータと同様にパタンの学習を行う。例えば、文字認
識手段４に隠れマルコフモデル（ＨＭＭ）を用いる場合
には、文献（「１９９５年、ローレンス・ラビナー他
著、古井監訳、音声認識の基礎（下）、ＮＴＴアドバン
ステクノロジ株式会社、１２８〜１３８頁」）に記載さ
れているように、Ｂａｕｍ−Ｗｅｌｃｈアルゴリズムに
よって、それぞれのクラス（数字の場合なら００、０
１、０２、…、９９）について１つのＨＭＭのパラメー
タを推定して辞書を構成する。As a result, the number of classes (data sets) equals the square of the number of character types. Thereafter, pattern learning is performed in the same manner as for ordinary one-character data. For example, when a hidden Markov model (HMM) is used in the character recognition means 4, the dictionary is constructed by estimating the parameters of one HMM for each class (00, 01, 02, …, 99 in the case of digits) with the Baum-Welch algorithm, as described in Lawrence Rabiner et al., "Fundamentals of Speech Recognition (Vol. 2)," translated under the supervision of Furui, NTT Advanced Technology Corporation, 1995, pp. 128-138.

【０１２９】１文字辞書格納手段６に格納される１文字
辞書の構成手順については、前記第一の実施例で説明し
た、確率Ｐ（Ｘ｜ｗ）を計算するための辞書の構成手順
と同様である。The procedure for constructing the one-character dictionary stored in the one-character dictionary storage means 6 is the same as the procedure described in the first embodiment for constructing the dictionary that calculates the probability P(X|w).

[0130] The adjacent two-character dictionary and the one-character dictionary can also be constructed automatically, using as training data character string images of arbitrary length that have been labeled with their correct readings. This can be done by the same procedure as the method described in the first embodiment.

[0131] Next, a third embodiment of the present invention will be described. FIG. 5 is a block diagram showing the configuration of the third embodiment of the present invention. Referring to FIG. 5, the third embodiment includes a recording medium 7 on which a character recognition program is recorded. The recording medium 7 may be a CD-ROM, a magnetic disk, a semiconductor memory, or any other recording medium, and this includes the case where the program is distributed over a network.

[0132] The character recognition program is read from the recording medium 7 into the data processing device 8 and executed there. Under the control of the character recognition program, the data processing device 8 uses the character cutout means to detect a number of cutout position candidates in the character string image input to the image storage means 1, and generates character pattern candidates from those cutout position candidates. It then performs recognition on each character pattern candidate with the character recognition means, which uses the one-character dictionary and the adjacent two-character dictionary stored in the one-character dictionary storage means 5 and the adjacent two-character dictionary storage means 6, respectively, and finds and outputs the reading that gives the highest score for the character string as a whole.
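The search for the reading with the highest overall score can be sketched as a dynamic program over runs of partial patterns. This is an illustrative sketch only. The recurrence formula referred to in the claims is not reproduced in this text, so the version below is an assumption: it drops the character-count index k, adds log scores, and limits a character to `max_width` partial patterns. The names `read_string`, `Scorer`, and `toy_score` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Span = Tuple[int, int]  # half-open range [start, end) over partial-pattern indices
# Hypothetical scorer: (previous span or None, current span) -> (label, log score)
Scorer = Callable[[Optional[Span], Span], Tuple[str, float]]


def read_string(
    num_parts: int, score: Scorer, max_width: int = 3
) -> Tuple[str, float, List[Span]]:
    """Return (text, total log score, character spans) of the best reading."""
    # best[span] = (best total score of a reading ending with span, label, previous span)
    best: Dict[Span, Tuple[float, str, Optional[Span]]] = {}
    for end in range(1, num_parts + 1):
        for start in range(max(0, end - max_width), end):
            cur = (start, end)
            if start == 0:  # first character: there is no preceding pattern
                label, s = score(None, cur)
                best[cur] = (s, label, None)
                continue
            candidates = []
            for prev_start in range(max(0, start - max_width), start):
                prev = (prev_start, start)
                label, s = score(prev, cur)  # the score may depend on the preceding pattern
                candidates.append((best[prev][0] + s, label, prev))
            best[cur] = max(candidates, key=lambda c: c[0])
    # Pick the best final character, then follow the back-pointers.
    last: Optional[Span] = max(
        (sp for sp in best if sp[1] == num_parts), key=lambda sp: best[sp][0]
    )
    total = best[last][0]
    labels: List[str] = []
    spans: List[Span] = []
    while last is not None:
        _, label, prev = best[last]
        labels.append(label)
        spans.append(last)
        last = prev
    return "".join(reversed(labels)), total, spans[::-1]


# Toy data: four partial patterns; "(" and ")" together form one character.
pieces = ["|", "(", ")", "|"]
TOY = {"|": ("1", -0.1), "()": ("0", -0.1)}


def toy_score(prev: Optional[Span], cur: Span) -> Tuple[str, float]:
    # A real scorer would consult the one- and two-character dictionaries
    # and use `prev`; this toy scorer ignores it.
    return TOY.get("".join(pieces[cur[0]:cur[1]]), ("?", -5.0))


text, total, spans = read_string(len(pieces), toy_score)
```

In this toy run the two middle fragments are merged into a single character, which is the behaviour the cutout position candidates are meant to allow.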

[0133] Under the control of the character recognition program, the data processing device 8 executes the same processing as the character cutout means 2, the character string reading means 3, and the character recognition means 4, and outputs the character string reading result.

[0134]

[Effects of the Invention] As described above, according to the present invention, when a character string is read, the character candidates extracted from it are processed as pairs of adjacent characters, and the recognition result and recognition score for the second character are calculated while taking the pattern shape of the first character into account. This allows stable character recognition even when the shape of a character is deformed by cursive connection to, or contact with, the character written immediately before it, and therefore makes accurate reading of character strings possible.
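The score for the second character of a pair, as worded in claims 2 and 7, is the ratio P(Xi-1, Xi | w) / P(Xi-1). A minimal sketch in the log domain follows. The inputs are assumed to come from matching against the adjacent two-character dictionary and the one-character dictionary; the function name and the numbers are hypothetical.

```python
from typing import Dict, Tuple


def recognize_second_char(
    pair_loglik: Dict[str, float], prev_loglik: float
) -> Tuple[str, float]:
    """Pick the character type w and score for the second character of a pair.

    pair_loglik[w] -- log P(X_{i-1}, X_i | w), assumed to come from the
                      adjacent two-character dictionary
    prev_loglik    -- log P(X_{i-1}), assumed to come from the
                      one-character dictionary
    """
    # P(X_{i-1}) does not depend on w, so the w that maximizes the ratio
    # is the w that maximizes the two-character likelihood.
    best_w = max(pair_loglik, key=pair_loglik.get)
    # Log of the ratio P(X_{i-1}, X_i | w) / P(X_{i-1}).
    return best_w, pair_loglik[best_w] - prev_loglik


result = recognize_second_char({"0": -4.0, "1": -2.5}, prev_loglik=-1.0)
```

Working with log-likelihoods turns the ratio into a subtraction, which avoids underflow when the probabilities are small.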

[0135] Furthermore, according to the present invention, the number of templates in the dictionary is at most about twice that of the prior art, so character strings can be read at sufficiently high speed.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of one embodiment of the present invention.

FIG. 2 is a flowchart showing the processing flow of one embodiment of the present invention.

FIG. 3 is a diagram for explaining one embodiment of the present invention, showing an example of an input character string image.

FIG. 4 is a diagram for explaining one embodiment of the present invention, showing an example of the result of extracting, from an input character string image, feature quantities useful for discrimination.

FIG. 5 is a block diagram showing the configuration of the second embodiment of the present invention.

FIG. 6 is a diagram showing an example of a character image, for explaining a case in which the way a character is misrecognized depends on its character type.

FIG. 7 is a diagram showing an example of a character image, for explaining a case in which the way a character is misrecognized depends on its character type.

[Explanation of symbols]

1 image storage means
2 character cutout means
3 character string reading means
4 character recognition means
5 one-character dictionary storage means
6 adjacent two-character dictionary storage means
7 storage medium
8 data processing device

Continuation of the front page

(56) References
JP-A-5-6464 (JP, A)
JP-A-8-96085 (JP, A)
IEICE Technical Report, PRMU98-139, Vol. 98, No. 489, pp. 25-30 (1998), "Online frameless handwritten character string recognition based on a probabilistic model"
IEICE Technical Report, PRMU98-138, Vol. 98, No. 489, pp. 17-24 (1998), "A frameless online character string recognition method integrating segmentation, recognition, and linguistic confidence"
IPSJ SIG Technical Report, Vol. 93, No. 79, pp. 37-44 (1993), "A contextual post-processing method for online kanji recognition using bigrams"
Transactions of the Information Processing Society of Japan, Vol. 39, No. 3, pp. 625-635 (1998), "A full-text search method for Japanese text containing recognition errors"

(58) Fields investigated (Int. Cl.7, DB name)
G06K 9/62 - 9/72
JICST file (JOIS)

Claims

(57) [Claims]

1. A character recognition device comprising: image storage means for inputting and storing a character string image; character cutout means for detecting, in the character string image obtained from the image storage means, cutout position candidates for obtaining partial patterns each corresponding to one character; character string reading means for generating, on the basis of the cutout position candidates detected by the character cutout means, individual character pattern candidates that are partial patterns each corresponding to one character, and for outputting an optimal character string reading result by performing character recognition; character recognition means for recognizing, in response to a request from the character string reading means, the individual character pattern candidates generated by the character string reading means, and for outputting a character recognition result and a character recognition score indicating the likelihood of the character recognition result; one-character dictionary storage means for storing a dictionary that the character recognition means uses for identifying and scoring one-character pattern candidates; and two-character dictionary storage means for storing an adjacent two-character dictionary for identifying individual characters using the character pattern candidates of two adjacent characters, wherein, when the character recognition means receives character pattern candidates from the character string reading means and performs character recognition, it receives the character pattern candidate to be recognized and the character pattern candidate immediately before it, and, using the probability that the character pattern candidate to be recognized and the immediately preceding character pattern candidate occur on the assumption that the character pattern candidate to be recognized belongs to a certain character type, together with the probability that the immediately preceding character pattern occurs, calculates a score representing the likelihood that the character pattern candidate to be recognized belongs to that character type.

2. The character recognition device according to claim 1, wherein the character recognition means receives the character pattern candidate to be recognized and the character pattern candidate immediately before it, and uses, as the score representing the likelihood that the character pattern candidate to be recognized belongs to a certain character type, the ratio of the probability that the character pattern candidate to be recognized and the immediately preceding character pattern candidate occur, on the assumption that the character pattern candidate to be recognized belongs to that character type, to the probability that the immediately preceding character pattern candidate occurs.

3. A recording medium on which is recorded a program for causing a computer to execute: (a) a character cutout process of detecting, in a character string image obtained from image storage means that inputs and stores the character string image, cutout position candidates for obtaining partial patterns each corresponding to one character; (b) a character string reading process of generating, on the basis of the cutout position candidates detected in the character cutout process, individual character pattern candidates that are partial patterns each corresponding to one character, and outputting an optimal character string reading result by performing character recognition; and (c) a character recognition process of recognizing, in response to a request from the character string reading process, the individual character pattern candidates generated by the character string reading process, and outputting a character recognition result and a character recognition score representing the likelihood of the character recognition result, wherein the character recognition process (c) refers to the dictionary stored in one-character dictionary storage means to identify and score one-character pattern candidates, and refers to the adjacent two-character dictionary stored in two-character dictionary storage means to identify individual characters using the character pattern candidates of two adjacent characters, and wherein, when the character recognition process receives character pattern candidates from the character string reading means and performs character recognition, it receives the character pattern candidate to be recognized and the character pattern candidate immediately before it, and, using the probability that the character pattern candidate to be recognized and the immediately preceding character pattern candidate occur on the assumption that the character pattern candidate to be recognized belongs to a certain character type, together with the probability that the immediately preceding character pattern occurs, calculates a score representing the likelihood that the character pattern candidate to be recognized belongs to that character type.

4. A character recognition method comprising the steps of: (a) detecting, in a character string image input from image input means, cutout position candidates that are candidates for character boundaries; (b) performing character recognition on the character pattern candidates cut out at the cutout position candidates and storing the character recognition results and recognition scores, wherein the character recognition is performed on a given character pattern candidate in consideration of two character pattern candidates, namely that character pattern candidate and the character pattern candidate immediately before it, and the likelihood of the recognition result is stored as the recognition score; and (c) outputting the recognition result of the character pattern candidate sequence having the highest recognition score for the character string as a whole, wherein, in steps (a) to (c), from a given character pattern candidate and the character pattern candidate immediately before it, a recognition score representing the likelihood that the character pattern candidate to be recognized belongs to a certain character type is derived using the probability that the character pattern candidate and the immediately preceding character pattern candidate occur under the condition that the character pattern candidate is of that character type, and the probability that the immediately preceding character pattern occurs.

5. The character recognition method according to claim 4, wherein the probability that the character pattern candidate and the immediately preceding character pattern candidate occur, under the condition that the character pattern candidate is of a certain character type, is calculated by matching, in units of two-character patterns, against a dictionary of adjacent two-character patterns stored in advance, and the probability that the immediately preceding character pattern occurs is calculated by matching, in units of one-character patterns, against a dictionary of one-character patterns registered in advance.

6. The character recognition method according to claim 4, wherein, in step (b), when two adjacent character pattern candidates Xi-1 and Xi have been received, the character recognition result wi for Xi is determined as the w that maximizes the conditional probability P(Xi | Xi-1, w) that the character pattern Xi occurs under the condition that the character type is w and the preceding character pattern is Xi-1, and the character recognition score is calculated as P(Xi | Xi-1, wi).

7. The character recognition method according to claim 6, wherein the value of the conditional probability P(Xi | Xi-1, w), under the condition that the character type of the second character of an adjacent two-character pattern is w and the adjacent two-character pattern is Xi-1, Xi, is calculated as P(Xi-1, Xi | w) / P(Xi-1), using the probability P(Xi-1, Xi | w) that the adjacent two-character pattern formed by combining the character Xi and the immediately preceding character Xi-1 occurs under the condition that the character type to which the character Xi belongs is w, and the probability P(Xi-1) that the immediately preceding character pattern is observed without any prior knowledge.

8. The character recognition method according to claim 4, wherein, in step (b), when the i-th character pattern Xi in the input character string is matched against a dictionary pattern w to obtain the recognition score for character recognition, the score is calculated as the conditional probability P(Xi | Xi-1, wi-1, wi), which adds the conditions that the character pattern Xi-1 occurred as the (i-1)-th character and that the (i-1)-th character pattern Xi-1 belongs to the character category represented by the dictionary pattern wi-1.

9. The character recognition method according to claim 8, wherein the value of the conditional probability P(Xi | Xi-1, wi-1, wi), under the condition that the character type of the first character of the adjacent two-character pattern is wi-1, the character type of the second character is wi, and the adjacent two-character pattern is Xi-1, Xi, is calculated either as the ratio P(Xi-1, Xi | wi-1, wi) / P(Xi-1 | wi-1) of the probability P(Xi-1, Xi | wi-1, wi), taken in units of two adjacent characters formed by combining a character with the character immediately before it, to the score P(Xi-1 | wi-1), or as P(Xi-1, Xi | wi-1, wi) / P(Xi-1), using the probability P(Xi-1) that the character pattern Xi-1 is observed without assuming the character category wi-1.

10. The character recognition method according to claim 4, wherein the recognition score representing the likelihood of the character recognition is obtained by dynamic programming, and wherein, when (T-1) cutout position candidates are detected and the input character string image is divided into T partial patterns, with A(k, i, j) denoting the reading score for k characters when the first to i-th partial patterns correspond to the first to k-th characters and the first to j-th partial patterns correspond to the first to (k-1)-th characters, the score A(1, i, j) for the first character is obtained as the maximum over w of the probability P that the first to i-th partial patterns occur under the condition that the i-th partial pattern is of a character type w, and the maximum of the scores is obtained.

11. The character recognition method according to claim 8, wherein the scores A(k, i, j) (k > 1) for the second and subsequent characters are calculated sequentially by the following recurrence formula (where X(j+1, i) is the partial pattern formed by combining the (j+1)-th to i-th partial patterns; B(k, i, j) and C(k, i, j) are, respectively, the start position of the (k-1)-th character and the character type of the k-th character when the (j+1)-th to i-th partial patterns are taken as the k-th character; max is a function giving the maximum value with respect to the specified variable; and argmax gives the value of the variable at which max attains that maximum), and wherein the maximum score A(n, T, j_max) = max_j A(n, T, j) is obtained, and the character type of the n-th character wn = C(n, T, j_max), the start position of the n-th character j_max, the start position of the (n-1)-th character B(n, T, j_max), the character type of the (n-1)-th character C(n-1, j_max, B(n, T, j_max)), and so on are obtained sequentially by tracing backward.

12. The character recognition device according to claim 1, wherein the recognition score representing the likelihood of the character recognition is obtained by dynamic programming, and wherein, when (T-1) cutout position candidates are detected and the input character string image is divided into T partial patterns, with A(k, i, j) denoting the reading score for k characters when the first to i-th partial patterns correspond to the first to k-th characters and the first to j-th partial patterns correspond to the first to (k-1)-th characters, the score A(1, i, j) for the first character is obtained by the character recognition means as the maximum over w of the probability P that the first to i-th partial patterns occur under the condition that the i-th partial pattern is of a certain character type w, the scores A(k, i, j) (k > 1) are calculated sequentially by the following recurrence formula (where X(j+1, i) is the partial pattern formed by combining the (j+1)-th to i-th partial patterns; B(k, i, j) and C(k, i, j) are, respectively, the start position of the (k-1)-th character and the character type of the k-th character when the (j+1)-th to i-th partial patterns are taken as the k-th character; max is a function giving the maximum value with respect to the specified variable; and argmax gives the value of the variable at which max attains that maximum), and, by the above recurrence formula, the maximum score A(n, T, j_max) = max_j A(n, T, j) is obtained, and the character type of the n-th character wn = C(n, T, j_max), the start position of the n-th character j_max, the start position of the (n-1)-th character B(n, T, j_max), the character type of the (n-1)-th character C(n-1, j_max, B(n, T, j_max)), and so on are obtained sequentially by tracing backward.

13. The character recognition device according to claim 1, wherein the recognition score representing the likelihood of the character recognition is obtained by dynamic programming, and wherein, when (T-1) cutout position candidates are detected and the input character string image is divided into T partial patterns, with A(k, i, j, w) denoting the reading score for k characters when the first to i-th partial patterns correspond to the first to k-th characters, the first to j-th partial patterns correspond to the first to (k-1)-th characters, and the character category of the k-th character is w, the score A(1, i, j, w) for the first character is obtained as the maximum over w of the probability P that the first to i-th partial patterns occur, the score A(2, i, j, w) for the second character is calculated by the character recognition means and the two-character dictionary storage means as the maximum over w' of P(partial patterns 1 to j, partial patterns j+1 to i | w', w), the scores A(k, i, j) (k > 2) for the third and subsequent characters are calculated sequentially by the following recurrence formula (where X(j+1, i) is the partial pattern formed by combining the (j+1)-th to i-th partial patterns; B(k, i, j, w) and C(k, i, j, w) are, respectively, the end position of the k-th character and the character type of the (k-1)-th character when the (j+1)-th to i-th partial patterns are taken as the k-th character and the character category to which the pattern corresponding to the k-th character belongs is w; max is a function giving the maximum value with respect to the variable specified as l or w'; and argmax gives the value of the variable at which max attains that maximum), and, by the above recurrence formula, the maximum score A(n, T, j_max, w_max) = max_j max_w A(n, T, j, w) is obtained, and the character type of the n-th character wn = C(n, T, j_max, w_max), the start position of the n-th character j_max, the start position of the (n-1)-th character B(n, T, j_max, w_max) + 1, the character type of the (n-1)-th character C(n-1, j_max, B(n, T, j_max, w_max)), and so on are obtained sequentially by tracing backward.

14. A recording medium recording a program for causing a computer to execute: (a) a process of detecting, in a character string image input from image input means, cutout position candidates that are candidates for character boundaries; (b) a process of performing character recognition on the character pattern candidates cut out at the cutout position candidates and storing the character recognition results and recognition scores, wherein, from a given character pattern candidate and the character pattern candidate immediately before it, a recognition score representing the likelihood that the character pattern candidate to be recognized belongs to a character type is derived using the probability that the character pattern candidate and the immediately preceding character pattern candidate occur and the probability that the immediately preceding character pattern occurs; and (c) a process of outputting the recognition result of the character pattern candidate sequence having the highest recognition score for the character string as a whole.