JP3452774B2 - Character recognition method

Info

Publication number: JP3452774B2
Application number: JP28328097A
Authority: JP
Inventors: 保直伊崎
Original assignee: Fujitsu Ltd; Fujitsu Frontech Ltd
Current assignee: Fujitsu Ltd; Fujitsu Frontech Ltd
Priority date: 1997-10-16
Filing date: 1997-10-16
Publication date: 2003-09-29
Anticipated expiration: 2017-10-16
Also published as: CN1215201A; KR19990036515A; CN1140878C; JPH11120293A; KR100412317B1

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a technique for recognizing low-quality character strings of the kind written on ordinary slips: strings written with irregular character spacing or in irregular ways, in which adjacent characters may touch and parts of a character may separate.

[0002]

[Prior Art and Problems to Be Solved by the Invention] OCR (optical character reader) devices, which read image data and convert it into character code data, have come to be used in a wide range of business operations as their field of application has expanded. Each operation uses its own forms, and both the character strings written on those forms and the people who write them have become increasingly varied.

[0003] Conventional OCR forms use printed character frames, one frame per character, and especially large frames where kanji are to be written. The frames make it easier for the OCR device to detect the written characters one at a time, and they prompt the writer to keep each character from touching its neighbors.

[0004] With such a form, even two or three address or name entries add up to dozens of characters, so a large form is required, which raises cost. The writer is also burdened with having to write each character inside its own frame.

[0005] As the field of application of OCR expands, there is a growing need for character recognition and correction technology that lets kanji character strings be written on a small form, such as an ordinary slip, without being constrained by character frames, that recognizes them with practical accuracy, and that allows unreadable characters to be corrected efficiently.

[0006] In a typical conventional character recognition method, the written characters are detected and cut out one at a time with reference to a file called a definition body, which stores the coordinate positions on the form of the character frames in which the character string to be recognized is written. Recognition processing is then performed on each cut-out character, and a candidate character group is output as the recognition result.

[0007] The cut-out characters are recognized, for example, as follows. First, characters written by a large number of unspecified writers according to a predetermined format are collected, feature quantities that depend on the recognition method are extracted from these characters, and standard patterns are created by a statistical technique (for example, clustering). A standard pattern dictionary is then built from the standard patterns of each target character type.

[0008] A standard pattern is created, for example, as the average pattern obtained by averaging the collected character patterns. More specifically, the average pattern is represented by the average feature quantity obtained by averaging the feature quantities of the collected characters.
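
The averaging described in [0008] can be sketched as follows. This is a minimal illustration, assuming each collected character has already been reduced to a fixed-length numeric feature vector; the function name `average_pattern` and the sample values are hypothetical and do not come from the patent.

```python
from statistics import fmean


def average_pattern(feature_vectors):
    """Return the element-wise mean of equal-length feature vectors."""
    if not feature_vectors:
        raise ValueError("need at least one feature vector")
    length = len(feature_vectors[0])
    if any(len(vector) != length for vector in feature_vectors):
        raise ValueError("all feature vectors must have the same length")
    # zip(*...) groups the i-th feature of every sample into one column.
    return [fmean(column) for column in zip(*feature_vectors)]


# Two hypothetical samples of the same character type.
samples = [[0.0, 2.0, 4.0], [2.0, 4.0, 6.0]]
template = average_pattern(samples)
```

In a multiple-template setting, the same averaging would presumably be applied once per cluster of samples rather than once per character type.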

[0009] In handwritten character recognition, character shapes vary greatly from writer to writer, so multiple standard patterns are created for each character type. A single standard pattern is usually called a template, and a dictionary built from multiple standard patterns per character type is called a multiple-template dictionary.

[0010] Character recognition is performed using the standard pattern dictionary or multiple-template dictionary described above. Specifically, a feature quantity is extracted from one character cut out of the input form, and a similarity or distance (Euclidean distance, Mahalanobis distance, etc.) is calculated between this feature quantity and that of each template (standard pattern) in the standard pattern dictionary (or multiple-template dictionary). The character-type categories to which the templates belong, up to a predetermined rank (for example, 8th) in descending order of similarity or ascending order of distance, are then output as the candidate character group.
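
As a rough sketch of the ranking step in [0010], the following code scores an input feature vector against every template by Euclidean distance and returns the closest categories. The names are hypothetical, and collapsing several templates of the same category into one candidate is an assumption made here for readability; the patent does not state how duplicates are handled.

```python
import math


def top_candidates(feature, templates, max_rank=8):
    """Rank character categories by Euclidean distance to their templates.

    templates is a list of (category, feature_vector) pairs. A category
    may own several templates; only its closest template is kept.
    """
    best_distance = {}
    for category, vector in templates:
        distance = math.dist(feature, vector)
        if category not in best_distance or distance < best_distance[category]:
            best_distance[category] = distance
    # Smaller distance means a better match, so sort ascending.
    ranked = sorted(best_distance.items(), key=lambda item: item[1])
    return [category for category, _ in ranked[:max_rank]]


# Hypothetical two-dimensional templates for three categories.
demo_templates = [
    ("A", [0.0, 0.0]),
    ("A", [1.0, 1.0]),
    ("B", [3.0, 4.0]),
    ("C", [0.5, 0.0]),
]
demo_ranking = top_candidates([0.0, 0.0], demo_templates)
```

A Mahalanobis distance could be substituted for `math.dist` if per-category covariance estimates were available.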

[0011] When the characters to be recognized represent an address or a name, knowledge processing using address words or name words is generally applied to the candidate character groups. More specifically, the candidate character groups for the individual writing positions are first combined across all writing positions to produce a group of candidate character strings.

[0012] Next, each candidate character string in this group is checked to determine whether each word string in the address dictionary or name dictionary used for knowledge processing appears in it.

[0013] A score is then assigned to the candidate character string according to the result of this comparison and, for example, the ranks of the candidate characters that make up the string. After this has been done for all candidate character strings, the one with the highest score is output as the knowledge processing result.
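
The conventional knowledge processing of [0011] to [0013] can be sketched like this. The scoring rule (ten points per matched dictionary character, minus the sum of the candidate ranks) is purely an assumption for illustration, as are the function name and sample data; the text only says that the score depends on the comparison result and the candidate ranks.

```python
from itertools import product


def best_candidate_string(candidate_groups, dictionary_words):
    """Combine per-position candidates and score each resulting string.

    candidate_groups[i] lists the candidate characters for position i,
    best candidate first. Returns (best_string, best_score).
    """
    best_string, best_score = None, float("-inf")
    rank_ranges = (range(len(group)) for group in candidate_groups)
    for ranks in product(*rank_ranges):
        string = "".join(g[r] for g, r in zip(candidate_groups, ranks))
        # Assumed rule: reward dictionary words found in the string,
        # penalize the use of low-ranked candidate characters.
        matched = sum(len(w) for w in dictionary_words if w in string)
        score = 10 * matched - sum(ranks)
        if score > best_score:
            best_string, best_score = string, score
    return best_string, best_score


groups = [["太", "大"], ["阪", "坂"], ["府", "付"]]
words = ["大阪府", "東京都"]
result = best_candidate_string(groups, words)
```

Because every combination is enumerated, the cost grows exponentially with the number of writing positions, and the approach presupposes that the number of positions is known.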

[0014] A known prior-art example of such knowledge processing is disclosed in Japanese Unexamined Patent Publication No. Sho 61-107486. When kanji character strings such as addresses and names written at free pitch are to be recognized, as on ordinary slips, adjacent characters very often touch each other; conversely, some kanji are written with their parts separated, such as the left-hand radical (hen) and the right-hand component (tsukuri).

[0015] Consequently, with the conventional method, in which written characters are detected, cut out, and recognized one at a time, it is difficult to determine which range constitutes a single character, and recognition accuracy sufficient for practical use is hard to achieve.

[0016] Furthermore, if the individual characters cannot be recognized correctly, it may not even be possible to determine how many characters have been written. Conventional knowledge processing assumes that the number of characters in each word is known, so there is a limit to how far it can improve recognition accuracy.

[0017] In addition, particularly when recognizing address place names, if a higher-level word (for example, Tokyo-to or Osaka-fu) cannot be recognized by knowledge processing, the lower-level words have generally not been knowledge-processed at that stage either. To correct the address, all characters must therefore be corrected in order, starting from the first.

[0018] A first prior-art technique for recognizing free-pitch character strings such as those described above is disclosed in Japanese Examined Patent Publication No. Hei 8-23875, "Word Reading Method". In this technique, the candidate character string obtained as the recognition result is collated against a word dictionary by DP matching or the like, a word with many matching characters is selected, the non-matching portion is cut out again, and recognition is performed again on the re-cut character string.
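
To make "DP matching" in [0018] concrete, here is one common dynamic-programming formulation, the longest common subsequence (LCS), used to pick the dictionary word that shares the most characters with a candidate string. The cited publication's exact matching method is not described here, so LCS is a stand-in chosen for illustration, and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                # Extend the match found for the two shorter prefixes.
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def best_matching_word(candidate, words):
    """Return the dictionary word sharing the most characters in order."""
    return max(words, key=lambda word: lcs_length(candidate, word))


# A candidate string with one misrecognized character.
chosen = best_matching_word("東京郡港区", ["東京都", "大阪府"])
```

The characters of the chosen word that did not take part in the match would mark the portion to be cut out and recognized again.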

[0019] A second prior-art technique for recognizing free-pitch character strings is disclosed in Japanese Unexamined Patent Publication No. Sho 63-136291, "Word Reading Method". In this technique, recognition is performed using a standard pattern dictionary whose standard patterns are partial patterns representing the hen and tsukuri parts of characters, character strings are generated from the hen and tsukuri of the characters in the candidate character string, and these are matched against a word dictionary.

[0020] A third prior-art technique for recognizing free-pitch character strings is disclosed in Japanese Unexamined Patent Publication No. Hei 8-171614, "Character String Reading Device". In this technique, the possibility that an expected character string exists is verified in cases such as when a character is skipped because the correct character is not among the candidates, or when multiple reading candidates arise because of character candidates that compete with the correct character. Several ways of implementing the verification means are disclosed.

[0021] However, when these first to third prior-art techniques are applied to the kind of character strings people write every day, that is, low-quality strings in which adjacent characters frequently touch, character width varies greatly from character to character, and strokes are often blotted or faint, they have the following problems.

[0022] First, in the first prior-art technique, it is undefined which characters of the candidate character string are given priority; all characters in the string are treated equally. Depending on where the characters were initially cut out, only entirely inappropriate words may be selected as candidates.

[0023] Next, the second prior-art technique has problems in handling regions where adjacent characters touch. The third prior-art technique describes several ways of implementing its verification means, but all of them use combinations of character candidates, so their verification performance depends heavily on the initial character segmentation result.

[0024] An object of the present invention is to recognize low-quality character strings accurately by focusing on specific characters.

[0025]

[Means for Solving the Problems] The present invention presupposes a character recognition/correction method for recognizing the characters that make up an input character string written in an entry field having a predetermined category, or a computer-readable recording medium having equivalent functions.

[0026] In the present invention, a first matching process is first performed between the input character string and a first recognition dictionary (specific character standard pattern dictionary 107) to extract specific characters or specific character strings from the input character string. More specifically, the first recognition dictionary stores standard patterns corresponding to the specific characters or specific character strings, and the first matching process is performed between the pattern of the input character string and each standard pattern in the first recognition dictionary, whereby the specific characters or specific character strings are extracted from the input character string. The specific characters or specific character strings are, for example, those that appear frequently in the predetermined category or those that can be recognized with high accuracy.

[0027] Next, a group of candidate words is extracted from a category-specific word dictionary (specific character dictionary 110, knowledge dictionary 111). These are words that belong to the predetermined category (for example, address character strings) and that may be located in the regions of the input character string before or after each specific character or specific character string extracted from it.

[0028] Then, for each candidate word in the extracted group, a second matching process using a second recognition dictionary (standard pattern dictionary 113) is performed, based on information about that candidate word, on the region of the input character string where the candidate word is located, and the characters that make up the input character string are thereby recognized. More specifically, the second recognition dictionary stores standard patterns corresponding to characters or character strings related to the candidate words in the group. For each candidate word, the second matching process is performed, based on information about that candidate word, on the region of the input character string where the word is located, between the pattern of the candidate word and each standard pattern in the second recognition dictionary. The information used about each candidate word is, for example, its number of characters. The second recognition dictionary may be configured to include the first recognition dictionary.
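
One plausible way to use a candidate word's character count, as mentioned in [0028], is to divide the region between two specific characters into that many equal-width slices before matching. The equal-width split is an assumption made only for this sketch; the patent text here does not say how the count is applied, and the function name and coordinates are hypothetical.

```python
def split_region(x_start, x_end, char_count):
    """Split the horizontal span [x_start, x_end) into equal-width slices.

    Returns a list of (left, right) boundaries, one per expected
    character of the candidate word.
    """
    if char_count <= 0:
        raise ValueError("char_count must be positive")
    if x_end <= x_start:
        raise ValueError("x_end must be greater than x_start")
    width = (x_end - x_start) / char_count
    return [
        (x_start + i * width, x_start + (i + 1) * width)
        for i in range(char_count)
    ]


# A 60-unit-wide region hypothesized to hold a three-character word.
slices = split_region(100, 160, 3)
```

Each candidate word with a different length yields a different segmentation of the same region, and the second matching process would then judge which segmentation fits best.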

[0029] With this configuration, specific characters or specific character strings in the input character string are recognized first, with priority; candidate words before and after them are hypothesized from that recognition result; and the characters of the input character string are then re-recognized using information about those candidate words. This makes it possible to recognize with high accuracy the characters of an input character string written with irregular spacing and in irregular ways, as on the various ordinary forms (slips) in everyday use.

[0030] In the above configuration, the recognition result for the characters of the input character string may be displayed alongside the input character string; the user may designate a desired region on the displayed input character string and correct the character or character string corresponding to that region; and, based on information about the correct character or correct character string supplied by the correction, the candidate word group extraction and the second matching process may be executed again so that the characters of the input character string are recognized again. In this case, multiple candidate recognition results for the desired region may be displayed in response to the designation of that region on the displayed input character string.

[0031] With this character correction technique, correcting only a specific character or character string is enough for the other unrecognizable portions to be corrected automatically. In the above configuration, words that are notational variants of each candidate word may also be output as new candidate words belonging to the candidate word group.

[0032] This handling of notational variation makes it possible to cope flexibly with various ways of writing.
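
The idea of adding notational variants to the candidate word group, described in [0031] and [0032], might look like the sketch below. The substitution table mapping a character to its alternative spellings is hypothetical, as is the function name; the text at this point does not specify how variants are derived.

```python
from itertools import product


def notation_variants(word, alternatives):
    """List the word followed by its notational variants.

    alternatives maps a character to the characters that may be written
    in its place. The original spelling always comes first.
    """
    options = [[char] + alternatives.get(char, []) for char in word]
    return ["".join(chars) for chars in product(*options)]


# Hypothetical table: three common ways of writing the same syllable.
variant_table = {"ヶ": ["ケ", "が"]}
variants = notation_variants("霞ヶ関", variant_table)
```

Every variant would then be treated as an ordinary candidate word in the second matching process.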

[0033]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings.

Configuration and Outline of Operation of the Embodiment of the Present Invention

FIG. 1 is a configuration diagram of an embodiment of the present invention.

[0034] First, the character cutout unit 103 uses the entry field definition 104, which defines information on the positions of the entry fields on the form, to cut out characters one at a time, in order from the beginning, from the input character string 101 written on the form and read from the image memory 102.

[0035] Next, the feature extraction unit 105 extracts a feature quantity from the cut-out character. The matching unit 106 then performs matching between the feature quantity of the cut-out character and the feature quantity of each specific character standard pattern in the specific character standard pattern dictionary 107. It outputs to the candidate character string buffer 108, as the candidate specific characters for the cut-out character, the character-type categories of the specific characters to which the specific character standard patterns belong, up to a predetermined rank in descending order of matching degree.

[0036] This series of specific character recognition processes by the character cutout unit 103, the feature extraction unit 105, and the matching unit 106 is executed for each character that the character cutout unit 103 cuts out in order from the beginning of the input character string 101. As a result, the candidate character string buffer 108 holds the candidate specific characters for each character, in an order corresponding to the order of the characters cut out from the input character string 101.

[0037] The candidate word search unit 109 extracts, from the candidate specific character string held in the candidate character string buffer 108, every pair of two adjacent specific characters (a specific character pair), and checks whether each specific character pair is registered in the specific character dictionary 110.

[0038] When a specific character pair is registered in the specific character dictionary 110, the candidate word search unit 109 searches the records in the knowledge dictionary 111 linked to that registration record for the words that can lie between the two specific characters of the pair, and stores the retrieved words in the candidate word buffer 112 as a candidate word group.

[0039] The candidate word search unit 109 extracts the corresponding candidate word group for each specific character pair extracted from the candidate character string buffer 108 and stores it in the candidate word buffer 112.

[0040] In the end, the candidate word buffer 112 holds one or more candidate word groups for each specific character pair, and thus a collection of candidate word groups for multiple specific character pairs.
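
The search in [0037] to [0040] can be sketched as a lookup keyed by adjacent specific characters. For brevity this sketch merges the specific character dictionary 110 and the linked knowledge dictionary 111 into a single Python dict and assumes one candidate specific character per position; both simplifications, the sample characters, and the names are assumptions rather than the patent's data layout.

```python
def candidate_words_for_pairs(specific_chars, pair_dictionary):
    """Collect candidate words for each adjacent pair of specific characters.

    specific_chars is the ordered list of recognized specific characters.
    pair_dictionary maps (left, right) to the words that may appear
    between those two characters.
    """
    candidate_word_buffer = {}
    for left, right in zip(specific_chars, specific_chars[1:]):
        words = pair_dictionary.get((left, right))
        if words:  # keep only pairs that are registered in the dictionary
            candidate_word_buffer[(left, right)] = list(words)
    return candidate_word_buffer


# Hypothetical address-style data.
pair_dictionary = {
    ("都", "区"): ["港", "新宿"],
    ("県", "市"): ["横浜"],
}
buffer_112 = candidate_words_for_pairs(["都", "区", "町"], pair_dictionary)
```

Pairs that are not registered, such as the second pair in the sample, simply contribute no candidate words.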

[0041] The candidate words in the candidate word group obtained in the candidate word buffer 112 for one specific character pair are read out in turn, and the following series of processes is executed for each of them.

[0042] First, using the information on the candidate word output from the candidate word buffer 112, the character cutout unit 103 again cuts out, from the input character string 101 read from the image memory 102, the character string in the region between the two specific characters of the specific character pair to which the candidate word belongs.

[0043] The feature extraction unit 105 extracts a feature quantity from the re-cut character string. The matching unit 106 then matches the feature quantity of the re-cut character string against the feature quantity of each standard pattern in the standard pattern dictionary 113, which is the second dictionary, and stores in the candidate character string buffer 108, as the candidate recognition result group for the candidate word, the character string categories to which the standard patterns belong, up to a predetermined rank in descending order of matching degree.

[0044] This series of re-recognition processes by the character cutout unit 103, the feature extraction unit 105, and the matching unit 106 is executed for each candidate word in the candidate word group obtained in the candidate word buffer 112 for the specific character pair, and a candidate recognition result group up to the predetermined rank is obtained in the candidate character string buffer 108 for each candidate word.

[0045] The matching unit 106 then selects, from among all the candidate recognition result groups up to the predetermined rank obtained in the candidate character string buffer 108 for the candidate words belonging to the specific character pair, the most plausible and reliable recognition result, more specifically the candidate recognition result with the highest matching degree, and outputs it to the knowledge processing unit 114 as the recognition result for the portion between the two specific characters of the pair.
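
The selection in [0045] amounts to taking a maximum over every candidate recognition result of every candidate word. A minimal sketch, assuming each result is a (recognized_text, matching_degree) pair with higher degrees meaning better matches; the data structure, names, and numbers are hypothetical.

```python
def select_best_result(results_by_candidate_word):
    """Pick the recognition result with the highest matching degree.

    results_by_candidate_word maps each candidate word to its list of
    (recognized_text, matching_degree) pairs.
    Returns (candidate_word, recognized_text, matching_degree) or None.
    """
    best = None
    for candidate_word, results in results_by_candidate_word.items():
        for recognized_text, degree in results:
            if best is None or degree > best[2]:
                best = (candidate_word, recognized_text, degree)
    return best


# Hypothetical matching degrees for two candidate words of one pair.
demo_results = {
    "港": [("港", 0.91), ("巻", 0.40)],
    "新宿": [("新宿", 0.55)],
}
winner = select_best_result(demo_results)
```

Returning `None` when no result exists leaves room for the reject handling that applies when recognition conditions are never satisfied.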

[0046] The series of re-recognition processes by the character cutout unit 103, the feature extraction unit 105, and the matching unit 106 for each candidate word in the candidate word group of one specific character pair is executed for every specific character pair registered in the candidate word buffer 112. As a result, the recognition results for the character regions between the two specific characters of each pair are output to the knowledge processing unit 114.

[0047] The knowledge processing unit 114 applies knowledge processing, using the entry field definition 104 and the knowledge dictionary 111, to the recognition results for the character regions between the two specific characters of each specific character pair, determines the final recognition result for the whole character region made up of those character regions, and outputs it to the recognition result buffer 115.

[0048] In the series of recognition processes described above, reject (unrecognizable) information is attached to any character or character string portion that never satisfied the recognition conditions. The recognition result held in the recognition result buffer 115 is then displayed on the display unit 117 via the interface unit 116. Using the input unit 118, which consists of a mouse, a keyboard, and the like, the user can correct unrecognizable characters or character strings in the displayed recognition result.

[0049] The user simply designates a specific correct character within the unrecognizable characters or character string from the input unit 118, and information about that correct character is output from the interface unit 116 to the correct character buffer 119 and the region coordinate buffer 120.

[0050] The candidate word search unit 109 treats the information about the correct character held in the correct character buffer 119 as specific character information and executes the candidate word search described above using the specific character dictionary 110 and the knowledge dictionary 111, so that the unrecognizable characters can be re-recognized correctly. The character cutout unit 103 can also cut out characters correctly by obtaining from the region coordinate buffer 120 the cutout position of the correct character designated by the user.

[0051] As described above, in this embodiment, for an input character string 101 such as an address, name, or product name written in an entry field of a form, attention is paid to characters that appear frequently in that field, or to particular characters or character strings. Using the word information held in the knowledge dictionary 111 and, for hierarchically structured character strings such as addresses, the connection information for each character region, candidate words for the character regions between the specific characters can then be selected. Furthermore, using the information on those candidate words, the character regions between the specific characters are extracted from the input character string 101 and re-recognized, so that character strings written in a way that causes frequent touching and separation between adjacent characters can be recognized with high accuracy.

Detailed Operation of the Embodiment of the Present Invention

FIGS. 2 to 4 are operation flowcharts showing the overall control realized by the embodiment of the present invention having the configuration shown in FIG. 1.

<Specific character recognition processing>

First, the character cutout unit 103 uses the entry field definition 104, which defines information on the positions of the entry fields on the form, to cut out characters one at a time, in order from the beginning, from the input character string 101 written on the form, which is read from the image memory 102 as binarized image data (step 201 in FIG. 2).

【００５２】FIG. 5 shows an example of the data format of the entry field definition 104 used by the character cutout unit 103. For example, when fields 1 and 2 are arranged on a form and the character strings written in these two fields are to be recognized, the entry field definition 104 is determined as follows.

【００５３】First, the top of the form is taken as the coordinate origin, with the x-axis defined in the horizontal direction and the y-axis in the vertical direction. For each of fields 1 and 2, the coordinates of the upper-left corner of the field (the field origin coordinates) and the field size data, consisting of the field width in the x-axis direction and the field height in the y-axis direction, are defined as shown in FIG. 5(a). The unit of length is millimeters or inches.

【００５４】Next, for each of fields 1 and 2, a field type is defined that indicates what kind of character string is written in the field. This information is held, in the table format shown in FIG. 5(b), as the entry field definition 104 in a storage device (not specifically shown).
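As an illustration of the table of FIG. 5(b), the entry field definition 104 might be held as a small record per field. The following is only a sketch: the class name `EntryField`, its attribute names, and all coordinate values are hypothetical and do not come from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryField:
    """One row of a hypothetical entry field definition (lengths in mm)."""
    origin_x: float   # x of the field's upper-left corner
    origin_y: float   # y of the field's upper-left corner
    width: float      # field width along the x-axis
    height: float     # field height along the y-axis
    field_type: str   # kind of string written in the field

    def bounds(self):
        """Return (left, top, right, bottom) of the field."""
        return (self.origin_x, self.origin_y,
                self.origin_x + self.width, self.origin_y + self.height)


# Illustrative values only; a real form would supply its own layout.
ENTRY_FIELD_DEFINITION = {
    1: EntryField(20.0, 30.0, 120.0, 10.0, "address"),
    2: EntryField(20.0, 50.0, 80.0, 10.0, "name"),
}
```

A cutout routine could then look up a field by number and crop the image to `bounds()` after converting the lengths to pixels.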

【００５５】Using the entry field definition 104 described above, the character cutout unit 103 determines the character region of each field in the image data read from the image memory 102, and then executes, on the image data within that character region, the character cutout control shown in the operation flowchart of FIG. 6.

【００５６】Here, as shown in FIG. 8(a), let the field origin coordinates of the target region extracted from the entry field definition 104 be $(x_0, y_0)$, the field width in the x-axis direction be $dx$, and the field height in the y-axis direction be $dy$.

【００５７】First, the character cutout unit 103 accumulates the number of black pixels along each scan line in the x-axis direction, thereby computing a horizontal histogram that shows the frequency of black pixels in the x-axis direction at each y-coordinate position, as shown in FIG. 8(b) (step 601 in FIG. 6).

【００５８】Next, as shown in FIG. 8(b), the character cutout unit 103 scans the horizontal histogram from the top and from the bottom, computes the positions α and β at which the frequency first exceeds the frequency value C, and takes the value α − β computed from them as the character string height h in the target region (step 602).
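Steps 601 and 602 can be sketched as follows. This is a minimal illustration, assuming the binarized image is a list of rows in which 1 is a black pixel; the function names are hypothetical. Because the sign of α − β depends on the orientation of the y-axis, the sketch returns the absolute difference, which is an assumption.

```python
def horizontal_histogram(image):
    """Count black pixels (value 1) on each row, i.e. per y coordinate."""
    return [sum(row) for row in image]


def string_height(hist, c):
    """Scan from the top and the bottom for the first bins exceeding c.

    Returns |alpha - beta|, or None when no bin exceeds c.
    """
    alpha = next((y for y, v in enumerate(hist) if v > c), None)
    if alpha is None:
        return None
    beta = next(y for y in range(len(hist) - 1, -1, -1) if hist[y] > c)
    return abs(alpha - beta)


image = [
    [0, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
hist = horizontal_histogram(image)   # [0, 2, 3, 2, 0]
h = string_height(hist, c=1)         # rows 1 and 3 bound the string, so h = 2
```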

【００５９】Next, the character cutout unit 103 accumulates the number of black pixels along each scan line in the y-axis direction, thereby computing a vertical histogram that shows the frequency of black pixels in the y-axis direction at each x-coordinate position, as shown in FIG. 8(c) (step 603 in FIG. 6).

【００６０】Then, as shown in FIG. 8(c), the character cutout unit 103 scans the vertical histogram from the left and computes, as cutout candidate positions, the points $x_1, x_3, x_5, \ldots$ ($x_{2n-1}$: $n = 1, 2, \ldots$) at which the frequency value changes from at or below the threshold d to at or above the threshold d. It likewise computes, as cutout candidate positions, the points $x_2, x_4, x_6, \ldots$ ($x_{2m}$: $m = 1, 2, \ldots$) at which the frequency value changes from at or above the threshold d to at or below the threshold d (step 604).
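Steps 603 and 604 can be sketched in the same style. The function names are hypothetical, and the handling of a frequency exactly equal to d (treated here as "not above") is an assumption, since the text allows equality on both sides of the transition.

```python
def vertical_histogram(image):
    """Count black pixels (value 1) in each column, i.e. per x coordinate."""
    return [sum(col) for col in zip(*image)]


def candidate_positions(hist, d):
    """Return (rising, falling) cutout candidates.

    rising  : x positions where the frequency goes from <= d to > d
    falling : x positions where the frequency goes from > d to <= d
    """
    rising, falling = [], []
    above = False
    for x, value in enumerate(hist):
        if not above and value > d:
            rising.append(x)
            above = True
        elif above and value <= d:
            falling.append(x)
            above = False
    if above:                      # a run that reaches the right edge
        falling.append(len(hist))
    return rising, falling


rising, falling = candidate_positions([0, 3, 4, 0, 0, 2, 5, 1, 0], d=1)
```

In this sketch `rising[i]` and `falling[i]` bound the same run of black columns, so they play the roles of the odd-indexed and even-indexed positions in the text.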

【００６１】Next, the character cutout unit 103 computes the regions $[x_{2m}, x_{2n-1}]$ that satisfy the following conditional expression and takes them as the character cutout result (step 605).

【００６２】
[Equation 1]
$$h - t_1 \le x_{2m} - x_{2n-1} \le h + t_2 \quad (m = 1, 2, 3, \ldots;\ n = 1, 2, 3, \ldots)$$
Here, h is the character string height computed in step 602 described above, and $t_1$ and $t_2$ are parameters determined from learning samples of the input character string 101. In the example of FIG. 8(c), the following three regions are computed as the character cutout result:
$[x_1, x_2]$, $[x_3, x_4]$, $[x_5, x_8]$
As a result of the processing of step 605, the character cutout unit 103 determines whether any region satisfying the following conditional expression remains (step 606).
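One possible reading of step 605 is a left-to-right pairing of a rising position with the first falling position whose distance satisfies Equation 1. The greedy strategy and the function name below are assumptions made for illustration; the text itself only states the width condition.

```python
def select_regions(rising, falling, h, t1, t2):
    """Pair start and end candidates whose width lies in [h - t1, h + t2].

    rising[i] and falling[i] are assumed to bound the same run of black
    columns. A start may be paired with a later end, which merges
    fragments of one separated character.
    """
    regions = []
    n = 0
    while n < len(rising):
        start = rising[n]
        matched = None
        for m in range(n, len(falling)):
            width = falling[m] - start
            if h - t1 <= width <= h + t2:
                matched = m
                break
            if width > h + t2:     # wider ends can only be worse
                break
        if matched is None:
            n += 1                 # leave this start for later handling
        else:
            regions.append((start, falling[matched]))
            n = matched + 1
    return regions


# Four runs; the last two are fragments of one character of width 10.
regions = select_regions([0, 12, 24, 30], [10, 22, 27, 34], h=10, t1=2, t2=2)
```

With these illustrative numbers the third and fourth runs are merged, which mirrors the region $[x_5, x_8]$ in the example of FIG. 8(c).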

【００６３】
[Equation 2]
$$x_{2l} - x_{2l-1} > h + t_2 \quad (l = 1, 2, 3, \ldots)$$
If the determination in step 606 is NO, the character cutout unit 103 ends its control processing.

【００６４】If the determination in step 606 is YES, the character cutout unit 103 computes, for the region $[x_{2l-1}, x_{2l}]$, a value k that satisfies the following conditional expression, subject to the frequency value of the vertical histogram computed in step 603 being at or below a predetermined value that is larger than the threshold d.

【００６５】
[Equation 3]
$$h \approx (x_{2l} - x_{2l-1}) / k$$
As a result, the positions that divide the region $[x_{2l-1}, x_{2l}]$ into k parts are computed as character cutout positions (the above is step 607). In the example of FIG. 8(d), $l = 1$ and $k = 2$, and the position $x'$ that divides the region $[x_1, x_2]$ in two is computed as the character cutout position.
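The equal division of step 607 can be sketched as below. This illustration covers only Equation 3: k is taken as the integer nearest to the region width divided by h, and the additional condition on the vertical histogram value at each cut is omitted. The function name is hypothetical.

```python
def split_positions(start, end, h):
    """Divide [start, end] into k equal parts, with k chosen so that
    (end - start) / k is close to the string height h (Equation 3).

    Returns the k - 1 interior cut positions.
    """
    width = end - start
    k = max(1, round(width / h))
    return [start + round(i * width / k) for i in range(1, k)]


cuts = split_positions(10, 50, h=19)   # width 40, k = 2, one cut at 30
```

A fuller version might move each cut to a nearby column where the vertical histogram is low, in line with the condition stated in paragraph 【００６４】.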

【００６６】The character cutout unit 103 then ends its control processing. The operation flowchart of FIG. 6 described above corresponds to the character cutout processing that the character cutout unit 103 executes for a field in which the number of characters is not given in advance.

【００６７】In contrast, there are also cases in which the character cutout unit 103 executes character cutout processing on a field for which the region to be cut out and the number of characters in that region are given in advance, as when re-recognition processing is executed based on the candidate word information read from the candidate word buffer 112.

【００６８】In this case, the character cutout unit 103 executes the processing of step 701 in FIG. 7 instead of the processing group of steps 605 to 607 in FIG. 6. That is, when the x coordinate of the left end of the region to be cut out is given as $x_s$, the x coordinate of the right end as $x_t$, and the number of characters in the region as n, the character cutout unit 103 computes, as character cutout positions, positions at which the frequency value of the vertical histogram computed in step 603 of FIG. 6 is at or below a predetermined value and whose adjacent spacing is close to the value $X_n$ that satisfies the following conditional expression.

【００６９】
[Equation 4]
$$(x_t - x_s) / n = X_n$$
Specifically, letting two adjacent character cutout positions be $x_i$ and $x_{i+1}$ ($i = 1, 2, \ldots$; $x_s \le x_i$, $x_{i+1} \le x_t$), the character cutout unit 103 computes the character cutout positions $x_i$ ($x_i \ne x_s, x_t$) that satisfy the following conditional expression.

【００７０】
[Equation 5]
$$X_n - t_5 \le x_{i+1} - x_i \le X_n + t_6$$
Here, $t_5$ and $t_6$ are parameters determined from learning samples of the input character string 101.
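Step 701 can be sketched as follows for a region with a known character count. The left-to-right greedy choice of the low-histogram column nearest to the ideal pitch is an assumption, as are the function and parameter names; the text only states the conditions of Equations 4 and 5 and the histogram limit.

```python
def cut_known_count(hist, x_s, x_t, n, limit, t5, t6):
    """Find n - 1 cut positions between x_s and x_t.

    Each cut must lie on a column whose histogram value is <= limit, and
    adjacent spacings must lie in [X_n - t5, X_n + t6] (Equation 5),
    where X_n = (x_t - x_s) / n (Equation 4). Returns None on failure.
    """
    pitch = (x_t - x_s) / n
    cuts = []
    prev = x_s
    for _ in range(n - 1):
        lo, hi = prev + pitch - t5, prev + pitch + t6
        target = prev + pitch
        candidates = [x for x in range(x_s + 1, x_t)
                      if lo <= x <= hi and hist[x] <= limit]
        if not candidates:
            return None
        prev = min(candidates, key=lambda x: abs(x - target))
        cuts.append(prev)
    if not (pitch - t5 <= x_t - prev <= pitch + t6):
        return None                # last character would be out of range
    return cuts


hist = [5] * 30
hist[9] = 0
hist[21] = 0
cuts = cut_known_count(hist, x_s=0, x_t=30, n=3, limit=1, t5=3, t6=3)
```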

【００７１】After the character cutout processing by the character cutout unit 103 described above, the feature extraction unit 105 extracts, from the single character that was cut out, a feature vector, which is the feature quantity used for recognition (step 202 in FIG. 2).

【００７２】Specifically, the feature extraction unit 105 extracts the feature vector by, for example, the following series of processes. First, the feature extraction unit 105 extracts the character contour pixels from the image data of the cut-out character.

【００７３】Next, the feature extraction unit 105 divides the cut-out region into a plurality of divided regions. For each divided region, the feature extraction unit 105 then extracts a direction component for each contour pixel in that region (for example, four direction components: vertical, horizontal, left diagonal, and right diagonal), totals the direction components of all contour pixels in the region to obtain a total for each direction component, and computes a partial feature vector whose elements are the totals for the respective direction components.

【００７４】Finally, the feature extraction unit 105 extracts the feature vector by combining the elements of the partial feature vectors of all the divided regions. After the feature extraction unit 105 has extracted the feature vector of the cut-out character in this way, the matching unit 106 executes matching processing between the feature vector of the cut-out character and the feature vector of each specific character standard pattern in the specific character standard pattern dictionary 107 (step 203 in FIG. 2). It then outputs to the candidate character string buffer 108, as the candidate specific character group for the cut-out character, the character type categories of the specific characters to which the specific character standard patterns belong, in descending order of matching degree down to a predetermined rank (step 204 in FIG. 2).
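The feature extraction of paragraphs 【００７２】 to 【００７４】 can be sketched as below. The text does not say how the direction component of a contour pixel is determined, so this illustration assumes a simple rule: a contour pixel contributes to a direction when a neighboring contour pixel lies along that direction. The grid size, the rule, the mapping of "left" and "right" diagonals, and all names are hypothetical.

```python
# Neighbor offsets (row, col) for the four direction components.
DIRECTIONS = {
    "horizontal": [(0, -1), (0, 1)],
    "vertical": [(-1, 0), (1, 0)],
    "left_diagonal": [(-1, -1), (1, 1)],
    "right_diagonal": [(-1, 1), (1, -1)],
}


def is_black(img, r, c):
    return 0 <= r < len(img) and 0 <= c < len(img[0]) and img[r][c] == 1


def contour_pixels(img):
    """Black pixels with at least one white (or outside) 4-neighbor."""
    four = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return {(r, c)
            for r, row in enumerate(img)
            for c, v in enumerate(row)
            if v == 1 and any(not is_black(img, r + dr, c + dc)
                              for dr, dc in four)}


def feature_vector(img, grid=2):
    """Concatenate per-cell direction totals over a grid x grid division."""
    contour = contour_pixels(img)
    rows, cols = len(img), len(img[0])
    vec = []
    for gr in range(grid):
        for gc in range(grid):
            counts = [0] * len(DIRECTIONS)
            for r, c in contour:
                if r * grid // rows != gr or c * grid // cols != gc:
                    continue
                for k, offsets in enumerate(DIRECTIONS.values()):
                    if any((r + dr, c + dc) in contour for dr, dc in offsets):
                        counts[k] += 1
            vec.extend(counts)      # partial feature vector of this cell
    return vec


# A single horizontal stroke in a 4 x 4 image.
stroke = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
vec = feature_vector(stroke)
```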

【００７５】More specifically, the matching unit 106 computes, for example, a distance (Euclidean distance, Mahalanobis distance, or the like) between the feature vector of the cut-out character and the feature vector of each specific character standard pattern in the specific character standard pattern dictionary 107. The matching unit 106 then outputs to the candidate character string buffer 108, as the candidate specific character group for the cut-out specific character, the character type categories of the specific characters to which the specific character standard patterns belong, in ascending order of distance down to a predetermined rank (rank n).
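The matching step can be sketched with the Euclidean distance. The template list, the function name, and the return format are hypothetical; the sketch also applies the reject test against the threshold $T_1$ that paragraph 【００７６】 describes.

```python
import math


def match(feature, templates, n, t1):
    """Rank templates by Euclidean distance to the feature vector.

    templates : list of (category, vector) pairs
    Returns (candidates, rejected): the categories of the n nearest
    templates without duplicates, and a flag set when the best distance
    exceeds t1.
    """
    scored = sorted((math.dist(feature, vec), cat) for cat, vec in templates)
    candidates = []
    for _, cat in scored:
        if cat not in candidates:
            candidates.append(cat)
        if len(candidates) == n:
            break
    rejected = not scored or scored[0][0] > t1
    return candidates, rejected


# Illustrative two-dimensional templates; real vectors are much longer.
templates = [("都", [0, 0]), ("区", [3, 4]), ("県", [6, 8])]
candidates, rejected = match([0, 1], templates, n=2, t1=2.0)
```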

【００７６】If the distance of the first-ranked specific character standard pattern is larger than a predetermined threshold $T_1$, reject (unrecognizable) information is attached to the cut-out character. The specific character standard pattern dictionary 107 mentioned above will now be described with a concrete example.

【００７７】Consider now the case where the input character string 101 is an address string. In this embodiment, it suffices at first to recognize only specific characters of one or two characters, such as 「都」, 「道」, 「府」, 「県」, 「市」, 「区」, 「郡」, 「町」, 「村」, 「字」, and 「大字」, which mark the boundaries of the hierarchical structure of an address string and appear with high frequency. In address strings, specific characters such as 「東」 (east), 「西」 (west), 「南」 (south), and 「北」 (north) also appear with high frequency.

【００７８】For this reason, in order to improve the recognition accuracy for these specific characters, this embodiment uses the specific character standard pattern dictionary 107, which consists only of the standard patterns of these specific characters and therefore has a small dictionary size.

【００７９】Preparing such a specific character standard pattern dictionary 107 separately from the standard pattern dictionary 113 makes it possible to reduce recognition processing time and to improve recognition accuracy.

【００８０】Note that the specific character standard pattern dictionary 107 may instead be configured as the same dictionary as the standard pattern dictionary 113, with many templates (standard patterns) stored for each specific character in order to improve the recognition accuracy for the specific characters.

【００８１】On the other hand, when the input character string 101 is a name string, there are no delimiting characters as in an address string, but there are character types that appear with high frequency. For example, among the characters used in surnames, the 500 most frequent character types cover about 82%, so the specific character standard pattern dictionary 107 can be configured to be built from the top N characters.

【００８２】Alternatively, the configuration may be such that only N character types are selectively used from the standard pattern dictionary 113 for recognizing specific characters. The specific character dictionary 110 is then configured to correspond to the specific character types described above.

【００８３】Also, instead of selecting character types by frequency of appearance, the configuration may be such that characters that are easy to recognize are determined statistically from a large amount of actual data, and the character types so determined are selected.

【００８４】The series of specific character recognition processes described above, performed by the character cutout unit 103, the feature extraction unit 105, and the matching unit 106, is executed for each character that the character cutout unit 103 cuts out in order from the beginning of the input character string 101 (repetition of steps 205 → 202 in FIG. 2). As a result, the candidate character string buffer 108 holds a candidate specific character group for each character, in an order corresponding to the order of the characters cut out from the input character string 101.

<Search for Candidate Words in the Region Between Specific Characters and Re-recognition in That Region>

The candidate word search unit 109 extracts, from the set of candidate specific character groups held in the candidate character string buffer 108, every set of two adjacent specific characters (a specific character set), and searches whether each specific character set is registered in the specific character dictionary 110. When a specific character set is registered in the specific character dictionary 110, the candidate word search unit 109 retrieves, from the records in the knowledge dictionary 111 linked to that registered record, the group of words sandwiched between the two specific characters forming the set, and holds the retrieved word group in the candidate word buffer 112 as a candidate word group (the above is step 206 in FIG. 3).

【００８５】Consider now the case where the input character string 101 is an address string. Character strings other than address strings, such as name strings and product name strings, have no hierarchical structure, and can therefore be handled in the same way as address strings except for the parts relating to the hierarchical structure.

【００８６】The knowledge dictionary 111, which is an address dictionary, is structured, for example as shown in FIG. 10, by dividing it according to the hierarchical structure of addresses into level 1: prefectures (都道府県), level 2: cities, wards, and counties (市区郡), level 3: towns and villages (町村), and so on, with the words belonging to each level stored in it.

【００８７】Meanwhile, as shown in FIG. 11, the specific character dictionary 110 stores, in each record corresponding to a specific character set consisting of two specific characters, "character 1" and "character 2", data pairs each consisting of pointer information and information on the number of data items starting from that pointer. These pairs indicate the set of records in the knowledge dictionary 111 that store the word group sandwiched between the two specific characters forming the set. As shown in FIG. 11, a plurality of such data pairs can be specified, and the record for each specific character set in the specific character dictionary 110 also stores pointer count information N, which corresponds to the number of data pairs of pointer information and data count information.

【００８８】In the example of FIG. 12, the record in the specific character dictionary 110 corresponding to the specific character set consisting of the blank character and 「県」 holds data pairs (pointer information and data count information) that indicate, respectively, the $n_1$ records starting from the word 「青森」 (Aomori) in the level 1 region of the knowledge dictionary 111 shown in FIG. 10 and the $n_2$ records starting from the word 「神奈川」 (Kanagawa), also in the level 1 region, together with the pointer count N = 2.

【００８９】In the example of FIG. 13, the record in the specific character dictionary 110 corresponding to the specific character set consisting of the two specific characters 「都」 and 「区」 holds a data pair indicating the $n_3$ records starting from the word 「千代田」 (Chiyoda) in the level 2 region of the knowledge dictionary 111 shown in FIG. 10, together with the pointer count N = 1.
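The link between the specific character dictionary 110 and the knowledge dictionary 111 can be sketched with a flat word list and (pointer, count) pairs. The data below is a tiny illustrative subset chosen for this sketch, the empty string stands in for the blank character, and all names are hypothetical.

```python
# Hypothetical flat knowledge dictionary: level 1 words, then level 2 words.
KNOWLEDGE = ["青森", "岩手", "東京", "神奈川", "新潟", "千代田", "中央", "港"]

# Hypothetical specific character dictionary:
# (character 1, character 2) -> list of (pointer, data count) pairs.
# The pointer count N of the text is simply the length of each list.
SPECIFIC = {
    ("", "県"): [(0, 2), (3, 2)],
    ("", "都"): [(2, 1)],
    ("都", "区"): [(5, 3)],
}


def candidate_words(char1, char2):
    """Collect the words linked to the specific character set, if any."""
    ranges = SPECIFIC.get((char1, char2))
    if ranges is None:
        return []                  # set not registered in the dictionary
    words = []
    for pointer, count in ranges:
        words.extend(KNOWLEDGE[pointer:pointer + count])
    return words


prefecture_words = candidate_words("", "県")
ward_words = candidate_words("都", "区")
```

Storing ranges rather than word copies keeps each record small, and several ranges per record allow a set such as (blank, 「県」) to skip entries that end in a different specific character.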

【００９０】An address usually ends in the form 「・・・丁目・・・番地・・・方」, and in the regions sandwiched between specific characters such as 「丁目」, 「番地」, 「番」, 「方」, and 「号」, numerals rather than words are often written. In such a case, as shown in FIG. 14, the record in the specific character dictionary 110 corresponding to a specific character set made up of these specific characters holds, instead of the data pair of pointer information and data count information described above, a symbol such as 「（数字）＊ｎ」 ("(numeral) * n"). When the candidate word search unit 109 retrieves from the specific character dictionary 110 a record in which such a symbol is set, it detects that numerals are written consecutively in the region sandwiched between those specific characters, and writes a detection result to that effect into the candidate word buffer 112.

【００９１】Furthermore, the specific character dictionary 110 and the knowledge dictionary 111 may also be configured as shown, for example, in FIG. 15. In the example of FIG. 15, the record in the specific character dictionary 110 corresponding to the specific character set consisting of the blank character and 「川」 holds pointer information indicating the four-character word 「神奈川県」 in the knowledge dictionary 111 with data count = 1, and pointer information indicating the two-character word 「神奈」 in the knowledge dictionary 111 with data count = 1.

【００９２】The record in the specific character dictionary 110 corresponding to the specific character set consisting of the two specific characters 「川」 and 「中」 holds pointer information indicating the two-character word 「崎市」 in the knowledge dictionary 111 with data count = 1.

【００９３】Further, the record in the specific character dictionary 110 corresponding to the specific character set consisting of the two specific characters 「中」 and 「中」 holds pointer information indicating the five-character word 「原区上小田」 in the knowledge dictionary 111 with data count = 1.

【００９４】In this way, information corresponding to specific characters and words that appear with high frequency in address strings can also be stored in the specific character dictionary 110 and the knowledge dictionary 111. Next, as shown in FIG. 16, suppose that the word 「丸の内」 in the knowledge dictionary 111 is linked to the record in the specific character dictionary 110 corresponding to the specific character set consisting of 「区」 and the specific character indicating the end of the address. As a notation variation, the string 「丸ノ内」 may be written instead of 「丸の内」. In such a case, storing words for every notation variation in the knowledge dictionary 111 is wasteful.

【００９５】Therefore, in this embodiment, when words in the knowledge dictionary 111 linked from the specific character dictionary 110 are searched, the control operation shown in the operation flowchart of FIG. 9 is executed to deal with notation variation.

【００９６】First, for one specific character set, the candidate word search unit 109 searches the specific character dictionary 110 and the knowledge dictionary 111 according to the rules described so far, and writes the resulting word group into the candidate word buffer 112 as the candidate word group corresponding to the specific character set currently being processed (step 901 in FIG. 9). This step 901 is part of step 206 in FIG. 3.

【００９７】Next, as part of step 206 in FIG. 3, the candidate word search unit 109 repeatedly executes the series of processes shown in steps 902 to 910 of FIG. 9 for each word of the candidate word group obtained in the candidate word buffer 112 for the specific character set.

【００９８】That is, when the characters forming the detected word include hiragana, the candidate word search unit 109 changes the hiragana to katakana and writes the resulting word into the candidate word buffer 112 as another candidate word corresponding to the specific character set currently being processed (steps 902 → 903 in FIG. 9).

【００９９】Next, when the characters forming the detected word include katakana, the candidate word search unit 109 changes the katakana to hiragana and writes the resulting word into the candidate word buffer 112 as another candidate word corresponding to the specific character set currently being processed (steps 904 → 905 in FIG. 9).

【０１００】Next, when the characters forming the detected word include kanji numerals, the candidate word search unit 109 changes the kanji numerals to Arabic numerals and writes the resulting word into the candidate word buffer 112 as another candidate word corresponding to the specific character set currently being processed (steps 906 → 907 in FIG. 9).

【０１０１】Next, when the characters forming the detected word include Arabic numerals, the candidate word search unit 109 changes the Arabic numerals to kanji numerals and writes the resulting word into the candidate word buffer 112 as another candidate word corresponding to the specific character set currently being processed (steps 908 → 909 in FIG. 9).

【０１０２】Finally, when the characters forming the detected word include an omissible character (for example, the 「ノ」 in 「溝ノ口」, which may be abbreviated to 「溝口」), the candidate word search unit 109 writes the character string obtained by omitting that character into the candidate word buffer 112 as another candidate word corresponding to the specific character set currently being processed (steps 908 → 909 in FIG. 9).

【０１０３】When the candidate word buffer 112 still contains, for the specific character set, candidate words on which the control processing for notation variation has not yet been executed, the candidate word search unit 109 repeats the series of processes shown in steps 902 to 910 of FIG. 9 described above (repetition of steps 911 → 902 to 910 → 911 in FIG. 9).
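The generation of notation variants in FIG. 9 can be sketched as below. The hiragana and katakana blocks of Unicode are offset by 0x60, which makes the kana conversion a simple shift. The set of omissible characters is hypothetical and contains only the 「ノ」 of the example, numerals are converted digit by digit, and all function names are assumptions.

```python
KANJI_DIGITS = "〇一二三四五六七八九"
ARABIC_DIGITS = "0123456789"
OMISSIBLE = "ノ"   # hypothetical set of omissible characters


def hira_to_kata(word):
    return "".join(chr(ord(ch) + 0x60) if "ぁ" <= ch <= "ゖ" else ch
                   for ch in word)


def kata_to_hira(word):
    return "".join(chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch
                   for ch in word)


def kanji_to_arabic(word):
    return word.translate(str.maketrans(KANJI_DIGITS, ARABIC_DIGITS))


def arabic_to_kanji(word):
    return word.translate(str.maketrans(ARABIC_DIGITS, KANJI_DIGITS))


def drop_omissible(word):
    return "".join(ch for ch in word if ch not in OMISSIBLE)


def variants(word):
    """Return the additional candidate words for one dictionary word."""
    out = []
    for convert in (hira_to_kata, kata_to_hira, kanji_to_arabic,
                    arabic_to_kanji, drop_omissible):
        candidate = convert(word)
        if candidate != word and candidate not in out:
            out.append(candidate)
    return out


marunouchi = variants("丸の内")
mizonokuchi = variants("溝ノ口")
```

Generating variants at search time keeps the knowledge dictionary free of duplicate spellings, which is the motivation given in paragraph 【００９４】.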

【０１０４】In the manner described above, control for notation variation is realized for the candidate word group obtained in the candidate word buffer 112 for one specific character set. In this way, a candidate word group is obtained in the candidate word buffer 112 for one specific character set selected from the candidate character string buffer 108.

【０１０５】Now, for example, when the input character string 101 shown in FIG. 17 is written, the specific character recognition processing of steps 201 to 205 in FIG. 2 described above recognizes region 1701 as the specific character 「都」 and region 1702 as the specific character 「区」.

【０１０６】For this recognition result, in step 206 of FIG. 3 described above, the candidate word search unit 109 detects in the specific character dictionary 110 the record of the specific character set consisting of a blank character and the specific character "都". From the entry in the knowledge dictionary 111 linked to that record, it retrieves the single word "東京" (Tokyo) that lies between the two specific characters of the set, and holds it in the candidate word buffer 112 as the candidate word group for that specific character set. In this case the candidate word group contains one word and, as shown in FIG. 18, the candidate word "東京" has two characters.

【０１０７】In addition, in step 206 of FIG. 3 as executed for the second time after the determination in step 211 of FIG. 3 (described later), the candidate word search unit 109 detects in the specific character dictionary 110 the record of the specific character set consisting of the specific characters "都" and "区". From the entry in the knowledge dictionary 111 shown in FIG. 10 that is linked to that record, it retrieves the 23 words "千代田" (Chiyoda), "中央" (Chuo), "港" (Minato), and so on that can lie between the two specific characters, and holds them in the candidate word buffer 112 as the candidate word group for that specific character set. In this case the candidate word group contains 23 words and, as shown in FIG. 19, each candidate word has three characters, two characters, or one character.
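
The two lookups above can be pictured as a table keyed by a pair of specific characters. The sketch below is a simplified stand-in: `KNOWLEDGE`, `find_candidate_words`, and `group_by_length` are hypothetical names, the table merges the specific character dictionary 110 and the knowledge dictionary 111 into one mapping, and only three of the 23 ward names are listed.

```python
BLANK = ""  # stands for the blank character at the start of the string

# Hypothetical, heavily truncated stand-in for dictionaries 110 and 111.
KNOWLEDGE = {
    (BLANK, "都"): ["東京"],
    ("都", "区"): ["千代田", "中央", "港"],  # the real entry would list 23 wards
}

def find_candidate_words(left, right, knowledge=KNOWLEDGE):
    """Return the words that may lie between two specific characters."""
    return list(knowledge.get((left, right), []))

def group_by_length(words):
    """Group candidate words by character count for the segmentation step."""
    groups = {}
    for word in words:
        groups.setdefault(len(word), []).append(word)
    return groups
```

Grouping by length is a convenience of this sketch: the number of characters in a candidate word is what later tells the segmentation step how many characters to cut the region into.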

【０１０８】After a candidate word group has thus been obtained in the candidate word buffer 112 for the one specific character set selected from the candidate character string buffer 108, the character segmentation unit 103, the feature extraction unit 105, and the matching unit 106 execute the series of re-recognition processes of steps 207 to 211 in FIG. 3 for each candidate word in the group, thereby extracting candidate recognition results up to a predetermined rank for each candidate word.

【０１０９】First, using the information on a candidate word output from the candidate word buffer 112, the character segmentation unit 103 re-segments, within the input character string 101 read from the image memory 102, the character string in the region between the two specific characters of the specific character set to which the candidate word belongs (step 207 in FIG. 3).

【０１１０】Here, when the candidate word has two characters, such as "東京" shown in FIG. 18 or "中央" shown in FIG. 19, the character segmentation unit 103 follows the operation flowcharts of steps 601 to 604 in FIG. 6 and step 701 in FIG. 7 described above, divides the region to be segmented into two (n = 2 in Equation 3 described above), and determines the cut position of each character.

【０１１１】When the candidate word has three characters, such as "千代田" shown in FIG. 19, the character segmentation unit 103 divides the region to be segmented into three (n = 3 in Equation 3 described above) and determines the cut position of each character.

【０１１２】Further, when the candidate word has one character, such as "港" shown in FIG. 19, the character segmentation unit 103 assumes that the region to be segmented contains only one character (n = 1 in Equation 3 described above).
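
The three cases above differ only in the value of n. Equation 3 itself is not reproduced in this part of the description, so the sketch below simply assumes an equal-width split of the region; the function name `split_region` and the pixel coordinates are hypothetical, and the actual flow of FIGS. 6 and 7 may adjust the cut positions further.

```python
def split_region(left, right, n):
    """Split the horizontal span [left, right) into n equal-width character boxes.

    Assumption: equal widths; the patent's Equation 3 may differ.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    width = right - left
    cuts = [left + (width * k) // n for k in range(n + 1)]
    return list(zip(cuts[:-1], cuts[1:]))
```

For a region spanning x = 100 to 220, `split_region(100, 220, len("千代田"))` yields three boxes of 40 pixels each, while a one-character candidate such as "港" keeps the region whole.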

【０１１３】Next, the feature extraction unit 105 extracts a feature vector, as described above, for each character of the re-segmented character string (step 208 in FIG. 3). The matching unit 106 then matches, for each character, its feature vector against the feature vector of each standard pattern in the standard pattern dictionary 113, which is the second dictionary (step 209 in FIG. 3). It outputs to the candidate character string buffer 108, as the candidate character group for that character, the character categories to which the standard patterns up to a predetermined rank belong, in descending order of matching degree (step 210 in FIG. 3).

【０１１４】More specifically, the matching unit 106 calculates a distance (Euclidean distance, Mahalanobis distance, or the like) between the feature vector of the character and the feature vector of each standard pattern in the standard pattern dictionary 113. The matching unit 106 then outputs to the candidate character string buffer 108, as the candidate character group for that character, the character categories to which the standard patterns up to a predetermined rank (n-th place) belong, in ascending order of distance.
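
A minimal sketch of this ranking, assuming Euclidean distance and plain Python lists as feature vectors. The names `euclidean` and `top_candidates` are hypothetical, and keeping only the best distance per category when a category has several standard patterns is an assumption of this sketch, not something the description states.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_candidates(feature, standard_patterns, n):
    """Return the n nearest (category, distance) pairs, nearest first.

    standard_patterns is a list of (category, feature_vector) pairs.
    Assumption: a category with several patterns keeps its smallest distance.
    """
    best = {}
    for category, pattern in standard_patterns:
        d = euclidean(feature, pattern)
        if category not in best or d < best[category]:
            best[category] = d
    return sorted(best.items(), key=lambda item: item[1])[:n]
```

Swapping in a Mahalanobis distance would only change the `euclidean` helper; the ranking logic stays the same.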

【０１１５】Once candidate character groups up to the predetermined rank, in ascending order of distance, have been obtained in the candidate character string buffer 108 for each character of the string re-segmented by the character segmentation unit 103, the series of processes of steps 207 to 210 is repeated for the other candidate words belonging to the candidate word group obtained in the candidate word buffer 112 for the one specific character set.

【０１１６】When candidate character groups up to the predetermined rank have been obtained in the candidate character string buffer 108 for every character of every candidate word belonging to the candidate word group obtained in the candidate word buffer 112 for the one specific character set, the matching unit 106 generates, for each candidate word, a candidate character string group by combining all of the candidate characters up to the predetermined rank for each of its characters, and calculates the average distance of each candidate character string in that group by the following equation (step 212 in FIG. 3).

【０１１７】[0117]

【数６】

$$\frac{D_1 + D_2 + \cdots + D_m}{m}$$

Here, $m$ is the number of characters in the target candidate word, and $D_i$ ($1 \le i \le m$) is the distance of the candidate character selected at the $i$-th character position of the target candidate word to form the target candidate character string.

【０１１８】The matching unit 106 then selects, from the candidate character string groups generated for all candidate words of the one specific character set, a predetermined number (P) of candidate character strings in ascending order of average distance, and outputs them to the knowledge processing unit 114 as the recognition result for the character region between the two specific characters of that set.
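
The combination and selection steps can be sketched as follows. The function name `rank_candidate_strings` and the toy distances are hypothetical; the sketch handles one candidate word, and pooling several candidate words would amount to concatenating their scored lists before taking the best P.

```python
from itertools import product

def rank_candidate_strings(per_char_candidates, p):
    """Combine per-character candidates and keep the p best strings.

    per_char_candidates[i] lists (character, distance) pairs for position i.
    Each combination is scored by the mean of its character distances.
    """
    m = len(per_char_candidates)
    scored = []
    for combo in product(*per_char_candidates):
        text = "".join(ch for ch, _ in combo)
        mean = sum(d for _, d in combo) / m
        scored.append((text, mean))
    scored.sort(key=lambda item: item[1])
    return scored[:p]
```

Dividing by m is what makes strings of different lengths comparable, so a one-character candidate such as "港" and a three-character candidate such as "千代田" can compete in the same ranking.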

【０１１９】When the recognition result for the character region between the two specific characters of one specific character set has been obtained in this way, processing returns from step 213 to step 206 in FIG. 3.

【０１２０】Then another specific character set, consisting of any other two adjacent specific characters, is extracted from the set of candidate specific characters obtained in the candidate character string buffer 108 by the specific character recognition processing of steps 201 to 205 in FIG. 2 described above. The series of control processes of steps 206 to 212 in FIG. 3 is executed again for that set, yielding the recognition result for the character region between its two specific characters. This operation is repeated for each specific character set (the loop of steps 213 → 206 to 212 → 213 in FIG. 3).

【０１２１】The knowledge processing unit 114 applies knowledge processing, using the entry field definition 104 and the knowledge dictionary 111, to the recognition results for the character regions between the two specific characters of each specific character set. It thereby determines the final recognition result for the entire character region made up of those regions and outputs it to the recognition result buffer 115 (step 214 in FIG. 4).

【０１２２】The series of control processes from step 201 in FIG. 2 to step 214 in FIG. 4 described above is repeated for each entry field position on the form, so that the final recognition result for each entry field is determined (the loop of step 215 in FIG. 4 → step 201 in FIG. 2).

【０１２３】In the series of recognition processes described above, reject (unrecognizable) information is attached to any character or part of a character string that never satisfied the recognition condition. In that case, the recognition result obtained in the recognition result buffer 115 is displayed on the display unit 117 via the interface unit 116. Looking at the recognition result displayed on the display unit 117, the user can correct the unrecognizable characters or character strings from the input unit 118, which consists of a mouse, a keyboard, and the like.

【０１２４】The user need only specify a particular correct character within the unrecognizable characters or character string from the input unit 118; information about that correct character is then output from the interface unit 116 to the correct character buffer 119 and the area coordinate buffer 120.

【０１２５】In the example of FIG. 21, the image 2101 of the target character string is displayed on the display unit 117 alongside the recognition result 2102. When the user points to a specific area 2103 on the image 2101 with the mouse or other device serving as the input unit 118, the corresponding recognition result character 2104 is highlighted, shown in reverse video, or the like. When the user then enters the correct character "都" from the keyboard or other device serving as the input unit 118, information about that correct character is output from the interface unit 116 to the correct character buffer 119 and the area coordinate buffer 120. Likewise, if the user points to the area on the image 2101 corresponding to, say, "東京" and corrects the corresponding recognition result "束長" to "東京", information about the correct character string "東京" is output from the interface unit 116 to the correct character buffer 119 and the area coordinate buffer 120.

【０１２６】The candidate word search unit 109 treats the information about the correct character obtained in the correct character buffer 119 as specific character information and executes the candidate word search described above using the specific character dictionary 110 and the knowledge dictionary 111, so that the unrecognizable characters can be re-recognized correctly. In addition, the character segmentation unit 103 obtains from the area coordinate buffer 120 the cut position of the correct character specified by the user, so that it can segment the characters correctly.

【０１２７】In the example of FIG. 22, the image of the target character string is displayed on the display unit 117 alongside the recognition result 2202. When the user points to a specific area 2201 on the image with the mouse or other device serving as the input unit 118, the corresponding recognition result character 2203 is highlighted or shown in reverse video, and recognition result candidates 2204 are displayed at the indicated position. When the user selects the correct character "都" from this display using the keyboard or other device serving as the input unit 118, information about that correct character is output from the interface unit 116 to the correct character buffer 119 and the area coordinate buffer 120. The recognition result candidates 2204 displayed at the indicated position can be ordered by the appearance frequency of the characters, by the order determined by the hierarchical structure when the string has one (as an address character string does), or simply by character code.
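
Two of the display orders mentioned above are easy to illustrate. The function `order_candidates`, the mode names, and the frequency table below are hypothetical; the hierarchy-based order is left out because the description does not say how it is determined.

```python
def order_candidates(candidates, mode, frequency=None):
    """Order recognition result candidates for display.

    mode "frequency": most frequent first (frequency maps candidate -> count).
    mode "code": ascending Unicode code point order.
    """
    if mode == "frequency":
        frequency = frequency or {}
        return sorted(candidates, key=lambda c: -frequency.get(c, 0))
    if mode == "code":
        return sorted(candidates)
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical appearance counts, for illustration only.
example_frequency = {"都": 50, "部": 20, "郡": 5}
```

Frequency order puts the most likely correction under the pointer first, whereas code order is predictable but ignores likelihood.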

【０１２８】Following the example of FIG. 22, the same correction is applied, as shown in FIG. 23, to the indicated position 2301 and the corresponding recognition result position 2302, so that the character string 2303 can be re-recognized correctly.

【０１２９】In the re-recognition processing for each character region between the two specific characters of a specific character set, steps 207 to 212 in FIG. 3 described above are configured so that re-recognition is executed separately for each character of a candidate word and the recognition result for that candidate word is output at the end.

【０１３０】In this case, limiting the character types that the matching unit 106 searches in the standard pattern dictionary 113 to those of the category to which the candidate word belongs makes the re-recognition processing efficient.

【０１３１】Alternatively, feature vector extraction and the matching processing by the matching unit 106 may be performed on the entire character region between the two specific characters. In that case, the standard pattern dictionary 113 holds feature vectors of standard patterns in which each word, such as "川崎" (Kawasaki), "横浜" (Yokohama), "横須賀" (Yokosuka), and so on, is treated as a single pattern, and the matching unit 106 matches the feature vector of a whole candidate word, treated as a single pattern, against the feature vector of each standard pattern in the standard pattern dictionary 113.

【０１３２】In this case, limiting the word group that the matching unit 106 searches in the standard pattern dictionary 113 to the words of the category to which the candidate word belongs makes the re-recognition processing efficient.

【０１３３】More specifically, in recognizing an address character string, for example, limiting the word group that the matching unit 106 searches in the standard pattern dictionary 113 to the words that make up the hierarchical level to which the candidate word belongs makes the re-recognition processing efficient.

【０１３４】For example, as shown in FIG. 20, in re-recognizing the region between the two specific characters "県" (prefecture) and "市" (city), the standard pattern dictionary 113 can be limited to words that represent cities, such as "川崎", "横浜", "横須賀", and so on.

【０１３５】Further, when an upper-level recognition result has already been obtained, for example in recognizing an address character string, the word group that the matching unit 106 searches in the standard pattern dictionary 113 can be limited to the words that fall under that upper-level result and make up the lower level to which the candidate word belongs, which makes the re-recognition processing even more efficient.

【０１３６】For example, when the level 1 recognition result of an address character string is "青森" (Aomori), the level 2 standard patterns can be limited to the words representing cities that belong to Aomori Prefecture, rather than all words that can appear between the two specific characters "県" and "市".
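
The narrowing described above amounts to choosing a smaller vocabulary before matching. The sketch below assumes a two-level table; `CITIES_BY_PREFECTURE` and `level2_vocabulary` are hypothetical names, and the table lists only a few cities per prefecture for illustration.

```python
# Hypothetical fragment of a hierarchical address dictionary.
CITIES_BY_PREFECTURE = {
    "青森": ["青森", "弘前", "八戸"],
    "神奈川": ["川崎", "横浜", "横須賀"],
}

def level2_vocabulary(level1_result=None, table=CITIES_BY_PREFECTURE):
    """Return the city names to match against.

    If the prefecture (level 1) is already recognized, only its cities are
    returned; otherwise every city in the table is a possible match.
    """
    if level1_result in table:
        return list(table[level1_result])
    return [city for cities in table.values() for city in cities]
```

With the prefecture known, the matcher compares against three patterns instead of six in this toy table; with a full address dictionary the reduction would be far larger.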

【０１３７】Conversely, when a lower-level recognition result has already been obtained, for example in recognizing an address character string, the word group that the matching unit 106 searches in the standard pattern dictionary 113 can be limited to the words that make up the upper level to which the candidate word belongs and under which that lower-level result falls, which also makes it possible to recover from an unrecognizable state at the upper level.

Supplement on the recording medium storing a program that implements the present embodiment

The present invention can also be configured as a computer-readable recording medium that, when used by a computer, causes the computer to perform the same functions as those realized by the components of the embodiment of the present invention described above.

【０１３８】In this case, as shown in FIG. 24, a program that implements the various functions of the embodiment of the present invention is loaded into the memory 2405 (RAM, hard disk, or the like) in the main body 2404 of the computer 2401, either from a portable recording medium 2402 such as a floppy disk, CD-ROM disc, optical disc, or removable hard disk, or via a network line 2403, and is then executed.

【０１３９】[0139]

[Effects of the Invention] According to the character recognition technique of the present invention, specific characters or specific character strings in the input character string are recognized first, candidate words before and after them are hypothesized from that recognition result, and the characters of the input character string are then re-recognized using the information on those candidate words. This makes it possible to recognize with high accuracy the characters of input character strings written at irregular intervals or in irregular ways, as found on the various forms (slips) in everyday use.

【０１４０】According to the character correction technique of the present invention, correcting only a specific character or character string allows the other unrecognizable parts to be corrected automatically as well. According to the notational variation control technique of the present invention, a variety of writing styles can be handled flexibly.

[Brief description of drawings]

FIG. 1 is a configuration diagram of an embodiment of the present invention.

FIG. 2 is an overall control operation flowchart (part 1) of the embodiment of the present invention.

FIG. 3 is an overall control operation flowchart (part 2) of the embodiment of the present invention.

FIG. 4 is an overall control operation flowchart (part 3) of the embodiment of the present invention.

FIG. 5 is a diagram showing an example data format of the entry field definition.

FIG. 6 is a control operation flowchart (part 1) of the character segmentation unit.

FIG. 7 is a control operation flowchart (part 2) of the character segmentation unit.

FIG. 8 is an explanatory diagram of the control operation of the character segmentation unit.

FIG. 9 is a control operation flowchart for notational variation.

FIG. 10 is a structural diagram of the knowledge dictionary (addresses).

FIG. 11 is a structural diagram of the specific character dictionary.

FIG. 12 is a diagram showing a structural example (part 1) of the specific character dictionary 110.

FIG. 13 is a diagram showing a structural example (part 2) of the specific character dictionary 110.

FIG. 14 is a diagram showing a structural example (part 3) of the specific character dictionary 110.

FIG. 15 is a diagram showing a structural example (part 4) of the specific character dictionary 110.

FIG. 16 is an explanatory diagram of the control operation for notational variation.

FIG. 17 is an operation explanatory diagram (part 1) of the candidate word search unit.

FIG. 18 is an operation explanatory diagram (part 2) of the candidate word search unit.

FIG. 19 is an operation explanatory diagram (part 3) of the candidate word search unit.

FIG. 20 is an explanatory diagram of the character string detection and recognition operation using the standard pattern dictionary.

FIG. 21 is an operation explanatory diagram (part 1) of the input unit and the display unit.

FIG. 22 is an operation explanatory diagram (part 2) of the input unit and the display unit.

FIG. 23 is an operation explanatory diagram (part 3) of the input unit and the display unit.

FIG. 24 is an explanatory diagram of a recording medium on which a program that implements the present embodiment is recorded.

[Explanation of symbols]

101 Input character string
102 Image memory
103 Character segmentation unit
104 Entry field definition
105 Feature extraction unit
106 Matching unit
107 Specific character standard pattern dictionary
108 Candidate character string buffer
109 Candidate word search unit
110 Specific character dictionary
111 Knowledge dictionary
112 Candidate word buffer
113 Standard pattern dictionary
114 Knowledge processing unit
115 Recognition result buffer
116 Interface unit
117 Display unit
118 Input unit
119 Correct character buffer
120 Area coordinate buffer

Continuation of front page

(56) References
JP-A-7-262320 (JP, A)
JP-A-3-257693 (JP, A)
JP-A-5-6464 (JP, A)
JP-A-6-4717 (JP, A)
JP-A-2-101596 (JP, A)
JP-A-8-171614 (JP, A)
JP-A-5-89291 (JP, A)
PRU93-49, Study on a post-recognition processing method for address character strings, IEICE Technical Report, Japan, September 16, 1993, Vol. 93, No. 228, pp. 49-56
Handwritten address recognition based on key-character-driven place name inference, IEICE Transactions, Japan, May 1997, Vol. J80-D-II, No. 5, pp. 1077-1085

(58) Fields searched (Int. Cl.⁷, DB name)
G06K 9/00 - 9/82

Claims

(57) [Claims]

1. A character recognition method for recognizing the characters constituting an input character string entered in an entry field having a predetermined category, wherein: a specific character or specific character string is extracted from the input character string by executing a first matching process between the input character string and a first recognition dictionary that stores standard patterns corresponding to specific characters or specific character strings; a candidate word group that belongs to the predetermined category and may be located in a region of the input character string lying between two specific characters or specific character strings extracted from the input character string is extracted from a category word dictionary; and, for each candidate word belonging to the extracted candidate word group, the characters constituting the input character string are recognized by executing, on each region of the input character string that contains that candidate word according to the information about it, a second matching process using a second recognition dictionary that stores standard patterns corresponding to the characters or character strings of the candidate words belonging to the candidate word group.

2. A character correction method using the character recognition method according to claim 1, wherein: the recognition result of the characters constituting the input character string is displayed alongside the input character string; a desired area on the displayed input character string is designated and the character or character string corresponding to that area is corrected; and the characters constituting the input character string are recognized again by re-executing the candidate word group extraction process and the second matching process based on information about the correct character or correct character string given by the correction.

3. The character correction method according to claim 2, wherein a plurality of candidate recognition results for the desired area are displayed in response to the designation of the desired area on the displayed input character string.

4. A computer-readable recording medium on which is recorded a program that, when used by a computer, is read by the computer and causes it to perform: a function of extracting a specific character or specific character string from an input character string entered in an entry field having a predetermined category, by executing a first matching process between the input character string and a first recognition dictionary that stores standard patterns corresponding to specific characters or specific character strings; a function of extracting, from a category word dictionary, a candidate word group that belongs to the predetermined category and may be located in a region of the input character string lying between two specific characters or specific character strings extracted from the input character string; and a function of recognizing the characters constituting the input character string by executing, for each candidate word belonging to the extracted candidate word group and on each region of the input character string that contains that candidate word according to the information about it, a second matching process using a second recognition dictionary that stores standard patterns corresponding to the characters or character strings of the candidate words belonging to the candidate word group.