JPS616780A

JPS616780A - Recognizing method of character

Info

Publication number: JPS616780A
Application number: JP59127751A
Authority: JP
Inventors: Tatsuo Furubayashi; 古林　龍夫
Original assignee: Sanyo Electric Co Ltd; Sanyo Denki Co Ltd
Current assignee: Sanyo Electric Co Ltd; Sanyo Denki Co Ltd
Priority date: 1984-06-20
Filing date: 1984-06-20
Publication date: 1986-01-13
Anticipated expiration: 2011-12-11
Also published as: JP2562574B2

Abstract

PURPOSE: To reduce the frequency of rejects by treating unrecognizable characters that have the same similarity ranking with respect to plural reference characters as one and the same character, counting how often that character occurs, and, when the count exceeds a prescribed value, recognizing the character with reference to the reference characters. CONSTITUTION: A non-recognized word storage area in the memory of a character recognizing device has storage areas 10-30 for unrecognizable characters, and the respective areas 10-30 are provided with similarity storage columns 12-32, preceding/succeeding character storage columns 13-33, occurrence frequency storage columns 14-34, and correct character storage columns 15-35. The character recognizing device treats unrecognizable characters that have practically the same similarity ranking with respect to the plural reference characters used for recognition as one and the same character and counts the occurrences of that character. When the count exceeds a prescribed value, the device specifies the unrecognizable character as a reference character by referring to the contents stored in the non-recognized word storage area.

Description

[Detailed description of the invention]

[Field of industrial application]

The present invention relates to a character recognition method and, more specifically, to a character recognition method that can avoid, as far as possible, refusals of recognition (rejects) by a character recognition device.

[Prior art]

In recent years, character recognition devices have been put into practical use, but similar characters, especially handwritten ones, sometimes cannot be recognized. In such cases, the device is often configured to refuse recognition (reject) in order to avoid misrecognition.

In general, character recognition computes, by a predetermined calculation, the similarity between the character to be recognized and each of a plurality of standard characters (the characters that serve as references for recognition) that closely resemble it. The character is recognized as the standard character with the highest similarity only when that similarity (the maximum similarity) is at or above a predetermined value and the difference between the maximum similarity and the similarity to the next most similar standard character (the second-highest similarity) is at or above a predetermined threshold. When the maximum similarity is below the predetermined value, or the difference between the maximum similarity and the second-highest similarity falls within the threshold, the character is rejected to avoid misrecognition.
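The conventional decision rule described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular device: the function name `recognize`, the parameter names, and the numeric defaults are assumptions, since the text only speaks of "a predetermined value" and "a predetermined threshold".

```python
from typing import Dict, Optional


def recognize(similarities: Dict[str, float],
              min_similarity: float = 0.80,   # assumed "predetermined value"
              min_margin: float = 0.05        # assumed "predetermined threshold"
              ) -> Optional[str]:
    """Return the best-matching standard character, or None for a reject."""
    # Rank the standard characters by similarity, highest first.
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    best_char, best_sim = ranked[0]
    second_sim = ranked[1][1] if len(ranked) > 1 else float("-inf")
    # Accept only if the best match is strong enough and clearly ahead
    # of the second-best match; otherwise reject.
    if best_sim >= min_similarity and best_sim - second_sim >= min_margin:
        return best_char
    return None
```

A call such as `recognize({"A": 0.95, "B": 0.93})` is rejected under these assumed defaults because the two candidates are too close, which is exactly the situation the invention targets.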

Methods for avoiding such rejects have been proposed in, for example, Japanese Patent Publications No. 52-19416 and No. 57-49940.

Japanese Patent Publication No. 52-19416 makes the threshold used for the recognition decision variable according to the difference between the maximum similarity and the second-highest similarity, so as to reduce the proportion of characters that cannot be recognized without increasing the proportion of misrecognitions. However, recognition accuracy rises as the threshold separating the maximum similarity from the second-highest similarity becomes larger, so this method may lower recognition accuracy. Moreover, however small the threshold is made, a character is still rejected if the difference between the maximum similarity and the second-highest similarity lies within that threshold.

Japanese Patent Publication No. 57-49940, on the other hand, newly provides a standard pattern for rejection at the center of a region where misrecognition is likely, sets a reject region of a certain extent around that pattern, and rejects an unknown input pattern (a character to be recognized) that falls within this region. That publication states that the misrecognition region can thereby be turned almost entirely into a reject region while hardly reducing the conventional correct-recognition region. Although that invention can avoid misrecognition better than before, rejects occur more often than before, so the number of re-inputs increases, or a further arrangement for preventing rejects becomes necessary and raises cost.

Furthermore, with either of the above inventions, rejects may occur frequently in the recognition of handwritten characters because of the writer's individual habits.

[Purpose of the invention]

The present invention has been made in view of the problems described above. Each time a reject occurs, that is, each time a character to be recognized cannot be recognized, the method stores, for a plurality of standard characters highly similar to that character, the similarity ranking and the degrees of similarity, together with the characters written before and after that character. It counts the occurrences of unrecognizable characters whose similarity ranking is substantially the same as that of this unrecognizable character. When the count reaches a predetermined value, the character is specified as one of the standard characters by referring to the similarity ranking, the degrees of similarity, and the characters written before and after it, and is thereafter recognized as the specified standard character. The object is to provide a character recognition method that reduces the occurrence of rejects, lessens the effort of re-input and the time wasted on it, and improves the recognition rate.

[Structure of the invention]

The character recognition method according to the present invention is a method for recognizing a character to be recognized based on its similarity to a plurality of standard characters, characterized as follows. Each time the character to be recognized is not recognized as any standard character, the method stores, for a plurality of standard characters highly similar to that character, the similarity ranking, the degrees of similarity, and the codes specifying those standard characters, together with the characters written before and after the character to be recognized. Each time an unrecognizable character having substantially the same ranking as the stored ranking occurs, the number of occurrences is counted. When this count reaches or exceeds a predetermined value, the unrecognizable character is specified as one of the standard characters by referring to the stored similarity ranking, the degrees of similarity, and the characters written before and after the character to be recognized. Thereafter, the unrecognizable character is recognized as the specified standard character.

[Example]

The present invention is described in detail below with reference to the drawing, which shows an embodiment of the invention.

The drawing is a schematic diagram showing the contents of the unrecognized-word storage area provided in the memory of a character recognition device used to carry out the present invention.

This unrecognized-word storage area is, for example, a predetermined region of memory divided at fixed address intervals into storage areas 10, 20, 30, ... for unrecognizable characters (hereinafter, unrecognized characters). Each of the storage areas 10, 20, 30, ... is provided with a similarity storage column 12, 22, 32, ..., a preceding/succeeding character storage column 13, 23, 33, ..., an occurrence count storage column 14, 24, 34, ..., and a correct character storage column 15, 25, 35, ..., respectively.
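One way to picture a single storage area and its four columns is the record type below. This is only a sketch under stated assumptions: the class name `UnrecognizedEntry`, the field names, and the choice to start the count at 1 for the first occurrence are all hypothetical, and the reference numerals appear only in comments to tie the fields back to the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class UnrecognizedEntry:
    """One storage area (10, 20, 30, ...) for an unrecognized character."""
    # Similarity column (12, 22, 32, ...): (standard-character code,
    # similarity) pairs, ordered from most to least similar.
    ranking: List[Tuple[int, float]]
    # Preceding/succeeding character column (13, 23, 33, ...):
    # one (before, after) pair per occurrence.
    contexts: List[Tuple[str, str]] = field(default_factory=list)
    # Occurrence count column (14, 24, 34, ...). Starting at 1 is an
    # assumption; the text does not say how the first occurrence is counted.
    count: int = 1
    # Correct character column (15, 25, 35, ...): empty until a person
    # enters the answer.
    correct: Optional[str] = None
```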

The similarity storage columns 12, 22, 32, ... store, for each unrecognized character, its degrees of similarity to a plurality of standard characters in ranked order, together with the JIS codes of those standard characters.

The preceding/succeeding character storage columns 13, 23, 33, ... store the characters that preceded and followed each unrecognized character when it occurred. When the unrecognized character is a kanji that forms part of a compound word, or is read with its kun reading, storing the neighboring characters, such as the okurigana, provides a reference for identifying the unrecognized character.

The occurrence count storage columns 14, 24, 34, ... store the number of times the unrecognized character assigned to each storage area 10, 20, 30, ... has recurred. Specifically, when an unrecognized character occurs whose ranking is regarded as substantially the same as the similarity ranking stored in the similarity storage column 12, 22, 32, ... of a storage area 10, 20, 30, ..., it is counted as an occurrence of the same unrecognized character. Similarity rankings are substantially the same not only when they are exactly identical but also when, for example, most of the stored combination of ranked characters is the same and the top two or three ranks are identical.
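The text gives only an example of what "substantially the same" means, so any concrete test involves choices. The sketch below is one possible reading: the function name `substantially_same`, the default of two matching top ranks, and the 70% overlap figure standing in for "most" are assumptions, not values from the specification.

```python
from typing import Sequence


def substantially_same(stored: Sequence[int], observed: Sequence[int],
                       top_n: int = 2, min_overlap: float = 0.7) -> bool:
    """Compare two rankings of standard-character codes, best match first."""
    if not stored or not observed:
        return False
    # Exactly identical rankings always match.
    if list(stored) == list(observed):
        return True
    # The top-ranked candidates must agree in both identity and order.
    if list(stored[:top_n]) != list(observed[:top_n]):
        return False
    # Most of the candidates must be shared, regardless of their order.
    shared = len(set(stored) & set(observed))
    largest = max(len(set(stored)), len(set(observed)))
    return shared / largest >= min_overlap
```

Under these assumptions, a swap between the third and fourth candidates still counts as the same unrecognized character, while a swap between the first and second does not.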

The correct character storage columns 15, 25, 35, ... store the correct answer for the unrecognized character assigned to each storage area 10, 20, 30, ..., which is entered by hand. That is, when the number of occurrences stored in an occurrence count storage column 14, 24, 34, ... reaches a predetermined number, the data for that unrecognized character, namely the data stored in the corresponding storage area 10, 20, 30, ..., are output. As described above, the output data contain, for each unrecognized character, the JIS codes of the standard characters in descending order of similarity, and also the preceding and succeeding characters (the other characters of a compound word, okurigana, and so on) stored each time the unrecognized character was recorded. The user refers to these data to identify the unrecognized character and enters it in the correct character storage column 15, 25, 35, ....

In this way, an unrecognized character that occurs with at least a certain frequency is identified, and its correct character (a standard character) is entered in the correct character storage column 15 (or 25, 35, ...) of the storage area 10 (or 20, 30, ...). Thereafter, when an unrecognized character occurs whose ranking is substantially the same as the similarity ranking stored in the similarity storage column 12 (or 22, 32, ...) of that storage area 10 (or 20, 30, ...), it is recognized as the correct character stored in the correct character storage column 15 (or 25, 35, ...) of that storage area.
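The overall flow described in this embodiment (record a reject, count recurrences, hand the data to a person once the count reaches a set number, and recognize the character from then on) might be sketched as follows. The class `RejectStore`, its method names, and the value of `REVIEW_THRESHOLD` are hypothetical, and exact equality of the ranking tuple stands in for the "substantially the same" comparison to keep the sketch short.

```python
from typing import Dict, List, Optional, Tuple

REVIEW_THRESHOLD = 3  # hypothetical "predetermined number" of occurrences

Ranking = Tuple[int, ...]  # standard-character codes, most similar first


class RejectStore:
    def __init__(self) -> None:
        # Keyed by ranking; exact equality is a simplifying assumption.
        self.entries: Dict[Ranking, dict] = {}

    def handle_reject(self, ranking: Ranking,
                      before: str, after: str) -> Optional[str]:
        """Return a learned character, or None if the reject stands."""
        entry = self.entries.get(ranking)
        if entry is None:
            # First occurrence: open a new storage area.
            self.entries[ranking] = {"count": 1,
                                     "contexts": [(before, after)],
                                     "correct": None}
            return None
        if entry["correct"] is not None:
            # A correct character was entered earlier: recognize as that.
            return entry["correct"]
        entry["count"] += 1
        entry["contexts"].append((before, after))
        return None

    def due_for_review(self) -> List[Ranking]:
        """Rankings that reached the threshold and still lack an answer."""
        return [k for k, e in self.entries.items()
                if e["correct"] is None and e["count"] >= REVIEW_THRESHOLD]

    def set_correct(self, ranking: Ranking, char: str) -> None:
        """Record the answer a person entered for this storage area."""
        self.entries[ranking]["correct"] = char
```

In use, a caller would invoke `handle_reject` whenever ordinary recognition fails, periodically show the entries returned by `due_for_review` to the operator, and store the operator's answer with `set_correct`.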

When the unrecognized-word storage area has no room left, the data for unrecognized characters that have occurred fewer times and have been stored longer are erased, and the newly occurring unrecognized character is stored.
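The text does not say how occurrence count and age are weighed against each other when choosing what to erase. The sketch below assumes one simple ordering: the lowest count goes first, and among equal counts the oldest entry goes first. The function name `evict_one` and the `stored_at` sequence number are hypothetical.

```python
from typing import Dict, Hashable, Tuple


def evict_one(entries: Dict[Hashable, Tuple[int, int]]) -> Hashable:
    """Remove and return the key of the entry to erase.

    Each value is (count, stored_at), where stored_at is an assumed
    sequence number that increases with each newly stored entry, so a
    smaller value means the entry has been stored longer.
    """
    # Tuples compare element by element: lowest count first, then oldest.
    victim = min(entries, key=lambda k: entries[k])
    del entries[victim]
    return victim
```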

[Effect]

As described in detail above, the character recognition method according to the present invention treats unrecognizable characters whose similarity ranking with respect to the plurality of standard characters used as recognition references is substantially the same as one and the same character, and counts how often that character occurs. When the count reaches or exceeds a predetermined value, the unrecognizable character is specified as one of the standard characters by referring to the stored similarity ranking, the degrees of similarity, the codes of the standard characters, and the characters that were written before and after it, and is thereafter recognized as the specified standard character. The recognition rate therefore improves and rejects become less frequent, which saves the effort of re-input and shortens the time required for character recognition.

Furthermore, because the present invention makes it possible to avoid rejects caused by individual habits, it is highly effective in the recognition of handwritten characters. In addition, because it is possible to learn which characters an idiosyncratically written character is judged to resemble, the user of the character recognition device can make an effort to avoid writing such characters in the first place.

[Brief explanation of drawings]

The drawing is a schematic diagram showing the contents of the unrecognized-word storage area provided in the memory of a character recognition device, illustrating an embodiment of the present invention.

Claims

[Claims] 1. A character recognition method for recognizing a character to be recognized based on the degree of similarity between the character to be recognized and a plurality of standard characters, characterized in that: each time the character to be recognized is not recognized as any standard character, the similarity ranking, the degrees of similarity, and the codes specifying the standard characters are stored for a plurality of standard characters having a high degree of similarity to the character to be recognized, together with the characters written before and after the character to be recognized; each time an unrecognizable character to be recognized occurs that has substantially the same ranking as the stored ranking, the number of occurrences is counted; when this count is greater than or equal to a predetermined value, the unrecognizable character to be recognized is specified as one of the standard characters by referring to the stored similarity ranking, the degrees of similarity, and the characters written before and after the character to be recognized; and thereafter the unrecognizable character to be recognized is recognized as the specified standard character.