JPH03276380A

JPH03276380A - Character recognizing device

Info

Publication number: JPH03276380A
Application number: JP2077770A
Authority: JP
Inventors: Koji Ito; 伊東　晃治; Yoshiyuki Yamashita; 山下　義征
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1990-03-27
Filing date: 1990-03-27
Publication date: 1991-12-06

Abstract

PURPOSE: To eliminate the loss of character strokes and the collapse of blank spaces caused by normalization, by detecting the line width of a character pattern, thickening strokes whose width is at most a first fixed width, and thinning strokes whose width is at least a second fixed width. CONSTITUTION: A character segmenting part 10 segments character patterns one character at a time from the quantized image data of an input character medium. A character frame detecting part 12 detects the circumscribing frame of each character pattern and determines the character size of the pattern from the position of that frame. A normalization constant determining part 14 sets a normalization constant according to the character size, a normalizing part 16 normalizes the character pattern based on the normalization constant, and a recognizing part 18 recognizes the normalized character pattern. A line width detecting part 20 detects the line width of the character pattern. In a line width converting part 22, a thickening processing part 28 thickens the strokes of a character pattern whose line width is at most a first fixed width, and a thinning processing part 30 thins the strokes of a character pattern whose line width is at least a second fixed width.

Description

[Detailed Description of the Invention] (Industrial Field of Application) The present invention relates to a character recognition device.

(Prior Art) In general printed documents such as newspapers, books and magazines, the character size of the body text differs greatly from that of headings or titles. To recognize characters of these different sizes with the same device, and therefore with the same recognition method and circuitry, the size of the character pattern must be normalized.

One normalization method simply decimates (subsamples) the character pattern when the character size exceeds a reference value. This method converts the original character pattern into a pattern reduced by a normalization constant such as 1/2, 1/3 or 1/4, which speeds up the character recognition device and reduces the scale of its hardware.
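The simple decimation described above can be sketched as follows. This is a minimal illustration, not the device's circuit: it assumes a binary pattern stored as a list of rows (1 = black, 0 = white) and sampling that starts at the origin; the function name `decimate` is hypothetical.

```python
def decimate(pattern: list[list[int]], n: int) -> list[list[int]]:
    """Reduce a binary pattern to 1/n by keeping every n-th row and column.

    Sampling starts at index 0, i.e. at the origin of the X-Y grid.
    """
    return [row[::n] for row in pattern[::n]]


# 8x8 pattern with one black pixel at (row 4, col 4): it lands at (1, 1) after 1/4.
pattern = [[0] * 8 for _ in range(8)]
pattern[4][4] = 1
small = decimate(pattern, 4)
print(small)  # [[0, 0], [0, 1]]
```

Every pixel not on a sampled row and column is simply discarded, which is what makes the method fast and also what causes the problems described next.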

(Problem to Be Solved by the Invention) However, general printed documents usually use the Mincho typeface as the main typeface, and the body text is often about 3 mm in character size while headings and titles are about 12 mm or more.

The normalized pattern memory that stores a character pattern of 3 mm size usually has a capacity of 64 × 64 or 128 × 128 pixels, but to recognize characters of different sizes with the same recognition method and circuitry, a large 12 mm character pattern must be compressed (normalized) to 1/4 before it is stored in the normalized pattern memory.

On the other hand, the horizontal strokes of a 12 mm Mincho character are about 0.3 mm wide, so when the image of such a character is captured with a commonly used 300 dpi scanner, its horizontal strokes are about 3 pixels wide. Consequently, simply decimating the pattern of a 12 mm Mincho character to 1/4 can lose horizontal strokes. Headings and titles also often use characters with extremely thick strokes, such as Gothic or specially designed typefaces. In general, the thicker the strokes, the narrower the blank spaces between them, so with extremely thick strokes the spaces between strokes can become so narrow that they collapse.
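The figures above follow from converting millimetres to pixels at 300 dpi. The sketch below (hypothetical helper names) does that arithmetic and encodes the underlying condition: with 1/n decimation only every n-th row is kept, so a stroke narrower than n pixels can fall entirely between sampled rows. Note that 0.3 mm is about 3.5 pixels at 300 dpi, which the text rounds to 3.

```python
MM_PER_INCH = 25.4


def mm_to_px(length_mm: float, dpi: int = 300) -> float:
    """Convert a physical length in millimetres to scanner pixels."""
    return length_mm / MM_PER_INCH * dpi


def can_vanish(width_px: int, n: int) -> bool:
    """True if a run of width_px pixels can miss every n-th sampled row.

    A run of n or more consecutive rows always contains a multiple of n,
    so only runs narrower than the sampling period can be lost.
    """
    return width_px < n


print(round(mm_to_px(0.3), 2), round(mm_to_px(12.0), 1))  # 3.54 141.7
print(can_vanish(3, 4), can_vanish(4, 4))                  # True False
```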

These points are explained in more detail below with reference to the drawings. FIG. 5 illustrates the loss of horizontal strokes: FIG. 5(A) shows an example of a Mincho character, and FIG. 5(B) is an enlarged view of the horizontal stroke inside the dash-dot circle in FIG. 5(A), an example of a horizontal stroke 3 pixels wide.

In FIG. 5(B), the pixel positions at the time of scanning are represented by the grid points of the dotted lines, and the pixel positions after 1/4 decimation by the grid points of the solid lines. Black pixels (stroke portions) of the character pattern quantized into black-and-white binary values are represented by grid points marked with black dots, and white pixels (background portions) by unmarked grid points. On the pattern memory that stores the character pattern before normalization, an X axis representing the scanner's main scanning position and a Y axis representing its sub-scanning position are set, and dotted lines Tx and Ty parallel to the X and Y axes intersect to form dotted grid points at positions corresponding to the pixel positions read by the scanner.

When simply decimating to 1/4, every fourth dotted line Tx and Ty, counted from the origin of the X-Y coordinate system set on the pattern, is selected as a solid line Jx or Jy, and the white or black pixels at the grid points of these solid lines are adopted as-is as the pixels of the character pattern normalized by decimation.

Therefore, as shown in FIG. 5(B), when a horizontal stroke 3 pixels wide lies between adjacent solid lines Jx, no black pixels corresponding to that stroke remain in the decimated character pattern, and as a result the horizontal stroke is lost.
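The loss described for FIG. 5(B) can be reproduced with a small synthetic pattern. This sketch assumes a 16 × 16 binary pattern with a 3-pixel-thick horizontal stroke on rows 1 to 3, which lie strictly between the sampled rows 0 and 4; the pattern and names are illustrative only.

```python
def decimate(pattern, n):
    """Keep every n-th row and column, starting at the origin."""
    return [row[::n] for row in pattern[::n]]


SIZE = 16
# Horizontal stroke, 3 pixels thick (rows 1-3), lying strictly between the
# sampled rows 0 and 4, i.e. between two adjacent solid lines Jx.
pattern = [[1 if 1 <= y <= 3 and 2 <= x <= 13 else 0 for x in range(SIZE)]
           for y in range(SIZE)]

black_before = sum(map(sum, pattern))
black_after = sum(map(sum, decimate(pattern, 4)))
print(black_before, black_after)  # 36 0
```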

FIG. 6 illustrates the collapse of blank spaces between strokes: FIG. 6(A) shows an example of a Gothic character, and FIG. 6(B) is an enlarged view of the blank space between strokes at the portion indicated by the arrow in FIG. 6(A), an example of a blank 3 pixels wide.

In FIG. 6(B), components identical to those in FIG. 5(B) are given the same reference symbols, and their detailed description is omitted.

As shown in FIG. 6(B), when a blank 3 pixels wide lies between adjacent solid lines Jx, no white pixels corresponding to that blank remain in the decimated character pattern, and as a result the blank space between the strokes collapses.
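Similarly, the collapse shown in FIG. 6(B) can be reproduced. The synthetic pattern below (an illustrative assumption) has two thick vertical strokes separated by a 3-pixel gap on columns 5 to 7. After 1/4 decimation, the sampled columns 4 and 8 both fall on black pixels, so the gap disappears.

```python
def decimate(pattern, n):
    """Keep every n-th row and column, starting at the origin."""
    return [row[::n] for row in pattern[::n]]


SIZE = 16
# Two thick vertical strokes (columns 1-4 and 8-11) separated by a
# 3-pixel white gap on columns 5-7.
row = [1 if 1 <= x <= 4 or 8 <= x <= 11 else 0 for x in range(SIZE)]
pattern = [row[:] for _ in range(SIZE)]

print(pattern[0][5:8])            # [0, 0, 0]  gap present before decimation
print(decimate(pattern, 4)[0])    # [0, 1, 1, 0]  gap gone after decimation
```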

Thus, when a character pattern is normalized by simple decimation, strokes may be lost, or the blank spaces between strokes may collapse, in the normalized pattern whenever a large character has very thin or very thick strokes. This increases misreadings and rejects (unrecognizable characters) and degrades recognition accuracy.

The object of the present invention is to solve the conventional problems described above by providing a character recognition device that eliminates or reduces the loss of strokes and the collapse of blank spaces caused by normalization.

(Means for Solving the Problem) To achieve this object, the character recognition device of the present invention comprises a character segmentation section that segments character patterns one character at a time from quantized image data of a character medium; a character frame detection section that detects the circumscribing frame of each character pattern and determines the character size of the pattern from the position of the frame; a normalization constant determination section that sets a normalization constant according to the character size; a normalization section that normalizes the character pattern based on the normalization constant; and a recognition section that recognizes the normalized character pattern. The device is characterized by further comprising a line width detection section that detects the line width of the character pattern, and a line width conversion section that performs a first process of thickening the strokes of a character pattern whose line width is at most a first predetermined width, a second process of thinning the strokes of a character pattern whose line width is at least a second predetermined width, or both.

(Operation) With this configuration, the line width of the character pattern is detected, and a first process that thickens the strokes of a character pattern whose line width is at most a first predetermined width and/or a second process that thins the strokes of a character pattern whose line width is at least a second predetermined width is performed.

When a character pattern whose line width is at most the first predetermined width is detected, that pattern has thin strokes that may be lost after normalization; thickening those strokes prevents them from being lost after normalization.

When a character pattern whose line width is at least the second predetermined width is detected, that pattern has blank spaces between strokes that may collapse after normalization; thinning the strokes widens those spaces and prevents them from collapsing after normalization.

Therefore, performing the first process and/or the second process eliminates or reduces the loss of strokes and/or the collapse of blank spaces caused by normalization.

(Embodiment) An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a functional block diagram illustrating an embodiment of the invention. As shown in the figure, the character recognition device of this embodiment comprises a character segmentation section 10 that segments character patterns one character at a time from quantized image data of a character medium; a character frame detection section 12 that detects the circumscribing frame of each character pattern and determines the character size of the pattern from the position of the frame; a normalization constant determination section 14 that sets a normalization constant according to the character size; a normalization section 16 that normalizes the character pattern based on the normalization constant; and a recognition section 18 that recognizes the normalized character pattern. It further comprises a line width detection section 20 that detects the line width of the character pattern, and a line width conversion section 22 that performs both a first process of thickening the strokes of a character pattern whose line width is at most a first predetermined width and a second process of thinning the strokes of a character pattern whose line width is at least a second predetermined width.

In FIG. 1, reference numerals 24 and 26 denote a photoelectric conversion section and a pattern register, respectively.

This embodiment is described in more detail below.

The photoelectric conversion section 24 optically scans the form to be processed, receives the optical signal L from the form, converts it photoelectrically into image data in the form of an electrical signal, and quantizes the image data, for example, into black-and-white binary values.

The character segmentation section 10 stores the image data from the photoelectric conversion section 24 in an image memory (not shown), scans the image data to segment character patterns one character at a time, and stores each segmented character pattern in the pattern register 26.

The character frame detection section 12 scans the character pattern in the pattern register 26 and detects the circumscribing frame of that pattern. An X-Y coordinate system is set on the pattern register 26, and the top, bottom, left and right edge positions YT, YB, XL and XR of the circumscribing frame in this coordinate system are detected. The top and bottom positions YT and YB are the start and end positions of the frame in the Y direction, and the left and right positions XL and XR are its start and end positions in the X direction. The character frame detection section 12 calculates the height of the character pattern from YT and YB, and its width from XL and XR.
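A software sketch of the frame detection follows. It assumes the pattern is a list of rows of 0/1 pixels and that YT, YB, XL and XR are inclusive pixel indices, so that height = YB - YT + 1. The actual section 12 scans a register rather than a Python list, and the function names are hypothetical.

```python
def bounding_box(pattern):
    """Return (YT, YB, XL, XR) of the black pixels as inclusive indices, or None."""
    rows = [y for y, row in enumerate(pattern) if any(row)]
    if not rows:
        return None  # blank pattern: no circumscribing frame
    cols = [x for x in range(len(pattern[0])) if any(row[x] for row in pattern)]
    return rows[0], rows[-1], cols[0], cols[-1]


def height_and_width(box):
    """Height from YT/YB and width from XL/XR, counting pixels inclusively."""
    yt, yb, xl, xr = box
    return yb - yt + 1, xr - xl + 1


glyph = [[0] * 8 for _ in range(6)]
glyph[1][2] = glyph[4][6] = 1
box = bounding_box(glyph)
print(box, height_and_width(box))  # (1, 4, 2, 6) (4, 5)
```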

The normalization constant determination section 14 determines a normalization constant for each character from the height and width of its character pattern. Calling the character pattern stored in the pattern register 26 the character pattern of interest, the larger of its height and width is first taken as the character size S of that pattern. If it is known in advance that the minimum character size on the form being processed is, for example, 3 mm, the character size S is classified in 3 mm steps so that N = 1 when 3 ≤ S < 6, N = 1/2 when 6 ≤ S < 9, N = 1/3 when 9 ≤ S < 12, and in general N = 1/n when 3n ≤ S < 3(n + 1) (n a natural number). The normalization constant N corresponding to this class is assigned to the character pattern of interest.
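The classification rule can be written as a short function. It assumes S is expressed in millimetres (the conversion from the pixel height and width at 300 dpi is an added assumption, not spelled out in the text) and uses `fractions.Fraction` so that N = 1/n is exact. The function names are hypothetical.

```python
from fractions import Fraction

MM_PER_INCH = 25.4


def character_size_mm(height_px: int, width_px: int, dpi: int = 300) -> float:
    """Character size S: the larger of height and width, converted to mm (assumed)."""
    return max(height_px, width_px) * MM_PER_INCH / dpi


def normalization_constant(size_mm: float, step_mm: float = 3.0) -> Fraction:
    """N = 1/n for 3n <= S < 3(n+1), n a natural number."""
    n = int(size_mm // step_mm)
    if n < 1:
        raise ValueError("character size is below the assumed 3 mm minimum")
    return Fraction(1, n)


print(normalization_constant(3.0), normalization_constant(7.5),
      normalization_constant(character_size_mm(142, 100)))  # 1 1/2 1/4
```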

When the image data are obtained with a 300 dpi scanner, a character pattern of 3 mm size measures about 35 × 35 pixels and one of 6 mm size about 70 × 70 pixels. Assigning the normalization constant N as described above therefore normalizes every character pattern to between about 35 × 35 and 70 × 70 pixels regardless of character size, so the capacity of the pattern memory that stores the normalized character patterns can be kept within 128 × 128 pixels.
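The size range stated above can be checked numerically. This sketch assumes 300 dpi and the 3 mm classification step, sweeps S from 3 mm to 30 mm in 0.1 mm steps, and computes the pixel size of S × N. The bound it checks (under about 71 pixels, well within 128) is derived from the rule, not quoted from the text.

```python
DPI = 300
MM_PER_INCH = 25.4


def normalized_size_px(size_mm: float, step_mm: float = 3.0) -> float:
    """Pixel size of S * N at the scan resolution, with N = 1/n, n = floor(S / 3)."""
    n = int(size_mm // step_mm)
    return size_mm / n * DPI / MM_PER_INCH


sizes = [k / 10 for k in range(30, 301)]      # 3.0 mm to 30.0 mm in 0.1 mm steps
pixels = [normalized_size_px(s) for s in sizes]
print(round(min(pixels), 1), round(max(pixels), 1))  # 35.4 69.7
```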

The line width detection section 20 scans the character pattern of interest in the pattern register 26 and calculates the line width of that pattern.

The line width may be calculated by any suitable known method. In this embodiment, for example, the line width detection section 20 is given a shift-register configuration similar to a conventional filter circuit and calculates the line width W according to the approximation given as equation (1).

In equation (1), Q is the total number of positions at which all four pixels of a 2 × 2 window are black, and A is the total number of black pixels in the character pattern. The line width detection section 20 counts the positions at which all pixels of the 2 × 2 window are black and the black pixels in the character pattern to obtain the totals Q and A, and calculates the line width W according to equation (1).
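The approximation (1) itself did not survive in this copy of the text, so the sketch below assumes a common estimate built from the same two counts: W ≈ A / (A - Q). For a straight stroke of width w and length L, A = wL and Q = (w - 1)(L - 1), so A - Q = w + L - 1 and A / (A - Q) approaches w as L grows. Treat both the formula and the function name as assumptions.

```python
def estimate_line_width(pattern):
    """Hypothetical approximation W = A / (A - Q) for a 0/1 pattern."""
    height = len(pattern)
    width = len(pattern[0]) if height else 0
    a = sum(map(sum, pattern))            # A: total number of black pixels
    q = sum(                              # Q: 2x2 windows whose four pixels are all black
        1
        for y in range(height - 1)
        for x in range(width - 1)
        if pattern[y][x] and pattern[y][x + 1]
        and pattern[y + 1][x] and pattern[y + 1][x + 1]
    )
    if a == 0:
        return 0.0
    # A - Q >= 1 whenever A > 0: each all-black window can be charged to its
    # top-left pixel, and the last black pixel in raster order is never one.
    return a / (a - q)


bar = [[1] * 100 for _ in range(3)]        # straight stroke, 3 px wide, 100 px long
print(round(estimate_line_width(bar), 2))  # 2.94
```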

The line width conversion section 22 of this embodiment consists of a thickening processing section 28, a thinning processing section 30, a data switching section 32 and a process selection section 34.

Based on the line width W and the normalization constant N, the process selection section 34 decides, for the character pattern of interest in the pattern register 26, which of the following is to be used for character recognition: the character pattern of interest itself; the character pattern obtained by thickening its strokes with the first process (hereinafter, the thickened pattern); or the character pattern obtained by thinning its strokes with the second process (hereinafter, the thinned pattern).

To this end, the process selection section 34 first calculates the number of iterations MCNT for executing the first process and the number of iterations NCNT for executing the second process.

Each execution of the first process increases the line width by Mup. If repeating the first process MCNT times thickens the stroke so that its line width after normalization by N equals the first predetermined width C1 (for example, C1 = 3), C1 can be expressed by the following equation (1).

$$C_1 = N \times (\mathrm{MCNT} \times M_{up} + W) \qquad (1)$$

The line width increment Mup may be set, for example, to Mup = 2 when all white pixels adjacent to the edges of a stroke are replaced with black pixels, and to Mup = 1 when only the white pixels adjacent to the upper and right edges, or to the upper and left edges, of the stroke are replaced with black pixels.

Rearranging equation (1) gives equation (2).
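Equation (2) itself is not reproduced in this copy; solving equation (1) for MCNT, assuming N > 0 and Mup > 0, gives:

```latex
C_1 = N\,(\mathrm{MCNT}\cdot M_{up} + W)
\;\Longrightarrow\;
\mathrm{MCNT} = \frac{C_1/N - W}{M_{up}} \qquad (2)
```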

MCNT is an integer of 0 or more; when MCNT calculated from equation (2) is negative, MCNT = 0. When the calculated MCNT is not an integer, it is converted to an integer, for example by rounding up, rounding down or rounding to the nearest integer. Preferably, MCNT is set to the smallest integer for which the right-hand side of equation (1) is greater than the predetermined width C1.
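A minimal sketch of the MCNT calculation, using rounding up (one of the options listed above) and clamping negative values to 0. Rounding up gives the smallest MCNT for which the right-hand side of equation (1) is at least C1; this matches the preferred choice except when equation (2) already yields an exact integer. The function name is hypothetical.

```python
import math


def thickening_count(c1: float, n: float, w: float, m_up: float) -> int:
    """MCNT from equation (2), rounded up and clamped to be non-negative."""
    return max(math.ceil((c1 / n - w) / m_up), 0)


# N = 1/4, W = 3, C1 = 3, Mup = 2 -> (12 - 3) / 2 = 4.5 -> 5 passes
# N = 1,   W = 5, C1 = 3, Mup = 2 -> negative -> 0 passes
print(thickening_count(3, 0.25, 3, 2), thickening_count(3, 1, 5, 2))  # 5 0
```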

Each execution of the second process decreases the line width by Mdn. If repeating the second process NCNT times thins the stroke so that its line width after normalization by N equals the second predetermined width C2 (for example, C2 = 3), C2 can be expressed by the following equation (3).

$$C_2 = N \times (W - \mathrm{NCNT} \times M_{dn}) \qquad (3)$$

The line width decrement Mdn may be set, for example, to Mdn = 2 when all black pixels on the edges of a stroke are replaced with white pixels, and to Mdn = 1 when only the black pixels on the lower and right edges, or on the upper and left edges, of the stroke are replaced with white pixels.

Rearranging equation (3) gives equation (4).
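Equation (4) itself is not reproduced in this copy; solving equation (3) for NCNT, assuming N > 0 and Mdn > 0, gives:

```latex
C_2 = N\,(W - \mathrm{NCNT}\cdot M_{dn})
\;\Longrightarrow\;
\mathrm{NCNT} = \frac{W - C_2/N}{M_{dn}} \qquad (4)
```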

NCNT is an integer of 0 or more; when NCNT calculated from equation (4) is negative, NCNT = 0.

When the calculated NCNT is not an integer, it is converted to an integer, for example by rounding up, rounding down or rounding to the nearest integer. Preferably, NCNT is set to the largest integer for which the right-hand side of equation (3) is smaller than the predetermined width C2.
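The corresponding sketch for NCNT also rounds up (one of the listed options) and clamps at 0. It does not implement the preferred integer choice stated above. The function name is hypothetical.

```python
import math


def thinning_count(c2: float, n: float, w: float, m_dn: float) -> int:
    """NCNT from equation (4), rounded up and clamped to be non-negative."""
    return max(math.ceil((w - c2 / n) / m_dn), 0)


# N = 1,   W = 6, C2 = 3, Mdn = 2 -> (6 - 3) / 2 = 1.5 -> 2 passes
# N = 1/4, W = 3, C2 = 3, Mdn = 2 -> negative -> 0 passes
print(thinning_count(3, 1, 6, 2), thinning_count(3, 0.25, 3, 2))  # 2 0
```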

When the iteration counts MCNT and NCNT obtained as described above are both 0, or are both nonzero, the process selection section 34 outputs a first instruction signal indicating that the character pattern of interest in the pattern register 26 is to be used for character recognition.

When MCNT ≠ 0 and NCNT = 0, the line width of the pattern of interest is at most the first predetermined width C1, so the process selection section 34 outputs a second instruction signal indicating that the thickened pattern, whose strokes have been thickened by the first process, is to be used for character recognition.

Further, when NCNT ≠ 0 and MCNT = 0, the line width of the pattern of interest is at least the second predetermined width C2, so the process selection section 34 outputs a third instruction signal indicating that the thinned pattern, whose strokes have been thinned by the second process, is to be used for character recognition.
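The selection rule of the process selection section 34 reduces to a small decision function. In this sketch the return values 1, 2 and 3 stand for the first, second and third instruction signals. The function name is hypothetical.

```python
FIRST, SECOND, THIRD = 1, 2, 3  # first / second / third instruction signals


def select_signal(mcnt: int, ncnt: int) -> int:
    """Decide which pattern the data switching section 32 should pass on."""
    if mcnt != 0 and ncnt == 0:
        return SECOND  # use the thickened pattern
    if ncnt != 0 and mcnt == 0:
        return THIRD   # use the thinned pattern
    return FIRST       # both zero or both nonzero: use the pattern of interest
```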

On receiving the second instruction signal from the process selection section 34, the thickening processing section 28 thickens the strokes of the character pattern of interest. Like a conventional filter, the thickening processing section 28 of this embodiment has a shift-register configuration and performs the first process, which thickens the strokes, using, for example, a 3 × 3 window.

FIG. 2 shows the 3 × 3 window, and FIGS. 3(A) and 3(B) show a stroke before and after the first process. FIG. 3(A) shows a stroke (a vertical stroke) whose line width W is at most the predetermined width C1, together with all white pixels adjacent to its edges, and FIG. 3(B) shows the stroke obtained by thickening the stroke of FIG. 3(A) with the first process. In FIG. 3, open circles denote white pixels, filled circles denote black pixels, and open circles with a dot inside denote pixels changed from white to black.

When the pixel of interest a9 at the center of the 3 × 3 window (see FIG. 2) is white and any of the surrounding pixels a1 to a8 is black, the thickening processing section 28 changes a9 to black. In this way all white pixels adjacent to the edges of a stroke whose width is at most C1 become black, and the line width increases by 2 (see FIGS. 3(A) and 3(B)).

On receiving the third instruction signal from the process selection section 34, the thinning processing section 30 thins the strokes of the character pattern of interest. Like a conventional filter, the thinning processing section 30 of this embodiment has a shift-register configuration and performs the second process, which thins the strokes, using, for example, a 3 × 3 window.

FIGS. 4(A) and 4(B) show a stroke before and after the second process. FIG. 4(A) shows a stroke (a vertical stroke) whose line width W is at least the predetermined width C2, together with all white pixels adjacent to its edges, and FIG. 4(B) shows the stroke obtained by thinning the stroke of FIG. 4(A) with the second process. In FIG. 4, open circles denote white pixels, filled circles denote black pixels, and open circles with a cross inside denote pixels changed from black to white.

When the pixel of interest a9 at the center of the 3 × 3 window (see FIG. 2) is black, the thinning processing section 30 scans the surrounding pixels a1 to a8 in order. It counts the number of times H1 that a black pixel follows a white pixel and the number of times H2 that a white pixel follows a black pixel, and changes a9 to white when H1 + H2 = 2. In this way all black pixels on the edges of a stroke whose width is at least C2 become white, and the line width decreases by 2 (see FIGS. 4(A) and 4(B)).
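A sketch of one pass of the second process follows. The layout of a1 to a8 in FIG. 2 is not given in this text, so the sketch assumes they run clockwise from the top-left neighbour and that the scan wraps from a8 back to a1. Pixels outside the pattern are treated as white, and all decisions are made on the unmodified input. H1 + H2 = 2 means the ring of neighbours contains exactly one run of black pixels, i.e. the centre lies on the edge of a stroke; interior pixels (H1 + H2 = 0) are kept.

```python
# Assumed order of a1..a8 as (dy, dx): clockwise, starting at the top-left.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def thin(pattern):
    """One pass of the second process on a 0/1 pattern (hypothetical sketch)."""
    height = len(pattern)
    width = len(pattern[0]) if height else 0

    def px(y, x):
        return pattern[y][x] if 0 <= y < height and 0 <= x < width else 0

    out = [row[:] for row in pattern]
    for y in range(height):
        for x in range(width):
            if not pattern[y][x]:
                continue  # only black pixels a9 are candidates
            ring = [px(y + dy, x + dx) for dy, dx in NEIGHBOURS]
            # ring[i - 1] with i = 0 is ring[-1] = a8, so the scan wraps around.
            h1 = sum(1 for i in range(8) if ring[i - 1] == 0 and ring[i] == 1)
            h2 = sum(1 for i in range(8) if ring[i - 1] == 1 and ring[i] == 0)
            if h1 + h2 == 2:
                out[y][x] = 0
    return out


# Vertical stroke 5 pixels wide (columns 2-6) on rows 1-12.
stroke = [[1 if 1 <= y <= 12 and 2 <= x <= 6 else 0 for x in range(10)]
          for y in range(14)]
thinner = thin(stroke)
print(sum(stroke[6]), sum(thinner[6]))  # 5 3
```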

The data switching section 32 takes in the character pattern of interest from the pattern register 26 when it receives the first instruction signal from the process selection section 34, the thickened pattern produced by the thickening processing section 28 when it receives the second instruction signal, and the thinned pattern produced by the thinning processing section 30 when it receives the third instruction signal. It outputs the pattern it has taken in to the normalization section 16.

The normalization section 16 of this embodiment normalizes the character pattern of interest, thickened pattern or thinned pattern received from the data switching section 32 by decimation, as in the conventional method. When N = 1 it outputs the pattern received from the data switching section 32 to the recognition section 18 unchanged, and when 0 < N < 1 it outputs to the recognition section 18 the pattern reduced by a factor of N. The pattern output by the normalization section 16 is hereinafter called the normalized pattern.
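Combining the two steps shows the intended effect on the FIG. 5 case. This sketch (illustrative pattern and names) places a 3-pixel-thick horizontal stroke between sampled rows and applies the thickening pass MCNT = 5 times. That is the value equation (2) gives, with rounding up, for N = 1/4, W = 3, C1 = 3 and Mup = 2. The result is then decimated by 1/4.

```python
def decimate(pattern, n):
    """Keep every n-th row and column, starting at the origin."""
    return [row[::n] for row in pattern[::n]]


def thicken(pattern):
    """One pass of the first process: a white pixel with any black 8-neighbour becomes black."""
    h, w = len(pattern), len(pattern[0])
    return [
        [
            1 if pattern[y][x] or any(
                pattern[yy][xx]
                for yy in range(max(y - 1, 0), min(y + 2, h))
                for xx in range(max(x - 1, 0), min(x + 2, w))
            ) else 0
            for x in range(w)
        ]
        for y in range(h)
    ]


SIZE = 24
# 3-pixel-thick horizontal stroke on rows 9-11: between sampled rows 8 and 12.
pattern = [[1 if 9 <= y <= 11 and 6 <= x <= 17 else 0 for x in range(SIZE)]
           for y in range(SIZE)]

thick = pattern
for _ in range(5):          # MCNT = 5 passes of the first process
    thick = thicken(thick)

plain_black = sum(map(sum, decimate(pattern, 4)))
thick_black = sum(map(sum, decimate(thick, 4)))
print(plain_black, thick_black)  # 0 20
```

Without thickening, the stroke vanishes. After thickening, it survives as a band 4 samples high, i.e. roughly the normalized width C1 = 3 the calculation aimed for.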

The recognition section 18 extracts character features from the normalized pattern, matches them against a dictionary (not shown) to perform character recognition, and outputs the recognition result, for example a character code, to the next-stage device.

The first predetermined width C1 represents the average line width of the character lines in the normalized pattern of the thickened pattern. For which values, or which numerical range, of this average line width stroke dropout in the normalized pattern can be substantially eliminated is investigated statistically in advance, and C1 is set to any suitable value that substantially eliminates stroke dropout. Similarly, the second predetermined width C2 represents the average line width of the character lines in the normalized pattern of the thinned pattern. For which values, or which numerical range, of this average line width the collapse of spaces between character lines in the normalized pattern can be substantially eliminated is investigated statistically in advance, and C2 is set to any suitable value that substantially eliminates such collapse.

The present invention is not limited to the embodiment described above; the operation, configuration, and processing flow of each component, the numerical conditions, and so on may be modified as appropriate.

In the embodiment described above, which of the character pattern of interest, the thickened pattern, and the thinned pattern is used for character recognition is determined according to the process execution counts MCNT and NCNT. Alternatively, the line width W of the character pattern of interest may be compared with preset values and the pattern chosen according to the comparison result. For example, if W < 10, the thickened pattern obtained by applying the first process twice to the character pattern of interest is used for character recognition; if 20 ≤ W, the thinned pattern obtained by applying the second process twice is used; and if 10 ≤ W < 20, the character pattern of interest itself is used.
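The threshold rule in this example can be sketched as follows. The thresholds 10 and 20 and the repeat count of two come from the text; the function name, its parameters, and passing the two processes in as callables (standing in for the thickening unit 28 and the thinning unit 30) are assumptions made for illustration.

```python
from typing import Callable, TypeVar

P = TypeVar("P")  # any pattern representation


def select_pattern(pattern: P, w: int,
                   thicken: Callable[[P], P], thin: Callable[[P], P],
                   low: int = 10, high: int = 20, repeats: int = 2) -> P:
    """Choose the pattern passed on to recognition from its line width W.

    W < low          -> apply the first (thickening) process `repeats` times
    W >= high        -> apply the second (thinning) process `repeats` times
    low <= W < high  -> use the character pattern of interest unchanged
    """
    if w < low:
        step = thicken
    elif w >= high:
        step = thin
    else:
        return pattern
    for _ in range(repeats):
        pattern = step(pattern)
    return pattern
```

With string stand-ins for the two processes, `select_pattern("p", 5, lambda p: p + "+", lambda p: p + "-")` returns `"p++"`, W = 25 gives `"p--"`, and W = 15 returns `"p"` unchanged.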

In the embodiment described above, the same first and second predetermined widths C1 and C2 are used for every value of the normalization constant N; alternatively, C1 and C2 may be varied according to the magnitude of N. For example, C1 = C2 = 3 when N = 1, C1 = C2 = 4 when N = 1/2, C1 = C2 = 5 when N = 1/3, and so on. The values of the predetermined widths C1 and C2 may or may not be equal.
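A lookup of C1 and C2 by normalization constant can be sketched as a small table. Only the three example values from the text are included; the text continues the series with "and so on", so other values of N are deliberately left out rather than extrapolated. The table and function names are hypothetical.

```python
from fractions import Fraction

# (C1, C2) per normalization constant N, using only the example values
# given in the text; untabulated N raises KeyError instead of guessing.
WIDTHS_BY_N = {
    Fraction(1): (3, 3),
    Fraction(1, 2): (4, 4),
    Fraction(1, 3): (5, 5),
}


def predetermined_widths(n):
    """Return (C1, C2) for normalization constant n."""
    return WIDTHS_BY_N[Fraction(n)]
```

Keying the table by `Fraction` lets N = 1/3 be looked up exactly, which a floating-point key would not guarantee.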

A line width defined other than by equation (1) above may be used as the line width W, and conventionally known methods other than decimation may also be used to normalize the pattern.

(Effects of the Invention) As is clear from the above description, the character recognition device of the present invention performs a first process of thickening the character lines of a character pattern whose line width is equal to or less than the first predetermined width and/or a second process of thinning the character lines of a character pattern whose line width is equal to or greater than the second predetermined width.

When a character pattern whose line width is equal to or less than the first predetermined width is detected, the pattern has thin character lines that may drop out after normalization; thickening these lines prevents them from dropping out. Likewise, when a character pattern whose line width is equal to or greater than the second predetermined width is detected, the pattern has spaces between character lines that may collapse after normalization; thinning the character lines widens these spaces and prevents them from collapsing after normalization.

As a result, a character pattern with, for example, a large character size and thin lines can be normalized with no, or almost no, missing character lines, and a character pattern with a large character size and thick lines can be normalized with no, or almost no, collapse of the spaces between character lines. Misreading and rejection of unreadable characters are therefore reduced, and character recognition accuracy is improved.

[Brief Description of the Drawings]

Fig. 1 is a functional block diagram schematically showing the configuration of an embodiment of the present invention; Fig. 2 is a diagram showing a 3×3 window; Figs. 3(A) and 3(B) are diagrams showing character lines before and after the first process; Figs. 4(A) and 4(B) are diagrams showing character lines before and after the second process; Figs. 5(A) and 5(B) are diagrams explaining missing horizontal strokes; and Figs. 6(A) and 6(B) are diagrams explaining the collapse of spaces between character lines.

10: character segmenting unit; 12: character frame detection unit; 14: normalization constant determination unit; 16: normalization unit; 18: recognition unit; 20: line width detection unit; 22: line width conversion unit.

Claims

(1) A character recognition device comprising: a character segmenting unit that segments character patterns one character at a time from quantized image data of a character medium; a character frame detection unit that detects a character circumscribing frame for each character pattern and determines the character size of the character pattern from the position of the character circumscribing frame; a normalization constant determination unit that sets a normalization constant according to the character size; a normalization unit that normalizes the character pattern on the basis of the normalization constant; and a recognition unit that recognizes the normalized character pattern; the character recognition device further comprising: a line width detection unit that detects the line width of the character pattern; and a line width conversion unit that performs either or both of a first process of thickening the character lines of a character pattern whose line width is equal to or smaller than a first predetermined width and a second process of thinning the character lines of a character pattern whose line width is equal to or larger than a second predetermined width.

(2) The character recognition device according to claim 1, wherein the line width conversion unit varies the first and second predetermined widths according to the magnitude of the normalization constant.