JP2006127446A

JP2006127446A - Image processing device, image processing method, program, and recording medium

Info

Publication number: JP2006127446A
Application number: JP2005014033A
Authority: JP
Inventors: Hitoshi Ito; 仁志伊藤; Fumihiro Hasegawa; 史裕長谷川; Toshio Miyazawa; 利夫宮澤; Makoto Ishii; 信石井; Shigemasa Oba; 成征大羽; Takeshi Ogura; 武小倉
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2004-09-29
Filing date: 2005-01-21
Publication date: 2006-05-18

Abstract

PROBLEM TO BE SOLVED: To provide an image processing device capable of discriminating, with high accuracy, a color or grayscale image in which characters and non-character content (such as pictures) are mixed into a plurality of classes.

SOLUTION: The image processing device discriminates image information with a discriminator trained on learning data. It comprises: a feature amount extracting means that extracts feature amounts from the image information; a feature calculating means that calculates a feature formed by combining the feature amounts extracted by the feature amount extracting means; a learning means that trains the discriminator using both the feature calculated by the feature calculating means and the feature amounts extracted by the feature amount extracting means; a collating means that applies teacher data to the discriminator trained by the learning means and collates the discrimination result with an ideal discrimination result supplied externally; and an optimizing means that changes the method by which the feature calculating means combines feature amounts, based on the collation result of the collating means.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to an image processing apparatus, an image processing method, a program, and a recording medium. In particular, it relates to a discrimination technique that classifies color or monochrome images containing a mix of characters and photographs into a plurality of classes. The technique can be applied to character discrimination, object discrimination, region discrimination, and the like.

Character and document image recognition has long been part of image processing. To achieve high recognition accuracy, it is essential to obtain correct position information for the character regions occupied by character or document images within the image being processed.
For example, if character recognition is applied to a region of an image other than a character region, time is wasted on unnecessary processing. In addition, forcing recognition on a region that contains no characters produces a large number of errors.

For this reason, the technique described in Patent Document 1 works as follows:
- The input image is reduced.
- The circumscribed rectangles of connected components of black pixels are obtained.
- Based on these rectangles, the components are classified as characters, tables, figures, or other elements.
- Character elements are extracted and merged into lines, and the lines are merged to obtain character regions.

In addition, column (layout) information is extracted from the character regions. Character regions that have been merged excessively are then corrected by referring to the positions of the extracted columns.

With the recent spread of color printers and similar devices, character recognition from color originals is becoming more common. For such originals, the technique of Patent Document 1 cannot be applied unless the color image is first converted into a binary image.
Patent Document 2, Non-Patent Document 1, and Non-Patent Document 2 describe techniques that address this problem by extracting character regions from color images in which characters and photographs are mixed.

The technique described in Patent Document 2 proceeds as follows:
1. A compressed image is generated from the original image.
2. Pixels that can be regarded as having the same color are extracted as runs, and the connected components of these runs are obtained for each color.
3. The resulting connected components are treated as character candidates, and neighboring components are merged to generate character lines.
4. Over-extracted portions are removed from the extracted lines before the character lines are output.

This approach obtains character-region information without introducing the concept of a background. Because the pixel information of the color image is used directly, character regions can be extracted more accurately. The method can also cope with a background color that changes continuously.

Non-Patent Document 1 extracts character strings with high accuracy by relying on clusters in color space. It is based on the prior knowledge that characters have the same color and size. Similarly, Non-Patent Document 2 extracts character regions from color images such as magazine covers.

Patent Document 1: JP 2000-67158 A
Patent Document 2: JP 2002-288589 A
Non-Patent Document 1: H. Kasuga, M. Okamoto and H. Yamamoto, "Extraction of characters from color documents", Proceedings of the SPIE-The International Society for Optical Engineering, Vol. 3967, pp. 278-285, 2000.
Non-Patent Document 2: H. Hase, T. Shinokawa, M. Yoneda and C. Y. Suen, "Character string extraction from color documents", Pattern Recognition 34, pp. 1349-1365, 2001.

However, the technique described in Patent Document 2 has no concept of a background. As a result, clusters of non-character pixels arranged in a character-like pattern may be extracted together with the characters, even when they belong to the background.
Non-Patent Document 1 does not determine which cluster is a character string. Moreover, the images it handles are small ones such as poster cards, and its region discrimination targets relatively simple images in which characters and background are clearly distinguished.
Non-Patent Document 2 does not sufficiently separate character strings from non-character background noise.

The present invention has been made in view of the above circumstances. Its object is to provide:
- an image processing apparatus and an image processing method capable of discriminating, with high accuracy, a color or grayscale image in which characters and non-character images are mixed into a plurality of classes;
- a program for executing the functions of the image processing apparatus; and
- a computer-readable recording medium on which the program is recorded.

To solve the above problem, the invention according to claim 1 is an image processing apparatus that discriminates image information with a discriminator trained on learning data, comprising:
- feature amount extraction means for extracting feature amounts from the image information;
- feature calculation means for calculating a feature formed by combining the feature amounts extracted by the feature amount extraction means;
- learning means for training the discriminator using the feature amount calculated by the feature calculation means and the feature amounts extracted by the feature amount extraction means;
- collation means for applying teacher data to the discriminator trained by the learning means and collating the discrimination result with an ideal discrimination result given externally; and
- optimization means for changing the method by which the feature calculation means combines feature amounts, based on the collation result of the collation means.

The invention according to claim 2 is the image processing apparatus according to claim 1, wherein the optimization means optimally changes the method by which the feature calculation means combines feature amounts, by cross-validation analysis based on the collation result of the collation means.

The invention according to claim 3 is an image processing apparatus that discriminates image information with a discriminator trained on learning data, comprising:
- feature amount extraction means for extracting feature amounts from image information;
- feature conversion means for obtaining a projection axis from the feature amounts of a plurality of images extracted by the feature amount extraction means, and converting those feature amounts into kernel feature amounts using the projection axis;
- classification means for classifying the image information into categories based on the kernel features produced by the feature conversion means; and
- category-specific learning means for training, for each category produced by the classification means, a discriminator that discriminates images.
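Claim 3 does not say how the projection axis or the kernel features are obtained. One common way to derive a projection axis from a set of feature vectors through a kernel is kernel principal component analysis (kernel PCA). The sketch below assumes that technique, together with several further choices:
- a Gaussian (RBF) kernel;
- a single projection axis found by power iteration;
- two categories split by the sign of the projection;
- a nearest-centroid classifier per category as a stand-in for the discriminator.

The class name `KernelCategorizer` and the parameters `gamma`, `iters` and `seed` are hypothetical.

```python
import math
import random


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel between two feature vectors (kernel choice is an assumption)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class KernelCategorizer:
    """Hypothetical sketch: one kernel-PCA projection axis, sign-based categories,
    and a nearest-centroid discriminator trained separately for each category."""

    def __init__(self, gamma=0.5, iters=200, seed=0):
        self.gamma, self.iters, self.seed = gamma, iters, seed

    def fit(self, X, labels):
        n = len(X)
        self.X = [tuple(x) for x in X]
        K = [[rbf(a, b, self.gamma) for b in self.X] for a in self.X]
        self.col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
        self.all_mean = sum(self.col_mean) / n
        # Centre the kernel matrix (K is symmetric, so row means equal column means).
        Kc = [[K[i][j] - self.col_mean[i] - self.col_mean[j] + self.all_mean
               for j in range(n)] for i in range(n)]
        # Power iteration: the leading eigenvector of Kc defines the projection axis.
        rng = random.Random(self.seed)
        v = [rng.random() for _ in range(n)]
        lam = 0.0
        for _ in range(self.iters):
            w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
            lam = math.sqrt(sum(t * t for t in w))
            if lam == 0.0:
                raise ValueError("degenerate kernel matrix")
            v = [t / lam for t in w]
        # Usual kernel-PCA scaling: lam * ||alpha||^2 == 1.
        self.alpha = [t / math.sqrt(lam) for t in v]
        # Train one discriminator (a set of class centroids) per category.
        groups = {}
        for x, y in zip(self.X, labels):
            groups.setdefault(self.category(x), {}).setdefault(y, []).append(x)
        self.centroids = {
            cat: {y: [sum(col) / len(pts) for col in zip(*pts)] for y, pts in by_label.items()}
            for cat, by_label in groups.items()
        }
        return self

    def project(self, x):
        """Kernel feature amount: projection of x onto the learned axis."""
        n = len(self.X)
        k = [rbf(x, xi, self.gamma) for xi in self.X]
        k_mean = sum(k) / n
        return sum(a * (k[i] - k_mean - self.col_mean[i] + self.all_mean)
                   for i, a in enumerate(self.alpha))

    def category(self, x):
        return 0 if self.project(x) < 0 else 1

    def predict(self, x):
        cands = self.centroids.get(self.category(x))
        if not cands:  # category unseen in training: fall back to all centroids
            cands = {y: c for d in self.centroids.values() for y, c in d.items()}
        return min(cands, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, cands[y])))
```

Per-category training lets each discriminator specialise on one region of the kernel feature space. A real system would likely use more axes and more categories than this toy version.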

The invention according to claim 4 is the image processing apparatus according to claim 1, 2 or 3, wherein the feature amounts are acquired from line candidates in the image.
The invention according to claim 5 is the image processing apparatus according to claim 4, wherein the line candidates are obtained as follows: contiguous pixels of similar color are treated as connected components, and the circumscribed rectangles of those connected components are merged.
The invention according to claim 6 is the image processing apparatus according to claim 5, wherein features relating to the connected components are used as the feature amounts.
The invention according to claim 7 is the image processing apparatus according to claim 4 or 6, wherein moments of the acquired feature amounts are used as the feature amounts.
The invention according to claim 8 is the image processing apparatus according to claim 5, wherein a connected component and another connected component in its vicinity are regarded as belonging to the same line candidate, and are merged, when the color difference between them is small.
The invention according to claim 9 is the image processing apparatus according to claim 8, wherein, when other nearby connected components are merged, the line direction is assumed to be vertical or horizontal, and only connected components lying in that line direction are regarded as merge targets.
The invention according to claim 10 is the image processing apparatus according to claim 8 or 9, wherein the color difference is calculated using the average color of the pixels constituting each connected component.
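The following is a minimal sketch of line-candidate extraction in the spirit of claims 5 and 8 to 10. It rests on several assumptions:
- the image is an RGB list of rows of `(r, g, b)` tuples;
- pixels are 4-connected;
- lines are horizontal only;
- the thresholds `pix_tol`, `color_tol` and `max_gap` are hand-picked.

The function names and threshold values are hypothetical, since the claims fix none of them.

```python
import math
from collections import deque


def color_dist(c1, c2):
    return math.dist(c1, c2)  # Euclidean distance in RGB space


def connected_components(img, pix_tol=30.0):
    """Label 4-connected regions of similar color; return [(box, mean_color)].
    box = (top, left, bottom, right), inclusive pixel coordinates."""
    h, w = len(img), len(img[0])
    label = [[-1] * w for _ in range(h)]
    comps = []
    for sy in range(h):
        for sx in range(w):
            if label[sy][sx] != -1:
                continue
            idx = len(comps)
            label[sy][sx] = idx
            queue, pixels = deque([(sy, sx)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and label[ny][nx] == -1
                            and color_dist(img[y][x], img[ny][nx]) <= pix_tol):
                        label[ny][nx] = idx
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            mean = tuple(sum(img[y][x][c] for y, x in pixels) / len(pixels) for c in range(3))
            comps.append(((min(ys), min(xs), max(ys), max(xs)), mean))
    return comps


def line_candidates(comps, max_gap=2, color_tol=60.0):
    """Merge components lying on the same horizontal line whose mean colors are close."""
    parent = list(range(len(comps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (a, ca) in enumerate(comps):
        for j in range(i + 1, len(comps)):
            b, cb = comps[j]
            same_line = not (a[2] < b[0] or b[2] < a[0])  # vertical extents overlap
            gap = max(a[1], b[1]) - min(a[3], b[3]) - 1  # horizontal gap in pixels
            if same_line and gap <= max_gap and color_dist(ca, cb) <= color_tol:
                parent[find(i)] = find(j)
    merged = {}
    for i, (box, _) in enumerate(comps):
        r = find(i)
        t, l, btm, rt = merged.get(r, box)
        merged[r] = (min(t, box[0]), min(l, box[1]), max(btm, box[2]), max(rt, box[3]))
    return sorted(merged.values())
```

Adjacent pixels are compared pairwise, so a gradual color gradient can chain into one component. A practical implementation would probably need a stricter similarity test. Vertical lines would be handled by swapping the roles of rows and columns.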

The invention according to claim 11 is the image processing apparatus according to claim 4, wherein the line candidates are obtained by extracting contiguous pixels of similar lightness as connected components.
The invention according to claim 12 is the image processing apparatus according to claim 11, wherein a connected component and another connected component in its vicinity are regarded as belonging to the same line candidate, and are merged, when the lightness difference between them is small.
The invention according to claim 13 is the image processing apparatus according to claim 12, wherein, when other nearby connected components are merged, the line direction is assumed to be vertical or horizontal, and only connected components lying in that line direction are regarded as merge targets.
The invention according to claim 14 is the image processing apparatus according to claim 12 or 13, wherein the lightness difference is calculated using the average lightness of the pixels constituting each connected component.

The invention according to claim 15 is the image processing apparatus according to any one of claims 1 to 14, wherein the feature amount extraction means first generates a lower-resolution image and then extracts the feature amounts from that lower-resolution image.

The invention according to claim 16 is an image processing method for discriminating image information with a discriminator trained on learning data, comprising:
- a feature amount extraction step of extracting feature amounts from image information;
- a feature calculation step of calculating a feature formed by combining the feature amounts extracted in the feature amount extraction step;
- a learning step of training the discriminator using the feature amount calculated in the feature calculation step and the feature amounts extracted in the feature amount extraction step;
- a collation step of applying teacher data to the discriminator trained in the learning step and collating the discrimination result with an ideal discrimination result given externally; and
- an optimization step of changing the method of combining feature amounts in the feature calculation step, based on the collation result of the collation step.

The invention according to claim 17 is an image processing method for discriminating image information with a discriminator trained on learning data, comprising:
- a feature amount extraction step of extracting feature amounts from image information;
- a feature conversion step of obtaining a projection axis from the feature amounts of a plurality of images extracted in the feature amount extraction step, and converting those feature amounts into kernel feature amounts using the projection axis;
- a classification step of classifying the image information into categories based on the kernel features produced in the feature conversion step; and
- a category-specific learning step of training, for each category produced in the classification step, a discriminator that discriminates images.

The invention according to claim 18 is a program for causing a computer to execute the functions of the image processing apparatus according to any one of claims 1 to 15.
The invention according to claim 19 is a computer-readable recording medium on which the program according to claim 18 is recorded.

According to the present invention, a color or grayscale image in which characters and non-character images are mixed can be discriminated into a plurality of classes with high accuracy.

Preferred embodiments of the image processing apparatus according to the present invention are described below with reference to the drawings.
The image processing apparatus of the present invention has two parts. The learning part uses learning data to improve the discrimination accuracy of a discriminator. The discrimination part uses this discriminator to discriminate input images. The learning part and the discrimination part are described in detail below, in that order.

<Embodiment 1>
In the first embodiment, the user prepares the learning data and the teacher data used to train the discriminator.

(A) Learning Part
FIG. 1 is a block diagram showing the functional configuration of the learning part in the first embodiment. As shown in the figure, the learning part comprises the following means:
- feature amount extraction means 10;
- feature calculation means 20;
- learning means 30;
- collation means 40;
- optimization means 50;
- learning data storage means 11;
- teacher data storage means 12;
- feature amount storage means 13; and
- discriminator data storage means 14.

The learning data storage means 11 and the teacher data storage means 12 each store a plurality of pairs. Each pair consists of image data used for training the discriminator and the correct discrimination result for that image.
The feature amount extraction means 10 retrieves image data one image at a time from the learning data storage means 11. It extracts feature amounts from each image and stores them in the feature amount storage means 13 in association with the image data.
Feature amounts commonly used in image processing may be used here, such as the color, size, and number of colors of the image. It is desirable to extract a plurality of feature amounts.
The feature amount extraction means 10 likewise retrieves image data one image at a time from the teacher data storage means 12, extracts feature amounts from each image, and stores them in the feature amount storage means 13 in association with the image data. The feature amounts stored for each image in the feature amount storage means 13 are also marked as coming from either the learning data or the teacher data.
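The text lists color, size, and number of colors as typical feature amounts but does not define them precisely. As an illustration only, the sketch below assumes:
- an RGB image given as a list of rows of `(r, g, b)` tuples;
- "color" taken as the mean of each channel;
- "number of colors" taken as the count of distinct colors after coarse quantisation.

The function name `extract_features` and the parameter `levels` are hypothetical.

```python
def extract_features(img, levels=4):
    """Hypothetical feature vector: [height, width, mean R, mean G, mean B, number of colors]."""
    h, w = len(img), len(img[0])
    pixels = [p for row in img for p in row]
    mean = [sum(p[c] for p in pixels) / len(pixels) for c in range(3)]
    step = 256 // levels  # quantise each channel into `levels` bins before counting colors
    n_colors = len({tuple(v // step for v in p) for p in pixels})
    return [float(h), float(w), *mean, float(n_colors)]
```

Quantising before counting keeps near-identical shades, such as scanner noise, from inflating the color count.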

Furthermore, converting the image data to low-resolution image data before extracting feature amounts has two benefits. First, it reduces the processing time required for feature extraction. Second, it lessens the adverse effect of halftone-dot areas, in which a color is represented by a set of fine dots (one per color component) and which tend to become noise during feature extraction.
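The text does not say how the low-resolution image is produced. The sketch below assumes simple block averaging: each `factor x factor` block is replaced by its mean color, which turns a fine dot pattern into a uniform mid-tone. The function name `downsample` is hypothetical.

```python
def downsample(img, factor=2):
    """Average each factor x factor block of RGB pixels (edge blocks may be smaller)."""
    h, w = len(img), len(img[0])
    out = []
    for y0 in range(0, h, factor):
        row = []
        for x0 in range(0, w, factor):
            block = [img[y][x]
                     for y in range(y0, min(y0 + factor, h))
                     for x in range(x0, min(x0 + factor, w))]
            row.append(tuple(sum(p[c] for p in block) / len(block) for c in range(3)))
        out.append(row)
    return out
```

For example, a black and white checkerboard, a crude stand-in for a halftone, becomes a flat gray image at half resolution. The image also has a quarter of the pixels, so any later feature extraction touches fewer pixels.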

The feature calculation means 20 retrieves, for each image, the feature amounts extracted from the learning data and stored in the feature amount storage means 13. It combines them to calculate a new feature amount, for example by weighting each feature amount and summing the results, and stores the new feature amount in the feature amount storage means 13 in association with the image data.
The feature calculation means 20 processes the feature amounts extracted from the teacher data in the same way. For each image, it combines them (for example, by a weighted sum) into a new feature amount and stores that value in the feature amount storage means 13 in association with the image data.
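A minimal sketch of the weighted-sum combination given as the example above. The new feature amount is appended to the extracted ones, so the discriminator sees both, as the text describes. The function name `combine_features` is hypothetical.

```python
def combine_features(features, weights):
    """Append one new feature amount: the weighted sum of the extracted feature amounts."""
    if len(features) != len(weights):
        raise ValueError("one weight per feature amount is required")
    return list(features) + [sum(f * w for f, w in zip(features, weights))]
```

For instance, `combine_features([1.0, 2.0, 3.0], [0.5, 0.0, -1.0])` yields `[1.0, 2.0, 3.0, -2.5]`.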

The learning means 30 trains the discriminator. For each image, it retrieves the feature amounts extracted from the learning data stored in the feature amount storage means 13, together with the feature amount created by combining them. A multilayer neural network or a support vector machine, for example, is used as the discriminator.
When training on one image is complete, the discriminator's parameters are stored temporarily. They are then used as the starting parameters of the discriminator when it is trained on the next item of learning data.
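The text names a multilayer neural network or a support vector machine; neither is implemented here. As a deliberately simpler stand-in, the sketch below uses a single-layer perceptron. It illustrates the point made above: the parameters learned from one sample are carried over as the starting parameters for the next. Labels are assumed to be +1/-1, and the names `train_perceptron` and `predict` are hypothetical.

```python
def train_perceptron(samples, epochs=10, lr=1.0):
    """Online training on (feature_vector, label) pairs, label in {+1, -1}."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:
                # Misclassified: update; the updated (w, b) is carried to the next sample.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(params, x):
    w, b = params
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

On linearly separable data this converges to a separating hyperplane. Non-separable data would need the richer discriminators named in the text.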

The collation means 40 retrieves, for each image, the feature amounts extracted from the teacher data stored in the feature amount storage means 13, together with the new feature amount created by combining them. It applies them to the discriminator trained by the learning means 30 and counts how often the discrimination result matches the given correct answer, which yields the accuracy rate. The combination method, the discriminator's parameters, and the accuracy rate are then stored temporarily in association with one another.
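A minimal sketch of the collation step, assuming:
- the trained discriminator is any callable that maps a feature vector to a class;
- teacher data is a list of `(features, correct_answer)` pairs;
- the combination method is represented by its weights.

The names `accuracy_rate` and `collate` and the record keys are hypothetical.

```python
def accuracy_rate(discriminate, teacher):
    """Fraction of teacher samples whose discrimination result equals the correct answer."""
    if not teacher:
        raise ValueError("teacher data is empty")
    hits = sum(1 for features, answer in teacher if discriminate(features) == answer)
    return hits / len(teacher)


def collate(discriminate, teacher, weights, params):
    """Keep the combination method, discriminator parameters and accuracy rate together."""
    return {"weights": list(weights), "params": params,
            "accuracy": accuracy_rate(discriminate, teacher)}
```

Storing the weights alongside the rate is what later lets the optimization step pick the best combination method.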

The optimization means 50 has the feature calculation means 20 vary the method of combining feature amounts and selects the combination that gives the best collation result. For example, when a weighted sum is used as the combination method, each trial runs as follows:
1. The weight values are reset at random.
2. The weighted sum of the feature amounts extracted from the learning data storage means 11 is computed with the new weights.
3. The discriminator is retrained.
4. The data in the teacher data storage means 12 is applied to the discriminator to obtain the accuracy rate.

The optimization means 50 repeats this operation a predetermined number of times. It then adopts the combination method that yielded the highest accuracy rate (for example, by taking those weight values as the final weights), together with the discriminator obtained at that time. The combination method and the discriminator's parameters are stored in the discriminator data storage means 14.
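The random-restart search over weights described above can be sketched as follows. To keep the example self-contained, it assumes a nearest-centroid classifier in place of the neural network or SVM named in the text. The weights are drawn uniformly from [-1, 1] and the trial count is fixed; all names are hypothetical.

```python
import random


def combine(features, weights):
    """Extracted feature amounts plus their weighted sum."""
    return list(features) + [sum(f * w for f, w in zip(features, weights))]


def train_centroids(samples):
    """Stand-in discriminator: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def classify(centroids, x):
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def optimize_weights(learning, teacher, trials=20, seed=0):
    """Return ((weights, discriminator, accuracy) of the best trial, all accuracy rates)."""
    rng = random.Random(seed)
    dim = len(learning[0][0])
    best, history = None, []
    for _ in range(trials):
        weights = [rng.uniform(-1.0, 1.0) for _ in range(dim)]  # reset weights at random
        model = train_centroids([(combine(x, weights), y) for x, y in learning])
        hits = sum(classify(model, combine(x, weights)) == y for x, y in teacher)
        rate = hits / len(teacher)
        history.append(rate)
        if best is None or rate > best[2]:
            best = (weights, model, rate)
    return best, history
```

Random search is the simplest reading of "reset the weights at random"; a gradient-based or evolutionary search over the weights would be a natural alternative.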

Next, the processing flow of the learning part in the first embodiment is described with reference to the flowchart of FIG. 2.
Image data is retrieved one image at a time from the learning data storage means 11. Feature amounts are extracted from each image and stored in the feature amount storage means 13 in association with the image data. The same is done for the image data in the teacher data storage means 12 (step S1). Feature amounts commonly used in image processing, such as the color, size, and number of colors of the image, may be used here.

The feature amounts extracted from the learning data and the teacher data are retrieved for each image from the feature amount storage means 13. They are combined into a new feature amount, for example by weighting each feature amount and summing the results, and the new feature amount is stored in the feature amount storage means 13 in association with the image data (step S2).

For each image, the feature amounts extracted from the learning data stored in the feature amount storage means 13 are retrieved together with the feature amount created by combining them, and the discriminator is trained on them (step S3). A multilayer neural network or a support vector machine, for example, is used as the discriminator.

For each image, the feature amounts extracted from the teacher data stored in the feature amount storage means 13 are retrieved together with the feature amount created by combining them. They are applied to the trained discriminator, and the number of discrimination results that match the correct answer is counted. After the discriminator has been applied to all of the teacher data, the accuracy rate is computed. The combination method, the discriminator's parameters, and the accuracy rate are then stored temporarily in association with one another (step S4).

If the discriminator has not yet been trained a predetermined number of times (NO in step S5), the method of combining feature amounts is changed (step S6). The process then returns to step S2 to train the discriminator again with the new combination method. When a weighted sum is used as the combination method, the weight values are reset at random at this point.

If the discriminator has been trained the predetermined number of times (YES in step S5), the trial with the best collation result is selected. Its combination method and discriminator are stored in the discriminator data storage means 14 (step S7), and the process ends. When a weighted sum is used as the combination method, the weights that gave the highest accuracy rate are stored together with the discriminator's parameters.

(B) Discrimination Part
FIG. 3 is a block diagram showing the functional configuration of the discrimination part in the first embodiment. As shown in the figure, the discrimination part comprises feature amount extraction means 10, feature calculation means 20, discrimination means 60, and discriminator data storage means 14. Functions identical to those in FIG. 1 are given the same reference numerals, and their description is omitted.

The feature amount extraction means 10 extracts feature amounts from the input image data. As in the learning part, feature amounts commonly used in image processing, such as the color, size, and number of colors of the image, may be used. If the discriminator was trained on feature amounts extracted after conversion to low resolution, the input image data is likewise converted to low-resolution image data here before the feature amounts are extracted.

The feature calculation means 20 combines the feature amounts into a new feature amount, using the combination method stored in the discriminator data storage means 14. If the combination method is a weighted sum of the feature amounts, the stored weights are retrieved and used to compute the weighted sum.
The discrimination means 60 applies the feature amounts, together with the combined feature amount, to the discriminator stored in the discriminator data storage means 14, and outputs the discrimination result.
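At discrimination time, the stored combination method and the stored discriminator are simply reused. The sketch below assumes:
- the stored data is a dictionary holding the weighted-sum weights;
- the discriminator is represented by per-class centroids, a hypothetical stand-in for the trained network or SVM.

The function name `discriminate` and the dictionary keys are hypothetical.

```python
def discriminate(features, stored):
    """Apply the stored weights, then the stored (nearest-centroid) discriminator."""
    weights, centroids = stored["weights"], stored["centroids"]
    x = list(features) + [sum(f * w for f, w in zip(features, weights))]
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))
```

The same combination must be applied here as during training. Otherwise the appended feature would not match what the discriminator was trained on.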

With the first embodiment configured as described above, new features are created by combining feature amounts, and the combination method is optimized through supervised learning. This improves discrimination accuracy.

<Embodiment 2>
In the second embodiment, the first embodiment is carried out with cross-validation. The set of image data is divided into learning data and teacher data, and training is repeated while the way the data is divided is changed, so as to obtain the optimal discriminator.

(A) Learning part
FIG. 4 is a block diagram showing the functional configuration of the learning part in the second embodiment. In the figure, the learning part consists of the feature amount extraction unit 10, the feature calculation unit 20, the learning unit 30, the collation unit 40, the optimization unit 50, a division unit 70, an image data storage unit 15, and a division table 16. Functions identical to those in FIG. 1 are given the same reference numerals, and their description is omitted.

The image data storage unit 15 stores a plurality of pieces of image data that serve as learning data and teacher data. Each record consists of the following data items: file ID, file name, image data, correct value, feature amounts, and the new feature amounts generated by combining the feature amounts (see FIG. 5).

The division unit 70 divides the image data stored in the image data storage unit 15 into a predetermined number of groups (for example, three), so that each group contains roughly the same number of images, and stores the groups in the group table 161. It then generates combinations of these groups, each assigning groups to learning data and teacher data, and stores the combinations in the learning data set table 162.

For example, with three groups there are six ways to assign groups to the learning data: (group 1), (group 2), (group 3), (groups 1 and 2), (groups 1 and 3), and (groups 2 and 3). In each case the remaining groups serve as teacher data.
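The division into roughly equal groups, and the enumeration of learning/teacher assignments, can be sketched as follows. The round-robin split and the function names are assumptions. Each combination uses a non-empty proper subset of the groups as learning data and the remaining groups as teacher data, which produces the six combinations listed above when there are three groups.

```python
from itertools import combinations


def split_into_groups(file_ids, n_groups):
    """Round-robin split so that group sizes differ by at most one (assumed strategy)."""
    groups = {g: [] for g in range(1, n_groups + 1)}
    for i, fid in enumerate(file_ids):
        groups[i % n_groups + 1].append(fid)
    return groups


def learning_teacher_combinations(n_groups):
    """Every non-empty proper subset of groups as learning data; the rest is teacher data."""
    group_ids = list(range(1, n_groups + 1))
    combos = []
    for size in range(1, n_groups):
        for learn in combinations(group_ids, size):
            teach = tuple(g for g in group_ids if g not in learn)
            combos.append((learn, teach))
    return combos


for learn, teach in learning_teacher_combinations(3):
    print("learning:", learn, "teacher:", teach)
```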

The group table 161 stores, for each group, the group ID and a list of the file IDs of the image data belonging to that group (see FIG. 6).
The learning data set table 162 stores each combination produced when the groups are divided into learning data and teacher data, together with its learning results (see FIG. 7).

The learning data set table 162 consists of a combination ID, a list of the group IDs belonging to the learning data, a list of the group IDs belonging to the teacher data, and the learning results. The learning results are the feature amount combination method, the various parameters of the discriminator, the number of correct answers on the teacher data, and the number of teacher data.
The group table 161 (the group-file correspondence table) and the learning data set table 162 are recorded temporarily, as the division table 16, in a storage device such as a memory or a hard disk.

The feature amount extraction unit 10 extracts feature amounts from all the image data stored in the image data storage unit 15, and updates the image data storage unit 15 by associating them with the corresponding image data. As in the first embodiment, the features extracted are those used in general image processing, for example the color and size of the image and the number of colors. The feature amounts may also be extracted after the image data has been converted into low-resolution image data.

The optimization unit 50 searches the learning data set table 162 for a set of learning data that has not yet been processed. For that set (hereinafter, set A), it initializes the feature amount combination method, the number of correct answers, and the number of teacher data.
For example, when the combination method is the weighted sum of feature amounts described in the first embodiment, initial values are set for the weights.

The feature calculation unit 20 combines the feature amounts of all the image data stored in the image data storage unit 15 using the combination method specified for set A. It then updates the image data storage unit 15 by storing the results as new feature amounts associated with the corresponding image data.

The learning unit 30 refers to the learning data set table 162 to retrieve the group IDs of the learning data belonging to set A. It then refers to the group table 161 to identify the image data belonging to those group IDs (hereinafter, image data B).
Next, referring to the image data storage unit 15, the learning unit trains the discriminator using the feature amounts and new feature amounts corresponding to image data B. For example, a multilayer neural network or a support vector machine is used as the discriminator.
When training on one piece of image data B is completed, the various parameters of the trained discriminator are stored in the learning data set table 162 in association with set A.
The discriminator is then trained further using all the other learning data belonging to set A.

The collation unit 40 refers to the learning data set table 162 to retrieve the group IDs of the teacher data belonging to set A. It then refers to the group table 161 to identify the image data belonging to those group IDs (hereinafter, image data C).
Next, referring to the image data storage unit 15, the collation unit applies the feature amounts and new feature amounts corresponding to image data C to the discriminator associated with set A. If the discrimination result matches the correct answer, the number of correct answers is incremented, and the number of teacher data is also incremented by one.

When these operations are finished, the optimization unit 50 is invoked again. It checks whether all combinations of learning data registered in the learning data set table 162 have been processed.
If the optimization unit 50 finds an unprocessed combination, it treats that combination as the new set A and changes the feature amount combination method as appropriate. It initializes the discriminator, the number of correct answers, and the number of teacher data, and then trains a new discriminator. For example, when a weighted sum is used as the combination method, the combination method is changed by resetting the weight values at random.

When all combinations of learning data have been processed, the optimization unit 50 computes the accuracy rate = (number of correct answers / number of teacher data) for every entry registered in the learning data set table 162. It adopts the combination method and discriminator that produced the highest accuracy rate, and stores the adopted combination method and the discriminator's parameters in the discriminator data storage unit 14.
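One row of the learning data set table 162, and the final selection by accuracy rate, might be modeled as shown below. The field names are hypothetical stand-ins for the items listed for FIG. 7. The assumption that a row with no teacher data has an accuracy of 0 is ours and does not come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class LearningSetRecord:
    """Hypothetical model of one row of the learning data set table."""
    combination_id: int
    learning_groups: tuple
    teacher_groups: tuple
    weights: list = field(default_factory=list)  # feature amount combination method
    params: dict = field(default_factory=dict)   # discriminator parameters
    correct: int = 0                             # number of correct answers
    teacher_count: int = 0                       # number of teacher data

    @property
    def accuracy(self) -> float:
        # accuracy rate = correct answers / teacher data (0 if nothing was collated)
        return self.correct / self.teacher_count if self.teacher_count else 0.0


def select_best(records):
    """Adopt the row with the highest accuracy rate."""
    return max(records, key=lambda r: r.accuracy)
```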

Next, the processing flow of the learning part in the second embodiment will be described with reference to the flowchart of FIG. 8.
First, the image data stored in the image data storage unit 15 is divided into a predetermined number of groups so that each group contains roughly the same number of images, and the groups are stored in the group table 161 (see FIG. 6). Combinations of these groups, each assigning groups to learning data and teacher data, are then generated and stored in the learning data set table 162 (see FIG. 7) (step S10).

Feature amounts (for example, the color and size of the image and the number of colors) are extracted from all the image data stored in the image data storage unit 15. The image data storage unit 15 is then updated by associating them with the corresponding image data.

The learning data set table 162 is searched for an unprocessed set of learning data (hereinafter, set A) (step S12). If an unprocessed set A exists (NO in step S13), the feature amount combination method, the number of correct answers, and the number of teacher data are initialized for that set (step S14).

The feature amounts of all the image data stored in the image data storage unit 15 are combined using the combination method specified for set A. The image data storage unit 15 is then updated by storing the results as new feature amounts associated with the corresponding image data (step S15).

Referring to the learning data set table 162 and the group table 161, the discriminator is trained using the feature amounts and new feature amounts of the learning data belonging to set A. The parameters of the trained discriminator are stored in the learning data set table 162 in association with set A (step S16).

Referring to the learning data set table 162 and the group table 161, the feature amounts and new feature amounts of the teacher data belonging to set A are applied to the discriminator associated with set A. If the discrimination result matches the correct answer, the number of correct answers is incremented, and the number of teacher data is also incremented by one (step S17).

When these operations are finished, the process checks whether all combinations of learning data registered in the learning data set table 162 have been processed. If an unprocessed combination is found (NO in step S13), it is treated as the new set A. The feature amount combination method is changed as appropriate, and the discriminator, the number of correct answers, and the number of teacher data are initialized (step S14). A new discriminator is then trained and its number of correct answers on the teacher data is calculated (steps S15 to S17). This cycle repeats until no unprocessed combination remains.

When all combinations of learning data have been processed (YES in step S13), the accuracy rate = (number of correct answers / number of teacher data) is computed for every entry registered in the learning data set table 162. The combination method and discriminator that produced the highest accuracy rate are adopted, and the adopted combination method and the discriminator's parameters are stored in the discriminator data storage unit 14 (step S18). The processing of the learning part then ends.
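The loop of steps S10 to S18 can be sketched end to end as follows. This is a simplified illustration that rests on several assumptions. A nearest-centroid classifier stands in for the multilayer neural network or support vector machine named in the text. The weights of the weighted-sum combination method are redrawn at random for each combination. The groups are formed round-robin. The comments map each line to the flowchart step it approximates.

```python
from itertools import combinations

import numpy as np


def fit_nearest_centroid(X, y):
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])


def predict_nearest_centroid(model, X):
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]


def cross_validate(X, y, n_groups=3, n_combined=2, seed=0):
    """Return (accuracy, learning groups, weights, model) of the best combination."""
    rng = np.random.default_rng(seed)
    group_of = np.arange(len(X)) % n_groups  # S10: roughly equal groups (round-robin)
    best = None
    for size in range(1, n_groups):  # S12/S13: visit every learning/teacher combination
        for learn in combinations(range(n_groups), size):
            W = rng.normal(size=(n_combined, X.shape[1]))  # S14: (re)initialize weights at random
            Z = np.hstack([X, X @ W.T])  # S15: append weighted-sum features
            train = np.isin(group_of, learn)
            model = fit_nearest_centroid(Z[train], y[train])  # S16: train on learning data
            pred = predict_nearest_centroid(model, Z[~train])  # S17: collate on teacher data
            correct, total = int((pred == y[~train]).sum()), int((~train).sum())
            accuracy = correct / total
            if best is None or accuracy > best[0]:
                best = (accuracy, learn, W, model)
    return best  # S18: adopt the combination with the highest accuracy rate
```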

(B) Discrimination part
The discrimination part in this embodiment is configured in the same way as in the first embodiment, so its description is omitted.

With the second embodiment configured as described above, discrimination accuracy can be improved. New features are created by combining feature amounts, and the combination method is updated optimally through supervised learning. In addition, cross-validation selects the best combination of learning data and teacher data, so unbiased learning data can be used, which improves discrimination accuracy further.

<Embodiment 3>
The third embodiment uses the first and second embodiments to determine accurately whether a character line candidate extracted from image data is actually a character line.
For this purpose, the learning data used to train the discriminator is created by one of the following two methods.

(1) First method
Image data that constitutes character lines and other image data are prepared. Each piece of image data is labeled as either a character line or something else.
In the first embodiment, the user supplies this image data already separated into learning data and teacher data. In the second embodiment, the image data may be supplied as is.
Character line candidate extraction processing (described later) is then performed on the supplied image data. Feature amounts are extracted from the images of the extracted character line candidates and applied to each embodiment.

(2) Second method
When one piece of image data contains a plurality of character line candidates, character line candidate extraction processing is performed first. For each extracted character line candidate region, the user indicates whether it is a character line, and this indication is recorded as a pair with the image data of the candidate region.
Applying this operation to a plurality of pieces of image data creates the learning data for the discriminator. This learning data is divided into learning data and teacher data, either by the user or by cross-validation, and applied to each embodiment.

When the discriminator is used, character line candidates are extracted from the input image data. The feature amounts of each candidate image, together with the new feature amounts obtained by combining them, are then applied to the discriminator for discrimination.

Next, the character line candidate extraction process will be described.
A known technique can be applied to extract character line candidates, for example the one in Japanese Unexamined Patent Application Publication No. 2003-208568.
For example, horizontally adjacent pixels whose colors are close to each other are grouped into runs, which serve as processing units. Vertically adjacent runs are then compared, and runs with close colors are integrated into a connected component, for which a circumscribed rectangle is generated. In this way, a cluster of pixels that forms a character candidate can be extracted as one circumscribed rectangle. Various methods can be used to judge whether two colors are close. In one example, the sum of squared differences between the color components (RGB, etc.) of two pixel values is calculated and treated as the color difference between the pixels. The colors are judged to be close when this value is smaller than a threshold determined in advance from experimental values or the like.
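The run-based extraction described above can be sketched as shown below, assuming an H×W×3 RGB image held in a NumPy array. Several details are illustrative assumptions and are not taken from the cited publication: the threshold value, the use of each run's mean color when runs are compared, and union-find merging. Rectangles are returned as inclusive (top, left, bottom, right) tuples, and note that the background also forms a component.

```python
import numpy as np


def color_close(a, b, threshold):
    """Colors are 'close' if the sum of squared RGB differences is below threshold."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float((d * d).sum()) < threshold


def extract_rects(img, threshold=300.0):
    h, w, _ = img.shape
    # 1) Horizontal runs of adjacent pixels with close colors: (row, x0, x1, mean color).
    runs = []
    for y in range(h):
        x = 0
        while x < w:
            start = x
            while x + 1 < w and color_close(img[y, x], img[y, x + 1], threshold):
                x += 1
            runs.append((y, start, x, img[y, start:x + 1].astype(float).mean(axis=0)))
            x += 1

    # 2) Union-find over runs.
    parent = list(range(len(runs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    by_row = {}
    for i, run in enumerate(runs):
        by_row.setdefault(run[0], []).append(i)
    # Merge runs on vertically adjacent rows that overlap horizontally and have close colors.
    for i, (y, x0, x1, c) in enumerate(runs):
        for j in by_row.get(y + 1, []):
            _, u0, u1, c2 = runs[j]
            if u0 <= x1 and x0 <= u1 and color_close(c, c2, threshold):
                parent[find(i)] = find(j)

    # 3) Circumscribed rectangle of each connected component.
    rects = {}
    for i, (y, x0, x1, _) in enumerate(runs):
        r = find(i)
        if r in rects:
            t, l, b, rt = rects[r]
            rects[r] = (min(t, y), min(l, x0), max(b, y), max(rt, x1))
        else:
            rects[r] = (y, x0, y, x1)
    return list(rects.values())
```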

Next, adjacent circumscribed rectangles are integrated by judging their color similarity and the distance between them. This judgment is repeated, and the resulting integrated circumscribed rectangles are extracted as character line candidates.
Two rectangles are judged to be similar in color when the difference between the average colors, or between the representative colors, of the pixels they contain is smaller than a predetermined value. This improves the accuracy with which character line candidates are integrated, while limiting the influence of color unevenness in the pixels that make up a character line.
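Integrating adjacent rectangles by color similarity and distance can be sketched as a repeated pairwise merge. The thresholds `color_thr` and `max_gap` are hypothetical. The merged color is a running average weighted by the number of rectangles merged, which simplifies the per-pixel average described in the text.

```python
import numpy as np


def rect_gap(a, b):
    """Pixel gap between inclusive (top, left, bottom, right) rectangles; 0 if touching or overlapping."""
    dy = max(0, max(a[0], b[0]) - min(a[2], b[2]) - 1)
    dx = max(0, max(a[1], b[1]) - min(a[3], b[3]) - 1)
    return max(dx, dy)


def merge_rects(rects, colors, color_thr=900.0, max_gap=3):
    """Repeatedly merge pairs whose average colors are similar and which lie close together."""
    rects = list(rects)
    colors = [np.asarray(c, dtype=float) for c in colors]
    counts = [1] * len(rects)
    merged = True
    while merged:
        merged = False
        for i in range(len(rects)):
            for j in range(i + 1, len(rects)):
                diff = colors[i] - colors[j]
                if float(diff @ diff) < color_thr and rect_gap(rects[i], rects[j]) <= max_gap:
                    a, b = rects[i], rects[j]
                    rects[i] = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
                    colors[i] = (colors[i] * counts[i] + colors[j] * counts[j]) / (counts[i] + counts[j])
                    counts[i] += counts[j]
                    del rects[j], colors[j], counts[j]
                    merged = True
                    break
            if merged:
                break
    return rects
```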

The integration of circumscribed rectangles described above may also be restricted by the direction of the character lines, that is, by whether the text is written vertically or horizontally. For example, as preprocessing, the rectangle integration is performed on the entire image data to generate a group of character line candidates. The direction in which these candidates extend is decided by majority vote, and rectangles are then integrated only along the decided line direction.
Because preprocessing restricts the line direction to either vertical or horizontal, the accuracy of discriminating character lines from other content can be improved.
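The direction restriction might be implemented as shown below. A majority vote is taken over the preliminary candidates, and a test then allows merging only along the decided direction. The following choices are all assumptions: a wider-than-tall candidate counts as horizontal, ties are broken toward horizontal, and merging requires overlap on the perpendicular axis.

```python
def dominant_direction(candidates):
    """Majority vote over (top, left, bottom, right) line candidates."""
    horizontal = vertical = 0
    for top, left, bottom, right in candidates:
        w, h = right - left + 1, bottom - top + 1
        if w > h:
            horizontal += 1
        elif h > w:
            vertical += 1
    return "horizontal" if horizontal >= vertical else "vertical"


def can_merge_along(a, b, direction):
    """Allow merging only for rectangles that sit side by side along the line direction."""
    if direction == "horizontal":
        return min(a[2], b[2]) >= max(a[0], b[0])  # rows overlap
    return min(a[3], b[3]) >= max(a[1], b[1])  # columns overlap
```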

When the circumscribed rectangles are integrated, the integration may also be based on the similarity of the lightness inside the rectangles. In that case, character line candidates can be extracted effectively even from grayscale images.

A circumscribed rectangle may also be used as a feature amount. In this case, the width and height of the circumscribed rectangle may be used directly as feature amounts, or moments calculated from them may be used as new feature amounts. Alternatively, feature amounts such as the width and height of each connected component in a line may be obtained, and moments calculated over the feature amounts of all the connected components in the line may be used as new feature amounts.

In general, the n-th moment M(n) about the mean μ can be calculated by the following Equation 1.

$$M(n) = E\left((x - \mu)^{n}\right) \qquad \text{(Equation 1)}$$

Here, x is the feature amount, n is the order of the moment, E( ) denotes the mean, and μ is the mean value of the feature amount x.
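Equation 1 can be evaluated directly, as in the sketch below. The sketch also computes per-line moment features from the widths and heights of the connected components in a line. Using orders 2 and 3 is an assumption made for illustration.

```python
import numpy as np


def central_moment(values, n):
    """M(n) = E((x - mu)^n), the n-th moment about the mean."""
    x = np.asarray(values, dtype=float)
    mu = x.mean()
    return float(np.mean((x - mu) ** n))


def line_moment_features(component_sizes, orders=(2, 3)):
    """component_sizes: (width, height) of each connected component in one line."""
    sizes = np.asarray(component_sizes, dtype=float)
    feats = []
    for col in range(sizes.shape[1]):  # widths first, then heights
        for n in orders:
            feats.append(central_moment(sizes[:, col], n))
    return feats
```

For widths 1, 2, 3, 4 the mean is 2.5 and M(2) = (2.25 + 0.25 + 0.25 + 2.25) / 4 = 1.25.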

<Embodiment 4>
In the fourth embodiment, the learning data is clustered in advance, and learning is performed separately for each cluster to obtain optimal discriminators.

(A) Learning part
FIG. 9 is a block diagram showing the functional configuration of the learning part in the fourth embodiment. In the figure, the learning part consists of the feature amount extraction unit 10, a conversion axis derivation unit 80, a feature conversion unit 90, a classification unit 100, a cluster-specific learning unit 110, the image data storage unit 15, and a cluster-specific discriminator data storage unit 18. Functions identical to those in FIGS. 1 and 4 are given the same reference numerals, and their description is omitted.

The image data storage unit 15 stores a plurality of pieces of image data that serve as learning data. Each record consists of the following data items: file ID, file name, image data, correct value, feature amounts, converted feature amounts, and class (see FIG. 10).

The feature amount extraction unit 10 extracts feature amounts from all the image data stored in the image data storage unit 15, and updates the image data storage unit 15 by associating them with the corresponding image data. As in the above embodiments, the features extracted are those used in general image processing, for example the color and size of the image and the number of colors. Extracting the feature amounts after the image data has been converted into low-resolution image data reduces both the processing time and the noise caused by halftone dots.
If the feature amounts are extracted from character line candidates as described in the third embodiment, the accuracy of discriminating character lines from other content is expected to improve further.

The conversion axis derivation unit 80 performs kernel principal component analysis on the feature amounts of all the images stored in the image data storage unit 15 to obtain projection axes. Kernel principal component analysis efficiently computes the principal component axes of a high-dimensional feature space reached through a nonlinear mapping. A known technique can be used, for example that of Bernhard Scholkopf, Alexander Smola, and Klaus-Robert Muller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem", Neural Computation, 10, pp. 1299-1319, 1998. Kernels that can be used for kernel principal component analysis include the polynomial kernel, the Gaussian kernel, and the sigmoid kernel.

Kernel principal component analysis has the following advantages.
・The distribution of the data in the kernel feature space can be visualized.
・The expressive power varies with the kernel function (polynomial, Gaussian, sigmoid, etc.) and with its parameter values (the degree of the polynomial kernel, the variance of the Gaussian kernel, etc.).
・In particular, kernel principal component analysis with a polynomial kernel of degree 1 is equivalent to linear principal component analysis in the input space.
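A compact NumPy sketch of kernel principal component analysis follows, assuming the feature amounts are the rows of a matrix. It centers the kernel matrix in feature space, keeps the leading eigenvectors, and scales them so that the projections correspond to unit-length axes in the feature space. With a degree-1 polynomial kernel, the projections match linear PCA up to the sign of each component, which is consistent with the last advantage listed above. The class name and parameters are illustrative and do not represent any particular library's API.

```python
import numpy as np


def polynomial_kernel(A, B, degree=1, c=0.0):
    return (A @ B.T + c) ** degree


def gaussian_kernel(A, B, sigma=1.0):
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma ** 2))


class KernelPCA:
    def __init__(self, kernel, n_components):
        self.kernel = kernel
        self.n_components = n_components

    def fit(self, X):
        self.X = np.asarray(X, dtype=float)
        n = len(self.X)
        self.K = self.kernel(self.X, self.X)
        one = np.full((n, n), 1.0 / n)
        # Center the kernel matrix, i.e. center the mapped data in feature space.
        Kc = self.K - one @ self.K - self.K @ one + one @ self.K @ one
        vals, vecs = np.linalg.eigh(Kc)
        top = np.argsort(vals)[::-1][: self.n_components]
        # Scale eigenvectors so each projection axis has unit length in feature space.
        self.alphas = vecs[:, top] / np.sqrt(vals[top])
        return self

    def transform(self, Y):
        Y = np.asarray(Y, dtype=float)
        n, m = len(self.X), len(Y)
        Ky = self.kernel(Y, self.X)
        one_n = np.full((n, n), 1.0 / n)
        one_m = np.full((m, n), 1.0 / n)
        Kyc = Ky - one_m @ self.K - Ky @ one_n + one_m @ self.K @ one_n
        return Kyc @ self.alphas  # kernel feature amounts
```

A Gaussian variant would be created with `KernelPCA(lambda A, B: gaussian_kernel(A, B, sigma=2.0), n_components=2)`. The value of `sigma` is one of the parameters that, as noted above, changes the expressive power.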

The feature conversion unit 90 projects the feature amounts of all the images stored in the image data storage unit 15 onto the projection axes to obtain kernel feature amounts. It then updates the image data storage unit 15 by associating them with the corresponding image data.

The classification unit 100 clusters the kernel feature amounts of all the images stored in the image data storage unit 15, and updates the image data storage unit 15 by associating the resulting cluster with each piece of image data. A known method such as the k-means method or the k-nearest neighbor method is used for the clustering.
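For the clustering step, the sketch below shows a minimal k-means (Lloyd's algorithm) applied to the kernel feature amounts. The farthest-point initialization and the convergence test are assumptions. The `assign_to_nearest` helper implements the nearest-cluster-center rule that the discrimination part applies when the k-means method is used.

```python
import numpy as np


def assign_to_nearest(Z, centers):
    """Index of the nearest cluster center for each row of Z."""
    d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(Z, k, n_iter=100, seed=0):
    Z = np.asarray(Z, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: start from a random sample, then repeatedly
    # add the sample that is farthest from all centers chosen so far.
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(1, k):
        dist = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = assign_to_nearest(Z, centers)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign_to_nearest(Z, centers)
```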

The cluster-specific learning unit 110 retrieves the image data stored in the image data storage unit 15 cluster by cluster. It trains a discriminator for each cluster using the feature amounts of the retrieved images and stores the discriminators in the cluster-specific discriminator data storage unit 18. A multilayer neural network, a support vector machine, or the like is effective as the discriminator.
The projection axes derived by the conversion axis derivation unit 80 are also stored in the cluster-specific discriminator data storage unit 18.

Next, the processing flow of the learning part in the fourth embodiment will be described with reference to the flowchart of FIG. 11.
Feature amounts are extracted from all the image data stored in the image data storage unit 15, and the image data storage unit 15 is updated by associating them with the corresponding image data (step S20).
Kernel principal component analysis is performed on the feature amounts of all the images stored in the image data storage unit 15 to obtain the projection axes (step S21).
The feature amounts of all the images stored in the image data storage unit 15 are projected onto the projection axes to obtain kernel feature amounts, and the image data storage unit 15 is updated by associating them with the corresponding image data (step S22).

The kernel feature amounts of all the images stored in the image data storage unit 15 are clustered using a known method such as the k-means method or the k-nearest neighbor method. The image data storage unit 15 is then updated by associating each cluster with the corresponding image data (step S23).
The image data stored in the image data storage unit 15 is retrieved cluster by cluster, and a discriminator such as a multilayer neural network or a support vector machine is trained for each cluster. The discriminators are stored in the cluster-specific discriminator data storage unit 18 together with the derived projection axes (step S24).
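Step S24 can be sketched as training one classifier per cluster. Here a nearest-centroid classifier is an assumed stand-in for the neural network or support vector machine. The returned dictionary plays the role of the cluster-specific discriminator data storage unit 18, and the projection axes would be stored alongside it.

```python
import numpy as np


def fit_nearest_centroid(X, y):
    """Stand-in discriminator: one centroid per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])


def train_per_cluster(features, labels, clusters):
    """Train one discriminator per cluster; returns {cluster id: model}."""
    models = {}
    for c in np.unique(clusters):
        mask = clusters == c
        models[int(c)] = fit_nearest_centroid(features[mask], labels[mask])
    return models
```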

(B) Discrimination part
FIG. 12 is a block diagram showing the functional configuration of the discrimination part in the fourth embodiment. In the figure, the discrimination part consists of the feature amount extraction unit 10, the feature conversion unit 90, the classification unit 100, a cluster-specific discrimination unit 120, and the cluster-specific discriminator data storage unit 18. Functions identical to those in FIG. 9 are given the same reference numerals, and their description is omitted.

The feature conversion unit 90 projects the feature amounts onto the projection axes stored in the cluster-specific discriminator data storage unit 18 to calculate the cluster feature amounts (the kernel feature amounts).
The classification unit 100 assigns the kernel feature amounts to a cluster using a known method such as the k-means method or the k-nearest neighbor method. With the k-means method, for example, the distances to the cluster centers obtained in the learning part are calculated, and the features are assigned to the nearest cluster.
The cluster-specific discrimination unit 120 refers to the cluster-specific discriminator data storage unit 18 and applies the feature amounts of the image to the discriminator for the cluster selected by the classification unit 100. It then outputs a discrimination result.

Next, the processing flow of the discrimination part in the fourth embodiment will be described with reference to the flowchart of FIG. 13.
Feature amounts (for example, the color and size of the image and the number of colors) are extracted from the input image data (step S30).
The feature amounts are projected onto the projection axes stored in the cluster-specific discriminator data storage unit 18 to calculate the cluster feature amounts (step S31).
The kernel feature amounts are assigned to the nearest cluster using a known method such as the k-means method or the k-nearest neighbor method (step S32).
Referring to the cluster-specific discriminator data storage unit 18, the feature amounts of the image are applied to the discriminator for the cluster selected by the classification unit 100, and a discrimination result is output (step S33).
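Steps S31 to S33 can be sketched as follows. The sketch assumes that the stored data consists of a projection function, the cluster centers from the learning part, and one nearest-centroid model (classes, centroids) per cluster. As in the text, the image's own feature amounts, not the projected ones, are passed to the selected cluster's discriminator. All names are hypothetical.

```python
import numpy as np


def discriminate(x, project, centers, models):
    """Return (cluster id, discrimination result) for one feature vector x.

    project: callable mapping x to its kernel feature amounts (hypothetical)
    centers: cluster centers from the learning part, shape (k, d)
    models:  {cluster id: (classes, centroids)} nearest-centroid discriminators
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(project(x), dtype=float)  # S31: project the feature amounts
    cluster = int(((centers - z) ** 2).sum(axis=1).argmin())  # S32: nearest cluster
    classes, centroids = models[cluster]  # S33: that cluster's discriminator
    d = ((centroids - x) ** 2).sum(axis=1)  # applied to the original feature amounts
    return cluster, classes[d.argmin()]
```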

With the above configuration, discrimination accuracy can be improved by clustering the feature amounts after they have been mapped into a nonlinear space and generating a discriminator for each cluster.

<Embodiment 5>
Next, in the fifth embodiment, the fourth embodiment is applied to determine accurately whether a character line candidate extracted from image data, as described in the third embodiment, is actually a character line.

(1) First method
Image data constituting character lines and other image data are prepared. Each item is labeled as either a character line or not, and the labeled items serve as learning image data.

(2) Second method
When one piece of image data contains a plurality of character line candidates, the character line candidate extraction process is performed, and the user indicates for each extracted candidate image whether or not it is a character line. Each indication is recorded as a pair with the image data of that candidate. Applying this operation to a plurality of pieces of image data produces the learning image data for the discriminator.

When the fourth embodiment is applied, character line candidate extraction is performed on the image data created by method (1) or (2), feature amounts are extracted from the image regions of the extracted candidates, and the discriminator is trained on them.

<Embodiment 6>
Furthermore, the present invention is not limited to the above-described embodiments. The object of the present invention can also be achieved by programming each function of the image processing apparatus of the above-described embodiments, writing the programs in advance to a recording medium such as a CD-ROM, loading the CD-ROM into a computer equipped with a medium drive such as a CD-ROM drive, storing the programs in the computer's memory or storage device, and executing them.

The recording medium may be any of a semiconductor medium (for example, ROM or a nonvolatile memory card), an optical medium (for example, DVD, MO, MD, or CD-R), or a magnetic medium (for example, magnetic tape or a flexible disk).

Also included is the case where an operating system, application program, or the like performs part or all of the actual processing according to the instructions of the loaded program, and the functions of the above-described embodiments are realized by that processing.

In addition, when the above program is stored in a storage device such as a magnetic disk of a server computer and distributed by being downloaded to users' computers connected via a network, or by being delivered from the server computer, the storage device of the server computer is also included in the recording medium of the present invention.
By programming the functions of the present invention and recording and distributing them on a recording medium in this way, cost, portability, and versatility can be improved.

This example is an experiment in recognizing character lines in color document images containing a mix of text and photographs, using the above-described embodiments.

(1) Feature extraction
First, character line candidates are exhaustively extracted from a color document image. Candidates are obtained by successively grouping connected components of the same color in the horizontal and vertical directions based on color similarity and positional proximity. Here, a candidate that is truly a character line is called Positive, and any other candidate is called Negative. The candidates were labeled in advance by visual inspection.

In addition, the following feature amounts were assigned to each character line candidate:
・Feature amounts related to contrast.
・Feature amounts related to the sparseness of connected components within their circumscribed rectangles.
・Feature amounts calculated from quantities such as the number of circumscribed rectangles.

(2) Feature space
First, principal component analysis (PCA) was performed to visualize the distribution of the high-dimensional feature amounts in the feature space. 2,000 samples were drawn from each of Positive and Negative, normalized, and analyzed in two ways:

・Compute the principal components from the Positive data only, and project both Positive and Negative data onto them.
・Compute the principal components from the Negative data only, and project both Positive and Negative data onto them.

Plotting the incremental change in the cumulative contribution ratio for each method gives the result shown in FIG. 14.
For the Positive data, the first four principal components explain about 80% of the original high-dimensional feature space, whereas for the Negative data the components up to the sixth are needed to explain about 80%. This means that the Positive data distribution in the feature space can be described by relatively few principal components, that is, its spread across dimensions is relatively small.

Projecting the Negative data onto the first and second principal components of the Positive data confirmed that the Positive and Negative clusters are separable (see FIG. 15). The cumulative contribution ratio of these two components is 71.88%.

Next, scatter plots were produced by projecting the Positive data onto the first and fourth principal components of the Negative data (see FIG. 16) and onto the first and sixth principal components (see FIG. 17). These plots confirmed that the Negative data form several clusters.

These clusters fall into roughly three groups. From the factor loadings of the original feature amounts, the clusters were found to correspond to the shape of the character line candidate (cluster 1: horizontally long; cluster 2: vertically long; cluster 3: square).

This confirms that, in the high-dimensional feature space, Positive data are concentrated in a relatively low-dimensional region without spreading in many directions, whereas Negative data vary considerably and have a cluster structure determined by the shape of the character line candidate.

(3) Learning
Next, relying on the separability of the Positive and Negative clusters observed after PCA, several supervised learning algorithms were evaluated.
For learning, 208 color document images were prepared, yielding 19,588 Positive entries and 72,027 Negative entries. For evaluation (teacher data), 41 color document images were prepared, yielding 2,730 Positive entries and 35,495 Negative entries (see Table 1).

Before learning, because the high-dimensional feature amounts have different value ranges, each feature was normalized to mean 0 and variance 1 so that distances between feature vectors are measured accurately.
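The normalization above is a per-feature z-score. A minimal sketch follows, assuming the mean and standard deviation are computed on the learning data and reused unchanged for the teacher data (the text does not say which statistics were used for the teacher set); the function names are hypothetical.

```python
import numpy as np


def fit_standardizer(X_train: np.ndarray):
    """Per-feature mean and (population) standard deviation of the learning data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0          # leave constant features unscaled instead of dividing by zero
    return mean, std


def standardize(X: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Shift each feature to mean 0 and scale it to variance 1 using the fitted statistics."""
    return (X - mean) / std
```

Reusing the learning-set statistics keeps the teacher data in the same coordinate system the discriminator was trained in.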

(3-1) Multilayer perceptron (MLP)
First, learning was performed with a three-layer perceptron having one hidden layer between the input and output layers. Ten-fold cross-validation on a learning data set of 30,000 entries (Positive: 15,000; Negative: 15,000) showed that 80 hidden nodes, the parameter of the three-layer perceptron, was optimal.

After training on the learning data set and evaluating on the teacher data set with an output threshold of 0.5, 90.77% of the Positive entries and 95.60% of all character line candidates (Positive + Negative) were recognized correctly.
Drawing an ROC (receiver operating characteristic) curve by varying the threshold showed that raising the recognition accuracy for Positive lowers the recognition accuracy for Negative (see FIG. 18). Setting the MLP output threshold to 0.1 to bring the Positive recognition accuracy to 95% reduces the Negative recognition accuracy to 87%.

(3-2) Support vector machine (SVM)
Next, a support vector machine was trained and evaluated. Instead of the C parameter, a ν-SVC (ν-support vector classifier), whose parameter ν lies in the range 0 < ν < 1, was used. The polynomial kernel shown in (Equation 2) below was used as the kernel function.

The polynomial kernel parameter d and the ν-SVC parameter ν were determined by ten-fold cross-validation, as for the multilayer perceptron; d = 8 and ν = 0.12 were used.
As a result, 91.81% of the Positive data (true character lines) and 95.49% of all character line candidates (Positive + Negative) were recognized correctly.
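Equation 2 is not reproduced in this text, so the sketch below assumes the common polynomial kernel K(x, y) = (x·y + c)^d with c = 1. It shows only how a trained kernel SVM scores a candidate; ν is used during training, where it bounds the fraction of margin errors from above and the fraction of support vectors from below. The support vectors, dual coefficients, and bias would come from a trained ν-SVC; the values in the usage note are made up for illustration.

```python
from typing import Sequence


def poly_kernel(x: Sequence[float], y: Sequence[float], d: int = 8, c: float = 1.0) -> float:
    """Assumed polynomial kernel K(x, y) = (x . y + c)^d."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** d


def svm_decision(x: Sequence[float], support_vectors: Sequence[Sequence[float]],
                 dual_coefs: Sequence[float], bias: float, d: int = 8) -> float:
    """Kernel SVM decision value f(x) = sum_i (alpha_i * y_i) K(s_i, x) + b.
    dual_coefs hold alpha_i * y_i; f(x) > 0 is read as Positive (a character line)."""
    return sum(coef * poly_kernel(s, x, d) for s, coef in zip(support_vectors, dual_coefs)) + bias
```

With `support_vectors=[[1, 0], [0, 1]]`, `dual_coefs=[0.5, -0.5]`, `bias=0.0`, and `d=2`, the input `[1, 0]` scores 0.5·4 − 0.5·1 = 1.5. In practice a library learner would be used instead; for example, scikit-learn's `NuSVC` accepts `nu`, `kernel="poly"`, `degree`, `gamma`, and `coef0`, and its polynomial kernel is (γ·x·y + coef0)^degree.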

(3-3) Mixture of Experts (MoE)
Cluster analysis showed that character line candidates fall into three groups: horizontally long candidates, vertically long candidates, and square candidates containing only one character.
Error analysis of the MLP and SVM showed that horizontally long candidates are recognized correctly, whereas square candidates are often misrecognized. This is because a square candidate has a lower prior probability of being a character string than a horizontally long one, so with the current feature amounts square candidates tend to be judged as not being character strings.

Therefore, the MoE model, a divide-and-conquer algorithm that partitions the feature space and handles each part separately, was adopted. MoE consists of three components: expert networks that produce the output for an input x, a gating network that assigns an appropriate weight to each expert network for the input x, and a combining node that merges the outputs of the expert networks (FIG. 19).
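The three MoE components above combine as y(x) = Σ_k g_k(x)·y_k(x), where the gating weights g_k(x) sum to 1. A minimal sketch, assuming experts and gating network are plain callables (hypothetical stand-ins for trained networks); a softmax gate gives soft weights, while a one-hot gate selects a single expert.

```python
import math
from typing import Callable, List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def moe_output(x, experts: Sequence[Callable], gating: Callable) -> float:
    """Combining node: weight each expert's output by the gating network's weight for x."""
    weights = gating(x)                     # one weight per expert, summing to 1
    return sum(w * f(x) for w, f in zip(weights, experts))
```

With a one-hot gate such as `lambda x: [0.0, 1.0, 0.0]`, the combined output is exactly the second expert's output.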

Each entry of the learning data set is assigned according to the height H and width W of its character line candidate: to the learning data set for the square model when (1 - H/W)² < 0.1, to the learning data set for the vertically long model when H/W > 1, and otherwise to the learning data set for the horizontally long model.
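The assignment rule above checks the square condition first, so a candidate counts as square whenever H/W lies within about ±0.316 (√0.1) of 1, and only the remaining candidates are split by H/W > 1. A minimal sketch with hypothetical function names:

```python
from typing import Dict, List, Sequence, Tuple


def shape_model(H: float, W: float) -> str:
    """Assign a character line candidate to the square, vertical, or horizontal model."""
    r = H / W
    if (1 - r) ** 2 < 0.1:
        return "square"
    if r > 1:
        return "vertical"
    return "horizontal"


def split_by_shape(candidates: Sequence[Tuple[float, float, object]]) -> Dict[str, List[object]]:
    """Partition (H, W, sample) triples into one learning data set per model."""
    groups: Dict[str, List[object]] = {"square": [], "vertical": [], "horizontal": []}
    for H, W, sample in candidates:
        groups[shape_model(H, W)].append(sample)
    return groups
```

For example, a 12 x 10 candidate (H/W = 1.2) is still treated as square, while a 20 x 10 candidate goes to the vertically long model.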

Furthermore, a new feature amount, the "outline score", which can be assigned for each character line candidate shape (horizontally long, vertically long, or square model), was added. Let h and w be the height and width of a circumscribed rectangle, H and W the height and width of the character line candidate, a the number of connected characters, and A the area of the circumscribed rectangle, and define the variables v_1, ..., v_6 as follows:

$$ v_1 = \frac{h}{H},\quad v_2 = \frac{h}{w},\quad v_3 = \frac{w}{h},\quad v_4 = \frac{w}{W},\quad v_5 = \frac{a}{A},\quad v_6 = \frac{A}{a} $$

With weight parameters w_1, ..., w_6, the weighted sum

$$ S = \sum_{i=1}^{6} v_i w_i $$

computed for the circumscribed rectangles constituting the character line candidate is used as a feature amount of the candidate.
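The outline score for one circumscribed rectangle follows directly from the definitions above. The sketch computes it per rectangle; how the scores of the several rectangles in a candidate are aggregated is not specified in the text, so that step is left out. Function names are hypothetical.

```python
from typing import List, Sequence


def outline_variables(h: float, w: float, H: float, W: float, a: float, A: float) -> List[float]:
    """v_1..v_6 as defined in the text (h, w: rectangle; H, W: line candidate; a, A as given)."""
    return [h / H, h / w, w / h, w / W, a / A, A / a]


def outline_score(h: float, w: float, H: float, W: float, a: float, A: float,
                  weights: Sequence[float]) -> float:
    """Weighted sum S = sum_i v_i * w_i for one circumscribed rectangle."""
    v = outline_variables(h, w, H, W, a, A)
    return sum(vi * wi for vi, wi in zip(v, weights))
```

For h = 10, w = 20, H = 10, W = 100, a = 2, and A = 200, the variables are 1.0, 0.5, 2.0, 0.2, 0.01, and 100.0, so with all weights equal to 1 the score is 103.71.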

The weights w_i were chosen at random so as to maximize recognition accuracy under 4-fold cross-validation, and the learning data set assigned to each model (square, horizontally long, vertically long) was trained with ν-SVC. The parameters of each model (d and ν) were again determined by 10-fold cross-validation.
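Choosing the weights "at random so as to maximize accuracy" amounts to a random search. A minimal sketch, assuming weights are drawn uniformly from [-1, 1] (the sampling range and trial count are assumptions) and that `cv_accuracy` is a hypothetical callback that runs 4-fold cross-validation with the given weights and returns the mean accuracy.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def random_search_weights(cv_accuracy: Callable[[Sequence[float]], float],
                          n_trials: int = 200, dim: int = 6,
                          low: float = -1.0, high: float = 1.0,
                          seed: int = 0) -> Tuple[Optional[List[float]], float]:
    """Sample weight vectors at random and keep the one with the best cross-validated accuracy."""
    rng = random.Random(seed)
    best_w: Optional[List[float]] = None
    best_acc = float("-inf")
    for _ in range(n_trials):
        w = [rng.uniform(low, high) for _ in range(dim)]
        acc = cv_accuracy(w)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```

Because the same seed reproduces the same sequence of samples, running more trials can only keep or improve the best accuracy found.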

At recognition time, the gating network classifies the image data of a character line candidate into one of the candidate shapes (horizontally long, vertically long, or square), and the output of the expert network corresponding to that shape is used as the final output. As a result, 91.6% of the Positive data (true character strings) and 96.46% of all character line candidates (Positive + Negative) were recognized correctly.

As described above, applying supervised learning methods such as SVM or MoE after image processing and feature extraction makes it possible to recognize character line regions in complex color documents with high accuracy.

FIG. 1 is a block diagram showing the functional configuration of the learning part in the first embodiment.
FIG. 2 is a flowchart showing the processing flow of the learning part in the first embodiment.
FIG. 3 is a block diagram showing the functional configuration of the determination part in the first embodiment.
FIG. 4 is a block diagram showing the functional configuration of the learning part in the second embodiment.
FIG. 5 shows an example data structure of the image data storage means in the second embodiment.
FIG. 6 shows an example data structure of the group table in the second embodiment.
FIG. 7 shows an example data structure of the learning data set table in the second embodiment.
FIG. 8 is a flowchart showing the processing flow of the learning part in the second embodiment.
FIG. 9 is a block diagram showing the functional configuration of the learning part in the fourth embodiment.
FIG. 10 shows an example data structure of the image data storage means in the fourth embodiment.
FIG. 11 is a flowchart showing the processing flow of the learning part in the fourth embodiment.
FIG. 12 is a block diagram showing the functional configuration of the determination part in the fourth embodiment.
FIG. 13 is a flowchart showing the processing flow of the determination part in the fourth embodiment.
FIG. 14 is a graph showing the incremental change in the cumulative contribution ratio for each of the two principal component analysis methods.
FIG. 15 is a scatter plot showing the separability of the Positive and Negative clusters when the Negative data are projected onto the first and second principal components of the Positive data.
FIG. 16 is a scatter plot of the Positive data projected onto the first and fourth principal components of the Negative data.
FIG. 17 is a scatter plot of the Positive data projected onto the first and sixth principal components.
FIG. 18 is a graph showing the relationship between the threshold and recognition accuracy for the MLP.
FIG. 19 is a diagram explaining the structure of MoE.

Explanation of symbols

10 ... feature amount extraction means, 20 ... feature calculation means, 30 ... learning means, 40 ... collation means, 50 ... optimization means, 60 ... discrimination means, 70 ... dividing means, 11 ... learning data storage means, 12 ... teacher data storage means, 13 ... feature amount storage means, 14 ... discriminator data storage means, 15 ... image data storage means, 16 ... division table, 161 ... group table, 162 ... learning data set table, 80 ... conversion axis derivation means, 90 ... feature conversion means, 100 ... classification means, 110 ... cluster-specific learning means, 120 ... cluster-specific discrimination means, 18 ... cluster-specific discriminator data storage means.

Claims

1. An image processing apparatus that discriminates image information using a discriminator trained on learning data, comprising: a feature amount extraction means for extracting feature amounts from image information; a feature calculation means for calculating a feature that is a combination of the feature amounts extracted by the feature amount extraction means; a learning means for training the discriminator using the feature calculated by the feature calculation means and the feature amounts extracted by the feature amount extraction means; a collation means for collating a discrimination result, obtained by applying teacher data to the discriminator trained by the learning means, with an ideal discrimination result given externally; and an optimization means for changing the method of combining feature amounts in the feature calculation means based on the collation result of the collation means.

2. The image processing apparatus according to claim 1, wherein the optimization means optimally changes the method of combining feature amounts in the feature calculation means by cross-validation based on the collation result of the collation means.

3. An image processing apparatus that discriminates image information using a discriminator trained on learning data, comprising: a feature amount extraction means for extracting feature amounts from image information; a feature conversion means for obtaining a projection axis from the feature amounts of a plurality of images extracted by the feature amount extraction means and converting the feature amounts into kernel feature amounts using the projection axis; a classification means for classifying image information into categories based on the kernel feature amounts converted by the feature conversion means; and a category-specific learning means for training, for each category classified by the classification means, a discriminator that discriminates images.

4. The image processing apparatus according to claim 1, wherein the feature amounts are acquired from row candidates in the image.

5. The image processing apparatus according to claim 4, wherein the row candidates are obtained by integrating continuous pixels of similar color into connected components and integrating the circumscribed rectangles of the connected components.

6. The image processing apparatus according to claim 5, wherein features relating to the connected components are used as the feature amounts.

7. The image processing apparatus according to claim 4, wherein moments of the acquired feature amounts are used as the feature amounts.

8. The image processing apparatus according to claim 5, wherein, when another connected component near a connected component has a small color difference from it, both are regarded as belonging to the same row candidate and integrated.

9. The image processing apparatus according to claim 8, wherein, when nearby connected components are integrated, the row direction is assumed to be vertical or horizontal, and only connected components lying in the row direction are regarded as targets for integration.

10. The image processing apparatus according to claim 8, wherein the color difference is calculated using the average color of the pixels constituting each connected component.

11. The image processing apparatus according to claim 4, wherein, for the row candidates, pixels that are continuous in lightness are extracted as connected components.

12. The image processing apparatus according to claim 11, wherein, when another connected component near a connected component has a small lightness difference from it, both are regarded as belonging to the same row candidate and integrated.

13. The image processing apparatus according to claim 12, wherein, when nearby connected components are integrated, the row direction is assumed to be vertical or horizontal, and only connected components lying in the row direction are regarded as targets for integration.

14. The image processing apparatus according to claim 12, wherein the lightness difference is calculated using the average lightness of the pixels constituting each connected component.

15. The image processing apparatus according to claim 1, wherein the feature amount extraction means generates a reduced-resolution image and then extracts the feature amounts from the reduced-resolution image.

16. An image processing method for discriminating image information using a discriminator trained on learning data, comprising: a feature amount extraction step of extracting feature amounts from image information; a feature calculation step of calculating a feature that is a combination of the feature amounts extracted in the feature amount extraction step; a learning step of training the discriminator using the feature calculated in the feature calculation step and the feature amounts extracted in the feature amount extraction step; a collation step of collating a discrimination result, obtained by applying teacher data to the discriminator trained in the learning step, with an ideal discrimination result given externally; and an optimization step of changing the method of combining feature amounts in the feature calculation step based on the collation result of the collation step.

17. An image processing method for discriminating image information using a discriminator trained on learning data, comprising: a feature amount extraction step of extracting feature amounts from image information; a feature conversion step of obtaining a projection axis from the feature amounts of a plurality of images extracted in the feature amount extraction step and converting the feature amounts into kernel feature amounts using the projection axis; a classification step of classifying image information into categories based on the kernel feature amounts converted in the feature conversion step; and a category-specific learning step of training, for each category classified in the classification step, a discriminator that discriminates images.

18. A program for causing a computer to realize the functions of the image processing apparatus according to any one of claims 1 to 15.

19. A computer-readable recording medium on which the program according to claim 18 is recorded.