JP2005107931A - Image search apparatus

Info

Publication number: JP2005107931A
Application number: JP2003341188A
Authority: JP
Inventors: Masajiro Iwasaki; 雅二郎岩崎
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2003-09-30
Filing date: 2003-09-30
Publication date: 2005-04-21

Abstract

PROBLEM TO BE SOLVED: To provide an image search apparatus and an image search method that search many documents for an image and recognize an image from the document data.

SOLUTION: The image search apparatus comprises a registration processing means 11 for extracting text information indicating contents of images included in a part of documents from a text in the documents by natural language processing, and extracting feature quantities indicating the contents of the images from image data on the part of documents and the other part, a database 12 for saving the text information about the part of documents, the feature quantities about the part of documents and the other part, and the images, search processing means 13 for searching the text in the database in response to a search request from a user, and searching the feature quantities in the database according to the feature quantities of searched images, and a user interface 14 by which the user issues the search request and which presents the search result by the search processing means to the user.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to an image search apparatus and an image search method that search for images in a large number of documents and recognize images based on the document data.

Patent Document 1 relates to an image search apparatus, a method for generating key text for image search, a program for causing a computer to function as the apparatus, and a computer-readable recording medium storing a program for causing a computer to execute the method. With this technique, images in a document can be searched for by text.

Patent Documents 2, 3, 4, and 5 disclose conventional techniques for performing image search using text data and image feature amounts.

JP 11-25113 A
JP 2001-84274 A
JP 2000-48041 A
JP 2000-250943 A
JP 2000-76287 A

However, the search technique of Patent Document 1 cannot retrieve an image well when appropriate text has not been assigned to it.

Further, in the techniques of Patent Documents 2 to 5, even when both text and image feature amounts are used, the search relies on the text specified by the user and the image feature amount of the image specified by the user, so an image is not retrieved unless its text matches or its image feature amount is similar. Moreover, if the text of a registered image is not set appropriately, the image is not retrieved unless its image feature amount is similar.

In practice, the semantic content of images is often similar even when their image feature amounts are not, and in such cases the techniques of Patent Documents 2 to 5 are of no use.

An object of the present invention is to make it possible to retrieve even images whose registration is not properly set up (for example, registered images whose text is not set appropriately), by searching by text as a first stage in response to a search request from a user and then, as a second stage, searching by the feature amounts attached to the images found by the text search.

The present inventor arrived at the present invention on finding that performing a two-stage search process in response to a single search request from the user makes it possible to extract the image the user intends, and that this image can be extracted not by simply searching with both the text specified by the user and the image feature amount of an image specified by the user as keys, but by searching with the image feature amounts of the images obtained from a search keyed on the text.

An image search apparatus according to a first aspect of the present invention searches for a predetermined image from a plurality of documents including images, and comprises: text information extraction means for extracting, for a part of the documents, text information indicating the contents of the images included in those documents from the text in the documents by natural language processing; image feature amount extraction means for extracting feature amounts indicating the contents of images from the image data of that part of the documents and of another part of the documents; a database for storing the text information for that part of the documents, and the feature amounts and the images for that part and the other part of the documents; search request issuing means by which a user issues a search request by text or keyword; text search processing execution means for searching the text in the database in response to the search request; image search processing execution means for searching the database for feature amounts based on the feature amounts of the images found and retrieving images with similar feature amounts; and search result presentation means for presenting to the user the search results of the text search processing execution means and the image search processing execution means.

The image search apparatus of the present invention can be configured as a search system in which each component is deployed on a server and the server is accessed from a terminal computer or the like, or from a mobile phone. It can also be configured as a stand-alone system using a DVD, CD-ROM, or the like as the database, or as a system in which the database and a terminal computer are connected by a LAN.

In the image search apparatus of the first aspect, a plurality of texts found by the text search processing execution means, together with the images associated with them, can be displayed on a screen so that the user selects an image, and the search processing execution means can then search for feature amounts based on the selected image.

Also, in the image search apparatus of the first aspect, when a plurality of images are found in the search by the text search processing execution means, the search results of the text search processing execution means can be displayed for each image.

An image search apparatus according to a second aspect of the present invention is for searching for a predetermined image from a plurality of documents including images, and comprises: text information extraction means for extracting, by natural language processing, text information indicating the contents of an image from the image data of the images included in the documents; feature amount extraction means for extracting feature amounts indicating the contents of images from the image data of the images included in the documents, and for extracting a feature amount from an arbitrary image input by the user; a database for storing the text information, the feature amounts, and the images; feature amount search means for searching the database for similar feature amounts; and search result presentation means for presenting to the user the text information corresponding to the feature amounts found by the feature amount search means.

The image search apparatus of the second aspect can include confirmation information display means for displaying on a screen the images found by the feature amount search means and the text information associated with them, so that the user can confirm whether the search by feature amount matches the user's intention, or whether each image (and its feature amount) is correctly associated with its text. It can further include feature amount/text association release means for releasing, by user operation, the association between an image (and its feature amount) and text, and/or feature amount/text association confidence setting means for setting a confidence level for the association between an image (and its feature amount) and text, or for changing a value already set.

The image search apparatuses of the first and second aspects can support a plurality of languages, mapping text in each language to a single feature amount for searching.

The image search apparatuses of the first and second aspects can also include automatic association means for automatically associating the text with the feature amounts, together with editing means that allows the user to make such associations.

In addition to the text and the feature amounts, the image search apparatuses of the first and second aspects can hold context information on where the text or feature amount appeared (such as the type of Web page or document), and can include this context information in an image search.

An image search method according to a third aspect of the present invention is for searching for a predetermined image from a plurality of documents including images, and comprises: a text information extraction step of extracting, for a part of the documents, text information indicating the contents of the images included in those documents from the text in the documents by natural language processing; an image feature amount extraction step of extracting feature amounts indicating the contents of images from the image data of that part of the documents and of another part of the documents; a database storage step of storing in a database the text information for that part of the documents, and the feature amounts and the images for that part and the other part of the documents; a text search processing execution step of searching the text in the database in response to a search request issued by a user by text or keyword; an image search processing execution step of searching the database for feature amounts based on the feature amounts of the images found and retrieving images with similar feature amounts; and a search result presentation step of presenting to the user the search results of the text search processing execution step and the image search processing execution step.

In the image search method of the third aspect, a plurality of texts found in the text search processing execution step, together with the images associated with them, can be displayed on a screen so that the user selects an image, and images can then be searched for by feature amount in the image search processing execution step based on the selected image.

In the image search method of the third aspect, when a plurality of images are found in the search in the text search processing execution step, the search results of the text search processing execution step can be displayed for each image.

An image search method according to a fourth aspect of the present invention is for searching for a predetermined image from a plurality of documents including images, and comprises: a text information extraction step of extracting, by natural language processing, text information indicating the contents of an image from the image data of the images included in the documents; a feature amount extraction step of extracting feature amounts indicating the contents of images from the image data of the images included in the documents, and of extracting a feature amount from an arbitrary image input by the user; a database storage step of storing the text information, the feature amounts, and the images in a database; a feature amount search step of searching the database for similar feature amounts; and a search result presentation step of presenting to the user the text information corresponding to the feature amounts found in the feature amount search step.

The image search method of the fourth aspect can include a confirmation information display step of displaying on a screen the images found in the feature amount search step and the text information associated with them, and can further include a feature amount/text association release step of releasing, by user operation, the association between an image and text, and/or a feature amount/text association confidence setting step of setting a confidence level for the association between an image and text, or of changing a value already set.

The image search methods of the third and fourth aspects can support a plurality of languages, mapping text in each language to a single feature amount for searching.

The image search methods of the third and fourth aspects can include an automatic association step of automatically associating the text with the feature amounts.

In addition to the text and the feature amounts, the image search methods of the third and fourth aspects can hold context information on where the text or feature amount appeared, and can include a context information search step of including this context information in an image search.

(1) In the inventions of the first and third aspects, an image search keyed on text (the first-stage image search) is performed, and the next image search (the second-stage image search) is then performed keyed on the image feature amounts of the images obtained. With this kind of search, only some of the registered images need to have text information for searching; text information need not be assigned to every registered image. Moreover, in the prior art, a search keyed on text cannot obtain the desired image if text has not been correctly assigned to it. In the present invention, even if the desired image cannot be obtained in the first-stage search because its text is not correctly assigned, it can be obtained in the second-stage search as long as its feature amounts are similar. This makes it possible to eliminate search omissions and improve search accuracy.

(2) In the inventions of the first and third aspects, search accuracy can be improved by having the user select appropriate images.

(3) In the inventions of the first and third aspects, when a plurality of images are found in the first-stage search, the second-stage search results are displayed separately for each of those images, so the user can easily understand how the search behaved and can specify the next search more easily.

(4) In the inventions of the second and fourth aspects, the user can obtain information such as text indicating the contents of an image by designating that image.

(5) In the inventions of the second and fourth aspects, search accuracy can be improved because the user can freely edit the association information at search time.

(6) The inventions of the first to fourth aspects can support multiple languages.

(7) In the inventions of the first to fourth aspects, search accuracy can be improved because the user can freely edit the association information.

(8) In the inventions of the first to fourth aspects, search accuracy can be increased further by adding context information.

The following describes search on the Internet, where searching with the image search apparatus of the present invention is considered most effective. Every page written in HTML on the Internet is a document (that is, it consists of document data), and various images are embedded in (that is, linked from) such documents.

FIG. 1 shows an embodiment of the image search apparatus of the present invention. In FIG. 1, the image search apparatus 1 comprises registration processing means 11, a database 12, search processing means 13, and a user interface 14, and is connected to the Internet 100. The registration processing means 11 functions as the text information extraction means, feature amount extraction means, automatic association means, and so on of the present invention. The search processing means 13 functions as the text search processing execution means, search processing execution means, confirmation information display means, context information search means, and so on. The user interface 14 functions as the search request issuing means and search result presentation means, and also as the confirmation information display means, feature amount/text association release means, feature amount/text association confidence setting means, and so on.

In the image search method of the present invention, the text information extraction step, image feature amount extraction step, database storage step, image search processing execution step, and search result presentation step are realized by the text information extraction means, image feature amount extraction means, database, image search processing execution means, and search result presentation means, respectively. The confirmation information display step, feature amount/text association release step, and feature amount/text association confidence setting step are realized by the confirmation information display means, feature amount/text association release means, and feature amount/text association confidence setting means. The automatic association step and context information search step are realized by the automatic association means and context information search means.

First embodiment

Registration in the database in the first embodiment is performed as follows. This embodiment assumes that the HTML documents to be registered are on the Internet or an intranet, but the present invention is also applicable when the HTML documents are in local files or a database. A set of documents designated in advance by a person may be registered, but here an example is shown that uses a crawler (robot), a common means of automatically acquiring pages on the Internet. Search is performed in two stages: a text information search and a feature amount search. The documents searched by text information may be all documents, if that is feasible, or only a part of them. Image feature amounts, however, must be extracted in advance from all documents to be searched and registered in the database. FIG. 2 outlines the registration process (S101 to S105), and FIG. 3 outlines the search process (S201 to S204).

[Registered image selection]
The crawler acquires pages (HTML documents) on the Internet one at a time (S101) and extracts images from each acquired page (S102). That is, when a page contains a tag designating an image, that image is registered in the database as a registered image (S103). When extraction is finished, or when the HTML document contains no images, the crawler acquires the next page and repeats the extraction process.

The images included in (linked from) HTML include images that should not be targets of image search, such as buttons, backgrounds, and ruled lines. These are excluded from the registered images by selecting registration target images according to information such as image size, aspect ratio, and type. For example, an image smaller than a predetermined size is assumed to be a button, check box, or the like and is not registered. An image with a large aspect ratio is assumed to be a ruled line and is not registered. Further, GIF image data is often decorative, so it is not registered.
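As a non-limiting illustration, the selection rules above might be sketched as follows in Python. The threshold values, the `ImageInfo` record, and the function name are assumptions introduced only for this sketch; the description specifies only "a predetermined size" and "a large aspect ratio".

```python
from dataclasses import dataclass

# Hypothetical thresholds chosen for illustration only.
MIN_SIDE_PX = 64          # images with a shorter side below this are skipped
MAX_ASPECT_RATIO = 8.0    # images more elongated than this are skipped
EXCLUDED_FORMATS = {"gif"}


@dataclass
class ImageInfo:
    url: str
    width: int
    height: int
    fmt: str  # for example "jpeg", "png", "gif"


def is_registration_target(img: ImageInfo) -> bool:
    """Return True if the image should be registered in the database."""
    if img.width <= 0 or img.height <= 0:
        return False
    # Small images are assumed to be buttons, check boxes, and the like.
    if min(img.width, img.height) < MIN_SIDE_PX:
        return False
    # Very elongated images are assumed to be ruled lines.
    aspect = max(img.width, img.height) / min(img.width, img.height)
    if aspect > MAX_ASPECT_RATIO:
        return False
    # GIF images are treated as decoration.
    if img.fmt.lower() in EXCLUDED_FORMATS:
        return False
    return True
```

Under these assumed thresholds, a 640x480 JPEG would be registered, while a 16x16 icon, a 2000x100 strip, and any GIF would be skipped.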

[Text information extraction]
For each image selected for registration, text information indicating the contents of the image is extracted from the HTML document (S104). In an HTML document, information indicating the contents of an image is described in the alt attribute of the image (img) tag, so this is extracted as the text information. The method described in JP 11-25113 A can be used to extract the text information.
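The alt-attribute extraction described above could, for example, be sketched with Python's standard `html.parser` module. The class and function names are hypothetical, and this sketch covers only the alt attribute, not the method of JP 11-25113 A.

```python
from html.parser import HTMLParser


class ImgAltExtractor(HTMLParser):
    """Collect (src, alt) pairs from every img tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        alt = (attr.get("alt") or "").strip()
        if src:
            # None marks an image for which no text information was found.
            self.images.append((src, alt or None))


def extract_image_text(html: str):
    """Return a list of (image URL, alt text or None) for the given HTML."""
    parser = ImgAltExtractor()
    parser.feed(html)
    parser.close()
    return parser.images
```

Images that come back with `None` would still go through feature amount extraction; they simply have no text information to register.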

The text information indicating the contents of the image may be natural language text, or keywords extracted using morphological analysis or the like. As in the technique described in JP 11-25113 A, a plurality of pieces of text information may be extracted and each weighted.

The extracted text information is registered in the database. Text information need not be extracted for every image selected for registration; when no appropriate text can be extracted, only the feature amount extraction described later is performed. Also, when the extracted text information is already registered and many images are already associated with it, the text information may or may not be registered in the database.

[Image feature amount extraction]
Image feature amounts are extracted from the image data selected for registration (S105). A common image feature amount such as a color histogram can be used. In the case of a color histogram, for example, the feature amount is vector data. Instead of extracting the image feature amount from the entire image, it may be extracted from a region (object) obtained by region identification. The extracted image feature amount and the image data are associated with each other and registered in the database. The URL of the image may be registered instead of the image data itself. Unlike text information, image feature amounts must be extracted from all images selected for registration.
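As one possible illustration of a color histogram used as vector data, the following Python sketch quantizes each RGB channel into a small number of bins and normalizes the counts. The number of bins, the function name, and the input format (an iterable of 8-bit RGB tuples) are assumptions for this sketch; the description does not fix any particular histogram layout.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized RGB color histogram as a flat list of floats.

    pixels: iterable of (r, g, b) tuples with 8-bit channel values (0-255).
    The result has bins_per_channel ** 3 elements that sum to 1.0
    (or all zeros if there are no pixels).
    """
    step = 256 // bins_per_channel

    def bin_of(value):
        # Clamp so that 255 always falls in the last bin.
        return min(value // step, bins_per_channel - 1)

    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        idx = (bin_of(r) * bins_per_channel + bin_of(g)) * bins_per_channel + bin_of(b)
        hist[idx] += 1.0
        count += 1
    if count == 0:
        return hist
    return [v / count for v in hist]
```

Normalizing by the pixel count makes histograms of images with different sizes comparable, which matters when they are later compared by distance.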

Image search by text is described below. The user can search for a desired image by text. In response to a single search request from the user, an image search keyed on the text (the first-stage image search) is performed first, followed by an image search based on the feature amounts of the images found in the first stage (the second-stage image search), and the results of this two-stage search are presented to the user. The second-stage search is keyed on the feature amounts linked to the images found in the first stage.
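The two-stage flow described above can be illustrated with a compact Python sketch. The in-memory dictionaries, the substring match, and the nearest-seed ranking are simplifying assumptions made for this example only; they stand in for the database, the full-text search, and the similarity calculation of the embodiment.

```python
import math


def two_stage_search(keyword, texts, features, top_k=3):
    """Return up to top_k image ids for a single keyword request.

    texts:    image_id -> text information (present for only some images)
    features: image_id -> feature vector (present for every image)
    """
    # Stage 1: images whose text information contains the keyword.
    seeds = [i for i, t in texts.items() if keyword.lower() in t.lower()]
    if not seeds:
        return []

    # Stage 2: rank every image by its distance to the nearest stage-1 image.
    def score(image_id):
        return min(math.dist(features[image_id], features[s]) for s in seeds)

    return sorted(features, key=score)[:top_k]


texts = {"fuji1": "Mt. Fuji at dawn"}
features = {"fuji1": [0.9, 0.1], "fuji2": [0.8, 0.2], "cat": [0.1, 0.9]}
result = two_stage_search("fuji", texts, features, top_k=2)
```

In this toy data, `fuji2` has no text information at all, yet it appears in the result because its feature vector is close to that of `fuji1`, which is the behavior the two-stage search is meant to provide.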

[Text information search]
The natural language text or keywords (search text) entered through the user interface 14 are acquired; in the case of natural language text, keywords are extracted by morphological analysis (S201). The text information in the database is then searched by keyword. That is, text information containing the keywords is found by full-text search, and the image feature amounts corresponding to the text information found are identified. In some cases the text information of several images is found, so several image feature amounts are identified. When confidence levels are specified, the results are sorted in order of confidence, and the image feature amounts are identified by limiting the results to a predetermined number of top-ranked images or to images at or above a predetermined threshold.
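A minimal Python sketch of the keyword search with confidence-based sorting and limiting might look as follows. The `TextRecord` structure, the case-insensitive substring match, and the rule that any one keyword suffices are assumptions for illustration; morphological analysis is left out, so the keywords are passed in directly.

```python
from dataclasses import dataclass


@dataclass
class TextRecord:
    image_id: str
    text: str
    confidence: float = 1.0


def text_search(records, keywords, top_n=None, min_confidence=None):
    """Return image ids whose text contains any keyword, best confidence first."""
    kws = [k.lower() for k in keywords if k]
    hits = [r for r in records if any(k in r.text.lower() for k in kws)]
    # Sort by confidence, highest first (stable for equal confidences).
    hits.sort(key=lambda r: r.confidence, reverse=True)
    if min_confidence is not None:
        hits = [r for r in hits if r.confidence >= min_confidence]
    if top_n is not None:
        hits = hits[:top_n]
    return [r.image_id for r in hits]
```

The returned image ids would then identify the feature amounts used as queries in the subsequent feature amount search.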

The images from which the image feature amounts identified by the text were extracted can also be listed so that the user explicitly designates the images similar to the intended image. FIG. 4 shows an example of this display screen. Besides selecting images for search, a screen such as that of FIG. 4 can also include a check box with which the user indicates that an image and its text do not match. When the user checks this box, the association between the image and the text is deleted (disconnected) or its confidence level is lowered, so that the associations between images and text become more accurate.

[Image feature search]
Image feature quantities similar to the query image feature quantity identified by the text information search are retrieved from the database (S202). If the image feature quantity is vector data such as a color histogram, the similarity can be computed with, for example, the Euclidean distance (a smaller distance meaning a higher similarity). The similarity between the query image feature quantity and every feature quantity in the database is computed, and the feature quantities are sorted in descending order of similarity. When the text information search has identified several image feature quantities, the similarity can be taken as the sum of the similarities to each of them, or the minimum similarity can be used instead of the sum.
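As a rough sketch of S202, the following ranks database entries by Euclidean distance to one or more query feature vectors. The function names and the sample vectors are hypothetical. The code treats a smaller distance as a higher similarity, and `combine="min"` takes the smallest distance to any query vector, which is only one possible reading of "minimum similarity" in the text.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(queries, database, combine="sum"):
    """Rank image ids from most to least similar to the query vectors.

    database maps an image id to its feature vector. With several query
    vectors, the per-query distances are combined by sum or by minimum.
    """
    reduce = sum if combine == "sum" else min
    scored = [
        (reduce(euclidean(q, vec) for q in queries), image_id)
        for image_id, vec in database.items()
    ]
    scored.sort()  # ascending distance, i.e. descending similarity
    return [image_id for _, image_id in scored]


# Hypothetical three-bin color histograms.
database = {"a": [1.0, 0.0, 0.3], "c": [0.5, 0.5, 0.0]}
queries = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(rank_by_similarity(queries, database, combine="sum"))
print(rank_by_similarity(queries, database, combine="min"))
```

The two combination rules can order the same data differently: the sum favors entries that are moderately close to every query vector, while the minimum favors entries that are very close to at least one.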

[Search result display]
The images corresponding to the feature quantities are displayed to the user in the order produced by the image feature search (S203). The user can select an image from the displayed list and obtain its image data.
When images have been selected, the retrieved images need not be shown as a single list; the results retrieved for each selected image can instead be displayed separately, as shown in FIG. 5.

[Certainty factor update]
In the search result display step, the user's selection of an image means that the image matches the text the user specified. The combination of the image feature quantity and the text information identified by the text information search can therefore be judged valid, so the certainty factor registered in the database for that combination is updated (S204). Combinations with a high certainty factor are then given priority in later searches, which improves search accuracy.
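The patent does not state how the certainty factor is changed in S204, so the following is only a sketch under assumed rules: a fixed step, values clamped to the range 0.0 to 1.0, and a default of 0.5 for pairs not yet rated. The function name and the dictionary layout are hypothetical. The lowering branch corresponds to the check box with which the user flags a wrong image-text pairing.

```python
def update_certainty(certainty, pair, selected, step=0.1):
    """Adjust the certainty factor of one (text, feature_id) pair in place.

    selected=True  -> the user picked the image, so the value is raised.
    selected=False -> the user flagged the pairing as wrong, so it is lowered.
    The step size, default value, and clamping range are assumptions.
    """
    current = certainty.get(pair, 0.5)
    delta = step if selected else -step
    certainty[pair] = max(0.0, min(1.0, current + delta))
    return certainty[pair]


store = {("car", 1): 0.95}
update_certainty(store, ("car", 1), selected=True)   # raised, capped at 1.0
update_certainty(store, ("dog", 2), selected=False)  # lowered from the default
print(store)
```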

[Multi-language support]
The text information managed in the database can be held separately for each language. That is, each image can have text information in Japanese, English, and French, so that a search can be made in any of these languages. It is also possible to search for feature quantities based on an image found with a Japanese query and then retrieve the English text associated with the retrieved feature quantities. In other words, the English corresponding to a Japanese term can be looked up, so the apparatus can also be used like a Japanese-English dictionary.
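The dictionary-like use described above can be sketched as follows. All names and data here are hypothetical: `TEXTS` stands for per-language text information keyed by feature id, and `SIMILAR` stands in for the result of the image feature search, which is not computed in this sketch.

```python
# Hypothetical store: feature id -> text information keyed by language code.
TEXTS = {
    101: {"ja": "犬", "en": "dog", "fr": "chien"},
    102: {"ja": "猫", "en": "cat", "fr": "chat"},
    103: {"ja": "子犬", "en": "puppy"},
}

# Hypothetical precomputed output of the feature search: similar feature ids.
SIMILAR = {101: [101, 103], 102: [102], 103: [103, 101]}


def cross_language_lookup(query, source, target, texts=TEXTS, similar=SIMILAR):
    """Find images by source-language text, then collect the target-language
    text attached to those images and to images with similar features."""
    found = []
    for feature_id, by_lang in texts.items():
        if query == by_lang.get(source):
            for neighbour in similar.get(feature_id, [feature_id]):
                word = texts[neighbour].get(target)
                if word is not None and word not in found:
                    found.append(word)
    return found


print(cross_language_lookup("犬", "ja", "en"))
```

Entries without text in the target language are simply skipped, which reflects that not every image necessarily has text in every language.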

[Editing feature-text associations]
Search accuracy depends heavily on the information that associates feature quantities with text. This association information is not only extracted automatically; the user can also edit it explicitly. The associations can simply be listed, and an association can then be deleted or its certainty factor changed.

[Search by context information]
Context information can also be added to the information that associates text with feature quantities. Context information indicates the type of page on which the text or image appeared. For Web pages and documents, this type can be derived from the title and the text of the page. For example, keywords are extracted by morphological analysis of the text in the page; if many of them are engineering-related, the page is judged to be engineering-related, and a value indicating engineering is attached to the association information as context information. At search time the user can select a context in addition to entering text, and searching the association information together with the designated context further improves search accuracy.
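One simple way to derive such a context value is to count how many extracted keywords fall into each category's vocabulary and pick the category with the most hits. The sketch below assumes this counting rule and uses invented vocabularies; the patent only says that a page with many engineering-related keywords is judged to be engineering-related.

```python
from collections import Counter

# Hypothetical category vocabularies; a real system would need far larger ones.
CONTEXT_VOCAB = {
    "engineering": {"motor", "circuit", "gear"},
    "cooking": {"recipe", "oven", "flour"},
}


def infer_context(keywords, vocab=CONTEXT_VOCAB):
    """Return the category whose vocabulary matches the most keywords,
    or None when no keyword matches any category."""
    counts = Counter()
    for word in keywords:
        for context, terms in vocab.items():
            if word in terms:
                counts[context] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def filter_by_context(associations, context):
    """Keep only (text, feature_id, context) records with the given context."""
    return [a for a in associations if a[2] == context]


page_keywords = ["motor", "gear", "oven", "assembly"]
print(infer_context(page_keywords))
```

At search time, `filter_by_context` would be applied to the association records before or after the text search, so that only associations from the selected kind of page are considered.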

Second embodiment

With the same configuration as the first embodiment, an image can be recognized and text indicating its content can be displayed. The database of the first embodiment can be used unchanged as the recognition database, so this embodiment can be used together with the first. Registration is the same as in the first embodiment and its description is omitted; however, when recognition is the only purpose, not all images need to be registered, but text information and feature quantities must always both be extracted. An outline of the search process is shown in FIG. 6 (S301 to S303). The user can designate any image to be recognized and obtain text indicating the content of that image.

[Image feature search]
An image feature quantity is extracted from the designated image, and similar images and feature quantities are retrieved from the database as in the first embodiment (S301). Before the feature quantity is extracted, region extraction may be performed to isolate a region (object), and the feature quantity may then be extracted from that region. When certainty factors have been assigned, the similarity is weighted by the certainty factor during the search.
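The weighting in S301 is not specified further, so the following is only one possible sketch. It assumes that a distance is converted to a similarity with 1 / (1 + distance) and that the weighted score is the product of similarity and certainty factor; both choices, and all names, are assumptions made for illustration.

```python
import math


def weighted_scores(query, entries):
    """Score candidate texts for a query feature vector.

    entries is a list of (text, feature_vector, certainty) tuples.
    Returns (score, text) pairs sorted from best to worst.
    """
    scored = []
    for text, vector, certainty in entries:
        # Assumed conversion: distance 0 gives similarity 1, larger gives less.
        similarity = 1.0 / (1.0 + math.dist(query, vector))
        scored.append((similarity * certainty, text))
    scored.sort(reverse=True)
    return scored


entries = [
    ("cat", [0.0, 0.0], 0.2),  # identical features but low certainty
    ("dog", [1.0, 0.0], 0.9),  # farther away but high certainty
]
for score, text in weighted_scores([0.0, 0.0], entries):
    print(f"{text}: {score:.2f}")
```

With this rule, a well-confirmed association can outrank a closer match whose text has rarely been confirmed, which is the effect the certainty weighting is meant to have.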

[Search result display]
A list of the text information corresponding to the image feature quantities whose similarity is at or above a given threshold, or to a predetermined number of image feature quantities, is presented to the user (S302).

[Certainty factor update]
As in the first embodiment, the user can select the appropriate text, and the certainty factor corresponding to the selected text information is updated (S303).

[Multi-language support]
As in the first embodiment, the text data can be held in a plurality of languages, so that search results can be displayed in a designated language.

FIG. 1 is a block diagram of the first embodiment of the image search apparatus of the present invention.
FIG. 2 is a flowchart showing the registration process in the embodiment of FIG. 1.
FIG. 3 is a flowchart showing the search process in the embodiment of FIG. 1.
FIG. 4 is a diagram showing one display example of search results in the first embodiment.
FIG. 5 is a diagram showing another display example of search results in the first embodiment.
FIG. 6 is a flowchart showing the search process in the second embodiment of the image search apparatus of the present invention.

Explanation of symbols

11 Registration processing means
12 Database
13 Search processing means
100 Internet

Claims

An image search device for searching for a predetermined image from a plurality of documents including images,
Text information extracting means for extracting text information indicating the content of an image included in the document from the text in the document by natural language processing for some documents;
Image feature amount extraction means for extracting feature amounts indicating image contents from image data of the partial document and the other partial document;
A database for storing the text information about the partial document, and the feature amount and the image of the partial document and the other partial document;
A search request issuing means in which a user issues a search request by text or keyword;
Text search processing execution means for searching for text in the database in response to a search request;
Image search processing execution means for searching for a feature quantity from the database and searching for an image with a similar feature quantity based on the feature quantity of the searched image;
Search result presenting means for presenting a search result by the text search processing executing means and the image search processing executing means to the user;
An image search apparatus comprising:

The image search apparatus according to claim 1, wherein the text retrieved by the text search processing execution means and a plurality of images associated with it are displayed on the screen, the user is allowed to select an image, and the image search processing execution means searches, based on the selected image, for an image having the feature amount.

The image search apparatus according to claim 1, wherein, when a plurality of images are retrieved in the search by the text search processing execution means, a search result by the text search processing execution means is displayed for each image.

An image search device for searching for a predetermined image from a plurality of documents including images,
Text information extracting means for extracting, by natural language processing, text information indicating the content of an image from image data relating to an image included in the document;
A feature amount extracting unit that extracts a feature amount indicating the content of an image from image data relating to an image included in the document and extracts a feature amount for an arbitrary image input by a user;
A database for storing the text information, the feature amount, and the image;
Feature quantity search means for searching for similar feature quantities from the database;
Search result presentation means for presenting text information corresponding to the feature quantity searched by the feature quantity search means to the user;
An image search apparatus comprising:

A confirmation information display means for displaying an image searched by the feature amount search means and text information associated therewith on a screen;
Feature amount / text correspondence canceling means for canceling the correspondence between the image and the text by the user's operation, and / or
A feature amount / text correspondence certainty setting means for setting the certainty of the correspondence between the image and text, or changing the set value,
The image search apparatus according to claim 4, further comprising:

The image search apparatus according to claim 1, wherein the apparatus supports a plurality of languages, maps text in each language to a single feature amount, and searches in each language.

The image search apparatus according to claim 1, further comprising automatic association means for automatically associating the text with the feature amount, and editing means with which a user makes the association.

The image search apparatus according to any one of claims 1 to 7, further comprising context information search means for holding, in addition to the text and the feature amount, context information on the context in which the text or feature amount appears, and for including the context information in the search when searching for an image.

An image retrieval method for retrieving a predetermined image from a plurality of documents including images,
A text information extraction step for extracting text information indicating the content of an image included in the document from a text in the document by natural language processing for some documents;
An image feature amount extraction step for extracting a feature amount indicating the content of the image from the image data of the partial document and the other partial document;
A database storage step of storing the text information about the part of the document and the feature amount and the image of the part of the document and the other part of the document in a database;
A text search processing execution step for searching text in the database in response to a search request by text or a keyword issued by a user;
An image search processing execution step of searching for a feature amount from the database and searching for an image with a similar feature amount based on the feature amount of the searched image;
A search result presenting step for presenting search results to the user in the text search processing execution step and the image search processing execution step;
An image search method characterized by comprising:

The image search method according to claim 9, wherein the text retrieved in the text search processing execution step and a plurality of images associated with it are displayed on the screen, the user is allowed to select an image, and, in the image search processing execution step, an image having the feature amount is searched for based on the selected image.

The image search method according to claim 9 or 10, wherein, when a plurality of images are retrieved in the search in the text search processing execution step, a search result in the text search processing execution step is displayed for each image.

An image retrieval method for retrieving a predetermined image from a plurality of documents including images,
A text information extraction step of extracting text information indicating the content of the image from image data concerning the image included in the document by natural language processing;
A feature amount extraction step of extracting a feature amount indicating the content of the image from image data of the image included in the document and extracting a feature amount for an arbitrary image input by the user;
A database storage step of storing the text information, the feature amount, and the image in a database;
A feature amount search step of searching for a similar feature amount from the database;
A search result presentation step of presenting text information corresponding to the feature amount searched in the feature amount search step to the user;
An image search method characterized by comprising:

A confirmation information display step for displaying on the screen the image searched in the feature amount search step and the text information associated therewith,
A feature / text correspondence release step for releasing the correspondence between the image and the text by the user's operation, and / or
A feature / text correspondence confidence setting step for setting the certainty of the correspondence between the image and the text or changing the set value,
The image search method according to claim 12, further comprising:

The image search method according to any one of claims 9 to 13, wherein the method supports a plurality of languages, text in each language is mapped to a single feature amount, and the search is performed in each language.

The image search method according to claim 9, further comprising an automatic association step of automatically associating the text with the feature amount.

The image search method according to any one of claims 9 to 15, further comprising a context information search step of holding, in addition to the text and the feature amount, context information on the context in which the text or feature amount appears, and of including the context information in the search when searching for an image.