JP2012027841A

JP2012027841A - Retrieval program, retrieval device, retrieval system, retrieval method, and recording medium

Info

Publication number: JP2012027841A
Application number: JP2010168285A
Authority: JP
Inventors: Takuya Hiraoka; 卓也平岡
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2010-07-27
Filing date: 2010-07-27
Publication date: 2012-02-09

Abstract

PROBLEM TO BE SOLVED: To reduce the amount of information for a retrieval object while keeping retrieval accuracy and convenience of retrieval. SOLUTION: The information retrieval device 1, which determines the order of displaying a plurality of previously stored documents on the basis of their degree of adaptability to a designated condition, includes an object information DB 200, a designated condition information acquisition unit 101, and an adaptability calculation unit 102. The object information DB 200 stores index information that associates each word included in the documents with its appearance number for each of a plurality of items constituting the documents. The designated condition information acquisition unit 101 acquires a word that is the designated condition. The adaptability calculation unit 102 acquires, from the index information, the appearance number of the acquired word for each item, and calculates the degree of adaptability of each document to the designated condition on the basis of the value obtained by summing up, for each document, the appearance numbers acquired for each item and the number of documents that include the acquired word.

Description

The present invention relates to a search program, a search device, a search system, a search method, and a recording medium, and more particularly to reducing the storage capacity required for search target information in information retrieval.

Techniques for searching electronic data and for displaying search results are becoming increasingly important, because the number of search results grows as the amount of information to be searched grows. The information being sought ends up buried in a large number of search results and becomes difficult to find. As one such technique, ranking search has been proposed, in which a search is executed based on a search condition set by analyzing an input search request, and the search results are ordered by a predetermined score calculation means.

The score calculation means uses TF (Term Frequency), which is the number of times a search term included in the specified search condition appears or is used in each document, and DF (Document Frequency), which is the number of documents that include the search term. To improve search interactivity, it is common to generate an index that includes information associating each search term with its DF and, for each document, information associating the search term with its TF.
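As a minimal sketch of the kind of index described above, the following Python builds, for a toy corpus, a mapping from each word to its DF and its per-document TF. The function name `build_index`, the whitespace tokenization, and the sample documents are assumptions made for illustration and do not come from the disclosure.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Build {word: {"df": int, "tf": {doc_id: int}}} from {doc_id: text}.

    Whitespace tokenization is an assumption made for this sketch.
    """
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for word, count in Counter(text.split()).items():
            postings[word][doc_id] = count  # TF of the word in this document
    # DF is the number of documents whose posting list contains the word.
    return {word: {"df": len(tf), "tf": tf} for word, tf in postings.items()}


docs = {1: "system database system", 2: "database search", 3: "system"}
index = build_index(docs)
print(index["system"])  # {'df': 2, 'tf': {1: 2, 3: 1}}
```

Storing DF next to the per-document TF values lets a ranking function read both with a single lookup per search term.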

For such search techniques, a method has been proposed that merges the search results obtained from each of a plurality of indexes into a final search result, so that correct search results are obtained while a plurality of indexes are used (see, for example, Patent Document 1).

The information to be searched in information retrieval includes websites and digitized documents. A digitized document, for example, may be divided into a plurality of items such as “title”, “summary”, and “text”. A search condition may then specify not only a keyword but also which of the “title”, “summary”, and “text” items must contain that keyword.

Consequently, the index described above must be generated not only for each document but also for each item of the document, so the index is generated in duplicate. Furthermore, when the range that must contain the keyword can be specified across multiple items, such as “title” and “summary”, an index must be generated for every combination of items, and the volume of search target information becomes enormous.
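To make this growth concrete, the following sketch enumerates every non-empty combination of items for which a separate index would be needed under the conventional approach. The item names come from the text; the helper name `item_combinations` is hypothetical.

```python
from itertools import combinations


def item_combinations(items):
    """Return every non-empty combination of the given items."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


combos = item_combinations(["title", "summary", "text"])
print(len(combos))  # 7
```

For n items the count is 2^n - 1, so three items already require seven indexes, whereas keeping one index per item requires only three.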

The present invention has been made to solve the above problem, and its object is to reduce the amount of search target information while maintaining search accuracy and search convenience.

To solve the above problem, one aspect of the present invention is a search program that determines the order in which a plurality of previously stored documents are displayed, based on their fitness to a specified condition. The program causes an information processing apparatus to execute: a step of acquiring a word serving as the specified condition and storing it in a storage medium; a step of referring, based on the stored word, to search target information in which words included in the documents are associated with their numbers of occurrences for each of a plurality of items constituting the documents; a step of acquiring, for each of the items, the number of occurrences associated with the stored word in the search target information and storing it in a storage medium; and a step of calculating the fitness of each document to the specified condition, based on the value obtained by adding up, for each document, the numbers of occurrences acquired for the respective items and on the number of documents including the stored word, and storing the fitness in a storage medium.

In the step of calculating the fitness and storing it in a storage medium, it is preferable to multiply the number of occurrences acquired for each item by a coefficient indicating the importance of that item before adding the results up for each document.
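The per-document addition of per-item occurrence counts, with an optional importance coefficient per item, can be sketched as follows. The function name `combined_tf`, the default coefficient of 1.0, and the sample counts are assumptions for illustration.

```python
def combined_tf(tf_by_item, weights=None):
    """Add per-item occurrence counts for each document.

    tf_by_item: {item: {doc_id: occurrences}}
    weights:    {item: coefficient}; items without an entry use 1.0
                (the default is an assumption made for this sketch).
    """
    weights = weights or {}
    totals = {}
    for item, tf in tf_by_item.items():
        coefficient = weights.get(item, 1.0)
        for doc_id, count in tf.items():
            totals[doc_id] = totals.get(doc_id, 0.0) + coefficient * count
    return totals


tf_by_item = {"title": {1: 1, 2: 2}, "text": {1: 10, 3: 4}}
print(combined_tf(tf_by_item))                  # {1: 11.0, 2: 2.0, 3: 4.0}
print(combined_tf(tf_by_item, {"title": 3.0}))  # {1: 13.0, 2: 6.0, 3: 4.0}
```

Because the addition happens at query time, no index for the combination “title and text” has to be stored in advance.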

The number of documents including the stored word may be the logical sum of the numbers of documents that include the stored word in the respective items.

Alternatively, the number of documents including the stored word may be the number of documents that include the stored word in any one of the plurality of items.
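One way to read the “logical sum” is as the size of the union of the document sets that contain the word in each item. The sketch below follows that reading, which is an interpretation rather than something the text states explicitly; `combined_df` and the sample posting lists are hypothetical.

```python
def combined_df(tf_by_item):
    """Count documents that contain the word in at least one item.

    tf_by_item: {item: {doc_id: occurrences}} for a single word.
    """
    doc_ids = set()
    for tf in tf_by_item.values():
        doc_ids |= set(tf)  # union of the per-item document sets
    return len(doc_ids)


tf_by_item = {
    "title": {1: 1, 2: 1, 3: 1},
    "text": {1: 10, 2: 3, 3: 5, 4: 2, 5: 1},
}
print(combined_df(tf_by_item))  # 5
```

Simply adding the per-item DF values (3 + 5 = 8 here) would count documents 1 to 3 twice, which is why a union is used in this sketch.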

In the step of calculating the fitness and storing it in a storage medium, it is also preferable to adjust the value obtained by adding up, for each document, the numbers of occurrences acquired for the respective items, using the value of the document's length.
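The text does not state here how the document length enters the adjustment. As one hedged possibility, the summed count can be scaled by the ratio of an average length to the document's own length, so that long documents are not favored merely for being long. The function name, the use of an average length, and the numbers below are all assumptions.

```python
def length_adjusted_tf(tf_sum, doc_length, avg_length):
    """Scale a summed occurrence count by relative document length.

    This particular normalization is an assumption made for this sketch,
    not the adjustment defined by the disclosure.
    """
    if doc_length <= 0 or avg_length <= 0:
        raise ValueError("lengths must be positive")
    return tf_sum * avg_length / doc_length


# A document twice as long as average has its count halved.
print(length_adjusted_tf(11, doc_length=112, avg_length=56))  # 5.5
```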

Another aspect of the present invention is a search device that determines the order in which a plurality of previously stored documents are displayed, based on their fitness to a specified condition. The device includes: a search target information storage unit that stores search target information in which words included in the documents are associated with their numbers of occurrences for each of a plurality of items constituting the documents; a condition acquisition unit that acquires a word serving as the specified condition; and a fitness calculation unit that acquires from the search target information, for each of the items, the number of occurrences associated with the word acquired as the specified condition, and calculates the fitness of each document to the specified condition based on the value obtained by adding up, for each document, the numbers of occurrences acquired for the respective items and on the number of documents including the acquired word.

Here, it is preferable that the fitness calculation unit multiplies the number of occurrences acquired for each item by a coefficient indicating the importance of that item before adding the results up for each document.

Still another aspect of the present invention is a search system that determines the order in which a plurality of previously stored documents are displayed, based on their fitness to a specified condition. The system includes: a search target information storage unit that stores search target information in which words included in the documents are associated with their numbers of occurrences for each of a plurality of items constituting the documents; a condition acquisition unit that acquires, via a network, a word serving as the specified condition that was input at an image processing apparatus; and a fitness calculation unit that acquires from the search target information, for each of the items, the number of occurrences associated with the word acquired as the specified condition, and calculates the fitness of each document to the specified condition based on the value obtained by adding up, for each document, the numbers of occurrences acquired for the respective items and on the number of documents including the acquired word.

Still another aspect of the present invention is a search method that determines the order in which a plurality of previously stored documents are displayed, based on their fitness to a specified condition. The method includes: acquiring a word serving as the specified condition and storing it in a storage medium; referring, based on the stored word, to search target information in which words included in the documents are associated with their numbers of occurrences for each of a plurality of items constituting the documents; acquiring, for each of the items, the number of occurrences associated with the stored word in the search target information and storing it in a storage medium; and calculating the fitness of each document to the specified condition, based on the value obtained by adding up, for each document, the numbers of occurrences acquired for the respective items and on the number of documents including the stored word, and storing the fitness in a storage medium.

Still another aspect of the present invention is a recording medium on which the above search program is recorded in a format readable by an information processing apparatus.

According to the present invention, the amount of search target information can be reduced while search accuracy and search convenience are maintained.

A diagram showing an operation mode of the system according to an embodiment of the present invention.
A block diagram schematically showing the hardware configuration of the search device, the client device, and the target information DB according to the embodiment of the present invention.
A block diagram showing the functional configuration of the search device according to the embodiment of the present invention.
A diagram showing examples of the specified condition information according to the embodiment of the present invention.
A diagram showing examples of the index information according to the embodiment of the present invention.
A diagram showing an example of the record information according to the embodiment of the present invention.
A sequence diagram showing the operation of the system according to the embodiment of the present invention.
A diagram showing TF generation results according to the embodiment of the present invention.
A diagram showing fitness calculation results according to the embodiment of the present invention.
A diagram showing per-item weighting information according to the embodiment of the present invention.
A diagram showing an operation mode of a system according to another embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. The present embodiment takes as an example an information search system for electronic documents that contain a plurality of items, in which the index information serving as the search target information is generated and stored only for each item of the documents.

FIG. 1 is a diagram showing an example of an operation mode of the information search system according to the present embodiment. As shown in FIG. 1, the information search system includes an information search device 1, a client device 2, and a target information DB 200. The client device 2 is a general information processing apparatus such as a PC (Personal Computer). The information search device 1 is connected to the client device 2 via a network and operates as a server that, on receiving a search request from the client device 2, searches the document information stored in the target information DB 200.

The target information DB 200 stores, in addition to the electronic documents that are the search targets, index information generated from those electronic documents. The information search device 1 according to the present embodiment refers to this index information and calculates the fitness of each electronic document to a given search condition. The index information stored in the target information DB 200 is described in detail later. As shown in FIG. 1, the present embodiment describes an example in which the target information DB 200 is provided separately from the information search device 1, but the target information DB 200 may also be configured inside the information search device 1. The target information DB 200 is configured by a nonvolatile storage medium such as an HDD.

Next, the hardware configuration of the information search device 1 and the client device 2 according to the present embodiment will be described. FIG. 2 is a block diagram showing the hardware configuration of the information search device 1. Although FIG. 2 is described in terms of the information search device 1, the client device 2 has the same configuration.

As shown in FIG. 2, the information search device 1 according to the present embodiment has the same configuration as an information processing terminal such as a general server or PC (Personal Computer). That is, in the information search device 1, a CPU (Central Processing Unit) 10, a RAM (Random Access Memory) 20, a ROM (Read Only Memory) 30, an HDD (Hard Disk Drive) 40, and an I/F 50 are connected via a bus 80. An LCD (Liquid Crystal Display) 60 and an operation unit 70 are connected to the I/F 50.

The CPU 10 is a computation means and controls the operation of the entire information search device 1. The RAM 20 is a volatile storage medium that can be read and written at high speed and is used as a work area when the CPU 10 processes information. The ROM 30 is a read-only nonvolatile storage medium and stores programs such as firmware. The HDD 40 is a readable and writable nonvolatile storage medium and stores an OS (Operating System), various control programs, application programs, and the like.

The I/F 50 connects the bus 80 to various hardware, networks, and the like, and controls them. The LCD 60 is a visual user interface with which the user checks the state of the information search device 1. The operation unit 70 is a user interface, such as a keyboard and a mouse, with which the user inputs information to the information search device 1. As described with reference to FIG. 1, the information search device 1 according to the present embodiment operates as a server, so user interfaces such as the LCD 60 and the operation unit 70 can be omitted.

In this hardware configuration, a program stored in the ROM 30, the HDD 40, or a storage medium such as an optical disk (not shown) is read into the RAM 20, and the CPU 10 performs computation according to that program, thereby forming a software control unit. The combination of this software control unit and the hardware forms the functional blocks that realize the functions of the information search device 1 according to the present embodiment.

Next, the functional blocks of the information search device 1 according to the present embodiment will be described with reference to FIG. 3. FIG. 3 is a block diagram showing the functional blocks of the information search device 1 together with the target information DB 200, which stores the electronic documents searched by the information search device 1. As shown in FIG. 3, the information search device 1 includes a search control unit 100, an information input unit 110, a network I/F 120, and a display unit 130.

The information input unit 110 is the component through which the user operates the information search device 1 and inputs information to the search control unit 100, and it is realized by the I/F 50 and the operation unit 70 shown in FIG. 2. The network I/F 120 is the interface through which the information search device 1 acquires or transmits information via the network. It is realized by the I/F 50 shown in FIG. 2, specifically by, for example, an Ethernet (registered trademark) interface or a USB (Universal Serial Bus) interface.

The display unit 130 displays the operating state of the information search device 1, search results, and the like, and is realized by the I/F 50 and the LCD 60 shown in FIG. 2. As noted above, the information input unit 110 and the display unit 130 can be omitted. The search control unit 100 provides the search function of the information search device 1 according to the present embodiment and includes a specified condition information acquisition unit 101, a fitness calculation unit 102, and a calculation result processing unit 103. The search control unit 100 is formed by the CPU 10 performing computation according to the program loaded into the RAM 20 shown in FIG. 2.

The specified condition information acquisition unit 101 acquires, as specified condition information, the search condition input by the user via the information input unit 110 or the search condition input over the network via the network I/F 120. Specified condition information is the condition that the user specifies in order to extract desired documents. It consists of keyword information specifying the words that a target electronic document must contain and information specifying the items of the electronic document in which those keywords must appear.

With reference to FIGS. 4(a) to 4(c), examples of the specified condition information acquired by the specified condition information acquisition unit 101 will be described. FIG. 4(a) shows, as an example of specified condition information, a case where the keywords “system” and “database” are specified and “title” is specified as the item that must contain the keywords. In this case, documents whose “title” item contains the keywords “system” and “database” are extracted by the search and become targets of the fitness calculation.

FIG. 4(b) shows a case where the keyword “system” is specified and “title” and “summary” are specified as the items that must contain the keyword. In this case, documents whose “title” and “summary” items contain the keyword are extracted by the search and become targets of the fitness calculation.

FIG. 4(c) shows a case where the keywords “system” and “database” are specified and “full text” is specified as the item that must contain the keywords. In this case, documents that contain the keywords in any item of the electronic document are extracted by the search and become targets of the fitness calculation.
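The three examples of specified condition information in FIG. 4 can be represented, as a rough sketch, by a small record holding the keywords and the items to search. The class name `SpecifiedCondition`, the English item names, and the rule that “full text” expands to every item are assumptions made for illustration.

```python
from dataclasses import dataclass

# Items into which a document is divided (names taken from the text).
ALL_ITEMS = ("title", "summary", "text")


@dataclass(frozen=True)
class SpecifiedCondition:
    keywords: tuple  # words the document must contain
    items: tuple     # items in which the keywords must appear

    def target_items(self):
        """Expand "full text" to every item (an assumed convention)."""
        return ALL_ITEMS if "full text" in self.items else self.items


cond_a = SpecifiedCondition(("system", "database"), ("title",))      # FIG. 4(a)
cond_b = SpecifiedCondition(("system",), ("title", "summary"))       # FIG. 4(b)
cond_c = SpecifiedCondition(("system", "database"), ("full text",))  # FIG. 4(c)

print(cond_c.target_items())  # ('title', 'summary', 'text')
```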

Based on the specified condition information input from the specified condition information acquisition unit 101, the fitness calculation unit 102 refers to the per-item indexes of the documents stored in the target information DB 200 and calculates the fitness of each document to the search condition. This method of calculating a per-document fitness from the plurality of indexes stored for each item in the target information DB 200 is one of the key points of the present embodiment. The specific calculation method used by the fitness calculation unit 102 is described in detail later.

The calculation result processing unit 103 generates and outputs display information for showing the list of per-document fitness values calculated by the fitness calculation unit 102 on the display unit 130 or on the display unit of the client device 2. As shown in FIG. 3, the target information DB 200 stores index information in the form of a “title index”, a “summary index”, a “text index”, and so on, one for each item of the electronic documents such as “title”, “summary”, and “text”. The target information DB 200 also stores record information as a list of the documents to be searched.

Examples of the information stored in the target information DB 200 according to the present embodiment will be described with reference to FIGS. 5(a), 5(b), and 6. FIG. 5(a) shows an example of the title index information, and FIG. 5(b) shows an example of the text index information. As shown in FIGS. 5(a) and 5(b), each index first contains information in which each word included in the documents managed as search targets is associated with its DF (Document Frequency). In addition, for each word, a document ID identifying each document is associated with a TF (Term Frequency: frequency within the document).

Here, DF is information indicating how many of the documents managed as search targets contain the word. In the present embodiment, the DF shown in FIGS. 5(a) and 5(b) is the number of documents that contain the word in the respective item. For example, FIG. 5(a) indicates that three documents contain the word “system” in their titles. TF is information indicating how many times the word occurs in each document. For example, in FIG. 5(b), the documents that contain the word “system” in their text are the five documents with document IDs “1” to “5”, and the text of the document with document ID “1” contains the word “system” ten times.

Further, as shown in FIGS. 5(a) and 5(b), each index also contains information indicating the document length of each document. In the present embodiment, the document length is the number of characters in the document. In FIGS. 5(a) and 5(b), the title of document ID “1” has a document length of “12”, and its text has a document length of “100”.
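The per-item index layout described for FIG. 5 (a DF per word, a TF per document ID, and a document length per document ID) could be modeled roughly as below. The class names are hypothetical, and only the values marked “from the text” are stated in the description; every other number is a placeholder invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Posting:
    df: int                                 # documents containing the word in this item
    tf: dict = field(default_factory=dict)  # doc_id -> occurrences in this item


@dataclass
class ItemIndex:
    postings: dict    # word -> Posting
    doc_length: dict  # doc_id -> length of this item in characters


title_index = ItemIndex(
    # df=3 is from the text; the document IDs and TF values are placeholders.
    postings={"system": Posting(df=3, tf={1: 1, 2: 1, 3: 1})},
    doc_length={1: 12},  # from the text
)
text_index = ItemIndex(
    # df=5, IDs 1 to 5, and tf[1]=10 are from the text; other TFs are placeholders.
    postings={"system": Posting(df=5, tf={1: 10, 2: 3, 3: 5, 4: 2, 5: 1})},
    doc_length={1: 100},  # from the text
)

print(text_index.postings["system"].tf[1])  # 10
print(title_index.doc_length[1])            # 12
```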

FIG. 6 shows an example of the record information. As shown in FIG. 6, the record information according to the present embodiment associates a document ID with the title and the creator of that document. The calculation result processing unit 103 refers to this record information when it generates a list according to the fitness values calculated by the fitness calculation unit 102.
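How record information might be joined with fitness values to produce a ranked list can be sketched as follows. The function name `build_result_list`, the tie-break on document ID, and the sample titles, creators, and scores are all hypothetical.

```python
def build_result_list(scores, records):
    """Join fitness scores with record information, best score first.

    scores:  {doc_id: fitness}
    records: {doc_id: {"title": str, "creator": str}}
    Ties are broken by document ID (an assumption of this sketch).
    """
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [
        {"doc_id": doc_id, "score": score, **records[doc_id]}
        for doc_id, score in ranked
        if doc_id in records
    ]


records = {
    1: {"title": "Database system overview", "creator": "Author A"},
    2: {"title": "Search system notes", "creator": "Author B"},
}
scores = {2: 0.4, 1: 1.7}
results = build_result_list(scores, records)
print([row["doc_id"] for row in results])  # [1, 2]
```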

Next, the operation of the information search system according to the present embodiment will be described with reference to the drawings. FIG. 7 is a sequence diagram showing the information search operation of the system. As shown in FIG. 7, when searching the electronic documents registered in the target information DB 200, the user first operates the client device 2 to acquire from the information search device 1 the information needed to display a search condition specification screen, and the client device 2 displays that screen (S701). The description of the present embodiment below takes as an example the case where the user operates the client device 2 to use the functions of the information search device 1.

The present embodiment is described for the case where the user specifies the search condition explained with reference to FIG. 4(b). By operating the operation unit of the client device 2, the user inputs a search condition such as that shown in FIG. 4(b), and the client device 2 transmits it to the information search device 1 as specified condition information (S702).

The specified condition information transmitted to the information search device 1 is input to the information search device 1 through the network I/F 120 and acquired by the specified condition information acquisition unit 101 of the search control unit 100 (S703). Specifically, in S703 the CPU 10, which functions as the specified condition information acquisition unit 101 by performing operations according to a program, stores the specified condition information in a storage area of the RAM 20 that functions as part of the specified condition information acquisition unit 101. In other words, the specified condition information acquisition unit 101 functions as a condition acquisition unit.

On acquiring the specified condition information from the specified condition information acquisition unit 101, the fitness calculation unit 102 searches the index information stored in the target information DB 200 according to the designated items and keyword (S704). That is, in S704 the fitness calculation unit 102 refers to the index information stored in the target information DB 200 and extracts the information associated with the keyword given as the specified condition information. In the example of FIG. 7, "title" and "body" are designated as items, as shown in FIG. 4(b), so the fitness calculation unit 102 searches the title index and the body index.

In S704, the fitness calculation unit 102 searches the title index and the body index described with reference to FIGS. 5(a) and 5(b) and acquires the TF of the keyword "system" for each of the "title" and "body" items. In S704, the fitness calculation unit 102 also acquires the DF of the keyword "system" for each item. Specifically, the CPU 10 functioning as the fitness calculation unit 102 stores these TF and DF values in a storage area of the RAM 20.

When the processing of S704 is completed, the fitness calculation unit 102 generates TF and DF values for each document based on the TF and DF values acquired for each item in S704 (S705). In S705, the fitness calculation unit 102 generates the TF value of a document by summing the TF values acquired for the respective items of that document.

Also in S705, the fitness calculation unit 102 generates the DF value by taking the logical OR, across the items to be searched, of the documents that contain the keyword "system" in those items. To do so, the fitness calculation unit 102 according to the present embodiment counts the document IDs that are common to the central tables of FIGS. 5(a) and 5(b), that is, the tables associating document IDs with TF values, and subtracts that count from the sum of the DF values of the respective items. The result is the DF value as a logical OR. Specifically, in S705 the CPU 10 functioning as the fitness calculation unit 102 stores the generated TF and DF values in a storage area of the RAM 20. FIG. 8 shows an example of the TF values generated in S705.
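
The combination performed in S705 can be sketched as follows for the two-item case described above. This is a minimal illustration only: the function name `combine_two_items` and the sample index contents are hypothetical and are not taken from FIG. 5 or FIG. 8.

```python
def combine_two_items(index_a, index_b):
    """Combine two per-item indexes for one keyword.

    Each index maps document ID -> TF of the keyword in that item
    (a hypothetical stand-in for the central tables of FIG. 5).
    Returns (per-document TF, document-level DF).
    """
    # Document-level TF: sum of the TF values of the respective items.
    tf_per_doc = dict(index_a)
    for doc_id, tf in index_b.items():
        tf_per_doc[doc_id] = tf_per_doc.get(doc_id, 0) + tf

    # Per-item DF: number of documents listed in each item's table.
    df_a, df_b = len(index_a), len(index_b)
    # Document IDs present in both tables would be counted twice,
    # so their count is subtracted from the sum (logical OR).
    common = len(index_a.keys() & index_b.keys())
    df = df_a + df_b - common
    return tf_per_doc, df


# Made-up sample data for the keyword "system".
title_index = {1: 1, 3: 2}
body_index = {1: 4, 2: 1, 3: 3, 5: 2}
tf_per_doc, df = combine_two_items(title_index, body_index)
```

With more than two designated items, subtracting only pairwise overlaps is no longer sufficient; taking the size of the union of the document ID sets would give the same logical OR.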

When the processing of S705 is completed, the fitness calculation unit 102 calculates the fitness of each document stored in the target information DB 200 based on the generated TF and DF values (S706). Specifically, in S706 the CPU 10 functioning as the fitness calculation unit 102 stores the calculated fitness in a storage area of the RAM 20. The fitness in S706 is calculated as follows. The fitness Score_{i,j} of document j for keyword i is obtained by the following equation (1).

Here, "N" in equation (1) is the total number of documents stored in the target information DB 200, "tf_{ij}" is the TF value generated above, and "df_i" is the DF value generated above.

In equation (1), the fitness Score_{i,j} increases as the DF value decreases. This reflects the idea that the fewer documents contain a word, that is, the smaller its DF value, the more distinctive the word is. The fitness Score_{i,j} also increases as the TF value increases. This reflects the idea that the more often a document contains the word, that is, the larger its TF value, the better the document matches the condition.
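
The behavior described for equation (1) can be illustrated with a short sketch. The exact form of equation (1) is not available in this text, so the code below assumes the conventional TF-IDF form tf_{ij} * log(N / df_i). That form is an assumption chosen because it uses the same three quantities and has the stated monotonic properties; the function name `score` is hypothetical.

```python
import math


def score(tf, df, n_docs):
    """Fitness of one document for one keyword.

    ASSUMPTION: conventional TF-IDF, tf * log(N / df). The source only
    states that the score grows with TF and shrinks as DF grows.
    """
    if tf == 0 or df == 0:
        # The keyword does not occur in the designated items.
        return 0.0
    return tf * math.log(n_docs / df)


# Larger TF -> larger score; larger DF -> smaller score.
high_tf = score(tf=5, df=4, n_docs=100)
low_tf = score(tf=1, df=4, n_docs=100)
common_word = score(tf=5, df=40, n_docs=100)
```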

Using equation (1) and the TF and DF values generated in S705, the fitness calculation unit 102 calculates the fitness of every document stored in the target information DB 200. FIG. 9 shows the result of the fitness calculation in S706. As shown in FIG. 9, a fitness is calculated for each document stored in the target information DB 200.

Having calculated the fitness as shown in FIG. 9, the fitness calculation unit 102 sorts the documents by the calculated fitness to generate ranking result information, and inputs the ranking result information to the calculation result processing unit 103. On receiving the ranking result information from the fitness calculation unit 102, the calculation result processing unit 103 generates display information for displaying the ranking search result and transmits it to the client device 2 (S707).

In S707, based on the sorted document IDs, the calculation result processing unit 103 acquires the title associated with each document ID from the record information stored in the target information DB 200 and generates a screen on which the titles are displayed in sorted order. On receiving the display information, the client device 2 displays the ranking search result on its display unit (S708), and the processing ends.
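
The sorting and title lookup of S706 to S707 can be sketched as below. The record layout follows the description of FIG. 6 (document ID, title, creator), but the function name `rank_titles` and all sample values are hypothetical.

```python
def rank_titles(scores, records):
    """Sort document IDs by descending fitness and attach their titles.

    scores:  document ID -> fitness calculated in S706
    records: document ID -> {"title": ..., "creator": ...}
             (hypothetical stand-in for the record information of FIG. 6)
    """
    ranked_ids = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    return [(doc_id, records[doc_id]["title"]) for doc_id in ranked_ids]


# Made-up sample data.
scores = {1: 2.5, 2: 0.4, 3: 3.1}
records = {
    1: {"title": "Search system overview", "creator": "A"},
    2: {"title": "Meeting notes", "creator": "B"},
    3: {"title": "System design", "creator": "C"},
}
ranking = rank_titles(scores, records)
```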

As described above, in the information search system according to the present embodiment, index information indicating TF and DF values is stored separately for each of the plurality of items included in an electronic document that is the search target. When a plurality of items are designated in the search condition, TF and DF values are acquired from each designated item, TF and DF values for each electronic document are generated from them, and the fitness is then calculated. Compared with storing index information for every combination of items, this reduces the size of the index information while maintaining the accuracy of the calculated fitness, that is, the search accuracy.

In addition, when a keyword is designated as a search condition, the scope searched for that keyword is not limited to a single item or to the entire document; any combination of items can be designated. The convenience of search is therefore also maintained.

In the above embodiment, the DF in the index information for each item of the electronic document is the number of documents containing the word in that item. However, as described above, the purpose of the DF is to embody the idea that a word is more distinctive the fewer documents contain it, that is, the smaller its DF value, so a constant value may be used regardless of the item. In this case, it is preferable to use as the DF value the number of documents that contain the word anywhere in the document. When a constant DF value is used regardless of the item, only the TF value needs to be generated in S705 of FIG. 7.

The above embodiment has been described using equation (1) to calculate the fitness. In general, a longer document contains more words, so the calculation of equation (1) tends to yield a higher fitness for longer documents. To address this, the calculation of equation (1) can be extended to take the document length into account, which corrects the fitness error caused by differences in document length. When the document length is considered, the fitness Score_{i,j} is obtained by the following equation (2).

Here, "l_j" in equation (2) is the document length of document j. As described with reference to FIGS. 5(a) and 5(b), the index information according to the present embodiment stores a document length for each item. Accordingly, the fitness calculation unit 102 acquires the document length for each item in S704 of FIG. 7 and, in S706, sums the acquired lengths for each electronic document to generate the per-document length "l_j".

"L" in equation (2) is the average of the document lengths of all electronic documents stored in the target information DB 200, that is, the average document length. Of the TF and DF values on which the fitness is based, equation (2) adjusts the TF value according to the length of each document. This corrects the tendency for a document that contains many words simply because it is long, and therefore has a large TF value, to receive a higher fitness, and so resolves the problem that longer documents are given higher fitness.
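
The length adjustment can be sketched as follows. The exact form of equation (2) is not available in this text, so the sketch assumes one simple normalization: the TF value is divided by the ratio l_j / L before the assumed TF-IDF form tf * log(N / df) is applied. Both the normalization and the function names are assumptions made for illustration, and the sample lengths are made up.

```python
import math


def document_lengths(item_lengths):
    """Sum the per-item lengths of each document to obtain l_j."""
    return {doc_id: sum(lengths.values())
            for doc_id, lengths in item_lengths.items()}


def normalized_score(tf, df, n_docs, doc_len, avg_len):
    """Length-adjusted fitness.

    ASSUMPTION: TF is scaled by avg_len / doc_len, then combined with
    log(N / df). The source only states that TF is adjusted using the
    document length l_j and the average document length L.
    """
    if tf == 0 or df == 0:
        return 0.0
    adjusted_tf = tf / (doc_len / avg_len)
    return adjusted_tf * math.log(n_docs / df)


# Made-up per-item lengths for two documents.
item_lengths = {
    1: {"title": 10, "body": 190},
    2: {"title": 10, "body": 790},
}
lengths = document_lengths(item_lengths)
avg_len = sum(lengths.values()) / len(lengths)

# Same TF, but the shorter document receives the higher fitness.
short_doc = normalized_score(4, 2, 10, lengths[1], avg_len)
long_doc = normalized_score(4, 2, 10, lengths[2], avg_len)
```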

The above embodiment has also been described using the example in which, when the TF value of each document is obtained in S705 of FIG. 7, the TF values acquired for the respective items of a document are simply summed. In this case, an occurrence of the keyword is counted equally as "1" whether it appears in the title, the abstract, or the body. However, a document in which the keyword appears in the title or abstract is considered more relevant to the keyword than a document in which it appears only in the body. It is therefore preferable that a document in which the keyword appears once in the title receive a higher fitness than a document in which it appears once in the body.

This can be achieved by weighting the TF values according to the item before summing them. For example, as shown in FIG. 10, information that sets a coefficient indicating the importance of each item, such as title, abstract, and body, is stored in the fitness calculation unit 102. When the fitness calculation unit 102 sums the per-item TF values in S705, it multiplies the TF value of each item by the coefficient shown in FIG. 10 and then sums the results. In the example of FIG. 10, the weights make the title 10 times as important as the body and the abstract 5 times as important as the body. This further improves the search accuracy.
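
The weighted summation can be sketched as below. The 10:5:1 ratio follows the description of FIG. 10; the absolute coefficient values, the item keys, and the function name `weighted_tf` are assumptions made for illustration.

```python
# Coefficients in the 10 : 5 : 1 ratio described for FIG. 10.
# The absolute values are assumed; only the ratio comes from the text.
ITEM_WEIGHTS = {"title": 10, "abstract": 5, "body": 1}


def weighted_tf(item_tf, weights=ITEM_WEIGHTS):
    """Multiply each item's TF by its coefficient, then sum per document."""
    return sum(weights[item] * tf for item, tf in item_tf.items())


# One occurrence in the title outweighs one occurrence in the body.
title_only = weighted_tf({"title": 1})
body_only = weighted_tf({"body": 1})
mixed = weighted_tf({"title": 1, "abstract": 2, "body": 3})
```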

The above embodiment has also been described using the example in which the user inputs the search condition information by operating the client device 2, which is a general PC or the like. Alternatively, as shown in FIG. 11 for example, an operation unit such as the display panel of a printer, a scanner, a copier, or a multifunction peripheral combining those functions can be used as the interface for inputting the search condition information. In the example of FIG. 11, the multifunction peripheral 3, rather than the client device 2, is used as the terminal at which the user inputs the search condition information.

DESCRIPTION OF SYMBOLS
1 Search device
2 Client device
3 Multifunction peripheral
10 CPU
20 RAM
30 ROM
40 HDD
50 I/F
60 LCD
70 Operation unit
80 Bus
100 Search control unit
101 Specified condition information acquisition unit
102 Fitness calculation unit
103 Calculation result processing unit
110 Information input unit
120 Network I/F
130 Display unit
200 Target information DB

JP 2003-323457 A

Claims

A search program for determining an order of displaying a plurality of documents stored in advance based on a degree of conformity to a specified condition, the search program causing an information processing apparatus to execute:
a step of acquiring a word as the specified condition and storing it in a storage medium;
a step of referring, based on the stored word, to search target information in which a word included in the documents and the number of appearances thereof are associated with each other for each of a plurality of items constituting the documents;
a step of acquiring, for each of the items, the number of appearances associated with the stored word in the search target information and storing it in a storage medium; and
a step of calculating a fitness for the specified condition for each document, based on a value obtained by adding, for each document, the numbers of appearances acquired for each item and on the number of documents including the stored word, and storing the fitness in a storage medium.

The search program according to claim 1, wherein, in the step of calculating the fitness and storing it in a storage medium, the number of appearances acquired for each item is multiplied by a coefficient indicating the importance of each of the plurality of items and then added for each document.

The search program according to claim 1 or 2, wherein, in the step of acquiring a word as the specified condition and storing it in a storage medium, information designating an item that should contain the word among the plurality of items constituting the documents is also acquired, and
the number of documents including the stored word is the number of documents including the stored word in any of the designated items.

3. The search program according to claim 1, wherein the number of documents including the stored word is the number of documents including the stored word in any of the plurality of items.

The search program according to any one of claims 1 to 4, wherein, in the step of calculating the fitness and storing it in a storage medium, the value obtained by adding, for each document, the numbers of appearances acquired for each item is adjusted using the value of the length of the document.

A search device for determining an order of displaying a plurality of documents stored in advance based on a degree of conformity to a specified condition,
A search target information storage unit storing search target information in which a word included in the document and the number of appearances thereof are associated for each of a plurality of items constituting the document;
A condition acquisition unit for acquiring a word that is the specified condition;
A fitness calculation unit that acquires, from the search target information, the number of appearances associated with the word acquired as the specified condition for each of the items, and calculates a fitness for the specified condition for each document based on a value obtained by adding, for each document, the numbers of appearances acquired for each item and on the number of documents including the acquired word.

The search device, wherein the fitness calculation unit multiplies the number of appearances acquired for each item by a coefficient indicating the importance of each item, and then adds the results for each document.

A search system that determines an order of displaying a plurality of prestored documents based on a degree of conformity to a specified condition,
A search target information storage unit storing search target information in which a word included in the document and the number of appearances thereof are associated for each of a plurality of items constituting the document;
A condition acquisition unit that acquires a word that is the specified condition input in the image processing apparatus via a network;
A fitness calculation unit that acquires, from the search target information, the number of appearances associated with the word acquired as the specified condition for each of the items, and calculates a fitness for the specified condition for each document based on a value obtained by adding, for each document, the numbers of appearances acquired for each item and on the number of documents including the acquired word.

The search device, wherein the fitness calculation unit multiplies the number of appearances acquired for each item by a coefficient indicating the importance of each of the plurality of items, and then adds the results for each document.

A search method for determining an order of displaying a plurality of prestored documents based on a degree of conformity to a specified condition,
Acquire a word as the specified condition and store it in a storage medium,
Based on the stored word, refer to the search target information in which the word included in the document and the number of occurrences thereof are associated for each of a plurality of items constituting the document,
The number of occurrences associated with the stored word in the search target information is acquired for each of the items and stored in a storage medium,
Based on a value obtained by adding, for each document, the numbers of appearances acquired for each item and on the number of documents including the stored word, a fitness for the specified condition is calculated for each document and stored in a storage medium.

6. A recording medium in which the search program according to claim 1 is recorded in a format readable by an information processing apparatus.