JP2005135199A

JP2005135199A - Automaton generating method, method, device, and program for XML data retrieval, and recording medium for XML data retrieval program

Info

Publication number: JP2005135199A
Application number: JP2003371309A
Authority: JP
Inventors: Makoto Onizuka; Hiroyuki Uchiyama
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-10-30
Filing date: 2003-10-30
Publication date: 2005-05-26

Abstract

PROBLEM TO BE SOLVED: To propose an XML data retrieval system that uses XPath expressions having a plurality of conditions.

SOLUTION: In an automaton generating method that generates an automaton from an XPath expression by using an XML schema, the XML schema prescribes the order of appearance of the nodes included in XML data, and an XML data retrieval device 1 performs: a procedure of generating a start state corresponding to a context node; a procedure of specifying the order of appearance of the predicates from the node order prescribed by the XML schema; a procedure of generating a state for each predicate and prescribing the state transitions according to the order of appearance of the predicates; and a procedure of setting the state corresponding to the end of that order as the accepting state, thereby generating an automaton from the XPath expression.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to an automaton creation method, an XML data search method, an XML data search device, an XML data search program, and a recording medium for the XML data search program.

XML (Extensible Markup Language) provides a standard data description format that can be exchanged over networks, and it expresses data structure using tags (see, for example, Non-Patent Document 1). Because data written in XML is structured in a form that computers can process, XML is now widely used, for example in NewsML.

Languages dedicated to processing XML data have also come into wide use alongside XML. For example, XPath (XML Path Language) provides a notation for identifying parts of XML data, and it plays an important role as a way of expressing efficient access to XML data, such as queries and transformations. When searching a set of XML data for the data that satisfies a given condition, writing that condition as an XPath expression makes the search straightforward.

One example of search processing over XML data is searching data in the NewsML format. NewsML allows news articles and related images, video, and audio to be delivered to a variety of terminals, such as the web, mobile phones, and television (data broadcasting). By registering an XPath expression with the distribution server, a NewsML recipient (user) can obtain only the information they need from a vast amount of information.

Search processing over XML data falls broadly into two types, depending on the order in which the XML data to be searched is parsed (scanned).

The first parsing approach is DOM (Document Object Model). DOM exploits the fact that XML can be represented as a tree. By traversing the DOM (an internal tree representation of the XML), it can handle branching. However, DOM must build the internal tree (hereinafter, the DOM tree) even for very large XML data, so memory usage becomes enormous. In addition, XPath processing based on a DOM tree is difficult to apply when data arrives sequentially (for example, as time-series data) and must be processed in real time, as in news or stock price distribution.

The second parsing approach is SAX (Simple API for XML). SAX targets the weaknesses of DOM noted above, namely real-time performance and memory usage. With DOM, XML is represented as a tree, and search and filtering are performed by traversing that tree when an XPath expression is given. With SAX, the XML data can only be scanned once from top to bottom, so XPath expressions cannot be processed in the same way as with DOM.

XPath processing techniques based on SAX include methods that use nondeterministic automata and methods that use deterministic automata. The two are not entirely independent, and each has advantages and disadvantages. With nondeterministic automata, one automaton is built for each XPath expression, so when multiple XPath expressions are given, a corresponding number of automata must be built. Memory usage is therefore small, but processing is slow. A deterministic automaton, by contrast, is built from nondeterministic automata by the subset construction. Memory usage is therefore larger, but processing is fast.

Once a nondeterministic or deterministic automaton has been created, it can be used to determine which parts of the input XML match an XPath expression. The method based on nondeterministic automata creates one nondeterministic automaton per XPath expression, and its processing speed degrades markedly as the number of XPath expressions grows (see Non-Patent Document 2). The method based on deterministic automata is realized by converting those nondeterministic automata. A deterministic automaton keeps processing speed high and constant even as the number of XPath expressions grows, but it has the drawback that memory usage becomes enormous during the conversion.
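The conversion from a nondeterministic to a deterministic automaton mentioned above can be illustrated with a minimal Python sketch of the textbook subset construction. The toy automaton, the dictionary-based representation, and the function names are assumptions made for illustration and are not taken from the cited documents.

```python
from collections import deque

def epsilon_closure(states, eps):
    """Return every state reachable from `states` through epsilon transitions."""
    closure, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for target in eps.get(state, ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)

def subset_construction(start, delta, eps, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    d_start = epsilon_closure({start}, eps)
    dfa, queue = {}, deque([d_start])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue  # already expanded
        dfa[current] = {}
        for symbol in alphabet:
            moved = {t for s in current for t in delta.get((s, symbol), ())}
            if moved:
                target = epsilon_closure(moved, eps)
                dfa[current][symbol] = target
                queue.append(target)
    return d_start, dfa

# Hypothetical toy NFA for the two paths /a/b and /a/c sharing one start state.
delta = {(0, "a"): {1, 3}, (1, "b"): {2}, (3, "c"): {4}}
d_start, dfa = subset_construction(0, delta, {}, ["a", "b", "c"])
```

Each deterministic state stands for a set of nondeterministic states, which is why the number of states, and with it the memory usage, can grow sharply when many expressions are merged.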

The technique called a lazy deterministic finite automaton does not convert the nondeterministic automaton to a deterministic automaton up front. Instead, it determinizes only the parts of the automaton that are actually used when XML is input, thereby achieving both high speed and low memory usage (see Non-Patent Document 3).

Non-Patent Document 1: Mikitoshi Nakayama, Yasuhiro Okui, "改訂版標準XML完全解説(上)" (Revised Edition, Standard XML Complete Guide, Vol. 1), ISBN4-7741-1186-4, first edition, April 2001, Gijutsu-Hyoron-sha (技術評論社)
Non-Patent Document 2: Yanlei Diao, Peter M. Fischer, Michael J. Franklin, Raymond To, "YFilter: Efficient and Scalable Filtering of XML Documents.", In Proceedings of the ICDE (2002)
Non-Patent Document 3: Todd J. Green, Gerome Miklau, Makoto Onizuka, Dan Suciu, "Processing XML Streams with Deterministic Automata.", ICDT (2003)

However, no system had been proposed that processes multiple predicates (search conditions). Because a SAX-based XPath processing system processes XML sequentially, a search system that handles only a single search condition could not simply be adapted to multiple search conditions.

To efficiently retrieve the XML data that satisfies the search conditions from the XML data to be searched, it must be possible to specify multiple conditions. A conventional search system that handles only a single search condition is inefficient because XML data the user does not need is also output in the search results as noise.

The XML data to be searched is assumed to contain a set of nodes having parent-child relationships. The XPath expression serving as the search condition contains one or more pairs, each associating a context node, which designates a node of the XML data, with a plurality of predicates for that context node. A predicate represents an individual search condition and is hereinafter also referred to as a "condition".

The main object of the present invention is therefore to solve the above problem and to propose an XML data search system that uses XPath expressions having a plurality of conditions.

To solve the above problem, the automaton creation method according to claim 1 creates an automaton from an XPath expression. The XPath expression includes one or more pairs each associating a context node with a plurality of predicates corresponding to that context node. A computer comprising at least a memory serving as a storage area used for arithmetic processing and an arithmetic processing unit that performs the arithmetic processing executes: a procedure of creating a start state corresponding to the context node; a procedure of creating, for each predicate, an accepting state that takes the predicate as input; and a procedure of creating an automaton from the XPath expression by associating the start state with the accepting states through epsilon transitions.

The automaton creation method according to claim 2 creates an automaton from an XPath expression by using an XML schema. The XPath expression includes one or more pairs each associating a context node with a plurality of predicates corresponding to that context node, and the XML schema defines the order of appearance of the nodes included in XML data. A computer comprising at least a memory serving as a storage area used for arithmetic processing and an arithmetic processing unit that performs the arithmetic processing executes: a procedure of creating a start state corresponding to the context node; a procedure of determining the order of appearance of the predicates from the node order defined in the XML schema; a procedure of creating a state for each predicate and defining the state transitions according to the order of appearance of the predicates; and a procedure of creating an automaton from the XPath expression with the state corresponding to the end of that order as the accepting state.

The XML data search method according to claim 3 searches XML data using an automaton created by the automaton creation method according to claim 1 or claim 2. A computer comprising at least a memory serving as a storage area used for arithmetic processing and an arithmetic processing unit that performs the arithmetic processing executes: a procedure of accepting input of the XML data to be searched; a procedure of scanning the XML data in order and generating SAX events; a procedure of feeding the SAX events to the created automaton to advance its state; and a procedure of outputting the XML data as a search result when the automaton has reached all of the accepting states it contains.

The XML data search device according to claim 4 searches XML data to be searched for the XML data that matches an XPath expression serving as a search condition. The device comprises at least a memory serving as a storage area used for arithmetic processing and an arithmetic processing unit that performs the arithmetic processing. The XML data stored in the memory includes a set of nodes having parent-child relationships, and the XPath expression stored in the memory includes one or more pairs each associating a context node with a plurality of predicates corresponding to that context node. The device is configured to include: a SAX event generation unit that scans the XML data to be searched in order and generates SAX events; an automaton creation means that creates an automaton for each context node from the XPath expression; and an XPath search unit that searches the XML data through state transitions of the created automaton, taking the SAX events as input.

The XML data search device according to claim 5 searches XML data to be searched for the XML data that matches an XPath expression serving as a search condition. The device comprises at least a memory serving as a storage area used for arithmetic processing and an arithmetic processing unit that performs the arithmetic processing. The XML data stored in the memory includes a set of nodes having parent-child relationships, the XPath expression stored in the memory includes one or more pairs each associating a context node with a plurality of predicates corresponding to that context node, and an XML schema stored in the memory defines the order of appearance of the nodes included in the XML data. The device is configured to include: a SAX event generation unit that scans the XML data to be searched in order and generates SAX events; an automaton creation means that creates an automaton for each context node from the XPath expression and the node order defined in the XML schema; and an XPath search unit that searches the XML data through state transitions of the created automaton, taking the SAX events as input.

The XML data search program according to claim 6 searches XML data to be searched for the XML data that matches an XPath expression serving as a search condition. The XML data includes a set of nodes having parent-child relationships, and the XPath expression includes one or more pairs each associating a context node with a plurality of predicates corresponding to that context node. The program causes a computer comprising at least a memory serving as a storage area used for arithmetic processing and an arithmetic processing unit that performs the arithmetic processing to function as: SAX event generation means that scans the XML data to be searched in order and generates SAX events; automaton creation means that creates an automaton for each context node from the XPath expression; and XPath search means that searches the XML data through state transitions of the created automaton, taking the SAX events as input.

The XML data search program according to claim 7 searches XML data to be searched for the XML data that matches an XPath expression serving as a search condition. The XML data includes a set of nodes having parent-child relationships, the XPath expression includes one or more pairs each associating a context node with a plurality of predicates corresponding to that context node, and an XML schema defines the order of appearance of the nodes included in the XML data. The program causes a computer comprising at least a memory serving as a storage area used for arithmetic processing and an arithmetic processing unit that performs the arithmetic processing to function as: SAX event generation means that scans the XML data to be searched in order and generates SAX events; automaton creation means that creates an automaton for each context node from the XPath expression and the node order defined in the XML schema; and XPath search means that searches the XML data through state transitions of the created automaton, taking the SAX events as input.

As described above, the present invention makes it possible for a SAX-based XML data search device to perform search processing even when an XPath expression contains a plurality of conditions.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 shows the overall configuration of one embodiment of the present invention. The XML data search device 1 shown in FIG. 1 searches XML data to be searched for the XML data that matches a search condition. To this end, the XML data search device 1 includes: an XML data storage unit 10 that stores the XML data to be searched; an XML data registration unit 11 that registers the XML data to be searched in the XML data storage unit; an XML schema storage unit 12 that stores the XML schema to which the XML data to be searched conforms; a SAX event generation unit 13 that scans the XML data to be searched in order and generates SAX events; an XPath storage unit 20 that stores the XPath expressions serving as search conditions; an XPath registration unit 21 that registers those XPath expressions in the XPath storage unit; an XPath search unit 22 that searches the XML data using the transitions of an automaton generated from an XPath expression; and an automaton storage unit 30 that stores the automata generated from XPath expressions. The arrows in FIG. 1 indicate the flow of data.

The XML data search device 1 also includes, as automaton creation means, at least one of an ordered automaton creation unit 31, which creates an automaton from an XPath expression using the order of appearance of XML data nodes defined in the XML schema, and an unordered automaton creation unit 32, which creates an automaton from an XPath expression without using that order. Which of the two units is used is decided according to whether the XML data to be searched was created in accordance with a specific XML schema.

The XML schema is a grammar that defines the order in which the nodes (represented by tags) constituting XML data appear within that data. When the XML data to be searched was created in accordance with an XML schema, the XML data search device 1 creates an automaton from the XPath expression serving as the search condition using either the ordered automaton creation unit 31 or the unordered automaton creation unit 32. When the XML data to be searched was not created in accordance with an XML schema, the XML data search device 1 creates the automaton using the unordered automaton creation unit 32. The created automaton is saved in the automaton storage unit 30.

The ordered automaton creation unit 31 creates automata that are searched more efficiently than those created by the unordered automaton creation unit 32. Because an automaton created by the ordered automaton creation unit 31 makes use of order information, it has fewer accepting states to verify than one created by the unordered automaton creation unit 32. Therefore, when the XML data to be searched was created in accordance with an XML schema, the ordered automaton creation unit 31 is used in preference to the unordered automaton creation unit 32.

The XML data search device 1 then performs search processing using the created automaton. Specifically, the SAX event generation unit 13 scans the XML data in the XML data storage unit 10 in order and generates SAX events. Next, the XPath search unit 22 takes the generated SAX events as input and verifies the state transitions of the automaton stored in the automaton storage unit 30. When the state transitions of the automaton lead to an accepting state, the XPath search unit 22 judges that the XML data from which the SAX events were generated matches the XPath expression from which the automaton was generated, and outputs that XML data as a search result. This concludes the overview of the XML data search device 1. Each component of the XML data search device 1 is described in detail next.

First, the XML data storage unit 10 stores the XML data to be searched. FIG. 10(A) shows an example of the stored XML data. The XML data of FIG. 10(A) is bibliographic XML in which book tags appear under a bib tag, and title and author tags appear under each book tag. To extract from this XML the book elements whose year of publication is 1999, whose title contains "XML", and whose author is Bob, the XPath search expression of (Formula 1) is used. The parts enclosed in [] represent the conditions of the XPath expression. Applying the search expression of (Formula 1) to the XML data of FIG. 10(A), the XML data search device 1 obtains the search result of FIG. 10(B). The main feature of the present invention is that it can process an expression even when, as in (Formula 1), there are multiple [] parts.

/bib/book[contains(title/text(),'XML')][author='Bob'][@year=1999] ... (Formula 1)
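To pin down what (Formula 1) is meant to select, the following Python sketch checks the three conditions directly on an in-memory tree. This is the DOM-style evaluation that the invention aims to avoid, shown here only as a reference for the expected result. The sample data is hypothetical (FIG. 10(A) is not reproduced), and `matches_formula_1` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data in the spirit of FIG. 10(A).
SAMPLE = """<bib>
  <book year="1999"><title>XML Basics</title><author>Bob</author></book>
  <book year="1999"><title>Databases</title><author>Bob</author></book>
  <book year="2001"><title>XML Streams</title><author>Bob</author></book>
  <book year="1999"><title>Advanced XML</title><author>Alice</author></book>
</bib>"""

def matches_formula_1(book):
    """Check the three bracketed conditions of (Formula 1) on one book element."""
    title_ok = any("XML" in (t.text or "") for t in book.findall("title"))
    author_ok = any((a.text or "") == "Bob" for a in book.findall("author"))
    # Simplification: XPath compares @year numerically; a string test suffices here.
    year_ok = book.get("year") == "1999"
    return title_ok and author_ok and year_ok

root = ET.fromstring(SAMPLE)  # root is the bib element
hits = [b.findtext("title") for b in root.findall("book") if matches_formula_1(b)]
```

Only the first book satisfies all three conditions, so a streaming implementation run over the same data should report that element alone.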

Next, the SAX event generation unit 13 scans the XML data and, each time it reads a node (tag) from the XML data, invokes the callback function corresponding to the type of that node. This is because, as noted above, SAX does not build a tree when it parses XML. FIG. 11(B) lists examples of callback functions.

FIG. 2 shows the relationship between the SAX filter (the SAX event generation unit 13) and the user application. When XML data is input to the SAX filter of FIG. 2, the XML data is read in order from the top. When bib is recognized, the SAX filter calls the startElement function in the user application. Because bib is passed as an argument of the startElement function, the user application can obtain the information that a bib element was encountered. This invocation of a user application function by the SAX event generation unit 13 is called a callback.
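The callback mechanism can be tried out with the SAX interface in the Python standard library. The sketch below is an illustration, not the device of FIG. 2: the `EventRecorder` class and the sample input are assumptions, while `xml.sax.ContentHandler` and its `startElement`, `characters`, and `endElement` callbacks are standard.

```python
import xml.sax

class EventRecorder(xml.sax.ContentHandler):
    """User application side: records every callback issued by the parser."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called when an opening tag is read; attributes arrive with it.
        self.events.append(("startElement", name, dict(attrs.items())))

    def characters(self, content):
        # Called for text content; whitespace-only chunks are skipped here.
        if content.strip():
            self.events.append(("characters", content.strip()))

    def endElement(self, name):
        # Called when a closing tag is read.
        self.events.append(("endElement", name))

recorder = EventRecorder()
xml.sax.parseString(
    b'<bib><book year="1999"><title>XML Basics</title></book></bib>', recorder
)
```

After parsing, `recorder.events` holds the callbacks in document order, starting with the startElement call for bib. No tree is kept in memory, which is the property the text attributes to SAX.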

FIG. 11(A) shows a DTD (Document Type Definition) as the XML schema stored in the XML schema storage unit 12. The XML schema defines the order of appearance of XML nodes. The example XML document of FIG. 10(A) given earlier can be seen to satisfy the XML schema of FIG. 11(A). The attribute shown on the third line of FIG. 11(A) is something that can be attached to an element, one or more per element, and is written in the form [attribute name='value'].

In this embodiment, the term "XML schema" is used as a generic name for any grammar that defines the order of appearance of nodes in XML data. Examples of notations for such a grammar include the DTD and XML Schema defined by the W3C, but this embodiment is not limited to these particular examples, and various grammar notations may be used.
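As a rough illustration of how a node order could be read from such a grammar, the following Python sketch extracts the child order from one DTD element declaration and sorts predicates accordingly. The DTD fragment, the function names `child_order_from_dtd` and `order_predicates`, and the (element, condition) tuple format are all assumptions made for illustration. FIG. 11(A) is not reproduced here, and only simple comma-separated content models are handled.

```python
import re

# Hypothetical DTD fragment; not the actual content of FIG. 11(A).
DTD = """<!ELEMENT bib (book*)>
<!ELEMENT book (title, author+)>
<!ATTLIST book year CDATA #REQUIRED>"""

def child_order_from_dtd(dtd_text, element):
    """Return the child element names of `element` in declared order."""
    pattern = r"<!ELEMENT\s+" + re.escape(element) + r"\s*\(([^)]*)\)"
    match = re.search(pattern, dtd_text)
    if match is None:
        raise ValueError(f"no declaration found for element {element!r}")
    # Drop occurrence indicators such as ?, * and + from each name.
    return [name.strip().rstrip("?*+") for name in match.group(1).split(",")]

def order_predicates(predicates, child_order):
    """Sort (element, condition) predicates by the schema's child order."""
    rank = {name: i for i, name in enumerate(child_order)}
    return sorted(predicates, key=lambda p: rank[p[0]])

child_order = child_order_from_dtd(DTD, "book")
ordered = order_predicates(
    [("author", "= 'Bob'"), ("title", "contains 'XML'")], child_order
)
```

Here the predicate on title is placed before the predicate on author because the hypothetical declaration lists title first. Where attribute conditions belong in the order is left open in this sketch.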

FIG. 3 shows how the unordered automaton creation unit 32 creates an automaton from an XPath expression. An automaton can be described in various ways, and in this embodiment it is described as a state transition diagram. Circles represent states, and the label written beside each arrow is the input that causes the state to change. When the XPath search unit 22 verifies the automaton, these inputs correspond to the SAX events supplied by the SAX event generation unit 13.

First, the unordered automaton creation unit 32 extracts each condition enclosed in [] in the XPath expression as an input that causes a state change. Next, it makes the state reached after the transition for each condition an accepting state of the automaton. It then merges the automata generated from the individual conditions into a single automaton by means of epsilon transitions. An epsilon transition is a transition that occurs without depending on any input. Because an automaton created in this way branches from the epsilon transitions to each of the conditions, it is referred to as a branching automaton.
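The construction just described can be sketched in Python as follows. The representation is an assumption made for illustration: predicates are treated as opaque labels that some upstream component reports as satisfied while SAX events arrive, and a match is declared once every accepting state has been reached, in line with the search method of claim 3. The names `build_branching_automaton` and `run_branching` are hypothetical.

```python
def build_branching_automaton(predicates):
    """Start state 0 fans out by epsilon transitions, one branch per predicate."""
    epsilon = {0: []}   # epsilon transitions leaving the start state
    delta = {}          # (state, predicate label) -> accepting state
    accepting = set()
    next_state = 1
    for label in predicates:
        branch, accept = next_state, next_state + 1
        next_state += 2
        epsilon[0].append(branch)
        delta[(branch, label)] = accept
        accepting.add(accept)
    return epsilon, delta, accepting

def run_branching(automaton, observed):
    """Return True when every accepting state has been reached."""
    epsilon, delta, accepting = automaton
    active = set(epsilon[0])  # all branches are live after the epsilon moves
    reached = set()
    for label in observed:
        for state in list(active):
            target = delta.get((state, label))
            if target is not None:
                active.discard(state)
                reached.add(target)
    return reached == accepting

automaton = build_branching_automaton(
    ["title contains XML", "author=Bob", "year=1999"]
)
```

Because every branch is live at once, the order in which the conditions are observed does not matter, but all accepting states must be tracked during the search.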

FIG. 4 shows how the ordered automaton creation unit 31 creates an automaton from an XPath expression. The XPath expressions used to generate the automata in FIG. 4(A) and FIG. 4(B) are equivalent, but the conditions enclosed in [ ] appear in a different order, so the generated automata also differ. In other words, a characteristic of the automata generated by the ordered automaton creation unit 31 is that the automaton changes when the order of the conditions changes.

FIG. 4(A) is described next. State 0 is the initial state. To move from state 0 to state 1, a book element must follow the bib element. To move from state 1 to state 2, the data enclosed by the title element must contain the string 'XML'. The transition from state 2 to state 3 occurs when author is Bob. Finally, when the book element has a year attribute whose value is 1999, the accepting state is reached and the XML data is extracted as a search result.
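The chain of states in FIG. 4(A) can be illustrated with a small sketch. The condition labels below are plain strings standing in for SAX events that satisfy each condition, and both function names are hypothetical; the point is only that a linear automaton accepts its conditions in one fixed order.

```python
def build_ordered_automaton(conditions):
    """State i moves to state i + 1 on conditions[i]; the final state accepts."""
    transitions = {i: (condition, i + 1) for i, condition in enumerate(conditions)}
    return transitions, len(conditions)


def run(transitions, accept_state, inputs):
    """Feed inputs one by one; inputs that do not match the current state are ignored."""
    state = 0
    for item in inputs:
        if state in transitions and transitions[state][0] == item:
            state = transitions[state][1]
    return state == accept_state


# Conditions in the order described for FIG. 4(A).
conditions = ["/bib/book", "title contains 'XML'", "author = 'Bob'", "@year = 1999"]
transitions, accept_state = build_ordered_automaton(conditions)
```

Feeding the same conditions in a different order leaves the automaton short of its accepting state, which is why the order of the conditions matters for this kind of automaton.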

FIG. 5 is a flowchart of the processing that, when a grammar describing the structure of the XML data is available, uses the grammar to extract the order in which the conditions should be applied and converts an XPath expression having conditions on a plurality of child nodes into an equivalent nondeterministic automaton. The operations in FIG. 5 are performed mainly by the ordered automaton creation unit 31. The result of the conversion into a nondeterministic automaton is registered in the automaton storage unit 30. This makes it possible to process XPath expressions having a plurality of conditions. Each step of FIG. 5 is described in order below.

First, the XML data search device 1 accepts input of an XPath expression as the search condition (S101). Because this flowchart applies only when the input XML has a schema, the XML data search device 1 determines whether the input XML has a schema (S102). XML data without a schema is not handled by the ordered automaton creation unit 31, so in that case the processing ends (S103).

Next, the XML data search device 1 analyzes the grammar (schema) and obtains the order of appearance of the elements (S104). In a DTD, this order is expressed in the element type declarations. An element type declaration has the form <!ELEMENT element content-model>. In the content model, the order of the elements is expressed as a comma-separated sequence such as "element name 1, element name 2, element name 3, ...".

The XML data search device 1 therefore stores the order element name 1, element name 2, element name 3, and so on. For example, the grammar shown in FIG. 11(A) specifies that the nodes always appear in the order year, title, author.
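As a rough sketch of step S104, the following reads the comma-separated content model of one element type declaration. The DTD text is a hypothetical stand-in (the content of FIG. 11(A) is not reproduced here), the function name is invented for illustration, and the regular expression only handles a flat sequence content model, not nested groups or choices.

```python
import re

# Hypothetical DTD fragment used only for illustration.
SAMPLE_DTD = """
<!ELEMENT bib (book*)>
<!ELEMENT book (title, author)>
<!ATTLIST book year CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
"""


def element_order(dtd_text, element_name):
    """Return the child element names of element_name in declared order."""
    pattern = r"<!ELEMENT\s+" + re.escape(element_name) + r"\s+\(([^)]*)\)"
    match = re.search(pattern, dtd_text)
    if match is None:
        return []
    names = []
    for part in match.group(1).split(","):
        name = part.strip().rstrip("?*+")  # drop occurrence indicators
        if name:
            names.append(name)
    return names
```

With the sample DTD, `element_order(SAMPLE_DTD, "book")` gives the child elements in the order the declaration lists them; attributes are declared separately and are handled as a special case in the text that follows.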

Attributes are treated specially and are declared as shown in order 3 of FIG. 11(A). Because an attribute appears at the same position in the XML data as the element it belongs to, conditions on attributes must be processed first. Therefore, when there are multiple conditions and a condition on an attribute exists (S105, Y), that condition is moved to the very front (S106). Once all conditions on attributes have been moved to the front, the conditions on elements (title, author) are rearranged into the order obtained earlier from the DTD (S107). The XPath expression obtained in this way is output to the XPath registration unit 21 (S108).

A specific example follows. Given the XPath expression (Formula 2), all conditions on attributes are first moved to the front of the group of conditions, giving (Formula 3). Next, the conditions of (Formula 3) are sorted according to the grammar, giving (Formula 4). In this way the conditions are arranged in the order dictated by the grammar. With respect to the DTD shown in the example, the resulting XPath expression is equivalent to the original XPath expression, so the search processing can be performed using the output XPath expression.
/bib/book[author='Bob'][contains(title/text(),'XML')][@year=1999] ... (Formula 2)
/bib/book[@year=1999][author='Bob'][contains(title/text(),'XML')] ... (Formula 3)
/bib/book[@year=1999][contains(title/text(),'XML')][author='Bob'] ... (Formula 4)
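The two reordering steps (attributes first, then elements in schema order) can be sketched as below. This is an illustrative approximation rather than the method as implemented in the device: the function name is hypothetical, string literals are written with single quotes, predicates are assumed not to nest, and a predicate is ranked by the first schema element name it mentions.

```python
import re


def reorder_predicates(xpath, element_order):
    """Move attribute conditions to the front, then sort the rest by schema order."""
    context = xpath.split("[", 1)[0]
    predicates = re.findall(r"\[([^\[\]]*)\]", xpath)

    attribute_preds = [p for p in predicates if p.lstrip().startswith("@")]
    element_preds = [p for p in predicates if not p.lstrip().startswith("@")]

    def rank(predicate):
        # Ignore quoted literals so that a value such as 'title' is not mistaken for a name.
        without_literals = re.sub(r"'[^']*'|\"[^\"]*\"", "", predicate)
        names = re.findall(r"[A-Za-z_][\w.-]*", without_literals)
        positions = [element_order.index(n) for n in names if n in element_order]
        return min(positions) if positions else len(element_order)

    # sorted() is stable, so conditions on the same element keep their relative order.
    ordered = attribute_preds + sorted(element_preds, key=rank)
    return context + "".join(f"[{p}]" for p in ordered)
```

Applied to the first expression of the example with the element order title, author, the sketch produces the fully sorted form in a single pass; the intermediate form with only the attribute condition moved is not materialized.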

FIG. 6 is a flowchart of the processing that makes it possible to search XML data with an XPath expression having conditions on a plurality of child nodes when no grammar (schema) describing the structure of the XML data is available. In other words, FIG. 6 shows the processing that converts an XPath expression having conditions on a plurality of child nodes into a branching nondeterministic automaton. The operations in FIG. 6 are performed mainly by the unordered automaton creation unit 32. Each step of FIG. 6 is described in order below.

First, when a plurality of XPath expressions are given (S201), the XML data search device 1 processes them one at a time. The XML data search device 1 therefore determines whether all XPath expressions have been processed (S202) and ends the processing once they have (S203).

Next, when an XPath expression is input (S204), the XML data search device 1 parses it (S205) and determines whether it has conditions (predicates) (S206). That is, it looks for the condition parts enclosed in [ ] within the XPath expression. The XML data search device 1 splits off each condition and places it on a stack. It then takes one condition from the stacked conditions, concatenates it, and registers it in the XPath processor (S207). The concatenated condition is removed from the stack, and the XPath expression with only the selected condition deleted is output as a new XPath expression (S208).

The XML data search device 1 repeats this until no conditions remain on the stack (S206, N), which yields a set of XPath expressions equal in number to the conditions. Finally, it obtains XPath expression-Parent, which is the original XPath expression with the condition parts removed (S209). Information representing the parent-child relationship between XPath expression-Parent and the XPath expression set is then constructed (S210).

An example follows. Suppose that XML data without a grammar is given as shown in FIG. 10(A), and consider the XPath expression (Formula 1). As discussed above, simply constructing an automaton does not allow the filtering to be performed. The XPath expression is therefore first divided as shown in FIG. 12(A), that is, into the part with the conditions removed and the multiple conditions split into individual conditions.

Next, each condition part is concatenated with the context node corresponding to it, giving the XPath expression set shown in FIG. 12(B). The XPath expressions obtained here have parent-child relationships. The branching nondeterministic automaton for this example is constructed as shown in FIG. 3.
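The splitting of steps S205 to S210 can be sketched as follows. The function name and the returned dictionary are hypothetical, the input is assumed to carry all its predicates on the last step without nesting, and the exact notation of FIG. 12 is not reproduced.

```python
import re


def split_into_expression_set(xpath):
    """Split an XPath expression into a parent expression and one child per condition."""
    parent = xpath.split("[", 1)[0]                      # expression with conditions removed
    stack = re.findall(r"\[([^\[\]]*)\]", xpath)         # conditions placed on a stack
    children = []
    while stack:                                          # repeat until no condition remains
        predicate = stack.pop(0)
        children.append(f"{parent}[{predicate}]")         # context node joined with one condition
    relation = {parent: children}                         # parent-child information
    return parent, children, relation
```

Each child expression carries exactly one condition, so the number of children equals the number of conditions in the original expression.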

The XML data search device 1 according to an embodiment of the present invention registers the XPath expressions constructed by the unordered automaton creation unit 32 in the XPath storage unit 20. The example uses #Y, #Z, #U, #X, and so on; these behave in the same way as actual XPath expressions. For example, when /bib/book is called from startContext, #Y is called. #Z, #U, and #X are defined in terms of #Y in order to support //book. An XPath expression such as //book must also handle the case where the XML holds nested book elements. Because #Y represents the XPath expression after //book has been expanded, #Z, #U, and #X can each be defined as XPath expressions relative to #Y.

An outline of the XML data search processing shown in FIG. 8 is now given using the specific example shown in FIG. 7. The states shown in gray in FIG. 7 are accepting states.

First, XML data such as that in FIG. 10(B) is the search target. Suppose that an XPath expression such as (Formula 5) is input and an automaton such as that in FIG. 7(a) is created.
/bib/book[author='Bob'][@year=1999][contains(title/text(),'XML')] ... (Formula 5)

At this point, the startElement function is called twice and the automaton transitions to $Y. Each child automaton of $Y then holds its own state. Because the arguments of the endElement function also include the attribute information year=1999, $X enters the accepting state (see FIG. 7(b)). $X therefore registers itself with its parent automaton $Y, notifying it that $X is in the accepting state.

Similarly, after the endElement function is called, the conditions contains(title/text(),'XML') and author='Bob' are satisfied, and $Z and $U each enter the accepting state (see FIG. 7(c) and FIG. 7(d)). Next, when the endElement function is called for </book>, the automaton corresponding to $Y checks whether all of its child automata are in the accepting state; because they are, $Y is also accepted (see FIG. 7(e)).

The stack held by xpath_variable is described next with reference to FIG. 9. FIG. 9 shows the stacks held by $Y, $Z, $U, and $X. A stack consists of registers that represent the accepting states of the child automata, so $Y stacks the register {$Z, $U, $X}. This allows it to receive the accepting states of its child automata. A leaf automaton has no child automata, so its stack is empty.

The reason for using a stack is as follows. In the example presented here, $Y is expressed as an absolute path for simplicity. In practice, there are XPath expressions such as //book that designate book elements recursively. If only one register could be registered, then when a plurality of book elements receive callbacks from the startElement function, it would be impossible to tell which book element should be processed at that moment. A stack is therefore used so that multiple registers can be stored and the element to be processed can be identified.
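The role of the stack for nested book elements can be illustrated with a small sketch. The `RegisterStack` class and its method names are hypothetical; one register (a dictionary of true/false flags, one per child automaton) is pushed for every book element that opens, so the innermost open element is always on top.

```python
class RegisterStack:
    """Stack of registers for one parent automaton such as $Y (illustrative only)."""

    def __init__(self, child_names):
        self.child_names = list(child_names)
        self.stack = []

    def start_context(self):
        # A new element opened: push a fresh register, initialized to false.
        self.stack.append({name: False for name in self.child_names})

    def report(self, child_name):
        # A child automaton reached its accepting state for the innermost element.
        self.stack[-1][child_name] = True

    def end_context(self):
        # The innermost element closed: pop its register and check every child.
        register = self.stack.pop()
        return all(register.values())
```

With two nested book elements, the inner element's register is popped first, so its result does not disturb the flags still being collected for the outer element.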

So that the XML data accepted by the XPath expression can be output upon acceptance, the XML data is stored in a register from the point at which a transition occurs in the root automaton. When the accepting state is reached, all of the registered data is output. When the accepting state is not reached, the automaton that failed to reach it releases the register. In this way, XPath expressions having a plurality of conditions can be processed even when there is no grammar (XML schema).

FIG. 8 is a flowchart of the processing that searches input XML data using a created automaton. Here, the branching nondeterministic automaton created by the unordered automaton creation unit 32 is used as an example of the created automaton. The operations in FIG. 8 are performed mainly by the XPath search unit 22. The processing of FIG. 8 is described in order below.

The XML data search device 1 is triggered by a callback (S302) from the SAX event generation unit 13 for the input XML data (S301). The callback functions startContext and endContext each take as an argument the variable xpath_variable, which indicates the target XPath expression.

When the startContext function is called back, a stack corresponding to xpath_variable is prepared (S305). What is pushed onto the stack is a register that can store, as true/false values, the accepting states of all child automata of xpath_variable. A child automaton that is in the accepting state notifies its parent automaton of true. The register is initialized to false.

On the other hand, when the endContext function is called back and xpath_variable is a leaf of the branching automaton (S306, Y), the xpath corresponding to the parent of xpath_variable holds a register for xpath_variable, and that register is set to true. Here, a leaf is a leaf node when FIG. 3 is viewed as a tree. When xpath_variable is not a leaf (S306, N), the state of the register is used to determine whether the child automata corresponding to the children of xpath_variable are all in the accepting state.

The condition represented by a plurality of child automata is expressed as a logical expression using and/or. The true/false value held in the register for each child automaton is therefore substituted into the logical expression of the group of conditions, and the expression is evaluated to determine whether it is true.
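Substituting register values into an and/or expression can be sketched as below. The nested-tuple representation of the logical expression and the function name are assumptions made for illustration; the source does not specify how the expression is stored.

```python
def evaluate(expression, register):
    """Evaluate an and/or expression whose leaves are child automaton names."""
    if isinstance(expression, str):
        return register[expression]           # substitute the stored true/false value
    operator, *operands = expression
    results = [evaluate(operand, register) for operand in operands]
    if operator == "and":
        return all(results)
    if operator == "or":
        return any(results)
    raise ValueError(f"unsupported operator: {operator}")
```

For example, with a register `{"$Z": True, "$U": False, "$X": True}`, the expression `("and", "$Z", ("or", "$U", "$X"))` is satisfied, while a plain conjunction of all three children is not.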

If the logical expression of the group of conditions is true (S307, Y), the condition represented by the child automata is accepted, and xpath_variable is accepted. If the logical expression is false (S307, N), the condition represented by the child automata is not accepted, and xpath_variable is not accepted either. In both cases, the current register is popped from the stack held by xpath_variable (S309, S310).

In the accepting case (S307, Y), the processing branches depending on whether the automaton corresponding to xpath_variable is the root (S311). If it is the root (S311, Y), the target branching automaton has been accepted, which means that the original XPath expression is in the accepting state. That XPath expression is therefore pushed onto list A (S313). If it is not the root (S311, N), xpath_variable is pushed to the XPath expression corresponding to the parent of xpath_variable (S312). In either case, the device then waits until the next callback function is called (S302).
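The startContext and endContext handling of FIG. 8 can be tied together in one short sketch. The `Node` class, the `leaf_matched` flag, and the plain list used for list A are hypothetical; in particular, the sketch records a child's acceptance by setting a flag in the parent's top register, which is one possible reading of "pushing xpath_variable to the parent", and it uses a simple conjunction of all children as the logical expression.

```python
class Node:
    """One automaton in the branching tree (illustrative only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.stack = []
        if parent is not None:
            parent.children.append(self)


def start_context(node):
    # S305: push a register, initialized to false, for the node's child automata.
    node.stack.append({child.name: False for child in node.children})


def end_context(node, accepted_list, leaf_matched=True):
    register = node.stack.pop()                # S309/S310: pop in either case
    if node.children:
        accepted = all(register.values())      # S307: evaluate the children's flags
    else:
        accepted = leaf_matched                # S306: a leaf reports its own result
    if accepted:
        if node.parent is None:
            accepted_list.append(node.name)    # S313: the root was accepted
        else:
            node.parent.stack[-1][node.name] = True  # S312: notify the parent
    return accepted
```

A root such as $Y is appended to the result list only after every child has reported acceptance before the root's own endContext callback arrives.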

The present invention described above is not limited to the embodiments described and can be implemented in various forms within the scope of its technical idea.

FIG. 1 is a block diagram of an XML data search device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram of a SAX event generation unit according to an embodiment of the present invention.
FIG. 3 is an explanatory diagram of an unordered automaton creation unit according to an embodiment of the present invention.
FIG. 4 is an explanatory diagram of an ordered automaton creation unit according to an embodiment of the present invention.
FIG. 5 is a flowchart showing automaton creation processing that uses the order of appearance of nodes according to an embodiment of the present invention.
FIG. 6 is a flowchart showing automaton creation processing that does not use the order of appearance of nodes according to an embodiment of the present invention.
FIG. 7 is a diagram explaining the operation of XML data search processing according to an embodiment of the present invention.
FIG. 8 is a flowchart showing the operation of XML data search processing according to an embodiment of the present invention.
FIG. 9 is a diagram explaining a method of managing the accepting states of automata according to an embodiment of the present invention.
FIG. 10 is a diagram showing an example of XML data according to an embodiment of the present invention.
FIG. 11 is a diagram showing an XML schema and SAX events according to an embodiment of the present invention.
FIG. 12 is a diagram showing division processing of an XPath expression according to an embodiment of the present invention.

Explanation of symbols

1 XML data search device
10 XML data storage unit
12 XML schema storage unit
13 SAX event generation unit
20 XPath storage unit
22 XPath search unit
30 Automaton storage unit
31 Ordered automaton creation unit (automaton creation means)
32 Unordered automaton creation unit (automaton creation means)

Claims

An automaton creation method for creating an automaton from an XPath expression, wherein the XPath expression includes one or more pairs each associating a context node with a plurality of predicates corresponding to the context node, and wherein a computer comprising at least a memory serving as a storage area used during arithmetic processing and an arithmetic processing unit that performs the arithmetic processing executes:
a procedure of creating a start state corresponding to the context node; a procedure of creating, for each predicate, an accepting state reached with the predicate as input; and a procedure of associating the start state with the accepting states by epsilon transitions, thereby creating the automaton from the XPath expression.

An automaton creation method for creating an automaton from an XPath expression using an XML schema, wherein the XPath expression includes one or more pairs each associating a context node with a plurality of predicates corresponding to the context node, the XML schema defines the order of appearance of nodes included in XML data, and a computer comprising at least a memory serving as a storage area used during arithmetic processing and an arithmetic processing unit that performs the arithmetic processing executes:
a procedure of creating a start state corresponding to the context node; a procedure of determining the order of appearance of the predicates from the order of appearance of the nodes defined in the XML schema; a procedure of creating a state for each predicate and defining the state transitions according to the order of appearance of the predicates; and a procedure of setting the state corresponding to the end of the order of appearance as an accepting state, thereby creating the automaton from the XPath expression.

An XML data search method for searching XML data using an automaton created by the automaton creation method according to claim 1 or 2, wherein a computer comprising at least a memory serving as a storage area used during arithmetic processing and an arithmetic processing unit that performs the arithmetic processing executes:
a procedure of accepting input of XML data to be searched; a procedure of sequentially scanning the XML data to generate SAX events; a procedure of causing the state of the created automaton to transition by inputting the SAX events to the automaton; and a procedure of outputting the XML data as a search result when the state of the automaton has reached all of the accepting states included in the automaton.

An XML data search device that searches XML data to be searched for XML data matching an XPath expression given as a search condition, the XML data search device comprising a memory serving as a storage area used during arithmetic processing and an arithmetic processing unit that performs the arithmetic processing, wherein the XML data stored in the memory includes a set of nodes having parent-child relationships, and the XPath expression stored in the memory includes one or more pairs each associating a context node with a plurality of predicates corresponding to the context node, the XML data search device comprising:
a SAX event generation unit that sequentially scans the XML data to be searched to generate SAX events; an automaton creation unit that creates an automaton for each context node from the XPath expression; and an XPath search unit that searches the XML data based on state transitions of the created automaton with the SAX events as input.

An XML data search device that searches XML data to be searched for XML data matching an XPath expression given as a search condition, the XML data search device comprising a memory serving as a storage area used during arithmetic processing and an arithmetic processing unit that performs the arithmetic processing, wherein the XML data stored in the memory includes a set of nodes having parent-child relationships, the XPath expression stored in the memory includes one or more pairs each associating a context node with a plurality of predicates corresponding to the context node, and the XML schema stored in the memory defines the order of appearance of the nodes included in the XML data, the XML data search device comprising:
a SAX event generation unit that sequentially scans the XML data to be searched to generate SAX events; an automaton creation unit that creates an automaton for each context node from the XPath expression and the order of appearance of the nodes defined in the XML schema; and an XPath search unit that searches the XML data based on state transitions of the created automaton with the SAX events as input.

An XML data search program for searching XML data to be searched for XML data matching an XPath expression given as a search condition, wherein the XML data includes a set of nodes having parent-child relationships, and the XPath expression includes one or more pairs each associating a context node with a plurality of predicates corresponding to the context node, the program causing a computer comprising at least a memory serving as a storage area used during arithmetic processing and an arithmetic processing unit that performs the arithmetic processing to function as:
SAX event generation means for sequentially scanning the XML data to be searched to generate SAX events; automaton creation means for creating an automaton for each context node from the XPath expression; and XPath search means for searching the XML data based on state transitions of the created automaton with the SAX events as input.

An XML data search program for searching XML data to be searched for XML data matching an XPath expression given as a search condition, wherein the XML data includes a set of nodes having parent-child relationships, the XPath expression includes one or more pairs each associating a context node with a plurality of predicates corresponding to the context node, and an XML schema defines the order of appearance of the nodes included in the XML data, the program causing a computer comprising at least a memory serving as a storage area used during arithmetic processing and an arithmetic processing unit that performs the arithmetic processing to function as:
SAX event generation means for sequentially scanning the XML data to be searched to generate SAX events; automaton creation means for creating an automaton for each context node from the XPath expression and the order of appearance of the nodes defined in the XML schema; and XPath search means for searching the XML data based on state transitions of the created automaton with the SAX events as input.

A computer-readable recording medium in which the program according to claim 6 or 7 is recorded.