Query and data mapping across heterogeneous information sources
Publisher:
  • Stanford University
  • 408 Panama Mall, Suite 217
  • Stanford
  • CA
  • United States
ISBN:978-0-493-08549-4
Order Number:AAI3000016
Pages:185
Abstract

The Internet has brought together information sources worldwide. Integrating such heterogeneous and autonomous sources is challenging because of their non-uniform query languages and data representations. To help users query different sources uniformly, we have developed an integration system, or mediator, for optimally mapping queries and data across disparate contexts. Such a translation technique is essential for many important applications that require querying sources and analyzing data on the web, such as meta-searching, e-commerce, and web mining. This thesis presents our solutions for the main functionalities of mediation: query translation, post-filtering, and data translation. First, the mediator must translate a user query for a source to execute. We develop a general approximate query mapping mechanism that finds the closest mappings under virtually any closeness criterion, such as minimal-superset, maximal-subset, or some hybrid scheme that combines precision and recall. Furthermore, for the important special case of minimal-superset mapping (and its dual case, maximal-subset mapping), we present efficient algorithms that do not rely on query normal forms. Since the translation machinery relies on separately supplied rules for rewriting basic query constraints, we also develop algorithms for rewriting IR predicates commonly used for document retrieval. Second, because a translated query may return extra answers that do not match the original query, the mediator must perform post-filtering to remove them. We develop an algorithm for deriving the optimal filters that incur the least processing costs, and report our experiments to quantify the worst-case costs (i.e., for superset mappings). Finally, to present the query results uniformly, the mediator must translate the native data retrieved from the external source. We adopt our general query mapping framework for data translation by modeling data as a set of conjunctive constraints. The machinery can deal with flat data as well as hierarchically structured information such as XML.
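
To make the superset-mapping and post-filtering steps concrete, the following is a minimal Python sketch. It is not the thesis's algorithm. It assumes a purely conjunctive query and a hand-written rule table, and its post-filter simply re-applies every constraint that was not translated exactly, without any cost optimization. All names here (Constraint, RULES, translate, the sample records) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]


@dataclass(frozen=True)
class Constraint:
    """A named predicate over a record (hypothetical helper type)."""
    name: str
    test: Callable[[Record], bool]


# Original user query: last name is "Smith" AND year >= 1998.
last_name_smith = Constraint(
    "last_name_smith", lambda r: str(r["author"]).split()[-1] == "Smith")
year_ge_1998 = Constraint("year_ge_1998", lambda r: int(r["year"]) >= 1998)
user_query = [last_name_smith, year_ge_1998]

# Assumed source capability: keyword match on the author field only.
author_has_smith = Constraint(
    "author_has_smith", lambda r: "Smith" in str(r["author"]).split())

# Hypothetical rule table: original constraint name ->
# (relaxed source constraint or None if unsupported, is the mapping exact?).
# Each mapped constraint must be implied by the original one, so the
# translated conjunction can only return a superset of the exact answers.
RULES: Dict[str, Tuple[Optional[Constraint], bool]] = {
    "last_name_smith": (author_has_smith, False),
    "year_ge_1998": (None, False),
}


def translate(query: List[Constraint]) -> Tuple[List[Constraint], List[Constraint]]:
    """Split a conjunctive query into a source query and a post-filter."""
    source_query: List[Constraint] = []
    post_filter: List[Constraint] = []
    for c in query:
        mapped, exact = RULES.get(c.name, (None, False))
        if mapped is not None:
            source_query.append(mapped)
        if not exact:
            # Re-check anything the source could not evaluate exactly.
            post_filter.append(c)
    return source_query, post_filter


def evaluate(constraints: List[Constraint], records: List[Record]) -> List[Record]:
    """Return the records that satisfy every constraint."""
    return [r for r in records if all(c.test(r) for c in constraints)]


records: List[Record] = [
    {"author": "John Smith", "title": "Data mapping", "year": 2000},
    {"author": "Smith Jones", "title": "Data integration", "year": 1999},
    {"author": "Ann Smith", "title": "Data models", "year": 1990},
    {"author": "Bob Lee", "title": "Data mining", "year": 2001},
]

source_query, post_filter = translate(user_query)
source_results = evaluate(source_query, records)   # superset from the source
final_results = evaluate(post_filter, source_results)
exact_results = evaluate(user_query, records)      # ground truth for comparison
```

In this sketch the source returns three records, two of which are extra answers, and the post-filter reduces them to the single record that satisfies the original query. The extra work the mediator spends discarding such answers is the kind of processing cost a post-filter aims to keep small.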

Contributors
  • Stanford University
  • University of Illinois Urbana-Champaign