The Internet has brought together information sources worldwide. Integrating such heterogeneous and autonomous sources is challenging because their query languages and data representations are not uniform. To help users query different sources uniformly, we have developed an integration system, or mediator, that optimally maps queries and data across disparate contexts. Such a translation technique is essential for many important applications that require querying sources and analyzing data on the web, such as meta-searching, e-commerce, and web mining. This thesis presents our solutions for the main functionalities of mediation: query translation, post-filtering, and data translation. First, the mediator must translate a user query into a form that a source can execute. We develop a general approximate query mapping mechanism that finds the closest mappings under virtually any closeness criterion, such as minimal-superset, maximal-subset, or a hybrid scheme that balances precision and recall. Furthermore, for the important special case of minimal-superset mapping (and its dual, maximal-subset mapping), we present efficient algorithms that do not rely on query normal forms. Since the translation machinery relies on separately supplied rules for rewriting basic query constraints, we also develop algorithms for rewriting the IR predicates commonly used for document retrieval. Second, because a translated query may return extra answers that do not match the original query, the mediator must perform post-filtering to remove them. We develop an algorithm for deriving the optimal filters, those that incur the least processing cost, and report experiments that quantify the worst-case costs (i.e., for superset mappings). Finally, to present the query results uniformly, the mediator must translate the native data retrieved from the external source. We adapt our general query mapping framework to data translation by modeling data as a set of conjunctive constraints. The machinery can handle flat data as well as hierarchically structured information such as XML.
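As a toy illustration of how a superset mapping and post-filtering fit together, consider a front-end proximity constraint that a source cannot evaluate. The sketch below is not the thesis's algorithm; the names `near`, `source_and`, and `mediate`, the document layout, and the assumption that the source supports only keyword conjunction are all hypothetical, chosen for illustration.

```python
from typing import Dict, List

Doc = Dict[str, str]


def title_words(doc: Doc) -> List[str]:
    """Lower-cased words of the document title."""
    return doc["title"].lower().split()


def near(doc: Doc, w1: str, w2: str, k: int = 1) -> bool:
    """Original front-end constraint: w1 and w2 occur within k words of each other."""
    words = title_words(doc)
    pos1 = [i for i, w in enumerate(words) if w == w1]
    pos2 = [i for i, w in enumerate(words) if w == w2]
    return any(abs(i - j) <= k for i in pos1 for j in pos2)


def source_and(docs: List[Doc], w1: str, w2: str) -> List[Doc]:
    """Hypothetical source that supports only a conjunction of keywords."""
    return [d for d in docs if w1 in title_words(d) and w2 in title_words(d)]


def mediate(docs: List[Doc], w1: str, w2: str, k: int = 1) -> List[Doc]:
    """Send the superset query to the source, then post-filter with the original constraint."""
    superset = source_and(docs, w1, w2)  # may contain false positives
    return [d for d in superset if near(d, w1, w2, k)]


docs = [
    {"title": "Java Beans in Action"},
    {"title": "Java for coffee beans farmers"},
    {"title": "Python Cookbook"},
]
superset = source_and(docs, "java", "beans")
answers = mediate(docs, "java", "beans")
```

Here the conjunction `java AND beans` subsumes the proximity constraint, so the source returns every true answer plus some extras; the filter re-applies the original constraint to discard the extras.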