DOI: 10.1145/502585.502608
Article

Merging techniques for performing data fusion on the web

Published: 05 October 2001

Abstract

Data fusion on the Web refers to merging the ranked document lists retrieved by more than one Web search engine in response to a user query into a single unified list. It is performed by metasearch engines, whose merging algorithms use the information contained in the ranked lists supplied by the underlying search engines, such as the rank positions of the retrieved documents and their retrieval scores. In this paper, we introduce merging techniques that take into account not only the rank positions but also the title and summary accompanying each retrieved document. Furthermore, we view the data fusion process as similar to the combination of belief in uncertain reasoning and model it using Dempster-Shafer's theory of evidence. Our evaluation experiments indicate that these merging techniques improve retrieval effectiveness, and that their effectiveness is comparable to that of the approach that merges the ranked lists by downloading and analysing the Web documents themselves.
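The abstract does not specify how evidence is assigned or combined, so the following is only a minimal Python sketch of what Dempster-Shafer fusion of ranked lists can look like; it is not the authors' method. It assumes that the frame of discernment is the set of all retrieved documents, that each engine's ranked list commits a hypothetical `reliability` share of belief to individual documents in proportion to 1/rank and leaves the rest on the whole frame (ignorance), and that query terms found in a document's title and summary form one more source of evidence. The sources are merged with Dempster's rule of combination, and documents are ranked by the mass on their singleton sets. The function names, the 1/rank weighting, the term-overlap score, the reliability values and the sample data are all illustrative assumptions.

```python
"""Sketch: Dempster-Shafer fusion of ranked lists (illustrative assumptions only)."""
from collections import defaultdict
from functools import reduce
from typing import Dict, FrozenSet, List

# A mass function maps subsets of the frame of discernment to belief mass.
Mass = Dict[FrozenSet[str], float]


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination for two mass functions on the same frame."""
    combined: Dict[FrozenSet[str], float] = defaultdict(float)
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b
            if common:
                combined[common] += mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # product falls on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the masses of the non-empty subsets sum to 1.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def spread(scores: Dict[str, float], frame: FrozenSet[str], reliability: float) -> Mass:
    """Share `reliability` among documents in proportion to their positive scores;
    the remaining mass is left on the whole frame (uncommitted belief)."""
    total = sum(s for s in scores.values() if s > 0)
    if total == 0:
        return {frame: 1.0}  # no usable evidence: total ignorance
    mass: Mass = {frozenset([doc]): reliability * s / total
                  for doc, s in scores.items() if s > 0}
    # Add (not overwrite) in case the frame itself is a single document.
    mass[frame] = mass.get(frame, 0.0) + (1.0 - reliability)
    return mass


def mass_from_ranking(ranking: List[str], frame: FrozenSet[str], reliability: float) -> Mass:
    """Hypothetical rank evidence: the document at position p gets weight 1/p."""
    scores = {doc: 1.0 / pos for pos, doc in enumerate(ranking, start=1)}
    return spread(scores, frame, reliability)


def mass_from_snippets(query: str, snippets: Dict[str, str],
                       frame: FrozenSet[str], reliability: float) -> Mass:
    """Hypothetical title/summary evidence: number of query terms in the snippet."""
    terms = set(query.lower().split())
    scores = {doc: float(len(terms & set(text.lower().split())))
              for doc, text in snippets.items()}
    return spread(scores, frame, reliability)


def fuse(evidence: List[Mass], frame: FrozenSet[str]) -> List[str]:
    """Combine all sources and rank documents by the mass on their singleton set."""
    combined = reduce(dempster_combine, evidence)
    score = {doc: combined.get(frozenset([doc]), 0.0) for doc in frame}
    return sorted(frame, key=lambda d: (-score[d], d))


if __name__ == "__main__":
    engine_a = ["d1", "d2", "d3"]  # hypothetical result lists
    engine_b = ["d2", "d4"]
    snippets = {  # hypothetical title + summary text per document
        "d1": "data fusion on the web",
        "d2": "web search engines",
        "d3": "cooking recipes",
        "d4": "metasearch data fusion",
    }
    frame = frozenset(engine_a + engine_b)
    evidence = [
        mass_from_ranking(engine_a, frame, reliability=0.6),
        mass_from_ranking(engine_b, frame, reliability=0.6),
        mass_from_snippets("data fusion", snippets, frame, reliability=0.5),
    ]
    print(fuse(evidence, frame))  # ['d2', 'd1', 'd4', 'd3']
```

Leaving part of each source's mass on the whole frame lets a document retrieved by only one engine (d1, d3 and d4 here) keep some belief instead of being ruled out by the other engine's list. With these made-up inputs, d2, which both engines retrieve, finishes just ahead of d1, which one engine ranks first and whose snippet matches both query terms.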

Information

Published In

CIKM '01: Proceedings of the tenth international conference on Information and knowledge management
October 2001
616 pages
ISBN:1581134363
DOI:10.1145/502585
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Dempster-Shafer's theory of evidence
  2. information retrieval
  3. web data fusion

Qualifiers

  • Article

Conference

CIKM '01

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 2
Reflects downloads up to 14 Nov 2024.

Cited By

  • (2017) Result Merging for Structured Queries on the Deep Web with Active Relevance Weight Estimation. Information Systems 64:C, pp. 93-103. DOI: 10.1016/j.is.2016.06.005. Online publication date: 1-Mar-2017
  • (2016) A Probabilistic Fusion Framework. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 1463-1472. DOI: 10.1145/2983323.2983739. Online publication date: 24-Oct-2016
  • (2016) Query Performance Prediction Using Reference Lists. ACM Transactions on Information Systems 34:4, pp. 1-34. DOI: 10.1145/2926790. Online publication date: 9-Jun-2016
  • (2015) Evaluating federated search tools: usability and retrievability framework. The Electronic Library 33:6, pp. 1079-1099. DOI: 10.1108/EL-12-2013-0211. Online publication date: 2-Nov-2015
  • (2014) Utilizing relevance feedback in fusion-based retrieval. Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, pp. 313-322. DOI: 10.1145/2600428.2609573. Online publication date: 3-Jul-2014
  • (2012) Result merging using modified Bayesian method for Meta Search Engine. 2012 World Congress on Information and Communication Technologies, pp. 892-896. DOI: 10.1109/WICT.2012.6409201. Online publication date: Oct-2012
  • (2011) Improving Domain Searches through Customized Search Engines. Intelligent, Adaptive and Reasoning Technologies, pp. 1-22. DOI: 10.4018/978-1-60960-595-7.ch001. Online publication date: 2011
  • (2011) Cluster-based fusion of retrieved lists. Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, pp. 893-902. DOI: 10.1145/2009916.2010035. Online publication date: 24-Jul-2011
  • (2010) Advanced Metasearch Engine Technology. Synthesis Lectures on Data Management 2:1, pp. 1-129. DOI: 10.2200/S00307ED1V01Y201011DTM011. Online publication date: Jan-2010
  • (2010) Ontology-Based Specific and Exhaustive User Profiles for Constraint Information Fusion for Multi-agents. Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01, pp. 264-271. DOI: 10.1109/WI-IAT.2010.76. Online publication date: 31-Aug-2010