Learning to Distribute Queries into Web Search Nodes

Marcelo Mendoza, Mauricio Marín, Flavio Ferrarotti & Barbara Poblete

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5993)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Web search engines are composed of a large set of search nodes and a broker machine that feeds them with queries. A location cache keeps minimal information in the broker to record which search nodes are capable of producing the top-N results for frequent queries. In this paper we show that the location cache can serve as a training dataset for a standard machine learning algorithm, yielding a predictive model of the search nodes expected to produce the best approximate results for a query. The model can keep the broker from sending queries to all search nodes during sudden peaks in query traffic and, as a result, avoid search node saturation. We propose a logistic regression model to quickly predict the most pertinent search nodes for a given query.
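
As an illustration of the idea in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the location cache can be read as (query, set of node ids) pairs, represents each query as a bag of terms, and fits one binary logistic regression per search node with plain stochastic gradient ascent. The names `train_node_models` and `predict_nodes`, the toy cache contents, and the hyperparameters are all hypothetical; the abstract does not describe the features or the training procedure actually used.

```python
import math
from collections import defaultdict


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _terms(query):
    # Assumed feature representation: binary bag of terms plus a bias feature.
    return set(query.lower().split()) | {"__bias__"}


def train_node_models(location_cache, num_nodes, epochs=200, lr=0.5):
    """Fit one binary logistic regression per search node.

    location_cache: list of (query, set_of_node_ids) pairs, i.e. the nodes
    recorded as producing top-N results for that query.
    Returns a list of weight dicts (term -> weight), one per node.
    """
    models = [defaultdict(float) for _ in range(num_nodes)]
    for _ in range(epochs):
        for query, nodes in location_cache:
            terms = _terms(query)
            for node_id, w in enumerate(models):
                y = 1.0 if node_id in nodes else 0.0
                p = _sigmoid(sum(w[t] for t in terms))
                # Gradient step on the log-likelihood for binary features.
                for t in terms:
                    w[t] += lr * (y - p)
    return models


def predict_nodes(models, query, k=2):
    """Return the ids of the k nodes with the highest predicted probability."""
    terms = _terms(query)
    scored = [
        (_sigmoid(sum(w.get(t, 0.0) for t in terms)), node_id)
        for node_id, w in enumerate(models)
    ]
    scored.sort(reverse=True)
    return [node_id for _, node_id in scored[:k]]


# Toy location cache (made-up data): query -> nodes holding its top-N results.
cache = [
    ("cheap flights santiago", {0, 1}),
    ("flights to london", {0}),
    ("python logistic regression", {2, 3}),
    ("regression tutorial", {2}),
]
models = train_node_models(cache, num_nodes=4)

# Under load, the broker would send an unseen query only to the predicted nodes.
print(predict_nodes(models, "cheap flights", k=2))
```

In this sketch the broker trades result quality for load: a smaller `k` contacts fewer nodes per query, at the risk of missing nodes that hold some of the true top-N results.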

Author information

Authors and Affiliations

Yahoo! Research Latin America, Av. Blanco Encalada 2120, 4th floor, Santiago, Chile
Marcelo Mendoza, Mauricio Marín, Flavio Ferrarotti & Barbara Poblete

Editor information

Editors and Affiliations

Adaptive Information Cluster, Dublin City University, Dublin, 9, Ireland
Cathal Gurrin
The Open University, Walton Hall, MK7 6HF, Milton Keynes, UK
Yulan He
Microsoft Research Ltd, 7 JJ Thomson Avenue, CB3 0FB, Cambridge, UK
Gabriella Kazai
Department of Computer Science, University of Essex, Wivenhoe Park, CO4 3SQ, Colchester, UK
Udo Kruschwitz
The Open University, Walton Hall, Milton Keynes, UK
Suzanne Little
University of London, London, UK
Thomas Roelleke
Knowledge Media Institute, The Open University, MK7 6AA, Milton Keynes, UK
Stefan Rüger
Department of Computing Science, University of Glasgow, 17 Lilybank Gardens, G12 8QQ, Glasgow, UK
Keith van Rijsbergen

About this paper

Cite this paper

Mendoza, M., Marín, M., Ferrarotti, F., Poblete, B. (2010). Learning to Distribute Queries into Web Search Nodes. In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_26

DOI: https://doi.org/10.1007/978-3-642-12275-0_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12274-3
Online ISBN: 978-3-642-12275-0
eBook Packages: Computer Science, Computer Science (R0)