DOI: 10.1145/1451983.1451985
research-article

A large-scale study of automated web search traffic

Published: 22 April 2008

Abstract

As web search providers seek to improve both relevance and response times, they are challenged by the ever-increasing tax of automated search query traffic. Third-party systems interact with search engines for a variety of reasons, such as monitoring a website's rank, augmenting online games, or possibly maliciously altering click-through rates. In this paper, we investigate automated traffic in the query stream of a large search engine provider. We define automated traffic as any search query not generated by a human in real time. We first provide examples of different categories of query logs generated by bots. We then develop many different features that distinguish between queries generated by people searching for information and those generated by automated processes. We categorize these features into two classes: those that interpret the physical model of human interactions, and those that capture behavioral patterns of automated interactions. We believe these features form a basis for a production-level query stream classifier.
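The two feature classes named in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical query log of `QueryEvent` records (user ID, timestamp, click flag) and computes one feature from each class: a physical-model feature (the peak number of queries a user issues within any 10-second window, which a person typing by hand can only push so high) and a behavioral feature (click-through rate, since a bot may query without ever clicking). The record fields, the window size, and the thresholds in `looks_automated` are illustrative assumptions, not the features or values reported in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QueryEvent:
    """One row of a hypothetical query log (field names are illustrative)."""
    user_id: str
    timestamp: float  # seconds since epoch
    clicked: bool     # whether any result was clicked for this query


def max_queries_in_window(times: List[float], window: float = 10.0) -> int:
    """Physical-model feature: the largest number of queries issued within
    any `window`-second span, found with a two-pointer sweep."""
    times = sorted(times)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Advance the left edge until the span covers at most `window` seconds.
        while t - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def click_through_rate(events: List[QueryEvent]) -> float:
    """Behavioral feature: fraction of queries followed by a click."""
    if not events:
        return 0.0
    return sum(e.clicked for e in events) / len(events)


def user_features(log: List[QueryEvent], window: float = 10.0) -> Dict[str, Dict[str, float]]:
    """Group the log by user and compute both features for each user."""
    by_user: Dict[str, List[QueryEvent]] = {}
    for e in log:
        by_user.setdefault(e.user_id, []).append(e)
    return {
        uid: {
            "peak_rate": max_queries_in_window([e.timestamp for e in evs], window),
            "ctr": click_through_rate(evs),
            "num_queries": len(evs),
        }
        for uid, evs in by_user.items()
    }


def looks_automated(f: Dict[str, float], max_peak: int = 7, min_ctr: float = 0.01) -> bool:
    """Toy rule with made-up thresholds: flag users who exceed a plausible
    human query rate, or who issue many queries but almost never click."""
    return f["peak_rate"] > max_peak or (f["num_queries"] >= 50 and f["ctr"] < min_ctr)
```

For example, a user issuing 100 queries at one-second intervals with no clicks reaches a `peak_rate` of 11 and is flagged, while a user with three widely spaced queries and two clicks is not. In practice, features like these would more plausibly feed a trained classifier than a pair of hand-set thresholds.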



Information

Published In

ACM Other conferences
AIRWeb '08: Proceedings of the 4th international workshop on Adversarial information retrieval on the web
April 2008
81 pages
ISBN:9781605581590
DOI:10.1145/1451983
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. bot
  2. query
  3. search


Conference

AIRWeb'08


Bibliometrics

Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 1
Reflects downloads up to 29 Nov 2024


Citations

Cited By

  • (2021) A hybrid approach for identifying non-human traffic in online digital advertising. Multimedia Tools and Applications. DOI: 10.1007/s11042-021-11533-4. Online publication date: 11-Oct-2021.
  • (2019) Clicktok. Proceedings of the 12th Conference on Security and Privacy in Wireless and Mobile Networks, pp. 105-116. DOI: 10.1145/3317549.3323407. Online publication date: 15-May-2019.
  • (2018) Spam query detection using stream clustering. World Wide Web 21:2, pp. 557-572. DOI: 10.1007/s11280-017-0471-z. Online publication date: 1-Mar-2018.
  • (2018) Detecting and Characterizing Web Bot Traffic in a Large E-commerce Marketplace. Computer Security, pp. 143-163. DOI: 10.1007/978-3-319-98989-1_8. Online publication date: 7-Aug-2018.
  • (2018) Identifying In-App User Actions from Mobile Web Logs. Advances in Knowledge Discovery and Data Mining, pp. 300-311. DOI: 10.1007/978-3-319-93037-4_24. Online publication date: 20-Jun-2018.
  • (2015) Query or spam: Detecting fraudulent web requests using stream clustering. 2015 2nd International Conference on Knowledge-Based Engineering and Innovation (KBEI), pp. 853-859. DOI: 10.1109/KBEI.2015.7436155. Online publication date: Nov-2015.
  • (2015) Detecting Marionette Microblog Users for Improved Information Credibility. Journal of Computer Science and Technology 30:5, pp. 1082-1096. DOI: 10.1007/s11390-015-1584-4. Online publication date: 14-Sep-2015.
  • (2014) Click Fraud Detection: Adversarial Pattern Recognition over 5 Years at Microsoft. Real World Data Mining Applications, pp. 181-201. DOI: 10.1007/978-3-319-07812-0_10. Online publication date: 14-Nov-2014.
  • (2013) Click Fraud Detection with Bot Signatures. 2013 IEEE International Conference on Intelligence and Security Informatics, pp. 146-150. DOI: 10.1109/ISI.2013.6578805. Online publication date: Jun-2013.
  • (2013) The Making of a Large-Scale Online Ad Server: Practical Lessons Building One of the World's Largest Online Ad Servers. 2013 IEEE 13th International Conference on Data Mining Workshops, pp. 172-179. DOI: 10.1109/ICDMW.2013.159. Online publication date: Dec-2013.