DOI: 10.1145/1242572.1242612

Spam double-funnel: connecting web spammers with advertisers

Published: 08 May 2007

Abstract

Spammers use questionable search engine optimization (SEO) techniques to promote their spam links into top search results. In this paper, we focus on one prevalent type of spam, redirection spam, in which spam pages can be identified by the third-party domains to which they redirect traffic. We propose a five-layer, double-funnel model for describing end-to-end redirection spam, present a methodology for analyzing the layers, and identify prominent domains on each layer using two sets of commercial keywords, one targeting spammers and the other targeting advertisers. The methodology and findings are useful for search engines to strengthen their ranking algorithms against spam, for legitimate website owners to locate and remove spam doorway pages, and for legitimate advertisers to identify unscrupulous syndicators who serve ads on spam pages.
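
As a rough illustration of the idea that redirection spam can be traced through the third-party domains that pages send traffic to, the following sketch counts how many distinct pages redirect to each third-party host. It is not the paper's methodology: the function names, the input format (a mapping from page URL to the URLs observed while loading it), and the use of plain hostnames instead of registered domains are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def host(url: str) -> str:
    """Return the lowercased hostname of a URL ('' if it has none)."""
    return (urlsplit(url).hostname or "").lower()


def third_party_redirect_hosts(page_url, redirect_urls):
    """Hosts in the observed redirect trail that differ from the page's own host.

    Assumption: comparing full hostnames is a simplification; a real analysis
    would likely compare registered domains instead.
    """
    own = host(page_url)
    seen = []
    for url in redirect_urls:
        h = host(url)
        if h and h != own and h not in seen:
            seen.append(h)
    return seen


def rank_redirect_hosts(observations):
    """Count, per third-party host, how many distinct pages redirect to it."""
    counts = Counter()
    for page_url, redirect_urls in observations.items():
        counts.update(third_party_redirect_hosts(page_url, redirect_urls))
    return counts.most_common()
```

Hosts that receive redirected traffic from many otherwise unrelated pages would be candidates for closer inspection; the count alone does not establish that a host is involved in spam.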


        Published In

        WWW '07: Proceedings of the 16th international conference on World Wide Web
        May 2007
        1382 pages
        ISBN:9781595936547
        DOI:10.1145/1242572

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. advertisement syndication
        2. redirection and cloaking
        3. search spam
        4. web spam

        Qualifiers

        • Article

        Conference

        WWW'07: 16th International World Wide Web Conference
        May 8 - 12, 2007
        Banff, Alberta, Canada

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
