DOI: 10.1145/1451983.1451994
research-article

Web spam identification through content and hyperlinks

Published: 22 April 2008

Abstract

We present an algorithm, WITCH, that learns to detect spam hosts or pages on the Web. Unlike most other approaches, it simultaneously exploits both the structure of the Web graph and page contents and features. The method is efficient, scalable, and provides state-of-the-art accuracy on a standard Web spam benchmark.
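
As a rough illustration of combining content features with hyperlink structure, the following is a minimal sketch of a generic graph-regularized linear classifier. It is not the WITCH algorithm itself: the squared hinge loss, the symmetric penalty on score differences across links, the plain gradient descent, the toy feature vectors, and all names (`score`, `train`, `lam`, `gamma`) are assumptions made for this example.

```python
def score(w, x):
    """Linear spam score of one host: positive leans spam, negative leans non-spam."""
    return sum(wk * xk for wk, xk in zip(w, x))


def train(features, labels, edges, lam=0.01, gamma=0.1, lr=0.05, epochs=500):
    """Fit weights by gradient descent on an assumed objective:
    squared hinge loss on labeled hosts
    + lam * ||w||^2
    + gamma * sum over links (i, j) of (score_i - score_j)^2.

    features: one feature vector per host (hypothetical content features).
    labels:   dict mapping host index to +1 (spam) or -1 (non-spam).
    edges:    list of (i, j) pairs, one per hyperlink between hosts.
    """
    dim = len(features[0])
    w = [0.0] * dim
    for _ in range(epochs):
        # Gradient of the L2 penalty on the weights.
        grad = [2.0 * lam * wk for wk in w]
        # Gradient of the squared hinge loss, labeled hosts only.
        for i, y in labels.items():
            margin = 1.0 - y * score(w, features[i])
            if margin > 0.0:
                for k in range(dim):
                    grad[k] -= 2.0 * margin * y * features[i][k]
        # Gradient of the graph term: linked hosts are pulled toward similar scores.
        for i, j in edges:
            diff = score(w, features[i]) - score(w, features[j])
            for k in range(dim):
                grad[k] += 2.0 * gamma * diff * (features[i][k] - features[j][k])
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


# Toy data: hosts 0 and 1 are labeled, hosts 2 and 3 are not.
# The last feature is a constant bias term.
features = [
    [1.0, 0.0, 1.0],  # host 0, labeled spam
    [0.0, 1.0, 1.0],  # host 1, labeled non-spam
    [0.9, 0.1, 1.0],  # host 2, unlabeled, links to host 0
    [0.1, 0.9, 1.0],  # host 3, unlabeled, links to host 1
]
labels = {0: +1, 1: -1}
edges = [(2, 0), (3, 1)]

w = train(features, labels, edges)
scores = [score(w, x) for x in features]
```

In this sketch the unlabeled hosts influence training only through the link term, which is what lets graph structure and content features be used in a single optimization problem.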




      Published In

      AIRWeb '08: Proceedings of the 4th international workshop on Adversarial information retrieval on the web
      April 2008
      81 pages
      ISBN:9781605581590
      DOI:10.1145/1451983

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. graph regularization
      2. support vector machines
      3. web spam

      Qualifiers

      • Research-article

      Conference

      AIRWeb'08
