A hybrid evolutionary approach for identifying spam websites for search engine marketing

  • Research Paper
  • Evolutionary Intelligence

Abstract

With increased digital usage, web visibility has become critical for organizations catering to a larger audience. This visibility is directly related to web searches on search engines and is often governed by search engine optimization techniques such as link building and link farming. The current study identifies metrics for segregating websites for link building in search engine optimization, since it is important to invest resources in the right website sources. These metrics are then used to detect website outliers for effective optimization and subsequent search engine marketing. Two case studies of knowledge management portals from different domains, comprising 1682 and 1070 websites respectively, are used to validate the proposed approach. The study applies evolutionary intelligence by proposing a k-means chaotic firefly algorithm coupled with k-nearest neighbor outlier detection to solve the problem. Factors such as Page Rank, Page Authority, Domain Authority, Alexa Rank, Social Shares, Google Index and Domain Age emerge as significant in the process. Further, the proposed chaotic firefly variants are compared with the k-means integrated firefly, bat and cuckoo search algorithms on accuracy and convergence, and show comparable accuracy. Findings indicate that convergence is faster for the proposed chaotic firefly approach, which tunes the absorption and attractiveness coefficients, resulting in a faster search for optimal cluster centroids. The proposed approach contributes both theoretically and methodologically to the domain of vendor selection by identifying genuine websites and thereby avoiding investment in untrustworthy websites.
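The abstract names two building blocks, a k-means style centroid search driven by a chaotic firefly algorithm and k-nearest neighbor outlier detection, without giving their details. The sketch below is one possible reading and not the authors' implementation. It assumes the logistic map as the chaotic sequence, applies it only to the absorption coefficient `gamma`, and uses the distance to the k-th nearest neighbor as the outlier score. All function names and parameter values (`chaotic_firefly_kmeans`, `knn_outlier_scores`, `beta0`, `alpha`, the swarm size) are hypothetical, and how the paper couples the two stages is not stated here, so they are shown as independent pieces.

```python
import math
import random


def logistic_map(x):
    """One step of the logistic map x -> 4x(1 - x) (assumed chaotic sequence)."""
    return 4.0 * x * (1.0 - x)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def clustering_cost(points, centroids):
    """k-means objective: sum of squared distances to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def chaotic_firefly_kmeans(points, k, n_fireflies=10, n_iter=50,
                           beta0=1.0, alpha=0.1, seed=0):
    """Search for k centroids; each firefly encodes one candidate centroid set."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Initialise every firefly from randomly chosen data points.
    swarm = [[list(rng.choice(points)) for _ in range(k)]
             for _ in range(n_fireflies)]
    costs = [clustering_cost(points, f) for f in swarm]
    gamma = 0.7  # hypothetical starting value of the chaotic absorption coefficient
    for _ in range(n_iter):
        gamma = logistic_map(gamma)  # chaotic update instead of a fixed gamma
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:  # j is "brighter", so i moves toward j
                    r2 = sum(sq_dist(a, b) for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    for c in range(k):
                        for d in range(dim):
                            noise = alpha * (rng.random() - 0.5)
                            pull = beta * (swarm[j][c][d] - swarm[i][c][d])
                            swarm[i][c][d] += pull + noise
                    costs[i] = clustering_cost(points, swarm[i])
    best = min(range(n_fireflies), key=lambda idx: costs[idx])
    return swarm[best], costs[best]


def knn_outlier_scores(points, k_neighbors=3):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.sqrt(sq_dist(p, q))
                       for j, q in enumerate(points) if j != i)
        scores.append(dists[k_neighbors - 1])
    return scores


# Toy data: two tight groups plus one isolated point (not from the paper).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
centroids, cost = chaotic_firefly_kmeans(points[:8], k=2)
scores = knn_outlier_scores(points, k_neighbors=3)
most_isolated = max(range(len(points)), key=lambda idx: scores[idx])
```

In practice the website metrics listed above sit on very different scales, so they would likely need normalizing before any distance is computed. That step is omitted here.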

Author information

Corresponding author

Correspondence to Reema Aswani.

About this article

Cite this article

Aswani, R., Ghrera, S.P., Chandra, S. et al. A hybrid evolutionary approach for identifying spam websites for search engine marketing. Evol. Intel. 14, 1803–1815 (2021). https://doi.org/10.1007/s12065-020-00461-1

  • DOI: https://doi.org/10.1007/s12065-020-00461-1
