Abstract
With increasing digital usage, web visibility has become essential for organizations seeking to reach a larger audience. This visibility is directly related to rankings in search engine results, which are often governed by search engine optimization techniques such as link building and link farming, among others. The current study identifies metrics for segregating websites for the purpose of link building for search engine optimization, since it is important to invest resources in the right website sources. These metrics are further used to detect website outliers for effective optimization and subsequent search engine marketing. Two case studies of knowledge management portals from different domains, comprising 1682 and 1070 websites respectively, are used to validate the proposed approach. The study uses evolutionary intelligence, proposing a k-means chaotic firefly algorithm coupled with k-nearest neighbor outlier detection to solve the problem. Factors such as Page Rank, Page Authority, Domain Authority, Alexa Rank, Social Shares, Google Index and Domain Age emerge as significant in the process. Further, the proposed chaotic firefly variants are compared with k-means integrated firefly, bat and cuckoo search algorithms for accuracy and convergence, showing comparable accuracy. Findings indicate that convergence speeds are higher for the proposed chaotic firefly approach, which tunes the absorption and attractiveness coefficients, resulting in a faster search for optimal cluster centroids. The proposed approach contributes both theoretically and methodologically to the domain of vendor selection by identifying genuine websites and so avoiding investment in untrustworthy websites.
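To make the pipeline named in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each website is a numeric feature vector (for example, scaled values of the metrics listed above), that the chaotic sequence is a logistic map, and that the chaotic values directly replace the attractiveness coefficient (`beta0`) and the absorption coefficient (`gamma`) at each iteration. All function names, parameter defaults and the choice of the k-th neighbor distance as the outlier score are illustrative assumptions.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sse(centroids, points):
    """k-means objective: squared distance of each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def logistic_map(x):
    """One step of the logistic map, used here as the (assumed) chaotic sequence."""
    return 4.0 * x * (1.0 - x)


def chaotic_firefly_kmeans(points, k, n_fireflies=10, n_iter=30, alpha=0.05, seed=0):
    """Search for k centroids; each firefly is one candidate set of centroids."""
    rng = random.Random(seed)
    swarm = [[list(p) for p in rng.sample(points, k)] for _ in range(n_fireflies)]
    cost = [sse(f, points) for f in swarm]
    beta0, gamma = 0.7, 0.2  # hypothetical starting values for the chaotic sequences
    for _ in range(n_iter):
        # Chaotic tuning of attractiveness (beta0) and absorption (gamma).
        beta0, gamma = logistic_map(beta0), logistic_map(gamma)
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # a lower objective means a brighter firefly
                    r2 = sum(sq_dist(ci, cj) for ci, cj in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    # Move firefly i toward firefly j, plus a small random step.
                    swarm[i] = [
                        [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                         for a, b in zip(ci, cj)]
                        for ci, cj in zip(swarm[i], swarm[j])
                    ]
                    cost[i] = sse(swarm[i], points)
    best = min(range(n_fireflies), key=lambda i: cost[i])
    return swarm[best], cost[best]


def knn_outlier_scores(points, k=3):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.sqrt(sq_dist(p, q)) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores
```

In this sketch, `chaotic_firefly_kmeans(points, k)` returns the best centroid set found and its objective value, while `knn_outlier_scores(points)` returns one score per website; points with unusually large scores would be the candidates to flag as outliers. How the two stages are actually coupled in the study is not stated in the abstract, so they are kept as separate functions here.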
Cite this article
Aswani, R., Ghrera, S.P., Chandra, S. et al. A hybrid evolutionary approach for identifying spam websites for search engine marketing. Evol. Intel. 14, 1803–1815 (2021). https://doi.org/10.1007/s12065-020-00461-1
DOI: https://doi.org/10.1007/s12065-020-00461-1