Abstract
Many parameters may affect the navigation behaviour of web users. Predicting the next page that a web user is likely to visit is important, since this information can be used for prefetching or for personalizing the page for that user. One successful method for determining the next web page is to construct behaviour models of the users by clustering. The success of clustering is highly correlated with the similarity measure used to compare navigation sequences. This work proposes a new approach for determining the next web page by extending standard clustering with a content-based semantic similarity method. The semantics of web pages are represented as sets of concepts, and thus user sessions are modelled as sequences of sets. As a result, session similarity is defined as an alignment of two sequences of sets. The success of the proposed method is shown by applying it to real-life web log data.
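The idea of scoring two sessions by aligning two sequences of concept sets can be illustrated with a minimal sketch. It is not the measure defined in the article: it assumes a Needleman-Wunsch style global alignment, uses Jaccard overlap as a stand-in for the semantic similarity between two concept sets, and picks an arbitrary gap penalty. The names `jaccard` and `session_similarity` and the example concepts are hypothetical.

```python
from typing import FrozenSet, List, Sequence

# A session is a sequence of pages; each page is a set of concept labels.
Session = Sequence[FrozenSet[str]]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Set overlap in [0, 1]; an assumed stand-in for semantic similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def session_similarity(s: Session, t: Session, gap: float = -0.5) -> float:
    """Global alignment score of two sequences of sets (dynamic programming).

    The gap penalty is an assumed value, not one taken from the article.
    """
    n, m = len(s), len(t)
    score: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against an empty sequence costs one gap per page.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            align = score[i - 1][j - 1] + jaccard(s[i - 1], t[j - 1])
            skip_s = score[i - 1][j] + gap  # page of s left unmatched
            skip_t = score[i][j - 1] + gap  # page of t left unmatched
            score[i][j] = max(align, skip_s, skip_t)
    return score[n][m]


if __name__ == "__main__":
    # Hypothetical concept sets for two short sessions.
    first = [frozenset({"course", "undergraduate"}), frozenset({"course", "schedule"})]
    second = [frozenset({"course"}), frozenset({"staff"}), frozenset({"course", "schedule"})]
    print(session_similarity(first, second))
```

A pairwise score of this kind could then be fed to a similarity-based clustering tool. Replacing `jaccard` with an ontology-based measure changes only the scoring of aligned pages and leaves the alignment recurrence as it is.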
Notes
CLUTO (scluster, gcluto)—a cross-platform for clustering low- and high-dimensional datasets and for analyzing the characteristics of the various clusters, http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview
One year of access web-logs of METU C.Eng. website (http://www.ceng.metu.edu.tr)
Cite this article
Sen, E., Toroslu, I.H. & Karagoz, P. Improving the prediction of page access by using semantically enhanced clustering. J Intell Inf Syst 47, 165–192 (2016). https://doi.org/10.1007/s10844-016-0398-3
DOI: https://doi.org/10.1007/s10844-016-0398-3