Abstract
Identifying visitors on e-commerce websites and targeting them with personalized content in real time is extremely important to marketers. Although such targeting exists today, it is based on demographic attributes of the visitors. We show that dynamic visitor attributes extracted from their click-streams predict visitor intent much better. In this work, we propose a mechanism for identifying similar visitor sessions on a website based on their click-streams. Novel techniques are employed for extracting features from visitor clicks. The large margin nearest neighbour (LMNN) algorithm is used to learn a similarity metric between any two sessions. The sessions are then classified into purchasers and non-purchasers using k-nearest neighbour (kNN) classification. Experimental results on two large real-world data sets show significant improvements over baseline algorithms based on hidden Markov models (HMM), support vector machines (SVM) and random forests.
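The classification step described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a linear transform L has already been obtained; LMNN would learn such a transform from labelled sessions, whereas here it is hand-picked for illustration. The two session features (pages viewed, cart additions), the training data and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def transform(L, x):
    """Apply the linear map L (a list of rows) to the feature vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]


def metric_distance(L, a, b):
    """Distance ||L(a - b)||, the Mahalanobis-style form that LMNN learns."""
    diff = [a_i - b_i for a_i, b_i in zip(a, b)]
    return sqrt(sum(v * v for v in transform(L, diff)))


def knn_predict(L, train_X, train_y, query, k=3):
    """Majority vote among the k training sessions closest to the query."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: metric_distance(L, pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical per-session features: [pages viewed, cart additions].
train_X = [[3, 0], [12, 0], [5, 2], [9, 3], [2, 0], [7, 1]]
train_y = [
    "non-purchaser", "non-purchaser", "purchaser",
    "purchaser", "non-purchaser", "purchaser",
]

# Assumption: a hand-picked stand-in for a learned transform.
# It down-weights page counts and up-weights cart additions.
L = [[0.1, 0.0], [0.0, 2.0]]

prediction = knn_predict(L, train_X, train_y, [11, 2], k=3)
print(prediction)
```

Under this transform, sessions with similar cart activity end up close together even when their page counts differ, which is the kind of task-specific similarity a learned metric is meant to capture.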
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Pai, D., Sharang, A., Yadagiri, M.M., Agrawal, S. (2014). Modelling Visit Similarity Using Click-Stream Data: A Supervised Approach. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8786. Springer, Cham. https://doi.org/10.1007/978-3-319-11749-2_11
DOI: https://doi.org/10.1007/978-3-319-11749-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11748-5
Online ISBN: 978-3-319-11749-2
eBook Packages: Computer Science, Computer Science (R0)