Abstract
With the growing popularity of the World Wide Web, a large volume of user access data is gathered automatically by Web servers and stored in Web logs. Discovering and understanding user behavior patterns in these log files can support personalized Web recommendation services. This paper presents a novel clustering method for log files, Clustering large Weblog based on Key Path Model (CWKPM), which builds on a key path model of user browsing to derive user behavior profiles. Compared with the previous Boolean model, the key path model captures the major features of users' Web access: page visits are ordered, contiguous, and may be repeated. Moreover, it has fewer dimensions for clustering. Analysis and experiments show that CWKPM is an efficient and effective approach for clustering large, high-dimensional Web logs.
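The contrast with the Boolean model can be illustrated with a minimal sketch. This is not the CWKPM algorithm, whose key path definition is not given in this abstract; the function names `boolean_vector` and `contiguous_subpaths` and the sample sessions are hypothetical, chosen only to show what a Boolean representation discards.

```python
from typing import List, Sequence, Tuple


def boolean_vector(session: Sequence[str], pages: Sequence[str]) -> List[int]:
    """Boolean model: 1 if the page appears anywhere in the session, else 0."""
    visited = set(session)
    return [1 if page in visited else 0 for page in pages]


def contiguous_subpaths(session: Sequence[str], length: int) -> List[Tuple[str, ...]]:
    """All runs of `length` consecutive page visits, in browsing order.

    Illustrative stand-in for a path-based representation (an assumption,
    not the paper's definition of a key path).
    """
    return [tuple(session[i:i + length]) for i in range(len(session) - length + 1)]


# Hypothetical site with three pages and two browsing sessions.
pages = ["A", "B", "C"]
session_1 = ["A", "B", "C", "B"]  # revisits page B
session_2 = ["C", "B", "A"]       # same pages, different order

# The Boolean model cannot tell the two sessions apart.
print(boolean_vector(session_1, pages))  # [1, 1, 1]
print(boolean_vector(session_2, pages))  # [1, 1, 1]

# A path-based view keeps order, contiguity, and repeated visits.
print(contiguous_subpaths(session_1, 2))  # [('A', 'B'), ('B', 'C'), ('C', 'B')]
print(contiguous_subpaths(session_2, 2))  # [('C', 'B'), ('B', 'A')]
```

Both sessions map to the same Boolean vector, while their contiguous subpaths differ, which is the kind of information a path-based model can use when clustering.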
Additional information
This work is supported by the Special Program “Network based Science Activity Environment” of the National Natural Science Foundation of China, and Jiangsu Provincial Key Laboratory of Network and Information Security under Grant No.BM2003201.
Ai-Bo Song received the B.S. and M.S. degrees from the School of Information and Engineering, Shandong University of Science and Technology, in 1993 and 1996, respectively. He is currently a Ph.D. candidate in the Department of Computer Science and Engineering at Southeast University. His current research interests include data mining, data warehousing and Petri nets.
Mao-Xian Zhao is a Ph.D. candidate in the School of Transportation and Traffic at Northern Jiaotong University. His current research interests are optimization and its applications, and algorithm analysis.
Zuo-Peng Liang is a Ph.D. candidate in the Department of Computer Science and Engineering at Southeast University. His current research interests include data mining and XML data management.
Yi-Sheng Dong received the B.S. degree from the Department of Computer Science and Engineering, Southeast University, in 1965. Since then, he has been with Southeast University. His main research interests are database and software technology.
Jun-Zhou Luo is a professor in the Department of Computer Science and Engineering, Southeast University, the secretary-general of the Petri Net Committee of the China Computer Federation, and an active member of the New York Academy of Sciences. His current research interests include Petri-net-based protocol engineering, computer networks, and concurrent engineering.
Cite this article
Song, AB., Zhao, MX., Liang, ZP. et al. Discovering user profiles for Web personalized recommendation. J. Comput. Sci. & Technol. 19, 320–328 (2004). https://doi.org/10.1007/BF02944902