Abstract
Web usage mining is a technique for identifying user needs from web logs. Discovering hidden patterns in these logs is an emerging research area. Association rules play an important role in many web mining applications for detecting interesting patterns. However, association rule mining generates an enormous number of rules, so researchers must spend considerable time and expertise to find the truly interesting ones. This paper works on the server logs from the MSNBC dataset for the month of September 1999. The research aims to predict the page a user is likely to visit next among the web pages listed in this data, based on users' navigation behaviour, using the Apriori prefix tree (PT) algorithm. The generated rules were ranked by the support, confidence and lift evaluation measures. The final predictions revealed that the interestingness of pages depended mainly on the support and lift measures, whereas confidence took a uniform value across all pages. The results showed that the system achieved 100% confidence at a support of 1.3E−05. Pages such as Front page, On-air, News, Sports and BBS attracted more subsequent visits than Travel, MSN-News and MSN-Sports, which were of less interest.
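The three ranking measures named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's Apriori prefix tree implementation: the function names `support` and `rule_metrics` and the five toy sessions are hypothetical, and each session is treated as an unordered set of pages, which ignores the visit order that a next-page prediction would normally use.

```python
from typing import Dict, FrozenSet, Iterable, List


def support(sessions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of sessions that contain every page in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for session in sessions if items <= session)
    return hits / len(sessions)


def rule_metrics(
    sessions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Dict[str, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    sup_rule = support(sessions, a | c)  # P(A and C)
    sup_a = support(sessions, a)         # P(A)
    sup_c = support(sessions, c)         # P(C)
    confidence = sup_rule / sup_a if sup_a > 0 else 0.0  # P(C | A)
    lift = confidence / sup_c if sup_c > 0 else 0.0      # P(C | A) / P(C)
    return {"support": sup_rule, "confidence": confidence, "lift": lift}


# Hypothetical toy sessions (NOT taken from the MSNBC data).
toy_sessions = [
    frozenset({"frontpage", "news"}),
    frozenset({"frontpage", "news", "sports"}),
    frozenset({"frontpage", "sports"}),
    frozenset({"news"}),
    frozenset({"travel"}),
]

metrics = rule_metrics(toy_sessions, {"frontpage"}, {"news"})
print(
    f"support={metrics['support']:.3f} "
    f"confidence={metrics['confidence']:.3f} "
    f"lift={metrics['lift']:.3f}"
)
```

On this toy data the rule frontpage -> news has support 0.400, confidence 0.667 and lift 1.111. A lift above 1 indicates that visiting the antecedent page makes the consequent page more likely than its baseline frequency. A rule can also reach 100% confidence at very low support, because confidence is measured only over the sessions that contain the antecedent.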
Cite this article
GEETHARAMANI, R., REVATHY, P. & JACOB, S.G. Prediction of users webpage access behaviour using association rule mining. Sadhana 40, 2353–2365 (2015). https://doi.org/10.1007/s12046-015-0424-0
DOI: https://doi.org/10.1007/s12046-015-0424-0