Abstract
To understand the behavior and purposes of users' Internet usage within an organization, their Web usage history needs to be analyzed. These data are stored as huge log files, which are often kept separately in various locations; this makes the data difficult to manage and use. This research aims to examine and develop an analysis tool for log files using Hadoop and Hive. The development was divided into two parts. First, data from the Web history were gathered using PHP and SQLite in order to classify them into website categories, especially Google, YouTube and Facebook. The obtained data were then used to analyze the categories of accessed websites. The findings were recorded in Hive using an enhanced algorithm that analyzes these categories. The algorithm was also designed to analyze words and phrases used in Google searches. Second, the behavior and purposes of accessing websites during class were analyzed. The results can be displayed in real time as percentages and as the frequency of website accesses.
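The abstract does not give the classification rules or the Hive algorithm itself. As a rough illustration only, the following sketch shows the kind of preprocessing it describes: reading URLs from a browser history database in SQLite, assigning each URL to Google, YouTube, Facebook or Other, extracting Google search phrases, and reporting the visit frequency and percentage per category. It is written in Python rather than the authors' PHP, it assumes Chrome's History schema (a `urls` table with `url` and `visit_count` columns), and the domain rules, the helper names (`classify`, `google_query`, `summarize`, `demo_connection`) and the sample rows are all hypothetical.

```python
import sqlite3
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical domain rules: the paper names Google, YouTube and Facebook
# as categories of interest but does not publish its matching rules.
CATEGORY_DOMAINS = {
    "Google": ("google.com",),
    "YouTube": ("youtube.com", "youtu.be"),
    "Facebook": ("facebook.com",),
}


def classify(url):
    """Return the category of a URL based on its host name, or 'Other'."""
    host = (urlparse(url).hostname or "").lower()
    for category, domains in CATEGORY_DOMAINS.items():
        # Match the domain itself or any subdomain (e.g. www.google.com).
        if any(host == d or host.endswith("." + d) for d in domains):
            return category
    return "Other"


def google_query(url):
    """Return the decoded 'q' parameter of a Google /search URL, else None."""
    parsed = urlparse(url)
    if classify(url) != "Google" or parsed.path != "/search":
        return None
    values = parse_qs(parsed.query).get("q")  # parse_qs turns '+' into spaces
    return values[0] if values else None


def summarize(conn):
    """Count visits per category, convert them to percentages, collect searches.

    Assumes a Chrome-style History schema: table 'urls' with columns
    'url' and 'visit_count'. Other browsers store history differently.
    """
    frequencies = Counter()
    queries = Counter()
    for url, visits in conn.execute("SELECT url, visit_count FROM urls"):
        frequencies[classify(url)] += visits
        phrase = google_query(url)
        if phrase:
            queries[phrase.lower()] += visits
    total = sum(frequencies.values()) or 1  # avoid division by zero
    percentages = {c: round(100 * n / total, 1) for c, n in frequencies.items()}
    return percentages, frequencies, queries


def demo_connection():
    """Build an in-memory database with made-up rows (not data from the paper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE urls (url TEXT, visit_count INTEGER)")
    conn.executemany(
        "INSERT INTO urls VALUES (?, ?)",
        [
            ("https://www.google.com/search?q=hadoop+hive", 3),
            ("https://www.youtube.com/watch?v=abc123", 2),
            ("https://www.facebook.com/", 4),
            ("https://example.org/lecture-notes", 1),
        ],
    )
    return conn


if __name__ == "__main__":
    pct, freq, qs = summarize(demo_connection())
    print(pct)   # share of visits per category, in percent
    print(freq)  # raw visit frequency per category
    print(qs)    # Google search phrases weighted by visit count
```

On the made-up rows, the visits split 30/20/40/10 percent across Google, YouTube, Facebook and Other. In a Hadoop deployment, the per-URL records would presumably be loaded into Hive and aggregated there instead; this single-machine version only illustrates the classification and counting logic.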
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Namahoot, C.S., Brückner, M., Lekkam, W. (2020). System for Analysing Big Weblog Data. In: Kim, K., Kim, HY. (eds) Information Science and Applications. Lecture Notes in Electrical Engineering, vol 621. Springer, Singapore. https://doi.org/10.1007/978-981-15-1465-4_53
DOI: https://doi.org/10.1007/978-981-15-1465-4_53
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1464-7
Online ISBN: 978-981-15-1465-4
eBook Packages: Engineering, Engineering (R0)