Abstract
Recent years have seen impressive growth and diffusion of applications shared and used by huge numbers of users around the world, such as social networks, web portals, and e-learning platforms. Such systems generally produce large amounts of data, normally stored in raw format in log file systems and databases. To prevent unmanageable growth of the required storage space and the resulting breakdown of data usability, such data can be condensed and summarized to improve reporting performance and reduce system load. Summarization reduces the space required to store the data but, as a side effect, decreases its informative capability because information is lost. This work studies the problem of summarizing data obtained from the log systems of applications with many users. In particular, it proposes a model that represents the raw data as temporal events collected in time sequences, provides methods that reduce the data size by collapsing the descriptions of several events into a single descriptor or a smaller set of descriptors, and poses the optimal summarization problem.
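The idea of collapsing several event descriptions into one descriptor can be illustrated with a minimal sketch. It is not the model proposed in the paper: the `Event` and `Descriptor` types, the fixed-width time window, and the choice to keep only per-action counts are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """One raw log record (hypothetical fields)."""
    timestamp: int  # seconds since an arbitrary epoch (assumed unit)
    user: str
    action: str


@dataclass(frozen=True)
class Descriptor:
    """Summary of all events falling in the window [start, end)."""
    start: int
    end: int
    action_counts: Tuple[Tuple[str, int], ...]  # sorted (action, count) pairs


def summarize(events: List[Event], window: int) -> List[Descriptor]:
    """Collapse events into one descriptor per fixed-width time window.

    Only the number of occurrences of each action is kept; the user and
    the exact timestamp of each event are discarded (the information loss).
    """
    buckets: Dict[int, Counter] = {}
    for e in events:
        # Align the event to the start of its window.
        start = (e.timestamp // window) * window
        buckets.setdefault(start, Counter())[e.action] += 1
    return [
        Descriptor(start, start + window, tuple(sorted(counts.items())))
        for start, counts in sorted(buckets.items())
    ]


if __name__ == "__main__":
    log = [
        Event(0, "alice", "login"),
        Event(5, "bob", "login"),
        Event(12, "alice", "view"),
        Event(61, "alice", "logout"),
    ]
    for d in summarize(log, window=60):
        print(d)
```

In this sketch four raw events become two descriptors. A wider window yields fewer descriptors and a larger loss of detail, which is the trade-off behind the optimal summarization problem mentioned above.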
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gentili, E., Milani, A., Poggioni, V. (2012). Data Summarization Model for User Action Log Files. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2012. ICCSA 2012. Lecture Notes in Computer Science, vol 7335. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31137-6_41
DOI: https://doi.org/10.1007/978-3-642-31137-6_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31136-9
Online ISBN: 978-3-642-31137-6
eBook Packages: Computer Science (R0)