DOI: 10.1145/3302541.3313101
research-article

Can we Predict Performance Events with Time Series Data from Monitoring Multiple Systems?

Published: 27 March 2019

Abstract

Predicting performance-related events is an important part of proactive fault management. As a result, many approaches exist for single systems. Surprisingly, despite its potential benefits, multi-system event prediction, i.e., using data from multiple, independent systems, has received less attention. We present ongoing work towards an approach for multi-system event prediction that works with limited data and can predict events for new systems, together with initial results showing its feasibility. Our preliminary evaluation is based on 20 days of continuous, preprocessed monitoring time series data of 90 independent systems. We created five multi-system machine learning models and compared their performance to that of single-system machine learning models. The results show promising prediction capabilities with accuracies and F1-scores over 90% and false-positive rates below 10%.
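
For readers unfamiliar with the three metrics named in the abstract, the following is a minimal sketch of how accuracy, F1-score, and false-positive rate are conventionally computed for binary event prediction. It is not the authors' evaluation code. The function name `binary_metrics` and the label encoding (1 = event, 0 = no event) are assumptions made for illustration only.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, F1-score, and false-positive rate.

    Hypothetical helper for illustration; labels are assumed to be
    1 (event) or 0 (no event).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0

    # Precision and recall fall back to 0.0 when their denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0

    # False-positive rate: share of true non-events flagged as events.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0

    return {"accuracy": accuracy, "f1": f1, "fpr": fpr}
```

When events are rare, accuracy alone can look high even for a model that never predicts an event, which is one reason to report F1-score and false-positive rate alongside it.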

Published In

ICPE '19: Companion of the 2019 ACM/SPEC International Conference on Performance Engineering
March 2019
99 pages
ISBN: 9781450362863
DOI: 10.1145/3302541

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. event prediction
  2. infrastructure monitoring data
  3. multivariate timeseries
  4. supervised machine learning

Qualifiers

  • Research-article

Conference

ICPE '19

Acceptance Rates

Overall Acceptance Rate 252 of 851 submissions, 30%
