DOI: 10.1145/2623330.2623360
research-article

Unveiling clusters of events for alert and incident management in large-scale enterprise IT

Published: 24 August 2014

Abstract

Large enterprise IT (Information Technology) infrastructure components generate large volumes of alerts and incident tickets. These are screened manually, but it is otherwise difficult to extract information from them automatically and gain insights that improve operational efficiency. We propose a framework that uses unsupervised machine learning to cluster alerts and incident tickets based on the text in them. This would be a step towards eliminating manual classification of alerts and incidents, which is labor-intensive and costly. Our framework can handle the semi-structured text in alerts generated by IT infrastructure components such as storage devices, network devices, and servers, as well as the unstructured text in incident tickets created manually by operations support personnel. After text pre-processing and application of appropriate distance metrics, we apply different graph-theoretic approaches to cluster the alerts and incident tickets, based on their semi-structured and unstructured text respectively. For automated interpretation and readability of semi-structured text clusters, we propose a method to visualize clusters that preserves the structure and human-readability of the text data, in contrast to traditional word clouds, where the text structure is not preserved. For unstructured text clusters, we find a simple way to define cluster prototypes for easy interpretation. This framework for clustering and visualization will enable enterprises to prioritize the issues in their IT infrastructure and improve the reliability and availability of their services.
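
As a rough illustration of the pipeline the abstract outlines (pre-process the text, compute pairwise distances, then cluster with a graph-theoretic method), the sketch below groups alert strings by connected components of a thresholded distance graph and picks one representative per cluster. It is not the paper's implementation. The digit-masking tokenizer, the Jaccard distance, the 0.5 threshold, the minimax-style prototype rule, and the sample alert strings are all assumptions chosen for illustration, as are the function names.

```python
import re
from itertools import combinations


def tokenize(text):
    """Lowercase the text and mask digit runs (assumed pre-processing step)."""
    masked = re.sub(r"\d+", "<num>", text.lower())
    return set(re.findall(r"<num>|[a-z]+", masked))


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two token sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_by_components(texts, threshold=0.5):
    """Link texts closer than `threshold`; return connected components as index lists."""
    tokens = [tokenize(t) for t in texts]
    parent = list(range(len(texts)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(texts)), 2):
        if jaccard_distance(tokens[i], tokens[j]) < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def prototype(texts, members):
    """Pick the member whose largest distance to any other member is smallest."""
    tokens = {i: tokenize(texts[i]) for i in members}
    return min(
        members,
        key=lambda i: max(jaccard_distance(tokens[i], tokens[j]) for j in members),
    )


# Hypothetical alert strings, invented for this example.
alerts = [
    "Disk usage on server42 exceeded 90 percent",
    "Disk usage on server17 exceeded 95 percent",
    "Link down on switch3 port 12",
    "Link down on switch9 port 4",
    "Disk usage on server8 exceeded 91 percent",
]
print(cluster_by_components(alerts))  # [[0, 1, 4], [2, 3]]
```

Masking digits lets alerts that differ only in host numbers or counters fall into the same cluster, which suits semi-structured, template-like text. The all-pairs comparison is quadratic in the number of texts, so a sketch like this would only suit small samples.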

Supplementary Material

MP4 File (p1630-sidebyside.mp4)

      Published In

      KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2014
      2028 pages
      ISBN: 9781450329569
      DOI: 10.1145/2623330
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. alerts and incidents management
      2. complete linkage
      3. connected components
      4. graph cut
      5. hierarchical clustering
      6. kd-tree
      7. non-negative matrix factorization
      8. tickets analysis

      Qualifiers

      • Research-article

      Conference

      KDD '14

      Acceptance Rates

      KDD '14 Paper Acceptance Rate 151 of 1,036 submissions, 15%;
      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

      Cited By

      • (2024) Leveraging Large Language Models for Efficient Alert Aggregation in AIOPs. Electronics 13:22 (4425). DOI: 10.3390/electronics13224425. Online publication date: 12-Nov-2024
      • (2024) Intelligent Monitoring Framework for Cloud Services: A Data-Driven Approach. Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (381-391). DOI: 10.1145/3639477.3639753. Online publication date: 14-Apr-2024
      • (2024) Dynamic Alert Suppression Policy for Noise Reduction in AIOps. Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (178-188). DOI: 10.1145/3639477.3639752. Online publication date: 14-Apr-2024
      • (2024) Dependency Aware Incident Linking in Large Cloud Systems. Companion Proceedings of the ACM Web Conference 2024 (141-150). DOI: 10.1145/3589335.3648311. Online publication date: 13-May-2024
      • (2023) Outage-Watch: Early Prediction of Outages using Extreme Event Regularizer. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (682-694). DOI: 10.1145/3611643.3616316. Online publication date: 30-Nov-2023
      • (2023) Dynamic Graph Neural Networks-Based Alert Link Prediction for Online Service Systems. Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (79-90). DOI: 10.1109/ASE56229.2023.00177. Online publication date: 11-Nov-2023
      • (2023) Unsupervised Online Event Ranking for IT Operations. Intelligent Data Engineering and Automated Learning – IDEAL 2023 (345-355). DOI: 10.1007/978-3-031-48232-8_32. Online publication date: 22-Nov-2023
      • (2022) Online summarizing alerts through semantic and behavior information. Proceedings of the 44th International Conference on Software Engineering (1646-1657). DOI: 10.1145/3510003.3510055. Online publication date: 21-May-2022
      • (2022) Characterizing and Mitigating Anti-patterns of Alerts in Industrial Cloud Systems. 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (393-401). DOI: 10.1109/DSN53405.2022.00047. Online publication date: Jun-2022
      • (2022) Context2Vector. Information and Software Technology 146:C. DOI: 10.1016/j.infsof.2022.106856. Online publication date: 1-Jun-2022
