DOI: 10.1109/ICSE-SEIP52600.2021.00031
research-article

Neural knowledge extraction from cloud service incidents

Published: 17 December 2021

Abstract

The move from boxed products to services and the widespread adoption of cloud computing have had a huge impact on the software development life cycle and DevOps processes. In particular, incident management has become critical for developing and operating large-scale services. Prior work on incident management has focused heavily on the challenges of incident triaging and de-duplication. In this work, we address the fundamental problem of structured knowledge extraction from service incidents. We have built SoftNER, a framework for unsupervised knowledge extraction from service incidents. We frame the knowledge extraction problem as a Named-Entity Recognition task for extracting factual information. SoftNER leverages structural patterns like key-value pairs and tables for bootstrapping the training data. Further, we build a novel multi-task learning based BiLSTM-CRF model which leverages not just the semantic context but also the data types for named-entity extraction. We have deployed SoftNER at Microsoft, a major cloud service provider, and have evaluated it on more than 2 months of cloud incidents. We show that the unsupervised machine learning pipeline has a high precision of 0.96. Our multi-task learning based deep learning model also outperforms the state-of-the-art NER models. Lastly, using the knowledge extracted by SoftNER, we are able to build significantly more accurate models for important downstream tasks like incident triaging.
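
The bootstrapping idea mentioned in the abstract (using key-value pairs to seed training data) can be illustrated with a minimal sketch. This is not SoftNER's implementation: the regular expression, the function names `extract_key_value_pairs` and `weak_label`, the BIO tagging scheme, and the sample incident text are all assumptions made for illustration only.

```python
import re

# Hypothetical pattern: a line shaped like "Key: value" or "Key = value".
KEY_VALUE = re.compile(r"^\s*([A-Za-z][A-Za-z0-9 _]{0,40}?)\s*[:=]\s*(\S.*?)\s*$")


def extract_key_value_pairs(text):
    """Return (entity_type, value) pairs found on key-value lines."""
    pairs = []
    for line in text.splitlines():
        match = KEY_VALUE.match(line)
        if match:
            key, value = match.groups()
            # Normalize the key so "VM Name" and "vm name" map to one type.
            entity_type = "_".join(key.lower().split())
            pairs.append((entity_type, value))
    return pairs


def weak_label(tokens, pairs):
    """Assign BIO tags wherever a bootstrapped value reappears in the tokens."""
    labels = ["O"] * len(tokens)
    for entity_type, value in pairs:
        value_tokens = value.split()
        n = len(value_tokens)
        for start in range(len(tokens) - n + 1):
            window = tokens[start:start + n]
            already_tagged = any(tag != "O" for tag in labels[start:start + n])
            if window == value_tokens and not already_tagged:
                labels[start] = "B-" + entity_type
                for i in range(start + 1, start + n):
                    labels[i] = "I-" + entity_type
    return labels


# Made-up incident text: two key-value lines followed by free-form prose.
incident = """VM Name: web-frontend-01
Error Code = 0x80070005
Customer reports that web-frontend-01 fails with 0x80070005 after restart."""

pairs = extract_key_value_pairs(incident)
tokens = incident.splitlines()[2].split()
labels = weak_label(tokens, pairs)

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

In this sketch the structured lines supply entity types and values without manual annotation, and the same values are then tagged where they recur in unstructured text. Labels of this kind could serve as weak supervision for a sequence tagger; the model described in the abstract is not reproduced here.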


    Published In

    ICSE-SEIP '21: Proceedings of the 43rd International Conference on Software Engineering: Software Engineering in Practice
    May 2021
    405 pages
    ISBN:9780738146690

    Sponsors

    In-Cooperation

    • IEEE CS

    Publisher

    IEEE Press

    Qualifiers

    • Research-article

    Conference

    ICSE '21

    Cited By

    • (2024) LLexus: an AI agent system for incident management. ACM SIGOPS Operating Systems Review 58:1 (23-36). DOI: 10.1145/3689051.3689056. Online publication date: 14-Aug-2024
    • (2024) Automated Code-centric Software Vulnerability Assessment: How Far Are We? An Empirical Study in C/C++. Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (72-83). DOI: 10.1145/3674805.3686670. Online publication date: 24-Oct-2024
    • (2024) AgraBOT: Accelerating Third-Party Security Risk Management in Enterprise Setting through Generative AI. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (74-79). DOI: 10.1145/3663529.3663829. Online publication date: 10-Jul-2024
    • (2024) MTL-TRANSFER: Leveraging Multi-task Learning and Transferred Knowledge for Improving Fault Localization and Program Repair. ACM Transactions on Software Engineering and Methodology 33:6 (1-31). DOI: 10.1145/3654441. Online publication date: 27-Jun-2024
    • (2024) Xpert: Empowering Incident Management with Query Recommendations via Large Language Models. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-13). DOI: 10.1145/3597503.3639081. Online publication date: 20-May-2024
    • (2024) Dependency Aware Incident Linking in Large Cloud Systems. Companion Proceedings of the ACM Web Conference 2024 (141-150). DOI: 10.1145/3589335.3648311. Online publication date: 13-May-2024
    • (2023) Detection Is Better Than Cure: A Cloud Incidents Perspective. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (1891-1902). DOI: 10.1145/3611643.3613898. Online publication date: 30-Nov-2023
    • (2022) How to fight production incidents? Proceedings of the 13th Symposium on Cloud Computing (126-141). DOI: 10.1145/3542929.3563482. Online publication date: 7-Nov-2022
    • (2022) DeepAnalyze. Proceedings of the 44th International Conference on Software Engineering (549-560). DOI: 10.1145/3510003.3512759. Online publication date: 21-May-2022
