Grey Fault Detection Method Based on Context Knowledge Graph in Container Cloud Storage

Birui Liang¹², Ningjiang Chen¹²,¹³, Yongsheng Xie¹², Ruifeng Wang¹² & Yuhua Chen¹²

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1042)

Included in the following conference series:

CCF Conference on Computer Supported Cooperative Work and Social Computing

Abstract

Resource scheduling in container cloud storage clusters is mainly concerned with how to schedule resources as the load changes and how to migrate according to resource conditions. These activities frequently change the context and the application's operating environment, which makes faults hard to locate, especially grey faults that affect the applications running in the containers. To keep applications running normally, a grey fault detection method is proposed. By studying how context changes alter the application's attention features, the method builds a knowledge graph of the relationships between context changes and grey faults. It introduces a temporal and spatial snapshot group architecture to handle the large number of situational temporal queries that arise when the knowledge graph grows too large. The method is validated on a container cluster project and the Google open source dataset; it effectively detects grey fault scenarios, and the accuracy rate is improved by more than 90%.
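The abstract names a temporal snapshot group but does not describe its layout. As a rough illustration only, and not the authors' implementation, the sketch below assumes one common design for temporal graphs: keep a full snapshot of the graph at intervals, log the edge changes after each snapshot, and answer a point-in-time query by loading the nearest earlier snapshot and replaying its changes. The class name `SnapshotGroup`, the `interval` policy, and the edge labels are all hypothetical.

```python
from bisect import bisect_right

# An edge links a context change to a grey fault; the labels are made up.
Edge = tuple[str, str, str]


class SnapshotGroup:
    """Full snapshots taken at intervals, plus the changes logged after each."""

    def __init__(self, interval=3):
        self.interval = interval   # assumed policy: snapshot every N changes
        self.current = set()       # latest state of the graph's edge set
        self.snapshot_times = []   # time of each snapshot, non-decreasing
        self.snapshots = []        # frozen edge set at each snapshot
        self.deltas = []           # changes recorded after each snapshot
        self._checkpoint(0)

    def _checkpoint(self, t):
        self.snapshot_times.append(t)
        self.snapshots.append(frozenset(self.current))
        self.deltas.append([])

    def record(self, t, op, edge):
        """Apply 'add' or 'remove' at time t (times must not decrease)."""
        if op == "add":
            self.current.add(edge)
        elif op == "remove":
            self.current.discard(edge)
        else:
            raise ValueError(f"unknown operation: {op}")
        self.deltas[-1].append((t, op, edge))
        if len(self.deltas[-1]) >= self.interval:
            self._checkpoint(t)

    def edges_at(self, t):
        """Return the edge set as it was at time t."""
        i = bisect_right(self.snapshot_times, t) - 1
        if i < 0:
            return set()           # t is earlier than the first snapshot
        state = set(self.snapshots[i])
        for ts, op, edge in self.deltas[i]:
            if ts > t:
                break
            if op == "add":
                state.add(edge)
            else:
                state.discard(edge)
        return state


group = SnapshotGroup(interval=2)
migrated = ("pod-migrated", "precedes", "io-latency-degraded")
throttled = ("cpu-quota-lowered", "precedes", "request-timeouts")
group.record(1, "add", migrated)
group.record(2, "add", throttled)
group.record(5, "remove", migrated)
print(sorted(group.edges_at(4)))
```

Under this assumed design, a query replays at most `interval` changes instead of the whole history, which is the kind of saving a snapshot group is meant to give when the graph's history grows large.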

Supported by the Natural Science Foundation of China (No. 61762008), the Guangxi Natural Science Foundation Project (No. 2017GXNSFAA198141), and the Key R&D Project of Guangxi (No. GuiKE AB17195014).

Author information

Authors and Affiliations

School of Computer and Electronic Information, Guangxi University, Nanning, 530004, China
Birui Liang, Ningjiang Chen, Yongsheng Xie, Ruifeng Wang & Yuhua Chen
Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning, 530004, China
Ningjiang Chen

Corresponding author

Correspondence to Ningjiang Chen.

Editor information

Editors and Affiliations

Shandong University, Jinan, China
Yuqing Sun
Fudan University, Shanghai, China
Tun Lu
Kunming University of Science and Technology, Kunming, China
Zhengtao Yu
Tongji University, Shanghai, China
Hongfei Fan
University of Shanghai for Science and Technology, Shanghai, China
Liping Gao

About this paper

Cite this paper

Liang, B., Chen, N., Xie, Y., Wang, R., Chen, Y. (2019). Grey Fault Detection Method Based on Context Knowledge Graph in Container Cloud Storage. In: Sun, Y., Lu, T., Yu, Z., Fan, H., Gao, L. (eds) Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2019. Communications in Computer and Information Science, vol 1042. Springer, Singapore. https://doi.org/10.1007/978-981-15-1377-0_5

DOI: https://doi.org/10.1007/978-981-15-1377-0_5
Published: 14 November 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1376-3
Online ISBN: 978-981-15-1377-0
eBook Packages: Computer Science, Computer Science (R0)
