Abstract
Resource scheduling in container cloud storage clusters is mainly concerned with how to schedule resources as the load changes and how to migrate workloads according to resource conditions. These activities cause frequent context changes and, with them, changes to the application's operating environment. This makes faults hard to locate, especially grey faults, which degrade the operation of applications running in containers. To keep applications running normally, we propose a grey fault detection method. It studies how context changes alter the application's attention features and, from this, builds a knowledge graph that relates context changes to grey faults. Because the knowledge graph grows very large, the method introduces a temporal and spatial snapshot group architecture to handle the resulting high volume of context-dependent temporal queries. The method is validated on a container cluster project and on a Google open source dataset. It effectively detects grey fault scenarios and improves the accuracy rate by more than 90%.
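The abstract names the temporal and spatial snapshot group architecture without specifying it. As a rough illustration of why snapshots help answer temporal queries over a large, changing knowledge graph, the sketch below stores context facts as (head, relation, tail) triples. It takes a full snapshot every few changes and answers "which facts held at time t" by starting from the nearest earlier snapshot and replaying only the later changes. The class name `SnapshotStore`, the `interval` policy, and the example entities (`app-1`, `node-1`, and so on) are hypothetical. This is a generic snapshot-plus-delta technique, not the authors' architecture, and it omits the spatial (for example, per-node) grouping entirely.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

# A knowledge-graph fact: (head entity, relation, tail entity).
Edge = tuple[str, str, str]


@dataclass
class SnapshotStore:
    """Hypothetical temporal store: periodic full snapshots plus a change log."""

    interval: int = 3  # assumed policy: snapshot after every `interval` changes
    log: list[tuple[int, str, Edge]] = field(default_factory=list)
    snap_times: list[int] = field(default_factory=list)
    # Each snapshot keeps (log position it covers up to, frozen fact set).
    snap_states: list[tuple[int, frozenset[Edge]]] = field(default_factory=list)
    _live: set[Edge] = field(default_factory=set)

    def record(self, t: int, op: str, edge: Edge) -> None:
        """Append an 'add' or 'del' change; changes must arrive in time order."""
        if self.log and t < self.log[-1][0]:
            raise ValueError("changes must be recorded in non-decreasing time order")
        if op == "add":
            self._live.add(edge)
        elif op == "del":
            self._live.discard(edge)
        else:
            raise ValueError(f"unknown op: {op!r}")
        self.log.append((t, op, edge))
        if len(self.log) % self.interval == 0:
            # This snapshot reflects every log entry up to and including this one.
            self.snap_times.append(t)
            self.snap_states.append((len(self.log), frozenset(self._live)))

    def as_of(self, t: int) -> set[Edge]:
        """Return the facts that held at time t."""
        i = bisect_right(self.snap_times, t) - 1  # latest snapshot taken at or before t
        if i >= 0:
            start, base = self.snap_states[i]
            state = set(base)
        else:
            start, state = 0, set()
        # Replay only the changes logged after that snapshot, up to time t.
        for when, op, edge in self.log[start:]:
            if when > t:
                break
            if op == "add":
                state.add(edge)
            else:
                state.discard(edge)
        return state


# Example: a migration (context change) followed by a latency symptom.
store = SnapshotStore(interval=2)
store.record(1, "add", ("app-1", "runs_on", "node-1"))
store.record(2, "add", ("node-1", "cpu_load", "high"))
store.record(3, "del", ("app-1", "runs_on", "node-1"))
store.record(3, "add", ("app-1", "runs_on", "node-2"))
store.record(5, "add", ("app-1", "latency", "degraded"))
print(sorted(store.as_of(4)))
```

A larger `interval` stores fewer snapshots but replays more changes per query, which is the basic space/time trade-off any snapshot scheme has to tune.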
Supported by the Natural Science Foundation of China (No. 61762008), the Guangxi Natural Science Foundation Project (No. 2017GXNSFAA198141), and the Key R&D Project of Guangxi (No. GuiKE AB17195014).
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liang, B., Chen, N., Xie, Y., Wang, R., Chen, Y. (2019). Grey Fault Detection Method Based on Context Knowledge Graph in Container Cloud Storage. In: Sun, Y., Lu, T., Yu, Z., Fan, H., Gao, L. (eds) Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2019. Communications in Computer and Information Science, vol 1042. Springer, Singapore. https://doi.org/10.1007/978-981-15-1377-0_5
DOI: https://doi.org/10.1007/978-981-15-1377-0_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1376-3
Online ISBN: 978-981-15-1377-0
eBook Packages: Computer Science, Computer Science (R0)