Abstract
To support big-data-based cleaning of redundant network data, this paper designs a redundant data cleaning framework that follows the data processing flow preceding data analysis. A cleaning method is built on the spatial correlation of the redundant data, with separate algorithms for abnormal data and for missing data. For abnormal data, a probability-based rule deletes values that deviate obviously from the normal data values. For missing data, a spatial model and algorithm based on spatial correlation fill in the missing values once the other steps of the method have cleaned the redundant data. The accuracy of the model is compared with that of common data prediction algorithms, and the accuracy of the algorithm is verified on a redundant data set.
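The two cleaning steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state which probabilistic rule or which spatial model the paper uses, so both choices here are assumptions made only for illustration: a k-sigma rule stands in for the probability-based deletion, and inverse distance weighting stands in for the spatial-correlation fill. The names `drop_outliers`, `fill_missing_idw`, `k`, and `power` are hypothetical and do not come from the paper.

```python
import math
from statistics import mean, stdev


def drop_outliers(values, k=3.0):
    """Replace readings more than k sample standard deviations from the mean with None.

    Assumption: a k-sigma rule stands in for the paper's probability-based deletion.
    Needs at least two non-missing readings.
    """
    present = [v for v in values if v is not None]
    mu, sigma = mean(present), stdev(present)
    return [
        None if v is not None and sigma > 0 and abs(v - mu) > k * sigma else v
        for v in values
    ]


def fill_missing_idw(values, coords, power=2.0):
    """Fill None entries with an inverse-distance-weighted average of known readings.

    Assumption: inverse distance weighting stands in for the paper's spatial model.
    coords[i] is the position of the node that produced values[i].
    Needs at least one non-missing reading.
    """
    known = [(c, v) for c, v in zip(coords, values) if v is not None]
    filled = []
    for c, v in zip(coords, values):
        if v is not None:
            filled.append(v)
            continue
        num = den = 0.0
        for kc, kv in known:
            # Clamp the distance so a co-located node cannot divide by zero.
            d = max(math.dist(c, kc), 1e-12)
            w = 1.0 / d ** power
            num += w * kv
            den += w
        filled.append(num / den)
    return filled


# Ten nodes on a line; the last one reports an obviously abnormal value.
readings = [20.0] * 9 + [95.0]
positions = [(float(i), 0.0) for i in range(10)]
cleaned = drop_outliers(readings, k=2.0)          # abnormal reading becomes None
repaired = fill_missing_idw(cleaned, positions)   # None is filled from neighbours
```

In a small sample a single extreme value inflates the standard deviation, which is why the usage lines pass `k=2.0` rather than the default of 3; a robust statistic such as the median absolute deviation would be a reasonable alternative.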
Cite this article
Fang, J. Research on automatic cleaning algorithm of multi-dimensional network redundant data based on big data. Evol. Intel. 15, 2609–2617 (2022). https://doi.org/10.1007/s12065-021-00620-y
DOI: https://doi.org/10.1007/s12065-021-00620-y