计算机科学 ›› 2018, Vol. 45 ›› Issue (11A): 349-352.
冯贵兰1, 周文刚2
FENG Gui-lan1, ZHOU Wen-gang2
摘要: 随着大数据时代的到来,异常检测受到了广泛关注。针对传统KNN异常检测算法处理速度和计算资源的瓶颈,以及Hadoop平台上的MapReduce不能友好支持迭代计算和基于内存计算等问题,提出了一种基于Spark平台的并行KNN异常检测算法。该算法首先对数据集进行分区和广播,然后用map函数计算数据集在每个分区的K近邻,使用reduce函数归并map函数的输出计算全局K近邻得到异常度,将异常度前n个对象视为异常。与传统KNN异常检测算法相比,在保证检测精度的前提下该算法的性能与计算资源呈近似线性关系;与其他并行异常检测算法相比,该算法无需额外扩展数据,支持迭代,而且通过在内存中缓存中间结果来减少I/O花销。实验结果证明,该算法可以提高KNN算法在大规模数据上的异常检测效率。
中图分类号:
[1]陈运文,吴飞,吴庐山,等.基于异常检测的时间序列研究[J].计算机技术与发展,2015(4):166-170. [2]HODGE V J,AUSTIN J.A survey of outlier detection metho-dologies[J].Artificial Intelligence Review,2004,22(2):85-126. [3]邹云峰,张昕,宋世渊,等.基于局部密度的快速离群点检测算法[J].计算机应用,2017,37(10):2932-2937. [4]KNORR,EDWIN M,NG,et al.Distance-based outliers:algorithms and applications[J].Vldb Journal,2000,8(3-4):237-253. [5]BREUNIG M M.LOF:identifying density-based local outliers[J].ACM Sigmod Record,2000,29(2):93-104. [6]辛丽玲.基于密度差异的离群点检测研究[D].北京:北京交通大学,2015. [7]PAN Y,ZHANG J.Parallel Programming on Cloud Computing Platforms[J].Journal of Convergence Volume,2012,3:23-28. [8]SUBRAMANYAM R B V,SONAM G.Map-Reduce Algorithm for Mining Outliers in the Large Data Sets using Twister Programing Model[J].International Journal of Computer Science and Electronics Engineering,2015,3(1):81-86. [9]郭一鹏,梁吉业,赵兴旺.基于MapReduce的混合数据孤立点检测算法[J].小型微型计算机系统,2014,35(9):1961-1966. [10]苟杰,马自堂,张喆程.PODKNN:面向大数据集的并行离群点检测算法[J].计算机科学,2016,43(7):251-254. [11]KNORR E M,NG R T.A Unified Notion of Outliers:Properties and Computation[C]∥International Conference on Knowledge Discovery & Data Mining.1997:219-222. [12]RAMASWAMY S,RASTOGI R,SHIM K.Efficient algorithms for mining outliers from large data sets[C]∥ACM SIGMOD International Conference on Management of Data.ACM,2000:427-438. [13]高彦杰.Spark 大数据处理技术、应用与性能优化[M].北京:机械工业出版社,2014. [14]AGGARWAL C C,YU P S.Outlier Detection for High Dimensional Data[J].ACM Sigmod Record,2001,30(2):37-46. [15]ZHANG X,GONG K,ZHAO G.Parallel K-Medoids algorithm based on MapReduce[J].Journal of Computer Applications,2013,33(4):1023-1005. [16]ALCALÁ-FDEZ J,FERNÁNDEZ A,LUENGO J,et al.KEEL Data-Mining Software Tool:Data Set Repository,Integration of Algorithms and Experimental Analysis Framework[J].Journal of Multiple-Valued Logic & Soft Computing,2011,17:255-287. |
[1] | 王子凯, 朱健, 张伯钧, 胡凯. 区块链与智能合约并行方法研究与实现 Research and Implementation of Parallel Method in Blockchain and Smart Contract 计算机科学, 2022, 49(9): 312-317. https://doi.org/10.11896/jsjkx.210800102 |
[2] | 徐天慧, 郭强, 张彩明. 基于全变分比分隔距离的时序数据异常检测 Time Series Data Anomaly Detection Based on Total Variation Ratio Separation Distance 计算机科学, 2022, 49(9): 101-110. https://doi.org/10.11896/jsjkx.210600174 |
[3] | 李其烨, 邢红杰. 基于最大相关熵的KPCA异常检测方法 KPCA Based Novelty Detection Method Using Maximum Correntropy Criterion 计算机科学, 2022, 49(8): 267-272. https://doi.org/10.11896/jsjkx.210700175 |
[4] | 王馨彤, 王璇, 孙知信. 基于多尺度记忆残差网络的网络流量异常检测模型 Network Traffic Anomaly Detection Method Based on Multi-scale Memory Residual Network 计算机科学, 2022, 49(8): 314-322. https://doi.org/10.11896/jsjkx.220200011 |
[5] | 魏恺轩, 付莹. 基于重参数化多尺度融合网络的高效极暗光原始图像降噪 Re-parameterized Multi-scale Fusion Network for Efficient Extreme Low-light Raw Denoising 计算机科学, 2022, 49(8): 120-126. https://doi.org/10.11896/jsjkx.220200179 |
[6] | 杜航原, 李铎, 王文剑. 一种面向电商网络的异常用户检测方法 Method for Abnormal Users Detection Oriented to E-commerce Network 计算机科学, 2022, 49(7): 170-178. https://doi.org/10.11896/jsjkx.210600092 |
[7] | 宗迪迪, 谢益武. 基于法线迭代的模型中轴生成方法 Model Medial Axis Generation Method Based on Normal Iteration 计算机科学, 2022, 49(6A): 764-770. https://doi.org/10.11896/jsjkx.210400050 |
[8] | 陈鑫, 李芳, 丁海昕, 孙唯哲, 刘鑫, 陈德训, 叶跃进, 何香. 面向国产异构众核架构的CFD非结构网格计算并行优化方法 Parallel Optimization Method of Unstructured-grid Computing in CFD for DomesticHeterogeneous Many-core Architecture 计算机科学, 2022, 49(6): 99-107. https://doi.org/10.11896/jsjkx.210400157 |
[9] | 武玉坤, 李伟, 倪敏雅, 许志骋. 单类支持向量机融合深度自编码器的异常检测模型 Anomaly Detection Model Based on One-class Support Vector Machine Fused Deep Auto-encoder 计算机科学, 2022, 49(3): 144-151. https://doi.org/10.11896/jsjkx.210100142 |
[10] | 刘江, 刘文博, 张矩. OpenFoam中多面体网格生成的MPI+OpenMP混合并行方法 Hybrid MPI+OpenMP Parallel Method on Polyhedral Grid Generation in OpenFoam 计算机科学, 2022, 49(3): 3-10. https://doi.org/10.11896/jsjkx.210700060 |
[11] | 冷佳旭, 谭明圮, 胡波, 高新波. 基于隐式视角转换的视频异常检测 Video Anomaly Detection Based on Implicit View Transformation 计算机科学, 2022, 49(2): 142-148. https://doi.org/10.11896/jsjkx.210900266 |
[12] | 刘意, 毛莺池, 程杨堃, 高建, 王龙宝. 基于邻域一致性的异常检测序列集成方法 Locality and Consistency Based Sequential Ensemble Method for Outlier Detection 计算机科学, 2022, 49(1): 146-152. https://doi.org/10.11896/jsjkx.201000156 |
[13] | 李家振, 纪庆革. 动态低采样环境光遮蔽的实时光线追踪分子渲染 Dynamic Low-sampling Ambient Occlusion Real-time Ray Tracing for Molecular Rendering 计算机科学, 2022, 49(1): 175-180. https://doi.org/10.11896/jsjkx.210200042 |
[14] | 张叶, 李志华, 王长杰. 基于核密度估计的轻量级物联网异常流量检测方法 Kernel Density Estimation-based Lightweight IoT Anomaly Traffic Detection Method 计算机科学, 2021, 48(9): 337-344. https://doi.org/10.11896/jsjkx.200600108 |
[15] | 郭奕杉, 刘漫丹. 基于时空轨迹数据的异常检测 Anomaly Detection Based on Spatial-temporal Trajectory Data 计算机科学, 2021, 48(6A): 213-219. https://doi.org/10.11896/jsjkx.201100193 |
|