research-article

Support high-order tensor data description for outlier detection in high-dimensional big sensor data

Published: 01 April 2018

Abstract

High-dimensional sensor data can be collected by wireless sensor networks, video monitoring systems, and multimedia sensor networks. Such data is inherently large-scale because each sensor node has spatial attributes and may also be associated with large amounts of measurement data evolving over time. Detecting outliers in high-dimensional big sensor data is a challenging task. Most existing outlier detection methods are based on vector representations. However, high-dimensional sensor data is naturally described by tensor representations. Vectorizing such data can destroy its original structural information and correlations, lead to the curse of dimensionality, and leave some outliers undetected. To solve this problem, support high-order tensor data description (STDD) and kernel support high-order tensor data description (KSTDD) are proposed to detect outliers in tensor data. STDD and KSTDD extend support vector data description from vector space to tensor space. KSTDD maintains the structural information of the data, avoids the problems caused by vectorizing tensor data, and improves outlier detection performance. Experiments on four sensor datasets show that the proposed method is superior to the traditional vectorized data analysis method.

We construct a third-order tensor representation model for big sensory data. For outlier detection in big sensory data, we propose KSTDD and a tensorial kernel. The proposed method can improve the accuracy and efficiency of anomaly detection.
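
The idea of running support vector data description on tensors through a kernel, without flattening each sample into one long vector, can be illustrated with a small sketch. It is not the KSTDD algorithm or the tensorial kernel from the paper, neither of which is specified in the abstract. The (sensors, time steps, attributes) layout, the Gram-matrix kernel in `tensor_kernel`, the Frank-Wolfe solver in `fit_svdd` (standing in for a quadratic-programming solver), the hard-margin setting, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def unfold(t, mode):
    """Mode-n unfolding: rows index the chosen mode, columns everything else."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def mode_features(t):
    """One Gram matrix per mode, scaled by the number of columns (illustrative)."""
    feats = []
    for mode in range(t.ndim):
        u = unfold(t, mode)
        feats.append(u @ u.T / u.shape[1])
    return feats

def tensor_kernel(fx, fy, sigma=1.0):
    """Hypothetical tensorial kernel: a product over modes of Gaussian kernels
    on the mode-n Gram matrices. NOT the kernel defined in the paper."""
    sq = sum(np.sum((gx - gy) ** 2) for gx, gy in zip(fx, fy))
    return float(np.exp(-sq / (2.0 * sigma ** 2)))

def fit_svdd(K, n_iter=2000):
    """Hard-margin SVDD dual: minimise a^T K a - a^T diag(K) over the simplex.
    Solved approximately with Frank-Wolfe, a stand-in for a QP solver."""
    alpha = np.full(K.shape[0], 1.0 / K.shape[0])
    for t in range(n_iter):
        grad = 2.0 * K @ alpha - np.diag(K)
        i = int(np.argmin(grad))      # best vertex of the simplex
        gamma = 2.0 / (t + 2.0)       # standard Frank-Wolfe step size
        alpha *= 1.0 - gamma
        alpha[i] += gamma
    return alpha

rng = np.random.default_rng(0)
shape = (4, 6, 3)  # assumed layout: (sensors, time steps, attributes)

# Synthetic "normal" samples: readings near 1.0 with small noise.
train = [np.ones(shape) + 0.05 * rng.standard_normal(shape) for _ in range(30)]
feats = [mode_features(x) for x in train]
K = np.array([[tensor_kernel(a, b) for b in feats] for a in feats])

alpha = fit_svdd(K)
center_norm2 = float(alpha @ K @ alpha)

def dist2(z):
    """Squared feature-space distance from z to the SVDD centre."""
    fz = mode_features(z)
    kz = np.array([tensor_kernel(f, fz) for f in feats])
    return tensor_kernel(fz, fz) - 2.0 * float(kz @ alpha) + center_norm2

# Radius: the largest training distance, so every training sample lies inside.
r2 = max(dist2(x) for x in train)

faulty = np.ones(shape)
faulty[2] = 5.0  # sensor 2 stuck at a high reading across time and attributes
is_outlier = dist2(faulty) > r2
print("faulty sample flagged as outlier:", is_outlier)
```

The kernel only ever sees each sample through its mode-wise unfoldings, so the sensor, time, and attribute modes remain distinguishable inside the kernel. A soft-margin variant would cap each coefficient in `alpha` at a constant C, which the simple Frank-Wolfe step above does not handle.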

Published In

Future Generation Computer Systems, Volume 81, Issue C
April 2018
580 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Big sensor data
  2. CP factorization
  3. High-dimensional data
  4. KSTDD
  5. Outlier detection

Qualifiers

  • Research-article
