
DOI: 10.1145/3019612.3019671
research-article

Enhanced situation space mining for data streams

Published: 03 April 2017

Abstract

Data streams can capture the situation that an actor is experiencing, and knowledge of the present situation is beneficial for a wide range of applications. The pcStream algorithm extracts situations from a numerical data stream in an unsupervised manner. Although pcStream outperforms other stream clustering algorithms at this task, it has two major flaws. The first is its computational complexity, which stems from continuously performing principal component analysis (PCA). The second is its difficulty in detecting emerging situations whose distributions overlap in the same feature space.
In this paper we introduce pcStream2, a variant of pcStream that employs windowing and persistence to distinguish between emerging overlapping concepts. We also propose the use of incremental PCA (IPCA) to reduce the overall complexity and memory requirements of the algorithm. Although any IPCA algorithm can be used, we use a novel IPCA algorithm called Just-In-Time PCA (JIT-PCA), which is better suited to processing streams. JIT-PCA takes intelligent shortcuts in order to reduce computation. We provide experimental results on real-world datasets that demonstrate how the proposed improvements make pcStream2 a more accurate and practical tool for situation space mining.
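To illustrate why an incremental scheme can reduce cost relative to repeatedly running PCA over stored observations, the following is a minimal sketch of a generic streaming approach. It is not JIT-PCA and not the pcStream2 implementation, whose details the abstract does not give. The class name `RunningPCA` and its methods are hypothetical, chosen only for this example. The sketch keeps a running mean and covariance (a Welford-style update) and recovers only the leading principal component by power iteration, so no past observations need to be stored.

```python
import math
from typing import List, Sequence


class RunningPCA:
    """Hypothetical helper: streaming mean/covariance plus power iteration.

    A generic illustration of incremental PCA bookkeeping, not JIT-PCA.
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.n = 0
        self.mean = [0.0] * dim
        # Running sum of outer products of deviations (Welford's "M2" matrix).
        self.m2 = [[0.0] * dim for _ in range(dim)]

    def update(self, x: Sequence[float]) -> None:
        """Fold one observation into the statistics in O(dim^2) time."""
        self.n += 1
        delta_old = [x[i] - self.mean[i] for i in range(self.dim)]
        for i in range(self.dim):
            self.mean[i] += delta_old[i] / self.n
        delta_new = [x[i] - self.mean[i] for i in range(self.dim)]
        for i in range(self.dim):
            for j in range(self.dim):
                self.m2[i][j] += delta_old[i] * delta_new[j]

    def covariance(self) -> List[List[float]]:
        """Return the sample covariance matrix of everything seen so far."""
        if self.n < 2:
            raise ValueError("need at least two observations")
        return [[v / (self.n - 1) for v in row] for row in self.m2]

    def leading_component(self, iters: int = 200) -> List[float]:
        """Approximate the top eigenvector of the covariance by power iteration."""
        cov = self.covariance()
        # Fixed, non-uniform start vector; a start that happens to be orthogonal
        # to the true component would not converge (a known caveat of the method).
        v = [float(i + 1) for i in range(self.dim)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(self.dim))
                 for i in range(self.dim)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        return v


# Usage: stream points lying on the line y = 2x, one at a time.
pca = RunningPCA(dim=2)
for t in range(-5, 6):
    pca.update((float(t), 2.0 * t))
direction = pca.leading_component()
```

Each update touches only the mean vector and a dim-by-dim matrix, so memory use is independent of the stream length. A practical IPCA method would typically also track several components and discount old data, which this sketch does not attempt.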

Cited By

  • (2024) Identifying Anomaly in IoT Traffic Flow With Locality Sensitive Hashes. IEEE Access, 12 (89467-89478). DOI: 10.1109/ACCESS.2024.3420238. Online publication date: 2024
  • (2022) Griffin: Real-Time Network Intrusion Detection System via Ensemble of Autoencoder in SDN. IEEE Transactions on Network and Service Management, 19:3 (2269-2281). DOI: 10.1109/TNSM.2022.3175710. Online publication date: Sep-2022


Published In

SAC '17: Proceedings of the Symposium on Applied Computing
April 2017
2004 pages
ISBN:9781450344869
DOI:10.1145/3019612

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. context space theory
  2. data mining
  3. data stream

Qualifiers

  • Research-article

Funding Sources

  • Israeli Ministry of Science, Technology and Space

Conference

SAC 2017: Symposium on Applied Computing
April 3-7, 2017
Marrakech, Morocco

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
