
research-article

From Rocks to Pebbles: Smoothing Spatiotemporal Data Streams in an Overlay of Sensors

Published: 12 August 2019

Abstract

Spatiotemporal streams are prone to data quality issues such as missing, duplicated, and delayed data, which arise when data-generating sensors malfunction, when data transmissions experience problems, or when data are stored or processed improperly. However, many important real-time applications rely on the continuous availability of stream values, e.g., to monitor traffic flow, resource usage, and weather phenomena. Non-real-time applications that support continuous or offline historical analytics also require high-quality data to avoid producing misleading output, such as false positives, that leads to erroneous conclusions and decisions.
In this article, we study the problem of smoothing streams produced by an overlay of sensors. We present nonparametric (data-driven, distribution-free) statistical methods that provide an uninterrupted stream of high-quality spatiotemporal data to real-time applications, even when the raw stream suffers from data quality issues such as noise or missing values. Our novel family of robust methods computes smoothed values (SVs) that can serve as proxies for data of questionable quality. The methods partition the monitored area into cells and compute SVs from historical data and from the deviation from normalcy in neighboring spatial cells, an approach that outperforms standard regression or interpolation. The methods use incremental computation for efficiency, and they differ in how the deviations are normalized, e.g., with respect to zeroth-order, first-order, and second-order moments. A suite of experiments on three real data sets empirically demonstrates the superiority of the method that normalizes with respect to variability.
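The abstract names two mechanisms without giving formulas: per-cell statistics that are maintained incrementally, and a smoothed value for a cell derived from how far its neighboring cells currently deviate from their own norms, with the deviation normalized by a zeroth-, first-, or second-order moment. The Python sketch below shows one plausible reading, under stated assumptions; it is not the authors' method. Welford's online algorithm stands in for "incremental computation". The mapping of moment orders to a raw difference (order 0), a ratio to the mean (order 1), and a z-score (order 2), the use of the median as the robust aggregate of neighbor deviations, the fallback to the cell's historical mean, and the names `RunningStats`, `stats_from`, `deviation`, and `smoothed_value` are all hypothetical.

```python
from __future__ import annotations

import math
import statistics
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Per-cell running mean and variance, updated one value at a time with
    Welford's online algorithm, so the cell's raw history need not be kept."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two values are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def stats_from(values) -> RunningStats:
    """Hypothetical helper: feed a cell's history through RunningStats."""
    s = RunningStats()
    for v in values:
        s.update(v)
    return s


def deviation(value: float, stats: RunningStats, order: int):
    """Deviation of a neighbor's current value from its own historical norm.
    ASSUMED mapping of moment order to normalization (not from the paper):
      0 -> raw difference, 1 -> ratio to the mean, 2 -> z-score.
    Returns None when the normalizer is zero so the neighbor is skipped."""
    if order == 0:
        return value - stats.mean
    if order == 1:
        return value / stats.mean if stats.mean != 0 else None
    return (value - stats.mean) / stats.std if stats.std > 0 else None


def smoothed_value(target: RunningStats,
                   neighbors: list[tuple[float, RunningStats]],
                   order: int) -> float:
    """Hypothetical SV for a target cell: its historical mean adjusted by the
    median (an assumed robust aggregate) of its neighbors' normalized deviations."""
    if order not in (0, 1, 2):
        raise ValueError("order must be 0, 1, or 2")
    devs = [deviation(v, s, order) for v, s in neighbors]
    devs = [d for d in devs if d is not None]
    if not devs:
        return target.mean  # no usable neighbor: fall back to history alone
    mid = statistics.median(devs)
    if order == 0:
        return target.mean + mid  # shift by the typical raw residual
    if order == 1:
        return target.mean * mid  # scale by the typical ratio
    return target.mean + mid * target.std  # rescale z-score to target's spread


if __name__ == "__main__":
    target = stats_from([8, 12, 16])               # mean 12, std 4
    neighbors = [(24, stats_from([18, 20, 22])),   # z = 2.0
                 (6, stats_from([4, 5, 6])),       # z = 1.0
                 (55, stats_from([30, 40, 50]))]   # z = 1.5
    print(smoothed_value(target, neighbors, order=0))  # 16.0
    print(smoothed_value(target, neighbors, order=2))  # 18.0
```

In this sketch the median keeps one erratic neighbor from dominating: the third neighbor's raw jump of 15 is ignored under order 0. Under order 2, each neighbor's deviation is first expressed in units of its own variability and then rescaled by the target cell's standard deviation, which is one way to read the abstract's "normalization with respect to variability".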

      Published In

      ACM Transactions on Spatial Algorithms and Systems, Volume 5, Issue 3
      September 2019
      189 pages
      ISSN:2374-0353
      EISSN:2374-0361
      DOI:10.1145/3356873
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 12 August 2019
      Accepted: 01 April 2019
      Revised: 01 April 2019
      Received: 01 July 2018
      Published in TSAS Volume 5, Issue 3

      Author Tags

      1. Spatial sensors
      2. data quality
      3. residuals
      4. smoothing
      5. spatiotemporal streams

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 13
      • Downloads (last 6 weeks): 2
      Reflects downloads up to 25 Nov 2024.

      Cited By

      • (2023) Planning Wireless Backhaul Links by Testing Line of Sight and Fresnel Zone Clearance. ACM Transactions on Spatial Algorithms and Systems 9:1, pp. 1-30. DOI: 10.1145/3517382. Online publication date: 12-Jan-2023.
      • (2022) A fault-tolerant clustering algorithm for processing data from multiple streams. Information Sciences: an International Journal 584:C, pp. 649-664. DOI: 10.1016/j.ins.2021.10.049. Online publication date: 1-Jan-2022.
      • (2021) Spatial Interpolation Techniques on Participatory Sensing Data. ACM Transactions on Spatial Algorithms and Systems 7:3, pp. 1-32. DOI: 10.1145/3457609. Online publication date: 8-Jun-2021.
      • (2021) Tracking Stream Quality Issues in Combined Physical and Radar Sensors for IoT-based Data-driven Actuation. 2021 CIE International Conference on Radar (Radar), pp. 2429-2434. DOI: 10.1109/Radar53847.2021.10028325. Online publication date: 15-Dec-2021.