Abstract
We present our initial findings regarding the problem of the impact that time series compression may have on similarity-queries, in the settings in which the elements of the dataset are accompanied with additional contexts. Broadly, the main objective of any data compression approach is to provide a more compact (i.e., smaller size) representation of a given original dataset. However, as has been observed in the large body of works on compression of spatial data, applying a particular algorithm “blindly” may yield outcomes that defy the intuitive expectations – e.g., distorting certain topological relationships that exist in the “raw” data [7]. In this study, we quantify this distortion by defining a measure of similarity distortion based on Kendall’s \(\tau \). We evaluate this measure, and the correspondingly achieved compression ratio for the five most commonly used time series compression algorithms and the three most common time series similarity measures. We report some of our observations here, along with the discussion of the possible broader impacts and the challenges that we plan to address in the future.
X. Teng—Research supported by NSF grant III 1823267.
G. Trajcevski—Research supported by NSF grants III-1823279 and CNS-1823267, and ONR grant N00014-14-1-0215.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
Around the same time, there were other algorithms developed for polyline simplification, some of which had almost-identical methodologies with the DP algorithm. Most notably [16] which is the reason that sometimes the name Ramer-Douglas-Peuker is used in the literature.
- 2.
References
GPCC: Global Precipitation Climatology Centre. https://climatedataguide.ucar.edu/climate-data/gpcc-global-precipitation-climatology-centre
Agrawal, R., Faloutsos, C., Swami, A.: Efficient similarity search in sequence databases. In: Lomet, D.B. (ed.) FODO 1993. LNCS, vol. 730, pp. 69–84. Springer, Heidelberg (1993). https://doi.org/10.1007/3-540-57301-1_5
Barequet, G., Chen, D.Z., Daescu, O., Goodrich, M.T., Snoeyink, J.: Efficiently approximating polygonal paths in three and higher dimensions. Algorithmica 33(2), 150–167 (2002)
Chan, W.S., Chin, F.: Approximation of polygonal curves with minimum number of line segments. Int. J. Comput. Geom. Appl. 6 (1992)
Ding, H., Trajcevski, G., Scheuermann, P., Wang, X., Keogh, E.J.: Querying and mining of time series data: experimental comparison of representations and distance measures. PVLDB 1(2) (2008)
Douglas, D.H., Peucker, T.K.: Algorithms for the reduction of the number of points required to represent a digitised line or its caricature. Can. Cartographer 10(2) (1973)
Estkowski, R., Mitchell, J.S.B.: Simplifying a polygonal subdivision while keeping it simple. In: Symposium on Computational Geometry (2001)
Ferrari, L., Rosi, A., Mamei, M., Zambonelli, F.: Extracting urban patterns from location-based social networks. In: Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Location-Based Social Networks, LBSN 2011 (2011)
Hey, T., Tansley, S., Tolle, K.M., et al.: The Fourth Paradigm: Data-Intensive Scientific Discovery, vol. 1. Microsoft Research, Redmond (2009)
Hirschberg, D., Lelewer, D.A.: Data compression. Comput. Surv. 19(3) (1987)
Kasperson, J.X., Kasperson, R.E., Turner II, B.L.: Regions at Risk: Comparisons of Threatened Environments. United Nations University Press, Tokyo (1995)
Kendall, M.G.: A new measure of rank correlation. Biometrika 30, 81–93 (1938)
Keogh, E., Ratanamahatana, C.A.: Exact indexing of dynamic time warping. Knowl. Inf. Syst. 7, 358–386 (2005)
Keogh, E.J., Chakrabarti, K., Pazzani, M.J., Mehrotra, S.: Dimensionality reduction for fast similarity search in large time series databases. Knowl. Inf. Syst. 3(3) (2001)
Pearson, K.: Note on regression and inheritance in the case of two parents. Proc. Roy. Soc. London (1895)
Ramer, U.: An iterative procedure for the polygonal approximation of plane curves. Comput. Graph. Image Process. 1, 244–256 (1972)
Reddy, M.J., Adarsh, S.: Time-frequency characterization of sub-divisional scale seasonal rainfall in India using the Hilbert-Huang transform. Stochast. Environ. Res. Risk Assess. 30, 1063–1085 (2016)
Rényi, A.: On measures of entropy and information. Technical report, Hungarian Academy of Sciences, Budapest, Hungary (1961)
Rienecker, M.M., et al.: Merra: Nasa’s modern-era retrospective analysis for research and applications. J. Climate 24(14), 3624–3648 (2011)
Saalfeld, A.: Topologically consistent line simplification with the Douglas-Peucker algorithm. Cartography Geogr. Inf. Sci. 26(1), 7–18 (1999)
Sayemuzzaman, M., Jha, M.K.: Seasonal and annual precipitation time series trend analysis in North Carolina, United States. Atmos. Res. 137 (2014)
Sayood, K.: Introduction to Data Compression. Morgan Kauffman, Burlington (1996)
Sharma, C.S., Panda, S.N., Pradhan, R.P., Singh, A., Kawamura, A.: Precipitation and temperature changes in Eastern India by multiple trend detection methods. Atmos. Res. 180 (2016)
Shekhar, S., Chawla, S.: Spatial Databases: A Tour. Prentice Hall, Upper Saddle River (2003)
Steinbach, M., Karypis, G., Kumar, V., et al.: A comparison of document clustering techniques. In: KDD Workshop on Text Mining, vol. 400, pp. 525–526 (2000)
Suarez, M.J., et al.: The geos-5 data assimilation system-documentation of versions 5.0. 1, 5.1. 0, and 5.2. 0 (2008)
Vaisman, A.A., Zimányi, E.: Data Warehouse Systems - Design and Implementation. Data-Centric Systems and Applications. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54655-6
van Kreveld, M., Luo, J.: The definition and computation of trajectory and subtrajectory similarity. In: GIS (2007)
Visvalingam, M., Whyatt, J.D.: Line generalisation by repeated elimination of points. Cartographic J. 30(1) (1993)
Weibel, R.: Generalization of spatial data: principles and selected algorithms. In: van Kreveld, M., Nievergelt, J., Roos, T., Widmayer, P. (eds.) CISM School 1996. LNCS, vol. 1340, pp. 99–152. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63818-0_5
Yang, G., Züfle, A.: Spatio-temporal prediction of social connections. In: Proceedings of the Fourth International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, Chicago, IL, USA, 14 May 2017 (2017)
Acknowledgments
We thank Praxxal Patel and Yash Thesia for their help in finalizing part of the experiments.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Teng, X., Züfle, A., Trajcevski, G., Klabjan, D. (2018). Location-Awareness in Time Series Compression. In: Benczúr, A., Thalheim, B., Horváth, T. (eds) Advances in Databases and Information Systems. ADBIS 2018. Lecture Notes in Computer Science(), vol 11019. Springer, Cham. https://doi.org/10.1007/978-3-319-98398-1_6
Download citation
DOI: https://doi.org/10.1007/978-3-319-98398-1_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98397-4
Online ISBN: 978-3-319-98398-1
eBook Packages: Computer ScienceComputer Science (R0)