Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Distributed trajectory similarity search

Published: 01 August 2017 Publication History

Abstract

Mobile and sensing devices have already become ubiquitous. They have made tracking moving objects an easy task. As a result, mobile applications like Uber and many IoT projects have generated massive amounts of trajectory data that can no longer be processed by a single machine efficiently. Among the typical query operations over trajectories, similarity search is a common yet expensive operator in querying trajectory data. It is useful for applications in different domains such as traffic and transportation optimizations, weather forecast and modeling, and sports analytics. It is also a fundamental operator for many important mining operations such as clustering and classification of trajectories. In this paper, we propose a distributed query framework to process trajectory similarity search over a large set of trajectories. We have implemented the proposed framework in Spark, a popular distributed data processing engine, by carefully considering different design choices. Our query framework supports both the Hausdorff distance the Fréchet distance. Extensive experiments have demonstrated the excellent scalability and query efficiency achieved by our design, compared to other methods and design alternatives.

References

[1]
P. K. Agarwal, R. Ben Avraham, H. Kaplan, and M. Sharir. Computing the discrete frechet distance in subquadratic time. Siam Journal of Computing, 43:429--449, 2014.
[2]
P. K. Agarwal and R. Sharathkumar. Approximation algorithms for bipartite matching with metric and geometric costs. In STOC, 2014.
[3]
H. Alt and M. Godau. Computing the fröchet distance between two polygonal curves. JCG Appl., 5:75--91, 1995.
[4]
H. Alt, C. Knauer, and C. Wenk. Comparison of distance measures for planar curves. Algoritkmica, 2004.
[5]
H. Alt and L. Scharf. Computing the hausdorff distance between curved objects. JCG Appl., 18(4):307--320, 2008.
[6]
Y.-B. Bai, J.-H. Yong, C.-Y Liu, X.-M. Liu, and Y. Meng. Polyline approach for approximating Hausdorff distance between planar free-form curves. CAD, 43:687--698, 2011.
[7]
M. d. Berg, O. Cheong, M. v. Kreveld, and M. Overmars. Computational Geometry: Algorithms and Applications. Springer-Verlag TELOS, 2008.
[8]
T. Bozkaya and Z. M. Özsoyoglu. Indexing large metric spaces for similarity search queries. TODS, 24(3):361--404, 1999.
[9]
S. Cabello, P. Giannopoulos, C. Knauer, and G. Rote. Matching point sets with respect to the Earth mover's distance. Computational Geometry: Theory and Applications, 39:118--133, 2008.
[10]
P. Cao and Z. Wang. Efficient top-k query calculation in distributed networks. In PODC, pages 206--215, 2004.
[11]
S. Chambi, D. Lemire, O. Kaser, and R. Godin. Better bitmap performance with roaring bitmaps. Softw., Pract. Exper., 2016.
[12]
L. Chen and R. Ng. On the marriage of lp-norms and edit distance. In VLDB, pages 792--803, 2004.
[13]
L. Chen, M. T. Özsu, and V. Oria. Robust and fast similarity search for moving object trajectories. In SIGMOD, pages 491--502, 2005.
[14]
P. Ciaccia, M. Patella, and P. Zezula. M-tree: An efficient access method for similarity search in metric spaces. In VLDB, pages 426--35, 1997.
[15]
P. Cudré-Mauroux, E. Wu, and S. Madden. Trajstore: An adaptive storage system for very large trajectory data sets. In ICDE, pages 109--120, 2010.
[16]
M. de Berg, A. F. Cook IV, and J. Gudmundsson. Fast frechet queries. In Symposium on Algorithms and Computation, 2011.
[17]
A. Driemel and S. Har-Peled. Jaywalking your dog - computing the Frëchet distance with shortcuts. Siam Journal of Computing, 42:1830--1866, 2013.
[18]
E. Frentzos, K. Gratsias, and Y Theodoridis. Index-based most similar trajectory search. In ICDE, pages 816--825, 2007.
[19]
A. W. Fu, P. M. Chan, Y Cheung, and Y. S. Moon. Dynamic vp-tree indexing for n-nearest neighbor search given pair-wise distances. VLDBJ, 2000.
[20]
A. Guttman. R-trees: A dynamic index structure for spatial searching. In SIGMOD, pages 47--57, 1984.
[21]
J. Hangouët. Computation of the Hausdorff distance betwene plane vector polylines. In Proceedings AutoCarto 12, 1995.
[22]
I. Kamel and C. Faloutsos. Hilbert r-tree: An improved r-tree using fractals. In VLDB, pages 500--509, 1994.
[23]
J.-G. Lee, J. Han, and X. Li. Trajectory outlier detection: A partition-and-detect framework. In ICDE, pages 140--149, 2008.
[24]
J.-G. Lee, J. Han, X. Li, and H. Gonzalez. Traclass: trajectory classification using hierarchical region-based and trajectory-based clustering. In VLDB, 2008.
[25]
J.-G. Lee, J. Han, and K.-Y. Whang. Trajectory clustering: a partition-and-group framework. In SIGMOD, pages 593--604, 2007.
[26]
S. T. Leutenegger, M. Lopez, J. Edgington, et al. Str: A simple and efficient algorithm for r-tree packing. In ICDE, pages 497--506, 1997.
[27]
C. Long, R. C. Wong, and H. V. Jagadish. Direction-preserving trajectory simplification. PVLDB, 6(10):949--960, 2013.
[28]
C. Long, R. C. Wong, and H. V. Jagadish. Trajectory simplification: On minimizing the direction-based error. PVLDB, 8(1):49--60, 2014.
[29]
S. Ranu, D. P, A. D. Telang, P. Deshpande, and S. Raghavan. Indexing and matching trajectories under inconsistent sampling rates. In ICDE, 2015.
[30]
B. K. Sriperumbudur, K. Fukumizu, and G. R. G. Lanckriet. Universality, characteristic kernels and RKHS embedding of measures. Journal of Machine Learning Research, 12:2389--2410, 2011.
[31]
H. Su, K. Zheng, J. Huang, H. Wang, and X. Zhou. Calibrating trajectory data for spatio-temporal similarity analysis. VLDBJ, 24(1):93--116, 2015.
[32]
H. Su, K. Zheng, H. Wang, J. Huang, and X. Zhou. Calibrating trajectory data for similarity-based analysis. In SIGMOD, 2013.
[33]
H. Su, K. Zheng, K. Zeng, J. Huang, S. W. Sadiq, N. J. Yuan, and X. Zhou. Making sense of trajectory data: A partition-and-summarization approach. In ICDE, pages 963--974, 2015.
[34]
H. Su, K. Zheng, K. Zheng, J. Huang, and X. Zhou. Stmaker - A system to make sense of trajectory data. PVLDB, 7(13):1701--1704, 2014.
[35]
C. F. Torres and R. Trujillo-Rasua. The fréchet/manhattan distance and the trajectory anonymisation problem. In DBSec, pages 19--34, 2016.
[36]
J. K. Uhlmann. Satisfying general proximity/similarity queries with metric trees. Inf. Process. Lett., 40(4):175--179, 1991.
[37]
M. Vlachos, D. Gunopulos, and G. Kollios. Discovering similar multidimensional trajectories. In ICDE, 2002.
[38]
H. Wang, H. Su, K. Zheng, S. W. Sadiq, and X. Zhou. An effectiveness study on trajectory similarity measures. In ADC, 2013.
[39]
H. Wang, K. Zheng, X. Zhou, and S. W. Sadiq. Sharkdb: An in-memory storage system for massive trajectory data. In SIGMOD, pages 1099--1104, 2015.
[40]
D. Xie, F. Li, B. Yao, G. Li, L. Zhou, and M. Guo. Simba: Efficient in-memory spatial analytics. In SIGMOD, pages 1071--1085, 2016.
[41]
B.-K. Yi, H. Jagadish, and C. Faloutsos. Efficient retrieval of similar time sequences under time warping. In ICDE, 1998.
[42]
P. N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In SODA, 1993.
[43]
R. Ying, J. Pan, K. Fox, and P. K. Agarwal. A simple efficient approximation algorithm for dynamic time warping. In SIGSPATIAL, 2016.
[44]
J. Yuan, Y. Zheng, C. Zhang, W. Xie, X. Xie, G. Sun, and Y Huang. T-drive: driving directions based on taxi trajectories. In SIGSPATIAL, 2010.
[45]
M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In NSDI, 2012.
[46]
Z. Zhang, K. Huang, and T. Tan. Comparison of similarity measures for trajectory clustering in outdoor surveillance scenes. In ICPR, 2006.
[47]
Y. Zheng and X. Zhou, editors. Computing with Spatial Trajectories. Springer, 2011.

Cited By

View all
  • (2024)Trajectory Similarity Measurement: An Efficiency PerspectiveProceedings of the VLDB Endowment10.14778/3665844.366585817:9(2293-2306)Online publication date: 1-May-2024
  • (2024)BT-Tree: A Reinforcement Learning Based Index for Big Trajectory DataProceedings of the ACM on Management of Data10.1145/36771302:4(1-27)Online publication date: 30-Sep-2024
  • (2024)Retrieving Similar Trajectories from Cellular Data of Multiple Carriers at City ScaleACM Transactions on Sensor Networks10.1145/361324520:2(1-28)Online publication date: 16-Feb-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 10, Issue 11
August 2017
432 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 August 2017
Published in PVLDB Volume 10, Issue 11

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)102
  • Downloads (Last 6 weeks)13
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Trajectory Similarity Measurement: An Efficiency PerspectiveProceedings of the VLDB Endowment10.14778/3665844.366585817:9(2293-2306)Online publication date: 1-May-2024
  • (2024)BT-Tree: A Reinforcement Learning Based Index for Big Trajectory DataProceedings of the ACM on Management of Data10.1145/36771302:4(1-27)Online publication date: 30-Sep-2024
  • (2024)Retrieving Similar Trajectories from Cellular Data of Multiple Carriers at City ScaleACM Transactions on Sensor Networks10.1145/361324520:2(1-28)Online publication date: 16-Feb-2024
  • (2024)Spatiotemporal gated traffic trajectory simulation with semantic-aware graph learningInformation Fusion10.1016/j.inffus.2024.102404108:COnline publication date: 1-Aug-2024
  • (2024)Dynamic trajectory partition optimization method based on historical trajectory dataApplied Soft Computing10.1016/j.asoc.2023.111120151:COnline publication date: 17-Apr-2024
  • (2024)Domain-Knowledge Enhanced GANs for High-Quality Trajectory GenerationAdvanced Intelligent Computing Technology and Applications10.1007/978-981-97-5606-3_33(386-396)Online publication date: 5-Aug-2024
  • (2023)A Distributed Spatial Index With High Update Efficiency for Location-Based Real-Time ServicesJournal of Database Management10.4018/JDM.31845434:1(1-28)Online publication date: 24-Feb-2023
  • (2023)Continuous trajectory generation based on two-stage GANProceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence10.1609/aaai.v37i4.25557(4374-4382)Online publication date: 7-Feb-2023
  • (2023)Ghost: A General Framework for High-Performance Online Similarity Queries over Distributed Trajectory StreamsProceedings of the ACM on Management of Data10.1145/35893181:2(1-25)Online publication date: 20-Jun-2023
  • (2023)ST4ML: Machine Learning Oriented Spatio-Temporal Data Processing at ScaleProceedings of the ACM on Management of Data10.1145/35889411:1(1-28)Online publication date: 30-May-2023
  • Show More Cited By

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media