DOI: 10.1145/3344341.3368802
research-article
Public Access

ATLAS: A Distributed File System for Spatiotemporal Data

Published: 02 December 2019

Abstract

A majority of the data generated in several domains is geotagged. These data also have a chronological component associated with them. Pervasive data generation and collection efforts have led to an increase in data volumes. These data hold the potential to unlock valuable insights. To facilitate such knowledge extraction in a timely manner, the underlying file system must satisfy several objectives. In this study, we present Atlas, a distributed file system designed specifically for spatiotemporal data. Atlas includes several capabilities that are suited for performing large-scale analyses: aligning dispersion with data access patterns, load balancing storage, and facilitating interoperation with analytical engines such as Hadoop and Spark. Our empirical benchmarks profile several aspects of Atlas, and demonstrate the suitability of our methodology.

Published In

UCC'19: Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing
December 2019
307 pages
ISBN:9781450368940
DOI:10.1145/3344341
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. analytics
  2. file systems
  3. hdfs
  4. spatiotemporal data

Qualifiers

  • Research-article

Conference

UCC '19
Acceptance Rates

Overall Acceptance Rate 38 of 125 submissions, 30%

Article Metrics

  • Downloads (Last 12 months)71
  • Downloads (Last 6 weeks)14
Reflects downloads up to 14 Nov 2024

Cited By

  • (2023) AQUA: A Framework for Spatiotemporal Analysis and Visualizations of Water Quality Data at Scale. 2023 IEEE International Conference on Big Data (BigData), 1555-1562. DOI: 10.1109/BigData59044.2023.10386898. Online publication date: 15-Dec-2023
  • (2022) A Survey of Spatio-Temporal Big Data Indexing Methods in Distributed Environment. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15, 4132-4155. DOI: 10.1109/JSTARS.2022.3175657. Online publication date: 2022
  • (2022) Alleviating Resource Requirements for Spatial Deep Learning Workloads. 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 452-462. DOI: 10.1109/CCGrid54584.2022.00055. Online publication date: May-2022
  • (2021) Distributed Orchestration of Regression Models Over Administrative Boundaries. Proceedings of the 2021 IEEE/ACM 8th International Conference on Big Data Computing, Applications and Technologies, 80-90. DOI: 10.1145/3492324.3494164. Online publication date: 6-Dec-2021
  • (2021) Review on Integrating Geospatial Big Datasets and Open Research Issues. IEEE Access 9, 10604-10620. DOI: 10.1109/ACCESS.2021.3051084. Online publication date: 2021
  • (2020) Small is Beautiful: Distributed Orchestration of Spatial Deep Learning Workloads. 2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC), 101-111. DOI: 10.1109/UCC48980.2020.00029. Online publication date: Dec-2020
  • (2020) Towards Timely, Resource-Efficient Analyses Through Spatially-Aware Constructs within Spark. 2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC), 46-56. DOI: 10.1109/UCC48980.2020.00024. Online publication date: Dec-2020
