DOI:10.1145/3329785.3329921

Cold Storage Data Archives: More Than Just a Bunch of Tapes

Published: 01 July 2019

Abstract

The volume of sensor and derived data produced by large scientific experiments, such as earth observation programs, radio astronomy sky surveys, and high-energy physics, already exceeds the capacity of the storage hardware fabricated globally each year. Cold storage data archives are therefore the often overlooked spearheads of modern big data analytics in scientific, data-intensive application domains. While high-performance data analytics has received much attention from the research community, the growing number of problems in designing and deploying cold storage archives has received very little.
In this paper, we take the first step towards bridging this gap in knowledge by presenting an analysis of four real-world cold storage archives from three different application domains. In doing so, we highlight (i) workload characteristics that differentiate these archives from traditional, performance-sensitive data analytics, (ii) design trade-offs involved in building cold storage systems for these archives, and (iii) deployment trade-offs with respect to migration to the public cloud. Based on our analysis, we discuss several other important research challenges that need to be addressed by the data management community.

Cited By

  • (2024) TALICS3: Tape library cloud storage system simulator. Simulation Modelling Practice and Theory 134 (102947). DOI:10.1016/j.simpat.2024.102947. Online publication date: Jul-2024
  • (2023) Reducing Read Amplification and Re-synthesis in DNA-based Archival Storage. 2023 IEEE International Conference on Rebooting Computing (ICRC), pages 1-5. DOI:10.1109/ICRC60800.2023.10386460. Online publication date: 5-Dec-2023

      Published In

      DaMoN'19: Proceedings of the 15th International Workshop on Data Management on New Hardware
      July 2019
      150 pages
      ISBN:9781450368018
      DOI:10.1145/3329785

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      SIGMOD/PODS '19

      Acceptance Rates

      Overall Acceptance Rate 94 of 127 submissions, 74%
