research-article

Predicting file lifetimes for data placement in multi-tiered storage systems for HPC

Published: 06 June 2021

Abstract

The emergence of exascale machines in HPC is expected to put more pressure on existing storage systems, in terms of not only capacity but also bandwidth and latency. With a limited budget, relying solely on storage class memory is not realistic, which leads to the use of a heterogeneous tiered storage hierarchy. To make the most efficient use of the high-performance tier in this hierarchy, user data must be placed on the right tier at the right time. In this paper, we assume a 2-tier storage hierarchy with a high-performance tier and a high-capacity archival tier. Files are placed on the high-performance tier at creation time and moved to the capacity tier once their lifetime expires (that is, once they are no longer accessed). The main contribution of this paper is the design of a file lifetime prediction model that relies solely on the file's path and is built on a Convolutional Neural Network. Results show that our solution strikes a good trade-off between accuracy and underestimation. Our model reaches an accuracy close to that of previous work (around 98.60% compared to 98.84%) while reducing underestimations by almost 10x, to 2.21% (compared to 21.86%). This reduction is crucial because it avoids misplacing files in the capacity tier while they are still in use.
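
The 2-tier placement rule described in the abstract, and the reason underestimations are costly, can be illustrated with a short sketch. This is not the authors' implementation: the names `Tier`, `choose_tier`, and `underestimation_rate` are hypothetical, lifetimes are assumed to be expressed in seconds, and an underestimation is assumed to mean a predicted lifetime shorter than the actual one, since the abstract does not define the metric precisely.

```python
from enum import Enum
from typing import Sequence


class Tier(Enum):
    PERFORMANCE = "performance"  # fast tier, files land here at creation
    CAPACITY = "capacity"        # high-capacity archival tier


def choose_tier(created_at: float, predicted_lifetime: float, now: float) -> Tier:
    """Keep a file on the fast tier until its predicted lifetime has elapsed.

    All arguments are in seconds (assumption made for this sketch).
    """
    if now - created_at < predicted_lifetime:
        return Tier.PERFORMANCE
    return Tier.CAPACITY


def underestimation_rate(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Fraction of files whose predicted lifetime is shorter than the actual one.

    Assumed definition: the abstract does not spell out how the metric is computed.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        return 0.0
    under = sum(1 for p, a in zip(predicted, actual) if p < a)
    return under / len(predicted)
```

When a lifetime is underestimated, `choose_tier` returns `Tier.CAPACITY` while the file is still being accessed, which is the misplacement the abstract warns about. For scale, the reported drop in underestimations from 21.86% to 2.21% corresponds to a factor of roughly 9.9.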

CHEOPS '21, April 26, 2021, Online, United Kingdom. Thomas and Gougeaud, et al.

Cited By

  • (2022) A survey on AI for storage. CCF Transactions on High Performance Computing. DOI: 10.1007/s42514-022-00101-3. 4:3 (233-264). Online publication date: 23-May-2022

Information & Contributors

Information

Published In

ACM SIGOPS Operating Systems Review, Volume 55, Issue 1
SIGOPS
July 2021
107 pages
ISSN:0163-5980
DOI:10.1145/3469379
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2021
Published in SIGOPS Volume 55, Issue 1

Author Tags

  1. convolutional neural network
  2. data placement
  3. file lifetime
  4. heterogeneous storage
  5. high performance computing
  6. machine learning
  7. multi-tier storage
  8. storage hierarchy

Qualifiers

  • Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 17
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 14 Dec 2024
