
An Analytical Model-based Capacity Planning Approach for Building CSD-based Storage Systems

Published: 11 September 2024

Abstract

Data movement in large-scale computing facilities (from compute nodes to data nodes) is one of the major contributors to high cost and energy consumption. To reduce it, in-storage processing (ISP) within storage devices such as solid-state drives (SSDs) has been actively explored. The introduction of computational storage drives (CSDs) enabled ISP in the same form factor as regular SSDs, which makes it easy to replace the SSDs in traditional compute nodes. With CSDs, host systems can offload operations such as search, filter, and count. However, commercial CSDs differ in their hardware resources and performance characteristics, so building a CSD-based storage system within a compute node requires careful consideration of hardware, performance, and workload characteristics. Storage architects are consequently hesitant to build storage systems based on CSDs, because no tools exist to determine whether CSD-based compute nodes meet performance requirements better than traditional SSD-based nodes. In this work, we propose CsdPlan, an analytical model-based storage capacity planner that helps system architects build performance-effective CSD-based compute nodes. Our model takes into account the performance characteristics of the host system, the targeted workloads, and the hardware and performance characteristics of the CSDs to be deployed, and it provides the optimal configuration in terms of the number of CSDs for a compute node. Furthermore, CsdPlan estimates and reduces the total cost of ownership (TCO) of building a CSD-based compute node. To evaluate the efficacy of CsdPlan, we selected two commercially available CSDs and four representative big data analysis workloads.
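To make the planning idea concrete, the following is a minimal sketch of a throughput-based sizing loop. It is not the CsdPlan model, which this page does not spell out. Every name and parameter here (`CsdSpec`, `Workload`, `estimate_runtime`, `plan`, the per-device scan rate, the selectivity, the host link bandwidth) is a hypothetical assumption chosen for illustration, and the runtime estimate assumes that data is striped evenly across devices and that in-storage scanning overlaps with returning results to the host.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CsdSpec:
    """Hypothetical per-device parameters (not taken from the paper)."""
    scan_gbps: float   # in-storage processing throughput of one CSD, in GB/s
    unit_cost: float   # purchase price of one CSD, in arbitrary currency units


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of an offloaded scan-style operation."""
    data_gb: float      # input data scanned inside the CSDs, in GB
    selectivity: float  # fraction of the input returned to the host (0..1)


def estimate_runtime(n_csds: int, csd: CsdSpec, wl: Workload,
                     host_link_gbps: float) -> float:
    """Estimate seconds to run the workload on n_csds devices.

    Assumes even striping across devices and full overlap of scanning
    with result transfer, so the slower of the two stages dominates.
    """
    scan_time = wl.data_gb / (n_csds * csd.scan_gbps)
    transfer_time = wl.data_gb * wl.selectivity / host_link_gbps
    return max(scan_time, transfer_time)


def plan(csd: CsdSpec, wl: Workload, host_link_gbps: float,
         target_s: float, max_csds: int) -> Optional[Tuple[int, float]]:
    """Return (device count, device cost) for the smallest configuration
    that meets the runtime target, or None if no count up to max_csds does."""
    for n in range(1, max_csds + 1):
        if estimate_runtime(n, csd, wl, host_link_gbps) <= target_s:
            return n, n * csd.unit_cost
    return None


if __name__ == "__main__":
    csd = CsdSpec(scan_gbps=2.0, unit_cost=1000.0)
    wl = Workload(data_gb=1000.0, selectivity=0.01)
    print(plan(csd, wl, host_link_gbps=8.0, target_s=120.0, max_csds=8))
```

Because the cost in this sketch grows linearly with the device count, the first feasible count is also the cheapest one. A fuller planner would add terms for host CPU time, power, and operating cost before comparing configurations against an SSD-only node.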



Published In

ACM Transactions on Embedded Computing Systems, Volume 23, Issue 6
November 2024
505 pages
EISSN: 1558-3465
DOI: 10.1145/3613645
Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States


Publication History

Published: 11 September 2024
Online AM: 14 September 2023
Accepted: 02 September 2023
Revised: 27 July 2023
Received: 09 December 2022
Published in TECS Volume 23, Issue 6


Author Tags

  1. Computational storage drives
  2. solid state drives
  3. in-storage processing
  4. near-data processing
  5. analytical modeling
  6. distributed processing

Qualifiers

  • Research-article

Funding Sources

  • Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT)
