CostPI: Cost-Effective Performance Isolation for Shared NVMe SSDs

Authors:

Dan Feng

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing

Article No.: 25, Pages 1 - 10

https://doi.org/10.1145/3337821.3337879

Published: 05 August 2019

Abstract

NVMe SSDs have been widely adopted to provide storage services in cloud platforms, where diverse workloads (including latency-sensitive, throughput-oriented, and capacity-oriented workloads) are colocated. To achieve performance isolation, existing solutions partition the shared SSD into multiple isolated regions and assign each workload a separate region. However, these solutions can result in inefficient resource utilization and imbalanced wear. More importantly, they cannot reduce the interference caused by contention for the embedded cache. In this paper, we present CostPI, which improves isolation and resource utilization by providing latency-sensitive workloads with dedicated resources (including data cache, mapping table cache, and NAND flash) and providing throughput-oriented and capacity-oriented workloads with shared resources. Specifically, at the NVMe queue level, we present an SLO-aware arbitration mechanism that fetches requests from NVMe queues at different granularities according to workload SLOs. At the embedded cache level, we use an asymmetric allocation scheme to partition the cache (including data cache and mapping table cache). For different data cache partitions, we adopt different cache policies to meet diverse workload requirements while reducing imbalanced wear. At the NAND flash level, we partition the hardware resources at channel granularity to enable the strongest isolation. Our experiments show that, for latency-sensitive workloads, CostPI can reduce the average response time by up to 44.2%, the 99th-percentile response time by up to 89.5%, and the 99.9th-percentile response time by up to 88.5%. Meanwhile, CostPI can increase resource utilization and reduce wear imbalance for the shared NVMe SSD.
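
The abstract does not spell out how requests are fetched "at different granularities", so the following is only a minimal sketch of one plausible reading, not the authors' mechanism: a round-robin arbiter in which each queue has its own per-turn fetch limit. The names `SubmissionQueue`, `burst`, and `arbitrate`, and the choice of giving the latency-sensitive queue the larger limit, are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SubmissionQueue:
    """Hypothetical stand-in for one NVMe submission queue."""
    name: str
    burst: int  # assumed knob: max requests fetched per arbitration turn
    pending: deque = field(default_factory=deque)


def arbitrate(queues, rounds):
    """Visit queues round-robin, fetching up to `burst` requests from each per turn."""
    fetched = []
    for _ in range(rounds):
        for q in queues:
            # Fetch at this queue's granularity, bounded by what is actually pending.
            for _ in range(min(q.burst, len(q.pending))):
                fetched.append((q.name, q.pending.popleft()))
    return fetched


# Assumed mapping from SLO class to burst size; the paper may derive it differently.
latency = SubmissionQueue("latency", burst=4, pending=deque(range(4)))
throughput = SubmissionQueue("throughput", burst=1, pending=deque(range(4)))

order = arbitrate([latency, throughput], rounds=2)
print([name for name, _ in order])
```

With these assumed settings, the latency-sensitive queue is drained within its first turn while the throughput-oriented queue advances one request per turn, which is the kind of differentiated service an SLO-aware arbiter aims for.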

Information & Contributors

Information

Published In

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing

August 2019

1107 pages

ISBN:9781450362955

DOI:10.1145/3337821

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Defense Preliminary Research Project
Hubei Province Technical Innovation Special Project
Wuhan Application Basic Research Project
National Natural Science Foundation of China
Fundamental Research Funds for the Central Universities

Conference

ICPP 2019

ICPP 2019: 48th International Conference on Parallel Processing

August 5 - 8, 2019

Kyoto, Japan

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Bibliometrics & Citations

Article Metrics

Total Citations: 15
Total Downloads: 526
Downloads (Last 12 months): 71
Downloads (Last 6 weeks): 15

Reflects downloads up to 05 Mar 2025

Citations

Cited By

Zhang X, Bhimani J, Pei S, Lee E, Lee S, Seong Y, Kim E, Choi C, Nam E, Choi J, Kim B. (2024). Storage Abstractions for SSDs: The Past, Present, and Future. ACM Transactions on Storage 21:1 (1-44). Online publication date: 30-Dec-2024.
https://dl.acm.org/doi/10.1145/3708992
Zhou Y, Wang F, Shi Z, Feng D. (2024). CoFS: A Collaboration-Aware Fairness Scheme for NVMe SSD in Cloud Storage System. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:12 (4490-4504). Online publication date: Dec-2024.
https://doi.org/10.1109/TCAD.2024.3412970
Pang S, Deng Y, Zhang G, Huang J, Wu Z. (2024). Minato: A Read-Disturb-Aware Dynamic Buffer Management Scheme for NAND Flash Memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:7 (1930-1943). Online publication date: Jul-2024.
https://doi.org/10.1109/TCAD.2024.3364109
Lee W, Kang M, Kim S. (2024). Highly VM-Scalable SSD in Cloud Storage Systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:1 (113-126). Online publication date: Jan-2024.
https://doi.org/10.1109/TCAD.2023.3305573
Liu R, Tan Z, Shen Y, Long L, Liu D. (2024). Fair-ZNS: Enhancing Fairness in ZNS SSDs Through Self-Balancing I/O Scheduling. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:7 (2012-2022). Online publication date: Jul-2024.
https://doi.org/10.1109/TCAD.2022.3232997
Fan H, Ye Y, Ibrahim S, Huang Z, Li X, Xue W, Wu S, Yu C, Shi X, Jin H. (2023). QoS-pro: A QoS-enhanced Transaction Processing Framework for Shared SSDs. ACM Transactions on Architecture and Code Optimization. Online publication date: 14-Nov-2023.
https://dl.acm.org/doi/10.1145/3632955
Liu W, Cui J, Li T, Liu J, Yang L. (2023). A Space-Efficient Fair Cache Scheme Based on Machine Learning for NVMe SSDs. IEEE Transactions on Parallel and Distributed Systems 34:1 (383-399). Online publication date: 1-Jan-2023.
https://doi.org/10.1109/TPDS.2022.3221410
Wu C, Chen L, Hsu R, Dai J. (2023). A State-Aware Method for Flows With Fairness on NVMe SSDs With Load Balance. IEEE Transactions on Cloud Computing (1-16). Online publication date: 2023.
https://doi.org/10.1109/TCC.2023.3253864
Zhu J, Wang L, Xiao L, Liu L, Qin G. (2023). EBIO: An Efficient Block I/O Stack for NVMe SSDs With Mixed Workloads. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:12 (5048-5060). Online publication date: Dec-2023.
https://doi.org/10.1109/TCAD.2023.3296369
Liu R, Tan Z, Long L, Wu Y, Tan Y, Liu D. (2022). Improving Fairness for SSD Devices through DRAM Over-Provisioning Cache Management. IEEE Transactions on Parallel and Distributed Systems 33:10 (2444-2454). Online publication date: 1-Oct-2022.
https://doi.org/10.1109/TPDS.2022.3143295
