DOI: 10.1145/3620666.3651337
research-article
Open access

Thesios: Synthesizing Accurate Counterfactual I/O Traces from I/O Samples

Published: 27 April 2024

Abstract

Representative modeling of I/O activity is crucial when designing large-scale distributed storage systems. Particularly important use cases are counterfactual "what-if" analyses that assess the impact of anticipated or hypothetical new storage policies or hardware prior to deployment. We propose Thesios, a methodology to accurately synthesize such hypothetical full-resolution I/O traces by carefully combining down-sampled I/O traces collected from multiple disks attached to multiple storage servers. Applying this approach to real-world traces that are already routinely sampled at Google, we show that our synthesized traces achieve 95--99.5% accuracy in read/write request numbers, 90--97% accuracy in utilization, and 80--99.8% accuracy in read latency compared to metrics collected from actual disks. We demonstrate how Thesios enables diverse counterfactual I/O trace synthesis and analyses of hypothetical policy, hardware, and server changes through four case studies: (1) studying the effects of changing a disk's utilization, fullness, and capacity, (2) evaluating a new data placement policy, (3) analyzing the impact on power and performance of deploying disks with reduced rotations-per-minute (RPM), and (4) understanding the impact of increased buffer cache size on a storage server. Without Thesios, such counterfactual analyses would require costly and potentially risky A/B experiments in production.
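The core idea of combining down-sampled traces from several disks can be illustrated with a minimal sketch. This is not the Thesios algorithm, which the abstract does not spell out. It assumes, purely for illustration, that each disk's I/O is sampled at a rate of 1/N and that merging the samples of N comparable disks approximates the full-resolution trace of one hypothetical disk. The names `IORecord`, `synthesize_trace`, and `relative_accuracy` are hypothetical, and the accuracy definition (one minus relative error) is likewise an assumption rather than the paper's stated metric.

```python
from dataclasses import dataclass
import heapq
from typing import Iterable, List


@dataclass(frozen=True)
class IORecord:
    """One sampled I/O request (hypothetical schema, not the paper's format)."""
    timestamp: float  # seconds since the start of the trace
    disk_id: str      # disk the sample was collected from
    op: str           # "read" or "write"
    size_bytes: int


def synthesize_trace(sampled_traces: Iterable[List[IORecord]]) -> List[IORecord]:
    """Merge per-disk sampled traces into one time-ordered trace.

    Assumption: every input list is already sorted by timestamp, and the
    number of input disks equals the inverse of the sampling rate.
    """
    return list(heapq.merge(*sampled_traces, key=lambda r: r.timestamp))


def relative_accuracy(synthesized: float, actual: float) -> float:
    """Accuracy as 1 - relative error (an assumed definition)."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return 1.0 - abs(synthesized - actual) / actual


# Two disks, each sampled at an assumed rate of 1/2.
disk_a = [IORecord(0.1, "a", "read", 4096), IORecord(0.7, "a", "write", 8192)]
disk_b = [IORecord(0.3, "b", "read", 4096), IORecord(0.9, "b", "read", 65536)]

merged = synthesize_trace([disk_a, disk_b])
read_count = sum(1 for r in merged if r.op == "read")
```

Here `merged` holds four records in timestamp order, and `read_count` could be compared against a count measured on a real disk with `relative_accuracy`. A real pipeline would also have to decide which disks are comparable enough to combine, which this sketch leaves out.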

Cited By

  • (2024) Morph: Efficient File-Lifetime Redundancy Management for Cluster File Systems. Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, pages 330-346. DOI: 10.1145/3694715.3695981. Online publication date: 4-Nov-2024

Published In

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3
April 2024
1106 pages
ISBN: 9798400703867
DOI: 10.1145/3620666
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

ASPLOS '24

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 1,192
  • Downloads (Last 6 weeks): 184
Reflects downloads up to 21 Nov 2024
