DOI: 10.1145/3552326.3587444

Effective Performance Issue Diagnosis with Value-Assisted Cost Profiling

Published: 08 May 2023

Abstract

Diagnosing performance issues is often difficult, especially when they occur only during some program executions. Profilers can help with performance debugging, but they are ineffective when the most costly functions are not the root causes of performance issues. To address this problem, we introduce a new profiling methodology, value-assisted cost profiling, and a tool, vProf. Our insight is that capturing the values of variables can greatly help diagnose performance issues. vProf continuously records values while profiling normal and buggy program executions. It identifies anomalies in the values and the functions where they occur to pinpoint the real root causes of performance issues. Using a set of 15 real-world performance bugs in four widely used applications, we show that vProf is effective at diagnosing all of the issues, while other state-of-the-art tools diagnose only a few of them. We further use vProf to diagnose longstanding performance issues in these applications that have been unresolved for over four years.
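
The idea of comparing recorded values across normal and buggy executions can be illustrated with a minimal sketch. This is not vProf's implementation: the abstract does not say how anomalies are detected, so the sketch assumes a Hellinger distance between the empirical value distributions of each variable. The function names, the `function:variable` keys, and the sample values are all hypothetical.

```python
from collections import Counter
from math import sqrt


def hellinger(samples_a, samples_b):
    """Hellinger distance between the empirical distributions of two samples.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    na, nb = len(samples_a), len(samples_b)
    if na == 0 or nb == 0:
        raise ValueError("both samples must be non-empty")
    pa, pb = Counter(samples_a), Counter(samples_b)
    support = set(pa) | set(pb)
    # Counter returns 0 for values that never occurred in a sample.
    total = sum((sqrt(pa[v] / na) - sqrt(pb[v] / nb)) ** 2 for v in support)
    return sqrt(total / 2)


def rank_variables(normal, buggy):
    """Rank variables recorded in both runs, most divergent first."""
    shared = normal.keys() & buggy.keys()
    scores = {name: hellinger(normal[name], buggy[name]) for name in shared}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical recorded values, keyed by "function:variable" (assumed format).
normal_run = {
    "scan_log:retries": [0, 0, 1, 0, 0, 1],
    "flush:batch_size": [64, 64, 64, 64],
}
buggy_run = {
    "scan_log:retries": [9, 9, 9, 8, 9, 9],
    "flush:batch_size": [64, 64, 64, 64],
}

ranking = rank_variables(normal_run, buggy_run)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data, `scan_log:retries` takes entirely different values in the buggy run and ranks first, while `flush:batch_size` is unchanged and scores zero. A ranking like this points at a variable and its enclosing function even when that function is not the most expensive one in a cost profile.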

Cited By

  • (2024) Demystifying the Fight Against Complexity: A Comprehensive Study of Live Debugging Activities in Production Cloud Systems. Proceedings of the 2024 ACM Symposium on Cloud Computing, pages 341-360. DOI: 10.1145/3698038.3698568. Online publication date: 20-Nov-2024.
  • (2024) Diagnosing Performance Issues for Large-Scale Microservice Systems With Heterogeneous Graph. IEEE Transactions on Services Computing, 17(5):2223-2235. DOI: 10.1109/TSC.2024.3402172. Online publication date: Sep-2024.

Published In

EuroSys '23: Proceedings of the Eighteenth European Conference on Computer Systems
May 2023
910 pages
ISBN: 9781450394871
DOI: 10.1145/3552326

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. debugging
  2. profilers
  3. program analysis

Qualifiers

  • Research-article

Funding Sources

  • DARPA
  • NSF grants
  • Amazon Research Award
  • Meta Research Award
  • Guggenheim Fellowship
  • GE/DARPA grant
  • CAIT grant
  • JP Morgan
  • Didi
  • Accenture

Conference

EuroSys '23

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Upcoming Conference

EuroSys '25
Twentieth European Conference on Computer Systems
March 30 - April 3, 2025
Rotterdam, Netherlands

Article Metrics

  • Downloads (last 12 months): 364
  • Downloads (last 6 weeks): 42

Reflects downloads up to 23 Nov 2024.
