Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Instrumentation and sampling strategies for cooperative concurrency bug isolation

Published: 17 October 2010 Publication History

Abstract

Fixing concurrency bugs (or "crugs") is critical in modern software systems. Static analyses to find crugs such as data races and atomicity violations scale poorly, while dynamic approaches incur high run-time overheads. Crugs manifest only under specific execution interleavings that may not arise during in-house testing, thereby demanding a lightweight program monitoring technique that can be used post-deployment.
We present Cooperative Crug Isolation (CCI), a low-overhead instrumentation framework to diagnose production-run failures caused by crugs. CCI tracks specific thread interleavings at run-time, and uses statistical models to identify strong failure predictors among these. We offer a varied suite of predicates that represent different trade-offs between complexity and fault isolation capability. We also develop variant random sampling strategies that suit different types of predicates and help keep the run-time overhead low. Experiments with 9 real-world bugs in 6 non-trivial C applications show that these schemes span a wide spectrum of performance and diagnosis capabilities, each suitable for different usage scenarios.

References

[1]
}}The Apache Software Foundation. Apache HTTP Server Project. http://httpd.apache.org/, 2009.
[2]
}}C. Armour-Brown, J. Fitzhardinge, T. Hughes, N. Nethercote, P. Mackerras, D. Mueller, J. Seward, B. V. Assche, R. Walsh, and J. Weidendorfer. Valgrind User Manual. Valgrind project, 3.5.0 edition, Aug. 2009. http://valgrind.org/docs/manual/manual.html.
[3]
}}P. Arumuga Nainar, T. Chen, J. Rosin, and B. Liblit. Statistical debugging using compound Boolean predicates. In ISSTA, 2007.
[4]
}}E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: safe multithreaded programming for c/c++. In OOPSLA, 2009.
[5]
}}M. D. Bond, K. E. Coons, and K. S. McKinley. Pacer: Proportional detection of data races. In PLDI, 2010.
[6]
}}A. Bron, E. Farchi, Y. Magid, Y. Nir, and S. Ur. Applications of synchronization coverage. In PPOPP, 2005.
[7]
}}J. Burnim and K. Sen. Asserting and checking determinism for multithreaded programs. In FSE, 2009.
[8]
}}T. Chilimbi, B. Liblit, K. Mehra, A. V. Nori, and K. Vaswani. HOLMES: Effective statistical debugging via efficient path profiling. In ICSE, May 2009.
[9]
}}J.-D. Choi, K. Lee, A. Loginov, R. O'Callahan, V. Sarkar, and M. Sridharan. Efficient and precise datarace detection for multithreaded object-oriented programs. In PLDI, 2002.
[10]
}}G. Dunlap, D. Lucchetti, M. Fetterman, and P. Chen. Execution replay of multiprocessor virtual machines. In VEE, 2008.
[11]
}}T. Elmas, S. Qadeer, and S. Tasiran. A calculus of atomic actions. In POPL, 2009.
[12]
}}D. Engler and K. Ashcraft. RacerX: Effective, static detection of race conditions and deadlocks. SIGOPS Oper. Syst. Rev., 37(5), 2003.
[13]
}}C. Flanagan and S. N. Freund. Atomizer: a dynamic atomicity checker for multithreaded programs. In POPL, 2004.
[14]
}}C. Flanagan and S. N. Freund. Fasttrack: efficient and precise dynamic race detection. In PLDI, 2009.
[15]
}}J. Gilchrist. PBZIP2: Parallel BZIP2 Data Compression Software. http://compression.ca/pbzip2/, 2009.
[16]
}}P. Godefroid and N. Nagappan. Concurrency at Microsoft - an exploratory survey. In Workshop on Exploiting Concurrency Efficiently and Correctly, 2008.
[17]
}}T. A. Henzinger, R. Jhala, and R. Majumdar. Race checking by context inference. In PLDI, 2004.
[18]
}}M. Herlihy. Wait-free synchronization. ACM Trans. Program. Lang. Syst., 13(1), 1991.
[19]
}}M. Hirzel and T. M. Chilimbi. Bursty tracing: A framework for low-overhead temporal profiling. In 4th ACM Workshop on Feedback-Directed and Dynamic Optimization, 2001.
[20]
}}D. R. Hower, P. Montesinos, L. Ceze, M. D. Hill, and J. Torrellas. Two hardware-based approaches for deterministic multi-processor replay. CACM, 2009.
[21]
}}V. Kahlon, Y. Yang, S. Sankaranarayanan, and A. Gupta. Fast and accurate static data-race detection for concurrent programs. In CAV, 2007.
[22]
}}T. J. LeBlanc and J. M. Mellor-Crummey. Debugging parallel programs with instant replay. IEEE Trans. Comput., 36(4), 1987.
[23]
}}B. Liblit, A. Aiken, A. X. Zheng, and M. I. Jordan. Bug isolation via remote program sampling. In PLDI, 2003.
[24]
}}B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan. Public deployment of Cooperative Bug Isolation. In RAMSS, 2004.
[25]
}}B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan. Scalable statistical bug isolation. In PLDI, 2005.
[26]
}}S. Lu, J. Tucek, F. Qin, and Y. Zhou. AVIO: Detecting atomicity violations via access-interleaving invariants. In ASPLOS, 2006.
[27]
}}S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from mistakes - a comprehensive study of real world concurrency bug characteristics. In ASPLOS, March 2008.
[28]
}}B. Lucia, J. Devietti, L. Ceze, and K. Strauss. Atom-aid: Detecting and surviving atomicity violations. IEEE Micro, 29 (1), 2009.
[29]
}}D. Marino, M. Musuvathi, and S. Narayanasamy. Effective sampling for lightweight data-race detection. In PLDI, 2009.
[30]
}}M. Musuvathi and S. Qadeer. Iterative context bounding for systematic testing of multithreaded programs. In PLDI, 2007.
[31]
}}S. Narayanasamy, Z. Wang, J. Tigani, A. Edwards, and B. Calder. Automatically classifying benign and harmful data races using replay analysis. In PLDI, 2007.
[32]
}}S. Park, S. Lu, and Y. Zhou. CTrigger: exposing atomicity violation bugs from their hiding places. In ASPLOS, 2009.
[33]
}}W. Pugh and N. Ayewah. Unit testing concurrent software. In ASE, 2007.
[34]
}}S. Rajamani, G. Ramalingam, V.-P. Ranganath, and K. Vaswani. Isolator: dynamically ensuring isolation in comcurrent programs. In ASPLOS, 2009.
[35]
}}P. Ratanaworabhan, M. Burtscher, D. Kirovski, B. Zorn, R. Nagpal, and K. Pattabiraman. Detecting and tolerating asymmetric races. In PPoPP, 2009.
[36]
}}S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A dynamic data race detector for multithreaded programs. ACM Transactions on Computer Systems, 15, 1997.
[37]
}}SecurityFocus. Software bug contributed to blackout. http://www.securityfocus.com/news/8016, Feb. 2004.
[38]
}}K. Sen. Race directed random testing of concurrent programs. In PLDI, 2008.
[39]
}}Y. Shi, S. Park, Z. Yin, S. Lu, Y. Zhou, W. Chen, and W. Zheng. Do I Use the Wrong Definition? DefUse: Definition-Use Invariants for Detecting Concurrency and Sequential Bugs. In OOPSLA, 2010.
[40]
}}A. Sussman and J. Trawick. Corrupt log lines at high volumes. https://issues.apache.org/bugzilla/show_bug.cgi?id=25520, 2003.
[41]
}}A. Thakur, R. Sen, B. Liblit, and S. Lu. Cooperative Crug Isolation. In WODA, 2009.
[42]
}}M. Vaziri, F. Tip, and J. Dolby. Associating synchronization constraints with data in an object-oriented language. In POPL, 2006.
[43]
}}M. T. Vechev, E. Yahav, and G. Yorsh. Abstraction-guided synthesis of synchronization. In POPL, 2010.
[44]
}}S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 programs: Characterization and methodological considerations. In ISCA, 1995.
[45]
}}J. Yu and S. Narayanasamy. A case for an interleaving constrained shared-memory multi-processor. In ISCA, 2009.
[46]
}}Y. Yu, T. Rodeheffer, and W. Chen. Racetrack: efficient detection of data race conditions via adaptive tracking. In SOSP, 2005.

Cited By

View all
  • (2022)Toward More Efficient Statistical Debugging with Abstraction RefinementACM Transactions on Software Engineering and Methodology10.1145/3544790Online publication date: 23-Jun-2022
  • (2022)Algorithmic Profiling for Real-World Complexity ProblemsIEEE Transactions on Software Engineering10.1109/TSE.2021.306765248:7(2680-2694)Online publication date: 1-Jul-2022
  • (2021)Postmortem accurate IR-level state recovery for deployed concurrent programsACM SIGAPP Applied Computing Review10.1145/3493499.349350221:3(33-48)Online publication date: 20-Oct-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 45, Issue 10
OOPSLA '10
October 2010
957 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1932682
Issue’s Table of Contents
  • cover image ACM Conferences
    OOPSLA '10: Proceedings of the ACM international conference on Object oriented programming systems languages and applications
    October 2010
    984 pages
    ISBN:9781450302036
    DOI:10.1145/1869459
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 October 2010
Published in SIGPLAN Volume 45, Issue 10

Check for updates

Author Tags

  1. bug isolation
  2. concurrency
  3. random sampling
  4. statistical debugging

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)22
  • Downloads (Last 6 weeks)2
Reflects downloads up to 14 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2022)Toward More Efficient Statistical Debugging with Abstraction RefinementACM Transactions on Software Engineering and Methodology10.1145/3544790Online publication date: 23-Jun-2022
  • (2022)Algorithmic Profiling for Real-World Complexity ProblemsIEEE Transactions on Software Engineering10.1109/TSE.2021.306765248:7(2680-2694)Online publication date: 1-Jul-2022
  • (2021)Postmortem accurate IR-level state recovery for deployed concurrent programsACM SIGAPP Applied Computing Review10.1145/3493499.349350221:3(33-48)Online publication date: 20-Oct-2021
  • (2020)In-The-Field Monitoring of Functional Calls: Is It Feasible?Journal of Systems and Software10.1016/j.jss.2020.110523(110523)Online publication date: Jan-2020
  • (2019)Field Monitoring With Delayed SavingIEEE Access10.1109/ACCESS.2019.29258557(85913-85924)Online publication date: 2019
  • (2019)AVPredictor: Comprehensive prediction and detection of atomicity violationsConcurrency and Computation: Practice and Experience10.1002/cpe.516031:15Online publication date: 20-Feb-2019
  • (2018)A systematic survey on automated concurrency bug detection, exposing, avoidance, and fixing techniquesSoftware Quality Journal10.1007/s11219-017-9385-326:3(855-889)Online publication date: 1-Sep-2018
  • (2017)In the field monitoring of interactive applicationsProceedings of the 39th International Conference on Software Engineering: New Ideas and Emerging Results Track10.1109/ICSE-NIER.2017.19(55-58)Online publication date: 20-May-2017
  • (2017)CoopREP: Cooperative record and replay of concurrency bugsSoftware Testing, Verification and Reliability10.1002/stvr.164528:1(e1645)Online publication date: 5-Sep-2017
  • (2016)PerfDoc: Automatic Performance Bug Diagnosis in Production Cloud Computing Infrastructures2016 IEEE Trustcom/BigDataSE/ISPA10.1109/TrustCom.2016.0126(683-690)Online publication date: Aug-2016
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media