research-article

AI: a lightweight system for tolerating concurrency bugs

Authors:

Mingxing Zhang,

Weimin ZhengAuthors Info & Claims

FSE 2014: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering

Pages 330 - 340

https://doi.org/10.1145/2635868.2635885

Published: 11 November 2014 Publication History

Abstract

Concurrency bugs are notoriously difficult to eradicate during software testing because of their non-deterministic nature. Moreover, fixing concurrency bugs is time-consuming and error-prone. Thus, tolerating concurrency bugs during production runs is an attractive complementary approach to bug detection and testing. Unfortunately, existing bug-tolerating tools are usually either 1) constrained in types of bugs they can handle or 2) requiring roll-back mechanism, which can hitherto not be fully achieved efficiently without hardware supports. This paper presents a novel program invariant, called Anticipating Invariant (AI), which can help anticipate bugs before any irreversible changes are made. Benefiting from this ability of anticipating bugs beforehand, our software-only system is able to forestall the failures with a simple thread stalling technique, which does not rely on execution roll-back and hence has good performance Experiments with 35 real-world concurrency bugs demonstrate that AI is capable of detecting and tolerating most types of concurrency bugs, including both atomicity and order violations. Two new bugs have been detected and confirmed by the corresponding developers. Performance evaluation with 6 representative parallel programs shows that AI incurs negligible overhead (<1%) for many nontrivial desktop and server applications.

References

[1]

MySQL. Bug report time to close stats. http://bugs.mysql.com/bugstats.php.

[2]

Mysql bugs: Statistics. http://bugs.mysql.com/bugstats.php.

[3]

S. Burckhardt, P. Kothari, M. Musuvathi, and S. Nagarakatte. A randomized scheduler with probabilistic guarantees of finding bugs. ASPLOS ’10.

Digital Library

[4]

M. D. Ernst, J. H. Perkins, P. J. Guo, S. McCamant, C. Pacheco, M. S. Tschantz, and C. Xiao. The daikon system for dynamic detection of likely invariants. Sci. Comput. Program., 69(1-3), Dec. 2007.

Digital Library

[5]

Z. Gu, E. T. Barr, D. J. Hamilton, and Z. Su. Has the bug really been fixed? ICSE ’10.

Digital Library

[6]

G. Jin, A. Thakur, B. Liblit, and S. Lu. Instrumentation and sampling strategies for cooperative concurrency bug isolation. OOPSLA ’10.

Digital Library

[7]

S. Kundu, M. K. Ganai, and C. Wang. Contessa: Concurrency testing augmented with symbolic analysis. CAV’10.

Digital Library

[8]

R. H. A.-D. S. L. Lanyue Lu, Andrea C. Arpaci-Dusseau. A study of linux file system evolution. FAST ’04.

Digital Library

[9]

C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. CGO ’’04.

Digital Library

[10]

C. E. Leiserson, R. L. Rivest, C. Stein, and T. H. Cormen. Introduction to algorithms. The MIT press, 2001.

[11]

S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. ASPLOS ’08.

Digital Library

[12]

S. Lu, J. Tucek, F. Qin, and Y. Zhou. AVIO: detecting atomicity violations via access interleaving invariants. ASPLOS ’06, 2006.

Digital Library

[13]

B. Lucia and L. Ceze. Cooperative empirical failure avoidance for multithreaded programs. ASPLOS ’13.

Digital Library

[14]

B. Lucia and L. Ceze. Finding concurrency bugs with context-aware communication graphs. MICRO 42, 2009.

Digital Library

[15]

B. Lucia, L. Ceze, and K. Strauss. ColorSafe: architectural support for debugging and dynamically avoiding multi-variable atomicity violations. ISCA ’10, 2010.

Digital Library

[16]

B. Lucia, J. Devietti, K. Strauss, and L. Ceze. Atom-Aid: Detecting and surviving atomicity violations. ISCA ’08, 2008.

Digital Library

[17]

D. Marino, M. Musuvathi, and S. Narayanasamy. LiteRace: effective sampling for lightweight data-race detection. PLDI ’09.

Digital Library

[18]

M. Musuvathi and S. Qadeer. Iterative context bounding for systematic testing of multithreaded programs. PLDI ’07, 2007.

Digital Library

[19]

M. Musuvathi, S. Qadeer, T. Ball, G. Basler, P. Nainar, and I. Neamtiu. Finding and reproducing heisenbugs in concurrent programs. OSDI ’08.

Digital Library

[20]

A. Muzahid, N. Otsuki, and J. Torrellas. AtomTracker: A comprehensive approach to atomic region inference and violation detection. MICRO 43, 2010.

Digital Library

[21]

S. Nagarakatte, S. Burckhardt, M. M. Martin, and M. Musuvathi. Multicore acceleration of priority-based schedulers for concurrency bug detection. PLDI ’12, 2012.

Digital Library

[22]

C.-S. Park and K. Sen. Randomized active atomicity violation detection in concurrent programs. SIGSOFT ’08/FSE-16, 2008.

Digital Library

[23]

S. Park, S. Lu, and Y. Zhou. CTrigger: exposing atomicity violation bugs from their hiding places. ASPLOS ’09, 2009.

Digital Library

[24]

S. Park, R. W. Vuduc, and M. J. Harrold. Falcon: fault localization in concurrent programs. ICSE ’10.

Digital Library

[25]

K. Sen. Race directed random testing of concurrent programs. PLDI ’08.

Digital Library

[26]

K. Serebryany, A. Potapenko, T. Iskhodzhanov, and D. Vyukov. Dynamic race detection with llvm compiler. In Runtime Verification, pages 110–114. Springer, 2012.

Digital Library

[27]

Y. Shi, S. Park, Z. Yin, S. Lu, Y. Zhou, W. Chen, and W. Zheng. Do I use the wrong definition?: Defuse: definition-use invariants for detecting concurrency and sequential bugs. OOPSLA ’10, 2010.

Digital Library

[28]

Y. Smaragdakis, J. Evans, C. Sadowski, J. Yi, and C. Flanagan. Sound predictive race detection in polynomial time. POPL ’12.

Digital Library

[29]

K. Veeraraghavan, P. M. Chen, J. Flinn, and S. Narayanasamy. Detecting and surviving data races using complementary schedules. SOSP ’11.

Digital Library

[30]

H. Volos, A. Tack, M. Swift, and S. Lu. Applying transactional memory to concurrency bugs. ASPLOS ’12.

Digital Library

[31]

C. Wang, M. Said, and A. Gupta. Coverage guided systematic concurrency testing. ICSE ’11.

Digital Library

[32]

S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 programs: characterization and methodological considerations. ISCA ’95, 1995.

Digital Library

[33]

J. Wu, H. Cui, and J. Yang. Bypassing races in live applications with execution filters. OSDI ’10, pages 1–13, 2010.

Digital Library

[34]

J. Yu and S. Narayanasamy. A case for an interleaving constrained shared-memory multi-processor. ISCA ’09, 2009.

Digital Library

[35]

J. Yu and S. Narayanasamy. Tolerating concurrency bugs using transactions as lifeguards. MICRO 43, 2010.

Digital Library

[36]

W. Zhang, M. de Kruijf, A. Li, S. Lu, and K. Sankaralingam. ConAir: Featherweight concurrency bug recovery via single-threaded idempotent execution. ASPLOS ’13.

Digital Library

[37]

W. Zhang, C. Sun, and S. Lu. ConMem: detecting severe concurrency bugs through an effect-oriented approach. ASPLOS ’10, 2010. Introduction Motivation Our New Approach Contributions Anticipating Invariant Definition Case Studies Atomicity Violation Order Violation Rationales Implementation Overview Training & Tolerating Custom Instrumentation Strategy Experimental Evaluation Test Platform, Applications, and Bugs Detecting and Anticipating Capability AVIO DUI CCI PSet Discussion New Bugs Performance Sufficient Training Related Work Conclusion Acknowledgments References

Digital Library

Cited By

Chen ZHua YZhang YDing LFalsafi BFerdman MLu SWenisch T(2022)Efficiently detecting concurrency bugs in persistent memory programsProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507755(873-887)Online publication date: 28-Feb-2022
https://dl.acm.org/doi/10.1145/3503222.3507755
Chen YWang SLu SSankaralingam K(2019)Applying Transactional Memory for Concurrency-Bug Failure Recovery in Production RunsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2018.287765630:5(990-1006)Online publication date: 1-May-2019
https://doi.org/10.1109/TPDS.2018.2877656
Yu ZZuo YZhao Y(2019)Convoider: A Concurrency Bug Avoider Based on Transparent Software Transactional MemoryInternational Journal of Parallel Programming10.1007/s10766-019-00642-1Online publication date: 12-Sep-2019
https://doi.org/10.1007/s10766-019-00642-1
Show More Cited By

Index Terms

AI: a lightweight system for tolerating concurrency bugs

Recommendations

Grail: context-aware fixing of concurrency bugs
FSE 2014: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering

Writing efficient synchronization for multithreaded programs is notoriously hard. The resulting code often contains subtle concurrency bugs. Even worse, many bug fixes introduce new bugs. A classic example, seen widely in practice, is deadlocks ...
Tolerating Concurrency Bugs Using Transactions as Lifeguards
MICRO '43: Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture

Parallel programming is hard, because it is impractical to test all possible thread interleavings. One promising approach to improve a multi-threaded program’s reliability is to constrain a production run’s thread interleavings in such a way that ...
A Lightweight System for Detecting and Tolerating Concurrency Bugs

Along with the prevalence of multi-threaded programs, concurrency bugs have become one of the most important sources of software bugs. Even worse, due to the non-deterministic nature of concurrency bugs, these bugs are both difficult to detect and fix ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

FSE 2014: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering

November 2014

856 pages

ISBN:9781450330565

DOI:10.1145/2635868

General Chair:
Shing-Chi Cheung
Hong Kong University of Science and Technology, China
,
Program Chairs:
Alessandro Orso
Georgia Institute of Technology, USA
,
Margaret-Anne Storey
University of Victoria, Canada

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 November 2014

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

SIGSOFT/FSE'14

Sponsor:

SIGSOFT

SIGSOFT/FSE'14: 22nd ACM SIGSOFT Symposium on the Foundations of Software Engineering

November 16 - 21, 2014

Hong Kong, China

Acceptance Rates

Overall Acceptance Rate 17 of 128 submissions, 13%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

11
Total Citations
View Citations
630
Total Downloads

Downloads (Last 12 months)19
Downloads (Last 6 weeks)2

Reflects downloads up to 03 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Chen ZHua YZhang YDing LFalsafi BFerdman MLu SWenisch T(2022)Efficiently detecting concurrency bugs in persistent memory programsProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507755(873-887)Online publication date: 28-Feb-2022
https://dl.acm.org/doi/10.1145/3503222.3507755
Chen YWang SLu SSankaralingam K(2019)Applying Transactional Memory for Concurrency-Bug Failure Recovery in Production RunsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2018.287765630:5(990-1006)Online publication date: 1-May-2019
https://doi.org/10.1109/TPDS.2018.2877656
Yu ZZuo YZhao Y(2019)Convoider: A Concurrency Bug Avoider Based on Transparent Software Transactional MemoryInternational Journal of Parallel Programming10.1007/s10766-019-00642-1Online publication date: 12-Sep-2019
https://doi.org/10.1007/s10766-019-00642-1
Chen DJiang YXu CMa XLu JLeavens GGarcia APăsăreanu C(2018)Testing multithreaded programs via thread speed controlProceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3236024.3236077(15-25)Online publication date: 26-Oct-2018
https://dl.acm.org/doi/10.1145/3236024.3236077
Wang JDou WGao YGao CQin FYin KWei JRosu GDi Penta MNguyen T(2017)A comprehensive study on real world concurrency bugs in Node.jsProceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering10.5555/3155562.3155628(520-531)Online publication date: 30-Oct-2017
https://dl.acm.org/doi/10.5555/3155562.3155628
Jahic JKuhn TJung MWehn N(2017)Supervised testing of concurrent software in embedded systems2017 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)10.1109/SAMOS.2017.8344633(233-238)Online publication date: Jul-2017
https://doi.org/10.1109/SAMOS.2017.8344633
Adamsen CMøller AKarim RSridharan MTip FSen KUchitel SOrso ARobillard M(2017)Repairing event race errors by controlling nondeterminismProceedings of the 39th International Conference on Software Engineering10.1109/ICSE.2017.34(289-299)Online publication date: 20-May-2017
https://dl.acm.org/doi/10.1109/ICSE.2017.34
Wang JDou WGao YGao CQin FYin KWei J(2017)A comprehensive study on real world concurrency bugs in Node.js2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)10.1109/ASE.2017.8115663(520-531)Online publication date: Oct-2017
https://doi.org/10.1109/ASE.2017.8115663
Zhang MWu YLu SQi SRen JZheng W(2016)A Lightweight System for Detecting and Tolerating Concurrency BugsIEEE Transactions on Software Engineering10.1109/TSE.2016.253166642:10(899-917)Online publication date: 1-Oct-2016
https://dl.acm.org/doi/10.1109/TSE.2016.2531666
Tchamgoue GKim KJun Y(2016)EventHealerJournal of Systems and Software10.1016/j.jss.2016.02.051118:C(208-220)Online publication date: 1-Aug-2016
https://dl.acm.org/doi/10.1016/j.jss.2016.02.051
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten