research-article

Open access

Learning Abstraction Selection for Bayesian Program Analysis

Authors:

Xin ZhangAuthors Info & Claims

Proceedings of the ACM on Programming Languages, Volume 8, Issue OOPSLA1

Article No.: 128, Pages 954 - 982

https://doi.org/10.1145/3649845

Published: 29 April 2024 Publication History

Abstract

We propose a learning-based approach to select abstractions for Bayesian program analysis. Bayesian program analysis converts a program analysis into a Bayesian model by attaching probabilities to analysis rules. It computes probabilities of analysis results and can update them by learning from user feedback, test runs, and other information. Its abstraction heavily affects how well it learns from such information. There exists a long line of works in selecting abstractions for conventional program analysis but they are not effective for Bayesian program analysis. This is because they do not optimize for generalization ability. We propose a data-driven framework to solve this problem by learning from labeled programs. Starting from an abstraction, it decides how to change the abstraction based on analysis derivations. To be general, it considers graph properties of analysis derivations; to be effective, it considers the derivations before and after changing the abstraction. We demonstrate the effectiveness of our approach using a datarace analysis and a thread-escape analysis.

References

[1]

Pavol Bielik, Veselin Raychev, and Martin T. Vechev. 2017. Learning a Static Analyzer from Data. In Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I, Rupak Majumdar and Viktor Kuncak (Eds.) (Lecture Notes in Computer Science, Vol. 10426). Springer, 233–253. https://doi.org/10.1007/978-3-319-63387-9_12

[2]

Stephen M. Blackburn, Robin Garner, Chris Hoffmann, Asjad M. Khan, Kathryn S. McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z. Guyer, Martin Hirzel, Antony L. Hosking, Maria Jump, Han Bok Lee, J. Eliot B. Moss, Aashish Phansalkar, Darko Stefanovic, Thomas VanDrunen, Daniel von Dincklage, and Ben Wiedermann. 2006. The DaCapo benchmarks: java benchmarking development and analysis. In Proceedings of the 21th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2006, October 22-26, 2006, Portland, Oregon, USA, Peri L. Tarr and William R. Cook (Eds.). ACM, 169–190. https://doi.org/10.1145/1167473.1167488

Digital Library

[3]

Tianyi Chen, Kihong Heo, and Mukund Raghothaman. 2021. Boosting static analysis accuracy with instrumented test executions. In ESEC/FSE ’21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, August 23-28, 2021, Diomidis Spinellis, Georgios Gousios, Marsha Chechik, and Massimiliano Di Penta (Eds.). ACM, 1154–1165. https://doi.org/10.1145/3468264.3468626

Digital Library

[4]

Patrick Cousot. 1996. Abstract Interpretation. ACM Comput. Surv., 28, 2 (1996), 324–328. https://doi.org/10.1145/234528.234740

Digital Library

[5]

Tom Fawcett. 2006. An introduction to ROC analysis. Pattern Recognit. Lett., 27, 8 (2006), 861–874. https://doi.org/10.1016/j.patrec.2005.10.010

Digital Library

[6]

Radu Grigore and Hongseok Yang. 2016. Abstraction refinement guided by a learnt probabilistic model. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20 - 22, 2016, Rastislav Bodík and Rupak Majumdar (Eds.). ACM, 485–498. https://doi.org/10.1145/2837614.2837663

Digital Library

[7]

Behnaz Hassanshahi, Raghavendra Kagalavadi Ramesh, Padmanabhan Krishnan, Bernhard Scholz, and Yi Lu. 2017. An efficient tunable selective points-to analysis for large codebases. In Proceedings of the 6th ACM SIGPLAN International Workshop on State Of the Art in Program Analysis, SOAP@PLDI 2017, Barcelona, Spain, June 18, 2017, Karim Ali and Cristina Cifuentes (Eds.). ACM, 13–18. https://doi.org/10.1145/3088515.3088519

Digital Library

[8]

Jingxuan He, Gagandeep Singh, Markus Püschel, and Martin T. Vechev. 2020. Learning fast and precise numerical analysis. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020, Alastair F. Donaldson and Emina Torlak (Eds.). ACM, 1112–1127. https://doi.org/10.1145/3385412.3386016

Digital Library

[9]

Kihong Heo, Hakjoo Oh, and Hongseok Yang. 2016. Learning a Variable-Clustering Strategy for Octagon from Labeled Data Generated by a Static Analysis. In Static Analysis - 23rd International Symposium, SAS 2016, Edinburgh, UK, September 8-10, 2016, Proceedings, Xavier Rival (Ed.) (Lecture Notes in Computer Science, Vol. 9837). Springer, 237–256. https://doi.org/10.1007/978-3-662-53413-7_12

[10]

Kihong Heo, Hakjoo Oh, and Hongseok Yang. 2019. Resource-aware program analysis via online abstraction coarsening. In Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019, Joanne M. Atlee, Tevfik Bultan, and Jon Whittle (Eds.). IEEE / ACM, 94–104. https://doi.org/10.1109/ICSE.2019.00027

Digital Library

[11]

Kihong Heo, Hakjoo Oh, and Kwangkeun Yi. 2017. Machine-learning-guided selectively unsound static analysis. In Proceedings of the 39th International Conference on Software Engineering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017, Sebastián Uchitel, Alessandro Orso, and Martin P. Robillard (Eds.). IEEE / ACM, 519–529. https://doi.org/10.1109/ICSE.2017.54

Digital Library

[12]

Kihong Heo, Mukund Raghothaman, Xujie Si, and Mayur Naik. 2019. Continuously reasoning about programs using differential Bayesian inference. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, Phoenix, AZ, USA, June 22-26, 2019, Kathryn S. McKinley and Kathleen Fisher (Eds.). ACM, 561–575. https://doi.org/10.1145/3314221.3314616

Digital Library

[13]

Minseok Jeon, Sehun Jeong, Sung Deok Cha, and Hakjoo Oh. 2019. A Machine-Learning Algorithm with Disjunctive Model for Data-Driven Program Analysis. ACM Trans. Program. Lang. Syst., 41, 2 (2019), 13:1–13:41. https://doi.org/10.1145/3293607

Digital Library

[14]

Minseok Jeon, Sehun Jeong, and Hakjoo Oh. 2018. Precise and scalable points-to analysis via data-driven context tunneling. Proc. ACM Program. Lang., 2, OOPSLA (2018), 140:1–140:29. https://doi.org/10.1145/3276510

Digital Library

[15]

Minseok Jeon, Myungho Lee, and Hakjoo Oh. 2020. Learning graph-based heuristics for pointer analysis without handcrafting application-specific features. Proc. ACM Program. Lang., 4, OOPSLA (2020), 179:1–179:30. https://doi.org/10.1145/3428247

Digital Library

[16]

Minseok Jeon and Hakjoo Oh. 2022. Return of CFA: call-site sensitivity can be superior to object sensitivity even for object-oriented programs. Proc. ACM Program. Lang., 6, POPL (2022), 1–29. https://doi.org/10.1145/3498720

Digital Library

[17]

Sehun Jeong, Minseok Jeon, Sung Deok Cha, and Hakjoo Oh. 2017. Data-driven context-sensitivity for points-to analysis. Proc. ACM Program. Lang., 1, OOPSLA (2017), 100:1–100:28. https://doi.org/10.1145/3133924

Digital Library

[18]

George Kastrinis and Yannis Smaragdakis. 2013. Hybrid context-sensitivity for points-to analysis. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’13, Seattle, WA, USA, June 16-19, 2013, Hans-Juergen Boehm and Cormac Flanagan (Eds.). ACM, 423–434. https://doi.org/10.1145/2491956.2462191

Digital Library

[19]

Hyunsu Kim, Mukund Raghothaman, and Kihong Heo. 2022. Learning Probabilistic Models for Static Analysis Alarms. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. ACM, 1282–1293. https://doi.org/10.1145/3510003.3510098

Digital Library

[20]

Scott Kirkpatrick, D. Gelatt Jr., and Mario P. Vecchi. 1983. Optimization by Simmulated Annealing. Sci., 220, 4598 (1983), 671–680. https://doi.org/10.1126/science.220.4598.671

[21]

Daphne Koller and Nir Friedman. 2009. Probabilistic Graphical Models - Principles and Techniques. MIT Press. isbn:978-0-262-01319-2 https://dl.acm.org/doi/10.5555/1795555

[22]

Haofeng Li, Jie Lu, Haining Meng, Liqing Cao, Yongheng Huang, Lian Li, and Lin Gao. 2022. Generic sensitivity: customizing context-sensitive pointer analysis for generics. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, Singapore, Singapore, November 14-18, 2022, Abhik Roychoudhury, Cristian Cadar, and Miryung Kim (Eds.). ACM, 1110–1121. https://doi.org/10.1145/3540250.3549122

Digital Library

[23]

Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis. 2018. Precision-guided context sensitivity for pointer analysis. Proc. ACM Program. Lang., 2, OOPSLA (2018), 141:1–141:29. https://doi.org/10.1145/3276511

Digital Library

[24]

Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis. 2018. Scalability-first pointer analysis with self-tuning context-sensitivity. In Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, November 04-09, 2018, Gary T. Leavens, Alessandro Garcia, and Corina S. Pasareanu (Eds.). ACM, 129–140. https://doi.org/10.1145/3236024.3236041

Digital Library

[25]

Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis. 2020. A Principled Approach to Selective Context Sensitivity for Pointer Analysis. ACM Trans. Program. Lang. Syst., 42, 2 (2020), 10:1–10:40. https://doi.org/10.1145/3381915

Digital Library

[26]

Percy Liang and Mayur Naik. 2011. Scaling abstraction refinement via pruning. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA, June 4-8, 2011, Mary W. Hall and David A. Padua (Eds.). ACM, 590–601. https://doi.org/10.1145/1993498.1993567

Digital Library

[27]

Percy Liang, Omer Tripp, and Mayur Naik. 2011. Learning minimal abstractions. In Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, Austin, TX, USA, January 26-28, 2011, Thomas Ball and Mooly Sagiv (Eds.). ACM, 31–42. https://doi.org/10.1145/1926385.1926391

Digital Library

[28]

Jingbo Lu and Jingling Xue. 2019. Precision-preserving yet fast object-sensitive pointer analysis with partial context sensitivity. Proc. ACM Program. Lang., 3, OOPSLA (2019), 148:1–148:29. https://doi.org/10.1145/3360574

Digital Library

[29]

Ravi Mangal, Xin Zhang, Aditya V. Nori, and Mayur Naik. 2015. A user-guided approach to program analysis. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, Bergamo, Italy, August 30 - September 4, 2015, Elisabetta Di Nitto, Mark Harman, and Patrick Heymans (Eds.). ACM, 462–473. https://doi.org/10.1145/2786805.2786851

Digital Library

[30]

Ana L. Milanova, Atanas Rountev, and Barbara G. Ryder. 2005. Parameterized object sensitivity for points-to analysis for Java. ACM Trans. Softw. Eng. Methodol., 14, 1 (2005), 1–41. https://doi.org/10.1145/1044834.1044835

Digital Library

[31]

Kevin P. Murphy, Yair Weiss, and Michael I. Jordan. 1999. Loopy Belief Propagation for Approximate Inference: An Empirical Study. In UAI ’99: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, July 30 - August 1, 1999, Kathryn B. Laskey and Henri Prade (Eds.). Morgan Kaufmann, 467–475. https://dl.acm.org/doi/10.5555/2073796.2073849

[32]

Mayur Naik, Alex Aiken, and John Whaley. 2006. Effective static race detection for Java. In Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, Ottawa, Ontario, Canada, June 11-14, 2006, Michael I. Schwartzbach and Thomas Ball (Eds.). ACM, 308–319. https://doi.org/10.1145/1133981.1134018

Digital Library

[33]

Mayur Naik, Hongseok Yang, Ghila Castelnuovo, and Mooly Sagiv. 2012. Abstractions from tests. In Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2012, Philadelphia, Pennsylvania, USA, January 22-28, 2012, John Field and Michael Hicks (Eds.). ACM, 373–386. https://doi.org/10.1145/2103656.2103701

Digital Library

[34]

Hakjoo Oh, Wonchan Lee, Kihong Heo, Hongseok Yang, and Kwangkeun Yi. 2014. Selective context-sensitivity guided by impact pre-analysis. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, Edinburgh, United Kingdom - June 09 - 11, 2014, Michael F. P. O’Boyle and Keshav Pingali (Eds.). ACM, 475–484. https://doi.org/10.1145/2594291.2594318

Digital Library

[35]

Hakjoo Oh, Hongseok Yang, and Kwangkeun Yi. 2015. Learning a strategy for adapting a program analysis via bayesian optimisation. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, part of SPLASH 2015, Pittsburgh, PA, USA, October 25-30, 2015, Jonathan Aldrich and Patrick Eugster (Eds.). ACM, 572–588. https://doi.org/10.1145/2814270.2814309

Digital Library

[36]

Hila Peleg, Sharon Shoham, and Eran Yahav. 2016. D^3 : Data-Driven Disjunctive Abstraction. In Verification, Model Checking, and Abstract Interpretation - 17th International Conference, VMCAI 2016, St. Petersburg, FL, USA, January 17-19, 2016. Proceedings, Barbara Jobstmann and K. Rustan M. Leino (Eds.) (Lecture Notes in Computer Science, Vol. 9583). Springer, 185–205. https://doi.org/10.1007/978-3-662-49122-5_9

Digital Library

[37]

Mukund Raghothaman, Sulekha Kulkarni, Kihong Heo, and Mayur Naik. 2018. User-guided program reasoning using Bayesian inference. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018, Jeffrey S. Foster and Dan Grossman (Eds.). ACM, 722–735. https://doi.org/10.1145/3192366.3192417

Digital Library

[38]

Gagandeep Singh, Markus Püschel, and Martin T. Vechev. 2018. Fast Numerical Program Analysis with Reinforcement Learning. In Computer Aided Verification - 30th International Conference, CAV 2018, Held as Part of the Federated Logic Conference, FloC 2018, Oxford, UK, July 14-17, 2018, Proceedings, Part I, Hana Chockler and Georg Weissenbacher (Eds.) (Lecture Notes in Computer Science, Vol. 10981). Springer, 211–229. https://doi.org/10.1007/978-3-319-96145-3_12

[39]

Yannis Smaragdakis, George Kastrinis, and George Balatsouras. 2014. Introspective analysis: context-sensitivity, across the board. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, Edinburgh, United Kingdom - June 09 - 11, 2014, Michael F. P. O’Boyle and Keshav Pingali (Eds.). ACM, 485–495. https://doi.org/10.1145/2594291.2594320

Digital Library

[40]

Tian Tan, Yue Li, Xiaoxing Ma, Chang Xu, and Yannis Smaragdakis. 2021. Making pointer analysis more precise by unleashing the power of selective context sensitivity. Proc. ACM Program. Lang., 5, OOPSLA (2021), 1–27. https://doi.org/10.1145/3485524

Digital Library

[41]

Tian Tan, Yue Li, and Jingling Xue. 2016. Making k-Object-Sensitive Pointer Analysis More Precise with Still k-Limiting. In Static Analysis - 23rd International Symposium, SAS 2016, Edinburgh, UK, September 8-10, 2016, Proceedings, Xavier Rival (Ed.) (Lecture Notes in Computer Science, Vol. 9837). Springer, 489–510. https://doi.org/10.1007/978-3-662-53413-7_24

[42]

Tian Tan, Yue Li, and Jingling Xue. 2017. Efficient and precise points-to analysis: modeling the heap by merging equivalent automata. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017, Albert Cohen and Martin T. Vechev (Eds.). ACM, 278–291. https://doi.org/10.1145/3062341.3062360

Digital Library

[43]

Shiyi Wei and Barbara G. Ryder. 2015. Adaptive Context-sensitive Analysis for JavaScript. In 29th European Conference on Object-Oriented Programming, ECOOP 2015, July 5-10, 2015, Prague, Czech Republic, John Tang Boyland (Ed.) (LIPIcs, Vol. 37). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 712–734. https://doi.org/10.4230/LIPIcs.ECOOP.2015.712

[44]

Xin Zhang, Radu Grigore, Xujie Si, and Mayur Naik. 2017. Effective interactive resolution of static analysis alarms. Proc. ACM Program. Lang., 1, OOPSLA (2017), 57:1–57:30. https://doi.org/10.1145/3133881

Digital Library

[45]

Xin Zhang, Ravi Mangal, Radu Grigore, Mayur Naik, and Hongseok Yang. 2014. On abstraction refinement for program analyses in Datalog. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, Edinburgh, United Kingdom - June 09 - 11, 2014, Michael F. P. O’Boyle and Keshav Pingali (Eds.). ACM, 239–248. https://doi.org/10.1145/2594291.2594327

Digital Library

[46]

Xin Zhang, Mayur Naik, and Hongseok Yang. 2013. Finding optimum abstractions in parametric dataflow analysis. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’13, Seattle, WA, USA, June 16-19, 2013, Hans-Juergen Boehm and Cormac Flanagan (Eds.). ACM, 365–376. https://doi.org/10.1145/2491956.2462185

Digital Library

[47]

Yifan Zhang, Yuanfeng Shi, and Xin Zhang. 2024. Learning Abstraction Selection for Bayesian Program Analysis (Paper Artifact). https://doi.org/10.5281/zenodo.10897277

Cited By

Wang ZYang LChen MBu YLi ZWang QQin SYi XYin JFilkov VRay BZhou M(2024)Parf: Adaptive Parameter Refining for Abstract InterpretationProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering10.1145/3691620.3695487(1082-1093)Online publication date: 27-Oct-2024
https://dl.acm.org/doi/10.1145/3691620.3695487

Index Terms

Learning Abstraction Selection for Bayesian Program Analysis
1. Software and its engineering
  1. Software organization and properties
    1. Software functional properties
      1. Formal methods
        Automated static analysis

Recommendations

User-guided program reasoning using Bayesian inference
PLDI '18

Program analyses necessarily make approximations that often lead them to report true alarms interspersed with many false alarms. We propose a new approach to leverage user feedback to guide program analyses towards true alarms and away from false ...
User-guided program reasoning using Bayesian inference
PLDI 2018: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation

Program analyses necessarily make approximations that often lead them to report true alarms interspersed with many false alarms. We propose a new approach to leverage user feedback to guide program analyses towards true alarms and away from false ...
Discrete Bayesian Network Classifiers: A Survey

We have had to wait over 30 years since the naive Bayes model was first introduced in 1960 for the so-called Bayesian network classifiers to resurge. Based on Bayesian networks, these classifiers have many strengths, like model interpretability, ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages

Proceedings of the ACM on Programming Languages Volume 8, Issue OOPSLA1

April 2024

1492 pages

EISSN:2475-1421

DOI:10.1145/3554316

Editor:
Michael Hicks
Amazon, USA

Issue’s Table of Contents

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 April 2024

Published in PACMPL Volume 8, Issue OOPSLA1

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Badges

Author Tags

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
246
Total Downloads

Downloads (Last 12 months)246
Downloads (Last 6 weeks)60

Reflects downloads up to 20 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Wang ZYang LChen MBu YLi ZWang QQin SYi XYin JFilkov VRay BZhou M(2024)Parf: Adaptive Parameter Refining for Abstract InterpretationProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering10.1145/3691620.3695487(1082-1093)Online publication date: 27-Oct-2024
https://dl.acm.org/doi/10.1145/3691620.3695487

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents