Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2814270.2814307acmconferencesArticle/Chapter ViewAbstractPublication PagessplashConference Proceedingsconference-collections
research-article

Giga-scale exhaustive points-to analysis for Java in under a minute

Published: 23 October 2015 Publication History

Abstract

Computing a precise points-to analysis for very large Java programs remains challenging despite the large body of research on points-to analysis. Any approach must solve an underlying dynamic graph reachability problem, for which the best algorithms have near-cubic worst-case runtime complexity, and, hence, previous work does not scale to programs with millions of lines of code. In this work, we present a novel approach for solving the field-sensitive points-to problem for Java with the means of (1) a transitive-closure data-structure, and (2) a pre-computed set of potentially matching load/store pairs to accelerate the fix-point calculation. Experimentation on Java benchmarks validates the superior performance of our approach over the standard context-free language reachability implementations. Our approach computes a points-to index for the OpenJDK with over 1.5 billion tuples in under a minute.

Supplementary Material

Auxiliary Archive (p535-dietrich-s.zip)
Source code and datasets used in our experimental evaluation. This is a modified version of the artefact submitted for evaluation by the review committee. Proprietary software and datasets have been removed.

References

[1]
R. Agrawal, A. Borgida, and H. V. Jagadish. Efficient management of transitive relationships in large data and knowledge bases. In Proceedings SIGMOD’89. ACM, 1989.
[2]
L. O. Andersen. Program analysis and specialization for the C programming language. PhD thesis, University of Copenhagen, 1994.
[3]
V. Arlazarov, E. Dinic, M. Kronrod, and I. Faradzev. On economic construction of the transitive closure of a directed graph. Soviet Math. Dokl., 11:1209–1210, 1970.
[4]
S. M. Blackburn, R. Garner, C. Hoffmann, A. M. Khang, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanovi´c, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The dacapo benchmarks: Java benchmarking development and analysis. In Proceedings OOPSLA’06. ACM, 2006.
[5]
E. Bodden, A. Sewe, J. Sinschek, M. Mezini, and H. Oueslati. Taming reflection: Aiding static analysis in the presence of reflection and custom class loaders. In Proceeding ICSE ’11. ACM, 2011.
[6]
R. Chatterjee, B. G. Ryder, and W. A. Landi. Relevant context inference. In Proceedings POPL’99. ACM, 1999.
[7]
S. Chaudhuri. Subcubic algorithms for recursive state machines. In Proceedings POPL’08. ACM, 2008.
[8]
E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. SIAM Journal on Computing, 32(5):1338–1355, 2003.
[9]
A. Colantonio and R. Di Pietro. Concise: Compressed ncomposable integer set. Information Processing Letters, 110(16): 644–650, 2010.
[10]
D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation, 9 (3):251–280, 1990.
[11]
F. L. Gall. Powers of tensors and fast matrix multiplication. arXiv preprint arXiv:1401.7714, 2014.
[12]
D. Grove and C. Chambers. A framework for call graph construction algorithms. ACM Transactions on Programming Languages and Systems (TOPLAS), 23(6):685–746, 2001.
[13]
H. Jagadish. A compression technique to materialize transitive closure. ACM Transactions on Database Systems (TODS), 15 (4):558–598, 1990.
[14]
R. Jin, Y. Xiang, N. Ruan, and H. Wang. Efficiently answering reachability queries on very large directed graphs. In Proceedings SIGMOD’2008. ACM, 2008.
[15]
R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-hop: a highcompression indexing scheme for reachability query. In Proceedings SIGMOD’2009. ACM, 2009.
[16]
O. Lhoták. Spark: A flexible points-to analysis framework for java, 2002.
[17]
O. Lhoták and L. Hendren. Evaluating the benefits of context-sensitive points-to analysis using a bdd-based implementation. ACM Transactions on Software Engineering and Methodology (TOSEM), 18(1):3, 2008.
[18]
V. B. Livshits and M. S. Lam. Finding security vulnerabilities in java applications with static analysis. In Proceedings USENIX’05, pages 18–18, Berkeley, CA, USA, 2005. USENIX Association.
[19]
D. Melski and T. Reps. Interconvertibility of a class of set constraints and context-free-language reachability. Theoretical Computer Science, 248(1):29–98, 2000.
[20]
A. Milanova, A. Rountev, and B. G. Ryder. Parameterized object sensitivity for points-to and side-effect analyses for java. In ACM SIGSOFT Software Engineering Notes, volume 27, pages 1–11. ACM, 2002.
[21]
E. Nuutila. Efficient transitive closure computation in large digraphs. PhD thesis, PhD thesis, Helsinki University of Technology, 1995. Acta Polytechnica Scandinavica, Mathematics and Computing in Engineering Series, 1995.
[22]
D. J. Pearce, P. H. J. Kelly, and C. Hankin. Online cycle detection and difference propagation: Applications to pointer analysis. volume 12, pages 311–337. Kluwer, Dec. 2004.
[23]
T. Reps. Program analysis via graph reachability. Information and Software Technology, 40(11):701–726, 1998.
[24]
B. G. Ryder. Dimensions of precision in reference analysis of object-oriented programming languages. In Proceedings CC’03. Springer, 2003.
[25]
L. Shang, X. Xie, and J. Xue. On-demand dynamic summarybased points-to analysis. In Proceedings CGO’12. ACM, 2012.
[26]
M. Sharir and A. Pnueli. Two approaches to interprocedural data flow analysis. New York University, 1978.
[27]
O. Shivers. Control-flow analysis of higher-order languages. PhD thesis, Carnegie Mellon University Pittsburgh, 1991.
[28]
Y. Smaragdakis and M. Bravenboer. Using datalog for fast and easy program analysis. In O. de Moor, G. Gottlob, T. Furche, and A. Sellers, editors, Datalog Reloaded, volume 6702 of LNCS, pages 245–251. Springer Berlin Heidelberg, 2011. ISBN 978-3-642-24205-2.
[29]
Y. Smaragdakis, M. Bravenboer, and O. Lhoták. Pick your contexts well: understanding object-sensitivity. ACM SIGPLAN Notices, 46(1):17–30, 2011.
[30]
Y. Smaragdakis, G. Kastrinis, and G. Balatsouras. Introspective analysis: context-sensitivity, across the board. In Proceedings PLDI’2014. ACM, 2014.
[31]
M. Sridharan and R. Bod´ık. Refinement-based contextsensitive points-to analysis for java. In Proceedings PLDI’06. ACM, 2006.
[32]
M. Sridharan and S. J. Fink. The complexity of Andersens analysis in practice. In Static Analysis, pages 205–221. Springer, 2009.
[33]
M. Sridharan, D. Gopan, L. Shan, and R. Bod´ık. Demanddriven points-to analysis for java. In Proceedings OOPSLA’05. ACM, 2005.
[34]
B. Steensgaard. Points-to analysis in almost linear time. In Proceedings POPL ’96. ACM, 1996.
[35]
V. Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13(4):354–356, 1969.
[36]
R. Tarjan. Depth-first search and linear graph algorithms. SIAM journal on computing, 1(2):146–160, 1972.
[37]
E. Tempero. How fields are used in Java: An empirical study. In Proceedings ASWEC’09. IEEE, 2009.
[38]
F. Tip and J. Palsberg. Scalable propagation-based call graph construction algorithms. In Proceedings OOPSLA’00. ACM, 2000.
[39]
R. Vallée-Rai, P. Co, E. Gagnon, L. Hendren, P. Lam, and V. Sundaresan. Soot - a java bytecode optimization framework. In Proceedings CASCON ’99. IBM Press, 1999.
[40]
S. J. van Schaik and O. de Moor. A memory efficient reachability data structure through bit vector compression. In Proceedings SIGMOD’11. ACM, 2011.
[41]
J. Whaley and M. S. Lam. An efficient inclusion-based pointsto analysis for strictly-typed languages. In Static Analysis, pages 180–195. Springer, 2002.
[42]
K. Wu, E. J. Otoo, and A. Shoshani. Optimizing bitmap indices with efficient compression. ACM Transactions on Database Systems (TODS), 31(1):1–38, 2006.
[43]
G. Xu and A. Rountev. Merging equivalent contexts for scalable heap-cloning-based context-sensitive points-to analysis. In Proceedings ISSTA ’08. ACM, 2008.
[44]
G. Xu, A. Rountev, and M. Sridharan. Scaling cflreachability-based points-to analysis using context-sensitive must-not-alias analysis. In Proceedings ECOOP’09. Springer, 2009.
[45]
D. Yan, G. Xu, and A. Rountev. Demand-driven contextsensitive alias analysis for java. In Proceedings ISSTA’11. ACM, 2011.
[46]
M. Yannakakis. Graph-theoretic methods in database theory. In Proceedings PODS’90. ACM, 1990.
[47]
H. Yildirim, V. Chaoji, and M. J. Zaki. Grail: Scalable reachability index for large graphs. Proceedings of the VLDB Endowment, 3(1-2):276–284, 2010.
[48]
Q. Zhang, M. R. Lyu, H. Yuan, and Z. Su. Fast algorithms for dyck-cfl-reachability with applications to alias analysis. In Proceedings PLDI’13. ACM, 2013.

Cited By

View all
  • (2024)The ART of Sharing Points-to Analysis: Reusing Points-to Analysis Results Safely and EfficientlyProceedings of the ACM on Programming Languages10.1145/36898038:OOPSLA2(2606-2632)Online publication date: 8-Oct-2024
  • (2022)Indexing the extended Dyck-CFL reachability for context-sensitive program analysisProceedings of the ACM on Programming Languages10.1145/35633396:OOPSLA2(1438-1468)Online publication date: 31-Oct-2022
  • (2021)A Survey of Parametric Static AnalysisACM Computing Surveys10.1145/346445754:7(1-37)Online publication date: 18-Jul-2021
  • Show More Cited By

Index Terms

  1. Giga-scale exhaustive points-to analysis for Java in under a minute

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications
    October 2015
    953 pages
    ISBN:9781450336895
    DOI:10.1145/2814270
    • cover image ACM SIGPLAN Notices
      ACM SIGPLAN Notices  Volume 50, Issue 10
      OOPSLA '15
      October 2015
      953 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/2858965
      • Editor:
      • Andy Gill
      Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 October 2015

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Context-free Language
    2. Java
    3. Points-to Analysis
    4. Transitive Closure

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    SPLASH '15
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 268 of 1,244 submissions, 22%

    Upcoming Conference

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)17
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 14 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)The ART of Sharing Points-to Analysis: Reusing Points-to Analysis Results Safely and EfficientlyProceedings of the ACM on Programming Languages10.1145/36898038:OOPSLA2(2606-2632)Online publication date: 8-Oct-2024
    • (2022)Indexing the extended Dyck-CFL reachability for context-sensitive program analysisProceedings of the ACM on Programming Languages10.1145/35633396:OOPSLA2(1438-1468)Online publication date: 31-Oct-2022
    • (2021)A Survey of Parametric Static AnalysisACM Computing Surveys10.1145/346445754:7(1-37)Online publication date: 18-Jul-2021
    • (2021)[Engineering] eNYPD—Entry Points Detector Jakarta Server Faces Use Case2021 IEEE 21st International Working Conference on Source Code Analysis and Manipulation (SCAM)10.1109/SCAM52516.2021.00013(30-35)Online publication date: Sep-2021
    • (2021)Caught in the Web: DoS Vulnerabilities in Parsers for Structured DataComputer Security – ESORICS 202110.1007/978-3-030-88418-5_4(67-85)Online publication date: 30-Sep-2021
    • (2019)PYEACM Transactions on Programming Languages and Systems10.1145/333779441:3(1-37)Online publication date: 2-Jul-2019
    • (2019)Rethinking Incremental and Parallel Pointer AnalysisACM Transactions on Programming Languages and Systems10.1145/329360641:1(1-31)Online publication date: 1-Mar-2019
    • (2018)Automatic index selection for large-scale datalog computationProceedings of the VLDB Endowment10.14778/3282495.328250012:2(141-153)Online publication date: 1-Oct-2018
    • (2018)SDPA: An Optimizer for Program Analysis of Data-Parallel Applications2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)10.1109/HPCC/SmartCity/DSS.2018.00034(14-21)Online publication date: Jun-2018
    • (2018)Combining range and inequality information for pointer disambiguationScience of Computer Programming10.1016/j.scico.2017.10.014152:C(161-184)Online publication date: 15-Jan-2018
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media