Analysis of pointers and structures

DR Chase, M Wegman, FK Zadeck - ACM SIGPLAN Notices, 1990 - dl.acm.org
High-level languages could be optimized significantly if compilers could determine automatically how pointers and heap-allocated structures are used. Better knowledge of aliasing can improve classical optimizations applied to scalars (common sub-expression elimination, loop-invariant code motion, reduction in strength, constant propagation) by permitting less conservative assumptions about what is affected by an update to storage, and can aid in dependence analysis for purposes of parallelization. In addition, information about the shape and use of linked data structures can be used to apply storage overwriting and allocation optimizations (for instance, reusing storage instead of making a copy).

This problem is a complex one, in part because it is possible to construct unbounded data structures that must necessarily be represented in some finite way. As with almost all program analysis and optimization problems, one must limit the kinds of information one tries to collect, because exact information is generally undecidable or at least very difficult to compute. Our work follows that of Jones and Muchnick [JM81], who summarize the data structures allocated in a heap by making a graph in which one node corresponds to possibly many nodes in the heap. The major issue is how to choose which heap cells to associate with which nodes. We view the program as a generator for data structures. Each symbolic execution of X ← cons(A, X) adds a new node to the data structure. As the data structure grows, it must be compressed by making one node stand for many.

We differ from other works principally in the kind of information we use to do the compression. Most work to date [Sch75a, Sch75b, JM81, Rug87, RM88, LH88, Lar89, HPR89] does this compression by bounding acyclic path length in the modeled data structures; this is known as k-bounded approximation. They limit the length of (acyclic) paths to k by truncating long paths with summary nodes containing all paths occurring in the original. The (path-length) k-bounded approaches have several flaws: they are potentially very slow (unless k is very small), unbounded structures lose all structure beyond depth k, and information provided by the program structure is ignored.
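
To make the k-bounded idea concrete, here is a minimal Python sketch. It is not the authors' algorithm or any of the cited ones; it is a simplified illustration that assumes a single root variable and measures path length by breadth-first depth from that root. The names `cons_chain`, `k_bound`, and `SUMMARY` are hypothetical, introduced only for this example.

```python
from collections import deque

def cons_chain(n):
    """Model n symbolic executions of X <- cons(A, X).

    Returns (root, edges), where edges maps each cell name to the set of
    cells reachable through its cdr field. Hypothetical helper.
    """
    edges = {}
    root = None  # None stands for nil
    for i in range(n):
        cell = f"c{i}"
        edges[cell] = {root} if root is not None else set()
        root = cell  # X now points at the newly allocated cell
    return root, edges

def k_bound(root, edges, k):
    """Merge every cell more than k edges from root into one summary node.

    Simplifying assumption: one root, depth measured by BFS.
    """
    if root is None:
        return {}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for succ in edges[node]:
            if succ not in depth:
                depth[succ] = depth[node] + 1
                queue.append(succ)

    def rename(node):
        return node if depth[node] <= k else "SUMMARY"

    bounded = {}
    for node, succs in edges.items():
        if node not in depth:
            continue  # unreachable from root, not modeled
        bounded.setdefault(rename(node), set()).update(rename(s) for s in succs)
    return bounded

root, edges = cons_chain(5)
print(k_bound(root, edges, 2))
```

With k = 2, a list of 5 cells and a list of 100 cells both collapse to the same four-node graph: three concrete cells followed by a summary node that points to itself. The self-loop on the summary node is where the shape information is lost, since the graph can no longer distinguish an acyclic list from a cyclic one beyond depth k.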