Survey of Alias Analysis
Qiang Wu


Part I: Fundamentals of Alias Analysis

This part severs as an introduction and overview of alias analysis:

  • Different alias representations in literature.
  • Motivation and application of alias analysis.
  • Intra-procedural alias analysis.
  • Overview of alias analysis in current practice and research activities.

Representations of Alias

  • What is alias?
    • Two variables referring to the same memory location.
    • For example, in C, if p = &i, then <*p, i> is an alias pair.
  • Representations of alias relation in literature.
    • Complete alias pairs: store all alias pairs explicitly, such as in [Landi et al.'92].
    • Compact alias pairs: store only basic alias pairs, and derive new alias pairs by dereference, transitivity and commutativity, such as in [Choi et al.'93].
    • Points-to relations: points-to pairs to indicate one variable is pointing to anothe, such as in [Emmai'93]
  • An example: "q=&p; p=&i; r=&i;"
    • Complete: <*q,p> <*p,i> <*r,i> <**q,p> <**q,i> <*p,*r> <**q,*r>
    • Compact: <*q,p> <*p,i> <*r,i>
    • Points-to: (q,p) (p,i) (r,i)
  • Must-alias and maybe-alias

Why Alias Analysis?

  • Alias analysis are needed for
    • More accurate memory dependence analysis and data flow analysis.
    • More aggressive optimization and scheduling. Without alias analysis, data flow analysis, optimization and scheduling have to be conservative.
  • An example:
    • Without alias information: no optimization and wide-issue scheduling for   
           r1 = M[A];
           M[B] = r2;
           r3 = M[A];
           r5 = r3 +r4;
    • With alias information: assume A and B are shown not alias, now can eliminate redundant load/store and schedule code
           r1 = M[A];    M[B] = r2;
           r5 = r1 + r4;
  • To get alias information, the intra-procedural alias analysis ......

Intra-procedural Alias Analysis

  • Iterative data flow analysis framework
    • Like standard data flow analysis, such as reaching definition analysis, see figure below.
    • Gen(N) = set of points-to pairs generated by Node N.
    • Kill(N) = set of points-to pairs killed by Node N .
    • Forward order: Out(N) = Gen(N) +(In(N) - Kill(N));

  • A simple example:
    • Control flow graph:
    • Iteration 1:
    • Iteration 2:
  • A more complicated example here with complex language features ......

Challenges to Alias Analysis

  • Function calls
    • will formal parameters and global variables be changed?
  • Function pointers
    • how to resolve function pointers?
  • Aggregate data structure
    • how to handle struct and union?
  • Heap objects
    • how to include heap object?
  • Recursive data structure
    • will it lead to infinite number of alias?
  • Pointer arithmetic
    • will points-to relations change due to pointer arithmetic?
  • Type casting
    • what to do if a variable is type-casted?

Overview of Alias Analysis

  • Research started in late 1970s.
  • Moderate alias analysis implemented in commercial compilers, like gcc and DEC
  • Aggressive alias analysis still in research stage.
  • Classifications of existing alias analysis:
    • Intra-procedural: isolated functions only.
    • Inter-procedural: consider function calls and their interactions;
    • Type-based techniques: use type information to decide alias.
    • Flow-based techniques: consider alias generated by flow of statement.
  • A view by graph:

Part II: Inter-procedural Alias Analysis - a survey

This part surveys several recent inter-procedural alias analysis algorithms in literature:

  • Inter-procedural frameworks
  • Other advanced issues
    • Abstract data representations
    • Recursive and aggregate data structure
    • Heap objects
    • Function pointers and recursive function calls

Inter-procedural Frameworks

Algorithms are grouped into 4 groups according to their inter-procedural frameworks:

  • Group #1: Inter-procedural control flow graph (ICFG).
    • [Landi et al.'92]
    • [Landi et al.'93]
  • Group #2: Invocation graph (IG).
    • [Emami'93, Emami et al'94]
    • [Wilson et al'95]
  • Group #3: Procedural call graph (PCG)
    • [Choi et al'93]
    • [Burke et al'95]
  • Group #4: Context-sensitive PCG
    • [Ryder et al'99]
    • [Cheng et al'00]
An example here will be used to illustrate different inter-procedural frameworks.

Group #1: ICFG

  • Inter-procedural control flow graph [Landi et al'92 93].
    • Connection of isolated control flow graphs.
    • Call node and Return node added.
    • Entry node and Exit node added (see figure below).

  • Problems:
    • Poor efficiency and scalebility - ICFG can be very large.
    • inaccuracy: context insensitivity - can not distinguish different calls of the same procedure (e.g. foo() can not distinguish call sites C1 and C2, therefore a spurious points-to pair (b,e) is generated at call site C2).
  • How to get context sensitivity?
    • one solution: Invocation Graph

Group #2: Invocation graph (IG)

  • Invocation graph (IG) [Emami '93]
    • Fully context sensitive.
    • Each call sites - a node in IG.
    • Each function call - a unique path from main() in IG (see figure below).

  • Algorithm outline
    • Starting from main(), recursively process each function call.
    • Using map and unmap processor for formal-actual match.
    • Processing callee function body for each procedure call (IG node).
    • A sample algorithm to process calls in IG.

  • Problems:
    • Number of IG nodes can increase exponentially - memory storage requirement!
    • Processing function body for each IG node - large amount of computation!
  • Can we use memoization?
    • one solution: partial transfer function.

Partial Transfer Function (PTF)

  • Partial transfer function [Wilson et al'95]
    • Use memoization.
    • funcOutput = PTF(funcInput).
    • PTF computed once for an "input pattern".
    • Lazily compute PTFs for each procedure.
  • An example to illustrate PTF

Group #3: Procedure Call Graph (PCG)

  • Procedure call graph [Choi et al'93]
    • Each procedure - a PCG node.
    • Each call site - an edge in PCG.

  • Iterative data flow analysis framework
    • Similar to Group #1, but decoupled.
    • Decoupled: two levels of iteration
      • Outer level: PCG
      • Inner level: CFG in each procedure
  • Effects of a procedure call
    • Summarized in two sets of points-to pairs
      • foo.Entry - set at the entry point of procedure foo()
      • foo.Exit - set at the exit point of procedure foo()
    • Backward_binding foo.Exit from callee to caller.
    • Cumulative updating foo.Entry.

  • A sample algorithm using PCG.
  • Problems:
    • Context insensitive.
    • Efficiency and scalebility
  • Do we really care about flow order?
    • One way to improve efficiency and scalebility: flow-insensitive[Burke et al'95].
  • Flow-insensitive approach

Group #4: PCG with Context Sensitivity

  • Using procedure call graph (PCG) in a context sensitive way [Ryder et al'99,Cheng et al'00].
  • Idea: context independent summary function (CISF):
    • CISF: effects of a procedure with unknown initial alias at the entry.
    • PTF: transfer function for a particular input pattern at the entry.
  • Algorithm outline
    • Context sensitive: only input at the current call site involved in actual-formal binding (see figure below).
    • Effective: for each procedure, only one CISF need to be computed and stored (as compared with a number of PTFs)

  • A sample algorithm using CISF.
  • Computing CISF:
  • Put all together: compare algorithms in Group #2,#3 and #4

Other Advanced Issues

  • Abstract data representations
  • Recursive and aggregate data structure
  • Heap objects
  • Function pointers and recursive function calls

Abstract Data Representations

  • Invisible variables [Landi et al'92, Emami et al'94]
    • Variables not "visible" to a procedure, such as its caller's local variables.
    • Abstractly represented by "invisible variables" and map-information
  • An example to illustrate invisible variables
  • Extended variable [Wilson et al'95]
    • Global variables are also represented abstractly.
    • For fewer input patterns and partial transfer functions.
  • Access path (AP) [Landi et al'92, Deutch'94, Cheng et al'00]
    • Access path is defined as a symbolic notation to represent memory location.
        AP : variable
              | AP *
              | AP.field_name
        e.g:   p*.left*.right*.data
    • AP is used for abstract data representation [Cheng et al'00]
  • Extended access path (EAP) [Cheng et al'00]
    • Access path extended from formal parameters or global variables.
    • Only those with non-empty EAP need to be propagated back to caller. for example:
            (temp, malloc_1*) => EAP(malloc_1*) = empty.
            (x*, malloc_1*) => EAP(malloc_1*) = x**.

Heap Objects

  • Nameless: [Emami et al'94]
    • All heap objects are represented by one "heap".
  • Context-insensitive naming: [Landi et al'92]
    • Head objects are named by the creating statement.
  • Context-sensitive naming: [Choi et al'93, Wilson et al'95]
    • Head objects are named by the creating statement and the call path.
  • An example:
    • Source code here
    • Results:
      • Points-to pairs with context-insensitive naming:
           (p, S1) (q,S1)
      • Points-to pairs with context-sensitive naming:
           (p,C1_S1) (q,C2_S1)

Recursive Data Structures

  • Problem: potentially infinite number of aliases
  • Approaches classified by depth of indirection:
    • 1-level: combine all elements as one big cell [Emami'93,Wilson et al'95]
    • k-limiting: distinguish first k-elements (k is an arbitrary number) [Landi et al'92, Cheng et al'00]
    • beyond k-limiting: consider all elements, using some kinds symbolic access paths [Deutch'94]
    • shape analysis: not only distinguish all elements, but also tell how they are connected (connection analysis) and what their shape is.[Ghiya et al'95]

Function Pointers

  • A deadlock?
    • Need call graph to resolve function pointers.
    • Need resolving function pointers to build the call graph.
  • Solution:

Recursive Function Calls

  • A problem for IG approach
  • Solution: [Emami '93, Wilson et al'95]
    • foo_Recursive
    • foo_Approximate
    • pending list in foo_Recursive

Algorithm Summary

  • Classified using context and flow sensitivity
  • Comparison by a table

Part III. Assembly Level Alias Analysis

Traditional alias analysis are conducted at the level of source codes or high level IRs (including those in Part II). This part will look at assembly level alias analysis:

  • Why assembly level?
  • Previous work at assembly level.
  • Alias analysis for Liberty Xcode.

Why Assembly Level?

  • Avoid inaccuracy in alias information due to code transformation.
  • Reduce cost of maintaining alias information.
  • Apply to wider range of language.
  • High level code may not be available:
    • Binary translation and optimization.
    • Run-time optimization.
    • Assembly applications.
  • Information about the entire program/library.

Previous Work at Assembly/Binary Level

  • [Debray et al'98]
    • Binary level, inter-procedural, flow-sensitive, context insensitive.
  • Algorithm outline:
    • ICFG, iterative data flow analysis framework.
    • mod-k residues address representation.
    • Address descriptor (AD): set of possible addresses for each register
      • AD(r) = < I, M >
      • I: definition instruction.
      • M: offset, which is a set of mod-k residues
    • Iteration framework: Out[N](r) = Evaluate(In[N])
    • Sufficient condition for not-an-alias
      • at N1: address1 = < I, M1 >
      • at N2: address1 = < I, M2 >
      • condition is:
        • I dominates both N1 and N2
        • either N1 dominate N2, or vice verse.
        • M1 intersect M2 = empty
    • An example: to show N1 and N2 are not dependent:

Summary of Assembly level Analysis

  • No type information available.
  • More complicated pointer arithmetics.
  • Still a frontier area.
  • Algorithm in [Debray et al'98] is relatively efficient but inaccurate
    • tested for gcc with 137,389 ld/st instructions.
    • 30 percent has known mod-k residues address.
  • Can we do better?

Alias Analysis for Liberty Xcode

Part IV. References

  • [Burke et al'95]: M.Burke, P.Carini,J.Choi and M.Hind, "Flow-insensitive inter-procedural alias analysis in the presence of pointers", In Lecture Notes in Computer Science, 892, pp.234-250, Springier-Verlag, 1995.
  • [Cheng et al'00] B.Cheng and W.Hwu, "Modular Interprocedural pointer analysis using access paths: design implementation and evaluation,", PLDI 2000, Vancouver, British Columbia, Canada.
  • [Choi et al'93]: J.Choi, M.Burke and P.Carini, "Efficient flow-sensitive inter-procedural computation of pointer-induced aliases and side effects,", In Proceedings of the 20th Annual ACM Symposium on Principles of Programming Languages, pp.233-245, January 1993.
  • [Deutsch'94]:A. Deutsch, "Interprocedural may-alias analysis for pointers: Beyond k-limiting,", In Proceeding of the ACM SIGPLAN'94 Conference on Programing language Design and Implementation, pp.230-241, June 1994.
  • [Emami'93]: M. Emami, "A practical inter-procedural alias analysis for an optimizing/paralleling C compiler", Master thesis, School of Computer Science, McGill University, August 1993.
  • [Emami et al'94]: M.Emami, R.Ghiya and L.Hendren, "Context-sensitive inter-procedural points-to analysis in the presence of function pointers", in Proceedings of ACM SIGPLAN'94 Conference on Programming Language Design and Implementation, pp. 242-256, June 1994.
  • [Landi et al'92]: W.Landi and B.Ryder, "A safe approximate algorithm for inter-procedural pointer aliasing". In Proceeding of the ACM SIGPLAN'92 Conference on Programing language Design and Implementation, pp.235-248, June 1992.
  • [Ryder et al'99] R.Chatterjee, R.Ryder and W.Landi, "Relevant context inference,", In Proceedings of the ACM Symposium on Principles of Programming Languages, pp.133-146, January 1999.
  • [Wilson et al'95] R.Wilson and M.Lam, "Efficient context-sensitive pointer analysis for C programs", In Proceeding of the ACM SIGPLAN'95 Conference on Programing language Design and Implementation, pp.1-12, June 1995.
  • More ......