README.txt 6.6 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139
  1. //===----------------------------------------------------------------------===//
  2. // Clang Static Analyzer
  3. //===----------------------------------------------------------------------===//
  4. = Library Structure =
  5. The analyzer library has two layers: a (low-level) static analysis
  6. engine (GRExprEngine.cpp and friends), and some static checkers
  7. (*Checker.cpp). The latter are built on top of the former via the
  8. Checker and CheckerVisitor interfaces (Checker.h and
  9. CheckerVisitor.h). The Checker interface is designed to be minimal
  10. and simple for checker writers, and attempts to isolate them from much
  11. of the gore of the internal analysis engine.
  12. = How It Works =
  13. The analyzer is inspired by several foundational research papers ([1],
  14. [2]). (FIXME: kremenek to add more links)
  15. In a nutshell, the analyzer is basically a source code simulator that
  16. traces out possible paths of execution. The state of the program
  17. (values of variables and expressions) is encapsulated by the state
  18. (ProgramState). A location in the program is called a program point
  19. (ProgramPoint), and the combination of state and program point is a
  20. node in an exploded graph (ExplodedGraph). The term "exploded" comes
  21. from exploding the control-flow edges in the control-flow graph (CFG).
  22. Conceptually the analyzer does a reachability analysis through the
  23. ExplodedGraph. We start at a root node, which has the entry program
  24. point and initial state, and then simulate transitions by analyzing
  25. individual expressions. The analysis of an expression can cause the
  26. state to change, resulting in a new node in the ExplodedGraph with an
  27. updated program point and an updated state. A bug is found by hitting
  28. a node that satisfies some "bug condition" (basically a violation of a
  29. checking invariant).
  30. The analyzer traces out multiple paths by reasoning about branches and
  31. then bifurcating the state: on the true branch the conditions of the
  32. branch are assumed to be true and on the false branch the conditions
  33. of the branch are assumed to be false. Such "assumptions" create
  34. constraints on the values of the program, and those constraints are
  35. recorded in the ProgramState object (and are manipulated by the
  36. ConstraintManager). If assuming the conditions of a branch would
  37. cause the constraints to be unsatisfiable, the branch is considered
  38. infeasible and that path is not taken. This is how we get
  39. path-sensitivity. We reduce exponential blow-up by caching nodes. If
  40. a new node with the same state and program point as an existing node
  41. would get generated, the path "caches out" and we simply reuse the
  42. existing node. Thus the ExplodedGraph is not a DAG; it can contain
  43. cycles as paths loop back onto each other and cache out.
  44. ProgramState and ExplodedNodes are basically immutable once created. Once
  45. one creates a ProgramState, you need to create a new one to get a new
  46. ProgramState. This immutability is key since the ExplodedGraph represents
  47. the behavior of the analyzed program from the entry point. To
  48. represent these efficiently, we use functional data structures (e.g.,
  49. ImmutableMaps) which share data between instances.
  50. Finally, individual Checkers work by also manipulating the analysis
  51. state. The analyzer engine talks to them via a visitor interface.
  52. For example, the PreVisitCallExpr() method is called by GRExprEngine
  53. to tell the Checker that we are about to analyze a CallExpr, and the
  54. checker is asked to check for any preconditions that might not be
  55. satisfied. The checker can do nothing, or it can generate a new
  56. ProgramState and ExplodedNode which contains updated checker state. If it
  57. finds a bug, it can tell the BugReporter object about the bug,
  58. providing it an ExplodedNode which is the last node in the path that
  59. triggered the problem.
  60. = Notes about C++ =
  61. Since now constructors are seen before the variable that is constructed
  62. in the CFG, we create a temporary object as the destination region that
  63. is constructed into. See ExprEngine::VisitCXXConstructExpr().
  64. In ExprEngine::processCallExit(), we always bind the object region to the
  65. evaluated CXXConstructExpr. Then in VisitDeclStmt(), we compute the
  66. corresponding lazy compound value if the variable is not a reference, and
  67. bind the variable region to the lazy compound value. If the variable
  68. is a reference, just use the object region as the initializer value.
  69. Before entering a C++ method (or ctor/dtor), the 'this' region is bound
  70. to the object region. In ctors, we synthesize 'this' region with
  71. CXXRecordDecl*, which means we do not use type qualifiers. In methods, we
  72. synthesize 'this' region with CXXMethodDecl*, which has getThisType()
  73. taking type qualifiers into account. It does not matter we use qualified
  74. 'this' region in one method and unqualified 'this' region in another
  75. method, because we only need to ensure the 'this' region is consistent
  76. when we synthesize it and create it directly from CXXThisExpr in a single
  77. method call.
  78. = Working on the Analyzer =
  79. If you are interested in bringing up support for C++ expressions, the
  80. best place to look is the visitation logic in GRExprEngine, which
  81. handles the simulation of individual expressions. There are plenty of
  82. examples there of how other expressions are handled.
  83. If you are interested in writing checkers, look at the Checker and
  84. CheckerVisitor interfaces (Checker.h and CheckerVisitor.h). Also look
  85. at the files named *Checker.cpp for examples on how you can implement
  86. these interfaces.
  87. = Debugging the Analyzer =
  88. There are some useful command-line options for debugging. For example:
  89. $ clang -cc1 -help | grep analyze
  90. -analyze-function <value>
  91. -analyzer-display-progress
  92. -analyzer-viz-egraph-graphviz
  93. ...
  94. The first allows you to specify only analyzing a specific function.
  95. The second prints to the console what function is being analyzed. The
  96. third generates a graphviz dot file of the ExplodedGraph. This is
  97. extremely useful when debugging the analyzer and viewing the
  98. simulation results.
  99. Of course, viewing the CFG (Control-Flow Graph) is also useful:
  100. $ clang -cc1 -help | grep cfg
  101. -cfg-add-implicit-dtors Add C++ implicit destructors to CFGs for all analyses
  102. -cfg-add-initializers Add C++ initializers to CFGs for all analyses
  103. -cfg-dump Display Control-Flow Graphs
  104. -cfg-view View Control-Flow Graphs using GraphViz
  105. -unoptimized-cfg Generate unoptimized CFGs for all analyses
  106. -cfg-dump dumps a textual representation of the CFG to the console,
  107. and -cfg-view creates a GraphViz representation.
  108. = References =
  109. [1] Precise interprocedural dataflow analysis via graph reachability,
  110. T Reps, S Horwitz, and M Sagiv, POPL '95,
  111. http://portal.acm.org/citation.cfm?id=199462
  112. [2] A memory model for static analysis of C programs, Z Xu, T
  113. Kremenek, and J Zhang, http://lcs.ios.ac.cn/~xzx/memmodel.pdf