Kayrebt::Callgraphs

The Callgrapher is a GCC plugin that extracts function calls from a codebase. It works one compilation unit at a time and exports the calls as a database representing an adjacency list. It is currently limited to static calls: it tracks neither calls through function pointers nor calls to GCC builtin functions.

Sources

The sources of the callgrapher are available on the Inria forge.

Here is the direct link to the git repository (anonymous read-only access):
https://scm.gforge.inria.fr/anonscm/git/kayrebt/Callgraphs.git

There is a mirror of this code on GitHub where bug reports and patches can be submitted.

Compiling

The requirements to build the callgrapher are:

  • g++ version 4.8 exactly,
  • SQLite3.

The callgrapher is packaged with the autotools. If you download a tarball, the build process is (usually) as simple as:
./configure
make
make install

If you have sufficient privileges, make install will install a symbolic link to the compiled plugin in GCC's plugin directory.

Running

To run the callgrapher, you need a compilable codebase. Theoretically, any language supported by a GCC frontend could be handled, but the code is only tested with C codebases. Expect weird results due to name mangling and virtual calls, or even crashes, if you try it on other languages.

It is necessary to pass a few special options to GCC to make it load and use the plugin. The following is generally sufficient:
gcc-4.8 -fplugin=kayrebt_callgraphs -fplugin-arg-kayrebt_callgraphs-dbfile=db.sqlite file.c

Callgraphs can be output in two forms: either in a database or in the dump files generated by GCC. To export the callgraph as an SQLite database, prepare a file with the proper table definitions, like so:
cat <<EOF | sqlite3 callgraph.db
CREATE TABLE functions (Id INTEGER PRIMARY KEY, Name TEXT, File TEXT, Line INTEGER, Global INTEGER);
CREATE TABLE calls (Caller INTEGER, Callee INTEGER, PRIMARY KEY (Caller,Callee));
EOF
The functions table stores each function identified in the codebase with the following information: its name, the file and line where it is declared, and whether it is a globally visible function or a function only callable in its compilation unit (a static function in C terminology). This allows a correct identification of all functions, with no duplicates, as long as only C codebases are analyzed. The calls table contains pairs of function identifiers and represents the call relation between functions.
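
As an illustration of how the two tables fit together, here is a minimal Python sketch using the standard sqlite3 module. It builds the schema in an in-memory database, fills it with made-up rows modelled on a small two-function program, and lists the callees of a given function. The sample rows, the helper name callees_of, and the convention that Global is 1 for a globally visible function and 0 for a static one are assumptions made for the example; a real database is filled by the plugin.

```python
import sqlite3

SCHEMA = """
CREATE TABLE functions (Id INTEGER PRIMARY KEY, Name TEXT, File TEXT,
                        Line INTEGER, Global INTEGER);
CREATE TABLE calls (Caller INTEGER, Callee INTEGER,
                    PRIMARY KEY (Caller, Callee));
"""


def callees_of(conn, name):
    """Return (name, file, line) for every function called by `name`."""
    query = """
        SELECT callee.Name, callee.File, callee.Line
        FROM calls
        JOIN functions AS caller ON caller.Id = calls.Caller
        JOIN functions AS callee ON callee.Id = calls.Callee
        WHERE caller.Name = ?
        ORDER BY callee.Name
    """
    return conn.execute(query, (name,)).fetchall()


# Hypothetical sample data (assumption: Global = 0 marks a static function).
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO functions VALUES (?, ?, ?, ?, ?)",
    [(1, "f", "subdir/1.h", 3, 0), (2, "main", "subdir/1.c", 7, 1)],
)
conn.execute("INSERT INTO calls VALUES (?, ?)", (2, 1))  # main calls f
conn.commit()

print(callees_of(conn, "main"))
```

Note that the lookup is by name only, so two static functions with the same name in different files would both match; filtering on File as well would disambiguate them.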

If you do not wish to install or use SQLite3, it is still possible to extract the information from GCC dump files. Since there is no option to activate dumps for a single plugin (at least, I have not found one), you have to activate the dumps for all passes, so this method quickly becomes quite messy.
gcc-4.8 -fdump-tree-all-all -fopt-info-all-all file.c
This will produce a file with a name similar to file.c.246t.findcalls, whose content looks like this:
;; Function f (f, funcdef_no=0, decl_uid=2176, cgraph_uid=0)

;; f (defined in subdir/1.h, l.3) (static) calls no function (other than builtins)
f ()
{
  :
  __builtin_puts (&"Hello world!"[0]);
  return;
}

;; Function main (main, funcdef_no=1, decl_uid=2180, cgraph_uid=1)

;; main (defined in subdir/1.c, l.7) calls the following functions:
;;   f (defined in subdir/1.h, l.3) (static)
main ()
{
  void (*) () fn;
  int D.2184;

  :
  f ();
  fn = g;
  f ();
  fn ();
  D.2184 = 0;

  :
  return D.2184;
}
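
The ";;" comment lines in such a dump carry the same information as the database, so they can be collected with a small script. The following Python sketch is one possible way to do it, assuming the comment format is exactly the one shown in the sample above; the function name parse_findcalls is hypothetical, and names containing characters other than letters, digits and underscores, or file names containing commas, are not handled.

```python
import re

# Matches e.g. "main (defined in subdir/1.c, l.7)", optionally followed by
# " (static)" and by " calls" when the entry introduces a caller.
ENTRY = re.compile(
    r"(?P<name>\w+) \(defined in (?P<file>[^,]+), l\.(?P<line>\d+)\)"
    r"(?P<static> \(static\))?"
    r"(?P<caller> calls)?"
)


def parse_findcalls(text):
    """Map each caller (name, file, line, is_static) to its list of callees."""
    graph = {}
    current = None
    for m in ENTRY.finditer(text):
        key = (m["name"], m["file"], int(m["line"]), m["static"] is not None)
        if m["caller"]:
            # "X (defined in ...) calls ...": start a new adjacency list.
            current = graph.setdefault(key, [])
        elif current is not None:
            # Any other entry is a callee of the most recent caller.
            current.append(key)
    return graph


# Comment lines copied from the sample dump; the function bodies are omitted.
sample = """\
;; f (defined in subdir/1.h, l.3) (static) calls no function (other than builtins)
;; main (defined in subdir/1.c, l.7) calls the following functions:
;; f (defined in subdir/1.h, l.3) (static)
"""

for caller, callees in parse_findcalls(sample).items():
    print(caller[0], "->", [c[0] for c in callees])
```

In practice the text would be read from each .findcalls file produced by the compilation, and the per-file graphs merged afterwards.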

Limitations

This plugin slows compilation down noticeably, probably because of the use of the SQLite database. In the tests we ran on the Linux kernel, compilation took several hours instead of the usual twenty to thirty minutes. Dumping millions of lines into a SQLite database is not really clever, but we cannot dump directly to a CSV file since we sometimes need to query the database before inserting a new function, in order to disambiguate or deduplicate functions. It would be possible to dump to another SQL database such as PostgreSQL, but the relational model is not really the best way to store graphs anyway. We chose SQLite because of its really simple and easy-to-use C++ binding, and because it is light and easy to install and use.
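
The lookup-before-insert step mentioned above can be pictured with the following Python sketch. It is not the plugin's actual code (the plugin is written in C++); the helper names function_id and record_call, and the rule used to decide that two entries denote the same function (same name, file, line and visibility), are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE functions (Id INTEGER PRIMARY KEY, Name TEXT, File TEXT,
                        Line INTEGER, Global INTEGER);
CREATE TABLE calls (Caller INTEGER, Callee INTEGER,
                    PRIMARY KEY (Caller, Callee));
"""


def function_id(conn, name, file, line, is_global):
    """Return the Id of a function, inserting a row only if it is new."""
    row = conn.execute(
        "SELECT Id FROM functions "
        "WHERE Name = ? AND File = ? AND Line = ? AND Global = ?",
        (name, file, line, int(is_global)),
    ).fetchone()
    if row is not None:
        return row[0]  # already known: reuse the existing identifier
    cur = conn.execute(
        "INSERT INTO functions (Name, File, Line, Global) VALUES (?, ?, ?, ?)",
        (name, file, line, int(is_global)),
    )
    return cur.lastrowid


def record_call(conn, caller_id, callee_id):
    """Store one edge; repeated calls between the same pair are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO calls VALUES (?, ?)", (caller_id, callee_id)
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
main_id = function_id(conn, "main", "subdir/1.c", 7, True)
f_id = function_id(conn, "f", "subdir/1.h", 3, False)
record_call(conn, main_id, f_id)
record_call(conn, main_id, f_id)  # second call from main to f: no new row
same_id = function_id(conn, "f", "subdir/1.h", 3, False)  # seen again later
print(f_id == same_id)
```

Every function encountered costs at least one SELECT in this scheme, which gives an idea of why an append-only CSV dump cannot simply replace the database.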

To perform heavy processing, we recommend dumping the database as a CSV file and importing it into a graph database such as Neo4j. In a future release, we may use a C++ binding to export the callgraphs directly from the plugin to a Neo4j database.
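
One way to produce such CSV files is sketched below in Python, using only the standard sqlite3 and csv modules. The function name export_table and the sample rows are hypothetical, and the sketch stops at plain CSV with a header row: whatever header format the graph database's import tool expects is not covered here.

```python
import csv
import io
import sqlite3


def export_table(conn, table, out):
    """Write every row of `table` to the file-like object `out` as CSV."""
    # Table names cannot be bound as SQL parameters, so only accept the
    # two tables of the callgraph schema.
    if table not in ("functions", "calls"):
        raise ValueError("unexpected table name: " + table)
    cur = conn.execute("SELECT * FROM " + table)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur)


# Hypothetical sample database; in practice, open the file filled by the plugin.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE functions (Id INTEGER PRIMARY KEY, Name TEXT, File TEXT,
                        Line INTEGER, Global INTEGER);
CREATE TABLE calls (Caller INTEGER, Callee INTEGER,
                    PRIMARY KEY (Caller, Callee));
INSERT INTO functions VALUES (1, 'f', 'subdir/1.h', 3, 0);
INSERT INTO functions VALUES (2, 'main', 'subdir/1.c', 7, 1);
INSERT INTO calls VALUES (2, 1);
""")

buf = io.StringIO()
export_table(conn, "calls", buf)
print(buf.getvalue())
```

To write to disk instead of a string buffer, pass a file opened with open("calls.csv", "w", newline="") as the out argument.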

API documentation

The source code of the callgrapher will be documented soon with Doxygen and the documentation will be available here.