Kayrebt::Extractor

The extractor is a plugin for GCC-4.8. It is inserted in the compilation process and adds a pass specifically to dump the control-flow graph representation that GCC creates and maintains for each function it compiles.

The position at which the pass is inserted has varied during the development of the extractor. As of version 4, it is placed after the optimized pass, when all high-level optimizations have already been applied and GCC is about to lower the internal representation of the code into register transfer language (RTL). In the next release, the position of the extractor will be configurable, though not arbitrarily: the extractor relies on two structures that must be available, the SSA form and the control-flow graphs. It therefore has to be placed after these structures are constructed and before they are released.

Sources

The sources of the extractor are available on the Inria forge.

Here is the direct link to the git repository (anonymous read-only access):
https://scm.gforge.inria.fr/anonscm/git/kayrebt/kayrebt.git.

Compiling

The requirements to build the extractor are:

  • Kayrebt::Lib::ActivityDiagrams
  • g++, the GCC compiler for C++, version 4.8 exactly. The executable may be called g++-4.8 or g++-4.8.5, for instance, or simply g++ if 4.8 is your default version
  • the GCC plugin headers; you can verify whether you have them with ls $(gcc-4.8 -print-file-name=plugin)/include
  • the yaml-cpp library
  • the sqlite3 library
  • the pkg-config utility
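
The checks in the list above can be scripted. The following is a minimal sketch in Python and is not part of the extractor: the function names are hypothetical, the pkg-config module names yaml-cpp and sqlite3 are assumed to be the ones your distribution installs, and Kayrebt::Lib::ActivityDiagrams is not checked at all.

```python
import os
import shutil
import subprocess


def plugin_include_dir(gcc="gcc-4.8"):
    """Return GCC's plugin header directory, or None if it cannot be found."""
    if shutil.which(gcc) is None:
        return None
    result = subprocess.run(
        [gcc, "-print-file-name=plugin"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    include_dir = os.path.join(result.stdout.strip(), "include")
    return include_dir if os.path.isdir(include_dir) else None


def has_pkg(module):
    """Ask pkg-config whether a module is installed (False if pkg-config is absent)."""
    if shutil.which("pkg-config") is None:
        return False
    return subprocess.run(["pkg-config", "--exists", module]).returncode == 0


def missing_requirements(gcc="gcc-4.8", gxx="g++-4.8"):
    """List the requirements that this sketch could not find."""
    missing = []
    if shutil.which(gxx) is None:
        missing.append(gxx)
    if plugin_include_dir(gcc) is None:
        missing.append("GCC plugin headers")
    if shutil.which("pkg-config") is None:
        missing.append("pkg-config")
    # Assumed pkg-config module names; adjust them if your system differs.
    for module in ("yaml-cpp", "sqlite3"):
        if not has_pkg(module):
            missing.append(module)
    return missing


if __name__ == "__main__":
    for item in missing_requirements():
        print("missing:", item)
```

If your compiler binaries carry different names, pass them as the gcc and gxx arguments.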

The extractor is packaged with the autotools. If you download a tarball, the build process is as simple as:
    ./configure CC=gcc-4.8 CXX=g++-4.8
    make
    make install
If you have the appropriate permissions when running make install, a symbolic link to the plugin will be created in GCC's plugin directory (the one printed by gcc -print-file-name=plugin). If you have cloned the developer branch via git, you need the GNU autotools to build the extractor and the activity-diagrams library shipped with it.

Configuring

Configuration is done with a YAML file like the following:
    general:
      - greedy: 0
      - url:
          - dbfile: my_db.sqlite
          - dbname: symbols
      - categories:
          - 1: 'bgcolor=blue'
          - 2: 'textcolor=red'
    source_file1.c:
      - functions: ['function1','function2']
      - match:
          - '(k|m)alloc.*': 1
    source_file2.c:
      - functions: ['one_more_function','especially_interesting_function']
      - start_match:
          - '.*spin_lock.*': 2
      - end_match: ['spin_unlock']

The configuration file has a mandatory general section and zero or more compilation-unit sections. In the general section, the following options can be set:

  • greedy Enable or disable the greedy mode (disabled by default). If the greedy mode is on, the extractor will output a graph for every function compiled by GCC. Otherwise, only explicitly named functions will be dumped.
  • url Configuration subsection for the URL resolver. The extractor uses the URL resolver when it is configured to attach to each function-call node the source file and line number of the called function's definition. This is useful, for example, to hyperlink the callee graphs from the caller.
    • dbfile The database file, which must be an SQLite 3 file.
    • dbname The table name. This table must have these columns: symbol: the function symbol name; dir: the directory containing the source file in which the function is defined; file: that source file; line: the line at which the function is defined.
  • categories Configuration subsection for the categorizer. The categorizer applies custom attributes to nodes, depending on their category (see below).
  • match A series of regular expressions, each associated with a category. All nodes whose label matches a regular expression are put in the corresponding category.
  • start_match A series of regular expressions, each associated with a category. When a node matches one of them, it and all the following nodes, until a node matches an end_match regular expression, are marked with the corresponding category.
  • end_match The regular expressions that stop the categorization of nodes started by start_match.
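
To make the url options concrete, here is a minimal sketch in Python that creates a database with the four columns listed under dbname. The column types, the sample row, and the helper names are assumptions for illustration; the query the extractor itself runs is not described here.

```python
import sqlite3


def create_symbol_db(path="my_db.sqlite", table="symbols"):
    """Create (if needed) a table with the columns the URL resolver expects."""
    conn = sqlite3.connect(path)
    # Table names cannot be bound as SQL parameters, so only pass trusted names.
    conn.execute(
        'CREATE TABLE IF NOT EXISTS "{}" '
        "(symbol TEXT, dir TEXT, file TEXT, line INTEGER)".format(table)
    )
    return conn


def add_symbol(conn, symbol, directory, filename, line, table="symbols"):
    """Record where a function is defined."""
    conn.execute(
        'INSERT INTO "{}" (symbol, dir, file, line) VALUES (?, ?, ?, ?)'.format(table),
        (symbol, directory, filename, line),
    )
    conn.commit()


def locate(conn, symbol, table="symbols"):
    """Return (dir, file, line) for a symbol, or None if it is unknown."""
    cursor = conn.execute(
        'SELECT dir, file, line FROM "{}" WHERE symbol = ?'.format(table),
        (symbol,),
    )
    return cursor.fetchone()


if __name__ == "__main__":
    conn = create_symbol_db(":memory:")
    # Hypothetical entry: function1 defined at line 42 of src/source_file1.c.
    add_symbol(conn, "function1", "src", "source_file1.c", 42)
    print(locate(conn, "function1"))
```

In practice the table would be filled by whatever tool indexes your sources; the file and table names must then match dbfile and dbname in the configuration.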

In compilation-unit sections, the match, start_match, and end_match options are also available. They apply only to functions compiled in that compilation unit and take precedence over the corresponding settings in the general section. In addition, the functions option lists the functions for which a graph has to be dumped during compilation.
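
The matching rules can be illustrated with a small Python sketch. It is not the extractor's implementation and rests on several assumptions: nodes are simplified to a linear list of labels, a pattern matches if it is found anywhere in the label, the node that ends a start_match range is itself still marked, a range category overrides a plain match, and a compilation-unit option replaces the general option of the same name entirely. The options are also passed as plain dictionaries rather than in the YAML layout.

```python
import re


def effective(general, unit, key, default):
    """Pick the unit-level option if present, else the general one."""
    return unit.get(key, general.get(key, default))


def categorize(labels, general, unit):
    """Return one category (or None) per node label."""
    match = effective(general, unit, "match", {})
    start = effective(general, unit, "start_match", {})
    end = effective(general, unit, "end_match", [])

    categories = []
    active = None  # category of the currently open start_match range
    for label in labels:
        category = None
        # Plain match: first pattern found in the label decides.
        for pattern, cat in match.items():
            if re.search(pattern, label):
                category = cat
                break
        # Open a range if none is active and a start pattern matches.
        if active is None:
            for pattern, cat in start.items():
                if re.search(pattern, label):
                    active = cat
                    break
        # Inside a range, the range category is applied (assumption).
        if active is not None:
            category = active
            if any(re.search(pattern, label) for pattern in end):
                active = None  # the range ends after this node
        categories.append(category)
    return categories


if __name__ == "__main__":
    general = {"match": {"(k|m)alloc.*": 1}}
    unit = {"start_match": {".*spin_lock.*": 2}, "end_match": ["spin_unlock"]}
    labels = ["spin_lock(&l)", "x = x + 1", "spin_unlock(&l)", "p = kmalloc(n)"]
    print(categorize(labels, general, unit))
```

With these hypothetical labels, the three nodes from spin_lock to spin_unlock fall into category 2 and the kmalloc node into category 1.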

Running

The extractor is run through gcc. To use it, make sure to have a configuration file as described above and execute:
    gcc-4.8 -fplugin=path/to/the/compiled/plugin/kayrebt_extractor.so \
        -fplugin-arg-kayrebt_extractor-config=path/to/the/configuration/file.yml \
        [rest of the compilation parameters...]

This will output one dump file per compilation unit, named after the unit: compilation_unit.c.dump. By default, the configuration file is ./config. If the plugin is installed in GCC's plugin directory, the plugin path can be shortened to its bare name, kayrebt_extractor.
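
When the extractor is driven from a build script, the command line can be assembled programmatically. The sketch below is a hypothetical Python helper, not something shipped with the extractor. It uses GCC's documented plugin-argument form, -fplugin-arg-NAME-KEY=VALUE, where NAME is the plugin's file name without directory or extension. The dump name is derived only from the naming rule stated above; the directory in which the dump is written is not assumed.

```python
import os


def extractor_command(source, config="./config", plugin="kayrebt_extractor",
                      gcc="gcc-4.8", extra=()):
    """Build the argument list that runs GCC with the extractor plugin."""
    # GCC names a plugin after its shared object, minus directory and suffix.
    name = os.path.splitext(os.path.basename(plugin))[0]
    return (
        [gcc, "-fplugin=" + plugin,
         "-fplugin-arg-{}-config={}".format(name, config)]
        + list(extra)
        + [source]
    )


def expected_dump_name(source):
    """File name of the dump for a compilation unit (directory not assumed)."""
    return os.path.basename(source) + ".dump"


if __name__ == "__main__":
    command = extractor_command("source_file1.c", extra=["-c"])
    print(" ".join(command))
    print(expected_dump_name("source_file1.c"))
    # To actually run it: subprocess.run(command, check=True)
```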

API documentation

The source code of the extractor is documented with Doxygen and the documentation is available here.