Making a Variable Tracker Graph

Abstract: Want to know all the possible values of a variable? Understand’s Information Browser is a great place to start. If you need more, check out this custom graph plugin.

Note: There’s no need to install this plugin. We found this graph so useful that we rewrote it as part of Understand, adding new features such as the ability to dynamically follow nodes in the tree. This article is still a great example of writing a custom API graph, so we’re leaving it here.

Suppose variable foo is causing a crash. What can Understand tell me about the possible values of this variable?

The Information Browser

The Information Browser provides a quick view of the information Understand knows about your variable. Here’s an example for the variable foo. (Yeah, you don’t have the source code for foo yet. That comes later.)

Figure: The Information Browser for the variable foo.

Here are a few things to notice:

The third line down tells us that the initial value of foo is 7. Initial values are stored by Understand and do not depend on having the source code.
The “Assignments” section tells us that on line 24 foo=x.
The “Assigned To” section tells us that on line 28 i=foo, and on line 31 f0=foo.
In the References section, we can see that foo has five set references, so its value is set in five different places.

Some Limitations

Now, for the actual source code.

int func0(int & f0)
{
  return ++f0;
}

int func1(int a, int b, int c)
{
  switch (a) {
    case 0:
      return 0;
    case 1:
      return b + c;
    default:
      return b - c;
  }
}

int func2()
{
  int x = 0, y = 1, z = 2;
  int foo = 7;
  foo = 6;

  foo = x;
  foo = y + z;
  foo = func1(x,y,z);

  int i = foo;
  int j = 2*foo;
  foo = func1(y,y+7,y);
  int k = func0(foo);

  foo ++;

  return i + j + k + foo;
}

You might notice some limitations in the “Assignments” and “Assigned To” sections. Because a reference links one entity to a second entity, assignments from a literal (foo = 6), from multiple entities (foo = y + z), or from an expression (int j = 2*foo) don’t produce assign references. Assign references also aren’t created for the return value of a function call (foo = func1(y,y+7,y)). They do, however, catch parameters, like f0 on line 31. In fact, the reference kind even distinguishes between assignby value and assignby ref.
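To make that rule concrete, here is a toy Python model of it. It is only an approximation of the behavior described above, not Understand’s actual logic: the hypothetical function makes_assign_ref treats a statement as producing an assign reference only when its right-hand side is a single identifier, and it ignores the parameter case (f0) entirely.

```python
import re

# Hypothetical toy model of the rule above: an assign reference links exactly
# two entities, so only "lhs = <single identifier>" qualifies. Literals,
# multi-entity expressions, and function-call results do not.
IDENT = re.compile(r"[A-Za-z_]\w*")


def makes_assign_ref(statement: str) -> bool:
    # Split at the first '=' into left-hand and right-hand sides.
    _, sep, rhs = statement.rstrip(";").partition("=")
    if not sep:
        return False
    return IDENT.fullmatch(rhs.strip()) is not None


for stmt in ["foo = 6;", "foo = x;", "foo = y + z;",
             "int j = 2*foo;", "int i = foo;", "foo = func1(y,y+7,y);"]:
    print(f"{stmt:24} -> {makes_assign_ref(stmt)}")
```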

A Graph Plugin


So, the Information Browser gives us a lot of information. But exhaustively exploring all possible values of foo would still involve manually looking at each set reference. From there, it could involve manually exploring the set references for the right-hand-side variables or the return statements for the right-hand-side functions.

Wait, what’s with all this “manual”? Are we not programmers?! Repetitive tasks like this should be automated.

I’ll write a graph plugin for it. My graph plugin will have a node with the source code for each set reference. Then, I’ll add an edge from each entity on the right-hand side of that reference into the source code node. So, I’ll end up with something like this:

Figure: A mock-up of the desired graph, with a source code node for each set reference.

The same information could be presented in a tree view. In that case, I’d write an interactive report like this one for friends. But I much prefer graphs to trees because they’re prettier, easier to navigate, and don’t have to duplicate subtrees.
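The point about duplicated subtrees can be checked with a small, self-contained Python sketch. The dependency map DEPS is a hypothetical simplification loosely based on the sample code (each name maps to the names its value can come from); the sketch counts how many nodes a tree view would need versus a graph that draws each name once.

```python
# Hypothetical, simplified dependency map loosely based on the sample code:
# each name maps to the names its value can come from.
DEPS = {
    "foo": ["x", "y", "z", "func1"],
    "func1": ["b", "c"],   # func1 returns expressions of b and c
    "b": ["y"],            # b receives y and y+7
    "c": ["y", "z"],       # c receives z and y
    "x": [], "y": [], "z": [],
}


def tree_size(name):
    # A tree view repeats a shared subtree under every parent that uses it.
    return 1 + sum(tree_size(child) for child in DEPS[name])


def graph_size(root):
    # A graph draws each reachable name exactly once.
    seen, stack = set(), [root]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(DEPS[name])
    return len(seen)


print("tree nodes:", tree_size("foo"))    # 10
print("graph nodes:", graph_size("foo"))  # 7
```

A naive tree expansion like tree_size would also recurse forever if a value fed back into itself, which the visited set in graph_size avoids.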

Finding Source Code

To get started with graph plugins, check out this article. Download the complete graph plugin here. For this article, I’ll just focus on using source code in graphs.

The Lexer class represents code as a series of Lexeme objects. Each lexeme object has a location, token kind, and text. Importantly, a lexeme can also have an associated entity and/or reference. You can get a quick view of how your source code will appear through a lexer with this interactive report.
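To picture that structure, here is a hypothetical stand-in in plain Python. The real Lexer and Lexeme objects come from Understand’s API; the FakeLexer and FakeLexeme classes below are assumptions that only mimic the methods this article uses (text(), token(), ent(), next(), and lexer.lexeme(line, column)), so the idea can be tried without an Understand database. The token kind names are likewise assumed.

```python
class FakeLexeme:
    """Hypothetical stand-in for an Understand Lexeme: a node in a linked list."""

    def __init__(self, text, token, line, column, ent=None):
        self._text, self._token, self._ent = text, token, ent
        self.line, self.column = line, column
        self._next = None

    def text(self):
        return self._text

    def token(self):
        return self._token

    def ent(self):
        # A name string here; a real Lexeme would return an entity (or None).
        return self._ent

    def next(self):
        return self._next


class FakeLexer:
    """Hypothetical stand-in for an Understand Lexer covering a single line."""

    def __init__(self, line, tokens):
        self._lexemes, column = [], 1
        for text, token, ent in tokens:
            self._lexemes.append(FakeLexeme(text, token, line, column, ent))
            column += len(text)
        # Link each lexeme to the one after it.
        for cur, nxt in zip(self._lexemes, self._lexemes[1:]):
            cur._next = nxt

    def lexeme(self, line, column):
        # Return the lexeme covering (line, column), or None.
        for lex in self._lexemes:
            if lex.line == line and lex.column <= column < lex.column + len(lex.text()):
                return lex
        return None


# Line 24 of the sample code: "  foo = x;"
lexer = FakeLexer(24, [("  ", "Whitespace", None), ("foo", "Identifier", "foo"),
                       (" ", "Whitespace", None), ("=", "Operator", None),
                       (" ", "Whitespace", None), ("x", "Identifier", "x"),
                       (";", "Punctuation", None)])
lex = lexer.lexeme(24, 3)
while lex:
    print(repr(lex.text()), lex.token(), lex.ent())
    lex = lex.next()
```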

The first step in using the Lexer is to create one. Lexers are available for file entities. Because creating the lexer can take a lot of time, we’ll cache them for this script.

def findlexer(file, lexercache):
  # Lexer creation takes significant time, so keep track of lexers already
  # created instead of recreating them each time
  if file in lexercache:
    return lexercache[file]
  lexer = None
  try:
    lexer = file.lexer()
    lexercache[file] = lexer
  except Exception:
    # No lexer could be created for this file; return None so the caller
    # can skip it (a bare "except:" would also swallow KeyboardInterrupt)
    pass
  return lexer

Once we have a lexer, we can jump to a location in the source code with the lexer.lexeme(line, column) function. From there, we keep reading lexemes with lexeme.next() until the end of the set. How do we know where it ends?

In the simplest case, an assignment ends with a semicolon. It’s also possible to end with a comma, like line 20 in the sample code. Finally, it’s possible to end at a parenthesis in statements like:

if (x = func()) doSomething();

So, to collect the lexemes for a set reference, we can use this function:

def setRefLexemes(ref, lexercache):
  lexlist = []
  lexer = findlexer(ref.file(), lexercache)
  if lexer:
    # Start at the referenced lexeme
    lexeme = lexer.lexeme(ref.line(), ref.column())
    parens = 0
    atEnd = False
    # Read to the end of the current statement, which is usually a semicolon
    # but can be parentheses, ex:
    #  if (x = func()) ...;
    # would be "x = func()"
    while lexeme and parens >= 0 and not atEnd:
      lexlist.append(lexeme)
      lexeme = lexeme.next()
      if lexeme and lexeme.token() == "Punctuation":
        if lexeme.text() == ';':
          atEnd = True
        elif lexeme.text() == ')':
          parens -= 1
        elif lexeme.text() == '(':
          parens += 1
      if lexeme and lexeme.token() == "Operator":
        if parens == 0 and lexeme.text() == ',':
          atEnd = True

  return lexlist
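The end-of-statement rules can be exercised outside Understand with a plain-Python port of the loop above. Here each lexeme is a hypothetical (kind, text) tuple produced by a deliberately rough tokenizer, not a real Lexeme, and, as in setRefLexemes, a comma is assumed to be tokenized as an Operator while ;, (, and ) are Punctuation.

```python
import re


def tokenize(src):
    """Very rough hypothetical tokenizer producing (kind, text) tuples."""
    tokens = []
    for m in re.finditer(r"\s+|\w+|[^\w\s]", src):
        text = m.group()
        if text.isspace():
            kind = "Whitespace"
        elif text.isdigit():
            kind = "Literal"
        elif re.fullmatch(r"\w+", text):
            kind = "Identifier"
        elif text in ";(){}":
            kind = "Punctuation"
        else:
            kind = "Operator"  # ',', '=', '+', '*', ...
        tokens.append((kind, text))
    return tokens


def statement_text(tokens, start):
    """Mirror setRefLexemes: collect texts from tokens[start] to the end of the set."""
    out = [tokens[start][1]]  # the referenced lexeme is always kept
    parens = 0
    for kind, text in tokens[start + 1:]:
        if kind == "Punctuation" and text == ";":
            break  # end of statement
        if kind == "Punctuation" and text == "(":
            parens += 1
        elif kind == "Punctuation" and text == ")":
            parens -= 1
            if parens < 0:
                break  # closing paren of an enclosing "if (...)"
        elif kind == "Operator" and text == "," and parens == 0:
            break  # next declarator, as in "int x = 0, y = 1;"
        out.append(text)
    return "".join(out)


def set_text(src, name):
    """Hypothetical helper: scan from the first token spelled `name`."""
    tokens = tokenize(src)
    start = next(i for i, (_, text) in enumerate(tokens) if text == name)
    return statement_text(tokens, start)


print(set_text("foo = x;", "foo"))
print(set_text("int x = 0, y = 1, z = 2;", "y"))
print(set_text("if (x = func()) doSomething();", "x"))
print(set_text("foo = func1(x,y,z);", "foo"))
```

The last case shows why the comma check is guarded by parens == 0: commas inside a call’s argument list must not end the statement.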

Graphing Source Code

Now that we have a list of lexemes, we can create graph nodes for them. The text to display can be retrieved with lexeme.text(). To sync a node to a location, call node.sync(), which accepts either a reference or a file, line, and column. So, a srcNode function could look like this:

def srcNode(graph, loc, lexlist):
  # Create a node from a lexeme list. These nodes are not cached.

  # The text of the node is the source code from the lexemes
  text = ""
  for lexeme in lexlist:
    text += lexeme.text()

  if not text or graph.options().lookup("Show Source Locations") == "On":
    if isinstance(loc, understand.Ref):
      text += "\\l[" + loc.file().relname() + " (" + str(loc.line()) + ":" + str(loc.column()) + ")]"
    else:
      text += "\\l[" + loc[0].relname() + " (" + str(loc[1]) + ":" + str(loc[2]) + ")]"

  node = graph.node(text)

  # The node syncs to the location in the source code. loc should be either
  # a reference or a (file, line, column) tuple/list
  if isinstance(loc, understand.Ref):
    node.sync(loc)
  else:
    node.sync(loc[0].longname(), loc[1], loc[2])

  # Source code node styling
  node.set("shape","none")
  return node
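The label logic in srcNode can be checked in isolation. The sketch below is a hypothetical pure-Python helper, src_label, that reproduces only that part: it joins the lexeme texts and, when locations are requested or the text is empty, appends a [file (line:column)] suffix. The \l sequence is kept exactly as srcNode writes it; in Graphviz-style labels it marks a left-justified line break, which we assume is how Understand renders it. The file name sample.cpp is made up.

```python
def src_label(texts, relname, line, column, show_locations=False):
    """Build a source-node label the way srcNode does (hypothetical helper)."""
    text = "".join(texts)
    if not text or show_locations:
        # "\\l" in Python source is the two characters backslash + 'l'.
        text += "\\l[" + relname + " (" + str(line) + ":" + str(column) + ")]"
    return text


print(src_label(["foo", " ", "=", " ", "x"], "sample.cpp", 24, 3))
print(src_label(["foo", " ", "=", " ", "x"], "sample.cpp", 24, 3, True))
print(src_label([], "sample.cpp", 24, 3))
```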

To connect our nodes, we first create a node for the starting entity and then connect a source code node for each of its set references to it:

  # initial entity
  headNode = entNode(graph, ent, nodecache)

  # Create source code nodes for each set reference
  for ref in ent.refs("c setby"):
    lexlist = setRefLexemes(ref, lexercache)
    tailNode = srcNode(graph, ref, lexlist)
    graph.edge(tailNode, headNode)

Now, we need to find all the entities on the right-hand side of the expression. We can use the lexeme.ent() function and assume that any entity it returns is a right-hand-side entity as long as it isn’t the initial entity:

def ents(lexlist, exceptEnt):
  # Find all the entities appearing in the list of lexemes. exceptEnt allows
  # ignoring the initial entity.
  entlist = []
  for lexeme in lexlist:
    nextent = lexeme.ent()
    if nextent and not nextent in entlist and nextent != exceptEnt:
      entlist.append(nextent)
  return entlist

Finally, we draw an edge from each input entity to the source code node, and report those input entities as the next level in the graph.

    for lexEnt in ents(lexlist, ent):
      next.append(lexEnt)
      graph.edge(entNode(graph, lexEnt, nodecache), tailNode)
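Putting the pieces together, the level-by-level expansion can be sketched without Understand. This is a hypothetical model, not the plugin itself: SET_REFS is a hand-written approximation of the set references in the sample code (each entry pairs a statement’s text with its right-hand-side entities), Graph is a minimal stand-in for the plugin graph, ent_node plays the role of entNode with its nodecache, next_level stands in for the plugin’s next list (renamed to avoid shadowing Python’s built-in next), and the visited set and max_depth limit are additions of this sketch.

```python
# Hypothetical, hand-written approximation of the sample code's set references.
SET_REFS = {
    "foo": [
        ("foo = 6", []),
        ("foo = x", ["x"]),
        ("foo = y + z", ["y", "z"]),
        ("foo = func1(x,y,z)", ["func1", "x", "y", "z"]),
        ("foo = func1(y,y+7,y)", ["func1", "y"]),
    ],
    "x": [("x = 0", [])],
    "y": [("y = 1", [])],
    "z": [("z = 2", [])],
}


class Graph:
    """Minimal stand-in for a plugin graph: records labels and edges."""

    def __init__(self):
        self.nodes = []   # node labels; a node's id is its index
        self.edges = []   # (tail_id, head_id) pairs

    def node(self, label):
        self.nodes.append(label)
        return len(self.nodes) - 1

    def edge(self, tail, head):
        self.edges.append((tail, head))


def ent_node(graph, ent, nodecache):
    # One node per entity, like entNode with its nodecache.
    if ent not in nodecache:
        nodecache[ent] = graph.node(ent)
    return nodecache[ent]


def build(graph, start, max_depth=10):
    nodecache, visited, level = {}, {start}, [start]
    for _ in range(max_depth):
        next_level = []
        for ent in level:
            head = ent_node(graph, ent, nodecache)
            for text, rhs_ents in SET_REFS.get(ent, []):
                tail = graph.node(text)      # source nodes are not cached
                graph.edge(tail, head)       # source code -> entity it sets
                for rhs in rhs_ents:
                    graph.edge(ent_node(graph, rhs, nodecache), tail)
                    if rhs not in visited:   # expand each entity only once
                        visited.add(rhs)
                        next_level.append(rhs)
        if not next_level:
            break
        level = next_level
    return graph


g = build(Graph(), "foo")
print(len(g.nodes), "nodes,", len(g.edges), "edges")  # 13 nodes, 17 edges
```

Because entity nodes are cached, y gets a single node even though it feeds three different source nodes, while every set reference gets its own fresh source node.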

The Results

Here’s the final graph for the sample code with foo.

Figure: The final graph for foo in the sample code.

If you try out the plugin, you’ll notice some other cool features. For instance, suppose we wanted to know what possible values func1 might return. The plugin has a “Returns” option for that.

What about the parameters b and c? What are their possible values? The graph also has a “Parameter Calls” option that grabs the source code pertaining to each parameter.
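One way to picture what a parameter view has to gather: for each call to func1, take the argument in the position of the parameter of interest. The sketch below is hypothetical and works on plain strings rather than Understand references (split_args, PARAMS, and CALLS are made up for illustration); it splits each argument list only at top-level commas, the same parenthesis-depth idea used in setRefLexemes.

```python
def split_args(call):
    """Split 'f(a, g(b, c), d)' into ['a', 'g(b, c)', 'd'] at top-level commas."""
    inner = call[call.index("(") + 1 : call.rindex(")")]
    args, depth, current = [], 0, ""
    for ch in inner:
        if ch == "," and depth == 0:
            args.append(current.strip())
            current = ""
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current += ch
    args.append(current.strip())
    return args


PARAMS = ["a", "b", "c"]                     # int func1(int a, int b, int c)
CALLS = ["func1(x,y,z)", "func1(y,y+7,y)"]   # the two calls in func2

for name in ("b", "c"):
    index = PARAMS.index(name)
    print(name, "<-", [split_args(call)[index] for call in CALLS])
```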

Let us know if you find this plugin useful. If it becomes a popular plugin, we can create a built-in version of the graph that would have syntax highlighting and node-specific options. Or, it can be extended to handle modify references like line 33 in the sample code, or the assign by reference / pointer modifications like the assignment to f0. We’d love your feedback!