De-Mystifying Dependencies

Abstract: Dependencies seem simple on the surface, but there is a surprising amount of nuance involved in the details. We guide you through these areas as we explore new dependency options now available in Understand.

I have a confession: I’ve always found Understand’s dependencies confusing. In my defense, I inherited the code. Still, it’s a sad state of affairs when even the engineer maintaining the code has a hard time explaining it, so I decided to use Understand for what it does best: comprehending difficult code. In this case, comprehending its own code!

Defining Dependency

What makes dependencies so hard? The first problem is defining “dependency”. Suppose I have the following setup with three C++ files:

// Main.cpp
#include "MyCode.h"

int main()
{
  doSomething();
  return 0;
}

// MyCode.h
void doSomething();

// MyCode.cpp
// Lines 2-10 omitted

void doSomething()
{
  // Lines omitted
}

What file(s) does Main.cpp depend on? One answer is that Main.cpp depends on MyCode.cpp because Main.cpp calls doSomething() which is defined in MyCode.cpp. Another answer is that Main.cpp depends on MyCode.h because Main.cpp calls doSomething() which is declared in MyCode.h.

Which way is “right”? That depends on what you want to know:

  • Which files need to be part of the executable for Main.cpp? Main.cpp should depend on MyCode.cpp because both C++ files need to be linked together.
  • Is #include "MyCode.h" an unused include? Now Main.cpp should depend on MyCode.h. The include is used because Main.cpp needs the declaration of doSomething() to compile.

Writing this out, the word “need” starts to stand out. Perhaps a synonym for “dependency” is “need”: File A depends on File B if File A needs File B for … something. For linking? For compiling? And, if you really want to get confused, what about programming languages other than C? Rather than go down that rabbit hole now, let’s stick with C/C++ file dependencies. Since both linking and compiling are useful paradigms, we’ll let users pick.

Then: File A depends on File B if File A needs File B to (compile | link).

The “How” of Dependencies

Knowing that File A depends on File B is good, but it’s most useful to know how File A depends on File B. This requires a little bit of Understand terminology.

An entity is anything Understand has information on from your source code, such as a file, class, function, or variable. The files Main.cpp, MyCode.h, and MyCode.cpp are entities. The functions main() and doSomething() are also entities.

A reference is a relationship between two entities:

  1. Main.cpp includes MyCode.h on line 2 (column 0) of Main.cpp.
  2. main() calls doSomething() on line 6 (column 2) of Main.cpp.
  3. MyCode.h declares doSomething() on line 2 (column 5) of MyCode.h.
  4. MyCode.cpp defines doSomething() on line 11 (column 5) of MyCode.cpp.

Understand answers the question of “how does file A depend on file B” with references. Looking at compile-time dependencies, Main.cpp depends on MyCode.h because of references 1 and 2 in the list above. For link-time dependencies, Main.cpp depends on MyCode.cpp because of reference 2.

Notice that there are different kinds of references. A call reference is a different kind of reference than an include reference. Not all reference kinds are used for dependencies. Which kinds are? You can pick based on the languages in your project. By default, for C++, the following reference kinds are used:

  • Call
  • Include
  • Modify
  • Set
  • Typed
  • Use

  • Base/Derive
  • Catch
  • Overrides
  • Throw
  • Using

File Parents

Let’s take a closer look at the second reference in the list above:

main() calls doSomething() on line 6 (column 2) of Main.cpp.

How does Understand map this reference to a dependency between Main.cpp and MyCode.cpp or MyCode.h? First, some more terminology. In this example, main() is the scope entity and doSomething() is the referenced entity. So, when drawing the reference with an arrow, the tail is at the scope and the referenced entity is at the arrowhead:

            main() --calls--> doSomething()

            scope --reference kind--> referenced entity

To map “main() calls doSomething()” to a dependency between files, we need to find the scope file and the referenced file. The scope file is easy. It’s already part of the reference as the file location.

            main() calls doSomething() on line 6 (column 2) of Main.cpp.

What about the referenced file? Where does doSomething() belong? The information browser in Understand provides one answer: doSomething() is defined in MyCode.cpp, so it belongs to MyCode.cpp. Dependencies used this default for a long time, and it’s roughly equivalent to link-time dependencies. But there are two problems with this approach:

  1. Entities such as typedefs and macros can have multiple definitions and in that case an arbitrary file was picked from among the possible ones.
  2. Entities with no definition don’t belong anywhere, so they don’t count. Suppose your project contains only Main.cpp and MyCode.h, and MyCode.cpp is part of a binary library whose original source code you don’t have. In that case, doSomething() doesn’t have a file parent.

So dependency analysis needs to allow for multiple possible parents and have a way to pick between them.

Possible parents for C++ depend on link mode or compile mode. In link-mode, each file an entity is defined in is a possible parent. So doSomething() has a single possible file parent: MyCode.cpp. In compile-mode, declaration files are also possible parents so doSomething() has two possible parents: MyCode.h and MyCode.cpp.

If there are multiple possibilities, which one is right? That depends on the scope file. From Main.cpp, MyCode.h is a better parent than MyCode.cpp because Main.cpp includes MyCode.h. For C++, then, the best file parent is the one that’s found in the include tree of the scope file. What if there are multiple “best” file parents? There are other heuristics to improve things (for example, a file parent in the same directory is probably better than one far away in the directory structure). But in the end, if there are still multiple “best” files, the reference counts as a dependency to each of them.

Non-File Dependencies

So far, the example has been with file dependencies. But dependencies can be calculated for architectures and for classes too. So, how does that work?

The basic idea remains the same. Given a reference, like “main() calls doSomething() on line 6 (column 2) of Main.cpp,” we need to decide on the scope architecture or class and the referenced architecture or class.

The easiest jump is from file dependencies to architecture dependencies for something like the Directory Structure architecture where everything in the architecture is a file and all files belong to exactly one place. Using the rules above, we can find the scope file and referenced file(s) and look up the files in the architecture.

Consider this architecture for the above files:

  • FileArchitecture
    • Source
      • Main.cpp
    • LibHeader
      • MyCode.h
    • LibSource
      • MyCode.cpp

The reference “main() calls doSomething()” is a reference between Main.cpp and MyCode.h (in compile-mode) as we determined above. So, at an architecture level, since Main.cpp belongs to Source and MyCode.h belongs to LibHeader, the dependency goes from Source to LibHeader. In link-mode, it would go from Source to LibSource.

Problems arise when an architecture is incomplete, redundant, or contains non-file entities. Consider this custom architecture:

  • My Incomplete Architecture
    • My Favorite Files
      • Main.cpp

Suppose I want to know what “My Favorite Files” depends on. Well, I know that “main() calls doSomething()” is a reference with Main.cpp (and therefore “My Favorite Files” architecture) as a scope. But where does doSomething() belong? Nowhere.

This is often a point of confusion with dependencies. If a parent doesn’t exist at the given dependency level (root architecture, file-level, or class-level), then the reference won’t be displayed as a dependency. Class dependencies only exist between classes so a class will never have a dependency to a non-member function even if it calls non-member functions.

Redundancy

Redundant architectures are easier to deal with. It’s essentially the same problem as multiple file parents.

  • My Redundant Architecture
    • Files I manage
      • Main.cpp
      • MyCode.h
    • Files I wrote
      • Main.cpp
    • Files others wrote
      • Jack wrote this
        • MyCode.h
      • Jill wrote this
        • MyCode.cpp

When considering “main() calls doSomething()” as a dependency for “Files I wrote” in compile mode, MyCode.h belongs to two different architectures: “Files others wrote/Jack wrote this” and “Files I manage.” When multiple parents are possible, the first step is to see if some parents are better than others. For architectures, the calculation is distance-based. “Files I manage” is closer to “Files I wrote” (a sibling) than “Files others wrote/Jack wrote this” (a niece/nephew) is. So, in this case, the dependency is only between “Files I wrote” and “Files I manage.” If multiple architectures are the same distance away, then all the architectures at the minimum distance are used.

Note that this distance filter means that “main() calls doSomething()” is internal to “Files I manage” because Main.cpp and MyCode.h both belong to “Files I manage.”

Finally, let’s look at non-file entities. Suppose I have an architecture setup like this:

  • My Refactor Plan
    • Keep
      • Main.cpp
      • MyCode.h
      • MyCode.cpp
      • main()
    • Remove
      • doSomething()

Like the redundant architecture above, doSomething() could now belong in multiple places. As a child of MyCode.h or MyCode.cpp, it belongs in Keep. But it is also directly tagged in Remove. main() is even more confusing. It solidly belongs in Keep, but it belongs there twice: once as a child of Main.cpp and once because it is directly tagged. Personally, this feels like a “read the customer’s mind” question. What does the customer expect in this situation?

The previous behavior would result in “main() calls doSomething()” counting twice as a dependency from Keep to Remove: once from the directly tagged main() and once from Main.cpp. Also, if doSomething() had a dependency, both Keep and Remove would have that dependency, since doSomething() belongs to both of them (through MyCode.cpp for Keep, and directly for Remove).

While the reasoning behind the previous behavior makes sense when it’s thought through, it isn’t how I personally would want the architecture to behave. I would expect an “exception” type behavior. Everything in MyCode.cpp belongs to Keep except doSomething(). So “main() calls doSomething()” is a single dependency from Keep (through the directly tagged main(), not Main.cpp) to Remove (through the directly tagged doSomething()). Dependencies out of doSomething() only contribute to Remove’s dependencies, and would never be part of Keep even though MyCode.cpp belongs to Keep.

Non-File Children

Ok, that sounds simple enough. Ready for more mind-boggling questions? For this part of dependencies, the simple example we’ve been using isn’t quite enough. So, let’s add some details to doSomething().  

// MyCode.cpp
#include "MyCode.h"

typedef struct MyStruct {
  int var1;
  int var2;
} MyStruct;

void doSomethingElse(MyStruct s);

void doSomething()
{
  MyStruct args;
#if MACRO_WITH_MULTIPLE_DEFINITIONS
  doSomethingElse(args);
#endif
}

Now, consider the following minimalistic architecture:

  • Parents Example 1
    • Structs
      • MyStruct
    • Functions
      • doSomething()

Does Functions depend on Structs? Well, here are the references:

  1. doSomething() defines args on line 13 of MyCode.cpp
  2. args types MyStruct on line 13 of MyCode.cpp
  3. doSomething() calls doSomethingElse() on line 15 of MyCode.cpp
  4. doSomething() uses args on line 15 of MyCode.cpp

So, doSomething() has references to args and to doSomethingElse(), but doSomething() does not have direct references to MyStruct. The only way to express a dependency between doSomething() and MyStruct is to consider args a child of doSomething() using reference 1. This is different from the parent/child relationships mentioned so far, which have all involved files. From the file perspective, MyCode.cpp is the scope file for reference 1 because of the reference location; the scope and referenced entities don’t matter for a file. But to handle non-file entities, we have to find the children of non-file entities. So, given an arbitrary tagged entity, what children (and grandchildren, great-grandchildren, and so on) get considered?

Things can get language-specific here, but sticking with C/C++ code, here are the current rules:

Each rule gives an entity kind, the reference kind(s) that link it to its children, and the child kinds included:

  • c class, c struct, c union (define, declare): c unnamed member class, c member enum, c member function, c member object, c unnamed member struct, c member union
  • c enum (define): c enumerator
  • c function (define): c lambda function, c local object, c parameter

To see how these rules apply, consider the following example:

// Parents.cpp
#include <algorithm>
#include <vector>

#define MACRO_WITH_MULTIPLE_DEFINITIONS 1

class MyClass {
public:
  enum MyMemberEnum {
    MyMemberVal1,
    MyMemberVal2
  };

  virtual void pureVirtualMemberFunction() = 0;

  void memberFunction(MyMemberEnum param)
  {
    // a local variable
    std::vector<int> sorted;

    // a lambda function
#if MACRO_WITH_MULTIPLE_DEFINITIONS
    std::sort(sorted.begin(), sorted.end(),
    [](const int &lhs, const int & rhs) {
      return lhs < rhs;
    });
#endif
  }

  struct MyNestedStruct {
    int a;
    int b;
  };

  MyNestedStruct mMemberData;

  struct {
    int c;
    int d;
  };
};

The parent-child hierarchy would be:

  • MyClass (c class)
    • MyMemberEnum (c enum, defined in MyClass)
      • MyMemberVal1 (c enumerator defined in MyMemberEnum)
      • MyMemberVal2 (c enumerator defined in MyMemberEnum)
    • pureVirtualMemberFunction() (c member function, included because of its declaration even though there is no definition)
    • memberFunction() (c member function defined in MyClass)
      • param (c parameter defined in memberFunction)
      • sorted (c local object defined in memberFunction)
      • unnamed lambda function (c lambda function defined in memberFunction)
        • lhs (c parameter)
        • rhs (c parameter)
    • mMemberData (c member object defined in MyClass)
    • unnamed struct (c unnamed member struct defined in MyClass)
      • c (c member object)
      • d (c member object)
  • MyNestedStruct (c member struct)
    • a (c member object)
    • b (c member object)

Notice that MyNestedStruct is not a child of MyClass when calculating dependencies (although it would show up as a child in the Architecture Browser). So MyClass can have dependencies to MyNestedStruct. In fact, it does have one, because mMemberData types MyNestedStruct. Also notice the recursive application of the rules: even the parameter lhs belongs to MyClass, by following define references up to the top.

For architectures, fitting with the exception rule, a parent includes all of its descendants except for those descendants tagged elsewhere. So, for the following architecture:

  • Parents Example 2
    • Classes
      • MyClass
    • Functions
      • memberFunction()

The enumerator MyMemberVal1 belongs in Classes through the grandparent MyClass, but the parameter “param” belongs to Functions through the parent memberFunction(). A dependency would exist from Functions to Classes because the parameter param types MyMemberEnum.

Other Languages

The examples so far have all been in C/C++, but Understand parses other languages too. How do they work?

In general, the dependency calculation for languages other than C has not changed significantly. The compile-time versus link-time dependency distinction is only made for C entities. For other languages, an entity belongs to the file it is defined in, and parent/child relationships for non-file entities use the parser-determined parent() function (available from the APIs and visible in the “API Info” Interactive Report). The parent() function is typically also based on the define reference. The main change that affects other languages is the exception-style architecture change: a directly tagged non-file entity is no longer considered part of the architecture its file parent belongs to.

One language besides C/C++ does have specific rules: Ada. In Ada, as in C/C++, there can be multiple file parents for an entity. Usually, the choice is between the spec file and the body file. In Ada, file parents are chosen based on the reference kind (in contrast to C++, where file parents are chosen based on the include tree of the scope file). In general, most reference kinds map to the body file, but use package and with references are dependencies to the spec file. The idea, to quote the engineer responsible for Ada, is “Bodies depend on specs, subunits depend on bodies, variable uses/sets depend on defining file, calls make a dependency to the definition (actual body).”