Getting Git: Submodules

Abstract: The limitations on how submodules are currently handled in Understand and an algorithm to improve it.

One of the git features currently in Understand is the ability to filter CodeCheck violations to only violations in or after particular git commits. 

During a demo, the presenter edited a file to add a violation, picked uncommitted changes, and ran CodeCheck. Unfortunately, no violations were found, making for an embarrassing demo. What was the problem? The file happened to be part of a submodule instead of the main repository.

For CodeCheck filters and also for an upcoming feature (git comparison databases), we’d really like submodules to work seamlessly with the main repository. From the user’s perspective, it shouldn’t matter if the file was part of the base repository or a submodule when calculating diffs or searching for files.

Internally, Understand uses a wrapper library around libgit2 to access git. The wrapper library comes from the open-source project GitAhead, also written by SciTools. The Git filter works by making a diff and checking if the files and line numbers of violations were changed in the diff. The constructor is:

GitFilesFilter(const QString & commitPrefix, Udb::Db * db, FileFilter::Relation relation)
    : FileFilter(), mDb(db), mInCommit(relation)
  {
    QString repoPath = mDb->project()->settings()->ComparisonOptions().repository(mDb->project()->parent());
    mRepo = git::Repository::open(repoPath);
    if (mRepo.isValid()) {
      git::Index index = mRepo.index();
      git::Reference headRef = mRepo.head();
      if (relation == AfterCommit)
        mStatusDiff = mRepo.status(index, nullptr, false, 0);

      // Can't merge commit diff to status diff (error message about conflicting
      // options) so track both.
      if (!commitPrefix.isEmpty()) {
        mCommit = mRepo.lookupCommit(commitPrefix);
        if (mCommit.isValid()) {
          if (relation == InCommit)
            mCommitDiff = mCommit.diff(git::Commit(),0,false);
          else if (headRef.isValid())
            mCommitDiff = headRef.target().diff(mCommit,0,false);
        }
      }
    }
  }

The problem is that just like with a normal git status, a changed submodule shows up as a single entry in the diff, not as the list of files in the submodule that have been changed.

A basic solution would be to recurse over all the submodules too. For the status diff, something like this:

if (relation == AfterCommit) {
  mStatusDiff = mRepo.status(index, nullptr, false, 0);
  QList<git::Submodule> subs = mRepo.submodules();
  while (!subs.isEmpty()) {
    git::Repository subRepo = subs.takeFirst().open();
    if (!subRepo)
      continue;
    git::Index subIndex = subRepo.index();
    mStatusDiff.merge(subRepo.status(index, nullptr, false, 0));
    subs << subRepo.submodules();
  }
}

If you were to try this solution, you would first realize that if mStatusDiff was null or the returned status diff was null, errors would occur on the merge line. But fixing those up, there’s still a major problem. Suppose you had the following files:

  • ./directory/fileInBaseRepo.cpp
  • ./submod/fileInSubMod.cpp

Assuming both are changed, then the paths in the diff that are stored are:

  • directory/fileInBaseRepo.cpp
  • fileInSubMod.cpp

How did fileInSubMod.cpp lose part of its path? The paths are always relative to the repository, but fileInSubMod was part of a submodule’s repository, not the base repository. So the relative path is wrong. To fix it, each diff must be stored separately and know the corresponding repository path.

To update the diff filter to work correctly, first a helper struct to store a diff with its repository.

  struct DiffRepo {
    git::Repository repo;
    git::Diff diff;
  };
  QList<DiffRepo> mDiffs;

Then, for the status diffs:

GitFilesFilter(const QString & commitPrefix, Udb::Db * db, FileFilter::Relation relation)
  : FileFilter(), mDb(db), mInCommit(relation)
{
  QString repoPath = mDb->project()->settings()->ComparisonOptions()
                     .repository(mDb->project()->parent());
  git::Repository repo = git::Repository::open(repoPath);
  process(relation, repo, relation == AfterCommit );
}

void process(
  FileFilter::Relation relation,
  git::Repository repo,
  bool statusDiff)
{
  if (!repo.isValid())
    return;

  // Status diff if requested
  if (statusDiff) {
    git::Index index = repo.index();
    git::Diff diff = repo.status(index, nullptr, false, 0);
    if (diff.isValid())
      mDiffs.append({repo, diff});
  }

  // Commit diff code will need to go here

  foreach (git::Submodule submod, repo.submodules()) {
    git::Repository subRepo = submod.open();
    if (subRepo.isValid())
      process(relation, subRepo, statusDiff);
  }
}

With that, the status diff works correctly, so the uncommitted changes CodeCheck filter will work. What about changes after a commit. That one isn’t so easy. The problem here is that the provided commit is for the base repository. We need to find what the submodule commit was given a parent repository’s commit.

// Test if the commit belongs to the current repository. Repository objects
// can't be compared directly, so compare the working directory.
if (commit.isValid() && commit.repo().workdir() != repo.workdir()) {
  // Clear the commit because it isn't part of the current repository
  git::Commit parCommit = commit;
  commit = git::Commit();

  // Determine if the current repository is a submodule of the commit's
  // repository by comparing the paths
  QString commitRepoPath = parCommit.repo().workdir().absolutePath();
  QString subPath = repo.workdir().absolutePath();
  if (subPath.startsWith(commitRepoPath)) {
    
    // Look up the path to the submodule in the parent repository
    QString relative = parCommit.repo().workdir().relativeFilePath(subPath);
    
    // Get the submodule commit at the parent repository's commit
    git::Id id = parCommit.tree().id(relative);
    
    // convert it
    commit = repo.lookupCommit(id);
  }
}

With the commit converted, it’s now possible to calculate the diff:

    if (commit.isValid()) {
      git::Diff commitDiff;
      if (relation == InCommit)
        commitDiff = commit.diff(git::Commit(),0,false);
      else if (relation == AfterCommit && headCommit.isValid())
        commitDiff = headCommit.diff(commit,0,false);

      if (commitDiff.isValid())
        mDiffs.append({repo,commitDiff});
    }

The same commit conversion code can be used while scanning a git tree for project files. This fix hasn’t been released yet since it’s part of the larger git comparison databases feature. But for now, if you hit the same problem our presenter did, you’ll know the reason.