SciTools Blog

Getting Git: Scanning Directories

Posted on November 11, 2021

Abstract: My algorithm for finding files at a particular git revision.

Understand databases come in two main flavors: watched directory projects and imported projects. A watched directory project decides which files belong in the project by scanning the file system. An imported project uses a Visual Studio project/solution, an Xcode project, or a CMake compile commands JSON file to determine the files in the project and their settings.

I’m working on a feature to create a database at a provided git commit. For it to work correctly, watched directory projects need to scan the git repository instead of the file system to find files in the project. 

The current directory scanning code uses Qt. The basic algorithm is something like this:

void RescanDir::rescanImpl(const QDir &dir, QList<QFileInfo> &files)
{
  foreach (const QFileInfo &info, dir.entryInfoList(kFilters)) {
    // Report progress.

    // Handle regular files.
    QString ext = info.suffix();
    if (!info.isDir() || kBundleExts.contains(ext)) {
      if (/*file doesn't match options*/)
        continue;

      // Add the file.
      files.append(info);
      continue;
    }

    // Handle directories
    if (!mRecursive || /*directory doesn’t match options*/)
      continue;

    // Enter subdirectory.
    rescanImpl(QDir(info.filePath()), files);
  }
}

For Git access, Understand uses a wrapper around libgit2. The wrapper is essentially the same as the Git wrapper (src/git) in the open-source GitAhead project, also written by SciTools. I’m by no means an expert on Git, but in general tree objects are kind of like directories and blobs are kind of like files. So, a rescan operation for Git would be something like:

void rescanImpl(const git::Tree & tree, const QString & curPath, QStringList & files)
{
  for (int i = 0; i < tree.count(); i++) {
    QString path = curPath + "/" + tree.name(i);
    git::Object object = tree.object(i);
    git::Tree subTree(object);
    if (!subTree) {
      if (/*file doesn't match options*/)
        continue;
      
      files.append(path);
      continue;
    }
    
    if (/*directory doesn't match options*/)
      continue;
    rescanImpl(subTree, path, files);
  }
}

It seems simple enough, but there are some problems. The first problem is ignored files. It’s relatively common practice to have git ignore automatically generated files, but those files may still be part of the Understand project. So, when scanning for project files, what we actually want is a list of files that are in git plus any files that exist on disk that are ignored by git.

void gitFilter(const QDir & dir, QList<QFileInfo> & onDisk)
{
  QString path = mRepo.workdir().relativeFilePath(dir.absolutePath());
  git::Id treeId = mCommit.tree().id(path);
  // TODO Convert the treeId to a tree object
  QSet<QString> gitEntries;
  for (int i = 0; i < tree.count(); i++)
    gitEntries.insert(tree.name(i));
  
  // Remove any entries from the file info list that are neither in the
  // commit nor ignored by git
  QSet<QString> found;
  auto iter = onDisk.begin();
  while (iter != onDisk.end()) {
    QString filename = iter->fileName();
    if (gitEntries.contains(filename) ||
        mRepo.isIgnored(path + "/" + filename)) {
      found.insert(filename);
      ++iter;
    } else {
      // doesn't exist in the repository for this commit, remove
      iter = onDisk.erase(iter);
    }
  }

  // Build the entries that exist in git but not on disk
  foreach (const QString & gitFile, gitEntries) {
    if (!found.contains(gitFile))
      onDisk.append(QFileInfo(dir.filePath(gitFile)));
  }
}

There is one immediately obvious problem with the code. The wrapper library doesn’t provide a way to find a tree object by ID or by path. There is a function to look up a blob by ID in Repository:

Blob Repository::lookupBlob(const Id &id) const
{
  git_object *obj = nullptr;
  git_object_lookup(&obj, d->repo, id, GIT_OBJECT_BLOB);
  return Blob(reinterpret_cast<git_blob *>(obj));
}

Adding a function to look up a tree turns out to be pretty simple:

Tree Repository::lookupTree(const Id &id) const
{
  git_object *obj = nullptr;
  git_object_lookup(&obj, d->repo, id, GIT_OBJECT_TREE);
  return Tree(reinterpret_cast<git_tree *>(obj));
}

Now, to test it. Unfortunately, it doesn’t work. The problem is that QFileInfo::isDir() and the related functions only work if the file exists. Even ensuring the directories end in ‘/’ won’t change the result. So, the file system rescan needs additional information for the files that only existed in git. How much? The file type is probably enough. Again, there isn’t a wrapper function to access the type. But adding one is just a matter of finding the libgit2 function to wrap: 

git_filemode_t Tree::filemode(int index) const
{
  const git_tree_entry *entry = git_tree_entry_byindex(*this, index);
  return git_tree_entry_filemode(entry);
}

Now, we can return a map from the name to the file type from the filter function, and update the rescan code:

void RescanDir::rescanImpl(const QDir &dir, QList<QFileInfo> &files)
{
  auto dirEntries = dir.entryInfoList(kFilters);
  auto gitOnly = VersionControlManager::instance()->filteredEntries(
                   dir, dirEntries, mCommit);
  foreach (const QFileInfo &info, dirEntries) {
    // Report progress.

    // Check git
    bool isGit = gitOnly.contains(info.fileName());
    git_filemode_t gitMode = gitOnly.value(info.fileName(),
                                           GIT_FILEMODE_UNREADABLE);
    // Handle regular files.
    QString ext = info.suffix();
    // info.isDir() only works for files that exist (not git only files)
    bool isDir = isGit ? (gitMode == GIT_FILEMODE_TREE ||
                          gitMode == GIT_FILEMODE_COMMIT) : info.isDir();
    if (!isDir || kBundleExts.contains(ext)) {
      if (/*file doesn't match options*/)
        continue;

      // Add the file.
      files.append(info);
      continue;
    }

    // Handle directories
    if (!mRecursive || /*directory doesn't match options*/)
      continue;

    // Enter subdirectory.
    rescanImpl(QDir(info.filePath()), files);
  }
}

There is one other problem to be solved: submodules. That topic will be covered in another post, but a small hint is that GIT_FILEMODE_COMMIT is in the code. From the rescan function’s perspective, that condition ensures submodules are treated like directories.
