Abstract: Cool Git plugins for Understand to access commits, authors, owners, cohesion, coupling, and dates.
The article “Code as a Crime Scene” discusses the importance of time-based metrics. For instance, suppose there are two files with high cyclomatic complexity. The first is modified every week, and the second hasn’t been touched in ten years. The cyclomatic complexity of the first file is likely a bigger problem than the cyclomatic complexity of the second file.
Time-based metrics can be calculated from version control systems. Understand calculates code quantity and quality metrics. With metric plugins, the two sets of metrics can be combined in Understand and visualized with tree maps or calculated over custom groups (architectures). Here are six pieces of information from Git you can access with plugins.
1. Commits
The Git metric plugin defines five metrics. Git commits is the number of commits that touched a given file or directory. From “Code as a Crime Scene,” “This metric is based on the idea that code that has changed in the past is likely to change again.” For OpenSSL, a tree map sized by sum cyclomatic complexity and colored by Git commits is:
In this case, the cyclomatic complexity of ssl_lib.c (1019) is more likely to be a problem than that of cmp_ctx_test.c (1384). Even though cmp_ctx_test.c has a higher complexity, ssl_lib.c is more likely to change again (778 commits compared to 36).
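To make the commits metric concrete, here is a minimal sketch of how a per-file commit count could be derived outside Understand from the text of `git log --name-only --pretty=format:"COMMIT %H"`. The `COMMIT ` marker, the function name, and the sample log are assumptions made for this example. It is not the plugin's implementation.

```python
from collections import Counter

def commits_per_file(log_text):
    """Count how many commits touched each path.

    Assumes log_text came from something like:
        git log --name-only --pretty=format:"COMMIT %H"
    so each commit starts with a "COMMIT <hash>" line followed by the
    paths it changed. The "COMMIT " marker is an assumption of this sketch.
    """
    counts = Counter()
    files_in_commit = set()
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("COMMIT "):
            counts.update(files_in_commit)  # flush the previous commit
            files_in_commit = set()
        elif line:
            files_in_commit.add(line)
    counts.update(files_in_commit)          # flush the last commit
    return counts

# Hypothetical log text, not real OpenSSL history.
sample_log = """COMMIT a1
src/a.c
include/a.h

COMMIT b2
src/a.c

COMMIT c3
test/a_test.c
"""
counts = commits_per_file(sample_log)
print(counts["src/a.c"])
```

Collecting paths in a set per commit ensures a file is counted once per commit even if it appears twice in the listing.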
2. Authors
Another metric from “Code as a Crime Scene” is the number of unique authors. The metric is based on a research paper titled “The Influence of Organizational Structure on Software Quality,” which breaks it down into former and current employees. It comes from the idea that communication between authors touching the same piece of code decreases the likelihood of errors. So the more authors touching the code, the more lines of communication are required. A set of only four authors has six lines of communication (3 + 2 + 1), but 20 authors require 190 (20*19/2). For OpenSSL, the tree map with size by sum cyclomatic complexity and color by authors is:
Again, ssl_lib.c would be prioritized higher than cmp_ctx_test.c with 69 unique authors compared to 7. But now s_client.c stands out more with 78 authors (625 sum cyclomatic complexity, 541 commits).
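The lines-of-communication arithmetic in this section is the number of distinct author pairs, n(n - 1)/2. A small sketch, with a function name chosen only for illustration:

```python
def communication_lines(authors):
    """Number of distinct author pairs: n * (n - 1) / 2."""
    return authors * (authors - 1) // 2

for n in (4, 20):
    print(n, communication_lines(n))
```

Applied to the author counts quoted above, 7 authors give 21 pairs and 69 authors give 2,346.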
Architecture
A tree map is great for comparing two metric values. But what if I want to create groups instead? For example, I might want to group files by the last Git author that touched them. Architecture plugins can be used to programmatically create architectures (custom groups). Architectures can then be used to calculate summary metrics or graphs.
3. Ownership And Contributors
The count of unique authors isn’t the only author metric. The three technical debt metrics listed by StepSize are all calculated from Git. The first one is ownership, based on the research paper “Don’t Touch My Code! Examining the Effects of Ownership on Software Quality.” The other two are cohesion (discussed below) and churn, which is similar to the commits metric above and is discussed in the Dates section.
Architecture
Focusing on ownership, the owner of a file is the author who has made the most commits to that file. There’s an architecture plugin for Git owners as well.
Comparing the results to the Git author architecture, Dr. Matthias St. Pierre owns slightly more files than he was the last to touch, but the owned files are mostly header files and have fewer dependencies on other owners.
Metric
As a metric, ownership is the percentage of commits made by the owner. Since the owner is expected to know the most about the file, a higher ownership percentage is better. For OpenSSL, the ownership tree map is:
Here, darker colors are better since they represent higher ownership. Following the previous files, ssl_lib.c has an ownership of 26.61%, cmp_ctx_test.c has 61.11%, and s_client.c has an ownership of 19.96%.
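The owner and ownership definitions can be sketched in a few lines. This is an illustration under assumptions, not the plugin's code: the input is one author name per commit on a file, the names are hypothetical, and ties are broken arbitrarily.

```python
from collections import Counter

def ownership(commit_authors):
    """Return (owner, percentage of commits made by the owner).

    commit_authors holds one author name per commit that touched the file.
    Ties go to whichever author Counter lists first, an arbitrary choice
    made for this sketch.
    """
    if not commit_authors:
        return None, 0.0
    owner, owner_commits = Counter(commit_authors).most_common(1)[0]
    return owner, 100.0 * owner_commits / len(commit_authors)

# Hypothetical history for one file: 5 commits by three authors.
history = ["alice", "bob", "alice", "carol", "alice"]
owner, pct = ownership(history)
print(owner, f"{pct:.2f}%")
```

For scale, the 61.11% ownership quoted for cmp_ctx_test.c is consistent with the owner having made 22 of its 36 commits.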
Major Contributions & Minor Contributions
The research paper on ownership also broke down authors into two categories. Those authors who made at least 5% of the total commits are major contributors. The remaining contributors are minor contributors. A minor contributor in one area is usually a major contributor to a different area. Including minor contributors improved models predicting failures.
For OpenSSL, the tree map for minor contributors is:
Again following the previous files, ssl_lib.c has 64 minor contributors, cmp_ctx_test.c has 4, and s_client.c has 74.
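The major/minor split can be sketched the same way. The assumption here is that the 5% threshold applies to the commits of the individual file. The function and author names are hypothetical.

```python
from collections import Counter

def split_contributors(commit_authors, threshold=0.05):
    """Split a file's authors into major and minor contributors.

    An author with at least `threshold` (5% by default) of the file's
    commits is major; everyone else is minor. Applying the threshold per
    file is an assumption of this sketch.
    """
    counts = Counter(commit_authors)
    total = len(commit_authors)
    major = sorted(a for a, n in counts.items() if n / total >= threshold)
    minor = sorted(a for a, n in counts.items() if n / total < threshold)
    return major, minor

# Hypothetical file with 40 commits: carol made 1 of them (2.5%).
history = ["alice"] * 30 + ["bob"] * 9 + ["carol"]
major, minor = split_contributors(history)
print(major, minor)
```

The minor contributors metric would then be `len(minor)` for each file.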
Interactive Report
What if I want to know who the owners are for a particular file? I’d have to search for the file in the Git Owner architecture, which isn’t very convenient. Or, what if I wanted to know the number of commits for each author on the file? The metric plugin knows those values to calculate ownership, but that information can’t be displayed with metrics. Instead, I’ll need to use an interactive report. An interactive report can display custom information about an entity, architecture, or database. The authors ireport can be used to show author details. For ssl_lib.c, the result is:
4. Cohesion
The second technical debt metric described by StepSize is cohesion. A cohesive commit is a commit whose changed files are all in the same path. A non-cohesive commit has files with different paths. The cohesion of a project is the percentage of cohesive commits.
There is a different metric plugin for this metric than the other five, and it is only calculated for architectures. For the root of the Directory Structure architecture, the metric definition matches that provided by StepSize. A cohesive commit touches only project files with the same direct parent architecture. A non-cohesive commit touches files with different direct parents.
So, given this architecture:
- Directory Structure
  - Parent Folder
    - Child File
    - Child Folder
      - Grandchild File
  - Parent Folder
A commit modifying “Child File” and “Grandchild File” is not cohesive because one file belongs to “Child Folder” and the other to “Parent Folder.”
For OpenSSL, the cohesion using Directory Structure is 74.13%. The cohesion can also be calculated for other root architectures besides Directory Structure. For example, the cohesion of the Git Author architecture is 66.40%. The Git Owner architecture cohesion is 74.51%.
Finally, the cohesion can be calculated for a child architecture. In that case, it works a little differently, counting all descendants. Using the previous example, if the child architecture is “Parent Folder” then both “Child File” and “Grandchild File” count as belonging to “Parent Folder.” A cohesive commit is one that touches only descendants of “Parent Folder,” and a non-cohesive commit includes other files. For OpenSSL, the crypto folder (which contains subfolders) has a cohesion of 72.79%.
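Both flavors of cohesion can be sketched over a list of commits, where each commit is the list of paths it changed. The paths and function names are hypothetical. Restricting the child calculation to commits that touch at least one descendant is an assumption of this sketch, not something the text above specifies.

```python
from pathlib import PurePosixPath

def root_cohesion(commits):
    """Percentage of commits whose files all share one direct parent folder."""
    if not commits:
        return 0.0
    cohesive = sum(
        1 for files in commits
        if len({PurePosixPath(f).parent for f in files}) == 1
    )
    return 100.0 * cohesive / len(commits)

def child_cohesion(commits, folder):
    """Cohesion for one folder, counting every descendant as belonging to it.

    Only commits touching at least one descendant are considered (an
    assumption); such a commit is cohesive when it touches nothing outside
    the folder.
    """
    root = PurePosixPath(folder)

    def inside(path):
        return root in PurePosixPath(path).parents

    relevant = [files for files in commits if any(inside(f) for f in files)]
    if not relevant:
        return 0.0
    cohesive = sum(1 for files in relevant if all(inside(f) for f in files))
    return 100.0 * cohesive / len(relevant)

commits = [
    ["parent/child_file.c", "parent/child/grandchild_file.c"],
    ["parent/child_file.c"],
    ["parent/child/grandchild_file.c", "other/file.c"],
    ["other/file.c"],
]
print(root_cohesion(commits))
print(child_cohesion(commits, "parent"))
```

The first commit shows the difference: it is not cohesive at the root, because the two files have different direct parents, but it is cohesive for “parent,” because both files are descendants.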
5. Coupling
The article “Code as a Crime Scene” also discusses coupling. Given a pair of files, the coupling between them is the number of commits that changed both files.
The logic from “Code as a Crime Scene” is that coupled files can indicate unwanted dependencies. Some coupling is expected, like a change in a header file is likely to change the corresponding code file, or a producer-consumer file pair is likely to change together. But coupling can also be a sign of copy-paste code that should be refactored out into a common location or more subtle unwanted dependencies.
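Counting coupling for every pair of files is a short exercise once each commit is available as a list of changed paths. A minimal sketch with hypothetical file names:

```python
from collections import Counter
from itertools import combinations

def pair_coupling(commits):
    """Count, for every unordered pair of files, the commits changing both."""
    pairs = Counter()
    for files in commits:
        # Sorting makes (a, b) and (b, a) land on the same key.
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs

commits = [
    ["a.c", "a.h"],
    ["a.c", "a.h", "b.c"],
    ["b.c"],
]
coupling = pair_coupling(commits)
print(coupling[("a.c", "a.h")])
```

Commits that touch many files generate a quadratic number of pairs, so a real implementation might skip very large commits. That is a design consideration, not something the article states.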
Since this value operates on pairs of files, the easiest way to display the information is an interactive report. The report will also display the percentage of commits relative to the number of commits of the target file. An example for ssl_lib.c is:
Metrics
If I know the file I’m interested in, the interactive report shows me the information I need. But if I’m looking at a project with hundreds or thousands of files, how do I know where to start? The article doesn’t mention any per-file metrics, so I’ll invent a few. The metric plugin for coupling provides the following values:
- Git Max Coupling: The highest coupling percentage
- Git Average Coupling: The average coupling percentage
- Git Coupled Files: The total number of coupled files
- Git Strongly Coupled Files: The number of coupled files whose coupling percentage is >= 50%
Now I can use tree maps to get a feel for my project:
Of the four metrics, I’d probably start with “Strongly Coupled Files.” For instance, the dark blue file in the bottom right quadrant is obj_local.h with a value of 55. That means if I change that file, there’s at least a 50% chance based on history that I’m going to have to change 55 other files at the same time.
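The four per-file values can be sketched as follows. Percentages are relative to the number of commits touching the target file. Averaging over coupled files only is an assumption of this sketch, and the file names are hypothetical.

```python
from collections import Counter

def coupling_metrics(commits, target, strong_cutoff=50.0):
    """Per-file coupling summary for `target`.

    Percentages are relative to the number of commits touching `target`.
    Averaging over coupled files only (rather than all project files) is
    an assumption of this sketch.
    """
    touching = [set(files) for files in commits if target in files]
    shared = Counter(f for files in touching for f in files if f != target)
    percents = [100.0 * n / len(touching) for n in shared.values()]
    return {
        "max": max(percents, default=0.0),
        "average": sum(percents) / len(percents) if percents else 0.0,
        "coupled_files": len(percents),
        "strongly_coupled_files": sum(p >= strong_cutoff for p in percents),
    }

commits = [
    ["a.h", "a.c"],
    ["a.h", "a.c", "b.c"],
    ["a.h", "c.c"],
    ["a.h"],
    ["b.c"],
]
metrics = coupling_metrics(commits, "a.h")
print(metrics)
```

Here a.h has four commits. It shares two with a.c (50%) and one each with b.c and c.c (25%), so only a.c counts as strongly coupled.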
Graph
The interpretation of coupling reminds me of dependency networks. Both use the project graph to predict how changing one file will impact the other files in the project. I can view a dependency overview graph similar to this article with Understand builds later than 1164. For OpenSSL, the graph is:
I’ll make a graph plugin (that also requires build 1164 or later for the layout) to make the same kind of graph with coupling information instead of dependencies. An edge exists from file A to file B if the coupling percentage is greater than a cutoff. The cutoff is necessary because too many edges will hang the UI during rendering, and such a dense graph isn’t very useful anyway. The default cutoff is 50%. The dependency overview graph uses lines of code to size the nodes, but I’ll use number of commits for this graph. The coloring is the same (color by architecture with edge color by the source node).
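The edge rule can be sketched independently of any graph API. The assumption here is that the percentage for an edge from A to B is measured against A's commits, which makes edges directional. The names are hypothetical.

```python
from collections import Counter
from itertools import permutations

def coupling_edges(commits, cutoff=50.0):
    """Return directed edges (a, b, percent) where the share of a's commits
    that also changed b is greater than `cutoff`.

    Measuring the percentage against the source file's commit count is an
    assumption of this sketch.
    """
    file_commits = Counter()
    shared = Counter()
    for files in commits:
        unique = set(files)
        file_commits.update(unique)
        shared.update(permutations(sorted(unique), 2))
    edges = []
    for (a, b), n in sorted(shared.items()):
        percent = 100.0 * n / file_commits[a]
        if percent > cutoff:
            edges.append((a, b, percent))
    return edges

commits = [["a.h", "a.c"], ["a.h", "a.c"], ["a.h"], ["a.h", "b.c"]]
edges = coupling_edges(commits)
print(edges)
```

In this toy history every commit to a.c also changes a.h, so there is an edge from a.c to a.h. Only half of the commits to a.h change a.c, so there is no edge in the other direction at the default cutoff.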
Interestingly, the Directory Structure architecture seems to reflect the actual Git coupling pretty well. Coupling also formed natural clusters, unlike dependencies.
6. Dates
Recall that the final technical debt metric listed by StepSize was Git churn. The number of commits gives an idea of how much a file has changed. But, what if I want to know how recently a file has changed? StepSize breaks down files as:
- Active: at least two changes in the past month
- Recurrently Active: active for more than one month
- Stable
It’s normal for files involved with developing features to be active. But recurrently active files may be a sign of technical debt since the file may be changing frequently due to many bugs or unwanted dependencies. There’s an architecture plugin for breaking files down into these three categories, where a “month” is defined as the past 30 days rather than the current calendar month. The version of OpenSSL I’m using to demo is from 2022, so it is too old to have a useful stability architecture.
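A rough sketch of the three-way classification using 30-day windows. “Active for more than one month” is read here as meeting the active threshold in the current window and in the window before it. That reading is an assumption, as are the function name and the sample dates.

```python
from datetime import date, timedelta

def stability(commit_dates, today, window_days=30, min_changes=2):
    """Classify a file as "Active", "Recurrently Active", or "Stable".

    Assumption of this sketch: a file is recurrently active when it meets
    the "active" threshold in the current 30-day window and also in the
    30-day window before it.
    """
    def changes(start_offset):
        newest = today - timedelta(days=start_offset)
        oldest = newest - timedelta(days=window_days)
        return sum(1 for d in commit_dates if oldest < d <= newest)

    if changes(0) < min_changes:
        return "Stable"
    if changes(window_days) >= min_changes:
        return "Recurrently Active"
    return "Active"

today = date(2022, 6, 30)
recent = [date(2022, 6, 20), date(2022, 6, 25)]
print(stability(recent, today))
print(stability(recent + [date(2022, 5, 10), date(2022, 5, 20)], today))
print(stability([date(2022, 1, 1)], today))
```

Passing `today` explicitly keeps the function deterministic. It also shows why an old snapshot classifies everything as stable when evaluated against the current date.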
Other Architectures
I can also use Git to create calendar architectures. I’ll make four architecture variants based on commit and style. The commit is either the most recent or the earliest commit and the style is either absolute or relative. Then the four architectures are:
- Absolute from the last modified date (arch_modified.upy)
- Relative from the last modified date (arch_modified_rel.upy)
- Absolute from the earliest date (arch_created.upy)
- Relative from the earliest date (arch_created_rel.upy)
Relative architectures are the same style as the built-in Calendar architecture with groups such as “Today” and “This Week.” Since my OpenSSL version is from 2022, everything falls into the “Earlier” category. But the absolute calendars are interesting. These use the year and month.
So, the earliest files are from 1998, but all files have been modified as recently as 2018.
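The absolute calendar grouping amounts to bucketing files by the year and month of either their latest or their earliest commit. A minimal sketch follows. The “YYYY/MM” key format, the function name, and the sample history are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

def calendar_groups(file_dates):
    """Group files under "YYYY/MM" keys built from each file's date.

    The key format is an assumption of this sketch.
    """
    groups = defaultdict(list)
    for path, d in sorted(file_dates.items()):
        groups[f"{d.year}/{d.month:02d}"].append(path)
    return dict(groups)

# Hypothetical commit dates per file.
history = {
    "a.c": [date(1998, 12, 21), date(2018, 3, 5)],
    "b.c": [date(2016, 7, 1), date(2018, 3, 19)],
}
modified = calendar_groups({f: max(ds) for f, ds in history.items()})
created = calendar_groups({f: min(ds) for f, ds in history.items()})
print(modified)
print(created)
```

The only difference between the “modified” and “created” variants is whether the newest or the oldest commit date is used for each file.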
Metrics
Going back to “Code as a Crime Scene,” a file that has changed a lot in the past is likely to change again. But there’s a time dimension here, too. A file might have a lot of commits, but if they were all years ago then the file probably isn’t about to suddenly start changing. So, is there a metric I can use to find active files? I’ll create two metrics (plugin):
- Days since last modified
- Days since created
Then my tree maps are:
The scale isn’t ideal because there are some files in the top right that don’t have Git information and so defaulted to a value of 0. But I can still pick out files like t1_trce.c at the top that are more stable (it hasn’t changed since 2018).
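Both date metrics reduce to a date subtraction. In this sketch, files without Git history return 0, mirroring the default described above. The function name and dates are hypothetical.

```python
from datetime import date

def days_since(commit_dates, today):
    """Return (days since last modified, days since created).

    Files without Git history get (0, 0), mirroring the default of 0
    described in the text.
    """
    if not commit_dates:
        return 0, 0
    newest, oldest = max(commit_dates), min(commit_dates)
    return (today - newest).days, (today - oldest).days

dates = [date(2018, 1, 1), date(2018, 1, 31)]
modified_days, created_days = days_since(dates, date(2018, 3, 2))
print(modified_days, created_days)
```

Because a missing history and a file changed today both produce 0, a tree map cannot tell them apart. That is the scale problem noted above.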
Days since last modified is also available as a line metric, so I can use it to color Control Flow Graphs similar to the blame margin in the editor:
Conclusion
To review, the plugins referenced in this article are:
- Metric plugin for Commits, Authors, Ownership, Major Contributors, and Minor Contributors
- Metric plugin for Cohesion
- Metric plugin for Coupling
- Metric plugin for Dates
- Architecture plugin for the last Git author to modify a file
- Architecture plugin for the Git owner of a file
- Architecture plugin for Git stability
- Architecture calendar plugins
- Interactive Report plugin for authorship
- Interactive Report plugin for coupling
- Graph plugin for coupling
Looking for more cool plugins? Check out our plugin repository.