TL;DR
I sped up our build system by using a simple Python script to create an architecture from our CMake build dependencies, then using that architecture to identify circular dependencies.
Details
Like most of our engineers, I prefer the Mac over Windows for most tasks. With more engineers focused on it (and because it's better), the Mac version of Understand now builds significantly faster than the Windows version (3 minutes vs. 18 minutes). My manager, however, uses Windows, and he asked me to look into making the Windows build faster. Alas, my initial suggestion (that he buy a Mac instead) was ignored, so down the rabbit hole I went!
The initial analysis showed that most of the build time is spent linking, so that is the step I'm going to try to speed up. Some reading suggests that converting the static libraries Understand uses into object libraries would significantly speed up linking, so I'll start with that.
Understand is built using CMake, with lots of small static libraries linked into the final executable. Since each library lives in its own directory, the Directory Structure architecture is a good approximation of how files are linked together. So I can use an architecture dependency graph to get a general idea of the link-time dependencies between the various static libraries.
Unfortunately for me, a general idea of link-time dependencies wasn't sufficient once I started hitting circular dependency errors from CMake. I needed to see exactly which dependencies were causing the cycles.
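Understand's architecture graphs show these cycles visually. As a complementary sketch (not part of the original script), if the link dependencies have already been extracted into a dictionary mapping each library to the libraries it links against, a depth-first search can report one cycle. The library names below are hypothetical.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of names, or None if there is none.

    deps maps each library name to the list of libraries it links against.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for dep in deps.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so we have come back to it.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in deps:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical link dependencies: lib_a -> lib_b -> lib_c -> lib_a.
print(find_cycle({"lib_a": ["lib_b"], "lib_b": ["lib_c"], "lib_c": ["lib_a"]}))
```

The search is recursive, which is fine for a few hundred libraries; a very deep graph would need an explicit stack instead.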
How can Understand help? I could have built an architecture in the Understand GUI, but Understand has over 10,000 source files, and what programmer would assign those by hand when a script can do it instead? I'll write the script in Python. It uses the old XML format for architectures, so it doesn't even require the Understand Python API.
You can download the full script here, but I break it down below.
The script needs to parse CMakeLists.txt files. A typical CMakeLists.txt file for Understand looks like this:
The first bit of the script imports libraries:
Next, the script defines regular expressions for finding text in CMakeLists.txt files like the one above. I want to find every add_library line, use the library name as an architecture name, and treat the rest of the argument list as the files belonging to that architecture. So I have this list of regular expressions (including some for handling variables like ${REFACTOR_UI_HDRS}):
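A minimal sketch of such patterns, assuming each target is declared as add_library(name ...) or add_executable(name ...) with a plain, whitespace-separated argument list. The KEYWORDS set, the find_targets helper, and the sample text are illustrative assumptions, not the original script.

```python
import re

# Captures the target kind, the target name, and everything up to the closing ")".
TARGET_RE = re.compile(r"\badd_(library|executable)\s*\(\s*([\w.-]+)([^)]*)\)", re.IGNORECASE)

# CMake keywords that may appear in the argument list but are not files.
KEYWORDS = {"STATIC", "SHARED", "MODULE", "OBJECT", "INTERFACE",
            "WIN32", "MACOSX_BUNDLE", "EXCLUDE_FROM_ALL"}


def find_targets(text):
    """Return (kind, name, items) for each add_library/add_executable call in text."""
    targets = []
    for match in TARGET_RE.finditer(text):
        kind, name, rest = match.groups()
        items = [tok for tok in rest.split() if tok.upper() not in KEYWORDS]
        targets.append((kind.lower(), name, items))
    return targets


sample = """
add_library(refactor STATIC
  refactor.cpp
  ${REFACTOR_UI_HDRS}
)
"""
print(find_targets(sample))
# [('library', 'refactor', ['refactor.cpp', '${REFACTOR_UI_HDRS}'])]
```

Because [^)]* stops at the first closing parenthesis, generator expressions or comments containing parentheses inside the list would break this pattern.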
To handle CMake variables properly, I need a dictionary mapping each variable name to the list of files it represents.
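A minimal sketch of that dictionary, assuming variables are defined with simple set(NAME item ...) calls in the same file and referenced as whole items like ${REFACTOR_UI_HDRS}. Unknown variables (for example, ones set in another file or in the CMake cache) are simply dropped, which mirrors the script's limitation there. The function names and file names are hypothetical.

```python
import re

# \bset avoids matching inside unset(...); [^)]* assumes no nested parentheses.
SET_RE = re.compile(r"\bset\s*\(\s*(\w+)([^)]*)\)", re.IGNORECASE)
VAR_REF_RE = re.compile(r"^\$\{(\w+)\}$")


def expand(items, variables):
    """Replace each ${VAR} item with the files VAR holds; unknown variables are dropped."""
    result = []
    for item in items:
        ref = VAR_REF_RE.match(item)
        if ref:
            result.extend(variables.get(ref.group(1), []))
        else:
            result.append(item)
    return result


def read_variables(text):
    """Map each set() variable in text to its expanded list of items."""
    variables = {}
    for name, rest in SET_RE.findall(text):
        # Definitions are processed in file order, so later ones can use earlier ones.
        variables[name] = expand(rest.split(), variables)
    return variables


sample = """
set(REFACTOR_UI_HDRS refactor_dialog.h refactor_preview.h)
set(REFACTOR_SRCS refactor.cpp ${REFACTOR_UI_HDRS})
"""
variables = read_variables(sample)
print(variables["REFACTOR_SRCS"])
# ['refactor.cpp', 'refactor_dialog.h', 'refactor_preview.h']
```

Keywords such as CACHE or PARENT_SCOPE in a set() call are not handled by this sketch.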
Now for the important part: building an architecture. The architecture XML format is old. It's used for imports from 5.1 projects, but it isn't how architectures are stored today, and it only supports absolute paths to files, not other entities. In short, consider this method likely to be deprecated once the Understand API can create architectures. For now, though, it's the only way, and it works well for lists of files like the ones CMake provides.
Each architecture in the XML format is an “arch” element with a “name” attribute and text containing a list of absolute file paths. Nested architectures are represented with nested elements. For example:
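The nesting described above can be sketched with Python's standard xml.etree.ElementTree module. This illustrates the structure only and is not a documented schema: the root name CMake, the example paths, the build_arch helper, and the one-path-per-line text layout are all assumptions.

```python
import xml.etree.ElementTree as ET


def build_arch(name, files=(), children=()):
    """Return an <arch name=...> element listing absolute paths, plus nested archs."""
    element = ET.Element("arch", name=name)
    if files:
        # Assumption: one absolute path per line inside the element text.
        element.text = "\n" + "\n".join(files) + "\n"
    element.extend(children)
    return element


tree = build_arch("CMake", children=[
    build_arch("util", files=["/src/util/strutil.cpp", "/src/util/strutil.h"]),
    build_arch("parsers", children=[
        build_arch("cpp", files=["/src/parsers/cpp/lexer.cpp"]),
    ]),
])
print(ET.tostring(tree, encoding="unicode"))
```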
The Python code for generating it is:
Notice that I had to provide a dirname argument to convert the files to absolute paths. The result argument is the regular expression match. CMake files generally don't list header files, but if there is only one library or executable in the current directory, I want to include all of the directory's header files, even ones without a matching .cpp file.
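A sketch of such a helper, using a hypothetical add_target_arch(parent, dirname, name, sources, only_target) signature that takes already-expanded file names rather than a regex match. It pairs each listed source with a same-named .h or .hpp header when one exists, and adds every header in the directory when the target is the only one there. The header extensions are an assumption.

```python
import os
import xml.etree.ElementTree as ET

HEADER_EXTS = (".h", ".hpp")  # assumed header extensions


def add_target_arch(parent, dirname, name, sources, only_target):
    """Append an <arch> for one CMake target to parent and return it."""
    dirname = os.path.abspath(dirname)
    files = set()
    for src in sources:
        path = os.path.normpath(os.path.join(dirname, src))  # make the path absolute
        files.add(path)
        stem = os.path.splitext(path)[0]
        for ext in HEADER_EXTS:  # include a header that shares the source's name
            if os.path.exists(stem + ext):
                files.add(stem + ext)
    if only_target:
        # Sole target in the directory: claim every header, matched or not.
        for entry in os.listdir(dirname):
            if entry.endswith(HEADER_EXTS):
                files.add(os.path.join(dirname, entry))
    arch = ET.SubElement(parent, "arch", name=name)
    arch.text = "\n" + "\n".join(sorted(files)) + "\n"
    return arch
```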
The main function loops over the CMakeLists.txt paths passed as arguments (sys.argv), reads each file, populates the variable dictionary, and then adds architectures for its libraries and executables. Finally, it writes the complete architecture to a file.
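A compact end-to-end sketch of that flow, combining simplified versions of the steps described above into one self-contained script. It assumes the same simple CMake conventions, produces a flat list of targets under a hypothetical CMake root rather than nested architectures, and writes to a hypothetical cmake_arch.xml in the current directory.

```python
import os
import re
import sys
import xml.etree.ElementTree as ET

# Simplified patterns: they assume plain, parenthesis-free argument lists.
SET_RE = re.compile(r"\bset\s*\(\s*(\w+)([^)]*)\)", re.IGNORECASE)
TARGET_RE = re.compile(r"\badd_(?:library|executable)\s*\(\s*([\w.-]+)([^)]*)\)", re.IGNORECASE)
VAR_RE = re.compile(r"^\$\{(\w+)\}$")
KEYWORDS = {"STATIC", "SHARED", "MODULE", "OBJECT", "INTERFACE",
            "WIN32", "MACOSX_BUNDLE", "EXCLUDE_FROM_ALL"}


def expand(items, variables):
    """Replace ${VAR} items with their values; unknown variables are dropped."""
    out = []
    for item in items:
        ref = VAR_RE.match(item)
        out.extend(variables.get(ref.group(1), []) if ref else [item])
    return out


def build_architecture(cmake_files):
    """Return a root <arch> element with one child <arch> per CMake target."""
    root = ET.Element("arch", name="CMake")  # hypothetical root name
    for cmake_file in cmake_files:
        dirname = os.path.dirname(os.path.abspath(cmake_file))
        with open(cmake_file, encoding="utf-8") as f:
            text = f.read()
        # Variables are scoped to a single file; cross-file variables are not handled.
        variables = {}
        for name, rest in SET_RE.findall(text):
            variables[name] = expand(rest.split(), variables)
        for name, rest in TARGET_RE.findall(text):
            sources = [s for s in expand(rest.split(), variables)
                       if s.upper() not in KEYWORDS]
            arch = ET.SubElement(root, "arch", name=name)
            # One absolute path per line (assumed layout).
            arch.text = "\n" + "\n".join(os.path.join(dirname, s) for s in sources) + "\n"
    return root


def main(argv):
    if len(argv) < 2:
        print("usage: cmakeArch.py CMakeLists.txt [CMakeLists.txt ...]", file=sys.stderr)
        return
    root = build_architecture(argv[1:])
    ET.ElementTree(root).write("cmake_arch.xml", encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    main(sys.argv)
```

Taking every path from the command line matches how find ... -exec cmakeArch.py {} + hands a batch of CMakeLists.txt paths to a single invocation.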
I ran the script on the directories I was interested in: only the core Understand code, not including libraries.
find util understand parsers CMakeLists.txt -name "CMakeLists.txt" -exec ~/projects/scripts/cmakeArch.py {} \+
Then I can import it using und, Understand's command-line interface:
und import -arch path/to/cmake_arch.xml path/to/my/project.und
Now I can get architecture dependency graphs that are much closer to how files are actually linked. The script has limitations: it assumes a particular CMake setup, doesn't handle variables defined in other files or in the CMake cache, and only handles some automatically generated files. But it's a lot closer to the actual library setup than the Directory Structure architecture, and I didn't have to assign 10,000+ files by hand. This new architecture helped me identify the circular dependencies in our build system, and eventually I was able to cut the Windows build time in half!