Debugging parallelism problems in Make

As I’m now working on the Yocto Project, I’ve a new i7 build machine which builds all of the distro with -j8 for speed (and builds up to 8 packages at once, just to make sure that all the cores are busy). I don’t actually know what -j level the autobuilders are using but they’ve 24 cores each… Anyway, lots of code is being built daily with high Make parallelism, so we’re good at finding subtle races in makefiles. Debugging these isn’t trivial or obvious at first, so I thought I’d blog about a few that I’ve encountered recently.

telepathy-glib

| Making all in telepathy-glib
| make[2]: Entering directory `/buildarea1/yocto-autobuilder/yocto-slave/nightly-x86/build/build/tmp/work/i586-poky-linux/telepathy-glib-0.19.2-r0/telepathy-glib-0.19.2/telepathy-glib'
| /bin/mkdir -p _gen
| ( cd . && cat versions/0.7.0.abi [...] versions/0.19.2.abi  ) | \
| 		/bin/grep '^tp_cli_.*_run_.*' > _gen/reentrant-methods.list.tmp
| /bin/sh: line 1: _gen/reentrant-methods.list.tmp: No such file or directory
| make[2]: *** [_gen/reentrant-methods.list] Error 1

So it creates a directory, and then fails to create a file in it? The hint is the error itself: “No such file or directory” tells you that _gen/ isn’t present. What isn’t obvious from the output is that make is running the mkdir and the subshell that writes reentrant-methods.list.tmp in parallel, which you can confirm by looking at the makefile. It’s rather large, but the gist is that the rule that does the mkdir isn’t a dependency of the rule that generates reentrant-methods.list. Both must instead be dependencies of some higher-level target, and are therefore free to run in parallel.

Most of the time the mkdir happens first, but occasionally the subshell wins the race and _gen/ doesn’t exist yet. Once this is understood, it’s a simple matter to add the missing dependencies to the makefile.
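A minimal sketch of that kind of fix, assuming GNU Make. This is not the actual telepathy-glib patch: the ABI_FILES variable is a name made up for illustration, it lists only the two .abi files visible in the log above (the real list is longer), and the final mv step is an assumption about how the .tmp file becomes the target.

```makefile
# Hypothetical variable: only the two inputs visible in the build log.
ABI_FILES = versions/0.7.0.abi versions/0.19.2.abi

# Rule that creates the output directory.
_gen:
	mkdir -p $@

# "| _gen" makes the directory an order-only prerequisite: it must exist
# before this recipe starts, but a change to the directory's timestamp
# does not by itself cause the list to be regenerated.
# Note that grep exits non-zero if nothing matches, which fails the rule.
_gen/reentrant-methods.list: $(ABI_FILES) | _gen
	cat $(ABI_FILES) | grep '^tp_cli_.*_run_.*' > $@.tmp
	mv -f $@.tmp $@
```

With the prerequisite declared, make -j can no longer start the recipe that writes into _gen/ before the mkdir has finished. Running mkdir -p $(@D) at the top of the recipe would be another way to get the same ordering.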

gThumb

This was more fun. When building with any level of parallelism, make would busy-loop forever. Annoying on your desktop, not so funny on a build server.

When make is running tasks sequentially, it knows that once a task has completed it can check whether the expected files have appeared, and so on. This logic changes with any level of parallelism, because multiple things are happening at once. Strangely, make solves this by busy-looping, watching for file changes (you can see this with --debug). Generally the expected files either appear or there is an error, but in this case make was spinning forever.

Digging into the rules for the enumeration generator shows some dependencies that are not required, and rather complex logic for putting the generated files in the right place. Complicated, and Doing It Wrong.

Writing to a temporary file and then atomically moving that to the right file is a good thing, and essential in parallel builds, as otherwise dependent rules could read a partially-written file. But this makefile is comparing the temporary file with the target and copying the file only if it’s different. This looks like an attempted optimisation to reduce rebuilds caused by the enum timestamp changing (won’t work: the enum re-generation is happening for a reason, so the rest of the source will rebuild too) and this is what is causing the problem: make is waiting for a file to change when it won’t ever change. Once this is understood the fix is simple and results in a cleaner makefile.
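As a rough sketch of the write-then-rename pattern (not gThumb’s actual rules: ENUM_HEADERS, enum-names.list and the sed command are hypothetical stand-ins for the real enum generator), the rule below always replaces the target, so its timestamp always moves forward when the recipe runs:

```makefile
# Hypothetical names, for illustration only.
ENUM_HEADERS = $(wildcard *.h)

# Generate into a temporary file, then rename it over the target.
# The rename happens in one step, so a rule running in parallel never
# reads a half-written file, and the target is always newer than its
# prerequisites once the recipe has finished.
enum-names.list: $(ENUM_HEADERS)
	sed -n '/typedef enum/p' $(ENUM_HEADERS) /dev/null > $@.tmp
	mv -f $@.tmp $@

# The compare-and-copy variant described above would instead end with
# something like:
#	cmp -s $@.tmp $@ || cp $@.tmp $@
# which leaves the target untouched, and still older than its
# prerequisites, whenever the generated contents are identical.
```

The extra /dev/null argument is only there so that sed does not wait on standard input if the wildcard matches nothing.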

WebKitGTK

Oh, WebKit… The one package that you need to build with -j to get a build time of less than two days, and it exposes a bug in Make 3.82 that causes it to fail with -j. Thanks for that, Make. For reference this is the WebKitGTK+ bug and this is the two-year-old Make bug.

5 thoughts on “Debugging parallelism problems in Make”

  1. about the enum files:

    So assuming this happens in the debug-modify-build cycle:
    1) You touch a rarely-used header file

    With your patch, this happens:
    2) The enums get rebuilt, because they depend on glib-object.h, which depends on every header
    3) Almost all C files get rebuilt, because they include the enums header

    What you want to avoid is step 3 – the enum header did not change, so there is no need to rebuild more than the few files that actually depend on the changed header.

    While this is not noticeable if you do full builds, it’s very noticeable in the debug-modify-build cycle – in particular with projects the size of WebKit. ;)

    1. I presume you are talking about similar rules in something like glib? The dependencies in gThumb are purely the headers that the enums are generated from.
