As I'm now working on the Yocto Project,
I've a new i7 build machine which builds all of the distro with -j8
for speed (and builds up to 8 packages at once, just to make sure that
all the cores are busy). I don't actually know what -j level the
autobuilders are using but they've 24 cores each... Anyway, lots of code
is being built daily with high Make parallelism, so we're good at
finding subtle races in makefiles. Debugging these isn't trivial or
obvious at first, so I thought I'd blog about a few that I've
encountered recently.
telepathy-glib
| Making all in telepathy-glib
| make[2]: Entering directory `/buildarea1/yocto-autobuilder/yocto-slave/nightly-x86/build/build/tmp/work/i586-poky-linux/telepathy-glib-0.19.2-r0/telepathy-glib-0.19.2/telepathy-glib'
| /bin/mkdir -p _gen
| ( cd . && cat versions/0.7.0.abi [...] versions/0.19.2.abi ) |
| /bin/grep '^tp_cli_.run.' > _gen/reentrant-methods.list.tmp
| /bin/sh: line 1: _gen/reentrant-methods.list.tmp: No such file or directory
| make[2]: *** [_gen/reentrant-methods.list] Error 1
So it creates a directory, and then fails to create a file? The hint
is that the error is "no such file or directory", which tells you that
_gen/ isn't present. What isn't obvious from the output is that make
is running the mkdir and the subshell that writes reentrant-methods.list
in parallel, which you can confirm by looking at the makefile.
It's rather large, but the gist of it is that the rule that does the
mkdir isn't a dependency of the rule that generates
reentrant-methods.list, so both must be dependencies of some higher
target and are therefore being run in parallel.
Most of the time the mkdir happens first, but occasionally the subshell
wins the race and _gen/ doesn't exist yet. Once this is understood,
it's a simple matter to add the missing dependencies to the makefile.
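To make the missing dependency concrete, here is a minimal sketch of one way to express it in GNU Make, using an order-only prerequisite. This is an illustration, not necessarily the fix that went upstream: the target names follow the log above, but the ABI_FILES variable, the abbreviated file list and the simplified recipes are my assumptions.

```make
# Sketch only: recipes are simplified stand-ins for the real rules.
# ABI_FILES is a hypothetical variable; the list is abbreviated as in the log.
ABI_FILES = versions/0.7.0.abi versions/0.19.2.abi

# Racy shape: if the mkdir rule and the list rule are only siblings under a
# higher target, make -j is free to start both recipes at the same time.

# One possible fix: name the directory after the "|" as an order-only
# prerequisite, so it is created first but its timestamp never forces
# the list to be regenerated.
_gen:
	mkdir -p $@

_gen/reentrant-methods.list: $(ABI_FILES) | _gen
	cat $(ABI_FILES) | sed -n '/^tp_cli_/p' > $@.tmp
	mv -f $@.tmp $@
```

An order-only prerequisite is used rather than a normal one because a directory's timestamp changes whenever a file is added to it, which would otherwise trigger needless rebuilds.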
gThumb
This was more fun. When building with any level of parallelism, make
would busy-loop forever. Annoying on your desktop, not so funny on a
build server.
When make is running tasks sequentially, it knows when each task has
completed, so it can then check whether the expected files have appeared
and so on. This logic changes with any level of parallelism because
multiple things are happening at once. Strangely, make solves this by
busy-looping, watching for file changes (you can see this with --debug).
Generally the expected files either appear or there is an error, but in
this case make was spinning forever.
Digging into the rules for the enumeration generator shows some dependencies that are not required, and rather complex logic for putting the generated files in the right place. Complicated, and Doing It Wrong.
Writing to a temporary file and then atomically moving that to the right file is a good thing, and essential in parallel builds, as otherwise dependent rules could read a partially-written file. But this makefile is comparing the temporary file with the target and copying the file only if it's different. This looks like an attempted optimisation to reduce rebuilds caused by the enum timestamp changing (won't work: the enum re-generation is happening for a reason, so the rest of the source will rebuild too) and this is what is causing the problem: make is waiting for a file to change when it won't ever change. Once this is understood the fix is simple and results in a cleaner makefile.
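As a rough sketch of the difference, assuming a generator along the lines of glib-mkenums: the file names, the ENUM_HEADERS variable and the template are hypothetical and are not taken from gThumb's actual makefile.

```make
# Hypothetical names throughout; this only illustrates the shape of the rule.
ENUM_HEADERS = foo.h bar.h

# Problematic shape (shown as a comment): the target is only replaced when
# the content differs, so it may never be updated.
#
#   enum-types.h: $(ENUM_HEADERS)
#           generate > xgen-h
#           cmp -s xgen-h enum-types.h || cp xgen-h enum-types.h

# Simpler shape: always generate under a temporary name, then rename it
# over the target so readers never see a partially-written file.
enum-types.h: $(ENUM_HEADERS) enum-types.h.template
	glib-mkenums --template enum-types.h.template $(ENUM_HEADERS) > $@.tmp
	mv -f $@.tmp $@
```

The rename is atomic as long as the temporary file lives on the same filesystem as the target, which is why it sits next to the target rather than in /tmp.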
WebKitGTK
Oh, WebKit... The one package that you need to build with -j to get
a build time of less than two days, and it exposes a bug in Make 3.82
that causes the build to fail with -j. Thanks for that, Make. For reference, this
is the WebKitGTK+ bug and this is the two-year-old Make bug.