One fun problem in massively parallel OpenEmbedded builds is that tasks with bad dependencies, or just plain bugs, can cause failures due to races on disk.
One example of this happened last week, when an integration branch was being tested and one of the builds failed with a tar error, `file changed as we read it`, whilst it was generating the images. This means that the root filesystem was being altered whilst tar was reading it, so we have a parallelism problem. Only a limited number of tasks could be having this effect, so searching the log by hand isn't too difficult, but as they say: why do something by hand when you can write a script to do it for you?
`findfails` is a script that parses a BitBake log and maintains the set of currently active tasks, so when it finds a task that fails it can tell you which other tasks were running at the time:
```
$ findfails log
Task core-image-sato-dev-1.0-r0:do_image_tar failed
Active tasks are:
 core-image-sato-sdk-ptest-1.0-r0:do_rootfs
 core-image-sato-dev-1.0-r0:do_image_wic
 core-image-sato-dev-1.0-r0:do_image_jffs2
 core-image-sato-dev-1.0-r0:do_image_tar
 core-image-sato-sdk-1.0-r0:do_rootfs
```
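For the curious, the core idea fits in a few lines of Python. This is a minimal sketch rather than the actual script: it assumes the log records task events as lines of the form `NOTE: recipe <recipe>: task <task>: Started` (likewise `Succeeded` and `Failed`), which is worth checking against your own log, and the names `EVENT_RE` and `find_failures` are made up for the illustration.

```python
import re
import sys

# Assumed log line shape (verify against your own BitBake log):
#   NOTE: recipe core-image-sato-dev-1.0-r0: task do_image_tar: Started
EVENT_RE = re.compile(
    r"recipe (?P<recipe>\S+): task (?P<task>\S+): "
    r"(?P<state>Started|Succeeded|Failed)"
)


def find_failures(lines):
    """Yield (failed_task, active_tasks) for every task failure in the log.

    active_tasks is a snapshot of everything running when the failure was
    logged, including the failing task itself.
    """
    active = []  # tasks currently running, in the order they started
    for line in lines:
        match = EVENT_RE.search(line)
        if not match:
            continue
        name = f"{match['recipe']}:{match['task']}"
        state = match["state"]
        if state == "Started":
            active.append(name)
            continue
        if state == "Failed":
            # Snapshot before removing, so the caller sees the full picture.
            yield name, list(active)
        # Succeeded or Failed: either way the task is no longer running.
        if name in active:
            active.remove(name)


# Hypothetical command-line wrapper: pass the log file as the only argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for failed, running in find_failures(log):
            print(f"Task {failed} failed")
            print("Active tasks are:")
            for task in running:
                print(f" {task}")
```

Keeping the active tasks in a list rather than a set preserves start order, which makes the output a little easier to line up against the log.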
We knew that there were changes to `do_image_wic` in that branch, so it was easy to identify and drop the patch that was incorrectly writing to the rootfs source directory. Sorted!