One fun problem in massively parallel OpenEmbedded builds is that tasks with bad dependencies, or just plain bugs, can cause failures due to races on disk.
One example of this happened last week when an integration branch was being tested and one of the builds failed with `tar error: file changed as we read it` whilst it was generating the images. This means that the root filesystem was being altered whilst tar was reading it, so we've a parallelism problem. There are only a limited number of tasks that could be having this effect here so searching the log isn't too difficult, but as they say: why do something by hand when you can write a script to do it for you?
`findfails` is a script that will parse a BitBake log and maintain the set of currently active tasks, so when it finds a task that failed it can tell you what other tasks were running at the time:
```
$ findfails log
Task core-image-sato-dev-1.0-r0:do_image_tar failed
Active tasks are:
core-image-sato-sdk-ptest-1.0-r0:do_rootfs
core-image-sato-dev-1.0-r0:do_image_wic
core-image-sato-dev-1.0-r0:do_image_jffs2
core-image-sato-dev-1.0-r0:do_image_tar
core-image-sato-sdk-1.0-r0:do_rootfs
```
We knew that there were changes to `do_image_wic` in that branch, so it was easy to identify and drop the patch that was incorrectly writing to the rootfs source directory. Sorted!
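
For the curious, the idea behind the script can be sketched in a few lines of Python. This is not the actual `findfails` source, just a minimal illustration of the technique. It assumes the log contains task event lines shaped like `NOTE: recipe <recipe>: task <task>: Started` (and likewise `Succeeded` or `Failed`), which may not match every BitBake version or log type. The names `TASK_EVENT` and `find_failures` are made up for this example.

```python
import re

# Assumed line shape (may differ between BitBake versions):
#   NOTE: recipe core-image-sato-dev-1.0-r0: task do_image_tar: Started
TASK_EVENT = re.compile(
    r"recipe (?P<recipe>\S+): task (?P<task>\S+): "
    r"(?P<state>Started|Succeeded|Failed)"
)


def find_failures(lines):
    """Yield (failed_task, active_tasks) for every failed task in the log.

    active_tasks is a snapshot of everything running when the failure
    was logged, including the failed task itself.
    """
    active = []  # a list rather than a set, to keep start order
    for line in lines:
        match = TASK_EVENT.search(line)
        if not match:
            continue  # not a task event, ignore it
        name = f"{match['recipe']}:{match['task']}"
        state = match["state"]
        if state == "Started":
            active.append(name)
            continue
        if state == "Failed":
            # Snapshot before removing, so the report shows what was racing.
            yield name, list(active)
        # Succeeded or Failed: the task is no longer running.
        if name in active:
            active.remove(name)
```

Feeding it an open log file, for example `for task, active in find_failures(open("log")): ...`, gives everything needed to print a report like the one above.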