Today we were cleaning up some old bugs in the Yocto Project Bugzilla and came across a bug asking for the ability to specify a remote URL for the source tarball checksums (SRC_URI[md5sum] and/or SRC_URI[sha256sum]). We require a checksum for tarballs for two reasons:
- Download integrity. We want to be sure that the download wasn't corrupted in some way, such as truncation or bad encoding.
- Security. We want to be sure that the tarball hasn't changed over time, be it the maintainer regenerating the tarball for an old release but with different content (this happens more than you'd expect, with non-trivial changes too), or alternatively a malicious attack on the file which now contains malware (such as the Handbrake hack in May).
The rationale for reading checksums from remote URLs was that for files that change frequently it would be easier to upgrade the recipe if the checksums didn't need to be altered too. In some situations I can see the appeal of this argument, but I don't want to encourage practices that nullify the security value of the checksums. For this reason I rejected the bug, but thanks to the power of BitBake I did provide a working example of how to do this in your recipe.
The trick is to observe that the only time SRC_URI[md5sum] is read is during do_fetch. By adding a new function to do_fetch[prefuncs] (the list of functions that will be executed before do_fetch itself runs) we can download the checksums and write the variable just before the fetcher needs it. Here is a partial example that works for GNOME-style checksums, where each upload generates foo-1.2.tar.bz2, foo-1.2.tar.xz, foo-1.2.sha256sum, and foo-1.2.md5sum. To keep it interesting the checksum files contain the sums for both compression types, so we need to iterate through the file to find the right line:
SRC_URI = "https://download.gnome.org/sources/glib/2.52/glib-2.52.2.tar.xz"
SHASUM_URI = "https://download.gnome.org/sources/glib/2.52/glib-2.52.2.sha256sum"

do_fetch[prefuncs] += "fetch_checksums"
python fetch_checksums() {
    import urllib.request
    for line in urllib.request.urlopen(d.getVar("SHASUM_URI")):
        # Each line is "<sha256>  <filename>"; find the .tar.xz entry.
        (sha, filename) = line.decode("ascii").strip().split()
        if filename == "glib-2.52.2.tar.xz":
            d.setVarFlag("SRC_URI", "sha256sum", sha)
            return
    bb.error("Could not find remote checksum")
}
Note that as fetch_checksums is a pre-function of do_fetch it is only executed just before do_fetch runs and not at any other time, so this doesn't impose any delays on builds that don't need to fetch.
If I were taking this beyond a proof of concept and making it into a general-purpose class, there are a number of changes I would want to make:
- Use the proxies when calling urlopen()
- Extract the filename to search for from the SRC_URI
- Generate the checksum URL from the SRC_URI
I'll leave those as an exercise for the reader though. Patches welcome!
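As a hint for the last two items, here is a plain-Python sketch of how the filename and checksum URL might be derived from SRC_URI. The helper name is hypothetical and it assumes GNOME-style layout, where the .sha256sum file sits next to the tarball and shares its base name:

import os

def derive_checksum_url(src_uri):
    """Hypothetical helper: given a SRC_URI value, return the tarball
    filename to search for and a GNOME-style .sha256sum URL that sits
    alongside the tarball. Assumes the first SRC_URI entry is the tarball."""
    # Take the first entry and drop any ;name=... style parameters.
    uri = src_uri.split()[0].split(";")[0]
    filename = os.path.basename(uri)
    # Strip the compression suffix to get the "foo-1.2" base name.
    base = filename
    for ext in (".tar.xz", ".tar.bz2", ".tar.gz"):
        if base.endswith(ext):
            base = base[: -len(ext)]
            break
    checksum_url = uri.rsplit("/", 1)[0] + "/" + base + ".sha256sum"
    return filename, checksum_url

A generalised fetch_checksums() could then call this with d.getVar("SRC_URI") instead of hard-coding the glib names.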