Dynamic source checksums in OpenEmbedded

Posted by Ross Burton on June 13, 2017

Today we were cleaning up some old bugs in the Yocto Project bugzilla and came across one asking for the ability to specify a remote URL for the source tarball checksums (SRC_URI[md5sum] and/or SRC_URI[sha256sum]). We require a checksum for tarballs for two reasons:

  1. Download integrity. We want to be sure that the download wasn't corrupted in some way, such as truncation or bad encoding.
  2. Security. We want to be sure that the tarball hasn't changed over time, whether because the maintainer regenerated the tarball for an old release with different content (this happens more than you'd expect, with non-trivial changes too), or because a malicious attack replaced the file with one containing malware (such as the HandBrake hack in May).

The rationale for reading checksums from remote URLs was that, for files that change frequently, it would be easier to upgrade the recipe if the checksums didn't need to be altered too. For some situations I can see this argument, but I don't want to encourage practices that nullify the security checksums. For this reason I rejected the bug, but thanks to the power of BitBake I did provide a working example of how to do this in your recipe.

The trick is to observe that the only time the SRC_URI[md5sum] is read is during do_fetch. By adding a new function to do_fetch[prefuncs] (the list of functions that will be executed before do_fetch is executed) we can download the checksums and write the variable just before the fetcher needs it. Here is a partial example that works for GNOME-style checksums, where each upload generates foo-1.2.tar.bz2, foo-1.2.tar.xz, foo-1.2.sha256sum, and foo-1.2.md5sum. To keep it interesting the checksum files contain the sums for both compression types, so we need to iterate through the file to find the right line:

SRC_URI = "https://download.gnome.org/sources/glib/2.52/glib-2.52.2.tar.xz"
SHASUM_URI = "https://download.gnome.org/sources/glib/2.52/glib-2.52.2.sha256sum"

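do_fetch[prefuncs] += "fetch_checksums"
python fetch_checksums() {
    # urllib.request is a submodule and must be imported explicitly
    import urllib.request
    # Each line of the checksum file is "<sha256> <filename>"
    for line in urllib.request.urlopen(d.getVar("SHASUM_URI")):
        (sha, filename) = line.decode("ascii").strip().split()
        if filename == "glib-2.52.2.tar.xz":
            # Set SRC_URI[sha256sum] just before the fetcher reads it
            d.setVarFlag("SRC_URI", "sha256sum", sha)
            return
    bb.error("Could not find remote checksum")
}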

Note that as fetch_checksums is a pre-function for do_fetch it is only executed just before do_fetch and not at any other time, so this doesn't impose any delays on builds that don't need to fetch.
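
The line-matching logic can also be tried outside BitBake. The following is a minimal sketch in plain Python, not part of the recipe: find_checksum is a hypothetical helper name, the sample checksums are placeholders rather than real glib sums, and the handling of blank lines and of the "*" binary-mode marker is an assumption about what a more defensive version might want.

```python
def find_checksum(lines, wanted):
    """Return the checksum listed for `wanted`, or None if it is absent."""
    for line in lines:
        fields = line.strip().split()
        # Skip blank or malformed lines instead of failing to unpack them.
        if len(fields) != 2:
            continue
        sha, filename = fields
        # sha256sum-style tools may prefix the name with "*" (binary mode).
        if filename.lstrip("*") == wanted:
            return sha
    return None


# Placeholder content mimicking a checksum file that covers both tarballs.
sample = [
    "aaaa  glib-2.52.2.tar.bz2",
    "bbbb  glib-2.52.2.tar.xz",
]
print(find_checksum(sample, "glib-2.52.2.tar.xz"))
```

Returning None rather than raising leaves the caller free to decide how loudly to complain when the file isn't listed.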

If I were taking this beyond a proof of concept and making it into a general-purpose class there are a number of changes I would want to make:

  1. Honour the configured proxies when calling urlopen()
  2. Extract the filename to search for from the SRC_URI
  3. Generate the checksum URL from the SRC_URI

I'll leave those as an exercise to the reader though. Patches welcome!
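
For anyone picking this up, here is one rough starting point in plain Python. It is a sketch under several assumptions, not a finished class: the helper names (tarball_name, checksum_url, opener_with_proxies) are made up for illustration, SRC_URI is assumed to hold a single URL, the checksum file is assumed to sit next to the tarball following the GNOME naming described above, and the list of archive suffixes is deliberately incomplete. In a recipe the proxy mapping would have to come from somewhere such as the datastore, which is also an assumption here.

```python
import posixpath
import urllib.request
from urllib.parse import urlsplit, urlunsplit

# Archive suffixes recognised by this sketch (an assumed, partial list).
ARCHIVE_SUFFIXES = (".tar.xz", ".tar.bz2", ".tar.gz")


def _strip_params(src_uri):
    # Drop any ";key=value" parameters that follow the URL.
    return src_uri.split(";", 1)[0]


def tarball_name(src_uri):
    """Return the file name to look for in the checksum file."""
    return posixpath.basename(urlsplit(_strip_params(src_uri)).path)


def checksum_url(src_uri, kind="sha256sum"):
    """Build the URL of the foo-1.2.<kind> file next to the tarball."""
    parts = urlsplit(_strip_params(src_uri))
    directory, name = posixpath.split(parts.path)
    for suffix in ARCHIVE_SUFFIXES:
        if name.endswith(suffix):
            stem = name[: -len(suffix)]
            break
    else:
        raise ValueError("unrecognised archive suffix: " + name)
    new_path = posixpath.join(directory, stem + "." + kind)
    return urlunsplit(parts._replace(path=new_path))


def opener_with_proxies(proxies):
    """Return an opener that routes requests via the given proxy mapping."""
    handler = urllib.request.ProxyHandler(proxies)
    return urllib.request.build_opener(handler)


src_uri = "https://download.gnome.org/sources/glib/2.52/glib-2.52.2.tar.xz"
print(tarball_name(src_uri))
print(checksum_url(src_uri))
```

With these in place the pre-function could call opener_with_proxies(...).open(checksum_url(...)) instead of urlopen() and compare each line against tarball_name(...), rather than hard-coding the file name and SHASUM_URI.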

tags: yocto