Faster image transfer across the network with zsync

Posted by Ross Burton on June 10, 2021

Those of us involved in building operating system images using tools such as OpenEmbedded/Yocto Project or Buildroot don't always have a power build machine under our desk or in the same building on gigabit. Our build machine may be in the cloud, or in another office over a VPN running over a slow residential ADSL connection. In these scenarios, repeatedly downloading gigabyte-sized images for local testing can get very tedious.

There are some interesting solutions if you use Yocto: you could expose the shared state over the network and recreate the image, which if the configurations are the same will result in no local compilation. However this isn't feasible if your local machine isn't running Linux or you just want to download the image without any other complications. This is where zsync is useful.

zsync is a tool similar to rsync but optimised for transfering single large files across the network. The server generates metadata containing the chunk information, and then shares both the image and the metadata over HTTP. The client can then use any existing local file as a seed file to speed up downloading the remote file.

On the server, run zsyncmake on the file to be transferred to generate the .zsync metadata. You can also pass -z if the file isn't already compressed to tell it to compress the file first.

$ ls -lh core-image-minimal-*.wic*
-rw-r--r-- 1 ross ross 421M Jun 10 13:44 core-image-minimal-fvp-base-20210610124230.rootfs.wic

$ zsyncmake -z core-image-minimal-*.wic

$ ls -lh core-image-minimal-*.wic*
-rw-r--r-- 1 ross ross 4.7K Jun 10 13:44 core-image-minimal-fvp-base-20210610124230.rootfs.manifest
-rw-r--r-- 1 ross ross 421M Jun 10 13:44 core-image-minimal-fvp-base-20210610124230.rootfs.wic
-rw-r--r-- 1 ross ross  53M Jun 10 13:45 core-image-minimal-fvp-base-20210610124230.rootfs.wic.gz

Here we have ~420MB of disk image, which compressed down to a slight 53MB, and just ~5KB of metadata. This image compressed very well as the raw image is largely empty space, but for the purposes of this example we can ignore that.

The zsync client downloads over HTTP and has some non-trivial requirements so you can't just use any HTTP server, specifically my go-to dumb server (Python's integrated http.server) isn't sufficient. If you want a hassle-free server then the Node.js package http-server works nicely, or any other proper server will work. However you choose to do it, share both the .zsync and .wic.gz files.

$ npm install -g http-server
$ http-server -p 8080 /path/to/images

Now you can use the zsync client to download the images. Sadly zsync isn't actually magical, so the first download will still need to download the full file:

$ zsync http://buildmachine:8080/core-image-minimal-fvp-base-20210610124230.rootfs.wic.zsync
No relevent local data found - I will be downloading the whole file.
downloading from http://buildmachine:8080/core-image-minimal-fvp-base-20210610124230.rootfs.wic.gz:
#################### 100.0% 7359.7 kBps DONE

verifying download...checksum matches OK
used 0 local, fetched 55208393

However, subsequent downloads will be a lot faster as only the differences will be fetched. Say I decide that core-image-minimal is too, well, minimal, and build core-image-sato which is a full stack instead of just busybox. After building the the image and metadata we now have a ~700MB image:

-rw-r--r-- 1 ross ross 729M Jun 10 14:17 core-image-sato-fvp-base-20210610125939.rootfs.wic
-rw-r--r-- 1 ross ross 118M Jun 10 14:18 core-image-sato-fvp-base-20210610125939.rootfs.wic.gz
-rw-r--r-- 1 ross ross 2.2M Jun 10 14:19 core-image-sato-fvp-base-20210610125939.rootfs.wic.zsync```

Normally we'd have to download the full 730MB, but with zsync we can just fetch the differences. By telling the client to use the existing core-image-minimal as a seed file, we can fetch the new core-image-sato:

$ zsync -i core-image-minimal-fvp-base-20210610124230.rootfs.wic  http://buildmachine:8080/core-image-sato-fvp-base-20210610125939.rootfs.wic.zsync
reading seed file core-image-minimal-fvp-base-20210610124230.rootfs.wic
core-image-minimal-fvp-base-20210610124230.rootfs.wic. Target 70.5% complete.
downloading from http://buildmachine:8080/core-image-sato-fvp-base-20210610125939.rootfs.wic.gz:
#################### 100.0% 10071.8 kBps DONE     

verifying download...checksum matches OK
used 538800128 local, fetched 70972961

By using the seed file, zsync determined that it already has 70% of the file on disk, and downloaded just the remaining chunks.

For incremental builds the differences can be very small when using the Yocto Project, as thanks to the reproducible builds effort there are no spurious changes (such as embedded timestamps or non-deterministic compilation) on recompiles.

Now, obviously I don't recommend doing all of this by hand. For Yocto Project users, as of right now there is a patch queued for meta-openembedded adding a recipe for zsync-curl, and a patch queued for openembedded-core to add zsync and gzsync image conversion types (for IMAGE_FSTYPES, for example wic.gzsync) to generate the metadata automatically. Bring your own HTTP server and you can fetch without further effort.

tags: tech, yocto