UEFI Validation

The Linux UEFI Validation Project was announced recently:

Bringing together multiple separate upstream test suites into a cohesive and easy-to-use product with a unified reporting framework, LUV validates UEFI firmware at critical levels of the Linux software stack and boot phases.

LUV also provides tests in areas not previously available, such as the interaction between the bootloader, Linux kernel and firmware. Integrated into one Linux distribution, firmware can now be tested under conditions that are closer to real scenarios of operation. The result: fewer firmware issues disrupting the operating system.

Of course that “one Linux distribution” is built using the Yocto Project, so it’s trivial to grab the source and patch/extend it, or rebuild it for different processors if for example you want to validate UEFI on ARM or a new CPU that mainstream distros don’t fully support.

Reproducible builds and GPL compliance

LWN has a good article on GPL compliance (if you’re not a subscriber you’ll have to wait) that has an interesting quote:

Developers, and embedded developers in particular, can help stop these violations. When you get code from a supplier, ensure that you can build it, he said, because someone will eventually ask. Consider using the Yocto Project, as Beth Flanagan has been adding a number of features to Yocto to help with GPL compliance. Having reproducible builds is simply good engineering practice—if you can’t reproduce your build, you have a problem.

This has always been one of the key points that we emphasis when explaining why you should use the Yocto Project for your next product.  If you’re shipping a product that is built using fifty open source projects then ensuring that you can redistribute all the original sources, and the patches that you’ve applied, and the configure options that you’ve used, and any tweaks to go from a directory of binaries to a bootable image isn’t something you can knock up in an afternoon when you get a letter from the SFC.  Fingers crossed you didn’t accidentally use some GPLv3 code when that is considered toxic.

Beth is awesome and has worked with others in the Yocto community to ensure all of this is covered.  Yocto can produce license manifests, upstream sources + patches archives, verify GPLv3 code isn’t distributed, and more.  All the work that is terribly boring at the beginning when you have a great idea and are full of enthusiasm (and Club-Mate), but by the time you’re shipping is often nigh on impossible.  Dave built the kernel on his machine but the disk with the right source tree on died, and Sarah left without telling anyone else the right flags to make libhadjaha actually link…  it’ll be fine, right?

Misc for Sale

As we’re moving house we are having a bit of a clear out, and have some techie gadgets that someone else might want:

  • Netgear DGN3500 ADSL/WiFi router. £15.
  • Bluetooth GPS dongle. £5.
  • Seagate Momentus 5400.6 160GB 2.5″ SATA hard disk (ex-MacBook). £5.

Those prices are including UK postage. Anyone interested?

Better bash completion?

Bash completion is great and everything, but I spend more time than is advisable dealing with numerous timestamped files.

$ mv core-image-sato-qemux86-64-20140204[tab]
core-image-sato-qemux86-64-20140204194448.rootfs.ext3
core-image-sato-qemux86-64-20140204202414.rootfs.ext3
core-image-sato-qemux86-64-20140204203642.rootfs.ext3

This isn’t an obvious choice as I now need to remember long sequences of numbers. Does anyone know if bash can be told to highlight the bit I’m being asked to pick from, something like this:

$ mv core-image-sato-qemux86-64-20140204[tab]
core-image-sato-qemux86-64-20140204194448.rootfs.ext3
core-image-sato-qemux86-64-20140204202414.rootfs.ext3
core-image-sato-qemux86-64-20140204203642.rootfs.ext3

Remote X11 on OS X

I thought I’d blog this just in case someone else is having problems using XQuartz on OS X as a server to remote X11 applications (i.e. using ssh -X somehost).

At first this works but after some time (20 minutes, to be exact) you’ll get “can’t open display: localhost:10.0″ errors when applications attempt to connect to the X server. This is because the X forwarding is “untrusted” and that has a 20 minute timeout. There are two solution here: increase the X11 timeout (the maximum is 596 hours) or enable trusted forwarding.

It’s probably only best to enable trusted forwarding if you’re connecting to machines you, well, trust. The option is ForwardX11Trusted yes and this can be set globally in /etc/ssh_config or per host in ~/.ssh/config.

Network Oddity

This is… strange. Two machines, connected through cat5 and gigabit adaptors/hub.

$ iperf -c melchett.local -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to melchett.local, TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[  4] local 192.168.1.7 port 35197 connected with 192.168.1.10 port 5001
[  5] local 192.168.1.7 port 5001 connected with 192.168.1.10 port 33692
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  1.08 GBytes   926 Mbits/sec
[  5]  0.0-10.0 sec  1.05 GBytes   897 Mbits/sec

Simultaneous transfers get ~900MBits/s.

$ iperf -c melchett.local -r
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to melchett.local, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[  5] local 192.168.1.7 port 35202 connected with 192.168.1.10 port 5001
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec   210 MBytes   176 Mbits/sec
[  4] local 192.168.1.7 port 5001 connected with 192.168.1.10 port 33693
[  4]  0.0-10.0 sec  1.10 GBytes   941 Mbits/sec

Testing each direction independently results in only 176MBits/sec on the transfer to the iperf server (melchett). This is 100% reproducible, and the same results appear if I swap iperf client and servers.

I’ve swapped one of the cables involved but the other is harder to get to, but I don’t see how physical damage could cause this sort of performance issue. Oh Internet, any ideas?

Solving buildhistory slowness

The buildhistory class in oe-core is incredibly useful for analysing the changes in packages and images over time, but when doing frequently builds all of this metadata builds up and the resulting git repository can be quite unwieldy. I recently noticed that updating my buildhistory repository was often taking several minutes, with git frantically doing huge amounts of I/O. This wasn’t surprising after realising that my buildhistory repository was now 2.9GB, covering every build I’ve done since April. Historical metrics are useful but I only ever go back a few days, so this is slightly over the top. Deleting the entire repository is one idea, but a better solution would be to drop everything but the last week or so.

Luckily Paul Eggleton had already been looking into this so pointed me at a StackOverflow page which used “git graft points” to erase history. The basic theory is that it’s possible to tell git that a certain commit has specific parents, or in this case no parent, so it becomes the end of history. A quick git filter-branch and a re-clone later to clean out the stale history and the repository is far smaller.

$ git rev-parse "HEAD@{1 month ago}" > .git/info/grafts

This tells git that the commit a month before HEAD has no parents. The documentation for graft points explains the syntax, but for this purpose that’s all you need to know.

$ git filter-branch

This rewrites the repository from the new start of history. This isn’t a quick operation: the manpage for filter-branch suggests using a tmpfs as a working directory and I have to agree it would have been a good idea.

$ git clone file:///your/path/here/buildhistory buildhistory.new
$ rm -rf buildhistory
$ mv buildhistory.new buildhistory

After filter-branch all of the previous objects still exist in reflogs and so on, so this is the easiest way of reducing the repository to just the objects needed for the revised history. My newly shrunk repository is a fraction of the original size, and more importantly doesn’t take several minutes to run git status in.

Using netconsole in the Yocto Project

Debugging problems which mean init goes crazy is tricky, especially so on modern Intel hardware that doesn’t have anything resembling a serial port you can connect to.

Luckily this isn’t a new problem, as Linux supports a network console which will send the console messages over UDP packets to a specific machine. This is mostly easy to use but there are some caveats that are not obvious.

The prerequisites are that netconsole support is enabled, and your ethernet driver is built in to the kernel and not a module. Luckily, the stock Yocto kernels have netconsole enabled and the atom-pc machine integrates the driver for my hardware.

Then, on the target machine, you pass netconsole=... to the kernel. The kernel documentation explains this quite well:

netconsole=[src-port]@[src-ip]/[],[tgt-port]@/[tgt-macaddr]

   where
        src-port      source for UDP packets (defaults to 6665)
        src-ip        source IP to use (interface address)
        dev           network interface (eth0)
        tgt-port      port for logging agent (6666)
        tgt-ip        IP address for logging agent
        tgt-macaddr   ethernet MAC address for logging agent (broadcast)

The biggest gotcha is that you (obviously) need a source IP address, and netconsole starts before the networking normally comes up. Apart from that you can generally get away with minimal settings:

netconsole=@192.168.1.21/,@192.168.1.17/

Note that apparently some routers may not forward the broadcast packets correctly, so you may need to specify the target MAC address.

On the target machine run something like netcat to capture the packets:

$ netcat -l -u -p 6666 | tee console.log

If you get the options wrong the kernel will tell you why, so if you don’t get any logging iterate a working argument on an image that works, using dmesg to see what the problem is.

Finally instead of typing in this argument every time you boot, you can add it to the boot loader in your local.conf:

APPEND += "netconsole=@192.168.1.21/,@192.168.1.17/"

(APPEND being the name of the variable that is passed to the kernel by the boot loader.)

Update: when the journal starts up systemd will stop logging to the console, so if you want to get all systemd messages also pass systemd.log_target=kmsg.

Mutually Exclusive PulseAudio streams

The question of mutually exclusive streams in PulseAudio came to mind earlier, and thanks to Arun and Jens in #gupnp I discovered the PulseAudio supports them already. The use-case here is a system where there are many ways of playing music, but instead of mixing them PA should pause the playing stream when a new one starts.

Configuring this with PulseAudio is trivial, using the module-role-cork module:

$ pactl load-module module-role-cork trigger_roles=music cork_roles=music

This means that when a new stream with the “music” role starts, “cork” (pause to everyone else) all streams that have the “music” role. Handily this module won’t pause the trigger stream, so this implements exclusive playback.

Testing is simple with gst-launch

$ PULSE_PROP='media.role=music' gst-launch-0.10 audiotestsrc ! pulsesink
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstPulseSinkClock

At this point, starting another gst-launch results in this stream being paused:

Setting state to PAUSED as requested by /GstPipeline:pipeline0/GstPulseSink:pulsesink0...

Note that it won’t automatically un-cork when the newer stream disappears, but for what I want this is the desired behaviour anyway.

Yocto Project Build Times

Last month our friends at Codethink were guests on FLOSS Weekly, talking about Baserock. Baserock is a new embedded build system with some interesting features/quirks (depending on your point of view) that I won’t go into. What caught my attention was the discussion about build times for various embedded build systems.

Yocto, again, if you want to do a clean build it will take days to build your system, even if you do an incremental build, even if you just do a single change and test it, that will take hours.

(source: FLOSS Weekly #230, timestamp 13:21, slightly edited for clarity)

Now “days” for a clean build and “hours” for re-building an image with a single change is quite excessive for the Yocto Project, but also quite specific. I asked Rob Taylor where he was getting these durations from, and he corrected himself on Twitter:

I’m not sure if he meant “hours” for both a full build and an incremental build, or whether by “hours” for incremental he actually meant “minutes”, but I’ll leave this for now and talk about real build times.

Now, my build machine is new but nothing special. It’s built around an Intel Core i7-3770 CPU (quad-core, 3.4GHz) with 16GB of RAM (which is overkill, but more RAM means more disk cache which is always good), and two disks: a 250GB Western Digital Blue for /, and a 1TB Western Digital Green for /data (which is where the builds happen). This was built by PC Specialist for around £600 (the budget was $1000 without taxes) and happily sits in my home study running a nightly build without waking the kids up. People with more money stripe /data across multiple disks, use SSDs, or 10GB tmpfs filesystems, but I had a budget to stick to.

So, let’s wipe my build directory and do another build from scratch (with sources already downloaded). As a reference image I’ll use core-image-sato, which includes an X server, GTK+, the Matchbox window manager suite and some demo applications. For completeness, this is using the 1.3 release – I expect the current master branch to be slightly faster as there’s some optimisations to the housekeeping that have landed.

$ rm -rf /data/poky-master/tmp/
$ time bitbake core-image-sato
Pseudo is not present but is required, building this first before the main build
Parsing of 817 .bb files complete (0 cached, 817 parsed). 1117 targets, 18 skipped, 0 masked, 0 errors.
...
NOTE: Tasks Summary: Attempted 5393 tasks of which 4495 didn't need to be rerun and all succeeded.

real 9m47.289s

Okay, that was a bit too fast. What happened is that I wiped my local build directory, but it’s pulling build components from the “shared state cache”, so it spent six minutes reconstructing a working tree from shared state, and then three minutes building the image itself. The shared state cache is fantastic, especially as you can share it between multiple machines. Anyway, by renaming the sstate directory it won’t be found, and then we can do a proper build from scratch.

$ rm -rf /data/poky-master/tmp/
$ mv /data/poky-master/sstate /data/poky-master/sstate-old
$ time bitbake core-image-sato
Pseudo is not present but is required, building this first before the main build
...
NOTE: Tasks Summary: Attempted 5117 tasks of which 352 didn't need to be rerun and all succeeded.

real 70m37.298s
user 326m45.417s
sys 37m13.304s

That’s a full build from scratch (with downloaded sources, we’re not benchmarking my ADSL) in just over an hour on affordable commodity hardware. As I said this isn’t some “minimal” image that boots straight to busybox, this is building a complete cross-compiling toolchain, the kernel, X.org, GTK+, GStreamer, the Matchbox window manager/panel/desktop, and finally several applications. In total, 431 source packages were built and packaged, numerous QA tests executed and flashable images generated.

My configuration is to build for Intel Atom but a build for an ARM, MIPS, or PowerPC target would also take a similar amount of time, as even what could be considered “native” targets (targeting Atom, building on i7) doesn’t always turn out to be native: for example carrier-grade Xeon’s have instructions that my i7 doesn’t have, and if you were building carrier-grade embedded software you’d want to ensure they were used.

So, next time someone claims Yocto Project/OpenEmbedded takes “days” or even “hours” to do a build, you can denounce that as FUD and point them here!