Heading home Friday evening, I stopped by Aki’s desk and took these quick photos. At first glance, these might not look all that exciting:
…but look closer… closer… see the little bits of paper taped on the machines??
This is actually really, really big! This is the beginning of a pool of mobile device slaves that will automatically run unittests and talos on mobile builds.
Aki’s been using these two machines to figure out linux mobile setup problems: hardware setup, toolchain setup and memory limitations, as well as how to get a buildbot slave running on the device and communicating with a buildbot master.
Since the end of Sept, we’ve been producing linux-arm builds of fennec automatically on checkin and every night. However, humans were still needed to manually download specific builds onto the physical devices when they had time, and manually run some tests. There are still no automated unittests, talos or graphserver for mobile.
The build automation was done by updating all the linux slaves in our existing pool of slaves, and then doing cross-compile builds. That allowed us to re-use a bunch of existing infrastructure. However, we need to use actual devices for testing and performance… which opens up a whole set of problems we have to figure out for the first time. For example:
- If the device hangs during a test, how do you powercycle it? (Most people first suggest removing the battery and running on remotely switchable a/c; however, these devices refuse to boot unless a battery is present, and it looks like they need a physical button pushed after a power reset, so we’re still looking for a solution – all suggestions welcome!)
- How do these devices communicate with other machines; how do we get builds onto, and test data off, these devices? The buildbot master needs to tell slaves when and what to do. The slaves need to post results to the buildbot master as well as ftp.m.o and graphserver. Something (nagios?) needs to monitor whether the devices are ok. If we use wifi, does that cause noisy deviations in our test results? If we use ethernet, what s/w drivers and cable converters do we need?
- If we set up devices in the office, for easier manual reboots, what ssh/firewall changes do we need to get access to other build systems in the secured area in the colo?
- What toolchain do we need to install on these devices? Both of these devices were “polluted” when we got them, meaning no one was exactly sure what’s on them, and we’re fairly sure they’re different from each other. We’ve got some more machines on order, so we will have to figure out clean-setup-instructions as part of this setup.
- Is there enough memory in the device to run o.s. + buildbot + talos + talos-tools + fennec? How far can we increase memory? Do we have to rewrite buildbot/talos to fit into low memory situations?
- We can’t build on these devices, but unittests currently require building before testing.
- What can we do about turnaround time? These devices are *SLOW*… Break testsuites into chunks that can be run in parallel?
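To make the last idea concrete, here is a minimal sketch of breaking a testsuite into chunks, one per device. It is only an illustration: the `split_into_chunks` helper, the test names and the device count are all made up for the example, and nothing like this exists in buildbot or talos as far as this post is concerned.

```python
def split_into_chunks(tests, total_chunks):
    """Deal tests round-robin into total_chunks lists (hypothetical helper)."""
    if total_chunks < 1:
        raise ValueError("total_chunks must be at least 1")
    # Slice i takes every total_chunks-th test starting at index i,
    # so chunk sizes differ by at most one test.
    return [tests[i::total_chunks] for i in range(total_chunks)]


# Made-up test names and device count, purely for illustration.
tests = ["test_%02d" % n for n in range(10)]
chunks = split_into_chunks(tests, 3)

for device, chunk in enumerate(chunks, start=1):
    print("device %d runs %d tests: %s" % (device, len(chunk), ", ".join(chunk)))
```

Dealing tests out round-robin keeps the chunks roughly the same size, but it assumes every test takes about as long as any other. If some tests are much slower, splitting by measured run time would probably balance the devices better.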
And all that is before you get to the fun stuff… the usual expected problems, like how many tests fail because of different display properties, memory availability, slower CPUs causing test timeouts, environment differences, etc. etc. etc.
While there’s still lots to solve along the way, it’s exciting to watch Aki making methodical, persistent and rapid progress. Very very cool.