During the Open Design Lunch last week, one topic that came up frequently was unittests. Most questions were variations of “intermittent unittest failures block developers from landing” and “unittests take too long to run”.
Hopefully this blog post will explain some of the work already done/in progress to make this better.
The short answer is:
- fix up the unittest machines & toolchain
- fix unittest framework so each unittest run does not require a rebuild
- run unrelated unittest suites concurrently
- split out big suites like mochitest into multiple smaller suites
Solving these problems will give us much improved end-to-end turnaround time, simplify debugging intermittent failures, and allow us to start running unittests on nightly and release builds.
A longer, more detailed answer needs more text, some diagrams… and obviously coffee!
Each “unittest run” actually does the following steps sequentially: pull tip-of-tree, build (with modified mozconfig), TUnit, reftest, crashtest, mochitest, mochichrome, browserchrome, a11y. However, this means if you run unittests twice in a row, even without any code change, you are actually doing: pull tip-of-tree, build (with modified mozconfig), TUnit, reftest, crashtest, mochitest, mochichrome, browserchrome, a11y, pull tip-of-tree, build (with modified mozconfig), TUnit, reftest, crashtest, mochitest, mochichrome, browserchrome, a11y. Note the double pull, and double build.
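That coupled workflow can be sketched in a few lines of Python (the suite names are the real ones from the list above; the `unittest_run` function and the log list are hypothetical stand-ins for the real automation):

```python
# Sketch of the current coupled workflow: every unittest run repeats
# the pull and build steps before any suite executes. The step names
# are real; the code itself is an illustrative stand-in.

SUITES = ["TUnit", "reftest", "crashtest", "mochitest",
          "mochichrome", "browserchrome", "a11y"]

def unittest_run(log):
    log.append("pull tip-of-tree")
    log.append("build (modified mozconfig)")
    for suite in SUITES:
        log.append(suite)

# Running unittests twice in a row, even with no code change,
# repeats the pull and the build both times:
log = []
unittest_run(log)
unittest_run(log)
```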
This causes several important problems:
- each unittest cycle takes a long time, because it's doing a build every time.
- it was not practical to run each unittest suite as a separate concurrent job, because:
- each unittest suite would need its own build step (costing more overall CPU time) and
- each build would have its own BuildID (complicating the work of reassembling all the test results afterwards).
- crashes in each rebuild produce different stack dumps, which are hard to compare.
- each build pulls tip-of-tree, so if a change lands while you are re-running tests, each build could get a different pull of the tip-of-tree source code, and you'd be testing different things.
- each build has a different BuildID, so it is harder to confirm that all builds contain the same code.
- having new builds each time makes it hard to spot any machine or compiler problems.
- the typical way to find an intermittent problem is to run the test ‘n’ times. If you run “reftest” 5 times in a row, that's quick and useful. However, the wasted time of rebuilding, and then running all suites serially even when you are only interested in rerunning one suite, really adds up. Running build+all unittest suites 5 times in a row quickly becomes impractical, especially when you require the tip-of-tree to remain constant for the duration.
Our plan to fix these is:
- Make sure that the spec of machines/VMs being used was sufficient for either build or unittest jobs. Also, consolidate both toolchains into one toolchain suitable for both builds and unittests.
- There was lots of work done by lsblakk, robcee, schrep and others during summer 2008 to make unittest machines identical to build machines in one general purpose pool-of-slaves.
- There was also a lot of work done by robcee, schrep, mrz, justin and myself to see if the intermittent tests would be solved by moving to faster VMs or dedicated physical hardware. While it's true that we can always make incremental improvements in turnaround time by spec-ing faster VMs or buying faster dedicated physical machines, those experiments found (different!) intermittent unittest failures each time.
- I assert that fixing the system design problems outlined above will get us significantly better turnaround time, and also solve other problems that brute force just can't fix, so it should be done first. Only after that global (large) optimization is done should we revisit the discussion about local (smaller) optimizations.
- Consolidate the two toolchains, and consolidate the two sets of machines in one production pool-of-slaves. This was finished just before Christmas 2008 and means that:
- all build slaves and unittest slaves are now part of the one pool-of-slaves, and all able to do either builds *or* unittests.
- we can enable unittests on any new branch at the same time as we enable builds on it
- we have more machines to scale up and handle the build & unittest load on whichever branch is the most active.
- we can now run unittests everywhere we can run builds. We’re already running unittests on each active code line. We’re nearly finished enabling unittests on try server (see bug#445611)
- Separate out build from unittest
- consolidate build mozconfig with unittest mozconfig
- clean up test setup assumptions about what files/env settings are needed by a unittest suite; this is being done by Ted in bug#421611.
- one by one, as each suite is separated out, we enable that standalone suite running by itself in the pool-of-slaves, and remove that suite from the “build-and-remaining-unittest-suites” jobs (see bug#383136)
Once we have all the unittest suites running without requiring a build step, then we can:
- quickly re-run test suites on the same *identical* build, get easy-to-compare stack traces, and have no concerns about unexpected landings changing what we build from tip-of-tree.
- re-run just the specific test suite of interest, much quicker (if you only care about reftest, only rerun reftest…)
- run tests on older builds to figure out when a test started failing intermittently.
- run each separate test suite concurrently on different machines, and post results for each suite as each individual suite completes.
- split the longest running suites into smaller bite-size suites, for better efficiency.
- start running unittests on nightly and release builds.
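The concurrent-suites idea in the list above could look roughly like this. This is only a sketch: the dummy `run_suite` function stands in for dispatching a job to a free slave in the pool, and all suites run against one fixed, already-built binary:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

SUITES = ["TUnit", "reftest", "crashtest", "mochitest",
          "mochichrome", "browserchrome", "a11y"]

def run_suite(build_id, suite):
    # Placeholder: in reality this would dispatch the suite to a free
    # slave, running against one fixed, already-built binary.
    return (suite, "PASS")

def run_all(build_id):
    results = {}
    # Each suite runs as its own concurrent job against the same build,
    # and results are posted as each individual suite completes.
    with ThreadPoolExecutor(max_workers=len(SUITES)) as pool:
        futures = [pool.submit(run_suite, build_id, s) for s in SUITES]
        for fut in as_completed(futures):
            suite, status = fut.result()
            results[suite] = status
    return results

results = run_all("some-build-id")
```

The key design point is that the BuildID threads through every job, so all the suite results can be reassembled against one known build afterwards.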
All in all, this is very exciting stuff! Not sure how much of that came across in the Open Design Lunch, but hopefully it all makes sense – let me know if you have questions/comments!
ps: An early attempt at reducing the build+unittest time was to adjust some compile options to reduce build time, but that actually complicates matters (the Win32 unittests test non-PGO builds; the Mac unittests test intel-only builds, not universal builds, etc.). We're still investigating what to do here; any suggestions?