Since writing this post and then this post, we now have “unittests on try” running in production. Big tip of the hat to Lukas and Catlee for that! So, what’s next on our “make unittest better” ToDo list?
1) separate out unittest from build
Basically, running each unittest suite concurrently on separate slaves should improve end-to-end turnaround times. It’s also a prerequisite for helping track down intermittent unittest failures. See here for more details, and some pretty diagrams.
Ted’s just finished fixing the makefiles in bug#421611, so, in theory, it’s now possible to run a unittest suite on a different machine from the one where the unittest build was done. Now the next phase of work, separating builds from unittests, can begin. RelEng will start using these new makefile targets throughout our infrastructure. We’ll also start publishing those partial-debug builds someplace for these standalone unittest suites to use.
Once a suite running standalone is confirmed to give the same results as that suite running as part of the “build-and-unittest” job, we’ll disable that specific suite in the “build-and-unittest” job and use the standalone unittest job in production. Each standalone unittest suite can be run concurrently, so this gets us better turnaround times.
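To make the turnaround-time win concrete, here’s a minimal sketch of serial vs. concurrent suite runs. The suite names and durations are made up for illustration; the real dispatch goes through our pool-of-slaves, not threads.

```python
# Hypothetical sketch: standalone suites run concurrently instead of serially.
# Suite names and durations are illustrative stand-ins, not real numbers.
import time
from concurrent.futures import ThreadPoolExecutor

SUITES = {"mochitest": 2, "reftest": 1, "xpcshell": 1}  # duration in seconds

def run_suite(name, duration):
    # Stand-in for dispatching one unittest suite to its own slave.
    time.sleep(duration)
    return name, "PASS"

start = time.time()
with ThreadPoolExecutor(max_workers=len(SUITES)) as pool:
    results = dict(pool.map(lambda item: run_suite(*item), SUITES.items()))
elapsed = time.time() - start

# End-to-end time is roughly the longest single suite (~2s),
# not the sum of all suites (~4s), which is the whole point.
print(results, round(elapsed, 1))
```

The same idea scales to real suites: once each suite is standalone, total wall-clock time is bounded by the slowest suite rather than the sum of all of them.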
Funny enough, after all the behind-the-scenes complex-systems infrastructure work to make this possible, we still have to decide the best way to display this information in the tinderbox waterfall?!?! Urgh.
We’ll have to do this for both the production pool-of-slaves and the try pool-of-slaves before we can declare this done. The curious can follow along in bug#474671 and bug#452861.
2) Run unittests on full-debug builds, not partial-debug builds
Until now, unittests have always run on partial-debug builds. This requires us to produce a special build *only* for running unittests, which we then throw away afterwards. This is separate from, and additional to, the opt and full-debug builds that we also produce and publish.
Changing unittests to run against full-debug builds will require test/code cleanup around how exceptions are handled, but is a good thing to do for two reasons:
- it’s something developers have been asking for, as it simplifies debugging unittest failures. We’ve just not been able to get to it before now.
- this will allow us to do one fewer build per OS per checkin. This might not sound like much, but when we’re dealing with 900-1100 pushes per month, and each push triggers 7-11 builds, optimizing how we use pool-of-slaves capacity really matters. Data for recent months is here: Jan2009, Feb2009.
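A quick back-of-envelope calculation shows why one build per OS per checkin adds up. The push and build rates are taken from the ranges quoted above; the number of OSes is an illustrative assumption.

```python
# Back-of-envelope arithmetic using the rates quoted in this post.
# The midpoints and OS count below are assumptions for illustration.
pushes_per_month = 1000   # midpoint of the 900-1100 range
builds_per_push = 9       # midpoint of the 7-11 range
oses = 3                  # e.g. Linux, Mac, Windows (assumed)

total_builds = pushes_per_month * builds_per_push
# One fewer build per OS per checkin:
saved_builds = pushes_per_month * oses

print(total_builds, saved_builds)  # roughly 9000 total, 3000 saved per month
```

In other words, at these rates, dropping the throwaway unittest build frees up on the order of thousands of build slots per month on the pool-of-slaves.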
The curious can follow along in bug#372581.
3) automatically detect intermittent test failures
We all know some unittests are failing intermittently right now; bug#438871 and also the work Clint is doing here should help clean that all up. However, once all the unittests are fixed up, how do we make sure we don’t drift back into this state again – can we automatically detect new intermittent regressions as they creep back in?
To detect this automatically, we’re going to periodically run the same unittest suite ‘n’ times in a row on the same identical build. We don’t care so much what the test results are; we just care that we get *identical* results each time – after all, it’s the same test suite being run and re-run on the exact same build. Any variation in test results will be flagged to QA and Dev to investigate.
- Part of this will involve scheduling batches of the same suite to be run. Running five-times-in-a-row, once-per-week sounds like a good start, but we’ll tweak the #iterations, and the frequency of runs, as we experiment.
- This assumes we’ve already separated out the build from the running of the unittests (see above). Running tests five-times-in-a-row will take time; running build-and-test five-times-in-a-row will take *lots* of time, and is testing slightly different bits each time anyway, so not good.
- Part of this will involve automating how we detect and track those intermittent failures. Doing this manually simply does not scale. We don’t have a bug on file for this yet, because we’re still figuring it out. The Open Design lunch we had a few weeks ago was a start, but any comments and suggestions would be great!
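Since we’re still figuring out the details, here’s just one way the re-run-and-compare idea could look. The helper names are hypothetical, and `run_suite_on_build` is a stand-in that returns a canned result set rather than actually dispatching a suite.

```python
# Hypothetical sketch of "run the same suite n times on one build and
# flag any variation". All names here are illustrative, not real tooling.

def run_suite_on_build(suite, build_id):
    # Stand-in: in production this would run the real suite against the
    # exact same binary and return its per-test results.
    return frozenset({"test_a: PASS", "test_b: PASS"})

def detect_intermittents(suite, build_id, iterations=5):
    """Return the differing result sets, or None if all runs agreed."""
    results = [run_suite_on_build(suite, build_id) for _ in range(iterations)]
    unique = set(results)
    # Identical suite + identical build should give identical results;
    # more than one distinct outcome means something is intermittent.
    return None if len(unique) == 1 else unique

flagged = detect_intermittents("mochitest", build_id="20090301-hypothetical")
print("stable" if flagged is None else "intermittent!")
```

The key design point is that we never need a “known good” reference result: any disagreement between runs on the same bits is, by definition, worth flagging.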
Hope all that makes sense.