Yesterday, Alice de-tangled one more part of the messy time stamp problem by fixing bug#419487.
All data points for time on Talos and graph server now use “build time”, not “testrun time”.
This will greatly simplify a lot of manual regression triage work for people looking at performance graphs on graph server. Now, if you are tracking down what change caused a perf regression:
- Before 8am PDT Wednesday, 25 June 2008, all charts use “testrun time”. This means debugging a regression requires manually padding the regression range multiple hours wider: enough to cover the time from the start of the build, through the queue, to the job starting on an available slave. Different operating systems take different amounts of time, and any machine hiccups complicate this padding guesswork further. If you get it wrong, you can incorrectly rule out bad changes, so pad out more than you think you need. It means extra triage work, but it is safer.
- After 8am PDT Wednesday, 25 June 2008, all charts use “build time”. We’re still fixing other problems with timestamps, so you still need *some* extra range padding, but much less than before. At most, manually pad out to the next/previous hour. The need for this padding should go away once the BuildID changes in bug#431270 and bug#431905 have landed.
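For anyone who wants to script the padding, here is a minimal Python sketch of the two rules above. It is only an illustration: `padded_range`, `floor_hour`, `ceil_hour` and the 4-hour default for `testrun_pad` are my own assumptions, not anything in Talos or graph server. "Multiple hours" is not an exact figure, so pick a pad that suits the platform you are looking at.

```python
from datetime import datetime, timedelta, timezone

# PDT is UTC-7. The cutoff is the switch to "build time" described above.
PDT = timezone(timedelta(hours=-7), "PDT")
CUTOFF = datetime(2008, 6, 25, 8, 0, tzinfo=PDT)


def floor_hour(t):
    """Round a timestamp down to the start of its hour."""
    return t.replace(minute=0, second=0, microsecond=0)


def ceil_hour(t):
    """Round a timestamp up to the next whole hour (unchanged if already on the hour)."""
    floored = floor_hour(t)
    return floored if floored == t else floored + timedelta(hours=1)


def padded_range(start, end, testrun_pad=timedelta(hours=4)):
    """Return a (start, end) regression range widened per the rules above.

    testrun_pad is a hypothetical default; the post only says "multiple hours".
    """
    if start < CUTOFF:
        # Some or all of the range is charted by "testrun time":
        # pad generously on both sides to be safe.
        return start - testrun_pad, end + testrun_pad
    # Charted by "build time": pad out to the previous/next hour.
    return floor_hour(start), ceil_hour(end)


# Example: a suspect range seen on the graphs the day after the switch.
lo, hi = padded_range(
    datetime(2008, 6, 26, 10, 17, tzinfo=PDT),
    datetime(2008, 6, 26, 11, 5, tzinfo=PDT),
)
print(lo.isoformat(), hi.isoformat())
```

A range that starts before the cutoff gets the wide pad even if it ends after it, which errs on the side of extra triage work over wrongly ruling out a bad change.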
Anyone curious for details should read Alice’s recent post to mozilla.dev.builds and mozilla.dev.performance (“change in talos time stamps (as of 8am PDT June 25th 2008)”), bug#291167, bug#417633 and bug#419487. This is a continuation of the work described (in tedious detail!) in my previous blog post.
This sounds like a small, simple change, but it was not. It’s a tricky, complex area of the infrastructure, with lots relying on it, and lots of different people with different assumptions about how time is used here. There was lots of behind-the-scenes homework on this, and to avoid causing any confusion, we held off landing it until after Firefox 3 shipped.
Tip of the hat to Alice for pushing this through to production so smoothly.