Keeping perspective: 34hours vs 37hours

It took 34 hours to produce Firefox3.0beta1 rc1.

Those 34 hours were frantic. Two people, tag teaming day & night, working with the nervous tension of knowing that a single one-character typo could invalidate the entire build, and force us to start all over again. Those 34 hours only got us as far as producing unsigned builds on each platform – roughly 1/3 of the overall Build work needed to do a release – before we hit a problem. A typo. At the beginning of it all, one person typed PDT into one computer, while the other person typed PST into another computer. That typo meant rc1 did not include an important last-minute bugfix. So, we scrapped rc1 and started all over again, building rc2. (I note that the D and S are even next to each other on the keyboard [sigh!]. And if it wasn’t for the timezone change last week, it would not have mattered either [sigh! sigh!])
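For a sense of how little it takes, here is a minimal Python sketch of the same wall-clock cutoff read as PDT on one machine and as PST on the other. The cutoff date and time are made up for illustration and are not the real beta1 cutoff; the only facts in it are the fixed offsets (PDT is UTC-7, PST is UTC-8).

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for the two Pacific abbreviations.
PDT = timezone(timedelta(hours=-7), "PDT")
PST = timezone(timedelta(hours=-8), "PST")

def cutoff_gap(naive_cutoff: datetime) -> timedelta:
    """Return how far apart the same wall-clock cutoff lands in PST vs PDT."""
    as_pdt = naive_cutoff.replace(tzinfo=PDT)
    as_pst = naive_cutoff.replace(tzinfo=PST)
    return as_pst - as_pdt

# Hypothetical cutoff, for illustration only.
example_cutoff = datetime(2007, 11, 6, 10, 0)
gap = cutoff_gap(example_cutoff)
print(gap)
```

The two readings sit one hour apart, so a checkin landing inside that hour would be included by one cutoff and missed by the other.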

To put that 34 hours in perspective, Build took 37 hours to do everything needed for the complete FF2.0.0.9 release… and most of that was actually just watching the automation chugging along. Active human work was down to a handful of hours for signing, bouncer/mirror updates, and a little nervous manual rechecking of the automated checks, just to be sure, to be sure.

Why the night and day difference?

We’ve been focusing on automation for the FF2.0.0.x branch over the last few months, shipping FF2.0.0.7, FF2.0.0.8 and FF2.0.0.9 each time with automation improved from the previous release. Sadly, none of this automation work is live on trunk yet. All the trunk releases, like the alphas, and now this FF3.0beta1, are done the old fashioned way. By hand. One command at a time.

This week was a stark reminder of what things used to be like, and gave perspective on how much we’ve accomplished so far this year.

Free Software 2.0.0.9 builds now also available…

… at ftp://ftp.mozilla.org/pub/firefox/releases/2.0.0.9/contrib/free-software/.

This special build of Firefox2.0.0.9 uses the exact same code cutoff time and cvs branch as the regular Firefox2.0.0.9 release, but was compiled with branding, logos and talkback removed.

As an aside, I didn’t know much about this special build until recently, hence there was no plan to include this in our build automation work. However, looking back on ftp.mozilla.org, I see quite a few of them, and asking around, they were done manually once the dust settled on a given Firefox release. We are now tracking automating these FreeSoftware builds in bug#385783, with some related cleanup in bug#402582.

Firefox 2.0.0.9 by the (wall-clock) numbers

Mozilla released Firefox 2.0.0.9 on Thursday 01-nov-2007, at 5.40pm PST.

From “do we need a release” to “release is now available to public” was 11 days 2 hours wall-clock time, of which the Beta period took 2.75 days, and Build&Release took 37 hours.

15:35 22oct: decide regressions introduced in FF2008 justify producing a quick FF2009 to address
12:30 25oct: Dev says “go”
14:40 25oct: 2009rc1 builds started
20:00 25oct: linux builds handed to QA
22:00 25oct: mac builds handed to QA
01:00 26oct: win32 signed builds handed to QA
19:40 26oct: update snippets on betatest update channel
16:30 29oct: QA says “go” for Beta
16:50 29oct: update snippets on beta update channel
10:40 01nov: Dev & QA say “go” for Release; Build starts final signing, bouncer entries
14:15 01nov: final signing, bouncer entries done; mirror replication started
17:15 01nov: update snippets on live update channel; announced

While Build Automation in FF2009 was much smoother than FF2008, this was not yet a “human free” release:
1) The talkback server had been renamed after the FF2.0.0.8 release shipped and before FF2.0.0.9 started, so our first automation run timed out at the end of the build, waiting for humans to answer the RSA “are you sure you want to connect to this machine” login question?! 🙁 We didn’t detect this until the build overran the estimated completion time, and then, after a quick fix, we were forced to rerun the entire build. This would have been caught if our nightlies were part of the same build automation (see bug#401936).
2) We still manually do signing, adding bouncer entries, starting mirror replication and monitoring mirror replication, pushing snippets to beta channel, pushing snippets to release channel. Combined, these took 6.5 hours of the Build time, and are worthy of automation attention. Pushing updates snippets to betatest channel has been automated since the FF2008 release.
3) Mirror absorption took 3 hours to reach 72-80%. The mac DMG files always straggle much lower than everything else for mirror absorption, apparently a known problem with how webservers handle that file type, but new details are emerging in bug#402141. Experiments continue, but every time we do a release, we always give thanks to morgamic for giving us the tools to measure with!
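On item 1: one way to make that kind of hang fail fast is to run ssh non-interactively. What follows is a minimal sketch, assuming the automation shells out to OpenSSH; it is not taken from our automation scripts, and the host name and remote command are placeholders. With BatchMode=yes, ssh does not prompt, so an unknown host key becomes an immediate error instead of a silent wait for a human.

```python
import subprocess

def build_ssh_command(host: str, remote_cmd: str) -> list[str]:
    """Build an ssh invocation that errors out instead of waiting for a human."""
    return [
        "ssh",
        "-o", "BatchMode=yes",              # never prompt for passwords or confirmations
        "-o", "StrictHostKeyChecking=yes",  # refuse hosts whose key is not already known
        "-o", "ConnectTimeout=30",          # give up on unreachable hosts after 30 seconds
        host,
        remote_cmd,
    ]

def run_remote(host: str, remote_cmd: str) -> int:
    """Run the command and return ssh's exit code (255 signals an ssh-level failure)."""
    return subprocess.run(build_ssh_command(host, remote_cmd)).returncode

# Placeholder host and command, for illustration only.
cmd = build_ssh_command("talkback.example.org", "true")
print(" ".join(cmd))
```

A renamed server would then show up as a failed step at the start of the run rather than as a build that overruns its estimate.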

take care
John

Firefox 2.0.0.8 by the (wall-clock) numbers

Mozilla released Firefox 2.0.0.8 on Thursday 18-oct-2007, at 5.30pm PST.

From “code freeze” to “fix available to public” was 14 days 2 hours wall-clock time, which included a 7day Beta period (this was a non-firedrill release). Build&Release took 68 hours.

15:00 04oct: Dev says “go”
15:33 04oct: 2008rc1 builds started
18:20 04oct: linux builds handed to QA
19:45 04oct: mac builds handed to QA
12:45 05oct: win32 signed builds handed to QA
20:05 05oct: update snippets on betatest update channel
11:30 08oct: 2008rc1 halted. Respin declared for bugs 398422 and 398837
15:20 08oct: Dev says “go”
16:05 08oct: 2008rc2 builds started
19:50 08oct: linux builds handed to QA
22:05 08oct: mac builds handed to QA
00:45 09oct: win32 signed builds handed to QA
01:00 10oct: update snippets on betatest update channel
15:05 10oct: QA says “go” for Beta
16:05 10oct: update snippets on beta update channel
11:55 18oct: Dev & QA say “go” for Release; Build starts final signing, bouncer entries
14:25 18oct: final signing, bouncer entries done; mirror replication started
17:30 18oct: update snippets on live update channel; announced
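
The wall-clock total quoted above can be rechecked from the first and last entries of this timeline. A minimal sketch, assuming “code freeze” corresponds to the 15:00 04oct “go” and that both timestamps are in the same local timezone:

```python
from datetime import datetime

def elapsed(start: datetime, end: datetime) -> tuple[int, int, int]:
    """Return the gap between two timestamps as (days, hours, minutes)."""
    delta = end - start
    hours, remainder = divmod(delta.seconds, 3600)
    return delta.days, hours, remainder // 60

# First and last entries of the timeline above.
dev_go = datetime(2007, 10, 4, 15, 0)        # 15:00 04oct: Dev says "go"
announced = datetime(2007, 10, 18, 17, 30)   # 17:30 18oct: announced

print(elapsed(dev_go, announced))
```

That comes out at 14 days, 2 hours and 30 minutes, in line with the 14 days 2 hours figure.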

While Build Automation in FF2008 was much smoother than FF2007, this was not yet a “human free” release:
1) signing still done manually in two places. This is known and expected.

2) As the initial build steps get automated, the steps near the end of the process become more visible. Steps like pushing-updates-snippets-to-channels, adding bouncer entries, starting mirror replication and monitoring mirror replication are now worthy of automation attention. Combined, these took 6.5 hours of the Build time, and were all manual.

3) It was interesting to note that we needed only 3 hours of mirror replication time to reach 65-72% mirror absorption. There’s been quite a lot of folklore around how long it takes for mirror replication, but as mirrors have changed, we’ve been measuring to get concrete data. Even for a mirror replication in daytime, like in this release, we saw quick absorption around 60% within the first 2 hours. We are still experimenting with IT to find out how much absorption is “enough”, so we decided to wait until absorption hit around 70%, just to play safe. This is definitely not a science; we will continue experimenting with this in future releases… any comments/feedback very very welcome!
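
On item 3: the “wait until around 70%” rule can be written down as a small check. This is only a sketch, assuming absorption simply means the share of mirrors that already have the file; the real measuring tool may count or weight mirrors differently, and the mirror names below are made up.

```python
def absorption_percent(mirror_has_file: dict[str, bool]) -> float:
    """Share of mirrors that already serve the file, as a percentage."""
    if not mirror_has_file:
        return 0.0
    ready = sum(1 for has_file in mirror_has_file.values() if has_file)
    return 100.0 * ready / len(mirror_has_file)

def ready_to_announce(mirror_has_file: dict[str, bool], threshold: float = 70.0) -> bool:
    """True once absorption reaches the chosen threshold (70% by default)."""
    return absorption_percent(mirror_has_file) >= threshold

# Made-up mirror names and states, for illustration only.
status = {"mirror-a": True, "mirror-b": True, "mirror-c": True, "mirror-d": False}
print(absorption_percent(status), ready_to_announce(status))
```

Keeping the threshold as a parameter makes it easy to try other values as the experiments continue.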

take care
John

Firefox 3alpha8 by the (wall-clock) numbers

Mozilla released Firefox3a8 on Thursday, 20-sep-2007, at 08:30am PST.

This was a manual build run (not automated on trunk yet), and an alpha release (not a high-priority security release), so the numbers are quite different to the earlier Firefox2.0.0.7 release. Even as an apples-to-baseballs comparison, I thought the numbers were interesting and worth sharing. From “code freeze” to “available for public download” was 14.33 days wall-clock time. Of that time Build&Release took 2.25 days (55 hours including the respin).

00:01 06-sept: M8 code freeze, tree closed
18:48 11-sept: Dev verifies last fix landed, and gives “go” to build
20:46 11-sept: Build starts building
01:26 12-sept: blocker bug#395862 filed
08:40 12-sept: blocker patch landed
09:22 12-sept: Build restarts building
13:49 12-sept: linux & mac builds handed to QA
18:01 12-sept: signed-win32 build handed to QA
11:17 18-sept: QA signed off on all builds
00:01 19-sept: Build supposed to finish signing and publish builds externally
02:58 20-sept: files available externally for download
08:31 20-sept: mirror absorption completed and release announced

There were a few interesting points about this release:
1) There was a 5.75 day delay between when the code freeze started, and when the tree was first deemed ready for builds to start.
2) After builds started, a last minute blocker bug caused those builds to be abandoned and new builds started. This respin cost Build 12 hours.
3) Between 13-17sept inclusive, both Build and QA switched to work on FF2.0.0.7 (a higher priority security firedrill release). This caused wall-clock delays.
4) After QA signoff, we delayed releasing Firefox3a8 from 18sept to 19sept, to avoid traffic load of releasing Firefox3a8 on the same day as Firefox2.0.0.7.
5) There was a 1 day delay between when QA signed off on the builds and when Build group ran the remaining manual steps (signing installer, pushing bits externally, etc). These remaining Build steps only took a handful of hours to complete. However, the person doing those remaining manual steps (ie me!), was sidetracked with other non-release work.

Firefox 2.0.0.7 by the (wall-clock) numbers

Mozilla released Firefox 2.0.0.7 on Tuesday 18-sep-2007, at 3pm PST. For background on this security firedrill, see here.

This was our first production run using the new automation, so I thought the following wall-clock numbers might be interesting. From “initial report” to “fix available to public” was 6.25 days wall-clock time. Of that, Build&Release took just under 2 days (45 hours).

09:00 Wed: bug reported 9am (or 8.30am?). Dev start working on fix
13:40 Fri: fix landed on 1.8 branch
14:30 Fri: build started
18:30 Fri: linux builds handed to QA
22:30 Fri: mac builds handed to QA
22:30 Fri: win32 unsigned builds handed to QA
11:58 Sat: win32 signed builds handed to QA (1st time)
01:30 Sun: win32 signed builds handed to QA (2nd time, rebuilt on old machine)
12:10 Sun: update snippets pushed to beta update channel
15:00 Tue: update snippets pushed to live update channel; announced

Full disclaimer, while this fast turnaround kept Mike Shaver happy, it was not yet a “human free” release. We hit 4 issues, which required manual intervention:

1) last minute question about possible CVS-cross-branch tagging problem in automation scripts. Problem unconfirmed, but decided to manually tag anyway, just to be safe. Problem still unconfirmed, but test case now designed to clarify for future releases (see bug#396290)

2) l10n builds on win32 had the wrong cr-lf settings in README, EULA. The root cause of this was an internal communications snafu within the Build&Release group. Historically, we build l10n win32 on different machines to win32 en-US machines. As part of automation rollout, some folks thought the l10n win32 builds were now being done on same machines as en-US for 2005+2006, some thought l10n win32 was still being built on different machines. Because these different machines have different cygwin cr-lf settings, this problem first surfaced as a problem where text files like README, EULA had the wrong cr-lf settings. It was caught by a recently added test. Rather than debug/fix the problem, we just built on the old l10n machine and shipped that for win32. This miscommunication has been clarified. Still checking if there’s anything else here we missed.

3) signing still done manually. This is known and expected. Note: as the step-before-signing finished late at night, the automation waited overnight until a human woke up and did the signing the next morning.

4) manually copying bits from stage to build-console after each step completed. This was a known issue that we expected to have fixed for the scheduled 2007 release, but was not yet in place when this Firefox2.0.0.7 firedrill started. After each step finished, we had to manually copy files between “stage” and “build-console”, so that the next step would find the files it was expecting. Was intrusive and annoying. On track to be completed before end sept. (see bug#396438)
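
On item 2: the test that caught the cr-lf problem is not shown here, but a check of that kind can be very small. A minimal sketch, not the actual test, which classifies a text file's line endings so a packaging step can compare them against whatever the platform expects; the file contents below are made up.

```python
def line_ending_style(data: bytes) -> str:
    """Classify a file's line endings as 'crlf', 'lf', 'mixed' or 'none'."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf  # newlines not preceded by a carriage return
    if crlf and lf_only:
        return "mixed"
    if crlf:
        return "crlf"
    if lf_only:
        return "lf"
    return "none"

# Made-up file contents, for illustration only.
print(line_ending_style(b"Firefox README\r\nLine two\r\n"))
print(line_ending_style(b"Firefox README\nLine two\r\n"))
```

Running this over README and EULA on every build machine would flag a machine whose cygwin settings differ from the others.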

tc
John.
=====
ps: After the release, we’ve heard a few questions about the new GPG key. The previous key had expired sept2006, and was still being used, until this new key was available in August2007. We used the new key in Firefox3a7, and also in Firefox2007. After the Firefox2007 release, there were some questions about how to confirm the new public signing key on key servers. We’ve reviewed the keys on key servers, and they seem ok, but are still investigating. (see bug#377781).