Recently, I was asked to lead a discussion with a few VPs within Mozilla about the scope of Release Engineering at Mozilla. Each VP was well established in their career, technically seasoned, and smart, and each brought their own preconceived notions of what RelEng means, with terminology and perspectives shaped by their previous companies. To complicate matters, different organizations mean different things by “Release Engineering”, so getting everyone on the same page was going to be interesting… and important to get right, if we were all to work well together.
This blog post is a quick summary; if you are curious, PDFs of the slides are here.
At Mozilla, Release Engineering covers two main topics:
1) Release Automation:
People who are not day-to-day developers typically think of this first. How efficient is the software delivery pipeline within a software organization? How long does it take from “go to build a release” to “users can start downloading updates”? The faster and more reliable this software delivery pipeline, the more competitive the company can be in the marketplace. This used to be where Mozilla’s RelEng, as a group, spent most of their time: sleeping in the office, getting bribes for releases, and all that drama. Now, thankfully, our automation is really great, so chemspills are super-quick (great for our users) and mostly hands-off (great for the humans in RelEng). There’s still lots to improve, and there are always some adjustments because of changing product requirements, but it’s already a night-and-day improvement since 2007. It has continued to improve even since we wrote about it in a book!
2) Continuous Integration:
Day-to-day developers think of this, and deal with this, every single day. Anyone making code changes at Mozilla keeps an eager eye on tbpl.m.o to see if their change is all green (good!); if it is, they can close out their bug as FIXED and move on to the next bug. Making the Continuous Integration process more efficient has allowed Mozilla to hire more developers to do more checkins, transition developers from all-on-one-tip development to multi-project-branch development, and change the organization from traditional releases to a rapid-release model. This required RelEng to scale up significantly in less than 6 years, from a humble 86 machines to ~3,400 machines spread across 4 physical Mozilla colos as well as 3 Amazon AWS regions. Here’s a quick summary diagram of how all these machines are interconnected. RelEng knows this by heart, but I couldn’t find it posted anywhere, so I drew it as part of preparing this presentation.
This was a fun meeting. My favorite quotes from the lively back-and-forth were: “every software company lives-or-dies by the efficiency of its development process and its software delivery pipeline” …and… “everyone interacts with different parts of the elephant, so everyone has very different ideas of what they are looking at”.
Hopefully, others find this interesting too. Of course, if you have questions or comments, please post them below, or drop me an email.