Following on from my last post, I was curious about what would be a good time for downtimes, so I did some further digging. Here’s the same “pushes-for-the-month-of-Jan” data broken down instead by time of day.
Here’s the same data broken down by average number of pushes per hour, across the entire month.
Basically, the inflow of pushes never stops! It looks, afaict, like we’ve officially grown into a literal 24×7 project now, and there’s no really good time for a downtime anymore. If January is “normal”, then midnight-5am PST on some weekends might be the least-disruptive time for a downtime, but even so, that wasn’t true for every weekend.
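For anyone who wants to run the same kind of breakdown on their own push log, here is a minimal sketch. It is not the script behind the graphs; the function names are made up for illustration, and it assumes you already have the push times as `datetime` objects in the timezone you care about (PST here).

```python
from collections import Counter


def average_pushes_per_hour(push_times, days_in_period):
    """Return 24 averages: pushes landing in each hour of the day.

    push_times: iterable of datetime objects (assumed already in local time).
    days_in_period: number of days the data covers, e.g. 31 for January.
    """
    counts = Counter(t.hour for t in push_times)
    return [counts.get(hour, 0) / days_in_period for hour in range(24)]


def quietest_window(averages, length):
    """Return the start hour of the `length`-hour window with the fewest pushes.

    Windows wrap around midnight, so a window starting at 22 with
    length 5 covers hours 22, 23, 0, 1 and 2.
    """
    def window_total(start):
        return sum(averages[(start + i) % 24] for i in range(length))

    return min(range(24), key=window_total)
```

Calling `quietest_window(averages, 5)` gives the start of the quietest 5-hour block on average. An average hides the weekend-to-weekend variation mentioned above, so it is only a first guess.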
The move from dedicated-machines to pool-of-slaves is really paying off here. While we still need downtimes for some types of maintenance, a lot of maintenance on slaves can be done *without* closing the tree; instead, while the tree remains open, we simply take one slave out of the pool, fix it, put it back in the pool, take the next slave out, etc., while the rest of the pool-of-slaves continues working as normal. Doing this takes more time from RelEng’s point of view, but it’s less disruptive for developers because the tree remains open and jobs are still being processed throughout. This was simply not an option when we were running on dedicated machines, and it is more and more important now that we don’t really have a “good time for a downtime” anymore.
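The one-at-a-time approach can be sketched roughly like this. It is an illustration of the idea only, not our actual tooling: `pool` is a hypothetical set of slave names currently taking jobs, and `fix` is a hypothetical callable that does whatever maintenance is needed on a single slave.

```python
def rolling_maintenance(pool, fix):
    """Service every slave in `pool` one at a time; the rest keep working.

    pool: mutable set of slave names that are currently accepting jobs
          (hypothetical representation of the pool).
    fix:  callable taking one slave name and performing the maintenance
          (hypothetical).
    Returns the list of slaves serviced, in the order they were done.
    """
    serviced = []
    # Iterate over a sorted snapshot so the pool can be changed in the loop.
    for slave in sorted(pool):
        pool.remove(slave)   # take one slave out of the pool
        fix(slave)           # fix it while the others keep running jobs
        pool.add(slave)      # put it back before taking the next one out
        serviced.append(slave)
    return serviced
```

If `fix` raises an exception, the loop stops and that slave stays out of the pool, which is probably what you want for a machine that failed maintenance. At no point is more than one slave missing, so the pool's capacity only ever drops by one machine.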