My earlier blog post shows how much we spend per checking using AWS “on demand” instances. Rail and catlee has been working on using much cheaper AWS spot instances (see details here and here for details).
As of today, AWS spot instances are now being used for 30-40% of our linux test jobs in production. We continue to monitor closely, and ramp up more every day. Builds are still being done on on-demand instances, but even so, we’re already seeing this work reduce our costs on AWS – now our AWS costs per checkin us USD$26.40 (down from $30.60) ; broken out as follows: USD$8.44 (down from $11.93) for Firefox builds/tests, USD$5.31 (unchanged) for Fennec builds/tests and USD$12.65 (down from 13.36) for B2G builds/tests.
It is worth noting that our AWS bill for November was *down*, even though our checkin load was *up*. While spot instances get more attention, because they are more technically interesting, it is worth noting for the record that only a small percentage of this cost saving was because of using spot instances. Most of the cost savings in November were from the less-glamorous work of identifying and turning off unwanted tests – ~20% of our overall load was no-longer-needed.
- A spot instance can be deleted out from under you at zero notice, killing your job-in-progress, if someone else bids more then you for that instance. We’re only seeing ~1% of jobs on spot-instances being killed. We’ve changed our scheduling automation so that now, any spot job which is killed, will automatically be re-triggered on a *non-spot* instance. While a developer might tolerate a delay/restart once, because of the significant cost savings, they would quickly be frustrating if a job was unlucky enough to be killed multiple times.
- When you ask for an on-demand instance, it is almost instant. By contrast, when you ask for a spot-instance, you do not have any guarantee on how quickly Amazon will provide the instance. We’ve seen delays of up to 20-30mins minutes, all totally unpredictable. Handling delays like this requires changes to our automation logic. All work well in progress, but still, work to be done.
- This post only includes costs for AWS jobs, but we run a lot more builds+tests on inhouse machines. Cshields continues to work with mmayo to calculate TCO (Total Cost of Ownership) numbers for the different physical machines Mozilla runs in Mozilla colos. Until we have accurate numbers from IT, I’ve set those inhouse costs to $0.00. This is obviously unrealistic, but felt better then confusing this post with inaccurate data.
- The Amazon prices used here are “OnDemand” prices. For context, Amazon WebServices has 4 different price brackets available, for each different type of machine available:
** OnDemand Instance: The most expensive. No need to prepay. Get an instance in your requested region, within a few seconds of asking. Very high reliability – out of the hundreds of instances that RelEng runs daily, we’ve only lost a few instances over the last ~18months. Our OnDemand builders cost us $0.45 per hour, while our OnDemand testers cost us $0.12 per hour.
** 1 year Reserved Instance: Pay in advance for 1 year of use, get a discount from OnDemand price. Functionally totally identical to OnDemand, the only change is in billing. Using 1 year Reserved Instances, our builders would cost us $0.25 per hour, while our OnDemand testers cost us $0.07 per hour.
** 3 year Reserved Instances: Pay in advance for 3 year of use, get a discount from OnDemand price. Functionally, totally identical to OnDemand, the only change is in billing. Using 3 year Reserved Instances, our builders would cost us $0.20 per hour, while our 3 year Reserved Instance testers cost us $0.05 per hour.
** Spot Instances: The cheapest. No need to prepay. Like a live auction, you bid how much you are willing to pay for it, and so long as you are the highest bidder, you’ll get an instance. This price varies throughout the day, depending on what demand other companies place on that AWS region. We’re using a fixed $0.025, and in future, it might be possible to save even more by doing tricker dynamic bidding.
More news as we have it.