Amazon Public Cloud – Toppled by Human Error
What is AWS?
AWS (Amazon Web Services) is one of the largest and most popular public cloud computing platforms. It provides on-demand computing services, typically on a simple pay-as-you-go basis, for anyone to use for almost any purpose. This giant of the cloud computing world underpins many of the world’s best-known brands, including Kellogg’s, British Gas, Channel 4, The Financial Times, BMW, Vodafone and even NASA, to name just a few.
On Tuesday afternoon, a major outage took down a large chunk of the AWS computing services hosted on the US east coast. This left many people in North America unable to access a whole host of websites and services, including Adobe’s cloud, Zendesk, Citrix, Yahoo Mail, Expedia, Salesforce.com, Nest and many more.
How did it happen?
After the outage, Amazon said one of its employees was debugging a billing system issue and unintentionally took a number of servers offline. This simple error started a domino effect that took down further server systems in turn. Because the subsystems depend on one another, each one that went down caused the others relying on it to stop working as well.
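The domino effect described above can be illustrated with a small sketch. This is not a model of Amazon's actual systems: the service names and the dependency map below are entirely hypothetical, and the only assumption is that a service stops working once anything it depends on is down.

```python
from typing import Dict, List, Set


def cascade(depends_on: Dict[str, List[str]], initially_down: Set[str]) -> Set[str]:
    """Return every service that ends up down after failures propagate."""
    down = set(initially_down)
    changed = True
    # Keep sweeping until a full pass adds no new failures.
    while changed:
        changed = False
        for service, deps in depends_on.items():
            if service not in down and any(dep in down for dep in deps):
                down.add(service)
                changed = True
    return down


# Hypothetical dependency map: each service lists what it relies on.
depends_on = {
    "core": [],
    "metadata": ["core"],
    "api": ["core", "metadata"],
    "dashboard": ["api"],
    "standalone": [],
}

failed = cascade(depends_on, {"core"})
print(sorted(failed))
```

In this toy example, taking one service offline brings down everything that relies on it directly or indirectly, while the service with no dependency on it keeps running.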
Who is to blame?
Amazon have made adjustments to staff policies in an attempt to guard against this type of thing happening again, which does imply that their engineers had a little too much free rein. But companies who use AWS also have to take their share of responsibility. Their developers should be distributing their sites, applications and services across multiple AWS regions so that, should an AWS region fail again, everything would fail over to another region.
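As a rough sketch of the multi-region idea, the snippet below picks the first healthy region from a priority list. It is an illustration only: the `pick_region` helper and the hard-coded health status are hypothetical, and the two region names are used purely as examples. In a real deployment the health signal would come from proper health checks, and traffic would usually be switched by DNS or a load balancer rather than application code.

```python
from typing import Callable, Optional, Sequence


def pick_region(regions: Sequence[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy region in priority order, or None if all are down."""
    for region in regions:
        if is_healthy(region):
            return region
    return None


# Hypothetical health status; a real system would query live health checks.
status = {"us-east-1": False, "eu-west-1": True}

# The primary region is listed first, the fallback second.
active = pick_region(["us-east-1", "eu-west-1"], lambda region: status.get(region, False))
print(active)
```

The point of the design is simply that the service never assumes a single region is available: if the preferred region is unhealthy, requests go to the next one in the list.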
What does this tell us about Public Cloud Services?
Public cloud computing can be very tempting and still delivers attractive benefits, such as low contract commitments, competitive pricing, global reach and near-limitless scalability. However, businesses that run critical IT infrastructure and want to use public cloud computing should weigh their options carefully before deciding on the best route. As this AWS outage shows, it clearly doesn’t pay to put all your eggs in one basket. But the good news is that there are plenty of ways to mitigate risk, for example building across multiple regions or running a hybrid environment, whether that is on-premises and cloud or private and public cloud.
How can Opus help?
Our technical architects learn your IT environment, take time to understand the critical elements, then design the perfect IT architecture using best-of-breed technologies and competitive commercials.