Veteran technology columnist John C. Dvorak famously labeled the cloud “risky, unreliable, and dumb” in a PC Magazine article a few years back. Unfortunately, many of his assertions still hold true, even as cloud computing passes the ten-year mark.
Despite the cloud’s indisputable benefits, the following 20 cloud failures prove that all technologies (and the people managing them) are fallible, even massively scaled utility computing services.
20. AOL Outage
Yes, AOL is still alive and kicking, although the service was kicked hard on February 19, 2015 and didn’t come back for several hours. AOL email users were unable to access their accounts, reportedly due to a networking issue.
19. Level 3 Communications Outage
Cloud services go up and down, but it’s another story when the underlying pipes break. Level 3 Communications suffered a fiber network outage in June 2015, disrupting internet traffic in the US and most of western Europe. The error was due to a network routing snafu at a tier 1 backbone internet provider; to the dismay of millions, services like Twitter were intermittently down for extended periods.
18. Outlook.com Outage
Outlook.com went offline for several hours in April 2015. Sure, big deal, right? This was neither the first nor the last service interruption for Microsoft’s public cloud email service. In fact, multiple sites have popped up for tracking and notifying users of any Outlook.com downtime, à la downforeveryoneorjustme.com.
17. Google Compute Engine
Part of the Google Cloud Platform, Google Compute Engine is the company’s primary IaaS offering, competing with the likes of AWS and Microsoft Azure. It suffered an outage across multiple zones for almost 3 hours back in February 2015.
16. Apple iCloud Services
In March 2015, both the Apple App Store and iTunes went belly up for approximately 11 hours, taking with them iCloud Mail and other web tools. A couple of months later, iCloud experienced a 7-hour outage that affected 11 cloud-based services, including iCloud Drive, Photos, Documents, and iCloud Mail. The causes were determined to be internal DNS errors at Apple.
15. Verizon IaaS Service
In January 2015, Verizon brought down its IaaS offering for 40 hours to perform routine maintenance on its cloud infrastructure. In all fairness, customers were warned ahead of time, and the service reportedly gained significant improvements in performance due to the upgrades applied during the downtime. Loud and clear, Verizon. Thanks for the heads-up.
14. Dropbox Outage
Popular cloud storage provider Dropbox suffered a global outage in January 2014 due to a failed upgrade of its systems. Sporadic website server errors and non-syncing files ensued for two days. There would be more outages later, including a major global service interruption in August 2015 due to “routine internal maintenance.”
13. Samsung SDS Data Center Fire
In April 2014, a massive fire at a Samsung SDS data center in South Korea disrupted mobile access to data stored in the cloud globally, as well as credit card services and the cloud-dependent features of other Samsung smart devices. Oddly, SDS is the firm’s IT services arm, providing consulting and outsourcing solutions, so it’s not clear why so many cloud servers mission-critical to consumer offerings were consolidated at that one particular subsidiary location.
12. Internap Data Center Outage
In May 2014, a regional utility power outage led to UPS failures inside Internap’s New York City data centers, bringing down colocated servers and IP connectivity services for 7 hours. StackExchange, its network of sites, and streaming video platform Livestream temporarily disappeared from the face of the internet.
11. Salesforce Outage
June and July of 2012 both saw outages at the world’s largest cloud CRM company. For Salesforce, a momentary power loss at a third-party data center resulted in a 9-hour service interruption in July; a storage tier fault caused the previous month’s downtime.
10. Hosting.com Outage
Popular Denver-based cloud hosting provider hosting.com went lights out for 1,000 customers on July 28th, 2012 after preventive maintenance measures on a UPS system resulted in loss of critical power to portions of a facility. In other words, some imbecile accidentally shut the power off to the data center.
9. Microsoft’s No-IP Domain Seizure
In June 2014, Microsoft seized 23 domains from Reno-based free dynamic DNS provider No-IP.com in an unprecedented act of corporate vigilantism, claiming that its users were victims of malware originating from the domains. Unfortunately, 1.8 million law-abiding customers experienced downtime for two days as a result, including companies like network security vendor SonicWall.
8. GoDaddy DNS Outage
GoDaddy is the world’s largest domain name registrar, which in turn makes it one of the biggest DNS providers as well. That explains why a significant portion of the web blacked out on September 10th, 2012. Corrupted data in the firm’s routing tables caused DNS service disruptions, impacting millions of websites for 6 hours.
7. Xen Vulnerability and AWS, Rackspace, IBM SoftLayer Downtime
November 2014 saw brief but widespread outages at several major cloud providers: AWS, Rackspace, and SoftLayer, among others. The reason for the downtime: to patch a security vulnerability in the Xen hypervisor.
6. Microsoft Azure Outage
In February of 2013, Azure suffered a 12-hour outage as a result of an expired SSL certificate. It was not the first, but perhaps the most well-remembered, outage of Microsoft’s public cloud hosting service. A year later, Azure took the prize for being the cloud provider with the worst uptime record of 2014; 2015 also saw Azure plagued by intermittent outages. Perhaps David N. Cutler, father of Azure and of several Windows operating systems, needs to get back under the hood for some fine tuning. Too bad he’s since left the OS world to join Microsoft’s Xbox team.
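An expired certificate is one of the few failure causes on this list that can be caught with a simple scheduled check. The following is a minimal sketch, not a description of how Microsoft monitors its certificates: the function names and the 30-day warning threshold are assumptions chosen for illustration, and only Python standard library calls are used.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the days between `now` and a certificate's notAfter time.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT". A negative result means expired.
    `now` must be a timezone-aware datetime.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when the certificate expires within `warn_days` (assumed threshold)."""
    return days_until_expiry(not_after, now) <= warn_days


def fetch_not_after(host: str, port: int = 443) -> str:
    """Read notAfter from a live server (requires network access).

    The default context verifies the certificate, so an already expired
    certificate raises ssl.SSLCertVerificationError instead of returning.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Run daily, something like `needs_renewal(fetch_not_after("example.com"), datetime.now(timezone.utc))` would flag a certificate weeks before it lapses; the hostname here is only a placeholder.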
5. Amazon AWS Outage
In April of 2011, AWS suffered a massive service disruption that also took down heavily trafficked sites like Reddit, Netflix, and Foursquare, among others. The source of the glitch was a network upgrade error, resulting in almost four days of utter misery for a large percentage of AWS customers. Four years later, in September of 2015, Amazon went down again, bringing with it Netflix (again), as well as Tinder, IMDb, and Amazon.com. The cost of the downtime caused by this Amazon DynamoDB glitch? In Amazon.com’s case, an estimated $5 million in lost revenue per hour of downtime.
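To put downtime figures like these in perspective, the arithmetic is simple enough to sketch. The snippet below is an illustration only: the function names are made up for this example, the year is assumed to be 365 days, and the 2-hour duration in the revenue example is a hypothetical input, since the length of the September 2015 outage is not given here.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def availability_percent(downtime_hours: float,
                         period_hours: float = HOURS_PER_YEAR) -> float:
    """Percentage of the period during which the service was up."""
    return 100 * (1 - downtime_hours / period_hours)


def lost_revenue(downtime_hours: float, revenue_per_hour: float) -> float:
    """Estimated revenue lost, assuming a constant hourly rate."""
    return downtime_hours * revenue_per_hour


# A single 12-hour outage (the Azure certificate incident) over one year:
print(round(availability_percent(12), 3))   # 99.863
# A 40-hour maintenance window (the Verizon case) over one year:
print(round(availability_percent(40), 3))   # 99.543
# Hypothetical 2-hour outage at the estimated $5 million per hour:
print(lost_revenue(2, 5_000_000))           # 10000000
```

Even one long outage is enough to pull a provider below the "three nines" (99.9%) mark for the year.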
4. Healthcare.gov Failure
Universal healthcare? Not if faulty technology can help it. The website supporting Obama’s crowning achievement, the Affordable Care Act, cost a whopping $630 million in taxpayer money to develop but nonetheless crashed and burned on the first day. Glitches, sign-in errors, miscalculations, account creation errors, and more plagued the website in subsequent weeks. To this day, Healthcare.gov continues to experience frequent glitches.
3. Adobe Systems Data Breach
$1.2 million plus change. This is the amount settled for in a class action lawsuit filed by customers against Adobe for its massive 2013 data breach. 38 million records containing Adobe IDs, credit card information, and login data were stolen—up from a comparatively paltry 3 million breached records initially quoted by the software giant. Sadly, the company admitted to knowing of lackluster security controls in its Creative Cloud offering (e.g., using the same encryption key for all passwords) but failed to fix the issues prior to the breach.
2. Knight Capital Collapse
Automated cloud-based stock trading software goes haywire and costs its owner $440 million in just forty-five minutes. This may sound like an Oliver Stone take on the dangers of technology; if only it were a movie. No doubt that was the sentiment of Knight Capital’s execs as they watched a faulty software algorithm erroneously buy several billion dollars of unwanted stocks, erasing 75% of the company’s value and, eventually, the company itself.
1. Hurricane Sandy
The 2012 Atlantic hurricane season saw the arrival of Sandy, the second-costliest hurricane in U.S. history. In New York and New Jersey, floods and power outages wreaked havoc on data centers in the surrounding areas. Perhaps most notably, the incident opened up broader discussions around the impact of natural disasters on business and service continuity. Nothing is immune to the wrath of Mother Nature, not even the cloud.
As you can see, the causes of the top cloud failures in recent history run the gamut. Natural disasters, fires, human error, technical glitches, and cyber attacks all had a hand in the aforementioned items. The cloud may not be bulletproof, but it’s the best computing innovation we’ve been able to capitalize on thus far.