Several high-profile companies have recently drawn media coverage after delivering a poor customer experience because of a 'cloud outage'. It may even seem that cloud outages are becoming more frequent, which has led some organizations to question whether the cloud is stable enough to trust with their key customer interactions. I can assure you it is. What we're actually witnessing is a record number of end users being affected whenever an outage occurs, simply because more and more organizations, and individuals, are migrating to the cloud. Since cloud services now power much of our day-to-day computing and application functionality, outages are felt more widely in the community at large.
Are Outages Inevitable?
Cloud service outages tend to occur because of how complex the technology is. A good analogy for cloud computing is a large superhighway. If you see 15 empty lanes in the middle of the night, you might not be able to imagine how traffic could ever become jammed. Yet during rush hour, all it takes is a single car blowing a tire to bring the entire freeway to a grinding halt in both directions.
The same applies to cloud infrastructure. Higher-than-expected usage or a memory leak within an application can leave resources constrained, and sooner or later the system will fail. When something minor goes wrong, services that usually run at hundreds of miles per hour have to stop so the problem can be fixed. Before you know it, the entire network is backed up, the system runs out of resources, and it has to be shut down in order to recover.
Mitigation Strategies
Self-healing networks can greatly reduce the impact of a potential outage. Even a brief outage at a cloud service provider has a huge impact on business and home users. To get ahead of the game, companies like Netflix continuously inject faults into their own infrastructure to see what issues arise, then use the results to proactively strengthen it. If and when a fault occurs in an uncontrolled situation, they already have a workaround in place.
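To make the fault-injection idea concrete, here is a minimal Python sketch of the principle. It is not Netflix's actual tooling. The helper `inject_faults`, the `fetch_profile` service, the cached fallback and the 30% failure rate are all hypothetical. The wrapper deliberately makes a dependency fail some of the time, and the experiment checks that every request still gets an answer.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Raised deliberately to simulate a failing dependency."""


def inject_faults(func: Callable[[], T], failure_rate: float, rng: random.Random) -> Callable[[], T]:
    """Hypothetical helper: wrap func so it fails with probability failure_rate."""
    def wrapped() -> T:
        if rng.random() < failure_rate:
            raise InjectedFault("simulated outage")
        return func()
    return wrapped


def fetch_profile() -> str:
    # Hypothetical call to a primary backend service.
    return "live profile"


def get_profile_with_fallback(fetch: Callable[[], str]) -> str:
    # The resilience behavior being tested: degrade gracefully instead of failing.
    try:
        return fetch()
    except Exception:
        return "cached profile"


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the experiment is repeatable
    flaky_fetch = inject_faults(fetch_profile, failure_rate=0.3, rng=rng)
    results = [get_profile_with_fallback(flaky_fetch) for _ in range(1000)]
    # Every request must still be answered, even though roughly 30% of calls failed.
    assert all(r in ("live profile", "cached profile") for r in results)
    print(results.count("cached profile"), "requests served from fallback")
```

Running the experiment on purpose reveals whether a fallback exists before a real outage does.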
Multiple levels of redundancy are another way to cope with cloud outages. A company's service is more robust if it's deployed on more than one cloud hosting provider (e.g., Google Cloud, Microsoft Azure, or Amazon Web Services). If one provider goes down, the result isn't a total service outage. Instead, the company can switch its traffic over to the backup provider with little or no disruption.
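The following minimal Python sketch shows the failover logic, assuming the same service is deployed on several providers and each deployment can report its health. The `ProviderEndpoint` class, the provider names and the health checks are hypothetical. In practice this routing is often handled by DNS failover or a global load balancer rather than application code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ProviderEndpoint:
    """A hypothetical deployment of the same service on one cloud provider."""
    name: str
    is_healthy: Callable[[], bool]  # stand-in for a real health-check probe
    handle: Callable[[str], str]    # serves a request on this provider


class AllProvidersDown(Exception):
    """Raised only when every provider is unavailable (a total outage)."""


def route_request(request: str, providers: Sequence[ProviderEndpoint]) -> str:
    """Send the request to the first healthy provider, in priority order."""
    for provider in providers:
        if not provider.is_healthy():
            continue  # skip a provider that is failing its health check
        try:
            return provider.handle(request)
        except Exception:
            continue  # the provider failed mid-request; try the next one
    raise AllProvidersDown("no healthy provider available")


if __name__ == "__main__":
    primary = ProviderEndpoint("primary-cloud", lambda: False, lambda r: f"primary handled {r}")
    backup = ProviderEndpoint("backup-cloud", lambda: True, lambda r: f"backup handled {r}")
    print(route_request("checkout", [primary, backup]))  # prints: backup handled checkout
```

The primary provider is listed first, so it takes all traffic while healthy. The backup only receives requests when the primary is down.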
Small businesses usually don't have the resources to keep a cloud outage from having a large impact on their service. If you are a small business choosing a cloud provider, look for one that has data centers in multiple regions. That means the provider has compartmentalized its service delivery: if a failure occurs in one region, it can shift the load to another region that hasn't been affected.
Above all, do your homework to make sure the provider you choose has the right level of risk mitigation and capacity. It's really a cost/benefit analysis: you need to find the provider with the best redundancy and risk mitigation that fits your budget.