Assessing the risk, cost and complexity of high availability (HA) and disaster recovery (DR) in a cloud environment
What happens when a cloud server goes down?
Unlike a traditional IT environment, which must provide fault tolerance against failures of physical hardware and infrastructure, a public cloud environment lets organizations move their workloads from one cloud-based server to another almost immediately and automatically. This built-in resilience and flexibility reduces the need for dedicated high availability (HA) and disaster recovery (DR) plans, so long as the organization can tolerate brief periods of downtime.
However, for some companies, 60 minutes, or even six minutes, of downtime poses a significant risk to the business. For large financial institutions, particularly those involved in trading stocks, the cost of downtime is measured not in hours or even minutes, but in seconds. Many global production or manufacturing companies also operate on a 24/7 model in which even brief interruptions can send waves of disruption throughout the business and supply network.
For these companies, the cost of interruption far outweighs that of developing HA/DR capabilities in the cloud. In this post we explore how and why one financial institution decided to implement a cloud-based HA/DR solution—and the process your business might consider when doing the same.
Case Study: A closer look at cloud-based HA/DR within the financial services industry
One of our clients, a large financial institution, wanted to limit interruptions of their cloud-based system to just three minutes. The solution would need to support all system components including SAP ERP Central Component (ECC) and Business Warehouse (BW), along with their corresponding databases (HANA) running on SUSE Linux.
Over the course of several meetings that involved more than 20 executives from the client team, Protera developed a list of six cloud-based HA/DR solutions and the associated cost and complexity of each. The Protera team also helped the organization determine how to maximize their current investment and integrate it within the new solution.
After careful evaluation, the team selected a symmetrical solution in which the HA and DR systems were sized and built identically to Production. The model consisted of four clusters: two for the application servers (one each for ECC and BW) and two for the corresponding databases. This solution allowed the organization to run its business loads on either the primary or secondary node of each cluster without any degradation of performance.
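To make the symmetrical model concrete, here is a minimal sketch of four two-node clusters in which either node can carry the production load. The cluster and node names are hypothetical, and the `inverted` flag is only an illustrative way to record that a failover has swapped the roles; it is not a description of the client's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """A two-node cluster whose nodes are sized identically (assumption)."""
    name: str
    primary: str
    secondary: str
    inverted: bool = False  # True when the original secondary is serving

    def fail_over(self) -> None:
        # Swap roles; because the nodes are symmetrical, capacity is unchanged.
        self.primary, self.secondary = self.secondary, self.primary
        self.inverted = not self.inverted


def build_clusters() -> dict:
    # Hypothetical names: one application and one database cluster per system.
    names = ["ecc-app", "bw-app", "ecc-db", "bw-db"]
    return {n: Cluster(n, f"{n}-node-a", f"{n}-node-b") for n in names}


clusters = build_clusters()
clusters["ecc-db"].fail_over()

# A monitoring check could flag any cluster running on its secondary node.
inverted = [c.name for c in clusters.values() if c.inverted]
print(inverted)
```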
Protera worked with the client to design, test and deploy the solution. The team was also engaged to manage regular monthly testing and quarterly drills, as well as oversee daily monitoring. The resulting solution offered three main benefits:
- Speed: A truly symmetrical solution, the system allowed the organization to recover from any event within three minutes.
- Reliability: The solution was also designed to support a safe and efficient recovery process by orchestrating the proper sequence for restarting the system.
- Scalability: A flexible solution, this model can also integrate with other services, such as Vertex.
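The restart sequencing mentioned under Reliability can be sketched as a dependency ordering. This example assumes, for illustration only, that each application server must start after its own database; the component names are hypothetical and the real orchestration may involve more steps.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Maps each component to the components it depends on (assumed dependencies).
DEPENDS_ON = {
    "ecc-db": set(),
    "bw-db": set(),
    "ecc-app": {"ecc-db"},
    "bw-app": {"bw-db"},
}


def restart_order(depends_on):
    """Return a start order in which every dependency comes up first."""
    return list(TopologicalSorter(depends_on).static_order())


order = restart_order(DEPENDS_ON)
for step, component in enumerate(order, start=1):
    print(f"{step}. start {component}")
```

The exact order of independent components may vary, but each database always appears before the application server that relies on it.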
The solution was put to the test three years ago, when one of the cluster nodes experienced a hardware failure. Recovery was so seamless and fast that the client’s IT team was not even aware of the incident until the Protera monitoring team reported the event and noted that the clusters were inverted, a tell-tale sign of a production system failover. In the words of our client: “The switch over didn’t affect our business operations at all. Our users were not aware of any outage.”
Does your business need a cloud-based HA/DR plan?
Ultimately, the decision to create additional HA/DR capabilities will come down to how long the organization can withstand interruption, as well as the cost and risk associated with that disruption. To that end, your organization should determine its Recovery Time Objective (RTO), the maximum tolerable downtime, and its Recovery Point Objective (RPO), the maximum tolerable window of data loss, to help decide on the optimal HA/DR architecture. Generally speaking, the shorter these windows, the more likely it is that your business will need a cloud-based HA/DR plan.
Fortunately, for organizations that cannot withstand even brief periods of downtime, developing an HA/DR plan has become more cost-effective, and the resulting solutions perform better, when built on cloud-native technologies. For such companies, these capabilities are important components of the overall cloud strategy and must be part of the transformation journey.
Want to get started on an HA/DR solution for your business? Reach out to Protera to schedule a consultation and learn more about our cloud-based HA/DR services.
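As a rough illustration of how RTO and RPO can narrow the choice of architecture, the sketch below filters candidate options by whether they meet both objectives and then picks the cheapest. The option names, recovery figures and relative costs are invented for the example; only the three-minute target comes from the case study above, and the zero-data-loss RPO is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    rto_seconds: float  # maximum tolerable downtime
    rpo_seconds: float  # maximum tolerable window of data loss


@dataclass
class Option:
    name: str
    recovery_seconds: float   # expected time to restore service
    data_loss_seconds: float  # expected window of lost data
    relative_cost: float      # hypothetical cost units


def cheapest_viable(req, options):
    """Return the lowest-cost option meeting both RTO and RPO, or None."""
    viable = [
        o for o in options
        if o.recovery_seconds <= req.rto_seconds
        and o.data_loss_seconds <= req.rpo_seconds
    ]
    return min(viable, key=lambda o: o.relative_cost, default=None)


# Hypothetical candidates with made-up figures.
options = [
    Option("restart on a new cloud server", 3600, 900, 1.0),
    Option("warm standby", 900, 60, 2.5),
    Option("symmetrical HA/DR clusters", 180, 0, 5.0),
]

relaxed = Requirement(rto_seconds=4 * 3600, rpo_seconds=3600)
strict = Requirement(rto_seconds=180, rpo_seconds=0)

print(cheapest_viable(relaxed, options).name)
print(cheapest_viable(strict, options).name)
```

With relaxed objectives the cheapest option already qualifies; as the windows shrink, only the more elaborate architectures remain viable.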