High Availability & Disaster Recovery Databases for Microsoft SQL Server

Highly Available and Disaster Recovery database systems require different configurations and expectations in order to prove successful. When your database’s recovery times meets your users’ and team’s needs, it too can be considered highly available. In this blog post, I compare and contrast each term as a primer to Microsoft SQL Server databases.

Are High Availability (HA) and Disaster Recovery (DR) SQL server databases the same? These two terms are used together quite often as if they’re joined at the hip. Before we go into the specifics about configuring a Highly Available SQL Server database, let’s wrap our heads around the concepts of High Availability and Disaster Recovery.

This is how I would describe them: High Availability is a system configured to avoid or minimize the possibility of an unplanned outage, whereas Disaster Recovery entails the processes and procedures required to restore the system to its Highly Available state after said outage. Two key success factors in designing a Highly Available system are:

  • Its ability to minimize or avoid unplanned system outages

  • Its ability to reduce the complexity and time needed to recover when an unplanned outage (disaster) occurs

For a given software application, service, or platform, high availability is ultimately measured in terms of the end user’s experience and expectations. A system is considered to be highly available according the agreement and expectations of customers and stakeholders. The availability of a system can be expressed as this calculation:

The resulting value is often expressed by industry in terms of the number of 9’s that the solution provides, which is meant to convey an annual number of minutes of possible up time, or conversely, minutes of downtime.

Number of 9’sAvailability PercentageTotal Annual Downtime
299%3 days, 15 hours
399.9%8 hours, 45 minutes
499.99%52 minutes, 34 seconds
599.999%5 minutes, 15 seconds

For example, let’s assume that the end users come to an agreement that the company’s financial system will not be down for more than 8 hours in a quarter. At the end of the quarter, if the system has been down for only 4 hours, this means that the system’s availability is 100%. The definition of availability differs from system to system within a company and between companies.

It is also important for a company to define recovery goals. These goals are set by defining the Recovery Time Objective (RTO) and Recovery Point Objective (RPO). Recovery goals can also differ between systems and companies and are usually based on factors like information loss, property damage, decreased productivity, opportunity costs, contractual damages, or the loss of goodwill. For more help with effective goal setting, get help from our data management consultants.

Recovery Time Objective vs. Recovery Point Objective

Recovery TIME Objective (RTO): This objective deals with time associated with a recovery. How long does it take to bring the system back online after an outage? The shorter the duration, the better it is. For example, RTO for a system can be defined as “System should be back online with full functionally within 2 hours after an outage has occurred.”

Recovery POINT Objective (RPO): This objective deals with data loss associated with a recovery. How much of data is lost during an outage? It is tough to quantify the amount of data that is lost because a system can experience varying workloads during a day. Hence  why this objective also is measured in terms of time. For example, RPO could be defined as “Should be able to recovery to within 15 minutes of the last outage.”

I hope this introduction provides a good primer about High Availability and Disaster Recovery SQL server databases. In my next post, we’ll dive into the specifics of Microsoft’s SQL Server’s capabilities to configure a database system that is highly available and easy to recover from a disaster.

We’ll also cover:

  • FCI – Fail Over Cluster Instance

  • WSFC – Windows Server Fail Over Cluster

  • VNN – Virtual Network Name

  • AG – Availability Group

  • Availability Group Listener Name (For read only workloads, set Application Intent Attribute in connection string – Read Only)

There’s more to explore at Smartbridge.com!

Sign up to be notified when we publish articles, news, videos and more!