What are the Basics of High Availability Engineering?


Does your business rely on cloud services, databases, remote servers, or stored data of some kind? Of course it does. Are you in constant fear of downtime? Of course you are. In our 24-hour, always-on, always-connected market, uptime is critical. Simply put, going dark is bad for business, and it is unacceptable. That’s why high availability engineering solutions, like those offered by Liquid Web and other modern web hosting companies, are so important. From high availability infrastructure and SQL databases to redundant replication, transaction logs, and the elimination of single points of failure, a web host is only as good as its high availability (HA) engineering services.

Let’s assume your application will be hosted on a traditional managed infrastructure. Now, let’s look at why a high availability server is a better solution.

High Availability Summarized

When it comes to HA, the three principles of reliability engineering must be considered:

  1. Reduce or eliminate single points of failure.
  2. In redundant systems, make sure crossover points are reliable.
  3. Detect failures and react to them in real time.
Learn how HA infrastructure can help your business. Download our white paper on Why High Availability Matters – And How You Can Achieve It Reliably and Affordably.

When these three principles are reliably implemented, a significant reduction in downtime is achieved. A quality web host will have these principles in mind when designing their services.

Reducing single points of failure in an HA system means redundancy in data, whether virtual, physical, or a combination of the two. An HA structure will have a primary volume and at least one physical backup volume. A standard configuration is composed of two identical primary volumes backed up by two identical Distributed Replicated Block Device (DRBD) physical volumes, which are in turn backed up by two DRBD virtual volumes. DRBD volumes perform selective, synchronous data replication, which means that only blocks of changed data (not the entire volume) are rewritten and backed up in real time.

DRBD volumes ultimately reduce backup times because they require fewer computing resources at any one time. Each backup tier (two identical primary volumes, two identical DRBD virtual volumes, and so on) is stored on separate physical servers; some hosts will even back up to a server at a remote location. A configuration with a remote location eliminates another single point of failure by protecting your data from natural disasters and other location-based issues such as power outages and network failures.
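To make the changed-blocks-only idea concrete, here is a minimal Python sketch. This is a toy illustration, not DRBD itself (which replicates at the block-device level in the kernel): each fixed-size block is hashed, and only blocks whose hashes have changed since the last sync are copied to the replica.

```python
import hashlib

BLOCK_SIZE = 4  # tiny blocks for illustration; real devices use much larger blocks


def block_hashes(volume: bytes) -> list:
    """Hash each fixed-size block so changed blocks can be detected."""
    return [hashlib.sha256(volume[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(volume), BLOCK_SIZE)]


def replicate_changed_blocks(primary: bytes, replica: bytearray,
                             last_hashes: list) -> int:
    """Copy only the blocks whose hashes changed; return how many were sent."""
    sent = 0
    for idx, h in enumerate(block_hashes(primary)):
        if idx >= len(last_hashes) or h != last_hashes[idx]:
            start = idx * BLOCK_SIZE
            replica[start:start + BLOCK_SIZE] = primary[start:start + BLOCK_SIZE]
            sent += 1
    return sent


primary = bytearray(b"AAAABBBBCCCCDDDD")
replica = bytearray(len(primary))
hashes = []  # no prior state, so the first sync copies all four blocks
replicate_changed_blocks(bytes(primary), replica, hashes)
hashes = block_hashes(bytes(primary))

primary[4:8] = b"XXXX"  # change a single block on the primary
sent = replicate_changed_blocks(bytes(primary), replica, hashes)
```

After the second sync, only one block crosses the wire, yet the replica is byte-for-byte identical to the primary; this is why delta replication keeps backup windows short.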


What To Do With the Database

In an HA system, it’s recommended that your SQL database be stored in a separate, redundant server environment, as this improves performance and reduces overhead on your primary server. A dedicated SQL server also supports the principles of reliability engineering, as it is specifically designed for high availability, including automated, reliable crossovers and real-time failure detection.

SQL databases also create incremental transaction logs, another guard against single points of failure. A transaction log records every change made to the database, and it can be backed up at set intervals, as frequently as every minute. The SQL database can then use the transaction logs as a dataset, writing them to the backup servers in your HA configuration.

Liquid Web’s standard configuration for SQL database hosting includes a daily backup of the entire database and a rolling 24 hours of hourly transaction logs.
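Here is a simplified Python sketch of how a daily full backup plus hourly transaction logs can rebuild the database to a point in time. The log format and the `restore` helper are invented for illustration; they are not an actual SQL server API.

```python
import copy


def restore(full_backup: dict, transaction_logs: list, up_to_hour: int) -> dict:
    """Rebuild database state: start from the daily full backup,
    then replay each hourly transaction log in order."""
    state = copy.deepcopy(full_backup)
    for log in transaction_logs[:up_to_hour]:
        for op, key, value in log:  # each entry: (operation, key, value)
            if op == "set":
                state[key] = value
            elif op == "delete":
                state.pop(key, None)
    return state


daily_backup = {"balance": 100}
logs = [
    [("set", "balance", 150)],                             # hour 1
    [("set", "orders", 3), ("delete", "balance", None)],   # hour 2
]
state = restore(daily_backup, logs, up_to_hour=2)
```

Because each hourly log is replayed on top of the last full backup, a failure at any point in the day costs at most one hour of changes rather than a full day.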


Monitoring for Failover

At the very core of the HA configuration should be a monitoring system that constantly watches the health of the clusters and automatically performs failovers when necessary. The most popular monitoring subsystem in the industry is Heartbeat, a Linux-based monitor that can reliably support multiple nodes. Heartbeat can quickly and accurately identify critical failures and automatically transition the system to a redundant server.
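The detection-and-failover loop can be sketched in a few lines of Python. This is a toy model, not the actual Linux-HA Heartbeat daemon: each node periodically "beats," and if the active node misses its deadline, a standby is promoted.

```python
import time


class HeartbeatMonitor:
    """Toy heartbeat-style monitor: if the active node misses its
    heartbeat deadline, promote the next standby node."""

    def __init__(self, nodes, timeout=3.0):
        self.nodes = list(nodes)  # nodes[0] is the active node
        self.timeout = timeout
        self.last_beat = {n: time.monotonic() for n in self.nodes}

    def beat(self, node):
        """A node reports that it is alive."""
        self.last_beat[node] = time.monotonic()

    def check(self, now=None):
        """Fail over if the active node has gone quiet; return the active node."""
        now = time.monotonic() if now is None else now
        active = self.nodes[0]
        if now - self.last_beat[active] > self.timeout:
            self.nodes.append(self.nodes.pop(0))  # demote the failed node
        return self.nodes[0]


mon = HeartbeatMonitor(["db1", "db2"], timeout=3.0)
t0 = mon.last_beat["db1"]
active_before = mon.check(now=t0 + 1)  # within the deadline, no failover
active_after = mon.check(now=t0 + 5)   # deadline missed, standby promoted
```

In a real deployment the monitor also has to guard against "split brain" (two nodes each believing they are active), which is why production tools add fencing and quorum on top of this simple timeout logic.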

As you can see, each part of the HA system supports more than one of the three principles of high availability engineering. The redundant data nodes (physical and virtual) reduce single points of failure and create reliable crossover points.

A dedicated SQL server creates another layer of redundancy, another guard against single points of failure, and has built-in, automated crossover points.

Finally, Heartbeat sits at the center of the entire configuration, monitoring the system in real-time and automating crossovers when necessary.

With a quality HA system in place, downtime is reduced or virtually eliminated, keeping your business online and operational all day, every day.


