High Availability vs Fault Tolerance: An Overview

Posted by Michael McDonald
Reading Time: 5 minutes

Introduction

Businesses are more reliant than ever on the servers and infrastructure providing connected access to the services they offer. These systems and applications are in constant use, and users have high demands and expectations for their availability.

Unexpected outages and the planned maintenance of critical equipment both have the possibility of disrupting access for users. Whether intentional or not, downtime can produce a negative response from customers and tarnish the reputation of your business. 

Critical applications and infrastructure can be designed with high availability or fault-tolerant configurations to prevent service interruptions caused by hardware failures, application errors, or any number of unexpected events. These design choices will help you as a business owner or operator reduce or eliminate connectivity issues for the systems your users expect and rely on; however, which method is best for you depends on many factors and business considerations.

What is High Availability?

High availability is a design concept where your system will continue to work in a typical fashion after experiencing an unexpected failure or malfunction with minimal interruption to service. Think of it as multiple roads leading to the same house: if road A is blocked, you can take road B and still get to where you are going.

The primary focus for high availability is to build a system with no single point of failure.

A single point of failure is a single hardware component whose failure can lead to loss of data access or potential loss of data.

This is accomplished by:

  • Allowing a secondary system to take over in the event of a failure.
  • Utilizing a method of safely and reliably moving services from the failed primary system to a functional secondary system (also referred to as a crossover). This method is usually software-based and uses some type of monitoring component to identify a failure and initiate a transfer of traffic or resources to the backup machine.
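
The two points above can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the `Node` and `FailoverController` names and the `is_healthy` probe are hypothetical, and a real monitoring component would run the check on a schedule and over the network.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    name: str
    is_healthy: Callable[[], bool]  # hypothetical health probe supplied by the caller


class FailoverController:
    """Serves from the active node and crosses over when its health check fails."""

    def __init__(self, primary: Node, secondary: Node) -> None:
        self.primary = primary
        self.secondary = secondary
        self.active = primary

    def check(self) -> Node:
        """Run one monitoring pass and return the node that should serve traffic."""
        if not self.active.is_healthy():
            standby = self.secondary if self.active is self.primary else self.primary
            if standby.is_healthy():
                self.active = standby  # the crossover
        return self.active


# Simulated probes: flipping a value in `state` stands in for a real failure.
state = {"a": True, "b": True}
controller = FailoverController(
    Node("server-a", lambda: state["a"]),
    Node("server-b", lambda: state["b"]),
)
print(controller.check().name)  # server-a
state["a"] = False              # simulate a primary failure
print(controller.check().name)  # server-b
```

In this sketch the controller stays on the secondary after a crossover until the secondary itself fails; whether to fail back automatically is a design choice the sketch does not try to settle.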

Advantages of High Availability

Cost Savings

A significant advantage of high availability solutions is the cost savings over a fault-tolerant design. While any type of system designed to prevent or minimize the impact of outage-level events will come at a cost, high availability solutions are easier to design and implement. This lack of complexity increases ease of use and simplifies maintenance.

Easily Scalable

Highly available solutions are easily scalable. Since the easiest method of introducing a highly available system is utilizing a duplicate set of infrastructure, the system’s overall design is simplified, and minimal thought is needed to identify what is necessary to build out the highly available infrastructure. 

Load-Balancing Solution

Highly available systems can provide a load-balanced solution when operating normally without the need for extra infrastructure acting as load balancers. In this setup, traffic is split over multiple environments and consolidates onto the remaining environment when a failure occurs.

For example, half of your website traffic goes to server A while the other half goes to server B. Each server carries less load than a single server handling all of the traffic would, so this split produces a better response for clients.

If a failure occurs, all traffic for the failing infrastructure can divert to the second environment. It’s not the most desirable outcome, but combining the traffic will keep the environment accessible while resolving the issue.
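
The split-then-consolidate behavior described above might look like the following sketch. The `SplitRouter` class and the server names are assumptions made for illustration; the routing here is a simple round-robin over whichever servers are currently marked healthy.

```python
from collections import Counter
from itertools import count


class SplitRouter:
    """Round-robins requests across healthy servers (illustrative only)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._counter = count()

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        """Return the server that should handle the next request."""
        candidates = [s for s in self.servers if s in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers available")
        return candidates[next(self._counter) % len(candidates)]


router = SplitRouter(["server-a", "server-b"])
normal = Counter(router.route() for _ in range(10))
print(dict(normal))    # {'server-a': 5, 'server-b': 5}

router.mark_down("server-a")  # simulate a failure of server A
degraded = Counter(router.route() for _ in range(10))
print(dict(degraded))  # {'server-b': 10}
```

After the failure, server B carries the full load, which is why each environment needs enough capacity to handle all of the traffic on its own for a while.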

Disadvantages of High Availability

Service Disruption

A crossover event on high availability systems moves or diverts traffic from failing systems to healthy ones. This crossover depends on several factors and elements such as:

  • Software monitoring for a failure.
  • Distinguishing between real failures and false positives (heavy traffic or a lost packet).
  • The event that triggers the crossover to a healthy system (including alerts).
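
One common way to separate a real failure from a false positive is to require several consecutive failed health checks before triggering the crossover. The sketch below assumes that approach; the `FailureDetector` name and the threshold of three are illustrative choices, not values taken from any particular product.

```python
class FailureDetector:
    """Triggers a crossover only after `threshold` consecutive failed probes."""

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, probe_succeeded: bool) -> bool:
        """Record one probe result; return True when a crossover should trigger."""
        if probe_succeeded:
            self.consecutive_failures = 0  # a success clears earlier blips
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


detector = FailureDetector(threshold=3)

# A single lost packet between two successful probes does not trigger anything.
blip = [detector.record(ok) for ok in (True, False, True)]
print(blip)    # [False, False, False]

# Three failed probes in a row are treated as a real failure.
outage = [detector.record(ok) for ok in (False, False, False)]
print(outage)  # [False, False, True]
```

The trade-off is detection time: with probes sent at a fixed interval, a failure is confirmed only after roughly the interval multiplied by the threshold, so a higher threshold means fewer false positives but a longer crossover.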

Although the crossover is fast, users may experience a brief outage (a few seconds) while it occurs. This short interruption, which may go unnoticed, is still preferable to a complete disruption in traffic due to an outage.

Required Component Duplication

Every component of your system/infrastructure is duplicated to sufficiently protect against outages and failures, resulting in redundant components.

While the duplication leads to a higher cost than non-highly available systems, it is less than the potential loss in revenue and required hours to fix something during an outage.

The best analogy for perceiving this expenditure is insurance: While it is best to have it just in case, it will be costly, especially when you don’t need it.

Data Loss (Rare)

In rare circumstances, a high availability system can lose data. For example, suppose you experience a failure event and the primary components of your system have to transfer authority to the secondary system. During that transfer, there is an extremely narrow window in which data sent by a user reaches the primary system precisely as it fails but before the secondary system takes over.

While there is a disruption of service, the more detrimental factor is data loss. In almost all of these circumstances, the system resends the end user's request. Ideally, the data ends up in the secondary system, but it is not guaranteed, resulting in potential data loss.
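
The resend behavior described above can be sketched as a small retry helper. Everything here is hypothetical: `send_with_retry`, `failed_primary`, and `secondary` are stand-ins for whatever client and endpoints a real system uses, and a connection error is assumed to be the signal that a system is down.

```python
from typing import Callable, Sequence


def send_with_retry(
    payload: str,
    endpoints: Sequence[Callable[[str], str]],
    attempts_per_endpoint: int = 1,
) -> str:
    """Try each endpoint in order and return the first successful response."""
    last_error = None
    for send in endpoints:
        for _ in range(attempts_per_endpoint):
            try:
                return send(payload)
            except ConnectionError as exc:
                last_error = exc  # remember the failure and keep trying
    raise ConnectionError("all endpoints failed") from last_error


def failed_primary(payload: str) -> str:
    # Stand-in for a primary system that fails as the request arrives.
    raise ConnectionError("primary is down")


def secondary(payload: str) -> str:
    # Stand-in for the secondary system that has taken over.
    return f"stored on secondary: {payload}"


result = send_with_retry("order-42", [failed_primary, secondary])
print(result)  # stored on secondary: order-42
```

A retry like this is only safe when resending the same request twice does no harm. If the primary partially processed the request before failing, a blind resend can duplicate or lose data, which is the gap this section describes.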

What is Fault Tolerance?

Fault tolerance is a design concept where your system will continue working normally after experiencing an unexpected failure or malfunction with zero service interruption. An easy way to think of this is talking to a person over two different phones simultaneously. If one phone cuts out, the communication is uninterrupted since the second phone connection is still working. 

Unlike high availability, fault-tolerant systems:

  • Experience absolutely no downtime in the event of a failure since there is no crossover event. 
  • Are designed so that all traffic, requests, and changes to data are duplicated onto multiple redundant systems simultaneously. 
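
As a rough illustration of the second point, the sketch below applies every write to all redundant copies so that any surviving copy can answer a read without a crossover. The `Replica` and `MirroredStore` classes are hypothetical, the writes happen one after another rather than truly simultaneously, and the sketch ignores the hard part of bringing a returning replica back in sync.

```python
class Replica:
    """An in-memory stand-in for one redundant system."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data = {}
        self.online = True

    def write(self, key, value) -> None:
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        self.data[key] = value

    def read(self, key):
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        return self.data[key]


class MirroredStore:
    """Duplicates every write onto all replicas; reads from any online one."""

    def __init__(self, replicas) -> None:
        self.replicas = list(replicas)

    def write(self, key, value) -> int:
        """Apply the write everywhere; return how many replicas accepted it."""
        acknowledged = 0
        for replica in self.replicas:
            try:
                replica.write(key, value)
                acknowledged += 1
            except ConnectionError:
                continue
        if acknowledged == 0:
            raise RuntimeError("no replica accepted the write")
        return acknowledged

    def read(self, key):
        for replica in self.replicas:
            try:
                return replica.read(key)
            except ConnectionError:
                continue
        raise RuntimeError("no replica available for reads")


a, b = Replica("replica-a"), Replica("replica-b")
store = MirroredStore([a, b])
store.write("balance", 100)
a.online = False              # one system fails
print(store.read("balance"))  # 100
```

Because the second replica already holds the data, the read succeeds immediately after the failure with no transfer of authority.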

Advantages of Fault Tolerance

Zero Interruption

The primary difference between high availability and fault tolerance is that fault tolerance offers zero interruption to service for any client. The absence of interruptions increases reliability for the end user and gives the owner or operator a way to perform activities that might otherwise cause an interruption, such as:

  • Hardware upgrades/replacements.
  • Software patches.
  • Backups/data migrations.

Unlike a highly available system, the fault-tolerant transition is seamless for the end user, and every request coming in will always be processed. This guarantee that the system and the information it contains are always online is a massive benefit for many organizations.

No Loss of Data

A fault-tolerant system eliminates the loss of data that potentially occurs during the HA crossover event. Fault-tolerant systems do not have that crossover component between active and passive systems, so every request is received and written by all of the redundant systems.

Disadvantages of Fault Tolerance

System Complexity

A significant disadvantage of a fault-tolerant system is the extreme complexity of handling traffic volume while duplicating information instantly. Mirroring information is difficult and time-consuming from both a hardware and a software perspective. In addition, many applications are not built to mirror data and requests while simultaneously serving reads and writes of that same information.

For example, a request to read a piece of information is received and processed on one set of infrastructure in the fault-tolerant system while, at the same moment, that same information is being edited on a separate set of infrastructure.

This complexity creates numerous opportunities for design failures that can cripple and prevent the system from working (or serving clients with the wrong information). 

Cost

Fault-tolerant systems have numerous moving pieces that all have to work in conjunction with one another to behave as expected under normal operations and provide security and connectivity during unexpected issues. Unfortunately, this behavior expectation is not cheap, and the hardware, software, and administrators’ expertise are all costly factors.

Due to cost, businesses must evaluate whether the savings of a high availability setup over a fault-tolerant one are worth a few moments of connectivity issues during a crossover.

Conclusion

Whether a high availability solution or a fault-tolerant system is right for you depends on several factors, such as budget, the need for 100% uptime, and data integrity. For example, some businesses, like news websites or streaming services, can afford to risk a few moments of connectivity issues during an event that would otherwise cause a complete outage, whereas medical or financial institutions cannot.

Your users will have expectations, and weighing those expectations versus the cost of building out infrastructure capable of meeting them will ultimately determine which solution is best for you and your business. Every business needs to consider the risks and cost benefits of always being online and have a disaster recovery plan before problems arise.

Not sure what will work for you? One of our most helpful humans in hosting will be happy to assist you, so reach out to us today!

About the Author: Michael McDonald

Michael has been a member of the Most Helpful Humans in Hosting for over six years. Starting down a path of education and teaching, Michael returned to his passion of working with technology and helping people around the world embrace the Internet. In his free time, Michael can be found in the northern parts of Michigan spending time on the Great Lakes or engrossed in a good book.
