High Availability vs Fault Tolerance: An Overview

Posted on June 24, 2022 by Ronald Caldwell | Updated: July 7, 2022

Category: Other | Tags: Disaster Recovery, Fault Tolerance, High Availability, Load Balancing, Scalability

Businesses are more reliant than ever on the servers and infrastructure that provide connected access to their services. These systems and applications are in constant use, and expectations for their availability are high, often expressed in terms such as five nines (99.999 percent uptime) or industry-standard uptime.
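To make a target like five nines concrete, the short Python sketch below converts an availability percentage into a yearly downtime budget. The function name allowed_downtime_seconds and the 365-day year are assumptions chosen for illustration.

```python
def allowed_downtime_seconds(availability_percent: float, period_days: float = 365.0) -> float:
    """Return the downtime budget, in seconds, for an availability target.

    Assumes a 365-day period by default (an illustrative simplification).
    """
    period_seconds = period_days * 24 * 60 * 60
    return period_seconds * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_seconds(target) / 60
    print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, five nines leaves a little over five minutes of total downtime per year, which is why the design choices discussed here matter.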

Unexpected outages and planned maintenance of critical equipment can both disrupt access for users. Whether planned or not, downtime can produce a negative response from customers and tarnish the reputation of your business.

Critical applications and infrastructure can be designed to prevent service interruptions caused by hardware failures, application errors, or unexpected events by utilizing high availability (HA) or fault-tolerant (FT) configurations. These design choices will help you as a business owner or operator reduce or eliminate connectivity issues for the systems on which your users rely. However, the best method depends on many factors and business considerations.

What is Redundancy?

Redundancy can be understood as duplication, for example two servers with duplicate or mirrored data. When comparing high availability vs redundancy, HA adds automatic failover in case of failure, whereas redundancy only refers to duplicating hardware or software so that no single component is a point of failure. When comparing fault tolerance vs redundancy, FT ensures that at least core business operations stay online, whereas redundancy is concerned only with the duplication of hardware and software.

What is High Availability?

High availability servers are designed to have maximum uptime by removing all single points of failure to keep mission-critical applications and websites online during catastrophic events such as spikes in traffic, malicious attacks, or hardware failure. Essentially, HA is implementing redundancy in infrastructure to stay online. You can have redundancy without high availability, but you cannot have high availability without redundancy.

High availability is accomplished by allowing a secondary system to take over in the event of a failure. It uses a method of safely and reliably moving services from the failed primary system to a functional secondary system (also referred to as a crossover). This method is usually software-based and uses a monitoring component to identify a failure and initiate a transfer of traffic or resources to the backup machine.
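The crossover described above can be sketched in a few lines of Python. This is a minimal illustration, not a production failover tool: the class FailoverMonitor, the node names, and the dictionary-based health probe are all hypothetical stand-ins for a real monitoring component.

```python
from typing import Callable, Dict


class FailoverMonitor:
    """Route traffic to the active node and cross over when its health check fails."""

    def __init__(self, primary: str, secondary: str, is_healthy: Callable[[str], bool]):
        self.primary = primary
        self.secondary = secondary
        self.is_healthy = is_healthy  # hypothetical probe supplied by the caller
        self.active = primary

    def check(self) -> str:
        """Run one monitoring pass and return the node that should receive traffic."""
        if not self.is_healthy(self.active):
            standby = self.secondary if self.active == self.primary else self.primary
            if self.is_healthy(standby):
                self.active = standby  # this assignment is the crossover
        return self.active


# Simulated health state; a real probe might be an HTTP or TCP check.
health: Dict[str, bool] = {"server-a": True, "server-b": True}
monitor = FailoverMonitor("server-a", "server-b", lambda node: health[node])

before = monitor.check()      # primary is healthy, so it stays active
health["server-a"] = False    # simulate a primary failure
after = monitor.check()       # the monitor moves traffic to the secondary
print(before, "->", after)
```

In practice, the monitoring pass would run on a schedule, and the crossover would also update DNS, a virtual IP, or a similar routing layer.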

Advantages of High Availability

Cost Savings

A significant advantage of high availability solutions is the cost savings over a fault-tolerant design. While any type of system designed to prevent or minimize the impact of outage-level events will come at a cost, high availability solutions are easier to design and implement. This lack of complexity increases ease of use and simplifies maintenance.

Easily Scalable

Highly available solutions are easily scalable. Since the easiest method of introducing a highly available system is utilizing a duplicate set of infrastructure, the system’s overall design is simplified, and minimal thought is needed to identify what is necessary to build out the highly available infrastructure.

Load-Balancing Solution

Highly available systems can provide a load-balanced solution during normal operation without requiring extra infrastructure to act as load balancers. In this setup, traffic is split across multiple environments and consolidates onto the healthy environment during a failure.

For example, half of your website traffic goes to server A while the other half goes to server B. Each server carries less load than a single server handling all of the traffic would, so this split produces a better response for clients.

If a failure occurs, all traffic for the failing infrastructure can divert to the second environment. It’s not the most desirable outcome, but combining the traffic will keep the environment accessible while you resolve the issue.
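The split-and-consolidate behavior can be illustrated with a simple round-robin sketch in Python. The function route_requests and the server names A and B are assumptions for illustration; real deployments distribute traffic at the DNS or network layer rather than in application code.

```python
from collections import Counter
from itertools import cycle
from typing import Dict


def route_requests(request_count: int, healthy: Dict[str, bool]) -> Counter:
    """Distribute requests round-robin across whichever servers are healthy."""
    servers = [name for name, ok in healthy.items() if ok]
    if not servers:
        raise RuntimeError("no healthy servers available")
    rotation = cycle(servers)
    return Counter(next(rotation) for _ in range(request_count))


# Normal operation: traffic is split evenly between both servers.
normal = route_requests(100, {"A": True, "B": True})

# Server A fails: all traffic consolidates onto server B.
degraded = route_requests(100, {"A": False, "B": True})

print(dict(normal), dict(degraded))
```

The sketch also shows why each server should be sized for the full load: during a failure, the surviving server handles every request.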

Disadvantages of High Availability

Service Disruption

A crossover event on high availability systems moves or diverts traffic from failing systems to healthy ones. This crossover depends on several factors and elements, such as:

Software monitoring for a failure.
Distinguishing between failures and false positives (heavy traffic or a lost packet).
The event that triggers the crossover to a healthy system (including alerts).

Although the crossover is fast, users may experience a brief outage (a few seconds) while it occurs. This short interruption, which may go unnoticed, is still preferable to a complete disruption in traffic due to an outage.
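One common way to separate real failures from false positives is to require several consecutive failed probes before triggering a crossover. The Python sketch below illustrates that idea; the class FailureDetector, the helper first_trigger, and the threshold of three are assumptions for illustration, not a description of any specific product.

```python
from typing import Iterable, Optional


class FailureDetector:
    """Trigger a crossover only after several consecutive failed probes."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # assumed value; tune for your environment
        self.consecutive_failures = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return True when a crossover should trigger."""
        if probe_ok:
            self.consecutive_failures = 0  # a success resets the count
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


def first_trigger(probes: Iterable[bool], threshold: int = 3) -> Optional[int]:
    """Return the index of the probe that triggers a crossover, or None."""
    detector = FailureDetector(threshold)
    for index, ok in enumerate(probes):
        if detector.record(ok):
            return index
    return None


blip = [True, False, True, True, False, True]   # isolated lost packets
outage = [True, False, False, False, False]     # sustained failure

print(first_trigger(blip), first_trigger(outage))
```

The trade-off is visible in the threshold: a higher value filters out more false positives but lengthens the brief outage users see before the crossover happens.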

Required Component Duplication

Every component of your system/infrastructure is duplicated to sufficiently protect against outages and failures, resulting in redundant components.

While the duplication leads to a higher cost than non-highly available systems, it is less than the potential loss in revenue and required hours to fix something during an outage.

The best analogy for perceiving this expenditure is insurance: While it is best to have it just in case, it will be costly, especially when you don’t need it.

Data Loss (Rare)

In rare circumstances, a high availability system can lose data. For example, suppose a failure forces the primary components of your system to transfer authority to the secondary system. There is then a very narrow window in which data from a user is sent to the primary system precisely as it fails, before the secondary system takes over.

The crossover disrupts service, but the more detrimental factor is data loss. In almost all of these circumstances, the system resends the end user's request. Ideally, the data ends up in the secondary system, but this is not guaranteed, so data can potentially be lost.

What is Fault Tolerance?

Fault tolerance is a form of redundancy, enabling visitors to access the system if one or more components fail.

When comparing fault tolerance vs high availability, fault tolerance enables visitors to still receive the requested site or application with limited functionality in the event of a failure of any component. In contrast, high availability is designed to keep all systems online by using failover mechanisms to automatically transfer traffic and workloads to fully functioning nodes.

Fault tolerance is commonly achieved with the help of a storage area network (SAN). A SAN is a highly scalable and fault-tolerant central network storage cluster for critical data, connected directly to the servers over extremely fast, low-latency gigabit Ethernet. As a result, users can transfer data sequentially or in parallel without affecting the host server's performance.

Unlike high availability, fault-tolerant systems experience almost no downtime in the event of a failure since there is no crossover event. They are designed so that all traffic, requests, and changes to data are duplicated onto multiple redundant systems simultaneously.
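The idea of duplicating every change onto multiple systems at once can be sketched as a mirrored in-memory store in Python. The class MirroredStore and its methods are hypothetical; real fault-tolerant systems perform this mirroring in hardware, the hypervisor, or the storage layer, and must also resynchronize a replica that comes back online, which this sketch does not attempt.

```python
from typing import Dict, List


class MirroredStore:
    """Apply every write to all online replicas so any survivor can serve reads."""

    def __init__(self, replica_count: int = 2):
        self.replicas: List[Dict[str, str]] = [{} for _ in range(replica_count)]
        self.online: List[bool] = [True] * replica_count

    def write(self, key: str, value: str) -> None:
        """Write to every online replica; there is no primary to fail over from."""
        targets = [i for i, up in enumerate(self.online) if up]
        if not targets:
            raise RuntimeError("no replica online")
        for i in targets:
            self.replicas[i][key] = value

    def read(self, key: str) -> str:
        """Read from the first online replica."""
        for i, up in enumerate(self.online):
            if up:
                return self.replicas[i][key]
        raise RuntimeError("no replica online")

    def fail(self, index: int) -> None:
        """Simulate the loss of one replica."""
        self.online[index] = False


store = MirroredStore(replica_count=2)
store.write("order-1", "paid")
store.fail(0)                      # one replica is lost
result = store.read("order-1")     # the surviving replica already has the data
print(result)
```

Because the write reached both replicas before the failure, the read succeeds without any crossover step.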

Advantages of Fault Tolerance

Zero Interruption

The primary difference between high availability and fault tolerance is that fault tolerance is designed to offer zero interruption to service for any client. The absence of interruptions increases reliability for the end user and gives the owner or operator room to carry out activities that might otherwise cause an interruption, such as:

Hardware upgrades/replacements.
Software patches.
Backups/data migrations.

Unlike a highly available system, the fault-tolerant transition is seamless for the end-user, and every request is processed. This guarantee that the system and its information are always online is a massive benefit for many organizations.

No Loss of Data

A fault-tolerant system eliminates the data loss that can occur during the HA crossover event. Fault-tolerant systems have no crossover between active and passive systems, so every request is received and written.

Disadvantages of Fault Tolerance

System Complexity

A significant disadvantage of a fault-tolerant system is the extreme complexity of handling traffic volume while duplicating information instantly. Mirroring information from a hardware and software perspective is difficult and time-consuming. In addition, many applications are not built to handle mirroring data and requests simultaneously while also serving the same request to read and write information.

For example, a request to read information is received and processed on one set of infrastructure in the fault-tolerant system while the same information is being edited on a separate set of infrastructure in the system.

This complexity creates numerous opportunities for design failures that can cripple the system, prevent it from working, or cause it to serve clients the wrong information.

Cost

Fault-tolerant systems have numerous moving pieces that all have to work together to behave as expected under normal operations and provide security and connectivity during unexpected issues. Unfortunately, meeting this expectation is not cheap: the hardware, software, and administrators’ expertise are all costly factors.

Due to cost, businesses must evaluate whether the savings of a high availability setup over a fault-tolerant one are worth a few moments of connectivity issues during a crossover.
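A rough way to frame this evaluation is to compare the expected yearly cost of crossover interruptions with the extra yearly cost of a fault-tolerant build. The Python sketch below is a deliberately simplified model: the function names and every figure in it are hypothetical, and it ignores harder-to-price factors such as data loss and reputation.

```python
def expected_annual_outage_cost(
    crossovers_per_year: float,
    seconds_per_crossover: float,
    revenue_per_second: float,
) -> float:
    """Estimate revenue lost to HA crossover interruptions in one year."""
    return crossovers_per_year * seconds_per_crossover * revenue_per_second


def fault_tolerance_pays_off(extra_ft_cost_per_year: float, outage_cost_per_year: float) -> bool:
    """Return True when avoiding the interruptions is worth the extra FT spend."""
    return outage_cost_per_year > extra_ft_cost_per_year


# Hypothetical figures for illustration only.
outage_cost = expected_annual_outage_cost(
    crossovers_per_year=4,
    seconds_per_crossover=10,
    revenue_per_second=50.0,
)
decision = fault_tolerance_pays_off(extra_ft_cost_per_year=100_000.0, outage_cost_per_year=outage_cost)
print(outage_cost, decision)
```

With these made-up numbers, the interruptions cost far less than the fault-tolerant premium, so high availability would be the more economical choice. A business where each second of interruption is much more expensive could reach the opposite conclusion.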

Final Thoughts

Whether a high availability or fault-tolerant system is suitable for you depends on factors such as budget, the need for 100 percent uptime, and data integrity. For example, some businesses, like news websites or streaming services, can afford to risk a few moments of connectivity issues during an event that would otherwise cause a complete outage. Medical or financial institutions, however, cannot.

Your users will have expectations, and weighing those expectations versus the cost of building out infrastructure capable of meeting them will ultimately determine which solution is best for you and your business. Every business needs to consider the cost, risks, and benefits of always being online and have a disaster recovery plan before problems arise.

Not sure what will work for you? Contact our sales team to discuss the best option and the solution that will fit your needs.

About the Author: Ronald Caldwell

Ron is a Technical Writer at Liquid Web working with the Marketing team. He has 9+ years of experience in Technology. He obtained an Associate of Science in Computer Science from Prairie State College in 2015. He is happily married to his high school sweetheart and lives in Michigan with her and their children.