What is VMware HA and DRS?

Posted by Dayne Larsen

Downtime for any web professional leads to upset customers, management pressure, lost profits, and damage to a business’s reputation. VMware high availability (HA), combined with the distributed resource scheduler (DRS), helps to mitigate downtime while keeping all virtual machines (VMs) running optimally. This article explains what VMware HA and DRS are, how they work, and how to configure them.

What is a VMware Cluster?

A VMware cluster is a collection of servers running the VMware ESXi hypervisor. ESXi is the underlying operating system that creates and runs virtual machines efficiently on a single server; a cluster groups several of these ESXi hosts together. The controller for a VMware cluster of ESXi hosts is called vCenter. The vCenter Server appliance, which itself runs as a VM, provides an advanced management platform for managing any number of clusters.

Using the same server configuration for all ESXi hosts in a cluster is recommended to simplify management. VMware does not strictly require identical hardware across ESXi hosts, but all hosts in a cluster must use CPUs from the same manufacturer, so AMD and Intel cannot be mixed.

What is VMware HA?

The purpose of VMware HA is to keep VM instances up and running as much as possible. HA gets the VMs in a cluster powered back on after ESXi host operating system errors, connectivity issues, or hardware failures.

A benefit of high availability is that the VMs from a failed host server are restarted automatically on a remaining host within the cluster, without waiting for an administrator to step in. Therefore, the HA functionality is essential to any VMware deployment plan.

vSphere HA

When vSphere HA is enabled, a master host is elected transparently and monitors the other ESXi hosts in the cluster with a heartbeat check every second. If a host's heartbeat is not detected, a heartbeat check against the cluster’s datastores determines whether that host can still communicate with the storage device.

When both checks fail, the host is deemed dead, and its VMs are restarted on a healthy host in the cluster. This automated ability to react to failures and get VMs back up and running quickly is what is considered high availability. The master host also reports the states of the HA-protected hosts to vCenter.
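
The two-step check described above can be sketched in a few lines of Python. This is only an illustrative model, not VMware's implementation: the `HostStatus` class, the state names, and the round-robin restart plan are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class HostStatus:
    name: str
    network_heartbeat_ok: bool    # the master still receives this host's heartbeat
    datastore_heartbeat_ok: bool  # the host still reaches the heartbeat datastore


def classify_host(status):
    """Classify a host from its two heartbeat checks (state names are hypothetical)."""
    if status.network_heartbeat_ok:
        return "healthy"
    if status.datastore_heartbeat_ok:
        # Network heartbeat lost, but the storage check passes:
        # the host is still running, so it is not declared dead.
        return "alive_but_unreachable"
    return "dead"


def plan_restarts(hosts, vms_by_host):
    """Assign each VM from a dead host to a healthy host, round-robin."""
    healthy = [h.name for h in hosts if classify_host(h) == "healthy"]
    plan = {}
    if not healthy:
        return plan  # nowhere to restart anything
    next_slot = 0
    for host in hosts:
        if classify_host(host) != "dead":
            continue
        for vm in vms_by_host.get(host.name, []):
            plan[vm] = healthy[next_slot % len(healthy)]
            next_slot += 1
    return plan
```

Only VMs on hosts that fail both checks end up in the restart plan; a host that fails the network check alone is left untouched in this sketch.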

What if the ESXi Master Host Goes Offline?

Losing the elected ESXi master is a minor issue. When the master host goes down, the remaining hosts in the cluster elect a new master, which continues monitoring without disrupting HA functionality.

What if vCenter Goes Offline?

In the HA cluster, vCenter runs as a VM appliance on a host and is itself susceptible to failure. If vCenter fails, HA protection for any VMs that were already powered on remains intact. However, a VM that gets powered on after vCenter goes down will have no HA protection until vCenter is back online.

While vCenter is down, the ability to change cluster settings is lost, but some functionality remains intact: VMs can be managed to some extent directly from their ESXi hosts. During normal operation, the vCenter appliance is routinely migrated live to another host whenever DRS deems it necessary, which helps keep it available.

What is VMware DRS?

VMware DRS provides automated or recommended resource balancing for optimal VM and cluster performance. In fully automated mode, DRS is a set-it-and-forget-it automation tool that keeps VMware cluster resource usage balanced and optimized. This allows teams to concentrate on their primary business operations rather than worry about where each VM runs.

The first recommendation DRS makes once it is enabled is which host to power a VM on. First, the DRS algorithm decides which ESXi host is the best fit for the new VM. Next, the algorithm considers how the new VM's resource demand will affect the overall cluster balance. Finally, the VM gets powered on and added to the DRS monitoring pool.
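
As a rough illustration of initial placement, the sketch below picks the host that would remain least loaded after taking on the new VM. The `Host` fields and the "lowest peak utilization" rule are assumptions chosen for the example; the actual DRS algorithm is more involved and is not described in this article.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_mhz_total: int
    cpu_mhz_used: int
    mem_mb_total: int
    mem_mb_used: int


def utilization_after(host, vm_cpu_mhz, vm_mem_mb):
    """Higher of CPU and memory utilization if the VM were placed on this host."""
    cpu = (host.cpu_mhz_used + vm_cpu_mhz) / host.cpu_mhz_total
    mem = (host.mem_mb_used + vm_mem_mb) / host.mem_mb_total
    return max(cpu, mem)


def pick_host(hosts, vm_cpu_mhz, vm_mem_mb):
    """Return the host that stays least loaded after placement, or None if none fits."""
    candidates = [
        h for h in hosts
        if utilization_after(h, vm_cpu_mhz, vm_mem_mb) <= 1.0
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda h: utilization_after(h, vm_cpu_mhz, vm_mem_mb))
```

Taking the maximum of CPU and memory utilization means a host that is nearly out of memory is avoided even if its CPU is mostly idle.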

The main goal of DRS is to ensure all virtual machines have the resources they require to run efficiently while keeping the cluster balanced. Every 5 minutes, DRS checks each host's resources and each VM's resource usage. If the DRS algorithm determines that usage demands in the cluster have changed significantly, it triggers a live migration, called vMotion, of one or more VMs.

A vMotion migration is seamless and keeps all VMs active during the process. It can cause some performance impact on both the source and destination ESXi hosts, which DRS factors in before initiating any migration. VMware DRS is an indispensable tool for simplifying VM management duties while maintaining optimal performance and stability of a cluster.
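
The periodic balance check can be pictured as follows. This is a simplified sketch under stated assumptions: load is a single number per VM, imbalance is measured as the standard deviation of host utilization, and the `threshold` value is arbitrary. None of these details are taken from VMware's algorithm.

```python
from statistics import pstdev


def host_load(vms):
    """Total demand of the VMs on one host."""
    return sum(vms.values())


def recommend_migration(cluster, capacity, threshold=0.1):
    """Suggest one (vm, source, destination) move, or None if balanced.

    cluster:  {host_name: {vm_name: demand}}
    capacity: {host_name: capacity in the same units as demand}
    """
    loads = {h: host_load(vms) / capacity[h] for h, vms in cluster.items()}
    if pstdev(list(loads.values())) <= threshold:
        return None  # cluster is balanced enough, nothing to move

    source = max(loads, key=loads.get)
    dest = min(loads, key=loads.get)
    if not cluster[source]:
        return None

    def gap_after(vm):
        # Utilization gap between the two hosts if this VM were moved.
        demand = cluster[source][vm]
        src = (host_load(cluster[source]) - demand) / capacity[source]
        dst = (host_load(cluster[dest]) + demand) / capacity[dest]
        return abs(src - dst)

    vm = min(cluster[source], key=gap_after)
    if gap_after(vm) >= loads[source] - loads[dest]:
        return None  # the move would not improve the balance
    return (vm, source, dest)
```

A real scheduler would also weigh the cost of the migration itself, as noted above; this sketch only checks that the move narrows the gap between the busiest and idlest host.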

Requirements for HA

A VMware cluster must meet a few requirements before high availability can be enabled:

  • License: vCenter Server Standard.
  • Multiple ESXi Hosts: Two or more ESXi hosts are necessary to create a highly available cluster.
  • Shared Storage: A shared storage device, such as a storage area network (SAN), must be used for the vSphere cluster. All hosts must have access to the VM storage so that any of them can power the VMs back on.
  • Redundant Physical Networking: There need to be at least two physical networking interfaces between the ESXi hosts. Redundant connections help to ensure the heartbeats keep working as expected should one of the network connections go down.
  • Host Failure Tolerance: HA host failure tolerance is the number of hosts that can go down while the surviving hosts can still provide all the CPU and RAM resources needed for all VMs in the cluster.
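
Host failure tolerance comes down to simple arithmetic. The sketch below assumes identical hosts (as recommended earlier) and compares total VM demand against the pooled capacity of the surviving hosts; the function names are hypothetical, and a real cluster must also make sure each individual VM fits on a single host.

```python
import math


def hosts_needed(host_cpu_mhz, host_mem_mb, vm_demands):
    """Minimum number of identical hosts whose pooled CPU and RAM cover all VMs.

    vm_demands is a list of (cpu_mhz, mem_mb) tuples, one per VM.
    """
    total_cpu = sum(cpu for cpu, _ in vm_demands)
    total_mem = sum(mem for _, mem in vm_demands)
    by_cpu = math.ceil(total_cpu / host_cpu_mhz)
    by_mem = math.ceil(total_mem / host_mem_mb)
    return max(by_cpu, by_mem, 1)  # at least one host must stay up


def host_failures_tolerated(host_count, host_cpu_mhz, host_mem_mb, vm_demands):
    """How many hosts can fail while the rest still cover total VM demand."""
    needed = hosts_needed(host_cpu_mhz, host_mem_mb, vm_demands)
    return max(0, host_count - needed)
```

For example, ten VMs that each need 2,000 MHz and 12 GB of RAM require two hosts with 20,000 MHz and 64 GB each (memory is the limiting resource), so a four-host cluster of that size tolerates two host failures under these assumptions.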

Requirements for DRS

The requirements of VMware DRS functionality are below:

  • License: vSphere Enterprise Plus.
  • Multiple ESXi Hosts: At least two hosts are required, but VMware recommends three or more.
  • Shared Storage: The cluster must have a SAN so all the hosts in the cluster can access all VM datastores.
  • CPU: Processors must be from the same manufacturer within a cluster, so no mixing Intel with AMD. Mixing CPU manufacturers is not allowed because when a VM is migrated live from one host to another, the processor on the destination needs to be able to execute the exact instructions that were running on the source host.
  • Networking: There must be a dedicated migration network set up between hosts.
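
The technical requirements in this list lend themselves to a quick pre-flight check. The sketch below works on hand-entered inventory data rather than querying vCenter; the `HostInfo` class and its fields are hypothetical, and licensing is not checked.

```python
from dataclasses import dataclass


@dataclass
class HostInfo:
    name: str
    cpu_vendor: str              # e.g. "Intel" or "AMD"
    datastores: set              # names of datastores this host can reach
    has_migration_network: bool  # dedicated migration network configured


def drs_precheck(hosts, vm_datastores):
    """Return a list of human-readable problems; an empty list means the checks pass."""
    problems = []
    if len(hosts) < 2:
        problems.append("need at least two ESXi hosts")
    vendors = {h.cpu_vendor for h in hosts}
    if len(vendors) > 1:
        problems.append("mixed CPU vendors: " + ", ".join(sorted(vendors)))
    for h in hosts:
        missing = set(vm_datastores) - h.datastores
        if missing:
            problems.append(
                f"{h.name} cannot reach datastores: {', '.join(sorted(missing))}"
            )
        if not h.has_migration_network:
            problems.append(f"{h.name} has no dedicated migration network")
    return problems
```

Returning a list of problems instead of a single pass or fail flag makes it easier to fix everything in one pass before enabling DRS.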

How to Configure HA and DRS in VMware

How to Configure VMware HA

Enable VMware HA in the vCenter appliance by logging into your vCenter portal and following the steps below.

Note:
Many customers will not have access to vCenter, and for those who do, the available options will vary depending on the hosting provider and environment setup.

  1. Navigate to the main HA-Cluster in the Hosts and Clusters section. Once there, click the Configure tab and then the vSphere Availability menu. Next, select the Edit button to the far right of vSphere HA is Turned ON.
  2. In the popup, toggle the vSphere HA button so it turns green to enable high availability. Setting Host Failure Response to Restart VMs is a good option. Under Heartbeat Datastores, select the datastores to monitor; the right choice will vary depending on the deployment.
  3. Click OK to save and apply the settings.  

How to Configure VMware DRS

These are the steps to enable DRS.

  1. Navigate to the main HA-Cluster in the Hosts and Clusters section. Next, click the Configure tab and then the vSphere DRS menu. 
  2. Select the EDIT button to the far right of the vSphere DRS is Turned ON setting.   
  3. In the new popup, toggle the vSphere DRS slider button so it turns green.
  4. Set the Automation Level. There are three options:
    1. Fully Automated: DRS will determine which ESXi host to initially power a VM onto and move VMs as its algorithm dictates to balance the cluster. The fully automated option is the set-it-and-forget-it mode and is recommended for most configurations.
    2. Partially Automated: DRS decides automatically which host to power a VM onto, but no automated balancing occurs after that. In this mode, DRS only recommends possible balancing actions, which a user must approve before any action occurs.
    3. Manual: DRS will only recommend where to power on a VM and whether a move might help maintain balance in the cluster. The manual mode requires that a user always manually accept any suggestions.
  5. Set the Migration Threshold, which controls how aggressively DRS will move VMs to keep the cluster balanced. There are five levels from conservative to aggressive. The middle level is recommended and set by default. However, if VMs are moved around too often, adjusting to a more conservative level may be appropriate.
  6. Click OK to save and apply the settings.
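
The difference between the three automation levels can be summarized in a small sketch. The enum and function names are made up for this example and do not correspond to any VMware API; the behavior follows the descriptions in step 4.

```python
from enum import Enum


class AutomationLevel(Enum):
    FULLY_AUTOMATED = "fully automated"
    PARTIALLY_AUTOMATED = "partially automated"
    MANUAL = "manual"


def drs_behavior(level):
    """Return (automatic_initial_placement, automatic_migration) for a level."""
    if level is AutomationLevel.FULLY_AUTOMATED:
        return (True, True)
    if level is AutomationLevel.PARTIALLY_AUTOMATED:
        return (True, False)
    return (False, False)  # manual: everything is only a recommendation


def is_applied(level, kind, user_approved=False):
    """Decide whether a recommendation is carried out.

    kind is "placement" (initial power-on) or "migration" (rebalancing).
    """
    auto_placement, auto_migration = drs_behavior(level)
    automatic = auto_placement if kind == "placement" else auto_migration
    return automatic or user_approved
```

In partially automated mode, for instance, `is_applied(AutomationLevel.PARTIALLY_AUTOMATED, "migration")` stays false until a user approves the recommendation.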

Conclusion

Building an HA- and DRS-enabled VMware private cloud is essential for any business-critical VMs running on a vSphere cluster. HA functionality ensures VMs get powered back on should an ESXi host in the cluster fail. In addition, DRS keeps VMware clusters balanced for the best performance and stability of VMs.

To utilize these functions, it is imperative to properly size the ESXi hosts that make up the VMware private cloud cluster. With HA and DRS enabled on properly sized ESXi hosts, a VMware Private Cloud cluster will provide superior uptime and stability, leaving more time to run the core business and less time spent managing VMs.

Ready to discuss the possibilities of using VMware for your private cloud? Liquid Web’s sales specialists are available to talk via phone or chat 24/7/365. Contact them today to start the process.

About the Author: Dayne Larsen

Dayne was born into IT and remembers taking notes as a kid on old mainframe punch cards his mom brought home. With a degree in Technology Education from Montana State University, he enjoys being a helpful human to clients and loves to empower people to tackle challenges. He has worn many hats in his day, from a stay-at-home Dad, to a freelance web designer, LEGO company Merchandiser and even an antique store owner. In his free time you will find him fly fishing, snowboarding, working on his 66' Mustang, home improving, or enjoying his family.
