Maintaining the health and resilience of a vSAN Stretched Cluster requires keeping the vSAN Witness Hosts up to date. In this post, we’ll cover the steps to safely and efficiently update your vSAN Witness Hosts without disrupting cluster operations. Whether you’re applying patches or upgrading to a new version, this guide will help ensure your stretched cluster remains robust and reliable.
Before diving into that topic, I’d like to summarize what vSAN Stretched Clustering is all about.
vSAN Stretched Cluster
VMware vSAN stretched clustering is a high-availability solution that extends a vSAN cluster across geographically separate sites, enabling active-active data access and seamless failover. Introduced in vSAN 6.1, stretched clusters were designed to meet the growing demand for zero-downtime operations and robust disaster recovery. The solution synchronously replicates data between two active sites with a witness node at a third site, ensuring data consistency and high availability even during a full site outage. With VMware Cloud Foundation (VCF) 5.x, vSAN stretched clusters are more tightly integrated into the platform, offering automated lifecycle management, simplified deployment through SDDC Manager, and native support for workload domain design across sites. This makes stretched clustering a powerful, enterprise-ready option for organizations needing resilient, hybrid-cloud infrastructure without relying on third-party storage solutions.
vSAN Witness
One of the main components of a vSAN stretched cluster is the vSAN Witness, a specialized ESXi host whose primary role is to serve as a quorum-voting mechanism that maintains data consistency and availability during site failures.
VMware Broadcom has documented how to update the witness host, but how does that work in practice? This post walks you through it.
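To make the quorum idea concrete, here is a minimal Python sketch. It is a simplified model and not how vSAN implements voting internally: it assumes one vote for each of the two data sites plus one for the witness, and treats an object as available when more than half of the votes are reachable. The function name `has_quorum` and the site labels are made up for illustration.

```python
# Simplified model of witness-based quorum (illustrative only, not vSAN code).
# Assumption: each fault domain (two data sites + witness) holds exactly one vote.
VOTES = {"site-a": 1, "site-b": 1, "witness": 1}


def has_quorum(reachable, votes=VOTES):
    """Return True when the reachable fault domains hold more than half of all votes."""
    total = sum(votes.values())
    available = sum(votes[name] for name in reachable if name in votes)
    return available * 2 > total


if __name__ == "__main__":
    # Full site outage: the surviving site plus the witness still form a majority.
    print(has_quorum({"site-a", "witness"}))   # True
    # Witness offline (for example while it is being updated): both sites keep quorum.
    print(has_quorum({"site-a", "site-b"}))    # True
    # A single site on its own cannot form a majority.
    print(has_quorum({"site-b"}))              # False
```

In this simplified model the witness can be offline for a short time without losing quorum, as long as both data sites stay healthy.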
VMware Broadcom’s Advice
VMware advises upgrading the vSAN Witness ESXi host after successfully upgrading all data hosts in the vSAN stretched cluster. This sequence ensures that the data hosts are stable and fully operational before the witness host is updated. It’s crucial to avoid upgrading the witness host simultaneously with any data host, to prevent potential upgrade issues. Therefore, configure vSphere Lifecycle Manager to upgrade the witness host only after all data hosts have been updated and have exited maintenance mode. Additionally, ensure that the witness host’s ESXi version matches the data hosts’ ESXi version to maintain compatibility and cluster stability. Note, however, that some documentation describes the update order the other way around. I don’t believe that order to be right.
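The ordering rule above can be expressed as a small pre-check. The following Python sketch is only an illustration under stated assumptions: the `Host` record and the `witness_ready_for_upgrade` function are hypothetical, and the host data is typed in by hand rather than read from vCenter. It reports whether every data host already runs the target build and has exited maintenance mode.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical record of the facts we care about for one ESXi host."""
    name: str
    build: int                 # ESXi build number reported by the host
    in_maintenance: bool
    is_witness: bool = False


def witness_ready_for_upgrade(hosts, target_build):
    """Return (ok, reasons); ok is True when no data host blocks the witness upgrade."""
    reasons = []
    for host in hosts:
        if host.is_witness:
            continue  # only the data hosts decide whether the witness may go next
        if host.build != target_build:
            reasons.append(f"{host.name} is on build {host.build}, expected {target_build}")
        if host.in_maintenance:
            reasons.append(f"{host.name} is still in maintenance mode")
    return (not reasons, reasons)


if __name__ == "__main__":
    target = 24585383  # build number taken from the depot file name in this post
    hosts = [
        Host("esx01.example.local", target, in_maintenance=False),
        Host("esx02.example.local", target, in_maintenance=True),
        Host("witness.example.local", 1, in_maintenance=False, is_witness=True),  # placeholder build
    ]
    ok, reasons = witness_ready_for_upgrade(hosts, target)
    print(ok)
    for reason in reasons:
        print(reason)
```

The host names and the witness build value are placeholders. In a real environment the same facts would come from vCenter.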
Steps required to update the vSAN Witness ESXi host
- Download the ESXi offline depot (ZIP); the download link is in the Release Notes (ESXi 8.0 Update 3d)
- In this specific case this results in a file like the following: VMware-ESXi-8.0U3d-24585383-depot.zip
- Import the downloaded depot ZIP file via vSphere Lifecycle Manager > Actions > Import Update
- Importing takes about a minute
- In vSphere Lifecycle Manager, create an Upgrade Baseline
- New > Baseline > select Upgrade, OR edit an existing Upgrade Baseline
- Attach the Baseline
- Navigate to the vSAN Witness Host
- Click on Updates (vSphere Lifecycle Manager)
- Attach the newly created baseline
- Click Attach
- Check Compliance
- In Aria Operations, if available, make sure the vSAN adapter is disabled for the duration of this change. Also check the Witness Host object itself, filtering on its FQDN
- Put the vSAN Witness ESXi host in Maintenance Mode and check the host for Compliance
- Remediate the vSAN Witness Host
- Accept the license agreement, confirm the Remediation Settings, click Next, and then click Remediate
- You can monitor its progress from the VM’s console as well
- Once the vSAN Witness host is back online (this normally takes about ten minutes at most), check that the VM’s ESXi build number is correct. If remediation completed successfully, exit Maintenance Mode and confirm that vSAN Skyline (Online) Health is healthy again.
You can do it too! Please don’t let Skyline Health scare you with warnings about non-compliant virtual objects or the Performance Service – Stats DB object error during the remediation process; these are completely normal.
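The build check in the last step can be scripted. The Python sketch below is a minimal illustration, assuming the depot file name always follows the pattern of the example in this post (`VMware-ESXi-<version>-<build>-depot.zip`). The helper names `parse_depot_name` and `build_matches` are my own, and the build number reported by the host is typed in by hand, for example after reading it from the host's summary page in the vSphere Client.

```python
import re

# Assumed naming pattern, derived only from the example file name in this post.
DEPOT_PATTERN = re.compile(
    r"^VMware-ESXi-(?P<version>[0-9A-Za-z.]+)-(?P<build>\d+)-depot\.zip$"
)


def parse_depot_name(filename):
    """Split a depot file name into (version, build number)."""
    match = DEPOT_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected depot file name: {filename!r}")
    return match.group("version"), int(match.group("build"))


def build_matches(depot_filename, reported_build):
    """Return True when the host's reported build equals the build in the depot name."""
    _, expected_build = parse_depot_name(depot_filename)
    return expected_build == int(reported_build)


if __name__ == "__main__":
    depot = "VMware-ESXi-8.0U3d-24585383-depot.zip"
    print(parse_depot_name(depot))         # ('8.0U3d', 24585383)
    print(build_matches(depot, 24585383))  # True
```

If the file name does not match the assumed pattern, `parse_depot_name` raises a `ValueError` instead of guessing.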