BEST PRACTICES GUIDE
Microsoft Exchange Server on VMware vSphere
Exchange Server 2019 / vSphere 7.0
Table of Contents
1. Introduction
1.1 Purpose
1.2 Target Audience
1.3 Scope
1.4 External References
2. ESXi Host Best Practices for Exchange
2.1 CPU Configuration Guidelines
2.1.1 Physical and Virtual CPUs
2.1.2 Architectural Limitations in Exchange Server
2.1.3 vSphere Virtual Symmetric Multiprocessing
2.1.4 CPU Reservations
2.1.5 Virtual Cores and Virtual Sockets
2.1.6 Hyper-threading
2.1.7 “L1 Terminal Fault – VMM” and Hyper-threading
2.1.8 Non-Uniform Memory Access
2.1.9 vNUMA and CPU Hot Plug
2.2 Memory Configuration Guidelines
2.2.1 ESXi Memory Management Concepts
2.2.2 Virtual Machine Memory Concepts
2.2.3 Memory Tax for Idle Virtual Machines
2.2.4 Allocating Memory to Exchange Virtual Machines
2.2.5 Memory Hot Add, Over-subscription, and Dynamic Memory
2.3 Storage Virtualization
2.3.1 Raw Device Mapping
In-Guest iSCSI and Network-Attached Storage
2.3.2 Virtual SCSI Adapters
2.3.3 Virtual SCSI Queue Depth
2.3.4 A Word on MetaCacheDatabase (MCDB)
2.3.5 Exchange Server 2019 on All-Flash Storage Array
2.3.6 Using VMware vSAN for Microsoft Exchange Server Workloads
2.3.6.1 Hybrid vs. All-Flash vSAN for Exchange Server
2.3.6.2 General vSAN for Exchange Server Recommendations
2.4 Networking Configuration Guidelines
2.4.1 Virtual Networking Concepts
2.4.2 Virtual Networking Best Practices
2.4.3 Sample Exchange Virtual Network Configuration
1. Introduction
Microsoft Exchange Server is the dominant enterprise-class electronic messaging and collaboration
application in the industry today. Given the multitude of technical and operational enhancements in the
latest released version of Microsoft Exchange Server (2019), customers are expected to continue using
Exchange Server, which should retain its dominant position in the enterprise.
Concurrent usage of the Exchange Server native high availability feature (Database Availability Group, or
DAG) with VMware vSphere® native high availability features has been fully and unconditionally
supported by Microsoft since Exchange Server 2010 SP1. Microsoft continues the trend by extending this
declarative statement of support for virtualization to the 2019 version of Exchange Server.
Because the vSphere hypervisor is part of the Microsoft Server Virtualization Validation Program (SVVP),
virtualizing an Exchange Server 2019 instance on vSphere is fully supported.
This document provides technical guidance for VMware customers who are considering virtualizing their
Exchange Server on the vSphere virtualization platform.
Enterprise communication and collaboration is now so integral to an organization’s operations that
applications such as Exchange Server are routinely classified as mission-critical. Organizations
expect measurable and optimal performance, scalability, reliability, and recoverability from this class of
applications. The main objective of this guide is to provide the information required to help a customer
satisfy the operational requirements of running Exchange Server 2019 on all currently shipping and
supported versions of VMware vSphere up to vSphere version 7.0.
1.1 Purpose
This guide provides best practice guidelines for deploying Exchange Server 2019 on vSphere. The
recommendations in this guide are not specific to any particular hardware, nor to the size and scope of
any particular Exchange implementation. The examples and considerations in this document provide
guidance but do not represent strict design requirements, as the flexibility of Exchange Server 2019 on
vSphere allows for a wide variety of valid configurations.
1.3 Scope
The scope of this document is limited to the following topics:
• VMware ESXi™ Host Best Practices for Exchange – Best practice guidelines for preparing the
vSphere platform for running Exchange Server 2019. Guidance is included for CPU, memory,
storage, and networking.
• Using VMware vSphere vMotion®, VMware vSphere Distributed Resource Scheduler™ (DRS),
and VMware vSphere High Availability (HA) with Exchange Server 2019 – Overview of vSphere
vMotion, vSphere HA, and DRS, and guidance for usage of these vSphere features with
Exchange Server 2019 virtual machines (VM).
• Exchange Performance on vSphere – Background information on Exchange Server performance
in a VM. This section also provides information on official VMware partner testing and guidelines
for conducting and measuring internal performance tests.
• VMware Enhancements for Deployment and Operations – A brief look at vSphere features and
add-ons that enhance deployment and management of Exchange Server 2019.
The following topics are out of scope for this document.
• Design and Sizing Guidance – Historically, sizing an Exchange environment has been a guessing
game, even with the Exchange Server Role Requirements Calculator (also known as the Exchange
Calculator) available from Microsoft. As of this writing, Microsoft has not updated the Exchange
Calculator to include sizing considerations for Exchange Server 2019. This gap makes it especially
critical for customers to be judicious in baselining their Exchange Server sizing exercise – to not only
ensure that they allocate adequate resources to the Exchange Server workloads, but to also ensure
they do not unnecessarily over-allocate such resources.
This and other guides are limited in focus to deploying Microsoft Exchange Server workloads on VMware
vSphere. Exchange deployments cover a wide subject area, and Exchange-specific design principles
should always follow Microsoft guidelines for best results.
VMware strongly recommends allocating resources to a VM based on the actual needs of the applications
hosted on the VM.
The ESXi scheduler uses a mechanism called relaxed co-scheduling to schedule processors. Strict co-
scheduling requires all vCPUs to be scheduled on physical cores simultaneously, whereas relaxed co-
scheduling monitors time skew between vCPUs to make scheduling or co-stopping decisions. A leading
vCPU might decide to co-stop itself to allow for a lagging vCPU to catch up. Consider the following points
when using multiple vCPUs:
• VMs with multiple vCPUs perform well in the latest versions of vSphere, as compared with older
versions where strict co-scheduling was used.
• Regardless of relaxed co-scheduling, the ESXi scheduler prefers to schedule vCPUs together,
when possible, to keep them in sync. Deploying VMs with multiple vCPUs that are not used
wastes resources and might result in reduced performance of other VMs.
For detailed information regarding the CPU scheduler and considerations for optimal vCPU allocation,
please see the section on ESXi CPU considerations in Performance Best Practices for VMware
vSphere 7.0.
• VMware recommends allocating multiple vCPUs to a VM only if the anticipated Exchange
workload can truly take advantage of all the vCPUs.
• Use the Microsoft-provided Exchange Server Role Requirements Calculator tool to aid in your
sizing exercise.
Since Exchange Server 2019 requires a minimum OS version of Windows Server 2019, and that operating
system does not suffer from the socket limitations of prior Windows versions, one might assume
that the previous guidance to “leave cores-per-socket at default” still applies. It does not, however,
due to the architectural changes and optimizations VMware has made to CPU scheduling
algorithms in newer versions of vSphere (since version 6.5).
VMware now recommends that, when presenting vCPUs to a VM, customers allocate the vCPUs
in accordance with the physical NUMA topology of the underlying ESXi host. Customers should
consult their hardware vendors (or the appropriate documentation) to determine the number of sockets
and cores physically present in the server hardware and use that knowledge as operating guidance for
VM CPU allocation. The recommendation to present all vCPUs to a VM as “sockets” is no longer valid in
modern vSphere/ESXi versions.
The following is a high-level representation of the new vCPU allocation for VMs in a vSphere version 6.5
infrastructure and newer.
Figure 2. New Virtual Machine CPU Allocation Recommendation
VMs, including those running Exchange Server 2019, should be configured with multiple virtual sockets
and cores which, together, equal the number of vCPUs intended. This sockets-cores combination should
reflect the topology of the sockets-cores present on the motherboard.
Where the number of vCPUs intended for a VM is not greater than the number of cores present in one
physical socket, all of the vCPUs so allocated should come from one socket. Conversely, if a VM requires
more vCPUs than are physically available in one physical socket, the desired number of vCPUs should
be evenly divided between two sockets.
NOTE: The 2-socket prescription is based on Microsoft’s restated requirements for a single Exchange
Server.
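As an illustration of this sockets-and-cores guidance, the following is a minimal PowerCLI sketch. The VM name, vCPU count, and host topology (two 12-core physical sockets) are hypothetical; the -CoresPerSocket parameter of Set-VM requires a recent PowerCLI release, and the VM must be powered off for the change.

# Minimal PowerCLI sketch (names and topology are hypothetical).
Connect-VIServer -Server vcenter.example.com

# Present 24 vCPUs as 2 virtual sockets x 12 cores, mirroring a host with
# two 12-core physical sockets. The VM must be powered off.
Set-VM -VM "EX19-MBX01" -NumCpu 24 -CoresPerSocket 12 -Confirm:$false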
While NUMA-aware workloads may benefit from vNUMA, the recommendation for Exchange Server VMs
is to allocate virtual sockets (labeled CPUs in the vSphere Client) in line with the physical topology, as
described earlier. Exchange Server 2019 is not a NUMA-aware application, and performance tests have
shown no significant performance improvement from enabling vNUMA. However, Windows Server 2019
is NUMA-aware, and Exchange Server 2019 (as an application) does not experience any performance,
reliability, or stability issues attributable to vNUMA.
2.1.6 Hyper-threading
Hyper-threading technology (recent versions are called symmetric multithreading, or SMT) allows a single
physical processor core to behave like two logical processors, so that two independent threads are able
to run simultaneously. Unlike having twice as many processor cores, which could roughly double
performance, hyper-threading provides anywhere from a slight to a significant increase in system
performance by keeping the processor pipeline busier. For example, an ESXi host enabled for
SMT on an 8-core server sees 16 logical processors.
Previous guidance provided by Microsoft regarding Exchange sizing and the use of hyper-threading led to
some confusion among those looking at virtualizing Exchange Server. Microsoft has since updated all
applicable documents to clarify that statements relating to hyper-threading and Exchange Server do not
apply to virtualization platforms. Microsoft’s guidance in this respect is expected to complement
Microsoft’s “Preferred Architecture” design option, which does not incorporate any virtualization design
choices, options or considerations. See Ask The Perf Guy: What’s The Story With Hyper-threading and
Virtualization? for the most recent guidance from Microsoft.
vSphere uses hyper-threads to provide more scheduling choices for the hypervisor. Hyper-threads
provide additional targets for worlds, a schedulable CPU context that can include a vCPU or hypervisor
management process. For workloads that are not CPU-bound, scheduling multiple vCPUs onto a physical
core’s logical cores can provide increased throughput by increasing the work in the pipeline. The CPU
scheduler prefers to schedule on a whole idle core rather than on a hyper-thread (a partial core) when
CPU time is being lost to hyper-thread contention. Consequently, VMware recommends enabling
hyper-threading on the ESXi host if the underlying hardware supports the configuration.
For the ESXi NUMA scheduler’s transparent placement and load balancing to work effectively, size the
VM to fit within a single NUMA node. This placement is not a guarantee, however, as the scheduler
migrates a VM between NUMA nodes based on demand.
The second mechanism for providing VMs with NUMA capabilities is vNUMA. When enabled for vNUMA,
a VM is presented with the NUMA architecture of the underlying hardware. This allows NUMA-aware
operating systems and applications to make intelligent decisions based on the underlying host’s
capabilities. By default, vNUMA is automatically enabled for VMs with nine or more vCPUs on vSphere.
Because Exchange Server 2019 is not NUMA-aware, enabling vNUMA for an Exchange VM does not
provide any additional performance benefit, nor does doing so incur any performance degradation.
Consider sizing Exchange Server 2019 VMs to fit within the size of the physical NUMA node for best
performance. The following figure depicts an ESXi host with four NUMA nodes, each comprising 20
physical cores and 128GB of memory. The VM allocated with 20 vCPUs and 128 GB of memory can be
scheduled by ESXi onto a single NUMA node. Likewise, a VM with 40 vCPUs and 256 GB RAM can be
scheduled on 2 NUMA nodes.
Figure 3. NUMA Architecture Sizing Scenarios
A VM allocated with 24 vCPUs and 128 GB of memory must span NUMA nodes in order to accommodate
the extra four vCPUs, which might then cause the VM to incur some memory access latency because
those four vCPUs fall outside a single NUMA node. The associated latency can be minimized or avoided
through the use of the appropriate combination of vNUMA control options in the VM’s Advanced
Configuration options. See Specifying NUMA Control in the VMware vSphere Resource Management
Guide.
While a VM allocated with 48 vCPUs and 256 GB can evenly span multiple NUMA nodes without
incurring the memory access latency issues described earlier, such a configuration is neither
recommended nor supported because the number of NUMA nodes (sockets) required to accommodate
the configuration exceeds Microsoft’s maximum 2-sockets recommendation.
For large environments, VMware strongly recommends that customers thoroughly test each configuration
scenario to determine whether additional latency associated with remote memory-addressing warrants
creating additional, smaller rather than larger VMs.
Verify that all ESXi hosts have NUMA enabled in the system BIOS. In some systems, NUMA is enabled
by disabling node interleaving.
• Swappable – VM memory that can be reclaimed by the balloon driver or by vSphere swapping.
Ballooning occurs before vSphere swapping. If this memory is in use by the VM (touched and in
use), the balloon driver causes the guest operating system to swap. Also, this value is the size of
the per-VM swap file that is created on the VMware vSphere Virtual Machine File System
(VMFS).
• If the balloon driver is unable to reclaim memory quickly enough, or is disabled or not installed,
vSphere forcibly reclaims memory from the VM using the VMkernel swapping mechanism.
• To proactively avoid these memory-reclamation mechanisms, guarantee the configured memory
size to the Exchange Server VM, or limit memory access for other, non-essential VMs in
the vSphere cluster.
• For production systems, it is possible to achieve this objective by setting a memory reservation
equal to the configured size of the Exchange Server VM, as shown in the PowerCLI sketch after this list.
Note the following:
o Setting memory reservations might limit vSphere vMotion. A VM can be migrated only if the target
ESXi host has free physical memory equal to or greater than the size of the reservation.
o Setting the memory reservation to the configured size of the VM results in a per-VM VMkernel
swap file of near zero bytes that consumes less storage and eliminates ESXi host-level swapping.
The guest operating system within the VM still requires its own page file.
o Reservations are recommended only when it is possible that memory might become
overcommitted on hosts running Exchange VMs, when SLAs dictate that memory be guaranteed,
or when there is a desire to reclaim space used by a VM swap file.
o There is a slight but appreciable performance benefit to enabling memory reservation, even if
memory over-commitment in the vSphere cluster is not expected.
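A minimal PowerCLI sketch of the reservation approach above, assuming an existing vCenter connection and a hypothetical Exchange VM configured with 128 GB of memory:

# Reserve the full configured memory for a hypothetical Exchange VM, which
# also shrinks its VMkernel swap file to near zero bytes.
Get-VM -Name "EX19-MBX01" |
    Get-VMResourceConfiguration |
    Set-VMResourceConfiguration -MemReservationGB 128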
• It is important to right-size the configured memory of a VM. This might be difficult to determine in
an Exchange environment because the Exchange JET cache is allocated based on memory
present during service start-up. Understand the expected mailbox profile and recommended
mailbox cache allocation to determine the best starting point for memory allocation.
• Do not disable the balloon driver (installed with VMware Tools™) or any other ESXi
memory-management mechanism.
Note the following:
o Transparent Page Sharing (TPS) enables an ESXi host to more efficiently utilize its available
physical memory to support more workloads. TPS is useful in scenarios where multiple VM
siblings share the same characteristics (e.g., the same OS and applications). In this configuration,
vSphere is able to avoid redundancy by sharing similar pages among the different Exchange
Server VMs. This sharing is transparent to the applications and processes inside the VM.
For security reasons, inter-VM page sharing is disabled by default on current versions of vSphere. While
a VM continues to benefit from TPS in this configuration (i.e., the VM is able to share pages internally
among its own processes and components), a greater benefit can be realized by enabling inter-VM page
sharing. See Sharing Memory Across Virtual Machines in the vSphere Resource Management Guide.
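As a sketch of how inter-VM page sharing can be re-enabled on a host: the Mem.ShareForceSalting advanced setting controls page-sharing salting, and a value of 0 restores inter-VM sharing. The host name below is hypothetical, and the security trade-off described above should be weighed first.

# Re-enable inter-VM page sharing on a host (hypothetical name) by disabling salting.
Get-AdvancedSetting -Entity (Get-VMHost "esx01.example.com") -Name "Mem.ShareForceSalting" |
    Set-AdvancedSetting -Value 0 -Confirm:$false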
Enable DRS to balance workloads in the ESXi host cluster. DRS and reservations can give critical
workloads the resources they require to operate optimally. More recommendations for using DRS with
Exchange Server 2019 are available in the Using vSphere Technologies with Exchange Server 2019
section below.
If all VMs request their resources at the same time, resource over-commitment on the ESXi host or
cluster occurs.
Transient resource over-commitment is possible within a virtual environment. Frequent or sustained
occurrence of such incidents is problematic for critical applications such as Exchange Server.
Dynamic memory is a Microsoft Hyper-V construct that has no direct equivalent on vSphere.
Even in an over-commitment scenario, the VM on vSphere is never induced to believe that its allocated
memory has been physically reduced. vSphere uses other memory-management techniques for
arbitrating contentions during a resource over-commitment condition.
Microsoft Exchange Server’s JET cache is allocated based on the amount of memory available to the
operating system at the time of service start-up. After being allocated, the JET cache is distributed among
active and passive databases. With this model of memory pre-allocation for use by Exchange databases,
adding memory to a running Exchange VM provides no additional benefit until the VM is rebooted or
the Exchange services are restarted. Consequently, memory hot-add is neither useable by nor beneficial to an
Exchange Server VM and is therefore neither recommended nor supported. In contrast, removing
memory that JET has allocated for database consumption impacts performance of the store worker and
indexing processes by increasing processing and storage I/O.
Microsoft support for the virtualization of Exchange Server 2019 states that the over-subscription and
dynamic allocation of memory for Exchange VMs is not supported. To help avoid confusion, refer to the
preceding paragraphs to understand why these requirements are not relevant to Exchange Servers
virtualized on the vSphere platform.
Over-subscription is different from over-commitment. Over-subscription is benign and does not impact
VMs. Over-commitment is the adverse extension of over-subscription and should be avoided in all cases.
However, if it’s expected that an ESXi cluster may occasionally experience resource contention as a
result of memory over-commitment, VMware recommends judiciously reserving all memory allocated to
Exchange Server VMs.
Because ESXi does not support hot-unplug (i.e., hot removal) of memory from a Windows VM, the only
way to reduce the amount of memory presented to a VM running Exchange Server 2019 is to power off
the VM and change the memory allocation. When the VM powers on again, the OS will see the new
memory size, and Exchange Server will reallocate the available memory to its worker processes. This is
not dynamic memory.
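A minimal PowerCLI sketch of that resize sequence, with a hypothetical VM name and target size:

# Reduce VM memory; this requires a power cycle because ESXi does not support
# hot removal of memory.
$vm = Get-VM -Name "EX19-MBX01"
Shutdown-VMGuest -VM $vm -Confirm:$false            # graceful guest OS shutdown
while ((Get-VM -Name $vm.Name).PowerState -ne "PoweredOff") { Start-Sleep -Seconds 10 }
Set-VM -VM $vm -MemoryGB 96 -Confirm:$false         # hypothetical new size
Start-VM -VM $vm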
Exchange Server has improved significantly in recent releases, and Exchange Server 2019 continues
those improvements, making Exchange Server less storage I/O intensive than before. This reality informs
Microsoft’s preference for commodity-class direct-attached storage (DAS) for Exchange Server. While the
case for DAS and JBOD storage for Exchange appears reasonable from an I/O perspective, the
associated operational and administrative overhead for an enterprise-level production Exchange Server
infrastructure does not justify this guidance.
To overcome the increased failure rate and shorter lifespan of commodity storage, Microsoft routinely
recommends maintaining multiple copies of Exchange data across larger sets of storage and Exchange
Servers than operationally necessary.
While VMware supports the use of converged storage solutions for virtualizing Exchange Server on
vSphere, VMware recommends that customers using such solutions thoroughly benchmark and validate
the suitability of such solutions and engage directly with the applicable vendors for configuration, sizing,
performance and availability guidance.
Even with the reduced I/O requirements of an Exchange Server instance, storage access, availability,
and latency issues can still manifest in an Exchange Server infrastructure without careful planning.
VMware recommends setting up a minimum of four paths from an ESXi host to a storage array. To
accomplish this, the host requires at least two host bus adapter (HBA) ports.
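One quick way to confirm the path count per device is a PowerCLI check such as the following sketch (host name hypothetical):

# List each disk device on a host with its configured path count.
Get-VMHost "esx01.example.com" | Get-ScsiLun -LunType disk | ForEach-Object {
    "{0}: {1} paths" -f $_.CanonicalName, ($_ | Get-ScsiLunPath).Count
}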
Because the Exchange Server clustering option does not require sharing disks among the nodes, the only
scenario requiring RDM disks for a virtualized Exchange Server on vSphere is one in which the backup
solution vendor requires such a configuration.
The decision to use VMFS or RDM for Exchange data should be based on technical requirements. The
following table summarizes the considerations when making a decision between the two.
Table 2. VMFS and Raw Disk Mapping Considerations for Exchange Server 2019
VMFS:
• Volume can contain many VM disk files, reducing management overhead
• Increases storage utilization and provides better flexibility and easier administration and management
• Supports existing and future vSphere storage virtualization features
• Fully supports VMware vCenter™ Site Recovery Manager™
• Supports the use of vSphere vMotion, vSphere HA, and DRS
• Supports VMFS volumes and virtual disks/VMDK files up to 62TB

RDM:
• Ideal if disks must be dedicated to a single VM
• May be required to leverage array-level backup and replication tools (VSS) integrated with Exchange databases
• Facilitates data migration between physical and virtual machines using the LUN swing method
• Fully supports vCenter Site Recovery Manager
• Supports vSphere vMotion, vSphere HA, and DRS
• Supports presenting volumes of up to 64TB (physical compatibility) and 62TB (virtual compatibility) to the guest operating system
While increasing the default queue depth of a virtual SCSI controller can be beneficial to an Exchange
Server VM, the configuration can also introduce unintended adverse effects on overall performance if not
done properly. VMware highly recommends that customers consult and work with the appropriate storage
vendor’s support personnel to evaluate the impact of such changes and obtain recommendations for
other adjustments that may be required to support the increase in queue depth of a virtual SCSI
controller. See Large-scale workloads with intensive I/O patterns might require queue depths significantly
greater than Paravirtual SCSI default values and Changing the queue depth for QLogic, Emulex, and
Brocade HBAs for further information.
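For the in-guest side of such a change, the following sketch applies the PVSCSI ring and queue-depth values discussed in the first KB article above. The values shown are the commonly cited maximums, a guest reboot is required afterward, and the change should be validated with the storage vendor as described.

# Run inside the Windows guest from an elevated PowerShell prompt; reboot afterward.
$key = "HKLM:\SYSTEM\CurrentControlSet\services\pvscsi\Parameters\Device"
New-Item -Path $key -Force | Out-Null
New-ItemProperty -Path $key -Name "DriverParameter" -PropertyType String `
    -Value "RequestRingPages=32,MaxQueueDepth=254" -Force | Out-Null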
All-flash storage is typically more expensive than traditional storage arrays when it comes to physical
capacity cost-per-GB. Why, then, would customers be interested in an all-flash storage option for
Exchange Server 2019?
While many are well aware of the performance benefits of all-flash storage, the latest generation of all-
flash storage also offers:
• Built-in data services, such as always-on thin provisioning, inline data deduplication, and inline
data compression, that provide compelling data-reduction ratios
• Flash-optimized data protection that replaces traditional RAID methodologies and simplifies
Exchange Server sizing and capacity planning efforts while minimizing protection overhead and
performance penalty
• Instant, space-efficient copies via Volume Shadow Copy Service (VSS) integration that significantly
increase efficiency and operational agility for DAGs and can be used for local data protection
and potentially replace lagged copies
Exchange Server administrators are constantly facing a number of challenges when upgrading:
• The architecture for Exchange Server is rigid by nature; the design is intended to be built once
and maintained with minor changes until decommission. Exchange Server admins are tasked
with predicting projected mailbox growth and usage patterns for the upcoming four to five-year
period. Storage is over-provisioned to avoid the potentially costlier mistake of under-sizing. The
nature of SSD allows modern all-flash vendors to support 100% storage-on-demand with always-
on thin provisioning and no performance impact. The initial acquisition cost of storage is
significantly driven down.
• Exchange Server moved away from single-instance storage long ago. Exchange Server
DAGs consume between 2x and 6x the capacity required to store a production copy of
databases. Most companies report that capacity requirements multiplied by a factor of 6x
after migrating from Exchange Server 2003/2007 to version 2010/2013. Exchange Server data
that used to consume 12TB of space in an Exchange Server 2003 single-copy cluster now
consumes 72TB in a three-copy DAG. With the right all-flash storage, initial database copies can
be reduced in capacity. Passive DAG copies can be created in seconds via VSS-integrated copy
technology and consume no additional space.
• Distribution groups (DLs) are a de facto method for transmitting messages within organizations.
Because nearly 70% of all email messages contain attachments, and every
attachment is stored repeatedly in every DL member's inbox, mailboxes and mailbox databases
are larger than ever before. Massive opportunities exist to increase storage efficiency with all-
flash storage solutions that offer inline data deduplication and compression.
• Many organizations are forced to run their Exchange clients in online mode for several reasons,
including virtual desktop infrastructure (VDI), security and governance (cannot use OSTs),
workflow applications (cannot tolerate cached versions of mailbox items), and HIPAA regulations.
Online mode increases I/O requirements by 270% compared to cached mode. Performance
can still be a critical consideration for Exchange Server deployments in many cases.
During a study of Exchange Server on EMC XtremIO, EMC found significant efficiencies, resulting in a
reduction of the total disk capacity required to manage an Exchange Server environment. These efficiencies led to
significant cost-reduction opportunities over a five-year period. Further to the point, the total cost of
ownership (TCO) of XtremIO mirrored that of alternative solutions, including VNX and the Microsoft preferred
architecture, while offering tangibly improved performance and simpler storage management.
The TCO study above is based on a straw man configuration of 10,000 seats and 2GB average mailbox
size, with 150 messages sent and received per user, per day. For the purposes of the study, TCO
includes all aspects of installing, managing, cooling, supporting, and paying for facilities costs typically
found in most TCO models. Figure 9 above shows three Exchange 2010 implementations based on three
different storage devices. All other aspects of the Exchange implementation are held constant (i.e.,
number of servers, Ethernet ports, admins, mailboxes, databases and database copies), with the only
variations occurring relative to storage and its associated costs (i.e., maintenance, installation, facilities
costs, power and cooling).
Reviewing the results, all three storage configurations incurred average mailbox costs within 25 cents of
each other, with hardware costs potentially bringing these further in line. Prices used in this TCO study
are typical of those found in the open marketplace and do not include special discounts or
one-time offers.
The data-reduction ratio for EMC XtremIO, resulting from the combination of thin provisioning,
deduplication, compression, and space-efficient copies with XtremIO Virtual Copies, was 7:1. The data
reduction ratio increases as more mailboxes and DAG copies are placed onto the array, making XtremIO
even more attractive for larger deployments.
Figure 10 - Data Reduction Ratio on XtremIO
An all-flash array is typically more expensive than a traditional storage array or DAS. As customers move past
the initial hardware acquisition cost and begin to consider the efficiencies and operational agility of an all-
flash array, they’re more likely to realize its value as a compelling storage platform for Exchange
deployments, along with its efficient TCO.
vSAN can be configured as hybrid or all-flash storage. In a hybrid disk architecture, vSAN leverages
flash-based devices (SAS/SATA SSD or NVMe SSD) for the cache tier and magnetic disks for the capacity
tier. In an all-flash vSAN architecture, flash-based devices are used for both the cache and capacity tiers.
vSAN is distributed, object-based storage that leverages the Storage Policy Based Management (SPBM)
feature of vSphere to deliver centrally managed infrastructure, application-centric storage services, and
storage capabilities at a granular level. Administrators can specify storage attributes, such as capacity,
performance, and availability, as a policy on a per-object basis (such as an individual VMDK) or apply
them to all objects within a virtual machine.
Hosting virtualized Microsoft Exchange Server workloads on VMware vSAN storage requires the same
careful consideration of the underlying storage subsystem as a physical storage array would.
Before making design decisions, ensure that the type of workload to be
hosted is analyzed and that the performance, availability, and capacity requirements are collected. It is
particularly important to input the correct storage information when using the Exchange
Calculator for your sizing exercise.
The overall disk, partition, and volume sizing decisions will largely be driven by the prescriptions of the
Calculator, the number of mailbox servers and databases, and the volume and characteristics of the
emails sent, received, and stored. To supplement those inputs, VMware provides the following
recommendations to help customers select the vSAN configuration that best ensures optimal
performance, resilience, and support for the database and log read/write/search operations expected in
the environment.
• A minimum of 10Gb networking is required, for both host networking and the physical switch. The NIC
must be listed on the HCL. The switch must be a non-blocking switch with a high buffer count.
• A minimum of two disk groups per host is recommended.
• vSAN Services to Enable for a virtualized Microsoft Exchange Server infrastructure:
o vSAN Encryption – if required.
• Configure and set SPBM for Exchange Server data (see the PowerCLI sketch after this list):
o Failures to tolerate: Use the vSAN Default Storage Policy’s Failures to Tolerate value of 1 for
transaction log and database VMDKs.
o Number of disk stripes per object: The default stripe width (1) is sufficient and
optimal for most Exchange Server infrastructures.
o To increase performance, consider spreading the data across multiple
VMDKs attached to multiple PVSCSI controllers. Check the recommended virtual disk
design in the previous section of this document.
o Use separate VMDKs for logs and databases.
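A minimal PowerCLI sketch of these policy settings, assuming the SPBM cmdlets available in recent PowerCLI releases; the policy name is hypothetical:

# Create a vSAN storage policy with FTT=1 and a stripe width of 1.
$ftt    = New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.hostFailuresToTolerate") -Value 1
$stripe = New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.stripeWidth") -Value 1
New-SpbmStoragePolicy -Name "Exchange-vSAN-Policy" `
    -AnyOfRuleSets (New-SpbmRuleSet -AllOfRules @($ftt, $stripe))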
For exceptional cases where even more performance and availability are highly important, consider the
following:
• Use an All-flash vSAN deployment.
• Consider using at least SAS SSD devices. A SAS SSD device has a larger queue depth and will
perform better than a SATA SSD device in most cases.
• Use RAID 1 mirroring and at least one failure to tolerate (FTT) for both database and log VMDKs.
• While additional availability requirements might tempt Exchange Server administrators and
architects to increase the FTT, VMware recommends that customers consider increasing
the number of DAG copies on the Exchange Server side instead.
o NOTE: More DAG copies require more storage.
• If multi-site availability is required, the vSAN Stretched Cluster configuration may be used to
increase data availability across data centers.
• Consider using high performance networking devices for the vSAN backend network. Use at least
10 Gbit switches with enough buffers to sustain high throughput.
NOTE: The examples do not reflect design requirements and do not cover all possible Exchange network
design scenarios.
The virtual networking layer includes virtual network adapters and the virtual switches. Virtual switches
are the key networking components in vSphere. The following figure provides an overview of virtual
networking in vSphere.
Figure 12. vSphere Virtual Networking Overview
As shown in the preceding figure, the following components make up the virtual network:
• Physical switch – vSphere host-facing edge of the physical local area network
• NIC team – group of physical NICs connected to the same physical/logical networks to provide
redundancy
• Physical network interface (pnic/vmnic/uplink) – provides connectivity between the ESXi host and
the local area network
• vSphere switch (standard and distributed) – the virtual switch is created in software and provides
connectivity between VMs. Virtual switches must uplink to a physical NIC (also known as vmnic)
to provide VMs with connectivity to the LAN, otherwise VM traffic is contained within the virtual
switch.
• Port group – used to create a logical boundary within a virtual switch. This boundary can provide
VLAN segmentation when 802.1q trunking is passed from the physical switch, or it can create a
boundary for policy settings.
• Virtual NIC (vNIC) – provides connectivity between the VM and the virtual switch
• VMkernel (vmknic) – interface for hypervisor functions, such as connectivity for NFS, iSCSI,
vSphere vMotion, and VMware vSphere Fault Tolerance logging
• Virtual port – provides connectivity between a vmknic and a virtual switch
It’s not required to separate client access and DAG replication traffic onto different network adapters.
However, this configuration is still a common practice among many customers. Although VMware
encourages customers to validate this configuration with their Microsoft support representatives, the
diagram depicts such a configuration for completeness.
In the vSphere environment, traffic separation can be established using virtual or physical networks. The
figure above provides examples of the following two scenarios:
• The scenario on the left depicts an ESXi host with two network interfaces, teamed for redundancy
and using virtual networks and port groups to provide traffic separation for client access and DAG
replication traffic. This scenario can also utilize VMware vSphere Network I/O Control for dynamic
traffic prioritization.
• The scenario on the right depicts an ESXi host with multiple network interfaces. Physical traffic
separation is accomplished by allocating two vmnics on one network to a virtual switch. These
vmnics are teamed and dedicated to client access network traffic. DAG replication traffic uses a
third vmnic on a separate virtual switch.
In both scenarios, the DAG member VM is connected to both networks, according to best practice.
These resources provide further guidance on latency-sensitive workload management for an Exchange
Server VM on the vSphere platform. See Best Practices for Performance Tuning of Latency-Sensitive
Workloads in vSphere VMs and the Running Network Latency Sensitive Workloads section of
Performance Best Practices for VMware vSphere 7.0.
NOTE: Keep in mind that many of the prescriptions provided in these documents do not apply to
Exchange Server workloads and might induce suboptimal performance if applied to Exchange Server
VMs. VMware recommends that customers ensure they thoroughly test implementations of these
recommendations in non-production environments.
Server hardware and operating systems are engineered to minimize power consumption. Both the
Windows operating system and vSphere ESXi hypervisor favor minimized power consumption over
performance. Modern vSphere versions (including vSphere 7.0) default to a balanced power scheme. For
critical applications such as Exchange Server, the default power scheme in vSphere 7.0 is not
recommended.
Figure 14. Default ESXi 6.x Power-Management Setting
There are three distinct areas of power management in a vSphere hypervisor virtual environment: server
hardware, hypervisor and guest OS. The following section provides power management and power
setting recommendations for each of these areas.
High Performance – The VMkernel detects certain power-management features but will not use them
unless the BIOS requests them for power-capping or thermal events. This is the recommended power
policy for an Exchange Server running on ESXi.

Not supported – The host does not support any power-management features, or power management is
not enabled in the BIOS.
VMware recommends setting the High Performance power policy for ESXi hosts in an infrastructure
hosting Microsoft Exchange Server VMs. It’s possible to select a policy for a host using the vSphere Web
Client; if a policy is not selected, ESXi uses Balanced by default.
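A minimal PowerCLI sketch of applying the policy programmatically, using the host PowerSystem API (host name hypothetical; key 1 corresponds to High Performance):

# Set the High Performance power policy on a host.
# Policy keys: 1 = High Performance, 2 = Balanced, 3 = Low Power, 4 = Custom.
$vmhost = Get-VMHost "esx01.example.com"
(Get-View $vmhost.ExtensionData.ConfigManager.PowerSystem).ConfigurePowerPolicy(1)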
When a CPU runs at lower frequency, it can also run at lower voltage, which saves power. This type of
power management is called dynamic voltage and frequency scaling (DVFS). ESXi attempts to adjust
CPU frequencies so that VM performance is not affected.
When a CPU is idle, ESXi can take advantage of deep halt states (known as C-states). The deeper the C-
state, the less power the CPU uses, but the longer it takes for the CPU to resume running. When a CPU
becomes idle, ESXi applies an algorithm to predict how long it will be in an idle state and chooses an
appropriate C-state to enter. In power-management policies that do not use deep C-states, ESXi uses
only the shallowest halt state (C1) for idle CPUs.
Microsoft recommends the high-performance power-management policy for applications requiring stability
and performance. VMware supports this recommendation and encourages customers to incorporate it
into their server tuning and administration practices for virtualized Exchange Server VMs.
Figure 17. Recommended Windows Guest Power Scheme
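As a sketch, the recommended scheme can be applied inside the Windows guest with powercfg; SCHEME_MIN is the built-in alias for the High performance plan.

# Run inside the Windows guest from an elevated prompt.
powercfg /setactive SCHEME_MIN     # activate the High performance plan
powercfg /getactivescheme          # verify the active plan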
3.1.1 vSphere HA
With vSphere HA, Exchange Server VMs on a failed ESXi host can be restarted on another ESXi host.
This feature provides a cost-effective failover alternative to third-party clustering and replication solutions.
When using vSphere HA, users should be aware of the following:
• vSphere HA handles ESXi host hardware failure and does not monitor the status of the Exchange
services. These must be monitored separately.
• A vSphere HA heartbeat is sent using the vSphere VMkernel network, so optimal uplink
bandwidth and redundancy in this network are strongly recommended.
• Allowing two nodes from the same DAG to run on the same ESXi host for an extended period is
not recommended when using symmetrical mailbox database distribution. This condition will
create a single-point-of-failure scenario if the two nodes have the only copies of one or more
mailbox databases. DRS anti-affinity or guest-to-host affinity rules should be used to mitigate the
risk of running active and passive mailbox databases on the same ESXi host.
One of the most common challenges when performing a vMotion operation on a clustered Exchange
Mailbox Server (in a DAG) is the potential to trigger an unintended database failover during the vMotion
operation. This can occur under the following conditions:
• Resource constraints within the vSphere cluster, which makes it difficult for the vMotion operation
to complete on time, as vSphere tries to find enough compute resources to accommodate the
migrated VM on the target host
• Network congestion or constraints – a vMotion operation copies the state of a VM over the
network. If there is inadequate network throughput, the copy operation takes longer to
complete. The operation could also be abandoned midway if it is determined that the operation
cannot be completed within a reasonable time.
Even under ideal conditions, the heavy load of Exchange workloads and memory usage can cause a
vSphere vMotion operation to trigger a database failover. Database failovers are not necessarily a
problem if the environment is designed to properly distribute the load and can help to validate the cluster
health by activating databases that might normally go for weeks or months without accepting a user load.
However, many administrators prefer that database activations be a planned activity, or only done in the
case of a failure. For this reason, VMware has studied the effect of vSphere vMotion on Exchange DAG
members and provided the following best practice recommendations:
This is a known behavior. See Tuning Failover Cluster Network Thresholds for a detailed discussion that
includes avoiding an unintended cluster failover incident (and its associated disruptive effects) when
performing a vSphere vMotion operation on a DAG node. There are several configuration options
described in the following sections that can be used to overcome these disruptive effects.
It is, therefore, no longer necessary for customers to adjust these settings as VMware finds the default
values in Windows Server 2019 to be adequate and sufficient.
Jumbo frames reduce processing overhead and improve performance by reducing the number of
frames that must be generated and transmitted by the system. VMware tested vSphere vMotion
migration of DAG nodes with and without jumbo frames enabled.
Results showed that, with jumbo frames enabled for all VMkernel ports and on the VMware vNetwork
Distributed Switch, vSphere vMotion migrations of DAG member VMs were completed successfully.
During these migrations, no database failovers occurred, and there was no need to modify the cluster
heartbeat setting.
The use of jumbo frames requires that all network hops between the vSphere hosts support the larger
frame size. This includes the systems and all network equipment in between. Switches that do not
support (or are not configured to accept) large frames will drop them. Routers and Layer 3 switches might
fragment the large frames into smaller frames that must then be reassembled, which can cause both
performance degradation and a pronounced incidence of unintended database failovers during a vSphere
vMotion operation. Do not enable jumbo frames within a vSphere infrastructure unless the underlying
physical network devices are configured to support this setting.
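A minimal PowerCLI sketch of enabling jumbo frames on a vSphere Distributed Switch and a vMotion VMkernel adapter (switch and host names are hypothetical; the physical network must be configured first, as noted above):

# Set a 9000-byte MTU on the distributed switch, then on the vMotion VMkernel adapter.
Get-VDSwitch -Name "DSwitch01" | Set-VDSwitch -Mtu 9000
Get-VMHostNetworkAdapter -VMHost "esx01.example.com" -VMKernel |
    Where-Object { $_.VMotionEnabled } |
    Set-VMHostNetworkAdapter -Mtu 9000 -Confirm:$false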
A quick way to verify this is by running the following command from one ESXi host to another and
examining the returned output for errors or reports of fragmentation:
vmkping -s 8972 -d Target-Host-IP-address
• The source and destination hosts must have compatible CPU models, otherwise migration with
vSphere vMotion fails. If the vSphere cluster contains hosts with different CPU generations, then
Enhanced vSphere vMotion Compatibility (EVC) must be enabled in the cluster to allow vSphere
vMotion operations to succeed. See EVC and CPU Compatibility FAQ for more information on
EVC.
• VMs with smaller memory sizes are better candidates for migration than larger ones
• Persistent resource over-commitment in a vSphere cluster can impede the efficiency of vSphere
vMotion operations
Anti-affinity rules enforce VM separation during power-on operations and vSphere vMotion migrations
initiated by a DRS recommendation, including a host entering maintenance mode. Prior to vSphere 5.5, if a VM was
enabled for vSphere HA and a host experienced a failure, vSphere HA might power on a VM in violation of a
DRS anti-affinity rule, because vSphere HA does not inspect DRS rules during a recovery task. However, during
the next DRS evaluation (every 5 minutes), the VM is migrated to fix the violation.
To avoid this condition when utilizing DRS with vSphere 5.5, VMware encourages customers to apply the
following vSphere HA Advanced Configuration option to their vSphere Clusters:
das.respectVmVmAntiAffinityRules = TRUE
This setting instructs vSphere HA to inspect and respect VM-VM anti-affinity rules when restarting VMs
after a host failure. Exchange Server VMs separated by an anti-affinity rule will not be co-located on the
same ESXi host.
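A minimal PowerCLI sketch of both pieces, with hypothetical cluster and VM names; the advanced option is only needed on the older vSphere versions described above:

# VM-VM anti-affinity rule keeping two hypothetical DAG members apart.
$cluster = Get-Cluster "Exchange-Cluster"
New-DrsRule -Cluster $cluster -Name "DAG-Separation" -KeepTogether $false `
    -VM (Get-VM "EX19-MBX01", "EX19-MBX02")

# HA advanced option instructing vSphere HA to respect the rule after a host failure.
New-AdvancedSetting -Entity $cluster -Type ClusterHA `
    -Name "das.respectVmVmAntiAffinityRules" -Value "TRUE" -Confirm:$false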
As shown in the following figure, vSphere 7.0 includes an improved, GUI-based configuration option to
control the way in which vSphere HA responds to all DRS rules in a cluster. It is no longer required to
configure the Advanced Configuration parameter manually.
“Should run on” rules provide soft enforcement of VM placement. If a rule stipulates that a group of VMs
should run on a group of ESXi hosts, those VMs will always be preferentially placed on hosts in the host
group. They can still run on other hosts in the vSphere cluster outside of the host group if needed (e.g., if
all the hosts in the host group are unavailable or otherwise unsuitable for the VM).
In the following figure, two VM groups and two host groups are defined. Two should run on rules, shown
in the broken green ovals, keep the VMs in each group running on their respective host group. The VM in
the middle is not tied to a group or a rule and might roam. In the case of a failure of all hosts in the group,
VMs bound to those hosts by a should run on rule can be brought back online by vSphere HA.
In an Exchange Server 2019 environment, VM-to-host rules can be used to provide soft or hard
enforcement of VM placement. As an example, consider creating groups of ESXi hosts based on a failure
domain, such as a blade chassis or server rack. Create two VM groups with each containing half of the
Exchange Server VMs and create rules to link each VM group to a host group. In the case of a complete
chassis or rack failure, any VMs that have failed can be powered back on by vSphere HA.
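A minimal PowerCLI sketch of the failure-domain grouping described above; cluster, host, VM, and group names are all hypothetical:

$cluster = Get-Cluster "Exchange-Cluster"

# Host group for one failure domain (e.g., one chassis) and a matching VM group.
New-DrsClusterGroup -Cluster $cluster -Name "Chassis-A-Hosts" `
    -VMHost (Get-VMHost "esx01.example.com", "esx02.example.com")
New-DrsClusterGroup -Cluster $cluster -Name "DAG-Group-1" `
    -VM (Get-VM "EX19-MBX01", "EX19-MBX02")

# Soft "should run on" rule tying the VM group to the host group.
New-DrsVMHostRule -Cluster $cluster -Name "DAG1-on-ChassisA" `
    -VMGroup "DAG-Group-1" -VMHostGroup "Chassis-A-Hosts" -Type "ShouldRunOn"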
When enabling a vSphere cluster for HA with the intention of protecting DAG members, consider the
following:
• Members of the same DAG should not reside on the same vSphere host for an extended period
of time when databases are symmetrically distributed between members. Allowing two members
to run on the same host for a short period of time (e.g., after a vSphere HA event), even if doing
so violates resource availability constraints and DRS rules, allows the Exchange Server VM to
become operational and database replication and protection to resume more quickly. DAG
members should be separated as soon as operationally feasible (e.g., as soon as the ESXi host
becomes available or additional capacity has been added to the vSphere cluster).
• To adequately protect from an extended server outage, vSphere clusters should be designed in
an N+1 configuration, where N is the number of DAG members. If a hardware failure occurs
causing vSphere HA to power on a failed DAG member, Exchange servers and DAG maintain the
same levels of performance and protection as during normal runtime.
• Use anti-affinity rules to keep DAG members separated. vSphere HA might violate this rule during
a power-on operation (one caused by a host failure), but DRS fixes the violation during the next
interval. To eliminate the possibility of DAG members running on the same host (even for a short
period), “must not run on” VM-to-host anti-affinity rules must be used.
Microsoft Jetstress simulates the Exchange Server I/O load against the disk subsystem supporting the
Exchange Servers at the database level. Although the Jetstress tool has not been specifically updated for
Exchange Server 2019, the current version (which also supports both 2013 and 2016) is suitable for
baselining Exchange Server 2019 storage requirements and configurations.
The Microsoft Exchange Server Load Generator (LoadGen) simulates client access to an Exchange
infrastructure for the purpose of measuring and analyzing Exchange Server performance under heavy
client activities. At the time of this writing, Microsoft does not plan to update LoadGen specifically for
Exchange Server 2019.
NOTE: The reduction in storage I/O in Exchange Server 2019 may lead to an oversized proposed
configuration when using Exchange Server 2013 or 2016 sizing tools.
It is important to address a concern with the collection of performance metrics from within VMs. Early in
the virtualization of high-performance applications, the validity of in-guest performance metrics came into
question because of the time skew that is possible in highly overcommitted environments. With
advancements in hypervisor technology and server hardware, this issue has mostly been addressed,
especially when testing is performed on under-committed hardware. This is validated by Microsoft’s support
for running Jetstress within VMs. More information on VM support for Jetstress is available in the
Jetstress 2013 Field Guide.
Many of the counters available can be used to help confirm allocations have been set properly when
vCenter Server access is not available or for configuration monitoring. The following table lists counters
that can be actively monitored.
Table 4. Virtual Machine Perfmon Counters of Interest
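As a sketch of sampling a few of these in-guest counters (the VM Processor and VM Memory perfmon objects are installed by VMware Tools; exact counter names can vary by Tools version):

# Run inside the Windows guest with VMware Tools installed.
Get-Counter -Counter @(
    '\VM Processor(_Total)\% Processor Time',
    '\VM Memory\Memory Ballooned in MB',
    '\VM Memory\Memory Swapped in MB'
) -SampleInterval 5 -MaxSamples 3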
vSphere and Exchange administrators can also use the counters listed in the following table to monitor
performance at the ESXi host level. Those metrics can then be correlated with metrics from Exchange
VMs. See the section on performance-monitoring utilities in vSphere Monitoring and Performance for a
comprehensive list of performance counters and metrics in vSphere and for information on using
vSphere-native tools to monitor an ESXi host and VM performance.
The preceding table indicates a few key counters that should be added to the list of inspection points for
Exchange administrators. Of the CPU counters, the total used time indicates system load. Ready time
indicates overloaded CPU resources. A significant swap rate in the memory counters is a clear indication
of a shortage of memory, and high device latencies in the storage section point to an overloaded or
misconfigured array. Network traffic is not frequently the cause of Exchange performance problems,
except when large amounts of iSCSI storage traffic share a single network line. Check total
throughput on the NICs to see whether the network is saturated.
• Virtual IP (VIP) Address – an IP address and service port number used by the user to access the
service
• Server Pool – the pool of back-end servers that need to be load balanced. A VIP address is
associated with the server pool.
• Service Monitor – defines the health-check parameters for a particular type of network traffic. A
service monitor is associated with the server pool to monitor the pool members.
• Application Profile – defines the behavior of a particular type of network traffic (e.g., the session
persistence parameter and SSL parameters)
NSX Edge supports both Layer 7 load balancing (the recommended option when there are no session
affinity requirements in Exchange Server 2019) and Layer 4 load balancing of HTTP and HTTPS protocols. It
supports multiple load-balancing methods, such as round-robin and least-connection. Layer 7
HTTP/HTTPS VIP addresses are processed after the NSX Edge firewall, whereas the faster Layer 4
load-balancer engine processes Layer 4 VIP addresses before the NSX Edge firewall.
The NSX Edge services gateway supports the following deployment models for load-balancer
functionality:
• One-arm load-balancer
• Inline load-balancer
• Power on registered VMs at the recovery site, in the exact order specified in the recovery plan
• Using information contained in the recovery plan, reconfigure VM IP addresses, if required
• If configured, pause recovery steps for external administrator tasks
• Continue with recovery steps upon completion of the administrator’s actions
• Verify that VMware Tools starts successfully on recovered VMs
• Execute any in-guest (or SRM server-hosted) scripts and commands specified in the recovery
plan
• Notify administrators about completion
• Power off recovered VMs (test failover)
• Unregister VMs (test failover)
• Remove storage snapshots from the recovery site (test failover)
• Provide the option to configure protection for recovered VMs as soon as the failed site becomes
operational, or to another surviving site
Exchange Server’s native high-availability feature (DAGs) can provide high availability by implementing
database-level intra- and inter-site database replication for some (or all) Exchange Server databases.
Although DAG is an excellent choice for data center high availability, the application-centric nature of a
DAG might not be in line with a company’s DR plans. In addition, configuring DAG for the purposes of
timely and optimal recovery of Exchange services in the event of a catastrophic data center failure is
complex, costly, and less reliable than leveraging the features and capabilities of the VMware Site
Recovery Manager.
Site Recovery Manager is not a replacement for application-aware clustering solutions (such as DAG)
that may be deployed within the guest operating system. Site Recovery Manager provides integration of
the replication solution, vSphere, and optionally customer-developed scripts to provide a simple,
repeatable, and reportable process for DR of the entire virtual environment, regardless of the application.
Site Recovery Manager complements and enhances Exchange DAG capabilities by streamlining,
automating, and optimizing recovery operations in the event of a site-level disaster.
The Exchange 2019 Preferred Architecture prescribes a minimum of four database copies and three
geographically dispersed data centers to achieve a semblance of site resilience and DR with a DAG.
Even when these requirements are met, the following impediments still make Site Recovery Manager a
superior DR solution to DAG:
• No testing capability. Numerous changes occur over the life of a given IT infrastructure. Some
of these changes invalidate previous configurations, scripts, and processes, requiring iterative
updates and testing. A DR plan requires reliability and assurance, because an actual disaster is a
poor time to discover that a previously configured recovery plan has been invalidated by evolving
infrastructure changes. Site Recovery Manager enables continuous, periodic testing and
reconfiguration of recovery plans without inducing any interruption or service outage in the
Exchange Server infrastructure. Simulating recovery from a site disaster with Exchange DAG, by
contrast, requires a service interruption for the duration of the exercise. Post-simulation, returning
the Exchange infrastructure to its prior state (a one-click operation with Site Recovery Manager)
is also a complex undertaking, requiring multiple steps and a lengthy database reseeding
operation.
• Cost efficiency. Site Recovery Manager is more cost-efficient, both in administrative effort and
in financial cost. Each of the four Exchange servers required by the preferred architecture design
needs its own server hardware, OS, Exchange, antivirus, and other application licenses, in
addition to the storage required to support the configuration. Even when a DAG is configured as
prescribed, the associated administrative, management, and maintenance efforts required to
support the design can quickly become overwhelming and prohibitive. With Site Recovery
Manager, it is possible to achieve a better DR solution with just a two-member DAG
configuration, providing a better and less costly alternative to the preferred architecture design.
• Unified DR Solution. Exchange DAG is a high-availability solution for Exchange servers, and
only Exchange servers. An Exchange Server instance has multiple dependencies (e.g., Active
Directory, backup, and other messaging hygiene and security components) that must be
available before Exchange services can be successfully recovered after a site disaster.
Recovering the Exchange servers alone does not add much value unless the dependencies are
recovered as well. Site Recovery Manager is application-agnostic and suitable for protecting and
recovering any server, including the dependencies that a DAG cannot protect. Unifying the DR
solution improves efficiency and reduces costs: administrators do not have to manage multiple
DR solutions, removing confusion, complexity, and stress from an otherwise challenging DR
event.
As stated by Microsoft:
“The specific prescriptive nature of the PA means of course that not every customer will be able
to deploy it (for example, customers without multiple data centers). And some of our customers
have different business requirements or other needs which necessitate a different architecture. If
you fall into those categories, and you want to deploy Exchange on-premises, there are still
advantages to adhering as closely as possible to the PA, and deviate only where your
requirements widely differ”.
DR for a production, enterprise-level Exchange Server infrastructure is a critical design consideration that
requires deviation from Microsoft’s prescriptive guidance.
The following is a high-level overview of Site Recovery Manager as a DR solution for an Exchange Server
infrastructure. See the Microsoft Exchange 2013 on VMware Availability and Recovery Options Guide for
a detailed discussion of this topic.
Using Site Recovery Manager to protect Exchange Server components (including DAGs) and
infrastructure is a fully supported configuration. Because Site Recovery Manager is application-agnostic,
it does not interfere with, modify, or otherwise affect Exchange servers. Site Recovery Manager is not
itself involved in replicating VM files and data from a protected site to a recovery site; this function is
performed by the storage-replication components of the customer’s choosing. Site Recovery Manager
interacts with the underlying storage infrastructure through the applicable storage replication adapter
(SRA), and it does not install any agents or components in the VMs.
Site Recovery Manager adds automation and orchestration capabilities to a virtual infrastructure,
affording customers the ability to configure a comprehensive recovery plan that includes every facet of
the recovery steps and actions required to restore services to an Exchange infrastructure in a DR
scenario. Site Recovery Manager includes the capability to pause a recovery operation to allow for
manual administrative intervention where required (for example, reconfiguring DNS records or load-
balancer settings in a non-stretched network), as well as script callouts.
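As an illustration, one such script callout might re-point the Exchange client-access namespace at the recovery site’s load-balancer VIP where the network is not stretched. The following hedged sketch does this with the dnspython library; the zone, record, addresses, and TSIG key are placeholders.

```python
import dns.update
import dns.query
import dns.tsigkeyring

# TSIG key authorizing dynamic DNS updates (placeholder name and secret)
keyring = dns.tsigkeyring.from_text({"srm-update-key.": "cGxhY2Vob2xkZXIta2V5"})

update = dns.update.Update("example.com", keyring=keyring)
# Replace mail.example.com with the recovery site's load-balancer VIP
update.replace("mail", 300, "A", "10.9.1.100")

response = dns.query.tcp(update, "10.9.0.53")  # recovery-site DNS server
print(response.rcode())  # 0 (NOERROR) indicates the update succeeded
```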
Figure 29. Faster Exchange Service Recovery with Site Recovery Manager Automated DR Workflows
Site Recovery Manager supports all features of a vSphere infrastructure, including DRS, vSphere HA,
Fault Tolerance, and Virtual SAN (vSAN). vMotion support includes Storage vMotion and cross-data
center vSphere vMotion operations. While Site Recovery Manager supports configuring an isolated test
network for testing a DR plan, it does not require such a configuration: a test failover operation auto-
generates the fenced network required to isolate the recovered Exchange infrastructure from the
production environment.
Site Recovery Manager provides multiple topologies and recovery options for protecting an organization’s
Exchange Server infrastructure:
• Active-Passive. Site Recovery Manager supports the traditional active-passive DR scenario, in
which a production site running applications is recovered at a second site that sits idle until
failover is required. Although this is the most common configuration, it also means significant
costs for a DR site that is idle most of the time.
• Active-Active. To make better use of the recovery site, Site Recovery Manager also enables
leveraging the recovery site for other workloads when not in use for DR. Site Recovery Manager
can be configured to automatically shut down or suspend VMs at the recovery site as part of the
failover process, making it easier to free up compute capacity for the workloads being recovered.
• Bi-directional. Site Recovery Manager can also provide bi-directional failover protection so that
active production workloads can be run at both sites and failover to the other site in either
direction. The spare capacity at the other site will be used to run the VMs that are failed over.
• Shared Recovery Sites. Although less common, some customers may need to fail over within
a given site or campus, for example, when a storage array fails or when building maintenance
forces the movement of workloads to a different campus building.
• Active-Active Data Centers. This is a newer topology supported with metro-distance stretched-
storage solutions. Production applications run at both sites, and the stretched storage provides
synchronous reads and writes at both sites when they are within a metro distance (less than 100
km) of each other. Site Recovery Manager is used to orchestrate recovery, or even live
migration, of VMs between sites.
Figure 30. Failover Scenarios with Site Recovery Manager