Oracle RAC On VMware VSAN
Table of Contents
Executive Summary
– Business Case
– Solution Overview
– Key Results
– Purpose
– Scope
– Audience
– Terminology
Technology Overview
– Technology Overview
– VMware vSphere
– VMware vSAN
– Overview
– Deployment Benefits
Solution Configuration
– Solution Configuration
– Architecture Diagram
– Hardware Resources
– Software Resources
– Network Configuration
– vSAN Configuration
Solution Validation
– Solution Validation
– Test Overview
– vSAN Resiliency
Conclusion
Reference
– White Paper
– Product Documentation
– Other Documentation
Business Case
Customers deploying Oracle Real Application Clusters (RAC) have requirements such as stringent SLAs, sustained high
performance, and application availability. Managing data storage in these environments is a major challenge because of
these stringent business requirements. Common issues with traditional storage solutions for business-critical
applications (BCAs) include inadequate performance, limited scalability (scale-up and scale-out), storage inefficiency,
complex management, and high deployment and operating costs.
With more and more production servers being virtualized, the demand for highly converged server-based storage is surging.
VMware vSAN™ aims to provide highly scalable, available, reliable, and high-performance storage using cost-effective
hardware, specifically direct-attached disks in VMware ESXi™ hosts. vSAN adheres to a policy-based storage management
paradigm, which simplifies and automates the complex configuration and clustering workflows found in traditional
enterprise storage systems.
Solution Overview
This solution addresses the common business challenges discussed in the previous section that CIOs face today in an online
transaction processing (OLTP) environment requiring availability, reliability, scalability, predictability, and cost-effective
storage. It helps customers design and implement optimal configurations specifically for Oracle RAC Database on vSAN.
Key Results
The following highlights validate that vSAN is an enterprise-class storage solution suitable for Oracle RAC Database:
Purpose
This reference architecture validates vSAN's ability to support industry-standard TPC-C like workloads in an Oracle RAC
environment. Oracle RAC on vSAN ensures the desired level of storage performance for mission-critical OLTP workloads while
providing high availability (HA) and a disaster recovery (DR) solution.
Scope
This reference architecture:
Demonstrates storage performance scalability and resiliency of an enterprise-class 11gR2 Oracle RAC database in a vSAN
environment.
Shows vSAN Stretched Cluster enabling Oracle Extended RAC environment. It also shows the resiliency and ease of
deployment offered by vSAN Stretched Cluster.
Provides an availability solution including a three-site DR deployment leveraging vSAN Stretched Cluster and Oracle Data
Guard.
Provides a business continuity solution with minimal impact to the production environment for database backup and
recovery using Oracle RMAN in a vSAN environment.
Audience
This reference architecture is intended for Oracle RAC Database administrators and storage architects involved in planning,
architecting, and administering an environment with vSAN.
Terminology
This paper includes the following terminology.
Table 1. Terminology
TERM | DEFINITION
Oracle Automatic Storage Management (Oracle ASM) | Oracle ASM is a volume manager and a file system for Oracle database files that supports single-instance Oracle Database and Oracle RAC configurations.
Oracle Data Guard | Oracle Data Guard ensures high availability, data protection, and disaster recovery for enterprise data. Data Guard provides a comprehensive set of services that create, maintain, manage, and monitor one or more standby databases to enable production Oracle databases to survive disasters and data corruptions.
Oracle Extended RAC (Oracle RAC on Extended Distance Cluster) | Oracle Extended RAC is a deployment model in which servers in the cluster reside in locations that are physically separated.
Primary database | Also known as the production database; the database that functions in the primary role and is accessed by most of your applications.
Physical standby database | A physical standby uses Redo Apply to maintain a block-for-block replica of the primary database. Physical standby databases provide the best DR protection for Oracle Database.
RMAN | Recovery Manager (RMAN) is a backup and recovery manager for Oracle Database.
Technology Overview
This section provides an overview of the technologies used in this solution:
VMware vSphere®
VMware vSAN
VMware vSphere
VMware vSphere is the industry-leading virtualization platform for building cloud infrastructures. It enables users to run business-
critical applications with confidence and respond quickly to business needs. vSphere accelerates the shift to cloud computing for
existing data centers and underpins compatible public cloud offerings, forming the foundation for the industry’s best hybrid-cloud
model.
VMware vSAN
VMware vSAN is VMware’s software-defined storage solution for hyperconverged infrastructure, a software-driven architecture that
delivers tightly integrated computing, networking, and shared storage from a single virtualized x86 server. vSAN delivers high
performance, highly resilient shared storage by clustering server-attached flash devices and hard disks (HDDs).
vSAN delivers enterprise-class storage services for virtualized production environments along with predictable scalability and all-
flash performance—all at a fraction of the price of traditional, purpose-built storage arrays. Just like vSphere, vSAN provides users
the flexibility and control to choose from a wide range of hardware options and easily deploy and manage it for a variety of IT
workloads and use cases. vSAN can be configured as all-flash or hybrid storage.
vSAN supports a hybrid disk architecture that leverages flash-based devices for performance and magnetic disks for capacity and
persistent data storage. In addition, vSAN can use flash-based devices for both caching and persistent storage. It is a distributed
object storage system that leverages the vSphere Storage Policy-Based Management (SPBM) feature to deliver centrally managed,
application-centric storage services and capabilities. Administrators can specify storage attributes, such as capacity, performance,
and availability, as a policy on a per VMDK level. The policies dynamically self-tune and load balance the system so that each
virtual machine has the right level of resources.
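As an illustration of the per-VMDK policy model, the raw capacity cost of a mirrored vSAN object follows directly from the "failures to tolerate" (FTT) setting: a RAID-1 object keeps FTT + 1 full replicas. A minimal sketch of this math (ignoring the small witness components):

```python
def raw_capacity_gb(vmdk_size_gb, failures_to_tolerate):
    """Raw vSAN capacity consumed by a RAID-1 (mirrored) object.

    vSAN keeps FTT + 1 full replicas of the object; witness
    components are tiny and ignored in this estimate.
    """
    if failures_to_tolerate < 0:
        raise ValueError("FTT must be >= 0")
    return vmdk_size_gb * (failures_to_tolerate + 1)

# A 100GB VMDK with the default FTT=1 consumes ~200GB of raw capacity.
print(raw_capacity_gb(100, 1))  # -> 200
```

This is why capacity planning for vSAN-backed Oracle RAC must account for the policy, not just the provisioned VMDK sizes.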
vSAN Stretched Cluster builds on the foundation of Fault Domains. The Fault Domain feature introduced rack awareness in vSAN
6.0. The feature allows customers to group multiple hosts into failure zones across multiple server racks in order to ensure that
replicas of virtual machine objects are not provisioned onto the same logical failure zones or server racks. Similarly, vSAN
Stretched Cluster requires three failure domains and is based on three sites (two active-active data sites and a witness site). The
witness site is utilized only to host the witness virtual appliance, which stores witness objects and cluster metadata and provides
cluster quorum services during failure events.
The nomenclature used to describe a vSAN Stretched Cluster configuration is X+Y+Z, where X is the number of ESXi hosts at data
site A, Y is the number of ESXi hosts at data site B, and Z is the number of witness hosts at site C. Data sites are where virtual
machines are deployed. The minimum supported configuration is three nodes (1+1+1). The maximum configuration is 31 nodes
(15+15+1).
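The X+Y+Z rule above can be expressed as a small validation helper; this sketch encodes only the limits quoted in this section (1 to 15 data hosts per site, exactly one witness host):

```python
def validate_stretched_cluster(x, y, z):
    """Check an X+Y+Z vSAN Stretched Cluster layout against the
    documented limits: 1..15 ESXi hosts per data site, exactly one
    witness host at the third site."""
    if z != 1:
        return False
    return 1 <= x <= 15 and 1 <= y <= 15

print(validate_stretched_cluster(1, 1, 1))    # minimum supported: True
print(validate_stretched_cluster(15, 15, 1))  # maximum supported: True
print(validate_stretched_cluster(16, 15, 1))  # too many data hosts: False
```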
vSAN Stretched Cluster differs from a regular vSAN Cluster in the following respects:
Write latency: In a regular vSAN Cluster, mirrored writes incur similar latency because all copies reside in the same site. In
a vSAN Stretched Cluster, write operations must be prepared at both sites, so each write traverses the inter-site link and
incurs the inter-site latency. The higher the latency, the longer write operations take to complete.
Read locality: A regular cluster performs read operations in a round-robin pattern across the mirrored copies of an object.
A stretched cluster performs all reads from the single object copy available at the local site.
Failure: In the event of any failure, recovery traffic must originate from the remote site, which holds the only mirrored
copy of the object. Thus, all recovery traffic traverses the inter-site link. In addition, because the local copy of an object on a
failed node is degraded, all reads to that object are redirected to the remote copy across the inter-site link.
In an Oracle RAC environment, Oracle databases run on two or more systems in a cluster while concurrently accessing a single
shared database. The result is a single database system that spans multiple hardware systems, enabling Oracle RAC to provide
high availability and redundancy during failures in the cluster. It enables multiple database instances, linked by a network
interconnect, to share access to an Oracle database. Oracle RAC accommodates all system types, from read-only data warehouse
systems to update-intensive OLTP systems.
The high impact of latency, and therefore distance, creates practical limitations on where this architecture can be
deployed. An active-active Oracle RAC architecture fits best where the two data centers are located relatively close (<100km) and
where the cost of setting up low-latency, dedicated direct connectivity between the sites for Oracle RAC is already justified.
For this reason, Oracle Extended RAC is not a replacement for a disaster recovery solution such as Oracle Data Guard or Oracle
GoldenGate.
A physical standby uses Redo Apply to maintain a block-for-block, exact replica of the primary database. Physical standby
databases provide the best DR protection for Oracle Database. We use this standby database type in this reference
architecture.
The second type of standby database, the logical standby, uses SQL Apply to maintain a logical replica of the primary
database. While a logical standby database contains the same data as the primary database, the physical organization and
structure of the data can be different.
Oracle Active Data Guard: Oracle Active Data Guard enhances the Quality of Service (QoS) for production databases by off-loading
resource-intensive operations to one or more standby databases, which are synchronized copies of the production database. With
Oracle Active Data Guard, a physical standby database can be used for real-time reporting, with minimal latency between
reporting and production data. Additionally, Oracle Active Data Guard continues to provide the benefit of high availability and
disaster protection by failing over to the standby database in a planned or an unplanned outage at the production site.
More information about Oracle Data Guard 20c can be found here.
More information about Oracle Recovery Manager 20c can be found here.
Overview
Oracle RAC implementations are primarily designed as a scalability and high availability solution that resides in a single data center.
By contrast, in an Oracle Extended RAC architecture, the nodes are separated by geographic distance. For example, if a customer
has a corporate campus, they might want to place individual Oracle RAC nodes in separate buildings. This configuration provides a
higher degree of disaster tolerance, in addition to the normal Oracle RAC high availability, because a power shutdown or fire in one
building would not stop the database from processing if it is properly set up. Similarly, many customers who have two data centers in
reasonable proximity, already connected by a high-speed link and often on different power grids and flood plains,
can take advantage of this solution.
To deploy this type of architecture, the RAC nodes are physically dispersed across both sites to protect them from local server
failures. Similar consideration is required for storage as well.
vSAN Stretched Cluster inherently provides this storage solution suitable for Oracle Extended RAC using the vSAN fault domain
concept. This ensures that one copy of the data is placed on one site, a second copy of the data on another site, and the witness
components are always placed on the third site (witness site).
A virtual machine deployed on a vSAN Stretched Cluster has one copy of its data in site A, a second copy of its data in site B, and
any witness components placed on the witness host in site C. This configuration is achieved through fault domains. In the event of
a complete site failure, there is a full copy of the virtual machine data as well as greater than 50 percent of the components
available. This enables the virtual machine to remain available on the vSAN datastore. If the virtual machine needs to be restarted
on the other data site, VMware vSphere High Availability handles this task.
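The "greater than 50 percent of the components" availability rule can be sketched as a simplified vote count; real vSAN assigns votes per component, so treating each site as one vote, as below, is an illustrative simplification:

```python
def object_accessible(votes_site_a, votes_site_b, votes_witness,
                      site_a_up, site_b_up, witness_up):
    """An object stays accessible while strictly more than 50% of its
    votes are reachable (simplified model; real vSAN assigns votes
    per component, not per site)."""
    total = votes_site_a + votes_site_b + votes_witness
    alive = (votes_site_a * site_a_up +
             votes_site_b * site_b_up +
             votes_witness * witness_up)
    return alive * 2 > total

# One data copy per site plus a witness: losing one full data site
# still leaves 2 of 3 votes, so the object remains accessible.
print(object_accessible(1, 1, 1,
                        site_a_up=True, site_b_up=False, witness_up=True))  # -> True
```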
Oracle Extended RAC is not a different installation scenario compared with standard Oracle RAC. Make sure to prepare the infrastructure so
that the distance between the nodes is transparent to the Oracle RAC Database. Similarly, for the underlying storage, transparency is
maintained by the data mirroring across distances provided by vSAN Stretched Cluster.
In the case of a vSAN Stretched Cluster network partition, vSAN continues IO from one of the available sites. The Oracle cluster
nodes therefore have access to voting disks only where vSAN Stretched Cluster allows IO to continue, and Oracle Clusterware
reconfigures the cluster accordingly. Although voting disks are required for Oracle Extended RAC, you do not need to deploy
them at an independent third site, because the vSAN Stretched Cluster Witness provides split-brain protection and guarantees
aligned behavior between vSAN Stretched Cluster and Oracle Clusterware.
Deployment Benefits
The benefits of using vSAN Stretched Cluster for Oracle Extended RAC are:
Scale-out architecture with resources (storage and compute) balanced across both sites.
Cost-effective Server SAN solution for extended distance.
Simple deployment of Oracle Extended RAC:
Reduced consumption of Oracle Cluster node CPU cycles associated with host-based mirroring. Instead, vSAN takes care of
replicating the data across sites.
Elimination of Oracle Server and Clusterware at the third site.
The preconfigured witness appliance provided by VMware is easy to deploy and can be used as the vSAN Stretched Cluster
Witness at the third site.
Simple infrastructure requirements with deployment of Oracle voting disk on vSAN Stretched Cluster datastore. No need for
NFS storage at the third site for arbitration.
vSAN Stretched Cluster offers ease of deployment, with simple enablement and no additional software or hardware.
Integrates well with other VMware features like VMware vSphere vMotion® and vSphere HA.
Solution Configuration
This section introduces the resources and configurations for the solution including:
Architecture diagram
Hardware resources
Software resources
Network configuration
VMware ESXi Servers
vSAN configuration
Oracle RAC VM and database storage configuration
Architecture Diagram
The key designs for the vSAN Cluster (Hybrid) solution for Oracle RAC are:
A 4-node vSAN Cluster with two vSAN disk groups in each ESXi host. Each disk group is created from one 800GB SSD and
five 1.2TB HDDs.
Four Oracle Enterprise Linux VMs, each in one ESXi host to form an Oracle RAC Cluster. Each VM has 8 vCPUs and 64GB of
memory with 28GB assigned to Oracle system global area (SGA). The database size is 350GB.
The key designs for the vSAN Stretched Cluster solution for Oracle Extended RAC are:
The vSAN Stretched Cluster consists of five (2+2+1) ESXi hosts. Site A and site B have two ESXi hosts each, and each ESXi host
has two vSAN disk groups. Each disk group is created from one 800GB SSD and five 1.2TB HDDs. The vSAN Stretched Cluster
witness site (site C) has a nested ESXi host VM as the witness. Either a physical ESXi host or a virtual appliance (in the form of
a nested ESXi host) can be used as the witness host. In this solution, we used the preconfigured witness appliance provided by
VMware as the witness.
Two Oracle Enterprise Linux VMs each in site A and site B form an Oracle Extended RAC Cluster. Each VM has 8 vCPUs and
64GB of memory with 28GB assigned to the Oracle SGA. The database size is 350GB.
Hardware Resources
Table 2 shows the hardware resources used in this solution.
SOLUTION CONFIGURATION
Software Resources
Table 3 shows the software resources used in this solution.
Network Configuration
A VMware vSphere Distributed Switch™ acts as a single virtual switch across all associated hosts in the data cluster. This setup
allows virtual machines to maintain a consistent network configuration as they migrate across multiple hosts. The vSphere
Distributed Switch uses two 10GbE adapters per host as shown in Figure 5.
Figure 6. Distributed Switch Configuration View from one of the ESXi Hosts
A port group defines properties regarding security, traffic shaping, and NIC teaming. Default port group settings were used except
for the uplink failover order, which is shown in Table 4. The table also shows the distributed switch port groups created for different
functions and the respective active and standby uplinks used to balance traffic across the available uplinks.
vSAN Configuration
vSAN Storage Policy
vSAN can set availability, capacity, and performance policies per virtual machine. Table 5 shows the designed and implemented
storage policy.
Each Oracle RAC VM is installed with Oracle Enterprise Linux 6.6 (OEL) and configured with 8 vCPUs and 64GB of memory with 28GB
assigned to the Oracle SGA. We used this configuration throughout the tests unless specified otherwise.
The Oracle ASM data disk group with external redundancy is configured with an allocation unit (AU) size of 1MB. The Data, Fast Recovery Area
(FRA), and Redo ASM disk groups are placed on different PVSCSI controllers. The archive log destination uses the FRA disk group. Table 6
provides the Oracle RAC VM disk layout and ASM disk group configuration.
PURPOSE | CONTROLLER | DISK SIZE (GB) x COUNT | TOTAL (GB) | ASM DISK GROUP
OS and Oracle Binary | SCSI 0 | 100 x 1 | 100 | Not Applicable
Database data disks | SCSI 1 | 60 x 10 | 600 | DATA
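The disk rows above can be summarized programmatically; this small sketch uses only the two rows reproduced here (the full Table 6 also covers the FRA and Redo disk groups, which are not listed in this excerpt):

```python
# Each tuple: (purpose, controller, disk_size_gb, disk_count, asm_group)
disk_layout = [
    ("OS and Oracle Binary", "SCSI 0", 100, 1, None),
    ("Database data disks",  "SCSI 1", 60, 10, "DATA"),
]

def total_capacity_gb(layout):
    """Sum provisioned capacity across all VMDKs in the layout."""
    return sum(size * count for _, _, size, count, _ in layout)

print(total_capacity_gb(disk_layout))  # -> 700
```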
Oracle RAC requires attaching the shared disks to one or more VMs. To enable in-guest systems leveraging cluster-aware file
systems that have distributed write (multi-writer) capability, multi-writer support must be explicitly enabled for all applicable
virtual machines and VMDKs.
Create a VM storage policy that is applied to the virtual disks used as the Oracle RAC cluster’s shared storage. See Table 5
for vSAN storage policy used in this reference architecture.
Create shared virtual disks in the eager-zeroed thick and independent persistent mode.
Starting from vSAN 6.7 Patch 01, the EZT requirement is removed and shared disks can be provisioned as thin by
default to maximize space efficiency. More information can be found in the following blog articles.
EZT requirement for Multi-Writer Disks is now removed for vSAN Datastore
Oracle RAC on vSAN 6.7 P01 – No more Eager Zero Thick requirement for shared vmdk’s
Oracle RAC storage migration from non-vSAN to vSAN 6.7 P01 – Through Thick to Thin
Attach the shared disks to one or more VMs.
Enable the multi-writer mode for the VMs and disks.
Apply the VM storage policy to the shared disks.
See the VMware Knowledge Base Article 2121181 for detailed steps to enable multi-writer and provisioning storage to Oracle RAC
from vSAN.
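At the .vmx level, the effect of the steps above can be sketched as follows. The key format (`scsiX:Y.sharing = "multi-writer"`) follows VMware KB 2121181; the controller and unit numbers below are illustrative, not taken from this paper:

```python
def multiwriter_vmx_entries(controller, units):
    """Generate the per-disk .vmx settings that mark shared VMDKs as
    multi-writer. The key format follows VMware KB 2121181; the
    controller and unit numbers passed in are illustrative."""
    return ['scsi{}:{}.sharing = "multi-writer"'.format(controller, u)
            for u in units]

# Mark ten DATA disks on the second PVSCSI controller (scsi1) as shared.
for line in multiwriter_vmx_entries(1, range(10)):
    print(line)
```

In practice these settings are applied through the vSphere Client or API rather than by editing the .vmx file directly; the sketch only shows the resulting configuration keys.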
Solution Validation
The solution designed and deployed Oracle 11gR2 RAC on a vSAN Cluster focusing on ease of use, performance, resiliency, and
availability. In this section, we present the test methodology, processes, and results for each test scenario.
Test Overview
The solution validated the performance and functionality of Oracle RAC instances in a virtualized VMware environment running
Oracle 11gR2 RAC backed by the vSAN storage platform.
Oracle workload testing using Swingbench to generate a classic order-entry, TPC-C like workload and observe database
and vSAN performance.
RAC scalability testing to support an enterprise-class Oracle RAC Database with vSAN cost-effective storage.
Resiliency testing to ensure Oracle RAC on vSAN storage solution is highly available and supports business continuity.
vSphere vMotion testing on vSAN.
Oracle Extended RAC performance test on vSAN Stretched Cluster to verify acceptable performance across geographically
distributed sites.
Continuous application availability test during a site failure in vSAN Stretched Cluster.
Metro and global availability test with vSAN Stretched Cluster and Oracle Data Guard.
Oracle database backup and recovery test on vSAN using Oracle RMAN.
In all tests, Swingbench accessed the Oracle RAC database using the EZConnect client and the simple JDBC thin URL. Unless
mentioned otherwise in the respective test, Swingbench was set up to generate a TPC-C like workload using 100 user sessions on the
Oracle RAC Database.
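The connection string format implied here can be sketched as follows; the host and service names below are placeholders, not values from this paper:

```python
def jdbc_thin_url(host, port, service):
    """Build a simple JDBC thin URL (EZConnect style) of the kind
    Swingbench uses to reach the RAC database. The host and service
    names passed in are placeholders."""
    return "jdbc:oracle:thin:@//{}:{}/{}".format(host, port, service)

print(jdbc_thin_url("rac-scan.example.com", 1521, "orcl"))
# -> jdbc:oracle:thin:@//rac-scan.example.com:1521/orcl
```

Pointing the URL at a RAC SCAN name (rather than an individual node) lets the listener distribute sessions across the RAC instances.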
vSAN Observer
vSAN Observer is designed to capture performance statistics and bandwidth for a VMware vSAN Cluster. It provides an in-depth
snapshot of IOPS, bandwidth and latencies at different layers of vSAN, read cache hits and misses ratio, outstanding I/Os, and
congestion. This information is provided at different layers in the vSAN stack to help troubleshoot storage performance. For more
information about the VMware vSAN Observer, see the Monitoring VMware vSAN with vSAN Observer documentation.
esxtop utility
esxtop is a command-line utility that provides a detailed view of ESXi resource usage. Refer to the VMware Knowledge Base
Article 1008205 for more information.
Automatic Workload Repository (AWR) collects, processes, and maintains performance statistics for problem detection and self-
tuning purposes for Oracle Database. This tool can generate reports for analyzing Oracle performance.
The Automatic Database Diagnostic Monitor (ADDM) analyzes data in the Automatic Workload Repository (AWR) to identify
potential performance bottlenecks. For each of the identified issues, it locates the root cause and provides recommendations for
correcting the problem.
Test Results
Swingbench reported a maximum transactions per minute (TPM) of 331,000 with an average of 287,000, as shown in Figure 8. From
the storage perspective, vSAN Observer and Oracle AWR reported an average of 28,000 IOPS and an average throughput of 304 MB/s.
The four RAC nodes are spread across the four ESXi hosts; therefore, the vSAN Observer client view in Figure 9 shows IOPS (~7,000) and
throughput (~76MB/s) equally distributed across the four ESXi hosts. Table 7 shows data captured from the Oracle AWR report,
giving the workload's read-to-write ratio (70:30). We ran multiple tests with a similar workload and observed similar results.
During this workload, Figure 9 shows an overall IO response time of less than 2ms. The test results demonstrated that vSAN is a viable
storage solution for Oracle RAC.
Figure 9. IO Metrics from vSAN Observer during Swingbench Workload on a 4-Node RAC
Test Results
As shown in Figure 10, the average TPM increased linearly from the single-instance database to the 4-node RAC database. The TPS
shown in Figure 9 is the aggregate TPS observed across all Oracle instances in the RAC. We observed a linear increase in IOPS and
throughput, with overall latency less than 2ms throughout. The TPM values shown in Figure 10 are the aggregate of all the instances in
the RAC database. This demonstrated vSAN as a scalable storage solution for the Oracle RAC database.
vSAN Resiliency
Test Overview
This section validates vSAN resiliency in handling disk, disk group, and host failures. We designed the following scenarios to
emulate potential real-world component failures:
Disk failure
This test evaluated the impact on the virtualized Oracle RAC database of a single HDD failure. The HDD stored VMDK
components of the Oracle database. We hot-removed (hot-unplugged) the HDD to simulate a disk failure on one of the nodes of the
vSAN Cluster and observed whether it had a functional or performance impact on the production Oracle database.
Disk group failure
This test evaluated the impact on the virtualized Oracle RAC database of a disk group failure. We hot-removed (hot-unplugged)
one SSD to simulate a disk group failure and observed whether it had a functional or performance impact on the production
Oracle database.
Host failure
This test evaluated the impact on the virtualized Oracle RAC database of a vSAN host failure. We shut down one
host in the vSAN Cluster to simulate a host failure and observed whether it had a functional or performance impact on the production
Oracle database.
Test Scenarios
Single Disk Failure
We simulated an HDD failure by hot-removing a disk while Swingbench was generating workload on the 4-node RAC. Table 8
shows the failed disk; it had four components that stored the VMDKs (Oracle ASM disks) for the Oracle database. The components
residing on the disk became absent and inaccessible after the disk failure.
FAILED DISK NAA ID | ESXI HOST | NO OF COMPONENTS | TOTAL CAPACITY | USED CAPACITY
The failed disk group had the SSD and HDD backing disks as shown in Table 9. The table also shows the number of affected
components that stored the VMDK file for Oracle database.
DISK NAA ID | DISK TYPE | NO OF COMPONENTS | TOTAL CAPACITY (GB) | USED CAPACITY | ESXI HOST
naa.5XXXXXXXXXXX8935 | SSD | NA | 745.21 | NA | 1xx.xx.28.3
naa.5XXXXXXXXXXXd4d7 | HDD | 3 | 1106.62 | 4.57 | 1xx.xx.28.3
naa.5XXXXXXXXXXXc56f | HDD | 4 | 1106.62 | 6.51 | 1xx.xx.28.3
naa.5XXXXXXXXXXXd7f7 | HDD | 3 | 1106.62 | 5.42 | 1xx.xx.28.3
naa.5XXXXXXXXXXXd12b | HDD | 1 | 1106.62 | 1.81 | 1xx.xx.28.3
naa.5XXXXXXXXXXXe54b | HDD | 2 | 1106.62 | 4.52 | 1xx.xx.28.3
The failed storage node had two disk groups with the following SSD and HDD backing disks as shown in Table 10.
Table 10. Failed ESXi Host vSAN Disk Group—Physical Disks and Components
DISK GROUP | DISK DISPLAY NAME | ESXI HOST | NO OF COMPONENTS | TOTAL CAPACITY (GB) | USED CAPACITY (%)
Test Results
As shown in Table 11, during all types of failures, performance was affected only by a momentary drop in TPS and an increase in IO
wait. The time taken to recover to steady-state TPS is also shown in Table 11. In all three failure tests, the steady-state TPS after the
failure was approximately the same as the value before the failure. None of the tests reported IO errors in the Linux VMs or Oracle user-
session disconnects, which demonstrated the resiliency of vSAN during the component failures.
FAILURE TYPE | TPS BEFORE FAILURE | LOWEST TPS AFTER FAILURE | TIME TAKEN FOR ORACLE RECOVERY TO STEADY STATE TPS AFTER FAILURE (SEC) | REDO DISK (VMDK) WRITE IOPS DECREASE AT THE MOMENT OF FAILURE | REDO DISK (VMDK) WRITE RESPONSE TIME INCREASE AT THE MOMENT OF FAILURE (MS)
Single disk (HDD) | 4,100 | 1,200 | 29 | 250 to 110 | 1.6 to 2.4
Disk group | 3,900 | 1,000 | 51 | 240 to 110 | 1.7 to 3.2
Storage host | 3,600 | 700 | 165 | 170 to 48 | 1.7 to 4.3
After each of the failures, because the VM storage policy was set with "Number of FTT" greater than zero, the virtual machine
objects and components remained accessible. In the disk failure tests, because the disks were hot-removed rather than
permanently failed, vSAN treats the disk as absent and does not start rebuilding the affected objects immediately; it waits 60
minutes, the default repair delay time. See the VMware Knowledge Base Article 2075456 for steps to change the default repair
delay time. If the removed disk is not reinserted within 60 minutes, the object rebuild operation is triggered, provided additional
capacity is available within the cluster to satisfy the object requirements. However, if a disk fails due to an unrecoverable error,
this is considered a permanent failure and vSAN immediately responds by rebuilding the affected objects. In the case of a host
failure, to avoid unnecessary data resynchronization during host maintenance, synchronization starts after the 60-minute default
repair delay. The synchronization time depends on the amount of data that needs to be resynchronized.
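The rebuild timing described above can be sketched as a simple decision function; the ABSENT/DEGRADED distinction follows the behavior described here, while the function itself is an illustrative simplification:

```python
REPAIR_DELAY_MINUTES = 60  # vSAN default repair delay (see KB 2075456)

def rebuild_action(component_state, minutes_since_failure):
    """Simplified vSAN repair logic: DEGRADED components (permanent
    device failure) rebuild immediately; ABSENT components (e.g. a
    hot-removed disk or a host down for maintenance) wait out the
    repair delay timer in case the component comes back."""
    if component_state == "DEGRADED":
        return "rebuild now"
    if component_state == "ABSENT":
        if minutes_since_failure >= REPAIR_DELAY_MINUTES:
            return "rebuild now"
        return "wait"
    return "no action"

print(rebuild_action("ABSENT", 10))    # -> wait
print(rebuild_action("DEGRADED", 0))   # -> rebuild now
```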
Figure 13 shows the Stretched Cluster fault domain configuration. We configured and tested three inter-site round-trip latencies:
1ms, 2.2ms, and 4.2ms. We used Swingbench to generate the same TPC-C like workload in these tests.
Test Results
During these tests, we recorded Oracle TPS in Swingbench and IOPS metrics on the vSAN Stretched Cluster. As we increased the
inter-site latency using the Netem functionality discussed in the above section, we observed a reduction in Oracle RAC TPS. For the
test workload, the TPS reduction was proportional to the increase in inter-site round-trip latency. As shown in Figure 14, for 1ms,
2.2ms, and 4.2ms round-trip latency, the TPS reduced by 12 percent, 27 percent, and 47 percent respectively. Similarly, we
observed an IOPS reduction in vSAN Observer, as shown in Figure 15. We noticed that as inter-site latency increased, IO and
Oracle cache-fusion message latency increased as a result. This solution proved that Oracle Extended RAC on vSAN Stretched
Cluster provided reasonable performance for OLTP workloads. vSAN Stretched Cluster with 1ms inter-site round-trip latency
(typically 100km distance) is capable of delivering 88 percent of the transaction rate for Oracle Extended RAC when compared to
regular vSAN Clusters. As distance and latency between sites increase, performance is impacted accordingly.
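The reported reductions translate into retained throughput by simple subtraction; a sketch using only the figures quoted above:

```python
def tps_retention(reduction_percent):
    """Percentage of the baseline transaction rate retained at a given
    inter-site latency, from the reductions reported in Figure 14."""
    return 100 - reduction_percent

# Reported reductions: 12% at 1ms, 27% at 2.2ms, 47% at 4.2ms RTT.
for rtt_ms, drop in [(1, 12), (2.2, 27), (4.2, 47)]:
    print("{}ms RTT -> {}% of baseline TPS".format(rtt_ms, tps_retention(drop)))
```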
Figure 14. TPS Comparison for a TPC-C like Workload on Oracle RAC
Figure 15. IOPS Comparison for a TPC-C like Workload on Oracle RAC
We ran this test on the 2-node Oracle Extended RAC shown in Figure 4, with vSphere HA and vSphere DRS enabled on the
cluster, and generated a Swingbench TPC-C-like workload with 100 user sessions on the RAC cluster. After some period, we brought
an entire site down by powering off the two ESXi hosts in site A (the vSAN Stretched Cluster preferred site); these hosts are shown
as “Not Responding” in Figure 16.
After some time, site A was brought back by powering on both ESXi hosts. The vSAN Stretched Cluster then started synchronizing
site A with the components that had changed in site B during the failure. The results demonstrated how vSAN Stretched Cluster
provides availability during a site failure by automating the failover and failback process, leveraging vSphere HA and vSphere DRS.
This proves vSAN Stretched Cluster’s ability to survive a complete site failure, offering zero RPO and RTO for the Oracle RAC database.
continuous availability at a metro distance and Oracle Data Guard provided replication and recovery at a global distance.
– Site A and site B were the data sites of the vSAN Stretched Cluster, with the Oracle Extended RAC nodes distributed
across them.
– Site C hosted the Witness for the vSAN Stretched Cluster.
– Sites A, B, and C together formed the Oracle Extended RAC Cluster that hosted the production database, which served as the
primary database in the Oracle Data Guard configuration.
– Site D had a 2-node Oracle RAC on a regular vSAN Cluster for disaster recovery, ideally located at a global
distance (although it was set up in the same data center in the lab for demonstration). The 2-node
Oracle RAC in site D was the physical standby database in the Oracle Data Guard configuration. Oracle Active Data Guard was
set up between the primary and physical standby databases with the protection mode set to Maximum Performance.
Figure 17. Metro Global DR with vSAN Stretched Cluster and Oracle Data Guard
Solution Validation
An Oracle TPC-C-like workload was generated on the primary database using Swingbench. With Data Guard set up in Maximum
Performance mode, transactions were committed as soon as all the redo data they generated had been written to the online redo
log. The redo data was also written to the standby database asynchronously with respect to the transaction commit. This
protection mode ensures that primary database performance is unaffected by delays in writing redo data to the standby database,
and it is ideally suited when the latency between the production and DR sites is high and bandwidth is limited.
With Oracle Active Data Guard, a physical standby database can be used for real-time reporting, with minimal latency between
reporting and production data. It also allows backup operations to be offloaded to the standby database. As shown in Figure 17,
the RMAN backup is taken from the standby database in site D and is registered and managed from the RMAN catalog database.
This enables efficient utilization of the vSAN storage resources at the DR site, increasing overall performance and the return on
the storage investment.
Another protection mode that can be considered is Maximum Availability. This mode provides the highest level of data protection
possible without compromising the availability of the primary database. Transactions do not commit until all redo data needed
to recover those transactions has been written to the online redo log and to the standby redo log of at least one synchronized
standby database. If the primary database cannot write its redo stream to at least one synchronized standby database, it operates
as if it were in Maximum Performance mode to preserve primary database availability until it is able to write its redo stream to a
synchronized standby database again.
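The commit-latency trade-off between the two protection modes can be sketched with a conceptual model; this is not Oracle's implementation, and the millisecond figures in the example are illustrative:

```python
def commit_latency_ms(local_redo_ms: float, standby_redo_ms: float,
                      mode: str) -> float:
    """Conceptual commit latency under the two Data Guard modes.

    MAX_PERFORMANCE: commit waits only for the local online redo log
    write; redo ships to the standby asynchronously.
    MAX_AVAILABILITY: commit also waits for the standby redo log write
    at a synchronized standby, so the slower write dominates.
    """
    if mode == "MAX_PERFORMANCE":
        return local_redo_ms
    if mode == "MAX_AVAILABILITY":
        return max(local_redo_ms, standby_redo_ms)
    raise ValueError(f"unknown mode: {mode}")

# With 1 ms local redo writes and 5 ms to write to a remote standby,
# Maximum Performance commits in ~1 ms, Maximum Availability in ~5 ms.
```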
This solution demonstrated how a vSAN Stretched Cluster hosting the primary database and vSAN hosting the standby database,
together with Oracle Data Guard, provide a cost-effective solution with the highest level of availability across three data centers,
offering near-zero data loss during a disaster at any one of the sites.
We validated Oracle RMAN backup in two scenarios:
– Oracle RAC database backup from the production site while there is a workload on the database. In this scenario, there is no
DR site or Oracle Data Guard configuration.
– Oracle RAC database backup from the physical standby database (DR site) while there is a workload on the primary site. We
implemented the solution as shown in Figure 17. This method offloads the backup to the standby database.
RMAN accesses backup and recovery information from either the control file or the optional recovery catalog. Having a separate
recovery catalog is the preferred method in a production environment because it serves as a secondary metadata repository and can
centralize metadata for all the target databases. A recovery catalog is required when you use RMAN in a Data Guard environment.
By storing backup metadata for all the primary and standby databases, the catalog enables you to offload backup tasks to one
standby database while still being able to restore backups on other databases in the environment.
In the lab environment, the RMAN recovery catalog database was installed on a separate virtual machine as shown in Figure 17. An
NFS mount point was used to store the actual backup (disk-based backup). The RMAN catalog database and the NFS mount point
were placed on a separate vSAN datastore where all the infrastructure components are hosted. A separate backup network interface
and VLAN were used for backup traffic, so that the backup traffic was isolated from the Oracle public and RAC interconnect traffic.
We configured Swingbench to generate a TPC-C-like workload using 100 user sessions on the 4-node Oracle RAC; Oracle TPS was
between 4,500 and 4,800. We initiated an RMAN full backup from one of the production RAC VMs (vmorarac1) while the workload
was running. On this RAC VM, the read throughput increased from 50 MB/s to 115 MB/s after the RMAN backup was started; this
increase was caused by the backup workload. Oracle transactions continued during the backup, and TPS remained the same as
before the backup started. Although the backup workload did not affect transaction performance, RMAN backups do consume some
CPU and storage resources, so it is recommended to initiate them during off-peak hours or to offload backups to the standby
database where applicable. Furthermore, RMAN has an incremental backup capability that backs up only those database blocks that
have changed since a previous backup. This can be used to improve efficiency by reducing the time and resources required for backup
and recovery.
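The incremental behavior can be illustrated with a toy model; real RMAN tracks changed blocks via SCNs or a block change tracking file, which this sketch abstracts into a simple set of changed block IDs:

```python
def incremental_backup(datafile: dict[int, bytes],
                       changed_blocks: set[int]) -> dict[int, bytes]:
    """Toy RMAN-style incremental backup: copy only the blocks that
    changed since the previous backup, instead of every block."""
    return {bid: data for bid, data in datafile.items()
            if bid in changed_blocks}

# Four blocks in the datafile; only blocks 1 and 3 changed since the
# last (level 0) backup, so only they are written to the backup piece.
blocks = {0: b"aaaa", 1: b"bbbb", 2: b"cccc", 3: b"dddd"}
delta = incremental_backup(blocks, {1, 3})
# delta -> {1: b"bbbb", 3: b"dddd"}
```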
Oracle RAC database backup from the physical standby database (DR site)
While Swingbench was generating a workload on the 2-node Oracle Extended RAC as shown in Figure 17, an RMAN backup was
initiated from the standby database. The primary and standby databases were installed on different vSAN Clusters. Since the
backup was initiated from the standby database, there was no impact on resources at the primary site and hence no impact on
Oracle transactions on the primary database. The primary and standby databases were registered with the RMAN catalog database,
which allows you to offload backup tasks to the standby database while still being able to restore backups to the primary database
when required.
These tests demonstrate that Oracle RMAN backup is a feasible solution for an Oracle RAC database deployed on vSAN.
– Design for growth: consider an initial deployment with capacity in the cluster for future growth and enough flash cache to
accommodate future requirements. Use multiple vSAN disk groups per server with enough magnetic spindles and SSD
capacity behind each disk group. For future capacity additions, create disk groups with a similar configuration and sizing. This
ensures an even balance of virtual machine storage components across the cluster of disks and hosts.
– Design for availability: consider a design with more than three hosts and additional capacity that enables the cluster to
automatically remediate in the event of a failure.
vSAN SPBM can set availability, capacity, and performance policies per virtual machine. Number of disk stripes per object
and object space reservation are the storage policies that were changed from the default values for Oracle RAC VMs in this
reference architecture:
Object space reservation—prior to vSAN 6.7 Patch 01, a virtual disk used in multi-writer mode had to be eager-zeroed thick
(EZT). Since such a disk is thick provisioned, 100 percent of its capacity is reserved automatically, so the object space reservation
is set to 100 percent. Starting from vSAN 6.7 Patch 01, the EZT requirement is removed.
Number of disk stripes per object—with an increase in stripe width, you may notice an IO performance improvement because
objects are spread across more vSAN disk groups and disks. However, in a solution like Oracle RAC where we recommend
multiple VMDKs for the database, the objects are spread across the vSAN Cluster components even with the default stripe width of
1, so increasing the vSAN stripe width might not provide tangible benefits. Moreover, there is additional ASM striping at
the host level as well. Hence, it is recommended to use the default stripe width of 1 unless performance issues are
observed during read cache misses or during destaging. In our Oracle RAC tests, a stripe width of 2 provided a marginal
increase in Oracle TPS compared to a stripe width of 1; however, further increases in stripe width did not provide
any benefits, so a stripe width of 2 was used in this testing.
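The capacity and layout effects of these two policies can be sketched numerically; RAID-1 mirroring with FTT=1 is assumed, and witness components are ignored for simplicity:

```python
def reserved_raw_gb(vmdk_gb: float, ftt: int = 1,
                    osr_percent: int = 100) -> float:
    """Raw vSAN capacity reserved for a mirrored object: each of the
    (FTT + 1) replicas reserves osr_percent of the VMDK size."""
    return vmdk_gb * (ftt + 1) * (osr_percent / 100)

def data_components(stripe_width: int, ftt: int = 1) -> int:
    """Data components per object: each of the (FTT + 1) replicas is
    striped across stripe_width components."""
    return stripe_width * (ftt + 1)

# A 100 GB shared disk with OSR=100% and FTT=1 reserves 200 GB raw;
# with the stripe width of 2 used in testing it has 4 data components.
print(reserved_raw_gb(100), data_components(2))  # 200.0 4
```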
Conclusion
This section summarizes the validation results of this solution.
vSAN is a cost-effective and high-performance storage platform that is rapidly deployed, easy to manage, and fully integrated into
the industry-leading VMware vSphere platform.
This solution validates vSAN as a storage platform supporting a scalable, resilient, highly available, and high performing Oracle
RAC Cluster.
vSAN’s integration with vSphere enables ease of deployment and management from a single management interface. vSAN
Stretched Cluster provides an excellent storage platform for an Oracle Extended RAC Cluster solution, providing zero RPO and RTO
at metro distance. This solution also demonstrates how vSAN Stretched Cluster, along with Oracle Data Guard and RMAN backup,
provides the disaster recovery and business continuity required for mission-critical applications while efficiently using the
available resources.
Reference
This section lists the relevant references used for this document.
White Paper
For additional information, see the following white papers:
Product Documentation
For additional information, see the following product documentation:
Other Documentation
For additional information, see the following document:
Palanivenkatesan Murugan, Senior Solution Engineer in the Product Enablement team of the Storage and Availability Business Unit.
Palani specializes in solution design and implementation for business-critical applications on VMware vSAN. He has more than 12
years of experience in enterprise-storage solution design and implementation for mission-critical workloads. Palani has worked
with large system and storage product organizations where he has delivered Storage Availability and Performance Assessments,
Complex Data Migrations across storage platforms, Proof of Concept, and Performance Benchmarking.
Sudhir Balasubramanian, Staff Solution Architect, works in the Global Field and Partner Readiness team. Sudhir specializes in the
virtualization of Oracle business-critical applications. Sudhir has more than 20 years’ experience in IT infrastructure and database,
working as the Principal Oracle DBA and Architect for large enterprises focusing on Oracle, EMC storage, and Unix/Linux
technologies. Sudhir holds a Master’s degree in Computer Science from San Diego State University. Sudhir is one of the authors of
the “Virtualize Oracle Business Critical Databases” book, which is a comprehensive authority for Oracle DBAs on the subject of
Oracle and Linux on vSphere. Sudhir is a VMware vExpert and Member of the CTO Ambassador Program.
VMware, Inc. 3401 Hillview Avenue Palo Alto CA 94304 USA Tel 877-486-9273 Fax
650-427-5001 www.vmware.com
Copyright © 2022 VMware, Inc. All rights reserved. This product is protected by U.S. and international
copyright and intellectual property laws. VMware products are covered by one or more patents listed
at http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc.
in the United States and/or other jurisdictions. All other marks and names mentioned herein may be
trademarks of their respective companies.