Managing Serviceguard - 12th Edition
Twelfth Edition
Contents
1. Serviceguard at a Glance
What is Serviceguard? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Using Serviceguard Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Monitoring with Serviceguard Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Administering with Serviceguard Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Configuring with Serviceguard Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Serviceguard Manager Help. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
How Serviceguard Manager Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
What are the Distributed Systems Administration Utilities?. . . . . . . . . . . . . . . . . . . . 32
A Roadmap for Configuring Clusters and Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Serviceguard Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Serviceguard Daemons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
How the Cluster Manager Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Configuration of the Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Heartbeat Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Manual Startup of Entire Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Automatic Cluster Startup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Dynamic Cluster Re-formation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Cluster Quorum to Prevent Split-Brain Syndrome . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Cluster Lock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Use of an LVM Lock Disk as the Cluster Lock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Use of the Quorum Server as the Cluster Lock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
No Cluster Lock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
How the Package Manager Works. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Package Types. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Using Older Package Configuration Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Using the Event Monitoring Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Using the EMS HA Monitors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Choosing Package Failover Behavior. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
How Package Control Scripts Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
What Makes a Package Run? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Before the Control Script Starts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
During Run Script Execution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Normal and Abnormal Exits from the Run Script . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Service Startup with cmrunserv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
While Services are Running . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
When a Service, Subnet, or Monitored Resource Fails, or a Dependency is Not Met 94
When a Package is Halted with a Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
During Halt Script Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Normal and Abnormal Exits from the Halt Script . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
How the Network Manager Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Stationary and Relocatable IP Addresses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Adding and Deleting Relocatable IP Addresses . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Monitoring LAN Interfaces and Detecting Failure . . . . . . . . . . . . . . . . . . . . . . . . . 103
Automatic Port Aggregation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
VLAN Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) . . . . . 240
Preparing the Cluster and the System Multi-node Package . . . . . . . . . . . . . . . . . . 240
Creating the Disk Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
Creating the Disk Group Cluster Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
Creating Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
Create a Filesystem and Mount Point Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
Creating Checkpoint and Snapshot Packages for CFS. . . . . . . . . . . . . . . . . . . . . . . 246
Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume
Manager (CVM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
Initializing the VERITAS Volume Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
Preparing the Cluster for Use with CVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
Starting the Cluster and Identifying the Master Node . . . . . . . . . . . . . . . . . . . . . . 254
Initializing Disks for CVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
Creating Disk Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
Creating Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
Creating File Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
Adding Disk Groups to the Package Configuration . . . . . . . . . . . . . . . . . . . . . . . . . 257
Using DSAU during Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
Synchronizing Your Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
Log Consolidation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
Using Command Fan-out . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
Monitoring your Serviceguard Cluster and Distributed Systems . . . . . . . . . . . . . . 260
Managing the Running Cluster. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
Checking Cluster Operation with Serviceguard Manager . . . . . . . . . . . . . . . . . . . . 261
Checking Cluster Operation with Serviceguard Commands . . . . . . . . . . . . . . . . . . 261
Preventing Automatic Activation of LVM Volume Groups . . . . . . . . . . . . . . . . . . . 263
Setting up Autostart Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
Changing the System Message . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Managing a Single-Node Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Deleting the Cluster Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
A. Serviceguard Commands
Printing History
The printing date changes when a new edition is printed. (Minor
corrections and updates which are incorporated at reprint do not cause
the date to change.) The part number is revised when extensive technical
changes are incorporated.
New editions of this manual will incorporate all material updated since
the previous edition.
HP Printing Division:
Infrastructure Solutions Division
Hewlett-Packard Co.
19111 Pruneridge Ave.
Cupertino, CA 95014
Preface
The twelfth printing of this manual is updated for Serviceguard Version
A.11.17.
This guide describes how to configure Serviceguard to run on HP 9000 or
HP Integrity servers under the HP-UX operating system. The contents
are as follows:
• Appendix C, “Designing Highly Available Cluster Applications,”
gives guidelines for creating cluster-aware applications that provide
optimal performance in a Serviceguard environment.
• Appendix D, “Integrating HA Applications with Serviceguard,”
presents suggestions for integrating your existing applications with
Serviceguard.
• Appendix E, “Rolling Software Upgrades,” shows how to move from
one Serviceguard or HP-UX release to another without bringing
down your applications.
• Appendix F, “Blank Planning Worksheets,” contains a set of empty
worksheets for preparing a Serviceguard configuration.
• Appendix G, “Migrating from LVM to VxVM Data Storage,” describes
how to convert from LVM data storage to VxVM data storage.
• Appendix H, “IPv6 Network Support,” describes the IPv6 addressing
scheme and the primary/standby interface configurations supported.
Related Publications: Use the following URL to access HP’s high availability web page:
http://www.hp.com/go/ha
Use the following URL for access to a wide variety of HP-UX
documentation:
http://docs.hp.com
The following documents contain information useful to Serviceguard
users:
• From http://www.docs.hp.com -> High Availability -> HP Serviceguard Storage Management Suite:
— Designing Disaster Tolerant High Availability Clusters December
2005
• From http://www.docs.hp.com -> High Availability -> HP Serviceguard Extension for Faster Failover:
Problem Reporting: If you have any problems with the software or documentation, please contact your local Hewlett-Packard Sales Office or Customer Service Center.
1 Serviceguard at a Glance
• What is Serviceguard?
• Using Serviceguard Manager
• A Roadmap for Configuring Clusters and Packages
If you are ready to start setting up Serviceguard clusters, skip ahead to
Chapter 4, “Planning and Documenting an HA Cluster,” on page 131.
Specific steps for setup are given in Chapter 5, “Building an HA Cluster
Configuration,” on page 187.
Figure 1-1 shows a typical Serviceguard cluster with two nodes.
What is Serviceguard?
Serviceguard allows you to create high availability clusters of HP 9000 or
HP Integrity servers. A high availability computer system allows
application services to continue in spite of a hardware or software
failure. Highly available systems protect users from software failures as
well as from failure of a system processing unit (SPU), disk, or local area
network (LAN) component. In the event that one component fails, the
redundant component takes over. Serviceguard and other high
availability subsystems coordinate the transfer between components.
A Serviceguard cluster is a networked grouping of HP 9000 or HP
Integrity servers (host systems known as nodes) having sufficient
redundancy of software and hardware that a single point of failure
will not significantly disrupt service.
A package groups application services (individual HP-UX processes)
together. There are failover packages, system multi-node packages, and
multi-node packages:
The typical high availability package is a failover package. It
usually is configured to run on several nodes in the cluster, and runs
on one at a time. If a service, node, network, or other package
resource fails on the node where it is running, Serviceguard can
automatically transfer control of the package to another cluster node,
allowing services to remain available with minimal interruption.
There are also packages that run on several cluster nodes at once,
and do not fail over. These are called system multi-node packages
and multi-node packages. At the release of Serviceguard A.11.17,
the only non-failover packages that are supported are the ones
specified by Hewlett-Packard, for example the packages HP supplies
for use with the VERITAS Cluster Volume Manager and the
VERITAS Cluster File System.
A system multi-node package must run on all nodes that are active
in the cluster. If it fails on one active node, that node halts. A
multi-node package can be configured to run on one or more cluster
nodes. It is considered UP as long as it is running on any of its
configured nodes.
Failover
Any host system running in a Serviceguard cluster is called an active
node. Under normal conditions, a fully operating Serviceguard cluster
monitors the health of the cluster's components on all its active nodes.
Most Serviceguard packages are failover packages. When you configure a
failover package, you specify which active node will be the primary
node where the package will start, and one or more other nodes, called
adoptive nodes, that can also run the package.
Figure 1-2 shows what happens in a failover situation.
Serviceguard; disk arrays, which use various RAID levels for data protection; and HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust, which eliminate failures related to power outages. These products are highly recommended along with Serviceguard to provide the greatest degree of availability.
What are the Distributed Systems Administration Utilities?
• Configuration synchronization
• Log consolidation
• Command fan-out
With configuration synchronization, you specify a specific server as your
configuration master; all your other systems are defined as clients. The
configuration master retains copies of all files that you want
synchronized across your clients. For example, synchronization actions
can include updating client files from the configuration master, executing
shell commands, or checking for certain processes.
With log consolidation, all systems you manage, whether in a cluster or not, can send their logs (system, package, and cluster) to a single location on the consolidation node, from which they are easily monitored. All systems must be in a cluster or connected by a network to use consolidated logging.
With command fan-out, you can send the same command to all systems
in your configuration in a single action.
For additional information on DSAU, refer to the Managing Systems and Workgroups manual, posted at http://docs.hp.com.
2 Understanding Serviceguard Hardware Configurations
Redundancy of Cluster Components
Note that a package that does not access data from a disk on a shared
bus can be configured to fail over to as many nodes as you have
configured in the cluster (regardless of disk technology). For instance, if
a package only runs local executables, it can be configured to fail over to
all nodes in the cluster that have local copies of those executables,
regardless of the type of disk connectivity.
Redundant Network Components
NOTE You should verify that network traffic is not too high on the heartbeat/data LAN. If traffic is too high, this LAN might not perform adequately in transmitting heartbeats if the dedicated heartbeat LAN fails.
The use of dual attach cards gives protection against failures in both
cables and connectors, but does not protect against card failures. LAN
card failure would result in a package switching to another node.
NOTE The use of a serial (RS232) heartbeat line is supported only in a two-node
cluster configuration. A serial heartbeat line is required in a two-node
cluster that has only one heartbeat LAN. If you have at least two
heartbeat LANs, or one heartbeat LAN and one standby LAN, a serial
(RS232) heartbeat should not be used.
If the heartbeat network card fails on one node, having a serial line
heartbeat keeps the cluster up just long enough to detect the LAN
controller card status and to fail the node with bad network connections
while the healthy node stays up and runs all the packages.
Even if you have a serial (RS232) line configured for redundant
heartbeat, one LAN is still required to carry a heartbeat signal. The
serial line heartbeat protects against network saturation but not against
network failure, since Serviceguard requires TCP/IP to communicate
between cluster members.
Serial (RS232) lines are inherently unreliable compared to network
cards which run the TCP/IP protocol, such as Ethernet or FDDI. Unlike
TCP/IP, the serial line protocol has no error correction or retry
mechanism. Serial lines can also be complex and difficult to configure
because of a lack of standards.
A serial (RS232) heartbeat line is shown in Figure 2-4.
Redundant Disk Storage
• Single-ended SCSI
• SCSI
• Fibre Channel
Not all SCSI disks are supported. See the HP Unix Servers Configuration
Guide (available through your HP representative) for a list of currently
supported disks.
NOTE In a cluster that contains systems with PCI SCSI adapters, you cannot
attach both PCI and NIO SCSI adapters to the same shared SCSI bus.
All SCSI addresses, including the addresses of all interface cards, must be unique for all devices on a shared bus. See the manual Configuring HP-UX for Peripherals for information on SCSI bus addressing and priority.
Data Protection
It is required that you provide data protection for your highly available
system, using one of two methods:
• Disk Mirroring
• Disk Arrays using RAID Levels and Multiple Data Paths
Disk Mirroring
Disk mirroring is one method for providing data protection. The logical
volumes used for Serviceguard packages should be mirrored.
Serviceguard does not provide protection for data on your disks. This
capability is provided for LVM storage with HP’s Mirrordisk/UX product,
and for VxVM and CVM with the VERITAS Volume Manager. When you
configure logical volumes using software mirroring, the members of each
mirrored set contain exactly the same data. If one disk should fail, the
storage manager will automatically keep the data available by accessing
the other mirror. Three-way mirroring in LVM (or additional plexes with
VxVM) may be used to allow for online backups or even to provide an
additional level of high availability.
To protect against Fibre Channel or SCSI bus failures, each copy of the
data must be accessed by a separate bus; that is, you cannot have all
copies of the data on disk drives connected to the same bus.
It is critical for high availability that you mirror both data and root
disks. If you do not mirror your data disks and there is a disk failure, you
will not be able to run your applications on any node in the cluster until
the disk has been replaced and the data reloaded. If the root disk fails,
you will be able to run your applications on other nodes in the cluster,
since the data is shared. However, system behavior at the time of a root
disk failure is unpredictable, and it is possible for an application to hang
while the system is still running, preventing it from being started on
another node until the failing node is halted. Mirroring the root disk can
allow the system to continue normal operation when a root disk failure
occurs, and help avoid this downtime.
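For example, with Mirrordisk/UX and LVM, a package’s logical volumes can be mirrored across buses when the volume group is built. The following is a minimal sketch only; the volume group name, device file names, and size are placeholders that will differ on your system:

# pvcreate -f /dev/rdsk/c1t2d0
# pvcreate -f /dev/rdsk/c2t2d0
# mkdir /dev/vgpkgA
# mknod /dev/vgpkgA/group c 64 0x010000
# vgcreate /dev/vgpkgA /dev/dsk/c1t2d0 /dev/dsk/c2t2d0
# lvcreate -L 1024 -m 1 -n lvol1 /dev/vgpkgA

The two physical volumes are on separate buses; the -m 1 option creates one mirror copy, and the default strict allocation policy keeps each copy on a different physical volume.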
Figure 2-6 below shows a similar cluster with a disk array connected to
each node on two I/O channels. This kind of configuration can use LVM’s
PV Links or other multipath software such as VERITAS Dynamic
Multipath (DMP) or EMC PowerPath.
Note that if both nodes had their primary root disks connected to the
same bus, you would have an unsupported configuration.
You can put a mirror copy of Node B's root disk on the same SCSI bus as
Node A's primary root disk, because three failures would have to occur
for both systems to boot at the same time, which is an acceptable risk. In
such a scenario, Node B would have to lose its primary root disk and be
rebooted, and Node A would have to be rebooted at the same time Node B
is, for the IODC firmware to run into a problem. This configuration is
shown in Figure 2-9.
Note that you cannot use a disk within a disk array as a root disk if the
array is on a shared bus.
Larger Clusters
You can create clusters of up to 16 nodes with Serviceguard. Clusters of up to 16 nodes may be built by connecting individual SPUs via Ethernet or FDDI networking.
The possibility of configuring a cluster consisting of 16 nodes does not
mean that all types of cluster configuration behave in the same way in a
16-node configuration. For example, in the case of shared SCSI buses,
the practical limit on the number of nodes that can be attached to the
same shared bus is four, because of bus loading and limits on cable
length. Even in this case, 16 nodes could be set up as an administrative
unit, and sub-groupings of four could be set up on different SCSI buses
which are attached to different mass storage devices.
In the case of non-shared SCSI connections to an XP series or EMC disk
array, the four-node limit does not apply. Each node can be connected
directly to the XP or EMC by means of two SCSI buses supporting PV
links. Packages can be configured to fail over among all sixteen nodes.
For more about this type of configuration, see “Point to Point
Connections to Storage Devices,” below.
NOTE When configuring larger clusters, be aware that cluster and package
configuration times as well as execution times for commands such as
cmviewcl will be extended. In the man pages for some commands, you
can find options to help to reduce the time. For example, refer to the man
page for cmquerycl for options that can reduce the amount of time
needed for probing disks or networks.
Active/Standby Model
You can also create clusters in which there is a standby node. For
example, an eight-node configuration in which one node acts as the
standby for the other seven could easily be set up by equipping the
backup node with seven shared buses allowing separate connections to
each of the active nodes. This configuration is shown in Figure 2-10.
3 Understanding Serviceguard Software Components
• Serviceguard Architecture
• How the Cluster Manager Works
• How the Package Manager Works
• How Package Control Scripts Work
• How the Network Manager Works
• Volume Managers for Data Storage
• Responses to Failures
If you are ready to start setting up Serviceguard clusters, skip ahead to
Chapter 5, “Building an HA Cluster Configuration,” on page 187.
Serviceguard Architecture
The following figure shows the main software components used by
Serviceguard. This chapter discusses these components in some detail.
Serviceguard Daemons
There are 12 daemon processes associated with Serviceguard. They are:
update the kernel timer, indicating a kernel hang. Before a TOC due to
the expiration of the safety timer, messages will be written to
/var/adm/syslog/syslog.log and the kernel’s message buffer.
The cmcld daemon also detects the health of the networks on the system
and performs local LAN failover. Finally, this daemon handles the
management of Serviceguard packages, determining where to run them
and when to start them.
to the object manager and receive responses from it. This daemon may
not be running on your system; it is used only by clients of the object
manager.
cmomd accepts connections from clients, and examines queries. The
queries are decomposed into categories (of classes) which are serviced by
various providers. The providers gather data from various sources,
including, commonly, the cmclconfd daemons on all connected nodes,
returning data to a central assimilation point where it is filtered to meet
the needs of the exact client query. This daemon is started by inetd(1M).
There are entries in the /etc/inetd.conf file.
CFS Components
The HP Serviceguard Storage Management Suite offers additional
components for interfacing with the VERITAS Cluster File System.
Documents for the management suite are posted on
http://docs.hp.com.
VERITAS CFS components operate directly over Ethernet networks that
connect the nodes within a cluster. Redundant networks are required to
avoid single points of failure.
The VERITAS CFS components are:
How the Cluster Manager Works
Heartbeat Messages
Central to the operation of the cluster manager is the sending and
receiving of heartbeat messages among the nodes in the cluster. Each
node in the cluster exchanges heartbeat messages with the cluster
coordinator over each monitored TCP/IP network or RS232 serial line
configured as a heartbeat device. (LAN monitoring is further discussed
later in the section “Monitoring LAN Interfaces and Detecting Failure”
on page 103.)
If a cluster node does not receive heartbeat messages from all other
cluster nodes within the prescribed time, a cluster re-formation is
initiated. At the end of the re-formation, if a new set of nodes form a
NOTE You cannot run heartbeat on a serial line by itself. See “Using a Serial
(RS232) Heartbeat Line” on page 41 for more about serial lines in
Serviceguard.
Each node sends its heartbeat message at a rate specified by the cluster
heartbeat interval. The cluster heartbeat interval is set in the cluster
configuration file, which you create as a part of cluster configuration,
described fully in Chapter 5, “Building an HA Cluster Configuration,” on
page 187.
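As an illustration only (the values shown are examples in microseconds, not recommendations), the relevant entries in the cluster configuration file look something like this:

HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 2000000

Here a heartbeat is sent every second, and a node that sends no heartbeat for two seconds is declared failed, triggering a cluster re-formation.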
Cluster Lock
Although a cluster quorum of more than 50% is generally required,
exactly 50% of the previously running nodes may re-form as a new
cluster provided that the other 50% of the previously running nodes do
not also re-form. This is guaranteed by the use of a tie-breaker to choose
between the two equal-sized node groups, allowing one group to form the
cluster and forcing the other group to shut down. This tie-breaker is
known as a cluster lock. The cluster lock is implemented either by
means of a lock disk or a quorum server.
The cluster lock is used as a tie-breaker only for situations in which a
running cluster fails and, as Serviceguard attempts to form a new
cluster, the cluster is split into two sub-clusters of equal size. Each
sub-cluster will attempt to acquire the cluster lock. The sub-cluster
which gets the cluster lock will form the new cluster, preventing the possibility of two sub-clusters running at the same time.
Lock Requirements
A one-node cluster does not require a cluster lock. A two-node cluster
requires a cluster lock. In clusters larger than 3 nodes, a cluster lock is
strongly recommended. If you have a cluster with more than four nodes,
use a quorum server; a cluster lock disk is not allowed for that size.
Serviceguard periodically checks the health of the lock disk and writes
messages to the syslog file when a lock disk fails the health check. This
file should be monitored for early detection of lock disk problems.
You can choose between two lock disk options—a single or dual lock
disk—based on the kind of high availability configuration you are
building. A single lock disk is recommended where possible. With both
single and dual locks, however, it is important that the cluster lock be
available even if the power circuit to one node fails; thus, the choice of a
lock configuration depends partly on the number of power circuits
available. Regardless of your choice, all nodes in the cluster must have
access to the cluster lock to maintain high availability.
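In the cluster configuration file, a single LVM lock disk is identified by a lock volume group for the cluster and a physical volume entry for each node. A sketch, with example volume group, node, and device file names:

FIRST_CLUSTER_LOCK_VG /dev/vglock

NODE_NAME ftsys9
  FIRST_CLUSTER_LOCK_PV /dev/dsk/c1t2d0

NODE_NAME ftsys10
  FIRST_CLUSTER_LOCK_PV /dev/dsk/c0t2d0

(The node entries normally also contain network interface definitions, omitted here; the device file for the lock physical volume can differ from node to node.)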
NOTE A dual lock disk does not provide a redundant cluster lock. In fact, the
dual lock is a compound lock. This means that two disks must be
available at cluster formation time rather than the one that is needed for
a single lock disk. Thus, the only recommended usage of the dual cluster
lock is when the single cluster lock cannot be isolated at the time of a
failure from exactly one half of the cluster nodes.
If one of the dual lock disks fails, Serviceguard will detect this when it
carries out periodic checking, and it will write a message to the syslog
file. After the loss of one of the lock disks, the failure of a cluster node
could cause the cluster to go down if the remaining surviving node(s)
cannot access the surviving cluster lock disk.
area in memory for each cluster, and when a node obtains the cluster
lock, this area is marked so that other nodes will recognize the lock as
“taken.” If communications are lost between two equal-sized groups of
nodes, the group that obtains the lock from the Quorum Server will take
over the cluster and the other nodes will perform a TOC. Without a
cluster lock, a failure of either group of nodes will cause the other group,
and therefore the cluster, to halt. Note also that if the quorum server is
not available during an attempt to access it, the cluster will halt.
The operation of the quorum server is shown in Figure 3-3. When there
is a loss of communication between node 1 and node 2, the quorum server
chooses one node (in this example, node 2) to continue running in the
cluster. The other node halts.
The quorum server runs on a separate system, and can provide quorum
services for multiple clusters.
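In the cluster configuration file, a quorum server is specified by its host name, optionally with a polling interval and a timeout extension. A sketch, with example values (intervals in microseconds):

QS_HOST qs_host
QS_POLLING_INTERVAL 300000000
QS_TIMEOUT_EXTENSION 2000000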
No Cluster Lock
Normally, you should not configure a cluster of three or fewer nodes
without a cluster lock. In two-node clusters, a cluster lock is required.
You may consider using no cluster lock with configurations of three or
more nodes, although the decision should be affected by the fact that any
cluster may require tie-breaking. For example, if one node in a three-node cluster is removed for maintenance, the cluster reforms as a two-node cluster. If a tie-breaking scenario later occurs due to a node or communication failure, the entire cluster will become unavailable.
How the Package Manager Works
• Executes the control scripts that run and halt packages and their
services.
• Reacts to changes in the status of monitored resources.
Package Types
Three different types of packages can run in the cluster: the most
common is the failover package. There are also special-purpose
packages that run on more than one node at a time, and so do not
fail over. They are typically used to manage resources of certain failover
packages.
Non-failover Packages
There are also two types of special-purpose packages that do not fail over
and that can run on more than one node at the same time: the system
multi-node package, which runs on all nodes in the cluster, and the
multi-node package, which can be configured to run on all or some of
the nodes in the cluster.
These packages are not for general use, and are only supported by
Hewlett-Packard for specific applications.
One common system multi-node package is shipped with the
Serviceguard product. It is used on systems that employ VERITAS
Cluster Volume Manager (CVM) as a storage manager. This package is
Failover Packages
A failover package starts up on an appropriate node when the cluster
starts. A package failover takes place when the package coordinator
initiates the start of a package on a new node. A package failover
involves both halting the existing package (in the case of a service,
network, or resource failure), and starting the new instance of the
package.
Failover is shown in the following figure:
Figure 3-6 shows the condition where Node 1 has failed and Package 1
has been transferred to Node 2. Package 1's IP address was transferred
to Node 2 along with the package. Package 1 continues to be available
and is now running on Node 2. Also note that Node 2 can now access both
Package1’s disk and Package2’s disk.
# Enter the failover policy for this package. This policy will be used
# to select an adoptive node whenever the package needs to be started.
# The default policy unless otherwise specified is CONFIGURED_NODE.
# This policy will select nodes in priority order from the list of
# NODE_NAME entries specified below.
#FAILOVER_POLICY CONFIGURED_NODE
If you use CONFIGURED_NODE as the value for the failover policy, the
package will start up on the highest priority node available in the node
list. When a failover occurs, the package will move to the next highest
priority node in the list that is available.
If you use MIN_PACKAGE_NODE as the value for the failover policy, the
package will start up on the node that is currently running the fewest
other packages. (Note that this does not mean the lightest load; the only
thing that is checked is the number of packages currently running on the
node.)
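For example, to have the package placed on the node currently running the fewest packages, you would uncomment and edit that line in the package ASCII file (sketch only):

FAILOVER_POLICY MIN_PACKAGE_NODE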
When the cluster starts, each package starts as shown in Figure 3-7.
If a failure occurs, any package would fail over to the node containing the
fewest running packages, as in Figure 3-8, which shows a failure on node
2:
If you use CONFIGURED_NODE as the value for the failover policy, the
package will start up on the highest priority node in the node list,
assuming that the node is running as a member of the cluster. When a
failover occurs, the package will move to the next highest priority node in
the list that is available.
#FAILBACK_POLICY MANUAL
Node1 panics, and after the cluster reforms, pkgA starts running on
node4:
After rebooting, node 1 rejoins the cluster. At that point, pkgA will be
automatically stopped on node 4 and restarted on node 1.
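This automatic failback behavior corresponds to the following setting in the package ASCII file, sketched here; the default, shown above, is MANUAL:

FAILBACK_POLICY AUTOMATIC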
How Package Control Scripts Work
NOTE If you configure the package while the cluster is running, the package
does not start up immediately after the cmapplyconf command
completes. To start the package without halting and restarting the
cluster, issue the cmrunpkg or cmmodpkg command.
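For example, with placeholder node and package names, you could start the package on a chosen node and then re-enable package switching for it:

# cmrunpkg -n ftsys9 pkg1
# cmmodpkg -e pkg1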
How does a failover package start up, and what is its behavior while it is
running? Some of the many phases of package life are shown in
Figure 3-13.
At any step along the way, an error will result in the script exiting
abnormally (with an exit code of 1). For example, if a package service is
unable to be started, the control script will exit with an error.
Also, if the run script execution is not complete before the time specified
in the RUN_SCRIPT_TIMEOUT, the package manager will kill the script.
During run script execution, messages are written to a log file in the
same directory as the run script. This log has the same name as the run
script and the extension .log. Normal starts are recorded in the log,
together with error messages or warnings related to starting the
package.
NOTE After the package run script has finished its work, it exits, which means
that the script is no longer executing once the package is running
normally. After the script exits, the PIDs of the services started by the
script are monitored by the package manager directly. If the service dies,
the package manager will then run the package halt script or, if
SERVICE_FAILFAST_ENABLED is set to YES, it will halt the node on which
the package is running. If a number of restarts is specified for a service
in the package control script, the service may be restarted if the restart
count allows it, without re-running the package run script.
NOTE If you set <n> restarts and also set SERVICE_FAILFAST_ENABLED to YES,
the failfast will take place after <n> restart attempts have failed. It does
not make sense to set SERVICE_RESTART to “-R” for a service and also set
SERVICE_FAILFAST_ENABLED to YES.
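In the package control script, restart behavior is set per service. A sketch, using a hypothetical service name and command:

SERVICE_NAME[0]="pkg1_service"
SERVICE_CMD[0]="/usr/local/bin/app_monitor"
SERVICE_RESTART[0]="-r 3"

Here "-r 3" allows up to three local restarts; an empty value means no restarts, and "-R" means unlimited restarts.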
Package halting normally means that the package halt script executes
(see the next section). However, if a failover package’s configuration has
the SERVICE_FAILFAST_ENABLED flag set to yes for the service that fails,
then the node will halt as soon as the failure is detected. If this flag is not
set, the loss of a service will result in halting the package gracefully by
running the halt script.
If AUTO_RUN is set to YES, the package will start up on another eligible
node, if it meets all the requirements for startup. If AUTO_RUN is set to NO,
then the package simply halts without starting up anywhere else.
At any step along the way, an error will result in the script exiting
abnormally (with an exit code of 1). Also, if the halt script execution is
not complete before the time specified in the HALT_SCRIPT_TIMEOUT, the
package manager will kill the script. During halt script execution,
messages are written to a log file in the same directory as the halt script.
This log has the same name as the halt script and the extension .log. Normal halts are recorded in the log, together with error messages or warnings related to halting the package.
Table 3-4 Error Conditions and Package Movement for Failover Packages

How the Network Manager Works
Types of IP Addresses
Both IPv4 and IPv6 address types are supported in Serviceguard. IPv4
addresses are the traditional addresses of the form “n.n.n.n” where ‘n’ is
a decimal digit between 0 and 255. IPv6 addresses have the form
“x:x:x:x:x:x:x:x” where ‘x’ is the hexadecimal value of each of eight 16-bit
pieces of the 128-bit address. Only IPv4 addresses are supported as
heartbeat addresses, but both IPv4 and IPv6 addresses (including
various combinations) can be defined as stationary IPs in a cluster. Both
IPv4 and IPv6 addresses also can be used as relocatable (package) IP
addresses.
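Relocatable addresses, and the subnets they belong to, are typically listed in the package control script. A sketch with example addresses:

IP[0]="192.10.25.12"
SUBNET[0]="192.10.25.0"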
Load Sharing
In one package, it is possible to have multiple services that are
associated with the same IP address. If one service is moved to a new
system, then the other services using the IP address will also be
migrated. Load sharing can be achieved by making each service its own
package and giving it a unique IP address. This gives the administrator
the ability to move selected services to less loaded systems.
— All bridged nets in the cluster should have more than two
interfaces each.
Local Switching
A local network switch involves the detection of a local network interface
failure and a failover to the local backup LAN card (also known as the
Standby LAN card). The backup LAN card must not have any IP
addresses configured.
In the case of local network switch, TCP/IP connections are not lost for
Ethernet, but IEEE 802.3 connections will be lost. For IPv4, Ethernet,
Token Ring and FDDI use the ARP protocol, and HP-UX sends out an
unsolicited ARP to notify remote systems of address mapping between
MAC (link level) addresses and IP level addresses. IEEE 802.3 does not
have the rearp function.
IPv6 uses the Neighbor Discovery Protocol (NDP) instead of ARP. The
NDP protocol is used by hosts and routers to do the following:
Local network switching will work with a cluster containing one or more
nodes. You may wish to design a single-node cluster in order to take
advantage of this local network switching feature in situations where
you need only one node and do not wish to set up a more complex cluster.
Remote Switching
A remote switch (that is, a package switch) involves moving packages
and their associated IP addresses to a new system. The new system must
already have the same subnetwork configured and working properly,
otherwise the packages will not be started. With remote switching, TCP
connections are lost. TCP applications must reconnect to regain
connectivity; this is not handled automatically. Note that if the package
is dependent on multiple subnetworks, all subnetworks must be
available on the target node before the package will be started.
Note that remote switching is supported only between LANs of the same
type. For example, a remote switchover between Ethernet on one
machine and FDDI interfaces on the failover machine is not supported.
The remote switching of relocatable IP addresses was shown previously
in Figure 3-5 and Figure 3-6.
VLAN Configurations
Virtual LAN configuration using HP-UX VLAN software is now
supported in Serviceguard clusters. VLAN is also supported on
the dual-stack kernel.
What is VLAN?
Virtual LAN (or VLAN) is a networking technology that allows grouping
of network nodes based on an association rule regardless of their
physical locations. Specifically, VLAN can be used to divide a physical
LAN into multiple logical LAN segments or broadcast domains. Each
interface in a logical LAN will be assigned a tag id at the time it is
configured. VLAN interfaces, which share the same tag id, can
communicate to each other as if they were on the same physical network.
The advantages of creating virtual LANs are to reduce broadcast traffic,
increase network performance and security, and improve manageability.
On HP-UX, initial VLAN association rules are port-based, IP-based, and
protocol-based. Multiple VLAN interfaces can be configured from a
physical LAN interface and then appear to applications as regular
network interfaces. IP addresses can then be assigned on these VLAN
interfaces to form their own subnets. Please refer to the document Using
HP-UX VLAN (T1453-90001) for more details on how to configure VLAN
interfaces.
Configuration Restrictions
HP-UX allows up to 1024 VLANs to be created from a physical NIC port.
Obviously, a large pool of system resources is required to accommodate
such a configuration. With the availability of VLAN technology,
Serviceguard may face potential performance degradation, high CPU
utilization and memory shortage issues if there is a large number of
network interfaces configured in each cluster node. To provide enough
flexibility in VLAN networking, Serviceguard solutions should adhere to
the following VLAN and general network configuration requirements:
Volume Managers for Data Storage
Figure 3-22 shows the mirrors configured into LVM volume groups,
shown in the figure as /dev/vgpkgA and /dev/vgpkgB. The volume
groups are activated by Serviceguard packages for use by highly
available applications.
NOTE LUN definition is normally done using utility programs provided by the
disk array manufacturer. Since arrays vary considerably, you should
refer to the documentation that accompanies your storage unit.
Finally, the multiple paths are configured into volume groups as shown
in Figure 3-25.
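With LVM, for example, an alternate link is created simply by adding the second device file for the same LUN to the volume group (assuming the volume group’s device files already exist). A sketch with placeholder device files:

# pvcreate -f /dev/rdsk/c1t2d0
# vgcreate /dev/vgdatabase /dev/dsk/c1t2d0
# vgextend /dev/vgdatabase /dev/dsk/c2t2d0

The first path named becomes the primary link, and the second path to the same LUN becomes the alternate link.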
Volume Managers for Data Storage
• will run applications that require fast disk group activation after
package failover.
• require activation on more than one node at a time, for example to
perform a backup from one node while a package using the volume is
active on another node. In this case, the package using the disk
group would have the disk group active in exclusive write mode while
the node that is doing the backup would have the disk group active in
shared read mode.
• require activation on more than one node at the same time, for
example Oracle RAC.
Heartbeat configuration differs between CVM 3.5 and 4.1. See “Redundant Heartbeat Subnet Required” on page 121.
CVM is supported on 8 nodes or fewer at this release. Shared storage
devices must be connected to all nodes in the cluster, whether or not the
node accesses data on the device.
Shared Logical Volume Manager (SLVM)
  Advantages:
  • Provided free with SGeRAC for multi-node access to RAC data
  • Supports up to 16 nodes in shared read/write mode for each cluster
  • Supports exclusive activation
  • Supports multiple heartbeat subnets
  • Online node configuration with activated shared volume groups (using specific SLVM kernel and Serviceguard revisions)
  Tradeoffs:
  • Lacks the flexibility and extended features of some other volume managers
  • Limited mirroring support

Base-VxVM
  Advantages:
  • Software is free with HP-UX 11i and later
  • Java-based administration through graphical user interface
  • Striping (RAID-0) support
  • Concatenation
  • Online resizing of volumes
  • Supports multiple heartbeat subnets
  • Supports up to 16 nodes
  Tradeoffs:
  • Cannot be used for a cluster lock
  • Using the disk as a root/boot disk is only supported for VxVM 3.5 or later, when installed on HP-UX 11.11
  • Supports only exclusive read or write activation
  • Package delays are possible, due to lengthy vxdg import at the time the package is started or failed over
Responses to Failures
Serviceguard responds to different kinds of failures in specific ways. For
most hardware failures, the response is not user-configurable, but for
package and service failures, you can choose the system’s response,
within limits.
Service Restarts
You can allow a service to restart locally following a failure. To do this,
you indicate a number of restarts for each service in the package control
script. When a service starts, the variable RESTART_COUNT is set in the
service's environment. The service, as it executes, can examine this
variable to see whether it has been restarted after a failure, and if so, it
can take appropriate action such as cleanup.
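For example, a service command could check this variable and clean up leftover state from a previous failed run before starting the application. A minimal sketch; the lock file and application paths are hypothetical:

#!/usr/bin/sh
# RESTART_COUNT is set by Serviceguard in the service's environment.
if [ "${RESTART_COUNT:-0}" -gt 0 ]
then
    # This is a restart after a failure: remove stale state first.
    rm -f /var/opt/myapp/myapp.lock
fi
exec /opt/myapp/bin/myapp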
4 Planning and Documenting an HA Cluster
• General Planning
• Hardware Planning
• Power Supply Planning
• Quorum Server Planning
• LVM Planning
NOTE Planning and installation overlap considerably, so you may not be able to
complete the worksheets before you proceed to the actual configuration.
In cases where the worksheet is incomplete, fill in the missing elements
to document the system as you proceed with the configuration.
General Planning
A clear understanding of your high availability objectives will quickly
help you to define your hardware requirements and design your system.
Use the following questions as a guide for general planning:
Hardware Planning
Hardware planning requires examining the physical hardware itself.
One useful procedure is to sketch the hardware configuration in a
diagram that shows adapter cards and buses, cabling, disks and
peripherals. A sample diagram for a two-node cluster is shown in
Figure 4-1.
Create a similar sketch for your own cluster, and record the information
on the Hardware Worksheet. Indicate which device adapters occupy
which slots, and determine the bus address for each adapter. Update the
details as you do the cluster configuration (described in Chapter 5). Use
one form for each SPU. The form has three parts:
• SPU Information
• Network Information
• Disk I/O Information
SPU Information
SPU information includes the basic characteristics of the systems you
are using in the cluster. Different models of computers can be mixed in
the same cluster. This configuration model also applies to HP Integrity
servers. HP-UX workstations are not supported for Serviceguard.
On one worksheet per node, include the following items:
Server Number
Enter the series number; for example, rp8400 or
rx8620-32.
Host Name
Enter the name to be used on the system as the host
name.
Memory Capacity
Enter the memory in MB.
Number of I/O slots
Indicate the number of slots.
Network Information
Serviceguard monitors LAN interfaces as well as serial lines (RS232)
configured to carry cluster heartbeat only.
LAN Information
While a minimum of one LAN interface per subnet is required, at least
two LAN interfaces, one primary and one or more standby, are needed to
eliminate single points of network failure.
It is recommended that you configure heartbeats on all subnets,
including those to be used for client data. On the worksheet, enter the
following for each LAN interface:
Subnet Name
Enter the IP address mask for the subnet. Note that
heartbeat IP addresses must be on the same subnet on
each node.
Interface Name
Enter the name of the LAN card as used by this node to
access the subnet. This name is shown by lanscan
after you install the card.
IP Address
Enter this node’s host IP address(es), to be used on this
interface. If the interface is a standby and does not
have an IP address, enter 'Standby.'
An IPv4 address is a string of four decimal numbers
separated by dots, in this form:
nnn.nnn.nnn.nnn
An IPv6 address is a string of eight 16-bit hexadecimal
fields separated by colons, in this form:
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx
For more details of the IPv6 address format, see the
appendix “IPv6 Address Types” on page 468.
NETWORK_FAILURE_DETECTION
When there is a primary and a standby network card,
Serviceguard needs to determine when a card has
failed, so it knows whether to fail traffic over to the
other card. To detect failures, Serviceguard’s Network
Manager monitors both inbound and outbound traffic.
The Network Manager will mark the card DOWN and
• Heartbeat
• Client Traffic
• Standby
Label the list to show the subnets that belong to a bridged net.
Information from this section of the worksheet is used in creating the
subnet groupings and identifying the IP addresses in the configuration
steps for the cluster manager and package manager.
RS232 Information
If you plan to configure a serial line (RS232), you need to determine the
serial device file that corresponds with the serial port on each node.
1. If you are using a MUX panel, make a note of the system slot number
that corresponds to the MUX and also note the port number that
appears next to the selected port on the panel.
2. On each node, use ioscan -fnC tty to display hardware addresses
and device file names. For example:
# ioscan -fnC tty
This lists all the device files associated with each RS232 device on a
specific node.
3. Connect the appropriate RS-232 cable between the desired ports on
the two nodes.
4. Once you have identified the device files, verify your connection as
follows. Assume that node 1 uses /dev/tty0p0, and node 2 uses
/dev/tty0p6.
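One generic way to check the link is sketched below; the worksheet's own procedure may differ, and this assumes no getty process is running on either port:
a. On node 2, enter: cat < /dev/tty0p6
b. On node 1, enter: cat /etc/hosts > /dev/tty0p0
The contents of /etc/hosts should appear in the output of the cat command on node 2. Interrupt both commands when the check is complete.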
Device                 SCSI Address
Primary System A       7
Primary System B       6
Primary System C       5
Primary System D       4
Disk #1                3
Disk #2                2
Disk #3                1
Disk #4                0
Disk #5                15
Disk #6                14
Others                 13 - 8
• diskinfo
• ioscan -fnC disk
• lssf /dev/*dsk/c*
• bdf
• mount
• swapinfo
• vgdisplay -v
• lvdisplay -v
• lvlnboot -v
• vxdg list (VxVM and CVM)
• vxprint (VxVM and CVM)
These are standard HP-UX commands. See their man pages for
information on specific usage. The commands should be issued from all
nodes after installing the hardware and rebooting the system. The
information will be useful when doing storage group and cluster
configuration. A printed listing of the output from the lssf command
can be marked up to indicate which physical volume group a disk should
be assigned to.
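For example, you might capture the key listings on each node into files that can be attached to the worksheet (a sketch only; the output file names are arbitrary):
# ioscan -fnC disk > /tmp/`hostname`.ioscan
# lssf /dev/*dsk/c* > /tmp/`hostname`.lssf
# vgdisplay -v > /tmp/`hostname`.vgdisplay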
=============================================================================
LAN Information:
=============================================================================
Network Failure Detection: ____INOUT___
=============================================================================
=============================================================================
Disk I/O Information for Shared Disks:
Bus Type _SCSI_ Slot Number _4__ Address _16_ Disk Device File __c0t1d0_
Bus Type _SCSI_ Slot Number _6_ Address _24_ Disk Device File __c0t2d0_
Bus Type ______ Slot Number ___ Address ____ Disk Device File _________
Attach a printout of the output from the ioscan -fnC disk command
after installing disk hardware and rebooting the system. Mark this
printout to indicate which physical volume group each disk belongs to.
Power Supply Planning
==========================================================================
Disk Power:
==========================================================================
Tape Backup Power:
==========================================================================
Other Power:
Quorum Server Planning
NOTE It is recommended that the node on which the quorum server is running
be in the same subnet as the clusters for which it is providing services.
This will help prevent any network delays which could affect quorum
server operation. If you use a different subnet, you may experience
network delays which may cause quorum server timeouts. To prevent
these timeouts, you can use the QS_TIMEOUT_EXTENSION parameter in
the cluster ASCII file to increase the quorum server timeout interval.
If the network used to connect to the quorum server is a cluster
heartbeat network, ensure that at least one other network is also a
heartbeat network so that both are not likely to fail at the same time.
For more information, see the Quorum Server Release Notes and white
papers at: http://www.docs.hp.com -> High Availability.
Use the Quorum Server Worksheet to identify a quorum server for use
with one or more clusters. You should also enter quorum server host and
timing parameters on the Cluster Configuration Worksheet.
On the QS worksheet, enter the following:
Quorum Server Host
Enter the host name for the quorum server.
IP Address Enter the IP address by which the quorum server will
be accessed. The quorum server has to be configured on
an IPv4 network. IPv6 addresses are not supported.
Supported Node Names
LVM Planning
You can create storage groups using the HP-UX Logical Volume
Manager (LVM), or using VERITAS VxVM and CVM software, which are
described in the next section.
When designing your disk layout using LVM, you should consider the
following:
LVM Worksheet
The following worksheet will help you organize and record your specific
physical disk configuration. This worksheet is an example; blank
worksheets are in Appendix F, “Blank Planning Worksheets,” on
page 443. Make as many copies as you need. Fill out the worksheet and
keep it for future reference.
This worksheet only includes volume groups and physical volumes. The
Package Configuration worksheet (presented later in this chapter)
contains more space for recording information about the logical volumes
and file systems that are part of each volume group.
CVM and VxVM Planning
• You must create a rootdg disk group on each cluster node that will be
using VxVM storage. This is not the same as the HP-UX root disk, if
an LVM volume group is used. The VxVM root disk group can only be
imported on the node where it is created. This disk group is created
only once on each cluster node.
NOTE: The VxVM rootdg is only required for VxVM 3.5 and earlier; it
is not required for VxVM 4.1.
• CVM disk groups are created after the cluster is configured, whereas
VxVM disk groups may be created before cluster configuration if
desired.
• High availability applications, services, and data should be placed in
separate disk groups from non-high availability applications,
services, and data.
• You must not group two different high availability applications,
services, or data, whose control needs to be transferred
independently, onto the same disk group.
• Your HP-UX root disk can belong to an LVM or VxVM volume group
(starting from VxVM 3.5) that is not shared among cluster nodes.
• The cluster lock disk can only be configured with an LVM volume
group.
• VxVM disk group names should not be entered into the cluster
configuration ASCII file. These names are not inserted into the
cluster configuration ASCII file by cmquerycl.
Cluster Configuration Planning
• The length of the cluster heartbeat interval and node timeout. They
should each be set as short as practical, but not shorter than
1000000 (one second) and 2000000 (two seconds), respectively. The
recommended value for heartbeat interval is 1000000 (one second),
and the recommended value for node timeout is within the 5 to 8
second range (5000000 to 8000000).
• The design of the run and halt instructions in the package control
script. They should be written for fast execution.
• The availability of raw disk access. Applications that use raw disk
access should be designed with crash recovery services.
• The application and database recovery time. They should be
designed for the shortest recovery time.
In addition, you must provide consistency across the cluster so that:
STATIONARY_IP
The IP address of each monitored subnet that does not
carry the cluster heartbeat. You can identify any
number of subnets to be monitored. If you want to
separate application data from heartbeat messages,
define a monitored non-heartbeat subnet here.
A stationary IP address can be either an IPv4 or an
IPv6 address. For more details of the IPv6 address format,
see the appendix “IPv6 Address Types” on page 468.
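For example, a node entry in the cluster ASCII file with one heartbeat subnet and one monitored non-heartbeat subnet might look like the following sketch; the interface names and addresses are placeholders:
NODE_NAME ftsys9
  NETWORK_INTERFACE lan0
    HEARTBEAT_IP 15.13.168.91
  NETWORK_INTERFACE lan1
    STATIONARY_IP 192.6.143.10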
FIRST_CLUSTER_LOCK_PV, SECOND_CLUSTER_LOCK_PV
The name of the physical volume within the Lock
Volume Group that will have the cluster lock written
on it. This parameter is FIRST_CLUSTER_LOCK_PV for
the first physical lock volume and
SECOND_CLUSTER_LOCK_PV for the second physical lock
volume. If there is a second physical lock volume, the
parameter SECOND_CLUSTER_LOCK_PV is included in
the file on a separate line. These parameters are only
used when you employ a lock disk for tie-breaking
services in the cluster.
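As a sketch (the volume group and device file names are placeholders), the lock disk parameters appear once for the cluster and then within each node's entry:
FIRST_CLUSTER_LOCK_VG /dev/vglock
NODE_NAME ftsys9
  NETWORK_INTERFACE lan0
    HEARTBEAT_IP 15.13.168.91
  FIRST_CLUSTER_LOCK_PV /dev/dsk/c1t2d0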
MAX_CONFIGURED_PACKAGES
This parameter sets the maximum number of packages
that can be configured in the cluster.
The minimum value is 0, and the maximum value is
150. The default value for Serviceguard A.11.17 is 150,
and you can change it without halting the cluster.
VOLUME_GROUP
The name of an LVM volume group whose disks are
attached to at least two nodes in the cluster. Such disks
are considered cluster aware. In the ASCII cluster
configuration file, this parameter is VOLUME_GROUP.
The volume group name can have up to 39 characters.
Access Control Policies
Specify three things for each policy: USER_NAME,
USER_HOST, and USER_ROLE. For Serviceguard
Manager, USER_HOST must be the name of the
Session node. Policies set in the configuration file of a
cluster and its packages must not be conflicting or
redundant. For more information, see “Editing Security
Files” on page 190.
FAILOVER_OPTIMIZATION
You will only see this parameter if you have installed
Serviceguard Extension for Faster Failover, a
separately purchased product. You enable the product
by setting this parameter to TWO_NODE. Default is
disabled, set to NONE. For more information about the
product and its cluster configuration requirements, go
to http://www.docs.hp.com/ -> high
availability and click Serviceguard Extension for
Faster Failover.
NETWORK_FAILURE_DETECTION
When there is a primary and a standby network card,
Serviceguard needs to determine when a card has
failed, so it knows whether to fail traffic over to the
other card. To detect failures, Serviceguard’s Network
Manager monitors both inbound and outbound traffic.
The Network Manager will mark the card DOWN and
Package Configuration Planning
NOTE LVM Volume groups that are to be activated by packages must also be
defined as cluster aware in the cluster configuration file. See the
previous section on “Cluster Configuration Planning.” VERITAS disk
groups that are to be activated by packages must be defined in the
package configuration ASCII file, described below.
Create an entry for each logical volume, indicating its use for a file
system or for a raw device.
NOTE Do not use /etc/fstab to mount file systems that are used by
Serviceguard packages.
CVM 4.1 without CFS VERITAS Cluster Volume Manager 4.1 uses
the system multi-node package SG-CFS-pkg to manage the cluster’s
volumes.
CVM 4.1 and the SG-CFS-pkg allow you to configure multiple heartbeat
networks. Using APA, Infiniband, or VLAN interfaces as the heartbeat
network is not supported.
CVM 4.1 with CFS CFS (VERITAS Cluster File System) is supported
for use with VERITAS Cluster Volume Manager Version 4.1.
The system multi-node package SG-CFS-pkg manages the cluster’s
volumes. Two sets of multi-node packages are also used: the CFS mount
packages, SG-CFS-MP-id# , and the CFS disk group packages,
SG-CFS-DG-id#. Create the multi-node packages with the cfs family of
commands; do not edit the ASCII file.
CVM 4.1 and the SG-CFS-pkg allow you to configure multiple heartbeat
networks. Using APA or Infiniband as the heartbeat network is not
supported.
You create a chain of dependencies between the application failover
package and the non-failover packages:
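As a hedged sketch of one link in such a chain, assuming a CFS mount point package named SG-CFS-MP-1, the application package's ASCII file might contain a dependency triplet like this (the dependency name is a placeholder):
DEPENDENCY_NAME mp1_dep
DEPENDENCY_CONDITION SG-CFS-MP-1=UP
DEPENDENCY_LOCATION SAME_NODE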
CAUTION Once you create the disk group and mount point packages, it is
critical that you administer the cluster with the cfs commands,
including cfsdgadm, cfsmntadm, cfsmount, and cfsumount. If you
use the general commands such as mount and umount, it could cause
serious problems such as writing to the local file system instead of
the cluster file system.
Any form of the mount command (for example, mount -o cluster,
dbed_chkptmount, or sfrac_chkptmount) other than cfsmount or
cfsumount in an HP Serviceguard Storage Management Suite
environment with CFS should be done with caution. These non-cfs
commands could cause conflicts with subsequent command
operations on the file system or Serviceguard packages. Use of these
other forms of mount will not create an appropriate multi-node
package, which means that the cluster packages are not aware of the
file system changes.
NOTE The disk group and mount point multi-node packages
(SG-CFS-DG-id# and SG-CFS-MP-id#) do not monitor the health of
the disk group and mount point. They check that the application
packages that depend on them have access to the disk groups and
mount points. If the dependent application package loses access and
cannot read and write to the disk, it will fail; however that will not
cause the DG or MP multi-node package to fail.
RESOURCE_NAME /net/interfaces/lan/status/lan0
RESOURCE_POLLING_INTERVAL 60
RESOURCE_START DEFERRED
RESOURCE_UP_VALUE = UP
RESOURCE_NAME /net/interfaces/lan/status/lan1
RESOURCE_POLLING_INTERVAL 60
RESOURCE_START DEFERRED
RESOURCE_UP_VALUE = UP
RESOURCE_NAME /net/interfaces/lan/status/lan2
RESOURCE_POLLING_INTERVAL 60
RESOURCE_START AUTOMATIC
RESOURCE_UP_VALUE = UP
In the package control script, specify only the deferred resources, using
the DEFERRED_RESOURCE_NAME parameter:
DEFERRED_RESOURCE_NAME[0]="/net/interfaces/lan/status/lan0"
DEFERRED_RESOURCE_NAME[1]="/net/interfaces/lan/status/lan1"
Run script timeout and Halt script timeout If the script has not
completed by the specified timeout value, Serviceguard
will terminate the script. In the ASCII configuration
file, these parameters are RUN_SCRIPT_TIMEOUT and
HALT_SCRIPT_TIMEOUT. Enter a value in seconds.
The default is 0, or no timeout. The minimum is 10
seconds, but the minimum HALT_SCRIPT_TIMEOUT
value must be greater than the sum of all the Service
Halt Timeout values. The absolute maximum value is
restricted only by the HP-UX parameter ULONG_MAX,
for an absolute limit of 4,294 seconds.
If the timeout is exceeded:
CVM diskgroups This parameter is used for CVM disk groups that do
not use VERITAS Cluster File System. Enter the
names of all the CVM disk groups the package will use.
In the ASCII package configuration file, this parameter
is called STORAGE_GROUP.
NOTE Use the parameter for CVM storage only. Do not enter
CVM disk groups that are used in a CFS cluster.
Additionally, do not enter the names of LVM volume
groups or VxVM disk groups in the package ASCII
configuration file.
Service name Enter a unique name for each service. In the ASCII
package configuration file, this parameter is called
SERVICE_NAME.
Define one SERVICE_NAME entry for each service. You
can configure a maximum of 30 services per package
and 900 services per cluster. The service name must
not contain any of the following illegal characters:
space, slash (/), backslash (\), and asterisk (*). All other
characters are legal. The service name can contain up
to 39 characters.
Service fail fast Enter Enabled or Disabled for each service. This
parameter indicates whether or not the failure of a
service results in the failure of a node. If the parameter
is set to Enabled, in the event of a service failure,
Serviceguard will halt the node on which the service is
running with a TOC. (An attempt is first made to
reboot the node prior to the TOC.) The default is
Disabled.
In the ASCII package configuration file, this parameter
is SERVICE_FAIL_FAST_ENABLED, and possible values
are YES and NO. The default is NO. Define one
SERVICE_FAIL_FAST_ENABLED entry for each service.
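For example, a single service entry in the package ASCII file might look like the following sketch; the service name and halt timeout are placeholders:
SERVICE_NAME sales_db_svc
SERVICE_FAIL_FAST_ENABLED NO
SERVICE_HALT_TIMEOUT 300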
VXVOL
Controls the method of mirror recovery for mirrored
VxVM volumes.
Use the default VXVOL=“vxvol -g \$DiskGroup
startall” if you want the package control script to
wait until recovery has been completed.
Use VXVOL=“vxvol -g \$DiskGroup -o bg
startall” if you want the mirror resynchronization to
occur in parallel with the package startup.
Volume Groups
This array parameter contains a list of the LVM
volume groups that will be activated by the package.
Enter each VG on a separate line.
CVM Disk Groups
This array parameter contains a list of the VERITAS
CVM disk groups that will be used by the package.
Enter each disk group on a separate line. Begin the list
with CVM_DG[0], and increment the list in sequence. Do
not use CVM and VxVM disk group parameters to
reference devices used by CFS (VERITAS Cluster File
System). CFS resources are controlled by the
multi-node packages, SG-CFS-DG-id# and
SG-CFS-MP-id#.
VxVM Disk Groups
This array parameter contains a list of the VERITAS
VxVM disk groups that will be activated by the
package. Enter each disk group on a separate line.
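In the package control script these parameters are shell arrays; a brief sketch with placeholder names (use only the arrays that apply to your volume manager) is shown below:
VG[0]="vg01"        # LVM volume group activated by the package
CVM_DG[0]="dg01"    # CVM disk group (non-CFS)
VXVM_DG[0]="dg02"   # VxVM disk group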
Logical Volumes, File Systems and Mount Options
VG[0]_______________VG[1]________________VG[2]________________
VGCHANGE: ______________________________________________
CVM_DG[0]___/dev/vx/dg01____CVM_DG[1]_____________CVM_DG[2]_______________
CVM_ACTIVATION_CMD: ______________________________________________
VXVM_DG[0]___/dev/vx/dg01____VXVM_DG[1]____________VXVM_DG[2]_____________
================================================================================
Logical Volumes and File Systems:
LV[0]___/dev/vg01/lvol1____FS[0]____/mnt1___________FS_MOUNT_OPT[0]_________
LV[1]______________________FS[1]____________________FS_MOUNT_OPT[1]_________
LV[2]______________________FS[2]____________________FS_MOUNT_OPT[2]_________
===============================================================================
Network Information:
IP[0] ____15.13.171.14____ SUBNET ____15.13.168___________
5 Building an HA Cluster Configuration
This chapter and the next take you through the configuration tasks
required to set up a Serviceguard cluster. These procedures are carried
out on one node, called the configuration node, and the resulting
binary file is distributed by Serviceguard to all the nodes in the cluster.
In the examples in this chapter, the configuration node is named ftsys9,
and the sample target node is called ftsys10. This chapter describes the
following cluster configuration tasks:
Preparing Your Systems
SGCONF=/etc/cmcluster
SGSBIN=/usr/sbin
SGLBIN=/usr/lbin
SGLIB=/usr/lib
SGCMOM=/opt/cmom
SGRUN=/var/adm/cmcluster
SGAUTOSTART=/etc/rc.config.d/cmcluster
SGCMOMLOG=/var/adm/syslog/cmom
NOTE If these variables are not defined on your system, then source the file
/etc/cmcluster.conf in your login profile for user root. For example,
you can add this line to root's .profile file:
. /etc/cmcluster.conf
Throughout this book, system filenames are usually given with one of
these location prefixes. Thus, references to $SGCONF/<FileName> can be
resolved by supplying the definition of the prefix that is found in this file.
For example, if SGCONF is defined as /etc/cmcluster/conf, then the
complete pathname for file $SGCONF/cmclconfig would be
/etc/cmcluster/conf/cmclconfig.
IP Address Resolution
Access control policies for Serviceguard are name-based. IP addresses for
incoming connections must be resolved into hostnames to match against
access control policies.
Communication between two Serviceguard nodes could be received over
any of their shared networks. Therefore, all of their primary addresses
on each of those networks need to be identified.
Serviceguard supports the use of aliases. An IP address may resolve to
multiple hostnames; one of these should match the name defined in the
policy.
NOTE If you use a fully qualified domain name (FQDN), Serviceguard will only
recognize the hostname portion. For example, two nodes
gryf.uksr.hp.com and gryf.cup.hp.com could not be in the same
cluster, as they would both be treated as the same host gryf.
Username Validation
Serviceguard relies on the ident service of the client node to verify the
username of the incoming network connection. If the Serviceguard
daemon is unable to connect to the client's ident daemon, permission will
be denied.
Root on a node is defined as any user who has the UID of 0. For a user to
be identified as root on a remote system, the “root” user entry in
/etc/passwd for the local system must come before any other user who
may also be UID 0. The ident daemon will return the username for the
first UID match. For Serviceguard to consider a remote user as a root
user on that remote node, the ident service must return the username as
“root”.
It is possible to configure Serviceguard not to use the ident service;
however, this configuration is not recommended. Consult the white paper
“Securing Serviceguard” for more information.
To disable the use of identd, add the -i option to the tcp hacl-cfg and
hacl-probe inetd configurations.
For example, on HP-UX with Serviceguard A.11.17:
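The following is only an illustrative sketch; the daemon paths and the existing arguments shown here are assumptions, and only the trailing -i is the point of the example. After editing /etc/inetd.conf, have inetd reread its configuration with inetd -c.
hacl-cfg   stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c -i
hacl-probe stream tcp nowait root /opt/cmom/lbin/cmomd cmomd -f /var/opt/cmom/cmomd.log -i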
Access Roles
Serviceguard has two levels of access, root and non-root:
• Root Access: Users who have been authorized for root access have
total control over the configuration of the cluster and packages.
• Non-root Access: Non-root users can be assigned one of four roles:
— Monitor: These users have read-only access to the cluster and its
packages. On the command line, these users can issue these
commands: cmviewcl, cmquerycl, cmgetconf, and cmviewconf.
Serviceguard Manager users can see status and configuration
information on the map, tree and properties.
— (single-package) Package Admin: Applies only to a specific
package. (This is the only access role defined in the package
configuration file; the others are defined in the cluster
configuration.) On the command line, these users can issue the
commands for the specified package: cmrunpkg, cmhaltpkg, and
cmmodpkg. Serviceguard Manager users can see these Admin
menu options for the specific package: Run Package, Halt
Package, Move Package, and Enable or Disable Switching.
Package Admins cannot configure or create packages. Package
Admin includes the privileges of the Monitor role.
— (all-packages) Package Admin: Applies to all packages in the
cluster and so is defined in the cluster configuration. The
commands are the same as the role above. Package Admin
includes the privileges of the Monitor role.
— Full Admin: These users can administer the cluster. On the
command line, these users can issue these commands in their
cluster: cmruncl, cmhaltcl, cmrunnode, and cmhaltnode. Full
Admins cannot configure or create a cluster. In the Serviceguard
Manager, they can see the Admin menu for their cluster and any
packages in their cluster. Full Admin includes the privileges of
the Package Admin role.
If you upgrade a cluster to Version A.11.16 or later, the cmclnodelist
entries are automatically updated into Access Control Policies in the
cluster configuration file. All non-root user-hostname pairs will be given
the role of Monitor (view only).
In this example, root on the nodes gryf, sly, and bit all have root access
to the node with this file. The non-root user user1 has the Monitor role
from nodes gryf and sly. Once the access policy is configured, user1 can
be given the role of Monitor or one of the administration roles.
Serviceguard also accepts the use of a “+” in the cmclnodelist file, which
indicates that any root user on any node may configure this node and that
any non-root user has the Monitor role.
NOTE Root access cannot be given to root users on nodes outside the cluster.
Access control policies for a configured cluster are defined in the ASCII
cluster configuration file. Access control policies for a specific package
are defined in the package configuration file. Any combination of hosts
and users may be assigned roles for the cluster. You can have up to 200
access policies defined for a cluster.
Access policies are defined by three parameters in the configuration file:
— MONITOR
— FULL_ADMIN
— PACKAGE_ADMIN
MONITOR and FULL_ADMIN can only be set in the cluster configuration
file and they apply to the entire cluster. PACKAGE_ADMIN can be set in
the cluster or a package configuration file. If it is set in the cluster
configuration file, PACKAGE_ADMIN applies to all configured packages.
If it is set in a package configuration file, PACKAGE_ADMIN applies to
that package only.
NOTE You do not have to halt the cluster or package to configure or modify
access control policies.
# Policy 2:
USER_NAME john
USER_HOST bit
USER_ROLE MONITOR
# Policy 3:
USER_NAME ANY_USER
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE MONITOR
In the above example, the configuration would fail because user john is
assigned two roles. Policy 2 is redundant because PACKAGE_ADMIN already
includes the role of MONITOR.
Policy 3 does not conflict with any other policies, even though the
wildcard ANY_USER includes the individual user john.
Plan the cluster’s roles and validate them as soon as possible. Depending
on the organization’s security policy, it may be easiest to create group
logins. For example, you could create a MONITOR role for user operator1
from ANY_SERVICEGUARD_NODE. Then you could give this login name and
password to everyone who will need to monitor your clusters.
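Such a policy would look like this in the cluster configuration file (the user name is a placeholder):
USER_NAME operator1
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE MONITOR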
Use caution when defining access to ANY_SERVICEGUARD_NODE. This will
allow access from any node on the subnet.
To avoid this problem, you can use the /etc/hosts file on all cluster
nodes in addition to DNS or NIS. It is also recommended to make DNS
highly available either by using multiple DNS servers or by configuring
DNS into a Serviceguard package.
1. Edit the /etc/hosts file on all nodes in the cluster. Add name
resolution for all heartbeat IP addresses, and other IP addresses
from all the cluster nodes. Example:
15.13.172.231 hasupt01
192.2.1.1 hasupt01
192.2.8.1 hasupt01
15.13.172.232 hasupt02
192.2.1.2 hasupt02
192.2.8.2 hasupt02
15.13.172.233 hasupt03
192.2.1.3 hasupt03
192.2.8.3 hasupt03
NOTE For each cluster node, the public network IP address must be the
first address listed. This enables other applications to talk to other
nodes on public networks.
2. Edit or create the /etc/nsswitch.conf file on all nodes and add the
following text (on one line), if it does not already exist:
hosts: files [NOTFOUND=continue UNAVAIL=continue] dns
[NOTFOUND=return UNAVAIL=return]
If a line beginning with the string “hosts:” already exists, then
make sure that the text immediately to the right of this string is (on
one line):
files [NOTFOUND=continue UNAVAIL=continue] dns
[NOTFOUND=return UNAVAIL=return]
This step is critical so that the nodes in the cluster can still resolve
hostnames to IP addresses while DNS is down or if the primary LAN
is down.
3. If no cluster exists on a node, create and edit an /etc/cmclnodelist
file on all nodes and add access to all cluster node primary IP
addresses and node names:
15.13.172.231 hasupt01
15.13.172.232 hasupt02
15.13.172.233 hasupt03
NOTE The boot, root, and swap logical volumes must be created in exactly the
following order to ensure that the boot volume occupies the first
contiguous set of extents on the new disk, followed by the swap and
the root.
/dev/dsk/c4t6d0
Root: lvol3 on: /dev/dsk/c4t5d0
/dev/dsk/c4t6d0
Swap: lvol2 on: /dev/dsk/c4t5d0
/dev/dsk/c4t6d0
Dump: lvol2 on: /dev/dsk/c4t6d0, 0
NOTE You must use the vgcfgbackup and vgcfgrestore commands to back up
and restore the lock volume group configuration data regardless of how
you create the lock volume group.
• ndd is the network tuning utility. For more information, see the man
page for ndd(1M).
• kmtune is the system tuning utility. For more information, see the
man page for kmtune(1M).
Serviceguard has also been tested with non-default values for these two
network parameters:
Setting up the Quorum Server
NOTE It is recommended that the node on which the quorum server is running
be in the same subnet as the clusters for which it is providing services.
This will help prevent any network delays which could affect quorum
server operation. If you use a different subnet, you may experience
network delays which may cause quorum server timeouts. To prevent
these timeouts, you can use the QS_TIMEOUT_EXTENSION parameter in
the cluster ASCII file to increase the quorum server timeout interval.
If the network used to connect to the quorum server is a cluster
heartbeat network, ensure that at least one other network is also a
heartbeat network so that both quorum server and heartbeat
communication are not likely to fail at the same time.
To allow access by all nodes, enter the plus character (+) on its own line.
Installing Serviceguard
Installing Serviceguard includes updating the software via Software
Distributor. It is assumed that you have already installed HP-UX.
Use the following steps for each node:
Creating the Storage Infrastructure and Filesystems with LVM and VxVM
Selecting Disks for the Volume Group
Obtain a list of the disks on both nodes and identify which device files are
used for the same disk on both. Use the following command on each node to
list available disks as they are known to each system:
# lssf /dev/dsk/*
The major number is always 64, and the hexadecimal minor number has the
form
0xhh0000
where hh must be unique to the volume group you are creating. Use a unique
minor number that is available across all the nodes for the mknod command
above. (This will avoid further reconfiguration later, when NFS-mounted
logical volumes are created in the VG.)
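For example (a sketch; the volume group name and minor number are placeholders), the group file for a new volume group is created like this:
# mkdir /dev/vgdatabase
# mknod /dev/vgdatabase/group c 64 0x030000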
Use the following command to display a list of existing volume groups:
# ls -l /dev/*/group
3. Create the volume group and add physical volumes to it with the following
commands:
# vgcreate -g bus0 /dev/vgdatabase /dev/dsk/c1t2d0
# vgextend -g bus1 /dev/vgdatabase /dev/dsk/c0t2d0
The first command creates the volume group and adds a physical volume to
it in a physical volume group called bus0. The second command adds the
second drive to the volume group, locating it in a different physical volume
group named bus1. The use of physical volume groups allows the use of
PVG-strict mirroring of disks and PV links.
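For example (a sketch; the logical volume name and size are placeholders), a PVG-strict mirrored logical volume could be created with:
# lvcreate -m 1 -s g -n lvol1 -L 120 /dev/vgdatabase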
NOTE If you are using disk arrays in RAID 1 or RAID 5 mode, omit the -m 1
and -s g options.
Note the mount command uses the block device file for the logical
volume.
4. Verify the configuration:
# vgdisplay -v /dev/vgdatabase
Assume that the disk array has been configured, and that both the
following device files appear for the same LUN (logical disk) when you
run the ioscan command:
/dev/dsk/c0t15d0
/dev/dsk/c1t3d0
Use the following steps to configure a volume group for this logical disk:
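As a sketch of those steps (the volume group name and the group file minor number are placeholders), using the second device file as an alternate link:
# pvcreate -f /dev/rdsk/c0t15d0
# mkdir /dev/vgdatabase
# mknod /dev/vgdatabase/group c 64 0x040000
# vgcreate /dev/vgdatabase /dev/dsk/c0t15d0
# vgextend /dev/vgdatabase /dev/dsk/c1t3d0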
You can now use the vgdisplay -v command to see the primary and
alternate links. LVM will now recognize the I/O channel represented by
/dev/dsk/c0t15d0 as the primary link to the disk; if the primary link fails,
LVM will automatically switch to the alternate I/O channel represented
by /dev/dsk/c1t3d0.
To create logical volumes, use the procedure described in the previous
section, “Creating Logical Volumes.”
Deactivating the Volume Group
At the time you create the volume group, it is active on the configuration
node (ftsys9, for example). Before setting up the volume group for use on
other nodes, you must first unmount any filesystems that reside on the
volume group, then deactivate it. At run time, volume group activation and
filesystem mounting are done through the package control script.
Continuing with the example presented in earlier sections, do the
following on ftsys9:
# umount /mnt1
# vgchange -a n /dev/vgdatabase
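Before the import described below, the map file is typically generated on ftsys9 and copied to the other node; a sketch is shown here (the -p option previews the export without removing the volume group from ftsys9):
# vgexport -p -s -m /tmp/vgdatabase.map /dev/vgdatabase
# rcp /tmp/vgdatabase.map ftsys10:/tmp/vgdatabase.map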
5. Import the volume group data using the map file from node ftsys9.
On node ftsys10, enter:
# vgimport -s -m /tmp/vgdatabase.map /dev/vgdatabase
Note that the disk device names on ftsys10 may be different from
their names on ftsys9. You should check to ensure that the physical
volume names are correct throughout the cluster.
Once the volume group can be activated on this node, perform a
vgcfgbackup, so that a vgcfgrestore can be performed on this node in
the unlikely event of a disaster on the primary node and an LVM
problem with the volume group. Do this as shown in the example below:
# vgchange -a y /dev/vgdatabase
# vgcfgbackup /dev/vgdatabase
# vgchange -a n /dev/vgdatabase
NOTE When you use PVG-strict mirroring, the physical volume group
configuration is recorded in the /etc/lvmpvg file on the
configuration node. This file defines the physical volume groups
which are the basis of mirroring and indicates which physical volumes
belong to each PVG. Note that on each cluster node, the
/etc/lvmpvg file must contain the correct physical volume names for
the PVG’s disks as they are known on that node. Physical volume
names for the same disks may not be the same on different nodes.
After distributing volume groups to other nodes, you must ensure
that each node’s /etc/lvmpvg file correctly reflects the contents of
all physical volume groups on that node. Refer to the following
section, “Making Physical Volume Group Files Consistent.”
7. Make sure that you have deactivated the volume group on ftsys9.
Then enable the volume group on ftsys10:
# vgchange -a y /dev/vgdatabase
IMPORTANT The rootdg for the VERITAS Cluster Volume Manager 3.5 is not the
same as the HP-UX root disk if an LVM volume group is used for the
HP-UX root disk filesystem. Note also that rootdg cannot be used for
shared storage. However, rootdg can be used for other local filesystems
(e.g., /export/home), so it need not be wasted. (CVM 4.1 does not have
this restriction.)
Note that you should create a rootdg disk group only once on each node.
NOTE These commands make the disk and its data unusable by LVM, and
allow it to be initialized by VxVM. (The commands should only be used if
you have previously used the disk with LVM and do not want to save the
data on it.)
You can remove LVM header data from the disk as in the following
example (note that all data on the disk will be erased):
# pvremove /dev/rdsk/c0t3d2
Then, use the vxdiskadm program to initialize multiple disks for VxVM,
or use the vxdisksetup command to initialize one disk at a time, as in
the following example:
# /usr/lib/vxvm/bin/vxdisksetup -i c0t3d2
Creating Volumes
Use the vxassist command to create logical volumes. The following is
an example:
# vxassist -g logdata make log_files 1024m
This command creates a 1024 MB volume named log_files in a disk
group named logdata. The volume can be referenced with the block
device file /dev/vx/dsk/logdata/log_files or the raw (character)
device file /dev/vx/rdsk/logdata/log_files. Verify the configuration
with the following command:
# vxprint -g logdata
The output of this command is shown in the following example:
TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0
NOTE The specific commands for creating mirrored and multi-path storage
using VxVM are described in the VERITAS Volume Manager Reference
Guide.
NOTE Unlike LVM volume groups, VxVM disk groups are not entered in the
cluster ASCII configuration file, and they are not entered in the package
ASCII configuration file.
Note that the clearimport is done for disks previously imported with
noautoimport set on any system that has Serviceguard installed,
whether it is configured in a cluster or not.
Configuring the Cluster
CLUSTER_NAME cluster1
#
# The default quorum server timeout is calculated from the
# Serviceguard cluster parameters, including NODE_TIMEOUT and
# HEARTBEAT_INTERVAL. If you are experiencing quorum server
# timeouts, you can adjust these parameters, or you can include
# the QS_TIMEOUT_EXTENSION parameter.
#
# The value of QS_TIMEOUT_EXTENSION will directly affect the amount
# of time it takes for cluster reformation in the event of failure.
# For example, if QS_TIMEOUT_EXTENSION is set to 10 seconds, the cluster
# reformation will take 10 seconds longer than if the QS_TIMEOUT_EXTENSION
# was set to 0. This delay applies even if there is no delay in
# contacting the Quorum Server. The recommended value for
# QS_TIMEOUT_EXTENSION is 0, which is used as the default
# and the maximum supported value is 300000000 (5 minutes).
#
# For example, to configure a quorum server running on node
# "qshost" with 120 seconds for the QS_POLLING_INTERVAL and to
# add 2 seconds to the system assigned value for the quorum server
# timeout, enter:
#
# QS_HOST qshost
# QS_POLLING_INTERVAL 120000000
# QS_TIMEOUT_EXTENSION 2000000
QS_HOST sysman5
QS_POLLING_INTERVAL 300000000
# Definition of nodes in the cluster.
# Repeat node definitions as necessary for additional nodes.
# NODE_NAME is the specified nodename in the cluster.
# It must match the hostname and both cannot contain full domain name.
# Each NETWORK_INTERFACE, if configured with IPv4 address,
# must have ONLY one IPv4 address entry with it which could
# be either HEARTBEAT_IP or STATIONARY_IP.
# Each NETWORK_INTERFACE, if configured with IPv6 address(es)
# can have multiple IPv6 address entries(up to a maximum of 2,
# only one IPv6 address entry belonging to site-local scope
# and only one belonging to global scope) which must be all
# STATIONARY_IP. They cannot be HEARTBEAT_IP.
NODE_NAME fresno
NETWORK_INTERFACE lan0
HEARTBEAT_IP 15.13.168.91
# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0
NODE_NAME lodi
NETWORK_INTERFACE lan0
HEARTBEAT_IP 15.13.168.94
# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0
HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 2000000
FAILOVER_OPTIMIZATION NONE
AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000
# in the cluster
# * FULL_ADMIN: MONITOR and PACKAGE_ADMIN plus the administrative
# commands for the cluster.
#
# Access control policy does not set a role for configuration
# capability. To configure, a user must log on to one of the
# cluster’s nodes as root (UID=0). Access control
# policy cannot limit root users’ access.
#
# MONITOR and FULL_ADMIN can only be set in the cluster configuration file,
# and they apply to the entire cluster. PACKAGE_ADMIN can be set in the
# cluster or a package configuration file. If set in the cluster
# configuration file, PACKAGE_ADMIN applies to all configured packages.
# If set in a package configuration file, PACKAGE_ADMIN applies to that
# package only.
#
# Conflicting or redundant policies will cause an error while applying
# the configuration, and stop the process. The maximum number of access
# policies that can be configured in the cluster is 200.
#
#
# Example: to configure a role for user john from node noir to
# administer a cluster and all its packages, enter:
# USER_NAME john
# USER_HOST noir
# USER_ROLE FULL_ADMIN
USER_NAME root
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE full_admin
# List of cluster aware LVM Volume Groups. These volume groups will
# be used by package applications via the vgchange -a e command.
# Neither CVM or VxVM Disk Groups should be used here.
# For example:
# VOLUME_GROUP /dev/vgdatabase
# VOLUME_GROUP /dev/vg02
# OPS_VOLUME_GROUP /dev/vgdatabase
# OPS_VOLUME_GROUP /dev/vg02
The man page for the cmquerycl command lists the definitions of all the
parameters that appear in this file. Many are also described in the
“Planning” chapter. Modify your /etc/cmcluster/clust1.config file
to your requirements, using the data on the cluster worksheet.
In the file, keywords are separated from definitions by white space.
Comments are permitted, and must be preceded by a pound sign (#) in
the far left column. See the man page for the cmquerycl command for
more details.
clear the cluster ID from the volume group. After you are done, do not
forget to run vgchange -c y <vg name> to re-write the cluster ID back
to the volume group.
NOTE You should not configure a second lock volume group or physical volume
unless your configuration specifically requires it. See the discussion
“Dual Cluster Lock” in the section “Cluster Lock” in Chapter 3.
#
# For example, to configure a quorum server running on node
# "qshost" with 120 seconds for the QS_POLLING_INTERVAL and to
# add 2 seconds to the system assigned value for the quorum server
# timeout, enter:
#
# QS_HOST qshost
# QS_POLLING_INTERVAL 120000000
# QS_TIMEOUT_EXTENSION 2000000
NOTE If you are using Version 3.5 VERITAS CVM disk groups, you can
configure only a single heartbeat subnet, which should be a dedicated
subnet. Each system on this subnet must have standby LANs configured,
to ensure that there is a highly available heartbeat path. (Version 4.1
configurations can have multiple heartbeats.)
Optimization
Serviceguard Extension for Faster Failover (SGeFF) is a separately
purchased product. If it is installed, the configuration file will display the
parameter to enable it.
SGeFF reduces the time it takes Serviceguard to process a failover. It
cannot, however, change the time it takes for packages and applications
to gracefully shut down and restart.
SGeFF has requirements for cluster configuration, as outlined in the
cluster configuration template file.
For more information, see the Serviceguard Extension for Faster
Failover Release Notes posted on http://www.docs.hp.com/hpux/ha.
NOTE If you are using CVM disk groups, they should be configured after cluster
configuration is done, using the procedures described in “Creating the
Storage Infrastructure and Filesystems with VERITAS Cluster Volume
Manager (CVM)” on page 251. VERITAS disk groups are added to the
package configuration file, as described in Chapter 6.
If you have edited an ASCII cluster configuration file using the command
line, use the following command to verify the content of the file:
# cmcheckconf -k -v -C /etc/cmcluster/clust1.config
Both methods check the following:
NOTE Using the -k option means that cmcheckconf only checks disk
connectivity to the LVM disks that are identified in the ASCII file.
Omitting the -k option (the default behavior) means that cmcheckconf
tests the connectivity of all LVM disks on all nodes. Using -k can result
in significantly faster operation of the command.
• Activate the cluster lock volume group so that the lock disk can be
initialized:
# vgchange -a y /dev/vglock
• Generate the binary configuration file and distribute it:
# cmapplyconf -k -v -C /etc/cmcluster/clust1.config
or
# cmapplyconf -k -v -C /etc/cmcluster/clust1.ascii
NOTE Using the -k option means that cmapplyconf only checks disk
connectivity to the LVM disks that are identified in the ASCII file.
Omitting the -k option (the default behavior) means that
cmapplyconf tests the connectivity of all LVM disks on all nodes.
Using -k can result in significantly faster operation of the command.
NOTE The apply will not complete unless the cluster lock volume group is
activated on exactly one node before applying. There is one exception to
this rule: if a cluster lock was previously configured on the same
physical volume and volume group.
After the configuration is applied, the cluster lock volume group must be
deactivated.
NOTE You must use the vgcfgbackup command to store a copy of the cluster
lock disk's configuration data whether you created the volume group
using SAM or using HP-UX commands.
If the cluster lock disk ever needs to be replaced while the cluster is
running, you must use the vgcfgrestore command to restore lock
information to the replacement disk. Failure to do this might result in a
failure of the entire cluster if all redundant copies of the lock disk have
failed and if replacement mechanisms or LUNs have not had the lock
configuration restored. (If the cluster lock disk is configured in a disk
array, RAID protection provides a redundant copy of the cluster lock
data. Mirrordisk/UX does not mirror cluster lock information.)
Creating a Storage Infrastructure with VERITAS Cluster File System (CFS)
3. If you have not initialized your disk groups, or if you have an old
install that needs to be re-initialized, use the vxinstall command
to initialize VxVM/CVM disk groups. See “Initializing the VERITAS
Volume Manager” on page 252.
4. The VERITAS cluster volumes are managed by a
Serviceguard-supplied system multi-node package which runs on
all nodes at once, and cannot failover. In CVM 4.1, which is required
for the Cluster File System, Serviceguard supplies the SG-CFS-pkg
template. (In CVM 3.5, Serviceguard supplies the VxVM-CVM-pkg
template.)
The CVM 4.1 package has the following responsibilities:
# cfscluster status
Node : ftsys9
Cluster Manager : up
CVM state : up (MASTER)
MOUNT POINT TYPE SHARED VOLUME DISK GROUP STATUS
Node : ftsys10
Cluster Manager : up
CVM state : up
MOUNT POINT TYPE SHARED VOLUME DISK GROUP STATUS
NOTE Because the CVM 4.1 system multi-node package automatically starts
up the VERITAS processes, do not edit these files:
/etc/llthosts
/etc/llttab
/etc/gabtab
NOTE If you want to create a cluster with CVM only - without CFS, stop here.
Then, in your application package’s configuration file, add the
dependency triplet, with DEPENDENCY_CONDITION set to
SG-CFS-DG-id#=UP and LOCATION set to SAME_NODE. For more
information about the DEPENDENCY parameter, see “Package
Configuration File Parameters” on page 170.
5. To view the package name that is monitoring a disk group, use the
cfsdgadm show_package command:
# cfsdgadm show_package logdata
sg_cfs_dg-1
Creating Volumes
1. Make log_files volume on the logdata disk group:
# vxassist -g logdata make log_files 1024m
You do not need to create the directory. The command creates one on
each of the nodes, during the mount.
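A hedged sketch of the mount point package creation and mount that this refers to; check cfsmntadm(1M) on your release for the exact syntax:
# cfsmntadm add logdata log_files /tmp/logdata/log_files all=rw
# cfsmount /tmp/logdata/log_files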
CAUTION Once you create the disk group and mount point packages, it is
critical that you administer the cluster with the cfs commands,
including cfsdgadm, cfsmntadm, cfsmount, and cfsumount. If you use
general commands such as mount and umount instead, it could cause
serious problems such as writing to the local file system instead of
the cluster file system.
NOTE The disk group and mount point multi-node
packages do not monitor the health of the disk group and mount
point. They check that the packages that depend on them have access
to the disk groups and mount points. If the dependent application
package loses access and cannot read and write to the disk, it will
fail; however that will not cause the DG or MP multi-node package to
fail.
MULTI_NODE_PACKAGES
PACKAGE STATUS STATE AUTO_RUN SYSTEM
SG-CFS-pkg up running enabled yes
SG-CFS-DG-1 up running enabled no
SG-CFS-MP-1 up running enabled no
# ftsys9/etc/cmcluster/cfs> bdf
Filesystem kbytes used avail %used Mounted on
/dev/vx/dsk/logdata/log_files 1048576 17338 966793 2% /tmp/logdata/log_files
# ftsys10/etc/cmcluster/cfs> bdf
Filesystem kbytes used avail %used Mounted on
/dev/vx/dsk/logdata/log_files 1048576 17338 966793 2% /tmp/logdata/log_files
7. To view the package name that is monitoring a mount point, use the
cfsmntadm show_package command:
# cfsmntadm show_package /tmp/logdata/log_files
SG-CFS-MP-1
8. After creating your mount point packages for the cluster file system,
you can configure your application package to depend on the mount
points. In the ASCII configuration file, specify the dependency
triplet, specifying DEPENDENCY_CONDITION SG-mp-pkg-#=UP
and DEPENDENCY_LOCATION SAME_NODE. For more
information about the DEPENDENCY parameter, see “Package
Configuration File Parameters” on page 170.
NOTE Unlike LVM volume groups, CVM disk groups are not entered in the
cluster configuration file; they are entered in the package
configuration file only.
# cfsmount /tmp/check_logfiles
3. Verify.
# cmviewcl
CLUSTER STATUS
cfs-cluster up
MULTI_NODE_PACKAGES
# bdf
Filesystem kbytes used avail %used Mounted on
/dev/vg00/lvol3 544768 352233 180547 66% /
/dev/vg00/lvol1 307157 80196 196245 29% /stand
/dev/vg00/lvol5 1101824 678426 397916 63% /var
/dev/vg00/lvol7 2621440 1702848 861206 66% /usr
/dev/vg00/lvol4 4096 707 3235 18% /tmp
/dev/vg00/lvol6 2367488 1718101 608857 74% /opt
/dev/vghome/varopt 4194304 258609 3689741 7% /var/opt
/dev/vghome/home 2097152 17167 1949993 1% /home
/dev/vx/dsk/logdata/log_files
102400 1765 94353 2% /tmp/logdata/log_files
/dev/vx/dsk/dg1/vol1
102400 1765 94346 2% /local/snap1
Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM)
• In CVM 3.5, you must create a disk group known as rootdg that
contains at least one disk. From the main menu, choose the “Custom”
option, and specify the disk you wish to include in rootdg.
IMPORTANT The rootdg in version 3.5 of VERITAS Volume Manager is not the
same as the HP-UX root disk if an LVM volume group is used for the
HP-UX root filesystem (/). Note also that rootdg cannot be used for
shared storage. However, rootdg can be used for other local
filesystems (e.g., /export/home), so it need not be wasted.
Note that you should create a root disk group only once on each node.
• CVM 4.1 does not require that you create the special VERITAS
rootdg disk.
In Serviceguard Manager, these clusterwide packages are not shown on
the map and tree like failover packages. Their properties appear on a
special tab in the cluster properties, and their admin menu is
available when you select their cluster.
• With CVM 3.5, when you prepare the cluster for CVM disk group
configuration, you can configure only one heartbeat subnet in the cluster.
• With CVM 4.1, the cluster can have multiple heartbeats.
Neither version can use Auto Port Aggregation, Infiniband, or VLAN
interfaces as a heartbeat subnet.
The VERITAS cluster volumes are managed by a Serviceguard-supplied
system multi-node package which runs on all nodes at once, and
cannot failover. In CVM 3.5, Serviceguard creates the VxVM-CVM-pkg. In
CVM 4.1, Serviceguard creates the SG-CFS-pkg.
The SG-CFS-pkg package has the same responsibilities described in the
CFS section above: it maintains the VERITAS configuration files and
starts the required VERITAS processes on each node.
1. Use the vxdg command to create disk groups. Use the -s option to
specify shared mode, as in the following example:
# vxdg -s init logdata c0t3d2
2. Verify the configuration with the following command:
# vxdg list
NAME STATE ID
Creating Volumes
Use the vxassist command to create volumes, as in the following
example:
# vxassist -g logdata make log_files 1024m
This command creates a 1024 MB volume named log_files in a disk
group named logdata. The volume can be referenced with the block
device file /dev/vx/dsk/logdata/log_files or the raw (character)
device file /dev/vx/rdsk/logdata/log_files.
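If the volume will hold a file system, you can create it now; for example, a VxFS file system on the volume created above:
# newfs -F vxfs /dev/vx/rdsk/logdata/log_files
The file system is then mounted by the control script of the package that activates the disk group.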
NOTE The specific commands for creating mirrored and multi-path storage
using CVM are described in the HP-UX documentation for the VERITAS
Volume Manager, posted at http://docs.hp.com.
NOTE Unlike LVM volume groups, CVM disk groups are not entered in the
cluster configuration file; they are entered in the package
configuration file only.
Using DSAU during Configuration
DSAU provides three main capabilities:
• Configuration synchronization
• Log consolidation
• Command fan-out
With configuration synchronization, you can, for example:
• Ensure that you have the same reference configuration files as the
configuration master (for example, package scripts, /etc/hosts and
so on)
• Update a client file with information from the configuration master
• Verify file permissions
• Verify file ownership
• Edit files
• Execute shell commands
• Disable use of a specific file
• Signal processes
• Check for processes
• Clean up directories
• Check for symbolic links in system files
• Maintain symbolic links
Template configuration-description files are provided with DSAU that
include examples of the common files that need to be synchronized in a
Serviceguard cluster.
For additional information on DSAU, refer to the Managing Systems
and Workgroups manual posted at http://docs.hp.com.
Log Consolidation
The Distributed Systems Administration Utilities (DSAU) let you
configure a highly available log consolidation package. Using the log
consolidator, the data from each member-specific
/var/adm/syslog/syslog.log file can be combined into a single file, similar
to standard syslog forwarding. (Note that the DSAU consolidated log
uses the same format as syslog.log: date/time, hostname, message.)
In addition, member-specific package logs can be consolidated in a single
package-specific consolidated log. This helps the administrator easily
monitor, analyze and troubleshoot cluster members and packages.
The log-consolidation tools also offer increased logging reliability using
TCP as well as UDP transports, and sophisticated log-filtering features.
For additional information on DSAU, refer to the Managing Systems
and Workgroups manual posted at http://docs.hp.com.
The DSAU utilities can use remsh or ssh as a transport. When using ssh,
the DSAU csshsetup utility makes it easy to distribute ssh keys across all
members of the cluster.
DSAU also includes output filtering tools that help to consolidate the
command output returned from each member and easily see places
where members are returning identical information. DSAU additionally
provides utilities for commands that are frequently used cluster-wide.
These include:
• Configuration synchronization
• Log consolidation
With these tools, you can easily see what is happening throughout your
Serviceguard cluster.
For additional information on DSAU, refer to the Managing Systems
and Workgroups manual posted at http://docs.hp.com.
Managing the Running Cluster
• You can see status information on the map. Failover packages are
shown on the map and tree, connected to the node where they are
running. Symbols for multi-node and system multi-node packages
are not shown on the map or tree; instead, their properties are in the
cluster properties, and their command menu is available when the
cluster symbol is selected.
• Configuration and status information is available from the property
sheets for cluster, nodes, and packages.
When you create or modify a package or cluster configuration, you can
start it running and then save the session as a Serviceguard Manager
(.sgm) file. This way, you can archive the package or cluster’s
configuration and healthy running behavior. The data in this file can be
compared with later versions of the cluster to understand the changes
that are made over time. It will be particularly useful in troubleshooting
to compare this file to a problem cluster.
In Serviceguard Manager, you can also use administration commands if
the Session Server node and the target node both have Serviceguard
version A.11.12 or later installed.
1. If the cluster is not already online, start it. From the Serviceguard
Manager menu, choose Run Cluster. From the command line, use
cmruncl -v.
By default, cmruncl will check the networks: Serviceguard compares
the actual network configuration with the network information in
the cluster configuration. If you do not need this validation, use
cmruncl -v -w none instead, to turn off validation and save time.
2. When the cluster has started, make sure that cluster components are
operating correctly. In Serviceguard Manager, open the cluster on
the map or tree, and perhaps check its Properties. On the command
line, use the cmviewcl -v command.
Make sure that all nodes and networks are functioning as expected.
For more information, refer to Chapter 7, “Cluster and Package
Maintenance,” on page 309.
3. Verify that nodes leave and enter the cluster as expected using the
following steps:
NOTE The root volume group does not need to be included in the
custom_vg_activation function, since it is automatically activated
before the /etc/lvmrc file is used at boot time.
When a node attempts to join or form a cluster (for example, at boot
time with automatic cluster startup enabled), one of three conditions
applies:
• The cluster is not running on any node, all cluster nodes must be
reachable, and all must be attempting to start up. In this case, the
node attempts to form a cluster consisting of all configured nodes.
• The cluster is already running on at least one node. In this case, the
node attempts to join that cluster.
• Neither is true: the cluster is not running on any node, and not all
the nodes are reachable and trying to start. In this case, the node will
attempt to start for the AUTO_START_TIMEOUT period. If neither of
these things becomes true in that time, startup will fail.
To enable automatic cluster start, set the flag AUTOSTART_CMCLD to 1 in
the /etc/rc.config.d/cmcluster file on each node in the cluster; the
nodes will then join the cluster at boot time.
Here is an example of the /etc/rc.config.d/cmcluster file:
#************************ CMCLUSTER ************************
# Highly Available Cluster configuration
#
# @(#) $Revision: 72.2 $
#
# AUTOSTART_CMCLD: If set to 1, the node will attempt to
# join it's CM cluster automatically when
# the system boots.
# If set to 0, the node will not attempt
# to join it's CM cluster.
#
AUTOSTART_CMCLD=1
The /etc/issue and /etc/motd files may be customized to include
cluster-related information. You might wish to include a list of all
cluster nodes in this message, together with additional
cluster-specific information.
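For example, you might append something like the following; the cluster and node names are illustrative:
# cat >> /etc/motd <<EOF
This node is a member of Serviceguard cluster cfs-cluster.
Cluster nodes: ftsys9, ftsys10. Check package status with cmviewcl
before performing maintenance.
EOF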
Single-Node Operation
Single-node operation occurs in a single-node cluster or in a multi-node
cluster, following a situation where all but one node has failed, or where
you have shut down all but one node, which will probably have
applications running. As long as the Serviceguard daemon cmcld is
active, other nodes can re-join the cluster at a later time.
If the Serviceguard daemon fails when in single-node operation, it will
leave the single node up and your applications running. This is different
from the loss of the Serviceguard daemon in a multi-node cluster, which
halts the node with a TOC, and causes packages to be switched to
adoptive nodes.
It is not necessary to halt the single node in this scenario, since the
application is still running, and no other node is currently available for
package switching.
However, you should not try to restart Serviceguard, since data
corruption might occur if the node were to attempt to start up a new
instance of the application that is still running on the node. Instead of
restarting the cluster, choose an appropriate time to shut down and
reboot the node, which will allow the applications to shut down and then
permit Serviceguard to restart the cluster after rebooting.
NOTE The cmdeleteconf command removes only the cluster binary file
/etc/cmcluster/cmclconfig. It does not remove any other files from
the /etc/cmcluster directory.
Although the cluster must be halted, all nodes in the cluster should be
powered up and accessible before you use the cmdeleteconf command. If
a node is powered down, power it up and boot. If a node is inaccessible,
you will see a list of inaccessible nodes together with the following
message:
It is recommended that you do not proceed with the
configuration operation unless you are sure these nodes are
permanently unavailable. Do you want to continue?
Reply Yes to remove the configuration. Later, if the inaccessible node
becomes available, you should run the cmdeleteconf command on that
node to remove the configuration file.
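For example, to delete the configuration of a cluster named cluster1 (the name is illustrative) without being prompted for confirmation:
# cmdeleteconf -f -c cluster1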
Configuring Packages and Their Services
Creating the Package Configuration
For CVM, use the cmapplyconf command to add the system multi-node
packages to your cluster. If you are using the VERITAS Cluster File
System, use the cfscluster command to activate and halt the system
multi-node package in your cluster.
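For example, with CFS you would typically activate and halt the system multi-node package as follows:
# cfscluster start
# cfscluster stop
These commands are shown only for illustration; see the CFS sections of Chapter 5 for the full configuration sequence.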
CAUTION Once you create the disk group and mount point packages, it is critical
that you administer these packages with the cfs commands, including
cfsdgadm, cfsmntadm, cfsmount, and cfsumount. Non-cfs
commands could cause conflicts with subsequent command operations on
the file system or Serviceguard packages. Use of other forms of
mount will not create an appropriate multi-node package, which means
that the cluster packages are not aware of the file system changes.
NOTE Please note that the disk group and mount point multi-node packages do
not monitor the health of the disk group and mount point. They check
that the packages that depend on them have access to the disk groups
and mount points. If the dependent application package loses access and
cannot read and write to the disk, it will fail; however that will not cause
the DG or MP multi-node package to fail.
NOTE Do not create or edit ASCII configuration files for the Serviceguard
supplied packages VxVM-CVM-pkg, SG-CFS-pkg, SG-CFS-DG-id#, or
SG-CFS-MP-id#. Create VxVM-CVM-pkg and SG-CFS-pkg by issuing the
cmapplyconf command. Create and modify SG-CFS-DG-id# and
SG-CFS-MP-id# using the cfs* commands listed in Appendix A.
1. First, create a subdirectory for the package on a cluster node:
# mkdir /etc/cmcluster/pkg1
You can use any directory names you wish.
2. Next, generate a package configuration template for the package:
# cmmakepkg -p /etc/cmcluster/pkg1/pkg1.config
You can use any file names you wish for the ASCII templates.
3. Edit these template files to specify package name, prioritized list of
nodes (with 31 bytes or less in the name), the location of the control
script, and failover parameters for each package. Include the data
recorded on the Package Configuration Worksheet.
#**********************************************************************
# ****** HIGH AVAILABILITY PACKAGE CONFIGURATION FILE (template)*******
#**********************************************************************
# ******* Note: This file MUST be edited before it can be used.********
# * For complete details about package parameters and how to set them,*
# * consult the Serviceguard Extension for RAC manuals.
#**********************************************************************
# Enter a name for this package. This name will be used to identify the
# package when viewing or manipulating it. It must be different from
# the other configured package names.
PACKAGE_NAME
# FAILOVER_POLICY
# FAILBACK_POLICY
#
# Since an IP address can not be assigned to more than one node at a
# time, relocatable IP addresses can not be assigned in the
# package control script for MULTI_NODE packages. If volume
# groups are used in a MULTI_NODE package, they must be
# activated in a shared mode and data integrity is left to the
PACKAGE_TYPE FAILOVER
# Enter the failover policy for this package. This policy will be used
# to select an adoptive node whenever the package needs to be started.
# The default policy unless otherwise specified is CONFIGURED_NODE.
# This policy will select nodes in priority order from the list of
# NODE_NAME entries specified below.
#
# The alternative policy is MIN_PACKAGE_NODE. This policy will select
# the node, from the list of NODE_NAME entries below, which is
# running the least number of packages at the time this package needs
# to start.
FAILOVER_POLICY CONFIGURED_NODE
# Enter the failback policy for this package. This policy will be used
# to determine what action to take when a package is not running on
# its primary node and its primary node is capable of running the
# package. The default policy unless otherwise specified is MANUAL.
# The MANUAL policy means no attempt will be made to move the package
# back to its primary node when it is running on an adoptive node.
#
# The alternative policy is AUTOMATIC. This policy will attempt to
# move the package back to its primary node whenever the primary node
# is capable of running the package.
FAILBACK_POLICY MANUAL
# Enter the names of the nodes configured for this package. Repeat
# this line as necessary for additional adoptive nodes.
#
# NOTE: The order is relevant.
# Put the second Adoptive Node after the first one.
#
# Example : NODE_NAME original_node
# NODE_NAME adoptive_node
#
# If all nodes in the cluster are to be specified and order is not
# important, "NODE_NAME *" may be specified.
#
# Example : NODE_NAME *
NODE_NAME
# Enter the value for AUTO_RUN. Possible values are YES and NO.
# The default for AUTO_RUN is YES. When the cluster is started the
# package will be automatically started. In the event of a failure the
# package will be started on an adoptive node. Adjust as necessary.
#
# AUTO_RUN replaces obsolete PKG_SWITCHING_ENABLED.
AUTO_RUN YES
LOCAL_LAN_FAILOVER_ALLOWED YES
NODE_FAIL_FAST_ENABLED NO
# Enter the complete path for the run and halt scripts. In most cases
# the run script and halt script specified here will be the same
# script, the package control script generated by the cmmakepkg command.
# This control script handles the run(ning) and halt(ing) of the
# package.
#
# Enter the timeout, specified in seconds, for the run and halt
# scripts. If the script has not completed by the specified timeout
# value, it will be terminated. The default for each script timeout
# is NO_TIMEOUT. Adjust the timeouts as necessary to permit full
# execution of each script.
#
# Note: The HALT_SCRIPT_TIMEOUT should be greater than the sum of
# all SERVICE_HALT_TIMEOUT values specified for all services.
#
# The file where the output of the scripts is logged can be specified
# via the SCRIPT_LOG_FILE parameter. If not set, script output is sent
# to a file named by appending '.log' to the script path.
#
#SCRIPT_LOG_FILE
RUN_SCRIPT
RUN_SCRIPT_TIMEOUT NO_TIMEOUT
HALT_SCRIPT
HALT_SCRIPT_TIMEOUT NO_TIMEOUT
# Enter the names of the storage groups configured for this package.
# Repeat this line as necessary for additional storage groups.
#
# Storage groups are only used with CVM disk groups. Neither
# VxVM disk groups or LVM volume groups should be listed here.
# By specifying a CVM disk group with the STORAGE_GROUP keyword
# this package will not run until the CVM system multi node package is
# running and thus the CVM shared disk groups are ready for
# activation.
#
# NOTE: Should only be used by applications provided by
# Hewlett-Packard.
#
# Example : STORAGE_GROUP dg01
# STORAGE_GROUP dg02
# STORAGE_GROUP dg03
# STORAGE_GROUP dg04
#
#
# Enter the names of the dependency condition for this package.
# Dependencies are used to describe the relationship between packages
# To define a dependency, all three attributes are required.
#
# DEPENDENCY_NAME must have a unique identifier for the dependency.
#
# DEPENDENCY_CONDITION
# This is an expression describing what must be true for
# the dependency to be satisfied.
#
# The syntax is: <package name> = UP , where <package name>
# is the name of multi-node or system multi-node package.
#
# DEPENDENCY_LOCATION
# This describes where the condition must be satisfied.
# The only possible value for this attribute is SAME_NODE
#
# NOTE:
# Dependencies should be used only for a CFS cluster, or by
# applications specified by Hewlett-Packard.
# These are automatically set up in the SYSTEM-MULTI-NODE
# and MULTI-NODE packages created for disk groups and mount points.
# Customers configure dependencies for FAILOVER type
# packages only; and the dependency would be on a MULTI-NODE mount
# point (MP) package.
#
# Example :
# DEPENDENCY_NAME SG-CFS-MP-1
# DEPENDENCY_CONDITION SG-CFS-MP-1=UP
# DEPENDENCY_LOCATION SAME_NODE
#
#DEPENDENCY_NAME
#DEPENDENCY_CONDITION
#DEPENDENCY_LOCATION SAME_NODE
#
# Enter the SERVICE_NAME, the SERVICE_FAIL_FAST_ENABLED and the
# SERVICE_HALT_TIMEOUT values for this package. Repeat these
# three lines as necessary for additional service names. All
# service names MUST correspond to the SERVICE_NAME[] entries in
# the package control script.
#
# The value for SERVICE_FAIL_FAST_ENABLED can be either YES or NO.
NOTE Do not create or edit ASCII configuration files for the Serviceguard
supplied packages VxVM-CVM-pkg, SG-CFS-pkg, SG-CFS-DG-id#, or
SG-CFS-MP-id# Create VxVM-CVM-pkg and SG-CFS-pkg by issuing
the cmapplyconf command. Create and modify SG-CFS-DG-id# and
SG-CFS-MP-id# using the cfs commands listed in Appendix A.
NOTE You should not enter LVM volume groups or VxVM disk groups in
this file.
RESOURCE_NAME
/net/interfaces/lan/status/lan1
RESOURCE_POLLING_INTERVAL 60
RESOURCE_START DEFERRED
RESOURCE_UP_VALUE = UP
RESOURCE_NAME
/net/interfaces/lan/status/lan2
RESOURCE_POLLING_INTERVAL 60
RESOURCE_START AUTOMATIC
RESOURCE_UP_VALUE = UP
Creating the Package Control Script
Edit the package control script template to customize it for your
package. In particular:
• If you are using CVM, enter the names of disk groups to be activated
using the CVM_DG[] array parameters, and select the appropriate
storage activation command, CVM_ACTIVATION_CMD. Do not use the
VG[] or VXVM_DG[] parameters for CVM disk groups.
• If you are using VxVM disk groups without CVM, enter the names of
VxVM disk groups that will be imported using the VXVM_DG[] array
parameters. Enter one disk group per array element. Do not use the
CVM_DG[] or VG[] parameters for VxVM disk groups without CVM.
Also, do not specify an activation command.
CFS-based disk groups should not be included in the package control
script; they are activated by the CFS multi-node packages before
standard packages are started.
• If you are using mirrored VxVM disks, specify the mirror recovery
option VXVOL.
• Add the names of logical volumes and the file system that will be
mounted on them.
• Select the appropriate options for the storage activation command
(not applicable for basic VxVM disk groups), and also include options
for mounting filesystems, if desired.
• Specify the filesystem mount retry and unmount count options.
• If your package uses a large number of volume groups or disk groups
or mounts a large number of file systems, consider increasing the
number of concurrent vgchange, mount/umount, and fsck
operations. The default of 1 is adequate for most packages.
• Define IP subnet and IP address pairs for your package. IPv4 or IPv6
addresses may be used.
• Add service name(s).
• Add service command(s).
• Add a service restart parameter, if desired. (A filled-in excerpt
illustrating these entries appears below.)
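The following excerpt shows how some of these entries might look when filled in. All names, devices, and addresses are hypothetical and must be replaced with values appropriate for your own package:
VXVM_DG[0]=dg01
LV[0]=/dev/vx/dsk/dg01/vol01; FS[0]=/pkg01; FS_MOUNT_OPT[0]="-o rw"
FS_UMOUNT_OPT[0]=""; FS_FSCK_OPT[0]=""; FS_TYPE[0]="vxfs"
IP[0]=15.13.168.50
SUBNET[0]=15.13.168.0
SERVICE_NAME[0]="pkg1_monitor"
SERVICE_CMD[0]="/etc/cmcluster/pkg1/monitor.sh"
SERVICE_RESTART[0]="-r 2"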
NOTE Use care in defining service run commands. Each run command is
executed by the control script in the following way:
If you need to define a set of run and halt operations in addition to the
defaults, create functions for them in the sections under the heading
CUSTOMER DEFINED FUNCTIONS. If your package needs to run short-lived
processes, such as commands to initialize or halt a packaged application,
you can also run these from the CUSTOMER DEFINED FUNCTIONS.
CAUTION Although Serviceguard uses the -C option within the package control
script framework, this option should not normally be used from the
command line. The “Troubleshooting” section shows some situations
where you might need to use -C from the command line.
The following example shows the command with the same options that
are used by the control script:
# vxdg -tfC import dg_01
This command takes over ownership of all the disks in disk group dg_01,
even though the disk currently has a different host ID written on it. The
command writes the current node’s host ID on all disks in disk group
dg_01 and sets the noautoimport flag for the disks. This flag prevents a
disk group from being automatically re-imported by a node following a
reboot. If a node in the cluster fails, the host ID is still written on each
disk in the disk group. However, if the node is part of a Serviceguard
cluster then on reboot the host ID will be cleared by the owning node
from all disks which have the noautoimport flag set, even if the disk
group is not under Serviceguard control. This allows all cluster nodes,
which have access to the disk group, to be able to import the disks as
part of cluster operation.
The control script also uses the vxvol startall command to start up
the logical volumes in each disk group that is imported.
• CONCURRENT_MOUNT_AND_UMOUNT_OPERATIONS—defines a number of
parallel mount operations during package startup and unmount
operations during package shutdown.
You can use the -s option with FSCK_OPT and FS_UMOUNT_OPT
parameters for environments that use a large number of filesystems. The
-s option allows mount/umounts and fscks to be done in parallel. (With
the standard 11iv1 (11.11) HP-UX, you need to install patches to get this
option.)
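For example, to run checks and unmounts in parallel for a particular file system, you might set (entries are illustrative):
FS_UMOUNT_OPT[0]="-s"
FS_FSCK_OPT[0]="-s"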
# **********************************************************************
# * *
# * HIGH AVAILABILITY PACKAGE CONTROL SCRIPT (template) *
# * *
# * Note: This file MUST be edited before it can be used. *
# * *
# **********************************************************************
. ${SGCONFFILE:=/etc/cmcluster.conf}
#
# Uncomment the second line (VGCHANGE="vgchange -a e -q n -s"), and comment
# out the default, if your disks are mirrored on separate physical paths and
# you want the mirror resynchronization to occur in parallel with
# the package startup.
#
# Uncomment the third line (VGCHANGE="vgchange -a y") if you wish to
# use non-exclusive activation mode. Single node cluster configurations
# must use non-exclusive activation.
#
# VGCHANGE="vgchange -a e -q n"
# VGCHANGE="vgchange -a e -q n -s"
# VGCHANGE="vgchange -a y"
VGCHANGE="vgchange -a e" # Default
# VOLUME GROUPS
# Specify which volume groups are used by this package. Uncomment VG[0]=""
# and fill in the name of your first volume group. You must begin with
# VG[0], and increment the list in sequence.
#
# For example, if this package uses your volume groups vg01 and vg02, enter:
# VG[0]=vg01
# VG[1]=vg02
#
# The volume group activation method is defined above. The filesystems
# associated with these volume groups are specified below.
#
#VG[0]=""
#
# CVM DISK GROUPS
# Specify which cvm disk groups are used by this package. Uncomment
# CVM_DG[0]="" and fill in the name of your first disk group. You must
# begin with CVM_DG[0], and increment the list in sequence.
#
# For example, if this package uses your disk groups dg01 and dg02, enter:
# CVM_DG[0]=dg01
# CVM_DG[1]=dg02
#
# The cvm disk group activation method is defined above. The filesystems
# associated with these volume groups are specified below in the CVM_
# variables.
#
# NOTE: Do not use CVM and VxVM disk group parameters to reference
# devices used by CFS (cluster file system). CFS resources are
# controlled by the Disk Group and Mount Multi-node packages.
#
#CVM_DG[0]=""
# NOTE: Do not use CVM and VxVM disk group parameters to reference
# devices used by CFS (cluster file system). CFS resources are
# controlled by the Disk Group and Mount Multi-node packages.
#
# VxVM DISK GROUPS
# Specify which VxVM disk groups are used by this package. Uncomment
# VXVM_DG[0]="" and fill in the name of your first disk group. You must
# begin with VXVM_DG[0], and increment the list in sequence.
#
# For example, if this package uses your disk groups dg01 and dg02, enter:
# VXVM_DG[0]=dg01
# VXVM_DG[1]=dg02
#
# The cvm disk group activation method is defined above.
#
#VXVM_DG[0]=""
#
# NOTE: A package could have LVM volume groups, CVM disk groups and VxVM
# disk groups.
#
# NOTE: When VxVM is initialized it will store the hostname of the
# local node in its volboot file in a variable called 'hostid'.
# The MC Serviceguard package control scripts use both the values of
# the hostname(1m) command and the VxVM hostid. As a result
# the VxVM hostid should always match the value of the
# hostname(1m) command.
#
# If you modify the local host name after VxVM has been
# initialized and such that hostname(1m) does not equal uname -n,
# you need to use the vxdctl(1m) command to set the VxVM hostid
# field to the value of hostname(1m). Failure to do so will
# result in the package failing to start.
# FILESYSTEMS
# Filesystems are defined as entries specifying the logical volume, the
# mount point, the mount, umount and fsck options and type of the file system.
# Each filesystem will be fsck'd prior to being mounted. The filesystems
# will be mounted in the order specified during package startup and will
# be unmounted in reverse order during package shutdown. Ensure that
# volume groups referenced by the logical volume definitions below are
# included in volume group definitions above.
#
# Specify the filesystems which are used by this package. Uncomment
# LV[0]=""; FS[0]=""; FS_MOUNT_OPT[0]=""; FS_UMOUNT_OPT[0]="";
# FS_FSCK_OPT[0]="" FS_TYPE[0]="" and fill in the name of your
# first logical volume, filesystem, mount, umount and fsck options
# and filesystem type for the file system.
# You must begin with LV[0], FS[0],
# FS_MOUNT_OPT[0], FS_UMOUNT_OPT[0], FS_FSCK_OPT[0], FS_TYPE[0]
# and increment the list in sequence.
#
# Note: The FS_TYPE parameter lets you specify the type of filesystem to be
# mounted. Specifying a particular FS_TYPE will improve package failover time.
#
# LV[1]=/dev/vg01/lvol2; FS[1]=/pkg01b; FS_MOUNT_OPT[1]="-o rw"
# FS_UMOUNT_OPT[1]=""; FS_FSCK_OPT[1]=""; FS_TYPE[1]="vxfs"
#
#LV[0]=""; FS[0]=""; FS_MOUNT_OPT[0]=""; FS_UMOUNT_OPT[0]=""; FS_FSCK_OPT[0]=""
#FS_TYPE[0]=""
#
# VOLUME RECOVERY
#
# When mirrored VxVM volumes are started during the package control
# bring up, if recovery is required the default behavior is for
# the package control script to wait until recovery has been
# completed.
#
# To allow mirror resynchronization to occur in parallel with
# the package startup, uncomment the line
# VXVOL="vxvol -g \$DiskGroup -o bg startall" and comment out the default.
#
# VXVOL="vxvol -g \$DiskGroup -o bg startall"
VXVOL="vxvol -g \$DiskGroup startall" # Default
# Set the following concurrent-operations parameters to values appropriate
# for the system resources available on your cluster nodes. Some examples
# of system resources that can affect the optimum number of concurrent
# operations are: number of CPUs, amount of available memory, the kernel
# configuration for nfile and nproc. In some cases, if you set the number
# of concurrent operations too high, the package may not be able to start
# or to halt. For example, if you set CONCURRENT_VGCHANGE_OPERATIONS=5
# and the node where the package is started has only one processor, then
# running concurrent volume group activations will not be beneficial.
# It is suggested that the number of concurrent operations be tuned
# carefully, increasing the values a little at a time and observing the
# effect on the performance, and the values should never be set to a value
# where the performance levels off or declines. Additionally, the values
# used should take into account the node with the least resources in the
# cluster, and how many other packages may be running on the node.
# For instance, if you tune the concurrent operations for a package so
# that it provides optimum performance for the package on a node while
# no other packages are running on that node, the package performance
# may be significantly reduced, or may even fail when other packages are
# already running on that node.
#
# CONCURRENT VGCHANGE OPERATIONS
# Specify the number of concurrent volume group activations or
# deactivations to allow during package startup or shutdown.
# Setting this value to an appropriate number may improve the performance
# while activating or deactivating a large number of volume groups in the
# package. If the specified value is less than 1, the script defaults it
# to 1 and proceeds with a warning message in the package control script
# logfile.
CONCURRENT_VGCHANGE_OPERATIONS=1
#
# CONCURRENT FSCK OPERATIONS
# Specify the number of concurrent fsck to allow during package startup.
# Setting this value to an appropriate number may improve the performance
# while checking a large number of file systems in the package. If the
# specified value is less than 1, the script defaults it to 1 and proceeds
# with a warning message in the package control script logfile.
CONCURRENT_FSCK_OPERATIONS=1
CONCURRENT_MOUNT_AND_UMOUNT_OPERATIONS=1
# (netmask=ffff:ffff:ffff:ffff::)
#
# Hint: Run "netstat -i" to see the available IPv6 subnets by looking
# at the address prefixes
# IP/Subnet address pairs for each IP address you want to add to a subnet
# interface card. Must be set in pairs, even for IP addresses on the same
# subnet.
#
#IP[0]=""
#SUBNET[0]=""
# DEFERRED_RESOURCE NAME
# Specify the full path name of the 'DEFERRED' resources configured for
# this package. Uncomment DEFERRED_RESOURCE_NAME[0]="" and fill in the
# full path name of the resource.
#
#DEFERRED_RESOURCE_NAME[0]=""
The above excerpt from the control script shows the assignment of values
to a set of variables. The remainder of the script uses these variables to
control the package by executing Logical Volume Manager commands,
HP-UX commands, and Serviceguard commands including cmrunserv,
cmmodnet, and cmhaltserv. Examine a copy of the control script template
to see the flow of logic. Use the following command:
# cmmakepkg -s | more
The main function appears at the end of the script.
Note that individual variables are optional; you should include only as
many as you need for proper package operation. For example, if your
package does not need to activate a volume group, omit the VG variables;
if the package does not use services, omit the corresponding
SERVICE_NAME, SERVICE_CMD, and SERVICE_RESTART variables; and so
on.
If you have defined an EMS resource in the package configuration file
that is labeled as DEFERRED, you need to define a
DEFERRED_RESOURCE_NAME in the package control script. Specify only the
deferred resources, using the DEFERRED_RESOURCE_NAME parameter:
DEFERRED_RESOURCE_NAME[0]="/net/interfaces/lan/status/lan0"
DEFERRED_RESOURCE_NAME[1]="/net/interfaces/lan/status/lan1"
function customer_defined_run_cmds
{
# ADD customer defined run commands.
: # do nothing instruction, because a function must contain some command.
date >> /tmp/pkg1.datelog
echo 'Starting pkg1' >> /tmp/pkg1.datelog
test_return 51
}
function customer_defined_halt_cmds
{
# ADD customer defined halt commands.
: # do nothing instruction, because a function must contain some command.
date >> /tmp/pkg1.datelog
echo 'Halting pkg1' >> /tmp/pkg1.datelog
test_return 52
}
Verifying the Package Configuration
Use Serviceguard Manager's Check button, or the cmcheckconf command,
to verify the package configuration.
Errors are displayed on the standard output. If necessary, edit the file to
correct any errors, then run the command again until it completes
without errors.
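For example, to check the package configuration file used in the earlier examples from the command line:
# cmcheckconf -v -P /etc/cmcluster/pkg1/pkg1.config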
Serviceguard Manager’s Check button and the cmcheckconf command
check the following:
Distributing the Configuration
• Activate the cluster lock volume group so that the lock disk can be
initialized:
# vgchange -a y /dev/vg01
• Generate the binary configuration file and distribute it across the
nodes.
# cmapplyconf -v -C /etc/cmcluster/cmcl.config -P \
/etc/cmcluster/pkg1/pkg1.config
• If you are using a lock disk, deactivate the cluster lock volume group.
# vgchange -a n /dev/vg01
NOTE cmcheckconf and cmapplyconf must be used again any time changes
are made to the cluster and package configuration files.
• Configuration synchronization
• Log consolidation
• Command fan-out
With configuration synchronization, you can have confidence that
systems in your Serviceguard cluster are maintained to a standard you
adopt. As you make changes on your configuration master, those changes
are propagated to all your client systems.
With log consolidation, you can examine a single log that contains
entries from all systems in your configuration, in order of their time
stamps, so you can find a specific entry easily.
With command fan-out, you can send the same command from one
designated system to all the systems in your Serviceguard cluster. This
eliminates both visiting all systems in the configuration and many
manual operations.
For additional information on using DSAU, refer to the Managing
Systems and Workgroups manual, posted at http://docs.hp.com.
Cluster and Package Maintenance
Reviewing Cluster and Package Status
• There are more details in the cluster, node, and package property
sheets. Cluster multi-node packages’ properties are contained in the
cluster properties.
You can also specify that the output should be formatted as it was in a
specific earlier release by using the -r option indicating the release
format you wish. Example:
# cmviewcl -r A.11.09
The formatting options let you choose a style. The tabulated format is
designed for viewing. The line format is designed for scripting, and is
easily parsed.
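For example, assuming your Serviceguard release supports the -f formatting option, the following produces line-format output suitable for scripts:
# cmviewcl -v -f line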
See the man page for a detailed description of other cmviewcl options.
Cluster Status
The status of a cluster, as shown by cmviewcl, may be up, down,
starting, or unknown.
Node Status and State
The status of a node may be one of the following:
• Failed. A node never sees itself in this state. Other active members
of the cluster will see a node in this state if that node was in an
active cluster, but is no longer, and is not halted.
• Reforming. A node is in this state when the cluster is re-forming.
The node is currently running the protocols which ensure that all
nodes agree to the new membership of an active cluster. If agreement
is reached, the status database is updated to reflect the new cluster
membership.
• Running. A node in this state has completed all required activity for
the last re-formation and is operating normally.
• Halted. A node never sees itself in this state. Other nodes will see it
in this state after the node has gracefully left the active cluster, for
instance with a cmhaltnode command.
• Unknown. A node never sees itself in this state. Other nodes assign a
node this state if it has never been an active cluster member.
Package Status and State The status of a package can be one of the
following:
• Starting. The start instructions in the control script are being run.
• Running. Services are active and being monitored.
• Halting. The halt instructions in the control script are being run.
• Up.
• Down.
• Unknown. Serviceguard cannot determine whether the interface is up
or down. This can happen when the cluster is down. A standby
interface has this status.
Serial Line Status The serial line has only a status (no state), as follows:
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 56/36.1 lan0
STANDBY up 60/6 lan1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service1
Subnet up 0 0 15.13.168.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys9 (current)
Alternate up enabled ftsys10
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 28.1 lan0
STANDBY up 32.1 lan1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service2
Subnet up 0 0 15.13.168.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys10 (current)
Alternate up enabled ftsys9
MULTI_NODE_PACKAGES:
When you use the -v option, the display shows the system multi-node
package associated with each active node in the cluster, as in the
following:
MULTI_NODE_PACKAGES:
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 VxVM-CVM-pkg.srv
CFS Package Status If the cluster is using the VERITAS Cluster File
System, the system multi-node package SG-CFS-pkg must be running on
all active nodes, and the multi-node packages for disk group and mount
point must also be running on at least one of their configured nodes.
The following shows an example of cmviewcl output for status of these
packages:
# cmviewcl -v -p SG-CFS-pkg
MULTI_NODE_PACKAGES
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 SG-CFS-vxconfigd
Service up 5 0 SG-CFS-sgcvmd
Service up 5 0 SG-CFS-vxfsckd
Service up 0 0 SG-CFS-cmvxd
Service up 0 0 SG-CFS-cmvxpingd
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 SG-CFS-vxconfigd
Service up 5 0 SG-CFS-sgcvmd
Service up 5 0 SG-CFS-vxfsckd
Service up 0 0 SG-CFS-cmvxd
Service up 0 0 SG-CFS-cmvxpingd
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 56/36.1 lan0
STANDBY up 60/6 lan1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service1
Subnet up 15.13.168.0
Resource up /example/float
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys9 (current)
Alternate up enabled ftsys10
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 28.1 lan0
STANDBY up 32.1 lan1
UNOWNED_PACKAGES
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS NODE_NAME NAME
Resource up ftsys9 /example/float
Subnet up ftsys9 15.13.168.0
Resource up ftsys10 /example/float
Subnet up ftsys10 15.13.168.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys10
Alternate up enabled ftsys9
Pkg2 now has the status “down”, and it is shown in the unowned state,
with package switching disabled. Resource “/example/float,” which is
configured as a dependency of pkg2, is down on one node. Note that
switching is enabled for both nodes, however. This means that once
global switching is re-enabled for the package, it will attempt to start up
on the primary node.
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 56/36.1 lan0
STANDBY up 60/6 lan1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service1
Subnet up 15.13.168.0
Resource up /example/float
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys9 (current)
Alternate up enabled ftsys10
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 service2.1
Subnet up 15.13.168.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled ftsys10
Alternate up enabled ftsys9 (current)
Network_Parameters:
INTERFACE STATUS PATH NAME
Now pkg2 is running on node ftsys9. Note that it is still disabled from
switching.
Both packages are now running on ftsys9 and pkg2 is enabled for
switching. Ftsys10 is running the daemon and no packages are running
on ftsys10.
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 56/36.1 lan0
Serial_Heartbeat:
DEVICE_FILE_NAME STATUS CONNECTED_TO:
/dev/tty0p0 up ftsys10 /dev/tty0p0
NODE STATUS STATE
ftsys10 up running
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 28.1 lan0
Serial_Heartbeat:
DEVICE_FILE_NAME STATUS CONNECTED_TO:
/dev/tty0p0 up ftsys9 /dev/tty0p0
The following display shows status after node ftsys10 has halted:
CLUSTER STATUS
example up
NODE STATUS STATE
ftsys9 up running
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 56/36.1 lan0
Serial_Heartbeat:
DEVICE_FILE_NAME STATUS CONNECTED_TO:
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 28.1 lan0
Serial_Heartbeat:
DEVICE_FILE_NAME STATUS CONNECTED_TO:
/dev/tty0p0 unknown ftsys9 /dev/tty0p0
The following shows status when the serial line is not working:
CLUSTER STATUS
example up
NODE STATUS STATE
ftsys9 up running
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 56/36.1 lan0
Serial_Heartbeat:
DEVICE_FILE_NAME STATUS CONNECTED_TO:
/dev/tty0p0 down ftsys10 /dev/tty0p0
NODE STATUS STATE
ftsys10 up running
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 28.1 lan0
Serial_Heartbeat:
DEVICE_FILE_NAME STATUS CONNECTED_TO:
/dev/tty0p0 down ftsys9 /dev/tty0p0
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover min_package_node
Failback automatic
Script_Parameters:
ITEM STATUS NODE_NAME NAME
Resource up manx /resource/random
Subnet up manx 192.8.15.0
Resource up burmese /resource/random
Subnet up burmese 192.8.15.0
Resource up tabby /resource/random
Subnet up tabby 192.8.15.0
Resource up persian /resource/random
Subnet up persian 192.8.15.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled manx
Alternate up enabled burmese
Alternate up enabled tabby
Alternate up enabled persian
SYSTEM_MULTI_NODE_PACKAGES:
Checking Status with Cluster File System If the cluster is using
the cluster file system, you can check status with the cfscluster
command, as shown in the example below:
#cfscluster status
Node : ftsys9
Cluster Manager : up
CVM state : up (MASTER)
/var/opt/sgtest/
tmp/mnt/dev/vx/dsk/
vg_for_cvm1_dd5/lvol1 regular lvol1 vg_for_cvm_dd5 MOUNTED
/var/opt/sgtest/
tmp/mnt/dev/vx/dsk/
vg_for_cvm1_dd5/lvol4 regular lvol4 vg_for_cvm_dd5 MOUNTED
Node : ftsys8
Cluster Manager : up
CVM state : up
/var/opt/sgtest/
tmp/mnt/dev/vx/dsk/
vg_for_cvm1_dd5/lvol1 regular lvol1 vg_for_cvm_veggie_dd5 MOUNTED
/var/opt/sgtest/
tmp/mnt/dev/vx/dsk/
vg_for_cvm1_dd5/lvol4 regular lvol4 vg_for_cvm_dd5 MOUNTED
#cmviewcl -v -p SG-CFS-pkg
MULTI_NODE_PACKAGES
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 SG-CFS-vxconfigd
Service up 5 0 SG-CFS-sgcvmd
Service up 5 0 SG-CFS-vxfsckd
Service up 0 0 SG-CFS-cmvxd
Service up 0 0 SG-CFS-cmvxpingd
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up 0 0 SG-CFS-vxconfigd
Service up 5 0 SG-CFS-sgcvmd
Service up 5 0 SG-CFS-vxfsckd
Service up 0 0 SG-CFS-cmvxd
Service up 0 0 SG-CFS-cmvxpingd
Status of CFS disk group packages To see the status of the disk
group, use the cfsdgadm display command. For example, for the
disk group logdata, enter:
# cfsdgadm display -v logdata
NODE NAME ACTIVATION MODE
ftsys9 sw (sw)
MOUNT POINT SHARED VOLUME TYPE
ftsys10 sw (sw)
MOUNT POINT SHARED VOLUME TYPE
...
Status of CFS mount point packages To see the status of the mount point
package, use the cfsmntadm display command. For example, for the mount
point /tmp/logdata/log_files, enter:
# cfsmntadm display -v /tmp/logdata/log_files
Mount Point : /tmp/logdata/log_files
Shared Volume : lvol1
Disk Group : logdata
To see which package is monitoring a mount point, use the cfsmntadm
show_package command. For example, for the mount point
/tmp/logdata/log_files:
# cfsmntadm show_package /tmp/logdata/log_files
SG-CFS-MP-1
Managing the Cluster and Nodes
NOTE Manually starting or halting the cluster or individual nodes does not
require access to the quorum server, if one is configured. The quorum
server is only used when tie-breaking is needed following a cluster
partition.
CAUTION Serviceguard cannot guarantee data integrity if you try to start a cluster
with the cmruncl -n command while a subset of the cluster's nodes are
already running a cluster. If the network connection is down between
nodes, using cmruncl -n might result in a second cluster forming, and
this second cluster might start up the same applications that are already
running on the other cluster. The result could be two applications
overwriting each other's data on the disks.
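If you do need to start the cluster on a subset of nodes, first confirm that no node is already running a cluster, then name the nodes explicitly; the node names here are illustrative:
# cmviewcl
# cmruncl -v -n ftsys9 -n ftsys10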
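For example, a typical way to halt a node together with its packages (the discussion that follows assumes this form of the command) is:
# cmhaltnode -f -v ftsys9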
This halts any packages running on the node ftsys9 by executing the
halt instructions in each package's control script. ftsys9 is halted and
the packages start on their adoptive node.
The use of cmhaltnode is a convenient way of bringing a node down for
system maintenance while keeping its packages available on other
nodes. After maintenance, the package can be returned to its primary
node. See “Moving a Package,” below.
To have the node rejoin and run in the cluster again, use cmrunnode.
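For example (the node name is illustrative):
# cmrunnode -v ftsys9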
Managing Packages and Services
• Starting a Package
• Halting a Package
• Moving a Package (halt, then start)
• Changing Package Switching Behavior
In Serviceguard A.11.16 and later, these commands can be done by
non-root users, according to access policies in the cluster’s configuration
files. See “Editing Security Files” on page 190, for more information
about configuring access.
You can use Serviceguard Manager or the Serviceguard command line to
perform these tasks.
Starting a Package
Ordinarily, when a cluster starts up, the packages configured as part of
the cluster will start up on their configured nodes. You may need to start
a package manually after it has been halted manually. You can do this
either in Serviceguard Manager or on the Serviceguard command line.
If any package has configured dependencies on another package,
Serviceguard will start them in order, ensuring that a package will not
start until its dependency is met.
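For example, to start pkg1 on node ftsys9 from the command line and then re-enable switching for it (names are illustrative):
# cmrunpkg -n ftsys9 pkg1
# cmmodpkg -e pkg1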
The progress window shows messages as the action takes place. This will
include messages for starting the package.
The cluster must be running in order to start a package.
Halting a Package
You halt a Serviceguard package when you want to bring the package out of use but want the node to continue operating. You can halt a package using Serviceguard Manager or the Serviceguard command line.
Halting a package has a different effect from halting the node. When you halt the node, its failover packages may switch to adoptive nodes (assuming that switching is enabled for them); when you halt a failover package, it is disabled from switching to another node and must be restarted manually, either on another node or on the same node.
System multi-node packages run on all cluster nodes simultaneously; halting such a package stops it on all nodes. A multi-node package can run on several nodes simultaneously; you can halt it on all the nodes where it is running, or on individual nodes that you specify.
You cannot halt a package unless all the packages that depend on it are down. If you try, Serviceguard sends a message explaining why it cannot complete the operation. If this happens, you can repeat the halt command, this time including the dependent package(s); Serviceguard will halt all the listed packages in the correct order. Before halting, use cmviewcl to make sure that no other running package has a dependency on any of the packages you are halting.
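For example, if pkg2 depends on pkg1 and you want to halt pkg1, a sketch of the sequence might be (the package names are illustrative):

# cmviewcl -v
# cmhaltpkg pkg2 pkg1

cmviewcl -v lets you confirm which packages depend on pkg1; naming both packages on the cmhaltpkg command lets Serviceguard halt them in the correct order.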
Reconfiguring a Cluster
You can reconfigure a cluster either when it is halted or while it is still
running. Some operations can only be done when the cluster is halted.
Table 7-1 shows the required cluster state for many kinds of changes.
Table 7-1 Types of Changes to Permanent Cluster Configuration
5. Apply the changes to the configuration and send the new binary
configuration file to all cluster nodes:
# cmapplyconf -C clconfig.ascii
Use cmrunnode to start the new node, and, if desired, set the
AUTOSTART_CMCLD parameter to 1 in the /etc/rc.config.d/cmcluster
file to enable the new node to join the cluster automatically each time it
reboots.
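For example, the tail end of the procedure for a new node named ftsys10 might look like this (the node name is illustrative; the earlier steps that produce clconfig.ascii are not repeated here):

# cmapplyconf -C clconfig.ascii
# cmrunnode ftsys10

Then, on ftsys10, set the following in /etc/rc.config.d/cmcluster so that the node rejoins the cluster automatically after a reboot:

AUTOSTART_CMCLD = 1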
NOTE If you add a node to a running cluster that uses CVM disk groups, the disk groups will be available for import when the node joins the cluster. Before you add a node to the cluster, the node must already have connectivity to the disk devices for all CVM disk groups.
NOTE If you want to remove a node from the cluster, issue the cmapplyconf
command from another node in the same cluster. If you try to issue the
command on the node you want removed, you will get an error message.
NOTE If you are attempting to remove an unreachable node that has many
packages dependent on it, especially if the dependent packages use a
large number of EMS resources, you may see the following message:
The configuration change is too large to process while the
cluster is running.
Split the configuration change into multiple requests or halt
the cluster.
In this situation, you must halt the cluster to remove the node.
NOTE If you are removing a volume group from the cluster configuration, make sure that you also modify or delete any package control script that activates and deactivates this volume group. In addition, you should use the LVM vgexport command on the removed volume group, on each node that will no longer be using the volume group.
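For example, if the volume group /dev/vg01 has been removed from the cluster configuration, you might run the following on each node that will no longer use it (the volume group name is illustrative):

# vgexport /dev/vg01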
NOTE If the volume group that you are deleting from the cluster is currently
activated by a package, the configuration will be changed but the
deletion will not take effect until the package is halted; thereafter, the
package will no longer be able to run without further modification, such
as removing the volume group from the package control script.
• For CVM 3.5, and for CVM 4.1 without CFS, edit the configuration
ASCII file of the package that uses CVM storage. Add the CVM
storage group in a STORAGE_GROUP statement. Then issue the
cmapplyconf command.
• For CVM 4.1 with CFS, edit the configuration ASCII file of the
package that uses CFS. Fill in the three-part DEPENDENCY
parameter. Then issue the cmapplyconf command.
Similarly, you can delete VxVM or CVM disk groups provided they are
not being used by a cluster node at the time.
NOTE If you are removing a disk group from the cluster configuration, make
sure that you also modify or delete any package control script that
imports and deports this disk group. If you are removing a disk group
managed by CVM without CFS, be sure to remove the STORAGE_GROUP
entries for the disk group from the package ASCII file. If you are
removing a disk group managed by CVM with CFS, be sure to remove
the DEPENDENCY parameter.
Reconfiguring a Package
Reconfiguring a package is similar to the basic configuration process described in Chapter 6; refer to that chapter for details on the configuration process.
The cluster can be either halted or running during package
reconfiguration. The types of changes that can be made and the times
when they take effect depend on whether the package is running or not.
If you reconfigure a package while it is running, it is possible that the package could fail later, even if the cmapplyconf succeeded. For example, consider a package with two volume groups. When this package started, it activated both volume groups. While the package is running, you could change its configuration to list only one of the volume groups, and cmapplyconf would succeed. If you then issue the cmhaltpkg command, however, the halt would fail: the modified package would not deactivate both of the volume groups that it had activated at startup, because it would see only the one volume group in its current configuration file.
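In general, reconfiguring a package from the command line means editing the package's ASCII configuration file, verifying it, and then applying it. A minimal sketch, assuming the file is /etc/cmcluster/pkg1/pkg1.ascii (the path and package name are illustrative):

# cmcheckconf -P /etc/cmcluster/pkg1/pkg1.ascii
# cmapplyconf -P /etc/cmcluster/pkg1/pkg1.ascii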
• Copy the modified control script to all nodes that can run the
package. (Done automatically in Serviceguard Manager as part of
Apply.)
• Use the Serviceguard Manager Run Cluster command, or enter
cmruncl on the command line to start the cluster on all nodes or on a
subset of nodes, as desired. The package will start up as nodes come
online.
To delete a package configuration, use the cmdeleteconf command. The package must be down; the cluster may be up. This removes the package information from the binary configuration file on all the nodes in the cluster.
The following example halts the failover package mypkg and removes the
package configuration from the cluster:
# cmhaltpkg mypkg
# cmdeleteconf -p mypkg
The command prompts for a verification before deleting the files unless
you use the -f option. The directory /etc/cmcluster/mypkg is not
deleted by this command.
You can remove nodes from a multi-node package configuration using
the cfs commands listed in Appendix A. All the packages that depend
on the multi-node package must be halted on that node.
To remove the CFS mount point and disk group packages, follow these
steps:
NOTE Any form of the mount command (for example, mount -o cluster, dbed_chkptmount, or sfrac_chkptmount) other than cfsmount or cfsumount in an HP Serviceguard Storage Management Suite environment with CFS should be used with caution. These non-cfs commands could cause conflicts with subsequent command operations on the file system or Serviceguard packages. Use of these other forms of mount will not create an appropriate multi-node package, which means that the cluster packages are not aware of the file system changes.
The current value of the restart counter may be seen in the output of the
cmviewcl -v command.
Table: Change to the Package / Required Package State
Responding to Cluster Events
Removing Serviceguard from a System
Troubleshooting Your Cluster
Testing Cluster Operation
CAUTION In testing the cluster in the following procedures, be aware that you are
causing various components of the cluster to fail, so that you can
determine that the cluster responds correctly to failure situations. As a
result, the availability of nodes and applications may be disrupted.
The node should be recognized by the cluster, but its packages should
not be running.
5. Move the packages back to the original node using Serviceguard Manager.
Select the package from the map or tree. From the Actions menu,
choose Administering Packages -> Move package.
6. Repeat this procedure for all nodes in the cluster one at a time.
Monitoring Hardware
Good standard practice in managing a high availability system includes careful fault monitoring, so as to prevent failures where possible, or at least to react to them swiftly when they occur. The following should be monitored for errors or warnings of all kinds:
• Disks
• CPUs
• Memory
• LAN cards
• Power sources
• All cables
• Disk interface cards
Some monitoring can be done through simple physical inspection, but for
the most comprehensive monitoring, you should examine the system log
file (/var/adm/syslog/syslog.log) periodically for reports on all configured
HA devices. The presence of errors relating to a device will show the need
for maintenance.
When the proper redundancy has been configured, failures can occur
with no external symptoms. Proper monitoring is important. For
example, if a Fibre Channel switch in a redundant mass storage
configuration fails, LVM will automatically fail over to the alternate path
through another Fibre Channel switch. Without monitoring, however,
you may not know that the failure has occurred, since the applications
are still running normally. But at this point, there is no redundant path
if another failover occurs, so the mass storage configuration is
vulnerable.
Replacing Disks
The procedure for replacing a faulty disk mechanism depends on the type
of disk configuration you are using. Separate descriptions are provided
for replacing an array mechanism and a disk in a high availability
enclosure.
For more information, see When Good Disks Go Bad (5991-1236), posted at http://docs.hp.com.
1. Identify the physical volume name of the failed disk and the name of
the volume group in which it was configured. In the following
examples, the volume group name is shown as /dev/vg_sg01 and
the physical volume name is shown as /dev/dsk/c2t3d0. Substitute
the volume group and physical volume names that are correct for
your system.
2. Identify the names of any logical volumes that have extents defined
on the failed physical volume.
3. On the node on which the volume group is currently activated, use
the following command for each logical volume that has extents on the
failed physical volume:
# lvreduce -m 0 /dev/vg_sg01/lvolname /dev/dsk/c2t3d0
4. At this point, remove the failed disk and insert a new one. The new
disk will have the same HP-UX device name as the old one.
5. On the node from which you issued the lvreduce command, issue
the following command to restore the volume group configuration
data to the newly inserted disk:
# vgcfgrestore -n /dev/vg_sg01 /dev/dsk/c2t3d0
6. Issue the following command for each logical volume you reduced in step 3, to extend its mirror onto the newly inserted disk:
# lvextend -m 1 /dev/vg_sg01/lvolname /dev/dsk/c2t3d0
7. Finally, use the lvsync command for each logical volume that has
extents on the failed physical volume. This synchronizes the extents
of the new disk with the extents of the other mirror.
# lvsync /dev/vg_sg01/lvolname
Replacement of I/O Cards
1. Halt the node. In Serviceguard Manager, select the node; from the
Actions menu, choose Administering Serviceguard -> Halt node.
Or, from the Serviceguard command line, use the cmhaltnode
command. Packages should fail over normally to other nodes.
2. Remove the SCSI cable from the card.
3. Using SAM, select the option to do an on-line replacement of an I/O
card.
4. Remove the defective SCSI card.
5. Install the new SCSI card. The new card must be exactly the same
card type, and it must be installed in the same slot as the card you
removed. You must set the SCSI ID for the new card to be the same
as the card it is replacing.
6. In SAM, select the option to attach the new SCSI card.
7. Add the node back into the cluster. In Serviceguard Manager, select
the node; from the Actions menu, choose Administering Serviceguard
-> Run Node. Or, from the Serviceguard command line, issue the
cmrunnode command.
Replacement of LAN or Fibre Channel Cards
Off-Line Replacement
The following steps show how to replace an I/O card off-line. These steps
apply to both HP-UX 11.0 and 11i:
On-Line Replacement
If your system hardware supports hotswap I/O cards, and if the system is
running HP-UX 11i (B.11.11 or later), you have the option of replacing
the defective I/O card on-line. This will significantly improve the overall
availability of the system. To do this, follow the steps provided in the
section “How to On-line Replace (OLR) a PCI Card Using SAM” in the
document Configuring HP-UX for Peripherals. The OLR procedure also
requires that the new card must be exactly the same card type as the
card you removed to avoid improper operation of the network driver.
Serviceguard will automatically recover a LAN card once it has been
replaced and reconnected to the network.
NOTE After replacing a Fibre Channel I/O card, it may be necessary to reconfigure the SAN to use the World Wide Name (WWN) of the new Fibre Channel card if Fabric Zoning or other SAN security requiring WWN is used.
Replacing a Failed Quorum Server System
NOTE While the old quorum server is down and the new one is being set up,
these things can happen:
NOTE Make sure that the old Quorum Server system does not re-join the
network with the old IP address.
Troubleshooting Approaches
The following sections offer a few suggestions for troubleshooting by
reviewing the state of the running system and by examining cluster
status data, log files, and configuration files. Topics include:
IPv6:
Name Mtu Address/Prefix Ipkts Opkts
lan1* 1500 none 0 0
lo0 4136 ::1/128 10690 10690
The cmscancl command collects configuration information and node-specific items for all nodes in the cluster. It runs several different HP-UX commands on all nodes and gathers the output into a report on the node where you run the command.
To run the cmscancl command, the root user on the cluster nodes must
have the .rhosts file configured to allow the command to complete
successfully. Without that, the command can only collect information on
the local node, rather than all cluster nodes.
The following are the types of configuration data that cmscancl displays
for each node:
Table 8-1 Data Displayed by the cmscancl Command
Solving Problems
Problems with Serviceguard may be of several types. The following is a list of common categories of problems:
Name: ftsys9.cup.hp.com
Address: 15.13.172.229
If the output of this command does not include the correct IP address of
the node, then check your name resolution services further.
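The output shown above is the kind produced by a lookup of the node name; assuming nslookup is used for the check, the command would look like this:

# nslookup ftsys9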
Cluster Re-formations
Cluster re-formations may occur from time to time due to current cluster
conditions. Some of the causes are as follows:
• local switch on an Ethernet LAN if the switch takes longer than the
cluster NODE_TIMEOUT value. To prevent this problem, you can
increase the cluster NODE_TIMEOUT value, or you can use a different
LAN type.
• excessive network traffic on heartbeat LANs. To prevent this, you
can use dedicated heartbeat LANs, or LANs with less traffic on
them.
• an overloaded system, with too much total I/O and network traffic.
• an improperly configured network, for example, one with a very large
routing table.
In these cases, applications continue running, though they might
experience a small performance impact during cluster re-formation.
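For example, to lengthen the node timeout as suggested above, you would raise the NODE_TIMEOUT value in the cluster ASCII configuration file and re-apply the configuration. A sketch, assuming a value of 8,000,000 microseconds (the value is illustrative; check the parameter's documented range and units before changing it):

NODE_TIMEOUT 8000000

# cmapplyconf -C clconfig.ascii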
You can use the following commands to check the status of your disks:
NOTE Any form of the mount command (for example, mount -o cluster, dbed_chkptmount, or sfrac_chkptmount) other than cfsmount or cfsumount in an HP Serviceguard Storage Management Suite environment with CFS should be used with caution. These non-cfs commands could cause conflicts with subsequent command operations on the file system or Serviceguard packages. Use of these other forms of mount will not create an appropriate multi-node package, which means that the cluster packages are not aware of the file system changes.
2. b - vxfen
3. v w - cvm
4. f - cfs
Any form of the mount command (for example, mount -o cluster, dbed_chkptmount, or sfrac_chkptmount) other than cfsmount or cfsumount in an HP Serviceguard Storage Management Suite environment with CFS should be used with caution. These non-cfs commands could cause conflicts with subsequent command operations on the file system or Serviceguard packages. Use of these other forms of mount will not create an appropriate multi-node package, which means that the cluster packages are not aware of the file system changes.
Also check the syslog file for information.
This can happen if a package is running on a node which then fails before
the package control script can deport the disk group. In these cases, the
host name of the node that had failed is still written on the disk group
header.
When the package starts up on another node in the cluster, a series of
messages is printed in the package log file, as in the following example
(the hostname of the failed system is ftsys9, and the disk group is dg_01):
check_dg: Error dg_01 may still be imported on ftsys9
Follow the instructions in the message to use the force import option (-C)
to allow the current node to import the disk group. Then deport the disk
group, after which it can be used again by the package. Example:
# vxdg -tfC import dg_01
# vxdg deport dg_01
The force import will clear the host name currently written on the disks
in the disk group, after which you can deport the disk group without
error so it can then be imported by a package running on a different
node.
CAUTION This force import procedure should only be used when you are certain
the disk is not currently being accessed by another node. If you force
import a disk that is already being accessed on another node, data
corruption can result.
• netstat -in - to display LAN status and check to see if the package
IP is stacked on the LAN card.
• lanscan - to see if the LAN is on the primary interface or has
switched to the standby interface.
• arp -a - to check the arp tables.
• lanadmin - to display, test, and reset the LAN cards.
Since your cluster is unique, there are no cookbook solutions to all
possible problems. But if you apply these checks and commands and
work your way through the log files, you will be successful in identifying
and solving problems.
Timeout Problems
The following kinds of messages in a Serviceguard node's syslog file may indicate timeout problems:
Unable to set client version at quorum server
192.6.7.2:reply timed out
Probe of quorum server 192.6.7.2 timed out
These messages could be an indication of an intermittent network, or the default quorum server timeout may not be sufficient. You can set the QS_TIMEOUT_EXTENSION parameter to increase the timeout, or you can increase the heartbeat or node timeout value.
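For example, a sketch of extending the quorum server timeout in the cluster ASCII configuration file (the value of 2,000,000 microseconds is illustrative; check the parameter's documented range and units):

QS_TIMEOUT_EXTENSION 2000000

After editing, re-apply the configuration with cmapplyconf -C as described in "Reconfiguring a Cluster."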
The following kind of message in a Serviceguard node's syslog file indicates that the node did not receive a reply to its lock request on time. This could be because of a delay in communication between the node and the quorum server, or between the quorum server and other nodes in the cluster:
Messages
The coordinator node in Serviceguard sometimes sends a request to the
quorum server to set the lock state. (This is different from a request to
obtain the lock in tie-breaking.) If the quorum server’s connection to one
of the cluster nodes has not completed, the request to set may fail with a
two-line message like the following in the quorum server’s log file:
Oct 008 16:10:05:0: There is no connection to the applicant
2 for lock /sg/lockTest1
Oct 08 16:10:05:0:Request for lock /sg/lockTest1 from
applicant 1 failed: not connected to all applicants.
This condition can be ignored. The request will be retried a few seconds
later and will succeed. The following message is logged:
Oct 008 16:10:06:0: Request for lock /sg/lockTest1
succeeded. New lock owners: 1,2.
A Serviceguard Commands
B Enterprise Cluster Master Toolkit
ECMT includes toolkits for the following internet applications:
• HP Apache
• HP Tomcat
• HP CIFS/9000
ECMT includes toolkits for the following database applications:
• Oracle 9i
• Oracle 10g
• Informix (11i v1 only)
• Sybase (11i v1 only)
• DB2 (11i v1 only)
• Progress (11i v1 only)
A separate NFS toolkit is available. Refer to Managing Highly Available
NFS (HP Part Number B5140-90017) for more information.
Other application integration scripts are available from your HP
representative.
C Designing Highly Available Cluster Applications
Automating Application Operation
Controlling the Speed of Application Failover
Use Checkpoints
Design applications to checkpoint complex transactions. A single
transaction from the user's perspective may result in several actual
database transactions. Although this issue is related to restartable
transactions, here it is advisable to record progress locally on the client
so that a transaction that was interrupted by a system failure can be
completed after the failover occurs.
For example, suppose the application is calculating pi. On the original system, the application has reached the 1,000th decimal place, but has not yet written anything to disk. At that moment, the node crashes. The application is restarted on the second node, but it starts up from scratch and must recalculate those 1,000 decimal places. However, if the application had written its results to disk on a regular basis, it could have restarted from where it left off.
Designing Applications to Run on Multiple Systems
• Older network devices between the source and the destination, such as routers, had to be manually programmed with MAC and IP address pairs. The solution to this problem is to move the MAC address along with the IP address in case of failover.
• Delays of up to 20 minutes could occur while network device caches were updated, due to timeouts associated with systems going down. This is dealt with in current HA software by broadcasting a new ARP translation of the old IP address with the new MAC address.
Use DNS
DNS provides an API that can be used to map hostnames to IP addresses and vice versa. This is useful for BSD socket applications such as telnet, which are first given the target system name. The application must then map the name to an IP address in order to establish a connection. However, some calls should be used with caution.
For TCP stream sockets, the TCP level of the protocol stack resolves this
problem for the client since it is a connection-based protocol. On the
client, TCP ignores the stationary IP address and continues to use the
previously bound relocatable IP address originally used by the client.
With UDP datagram sockets, however, there is a problem. The client
may connect to multiple servers utilizing the relocatable IP address and
sort out the replies based on the source IP address in the server’s
response message. However, the source IP address given in this response
will be the stationary IP address rather than the relocatable application
IP address. Therefore, when creating a UDP socket for listening, the
application must always call bind(2) with the appropriate relocatable
application IP address rather than INADDR_ANY.
If the application cannot be modified as recommended above, a
workaround to this problem is to not use the stationary IP address at all,
and only use a single relocatable application IP address on a given LAN
card. Limitations with this workaround are as follows:
Restoring Client Connections
the retry to the current server should continue for the amount of
time it takes to restart the server locally. This will keep the client
from having to switch to the second server in the event of an application failure.
• Use a transaction processing monitor or message queueing software
to increase robustness.
Use transaction processing monitors such as Tuxedo or DCE/Encina,
which provide an interface between the server and the client.
Transaction processing monitors (TPMs) can be useful in creating a
more highly available application. Transactions can be queued such
that the client does not detect a server failure. Many TPMs provide
for the optional automatic rerouting to alternate servers or for the
automatic retry of a transaction. TPMs also provide for ensuring the
reliable completion of transactions, although they are not the only
mechanism for doing this. After the server is back online, the
transaction monitor reconnects to the new server and continues routing transactions to it.
• Queue Up Requests
As an alternative to using a TPM, queue up requests when the server
is unavailable. Rather than notifying the user when a server is
unavailable, the user request is queued up and transmitted later
when the server becomes available again. Message queueing
software ensures that messages of any kind, not necessarily just
transactions, are delivered and acknowledged.
Message queueing is useful only when the user does not need or expect a response indicating that the request has been completed (i.e., the application is not interactive).
Handling Application Failures
Minimizing Planned Downtime
D Integrating HA Applications with Serviceguard
1. Read the rest of this book, including the chapters on cluster and
package configuration, and the Appendix “Designing Highly
Available Cluster Applications.”
2. Define the cluster's behavior for normal operations:
Checklist for Integrating HA Applications
• Fail one of the systems. For example, turn off the power on node
1. Make sure the package starts up on node 2.
• Repeat the failover from node 2 back to node 1.
2. Be sure to test all combinations of application load during the testing. Repeat the failover processes under different application states, such as heavy user load versus no user load, batch jobs versus online transactions, and so on.
3. Record timelines of the amount of time spent during the failover for each application state. A sample timeline might be 45 seconds to reconfigure the cluster, 15 seconds to run fsck on the file systems, 30 seconds to start the application, and 3 minutes to recover the database.
E Rolling Software Upgrades
You can upgrade the HP-UX operating system and the Serviceguard
software one node at a time without bringing down your clusters. This
process can also be used any time one system needs to be taken offline for
hardware maintenance or patch installations. Until the process of
upgrade is complete on all nodes, you cannot change the cluster
configuration files, and you will not be able to use any of the features of
the new Serviceguard release.
Rolling upgrade is supported with any of the supported revisions of
Serviceguard. You can roll forward from any previous revision to any
higher revision. For example, it is possible to roll from Serviceguard
version A.10.05 on HP-UX 10.10 to version A.11.15 on HP-UX 11.11.
The sections in this appendix are as follows:
Steps for Rolling Upgrades
1. Halt the node you wish to upgrade. This will cause the node's
packages to start up on an adoptive node. In Serviceguard Manager,
select the node; from the Actions menu, choose Administering
Serviceguard, Halt node. Or, on the Serviceguard command line,
issue the cmhaltnode command.
2. Edit the /etc/rc.config.d/cmcluster file to include the following
line:
AUTOSTART_CMCLD = 0
3. Upgrade the node to the available HP-UX release, including
Serviceguard. You can perform other software or hardware upgrades
if you wish (such as installation of VERITAS Volume Manager
software), provided you do not detach any SCSI cabling. Refer to the
section on hardware maintenance in the “Troubleshooting” chapter.
4. Edit the /etc/rc.config.d/cmcluster file to include the following
line:
AUTOSTART_CMCLD = 1
5. Restart the cluster on the upgraded node. In Serviceguard Manager,
select the node; from the Actions menu, choose Administering
Serviceguard, Run node. Or, on the Serviceguard command line,
issue the cmrunnode command.
6. Repeat this process for each node in the cluster.
If the cluster fails before the rolling upgrade is complete (perhaps due to a catastrophic power failure), it can be restarted by entering the cmruncl command from a node that has been upgraded to the latest revision of the software.
Example of Rolling Upgrade
Step 1.
Halt the first node, as follows:
# cmhaltnode -f node1
This will cause PKG1 to be halted cleanly and moved to node 2. The
Serviceguard daemon on node 1 is halted, and the result is shown in
Figure E-2.
Step 2.
Upgrade node 1 to the next operating system release (in this example,
HP-UX 11.00), and install the next version of Serviceguard (11.13), as
shown in Figure E-3.
Step 3.
When upgrading is finished, enter the following command on node 1 to restart the cluster on node 1:
# cmrunnode node1
Step 4.
Repeat the process on node 2. Halt the node, as follows:
# cmhaltnode -f node2
Step 5.
Move PKG2 back to its original node. Use the following commands:
# cmhaltpkg pkg2
# cmrunpkg -n node2 pkg2
# cmmodpkg -e pkg2
Limitations of Rolling Upgrades
F Blank Planning Worksheets
Worksheet for Hardware Planning
===============================================================================
Node Information:
===============================================================================
LAN Information:
===============================================================================
===============================================================================
Disk I/O Information:
Hardware Device
Bus Type ______ Path ______________ File Name ______________
Hardware Device
Bus Type ______ Path ______________ File Name ______________
Hardware Device
Bus Type ______ Path ______________ File Name ______________
Power Supply Worksheet
===============================================================================
SPU Power:
===============================================================================
Disk Power:
===============================================================================
Tape Backup Power:
===============================================================================
Other Power:
Quorum Server Worksheet
==============================================================================
LVM Volume Group and Physical Volume Worksheet
===============================================================================
PV Link 1    PV Link 2
VxVM Disk Group and Disk Worksheet
Cluster Configuration Worksheet
Package Configuration Worksheet
Failover Policy:___________________________
_______________________________________________________________________
_______________________________________________________________________
Package Control Script Worksheet
VG[0]_______________VG[1]________________VG[2]________________
VGCHANGE: ______________________________________________
CVM_DG[0]______________CVM_DG[1]_____________CVM_DG[2]_______________
CVM_ACTIVATION_CMD: ______________________________________________
VXVM_DG[0]_____________VXVM_DG[1]____________VXVM_DG[2]_____________
================================================================================
Logical Volumes and File Systems:
LV[0]_____________FS[0]_____________________________FS_MOUNT_OPT[0]_________
LV[1]______________________FS[1]____________________FS_MOUNT_OPT[1]_________
LV[2]______________________FS[2]____________________FS_MOUNT_OPT[2]_________
================================================================================
Deferred Resources:
G Migrating from LVM to VxVM Data Storage
• Loading VxVM
• Migrating Volume Groups
• Customizing Packages for VxVM
• Customizing Packages for CVM 3.5 and 4.1
• Removing LVM Volume Groups
The emphasis is on the steps you must take to manage the cluster and
packages during migration; detailed instructions for configuring VxVM
disk groups are given in the VERITAS Volume Manager Administrator’s
Guide and the VERITAS Volume Manager Migration Guide at
http://docs.hp.com/. Refer to Chapter 5 if you wish to create basic storage
for a new system starting with fresh disks.
The procedures described below can be carried out while the cluster is
running, but any package that uses a volume group that is being
migrated must be halted. For disk groups that will be used with the
Cluster Volume Manager (CVM), an additional set of steps is provided.
Loading VxVM
Before you can begin migrating data, you must install the VERITAS
Volume Manager software and all required VxVM licenses on all cluster
nodes. This step requires each system to be rebooted, so you must remove each node from the cluster before the installation and restart it afterward. This can be done as part of a rolling upgrade procedure, described in Appendix E.
Details about VxVM installation are provided in the VERITAS Volume
Manager Release Notes, available from http://www.docs.hp.com.
Migrating Volume Groups
1. Halt the package that activates the volume group you wish to
convert to VxVM:
# cmhaltpkg PackageName
2. Activate the LVM volume group in read-only mode:
# vgchange -a r VolumeGroupName
3. Back up the volume group’s data, using whatever means are most
appropriate for the data contained on this volume group. For
example, you might use a backup/restore utility such as Omniback,
or you might use an HP-UX utility such as dd.
4. Back up the volume group configuration:
# vgcfgbackup VolumeGroupName
5. Define the new VxVM disk groups and logical volumes. You will need
to have enough additional disks available to create a VxVM version
of all LVM volume groups. You should create VxVM logical volumes
that have the same general layout as the LVM configuration. For
example, an LVM mirrored volume might have one mirror copy on
one SCSI controller and a second copy on another controller to guard
against a single controller failure disabling an entire volume.
(Physical volume groups are sometimes used in LVM to enforce this
separation.) The same mirroring pattern should be followed in
creating the VxVM plexes, with different plexes configured on disks
that are attached to different buses.
NOTE Remember that the cluster lock disk must be configured on an LVM
volume group and physical volume. If you have a lock volume group
containing data that you wish to move to VxVM, you can do so, but
do not use vxvmconvert, because the LVM header is still required for
the lock disk.
6. Restore the data to the new VxVM disk groups. Use whatever means
are most appropriate for the way in which the data was backed up in
step 3 above.
Customizing Packages for VxVM
• The VXVM_DG[] array. This defines the VxVM disk groups that
are used for this package. The first VxVM_DG[] entry should be in
index 0, the second in 1, etc. For example:
VXVM_DG[0]="dg01"
VXVM_DG[1]="dg02"
• The LV[], FS[] and FS_MOUNT_OPT[] arrays are used the same as they are for LVM. LV[] defines the logical volumes, FS[] defines the mount points, and FS_MOUNT_OPT[] defines any mount options. For example, let's say we have two volumes defined in each of the two disk groups from above: lvol101 and lvol102, and lvol201 and lvol202. These are mounted on /mnt_dg0101 and /mnt_dg0102, and /mnt_dg0201 and /mnt_dg0202, respectively.
/mnt_dg0101 and /mnt_dg0201 are both mounted read-only. The LV[], FS[] and FS_MOUNT_OPT[] entries for these would be as follows:
LV[0]="/dev/vx/dsk/dg01/lvol101"
LV[1]="/dev/vx/dsk/dg01/lvol102"
LV[2]="/dev/vx/dsk/dg02/lvol201"
LV[3]="/dev/vx/dsk/dg02/lvol202"
FS[0]="/mnt_dg0101"
FS[1]="/mnt_dg0102"
FS[2]="/mnt_dg0201"
FS[3]="/mnt_dg0202"
FS_MOUNT_OPT[0]="-o ro"
FS_MOUNT_OPT[1]="-o rw"
FS_MOUNT_OPT[2]="-o ro"
FS_MOUNT_OPT[3]="-o rw"
4. Be sure to copy from the old script any user-specific code that may
have been added, including environment variables and customer
defined functions.
5. Distribute the new package control scripts to all nodes in the cluster.
6. Test to make sure the disk group and data are intact.
7. Deport the disk group:
# vxdg deport DiskGroupName
8. Make the disk group visible to the other nodes in the cluster by
issuing the following command on all other nodes:
# vxdctl enable
9. Restart the package.
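A minimal sketch of restarting the package from the command line, assuming the package is named pkg1 (the name is illustrative):

# cmrunpkg pkg1
# cmmodpkg -e pkg1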
Customizing Packages for CVM 3.5 and 4.1
• The CVM_DG[] array. This defines the CVM disk groups that are
used for this package. The first CVM_DG[] entry should be in
index 0, the second in 1, etc. For example:
CVM_DG[0]="dg01"
CVM_DG[1]="dg02"
• The LV[], FS[] and FS_MOUNT_OPT[] arrays are used the same
as they are for LVM. LV[] defines the logical volumes, FS[]
defines the mount points, and FS_MOUNT_OPT[] defines any
mount options.
For example, let's say we have two volumes defined in each of the two disk groups from above: lvol101 and lvol102, and lvol201 and lvol202. These are mounted on /mnt_dg0101 and /mnt_dg0102, and /mnt_dg0201 and /mnt_dg0202, respectively.
/mnt_dg0101 and /mnt_dg0201 are both mounted read-only. The LV[], FS[] and FS_MOUNT_OPT[] entries for these would be as follows:
LV[0]="/dev/vx/dsk/dg01/lvol101"
LV[1]="/dev/vx/dsk/dg01/lvol102"
LV[2]="/dev/vx/dsk/dg02/lvol201"
LV[3]="/dev/vx/dsk/dg02/lvol202"
FS[0]="/mnt_dg0101"
FS[1]="/mnt_dg0102"
FS[2]="/mnt_dg0201"
FS[3]="/mnt_dg0202"
FS_MOUNT_OPT[0]="-o ro"
FS_MOUNT_OPT[1]="-o rw"
FS_MOUNT_OPT[2]="-o ro"
FS_MOUNT_OPT[3]="-o rw"
4. Be sure to copy from the old script any user-specific code that may
have been added, including environment variables and customer
defined functions.
5. Be sure to uncomment the appropriate CVM_ACTIVATION_CMD
statement to specify the kind of import you wish the package to
perform on the disk group.
6. Distribute the new package control scripts to all nodes in the cluster.
7. Enter each disk group into the package ASCII configuration file
immediately following the HALT_SCRIPT_TIMEOUT parameter. Add
one STORAGE_GROUP definition for each disk group. For the two disk
groups in the previous example, you would enter the following lines:
STORAGE_GROUP dg01
STORAGE_GROUP dg02
11. When CVM starts up, it selects a master node, and this is the node
from which you must issue the disk group configuration commands.
To determine the master node, issue the following command from
each node in the cluster:
# vxdctl -c mode
One node will identify itself as the master.
12. Make the disk group visible to the other nodes in the cluster by
issuing the following command on the master node:
# vxdg -s import DiskGroupName
13. Restart the package.
Removing LVM Volume Groups
H IPv6 Network Support
IPv6 Address Types
Anycast An address for a set of interfaces. In most cases these interfaces belong to different nodes. A packet sent to an anycast address is delivered to one of the interfaces identified by the address. Since the standards for using anycast addresses are still evolving, they are not supported in HP-UX at this time.
• The first form is x:x:x:x:x:x:x:x, where 'x's are the hexadecimal values
of eight 16-bit pieces of the 128-bit address. Example:
2001:fecd:ba23:cd1f:dcb1:1010:9234:4088.
• Some IPv6 addresses may contain long strings of zero bits. To make such addresses easier to represent textually, a special syntax is available. The use of "::" indicates that there are
multiple groups of 16 bits of zeros. The "::" can appear only once in an address, and it can be used to compress leading, trailing, or contiguous sixteen-bit groups of zeros in an address. Example: fec0:1:0:0:0:0:0:1234 can be represented as fec0:1::1234.
• When dealing with a mixed environment of IPv4 and IPv6 nodes, there is an alternative form of IPv6 address that can be used. It is x:x:x:x:x:x:d.d.d.d, where the 'x's are the hexadecimal values of the higher-order 96 bits of the IPv6 address and the 'd's are the decimal values of the 32 lower-order bits. Typically, IPv4-mapped IPv6 addresses and IPv4-compatible IPv6 addresses are represented in this notation. These addresses are discussed in later sections.
Examples:
0:0:0:0:0:0:10.1.2.3
and
::10.11.3.123
Unicast Addresses
IPv6 unicast addresses are classified into different types: aggregatable global unicast addresses, site-local addresses, and link-local addresses. Typically, a unicast address is logically divided as follows:
Table H-2
Example:
::192.168.0.1
Example:
::ffff:192.168.0.1
Table H-5
FP (3 bits) | TLA ID (13 bits) | RES (8 bits) | NLA ID (24 bits) | SLA ID (16 bits) | Interface ID (64 bits)
where
FP = Format prefix. Value of this is "001" for Aggregatable Global
unicast addresses.
TLA ID = Top-level Aggregation Identifier.
RES = Reserved for future use.
NLA ID = Next-Level Aggregation Identifier.
SLA ID = Site-Level Aggregation Identifier.
Interface ID = Interface Identifier.
Link-Local Addresses
Link-local addresses have the following format:
Table H-6
1111111010 (10 bits) | 0 (54 bits) | interface ID (64 bits)
Site-Local Addresses
Site-local addresses have the following format:
Table H-7
Site-local addresses are intended to be used within a site. Routers will not forward any packet with a site-local source or destination address outside the site.
Multicast Addresses
A multicast address is an identifier for a group of nodes. Multicast
addresses have the following format:
Table H-8
Network Configuration Restrictions
IPv6 Relocatable Address and Duplicate Address Detection Feature
# TRANSPORT_NAME[index]=ip6
# NDD_NAME[index]=ip6_nd_dad_solicit_count
# NDD_VALUE[index]=n
where index is the next available index value in the nddconf file, and n is either 1 to turn the feature on or 0 to turn it off.
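For example, a filled-in (uncommented) entry that turns the feature off, assuming 2 is the next available index in the nddconf file (the index is illustrative):

TRANSPORT_NAME[2]=ip6
NDD_NAME[2]=ip6_nd_dad_solicit_count
NDD_VALUE[2]=0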
Local Primary/Standby LAN Patterns
• Since a network interface card can have both an IPv4 and an IPv6
address as the primary IPs, the standby network interface card could
potentially act as a standby for both types of primary interfaces.
However, if the IPv4 and IPv6 address(es) are configured on two
separate network interfaces, then the standby interface can take
over the IP address from only one network interface during a local
failover.
That is, IPv4 and IPv6 addresses from two separate network
interfaces are mutually exclusive in a failover condition.
• Serviceguard will switch over link-local address configured on the
primary network interface along with all other IP addresses which
are configured as part of the cluster configuration to the standby
network interface. This includes all heartbeat and stationary IPs
(IPv4 and IPv6) and package IPs (both IPv4 and IPv6) added by
Serviceguard.
The examples that follow illustrate this.
Example Configurations
An example of a LAN configuration on a cluster node using both IPv4 and IPv6 addresses is shown below.
Following the loss of lan0 or lan2, lan1 can adopt either address, as
shown below.
Figure H-2 Example 1: IPv4 and IPv6 Addresses after Failover to Standby
The same LAN card can be configured with both IPv4 and IPv6 addresses, as shown below.
Figure H-4 Example 2: IPv4 and IPv6 Addresses After Failover to Standby
A ARP messages
Access Control Policies, 178, 193 after switching, 108
Access Control Policy, 161 array
Access roles, 161 replacing a faulty mechanism, 366
active node, 23 arrays
adding a package to a running cluster, 351 disk arrays for data protection, 45
adding cluster nodes ASCII cluster configuration file template, 225
advance planning, 205 ASCII package configuration file template,
adding nodes to a running cluster, 332 274
adding nodes while the cluster is running, auto port aggregation
344 define, 109
adding packages on a running cluster, 287 AUTO_RUN
additional package resource in sample ASCII package configuration file,
parameter in package configuration, 176, 274
177 parameter in package configuration, 171
additional package resources AUTO_RUN parameter, 289
monitoring, 83 AUTO_START_TIMEOUT
addressing, SCSI, 140 in sample configuration file, 226
administration parameter in cluster manager
adding nodes to a ruuning cluster, 332 configuration, 160
cluster and package states, 313 automatic failback
halting a package, 336 configuring with failover policies, 80
halting the entire cluster, 334 automatic restart of cluster, 66
moving a package, 337 automatic switching
of packages and services, 335 parameter in package configuration, 171
of the cluster, 330 automatically restarting the cluster, 334
reconfiguring a package while the cluster is automating application operation, 408
running, 350 autostart delay
parameter in the cluster configuration file,
reconfiguring a package with the cluster
160
offline, 349
autostart for clusters
reconfiguring the cluster, 342
setting up, 264
removing nodes from operation in a ruuning
cluster, 332
responding to cluster events, 357 B
reviewing configuration files, 375, 376 backing up cluster lock information, 202
starting a cluster when all nodes are down, binding
330 in network applications, 419
starting a package, 335 bootstrap cmclnodelist, 193
bridged net
troubleshooting, 373 defined, 38
adoptive node, 23
for redundancy in network interfaces, 38
alternate Links
creating volume groups with, 213 building a cluster
ASCII cluster configuration file template,
APA
auto port aggregation, 109 225
applications CFS infrastructure, 240
automating, 408 cluster configuration steps, 224
checklist of steps for integrating with CVM infrastructure, 251
Serviceguard, 429 identifying cluster lock volume group, 231
handling failures, 424 identifying heartbeat subnets, 233
writing HA services for networks, 409 identifying quorum server, 232
481
logical volume infrastructure, 209 and power supplies, 53
verifying the cluster configuration, 235 backup up lock data, 202
VxVM infrastructure, 218 dual lock disk, 70
bus type identifying in configuration file, 231, 232
hardware planning, 141 no locks, 71
single lock disk, 69
C storing configuration data, 238
CFS two nodes, 68
Creating a storage infrastructure, 240 use in re-forming a cluster, 68
creating a storage infrastructure, 240 cluster manager
changes in cluster membership, 66 automatic restart of cluster, 66
changes to cluster allowed while the cluster blank planning worksheet, 451
is running, 341 cluster node parameter, 156, 157
changes to packages allowed while the cluster volume group parameter, 161
cluster is running, 354 defined, 64
changing the volume group configuration dynamic re-formation, 66
while the cluster is running, 347 heartbeat interval parameter, 159
checkpoints, 413 heartbeat subnet parameter, 157
client connections
initial configuration of the cluster, 64
restoring in applications, 422
cluster main functions, 64
configuring with commands, 225 maximum configured packages parameter,
redundancy of components, 36 161
Serviceguard, 22 monitored non-heartbeat subnet, 158
typical configuration, 21 network polling interval parameter, 160
understanding components, 36 node timeout parameter, 159
cluster administration, 330 physical lock volume parameter, 157
solving problems, 380 planning the configuration, 156
cluster and package maintenance, 309 quorum server parameter, 156
cluster configuration quorum server polling interval parameter,
creating with SAM or Commands, 224 156
file on all nodes, 64 quorum server timeout extension
identifying cluster lock volume group, 231 parameter, 156
identifying cluster-aware volume groups, serial device file parameter, 159
233 testing, 361
planning, 154 Cluster monitoring, 260
planning worksheet, 162 cluster node
sample diagram, 135 parameter in cluster manager
verifying the cluster configuration, 235 configuration, 156, 157
cluster configuration file, 226 startup and shutdown OPS instances, 289
Autostart Delay parameter cluster parameters
(AUTO_START_TIMEOUT), 160 initial configuration, 64
cluster coordinator cluster re-formation time , 155
defined, 64 cluster startup
cluster lock, 67 manual, 66
4 or more nodes, 71 cluster volume group
and cluster re-formation time, 155 creating physical volumes, 211
482
parameter in cluster manager configuration, 161
cluster with high availability disk array
figure, 49, 50
CLUSTER_NAME (cluster name)
in sample configuration file, 226
clusters
active/standby type, 54
larger size, 54
cmapplyconf, 237, 305
cmassistd daemon, 59
cmcheckconf, 236, 304
troubleshooting, 377
cmclconfd daemon, 58, 59
cmcld daemon, 58, 59
cmclnodelist bootstrap file, 193
cmdeleteconf
deleting a package configuration, 351
deleting the cluster configuration, 266
cmfileassistd daemon, 58, 60
cmlogd daemon, 59, 60
cmlvmd daemon, 59, 60
cmmodnet
assigning IP addresses in control scripts, 101
cmnetassist daemon, 62
cmnetassistd daemon, 59
cmomd daemon, 59, 60
cmquerycl
troubleshooting, 377
cmsnmpd daemon, 59, 61
cmsrvassistd daemon, 61
cmvxd daemon, 59
cmvxd for CVM and CFS, 63
cmvxping for CVM and CFS, 63
cmvxpingd daemon, 59
command fan-out with DSAU, 32
CONCURRENT_FSCK_OPERATIONS
parameter in package control script, 183
CONCURRENT_MOUNT_OPERATIONS
parameter in package control script, 183
CONCURRENT_VGCHANGE_OPERATIONS
parameter in package control script, 182
configuration
ASCII cluster configuration file template, 225
ASCII package configuration file template, 274
basic tasks and steps, 33
cluster planning, 154
of the cluster, 64
package, 269
package planning, 164
service, 269
configuration file
for cluster manager, 64
troubleshooting, 375, 376
Configuration synchronization, 258
Configuration synchronization with DSAU, 32
configuration with dual attach FDDI stations
figure, 41
Configuring clusters with Serviceguard command line, 225
Configuring clusters with Serviceguard Manager, 224
configuring failover packages, 273
configuring multi-node packages, 272
configuring packages and their services, 269
configuring system multi-node packages, 271
Consolidation, log, 259
control script
adding customer defined functions, 302
creating with commands, 288
creating with SAM, 288
in package configuration, 288
pathname parameter in package configuration, 173
support for additional products, 303
troubleshooting, 376
controlling the speed of application failover, 410
creating the package configuration, 270
customer defined functions
adding to the control script, 302
CVM, 117, 119
creating a storage infrastructure, 251
planning, 152
use of the VxVM-CVM-pkg, 253
CVM planning
Version 3.5, 166
Version 4.1 with CFS, 166
Version 4.1 without CFS, 166
CVM_ACTIVATION_CMD, 180, 181
in package control script, 291
CVM_DG
in package control script, 291
D
data
disks, 44
data congestion, 65
databases
toolkits, 405
deactivating volume groups, 215
deciding when and where to run packages, 75
deferred resource name, 184, 185
deleting a package configuration
using cmdeleteconf, 351
deleting a package from a running cluster, 351
deleting nodes while the cluster is running, 345, 348
deleting the cluster configuration
using cmdeleteconf, 266
dependencies
configuring, 178
designing applications to run on multiple systems, 415
detecting failures
in network manager, 103
disk
choosing for volume groups, 211
data, 44
interfaces, 44
mirroring, 45
root, 44
sample configurations, 47, 49
disk arrays
creating volume groups with PV links, 213
disk enclosures
high availability, 46
disk failure
protection through mirroring, 23
disk group
planning, 152
disk group and disk planning, 152
disk I/O
hardware planning, 141
disk layout
planning, 149
disk logical units
hardware planning, 141
disk management, 113
disk monitor, 46
disk monitor (EMS), 84
disk storage
creating the infrastructure with CFS, 240
creating the infrastructure with CVM, 251
disk types supported by Serviceguard, 44
disks
in Serviceguard, 44
replacing, 366
disks, mirroring, 45
Distributed Systems Administration Utilities, 32, 306
distributing the cluster and package configuration, 304, 305
DNS services, 198
down time
minimizing planned, 426
DSAU, 32, 258, 259, 260
using during configuration, 258
DTC
using with Serviceguard, 185
dual attach FDDI stations, 41
dual cluster locks
choosing, 70
dynamic cluster re-formation, 66
E
eight-node active/standby cluster
figure, 55
eight-node cluster with disk array
figure, 56
EMS
for disk monitoring, 46
for preventive monitoring, 363, 364
monitoring package resources with, 83
using the EMS HA monitors, 84
enclosure for disks
replacing a faulty mechanism, 366
enclosures
high availability, 46
Ethernet
redundant configuration, 38
Event Monitoring Service
for disk monitoring, 46
in troubleshooting, 363, 364
event monitoring service
using, 83
expanding the cluster
planning ahead, 133
expansion
planning for, 169
F
failback policy
package configuration file parameter, 170
used by package manager, 80
FAILBACK_POLICY parameter
in package configuration file, 170
used by package manager, 80
failover
controlling the speed in applications, 410
defined, 23
failover behavior
in packages, 85
failover package, 73
failover packages
configuring, 273
failover policy
package configuration parameter, 170
used by package manager, 77
FAILOVER_POLICY parameter
in package configuration file, 170
used by package manager, 77
failure
kinds of responses, 126
network communication, 129
response to hardware failures, 127
responses to package and service failures, 128
restarting a service after failure, 128
failures
of applications, 424
figures
cluster with high availability disk array, 49, 50
configuration with dual attach FDDI stations, 41
eight-node active/standby cluster, 55
eight-node cluster with EMC disk array, 56
mirrored disks connected for high availability, 48
node 1 rejoining the cluster, 440
node 1 upgraded to HP-UX 10.01, 439
primary disk and mirrors on different buses, 52
redundant FDDI configuration, 40
redundant LANs, 39
root disks on different shared buses, 51
running cluster after upgrades, 441
running cluster before rolling upgrade, 438
running cluster with packages moved to node 1, 440
running cluster with packages moved to node 2, 439
sample cluster configuration, 135
serial (RS232) heartbeat line, 42
typical cluster after failover, 24
typical cluster configuration, 21
file locking, 421
file system
in control script, 181
file systems
creating for a cluster, 212, 221, 256
planning, 149
Filesystem mount retry count, 182
Filesystem unmount count, 182
FIRST_CLUSTER_LOCK_PV
in sample configuration file, 226
parameter in cluster manager configuration, 157, 158
FIRST_CLUSTER_LOCK_VG
in sample configuration file, 226
floating IP address
defined, 101
floating IP addresses, 101
in Serviceguard, 101
FS
in sample package control script, 291
FS_MOUNT_OPT
in sample package control script, 291
G
GAB for CVM and CFS, 62
general planning, 133
gethostbyname
and package IP addresses, 101
gethostbyname(), 417
H
HA
disk enclosures, 46
HA monitors (EMS), 84
HALT_SCRIPT
in sample ASCII package configuration file, 274
parameter in package configuration, 173
HALT_SCRIPT_TIMEOUT (halt script timeout)
in sample ASCII package configuration file, 274
parameter in package configuration, 173
halting a cluster, 334
halting a package, 336
halting the entire cluster, 334
handling application failures, 424
hardware
blank planning worksheet, 444
monitoring, 363
hardware failures
response to, 127
hardware for OPS on HP-UX
power supplies, 53
hardware planning
Disk I/O Bus Type, 141
disk I/O information for shared disks, 141
host IP address, 137, 147, 148
host name, 136
I/O bus addresses, 141
I/O slot numbers, 141
LAN information, 136
LAN interface name, 137, 147
LAN traffic type, 137
memory capacity, 136
number of I/O slots, 136
planning the configuration, 135
RS232 heartbeat line, 139
S800 series number, 136
SPU information, 136
subnet, 137, 147
worksheet, 142
heartbeat
RS232 line, 139
heartbeat interval
parameter in cluster manager configuration, 159
heartbeat line
configuring RS232, 139
heartbeat lines, serial, 42
heartbeat messages, 23
defined, 64
heartbeat subnet address
parameter in cluster manager configuration, 157
HEARTBEAT_INTERVAL
in sample configuration file, 226
HEARTBEAT_INTERVAL (heartbeat timeout)
parameter in cluster manager configuration, 159
HEARTBEAT_IP
in sample configuration file, 226
parameter in cluster manager configuration, 157
high availability, 22
HA cluster defined, 36
objectives in planning, 133
host IP address
hardware planning, 137, 147, 148
host name
hardware planning, 136
how the cluster manager works, 64
how the network manager works, 101
HP, 117
HP Predictive monitoring
in troubleshooting, 364
I
I/O bus addresses
hardware planning, 141
I/O slots
hardware planning, 136, 141
identifying cluster-aware volume groups, 233
in-line terminator
permitting online hardware maintenance, 367
Installing Serviceguard, 208
installing software
quorum server, 206
integrating HA applications with Serviceguard, 429
internet
toolkits, 405
introduction
Serviceguard at a glance, 22
IP
in sample package control script, 291
IP address array variable in package control script, 183
IP address
adding and deleting in packages, 102
for nodes and packages, 101
hardware planning, 137, 147, 148
portable, 101
reviewing for packages, 373
switching, 76, 77, 108
J
JFS, 411
K
kernel consistency
in cluster configuration, 194, 195, 203
L
LAN
heartbeat, 64
interface name, 137, 147
planning information, 136
LAN failure
Serviceguard behavior, 36
LAN interfaces
monitoring with network manager, 103
primary and secondary, 38
LAN planning
host IP address, 137, 147, 148
traffic type, 137
larger clusters, 54
link-level addresses, 417
LLT for CVM and CFS, 62
load sharing with IP addresses, 102
local switching, 104
parameter in package configuration, 172
LOCAL_LAN_FAILOVER_ALLOWED
in sample ASCII package configuration file, 274
parameter in package configuration, 172
lock
cluster locks and power supplies, 53
use of the cluster lock disk, 68
use of the quorum server, 70
lock disk
4 or more nodes, 69
lock volume group
identifying in configuration file, 231
planning, 155
lock volume group, reconfiguring, 342
Log consolidation, 259
log consolidation with DSAU, 32
logical volumes
blank planning worksheet, 450
creating for a cluster, 212, 220, 255
creating the infrastructure, 209, 218
planning, 149
worksheet, 150, 153
lssf
using to obtain a list of disks, 211
LV
in sample package control script, 291
lvextend
creating a root mirror with, 201
LVM, 117, 118
commands for cluster use, 209
creating a root mirror, 200
disks, 44
migrating to VxVM, 457
planning, 149
setting up volume groups on another node, 215
LVM configuration
worksheet, 150, 153
LVM_ACTIVATION_CMD, 180
M
MAC addresses, 417
managing the cluster and nodes, 330
manual cluster startup, 66
MAX_CONFIGURED_PACKAGES
parameter in cluster manager configuration, 161
maximum number of nodes, 36
membership change
reasons for, 66
memory capacity
hardware planning, 136
memory requirements
lockable memory for Serviceguard, 133
minimizing planned down time, 426
mirror copies of data
protection against disk failure, 23
MirrorDisk/UX, 45
mirrored disks connected for high availability
figure, 48
mirroring
disks, 45
mirroring disks, 45
mkboot
creating a root mirror with, 201
monitor cluster with Serviceguard commands, 261
monitor clusters with Serviceguard Manager, 261
monitored non-heartbeat subnet
parameter in cluster manager configuration, 158
monitored resource failure
Serviceguard behavior, 36
monitoring hardware, 363
monitoring LAN interfaces
in network manager, 103
Monitoring, cluster, 260
mount options
in control script, 181
moving a package, 337
multi-node package, 73
multi-node package configuration, 272
multi-node packages
configuring, 272
multiple systems
designing applications for, 415
N
name resolution services, 198
network
adding and deleting package IP addresses, 102
failure, 105
load sharing with IP addresses, 102
local interface switching, 104
local switching, 105
redundancy, 38, 42
remote system switching, 108
network communication failure, 129
network components
in Serviceguard, 38
network failure detection
INONLY_OR_INOUT, 103
INOUT, 103
Network Failure Detection parameter, 103
network manager
adding and deleting package IP addresses, 102
main functions, 101
monitoring LAN interfaces, 103
testing, 361
network planning
subnet, 137, 147
network polling interval (NETWORK_POLLING_INTERVAL)
parameter in cluster manager configuration, 160
network time protocol (NTP)
for clusters, 203
NETWORK_INTERFACE
in sample configuration file, 226
NETWORK_POLLING_INTERVAL (network polling interval)
in sample configuration file, 226
networking
redundant subnets, 137
networks
binding to IP addresses, 419
binding to port addresses, 419
IP addresses and naming, 415
node and package IP addresses, 101
packages using IP addresses, 417
supported types, 38
writing network applications as HA services, 409
no cluster locks
choosing, 71
node
basic concepts, 36
in Serviceguard cluster, 22
IP addresses, 101
node types
active, 23
primary, 23
NODE_FAIL_FAST_ENABLED
in sample ASCII package configuration file, 274
parameter in package configuration, 172
NODE_FAILFAST_ENABLED parameter, 289
NODE_NAME
in sample ASCII package configuration file, 274
parameter in cluster manager configuration, 156, 157
NODE_TIMEOUT (heartbeat timeout)
in sample configuration file, 226
NODE_TIMEOUT (node timeout)
parameter in cluster manager configuration, 159
nodetypes
primary, 23
NTP
time protocol for clusters, 203
O
online hardware maintenance
by means of in-line SCSI terminators, 367
OPS
startup and shutdown instances, 289
optimizing packages for large numbers of storage units, 292
outages
insulating users from, 408
P
package
adding and deleting package IP addresses, 102
basic concepts, 36
changes allowed while the cluster is running, 354
halting, 336
local interface switching, 104
moving, 337
reconfiguring while the cluster is running, 350
reconfiguring with the cluster offline, 349
remote switching, 108
starting, 335
toolkits for databases, 405
package administration, 335
solving problems, 380
package administration access, 178
package and cluster maintenance, 309
package configuration
additional package resource parameter, 176, 177
automatic switching parameter, 171
control script pathname parameter, 171
distributing the configuration file, 304, 305
failback policy parameter, 170
failover policy parameter, 170
in SAM, 270
local switching parameter, 172
multi-node packages, 272
package failfast parameter, 172
package name parameter, 170
package type parameter, 176
planning, 164
resource polling interval parameter, 177
resource up parameter, 177
run and halt script timeout parameters, 173
service fail fast parameter, 175
service halt timeout parameter, 176
service name parameter, 174, 175
step by step, 269
subnet parameter, 176
system multi-node packages, 271
using Serviceguard commands, 271
verifying the configuration, 304, 305
writing the package control script, 288
package configuration file, 274
package control script
generating with commands, 288
IP addresses, 182, 183
service command, 184
service name, 183
service restart variable, 184
subnets, 182, 183
worksheet, 185
package coordinator
defined, 65
package failfast
parameter in package configuration, 172
package failover behavior, 85
package failures
responses, 128
package IP address
defined, 101
package IP addresses, 101
defined, 101
reviewing, 373
package manager
blank planning worksheet, 453, 454
testing, 360
package name
parameter in package configuration, 170
package switching behavior
changing, 339, 340
package type
parameter in package configuration, 176
Package types, 22
failover, 22
multi-node, 22
system multi-node, 22
package types, 22
PACKAGE_NAME
in sample ASCII package configuration file, 274
parameter in package ASCII configuration file, 170
PACKAGE_TYPE
parameter in package ASCII configuration file, 176
packages
deciding where and when to run, 75
launching OPS instances, 289
parameter
AUTO_RUN, 289
NODE_FAILFAST_ENABLED, 289
parameters
for failover, 85
parameters for cluster manager
initial configuration, 64
PATH, 180
performance
optimizing packages for large numbers of storage units, 292
performance variables in package control script, 182, 183
physical volume
for cluster lock, 68
parameter in cluster lock configuration, 157
physical volumes
creating for clusters, 211
filled in planning worksheet, 448
planning, 149
worksheet, 150, 153
planning
cluster configuration, 154
cluster lock and cluster expansion, 155
cluster manager configuration, 156
disk groups and disks, 152
disk I/O information, 141
for expansion, 169
hardware configuration, 135
high availability objectives, 133
LAN information, 136
overview, 131
package configuration, 164
power, 144
quorum server, 147
SCSI addresses, 140
SPU information, 136
volume groups and physical volumes, 149
worksheets, 142
worksheets for physical volume planning, 448
planning and documenting an HA cluster, 131
planning for cluster expansion, 133
planning worksheets
blanks, 443
point of failure
in networking, 38, 42
point to point connections to storage devices, 55
ports
dual and single aggregated, 110
power planning
power sources, 144
worksheet, 145
power supplies
blank planning worksheet, 445
power supply
and cluster lock, 53
blank planning worksheet, 446
UPS for OPS on HP-UX, 53
Predictive monitoring, 364
primary disks and mirrors on different buses
figure, 52
primary LAN interfaces
defined, 38
primary network interface, 38
primary node, 23
PV Links
creating volume groups with, 213
pvcreate
creating a root mirror with, 200
PVG-strict mirroring
creating volume groups with, 211
Q
qs daemon, 59, 61
QS_HOST
parameter in cluster manager configuration, 156
QS_POLLING_INTERVAL
parameter in cluster manager configuration, 156
QS_TIMEOUT_EXTENSION
parameter in cluster manager configuration, 156
quorum server
blank planning worksheet, 447
installing, 206
parameters in cluster manager configuration, 156
planning, 147
status and state, 317
use in re-forming a cluster, 70
worksheet, 148
R
RAID
for data protection, 45
raw volumes, 411
README
for database toolkits, 405
reconfiguring a package
while the cluster is running, 350
reconfiguring a package with the cluster offline, 349
reconfiguring a running cluster, 343
reconfiguring the entire cluster, 342
reconfiguring the lock volume group, 342
recovery time, 154
redundancy
in networking, 38, 42
of cluster components, 36
redundancy in network interfaces, 38
redundant Ethernet configuration, 38
redundant FDDI configuration
figure, 40
redundant FDDI connections, 40
redundant LANs
figure, 39
redundant networks
for heartbeat, 23
re-formation
of cluster, 66
re-formation time, 155
relocatable IP address
defined, 101
relocatable IP addresses, 101
in Serviceguard, 101
remote switching, 108
removing nodes from operation in a running cluster, 332
removing packages on a running cluster, 287
Removing Serviceguard from a system, 358
replacing disks, 366
Resource Name
parameter in package configuration, 176, 177
resource polling interval
parameter in package configuration, 177
resource up interval
parameter in package configuration, 177
RESOURCE_NAME
in sample ASCII package configuration file, 274
parameter in package configuration, 176, 177
RESOURCE_POLLING_INTERVAL
in sample ASCII package configuration file, 274
parameter in package configuration, 177
RESOURCE_UP_VALUE
in sample ASCII package configuration file, 274
parameter in package configuration, 177
resources
disks, 44
responses
to cluster events, 357
to package and service failures, 128
responses to failures, 126
responses to hardware failures, 127
restart
automatic restart of cluster, 66
following failure, 128
SERVICE_RESTART variable in package control script, 184
restartable transactions, 412
restarting the cluster automatically, 334
restoring client connections in applications, 422
retry count, 182
rhosts file
for security, 190
rolling software upgrades, 435
example, 438
steps, 436
rolling upgrade
limitations, 442
root disk limitations on shared buses, 50
root disks on different shared buses
figure, 51
root mirror
creating with LVM, 200
rotating standby
configuring with failover policies, 78
setting package policies, 78
RS232 connection
for heartbeats, 139
RS232 heartbeat line, configuring, 139
RS232 serial heartbeat line, 42
RS232 status, viewing, 324
RUN_SCRIPT
in sample ASCII package configuration file, 274
parameter in package configuration, 173
RUN_SCRIPT_TIMEOUT
in sample ASCII package configuration file, 274
RUN_SCRIPT_TIMEOUT (run script timeout)
parameter in package configuration, 173
running cluster
adding or removing packages, 287
S
SAM
using to configure packages, 270
sample cluster configuration
figure, 135
sample disk configurations, 47, 49
SCSI addressing, 140, 155
SECOND_CLUSTER_LOCK_PV
parameter in cluster manager configuration, 157, 158
security
editing files, 190
serial (RS232) heartbeat line, 42
figure, 42
serial heartbeat connections
identifying, 234
serial port
using for heartbeats, 139
SERIAL_DEVICE_FILE(RS232)
parameter in cluster manager configuration, 159
service administration, 335
service command
variable in package control script, 184, 185
service configuration
step by step, 269
service fail fast
parameter in package configuration, 175
service failures
responses, 128
service halt timeout
parameter in package configuration, 176
service name, 183
parameter in package configuration, 174, 175
variable in package control script, 183
service restart parameter
variable in package control script, 184
service restarts, 128
SERVICE_CMD
array variable in package control script, 184, 185
in sample package control script, 291
SERVICE_FAIL_FAST_ENABLED
in sample ASCII package configuration file, 274
parameter in package configuration, 175
SERVICE_HALT_TIMEOUT
in sample ASCII package configuration file, 274
parameter in package configuration, 176
SERVICE_NAME
array variable in package control script, 183
in sample ASCII package configuration file, 274
in sample package control script, 291
parameter in package configuration, 174, 175
SERVICE_RESTART
array variable in package control script, 184
in sample package control script, 291
Serviceguard
install, 208
introduction, 22
Serviceguard at a glance, 21
Serviceguard behavior after monitored resource failure, 36
Serviceguard behavior in LAN failure, 36
Serviceguard behavior in software failure, 36
Serviceguard commands
to configure a package, 271
ServiceGuard Manager
overview, 26
Serviceguard Manager, 30
SG-CFS-DG-id# multi-node package, 166
SG-CFS-MP-id# multi-node package, 166
SG-CFS-pkg system multi-node package, 166
SGCONF, 189
shared disks
planning, 141
shutdown and startup
defined for applications, 409
single cluster lock
choosing, 69
single point of failure
avoiding, 22
single-node operation, 265
size of cluster
preparing for changes, 205
SMN package, 73
SNA applications, 421
software failure
Serviceguard behavior, 36
software planning
CVM and VxVM, 152
LVM, 149
solving problems, 380
SPU information
planning, 136
standby LAN interfaces
defined, 38
standby network interface, 38
starting a package, 335
startup and shutdown
defined for applications, 409
startup of cluster
manual, 66
when all nodes are down, 330
state
of cluster and package, 313
stationary IP addresses, 101
STATIONARY_IP
parameter in cluster manager configuration, 158
status
cmviewcl, 312
multi-node packages, 312
of cluster and package, 313
package IP address, 373
system log file, 374
stopping a cluster, 334
storage management, 113
SUBNET
array variable in package control script, 182, 183
in sample ASCII package configuration file, 274
in sample package control script, 291
parameter in package configuration, 176
subnet
hardware planning, 137, 147
parameter in package configuration, 176
supported disks in Serviceguard, 44
switching
ARP messages after switching, 108
local interface switching, 104
remote system switching, 108
switching IP addresses, 76, 77, 108
Synchronizing configuration, 258
system log file
troubleshooting, 374
system message
changing for clusters, 265
system multi-node package, 73
used with CVM, 253
system multi-node package configuration, 271
system multi-node packages
configuring, 271
T
tasks in Serviceguard configuration, 33
template
ASCII cluster configuration file, 225
ASCII package configuration file, 274
testing
cluster manager, 361
network manager, 361
package manager, 360
testing cluster operation, 360
time protocol (NTP)
for clusters, 203
TOC
when a node fails, 126
toolkits
for databases, 405
traffic type
LAN hardware planning, 137
troubleshooting
approaches, 373
monitoring hardware, 363
replacing disks, 366
reviewing control scripts, 376
reviewing package IP addresses, 373
reviewing system log file, 374
using cmquerycl and cmcheckconf, 377
troubleshooting your cluster, 359
typical cluster after failover
figure, 24
typical cluster configuration
figure, 21
U
uname(2), 418
unmount count, 182
UPS
in power planning, 144
power supply for OPS on HP-UX, 53
use of the cluster lock, 68, 70
USER_HOST, 161
USER_NAME, 161
USER_ROLE, 161
V
verifying cluster configuration, 235
verifying the cluster and package configuration, 304, 305
VERITAS, 117
VERITAS CFS components, 62
VERITAS disk group packages
creating, 243
VERITAS mount point packages
creating, 244
VERITAS system multi-node packages, 241
VG
in sample package control script, 291
vgcfgbackup
and cluster lock data, 238
VGCHANGE
in package control script, 291
vgextend
creating a root mirror with, 201
vgimport
using to set up volume groups on another node, 216
viewing RS232 status, 324
Volume, 113
volume group
creating for a cluster, 211, 213
creating physical volumes for clusters, 211
deactivating before export to another node, 215
for cluster lock, 68
planning, 149
setting up on another node with LVM Commands, 215
worksheet, 150, 153
volume group and physical volume planning, 149
Volume groups
in control script, 181
volume managers, 113
comparison, 121
CVM, 119
LVM, 118
migrating from LVM to VxVM, 457
VxVM, 118
VOLUME_GROUP
in sample configuration file, 226
parameter in cluster manager configuration, 161
vxfend for CVM and CFS, 63
VxM-CVM-pkg system multi-node package, 166
VxVM, 117, 118
creating a storage infrastructure, 218
migrating from LVM to VxVM, 457
planning, 152
VXVM_DG
in package control script, 291
VxVM-CVM package, 74
VxVM-CVM-pkg, 253
W
What is Serviceguard?, 22
worksheet
cluster configuration, 162
hardware configuration, 142
package configuration data, 179
package control script, 185
power supply configuration, 145
quorum server configuration, 148
use in planning, 131
volume group and physical volumes, 150, 153
worksheets
physical volume planning, 448
worksheets for planning
blanks, 443