IBM System Storage SAN Volume Controller and Storwize V7000 Best Practices and Performance Guidelines
Jon Tate
Pawel Brodacki
Tilak Buneti
Christian Burns
Jana Jamsek
Erez Kirson
Marcin Tabinowski
Bosmat Tuv-El
ibm.com/redbooks
SG24-7521-03
Note: Before using this information and the product it supports, read the information in "Notices" on page xv.
Copyright International Business Machines Corporation 2008, 2014. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
Contents
Notices
Trademarks
IntelliMagic
IBM Redbooks promotions
Preface
The team who wrote this book
Now you can become a published author, too!
Comments welcome
Stay connected to IBM Redbooks
Summary of changes
September 2014, Fourth Edition
Part 1. Configuration guidelines and preferred practices
Chapter 1. Updates in IBM System Storage SAN Volume Controller
1.1 Enhancements and changes in SAN Volume Controller V5.1
1.2 Enhancements and changes in SAN Volume Controller V6.1
1.3 Enhancements and changes in SAN Volume Controller V6.2
1.4 Enhancements and changes in SAN Volume Controller V6.3
1.5 Enhancements and changes in SAN Volume Controller V6.4
1.6 Enhancements and changes in SAN Volume Controller V7.1
1.7 Enhancements and changes in SAN Volume Controller V7.2
Chapter 2. SAN topology
2.1 SAN topology of the SAN Volume Controller/Storwize
2.1.1 Redundancy
2.1.2 Topology basics
2.1.3 ISL oversubscription
2.1.4 Single switch SAN Volume Controller/Storwize SANs
2.1.5 Basic core-edge topology
2.1.6 Four-SAN, core-edge topology
2.1.7 Common topology issues
2.1.8 Stretched Cluster
2.1.9 Enhanced Stretched Cluster
2.2 SAN switches
2.2.1 Selecting SAN switch models
2.2.2 Switch port layout for large SAN edge switches
2.2.3 Switch port layout for director-class SAN switches
2.2.4 Virtual channels
2.2.5 IBM System Storage and IBM b-type SANs
2.2.6 IBM System Storage and Cisco SANs
2.2.7 SAN routing and duplicate worldwide node names
2.3 Zoning
2.3.1 Types of zoning
2.3.2 Prezoning tips and shortcuts
2.3.3 SAN Volume Controller internode communications zone
Chapter 8. Hosts
8.1 Configuration guidelines
8.1.1 Host levels and host object name
8.1.2 The number of paths
Related publications
IBM Redbooks publications
Other resources
Referenced websites
Help from IBM
Index
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Any performance data contained herein was determined in a controlled environment. Therefore, the results
obtained in other operating environments may vary significantly. Some measurements may have been made
on development-level systems and there is no guarantee that these measurements will be the same on
generally available systems. Furthermore, some measurements may have been estimated through
extrapolation. Actual results may vary. Users of this document should verify the applicable data for their
specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. These and other IBM trademarked terms are
marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US
registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
AIX
alphaWorks
BladeCenter
Cognos
DB2
developerWorks
DS4000
DS6000
DS8000
Easy Tier
Enterprise Storage Server
eServer
FlashCopy
FlashSystem
GPFS
HACMP
IBM
IBM FlashSystem
IBM Flex System
Nextra
POWER
PowerHA
PowerVM
Real-time Compression
Redbooks
Redpaper
Redbooks (logo)
SPONSORSHIP PROMOTION
IntelliMagic
1-877-815-3799
www.intellimagic.com
THE ABOVE IS A PAID PROMOTION. IT DOES NOT CONSTITUTE AN ENDORSEMENT OF ANY OF THE ABOVE
COMPANY'S PRODUCTS, SERVICES OR WEBSITES BY IBM. NOR DOES IT REFLECT THE OPINION OF IBM, IBM
MANAGEMENT, SHAREHOLDERS OR OFFICERS. IBM DISCLAIMS ANY AND ALL WARRANTIES FOR GOODS OR
SERVICES RECEIVED THROUGH OR PROMOTED BY THE ABOVE COMPANY.
Preface
This IBM Redbooks publication captures several of the preferred practices that are based
on field experience and describes the performance gains that can be achieved by
implementing the IBM System Storage SAN Volume Controller and Storwize V7000 V7.2.
This book begins with a look at the latest developments with SAN Volume Controller and
Storwize V7000 and reviews the changes in the previous versions of the product. It highlights
configuration guidelines and preferred practices for the storage area network (SAN) topology,
clustered system, back-end storage, storage pools and managed disks, volumes, remote
copy services, and hosts. Then, this book provides performance guidelines for SAN Volume
Controller, back-end storage, and applications. It explains how you can optimize disk
performance with the IBM System Storage Easy Tier function. Next, it provides preferred
practices for monitoring, maintaining, and troubleshooting SAN Volume Controller and
Storwize V7000. Finally, this book highlights several scenarios that demonstrate the preferred
practices and performance guidelines.
This book is intended for experienced storage, SAN, and SAN Volume Controller
administrators and technicians. Before reading this book, you must have advanced
knowledge of the SAN Volume Controller and Storwize V7000 and SAN environments. For
more information, see the following publications:
Implementing the IBM System Storage SAN Volume Controller V7.2, SG24-7933
Implementing the IBM Storwize V7000 V7.2, SG24-7938
Real-time Compression in SAN Volume Controller and Storwize V7000, REDP-4859
IBM SAN Volume Controller and IBM FlashSystem 820: Best Practices and Performance
Capabilities, REDP-5027
Implementing the IBM SAN Volume Controller and FlashSystem 820, SG24-8172
Introduction to Storage Area Networks and System Networking, SG24-5470
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online "Contact us" review Redbooks form found at:
http://www.ibm.com/redbooks
Send your comments in an email to:
redbooks@us.ibm.com
Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Summary of changes
This section describes the technical changes that were made in this edition of the book and in
previous editions. This edition might also include minor corrections and editorial changes that
are not identified.
Summary of Changes
for SG24-7521-03
for Best Practices and Performance Guidelines
as created or updated on January 30, 2015.
Part 1. Configuration guidelines and preferred practices
This part describes the latest developments for IBM System Storage SAN Volume Controller
V7.2 and reviews the changes in the previous versions of the product. It highlights
configuration guidelines and preferred practices for the storage area network (SAN) topology,
clustered system, back-end storage, storage pools and managed disks, volumes, remote
copy services, and hosts.
This part includes the following chapters:
Chapter 1, "Updates in IBM System Storage SAN Volume Controller"
Service Assistant
SAN Volume Controller V6.1 introduces a new method for performing service tasks on the
system. In addition to performing service tasks from the front panel, you can service a
node through an Ethernet connection by using a web browser or command-line interface
(CLI). The web browser runs a new service application that is called the Service Assistant.
All functions that were previously available through the front panel are now available over
the Ethernet connection, with the advantages of an easier-to-use interface and remote
access. You can also run Service Assistant commands through a USB flash drive for easier
serviceability.
IBM System Storage Easy Tier function added at no charge
SAN Volume Controller V6.1 delivers IBM System Storage Easy Tier, which is a dynamic
data relocation feature that allows host-transparent movement of data between two tiers of
storage. This feature includes the ability to automatically relocate volume extents with high
activity to storage media with higher performance characteristics. Extents with low activity
are migrated to storage media with lower performance characteristics. This capability
aligns the SAN Volume Controller system with current workload requirements, which
increases overall storage performance.
Temporary withdrawal of support for SSDs on the 2145-CF8 nodes
At the time of this writing, 2145-CF8 nodes that use internal SSDs are not supported by
V6.1.0.x code (fixed in version 6.2).
Interoperability with new storage controllers, host operating systems, fabric devices, and
other hardware
For an updated list, see V6.1 Supported Hardware List, Device Driver, Firmware and
Recommended Software Levels for SAN Volume Controller, S1003697, which is available
at this website:
https://www.ibm.com/support/docview.wss?uid=ssg1S1003697
Removal of 15-character maximum name length restrictions
SAN Volume Controller V6.1 supports object names up to 63 characters. Previous levels
supported only up to 15 characters.
SAN Volume Controller code upgrades
The SAN Volume Controller console code is now removed. You need to update only the
SAN Volume Controller code. The upgrade from SAN Volume Controller V5.1 requires the
use of the former console interface or the command line. After the upgrade is complete,
you can remove the existing ICA console application from your SSPC or master console.
The new GUI is started through a web browser that points to the SAN Volume Controller IP
address.
SAN Volume Controller to back-end controller I/O change
SAN Volume Controller V6.1 allows variable block sizes of up to 256 KB, compared with
the 32 KB that was supported in previous versions. This change is handled automatically
by the SAN Volume Controller system without requiring any user control.
Scalability
The maximum extent size increased four times to 8 GB. With an extent size of 8 GB, the
total storage capacity that is manageable for each cluster is 32 PB. The maximum volume
size increased to 1 PB. The maximum number of worldwide node names (WWNN)
increased to 1,024, which allows up to 1,024 back-end storage subsystems to be
virtualized.
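As a quick sanity check on the 32 PB figure (assuming the documented limit of 4,194,304 extents, that is 2^22, per clustered system): 4,194,304 extents x 8 GB per extent = 33,554,432 GB = 32 PB.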
Table 1-1 shows the current and previous usage of the changed common terms.

Table 1-1 SAN Volume Controller V6.1 terminology mapping table

Term in SAN Volume Controller V6.1        Term in previous versions of SAN Volume Controller
Event                                     Error
Host mapping                              VDisk-to-host mapping
Storage pool                              Managed disk group
Thin provisioning (thin-provisioned)      Space efficient
Volume                                    Virtual disk (VDisk)
Table 1-2 shows the current and previous usage of one changed common term.

Table 1-2 SAN Volume Controller V6.2 terminology mapping table

Term in SAN Volume Controller V6.2        Term in previous versions of SAN Volume Controller
Clustered system or system                Cluster
Performance monitoring: Read and write latency statistics for volumes and MDisks
Interoperability with new storage controllers, host operating systems, fabric devices, and
other hardware
For an updated list, see V6.3 Supported Hardware List, Device Driver, Firmware and
Recommended Software Levels for SAN Volume Controller, S1003907, which is available
at this website:
https://www.ibm.com/support/docview.wss?uid=ssg1S1003907
Port masking
The addition of more Fibre Channel HBA ports that are introduced with feature code AHA7
allows clients to optimize their SAN Volume Controller configuration by using dedicated
ports for certain system functions. However, the addition of these ports necessitates the
ability to ensure traffic isolation. As such, SAN Volume Controller V7.1 introduces port
masking.
Traffic types that you might want to isolate by using port masking are shown in the
following examples (a command sketch follows the list):
Local node-to-node communication
Replication traffic
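As a hedged illustration of port masking (the port numbers and mask values here are
hypothetical, and the exact syntax can vary by code level), traffic isolation can be
configured with the chsystem port mask parameters, where the rightmost mask bit
represents FC port 1:

chsystem -localfcportmask 00000011     (restrict node-to-node traffic to ports 1 and 2)
chsystem -partnerfcportmask 11000000   (restrict replication traffic to ports 7 and 8)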
Support for Easy Tier with compressed volumes
Easy Tier is a performance optimization function that automatically migrates hot extents
that belong to a volume to MDisks that better meet the performance requirements of that
extent. The Easy Tier function can be turned on or off at the storage pool level and at the
volume level.
Real-time Compression is a feature of SAN Volume Controller that addresses all of the
requirements of primary storage data reduction, including performance, by using
purpose-built compression technology that allows for data reduction of up to 80%.
In practice, clients find that their target workloads for these two features have a significant
overlap. Before SAN Volume Controller Storage Software version 7.1, the use of these two
features was mutually exclusive at the volume level. SAN Volume Controller V7.1
introduces support for the concurrent use of Easy Tier and Real-time Compression on the
same volume.
Enhanced flexibility in modifying Remote Copy relationships
SAN Volume Controller V7.1 introduces the ability to change between Metro Mirror and
Global Mirror (with or without change volumes) without requiring a full resync of all data
from the primary volume to the secondary volume.
Storwize V3700 support for Remote Copy
SAN Volume Controller V7.1 introduces support for Remote Copy on Storwize V3700
systems, which allows for remote replication between any combination of the following
systems:
SAN Volume Controller
Storwize V7000
Flex System V7000
Storwize V3700
Interoperability
For an updated list, see V7.1 Supported Hardware List, Device Driver, Firmware and
Recommended Software Levels for SAN Volume Controller, S1004392, which is available
at this website:
https://www.ibm.com/support/docview.wss?uid=ssg1S1004392
Improved performance and efficiency for Real-time Compression with the introduction of
the Random Access Compression Engine (RACE) 2.2
V7.2 introduces the following improvements to the Real-time Compression functionality:
Up to 3x higher sequential write throughput, which allows for faster VMware vMotion
and sequential copy operations, and more parallel vMotion sessions
35% higher throughput (IOPS) in intensive DB OLTP workloads
35% lower compression CPU usage for the same workload compared to V7.1
Interoperability
For an updated list, see V7.2 Supported Hardware List, Device Driver, Firmware and
Recommended Software Levels for SAN Volume Controller, S1004453, which is available
at this website:
https://www.ibm.com/support/docview.wss?uid=ssg1S1004453
Chapter 2. SAN topology
The IBM System Storage SAN Volume Controller and Storwize family systems have unique
SAN fabric configuration requirements that differ from what you might be used to in your
storage infrastructure. A quality SAN configuration can help you achieve a stable, reliable,
and scalable SAN Volume Controller/Storwize installation. Conversely, a poor SAN
environment can make your SAN Volume Controller/Storwize experience considerably less
pleasant.
This chapter addresses this topic based on experiences from the field. Although
many other SAN configurations are possible (and supported), this chapter highlights the
preferred configurations.
This chapter includes the following sections:
SAN design: If you are planning for a SAN Volume Controller installation, you must be
knowledgeable about general SAN design principles. For more information about SAN
design, limitations, caveats, and updates that are specific to your SAN Volume Controller
environment, see the following publications:
IBM System Storage SAN Volume Controller V6.4.1 - Software Installation and
Configuration Guide, GC27-2286, which is available at this website:
http://pic.dhe.ibm.com/infocenter/svc/ic/topic/com.ibm.storage.svc.console.641.doc/mlt_relatedinfo_224agr.html
V7.2 Configuration Limits and Restrictions for IBM System Storage SAN Volume
Controller, S1004510, which is available at this website:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S1004510
For updated documentation before you implement your solution, see the IBM System
Storage SAN Volume Controller Support Portal at this website:
http://www.ibm.com/support/entry/portal/Overview/Hardware/System_Storage/Storage_software/Storage_virtualization/SAN_Volume_Controller_(2145)
For updated documentation and information about Storwize family systems, see the
following IBM Storwize Support Portals:
Storwize V3700
http://www-947.ibm.com/support/entry/portal/product/system_storage/disk_systems/entry-level_disk_systems/ibm_storwize_v3700?productContext=-124971743
Storwize V5000
http://www-947.ibm.com/support/entry/portal/product/system_storage/disk_systems/mid-range_disk_systems/ibm_storwize_v5000?productContext=-2033461677
Storwize V7000
http://www-947.ibm.com/support/entry/portal/product/system_storage/disk_systems/mid-range_disk_systems/ibm_storwize_v7000_(2076)?productContext=-1546771614
2.1.1 Redundancy
One of the fundamental SAN requirements for SAN Volume Controller/Storwize is to create
two (or more) separate SANs that are not connected to each other over Fibre Channel (FC) in
any way. The easiest way is to construct two SANs that are mirror images of each other.
Note: SAN Volume Controller/Storwize can be connected to up to four separate fabrics.
Technically, the SAN Volume Controller/Storwize supports the use of a single SAN
(appropriately zoned) to connect the entire SAN Volume Controller/Storwize. However, we
recommend that you do not use this design in any production environment. Based on
experience from the field, do not use this design in development environments either because
a stable development platform is important to programmers. Also, an extended outage in the
development environment can have an expensive business effect. However, for a dedicated
storage test platform, it might be acceptable.
When an Ethernet network becomes congested, the Ethernet switches discard frames for
which no room is available. When an FC network becomes congested, the FC switches
stop accepting more frames until the congestion clears and occasionally drop frames. This
congestion quickly moves upstream in the fabric and prevents the end devices (such as the
SAN Volume Controller/Storwize) from communicating anywhere. This behavior is referred
to as head-of-line blocking. Although modern SAN switches internally have a nonblocking
architecture, head-of-line blocking still exists as a SAN fabric problem. Head-of-line
blocking can result in the inability of SAN Volume Controller/Storwize nodes to
communicate with storage subsystems or to mirror their write caches because you have a
single congested link that leads to an edge switch.
If possible, use SAN directors to avoid many ISL connections. Problems that are related to
oversubscription or congestion are much less likely to occur within SAN director fabrics.
[Figure: basic core-edge topology with the SVC/Storwize nodes attached to the core switches and the hosts attached to the edge switches]
[Figure: four-SAN, core-edge topology with the SVC/Storwize nodes attached to the core switches in each SAN and the hosts attached to the edge switches]
Although some clients simplify management by connecting the SANs into pairs with a single
ISL, do not use this design. With only a single ISL connecting fabrics, a small zoning mistake
can quickly lead to severe SAN congestion.
SAN Volume Controller/Storwize as a SAN bridge: With the ability to connect a SAN
Volume Controller/Storwize to four SAN fabrics, you can use the SAN Volume
Controller/Storwize as a bridge between two SAN environments (with two fabrics in each
environment). This configuration is useful for sharing resources between SAN environments
without merging them. Another use is if you have devices with different SAN requirements in
your installation.
When you use the SAN Volume Controller/Storwize as a SAN bridge, pay attention to any
restrictions and requirements that might apply to your installation.
[Figure: SVC nodes and their storage attached to one pair of switches, with an SVC-attach host and a non-SVC-attach host on other switches; SVC-to-storage traffic should be zoned so that it never travels over the ISLs between the switches]
If you have this type of topology, you must zone the SAN Volume Controller/Storwize so that it
detects only paths to the storage subsystems on the same SAN switch as the SAN Volume
Controller/Storwize nodes. You might consider implementing a storage subsystem host port
mask here.
Restrictive zoning: With this type of topology, you must have more restrictive zoning than
what is described in 2.3.6, Standard SAN Volume Controller/Storwize zoning
configuration on page 46.
Because of the way that the SAN Volume Controller/Storwize load balances traffic between
the SAN Volume Controller nodes and MDisks, the amount of traffic that transits your ISLs is
unpredictable and varies significantly. You can use Cisco VSANs or Brocade Traffic Isolation
Zones to dedicate an ISL to high-priority traffic. However, internode and SAN Volume
Controller/Storwize to back-end storage communication must never cross ISLs.
Important: The SAN Volume Controller/Storwize traffic to storage devices can fill up your
ISLs if many storage device ports are accessed via ISL, especially with the Round Robin
Path Selection that was introduced in v6.3 of the SAN Volume Controller/Storwize code.
Therefore, remember this when you are planning.
[Figure: switch replacement topology in which the SVC nodes and hosts are split between the old and new switches in each fabric]
This design is a valid configuration, but you must take the following precautions:
Do not access the storage subsystems over the ISLs. As described in Accidentally
accessing storage over ISLs on page 24, zone and LUN mask the SAN and storage
subsystems. With this design, your storage subsystems need connections to the old and
new SAN switches.
Have two dedicated ISLs between the two switches on each SAN with no data traffic
traveling over them. This design matters because, if this link becomes congested or is
lost, you might experience problems with your SAN Volume Controller/Storwize clustered
system if issues occur at the same time on the other SAN. If possible, set a 5% traffic
threshold alert on the ISLs so that you know whether a zoning mistake allowed any data
traffic over the links.
Important: Do not use this configuration to perform mirroring between I/O groups within
the same clustered system. Also, for SAN Volume Controller, never split the two nodes in
an I/O group between different SAN switches within the same SAN fabric unless you are
using the SAN Volume Controller Stretched Cluster scenario.
By using the optional 8 Gbps longwave (LW) small form factor pluggables (SFPs) in the
2145-CF8 and 2145-CG8, you can split a SAN Volume Controller I/O group across long
distances, as described in 2.1.8, Stretched Cluster on page 27.
Consider the physical distance between SAN Volume Controller nodes in relation to service
actions. Some service actions require physical access to all SAN Volume Controller nodes
in a system. If nodes in a split clustered system are separated by more than 100 meters,
service actions might require multiple service personnel.
Figure 2-5 shows a stretched clustered system configuration. When used with volume
mirroring, this configuration provides a high availability solution that is tolerant of failure at a
single site.
Figure 2-5 Stretched clustered system with physical switches and a quorum disk at a third site
If you do not have enough SAN switches to create two public and two private fabrics, you can
use Brocade Virtual Fabrics or Cisco Virtual SANs, as shown in Figure 2-6 on page 29.
Figure 2-6 SAN Volume Controller stretched cluster with VSANs or Virtual Fabrics
Quorum placement
A stretched clustered system configuration locates the active quorum disk at a third site. If
communication is lost between the primary and secondary sites, the site with access to the
active quorum disk continues to process transactions. If communication is lost to the active
quorum disk, an alternative quorum disk at another site can become the active quorum disk.
Although you can configure a system of SAN Volume Controller nodes to use up to three
quorum disks, only one quorum disk can be elected to solve a situation where the system is
partitioned into two sets of nodes of equal size. The purpose of the other quorum disks is to
provide redundancy if a quorum disk fails before the system is partitioned.
Important: Do not use solid-state drive (SSD) physical disks or managed disks for quorum
disk purposes if the SSD lifespan depends on write workload.
Configuration summary
Generally, when the nodes in a system are split among sites, configure the SAN Volume
Controller system in the following way:
Site 1 has half of the SAN Volume Controller system nodes and one quorum disk
candidate.
Site 2 has half of the SAN Volume Controller system nodes and one quorum disk
candidate.
Site 3 has the active quorum disk.
Disable the dynamic quorum configuration by using the chquorum command with the
-override yes option.
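A minimal sketch of this step, assuming hypothetical MDisk IDs 0, 5, and 9 for the three
quorum disk candidates (quorum indexes 0 - 2):

chquorum -override yes -mdisk 0 0    (quorum candidate on an MDisk at site 1)
chquorum -override yes -mdisk 5 1    (quorum candidate on an MDisk at site 2)
chquorum -override yes -mdisk 9 2    (quorum candidate on the active quorum MDisk at site 3)

The lsquorum command can be used to verify which quorum disk is currently active.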
For more information about Stretched Cluster, see IBM SAN and SVC Stretched Cluster and
VMware Solution Implementation, SG24-8072.
This configuration applies to all ports in a particular switch, as shown in the following
examples:
Port 0 (0000) is assigned virtual channel 2
Port 1 (0001) is assigned virtual channel 3
Port 2 (0010) is assigned virtual channel 4
Port 3 (0011) is assigned virtual channel 5
Port 4 (0100) is assigned virtual channel 2
Port 5 (0101) is assigned virtual channel 3, and so on
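In other words, the default assignment cycles through four data virtual channels: virtual channel = 2 + (port number modulo 4), so every fourth port shares the same virtual channel.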
When you connect the SAN Volume Controller or Storwize to the switch, avoid connecting it
to ports on the same virtual channel; for example, do not connect the SAN Volume
Controller/Storwize to ports 0, 4, 8, 12, 16, and so on. Doing so might lead to buffer credit
starvation, which can cause congestion and dropped frames on that particular virtual
channel, even if other virtual channels on the same ISL work without any problem.
Figure 2-7 on page 33 shows the correct and incorrect connection schema for non-director
class switches.
As shown on the left side of Figure 2-7, all ports of Storwize V7000 are connected to the
following separate virtual channels in each fabric:
Fabric 1:
Node canister 1, port 1 -> switch port 0, port ID 0000 0000, virtual channel 2
Node canister 2, port 1 -> switch port 1, port ID 0000 0001, virtual channel 3
Node canister 1, port 3 -> switch port 2, port ID 0000 0010, virtual channel 4
Node canister 2, port 3 -> switch port 3, port ID 0000 0011, virtual channel 5
Fabric 2:
Node canister 1, port 2 -> switch port 0, port ID 0000 0000, virtual channel 2
Node canister 2, port 2 -> switch port 1, port ID 0000 0001, virtual channel 3
Node canister 1, port 4 -> switch port 2, port ID 0000 0010, virtual channel 4
Node canister 2, port 4 -> switch port 3, port ID 0000 0011, virtual channel 5
On the right side of Figure 2-7, the schema is wrong because all of the ports are connected
to the same virtual channel in each fabric:
Fabric 1:
Node canister 1, port 1 -> switch port 0, port ID 0000 0000, virtual channel 2
Node canister 2, port 1 -> switch port 8, port ID 0000 1000, virtual channel 2
Node canister 1, port 3 -> switch port 16, port ID 0001 0000, virtual channel 2
Node canister 2, port 3 -> switch port 24, port ID 0001 1000, virtual channel 2
Fabric 2:
Node canister 1, port 2 -> switch port 0, port ID 0000 0000, virtual channel 2
Node canister 2, port 2 -> switch port 8, port ID 0000 1000, virtual channel 2
Node canister 1, port 4 -> switch port 16, port ID 0001 0000, virtual channel 2
Node canister 2, port 4 -> switch port 24, port ID 0001 1000, virtual channel 2
A similar situation occurs with director-class SAN switches. The best approach is to connect
each SAN Volume Controller/Storwize port to a separate virtual channel on a separate port
blade, which is called a diagonal connection.
Figure 2-8 shows the correct and incorrect cabling for director class switches.
As shown on the left side of Figure 2-8, all ports of Storwize V7000 are connected to the
following separate virtual channels and to separate port blades in each fabric:
Fabric 1:
Node canister 1, port 1 -> switch blade1/port 0, port ID 0000 0000, virtual channel 2
Node canister 2, port 1 -> switch blade2/port 1, port ID 0000 0001, virtual channel 3
Node canister 1, port 3 -> switch blade3/port 2, port ID 0000 0010, virtual channel 4
Node canister 2, port 3 -> switch blade4/port 3, port ID 0000 0011, virtual channel 5
Fabric 2:
Node canister 1, port 2 -> switch blade1/port 0, port ID 0000 0000, virtual channel 2
Node canister 2, port 2 -> switch blade2/port 1, port ID 0000 0001, virtual channel 3
Node canister 1, port 4 -> switch blade3/port 2, port ID 0000 0010, virtual channel 4
Node canister 2, port 4 -> switch blade4/port 3, port ID 0000 0011, virtual channel 5
On the right side of Figure 2-8, the schema is wrong because all ports are connected to the
same virtual channel in each fabric, even if they are on separate port blades:
Fabric 1:
Node canister 1, port 1 -> switch blade1/port 0, port ID 0000 0000, virtual channel 2
Node canister 2, port 1 -> switch blade2/port 0, port ID 0000 0000, virtual channel 2
Node canister 1, port 3 -> switch blade3/port 0, port ID 0000 0000, virtual channel 2
Node canister 2, port 3 -> switch blade4/port 0, port ID 0000 0000, virtual channel 2
Fabric 2:
Node canister 1, port 2 -> switch blade1/port 0, port ID 0000 0000, virtual channel 2
Node canister 2, port 2 -> switch blade2/port 0, port ID 0000 0000, virtual channel 2
Node canister 1, port 4 -> switch blade3/port 0, port ID 0000 0000, virtual channel 2
Node canister 2, port 4 -> switch blade4/port 0, port ID 0000 0000, virtual channel 2
Best practice: Always connect all SAN Volume Controller/Storwize Fibre Channel ports to
separate virtual channels, if possible.
For more information about virtual channels, see the Brocade Fabric OS Administrator's
Guide version 7.2, which is available at this website:
http://www.brocade.com/downloads/documents/product_manuals/B_SAN/FOS_AdminGd_v720.pdf
More information is also available on Seb's SAN blog at this website:
https://www.ibm.com/developerworks/community/blogs/sanblog/entry/how_to_not_connect_an_svc_in_a_core_edge_brocade_fabric16?lang=en
Fabric Watch
Because the SAN Volume Controller/Storwize relies on a healthy, properly functioning SAN,
consider the use of the Fabric Watch feature in newer Brocade-based SAN switches. Fabric Watch is a
SAN health monitor that enables real-time proactive awareness of the health, performance,
and security of each switch. It automatically alerts SAN managers to predictable problems to
help avoid costly failures. It tracks a wide range of fabric elements, events, and counters.
By using Fabric Watch, you can configure the monitoring and measuring frequency for each
switch and fabric element and specify notification thresholds. Whenever these thresholds are
exceeded, Fabric Watch automatically provides notification by using several methods,
including email messages, SNMP traps, log entries, or posts alerts to IBM Network Advisor.
The components that Fabric Watch monitors are grouped into the following classes:
Environment, such as temperature
Fabric, such as zone changes, fabric segmentation, and E_Port down
Field Replaceable Unit, which provides an alert when a part replacement is needed
Performance Monitor; for example, RX and TX performance between two devices
Port, which monitors port statistics and takes actions (such as port fencing) that are based
on the configured thresholds and actions
Resource, such as RAM, flash, memory, and processor
Security, which monitors different security violations on the switch and takes action that is
based on the configured thresholds and their actions
SFP, which monitors the physical aspects of an SFP, such as voltage, current, RXP, TXP,
and state changes in physical ports
By implementing Fabric Watch, you benefit by improved high availability from proactive
notification. You also can reduce troubleshooting and root cause analysis (RCA) times. Fabric
Watch is an optionally licensed feature of Fabric OS. However, it is already included in the
base licensing of the new IBM System Storage b-series switches.
Bottleneck detection
A bottleneck is a situation where the frames of a fabric port cannot get through as fast as they
should. In this condition, the offered load is greater than the achieved egress throughput on
the affected port.
The bottleneck detection feature does not require any other license. It identifies and alerts
you to ISL or device congestion and device latency conditions in the fabric. By using
bottleneck detection, you can prevent degradation of throughput in the fabric and reduce
the time that it takes to troubleshoot SAN performance problems. Bottlenecks are reported
through RAS log alerts and SNMP traps, and you can set alert thresholds for the severity and
duration of the bottleneck. Starting in Fabric OS 6.4.0, you configure bottleneck detection on
a per-switch basis, with per-port exclusions.
The following types of bottleneck detection are available in Brocade b-type switches:
Congestion bottleneck detection, which measures utilization of fabric links.
Latency bottleneck detection, which indicates buffer credit starvation.
You can enable bottleneck detection by using the bottleneckmon command in the CLI.
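A minimal sketch, assuming Fabric OS 7.x command syntax:

bottleneckmon --enable -alert    (enable bottleneck detection on the switch with alerting)
bottleneckmon --status           (display the current bottleneck detection configuration)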
Best practice: To spot SAN problems as soon as possible, upgrade b-type switches to at
least version 7.0 of FOS and enable bottleneck detection.
Virtual Fabrics
Virtual Fabrics adds the capability for physical switches to be partitioned into independently
managed logical switches. Implementing Virtual Fabrics has several advantages, such as
hardware consolidation, improved security, and resource sharing by several customers.
The following IBM System Storage platforms are Virtual Fabrics capable:
SAN768B, SAN768B-2
SAN384B, SAN384B-2
SAN96B-5
SAN80B-4
SAN48B-5
To configure Virtual Fabrics, you do not need to install any additional licenses.
Port channels
To ease the required planning efforts for future SAN expansions, ISLs or port channels can be
made up of any combination of ports in the switch. With this approach, you do not need to
reserve special ports for future expansions when you provision ISLs. Instead, you can use
any free port in the switch to expand the capacity of an ISL or port channel.
Cisco VSANs
By using VSANs, you can achieve an improved SAN scalability, availability, and security by
allowing multiple FC SANs to share a common physical infrastructure of switches and ISLs.
These benefits are achieved based on independent FC services and traffic isolation between
VSANs. By using Inter-VSAN Routing (IVR), you can establish a data communication path
between initiators and targets on different VSANs without merging VSANs into a single logical
fabric.
Because VSANs can group ports across multiple physical switches, you can use enhanced
ISLs to carry traffic that belongs to multiple VSANs (VSAN trunking).
The main VSAN implementation advantages are hardware consolidation, improved security,
and resource sharing by several independent organizations. You can use Cisco VSANs with
inter-VSAN routes to isolate the hosts from the storage arrays. This arrangement provides
little benefit for a great deal of added configuration complexity. However, VSANs with
inter-VSAN routes can be useful for migrations from non-Cisco fabrics onto
Cisco fabrics, or for other short-term situations.
VSANs can also be useful if you have a storage array that is direct attached by hosts with
some space virtualized through the SAN Volume Controller/Storwize. In this case, use
separate storage ports for the SAN Volume Controller/Storwize and the hosts. Do not use
inter-VSAN routes to enable port sharing.
2.3 Zoning
Because the SAN Volume Controller/Storwize differs from traditional storage devices,
properly zoning the SAN Volume Controller/Storwize into your SAN fabric is a source of
misunderstanding and errors. Despite the misunderstandings and errors, zoning the SAN
Volume Controller/Storwize into your SAN fabric is not complicated.
Important: Errors that are caused by improper SAN Volume Controller/Storwize zoning
are often difficult to isolate. Therefore, create your zoning configuration carefully.
Basic SAN Volume Controller/Storwize zoning entails the following tasks:
1. Create the internode communications zone for the SAN Volume Controller. Although this
zone is not necessary for Storwize family systems, it is highly recommended to have one.
2. Create a clustered system for the SAN Volume Controller/Storwize.
3. Create the SAN Volume Controller/Storwize back-end storage subsystem zones.
4. Assign back-end storage to the SAN Volume Controller/Storwize.
5. Create the host-to-SAN Volume Controller/Storwize zones.
6. Create host definitions on the SAN Volume Controller/Storwize.
The zoning scheme that is described in the following section is slightly more restrictive than
the zoning that is described in IBM System Storage SAN Volume Controller V6.4.0 - Software
Installation and Configuration Guide, GC27-2286. The Configuration Guide is a statement of
what is supported. However, this Redbooks publication describes the preferred way to set up
zoning, even if other ways are possible and supported.
Aliases
Use zoning aliases when you create your SAN Volume Controller/Storwize zones if they are
available on your particular type of SAN switch. Zoning aliases make your zoning easier to
configure and understand and cause fewer possibilities for errors.
One approach is to include multiple members in one alias because zoning aliases can
normally contain multiple members (similar to zones). Create the following zone aliases:
One zone alias that holds all the SAN Volume Controller/Storwize node ports on each
fabric.
One zone alias for each storage subsystem (or controller blade for DS4x00 units).
One zone alias for each I/O group port pair (it must contain the first node in the I/O group,
port X, and the second node in the I/O group, port X).
You can omit host aliases in smaller environments, as we did in the lab environment that was
used for this IBM Redbooks publication. Figure 2-10 shows some alias examples.
[Figure 2-10: alias examples for a DS4000/DS5000: controllers A and B, each with ports on SAN fabrics A and B, covered by the aliases CtrlA_FabricA, CtrlB_FabricA, CtrlA_FabricB, and CtrlB_FabricB, with the SVC nodes attached to both fabrics]
For more information about zoning the IBM System Storage IBM DS4000 or IBM DS5000
series within the SAN Volume Controller/Storwize, see IBM Midrange System Storage
Implementation and Best Practices Guide, SG24-6363.
Tip: Only single rack XIV configurations are supported by SAN Volume
Controller/Storwize. Multiple single racks can be supported where each single rack is seen
by SAN Volume Controller/Storwize as a single controller.
[Figure: XIV port zoning, with the XIV ports spread across SAN fabrics A and B and the SVC nodes attached to both fabrics]
Note: To change the layer, you must disable the visibility of every other Storwize or SAN
Volume Controller system on all fabrics. This means deleting partnerships, remote copy
relationships, and zoning between the Storwize and any other Storwize or SAN Volume
Controller. Then, use the chsystem -layer command to set the layer of the system.
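A brief sketch of the layer change (run it only after the partnerships, relationships, and
zoning are removed, as noted above):

chsystem -layer replication    (the system can virtualize other Storwize systems and partner with an SVC)
chsystem -layer storage        (the default; the system can be virtualized by an SVC or a replication-layer system)

The current setting is reported in the layer field of the lssystem command output.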
Figure 2-13 shows how you can zone the SAN Volume Controller with the minimum Storwize
ports.
[Figure 2-13: zoning the SAN Volume Controller nodes to the minimum set of Storwize V7000 ports, with canisters 1 and 2 attached across SAN fabrics A and B]
[Figure: host zoning example for SVC I/O Group 0. On switch A: zone Foo_Slot3_SAN_A contains 50:00:11:22:33:44:55:66 and SVC_Group0_Port_A; zone Bar_Slot2_SAN_A contains 50:11:22:33:44:55:66:77 and SVC_Group0_Port_C. On switch B: zone Foo_Slot5_SAN_B contains 50:00:11:22:33:44:55:67 and SVC_Group0_Port_D; zone Bar_Slot8_SAN_B contains 50:11:22:33:44:55:66:78 and SVC_Group0_Port_B.]
This configuration provides four paths to each volume, which is the number of paths per
volume for which Subsystem Device Driver (SDDPCM and SDDDSM) multipathing software
and the SAN Volume Controller/Storwize are tuned.
For more information about the placement of many hosts in a single zone as a supported
configuration in some circumstances, see IBM System Storage SAN Volume Controller
V6.4.0 - Software Installation and Configuration Guide, GC27-2286. Although this design
usually works, instability in one of your hosts can trigger various impossible-to-diagnose
problems in the other hosts in the zone. For this reason, use only a single host in each
zone (single-initiator zones).
A supported configuration is to have eight paths to each volume. However, this design
provides no performance benefit and, in some circumstances, reduces performance. Also, it
does not significantly improve reliability or availability.
To obtain the best overall performance of the system and to prevent overloading, the workload
to each SAN Volume Controller/Storwize port must be equal. Having the same amount of
workload typically involves zoning approximately the same number of host FC ports to each
SAN Volume Controller/Storwize FC port.
[Figure: lab environment with hosts Peter, Barry, Jon, Ian, Thorsten, Ronda, Deon, and Foo attached through switches A and B to the SVC nodes]
Aliases
Unfortunately, you cannot nest aliases. Therefore, several of the WWPNs appear in multiple
aliases. Also, your WWPNs might not look like the ones in the example; these were
captured when this book was written.
Some switch vendors do not allow multiple-member aliases, but you can still create
single-member aliases. Although creating single-member aliases does not reduce the size of
your zoning configuration, it still makes it easier to read than a mass of raw WWPNs.
For the alias names, SAN_A is appended on the end where necessary to distinguish that
these alias names are the ports on SAN A. This system helps if you must troubleshoot both
SAN fabrics at one time.
SVC_Cluster_SAN_A:
50:05:07:68:01:40:37:e5
50:05:07:68:01:10:37:e5
50:05:07:68:01:40:37:dc
50:05:07:68:01:10:37:dc
50:05:07:68:01:40:1d:1c
50:05:07:68:01:10:1d:1c
50:05:07:68:01:40:27:e2
50:05:07:68:01:10:27:e2
The clustered system alias that is created is used for the internode communications zone and
for all back-end storage zones. It is also used in any zones that you need for remote mirroring
with another SAN Volume Controller clustered system (not addressed in this example).
SVC_Group0_Port1:
50:05:07:68:01:40:37:e5
50:05:07:68:01:40:37:dc
SVC_Group0_Port3:
50:05:07:68:01:10:37:e5
50:05:07:68:01:10:37:dc
SVC_Group1_Port1:
50:05:07:68:01:40:1d:1c
50:05:07:68:01:40:27:e2
SVC_Group1_Port3:
50:05:07:68:01:10:1d:1c
50:05:07:68:01:10:27:e2
DS4k_23K45_Blade_A_SAN_A
20:04:00:a0:b8:17:44:32
20:04:00:a0:b8:17:44:33
DS4k_23K45_Blade_B_SAN_A
20:05:00:a0:b8:17:44:32
20:05:00:a0:b8:17:44:33
DS8k_34912_SAN_A
50:05:00:63:02:ac:01:47
50:05:00:63:02:bd:01:37
50:05:00:63:02:7f:01:8d
50:05:00:63:02:2a:01:fc
Zones
When you name your zones, do not give them names that are identical to your aliases. For the
environment that is described in this book, we use the following sample zone set, which uses
the defined aliases as described in "Aliases" on page 40.
SVC_Cluster_Zone_SAN_A:
SVC_Cluster_SAN_A
SVC_DS4k_23K45_Zone_Blade_A_SAN_A:
SVC_Cluster_SAN_A
DS4k_23K45_Blade_A_SAN_A
SVC_DS4k_23K45_Zone_Blade_B_SAN_A:
SVC_Cluster_SAN_A
DS4k_23K45_Blade_B_SAN_A
SVC_DS8k_34912_Zone_SAN_A:
SVC_Cluster_SAN_A
DS8k_34912_SAN_A
WinPeter_Slot3:
21:00:00:e0:8b:05:41:bc
SVC_Group0_Port1
WinBarry_Slot7:
21:00:00:e0:8b:05:37:ab
SVC_Group0_Port3
WinJon_Slot1:
21:00:00:e0:8b:05:28:f9
SVC_Group1_Port1
WinIan_Slot2:
21:00:00:e0:8b:05:1a:6f
SVC_Group1_Port3
AIXRonda_Slot6_fcs1:
10:00:00:00:c9:32:a8:00
SVC_Group0_Port1
AIXThorsten_Slot2_fcs0:
10:00:00:00:c9:32:bf:c7
SVC_Group0_Port3
AIXDeon_Slot9_fcs3:
10:00:00:00:c9:32:c9:6f
SVC_Group1_Port1
AIXFoo_Slot1_fcs2:
10:00:00:00:c9:32:a8:67
SVC_Group1_Port3
Best practice: Although we used raw WWPNs for this example, the preferred practice is to
always use aliases for your WWPNs and to name them in a meaningful manner.
Of these options, the optical varieties of distance extension are preferred. IP distance
extension introduces more complexity, is less reliable, and has performance limitations.
However, optical distance extension is impractical in many cases because of cost or
unavailability.
Distance extension: If possible, use distance extension only for links between SAN
Volume Controller clustered systems. Do not use it for intraclustered system
communication. Technically, distance extension is supported for relatively short distances,
such as a few kilometers (or miles). For information about why this arrangement should not
be used, see IBM System Storage SAN Volume Controller Restrictions, S1003799.
Also, when you are communicating with your organization's networking architects, distinguish
between megabytes per second (MBps) and megabits per second (Mbps). In the storage
world, bandwidth often is specified in MBps, but network engineers specify bandwidth in
Mbps. If you fail to specify MB, you can end up with an impressive-sounding 155 Mbps OC-3
link, which supplies only 15 MBps or so to your SAN Volume Controller/Storwize. If you
include the safety margins, this link is not as fast as you might hope, so ensure that the
terminology is correct.
Consider the following steps when you are planning for your FCIP TCP/IP links:
For redundancy purposes, use as many TCP/IP links between sites as you have fabrics in
each site, which you want to connect. In most cases, there are two SAN FC fabrics in each
site, so you need two TCP/IP connections between sites.
Try to dedicate TCP/IP links only for storage interconnection. Separate them from other
LAN/WAN traffic.
Make sure that you have a Service Level Agreement (SLA) with your TCP/IP link vendor
that meets your needs and expectations.
If you do not use Global Mirror with Change Volumes (GMCV), make sure that you size
your TCP/IP link to sustain peak workloads.
The use of the SAN Volume Controller/Storwize internal Global Mirror (GM) simulation
options can help you test your applications before production implementation. You can
simulate a GM environment within one SAN Volume Controller or one Storwize system,
without a partnership with another system (see the sketch after this list). Use the
chsystem command with the following parameters to perform GM testing:
gmlinktolerance
gmmaxhostdelay
gminterdelaysimulation
gmintradelaysimulation
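For example, the following minimal sketch enables simulated intercluster and intracluster GM delays; the delay values (in milliseconds) are illustrative assumptions, not recommendations, and setting a value back to 0 disables that simulated delay:
svctask chsystem -gminterdelaysimulation 50
svctask chsystem -gmintradelaysimulation 20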
If you are not sure about your TCP/IP link security, enable Internet Protocol Security
(IPSec) on all FCIP devices. IPSec is enabled at the Fabric OS level, so you do not
need any external IPSec appliances.
In addition to planning for your TCP/IP link, consider adhering to the following preferred
practices:
Set the link bandwidth and background copy rate of the partnership between your replicating
SAN Volume Controller/Storwize systems to a value lower than your TCP/IP link capacity.
Failing to do so can cause an unstable TCP/IP tunnel, which can lead to stopping all of your
remote copy relationships that use that tunnel.
The best case is to use Global Mirror with Change Volumes (GMCV) when replication is
done over long distances.
Use compression on corresponding FCIP devices.
Use at least two ISLs from your local FC switch to local FCIP router.
Use VE and VEX ports on FCIP routers to avoid merging fabrics from both sites.
For more information about FCIP, see the following publications:
IBM System Storage b-type Multiprotocol Routing: An Introduction and Implementation,
SG24-7544
Brocade Fabric OS Administrator's Guide, version 7.2
When you have SAN fabrics with multiple vendors, pay special attention to any particular
requirements. For example, observe from which switch in the fabric the zoning must be
performed.
When the partner node returns to the online state, its IP addresses and iSCSI names fail back
after a delay of 5 minutes. This method ensures that the recently online node is stable before
the host uses it for I/O again.
The svcinfo lsportip command lists a node's own IP addresses and iSCSI names and the
addresses and names of its partner node. The addresses and names of the partner node are
identified by the failover field that is set to yes. A failover_active value of yes in the
svcinfo lsnode command output indicates that the IP addresses and iSCSI names of the
partner node failed over to that particular node.
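As a quick check, you can run both commands against the system; the node name is an illustrative assumption:
svcinfo lsportip
svcinfo lsnode node1
In the lsportip output, entries with the failover field set to yes belong to the partner node; in the lsnode output, the failover_active field indicates whether the partner node's addresses are currently active on this node.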
Figure 2-16 Difference between Storwize V7000 and Storwize V3700/V5000 SAS connectors
Storwize V3700 and V5000 have one four-port SAS adapter per node canister. In Storwize V3700,
one of those ports is used for connecting expansion drawers in one chain and three ports are
used for host connection. In Storwize V5000, two ports are used for connecting expansion
drawers in two chains and two ports are used to connect hosts. Each host must have at least
one host interface card (HIC) with two SAS ports because it must be connected to both node
canisters in Storwize V3700 or V5000.
The proper cabling is shown in Figure 2-17.
New GUI options and CLI commands for defining SAS-connected hosts were added to
address this new feature.
Also, with the 7.2 code version, it is possible to directly connect a DS3200 or DS3500 to
Storwize V3700 or Storwize V5000 for data migration. As with host connection, there is no support for
SAS switches, only direct connection. The external storage SAS connection is not for general
virtualization purpose; instead, it is only for data migration. Migrated DS3200/DS3500 storage
systems can be directly connected to the SAS host ports in each of Storwize V3700/V5000
node canisters.
For more information about supported servers and storage systems, see Supported
Hardware List, Device Driver, Firmware and Recommended Software Levels for Flex System
V7000 and IBM Storwize V3500, V3700 and V5000, S1004515, which is available at this
website:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S1004515
Chapter 3.
This chapter includes the following sections:
Advantages of virtualization
Scalability of SAN Volume Controller clustered systems
Scalability of IBM Storwize V7000
Clustered system upgrade
By using the SAN Volume Controller, you can join capacity from various heterogeneous
storage subsystem arrays into one pool of capacity for better utilization and more flexible
access. This design helps the administrator to control and manage this capacity from a single
common interface instead of managing several independent disk systems and interfaces. The
SAN Volume Controller also can improve the performance and efficiency of your storage
subsystem array. This improvement is possible by introducing 24 GB of cache memory in
each node and the option of using internal solid-state drives (SSDs) with the IBM System
Storage Easy Tier function.
By using SAN Volume Controller virtualization, users can move data nondisruptively between
different storage subsystems. This feature can be useful, for example, when you replace an
existing storage array with a new one or when you move data in a tiered storage
infrastructure.
By using the Volume mirroring feature, you can store two copies of a volume on different
storage subsystems. This function helps to improve application availability if a failure occurs
or disruptive maintenance occurs to an array or disk system. Moreover, the two mirror copies
can be placed at a distance of 10 km (6.2 miles) when you use longwave (LW) small form
factor pluggables (SFPs) with a split-clustered system configuration.
As a virtualization function, thin-provisioned volumes allow you to provision storage volumes
based on future growth while requiring physical storage only for the current utilization.
This feature is best for host operating systems that do not support logical volume managers.
In addition to remote replication services, local copy services offer a set of copy functions.
Multiple target FlashCopy volumes for a single source, incremental FlashCopy, and Reverse
FlashCopy functions enrich the virtualization layer that is provided by SAN Volume Controller.
FlashCopy is commonly used for backup activities and is a source of point-in-time remote
copy relationships. Reverse FlashCopy allows a quick restore of a previous snapshot without
breaking the FlashCopy relationship and without waiting for the original copy. This feature is
convenient, for example, after a failing host application upgrade or data corruption. In such a
situation, you can restore the previous snapshot almost instantaneously.
If you are presenting storage to multiple clients with different performance requirements, you
can create a tiered storage environment and provision storage with SAN Volume Controller.
Table 3-1 Storwize/SAN Volume Controller maximum configurations for an I/O group (columns: Objects, Maximum number, and Comments)
If you exceed one of the current maximum configuration limits for the fully deployed SAN
Volume Controller clustered system, you scale out by adding a SAN Volume Controller
clustered system and distributing the workload to it.
Because the current maximum configuration limits can change, see the current SAN Volume
Controller restrictions in IBM System Storage SAN Volume Controller 7.2.x Configuration
Limits and Restrictions, S1004510, which is available at this website:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S1004510
By splitting a SAN Volume Controller system or having a secondary SAN Volume Controller
system, you can implement a disaster recovery option in the environment. With two SAN
Volume Controller clustered systems in two locations, work continues even if one site is down.
By using the SAN Volume Controller Advanced Copy functions, you can copy data from the
local primary environment to a remote secondary site. The maximum configuration limits also
apply.
Another advantage of having two clustered systems is the option of using the SAN Volume
Controller Advanced Copy functions. Licensing is based on the following factors:
The total amount of storage (in GB) that is virtualized.
The Metro Mirror and Global Mirror capacity that is in use (primary and secondary).
The FlashCopy source capacity that is in use.
In each case, the number of terabytes (TBs) to order for Metro Mirror and Global Mirror is the
total number of source TBs and target TBs that are participating in the copy operations.
Because FlashCopy is licensed by source capacity, SAN Volume Controller counts only the
main source in FlashCopy relationships.
If you are adding a node that was used previously, consider changing its worldwide node
name (WWNN) before you add it to the SAN Volume Controller clustered system. For
more information, see Chapter 3, SAN Volume Controller user interfaces for servicing
your system in IBM System Storage SAN Volume Controller Troubleshooting Guide,
GC27-2284-01.
Install the new nodes and connect them to the local area network (LAN) and SAN.
Power on the new nodes.
Include the new nodes in the internode communication zones and in the back-end zones.
Use LUN masking on back-end storage LUNs (managed disks) to include the worldwide
port names (WWPNs) of the SAN Volume Controller nodes that you want to add.
Add the SAN Volume Controller nodes to the clustered system (see the CLI sketch after this list).
Check the SAN Volume Controller status, including the nodes, managed disks, and
(storage) controllers.
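A minimal CLI sketch of the add step, run once per node; the panel name, I/O group, and node name are illustrative assumptions that you must replace with the values that are reported for your candidate node:
svcinfo lsnodecandidate
svctask addnode -panelname 000000 -iogrp io_grp1 -name newnode1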
For more information about adding an I/O group, see Replacing or adding nodes to an
existing clustered system in the IBM System Storage SAN Volume Controller Software
Installation and Configuration Guide, GC27-2286-01.
This option is more difficult, involves more steps (replication services), and requires more
preparation in advance. For more information about this option, see Chapter 7, Remote
copy services on page 157.
Use the volume managed-mode-to-image-mode migration to move a workload from one
SAN Volume Controller clustered system to the new SAN Volume Controller clustered
system. You migrate a volume from managed mode to image mode and reassign the disk
(LUN masking) from your storage subsystem point of view. Then, you introduce the disk to
your new SAN Volume Controller clustered system and use the image-mode-to-managed-mode
migration.
Outage: This scenario also involves an outage to your host systems and to the I/O to the
involved SAN Volume Controller volumes.
This option involves the longest outage to the host systems; therefore, it is not a preferred
option. For more information about this option, see Chapter 6, Volumes on page 125.
It is uncommon to reduce the number of I/O groups. It can happen when you replace old
nodes with new, more powerful ones. It can also occur in a remote partnership when more
bandwidth is required on one side and spare bandwidth is available on the other side.
Upgrading hardware
You have a few choices to upgrade existing SAN Volume Controller system hardware. Your
choice depends on the size of the existing clustered system.
Up to six nodes
If your clustered system has up to six nodes, the following options are available:
Add the new hardware to the clustered system, migrate volumes to the new nodes, and
then retire the older hardware when it is no longer managing any volumes. This method
requires a brief outage to the hosts to change the I/O group for each volume.
Swap out one node in each I/O group at a time and replace it with the new hardware.
Contact an IBM service support representative (IBM SSR) to help you with this process.
You can perform this swap without an outage to the hosts.
Up to eight nodes
If your clustered system has eight nodes, the following options are available:
Swap out a node in each I/O group (one at a time) and replace it with the new hardware.
Contact an IBM SSR to help you with this process. You can perform this swap without an
outage to the hosts, and you need to swap a node in one I/O group at a time. Do not
change all I/O groups in a multi-I/O group clustered system at one time.
Move the volumes to another I/O group so that all volumes are on three of the four I/O
groups. You can then remove the remaining I/O group with no volumes and add the new
hardware to the clustered system.
As each pair of new nodes is added, volumes can then be moved to the new nodes,
leaving another old I/O group pair that can be removed. After all the old pairs are removed,
the last two new nodes can be added, and, if required, volumes can be moved onto them.
Unfortunately, this method requires several outages to the host because volumes are
moved between I/O groups. This method might not be practical unless you must
implement the new hardware over an extended period and the first option is not practical
for your environment.
Chapter 4.
Back-end storage
This chapter describes aspects and characteristics to consider when you plan the attachment
of a back-end storage device to be virtualized by an IBM System Storage SAN Volume
Controller or Storwize.
This chapter includes the following sections:
Controller affinity and preferred path
Considerations for DS4000 and DS5000 series
Considerations for DS8000 series
Considerations for IBM XIV Storage System
Considerations for IBM Storwize V7000/V5000/V3700
Considerations for IBM FlashSystem
Considerations for third-party storage with EMC Symmetrix DMX and Hitachi Data
Systems
Medium error logging
Mapping physical LBAs to volume extents
Identifying storage controller boundaries by using the IBM Tivoli Storage Productivity
Center
For more information about the latest updates of this list, see V7.2.x Supported Hardware
List, Device Driver, Firmware and Recommended Software Levels for SAN Volume Controller,
which is available at this website:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S1004453#_V3K
4.3.1 Setting the DS4000 and DS5000 series so that both controllers have the
same worldwide node name
The SAN Volume Controller/Storwize recognizes that the DS4000 and DS5000 series
controllers belong to the same storage system unit if they both have the same worldwide
node name (WWNN). You can choose from several methods to determine whether the
WWNN is set correctly for SAN Volume Controller/Storwize. From the SAN switch GUI, you
can check whether the worldwide port name (WWPN) and WWNN of all devices are logged in
to the fabric. Confirm that the WWPNs of all DS4000 or DS5000 series host ports are
unique but that the WWNNs are identical for all ports that belong to a single storage unit.
You can obtain the same information from the Controller section when you view the Storage
Subsystem Profile from the Storage Manager GUI. This section lists the WWPN and WWNN
information for each host port, as shown in the following example:
World-wide port identifier: 20:27:00:80:e5:17:b5:bc
World-wide node identifier: 20:06:00:80:e5:17:b5:bc
If the controllers are set up with different WWNNs, run the SameWWN.script script that is
bundled with the Storage Manager client download file to change it.
Attention: This procedure is intended for initial configuration of the DS4000 or DS5000
series. Do not run the script in a live environment because all hosts that access the storage
subsystem are affected by the changes.
4.3.4 Auto-Logical Drive Transfer for the DS4000 and DS5000 series (firmware
version before 7.83.x)
The DS4000 and DS5000 series have a feature that is called Auto-Logical Drive Transfer
(ADT), which allows logical drive-level failover as opposed to controller level failover. When
you enable this option, the DS4000 or DS5000 series moves LUN ownership between
controllers according to the path that is used by the host.
For the SAN Volume Controller/Storwize, the ADT feature is enabled by default when you
select IBM TS SAN VCE as the host type.
IBM TS SAN VCE: When you configure the DS4000 or DS5000 series for SAN Volume
Controller or Storwize attachment, select the IBM TS SAN VCE host type so that the SAN
Volume Controller/Storwize can properly manage the back-end paths. If the host type is
incorrect, SAN Volume Controller/Storwize reports error 1625 (incorrect controller
configuration).
For more information about checking the back-end paths to storage controllers, see
Chapter 15, Troubleshooting and diagnostics on page 519.
4.3.5 Asymmetric Logical Unit Access for the DS4000 and DS5000 series
(firmware 7.83.x and later)
With firmware version 7.83.x, a new setting called Asymmetric Logical Unit Access (ALUA)
was introduced. ALUA replaces the ADT setting for SAN Volume Controller/Storwize
starting with DS4000/DS5000 series firmware v7.83.x.
With ALUA-compatible storage systems, the controllers are no longer active-passive but act
as active-active. LUNs still have a controller affinity; however, if all preferred paths fail, the
multipath driver redirects all of the I/O to the non-preferred paths, that is, to the non-preferred
controller. While the preferred controller remains operational, the non-preferred controller
redirects all I/O to it. The controllers do not change the ownership of the LUNs
if this condition lasts less than 5 minutes. After 5 minutes, the non-preferred controller stops
redirecting I/O to the preferred controller and takes ownership of the LUNs.
ALUA features the following advantages:
Boot from SAN does not fail if the boot LUN is not on the preferred path.
Eliminates LUN failover and failback if there are transitory path interruptions.
Prevents LUN thrashing in clustered environments.
I/O can be sent to both controllers.
Note: IBM SAN Volume Controller and Storwize family storage systems are ALUA capable.
For more information about ALUA, see Installation and Host Support Guide v10.8 - IBM
System Storage DS Storage Manager, which is available at this website:
https://www-947.ibm.com/support/entry/myportal/docdisplay?lndocid=MIGR-5090826&bra
ndind=5000028
A common mistake when you select an array width is the tendency to focus only on the
capability of a single array to perform various workloads. However, you must also consider
in this decision the aggregate throughput requirements of the entire storage server.
Many physical disks in an array can create a workload imbalance between the controllers
because only one controller of the DS4000 or DS5000 series actively accesses a specific
array.
When you select array width, you must also consider its effect on rebuild time and availability.
A larger number of disks in an array increases the rebuild time for disk failures, which can
have a negative effect on performance. Also, more disks in an array increase the probability
of a second drive failing within the same array before the rebuild of an initial drive failure
completes, which is an inherent exposure of the RAID 5 architecture.
Best practice: For the DS4000 or DS5000 series, use array widths of 4+p and 8+p for
RAID5, and 4+4 or 8+8 for RAID10.
Segment size
With direct-attached hosts, considerations are often made to align device data partitions to
physical drive boundaries within the storage controller. For the SAN Volume
Controller/Storwize, aligning device data partitions to physical drive boundaries within the
storage controller is less critical. The reason is based on the caching that the SAN Volume
Controller/Storwize provides and on the fact that less variation is in its I/O profile, which is
used to access back-end disks.
For the SAN Volume Controller/Storwize, the only opportunity for a full stride write occurs with
large sequential workloads. In that case, the larger the segment size is, the better. However,
larger segment sizes can adversely affect random I/O. The SAN Volume Controller/Storwize
and controller cache hide the RAID 5 write penalty for random I/O well. Therefore, larger
segment sizes can be accommodated. The primary consideration for selecting segment size is
to ensure that a single host I/O fits within a single segment to prevent access to multiple
physical drives.
Testing demonstrated that the best compromise for handling all workloads is to use a
segment size of 256 KB.
Best practice: Use a segment size of 256 KB as the best compromise for all workloads.
Table 4-1 shows the values for SAN Volume Controller/Storwize and DS4000 or DS5000
series.
Table 4-1 SAN Volume Controller/Storwize values
Attribute             Value
MDisk mode            Managed mode
Volume type           Striped
Segment size          256 KB
Cache block size      DS4000 series: 4 KB (default); for the newest models (on firmware 7.xx and later), use 32 KB. DS5000 series: 32 KB
Cache settings        80/80 (default)
Read-ahead            1 (enabled)
RAID 5 array width    4+P or 8+P
RAID 6 array width    8+P+Q
RAID 10 array width   4+4 or 8+8
The DS8000 series assigns server (controller) affinity to ranks when they are
added to an extent pool. Ranks that belong to an even-numbered extent pool have an affinity
to server0, and ranks that belong to an odd-numbered extent pool have an affinity to server1.
Example 4-1 shows the correct configuration that balances the workload across all four DA
pairs with an even balance between odd and even extent pools. The arrays that are on the
same DA pair are split between groups 0 and 1.
Example 4-1 Output of the lsarray command
dscli> lsarray -l
Date/Time: Nov 20, 2013 10:15:43 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
Array State    Data   RAIDtype  arsite Rank DA Pair DDMcap(10^9B) diskclass
===========================================================================
A0    Assigned Normal 5 (6+P+S) S1     R0   0       146.0         ENT
A1    Assigned Normal 5 (6+P+S) S9     R1   1       146.0         ENT
A2    Assigned Normal 5 (6+P+S) S17    R2   2       146.0         ENT
A3    Assigned Normal 5 (6+P+S) S25    R3   3       146.0         ENT
A4    Assigned Normal 5 (6+P+S) S2     R4   0       146.0         ENT
A5    Assigned Normal 5 (6+P+S) S10    R5   1       146.0         ENT
A6    Assigned Normal 5 (6+P+S) S18    R6   2       146.0         ENT
A7    Assigned Normal 5 (6+P+S) S26    R7   3       146.0         ENT
dscli> lsrank -l
Date/Time: Nov 20, 2013 10:20:23 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
ID Group State  datastate Array RAIDtype extpoolID extpoolnam stgtype exts usedexts
===================================================================================
R0 0     Normal Normal    A0    5        P0        extpool0   fb      779  779
R1 1     Normal Normal    A1    5        P1        extpool1   fb      779  779
R2 0     Normal Normal    A2    5        P2        extpool2   fb      779  779
R3 1     Normal Normal    A3    5        P3        extpool3   fb      779  779
R4 1     Normal Normal    A4    5        P5        extpool5   fb      779  779
R5 0     Normal Normal    A5    5        P4        extpool4   fb      779  779
R6 1     Normal Normal    A6    5        P7        extpool7   fb      779  779
R7 0     Normal Normal    A7    5        P6        extpool6   fb      779  779
To use the storage pool striping feature, your DS8000 series layout must be well-planned from
the initial DS8000 series configuration so that all resources in the DS8000 series are used.
Otherwise, storage pool striping can cause severe performance problems in a situation where,
for example, you configure a heavily loaded extent pool with multiple ranks from the same DA
pair.
Because the SAN Volume Controller stripes across MDisks, without proper planning you
might end up with a double striping issue: one striping on extent pool level in DS8000 series
and another striping on an MDisk in SAN Volume Controller or Storwize. For more
information, see IBM DS8870 Architecture and Implementation, SG24-8085, which is
available at this website:
http://www.redbooks.ibm.com/abstracts/sg248085.html?Open
The use of extent pool striping can boost performance per MDisk, and it is the
recommended method for extent pool configuration.
Best practice: Configure four to eight ranks per extent pool.
Cache
For the DS8000, you cannot tune the array and cache parameters. The arrays are 6+P or 7+P,
depending on whether the array site contains a spare. The segment size (the contiguous
amount of data that is written to a single disk) is 256 KB for fixed block volumes. Caching for
the DS8000 series is done on a 64 KB track boundary.
Note: Because of the SAN Volume Controller aggressive cache prefetch algorithms,
it sometimes might be beneficial to turn off the SAN Volume Controller cache prefetch and
allow the DS8000 series controller's sophisticated cache algorithms to do the prefetching.
Extraordinary caution must be taken here because cache prefetch is a SAN Volume
Controller system-wide parameter. Turning it off means turning off prefetch for all volumes
from all back-end storage systems. Therefore, turn it off only in coordination
with IBM Support.
4.4.4 Determining the number of controller ports for the DS8000 series
With the introduction of a Round Robin path selection mechanism in version 6.3 of SAN
Volume Controller code, the preferred practice is to configure 16 controller ports from one
DS8000. (SAN Volume Controller supports a maximum of 16 ports.) Additionally, use no more
than two ports of each of the four-port adapters of the DS8000 series, unless you have
DS8700 or later.
The DS8000 series populates FC adapters across 2 - 8 I/O enclosures, depending on the
configuration. Each I/O enclosure represents a separate hardware domain.
Ensure that adapters that are configured to different SAN networks do not share an I/O
enclosure, as part of the goal of keeping redundant SAN networks isolated from each other.
Best practices: Consider the following preferred practices:
Configure 16 ports per DS8000 series.
Configure a maximum of two ports per one DS8000 series adapter, unless you have
DS8700 or later.
Configure adapters across redundant SANs from different I/O enclosures.
Example 4-3 on page 82 shows output for the lshostconnect command from the DS8000
series. In this example, you can see that all eight ports of the two-node cluster are assigned
to the same volume group (V0) and, therefore, are assigned to the same four LUNs.
From Example 4-3, you can see that only the SAN Volume Controller WWPNs are assigned
to V0.
Attention: Data corruption can occur if the same LUNs are assigned to SAN Volume
Controller nodes and non-SAN Volume Controller nodes; that is, direct-attached hosts.
Next, you see how the SAN Volume Controller detects these LUNs if the zoning is properly
configured. The Managed Disk Link Count (mdisk_link_count) represents the total number of
MDisks that are presented to the SAN Volume Controller cluster by that specific controller.
Example 4-4 shows the general details of the output storage controller by using the SAN
Volume Controller command-line interface (CLI).
Example 4-4 Output of the lscontroller command
IBM_2145:svccf8:admin>svcinfo lscontroller DS8K75L3001
id 1
controller_name DS8K75L3001
WWNN 5005076305FFC74C
mdisk_link_count 16
max_mdisk_link_count 16
degraded no
vendor_id IBM
product_id_low 2107900
product_id_high
product_revision 3.44
ctrl_s/n 75L3001FFFF
allow_quorum yes
WWPN 500507630500C74C
path_count 16
max_path_count 16
WWPN 500507630508C74C
path_count 16
max_path_count 16
IBM_2145:svccf8:admin>
Example 4-4 on page 82 also shows that the Managed Disk Link Count is 16 and shows the
storage controller port details. The path_count represents a connection from a single node to a single
LUN. Because this configuration has 2 nodes and 16 LUNs, you can expect to see a total of
32 paths, with all paths evenly distributed across the available storage ports. This
configuration was validated and is correct because 16 paths are on one WWPN and 16 paths
on the other WWPN, for a total of 32 paths.
The I/O bay (B1 - B8) and adapter slot (S1, S2, S4, and S5) determine the two-digit code XX,
and the adapter port (P1 - P4) determines the digit Y:
IO Bay  S1  S2  S4  S5
B1      00  01  03  04
B2      08  09  0B  0C
B3      10  11  13  14
B4      18  19  1B  1C
B5      20  21  23  24
B6      28  29  2B  2C
B7      30  31  33  34
B8      38  39  3B  3C
Port    P1 = 0, P2 = 4, P3 = 8, P4 = C
You must use ports 1 and 3 from every interface module and connect them into the fabric for
SAN Volume Controller/Storwize use.
Best practice: Use ports 1 and 3 because they belong to two separate HBA cards.
Connect ports 1 and 3 to the separate SAN fabrics. Zone all XIV ports with all SAN Volume
Controller/Storwize ports in one large zone in each SAN fabric.
Figure 4-2 shows a two-node cluster that uses redundant fabrics.
SAN Volume Controller/Storwize supports a maximum of 16 ports from any disk system. The
XIV system supports 8 - 24 FC ports, depending on the configuration (6 - 15 modules).
Table 4-2 indicates port usage for each XIV system configuration.
Table 4-2 Number of SAN Volume Controller ports and XIV modules
Number of XIV modules  Interface modules with FC ports  Number of FC ports available on XIV  Number of SAN Volume Controller ports used
6                      Module 4 and 5                   8                                    4
9                      Module 4, 5, 7 and 8             16                                   8
10                     Module 4, 5, 7 and 8             16                                   8
11                     Module 4, 5, 7, 8 and 9          20                                   10
12                     Module 4, 5, 7, 8 and 9          20                                   10
13                     Module 4, 5, 6, 7, 8 and 9       24                                   12
14                     Module 4, 5, 6, 7, 8 and 9       24                                   12
15                     Module 4, 5, 6, 7, 8 and 9       24                                   12
Figure 4-3 SAN Volume Controller/Storwize host definition on IBM XIV Storage System
For more information about XIV and SAN Volume Controller/Storwize sizing and
configuration, see IBM XIV Gen3 with IBM System Storage SAN Volume Controller and
Storwize V7000, REDP-5063, which is available at this website:
http://www.redbooks.ibm.com/redpieces/abstracts/redp5063.html?Open
4.5.4 Restrictions
This section highlights restrictions for using the XIV system as back-end storage for the SAN
Volume Controller/Storwize.
Note: By default, SAN Volume Controller is in the replication layer and Storwize is in the
storage layer. This means that SAN Volume Controller can virtualize a Storwize system,
but cannot create a partnership with it. The layer of SAN Volume Controller cannot be changed.
If you change the Storwize layer to replication, it can virtualize another Storwize system and
can create a partnership with a SAN Volume Controller or another Storwize system in the
replication layer. To change the layer of a Storwize system, make sure that it cannot see any
other Storwize or SAN Volume Controller system in the SAN fabric, which means that you
must remove all remote copy relationships and all zoning first.
Complete the following steps to configure the Storwize system:
1. On the backend Storwize system, define a host object, and then add all WWPNs from the
SAN Volume Controller or front-end Storwize to it.
2. On the backend Storwize system, create host mappings between each volume and the
SAN Volume Controller or front-end Storwize host object that you created.
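A minimal CLI sketch of both steps on the back-end Storwize system; the host name, WWPNs, and volume name are illustrative assumptions:
svctask mkhost -name SVC_frontend -fcwwpn 5005076801401234
svctask addhostport -fcwwpn 5005076801405678 SVC_frontend
svctask mkvdiskhostmap -host SVC_frontend backend_vol01
Repeat the addhostport command for each remaining WWPN of the SAN Volume Controller or front-end Storwize system, and the mkvdiskhostmap command for each volume to be virtualized.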
The volumes that are presented by the backend Storwize system are displayed in the SAN
Volume Controller or front-end Storwize MDisk view. The back-end Storwize system is
displayed in the SAN Volume Controller or front-end Storwize view with a vendor ID of IBM
and a product ID of 2145.
In the case of two HBAs in SAN Volume Controller nodes, the recommended connections are
shown in Figure 4-5, where blue connections are in the first fabric and the red connections
are in the second fabric.
Figure 4-5 FlashSystem to SAN Volume Controller zoning with two HBA cards
Best practice: For FlashSystems, use SAN Volume Controller with two HBA cards and
dedicate two ports of each card to FlashSystem. You cannot create storage zones with
ports 7 and 8; therefore, use ports 3 and 4 to connect other storage systems. Create host
zones with ports 3, 4, 7, or 8.
4.7.4 Volumes
To fully use all SAN Volume Controller/Storwize resources, at least eight volumes should be
created per FlashSystem storage controller. This way, all CPU cores, nodes, and FC ports are
fully used. The number of volumes often is not a problem because in real-world scenarios, the
number of volumes is much higher.
However, one important factor must be considered when volumes are created from the
FlashSystem storage pool. FlashSystem can process I/Os much faster than traditional HDDs.
In fact, they are even faster than cache operations because, with cache, all I/Os to the volume
must be mirrored to another node in the I/O group. This operation can take as much as 1
millisecond, while I/Os that are issued directly (that is, without cache) to the
FlashSystem can take 100 - 200 microseconds. If you use volumes from a pure
FlashSystem storage pool, it is better to create cache-disabled volumes.
Best Practice: On SAN Volume Controller/Storwize, disable the cache for volumes that
are created in storage pools that are composed of only FlashSystem MDisks. You can
disable the cache when the volume is created or later. This does not apply to the volumes
in multitier storage pools.
If you have a mirrored volume and one copy comes from FlashSystem MDisks but the other
copy comes from spinning-disk MDisks, it is better to have the primary copy set to the
FlashSystem copy. Writes to mirrored volumes can be processed synchronously or
asynchronously to both copies. This behavior depends on the mirrorwritepriority
volume parameter, which can have the value of latency (asynchronous) or redundancy
(synchronous).
Reads are processed only by the primary copy of the mirrored volume. Therefore, although
the FlashSystem copy might improve your write performance (depending on the
mirrorwritepriority setting of the volume), setting the primary copy to the FlashSystem
copy can dramatically improve your read performance.
To change a primary copy of a volume, use the command chvdisk -primary
copy_id_of_mirrored_volume volume_name_or_volume_id. To change the mirroring type of a
volume copy to synchronous or asynchronous, use the command chvdisk
-mirrorwritepriority latency|redundancy volume_name_or_volume_id.
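For example, assuming a hypothetical mirrored volume vdisk_flash whose copy 1 is the FlashSystem copy:
svctask chvdisk -primary 1 vdisk_flash
svctask chvdisk -mirrorwritepriority latency vdisk_flash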
Best practice: Always change the volume primary copy to the copy that was built from
FlashSystem MDisks, and change the mirrorwritepriority setting to latency.
For more information, see the following resources:
IBM SAN Volume Controller and IBM FlashSystem 820: Best Practices and Performance
Capabilities, REDP-5027, which is available at this website:
http://www.redbooks.ibm.com/abstracts/redp5027.html?Open
Implementing the IBM SAN Volume Controller and FlashSystem 820, SG24-8172, which
is available at this website:
http://www.redbooks.ibm.com/abstracts/sg248172.html?Open
Information that is learned from this type of analysis can lead to actions that are taken to
mitigate risks, such as scheduling application downtime, performing volume migrations, and
initiating FlashCopy. By using IBM Tivoli Storage Productivity Center, mapping the
virtualization layer can be done quickly. Also, Tivoli Storage Productivity Center can help to
eliminate mistakes that can be made by using a manual approach.
Figure 4-6 shows how a failing disk on a storage controller can be mapped to the MDisk that
is being used by a SAN Volume Controller cluster. To display this panel, click Physical
Disk → RAID5 Array → Logical Volume → MDisk.
Figure 4-7 completes the end-to-end view by mapping the MDisk through the SAN Volume
Controller to the attached host. Click MDisk → MDGroup → VDisk → Host disk.
Chapter 5.
The selection of LUN attributes for storage pools requires the following primary
considerations:
All LUNs (known to the SAN Volume Controller as MDisks) that are used to create a storage
pool must have the same performance characteristics. If MDisks of varying performance levels are
placed in the same storage pool, the performance of the storage pool can be reduced to the
level of the poorest performing MDisk. Likewise, all LUNs must also possess the same
availability characteristics.
Remember that the SAN Volume Controller does not provide any RAID capabilities within a
storage pool. The loss of access to any one of the MDisks within the storage pool affects the
entire storage pool. However, with the introduction of volume mirroring in SAN Volume
Controller V4.3, you can protect against the loss of a storage pool by mirroring a volume
across multiple storage pools. For more information, see Chapter 6, Volumes on page 125.
For LUN selection within a storage pool, ensure that the LUNs have the following
configuration:
Same type
Same RAID level
Same RAID width (number of physical disks in array)
Same availability and fault tolerance characteristics
Place MDisks that are created on LUNs with varying performance and availability
characteristics in separate storage pools.
The minimum volume size is 17 GB. Although you can create smaller LUNs, define LUNs on
17 GB boundaries to maximize the physical space available.
Support for MDisks larger than 2 TB: Although SAN Volume Controller V6.2 and higher
supports MDisks up to 1 PB, at the time of the writing of this book in 2013 there was no
support for MDisks larger than 2 TB on the XIV system. However, further
enhancements were made in subsequent releases and, starting with SAN Volume
Controller V7.1, support for MDisks larger than 2 TB on the XIV system was
added.
SAN Volume Controller has a maximum of 511 LUNs that can be presented from the XIV
system. SAN Volume Controller does not currently support dynamically expanding the size of
the MDisk.
Because the XIV configuration grows from 6 to 15 modules, use the SAN Volume Controller
rebalancing script to restripe volume extents to include new MDisks. For more information,
see 5.7, Rebalancing extents across a storage pool on page 107.
For a fully populated rack with 12 ports and 2 TB drives, create 48 volumes of 1632 GB each.
Tip: Always use the largest volumes possible.
Table 5-3 shows the number of 1632 GB LUNs that are created, depending on the XIV capacity
that is populated with 2 TB drives.
Table 5-3 Values that use the 1632 GB LUNs
Number of LUNs (MDisks) at 1632 GB each  Capacity of the LUNs (TB)  XIV usable capacity (TB)
16                                       26.1                       27
26                                       42.4                       43
30                                       48.9                       50
33                                       53.9                       54
37                                       60.4                       61
40                                       65.3                       66
44                                       71.8                       73
48                                       78.3                       79
The best use of the SAN Volume Controller virtualization solution with the XIV Storage
System can be achieved by running LUN allocation with the following basic parameters:
Allocate all LUNs (MDisks) to one storage pool. If multiple XIV systems are being
managed by SAN Volume Controller, each physical XIV system should have a separate
storage pool. This design provides a good queue depth on the SAN Volume Controller to
drive XIV adequately.
Use 1 GB or larger extent sizes because this large extent size ensures that data is striped
across all XIV system drives.
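A minimal sketch of creating such a pool and adding the XIV MDisks to it; the pool name, MDisk names, and the 1 GB (1024 MB) extent size are illustrative assumptions:
svctask mkmdiskgrp -name XIV_01_pool -ext 1024
svctask addmdisk -mdisk mdisk10:mdisk11:mdisk12 XIV_01_pool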
For more information about configuration of XIV behind SAN Volume Controller/Storwize, see
the following resources:
XIV Gen3 with SAN Volume Controller and Storwize V7000, REDP-5063, which is
available at this website:
http://www.redbooks.ibm.com/redpieces/abstracts/redp5063.html?Open
Can you use SAN Volume Controller with XIV as storage?, which is available at this
website:
https://www.ibm.com/developerworks/community/blogs/storage_redbooks/entry/can_y
ou_use_svc_with_xiv_as_storage?lang=en_us
IBM_2145:ITSO-CLS4:admin>svcinfo lsquorum
quorum_index status id name   controller_id
0            online 0  mdisk0 0
1            online 1  mdisk1 0
2            online 2  mdisk2 0
To move a SAN Volume Controller quorum disk from one MDisk to another or from one
storage subsystem to another, use the svctask chquorum command, as shown in
Example 5-2 on page 103.
As you can see in Example 5-2, quorum index 2 moved from mdisk2 on ITSO-4700 controller
to mdisk9 on ITSO-XIV controller.
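A minimal sketch of such a move, assuming quorum index 2 and target MDisk mdisk9, as in that example:
svctask chquorum -mdisk mdisk9 2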
Tip: Although the setquorum command (deprecated) still works, use the chquorum
command to change the quorum association.
The cluster uses the quorum disk for the following purposes:
As a tie breaker if a SAN fault occurs when exactly half of the nodes that were previously
members of the cluster are present
To hold a copy of important cluster configuration data
Only one active quorum disk is in a cluster. However, the cluster uses three MDisks as
quorum disk candidates. The cluster automatically selects the actual active quorum disk from
the pool of assigned quorum disk candidates.
If a tiebreaker condition occurs, the one-half portion of the cluster nodes that can reserve the
quorum disk after the split occurs locks the disk and continues to operate. The other half
stops its operation. This design prevents both sides from becoming inconsistent with each
other.
Criteria for quorum disk eligibility: To be considered eligible as a quorum disk, the
MDisk must meet the following criteria:
An MDisk must be presented by a disk subsystem that is supported to provide SAN
Volume Controller quorum disks.
To manually allow the controller to be a quorum disk candidate, you must enter the
following command:
svctask chcontroller -allowquorum yes
An MDisk must be in managed mode (no image mode disks).
An MDisk must have sufficient free extents to hold the cluster state information and the
stored configuration metadata.
An MDisk must be visible to all of the nodes in the cluster.
For more information about special considerations for the placement of the active quorum
disk for Stretched Cluster configurations, see Guidance for Identifying and Changing
Managed Disks Assigned as Quorum Disk Candidates, S1003311, which is available at this
website:
http://www.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003311
Attention: Running a SAN Volume Controller cluster without a quorum disk can seriously
affect your operation. A lack of available quorum disks for storing metadata prevents any
migration operation (including a forced MDisk delete). Mirrored volumes can be taken
offline if no quorum disk is available. This behavior occurs because synchronization status
for mirrored volumes is recorded on the quorum disk.
During normal operation of the cluster, the nodes communicate with each other. If a node is
idle for a few seconds, a heartbeat signal is sent to ensure connectivity with the cluster. If a
node fails for any reason, the workload that is intended for it is taken over by another node
until the failed node is restarted and admitted again to the cluster (which happens
automatically). If the microcode on a node becomes corrupted (which results in a failure), the
workload is transferred to another node. The code on the failed node is repaired and the node
is admitted again to the cluster (all automatically).
The number of extents that are required depends on the extent size for the storage pool that
contains the MDisk. Table 5-4 provides the number of extents that are reserved for quorum
use by extent size.
Table 5-4 Number of extents that are reserved by extent size
Extent size (MB)  Number of extents that are reserved for quorum use
16                17
32                9
64                5
128               3
256               2
512               1
1024              1
2048              1
4096              1
8192              1
Ideally, the MDisks that are used are of the same size and, therefore, provide the same
number of extents. If this requirement is not feasible, you must check the distribution
of the extents of the volumes in that storage pool.
In a multitiered storage pool, you have a mix of MDisks with more than one type of disk tier
attribute. For example, a storage pool contains a mix of generic_hdd and generic_ssd
MDisks.
A multitiered storage pool, therefore, contains MDisks with various characteristics, as
opposed to a single-tier storage pool. However, each tier must have MDisks of the same size
and MDisks that provide the same number of extents. Multi-tiered storage pools are used to
enable the automatic migration of extents between disk tiers by using the SAN Volume
Controller Easy Tier function. For more information about IBM System Storage Easy Tier, see
Chapter 11, IBM System Storage Easy Tier function on page 319.
It is likely that the MDisks (LUNs) that are presented to the SAN Volume Controller cluster
have various performance attributes because of the type of disk or RAID array on which they
are installed. The MDisks can be on a 15 K RPM Fibre Channel (FC) or serial-attached SCSI
(SAS) disk, a nearline SAS, Serial Advanced Technology Attachment (SATA), or SSDs.
Therefore, a storage tier attribute is assigned to each MDisk, with the default of generic_hdd.
With SAN Volume Controller V6.2, a new tier 0 level disk attribute is available for SSDs, and it
is known as generic_ssd.
There are two types of storage tier: generic_ssd or generic_hdd. For Storwize, when you
create an array with SSD drives, the MDisk becomes generic_ssd by default. If you create an
array with normal HDD drives (SAS or NL-SAS), the MDisk becomes generic_hdd by default.
If you present an external MDisk to SAN Volume Controller or Storwize, it becomes
generic_hdd by default, even if that external MDisk was built by using SSD drives or a flash
memory system; for example, when it is presented from an IBM FlashSystem storage system.
You can change the MDisk tier only for MDisks that are presented from external storage
systems.
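A minimal sketch of changing the tier of such an external MDisk; the MDisk name is an illustrative assumption:
svctask chmdisk -tier generic_ssd mdisk12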
You can also define storage tiers by using storage controllers of varying performance and
availability levels. Then, you can easily provision them based on host, application, and user
requirements.
Remember that a single storage tier can be represented by multiple storage pools. For
example, if you have a large pool of tier 3 storage that is provided by many low-cost storage
controllers, it is sensible to use several storage pools. The use of several storage pools
prevents a single offline volume from taking all of the tier 3 storage offline.
When multiple storage tiers are defined, precautions must be taken to ensure that storage is
provisioned from the appropriate tiers. You can ensure that storage is provisioned from the
appropriate tiers through storage pool and MDisk naming conventions, with clearly defined
storage requirements for all hosts within the installation.
Naming conventions: When multiple tiers are configured, clearly indicate the storage tier
in the naming convention that is used for the storage pools and MDisks.
One or more nodes cannot access the MDisk through the chosen controller port.
I/O to the disk does not complete within a reasonable time.
The SCSI inquiry data that is provided for the disk is incorrect or incomplete.
The SAN Volume Controller cluster suffers a software error during the MDisk test.
Image-mode MDisks are not tested before they are added to a storage pool because an
offline image-mode MDisk does not take the storage pool offline.
A new release of the SAN Volume Controller Tools package, with support for Storwize
products and Easy Tier, was released in April 2013. It is available for download at this
website:
https://www.ibm.com/developerworks/community/groups/service/html/communityview?com
munityUuid=18d10b14-e2c8-4780-bace-9af1fc463cc0
If you want to manually balance extents, you can use the following CLI commands to identify
and correct extent imbalance across storage pools (see the sketch after this list). Remember
that the svcinfo and svctask prefixes are no longer required:
lsmdiskextent
migrateexts
lsmigrate
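A minimal sketch of one manual rebalancing step; the MDisk names, volume name, and extent count are illustrative assumptions:
lsmdiskextent mdisk0
migrateexts -source mdisk0 -target mdisk4 -exts 16 -vdisk volume01 -threads 1
lsmigrate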
The following section describes how to use the script from the SAN Volume Controller Tools
package to rebalance extents automatically. You can use this script on any host with Perl and
an SSH client installed. The next section also describes how to install it on a Windows Server
2003 server.
All eight MDisks (mdisk0 - mdisk7) are presented by the controller itso_ds4500. Before
rebalancing, the volume extents are spread across only four of the eight MDisks:
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk0
id number_of_extents copy_id
0  64                0
2  64                0
1  64                0
4  64                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk1
id number_of_extents copy_id
0  64                0
2  64                0
1  64                0
4  64                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk2
id number_of_extents copy_id
0  64                0
2  64                0
1  64                0
4  64                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk3
id number_of_extents copy_id
0  64                0
2  64                0
1  64                0
4  64                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk4
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk5
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk6
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk7
The balance.pl script is then run on the Master Console by using the following command:
C:\SVCTools\examples\balance>perl balance.pl itso_ds45_18gb -k "c:\icat.ppk" -i
9.43.86.117 -r -e
where:
itso_ds45_18gb
Indicates the storage pool to be rebalanced.
-k "c:\icat.ppk"
Gives the location of the PuTTY private key file, which is authorized for administrator
access to the SAN Volume Controller cluster.
-i 9.43.86.117
Gives the IP address of the cluster.
-r
Requires that the optimal solution is found. If this option is not specified, the extents can
still be unevenly spread at completion. However, not specifying -r often requires fewer
migration commands and less time. If time is important, you might not want to use -r at
first, but then rerun the command with -r if the solution is not good enough.
-e
Specifies that the script runs the extent migration commands. Without this option, it merely
prints the commands that it might run. You can use this option to check that the series of
steps is logical before you commit to migration.
In this example, with 4 x 8 GB volumes, the migration completed within around 15 minutes.
You can use the svcinfo lsmigrate command to monitor progress. This command shows a
percentage for each extent migration command that is issued by the script.
After the script completed, check that the extents are correctly rebalanced. Example 5-4
shows that the extents were correctly rebalanced in the example for this book. In a test run of
40 minutes of I/O (25% random, 70/30 read/write) to the four volumes, performance for the
balanced storage pool was around 20% better than for the unbalanced storage pool.
Example 5-4 Output of the lsmdiskextent command that shows a balanced storage pool
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk0
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk1
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  31                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk2
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk3
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk4
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  33                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk5
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk6
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk7
id number_of_extents copy_id
0  32                0
2  32                0
1  32                0
4  32                0
Specify the -force flag on the svctask rmmdisk command, or select the corresponding option
in the GUI. Both actions cause the SAN Volume Controller to automatically move all used
extents on the MDisk to the remaining MDisks in the storage pool.
Alternatively, you might want to manually perform the extent migrations. Otherwise, the
automatic migration randomly allocates extents to MDisks (and areas of MDisks). After all of
the extents are manually migrated, the MDisk removal can proceed without the -force flag.
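A minimal sketch of the automatic variant; the MDisk and storage pool names are illustrative assumptions:
svctask rmmdisk -mdisk mdisk5 -force ITSO_pool1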
DS4000 volumes
Identify the DS4000 volumes by using the Logical Drive ID and the LUN that is associated
with the host mapping. The example in this section uses the following values:
Logical drive ID: 600a0b80001744310000c60b4e2eb524
LUN: 3
To identify the logical drive ID by using the Storage Manager Software, on the
Logical/Physical View tab, right-click a volume, and select Properties. The Logical Drive
Properties window (see Figure 5-1 on page 113) opens.
To identify your LUN, on the Mappings View tab, select your SAN Volume Controller host
group and then look in the LUN column in the right pane, as shown in Figure 5-2.
Complete the following steps to correlate the LUN with your corresponding MDisk:
1. Review the MDisk details and the UID field. The first 32 digits of the MDisk UID field
(600a0b80001744310000c60b4e2eb524) must be the same as your DS4000 logical drive ID.
2. Make sure that the associated DS4000 LUN correlates with the SAN Volume Controller
ctrl_LUN_#. For this task, convert your DS4000 LUN number to hexadecimal and check the
last two digits of the SAN Volume Controller ctrl_LUN_# field. In the example that is shown in
Figure 5-3 on page 114, it is 0000000000000003.
The CLI references the Controller LUN as ctrl_LUN_#. The GUI references the Controller
LUN as LUN.
Note: The same identification steps apply to DS3000, DS5000, and DCS3000 storage
systems.
DS8000 LUN
The LUN ID only uniquely identifies LUNs within the same storage controller. If multiple
storage devices are attached to the same SAN Volume Controller cluster, the LUN ID must be
combined with the worldwide node name (WWNN) attribute to uniquely identify LUNs within
the SAN Volume Controller cluster.
To get the WWNN of the DS8000 controller, take the first 16 digits of the MDisk UID and
change the first digit from 6 to 5; for example, from 6005076305ffc74c to 5005076305ffc74c.
When detected as SAN Volume Controller ctrl_LUN_#, the DS8000 LUN is decoded as
40XX40YY00000000, where XX is the logical subsystem (LSS) and YY is the LUN within the LSS.
As detected by the DS8000, the LUN ID is the four digits starting from the 29th digit, as in the
example 6005076305ffc74c000000000000100700000000000000000000000000000000.
Figure 5-4 on page 115 shows LUN ID fields that are displayed in the DS8000 Storage
Manager.
From the MDisk details panel that is shown in Figure 5-5, the Controller LUN Number field is
4010400700000000, which translates to LUN ID 0x1007 (represented in hex).
You can also identify the storage controller from the Storage Subsystem field as DS8K75L3001,
which was manually assigned.
To identify your LUN, in the Volumes by Hosts view, expand your SAN Volume Controller host
group and then review the LUN column, as shown in Figure 5-7 on page 117.
The MDisk UID field contains part of the controller WWNN in digits 2 - 13. You can check
those digits by using the svcinfo lscontroller command, as shown in Example 5-6.
Example 5-6 The lscontroller command
IBM_2145:tpcsvc62:admin>svcinfo lscontroller 10
id 10
controller_name controller10
WWNN 5001738002860000
...
The correlation can now be performed by taking the first 16 digits of the MDisk UID field.
Digits 2 - 13 contain the controller WWNN, as shown in Example 5-6. Digits 14 - 16 are the XIV
volume serial number (897) in hexadecimal format (resulting in 381 hex). The translation is
0017380002860381000000000000000000000000000000000000000000000000, where
0017380002860 contains the controller WWNN (digits 2 - 13) and 381 is the XIV volume serial
number that is converted to hex.
To correlate the SAN Volume Controller ctrl_LUN_#, convert the XIV volume number to
hexadecimal format and then check the last three digits of the SAN Volume Controller
ctrl_LUN_#. In this example, the number is 0000000000000002, as shown in Figure 5-8 on
page 118.
Storwize volumes
The IBM Storwize solution is built upon the IBM SAN Volume Controller technology base and
uses similar terminology.
Complete the following steps to correlate the Storwize volumes with the MDisks:
1. Looking at the Storwize side first, check the Volume UID field that was presented to the
SAN Volume Controller host, as shown in Figure 5-9 on page 119.
2. On the Host Maps tab (see Figure 5-10), check the SCSI ID number for the specific volume.
This value is used to match the SAN Volume Controller ctrl_LUN_# (in hexadecimal format).
3. On the SAN Volume Controller side, review the MDisk details (see Figure 5-11) and
compare the MDisk UID field with the Storwize Volume UID. The first 32 digits should be the
same.
Figure 5-11 SAN Volume Controller MDisk Details for Storwize volumes
4. Double-check that the SAN Volume Controller ctrl_LUN_# is the Storwize SCSI ID number
in hexadecimal format. In this example, the number is 0000000000000004.
Table 5-5 MDisks in their original order
LUN ID  MDisk name  DA pair/extent pool
1000    mdisk01     DA2/P0
1001    mdisk02     DA6/P16
1002    mdisk03     DA7/P30
1100    mdisk04     DA0/P9
1101    mdisk05     DA4/P23
1102    mdisk06     DA5/P39
To change extent allocation so that each extent alternates between even and odd extent
pools, the MDisks can be removed from the storage pool and then added again to the storage
pool in the new order.
Table 5-6 shows how the MDisks were added back to the storage pool in their new order so
that the extent allocation alternates between even and odd extent pools.
Table 5-6 MDisks that were added again
LUN ID  MDisk name  DA pair/extent pool
1000    mdisk01     DA2/P0
1100    mdisk04     DA0/P9
1001    mdisk02     DA6/P16
1101    mdisk05     DA4/P23
1002    mdisk03     DA7/P30
1102    mdisk06     DA5/P39
If none of these options are appropriate, complete the following steps to move an MDisk to
another cluster:
1. Ensure that the MDisk is in image mode rather than striped or sequential mode.
If the MDisk is in image mode, the MDisk contains only the raw client data and not any
SAN Volume Controller metadata. If you want to move data from a non-image mode
volume, use the svctask migratetoimage command to migrate it to a single image-mode
MDisk. For a thin-provisioned volume, image mode means that all metadata for the
volume is present on the same MDisk as the client data; the MDisk is not readable by a
host, but it can be imported by another SAN Volume Controller cluster.
2. Remove the image-mode volumes from the first cluster by using the svctask rmvdisk
command.
The -force option: You must not use the -force option of the svctask rmvdisk
command. If you use the -force option, data in the cache is not written to the disk,
which might result in metadata corruption for a thin-provisioned volume.
3. Verify that the volume is no longer displayed by entering the svcinfo lsvdisk command.
You must wait until the volume is removed to allow cached data to destage to disk.
4. Change the back-end storage LUN mappings to prevent the source SAN Volume
Controller cluster from detecting the disk, and then make it available to the target cluster.
5. Enter the svctask detectmdisk command on the target cluster.
6. Import the MDisk to the target cluster:
If the MDisk is not a thin-provisioned volume, use the svctask mkvdisk command with
the -image option.
If the MDisk is a thin-provisioned volume, use the following options:
-import instructs the SAN Volume Controller to look for thin volume metadata on
the specified MDisk.
-rsize indicates that the disk is thin-provisioned. The value that is given to -rsize must be at least the amount of space that the source cluster used on the thin-provisioned volume. If it is smaller, an 1862 error is logged. In this case, delete the volume and enter the svctask mkvdisk command again.
The volume is now online. If it is not online and the volume is thin-provisioned, check the SAN
Volume Controller error log for an 1862 error. If present, an 1862 error indicates why the
volume import failed (for example, metadata corruption). Then, you might be able to use the
repairsevdiskcopy command to correct the problem.
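The following minimal sequence sketches this procedure; the volume, MDisk, and pool names are hypothetical, and the -rsize value must cover the capacity that the source cluster used on the thin-provisioned volume:

IBM_2145:source:admin>svctask migratetoimage -vdisk thin_vol01 -mdisk mdisk_exp -mdiskgrp POOL_EXPORT
IBM_2145:source:admin>svctask rmvdisk thin_vol01
IBM_2145:source:admin>svcinfo lsvdisk thin_vol01
(wait until the volume is no longer listed, then remap the LUN to the target cluster)
IBM_2145:target:admin>svctask detectmdisk
IBM_2145:target:admin>svctask mkvdisk -iogrp 0 -mdiskgrp POOL_IMPORT -mdisk mdisk_exp -vtype image -import -rsize 20 -unit gb -autoexpand -name thin_vol01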
Chapter 6.
Volumes
This chapter explains how to create, manage, and migrate volumes (formerly known as virtual disks, or VDisks) across I/O groups. It also explains how to use IBM FlashCopy.
This chapter includes the following sections:
Overview of volumes
Volume mirroring
Creating volumes
Volume migration
Preferred paths to a volume
Non-Disruptive volume move (NDVM)
Cache mode and cache-disabled volumes
Effect of a load on storage controllers
Setting up FlashCopy services
Real capacity defines how much disk space is allocated to a volume. Virtual capacity is the
capacity of the volume that is reported to other IBM System Storage SAN Volume Controller
components (such as FlashCopy or remote copy) and to the hosts.
A directory maps the virtual address space to the real address space. The directory and the
user data share the real capacity.
Thin-provisioned volumes are available in two operating modes: autoexpand and
nonautoexpand. You can switch the mode at any time. If you select the autoexpand feature,
the SAN Volume Controller automatically adds a fixed amount of extra real capacity to the thin
volume as required. Therefore, the autoexpand feature attempts to maintain a fixed amount of
unused real capacity for the volume. This amount is known as the contingency capacity. The
contingency capacity is initially set to the real capacity that is assigned when the volume is
created. If the user modifies the real capacity, the contingency capacity is reset to be the
difference between the used capacity and real capacity.
A volume that is created without the autoexpand feature, and thus has a zero contingency capacity, goes offline as soon as its real capacity is fully used and must be expanded manually.
Warning threshold: Enable the warning threshold (by using email or an SNMP trap) when you work with thin-provisioned volumes, both on the volume and on the storage pool, especially when you do not use autoexpand mode. Otherwise, the thin volume goes offline when it runs out of space.
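For example, a thin-provisioned volume with autoexpand and a warning threshold might be created with a command like the following sketch (the pool, I/O group, and volume names are illustrative):

IBM_2145:svccg8:admin>svctask mkvdisk -mdiskgrp MDG1DS4K -iogrp io_grp0 -size 100 -unit gb -rsize 10% -autoexpand -warning 80% -grainsize 256 -name thin_vol01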
Autoexpand mode does not cause real capacity to grow much beyond the virtual capacity.
The real capacity can be manually expanded to more than the maximum that is required by
the current virtual capacity, and the contingency capacity is recalculated.
A thin-provisioned volume can be converted nondisruptively to a fully allocated volume, or
vice versa, by using the volume mirroring function. For example, you can add a
thin-provisioned copy to a fully allocated primary volume and then remove the fully allocated
copy from the volume after they are synchronized.
The fully allocated to thin-provisioned migration procedure uses a zero-detection algorithm so
that grains that contain all zeros do not cause any real capacity to be used.
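A sketch of this conversion, assuming a fully allocated volume that is named fat_vol01 and a target pool with sufficient free space:

IBM_2145:svccg8:admin>svctask addvdiskcopy -mdiskgrp MDG1DS4K -rsize 2% -autoexpand fat_vol01
(monitor synchronization with lsvdisksyncprogress, then remove the fully allocated copy)
IBM_2145:svccg8:admin>svctask rmvdiskcopy -copy 0 fat_vol01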
Tip: Consider the use of thin-provisioned volumes as targets in FlashCopy relationships.
File system problems can be moderated by tools, such as defrag, or by managing storage by
using host Logical Volume Managers (LVMs).
The thin-provisioned volume also depends on how applications use the file system. For
example, some applications delete log files only when the file system is nearly full.
For more information about performance, see Part 2, Performance preferred practices on
page 261.
Table 6-1 Maximum thin volume virtual capacities for an extent size

Extent size in MB   Maximum volume capacity in GB   Maximum thin virtual capacity in GB
16                  2,048                           2,000
32                  4,096                           4,000
64                  8,192                           8,000
128                 16,384                          16,000
256                 32,768                          32,000
512                 65,536                          65,000
1024                131,072                         130,000
2048                262,144                         260,000
4096                524,288                         520,000
8192                1,048,576                       1,040,000
Table 6-2 shows the maximum thin-provisioned volume virtual capacities for a grain size.

Table 6-2 Maximum thin volume virtual capacities for a grain size

Grain size in KB   Maximum thin virtual capacity in GB
32                 260,000
64                 520,000
128                1,040,000
256                2,080,000
For FlashCopy usage, a mirrored volume is online to other nodes only if it is online in its own I/O group and if the other nodes can see the same copies as the nodes in the I/O group. If a mirrored volume is a source volume in a FlashCopy relationship, asymmetric path failures or a failure of the I/O group of the mirrored volume can cause the target volume to be taken offline.
Performance tip: The minimal volume size should be determined by the following formula:
number of MDisks * extent size
For example:
8 MDisks * 256 MB extent size = 2 GB
Volumes that are smaller than 2 GB are not evenly distributed across all MDisks. For optimal performance, the volume size should be a multiple of this product (2 GB in this example). This enables a full stripe across all of the MDisks and spindles of the storage pool.
In Stretched Cluster environments, it is recommended that you configure the preferred node based on site awareness.
The maximum number of volumes per I/O group is 2048.
The maximum number of volumes per cluster is 8192 (eight-node cluster).
The smaller the extent size is that you select, the finer the granularity is of the volume of
space that is occupied on the underlying storage controller. A volume occupies an integer
number of extents, but its length does not need to be an integer multiple of the extent size.
The length does need to be an integer multiple of the block size. Any space left over
between the last logical block in the volume and the end of the last extent in the volume is
unused. A small extent size is used to minimize this unused space.
The counterargument is that, the smaller the extent size, the smaller the total storage capacity that the SAN Volume Controller can virtualize. The extent size does not affect performance. For most clients, extent sizes of 128 MB or 256 MB give a reasonable balance between volume granularity and cluster capacity. A default value is no longer set; the extent size is specified during storage pool (managed disk group) creation.
Important: You can migrate volumes only between storage pools that have the same
extent size, except for mirrored volumes. The two copies can be in different storage pools
with different extent sizes.
As described in 6.1, Overview of volumes on page 126, a volume can be created as
thin-provisioned or fully allocated, in one mode (striped, sequential, or image) and with one or
two copies (volume mirroring). With a few rare exceptions, always configure volumes in striped mode.
Important: To avoid negatively affecting system performance, you must thoroughly
understand the data layout and workload characteristics if you use sequential mode over
striping.
IBM_2145:svccg8:admin>svcinfo lsiogrp
id name            node_count vdisk_count host_count
0  io_grp0         2          29          9
1  io_grp1         0          0           9
2  io_grp2         0          0           9
3  io_grp3         0          0           9
4  recovery_io_grp 0          0           0
6. On the SAN Volume Controller, migrate the volume back to the original I/O group while adding the preferred node option. Example 6-4 shows how to migrate by using the node option; the I/O group, node, and volume names are illustrative.
Example 6-4 Migrating by using the node option
IBM_2145:svccg8:admin>svctask movevdisk -iogrp io_grp0 -node 1 TEST_1
4. After you confirm that the new paths are online, remove access from the old I/O group by
running the rmvdiskaccess -iogrp iogrp id/name volume id/name command.
5. Run the appropriate commands on the hosts that are mapped to the volume to remove the
paths to the old I/O group.
For more information about nondisruptive volume movement in Linux, see the Host section
of this website:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S1004510
134
4. To move the volume across I/O groups, run the svctask movevdisk -iogrp io_grp1
TEST_1 command.
This command does not work when data is in the SAN Volume Controller cache that must
be written to the volume. After 2 minutes, the data automatically destages if no other
condition forces an earlier destaging.
5. On the host, rediscover the volume. For example, in Windows, run the rescan command,
and then mount the volume or add a drive letter. For more information, see Chapter 8,
Hosts on page 225.
6. Resume copy operations as required.
7. Resume I/O operations on the host.
After any copy relationships are stopped, you can move the volume across I/O Groups with
the following command in a SAN Volume Controller:
svctask movevdisk -iogrp newiogrpname/id vdiskname/id
Where newiogrpname/id is the name or ID of the I/O group to which you move the volume,
and vdiskname/id is the name or ID of the volume.
For example, the following command moves the volume that is named TEST_1 from its
existing I/O group, io_grp0, to io_grp1:
IBM_2145:svccg8:admin>svctask movevdisk -iogrp io_grp1 TEST_1
Migrating volumes between I/O groups can cause problems if the old definitions of the volumes are not removed from the configuration before the volumes are imported to the host. Migrating volumes between I/O groups is not a dynamic configuration change, so you must shut down the host before you migrate the volumes. Then, follow the procedure in Chapter 8, Hosts on page 225 to reconfigure the SAN Volume Controller volumes to hosts. Remove the stale configuration and restart the host to reconfigure the volumes that are mapped to a host.
For information about how to dynamically reconfigure the SDD for the specific host operating system, see Multipath Subsystem Device Driver: User's Guide, GC52-1309.
Important: Do not move a volume to an offline I/O group for any reason. Before you move
the volumes, you must ensure that the I/O group is online to avoid any data loss.
The command that is shown in step 4 on page 135 does not work if any data is in the SAN Volume Controller cache that must first be flushed out. A -force flag is available that discards the data in the cache rather than flushing it to the volume. If the command fails because of outstanding I/Os, wait a few minutes; the SAN Volume Controller then automatically flushes the data to the volume.
Attention: The use of the -force flag can result in data integrity issues.
Migration type                           Command
Managed-to-managed or image-to-managed   migratevdisk
Managed-to-image or image-to-image       migratetoimage
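For example, a managed-to-managed migration of the volume TEST_1 might look like the following sketch (the pool name and thread count are illustrative):

IBM_2145:svccg8:admin>svctask migratevdisk -vdisk TEST_1 -mdiskgrp MDG1DS4K -threads 4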
Migrating a volume from one storage pool to another is nondisruptive to the host application
by using the volume. Depending on the workload of the SAN Volume Controller, there might
be a slight performance impact. For this reason, migrate a volume from one storage pool to
another when the SAN Volume Controller has a relatively low load.
Migrating a volume from one storage pool to another storage pool: For the migration to be accepted, the source and destination storage pools must have the same extent size.
Volume mirroring can also be used to migrate a volume between storage pools. You can use
this method if the extent sizes of the two pools are not the same.
This section provides guidance for migrating volumes. You can monitor the progress of a migration by running the svcinfo lsmigrate command, as shown in the following output:
IBM_2145:svccg8:admin>svcinfo lsmigrate
migrate_type MDisk_Group_Migration
progress 0
migrate_source_vdisk_index 3
migrate_target_mdisk_grp 2
max_thread_count 4
migrate_source_vdisk_copy_id 0
IBM_2145:svccg8:admin>
2. To migrate the volume, get the name of the MDisk to which you migrate it by using the
command that is shown in Example 6-9.
Example 6-9 The lsmdisk command output
IBM_2145:svccg8:admin>lsmdisk -delim :
id:name:status:mode:mdisk_grp_id:mdisk_grp_name:capacity:ctrl_LUN_#:controller_name:UID:tier
0:D4K_ST1S12_LUN1:online:managed:2:MDG1DS4K:20.0GB:0000000000000000:DS4K:600a0b8000174233000071894e2eccaf000000000000000
00000000000000000:generic_hdd
1:mdisk0:online:array:3:MDG4DS8KL3331:136.2GB::::generic_ssd
2:D8K_L3001_1001:online:managed:0:MDG1DS8KL3001:20.0GB:4010400100000000:DS8K75L3001:6005076305ffc74c00000000000010010000
0000000000000000000000000000:generic_hdd
...
33:D8K_L3331_1108:online:unmanaged:::20.0GB:4011400800000000:DS8K75L3331:6005076305ffc7470000000000001108000000000000000
00000000000000000:generic_hdd
34:D4K_ST1S12_LUN2:online:managed:2:MDG1DS4K:20.0GB:0000000000000001:DS4K:600a0b80001744310000c6094e2eb4e400000000000000
000000000000000000:generic_hdd
From this command, you can see that D8K_L3331_1108 is the candidate for the image type
migration because it is unmanaged.
3. Run the migratetoimage command (as shown in Example 6-10) to migrate the volume to
the image type.
Example 6-10 The migratetoimage command
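A representative invocation, with an illustrative volume name and target pool name, resembles the following command:

IBM_2145:svccg8:admin>svctask migratetoimage -vdisk TEST_1 -mdisk D8K_L3331_1108 -mdiskgrp MDG_IMG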
By default, the SAN Volume Controller assigns ownership of even-numbered volumes to one
node of a caching pair and the ownership of odd-numbered volumes to the other node. It is
possible for the ownership distribution in a caching pair to become unbalanced if volume sizes
are different between the nodes or if the volume numbers that are assigned to the caching
pair are predominantly even or odd.
To provide flexibility in making plans to avoid this problem, the ownership for a specific volume
can be explicitly assigned to a specific node when the volume is created. A node that is
explicitly assigned as an owner of a volume is known as the preferred node. Because it is
expected that hosts access volumes through the preferred nodes, those nodes can become
overloaded. When a node becomes overloaded, volumes can be moved to other I/O groups
because the ownership of a volume cannot be changed after the volume is created. For more
information, see 6.3.3, Non-Disruptive volume move on page 133.
SDD is aware of the preferred paths that SAN Volume Controller sets per volume. SDD uses
a load balancing and optimizing algorithm when failing over paths. That is, it tries the next
known preferred path. If this effort fails and all preferred paths were tried, it load balances on
the nonpreferred paths until it finds an available path. If all paths are unavailable, the volume
goes offline. Therefore, it can take time to perform path failover when multiple paths go offline.
SDD also performs load balancing across the preferred paths where appropriate.
capacity 1.00GB
type many
formatted no
mdisk_id many
mdisk_name many
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018205E12000000000000002
throttling 0
preferred_node_id 2
fast_write_state empty
cache readwrite
...
The throttle setting of zero indicates that no throttling is set. After you check the volume, you
can then run the chvdisk command.
To modify the throttle setting, run the following command:
svctask chvdisk -rate 40 -unitmb TEST_1
Running the lsvdisk command generates the output that is shown in Example 6-12.
Example 6-12 Output of the lsvdisk command
As shown in Example 6-13, the throttle setting has no unit parameter, which means that it is
an I/O rate setting.
Example 6-13 The chvdisk command and lsvdisk output
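A representative sketch of this usage follows; setting -rate without the -unitmb option makes the value an I/O-per-second limit (the value 1000 and the volume name are illustrative):

IBM_2145:svccg8:admin>svctask chvdisk -rate 1000 TEST_1
IBM_2145:svccg8:admin>svcinfo lsvdisk TEST_1
...
throttling 1000
...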
cache none
udid
fc_map_count 0
sync_rate 50
copy_count 1
se_copy_count 0
...
Tip: By default, the volumes are created with the cache mode enabled (read/write), but you
can specify the cache mode when the volume is created by using the -cache option.
Type of I/O                   Effect on I/O capability
No I/O                        Insignificant
Reads only                    Insignificant
Sequential reads and writes   2 x F
Random reads and writes       14 x F
Random writes                 49 x F

(F is the number of FlashCopy mappings in the storage pool.)
Therefore, to calculate the average I/O per volume before overloading the storage pool, use
the following formula:
I/O rate = (I/O capability) / (number of volumes + weighting factor)
By using the example storage pool as defined earlier in this section, consider a situation in
which you add 20 volumes to the storage pool and that storage pool can sustain 5250 IOPS,
and two FlashCopy mappings also have random reads and writes. In this case, the average
I/O rate is calculated by the following formula:
5250 / (20 + 28) = 110
Therefore, if half of the volumes sustain 200 I/Os and the other half of the volumes sustain 20 I/Os, the average is still 110 IOPS.
Summary
As you can see from the examples in this section, Tivoli Storage Productivity Center is a
powerful tool for analyzing and solving performance problems. To monitor the performance of
your system, you can use the read and response times parameter for volumes and MDisks.
This parameter shows everything that you need in one view and it is the key day-to-day
performance validation metric. You can easily notice if a system that usually had 2 ms writes
and 6 ms reads suddenly has 10 ms writes and 12 ms reads and is becoming overloaded. A
general monthly check of CPU usage shows how the system is growing over time and
highlights when you must add an I/O group (or cluster).
In addition, rules apply to OLTP-type workloads, such as the maximum I/O rates for back-end storage arrays. However, for batch workloads, the maximum I/O rates depend on many factors, such as the workload, back-end storage, code levels, and security.
2. The application sends the data to a file. The file system that accepts the data might buffer
it in memory for a period.
3. The file system sends the I/O to a disk controller after a defined period (or even based on
an event).
4. The disk controller might cache its write in memory before it sends the data to the physical
drive.
If the SAN Volume Controller is the disk controller, it stores the write in its internal cache
before it sends the I/O to the real disk controller.
5. The data is stored on the drive.
At any point, any number of unwritten blocks of data might be in any of these steps and are
waiting to go to the next step.
Also, the order of the data blocks that were created in step 1 might not be in the same order
that was used when the blocks are sent to steps 2, 3, or 4. Therefore, at any point, data that
arrives in step 4 might be missing a vital component that was not yet sent from step 1, 2, or 3.
FlashCopy copies often are created with data that is visible from step 4. Therefore, to maintain application integrity, any I/O that is generated in step 1 must make it to step 4 before the FlashCopy is started. There must not be any outstanding write I/Os in steps 1, 2, or 3. If write I/Os are outstanding, the copy of the disk that is created at step 4 is likely to be missing those transactions. If the FlashCopy is to be used, these missing I/Os can make it unusable.
If you want to put Vdisk_1 into a FlashCopy mapping, you do not need to know the byte
size of that volume because it is a striped volume. Creating a target volume of 2 GB is
sufficient. The VDISK_IMAGE, which is used in our example, is an image-mode volume. In
this case, you must know its exact size in bytes.
Example 6-16 uses the -bytes parameter of the svcinfo lsvdisk command to find its
exact size. Therefore, you must create the target volume with a size of 21474836480 bytes,
not 20 GB.
Example 6-16 Finding the size of an image mode volume by using the CLI
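A minimal sketch of this check, assuming the image-mode volume is named VDISK_IMAGE (only the relevant output line is shown):

IBM_2145:svccg8:admin>svcinfo lsvdisk -bytes VDISK_IMAGE
...
capacity 21474836480
...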
Again, when a snap copy of the relational database environment is taken, all three disks
must be in sync. That way, when they are used in a recovery, the relational database is not
missing any transactions that might occur if each volume was copied by using FlashCopy
independently.
To ensure that data integrity is preserved when volumes are related to each other, complete
the following steps:
1. Ensure that your host is writing to the volumes as part of its daily activities. These volumes
become the source volumes in the FlashCopy mappings.
2. Identify the size and type (image, sequential, or striped) of each source volume. If any of
the source volumes is an image mode volume, you must know its size in bytes. If any of
the source volumes are sequential or striped mode volumes, their size, as reported by the
SAN Volume Controller GUI or SAN Volume Controller command line, is sufficient.
3. Create a target volume of the required size for each source that is identified in step 2. The
target volume can be an image, sequential, or striped mode volume. The only requirement
is that they must be the same size as their source volume. The target volume can be
cache-enabled or cache-disabled.
4. Define a FlashCopy consistency group. This consistency group is linked to each FlashCopy mapping that you define so that data integrity is preserved across the volumes.
5. Define a FlashCopy mapping for each source volume, making sure that you defined the
source disk and the target disk in the correct order. If you use any of your newly created
volumes as a source and the volume of the existing host as the target, you destroy the
data on the volume if you start the FlashCopy.
When the mapping is defined, link this mapping to the FlashCopy consistency group that
you defined in the previous step.
As part of defining the mapping, you can specify the copy rate of 0 - 100. The copy rate
determines how quickly the SAN Volume Controller copies the source volumes to the
target volumes. When you set the copy rate to 0 (NOCOPY), SAN Volume Controller
copies only the blocks that changed on any volume since the consistency group was
started on the source volume or the target volume (if the target volume is mounted
read/write to a host).
6. Prepare the FlashCopy consistency group. This preparation process can take several
minutes to complete because it forces the SAN Volume Controller to flush any outstanding
write I/Os that belong to the volumes in the consistency group to the disk of the storage
controller. After the preparation process completes, the consistency group has a Prepared
status, and all source volumes behave as though they were cache-disabled volumes until
the consistency group is started or deleted.
You can perform step 1 through step 6 while the host that owns the source volumes is performing its typical daily duties (that is, no downtime). During the prepare step (which can take several minutes), you might experience a delay in I/O throughput because the cache on the volumes is temporarily disabled.
More latency: If you create a FlashCopy mapping where the source volume is a target
volume of an active Metro Mirror relationship, this mapping adds latency to that existing
Metro Mirror relationship. It also can affect the host that is using the source volume of
that Metro Mirror relationship as a result. The reason for the added latency is that the
preparation process of the FlashCopy consistency group disables the cache on all
source volumes, which might be target volumes of a Metro Mirror relationship.
Therefore, all write I/Os from the Metro Mirror relationship must commit to the storage
controller before the complete status is returned to the host.
7. After the consistency group is prepared, quiesce the host by forcing the host and the
application to stop I/Os and to flush any outstanding write I/Os to disk. This process differs
for each application and for each operating system. One way to quiesce the host is to stop
the application and unmount the volumes from the host.
You must perform this step when the application I/O is stopped (or suspended). However,
steps 8 and 9 complete quickly and application unavailability is minimal.
8. When the host completes its flushing, start the consistency group. The FlashCopy start
completes quickly (at most, in a few seconds).
9. After the consistency group starts, unquiesce your application (or mount the volumes and
start the application), at which point the cache is re-enabled. FlashCopy continues to run
in the background and preserves the data that existed on the volumes when the
consistency group was started.
The target FlashCopy volumes can now be assigned to another host and used for read or
write, even though the FlashCopy processes were not completed.
Hint: Consider a situation where you intend to use any target volumes on the same host as
their source volume at the same time that the source volume is visible to that host. In this
case, you might need to perform more preparation steps to enable the host to access
volumes that are identical.
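A minimal CLI sketch of steps 4 through 6 and step 8, with hypothetical volume and consistency group names:

IBM_2145:svccg8:admin>svctask mkfcconsistgrp -name dbsnap
IBM_2145:svccg8:admin>svctask mkfcmap -source db_vol1 -target db_vol1_tgt -consistgrp dbsnap -copyrate 0
IBM_2145:svccg8:admin>svctask mkfcmap -source db_vol2 -target db_vol2_tgt -consistgrp dbsnap -copyrate 0
IBM_2145:svccg8:admin>svctask prestartfcconsistgrp dbsnap
(quiesce the host, then start the consistency group)
IBM_2145:svccg8:admin>svctask startfcconsistgrp dbsnap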
If you intend to keep the target so that you can use it as part of a quick recovery process, you
might choose one of the following options:
Create the FlashCopy mapping with the NOCOPY option initially. If the target is used and
migrated into production, you can change the copy rate at the appropriate time to the
appropriate rate to copy all the data to the target disk. When the copy completes, you can
delete the FlashCopy mapping and delete the source volume, freeing the space.
Create the FlashCopy mapping with a low copy rate. The use of a low rate might enable
the copy to complete without affecting your storage controller, which leaves bandwidth
available for production work. If the target is used and migrated into production, you can
change the copy rate to a higher value at the appropriate time to ensure that all data is
copied to the target disk. After the copy completes, you can delete the source, which frees
the space.
Create the FlashCopy with a high copy rate. Although this copy rate might add more I/O
burden to your storage controller, it ensures that you get a complete copy of the source
disk as quickly as possible.
By using the target on a different storage pool, which, in turn, uses a different array or
controller, you reduce your window of risk if the storage that provides the source disk
becomes unavailable.
With multiple target FlashCopy, you can now use a combination of these methods. For
example, you can use the NOCOPY rate for an hourly snapshot of a volume with a daily
FlashCopy that uses a high copy rate.
9. Change the masking on the LUNs on the old storage controller so that the SAN Volume
Controller is now the only user of the LUNs. You can change this masking one LUN at a
time. This way, you can discover them (in the next step) one at a time and not mix up any
LUNs.
10.Run the svctask detectmdisk command to discover the LUNs as MDisks. Then, run the
svctask chmdisk command to give the LUNs a more meaningful name.
11.Define a volume from each LUN and note its exact size (to the number of bytes) by
running the svcinfo lsvdisk command.
12.Define a FlashCopy mapping and start the FlashCopy mapping for each volume by
following the steps that are described in 6.8.1, Making a FlashCopy volume with
application data integrity on page 147.
13.Assign the target volumes to the hosts and then restart your hosts. Your host sees the
original data with the exception that the storage is now an IBM SAN Volume Controller
LUN.
You now have a copy of the existing storage and the SAN Volume Controller is not configured
to write to the original storage. Therefore, if you encounter any problems with these steps, you
can reverse everything that you did, assign the old storage back to the host, and continue
without the SAN Volume Controller.
By using FlashCopy, any incoming writes go to the new storage subsystem and any read
requests that were not copied to the new subsystem automatically come from the old
subsystem (the FlashCopy source).
You can alter the FlashCopy copy rate (as appropriate) to ensure that all the data is copied to
the new controller.
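For example, the copy rate of an existing mapping can be raised at any time with the chfcmap command (the mapping name is illustrative):

IBM_2145:svccg8:admin>svctask chfcmap -copyrate 50 fcmap_lun01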
After FlashCopy completes, you can delete the FlashCopy mappings and the source
volumes. After all the LUNs are migrated across to the new storage controller, you can
remove the old storage controller from the SAN Volume Controller node zones and then,
optionally, remove the old storage controller from the SAN fabric.
You can also use this process if you want to migrate to a new storage controller and not keep
the SAN Volume Controller after the migration. In step 2 on page 153, make sure that you
create LUNs that are the same size as the original LUNs. Then, in step 11, use image mode
volumes. When the FlashCopy mappings are completed, you can shut down the hosts and
map the storage directly to them, remove the SAN Volume Controller, and continue on the
new storage controller.
The target volume must be the same size as the source volume. However, the target
volume can be a different type (image, striped, or sequential mode) or have different cache
settings (cache-enabled or cache-disabled).
If you stop a FlashCopy mapping or a consistency group before it is completed, you lose
access to the target volumes. If the target volumes are mapped to hosts, they have I/O errors.
A volume cannot be a source in one FlashCopy mapping and a target in another
FlashCopy mapping.
A volume can be the source for up to 256 targets.
Starting with SAN Volume Controller V6.2.0.0, you can create a FlashCopy mapping by using
a target volume that is part of a remote copy relationship. This way, you can use the reverse
feature with a disaster recovery implementation. You can also use fast failback from a
consistent copy that is held on a FlashCopy target volume at the auxiliary cluster to the
master copy.
6.8.10 IBM System Storage Support for Microsoft Volume Shadow Copy
Service
The SAN Volume Controller provides support for the Microsoft Volume Shadow Copy Service
and Virtual Disk Service. The Microsoft Volume Shadow Copy Service can provide a
point-in-time (shadow) copy of a Windows host volume when the volume is mounted and files
are in use. The Microsoft Virtual Disk Service provides a single vendor and
technology-neutral interface for managing block storage virtualization, whether done by
operating system software, RAID storage hardware, or other storage virtualization engines.
The following components are used to provide support for the service:
SAN Volume Controller
The cluster Common Information Model (CIM) server
IBM System Storage hardware provider, which is known as the IBM System Storage
Support, for Microsoft Volume Shadow Copy Service and Virtual Disk Service software
Microsoft Volume Shadow Copy Service
The VMware vSphere Web Services (when running on a VMware virtual platform)
The IBM System Storage hardware provider is installed on the Windows host. To provide the
point-in-time shadow copy, the components complete the following process:
1. A backup application on the Windows host starts a snapshot backup.
2. The Volume Shadow Copy Service notifies the IBM System Storage hardware provider
that a copy is needed.
3. The SAN Volume Controller prepares the volumes for a snapshot.
4. The Volume Shadow Copy Service quiesces the software applications that are writing
data on the host and flushes file system buffers to prepare for the copy.
5. The SAN Volume Controller creates the shadow copy by using the FlashCopy Copy
Service.
6. The Volume Shadow Copy Service notifies the writing applications that I/O operations can
resume and notifies the backup application that the backup was successful.
The Volume Shadow Copy Service maintains a free pool of volumes for use as a FlashCopy
target and a reserved pool of volumes. These pools are implemented as virtual host systems
on the SAN Volume Controller.
For more information about how to implement and work with IBM System Storage Support for
Microsoft Volume Shadow Copy Service, see Implementing the IBM System Storage SAN
Volume Controller V6.3, SG24-7933.
Chapter 7.
Background write synchronization and resynchronization write I/O across the ICL in the background to synchronize source volumes with target mirrored volumes on a remote cluster. This concept is also referred to as a background copy.
Foreground I/O reads and writes data on the local SAN; each foreground write generates a mirrored foreground write I/O that is sent across the ICL and the remote SAN.
When you consider a remote copy solution, you must consider each of these processes and
the traffic that they generate on the SAN and ICL. You must understand how much traffic the
SAN can take (without disruption) and how much traffic your application and copy services
processes generate.
Successful implementation depends on taking a holistic approach in which you consider all
components and their associated properties. The components and properties include host
application sensitivity, local and remote SAN configurations, local and remote cluster and
storage configuration, and the ICL.
Performance loss: A performance loss in the foreground write I/O is a result of ICL
latency.
Asynchronous remote copy (Global Mirror)
A foreground write I/O is acknowledged as complete to the local host application before
the mirrored foreground write I/O is cached at the remote cluster. Mirrored foreground
writes are processed asynchronously at the remote cluster, but in a committed sequential
order as determined and managed by the Global Mirror remote copy process.
Performance loss: Performance loss in the foreground write I/O is minimized by
adopting an asynchronous policy to run a mirrored foreground write I/O. The effect of
ICL latency is reduced. However, a small increase occurs in processing foreground
write I/O because it passes through the remote copy component of the SAN Volume Controller's software stack.
Global Mirror Change Volume
Holds earlier consistent revisions of data when changes are made. A change volume can
be created for the master volume and the auxiliary volume of the relationship.
Figure 7-1 shows some of the concepts of remote copy.
Link latency is the time that is taken by data to move across a network from one location to
another and is measured in milliseconds. The longer the time, the greater the performance
impact.
Link bandwidth is the network capacity to move data as measured in millions of bits per
second (Mbps) or billions of bits per second (Gbps).
The term bandwidth is also used in the following context:
Storage bandwidth: The ability of the back-end storage to process I/O. Measures the
amount of data (in bytes) that can be sent in a specified amount of time.
Global Mirror Partnership Bandwidth (parameter): The rate at which background write
synchronization is attempted (unit of MBps).
IP Replication
Before V7.2, remote copy services between remote SAN Volume Controller/Storwize storage systems had to use FC network connections. With V7.2, users can configure remote replication by using a 1 Gb Ethernet connection without FCIP routers. That is, the SAN Volume Controller/Storwize V7000 now offers native IP Replication.
This feature supports all remote copy modes with the normal remote copy license. IP replication in V7.2 virtualization software includes Bridgeworks SANSlide network optimization technology, which enhances the parallelism of data transfer by using multiple virtual connections (VCs), and thereby improves WAN connection use. These virtual connections share the same IP link and addresses; while one virtual connection waits for an acknowledgment, more packets are sent across the other virtual connections. For more information about SANSlide technology, see IBM Storwize V7000 and SANSlide Implementation, REDP-5023, which is available at this website:
http://www.redbooks.ibm.com/redpapers/pdfs/redp5023.pdf
Figure 7-2 shows a single physical link between the two systems, with the primary volume at site 1 and the secondary volume at site 2.
Figure 7-3 on page 163 shows a single physical link in a system with two I/O groups. With this configuration, the remote copy port group on each system includes four IP addresses. If node H1 fails, the session between H1 and M2 fails, and the system automatically establishes another session between H2, H3, or H4 and M1, M2, M3, or M4.
The figure shows nodes H1 - H6 (with the primary volume at site 1) and M1 - M6 (with the secondary volume at site 2). I/O is forwarded from H6 to H1 on the local system and from M4 to M6 on the remote system.
Note: For systems with more than two I/O groups, there is a maximum of four IP addresses that can be configured; therefore, only four nodes can participate in the remote copy port group.
Dual physical links with all ports active and no standby ports
With this configuration, there is no redundancy in case of node failure, and only half of the bandwidth remains available for replication. If a node fails, Global Mirror or Metro Mirror replication does not operate properly.
Figure 7-4 shows dual physical links with all ports active.
Note: It is recommended that you plan for node failure situations. In single I/O group systems, use two ports of the local node and two ports of the remote node in the same remote copy port group, as shown in Figure 7-4 on page 163.
7.2.2 Dual physical links with active/standby for use in two or more I/O groups
environments
Each remote copy port group on each system includes two IP addresses. When the port
group is initially configured, the system establishes the pairings that are used.
If H1 node fails, the session between H1 and M2 fails and the system automatically
establishes another session between H3 and M2 or M4 because they are all in the same
remote copy port group with H1.
Figure 7-5 shows dual physical links with active/standby.
In the figure, when failover occurs, I/O is forwarded from M1 or M2 to M3.
that are used to transmit data. Make sure that these ports are opened in the firewall to configure IP replication.
Do not mix iSCSI host I/O and IP partnership traffic. The recommendation is to use
different ports for each.
IP replication uses CPU resources. When the first compressed volume is enabled in an I/O group, CPU cores are reallocated. If you choose to create an IP partnership on a Storwize V7000 system that has compressed volumes and the expected throughput is more than 100 MBps on the intersite link, it is recommended to configure ports for the IP partnership in I/O groups that do not contain compressed volumes. For more information about the CPU allocation when compression is used, see Chapter 17, IBM Real-time Compression on page 593.
IP replication limitations
IP Replication features the following limitations:
1 Gb and 10 Gb ports cannot be mixed in the same port group.
If IPv6 is used for IP replication, the management IPs on both systems must have IPv6 addresses that have connectivity with each other.
If IPv4 is used for IP replication, the management IPs on both systems must have IPv4 addresses that have connectivity with each other.
You can have only two direct-attach links (no fewer and no more), and both must be on the same I/O group.
NAT (Network Address Translation) between systems that are configured in an IP partnership group is not supported.
Figure 7-6 shows the supported and unsupported configurations for multiple cluster mirroring.
Improved support for Metro Mirror and Global Mirror relationships and
consistency groups
With SAN Volume Controller V5.1, the number of Metro Mirror and Global Mirror remote copy relationships that can be supported increases from 1024 to 8192. This increase provides improved scalability for increased data protection and greater flexibility so that you can fully use the new Multiple Cluster Mirroring capabilities.
Consistency groups: You can create up to 256 consistency groups, and all 8192
relationships can be in a single consistency group, if required.
Zoning considerations
The zoning requirements were revised, as described in 7.4, Intercluster link on page 181.
For more information, see Nodes in Metro or Global Mirror Inter-cluster Partnerships May
Reboot if the Inter-cluster Link Becomes Overloaded, S1003634, which is available at this
website:
https://www.ibm.com/support/docview.wss?uid=ssg1S1003634
In the figure, F denotes FlashCopy, M denotes Metro Mirror, and G denotes Global Mirror. In release 6.1 and before, you could not remote copy (Global or Metro Mirror) a FlashCopy target. Therefore, you could take a FlashCopy of a remote copy secondary to protect consistency when resynchronizing, or to record an important state of the disk, but you could not copy it back to volume B without deleting the remote copy and then recreating it, which means that everything must be copied to volume A again.
If corruption occurs on source volume A, or if the relationship stops and becomes inconsistent, you might want to recover from the last incremental FlashCopy that was taken. Unfortunately, with SAN Volume Controller versions before 6.2, recovering means destroying the Metro Mirror or Global Mirror relationship. The remote copy cannot be running when a FlashCopy process changes the state of the volume; if both processes were running concurrently, a volume might be subject to simultaneous data changes.
Destruction of the Metro Mirror and Global Mirror relationship means that a complete
background copy is required before the relationship is again in a consistent-synchronized
state. In this case, the host applications are unprotected for an extended period.
With the release of 6.2, the relationship does not need to be destroyed, and a
consistent-synchronized state can be achieved more quickly. That is, host applications are
unprotected for a reduced period.
Remote copy: SAN Volume Controller supports the ability to make a FlashCopy of a Metro Mirror or Global Mirror source or target volume. That is, volumes in remote copy relationships can act as source volumes of a FlashCopy relationship. However, when you prepare a FlashCopy mapping, the SAN Volume Controller puts the source volumes in a temporary cache-disabled state. This temporary state adds latency to the remote copy relationship because I/Os that are normally committed to the SAN Volume Controller cache must now be destaged directly to the back-end storage controller.
To change the layer of a Storwize V7000, the system must meet the following prerequisites:
The Storwize V7000 must not have any defined host objects and must not be presenting
any volumes to a SAN Volume Controller as MDisks.
The Storwize V7000 must not be visible to any other SAN Volume Controller or Storwize
V7000 in the SAN fabric (this might require SAN zoning changes).
Changing a Storwize V7000 from Storage layer to Replication layer can be performed only by
using the CLI. After you confirm that these prerequisites are met, run the following command:
chsystem -layer replication
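After the command completes, the layer field of the lssystem output reflects the change, as in the following abbreviated sketch:

IBM_Storwize:V7000:admin>svctask chsystem -layer replication
IBM_Storwize:V7000:admin>svcinfo lssystem
...
layer replication
...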
Figure 7-9 shows an example of a possible replication configuration. SAN Volume Controller uses a V7000 as a back-end storage controller and replicates to a different V7000 (SAN Volume Controller = replication layer, back-end V7000 = storage layer, remote V7000 = replication layer).
Figure 7-10 on page 171 shows an example of replication between two Storwize V7000 systems in the replication layer, where a Storwize V3700 is the back-end storage.
Figure 7-11 shows an example of replication between two Storwize V7000 systems when both of them are in the storage layer.
Figure 7-11 Replication example that uses Storwize V7000 at the storage layer
bandwidth
relationship_bandwidth_limit
gmlinktolerance
gm_max_hostdelay
The Global Mirror partnership bandwidth parameter specifies the rate (in MBps) at which the
background write resynchronization processes are attempted. That is, it specifies the total
bandwidth that the processes use.
With SAN Volume Controller V5.1.0, the rate of background write resynchronization can also be controlled at the volume relationship level by using the relationship_bandwidth_limit parameter. Unlike its co-parameter, this parameter has a default value of 25 MBps. The parameter defines, at a cluster-wide level, the maximum rate at which background write resynchronization of an individual source-to-target volume relationship is attempted. Background write resynchronization proceeds at the lower of the two applicable limits.
Background write resynchronization: The term background write resynchronization,
when used with SAN Volume Controller, is also referred to as Global Mirror Background
copy in this book and in other IBM publications.
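A sketch of setting both parameters follows; the partnership name and the rates are illustrative:

IBM_2145:svccg8:admin>svctask chpartnership -bandwidth 200 remote_cluster
IBM_2145:svccg8:admin>svctask chsystem -relationshipbandwidthlimit 25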
Asynchronous Global Mirror adds overhead to foreground write I/O, and it requires a dedicated portion of the intercluster link bandwidth to function. Controlling this overhead is critical to foreground write I/O performance and is achieved by using the gmlinktolerance parameter.
This parameter defines the amount of time that Global Mirror processes can run on a poorly
performing link without adversely affecting foreground write I/O. By setting the
gmlinktolerance time limit parameter, you define a safety valve that suspends Global Mirror
processes so that foreground application write activity continues at acceptable performance
levels.
When you create a Global Mirror Partnership, the default limit of 300 seconds (5 minutes) is
used, but you can adjust this limit. The parameter can also be set to 0, which effectively turns off
the safety valve, meaning that a poorly performing link might adversely affect foreground write
I/O.
The gmlinktolerance parameter does not define what constitutes a poorly performing link. It
also does not explicitly define the latency that is acceptable for host applications.
With the release of V5.1.0, you define what constitutes a poorly performing link by using the
gmmaxhostdelay parameter. With this parameter, you can specify the maximum allowable
overhead increase in processing foreground write I/O (in milliseconds) that is attributed to the
effect of running Global Mirror processes. This threshold value defines the maximum
allowable impact that Global Mirror operations can add to the response times of foreground
writes on Global Mirror source volumes. You can use the parameter to increase the threshold
limit from its default value of 5 milliseconds. If this threshold limit is exceeded, the link is
considered to be performing poorly, and the gmlinktolerance parameter becomes a factor.
The Global Mirror link tolerance timer starts counting down.
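Both thresholds are set clusterwide with the chsystem command; the following sketch restores the default values of 300 seconds and 5 milliseconds:

IBM_2145:svccg8:admin>svctask chsystem -gmlinktolerance 300
IBM_2145:svccg8:admin>svctask chsystem -gmmaxhostdelay 5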
Asynchronous writes: Writes to the target volume are made asynchronously. The host that writes to the source volume receives confirmation that the write is complete before the I/O completes on the target volume.
Figure 7-13 Write I/O to volumes that are not in remote copy relationships
Figure 7-14 shows write I/O to volumes in a remote copy relationship, with a master volume and an auxiliary volume.
With Global Mirror, write completion is confirmed to the host server before the write completes at the auxiliary volume. When a write is sent to a master volume, it is assigned a sequence number. Mirrored writes that are sent to the auxiliary volume are committed in sequence number order. If a write is issued while another write is outstanding, it might be given the same sequence number.
This function maintains a consistent image at the auxiliary volume at all times. It identifies sets of I/Os that are active concurrently at the primary volume, assigns an order to those sets, and applies these sets of I/Os in the assigned order at the auxiliary volume. Further writes might be received from a host while the secondary write is still active for the same block. In this case, although the primary write might complete, the new host write on the auxiliary volume is delayed until the previous write is completed.
A need still exists for master writes to be serialized, and the intermediate states of the master data must be kept in a non-volatile journal while the writes are outstanding to maintain the correct write ordering during reconstruction. Reconstruction must never overwrite data on the auxiliary with an earlier version. The colliding writes that are recorded by volume statistics monitoring are now limited to those writes that are not affected by this change.
Figure 7-15 shows a colliding write sequence.
The following numbers correspond to the numbers that are shown in Figure 7-15:
1. A first write is performed from the host to LBA X.
2. A host is provided acknowledgment that the write is complete, even though the mirrored
write to the auxiliary volume is not yet completed.
The first two actions (1 and 2) occur asynchronously with the first write.
3. A second write is performed from the host to LBA X. If this write occurs before the host
receives acknowledgment (2), the write is written to the journal file.
4. A host is provided acknowledgment that the second write is complete.
Link speed
The speed of a communication link (link speed) determines how much data can be
transported and how long the transmission takes. The faster the link is, the more data can be
transferred within an amount of time.
Latency
Latency is the time that is taken by data to move across a network from one location to another location and is measured in milliseconds. The longer the time is, the greater the performance impact. Latency depends on the speed of light (c = 3 x 10^8 m/s), which corresponds to about 3.3 microseconds per kilometer in a vacuum (a microsecond is one millionth of a second). The bits of data travel at about two-thirds of the speed of light in an optical fiber cable, or roughly 5 microseconds per kilometer.
However, some latency is added when packets are processed by switches and routers and are then forwarded to their destination. Although the speed of light might seem infinitely fast, latency becomes a noticeable factor over continental and global distances. Distance has a direct relationship with latency. Speed-of-light propagation dictates about one millisecond of latency for every 100 miles. For some synchronous remote copy solutions, even a few milliseconds of additional delay can be unacceptable. Latency is a difficult challenge because, unlike bandwidth, spending more money for higher speeds does not reduce latency.
Tip: A SCSI write over FC requires two round trips per I/O operation, as shown in the following example:
2 (round trips) x 2 (operations) x 5 microsec/km = 20 microsec/km
At 50 km, you have the following additional latency:
20 microsec/km x 50 km = 1000 microsec = 1 msec (msec represents millisecond)
Each SCSI I/O has 1 msec of additional service time. At 100 km, it becomes 2 msec of additional service time.
Bandwidth
Regarding FC networks, bandwidth is the network capacity to move data as measured in
millions of bits per second (Mbps) or billions of bits per second (Gbps). In storage terms,
bandwidth measures the amount of data that can be sent in a specified amount of time.
Storage applications issue read and write requests to storage devices. These requests are
satisfied at a certain speed that is commonly called the data rate. Usually disk and tape
device data rates are measured in bytes per unit of time and not in bits.
Most modern technology storage device LUNs or volumes can manage sequential sustained
data rates in the order of 10 MBps to 80 - 90 MBps. Some manage higher rates.
For example, an application writes to disk at 80 MBps. If you consider a conversion ratio of 1 MB to 10 Mb (which is reasonable because it accounts for protocol overhead), the data rate is 800 Mbps.
Always check and make sure that you correctly correlate MBps to Mbps.
Attention: When you set up a Global Mirror partnership, the -bandwidth parameter of the mkpartnership command does not refer to the general bandwidth characteristics of the links between the local and remote clusters. Instead, this parameter refers to the background copy (or write resynchronization) rate that, as determined by the client, the ICL can sustain.
The figure shows a master volume and an auxiliary volume with the copy direction and the primary and secondary roles before and after the direction of the relationship is reversed.
Attention: When the direction of the relationship is changed, the roles of the volumes are
altered. A consequence is that the read/write properties are also changed, meaning that
the master volume takes on a secondary role and becomes read-only.
Redundancy
The ICL must adopt the same policy toward redundancy as for the local and remote clusters
to which it is connecting. The ISLs must have redundancy, and the individual ISLs must
provide the necessary bandwidth in isolation.
7.4.3 Zoning
Zoning requirements were revised, as described in Nodes in Metro or Global Mirror
Inter-cluster Partnerships May Reboot if the Inter-cluster Link Becomes Overloaded,
S1003634, which is available at this website:
https://www.ibm.com/support/docview.wss?uid=ssg1S1003634
Although Multiple Cluster Mirroring is supported since SAN Volume Controller V5.1, it increases the potential to zone multiple clusters (nodes) together in unusable configurations. Therefore, do not use such configurations.
Abstract
SAN Volume Controller nodes in Metro Mirror or Global Mirror intercluster partnerships can
experience lease expiry reboot events if an ICL to a partner system becomes overloaded.
These reboot events can occur on all nodes simultaneously, which leads to a temporary loss
of host access to volumes.
Content
If an ICL becomes severely and abruptly overloaded, the local Fibre Channel fabric can become congested to the point that no FC ports on the local SAN Volume Controller nodes can perform local intracluster heartbeat communication. This situation can result in the nodes experiencing lease expiry events, in which a node reboots to attempt to re-establish communication with the other nodes in the system. If all nodes lease expire simultaneously, this situation can lead to a loss of host access to volumes during the reboot events.
Workaround
Default zoning for intercluster Metro Mirror and Global Mirror partnerships now ensures that, if
link-induced congestion occurs, only two of the four Fibre Channel ports on each node can be
subjected to this congestion. The remaining two ports on each node remain unaffected, and
therefore, can continue to perform intracluster heartbeat communication without interruption.
Adhere to the following revised guidelines for zoning:
For each node in a clustered system, zone only two Fibre Channel ports to two FC ports
from each node in the partner system. That is, for each system, you have two ports on
each SAN Volume Controller node that has only local zones (not remote zones).
If dual-redundant ISLs are available, split the two ports from each node evenly between
the two ISLs. For example, zone one port from each node across each ISL. Local system
zoning must continue to follow the standard requirement for all ports, on all nodes, in a
clustered system to be zoned to one another.
However, optical distance extension can be impractical in many cases because of cost or
unavailability.
SAN Volume Controller cluster links: Use distance extension only for links between SAN Volume Controller clusters. Do not use it for intracluster links. Technically, distance extension on intracluster links is supported for relatively short distances, such as a few kilometers (or miles). For more information about why this arrangement should not be used, see IBM System Storage SAN Volume Controller Restrictions, S1003903.
Cluster 1 / Cluster 2   Two nodes   Four nodes   Six nodes   Eight nodes
Two nodes               2.6         4.0          5.4         6.7
Four nodes              4.0         5.5          7.1         8.6
Six nodes               5.4         7.1          8.8         10.5
Eight nodes             6.7         8.6          10.5        12.4
These numbers represent the total traffic between the two clusters when no I/O is occurring to
a mirrored volume on the remote cluster. Half of the data is sent by one cluster, and half of the
data is sent by the other cluster. The traffic is divided evenly over all available ICLs. Therefore,
if you have two redundant links, half of this traffic is sent over each link during fault-free
operation.
If the link between the sites is configured with redundancy to tolerate single failures, size the
link so that the bandwidth and latency statements continue to be accurate even during single
failure conditions.
Tip: A SCSI write over Fibre Channel requires two round trips per I/O operation, as shown in the following example:
2 (round trips) x 2 (operations) x 5 microsec/km = 20 microsec/km
At 50 km, you have the following additional latency:
20 microsec/km x 50 km = 1000 microsec = 1 msec (msec represents millisecond)
Each SCSI I/O has 1 msec of additional service time. At 100 km, it becomes 2 msec of additional service time.
The decibel (dB) is a convenient way to express the amount of signal loss or gain within a
system, or the amount of loss or gain that is caused by a single component of a system. When
signal power is lost, you never lose a fixed amount of power; you lose a proportion of it
(one half, one quarter, and so on), so the loss is not linear. This behavior makes it
difficult to total the lost power along a signal's path through the network if you measure
signal loss in watts.
For example, suppose that a signal loses half of its power through a bad connection and then
loses another quarter of its power on a bent cable. You cannot simply add (+) the two losses
to find the total; you must multiply (x) the remaining fractions, which makes calculating the
loss across a large network time-consuming and difficult. However, decibels are logarithmic,
so you can calculate the total loss or gain characteristics of a system by adding up the
individual dB values. If your signal gains 3 dB, the signal doubles in power; if your signal
loses 3 dB, the signal power is halved.
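For example, a 3 dB loss (one half of the power) followed by a 6 dB loss (one quarter of the
remaining power) totals 3 dB + 6 dB = 9 dB. The power fractions multiply to
1/2 x 1/4 = 1/8 of the original power, and 10 x log10(1/8) is approximately -9 dB, so the two
views agree.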
The decibel is a ratio of signal powers, so you must have a reference point. For example, you
can state, "There is a 5 dB drop over that connection," but you cannot state, "The signal is
5 dB at the connection." A decibel is not a measure of signal strength; it is a measure of
signal power loss or gain.
A decibel milliwatt (dBm) is a measure of signal strength, and people often confuse dBm with
dB. A dBm expresses the signal power in relation to 1 milliwatt: a signal power of 0 dBm is
1 milliwatt, 3 dBm is 2 milliwatts, 6 dBm is 4 milliwatts, and so on. The more negative the
dBm value, the closer the power level is to zero. Do not be misled by the minus signs; they
have nothing to do with signal direction.
A good link has a small rate of frame loss. A retransmission occurs whenever a frame is lost,
which directly affects performance. SAN Volume Controller aims to tolerate retransmission
rates of 0.2 or 0.1.
7.4.10 Hops
The hop count is not increased by the intersite connection architecture. For example, if your
SAN extension is based on DWDM, the DWDM components are transparent to the hop count. The hop
count limit within a fabric is set by the operating system of the fabric devices (switches or
directors) and is used to derive a frame hold time value for each fabric device. This hold
time is the maximum amount of time that a frame can be held in a switch before the frame is
dropped or a fabric-busy condition is returned. For example, a frame might be held if its
destination port is unavailable. The hold time is derived from a formula that uses the error
detect timeout value and the resource allocation timeout value.
For more information about fabric values, see IBM TotalStorage: SAN Product, Design, and
Optimization Guide, SG24-6384. If these times become excessive, the fabric experiences
undesirable timeouts. Every extra hop is considered to add about 1.2 microseconds of latency
to the transmission.
Currently, SAN Volume Controller remote copy services support three hops when protocol
conversion exists. Therefore, if you have DWDM extension between the primary and secondary
sites, three SAN directors or switches can exist between the primary and secondary SAN Volume
Controller clusters.
You need at least 50 buffers to allow for nonstop transmission at 100 km distance. The
maximum distance that can be achieved at full performance depends on the capabilities of
the FC node that is attached at either end of the link extenders, which is vendor-specific. A
match should occur between the buffer credit capability of the nodes at either end of the
extenders.
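A common approximation shows where the 50-buffer figure comes from: a full-size FC frame
(about 2 KB) takes roughly 20 microseconds to serialize at 1 Gbps, and the round trip for
100 km is 2 x 100 km x 5 microsec/km = 1000 microseconds. Keeping the link full therefore
requires approximately 1000 / 20 = 50 outstanding frames, that is, 50 buffer credits. The
requirement scales with the link speed: about 100 credits at 2 Gbps and 200 credits at
4 Gbps.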
A host bus adapter (HBA), with a buffer credit of 64 that communicates with a switch port that
has only eight buffer credits, can read at full performance over a greater distance than it can
write. The reason is that the HBA can send a maximum of only eight buffers to the switch port
on the writes; however, the switch can send up to 64 buffers to the HBA on the reads.
Write consistency is maintained for remote copy. This way, when the primary VDisk and the
secondary VDisk are synchronized, the VDisks stay synchronized even if a failure occurs in
the primary cluster, or if other failures cause the results of writes to be uncertain.
Each node of the remote cluster has a fixed pool of Global Mirror system resources for each
node of the primary cluster. That is, each remote node has a separate queue for I/O from
each of the primary nodes. This queue is a fixed size and is the same size for every node.
If preferred nodes for the volumes of the remote cluster are set so that every combination of
primary node and secondary node is used, Global Mirror performance is maximized.
Figure 7-17 shows an example of Global Mirror resources that are not optimized. All volumes
with a preferred node of node 1 on the local cluster are replicated to target volumes on the
remote cluster that also have a preferred node of node 1.
With this configuration, the resources on remote cluster node 1 that are reserved for local
cluster node 2 are not used. Similarly, the resources on remote cluster node 2 that are
reserved for local cluster node 1 are not used.
If the configuration is changed to the one that is shown in Figure 7-18, all Global Mirror
resources for each node are used, and SAN Volume Controller Global Mirror operates with
better performance than in the previous configuration.
If the capabilities of this hardware are exceeded, the system becomes backlogged and the
hosts receive higher latencies on their write I/O. Remote copy in Metro Mirror and Global
Mirror implements a protection mechanism that detects this condition and halts mirrored
foreground write and background copy I/O. Suspending this type of I/O traffic ensures that
misconfiguration or hardware problems (or both) do not affect host application availability.
Global Mirror attempts to detect and differentiate backlogs that are caused by the operation
of the Global Mirror protocol itself from the general delays of a heavily loaded system,
where a host might see high latency even if Global Mirror were disabled.
To detect these specific scenarios, Global Mirror measures the time that is taken to perform
the messaging that assigns and records the sequence number for a write I/O. If this time
exceeds the expected average over a 10-second period, that period is treated as overloaded.
Global Mirror uses the maxhostdelay and gmlinktolerance parameters to monitor Global
Mirror protocol backlogs in the following ways:
Users set the maxhostdelay and gmlinktolerance parameters to control how the software
responds to these delays. The maxhostdelay parameter is a value in milliseconds that can
be set up to 100.
Every 10 seconds, Global Mirror samples all of the Global Mirror writes and determines
how much delay it added. If more than half of these writes are delayed by more than the
maxhostdelay setting, that sample period is marked as bad.
The software keeps a running count of bad periods. Each time a bad period occurs, this count
goes up by one. Each time a good period occurs, this count goes down by one, to a minimum
value of 0.
If the link remains overloaded for consecutive periods whose total duration exceeds the
gmlinktolerance value (in seconds), a 1920 error (or another Global Mirror error code) is
recorded against the volume that used the most Global Mirror resources over recent time.
A period without overload decrements the count of consecutive periods of overload.
Therefore, an error log entry is also raised if, over any period, the amount of time in
overload exceeds the amount of non-overloaded time by the gmlinktolerance value.
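With the default gmlinktolerance of 300 seconds and 10-second sample periods, roughly 30 more
bad periods than good periods must accumulate before a 1920 error is posted. A minimal CLI
sketch of inspecting and adjusting these parameters, assuming the V6.x/V7.x chsystem syntax
(the values that are shown are examples only):
svcinfo lssystem                        # shows gm_link_tolerance and gm_max_host_delay
svctask chsystem -gmlinktolerance 300   # seconds
svctask chsystem -gmmaxhostdelay 5      # milliseconds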
Edge case
The worst possible situation is reached by setting the gm_max_host_delay and gmlinktolerance
parameters to their minimum values (1 ms and 20 s).
With these settings, only two consecutive bad sample periods are needed before a 1920 error
is reported. Consider a light foreground write I/O load; for example, a single write I/O
during the 20 seconds. With unlucky timing, a single bad I/O (that is, a write I/O that
spends over 1 ms in remote copy) can span the boundary of two 10-second sample periods. This
single bad I/O theoretically can be counted as two bad periods and trigger a 1920 error.
A higher gmlinktolerance value, a higher gm_max_host_delay setting, or a heavier I/O load
reduces the risk of encountering this edge case.
The target volume must be the same size as the source volume. However, the target
volume can be a different type (image, striped, or sequential mode) or have different
cache settings (cache-enabled or cache-disabled).
When you use SAN Volume Controller Global Mirror, ensure that all components in the
SAN switches, remote links, and storage controllers can sustain the workload that is
generated by application hosts or foreground I/O on the primary cluster. They must also
sustain workload that is generated by the following remote copy processes:
Mirrored foreground writes
Background copy (background write resynchronization)
Intercluster heartbeat messaging
You must set the bandwidth parameter of the partnership (which controls the background copy
rate) to a value that is appropriate to the link and to the secondary back-end storage.
Do not use cache-disabled volumes in a Global Mirror relationship.
Use a SAN performance monitoring tool, such as IBM Tivoli Storage Productivity Center,
to continuously monitor the SAN components for error conditions and performance
problems.
Have IBM Tivoli Storage Productivity Center alert you when a performance problem
occurs or if a Global Mirror (or Metro Mirror) link is automatically suspended by SAN
Volume Controller. A remote copy relationship that remains stopped without intervention
can severely affect your recovery point objective. Also, restarting a link that was
suspended for a long time can add burden to your links while the synchronization catches
up.
Set the gmlinktolerance parameter of the remote copy partnership to an appropriate
value. The default value of 300 seconds (5 minutes) is appropriate for most clients.
If you plan to perform SAN maintenance that might affect SAN Volume Controller Global
Mirror relationships, complete the following tasks:
Select a maintenance window where application I/O workload is reduced during the
maintenance.
Disable the gmlinktolerance feature or increase the gmlinktolerance value, accepting
that application hosts might see extended response times from Global Mirror volumes.
Stop the Global Mirror relationships.
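A hedged sketch of these steps from the CLI; the consistency group and relationship names are
placeholders, and setting gmlinktolerance to 0 disables the feature:
svctask chsystem -gmlinktolerance 0     # or increase the value instead
svctask stoprcconsistgrp gm_cg1
svctask stoprcrelationship gm_rel1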
Bandwidth analysis and capacity planning for your links helps to define how many links you
need and when you need to add more links to ensure the best possible performance and high
availability. As part of your implementation project, you can identify and then distribute hot
spots across your configuration, or take other actions to manage and balance the load.
You must consider the following areas:
If your bandwidth is too small, you might see an increase in the response time of your
applications at times of high workload.
The speed of light is less than 300,000 km/s in a vacuum, and roughly 200,000 km/s
(200 km per millisecond) in fiber optic cable. The data must travel to the other site, and an
acknowledgment must come back. Add the latency of any active components along the way, and
you get approximately 1 ms of overhead per 100 km for write I/Os.
Metro Mirror adds this link-distance latency to the time of each write operation.
Determine whether your current SAN Volume Controller cluster or clusters can handle the
extra load.
Problems are not always related to the remote copy services or the ICL; they can instead be
caused by hot spots on the disk subsystems. Be sure to resolve these problems. Also ask
whether your auxiliary storage can handle the added workload that it receives: it is
essentially the same back-end write workload that is generated by the primary applications.
Attention: If you do not use the -sync option, all of these steps are redundant because
the SAN Volume Controller performs a full initial synchronization anyway.
2. Stop each mirror relationship by using the -access option, which enables write access to
the target VDisks. You need this write access later.
3. Copy the source volume to the alternative media by using the dd command to copy the
contents of the volume to tape. Another option is to use your backup tool (for example,
IBM Tivoli Storage Manager) to make an image backup of the volume.
Change tracking: Although the source is being modified while you are copying the
image, the SAN Volume Controller is tracking those changes. The image that you
create might have some of the changes and is likely to also miss some of the changes.
When the relationship is restarted, the SAN Volume Controller applies all of the
changes that occurred since the relationship stopped in step 2 on page 200. After all
the changes are applied, you have a consistent target image.
4. Ship your media to the remote site and apply the contents to the targets of the Metro
Mirror or Global Mirror relationship. You can mount the Metro Mirror and Global Mirror
target volumes to a UNIX server and use the dd command to copy the contents of the tape
to the target volume.
If you used your backup tool to make an image of the volume, follow the instructions for
your tool to restore the image to the target volume. Remember to remove the mount if the
host is temporary.
Tip: It does not matter how long it takes to get your media to the remote site and
perform this step. However, the faster you can get the media to the remote site and load
it, the quicker SAN Volume Controller starts running and maintaining the Metro Mirror
and Global Mirror.
5. Unmount the target volumes from your host. When you start the Metro Mirror and Global
Mirror relationship later, the SAN Volume Controller stops write access to the volume while
the mirror relationship is running.
6. Start your Metro Mirror and Global Mirror relationships. While the mirror relationship
catches up, the target volume is not usable at all. When it reaches Consistent Copying
status, your remote volume is ready for use in a disaster.
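The following commands sketch steps 2, 3, 4, and 6; the relationship, disk, and tape device
names are hypothetical, and whether the -clean flag applies depends on how the relationship
was created (see the discussion of the -sync and -clean flags later in this chapter):
svctask stoprcrelationship -access mm_rel1    # step 2: stop and enable write access to the target
dd if=/dev/hdisk10 of=/dev/rmt0 bs=256k       # step 3: image the source volume to tape
dd if=/dev/rmt0 of=/dev/hdisk22 bs=256k       # step 4: restore the image to the target (remote site)
svctask startrcrelationship -clean mm_rel1    # step 6: restart the relationship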
Object names: SAN Volume Controller V6.1 supports object names up to 63 characters.
Previous levels supported only up to 15 characters. When SAN Volume Controller V6.1
clusters are partnered with V4.3.1 and V5.1.0 clusters, various object names are truncated
at 15 characters when displayed from V4.3.1 and V5.1.0 clusters.
Using a star topology, you can migrate applications by using the following process:
1. Suspend the application at cluster A.
2. Remove the A-B relationship.
3. Create the A-C relationship (or alternatively, the B-C relationship).
4. Synchronize to cluster C, and ensure that the A-C relationship is established.
(The partnership sets that are involved are the fully connected mesh A-B, A-C, A-D, B-C,
B-D, and C-D, and the triangle A-B, A-C, and B-C.)
By using the cluster-star topology, you can migrate different applications at different times by
using the following process:
1. Suspend the application at data center A.
2. Take down the A-B data center relationship.
3. Create an A-C data center relationship (or a B-C data center relationship).
4. Synchronize to data center C and ensure that the A-C data center relationship is
established.
Migrating different applications over a series of weekends provides a phased migration
capability.
Attention: Create this configuration only if relationships are needed between every pair of
clusters. Restrict intercluster zoning only to where it is necessary.
Although clusters can have up to three partnerships, volumes can be part of only one remote
copy relationship; for example, A-B.
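A partnership is defined from both clusters before relationships are created; a brief sketch,
with hypothetical cluster names and an example background copy bandwidth in MBps:
svctask mkpartnership -bandwidth 200 clusterB   # run on cluster A
svctask mkpartnership -bandwidth 200 clusterA   # run on cluster B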
Important: The SAN Volume Controller supports copy services between only two clusters.
In Figure 7-25, the primary site uses SAN Volume Controller copy services (Global Mirror or
Metro Mirror) to replicate to the secondary site. Therefore, if a disaster occurs at the
primary site, the storage administrator enables access to the target volumes at the
secondary site, and the business application continues processing.
While the business continues processing at the secondary site, the storage controller copy
services replicate to the third site.
Tracking and applying the changes: Although the source is modified when you copy the
image, the SAN Volume Controller is tracking those changes. The image that you create
might include part of the changes and is likely to miss part of the changes.
When the relationship is restarted, the SAN Volume Controller applies all changes that
occurred since the relationship stopped in step 1. After all the changes are applied, you
have a consistent target image.
3. Ship your media to the remote site and apply the contents to the targets of the Metro
Mirror or Global Mirror relationship. You can mount the Metro Mirror and Global Mirror
target volumes to a UNIX server, and use the dd command to copy the contents of the tape
to the target volume. If you used your backup tool to make an image of the volume, follow
the instructions for your tool to restore the image to the target volume. Remember to
remove the mount if this host is temporary.
Tip: It does not matter how long it takes to get your media to the remote site and
perform this step. However, the faster you can get the media to the remote site and load
it, the quicker SAN Volume Controller starts running and maintaining the Metro Mirror
and Global Mirror.
4. Unmount the target volumes from your host. When you start the Metro Mirror and Global
Mirror relationship later, the SAN Volume Controller stops write access to the volume
when the mirror relationship is running.
5. Start your Metro Mirror and Global Mirror relationships. While the mirror relationship
catches up, the target volume is unusable. When it reaches the Consistent Copying status,
your remote volume is ready for use in a disaster.
If clusters are at the same code level, the partnership is supported. If clusters are at different
code levels, check the compatibility according to the table in Figure 7-26 by completing the
following steps:
1. Select the higher code level from the column on the left side of the table.
2. Select the partner cluster code level from the row on the top of the table.
Figure 7-26 also shows intercluster Metro Mirror and Global Mirror compatibility.
If all clusters are running software V5.1 or later, each cluster can be partnered with up to
three other clusters, which supports Multicluster Mirroring. If a cluster is running a
software level earlier than V5.1, it can be partnered with only one other cluster.
Figure 7-27 shows a Metro Mirror or Global Mirror and FlashCopy relationship before SAN
Volume Controller V6.2.
Figure 7-27 Metro Mirror or Global Mirror and FlashCopy relationship before SAN Volume Controller V6.2
Figure 7-28 shows a Metro Mirror or Global Mirror and FlashCopy relationship with SAN
Volume Controller V6.2.
Figure 7-28 Metro Mirror or Global Mirror and FlashCopy relationships with SAN Volume Controller V6.2
In this method, the administrator must ensure that the source and target volumes contain
identical data before the relationship is created. There are two ways to ensure that the
source and target volumes contain identical data:
Both volumes are created with the security delete (-fmtdisk) feature so that all data is
zeroed.
A complete tape image (or another method of moving data) is copied from the source volume
to the target volume before the Global Mirror relationship is started. With this technique,
do not allow I/O on the source or target before the relationship is established.
Then, the administrator must run the following commands:
Create the Global Mirror relationship by running the mkrcrelationship command with the
-sync flag.
Start the new relationship by running the startrcrelationship command with the -clean flag.
Attention: If you do not correctly perform these steps, Global Mirror can report the
relationship as consistent when it is not, which creates a data loss or data integrity
exposure for hosts that access the data on the auxiliary volume.
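A minimal sketch of these two commands, with hypothetical volume, cluster, and relationship
names:
svctask mkrcrelationship -master src_vdisk -aux tgt_vdisk -cluster clusterB -global -sync -name gm_rel1
svctask startrcrelationship -clean gm_rel1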
Stop or Error
When a remote copy relationship is stopped (intentionally or because of an error), a state
transition is applied. For example, the Metro Mirror relationships in the
ConsistentSynchronized state enter the ConsistentStopped state. The Metro Mirror
relationships in the InconsistentCopying state enter the InconsistentStopped state. If the
connection is broken between the SAN Volume Controller clusters in a partnership, all
intercluster Metro Mirror relationships enter a Disconnected state.
Be careful when you restart relationships that are in the Idling state because auxiliary
volumes in this state can process read and write I/O. If an auxiliary volume is written to
while in the Idling state, the state of the relationship is implicitly altered to
inconsistent. When you restart the relationship, you must change the direction of the
relationship if you want to preserve any write I/Os that occurred on the auxiliary volume.
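To preserve the writes that occurred on the auxiliary volume, the copy direction can be
reversed when the relationship is restarted. A sketch with a hypothetical relationship name;
the -force flag might be required, depending on the consistency state:
svctask startrcrelationship -primary aux -force mm_rel1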
7.9.2 Disaster recovery and Metro Mirror and Global Mirror states
A secondary (target) volume does not contain data that is useful for disaster recovery
purposes until the background copy completes. Until this point, all new write I/O since the
relationship started is processed through the background copy processes. As such, it is
subject to the sequencing and ordering of the Metro Mirror and Global Mirror internal
processes, which differ from the real-world ordering of the application.
When the background copy completes, the relationship enters the ConsistentSynchronized
state. In this state, all new write I/O is replicated as it is received from the host. The
primary and secondary volumes differ only in regions where writes from the host are
outstanding.
In this state, the target volume is also available in read-only mode. As shown in Figure 7-29
on page 212, from the ConsistentSynchronized state, a relationship can enter either of the
following states:
ConsistentStopped (the state that is entered when a 1920 error is posted)
Idling
In the Idling state, both the source and target volumes have a common point-in-time
consistent state, and both are made available in read/write mode. Write available means that
both volumes can service host applications, but any additional write to a volume in this
state causes the relationship to become inconsistent.
Tip: Moving on from this point usually involves a period of inconsistent copying and,
therefore, a loss of redundancy. Errors that occur in this state are more critical because an
inconsistent stopped volume does not provide a known consistent level of redundancy, and it
is not available for read or write access.
A 1920 error can result for many reasons. The condition might be the result of a temporary
failure, such as maintenance on the ICL, unexpectedly higher foreground host I/O workload,
or a permanent error because of a hardware failure. It is also possible that not all
relationships are affected and that multiple 1920 errors can be posted.
To debug, you must obtain information from the following components to ascertain their health
at the point of failure:
Switch logs (confirmation of the state of the link at the point of failure)
Storage logs
System configuration information from the master and auxiliary clusters for SAN Volume
Controller (by using the snap command), including the following types:
I/O stats logs, if available
Live dumps, if they were triggered at the point of failure
Tivoli Storage Productivity Center statistics (if available)
Important: Contact IBM Level 2 Support for assistance in collecting log information for
1920 errors. IBM Support personnel can provide collection scripts that you can use during
problem recreation or that you can deploy during proof-of-concept activities.
Also collect the following details about the intercluster link:
Technology
Bandwidth
Distance on all links (which can take multiple paths for redundancy)
Whether the link is dedicated or shared; if shared, which resources share it and the
amount of those resources that they use
Whether switch write acceleration is used (check with IBM for compatibility or known
limitations)
Specific workloads at the time of the 1920 errors, which might or might not be relevant,
depending on when the 1920 errors occur and which VDisks are involved
RAID rebuilds
Whether the 1920 errors are associated with workload peaks or scheduled backups
Intercluster link
For diagnostic purposes, ask the following questions about the ICL:
Was link maintenance being performed?
Consider the hardware or software maintenance that is associated with ICL; for example,
updating firmware or adding more capacity.
Is the ICL overloaded?
You can find indications of this situation through statistical analysis with I/O stats,
Tivoli Storage Productivity Center, or both, by examining the internode communications, the
storage controller performance, or both. By using Tivoli Storage Productivity Center, check
the storage metrics for the period before the Global Mirror relationships were stopped, which
can be tens of minutes, depending on the gmlinktolerance parameter.
Diagnose the overloaded link by using the following methods:
High response time for internode communication
An overloaded long-distance link causes high response times in the internode
messages that are sent by SAN Volume Controller. If delays persist, the messaging
protocols exhaust their tolerance elasticity and the Global Mirror protocol is forced to
delay handling new foreground writes while waiting for resources to free up.
Storage metrics (before the 1920 error is posted):
Target volume write throughput is greater than the source volume write throughput.
If this condition exists, it suggests a high level of background copy in addition to the
mirrored foreground write I/O. In this circumstance, decrease the background copy rate
parameter of the Global Mirror partnership to bring the combined mirrored foreground I/O
and background copy I/O rates back within the remote link's bandwidth.
Source volume write throughput after the Global Mirror relationships were stopped.
If write throughput increases greatly (by 30% or more) after the Global Mirror
relationships are stopped, the application host was attempting to perform more I/O
than the remote link can sustain.
When the Global Mirror relationships are active, the overloaded remote link causes
higher response times to the application host, which, in turn, decreases the
throughput of application host I/O at the source volume. After the Global Mirror
relationships stop, the application host I/O sees a lower response time, and the true
write throughput returns.
To resolve this issue, increase the remote link bandwidth, reduce the application
host I/O, or reduce the number of Global Mirror relationships.
Storage controllers
Investigate the primary and remote storage controllers, starting at the remote site. If the
back-end storage at the secondary cluster is overloaded, or another problem is affecting the
cache there, the Global Mirror protocol fails to keep up. The problem similarly exhausts the
(gmlinktolerance) elasticity and has a similar effect at the primary cluster.
In this situation, ask the following questions:
Are the storage controllers at the remote cluster overloaded (performing slowly)?
Use Tivoli Storage Productivity Center to obtain the back-end write response time for each
MDisk at the remote cluster. A response time for any individual MDisk that exhibits a
sudden increase of 50 ms or more, or that is higher than 100 ms, generally indicates a
problem with the back end.
Tip: Any MDisk on the remote back-end storage controller that provides poor response
times can be the underlying cause of a 1920 error. For example, the poor response prevents
application I/O from proceeding at the rate that is required by the application host, the
gmlinktolerance threshold is exceeded, and the 1920 error is posted.
However, if you followed the specified back-end storage controller requirements and were
running without problems until recently, the error is most likely caused by a decrease in
controller performance because of maintenance actions or a hardware failure of the
controller. Check whether an error condition is on the storage controller, for example,
media errors, a failed physical disk, or a recovery activity, such as RAID array rebuilding
that uses more bandwidth.
If an error occurs, fix the problem, and then restart the Global Mirror relationships.
If no error occurs, consider whether the secondary controller can process the required
level of application host I/O. You might improve the performance of the controller in the
following ways:
Adding more or faster physical disks to a RAID array
Changing the RAID level of the array
Changing the cache settings of the controller and checking that the cache batteries are
healthy, if applicable
Changing other controller-specific configuration parameters
Are the storage controllers at the primary site overloaded?
Analyze the performance of the primary back-end storage by using the same steps that
you use for the remote back-end storage. The main effect of bad performance is to limit
the amount of I/O that can be performed by application hosts. Therefore, you must monitor
back-end storage at the primary site regardless of Global Mirror.
However, if bad performance continues for a prolonged period, a false 1920 error might be
flagged. For example, the algorithms that assess the effect of running Global Mirror might
incorrectly interpret slow foreground write activity (and the slow background write activity
that is associated with it) as a consequence of running Global Mirror. The Global Mirror
relationships are then stopped.
7.10.3 Recovery
After a 1920 error occurs, the Global Mirror auxiliary VDisks are no longer in the
ConsistentSynchronized state. You must establish the cause of the problem and fix it before
you restart the relationship. When the relationship is restarted, you must resynchronize it.
During this period, the data on the Metro Mirror or Global Mirror auxiliary VDisks on the
secondary cluster is inconsistent, and your applications cannot use the VDisks as backup
disks.
Tip: If the relationship stopped in a consistent state, you can use the data on the auxiliary
volume at the remote cluster as backup. Creating a FlashCopy of this volume before you
restart the relationship gives more data protection. The FlashCopy volume that is created
maintains the current, consistent, image until the Metro Mirror or Global Mirror relationship
is synchronized again and back in a consistent state.
To ensure that the system can handle the background copy load, you might want to delay
restarting the Metro Mirror or Global Mirror relationship until a quiet period occurs. If the
required link capacity is unavailable, you might experience another 1920 error, and the Metro
Mirror or Global Mirror relationship stops in an inconsistent state.
If your VDisk or MDisk configuration changed, restart your Tivoli Storage Productivity Center
performance report to ensure that performance is correctly monitored for the new
configuration.
If you are using Tivoli Storage Productivity Center, monitor the following information:
Global Mirror secondary write lag
You monitor the Global Mirror secondary write lag to identify mirror delays.
Port-to-remote node send response
Time must be less than 80 ms (the maximum latency that is supported by SAN Volume
Controller Global Mirror). A number in excess of 80 ms suggests that the long-distance
link has excessive latency, which must be rectified. One possibility to investigate is that the
link is operating at maximum bandwidth.
Sum of Port-to-local node send response time and Port-to-local node send queue
The time must be less than 1 ms for the primary cluster. A number in excess of 1 ms might
indicate that an I/O group is reaching its I/O throughput limit, which can limit performance.
CPU utilization percentage
CPU utilization must be below 50%.
Sum of Back-end write response time and Write queue time for Global Mirror MDisks at
the remote cluster
The time must be less than 100 ms. A longer response time can indicate that the storage
controller is overloaded. If the response time for a specific storage controller is outside of
its specified operating range, investigate for the same reason.
Sum of Back-end write response time and Write queue time for Global Mirror MDisks at
the primary cluster
Time must also be less than 100 ms. If response time is greater than 100 ms, the
application hosts might see extended response times if the cache of the SAN Volume
Controller becomes full.
Write data rate for Global Mirror managed disk groups at the remote cluster
This data rate indicates the amount of data that is being written by Global Mirror. If this
number approaches the ICL bandwidth or the storage controller throughput limit, further
increases can cause overloading of the system. Therefore, monitor this number
appropriately.
Hints and tips for Tivoli Storage Productivity Center statistics collection
Analysis requires Tivoli Storage Productivity Center statistics (CSV) or SAN Volume
Controller raw statistics (XML). You can export statistics from your Tivoli Storage
Productivity Center instance. Because these files become large quickly, you can limit their
size; for example, filter the statistics files so that individual records that are below a
certain threshold are not exported.
Default naming convention: IBM Support has several automated systems that support
analysis of Tivoli Storage Productivity Center data. These systems rely on the default
naming conventions (file names) that are used. The default name for Tivoli Storage
Productivity Center files is StorageSubsystemPerformance ByXXXXXX.csv, where XXXXXX is
the I/O group, managed disk group, MDisk, node, or volume.
Chapter 8.
Hosts
This chapter describes several preferred practices for host systems that are attached to the
SAN Volume Controller. A host system is an Open Systems computer that is connected to the
switch through a Fibre Channel (FC) interface.
The host that is attached to the SAN Volume Controller is the most important element in
tuning, troubleshooting, and performance. You must consider the following areas for
performance:
The use of multipathing and bandwidth (physical capability of SAN and back-end storage)
Understanding how your host performs I/O and the types of I/O
The use of measurement and test tools to determine host performance and for tuning
This chapter supplements the IBM System Storage SAN Volume Controller V7.2 Information
Center and Guides, which are available at this website:
http://pic.dhe.ibm.com/infocenter/svc/ic/index.jsp
This chapter includes the following sections:
Configuration guidelines
Host pathing
I/O queues
Multipathing software
Host clustering and reserves
AIX hosts
Virtual I/O Server
Windows hosts
Linux hosts
Solaris hosts
VMware server
Mirroring considerations
Monitoring
We measured the effect of multipathing on performance, as shown in Table 8-1. As the table
shows, the differences in performance are generally small, but they can reduce performance by
almost 10% for specific workloads. These numbers were produced with an AIX host that runs
the IBM Subsystem Device Driver (SDD) against the SAN Volume Controller. The host was tuned
specifically for performance by adjusting queue depths and buffers.
We tested a range of reads and writes, random and sequential, cache hits and misses, at
transfer sizes of 512 bytes, 4 KB, and 64 KB.
Table 8-1 shows the effects of multipathing in IBM System Storage SAN Volume Controller
2145-8G4.
Table 8-1 Effect of multipathing on write performance
Read/write test       Four paths     Eight paths    Difference
                      81 877         74 909         -8.6%
                      60 510.4       57 567.1       -5.0%
                      130 445.3      124 547.9      -5.6%
                      1 810.8138     1 834.2696     1.3%
                      97 822.6       98 427.8       0.6%
                      1 674.5727     1 678.1815     0.2%
Although these measurements were taken with SAN Volume Controller 2145-8G4, hardware
and software performance does change release to release, and the figures that are shown in
Table 8-1 provide an example of the difference that multipathing can make.
id:name:SCSI_id:vdisk_id:vdisk_name:wwpn:vdisk_UID
2:host2:0:10:vdisk10:0000000000000ACA:6005076801958001500000000000000A
2:host2:1:11:vdisk11:0000000000000ACA:6005076801958001500000000000000B
2:host2:2:12:vdisk12:0000000000000ACA:6005076801958001500000000000000C
2:host2:3:13:vdisk13:0000000000000ACA:6005076801958001500000000000000D
2:host2:4:14:vdisk14:0000000000000ACA:6005076801958001500000000000000E
If you are using IBM multipathing software (SDD or SDDDSM), the datapath query device
command shows the vdisk_UID (unique identifier), which enables easier management of
volumes. The equivalent command for SDDPCM is the pcmpath query device command.
vdisk_UID
60050768018101BF28000000000000A8
60050768018101BF28000000000000A9
60050768018101BF28000000000000AA
60050768018101BF28000000000000AB
60050768018101BF28000000000000A7
60050768018101BF28000000000000B9
60050768018101BF28000000000000BA
60050768018101BF28000000000000B5
60050768018101BF28000000000000B1
60050768018101BF28000000000000B2
60050768018101BF28000000000000B3
60050768018101BF28000000000000B4
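To map a vdisk_UID that is reported on the host back to the volume on the SAN Volume
Controller, the UID can be used as a CLI filter. A sketch that assumes one of the UIDs above;
verify the filter attribute name against your code level:
svcinfo lsvdisk -filtervalue vdisk_UID=60050768018101BF28000000000000A8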
Example 8-5 shows the datapath query device output of this Windows host. The order of the
volumes of the two I/O groups is reversed from the hostmap: volume s-1-8-2 is first, followed
by the rest of the LUNs from the second I/O group, then volume s-0-6-4 and the rest of the
LUNs from the first I/O group. Most likely, Windows discovered the second set of LUNs first.
However, the relative order within an I/O group is maintained.
Example 8-5 Using datapath query device for the hostmap
DEV#:   0  DEVICE NAME: Disk1 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B5
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk1 Part0    OPEN    NORMAL         0        0
    1    Scsi Port2 Bus0/Disk1 Part0    OPEN    NORMAL      1342        0
    2    Scsi Port3 Bus0/Disk1 Part0    OPEN    NORMAL         0        0
    3    Scsi Port3 Bus0/Disk1 Part0    OPEN    NORMAL      1444        0

DEV#:   1  DEVICE NAME: Disk2 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B1
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk2 Part0    OPEN    NORMAL      1405        0
    1    Scsi Port2 Bus0/Disk2 Part0    OPEN    NORMAL         0        0
    2    Scsi Port3 Bus0/Disk2 Part0    OPEN    NORMAL      1387        0
    3    Scsi Port3 Bus0/Disk2 Part0    OPEN    NORMAL         0        0

DEV#:   2  DEVICE NAME: Disk3 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B2
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk3 Part0    OPEN    NORMAL      1398        0
    1    Scsi Port2 Bus0/Disk3 Part0    OPEN    NORMAL         0        0
    2    Scsi Port3 Bus0/Disk3 Part0    OPEN    NORMAL      1407        0
    3    Scsi Port3 Bus0/Disk3 Part0    OPEN    NORMAL         0        0

DEV#:   3  DEVICE NAME: Disk4 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B3
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk4 Part0    OPEN    NORMAL      1504        0
    1    Scsi Port2 Bus0/Disk4 Part0    OPEN    NORMAL         0        0
    2    Scsi Port3 Bus0/Disk4 Part0    OPEN    NORMAL      1281        0
    3    Scsi Port3 Bus0/Disk4 Part0    OPEN    NORMAL         0        0

DEV#:   4  DEVICE NAME: Disk5 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B4
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk5 Part0    OPEN    NORMAL         0        0
    1    Scsi Port2 Bus0/Disk5 Part0    OPEN    NORMAL      1399        0
    2    Scsi Port3 Bus0/Disk5 Part0    OPEN    NORMAL         0        0
    3    Scsi Port3 Bus0/Disk5 Part0    OPEN    NORMAL      1391        0

DEV#:   5  DEVICE NAME: Disk6 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A8
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk6 Part0    OPEN    NORMAL      1400        0
    1    Scsi Port2 Bus0/Disk6 Part0    OPEN    NORMAL         0        0
    2    Scsi Port3 Bus0/Disk6 Part0    OPEN    NORMAL      1390        0
    3    Scsi Port3 Bus0/Disk6 Part0    OPEN    NORMAL         0        0

DEV#:   6  DEVICE NAME: Disk7 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A9
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk7 Part0    OPEN    NORMAL      1379        0
    1    Scsi Port2 Bus0/Disk7 Part0    OPEN    NORMAL         0        0
    2    Scsi Port3 Bus0/Disk7 Part0    OPEN    NORMAL      1412        0
    3    Scsi Port3 Bus0/Disk7 Part0    OPEN    NORMAL         0        0

DEV#:   7  DEVICE NAME: Disk8 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000AA
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk8 Part0    OPEN    NORMAL         0        0
    1    Scsi Port2 Bus0/Disk8 Part0    OPEN    NORMAL      1417        0
    2    Scsi Port3 Bus0/Disk8 Part0    OPEN    NORMAL         0        0
    3    Scsi Port3 Bus0/Disk8 Part0    OPEN    NORMAL      1381        0

DEV#:   8  DEVICE NAME: Disk9 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000AB
============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk9 Part0    OPEN    NORMAL         0        0
    1    Scsi Port2 Bus0/Disk9 Part0    OPEN    NORMAL      1388        0
    2    Scsi Port3 Bus0/Disk9 Part0    OPEN    NORMAL         0        0
    3    Scsi Port3 Bus0/Disk9 Part0    OPEN    NORMAL      1413        0

DEV#:   9  DEVICE NAME: Disk10 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A7
=============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk10 Part0   OPEN    NORMAL      1293        0
    1    Scsi Port2 Bus0/Disk10 Part0   OPEN    NORMAL         0        0
    2    Scsi Port3 Bus0/Disk10 Part0   OPEN    NORMAL      1477        0
    3    Scsi Port3 Bus0/Disk10 Part0   OPEN    NORMAL         0        0

DEV#:  10  DEVICE NAME: Disk11 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B9
=============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk11 Part0   OPEN    NORMAL         0        0
    1    Scsi Port2 Bus0/Disk11 Part0   OPEN    NORMAL     59981        0
    2    Scsi Port3 Bus0/Disk11 Part0   OPEN    NORMAL         0        0
    3    Scsi Port3 Bus0/Disk11 Part0   OPEN    NORMAL     60179        0

DEV#:  11  DEVICE NAME: Disk12 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000BA
=============================================================================
Path#    Adapter/Hard Disk              State   Mode      Select   Errors
    0    Scsi Port2 Bus0/Disk12 Part0   OPEN    NORMAL     28324        0
    1    Scsi Port2 Bus0/Disk12 Part0   OPEN    NORMAL         0        0
    2    Scsi Port3 Bus0/Disk12 Part0   OPEN    NORMAL     27111        0
    3    Scsi Port3 Bus0/Disk12 Part0   OPEN    NORMAL         0        0
Sometimes, a host might discover everything correctly at the initial configuration, but it does
not keep up with the dynamic changes in the configuration. Therefore, the SCSI ID is
important. For more information, see 8.2.4, Dynamic reconfiguration on page 235.
Preferred node (owner)    Nonpreferred node    Delta
18,227                    21,256               3,029
Table 8-3 shows the change in throughput for 16 devices with a random 4 KB read-miss
workload when the preferred node is used versus a nonpreferred node (compare with Table 8-2).
Table 8-3 The 16 device random 4 Kb read miss throughput (input/output per second (IOPS))
Preferred node (owner)    Nonpreferred node    Delta
105,274.3                 90,292.3             14,982
Table 8-4 shows the effect of the use of the nonpreferred paths versus the preferred paths on
read performance.
Table 8-4 Random (1 TB) 4 Kb read response time (4.1 nodes, microseconds)
Preferred node (owner)    Nonpreferred node    Delta
5,074                     5,147                73
Table 8-5 shows the effect of the use of nonpreferred nodes on write performance.
Table 8-5 Random (1 TB) 4 Kb write response time (4.2 nodes, microseconds)
Preferred node (owner)    Nonpreferred node    Delta
5,346                     5,433                87
IBM SDD, SDDDSM, and SDDPCM software recognize the preferred nodes and use the
preferred paths.
Actively check the multipathing software's display of the paths that are available and
currently in use. Do this check periodically and just before any SAN maintenance or software
upgrades. With IBM multipathing software (SDD, SDDPCM, and SDDDSM), this monitoring is done
by using the datapath query device or pcmpath query device commands.
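For example, the following commands list the configured paths and their states (the -l option
on the pcmpath command additionally marks nonpreferred paths with an asterisk):
datapath query device        # SDD and SDDDSM
pcmpath query device -l      # SDDPCM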
Hosts do not dynamically reprobe storage unless they are prompted by an external change or
manually by the user, which causes rediscovery. Most operating systems do not notice a change
in a disk allocation automatically. Information is saved in a device database, such as the
Windows registry or the AIX Object Data Manager (ODM) database.
DEV#:   0  DEVICE NAME: Disk1 Part0  TYPE: 2145  POLICY: OPTIMIZED
SERIAL: 60050768018201BEE000000000000041
============================================================================
Path#    Adapter/Hard Disk              State   Mode       Select   Errors
    0    Scsi Port2 Bus0/Disk1 Part0    CLOSE   OFFLINE         0        0
    1    Scsi Port3 Bus0/Disk1 Part0    CLOSE   OFFLINE       263        0
Example 8-7 Datapath query device on an AIX host
The next time that a new volume is allocated and mapped to that host, the SCSI ID is reused
if it is allowed to default. The host can then confuse the new device with the old device
definition that is left over in the device database or system memory.
You can get two devices that use identical device definitions in the device database, such as
in Example 8-8. Both vpath189 and vpath190 have the same hdisk definitions, but they
contain different device serial numbers. The fscsi0/hdisk1654 path exists in both vpaths.
Example 8-8 vpath sample output
    3    fscsi1/hdisk1659     CLOSE   NORMAL          1        0

DEV#: 190  DEVICE NAME: vpath190  TYPE: 2145  POLICY: Optimized
SERIAL: 600507680000009E68000000000007F4
============================================================================
Path#    Adapter/Hard Disk    State   Mode       Select   Errors
    0    fscsi0/hdisk1654     OPEN    NORMAL          0        0
    1    fscsi0/hdisk1655     OPEN    NORMAL    6336260        0
    2    fscsi1/hdisk1658     OPEN    NORMAL          0        0
    3    fscsi1/hdisk1659     OPEN    NORMAL    6326954        0
The multipathing software (SDD) recognizes that a new device is available because it issues
an inquiry command at configuration time and reads the mode pages. However, if the user did
not remove the stale configuration data, the ODM entries for the old hdisks and vpaths remain
and confuse the host, because the SCSI ID-to-device serial number mapping changed.
To avoid this situation, remove the hdisk and vpath information from the device configuration
database before you map new devices to the host and run discovery, as shown by the
commands in the following example:
rmdev -dl vpath189
rmdev -dl hdisk1654
To reconfigure the volumes that are mapped to a host, remove the stale configuration and
restart the host.
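Discovery is then rerun with the standard AIX configuration manager, as shown in the
following example:
cfgmgr
lsdev -Cc disk    # verify the resulting hdisk configuration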
Another process that might cause host confusion is expanding a volume. The SAN Volume
Controller communicates the change to a host through the SCSI check condition "Mode
parameters changed". However, not all hosts can automatically discover the change, and they
might confuse LUNs or continue to use the old size.
For more information about supported hosts, see IBM System Storage SAN Volume
Controller V6.2.0 - Software Installation and Configuration Guide, GC27-2286.
The commands and actions on the host vary depending on the type of host and the
connection method that is used. This process must be completed on all hosts to which the
selected volumes are currently mapped.
You can also use the management GUI to move volumes between I/O groups nondisruptively.
In the management GUI, select Volumes > Volumes. In the Volumes panel, select the
volume that you want to move and select Actions > Move to Another I/O Group. The
wizard guides you through the steps for moving a volume to another I/O group, including any
changes to hosts that are required. For more information, click Need Help in the associated
management GUI panels.
In the following example, we move VDisk ndvm to another I/O group nondisruptively by using
Red Hat Enterprise Linux 6.5 (default kernel).
Example 8-9 shows the Red Hat Enterprise Linux 6.5 multipath configuration before the I/O
group migration. For this example, the Storwize V7000/SAN Volume Controller caching I/O group
is io_grp0.
Example 8-9 Native Linux multipath display before I/O group migration
3. Validate that the new paths are detected by Red Hat Enterprise Linux 6.5, as shown in
Example 8-11.
Example 8-11 Native Linux multipath display access to both I/O groups
The host I/O is converted to MDisk I/O as needed. The SAN Volume Controller submits I/O to
the back-end (MDisk) storage as any host does. The host allows user control of the queue
depth that is maintained on a disk; SAN Volume Controller controls the queue depth for MDisk
I/O without any user intervention. After SAN Volume Controller has Q I/Os outstanding for a
single MDisk (that is, it is waiting for Q I/Os to complete), it does not submit any more I/O
until some I/O completes. Any new I/O requests for that MDisk are queued inside SAN Volume
Controller.
Figure 8-1 shows the effect on host volume queue depth for a simple configuration of 32
volumes and one host.
Figure 8-1 IOPS compared to queue depth for 32 volumes tests on a single host in V4.3
Figure 8-2 shows queue depth sensitivity for 32 volumes on a single host.
Figure 8-2 MBps compared to queue depth for 32 volume tests on a single host in V4.3
Although these measurements were taken with V4.3 code, the effect that queue depth has on
performance is the same regardless of the SAN Volume Controller code version.
Persistent reserve refers to a set of SCSI-3 standard commands and command options that
provide SCSI initiators with the ability to establish, preempt, query, and reset a reservation
policy with a specified target device. The functionality that is provided by the persistent
reserve commands is a superset of the legacy reserve and release commands. The persistent
reserve commands are incompatible with the legacy reserve and release mechanism, and a target
device can support reservations from either the legacy mechanism or the new mechanism, but
not both. Attempting to mix persistent reserve commands with legacy reserve or release
commands results in the target device returning a reservation conflict error.
The legacy reserve and release mechanism (SCSI-2) reserves the entire LUN (volume) for
exclusive use down a single path. This approach prevents access from any other host, and even
access from the same host through a different host adapter.
The persistent reserve design establishes a method and interface through a reserve policy
attribute for SCSI disks. This design specifies the type of reservation (if any) that the
operating system device driver establishes before it accesses data on the disk.
The following possible values are supported for the reserve policy:
No_reserve: No reservations are used on the disk.
Single_path: Legacy reserve or release commands are used on the disk.
PR_exclusive: Persistent reservation is used to establish exclusive host access to the disk.
PR_shared: Persistent reservation is used to establish shared host access to the disk.
When a device is opened (for example, when the AIX varyonvg command opens the
underlying hdisks), the device driver checks the ODM for a reserve_policy and a
PR_key_value and then opens the device appropriately. For persistent reserve, each host
that is attached to the shared disk must use a unique registration key value.
Use the AIX errno.h include file to determine what error number 16 indicates. This error
indicates a busy condition, which can point to a legacy reserve or a persistent reserve from
another host (or from this host through a different adapter). However, some AIX technology
levels have a diagnostic open issue that prevents the pcmquerypr command from opening the
device to display the status or to clear a reserve.
For more information about older AIX technology levels that break the pcmquerypr command,
see IBM Multipath Subsystem Device Driver Path Control Module (PCM) Version 2.6.2.1
README FOR AIX, which is available at this website:
ftp://ftp.software.ibm.com/storage/subsystem/aix/2.6.2.1/sddpcm.readme.2.6.2.1.txt
Transaction-based settings
The host attachment script sets the default values of attributes for the SAN Volume Controller
hdisks: devices.fcp.disk.IBM.rte or devices.fcp.disk.IBM.mpio.rte. You can modify
these values as a starting point. In addition, you can use several HBA parameters to set
higher performance or large numbers of hdisk configurations.
You can change all attribute values that are changeable by using the chdev command for AIX.
AIX settings that can directly affect transaction performance are the queue_depth hdisk
attribute and num_cmd_elem attribute in the HBA attributes.
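The queue depth of an individual hdisk is changed with the chdev command in the following
form:
chdev -l hdiskX -a queue_depth=Y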
In this example, X is the hdisk number, and Y is the value to which you are setting X for
queue_depth.
For a high transaction workload of small random transfers, try a queue_depth value of 25 or
more. For large sequential workloads, performance is better with shallow queue depths, such
as a value of 4.
The AIX settings that can directly affect throughput performance with large I/O block size are
the lg_term_dma and max_xfer_size parameters for the fcs device.
Throughput-based settings
In the throughput-based environment, you might want to decrease the queue-depth setting to
a smaller value than the default from the host attach. In a mixed application environment, you
do not want to lower the num_cmd_elem setting because other logical drives might need this
higher value to perform. In a purely high throughput workload, this value has no effect.
Start values: For high throughput sequential I/O environments, use the start values
lg_term_dma = 0x400000 or 0x800000 (depending on the adapter type) and
max_xfer_size = 0x200000.
First, test your host with the default settings. Then, make these possible tuning changes to the
host parameters to verify whether these suggested changes enhance performance for your
specific host configuration and workload.
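As an illustration, the following commands apply the start values above to an FC adapter; the
adapter name is an example, and the -P flag defers the change until the next restart because
the adapter is typically in use:
chdev -l fcs0 -a lg_term_dma=0x800000 -a max_xfer_size=0x200000 -P
shutdown -Fr    # restart to apply the deferred attribute change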
You can increase this attribute to improve performance. You can change this attribute only
with AIX V5.2 or later.
Setting the max_xfer_size attribute affects the size of a memory area that is used for data
transfer by the adapter. With the default value of max_xfer_size=0x100000, the area is 16 MB
in size, and for other allowable values of the max_xfer_size attribute, the memory area is
128 MB in size.
8.6.3 Multipathing
When the AIX operating system was first developed, multipathing was not embedded within
the device drivers. Therefore, each path to a SAN Volume Controller volume was represented
by an AIX hdisk.
The SAN Volume Controller host attachment script devices.fcp.disk.ibm.rte sets up the
predefined attributes within the AIX database for SAN Volume Controller disks. These
attributes changed with each iteration of the host attachment and AIX technology levels. Both
SDD and Veritas DMP use the hdisks for multipathing control. The host attachment is also
used for other IBM storage devices. The host attachment allows AIX device driver
configuration methods to properly identify and configure SAN Volume Controller (2145), IBM
DS6000 (1750), and IBM System Storage DS8000 (2107) LUNs.
For more information about supported host attachments for SDD on AIX, see Host
Attachments for SDD on AIX, S4000106, which is available at this website:
http://www.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D410&q1=host+attac
hment&uid=ssg1S4000106&loc=en_US&cs=utf-8&lang=en
8.6.4 SDD
IBM Subsystem Device Driver multipathing software was designed and updated consistently
over the last decade and is a mature multipathing technology. The SDD software also
supports many other IBM storage types, such as the 2107, that are directly connected to AIX.
SDD algorithms for handling multipathing also evolved. Throttling mechanisms within SDD
controlled overall I/O bandwidth in SDD Releases 1.6.1.0 and earlier. This throttling
mechanism evolved to be single vpath-specific and is called qdepth_enable in later releases.
SDD uses the persistent reserve functions and places a persistent reserve on the device in
place of the legacy reserve when the volume group is varied on. However, if IBM High
Availability Cluster Multi-Processing (IBM HACMP) is installed, HACMP controls the persistent
reserve usage, depending on the type of varyon that is used. Also, enhanced concurrent volume
groups have no reserves: the varyonvg -c command is used for enhanced concurrent volume
groups, and varyonvg for regular volume groups that use the persistent reserve.
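A brief illustration of the two varyon forms, with a hypothetical volume group name:
varyonvg -c datavg    # enhanced concurrent volume group: no reserve is placed
varyonvg datavg       # regular volume group: the persistent reserve is used (with SDD)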
Datapath commands are a powerful method for managing SAN Volume Controller storage and
pathing. The output shows the LUN serial number of the SAN Volume Controller volume and which
vpath and hdisk represent that SAN Volume Controller LUN. Datapath commands can also change
the multipath selection algorithm, which is programmable; the default is load balancing. When
SDD is used, use load balancing with four paths. The datapath query device output then shows
a balanced number of selects on each preferred path to the SAN Volume Controller, as shown in
Example 8-15.
Example 8-15 Datapath query device output
8.6.5 SDDPCM
As Fibre Channel technologies matured, AIX was enhanced by adding native multipathing
support called multipath I/O (MPIO). By using the MPIO structure, a storage manufacturer
can create software plug-ins for their specific storage. The IBM SAN Volume Controller
version of this plug-in is called SDDPCM, which requires a host attachment script called
devices.fcp.disk.ibm.mpio.rte. For more information about SDDPCM, see Host
Attachment for SDDPCM on AIX, S4000203, which is available at this website:
http://www.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D410&q1=host+attac
hment&uid=ssg1S4000203&loc=en_US&cs=utf-8&lang=en
SDDPCM and AIX MPIO have been continually improved since their release. Stay at the
latest release levels of this software.
You do not see the preferred path indicator for SDDPCM until after the device is opened for
the first time. For SDD, you see the preferred path immediately after you configure it.
SDDPCM features the following types of reserve policies:
No_reserve policy
Exclusive host access single path policy
Persistent reserve exclusive host policy
Persistent reserve shared host access policy
Usage of the persistent reserve now depends on the hdisk attribute, reserve_policy. Change
this policy to match your storage security requirements.
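For example (a sketch that assumes a hypothetical hdisk4 that is not currently open), you can display and change the policy as follows:

lsattr -El hdisk4 -a reserve_policy
chdev -l hdisk4 -a reserve_policy=no_reserve

If the disk is in use, stage the change with the chdev -P flag and activate it at the next restart.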
The following path selection algorithms are available:
Failover
Round-robin
Load balancing
The SDDPCM code of 2.1.3.0 and later features improvements in failed path reclamation by a
health checker, a failback error recovery algorithm, FC dynamic device tracking, and support
for a SAN boot device on MPIO-supported storage devices.
SDDPCM pathing
SDDPCM pcmpath commands are the best way to understand configuration information about
the SAN Volume Controller storage allocation. Example 8-16 shows the amount of
information that can be determined from the pcmpath query device command about the
connections to the SAN Volume Controller from this host.
Example 8-16 The pcmpath query device command

DEV#: 0  DEVICE NAME: hdisk0  TYPE: 2145  ALGORITHM: Load Balance
SERIAL: 6005076801808101400000000000037B
======================================================================
Path#    Adapter/Path Name    State    Mode      Select    Errors
   0     fscsi0/path0         OPEN     NORMAL    155009         0
   1     fscsi1/path1         OPEN     NORMAL    155156         0
In this example, both paths are being used for the SAN Volume Controller connections. These
select counts are not normal for a properly mapped SAN Volume Controller volume, and
two paths are an insufficient number of paths. Use the -l option on the pcmpath query device
command to check whether these paths are both preferred paths. If they are preferred paths,
one SAN Volume Controller node must be missing from the host view.
The use of the -l option shows an asterisk on both paths, which indicates that a single node
is visible to the host (and is the nonpreferred node for this volume), as shown in the following
examples:
   0*    fscsi0/path0         OPEN     NORMAL      9795         0
   1*    fscsi1/path1         OPEN     NORMAL      9558         0
This information indicates a problem that must be corrected. If zoning in the switch is correct,
perhaps this host was rebooted when one SAN Volume Controller node was missing from the
fabric.
Veritas
Veritas DMP multipathing is also supported for the SAN Volume Controller. Veritas DMP
multipathing requires certain AIX APARS and the Veritas Array Support Library (ASL). It also
requires a certain version of the host attachment script devices.fcp.disk.ibm.rte to
recognize the 2145 devices as hdisks rather than MPIO hdisks. In addition to the normal
ODM databases that contain hdisk attributes, the following Veritas file sets contain
configuration data:
/dev/vx/dmp
/dev/vx/rdmp
/etc/vxX.info
Storage reconfiguration of volumes that are presented to an AIX host requires cleanup of the
AIX hdisks and these Veritas file sets.
For more information, see V6.2 Supported Hardware List, Device Driver, Firmware and
Recommended Software Levels for SAN Volume Controller, which is available at this website:
https://www.ibm.com/support/docview.wss?uid=ssg1S1003797#_VIOS
For more information about VIOS, see this website:
http://www14.software.ibm.com/webapp/set2/sas/f/vios/documentation/faq.html
One common question is how to migrate data into a virtual I/O environment or how to
reconfigure storage on a VIOS. This question is addressed at the previous web address.
Many clients want to know whether you can move SCSI LUNs between the physical and
virtual environment as is. That is, on a physical SCSI device (LUN) with user data on it that
is in a SAN environment, can this device be allocated to a VIOS and then provisioned to a
client partition and used by the client as is?
The answer is no. This function is not supported as of this writing. The device cannot be used
as is. Virtual SCSI devices are new devices when they are created. The data must be put on
them after creation, which often requires a type of backup of the data in the physical SAN
environment with a restoration of the data onto the volume.
Due in part to the differences in disk format, virtual I/O is supported for new disk installations
only.
AIX, virtual I/O, and SDD development are working on changes to make this migration easier
in the future. One enhancement is to use the UDID or IEEE method of disk identification. If
you use the UDID method, you can contact IBM technical support to find a migration method
that might not require restoration. A quick and simple method to determine whether a backup
and restoration is necessary is to read the PVID off the disk by running the following
command:
lquerypv -h /dev/hdisk## 80 10
If the output is different on the VIOS and virtual I/O client, you must use backup and restore.
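For example, with a hypothetical hdisk4, run the same command on the VIOS (from the root shell through oem_setup_env) and on the client partition, and compare the PVID that is displayed at offset 0x80:

lquerypv -h /dev/hdisk4 80 10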
8.8.4 Guidelines for disk alignment by using Windows with SAN Volume
Controller volumes
You can find the preferred settings for best performance with SAN Volume Controller when
you use Microsoft Windows operating systems and applications with a significant amount of
I/O. For more information, see Performance Recommendations for Disk Alignment using
Microsoft Windows at this website:
http://www.ibm.com/support/docview.wss?rs=591&context=STPVGU&context=STPVFV&q1=mic
rosoft&uid=ssg1S1003291&loc=en_US&cs=utf-8&lang=en
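As an illustrative sketch for older Windows versions only (Windows Server 2008 and later align new partitions automatically), diskpart can create a partition with a 64 KB offset:

diskpart
DISKPART> select disk 1
DISKPART> create partition primary align=64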
Certain types of clustering are now supported. However, the multipathing software choice is
tied to the type of cluster and HBA driver. For example, Veritas Storage Foundation is
supported for certain hardware and kernel combinations, but it also requires Veritas DMP
multipathing. Contact IBM marketing for SCORE/RPQ support if you need Linux clustering in
your specific environment and it is not listed.
New Linux operating systems support native DM-MPIO. An example configuration of
multipath.conf is available at this website:
http://www-01.ibm.com/support/knowledgecenter/STPVGU_7.3.0/com.ibm.storage.svc.con
sole.730.doc/svc_linux_settings.html?lang=en
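As a sketch only (the exact settings depend on the distribution and code level, so treat the linked settings page as authoritative), a multipath.conf device section for SAN Volume Controller volumes typically resembles the following example:

devices {
    device {
        vendor               "IBM"
        product              "2145"
        path_grouping_policy group_by_prio
        prio                 alua
        path_checker         tur
        failback             immediate
        no_path_retry        5
    }
}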
SAN Volume Controller V7.2 adds support for the VMware vStorage APIs for Array Integration
(VAAI). The SAN Volume Controller now implements storage-related tasks that were previously
performed by VMware, which helps improve efficiency and frees server resources for more
mission-critical tasks. The new functions include full copy, block zeroing, and
hardware-assisted locking.
If you are not using the new API functions, the minimum supported VMware level is V3.5.
If earlier versions are required, contact your IBM marketing representative and ask about the
submission of an RPQ for support. The required patches and procedures are supplied after
the specific configuration is reviewed and approved.
For more information about host attachment recommendations, see the Attachment
requirements for hosts running VMware operating systems topic in the IBM SAN Volume
Controller Version 7.2 Information Center at this website:
http://publib.boulder.ibm.com/infocenter/svc/ic/index.jsp?topic=/com.ibm.storage.s
vc.console.doc/svc_vmwrequiremnts_21layq.html
For more information about VMware and SAN Volume Controller, VMware storage, and
zoning recommendations, HBA settings, and attaching volumes to VMware, see
Implementing the IBM System Storage SAN Volume Controller V7.2, SG24-7933, which is
available at this website:
http://www.redbooks.ibm.com/abstracts/sg247933.html?Open
8.13 Monitoring
A consistent set of monitoring tools is available when IBM SDD, SDDDSM, and SDDPCM are
used for the multipathing software on the various operating system environments. You can
use the datapath query device and datapath query adapter commands for path monitoring.
You can also monitor path performance by using either of the following datapath commands:
datapath query devstats
pcmpath query devstats
The datapath query devstats command shows performance information for a single device,
all devices, or a range of devices. Example 8-21 on page 259 shows the output of the
datapath query devstats command for two devices.
Example 8-21 Output of the datapath query devstats command

Device #: 0
=============
             Total Read   Total Write   Active Read   Active Write   Maximum
I/O:            1755189       1749581             0              0         3
SECTOR:        14168026     153842715             0              0       256

Transfer Size:   <= 512        <= 4k       <= 16K        <= 64K      > 64K
                    271      2337858          104       1166537          0

Device #: 1
=============
             Total Read   Total Write   Active Read   Active Write   Maximum
I/O:           20353800       9883944             0              1         4
SECTOR:       162956588     451987840             0            128       256

Transfer Size:   <= 512        <= 4k       <= 16K        <= 64K      > 64K
                    296     27128331          215       3108902          0
Also, the adapter-level statistics command datapath query adaptstats is available (mapped
to the pcmpath query adaptstats command). Example 8-22 shows the use of two adapters.
Example 8-22 Output of the datapath query adaptstats command
Adapter #: 0
=============
             Total Read   Total Write   Active Read   Active Write   Maximum
I/O:           11060574       5936795             0              0         2
SECTOR:        88611927     317987806             0              0       256

Adapter #: 1
=============
             Total Read   Total Write   Active Read   Active Write   Maximum
I/O:           11048415       5930291             0              1         2
SECTOR:        88512687     317726325             0            128       256
You can clear these counters so that you can script the usage to cover a precise amount of
time. By using these commands, you can choose devices to return as a range, single device,
or all devices. To clear the counts, use the following command:
datapath clear device count
To prevent such loss-of-access events from occurring, many clients implement automated path
monitoring by using SDD commands and common system utilities. For example, a simple
command string, such as the following one, on a UNIX system can count the number of
dead paths:
datapath query device | grep -i dead | wc -l
You can combine this command with a scheduler, such as cron, and a notification system,
such as an email, to notify SAN administrators and system administrators if the number of
paths to the system changes.
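The following sketch (with a hypothetical script path and mail address) illustrates this approach: it counts dead paths and mails an alert when any are found.

#!/bin/sh
# Count SDD paths that are reported as dead and alert when any exist
DEAD=`datapath query device | grep -ci dead`
if [ "$DEAD" -gt 0 ]; then
    echo "$DEAD dead path(s) on `hostname`" | \
        mail -s "SAN Volume Controller path alert" storage-admins@example.com
fi

A crontab entry such as 0,15,30,45 * * * * /usr/local/bin/check_paths.sh runs the check every 15 minutes.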
Part 2
Performance preferred practices
This part highlights preferred practices for IBM System Storage SAN Volume Controller. It
includes the following chapters:
Chapter 9, Performance highlights for SAN Volume Controller V7.2 on page 263
Chapter 10, Back-end storage performance considerations on page 269
Chapter 11, IBM System Storage Easy Tier function on page 319
Chapter 12, Applications on page 339
Chapter 9. Performance highlights for SAN Volume Controller V7.2
Model  xSeries model  Processors                 Memory        FC ports and speed   Solid-state drives (SSDs)  iSCSI
4F2    x335           2 Xeon                     4 GB          4@2 Gbps             -                          -
8F2    x336           2 Xeon                     8 GB          4@2 Gbps             -                          -
8F4    x336           2 Xeon                     8 GB          4@4 Gbps             -                          -
8G4    x3550          2 Xeon 5160                8 GB          4@4 Gbps             -                          -
8A4    x3250M2        1 dual-core Xeon 3100      8 GB          4@4 Gbps             -                          -
CF8    x3550M2        1 quad-core Xeon E5500     24 GB         4@8 Gbps             Up to 4x 146 GB (a)        2x 1 Gbps
CG8    x3550M3        1 quad-core Xeon E5600     24 GB         4@8 Gbps             Up to 4x 146 GB (a)        2x 1 Gbps, 2x 10 Gbps (a)
CG8    x3550M3        1 hexa-core Xeon 5600,     24 GB, or     4@8 Gbps, or         Up to 4x 800 GB (a)        2x 1 Gbps, 2x 10 Gbps (a)
                      or 2 hexa-core Xeon        48 GB (b)     8@8 Gbps (a)
                      5600 (b)

a. Item is optional. In the CG8 model, a node can have one of the following components: SSDs, 10 Gbps
iSCSI, or HBA interfaces.
b. Recommended for compression environments.
In January 2012, an eight-node SAN Volume Controller cluster of model CG8 nodes that was
running code V6.2 delivered 520,043.99 SPC-1 IOPS. For more information about these benchmarks,
see the following resources:
SPC Benchmark 1 Full Disclosure report: IBM System Storage SAN Volume Controller
v6.2 with IBM Storwize V7000 DISK storage:
http://www.storageperformance.org/benchmark_results_files/SPC-1/IBM/A00113_IBM_
SVC-6.2_Storwize-V7000/a00113_IBM_SVC-v6.2_Storwize-V7000_SPC-1_full-disclosure
.pdf
SPC Benchmark 1 Full Disclosure Report: IBM System Storage SAN Volume Controller
V5.1 (6-node cluster with 2 IBM DS8700S):
http://www.storageperformance.org/benchmark_results_files/SPC-1/IBM/A00087_IBM_
DS8700_SVC-5.1-6node/a00087_IBM_DS8700_SVC5.1-6node_full-disclosure-r1.pdf
Also, see the following Storage Performance Council website for the latest published SAN
Volume Controller benchmarks:
http://www.storageperformance.org/home/
When you consider enterprise storage solutions, raw I/O performance is important, but it is
not the only consideration. To date, IBM has shipped more than 22,500 SAN Volume Controller
engines, which run in more than 7,200 SAN Volume Controller systems.
In 2008 and 2009, across the entire installed base, the SAN Volume Controller delivered
better than five nines (99.999%) availability. For more information about the SAN Volume
Controller, see the following IBM SAN Volume Controller website:
http://www.ibm.com/systems/storage/software/virtualization/svc
The goal of the test was to demonstrate the SAN Volume Controller capabilities of a total
configuration and per I/O group. The recorded results include 1,400,000, 1,000,000, and
1,350,000 IOPS for the total configuration, which correspond to 350,000, 250,000, and
337,500 IOPS per I/O group, respectively.
For more information, see IBM SAN Volume Controller and IBM FlashSystem 820: Best
Practices and Performance Capabilities, REDP-5027, which is available at this website:
http://www.redbooks.ibm.com/abstracts/redp5027.html
The available internal SSD RAID presets are RAID-0 (striped), RAID-1 (used with Easy Tier),
and RAID-10 (mirrored), typically configured with 4 - 8 drives that are equally distributed
among the nodes of the I/O group.
Carefully plan the distribution of your servers across your SAN Volume Controller I/O groups
and the volumes of one I/O group across its nodes. Reevaluate this distribution whenever you
attach another server to your SAN Volume Controller. Use the performance monitoring tool
that is described in 9.4, Real-Time Performance Monitor on page 268 to help with this task.
Check this display periodically for possible hot spots that might be developing in your SAN
Volume Controller environment. To view this window in the GUI, go to the home page, and
select Performance in the upper-left menu. The SAN Volume Controller GUI begins plotting
the charts. After a few moments, you can view the graphs.
Position your cursor over a particular point in a curve to see details, such as the actual value
and time for that point. SAN Volume Controller plots a new point every 5 seconds and it
shows you the last 5 minutes of data. You can also change the System Statistics setting in the
upper-left corner to see details for a particular node.
The SAN Volume Controller Performance Monitor does not store performance data for later
analysis. Instead, its display shows what happened in the last 5 minutes only. Although this
information can provide valuable input to help you diagnose a performance problem in real
time, it does not trigger performance alerts or provide the long-term trends that are required
for capacity planning. For those tasks, you need a tool, such as IBM Tivoli Storage
Productivity Center, to collect and store performance data for long periods and present you
with the corresponding reports. For more information about this tool, see Chapter 13,
Monitoring on page 357.
Chapter 10. Back-end storage performance considerations
This chapter covers the following topics:
Workload considerations
Tiering
Storage controller considerations
Array considerations
I/O ports, cache, and throughput considerations
SAN Volume Controller extent size
SAN Volume Controller cache partitioning
IBM DS8000 series considerations
IBM XIV considerations
Storwize V7000 considerations
DS5000 series considerations
Performance considerations with FlashSystems
10.2 Tiering
You can use the SAN Volume Controller to create tiers of storage, where each tier has
different performance characteristics, by including within a managed disk group only managed
disks (MDisks) that have the same performance characteristics. Therefore, if you have a
storage infrastructure with, for example, three classes of storage, you create each volume
from the managed disk group whose class of storage most closely matches the expected
performance characteristics of the volume.
Because migrating between storage pools (or managed disk groups) is nondisruptive to
users, it is easy to migrate a volume to another storage pool if the performance is different
than expected.
Tip: If you are uncertain about in which storage pool to create a volume, use the pool with
the lowest performance first and then move the volume up to a higher performing pool
later, if required.
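For example (with hypothetical volume and pool names), a volume can be moved nondisruptively by using the migratevdisk command; note that the source and target storage pools must have the same extent size:

IBM_2145:ITSO-CLS5:admin>svctask migratevdisk -vdisk DB_vol01 -mdiskgrp Pool_FAST -threads 4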
The following table lists the approximate I/O capacity per drive type:

Drive type     IOPS per drive
SSD            20,000
SAS 15K        180
FC 15K         160
SAS 10K        150
FC 10K         120
NL_SAS 7.2K    100
SATA 7.2K      80
The next parameter to consider when you calculate the I/O capacity of a RAID array is the
write penalty. Table 10-2 shows the write penalty for various RAID array types.
Table 10-2 RAID write penalty

RAID type   Number of sustained failures   Number of disks   Write penalty
RAID 5      1                              N+1               4
RAID 10     Minimum 1                      2xN               2
RAID 6      2                              N+2               6
RAID 5 and RAID 6 do not suffer from the write penalty if full stripe writes (also called
stride writes) are performed. In this case, the write penalty is 1.
With this information and the information about how many disks are in each array, you can
calculate the read and write I/O capacity of a particular array.
Table 10-3 shows the calculation for I/O capacity. In this example, the RAID array has
eight 15 K FC drives.
RAID type   Read IOPS           Write IOPS
RAID 5      7 x 160 = 1120      (8 x 160)/4 = 320
RAID 10     8 x 160 = 1280      (8 x 160)/2 = 640
RAID 6      6 x 160 = 960       (8 x 160)/6 = 213
In most of the current generation of storage subsystems, write operations are cached and
handled asynchronously, meaning that the write penalty is hidden from the user. However,
heavy and steady random writes can create a situation in which write cache destage is not
fast enough. In this situation, the speed of the array is limited to the speed that is defined
by the number of drives and the RAID array type. The numbers in Table 10-3 on page 273
cover the worst-case scenario and do not consider read or write cache efficiency.
Storage pool I/O capacity
If you are using a 1:1 LUN (SAN Volume Controller managed disk) to array mapping, the
array I/O capacity is already the I/O capacity of the managed disk. The I/O capacity of the
SAN Volume Controller storage pool is the sum of the I/O capacity of all managed disks in
that pool. For example, if you have 10 managed disks from the RAID arrays with
eight disks as used in the example, the storage pool has the I/O capacity as shown in
Table 10-4.
Table 10-4 Storage pool I/O capacity
RAID type   Read IOPS            Write IOPS
RAID 5      10 x 1120 = 11200    10 x 320 = 3200
RAID 10     10 x 1280 = 12800    10 x 640 = 6400
RAID 6      10 x 960 = 9600      10 x 213 = 2130
The I/O capacity of the RAID 5 storage pool ranges from 3200 IOPS when the workload
pattern at the RAID array level is 100% write to 11200 IOPS when the workload pattern is
100% read. This is the workload pattern that the SAN Volume Controller generates toward the
storage subsystem. Because of SAN Volume Controller cache usage, it is not necessarily the
same as the pattern from the host to the SAN Volume Controller.
If more than one managed disk (LUN) is used per array, each managed disk receives a
portion of the array I/O capacity. For example, you have two LUNs per 8-disk array and
only one of the managed disks from each array is used in the storage pool. Then, the 10
managed disks have the I/O capacity that is listed in Table 10-5.
Table 10-5 Storage pool I/O capacity with two LUNs per array
RAID type   Read IOPS              Write IOPS
RAID 5      10 x 1120/2 = 5600     10 x 320/2 = 1600
RAID 10     10 x 1280/2 = 6400     10 x 640/2 = 3200
RAID 6      10 x 960/2 = 4800      10 x 213/2 = 1065
The numbers in Table 10-5 on page 273 are valid if both LUNs on the array are evenly
used. However, if the second LUN on each array that participates in the storage pool is
idle, the storage pool can achieve the numbers that are shown in Table 10-4.
In an environment with two LUNs per array, the second LUN can also use the entire I/O
capacity of the array and cause the LUN that is used for the SAN Volume Controller storage
pool to get fewer available IOPS.
If the second LUN on those arrays is also used for the SAN Volume Controller storage
pool, the cumulative I/O capacity of two storage pools in this case equals one storage pool
with one LUN per array.
Storage subsystem cache influence
The numbers for the SAN Volume Controller storage pool I/O capacity that is calculated in
Table 10-5 on page 273 did not consider caching on the storage subsystem level, but only
the raw RAID array performance.
Just as the hosts that use the SAN Volume Controller have a read/write pattern and cache
efficiency in their workload, the SAN Volume Controller has its own read/write pattern and
cache efficiency toward the storage subsystem. The following example shows a
host-to-SAN Volume Controller I/O pattern:
70:30:50 - 70% reads, 30% writes, 50% read cache hits
Read related IOPS generated from the host IO = Host IOPS x 0.7 x 0.5
Write related IOPS generated from the host IO = Host IOPS x 0.3
Table 10-6 shows the relationship of the host IOPS to the SAN Volume Controller
back-end IOPS.
Table 10-6 Host to SAN Volume Controller back-end I/O map
Host IOPS   Pattern    Read IOPS   Write IOPS   Total IOPS
2000        70:30:50   700         600          1300
The total IOPS from Table 10-6 is the number of IOPS that is sent from the SAN Volume
Controller to the storage pool on the storage subsystem. Because the SAN Volume
Controller is acting as the host toward the storage subsystem, we can also assume that
we have some read/write pattern and read cache hit on this traffic.
As shown in Table 10-6, the 70:30 read/write pattern with the 50% cache hit from the host
to the SAN Volume Controller is causing an approximate 54:46 read/write pattern from the
SAN Volume Controller traffic to the storage subsystem. If you apply the same read cache
hit of 50%, you get the 950 IOPS that is sent to the RAID arrays, which are part of the
storage pool, inside the storage subsystem, as shown in Table 10-7.
Table 10-7 SAN Volume Controller to storage subsystem I/O map
SAN Volume Controller IOPS   Pattern    Read IOPS   Write IOPS   Total IOPS
1300                         54:46:50   350         600          950
I/O considerations: These calculations are valid only when the I/O that is generated
from the host to the SAN Volume Controller generates exactly one I/O from the SAN
Volume Controller to the storage subsystem. If the SAN Volume Controller is combining
several host I/Os to one storage subsystem I/O, higher I/O capacity can be achieved.
Also, I/O with a higher block size decreases RAID array I/O capacity. Therefore, it is
possible that combining the I/Os does not increase the total array I/O capacity as viewed
from the host perspective. The drive I/O capacity numbers that are used in the preceding
I/O capacity calculations are for small block sizes, that is, 4 K - 32 K.
To simplify this example, assume that number of IOPS that is generated on the path from
the host to the SAN Volume Controller and from the SAN Volume Controller to the storage
subsystem remains the same.
If you apply the write penalty, Table 10-8 shows the total IOPS toward the RAID array for
the previous host example.
Table 10-8 RAID array total utilization
RAID type   Host IOPS   SAN Volume Controller IOPS   Storage subsystem IOPS   RAID array IOPS
RAID 5      2000        1300                         950                      350 + 4x600 = 2750
RAID 10     2000        1300                         950                      350 + 2x600 = 1550
RAID 6      2000        1300                         950                      350 + 6x600 = 3950
Based on these calculations, we can create a generic formula to calculate the available host I/O
capacity from the RAID/storage pool I/O capacity. Assume that you have the following
parameters:
XIO: Storage pool I/O capacity
R: Read percentage of the I/O pattern
W: Write percentage of the I/O pattern
C1: Read cache miss percentage (100 minus the read cache hit percentage) at the SAN Volume Controller level
C2: Read cache miss percentage at the storage subsystem level
WP: Write penalty of the RAID array type
You can then calculate the host I/O capacity (HIO) by using the following formula:
HIO = XIO / (R x C1 x C2/1000000 + W x WP/100)
The host I/O capacity can be lower than storage pool I/O capacity when the denominator in
the preceding formula is greater than 1.
To calculate at which write percentage in I/O pattern (W) the host I/O capacity is lower than
the storage pool capacity, use the following formula:
W <= 99.9 / (WP - C1 x C2/10000)
Write percentage (W) mainly depends on the write penalty of the RAID array. Table 10-9 on
page 276 shows the break-even value for W with a read cache hit of 50% on the SAN Volume
Controller and storage subsystem level.
Table 10-9 Write percentage break-even values

RAID type   W percentage break-even
RAID 5      26.64%
RAID 10     57.08%
RAID 6      17.37%
The W percentage break-even value from Table 10-9 is a useful reference about which RAID
level to use if you want to maximally use the storage subsystem back-end RAID arrays from
the write workload perspective.
With the preceding formulas, you can also calculate the host I/O capacity, for the example
storage pool from Table 10-4 on page 273 with the 70:30:50 I/O pattern (read:write:cache hit)
from the host side and 50% read cache hit on the storage subsystem.
Table 10-10 shows the results.
Table 10-10 Host I/O example capacity
RAID type   Storage pool I/O capacity   Host I/O capacity
RAID 5      11200                       8145
RAID 10     12800                       16516
RAID 6      9600                        4860
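As a check of the formula for the RAID 5 row: HIO = 11200 / (70 x 50 x 50/1000000 + 30 x 4/100) = 11200 / 1.375, which is approximately 8145 IOPS.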
This formula assumes that no I/O grouping is on the SAN Volume Controller level. With SAN
Volume Controller code 6.x, the default back-end read and write I/O size is 256 K. Therefore,
a possible scenario is that a host might read or write multiple (for example, 8) aligned 32 K
blocks from or to the SAN Volume Controller. The SAN Volume Controller might combine this
to one I/O on the back-end side. In this situation, the formulas might need to be adjusted.
Also, the available host I/O for this particular storage pool might increase.
FlashCopy
The use of FlashCopy on a volume can generate more load on the back-end. When a
FlashCopy target is not fully copied or when copy rate 0 is used, the I/O to the FlashCopy
target causes an I/O load on the FlashCopy source. After the FlashCopy target is fully copied,
read/write I/Os are served independently from the source read/write I/O requests.
The combinations that are shown in Table 10-11 are possible when copy rate 0 is used or the
target FlashCopy volume is not fully copied and I/Os are run in an uncopied area.
Table 10-11 FlashCopy I/O operations (for each type of I/O operation, the table details the
resulting source volume write I/Os, source volume read I/Os, target volume write I/Os, and
target volume read I/Os; for example, a read to an uncopied area of the target volume is
redirected to the source volume)
In some I/O operations, you might experience multiple I/O overheads, which can cause
performance degradation of the source and target volume. If the source and the target
FlashCopy volume share the back-end storage pool (as shown in Figure 10-1), this situation
further influences performance.
Figure 10-1 FlashCopy source and target volume in the same storage pool
When frequent FlashCopy operations are run and you do not want too much effect on the
performance of the source FlashCopy volumes, place the target FlashCopy volumes in a
storage pool that does not share the back-end disks. If possible, place them on a separate
back-end controller, as shown in Figure 10-2 on page 278.
Figure 10-2 Source and target FlashCopy volumes in different storage pools
When you need heavy I/O on the target FlashCopy volume (for example, the FlashCopy
target of a database that is used for data mining), wait until the FlashCopy background copy
completes before you use the target volume.
If volumes that participate in FlashCopy operations are large, the copy time that is required for
a full copy is not acceptable. In this situation, use the incremental FlashCopy approach. In this
setup, the initial copy lasts longer, and all subsequent copies copy only changes because of
the FlashCopy change tracking on source and target volumes. This incremental copying is
performed much faster, and it is usually in an acceptable time frame so that you have no need
to use target volumes during the copy operation, as shown in Figure 10-3 on page 279.
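For example (with hypothetical volume names), an incremental mapping is created by adding the -incremental flag so that only the first copy is a full copy:

IBM_2145:ITSO-CLS5:admin>svctask mkfcmap -source DB_vol01 -target DB_vol01_fc -copyrate 50 -incremental
IBM_2145:ITSO-CLS5:admin>svctask startfcmap -prep 0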
Thin provisioning
The thin provisioning function also affects the performance of the volume because it
generates more I/Os. Thin provisioning is implemented by using a B-Tree directory that is
stored in the storage pool together with the actual data. The real capacity of the volume must
accommodate both the data and the space that is used for the directory, as shown in Figure 10-4.
For some workloads, the combination of thin provisioning and the FlashCopy function can
significantly affect the performance of target FlashCopy volumes, which is related to the fact
that FlashCopy starts to copy the volume from its end. When the target FlashCopy volume is
thin provisioned, the last block is physically at the beginning of the volume allocation on the
back-end storage, as shown in Figure 10-6.
With a sequential workload (as shown in Figure 10-6), the data is read or written at the
physical level (back-end storage) from the end to the beginning. In this case, the underlying
storage subsystem cannot recognize a sequential operation, which causes performance
degradation on that I/O operation.
Table 10-13 Preferred number of arrays per storage pool (the recoverable ranges include
1 - 2 arrays for high-end subsystems, 4 - 12 and 4 - 24 arrays for midrange subsystems, and
4 - 16 arrays for IBM FlashSystem)
As shown in Table 10-13 on page 283, the number of arrays per storage pool is smaller in
high-end storage subsystems. This difference is related to the fact that those subsystems can
deliver higher performance per array, even if the number of disks in the array is the same.
The performance difference is because of multilayer caching and specialized processors for
RAID calculations.
Consider the following points:
You must consider the number of MDisks per array and the number of arrays per managed
disk group to understand aggregate managed disk group loading effects.
You can achieve availability improvements without compromising performance objectives.
Before V6.2 of the SAN Volume Controller code, the SAN Volume Controller cluster used only
one path to the managed disk. All other paths were standby paths. When managed disks are
recognized by the cluster, active paths are assigned in round-robin fashion. To use all eight
ports in one I/O group, at least eight managed disks are needed from a particular back-end
storage subsystem. In the setup of one managed disk per array, you need at least eight arrays
from each back-end storage subsystem. New path management was introduced in V6.2. For
more information about the new round-robin path selection, see 4.2, Round Robin Path
Selection on page 73.
The queue depth (Q) for each MDisk is calculated by using the following algorithm:
Q = ((P x C) / N) / M
If Q > 60, then Q = 60 (the maximum queue depth is 60)
If Q < 3, then Q = 3 (the minimum queue depth is 3)
In this algorithm, the following values are used:
Q: The queue for any MDisk in a specific controller
P: Number of WWPNs that is visible to SAN Volume Controller in a specific controller
N: Number of nodes in the cluster
M: Number of MDisks that is provided by the specific controller
C: A constant. C varies by the following controller types:
FAStT = 500
EMC CLARiiON = 250
DS4700, DS4800, DS6K, and DS8K = 1000
XIV Gen1= 450
XIV Gen2 and above = 900
Any other controller = 500
When the SAN Volume Controller has Q I/Os outstanding for a single MDisk
(that is, it is waiting for Q I/Os to complete), it does not submit any more I/O until part of the
outstanding I/O completes. New I/O requests for that MDisk are queued inside the SAN Volume
Controller, which is unwanted and indicates that the back-end storage is overloaded.
The following example shows how a 4-node SAN Volume Controller cluster calculates the
queue depth for 150 LUNs on a DS8000 storage controller that uses six target ports:
Q = ((6 ports x 1000 per port) / 4 nodes) / 150 MDisks = 10
With the sample configuration, each MDisk has a queue depth of 10.
SAN Volume Controller V4.3.1 introduced dynamic sharing of queue resources that is based
on workload. MDisks with high workload can now borrow unused queue allocation from
less-busy MDisks on the same storage system. Although the values are calculated internally
and this enhancement provides for better sharing, consider queue depth in deciding how
many MDisks to create.
Host I/O
In SAN Volume Controller versions before V6.x, the maximum back-end transfer size that
results from host I/O under normal I/O is 32 KB. If host I/O is larger than 32 KB, it is broken
into several I/Os sent to the back-end storage, as shown in Figure 10-7 on page 286. For this
example, the transfer size of the I/O is 256 KB from the host side.
In such cases, I/O utilization of the back-end storage ports can be multiplied compared to the
number of I/Os coming from the host side. This situation is especially true for sequential
workloads where I/O block size tends to be bigger than in traditional random I/O.
To address this situation, the back-end block I/O size for reads and writes was increased to
256 KB in SAN Volume Controller versions 6.x, as shown in Figure 10-8 on page 287.
The internal cache track size is 32 KB. Therefore, when an I/O comes to the SAN Volume
Controller, it is split into the appropriate number of cache tracks. For the preceding example,
this number is eight 32 KB cache tracks.
Although the back-end I/O block size can be up to 256 KB, a particular host I/O can be
smaller. As such, read or write operations to the back-end managed disks can range from
512 bytes to 256 KB. The same is true for the cache because the tracks are populated only to
the size of the I/O. For example, a 60 KB I/O might fit in two tracks, where the first track is
fully populated with 32 KB and the second track holds only 28 KB.
If the host I/O request is larger than 256 KB, it is split into 256 KB chunks where the last
chunk can be partial depending on the size of I/O from the host.
FlashCopy I/O
The transfer size for FlashCopy can be 64 KB or 256 KB for the following reasons:
The grain size of FlashCopy is 64 KB or 256 KB.
Any size write that changes data within a 64 KB or 256 KB grain results in a single 64-KB
or 256-KB read from the source and write to the target.
Coalescing writes
The SAN Volume Controller coalesces writes up to the 32-KB track size if the writes are in the
same track before destage. For example, 4 KB is written into a track, and then another 4 KB is
written to another location in the same track. Upon the second write, the track moves to the
bottom of the least recently used (LRU) list in the cache, and the track now contains 8 KB
of actual data. This process can continue until the track reaches the top of the LRU list and is
then destaged. The data is written to the back-end disk and removed from the cache. Any
contiguous data within the track is coalesced for the destage.
For sequential writes, the SAN Volume Controller does not use a caching algorithm for
explicit sequential detection, which means that the coalescing of writes in the SAN Volume
Controller cache has a random component. For example, 4 KB writes to volumes translate into
a mix of 4-KB, 8-KB, 16-KB, 24-KB, and 32-KB transfers to the MDisks, with decreasing
probability as the transfer size grows.
Although larger transfer sizes tend to be more efficient, this varying transfer size has no effect
on the ability of the controller to detect and coalesce sequential content to achieve full stride
writes.
For sequential reads, the SAN Volume Controller uses prefetch logic for staging reads that is
based on statistics that are maintained on 128 MB regions. If the sequential content is
sufficiently high within a region, prefetch occurs with 32 KB reads.
Extent size   Maximum non-thin     Maximum thin         Maximum MDisk         Total storage capacity
(MB)          provisioned volume   provisioned volume   capacity in GB        manageable per system
              capacity in GB       capacity in GB
16            2048 (2 TB)          2000                 2048 (2 TB)           64 TB
32            4096 (4 TB)          4000                 4096 (4 TB)           128 TB
64            8192 (8 TB)          8000                 8192 (8 TB)           256 TB
128           -                    16,000               -                     512 TB
256           -                    32,000               -                     1 PB
512           -                    65,000               -                     2 PB
1024          -                    130,000              -                     4 PB
2048          -                    260,000              -                     8 PB
4096          262,144              -                    -                     16 PB
8192          262,144              -                    1,048,576 (1024 TB)   32 PB
The size of the SAN Volume Controller extent also defines how many extents are used for a
particular volume. The example of two different extent sizes that is shown in Figure 10-9
shows that fewer extents are required with a larger extent size.
The extent size and the number of managed disks in the storage pool define the extent
distribution in the case of striped volumes. The example in Figure 10-10 shows two different
cases. In one case, the ratio of volume size to extent size is the same as the number of
managed disks in the storage pool. In the other case, this ratio is not equal to the number of
managed disks.
For even storage pool utilization, align the size of volumes and extents so that an even extent
distribution can be achieved. However, because volumes often are used from their beginning,
the potential performance improvement might not be realized; this consideration is valid only
for non-thin provisioned volumes.
Tip: Align the extent size to the underlying back-end storage; for example, an internal array
stride size (if possible) in relation to the whole cluster size.
Number of storage pools   Upper limit
1                         100%
2                         66%
3                         40%
4                         30%
5 or more                 25%
The effect of SAN Volume Controller cache partitioning is that no single storage pool occupies
more than its upper limit of cache capacity with write data. Upper limits are the point at which
the SAN Volume Controller cache starts to limit incoming I/O rates for volumes that are
created from the storage pool.
If a particular storage pool reaches the upper limit, it experiences the same result as a global
cache resource that is full. That is, the host writes are serviced on a one-out, one-in basis as
the cache destages writes to the back-end storage. However, only writes that are targeted at
the full storage pool are limited; all I/O that is destined for other (non-limited) storage pools
continues normally.
Read I/O requests for the limited storage pool also continue normally. However, because the
SAN Volume Controller is destaging write data at a rate that is greater than the controller can
sustain (otherwise, the partition does not reach the upper limit), reads are also serviced slowly.
The key point to remember is that the partitioning limits only write I/Os. In general, a
70:30 or 50:50 ratio of read-to-write operations is observed, although some applications or
workloads can perform 100% writes. Write cache hits are much less of a benefit than read
cache hits: a write always hits the cache, and if modified data is already in the cache, it is
overwritten, which might save a single destage operation. Read cache hits provide a much
more noticeable benefit because they save seek and latency time at the disk layer.
In all benchmarking tests that were performed, even with a single active storage pool,
good-path SAN Volume Controller I/O group throughput is the same as it was before SAN Volume
Controller cache partitioning was introduced.
For information about SAN Volume Controller cache partitioning, see IBM SAN Volume
Controller 4.2.1 Cache Partitioning, REDP-4426.
The decision about which type of ranks-to-extent pool mapping to use depends mainly on the
following factors:
The DS8000 model that is used for back-end storage (DS8100, DS8300, DS8700, or
DS8800)
The stability of the DS8000 series configuration
The microcode that is installed or can be installed on the DS8000 series
The DS8700 and DS8800 models do not have the 2-TB limit. Therefore, use a single
LUN-to-rank mapping, as shown in Figure 10-12.
In this setup, we have as many extent pools as ranks, and extent pools might be evenly
divided between both internal servers (server0 and server1).
With both approaches, the SAN Volume Controller is used to distribute the workload across
ranks evenly by striping the volumes across LUNs.
A benefit of one rank to one extent pool is that physical LUN placement can be easily
determined when it is required, such as in performance analysis.
The drawback of such a setup is that, when ranks are added and then integrated into existing
SAN Volume Controller storage pools, existing volumes must be restriped manually or by
using scripts.
With this design, you must define the LUN size so that each has the same number of extents
on each rank (extent size of 1 GB). In the previous example, the LUN might have a size of
N x 10 GB. With this approach, the utilization of the DS8000 series on the rank level might be
balanced.
If another rank is added to the configuration, the existing DS8000 series LUNs (SAN Volume
Controller managed disks) can be rebalanced by using the DS8000 series Easy Tier manual
operation so that the optimal resource utilization of DS8000 series is achieved. With this
approach, you do not need to restripe volumes on the SAN Volume Controller level.
Extent pools
The number of extent pools on the DS8000 series depends on the rank setup. A minimum of
two extent pools is required to evenly use both servers inside DS8000. In all cases, an even
number of extent pools provides the most even distribution of resources.
When possible, consider adding arrays to storage pools based on multiples of the installed
DA pairs. For example, if the storage controller contains six DA pairs, use 6 or 12 arrays in a
storage pool with arrays from all DA pairs in a managed disk group.
Example 10-1 on page 295 shows what this invalid configuration looks like from the CLI
output of the lsarray and lsrank commands. The arrays that are on the same DA pair
contain the same group number (0 or 1), meaning that they have affinity to the same DS8000
series server. Here, server0 is represented by group0, and server1 is represented by group1.
As an example of this situation, consider arrays A0 and A4, which are attached to DA pair 0.
In this example, both arrays are added to an even-numbered extent pool (P0 and P4) so that
both ranks have affinity to server0 (represented by group0), which leaves the DA in server1
idle.
Example 10-1 Command output for the lsarray and lsrank commands

dscli> lsarray -l
Date/Time: Aug 8, 2008 8:54:58 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
Array State  Data   RAIDtype  arsite Rank DA Pair DDMcap(10^9B) diskclass
===========================================================================
A0    Assign Normal 5 (6+P+S) S1     R0   0       146.0         ENT
A1    Assign Normal 5 (6+P+S) S9     R1   1       146.0         ENT
A2    Assign Normal 5 (6+P+S) S17    R2   2       146.0         ENT
A3    Assign Normal 5 (6+P+S) S25    R3   3       146.0         ENT
A4    Assign Normal 5 (6+P+S) S2     R4   0       146.0         ENT
A5    Assign Normal 5 (6+P+S) S10    R5   1       146.0         ENT
A6    Assign Normal 5 (6+P+S) S18    R6   2       146.0         ENT
A7    Assign Normal 5 (6+P+S) S26    R7   3       146.0         ENT

dscli> lsrank -l
Date/Time: Aug 8, 2008 8:52:33 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
ID Group State  datastate Array RAIDtype extpoolID extpoolnam stgtype exts usedexts
===================================================================================
R0 0     Normal Normal    A0    5        P0        extpool0   fb      779  779
R1 1     Normal Normal    A1    5        P1        extpool1   fb      779  779
R2 0     Normal Normal    A2    5        P2        extpool2   fb      779  779
R3 1     Normal Normal    A3    5        P3        extpool3   fb      779  779
R4 0     Normal Normal    A4    5        P4        extpool4   fb      779  779
R5 1     Normal Normal    A5    5        P5        extpool5   fb      779  779
R6 0     Normal Normal    A6    5        P6        extpool6   fb      779  779
R7 1     Normal Normal    A7    5        P7        extpool7   fb      779  779
Figure 10-15 shows a correct configuration that balances the workload across all four DA pairs.
Example 10-2 on page 296 shows how this correct configuration looks from the CLI output of
the lsrank command. The configuration from the lsarray output remains unchanged. Arrays
that are on the same DA pair are now split between groups 0 and 1.
Reviewing arrays A0 and A4 again now shows that they have different affinities (A0 - group0,
A4 - group1). To achieve this correct configuration (compared to Example 10-1), array A4 now
belongs to an odd-numbered extent pool (P5).
10.8.2 Cache
For the DS8000 series, you cannot tune the array and cache parameters. The arrays are 6+P
or 7+P, depending on whether the array site contains a spare, and the segment size (the
contiguous amount of data that is written to a single disk) is 256 KB for fixed block volumes.
Caching for the DS8000 series is done on a 64-KB track boundary.
Ports     Adapters
2 - 16    2 - 4 (2/4-port adapter)
16 - 48   16
> 48      16
The DS8000 series populates Fibre Channel adapters across two to eight I/O enclosures,
depending on configuration. Each I/O enclosure represents a separate hardware domain.
Ensure that adapters that are configured to different SAN networks do not share the I/O
enclosure as part of our goal of keeping redundant SAN networks isolated from each other.
Figure 10-16 shows an example of DS8800 series connections with 16 I/O ports on eight
8-port adapters. In this case, two ports per adapter are used.
Figure 10-17 shows an example of DS8800 series connections with four I/O ports on two
4-port adapters. In this case, two ports per adapter are used.
RAID 5: 6+P+S
RAID 5: 7+P
RAID 10: 2+2+2P+2S
RAID 10: 3+3+2P
These factors define the performance and size attributes of the DS8000 series LUNs that act
as managed disks for SAN Volume Controller storage pools. The SAN Volume Controller
storage pool should have MDisks with the same characteristics for performance and capacity,
which is also required for even DS8000 series utilization.
Tip: Describe the main characteristics of the storage pool in its name. For example, the
pool on DS8800 series with 146 GB 15K FC disks in RAID 5 might have the name
DS8800_146G15KFCR5.
Figure 10-18 shows an example of a DS8700 series storage pool layout that is based on disk
type and RAID level. In this case, ranks with RAID5 6+P+S and 7+P are combined in the
same storage pool, and RAID10 2+2+2P+2S and 3+3+2P are combined in the same storage
pool.
Figure 10-18 DS8700 series storage pools that are based on disk type and RAID level
With this approach, some volumes, or parts of volumes, might be striped only over MDisks
(LUNs) that are on arrays or ranks where no spare disk is available. Because those MDisks
have one more spindle, this approach can also compensate for the performance
requirements because more extents are placed on them.
Such an approach simplifies the management of the storage pools because it allows for a
smaller number of storage pools to be used.
Four storage pools are defined in this scenario. To achieve a configuration that is optimized
from the RAID perspective, the configuration includes storage pools that are based on the
exact number of disks in the array or rank, as shown in Figure 10-19 on page 300.
Figure 10-19 DS8700 storage pools with exact number of disks in the array/rank
With this setup, seven storage pools are defined instead of four. The complexity of
management increases because more pools must be managed. From the performance
perspective, the back end is completely balanced on the RAID level.
Configurations with so many different disk types in one storage subsystem are uncommon.
One DS8000 series system often has a maximum of two types of disks, and different types of
disks are installed in different systems. Figure 10-20 shows an example of such a setup on
DS8800 series.
Figure 10-20 DS8800 series storage pool setup with two types of disks
Although it is possible to span the storage pool across multiple back-end systems, as shown
in Figure 10-21, keep storage pools bound inside single DS8000 series for availability.
Figure 10-22 shows a DS8800 series with two storage pools for 6+P+S RAID5 and 7+P
arrays.
Table 10-17, Table 10-18 on page 304, and Table 10-19 on page 304 show the number of
managed disks and the available capacity that is based on the number of installed modules.
Table 10-17 XIV with 1-TB disks and 1632-GB LUNs

Number of XIV       Number of        Capacity of      Usable XIV
modules installed   MDisks (LUNs)    MDisks (TB)      capacity (TB)
6                   16               26.1             27
9                   26               42.4             43
10                  30               48.9             50
11                  33               53.9             54
12                  37               60.4             61
13                  40               65.3             66
14                  44               71.8             73
15                  48               78.3             79
Table 10-18 lists the data for XIV with 2-TB disks and 1669-GB LUNs (Gen3).
Table 10-18 XIV with 2-TB disks and 1669-GB LUNs (Gen3)

Number of XIV       Number of        Capacity of      Usable XIV
modules installed   MDisks (LUNs)    MDisks (TB)      capacity (TB)
6                   33               55.1             55.7
9                   52               86.8             88
10                  61               101.8            102.6
11                  66               110.1            111.5
12                  75               125.2            125.9
13                  80               133.5            134.9
14                  89               148.5            149.3
15                  96               160.2            161.3
Table 10-19 lists the data for XIV with 3-TB disks and 2185-GB LUNs (Gen3).
Table 10-19 XIV with 3-TB disks and 2185-GB LUNs (Gen3)

Number of XIV       Number of        Capacity of      Usable XIV
modules installed   MDisks (LUNs)    MDisks (TB)      capacity (TB)
6                   38               83               84.1
9                   60               131.1            132.8
10                  70               152.9            154.9
11                  77               168.2            168.3
12                  86               187.9            190.0
13                  93               203.2            203.6
14                  103              225.0            225.3
15                  111              242.5            243.3
If XIV is not configured with the full capacity initially, you can use the SAN Volume Controller
rebalancing script to optimize volume placement when capacity is added to the XIV.
Table 10-20 XIV FC ports available for SAN Volume Controller use

Number of XIV       XIV modules with    Total available   Ports used for SAN
modules installed   FC ports            FC ports          Volume Controller
6                   4, 5                8                 4
9                   4, 5, 7, 8          16                8
10                  4, 5, 7, 8          16                8
11                  4, 5, 7, 8, 9       20                10
12                  4, 5, 7, 8, 9       20                10
13                  4, 5, 6, 7, 8, 9    24                12
14                  4, 5, 6, 7, 8, 9    24                12
15                  4, 5, 6, 7, 8, 9    24                12
As shown in Table 10-20 on page 305, the SAN Volume Controller 16-port limit for a storage
subsystem is not reached.
To provide redundancy, connect the ports available for SAN Volume Controller use to dual
fabrics. Connect each module to separate fabrics. Figure 10-24 shows an example of
preferred practice SAN connectivity.
Figure 10-25 shows an example of the Storwize V7000 configuration with optimal smaller
arrays and non-optimal larger arrays.
As shown in Figure 10-25, one hot spare disk was used per enclosure, which is not a
requirement. However, it is helpful because it provides symmetrical usage of the enclosures.
At a minimum, use one hot spare disk per SAS chain for each type of disk in the Storwize
V7000. If more than two enclosures are present, you must have at least two HS disks per
SAS chain per disk type, if those disks occupy more than two enclosures. Figure 10-26 shows
a Storwize V7000 configuration with multiple disk types.
When you define a volume at the Storwize V7000 level, use the default values. The default
values define a 256-KB strip size (the size of the RAID chunk on each disk), which is in line
with the SAN Volume Controller back-end I/O size, which in V6.1 and later is up to 256 KB. For
example, the use of a 256 KB strip size gives a 2-MB stride size (the whole RAID chunk size)
in an 8+1 array.
Storwize V7000 also supports large NL-SAS drives (2 TB and 3 TB). The use of those drives
in RAID 5 arrays can produce significant RAID rebuild times of several hours or more. Therefore,
use RAID 6 to avoid a double failure during the rebuild period. Figure 10-27 shows this type of
setup.
Tip: Make sure that volumes that are defined on Storwize V7000 are distributed evenly
across all nodes.
In this setup, the SAN Volume Controller can access a Storwize V7000 with a two-node
configuration over four ports. Such connectivity is sufficient for Storwize V7000 environments
that are not fully loaded.
However, if the Storwize V7000 is hosting capacity that requires more than two connections
per node, use four connections per node, as shown in Figure 10-29 on page 311.
With a two-node Storwize V7000 setup, eight target connections are provided from the SAN
Volume Controller perspective. This number is well below the 16 target ports that is the
current SAN Volume Controller limit for back-end storage subsystems.
Previously, the Storwize V7000 was limited to a four-node cluster configuration. With four
connections per node to the SAN, that configuration reaches the 16-target-port limit exactly,
so it is still supported; Figure 10-30 shows an example. However, that limit no longer applies,
and configurations with an eight-node cluster (four control enclosures with four I/O groups)
are now supported.
Redundancy consideration: At a minimum, connect two ports per node to the SAN with
connections to two redundant fabrics.
Figure 10-31 Storwize V7000 storage pool example with two pools
This example has a hot spare disk in every enclosure, which is not a requirement. To avoid
having two pools for the same disk type, create an array configuration that is based on the
following rules:
Number of disks in the array:
6+1
7+1
8+1
Number of hot spare disks: Minimum of two
Based on the array size, the following symmetrical array configuration is possible as a setup
for a five-enclosure Storwize V7000:
6+1 - 17 arrays (119 disks) + 1 x hot spare disk
7+1 - 15 arrays (120 disks) + 0 x hot spare disk
8+1 - 13 arrays (117 disks) + 3 x hot spare disks
The 7+1 arrays do not provide any hot spare disks in the symmetrical array configuration, as
shown in Figure 10-32.
The 6+1 arrays provide a single hot spare disk in the symmetrical array configuration, as
shown in Figure 10-33, which is below the preferred number of hot spare disks.
The 8+1 arrays provide three hot spare disks in the symmetrical array configuration, as shown
in Figure 10-34 on page 314. These arrays are within the recommended range for the
number of hot spare disks (minimum of two).
The best configuration for a single storage pool for the same type of disk in a five-enclosure
Storwize V7000 is an 8+1 array configuration.
Tip: A symmetrical array configuration for the same disk type provides the least possible
complexity in a storage pool configuration.
When you select the array width, consider its effect on rebuild time and availability. A larger
number of disks in an array increases the rebuild time for disk failures, which can have a
negative effect on performance. Also, having more disks in an array increases the probability
of having a second drive failure within the same array before the rebuild of an initial drive
failure completes. This exposure is inherent to the RAID 5 architecture.
Preferred practice: For the DS5000, use array widths of 4+p and 8+p.
Segment size
With direct-attached hosts, considerations are often made to align device data partitions to
physical drive boundaries within the storage controller. For the SAN Volume Controller, this
alignment is less critical because of the caching that the SAN Volume Controller provides and
because there is less variation in the I/O profile that it uses to access the back-end disks.
Because the maximum destage size for the SAN Volume Controller is 256 KB, it is impossible
to achieve full stride writes for random workloads. For the SAN Volume Controller, the only
opportunity for full stride writes occurs with large sequential workloads, and in that case, the
larger the segment size, the better. Larger segment sizes can adversely affect random I/O,
however. The SAN Volume Controller and controller cache hide the RAID 5 write penalty for
random I/O well, and therefore, larger segment sizes can be accommodated. The primary
consideration for selecting segment size is to ensure that a single host I/O fits within a single
segment to prevent accessing multiple physical drives.
Preferred practice: Use a segment size of 256 KB as the best compromise for all
workloads.
The preferred settings for the DS5000 series can be summarized as follows:
Segment size: 256 KB
MDisk usage: Managed mode, striped volumes
Cache block size: 32 KB
Cache flush settings: 80/80 (default)
Read ahead: Enabled
RAID level and array width: RAID 5, 4+p or 8+p
Chapter 11. IBM System Storage Easy Tier function
As is the case for HDDs, the SSD RAID array format helps to protect against individual SSD
failures. Depending on your requirements, you can achieve more high availability protection
above the RAID level by using volume mirroring.
In the example disk tier pool that is shown in Figure 11-2 on page 322, you can see the SSD
MDisks presented from the SSD disk arrays.
MDisks that are used in a single-tier storage pool should have the same hardware
characteristics, for example, the same RAID type, RAID array size, disk type, and disk
revolutions per minute (RPMs), and controller performance characteristics.
Adding SSD to the pool means that more space is also now available for new volumes or
volume expansion.
This process is efficient and adds negligible processing overhead to the SAN Volume
Controller nodes.
Data Placement Advisor
The Data Placement Advisor uses workload statistics to make a cost benefit decision as to
which extents are to be candidates for migration to a higher performance (SSD) tier.
This process also identifies extents that must be migrated back to a lower (HDD) tier.
Data Migration Planner
By using the extents that were previously identified, the Data Migration Planner step builds
the extent migration plan for the storage pool.
Data Migrator
This process involves the actual movement or migration of the volume extents up to, or
down from, the high disk tier. The extent migration rate is capped at a maximum of
30 MBps, which equates to around 3 TB per day that is migrated between disk tiers.
When it relocates volume extents, Easy Tier performs the following tasks:
It attempts to migrate the most active volume extents up to SSD first.
To ensure that a free extent is available, you might need to first migrate a less frequently
accessed extent back to the HDD.
A previous migration plan and any queued extents that are not yet relocated are
abandoned.
Dynamic data movement is not apparent to the host server and application users of the data,
other than providing improved performance. Extents are automatically migrated, as described
in 11.3.2, Implementation rules on page 325.
The statistic summary file is also created in this mode. This file can be offloaded for input to
the advisor tool. The tool produces a report on the extents that are moved to SSD and a
prediction of performance improvement that can be gained if more SSD arrays are available.
For more information about the use of these parameters, see 11.5, Activating Easy Tier with
the SAN Volume Controller CLI on page 329, and 11.6, Activating Easy Tier with the SAN
Volume Controller GUI on page 335.
11.3.1 Prerequisites
No Easy Tier license is required for the SAN Volume Controller. Easy Tier comes as part of
the V6.1 code. For Easy Tier to migrate extents, you must have disk storage available that has
different tiers; for example, a mix of SSD and HDD.
Offloading statistics
To extract the summary performance data, use one of the following methods:
CLI
GUI
These methods are described next.
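As a minimal sketch of the CLI method (the cluster address and heat file name here are
examples, not from the original procedure), you can list the dump files on the configuration
node and then copy the dpa_heat file to your workstation by using secure copy:
IBM_2145:ITSO-CLS5:admin>svcinfo lsdumps
pscp admin@cluster_ip:/dumps/dpa_heat.node1.data .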
Figure 11-4 Accessing the dpa_heat file in the SAN Volume Controller 7.2 GUI
Next, right-click the row for the dpa_heat file and choose Download, as shown in Figure 11-5.
Figure 11-5 Downloading the dpa_heat file in the SAN Volume Controller 7.2 GUI
The index.html file is then created in the STAT base directory. When it is opened with your
browser, a summary page is displayed, as shown in Figure 11-6.
The distribution of hot data and cold data for each volume is shown in the volume heat
distribution report. The report displays the portion of the capacity of each volume on SSD
(red), and HDD (blue), as shown in Figure 11-7.
11.5 Activating Easy Tier with the SAN Volume Controller CLI
This section describes how to activate Easy Tier by using the SAN Volume Controller CLI.
The example is based on the storage pool configurations as shown in Figure 11-1 on
page 321 and Figure 11-2 on page 322.
The environment is a SAN Volume Controller cluster with the following resources available:
1 x I/O group with two 2145-CF8 nodes
8 x external 73 GB SSDs (4 x SSD per RAID5 array)
1 x external Storage Subsystem with HDDs
Deleted lines: Many lines that are not related to Easy Tier were deleted from the command
output in the examples in the following sections so that you can focus only on the
information that is related to Easy Tier.
vdisk_count 1
.
easy_tier on
easy_tier_status active
.
tier generic_ssd
tier_mdisk_count 0
.
tier generic_hdd
tier_mdisk_count 3
tier_capacity 200.25GB
------------ Now Repeat for the Volume ------------
IBM_2145:ITSO-CLS5:admin>svcinfo lsvdisk -filtervalue "mdisk_grp_name=Single*"
id name status mdisk_grp_id mdisk_grp_name
capacity type
27 ITSO_Volume_1 online 27
Single_Tier_Storage_Pool 10.00GB striped
IBM_2145:ITSO-CLS5:admin>svcinfo lsvdisk ITSO_Volume_1
id 27
name ITSO_Volume_1
.
easy_tier off
easy_tier_status inactive
.
tier generic_ssd
tier_capacity 0.00MB
.
tier generic_hdd
tier_capacity 10.00GB
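The volume shows easy_tier off. As a minimal sketch (assuming that you want to turn on the
function for this volume), the volume setting can be changed with the chvdisk command:
IBM_2145:ITSO-CLS5:admin>svctask chvdisk -easytier on ITSO_Volume_1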
In this example, a storage pool, Multi_Tier_Storage_Pool, is available within which we want to
place the SSD arrays. After you create the SSD arrays (which appear as MDisks), they
are placed into the storage pool, as shown in Example 11-3.
The storage pool easy_tier value is set to auto because it is the default value that is
assigned when you create a storage pool. Also, the default tier value of the SSD MDisks is
set to generic_hdd, not to generic_ssd.
Tip: Internal SSD MDisks in SAN Volume Controller and Storwize family storage systems
are automatically recognized upon detection and their tier value is set to generic_ssd.
However, this is not the case for external SSD MDisks. The tier value of any external MDisk
is set to generic_hdd when the MDisk is initially detected. You must manually change the
tier value to activate Easy Tier.
Example 11-3 Multitier pool creation
IBM_2145:ITSO-CLS5:admin>svcinfo lsmdiskgrp -filtervalue "name=Multi*"
id name status mdisk_count vdisk_count capacity easy_tier easy_tier_status
28 Multi_Tier_Storage_Pool online 3 1 200.25GB auto
inactive
IBM_2145:ITSO-CLS5:admin>svcinfo lsmdiskgrp Multi_Tier_Storage_Pool
id 28
name Multi_Tier_Storage_Pool
status online
mdisk_count 3
vdisk_count 1
.
easy_tier auto
easy_tier_status inactive
.
tier generic_ssd
tier_mdisk_count 0
.
tier generic_hdd
tier_mdisk_count 3
IBM_2145:ITSO-CLS5:admin>svcinfo lsmdisk
mdisk_id mdisk_name status mdisk_grp_name
capacity raid_level tier
299 SSD_Array_RAID5_1 online Multi_Tier_Storage_Pool 203.6GB raid5 generic_hdd
300 SSD_Array_RAID5_2 online Multi_Tier_Storage_Pool 203.6GB raid5 generic_hdd
IBM_2145:ITSO-CLS5:admin>svcinfo lsmdisk SSD_Array_RAID5_2
mdisk_id 300
mdisk_name SSD_Array_RAID5_2
status online
mdisk_grp_id 28
mdisk_grp_name Multi_Tier_Storage_Pool
capacity 203.6GB
.
raid_level raid5
tier generic_hdd
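As a minimal sketch, the tier value of these external SSD MDisks can then be changed with
the chmdisk command (the MDisk names match Example 11-3):
IBM_2145:ITSO-CLS5:admin>svctask chmdisk -tier generic_ssd SSD_Array_RAID5_1
IBM_2145:ITSO-CLS5:admin>svctask chmdisk -tier generic_ssd SSD_Array_RAID5_2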
.
tier generic_hdd
tier_mdisk_count 3
You now have two different tiers available in the SAN Volume Controller cluster: generic_ssd
and generic_hdd. Extents are now used on both the generic_ssd tier and the generic_hdd
tier, as shown by the free_capacity values.
However, you cannot determine from this command if the SSD storage is being used by the
Easy Tier process. To determine whether Easy Tier is actively measuring or migrating extents
within the cluster, you must view the volume status, as shown in Example 11-5 on page 334.
11.6 Activating Easy Tier with the SAN Volume Controller GUI
This section describes how to activate Easy Tier by using the web GUI. This simple example
uses a single storage pool that contains an MDisk that is presented from an XIV Gen 2
Storage System and an MDisk that is presented from an IBM FlashSystem 820 Storage
System.
The environment is a SAN Volume Controller cluster with the following resources available:
1 x I/O group with two 2145-CF8 nodes
1 x external 400 GB FlashSystem LUN
1 x external 2.45 TB XIV Gen 2 Storage System LUN
Figure 11-8 MDisks by Pools view, showing both MDisks with Tier set to Hard Disk Drive
When you view the properties of the storage pool, you can see that Easy Tier is inactive, as
shown in Figure 11-9.
Figure 11-9 Pool properties page showing Easy Tier inactive status
Therefore, for Easy Tier to take effect, you must change the disk tier. Right-click the selected
MDisk and choose Select Tier, as shown in Figure 11-10.
Now set the MDisk Tier to Solid-State Drive, as shown in Figure 11-11.
The MDisks now have the correct tier values, as shown in Figure 11-12.
Figure 11-12 MDisks by Pools page, showing the correct tier values for the MDisks in the pool
After the pool has an Easy Tier active status, the automatic data relocation process begins for
the volumes in the pool, which occurs because the default Easy Tier setting for volumes is on.
If we browse to the Volumes by Pool page, we can verify that Easy Tier is now active on the
volumes in the pool, as shown in Figure 11-14.
12
Chapter 12.
Applications
This chapter provides information about laying out storage for the best performance for
general applications; specifically, IBM AIX Virtual I/O Servers (VIOS), and IBM DB2
databases. Although most of the specific information is directed to hosts that are running the
IBM AIX operating system, the information is also relevant to other host types.
This chapter includes the following sections:
Application workloads
Application considerations
Data layout overview
Database storage
Data layout with the AIX Virtual I/O Server
Volume size
Failure domains
With large-size I/O, it is better to use large cache blocks to write larger chunks into cache with
each operation. You want the sequential I/Os to take as few back-end I/Os as possible and to
get maximum throughput from them. Therefore, carefully decide how to define the logical
drive and how to disperse the volumes on the back-end storage MDisks.
Many environments have a mix of transaction-oriented workloads and throughput-oriented
workloads. Unless you measured your workloads, assume that the host workload is mixed
and use Storwize family system striped volumes over several MDisks in a storage pool.
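For example, the following mkvdisk command is a minimal sketch of creating a striped
volume (the pool name, I/O group, size, and volume name are examples only):
IBM_2145:ITSO-CLS5:admin>svctask mkvdisk -mdiskgrp Single_Tier_Storage_Pool -iogrp 0 -size 100 -unit gb -vtype striped -name ITSO_Volume_2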
Extents from the LVM volume group are used to define logical volumes. LVM logical
volumes are used to store data on a file system that is defined on the volume or directly on
the volume. Realistically sized logical volumes are backed by many MDisk Group
extents. This means that an LVM logical volume is striped across many MDisks, even if the
OS LVM uses a linear allocation policy for LVM extents.
The schematic representation of the storage virtualization layers is shown in Figure 12-1.
Understanding the layers of storage virtualization is important when you are making
decisions about data layout in a system.
For Storwize family systems, the maximum I/O size for destage to back-end storage is
256 KB. Therefore, full stripe writes to back-end storage are not feasible. However, for
storage that is virtualized by a Storwize family system, this is not a problem. Even for internal
storage, this is not a major concern. If there are many disk drive modules in the array,
whether the I/Os that are issued by the hosts are full-stripe writes or partial-stripe writes is
not a major performance concern.
Multiple disks are presented to the operating system to enable sending I/Os in parallel to
multiple independent disks. With virtualized storage, multiple disks that are presented to a
host can be backed by the same MDisk group and, ultimately, the same physical disks. In
such circumstances, performance gains can be realized only at the OS level, if the OS
can process I/Os to the disks in parallel at all levels of the storage stack. Often, I/Os are still
serialized at the FC card driver level.
In tests that were performed on AIX hosts, some increase in performance was observed
when the number of physical volumes was increased from 1 to 2 and then to 4. Further
increases in the number of physical volumes did not result in a substantial performance
increase. The performance gain can, to some extent, be explained by the increased effective
queue depth for the host (the number of I/Os in flight at any moment). To realize performance
gains that result from the use of multiple storage systems in parallel, it is important to make
sure that multiple VDisks that are presented to the host originate from separate back-end
storage systems.
At the same time, OS-level storage layout can be simplified by presenting fewer, larger
volumes to the host, while making sure that these volumes are backed by multiple back-end
storage systems. Such a setup realizes the benefits of parallelization of I/Os at the Storwize
family system layer, which helps you to take full benefit from the investment in the storage
systems and reduces storage planning complexity at the OS level.
In virtualized storage environments, the OS has little to no knowledge of the mapping
between logical addressing (LBA of a block) and the physical layout of data on physical disk
drive modules.
For example, in the case of thinly provisioned volumes (including volumes that are using
Real-time Compression technology), sequential storage access as seen by the OS can result
in a random access pattern at the disk module level. Conversely, because of temporal data
locality, multiple random accesses to storage can result in few sequential accesses at the
storage level. The problem can be exacerbated when the host's disk is virtualized by a
hypervisor, such as Virtual I/O Server (VIOS) or a VMware host. In these circumstances,
multiple hosts are issuing I/Os to the same volume simultaneously; therefore, I/O access
pattern optimization that is performed at the level of distinct OS instances has little bearing on
the access pattern at the physical disk level.
Some applications (database engines in particular) also have built-in mechanisms for I/O
management with which their administrators can control I/O access patterns. This ability is
often used to send multiple I/Os in parallel by using striping across LVs that are available to
the application.
Together, the storage administrator, OS administrator, and application administrator control
how data is distributed among physical disks.
Note: Consider the following points about the preferred general data layout:
Evenly balance I/Os across all physical disks (one method is by striping the volumes).
To maximize sequential throughput, use a maximum range of physical disks for each LV
when you define a logical volume on an AIX host (by using the mklv -e x AIX command;
see the sketch after this note).
MDisk and volume sizes:
Create one MDisk per RAID array.
Create volumes that are based on the space that is needed and ensure that they are
backed by a back-end storage system with sufficient performance.
When you need more space on the server, dynamically extend the volume on the
Storwize family storage system and then use the appropriate OS command to see the
increased size in the system.
Use striped mode volumes if there is no recommendation to the contrary from the vendor of
the application that is using the volumes. Striped volumes are the all-purpose volumes that
provide good performance for most applications.
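The following commands are a minimal sketch of two of these practices on an AIX host; the
volume, volume group, logical volume names, and sizes are examples only:
# On the Storwize family system: dynamically extend an existing volume by 10 GB
svctask expandvdisksize -size 10 -unit gb ITSO_Volume_1
# On the AIX host: rescan the volume group so that it sees the increased size
chvg -g datavg
# Create a logical volume that is spread across the maximum range of physical
# volumes in the volume group (-e x)
mklv -y datalv -e x datavg 256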
Manual, explicit I/O balancing for a specific application is also an option. However, this
approach requires in-depth knowledge of the application and extensive testing. Therefore,
this approach is much more time-consuming and the performance gains might not justify the
time spent on fine-tuning the storage layout. Modifying the storage system (for example,
increasing the size of the volumes) might require another iteration of tests. Changing the
workload characteristics of a manually tuned system can result in an unbalanced
configuration with suboptimal performance.
Some applications stripe their data across the underlying disks; for example, IBM DB2, IBM
GPFS, and Oracle ASM. These types of applications might require more data layout
considerations, as described in 12.3.6, LVM volume groups and logical volumes on
page 349.
Extent size   Maximum storage capacity
16 MB         64 TB
32 MB         128 TB
64 MB         256 TB
128 MB        512 TB
256 MB        1 PB
512 MB        2 PB
1 GB          4 PB
2 GB          8 PB
Use striped volumes when the number of volumes does not matter.
Use striped volumes when the number of VGs does not affect performance.
Use striped volumes when sequential I/O rates are greater than the sequential rate for a
single RAID array on the back-end storage. Extremely high sequential I/O rates might
require a different layout strategy.
Use striped volumes when you prefer the use of large LUNs on the host.
For information about how to use large volumes, see 12.6, Volume size on page 351.
Tivoli Storage Manager uses a scheme that is similar to the DB2 scheme to spread out its
I/O, but it also depends on ensuring that the number of client backup sessions is equal to the
number of Tivoli Storage Manager storage volumes or containers. The perfect situation for
Tivoli Storage Manager is some number of client backup sessions that go to an equal
number of containers (with each container on a separate RAID array).
To summarize, if you have a good understanding of the application's I/O characteristics and
can control the destination of the data streams that are generated by the application, you
might consider explicitly balancing the I/Os. In this case, use image mode volumes from the
Storwize family system to direct the application's sequential data streams to independent
physical disk sets. However, the use of Storwize family system striped volumes provides
similar performance while simplifying OS level storage administration complexity. It is also
less vulnerable to changes in workload characteristics.
12.5.1 Overview
In setting up storage at a VIOS, several possibilities exist for creating volumes and serving
them to VIO clients (VIOCs). The first consideration is to create sufficient storage for each
VIOC. Less obvious, but equally important, is obtaining the best use of the storage.
Performance and availability are also significant. Internal Small Computer System Interface
(SCSI) disks (which are used for the VIOS operating system) and SAN disks often are
available. Availability for disk is usually handled by RAID on the SAN or by SCSI RAID
adapters on the VIOS.
Here, it is assumed that any internal SCSI disks are used for the VIOS operating system and
possibly for the operating systems of the VIOCs. Furthermore, the applications are configured
so that only limited I/O occurs to the internal SCSI disks on the VIOS and to the rootvgs of the
VIOCs. If you expect that your rootvg might have a significant IOPS rate, you can configure it
in the same manner as for other application VGs later.
Remember to keep the Storwize family system drivers (SDDPCM) on the VIOS system up to
date. The use of outdated drivers deprives you of the benefits of updates that are in the new
version and potentially exposes your VIOS clients to bugs that were fixed in new versions.
350
VIOS restrictions
You can create the following types of volumes on a VIOS:
NPIV volumes
By using NPIV volumes, you can natively present storage from a Storwize family storage
system to a logical partition that is defined in a Power system. This approach allows the OS
administrator to access the storage system by using its native drivers and, for example,
to use its FlashCopy capabilities. When an NPIV mapping is defined for a logical
partition, VIOS defines two sets of WWPN addresses for the client. If you plan to use live
partition mobility, always remember to map the storage to both sets of WWPN
addresses. The drawback of the NPIV approach is that you must maintain OS level
drivers for the storage on each LPAR that uses NPIV to access the storage system.
Physical volume (PV) VSCSI hdisks
PV VSCSI hdisks are entire LUNs from the VIOS perspective that are presented to the
client, which sees them as physical volumes.
If you are concerned about failure of a VIOS and configured redundant VIOS for that
reason, you must use PV VSCSI or NPIV hdisks.
Logical volume (LV) VSCSI hdisks
An LV VSCSI hdisk is a logical volume that is presented by a VIOS to its client, which sees it
as a physical volume. LV VSCSI hdisks cannot be served from multiple VIOSs. An LV that is
presented to a client cannot span physical volumes or be a striped logical volume.
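As a minimal sketch, a PV VSCSI hdisk is mapped to a client partition on the VIOS command
line as follows (the device names are examples only):
mkvdev -vdev hdisk5 -vadapter vhost0 -dev vtscsi0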
How to configure IBM VSS hardware provider on VMware ESXi 5 for FlashCopy (IBM
Tivoli FCM):
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD106013
PowerHA Hardware Support Matrix:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105638
Part 3
Management, monitoring, and troubleshooting
This part provides information about preferred practices for monitoring, managing, and
troubleshooting your installation of SAN Volume Controller.
This part includes the following chapters:
Chapter 13, Monitoring on page 357
Chapter 14, Maintenance on page 485
Chapter 15, Troubleshooting and diagnostics on page 519
13
Chapter 13.
Monitoring
Tivoli Storage Productivity Center offers several reports that you can use to monitor SAN
Volume Controller and Storwize family products and identify performance problems. Tivoli
Storage Productivity Center version 5.2 is a major release that provides improvements to the
web-based user interface that is designed to offer easy access to your storage environment.
This interface also provides a common appearance that is based on the current user
interfaces for IBM XIV Storage System, IBM Storwize V7000, and IBM System Storage
SAN Volume Controller. For more information about Tivoli Storage Productivity Center version
5.2, see this website:
http://pic.dhe.ibm.com/infocenter/tivihelp/v59r1/topic/com.ibm.tpc_V521.doc/tpc_kc
_homepage.html
This chapter describes how to use the product for monitoring. It includes examples of
misconfiguration and failures. Then, it explains how you can identify them in Tivoli Storage
Productivity Center by using the Topology Viewer and performance reports. This chapter also
shows how to collect and view performance data directly from the SAN Volume Controller.
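As a preview of the manual method, performance statistics collection can be started on the
cluster with the startstats command; the 5-minute interval that is shown here is only an
example:
svctask startstats -interval 5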
You should always use the latest version of Tivoli Storage Productivity Center that is
supported by your SAN Volume Controller code. Tivoli Storage Productivity Center is often
updated to support new SAN Volume Controller features. If you have an earlier version of
Tivoli Storage Productivity Center installed, you might still be able to reproduce the reports
that are described in this chapter, but some data might not be available.
This chapter includes the following sections:
Analyzing the SAN Volume Controller and Storwize Family Storage Systems by using
Tivoli Storage Productivity Center
Considerations for performance analysis
Top 10 reports for SAN Volume Controller and Storwize V7000
Reports for fabric and switches
Case studies
Monitoring in real time by using the SAN Volume Controller or Storwize V7000 GUI
Manually gathering SAN Volume Controller statistics
Note: In Tivoli Storage Productivity Center version 5.2, certain reporting and monitoring
capabilities are only available via the Tivoli Storage Productivity Center stand-alone GUI,
while others might be available in the stand-alone GUI or web-based GUI. In some
scenarios, the version 5.2 stand-alone GUI might provide more robust reporting and
monitoring capabilities than the web-based GUI. Where possible in this chapter, we include
examples that use the Tivoli Storage Productivity Center version 5.2 web-based GUI.
Otherwise, examples are provided that use the Tivoli Storage Productivity Center version
5.2 stand-alone GUI.
13.1.1 Analyzing with the Tivoli Storage Productivity Center 5.2 web-based
GUI
Tivoli Storage Productivity Center provides a large amount of detailed information about SAN
Volume Controller and Storwize family products. The examples in this section show how to
access this information for a SAN Volume Controller with the Tivoli Storage Productivity
Center web-based GUI. The examples assume that the SAN Volume Controller cluster was
added to Tivoli Storage Productivity Center. For more information about the installation,
configuration, and administration of Tivoli Storage Productivity Center (including how to add a
storage system), see this website:
http://pic.dhe.ibm.com/infocenter/tivihelp/v59r1/topic/com.ibm.tpc_V521.doc/tpc_kc
_homepage.html
Tip: All tabular data views within the Tivoli Storage Productivity Center version 5.2
web-based GUI can be exported to CSV, PDF, or HTML file versions by using the Actions
menu, as shown in Figure 13-2 on page 360.
In the Storage Systems view, double-click the wanted storage system, as shown in
Figure 13-3.
The Overview window of the SAN Volume Controller storage system is displayed, as shown in
Figure 13-4. From this window, you can browse to various reports about many aspects of the
storage system.
The Pools page opens for this storage system, which displays the storage pools for this
system in a tabular format with various details, as shown in Figure 13-6.
Right-click any of the column headings to modify the columns that are displayed, as shown in
Figure 13-7 on page 363.
Tip: You can modify the columns that are displayed in any tabular view in the Tivoli Storage
Productivity Center 5.2 web GUI by right-clicking the column headings, as shown in
Figure 13-7.
By using the Pools window, you can view the following details about the pools:
Name
The name of a pool that uniquely identifies it within a storage system.
Storage System
The name of the storage system that contains a pool. This name was defined when the
storage system was added to Tivoli Storage Productivity Center. If a name was not
defined, the ID of the storage system is displayed.
Status
The status of a pool. Statuses include Online, Offline, Warning, Error, and Unknown. Use
the status to determine the condition of a pool, and if any actions must be taken. For
example, if a pool has an Error status, take immediate action to correct the problem.
Utilization (%)
The average daily utilization of the pool. The utilization of a pool is based on an estimate of
the average daily utilization of storage resources, such as controllers, device adapters,
and hard disks. The value for the utilization of the pool is estimated based on the
performance data that was collected on the previous day.
Tier
The tier level of pools on storage virtualizers. If the pool is not assigned a tier level, the cell
is left blank. To set or change the tier level, select one or more pools, right-click, and then
select Set Tier.
Acknowledged
Shows whether the status of a pool is acknowledged. An acknowledged status indicates
that the status was reviewed and is resolved or can be ignored. An acknowledged status is
not used when the status of related, higher-level resources is determined.
For example, if the status of a pool is Error, the status of the storage system that contains
it is also Error. If the Error status for the pool is acknowledged, its status is not used to
determine the overall status of the associated storage system. In this case, if the other
internal resources of the storage system are Normal, the status of the storage system is
also Normal.
Capacity (GB)
The total amount of storage space in a pool. For XIV systems, this value represents the
physical (hard) capacity of the pool, not the virtual (soft) capacity. For other storage
systems, this value might also include overhead space if the pool is unformatted.
Allocated Space (GB)
The amount of space that is reserved for all the volumes in a pool, which includes
thin-provisioned and standard volumes.
The space that is allocated for thin-provisioned volumes is less than their virtual capacity,
which is shown in the Total Volume Capacity column. If a pool does not contain
thin-provisioned volumes, this value is the same as Total Volume Capacity.
This value is equal to Used Space for the following storage systems:
Storage systems that are not SAN Volume Controller and Storwize V7000.
SAN Volume Controller and Storwize V7000 storage systems that are not
thin-provisioned.
Available Pool Space (GB)
The amount of space in a pool that is not reserved for volumes.
Physical Allocation (%)
The percentage of physical space in a pool that was reserved for volumes. This value is
always less than or equal to 100% because you cannot reserve more physical space than
is available in a pool. The following formula is used to calculate this value:
Allocated Space / Pool Capacity * 100
For example, if the physical allocation percentage is 25% for a total storage pool size of
200 GB, the space that is reserved for volumes is 50 GB.
The first section of the bar in the Pools window uses the color blue and a percent (%) sign
to represent the physical allocation percentage. The second section of the bar uses the
color gray to represent the pool capacity. Hover the mouse pointer over the percentage bar
to view the following values:
Allocated Space: The amount of space that is reserved for all the volumes in a pool,
which includes both thin-provisioned and standard volumes.
Capacity: The total amount of space in a pool.
Unallocated Volume Space (GB)
The amount of the Total Volume Capacity in the pool that is not allocated.
Virtual Allocation (%): The percentage of physical space in a pool that was committed to
the total virtual capacity of the volumes in the pool. In thin-provisioned environments, this
percentage exceeds 100% if a pool is over committed (over provisioned). The following
formula is used to calculate this value:
Total Volume Capacity / Pool Capacity * 100
This value is available only for pools with thin-provisioned volumes.
For example, if the allocation percentage is 200% for a total storage pool size of 15 GB,
the virtual capacity that is committed to the volumes in the pool is 30 GB. This
configuration means that twice as much space is committed than is physically contained in
the pool. If the allocation percentage is 100% for the same pool, the virtual capacity that is
committed to the pool is 15 GB. This configuration means that all the physical capacity of
the pool is allocated to volumes.
An allocation percentage that is higher than 100% is considered aggressive because there
is insufficient physical capacity in the pool to satisfy the maximum allocation for all the
thin-provisioned volumes in the pool. In such cases, you can use the value for Shortfall (%)
to estimate how critical the shortage of space is for a pool.
Hover the mouse pointer over the percentage bar to view the following values:
Total Volume Capacity: The total storage space on all the volumes in a pool. For
thin-provisioned volumes, this value includes virtual space.
Capacity: The total amount of space in a pool.
Shortfall (%): The percentage of the remaining unallocated volume space in a pool that is
not available to be allocated. The higher the percentage, the more critical the shortfall of
pool space. This percentage is available only for a pool with thin-provisioned volumes.
The following formula is used to calculate this value:
Unallocatable Space / Committed but Unallocated Space * 100
You can use this percentage to determine when the amount of over-committed space in a
pool is at a critically high level. Specifically, if the physical space in a pool is less than the
committed virtual space, the pool does not have enough space to fulfill the commitment to
virtual space. This value represents the percentage of the committed virtual space that is
not available in a pool. As more space is used over time by volumes while the pool
capacity remains the same, this percentage increases.
For example, the remaining physical capacity of a pool is 70 GB, but 150 GB of virtual
space was committed to thin-provisioned volumes. If the volumes are using 50 GB, there
is still 100 GB committed to the volumes (150 GB - 50 GB) with a shortfall of 30 GB
(100 GB remaining commitment of volume space to the volumes - 70 GB remaining pool
space).
Because the volumes are overcommitted by 30 GB based on the available space in the
pool, the shortfall is 30% when the following calculation is used:
(100 GB unallocated volume space - 70 GB remaining pool space) / 100 GB
unallocated volume space * 100
The first section of the bar in the Pools window uses the color blue and a percent (%) sign
to represent the shortfall percentage. The second section of the bar uses the color gray to
represent the unallocated volume space. Hover the mouse pointer over the percentage
bar to view the following values:
Unallocated Volume Space: The amount of the Total Volume Capacity in the pool that
is not allocated.
Available Pool Space: The amount of space in a pool that is not reserved for volumes.
Used Space (GB)
The amount of allocated space that is used by the volumes in a pool, which includes
thin-provisioned and standard volumes.
For SAN Volume Controller and Storwize V7000, you can pre-allocate thin-provisioned
volume space when the volumes are created. In these cases, the Used Space might be
different from the Allocated Space for pools that contain thin-provisioned volumes. For
pools with compressed volumes on SAN Volume Controller and Storwize V7000, the Used
Space reflects the size of compressed data that is written to disk. As the data changes, the
Used Space can, at times, be less than the Allocated Space.
For pools with volumes that are not thin provisioned or compressed in SAN Volume
Controller, Storwize V7000, and other storage systems, the values for Used Space and
Allocated Space are equal.
This value is accurate as of the most recent time that Tivoli Storage Productivity Center
collected data about a pool. Because data collection is run on a set schedule and the used
space on volumes can change rapidly, the value in this column might not be 100%
accurate for the current state of volumes.
Unused Space (GB)
The amount of space that is allocated to the volumes in a pool and is not yet used. The
following formula is used to calculate this value:
Allocated Space - Used Space
This value is available only for SAN Volume Controller and Storwize V7000 pools.
Total Volume Capacity (GB)
The total storage space on all the volumes in a pool, which includes thin-provisioned and
standard volumes. For thin-provisioned volumes, this value includes virtual space.
Unallocatable Volume Space (GB)
The amount of space by which the Total Volume Capacity exceeds the physical capacity of
a pool. The following formula is used to calculate this value:
Total Volume Capacity - Pool Capacity
In thin-provisioned environments, it is possible to over commit (over provision) storage in a
pool by creating volumes with more virtual capacity than can be physically allocated in the
pool. This value represents the amount of volume space that cannot be allocated based
on the current capacity of the pool.
Volumes
The number of volumes that are allocated from a pool.
Managed Disks
The number of managed disks that are assigned to a pool. This value is available only for
SAN Volume Controller and Storwize V7000 pools.
RAID Level
The RAID level of the pool, such as RAID 5 and RAID 10. The RAID level affects the
performance and fault tolerance of the volumes that are allocated from the pool. In some
cases, there might be a mix of RAID levels in a pool. The RAID levels in a mixed pool are
shown in a comma-separated list.
Extent Size (MB)
The extent granularity that was specified when a pool was created. Smaller extent sizes
limit the maximum size of the volumes that can be created in a pool, but minimize the
amount of potentially wasted space per volume. This value is available only for SAN
Volume Controller and Storwize V7000 pools.
Compression savings (%)
The percentage of capacity that is saved in a pool by using the compression feature. This
information is available only for SAN Volume Controller and Storwize V7000 volumes.
Solid-State
Shows whether a pool contains solid-state disk drives. If a pool contains solid-state disks
and other disks, the value Mixed is shown.
Assigned Volume Space (GB)
The space on all the volumes in a pool that are mapped or assigned to host systems. For
a thin-provisioned pool, this value includes the virtual capacity of thin-provisioned
volumes, which might exceed the total space in the pool.
Unassigned Volume Space (GB)
The space on all the volumes in a pool that are not mapped or assigned to host systems.
For a thin-provisioned pool, this value includes the virtual capacity of thin-provisioned
volumes, which might exceed the total space in the pool.
Easy Tier
Shows how the Easy Tier function is enabled on a pool. The following values are possible:
Enabled/Inactive
Enabled/Active
Auto/Active
Auto/Inactive
Disabled
Unknown
Tip: The value in the Read I/O Capability column is not calculated until you select
values for the other columns that are related to back-end storage and save your
changes.
Read I/O Capability
The projected maximum number of I/O operations per second for a pool. This value is
calculated based on the values in the columns that are related to back-end storage.
This value is available only for SAN Volume Controller, Storwize V7000, and Storwize
V7000 Unified pools.
If the back-end storage system was not probed, the value in this column is blank. To help
calculate an approximate read I/O capability for the pool, you must manually define values
for the columns that are related to back-end storage.
Capacity Pool
The name of the capacity pool to which the storage pool is assigned.
Last Data Collection
The most recent date and time when data was collected about the storage system that
contains a storage pool.
Custom tag 1, 2, and 3
Any user-defined text that is associated with a pool. This text can be included as a report
column when you generate reports for the pool.
You can tag a pool to satisfy custom requirements of a service class. A service class can
specify up to three tags. To provide a service class, storage resources must have all the
same tags that are specified by the service class. If a pool is not tagged, any tags on the
containing storage system also apply to the pool for determining whether it satisfies the
custom requirements of a service class.
To edit the custom tags, right-click the pool and select View Properties. On the properties
notebook, click Edit.
To view detailed information about a specific storage pool, double-click the wanted storage
pool in the Pools tab in the right pane. A new window opens that includes various tabs for
displaying information about the pool, as shown in Figure 13-8 on page 370.
Figure 13-9 Accessing the Managed Disks view for a storage system
The Managed Disks page for this storage system opens, which displays the managed
disks for this system in a tabular format with various details, as shown in Figure 13-10.
2. Right-click any of the column headings to modify the columns that are displayed, as shown
in Figure 13-11.
By using the Managed Disks window, you can view the following details about the
managed disks:
Name
The name of a managed disk that uniquely identifies it within a storage system.
Status
The status of a managed disk. The following statuses are available:
Online
Offline
Error
Unknown
Use the status to determine the condition of a managed disk, and if any actions must
be taken. For example, if a managed disk has an Offline status, its associated pool also
is offline and you should take immediate action to correct the problem.
Pool
The name of the storage pool (if any) to which the managed disk belongs.
Storage System
The SAN Volume Controller or Storwize system that is managing this managed disk.
The Volumes window for this storage system opens, which displays the volumes for this
system in a tabular format with various details, as shown in Figure 13-14.
2. Right-click any of the column headings to modify the columns that are displayed, as shown
in Figure 13-15.
By using the Volumes window, you can view the following details about the volumes:
Name
The name that was assigned to a volume when it was created.
Storage System
The name of the storage system that contains a volume. This name was defined when
the storage system was added to Tivoli Storage Productivity Center. If a name was not
defined, the ID of the storage system is displayed.
Status
The status of a volume. The following statuses are available:
Normal
Warning
Error
Unknown
Use the status to determine the condition of the volume, and if any actions must be
taken. For example, if a volume has an Error status, take immediate action to correct
the problem.
Acknowledged
Shows whether the status of a volume is acknowledged. An acknowledged status
indicates that the status was reviewed and is resolved or can be ignored. An
acknowledged status is not used when the status of related, higher-level resources is
determined.
For example, if the status of a volume is Error, the status of the related storage system
also is Error. If the Error status of the volume is acknowledged, its status is not used to
determine the overall status of the storage system. In this case, if the other internal
resources of the storage system are Normal, the status of the storage system is also
Normal.
ID
The identifier for a volume, such as a serial number or internal ID.
Unique ID
The ID that is used to uniquely identify a volume across multiple storage systems.
Pool
The name of the storage pool in which a volume is a member.
Capacity (GB)
The total amount of storage space that is committed to a volume. For thin-provisioned
volumes, this value represents the virtual capacity of the volume. This value might also
include overhead space if the pool is unformatted.
I/O Group
The name of the I/O Group to which a volume is assigned.
Preferred Node
The name of the preferred node within the I/O Group to which a volume is assigned.
Hosts
The name of the host to which a volume is assigned. This name is the host name as
defined on the storage system, and is not the name of the server that was discovered
by a Storage Resource agent. If more than one host is assigned, the number of hosts
is displayed. For storage systems that are managed by a CIM agent, the host name in
this column might not match the configured host name on the storage system. Instead,
the host name might be replaced by the WWPN of the host port or text that is
generated by the CIM agent.
Virtual Disk Type
The type of virtual disk with which a volume was created, such as sequential, striped,
or image.
Formatted
Shows whether a volume is formatted.
Fast Write State
Shows the cache state for a volume, such as empty, not empty, corrupt, and repairing.
The corrupt state indicates that you must recover the volume by using one of the
recovervdisk commands for the storage system. The repairing state indicates that
repairs that were started by a recovervdisk command are in progress.
Copies
The number of secondary copies (virtual disk copies) for a volume. The primary copy
of a virtual disk is not counted as a mirror.
Volume Copy Relationship
Shows whether a volume is in a replication relationship that creates a snapshot or
point-in-time copy of the volume on a specified target volume. A volume can be a
source, target, or both a target for one copy pair and a source for a different copy pair.
In storage systems, this relationship might be referred to as a FlashCopy, snapshot, or
point-in-time copy relationship. A volume can have one of the following properties:
Source and Target: The volume is a target for one copy pair and a source for a
different copy pair.
Storage Virtualizer
The name of the storage virtualizer that is managing a volume. A storage virtualizer is
a storage system that virtualizes storage space from internal storage or from another
storage system. Examples of storage virtualizers include SAN Volume Controller and
Storwize V7000. A value is displayed in this column only if the volume is managed by a
storage virtualizer and Tivoli Storage Productivity Center collected data about that
virtualizer.
Virtualizer Disk
The managed disk for the virtualizer that corresponds to a volume.
Thin Provisioned
Shows whether a volume is a thin-provisioned volume and the type of thin-provisioning
that is used for the volume. A thin-provisioned volume is a volume with a virtual
capacity that is different from its real capacity. Not all the storage capacity of the
volume is allocated when the volume is created, but is allocated over time, as needed.
Allocated Space (GB)
The amount of space that is reserved for a volume. The space that is allocated for a
thin-provisioned volume is less than its virtual capacity. The value for Allocated Space
is equal to Used Space for SAN Volume Controller and Storwize V7000 storage
systems that are not thin-provisioned.
Unallocated Space (GB)
The amount of space that is not reserved for a thin-provisioned volume. This value is
determined by using the following formula:
Capacity - Allocated Space
Compressed
Shows whether a storage volume is compressed.
Compression Savings (%)
The percentage of capacity that is saved in a volume by using the compression feature.
Physical Allocation (%)
The percentage of physical space that is reserved for a volume. This value cannot be
greater than 100% because it is not possible to reserve more physical space than is
available.
Enabled/Inactive
Enabled/Active
Auto/Active
Auto/Inactive
Disabled
Unknown
d. If the I/O is part of a FlashCopy and the FlashCopy block was not copied to the target
VDisk, the action must be scheduled.
4. If the I/O is a read I/O:
a. The SAN Volume Controller must check the cache to see whether the Read I/O is
already there.
b. If the I/O is not in the cache, the SAN Volume Controller must read the data from the
physical LUNs (managed disks).
5. At some point, write I/Os are sent to the storage controller.
6. To reduce latency on subsequent read commands, the SAN Volume Controller might also
perform read-ahead I/Os to load the cache.
In some situations, such as the following examples, you might want multiple managed disk
groups:
Workload isolation
Short-stroking a production managed disk group
Managing different workloads in different groups
d. If the I/O is part of a FlashCopy and the FlashCopy block was not copied to the target
volume, this action must be scheduled.
4. If the I/O is a read I/O:
a. The Storwize V7000 must check the cache to see whether the Read I/O is already in
the cache.
b. If the I/O is not in the cache, the Storwize V7000 must read the data from the physical
MDisks.
5. At some point, write I/Os are destaged to Storwize V7000 MDisks or sent to the back-end
SAN-attached storage controllers.
6. The Storwize V7000 might also issue data-optimized, sequential-detect prefetch cache
I/Os to pre-stage the cache when its cache algorithms predict the next read I/O. This
approach benefits sequential I/O when compared with the more common least recently
used (LRU) method that is used for nonsequential I/O.
In other cases, such as performance analysis for a particular server, you follow another
sequence, starting with Managed Disk Group Performance. By using this approach, you can
quickly identify the MDisks and VDisks that belong to the server that you are analyzing.
To view system reports that are relevant to SAN Volume Controller and Storwize V7000, click
IBM Tivoli Storage Productivity Center → Reporting → System Reports → Disk.
I/O Group Performance and Managed Disk Group Performance are specific reports for SAN
Volume Controller and Storwize V7000. Module/Node Cache Performance is also available
for IBM XIV. These reports are shown in Figure 13-18.
Figure 13-18 System reports for SAN Volume Controller and Storwize V7000
Figure 13-19 shows a sample structure that you can use to review basic SAN Volume
Controller concepts and then proceed with performance analysis at the component levels.
Figure 13-19 SAN Volume Controller and Storwize V7000 sample structure
Note: By using Tivoli Storage Productivity Center version 5.2, you can generate some of
the reports that were identified in this section in the stand-alone GUI or the web-based
GUI. Where applicable, examples are provided by using both methods.
13.3.1 I/O Group Performance for SAN Volume Controller and Storwize V7000
In this section, we describe the I/O group performance monitoring for SAN Volume Controller
and Storwize V7000.
Important: The data that is displayed in a performance report is the last collected value at
the time the report is generated. It is not an average of the last hours or days, but it shows
the last data collected.
Click the magnifying glass icon that is next to the SAN Volume Controller io_grp0 entry to
drill down and view the statistics by nodes within the selected I/O group. The drill down from
io_grp0 tab is created, as shown in Figure 13-21. This tab contains the report for nodes within
the SAN Volume Controller.
To view a historical chart of one or more specific metrics for the resources, click the pie chart
icon. A list of metrics is displayed, as shown in Figure 13-22 on page 386. You can
select one or more metrics that use the same measurement unit. If you select metrics that
use different measurement units, you receive an error message.
You can change the reporting time range and click Generate Chart to regenerate the graph,
as shown in Figure 13-23. A continual high Node CPU Utilization rate indicates a busy I/O
group. In our environment, CPU utilization does not rise above 24%, which is a more than
acceptable value.
The I/Os are present only on Node 2. Therefore, in Figure 13-25 on page 388, you can see a
configuration problem where the workload is not well-balanced, at least during this time
frame.
In the I/O rate graph (as shown in Figure 13-25), you can see a configuration problem.
Figure 13-26 Response time selection for the SAN Volume Controller node
Figure 13-27 shows the report. The values that are shown are acceptable back-end response
times for read and write operations. These values are consistent for both I/O groups.
Figure 13-27 Response Time report for the SAN Volume Controller node
Data Rate
To view the Read Data rate, complete the following steps:
1. In the drill down from io_grp0 tab, which returns you to the performance statistics for the
nodes within the SAN Volume Controller, click the pie chart icon.
2. Select the Read Data Rate metric. Press Shift and select Write Data Rate and Total Data
Rate. Then, click OK to generate the chart, as shown in Figure 13-28.
To interpret your performance results, always return to your baseline. For more information
about creating a baseline, see SAN Storage Performance Management Using Tivoli Storage
Productivity Center, SG24-7364.
The throughput benchmark (which is 7,084.44 SPC-2 MBPS) is the industry-leading
throughput benchmark. For more information about this benchmark, see SPC Benchmark 2
Executive Summary IBM System Storage SAN Volume Controller SPC-2 V1.2.1, which is
available at this website:
http://www.storageperformance.org/results/b00024_IBM-SVC4.2_SPC2_executive-summary
.pdf
Figure 13-29 Accessing the I/O Groups view for a storage system
The I/O Groups window opens for this storage system, which displays the I/O groups for this
system in a tabular format, with various details, as shown in Figure 13-30 on page 392. In our
lab environment, data was collected for a SAN Volume Controller with a single I/O group.
Tip: For a SAN Volume Controller with multiple I/O groups, a separate row is generated for
every I/O group within that SAN Volume Controller.
To quickly view the performance of all the I/O Groups in this storage system, click the
Performance tab. Alternatively, to view the performance of a single I/O Group, right-click
anywhere in the corresponding row in the I/O Groups tab, as shown in Figure 13-31.
A window opens that displays the performance of that I/O Group, as shown in Figure 13-32.
The default view in the I/O Group performance page shows a graph at the top, which depicts
certain key metric values over time. It also includes a table at the bottom that contains
averages of the data that is used to display the graph.
The performance graph can be easily customized to suit your specific needs in the following
ways:
The default time window that is displayed is for the last 12 hours. To display metrics for a
different time frame, click one of the predefined time frame links at the top of the window,
or enter specific start and end dates and times by using the drop-down fields at the bottom
of the chart.
To display the performance metrics for a related entity, such as the volumes of the I/O
Group or the storage system to which the I/O group belongs, right-click anywhere in the
row for the I/O Group and click a selection, as shown in Figure 13-33 on page 394. A
browser window opens that displays the corresponding performance.
To display different metrics in the chart, click + next to the Metrics heading in the legend
area to the left of the chart, as shown in Figure 13-34.
A window opens in which you can choose the metrics to display, as shown in Figure 13-35.
This report can be used to determine whether the I/O workload is evenly distributed across
the nodes in the I/O group. An absence of I/O on a single node, or a disproportionately large
amount of I/O on a single node, might indicate a configuration issue. In our example, I/Os are
present on each node and appear to be evenly distributed.
A continual high CPU Utilization rate indicates a busy I/O group. In our environment, CPU
utilization does not rise above 2%, which indicates that our I/O Group is mostly idle.
Tip: To view the CPU Utilization percentage of the individual nodes in the I/O Group, open
a new performance window for the I/O Group nodes, as shown in Figure 13-33 on
page 394. In that window, follow the same steps as described in this section.
The graph updates to display the average backend read and write response times for the
disks that are servicing the nodes in the I/O group, as shown in Figure 13-39.
Data Rate
To view the data rates of the nodes in the I/O group, click the + next to the Metrics heading in
the legend area to the left of the chart (as shown in Figure 13-37 on page 396), clear any
selected metrics, select the Data Rate (MiB/s) Read and Write metrics (as shown in
Figure 13-40), and then click OK.
The graph updates to display the average read and write data rates for the nodes in the I/O
group, as shown in Figure 13-41.
These factors all ultimately lead to the final performance that is realized.
In reviewing the SPC benchmark (see Figure 13-42), the results for the I/O and Data Rate are
different depending on the transfer block size used.
Max I/Os and MBps per I/O group, 70/30 read/write miss:
Node model   4 K transfer size      64 K transfer size
2145-8G4     122K I/Os, 500 MBps    29K I/Os, 1.8 GBps
2145-8F4     72K I/Os, 300 MBps     23K I/Os, 1.4 GBps
2145-4F2     38K I/Os, 156 MBps     11K I/Os, 700 MBps
2145-8F2     72K I/Os, 300 MBps     15K I/Os, 1 GBps
Figure 13-42 Benchmark maximum I/Os and MBps per I/O group for SPC SAN Volume Controller
Reviewing the two-node I/O group that is used, you might see 122,000 I/Os if all of the
transfer blocks were 4 K. However, they rarely are 4 K in typical environments. With transfer
sizes of 64 K (or anything over about 32 K), you are more likely to see a result closer to the
29,000 I/Os that were noticed by the SPC benchmark.
13.3.2 Node Cache Performance for SAN Volume Controller and Storwize
V7000
Node cache performance is described in this section.
Figure 13-43 Module/Node Cache Performance report for SAN Volume Controller and Storwize V7000
3. Select Read Cache Hits percentage (overall), and then click OK to generate the chart,
as shown in Figure 13-44.
Figure 13-44 Storwize V7000 Cache Hits percentage that shows no traffic on node1
Important: The flat line for node 1 does not mean that read requests for that node
cannot be handled by the cache. It means that there is no traffic on that node, as shown in
Figure 13-45 on page 402 and Figure 13-46 on page 402, where Read Cache Hit
Percentage and Read I/O Rates are compared in the same time interval.
This configuration is not optimal because the two nodes are not balanced. In the lab
environment for this book, the volumes that were defined on Storwize V7000 were all defined
with node 2 as the preferred node.
After we moved the preferred node for the tpcblade3-7-ko volume from node 2 to node 1, we
obtained the graph for Read Cache Hit percentage that is shown in Figure 13-47.
Figure 13-47 Cache Hit Percentage for Storwize V7000 after reassignment
We also obtained the graph for Read I/O Rates that is shown in Figure 13-48.
Figure 13-48 Read I/O rate for Storwize V7000 after reassignment
The Global Mirror Overlapping Write Percentage metric is applicable only in a Global Mirror
Session. This metric is the average percentage of write operations that are issued by the
Global Mirror primary site and that were serialized overlapping writes for a component over a
specified time interval. For SAN Volume Controller V4.3.1 and later, some overlapping writes
are processed in parallel (not serialized) and are excluded. For earlier SAN Volume Controller
versions, all overlapping writes were serialized.
Select the metrics that are named as percentages because multiple metrics can be displayed
in one chart only if they use the same unit type. Complete the following steps:
1. In the Selection window (as shown in Figure 13-49), move the percentage metrics that you
want to include from the Available Column to the Included Column. Then, click Selection to
select only the Storwize V7000 entries.
2. In the Select Resources window, select the node or nodes, and then click OK.
Figure 13-49 shows an example where several percentage metrics are chosen for
Storwize V7000.
3. In the Select Charting Options window, select all the metrics, and then click OK to
generate the chart.
As shown in Figure 13-50, we noticed in our test a drop in the Cache Hits percentage. Even a
less dramatic drop can be a reason for further investigation of problems that arise.
Figure 13-50 Resource performance metrics for multiple Storwize V7000 nodes
Changes in these performance metrics and an increase in back-end response time (see
Figure 13-51) show that the storage controller is heavily burdened with I/O and that the
Storwize V7000 cache can become full of outstanding write I/Os.
Figure 13-51 Increased overall back-end response time for Storwize V7000
Host I/O activity is affected by the backlog of data in the Storwize V7000 cache and by any
other Storwize V7000 workload that goes to the same MDisks.
I/O groups: If cache utilization is a problem, you can add cache to the cluster by adding an
I/O group and moving volumes to the new I/O group in SAN Volume Controller and
Storwize V7000 V6.2. However, adding an I/O group and moving a volume from one I/O
group to another are disruptive actions. Therefore, you must properly plan how to manage
this disruption.
For more information about rules of thumb and how to interpret these values, see SAN
Storage Performance Management Using Tivoli Storage Productivity Center, SG24-7364.
Efficient use of cache can help enhance virtual disk I/O response time. Node cache
performance can be easily viewed in Tivoli Storage Productivity Center 5.2, with which you
can monitor cache related metrics, such as Read and Write Cache Hits percentage and Read
Ahead and Dirty Writes percentage of cache hits. These metrics can provide an indication of
whether the cache can service and buffer the current workload.
This loads the Nodes window for this storage system, which displays the nodes for this
system in a tabular format with various details, as shown in Figure 13-53 on page 409. In our
lab environment, data was collected for a SAN Volume Controller with two nodes.
To quickly view the performance of all the Nodes in this storage system, click the
Performance tab. Alternatively, to view the performance of a specific node or nodes (that is,
the two nodes in an I/O group), highlight the row (or rows) of the wanted nodes by pressing
Ctrl and then left-click, then right-click in any of the highlighted rows in the Nodes tab and click
View Performance, as shown in Figure 13-54.
Tip: When you analyze node cache performance, it is often useful to view the cache
metrics of both nodes in an I/O group in the same chart. This way, you can determine
whether your workload is evenly balanced across the nodes in the I/O group.
A window opens that displays the performance of the selected nodes, as shown in
Figure 13-55 on page 410.
The default view in the nodes performance window is similar to that of the I/O group
performance window. For more information about customizing the performance view, see
13.3.1, I/O Group Performance for SAN Volume Controller and Storwize V7000 on
page 384.
Figure 13-56 Modifying the nodes performance page to display cache related metrics
A window opens in which you can choose the metrics to display, as shown in Figure 13-57.
In this example, we chose to display Read Cache Hit Percentage only for both nodes in the
I/O group, as shown in Figure 13-58.
Important: A Read Cache Hit Percentage at or close to zero does not necessarily mean
that the read requests for that node cannot be handled by the cache. It might mean that no
traffic is arriving at that node. You can verify this fact by also displaying the Read I/O Rate
metric in the chart. If doing so verifies that there is little or no I/O on that node, consider
whether to balance the workload more evenly across the nodes by modifying the preferred
nodes settings for the volumes in the I/O group.
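On recent code levels, you can change the preferred node for a volume from the CLI with the
movevdisk command. The following line is a sketch only; it assumes a volume named vdisk7
that remains in io_grp0 with node1 as the new preferred node, so verify the syntax for your
code level first:

svctask movevdisk -iogrp io_grp0 -node node1 vdisk7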
13.3.3 Viewing the Managed Disk Group Performance report for SAN Volume
Controller by using the stand-alone GUI
The Managed Disk Group Performance report provides disk performance information at the
managed disk group level. It summarizes the read and write transfer size and the back-end
read, write, and total I/O rate. From this report, you can easily browse to see the statistics of
virtual disks that are supported by a managed disk group or drill down to view the data for the
individual MDisks that make up the managed disk group.
To access this report, click IBM Tivoli Storage Productivity Center → Reporting → System
Reports → Disk, and select Managed Disk Group Performance. A table is displayed (as
shown in Figure 13-59) that lists all the known managed disk groups and their last collected
statistics, which are based on the latest performance data collection.
One of the managed disk groups is named CET_DS8K1901mdg. When you click the magnifying
glass icon ( ) for the CET_DS8K1901mdg entry, a new page opens (as shown in
Figure 13-60) that shows the managed disks in the managed disk group.
Figure 13-60 Drill down from Managed Disk Group Performance report
When you click the magnifying glass icon ( ) for the mdisk61 entry, a new page (as shown in
Figure 13-61) opens that shows the volumes in the managed disk.
Figure 13-62 Managed disk group I/O rate selection for SAN Volume Controller
A chart is generated that is similar to the one that is shown in Figure 13-63.
Figure 13-63 Managed Disk Group I/O rate report for SAN Volume Controller
When you review this general chart, you must understand that it reflects all I/O to the
back-end storage from the MDisks that are included in this managed disk group. The key to
this report is a general understanding of back-end I/O rate usage, not whether the workload
is perfectly balanced. In this report, for the time frame that is specified, the maximum at one
point is nearly 8,200 IOPS.
Although the SAN Volume Controller and Storwize V7000, by default, stripe write and read
I/Os across all MDisks, the striping is not a RAID 0 type of stripe. Rather, because the
VDisk is a concatenated volume, the striping that is introduced by the SAN Volume Controller
and Storwize V7000 affects only how the extents are identified for use when you create a
VDisk. Until host I/O write actions fill up the first extent, the remaining extents in the block
VDisk that is provided by SAN Volume Controller are not used. Therefore, when you are
looking at the Managed Disk Group Backend I/O report, you might not see a balance of write
activity, even for a single managed disk group.
Figure 13-64 Backend Read Response Time for the managed disk
2. Select all of the managed disk entries, and then click the pie chart icon ( ).
3. In the Select Charting Option window, select the Backend Read Response time metric.
Then, click OK.
Figure 13-67 shows the report that is generated, which in this case indicates that the
workload is not balanced on MDisks.
The Pools view for that storage system opens. Click the Performance tab to see a
performance chart for these pools, as shown in Figure 13-69 on page 420.
The default performance view displays a chart of Total I/O Rate and Overall Response Time
for all the pools. Performance charts are similar for all storage system elements (that is, I/O
groups, nodes, volumes, and pools). For more information about how to modify the
displayed metrics, the pools that are included in the chart, or the time frame for the chart, see
Accessing I/O Group Performance by using the web-based GUI on page 391.
From this report, you can easily browse to see the statistics of the volumes that are supported
by a pool or drill down to view the statistics for the individual MDisks that make up the pool by
right-clicking the row for the wanted pool, as shown in Figure 13-70.
The chart now shows the Total Data Rate instead of Overall I/O Rate, as shown in
Figure 13-72.
To limit these system reports to SAN Volume Controller subsystems, complete the following
steps to specify a filter, as shown in Figure 13-73:
1. In the Selection tab, click Filter.
2. In the Edit Filter window, click Add to specify another condition to be met.
You must complete the filter process for all five reports.
Figure 13-73 Specifying a filter for SAN Volume Controller Top Volume Performance Reports
Figure 13-74 Top Volumes Cache Hit performance report for SAN Volume Controller
Figure 13-76 shows the report that is generated. If this report is generated during the run time
period, the volumes with the highest total data rate are listed on the report.
Figure 13-76 Top Volume Data Rate report for SAN Volume Controller
Figure 13-77 Top Volumes Disk Performance for SAN Volume Controller
Figure 13-78 Top Volumes I/O Rate Performance for SAN Volume Controller
Figure 13-79 Top Volume Response Performance report for SAN Volume Controller
To simplify, you can assume that (front-end) response times probably need to be 5 - 15 ms.
The rank (back-end) response times can usually operate at 20 - 25 ms, unless the hit ratio is
poor. Back-end write response times can be even higher, generally up to 80 ms.
Important: All of these considerations are not valid for SSDs, where seek time and latency
are not applicable. You can expect these disks to have much better performance and,
therefore, a shorter response time (less than 4 ms).
For more information about creating a tailored report for your environment, see 13.5.3, Top
volumes response time and I/O rate performance reports on page 455.
By default, the lower left quadrant loads a chart displaying the Most Active Volumes for this
storage system. You can choose to have this chart displayed in any of the four quadrants by
clicking the Title in that quadrant and selecting Most Active Volumes, as shown in
Figure 13-79 on page 427.
Figure 13-81 Changing the metrics that are displayed in the Storage System Overview page
In addition, you can change the metric by which the most active volumes are computed by
toggling left or right by using the arrows, as shown in Figure 13-82.
Figure 13-82 Changing the metric used to compute Most Active Volumes
You can quickly view the top 10 most active volumes in the storage system, as measured by
a selection of metrics.
Creating Most Active Volumes performance report by using the web-based GUI
The predefined Most Active Volumes report can be useful for identifying problem performance
areas. You can quickly generate a report of the most active volumes within a system based
on the criteria that you choose, and the report data can be easily exported.
To access the report, click View predefined reports in the Reporting section of the left-side
navigation in the Tivoli Storage Productivity Center 5.2 web-based GUI, as shown in
Figure 13-83 on page 430.
A window opens in which you can log in to the Cognos reporting component of Tivoli
Storage Productivity Center 5.2, as shown in Figure 13-84.
Note: Access to this Reporting functionality requires the installation of the necessary Tivoli
Common Reporting/Cognos components.
After you are logged in, you see a list of available reports, as shown in Figure 13-85.
To access the Most Active Volumes report, click Storage Systems. In the next window, click
Volumes, as shown in Figure 13-86.
The window then displays the available predefined reports for Volumes, as shown in
Figure 13-87.
Click the link for the Most Active Volumes report. You are prompted to select the report
criteria, as shown in Figure 13-88 on page 432.
Figure 13-88 Selecting the criteria for the Most Active Volumes report
By using the report criteria window, you can select the following criteria:
Wanted storage system
Metric to use for sorting the report results.
The reporting period. You can select from various predefined time periods, or you can
define a custom time period.
Select the criteria appropriate to your scenario and click Finish to generate the report. The
report results are then loaded in the next window, as shown in Figure 13-89 on page 433.
After the report is generated, you can select an alternative metric from the drop-down menu
and the report automatically reloads. Additionally, you can use the menu options at the top of
the page to save, email, or export the report.
13.3.5 Port Performance reports for SAN Volume Controller and Storwize
V7000
The SAN Volume Controller and Storwize V7000 Port Performance reports help you
understand the SAN Volume Controller and Storwize V7000 effect on the fabric. They also
provide an indication of the following traffic:
SAN Volume Controller (or Storwize V7000) and hosts that receive storage
SAN Volume Controller (or Storwize V7000) and back-end storage
Nodes in the SAN Volume Controller (or Storwize V7000) cluster
These reports can help you understand whether the fabric might be a performance bottleneck
and whether upgrading the fabric can lead to performance improvement.
The Port Performance report summarizes the various send, receive, and total port I/O rates
and data rates. To access this report, click IBM Tivoli Storage Productivity Center → My
Reports → System Reports → Disk, and select Port Performance. To display only SAN
Volume Controller and Storwize V7000 ports, click Filter. Then, produce a report for all the
volumes that belong to SAN Volume Controller or Storwize V7000 subsystems, as shown in
Figure 13-90 on page 434.
A separate row is generated for the ports of each subsystem. The information that is
displayed in each row reflects the data that was last collected for the port.
The Time column (not shown in Figure 13-90) shows the last collection time, which might be
different for the various subsystem ports. Not all of the metrics in the Port Performance report
are applicable for all ports. For example, the Port Send Utilization percentage, Port Receive
Utilization Percentage, and Overall Port Utilization percentage data are not available on SAN
Volume Controller or Storwize V7000 ports.
The value N/A is displayed when data is not available, as shown in Figure 13-91. By clicking
Total Port I/O Rate, you see a prioritized list by I/O rate.
You can now verify whether the data rates to the back-end ports (as shown in the report) are
beyond the normal rates that are expected for the speed of your Fibre Channel links, as
shown in Figure 13-92 on page 435. This report often is generated to support problem
determination, capacity management, or SLA reviews. Based on the 8 Gb per second fabric,
these rates are well below the throughput capability of this fabric. Therefore, the fabric is not
a bottleneck here.
Figure 13-92 Port I/O Rate report for SAN Volume Controller and Storwize V7000
Next, select the Port Send Data Rate and Port Receive Data Rate metrics to generate
another historical chart, as shown in Figure 13-93. This chart confirms the unbalanced
workload for one port.
Figure 13-93 SAN Volume Controller and Storwize V7000 Port Data Rate report
To investigate further by using the Port Performance report, return to the I/O group
performance report and complete the following steps:
1. Click IBM Tivoli Storage Productivity Center → My Reports → System Reports →
Disk. Select I/O Group Performance.
2. Click the magnifying glass icon ( ) to drill down to the node level. As shown in
Figure 13-94, we chose node 1 of the SAN Volume Controller subsystem. Click the pie
chart icon ( ).
3. In the Select Charting Option window (see Figure 13-95), select Port to Local Node
Send Queue Time, Port to Local Node Receive Queue Time, Port to Local Node
Receive Response Time, and Port to Local Node Send Response Time. Then, click
OK.
Figure 13-95 SAN Volume Controller Node port selection queue time
Review the port rates between SAN Volume Controller nodes, hosts, and disk storage
controllers. Figure 13-96 on page 437 shows low queue and response times, which indicates
that the nodes do not have a problem communicating with each other.
If this report shows high queue and response times, write activity is affected because each
node communicates with each other node over the fabric.
Unusually high numbers in this report indicate the following issues:
A SAN Volume Controller (or Storwize V7000) node or port problem (unlikely)
Fabric switch congestion (more likely)
Faulty fabric ports or cables (most likely)
After you have the I/O rate review chart, generate a data rate chart for the same time frame to
support a review of your high availability ports for this application.
Then, generate another historical chart by choosing the Total Port Data Rate metric (see
Figure 13-98) that confirms the unbalanced workload for one port that is shown in the report
in Figure 13-97.
3. In the Select Charting Option window (see Figure 13-100), select Total Port Data Rate,
and then click OK.
Figure 13-100 Port Data Rate selection for the Fabric report
You now see a chart similar to the example that is shown in Figure 13-101 on page 441. In
this case, because the FC port speed is 8 Gbps, the port data rates do not reach a warning
level.
The following approximate data rates apply for each Fibre Channel port speed (the second
value is half of the maximum):

Port speed    Maximum data rate    Half of maximum
1 Gbps        100 MBps             50 MBps
2 Gbps        200 MBps             100 MBps
4 Gbps        400 MBps             200 MBps
8 Gbps        800 MBps             400 MBps
10 Gbps       1000 MBps            500 MBps
16 Gbps       1600 MBps            800 MBps
As appropriate, examples are provided that use the Tivoli Storage Productivity Center 5.2
stand-alone GUI, the Tivoli Storage Productivity Center 5.2 web-based GUI, and Tivoli
Common Reporting/Cognos.
4. Click Generate Report. You then see the output on the Computers tab, as shown in
Figure 13-103 on page 443.
You can scroll to the right at the bottom of the table to view more information, such as the
volume names, volume capacity, and allocated and deallocated volume spaces.
5. (Optional) To export data from the report, select File → Export Data. You can export the
data to a comma-delimited file, a comma-delimited file with headers, a formatted report
file, or an HTML file.
From this list of volumes, you can start to analyze performance data and workload I/O
rate. Tivoli Storage Productivity Center provides a report that shows volume to back-end
volume assignments.
6. To display the report, complete the following steps:
a. Click Disk Manager → Reporting → Storage Subsystem → Volume to Backend
Volume Assignment, and select By Volume.
b. Click Filter to limit the list of the volumes to those volumes that belong to the
tpcblade3-7 server, as shown in Figure 13-104 on page 444.
You now see a list similar to the one that is shown in Figure 13-105.
d. Scroll to the right to see the SAN Volume Controller managed disks and back-end
volumes on the DS8000, as shown in Figure 13-106.
Back-end storage subsystem: The highlighted lines with the value N/A are related
to a back-end storage subsystem that is not defined in our Tivoli Storage
Productivity Center environment. To obtain the information about the back-end
storage subsystem, we must add it to the Tivoli Storage Productivity Center
environment with the corresponding probe job. See the first line in the report in
Figure 13-106, where the back-end storage subsystem is part of our Tivoli Storage
Productivity Center environment; therefore, the volume is correctly shown in all
details.
With this information and the list of volumes that are mapped to this computer, you can start
to run a Performance report to understand where the problem for this server might be.
4. In the Volumes tab, click the volume that you need to investigate, and then click the pie
chart icon ( ).
5. In the Select Charting Option window (see Figure 13-108 on page 447), select Total I/O
Rate (overall). Then, click OK to produce the graph.
The history chart that is shown in Figure 13-109 shows that the I/O rate was around 900
operations per second and suddenly declined to around 400 operations per second. Then,
the rate went back to 900 operations per second. In this case study, we limited the days to
the time frame that was reported by the customer when the problem was noticed.
Figure 13-109 Total I/O rate chart for the Storwize V7000 volume
6. In the Volumes tab, select the volume that you need to investigate, and then click the pie
chart icon ( ).
7. In the Select Charting Option window (see Figure 13-110), scroll down and select Overall
Response Time. Then, click OK to produce the chart.
Figure 13-110 Volume selection for the Storwize V7000 performance report
The chart that is shown in Figure 13-111 indicates an increase in response time from a
few milliseconds to around 30 ms. This information and the high I/O rate indicate the
occurrence of a significant problem. Therefore, further investigation is appropriate.
8. Complete the following steps to review the performance of MDisks in the managed disk
group:
a. To identify to which MDisk the tpcblade3-7-ko2 VDisk belongs, in the Volumes tab (see
Figure 13-112), click the drill-up icon ( ).
Figure 13-113 shows the MDisks where the tpcblade3-7-ko2 extents are installed.
b. Select all the MDisks and click the pie chart icon ( ).
c. In the Select Charting Option window (see Figure 13-114 on page 450), select Overall
Backend Response Time, and then click OK.
Keep the charts that are generated relevant to this scenario by using the charting time
range. You can see from the chart that is shown in Figure 13-115 that something
happened on 26 May around 6:00 p.m. that likely caused the back-end response time for
all MDisks to dramatically increase.
If you review the chart for the Total Backend I/O Rate for these two MDisks during the
same time period, you see that their I/O rates all remained in a similar overlapping pattern,
even after the problem was introduced.
This result is as expected because tpcblade3-7-ko2 is evenly striped across the two MDisks.
The I/O rate for these MDisks is only as high as the slowest MDisk, as shown in
Figure 13-116.
We identified that the response time for all MDisks dramatically increased.
9. Generate a report to show the volumes that have an overall I/O rate equal to or greater
than 1000 ops per second. We also generate a chart to show which I/O rates of the
volumes changed around 6:00 p.m. on 26 May. Complete the following steps:
a. Click Disk Manager → Reporting → Storage Subsystem → Performance, and select
By Volume.
b. In the Selection tab, complete the following steps:
i. Click Display historic performance data using absolute time.
ii. Limit the time period to 1 hour before and 1 hour after the event that was reported,
as shown in Figure 13-115 on page 450.
iii. Click Filter to limit to Storwize V7000 Subsystem.
c. In the Edit Filter window (see Figure 13-117 on page 452), complete the following
steps:
i. Click Add to add a second filter.
ii. Select the Total I/O Rate (overall), and set it to greater than 1000 (meaning a high
I/O rate).
iii. Click OK.
The report that is shown in Figure 13-118 shows all of the performance records of the
volumes that were filtered previously. In the Volume column, only three volumes meet
these criteria: tpcblade3-7-ko2, tpcblade3-7-ko3, and tpcblade3-7-ko4. Multiple rows are
available for each volume because each performance data record has a row. Look for
which volumes had an I/O rate change around 6:00 p.m. on 26 May. You can click the
Time column to sort the data.
10.Complete the following steps to compare the Total I/O rate (overall) metric for these
volumes and the volume that is the subject of the case study, tpcblade3-7-ko2:
a. Remove the filtering condition on the Total I/O Rate that is defined in Figure 13-117,
and then generate the report again.
b. Select one row for each of these volumes.
c. In the Select Charting Option window (see Figure 13-119), select Total I/O Rate
(overall), and then click OK to generate the chart.
d. For Limit days From, insert the time frame that you are investigating.
Figure 13-120 on page 454 shows the root cause. The tpcblade3-7-ko2 volume (the blue
line in Figure 13-120 on page 454) started around 5:00 p.m. with a total I/O rate of
around 1000 IOPS. When the new workloads (which were generated by the
tpcblade3-7-ko3 and tpcblade3-7-ko4 volumes) started, the total I/O rate for the
tpcblade3-7-ko2 volume fell from around 1000 IOPS to less than 500 IOPS. Then, it grew
again to about 1000 IOPS when one of the two loads decreased. The hardware has
physical limitations on the number of IOPS that it can handle. This limitation was reached
at 6:00 p.m.
To confirm this behavior, you can generate a chart by selecting Response time. The chart
that is shown in Figure 13-121 confirms that, when the new workload started, the
response time for the tpcblade3-7-ko2 volume became worse.
The easy solution is to split this workload by moving one VDisk to another managed
disk group.
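From the CLI, such a migration can be started with the migratevdisk command. The
following line is a sketch only: Target_MDG is a hypothetical target managed disk group that
must have sufficient free extents, and the migration runs in the background:

svctask migratevdisk -mdiskgrp Target_MDG -vdisk tpcblade3-7-ko2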
13.5.3 Top volumes response time and I/O rate performance reports
Reports about the most active volumes in SAN Volume Controller or Storwize family
systems can be accessed by using all three Tivoli Storage Productivity Center GUI
interfaces, as described in 13.3.4, Top Volume Performance reports on page 422. In this
case study, we use the stand-alone GUI.
In this section, we show how to tailor the Top Volumes Response Performance report (which
is available in the Tivoli Storage Productivity Center 5.2 stand-alone GUI) to identify volumes
with long response times and high I/O rates. You can tailor the report for your environment.
You can also update your filters to exclude volumes or subsystems that you no longer want in
this report.
Complete the following steps to tailor the Top Volumes Response Performance report:
1. Click Disk Manager → Reporting → Storage Subsystem → Performance, and select By
Volume, as shown in the left pane in Figure 13-122.
2. In the Selection tab (as shown in the right pane in Figure 13-122), keep only the wanted
metrics in the Included Columns box and move all other metrics (by using the arrow
buttons) to the Available Columns box.
You can save this report for future reference by clicking IBM Tivoli Storage Productivity
Center → My Reports → your user's Reports.
Click Filter to specify the filters to limit the report.
3. In the Edit Filter window (see Figure 13-123 on page 456), click Add to add the conditions.
In this example, we limit the report to Subsystems SVC* and DS8*. We also limit the report
to the volumes that have an I/O rate greater than 100 Ops/sec and a Response Time
greater than 5 ms.
4. Complete the following steps in the Selection tab, as shown in Figure 13-124:
a. Specify the date and time of the period for which you want to make the inquiry.
Important: Specifying large intervals might require intensive processing and a long
time to complete.
b. Click Generate Report.
Figure 13-124 Limiting the days for the top volumes tailored report
Figure 13-125 on page 457 shows the resulting Volume list. By sorting by the Overall
Response Time or I/O Rate columns (by clicking the column header), you can identify which
entries have interesting total I/O rates and overall response times.
Guidelines for total I/O rate and overall response time in a production
environment
In a production environment, you initially might want to specify a total I/O rate overall of
1 - 100 Ops/sec and an overall response time (ms) that is greater than or equal to 15 ms.
Then, adjust these values to suit your needs as you gain more experience.
2. In the right pane as shown in Figure 13-126, in the Triggering Condition box under
Condition, select the alert that you want to set.
Tip: The best place to verify which thresholds are currently enabled (and at what
values) is at the beginning of a Performance Collection job.
To schedule the Performance Collection job and verify the thresholds, complete the following
steps:
1. Click Tivoli Storage Productivity Center → Job Management, as shown in the left pane
of Figure 13-127 on page 460.
Figure 13-127 Job management panel and SAN Volume Controller performance job log selection
2. In the Schedules table (as shown in the upper part of the right pane), select the latest
performance collection job that is running or that ran for your subsystem.
3. In the Job for Selected Schedule (as shown in the lower part of the right pane), expand the
corresponding job, and select the instance.
4. To access to the corresponding log file, click View Log File(s). Then, you can see the
threshold that is defined, as shown in Figure 13-128 on page 461.
Tip: To return to the beginning of the log file, click Top.
To list all the alerts that occurred, complete the following steps:
1. Click IBM Tivoli Storage Productivity Center → Alerting → Alert Log → Storage
Subsystem.
2. Look for your SAN Volume Controller subsystem, as shown in Figure 13-129.
3. Click the magnifying glass icon ( ) that is next to the alert for which you want to see
detailed information, as shown in Figure 13-130.
For more information about defining alerts, see SAN Storage Performance Management
Using Tivoli Storage Productivity Center, SG24-7364.
The Alerts window opens, which displays all of the alerts for all the monitored resources, as
shown in Figure 13-132. You can filter or sort the alerts that are displayed (as required) by
using the column headings or the Filter field at the upper right of the table.
Right-click any alert to view, remove, or acknowledge the alert, as shown in Figure 13-133.
Select View Alert to display the details of the alert in a separate window, as shown in
Figure 13-134.
To monitor threshold violations in the Tivoli Storage Productivity Center 5.2 web GUI, click
Home → Performance Monitors in the left side navigation, as shown in Figure 13-135.
After the Performance Monitors window opens, click the Threshold Violations tab to display
all threshold violations for the monitored resources, as shown in Figure 13-136.
To view the details of a threshold violation, right-click the corresponding row and choose View
Threshold Violation, as shown in Figure 13-137 on page 465.
A window opens that contains the details of the threshold violation, as shown in
Figure 13-138.
c. In the Edit Filter window (see Figure 13-139), specify the conditions under Column for
this case study.
2. After you generate this report, identify on the next page by using the Topology Viewer
which device is being affected, and identify a possible solution. Figure 13-140 shows the
result in our lab.
Figure 13-140 Ports exceeding filters set for switch performance report
4. In the Select Charting Option window, hold down the Ctrl key and select Port Send Data
Rate, Port Receive Data Rate, and Total Port Data Rate. Click OK to generate the chart.
As shown in Figure 13-141 on page 467, the chart shows a consistent throughput that is
higher than 300 MBps in the selected time period. You can change the dates by extending
the Limit days settings.
Tip: This chart shows how persistent high utilization is for this port. This consideration
is important for establishing the significance and effect of this bottleneck.
Important: To get all the values in the selected interval, remove the defined filters in the
Edit Filter window, as shown in Figure 13-139.
5. Complete the following steps to identify which device is connected to port 7 on this switch:
a. Click IBM Tivoli Storage Productivity Center → Topology. Right-click Switches, and
select Expand all Groups, as shown in the left pane in Figure 13-142 on page 468.
b. Look for your switch, as shown in the right pane in Figure 13-142 on page 468.
Tip: To navigate in the Topology Viewer, press and hold the Alt key and the left
mouse button to anchor your cursor. When you hold down these keys, you can use
the mouse to drag the panel to quickly move to the information you need.
c. Find and click port 7. The line shows that it is connected to the tpcblade3-7 computer,
as shown in Figure 13-143 on page 469. You can see Port details in the tabular view at
the bottom of the display. If you scroll to the right, you can also check the Port speed.
d. Double-click the tpcblade3-7 computer to highlight it. Then, click Datapath Explorer
(under Shortcuts in the small box at the top of Figure 13-143) to see the paths between
the servers and storage subsystems or between storage subsystems. For example,
you can get SAN Volume Controller to back-end storage or server to storage
subsystem.
The view consists of three panels (host information, fabric information, and subsystem
information) that show the path through a fabric or set of fabrics for the endpoint devices,
as shown in Figure 13-144 on page 470.
Tip: A possible scenario for using Data Path Explorer is an application on a host that is
running slow. The system administrator wants to determine the health status of all
associated I/O path components for this application. The system administrator checks
whether all components along that path are healthy and whether there are any
component-level performance problems that might be causing the slow application
response.
Looking at the data paths for tpcblade3-7 computer as shown in Figure 13-144 on
page 470, you can see that it has a single port HBA connection to the SAN. A possible
solution to improve the SAN performance for tpcblade3-7 computer is to upgrade it to a
dual port HBA.
13.5.6 Verifying the SAN Volume Controller and Fabric configuration by using
Topology Viewer
After Tivoli Storage Productivity Center probes the SAN environment, it automatically builds a
graphical display of the SAN environment by using the information from all the SAN
components (switches, storage controllers, and hosts). This graphical display is available by
using the Topology Viewer option on the Tivoli Storage Productivity Center Navigation Tree.
The information in the Topology Viewer panel is current as of the last successful probe. By
default, Tivoli Storage Productivity Center probes the environment daily. However, you can
run an unplanned or immediate probe at any time.
Tip: If you are analyzing the environment for problem determination, run an ad hoc probe
to ensure that you have the latest information about the SAN environment. Make sure that
the probe completes successfully.
Status icons in the Topology Viewer show whether a port is connected or not connected.
Figure 13-145 shows the SAN Volume Controller ports that are connected and the switch
ports.
Important: Figure 13-145 shows an incorrect configuration for the SAN Volume Controller
connections. This configuration is incorrect because it was implemented for lab purposes
only. In real environments, each SAN Volume Controller (or Storwize V7000) node port is
connected to two separate fabrics. If any SAN Volume Controller (or Storwize V7000) node
port is not connected, each node in the cluster displays an error on its LCD display. Tivoli
Storage Productivity Center also shows the health of the cluster as a warning in the
Topology Viewer, as shown in Figure 13-145.
Consider the following points:
You have at least one port from each node in each fabric.
You have an equal number of ports in each fabric from each node. That is, do not have
three ports in Fabric 1 and only one port in Fabric 2 for a SAN Volume Controller (or
Storwize V7000) node.
In this example, the connected SAN Volume Controller ports are both online. When a SAN
Volume Controller port is not healthy, a black line is shown between the switch and the SAN
Volume Controller node.
Tivoli Storage Productivity Center can detect where the unhealthy ports were connected
during a previous probe (when they were shown with a green line). When a later probe
discovers that these ports are no longer connected, the green line becomes a black line.
If these ports were never connected to the switch, they do not have any lines.
By using Data Path Explorer, you can see, for example, that mdisk1 in Storwize
V7000-2076-ford1_tbird-IBM is available through two Storwize V7000 ports. You can trace
that connectivity to its logical unit number (LUN) rad (ID:009f) as shown in Figure 13-147.
In addition, you can hover over the MDisk, LUN, and switch ports (not shown in
Figure 13-147) and get health and performance information about these components. This
way, you can verify the status of each component to see how well it is performing.
The Topology Viewer shows that tpcblade3-11 is physically connected to a single fabric. By
using the Zone tab, you can see the single zone configuration that is applied to tpcblade3-11
for the 100000051E90199D zone. Therefore, tpcblade3-11 does not have redundant paths, and
if the mini switch goes offline, tpcblade3-11 loses access to its SAN storage.
By clicking the zone configuration, you can see which port is included in a zone configuration
and which switch has the zone configuration. The port that has no zone configuration is not
surrounded by a gray box.
You can also use the Data Path Viewer in Tivoli Storage Productivity Center to check and
confirm path connectivity between a disk that an operating system detects and the VDisk that
the Storwize V7000 provides. Figure 13-149 on page 475 shows the path information that
relates to the tpcblade3-11 host and its VDisks. You can hover over each component to also
get health and performance information (not shown), which might be useful when you perform
problem determination and analysis.
13.5.7 Verifying the SAN Volume Controller and Fabric configuration by using
the Tivoli Storage Productivity Center 5.2 web-based GUI Data Path tools
The Tivoli Storage Productivity Center 5.2 web-based GUI includes a set of Data Path tools to
help verify and monitor your environment. The tools include a Topology View, with which you
can view your environment elements in Tree, Hierarchical, or customized views, and a Table
view, with which you can view your environment elements in a tabular view. To access the
Data Path tools, click Data Path under the General section, as shown in Figure 13-150.
Figure 13-150 Accessing the Data Path tools for a storage system
The Topology View window opens for this storage system, which displays the connected
elements for this storage system in a tree view format, as shown in Figure 13-151 on
page 476.
The elements are displayed with icons that denote their status. With your mouse, point to the
icons at the lower left of the window to see captions that describe each icon. To change the
display method, use the drop-down menu, as shown in Figure 13-151. To view the details or
properties of an element, right-click the element in the window, or select the element and use
the Actions menu. To view the elements in a tabular format, click the Table View tab at the top
of the window.
Note: In Tivoli Storage Productivity Center version 5.2, the Data Path tools provide a
limited amount of information and functionality. For in-depth troubleshooting, the Topology
Viewer that is described in 13.5.6, Verifying the SAN Volume Controller and Fabric
configuration by using Topology Viewer on page 470, might provide greater capabilities.
The performance monitor panel (as shown in Figure 13-153 on page 478) presents the
graphs in the following four quadrants:
The upper left quadrant is the CPU utilization percentage for system CPUs and
compression CPUs.
The upper right quadrant is volume throughput in MBps and current volume latency.
The lower left quadrant is the interface throughput (FC, SAS, iSCSI, and IP Remote Copy).
The lower right quadrant is MDisk throughput in MBps and current MDisk latency.
Figure 13-153 Performance monitor panel in the SAN Volume Controller 7.2 web GUI
To toggle the charts to show IOPS instead of throughput (MBps), select IOPS in the
drop-down menu above the CPU Utilization chart, as shown in Figure 13-154.
Figure 13-154 Toggling between MBps and IOPS in the performance monitor panel
Each graph represents 5 minutes of collected statistics and provides a means of assessing
the overall performance of your system. For example, CPU utilization shows the current
percentage of CPU usage and specific data points on the graph that show peaks in utilization.
With this real-time performance monitor, you can quickly view bandwidth or IOPS of volumes,
interfaces, and MDisks. Each graph shows the current bandwidth or IOPS and a view of the
value over the last 5 minutes. Each data point can be accessed to determine its individual
bandwidth utilization or IOPS, and to evaluate whether a specific data point might represent
performance impacts. For example, you can monitor the interfaces, such as Fibre Channel or
SAS, to determine whether the host data-transfer rate is different from the expected rate. The
volumes and MDisk graphs also show the IOPS and latency values.
To remove a specific metric from a graph, click the corresponding item below the graph, as
shown in Figure 13-155 on page 479. Click it again to add it back to the graph. Displayed
metrics are denoted by a solid color square. Hidden metrics are denoted by a hollow colored
square.
Figure 13-155 Toggling the display of graph metrics in the performance monitor panel
In the left drop-down menu above the CPU Utilization chart, you can switch from system
statistics to statistics by node, and select a specific node to get its real-time performance
graphs, as shown in Figure 13-156.
Figure 13-156 Displaying node level metrics in the performance monitor panel
Viewing node level performance statistics can help identify an unbalanced usage of your
system nodes.
To retrieve the statistics files from the nonconfiguration nodes, first copy them onto the
configuration node by running the following command for each node (the trailing argument
is the node name or ID):
svctask cpdumps -prefix /dumps/iostats <node_name_or_id>
To retrieve the statistics files from the SAN Volume Controller, you can use the secure copy
(scp) command, as shown in the following example:
scp -i <private key file> admin@clustername:/dumps/iostats/* <local destination dir>
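For example, the two steps can be combined in a small shell script. This is a sketch only: the
node names (node1 and node2), the key file, and clustername are placeholders for your
environment:

#!/bin/sh
# consolidate the statistics files from each node onto the configuration node
for node in node1 node2; do
    ssh -i ~/.ssh/svc_key admin@clustername "svctask cpdumps -prefix /dumps/iostats $node"
done
# pull all of the statistics files to the local workstation
scp -i ~/.ssh/svc_key "admin@clustername:/dumps/iostats/*" ./iostats/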
If you do not use Tivoli Storage Productivity Center, you must retrieve and parse these XML
files to analyze the long-term statistics. The counters in the files are posted as absolute
values. Therefore, the application that processes the performance statistics must compare
two samples to calculate the differences between the two files.
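If you write your own parser, the core logic is simply a subtraction of two absolute counters.
The following shell sketch assumes that a cumulative read-operation counter appears as an
attribute named ro in the statistics XML; verify the attribute names against your own files:

#!/bin/sh
# take the first ro="..." (assumed cumulative read operations) counter from each sample
r1=$(grep -o 'ro="[0-9]*"' sample_t1.xml | head -1 | tr -dc '0-9')
r2=$(grep -o 'ro="[0-9]*"' sample_t2.xml | head -1 | tr -dc '0-9')
# the difference between the two absolute counters is the activity in the interval
echo "Read operations in interval: $((r2 - r1))"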
An easy way to gather and store the performance statistic data and generate graphs is to use
the svcmon command. This command collects SAN Volume Controller and Storwize V7000
performance data every 1 - 60 minutes. Then, it creates the spreadsheet files (in CSV format)
and graph files (in GIF format). By using a database, the svcmon command manages SAN
Volume Controller and Storwize V7000 performance statistics from minutes to years.
For more information about the svcmon command, see SVC / Storwize V7000 Performance
Monitor - svcmon in IBM developerWorks at this website:
https://www.ibm.com/developerworks/mydeveloperworks/blogs/svcmon
Disclaimer: The svcmon command is a set of Perl scripts that were designed and
programmed by Yoshimichi Kosuge, personally. It is not an IBM product and it is provided
without any warranty. Therefore, you use svcmon at your own risk.
The svcmon command works in online mode or stand-alone mode, which is described briefly
in this section. The package is well-documented to run on Windows or Linux workstations. For
other platforms, you must adjust the svcmon scripts.
For a Windows workstation, you must install the ActivePerl, PostgreSQL, and the
Command-Line Transformation Utility (msxsl.exe). PuTTY is required if you want to run in
online mode. However, even in stand-alone mode, you might need it to secure copy the
/dumps/iostats/ files and the /tmp/svc.config.backup.xml files. You might also need it to
access the SAN Volume Controller from a command line. Follow the installation guide for
the svcmon command on the following IBM developerWorks blog website:
https://www.ibm.com/developerworks/mydeveloperworks/blogs/svcmon
To run svcmon in stand-alone mode, you must convert the .xml configuration backup file into
HTML format by using the svcconfig.pl script. Then, you must copy the performance files to
the iostats directory, create the svcmon database by using svcdb.pl --create, and
populate the database by using svcperf.pl --offline. The last step is report generation,
which you run with the svcreport.pl script.
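Based on that description, the stand-alone workflow looks something like the following
sketch. The script arguments shown here are illustrative; check the svcmon documentation
at the website referenced above for the exact syntax of your version.

# convert the configuration backup to HTML (argument shown is illustrative)
perl svcconfig.pl svc.config.backup.xml
# create the svcmon database, then load the copied /dumps/iostats files
perl svcdb.pl --create
perl svcperf.pl --offline
# generate the CSV and GIF report files
perl svcreport.pl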
The reporting functionality generates multiple GIF files per object (MDisk, VDisk, and node)
with aggregated CSV files. By using the CSV files, we can generate customized charts that
are based on spreadsheet functions, such as Pivot Tables or DataPilot and search
(xLOOKUP) operations. The backup configuration file that is converted in HTML is a good
source to create another spreadsheet tab to relate, for example, VDisks with their I/O group
and preferred node.
Figure 13-157 shows a spreadsheet chart that was generated from the
<system_name>__vdisk.csv file that was filtered for I/O group 2. The VDisks for this I/O group
were selected by using a secondary spreadsheet tab that was populated with the VDisk
section of the configuration backup HTML file.
Figure 13-157 Total operations per VDisk for I/O group 2, where Vdisk37 is the busiest volume
By default, the svcreport.pl script generates GIF charts and CSV files with one hour of data.
The CSV files aggregate a large amount of data, but the GIF charts are presented by VDisk,
MDisk, and node as described in Table 13-3.
Table 13-3 Spreadsheets and GIF chart types that are produced by svcreport
Spreadsheets (CSV)   GIF charts
cache_node           cache.hits          mdisk.response.worst.resp   cache.usage.node
cache_vdisk          cache.stage         mdisk.response              cpu.usage.node
cpu                  cache.throughput    mdisk.throughput            N/A
drive                cache.usage         mdisk.transaction           N/A
MDisk                vdisk.response.tx   N/A                         N/A
node                 vdisk.response.wr   N/A                         N/A
VDisk                vdisk.throughput    N/A                         N/A
N/A                  vdisk.transaction   N/A                         N/A
To generate a 24-hour chart, specify the --for 1440 option. The --for option specifies the
time range, in minutes, for which you want to generate the SAN Volume Controller/Storwize
V7000 performance report files (CSV and GIF). The default value is 60 minutes.
Figure 13-158 on page 482 shows a chart that was automatically generated by the
svcperf.pl script for vdisk37. This VDisk was chosen because the chart in Figure 13-157
shows that it is the one that reaches the highest IOPS values.
The svcmon command is not intended to replace Tivoli Storage Productivity Center. However,
it helps when Tivoli Storage Productivity Center is not available, allowing an easy
interpretation of the SAN Volume Controller performance XML data.
Figure 13-159 shows the read/write throughput for vdisk37 in bytes per second.
Chapter 14. Maintenance
Among the many benefits that the IBM System Storage SAN Volume Controller and Storwize
family products provide is a great simplification of the storage management tasks that system
administrators need to perform. However, as the IT environment grows and is renewed, so
does the storage infrastructure.
This chapter highlights guidance for the day-to-day activities of storage administration by
using the SAN Volume Controller/Storwize family products. This guidance can help you to
maintain your storage infrastructure with the levels of availability, reliability, and resiliency
that are demanded by today's applications, and to keep up with storage growth needs.
This chapter focuses on the most important topics to consider in SAN Volume Controller and
Storwize administration so that you can use this chapter as a checklist. It also provides tips
and guidance. For practical examples of the procedures that are described here, see
Chapter 16, SAN Volume Controller scenarios on page 555.
Important: The practices that are described here were effective in many SAN Volume
Controller/Storwize installations worldwide for organizations in several areas. They all had
one common need, which was the need to easily, effectively, and reliably manage their
SAN disk storage environment. Nevertheless, whenever you have a choice between two
possible implementations or configurations, each choice, if you look deep enough, has
advantages and disadvantages over the other. Do not take these practices as absolute
truth, but rather use them as a guide. The choice of which approach to use is ultimately
yours.
This chapter includes the following sections:
Automating the documentation for SAN Volume Controller/Storwize and SAN environment
Storage management IDs
Standard operating procedures
SAN Volume Controller/Storwize code upgrade
SAN modifications
Hardware upgrades for SAN Volume Controller
Adding expansion enclosures to Storwize family systems
More information
Typical SAN and SAN Volume Controller component names limit the number and type of
characters you can use. For example, SAN Volume Controller/Storwize names are limited to
63 characters, which makes creating a naming convention a bit easier than in previous
versions of SAN Volume Controller/Storwize code.
Names: In previous versions of SAN Volume Controller/Storwize code, names were limited
to 15 characters. Starting with version 7.x, the limit is 63 characters.
Many names in SAN storage and in the SAN Volume Controller/Storwize can be modified
online. Therefore, you do not need to worry about planning outages to implement your new
naming convention. (Server names are the exception, as explained in Hosts on page 488.)
The naming examples that are used in the following sections are proven to be effective in
most cases, but might not be fully adequate to your particular environment or needs. The
naming convention to use is your choice, but you must implement it in the whole environment.
Storage controllers
SAN Volume Controller names the storage controllers controllerX, with X being a sequential
decimal number. If multiple controllers are attached to your SAN Volume Controller, change
the name so that it includes, for example, the vendor name, the model, or its serial number.
Therefore, if you receive an error message that points to controllerX, you do not need to log
in to SAN Volume Controller to know which storage controller to check.
Note: SAN Volume Controller/Storwize detects controllers based on their WWNN.
If you have a storage controller that presents one WWNN per WWPN, this setup might
lead to many controllerX names that point to the same physical box. In this case, prepare
a naming convention that covers this situation.
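For example, the rename can be done online with the chcontroller command. The following
sketch assumes that controller0 is the DS8000 with serial number 75VXYZ1 (the new name
is your choice):

svctask chcontroller -name DS8K_75VXYZ1 controller0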
Hosts
In today's environment, administrators deal with large networks, the Internet, and cloud
computing. Use good server naming conventions so that administrators can quickly identify
a server and determine key information about it.
Changing a server's name might have implications for application configuration and require a
server reboot, so you might want to prepare a detailed plan if you decide to rename several
servers in your network.
The following example is for server name conventions for LLAATRFFNN, where:
LL is the location, which might designate a city, data center, building floor, or room, and so
on
AA is a major application; for example, billing, ERP, and Data Warehouse
T is the type; for example, UNIX, Windows, and VMware
R is the role; for example, Production, Test, Q&A, and Development
FF is the function; for example, DB server, application server, web server, and file server
NN is numeric
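For example, under this convention, the name NYBIUPDB01 would identify the first production
UNIX database server for the billing application in the New York location.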
Be mindful of the SAN Volume Controller port aliases. The 11th digit of the port WWPN (P)
reflects the SAN Volume Controller node FC port, but not directly, as listed in Table 14-1.
Table 14-1 WWPNs for the SAN Volume Controller node ports
SVC02_IO2_A: SAN Volume Controller cluster SVC02, ports group A for iogrp 2 (aliases
SVC02_N3P1, SVC02_N3P3, SVC02_N4P1, and SVC02_N4P3)
D8KXYZ1_I0301: DS8000 serial number 75VXYZ1, port I0301(WWPN)
TL01_TD06: Tape library 01, tape drive 06 (WWPN)
If your SAN does not support aliases, for example, in heterogeneous fabrics with switches in
some interoperations modes, use WWPNs in your zones all across. However, remember to
update every zone that uses a WWPN if you ever change it.
Have your SAN zone name reflect the devices in the SAN it includes (normally in a
one-to-one relationship) as shown in the following examples:
servername_svcclustername (from a server to the SAN Volume Controller)
svcclustername_storagename (from the SAN Volume Controller cluster to its back-end
storage)
svccluster1_svccluster2 (for remote copy services)
The first time that you use the SAN Health data collection tool, you must explore the options
that are provided to learn how to create a well-organized and useful diagram. Figure 14-1
shows an example of a poorly formatted diagram.
Figure 14-2 shows a SAN Health Options window in which you can choose the format of SAN
diagram that best suits your needs. Depending on the topology and size of your SAN fabrics,
you might want to manipulate the options in the Diagram Format or Report Format tabs.
SAN Health supports switches from manufacturers other than Brocade, such as McData and
Cisco. Both the data collection tool download and the processing of files are available at no
cost, and you can download Microsoft Visio and Excel viewers at no cost from the Microsoft
website.
Another tool, which is known as SAN Health Professional, is also available for download at no
cost. With this tool, you can audit the reports in detail by using advanced search functions and
inventory tracking. You can configure the SAN Health data collection tool as a Windows
scheduled task.
Tip: Regardless of the method that is used, generate a fresh report at least once a month.
Keep previous versions so that you can track the evolution of your SAN.
At a minimum, save the output of the following commands:
svcinfo lsfabric
svcinfo lssystem
svcinfo lsmdisk
svcinfo lsmdiskgrp
svcinfo lsvdisk
svcinfo lshost
svcinfo lshostvdiskmap X (with X ranging to all defined host numbers in your SAN
Volume Controller cluster)
Import the command output into a spreadsheet, preferably with each command's output on a
separate sheet.
You might also want to store the output of more commands; for example, if you have SAN
Volume Controller/Storwize Copy Services that are configured or there are dedicated
managed disk groups to specific applications or servers.
One way to automate this task is to first create a batch file (Windows) or shell script (UNIX or
Linux) that runs these commands and stores their output in temporary files. Then, use
spreadsheet macros to import these temporary files into your SAN Volume
Controller/Storwize documentation spreadsheet.
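One way such a script might look on UNIX or Linux is shown in the following sketch. The
tpcmon user, the key file, and the host numbers are illustrative assumptions; on Windows,
replace ssh with the PuTTY plink utility:

#!/bin/sh
# gather SAN Volume Controller/Storwize configuration as comma-delimited output
SVC="tpcmon@svccluster1"
KEY="$HOME/.ssh/tpcmon_key"
for cmd in lsfabric lssystem lsmdisk lsmdiskgrp lsvdisk lshost; do
    ssh -i "$KEY" "$SVC" "svcinfo $cmd -delim ," > "/tmp/$cmd.csv"
done
# one mapping file per defined host number; adjust the list to your environment
for x in 1 2 3; do
    ssh -i "$KEY" "$SVC" "svcinfo lshostvdiskmap -delim , $x" > "/tmp/lshostvdiskmap_$x.csv"
done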
When you are gathering SAN Volume Controller/Storwize information, consider the following
preferred practices:
With MS Windows, use the PuTTY plink utility to create a batch session that runs these
commands and stores their output. With UNIX or Linux, you can use the standard SSH
utility.
Create a SAN Volume Controller user with the Monitor privilege to run these batches. Do
not grant it Administrator privilege. Create and configure an SSH key specifically for it.
Use the -delim option of these commands to make their output delimited by a character
other than Tab, such as comma or colon. By using a comma, you can import the
temporary files into your spreadsheet in CSV format.
To make your spreadsheet macros simpler, you might want to preprocess the temporary
output files and remove any garbage or undesired lines or columns. With UNIX or Linux,
you can use text editing commands such as grep, sed, and awk. Freeware software with
the same commands is available for Windows, or you can use any batch text editing
utility.
The objective is to fully automate this procedure so that you can schedule it to run regularly.
Make the resulting spreadsheet easy to consult and have it contain only the relevant
information you use frequently. The automated collection and storage of configuration and
support data (which is typically more extensive and difficult to use) are described in 14.1.7,
Automated support data collection on page 494.
14.1.4 Storage
Fully allocate all of the available space in the back-end storage controllers to the SAN
Volume Controller/Storwize. This way, you can perform all your disk storage management
tasks by using SAN Volume Controller/Storwize. You need to generate the documentation of
your back-end storage controllers manually only one time, after the initial configuration.
Then, you can update the documentation when these controllers receive hardware or code
upgrades. As such, there is little point in automating this back-end storage controller
documentation. The same applies to the Storwize internal disk drives and enclosures.
However, if you use split controllers, this option might not be the best option. The portion of
your storage controllers that is used outside SAN Volume Controller/Storwize might have its
configuration changed frequently. In this case, see your back-end storage controller
documentation for more information about how to gather and store the documentation that
you might need.
Handle them within the associated incident ticket because it might take longer to replace
the part if you need to submit, schedule, and approve a non-emergency change ticket.
An exception is if you must interrupt more servers or applications to replace the part. In
this case, you must schedule the activity and coordinate with the support groups. Use good
judgment and avoid unnecessary exposure and delays.
Keep handy the procedures to generate reports of the latest incidents and implemented
changes in your SAN Storage environment. Typically, you do not need to periodically
generate these reports because your organization probably already has a Problem and
Change Management group that runs such reports for trend analysis purposes.
lsfbvol
lshostconnect
lsarray
lsrank
lsioports
lsvolgrp
You can create procedures that automatically create and store this data on scheduled dates,
delete old data, or transfer the data to tape.
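One way to automate the collection is a DSCLI script file, as in the following hedged sketch;
it assumes that the DS8000 DSCLI client is installed with a configured profile, and the file
names are illustrative:

# ds8k_doc.cli lists one command per line:
#   lsfbvol
#   lshostconnect
#   lsarray
#   lsrank
#   lsioports
#   lsvolgrp
dscli -script ds8k_doc.cli > ds8k_config.$(date +%Y%m%d).txt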
You can subscribe to receive support alerts and notifications from each vendor of your storage
and SAN equipment, such as from the IBM website. You can often quickly determine whether an
alert or notification is applicable to your SAN storage. Therefore, open these alerts when you
receive them, and keep them in a folder of your mailbox.
Ensure that you explicitly remove a volume from any volume-to-host mappings and any
copy services relationship to which it belongs before you delete it. At all costs, avoid the
use of the -force parameter in rmvdisk. If you issue the svctask rmvdisk command and the
volume still has pending mappings, the SAN Volume Controller/Storwize prompts you to
confirm, which is a hint that you might have done something incorrectly.
When you are deallocating volumes, plan for an interval between unmapping them from
hosts (rmvdiskhostmap) and deleting them (rmvdisk). The IBM internal Storage
Technical Quality Review Process (STQRP) asks for a minimum of a 48-hour interval so
that you can perform a quick backout if you later realize that you still need some data in that
volume.
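The following hedged sketch shows that sequence, reusing the host and volume names from
the examples in this chapter:

svctask rmvdiskhostmap -host NYBIXTDB03 NYBIXTDB03_T01
(wait at least 48 hours and confirm that nothing still needs the data)
svctask rmvdisk NYBIXTDB03_T01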
Set the SAN Volume Controller/Storwize target code level to the latest Generally Available
(GA) release unless you have a specific reason not to upgrade, such as the following
reasons:
The specific version of an application or other component of your SAN Storage
environment has a known problem.
The latest SAN Volume Controller/Storwize GA release is not yet cross-certified as
compatible with another key component of your SAN storage environment.
Your organization has mitigating internal policies, such as the use of the latest-minus-one
release, or requiring a period of seasoning in the field before implementation.
Check the compatibility of your target SAN Volume Controller/Storwize code level with all
components of your SAN storage environment (SAN switches, storage controllers, and server
HBAs) and its attached servers (operating systems and, possibly, applications).
Applications often certify only the operating system that they run under and leave to the
operating system provider the task of certifying its compatibility with attached components
(such as SAN storage). However, various applications might use special hardware features or
raw devices and certify the attached SAN storage directly. If you are in this situation, consult
the compatibility matrix for your application to verify that your SAN Volume Controller/Storwize
target code level is compatible.
For more information, see the following websites:
Storwize V3700 Concurrent Compatibility and Code Cross-Reference:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S1004172
Storwize V5000 Concurrent Compatibility and Code Cross-Reference:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S1004336
Storwize V7000 Concurrent Compatibility and Code Cross-Reference:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S1003705
Figure 14-4 SAN Volume Controller/Storwize Upgrade Test Utility installation by using the GUI
You can use the GUI or the CLI to upload and install the SAN Volume Controller/Storwize
Upgrade Test Utility. After installation, you can run the Upgrade Test Utility from the CLI or
GUI. In Example 14-1 on page 500, we show the output of running the Upgrade Test Utility in
the CLI.
IBM_Storwize:superuser>svcupgradetest -v 7.1.0.6 -d
svcupgradetest version 10.20
Please wait, the test may take several minutes to complete.
******************* Warning found *******************
This tool has found the internal disks of this system are
not running the recommended firmware versions.
Details follow:
+----------------------+-----------+------------+----------------------------------+
| Model                | Latest FW | Current FW | Drive Info                       |
+----------------------+-----------+------------+----------------------------------+
| ST91000640SS         | BD2F      | BD2E       | Drive in slot 1 in enclosure 1   |
|                      |           |            | Drive in slot 2 in enclosure 1   |
|                      |           |            | Drive in slot 9 in enclosure 1   |
|                      |           |            | Drive in slot 8 in enclosure 1   |
|                      |           |            | Drive in slot 10 in enclosure 1  |
|                      |           |            | Drive in slot 11 in enclosure 1  |
|                      |           |            | Drive in slot 5 in enclosure 1   |
|                      |           |            | Drive in slot 7 in enclosure 1   |
|                      |           |            | Drive in slot 6 in enclosure 1   |
|                      |           |            | Drive in slot 4 in enclosure 1   |
|                      |           |            | Drive in slot 12 in enclosure 1  |
|                      |           |            | Drive in slot 3 in enclosure 1   |
|                      |           |            | Drive in slot 13 in enclosure 1  |
|                      |           |            | Drive in slot 14 in enclosure 1  |
+----------------------+-----------+------------+----------------------------------+
We recommend that you upgrade the drive microcode at an
appropriate time. If you believe you are running the latest
version of microcode, then check for a later version of this tool.
You do not need to upgrade the drive firmware before starting the
software upgrade.
Figure 14-5 SAN Volume Controller node models and code versions relationship
Figure 14-6 on page 502 shows the compatibility matrix between the Storwize family systems
and code versions.
The IBM Support page on Storwize V3700 Flashes and Alerts (Troubleshooting):
https://www-947.ibm.com/support/entry/myportal/all_troubleshooting_links/system_storage/disk_systems/entry-level_disk_systems/ibm_storwize_v3700?productContext=-124971743
Fix every problem or suspect condition that you find with the disk path failover capability.
Because a typical SAN Volume Controller/Storwize environment has several dozen to a few
hundred attached servers, a spreadsheet might help you track the Attached Hosts
Preparation process.
If you have some host virtualization, such as VMware ESX, AIX LPARs and VIOS, or Solaris
containers in your environment, verify the redundancy and failover capability in these
virtualization layers.
Upgrade sequence
The SAN Volume Controller/Storwize Supported Hardware List provides the correct
sequence for upgrading your SAN Volume Controller/Storwize SAN storage environment
components. For V7.2 of this list, see the following resources:
V7.2.x Supported Hardware List, Device Driver, Firmware and Recommended Software
Levels for SAN Volume Controller:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S1004453#_Prev
V7.2.x Supported Hardware List, Device Driver, Firmware and Recommended Software
Levels for IBM Storwize V7000:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S1004450
V7.2.x Supported Hardware List, Device Driver, Firmware and Recommended Software
Levels for IBM Storwize V5000:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S1004515
V7.2.x Supported Hardware List, Device Driver, Firmware and Recommended Software
Levels for IBM Storwize V3700:
http://www-01.ibm.com/support/docview.wss?uid=ssg1S1004515
By cross-checking the versions of SAN Volume Controller/Storwize that are compatible with
the versions of your SAN directors, you can determine which one to upgrade first. By
checking a component's upgrade path, you can determine whether that component requires a
multistep upgrade.
If you are not making major version or multistep upgrades in any components, the following
upgrade order is less prone to problems:
1.
2.
3.
4.
5.
Attention: Do not upgrade two components of your SAN Volume Controller SAN storage
environment simultaneously, such as the SAN Volume Controller and one storage
controller, even if you intend to do it with your system offline. An upgrade of this type can
lead to unpredictable results, and an unexpected problem is much more difficult to debug.
Example 14-2 shows what happens if you run the svcupgradetest command in a cluster with
internal SSDs in a managed state.
Example 14-2 The svcupgradetest command with SSDs in managed state
IBM_2145:svccf8:admin>svcinfo lsmdiskgrp
id name          status mdisk_count ...
...
2  MDG3SVCCF8SSD online 2 ...
3  MDG4DS8KL3331 online 8 ...
...
IBM_2145:svccf8:admin>svcinfo lsmdisk -filtervalue mdisk_grp_name=MDG3SVCCF8SSD
id name   status mode    mdisk_grp_id mdisk_grp_name capacity ctrl_LUN_#       controller_name UID
0  mdisk0 online managed 2            MDG3SVCCF8SSD  136.7GB  0000000000000000 controller0     5000a7203003190c000000000000000000000000000000000000000000000000
1  mdisk1 online managed 2            MDG3SVCCF8SSD  136.7GB  0000000000000000 controller3     5000a72030032820000000000000000000000000000000000000000000000000
IBM_2145:svccf8:admin>
IBM_2145:svccf8:admin>svcinfo lscontroller
id controller_name ctrl_s/n    vendor_id product_id_low product_id_high
0  controller0                 IBM       2145           Internal
1  controller1     75L3001FFFF IBM       2107900
2  controller2     75L3331FFFF IBM       2107900
3  controller3                 IBM       2145           Internal
IBM_2145:svccf8:admin>
IBM_2145:svccf8:admin>svcupgradetest -v 6.2.0.2 -d
svcupgradetest version 6.6
Please wait while the tool tests for issues that may prevent
a software upgrade from completing successfully. The test may
take several minutes to complete.
Checking 34 mdisks:
******************** Error found ********************
The requested upgrade from 5.1.0.10 to 6.2.0.2 cannot
be completed as there are internal SSDs are in use.
Please refer to the following flash:
http://www.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003707
You must decide which RAID level to configure in the new arrays with SSDs, depending on
their intended purpose and on the level of redundancy that is needed to protect your data if a
hardware failure occurs. Table 14-2 lists the factors to consider in each case. By using your
internal SSDs for Easy Tier, you can achieve a gain in overall performance in most cases.
Table 14-2 RAID levels for internal SSDs

RAID level (GUI Preset)   When to use it
RAID 0 (Striped)
RAID 1 (Easy Tier)
RAID 10 (Mirrored)        4 - 8 drives, equally distributed among each node of the I/O group
Note: Although stopping remote copy services is not necessary, it is the preferred practice
to do so. One exception is when you are upgrading to version 7.2. You must stop all Global
Mirror (GM) relationships before starting the upgrade process because of performance
improvements in GM code in SAN Volume Controller/Storwize software version 7.2. Other
remote copy relationships, such as Metro Mirror (MM) or Global Mirror with Change
Volumes (GMCV), do not have to be stopped.
IBM_2145:superuser>applydrivesoftware -?
applydrivesoftware

Syntax

applydrivesoftware -file name [-type firmware | fpga] {-drive drive_id | -all}
                   [-force] [-allowreinstall] [-allowdowngrade]
Preferred practice: If you are running V7.2 of SAN Volume Controller/Storwize code, use the
applydrivesoftware command instead of the utilitydriveupgrade.pl tool. This tool is
not supported or tested beyond version 7.1 of SAN Volume Controller/Storwize code.
Upgrade of disk drive firmware is concurrent, whether the drive is an HDD or an SSD. However,
with SSDs, both the firmware level and the FPGA level can be upgraded, and the FPGA upgrade
is not concurrent: all I/O to the SSDs must be stopped before the upgrade. This requirement is
not a problem if the SSDs are not yet configured; however, if you have any SSD arrays in
storage pools, you must remove the SSD MDisks from the pools before the upgrade.
This task can be challenging because removing MDisks from a storage pool means migrating
all extents from these MDisks to the remaining MDisks in the pool. You cannot remove SSD
MDisks from the pool if there is no space left on the remaining MDisks. In such a situation,
one option is to migrate some volumes to other storage pools to free enough extents so that
the SSD MDisks can be removed.
Important: Take extra precautions if you are upgrading the FPGA on SSDs in a hybrid
storage pool with Easy Tier running. If the Easy Tier setting on the storage pool is auto,
Easy Tier switches off after the SSD MDisks are removed from that pool, which means that
it loses all of its historical data. After the SSD MDisks are added back to this pool, Easy
Tier must start its analysis from the beginning. To avoid this situation, switch the Easy Tier
setting on the storage pool to on. This setting ensures that Easy Tier retains its data after
the SSD removal.
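A minimal sketch of that setting change follows, where MDG_HYBRID is an illustrative pool
name:

svctask chmdiskgrp -easytier on MDG_HYBRID

After the FPGA upgrade completes and the SSD MDisks are added back, you can return the
setting to auto with -easytier auto if you prefer the default behavior.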
IBM_2145:svccf8:admin>lshostvdiskmap NYBIXTDB03
id name       SCSI_id vdisk_id vdisk_name     vdisk_UID
0  NYBIXTDB03 0       0        NYBIXTDB03_T01 60050768018205E12000000000000000
IBM_2145:svccf8:admin>
3. If your server does not support HBA hot-swap, power off your system, replace the HBA,
connect the existing FC cable to the new HBA, and power on the system.
If your server does support hot-swap, follow the appropriate procedures to hot-swap the
HBA. Do not disable or disrupt the good HBA in the process.
4. Verify that the new HBA successfully logged in to the SAN switch. If it logged in
successfully, you can see its WWPN logged in to the SAN switch port.
Otherwise, fix this issue before you continue to the next step.
Cross-check the WWPN that you see in the SAN switch with the one that you noted in step 1,
and make sure that you did not mistakenly record the WWNN instead.
5. In your SAN zoning configuration tool, replace the old HBA WWPN with the new one in
every alias and zone to which it belongs. Do not touch the other SAN fabric (the one with
the good HBA) while you perform this task.
There should be only one alias that uses this WWPN, and zones must reference this alias.
If you are using SAN port zoning (though you should not be) and you did not move the new
HBA FC cable to another SAN switch port, you do not need to reconfigure zoning. (A hedged
command sketch follows this procedure.)
6. Verify that the new HBA's WWPN appears in the SAN Volume Controller/Storwize by
using the lsfcportcandidate command.
If the WWPN of the new HBA does not appear, troubleshoot your SAN connections and
zoning before you continue.
7. Add the WWPN of this new HBA to the SAN Volume Controller/Storwize host definition by
using the addhostport command. Do not remove the old one yet. Run the lshost
<servername> command. Then, verify that the good HBA shows as active, while the failed
and new HBAs show as inactive or offline.
8. Return to the server. Then, reconfigure the multipath software to recognize the new HBA
and its associated SAN disk paths. Certify that all SAN LUNs have redundant, healthy disk
paths through the good and the new HBAs.
9. Return to the SAN Volume Controller/Storwize and verify again (by using the lshost
<servername> command) that both the good and the new HBA WWPNs are active. In this
case, you can remove the old HBA WWPN from the host definition by using the
rmhostport command.
Troubleshoot your SAN connections and zoning if you did not do so yet. Do not remove any
HBA WWPNs from the host definition until you ensure that you have at least two healthy,
active ones.
By following these steps, you avoid removing your only good HBA in error.
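The following hedged sketch consolidates the fabric-side and SAN Volume Controller-side
commands from steps 5, 7, and 9. The alias, zone configuration name, WWPNs, and host
name are all illustrative, and the zoning commands assume a Brocade FOS switch:

aliremove "srv01_fcs0", "10:00:00:00:c9:aa:aa:aa"    (step 5: remove the failed WWPN)
aliadd "srv01_fcs0", "10:00:00:00:c9:bb:bb:bb"       (step 5: add the new WWPN)
cfgsave
cfgenable "PROD_CFG"
svctask addhostport -hbawwpn 10000000C9BBBBBB srv01  (step 7)
svcinfo lshost srv01                                 (steps 7 and 9: check port states)
svctask rmhostport -hbawwpn 10000000C9AAAAAA srv01   (step 9)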
3 io_grp3
IBM_2145:svccf8:admin>rmhostiogrp -iogrp 1:2:3 NYBIXTDB02
IBM_2145:svccf8:admin>lshostiogrp NYBIXTDB02
id name
0  io_grp0
IBM_2145:svccf8:admin>lsiogrp
id name            node_count vdisk_count host_count
0  io_grp0         2          32          1
1  io_grp1         0          0           0
2  io_grp2         0          0           0
3  io_grp3         0          0           0
4  recovery_io_grp 0          0           0
IBM_2145:svccf8:admin>
If possible, avoid configuring a server to use volumes from I/O groups that are based on
different node types (as a permanent situation, in any case). Otherwise, as this server's
storage capacity grows, you might experience a performance difference between volumes from
different I/O groups, which makes it difficult to identify and resolve any performance problems.
3. Stop and remove remote copy relations with those volumes so that the target volumes on
the new SAN Volume Controller receive read/write access.
4. Unmap the volumes from the old cluster.
5. Zone your host to the new SAN Volume Controller cluster.
6. Map the volumes from the new cluster to the host.
7. Discover new volumes on the host.
8. Import data from those volumes and start your applications.
If you must migrate a server online, modify its zoning so that it uses volumes from both SAN
Volume Controller clusters. Also, use host-based mirroring (such as AIX mirrorvg) to move
your data from the old SAN Volume Controller to the new one. This approach uses the
server's computing resources (CPU, memory, and I/O) to replicate the data, but it can be done
online if properly planned. Before you begin, make sure that the server has such resources to
spare.
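A minimal AIX sketch of this host-based mirroring follows; the volume group datavg, the old
SAN Volume Controller volume hdisk4, and the new volume hdisk5 are illustrative names:

extendvg datavg hdisk5      # add the volume from the new SAN Volume Controller
mirrorvg -S datavg hdisk5   # create the second copy; -S synchronizes in the background
syncvg -v datavg            # verify that the copies are synchronized
unmirrorvg datavg hdisk4    # after sync, drop the copy on the old volume
reducevg datavg hdisk4      # remove the old volume from the volume group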
The biggest benefit to using this approach is that it easily accommodates (if necessary) the
replacement of your SAN switches or your back-end storage controllers. You can upgrade the
capacity of your back-end storage controllers or replace them entirely, as you can replace
your SAN switches with bigger or faster ones. However, you do need to have spare resources,
such as floor space, power, cables, and storage capacity available during the migration.
Chapter 16, SAN Volume Controller scenarios on page 555, describes a possible approach
for this scenario that replaces the SAN Volume Controller, the switches, and the back-end
storage.
The control enclosure is the first enclosure in the second chain; because of that, you can
add five enclosures to the first chain and four enclosures to the second chain in a Storwize
V7000. For a Storwize V3700, you can add two enclosures to every chain; for a Storwize V5000,
three enclosures can be added to every chain. As a preferred practice, the number of
expansion enclosures should be balanced between both chains, which means that the number
of expansion enclosures in the chains should not differ by more than one. For example, having
five expansion enclosures in the first chain and only one in the second chain is incorrect.
Note: Storwize can detect incorrect cabling and creates warning messages in the event
log.
Correct cabling in a fully populated Storwize V7000 is shown in Figure 14-7.
Figure 14-7 Storwize V7000 Control enclosure with maximum number of expansion enclosures
Preferred practice: If you plan for future growth, always leave space in the rack cabinet:
10U below and 8U above the Storwize control enclosure for more expansion enclosures.
Adding expansion enclosures is simplified because Storwize can automatically discover new
expansion enclosures after the SAS cables are connected, and it prompts for configuration of
the new disk drives. Expansion enclosures that are left in this state work properly; however,
they are not monitored properly, so expansion enclosure information is missing from the logs
if there are any problems. This issue can make troubleshooting more difficult and can extend
the time to problem resolution. To avoid this situation, always run the Add Expansion
Enclosure procedure in the GUI, as shown in Figure 14-8 on page 517.
Preferred practice: Because of the Storwize architecture and classical disk latency, it does
not matter in which enclosure SAS or NL-SAS drives are placed. However, if you have
some SSD drives and you want to use them in the most efficient way, put them in the
control enclosure or in the first expansion enclosures in the chains. This configuration
ensures that every I/O to the SSD drives travels the shortest possible way through the
internal SAS fabric.
For more information about adding control enclosures, see Chapter 3, SAN Volume
Controller and Storwize V7000 Cluster on page 59.
Chapter 15. Troubleshooting and diagnostics

This chapter includes the following topics:
Common problems
Collecting data and isolating the problem
Recovering from problems
Mapping physical LBAs to volume extents
Medium error logging
Replacing a bad disk
Health status during upgrade
Based on this list, the host administrator must check and correct any problems.
For more information about managing hosts on the SAN Volume Controller, see Chapter 8,
Hosts on page 225.
svcinfo lsfabric
Use this command with the various options, such as -controller. Also, you can check
different parts of the SAN Volume Controller configuration to ensure that multiple paths
are available from each SAN Volume Controller node port to an attached host or controller.
Confirm that all SAN Volume Controller node port WWPNs are connected to the back-end
storage consistently.
IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0
id 0
controller_name controller0
WWNN 200400A0B8174431
mdisk_link_count 2
max_mdisk_link_count 4
degraded no
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 4
max_path_count 12
WWPN 200500A0B8174433
path_count 4
max_path_count 8
IBM_2145:itsosvccl1:admin>svcinfo lsnode
id name  UPS_serial_number WWNN             status IO_group_id IO_group_name config_node UPS_unique_id    hardware
6  Node1 1000739007        50050768010037E5 online 0           io_grp0       no          20400001C3240007 8G4
5  Node2 1000739004        50050768010037DC online 0           io_grp0       yes         20400001C3240004 8G4
4  Node3 100068A006        5005076801001D21 online 1           io_grp1       no          2040000188440006 8F4
8  Node4 100068A008        5005076801021D22 online 1           io_grp1       no          2040000188440008 8F4
Example 15-1 shows that two MDisks are present for the storage subsystem controller with
ID 0, and that four SAN Volume Controller nodes are in the SAN Volume Controller cluster. In
this example, the expected total path_count is:
2 MDisks x 4 nodes = 8
If possible, spread the paths across all storage subsystem controller ports, as is the case in
Example 15-1 (four for each WWPN).
If the driver is an IBM Subsystem Device Driver (SDD), SDDDSM, or SDDPCM host, use
datapath query device or pcmpath query device to check the host multipathing. Ensure that
paths go to the preferred and nonpreferred SAN Volume Controller nodes. For more
information, see Chapter 8, Hosts on page 225.
Check that paths are open for both preferred paths (with select counts in high numbers) and
nonpreferred paths (marked with an asterisk and select counts at or near zero). In
Example 15-2, path 0 and path 2 are the preferred paths with a high select count. Path 1 and
path 3 are the nonpreferred paths, which show an asterisk (*) and 0 select counts.
Example 15-2 Checking paths

SDD
SDD maintains the following files for its trace data:
sdd.log
sdd_bak.log
sddsrv.log
sddsrv_bak.log
These files are stored in the following directories:
AIX: /var/adm/ras
Hewlett-Packard UNIX: /var/adm
Linux: /var/log
Solaris: /var/adm
Windows 2000 Server and Windows NT Server: \WINNT\system32
Windows Server 2003: \Windows\system32
SDDPCM
SDDPCM was enhanced to collect SDDPCM trace data periodically and to write the trace
data to the system's local hard disk drive. SDDPCM maintains the following files for its trace
data:
pcm.log
pcm_bak.log
pcmsrv.log
pcmsrv_bak.log
Starting with SDDPCM 2.1.0.8, the relevant data for debugging problems is collected by
running the sddpcmgetdata script, as shown in Example 15-3.
Example 15-3 The sddpcmgetdata script (output shortened for clarity)
>sddpcmgetdata
>ls
sddpcmdata_confucius_20080814_012513.tar
The sddpcmgetdata script collects information that is used for problem determination. Then, it
creates a .tar file in the current directory with the current date and time as a part of the file
name, as shown in the following example:
sddpcmdata_hostname_yyyymmdd_hhmmss.tar
When you report an SDDPCM problem, you must run this script and send this .tar file to IBM
Support for problem determination.
If the sddpcmgetdata command is not found, collect the following files:
SDDDSM
SDDDSM also provides the sddgetdata script (see Example 15-4) to collect information to
use for problem determination. The SDDGETDATA.BAT batch file generates the following
information:
Example 15-4 The sddgetdata script for SDDDSM (output shortened for clarity)
C:\Program Files\IBM\SDDDSM>sddgetdata.bat
Collecting SDD trace Data
Collecting datapath command outputs
Collecting SDD and SDDSrv logs
Collecting Most current driver trace
Generating a CAB file for all the Logs
sdddata_DIOMEDE_20080814_42211.cab file generated
C:\Program Files\IBM\SDDDSM>dir
Volume in drive C has no label.
Volume Serial Number is 0445-53F4
Directory of C:\Program Files\IBM\SDDDSM
06/29/2008  04:22 AM    574,130 sdddata_DIOMEDE_20080814_42211.cab
1. Run vi /tmp/datacollect.sh.
2. Cut and paste the script into the /tmp/datacollect.sh file, and save the file.
3. Run chmod 755 /tmp/datacollect.sh.
4. Run /tmp/datacollect.sh.
#!/bin/ksh
export PATH=/bin:/usr/bin:/sbin
echo "y" | snap -r # Clean up old snaps
snap -gGfkLN # Collect new; don't package yet
cd /tmp/ibmsupt/other # Add supporting data
cp /var/adm/ras/sdd* .
cp /var/adm/ras/pcm* .
cp /etc/vpexclude .
datapath query device > sddpath_query_device.out
datapath query essmap > sddpath_query_essmap.out
pcmpath query device > pcmpath_query_device.out
pcmpath query essmap > pcmpath_query_essmap.out
sddgetdata
sddpcmgetdata
snap -c # Package snap and other data
echo "Please rename /tmp/ibmsupt/snap.pax.Z after the"
echo "PMR number and ftp to IBM."
exit 0
Data collection for SAN Volume Controller by using the SAN Volume
Controller Console GUI
From the support panel that is shown in Figure 15-1, you can download support packages
that contain log files and information that can be sent to support personnel to help
troubleshoot the system. You can download individual log files or download statesaves, which
are dumps or livedumps of the system data.
2. In the Download Support Package window that opens (as shown in Figure 15-3), select
the log types that you want to download. The following download types are available:
Standard logs, which contain the most recent logs that were collected for the system.
These logs are the most commonly used by Support to diagnose and solve problems.
Standard logs plus one existing statesave, which contain the standard logs for the
system and the most recent statesave from any of the nodes in the system. Statesaves
are also known as dumps or livedumps.
Standard logs plus most recent statesave from each node, which contains the standard
logs for the system and the most recent statesaves from each node in the system.
Standard logs plus new statesaves, which generate new statesaves (livedumps) for all
nodes in the system, and package them with the most recent logs.
When you are collecting a support package for troubleshooting IBM Real-time
Compression-related issues, note that RACE diagnostic information is available only in a
statesave (also called a livedump); standard logs do not contain this information.
Depending on the problem symptom, download the appropriate support package that
contains the RACE diagnostic information. If the system observed a failure, select
Standard logs plus most recent statesave from each node; for all other diagnostic
purposes, select Standard logs plus new statesaves.
Then, click Download.
Action completion time: Depending on your choice, this action can take several
minutes to complete.
3. Select where you want to save these logs, as shown in Figure 15-4. Then, click OK.
Performance statistics: Any option that is used in the GUI (1 - 4), in addition to using the
CLI, collects the performance statistics files from all nodes in the cluster.
Data collection for SAN Volume Controller by using the SAN Volume
Controller CLI 4.x or later
Because the config node is always the SAN Volume Controller node with which you
communicate, you must copy all the data from the other nodes to the config node. To copy the
files, first run the svcinfo lsnode command to determine the non-config nodes.
Example 15-6 shows the output of this command.
Example 15-6 Determine the non-config nodes (output shortened for clarity)
IBM_2145:itsosvccl1:admin>svcinfo lsnode
id
name
WWNN
status
1
node1
50050768010037E5 online
2
node2
50050768010037DC online
IO_group_id
0
0
config_node
no
yes
The output in Example 15-6 shows that the node with ID 2 is the config node. Therefore, for
all nodes except the config node, you must run the svctask cpdumps command. No feedback
is provided for this command. Example 15-7 shows the command for the node with ID 1.
Example 15-7 Copying the dump files from the other nodes
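The command takes the following general form; the node ID (1) and the /dumps prefix shown
here are illustrative:

svctask cpdumps -prefix /dumps 1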
Attention: Dump files are large. Collect them only if you really need them.
Example 15-8 The svc_snap dumpall command
IBM_2145:itsosvccl1:admin>svc_snap dumpall
Collecting system information...
Copying files, please wait...
Copying files, please wait...
Dumping error log...
Waiting for file copying to complete...
Waiting for file copying to complete...
Waiting for file copying to complete...
Waiting for file copying to complete...
Creating snap package...
Snap data collected in /dumps/snap.104603.080815.160321.tgz
After the data collection by using the svc_snap dumpall command is complete, verify that the
new snap file appears in your 2145 dumps directory by using the svcinfo ls2145dumps
command, as shown in Example 15-9.
Example 15-9 The ls2145dumps command (output shortened for clarity)
IBM_2145:itsosvccl1:admin>svcinfo ls2145dumps
id 2145_filename
0  dump.104603.080801.161333
1  svc.config.cron.bak_node2
.
.
23 104603.trc
24 snap.104603.080815.160321.tgz
To copy the file from the SAN Volume Controller cluster, use secure copy (SCP). For more
information about the PuTTY SCP function, see Implementing the IBM System Storage SAN
Volume Controller V6.3, SG24-7933.
Livedump
SAN Volume Controller livedump is a procedure that IBM Support might ask clients to run for
problem investigation. You can generate it for all nodes from the GUI, as described in Data
collection for SAN Volume Controller by using the SAN Volume Controller Console GUI on
page 528. Alternatively, you can trigger it from the CLI; for example, on only one node of the
cluster.
Attention: Start the SAN Volume Controller livedump procedure only under the direction
of IBM Support.
Sometimes, investigations require a livedump from the configuration node in the SAN Volume
Controller cluster. A livedump is a lightweight dump from a node that can be taken without
affecting host I/O. The only effect is a slight reduction in system performance (because of
reduced memory that is available for the I/O cache) until the dump is finished.
Complete the following steps to perform a livedump:
1. Prepare the node for taking a livedump by running the following command:
svctask preplivedump <node id/name>
This command reserves the necessary system resources to take a livedump. The
operation can take some time because the node might have to flush data from the cache.
System performance might be slightly affected after you run this command because part
of the memory that is normally available to the cache is not available while the node is
prepared for a livedump.
After the command completes, the livedump is ready to be triggered, which you can see
by examining the output from the following command:
svcinfo lslivedump <node id/name>
The status must be reported as prepared.
2. Run the following command to trigger the livedump:
svctask triggerlivedump <node id/name>
This command completes when the data capture is complete but before the dump file is
written to disk.
3. Run the following command to query the status and copy the dump off when complete:
svcinfo lslivedump <nodeid/name>
The status is dumping when the file is being written to disk. The status is inactive after it
is completed. After the status returns to the inactive state, you can find the livedump file in
the /dumps folder on the node with a file name in the following format:
livedump.<panel_id>.<date>.<time>
You can then copy this file off the node as you copy a normal dump by using the GUI or
SCP. Then, upload the dump to IBM Support for analysis.
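Assembled into one place, the sequence looks like the following sketch, where node1 is an
illustrative node name:

svctask preplivedump node1
svcinfo lslivedump node1       (wait until the status is prepared)
svctask triggerlivedump node1
svcinfo lslivedump node1       (dumping while writing; inactive when done)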
2. In the Technical SupportSave dialog box (see Figure 15-6), select the switches that you
want to collect data for in the Available SAN Products table. Click the right arrow to move
them to the Selected Products and Hosts table. Then, click OK.
Data collection can take 20 - 30 minutes for each selected switch. This estimate can
increase depending on the number of switches selected.
3. To view and save the technical support information, click Monitor → Technical Support →
View Repository, as shown in Figure 15-8.
4. In the Technical Support Repository display (see Figure 15-9), click Save to store the data
on your system.
You find a User Action Event in the Master Log when the download was successful, as shown
in Figure 15-10.
Gathering data: You can gather technical data for M-EOS (McDATA SAN switches)
devices by using the Element Manager of the device.
IBM_2005_B5K_1:admin> supportSave
This command will collect RASLOG, TRACE, supportShow, core file, FFDC data
and other support information and then transfer them to a FTP/SCP server
or a USB device. This operation can take several minutes.
NOTE: supportSave will transfer existing trace dump file first, then
automatically generate and transfer latest one. There will be two trace dump
files transfered after this command.
OK to proceed? (yes, y, no, n): [no] y
Host IP or Host Name: 9.43.86.133
User Name: fos
Password:
Protocol (ftp or scp): ftp
Remote Directory: /
Saving support information for switch:IBM_2005_B5K_1,
..._files/IBM_2005_B5K_1-S0-200808132042-CONSOLE0.gz:
Saving support information for switch:IBM_2005_B5K_1,
...files/IBM_2005_B5K_1-S0-200808132042-RASLOG.ss.gz:
Saving support information for switch:IBM_2005_B5K_1,
...M_2005_B5K_1-S0-200808132042-old-tracedump.dmp.gz:
Saving support information for switch:IBM_2005_B5K_1,
...M_2005_B5K_1-S0-200808132042-new-tracedump.dmp.gz:
Saving support information for switch:IBM_2005_B5K_1,
...les/IBM_2005_B5K_1-S0-200808132042-ZONE_LOG.ss.gz:
Saving support information for switch:IBM_2005_B5K_1,
..._files/IBM_2005_B5K_1-S0-200808132044-CONSOLE1.gz:
Saving support information for switch:IBM_2005_B5K_1,
..._files/IBM_2005_B5K_1-S0-200808132044-sslog.ss.gz:
SupportSave completed
IBM_2005_B5K_1:admin>
module:CONSOLE0...    5.77 kB   156.68 kB/s
module:RASLOG...      38.79 kB  0.99 MB/s
module:TRACE_OLD...   239.58 kB 3.66 MB/s
module:TRACE_NEW...   1.04 MB   1.81 MB/s
module:ZONE_LOG...    51.84 kB  1.65 MB/s
module:RCS_LOG...     5.77 kB   175.18 kB/s
module:SSAVELOG...    1.87 kB   55.14 kB/s
3. In the Collect And Send Support Logs dialog box (see Figure 15-12 on page 538), click
Start to automatically collect the data and upload it to the predefined FTP server. If the
FTP server is not reachable, save the file locally and upload it manually to the Problem
Management Report (PMR) system.
When the collection process is complete, the file appears in the System Log File Name
panel, as shown in Figure 15-13.
4. Click Advanced, select the log, and click Get to save the file on your system, as shown in
Figure 15-13.
lssi
lsarray -l
lsrank
lsvolgrp
lsfbvol
lsioport -l
lshostconnect
The complete data collection task normally is performed by the IBM Service Support
Representative (IBM SSR) or the IBM Support center. The IBM product engineering (PE)
package includes all current configuration data and diagnostic data.
For more information about the latest flashes, concurrent code upgrades, code levels, and
matrixes, see the following SAN Volume Controller website:
http://www-947.ibm.com/support/entry/portal/Overview/Hardware/System_Storage/Storage_software/Storage_virtualization/SAN_Volume_Controller_%282145%29
After the hardware check, continue to check the following aspects of software setup:
Check that the HBA driver level and firmware level are at the preferred and supported
levels.
Check the multipathing driver level, and make sure that it is at the preferred and supported
level.
Check for link layer errors that are reported by the host or the SAN switch, which can
indicate a cabling or SFP failure.
Verify your SAN zoning configuration.
Check the general SAN switch status and health for all switches in the fabric.
Example 15-12 shows that one of the HBAs was experiencing a link failure because of a fiber
optic cable that was bent beyond its tolerance. After the cable was changed, the missing
paths reappeared.
Example 15-12 Output from datapath query device command after fiber optic cable change
The Recommended Actions panel shows event conditions that require actions and the
procedures to diagnose and fix them. The highest-priority event is indicated with information
about how long ago the event occurred. If an event is reported, you must select the event and
run a fix procedure.
Complete the following steps to retrieve properties and sense about a specific event:
1. Select an event in the table.
2. Click Properties in the Actions menu, as shown in Figure 15-16.
Tip: You can also obtain access to the Properties by right-clicking an event.
3. In the Properties and Sense Data for Event sequence_number window (see Figure 15-17,
where sequence_number is the sequence number of the event that you selected in the
previous step), review the information. Then, click Close.
Tip: From the Properties and Sense Data for Event window, you can use the Previous
and Next buttons to move between events.
You now return to the Recommended Actions panel.
Another common practice is to use the SAN Volume Controller CLI to find problems. The
following list of commands provides information about the status of your environment:
svctask detectmdisk
Discovers changes in the back-end storage configuration.
svcinfo lscluster clustername
Checks the SAN Volume Controller cluster status.
svcinfo lsnode nodeid
Checks the SAN Volume Controller nodes and port status.
svcinfo lscontroller controllerid
Checks the back-end storage status.
svcinfo lsmdisk
Provides a status of all the MDisks.
svcinfo lsmdisk mdiskid
Checks the status of a single MDisk.
svcinfo lsmdiskgrp
Provides a status of all the storage pools.
svcinfo lsmdiskgrp mdiskgrpid
Checks the status of a single storage pool.
svcinfo lsvdisk
Checks whether volumes are online.
Locating problems: Although the SAN Volume Controller raises error messages, most
problems are not caused by the SAN Volume Controller. Most problems are introduced by
the storage subsystems or the SAN.
If the problem is caused by the SAN Volume Controller and you are unable to fix it by using
the Recommended Action panel or the event log, collect the SAN Volume Controller debug
data as described in 15.2.2, SAN Volume Controller data collection on page 527.
To determine and fix other problems outside of SAN Volume Controller, consider the guidance
in the other sections in this chapter that are not related to SAN Volume Controller.
Software failures are more difficult to analyze. In most cases, you must collect data and
involve IBM Support. However, before you take any other steps, check the installed code level
for any known problems. Also, check whether a new code level is available that resolves the
problem that you are experiencing.
The most common SAN problems often are related to zoning; for example, choosing the
wrong WWPN for a host zone. Two SAN Volume Controller node ports must be zoned to one
HBA, with one port from each SAN Volume Controller node. However, in the incorrect zoning
that is shown in Example 15-13, the two zoned ports belong to the same node. The result is
that the host and its multipathing driver do not see all of the necessary paths.
Example 15-13 Incorrect WWPN zoning
zone:
Senegal_Win2k3_itsosvccl1_iogrp0_Zone
50:05:07:68:01:20:37:dc
50:05:07:68:01:40:37:dc
20:00:00:e0:8b:89:cc:c2
The correct zoning must look like the zoning that is shown in Example 15-14.
Example 15-14 Correct WWPN zoning
zone:
Senegal_Win2k3_itsosvccl1_iogrp0_Zone
50:05:07:68:01:40:37:e5
50:05:07:68:01:40:37:dc
20:00:00:e0:8b:89:cc:c2
The following SAN Volume Controller error codes are related to the SAN environment:
Error 1060 Fibre Channel ports are not operational.
Error 1220 A remote port is excluded.
If you cannot fix the problem with these actions, use the method that is described in 15.2.3,
SAN data collection on page 532, collect the SAN switch debugging data, and then contact
IBM Support for assistance.
IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0
id 0
controller_name controller0
WWNN 200400A0B8174431
mdisk_link_count 2
max_mdisk_link_count 4
degraded no
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 8
max_path_count 12
WWPN 200500A0B8174433
path_count 0
max_path_count 8
This imbalance has the following possible causes:
Another possible cause is that the WWPN with zero count is not visible to all the
SAN Volume Controller nodes through the SAN zoning or the LUN masking on the
storage subsystem.
Use the SAN Volume Controller CLI command svcinfo lsfabric 0 to confirm.
If you are unsure which of the attached MDisks has which corresponding LUN ID, run
the SAN Volume Controller svcinfo lsmdisk CLI command (see Example 15-16). This
command also shows to which storage subsystem a specific MDisk belongs (the
controller ID).
Example 15-16 Determining the ID for the MDisk
IBM_2145:itsosvccl1:admin>svcinfo lsmdisk
id name   status mode    mdisk_grp_id mdisk_grp_name capacity ctrl_LUN_#       controller_name UID
0  mdisk0 online managed 0            MDG-1          600.0GB  0000000000000000 controller0     600a0b800017423300000059469cf84500000000000000000000000000000000
2  mdisk2 online managed 0            MDG-1          70.9GB   0000000000000002 controller0     600a0b800017443100000096469cf0e800000000000000000000000000000000
In this case, the problem was with the LUN allocation across the DS4500 controllers.
After this allocation was fixed on the DS4500, a SAN Volume Controller MDisk rediscovery
fixed the problem from the SAN Volume Controller perspective.
Example 15-17 shows an equally distributed MDisk.
Example 15-17 Equally distributed MDisk on all available paths
IBM_2145:itsosvccl1:admin>svctask detectmdisk
IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0
id 0
controller_name controller0
WWNN 200400A0B8174431
mdisk_link_count 2
max_mdisk_link_count 4
degraded no
vendor_id IBM
product_id_low 1742-900
product_id_high
product_revision 0520
ctrl_s/n
WWPN 200400A0B8174433
path_count 4
max_path_count 12
WWPN 200500A0B8174433
path_count 4
max_path_count 8
In this example, the problem was solved by changing the LUN allocation. If this action does
not solve the problem in your case, continue with the next step.
Common error recovery steps by using the SAN Volume Controller CLI
For back-end SAN problems or storage problems, you can use the SAN Volume Controller
CLI to perform common error recovery steps.
Although the maintenance procedures perform these steps, it is sometimes faster to run
these commands directly through the CLI. Run these commands any time that you have the
following issues:
You experience a back-end storage issue (for example, error code 1370 or error code
1630).
You performed maintenance on the back-end storage subsystems.
Important: Run these commands whenever back-end storage is reconfigured or a zoning
change occurs to ensure that the SAN Volume Controller detects the changes.
Common error recovery involves the following SAN Volume Controller CLI commands:
svctask detectmdisk
Discovers the changes in the back end.
svcinfo lscontroller and svcinfo lsmdisk
Provides overall status of all controllers and MDisks.
svcinfo lscontroller controllerid
Checks the controller that was causing the problems and verifies that all the WWPNs are
listed as you expect.
svctask includemdisk mdiskid
For each degraded or offline MDisk.
svcinfo lsmdisk
Determines whether all MDisks are now online.
svcinfo lscontroller controllerid
Checks that the path_counts are distributed evenly across the WWPNs.
Finally, run the maintenance procedures on the SAN Volume Controller to fix every error.
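Put together, a typical recovery pass looks like the following sketch; the MDisk ID 5 and
controller ID 0 are illustrative:

svctask detectmdisk
svcinfo lsmdisk                (identify degraded or offline MDisks)
svctask includemdisk 5         (repeat for each degraded or offline MDisk)
svcinfo lsmdisk                (confirm that all MDisks are back online)
svcinfo lscontroller 0         (confirm that path_counts are evenly distributed)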
mdisk_end   0x0005FFFF
vdisk_start 0x00000000
vdisk_end   0x0000FFFF

mdisk_end
vdisk_start 0x00000000
vdisk_end   0x0000003F
Volume 0 is a fully allocated volume. Therefore, the MDisk LBA information is displayed as
shown in Example 15-18 on page 549.
Volume 14 is a thin-provisioned volume to which the host has not yet performed any I/O. All of
its extents are unallocated. Therefore, the only information that is shown by the lsmdisklba
command is that it is unallocated and that this thin-provisioned grain starts at LBA 0x00 and
ends at 0x3F (the grain size is 32 KB).
LABEL:           SC_DISK_ERR2
IDENTIFIER:      B6267342
Date/Time:       Thu Aug 5 10:49:35 2008
Sequence Number: 4334
Machine Id:      00C91D3B4C00
Node Id:         testnode
Class:           H
Type:            PERM
Resource Name:   hdisk34
Resource Class:  disk
Resource Type:   2145
Location:        U7879.001.DQDFLVP-P1-C1-T1-W5005076801401FEF-L4000000000000
VPD:
        Manufacturer................IBM
        Machine Type and Model......2145
        ROS Level and ID............0000
        Device Specific.(Z0)........0000043268101002
        Device Specific.(Z1)........0200604
        Serial Number...............60050768018100FF78000000000000F6
SENSE DATA
0A00 2800 001C ED00 0000 0104 0000 0000 0000 0000 0000 0000 0102 0000 F000 0300
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0800 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000
: Node7
: mdisk
: 48
: 7073
: 7073
First Error Timestamp : Thu Jul 24 17:44:13 2008
                      : Epoch + 1219599853
Last Error Timestamp  : Thu Jul 24 17:44:13 2008
                      : Epoch + 1219599853
Error Count : 21
Error ID    : 10025 : A media error has occurred during I/O to a Managed Disk
Error Code  : 1320 : Disk I/O medium error
Status Flag : FIXED
Type Flag   : TRANSIENT ERROR
40 6D 04 02 00 00 00 00 11 80 02 03 00 00 00 00
40 00 00 11 00 00 00 00 02 00 02 0B 00 00 00 00
00 40 00 80 00 00 00 0B 00 00 00 6D 00 00 00 00
00 00 00 59 00 00 00 00 00 00 00 58 00 00 00 00
00 00 00 00 00 00 00 04 00 00 01 00 00 00 00 00
00 00 0A 00 00 00 00 00 02 00 00 00 00 00 00 00
28 00 00 08 00 00 00 10 00 00 80 00 00 00 00 00
58 80 00 C0 00 00 00 02 59 00 00 AA 00 00 00 01
Part 4. Practical examples

This part shows practical examples of typical procedures that use the preferred practices that
are highlighted in this IBM Redbooks publication. Some of the examples were taken from
actual cases in production environments, and some were run in IBM laboratories.
Chapter 16. SAN Volume Controller scenarios

This chapter includes the following topics:
SAN Volume Controller upgrade with CF8 nodes and internal solid-state drives
Handling Stuck SAN Volume Controller Code Upgrades
Moving an AIX server
Migrating to a new SAN Volume Controller by using Copy Services
SAN Volume Controller scripting
Migrating AIX cluster volumes off DS4700
Easy Tier and FlashSystem planned outages
Changing LUN ID presented to a VMware ESXi host
Complete the following steps to upgrade the SAN Volume Controller code from V5 to V6.2:
1. Complete the steps that are described in 14.4.1, Preparing for the upgrade on page 498.
Check the attached servers, SAN switches, and storage controllers for errors. Define the
current and target SAN Volume Controller code levels, which in this case are 5.1.0.8 and
6.2.0.2.
2. From the IBM Storage Support website, download the following software:
Figure 16-1 Upload SAN Volume Controller Upgrade Test Utility version 6.6
4. In the File to Upload field that is in the File Upload pane (on the right side of Figure 16-1),
select the SAN Volume Controller Upgrade Test Utility. Click OK to copy the file to the
cluster. Point the target version to SAN Volume Controller code release 5.1.0.10. Fix any
errors that the Upgrade Test Utility finds before you proceed.
Important: Before you proceed, ensure that all servers that are attached to this SAN
Volume Controller include compatible multipath software versions. You must also
ensure that, for each server, the redundant disk paths are working error free. In
addition, you must have a clean exit from the SAN Volume Controller Upgrade Test
Utility.
5. Install SAN Volume Controller Code release 5.1.0.10 in the cluster.
6. In the Software Upgrade Status window (see Figure 16-2 on page 558), click Check
Upgrade Status to monitor the upgrade progress.
557
Figure 16-2 SAN Volume Controller Code upgrade status monitor by using the GUI
Example 16-2 shows how to monitor the upgrade by using the CLI.
Example 16-2 Monitoring the SAN Volume Controller code upgrade by using the CLI
IBM_2145:svccf8:admin>svcinfo lssoftwareupgradestatus
status
upgrading
IBM_2145:svccf8:admin>
7. After the upgrade to SAN Volume Controller code release 5.1.0.10 is completed, check the
SAN Volume Controller cluster again for any possible errors as a precaution.
8. Migrate the existing VDisks from the existing SSD managed disk group. Example 16-3
shows a simple approach that uses the migratevdisk command.
Example 16-3 Migrating SAN Volume Controller VDisk by using the migratevdisk command
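A migration of this type takes the following general form; the volume and target pool names
are taken from this scenario, and the thread count is illustrative:

svctask migratevdisk -mdiskgrp MDG4DS8KL3331 -threads 4 -vdisk NYBIXTDB02_T03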
Example 16-4 SAN Volume Controller VDisk migration by using VDisk mirror copy
IBM_2145:svccf8:admin>svctask chiogrp -feature mirror -size 1 io_grp0
IBM_2145:svccf8:admin>svctask addvdiskcopy -mdiskgrp MDG4DS8KL3331 -syncrate 55 NYBIXTDB02_T03
Vdisk [0] copy [1] successfully created
IBM_2145:svccf8:admin>svcinfo lsvdisk NYBIXTDB02_T03
id 0
name NYBIXTDB02_T03
IO_group_id 0
IO_group_name io_grp0
status online
mdisk_grp_id many
mdisk_grp_name many
capacity 20.00GB
type many
formatted no
mdisk_id many
mdisk_name many
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018205E12000000000000000
throttling 0
preferred_node_id 2
fast_write_state empty
cache readwrite
udid 0
fc_map_count 0
sync_rate 55
copy_count 2
copy_id 0
status online
sync yes
primary yes
mdisk_grp_id 2
mdisk_grp_name MDG3SVCCF8SSD
type striped
mdisk_id
mdisk_name
fast_write_state empty
used_capacity 20.00GB
real_capacity 20.00GB
free_capacity 0.00MB
overallocation 100
autoexpand
warning
grainsize
copy_id 1
status online
sync no
primary no
mdisk_grp_id 3
mdisk_grp_name MDG4DS8KL3331
type striped
mdisk_id
mdisk_name
fast_write_state empty
used_capacity 20.00GB
Chapter 16. SAN Volume Controller scenarios
559
real_capacity 20.00GB
free_capacity 0.00MB
overallocation 100
autoexpand
warning
grainsize
IBM_2145:svccf8:admin>
IBM_2145:svccf8:admin>svctask addvdiskcopy -mdiskgrp MDG4DS8KL3331 -syncrate 75 NYBIXTDB02_T03
Vdisk [0] copy [1] successfully created
IBM_2145:svccf8:admin>svcinfo lsvdiskcopy
vdisk_id vdisk_name     copy_id status sync primary mdisk_grp_id mdisk_grp_name capacity type
0        NYBIXTDB02_T03 0       online yes  yes     2            MDG3SVCCF8SSD  20.00GB  striped
0        NYBIXTDB02_T03 1       online no   no      3            MDG4DS8KL3331  20.00GB  striped
IBM_2145:svccf8:admin>
IBM_2145:svccf8:admin>svcinfo lsvdiskcopy
vdisk_id vdisk_name     copy_id status sync primary mdisk_grp_id mdisk_grp_name capacity type
0        NYBIXTDB02_T03 0       online yes  yes     2            MDG3SVCCF8SSD  20.00GB  striped
0        NYBIXTDB02_T03 1       online yes  no      3            MDG4DS8KL3331  20.00GB  striped
IBM_2145:svccf8:admin>
IBM_2145:svccf8:admin>svctask rmvdiskcopy -copy 0 NYBIXTDB02_T03
IBM_2145:svccf8:admin>svcinfo lsvdisk NYBIXTDB02_T03
id 0
name NYBIXTDB02_T03
IO_group_id 0
IO_group_name io_grp0
status online
mdisk_grp_id 3
mdisk_grp_name MDG4DS8KL3331
capacity 20.00GB
type striped
formatted no
mdisk_id
mdisk_name
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018205E12000000000000000
throttling 0
preferred_node_id 2
fast_write_state empty
cache readwrite
udid 0
fc_map_count 0
sync_rate 75
copy_count 1
copy_id 1
status online
sync yes
primary yes
mdisk_grp_id 3
mdisk_grp_name MDG4DS8KL3331
type striped
mdisk_id
mdisk_name
fast_write_state empty
used_capacity 20.00GB
real_capacity 20.00GB
free_capacity 0.00MB
overallocation 100
autoexpand
warning
grainsize
IBM_2145:svccf8:admin>
9. Remove the SSDs from their managed disk group. If you run the svcupgradetest
command before you remove the SSDs, errors are still returned, as shown in
Example 16-5. Because we planned to no longer use the managed disk group, it also was
removed.
Example 16-5 SAN Volume Controller internal SSDs placed into an unmanaged state
IBM_2145:svccf8:admin>svcupgradetest -v 6.2.0.2 -d
svcupgradetest version 6.6
Please wait while the tool tests for issues that may prevent
a software upgrade from completing successfully. The test may
take several minutes to complete.
Checking 34 mdisks:
******************** Error found ********************
The requested upgrade from 5.1.0.10 to 6.2.0.2 cannot
be completed as there are internal SSDs are in use.
Please refer to the following flash:
http://www.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003707
11.In the Software Upgrade Status window (see Figure 16-3), click Check Upgrade Status
to monitor the upgrade progress. Notice that the GUI changes its appearance during the
upgrade.
Figure 16-5 SAN Volume Controller cluster running SAN Volume Controller code release 6.2.0.2
12.After the upgrade is complete, click Launch Management GUI (see Figure 16-5) to
restart the management GUI.
The management GUI now runs in one SAN Volume Controller node instead of the SAN
Volume Controller console, as shown in Figure 16-6.
From the GUI home page (see Figure 16-7), click Physical Storage → Internal. Then, on
the Internal page, click Configure Storage in the upper left corner of the right pane.
15.Because two drives are unused, click Yes to continue when you are prompted about
whether to include them in the configuration, as shown in Figure 16-8.
Figure 16-9 shows the progress as the drives are marked as candidates.
16.Complete the following steps in the Configure Internal Storage window (see Figure 16-10):
a. Select a RAID preset for the SSDs. For more information, see Table 14-2 on page 506.
b. Confirm the number of SSDs (see Figure 16-11 on page 566) and the RAID preset.
c. Click Next.
17.Select the storage pool (former managed disk group) to include the SSDs, as shown in
Figure 16-12. Click Finish.
18.In the Create RAID Arrays window (see Figure 16-13 on page 567), review the status.
When the task is completed, click Close.
The SAN Volume Controller now continues the SSD array initialization process and places the
Easy Tier function of this pool in the Active state, collecting I/O data to determine which
VDisk extents to migrate to the SSDs. You can monitor the array initialization progress in the
lower right corner of the Tasks panel, as shown in Figure 16-14.
The upgrade is finished. If you did not do so yet, plan your next steps to fine-tune the Easy
Tier function. If you do not have any other SAN Volume Controller clusters that are running
SAN Volume Controller code V5.1 or earlier, you can install SAN Volume Controller Console
code V6.
Example 16-6 Commands to move the AIX server to another pSeries LPAR
###
### Verify that both old and new HBA WWPNs are logged in both fabrics:
### Here an example in one fabric
###
b32sw1_B64:admin> nodefind 10:00:00:00:C9:59:9F:6C
Local:
 Type Pid    COS     PortName                NodeName                 SCR
 N    401000;   2,3;10:00:00:00:c9:59:9f:6c;20:00:00:00:c9:59:9f:6c; 3
    Fabric Port Name: 20:10:00:05:1e:04:16:a9
    Permanent Port Name: 10:00:00:00:c9:59:9f:6c
    Device type: Physical Unknown(initiator/target)
    Port Index: 16
    Share Area: No
    Device Shared in Other AD: No
    Redirect: No
    Partial: No
    Aliases: nybixpdb01_fcs0
b32sw1_B64:admin> nodefind 10:00:00:00:C9:99:56:DA
Remote:
 Type Pid    COS     PortName                NodeName
 N    4d2a00;   2,3;10:00:00:00:c9:99:56:da;20:00:00:00:c9:99:56:da;
    Fabric Port Name: 20:2a:00:05:1e:06:d0:82
    Permanent Port Name: 10:00:00:00:c9:99:56:da
    Device type: Physical Unknown(initiator/target)
    Port Index: 42
    Share Area: No
    Device Shared in Other AD: No
    Redirect: No
    Partial: No
    Aliases:
b32sw1_B64:admin>
###
### Cross check SVC for HBA WWPNs and LUN IDs
###
IBM_2145:VIGSVC1:admin>
IBM_2145:VIGSVC1:admin>svcinfo lshost nybixpdb01
id 20
name nybixpdb01
port_count 2
type generic
mask 1111
iogrp_count 1
WWPN 10000000C9599F6C
node_logged_in_count 2
state active
WWPN 10000000C9594026
node_logged_in_count 2
state active
IBM_2145:VIGSVC1:admin>svcinfo lshostvdiskmap nybixpdb01
id name       SCSI_id vdisk_id vdisk_name     wwpn             vdisk_UID
20 nybixpdb01 0       47       nybixpdb01_d01 10000000C9599F6C 60050768019001277000000000000030
20 nybixpdb01 1       48       nybixpdb01_d02 10000000C9599F6C 60050768019001277000000000000031
20 nybixpdb01 2       119      nybixpdb01_d03 10000000C9599F6C 60050768019001277000000000000146
20 nybixpdb01 3       118      nybixpdb01_d04 10000000C9599F6C 60050768019001277000000000000147
20 nybixpdb01 4       243      nybixpdb01_d05 10000000C9599F6C 60050768019001277000000000000148
20 nybixpdb01 5       244      nybixpdb01_d06 10000000C9599F6C 60050768019001277000000000000149
20 nybixpdb01 6       245      nybixpdb01_d07 10000000C9599F6C 6005076801900127700000000000014A
20 nybixpdb01 7       246      nybixpdb01_d08 10000000C9599F6C 6005076801900127700000000000014B
IBM_2145:VIGSVC1:admin>
###
### At this point both the old and new servers were brought down.
### As such, the HBAs would not be logged into the SAN fabrics, hence the use of the -force parameter.
### For the same reason, it makes no difference which update is made first - SAN zones or SVC host definitions
###
svctask addhostport -hbawwpn 10000000C99956DA -force nybixpdb01
svctask addhostport -hbawwpn 10000000C9994E98 -force nybixpdb01
svctask rmhostport -hbawwpn 10000000C9599F6C -force nybixpdb01
vdisk_name     wwpn             vdisk_UID
nybixpdb01_d01 10000000C9994E98 60050768019001277000000000000030
nybixpdb01_d02 10000000C9994E98 60050768019001277000000000000031
nybixpdb01_d03 10000000C9994E98 60050768019001277000000000000146
nybixpdb01_d04 10000000C9994E98 60050768019001277000000000000147
nybixpdb01_d05 10000000C9994E98 60050768019001277000000000000148
nybixpdb01_d06 10000000C9994E98 60050768019001277000000000000149
nybixpdb01_d07 10000000C9994E98 6005076801900127700000000000014A
nybixpdb01_d08 10000000C9994E98 6005076801900127700000000000014B
After the new LPAR shows both of its HBAs as active, you can confirm that it recognizes all
SAN disks that were previously assigned and that they all have healthy disk paths.
The initial configuration was the typical SAN Volume Controller environment with a 2-node
cluster, a DS8000 series as a back-end storage controller, and servers that are attached
through redundant, independent SAN fabrics, as shown in Figure 16-15.
By using SAN Volume Controller Copy Services to move the data from the old infrastructure
to the new infrastructure, you can do so with the production servers and applications still
running. You can also fine-tune the replication speed as you attempt to achieve the fastest
possible migration without causing any noticeable performance degradation.
This scenario requires a brief, planned outage to restart each server from one infrastructure to
the other. Alternatives exist to perform this move fully online. However, in our case, we
had a pre-scheduled maintenance window every weekend and kept an intact copy of the
servers' data before the move, which allows a quick backout if required.
The new infrastructure is installed and configured with the new SAN switches that are
attached to the existing SAN fabrics (preferably by using trunks for bandwidth) and the new
SAN Volume Controller ready to use, as shown in Figure 16-16 on page 572.
Also, the necessary SAN zoning configuration is made between the initial and the new SAN
Volume Controller clusters, and a remote copy partnership is established between them
(notice the -bandwidth parameter). Then, for each VDisk in use by the production server, we
created a target VDisk in the new environment with the same size and a remote copy
relationship between these VDisks. We included this relationship in a consistency group.
The initial VDisk synchronization was then started. The copies took some time to become
synchronized, considering the large amount of data and the bandwidth parameter, which
stayed at its default value as a precaution.
Example 16-7 shows the SAN Volume Controller commands to set up the remote copy
relationship.
Example 16-7 SAN Volume Controller commands to set up a remote copy relationship
SVC commands used in this phase:
# lscluster
# mkpartnership -bandwidth <bw> <svcpartnercluster>
# mkvdisk -mdiskgrp <mdg> -size <sz> -unit gb -iogrp <iogrp> -vtype striped -node <node> -name <targetvdisk> -easytier off
# mkrcconsistgrp -name <cgname> -cluster <svcpartnercluster>
# mkrcrelationship -master <sourcevdisk> -aux <targetvdisk> -name <rlname> -consistgrp <cgname> -cluster <svcpartnercluster>
# startrcconsistgrp -primary master <cgname>
# chpartnership -bandwidth <newbw> <svcpartnercluster>
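For example, with hypothetical names (partner cluster SVC_NEW, target pool NEW_MDG,
consistency group MIG_CG1, and a 100 GB source VDisk nybixpdb01_d01; the mkvdisk
command runs on the new cluster), the sequence might look as follows. Adjust all names,
sizes, and bandwidth values to your environment:
# mkpartnership -bandwidth 50 SVC_NEW
# mkvdisk -mdiskgrp NEW_MDG -size 100 -unit gb -iogrp 0 -vtype striped -name nybixpdb01_d01_new -easytier off
# mkrcconsistgrp -name MIG_CG1 -cluster SVC_NEW
# mkrcrelationship -master nybixpdb01_d01 -aux nybixpdb01_d01_new -name MIG_RL01 -consistgrp MIG_CG1 -cluster SVC_NEW
# startrcconsistgrp -primary master MIG_CG1
# chpartnership -bandwidth 200 SVC_NEW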
Figure 16-17 shows the initial remote copy relationship setup that results from successful
completion of the commands.
Figure 16-17 Initial SAN Volume Controller remote copy relationship setup
After the initial synchronization finished, a planned outage was scheduled to reconfigure the
server to use the new SAN Volume Controller infrastructure. Figure 16-18 shows what
happened in the planned outage. The I/O from the production server is quiesced and the
replication session is stopped.
Figure 16-18 Planned outage to switch over to the new SAN Volume Controller
The next step is to move the fiber connections, as shown in Figure 16-19.
With the server reconfigured, the application is restarted, as shown in Figure 16-20.
After some time for testing, the remote copy session is removed and the move to the new
environment is completed, as shown in Figure 16-21.
Figure 16-21 Removing remote copy relationships and reclaiming old space (backup copy)
The private key for authentication (for example, icat.ppk), which is the private key that
you created. To set this parameter, select Connection → SSH → Auth in the left pane of
the PuTTY Configuration window, as shown in Figure 16-23.
The IP address of the SAN Volume Controller cluster. To set this parameter, select
Session at the top of the left pane of the PuTTY Configuration window, as shown in
Figure 16-24.
When you are specifying the basic options for your PuTTY session, you need the following
information:
A session name, which in this example is redbook_CF8.
The PuTTY version, which is 0.61.
To use the predefined PuTTY session, use the following syntax:
plink redbook_CF8
If you do not use a predefined PuTTY session, use the following syntax:
plink admin@<your cluster ip address> -i "C:\DirectoryPath\KeyName.PPK"
Example 16-8 shows a script to restart Global Mirror relationships and groups.
Example 16-8 Restarting Global Mirror relationships and groups
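A minimal sketch of such a script, assuming the predefined PuTTY session redbook_CF8 and
hypothetical object names (GM_CG1 for a consistency group, GM_REL1 for a stand-alone
relationship), follows:
rem List Global Mirror consistency groups and their states
plink redbook_CF8 "svcinfo lsrcconsistgrp -delim :"
rem Restart a stopped consistency group, resynchronizing from the master copies
plink redbook_CF8 "svctask startrcconsistgrp -primary master -force GM_CG1"
rem Restart a stand-alone stopped relationship
plink redbook_CF8 "svctask startrcrelationship -primary master -force GM_REL1"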
16.6.1 Preparation
In the initial configuration, both AIX cluster nodes (aix01 and aix02) use only storage that is
exported from DS4700 volumes. In this phase, you must document the environment to
re-create it in the final configuration. Because some applications might be sensitive to device
names and LUN mappings, this scenario preserves both device names and LUN IDs after the
migration. This process reduces the possibility that issues emerge after the storage is
migrated from the DS4700 system to Storwize V7000.
During this phase of the migration, you should gather and record the following information
about each volume that is presented by the DS4700 system to each AIX host:
This information makes it easier for you to successfully re-create the configuration after the
storage is migrated. A configuration might be asymmetrical; for example, a specific volume on
DS4700 storage might appear as hdiskX on one host and as hdiskY on the second host.
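For example, the following AIX commands can be used on each host to record the device
names, physical volume IDs, and volume identifiers (hdisk4 is a hypothetical device name):
# List all disk devices and their location codes
lsdev -Cc disk
# List physical volumes, PVIDs, and volume group membership
lspv
# Record the vital product data (including the serial number) of one disk
lscfg -vpl hdisk4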
Also, you should record the following information for the purposes of zoning definitions in the
SAN environment:
WWPNs of HBAs that are used on the AIX hosts and their aliases in the SAN configuration
Name of zones that contain AIX hosts and DS4700 controller ports
WWPNs of DS4700 controller ports
WWPNs of Storwize V7000 controller ports
Complete the following steps to define an empty storage pool on Storwize V7000 into which
you import the image mode volumes:
1. In the Storwize V7000 GUI, click Pools → MDisks by Pools and then click New Pool.
2. Enter the wanted storage pool name (for example, aix_cluster_img).
3. Click Next and then click Create.
4. Click OK to confirm that you want to create an empty storage pool.
If you plan to use an existing storage pool to host the volumes for the AIX cluster, make sure
that it has sufficient free space. Alternatively, define a new storage pool into which the
volumes are migrated.
If you do not have unused licensing capacity for external systems, in the Storwize V7000 GUI,
click Settings → General and then click Licensing. In the External Virtualization field,
increase the value by 1. With this setting, you can attach the DS4700 storage to the Storwize
V7000 system. Since version 6.2 of SAN Volume Controller code, you can exceed the
licensed virtualization entitlement for 45 days from the installation date to migrate data into
the new Storwize V7000 system.
Install the SDDPCM drivers and the host attachment kit on the AIX hosts. They are needed
for the operating system to communicate with the Storwize V7000 storage system.
Additionally, verify that your environment meets all the requirements, including versions of
HBA firmware, FC switch firmware, OS, and driver levels. Update components as required to
run a supported combination.
Before you proceed, ensure that you have the current and verified backups of your
environment. Whenever you change your storage configuration, it is always a good practice to
have a current backup that you know you can restore in case of emergency. You can also
make a copy of the rootvg volume group by running the alt_disk_copy command, which
provides you with a quick recovery path if there are unexpected problems.
To migrate volumes from DS4700 to Storwize V7000, you must zone these systems. Follow
the guidelines concerning external storage configuration requirements that are available at
this website:
http://pic.dhe.ibm.com/infocenter/storwize/ic/index.jsp?topic=%2Fcom.ibm.storwize.
v7000.doc%2Fsvc_configdiskcontrollersovr_22n9uf.html
For more information about zoning requirements, see this website:
http://pic.dhe.ibm.com/infocenter/storwize/ic/index.jsp?topic=%2Fcom.ibm.storwize.
v7000.doc%2Fsvc_configrules_21rdwo.html
In the DS4700 Storage Manager GUI, define a new storage partition and set the host type to
IBM TS SAN VCE. Add the Storwize V7000 controller WWPNs to the definition of the storage
partition.
In the Storwize V7000 GUI, click Hosts → Hosts and then click New Host. Define the AIX
hosts by using the HBA WWPNs that you gathered in 16.6.1, "Preparation" on page 580.
After the DS4700 and Storwize V7000 are zoned, open the Storwize V7000 GUI, click
Pools → External Storage, and then click Detect MDisks. You should see a new storage
controller detected. At this time, it does not present any MDisks to the Storwize V7000.
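The same detection can also be done from the CLI; a sketch:
# Rescan the Fibre Channel network for back-end controllers and MDisks
svctask detectmdisk
# Confirm that the DS4700 appears as a controller
svcinfo lscontroller
# List any newly visible unmanaged MDisks
svcinfo lsmdisk -filtervalue mode=unmanaged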
None of the tasks up to this point require any downtime; they can be done in a working
environment.
c. After you present the volume to the AIX host, run the cfgmgr command on the host. The
operating system creates a single hdisk device at a time, which gives you full control
over the mapping of volumes to hdisks. Perform this step for each volume that you need
to present to the host, in the order that gives the wanted hdisk names. Make sure that
you use the LUN IDs that you gathered in 16.6.1, "Preparation" on page 580 when
volumes are presented to the hosts.
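For example, after each volume is presented (the resulting hdisk name depends on the
order of discovery):
# Discover the newly presented volume; present one volume at a time
# to keep full control over hdisk naming
cfgmgr
# Verify that the new device appears with the expected name
lsdev -Cc disk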
The data migration is done by creating a mirror copy of the imported image mode volume in
the target storage pool. By using this approach, you can control the speed of data replication
and therefore prevent overloading the DS4700 storage system, which must handle the
additional I/O that is generated by the migration process.
After the volume mirroring process is complete, you delete the copy that is on the image
mode volume that was imported from the DS4700 storage system.
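The same migration can also be driven from the CLI; a minimal sketch, assuming a
hypothetical imported volume aix_vol01 and a target pool aix_cluster_pool:
# Add a second copy of the image mode volume in the target pool
svctask addvdiskcopy -mdiskgrp aix_cluster_pool aix_vol01
# Throttle the synchronization rate (1 - 100) to protect the DS4700
svctask chvdisk -syncrate 50 aix_vol01
# Monitor the synchronization progress
svcinfo lsvdisksyncprogress aix_vol01
# After synchronization completes, remove the original image mode copy (copy 0)
svctask rmvdiskcopy -copy 0 aix_vol01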
Complete the following steps to create a mirrored copy of the volume:
1. In the Storwize V7000 GUI, click Volumes → Volumes and then right-click an image mode
volume that was imported from the DS4700 storage.
2. From the menu, choose Volume Copy Actions → Add Mirrored Copy.
A window opens in which you can choose the destination storage pool. Choose the target
pool and click Add Copy.
3. In the Running Tasks window, a new task appears that is called Volume Synchronization. If
you click this task, you see the name of the volume that is copied, the name of the copy
(for example, copy 1), and the estimated time until the replication operation completes.
Repeat these steps for all image mode volumes that were imported from the DS4700
system.
When the task completes, complete the following steps:
1. In the Storwize V7000 GUI, click Pools → Volumes by Pool.
2. Right-click the newly created copy of the volume (for example, copy 1) and click Make
Primary.
3. Right-click the copy of the volume that is on the image mode volume that was imported
from the DS4700 and click Delete this Copy. Confirm your choice by clicking Yes.
The migration process is complete.
In the Storwize V7000 GUI, click Pools → External Storage and click Detect MDisks.
There should be no external controllers visible. The migration is complete.
If you increased the number of licensed External Virtualization units (as described in section
16.6.1, Preparation on page 580), remember to return it to the original value.
Important: Verify where the Easy Tier extents were placed before you shut down and
perform maintenance. After the maintenance process, manually verify that all Easy Tier
extents were placed back in their original location. If there are other environmental
workload changes, the extents might not be placed back in the same location.
The volume must be unpresented from the VMware ESXi hosts before its LUN ID can be
changed. The following prerequisites must be met before a LUN can be unpresented
from a VMware ESXi host:
All objects (virtual machines, snapshots, templates, CD/DVD images, and so on) must be
unregistered or removed from the datastore.
The datastore cannot be used for vSphere HA heartbeat.
The datastore cannot be a part of a datastore cluster.
The datastore cannot be managed by Storage DRS.
No process that is running on the ESXi host can access the LUN.
The LUN cannot be used as the persistent scratch location for the host.
If the LUN is used as RDM storage by a virtual machine, delete this LUN from the
configuration of the virtual machine. This action removes the mapping of the LUN to the
virtual machine but preserves the contents of the LUN.
Note: You must coordinate the change of the virtual machine configuration with the
administrator of the operating system of the virtual machine.
To unpresent a datastore from a host, no virtual machine that is registered on the datastore
can be running on that host. It is important to understand the scope of this requirement: it
applies only to the host from which the datastore is unpresented, so virtual machines that run
on other hosts can continue to use the datastore. In our scenario, server_00 and server_01
have their virtual disks on datastore_1, which must be unpresented from host esxi01 to have
its LUN ID changed. However, server_01 is running on host esxi02, so no change in its
configuration is needed.
Because server_00 is running on host esxi01, you must move it to another host by using
VMware vMotion. This operation results in no downtime and server_00 can be moved back
after the LUN ID change operation is completed.
Complete the following steps to check whether and what objects are on the datastore
datastore_1 and to unmount the datastore:
1. In the vSphere Client, switch the view by clicking Home → Inventory → Datastores and
Datastore Clusters. As shown in Figure 16-27 on page 587, there are two servers
present on the datastore.
2. Use VMware vMotion to migrate server_00 to host esxi02. Right-click the server_00 entry
and choose Migrate (see Figure 16-28).
3. Set host esxi02 as the destination host for the virtual machine. After the migration
completes, the environment is ready for other operations.
4. In the vSphere Client, switch to the Configuration tab of the esxi01 host and click the
Storage Adapters entry.
In the lower right pane, you can see that LUN ID 3 is used to access a device with the
identifier naa.60050768028a02b9300000000000001c. This is the datastore with the
incorrectly assigned LUN ID (see Figure 16-29). This information is available if the
Devices option is clicked, as indicated by the arrow in Figure 16-29.
5. Switch to the Storage view, right-click the datastore_1 entry, and click Unmount, as
shown in Figure 16-30.
6. A window opens in which the unmount prerequisites are listed. If all of the prerequisites
are met, click OK to confirm unmounting the datastore.
7. Switch to the Storage view and in the lower right pane, click Devices. In this view,
right-click the device identifier and click Detach, as shown Figure 16-31. A window opens
in which the prerequisites are listed. If all the prerequisites are met, click OK to confirm
detaching the device.
Complete the following steps to remove the mapping of the LUN on the Storwize family
storage device:
1. In the Storwize family storage device GUI, click Volumes → Volumes by Pool, right-click
the volume, and click Map to Host. Choose esxi01 from the pull-down menu.
2. In the right pane, select esxi_datastore_1 and click Unmap, as shown in Figure 16-32 on
page 590.
3. Click Map Volumes. The volume is unpresented from the esxi01 host.
4. In the vSphere Client, switch to the Configuration tab of the ESXi host, click Storage
Adapters and run the Rescan task on the storage adapters. The device is removed from
the Storage Adapters view, as shown in Figure 16-33.
Complete the following steps to map the volume to the ESXi host with the correct LUN ID:
1. In the Storwize family storage device GUI, click Volumes → Volumes by Pool, right-click
the volume, and click Map to Host.
2. Choose esxi01 from the pull-down menu. The volume is mapped to the host. Make sure
that the correct LUN ID (ID 2) is set in the right pane (as shown in Figure 16-34) and click
Map Volumes.
3. In the vSphere Client, switch to the Configuration tab of the esxi01 host, click Storage
Adapters and run the Rescan task on the storage adapters. The device is added to the
Storage Adapters view.
Because the detached state is persistent, you must right-click the device and click Attach.
Similarly, the unmounted state of the datastore is persistent. You must switch to the
Configuration tab of the esxi01 host, click Storage and then click Rescan All to detect the
datastore.
4. Right-click the datastore that is added again and then click Mount, as shown in
Figure 16-35.
The procedure is completed. The LUN IDs of volumes that are presented to the hosts are
consistent across all VMware ESXi hosts in the cluster.
Chapter 17. IBM Real-time Compression
17.1 Overview
With the current trend of data growth in the IT industry and ongoing economic turmoil, there is
an immediate need for technologies that optimize and reduce the amount of data that is
written to disk storage and thereby reduce costs. Most traditional data optimization methods
use post-process compression, which means that the optimization is done on data sets after
they are stored on disk, and is therefore less effective.
Contrary to conventional methods, IBM Real-time Compression is an inline data compression
technology that performs real-time compression of active primary data before it is written to
the disk storage without affecting performance. IBM Real-time Compression technology is
embedded into IBM System Storage SAN Volume Controller and IBM Storwize V7000
Software stack, starting with SAN Volume Controller version 6.4 and is based on the proven
Random-Access Compression Engine (RACE).
This chapter outlines the preferred practices to follow when IBM Real-time Compression is
used with IBM System Storage SAN Volume Controller and Storwize V7000 systems, which
enables customers to enjoy the compression savings IBM Real-time Compression technology
offers.
There are many IBM Redbooks publications about IBM Real-time Compression in SAN
Volume Controller and IBM Storwize V7000, including the following publications:
Real-time Compression in SAN Volume Controller and Storwize V7000, REDP-4859
Implementing IBM Real-time Compression in SAN Volume Controller and IBM Storwize
V7000, TIPS1083
These books cover many aspects of implementing compression. This chapter complements
those publications and provides details to reflect the compression enhancements in SAN
Volume Controller version 7.2.
Note: Comprestimator can run for a long period (a few hours) when it is scanning a
relatively empty device. The utility randomly selects and reads 256 KB samples from
the device. If the sample is empty (that is, full of null values), it is skipped. A minimum
number of samples with actual data are required to provide an accurate estimation.
When a device is mostly empty, many random samples are empty. As a result, the utility
runs for a longer time as it tries to gather enough non-empty samples that are required
for an accurate estimate. If the number of empty samples is over 95%, the scan is
stopped.
Use Table 17-1 thresholds for volume compressibility to determine whether to compress a
volume.
Table 17-1 To compress or not
Data Compression Rate
Recommendation
Use compression
This setup is not ideal because CPU and memory resources are dedicated for compression
use in all four nodes; however, in nodes 3 and 4, this allocation serves only 20 volumes out of
a total of 200 compressed volumes. One of the following preferred practices should be used
in this scenario:
Alternative 1: Migrate all compressed volumes from iogrp1 to iogrp0.
Alternative 2: Migrate compressed volumes from iogrp0 to iogrp1 and load balance
across the nodes.
Table 17-2 shows the load distribution for each alternative.
Table 17-2 Load distribution
                node1                     node2                     node3                     node4
Original setup  90 compressed volumes,    90 compressed volumes,    10 compressed volumes,    10 compressed volumes,
                X non-compressed volumes  X non-compressed volumes  X non-compressed volumes  X non-compressed volumes
Alternative 1   100 compressed volumes,   100 compressed volumes,   X non-compressed          X non-compressed
                X non-compressed volumes  X non-compressed volumes  volumes                   volumes
Alternative 2   50 compressed volumes,    50 compressed volumes,    50 compressed volumes,    50 compressed volumes,
                X non-compressed volumes  X non-compressed volumes  X non-compressed volumes  X non-compressed volumes
Consider the following points regarding SAN Volume Controller CG8 nodes (6-core
systems):
If CPU utilization on the nodes in the I/O group is below 50%, this I/O group is suitable
for using compression.
If the CPU utilization of a node is sustained above 50% most of the time, this I/O group
might not be suitable for compression because it is too busy. If your system is
supported for the Non-Disruptive Volume Move feature, this volume can be moved to
another I/O group that has the resources that are required for compression.
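For example, on a code level that supports Non-Disruptive Volume Move, a hypothetical
compressed volume vdisk_comp01 can be moved to io_grp0 as follows:
svctask movevdisk -iogrp io_grp0 vdisk_comp01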
Table 17-3 shows the preferred practice CPU resource recommendations. Compression is
recommended for an I/O Group if the sustained CPU utilization is below the values that are
listed.
Table 17-3 CPU resources recommendations
Per node                                     CPU already close to or above
SAN Volume Controller CF8 and CG8 (4 core)   25%
SAN Volume Controller CG8 (6 core)           50%
SAN Volume Controller CG8 (12 core)          No consideration
Storwize V7000                               25%
Add nodes if CPU utilization is consistently above the levels that are shown. Upgrade existing
SAN Volume Controller CG8 to SAN Volume Controller-CG8-Dual-CPU-RPQ, as needed.
Starting with V7.1, performance improvements were made that reduce the probability of a
cache throttling situation. However, in heavy sequential write scenarios, a full cache can still
occur, and the parameter that is described in this section can help to resolve the situation.
If none of these options help, it is recommended to separate compressed and
non-compressed volumes into different storage pools. The compressed and non-compressed
volumes then do not share a cache partition, and the non-compressed volumes are not
affected.
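A sketch of such a separation, assuming a hypothetical compressed volume vdisk_comp01
and a dedicated pool Pool_Compressed:
# Migrate the compressed volume to its own storage pool
svctask migratevdisk -mdiskgrp Pool_Compressed -vdisk vdisk_comp01
# Check the migration progress
svcinfo lsmigrate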
                 Storwize V7000  Storwize V7000  SAN Volume      SAN Volume      SAN Volume       SAN Volume
                 Worst Case      Best Case       Controller CG8  Controller CG8  Controller CG8   Controller CG8
                                                 Worst Case      Best Case       Dual CPU         Dual CPU
                                                                                 Worst Case       Best Case
Read Miss 4 KB   1,259           60,948          1,928           133,328         61,230           158,163
Random IOPs
Write Miss 4 KB  1,165           12,455          2,155           77,612          24,312           98,573
Random IOPs
70/30 Miss 4 KB  1,642           51,033          2,716           108,318         50,984           127,618
Random IOPs
For more information about implementing IBM Easy Tier with IBM Real-time Compression,
see Implementing IBM Easy Tier with IBM Real-time Compression, TIPS1072.
Appendix A.
IBM i considerations
The IBM Storwize family is an excellent storage solution for midrange and high-end IBM i
customers, and IBM SAN Volume Controller provides virtualization of different storage
systems to IBM i customers. SAN Volume Controller and Storwize also enable IBM i
installations to use the business continuity solutions that are extensively deployed.
In this appendix, we provide preferred practice and guidelines for implementing Storwize
family and SAN Volume Controller with IBM i.
Main memory
Single-level storage makes main memory work as a large cache. Reads are done from pages
in main memory, and read requests to disk are done only when the needed page is not in
main memory. Writes are done to main memory, and write operations to disk are performed
only as a result of swap or file close, and so on. Therefore, application response time
depends not only on disk response time, but on many other factors, such as how large the
IBM i storage pool is for the application, how frequently the application closes files, and
whether it uses journaling.
The data that was previously stored in 8 * 520-byte sectors is now spread across 9 * 512-byte
sectors, so the required disk capacity on SAN Volume Controller or Storwize V7000 is 9/8 of
the IBM i usable capacity. Conversely, the usable capacity in IBM i is 8/9 of the allocated
capacity in these storage systems.
Therefore, when a SAN Volume Controller or Storwize V7000 is attached to IBM i, plan for
this capacity overhead on the storage system: only 8/9 of the effective capacity is usable by
IBM i. For example, a 100 GB LUN provides approximately 100 x 8/9 = 88.9 GB of usable
IBM i capacity. The performance effect of block translation in IBM i is negligible.
Native connection
Native connection requires that the IBM i logical partition (LPAR) runs on a POWER7 system
and that IBM i level V7.1 Technology Release 6, Resave 710-H, or later is installed.
You must adhere to the following rules for mapping server virtual FC adapters to the ports in
VIOS when an NPIV connection is implemented:
Map a maximum of one virtual FC adapter from an IBM i LPAR to a port in VIOS.
You can map up to 64 virtual FC adapters, each from a different IBM i LPAR, to the same
port in VIOS.
You can use the same port in VIOS for NPIV mapping and for a connection with VIOS VSCSI.
If a PowerHA solution with an IBM i independent auxiliary storage pool (IASP) is
implemented, you must map the virtual FC adapter of the system disk pool to a different
port than the virtual FC adapter of the IASP.
The SCSI command tag queue depth on a LUN with this type of connection is 16.
For disk device attributes with a VIOS virtual SCSI connection, specify the following attributes
for each hdisk device that represents a SAN Volume Controller or Storwize LUN that is
connected to IBM i (see the example after this list):
If Multipath with two or more VIOS is used, the reserve_policy attribute should be set to
no_reserve.
The queue_depth attribute should be set to 32.
The algorithm attribute should be set to load_balance.
Setting reserve_policy to no_reserve is required in each VIOS when Multipath with two or
more VIOS is implemented; this setting removes the SCSI reservation on the hdisk device.
Setting queue_depth to 32 is recommended for performance reasons. This value ensures
that the maximum number of I/O requests that can be outstanding on an hdisk in the VIOS
at a time matches the maximum of 32 I/O operations that the IBM i operating system
allows at a time to one VIOS VSCSI-connected LUN.
Setting algorithm to load_balance is recommended for performance reasons. This value
ensures that the SDDPCM driver in VIOS balances the I/O across the available paths to
Storwize or SAN Volume Controller.
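For example, run the following commands in each VIOS (from the root shell that is reached
with oem_setup_env), assuming that the SVC or Storwize LUN is hdisk3:
# Set the recommended attributes; if the device is in use, add -P and
# restart the VIOS for the change to take effect
chdev -l hdisk3 -a reserve_policy=no_reserve -a queue_depth=32 -a algorithm=load_balance
# Verify the settings
lsattr -El hdisk3 -a reserve_policy -a queue_depth -a algorithm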
                               70% Read   50% Read
15 K RPM, RAID-1 or RAID-10    138        122
15 K RPM, RAID-5               96         75
10 K RPM, RAID-1 or RAID-10    92         82
10 K RPM, RAID-5               64         50
For example, the IBM i workload experiences 1500 I/O per second at its peak and the
read/write ratio is approximately 50/50. We are planning a Storwize V7000 with 15 K RPM
disk drives in RAID-10. The following calculation gives the needed number of disk drives:
1500 (IBM i peak I/O per second) / 122 (I/O per second per 15 K RPM disk drive in
RAID-10 at a 50/50 read/write ratio) ≈ 12
Therefore, we recommend implementing at least 12 disk drives for this IBM i workload.
With SAN Volume Controller, or when the Storwize V7000 is implemented with background
storage, you should still provide enough hard disk drives in the storage system that is
connected to Storwize or SAN Volume Controller to accommodate the IBM i workload at its
peaks. You can take into account that part of the I/O behind Storwize or SAN Volume
Controller is served by the cache in the background storage system. Therefore, slightly
fewer disk arms often are sufficient compared to the disk arms that are needed when the
storage system is connected to IBM i without Storwize or SAN Volume Controller.
Data layout
Spreading workloads across all Storwize V7000 or SAN Volume Controller components
maximizes the utilization of the hardware resources in the storage system. However, it is
always possible when sharing resources that performance problems might arise because of
contention on these resources. Isolation of workloads is most easily accomplished where
each ASP or LPAR has its own managed storage pool. This configuration ensures that you
can place data where you intend. I/O activity should be balanced between the two nodes or
controllers on the SAN Volume Controller or Storwize V7000.
We make the following preferred practice recommendations for the layout:
Isolate critical IBM i workloads.
Use storage pools that contain only IBM i LUNs; do not mix IBM i and non-IBM i LUNs in a pool.
If production and development workloads are mixed in storage pools, the customer must
understand that this configuration can affect production performance.
Solid-state drives
The use of solid-state drives (SSDs) with the Storwize V7000 or SAN Volume Controller is
done through Easy Tier. Even if you do not plan to install SSDs you can still use Easy Tier to
evaluate your workload and provide information about the benefit you might gain by adding
SSDs in the future.
Before SSDs are implemented, the performance improvement of SSDs with Easy Tier on the
host system can be estimated with IBM's Disk Magic modeling tool. In Disk Magic, you insert
the planned configuration of SSDs and HDDs and select one of the predefined skew levels.
With IBM i, select the skew level Very low: the degree of skew of an IBM i workload is often
small because IBM i storage management spreads objects across the available disk units.
When Easy Tier automated management is used, it is important to allow Easy Tier some
space to move data. You should not allocate 100% of the pool capacity; instead, leave some
capacity deallocated to allow Easy Tier migrations. At a minimum, leave one extent free per
tier in each storage pool. However, for optimum use of future functions, plan to leave 10
extents free total per pool.
There is also an option to create a disk pool of SSD in Storwize V7000 or SAN Volume
Controller, and create an IBM i ASP that uses disk capacity from the SSD pool. The
applications that are running in that ASP experience a performance boost.
Note: IBM i data relocation methods, such as ASP balancing and Media preference, are
not available to use with SSDs in Storwize V7000 or SAN Volume Controller.
To determine the number of FC adapters that sustain a particular IBM i workload without
performance bottlenecks, consider the measured throughput that is specified in Table A-2.
Table A-2 Throughput of Fibre Channel adapters
Maximum I/O rate per port                 8 Gb 2-port adapter   4 Gb 2-port adapter
Maximum sequential throughput per port    1100 MBps             310 MBps
Maximum transaction throughput per port   530 MBps              250 MBps
IBM i mirroring
Some customers prefer to have more resiliency with the IBM i mirroring function. For
example, they use mirroring between two Storwize V7000 or SAN Volume Controller systems,
each connected with one VIOS. When you are starting the mirroring process with VIOS
connected to Storwize V7000 or SAN Volume Controller, you should add the LUNs to the
mirrored ASP by completing the following steps:
1. Add the LUNs from two virtual adapters, with each adapter connecting one to-be-mirrored
half of the LUNs.
2. After mirroring is started for those LUNs, add the LUNs from two new virtual adapters, with
each adapter connecting one to-be-mirrored half, and so on. This way, you ensure that
mirroring is started between the two SAN Volume Controller or Storwize V7000 systems
and not among the LUNs in the same system.
IBM i Multipath
Multipath provides greater resiliency for SAN-attached storage. IBM i supports up to eight
paths to each LUN. In addition to the availability considerations, lab performance testing
shows that two or more paths provide performance improvements when compared to a single
path. Often, two paths to a LUN are the ideal balance of price and performance. You might
want to consider more than two paths for workloads in which there is high wait time, or where
high I/O rates are expected to LUNs.
Multipath for a LUN is achieved by connecting the LUN to two or more ports that belong to
different adapters in the IBM i partition. With native connection to Storwize V7000 or SAN
Volume Controller, the ports for Multipath must be in different physical adapters in IBM i. With
VIOS_NPIV, the virtual Fibre Channel adapters for Multipath must be assigned to different
VIOS. With a VIOS VSCSI connection, the virtual SCSI adapters for Multipath must be
assigned to different VIOS.
Every LUN in Storwize V7000 or SAN Volume Controller uses one node as the preferred
node. The I/O traffic to and from the particular LUN normally goes through the preferred node.
If that node fails, the I/O operations are transferred to the remaining node. With IBM i
Multipath, all of the paths to a LUN through the preferred node are active, and the paths
through the non-preferred node are passive. Multipath load balances the I/O among the
active paths to a LUN, that is, among the paths that go through that LUN's preferred node.
Related publications
The publications that are listed in this section are considered particularly suitable for a more
detailed discussion of the topics that are covered in this book.
Other resources
The following publications also are relevant as further information sources:
IBM System Storage Master Console: Installation and User's Guide, GC30-4090
IBM System Storage Open Software Family SAN Volume Controller: CIM Agent
Developer's Reference, SC26-7545
IBM System Storage Open Software Family SAN Volume Controller: Command-Line
Interface User's Guide, SC26-7544
IBM System Storage Open Software Family SAN Volume Controller: Configuration Guide,
SC26-7543
IBM System Storage Open Software Family SAN Volume Controller: Host Attachment
Guide, SC26-7563
IBM System Storage Open Software Family SAN Volume Controller: Installation Guide,
SC26-7541
IBM System Storage Open Software Family SAN Volume Controller: Planning Guide,
GA22-1052
IBM System Storage Open Software Family SAN Volume Controller: Service Guide,
SC26-7542
IBM System Storage SAN Volume Controller - Software Installation and Configuration
Guide, SC23-6628
IBM System Storage SAN Volume Controller V6.2.0 - Software Installation and
Configuration Guide, GC27-2286, which is available at this website:
http://pic.dhe.ibm.com/infocenter/svc/ic/topic/com.ibm.storage.svc.console.doc/
svc_bkmap_confguidebk.pdf
IBM System Storage SAN Volume Controller 6.2.0 Configuration Limits and Restrictions,
S1003799
IBM TotalStorage Multipath Subsystem Device Driver User's Guide, SC30-4096
IBM XIV and SVC Best Practices Implementation Guide, which is available at this
website:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105195
Considerations and Comparisons between IBM SDD for Linux and DM-MPIO, which is
available at this website:
http://www.ibm.com/support/docview.wss?rs=540&context=ST52G7&q1=linux&uid=ssg1S7001664&loc=en_US&cs=utf-8&lang=en
Referenced websites
The following websites are also relevant as further information sources:
IBM Storage home page:
http://www.storage.ibm.com
IBM site to download SSH for AIX:
http://oss.software.ibm.com/developerworks/projects/openssh
Index
Numerics
10 Gb Ethernet adapter 7
1862 error 123
1920 error 193–194, 216
bad period count 217
troubleshooting 217
2145-4F2 node support 5
2145-CG8 7, 61
A
access 19, 60, 72, 228
pattern 139
Access LUN 78
-access option 200
adapters 227, 315, 350
DS8000 296
administrator 60, 93, 260, 343, 520
ADT (Auto Logical Drive Transfer) 75
aggregate workload 72, 98, 283
AIX 227, 339, 550
host 244, 527
server migration 568
alert 20, 26, 182
events
CPU utilization threshold 458
overall back-end response time threshold 459
overall port response time threshold 459
algorithms 138
alias 40, 47, 488
storage subsystem 48
alignment 349
amount of I/O 139, 197
application 61, 184, 227, 270, 339
availability 350
database 139
performance 136, 340
streaming video 139
testing 151
Application Specific Integrated Circuit (ASIC) 31
architecture 72, 242, 264
array 19, 60, 76, 96, 98–99, 136, 153, 197, 282, 314
considerations for storage pool 283
layout 282
midrange storage controllers 282
parameters 76
per storage pool 283
provisioning 282
site, spare 80
size, mixing in storage pool 317
array support library (ASL) 254
ASIC (Application Specific Integrated Circuit) 31
ASL (array support library) 254
asynchronous mirroring 205
asynchronous mode 158
B
back-end I/O capacity 272
back-end storage 269
controller 145, 197
back-end striping 271
back-end transfer size 285
background copy 158
bandwidth 192
background write synchronization 158
backplane 31
backup 19, 152, 250, 342, 530
node 46
sessions 349
bad period count 217
balance 46, 75, 130, 233, 343
workload 138
bandwidth 19, 62, 153, 161, 179, 182, 225, 228, 341
parameter 172, 572
requirements 53
batch workloads 270
BIOS 69, 252
blade 40, 42
BladeCenter 53
block 77, 131, 138, 146, 340
size 342
boot 229
device 248
bottleneck 340
detection feature 36
boundary crossing 349
bridge 23
Brocade 536
Webtools GUI 39
buffer 147, 154, 227, 351
credit 187
bus 240
C
cache 99, 135, 227, 282, 340–341, 343, 531
D
daisy-chain topology 204
data 20, 60, 196, 227, 340, 519
consistency 149
corruption, zone considerations 50
formats 250
integrity 135, 147
layout 131, 343
strategies 351
migration 153, 250
planner 323
mining 151
pattern 340
rate 224, 270
redundancy 96
traffic 26
data collection, host 524
data layout 343
Data Path View 472
Data Placement Advisor 323
database 20, 139, 149, 236, 342, 520
applications 139
log 342
Datapath Explorer 469
dd command 200
debug 524
decibel 186
milliwatt 186
dedicated ISLs 26
degraded performance 196
design 18, 60, 234
destage 288
size 315
DetectMDisks GUI option 75
device 18, 229, 525
adapter 293
adapter loading 78
E
Easy Tier 6, 266, 320
activate 324
check mode 334
check status 338
CLI 329
evaluation mode 330
GUI activate 335
manual operation 293
operation modes 323
processes 322
edge switch 19–22, 31, 182
efficiency 138
egress port 31
email 51, 184, 260
EMC Symmetrix 92
error 226, 520–521
handling 552
log 524, 551
logging 93, 550
error code 551
1625 81
Ethernet ports 5
event 20, 72, 138, 246
exchange 149
execution throttle 252
expansion 20
explicit sequential detect 288
extended-unique identifier 54
extenders 184
extension 50, 183
extent 79, 131, 294, 342
balancing script 108
size 131, 288, 293, 347
8 GB 6
extent pool 78–79
affinity 291
storage pool striping 293
striping 79, 291
F
fabric 3, 17, 22, 153, 226–227, 520, 593
hop count limit 186
isolation 233
login 235
outage 20, 182
watch 35
failover 139, 226, 521
logical drive 75
scenario 177
failure boundary 97, 346, 352
FAStT FC2-133 252
fastwrite cache 280
fault tolerant LUNs 100
FC flow control mechanism 20, 182
fcs adapter 245
fcs device 245
Fibre Channel 18, 182, 184, 225, 227, 235, 520
adapters 315
IP conversion 51, 184
port speed 439
ports 53, 72, 229, 545
router 37, 184
traffic 20, 182
file system 147, 253
level 258
firmware 219
FlashCopy 8, 64, 93, 276, 287, 521, 550
applications 552
creation 149
I/O operations 277
incremental 278
G
General Public License 260
Global Mirror 158, 160, 188
1920 errors 217
bandwidth parameter 192
bandwidth resource 190
change to Metro Mirror 201
features by release 165
parameters 172, 189
partnership 173
partnership bandwidth parameter 161
planning 195
planning rules 194
relationship 180, 200
restart script 222
switching direction 200
upgrade scenarios 208
writes 176
gm_inter_cluster_delay_simulation parameter 189
gm_intra_cluster_delay_simulation parameter 189
gm_link_tolerance parameter 189
gm_max_host_delay parameter 189
gm_max_hostdelay parameter 172–173
gmlinktolerance parameter 172–173, 192, 220
bad periods 193
disabling 195, 222
GNU 260
grain size 287
granularity 131, 258
graphs 241
H
HACMP 247
hardware
redundancy 72
SVC node 521
upgrade 67
HBA 39, 53, 61, 188, 227–228, 233, 245, 520
parameters for performance tuning 244
replacement 511
zoning 46
head-of-line blocking 20, 182
health checker 248
health, SAN switch 541
heartbeat 185
messages 161
messaging 161
signal 104
heterogeneous 60, 522
high-bandwidth hosts 21, 31
hop count 186
hops 19
host 19, 72, 130, 284, 339, 520
cluster implementation 242
configuration 45, 154, 226, 522
creation 47
data collection 524
definitions 236
HBA 46
I/O capacity 275
information 69, 525
mapping 521
port login 228
problems 520
system monitoring 225
systems 67, 225, 520
type 76
volume mapping 229
zone 44, 47, 130, 227, 522
host-based mirroring 258
hot extents 320
I
I/O balancing 350
I/O capacity 272
rule of thumb 280
I/O collision 282
I/O governing 139
rate 141
throttle 139
I/O group 25, 46, 61, 64, 130, 138, 194, 233
host mapping 230
mirroring 26
performance 267
performance scalability 267
switch splitting 25
I/O Monitoring Easy Tier 322
I/O operations, FlashCopy 277
I/O per volume 146
I/O performance 245
I/O rate calculation 146
I/O rate setting 141
I/O resources 270
I/O service times 98
protocol 4
limitations 56
qualified name 54
support 54
target 54
ISL (interswitch link) 19–20, 22, 182
capacity 31
hop count 174
oversubscription 20, 182
trunk 31, 182
isolated SAN networks 296
isolation versus availability 233
J
journal 253
K
kernel 252
keys 235, 243
L
last extent 131
latency 31, 149, 179, 341
LDAP directory 4
lease expiry event 183
lg_term_dma attribute 245
lg_term_dma parameter 245
licensing 8
limitation 18, 206, 240, 342, 539
limits 61, 240, 351
lines of business (LOB) 346
link 65, 182, 196
bandwidth 161, 185
latency 161, 185
speed 178
Linux 252
livedump 531
load balance 139, 233
traffic 25
load balancing 130, 248, 251
LOB (lines of business) 346
local cluster 159
local hosts 159
log 551
logical block address 551
logical drive 120, 244, 343
failover 75
mapping 78
logical unit (LU) 64, 229
logical unit number 206
logical volumes 349
login from host port 228
logs 149, 342, 526
long-distance link latency 185
long-distance optical transceivers 51
loops 316
LPAR 250
lquerypr utility 106
M
maintenance 69, 235, 520
procedures 545
managed disk 545
group 95, 136, 346, 361
Managed Disk Group Performance report 413
managed mode 78, 137, 315
management 23, 226, 343, 522
capability 228
port 228
software 230
map 154, 230, 351, 546
mapping 112, 134, 147, 226, 243, 521
rank to extent pools 291
VDisk 233
masking 67, 81, 154, 228, 522
master 69, 147
cluster 159
volume 159
max_xfer_size attribute 245
max_xfer_size parameter 245
maxhostdelay parameter 193
maximum I/O 349
maximum transmission unit 55
McDATA 536
MDisk 95, 131, 233, 342
checking access 106
group 342
moving to cluster 122
performance 449
performance levels 99
removing reserve 244
selecting 97
transfer size 285
media 220, 545
error 551
N
names 47, 130, 251
naming convention 40, 105, 130, 486
native copy services 206
nest aliases 46
no synchronization option 129
NOCOPY 148
node 19–20, 130–131, 182, 226, 228, 263, 270, 285, 520–521
adding 65
failure 138, 235
maximum 61
port 40, 138, 221, 227, 522
Node Cache performance report 400
Node level reports 388, 396
num_cmd_elem attribute 244–245
O
offline I/O group 135
OLTP (online transaction processing) 342
online transaction processing (OLTP) 342
operating systems
alignment with device data partitions 349
data collection methods 524
host pathing 233
optical distance extension 51
optical multiplexors 51, 184
optical transceivers 51
Oracle 248, 347
oversubscription, ISL 20, 182
P
parameters 139, 220, 228, 341
partitions 249
partnership bandwidth parameter 189
path 19, 24, 68, 72, 138, 226, 270, 351, 521–522
count connection 83
selection 247
pcmquerypr command 243
performance 20, 60, 96, 130, 182, 225, 263, 339, 520
advantage 98
striping 97
back-end storage 269
characteristics 131, 260
LUNs 99
tiering 271
degradation 99, 196, 282
degradation, number of extent pools 294
improvement 136
level, MDisk 99
loss 160
monitoring 223, 228
reports
Managed Disk Group 413
SVC port performance 433
requirements 68
scalability, I/O groups 267
statistics 8
storage pool 96
tuning, HBA parameters 244
Perl packages 108
persistent reserve 106
physical link error 51
physical volume 249, 351
Plain Old Documentation 111
plink.exe utility 575
PLOGI 235
point-in-time consistency 176
point-in-time copy 151, 206
policies 242, 248
pool 60
port 19, 64, 72, 221, 226, 284, 521–522
bandwidth 31
channel 37
density 31
mask 228
Q
queue depth 240, 245, 252, 284, 351
queue_depth hdisk attribute 244
quick synchronization 198
quiesce 134, 149
quorum disk 102
considerations 104
placement 29
R
RAID 78, 99, 136, 197, 316
array 220, 345346
RAID 5
algorithms 417, 421
storage pool 273
random I/O performance 272
random writes 273
rank to extent pool mapping
additional ranks 293
considerations 292
RAS capabilities 5
RC management 196
RDAC 72, 106
read
cache 340
data rate 390
miss performance 138
stability 177
real address space 126
real capacity 549
Real Time Performance Monitor 268
rebalancing script, XIV 305
reconstruction 178
recovery 120, 137, 149, 226, 545
point 195
redundancy 31, 185, 227, 522
redundant paths 227
S
SameWWN.script 74
SAN 17, 61, 153, 225, 350, 519–520
availability 233
bridge 23
configuration 17, 181
fabric 17, 153, 228, 233, 522
Health Professional 491
performance monitoring tool 195
zoning 138, 472
SAN switch 31
director class 32
edge 31
models 31
SAN Volume Controller 3, 18, 20, 38, 45, 59–60, 130, 158, 182, 225, 263, 342, 519
back-end read response time 416
storage pool
array considerations 283
I/O capacity 273
performance 96
striping 79, 291
extent pools 293
Storwize V7000 43, 87, 89, 283, 307
configuration 89
performance 381
traffic 381
streaming 342
video application 139
stride writes 272, 315
stripe 96
across disk arrays 97
striped mode 148, 343
VDisks 347
striping 76, 349
DS5000 314
performance advantage 283
workload 98, 283
sub-LUN migration 320
subsystem cache influence 274
Subsystem Device Driver (SDD) 8, 45, 72, 106, 135, 207,
227, 230, 247, 525
for Linux 253
support 342
support alerts 494
svcinfo command 107, 134, 229, 521
svcinfo lscluster command 189
svcinfo lscontroller controllerid command 523
svcinfo lsmigrate command 107
svcinfo lsnode command 523
svcmon tool 62
svctask chcluster
command 189
svctask command 107, 153, 254, 530
svctask detectmdisk command 75, 123
svctask migratetoimage command 123
svctask mkrcrelationship command 201
svctask mkvdisk command 123
svctask rmvdisk command 123
SVCTools package 107
switch 225, 520
fabric 19, 228
failure 20, 259
interoperability 53
port layout 3132
ports 25, 470
splitting 25
-sync flag 201
-sync option 199
synchronization 185
synchronized relationship 159
synchronized state 196
synchronous mode 158
synchronous remote copy 159
system 146, 225, 341, 524
performance 131, 253, 531
statistics setting 268
T
table space 342
tape media 19, 199, 227
target 92, 227, 545
port 81, 228
volume 159
test 19, 225
thin provisioning 279, 287
FlashCopy considerations 281
thin-provisioned volume 126
FlashCopy 127
thread 240
three-way copy service functions 205
threshold 20, 182, 196
throttle 139, 252
setting 140
throughput 234, 245, 270, 341–342
environment 342
RAID arrays 98
requirements 77
throughput-based workload 340
tiers 105, 270–271
time 19, 72, 226, 270
tips 40
Tivoli Storage Manager 207, 342, 348
Tivoli Storage Productivity Center 195, 359, 524
performance best practice 381
top 10 reports 382
volume performance reports 422
tools 225, 522
topology 18, 524
issues 24
problems 24
Topology Viewer
Data Path Explorer 469
Data Path View 472
navigation 468
SAN Volume Controller and Fabric 470
SAN Volume Controller health 470
zone configuration 472
Total Cache Hit percentage 400, 410
traffic 19, 25, 233
congestion 20
Fibre Channel 53
isolation 25
threshold 26
transaction 76, 149
environment 342
log 342
transaction-based workloads 314, 340–341
transceivers 184
transfer 227, 340
transit 19
triangle topology 203
troubleshooting 39, 225, 519
tuning 225
U
UID field 112, 547
V
V7000
ports 310
SAN Volume Controller considerations 307
solution 118
storage pool 312
volume 308
VDisk 45, 228, 342, 521
creation 151
mapping 233
migration 136, 551
mirroring 129
size maximum 4
VDisk deletion 134
Veritas file sets 249
VIOS 248–250, 350
clients 350
virtual address space 126
virtual capacity 128
virtual disk 138, 250
Virtual Disk Service 155
virtual fabrics 36
virtual SAN 37
virtualization 59, 343, 519
layer 93
policy 129
virtualizing 235
VMware
multipathing 257
vStorage APIs 8, 257
volume
group 81, 247
allocation 544
types 126
volume mirroring 60, 99, 129, 325
VSAN 19, 37–38
trunking 37
VSCSI 249, 351
type 341
worldwide node name (WWNN) 38–39, 92
setting 74
zoning 39
worldwide port number (WWPN) 39, 66, 82, 227, 285, 521
debug 83
zoning 39
write 227, 282, 342
ordering 177
penalty 272–273
performance 130
write cache destage 273
WWNN (worldwide node name) 38–39, 92
setting 74
WWPN (worldwide port number) 39, 66, 82, 227, 285, 521
debug 83
zoning 39
X
XFP 51
XIV
LUN size 303
port naming conventions 85
ports 42, 305
storage pool layout 306
SVC considerations 83
zoning 42
XIV Storage System 283
Z
zone 38, 153, 228, 522
configuration 472
name 49
SAN Volume Controller 24
set 48, 546
share 50
zoning 23, 38, 50, 82, 138, 228, 491
configuration 38
guideline 183
HBAs 46
requirements 167
scheme 40
single host 45
Storwize V7000 43
XIV 42
zSeries attach capability 97
W
warning threshold 126
workload 20, 77, 98, 121, 141, 182, 227, 244, 270, 340–341
throughput based 340
transaction based 340
Back cover
This book begins with a look at the latest developments with SAN
Volume Controller and Storwize V7000 and reviews the changes in the
previous versions of the product. It highlights configuration guidelines
and preferred practices for the storage area network (SAN) topology,
clustered system, back-end storage, storage pools and managed
disks, volumes, remote copy services, and hosts. Then, this book
provides performance guidelines for SAN Volume Controller, back-end
storage, and applications. It explains how you can optimize disk
performance with the IBM System Storage Easy Tier function. Next, it
provides preferred practices for monitoring, maintaining, and
troubleshooting SAN Volume Controller and Storwize V7000. Finally,
this book highlights several scenarios that demonstrate the preferred
practices and performance guidelines.
This book is intended for experienced storage, SAN, and SAN Volume
Controller administrators and technicians. Before reading this book,
you must have advanced knowledge of the SAN Volume Controller and
Storwize V7000 and SAN environments.
INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization.
Experts from IBM, Customers and Partners from around the world create timely technical
information based on realistic scenarios. Specific recommendations are provided to help
you implement IT solutions more effectively in your environment.
ISBN 0738439762