Whats New Performance Power9
Steve Nasypany
IBM Washington System Center
nasypany@us.ibm.com
Please note
IBM’s statements regarding its plans, directions, and intent are subject
to change or withdrawal without notice and at IBM’s sole discretion.
Information regarding potential future products is intended to outline our
general product direction and it should not be relied on in making a
purchasing decision.
Credits
The POWER9 Processor
Power Systems Performance Collateral
https://developer.ibm.com/linuxonpower/perfcol/
https://www14.software.ibm.com/webapp/set2/sas/f/best/home.html
IBM POWER Architecture & Terminology Basics
• AC922
  - Designed for HPC and Cognitive workloads
  - Nvidia NVLink 2.0 GPU to GPU: 150 GB/s
  - NVLink 2.0 CPU to GPU: 150 GB/s vs 32 GB/s x86 PCIe3
  - Memory bus: 120 GB/s vs ~77 GB/s x86
CPW – POWER8 E880 vs POWER9 E980
CPW (IBM i), by cores per node
Nodes  Model  32c        40c        44c        48c
1      E880   361,180    436,080    n/a        491,060
1      E980   508,900    611,300    639,000    687,500
2      E880   715,740    863,620    n/a        980,230
2      E980   1,012,000  1,216,000  1,271,000  1,368,000
3      E880   1,084,510  1,291,170  n/a        1,470,340
3      E980   1,521,000  1,827,000  1,910,000  2,055,600
4      E880   1,433,800  1,718,720  n/a        1,961,410
4      E980   2,030,000  2,439,000  2,549,000  2,743,000
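As a rough reading aid for the CPW figures, the sketch below computes the E980-to-E880 ratio for the single-node rows. The ratings are copied from the table; the dictionary names and the `uplift` helper are illustrative assumptions and not part of any IBM tool.

```python
# Single-node CPW ratings copied from the CPW table (cores per node -> CPW).
# The E880 has no 44-core node, so that column is omitted for it.
E880_CPW = {32: 361_180, 40: 436_080, 48: 491_060}
E980_CPW = {32: 508_900, 40: 611_300, 44: 639_000, 48: 687_500}


def uplift(cores: int) -> float:
    """Return the E980/E880 CPW ratio for one node with the given core count."""
    return E980_CPW[cores] / E880_CPW[cores]


# Only core counts offered on both models can be compared.
for cores in sorted(E880_CPW):
    print(f"{cores}c: E980 CPW is {uplift(cores):.2f}x the E880 rating")
```

The same helper could be pointed at the multi-node rows; the single-node rows are used here only to keep the example short.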
rPerf Comparisons – POWER8 vs POWER9 SMT
Use the IBM Power Systems Performance Report for POWER8 to POWER9
https://www.ibm.com/systems/power/hardware/reports/system_perf.html
For earlier POWER architectures, the report does not provide SMT breakdowns.
For reference, these rough approximations are commonly used:
• POWER6: single-thread sizing is 66% of the SMT2 rating
• POWER7/7+: single-thread sizing is 56% of the SMT4 rating
• POWER7/7+: SMT2 sizing is 83% of the SMT4 rating
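The approximations above amount to a simple multiplication. The sketch below applies them to a published SMT rating; the table layout, the function name and the example rating of 100.0 are hypothetical and only illustrate the arithmetic.

```python
# Approximate fractions quoted above:
# (architecture, target mode) -> (fraction, SMT mode of the published rating)
APPROX = {
    ("POWER6", "ST"): (0.66, "SMT2"),
    ("POWER7", "ST"): (0.56, "SMT4"),
    ("POWER7", "SMT2"): (0.83, "SMT4"),
}


def approx_rating(arch: str, target_mode: str, published_rating: float) -> float:
    """Scale a published SMT rating down to the requested mode.

    `published_rating` must be the rating for the reference SMT mode
    listed in APPROX (SMT2 for POWER6, SMT4 for POWER7/7+).
    """
    fraction, _reference_mode = APPROX[(arch, target_mode)]
    return fraction * published_rating


# Hypothetical POWER7 system with an SMT4 rating of 100.0
print(approx_rating("POWER7", "ST", 100.0))
print(approx_rating("POWER7", "SMT2", 100.0))
```

These are rough sizing aids only; for POWER8 and POWER9 the published report gives the per-mode figures directly.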
rPerf Comparisons – S824 vs S924
                  S924 ST  S924 SMT2  S924 SMT4  S924 SMT8
S924 rPerf        197      335        462        583
S824 SMT2 (285)   -        1.18X      1.62X      2.04X
S824 SMT4 (371)   -        0.90X      1.24X      1.57X
rPerf Comparisons – S92X with SMT4 & SMT8
                                  S924 24c/3.4-3.9   S924 20c/3.5-3.9   S922 20c/2.9-3.8
Model  Cores/GHz  SMT  rPerf      System  Per core   System  Per core   System  Per core
750    32c/3.6    4    335        1.38x   1.84x      1.18x   1.89x      1.01x   1.62x
750    32c/3.6    8    -          1.74x   2.32x      1.49x   2.39x      1.27x   2.04x
750+   32c/3.5    4    354        1.30x   1.74x      1.12x   1.8x       0.95x   1.53x
750+   32c/3.5    8    -          1.64x   2.19x      1.41x   2.27x      1.20x   1.92x
750+   32c/4.0    4    397        1.16x   1.55x      1.0x    1.6x       0.85x   1.36x
750+   32c/4.0    8    -          1.47x   1.95x      1.26x   2.01x      1.07x   1.72x
S824   24c/3.52   4    371        1.25x   1.25x      1.07x   1.28x      0.91x   1.09x
rPerf Comparisons – POWER8 870/880 vs POWER9 Enterprise
POWER9 Other Updates (April and August)
Spectre/Meltdown?
POWER8 ratings have been reduced 5-7%. One could infer that POWER9 ratings for
customers not implementing the patches would be higher than those published.
AIX 7.2 code levels for POWER9 support an option to display speculation
security settings (currently undocumented)
# lparstat -x
LPAR Speculative Execution Mode : 2
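For scripted checks across many partitions, the mode line can be pulled out of captured `lparstat -x` output. The sketch below is a minimal parser, assuming the output contains a line in exactly the form shown above; the function name and sample string are illustrative, and because the setting is undocumented the code does not attempt to interpret the numeric value.

```python
import re
from typing import Optional

# Sample text in the form shown on the slide (captured `lparstat -x` output).
SAMPLE = "LPAR Speculative Execution Mode             : 2\n"


def parse_speculation_mode(output: str) -> Optional[int]:
    """Return the numeric mode from lparstat -x output, or None if absent."""
    match = re.search(
        r"^LPAR Speculative Execution Mode\s*:\s*(\d+)\s*$",
        output,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


print(parse_speculation_mode(SAMPLE))
```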
POWER8->POWER9 SMT4 to SMT4 (Transactional Workload)
[Chart: series P9 SMT8 6vcpus, P9 SMT8 5vcpus and P9 SMT8 4vcpus; x-axis: Processor Consumed (2.00 to 7.00); y-axis: 0 to 5000]
• Migration same VP count: reduced utilization for same workload with similar or
improved response time and higher throughput
• Migration to 5 vcpu partitions will observe similar response time for same workload
with further reduced PC consumption
• 33% reduction in VP, better or equal response times for utilizations < 80%, higher
throughput, lowered PC
Java & WebSphere on POWER9
Best practices for Java and IBM WebSphere Application Server (WAS) on
IBM POWER9
All Migrations should consider moving to SMT8
[Chart: Under Entitled vs Over Entitled]
Migration - Other
• S92X Models
  - Peak B/W up to 170 GB/s per socket with DDR4
  - ½ population provides best memory bandwidth
  - Workloads sensitive to memory capacity should populate all slots
  - S914 does not support 128 GB DIMM
• GPU Models
  - NVLink 2.0 interface integrated between CPU and GPU
  - 4 GPU models, 150 GB/s between CPU and GPU
  - 6 GPU models, 100 GB/s between CPU and GPU
  - Sustained 100 GB/s+ between DDR4 memory and CPU
Enterprise Bandwidth
• Enterprise Systems
  - Peak B/W up to 240 GB/s per socket with CDIMMs
  - 16 Gb/s X-Bus intranode connected fabric
  - 4X increase in SMP A-Bus internode connected fabric
  - 2X I/O bandwidth with PCIe Gen4 slots (8/drawer)
  - DDR3 CDIMMs from E880 models can be moved to E980
  - DDR3 CDIMMs from E850 cannot be moved to E950
  - (Future) Interrupt Virtualization Engine reduces the code path length
    and improves performance compared to the previous architecture:
    interrupt processing moves from the Hypervisor into hardware
Enterprise Bandwidth
Power E850
Processor modules  2        3        4
Power E950
Processor modules  2        2        2        2         4        4        4        4
GHz                3.6-3.8  3.4-3.8  3.2-3.8  3.15-3.8  3.6-3.8  3.4-3.8  3.2-3.8  3.15-3.8
Cores              16       20       22       24        32       40       44       48
Power E980 [GBps]
               32 cores         40 cores         44 cores          48 cores
               3.9 to 4.0 GHz   3.7 to 3.9 GHz   3.58 to 3.9 GHz   3.55 to 3.9 GHz
L1 data cache  11,981 - 12,288  14,208 - 14,976  15,122 - 16,474   16,358 - 17,971
L2 cache       11,981 - 12,288  14,208 - 14,976  15,122 - 16,474   16,358 - 17,971
L3 cache       7,987 - 8,192    9,472 - 9,984    10,081 - 10,982   10,901 - 11,980
Utilization Values with Simultaneous Multithreading (SMT)
Learning point: Linux on Power typically understates CPU utilization. Use tools
like sar or mpstat to view how each vcpu uses its SMT threads and assess per-core use.
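To illustrate why an averaged figure can understate core use, the sketch below groups per-thread busy percentages (such as those reported per logical CPU by mpstat) into cores. It assumes logical CPUs are numbered consecutively within a core and that a core counts as "in use" when any of its threads exceeds a threshold; both are simplifying assumptions, and the function name and threshold are hypothetical.

```python
from typing import List, Tuple


def per_core_view(
    thread_busy: List[float], smt: int = 8, busy_threshold: float = 50.0
) -> Tuple[float, int, int]:
    """Summarise per-thread busy% as (average busy%, cores in use, total cores).

    Assumes threads 0..smt-1 belong to core 0, the next `smt` to core 1, etc.
    """
    cores = [thread_busy[i:i + smt] for i in range(0, len(thread_busy), smt)]
    average = sum(thread_busy) / len(thread_busy)
    cores_in_use = sum(1 for core in cores if max(core) >= busy_threshold)
    return average, cores_in_use, len(cores)


# Two SMT8 cores, each with a single fully busy thread.
sample = [100.0] + [0.0] * 7 + [100.0] + [0.0] * 7
average, in_use, total = per_core_view(sample)
print(f"average thread busy: {average:.1f}%, cores in use: {in_use}/{total}")
```

In this example the averaged thread utilization looks low even though both cores are occupied, which is the effect the learning point warns about.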
How does SMT work in POWER9?
[Diagram: four cores, each showing SMT threads 1 to 8]
But...
• AIX has decided to adjust the dispatch threshold for POWER9 systems
• Intent is to make low-thread count database workloads dispatch more
aggressively for performance (follow POWER7 and POWER8 model)
• This will be less than the calibrated single-thread utilization of 32%
• Tuning will be to < 32%: APAR IJ10535: P9 VPM FOLD THRESHOLD
• Those using earlier releases can use vpm_fold_threshold=29, which is a
schedo dynamic tunable (this is short-term guidance before APAR ships)
• POWER9 Mode still has more awareness of core architecture for better
cache optimizations
POWER6, POWER7/POWER8 and POWER9 AIX Dispatch
[Diagram: point at which AIX activates the next Virtual Processor for POWER6 SMT2, POWER7/8 SMT4 and POWER9 SMT8. Labels: Htc0 busy, Htc1 busy or idle, physc: ~1.0; ~50% busy / ~50% idle (pre-IJ10535); ~30% busy / ~70% idle (post-IJ10535)]
Customers using Scaled Throughput
Scaled Throughput Mode is an alternative AIX dispatch algorithm, where
SMT threads on the same Virtual Processor are executed more
aggressively. In general, this mode:
• Reduces physical consumption by activating more SMT threads
• More “POWER6 like”
• Adopted by customers wanting to reduce physical consumption
without the effort of reducing Virtual Processor counts in a migration
• Trades some performance/latency compared to Raw Mode
• Settings are 2, 4 & 8 and map to how many SMT threads are used
before the next Virtual Processor is activated
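The difference between the two dispatch styles can be sketched with a deliberately simplified model: in Raw Mode each runnable thread is placed on its own Virtual Processor until the VPs run out, while a Scaled Throughput setting of 2, 4 or 8 fills that many SMT threads on a VP before the next one is activated. The function below is an illustration only; it ignores utilization thresholds and folding, and its name and parameters are assumptions, not AIX interfaces.

```python
import math


def active_vps(runnable_threads: int, vp_count: int, scaled_mode: int = 0) -> int:
    """Estimate how many Virtual Processors are activated.

    scaled_mode 0 models Raw Mode (one thread per VP first);
    2, 4 or 8 model Scaled Throughput with that many threads per VP.
    """
    if scaled_mode not in (0, 2, 4, 8):
        raise ValueError("scaled_mode must be 0 (raw), 2, 4 or 8")
    if runnable_threads <= 0:
        return 0
    if scaled_mode == 0:
        return min(runnable_threads, vp_count)
    return min(math.ceil(runnable_threads / scaled_mode), vp_count)


# Eight runnable threads on a partition with six Virtual Processors.
for mode in (0, 2, 4, 8):
    print(f"mode {mode}: {active_vps(8, 6, mode)} VPs active")
```

Fewer active VPs means lower physical consumption, at the cost of more threads sharing each core, which matches the performance/latency trade-off listed above.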
POWER8
• Workload will behave the same on same system configuration
[Chart: SPS; x-axis: Load Level, 10% to 100%]
POWER9 EnergyScale
• Takes advantage of nominal environmental conditions by allowing increased CPU
frequency and power draw
• Lighter workloads can exploit higher frequencies
• Idle state remains at high frequency
[Chart: DPM, NOMINAL, MPM and SPS; x-axis: Utilization Level, 10% to 100%]
Monitoring Frequency
AIX
Currently, the AIX tooling shows only the legacy value for the Dynamic AND
Maximum Performance Modes on POWER9 / AIX 7.2 (this is a bug)
lparstat -i | grep Saving
Power Saving Mode : Dynamic Power Savings (Favor Performance)
Monitoring Frequency
Linux
List power management modes
dmesg | grep freq
[ 0.000000] time_init: decrementer frequency = 512.000000 MHz
[ 0.000000] time_init: processor frequency = 2900.000000 MHz
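If the boot-time value needs to be collected by a script, the `processor frequency` line can be extracted from captured dmesg text. The sketch below assumes the message format shown above; the function name and sample string are illustrative.

```python
import re
from typing import Optional

# Sample lines in the format shown above (captured `dmesg | grep freq` output).
SAMPLE = (
    "[ 0.000000] time_init: decrementer frequency = 512.000000 MHz\n"
    "[ 0.000000] time_init: processor frequency = 2900.000000 MHz\n"
)


def processor_mhz(dmesg_text: str) -> Optional[float]:
    """Return the boot-time processor frequency in MHz, or None if not found."""
    match = re.search(r"processor frequency\s*=\s*([\d.]+)\s*MHz", dmesg_text)
    return float(match.group(1)) if match else None


print(processor_mhz(SAMPLE))
```

This is the frequency logged at boot, so it does not show any later frequency changes.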
Linux (PowerVM)
ppc64_cpu --frequency
Linux (Non-PowerVM)
List frequency of all cores
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
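The same sysfs files can be read from a script. The sketch below walks the cpufreq entries and converts the reported values, which the kernel exposes in kHz, to MHz. The function name and the `cpu_root` parameter are illustrative; files that are missing or unreadable (for example without sufficient privileges) are skipped.

```python
import glob
import os
from typing import Dict


def read_core_frequencies(cpu_root: str = "/sys/devices/system/cpu") -> Dict[str, float]:
    """Return {"cpuN": MHz} from each cpufreq/cpuinfo_cur_freq file (kHz values)."""
    freqs: Dict[str, float] = {}
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "cpuinfo_cur_freq")
    for path in sorted(glob.glob(pattern)):
        cpu = path.split(os.sep)[-3]  # the "cpuN" directory name
        try:
            with open(path) as handle:
                freqs[cpu] = int(handle.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # unreadable or malformed entry: skip it
    return freqs


for cpu, mhz in read_core_frequencies().items():
    print(f"{cpu}: {mhz:.0f} MHz")
```

On a system without cpufreq entries the function simply returns an empty dictionary.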
Proving Frequency
Notices and disclaimers
© 2018 International Business Machines Corporation. No part of this
document may be reproduced or transmitted in any form without
written permission from IBM.
U.S. Government Users Restricted Rights - use, duplication or
disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to
products that have not yet been announced by IBM) has been reviewed
for accuracy as of the date of initial publication and could include
unintentional technical or typographical errors. IBM shall have no
responsibility to update this information. This document is distributed
“as is” without any warranty, either express or implied. In no event,
shall IBM be liable for any damage arising from the use of this
information, including but not limited to, loss of data, business
interruption, loss of profit or loss of opportunity. IBM products and
services are warranted per the terms and conditions of the agreements
under which they are provided.
IBM products are manufactured from new parts or new and used parts.
In some cases, a product may not be new and may have been previously
installed. Regardless, our warranty terms apply.
Any statements regarding IBM's future direction, intent or product
plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in a controlled,
isolated environments. Customer examples are presented as illustrations of
how those customers have used IBM products and the results they may have
achieved. Actual performance, cost, savings or other results in other
operating environments may vary.
References in this document to IBM products, programs, or services does
not imply that IBM intends to make such products, programs or services
available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by
independent session speakers, and do not necessarily reflect the views of
IBM. All materials and discussions are provided for informational purposes
only, and are neither intended to, nor shall constitute legal or other
guidance or advice to any individual participant or their specific situation.
It is the customer’s responsibility to insure its own compliance with legal
requirements and to obtain advice of competent legal counsel as to
the identification and interpretation of any relevant laws and regulatory
requirements that may affect the customer’s business and any actions the
customer may need to take to comply with such laws. IBM does not provide
legal advice or represent or warrant that its services or products will ensure
that the customer follows any law.
Notices and disclaimers (continued)
Information concerning non-IBM products was obtained from the
suppliers of those products, their published announcements or other
publicly available sources. IBM has not tested those products about this
publication and cannot confirm the accuracy of performance,
compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be
addressed to the suppliers of those products. IBM does not warrant the
quality of any third-party products, or the ability of any such third-party
products to interoperate with IBM’s products. IBM expressly disclaims
all warranties, expressed or implied, including but not limited to, the
implied warranties of merchantability and fitness for a purpose.
The provision of the information contained herein is not intended to, and
does not, grant any right or license under any IBM patents, copyrights,
trademarks or other intellectual property right.
IBM, the IBM logo, ibm.com and [names of other referenced IBM
products and services used in the presentation] are trademarks of
International Business Machines Corporation, registered in many
jurisdictions worldwide. Other product and service names might
be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the Web at "Copyright and trademark
information" at: www.ibm.com/legal/copytrade.shtml.