Performance Co-Pilot™ User's and Administrator's Guide
<pcp@groups.io>
https://pcp.io
PERFORMANCE CO-PILOT
LICENSE
Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-Share Alike, Version 3.0 or any later version published by the Creative Commons Corp. A copy of the license is available at http://creativecommons.org/licenses/by-sa/3.0/us/
Red Hat and the Shadowman logo are trademarks of Red Hat, Inc., registered in the United States and other countries.
Cisco is a trademark of Cisco Systems, Inc. Linux is a registered trademark of Linus Torvalds, used with permission. UNIX is a registered trademark
of The Open Group.
Table of Contents
About This Guide
    What This Guide Contains
    Audience for This Guide
    Related Resources
        Man Pages
        Web Site
    Conventions
    Reader Comments
1. Introduction to PCP
    Objectives
        PCP Target Usage
        Empowering the PCP User
        Unification of Performance Metric Domains
        Uniform Naming and Access to Performance Metrics
        PCP Distributed Operation
        Dynamic Adaptation to Change
        Logging and Retrospective Analysis
        Automated Operational Support
        PCP Extensibility
        Metric Coverage
    Conceptual Foundations
        Performance Metrics
        Performance Metric Instances
        Current Metric Context
        Sources of Performance Metrics and Their Domains
        Distributed Collection
        Performance Metrics Name Space
        Descriptions for Performance Metrics
        Values for Performance Metrics
        Collector and Monitor Roles
        Retrospective Sources of Performance Metrics
        Product Extensibility
    Overview of Component Software
        Performance Monitoring and Visualization
        Collecting, Transporting, and Archiving Performance Information
        Operational and Infrastructure Support
        Application and Agent Development
2. Installing and Configuring Performance Co-Pilot
    Product Structure
    Performance Metrics Collection Daemon (PMCD)
        Starting and Stopping the PMCD
        Restarting an Unresponsive PMCD
        PMCD Diagnostics and Error Messages
        PMCD Options and Configuration Files
    Managing Optional PMDAs
        PMDA Installation on a PCP Collector Host
        PMDA Removal on a PCP Collector Host
    Troubleshooting
        Performance Metrics Name Space
        Missing and Incomplete Values for Performance Metrics
        Kernel Metrics and the PMCD
List of Figures
1.1. Performance Metric Domains as Autonomous Collections of Data
1.2. Process Structure for Distributed Operation
1.3. Small Performance Metrics Name Space (PMNS)
1.4. Architecture for Retrospective Analysis
5.1. Sampling Time Line
5.2. Three-Dimensional Parameter Space
6.1. Archive Log Directory Structure
7.1. PCP Deployment for a Single System
7.2. Basic PCP Deployment for Two Systems
7.3. General PCP Deployment for Multiple Systems
7.4. PCP Deployment to Measure Client-Server Quality of Service
7.5. Designated PCP Archive Site
7.6. PCP Management Site Deployment
8.1. Small Performance Metrics Name Space (PMNS)
List of Tables
1.1. Sample Instance Identifiers for Disk Statistics
3.1. Physical Filenames for Components of a PCP Archive Log
6.1. Filenames for PCP Archive Log Components (archive.*)
A.1. Performance Co-Pilot Acronyms and Their Meanings
List of Examples
2.1. PMNS Installation Output
5.1. pmie with the -f Option
5.2. pmie with the -d and -h Options
5.3. pmie with the -v Option
5.4. Printed pmie Output
5.5. Labelled pmie Output
5.6. Relational Expressions
5.7. Rule Expression Options
5.8. System Log Text
5.9. Standard Output
5.10. Monitoring CPU Utilization
5.11. Monitoring Disk Activity
6.1. Using pminfo to Obtain Archive Information
6.2. Using pmlogsummary to Summarize Archive Information
6.3. Listing Available Commands
8.1. PMNS Specification
About This Guide
Table of Contents
What This Guide Contains
Audience for This Guide
Related Resources
    Man Pages
    Web Site
Conventions
Reader Comments
This guide describes the Performance Co-Pilot (PCP) performance analysis toolkit. PCP provides a
systems-level suite of tools that cooperate to deliver distributed performance monitoring and performance
management services spanning hardware platforms, operating systems, service layers, database internals,
user applications and distributed architectures.
PCP is a cross-platform, open source software package - customizations, extensions, source code inspection, and tinkering in general are actively encouraged.
“About This Guide” includes short descriptions of the chapters in this book, directs you to additional
sources of information, and explains typographical conventions.
• Chapter 1, Introduction to PCP, provides an introduction, a brief overview of the software components,
and conceptual foundations of the PCP software.
• Chapter 2, Installing and Configuring Performance Co-Pilot, describes the basic installation and
configuration steps necessary to get PCP running on your systems.
• Chapter 3, Common Conventions and Arguments, describes the user interface components that are
common to most of the text-based utilities that make up the monitor portion of PCP.
• Chapter 4, Monitoring System Performance, describes the performance monitoring tools available in
Performance Co-Pilot (PCP).
• Chapter 5, Performance Metrics Inference Engine, describes the Performance Metrics Inference Engine
(pmie) tool that provides automated monitoring of, and reasoning about, system performance within
the PCP framework.
• Chapter 6, Archive Logging, covers the PCP services and utilities that support archive logging for
capturing accurate historical performance records.
• Chapter 7, Performance Co-Pilot Deployment Strategies, presents the various options for deploying
PCP functionality across cooperating systems.
• Chapter 8, Customizing and Extending PCP Services, describes the procedures necessary to ensure that
the PCP configuration is customized in ways that maximize the coverage and quality of performance
monitoring and management services.
• Appendix A, Acronyms, provides a comprehensive list of the acronyms used in this guide and in the
man pages for Performance Co-Pilot.
Related Resources
The Performance Co-Pilot Programmer's Guide, a companion document to the Performance Co-Pilot
User's and Administrator's Guide, is intended for developers who want to use the PCP framework and
services for exporting additional collections of performance metrics, or for delivering new or customized
applications to enhance performance management.
The Performance Co-Pilot Tutorials and Case Studies provides a series of real-world examples of using various PCP tools, and lessons learned from deploying the toolkit in production environments. It reinforces the general concepts discussed in the other two books with additional case studies, and in some cases very detailed discussion of the specifics of individual tools.
Additional resources include man pages and the project web site.
Man Pages
The operating system man pages provide concise reference information on the use of commands, subroutines, and system resources. There is usually a man page for each PCP command or subroutine. To see a list of all the PCP man pages, start with the following command:
man PCPIntro
Each man page usually has a "SEE ALSO" section, linking to other, related entries.
To see a particular man page, supply its name to the man command, for example:
man pcp
The man pages are arranged in different sections - user commands, programming interfaces, and so on. For a complete list of manual sections on a platform, enter the command:
man man
When referring to man pages, this guide follows a standard convention: the section number in parentheses
follows the item. For example, pminfo(1) refers to the man page in section 1 for the pminfo command.
Web Site
The following web site is accessible to everyone:
https://pcp.io - PCP is open source software released under the GNU General Public License (GPL) and the GNU Lesser General Public License (LGPL)
Conventions
The following conventions are used throughout this document:
command
    This fixed-space font denotes literal items such as commands, files, routines, path names, signals, messages, and programming language structures.

variable
    Italic typeface denotes variable entries and words or concepts being defined.

user input
    This bold, fixed-space font denotes literal items that the user enters in interactive sessions. (Output is shown in nonbold, fixed-space font.)

ALL CAPS
    All capital letters denote environment variables, operator names, directives, defined constants, and macros in C programs.
Reader Comments
If you have comments about the technical accuracy, content, or organization of this document, contact the
PCP maintainers using either the email address or the web site listed earlier.
Chapter 1. Introduction to PCP
Table of Contents
Objectives
    PCP Target Usage
    Empowering the PCP User
    Unification of Performance Metric Domains
    Uniform Naming and Access to Performance Metrics
    PCP Distributed Operation
    Dynamic Adaptation to Change
    Logging and Retrospective Analysis
    Automated Operational Support
    PCP Extensibility
    Metric Coverage
Conceptual Foundations
    Performance Metrics
    Performance Metric Instances
    Current Metric Context
    Sources of Performance Metrics and Their Domains
    Distributed Collection
    Performance Metrics Name Space
    Descriptions for Performance Metrics
    Values for Performance Metrics
    Collector and Monitor Roles
    Retrospective Sources of Performance Metrics
    Product Extensibility
Overview of Component Software
    Performance Monitoring and Visualization
    Collecting, Transporting, and Archiving Performance Information
    Operational and Infrastructure Support
    Application and Agent Development
This chapter provides an introduction to Performance Co-Pilot (PCP), an overview of its individual
components, and conceptual information to help you use this software.
• the section called “Overview of Component Software”, describes PCP tools and agents.
• the section called “Conceptual Foundations”, discusses the design theories behind PCP.
Objectives
Performance Co-Pilot (PCP) provides a range of services that may be used to monitor and manage
system performance. These services are distributed and scalable to accommodate the most complex system
configurations and performance problems.
From the PCP end user's perspective, PCP presents an integrated suite of tools, user interfaces, and services
that support real-time and retrospective performance analysis, with a bias towards eliminating mundane
information and focusing attention on the exceptional and extraordinary performance behaviors. When this
is done, the user can concentrate on in-depth analysis or target management procedures for those critical
system performance problems.
Performance Metrics Domain Agents (PMDAs) within PCP encapsulate the knowledge about, and export
performance information from, autonomous performance domains.
PCP provides a single interface to name and retrieve values for all performance metrics, independently
of their source or location.
These requirements suggest a classic client-server architecture, which is exactly what PCP uses to provide
concurrent and multiconnected access to performance metrics, independent of their host location.
The distributed nature of PCP (and the modular fashion in which performance metrics domains can be installed, upgraded, and configured on different hosts) enables PCP to adapt concurrently to changes in the monitored system(s). Variations in the available performance metrics as a consequence of configuration changes are handled automatically and become visible to all clients as soon as the reconfigured host is rebooted or the responsible agent is restarted.
PCP also detects loss of client-server connections, and most clients support subsequent automated
reconnection.
Most PCP applications are able to process archive logs and real-time performance data with equal facility. Unification of real-time access and access to the archive logs, in conjunction with the media controls, provides powerful mechanisms for building performance tools and for reviewing both current and historical performance data.
• Web-based publishing of charts showing snapshots of performance activity levels in the recent past
• Flexible alarm monitoring: parameterized rules to address common critical performance scenarios and
facilities to customize and refine this monitoring
• Retrospective performance audits covering the recent past; for example, daily or weekly checks for
performance regressions or quality of service problems
PCP Extensibility
PCP permits the integration of new performance metrics into the PMNS, the collection infrastructure, and
the logging framework. The guiding principle is, “if it is important for monitoring system performance,
and you can measure it, you can easily integrate it into the PCP framework.”
For many PCP users, the most important performance metrics are not those already supported, but new
performance metrics that characterize the essence of good or bad performance at their site, or within their
particular application environment.
One example is an application that measures the round-trip time for a benign “probe” transaction against
some mission-critical application.
For application developers, a library is provided to support easy-to-use insertion of trace and monitoring
points within an application, and the automatic export of resultant performance data into the PCP
framework. Other libraries and tools aid the development of customized and fully featured Performance
Metrics Domain Agents (PMDAs).
Extensive source code examples are provided in the distribution, and by using the PCP toolkit and
interfaces, these customized measures of performance or quality of service can be easily and seamlessly
integrated into the PCP framework.
Metric Coverage
The core PCP modules support export of performance metrics that include kernel instrumentation,
hardware instrumentation, process-level resource utilization, database and other system services
instrumentation, and activity in the PCP collection infrastructure.
The supplied agents support thousands of distinct performance metrics, many of which can have multiple
values, for example, per disk, per CPU, or per process.
Conceptual Foundations
The following sections provide a detailed overview of concepts that underpin Performance Co-Pilot (PCP).
Performance Metrics
Across all of the supported performance metric domains, there are a large number of performance metrics.
Each metric has its own structure and semantics. PCP presents a uniform interface to these metrics,
independent of the underlying metric data source.
The Performance Metrics Name Space (PMNS) provides a hierarchical classification of human-readable
metric names, and a mapping from these external names to internal metric identifiers. See the section
called “Performance Metrics Name Space”, for a description of the PMNS.
Transient performance metrics (such as per-process information) cause repeated requests for the same
metric to return different numbers of values, or changes in the particular instance identifiers returned.
These changes are expected and fully supported by the PCP infrastructure; however, metric instantiation
is guaranteed to be valid only at the time of collection.
A metric source may be the current performance data from a particular host (a live or real-time source),
or a set of archive logs of performance data collected by pmlogger at some distant host or at an earlier
time (a retrospective or archive source).
By default, the collection time for a performance metric is the current time of day for real-time sources, or the current point within an archive source. For archives, the collection time may be reset to an arbitrary time within the bounds of the set of archive logs.
[Figure 1.1: Performance Metric Domains as Autonomous Collections of Data - the kernel, a DBMS, layered services, and end-user applications each form an autonomous collection of performance data.]

Performance metrics can be drawn from several types of domains:
• The operating system kernel, including sub-system data structures - per-process resource consumption,
network statistics, disk activity, or memory management instrumentation.
• A layered software service such as activity logs for a World Wide Web server or an email delivery server.
• An application program such as measured response time for a production application running a periodic
and benign probe transaction (as often required in service level agreements), or rate of computation and
throughput in jobs per minute for a batch stream.
For each domain, the set of performance metrics may be viewed as an abstract data type, with an associated
set of methods that may be used to perform the following tasks:
• Interrogate the metadata that describes the syntax and semantics of the performance metrics
We refer to each functional domain as a performance metrics domain and assume that domains are
functionally, architecturally, and administratively independent and autonomous. Obviously the set of
performance metrics domains available on any host is variable, and changes with time as software and
hardware are installed and removed.
The number of performance metrics domains may be further enlarged in cluster-based or network-based
configurations, where there is potentially an instance of each performance metrics domain on each node.
Hence, the management of performance metrics domains must be both extensible at a particular host and
distributed across a number of hosts.
Each performance metrics domain on a particular host must be assigned a unique Performance Metric
Identifier (PMID). In practice, this means unique identifiers are assigned globally for each performance
metrics domain type. For example, the same identifier would be used for the Apache Web Server
performance metrics domain on all hosts.
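The idea of globally assigned domain identifiers can be sketched in a few lines. The following Python fragment is purely illustrative (it is not the PCP implementation, and the bit-field widths are assumptions chosen for the example): it packs a domain number together with identifiers that are local to that domain, so that the same domain number identifies, say, the same agent type on every host.

```python
# Illustrative sketch only - field widths are assumptions for this
# example, not the actual encoding defined by the PCP libraries.
DOMAIN_BITS, CLUSTER_BITS, ITEM_BITS = 9, 12, 10

def pmid_build(domain, cluster, item):
    """Pack a domain number and domain-local identifiers into one integer."""
    assert domain < (1 << DOMAIN_BITS)
    assert cluster < (1 << CLUSTER_BITS)
    assert item < (1 << ITEM_BITS)
    return (domain << (CLUSTER_BITS + ITEM_BITS)) | (cluster << ITEM_BITS) | item

def pmid_domain(pmid):
    """Recover the performance metrics domain number from a packed identifier."""
    return pmid >> (CLUSTER_BITS + ITEM_BITS)

# The same (hypothetical) domain number 60 would identify this domain
# type on every host in the enterprise.
pmid = pmid_build(domain=60, cluster=0, item=4)
print(pmid_domain(pmid))  # → 60
```

Because the domain number is embedded in every identifier, any tool can tell which agent is responsible for a metric without consulting the host it came from.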
Distributed Collection
The performance metrics collection architecture is distributed, in the sense that any performance tool may
be executing remotely. However, a PMDA usually runs on the system for which it is collecting performance
measurements. In most cases, connecting these tools together on the collector host is the responsibility of
the PMCD process, as shown in Figure 1.2, “Process Structure for Distributed Operation”.
[Figure 1.2: Process Structure for Distributed Operation - monitor clients connect to the pmcd process on each collector host, which forwards requests to the PMDAs on that host.]
The host running the monitoring tools does not require any collection tools, including pmcd, because all
requests for metrics are sent to the pmcd process on the collector host. These requests are then forwarded
to the appropriate PMDAs, which respond with metric descriptions, help text, and most importantly, metric
values.
The connections between monitor clients and pmcd processes are managed in libpcp, below the PMAPI
level; see the pmapi(3) man page. Connections between PMDAs and pmcd are managed by the PMDA
routines; see the pmda(3) man page. There can be multiple monitor clients and multiple PMDAs on the
one host, but normally there would be only one pmcd process.
An external name space - the Performance Metrics Name Space (PMNS) - maps from a hierarchy (or tree)
of human-readable names to PMIDs.
A metric name is formed by traversing the tree from the root to a leaf node with each node label on the
path separated by a period. The common prefix root. is omitted from all names. For example, Figure 1.3,
“Small Performance Metrics Name Space (PMNS) ” shows the nodes in a small subsection of a PMNS.
[Figure 1.3: Small Performance Metrics Name Space (PMNS) - a tree with branches such as kernel, network, and hw.]
In this subsection, the following are valid names for performance metrics:
kernel.percpu.syscall
network.tcp.rcvpack
hw.router.recv.total_util
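The mapping from dotted names to internal identifiers can be modelled as a simple tree walk. The sketch below is a toy (the PMID values are made up, and real PMNS lookups go through the PCP libraries, not a Python dictionary), but it shows the essential mechanism: each label in the name selects one child node, and the leaf holds the metric's identifier.

```python
# Toy PMNS: nested dictionaries form the name hierarchy; the leaves
# hold hypothetical PMIDs. As in PCP, the common prefix "root." is
# omitted from the external names.
pmns = {
    "kernel": {"percpu": {"syscall": 0x3C0001}},
    "network": {"tcp": {"rcvpack": 0x3C0002}},
    "hw": {"router": {"recv": {"total_util": 0x3C0003}}},
}

def lookup(name):
    """Map a dotted metric name to its identifier by walking the tree."""
    node = pmns
    for label in name.split("."):
        node = node[label]  # descend one level per name component
    return node

print(hex(lookup("kernel.percpu.syscall")))  # → 0x3c0001
```

A lookup of an unknown name simply fails to find a child node, which mirrors how an invalid metric name is rejected by the real name space.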
• The internal identifier, Performance Metric Identifier (PMID), for the metric
• The format and encoding for the values of the metric, for example, an unsigned 32-bit integer or a string
or a 64-bit IEEE format floating point number
• The semantics of the metric, particularly the interpretation of the values as free-running counters or
instantaneous values
• The dimensionality of the values, in the dimensions of events, space, and time
• The scale of values; for example, bytes, kilobytes (KB), or megabytes (MB) for the space dimension
For each metric, this metadata is defined within the associated PMDA, and PCP arranges for the
information to be exported to performance tools that use the metadata when interpreting the values for
each metric.
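The metadata items listed above can be pictured as a small record attached to each metric. The following is a minimal sketch under stated assumptions: the field names are invented for illustration and are not the actual descriptor structure exported by the PCP libraries.

```python
from dataclasses import dataclass

@dataclass
class MetricDesc:
    """Illustrative per-metric metadata record (not the real PCP structure)."""
    pmid: int          # internal Performance Metric Identifier
    type: str          # value encoding, e.g. "u32", "u64", "double", "string"
    semantics: str     # "counter" (free-running) or "instant" (instantaneous)
    dimension: tuple   # exponents for the (events, space, time) dimensions
    scale: str         # e.g. "bytes", "KB", or "MB" for the space dimension

# A free-running counter of bytes transferred, dimensioned in space only
# (hypothetical metric, made-up PMID):
desc = MetricDesc(pmid=0x3C0004, type="u64", semantics="counter",
                  dimension=(0, 1, 0), scale="bytes")
print(desc.semantics)  # → counter
```

A monitoring tool that reads such a record knows, for example, that a counter should be displayed as a rate, and that values scaled in bytes may be converted to kilobytes or megabytes for display.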
When a metric has multiple instances, the PCP framework does not pollute the Name Space with additional
metric names; rather, a single metric may have an associated set of values. These multiple values are
associated with the members of an instance domain, such that each instance has a unique instance identifier
within the associated instance domain. For example, the “per CPU” instance domain may use the instance
identifiers 0, 1, 2, 3, and so on to identify the configured processors in the system.
Internally, instance identifiers are encoded as binary values, but each performance metric domain also
supports corresponding strings as external names for the instance identifiers, and these names are used at
the user interface to the PCP utilities.
For example, the performance metric disk.dev.total counts I/O operations for each disk spindle,
and the associated instance domain contains one member for each disk spindle. On a system with five
specific disks, one value would be associated with each of the external and internal instance identifier pairs
shown in Table 1.1, “Sample Instance Identifiers for Disk Statistics ”.
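An instance domain of this kind amounts to a mapping from internal integer identifiers to external names, with one value per instance in each fetch. The sketch below is illustrative only: the disk names and I/O counts are hypothetical, not output from a real system.

```python
# Toy instance domain for a per-disk metric: internal instance
# identifiers map to external names (disk names are hypothetical).
indom = {0: "disk0", 1: "disk1", 2: "disk2", 3: "disk3", 4: "disk4"}

# One value per instance, as a fetch of the metric would return
# (counts are made up for illustration):
values = {0: 7152, 1: 231, 2: 0, 3: 1088, 4: 42}

# Utilities display the external names alongside the values:
for inst_id in sorted(values):
    print(f"{indom[inst_id]:>6}: {values[inst_id]}")
```

The internal identifiers travel in the protocol and archives, while the external names are what users see in the reporting tools.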
Each performance metric domain may dynamically establish the instances within an instance domain.
For example, there may be one instance for the metric kernel.percpu.idle on a workstation,
but multiple instances on a multiprocessor server. Even more dynamic is filesys.free, where the
values report the amount of free space per file system, and the number of values tracks the mounting and
unmounting of local filesystems.
PCP arranges for information describing instance domains to be exported from the performance metric
domains to the applications that require this information. Applications may also choose to retrieve values
for all instances of a performance metric, or some arbitrary subset of the available instances.
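The instance-domain model described above can be sketched as follows (a toy illustration only; real applications would use the PMAPI, and the disk names and values shown here are assumptions):

```python
# Toy model of a PCP instance domain: internal integer identifiers map to
# external instance names, and each value of a metric is associated with
# one instance. (Illustrative only; real applications use the PMAPI.)
disk_indom = {0: "sda", 1: "sdb", 2: "sdc", 3: "sdd", 4: "sde"}

# One value of disk.dev.total per instance, keyed by internal identifier.
disk_dev_total = {0: 7524, 1: 1083, 2: 22340, 3: 512, 4: 9}

def values_for(instances=None):
    """Return (external name, value) pairs for all instances, or for an
    arbitrary subset selected by internal instance identifier."""
    wanted = disk_indom if instances is None else instances
    return [(disk_indom[i], disk_dev_total[i]) for i in wanted]

print(values_for())          # all five disk spindles
print(values_for([0, 2]))    # an arbitrary subset of the instances
```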
Collector Hosts that have pmcd and one or more performance metric domain agents (PMDAs)
running to collect and export performance metrics
Monitor Hosts that import performance metrics from one or more collector hosts to be
consumed by tools to monitor, manage, or record the performance of the collector
hosts
The PMAPI has been designed to minimize the differences required for an application to process
performance data from an archive or from a real-time source. As a result, most PCP tools support live and
retrospective monitoring with equal facility.
Product Extensibility
Much of the PCP software's potential for attacking difficult performance problems in production
environments comes from the design philosophy that considers extensibility to be critically important.
The performance analyst can take advantage of the PCP infrastructure to deploy value-added performance
monitoring tools and services. Here are some examples:
• Easy extension of the PCP collector to accommodate new performance metrics and new sources of
performance metrics, in particular using the interfaces of a special-purpose library to develop new
PMDAs (see the pmda(3) man page)
• Use of libraries (libpcp_pmda and libpcp_mmv) to aid in the development of new capabilities to
export performance metrics from local applications
• Distribution of PCP components such as collectors across the network, placing the service where it can
do the most good
• Creation of new monitor applications, using the routines described in the pmapi(3) man page
For example, to view the man page for the pminfo command, enter:
man pminfo
A representative list of PCP tools and commands, grouped by functionality, is provided in the following
four sections.
pcp-atop Full-screen monitor of the load on a system from a kernel, hardware and
processes point of view. It is modeled on the Linux atop(1) tool (home page
[http://www.atoptool.nl/]) and provides a showcase for the variety of data
available using PCP services and the Python scripting interfaces.
pmchart Strip chart tool for arbitrary performance metrics. Interactive graphical
utility that can display multiple charts simultaneously, from multiple hosts
or sets of archives, aligned on a unified time axis (X-axis), or on multiple tabs.
pmrep Outputs the values of arbitrary performance metrics collected live or from
a single PCP archive, in textual format.
pmevent Reports on event metrics, decoding the timestamp and event parameters for
text-based reporting.
pmieconf Creates parameterized rules to be used with the PCP inference engine
(pmie). It can be run either interactively or from scripts for automating the
setup of inference (the PCP start scripts do this, for example, to generate a
default configuration).
pmdammv
Exports metrics from instrumented applications linked with the
pcp_mmv shared library or the Parfait [http://code.google.com/
p/parfait/] framework for Java instrumentation. These metrics are
custom developed per application, and in the case of Parfait,
automatically include numerous JVM, Tomcat and other server or
container statistics.
pmdanginx Extracts performance metrics from the nginx HTTP and reverse
proxy server.
pmdapostfix Exports performance metrics from the Postfix mail transfer agent.
pmdaweblog
Scans Web-server logs to extract metrics characterizing Web-server activity.
pmlogextract Reads one or more PCP archive logs and creates a temporally
merged and reduced PCP archive log as output.
pmlogger Creates PCP archive logs of performance metrics over time. Many
tools accept these PCP archive logs as alternative sources of
metrics for retrospective analysis.
pmproxy Provides REST APIs, archive discovery, and both PCP and Redis
protocol proxying when executing PCP or Redis client tools
through a network firewall system.
pmtrace Provides a simple command line interface to the trace PMDA and
its associated pcp_trace library.
pmdbg Describes the available facilities and associated control flags. PCP tools include
internal diagnostic and debugging facilities that may be activated by run-time
flags.
pmnewlog Performs archive log rotation by stopping and restarting an instance of pmlogger.
pmnsadd Adds a subtree of new names into a PMNS, as used by the components of PCP.
pmnsdel Removes a subtree of names from a PMNS, as used by the components of PCP.
pmnsmerge Merges multiple PMNS files together, as used by the components of PCP.
pmstore Reinitializes counters or assigns new values to metrics that act as control
variables. The command changes the current values for the specified instances
of a single performance metric.
dbpmda Allows PMDA behavior to be exercised and tested. It is an interactive debugger for
PMDAs.
newhelp Generates the database files for one or more source files of PCP help text.
pmapi Defines a procedural interface for developing PCP client applications. It is the
Performance Metrics Application Programming Interface (PMAPI).
pmclient Is a simple client that uses the PMAPI to report some high-level system performance
metrics.
pmda Is a library used by many shipped PMDAs to communicate with a pmcd process. It
can expedite the development of new and custom PMDAs.
pmgenmap Generates C declarations and cpp(1) macros to aid the development of customized
programs that use the facilities of PCP. It is a PMDA development tool.
Chapter 2. Installing and Configuring
Performance Co-Pilot
Table of Contents
Product Structure .............................................................................................................. 15
Performance Metrics Collection Daemon (PMCD) .................................................................. 16
Starting and Stopping the PMCD ................................................................................. 16
Restarting an Unresponsive PMCD .............................................................................. 16
PMCD Diagnostics and Error Messages ........................................................................ 16
PMCD Options and Configuration Files ........................................................................ 17
Managing Optional PMDAs ................................................................................................ 21
PMDA Installation on a PCP Collector Host .................................................................. 22
PMDA Removal on a PCP Collector Host ..................................................................... 23
Troubleshooting ................................................................................................................ 24
Performance Metrics Name Space ................................................................................ 24
Missing and Incomplete Values for Performance Metrics ................................................. 24
Kernel Metrics and the PMCD .................................................................................... 24
The sections in this chapter describe the basic installation and configuration steps necessary to run
Performance Co-Pilot (PCP) on your systems. The following major sections are included:
• the section called “Product Structure” describes the main packages of PCP software and how they must
be installed on each system.
• the section called “Performance Metrics Collection Daemon (PMCD)”, describes the fundamentals of
maintaining the performance data collector.
• the section called “Managing Optional PMDAs”, describes the basics of installing a new Performance
Metrics Domain Agent (PMDA) to collect metric data and pass it to the PMCD.
• the section called “Troubleshooting”, offers advice on problems involving the PMCD.
Product Structure
In a typical deployment, Performance Co-Pilot (PCP) would be installed in a collector configuration on
one or more hosts, from which the performance information could then be collected, and in a monitor
configuration on one or more workstations, from which the performance of the server systems could then
be monitored.
On some platforms Performance Co-Pilot is presented as multiple packages, typically separating the
server components from graphical user interfaces and documentation.
To enable and start pmcd on a PCP collector system, enter the following commands as superuser:
chkconfig pmcd on
${PCP_RC_DIR}/pmcd start
These commands instruct the system to start the daemon immediately, and again whenever the system
is booted. It is not necessary to start the daemon on the monitoring system unless you wish to collect
performance information from it as well.
To stop pmcd immediately on a PCP collection system, enter the following command:
${PCP_RC_DIR}/pmcd stop
To start it again, enter:
${PCP_RC_DIR}/pmcd start
On startup, pmcd looks for a configuration file at ${PCP_PMCDCONF_PATH}. This file specifies which
agents cover which performance metrics domains and how PMCD should make contact with the agents.
A comprehensive description of the configuration file syntax and semantics can be found in the pmcd(1)
man page.
If the configuration is changed, pmcd reconfigures itself when it receives the SIGHUP signal,
which may be sent to the daemon using the pmsignal(1) command.
This is also useful when one of the PMDAs managed by pmcd has failed or has been terminated by
pmcd. Upon receipt of the SIGHUP signal, pmcd restarts any PMDA that is configured but inactive. The
exception to this rule is the case of a PMDA which must run with superuser privileges (where possible,
this is avoided) - for these PMDAs, a full pmcd restart must be performed, using the process described
earlier (not SIGHUP).
The PMCD accepts certain command line options to control its execution, and these options are placed
in the pmcd.options file when ${PCP_RC_DIR}/pmcd is being used to start the daemon. The
following options (amongst others) are available:
-i address For hosts with more than one network interface, this option specifies the
interface on which this instance of the PMCD accepts connections. Multiple
-i options may be specified. The default in the absence of any -i option is for
PMCD to accept connections on all interfaces.
-l file Specifies a log file. If no -l option is specified, the log file name is pmcd.log
and it is created in the directory ${PCP_LOG_DIR}/pmcd/.
-s file Specifies the path to a local unix domain socket (for platforms supporting
this socket family only). The default value is ${PCP_RUN_DIR}/pmcd.socket.
-t seconds Specifies the amount of time, in seconds, before PMCD times out on protocol
data unit (PDU) exchanges with PMDAs. If no time out is specified, the default
is five seconds. Setting the time out to zero disables time outs (not recommended;
PMDAs should always respond quickly).
The time out may be dynamically modified by storing the number of seconds
into the metric pmcd.control.timeout using pmstore.
-T mask Specifies whether connection and PDU tracing are turned on for debugging
purposes.
See the pmcd(1) man page for complete information on these options.
The default pmcd.options file shipped with PCP is similar to the following:
# suppress timeouts
# -t 0
The most commonly used options have been placed in this file for your convenience. To uncomment and
use an option, simply remove the pound sign (#) at the beginning of the line with the option you wish to
use. Restart pmcd for the change to take effect; that is, as superuser, enter the command:
${PCP_RC_DIR}/pmcd start
With standard PCP operation (even if you have not created and added your own PMDAs), you might need
to edit this file in order to add any additional access control you wish to impose. If you do not add access
control rules, all access for all operations is granted to the local host, and read-only access is granted to
remote hosts. The pmcd.conf file is automatically generated during the software build process and on
Linux, for example, is similar to the following:
[access]
disallow ".*" : store;
disallow ":*" : store;
allow "local:*" : all;
Note
Even though PMCD does not run with root privileges, you must be very careful not to configure
PMDAs in this file unless you are sure of their behavior. This is because all PMDAs are initially
started as root (allowing them to assume alternate identities, such as postgres),
after which pmcd drops its privileges. Pay close attention that permissions on this file are not
inadvertently downgraded to allow public write access.
Each entry in this configuration file contains rules that specify how to connect the PMCD to a particular
PMDA and which metrics the PMDA monitors. A PMDA may be attached as a Dynamic Shared Object
(DSO) or by using a socket or a pair of pipes. The distinction between these attachment methods is
described below.
The label_name field specifies a name for the PMDA. The domain_number is an integer value that
specifies a domain of metrics for the PMDA. The type field indicates the type of entry (DSO, socket, or
pipe). The path field is for additional information, and varies according to the type of entry.
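As an illustration, the agent section of pmcd.conf contains lines of the following general shape. These entries are adapted examples, not necessarily what is generated on your platform; see pmcd(1) for the authoritative syntax:

```
# label_name  domain_number  type  path (and type-specific fields)
pmcd    2   dso   pmcd_init   ${PCP_PMDAS_DIR}/pmcd/pmda_pmcd.so
linux   60  dso   linux_init  ${PCP_PMDAS_DIR}/linux/pmda_linux.so
proc    3   pipe  binary      ${PCP_PMDAS_DIR}/proc/pmdaproc -d 3
```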
The following rules are common to DSO, socket, and pipe syntax:
entry-point The name of an initialization function called when the DSO is loaded.
path Designates the location of the DSO. An absolute path must be used. On most
platforms this will be a so suffixed file, on Windows it is a dll, and on Mac
OS X it is a dylib file.
address Specifies the address of the socket. For INET or IPv6 sockets, this is a port
number or port name. For UNIX sockets, this is the name of the PMDA's socket
on the local host.
command Specifies a command to start the PMDA when the PMCD is invoked and reads
the configuration file.
protocol Specifies whether a text-based or a binary PCP protocol should be used over the pipes.
Historically, this parameter could be either “text” or “binary.” The text-based protocol
has long since been deprecated and removed, however, so nowadays “binary” is the
only valid value here.
command Specifies a command to start the PMDA when the PMCD is invoked and reads the
configuration file.
[access]
Below this line, you can add entries of the following forms:
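These entries take shapes like the following sketch, paraphrased from the pmcd(1) man page; consult it for the full grammar:

```
allow    hosts  hostlist   : operations ;
disallow hosts  hostlist   : operations ;
allow    users  userlist   : operations ;
disallow groups grouplist  : operations ;
```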
The keywords users, groups and hosts can be used in either plural or singular form.
The userlist and grouplist fields are comma-separated lists of authenticated users and groups
from the local /etc/passwd and /etc/group files, or from the NIS (network information service)
or LDAP (lightweight directory access protocol) services.
The hostlist is a comma-separated list of host identifiers; the following rules apply:
• Host names must be in the local system's /etc/hosts file or known to the local DNS (domain name
service).
• A wildcarded IP or IPv6 address may be used to specify groups of hosts, with the single wildcard
character * as the last-given component of the address. The wildcard .* refers to all IP (IPv4) addresses.
The wildcard :* refers to all IPv6 addresses. If an IPv6 wildcard contains a :: component, then the final
* refers to the final 16 bits of the address only, otherwise it refers to the remaining unspecified bits of
the address.
The wildcard “*” refers to all users, groups or host addresses. Names of users, groups or hosts may not
be wildcarded.
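The trailing-wildcard matching described above can be illustrated with a small sketch. This is simplified: it covers only the IPv4 trailing-* forms, not the IPv6 “::” rules, and the function is illustrative rather than PCP library code:

```python
def host_matches(pattern, address):
    """Simplified PCP-style host wildcard matching for IPv4 identifiers.

    Only the trailing-* forms are handled here; the IPv6 and "::" rules
    described in pmcd(1) are not modeled.
    """
    if pattern == "*":            # all users, groups or host addresses
        return True
    if pattern == ".*":           # all IPv4 addresses
        return ":" not in address
    if pattern.endswith("*"):     # e.g. "155.116.24.*" or "192.*"
        return address.startswith(pattern[:-1])
    return pattern == address     # no wildcard: exact match

print(host_matches("155.116.24.*", "155.116.24.7"))
print(host_matches("192.*", "10.1.2.3"))
```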
The following are all valid host identifiers:
babylon
babylon.acme.com
123.101.27.44
localhost
155.116.24.*
192.*
.*
fe80::223:14ff:feaf:b62c
fe80::223:14ff:feaf:*
fe80:*
:*
*
• The word all to allow or disallow all operations as specified in the first field.
• The words all except and a list of operations. This entry allows or disallows all operations as specified
in the first field except those listed.
• The phrase maximum N connections to set an upper bound (N) on the number of connections an
individual host, user or group of users may make. This can only be added to the operations list of
an allow statement.
fetch Allows retrieval of information from the PMCD. This may be information about a metric
(such as a description, instance domain, or help text) or an actual value for a metric.
store Allows the PMCD to store metric values in PMDAs that permit store operations. Be cautious
in allowing this operation, because it may be a security opening in large networks, although the
PMDAs shipped with the PCP package typically reject store operations, except for selected
performance metrics where the effect is benign.
Complete information on access control syntax rules in the pmcd.conf file can be found in the pmcd(1)
man page.
Some PMDAs are designed for optional activation and require some user action to make them operational.
In some cases these PMDAs expect local site customization to reflect the operational environment, the
system configuration, or the production workload. This customization is typically supported by interactive
installation scripts for each PMDA.
Each PMDA has its own directory located below ${PCP_PMDAS_DIR}. Each directory contains a
Remove script to unconfigure the PMDA, remove the associated metrics from the PMNS, and restart the
pmcd daemon; and an Install script to install the PMDA, update the PMNS, and restart the PMCD
daemon.
As a shortcut mechanism to support automated PMDA installation, a file named .NeedInstall can be
created in a PMDA directory below ${PCP_PMDAS_DIR}. The next restart of PCP services will invoke
that PMDA's installation automatically, with default options taken.
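The mechanism can be demonstrated in a scratch directory. The cisco PMDA name and the directory layout here are purely illustrative; on a real system the flag file would go under ${PCP_PMDAS_DIR}:

```python
# Simulate dropping a .NeedInstall flag into a PMDA's directory; at the
# next restart of PCP services, the Install script for that PMDA would
# be run automatically with default options.
from pathlib import Path
import tempfile

pmdas_dir = Path(tempfile.mkdtemp())   # stand-in for ${PCP_PMDAS_DIR}
pmda_dir = pmdas_dir / "cisco"
pmda_dir.mkdir()
(pmda_dir / ".NeedInstall").touch()    # queue automatic installation

print(sorted(p.name for p in pmda_dir.iterdir()))
```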
You need to update the PMNS, configure the PMDA, and notify PMCD. The Install script for each
PMDA automates these operations, as follows:
1. Log in as root (the superuser).
2. Change to the PMDA's directory, as shown in the following example:
cd ${PCP_PMDAS_DIR}/cisco
3. In the unlikely event that you wish to use a non-default Performance Metrics Domain (PMD)
assignment, determine the current PMD assignment:
cat domain.h
4. Run the installation script:
./Install
You may be prompted to enter some local parameters or configuration options. The script applies all
required changes to the control files and to the PMNS, and then notifies PMCD. The interactions are
illustrated in Example 2.1, “PMNS Installation Output”.
The PMNS needs to be updated, the PMDA unconfigured, and PMCD notified. The Remove script for
each PMDA automates these operations, as follows:
1. Log in as root (the superuser).
2. Change to the PMDA's directory, as shown in the following example:
cd ${PCP_PMDAS_DIR}/elasticsearch
3. Run the removal script:
./Remove
Troubleshooting
The following sections offer troubleshooting advice on the Performance Metrics Name Space (PMNS),
missing and incomplete values for performance metrics, kernel metrics and the PMCD.
Advice for troubleshooting the archive logging system is provided in Chapter 6, Archive Logging.
The PMNS at the collector host is updated whenever a PMDA is installed or removed, and may also be
updated when new versions of PCP are installed. During these operations, the PMNS is typically updated
by merging the (plaintext) namespace components from each installed PMDA. These separate PMNS
components reside in the ${PCP_VAR_DIR}/pmns directory and are merged into the root file there.
Symptom: Values for some or all of the instances of a performance metric are not
available.
Cause: This can occur as a consequence of changes in the installation of modules (for
example, a DBMS or an application package) that provide the performance
instrumentation underpinning the PMDAs. Changes in the selection of
modules that are installed or operational, along with changes in the version of
these modules, may make metrics appear and disappear over time.
In simple terms, the PMNS contains a metric name, but when that metric is
requested, no PMDA at the collector host supports the metric.
For archive logs, the collection of metrics to be logged is a subset of the metrics
available, so utilities replaying from a PCP archive log may not have access to
all of the metrics available from a live (PMCD) source.
Resolution: Make sure the underlying instrumentation is available and the module is active.
Ensure that the PMDA is running on the host to be monitored. If necessary,
create a new archive log with a wider range of metrics to be logged.
Symptom: A PCP client tool (such as pmchart, pmie, or pmlogger) complains that
it is unable to connect to a remote PMCD (or establish a PMAPI context), but
you are sure that PMCD is active on the remote host.
Cause: To avoid hanging applications for the duration of TCP/IP time outs, the PMAPI
library implements its own time out when trying to establish a connection to
a PMCD. If the connection to the host is over a slow network, then successful
establishment of the connection may not be possible before the time out, and
the attempt is abandoned.
Alternatively, there may be a firewall in between the client tool and PMCD
which is blocking the connection attempt.
Finally, PMCD may be running in a mode where it does not accept remote
connections, or may be listening only on certain interfaces.
If a connection attempt to the PMCD port on the remote host is refused, the
PMCD is not running and needs restarting:
${PCP_RC_DIR}/pmcd start
If instead a probe of the port (using telnet, for example) succeeds with a
response such as:
Connected to far-away-host
interrupt the telnet session, increase the PMAPI time out by setting
the PMCD_CONNECT_TIMEOUT environment variable to some number of
seconds (60 for instance), and try the PCP client tool again.
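A scripted stand-in for such a connectivity probe might look like this. It is a sketch: the helper name is hypothetical, the default PMCD port 44321 is assumed, and real deployments may override the port via PMCD_PORT:

```python
import socket

def pmcd_reachable(host, port=44321, timeout=5.0):
    """Return True if something accepts TCP connections on pmcd's port.

    This mirrors the manual telnet check: a refused or timed-out
    connection suggests pmcd is not running or is unreachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(pmcd_reachable("localhost"))
```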
Verify that PMCD is not running in local-only mode, by looking for an enabled
value (one) from:
pminfo -f pmcd.feature.local
Symptom: PMCD does not reconfigure itself after receiving the SIGHUP signal.
Resolution: Always monitor PMCD's log. For example, use the following command in
another window when reconfiguring PMCD, to watch errors occur:
tail -f ${PCP_LOG_DIR}/pmcd/pmcd.log
Symptom: Messages indicating that the request socket is already in use appear in the
PMCD log (${PCP_LOG_DIR}/pmcd/pmcd.log); consider the cause and resolution:
Cause: PMCD is already running or was terminated before it could clean up properly.
The error occurs because the socket it advertises for client connections is
already being used or has not been cleared by the kernel.
Resolution: Start pmcd as root (the superuser) by typing:
${PCP_RC_DIR}/pmcd start
Any existing PMCD is shut down, and a new one is started in such a way that
the symptomatic message should not appear.
If you are starting PMCD this way and the symptomatic message appears,
a problem has occurred with the connection to one of the deceased PMCD's
clients.
This could happen when the network connection to a remote client is lost and
PMCD is subsequently terminated. The system may attempt to keep the socket
open for a time to allow the remote client a chance to reestablish the connection
and read any outstanding data.
The only solution in these circumstances is to wait until the socket times out
and the kernel deletes it. A netstat command, filtering on the PMCD port (44321 by
default), can display the status of the socket and any connections.
If the socket is in the FIN_WAIT or TIME_WAIT state, then you must wait
for it to be deleted. Once the command above produces no output, PMCD may
be restarted. Less commonly, you may have another program running on your
system that uses the same Internet port number (44321) that PMCD uses.
Refer to the PCPIntro(1) man page for a description of how to override the
default PMCD port assignment using the PMCD_PORT environment variable.
Chapter 3. Common Conventions and
Arguments
Table of Contents
Alternate Metrics Source Options ........................................................................................ 28
Fetching Metrics from Another Host ............................................................................ 28
Fetching Metrics from an Archive Log ......................................................................... 28
General PCP Tool Options ................................................................................................. 28
Common Directories and File Locations ....................................................................... 29
Alternate Performance Metric Name Spaces .................................................................. 30
Time Duration and Control ................................................................................................. 30
Performance Monitor Reporting Frequency and Duration ................................................. 30
Time Window Options ............................................................................................... 30
Timezone Options ..................................................................................................... 32
PCP Environment Variables ................................................................................................ 32
Running PCP Tools through a Firewall ................................................................................. 34
The pmproxy service ................................................................................................ 35
Transient Problems with Performance Metric Values ............................................................... 35
Performance Metric Wraparound ................................................................................. 35
Time Dilation and Time Skew .................................................................................... 35
This chapter deals with the user interface components that are common to most text-based utilities that
make up the monitor portion of Performance Co-Pilot (PCP). These are the major sections in this chapter:
• the section called “Alternate Metrics Source Options”, details some basic standards used in the
development of PCP tools.
• the section called “General PCP Tool Options”, details other options to use with PCP tools.
• the section called “Time Duration and Control”, describes the time control dialog and time-related
command line options available for use with PCP tools.
• the section called “PCP Environment Variables”, describes the environment variables supported by PCP
tools.
• the section called “Running PCP Tools through a Firewall”, describes how to execute PCP tools that
must retrieve performance data from the Performance Metrics Collection Daemon (PMCD) on the other
side of a TCP/IP security firewall.
• the section called “Transient Problems with Performance Metric Values ”, covers some uncommon
scenarios that may compromise performance metric integrity over the short term.
Many of the utilities provided with PCP conform to a common set of naming and syntactic conventions
for command line arguments and options. This section outlines these conventions and their meaning. The
options may be generally assumed to be honored for all utilities supporting the corresponding functionality.
In all cases, the man pages for each utility fully describe the supported command arguments and options.
Command line options are also relevant when starting PCP applications from the desktop using the Alt
double-click method. This technique launches the pmrun program to collect additional arguments to pass
along when starting a PCP application.
PCP archive logs are created with pmlogger. Most PCP utilities operate with equal facility on performance
information coming either from a real-time feed via PMCD on some host, or from historical data in a set
of PCP archive logs. For more information on archive logs and their use, see Chapter 6, Archive Logging.
The list of names (archive) used with the -a option implies the existence of the files created
automatically by pmlogger, as listed in Table 3.1, “Physical Filenames for Components of a PCP Archive
Log”.
Most tools are able to concurrently process multiple PCP archive logs (for example, for retrospective
analysis of performance across multiple hosts), and accept either multiple -a options or a comma separated
list of archive names following the -a option.
Note
The -h and -a options are almost always mutually exclusive. Currently, pmchart is the exception
to this rule, but other tools may also blur this distinction in the future.
The distributed PMNS used in PCP avoids most requirements for an alternate PMNS, because clients'
PMNS operations are supported at the Performance Metrics Collection Daemon (PMCD) or by means of
PMNS data in a PCP archive log. The distributed PMNS is the default, but alternates may be specified
using the -n namespace argument to the PCP tools. When a PMNS is maintained on a host, it is likely
to reside in the ${PCP_VAR_DIR}/pmns directory.
The interval argument may also be expressed in terms of minutes, hours, or days, as described in
the PCPIntro(1) man page.
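As a sketch of how such interval arguments decompose, the following simplified parser handles only a subset of the suffixes accepted by PCPIntro(1); the real parser accepts more forms, including fractional values:

```python
import re

# Subset of the PCPIntro(1) duration suffixes (assumed spellings shown);
# the full syntax is documented in PCPIntro(1).
_UNITS = {
    "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "hour": 3600, "hours": 3600,
    "day": 86400, "days": 86400,
}

def duration_to_seconds(text):
    """Convert a duration such as '30min' or '1hour' to seconds.

    A bare integer is taken as seconds, matching the -S/-T default.
    """
    m = re.fullmatch(r"(\d+)\s*([a-z]*)", text.strip().lower())
    if not m:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = int(m.group(1)), m.group(2)
    if unit == "":
        return value          # integers default to seconds
    return value * _UNITS[unit]

print(duration_to_seconds("30min"))   # 1800
```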
The -S, -T, -O and -A command line options are used by PCP applications to define a time window
of interest.
-S duration The start option may be used to request that the display start at the
nominated time. By default, the first sample of performance data is
retrieved immediately in real-time mode, or coincides with the first
sample of data of the first archive in a set of PCP archive logs in archive
mode. For archive mode, the -S option may be used to specify a later
time for the start of sampling. By default, if duration is an integer, the
units are assumed to be seconds.
For example, the following entry specifies that sampling should begin 30 minutes
after the start of a set of PCP archives:
-S 30min
To specify an offset from the end of a set of PCP archives, prefix the
duration with a minus sign. In this case, the first sample time precedes
the end of archived data by the given duration. For example, the
following entry retrieves the first sample exactly one hour preceding the
last sample in a set of PCP archives.
-S -1hour
To specify the calendar date and time (local time in the reporting
timezone) for the first sample, use the ctime(3) syntax preceded by
an “at” sign (@).
Note that this format corresponds to the output format of the date
command for easy “cut and paste.” However, be sure to enclose the string
in quotes so it is preserved as a single argument for the PCP tool.
For more complete information on the date and time syntax, see the
PCPIntro(1) man page.
-T duration The terminate option may be used to request that the display stop at the
time designated by duration. By default, the PCP tools keep sampling
performance data indefinitely (in real-time mode) or until the end of a set
of PCP archives (in archive mode). The -T option may be used to specify
an earlier time to terminate sampling.
For example:
-S 1hour -T 45mins
-O duration By default, samples are fetched from the start time (see the description
of the -S option) to the terminate time (see the description of the -T
option). The offset -O option allows the specification of a time between
the start time and the terminate time where the tool should position its
initial sample time. This option is useful when initial attention is focused
at some point within a larger time window of interest, or when one PCP
tool wishes to launch another PCP tool with a common current point of
time within a shared time window.
For example, the following entry positions the initial sample time at the
end of the time window:
-O -0
This is most useful with the pmchart command to display the tail-end of
the history up to the end of the time window.
The align -A option may be used to force the initial sample time onto a
natural boundary of the specified interval; for example, aligned to a
whole second, a half hour, or an hour:
-A 1sec
-A 30min
-A 1hour
The -A option advances the time to achieve the desired alignment as soon
as possible after the start of the time window, whether this is the default
window, or one specified with some combination of -A and -O command
line options.
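The effect can be pictured as rounding the initial sample time up to the next interval boundary. A small shell-arithmetic sketch follows; the epoch values are invented, and the real -A handling is performed inside the PCP tools themselves:

```shell
# Round a start time (seconds since the epoch) up to the next whole-hour
# boundary, analogous to what -A 1hour requests.
start=1501491900        # a few minutes past the hour
align=3600
aligned=$(( ((start + align - 1) / align) * align ))
echo "$aligned"         # 1501495200, i.e. the next exact hour
```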
Obviously the time window may be overspecified by using multiple options from the set -t, -s, -S, -T,
-A, and -O. Similarly, the time window may shrink to nothing by injudicious choice of options.
In all cases, the parsing of these options applies heuristics guided by the principle of “least surprise”; the
time window is always well-defined (with the end never earlier than the start), but may shrink to nothing
in the extreme.
Timezone Options
All utilities that report time of day use the local timezone by default. The following timezone options are
available:
-z Forces times to be reported in the timezone of the host that provided the
metric values (the PCP collector host). When used in conjunction with
-a and multiple archives, the convention is to use the timezone from the
first named archive.
The following environment variables are recognized by PCP (these definitions are also available on the
PCPIntro(1) man page):
If the firewall service sits between the monitor and collector tools, the pmproxy service may be used to
perform both packet forwarding and DNS proxying through the firewall; see the pmproxy(1) man page.
Otherwise, it is necessary to arrange for packet forwarding to be enabled for those TCP/IP ports used by
PCP, namely 44321 (or the value of the PMCD_PORT environment variable) for connections to PMCD.
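A monitor-side script might resolve the port the same way the tools do, falling back to the documented default when the variable is unset (the remote host name is hypothetical):

```shell
# Use PMCD_PORT from the environment if set, otherwise the default PMCD port.
port="${PMCD_PORT:-44321}"
echo "contacting pmcd on TCP port $port"
# pminfo -h remotehost    # the PCP tools consult PMCD_PORT themselves
```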
pmprobe -h remotehost@proxyhost
This extended host specification syntax is part of a larger set of available extensions to the basic host
naming syntax - refer to the PCPIntro(1) man page for further details.
In all cases, these events are expected to occur infrequently, and should not persist beyond a few samples.
Because the value of these counter metrics is computed from the rate of change with respect to the
previous sample, this may result in a transient condition where the rate of change is an unknown value.
If the PCP_COUNTER_WRAP environment variable is set, this condition is treated as an overflow, and
speculative rate calculations are made. In either case, the correct rate calculation for the metric returns
with the next sample.
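The speculative calculation amounts to adding one full counter range when a counter appears to run backwards. An awk sketch of the arithmetic follows; the 32-bit range and sample values are invented for illustration and this is not PCP source code:

```shell
# Rate between two samples of a 32-bit counter taken 5 seconds apart.
# A counter that appears to decrease is assumed to have wrapped once,
# as when PCP_COUNTER_WRAP is set.
awk -v prev=4294967290 -v cur=10 -v interval=5 'BEGIN {
    if (cur < prev)
        cur += 4294967296           # add 2^32: one full wrap
    printf "%.1f ops/sec\n", (cur - prev) / interval
}'
```

With the sample values above, this prints 3.2 ops/sec.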
In practice, these errors are usually so small as to be insignificant, and the errors are self-correcting (not
cumulative) over consecutive samples.
A related problem may occur when the system time is not synchronized between multiple hosts, and the
time stamps for the results returned from PMCD reflect the skew in the system times. In this case, it is
recommended that NTP (network time protocol) be used to keep the system clocks on the collector systems
synchronized; for information on NTP refer to the ntpd(1) man page.
Chapter 4. Monitoring System Performance
Table of Contents
The pmstat Command ....................................................................................................... 36
The pmrep Command ....................................................................................................... 38
The pmval Command ........................................................................................................ 38
The pminfo Command ....................................................................................................... 39
The pmstore Command ..................................................................................................... 43
This chapter describes the performance monitoring tools available in Performance Co-Pilot (PCP). This
product provides a group of commands and tools for measuring system performance. Each tool is described
completely by its own man page. The man pages are accessible through the man command. For example,
the man page for the tool pmrep is viewed by entering the following command:
man pmrep
• the section called “The pmstat Command”, discusses pmstat, a utility that provides a periodic one-line
summary of system performance.
• the section called “The pmrep Command”, discusses pmrep, a utility that shows the current values for
named performance metrics.
• the section called “The pmval Command”, describes pmval, a utility that displays performance metrics
in a textual format.
• the section called “The pminfo Command”, describes pminfo, a utility that displays information about
performance metrics.
• the section called “The pmstore Command”, describes the use of the pmstore utility to arbitrarily set
or reset selected performance metric values.
The following sections describe the various graphical and text-based PCP tools used to monitor local or
remote system performance.
pmstat
@ Thu Aug 15 09:25:56 2017
loadavg memory swap io system cpu
1 min swpd free buff cache pi po bi bo in cs us sy id
An additional line of output is added every five seconds. The -t interval option may be used to vary
the update interval (i.e. the sampling interval).
The output from pmstat is directed to standard output, and the columns in the report are interpreted as
follows:
memory The swpd column indicates average swap space used during the interval (all columns
reported in Kbytes unless otherwise indicated). The free column indicates average free
memory during the interval. The buff column indicates average buffer memory in use
during the interval. The cache column indicates average cached memory in use during
the interval.
swap Reports the average number of pages that are paged-in (pi) and paged-out (po) per
second during the interval. It is normal for the paged-in values to be non-zero, but the
system is suffering memory stress if the paged-out values are non-zero over an extended
period.
io The bi and bo columns indicate the average rate per second of block input
and block output operations, respectively, during the interval. These rates
are independent of the I/O block size. If the values become large, they are
reported as thousands of operations per second (K suffix) or millions of
operations per second (M suffix).
system Context switch rate (cs) and interrupt rate (in). Rates are expressed as average
operations per second during the interval. Note that the interrupt rate is normally at least
HZ (the clock interrupt rate, and kernel.all.hz metric) interrupts per second.
cpu Percentage of CPU time spent executing user code (us), system and interrupt code (sy),
idle loop (id).
As with most PCP utilities, real-time metric sources and archive logs are interchangeable.
For example, the following command uses a local system PCP archive log 20170731 and the timezone
of the host (smash) from which performance metrics in the archive were collected:
For complete information on pmstat usage and command line options, see the pmstat(1) man page.
pmval proc.nprocs
metric: proc.nprocs
host: localhost
semantics: instantaneous value
units: none
samples: all
interval: 1.00 sec
81
81
82
81
In this example, the number of running processes was reported once per second.
Where the semantics of the underlying performance metrics indicate that it would be sensible, pmval
reports the rate of change or resource utilization.
For example, the following command reports idle processor utilization for each of four CPUs on the remote
host dove, each five seconds apart, producing output of this form:
Similarly, the following command reports disk I/O read rate every minute for just the disk /dev/disk1,
and produces output similar to the following:
The -r flag may be used to suppress the rate calculation (for metrics with counter semantics) and display
the raw values of the metrics.
In the example below, manipulation of the time within the archive is achieved by the exchange of time
control messages between pmval and pmtime.
The pmval command is documented by the pmval(1) man page, and annotated examples of the use of
pmval can be found in the PCP Tutorials and Case Studies companion document.
The -T option is extremely useful; it provides help text about performance metrics:
pminfo -T mem.util.cached
mem.util.cached
Help:
Memory used by the page cache, including buffered file data.
This is in-memory cache for files read from the disk (the pagecache)
but doesn't include SwapCached.
The -t option displays the one-line help text associated with the selected metrics. The -T option prints
more verbose help text.
Without any options, pminfo verifies that the specified metrics exist in the namespace, and echoes those
names. Metrics may be specified as arguments to pminfo using their full metric names. For example, this
command returns the following response:
hinv.ncpu
network.interface.total.bytes
A group of related metrics in the namespace may also be specified. For example, to list all of the hinv
metrics you would use this command:
pminfo hinv
hinv.physmem
hinv.pagesize
hinv.ncpu
hinv.ndisk
hinv.nfilesys
hinv.ninterface
hinv.nnode
hinv.machine
hinv.map.scsi
hinv.map.cpu_num
hinv.map.cpu_node
hinv.map.lvname
hinv.cpu.clock
hinv.cpu.vendor
hinv.cpu.model
hinv.cpu.stepping
hinv.cpu.cache
hinv.cpu.bogomips
If no metrics are specified, pminfo displays the entire collection of metrics. This can be useful for searching
for metrics, when only part of the full name is known. For example, this command returns the following
response:
The -d option causes pminfo to display descriptive information about metrics (refer to the
pmLookupDesc(3) man page for an explanation of this metadata information). The following command
and response show use of the -d option:
disk.dev.read
Data Type: 32-bit unsigned int InDom: 60.1 0xf000001
filesys.free
Data Type: 64-bit unsigned int InDom: 60.5 0xf000005
Semantics: instant Units: Kbyte
The -l option causes pminfo to display labels for metrics (refer to the pmLookupLabels(3) man page
for an explanation of this metadata information). If the metric has an instance domain, the labels associated
with each instance of the metric are printed. The following command and response show use of the -l
option:
The -f option to pminfo forces the current value of each named metric to be fetched and printed. In the
example below, all metrics in the group hinv are selected:
pminfo -f hinv
hinv.physmem
value 15701
hinv.pagesize
value 16384
hinv.ncpu
value 4
hinv.ndisk
value 6
hinv.nfilesys
value 2
hinv.ninterface
value 8
hinv.nnode
value 2
hinv.machine
value "IP35"
hinv.map.cpu_num
inst [0 or "cpu:1.1.0.a"] value 0
inst [1 or "cpu:1.1.0.c"] value 1
inst [2 or "cpu:1.1.1.a"] value 2
inst [3 or "cpu:1.1.1.c"] value 3
hinv.map.cpu_node
inst [0 or "node:1.1.0"] value "/dev/hw/module/001c01/slab/0/node"
inst [1 or "node:1.1.1"] value "/dev/hw/module/001c01/slab/1/node"
hinv.cpu.clock
inst [0 or "cpu:1.1.0.a"] value 800
inst [1 or "cpu:1.1.0.c"] value 800
inst [2 or "cpu:1.1.1.a"] value 800
inst [3 or "cpu:1.1.1.c"] value 800
hinv.cpu.vendor
inst [0 or "cpu:1.1.0.a"] value "GenuineIntel"
inst [1 or "cpu:1.1.0.c"] value "GenuineIntel"
inst [2 or "cpu:1.1.1.a"] value "GenuineIntel"
inst [3 or "cpu:1.1.1.c"] value "GenuineIntel"
hinv.cpu.model
inst [0 or "cpu:1.1.0.a"] value "0"
inst [1 or "cpu:1.1.0.c"] value "0"
inst [2 or "cpu:1.1.1.a"] value "0"
inst [3 or "cpu:1.1.1.c"] value "0"
hinv.cpu.stepping
inst [0 or "cpu:1.1.0.a"] value "6"
inst [1 or "cpu:1.1.0.c"] value "6"
inst [2 or "cpu:1.1.1.a"] value "6"
inst [3 or "cpu:1.1.1.c"] value "6"
hinv.cpu.cache
inst [0 or "cpu:1.1.0.a"] value 0
inst [1 or "cpu:1.1.0.c"] value 0
inst [2 or "cpu:1.1.1.a"] value 0
inst [3 or "cpu:1.1.1.c"] value 0
hinv.cpu.bogomips
inst [0 or "cpu:1.1.0.a"] value 1195.37
inst [1 or "cpu:1.1.0.c"] value 1195.37
inst [2 or "cpu:1.1.1.a"] value 1195.37
inst [3 or "cpu:1.1.1.c"] value 1195.37
The -h option directs pminfo to retrieve information from the specified host. If the metric has an instance
domain, the value associated with each instance of the metric is printed:
The -m option prints the Performance Metric Identifiers (PMIDs) of the selected metrics. This is useful
for finding out which PMDA supplies the metric. For example, the output below identifies the PMDA
supporting domain 4 (the leftmost part of the PMID) as the one supplying information for the metric
environ.extrema.mintemp:
pminfo -m environ.extrema.mintemp
The -v option verifies that metric definitions in the PMNS correspond with supported metrics, and checks
that a value is available for the metric. Descriptions and values are fetched, but not printed. Only errors
are reported.
Complete information on the pminfo command is found in the pminfo(1) man page. There are further
examples of the use of pminfo in the PCP Tutorials and Case Studies.
Note
For obvious reasons, the ability to arbitrarily change the value of a performance metric is not
supported. Rather, PCP collectors selectively allow some metrics to be modified in a very
controlled fashion.
There are also command line flags to further specify the action. For example, the -i option restricts the
change to one or more instances of the performance metric.
The value may be in one of several forms, according to the following rules:
1. If the metric has an integer type, then value should consist of an optional leading hyphen, followed
either by decimal digits or “0x” and some hexadecimal digits; “0X” is also acceptable instead of “0x.”
2. If the metric has a floating point type, then value should be in the form of an integer (described above),
a fixed point number, or a number in scientific notation.
3. If the metric has a string type, then value is interpreted as a literal string of ASCII characters.
4. If the metric has an aggregate type, then an attempt is made to interpret value as an integer, a floating
point number, or a string. In the first two cases, the minimal word length encoding is used; for example,
“123” would be interpreted as a four-byte aggregate, and “0x100000000” would be interpreted as an
eight-byte aggregate.
The following example illustrates the use of pmstore to enable performance metrics collection in the
txmon PMDA (see ${PCP_PMDAS_DIR}/txmon for the source code of the txmon PMDA). When the
metric txmon.control.level has the value 0, no performance metrics are collected. Values greater
than 0 enable progressively more verbose instrumentation.
pminfo -f txmon.count
txmon.count
No value(s) available!
pmstore txmon.control.level 1
txmon.control.level old value=0 new value=1
pminfo -f txmon.count
txmon.count
For complete information on pmstore usage and syntax, see the pmstore(1) man page.
Chapter 5. Performance Metrics Inference Engine
Table of Contents
Introduction to pmie .......................................................................................................... 46
Basic pmie Usage ............................................................................................................. 47
pmie use of PCP services ........................................................................................... 48
Simple pmie Usage ................................................................................................... 49
Complex pmie Examples ........................................................................................... 49
Specification Language for pmie ......................................................................................... 51
Basic pmie Syntax .................................................................................................... 51
Setting Evaluation Frequency ...................................................................................... 53
pmie Metric Expressions ............................................................................................ 53
pmie Rate Conversion ............................................................................................... 55
pmie Arithmetic Expressions ...................................................................................... 56
pmie Logical Expressions ........................................................................................... 56
pmie Rule Expressions .............................................................................................. 59
pmie Intrinsic Operators ............................................................................................. 61
pmie Examples ................................................................................................................. 62
Developing and Debugging pmie Rules ................................................................................ 64
Caveats and Notes on pmie ................................................................................................ 64
Performance Metrics Wraparound ................................................................................ 64
pmie Sample Intervals ............................................................................................... 64
pmie Instance Names ................................................................................................ 64
pmie Error Detection ................................................................................................. 65
Creating pmie Rules with pmieconf ..................................................................................... 65
Management of pmie Processes ........................................................................................... 67
Add a pmie crontab Entry ......................................................................................... 69
Global Files and Directories ........................................................................................ 69
pmie Instances and Their Progress ............................................................................... 70
The Performance Metrics Inference Engine (pmie) is a tool that provides automated monitoring of, and
reasoning about, system performance within the Performance Co-Pilot (PCP) framework.
• the section called “Introduction to pmie”, provides an introduction to the concepts and design of pmie.
• the section called “Basic pmie Usage”, describes the basic syntax and usage of pmie.
• the section called “Specification Language for pmie”, discusses the complete pmie rule specification
language.
• the section called “pmie Examples”, provides an example, covering several common performance
scenarios.
• the section called “Developing and Debugging pmie Rules”, presents some tips and techniques for pmie
rule development.
• the section called “Caveats and Notes on pmie”, presents some important information on using pmie.
• the section called “Creating pmie Rules with pmieconf”, describes how to use the pmieconf command
to generate pmie rules.
• the section called “Management of pmie Processes”, provides support for running pmie as a daemon.
Introduction to pmie
Automated reasoning within Performance Co-Pilot (PCP) is provided by the Performance Metrics
Inference Engine (pmie), an applied artificial intelligence application.
The pmie tool accepts expressions describing adverse performance scenarios, and periodically evaluates
these against streams of performance metric values from one or more sources. When an expression is
found to be true, pmie is able to execute arbitrary actions to alert or notify the system administrator of
the occurrence of an adverse performance scenario. These facilities are very general, and are designed to
accommodate the automated execution of a mixture of generic and site-specific performance monitoring
and control functions.
The stream of performance metrics to be evaluated may be from one or more hosts, or from one or more
PCP archive logs. In the latter case, pmie may be used to retrospectively identify adverse performance
conditions.
Using pmie, you can filter, interpret, and reason about the large volume of performance data made available
from PCP collector systems or PCP archives.
• Automated real-time monitoring of a host, a set of hosts, or client-server pairs of hosts to raise
operational alarms when poor performance is detected in a production environment
• Nightly processing of archive logs to detect and report performance regressions, or quantify quality of
service for service level agreements or management reports, or produce advance warning of pending
performance problems
• Strategic performance management, for example, detection of slightly abnormal to chronic system
behavior, trend analysis, and capacity planning
The pmie expressions are described in a language with expressive power and operational flexibility. It
includes the following operators and functions:
• Generalized predicate-action pairs, where a predicate is a logical expression over the available
performance metrics, and the action is arbitrary. Predefined actions include the following:
• Launch a visible alarm with pmconfirm; see the pmconfirm(1) man page.
• Post an entry to the system log file; see the syslog(3) man page.
• Post an entry to the PCP noticeboard file ${PCP_LOG_DIR}/NOTICES; see the pmpost(1) man
page.
• Execute a shell command or script, for example, to send e-mail, initiate a pager call, warn the help
desk, and so on.
• Echo a message on standard output; useful for scripts that generate reports from retrospective
processing of PCP archive logs.
• Expression groups may have an independent evaluation frequency, to support both short-term and long-
term monitoring.
• Canonical scale and rate conversion of performance metric values to provide sensible expression
evaluation.
• Aggregation functions of sum, avg, min, and max, that may be applied to collections of performance
metrics values clustered over multiple hosts, or multiple instances, or multiple consecutive samples in
time.
• Universal and existential quantification, to handle expressions of the form “for every....” and “at least
one...”.
• Percentile aggregation to handle statistical outliers, such as “for at least 80% of the last 20 samples, ...”.
• Transparent operation against either live-feeds of performance metric values from PMCD on one or
more hosts, or against PCP archive logs of previously accumulated performance metric values.
The power of pmie may be harnessed to automate the most common of the deterministic system
management functions that are responses to changes in system performance. For example, disable a batch
stream if the DBMS transaction commit response time at the ninetieth percentile goes over two seconds,
or stop accepting uploads and send e-mail to the sysadmin alias if free space in a storage system falls
below five percent.
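The second example might be sketched as a pmie rule like the following. The metric names are real PCP metrics, but the rule itself, its threshold, and the action are invented for illustration:

```
// alert when free space in some filesystem falls below five percent
some_inst (
    filesys.free / filesys.capacity < 0.05
)   -> shell "mail -s 'low free space' sysadmin";
```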
Moreover, the power of pmie can be directed towards the exceptional and sporadic performance problems.
For example, if a network packet storm is expected, enable IP header tracing for ten seconds, and send
e-mail to advise that data has been collected and is awaiting analysis. Or, if production batch throughput
falls below 50 jobs per minute, activate a pager to the systems administrator on duty.
Obviously, pmie customization is required to produce meaningful filtering and actions in each production
environment. The pmieconf tool provides a convenient customization method, allowing the user to
generate parameterized pmie rules for some of the more common performance scenarios.
-W When-verbose mode: when a condition is true, the satisfying expression bindings are displayed.
pmie filename
In this form, the expressions to be evaluated are read from filename. In the absence of a given
filename, expressions are read from standard input, which may be your system keyboard.
PCP makes available thousands of performance metrics that you can use when formulating expressions
for pmie to evaluate. If you want to find out which metrics are currently available on your system, use
this command:
pminfo
Use the pminfo command line arguments to find out more about a particular metric. In Example 5.1, “pminfo
with the -f Option”, to fetch new metric values from host dove, you use the -f flag:
disk.dev.total
inst [0 or "xscsi/pci00.01.0/target81/lun0/disc"] value 131233
inst [4 or "xscsi/pci00.01.0/target82/lun0/disc"] value 4
inst [8 or "xscsi/pci00.01.0/target83/lun0/disc"] value 4
inst [12 or "xscsi/pci00.01.0/target84/lun0/disc"] value 4
inst [16 or "xscsi/pci00.01.0/target85/lun0/disc"] value 4
inst [18 or "xscsi/pci00.01.0/target86/lun0/disc"] value 4
This reveals that on the host dove, the metric disk.dev.total has six instances, one for each disk
on the system.
Use the following command to request help text (specified with the -T flag) to provide more information
about performance metrics:
pminfo -T network.interface.in.packets
The metadata associated with a performance metric is used by pmie to determine how the value should be
interpreted. You can examine the descriptor that encodes the metadata by using the -d flag for pminfo,
as shown in Example 5.2, “pminfo with the -d and -h Options”:
mem.util.cached
Data Type: 64-bit unsigned int InDom: PM_INDOM_NULL 0xffffffff
Semantics: instant Units: Kbyte
kernel.percpu.cpu.user
Data Type: 64-bit unsigned int InDom: 60.0 0xf000000
Note
A cumulative counter such as kernel.percpu.cpu.user is automatically converted by
pmie into a rate (measured in events per second, or count/second), while instantaneous values
such as mem.util.cached are not subjected to rate conversion. Metrics with an instance
domain (InDom in the pminfo output) of PM_INDOM_NULL are singular and always produce
one value per source. However, a metric like kernel.percpu.cpu.user has an instance
domain, and may produce multiple values per source (in this case, it is one value for each
configured CPU).
On this system, there are two disk spindles, hence two values of the expression iops per sample. Notice
that the values for the first sample are unknown (represented by the question marks [?] in the first line
of output), because rates can be computed only when at least two samples are available. The subsequent
samples are produced every ten seconds by default. The second sample reports that during the preceding
ten seconds there was an average of 14.4 transfers per second on one disk and no transfers on the other disk.
Rates are computed using time stamps delivered by PMCD. Due to unavoidable inaccuracy in the actual
sampling time (the sample interval is not exactly 10 seconds), you may see more decimal places in values
than you expect. Notice, however, that these errors do not accumulate but cancel each other out over
subsequent samples.
In Example 5.3, “pmie with the -v Option”, the expression to be evaluated was entered using the keyboard,
followed by the end-of-file character [Ctrl+D]. Usually, it is more convenient to enter expressions into a
file (for example, myrules) and ask pmie to read the file. Use this command syntax:
pmie -v myrules
Please refer to the pmie(1) man page for a complete description of pmie command line options.
The following arithmetic expression computes the percentage of write operations over the total number
of disk transfers.
The disk.all metrics are singular, so this expression produces exactly one value per sample,
independent of the number of disk devices.
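A plausible form of that expression, consistent with the pct_wrt label that appears in the sample output later in this chapter, is the following reconstruction (not the original listing):

```
pct_wrt = 100 * disk.all.write / disk.all.total;
```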
Note
If there is no disk activity, disk.all.total will be zero and pmie evaluates this expression
to be not a number. When -v is used, any such values are displayed as question marks.
The following logical expression has the value true or false for each disk:
The value is true if the number of writes exceeds the number of reads, and if there is significant disk activity
(more than 10 transfers per second). Example 5.4, “Printed pmie Output” demonstrates a simple action:
This prints a message to the standard output whenever the total number of transfers for some disk
(some_inst) exceeds 60 transfers per second. The %i (instance) in the message is replaced with the
name(s) of the disk(s) that caused the logical expression to be true.
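Sketched in pmie syntax, rules of the shape just described might read as follows. These are reconstructions consistent with the busy_wrt and busy labels in the sample output, not the original listings:

```
busy_wrt = disk.dev.write > disk.dev.read && disk.dev.total > 10 count/sec;
busy = some_inst ( disk.dev.total > 60 count/sec )
           -> print "[%i] high disk i/o";
```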
Using pmie to evaluate the above expressions every 3 seconds, you see output similar to Example 5.5,
“Labelled pmie Output”. Notice the introduction of labels for each pmie expression.
pct_wrt: 18.43
busy_wrt: false false
busy: false
pct_wrt: 19.85
busy_wrt: true false
busy: false
pct_wrt: ?
busy_wrt: false false
busy: false
Mon Aug 5 14:56:17 2012: [disk1] high disk i/o [disk2] high disk i/o
pct_wrt: 14.8
busy_wrt: false false
busy: true
The first sample contains unknowns, since all expressions depend on computing rates. Also notice that the
expression pct_wrt may have an undefined value whenever all disks are idle, as the denominator of the
expression is zero. If one or more disks is busy, the expression busy is true, and the message from the
print in the action part of the rule appears (before the -v values).
1. Performance metric values are obtained from PMCD for real-time sources, otherwise from PCP archive
logs.
2. Metrics values may be combined using arithmetic operators to produce arithmetic expressions.
3. Arithmetic expressions may be compared using relational operators to produce logical expressions.
4. Logical expressions may be combined using Boolean operators, including powerful quantifiers.
5. Aggregation operators may be used to compute summary expressions, for either arithmetic or logical
operands.
Lexical Elements
All pmie expressions are composed of the following lexical elements:
'vms$slow_response'
Within quotes of any sort, the backslash (\) may be used as an escape character as shown in the following
example:
Comments
Comments may be embedded anywhere in the source, in either of these forms:
Macros
When they are fully specified, expressions in pmie tend to be verbose and repetitive. The use of macros can
reduce repetition and improve readability and modularity. Any statement of the following form associates
the macro name identifier with the given string constant.
identifier = "string";
Any subsequent occurrence of the macro name identifier is replaced by the string most recently
associated with a macro definition for identifier.
$identifier
disk = "disk.all";
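With that definition in scope, a subsequent occurrence of $disk expands textually before parsing. A usage sketch, with an invented threshold:

```
// equivalent, after expansion, to: disk.all.write / disk.all.total > 0.5
$disk.write / $disk.total > 0.5;
```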
Note
Macro expansion is performed before syntactic parsing; so macros may only be assigned constant
string values.
Units
The inference engine converts all numeric values to canonical units (seconds for time, bytes for space, and
events for count). To avoid surprises, you are encouraged to specify the units for numeric constants. If
units are specified, they are checked for dimension compatibility against the metadata for the associated
performance metrics.
The syntax for a units specification is a sequence of one or more of the following keywords separated
by either a space or a slash (/), to denote per: byte, KByte, MByte, GByte, TByte, nsec,
nanosecond, usec, microsecond, msec, millisecond, sec, second, min, minute, hour,
count, Kcount, Mcount, Gcount, or Tcount. Plural forms are also accepted.
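For example, constants might carry explicit units like these (the comparisons and thresholds are invented; the dimensions are checked against each metric's metadata):

```
disk.dev.total > 10 count/sec;
mem.util.cached > 500 MByte;
```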
Note
If you do not specify the units for numeric constants, it is assumed that the constant is in the
canonical units of seconds for time, bytes for space, and events for count, and the dimensionality
of the constant is assumed to be correct. Thus, in the following expression, the 500 is interpreted
as 500 bytes.
If present, units must be one of the time units described in the preceding section. If absent, units
are assumed to be seconds. For example, the following expression has the effect that any subsequent
expressions (up to the next expression that assigns a value to delta) are scheduled for evaluation at a
fixed frequency, once every five minutes.
delta = 5 min;
The default value for delta may be specified using the -t command line option; otherwise delta is
initially set to be 10 seconds.
pmie Metric Expressions
A metric expression in pmie identifies values along three dimensions:
• A host
• An instance
• A sample time
The permissible values for hosts are the range of valid hostnames as provided by Internet naming
conventions.
The names for instances are provided by the Performance Metrics Domain Agents (PMDA) for the instance
domain associated with the chosen performance metric.
The sample time specification is defined as the set of natural numbers 0, 1, 2, and so on. A number refers
to one of a sequence of sampling events, from the current sample 0 to its predecessor 1, whose predecessor
was 2, and so on. This scheme is illustrated by the time line shown in Figure 5.1, “Sampling Time Line”.
[Figure 5.1, “Sampling Time Line”: sample points labeled ... 4 3 2 1 0, with 0 the most recent]
Each sample point is assumed to be separated from its predecessor by a constant amount of real time, the
delta. The most recent sample point is always zero. The value of delta may vary from one expression
to the next, but is fixed for each expression; for more information on the sampling interval, see the section
called “Setting Evaluation Frequency”.
For pmie, a metrics expression is the name of a metric, optionally qualified by a host, instance and sample
time specification. Special characters introduce the qualifiers: colon (:) for hosts, hash or pound sign (#)
for instances, and at (@) for sample times. The following expression refers to the previous value (@1) of
the counter for the disk read operations associated with the disk instance #disk1 on the host moomba.
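Using the qualifier syntax just described, such an expression looks like this (a sketch reconstructed from the description above):

```
disk.dev.read :moomba #disk1 @1
```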
In fact, this expression defines a point in the three-dimensional (3D) parameter space of {host} x
{instance} x {sample time} as shown in Figure 5.2, “Three-Dimensional Parameter Space”.
[Figure 5.2, “Three-Dimensional Parameter Space”: axes labeled host, instance, and sample time]
A metric expression may also identify sets of values corresponding to one-, two-, or three-dimensional
slices of this space, according to the following rules:
1. A metric expression consists of a PCP metric name, followed by optional host specifications, followed
by optional instance specifications, and finally, optional sample time specifications.
2. A host specification consists of one or more host names, each prefixed by a colon (:). For example:
:indy :far.away.domain.com :localhost
3. A missing host specification implies the default pmie source of metrics, as defined by a -h option on
the command line, or the first named archive in an -a option on the command line, or PMCD on the
local host.
4. An instance specification consists of one or more instance names, each prefixed by a hash or pound
(#) sign. For example: #eth0 #eth2
Recall that you can discover the instance names for a particular metric using the pminfo command.
See the section called “pmie use of PCP services”.
Within the pmie grammar, an instance name is an identifier. If the instance name contains characters
other than alphanumeric characters, enclose the instance name in single quotes; for example:
#'/boot' #'/usr'
5. A missing instance specification implies all instances for the associated performance metric from each
associated pmie source of metrics.
6. A sample time specification consists of either a single time or a range of times. A single time is
represented as an at (@) followed by a natural number. A range of times is an at (@), followed by a
natural number, followed by two periods (..) followed by a second natural number. The ordering of
the end points in a range is immaterial. For example, @0..9 specifies the last 10 sample times.
7. A missing sample time specification implies the most recent sample time.
The following metric expression refers to a three-dimensional set of values, with two hosts in one
dimension, five sample times in another, and the number of instances in the third dimension being
determined by the number of configured disk spindles on the two hosts.
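Such an expression might look like the following sketch (the host names are hypothetical; the five sample times come from the range qualifier @0..4):

```
disk.dev.total :gonzo :moomba @0..4
```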
pmie Rate Conversion
disk.all.total
A single value for this metric tells you only that a certain number of disk I/O operations have occurred
since boot time, and that information may be invalid if the counter has exceeded its 32-bit range and
wrapped. You need at least two values, sampled at known times, to compute the recent rate at which the
I/O operations are being executed. The required syntax would be this:
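A sketch of that syntax, using a sample-time range to request the two most recent values of the counter:

```
disk.all.total @0..1
```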
The accuracy of delta as a measure of actual inter-sample delay is an issue: pmie requests samples at
intervals of approximately delta, while the results exported from PMCD are time-stamped with the high-
resolution system clock time when the samples were extracted. For these reasons, a built-in and implicit
rate conversion using accurate time stamps is provided by pmie for performance metrics that have counter
semantics. For example, the following expression is unconditionally converted to a rate by pmie.
disk.all.total
The section called “pmie Intrinsic Operators” describes additional operators that may be used for aggregate
operations to reduce the dimensionality of an arithmetic expression.
pmie Logical Expressions
The logical expressions of pmie may be formed from:
• Logical constants
• Relational expressions
• Boolean expressions
• Quantification operators
Logical Constants
As in the C programming language, pmie interprets an arithmetic value of zero as false, and all other
arithmetic values as true.
Relational Expressions
Relational expressions are the simplest form of logical expression, in which values may be derived
from arithmetic expressions using pmie relational operators. For example, the following is a relational
expression that is true or false, depending on the aggregate total of disk read operations per second being
greater than 50.
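Such an expression might be written as the following sketch (the metric matches the description above; because disk.all.read has counter semantics, pmie implicitly converts it to a rate before the comparison):

```
disk.all.read > 50 count/sec
```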
All of the relational logical operators and precedence rules of the C programming language are supported
in pmie.
As described in the section called “pmie Metric Expressions”, arithmetic expressions in pmie may assume
set values. The relational operators are also able to take constant, singleton, and set-valued expressions
as arguments. The result has the same dimensionality as the operands. Suppose the rule in Example 5.6,
“Relational Expressions” is given:
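The rule itself might look like the following sketch (the metric name and threshold are hypothetical; what matters is that two instances and a three-sample time range yield six truth values):

```
all_intf =
    network.interface.in.packets
        #eth0 #eth2 @0..2 > 35 count/sec;
```

When evaluated with the -V option, pmie prints annotated values for the rule: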
pmie -V uag.11
all_intf:
gonzo: [eth0] ? ? ?
gonzo: [eth2] ? ? ?
all_intf:
gonzo: [eth0] false ? ?
gonzo: [eth2] false ? ?
all_intf:
gonzo: [eth0] true false ?
gonzo: [eth2] false false ?
all_intf:
gonzo: [eth0] true true false
gonzo: [eth2] false false false
At each sample, the relational operator greater than (>) produces six truth values for the cross-product of
the instance and sample time dimensions.
The section called “Quantification Operators” describes additional logical operators that may be used to
reduce the dimensionality of a relational expression.
Boolean Expressions
The regular Boolean operators from the C programming language are supported: conjunction (&&),
disjunction (||) and negation (!).
As with the relational operators, the Boolean operators accommodate set-valued operands, and set-valued
results.
Quantification Operators
Boolean and relational operators may accept set-valued operands and produce set-valued results. In many
cases, rules that are appropriate for performance management require a set of truth values to be reduced
along one or more of the dimensions of hosts, instances, and sample times described in the section called
“pmie Metric Expressions”. The pmie quantification operators perform this function.
Each quantification operator takes a one-, two-, or three-dimensional set of truth values as an operand, and
reduces it to a set of smaller dimension by quantification along a single dimension. For example, suppose
the expression in the previous example is simplified and prefixed by some_sample, to produce the
following expression:
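A sketch of such a quantified expression (the metric name and threshold are hypothetical):

```
some_sample (
    network.interface.in.packets
        #eth0 #eth2 @0..2 > 35 count/sec
);
```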
Then the expression result is reduced from six values to two (one per interface instance), such that the
result for a particular instance will be false unless the relational expression for the same interface instance
is true for at least one of the preceding three sample times.
There are existential, universal, and percentile quantification operators in each of the host, instance,
and sample time dimensions to produce the nine operators as follows:
some_host True if the expression is true for at least one host for the same
instance and sample time.
all_host True if the expression is true for every host for the same
instance and sample time.
N%_host True if the expression is true for at least N% of the hosts for the
same instance and sample time.
some_inst True if the expression is true for at least one instance for the
same host and sample time.
all_inst True if the expression is true for every instance for the same
host and sample time.
N%_inst True if the expression is true for at least N% of the instances
for the same host and sample time.
some_sample True if the expression is true for at least one sample time for
the same host and instance.
all_sample True if the expression is true for every sample time for the
same host and instance.
N%_sample True if the expression is true for at least N% of the sample
times for the same host and instance.
These operators may be nested. For example, the following expression answers the question: “Are all hosts
experiencing at least 20% of their disks busy either reading or writing?”
The following expression uses different syntax to encode the same semantics:
all_host (
20%_inst (
disk.dev.read $Servers > 40 ||
disk.dev.write $Servers > 40
)
);
Note
To avoid confusion over precedence and scope for the quantification operators, use explicit
parentheses.
Two additional quantification operators are available for the instance dimension only, namely
match_inst and nomatch_inst, which take a regular expression and a boolean expression. The result
is the boolean AND of the expression and the result of matching (or not matching) the associated instance
name against the regular expression.
For example, this rule evaluates error rates on various 10BaseT Ethernet network interfaces (such as ecN,
ethN, or efN):
some_inst
match_inst "^(ec|eth|ef)"
network.interface.total.errors > 10 count/sec
-> syslog "Ethernet errors:" " %i"
• If the logical expression lexpr evaluates true, then perform the actions that follow. Otherwise,
do not perform the actions.
• It is required that lexpr has a singular truth value. Aggregation and quantification operators must have
been applied to reduce multiple truth values to a single value.
• One or more actions may appear; consecutive actions are separated by operators that control the
execution of subsequent actions, as follows:
An action is composed of a keyword to identify the action method, an optional time specification,
and one or more arguments.
A time specification uses the same syntax as a valid time interval that may be assigned to delta, as
described in the section called “Setting Evaluation Frequency”. If the action is executed and the time
specification is present, pmie will suppress any subsequent execution of this action until the wall clock
time has advanced by time.
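As an illustrative sketch (the metric, threshold, and shell command are hypothetical), the following rule uses a 30-minute time specification so that the shell action runs at most once every 30 minutes, even if the condition remains true at every evaluation:

```
filesys.free #'/usr' < 100 Mbyte
    -> shell 30 min "echo 'filesystem /usr is filling up' | mail sysadmin";
```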
shell The single argument is passed to the shell for execution. This action is implemented
using system in the background. The action does not wait for the system call to return,
and succeeds unless the fork fails.
alarm A notifier containing a time stamp, a single argument as a message, and a Cancel button is
posted on the current display screen (as identified by the DISPLAY environment variable).
Each alarm action first checks if its notifier is already active. If there is an identical active
notifier, a duplicate notifier is not posted. The action succeeds unless the fork fails.
syslog A message is written into the system log. If the first word of the first argument is -p, the
second word is interpreted as the priority (see the syslog(3) man page); the message tag is
pcp-pmie. The remaining argument is the message to be written to the system log. This
action always succeeds.
print A message containing a time stamp in ctime(3) format and the argument is written
to standard output (stdout). This action always succeeds.
Within the argument passed to an action method, the following expansions are supported, allowing some
of the context from the logical expression on the left to be embedded in the argument: %h is replaced by
the host name, %i by the instance name, and %v by the value bound in the expression.
Some ambiguity may occur with respect to which host, instance, or performance metric is bound to a %-
token. In most cases, the leftmost binding in the top-level subexpression is used. You may need to use
pmie in the interactive debugging mode (specify the -d command line option) in conjunction with the
-W command line option to discover which subexpressions contribute to the %-token bindings.
Example 5.7, “Rule Expression Options ” illustrates some of the options when constructing rule
expressions:
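A rule of this general shape might look like the following sketch (the threshold and holdoff time are hypothetical):

```
some_inst ( disk.dev.total > 60 count/sec )
    -> syslog 10 min "busy disks:" " %i (%v IOPS)";
```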
In this case, %v and %i are both associated with the instances for the metric disk.dev.total that
make the expression true. If more than one instance makes the expression true (more than one disk is busy),
then the argument is formed by concatenating the result from each %-token binding. The text added to the
system log file might be as shown in Example 5.8, “System Log Text”:
Note
When pmie is processing performance metrics from a set of PCP archive logs, the actions
will be processed in the expected manner; however, the action methods are modified to report a
textual facsimile of the action on the standard output.
When evaluated against an archive, the following output is generated (the alarm action produces a message
on standard output):
pmie Intrinsic Operators
Additional intrinsic operators are available:
• Arithmetic aggregation
• The rate operator
• Transitional operators
Arithmetic Aggregation
For set-valued arithmetic expressions, the following operators reduce the dimensionality of the result by
arithmetic aggregation along one of the host, instance, or sample time dimensions. For example,
to aggregate in the host dimension, the following operators are provided:
avg_host Computes the average value across all instances for the same host and
sample time
sum_host Computes the total value across all instances for the same host and
sample time
count_host Computes the number of values across all instances for the same host and
sample time
min_host Computes the minimum value across all instances for the same host and
sample time
max_host Computes the maximum value across all instances for the same host and
sample time
The following example illustrates the use of an aggregate operator in combination with an existential
operator to answer the question “Does some host currently have two or more busy processors?”
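One way to express this is the following sketch (kernel.percpu.cpu.user and kernel.percpu.cpu.sys are counters of per-CPU time, so pmie rate-converts them to a utilization; the 0.7 busy threshold is illustrative):

```
some_host (
    count_inst (
        kernel.percpu.cpu.user + kernel.percpu.cpu.sys > 0.7
    ) >= 2
);
```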
Consider the following expression:
rate mem.util.cached
It returns the rate of change of the mem.util.cached performance metric; that is, the rate at which
page cache memory is being allocated and released.
The rate intrinsic operator is most useful for metrics with instantaneous value semantics. For metrics
with counter semantics, pmie already performs an implicit rate calculation (see the section called
“pmie Rate Conversion”), so the rate operator would produce the second derivative with respect to
time, which is less likely to be useful.
Transitional Operators
In some cases, an action needs to be triggered when an expression changes from true to false or vice versa.
The following operators take a logical expression as an operand, and return a logical expression:
rising Has the value true when the operand transitions from false to true in consecutive
samples.
falling Has the value true when the operand transitions from true to false in consecutive
samples.
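For example, the following sketch (hypothetical threshold) triggers its action only on the sample at which free memory first drops below the threshold, rather than on every subsequent sample while it stays low:

```
rising ( mem.util.free < 100 Mbyte )
    -> print "free memory dropped below 100 Mbytes";
```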
pmie Examples
The examples presented in this section are task-oriented and use the full power of the pmie specification
language as described in the section called “Specification Language for pmie”.
Source code for the pmie examples in this chapter, and many more examples, is provided within the PCP
Tutorials and Case Studies. Example 5.10, “Monitoring CPU Utilization” and Example 5.11, “Monitoring
Disk Activity ” illustrate monitoring CPU utilization and disk activity.
If both the -d option and a filename are present, the expressions in the given file are loaded before entering
interactive mode. Interactive mode is useful for debugging new rules.
• If the given instance name and the looked up name are the same, they are considered to match.
• Otherwise, the first two space separated tokens are extracted from the looked up name. If the given
instance name is the same as either of these tokens, they are considered a match.
For some metrics, notably the per process (proc.xxx.xxx) metrics, the first token in the looked up
instance name is impossible to determine at the time you are writing pmie expressions. The above policy
circumvents this problem.
pmie -v -d
pmie> f
(enter the expression, then press Ctrl+D)
pmie> l
pmie> r 10sec
Then quit:
pmie> q
It is not always possible to detect semantic errors at parse time; for example, a performance metric
descriptor may not be available from the named host at that time. In this case, a warning is issued, and the expression is put
on a wait list. The wait list is checked periodically (about every five minutes) to see if the metric descriptor
has become available. If an error is detected at this time, a message is printed to the standard error stream
(stderr) and the offending expression is set aside.
pmieconf reads two different forms of supplied input files and produces a localized pmie configuration
file as its output.
The first input form is a generalized pmie rule file such as those found below ${PCP_VAR_DIR}/
config/pmieconf. These files contain the generalized rules which pmieconf is able to manipulate.
Each of the rules can be enabled or disabled, or the individual variables associated with each rule can be
edited.
The second form is an actual pmie configuration file (that is, a file which can be interpreted by pmie,
conforming to the pmie syntax described in the section called “Specification Language for pmie”). This
file is both input to and output from pmieconf.
The input version of the file contains any changed variables or rule states from previous invocations
of pmieconf, and the output version contains both the changes in state (for any subsequent pmieconf
sessions) and the generated pmie syntax. The pmieconf state is embedded within a pmie comment block
at the head of the output file and is not interpreted by pmie itself.
pmieconf is an integral part of the pmie daemon management process described in the section called
“Management of pmie Processes”. Procedure 5.1, “Display pmieconf Rules” and Procedure 5.2, “Modify
pmieconf Rules and Generate a pmie File” introduce the pmieconf tool through a series of typical
operations.
pmieconf -f ${PCP_SYSCONF_DIR}/pmie/config.demo
Updates will be made to ${PCP_SYSCONF_DIR}/pmie/config.demo
pmieconf>
2. List the set of available pmieconf rules by using the rules command.
4. List only the enabled rules, using the rules enabled command.
2. Disable all of the rules except for the memory.swap_low rule so that you can see the effects of
your change in isolation.
pmieconf> status
verbose: off
enabled rules: 1 of 35
pmie configuration file: ${PCP_SYSCONF_DIR}/pmie/config.demo
pmie processes (PIDs) using this file: (none found)
pmieconf> quit
You can also use the status command to verify that only one rule is enabled at the end of this step.
3. Run pmie with the new configuration file. Use a text editor to view the newly generated pmie
configuration file (${PCP_SYSCONF_DIR}/pmie/config.demo), and then run the command:
memory.swap_low: false
cat ${HOME}/demo.log
Log for pmie on venus started Mon Jun 21 16:26:06 2012
4. Notice that both of the pmieconf files used in the previous step are simple text files, as described in
the pmieconf(5) man page:
file ${PCP_SYSCONF_DIR}/pmie/config.demo
${PCP_SYSCONF_DIR}/pmie/config.demo: PCP pmie config (V.1)
file ${PCP_VAR_DIR}/config/pmieconf/memory/swap_low
${PCP_VAR_DIR}/config/pmieconf/memory/swap_low: PCP pmieconf rules (V.1)
chkconfig pmie on
${PCP_RC_DIR}/pmie start
By default, these enable a single pmie process monitoring the local host, with the default set of pmieconf
rules enabled (for more information about pmieconf, see the section called “Creating pmie Rules with
pmieconf”). Procedure 5.3, “Add a New pmie Instance to the pmie Daemon Management Framework”
illustrates how you can use these commands to start any number of pmie processes to monitor local or
remote machines.
Procedure 5.3. Add a New pmie Instance to the pmie Daemon Management
Framework
1. Use a text editor (as superuser) to edit the pmie control files, ${PCP_PMIECONTROL_PATH} and
any files below ${PCP_PMIECONTROL_PATH}.d. Notice the default entry, which looks like this:
This entry is used to enable a local pmie process. Add a new entry for a remote host on your
local network (for example, venus), by using your pmie configuration file (see the section called
“Creating pmie Rules with pmieconf”):
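A sketch of what these control-file entries might look like (the exact field layout is platform- and version-dependent; the comments at the top of your control file are authoritative. Here the assumed fields are host, SOCKS flag, log file, and pmie arguments):

```
#Host          S?  Log File                                 Arguments
LOCALHOSTNAME  n   PCP_LOG_DIR/pmie/LOCALHOSTNAME/pmie.log  -c config.default
venus          n   PCP_LOG_DIR/pmie/venus/pmie.log          -c config.demo
```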
Note
Without an absolute path, the configuration file (-c above) will be resolved using
${PCP_SYSCONF_DIR}/pmie - if config.demo was created in Procedure 5.2,
“Modify pmieconf Rules and Generate a pmie File” it would be used here for host
venus, otherwise a new configuration file will be generated using the default rules (at
${PCP_SYSCONF_DIR}/pmie/config.demo).
chkconfig pmie on
This simple step allows pmie to be started as part of your machine's boot process.
3. Start the two pmie daemons. At the end of this step, you should see two new pmie processes
monitoring the local and remote hosts:
${PCP_RC_DIR}/pmie start
Performance Co-Pilot starting inference engine(s) ...
Wait a few moments while the startup scripts run. The pmie start script uses the pmie_check script
to do most of its work.
pcp
Performance Co-Pilot configuration on pluto:
If a remote host is not up at the time when pmie is started, the pmie process may exit. pmie processes
may also exit if the local machine is starved of memory resources. To counter these adverse cases, it can
be useful to have a crontab entry running. Adding an entry as shown in the section called “Add a pmie
crontab Entry” ensures that if one of the configured pmie processes exits, it is automatically restarted.
Note
Depending on your platform, the crontab entry discussed here may already have been installed
for you, as part of the package installation process. In this case, the file /etc/cron.d/pcp-
pmie will exist, and the rest of this section can be skipped.
1. Augment the crontab file for the pcp user. For example:
2. Edit ${HOME}/crontab.txt, adding lines similar to those from the sample ${PCP_VAR_DIR}/
config/pmie/crontab file for pmie_daily and pmie_check; for example:
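A sketch of such crontab lines (the scheduling times are illustrative; the sample file on your platform is authoritative):

```
# daily processing of archived pmie logs
10 0 * * * ${PCP_BINADM_DIR}/pmie_daily
# every half hour, check that the configured pmie instances are running
25,55 * * * * ${PCP_BINADM_DIR}/pmie_check -C
```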
${PCP_LOG_DIR}/pmie/* Contains the pmie log files for the host. These
files are created by the default behavior of the
${PCP_RC_DIR}/pmie startup scripts.
Chapter 6. Archive Logging
Table of Contents
Introduction to Archive Logging .......................................................................................... 72
Archive Logs and the PMAPI ..................................................................................... 72
Retrospective Analysis Using Archive Logs ................................................................... 72
Using Archive Logs for Capacity Planning .................................................................... 73
Using Archive Logs with Performance Tools ......................................................................... 73
Coordination between pmlogger and PCP tools .............................................................. 73
Administering PCP Archive Logs Using cron Scripts ...................................................... 73
Archive Log File Management .................................................................................... 74
Cookbook for Archive Logging ........................................................................................... 77
Primary Logger ......................................................................................................... 77
Other Logger Configurations ....................................................................................... 78
Archive Log Administration ........................................................................................ 80
Other Archive Logging Features and Services ........................................................................ 80
PCP Archive Folios ................................................................................................... 80
Manipulating Archive Logs with pmlogextract .............................................................. 81
Summarizing Archive Logs with pmlogsummary .......................................................... 81
Primary Logger ......................................................................................................... 81
Using pmlc .............................................................................................................. 82
Archive Logging Troubleshooting ........................................................................................ 83
pmlogger Cannot Write Log ....................................................................................... 83
Cannot Find Log ....................................................................................................... 83
Primary pmlogger Cannot Start ................................................................................... 84
Identifying an Active pmlogger Process ....................................................................... 85
Illegal Label Record .................................................................................................. 85
Empty Archive Log Files or pmlogger Exits Immediately ................................................ 85
Performance monitoring and management in complex systems demands the ability to accurately capture
performance characteristics for subsequent review, analysis, and comparison. Performance Co-Pilot (PCP)
provides extensive support for the creation and management of archive logs that capture a user-specified
profile of performance information to support retrospective performance analysis.
• The section called “Introduction to Archive Logging” presents the concepts and issues involved with
creating and using archive logs.
• The section called “Using Archive Logs with Performance Tools” describes the interaction of the PCP
tools with archive logs.
• The section called “Cookbook for Archive Logging” shows some shortcuts for setting up useful PCP
archive logs.
• The section called “Other Archive Logging Features and Services” provides information about other
archive logging features and services.
• The section called “Archive Logging Troubleshooting” presents helpful directions if your archive
logging implementation is not functioning correctly.
• Archive log creation at either a PCP collector (typically a server) or a PCP monitor system (typically a
workstation), or at some designated PCP archive logger host.
• Concurrent independent logging, both local and remote. The performance analyst can activate a private
pmlogger instance to collect only the metrics of interest for the problem at hand, independent of other
logging on the workstation or remote host.
• Independent determination of logging frequency for individual metrics or metric instances. For example,
you could log the “5 minute” load average every half hour, the write I/O rate on the DBMS log spindle
every 10 seconds, and aggregate I/O rates on the other disks every minute.
• Dynamic adjustment of what is to be logged, and how frequently, via pmlc. This feature may be used to
disable logging or to increase the sample interval during periods of low activity or chronic high activity.
A local pmlc may interrogate and control a remote pmlogger, subject to the access control restrictions
implemented by pmlogger.
• Self-contained logs that include all system configuration and metadata required to interpret the values
in the log. These logs can be kept for analysis at a much later time, potentially after the hardware or
software has been reconfigured and the logs have been stored as discrete, autonomous files for remote
analysis. The logs are endian-neutral and platform independent - there is no requirement that the monitor
host machine used for analysis be similar to the collector machine in any way, nor do they have to have
the same versions of PCP. PCP archives created over 15 years ago can still be replayed with the current
versions of PCP!
• cron-based scripts to expedite the operational management, for example, log rotation, consolidation,
and culling. Another helper tool, pmlogconf can be used to generate suitable logging configurations
for a variety of situations.
• Archive folios as a convenient aggregation of multiple archive logs. Archive folios may be created with
the mkaf utility and processed with the pmafm tool.
Live (real-time) sources of performance metrics and archives are literally interchangeable, with
a single Performance Metrics Application Programming Interface (PMAPI) that preserves the same
semantics for both styles of metric source. In this way, applications and tools developed against the PMAPI
can automatically process either live or historical performance data.
Archive logs are also an invaluable source of intelligence when trying to diagnose what went wrong, as
in a performance post-mortem. Because the PCP archive logs are entirely self-contained, this analysis can
be performed off-site if necessary.
Each archive log contains metric values from only one host. However, many PCP tools can simultaneously
visualize values from multiple archives collected from different hosts.
The archives can be replayed using the inference engine (pmie is an application that uses the PMAPI).
This allows you to automate the regular, first-level analysis of system performance.
Such analysis can be performed by constructing suitable expressions to capture the essence of common
resource saturation problems, then periodically creating an archive and playing it against the expressions.
For example, you may wish to create a daily performance audit (perhaps run by the cron command) to
detect performance regressions.
For more about pmie, see Chapter 5, Performance Metrics Inference Engine.
Note
pmlogger uses buffered I/O to write the archive log so that the end of the archive may be aligned
with an I/O buffer boundary, rather than with a logical archive log record. If such an archive was
read by a PCP tool, it would appear truncated and might confuse the tool. These problems may
be avoided by sending pmlogger a SIGUSR1 signal, or by using the flush command of pmlc to
force pmlogger to flush its output buffers.
PCP supplies shell scripts to use the cron functionality to help manage your archive logs. The following
scripts are supplied:
Script Description
pmlogger_check(1) Checks to see that all desired pmlogger processes are running
on your system, and invokes any that are missing for any reason.
The configuration files used by these scripts can be edited to suit your particular
needs, and are generally controlled by the ${PCP_PMLOGGERCONTROL_PATH} and
${PCP_PMLOGGERCONTROL_PATH}.d files (pmsnap has an additional control file,
${PCP_PMSNAPCONTROL_PATH}). Complete information on these scripts is available in the
pmlogger_daily(1) and pmsnap(1) man pages.
Basename Conventions
When a PCP archive is created by pmlogger, an archive basename must be specified and several physical
files are created, as shown in Table 6.1, “Filenames for PCP Archive Log Components (archive.*)”.
Filename        Contents
archive.index   Temporal index for rapid access to archive contents.
archive.meta    Metadata descriptions for performance metrics and instance
                domains appearing in the archive.
archive.N       Volumes of performance metric values, for N = 0, 1, 2, ...
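The component files of an archive can be checked with ordinary shell tools. The following sketch fabricates a minimal file set for a hypothetical basename (mylog) purely for illustration — a real archive is created only by pmlogger — and confirms the parts a PCP tool would require:

```shell
# Sketch only: fabricate the minimal component set for a hypothetical
# archive basename "mylog", then confirm each required part is present.
base=$(mktemp -d)/mylog
touch "$base.index" "$base.meta" "$base.0"

for part in index meta 0; do
    if [ ! -f "$base.$part" ]; then
        echo "missing component: $base.$part" >&2
        exit 1
    fi
done
echo "all archive components present"
```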
Log Volumes
A single PCP archive may be partitioned into a number of volumes. These volumes may expedite
management of the archive; however, the metadata file and at least one volume must be present before a
PCP tool can process the archive.
You can control the size of an archive log volume by using the -v command line option to pmlogger.
This option specifies how large a volume should become before pmlogger starts a new volume. Archive
log volumes retain the same base filename as other files in the archive log, and are differentiated by a
numeric suffix that is incremented with each volume change. For example, you might have a log volume
sequence that looks like this:
netserver-log.0
netserver-log.1
netserver-log.2
You can also cause an existing log to be closed and a new one to be opened by sending a SIGHUP signal
to pmlogger, or by using the pmlc command to change the pmlogger instructions dynamically, without
interrupting pmlogger operation. Complete information on log volumes is found in the pmlogger(1) man
page.
Once configured, the PCP tools that manage archive logs employ a consistent scheme for selecting
the basename for an archive each time pmlogger is launched, namely the current date and time in the
format YYYYMMDD.HH.MM. Typically, at the end of each day, all archives for a particular host
on that day would be merged to produce a single archive with a basename constructed from the date,
namely YYYYMMDD. The pmlogger_daily script performs this action and a number of other routine
housekeeping chores.
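Both basename forms correspond directly to date(1) format strings. The sketch below merely reproduces the naming convention; it is not how pmlogger itself constructs the names:

```shell
# The per-run archive basename is the launch date and time: YYYYMMDD.HH.MM
run_base=$(date +%Y%m%d.%H.%M)
# The merged daily archive keeps just the date: YYYYMMDD
daily_base=$(date +%Y%m%d)
echo "per-run basename: $run_base"
echo "daily basename:   $daily_base"
```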
Note
There are many possible configurations of pmlogger, as described in the section called
“PCP Archive Logger Deployment”. The directory organization described in this section is
recommended for any system on which pmlogger is configured for permanent execution (as
opposed to short-term executions, for example, as launched from pmchart to record some
performance data of current interest).
Typically, the filesystem structure can be used to reflect the number of hosts for which a pmlogger instance
is expected to be running locally, obviating the need for lengthy and cumbersome filenames. It makes
considerable sense to place all logs for a particular host in a separate directory named after that host.
Because each instance of pmlogger can only log metrics fetched from a single host, this also simplifies
some of the archive log management and administration tasks.
For example, consider the filesystem and naming structure shown in Figure 6.1, “Archive Log Directory
Structure”.
Figure 6.1. Archive Log Directory Structure: PCP archives from host one under ${PCP_LOG_DIR}/pmlogger/one (basenames 20130803, 20130804, and 20130805), and PCP archives from host two under ${PCP_LOG_DIR}/pmlogger/two (basenames 20130804 and 20130805).
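This per-host layout can be reproduced with nothing more than mkdir. In the sketch below, a temporary directory stands in for the real ${PCP_LOG_DIR}, and the hostnames "one" and "two" are illustrative:

```shell
# Sketch: one sub-directory per monitored host under ${PCP_LOG_DIR}/pmlogger
log_dir=$(mktemp -d)        # stand-in for the real ${PCP_LOG_DIR}
for host in one two; do
    mkdir -p "$log_dir/pmlogger/$host"
done
ls "$log_dir/pmlogger"      # lists the per-host archive directories
```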
The specification of where to place the archive log files for particular pmlogger instances is encoded in the
${PCP_PMLOGGERCONTROL_PATH} and ${PCP_PMLOGGERCONTROL_PATH}.d configuration
files, and these files should be customized on each host running an instance of pmlogger.
If many archives are being created, and the associated PCP collector systems form peer classes based upon
service type (Web servers, DBMS servers, NFS servers, and so on), then it may be appropriate to introduce
another layer into the directory structure, or use symbolic links to group together hosts providing similar
service types.
Configuration of pmlogger
The configuration files used by pmlogger describe which metrics are to be logged. Different groups of metrics may be logged at different intervals. Each group of metrics also has one of two states, mandatory or advisory, which defines whether the metrics definitely should (or should not) be logged, or whether a later advisory definition may change that state.
The mandatory state takes precedence if it is on or off, causing any subsequent request for a change in
advisory state to have no effect. If the mandatory state is maybe, then the advisory state determines if
logging is enabled or not.
The mandatory states are on, off, and maybe. The advisory states, which only affect metrics that are
mandatory maybe, are on and off. Therefore, a metric that is mandatory maybe in one definition and
advisory on in another definition would be logged at the advisory interval. Metrics that are not specified
in the pmlogger configuration file are mandatory maybe and advisory off by default and are not logged.
A complete description of the pmlogger configuration format can be found on the pmlogger(1) man page.
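As a sketch of the format, the following fragment logs one group of metrics under mandatory control every 30 seconds and another under advisory control every 5 minutes (the metric names and intervals here are illustrative only):

```
log mandatory on every 30 seconds {
    kernel.all.load
    disk.dev.read
}

log advisory on every 5 minutes {
    mem.util
}
```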
For example, to display the label record of an archive, use pmdumplog with the -l option:
pmdumplog -l ${PCP_LOG_DIR}/pmlogger/www.sgi.com/19960731
The simplest way to discover what performance metrics are contained within a set of archives is to use
pminfo as shown in Example 6.1, “Using pminfo to Obtain Archive Information”:
Primary Logger
Assume you wish to activate primary archive logging on the PCP collector host pluto. Execute the
following while logged into pluto as the superuser (root).
1. Start pmcd and pmlogger:
chkconfig pmcd on
chkconfig pmlogger on
${PCP_RC_DIR}/pmcd start
Starting pmcd ...
${PCP_RC_DIR}/pmlogger start
Starting pmlogger ...
2. Verify the resulting setup with the pcp command:
pcp
Performance Co-Pilot configuration on pluto:
3. Verify that the archive files are being created in the expected place:
ls ${PCP_LOG_DIR}/pmlogger/pluto
20170815.10.00.0
20170815.10.00.index
20170815.10.00.meta
Latest
pmlogger.log
4. Verify that no errors are being logged, and the rate of expected growth of the archives:
cat ${PCP_LOG_DIR}/pmlogger/pluto/pmlogger.log
Log for pmlogger on pluto started Thu Aug 15 10:00:11 2017
Config parsed
Starting primary logger for host "pluto"
Archive basename: 20170815.10.00
...
• Run the pmlogconf(1) utility to generate a configuration file, and (optionally) interactively
customize it further to suit local needs.
${PCP_BINADM_DIR}/pmlogconf ${PCP_SYSCONF_DIR}/pmlogger/config.venus
Creating config file "${PCP_SYSCONF_DIR}/pmlogger/config.venus" using default
${PCP_BINADM_DIR}/pmlogconf ${PCP_SYSCONF_DIR}/pmlogger/config.venus
...
• Do nothing - a default configuration will be created in the following step, using pmlogconf(1)
probing and automatic file generation based on the metrics available at the remote host. The
${PCP_RC_DIR}/pmlogger start script handles this.
• Manually - create a configuration file with a text editor, or arrange to have one put in place by configuration management tools like Puppet [https://puppetlabs.com/] or Chef [http://www.opscode.com/chef/].
3. Start pmlogger:
${PCP_BINADM_DIR}/pmlogger_check
Restarting pmlogger for host "venus" ..... done
pcp
Performance Co-Pilot configuration on pluto:
To create archive logs on the local host for performance metrics collected from multiple remote hosts,
repeat the steps in Procedure 6.1, “Creating Archive Logs” for each remote host (each with a new
control file entry).
Note
Depending on your platform, the crontab entry discussed here may already have been installed for you, as part of the package installation process. In this case, the file /etc/cron.d/pcp-pmlogger will exist, and the rest of this section can be skipped.
To activate the maintenance and housekeeping scripts for a collection of archive logs, execute the following
tasks while logged into the local host as the superuser (root):
1. Augment the crontab file for the pcp user. For example:
2. Edit ${HOME}/crontab.txt, adding lines similar to those from the sample ${PCP_VAR_DIR}/
config/pmlogger/crontab file for pmlogger_daily and pmlogger_check; for
example:
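The added lines typically resemble the sketch below. The schedule times here are illustrative, and ${PCP_BINADM_DIR} must be replaced with its expanded value, since cron does not perform PCP variable expansion; the distributed sample crontab file shows the recommended entries for your platform:

```
# daily archive merging, rotation and culling, shortly after midnight
10 0 * * * ${PCP_BINADM_DIR}/pmlogger_daily
# twice an hour, restart any pmlogger instances that have stopped
25,55 * * * * ${PCP_BINADM_DIR}/pmlogger_check
```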
The automated archive log management services also create an archive folio named Latest for
each managed pmlogger instance, to provide a symbolic name to the most recent archive log. With
reference to Figure 6.1, “Archive Log Directory Structure”, this would mean the creation of the
folios ${PCP_LOG_DIR}/pmlogger/one/Latest and ${PCP_LOG_DIR}/pmlogger/two/
Latest.
The pmafm utility is completely described in the pmafm(1) man page, and provides the interactive
commands (single commands may also be executed from the command line) for the following services:
• Executing PCP tools with their source of performance metrics assigned concurrently to all of the
component archives (where the tool supports this), or serially executing the PCP tool once per
component archive.
• If the folio was created by a single PCP monitoring tool, replaying all of the archives in the folio with
that monitoring tool.
• Restricting the processing to particular archives, or the archives associated with particular hosts.
The pmlogextract tool takes a number of PCP archive logs from a single host and performs the following tasks:
• Merges the archives into a single log, while maintaining the correct time stamps for all values.
• Extracts all metric values within a temporal window that could encompass several archive logs.
See the pmlogextract(1) man page for full information on this command.
As an example, for Linux host pluto, report on its use of anonymous huge pages - average use, maximum, time at which the maximum occurred, total number of samples in the set of archives, and the units used for the values - as shown in Example 6.2, “Using pmlogsummary to Summarize Archive Information”:
pminfo -t mem.util.anonhugepages
mem.util.anonhugepages [amount of memory in anonymous huge pages]
See the pmlogsummary(1) man page for detailed information about this command's many options.
Primary Logger
On each system for which PMCD is active (each PCP collector system), there is an option to have a
distinguished instance of the archive logger pmlogger (the “primary” logger) launched each time PMCD
is started. This may be used to ensure the creation of minimalist archive logs required for ongoing system
management and capacity planning in the event of failure of a system where a remote pmlogger may be
running, or because the preferred archive logger deployment is to activate pmlogger on each PCP collector
system.
Run the following command as superuser on each PCP collector system where you want to activate the
primary pmlogger:
chkconfig pmlogger on
The primary logger launches the next time the ${PCP_RC_DIR}/pmlogger start script runs. If you wish
this to happen immediately, follow up with this command:
${PCP_BINADM_DIR}/pmlogger_check -V
When it is started in this fashion, the ${PCP_PMLOGGERCONTROL_PATH} file (or one of the
${PCP_PMLOGGERCONTROL_PATH}.d files) must use the second field of one configuration
line to designate the primary logger, and usually will also use the pmlogger configuration file
${PCP_SYSCONF_DIR}/pmlogger/config.default (although the latter is not mandatory).
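Each line of these control files names a host, flags whether its pmlogger is the primary instance (the second field), and gives the archive directory and pmlogger arguments. The line below is a sketch only; the exact fields and values vary between platforms and PCP versions:

```
# Host          P?  S?  directory                      args
LOCALHOSTNAME   y   n   PCP_ARCHIVE_DIR/LOCALHOSTNAME  -r -T24h10m -c config.default
```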
Using pmlc
You may tailor pmlogger dynamically with the pmlc command (if it is configured to allow access to this functionality). Normally, the pmlogger configuration is read at startup. If you choose to modify the configuration file to change the parameters under which pmlogger operates, you must stop and restart the program for your changes to take effect. Alternatively, you may change parameters whenever required by using the pmlc interface.
pmlc
By default, pmlc acts on the primary instance of pmlogger on the current host. See the pmlc(1) man page
for a description of command line options. When it is invoked, pmlc presents you with a prompt:
pmlc>
You may obtain a listing of the available commands by entering a question mark (?) and pressing Enter.
You see output similar to that in Example 6.3, “Listing Available Commands ”:
Here is an example:
pmlc
pmlc> show loggers @babylon
The following pmloggers are running on babylon:
primary (1892)
pmlc> connect 1892 @babylon
pmlc> log advisory on 2 secs disk.dev.read
pmlc> query disk.dev
disk.dev.read
adv on nl 2 sec [131073 or "disk1"]
adv on nl 2 sec [131074 or "disk2"]
pmlc> quit
Note
Any changes to the set of logged metrics made via pmlc are not saved, and are lost the next time
pmlogger is started with the same configuration file. Permanent changes are made by modifying
the pmlogger configuration file(s).
Refer to the pmlc(1) and pmlogger(1) man pages for complete details.
Cause: Archive logs are considered sufficiently precious that pmlogger does not
empty or overwrite an existing set of archive log files. The log named foo
actually consists of the physical files foo.index, foo.meta, and at least
one file foo.N, where N is in the range 0, 1, 2, 3, and so on.
A message similar to the one above is produced when a new pmlogger instance
encounters one of these files already in existence.
Resolution: Move the existing archive aside, or if you are sure, remove all of the parts of
the archive log. For example, use the following command:
rm -f foo.*
Cause: An archive consists of at least three physical files. If the base name for the
archive is mylog, then the archive actually consists of the physical files
mylog.index, mylog.meta, and at least one file mylog.N, where N is
in the range 0, 1, 2, 3, and so on.
Resolution: Use this command to check which files the utility is trying to open:
ls mylog.*
Turn on the internal debug flag DBG_TRACE_LOG (-D 128) to see which files
are being inspected by the pmOpenLog routine.
Locate the missing files and move them all to the same directory, or remove
all of the files that are part of the archive, and recreate the archive log.
Cause: There is either a primary pmlogger already running, or the previous primary
pmlogger was terminated unexpectedly before it could perform its cleanup
operations.
Resolution: If there is already a primary pmlogger running and you wish to replace it with
a new pmlogger, use the show command in pmlc to determine the process ID
of the primary pmlogger. The process ID of the primary pmlogger appears in
parentheses after the word “primary.” Send a SIGINT signal to the process
to shut it down (use either the kill command if the platform supports it, or
the pmsignal command). If the process does not exist, proceed to the manual
cleanup described in the paragraph below. If the process did exist, it should
now be possible to start the new pmlogger.
If pmlc's show command displays a process ID for a process that does not
exist, a pmlogger process was terminated before it could clean up. If it was
the primary pmlogger, the corresponding control files must be removed before
one can start a new primary pmlogger. It is a good idea to clean up any spurious
control files even if they are not for the primary pmlogger.
For the primary pmlogger, remove both the symbolic link and the file
(corresponding to its process ID) to which the link points. For other pmloggers,
remove just the process ID file. Do not remove any other files in the directory.
If the control file for an active pmlogger is removed, pmlc is not able to contact
it.
Cause: The PID is not obvious from the log, or the archive name may not be obvious
from the output of the ps command.
pmdumplog -l foo
Log Label (Log Format Version 1)
Performance metrics from host gonzo
commencing Wed Aug 7 00:10:09.214 1996
ending Wed Aug 7 16:10:09.155 1996
All of the information describing the creator of the archive is revealed and, in
particular, the instance identifier for the PMCD metrics (10728 in the example
above) is the PID of the pmlogger instance, which may be used to control the
process via pmlc.
Cause: The label record at the start of each of the physical archive log files has become
either corrupted or one is out of sync with the others.
Resolution: If you believe the log may have been corrupted, this can be verified using
pmlogcheck. If corruption is limited to just the label record at the start, the
pmloglabel tool can be used to force the labels back into sync with each other,
using known-good values that you supply.
Cause: Either pmlogger encountered errors in the configuration file, has not flushed
its output buffers yet, or some (or all) metrics specified in the pmlogger
configuration file have had their state changed to advisory off or mandatory
off via pmlc. It is also possible that the logging interval specified in the
pmlogger configuration file for some or all of the metrics is longer than the
period of time you have been waiting since pmlogger started.
Otherwise, use the status command of pmlc to interrogate the internal
pmlogger state of specific metrics.
Chapter 7. Performance Co-Pilot
Deployment Strategies
Table of Contents
Basic Deployment ............................................................................................................. 88
PCP Collector Deployment ................................................................................................. 89
Principal Server Deployment ....................................................................................... 89
Quality of Service Measurement .................................................................................. 90
PCP Archive Logger Deployment ........................................................................................ 91
Deployment Options .................................................................................................. 91
Resource Demands for the Deployment Options ............................................................. 92
Operational Management ............................................................................................ 92
Exporting PCP Archive Logs ...................................................................................... 92
PCP Inference Engine Deployment ...................................................................................... 92
Deployment Options .................................................................................................. 93
Resource Demands for the Deployment Options ............................................................. 94
Operational Management ............................................................................................ 94
Performance Co-Pilot (PCP) is a coordinated suite of tools and utilities allowing you to monitor
performance and make automated judgments and initiate actions based on those judgments. PCP is
designed to be fully configurable for custom implementation and deployment to meet specific needs in a
variety of operational environments.
Because each enterprise and site is different and PCP represents a new way of managing performance
information, some discussion of deployment strategies is useful.
The most common use of performance monitoring utilities is a scenario where the PCP tools are executed
on a workstation (the PCP monitoring system), while the interesting performance data is collected on
remote systems (PCP collector systems) by a number of processes, specifically the Performance Metrics
Collection Daemon (PMCD) and the associated Performance Metrics Domain Agents (PMDAs). These
processes can execute on both the monitoring system and one or more collector systems, or only on
collector systems. However, collector systems are the real objects of performance investigations.
• the section called “Basic Deployment”, presents the spectrum of deployment architectures at the highest
level.
• the section called “PCP Collector Deployment”, describes alternative deployments for PMCD and the
PMDAs.
• the section called “PCP Archive Logger Deployment”, covers alternative deployments for the pmlogger
tool.
• the section called “PCP Inference Engine Deployment”, presents the options that are available for
deploying the pmie tool.
The options shown in this chapter are merely suggestions. They are not comprehensive, and are intended to
demonstrate some possible ways of deploying the PCP tools for specific network topologies and purposes.
You are encouraged to use them as the basis for planning your own deployment, consistent with your needs.
Basic Deployment
In the simplest PCP deployment, one system is configured as both a collector and a monitor, as shown
in Figure 7.1, “PCP Deployment for a Single System”. Because some of the PCP monitor tools make
extensive use of visualization, this suggests the monitor system should be configured with a graphical
display.
Figure 7.1. PCP Deployment for a Single System: monitor tools, PMCD, and the PMDAs all running on one host.
However, most PCP deployments involve at least two systems. For example, the setup shown in Figure 7.2,
“Basic PCP Deployment for Two Systems” would be representative of many common scenarios.
Figure 7.2. Basic PCP Deployment for Two Systems: monitor tools on a monitor system, with PMCD and the PMDAs on a separate collector system.
But the most common site configuration would include a mixture of systems configured as PCP collectors,
as PCP monitors, and as both PCP monitors and collectors, as shown in Figure 7.3, “General PCP
Deployment for Multiple Systems ”.
With one or more PCP collector systems and one or more PCP monitor systems, there are a number
of decisions that need to be made regarding the deployment of PCP services across multiple hosts. For
example, in Figure 7.3, “General PCP Deployment for Multiple Systems ” there are several ways in which
both the inference engine (pmie) and the PCP archive logger (pmlogger) could be deployed. These options
are discussed in the following sections of this chapter.
Figure 7.3. General PCP Deployment for Multiple Systems: monitor tools on monitor systems and on combined monitor/collector systems, with PMCD and the PMDAs running on each collector system.
• Database servers
• A video server
• A supercomputer
• An infrastructure service provider, for example, print, DNS, LDAP, gateway, firewall, router, or mail
services
Your objective may be to improve quality of service on a system functioning as a server for many clients.
You wish to identify and repair critical performance bottlenecks and deficiencies in order to maintain
maximum performance for clients of the server.
For some of these services, the PCP base product or the PCP add-on packages provide the necessary
collector components. Others would require customized PMDA development, as described in the
companion Performance Co-Pilot Programmer's Guide.
The arrangement in Figure 7.4, “PCP Deployment to Measure Client-Server Quality of Service” illustrates
one way of measuring quality of service for client-server applications.
Figure 7.4. PCP Deployment to Measure Client-Server Quality of Service: a monitor tool observing both the Application Client system and the Application Server system.
The configuration of the PCP collector components on the Application Server System is standard. The
new facility is the deployment of some PCP collector components on the Application Client System; this
uses a customized PMDA and a generalization of the ICMP “ping” tool as follows:
• The Client App is specially developed to periodically make typical requests of the App Server,
and to measure the response time for these requests (this is an application-specific “ping”).
• The PMDA on the Application Client System captures the response time measurements from the
Client App and exports these into the PCP framework.
At the PCP monitor system, the performance of the system running the App Server and the end-user
quality of service measurements from the system where the Client App is running can be monitored
concurrently.
PCP contains a number of examples of this architecture, including the shping PMDA for IP-based
services (including HTTP), and the dbping PMDA for database servers.
The source code for each of these PMDAs is readily available; users and administrators are encouraged to
adapt these agents to the needs of the local application environment.
• Creating new instances of the Client App and PMDA to measure service quality for your own
mission-critical services.
• Deploying the Client App and associated PCP collector components in a number of strategic hosts
allows the quality of service over the enterprise's network to be monitored. For example, service can
be monitored on the Application Server System, on the same LAN segment as the Application Server
System, on the other side of a firewall system, or out in the WAN.
Deployment Options
The issue is relatively simple and reduces to “On which host(s) should pmlogger be running?” The options
are these:
• Run pmlogger on each PCP collector system to capture local performance data.
• Run pmlogger on some of the PCP monitor systems to capture performance data from remote PCP
collector systems.
• As an extension of the previous option, designate one system to act as the PCP archive site to run all
pmlogger instances. This arrangement is shown in Figure 7.5, “Designated PCP Archive Site”.
Figure 7.5. Designated PCP Archive Site: pmlogger instances running on a monitor-and-collector system, writing PCP archive logs for both the local host and a remote collector system.
A local pmlogger consumes disk bandwidth and disk space on the PCP collector system. A remote
pmlogger consumes disk space on the site where it is running and network bandwidth between that host
and the PCP collector host.
The archive logs typically grow at a rate of anywhere from a few kilobytes (KB) to tens of megabytes
(MB) per day, depending on how many performance metrics are logged and the choice of sampling
frequencies. There are some advantages in minimizing the number of hosts over which the disk resources
for PCP archive logs must be allocated; however, the aggregate requirement is independent of where the
pmlogger processes are running.
Operational Management
There is an initial administrative cost associated with configuring each pmlogger instance, and an ongoing
administrative investment to monitor these configurations, perform regular housekeeping (such as rotation,
compression, and culling of PCP archive log files), and execute periodic tasks to process the archives (such
as nightly performance regression checking with pmie).
Many of these tasks are handled by the supplied pmlogger administrative tools and scripts, as described
in the section called “Archive Log File Management”. However, the necessity and importance of these
tasks favor a centralized pmlogger deployment, as shown in Figure 7.5, “Designated PCP Archive Site”.
NFS mounting is obviously an option, but the PCP tools support random access and both forward and
backward temporal motion within an archive log. If an archive is to be subjected to intensive and interactive
processing, it may be more efficient to copy the files of the archive log to the PCP monitor system first.
Note
Each PCP archive log consists of at least three separate files (see the section called “Archive Log
File Management” for details). You must have concurrent access to all of these files before a PCP
tool is able to process an archive log correctly.
The following sections discuss the options and trade-offs for pmie deployment.
Deployment Options
The issue is relatively simple and reduces to “On which host(s) should pmie be running?” You must
consider both real-time and retrospective uses, and the options are as follows:
• For real-time analysis, run pmie on each PCP collector system to monitor local system performance.
• For real-time analysis, run pmie on some of the PCP monitor systems to monitor the performance of
remote PCP collector systems.
• For retrospective analysis, run pmie on the systems where the PCP archive logs reside. The problem then
reduces to pmlogger deployment as discussed in the section called “PCP Archive Logger Deployment”.
• As an example of the “distributed management with centralized control” philosophy, designate some
system to act as the PCP Management Site to run all pmlogger and pmie instances. This arrangement
is shown in Figure 7.6, “PCP Management Site Deployment”.
One pmie instance is capable of monitoring multiple PCP collector systems; for example, to evaluate
some universal rules that apply to all hosts. At the same time a single PCP collector system may be
monitored by multiple pmie instances; for example, for site-specific and universal rule evaluation, or
to support both tactical performance management (operations) and strategic performance management
(capacity planning). Both situations are depicted in Figure 7.6, “PCP Management Site Deployment”.
Figure 7.6. PCP Management Site Deployment: pmlogger and pmie instances (with their pmie rules) running centrally against Collector System A and Collector System B.
Operational Management
An initial administrative cost is associated with configuring each pmie instance, particularly in the
development of the rule sets that accurately capture and classify “good” versus “bad” performance in
your environment. These rule sets almost always involve some site-specific knowledge, particularly
in respect to the “normal” levels of activity and resource consumption. The pmieconf tool (see the
section called “Creating pmie Rules with pmieconf”) may be used to help develop localized rules based
upon parameterized templates covering many common performance scenarios. In complex environments,
customizing these rules may occur over an extended period and require considerable performance analysis
insight.
One of the functions of pmie is the continual detection of adverse performance and the automatic
generation of alarms (visible, audible, e-mail, pager, and so on). Uncontrolled deployment of this
alarm-initiating capability throughout the enterprise may cause havoc.
These considerations favor a centralized pmie deployment at a small number of PCP monitor sites, or in
a PCP Management Site as shown in Figure 7.6, “PCP Management Site Deployment”.
However, it is most likely that knowledgeable users with specific needs may find a local deployment of
pmie most useful to track some particular class of service difficulty or resource utilization. In these cases,
the alarm propagation is unlikely to be required or is confined to the system on which pmie is running.
Configuration and management of a number of pmie instances is made much easier with the scripts and
control files described in the section called “Management of pmie Processes”.
Chapter 8. Customizing and Extending
PCP Services
Table of Contents
PMDA Customization ........................................................................................................ 95
Customizing the Summary PMDA ............................................................................... 95
PCP Tool Customization .................................................................................................... 98
Archive Logging Customization .................................................................................. 98
Inference Engine Customization .................................................................................. 99
PMNS Management ......................................................................................................... 100
PMNS Processing Framework ................................................................................... 101
PMNS Syntax ......................................................................................................... 101
PMDA Development ........................................................................................................ 103
PCP Tool Development .................................................................................................... 103
Performance Co-Pilot (PCP) has been developed to be fully extensible. The following sections summarize
the various facilities provided to allow you to extend and customize PCP for your site:
• the section called “PMDA Customization” describes the procedure for customizing the summary
PMDA to export derived metrics formed by aggregation of base PCP metrics from one or more collector
hosts.
• the section called “PCP Tool Customization” describes the various options available for customizing
and extending the basic PCP tools.
• the section called “PMNS Management” covers the concepts and tools provided for updating the PMNS
(Performance Metrics Name Space).
• the section called “PMDA Development” details where to find further information to assist in the
development of new PMDAs to extend the range of performance metrics available through the PCP
infrastructure.
• the section called “PCP Tool Development” outlines how new tools may be developed to process
performance data from the PCP infrastructure.
PMDA Customization
The generic procedures for installing and activating the optional PMDAs have been described in the section
called “Managing Optional PMDAs”. In some cases, these procedures prompt the user for information
based upon the local system or network configuration, application deployment, or processing profile to
customize the PMDA and hence the performance metrics it exports.
Customizing the Summary PMDA
The summary PMDA consists of two processes:

pmie process    Periodically samples the base metrics and computes values for the
                derived metrics. This dedicated instance of the PCP pmie inference
                engine is launched with special command line arguments by the main
                process. See the section called “Introduction to pmie”, for a
                complete discussion of the pmie feature set.

main process    Reads and buffers the values computed by the pmie process and makes
                them available to the Performance Metrics Collection Daemon (PMCD).
All of the metrics exported by the summary PMDA have a singular instance and the values are
instantaneous; the exported value is the correct value as of the last time the corresponding expression was
evaluated by the pmie process.
The summary PMDA resides in the ${PCP_PMDAS_DIR}/summary directory and may be installed
with a default configuration by following the steps described in the section called “PMDA Installation on
a PCP Collector Host”.
Alternatively, you may customize the summary PMDA to export your own derived performance metrics
by following the steps in Procedure 8.1, “Customizing the Summary PMDA”:
1. Check that the symbolic constant SYSSUMMARY is defined in the ${PCP_VAR_DIR}/pmns/stdpmid
file. If it is not, regenerate that file as superuser:

    cd ${PCP_VAR_DIR}/pmns
    ./Make.stdpmid
2. Choose Performance Metric Name Space (PMNS) names for the new metrics. These must begin with
summary and follow the rules described in the pmns(5) man page. For example, you might use
summary.fs.cache_write and summary.fs.cache_hit.
3. Edit the pmns file in the ${PCP_PMDAS_DIR}/summary directory to add the new metric names
in the format described in the pmns(5) man page. You must choose a unique performance metric
identifier (PMID) for each metric. In the pmns file, these appear as SYSSUMMARY:0:x. The value
of x is arbitrary in the range 0 to 1023 and unique in this file. Refer to the section called “PMNS
Management”, for a further explanation of the rules governing PMNS updates.
For example:
summary {
    cpu
    disk
    netif
    fs          /*new*/
}

summary.fs {
    cache_write     SYSSUMMARY:0:10
    cache_hit       SYSSUMMARY:0:11
}
4. Use the local test PMNS root and validate that the PMNS changes are correct.
6. Run pmie in debug mode to verify that the expressions are being evaluated correctly and that the
values make sense. For example:
summary.fs.cache_write: ?
summary.fs.cache_hit: ?
summary.fs.cache_write: 45.83
summary.fs.cache_hit: 83.2
summary.fs.cache_write: 39.22
summary.fs.cache_hit: 84.51
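The derived values above are computed by pmie expressions supplied when the summary PMDA is
configured. A hypothetical fragment in pmie expression syntax is shown below; the base metrics on
the right-hand side are invented for illustration, and the configuration file name used by your
installation of the summary PMDA should be checked before editing:

```
// Hypothetical expressions for the new derived metrics; the
// myfs.cache.* base metrics are placeholders, not real PCP metrics.
summary.fs.cache_write =
    100 * myfs.cache.writes_cached / myfs.cache.writes_total;
summary.fs.cache_hit =
    100 * myfs.cache.read_hits / myfs.cache.reads_total;
```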
7. Install the customized summary PMDA:

    cd ${PCP_PMDAS_DIR}/summary
    ./Install
8. Verify the new metrics, for example with pmval:

metric:    summary.fs.cache_write
host:      localhost
semantics: instantaneous value
units:     none
samples:   8
interval:  5.00 sec
63.60132158590308
62.71878646441073
62.71878646441073
58.73968492123031
58.73968492123031
65.33822758259046
65.33822758259046
72.6099706744868
Note that the values are being sampled here by pmval every 5 seconds, but pmie is passing only new
values to the summary PMDA every 10 seconds. Both rates could be changed to suit the dynamics
of your new metrics.
9. You may now create pmchart views, pmie rules, and pmlogger configurations to monitor and
archive your new performance metrics.
PCP Tool Customization
In most cases, the PCP tools provide orthogonal services, based on external configuration files. It is the
creation of new and modified configuration files that enables PCP users to customize tools quickly to
meet the needs of the local environment, in many cases allowing personal preferences to be established
for individual users on the same PCP monitor system.
The material in this section is intended to act as a checklist of pointers to detailed documentation found
elsewhere in this guide, in the man pages, and in the files that are made available as part of the PCP
installation.
Archive Logging Customization
The following global files and directories influence the behavior of pmlogger. For example, the
pmlogger service may be enabled at boot time as follows:

    chkconfig pmlogger on
Inference Engine Customization
The following global files and directories influence the behavior of pmie. For example, the pmie
service may be enabled at boot time as follows:

    chkconfig pmie on
PMNS Management
This section describes the syntax, semantics, and processing framework for the external specification
of a Performance Metrics Name Space (PMNS) as it might be loaded by the PMAPI routine
pmLoadNameSpace; see the pmLoadNameSpace(3) man page. This is usually done only by pmcd,
except in rare circumstances such as those described in the section called “Customizing the Summary
PMDA”.
PMNS Processing Framework
The PMNS specification is a simple text source file that can be edited easily. For reasons of efficiency,
a binary format is also supported; the utility pmnscomp translates the ASCII source format into binary
format; see the pmnscomp(1) man page.
The PMNS source file is processed by the C preprocessor pmcpp(1), so the specification may include:
• C-style comments
• #include directives
When pmcpp(1) is executed, the standard include directories are the current directory and
${PCP_VAR_DIR}/pmns, where some standard macros and default specifications may be found.
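For instance, a PMNS source file might use both facilities in its preamble. The sketch below is
illustrative only: the stdpmid file does reside in ${PCP_VAR_DIR}/pmns, but the exact include form
and layout are assumptions, not a verbatim example from PCP:

```
/* sketch of a PMNS source preamble */
#include <stdpmid>

root {
    network
    cpu
}
```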
PMNS Syntax
Every PMNS is tree structured. The paths to the leaf nodes are the performance metric names. The general
syntax for a non-leaf node in PMNS is as follows:
pathname {
    name [pmid]
    ...
}
Here pathname is the full pathname from the root of the PMNS to this non-leaf node, with each
component in the path separated by a period. The root node for the PMNS has the special name root, but
the prefix string root. must be omitted from all other pathnames.
For example, refer to the PMNS shown in Figure 8.1, “Small Performance Metrics Name Space
(PMNS)”. The correct pathname for the rightmost non-leaf node is cpu.utilization, not
root.cpu.utilization.
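The mapping from tree to metric names can be illustrated with a small model of the PMNS in
Figure 8.1. This is illustrative Python, not part of PCP:

```python
# Model the PMNS of Figure 8.1 as nested dicts; None marks a leaf node.
pmns = {
    "root": {
        "network": {"in": None, "out": None},
        "cpu": {
            "syscalls": None,
            "utilization": {"user": None, "sys": None, "idle": None},
        },
    }
}

def metric_names(tree, prefix=""):
    """Return the period-separated paths to all leaf nodes."""
    names = []
    for name, subtree in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        if subtree is None:
            names.append(path)
        else:
            names.extend(metric_names(subtree, path))
    return names

# The "root." prefix is omitted from all metric names.
names = metric_names(pmns["root"])
assert "cpu.utilization.user" in names
assert "network.in" in names
```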
Figure 8.1. Small Performance Metrics Name Space (PMNS): a tree rooted at root, with non-leaf
nodes network and cpu
Each component in the pathname must begin with an alphabetic character and be followed by zero or
more alphanumeric characters or the underscore (_) character. For alphabetic characters in a component,
uppercase and lowercase are significant.
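That naming rule corresponds to a simple pattern; the check below is an illustrative sketch, not
code from PCP itself:

```python
import re

def valid_component(name):
    """True if name is a legal PMNS pathname component: an alphabetic
    character followed by zero or more alphanumerics or underscores."""
    return re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name) is not None

assert valid_component("cache_write")
assert valid_component("Utilization2")
assert not valid_component("2d_tools")   # must not start with a digit
assert not valid_component("cache-hit")  # hyphen is not permitted
```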
Non-leaf nodes in the PMNS may be defined in any order. The descendant nodes are defined by
the set of names, relative to the pathname of their parent non-leaf node. Among descendant nodes, leaf
nodes have a pmid specification, but non-leaf nodes do not.
The syntax for the pmid specification was chosen to help manage the allocation of Performance Metric
IDs (PMIDs) across disjoint and autonomous domains of administration and implementation. Each pmid
consists of three integers separated by colons, for example, 14:27:11. This is intended to mirror the
implementation hierarchy of performance metrics. The first integer identifies the domain in which the
performance metric lies. Within a domain, related metrics are often grouped into clusters. The second
integer identifies the cluster, and the third integer, the metric within the cluster.
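The domain:cluster:metric split mirrors how a PMID is packed into a single 32-bit value inside
libpcp. The sketch below assumes the field widths of libpcp's internal __pmID_int layout (9-bit
domain, 12-bit cluster, 10-bit item, top bit reserved); treat those widths as an implementation
detail rather than a guarantee:

```python
def pmid_build(domain, cluster, item):
    """Pack domain:cluster:item into one integer, using the assumed
    9/12/10-bit field widths of libpcp's __pmID_int."""
    if not (0 <= domain < 1 << 9 and 0 <= cluster < 1 << 12 and 0 <= item < 1 << 10):
        raise ValueError("PMID field out of range")
    return (domain << 22) | (cluster << 10) | item

def pmid_split(pmid):
    """Recover the (domain, cluster, item) triple from a packed PMID."""
    return (pmid >> 22) & 0x1FF, (pmid >> 10) & 0xFFF, pmid & 0x3FF

# The example PMID 14:27:11 from the text round-trips cleanly.
assert pmid_split(pmid_build(14, 27, 11)) == (14, 27, 11)
```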
The PMNS specification for Figure 8.1, “Small Performance Metrics Name Space (PMNS)” is shown in
Example 8.1, “PMNS Specification”:
    in          KERNEL:NETWORK:35
    out         KERNEL:NETWORK:36
}

#define CPU 10
cpu {
    syscalls    KERNEL:CPU:10
    utilization
}

#define USER 20
#define SYSTEM 21
#define IDLE 22
cpu.utilization {
    user        KERNEL:CPU:USER
    sys         KERNEL:CPU:SYSTEM
    idle        KERNEL:CPU:IDLE
}
For complete documentation of the PMNS and associated utilities, see the pmns(5), pmnsadd(1),
pmnsdel(1) and pmnsmerge(1) man pages.
PMDA Development
Performance Co-Pilot (PCP) is designed to be extensible at the collector site.
Application developers are encouraged to create new PMDAs to export performance metrics from the
applications and service layers that are particularly relevant to a specific site, application suite, or
processing environment.
These PMDAs use the routines of the libpcp_pmda library, which is discussed in detail in the
Performance Co-Pilot Programmer's Guide.
PCP Tool Development
Application developers are encouraged to create new PCP client applications to monitor or display
performance metrics in a manner that is particularly relevant to a specific site, application suite, or
processing environment.
Client applications use the routines of the PMAPI (performance metrics application programming
interface) described in the Performance Co-Pilot Programmer's Guide. At the time of writing, native
PMAPI interfaces are available for the C, C++ and Python languages.
Appendix A. Acronyms
Table A.1, “Performance Co-Pilot Acronyms and Their Meanings ” provides a list of the acronyms used
in the Performance Co-Pilot (PCP) documentation, help cards, man pages, and user interface.