
Why Measure Performance? Different Purposes Require Different Measures


Robert D. Behn
Harvard University


Performance measurement is not an end in itself. So why should public managers measure performance? Because they may find such measures helpful in achieving eight specific managerial purposes. As part of their overall management strategy, public managers can use performance measures to evaluate, control, budget, motivate, promote, celebrate, learn, and improve. Unfortunately, no single performance measure is appropriate for all eight purposes. Consequently, public managers should not seek the one magic performance measure. Instead, they need to think seriously about the managerial purposes to which performance measurement might contribute and how they might deploy these measures. Only then can they select measures with the characteristics necessary to help achieve each purpose. Without at least a tentative theory about how performance measures can be employed to foster improvement (which is the core purpose behind the other seven), public managers will be unable to decide what should be measured.

Everyone is measuring performance.1 Public managers are measuring the performance of their organizations, their contractors, and the collaboratives in which they participate. Congress, state legislatures, and city councils are insisting that executive-branch agencies periodically report measures of performance. Stakeholder organizations want performance measures so they can hold government accountable. Journalists like nothing better than a front-page bar chart that compares performance measures for various jurisdictions—whether they are average test scores for the city’s schools or FBI uniform crime statistics for the state’s cities. Moreover, public agencies are taking the initiative to publish compilations of their own performance measurements (Murphey 1999). A major trend among the nations that comprise the Organisation for Economic Co-operation and Development, concludes Alexander Kouzmin (1999) of the University of Western Sydney and his colleagues, is “the development of measurement systems which enable comparison of similar activities across a number of areas” (122) and which “help to establish a performance-based culture in the public sector” (123). “Performance measurement,” writes Terrell Blodgett of the University of Texas and Gerald Newfarmer of Management Partners, Inc., is “(arguably) the hottest topic in government today” (1996, 6).

Why Measure Performance?

What is behind all of this measuring of performance? What do people expect to do with the measures—other than use them to beat up on some underperforming agency, bureaucrat, or contractor? How are people actually using these performance measures? What is the rationale that connects the measurement of government’s performance to some higher purpose? After all, neither the act of measuring performance nor the resulting data accomplishes anything itself; only when someone uses these measures in some way do they accomplish something. For what purposes do—or might—people measure the performance of public agencies, public programs, nonprofit and for-profit contractors, or the collaboratives of public, nonprofit, and for-profit organizations that deliver public services?2

Why measure performance? Because measuring performance is good. But how do we know it is good? Because business firms all measure their performance, and everyone knows that the private sector is managed better than

Robert D. Behn is a lecturer at Harvard University’s John F. Kennedy School of Government and the faculty chair of its executive program Driving Government Performance. His research focuses on governance, leadership, and performance management. His latest book is Rethinking Democratic Accountability (Brookings Institution, 2001). He believes the most important performance measure is 1918: the last year the Boston Red Sox won the World Series. Email: redsox@ksg.harvard.edu.

586 Public Administration Review • September/October 2003, Vol. 63, No. 5


the public sector. Unfortunately, the kinds of financial ratios the business world uses to measure a firm’s performance are not appropriate for the public sector. So what should public agencies measure? Performance, of course. But what kind of performance should they measure, how should they measure it, and what should they do with these measurements? A variety of commentators offer a variety of purposes:
• Joseph Wholey of the University of Southern California and Kathryn Newcomer of George Washington University observe that “the current focus on performance measurement at all levels of government and in nonprofit organizations reflects citizen demands for evidence of program effectiveness that have been made around the world” (1997, 92).
• In their case for performance monitoring, Wholey and the Urban Institute’s Harry Hatry note that “performance monitoring systems are beginning to be used in budget formulation and resource allocation, employee motivation, performance contracting, improving government services and improving communications between citizens and government” (1992, 604), as well as for “external accountability purposes” (609).
• “Performance measurement may be done annually to improve public accountability and policy decision making,” write Wholey and Newcomer, “or done more frequently to improve management and program effectiveness” (1997, 98).
• The Governmental Accounting Standards Board suggests that performance measures are “needed for setting goals and objectives, planning program activities to accomplish these goals, allocating resources to these programs, monitoring and evaluating the results to determine if they are making progress in achieving the established goals and objectives, and modifying program plans to enhance performance” (Hatry et al. 1990, v).
• Municipalities, notes Mary Kopczynski of the Urban Institute and Michael Lombardo of the International City/County Management Association, can use comparative performance data in five ways: “(1) to recognize good performance and to identify areas for improvement; (2) to use indicator values for higher-performing jurisdictions as improvement targets by jurisdictions that fall short of the top marks; (3) to compare performance among a subset of jurisdictions believed to be similar in some way (for example, in size, service delivery practice, geography, etc); (4) to inform stakeholders outside of the local government sector (such as citizens or business groups); and (5) to solicit joint cooperation in improving future outcomes in respective communities” (1999, 133).
• Advocates of performance measurement in local government, observes David Ammons of the University of North Carolina, “have promised that more sophisticated measurement systems will undergird management processes, better inform resource allocation decisions, enhance legislative oversight, and increase accountability” (1995, 37).
• Performance measurement, write David Osborne and Peter Plastrik in The Reinventor’s Fieldbook, “enables officials to hold organizations accountable and to introduce consequences for performance. It helps citizens and customers judge the value that government creates for them. And it provides managers with the data they need to improve performance” (2000, 247).
• Robert Kravchuk of Indiana University and Ronald Schack of the Connecticut Department of Labor do not offer a specific list of purposes for measuring performance. Nevertheless, imbedded in their proposals for designing effective performance measures, they suggest a number of different purposes: planning, evaluation, organizational learning, driving improvement efforts, decision making, resource allocation, control, facilitating the devolution of authority to lower levels of the hierarchy, and helping to promote accountability (Kravchuk and Schack 1996, 348, 349, 350, 351).

Performance measures can be used for multiple purposes. Moreover, different people have different purposes. Legislators have different purposes than journalists. Stakeholders have different purposes than public managers. Consequently, I will focus on just those people who manage public agencies.

Eight Managerial Purposes for Measuring Performance

What purpose—exactly—is a public manager attempting to achieve by measuring performance? Even for this narrower question, the answer isn’t obvious. One analyst admonishes public managers: “Always remember that the intent of performance measures is to provide reliable and valid information on performance” (Theurer 1998, 24). But that hardly answers the question. What will public managers do with all of this reliable and valid information? Producing reliable and valid reports of government performance is no end in itself. All of the reliable and valid data about performance is of little use to public managers if they lack a clear idea about how to use them or if the data are not appropriate for this particular use. So what, exactly, will performance measurement do, and what kinds of measures do public managers need to do this? Indeed, what is the logic behind all of this performance measurement—the causal link between the measures and the public manager’s effort to achieve specific policy purposes?

Hatry offers one of the few enumerated lists of the uses of performance information. He suggests that public managers can use such information to perform ten different



tasks: to (1) respond to elected officials’ and the public’s demands for accountability; (2) make budget requests; (3) do internal budgeting; (4) trigger in-depth examinations of performance problems and possible corrections; (5) motivate; (6) contract; (7) evaluate; (8) support strategic planning; (9) communicate better with the public to build public trust; and (10) improve.3 Hatry notes that improving programs is the fundamental purpose of performance measurement, and all but two of these ten uses—improving accountability and increasing communications with the public—“are intended to make program improvements that lead to improved outcomes” (1999b, 158, 157).

My list is slightly different. From the diversity of reasons for measuring performance, I think public managers have eight primary purposes that are specific and distinct (or only marginally overlapping4). As part of their overall management strategy, the leaders of public agencies can use performance measurement to (1) evaluate; (2) control; (3) budget; (4) motivate; (5) promote; (6) celebrate; (7) learn; and (8) improve.5

This list could be longer or shorter. For the measurement of performance, the public manager’s real purpose—indeed, the only real purpose—is to improve performance. The other seven purposes are simply means for achieving this ultimate purpose. Consequently, the choice of how many subpurposes—how many distinct means—to include is somewhat arbitrary. But my major point is not. Instead, let me emphasize: The leaders of public agencies can use performance measures to achieve a number of very different purposes, and they need to carefully and explicitly choose their purposes. Only then can they identify or create specific measures that are appropriate for each individual purpose.6

Of the various purposes that others have proposed for measuring performance, I have not included on my list: planning, decision making, modifying programs, setting performance targets, recognizing good performance, comparing performance, informing stakeholders, performance contracting, and promoting accountability. Why not? Because these are really subpurposes of one (or more) of the eight basic purposes. For example, planning, decision making, and modifying are implicit in two of my eight, more basic, purposes: budgeting and improving. The real reason that managers plan, or make decisions, or modify programs is to either reallocate resources or to improve future performance. Similarly, the reason that managers set performance targets is to motivate, and thus to improve. To compare performance among jurisdictions is—implicitly but undeniably—to evaluate them. Recognizing good performance is designed to motivate improvements. Informing stakeholders both promotes and gives them the opportunity to evaluate and learn. Performance contracting involves all of the eight purposes from evaluating to improving. And, depending upon what people mean by accountability, they may promote it by evaluating public agencies, by controlling them, or by motivating them to improve7 (table 1).

Table 1  Eight Purposes that Public Managers Have for Measuring Performance

The purpose: The public manager’s question that the performance measure can help answer
Evaluate: How well is my public agency performing?
Control: How can I ensure that my subordinates are doing the right thing?
Budget: On what programs, people, or projects should my agency spend the public’s money?
Motivate: How can I motivate line staff, middle managers, nonprofit and for-profit collaborators, stakeholders, and citizens to do the things necessary to improve performance?
Promote: How can I convince political superiors, legislators, stakeholders, journalists, and citizens that my agency is doing a good job?
Celebrate: What accomplishments are worthy of the important organizational ritual of celebrating success?
Learn: Why is what working or not working?
Improve: What exactly should who do differently to improve performance?

Purpose 1. To Evaluate: How Well Is This Government Agency Performing?

Evaluation is the usual reason for measuring performance. Indeed, many of the scholars and practitioners who are attempting to develop systems of performance measurement have come from the field of program evaluation. Often (despite the many different reasons cited earlier), no reason is given for measuring performance; instead, the evaluation purpose is simply assumed. People rarely state that their only (or dominant) rationale for measuring performance is to evaluate performance, let alone acknowledge there may be other purposes. It is simply there between the lines of many performance audits, budget documents, articles, speeches, and books: People are measuring the performance of this organization or that program so they (or others) can evaluate it.

In a report on early performance-measurement efforts under the Government Performance and Results Act of 1993, an advisory panel of the National Academy of Public Administration (NAPA) observed, “Performance measurement of program outputs and outcomes provides important, if not vital, information on current program status and how much progress is being made toward important program goals. It provides needed information as to whether problems are worsening or improving, even if it cannot tell us why or how the problem improvement (or worsening) came about” (NAPA 1994, 2). These sentences do not contain the words “evaluation” or “evaluate,” yet they clearly imply the performance measurements will furnish some kind of assessment of program performance.

Of course, to evaluate the performance of a public agency, the public manager needs to know what that agency



is supposed to accomplish. For this reason, two of the ten performance-measurement design principles developed by Kravchuk and Schack are to “formulate a clear, coherent mission, strategy, and objectives,” and to “rationalize the programmatic structure as a prelude to measurement.” Do this first, they argue, because “performance measurement must begin with a clear understanding of the policy objectives of a program, or multiprogram system,” and because “meaningful measurement requires a rational program structure” (1996, 350). Oops. If public managers have to wait for the U.S. Congress or the local city council to formulate (for just one governmental undertaking) a clear, coherent mission, strategy, and objectives combined with a rationalized program structure, they will never get to the next step of measuring anything.8

No wonder many public managers are alarmed by the evaluative nature of performance measurement. If there existed a clear, universal understanding of their policy objectives, and if they could manage within a rational program structure, they might find performance measurement less scary. But without an agreement on policy objectives, public managers know that others can use performance data to criticize them (and their agency) for failing to achieve objectives that they were not pursuing. And if given responsibility for achieving widely accepted policy objectives with an insane program structure (multiple constraints, inadequate resources, and unreasonable timetables), even the most talented managers may fall short of the agreed-upon performance targets.

Moreover, even if the performance measures are not collected for the explicit purpose of evaluation, this possibility is always implicit. And using performance data to evaluate a public agency is a tricky and sophisticated undertaking. Yet, a simple comparison of readily available data about similar (though rarely identical) agencies is the most common evaluative technique. Hatry (1999a) notes that intergovernmental comparisons of performance “focus primarily on indicators that can be obtained from traditional and readily available data sources.” This is the common practice, he continues, because “the best outcome data cannot be obtained without new, or at least, substantially revised procedures” (104).

Often, however, existing or easily attainable data create an opportunity for simplistic, evaluative comparisons. Hatry writes that those who collect comparative performance data, as well as “the public, and the media must recognize that the data in comparative performance measurement efforts will only be roughly comparable” (1999a, 104). But will journalists, who must produce this evening’s news or tomorrow’s newspaper under very tight deadlines, recognize this, let alone explain it? And will the public, in their quick glance at an attractive bar chart, get this message? Hatry, himself, is not completely sanguine:

    The ultimate question of comparative data is whether publication does more harm than good. More harm can occur if many of the measurements contain errors or are otherwise unfair, so that low performers are unfairly beaten up by the media and have to spend excessive amounts of time and effort attempting to explain and defend themselves.… On the other hand, if the data seem on the whole to encourage jurisdictions to explore why low performance has occurred and how they might better themselves, then such efforts will be worthwhile, even if a few agencies are unfairly treated. (Hatry 1999a, 104)

Whether the scholars, analysts, or managers like it, almost any performance measure can and will be used to evaluate a public agency’s performance.

Purpose 2. To Control: How Can Public Managers Ensure Their Subordinates Are Doing the Right Thing?

Yes. Frederick Winslow Taylor is dead. Today, no manager believes the best way to influence the behavior of subordinates is to establish the one best way for them to do their prescribed tasks and then measure their compliance with this particular way. In the twenty-first century, all managers are into empowerment.

Nevertheless, it is disingenuous to assert (or believe) that people no longer seek to control the behavior of public agencies and public employees, let alone seek to use performance measurement to help them do so.9 Why do governments have line-item budgets? Today, no one employs the measurements of time-and-motion studies for control. Yet, legislatures and executive-branch superiors do establish performance standards—whether they are specific curriculum standards for teachers or sentencing standards for judges—and then measure performance to see whether individuals have complied with these mandates.10 After all, the central concern of principal–agent theory is how principals can control the behavior of their agents (Ingraham and Kneedler 2000, 238–39).

Indeed, the controlling style of management has a long and distinguished history. It has cleverly encoded itself into one of the rarely stated but very real purposes behind performance measurement. “Management control depends on measurement,” writes William Bruns in a Harvard Business School note on “Responsibility Centers and Performance Measurement” (1993, 1). In business schools, accounting courses and accounting texts often explicitly use the word “control.”11

In their original explanation of the balanced scorecard, Robert Kaplan and David Norton note that business has a control bias: “Probably because traditional measurement systems have sprung from the finance function, the systems have a control bias. That is, traditional performance measurement systems specify the particular actions they



want employees to take and then measure to see whether the employees have in fact taken those actions. In that way, the systems try to control behavior. Such measurement systems fit with the engineering mentality of the Industrial Age” (1992, 79). The same is true in the public sector. Legislatures create measurement systems that specify particular actions they want executive-branch employees to take and particular ways they want executive-branch agencies to spend money. Executive-branch superiors, regulatory units, and overhead agencies do the same. Then, they measure to see whether the agency employees have taken the specified actions and spent the money in the specified ways.12 Can’t you just see Fred Taylor smiling?

Purpose 3. To Budget: On What Programs, People, or Projects Should Government Spend the Public’s Money?

Performance measurement can help public officials to make budget allocations. At the macro level, however, the apportionment of tax monies is a political decision made by political officials. Citizens delegate to elected officials and their immediate subordinates the responsibility for deciding which purposes of government action are primary and which ones are secondary or tertiary. Thus, political priorities—not agency performance—drive macro budgetary choices.

Performance budgeting, performance-based budgeting, and results-oriented budgeting are some of the names commonly given to the use of performance measures in the budgetary process (Holt 1995–96; Jordon and Hackbart 1999; Joyce 1996, 1997; Lehan 1996; Melkers and Willoughby 1998, 2001; Thompson 1994; Thompson and Johansen 1999). But like so many other phrases in the performance-measurement business, they can mean different things to different people in different contexts.13 For example, performance budgeting may simply mean including historical data on performance in the annual budget request. Or it may mean that budgets are structured not around line-item expenditures (with performance purposes or targets left either secondary or implicit), but around general performance purposes or specific performance targets (with line-item allocations left to the managers of the units charged with achieving these purposes or targets). Or it may mean rewarding units that do well compared to some performance targets with extra funds and punishing units that fail to achieve their targets with budget cuts.

For improving performance, however, budgets are crude tools. What should a city do if its fire department fails to achieve its performance targets? Cut the department’s budget? Or increase its budget? Or should the city manager fire the fire chief and recruit a public manager with a track record of fixing broken agencies? The answer depends on the specific circumstances that are not captured by the formal performance data. Certainly, cutting the fire department’s budget seems like a counterproductive way to improve performance (though cutting the fire department’s budget may be perfectly logical if the city council decides that fire safety is less of a political priority than educating children, fixing the sewers, or reducing crime). If analysis reveals the fire department is underperforming because it is underfunded—because, for example, its capital budget lacks the funds for cost-effective technology—then increasing the department’s budget is a sensible response. But poor performance may be the result of factors that more (or less) money won’t fix: poor leadership, the lack of a fire-prevention strategy to complement the department’s fire-fighting strategy, or the failure to adopt industry training standards. Using budgetary increments to reward well-performing agencies and budgetary decrements to punish underperforming ones is not a strategy that will automatically fix (or even motivate) poor performers.

Nevertheless, line managers can use performance data to inform their resource-allocation decisions. Once elected officials have established macro political priorities, those responsible for more micro decisions may seek to invest their limited allocation of resources in the most cost-effective units and activities. And when making such micro budgetary choices, public managers may find performance measures helpful.

Purpose 4. To Motivate: How Can Public Managers Motivate Line Staff, Middle Managers, Nonprofit and For-Profit Collaborators, Stakeholders, and Citizens to Do the Things Necessary to Improve Performance?

Public managers may use performance measures to learn how to perform better. Or, if they already understand what it takes to improve performance, they may use the measures to motivate such behavior. And for this motivational purpose, performance measures have proven to be very useful.

The basic concept is that establishing performance goals—particularly stretch goals—grabs people’s attention. Then the measurement of progress toward the goals provides useful feedback, concentrating their efforts on reaching these targets. In his book The Great Ideas of Management, Jack Duncan of the University of Alabama reports on the startling conclusion of research into the impact of goal setting on performance: “No other motivational technique known to date can come close to duplicating that record” (1989, 127).

To implement this motivational strategy, an agency’s leadership needs to give its people a significant goal to achieve and then use performance measures—including interim targets—to focus people’s thinking and work and to provide a periodic sense of accomplishment. Moreover,



performance targets may also encourage creativity in evolving better ways to achieve the goal (Behn 1999); thus, measures that motivate improved performance may also motivate learning.14

In New York City in the 1970s, Gordon Chase used performance targets to motivate the employees of the Health Services Administration (Rosenthal 1975; Levin and Sanger 1994). In Massachusetts in the 1980s, the leadership of the Department of Public Welfare used the same strategy (Behn 1991). And in the 1990s in Pennsylvania, the same basic approach worked in the Department of Environmental Protection (Behn 1997a). But perhaps the most famous application of performance targets to motivate public employees is Compstat, the system created by William Bratton, then commissioner of the New York Police Department, to focus attention of precinct commanders on reducing crime (Silverman 2001, 88–89, 101).

Purpose 5. To Promote: How Can Public Managers Convince Political Superiors, Legislators, Stakeholders, Journalists, and Citizens that Their Agency Is Doing a Good Job?

Americans suspect their government is both ineffective and inefficient. Yet, if public agencies are to accomplish public purposes, they need the public’s support. Performance measures can contribute to such support by revealing not only when government institutions are failing, but also when they are doing a good or excellent job. For example, the National Academy of Public Administration’s Center for Improving Government Performance reports that performance measures can be used to “validate success; justify additional resources (when appropriate); earn customer, stakeholder, and staff loyalty by showing results; and win recognition inside and outside the organization” (NAPA 1999, 7).

Still, too many public managers fail to use performance measures to promote the value and contribution of their agency. “Performance-based measures,” writes Harry Boone of the Council of State Governments, “provide a justification for the agency’s existence,” yet “many agencies cannot defend their effectiveness in performance-based terms” (1996, 10).

In a study, “Toward Useful Performance Measures,” a National Academy of Public Administration advisory panel (1994) asserts that “performance indicators can be a powerful tool in communicating program value and accomplishments to a variety of constituencies” (23). In addition to “the use of performance measurement to communicate program success and worth” (9), the panel noted, the “major values of a performance measurement system” include its potential “to enhance public trust” (9). That is, the panel argues, performance measurement can not only directly establish—and thus promote—the competence of specific agencies and the value of particular programs; it also can indirectly establish, and thus promote, the competence and value of government in general.

Purpose 6. To Celebrate: What Accomplishments Are Worthy of the Important Organizational Ritual of Celebrating Success?

All organizations need to commemorate their accomplishments. Such rituals tie people together, give them a sense of their individual and collective relevance, and motivate future efforts. Moreover, by achieving specific goals, people gain a sense of personal accomplishment and self-worth (Locke and Latham 1984, 1990). Such celebrations need not be limited to one big party to mark the end of the fiscal year or the completion of a significant project. Small milestones along the way—as well as unusual achievements and unanticipated victories—provide an opportunity for impromptu celebrations that call attention to these accomplishments and to the people who made them happen. And such celebrations can help to focus attention on the next challenge.

Like all of the other purposes for measuring performance—with the sole and important exception of improvement—celebration is not an end in itself. Rather, celebration is important because it motivates, promotes, and recruits. Celebration helps to improve performance because it motivates people to improve further in the next year, quarter, or month. Celebration helps to improve performance because it brings attention to the agency, and thus promotes its competence. And this promotion—this attention—may even generate increased flexibility (from overhead agencies) and resources (from the guardians of the budget). Moreover, this promotion and attention attract another resource: dedicated people who want to work for a successful agency that is achieving important public purposes. Celebration may even attract potential collaborators from other organizations that have not received as much attention, and thus seek to enhance their own sense of accomplishment by shifting some of their energies to the high-performing collaborative (Behn 1991, 92–93).

Celebration also may be combined with learning. Rather than hold a party to acknowledge success and recognize its contributors, an informal seminar or formal presentation can realize the same purposes. Asking those who produced the unanticipated achievement or unusual victory to explain how they pulled it off celebrates their triumph; but it also provides others with an opportunity to learn how they might achieve a similar success (Behn 1991, 106–7).

Still, the link from measurement to celebration to improvement is the most indirect because it has to work through one of the other links—either motivation, budgeting, learning, or promotion. In the end, any reason for measuring performance is valid only to the extent that it helps to achieve the most basic purpose: to improve performance.

Why Measure Performance? 591


Purpose 7. To Learn: Why Is What Working or Not Working?

Performance measures contain information that can be used not only to evaluate, but also to learn. Indeed, learning is more than evaluation. The objective of evaluation is to determine what is working and what isn’t. The objective of learning is to determine why.

To learn from performance measures, however, managers need some mechanism to extract information from the data. We may all believe that the data speak for themselves. This, however, is only because we each have buried in our brain some unconscious mechanism that has already made an implicit conversion of the abstract data into meaningful information. The data speak only through an interpreter that converts the collection of digits into analog lessons—that decodes the otherwise inscrutable numbers and provides a persuasive explanation. And often, different people use different interpreters, which explains how they can draw very different lessons from the same data.15

Moreover, if managers have too many performance measures, they may be unable to learn anything. Carole Neves of the National Academy of Public Administration, James Wolf of Virginia Tech, and Bill Benton of Benton and Associates (1986) write that “in many agencies,” because of the proliferation of performance measures, “there is more confusion or ‘noise’ than useful data.” Theodore Poister and Gregory Streib of Georgia State University call this the “‘DRIP’ syndrome—Data Rich but Information Poor” (1999, 326). Thus, Neves and her colleagues conclude, “managers lack time or simply find it too difficult to try to identify good signals from the mass of numbers” (1986, 141).

From performance measures, public managers may learn what is not working. If so, they can stop doing it and reallocate money and people from this nonperforming activity to more effective undertakings (designed to achieve the identical or quite different purposes). Or they may learn what is working. If so, they can shift existing resources (or new resources that become available) to this proven activity. Learning can help with the budgeting of both money and people.

Furthermore, learning can help more directly with the improving. The performance measures can reveal not only whether an agency is performing well or poorly, but also why: What is contributing to the agency’s excellent, fair, or poor performance—and what might be done to improve the components that are performing fairly or poorly?

In seeking to learn from performance measures, public managers frequently confront the black box enigma of social science research.16 The data—the performance measures—can reveal that an organization is performing well or poorly, but they don’t necessarily reveal why. The performance measures can describe what is coming out of the black box of a public agency, as well as what is going in, but they don’t necessarily reveal what is happening inside. How are the various inputs interacting to produce the outputs? What is the organizational black box actually doing to the inputs to convert them into the outputs? What is the societal black box actually doing to the outputs to convert them into the outcomes?17

Public managers can, of course, create some measures of the processes going on inside the black box. But they cannot guarantee that the internal characteristics and processes of the black box they have chosen to measure are actually the ones that determine whether the inputs are converted into high-quality or low-quality outputs. Yet, the more internal processes that public managers choose to measure, the more likely they are to discover a few that correlate well with the outputs. Such correlations could, however, be purely random,18 or the factors that are identified by the correlations as significant contributors could merely be correlated with other factors that are the real causes. Converting performance data into an understanding of what is happening inside the black box is neither easy nor obvious.

Purpose 8. To Improve: What Exactly Should Who Do Differently to Improve Performance?

Performance “‘measurement’ is not an end in itself but must be used by managers to make improvements” (NAPA 1994, 22), emphasizes an advisory panel of the National Academy of Public Administration. In fact, the word “improve” (or “improving” or “improvement”) appears more than a dozen times in this NAPA report. “Ideally,” the panel concludes, “performance data should be part of a continuous feedback loop that is used to report on program value and accomplishment and identify areas where performance is weak so that steps can be taken to promote improvements” (22). Yet, the panel also found “little evidence in most [GPRA pilot performance] plans that the performance information would be used to improve program performance” (8).

Similarly, Hatry argues the “fundamental purpose of performance information” is “to make program improvements” (1999b, 158). But how? What exactly is the connection between the measurement and the improvement? Who has to do what to convert the measurement into an improvement? Or does this just happen automatically? No, responds the NAPA panel: “measurement alone does not bring about performance improvement” (1994, 15).
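The black-box caution in Purpose 7 above, that among many measured internal processes a few will seem to correlate well with the outputs purely by chance, can be illustrated with a short simulation. This is only a sketch: the series are random numbers, not real agency measures.

```python
import random

def pearson(xs, ys):
    # Plain Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)
months = 24
# A purely random "output" series and 50 purely random "process" measures:
output = [random.gauss(0, 1) for _ in range(months)]
processes = [[random.gauss(0, 1) for _ in range(months)] for _ in range(50)]

best = max(abs(pearson(p, output)) for p in processes)
# Every series here is noise, yet the best-correlated process measure
# will still look like a "significant contributor."
print(f"strongest spurious correlation: r = {best:.2f}")
```

Even pure noise yields a “best” process measure; before treating such a correlation as a real inside-the-black-box relationship, a manager would need a theory of why it should hold.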

592 Public Administration Review • September/October 2003, Vol. 63, No. 5


For example, if the measurement produces some learning, someone then must convert that learning into an improvement. Someone has to intervene consciously and actively. But can any slightly competent individual pull this off? Or does it require a sophisticated appreciation of the strategies and pitfalls of converting measurement into improvement? To improve, an organization needs the capacity to adopt—and adapt—the lessons from its learning.

Learning from performance measures, however, is tricky. It isn’t obvious what lessons public managers should draw about which factors are contributing to the good or poor performance, let alone how they might modify such factors to foster improvements. Improvement requires attention to the feedback—the ability to check whether the lessons postulated from the learning have been implemented in a way that actually changes organizational behavior so that it results in the better outputs and outcomes that the learning promised. Improvement is active, operational learning.

The challenge of learning from the performance measures is both intellectual and operational. Public managers who wish to use measurement to improve the performance of their agencies face two challenges: First, they have the intellectual challenge of figuring out how to learn which changes in plans, or procedures, or personnel might produce improvements. Then, they confront the operational challenge of figuring out how to implement the indicated changes.

There are a variety of standard mechanisms for using performance measures to evaluate. There exist some such mechanisms to control and budget. For the purposes of learning and improving, however, each new combination of policy objectives, political environment, budgetary resources, programmatic structure, operational capacity, regulatory constraints, and performance measures demands a more open-ended, qualitative analysis. For performance learning and performance improvement, there is no cookbook.19

How does the measurement of performance beget improvement? Measurement can influence performance in a variety of ways, most of which are hardly direct or apparent. There exist a variety of feedback loops, though not all of them may be obvious, and the obvious ones may not function as expected or desired. Consequently, to measure an agency’s performance in a way that can actually help improve its performance, the agency’s leadership needs to think seriously not only about what it should measure, but also about how it might deploy any such measurements. Indeed, without at least some tentative theory about how the measurements can be employed to foster improvements, it is difficult to think about what should be measured.

Selection Criteria for Each Measurement Purpose

What kinds of performance measures are most appropriate for which purposes? It isn’t obvious. Moreover, a measure that is particularly appropriate for one purpose may be completely useless for another. For example, “in many cases,” Newcomer notes, “the sorts of measures that might effectively inform program improvement decisions may provide data that managers would not find helpful for resource allocation purposes” (1997, 8). Before choosing a performance measure, public managers must first choose their purpose.

Kravchuk and Schack note that no one measure or even one collection of measures is appropriate for all circumstances: “The search for a single array of measures for all needs should be abandoned, especially where there are divergent needs and interests among key users of performance information.” Thus, they advocate “an explicit measurement strategy” that will “provide for the needs of all important users of performance information” (Kravchuk and Schack 1996, 350).

I take a similar approach. But, rather than worry about the needs of different kinds of users, I focus on the different purposes for which the users—specifically, public managers—can employ the performance measures. After all, different users want different measures because they have different purposes. But it is the nature of the purpose—not the nature of the user—that determines which characteristics of those measures will be most helpful. The usual admonition of performance measurement is, “Don’t measure inputs. Don’t measure processes. Don’t measure outputs. Measure outcomes.” But outcomes are not necessarily the best measure for all purposes.

Will a particular public manager find a certain performance measure helpful for a specific purpose? The answer depends not on the organizational position of that manager, but on whether this measure possesses the characteristics required by the manager’s purpose (table 2).

Table 2 Characteristics of Performance Measures for Different Purposes

The purpose | To help achieve this purpose, public managers need
Evaluate    | Outcomes, combined with inputs and with the effects of exogenous factors
Control     | Inputs that can be regulated
Budget      | Efficiency measures (specifically outcomes or outputs divided by inputs)
Motivate    | Almost-real-time outputs compared with production targets
Promote     | Easily understood aspects of performance about which citizens really care
Celebrate   | Periodic and significant performance targets that, when achieved, provide people with a real sense of personal and collective accomplishment
Learn       | Disaggregated data that can reveal deviancies from the expected
Improve     | Inside-the-black-box relationships that connect changes in operations to changes in outputs and outcomes
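The “learn” row of table 2, which calls for disaggregated data that can reveal deviancies from the expected, can be sketched in a few lines. The district offices and error counts below are invented for illustration:

```python
# Hypothetical case counts: the aggregate rate masks one deviant district office.
offices = {
    "North": (2400, 60),   # (cases processed, cases with errors)
    "South": (2200, 55),
    "East":  (2600, 65),
    "West":  (800, 96),    # the deviance the summary statistics would hide
}

total_cases = sum(n for n, _ in offices.values())
total_errors = sum(e for _, e in offices.values())
aggregate_rate = total_errors / total_cases     # looks acceptable agency-wide

# Management by exception: flag only offices far outside the agency norm.
threshold = 2 * aggregate_rate
flagged = [name for name, (n, e) in offices.items() if e / n > threshold]
print(f"agency-wide error rate: {aggregate_rate:.1%}; investigate: {flagged}")
```

The agency-wide rate alone would prompt no action; only the disaggregated rates expose the one office whose deviance is worth investigating further.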



Purpose 1: To Evaluate

Evaluation requires a comparison. To evaluate the performance of an agency, its managers have to compare that performance with some standard. Such a standard can come from past performance, from the performance of similar agencies, from a professional or industry standard, or from political expectations. But without such a basis for comparison, it is impossible to determine whether the agency is performing well or poorly.

And to compare actual performance against the performance criterion requires a variety of outcome measures, combined with some input (plus environmental, process, and output) measures. The focus, however, is on the outcomes. To evaluate a public agency—to determine whether it is achieving its public purpose—requires some measure of the outcomes that the agency was designed to affect. Only with outcome measures can public managers answer the effectiveness question: Did the agency achieve the results it set out to produce? Then, dividing by some input measures, they can ask the efficiency question: Did this agency produce these results in a cost-effective way? To answer either of these evaluative questions, a manager needs to measure outcomes.20

Of course, the agency did not produce all of the outcomes alone. Other factors, such as economic conditions, affected them. Consequently, public managers also need to ask the impact question: What did the agency itself accomplish? What is the difference between the actual outcomes and the outcomes that would have occurred if the agency had not acted?

Another way of assessing an organization or program is to evaluate its internal operations. This is the best-practice question: How do the operations and practices of this organization or program compare with the ones that are known to be most effective and efficient? To conduct such a best-practice evaluation requires some process measures—appropriate descriptions of the organization’s key internal operations that can be compared with some operational standards.

No one comparison of a single outcome measure with a single performance standard will provide a definitive evaluation. Rather, to provide a conscientious and credible picture of the agency’s performance, an evaluation requires multiple measures compared with multiple standards.

Purpose 2: To Control

To control the behavior of agencies and employees, public officials need input requirements. Indeed, whenever you discover officials who are using input measures, you can be sure they are using them to control. To do this, officials need to measure the corresponding behavior of individuals and organizations and then compare this performance with the requirements to check who has and has not complied: Did the teachers follow the curricular requirements for the children in their classrooms? Did the judges follow the sentencing requirements for those found guilty in their courts? Often, such requirements are described only as guidelines: curriculum guidelines, sentencing guidelines. Do not be fooled. These guidelines are really requirements, and these requirements are designed to control. The measurement of compliance with these requirements is the mechanism of control.

Purpose 3: To Budget

To use performance measures for budgeting purposes, public managers need measures that describe the efficiency of various activities. Then, once political leaders have set macro budgetary priorities, agency managers can use efficiency measures to suggest the activities in which they should invest the appropriated funds. Why spend limited funds on some programs or organizations when the performance measures reveal that other programs or organizations are more efficient at achieving the political objectives behind the budget’s macro allocations?

To use performance measures to budget, however, managers need not only data on outcomes (or outputs) for the numerator in the efficiency equation; they also need reliable cost data for the denominator. And these cost measures have to capture not only the obvious, direct costs of the agency or program, but also the hidden, indirect costs. Few governments, however, have created cost-accounting or activity-based-accounting systems that assign to each government function the full and accurate costs (Coe 1999, 112; Joyce 1997, 53, 56; Thompson 1994).

Budgeting usually concerns the allocation of dollars. But most public managers are constrained by a system of double budgeting. They must manage a fixed number of dollars and a fixed number of personnel slots. Thus, in attempting to maximize the productivity of these two constrained resources, they also need to budget their people. And to use performance measurement for this budgetary purpose, they need not only outcome (or output) measures for the numerator of their efficiency equation, but also input data in terms of people for the denominator. Public managers need to allocate their people to the activities with the highest productivity per person.
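The efficiency ratios this budgeting purpose requires, outputs divided by dollars and outputs divided by people, can be sketched as follows. The programs and figures are hypothetical:

```python
# Hypothetical programs: (name, outputs produced, dollars spent, staff in FTEs).
programs = [
    ("Job placement", 1800,   900_000, 12),
    ("Retraining",     650,   480_000,  9),
    ("Counseling",    2400, 1_050_000, 20),
]

# Two efficiency ratios: output per dollar and output per person.
for name, outputs, dollars, ftes in programs:
    print(f"{name:13s}  {outputs / dollars:.5f} per dollar  {outputs / ftes:6.1f} per FTE")

# Rank by output per dollar to suggest where marginal funds should go.
by_dollar = sorted(programs, key=lambda p: p[1] / p[2], reverse=True)
by_person = max(programs, key=lambda p: p[1] / p[3])
print("best use of dollars:", by_dollar[0][0])
print("best use of people: ", by_person[0])
```

Note the double-budgeting tension the text describes: in these invented figures, the program with the best output per dollar is not the one with the best output per person, so budgeting money and budgeting people point in different directions.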



Purpose 4: To Motivate

To motivate people to work harder or smarter, public managers need almost-real-time measures of outputs to compare with production targets. Organizations don’t produce outcomes; organizations produce outputs. And to motivate an organization to improve its performance, managers have to motivate it to improve what it actually does. Consequently, although public managers want to use outcome data to evaluate their agency’s performance, they need output data to motivate better performance.21

Managers can’t motivate people to do something they can’t do; managers can’t motivate people to affect something over which they have little or no influence; managers can’t motivate people to produce an outcome they do not themselves produce.

Moreover, to motivate, managers have to collect and distribute the output data quickly enough to provide useable feedback. Those who produce the agency’s outputs cannot adjust their production processes to respond to inadequacies or deficiencies unless they know how well they are doing against their current performance target. Eli Silverman of the John Jay College of Criminal Justice describes Compstat as “intelligence-led policing” (2001, 182). The New York Police Department collects, analyzes, and quickly distributes to managers at all levels—from commissioner to patrol sergeants—the data about the current patterns and concentrations of crime that are necessary to develop strategic responses.

This helps to explain why society attempts to motivate schools and teachers with test scores. The real, ultimate outcome that citizens seek from our public schools is children who grow up to become productive employees and responsible citizens. But using a measure of employee productivity and citizen responsibility to motivate performance creates a number of problems. First, it is very difficult to develop a widely acceptable measure of employee productivity (do we simply use wage levels?), let alone citizen responsibility (do we use voting participation?). Second, schools and teachers are not the only contributors to a future adult’s productivity and responsibility. And third, the lag between when the schools and teachers do their work and when these outcomes can be measured is not just months or years, but decades. Thus, we never could feed these outcome measures back to the schools and teachers in time for them to make any adjustments. Consequently, as a society we must resort to motivating schools and teachers with outputs—with test scores that (presumably) measure how much a child has learned. And although we cannot determine whether schools and teachers are producing productivity or responsibility in future adults, citizens expect they will convey some very testable knowledge and skills.22

Once an agency’s leaders have motivated significant improvements using output targets, they can create some outcome targets. Output targets can motivate people to focus on improving their agency’s internal processes, which produce the outputs. Outcome targets, in contrast, can motivate people to look outside their agency—to seek ways to collaborate with other individuals and organizations whose activities may affect (perhaps more directly) the outcomes and values the agency is really charged with producing (Bardach 1998; Sparrow 2000).

Purpose 5: To Promote

To convince citizens their agency is effective and efficient, public managers need easily understood measures of those aspects of performance about which many citizens personally care. And such performance may be only tangentially related to the agency’s public purpose.

The National Academy of Public Administration, in its study of early performance-measurement plans under the Government Performance and Results Act, noted that “most plans recognized the need to communicate performance evaluation results to higher level officials, but did not show clear recognition that the form and level of data for these needs would be different than that for operating managers.” NAPA emphasized that the needs of “department heads, the Executive Office of the President, and Congress” are “different and the special needs of each should be more explicitly defined” (1994, 23). Similarly, Kaplan and Norton stress that different customers have different concerns (1992, 73–74).

Consider a state division of motor vehicles. Its mission is safety—the safety of vehicle drivers, vehicle passengers, bicycle riders, and pedestrians. In pursuit of this mission, this agency inspects vehicles to ensure their safety equipment is working, and it inspects people to ensure they are safe drivers. When most citizens think about their division of motor vehicles, however, what is their greatest care? Answer: How long they will have to stand in line. If a state DMV wants to promote itself to the public, it has to emphasize just one obvious aspect of performance: the time people spend in line. To promote itself to the public, a DMV has to use this performance measure to convince citizens that the time they will spend in line is going down.23

Ammons (1995) offers a “revolutionary” approach: “make performance measurement interesting.” Municipalities, he argues, ought to adopt measures “that can capture the interest of local media and the public” (43)—particularly measures that “allow meaningful comparisons that to some degree put community pride at stake” (38). Such comparisons could be to a professional or industry standard. After all, as Ammons notes, “comparison with a standard captures attention, where raw information does not” (39).24 But he is even more attracted to interjurisdictional comparisons. For example, Ammons argues that the public pays attention to various rankings of communities (because, first, journalists pay attention to them). Thus, he wants measures that “are both revealing of operational efficiency and effectiveness and more conducive to cross-jurisdictional comparisons” (38)—measures that provide “opportunities for interesting and meaningful performance comparisons” (44).
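Whether the average wait or the chance of an unacceptably long wait is the more meaningful time-in-line measure can be seen in a small sketch; the two offices and their wait times are invented:

```python
# Invented wait times (minutes) at two hypothetical DMV offices.
office_a = [25, 28, 30, 32, 35, 27, 33, 30, 29, 31]
office_b = [10, 12, 8, 95, 11, 9, 110, 12, 10, 13]

def summarize(waits, unacceptable=60):
    # Return the average wait and the share of waits past the unacceptable mark.
    avg = sum(waits) / len(waits)
    share_over = sum(w > unacceptable for w in waits) / len(waits)
    return avg, share_over

for name, waits in [("A", office_a), ("B", office_b)]:
    avg, share = summarize(waits)
    print(f"office {name}: average {avg:.0f} min, {share:.0%} of waits over an hour")
```

Office B edges out office A on the average, yet one visit in five runs past an hour; which office citizens judge better depends on which statistic they really care about.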



Time spent in line is a measure that is both interesting and meaningful. But what should be the standard for comparison? Is the average time spent in line the most meaningful measure? Or do people care more about the probability they will spend more than some unacceptable time (say one hour) in line?25 Whatever time-in-line measure it chooses, a DMV may want to compare it with the same measure from neighboring (or similar) states. But will citizens in North Carolina really be impressed that they spend less time in a DMV line than do the citizens of South Carolina or North Dakota? Or will their most meaningful comparison be with the time spent in line at their local supermarket, bank, or fast-food franchise? A DMV manager who wants to promote the agency’s competence to the public should compare its time-in-line performance with similar organizational performance that people experience every day. This is an easily understood performance characteristic about which citizens really care.

To do this, however, the agency must not only publish the performance data; it must also make them accessible both physically and psychologically. People must be able to obtain—perhaps not avoid—the measures; they must also find them easy to comprehend.

Purpose 6: To Celebrate

Before an agency can do any celebrating, its managers need to create a performance target that, when achieved, gives its employees and collaborators a real sense of personal and collective accomplishment. This target can be one that has also been used to motivate; it can be an annual target, or one of the monthly or quarterly targets into which an annual target has been divided. Once an agency has produced a tangible and genuine accomplishment that is worth commemorating, its managers need to create a festivity that is proportional to the significance of the achievement.

The verb “to celebrate” suggests a major undertaking—a big, end-of-the-fiscal-year bash, an awards ceremony when a most-wanted criminal is captured, or a victory party when a badly delayed project is completed on deadline. But private-sector managers celebrate lesser accomplishments; they use celebrations of significant successes to convince their employees that their firm is full of winners (Peters and Austin 1985). Public managers have used this strategy, too (Behn 1991, 103–11). But to make the strategy work—to ensure that it creates motivation and thus improvement—the agency’s managers have to lead the celebrations.

Purpose 7: To Learn

To learn, public managers need a large number and wide variety of measures—measures that provide detailed, disaggregated information on the various aspects of the various operations of the various components of the agency. When seeking to learn, caution Kravchuk and Schack, public managers need to “avoid excessive aggregation of information” (1996, 357).

Benchmarking is a traditional form of performance measurement that is designed to facilitate learning (Holloway, Francis, and Hinton 1999). It seeks to answer three questions: What is my organization doing well? What is my organization not doing well? What does my organization need to do differently to improve on what it is not doing well? The organization, public or private, identifies a critical internal process, measures it, and compares these data with similar measurements of the identical (or similar) processes of organizations that are recognized as (currently) the best.26 Any differences suggest not only that the organization needs to improve, but also provide a basis for identifying how it could achieve these improvements.

Benchmarking, write Kouzmin et al., is “an instrument for assessing organizational performance and for facilitating management transfer and learning from other benchmarked organizations” (1999, 121). Benchmarking, as they define it, “is a continuous, systematic process of measuring products, services and practices against organizations regarded to be superior with the aim of rectifying any performance ‘gaps’” (123). Thus, they conclude, “benchmarking can, on the whole, be seen as a learning strategy” (131). Nevertheless, they caution, for this strategy to work, the organization must become a learning organization. Consequently, they conclude, “the learning effects of benchmarking are, to a very high degree, dependent on adequate organizational conditions and managerial solutions” (132).

Deciding which performance measures best facilitate learning is not easy. If public managers know what they need to do to improve performance, they don’t need to learn it. But, if they don’t know how they might improve, how do they go about learning it? Kravchuk and Schack note that a “measurement system is a reflection of what decision makers expect to see and how they expect to respond” (1996, 356). That is, when designing a performance-measurement system, when deciding what to measure, managers first will decide what they might see and then create a system to see it.

Real learning, however, is often triggered by the unexpected. As Richard Feynman, the Nobel Prize-winning physicist, explained, when experiments produce unexpected results, scientists start guessing at possible explanations (1965, 157). When Mendel crossed a dwarf plant with a large one, he found that he didn’t get a medium-sized plant, but either a small or large one, which led him to discover the laws of heredity (Messadié 1991, 90). When the planet Uranus was discovered to be unexpectedly deviating from its predicted orbit, John Couch Adams and Urbain Le Verrier independently calculated the orbit of an unknown planet that could be causing this unanticipated behavior; then, Johann Gottfried Galle pointed his telescope in the suggested direction and discovered Neptune (Standage 2000).



When Karl Jansky observed that the static on his radio peaked every 24 hours and that the peak occurred when the Milky Way was centered on his antenna, he discovered radio waves from space (Messadié 1991, 179). Scientific learning often emerges from an effort to explain the unexpected. So does management learning.

Yet how can public officials design a measurement system for the unexpected when they can’t figure out what they don’t expect? As Kravchuk and Schack write, “unexpected occurrences may not be apprehended by existing measures” (1996, 356). Nevertheless, the more disaggregated the data, the more likely they are to reveal deviancies that may suggest the need to learn. This is the value of management by walking around—or what might be called “data collection by walking around.” The stories that people tell managers are the ultimate in disaggregation; one such story can provide a single deviate datum that the summary statistics have completely masked but that, precisely because it was unexpected, prods further investigation that can produce some real learning (and thus, perhaps, some real improvement).

In fact, the learning process may first be triggered by some deviance from the expected that appeared not in the formal performance data, but in the informal monitoring in which all public managers necessarily (if only implicitly) engage. Then, having noticed this deviancy—some aberration that doesn’t fit previous patterns, some number so out of line that it jumps off the page, some subtle sign that suggests that something isn’t quite right—the manager can create a measuring strategy to learn what caused the deviance and how it can be fixed or exploited.27

Failure is a most obvious deviance from the expected and, therefore, provides a significant opportunity to learn.28 Indeed, a retrospective investigation into the causes of the failure will uncover a variety of measures that deviated from the expected—that is, either from the agency’s prescribed behavior or from the predicted consequences of such prescriptions. Thus, failure provides multiple opportunities to learn (Petroski 1985; Sitkin 1992).

Yet, failure (particularly in the public sector) is usually punished—and severely. Thus, when a failure is revealed (or even presumed), people tend to hide the deviate data, for such data can be used to assign blame. Unfortunately, these are the same deviate data that are needed to learn.

As glaring departures from the expected, failures provide managers with obvious opportunities to learn. Most deviances, however, are more subtle. Thus, to learn from such deviances, managers must be intellectually prepared to recognize them and to examine their causes. They have to possess enough knowledge about the operation and be-

they think they observe an interesting deviance, they need a learning strategy for probing the causes and possible implications.

Thus, Kravchuk and Schack (1996) caution, “organizational learning cannot depend upon measurement alone” (356)—that is, “performance measurement systems cannot replace the efforts of administrators to truly know, understand, and manage their programs” (350). Rather, they argue, the measures should indicate when the organization needs to undertake a serious effort at learning based on the “expert knowledge” (357) of its program managers and “other sources of performance information which can supplement the formal measures” (356). Thus, they suggest, “measures should be placed in a management-by-exception frame, where they are regarded as indicators that will serve to signal the need to investigate further” (357). Similarly, Neves, Wolf, and Benton write that “management indicators are intended to be provocative, to suggest to managers a few areas where it may be appropriate to investigate further why a particular indicator shows up the way it does” (1986, 129). The better the manager understands his or her agency and the political, social, and cultural environment in which it works, the better the manager is able to identify—from among the various deviances that are generated by formal and informal performance measures—the ones that are worthy of additional investigation.

Performance measures that diverge from the expected can create an opportunity to learn. But the measures themselves are more likely to suggest topics for investigation than to directly impart key operational lessons.

Purpose 8: To Improve

To ratchet up performance, public managers need to understand how they can influence the behavior of the people inside their agency (and its collaboratives) who produce their outputs and how they can influence the conduct of citizens who convert these outputs into outcomes. They need to know what is going on inside their organization—including the broader organization that consists of everything and everyone whose behavior can affect these outputs and outcomes. They need to know what is going on inside their entire, operational black box. They need inside-the-black-box data that explains how the inputs, environment, and operations they can change (influence or inspire) do (can, or might) cause (create, or contribute to) improvements in the outputs and outcomes. For example, a fire chief needs to understand how the budget input interacts (inside the fire department’s black box) with people, equipment, training, and values to affect how the depart-
havior of their organization—and about the operation and ment’s staff implements its fire-fighting strategy and its
behavior of their collaborators and society—to distinguish educational fire-prevention strategy—outputs that further
a significant deviance from a random aberration. And when interact with the behavior of citizens to produce the de-

Why Measure Performance? 597


sired outcomes of fewer fires and fewer people injured or killed by fires that do occur.

Unfortunately, what is really going on inside the black box of any public agency is both complex and difficult to perceive. Much of it is going on inside the brains (often the subconscious brains) of the employees who work within the organization, the collaborators who somehow contribute to its outputs, and the citizens who convert these outputs into outcomes. Moreover, any single action may ripple through the agency, collaborators, and society as people adjust their behavior in response to seemingly small or irrelevant changes made by someone in some far-off corner. And when several people simultaneously take several actions, the ripples may interact in complex and unpredictable ways. It is very difficult to understand the black box adjustments and interactions that happen when just a few of the inputs (or processes) are changed, let alone when many of them are changing simultaneously and perhaps in undetected ways.29

Once the managers have figured out what is going on inside their black box, they have to figure out how the few things they can do are connected to the internal components they want to affect (because these components are, in turn, connected to the desired outputs or outcomes). How can changes in the budget’s size or allocations affect people’s behavior?30 How can changes in one core process affect other processes? How can changes in one strategy support or undermine other strategies? How might they influence people’s behavior?

Specifically, how might various leadership activities ripple through the black box? How might frequent, informal recognition of clear, if modest, successes or public attention to some small wins activate others? How might an inspirational speech or a more dramatic statement of the agency’s mission affect the diligence, intelligence, and creativity of both organizational employees and collaborating citizens? To improve performance, public managers need measures that illuminate how their own activities affect the behavior of all of the various humans whose actions affect the outputs and outcomes they seek.

Meaningful Performance Measurement Requires a Gauge and a Context

Abstract measures are worthless. To use a performance measure—to extract information from it—a manager needs a specific, comparative gauge, plus an understanding of the relevant context. A truck has been driven 6.0 million. Six million what? Six million miles? That’s impressive. Six million feet? That’s only 1,136 miles. Six million inches? That’s not even 95 miles. Big deal—unless those 95 miles were driven in two hours along a dirt road on a very rainy night.

To use performance measures to achieve any of these eight purposes, the public manager needs some kind of standard with which the measure can be compared.
1. To use a measure to evaluate performance, public managers need some kind of desired result with which to compare the data, and thus judge performance.
2. To use a measure of performance to control behavior, public managers need first to establish the desired behavioral or input standard from which to gauge individual or collective deviance.
3. To use efficiency measures to budget, public managers need an idea of what is a good, acceptable, or poor level of efficiency.31
4. To use performance measures to motivate people, public managers need some sense of what are reasonable and significant targets.
5. To use performance measures to promote an agency’s competence, public managers need to understand what the public cares about.
6. To use performance measures to celebrate, public managers need to discern the kinds of achievements that employees and collaborators think are worth celebrating.
7. To use performance measures to learn, public managers need to be able to detect unexpected (and significant) developments and anticipate a wide variety of common organizational, human, and societal behaviors.
8. To use performance measures to improve, public managers need an understanding (or prediction) of how their actions affect the inside-the-black-box behavior of the people who contribute to their desired outputs and outcomes.

All of the eight purposes require (explicitly or implicitly) a baseline with which the measure can be compared. And, of course, the appropriate baseline depends on the context.

The standard against which to compare current performance can come from a variety of sources—each with its own advantages and liabilities. The agency may use its historical record as a baseline, looking to see how much it has improved. It may use comparative information from similar organizations, such as the data collected by the Comparative Performance Measurement Consortium organized by the International City/County Management Association (1999), or the effort to measure and compare the performance of local jurisdictions in North Carolina organized by the University of North Carolina (Rivenbark and Few 2000).32 Of course, comparative data also may come from dissimilar organizations; citizens may compare—implicitly or quite explicitly—the ease of navigating a government Web site with the ease of navigating those created by private businesses.33 Or the standard may be an explicit performance target established by the legislature, by political executives, or by career managers. Even to

598 Public Administration Review • September/October 2003, Vol. 63, No. 5


control, managers need some kind of Tayloristic standard to be met by those whose behavior they seek to control. Whether public managers want to evaluate, control, budget, motivate, promote, celebrate, learn, or improve, they need both a measure and a standard of performance.

The Political Complexities of Measuring Performance

Who will pick the purpose, the measure, and the performance standard? The leadership team of a public agency has both the opportunity and the responsibility. But others—elected executives and legislators, political appointees and budget officers, journalists and stakeholders, and of course individual citizens—have the same opportunity, and often the same responsibility. Consequently, the agency’s managers may discover that a set of performance measures has been imposed on them.

In some ways, however, public managers have more flexibility in selecting the performance measures that will be used by outsiders than do private-sector managers. After all, investment analysts long ago settled on a specific collection of performance measures—from return on equity to growth in market share—that they use when examining a business. For public agencies, however, no such broadly applicable and widely acceptable performance measures exist. Thus, every time those outsiders—whether they are budget officers or stakeholders—wish to examine a particular agency’s management, they have to create some performance measures.

Sometimes, some will. Sometimes, a legislator or a budget officer will know exactly how he or she thinks the performance of a particular public agency should be measured. Sometimes, none will. Sometimes, no outsider will be able to devise a performance measure that makes much sense. Sometimes, many will. Sometimes several outsiders—an elected executive, a newspaper editor, and a stakeholder organization—will each develop a performance measure (or several such measures) for the agency. And when this happens, these measures may well conflict.

Mostly, these outsiders use their performance measures to evaluate, control, budget, or punish. Some might say, “We need this performance measure to hold the agency accountable.” By this, they really mean, “We need this performance measure to evaluate the agency and if (as we suspect) the agency doesn’t measure up, we will punish it by cutting its budget (or saying nasty things that will be reported by journalists).”34 Outsiders are less likely to use performance measures to motivate, promote, or celebrate—though they could try to use them to force improvements.

Thus, the managers of a public agency may not have complete freedom to choose their own performance measures. They may have to pay attention to measures chosen by others. Even when they must respond to measures imposed by outsiders, however, the leaders of a public agency have not lost their obligation to create a collection of performance measures that they will use to manage the agency. The leadership team still must report the measures that outsiders are, legitimately, requesting. And they may be able to use some of these measures for one or more of their own eight purposes. But even when others have chosen their own measures of the agency’s performance, its leaders still need to seriously examine the eight managerial purposes for which performance measures may prove useful and carefully select the best measures available for each purpose.

The Futile Search for the One Best Measure

“What gets measured gets done” is, perhaps, the most famous aphorism of performance measurement.35 If you measure it, people will do it. Unfortunately, what people measure often is not precisely what they want done. And people—responding to the explicit or implicit incentives of the measurement—will do what people are measuring, not what these people actually want done. This is, as Steven Kerr, now chief learning officer at Goldman Sachs, wisely observes, “the folly of rewarding A while hoping for B” (1975). Thus, although performance measures shape behavior, they may shape behavior in both desirable and undesirable ways.36

For a business, the traditional performance measure has been the infamous bottom line—although any business has not just one bottom line, but many of them: a variety of financial ratios (return on equity, return on sales) that collectively suggest how well the firm is doing—or, at least, how well it has done. But as Kaplan and Norton observe, “many have criticized financial measures because of their well-documented inadequacies, their backward-looking focus, and their inability to reflect contemporary value-creating actions.” Thus, Kaplan and Norton invented their now-famous balanced scorecard to give businesses a broader set of measures that capture more than the firm’s most recent financial numbers. They want performance measures that answer four questions from four different perspectives:
• How do customers see us? (customer perspective)
• What must we excel at? (internal business perspective)
• Can we continue to improve and create value? (innovation and learning perspective)
• How do we look to shareholders? (financial perspective)
No single measure of performance answers all four questions (1992, 77, 72).

Similarly, there is no one magic performance measure that public managers can use for all of their eight purposes.

The search for the one best measurement is just as futile as the search for the one best way (Behn 1996). Indeed, this is precisely the argument behind Kaplan and Norton’s balanced scorecard: Private-sector managers, they argue, “should not have to choose between financial and operational measures”; instead, business executives need “a balanced presentation of both financial and operational measures” (1992, 71). The same applies to public managers, who are faced with a more diverse set of stakeholders (not just customers and shareholders), a more contradictory set of demands for activities in which they ought to excel, and a more complex set of obstacles that must be overcome to improve and create value.37 Consequently, they need an even more heterogeneous family of measures than the four that Kaplan and Norton propose for business.

The leaders of a public agency should not go looking for their one magic performance measure. Instead, they should begin by deciding on the managerial purposes to which performance measurement may contribute. Only then can they select a collection of performance measures with the characteristics necessary to help them (directly and indirectly) achieve these purposes.
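The closing prescription, first deciding the purpose and only then selecting measures that fit it, can be restated as a small lookup table. The sketch below is purely illustrative: each entry paraphrases the numbered list in the “Meaningful Performance Measurement Requires a Gauge and a Context” section above, and the names (`PURPOSES`, `standard_for`) are mine, not Behn’s.

```python
# Illustrative sketch (not Behn's own formulation): each of the eight
# managerial purposes paired with the kind of comparison standard the
# article says that purpose requires. The wording paraphrases the
# numbered list in the "Gauge and a Context" section.
PURPOSES = {
    "evaluate": "a desired result against which to judge performance",
    "control": "a behavioral or input standard for gauging deviance",
    "budget": "a notion of good, acceptable, or poor efficiency",
    "motivate": "reasonable and significant performance targets",
    "promote": "an understanding of what the public cares about",
    "celebrate": "the achievements people think worth celebrating",
    "learn": "the ability to detect unexpected developments",
    "improve": "a model of inside-the-black-box cause and effect",
}

def standard_for(purpose: str) -> str:
    """Return the comparison standard a given purpose requires."""
    return PURPOSES[purpose]

# The entries are all distinct, mirroring the section's conclusion:
# no single measure (or standard) can serve every purpose.
assert len(set(PURPOSES.values())) == len(PURPOSES) == 8
```

The final assertion restates the argument of this section in miniature: because the eight required standards differ, a manager who picks one “magic” measure has implicitly privileged one purpose over the other seven.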

Notes

1. Okay, not everyone is measuring performance. From a survey of municipalities in the United States, Poister and Streib find that “some 38 percent of the [695] respondents indicate that their cities use performance measures, a significantly lower percentage than reported by some of the earlier surveys” (1999, 328). Similarly, Ammons reports on municipal governments’ “meager record” of using performance measures (1995, 42). And, of course, people who report they are measuring performance may not really be using these measures for any real purpose. Joyce notes there is “little evidence that performance information is actually used in the process of making budget decisions” (1997, 59).
2. People can measure the performance of (1) a public agency; (2) a public program; (3) a nonprofit or for-profit contractor that is providing a public service; or (4) a collaborative of public, nonprofit, and for-profit organizations. For brevity, I usually mention only the agency—though I clearly intend my reference to a public agency’s performance to include the performance of its programs, contractors, and collaboratives.
3. Although Hatry provides the usual list of different types of performance information—input, process, output, outcome, efficiency, workload, and impact data (1999b, 12)—when discussing his 10 different purposes (chapter 11), he refers almost exclusively to outcome measures.
4. These eight purposes are not completely distinct. For example, learning itself is valuable only when put to some use. Obviously, two ways to use the learning extracted from performance measurements are to improve and to budget. Similarly, evaluation is not an ultimate purpose; to be valuable, any evaluation has to be used either to redesign programs (to improve) or to reallocate resources (to budget) by moving funds into more valuable uses. Even the budgetary purpose is subordinate to improvement.
   Indeed, the other seven purposes are all subordinate to improvement. Whenever public managers use performance measures to evaluate, control, budget, motivate, promote, celebrate, or learn, they do so only because these activities—they believe or hope—will help to improve the performance of government.
   There is, however, no guarantee that every use of performance measures to budget or celebrate will automatically enhance performance. There is no guarantee that every controlling or motivational strategy will improve performance. Public managers who seek evaluation or learning measures as a step toward improving performance need to think carefully not only about why they are measuring, but also about what they will do with these measurements and how they will employ them to improve performance.
5. Jolie Bain Pillsbury deserves the credit for explicitly pointing out to me that distinctly different purposes for measuring performance exist. On April 16, 1996, at a seminar at Duke University, she defined five purposes: evaluate, motivate, learn, promote, and celebrate (Behn 1997b).
   Others, however, have also observed this. For example, Osborne and Gaebler (1992), in their chapter on “Results Oriented Government,” employ five headings that capture five of my eight purposes: “If you don’t measure results, you can’t tell success from failure” (147) (evaluate); “If you can’t see success, you can’t reward it” (148) (motivate); “If you can’t see success, you can’t learn from it” (150) (learn); “If you can’t recognize failure, you can’t correct it” (152) (improve); “If you can demonstrate results, you can win public support” (154) (promote).
6. Anyone who wishes to add a purpose to this list should also define the characteristics of potential measures that will be most appropriate for this additional purpose.
7. But isn’t promoting accountability a quite distinct and also very important purpose for measuring performance? After all, scholars and practitioners emphasize the connection between performance measurement and accountability. Indeed, it is Hatry’s first use of performance information (1999b, 158). In a report commissioned by the Governmental Accounting Standards Board on what it calls “service efforts and accomplishments [SEA] reporting,” Hatry, Fountain, and Sullivan (1990) note that SEA measurement reflects the board’s desire “to assist in fulfilling government’s duty to be publicly accountable and … enable users to assess that accountability” (2). Moreover, they argue, without such performance measures, elected officials, citizens, and other users “are not

able to fully assess the adequacy of the governmental entity’s performance or hold it accountable for the management of taxpayer and other resources” (2–3). Indeed, they continue, elected officials and public managers have a responsibility “to be accountable by giving information that will assist the public in assessing the results of operations” (5).
   But what exactly does it mean to hold government accountable? In a 1989 resolution, the Governmental Accounting Standards Board called SEA information “an essential element of accountability.” Indeed, in this resolution, the agency “gave considerable weight to the concept of accountability: of ‘being obliged to explain one’s actions, to justify what one does’; of being required ‘to answer to the citizenry—justify the raising of public resources and the purposes for which they are used’” (Hatry et al. 1990, v). But does the phrase “hold government accountable” cover only the requirements to explain, justify, and answer? Or does accountability really mean punishment?
   I find the use of the word “accountability” to be both ubiquitous and ambiguous. Yet it is difficult to examine how performance measurement will or might promote accountability without first deciding what citizens collectively mean by accountability—particularly, what we mean by accountability for performance. What does it mean to hold a public agency or manager accountable for performance? Presumably, this holding-people-accountable-for-performance process would employ some measure of performance. But what measures would be most useful for this promoting-accountability purpose? And how would those measures actually be used to promote accountability? (Or to revise the logical sequence: How might we use performance measures to promote accountability? Then, what measures would be most useful for promoting this accountability?) Before we as a polity can think analytically and creatively about how we might use performance measures to promote accountability, we need to think more analytically and creatively about what we mean by accountability. For a more detailed discussion of accountability—particularly of accountability for performance—see Behn (2001).
8. Joyce (1997) makes a similar argument: “The ability to measure performance is inexorably related to a clear understanding of what an agency or program is trying to accomplish” (50). Unfortunately, he continues, “the U.S. constitutional and political traditions, particularly at the national and state levels, work against this kind of clarity, because objectives are open to constant interpretation and reinterpretation at every stage of the policy process” (60).
9. Although a controlling approach to managing superior–subordinate relations may be out of style, the same is not necessarily true for how policy makers manage operating agencies. The New Public Management consists of two conflicting approaches: Letting the managers manage, versus making the managers manage (Kettl 1997, 447–48). And while the let-the-managers-manage strategy does, indeed, empower the managers of public agencies (for example, by giving them more flexibility), the make-the-managers-manage strategy does, in some ways, seek to control the managers. Yes, under a make-the-manager-manage performance contract, the manager has the flexibility to achieve his or her performance targets; at the same time, these output targets can be thought of as output “controls.” I am grateful to an anonymous referee for this insight.
10. For a discussion of the pervasive efforts of public officials to control the behavior of their subordinates, see the classic discussion by Landau and Stout (1979).
11. For example, Robert Anthony’s business texts on accounting include The Management Control Function (1988) and (with Vijay Govindarajan) Management Control Systems (1998). Similarly, his equivalent book (with David W. Young) for the public and nonprofit sectors is titled Management Control in Nonprofit Organizations (1999).
12. “Do management information systems lead to greater management control?” Overman and Loraine (1994, 193) ask this question and conclude they do not. From an analysis of 99 Air Force contracts, they could not find any relationship between the quality, detail, and timeliness of information received from the vendors and the cost, schedule, or quality of the project. Instead, they argue, “information can symbolize other values in the organization” (194). Still, legislatures, overhead agencies, and line managers seek control through performance measurement.
13. Melkers and Willoughby (2001) report that 47 of the 50 states have some form of performance budgeting, which they define “as requiring strategic planning regarding agency mission, goals and objectives, and a process that requests quantifiable data that provides meaningful information about program outcomes” (54). Yet when they asked people in both the executive and legislative branches of state government if their state had implemented performance budgeting, they found that “surprisingly, budgeters from a handful of states (10) disagreed across the branches as to implementation of performance budget in their state” (59). A handful? Melkers and Willoughby received responses from both branches from only 32 states. Thus, in nearly a third of the states that responded, the legislative-branch respondent disagreed with the executive-branch respondent. Not only is it difficult to define what performance budgeting is, it is also difficult to determine whether it has been implemented.
14. A more controversial use of performance measurement to motivate is the linking of performance data to an individual’s pay. For a discussion, see Smith and Wallace (1995).
15. Williams, McShane, and Sechrest (1994) worry that “raw data may be misinterpreted by those without statistical training” (538), while at the same time “summaries of management information based on aggregate data are potentially dangerous to decision makers” (539). Moreover, they note the differing assumptions that managers and evaluators bring to the task of interpreting data: “The administrator often makes the implicit assumption that a project or operation is fully capable of succeeding” (541), while “the evaluator is apt to see the very core of his role as a questioning of the assumptions behind the project” (541). The evaluator starts

with the assumption that the program doesn’t work; the manager, of course, believes that it does.
16. Understanding what is going on inside the black box is difficult in all of the sciences. Physicists, for example, do not know what is going on inside the black box of gravity. They know what happens—they know that two physical objects attract each other, and they can calculate the strength of that attraction—but they don’t understand how the inputs of mass and distance are converted into the output of physical attraction. Newton figured out that, to determine very accurately (in a very wide variety of circumstances) the force of attraction between two objects, you need only three measurable inputs: the mass of the first object, the mass of the second object, and the distance between them:

   F = G × m₁ × m₂ / d²

   (where G is the universal gravitational constant)

   Unfortunately for physicists, this universal law doesn’t work at the subatomic level: Here, the classical laws of gravitational and electrical attraction between physical objects do not hold. Thus, when confronted with their inability to even calculate (using an existing formula) what is happening inside one of their black boxes, physicists invent new concepts (Behn 1992)—in this case, for example, the strong force and the weak force—that they can use to produce calculations that match the behavior they observe. But this does not mean that physicists understand what creates these inside-the-black-box forces.
17. My black box of performance management differs from the one defined by Ingraham and Kneedler (2000) and Ingraham and Donahue (2000). In their “new performance model,” the black box is government management, and the inputs are politics, policy direction, and resources, all of which are imbedded in a set of environmental factors or contingencies. In my conception, the black box is the agency (or, more accurately, the people who work in the agency), the collaborative (that is, the people who staff both the agency and its various partners), or society (the collection of citizens). Management and leadership are inputs that seek to improve the performance of the black box by convincing the environment to provide better inputs and by attempting to influence the diligence, intelligence, and creativity with which those inside the black box convert the other inputs into outputs and outcomes.
18. If all of the data are indeed random, analysts who use the traditional 5 percent threshold for statistical significance and who check out 20 potentially causal variables will identify (on average) one of these variables as statistically significant. If they test 100 variables, they will (on average) find five that are statistically significant.
19. Perhaps this explains why formal performance evaluation has attracted a larger following and literature than has performance learning, let alone performance improvement.
20. A note of caution: Using outcomes to evaluate an organization’s performance makes sense, except that the organization’s performance is not the only factor affecting the outcomes. Yet, cognitive psychologists have identified the “outcome effect” in the evaluation of managerial performance. This outcome effect causes the evaluators of a manager’s decision to give more weight to the actual outcome than is warranted given the circumstances—particularly, the uncertainty—under which the original decision was made. That is, when an evaluator considers a decision made by a manager, who could only make an uncertain, probabilistic guess about the future state of one or more important variables, the evaluator will give higher marks when the outcome seems to validate the initial choice than when the outcome doesn’t—even when the circumstances of the decision are precisely the same (Baron and Hershey 1988; Lipshitz 1989; Hawkins and Hastie 1990; Hershey and Baron 1992; Ghosh and Ray 2000).
21. In business, Kaplan and Norton (1992) emphasize, the challenge is to figure out how to make an “explicit linkage between operations and finance” (79). They emphasize, “the hard truth is that if improved [internal, operational] performance fails to be reflected in the bottom line, executives should reexamine the basic assumptions of their strategy and mission” (77).
   The same applies in government. Public managers need an explicit link between operations and outcomes. If they use output (or process) measures to motivate people in their agency to ratchet up performance, and yet the outcomes that these outputs (or processes) are supposed to produce don’t improve, they need to reexamine their strategy—and their assumptions about how these outputs (or processes) may or may not contribute to the desired outcomes.
22. In some ways, measures that are designed to motivate internal improvements in a public agency’s performance appear to correspond to the measures that Kaplan and Norton design for their internal business perspective. Such internal measures help managers to focus on critical internal operations, write Kaplan and Norton. “Companies should decide what processes and competencies they must excel at and specify measures for each.” Then, they continue, “managers must devise measures that are influenced by employees’ actions” (1992, 74–75). That is, to motivate their employees to improve internal operations, a firm’s leaders need output measures.
23. In January 2002, when he announced his campaign for state treasurer, Daniel A. Grabauskas emphasized that, as the Massachusetts registrar of motor vehicles, he had cut waiting time by over an hour (Klein 2002).
24. For the general public, NAPA’s advisory panel suggests, performance measures need to be suitably summarized (NAPA 1994, 23).
25. When attempting to select a measure to promote a public agency’s achievements, it is not obvious which performance measure will capture citizens’ concerns. What, for example, do the citizens visiting their state division of motor vehicles really care about? They might care about how long they wait in line. They might care less about how long they wait in line if they know, when they first get in line, how long they will have to wait. Some might say they will be quite

happy to wait 29 minutes, but not 30. Or, they might not care how long they wait as long as they can do it in a comfortable chair. Thus, before selecting a measure to promote the agency's performance, the agency's leadership should make some effort—through polls, focus groups, or customer surveys—to determine what the public wants.
Polls or focus groups, however, may produce only a theoretical answer. Have people who visit the DMV only biennially really thought through what they want? A customer survey—administered as people leave the DMV (or while they wait)—might produce a better sense of how people really measure the agency's performance.
26. Actually, managers don't need to identify the best practice. To improve their organization's performance, they need only to identify a better practice.
27. To promote the division of motor vehicles with the public, managers may simply publish the average length of time that citizens wait in line at the DMV office (compared, perhaps, with the average wait in similar lines). Waiting time is an easily understood concept. Yet, when an organization reports its waiting time, it rarely explains that this number is the average waiting time because this is what people implicitly assume.
To learn, however, the average wait is too much of a summary statistic. The average wait obscures all of the interesting deviances that can be found only in the disaggregated data: What branch has a wait time that is half the statewide average (and what can the division learn from this)? What day of the month or hour of the day has the longest wait time (and what might the division do about this)? From such deviances, the DMV's managers can learn what is working best within their agency and what needs to be fixed.
28. After all, if any individual had expected a major deviance, he or she presumably would have done something to prevent it. Of course, an individual might have anticipated this deviance but also anticipated that it would be labeled a minor mistake rather than a huge failure (and thus they, too, have an opportunity to learn because, although they had expected the event, they did not expect it would be called a failure). Or, an individual may have anticipated the failure but may not have been in a position to prevent it or to convince those who could prevent it that it would really happen. Or an individual may have anticipated the failure and hoped it would occur because its consequences (which were unanticipated by others) would further the individual's own agenda. Or an individual may have anticipated the failure but gambled that the probability of the failure, combined with its personal costs, was less than the certain personal costs of exposing his or her own responsibility for the causes behind this (potential, future) failure. Some people may have anticipated the failure, but certainly not everyone did.
29. The evaluator's ideal, of course, is that "only one new strategy should be introduced in one [organization]," while the baseline strategy "would go on in a demographically similar [organization]." Public executives, however, rarely are able to conduct the evaluator's "carefully controlled field experiment" (Karmen 2000, 95). Moreover, if the manager believes that two strategies will have a synergistic effect, he or she will—quite naturally—choose to implement them simultaneously in the same jurisdiction.
30. This suggests the limitations of performance budgeting as a strategy for improving performance: How much do budget officials know about how budget allocations affect the inside-the-black-box behaviors that improve performance? Do they really know which changes in the budget inputs will create the kind of complex inside-the-black-box interactions that can create improvements in organizational outputs, and thus societal outcomes?
31. Managers could simply allocate the available funds to the existing activities that are (using strictly internal comparisons) most efficient. Without some external standard of efficiency, however, they could spend all of their appropriations on completely inefficient operations.
32. Measuring performance against similar public agencies in a way that facilitates useful comparisons among jurisdictions is not easy. Agencies and jurisdictions organize themselves differently. They collect different kinds of data. They define inputs, processes, outputs, and outcomes differently. Consequently, obtaining comparable data is difficult—sometimes impossible. To make valid comparisons, someone must undertake the time-consuming task that Ammons, Coe, and Lombardo call "data cleaning" (2001, 102).
Still, even when perfectly similar data have been collected from multiple jurisdictions, making useful comparisons is also difficult. Is one city reporting higher performance data for a particular agency because its leadership is more inspiring or inventive, because the agency has inherited a more effective organizational structure, because its political and managerial leadership has adopted a strategy designed to focus on some outcomes and not others, because the city council established the agency's mission as a higher priority, because the city council was willing to invest in modern technology, or because more of its citizens are cooperating fully? Even comparing performance measures for such a fundamental public service as trash collection is not simple. For a more detailed discussion of why benchmarking performance results among local governments may be more difficult than theorized, see Coe (1999, 111).
33. Criticism of public-sector service delivery has increased in the last two decades because a number of the traditional process measures for public-sector services—such as the time spent in a line or the ease of navigating a Web site—can easily be compared with the same process measures for the private sector, and because many private-sector firms have made a concerted effort to improve the process measures that customers value. Once people become accustomed to a short wait at their bank or when calling a toll-free number—once they learn that it is technically possible to ensure the wait is brief—they expect the same short wait from all other organizations, both private and public.
34. For a discussion of how accountability has become a code word for punishment and how we might make it mean something very different, see Behn (2001).
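Note 27's contrast between the summary average and the disaggregated data can be made concrete with a toy calculation. The branch names and wait times below are invented solely for illustration; they are not data from the article.

```python
# Hypothetical per-branch average waits, in minutes (illustrative values only).
branch_waits = {"Downtown": 44, "Northside": 12, "Airport": 31, "Suburban": 25}

# The statewide average is the summary statistic an agency might publish.
statewide_average = sum(branch_waits.values()) / len(branch_waits)

# The disaggregated data expose the deviances the average hides:
# the branch to learn from, and the branch that needs fixing.
best_branch = min(branch_waits, key=branch_waits.get)
worst_branch = max(branch_waits, key=branch_waits.get)

print(f"Statewide average: {statewide_average:.0f} minutes")   # 28 minutes
print(f"Learn from {best_branch} ({branch_waits[best_branch]} min)")
print(f"Fix {worst_branch} ({branch_waits[worst_branch]} min)")
```

A published "28-minute average" says nothing about the branch waiting 44 minutes or the one waiting 12; only the disaggregated figures point to what to study and what to fix.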



35. Peters and Waterman (1982, 268) attribute it to Mason Haire.
36. For example, in business, Kaplan and Norton write, "return-on-investment and earnings-per-share can give misleading signals for continuous improvement and innovation" (1992, 71).
37. Kaplan and Norton also argue their balanced scorecard "guards against suboptimization." Because the leadership is measuring a variety of indicators of the organization's performance, people in the organization will avoid focusing on one measure (or one kind of measure) at the expense of some others; after all, an "improvement in one area may have been achieved at the expense of another." And, even if a part of the organization chooses to focus on one performance indicator and ignore the others, the leadership—because it is measuring a variety of things—is much more apt to catch this suboptimal behavior (1992, 72).

References

Ammons, David N. 1995. Overcoming the Inadequacies of Performance Measurement in Local Government: The Case of Libraries and Leisure Services. Public Administration Review 55(1): 37–47.
Ammons, David N., Charles Coe, and Michael Lombardo. 2001. Performance-Comparison Projects in Local Government: Participants' Perspectives. Public Administration Review 61(1): 100–10.
Anthony, Robert N. 1988. The Management Control Function. Boston, MA: Harvard Business School Press.
Anthony, Robert N., and Vijay Govindarajan. 1998. Management Control Systems. 9th ed. Burr Ridge, IL: McGraw-Hill/Irwin.
Anthony, Robert, and David W. Young. 1999. Management Control in Nonprofit Organizations. 6th ed. Burr Ridge, IL: McGraw-Hill/Irwin.
Bardach, Eugene. 1998. Getting Agencies to Work Together: The Practice and Theory of Managerial Craftsmanship. Washington, DC: Brookings Institution.
Baron, Jonathan, and John C. Hershey. 1988. Outcome Bias in Decision Evaluation. Journal of Personality and Social Psychology 54(4): 569–79.
Behn, Robert D. 1991. Leadership Counts: Lessons for Public Managers. Cambridge, MA: Harvard University Press.
———. 1992. Management and the Neutrino: The Search for Meaningful Metaphors. Public Administration Review 52(5): 409–19.
———. 1996. The Futile Search for the One Best Way. Governing, July, 82.
———. 1997a. The Money-Back Guarantee. Governing, September, 74.
———. 1997b. Linking Measurement to Motivation: A Challenge for Education. In Improving Educational Performance: Local and Systemic Reforms, Advances in Educational Administration 5, edited by Paul W. Thurston and James G. Ward, 15–58. Greenwich, CT: JAI Press.
———. 1999. Do Goals Help Create Innovative Organizations? In Public Management Reform and Innovation: Research, Theory, and Application, edited by H. George Frederickson and Jocelyn M. Johnston, 70–88. Tuscaloosa, AL: University of Alabama Press.
———. 2001. Rethinking Democratic Accountability. Washington, DC: Brookings Institution.
Blodgett, Terrell, and Gerald Newfarmer. 1996. Performance Measurement: (Arguably) The Hottest Topic in Government Today. Public Management, January, 6.
Boone, Harry. 1996. Proving Government Works. State Government News, May, 10–12.
Bruns, William J. 1993. Responsibility Centers and Performance Measurement. Note 9-193-101. Boston, MA: Harvard Business School.
Coe, Charles. 1999. Local Government Benchmarking: Lessons from Two Major Multigovernment Efforts. Public Administration Review 59(2): 110–23.
Duncan, W. Jack. 1989. Great Ideas in Management: Lessons from the Founders and Foundations of Managerial Practice. San Francisco, CA: Jossey-Bass.
Feynman, Richard. 1965. The Character of Physical Law. Cambridge, MA: MIT Press.
Ghosh, Dipankar, and Manash R. Ray. 2000. Evaluating Managerial Performance: Mitigating the "Outcome Effect." Journal of Managerial Issues 12(2): 247–60.
Hatry, Harry P. 1999a. Mini-Symposium on Intergovernmental Comparative Performance Data. Public Administration Review 59(2): 101–4.
———. 1999b. Performance Measurement: Getting Results. Washington, DC: Urban Institute.
Hatry, Harry P., James R. Fountain, Jr., and Jonathan M. Sullivan. 1990. Overview. In Service Efforts and Accomplishments Reporting: Its Time Has Come, edited by Harry P. Hatry, James R. Fountain, Jr., Jonathan M. Sullivan, and Lorraine Kremer, 1–49. Norwalk, CT: Governmental Accounting Standards Board.
Hatry, Harry P., James R. Fountain, Jr., Jonathan M. Sullivan, and Lorraine Kremer. 1990. Service Efforts and Accomplishments Reporting: Its Time Has Come. Norwalk, CT: Governmental Accounting Standards Board.
Hawkins, Scott A., and Reid Hastie. 1990. Hindsight: Biased Judgments of Past Events after the Outcomes are Known. Psychological Bulletin 107(3): 311–27.
Hershey, John C., and Jonathan Baron. 1992. Judgment by Outcomes: When Is It Justified? Organizational Behavior and Human Decision Processes 53(1): 89–93.
Holloway, Jacky A., Graham A.J. Francis, and C. Matthew Hinton. 1999. A Vehicle for Change? A Case Study of Performance Improvement in the "New" Public Sector. International Journal of Public Sector Management 12(4): 351–65.



Holt, Craig L. 1995–96. Performance Based Budgeting: Can It Really Be Done? The Public Manager 24(4): 19–21.
Ingraham, Patricia W., and Amy E. Kneedler. 2000. Dissecting the Black Box: Toward a Model and Measures of Government Management Performance. In Advancing Public Management: New Developments in Theory, Methods, and Practice, edited by Jeffrey L. Brudney, Laurence J. O'Toole, Jr., and Hal G. Rainey, 235–52. Washington, DC: Georgetown University Press.
Ingraham, Patricia W., and Amy Kneedler Donahue. 2000. Dissecting the Black Box Revisited: Characterizing Government Management Capacity. In Governance and Performance: New Perspectives, edited by Carolyn J. Heinrich and Laurence E. Lynn, Jr., 292–318. Washington, DC: Georgetown University Press.
International City/County Management Association (ICMA). 1999. Comparative Performance Measurement: FY 1998 Data Report. Washington, DC: ICMA.
Jordon, Meagan M., and Merl M. Hackbart. 1999. Performance Budgeting and Performance Funding in the States: A Status Assessment. Public Budgeting and Finance 19(1): 68–88.
Joyce, Philip G. 1996. Appraising Budget Appraisal: Can You Take Politics Out of Budgeting? Public Budgeting and Finance 16(4): 21–25.
———. 1997. Using Performance Measures for Budgeting: A New Beat, or Is It the Same Old Tune? In Using Performance Measurement to Improve Public and Nonprofit Programs, New Directions for Evaluation 75, edited by Kathryn E. Newcomer, 45–61. San Francisco, CA: Jossey-Bass.
Kaplan, Robert S., and David P. Norton. 1992. The Balanced Scorecard—Measures that Drive Performance. Harvard Business Review 70(1): 71–91.
Karmen, Andrew. 2000. New York Murder Mystery: The True Story behind the Crime Crash in the 1990s. New York: New York University Press.
Kerr, Steve. 1975. On the Folly of Rewarding A, While Hoping for B. Academy of Management Journal 18(4): 769–83.
Kettl, Donald F. 1997. The Global Revolution in Public Management: Driving Themes, Missing Links. Journal of Policy Analysis and Management 16(3): 446–62.
Klein, Rick. 2002. Registry Chief Quits to Run for State Treasurer. Boston Globe, January 10, B5.
Kopczynski, Mary, and Michael Lombardo. 1999. Comparative Performance Measurement: Insights and Lessons Learned from a Consortium Effort. Public Administration Review 59(2): 124–34.
Kouzmin, Alexander, Elke Löffler, Helmut Klages, and Nada Korac-Kakabadse. 1999. Benchmarking and Performance Measurement in Public Sectors. International Journal of Public Sector Management 12(2): 121–44.
Kravchuk, Robert S., and Ronald W. Schack. 1996. Designing Effective Performance Measurement Systems under the Government Performance and Results Act of 1993. Public Administration Review 56(4): 348–58.
Landau, Martin, and Russell Stout, Jr. 1979. To Manage Is Not to Control: Or the Folly of Type II Errors. Public Administration Review 39(2): 148–56.
Lehan, Edward Anthony. 1996. Budget Appraisal—The Next Step in the Quest for Better Budgeting? Public Budgeting and Finance 16(4): 3–20.
Levin, Martin A., and Mary Bryna Sanger. 1994. Making Government Work: How Entrepreneurial Executives Turn Bright Ideas into Real Results. San Francisco, CA: Jossey-Bass.
Lipshitz, Raanan. 1989. Either a Medal or a Corporal: The Effects of Success and Failure on the Evaluation of Decision Making and Decision Makers. Organizational Behavior and Human Decision Processes 44(3): 380–95.
Locke, Edwin A., and Gary P. Latham. 1984. Goal Setting: A Motivational Technique That Works. Englewood Cliffs, NJ: Prentice Hall.
———. 1990. A Theory of Goal Setting and Task Performance. Englewood Cliffs, NJ: Prentice Hall.
Melkers, Julia, and Katherine Willoughby. 1998. The State of the States: Performance-Based Budgeting Requirements in 47 out of 50. Public Administration Review 58(1): 66–73.
———. 2001. Budgeters' Views of State Performance-Budgeting Systems: Distinctions across Branches. Public Administration Review 61(1): 54–64.
Messadié, Gerald. 1991. Great Scientific Discoveries. New York: Chambers.
Murphey, David A. 1999. Presenting Community-Level Data in an "Outcomes and Indicators" Framework: Lessons from Vermont's Experience. Public Administration Review 59(1): 76–82.
National Academy of Public Administration (NAPA). 1994. Toward Useful Performance Measurement: Lessons Learned from Initial Pilot Performance Plans Prepared under the Government Performance and Results Act. Washington, DC: NAPA.
———. 1999. Using Performance Data to Improve Government Effectiveness. Washington, DC: NAPA.
Neves, Carole M.P., James F. Wolf, and Bill B. Benton. 1986. The Use of Management Indicators in Monitoring the Performance of Human Service Agencies. In Performance and Credibility: Developing Excellence in Public and Nonprofit Organizations, edited by Joseph S. Wholey, Mark A. Abramson, and Christopher Bellavita, 129–48. Lexington, MA: Lexington Books.
Newcomer, Kathryn E. 1997. Using Performance Measures to Improve Programs. In Using Performance Measurement to Improve Public and Nonprofit Programs, New Directions for Evaluation 75, edited by Kathryn E. Newcomer, 5–13. San Francisco, CA: Jossey-Bass.
Osborne, David, and Ted Gaebler. 1992. Reinventing Government. Reading, MA: Addison-Wesley.
Osborne, David, and Peter Plastrik. 2000. The Reinventor's Fieldbook: Tools for Transforming Your Government. San Francisco, CA: Jossey-Bass.
Overman, E. Sam, and Donna T. Loraine. 1994. Information for Control: Another Management Proverb? Public Administration Review 54(2): 193–96.
Peters, Thomas J., and Robert H. Waterman, Jr. 1982. In Search of Excellence. New York: Harper and Row.



Peters, Tom, and Nancy Austin. 1985. A Passion for Excellence: The Leadership Difference. New York: Random House.
Petroski, Henry. 1985. To Engineer Is Human: The Role of Failure in Successful Design. New York: St. Martin's Press.
Poister, Theodore H., and Gregory Streib. 1999. Performance Measurement in Municipal Government: Assessing the State of the Practice. Public Administration Review 59(4): 325–35.
Rivenbark, William C., and Paula K. Few. 2000. Final Report on City Services for Fiscal Year 1998–99: Performance and Cost Data. Chapel Hill, NC: University of North Carolina–Chapel Hill, North Carolina Local Government Performance Measurement Project.
Rosenthal, Burton. 1975. Lead Poisoning (A) and (B). Cambridge, MA: Kennedy School of Government.
Silverman, Eli B. 2001. NYPD Battles Crime: Innovative Strategies in Policing. Boston, MA: Northeastern University Press.
Sitkin, Sim B. 1992. Learning through Failure: The Strategy of Small Losses. Research in Organizational Behavior 14: 231–66.
Smith, Kimberly J., and Wanda A. Wallace. 1995. Incentive Effects of Service Efforts and Accomplishments Performance Measures: A Need for Experimentation. International Journal of Public Administration 18(2/3): 383–407.
Sparrow, Malcolm K. 2000. The Regulatory Craft: Controlling Risks, Solving Problems, and Managing Compliance. Washington, DC: Brookings Institution.
Standage, Tom. 2000. The Neptune File: A Story of Astronomical Rivalry and the Pioneers of Planet Hunting. New York: Walker Publishing.
Theurer, Jim. 1998. Seven Pitfalls to Avoid When Establishing Performance Measures. Public Management 8(7): 21–24.
Thompson, Fred. 1994. Mission-Driven, Results-Oriented Budgeting: Fiscal Administration and the New Public Management. Public Budgeting and Finance 15(3): 90–105.
Thompson, Fred, and Carol K. Johansen. 1999. Implementing Mission-Driven, Results-Oriented Budgeting. In Public Management Reform and Innovation: Research, Theory, and Application, edited by H. George Frederickson and Jocelyn M. Johnston, 189–205. Tuscaloosa, AL: University of Alabama Press.
Wholey, Joseph S. 1997. Clarifying Goals, Reporting Results. In Progress and Future Directions in Evaluation: Perspectives on Theory, Practice and Methods, New Directions for Evaluation 76, edited by Debra J. Rog and Deborah Fournier, 95–105. San Francisco, CA: Jossey-Bass.
Wholey, Joseph S., and Harry P. Hatry. 1992. The Case for Performance Monitoring. Public Administration Review 52(6): 604–10.
Wholey, Joseph S., and Kathryn E. Newcomer. 1997. Clarifying Goals, Reporting Results. In Using Performance Measurement to Improve Public and Nonprofit Programs, New Directions for Evaluation 75, edited by Kathryn E. Newcomer, 91–98. San Francisco, CA: Jossey-Bass.
Williams, Frank P. III, Marilyn D. McShane, and Dale Sechrest. 1994. Barriers to Effective Performance Review: The Seduction of Raw Data. Public Administration Review 54(6): 537–42.

