PSYCHOMETRIKA—VOL. 74, NO. 3, 457–475
SEPTEMBER 2009
DOI: 10.1007/S11336-009-9115-2
MICHAEL J. BRUSCO
FLORIDA STATE UNIVERSITY

HANS-FRIEDRICH KÖHN
Several authors have touted the p-median model as a plausible alternative to within-cluster sums of
squares (i.e., K-means) partitioning. Purported advantages of the p-median model include the provision
of “exemplars” as cluster centers, robustness with respect to outliers, and the accommodation of a diverse
range of similarity data. We developed a new simulated annealing heuristic for the p-median problem
and completed a thorough investigation of its computational performance. The salient findings from our
experiments are that our new method substantially outperforms a previous implementation of simulated
annealing and is competitive with the most effective metaheuristics for the p-median problem.
Key words: cluster analysis, partitioning, heuristics, p-median model, simulated annealing.
1. Introduction
The K-means method (Forgy, 1965; Hartigan & Wong, 1979; Howard, 1966; MacQueen,
1967; Steinhaus, 1956; Thorndike, 1953) is arguably the most popular method for conducting a
nonhierarchical cluster analysis of two-mode, two-way proximity data (see Steinley, 2006 for a
review). Such data are typically represented by an N × V matrix, A, that contains measurements
for N objects on each of V variables. In most applications, the K-means algorithm attempts
to partition the N objects into K clusters based on the V variables with the goal of minimiz-
ing the sum (across all objects) of the squared Euclidean distances between each object and its
cluster centroid. The centroid of a cluster in this context consists of a set of V variable means.
Because the variable means for a cluster typically do not explicitly correspond to the variable
measurements for any particular object in the cluster, the centroids are often termed implicit or
virtual.
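The within-cluster sum-of-squares criterion described above can be made concrete with a short sketch (Python is used here purely for illustration; the function name `kmeans_sse` is ours, not code from any of the cited programs):

```python
def kmeans_sse(A, labels, K):
    """Within-cluster sum of squared Euclidean distances to the (virtual)
    centroids, i.e., the K-means objective for a given partition."""
    V = len(A[0])
    # The centroid of each cluster is the vector of its V variable means.
    centroids = []
    for k in range(K):
        members = [row for row, lab in zip(A, labels) if lab == k]
        centroids.append([sum(m[v] for m in members) / len(members)
                          for v in range(V)])
    # Sum squared distances of each object to its own cluster centroid.
    return sum(
        sum((A[i][v] - centroids[labels[i]][v]) ** 2 for v in range(V))
        for i in range(len(A))
    )
```

Note that the centroids computed here generally match no row of A, which is exactly the sense in which they are "implicit" or "virtual."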
Although less common than K-means methods, there are partitioning methods that seek to
establish clusters such that the center of each cluster explicitly corresponds to an object. Part of
the difficulty in tracking down these methods is that they span multiple disciplines and are often
called by different names. For example, Kaufman and Rousseeuw (1990, Chapter 2) describe
the “partitioning around medoids,” or “K-medoid” problem, which requires the selection of K
representative objects for clusters. This is congruent with the “p-median” approach to clustering,
which has an extensive history in the management sciences because of its relationship to facil-
ity location planning (Maranzana, 1964). The implementation of the p-median model as a data
analysis tool was discussed by Mulvey and Crowder (1979) and Klastorin (1985), and was advo-
cated by Brusco and Köhn (2008a, 2008b). Another recent contribution was offered by Frey and
Dueck (2007), who presented a method termed “affinity propagation,” which also focuses on the
solution of an adaptation of the p-median problem. Throughout the remainder of this paper, we
Requests for reprints should be sent to Michael J. Brusco, Department of Marketing, College of Business, Florida
State University, Tallahassee, FL 32306-1110, USA. E-mail: mbrusco@cob.fsu.edu
© 2009 The Psychometric Society
employ the term “p-median,” while recognizing the potential for other sources to use alternative
terminology for the same model.
In addition to providing a representative object (exemplar) as a centroid, several other advan-
tages of the p-median clustering method have been identified. First, Kaufman and Rousseeuw
(1990, Chapter 2) suggest that the p-median approach is more robust than K-means clustering
or other K-centroids procedures, particularly with respect to handling outliers. Second, the p-
median model has tremendous flexibility. For example, the model does not require metric data
and can be applied easily to proximity data established via the application of similarity coeffi-
cients to nominal or ordinal data. In fact, the p-median method can often be employed gainfully
to proximity matrices with triangle inequality violations, asymmetric proximity matrices, and
to nonsquare proximity matrices (Brusco & Köhn, 2008a). Third, as demonstrated by Avella,
Sassano, and Vasil’ev (2007), Beltran, Tadonki, and Vial (2006), and Brusco and Köhn (2008b),
exact (i.e., globally optimal) solutions for large instances of the p-median problems are often
achievable. Avella et al. (2007), for example, described a branch-cut-price algorithm that pro-
vides optimal solutions for problems as large as N = 3,795 and p = 500. It is important to note,
however, that the largest test problems used in most evaluations of exact methods employ in-
put proximities computed as Euclidean distances between points in two-dimensional space. The
performance of exact methods has not been sufficiently evaluated for input proximities obtained
from computing Euclidean distances between points measured on three or more variables (i.e.,
data located in a dimensional space greater than two). Moreover, violations of the triangle in-
equality and asymmetric proximity relations can pose formidable challenges for exact methods
(e.g., see Hansen, Mladenović, & Perez-Brito, 2001, p. 345).
Given the propitious characteristics of the p-median model, why does its usage pale in com-
parison to the ubiquitous K-means method? One plausible reason might be the potentially limited
scalability of p-median algorithms relative to K-means clustering heuristics. Whereas batch im-
plementations of the K-means algorithm (Forgy, 1965; Howard, 1966) can accommodate tens of
thousands of objects, some of the more popular p-median heuristics might encounter computer
storage and/or CPU time difficulties for data sets with several thousand objects. A second ex-
planation for the dearth of p-median clustering applications may stem from the lack of available
software for conducting p-median cluster analyses. Although several researchers have provided
software programs for p-median applications, the availability of procedures in commercial sta-
tistical packages is virtually nonexistent. Moreover, potential users of p-median clustering meth-
ods are currently restricted to a choice between relatively inflexible, but sophisticated, procedures
that are available as compiled Fortran or C++ codes (e.g., Avella et al., 2007; Brusco & Köhn,
2008b; Hansen et al., 2001; Resende & Werneck, 2004), or user-friendly implementations in R
or MATLAB that lack effectiveness for certain data conditions (e.g., Brusco & Köhn, 2008a;
Kaufman & Rousseeuw, 1990). Recently, Frey and Dueck (2007) presented a MATLAB imple-
mentation of their affinity propagation algorithm as a long-awaited breakthrough for application-
oriented, scalable, versatile, efficient, and effective exemplar-based clustering. Unfortunately, as
Brusco and Köhn (2008a) showed, affinity propagation suffers from a serious deficit with respect
to its ability to obtain a globally optimal partition for even modestly sized data sets. In light of
these issues, any intent to facilitate the advancement of exemplar-based clustering in the applied
behavioral sciences still faces the challenge of devising accessible programs that can produce
high-quality solutions for problems of varied data structures and substantial size.
One procedure that would seem to be especially well suited for providing high-quality so-
lutions to p-median problems in a user-friendly format is simulated annealing (see Aarts &
Korst, 1989; Kirkpatrick, Gelatt, & Vecchi, 1983; van Laarhoven & Aarts, 1987), which has
been recognized as a powerful metaheuristic for solving combinatorial optimization problems
for more than two decades. Although simulated annealing does not guarantee that a globally
optimal solution will be identified, the procedure has been applied successfully to a vast array
of data-analytic tasks such as matrix permutation (Brusco, Köhn, & Stahl, 2008), classification
(Ceulemans & Van Mechelen, 2008; Ceulemans, Van Mechelen, & Leenen, 2007), unidimen-
sional scaling (Murillo, Vera, & Heiser, 2005) and multidimensional scaling (Vera, Heiser, &
Murillo, 2007). Given this formidable reputation, we were particularly intrigued by the some-
what lackluster performance of simulated annealing implementations for the p-median problem
(Chiyoshi & Galvão, 2000; Murray & Church, 1996) when compared to alternative approaches.
Consider, for example, reported results for the better of these two simulated annealing approaches
(Chiyoshi and Galvão’s) for the 40 p-median problems from the OR-Library (Beasley, 1990).
Chiyoshi and Galvão applied their simulated annealing heuristic to each of these test instances,
obtaining the known globally optimal solution for only 26 of the 40 problems. Contrastingly,
Hansen and Mladenović (1997, p. 217) obtained 39 and 38 optimal solutions (out of 40) using
variable neighborhood search and tabu search, respectively.
Our fundamental goal in this paper is to build on recent advancements in the area of
exemplar-based clustering by developing a new simulated annealing heuristic for the p-median
problem and conducting a rigorous evaluation of its performance relative to competing meth-
ods. First, we present results showing that our new method substantially outperformed Chiyoshi
and Galvão’s (2000) implementation of simulated annealing for the p-median problem. Second,
we demonstrate that our new simulated annealing heuristic generally provides partitions with
better objective function values than affinity propagation and vertex substitution across a di-
verse set of difficult test problems. Third, we provide evidence that our new simulated annealing
heuristic is competitive with the best metaheuristics available for the p-median problem (Alba &
Dominguez, 2006; Hansen & Mladenović, 1997; Resende & Werneck, 2004; Taillard, 2003).
In Section 2, we describe the p-median model and provide a brief review of exact and
approximate solution procedures. Section 3 presents the simulated annealing heuristic. Section 4
reports results for a variety of test problems from the p-median literature, ranging in size from
N = 100 to N = 5,934 objects. An application of the p-median model to a real-world data set
from the telecommunications industry is provided in Section 5. We conclude in Section 6 with a
brief summary and discussion of limitations and extensions.
optimization problem, which is similar to the one offered in dissimilarity form by Mladenović,
Brimberg, Hansen, and Moreno-Pérez (2007), is:

    Maximize (over J):   z1 = Σ_{i∈I} max_{j∈J} { s_ij },    (1)
    subject to:          J ⊂ I,                              (2)
                         |J| = p.                            (3)
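A minimal sketch of the model in (1)–(3), assuming a full similarity matrix S and exhaustively enumerating candidate exemplar sets, may help fix ideas (feasible only for very small N; this brute-force illustration and its name are ours, not one of the solution procedures discussed in the paper):

```python
from itertools import combinations

def pmedian_bruteforce(S, p):
    """Exhaustively find the exemplar set J of size p that maximizes
    z1 = sum_i max_{j in J} s_ij, for an N x N similarity matrix S."""
    N = len(S)
    best_z, best_J = float("-inf"), None
    for J in combinations(range(N), p):
        # Each object contributes its similarity to its best exemplar in J.
        z = sum(max(S[i][j] for j in J) for i in range(N))
        if z > best_z:
            best_z, best_J = z, J
    return best_z, best_J
```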
The PMP can also be represented as an integer program (Rao, 1971; ReVelle & Swain, 1970;
Vinod, 1969). The integer programming formulation of the PMP is facilitated by the following
set of decision variables:
X: X = {x_11, x_12, ..., x_NN} is a set of binary variables, where x_ij = 1 if object i is assigned
to exemplar j, and 0 otherwise, for 1 ≤ i ≤ N and 1 ≤ j ≤ N.
The integer programming formulation is as follows:
    Maximize (over X):   z1 = Σ_{i=1..N} Σ_{j=1..N} s_ij x_ij ,          (4)
    subject to:          Σ_{j=1..N} x_ij = 1,   for 1 ≤ i ≤ N;           (5)
                         Σ_{j=1..N} x_jj = p;                            (6)
                         x_ij ≤ x_jj,   for 1 ≤ i ≤ N, 1 ≤ j ≤ N;        (7)
                         x_ij ∈ {0, 1}, for 1 ≤ i ≤ N, 1 ≤ j ≤ N.        (8)
The objective function (4) represents the total sum of the similarities of each object to its most
similar exemplar. Constraint set (5) requires that each object is assigned to exactly one exemplar.
Constraint (6) guarantees the selection of exactly p exemplars. Constraint set (7) ensures that
object i can only be assigned to object j if object j is a selected exemplar. The binary restrictions
on the variables are provided by constraint set (8).
Approximate procedures proposed for the PMP include simulated annealing (Chiyoshi &
Galvão, 2000; Murray & Church, 1996), genetic algorithms (Alba & Dominguez, 2006; Alp,
Erkut, & Drezner, 2003; Moreno-Pérez, García-Roda, & Moreno-Vega, 1994), tabu search (Rol-
land, Schilling, & Current, 1996), ant-colony approaches (Levanova & Loresh, 2004), variable
neighborhood search (Hansen & Mladenović, 1997; Hansen et al., 2001), heuristic concentration
(Rosing & ReVelle, 1997; Rosing, ReVelle, Rolland, Schilling, & Current, 1998), and hybrid
methods (Resende & Werneck, 2004). A comprehensive review of these procedures was recently
provided by Mladenović et al. (2007).
In their review, Mladenović et al. (2007, p. 932) observe that the vertex substitution heuristic
(VSH) originally proposed by Teitz and Bart (1968) “. . . is one of the most often used heuris-
tics either alone or as a subroutine of other more complex methods or within metaheuristics.”
The VSH iteratively refines an initial set of exemplars by finding the best possible interchange
of one of the current exemplars with one of the unselected objects. This process continues until
no interchange leads to any improvement of the objective function. The efficient implementation
of this method is of considerable importance and has been the focus of several research efforts
(Hansen & Mladenović, 1997; Resende & Werneck, 2003; Rosing, 1997; Whitaker, 1983). Fast
implementations of VSH make use of lists of the most similar and second most similar exemplars
for each object. These lists help to avoid the explicit evaluation of the replacement of each
exemplar with one of the unselected objects.
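The interchange logic can be sketched as follows (a naive Python illustration that evaluates every swap explicitly; the fast list-based implementations cited above exist precisely to avoid this full evaluation, and the function name is ours):

```python
def vsh(S, J_init):
    """Vertex substitution heuristic: repeatedly apply the best interchange
    of a current exemplar with an unselected object until no swap improves
    z1 = sum_i max_{j in J} s_ij."""
    N = len(S)
    J = set(J_init)

    def z(J):
        return sum(max(S[i][j] for j in J) for i in range(N))

    improved = True
    while improved:
        improved = False
        best_z, best_swap = z(J), None
        # Evaluate replacing each current exemplar with each unselected object.
        for out in list(J):
            for cand in set(range(N)) - J:
                trial = (J - {out}) | {cand}
                zt = z(trial)
                if zt > best_z:
                    best_z, best_swap = zt, (out, cand)
        if best_swap is not None:
            out, cand = best_swap
            J = (J - {out}) | {cand}
            improved = True
    return J, z(J)
```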
The affinity propagation (APP) objective augments (1) with a preference score, θi, for each
object i, adding the preferences of the selected exemplars to the total similarity, subject to
constraint (2). The sum of the preference scores of objects selected as exemplars supposedly
steers the objective function towards selecting the optimal number of clusters for a given
data set. Accordingly, a purported advantage of APP is that the incorporation of the preference
information in the objective function can be used to facilitate the selection of the number of clus-
ters. Unfortunately, the choice for the preference values is highly subjective. In the absence of
any a priori information, Frey and Dueck (2007) suggest setting each θi value equal to the median
of the similarity measures, or to the minimum similarity measure if a smaller number of clusters
is desirable. We emphasize, however, that both of these choices are somewhat arbitrary and there
is no compelling evidence that either choice results in an appropriate number of clusters.
The affinity propagation algorithm passes two pieces of information between pairs of objects
that possess similarity relationships: (a) “responsibilities,” which indicate the suitability of one
object to serve as an exemplar for another, and (b) “availabilities,” which indicate the viability
of an object to accept another object as its exemplar. The responsibilities and availabilities are
iteratively updated to provide accumulated evidence of exemplar suitability and object assign-
ment information. Although the updating rules are straightforward, affinity propagation requires
a number of nontrivial parameter decisions. These include iteration limits and thresholds to con-
trol termination, as well as a “damping factor” to prevent numerical oscillations that can occur
for some test problems.
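The responsibility/availability recursions can be sketched as follows (a bare-bones Python illustration following Frey and Dueck's published update equations, with the diagonal of S holding the preference values; the variable names, fixed iteration count, and simple exemplar-extraction rule are our own simplifications):

```python
def affinity_propagation(S, damping=0.5, iters=200):
    """Minimal affinity propagation for a full N x N similarity matrix S
    whose diagonal entries S[k][k] are the preference values theta_k."""
    N = len(S)
    R = [[0.0] * N for _ in range(N)]  # responsibilities r(i,k)
    A = [[0.0] * N for _ in range(N)]  # availabilities a(i,k)
    for _ in range(iters):
        for i in range(N):
            for k in range(N):
                # r(i,k): how well suited k is to serve as exemplar for i.
                m = max(A[i][kp] + S[i][kp] for kp in range(N) if kp != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - m)
        for i in range(N):
            for k in range(N):
                if i == k:
                    # a(k,k): accumulated evidence that k is an exemplar.
                    new = sum(max(0.0, R[ip][k]) for ip in range(N) if ip != k)
                else:
                    # a(i,k): viability of i accepting k as its exemplar.
                    new = min(0.0, R[k][k] + sum(max(0.0, R[ip][k])
                              for ip in range(N) if ip not in (i, k)))
                # The damping factor guards against numerical oscillations.
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [k for k in range(N) if R[k][k] + A[k][k] > 0]
```

Raising the preferences (diagonal values) relative to the off-diagonal similarities yields more exemplars; for example, with all preferences set to the maximum similarity of zero and strictly negative off-diagonal similarities, every object nominates itself.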
The adjustment rule is designed to keep the proportion of accepted solutions from becoming too large or too small. At each temperature, the proportion, π,
of the 10N trial solutions that were accepted is computed, and the following function is used to
obtain the temperature adjustment factor, c, in Step 6:
        ⎧ 1 + (0.4 − π)/0.4,       if π < 0.4,
    c = ⎨ 1/(1 + (π − 0.6)/0.4),   if π > 0.6,          (10)
        ⎩ 1,                       if 0.4 ≤ π ≤ 0.6.
We developed a version of the simulated annealing algorithm that uses (10) for temperature ad-
justment. For this rule, the termination criterion in Step 6 is not appropriate, and it is necessary to
determine a limit on the number of temperature adjustments. For comparison to the standard tem-
perature reduction implementations, we obtained a limit by calculating the number of reductions
that would be necessary to reduce the initial temperature to .0001 using c = 0.9.
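Equation (10) and the reduction-limit calculation can be expressed directly (illustrative Python; the function names are ours):

```python
import math

def temperature_adjustment(pi):
    """Adjustment factor c from Equation (10): raise the temperature when
    the acceptance proportion pi falls below 0.4, lower it when pi exceeds
    0.6, and leave it unchanged in between."""
    if pi < 0.4:
        return 1 + (0.4 - pi) / 0.4
    if pi > 0.6:
        return 1 / (1 + (pi - 0.6) / 0.4)
    return 1.0

def cooling_steps(t0, t_final=1e-4, c=0.9):
    """Number of multiplicative reductions needed to bring temperature t0
    down to t_final with cooling factor c; used to cap the number of
    temperature adjustments in the comparison described above."""
    return math.ceil(math.log(t_final / t0) / math.log(c))
```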
FIGURE 1. The neighborhood solution generation procedure for the SA algorithm.

FIGURE 2. A procedure to update the α and β arrays after replacing the exiting exemplar with the entering candidate object.
4. Computational Results
TABLE 1.
Objective values (z1 ) for apcluster.m, vsh_fc.m, sa1.m, and sa2.m.
respectively. Moreover, the mean percent deviation from the best-found objective function val-
ues for the sa1.m (c = .9), sa1.m (c = .8), vsh_fc.m, sa2.m, and apcluster.m meth-
ods were .008%, .032%, .289%, 1.345%, and 1.007%, respectively. Both versions of sa1.m
TABLE 2.
Percentage error and computation times for apcluster.m, vsh_fc.m, sa1.m, and sa2.m.
Test problem    p    Percentage deviation from best-found solution            Computation time (in seconds)
                     apcluster  vsh_fc  sa1.m(c=.8)  sa1.m(c=.9)  sa2.m       apcluster  vsh_fc  sa1.m(c=.8)  sa1.m(c=.9)  sa2.m
Travel 6 0.000 0.000 0.000 0.000 0.000 5 3 146 306 318
routing 7 0.505 0.000 0.000 0.000 0.000 5 3 145 305 318
N = 456 8 1.422 0.000 0.000 0.000 0.000 6 4 144 304 318
9 0.732 0.000 0.000 0.000 0.000 6 4 145 304 310
10 0.677 0.000 0.000 0.000 0.000 8 4 146 307 310
11 0.575 0.000 0.000 0.000 0.000 7 4 141 297 314
12 0.140 0.000 0.000 0.000 0.000 7 5 144 304 307
13 0.738 0.000 0.000 0.000 0.000 5 5 143 302 304
Extended 68 0.492 0.608 0.000 0.000 2.523 48 217 354 736 1770
circuit 72 0.912 1.345 0.000 0.002 2.664 40 225 370 765 1749
board 103 0.359 0.535 0.150 0.000 2.688 52 286 454 944 1673
N = 1,272 138 0.000 0.000 0.271 0.063 1.095 51 360 509 1058 1602
146 0.000 0.000 0.000 0.000 1.023 50 387 531 1108 1606
180 0.709 0.170 0.188 0.000 0.841 60 442 544 1130 1577
246 0.000 4.105 0.000 0.000 0.196 50 546 615 1279 1563
269 0.149 0.037 0.000 0.000 0.255 49 560 626 1306 1561
PCB3038 58 0.354 0.226 0.110 0.000 3.224 301 1542 1518 3042 8256
N = 3,038 67 0.631 0.338 0.110 0.000 3.189 471 1752 1477 3171 8172
93 1.037 0.412 0.007 0.000 3.564 277 2225 1683 3474 8205
145 1.213 0.655 0.092 0.000 3.812 302 2998 1519 3167 7634
201 1.043 0.561 0.000 0.016 3.862 310 3774 1504 3219 7566
301 1.214 0.608 0.000 0.010 3.649 316 4802 1741 3593 7752
482 0.814 0.672 0.000 0.032 3.179 617 6472 1822 3723 7443
765 0.449 0.916 0.126 0.000 1.947 758 7315 2068 4162 7821
produced an objective value that was as good as or better than that of sa2.m for all 48 test prob-
lems. The apcluster.m algorithm obtained a better solution than both versions of sa1.m in
one instance (the ExtendedCircuitBoard problem with p = 138) but yielded worse solutions than
sa1.m on the vast majority of test problems.
4.4.2. Follow-up Study Results With respect to the PCB3038 and RL5934 test problems
in the follow-up study, the results in Table 3 show that sa1.m performed well relative to the
benchmark objective values published by Resende and Werneck (2004). The average percent-
age deviations from the benchmark values for sa1.m were .056% and .041% for (c = .8) and
(c = .9), respectively. For at least one of the two cooling factor settings, sa1.m matched the
benchmark objective values for PCB3038 at p = 10, 20, 40, and 60 and RL5934 at p = 20 and
30. More impressively, sa1.m obtained objective values that are better than Resende and Wer-
neck’s benchmarks for PCB3038 at p = 30 and p = 100, as well as RL5934 at p = 40 and
p = 100.
It is important to reiterate that the benchmarks used for comparison purposes in Table 3 cor-
respond to the best objective values across multiple metaheuristics (Hansen et al., 2001; Resende
& Werneck, 2004; Taillard, 2003). A direct comparison of our proposed method with these
metaheuristics is complicated by the fact that they were typically implemented using more
computationally efficient hardware and software platforms. More specifically, these metaheuristics
were implemented on workstations using compiled Fortran or C++ programs, whereas our sa1.m
program is implemented on a personal computer as a MATLAB m-file, which is not compiled.
Nevertheless, the relative performance of sa1.m is exceptional. For example, in a head-to-head
comparison with published results for Hansen et al.’s (2001, pp. 347–348) variable neighborhood
search heuristic, sa1.m (c = .8) provided a better objective value for 19 of the 28 test problems
in Table 3. For c = .9, sa1.m obtained a better objective value for 21 of the 28 problems.

TABLE 3.
A comparison of sa1.m to benchmark objective values (z1) for PCB3038 and RL5934 problems.
4.4.3. Summary of Main Findings In summary, the salient findings of our methodological
comparisons can be succinctly stated as follows:
1. Our new simulated annealing heuristic substantially outperformed Chiyoshi and Galvão’s
(2000) simulated annealing implementation across the 40 PMP test problems in the OR-
Library.
2. Simulated annealing worked much better with temperature reduction (sa1.m) than it did
with temperature adjustment (sa2.m).
3. The sa1.m heuristic performed well at both parameter settings. Increasing the cooling
parameter from c = .8 to c = .9 produced a slight improvement in average solution qual-
ity but roughly doubled the computation time.
4. Although computationally efficient relative to simulated annealing, apcluster.m sel-
dom achieves the best-found solution and can exhibit severe departures from the best-
found objective value in many instances. These findings support those reported by Br-
usco and Köhn (2008a). Moreover, apcluster.m appears to have greater computa-
tional storage requirements than sa1.m given that the former method ran out of memory
on problem RL5934.
5. Although computationally efficient and apt to identify the best-found objective value
when p is small, the relative performance of vsh_fc.m declines significantly as p in-
creases.
6. For the largest problem instances, PCB3038 and RL5934, sa1.m is competitive with the
best p-median heuristics in the literature.
We applied Frey and Dueck’s (2007) affinity propagation program, Brusco and Köhn’s
(2008a) implementation of VSH, and simulated annealing to a telecommunications data set
originally studied by Brusco, Cradit, and Tashchian (2003) and more recently considered by
Brusco and Köhn (2008b). The data consist of N = 475 responses from corporate technology
managers regarding their satisfaction with the services of their current long-distance service
provider on V = 5 dimensions: (v1 ) price competitiveness, (v2 ) account executive responsive-
ness, (v3 ) billing accuracy, (v4 ) product offering adequacy, and (v5 ) repair service responsive-
ness. Each variable was measured using a five-point Likert scale, and the similarity matrix was
obtained based on the negative Euclidean distances between pairs of respondents.
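The construction of the similarity matrix from the V = 5 satisfaction ratings can be sketched as (illustrative Python; the function name is ours, and the actual study used the respondents' five Likert-scale measurements as coordinates):

```python
import math

def similarity_matrix(A):
    """Pairwise similarities as negative Euclidean distances between the
    rows of A (each row holds one respondent's variable measurements)."""
    N = len(A)
    return [[-math.dist(A[i], A[j]) for j in range(N)] for i in range(N)]
```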
The vsh_fc.m and sa1.m (c = 0.8) algorithms were applied to the telecommunications
data set for a range of 2 ≤ p ≤ 10 clusters. The preference weights of the apcluster.m program
were adjusted by trial-and-error to produce solutions for this same range of p. Ten additional sets
of preference weights were also evaluated for apcluster.m; however, the algorithm failed to
converge in two instances. For comparison purposes, the vsh_fc.m and sa1.m algorithms
were subsequently used to produce solutions using the number of clusters selected by apclus-
ter.m. Finally, for benchmarking purposes, the globally optimal solution for each value of p
TABLE 4.
A comparison of objective values (z1 ) for apcluster.m, vsh_fc.m, and sa1.m for the telecommunications data
set for different values of p (suboptimal values are shown in bold).
TABLE 5.
A comparison of the 7-cluster partitions obtained for the telecommunications data set
using sa1.m and apcluster.m (the ‘Medians’ column contains the measurements of
the cluster exemplar for each of the five variables).
was obtained using the procedure described by Brusco and Köhn (2008b). The results are pro-
vided in Table 4.
There are 19 different preference weights in Table 4; however, because apcluster.m
failed to converge for weights of −3 and −2, the comparison is limited to 17 different values
of p. The sa1.m algorithm obtained the globally optimal solution for all 17 problems, whereas
vsh_fc.m provided the optimum in all but one instance (p = 31). Contrastingly, apclus-
ter.m obtained the global optimum in only 5 of the 17 instances. From a practical standpoint,
it is important to recognize that suboptimality of apcluster.m can have serious ramifications
for interpretation of the clustering solution. To illustrate, we present a comparison in Table 5 of
the 7-cluster partitions for apcluster.m and sa1.m (alternatively, vsh_fc.m, which pro-
duced the same solution as sa1.m).
Table 5 reveals that, although apcluster.m and sa1.m yielded the same medians for
clusters 1, 2, and 3, the remaining four medians are different. Most noteworthy among the dif-
ferences is the fact that clusters 6 and 7 of the sa1.m partition reflect a more extreme level of
dissatisfaction than the corresponding clusters for the apcluster.m partition. For example,
consider that the median measures for cluster 7 are the same except that the sa1.m partition
has a more extreme measure of ‘2’ on the first variable in comparison to ‘3’ for the apcluster.m
partition. Similarly, cluster 6 of the sa1.m and apcluster.m partitions are the same, except
for the second variable where the sa1.m partition has a more extreme median of ‘2’. Although
these differences might seem subtle at first glance, they yield marked differences in cluster mem-
berships. For example, we computed a value of .639 for Hubert and Arabie’s (1985) adjusted
Rand index (ARI) as a measure of agreement between the two partitions. The ARI achieves a
value of one for perfect agreement and a value of zero for chance agreement. Based on Steinley’s
(2004) guidelines, a value of .639 would reflect a mediocre level of agreement between the two
partitions.
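Hubert and Arabie's ARI can be computed directly from the two label vectors (an illustrative Python sketch based on the standard pair-counting formula; the function name is ours):

```python
from math import comb
from collections import Counter

def adjusted_rand_index(u, v):
    """Hubert and Arabie's (1985) adjusted Rand index between two partitions
    given as equal-length lists of cluster labels; 1 indicates perfect
    agreement and values near 0 indicate chance-level agreement."""
    n = len(u)
    # Pairs of objects co-clustered in both partitions (contingency cells).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(u, v)).values())
    sum_a = sum(comb(c, 2) for c in Counter(u).values())  # row marginals
    sum_b = sum(comb(c, 2) for c in Counter(v).values())  # column marginals
    expected = sum_a * sum_b / comb(n, 2)   # expected index under chance
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)
```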
Our objective in this paper was to build on a recent stream of research pertaining to the
importance of the p-median problem and exemplar-based clustering (Alba & Dominguez, 2006;
Avella et al., 2007; Beltran et al., 2006; Brusco & Köhn, 2008a, 2008b; Frey & Dueck, 2007,
2008; Hansen & Mladenović, 2008; Mladenović et al., 2007). Although vital for theoretical pur-
poses, exact procedures are infeasible for large matrices, and most of the reported computa-
tional results for such methods are limited to similarity/dissimilarity data obtained from Euclid-
ean distances between points in two-dimensional space (see, for example, Avella et al., 2007;
Beltran et al., 2006; Brusco & Köhn, 2008a). The performance of these exact procedures for
data structures that are asymmetric and/or have violations of the triangle inequality has not been
demonstrated (see, for example, Hansen et al., 2001).
The approximate procedure termed “affinity propagation,” which was offered by Frey and
Dueck (2007), is capable of obtaining reasonable solutions in modest computation times. How-
ever, Brusco and Köhn (2008a) showed that affinity propagation seldom obtains the globally
optimal solution for small data sets, and our results indicate that this shortcoming persists for
larger problems. Moreover, we have found that apcluster.m also exhibits convergence dif-
ficulties in some instances. Brusco and Köhn’s MATLAB implementation of the VSH performs
well for problems where the number of clusters is 50 or fewer; however, its computation time
explodes, and its performance degrades as an increasing function of p.
We have developed a new simulated annealing heuristic for the p-median problem that sub-
stantially outperformed Chiyoshi and Galvão’s (2000) implementation of simulated annealing
across the test problems in the OR-Library. The new simulated annealing heuristic also over-
comes the limitations of apcluster.m and vsh_fc.m. Although sa1.m requires more com-
putation time than apcluster.m, the former method typically obtains solutions with better
objective values for both large and small values of N and p. Moreover, sa1.m is less suscepti-
ble to the computer memory and convergence problems encountered by apcluster.m in our
experiments. Relative to vsh_fc.m, sa1.m provides appreciably better objective function val-
ues in less time when N and p are large. We have shown that simulated annealing can produce
exceptional results for the p-median problem; however, it is important to acknowledge that glob-
ally optimal solutions are not guaranteed. Nevertheless, the quality of the solutions obtained for
the largest test problem in our study, RL5934, is extremely competitive with two of the best-
performing methods in the literature, (a) Hansen et al.’s (2001) variable neighborhood search
method and (b) Resende and Werneck’s (2004) hybrid heuristic.
Finally, it is important to clarify that we are not purporting that the p-median model should
replace within-cluster sums of squares (K-means) partitioning as the most prominent discrete
optimization approach for cluster analysis. However, we do feel that the p-median model should
be considered as a plausible alternative to K-means because it possesses several desirable prop-
erties: (a) the provision of exemplars as centroids, (b) the flexibility to accommodate both metric
and nonmetric data, and (c) the flexibility to handle asymmetric and even rectangular proxim-
ity matrices. Through the development of software programs that can efficiently produce high-
quality solutions to p-median problems, we hope to afford researchers the opportunity to at least
evaluate the p-median approach as an alternative to K-means.
References
Aarts, E., & Korst, J. (1989). Simulated annealing and Boltzmann machines: A stochastic approach to combinatorial
optimization and neural computing. New York: Wiley.
Alba, E., & Dominguez, E. (2006). Comparative analysis of modern optimization tools for the p-median problem. Sta-
tistics and Computing, 16, 251–260.
Alp, O., Erkut, E., & Drezner, Z. (2003). An efficient genetic algorithm for the p-median problem. Annals of Operations
Research, 122, 21–42.
Avella, P., Sassano, A., & Vasil’ev, I. (2007). Computational study of large-scale p-median problems. Mathematical
Programming A, 109, 89–114.
Beasley, J.E. (1990). OR-Library: Distributing test problems by electronic mail. Journal of the Operational Research
Society, 41, 1069–1072.
Beltran, C., Tadonki, C., & Vial, J. (2006). Solving the p-median problem with a semi-Lagrangian relaxation. Computa-
tional Optimization and Applications, June 5, 2006, doi:10.1007/s10589-006-6513-6.
Brusco, M.J., Cradit, J.D., & Tashchian, A. (2003). Multicriterion clusterwise regression for joint segmentation settings:
An application to customer value. Journal of Marketing Research, 40, 225–234.
Brusco, M.J., & Köhn, H.-F. (2008a). Comment on ‘Clustering by passing messages between data points’. Science, 319,
726.
Brusco, M.J., & Köhn, H.-F. (2008b). Optimal partitioning of a data set based on the p-median model. Psychometrika,
73, 89–105.
Brusco, M.J., Köhn, H.-F., & Stahl, S. (2008). Heuristic implementation of dynamic programming for matrix permutation
problems in combinatorial data analysis. Psychometrika, 73, 503–522.
Brusco, M.J., & Steinley, D. (2007). A comparison of heuristic procedures for minimum within-cluster sums of squares
partitioning. Psychometrika, 72, 583–600.
Ceulemans, E., & Van Mechelen, I. (2008). CLASSI: A classification model for the study of sequential processes and
individual differences therein. Psychometrika, 73, 107–124.
Ceulemans, E., Van Mechelen, I., & Leenen, I. (2007). The local minima problem in hierarchical classes analysis:
An evaluation of a simulated annealing algorithm and various multistart procedures. Psychometrika, 72, 377–391.
Chiyoshi, F., & Galvão, R.D. (2000). A statistical analysis of simulated annealing applied to the p-median problem.
Annals of Operations Research, 96, 61–74.
Christofides, N., & Beasley, J.E. (1982). A tree search algorithm for the p-median problem. European Journal of Oper-
ational Research, 10, 196–204.
Cornuejols, G., Fisher, M.L., & Nemhauser, G.L. (1977). Location of bank accounts to optimize float: An analytic study
of exact and approximate algorithms. Management Science, 23, 789–810.
Du Merle, O., & Vial, J.-P. (2002). Proximal-ACCPM, a cutting plane method for column generation and Lagrangian relaxation: Application to the p-median problem (Technical report 2002.23). HEC Genève, University of Genève.
Forgy, E.W. (1965). Cluster analyses of multivariate data: Efficiency versus interpretability of classifications. Biometrics,
21, 768.
Frey, B., & Dueck, D. (2007). Clustering by passing messages between data points. Science, 315, 972–976.
Frey, B., & Dueck, D. (2008). Response to comment on “Clustering by passing messages between data points”. Science,
319, 726.
Galvão, R.D. (1980). A dual-bounded algorithm for the p-median problem. Operations Research, 28, 1112–1121.
Hanjoul, P., & Peeters, D. (1985). A comparison of two dual-based procedures for solving the p-median problem. European Journal of Operational Research, 20, 387–396.
Hansen, P., & Mladenović, N. (1997). Variable neighborhood search for the p-median. Location Science, 5, 207–226.
Hansen, P., & Mladenović, N. (2008). Complement to a comparative analysis of heuristics for the p-median problem.
Statistics and Computing, 18, 41–46.
Hansen, P., Mladenović, N., & Perez-Brito, D. (2001). Variable neighborhood decomposition search. Journal of Heuristics, 7, 335–350.
Hartigan, J.A., & Wong, M.A. (1979). Algorithm AS136: A k-means clustering program. Applied Statistics, 28, 100–108.
Howard, R.N. (1966). Classifying a population into homogeneous groups. In J.R. Lawrence (Ed.), Operational research
and social sciences (pp. 585–594). London: Tavistock.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218.
Kaufman, L., & Rousseeuw, P.J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley.
Kirkpatrick, S., Gelatt, C.D., & Vecchi, M.P. (1983). Optimization by simulated annealing. Science, 220, 671–680.
Klastorin, T. (1985). The p-median problem for cluster analysis: A comparative test using the mixture model approach.
Management Science, 31, 84–95.
Kuehn, A.A., & Hamburger, M.J. (1963). A heuristic program for locating warehouses. Management Science, 9, 643–666.
Levanova, T., & Loresh, M.A. (2004). Algorithms of ant system and simulated annealing for the p-median problem.
Automation and Remote Control, 65, 431–438.
Lin, S., & Kernighan, B.W. (1973). An effective heuristic algorithm for the traveling salesman problem. Operations
Research, 21, 498–516.
MacQueen, J.B. (1967). Some methods for classification and analysis of multivariate observations. In L.M. Le Cam &
J. Neyman (Eds.), Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1,
pp. 281–297). Berkeley: University of California Press.
Maranzana, F.E. (1964). On the location of supply points to minimize transportation costs. Operational Research Quarterly, 15, 261–270.
Mladenović, N., Brimberg, J., Hansen, P., & Moreno-Pérez, J.A. (2007). The p-median problem: A survey of metaheuristic approaches. European Journal of Operational Research, 179, 927–939.
Moreno-Pérez, J.A., García-Roda, J.L., & Moreno-Vega, J.M. (1994). A parallel genetic algorithm for the discrete
p-median problem. Studies in Location Analysis, 7, 131–141.
Mulvey, J.M., & Crowder, H.P. (1979). Cluster analysis: An application of Lagrangian relaxation. Management Science,
25, 329–340.
Murillo, A., Vera, J.-F., & Heiser, W.J. (2005). A permutation-translation simulated annealing algorithm for L1 and L2
unidimensional scaling. Journal of Classification, 22, 119–138.
Murray, A.T., & Church, R.L. (1996). Applying simulated annealing to location-planning models. Journal of Heuristics,
2, 31–53.
Narula, S.C., Ogbu, U.I., & Samuelsson, H.M. (1977). An algorithm for the p-median problem. Operations Research,
25, 709–713.
Rao, M.R. (1971). Cluster analysis and mathematical programming. Journal of the American Statistical Association, 66,
622–626.
Reinelt, G. (2001). TSPLIB. http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95.
Resende, M.G.C., & Werneck, R.F. (2003). On the implementation of a swap-based local-search procedure for the p-median problem. In R.E. Ladner (Ed.), Proceedings of the fifth workshop on algorithm engineering and experiments (pp. 119–127). Philadelphia: SIAM.
Resende, M.G.C., & Werneck, R.F. (2004). A hybrid heuristic for the p-median problem. Journal of Heuristics, 10,
59–88.
ReVelle, C.S., & Swain, R. (1970). Central facilities location. Geographical Analysis, 2, 30–42.
Rolland, E., Schilling, D.A., & Current, J.R. (1996). An efficient tabu search procedure for the p-median problem. European Journal of Operational Research, 96, 329–342.
Rosing, K.E. (1997). An empirical investigation of the effectiveness of a vertex substitution heuristic. Environment and
Planning B, 24, 59–67.
Rosing, K.E., & ReVelle, C.S. (1997). Heuristic concentration: Two stage solution construction. European Journal of
Operational Research, 97, 75–86.
Rosing, K.E., ReVelle, C.S., Rolland, E., Schilling, D.A., & Current, J.R. (1998). Heuristic concentration and tabu search:
A head to head comparison. European Journal of Operational Research, 104, 93–99.
Steinhaus, H. (1956). Sur la division des corps matériels en parties. Bulletin de l’Académie Polonaise des Sciences,
Classe III Mathématique, Astronomie, Physique, Chimie, Géologie, et Géographie, IV(12), 801–804.
Steinley, D. (2004). Properties of the Hubert-Arabie adjusted Rand index. Psychological Methods, 9, 386–396.
Steinley, D. (2006). K-means clustering: A half-century synthesis. British Journal of Mathematical and Statistical Psychology, 59, 1–34.
Taillard, E.D. (2003). Heuristic methods for large centroid clustering problems. Journal of Heuristics, 9, 51–74.
Teitz, M.B., & Bart, P. (1968). Heuristic methods for estimating the generalized vertex median of a weighted graph.
Operations Research, 16, 955–961.
Thorndike, R.L. (1953). Who belongs in the family? Psychometrika, 18, 267–276.
van Laarhoven, P.J.M., & Aarts, E.H.L. (1987). Simulated annealing: Theory and applications. Dordrecht: Kluwer.
Vera, J.-F., Heiser, W.J., & Murillo, A. (2007). Global optimization in any Minkowski metric: A permutation-translation simulated annealing algorithm for multidimensional scaling. Journal of Classification, 24, 277–301.
Vinod, H. (1969). Integer programming and the theory of grouping. Journal of the American Statistical Association, 64,
506–517.
Whitaker, R. (1983). A fast algorithm for the greedy interchange of large-scale clustering and median location problems.
INFOR, 21, 95–108.