Maths Applied To Music
Mathematics
and Computation
in Music
Second International Conference, MCM 2009
John Clough Memorial Conference
New Haven, CT, USA, June 19-22, 2009
Proceedings
Volume Editors
Elaine Chew
Viterbi School of Engineering and Thornton School of Music
University of Southern California, Los Angeles, CA, USA
and
School of Engineering and Applied Sciences and Department of Music
Harvard University, Cambridge, MA, USA
E-mail: echew@usc.edu
Adrian Childs
Hugh Hodgson School of Music
University of Georgia, Athens, GA, USA
E-mail: apchilds@uga.edu
Ching-Hua Chuan
Department of Mathematics and Computer Science
Barry University, Miami Shores, FL, USA
E-mail: chchuan@mail.barry.edu
ISSN 1865-0929
ISBN-10 3-642-02393-2 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-02393-4 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
springer.com
© Springer-Verlag Berlin Heidelberg 2009
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 12699223 06/3180 543210
Preface
As part of the conference, the Beinecke Rare Book and Manuscript Library
at Yale University mounted a special display of music and mathematics material
from their collection. A related keynote address, “The End of Pythagoreanism:
Musica theorica, Natural Science, and Aristotle’s Philosophy of Mathematics,
c.1300-c.1600,” was given by David Cohen (Columbia University, USA).
We wish to acknowledge the generous support of Susan Adler and Roberta
Hudson of Yale Conference Services; Scott Petersen, Yale Music Department,
for technical support; Eric Bianchi, PhD student in Music History at Yale, for
curating the exhibit of historical materials; Kathryn James of Yale’s Beinecke
Library, for arranging the exhibit and reception; Edward Gollin of Williams
College for program guidance; and David Cohen of Columbia University for
providing a keynote lecture.
Richard Cohn
Ian Quinn
Elaine Chew
Adrian Childs
Organization
MCM 2009
General Chairs
Richard Cohn Yale University, USA
Ian Quinn Yale University / Stanford University, USA
Program Chairs
Elaine Chew University of Southern California /
Harvard University, USA
Adrian Childs University of Georgia, USA
Publications Chair
Ching-Hua Chuan Barry University, USA
Panels Chair
Anja Volk University of Utrecht, The Netherlands
Tutorials Chair
Aline Honingh City University London, UK
Exhibits Chair
Neta Spiro University of Cambridge, UK
Program Committee
Emmanuel Amiot Lycée Arago, France
Christina Anagnostopoulou University of Athens, Greece
Moreno Andreatta Inst. de Recherche et Coordination
Acoustique/Musique Centre National de la
Recherche Scientifique, France
Gérard Assayag Inst. de Recherche et Coordination
Acoustique/Musique, France
Chantal Buteau Brock University, Canada
Ching-Hua Chuan Barry University, USA
David Clampitt Ohio State University, USA
Shlomo Dubnov University of California, San Diego, USA
Morwaread Farbood New York University, USA
Alexandre François Tufts University, USA
Vice-President
Moreno Andreatta Inst. de Recherche et Coordination
Acoustique/Musique Centre National de la
Recherche Scientifique, France
Secretary
Elaine Chew University of Southern California /
Harvard University, USA
Treasurer
Ian Quinn Yale University / Stanford University, USA
1
For two expressions of exasperation, see John Backus, “Die Reihe — A Scientific
Evaluation,” Perspectives of New Music 1.1 (1962): 160-171, and Eric Regener,
“Allen Forte’s Theory of Chords,” Perspectives of New Music 13.1 (1974): 199.
2
John Clough, “Pitch-Set Equivalence and Inclusion: A Comment on Forte’s
Theory of Set-Complexes.” Journal of Music Theory 9 (1965): 163-171; “Aspects
of Diatonic Sets.” Journal of Music Theory 23 (1979): 45-61; “Diatonic
Interval Sets and Transformational Structures,” Perspectives of New Music
18 (1979-80): 461-482. A complete list of Clough’s writings is given at
http://www.music.buffalo.edu/theory/cloughpub.shtml.
3
A summary of this line of research can be found in the introductions to two recent
volumes dedicated to extending it: David Clampitt’s “The Legacy of John Clough in
Mathematical Music Theory,” Journal of Mathematics and Music 1.2 (July 2007):
73-78; and Norman Carey, Jack Douthett, and Martha M. Hyde’s introduction to
Music Theory and Mathematics: Chords, Collections, and Transformations, ed. Jack
Douthett et al. (Rochester: University of Rochester Press, 2008): 1-8.
Four months before John Clough passed away, Robert Peck attracted an
international group of mathematical music theorists to Baton Rouge, Louisiana, for
a special session of the American Mathematical Society. Although John was too
ill to attend, the spirit of his personality, as much as the content of his thought,
infused the proceedings. The synchronization of Clough’s Viertaktigkeit with the
projected binary periodicity of the meetings of the recently formed Society for
Mathematics and Computation in Music suggested the one-time merger of the
two events, even though it would produce an event with a different scale and
tone than its predecessors. That suggestion was solidified into a commitment as
soon as I began to imagine John’s excitement, had he survived to witness the
founding of this scholarly society—a commingling of musicians, mathematicians,
and systems scientists, with membership and leadership from both sides of the
Atlantic, producing a thrice-annual periodical, and a periodic conference with
proceedings from a major publisher. It is difficult to imagine that, for a
mathematical music theorist as dedicated, equanimous, and magnanimous as John
Clough, dreams could get any wilder than that.
Richard Cohn
Battell Professor of the Theory of Music
Yale University
Table of Contents
Abstract. The Hamiltonian cycles in the topological dual of the Tonnetz (i.e.,
the successions of triads connected only through PLR-transformations which
visit every minor and major triad exactly once) will be introduced, enumerated,
studied, and classified from both a theoretical and an analytical point of view.
1 Introduction
In [3], Richard Cohn explains the practical uses of Neo-Riemannian theories by
showing that they are "an efficient technology and descriptive language for making and
communicating new discoveries about the properties of triads and related structures,
and the relational systems in which they participate." Recently this framework has
been almost exclusively studied from a theoretical and analytical point of view. The
aim of the present paper is to show some cycles of the topological dual of the Tonnetz
(i.e. some successions of triads connected only through PLR-transformations) which
could be useful as a compositional device. Properties of minimal cycles of this graph
have been widely studied (we can mention [1] and [4]), but no attention has yet been
given to the Hamiltonian class of cycles.
In mathematics a Hamiltonian cycle (or circuit)1 is a closed path through the
vertices of a graph which includes every vertex exactly once. So Hamiltonian cycles in the
topological dual of the Tonnetz represent complete sequences through all twenty-four
major and minor triads using PLR-transformations in which each major and minor
triad is used only once. These cycles are exclusively triadic and overall completely
chromatic, since every pitch class is used exactly six times. As we shall see, the suc-
cession can also be more or less diatonic, depending on the patterns of the transforma-
tions that are employed. So these classes of cycles could be a useful compositional
device to define harmonic structures that are triadic (and in some cases locally
diatonic) but without any real tonal center. In fact some contemporary composers, like
Paul Glass in his “Corale I for Margareth”, for string orchestra (1995) or young
composer Jeremy Vaughan in his “Violin Sonata” (2008), used successions of triads
1
It is worth noting that Hamiltonian cycles have also been used in music theory, in
some formalizations of the art of change ringing.
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 1–10, 2009.
© Springer-Verlag Berlin Heidelberg 2009
2 G. Albini and S. Antonini
2.2 Definition. We call (Pc, I) the GIS with Pc being the set of the twelve
well-tempered pitch classes, and I being the group of intervals in the octave, which is
isomorphic to the cyclic group Z/12Z.
Notice that g1 and g2 are inverses of each other in the group G, and that it is not
necessary for both of them to belong to H, because the tone-network is a simple graph,
so there can be no more than one edge between two vertices. Giving both of them when
defining a tone-network is important for having a complete view of the model in the
specific instance of a set of pitch classes, as will be done for the Tonnetz.
Recall that a graph is vertex-transitive if and only if every pair of vertices
is equivalent under some element of its automorphism group (i.e., no vertex can be
distinguished from any other). The following theorem can then be given.
2.4 Theorem. A tone-network is always vertex-transitive.
Proof. The group of intervals G, whose action on the set of vertices is given by its
action on their labelling, always maps adjacent vertices to adjacent vertices, so G is a
subgroup of the automorphism group of the tone-network. Since this action is transitive,
for every ordered couple of vertices there is an element of the automorphism group
which maps the first to the second. ●
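The argument of the proof can be checked mechanically for small cases. The following is a minimal Python sketch, not the authors' method: it assumes pitch classes are encoded as integers 0 to 11 (C = 0) and that the interval set H is closed under inverses, and the helper names (`tone_network`, `translation`, `is_automorphism`) are hypothetical. It confirms the two ingredients used above: every translation preserves adjacency, and some translation maps any vertex to any other.

```python
N = 12  # pitch classes encoded as 0..11, with C = 0 (an assumption of this sketch)


def tone_network(H):
    """Edge set of the tone-network on Z/12Z: x and x + h are joined for every h in H.

    H is assumed to be closed under inverses (h in H implies -h mod 12 in H)."""
    return {frozenset({x, (x + h) % N}) for x in range(N) for h in H}


def translation(g):
    """The permutation of the vertices induced by the interval g."""
    return [(x + g) % N for x in range(N)]


def is_automorphism(perm, edges):
    """True if the bijection `perm` maps the edge set onto itself."""
    return {frozenset(perm[v] for v in e) for e in edges} == edges


for H in ({3, 9}, {2, 10, 5, 7}, {3, 4, 5, 7, 8, 9}):   # hypothetical interval sets
    E = tone_network(H)
    # every element of G preserves adjacency ...
    assert all(is_automorphism(translation(g), E) for g in range(N))
    # ... and for each ordered pair (a, b) the translation by b - a sends a to b
    assert all(translation((b - a) % N)[a] == b for a in range(N) for b in range(N))
print("every tested tone-network is vertex-transitive")
```

This is a finite check of the statement for a few interval sets, not a replacement for the proof.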
2
A group action is called free if any element of the set is fixed only by the identity of the
group.
3
A group action is transitive if it possesses only a single orbit.
Hamiltonian Cycles in the Topological Dual of the Tonnetz 3
The previous result is significant because it underlines that in each GIS, and
consequently in every tone-network built on it, there is theoretically no way to
distinguish two pitches/vertices; this property is of particular interest in the
abstract case of pitch classes that we are treating.
Now the Tonnetz can be easily defined:
2.5 Definition. The Tonnetz, or more simply Ton, is the tone-network defined on the
GIS (Pc, I) of Definition 2.2, where H contains only the following intervals: major
and minor thirds, the fifth, and their inverses.
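As a quick sanity check of Definition 2.5, the sketch below (Python, with the same 0 to 11 encoding, which is an assumption) builds the Tonnetz edges from the intervals 3, 4, 7 and their inverses 9, 8, 5, lists the 24 major and minor triads that bound its faces, and verifies the counts expected of a triangulation of the torus: degree 6 at every vertex, every edge shared by exactly two triads, and Euler characteristic V - E + F = 0.

```python
N = 12
H_TON = {3, 4, 7, 9, 8, 5}   # minor third, major third, fifth, and their inverses

edges = {frozenset({x, (x + h) % N}) for x in range(N) for h in H_TON}
degree = {v: sum(v in e for e in edges) for v in range(N)}

# Major and minor triads as pitch-class sets; these bound the faces of Ton.
major = {frozenset({r, (r + 4) % N, (r + 7) % N}) for r in range(N)}
minor = {frozenset({r, (r + 3) % N, (r + 7) % N}) for r in range(N)}
faces = major | minor

# Every edge (a dyad of interval class 3, 4 or 5) lies in exactly two triads.
assert all(sum(e <= f for f in faces) == 2 for e in edges)
assert set(degree.values()) == {6}

V, E, F = N, len(edges), len(faces)
print(V, E, F, V - E + F)   # 12 36 24 0
```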
Recall that the topological dual of a simple planar graph is a simple graph whose
vertices each correspond to a face of the first graph, two vertices being connected
if the corresponding faces have an edge in common. Since Ton can be embedded in the
torus with no edges crossing, the topological dual of the Tonnetz can be
defined:
2.6 Definition. D(Ton), the topological dual of the Tonnetz, is a simple labeled graph
whose vertices are labeled with the triads defined by the pitch classes that bound the
corresponding face.
Since two vertices of D(Ton) are connected exactly when the corresponding triads share
two pitch classes, the edges can also be labelled with the basic Neo-Riemannian operators
P, L and R. The pictures of Ton and D(Ton) are given in Figure 1 and Figure 2.4
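The labelling of D(Ton) by P, L and R can be made concrete. In this sketch (Python; the (root, quality) encoding with + for major and - for minor follows footnote 4, while the integer roots and the function names are our own, hypothetical choices), each operator is written as a map on triads. The code checks that every operator is an involution keeping two common tones, and that joining triads related by P, L or R gives exactly the same graph as joining triads that share two pitch classes.

```python
N = 12


def pcs(triad):
    """Pitch-class set of a triad given as (root, quality), quality '+' or '-'."""
    root, quality = triad
    third = 4 if quality == "+" else 3
    return frozenset({root, (root + third) % N, (root + 7) % N})


def P(t):
    r, q = t
    return (r, "-" if q == "+" else "+")      # C+ <-> C-


def L(t):
    r, q = t
    return ((r + 4) % N, "-") if q == "+" else ((r + 8) % N, "+")   # C+ <-> E-


def R(t):
    r, q = t
    return ((r + 9) % N, "-") if q == "+" else ((r + 3) % N, "+")   # C+ <-> A-


OPS = {"P": P, "L": L, "R": R}
triads = [(r, q) for r in range(N) for q in "+-"]

# Edges of D(Ton), each labelled by the operator that produces it.
edges = {frozenset({t, op(t)}): name for t in triads for name, op in OPS.items()}

for t in triads:
    for op in OPS.values():
        assert op(op(t)) == t                   # every operator is an involution
        assert len(pcs(t) & pcs(op(t))) == 2    # and keeps exactly two common tones

# Sharing two pitch classes is equivalent to being related by P, L or R.
for a in triads:
    for b in triads:
        if a != b:
            assert (len(pcs(a) & pcs(b)) == 2) == (frozenset({a, b}) in edges)

print(len(triads), "vertices,", len(edges), "edges")   # 24 vertices, 36 edges
```

Each triad has exactly three neighbours, so D(Ton) is a cubic graph on 24 vertices.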
Before giving the last theorem, let us introduce another object necessary for its proof.
2.7 Definition. Given the GIS (Pc, I), we will call T-I the group acting on Pc and
generated by the group of intervals G (the twelve translations) and by the inversion
element which fixes C and F# and exchanges C# and B, D and A#, D# and A, E and G#, F
and G.
Fig. 1. A planar view of Ton. Numbers outside the graph show adjacency.
4
Triads will be represented by their root followed by the sign + for major triads and
the sign – for minor ones.
Fig. 2. A planar view of D(Ton). Numbers outside the graph show adjacency.
The proof that the action of T-I on Ton, through the labelling, preserves adjacency is
left to the reader.5
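The proof left to the reader can at least be checked exhaustively. The sketch below (Python, 0 to 11 encoding assumed; `transposition` and `inversion` are illustrative names) generates the 24 elements of T-I as the maps x -> x + t and x -> t - x (mod 12), where the inversion with t = 0 is the one described in Definition 2.7, and checks that each of them preserves adjacency in Ton. It is a finite verification, not a substitute for the general argument.

```python
N = 12
H_TON = {3, 4, 5, 7, 8, 9}
edges = {frozenset({x, (x + h) % N}) for x in range(N) for h in H_TON}
triads = ({frozenset({r, (r + 4) % N, (r + 7) % N}) for r in range(N)}
          | {frozenset({r, (r + 3) % N, (r + 7) % N}) for r in range(N)})


def transposition(t):
    return lambda x: (x + t) % N


def inversion(t):
    # inversion(0) fixes C (0) and F# (6) and swaps C#/B, D/A#, D#/A, E/G#, F/G
    return lambda x: (t - x) % N


T_I = [transposition(t) for t in range(N)] + [inversion(t) for t in range(N)]

# The 24 generated maps are pairwise distinct permutations of Pc ...
assert len({tuple(g(x) for x in range(N)) for g in T_I}) == 24
for g in T_I:
    # ... each preserves adjacency in Ton ...
    assert {frozenset(g(v) for v in e) for e in edges} == edges
    # ... and maps the set of 24 major and minor triads onto itself.
    assert {frozenset(g(v) for v in t) for t in triads} == triads
print("all", len(T_I), "elements of T-I preserve adjacency in Ton")
```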
We now concentrate on the main result of the section, which is original and will be
useful for counting Hamiltonian cycles in D(Ton).
5
It is also possible, with a proof similar to the one of Theorem 2.9, to prove that T-I is
isomorphic to the automorphism group of Ton.
6
The automorphism group of a graph is the group of bijective mappings from the vertices of
the graph to the vertices of the same graph which preserve adjacency.
7
Groups and Graph, version 3.2 for MacOSX (2006), by William Kocay and William Palmer.
The number that names each cycle depends on the order in which the software
output it.
Now we can begin to study the cycles under the action of Aut(DTon), classifying
them in terms of their succession of transformations (independently of the direction
in which the path is covered) instead of their triads. In fact, given a Hamiltonian
cycle, if n elements of Aut(DTon) transform it into itself, then there are exactly 24/n
different Hamiltonian cycles sharing the same model of transformation. Eight models
can be recognized, named H1, ... , H8:
H6: The cycles #6, #7, #8, #9, #10, #19, #21, #22, #27, #30, #31 and #58.
Characterized by the repetition of the model LPLPLRPLPLPRPLPLPRPLPLPR
(or RPLPLPRPLPLPRPLPLPRLPLPL). Each of these cycles is mapped into itself
by 2 elements of Aut(DTon).
H7: The cycles #3, #12, #15, #17, #26, #28, #35, #46, #51, #52, #56 and #61.
Characterized by the repetition of the model PLRPLPRLPLRLPRLRPRLRPLRL
(or LRLPRLRPRLRPLRLPLRPLPRLP). Each of these cycles is mapped into itself
by 2 elements of Aut(DTon).
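The transformation models can be checked directly. The sketch below (Python, with a hypothetical (root, quality) encoding of triads, C+ = (0, "+")) applies a model letter by letter starting from C+ and tests whether the walk visits all 24 triads once and then returns to its start. Reading a model backwards traverses the same cycle in the opposite direction, because P, L and R are involutions; this is why each model is also given in reversed form. As for the counts, n = 2 in the formula 24/n gives 12, which agrees with the twelve cycle numbers listed for each of H6 and H7.

```python
N = 12


def P(t):
    r, q = t
    return (r, "-" if q == "+" else "+")


def L(t):
    r, q = t
    return ((r + 4) % N, "-") if q == "+" else ((r + 8) % N, "+")


def R(t):
    r, q = t
    return ((r + 9) % N, "-") if q == "+" else ((r + 3) % N, "+")


OPS = {"P": P, "L": L, "R": R}
C_MAJOR = (0, "+")


def walk(model, start=C_MAJOR):
    """Triads visited when the letters of `model` are applied in order from `start`."""
    path = [start]
    for letter in model:
        path.append(OPS[letter](path[-1]))
    return path


def is_hamiltonian_model(model, start=C_MAJOR):
    """True if the walk closes up after visiting each of the 24 triads exactly once."""
    path = walk(model, start)
    return len(model) == 24 and path[-1] == start and len(set(path[:-1])) == 24


H6 = "LPLPLRPLPLPRPLPLPRPLPLPR"
H7 = "PLRPLPRLPLRLPRLRPRLRPLRL"
for name, model in (("H6", H6), ("H7", H7)):
    print(name, is_hamiltonian_model(model), is_hamiltonian_model(model[::-1]))
# H6 True True
# H7 True True
```

Since P, L and R commute with transposition and inversion, the same models also give Hamiltonian cycles when started from any other triad.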
4.1 Definition. We call diatonic pitch class set a subset of seven elements of Pc that
can be ordered as a chain of perfect fifths. It will be named after both the major and
the minor scale that can be built with its pitches.
4.2 Example. The set {C, D, E, F, G, A, B} is a diatonic pitch class set, since its pitch
classes can be ordered by perfect fifths in the following way: F, C, G, D, A, E, B. It
will be called the C major / A minor diatonic set.
Fig. 4. Cycles #41 (H1, most symmetrical), and #1 (H8, least symmetrical)
On every diatonic set only six major and minor triads can be built. In Figure 5 the
Neo-Riemannian relations between the six triads of the C major / A minor diatonic set
are shown.
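Definition 4.1 and the count of six triads can be illustrated with a short sketch (Python; the 0 to 11 encoding with C = 0, the sharp-based note names and the helper names are all assumptions of the example). A diatonic set is generated as a chain of seven perfect fifths, and the triads it contains are found by testing which major and minor triads are subsets of it.

```python
N = 12
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def diatonic_set(start):
    """Seven pitch classes forming a chain of perfect fifths that begins at `start`."""
    return {(start + 7 * k) % N for k in range(7)}


def triads_in(pc_set):
    """Names of the major (+) and minor (-) triads whose pitch classes all lie in pc_set."""
    found = []
    for r in range(N):
        for quality, third in (("+", 4), ("-", 3)):
            if {r, (r + third) % N, (r + 7) % N} <= pc_set:
                found.append(NAMES[r] + quality)
    return found


c_major = diatonic_set(5)   # F, C, G, D, A, E, B: the C major / A minor set
g_major = diatonic_set(0)   # C, G, D, A, E, B, F#: the G major / E minor set

print(triads_in(c_major))        # ['C+', 'D-', 'E-', 'F+', 'G+', 'A-']
print(len(c_major & g_major))    # 6 shared pitch classes
```

The seventh degree (B in C major) supports only a diminished triad, which is why exactly six consonant triads appear.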
It is easy to see that two diatonic sets that are at a distance of a perfect fifth (for
example C major / A minor and G major / E minor) share six pitch classes and four
8
Simplicity is intended from a Riemannian point of view. In [10] Riemann wrote: «Let atten-
tion be drawn here to the definite inclination of the interpreting mind to find its way easily
through the confusion of endless possibilities of tonal combinations (in melody and harmony)
by means of preferring simple relationship over more complicated ones. This Principle of
Greatest Possible Economy for the Musical Imagination moves directly toward the rejection
of more complicated structures.»
References
1. Cohn, R.: Maximally Smooth Cycles, Hexatonic Systems, and the Analysis of Late-
Romantic Triadic Progressions. Music Analysis 15(1), 9–40 (1996)
2. Cohn, R.: Neo-Riemannian Operations, Parsimonious Trichords, and Their “Tonnetz”
Representations. Journal of Music Theory 41(1), 1–66 (1997)
3. Cohn, R.: Introduction to Neo-Riemannian Theory: A Survey and a Historical Perspective.
Journal of Music Theory 42(2), 167–180 (1998)
4. Cohn, R.: Square Dances with Cubes. Journal of Music Theory 42(2), 283–296 (1998)
5. Crans, A.S., Fiore, T.M., Satyendra, R.: Musical Actions of Dihedral Groups (2008),
http://arxiv.org/pdf/0711.1873
6. Harary, F.: Graph Theory. Addison-Wesley, Reading (1969)
7. Harary, F., Palmer, E.M.: Graphical Enumeration. Academic Press, London (1973)
8. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-
Completeness. W.H. Freeman, New York (1983)
9. Lewin, D.: Generalized Musical Intervals and Transformations. Yale University Press
(1987)
10. Riemann, H.: Ideas for a Study On the Imagination of Tone. Journal of Music
Theory 36(1), 81–117 (1992)
The Continuous Hexachordal Theorem
1 Introduction
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 11–21, 2009.
© Springer-Verlag Berlin Heidelberg 2009
12 B. Ballinger et al.
Every pair of points on the circle determines an inter-onset duration
interval (the geodesic between the pair of points around the circle) [Bue78]. The
histogram of this multiset of distances in the context of musical scales and chords
is called its interval content [Lew59]. Two rhythms which are congruent to each
other obviously have the same interval content. Here by congruence we mean
geometrical congruence, i.e., equivalence under rotation or reflection. However,
two rhythms with the same histograms need not be congruent. Two sets of points
with the same multiset of distances are said to be homometric, a term introduced
by Patterson in 1939 [Pat44], who first discovered them. In the music literature,
two pitch-class sets (or two rhythms) with the same intervallic content are said
to have the Z-relation or isomeric relation [For77].
One of the fundamental theorems in this area is the so-called Hexachordal
Theorem, which states that complementary sets with k=n/2 (and n even) are
homometric. Two examples are shown in Figs. 1 and 2. In Fig. 1, the k=4 onsets
occur at (0, 1, 4, 7), and the complementary rhythm has onsets precisely where
the first rhythm has rests: (2, 3, 5, 6). The histogram of intervals is identical.
Fig. 2 shows two complementary (n, k)=(12, 6) rhythms, again with identical
histograms.
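The two examples can be reproduced, and the theorem checked exhaustively for these sizes, with a few lines of Python. This is a brute-force illustration rather than a proof, and the function names are hypothetical. The histogram is computed as the sum over i of w_i w_{i+d} with indices mod n; because it counts ordered pairs (i, i + d), it gives the diameter d = n/2 two counts.

```python
from itertools import combinations


def histogram(onsets, n):
    """H_R(d) = sum_i w_i * w_{i+d} (indices mod n) for d = 1..n//2.

    Counting ordered pairs (i, i + d) gives the diameter d = n/2 two counts."""
    w = [1 if i in onsets else 0 for i in range(n)]
    return tuple(sum(w[i] * w[(i + d) % n] for i in range(n)) for d in range(1, n // 2 + 1))


def complement(onsets, n):
    return set(range(n)) - set(onsets)


top = {0, 1, 4, 7}
bottom = complement(top, 8)                      # {2, 3, 5, 6}
print(histogram(top, 8), histogram(bottom, 8))   # (2, 1, 2, 2) (2, 1, 2, 2)

# Brute-force check of the theorem for (n, k) = (8, 4) and (12, 6).
for n in (8, 12):
    for onsets in combinations(range(n), n // 2):
        assert histogram(onsets, n) == histogram(complement(onsets, n), n)
print("all complementary pairs are homometric")
```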
An important convention we follow is that the pair of onsets separated by the
diameter d = n/2 contributes two counts to the interval d in the histogram. This
convention simplifies the proofs but changes nothing substantively. This issue is
further addressed in Section 2.5.
[Figure: the two complementary (8, 4) rhythms drawn on a circle of eight points, each with its interval histogram over distances d = 1 to 4.]
Fig. 1. Example of the Hexachordal Theorem, (n, k)=(8, 4). Note that the distance
d=4 is counted twice.
[Figure: the two complementary (12, 6) rhythms drawn on a circle of twelve points, each with its interval histogram over distances d = 1 to 6.]
Fig. 2. Another example of the Hexachordal Theorem, (n, k)=(12, 6). Note that the
distance d=6 is counted twice.
The term “hexachordal” derives from Schönberg’s use of 6-note chords in a
12-tone chromatic scale, and the name “hexachordal” has been retained even
though the theorem holds for arbitrary even n.
1.2 History
The earliest proof of the Hexachordal Theorem in the music literature is, to our
knowledge, due to Lewin. In 1959 he published a paper [Lew59] on the intervallic
relations of two chords that contained an embryonic proof of the Hexachordal
Theorem; this proof was refined in a subsequent paper [Lew60]. In 1974 Regener
[Reg74] found a simple, elementary proof of this theorem based on the
combinatorics of pitch-class sets. Many other proofs have appeared since then,
often rivalling one another in conciseness. Short proofs can be found, for instance,
in the work of Mazzola [Maz03] or Jedrzejewski [Jed06]. Amiot [Ami07] gave an
elegant, short proof based on the discrete Fourier transform. Perhaps one of the
simplest proofs, in the sense of using no structures such as groups or discrete
Fourier transforms, was discovered by Blau [Bla99]. His proof relies on a straight-
1.3 Outline
We will first introduce weighted rhythms as a generalization of usual rhythms.
This generalization will consist of associating certain weights with the onsets and
rests of a rhythm. Next we will state and prove the Hexachordal Theorem in
terms of such weighted rhythms. We will then generalize the Hexachordal Theorem
to a continuous version, where rhythms will be considered as continuous
functions on the interval [0, 1]. From this version we will derive the discrete
Hexachordal Theorem again as a straightforward corollary.
Distance d:  1  2  3  4
Height:      2  1  2  2
This may be viewed as a function of the interval distance d: $H_R(d)$ is the height
of the histogram at distance d. With this notation, the Hexachordal Theorem
may be stated as follows:
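Proof. The proof fixes d and establishes that $H_{\bar R}(d) = H_R(d)$, where the
complementary rhythm $\bar R$ has weight function $1 - f$. From the histogram
Definition 2 we have:
$$H_{\bar R}(d) = \int_0^1 (1 - f(x))\,(1 - f(x+d))\,dx = \int_0^1 1\,dx - \int_0^1 f(x)\,dx - \int_0^1 f(x+d)\,dx + \int_0^1 f(x)\,f(x+d)\,dx.$$
The first integral is just 1, and the next two¹ are each $\frac{1}{2}$ by the assumption of
the theorem that $W(R) = \frac{1}{2}$:
$$H_{\bar R}(d) = 1 - \frac{1}{2} - \frac{1}{2} + \int_0^1 f(x)\,f(x+d)\,dx = \int_0^1 f(x)\,f(x+d)\,dx = H_R(d).$$
The last step again follows from Definition 2, and so we have established that
$H_{\bar R}(d) = H_R(d)$ for all d, i.e., the histograms are identical and $\bar R$ is homometric
to R.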
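A numerical illustration of the Continuous Hexachordal Theorem, under assumptions that are ours: the weight function below is a hypothetical smooth f with values in [0, 1] and total weight 1/2, and the integral is approximated by a Riemann sum on M equally spaced points of the circle, with shifts d restricted to multiples of 1/M. The sum then mirrors the proof exactly: the histogram of 1 - f differs from that of f by 1 minus twice the sampled total weight, which is zero up to rounding.

```python
import math

M = 1200  # sample points on the circle [0, 1); shifts are multiples of 1/M


def f(x):
    # Hypothetical weight function: values stay within [0.05, 0.95], total weight 1/2.
    return 0.5 + 0.25 * math.sin(2 * math.pi * x) + 0.2 * math.cos(6 * math.pi * x)


def H(weights, shift):
    """Riemann-sum estimate of int_0^1 w(x) w(x + d) dx with d = shift / M (wrapping around)."""
    m = len(weights)
    return sum(weights[j] * weights[(j + shift) % m] for j in range(m)) / m


w = [f(j / M) for j in range(M)]
w_bar = [1 - v for v in w]      # weight function of the complementary rhythm

print(f"W(R) ~ {sum(w) / M:.6f}")
for shift in (0, 100, 300, 600):           # d = 0, 1/12, 1/4, 1/2
    print(f"d = {shift / M:.4f}: H_R = {H(w, shift):.6f}, H_Rbar = {H(w_bar, shift):.6f}")
```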
The weight function f(x) need not be a continuous function in the technical
mathematical sense.2 We only need that it be integrable,3 i.e., a function for
which an appropriate “area under the function graph” may be defined.
1
Shifting x to x + d shifts the graph of f but does not change the area underneath it.
2
A function f is continuous if, for all c in the domain, $\lim_{x \to c} f(x) = f(c)$.
3
For example, Lebesgue integrable suffices.
[Figure: panel (a) plots f(x) for x from 0 to 1 in steps of 1/8; panel (b) plots H(d) for d from 0 to 1/2, with the vertical axis marked 1/8, 1/4, 3/8 and 1/2.]
Fig. 3. (a) Weight step function f (x) corresponding to Fig. 1 (top), (n, k)=(8, 4).
(b) Corresponding histogram integral H(d).
We should note that the above proof can be directly discretized to yield a
parallel proof of the Discrete Hexachordal Theorem. Instead, we show below
that the freedom to use any integrable weight function renders the Discrete
Hexachordal Theorem 1 an immediate corollary of the Continuous Hexachordal
Theorem 2.
Suppose a discrete rhythm R has weights $(w_0, w_1, \ldots, w_{n-1})$, with each weight
either 1 or 0. Then define the step function $f(x) = w_i$ for $\frac{i}{n} \le x < \frac{i+1}{n}$. For
example, Fig. 3(a) shows the step function corresponding to the top rhythm
in Fig. 1, whose discrete weights are (1, 1, 0, 0, 1, 0, 0, 1). Note that the total
weight/area is $4 \cdot \frac{1}{8} = \frac{1}{2}$, which accords with the discrete weight of $\frac{1}{2} n = \frac{1}{2} \cdot 8 = 4$.
We formalize this correspondence between continuous and discrete as follows. We use
$$\chi_A(x) = \begin{cases} 1, & \text{for all } x \in A \\ 0, & \text{otherwise} \end{cases}$$
to represent the 1/0 characteristic function of a set A.
We convert the discrete rhythm $(w_0, w_1, \ldots, w_{n-1})$ into the continuous rhythm
$$f(x) = \sum_{i=0}^{n-1} w_i \cdot \chi_{[\frac{i}{n}, \frac{i+1}{n})}(x).$$
This has the feature, mentioned above, that for all $x \in [\frac{i}{n}, \frac{i+1}{n})$, we have $f(x) = w_i$.
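Because of the horizontal compression involved in this conversion, the discrete
histogram contribution $H_R(d) = \sum_{i=0}^{n-1} w_i w_{i+d}$ (indices of w taken modulo n) corresponds to the continuous
histogram contribution
$$
\begin{aligned}
H_R\left(\frac{d}{n}\right) &= \int_0^1 f(x)\, f\left(x + \frac{d}{n}\right) dx \\
&= \int_0^1 \sum_{i=0}^{n-1} w_i \cdot \chi_{[\frac{i}{n}, \frac{i+1}{n})}(x)\, f\left(x + \frac{d}{n}\right) dx \\
&= \sum_{i=0}^{n-1} w_i \int_0^1 \chi_{[\frac{i}{n}, \frac{i+1}{n})}(x) \cdot f\left(x + \frac{d}{n}\right) dx \\
&= \sum_{i=0}^{n-1} w_i \int_{i/n}^{(i+1)/n} f\left(x + \frac{d}{n}\right) dx \\
&= \sum_{i=0}^{n-1} w_i \int_{(i+d)/n}^{(i+d+1)/n} f(x)\, dx \\
&= \sum_{i=0}^{n-1} w_i \int_{(i+d)/n}^{(i+d+1)/n} w_{i+d}\, dx \\
&= \frac{1}{n} \sum_{i=0}^{n-1} w_i w_{i+d}.
\end{aligned}
$$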
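The chain of equalities can be checked exactly on the top rhythm of Fig. 1 using rational arithmetic. In this sketch (Python; the names are hypothetical), the step function is sampled at the midpoints of K equal sub-cells per step. Because both f(x) and f(x + d/n) are constant on every sub-cell when d is an integer, this midpoint sum equals the integral exactly, so the check confirms that the continuous histogram at d/n equals 1/n times the discrete one. The resulting values 1/2, 1/4, 1/8, 1/4, 1/4 should be the heights plotted in Fig. 3(b) at d = 0, 1/8, 1/4, 3/8, 1/2.

```python
from fractions import Fraction


def discrete_H(w, d):
    """Discrete histogram H_R(d) = sum_i w_i * w_{i+d}, indices modulo n."""
    n = len(w)
    return sum(w[i] * w[(i + d) % n] for i in range(n))


def continuous_H(w, d, K=4):
    """Exact value of int_0^1 f(x) f(x + d/n) dx for the step function f built from w.

    Each step is cut into K equal sub-cells and f is sampled at their midpoints.
    Both factors are constant on every sub-cell, so the midpoint sum is exact;
    K itself is an arbitrary choice."""
    n = len(w)
    M = n * K
    samples = [w[j // K] for j in range(M)]   # f((j + 1/2) / M)
    shift = d * K                             # d/n measured in sub-cells
    total = sum(samples[j] * samples[(j + shift) % M] for j in range(M))
    return Fraction(total, M)


w = (1, 1, 0, 0, 1, 0, 0, 1)   # top rhythm of Fig. 1
n = len(w)
for d in range(n // 2 + 1):
    assert continuous_H(w, d) == Fraction(discrete_H(w, d), n)
print([str(continuous_H(w, d)) for d in range(n // 2 + 1)])
# ['1/2', '1/4', '1/8', '1/4', '1/4']
```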
We return to the issue of double-counting an interval that equals the
diameter (d = n/2 in the discrete case or d = 1/2 in the continuous case) in the
histogram $H_R(d)$. In music the diameter in the case of an equal-temperament
scale corresponds to a tritone. Recall from Definition 2 that the continuous
histogram is defined by the equation $H_R(d) = \int_0^1 f(x)\,f(x+d)\,dx$. Applying this
for $d = \frac{1}{2}$ to the step function f(x) in Figure 3 results in
$$H_R\left(\frac{1}{2}\right) = \int_0^1 f(x)\, f\left(x + \frac{1}{2}\right) dx.$$
Patterson’s first Theorem [Pat44] goes beyond the k = n/2 precondition of the
Discrete Hexachordal Theorem 1. It may be stated as: two homometric (n, k)-
rhythms have homometric complements. In our continuous generalizations, two
rhythms with the same number k of onsets have the same weight. So the gener-
alization is:
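Proof. Let the weight function of R1 be f(x) and that of R2 be g(x), and write
$\bar f = 1 - f$ and $\bar g = 1 - g$ for the weight functions of their complements. Fix a
distance d. We compute $H_{\bar R_1}(d)$ and show it is equal to $H_{\bar R_2}(d)$. From Definitions 2
and 1, we have
$$H_{\bar R_1}(d) = \int_0^1 \bar f(x)\,\bar f(x+d)\,dx = \int_0^1 (1 - f(x))(1 - f(x+d))\,dx = 1 - 2W(R_1) + H_{R_1}(d).$$
Because $R_1$ and $R_2$ have the same weight and are homometric, $W(R_1) = W(R_2)$ and
$H_{R_1}(d) = H_{R_2}(d)$, so the same expansion for g gives
$H_{\bar R_1}(d) = 1 - 2W(R_2) + H_{R_2}(d) = H_{\bar R_2}(d)$.
And we have therefore established that the complementary rhythms are homometric:
$$\int_0^1 \bar f(x)\,\bar f(x+d)\,dx = \int_0^1 \bar g(x)\,\bar g(x+d)\,dx, \qquad H_{\bar R_1}(d) = H_{\bar R_2}(d).$$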
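A discrete instance of Patterson's theorem, as a Python sketch (the example sets are our choice, not taken from the text): the all-interval tetrachords {0, 1, 4, 6} and {0, 1, 3, 7} in Z/12 (Forte's 4-Z15 and 4-Z29) are homometric without being congruent, and the code checks that their eight-note complements are homometric as well. The histogram counts ordered pairs (i, i + d) modulo n, so the tritone d = 6 is counted twice. Expanding the product as in the proof, each complement's histogram is 12 - 2·4 + H_R(d) = 4 + H_R(d) for d ≥ 1.

```python
def histogram(onsets, n):
    """H_R(d) for d = 1..n//2, counting ordered pairs (i, i + d) modulo n."""
    w = [1 if i in onsets else 0 for i in range(n)]
    return tuple(sum(w[i] * w[(i + d) % n] for i in range(n)) for d in range(1, n // 2 + 1))


n = 12
R1 = {0, 1, 4, 6}          # Forte 4-Z15
R2 = {0, 1, 3, 7}          # Forte 4-Z29
R1_bar = set(range(n)) - R1
R2_bar = set(range(n)) - R2

print(histogram(R1, n), histogram(R2, n))          # (1, 1, 1, 1, 1, 2) (1, 1, 1, 1, 1, 2)
print(histogram(R1_bar, n), histogram(R2_bar, n))  # (5, 5, 5, 5, 5, 6) (5, 5, 5, 5, 5, 6)
```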
3 Open Problems
Our results may be interpreted in terms of polyphonic rhythms, in which several
instruments are linearly combined [OTT08]. For instance, to model three identical
drums playing together, interpret the weight f(x) = 1/3 to mean that one drum
is struck on a particular beat, while the weight f(x) = 1 would mean all three
are struck. It would be interesting to explore whether homometric polyphonic
rhythms have a musical significance.
We know that two sets of points with different cardinalities and different
weights may be homometric, but we neither understand the constraints here
mathematically nor know if there is any musical interpretation of such sets.
Theorem 2 generalizes to weights in [0, 1] on a sphere, with distances measured
by geodesics, and with W(R) = 1/2 corresponding to the integral over a hemisphere
equalling 1/2. The discrete analog is “distance regular” points on a sphere,
e.g., the vertices of a Platonic solid. Is there any musical analog for spheres in
any dimension?
Acknowledgements
The authors would like to thank the anonymous referees for their useful comments.
References
[AG00] Althuis, T.A., Göbel, F.: Z-related pairs in microtonal systems. Memorandum
1524, University of Twente, The Netherlands (April 2000)
[Ami07] Amiot, E.: David Lewin and maximally even sets. Journal of Mathematics
and Music 1(3), 157–172 (2007)
[Bla99] Blau, S.K.: The hexachordal theorem: A mathematical look at interval
relations in twelve-tone composition. Mathematics Magazine 72(4), 310–313
(1999)
Fernando Benadon
American University
fernando@american.edu
1 Introduction
The goal of this paper is to present some conceptual and computational tactics related
to the metric analysis of speech rhythms. The parsing of speech by syllable stress is
similar to metric grouping in music. The two domains also inhabit similar temporal
regions. The kinship is evident not only in various forms of song, but also in explicitly
speech-based works by composers such as Steve Reich, Hermeto Pascoal, Jason
Moran, and many others. Despite a widespread interest in the temporal similarities
between speech and music, there is almost no published work on the music-temporal
structure of speech.
The literature on speech rhythms is vast and complex [1]. With titles such as “Tri-
ple Threats to Duple Rhythm” [2], articles on poetry analysis might be a good place to
begin our investigation. But they, like most works in metrical phonology, are more
interested in patterns of stress than in patterns of duration [3]. On the flip side, we find
precise durational measurements in phonetics and phonology. However, it is often
difficult to discern how their findings translate to a music-based conception of
rhythm. (What are we to make of the observation that the vowel in “bids” is on aver-
age 25 ms longer than the vowel in “bits”? [4]) When direct comparisons between
speech and musical rhythms are made, they take the form of statistical correlations
rather than rhythmic analyses [5].
In the following paragraphs I will show that the timing patterns of speech can be
analyzed in musically relevant ways. Before proceeding, I should emphasize that
there is no agreed-upon methodology for marking syllable onsets. Nick Campbell [6]
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 22–31, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Speech Rhythms and Metric Frames 23
notes: “We can make no claim that clear definable boundaries exist between all
phoneme-sized speech sounds, and in many cases the assignment of a label to a portion of
speech can be quite arbitrary” (p. 302). A common approach, used here, is to tag the
first vowel in the syllable.
Fig. 1. Syllable IOIs (in milliseconds) for the five-beat phrase “I can remember during the
nineteen nineties,” recorded from the radio by the author. The beat is equivalent to a foot,
which “starts with a [syllable] stress and contains everything that follows that stress up to, but
not including, the next stress” [7].
The timing pattern for B1 falls squarely between two possible grids, one triple and
one quadruple (long-short-short). To choose the one that will provide the better fit, we
can turn to the literature on categorical rhythm perception. In Desain and Honing’s
[8] categorization experiments, most subjects transcribed this pattern as 2-1-1, that
is, as a quadruple grid composed of one eighth plus two sixteenths.1 But this response
was by no means unanimous; the pattern sits right on a category boundary between
triple and quadruple in the authors’ time-clumping map.
1
The exact IOIs were 421-263-316, which are proportionally similar to B1’s 263-164-189.
24 F. Benadon
Another way to assess the grid of B1 is to quantize it. While various algorithms ex-
ist, the basic strategy involves looking for a goodness of fit between the performance
data and one of several competing grids. In the language of preference rules, we
would say that we prefer a grid that minimizes error distances between sounded
onsets and metrical subdivisions. Figure 2 places B1’s IOIs next to a triple and a
quadruple grid. The dashed error tails in the graph show that the two candidate grids
produce roughly equal amounts of total error, with the triple grid (74 ms) having a very
slight edge over the quadruple grid (80 ms).
Logically, the deviation error can be reduced with larger subdivision
cardinalities (provided the subdivision IOIs do not fall below the 100 ms threshold; [9]).
This strategy is tested in Figure 3, where the onsets are now compared to quintuplet and
sextuplet grids. The total error for the former is 74 ms; for the latter, 61 ms, the
lowest error yet.
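The error totals quoted for B1 can be reproduced with a simple rule, which is an assumption of this sketch rather than the author's stated algorithm: each onset is snapped to the nearest point of an evenly divided beat, and the absolute deviations are summed. With B1's IOIs of 263, 164 and 189 ms (onsets at 0, 263 and 427 ms in a 616 ms beat), this rule yields the figures given for the triple, quadruple, quintuple and sextuple grids. The function name `grid_error` is hypothetical.

```python
def grid_error(iois, divisions):
    """Sum of distances (ms) from each onset to the nearest point of an even grid.

    The beat spans sum(iois) ms and is split into `divisions` equal parts; the
    first onset is the downbeat, so its error is zero. Nearest-point snapping is
    an assumption of this sketch."""
    span = sum(iois)
    step = span / divisions
    onsets = [sum(iois[:k]) for k in range(len(iois))]   # 0, 263, 427 for B1
    grid = [step * j for j in range(divisions + 1)]
    return sum(min(abs(t - g) for g in grid) for t in onsets)


B1 = [263, 164, 189]   # syllable IOIs of B1 in ms
for name, n in (("triple", 3), ("quadruple", 4), ("quintuple", 5), ("sextuple", 6)):
    print(name, round(grid_error(B1, n)))
# triple 74
# quadruple 80
# quintuple 74
# sextuple 61
```

The sextuple snap places the onsets on grid points 0, 3 and 4, i.e. the 3+1+2 pattern discussed next.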
The improved goodness of fit that the sextuple grid provides should be rejected on
three grounds. One: the first syllable (“I”) is 263 ms long and therefore hardly
divisible into three parts, as the sextuple grid asks of us. Two: the error difference (12 ms)
between the triple and sextuple grid is too small a reward for the computational price
being paid. I would rather endure an extra 12 ms in total error and subdivide 1+1+1
(triple) instead of 3+1+2 (sextuple). Three: the error tails now point in two directions,
2
Strictly speaking, a “two-note” rhythm is really a three-note rhythm consisting of two IOIs.
Likewise, a “three-note” rhythm consists of four IOIs: three attacks plus a “downbeat.”
3
There is no need to check for quintuple (2+3) or septuple (3+4), since these subdivisions are
inadmissibly small given the size of B2.
ms. For those reluctant to relinquish a quadruple hearing of B1, (c) offers an adjusted
measurement window that is smaller than the full beat. Leaving out the first syllable,
the total duration of B1’s last two syllables is 350 ms, or an average of 175 ms per
sixteenth. This constitutes an 11% change from the end of B1 to B2. Hence the level of
metric concordance depends on the size and location of the measurement windows. I
will not pursue this idea further due to space constraints; consider flexible windows
duly placed on the pile of soon-to-be explored metric conundrums.
Fig. 4. Evenly subdivided beats and their resulting subdivision durations. The full duration of
B1 is subdivided in (a) and (b), whereas (c) employs a narrower window for improved confor-
mity with B2. The preferred metric analysis is (b): 3/16 + 2/16.
Let us continue on to B3, the three-syllable foot “during the.” At this point in the
game, the reader will be glad to bypass the detailed report in favor of a quick diagno-
sis: this beat is basically triple (its IOIs are roughly equal). Does it maintain the sub-
division tempo established by the previous two beats? Dividing the total size of B3 by
3 yields metronomic subdivisions of 177 ms, a 9% decrease from the subdivisions in
B2 and a 15% decrease from B1. Though we need not answer it now, we should ask
the question we have been dodging: How much tempo drift are we willing to tolerate
before we deem the sequence non-metrical?
B4, the duple foot “nine-teen,” has the same duration as B1. But B1 has three short
syllables and B4 has two long ones. On the face of it, the metric solution seems sim-
ple: give B4 eighth-notes (or four sixteenths) and B1 eighth-note triplets. We will see
later that this kind of tuplet approach can be beneficial because it retains isochrony at
the beat level when subdivision isochrony begins to wobble. But in this case, it is
unclear whether the kinship between B1 and B4 also satisfies the timing patterns of
the intervening beats. The ideal meter should weigh the global needs of all beats
in the phrase. It should also take into account the sequential unfolding of events. From
the sixteenths in B3 to those in B4, there is a 12% decrease in duration. We can tolerate
this drift more readily once we recall that subdivision speeds have been increasing
almost linearly since B1 (sixteenths of 207, 195, 177 and 155 ms). This trend may be
heard as a gradual accelerando of an otherwise fixed subdivision grid. Had the beats
been ordered differently (e.g., B3-B1-B4-B2), we would have walked a different ana-
lytical path, leading either to a different metric solution or perhaps to a blind alley.
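The drift figures quoted in this section are plain percentage changes between average subdivision durations. A short sketch using the 207, 195, 177 and 155 ms values from the text (plus the 175 ms windowed value for the end of B1) reproduces them to within about a percentage point; `pct_change` is an illustrative helper, not part of the original analysis.

```python
def pct_change(old_ms, new_ms):
    """Percentage change from old to new (negative = shorter, i.e. faster)."""
    return (new_ms - old_ms) / old_ms * 100.0


# Average sixteenth-note durations (ms) reported in the text for beats B1-B4.
sixteenths = {"B1": 207, "B2": 195, "B3": 177, "B4": 155}

names = list(sixteenths)
for prev, cur in zip(names, names[1:]):
    change = pct_change(sixteenths[prev], sixteenths[cur])
    print(f"{prev} -> {cur}: {change:+.1f}%")

# A non-adjacent comparison, and the narrower B1 window (175 ms per sixteenth).
print(f"B1 -> B3: {pct_change(207, 177):+.1f}%")
print(f"B1 (window) -> B2: {pct_change(175, 195):+.1f}%")
```

The successive steps (about −5.8%, −9.2% and −12.4%) grow steadily, which is the tightening the text hears as a gradual accelerando.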
Fig. 5. Lalla Ward reading “and each one is a distinct full language” [12]. Syllables longer than
300 ms open the door for three-way (“one”) and four-way (“tinct”) subdivision. The chain of
mixed meters is held together by a fairly steady—often implicit—average sixteenth-note subdi-
vision from beat to beat.
5 The Tuplets
My main argument thus far has been that speech can be modeled as metered when its
subdivisions approach isochrony. The process often produces mixed meters sharing a
common subdivision value. When subdivision isochrony between beats is tenuous, we
might reconcile the beats through an overall tempo curve, as we did earlier. There is another
way to prop up meter when it falters at the subdivision level.
One perceptually feasible alternative is to switch our focus from subdivision
isochrony to beat isochrony. The reason I view beat-level isochrony as a second resort
rather than as the norm is that most sentences contain feet with unequal syllable counts,
resulting in variable beat durations best explained by mixed meters.4 In some cases,
however, two beats may be the same size even if their subdivision count is different.
This translates into non-isochrony at the subdivision level, for if two beats are the same size
and contain a different number of subdivisions, they cannot share the same subdivision
value. The alternative to subdivision isochrony is beat isochrony via the tuplet.
4
Some linguists once believed that speakers adjust syllable length to maintain interstress
isochrony between feet containing different numbers of syllables. The claim has since been
refuted [13].
The first two beats in Figure 6 are quadruple and share a common subdivision duration
(within 10%, values not shown). The third beat is triple. Seeking subdivision
isochrony (a running sixteenth throughout) would make the sixteenths of the third
beat almost 30% longer than those of the first two. Since the size of the third beat is
within 5% of the first two, modeling the third beat as three triplets (rather than three
sixteenths) helps to preserve a sense of isochrony at the beat level.
Fig. 6. John Searle reading from [14]. Sixteenths on the third beat would not agree with those in
the preceding beats, so we slip into something more comfortable.
Fig. 7. Percentage change relationship between subdivision size and beat size. The lines corre-
spond to tuplet groupings.
Figure 7 shows the inverse relationship between subdivision and beat isochrony.
The y-axis plots percentage change in subdivision duration between two beats (usu-
ally adjacent). A change of 0% means that there is perfect subdivision isochrony be-
tween them. If the subdivision duration changes by a large enough amount from one
beat to the next, the new duration might lock into a tuplet value. The lower line plots a
quadruple-to-triple shift, such as the one in Figure 6. The third beat from that phrase
is marked here with a circle. Along each tuplet line, a larger subdivision difference
goes with a smaller beat-size difference, at a rate set by the tuplet ratio. A point near either axis
bodes well for meter: we get little change either in subdivision or in beat size. Meter
breaks down when a point is roughly halfway along a tuplet line. Returning to our
unanswered question concerning allowable metric drift, we might propose a ±10%
threshold for either the subdivision or the beat. A deviation of more than 10% at both
levels debilitates the sense of meter. For instance, the circle on the upper triple-to-
duple line corresponds to the middle beat in Figure 8. All of the phrase’s five beats
are snug 3/16’s, except for the smaller third beat. If we treat it as a beat of 2/16 in
order to seek subdivision concordance with the surrounding beats, we encounter a
21% increase in subdivision duration. If we assign a tuplet (2-in-place-of-3) in order
to seek concordance at the beat level, we encounter a 20% drop in beat duration.
Where to turn? This beat may be lost in the land of no meter.
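The proposed ±10% rule can be stated as a predicate: meter survives if either the subdivision or the beat changes by no more than the tolerance. The `changes` helper, which derives both percentages from beat durations and subdivision counts, and the 600 ms and 480 ms beats in the example are hypothetical illustrations, not data from Figures 6 or 8.

```python
def changes(prev_beat_ms, prev_count, beat_ms, count):
    """Percentage change in subdivision size and in beat size between two beats."""
    sub_change = (beat_ms / count) / (prev_beat_ms / prev_count) * 100 - 100
    beat_change = beat_ms / prev_beat_ms * 100 - 100
    return sub_change, beat_change


def meter_holds(sub_change, beat_change, tolerance=10.0):
    """Proposed rule: meter survives unless BOTH levels deviate by more than the tolerance."""
    return abs(sub_change) <= tolerance or abs(beat_change) <= tolerance


# Hypothetical 2-in-place-of-3 case, similar in shape to Figure 8:
# a 600 ms beat of three 200 ms sixteenths followed by a 480 ms beat of two.
sub, beat = changes(600, 3, 480, 2)
print(f"sub {sub:+.1f}%, beat {beat:+.1f}%, metric: {meter_holds(sub, beat)}")

# The Figure 8 values quoted in the text fail the test at both levels.
print(meter_holds(21, -20))
```

A Figure 6-style case (subdivisions nearly 30% longer but the beat within 5%) passes, because the beat level absorbs the change.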
Fig. 8. Noam Chomsky reading from [15]. Of the two boxed-in metrically feasible options,
neither one complies with the surrounding isochrony.
Figure 7 included only the quadrant where subdivision size increases, beat size de-
creases, and the tuplet’s numerator is smaller than its denominator. Figure 9 zooms
out to reveal all possible combinations. Five tuplet lines (and their reciprocals) are
shown: 1:2, 4:7, 2:3, 3:4, and 4:5.⁵ A tuplet is most useful near the point where its line crosses the y-axis,
where it yields little or no change in beat size. For instance, suppose that a beat with
100-ms subdivisions is followed by another containing 150-ms subdivisions, a 50%
increase (lower dashed line). We can model this deviation according to different tu-
plet frameworks, each yielding a different amount of error. With a 4:5 tuplet (circle)
we get a beat that is 20% too big. The 2:3 tuplet (triangle) gives the right fit.
For every subdivision change, there is a corresponding tuplet configuration that
provides a perfect fit in beat size. The catch, of course, is that only a small handful of
tuplet ratios are user-friendly, and only one of these will match our desired subdivision
count given the syllables in the foot. For instance, a 70% subdivision increase
(upper dashed line in Fig. 9) can be counterbalanced by a 10:17 tuplet. Clearly this is
not a viable metric solution. What other, more reasonable fractions cross the 70%
horizontal coordinate? The 4:7 tuplet (square) offers an appealing beat difference of
−4%; also workable is the 2:3 tuplet (diamond), although at 12% its beat difference is
significantly larger. Figure 10 illustrates how these two tuplet options might work in
different phrases with hypothetical (but feasible) timing patterns.
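Each tuplet line in Figures 7 and 9 follows from simple proportions. If n subdivisions, each x% longer, take the place of m subdivisions of the previous beat, the beat changes by n(100 + x)/m − 100 percent; solving for x gives a line of slope m/n, as footnote 5 states. The function names below are illustrative. Under this model the 50% examples come out exactly as described (20% for 4:5, 0% for 2:3), and 10:17 cancels a 70% increase; for the 70% case it gives about −2.9% for 4:7 and +13.3% for 2:3, a little different from the −4% and 12% quoted in the text.

```python
from fractions import Fraction


def tuplet_beat_change(sub_change_pct, n, m):
    """Beat-size change (%) when n subdivisions, each sub_change_pct percent
    longer, take the place of m subdivisions of the previous beat (n:m tuplet)."""
    return n * (100 + sub_change_pct) / m - 100


def perfect_tuplet(sub_change_pct):
    """The n:m ratio (as a reduced fraction) that leaves the beat size unchanged.
    sub_change_pct must be an integer percentage for Fraction to accept it."""
    return Fraction(100, 100 + sub_change_pct)


for label, (n, m) in {"4:5": (4, 5), "2:3": (2, 3)}.items():
    print(f"+50% subdivisions, {label}: beat {tuplet_beat_change(50, n, m):+.1f}%")
for label, (n, m) in {"4:7": (4, 7), "2:3": (2, 3)}.items():
    print(f"+70% subdivisions, {label}: beat {tuplet_beat_change(70, n, m):+.1f}%")
print("exact fit for +70%:", perfect_tuplet(70))  # prints 10/17
```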
5
The reciprocal of the tuplet ratio equals the slope of the line; for instance, the 2:3 tuplet line
has a slope of 1.5. Only ratios between 1.0 and 2.0 (and their reciprocals) are given. I include
1:2 (and 2:1) to provide a visual frame, even though this ratio is not generally thought of as a
tuplet.
[Fig. 10 panels: in both phrases the sixteenth grows from 120 to 204 ms (sub +70%); beat change −4% (top) and +12% (bottom)]
Fig. 10. The number of subdivisions in the beat helps determine which tuplet form is most
appropriate. The top and bottom phrases correspond to the square and diamond in Figure 9,
respectively. Both phrases undergo the same increase in subdivision duration (70%), resulting
in different beat size deviations depending on the tuplet count.
6 Conclusion
Rather than lead us to a yes-or-no decision on the metrical status of a given speech
pattern, the various approaches described above suggest a more nuanced view that
weighs beat proximity, degree of isochrony within the beat, type of isochrony be-
tween beats, and magnitude of deviation. My future work will integrate these
approaches into a model that can compute concordance scores for different metric
solutions of a given speech sequence. The results could be useful for composers,
whose speech-based works are guided by musical intuition; music theorists, who have
no duration-based tools for comparing a text set to music with its spoken form; and
popular music scholars, who have noted speech-like rhythms in jazz and blues music.
References
1. Patel, A.D.: Music, Language, and the Brain. Oxford University Press, New York (2008)
2. Weismiller, E.R.: Triple Threats to Duple Rhythm. In: Kiparsky, P., Youmans, G. (eds.)
Phonetics and Phonology: Rhythm and Meter, pp. 261–290. Academic Press, San Diego
(1989)
3. Goldsmith, J.A.: Autosegmental Metrical Phonology. Basil Blackwell, Oxford (1990)
4. Jacewicz, E., Fox, R.A., Salmons, J.: Vowel Duration in Three American Dialects.
American Speech 82(4), 367–385 (2007)
5. Huron, D., Ollen, J.: Agogic Contrast in French and English Themes: Further Support for
Patel and Daniele. Music Perception 21, 267–271 (2003)
6. Campbell, N.: Timing in Speech: A Multi-Level Process. In: Horne, M. (ed.) Prosody:
Theory and Experiment: Studies Presented to Gösta Bruce, pp. 281–334. Kluwer
Academic Publishers, Dordrecht (2000)
7. Abercrombie, D.: Syllable Quantity and Enclitics in English. In: Abercrombie, D., Fry,
D.B., MacCarthy, P.A.D., Scott, N.C., Trim, J.L.M. (eds.) In Honour of Daniel Jones, pp.
216–222. Longmans, London (1964)
8. Desain, P., Honing, H.: The Formation of Rhythmic Categories and Metric Priming.
Perception 32, 341–365 (2003)
9. London, J.: Hearing in Time: Psychological Aspects of Musical Meter. Oxford University
Press, New York (2004)
10. Povel, D.J.: Internal Representation of Simple Temporal Patterns. Journal of Experimental
Psychology: Human Perception and Performance 7(1), 3–18 (1981)
11. ten Hoopen, G., Sasaki, T., Nakajima, Y., Remijn, G., Massier, B., Rhebergen, K.,
Holleman, W.: Time-Shrinking and Categorical Temporal Ratio Perception: Evidence for a 1:1
Temporal Category. Music Perception 24(1), 1–22 (2006)
12. Pinker, S.: The Language Instinct (Audiobook). Orion Publishing Group (2001)
13. Dauer, R.M.: Stress-Timing and Syllable-Timing Reanalyzed. Journal of Phonetics 11,
51–62 (1983)
14. Searle, J.: The Philosophy of Mind (Audiobook). The Teaching Company (1995)
15. Chomsky, N.: The Emerging Framework of World Power (CD). AK Press (2003)
Temporal Patterns in Polyphony
Mathieu Bergeron and Darrell Conklin
Department of Computing
City University London
{bergeron,conklin}@soi.city.ac.uk
1 Motivation
Polyphony forms a large part of the Western musical heritage, and its essence
(multiple concurrent streams of musical events, with the temporal relations this
implies) is encountered in most kinds of modern music. However,
there are few computational approaches for the expression and efficient matching
of polyphonic patterns. This paper formally compares the expressiveness of three
such languages and proposes a new one, establishing the hierarchy of Figure 1. To
simplify the presentation, arguments are restricted to patterns containing only
two voices; the results may, however, be generalized to denser polyphonic textures.
As a motivating example, consider the two-voice suspension of Figure 2. This
typical polyphonic pattern is expressed in Figure 3 in three languages: R (relational
patterns), Humdrum (H) and SPP (Structured Polyphonic Patterns). As illustrated
by the R expression (Figure 3i), even this simple pattern requires sophistication:
variables to be instantiated by three events; inequality statements ensuring that
the mapping from variables to events is injective; temporal relations between
events (discussed below); and pitch relations such as consonance and dissonance.
This paper restricts its attention to the following binary temporal relations:
i) m(a, b) (a meets b: a finishes when b begins), ii) the symmetric st(a, b) (a and b
start together), iii) sw(a, b) (a starts while b is sounding) and iv) the symmetric
ov(a, b) (a and b overlap: they sound together at some point in time). Figure 4
restates the relations in the notation of Allen [1] and Figure 2ii illustrates their
musical relevance.
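The four relations can be read as predicates on events represented as (start, end) pairs. This is a minimal sketch: the choice of strict versus non-strict inequalities at the boundaries is an assumption made so that st, sw and m stay distinct, and the timings in the example are hypothetical.

```python
# Events are (start, end) pairs with start < end, in any time unit.


def m(a, b):
    """a meets b: a finishes exactly when b begins."""
    return a[1] == b[0]


def st(a, b):
    """a and b start together (symmetric)."""
    return a[0] == b[0]


def sw(a, b):
    """a starts while b is sounding: after b's onset and before b's offset."""
    return b[0] < a[0] < b[1]


def ov(a, b):
    """a and b sound together at some point (symmetric); merely meeting does not count."""
    return a[0] < b[1] and b[0] < a[1]


# Hypothetical timings (in beats) for the suspension of Figure 2:
# a = tied note, b = its resolution, e = a note in the other voice entering while a is held.
a, b, e = (0, 3), (3, 4), (2, 4)
print(sw(e, a), sw(b, e), m(a, b))  # the three relations of Example 1
```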
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 32–42, 2009.
© Springer-Verlag Berlin Heidelberg 2009
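[Fig. 1: diagram of the language hierarchy (R, H, SPP, SPPseq) locating the example patterns dislocated chord, embellished tritone resolution, suspension and layered passing tones]
[Fig. 2 panels (i) and (ii): in (ii), a piano roll (pitch against time) of events a to e, spanning F2 to F4, with the relations m, sw and st marked]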
Fig. 2. (i) A 4-3 suspension between bass and alto voices in bars 16-17 of Bach’s chorale
BWV 283 and (ii) a piano-roll representation of the alto and bass voices
Fig. 3. A suspension pattern using (i) relations; (ii) Humdrum and (iii) SPP
ov(a, b) covers all the cases above (and their inverses), except a m b
Fig. 4. The temporal relations analyzed in this paper, expressed in the notation of
Allen [1]
Fig. 5. Dislocated V7 chords captured by Pattern 1 : BWV 284 bar 15 (i) and BWV
318 bar 13 (ii)
consonance and dissonance with opposite voices, and satisfying the sw temporal
relation. SPP is further elaborated in Section 4.
2 Relational Patterns
A relational pattern r is simply a set of temporal relations over event variables ε:
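Definition 1.
$$r \in R ::= \omega, \ldots, \omega \qquad \text{with} \qquad \omega ::= \mathrm{m}(\varepsilon, \varepsilon) \mid \mathrm{ov}(\varepsilon, \varepsilon) \mid \mathrm{st}(\varepsilon, \varepsilon) \mid \mathrm{sw}(\varepsilon, \varepsilon)$$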
With appropriate pitch relations, the pattern below could represent the “dis-
located” V7 chords shown in Figure 5. It enforces that chord tones eventually
overlap with the root of the chord, but no other temporal relation is enforced:
Pattern 1. ov(a, b), ov(a, c), ov(a, d)
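Matching a relational pattern amounts to finding injective assignments of its variables to events that satisfy every relation. The sketch below is brute force (exponential in the number of variables, so only suitable for small examples); the interval readings of the relations, the function name `match` and the event timings are assumptions for illustration.

```python
from itertools import permutations

# Interval readings of the four relations; events are (start, end) pairs.
RELATIONS = {
    "m": lambda x, y: x[1] == y[0],
    "st": lambda x, y: x[0] == y[0],
    "sw": lambda x, y: y[0] < x[0] < y[1],
    "ov": lambda x, y: x[0] < y[1] and y[0] < x[1],
}


def match(pattern, events):
    """All injective assignments {variable: event index} satisfying the pattern.

    pattern: list of (relation, variable, variable) triples
    events:  list of (start, end) pairs
    """
    variables = sorted({v for _, x, y in pattern for v in (x, y)})
    found = []
    # permutations() never repeats an index, which enforces injectivity.
    for chosen in permutations(range(len(events)), len(variables)):
        env = dict(zip(variables, chosen))
        if all(RELATIONS[r](events[env[x]], events[env[y]]) for r, x, y in pattern):
            found.append(env)
    return found


pattern_1 = [("ov", "a", "b"), ("ov", "a", "c"), ("ov", "a", "d")]
events = [(0, 4), (1, 3), (2, 5), (3, 4), (5, 6)]  # hypothetical timings
for env in match(pattern_1, events)[:3]:
    print(env)
```

Because Pattern 1 is symmetric in b, c and d, every admissible root yields all orderings of its three overlapping partners; a practical matcher would canonicalize such symmetric solutions.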
3 Humdrum
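In contrast to R, temporal relations in H are specified indirectly via a token
matrix:
Definition 2.
$$h \in H ::= \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \\ \vdots & \vdots \end{bmatrix} \qquad \text{with} \qquad h_{ij} ::= \varepsilon \;\mid\; (\varepsilon) \;\mid\; \text{(don't-care token)}$$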
The token ε refers to the beginning of a new event; the token (ε) is the continu-
ation of the preceding event and the token is the special “don’t care” symbol
36 M. Bergeron and D. Conklin
that enforces no temporal relation. Note that time “flows” from top to bottom
in Humdrum, e.g. the token h11 is followed by the token h21. An H pattern is
interpreted as follows with respect to the temporal relations it enforces:
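Lines (two tokens in the same row)
  H            R
  a  b         st(a, b)
  (a)  b       sw(b, a)
  a  (b)       sw(a, b)
  (a)  (b)     ov(a, b)
  any pair with a don't-care token   ∅
Columns (a token directly above another in the same column)
  H              R
  a above b      m(a, b)
  (a) above b    m(a, b)
  a above (a)    ∅
  any pair with a don't-care token   ∅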
The Humdrum pattern of Figure 3ii can be simplified to the following (variable
names correspond to those of Figure 2ii):
Example 1.
$$\begin{matrix} e & (a) \\ (e) & b \end{matrix} \qquad \mathrm{sw}(e, a),\ \mathrm{sw}(b, e),\ \mathrm{m}(a, b)$$
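The Lines/Columns table can be turned into a small interpreter for token matrices. This sketch assumes the don't-care token is spelled "." (a hypothetical spelling) and only relates vertically adjacent tokens in a column; run on Example 1 it recovers exactly sw(e, a), sw(b, e) and m(a, b).

```python
DONT_CARE = "."  # hypothetical spelling of the don't-care token


def parse(token):
    """Return (variable, is_continuation), or None for the don't-care token."""
    if token == DONT_CARE:
        return None
    if token.startswith("(") and token.endswith(")"):
        return token[1:-1], True
    return token, False


def line_relation(left, right):
    """Relation enforced by two parsed tokens on the same line (time slice)."""
    (x, x_cont), (y, y_cont) = left, right
    if not x_cont and not y_cont:
        return ("st", x, y)        # both events begin here
    if x_cont and not y_cont:
        return ("sw", y, x)        # y begins while x is sounding
    if not x_cont and y_cont:
        return ("sw", x, y)        # x begins while y is sounding
    return ("ov", x, y)            # both events are already sounding


def humdrum_relations(matrix):
    """Temporal relations enforced by a token matrix (rows = successive time slices)."""
    relations = set()
    for row in matrix:
        tokens = [parse(t) for t in row]
        for j in range(len(tokens)):
            for k in range(j + 1, len(tokens)):
                if tokens[j] is not None and tokens[k] is not None:
                    relations.add(line_relation(tokens[j], tokens[k]))
    for upper, lower in zip(matrix, matrix[1:]):
        for top, bottom in zip(upper, lower):
            above, below = parse(top), parse(bottom)
            # A new (non-continuation) event below meets the event above it.
            if above is not None and below is not None and not below[1] and above[0] != below[0]:
                relations.add(("m", above[0], below[0]))
    return relations


example_1 = [["e", "(a)"],
             ["(e)", "b"]]
print(sorted(humdrum_relations(example_1)))
```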
Fig. 6. Temporal networks enforced by (i) Pattern 1, (ii) Example 1, (iii) Example 2
(the dashed edge is implied by the network), (iv) Pattern 2 and (v) Pattern 3. The
network (iv) can be represented in SPP but not in Humdrum. The network (v) can
be represented in Humdrum but not in SPP.
Patterns in SPP are defined according to the syntax below, where ε stands for
an event and −ε for a modified event (when layered, modified events start earlier
than other events in the layer); the “;” operator joins two patterns in sequence
(such that one finishes as the other starts) and the “==” operator layers two
patterns (such that both start at the same time):
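Definition 3.
$$\varphi \in \mathrm{SPP} ::= \varepsilon \;\mid\; -\varepsilon \;\mid\; \varphi;\varphi \;\mid\; \begin{array}{c} \varphi \\ \varphi \end{array}$$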
When ignoring pitch relations, the suspension example of Figure 3iii is sim-
plified to the following pattern (also Figure 6iii):
Fig. 7. Layered passing tones captured by Pattern 2: BWV 255 bar 2 (i) and BWV
320 bar 19 (ii)
Claim 1. SPP ⊈ H: there exists at least one pattern in SPP that has no
equivalent in H.
Consider the following SPP pattern (also Figure 6iv):
All of the above patterns enforce an additional temporal relation that is not
enforced by the SPP pattern, respectively st(b, d), sw(d, a) and sw(b, c).
Fig. 8. Tritone resolutions captured by Pattern 3: BWV 257 bar 2 (i) and BWV 315
bar 13 (ii)
Claim 2. H ⊈ SPP: there exists at least one pattern in H that has no equivalent
in SPP.
By similar arguments, one can prove that the dislocated chord pattern (Pattern 1
and Figure 5) cannot be represented in either Humdrum or SPP. This explains
its place in the language hierarchy of Figure 1. This is also why the figure shows
that R properly subsumes H and SPP.
Characterizing the intersection between Humdrum and SPP, the pattern lan-
guage SPPseq restricts SPP to sequences of layers:
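$$\begin{array}{c} a \\ c \end{array} \qquad \begin{array}{c} -a \\ c \end{array} \qquad \begin{array}{c} a \\ -c \end{array} \qquad \begin{array}{c} -a \\ -c \end{array}$$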
Now, suppose there exists a pattern h ∈ H that covers the SPPseq pattern ϕ.
The induction cases are as follows (the case with two modified events −ε does
not appear: by definition of SPPseq, this is only allowed in the first layer):
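$$\varphi;\begin{array}{c} a \\ c \end{array} \qquad \varphi;\begin{array}{c} -a \\ c \end{array} \qquad \varphi;\begin{array}{c} a \\ -c \end{array}$$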
The last two cases enforce an extra temporal relation (respectively sw(a, h_n2)
and sw(c, h_n1)) that the SPPseq pattern does not enforce. However, that relation
can be inferred from the temporal relations that the SPPseq pattern does enforce.
That is, referring back to Figure 2ii, whenever the relations m(a, b), m(d, e),
sw(b, e) and ov(a, d) are present, sw(e, a) can be inferred: e begins when d ends,
which is after a begins (because a and d overlap), and before b begins, which is
exactly when a ends. This inference is also indicated in Figure 6iii by a dashed edge.
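The inference can also be checked mechanically. The relations only compare endpoints, so only their relative order matters, and with m(a, b) and m(d, e) fixing two equalities the five events have at most six distinct endpoints. An exhaustive search over integer endpoints in range(6) or larger therefore covers every case. A minimal sketch, using the same interval readings of sw and ov assumed earlier:

```python
from itertools import combinations


def sw(x, y):
    """x starts while y is sounding."""
    return y[0] < x[0] < y[1]


def ov(x, y):
    """x and y sound together at some point."""
    return x[0] < y[1] and y[0] < x[1]


def check_inference(horizon=8):
    """Exhaustively test that m(a,b), m(d,e), sw(b,e), ov(a,d) imply sw(e,a)
    over all events with integer endpoints in range(horizon).

    Returns (holds, number of configurations satisfying the premises)."""
    points = range(horizon)
    intervals = list(combinations(points, 2))  # every (start, end) with start < end
    checked = 0
    for a in intervals:
        for d in intervals:
            if not ov(a, d):
                continue
            for b_end in points:
                for e_end in points:
                    if b_end <= a[1] or e_end <= d[1]:
                        continue
                    b = (a[1], b_end)  # m(a, b) holds by construction
                    e = (d[1], e_end)  # m(d, e) holds by construction
                    if sw(b, e):
                        checked += 1
                        if not sw(e, a):
                            return False, checked
    return True, checked


print(check_inference())
```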
7 Discussion
This paper has presented three approaches that can accurately represent net-
works of temporal relations. Alternative approaches to polyphonic patterns often
lack that accuracy. For example, vertical patterns [3] can only match polyphonic
sources that have been expanded and sliced to yield a homophonic texture, hence
not supporting the sw relation. A point set pattern representation [6] can only
encode temporal relations with fixed duration ratios (capturing every instance of
a sw relation would require a set of patterns, the size of which can grow quickly
as many different ratios are likely to be found in the source). Techniques that
rely on approximate matching to a source fragment [7] can confuse simultane-
ous notes with notes that overlap without being simultaneous, hence lacking
precision with respect to the st relation.
With a little practice, the musicologist should find it easy to write SPP patterns,
in contrast to Humdrum, which requires extensive knowledge of Unix command-line
and regular expression tools. Relational patterns tend to be verbose, and one
quickly loses sight of the overall temporal structure of the pattern, whereas that
structure is expressed syntactically in SPP. In Humdrum, the structure is readable
when using the matrix form developed in this paper. However, negations and
disjunctions, which can in principle appear in the regular expressions of a Humdrum
pattern, are not supported.
Finally, notice that R can express a great many temporal networks with un-
clear musical relevance (e.g. sw(a, b), sw(b, c)) and even networks that are unsat-
isfiable (e.g. m(a, b), st(a, b)). Perhaps there exists a restriction of R to “common
sense” musical patterns. Ideally, such a restriction would preserve most of R’s
expressiveness, while being conducive to efficient pattern matching algorithms.
Both Humdrum and SPP are candidate restrictions, yielding relational graphs
that are always satisfiable. The graphs are also always connected and perhaps
this connectedness is an interesting avenue to explore for future research. In par-
allel, a website with tools and tutorials is being developed in an effort to make
the languages presented in this paper more easily applicable to musicological
tasks.
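Satisfiability of an R pattern can be decided with the same order argument: with n events there are at most 2n distinct endpoints, so a search over integer endpoints in range(2n) is exhaustive. The brute-force sketch below (hypothetical function name, same assumed interval readings as before) confirms that m(a, b), st(a, b) has no model while sw(a, b), sw(b, c) does.

```python
from itertools import combinations, product

RELATIONS = {
    "m": lambda x, y: x[1] == y[0],
    "st": lambda x, y: x[0] == y[0],
    "sw": lambda x, y: y[0] < x[0] < y[1],
    "ov": lambda x, y: x[0] < y[1] and y[0] < x[1],
}


def find_model(pattern):
    """Return {variable: (start, end)} satisfying every relation, or None.

    Only the relative order of endpoints matters, so integer endpoints in
    range(2n) for n variables are enough to make the search exhaustive."""
    variables = sorted({v for _, x, y in pattern for v in (x, y)})
    intervals = list(combinations(range(2 * len(variables)), 2))  # start < end
    for assignment in product(intervals, repeat=len(variables)):
        env = dict(zip(variables, assignment))
        if all(RELATIONS[r](env[x], env[y]) for r, x, y in pattern):
            return env
    return None


print(find_model([("m", "a", "b"), ("st", "a", "b")]))   # unsatisfiable
print(find_model([("sw", "a", "b"), ("sw", "b", "c")]))  # satisfiable
```

The search grows quickly with the number of events, so it is only a checking aid for small patterns, not a substitute for the restricted languages discussed above.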
References
1. Allen, J.F.: Maintaining knowledge about temporal intervals. Communications of
the ACM 26(11), 832–843 (1983)
2. Bergeron, M., Conklin, D.: Structured polyphonic patterns. In: Ninth International
Conference on Music Information Retrieval, Philadelphia, USA, pp. 69–74 (2008)
3. Conklin, D.: Representation and discovery of vertical patterns in music. In: Anagnostopoulou,
C., Ferrand, M., Smaill, A. (eds.) ICMAI 2002. LNCS (LNAI), vol. 2445,
pp. 32–42. Springer, Heidelberg (2002)
4. Fitsioris, G., Conklin, D.: Parallel successions of perfect fifths in the Bach chorales.
In: Fourth Conference on Interdisciplinary Musicology, Thessaloniki, Greece (2008)
5. Jan, S.: Meme hunting with the Humdrum toolkit: Principles, problems, and
prospects. Computer Music Journal 28(4), 68–84 (2004)
6. Meredith, D., Lemström, K., Wiggins, G.A.: Algorithms for discovering repeated
patterns in multidimensional representations of polyphonic music. Journal of New
Music Research 31(4), 321–345 (2002)
7. Typke, R., Veltkamp, R.C., Wiering, F.: Searching notated polyphonic music using
transportation distances. In: ACM Multimedia Conference, New York, USA,
October 2004, pp. 128–135 (2004)
Maximally Smooth Diatonic Trichord Cycles
Steven Cannon*
Abstract. In the usual seven-note diatonic scale, the maximally smooth cycle of
triads contains a long section that uses only major and minor triads, the same
triads that form maximally smooth cycles within the twelve-note chromatic
scale. Tonal music exploits this property of the scale to create sequences of
similar chords. The goal of this study is to determine the extent to which such
long chains containing inversionally related species exist in maximally smooth
trichord cycles within microtonal scale systems that share certain properties
with the diatonic. The study thus combines neo-Riemannian theory, especially
Cohn’s concept of maximally smooth cycles, with the diatonic scale theory de-
veloped by Clough and other authors. The patterns of maximally smooth
trichord cycles depend on the type of scale within which they occur, and on the
cardinalities of the scales. Among all scales, the usual diatonic supports the
longest possible chain.
1 Introduction
An important feature of tonal music is its affinity for smooth motion from one har-
mony to another, maximizing common tones and minimizing melodic motion. In the
diatonic scale, triads take part in such motion easily. The thrift of triadic chord pro-
gressions has led Richard Cohn to characterize the major and minor triads as parsi-
monious trichords [1]. Generally, theorists discuss the progressions in a chromatic
context, but neo-Riemannian concepts are useful even within the limits of the diatonic
set. To better understand how triads fit within the familiar diatonic, I will explain how
they behave in scales from microtonal universes that use equal divisions of the octave
other than the usual twelve. To my knowledge, this intersection of scale theory and
neo-Riemannian theory has not yet been explored in detail in microtonal settings.
While a number of essays in the recent collection Music Theory and Mathematics:
Chords, Collections, and Transformations do bridge the gap between scale theory and
neo-Riemannian theory, they mostly stay within the usual 12-edo [2].
*
The author gratefully acknowledges the financial support of the Social Sciences and Humani-
ties Research Council of Canada.
1
To save space, I use common music-analytical symbols to indicate the qualities of chords: “+”
for major, “−” for minor, and “°” for diminished.
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 43–56, 2009.
© Springer-Verlag Berlin Heidelberg 2009
long chain of alternating major and minor triads, two species that are related by inver-
sion and thus, in a sense, equivalent. The major and minor triads are “parsimonious
trichords” within the larger chromatic universe, but the diminished triad is not.
Cohn further explains how to find parsimonious sets within other chromatic uni-
verses when he notes that major and minor triads (as well as half-diminished and
dominant sevenths) represent “minimal perturbations of a symmetrical division of the
octave” (see [2], p. 39, n. 40). In scale systems where the cardinality of the chromatic
universe is a multiple of three, the parsimonious trichord will be this kind of minimal
perturbation. In systems where the cardinality of the chromatic set is not a multiple of
three, the parsimonious trichord will simply be the closest approximation of an equal
division of the chromatic scale. Since the generic triad is the closest approximation of
an equal division of the diatonic scale into three, it is what Clough and Douthett call
“maximally even,” or “ME” within the diatonic. The diatonic scale is itself a maximally
even distribution of seven notes within the twelve chromatic scale steps, so the
triad is “second-order ME” [4]. In addition, it is a “generated” set, whose generator
has a diatonic length (dlen) of two scale steps, or a third.2
3 Useful Scales
While it is clear enough that parsimonious trichords exist in all chromatic universes,
how they fit in scales embedded within these universes has not yet been examined in
much detail. Before proceeding any further, however, it is first necessary to decide
which of the many possible scales we should examine. Clough and Myerson have de-
scribed important features of the usual diatonic, including cardinality equals variety
(CV) and partitioning [5]. In scales that share such features, the cardinalities of the
chromatic universe (c for short) and the scale (d for short) relate in one of the
following two ways:
$$c = 2d - 1 \tag{1}$$
or, provided that d is odd,
$$c = 2d - 2 . \tag{2}$$
2
The abbreviations “dlen” and “clen” are also from Clough and Douthett.
Eytan Agmon was the first to describe these classes of scales [6], which Clough and
Douthett later named “family A” and “family B” respectively. Clough and Douthett
call scales in family B “diatonic,” setting the familiar 7-out-of-12 scale aside with the
term “usual diatonic”; this usage has become standard in current literature. No useful
adjective has been coined yet for the first class of scale, which most theorists (proba-
bly correctly) consider less interesting, so I adopt Clough and Douthett’s term and say
that these scales are members of “family A.” I use the terms “family B scale” and
“diatonic scale” interchangeably. I do, however, also occasionally use the term “dia-
tonic” more loosely: the terms “maximally smooth diatonic cycle” and “diatonic
length” (or “dlen”) apply to both family A and family B scales. In scales of family A,
c is always odd, and the scale generator (gd for short) is always clen 2. The clen of gd
in diatonic scales is always equal to the value of d, and c is always a multiple of 4.
Notably, chromatic universes in which c is even but not divisible by four (that is, 6,
10, 14, 18, 22, etc.) do not contain any useful scales. In Clough, Engebretsen, and Ko-
chavi’s taxonomy, members of F-set 1 and F-set 2 are both diatonic, and the
complements of these scales fall into F-set 5; family A scales are members of what
Clough and colleagues call F-set 4, which also includes the complements of family A
scales [7]. I have omitted Clough and colleagues’ F-set 3, which contains a type of
scale first described by Gerald Balzano [8], because at higher cardinalities of d,
trichords that are parsimonious within the full chromatic set are not subsets of the
smaller scale.
I borrow several terms from the usual diatonic when describing scales in family A
and family B. I call all clen 2 intervals “whole-tones” and all clen 1 intervals “semi-
tones,” regardless of the real size of these intervals. Family B scales have two semi-
tones, separated as much as possible by whole tones. I call the highest note in the lar-
ger series of whole-tones the leading-tone, and the following note the tonic. I also
number scale-degrees in ascending order starting with the tonic as 1. The lowest note
in the larger series of whole-tones is the original note in the generating cycle—F in
the usual diatonic with no key signature—which has a semitone below it. I call this
note the “origin.” Figure 2 gives an example of a diatonic scale. Each circle in the
clock-face represents a note in the chromatic scale, with the notes of the diatonic scale
blacked in. Scales in family A contain but one scale-step of clen 1, with all other
scale-steps having clen 2. I consider the note directly above this single semitone to be
the tonic, and the note below it to be the leading-tone. See Figure 3 for an example of
a family A scale. In family A, scale-degree 1 is also the origin, whereas in family B scale-degree 1 is the first generated note after
the origin. To save space, I use a shorthand resembling a fraction to indicate
the cardinalities of scales: the number before the slash indicates the value of d,
and the number after the slash is the value of c. The scale is always a maximally even
distribution of the smaller value within the larger.
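Equations (1) and (2) determine which scale cardinalities a given chromatic universe admits. The sketch below lists them and checks the step structure described above (a single clen-1 step in family A, two in family B). It builds each scale with the maximally even formula floor(k·c/d); the particular rotation this produces is an assumption, but it does not affect the step counts. Function names are illustrative.

```python
def me_scale(d, c):
    """A maximally even d-note subset of a c-note universe: floor(k*c/d)."""
    return [(k * c) // d for k in range(d)]


def useful_scales(c):
    """(family, d) pairs that a c-note universe admits under eqs. (1) and (2)."""
    found = []
    if c % 2 == 1:                 # c = 2d - 1 (family A), any d
        found.append(("A", (c + 1) // 2))
    if c % 4 == 0:                 # c = 2d - 2 with d odd (family B)
        found.append(("B", c // 2 + 1))
    return found


def semitone_count(d, c):
    """Number of clen-1 steps in the maximally even d/c scale."""
    scale = me_scale(d, c)
    steps = [(scale[(i + 1) % d] - scale[i]) % c for i in range(d)]
    return steps.count(1)


for c in range(7, 17):
    for family, d in useful_scales(c):
        print(f"{d}/{c}: family {family}, {semitone_count(d, c)} semitone(s)")
```

Universes with c = 10 and c = 14 print nothing, matching the observation that even c not divisible by four admits no useful scales.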
I have chosen to focus this study on family A and B in order to give these scale
types thorough treatment, but the results are also applicable to the complements of
such scales. Specifically, the pattern of the maximally smooth cycle for the second-
order ME trichord is the same for a family A scale with a given value for d as it is for
another set with the same value for d that is the complement of a family A scale. This
is also true for family B scales and their complements. For example, the cycles in 8/15
and 8/17 share a common structure. Similarly, the structure of the cycle in the usual
7/12 diatonic, shown in Figure 1, also appears in the 7/16 “hyperpentatonic.”
are augmented in some scales (two adjacent major gt’s) but diminished in others (two
adjacent minor gt’s), so I use the term “symmetrical” (abbreviated as ⓢ) rather than
diminished or augmented.
To determine the structure of maximally smooth diatonic trichord cycles, we
must first know how many chords of each species exist in the scale, and how they
relate to each other. Fortunately, Clough and Myerson provide methods for finding
this information quickly. An important feature implied by MP and CV is structure
yields multiplicity (SM). According to Clough and Myerson, “within a particular
genus, the number of chords in each species … is directly inferable from the generic
structure” (p. 250). This structure must be given in terms of the scale’s generator
(gd), as measured in generic scale steps. That is, we can measure any interval not
only in clen and dlen, but also by the number of gd intervals it would take to gener-
ate it (gdlen for short).3 When calculating gdlen, we must always count in generic,
rather than specific gd intervals, since specific intervals would give us different val-
ues for the two species of gt. The notes of any diatonic scale form a gd cycle, which
we know as the cycle of fifths in the usual diatonic scale. In these diatonic scales,
gdlen = 2(dlen); this doubling often forces us to octave-reduce the results, or at least
to count shortest distance as a descending interval rather than ascending (this is
equivalent to taking the complementary value within the modulus). In family A,
gdlen is the same as dlen, so no translation is necessary, and the gd cycle is identical
to the scale itself.
To calculate the multiplicities of second-order ME trichord species we can start by
finding their generators as measured in gdlen. These chords divide the diatonic scale
as evenly as possible, so to find the generator ge (measured in dlen) of any second-
order ME chord of any cardinality e, we simply divide the cardinality of the scale by
the cardinality of the chord, and round to the nearest integer:
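$$g_e = \left[\frac{d}{e}\right], \tag{3}$$
where $[\,\cdot\,]$ denotes rounding to the nearest integer. For trichords,
$$g_t = \left[\frac{d}{3}\right]. \tag{4}$$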
Once we know the dlen chord generator, we must then translate it into gdlen (only
necessary for family B scales) and generate the sets. Lastly we reduce the generated
set to within one modular cycle (mod d), and find the multiplicity of each species by
counting how many gd intervals lie between each pair of adjacent notes.
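The SM procedure just described can be sketched in a few lines. This is a minimal sketch, not the author's method: it assumes, as stated for family B above, that translating dlen into gdlen means doubling modulo d, and that family A needs no translation; the function names are hypothetical.

```python
def nearest_int_div(num, den):
    """Round num/den to the nearest integer (halves round up): the [d/e] of eq. (3)."""
    return (2 * num + den) // (2 * den)


def sm_gaps(d, family):
    """Sorted gd-cycle gaps between the three members of a second-order ME trichord.

    family "A": gdlen equals dlen, so the gd cycle is the scale itself.
    family "B": assumed translation gdlen = 2 * dlen, reduced mod d.
    By SM, the repeated gap is the multiplicity of the major (and minor) species,
    and the remaining gap is the multiplicity of the symmetrical species.
    """
    if d % 3 == 0:
        raise ValueError("d is a multiple of 3")
    g_t = nearest_int_div(d, 3)                  # eq. (4), measured in dlen
    chord_dlen = [k * g_t for k in range(3)]     # generate the trichord from degree 0
    factor = 2 if family == "B" else 1           # dlen -> gdlen
    pts = sorted((factor * p) % d for p in chord_dlen)
    gaps = [pts[1] - pts[0], pts[2] - pts[1], d - pts[2] + pts[0]]
    return sorted(gaps)


for d, fam in [(8, "A"), (10, "A"), (7, "B"), (13, "B")]:
    print(f"{d} ({fam}): {sm_gaps(d, fam)}")
```

For d = 13 in family B the gaps come out as 3, 5 and 5, the same values quoted for the gd cycle of the 13/24 scale below; for d = 7 they are 1, 3 and 3 (one diminished, three major, three minor triads).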
For trichords in family A scales, the value of gt in dlen also gives the multiplicities
of the major and minor species. To find this value, divide d by three and round to the
nearest integer, as in (4) above. The multiplicity of symmetrical species is what re-
mains once the major and minor trichords are deducted from d, as in (6) below.
3
Clough and Myerson have different ways of abbreviating the scale generator to show whether
it is measured in clen or dlen. In clen it is d′ and in dlen it is c′. They do not describe trichord
or tetrachord generators.
48 S. Cannon
$$\text{no. of symmetrical trichords} = d - 2\left[\frac{d}{3}\right] \tag{6}$$

$$\text{no. of major or minor trichords} = d - 2\left[\frac{d}{3}\right] \tag{8}$$

$$\text{no. of symmetrical trichords} = d - 2\left(d - 2\left[\frac{d}{3}\right]\right) = 4\left[\frac{d}{3}\right] - d \tag{9}$$
Inputting different values for d, we can discard those that are multiples of 3, since
scales of these cardinalities will not support maximally smooth diatonic cycles, and CV
will not hold for harmonic trichords (although it will hold for three-note melodic lines).
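Equations (6), (8) and (9) can be tabulated directly. Reading (8) and (9) as the family B counts is our interpretation, since the sentences introducing them are missing from this excerpt; it agrees with the 5, 5, 3 multiplicities quoted for the 13/24 scale and with the three major/minor pairs and two symmetrical chords of 8/15. The function names are hypothetical.

```python
def nearest_int_div(num, den):
    """[num/den]: nearest integer, halves rounding up."""
    return (2 * num + den) // (2 * den)


def trichord_counts(d, family):
    """(major, minor, symmetrical) counts of second-order ME trichords.

    Family A: major = minor = [d/3]; symmetrical from eq. (6).
    Family B (our reading): major = minor from eq. (8); symmetrical from eq. (9).
    """
    if d % 3 == 0:
        raise ValueError("cardinalities divisible by 3 are discarded")
    g = nearest_int_div(d, 3)
    if family == "A":
        major, symmetrical = g, d - 2 * g
    elif family == "B":
        major, symmetrical = d - 2 * g, 4 * g - d
    else:
        raise ValueError("family must be 'A' or 'B'")
    return major, major, symmetrical


candidates = [d for d in range(4, 15) if d % 3 != 0]   # multiples of 3 discarded
for d in candidates:
    print(d, trichord_counts(d, "A"), trichord_counts(d, "B"))
```

In every case the three counts add up to d, as they must, since each scale degree is the root of exactly one trichord.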
5 Trichord Cycles
The patterns of the maximally smooth trichord cycles depend on the family of the scale and on its cardinality d, so we will consider each case in turn with one example of a specific scale. If d ≡ 2 (mod 3) in a family A scale, the trichords have the following properties:
• The parsimonious trichord is asymmetrical, and will give the major and mi-
nor species.
• The number of symmetrical trichords is fewer than the number of major or
minor trichords.
See Figures 4 and 5 for illustrations of how trichords fit in this kind of scale. Each scale-degree number is accompanied by a symbol (+, −, or ⓢ) indicating the species of trichord generated from that scale-degree. The arrows within the clock-face trace
Maximally Smooth Diatonic Trichord Cycles 49
the minor trichord that has 8 as its root, which is parsimonious within the chromatic
scale and is not inversionally symmetrical.4 In these clock-face diagrams, the arrows
always indicate parsimonious trichords, which are different species in different scales.
We may now form a maximally smooth diatonic cycle in this universe, which is
analogous to the cycle in Figure 1: pairs of adjacent trichords maintain two common
tones, and the one voice that moves proceeds only by scale step. The roots along this cycle move by the interval gt, measured in dlen. The complete cycle is given in Figure 5.
Note that inversionally related major and minor trichords always come in pairs, with
the symmetrical trichords distributed as evenly as possible between these pairs. Since
there are three pairs of major and minor chords, but only two symmetrical chords, two
of the pairs must be adjacent. These adjacent pairs form a chain of four alternating
major and minor trichords in a row, which is the longest such chain in the cycle.
^5+, ^8−, ^3+, ^6−, ^1ⓢ, ^4+, ^7−, ^2ⓢ (longest chain: the first four chords)
Fig. 5. Maximally smooth diatonic trichord cycle in 8/15
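To check cycles like the one in Figure 5 mechanically, one can model a d/c scale as the maximally even set floor(c·k/d) and classify each trichord by comparing its two gt intervals in clen. This is a sketch under two assumptions: that the scales discussed are these maximally even sets, and that degree index 0 corresponds to the text's degree d (with that numbering the sketch reproduces the + markings of Figures 5 and 8). All names are hypothetical.

```python
from collections import Counter


def maximally_even(d, c):
    """Pitch classes floor(c*k/d), k = 0..d-1: a d-note maximally even set in c-edo."""
    return [(c * k) // d for k in range(d)]


def nearest_int_div(num, den):
    return (2 * num + den) // (2 * den)


def species(scale, c, root, g):
    """'+' (major), '-' (minor) or 's' (symmetrical) for the trichord on `root`.

    Compares the chromatic sizes of the lower and upper gt intervals:
    larger-then-smaller is major, smaller-then-larger is minor, equal is symmetrical.
    """
    d = len(scale)

    def clen(i, steps):
        return (scale[(i + steps) % d] - scale[i % d]) % c

    lower, upper = clen(root, g), clen(root + g, g)
    return "+" if lower > upper else "-" if lower < upper else "s"


def smooth_cycle(d, c, start=0):
    """Maximally smooth cycle: roots advance by gt (in dlen).

    gcd(gt, d) = 1 whenever d is not a multiple of 3, so every degree appears once.
    Degree index 0 is reported as degree d.
    """
    scale = maximally_even(d, c)
    g = nearest_int_div(d, 3)
    roots = [(start + k * g) % d for k in range(d)]
    return [(r or d, species(scale, c, r, g)) for r in roots]


def longest_alternating_chain(labels):
    """Longest circular run of chords alternating between '+' and '-'."""
    best = run = 0
    prev = None
    for lab in labels + labels:          # walk twice so runs may wrap around
        if lab == "s":
            run = 0
        elif run > 0 and lab != prev:
            run += 1
        else:
            run = 1
        best = max(best, run)
        prev = lab
    return min(best, len(labels))


cycle = smooth_cycle(8, 15, start=5)
labels = [lab for _, lab in cycle]
print(cycle)
print(Counter(labels), longest_alternating_chain(labels))
```

With d/c = 13/24 and start=10 the same functions give the root order of Figure 8 and a longest chain of four chords; for the usual 7/12 diatonic the longest chain has six.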
In family B scales where d ≡ 1 (mod 3), the situation is similar except that the gt in-
tervals must be translated from dlen into gdlen. Figure 6 shows a thirteen-note diatonic
4
Although 8 is the root inasmuch as it is the note from which the chord is generated, the sonority will sound to our ears as if 6 were the root, because the chord’s tuning is fairly similar to that of a minor triad in the usual diatonic.
scale in 24-edo. Figure 7 gives the cycle of gd for this scale. The arc using a dotted line between 13 and 7 indicates the short gd, equivalent to the diminished fifth between B and F in the usual diatonic.5 The leading tone always occurs just before the short gd, the origin immediately after, and, proceeding clockwise, the tonic appears next after the origin.
The directions of the arrows appear to change from clockwise in Figure 6 to counter-
clockwise in Figure 7 when the intervals are doubled to translate them from dlen to
gdlen. Figure 7 also demonstrates SM well. The multiplicity of each species is given by
the distance between the notes in the gd cycle: moving counter-clockwise from the root
of the chord, the distances are 5, 5, and 3, which match the multiplicities of
5
I follow Clough and Myerson, who also indicate the short fifth in this way.
^10−, ^1+, ^5−, ^9+, ^13ⓢ, ^4−, ^8+, ^12ⓢ, ^3−, ^7+, ^11−, ^2+, ^6ⓢ
Fig. 8. Maximally smooth diatonic trichord cycle in 13/24
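The gd cycle of Figure 7 and the SM distances can be computed in the same way. Here the generator is assumed to span (d + 1)/2 scale steps, the inverse of 2 modulo an odd d, which is our reading of gdlen = 2(dlen) for family B; the scale is again modelled as the maximally even set floor(c·k/d), and degree index 0 stands for degree 13. Names are hypothetical.

```python
def maximally_even(d, c):
    return [(c * k) // d for k in range(d)]


def nearest_int_div(num, den):
    return (2 * num + den) // (2 * den)


def gd_cycle(d, c):
    """Scale degrees in generator order, with the chromatic size of each gd.

    Assumption (family B, odd d): the generator spans (d + 1) // 2 scale steps,
    the inverse of 2 mod d, which is what makes gdlen = 2 * dlen (mod d).
    """
    scale = maximally_even(d, c)
    step = (d + 1) // 2
    order, sizes, deg = [], [], 0
    for _ in range(d):
        nxt = (deg + step) % d
        order.append(deg)
        sizes.append((scale[nxt] - scale[deg]) % c)
        deg = nxt
    return order, sizes


def short_generators(d, c):
    """(from, to) degree pairs joined by the smaller specific size of gd."""
    order, sizes = gd_cycle(d, c)
    small = min(sizes)
    return [(order[i], order[(i + 1) % d]) for i, s in enumerate(sizes) if s == small]


def ccw_distances(d, c, root):
    """SM: counter-clockwise gd-cycle distances root -> third -> fifth -> root."""
    order, _ = gd_cycle(d, c)
    g = nearest_int_div(d, 3)
    p0, p1, p2 = (order.index((root + k * g) % d) for k in range(3))
    return [(p0 - p1) % d, (p1 - p2) % d, (p2 - p0) % d]


print(short_generators(13, 24))   # degree index 0 stands for degree 13
print(ccw_distances(13, 24, 0))
```

The only short gd runs from index 0 (degree 13) to degree 7, and the counter-clockwise distances are 5, 5 and 3, matching the description of Figure 7; for 7/12 the short gd is the diminished fifth from index 0 to index 4.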
Four chords may be shorter than the series of six in the usual diatonic, but in other
scales the chains are shorter still. In family A scales with d ≡ 1 (mod 3), and in family B
scales with d ≡ 2 (mod 3), second-order ME trichords have the following properties:
• The parsimonious trichord is symmetrical.
• The number of symmetrical trichords is greater than the number of major or
minor trichords.
Illustrations of second-order ME trichords in this kind of scale are given in Figures
9–13. Figures 9–10 illustrate a scale in family A, and Figures 11–13 illustrate a scale
in family B. Note the lack of any long chains, as indicated by the short brackets above
the cycles. At most, the chains are two chords long.
^1ⓢ, ^4ⓢ, ^7+, ^10−, ^3ⓢ, ^6+, ^9−, ^2ⓢ, ^5+, ^8−
More abstract clock-faces appear in Figures 14–17. The circles represent an un-
specified value of d. I do not indicate specific notes, but instead reckon the propor-
tional number of notes using the lengths along the circumference. In family A scales,
^11ⓢ, ^4ⓢ, ^8−, ^1+, ^5ⓢ, ^9ⓢ, ^2−, ^6+, ^10ⓢ, ^3−, ^7+
Fig. 13. Maximally smooth diatonic trichord cycle in 11/20
Fig. 14. Root progression of trichord cycles in family A scales with d ≡ 2 (mod 3)
Fig. 15. Root progression of trichord cycles in family B scales with d ≡ 1 (mod 3)
distances along the circumference are measured in dlen; in family B scales, distances
are measured in gdlen. The arrows inside the circle trace the root progressions in sequences of four trichords from maximally smooth cycles. The arcs for both the major and minor species, as well as all of the arrows, have the same value: gt. Figures 14 and
15 illustrate scales with longer chains of alternating chords. Note that gt is greater than
d/3, such that the third arrow places the root of the fourth chord past the root of the
first chord, and the arrows cross. Since the number of symmetrical trichords is less
than gt, no two symmetrical chords are adjacent in any maximally smooth diatonic cycle;
since the numbers of major trichords and minor trichords are equal to gt, they will al-
ways come in adjacent pairs. Moreover, at least one section of the cycle always has a
chain of alternating major and minor trichords that is four chords long.
Figures 16 and 17 illustrate scales without long chains. Here gt is less than d/3: the
root of the fourth chord in the cycle does not pass the root of the first chord, and the
arrows do not cross. Since the number of symmetrical trichords is greater than gt,
there will be at least one pair of adjacent symmetrical trichords in the maximally
smooth diatonic cycle. Major and minor trichords still come in pairs, but at least one
symmetrical trichord always sits between these pairs; no section of the cycle will have
a chain longer than two chords.
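Whether a scale falls under Figures 14 and 15 or under Figures 16 and 17 reduces to comparing gt, measured as the figures measure it, with d/3. The sketch below assumes that gt in a family B scale is the shortest gd-cycle distance corresponding to 2[d/3] modulo d; this equals the major/minor multiplicity d − 2[d/3] of (8), in line with the statement that the numbers of major and minor trichords equal gt. `Fraction` keeps the comparison exact; the function names are hypothetical.

```python
from fractions import Fraction


def nearest_int_div(num, den):
    return (2 * num + den) // (2 * den)


def gt_as_drawn(d, family):
    """gt in the units of Figs. 14-17: dlen for family A, shortest gdlen for family B."""
    g = nearest_int_div(d, 3)
    if family == "A":
        return g
    doubled = (2 * g) % d                 # assumed translation gdlen = 2 * dlen (mod d)
    return min(doubled, d - doubled)


def has_long_chains(d, family):
    """True when gt > d/3, so the arrows cross (the situation of Figs. 14 and 15)."""
    return gt_as_drawn(d, family) > Fraction(d, 3)


for d, fam in [(8, "A"), (10, "A"), (13, "B"), (11, "B")]:
    print(d, fam, gt_as_drawn(d, fam), has_long_chains(d, fam))
```

Over a range of cardinalities the check reproduces the case split of the text: family A scales with d ≡ 2 (mod 3) and family B scales with d ≡ 1 (mod 3) have long chains, the other two cases do not.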
Fig. 16. Root progression of trichord cycles in family A scales with d ≡ 1 (mod 3)
Fig. 17. Root progression of trichord cycles in family B scales with d ≡ 2 (mod 3)
6 Conclusions
, , , , +
Fig. 19. Maximally smooth diatonic trichord cycle in 5/8
• Ratio of symmetrical to major: 3
• Major/minor pairs cannot be adjacent but two symmetrical
trichords can
• Longest chain: 3 chords
, +, , +, , +, , +, , , +, , +, , +,
Fig. 21. Maximally smooth diatonic trichord cycle in fictional scale system
• Ratio of symmetrical to major: 0.286
• Major/minor pairs can be adjacent
• Longest chain: 8 chords
, +, , +, , +,
Fig. 22. Maximally smooth diatonic trichord cycle in 7/12 (usual diatonic)
• Ratio of symmetrical to major: 0.333
• Major/minor pairs can be adjacent
• Longest chain: 6 chords
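The ratios in these summaries follow from the closed-form counts. Given that major and minor chords come in pairs, a ratio below 1 forces at least two major/minor pairs to sit side by side, and a ratio above 1 forces at least two symmetrical chords to do so. This sketch covers the two maximally even cases 5/8 and 7/12, read as family B scales (our assumption); the fictional system's counts are not given in this excerpt, so it is not modelled. Names are hypothetical.

```python
from fractions import Fraction


def nearest_int_div(num, den):
    return (2 * num + den) // (2 * den)


def symmetrical_to_major(d, family):
    """Ratio of symmetrical to major trichords from the closed-form counts."""
    g = nearest_int_div(d, 3)
    if family == "A":
        major, symmetrical = g, d - 2 * g
    else:
        major, symmetrical = d - 2 * g, 4 * g - d
    return Fraction(symmetrical, major)


def adjacency(d, family):
    """Which chords must sit side by side, given that major and minor come in pairs."""
    r = symmetrical_to_major(d, family)
    return {"major/minor pairs adjacent": r < 1, "symmetrical chords adjacent": r > 1}


for d in (5, 7):
    print(f"{d}: ratio {float(symmetrical_to_major(d, 'B')):.3f}, {adjacency(d, 'B')}")
```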
References
1. Cohn, R.: Neo-Riemannian Operations, Parsimonious Trichords, and their Tonnetz Representations. Journal of Music Theory 41, 1–66 (1997)
2. Douthett, J., Hyde, M.M., Smith, C.J. (eds.): Music Theory and Mathematics: Chords, Collections, and Transformations. University of Rochester Press (2008)
3. Cohn, R.: Maximally Smooth Cycles, Hexatonic Systems, and the Analysis of Late-Romantic Triadic Progressions. Music Analysis 15, 9–40 (1996)
4. Clough, J., Douthett, J.: Maximally Even Sets. Journal of Music Theory 35, 93–173 (1991)
5. Clough, J., Myerson, G.: Variety and Multiplicity in Diatonic Systems. Journal of Music Theory 29, 249–270 (1985)
6. Agmon, E.: A Mathematical Model of the Diatonic System. Journal of Music Theory 33, 1–25 (1989)
7. Clough, J., Engebretsen, N., Kochavi, J.: Scales, Sets, and Interval Cycles: A Taxonomy. Music Theory Spectrum 21, 74–104 (1999)
8. Balzano, G.J.: The Group Theoretical Description of 12-Fold and Microtonal Pitch Systems. Computer Music Journal 4(4), 66–84 (1980)
Towards a Symbolic Approach to Sound Analysis
Università degli studi di Bologna - Via Zamboni 38, 40126 Bologna, Italia
Music, in its final stage of performance, can be described in many ways: it can be
viewed as a time-varying signal and can be described by expressing the evolution
of its physical properties over time. Music can be also viewed as a symbolic sys-
tem exploiting relationships between sonic objects1 and can be described with
a formal language able to express these relationships over time. Common approaches to music description generally take the different points of view into account by selecting a particular degree of abstraction in the domain of the representation: they rely either on the signal level, on the symbolic level, or on a fixed mixture of both. The latter case is generally known as mid-level representation: this term is used in the computer audition community to indicate intermediate models of hearing, usually based on perceptual criteria [2].
While signal-level representations are computationally efficient, invertible2 and express some physical properties associated with the signal, they lack abstraction and usually do not provide any kind of information about hierarchies, formal relationships between sonic objects and so forth; they are unable to manipulate concepts other than the basis of the analysis itself (such as sinusoids, wavelets, etc.). The usual way to represent the signal-level decomposition of a signal x[n] into expansion functions is a linear combination of the form:
$$x[n] = \sum_{k=1}^{K} \alpha_k \, g_k[n]. \tag{1}$$
1
By this expression we intuitively mean any kind of event that appears in the musical flow; giving a precise definition of sonic objects is precisely the purpose of any representation.
2
By invertibility we mean the possibility of going back to the signal domain from the representation itself.
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 57–64, 2009.
58 C.E. Cella
[Fig. 1: signal level, mid-levels and symbolic level, with labels physical connection, abstractness, generality and expressivity]
The coefficients $\alpha_k$ are derived from the analysis stage, while the functions $g_k[n]$ may or may not be determined by the analysis stage and are used during the synthesis stage; both stages are related to a particular signal model [4].
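The decomposition in (1) can be illustrated with an orthonormal basis, where analysis reduces to inner products and synthesis inverts it exactly, which is the invertibility of footnote 2. The DCT-II basis used here is only our choice of illustration, not the signal model of [4], and the indices run from 0 rather than from 1 as in (1); function names are hypothetical.

```python
import math


def dct_atoms(n):
    """Orthonormal DCT-II atoms g_k[m] = s_k cos(pi (m + 1/2) k / n), k, m = 0..n-1."""
    atoms = []
    for k in range(n):
        s = math.sqrt((1 if k == 0 else 2) / n)
        atoms.append([s * math.cos(math.pi * (m + 0.5) * k / n) for m in range(n)])
    return atoms


def analyze(x, atoms):
    """Analysis: alpha_k = <x, g_k>, valid because the atoms are orthonormal."""
    return [sum(a * b for a, b in zip(x, g)) for g in atoms]


def synthesize(alphas, atoms):
    """Synthesis, eq. (1): x[m] = sum_k alpha_k g_k[m]."""
    n = len(atoms[0])
    return [sum(alpha * g[m] for alpha, g in zip(alphas, atoms)) for m in range(n)]


x = [0.0, 1.0, 0.5, -0.25, 0.0, 0.75, -1.0, 0.2]
atoms = dct_atoms(len(x))
alphas = analyze(x, atoms)
x_rec = synthesize(alphas, atoms)
print(max(abs(a - b) for a, b in zip(x, x_rec)))   # tiny: floating-point round-off only
```

A signal equal to a single atom yields a single non-zero coefficient, which is the sense in which such a representation can only "see" the concepts of its own basis.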
On the other hand, symbolic-level representations can express complex relationships and hierarchies but are inefficient, non-invertible and only loosely related to the physical nature of sound: they are usually based on logical rules that cannot be verified by any model3.
Mid-level representations, finally, try to address the issue related to the lack of
generality by focusing on relatively simple concepts that are, however, more ab-
stract than the basis of the analysis. These concepts are usually based on perceptual
criteria related to the low-level hearing and are situated in between the constraints
imposed on them by lower and higher levels. The strength of this kind of representation lies in the fact that it is usually invertible and that the logical rules it involves are generally verifiable by models related to perception.
All the representation levels discussed so far can be used to describe music; they
are different because each one of them captures particular aspects of the sound.
However, they share two common drawbacks: first, all of them have a fixed degree
of abstraction. In other words, they are not scalable: once a representation level has
been selected it is not possible to go smoothly to another level; while signal-level
representations are very useful from numerical and computational points of view,
higher level representations are essential to human reasoning. Second, all of them
impose their own concepts onto the signal: each representation models the signal
with its own concepts, even if they are completely irrelevant to that particular signal; Figure 1 roughly depicts these ideas.
The main purpose of this article is to propose a representation method for
music that, while being generic enough to be used for different signals, fulfills
by design the following requirements:
– signal-dependent semantics: the underlying logic and the involved con-
cepts of the representation should be inferred from the signal, using learning
techniques; this creates the possibility to describe concepts that are really
related to the sound being analysed (adaptive);
– scalability: it should be possible to change the degree of abstraction in
the representation, ranging from the signal level to the symbolic level in a
3
In the context of this article, by symbolic-level representations we mean highly formalized descriptions of music, possibly based on a formal language and its underlying logic [1]. The first attempts to apply formal logic to music relied mainly on a deductive system called first-order logic. Later on, inspired by linguistic ideas, other extensions of logic (temporal, modal, non-monotonic, etc.) have also been tested [5], [6].
2 Sound Types
Symbolic-level and signal-level representations are complementary views of an underlying world: the former are expressive but do not relate easily to the modeled reality, while the latter are physically connected but lack abstraction. The following sections propose a connection between the signal and the symbolic level by suggesting a representation based on types inferred from low-level descriptions of signals and subsequent learning stages. From a logical point of view, the concept of type is formalized in the so-called theories of types; from a computational point of view, low-level descriptions and statistical learning belong to the so-called audio indexing theory.
Rules T1 and T2 define the so-called atomic types, while rule T3 defines compound types. The logical symbols of STT are defined as follows: function application @, function abstraction λ, equality =, definite description ι, and an infinite set of symbols called variables, ν. We can now define a language of STT as the ordered pair L = (C, φ) where:
In other words, a language is a set of symbols to which types have been assigned. It is now possible to define an expression of the language L with another set of formation rules:
4. if E is of the form $\lambda x : \alpha \,.\, B$ with B of type β, then $V^M_\psi(E)$ is the function $f : D_\alpha \to D_\beta$ such that $\forall d \in D_\alpha,\ f(d) = V^M_{\psi[x:\alpha \mapsto d]}(B)$;
5. if E is of the form $E_1 = E_2$ and $V^M_\psi(E_1) = V^M_\psi(E_2)$, then $V^M_\psi(E) = \mathrm{true}$; otherwise $V^M_\psi(E) = \mathrm{false}$;
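Rules 4 and 5 can be made concrete with a toy evaluator over a finite model. This is a hypothetical sketch in the spirit of simple type theory [3], not the article's formalization: the two-element domain, the encoding of types, and all class and function names are ours, and the handling of variables and application is assumed, since rules 1 to 3 are not part of this excerpt. Function values are stored as tuples of (argument, value) pairs, so rule 5 compares functions extensionally.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical finite model M: two individuals and the two truth values.
BASE = {"i": ("a", "b"), "o": (False, True)}


def domain(t):
    """D_t. Atomic types are strings; a function type alpha -> beta is the pair (alpha, beta)."""
    if isinstance(t, str):
        return BASE[t]
    alpha, beta = t
    args = domain(alpha)
    # every total function D_alpha -> D_beta, stored as a tuple of (argument, value) pairs
    return tuple(tuple(zip(args, vals)) for vals in product(domain(beta), repeat=len(args)))


@dataclass(frozen=True)
class Var:
    name: str
    ty: object


@dataclass(frozen=True)
class Lam:
    var: Var
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


@dataclass(frozen=True)
class Eq:
    left: object
    right: object


def value(e, psi):
    """V^M_psi(E): variables, application, rule 4 (abstraction) and rule 5 (equality)."""
    if isinstance(e, Var):
        return psi[e]
    if isinstance(e, App):
        return dict(value(e.fn, psi))[value(e.arg, psi)]
    if isinstance(e, Lam):
        # rule 4: the function d |-> V_{psi[x:alpha -> d]}(B) over all d in D_alpha
        return tuple((d, value(e.body, {**psi, e.var: d})) for d in domain(e.var.ty))
    if isinstance(e, Eq):
        # rule 5: true iff both sides have the same value
        return value(e.left, psi) == value(e.right, psi)
    raise TypeError(f"not an expression: {e!r}")


x, y = Var("x", "i"), Var("y", "i")
print(value(App(Lam(x, x), y), {y: "b"}))
print(value(Eq(Lam(x, x), Lam(y, y)), {}))
print(value(Eq(x, y), {x: "a", y: "b"}))
```

The second line shows rule 5 at work on functions: λx.x and λy.y denote the same function on the domain, so their equality evaluates to true.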
The number of iterations of the whole process gives the number of abstraction levels of the representation. In terms of atomic decomposition, all the sets of discovered types are time-frequency atoms with different time scales and spectral content; the higher the level of a type, the less generic and the more expressive it is. Figure 2 illustrates the approach.
Fig. 2. An outline of the proposed algorithm for types and rules inference
Low-level descriptors and statistical techniques are used not to classify different sounds but to classify parts of a single sound; another approach could take into account a real population of sounds and compute sound types over a whole database. Since different atoms and sequences (moleculae) belong to the same type as long as they share common properties (defined by the set of descriptors), they could theoretically be shared between different sounds. From an acoustical point of view, the amount of information increases dramatically from level to level, ranging from the so-called acoustical quanta to segments of sound that could even be recognized as sections of a musical composition.
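The algorithm of Figure 2 is not reproduced in this excerpt, so the following is only one possible reading of "types inferred from low-level descriptors plus learning", not the author's method. It assumes two toy descriptors (RMS energy and zero-crossing rate), uses plain k-means as a stand-in for the learning stage, and forms the next level by grouping consecutive types into molecules; every name and parameter is hypothetical.

```python
import math


def frames(x, size):
    """Split x into consecutive, non-overlapping frames of `size` samples."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, size)]


def descriptors(frame):
    """Two toy low-level descriptors: RMS energy and zero-crossing rate."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (len(frame) - 1))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def next_level(labels, n=2):
    """Group n consecutive types into a 'molecule'; equal molecules share a new type id."""
    ids = {}
    return [ids.setdefault(tuple(labels[i:i + n]), len(ids))
            for i in range(0, len(labels) - n + 1, n)]


quiet = [0.1 * math.sin(2 * math.pi * n / 32) for n in range(128)]
buzz = [0.9 if n % 2 == 0 else -0.9 for n in range(128)]
level1 = kmeans([descriptors(f) for f in frames(quiet + buzz, 32)], k=2)
level2 = next_level(level1)
print(level1, level2)
```

Each call to `next_level` is one more iteration of the process, so the length of the label sequence shrinks as the abstraction level rises.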
References
1. Cella, C.E.: Sulla struttura logica della musica. Rivista umbra di musicologia 48, 1–82 (2004)
2. Ellis, D., Rosenthal, D.: Mid-level representations for Computational Auditory Scene Analysis. In: International joint conference on Artificial Intelligence (1995)
3. Farmer, W.M.: The seven virtues of simple type theory. Journal of Applied Logic 72 (2007)
4. Goodwin, M., Vetterli, M.: Atomic decompositions of audio signal. In: IEEE Audio Signal Processing Workshop (1997)
5. Leman, M.: Expressing coherence of musical perception in formal logic. Mathematics and music: a Diderot Mathematical Forum, pp. 184–198 (2002)
6. Marsden, A.: Timing in music and modal temporal logic. Journal of Mathematics and Music 1(3), 173–189 (2007)
7. Peeters, G.: A large set of audio features for sound description (similarity and classification) in the CUIDADO project. CUIDADO I.S.T. Project Report, 1–25 (2004)
Plain and Twisted Adjoints of Well-Formed Words
Abstract. This paper lays the mathematical groundwork for a new study of modes of well-formed (WF) scales, and presents a new characterization of special standard Sturmian morphisms.
We introduce WF words, which coincide with the step-interval pat-
terns of modes of well-formed scales. WF words can be represented as
conjugates of some Christoffel word (generalized Lydian mode).
To every WF word we may assign a pair of affine automorphisms $f_w$ and $g_w$. These assignments induce a pair of involutions over the set of WF words: the plain adjoint and the twisted adjoint. We study the properties of these adjoints; in particular, we show that the plain adjoint coincides with duality over the set of Christoffel words and that the twisted adjoint extends Sturmian involution to the set of WF words. Thomas Noll’s divider incidence result holds, inter alia, that w is special standard if and only if $f_w(1) = 1$.
1 Geometrical Motivations
In [7], two topics were connected: step-interval patterns of WF scales (see [4]
and [5]) and Christoffel words (see [9], [1] and [10]). This connection initiated
the possibility of further integration of the algebraic combinatorial theory of
words into mathematical scale theory, and possibly reciprocally. Further music-theoretical interpretations of the results herein are explored in [6].
We consider the monoid $\{x, y\}^*$ of words on a two-letter alphabet $A = \{x, y\}$, $\{x, y\}^* = \{w \mid w = w_1 \dots w_N;\ w_i \in A\}$. The empty word, denoted $\varepsilon$, belongs to $\{x, y\}^*$, and the monoid operation is concatenation of words. If F is an endomorphism of $\{x, y\}^*$, then $F(w) = F(w_1 \dots w_N) = F(w_1) \dots F(w_N)$, so F is entirely determined by the images of x and y. The monoid St of Sturmian morphisms on $\{x, y\}^*$ can be generated by the following morphisms:
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 65–80, 2009.
66 D. Clampitt, M. Domı́nguez, and T. Noll
Given an irrational number 0 < θ < 1, we know that the scale of N notes generated by θ is well-formed if and only if N is the denominator of a (semi-)convergent of θ. The succession of (semi-)convergents of θ produces an ordered infinite family of well-formed scales called a hierarchy of WF scales generated by θ. If the two distinct step intervals are of sizes a and b with a < b, then the larger step interval splits into two intervals of sizes a and b − a in the next level of the hierarchy. We can start from the two-note scale {0, θ}, which has step-interval pattern xy; then every subsequent step-interval pattern of the hierarchy can be obtained as the image of xy by an element of the monoid $\langle G, \tilde{D} \rangle$. These morphisms, G and $\tilde{D}$, connect two consecutive cases in the hierarchy (see Figure 1, (a) and (b)).
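The well-formedness criterion can be tested numerically by generating the points {kθ} and counting distinct step sizes. This is a minimal sketch: it calls a scale well-formed when it has exactly two step sizes (up to a floating-point tolerance), and it writes x for the larger step, a labeling convention we assume because it reproduces the diatonic pattern xxxyxxy for θ = log2(3/2). Function names are hypothetical.

```python
import math


def generated_scale(theta, n):
    """Sorted fractional parts {k * theta}, k = 0..n-1 (points on a circle of length 1)."""
    return sorted((k * theta) % 1.0 for k in range(n))


def step_sizes(theta, n):
    pts = generated_scale(theta, n)
    return [(pts[(i + 1) % n] - pts[i]) % 1.0 for i in range(n)]


def distinct(values, tol=1e-9):
    out = []
    for v in sorted(values):
        if not out or v - out[-1] > tol:
            out.append(v)
    return out


def is_well_formed(theta, n):
    """Here: the generated n-note scale has exactly two step sizes."""
    return len(distinct(step_sizes(theta, n))) == 2


def step_pattern(theta, n, tol=1e-9):
    """Step word read upward from 0, writing x for the larger step (assumed convention)."""
    steps = step_sizes(theta, n)
    big = max(steps)
    return "".join("x" if big - s <= tol else "y" for s in steps)


theta = math.log2(3 / 2)        # the fifth as a fraction of the octave
print([n for n in range(2, 13) if is_well_formed(theta, n)])
print(step_pattern(theta, 7))
```

For this θ the well-formed cardinalities up to 12 are 2, 3, 5, 7 and 12, the denominators of the (semi-)convergents of log2(3/2) in that range.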
Let us consider generalized generated sets of the type
where {x} denotes the fractional part of x. These sets are rotations of the scale of N
notes generated by θ. Recall that the generator θ coincides with the arc length
swept clockwise from every note on the circle σi to the next one in generation
order σi+1 . Thus, modes of a scale can be represented by generalized generated
sets Σk . It follows from the circular representation of the scale that morphisms
D and G connect consecutive cases of WF modes when notes are added to the
scale in the negative direction (see Figure 1, (c) and (d)).
The proposition below relates rotations of the step pattern of a WF scale with
step patterns of generalized generated sets.
Proof. One need only recall that the homothecy $h_{|w|_y}(k) = |w|_y \cdot k \bmod N$ transforms scale order into generation order, and thus $h_{|w|_x}$ transforms scale order into generation order in the negative direction.
We say that two words u, v are conjugated if u = xy and v = yx for some
words x, y. Notice that conjugation can be thought of as an equivalence rela-
tion via circle rotations, if we write the words around a circle. Therefore the
last proposition asserts that generalized generated scales have conjugated step-
interval patterns. This geometric interpretation suggests that the modes of WF
scales may be presented as words that encode rotations of generalized WF sets.
The next section will define such words as WF words, which will be shown to
be conjugates of Christoffel words.
2 Well-Formed Words
If we take as point of departure the geometrical interpretations of generalized
generated scales and their step-interval patterns, we can define in a purely word-
theoretical context a well-formed (WF) word.
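Definition 1. Let $w \in \{x, y\}^*$ be a word formed by letters x and y.
1. The balance map of w is the map $\beta_w : \{1, \dots, |w|\} \to \mathbb{Z}$ with
$$\beta_w(k) = \begin{cases} |w|_y & \text{for } w_k = x, \\ -|w|_x & \text{for } w_k = y. \end{cases}$$
2. The accumulation map of w is the map $\alpha_w : \mathbb{Z}_{|w|} \to \mathbb{Z}$ with
$$\alpha_w(0) := 0, \qquad \alpha_w(k) := \sum_{r=1}^{k} \beta_w(r).$$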
Let $\{n \cdot \theta\}_{n=0,\dots,N-1}$ be a WF scale with step pattern $w \in \{x, y\}^*$. The sizes of the jumps between consecutive notes in generation order are $\{\theta\}$ and $\{1 - \theta\}$.
On the other hand, these jumps will be |w|y steps clockwise or N − |w|y =
|w|x counterclockwise, since |w|y is the diatonic length of the generator (see
[5], [3]). Thus, the balance map can be seen as the number of step intervals of
the generator in each appearance, with positive sign for clockwise and negative
for counterclockwise, whereas the accumulation map transforms scale order into
(generalized) generation order.
Definition 2. A word w is called well-formed if there exists an integer $\mu_w \in \{0, \dots, |w|-1\}$ such that $\{\alpha_w(0) + \mu_w, \dots, \alpha_w(|w|-1) + \mu_w\} = \{0, \dots, |w|-1\}$. $\mu_w$ is called the mode of w.
The notion of a well-formed word was introduced by Thomas Noll in [11] and it
is a generalization of a Christoffel word, as the following proposition states.
Proposition 2. A word w of length |w| = N is well-formed with mode 0 if and
only if it is a Christoffel word.
$$\alpha_{\gamma w}(k) = \sum_{r=1}^{k} \beta_{\gamma w}(r) = \sum_{r=1}^{k} \beta_w(r+1) = \sum_{r=2}^{k+1} \beta_w(r) = \alpha_w(k+1) - \beta_w(1),$$
This last result, together with proposition 2 and the fact that a morphism of
words F is Sturmian if and only if the word F (xy) is conjugated with some
Christoffel word, yields the central result of this section:
Theorem 1 (Characterization of well-formed words). A word is well-
formed ⇐⇒ it is conjugated with some Christoffel word. The set of all well-
formed words coincides with the set of all words of the type w = F (xy) where
F ∈ St.
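Definitions 1 and 2 translate directly into code, which makes Proposition 2 and Theorem 1 easy to spot-check on the diatonic conjugation class. In this sketch (function names hypothetical) the mode is found as −min α_w, the only shift that can send the smallest accumulation value to 0, and the word is declared well-formed when the shifted values are exactly $\{0, \dots, |w|-1\}$.

```python
def balance(w):
    """beta_w: |w|_y for each letter x, -|w|_x for each letter y (Definition 1)."""
    nx, ny = w.count("x"), w.count("y")
    return [ny if ch == "x" else -nx for ch in w]


def accumulation(w):
    """alpha_w(0), ..., alpha_w(|w| - 1): alpha_w(0) = 0 plus partial sums of beta_w."""
    alpha = [0]
    for b in balance(w)[:-1]:
        alpha.append(alpha[-1] + b)
    return alpha


def mode(w):
    """mu_w if w is well-formed (Definition 2), otherwise None."""
    alpha = accumulation(w)
    mu = -min(alpha)                     # the only shift that can map min(alpha) to 0
    if sorted(a + mu for a in alpha) == list(range(len(w))):
        return mu
    return None


def conjugates(w):
    """All rotations of w, starting with w itself."""
    return [w[i:] + w[:i] for i in range(len(w))]


diatonic = "xxxyxxy"                     # a Christoffel word, so mode 0 by Proposition 2
print([mode(v) for v in conjugates(diatonic)])
print(mode("xxxxxyy"))                   # not conjugate to a Christoffel word
```

The seven conjugates of xxxyxxy have modes 0, 2, 4, 6, 1, 3, 5: each is well-formed, as Theorem 1 predicts, and each carries a different mode, whereas xxxxxyy is rejected.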
We conclude the section with a description of special standard words in terms
of well-formed words of mode |w|y − 1.
Lemma 2. The special standard word w = uxy and the Christoffel word xuy are conjugated, and one has
$$\gamma^{|w|_y^{-1} - 1}\, w = xuy.$$
Proof. A word w is special standard ⇔ $w = w_1 \cdot w_2$ with $(w_1, w_2)$ a standard pair. Following [10, Lemma 2.2.8] we have either
$$\begin{cases} w_1 = pyx = qr \\ w_2 = qxy \end{cases} \quad\text{or}\quad \begin{cases} w_1 = qyx \\ w_2 = pxy = qr, \end{cases}$$
where p, q and r are in PAL (the set of palindromes). In the first case we have that
$$\gamma^{|w_1|-1} w = x \cdot qxyp \cdot y = x \cdot qrq \cdot y.$$
Therefore $\gamma^{|w_1|-1} w = xuy$ with $u \in PAL \cap PAL\,xy\,PAL$, which coincides, by [10, Corollary 2.2.9], with the set of central words; xuy is thus a Christoffel word. The second case is completely analogous. Notice finally that $|w_1| = |w_1 \cdot w_2|_y^{-1} \bmod |w|$ and $|w_2| = |w_1 \cdot w_2|_x^{-1} \bmod |w|$.
Proof. We need only compute the mode of the special standard word w from the mode of the Christoffel word $\gamma^{|w|_y^{-1}-1} w$, which is zero:
$$\mu_w = 0 - (|w|_y^{-1} - 1) \cdot |w|_y \equiv |w|_y - 1 \pmod{|w|}.$$
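Lemma 2 and the mode formula can be checked on small cases. The sketch builds the lower Christoffel word with the floor formula (our choice of construction), forms the special standard word uxy by moving the leading x of xuy, and uses Python's `pow(b, -1, n)` (Python 3.8+) for $|w|_y^{-1}$ modulo $|w|$. Function names are hypothetical.

```python
def christoffel(a, b):
    """Lower Christoffel word with a letters x and b letters y (gcd(a, b) = 1 assumed)."""
    n = a + b
    return "".join("y" if ((i + 1) * b) // n > (i * b) // n else "x" for i in range(n))


def mode(w):
    """mu_w from Definitions 1 and 2, or None if w is not well-formed."""
    nx, ny = w.count("x"), w.count("y")
    alpha = [0]
    for ch in w[:-1]:
        alpha.append(alpha[-1] + (ny if ch == "x" else -nx))
    mu = -min(alpha)
    return mu if sorted(a + mu for a in alpha) == list(range(len(w))) else None


def rotate(w, j):
    """gamma^j: rotate w to the left by j letters (negative j rotates right)."""
    j %= len(w)
    return w[j:] + w[:j]


def special_standard(a, b):
    """phi(xuy) = uxy applied to the Christoffel word xuy."""
    u = christoffel(a, b)[1:-1]
    return u + "xy"


for a, b in [(5, 2), (3, 2), (7, 3), (8, 5)]:
    w = special_standard(a, b)
    k = pow(b, -1, a + b) - 1          # |w|_y^{-1} - 1, inverse taken mod |w|
    print(w, rotate(w, k), christoffel(a, b), mode(w), b - 1)
```

In each row the rotated special standard word equals the Christoffel word, and the computed mode equals $|w|_y - 1$.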
$$\rho_{(i + k|w|_x^{-1}) \bmod N} = \sigma_{(|w|_y \cdot i) \bmod N - k}, \quad \forall i \in \mathbb{Z}_N.$$
G(x,
Fig. 3. Generated scale of pattern (xyx, yx) = D y)
Example 1. We show in the following table the conjugation class of the diatonic
step pattern xxxyxxy in relation with the conjugation class of the dual pattern
(plain adjoint).
Observe that the inverse of an affinity $g(k) = a \cdot k + b$ is the affinity $g^{-1}(k) = (a^{-1} \bmod n) \cdot k + (-a^{-1} \cdot b \bmod n)$. Thereby one has that
$$\beta_w(k) = \alpha_w(k) - \alpha_w(k-1) = |w|_y^{-1} \bmod |w|.$$
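The inverse-affinity formula can be checked on a small modulus using Python's built-in modular inverse `pow(a, -1, n)` (Python 3.8+, raising `ValueError` when a is not invertible modulo n). The helper names are hypothetical.

```python
def inverse_affinity(a, b, n):
    """For g(k) = a*k + b mod n, return (a', b') with g^{-1}(k) = a'*k + b' mod n."""
    a_inv = pow(a, -1, n)        # raises ValueError if gcd(a, n) != 1
    return a_inv, (-a_inv * b) % n


def affinity(coeffs, n):
    a, b = coeffs
    return lambda k: (a * k + b) % n


g = affinity((2, 3), 7)
g_inv = affinity(inverse_affinity(2, 3, 7), 7)
print(inverse_affinity(2, 3, 7))
print([g_inv(g(k)) for k in range(7)])
```

For g(k) = 2k + 3 mod 7 the inverse is 4k + 2 mod 7, and composing the two returns every residue unchanged.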
[Diagram: ⟨G, D⟩ → WF, with retrogradation rev]
Proof. Observe that the plain adjoint of a Christoffel word w is the dual of w, and that the Christoffel morphisms related to dual words are retrograde (see [7, Proposition 10]).
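The same result holds for the standard monoid:
Proposition 6. The following diagram is commutative (where $F^{\mathrm{rev}}$ denotes the retrogradation of F as a word in ⟨G, D⟩):
$$\begin{array}{ccc} \langle G, D \rangle & \longrightarrow & WF \\ \big\downarrow{\scriptstyle \mathrm{rev}} & & \big\downarrow \\ \langle G, D \rangle & \longrightarrow & WF \end{array}$$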
Proof. The square in the proposition may be decomposed in the following way:
[Diagram: the square ⟨G, D⟩ → WF over ⟨G, D⟩ → WF, decomposed through ⟨G, D̃⟩ → Ch and its retrograde ⟨G, D̃⟩ → Ch, with connecting maps ρ and φ]
where $\rho : G \mapsto G,\ D \mapsto \tilde{D}$ and $\varphi(xuy) = uxy$. One has just to show that every smaller square commutes. The left square trivially commutes, and the square in the center is commutative by Proposition 5. If the characteristic morphism of a Christoffel word xuy is F, then the characteristic morphism of uxy is ρ(F). Thus the upper and lower squares also commute. The commutativity of the square on the right, finally, is equivalent to the formula
uxy = u xy,
The problem is that this nice formula F (xy) = F rev (xy) does not extend to
the whole special Sturmian monoid St0 . That is, the following diagram is not
commutative:
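$$\begin{array}{ccc} St_0 & \longrightarrow & WF \\ \big\downarrow{\scriptstyle \mathrm{rev}} & & \big\downarrow \\ St_0 & \longrightarrow & WF \end{array}$$
Notice that $\tilde{G}(xy) = (y, xx)$ is not a morphic WF word: it is a so-called bad conjugate, one that is not the image of xy under any special Sturmian morphism.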
We can solve this problem by defining, in parallel with the plain adjoint, the twisted adjoint:
Example 2. The following table shows the twisted adjoints of all words conju-
gated with the diatonic step pattern xxxyxxy.
To prove this result we need to recall the description of the set of special Sturmian morphisms F such that F(xy) is conjugated with a given Christoffel word w. There are exactly N − 1 such morphisms, and they can be ordered as $F_1, F_2, \dots, F_{N-1}$ with $F_1$ a special standard morphism (generated by G and D) and $\gamma F_i(xy) = F_{i+1}(xy)$ for all $i = 1, \dots, N-2$ (see [2]).
Lemma 3. Let w be a WF word and let $F_1, F_2, \dots, F_{N-1}$ be the special Sturmian morphisms related to w. Then the twisted affinity associated with the special Sturmian morphism $F_i$ is $g_{F_i}(k) = |w|_x \cdot k + |w|_x \cdot i$.
Proof. From [2], Lemma 4.2, we have that $F_{|w|_y^{-1}}(xy) = u$ is a Christoffel word, and thus its twisted affinity is $g_u(k) = |w|_x \cdot k + N - 1$. By definition of twisted affinity one has that $g_{\gamma^j w} \equiv g_w + j \cdot |w|_x \bmod N$, and thus $F_1 = \gamma^{|w|_x^{-1} + 1} F_{|w|_y^{-1}}$ has as twisted affinity
$$g_{F_i^*} = g_{F_{i \cdot |w|_y}} = |w|_x^{-1} \cdot k + |w|_x^{-1} \cdot |w|_y \cdot i = |w|_x^{-1} \cdot k + |w|_x^{-1} \cdot (N - |w|_x) \cdot i \equiv |w|_x^{-1} \cdot k - i = (|w|_x \cdot k + |w|_x \cdot i)^{-1} = g_{F_i}^{-1},$$
◦ = ◦ : W FN → W FN ,
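The congruences in the last computation can be checked for every i. The sketch fixes hypothetical counts $|w|_x$ and $|w|_y$ with gcd 1 and compares the inverse of $g_{F_i}$ from Lemma 3 with the intermediate and reduced forms above; it verifies the modular arithmetic only, not the identification $g_{F_i^*} = g_{F_{i \cdot |w|_y}}$. Python's `pow(a, -1, n)` (3.8+) supplies the modular inverse; function names are hypothetical.

```python
def inverse_affinity(a, b, n):
    """Coefficients of g^{-1} for g(k) = a*k + b mod n (a invertible mod n)."""
    a_inv = pow(a, -1, n)
    return a_inv, (-a_inv * b) % n


def lemma3_inverse_holds(nx, ny):
    """For every i, check that the inverse of g_{F_i}(k) = nx*k + nx*i (mod N)
    equals nx^{-1}*k + nx^{-1}*ny*i, and that this reduces to nx^{-1}*k - i (mod N)."""
    n = nx + ny
    nx_inv = pow(nx, -1, n)
    for i in range(1, n):
        g_i = (nx % n, (nx * i) % n)                 # Lemma 3
        middle = (nx_inv, (nx_inv * ny * i) % n)    # |w|_x^{-1} k + |w|_x^{-1} |w|_y i
        reduced = (nx_inv, (-i) % n)                # |w|_x^{-1} k - i
        if not (inverse_affinity(*g_i, n) == middle == reduced):
            return False
    return True


print(lemma3_inverse_holds(5, 2), lemma3_inverse_holds(8, 5))
```

The key step is $|w|_y = N - |w|_x$, so $|w|_x^{-1} \cdot |w|_y \equiv -1 \pmod N$.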
4 Divider Incidence
In this section we extend Noll’s result for plain adjointness (see [12]), Divider
Incidence, which holds, inter alia, that a word w is positive standard Sturmian if
and only if $f_w(1) = 1$ (see Figure 4). Before we can explain why $f_w(1) = 1$ (and likewise, in the twisted case, $g_w(1) = 1$) are expressions of divider incidence, we
shall first understand the difference between plain and twisted adjoint in terms
of height and width trajectories. First recall the plain case (see left graph in
Figure 4 and [12] for details), where the height trajectory of a word w is built as
a point sequence Φ : {0, . . . , N } → Z2 , whose difference vectors Φ(k) − Φ(k − 1)
are either va = (|w|b , 1) or vb = (−|w|a , 1). The sequence of indices (i.e., of a’s
and b’s), which is associated with the sequence of vectors (Φ(1) − Φ(0)), (Φ(2) −
Φ(1)), ..., (Φ(N ) − Φ(N − 1)), coincides with the order of the letters a and b in
the word w. The width trajectory of the plain adjoint word w is the point
sequence Ψ : {0, . . . , N } → Z2 , whose difference vectors Ψ (k) − Ψ (k − 1) are
either vx = (1, |w |y ) or vy = (1, −|w |x ). Here again the sequence of indices
case it is Φ (|w |y ) = Ψ (|w|b ) = (1, 1). Why is it the point (1, 1)? For any special Sturmian morphism F, the accumulation at the divider of $w = F(x)|F(y)$ is $\alpha_w(|F(x)|) = 1$. This is a consequence of the known fact that the incidence matrix
$$M_F = \begin{pmatrix} |F(x)|_x & |F(y)|_x \\ |F(x)|_y & |F(y)|_y \end{pmatrix}$$
is an element of $SL_2(\mathbb{Z})$ (see [10]):
$$\alpha_w(|F(x)|) = |w|_y |F(x)|_x - |w|_x |F(x)|_y = (|F(x)|_y + |F(y)|_y)|F(x)|_x - (|F(x)|_x + |F(y)|_x)|F(x)|_y = |F(y)|_y |F(x)|_x - |F(y)|_x |F(x)|_y = \det(M_F) = 1.$$
Thus the main point in the equations $f_w(1) = 1$ and $g_w(1) = 1$ is the assertion that it is precisely the argument 1 where the affine morphisms $f_w$ and $g_w$ take this accumulation value 1.
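The divider identity can be checked numerically: for any pair of words (F(x), F(y)), the accumulation of w = F(x)F(y) at the divider equals det(M_F), so it is 1 exactly when that determinant is 1. The pairs below are our examples: two factorizations of diatonic words, and the pair (y, xx) from the bad-conjugate remark, whose determinant is −2. Function names are hypothetical.

```python
def accumulation_at(w, k):
    """alpha_w(k) = |w|_y * (#x in the prefix) - |w|_x * (#y in the prefix)."""
    prefix = w[:k]
    return w.count("y") * prefix.count("x") - w.count("x") * prefix.count("y")


def incidence_det(fx, fy):
    """det M_F for M_F = [[|F(x)|_x, |F(y)|_x], [|F(x)|_y, |F(y)|_y]]."""
    return fx.count("x") * fy.count("y") - fy.count("x") * fx.count("y")


def divider_value(fx, fy):
    """Accumulation of w = F(x)F(y) at the divider |F(x)|."""
    return accumulation_at(fx + fy, len(fx))


for fx, fy in [("xxxy", "xxy"), ("xxyx", "xxy"), ("y", "xx")]:
    print(fx + "|" + fy, divider_value(fx, fy), incidence_det(fx, fy))
```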
Proof. We shall first show that the conditions (b) and (c) are equivalent in both
cases (1) and (2). Remember that in the plain case the value fw (0) is the reduc-
tion modulo N of the minimal accumulation. Condition (1b) is therefore equiv-
alent to the condition (1c), namely that the index with minimal accumulation is
the divider predecessor (simply because 0 precedes 1 in the ascending order of
the arguments). In the twisted case, the value gw (0) is the reduction modulo N
of the maximal accumulation, and condition (2b) is therefore equivalent to the
condition (2c), namely that the index with maximal accumulation is the divider
successor (because 0 follows 1 in the descending order of the arguments). The
main part of the proof is therefore to show the equivalence of either (b) or (c)
with condition (a).
For the plain case this is done in [12] by means of structural induction down the free monoid ⟨G, D⟩ of special standard morphisms. The same technique applies to the twisted case, but some comments should be made at the outset, as the set T = ⟨G, D⟩ · G · ⟨G, D⟩ is redundantly presented here. To begin with, the redundant presentation shows clearly that T is invariant under Sturmian involution: if we have a special Sturmian morphism F = F1 · G · F2 with anti-standard morphism F1 ∈ ⟨G, D⟩ and anti-Christoffel morphism F2 ∈ ⟨G, D⟩, we obtain F∗ = F2∗ · G · F1∗, which is also in T because F2∗ is anti-standard and F1∗ is anti-Christoffel. In other words, the formulation of condition (2a) is
$$w = w_1|w_2 = \Gamma\cdot\tilde{\Delta}^k(x|y) = \Gamma(xy^k|y) = xy^k|xy^ky = \tilde{D}^k(x|xy) = \tilde{D}^k\cdot G(x|y).$$
Thus, $g_w(1) = 1$ and the maximal value $g_w(0) = 2k+2$ is the successor of value 1 in scale order. Now suppose we have a pair $w = w_1|w_2$ for which the maximum value of the accumulation is reached after the first letter of $w_2$. Furthermore, for technical reasons, we suppose that either $w_1$ is a prefix of $w_2$ (which is actually the case in our initial situation, where $w_1|w_2 = xy^k|xy^ky$) or that $w_2$ is a prefix of $w_1$ in such a way that the factorization $w_1 = w_2^n w_3$ satisfies $w_2 = w_3w_4$ for some non-empty word $w_4$ and some $n \ge 1$. We need to study two cases:
1. $\tilde{\Gamma}(w_1|w_2) = w_1|w_2w_1$. If $w_1$ is a prefix of the divider suffix $w_2$ (in $w_1|w_2$), it is also a prefix of the longer divider suffix $w_2w_1$ in $\tilde{\Gamma}(w_1|w_2)$. If, however, $w_2$ is a prefix of the divider prefix $w_1 = w_2^n w_3$ (in $w_1|w_2$) such that there is a factor $w_4$ satisfying $w_2 = w_3w_4$ and thus $w_1 = (w_3w_4)^n w_3$, then $w_1$ is a prefix of the new divider suffix $w_2w_1 = w_2(w_2)^n w_3 = (w_3w_4)^{n+1} w_3 = ((w_3w_4)^n w_3)w_4w_3$ in $\tilde{\Gamma}(w_1|w_2)$. This means that the second (technical) part of the induction hypothesis is satisfied for $\tilde{\Gamma}(w_1|w_2)$.
We already noticed that the highest accumulation value $\alpha_{w_1|w_2w_1}(t)$ within the factor $w_2$ is still at the index $t = |w_1| + 1$. We must exclude the possibility that a higher value is reached in one of the two $w_1$ factors. There are three accumulation values that we know immediately: $\alpha_{w_1|w_2w_1}(0) = 0$, $\alpha_{w_1|w_2w_1}(|w_1|) = 1$ and $\alpha_{w_1|w_2w_1}(|w_1w_2w_1|) = 0$. Since the increase of the accumulation across a factor depends only on the letters of that factor, the second copy of $w_1$ also raises the accumulation by 1, and we conclude that $\alpha_{w_1|w_2w_1}(|w_1w_2|) = -1$. From this we may conclude that, if the global maximum is reached in one of the two $w_1$ factors, it must be the first one. But as $w_1$ is a prefix of the new divider suffix $w_2w_1$, its accumulation must behave analogously to this prefix, up to a shift by some number. And we know this shift explicitly: the accumulation at the divider is 1, while the accumulation at the beginning of the word is 0, by definition. So we conclude that the global maximum of the accumulation is reached after the first letter after the divider.
2. $\Delta(w_1|w_2) = w_2w_1|w_2$. If $w_1$ is a prefix of the divider suffix $w_2 = w_1w_3$ (in $w_1|w_2$), we obtain $\Delta(w_1|w_2) = w_1w_3w_1|w_1w_3$, which satisfies the second (technical) part of the induction hypothesis in this case. If, however, $w_2$ is a prefix of the divider prefix $w_1 = w_2^n w_3$ (in $w_1|w_2$) such that there is a factor $w_4$ satisfying $w_2 = w_3w_4$ and thus $w_1 = (w_3w_4)^n w_3$, then, of course, $w_2$ is again a prefix of the new divider prefix $w_2w_1$. But it also satisfies the factorization property with the same factor $w_4$ and power $n+1$ instead of $n$: $w_2w_1 = (w_3w_4)(w_3w_4)^n w_3 = (w_3w_4)^{n+1} w_3$ in $\Delta(w_1|w_2)$. This means that in this case, too, the second (technical) part of the induction hypothesis is satisfied. For the maximal accumulation we may argue as in the previous case. The first factor $w_2$ starts with accumulation 0, while the divider successor $w_2$ starts with accumulation 1. A maximum within $w_1$ can be excluded because the order of the accumulation values of the factor $w_1w_2$ within $w_2w_1w_2$ must be the same as in the original word $w_1w_2$, where the maximum was reached after the first letter of $w_2$ by the induction hypothesis.
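The induction can also be explored numerically. The sketch below is our own sanity check, not part of the proof: it uses the same assumed accumulation formula $\alpha_w(t) = |w|_y\,|w[{:}t]|_x - |w|_x\,|w[{:}t]|_y$, starts from the pairs $xy^k|xy^ky$, applies every sequence of the transformations $\tilde{\Gamma}(w_1|w_2) = w_1|w_2w_1$ (case 1) and $\Delta(w_1|w_2) = w_2w_1|w_2$ (case 2) up to a fixed depth, and checks that the accumulation at the divider is 1 and that the maximum over $0 \le t < N$ is reached at $t = |w_1| + 1$. The bounds `max_k` and `max_depth` are arbitrary illustration parameters.

```python
from itertools import product


def accumulation(w, t):
    """Assumed definition: alpha_w(t) = |w|_y |w[:t]|_x - |w|_x |w[:t]|_y."""
    prefix = w[:t]
    return w.count("y") * prefix.count("x") - w.count("x") * prefix.count("y")


def gamma_tilde(pair):
    """Case 1: w1|w2 -> w1|w2w1."""
    w1, w2 = pair
    return w1, w2 + w1


def delta(pair):
    """Case 2: w1|w2 -> w2w1|w2."""
    w1, w2 = pair
    return w2 + w1, w2


def max_after_divider(pair):
    """Accumulation 1 at the divider, and the maximum right after the divider successor letter."""
    w1, w2 = pair
    w = w1 + w2
    values = [accumulation(w, t) for t in range(len(w))]
    return values[len(w1)] == 1 and values.index(max(values)) == len(w1) + 1


def check(max_k=3, max_depth=5):
    count = 0
    for k in range(max_k + 1):
        start = ("x" + "y" * k, "x" + "y" * k + "y")   # xy^k | xy^k y
        for depth in range(max_depth + 1):
            for ops in product((gamma_tilde, delta), repeat=depth):
                pair = start
                for op in ops:
                    pair = op(pair)
                assert max_after_divider(pair), pair
                count += 1
    return count


print(check())  # number of pairs checked
```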
Fig. 6. xyxyxyy and xxxyxxy: prefixes of the Sturmian words of slopes $\log_2(3/2)$ and $-1/\log_2(3/2)$
5 Final Remarks
1. David Clampitt objected to the musical interpretation (that is to say, to
the graphical representation) of twisted adjoints in the present formulation.
When our treatment is extended to the free group of two letters (that is, when
inverse letters are admitted), the proper representations may be given for
the twisted cases, which for St0 are connected with upward step-interval pat-
terns/backward scale-folding patterns (i.e., negative generation orders), and
downward step-interval patterns/forward scale-folding patterns (i.e., positive
generation orders). This extension is proposed in [6].
2. For the diatonic scale of 7 notes, the step-interval pattern and scale-folding
pattern can be generated as prefixes of two Sturmian words of perpendicular
slopes (see Figure 6). One can check that this is not always the case, but
there is a tight connection between WF duality, reverse of morphisms and
retrogradation of the continued fraction representation of the generator. A
further investigation of this subject is presented in [8], where the description of modes in terms of height and width coordinates, determined by the generator vector (1, g) and its perpendicular (−g, 1), is taken into account.
References
1. Berstel, J., de Luca, A.: Sturmian words, Lyndon words, and trees. Theoretical
Computer Science 178, 171–203 (1997)
2. Berthé, V., de Luca, A., Reutenauer, C.: On an involution of Christoffel words and
Sturmian morphisms. European Journal of Combinatorics 29(2), 535–553 (2008)
80 D. Clampitt, M. Domı́nguez, and T. Noll
3. Carey, N.: Distribution modulo 1 and musical scales. Ph.D. diss., University of
Rochester (1998)
4. Carey, N., Clampitt, D.: Aspects of well-formed scales. Music Theory Spec-
trum 11(2), 187–206 (1989)
5. Carey, N., Clampitt, D.: Self-Similar Pitch Structures, Their Duals, and Rhythmic
Analogues. Perspectives of New Music 34(2), 62–87 (1996)
6. Clampitt, D., Noll, T.: Modes, the height-width duality, and divider incidence.
Paper presented at Society for Music Theory national conference, Nashville, TN
(2008)
7. Domínguez, M., Clampitt, D., Noll, T.: Well-formed scales, maximally even sets and Christoffel words. In: Proceedings of the MCM 2007, Berlin, Staatliches Institut für Musikforschung (2007)
8. Domínguez, M., Noll, T.: A specific extension of Christoffel duality to a certain class of Sturm numbers and their characteristic words. In: WORDS 2009 Conference (submitted, 2009)
9. Lothaire, M.: Combinatorics on Words. Cambridge University Press, Cambridge
(1983)
10. Lothaire, M.: Algebraic Combinatorics on Words. Cambridge University Press,
Cambridge (2002)
11. Noll, T.: Sturmian sequences and morphisms: a music-theoretical application. SMF,
Journée Annuelle, 79–102 (2008)
12. Noll, T.: Ionian theorem. Journal of Mathematics and Music 3(3) (to appear, 2009)
Regions and Standard Modes
Abstract. Norman Carey and David Clampitt observed in [4] that each
region has two well-formed scales as its prefixes. If one looks at this
finding from the viewpoint of word theory, one observes that regions are
central words and the two prefixes are their independent periods. More
precisely, each region, understood as a word in a two-letter alphabet,
contains two distinct prefixes, both of which represent well-formed scales.
One period is a special standard word, and the other period is a non-
special standard word. Thomas Noll proposed in [13] to generalize the
authentic Ionian mode through special standard words. He showed that
the property of divider incidence characterizes these words among their
conjugates. Thus there are two parallel lines of generalization which can
be further enriched by observations from [7], [8], as well as by further
combinatorial connections between central and standard words.
Two independent lines of research turn out to have so many conceptual cross-links that a productive synergy emerges immediately from their contact (see
[10], [6], [13], [14]). In the past two decades Norman Carey and David Clampitt
developed mathematical music theory for the study of scales, regions and related
concepts. At the same time mathematicians such as Aldo de Luca, Jean Berstel,
Valérie Berthé, Christian Kassel, and Christophe Reutenauer investigated a cer-
tain branch of algebraic combinatorics on two-letter words, which includes the
study of central words, standard words, and Christoffel words. We refer the reader to Chapter 2 of [11], as well as to [3], [12], [1], [2].
1 Regions
In [4] Carey and Clampitt attempted a rational reconstruction of certain pitch-
space diagrams found in medieval treatises, such as the heptactys in Scolica
enchiriadis and elsewhere and the diamond-shaped diagrams in the Micrologus
of Guido of Arezzo. They were motivated by a desire to understand aspects of diatonicism that were perhaps more available to medieval theorists, who were not so much in thrall to the notion of pitch class, i.e., octave equivalence. The
notion of a region arises from an alternative path suggested at the outset of their
earlier article, [5]: “At the . . . purely mathematical level, the octave and the fifth
play perfectly symmetrical roles: they are simply numbers which generate other
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 81–92, 2009.
© Springer-Verlag Berlin Heidelberg 2009
82 D. Clampitt and T. Noll
numbers. At the level of the formal theory presented here, however, octave and
fifth are presumed to play fundamentally dissimilar roles: the octave establishes
a primary equivalence relation — octave equivalence — while the fifth generates
the different pitch and interval classes. The fifth generates material which fills
the frame provided by the octave.” What are the implications of taking seriously
the symmetry announced at the beginning of this quotation? It implies first of
all that the imposed asymmetry may be reversed: as opposed to the procedure
in [5] where, for example, the diatonic scale is generated by the perfect fifth with
periodicity at the octave, we may understand the perfect fifth as the frame of
a scale with diatonic step intervals that is generated by the octave. The latter
case is precisely the dasian scale of the Enchiriadis treatises. In both cases, the
scales satisfy the well-formedness condition: generated sets where the generator
is everywhere spanned by the same number of step intervals. Abolishing the
asymmetry, rather than reversing it, leads to the notion of a region, a pitch-
space construction within which modes of the two alternative well-formed scales
are enclosed.
We will not need the very concrete instantiations of regions that the definition
in [4] provides, but for definiteness we consider a small region, the heptactys. If
we consider the perfect octave and perfect twelfth as co-generators, either one is
potentially the frame of a non-degenerate well-formed scale with step intervals
perfect fourth (a), and whole step (b): C F G C’ yields the step-interval pattern
aba, while C F G C’ F’ G’ yields the step-interval pattern abaab. The largest
pitch space within which modes of both well-formed scales may coexist is the
region C F G C’ F’ G’ C”: above the region if the note F” were chosen, that
would confirm the octave above F’ but would contradict the twelfth above G,
D”, and conversely were D” to be chosen. Similarly, below the region a choice
is forced between the B-flat a twelfth below F’ and the octave below G. The
heptactys is the maximal space within which both well-formed scales remain in
balance, in the above sense.
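To make the heptactys concrete, the following Python sketch (ours) encodes C F G C′ F′ G′ C″ as frequency ratios relative to C, assuming pure intervals (octave 2:1, twelfth 3:1, perfect fourth 4:3, whole step 9:8), and reads off the step-interval patterns of the two well-formed scales and of the whole region. The helper name `step_pattern` is hypothetical.

```python
from fractions import Fraction

FOURTH, WHOLE_STEP = Fraction(4, 3), Fraction(9, 8)


def step_pattern(ratios):
    """Label each step between consecutive ratios: a = perfect fourth, b = whole step."""
    labels = {FOURTH: "a", WHOLE_STEP: "b"}
    return "".join(labels[hi / lo] for lo, hi in zip(ratios, ratios[1:]))


# C F G C' F' G' C'' as frequency ratios relative to C (pure intervals assumed)
heptactys = [Fraction(1), Fraction(4, 3), Fraction(3, 2), Fraction(2),
             Fraction(8, 3), Fraction(3), Fraction(4)]

print(step_pattern(heptactys[:4]))  # aba     (C F G C', framed by the octave)
print(step_pattern(heptactys[:6]))  # abaab   (C F G C' F' G', framed by the twelfth)
print(step_pattern(heptactys))      # abaaba  (the whole region)
```

Exact rational arithmetic avoids any rounding when the step ratios are classified; the last pattern is the palindrome abaaba discussed below.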
The heptactys region corresponds to the palindromic word abaaba, and the driv-
ing conception behind the paper — that always two well-formed scales of dif-
ferent periods have modes within an enveloping region — can be rephrased in
terms of the mathematical fact that the two prefixes of a central word that are
the fundamental patterns for its two periodicities are standard words: namely, one
positive standard word and one negative standard word. On the other hand, the
driving idea of an enveloping region was already subverted by another example in
[4], Guido’s hexachord: a region enclosed within a well-formed scale par excellence,
the diatonic (major) scale. This countervailing conception is similarly realized in
word theory in the mathematical fact that every central word may be extended in
two ways to standard words; every region extends into two standard modes.
1.1 Ut-Re-Mi-Fa-Sol-La
Let us consider that prominent music-theoretical object, the Guidonian hexachord. Figure 1 displays two arrangements of its six notes, namely (Ut, Re, Mi, Fa, Sol, La) (step order) and (Fa, Ut, Sol, Re, La, Mi) (generation order folded
into an octave). Both arrangements deploy binary interval patterns, namely ascending major and minor seconds in the step pattern and ascending fifths and descending fourths in the folding of the chain of fifths into the ambit of one octave. We represent these binary patterns in terms of two two-letter words, namely u = aabaa (for the step-interval pattern, with letters a and b representing the ascending major and minor seconds, respectively) and u′ = yxyxy (for the folding pattern, with letters x and y representing the ascending fifth and descending fourth, respectively). Let q = aaba and p = aab denote the two prefixes of u of lengths 4 and 3, respectively. When we write u = qa, we see that u has a periodic continuation as qq = (aaba)(aaba). When we write u = paa, we see that u also has a periodic continuation as pp = (aab)(aab). An analogous observation can be made with the folding pattern u′ = yxyxy and its two prefixes q′ = yx of length 2 and p′ = yxyxy of length 5: u′ has the potential periodic continuation q′q′q′ = (yx)(yx)(yx), and u′ coincides with exactly one complete period p′ = yxyxy, which can still be extended to p′p′ = (yxyxy)(yxyxy).
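The double periodicity of both hexachord patterns can be verified directly. In this sketch (ours; `periods` is a hypothetical helper), a word w has period p when w[i] = w[i + p] for every valid index i. Note that yxyxy also has period 4, a multiple of its shortest period 2.

```python
def periods(w):
    """All periods p (1 <= p <= len(w)) of w, i.e. w[i] == w[i + p] for every valid i."""
    return [p for p in range(1, len(w) + 1)
            if all(w[i] == w[i + p] for i in range(len(w) - p))]


u, q, p = "aabaa", "aaba", "aab"          # step-interval pattern and its prefixes
u_dual, q_dual = "yxyxy", "yx"            # folding pattern and its short prefix

print(periods(u))        # [3, 4, 5]
print(periods(u_dual))   # [2, 4, 5]
print((q + q).startswith(u), (p + p).startswith(u))  # True True
print((q_dual * 3).startswith(u_dual))               # True
```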
Fig. 1. Step pattern and fifth/fourth folding of the Guidonian hexachord as instances
of central palindromes
Such a potential double periodicity is possible only for words that are short enough. It turns out that for periods 4 and 3, the length 5 = 3 + 4 − 2 is already the maximum. The same length 5 = 2 + 5 − 2 is also the maximum for the periods 2 and 5. The following proposition is a well-known fact in the algebraic combinatorics on words. It provides the minimal word length for which such a potential double periodicity becomes impossible.
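The length bound can be checked by brute force. The sketch below (ours, with the hypothetical helper names `has_period` and `doubly_periodic_words`) lists all non-constant binary words of a given length that have two prescribed periods. The proposition itself is not reproduced in this copy. The behaviour observed here agrees with the classical theorem of Fine and Wilf: a word with periods p and q and length at least p + q − gcd(p, q) also has period gcd(p, q), so for coprime periods it must be constant.

```python
from itertools import product


def has_period(w, p):
    """True if w[i] == w[i + p] for every valid index i."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))


def doubly_periodic_words(length, p1, p2, alphabet="ab"):
    """All non-constant words of the given length having both periods p1 and p2."""
    return ["".join(t) for t in product(alphabet, repeat=length)
            if len(set(t)) > 1 and has_period(t, p1) and has_period(t, p2)]


print(doubly_periodic_words(5, 3, 4))  # ['aabaa', 'bbabb']
print(doubly_periodic_words(6, 3, 4))  # []  (length 6 = 3 + 4 - 1 forces a constant word)
print(doubly_periodic_words(5, 2, 5))  # ['ababa', 'babab']
print(doubly_periodic_words(6, 2, 5))  # []
```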
Fig. 2. Construction of two new central words aabaaabaa and aabaabaa from the cen-
tral word aabaa
1. the left successor Pa (u) = qu with co-prime periods n and n + m and asso-
ciated prefixes q and qp,
2. the right successor Pb (u) = pu with co-prime periods m and m + n and
associated prefixes p and pq.
The entire tree (see Figure 3) starts from the empty word , which — some-
what counterintuitively, but for good systematic reasons — is supposed to have
the periods n = m = 1 and associated “ghost-prefixes” q = a and p = b. This
exotic behavior continues along the outermost branches of the tree, where either
only a’s or b’s occur.
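The successor rules can be sketched as a small program. In this illustration (ours; `children`, `node_for`, `pal_closure` and `has_period` are hypothetical names), a node is a triple (u, q, p), and the root is the empty word with the ghost prefixes a and b. The check confirms, for all paths up to a given length, that each node is a palindrome with coprime periods |q| and |p| and length |q| + |p| − 2, and that the successors qu and pu coincide with the palindromic closures (ua)^+ and (ub)^+.

```python
from itertools import product
from math import gcd


def pal_closure(w):
    """Palindromic closure w^+: the shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        suffix = w[i:]
        if suffix == suffix[::-1]:          # the empty suffix always qualifies
            return w + w[:i][::-1]


def has_period(w, p):
    return all(w[i] == w[i + p] for i in range(len(w) - p))


def children(node):
    """node = (u, q, p): central word u with periodic prefixes q (ends in a), p (ends in b)."""
    u, q, p = node
    left = (q + u, q, q + p)      # P_a(u) = qu with prefixes q and qp
    right = (p + u, p + q, p)     # P_b(u) = pu with prefixes pq and p
    return left, right


ROOT = ("", "a", "b")             # empty word with the "ghost prefixes" a and b


def node_for(path):
    """Follow a path of letters a (left) and b (right) from the root."""
    node = ROOT
    for letter in path:
        node = children(node)[0 if letter == "a" else 1]
    return node


def check(max_len):
    count = 0
    for n in range(max_len + 1):
        for path in product("ab", repeat=n):
            u, q, p = node_for(path)
            assert u == u[::-1]
            assert gcd(len(q), len(p)) == 1 and len(u) == len(q) + len(p) - 2
            assert has_period(u, len(q)) and has_period(u, len(p))
            left, right = children((u, q, p))
            assert left[0] == pal_closure(u + "a") and right[0] == pal_closure(u + "b")
            count += 1
    return count


print(node_for("aab"))                           # ('aabaa', 'aaba', 'aab')
print([c[0] for c in children(node_for("aab"))]) # ['aabaaabaa', 'aabaabaa'] (cf. Fig. 2)
```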
Fig. 4. Compilation of the binary tree of central words with its dual tree. In each box the word u ∈ {a, b}∗ (on the left side) denotes a node of the central tree, while the word u′ ∈ {x, y}∗ (on the right side) denotes the associated dual central word, which belongs to the associated node of the dual tree.
$(((y^+)x)^+x)^+ = ((yx)^+x)^+ = (yxyx)^+ = yxyxy$. Observe that, up to a renaming of the letters ($x \mapsto a$, $y \mapsto b$), the directive words aab and yxx are retrogrades of each other. Without proof we mention that the retrogradation of the directive words provides an appropriate duality for central words. An expression of this duality is the conversion of periods into letter frequencies and vice versa. More precisely: let q and p denote the periodic prefixes of a central word u, with periods n = |q| and m = |p|, and let n′ = |q′| and m′ = |p′| denote the periods of its dual word u′. Then the letter frequencies $|u|_a$, $|u|_b$, $|u'|_x$, $|u'|_y$ satisfy
Systematically adjoining every node u in the tree of Figure 3 with its dual central word u′ (written in letters a and b instead of x and y), we construct a dual binary tree (cf. Figure 4).
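The retrograde duality can be illustrated with iterated palindromic closure. The following sketch (ours; `central` and `dual_central` are hypothetical names) computes a central word from its directive word, and the dual central word from the retrograde directive word with the letters renamed a ↦ x, b ↦ y. It reproduces the example u = aabaa with directive word aab and u′ = yxyxy with directive word yxx.

```python
def pal_closure(w):
    """Palindromic closure w^+: the shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        if w[i:] == w[i:][::-1]:            # the empty suffix always qualifies
            return w + w[:i][::-1]


def central(directive):
    """Iterated palindromic closure (...((d1)^+ d2)^+ ...)^+ of a directive word."""
    u = ""
    for letter in directive:
        u = pal_closure(u + letter)
    return u


RENAME = str.maketrans("ab", "xy")


def dual_central(directive):
    """Central word of the retrograde directive word, with a -> x and b -> y."""
    return central(directive[::-1].translate(RENAME))


print(central("aab"))       # aabaa
print(dual_central("aab"))  # yxyxy
```

Applying the construction twice returns the original word up to renaming, since retrogradation is an involution on directive words.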
The dual of the central tree can also be defined directly, i.e., in terms of two
maps specifying the left and right successors for each node of the dual central
tree. We postpone their definitions to the subsequent section.
2 Standard Modes
In the previous section we studied central words with two prefixes q and p.
Thereby the genealogy of central words down the tree goes along with the it-
erated concatenations of the two prefixes. The ordered pair (q, p) (of prefixes
for u) has the two successors (q, qp) (of prefixes for Pa (u) = qu) and (pq, p) (of
prefixes for Pb (u) = pu). The first prefix q in the ordered pair (q, p) is identified
by the property that it ends with letter a, while the second one, p, ends with
letter b. The present section is dedicated to the study of these pairs.
2.1 Do-Re-Mi-Fa-Sol-La-Ti-(Do′)
We return to our prominent music-theoretical example, but now in relation to
an historically much later, equally if not more prominent example: the diatonic
major (Ionian) mode. In consideration of this historical disjunction, we will use
modern solmization syllables, replacing Ut with Do and adjoining Ti. The main
subject of this section is the interdependence of central words and (positive)
standard words as well as the close connection between the two dualities (i.e.,
between the duality for central words on the one hand and the duality for positive
standard words on the other). The adjunction of the note Ti and of Do′ (the
repetition of the finalis Do one octave higher) to the hexachord is a quite natural
procedure from the viewpoint of word theory.
As mentioned in the introduction to Section 1, every region encompasses particular modes of two well-formed scales. Closer inspection of [4] from the perspective of central words reveals that these two modes coincide with the periodic
prefixes q and p. Well-formed scales are mostly known as scales with a period-
icity at the interval of the octave. In this case, however, the tones Do, Re, Mi,
Fa, (Sol) with step-interval pattern q = aaba form a well-formed scale modulo
perfect fifth, which is generated by the major second, starting from Fa. Anal-
ogously, the tones Do, Re, Mi, (Fa) with step-interval pattern p = aab form a
well-formed scale modulo perfect fourth, which is also generated by the major
second, starting from Do.
When we concatenate these two scales, we obtain the fifth-generated diatonic
scale Do, Re, Mi, Fa, Sol, La, Ti, (Do′) modulo octave. More precisely, we
obtain the well-formed Ionian mode with step-interval pattern aabaaab from the
concatenation of two shorter well-formed modes with step-interval patterns aaba
and aab. Note that under this concatenative construction of the Ionian mode
from the two shorter modes aaba and aab the music-theoretical interpretation of
the letters a and b does not change: a stands for (ascending) major second and
b stands for (ascending) minor second. What does change is the modulus, the
interval of periodicity. We will see below that this concatenation qp is an instance
of a transformation on word pairs. Henceforth we use the notation w1 |w2 for
word pairs (w1 , w2 ) with the additional meaning that we regard this pair as two
factors of the concatenation w1 w2 .
For any such pair w1 |w2 we define two successors, namely
$$\Gamma(w_1|w_2) := w_1|w_1w_2 \quad\text{and}\quad \Delta(w_1|w_2) := w_2w_1|w_2.$$
There is an entirely different way to construct the Ionian mode of the diatonic scale on the basis of transformations. It is based on substitutions rather than concatenations. For the free monoid {a, b}∗ of finite words with letters a and b, let $G_{a,b}$ and $D_{a,b}$ denote the following monoid morphisms. For single letters we define:
$$G_{a,b}(a) = a,\quad G_{a,b}(b) = ab \quad\text{and}\quad D_{a,b}(a) = ba,\quad D_{a,b}(b) = b.$$
For words with more than one letter the transformations are applied to each letter and the images are concatenated. For pairs $w_1|w_2$ we may trace the images separately, i.e., we write $G_{a,b}(w_1|w_2) = G_{a,b}(w_1)|G_{a,b}(w_2)$ as well as $D_{a,b}(w_1|w_2) = D_{a,b}(w_1)|D_{a,b}(w_2)$.
Again, starting from the pair a|b we obtain aaba|aab as
$$G_{a,b}(G_{a,b}(D_{a,b}(a|b))) = G_{a,b}(G_{a,b}(ba|b)) = G_{a,b}(aba|ab) = aaba|aab.$$
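Both constructions of the pair aaba|aab can be replayed in a few lines of Python. In this sketch (ours), a pair w1|w2 is a tuple of two strings; `gamma`, `delta` and `morphism` are hypothetical helper names for Γ, Δ and the letter-wise substitutions $G_{a,b}$, $D_{a,b}$.

```python
def gamma(pair):
    """Gamma(w1|w2) = w1|w1w2 (concatenative)."""
    w1, w2 = pair
    return w1, w1 + w2


def delta(pair):
    """Delta(w1|w2) = w2w1|w2 (concatenative)."""
    w1, w2 = pair
    return w2 + w1, w2


def morphism(images):
    """Extend a letter substitution to words, applied to both halves of a pair."""
    def apply(pair):
        return tuple("".join(images[c] for c in w) for w in pair)
    return apply


G = morphism({"a": "a", "b": "ab"})
D = morphism({"a": "ba", "b": "b"})

start = ("a", "b")
concat = delta(gamma(gamma(start)))  # a|b -> a|ab -> a|aab -> aaba|aab
subst = G(G(D(start)))               # a|b -> ba|b -> aba|ab -> aaba|aab
print(concat, subst)                 # ('aaba', 'aab') ('aaba', 'aab')
print("".join(concat))               # aabaaab, the Ionian step-interval pattern
```

The concatenative route keeps the meaning of a and b fixed and changes the modulus; the substitutive route keeps the roles "primary" and "secondary" fixed while the interval sizes change.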
What remains constant in the meaning of the letters a and b under substitution, notwithstanding their change in size, is the interpretation of a as a primary step interval and of b as a secondary step interval. The substitutive transformations are the heart of the hierarchy of well-formed scales.
Fig. 5. Authentic Division of the Ionian Mode. The figures to the right display height-
and width-trajectories in association with the step-interval pattern and the folding
pattern of the Ionian mode.
$$D_{x,y}(G_{x,y}(G_{x,y}(x|y))) = D_{x,y}(G_{x,y}(x|xy)) = D_{x,y}(x|xxy) = yx|yxyxy$$
Closer inspection shows that the pairs in the chain x|y → yx|y → yx|yxy → yx|yxyxy, which we obtain by the concatenative transformations Δ, ΓΔ and ΓΓΔ, are precisely the folding patterns of the chain of pairs a|b → ba|b → aba|ab → aaba|aab, which are the result of the iterated substitutional transformations $D_{a,b}$, $G_{a,b}D_{a,b}$ and $G_{a,b}G_{a,b}D_{a,b}$. Under this view the interpretation of x as ascending fifth and y as descending fourth remains constant. Conversely, the intermediate stages x|y → x|xy → x|xxy → yx|yxyxy of the substitutive construction of the folding pattern yx|yxyxy via $G_{x,y}$, $G_{x,y}G_{x,y}$ and $D_{x,y}G_{x,y}G_{x,y}$ are precisely the folding patterns of the intermediate stages of the concatenative construction of aaba|aab along a|b → a|ab → a|aab → aaba|aab via Γ, ΓΓ and ΔΓΓ. In this case the more abstract music-theoretical meaning of x and y (namely, primary vs. secondary folding interval) remains unchanged under transformation, while the actual sizes of these intervals change.
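The four chains of pairs can be generated stage by stage. The sketch below (ours; all helper names are hypothetical) applies Δ, Γ, Γ to x|y and $D_{a,b}$, $G_{a,b}$, $G_{a,b}$ to a|b, and then $G_{x,y}$, $G_{x,y}$, $D_{x,y}$ to x|y and Γ, Γ, Δ to a|b, each time in the order listed, and records every intermediate pair.

```python
def gamma(pair):
    w1, w2 = pair
    return w1, w1 + w2          # Gamma(w1|w2) = w1|w1w2


def delta(pair):
    w1, w2 = pair
    return w2 + w1, w2          # Delta(w1|w2) = w2w1|w2


def substitution(images):
    """Letter-wise substitution, applied to both halves of a pair."""
    return lambda pair: tuple("".join(images[c] for c in w) for w in pair)


G_ab = substitution({"a": "a", "b": "ab"})
D_ab = substitution({"a": "ba", "b": "b"})
G_xy = substitution({"x": "x", "y": "xy"})
D_xy = substitution({"x": "yx", "y": "y"})


def stages(start, ops):
    """List of pairs obtained by applying ops one after another."""
    out = [start]
    for op in ops:
        out.append(op(out[-1]))
    return out


# Concatenative on the folding side, substitutive on the step side
folding_concat = stages(("x", "y"), [delta, gamma, gamma])
step_subst = stages(("a", "b"), [D_ab, G_ab, G_ab])
# Substitutive on the folding side, concatenative on the step side
folding_subst = stages(("x", "y"), [G_xy, G_xy, D_xy])
step_concat = stages(("a", "b"), [gamma, gamma, delta])

for chain in (folding_concat, step_subst, folding_subst, step_concat):
    print(" -> ".join(w1 + "|" + w2 for w1, w2 in chain))
```

Corresponding stages have the same length, as a step-interval pattern and its folding pattern must.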
In Figure 5 there is a vertical line connecting the dividing note Sol of the step-interval pattern with the dividing note Sol of the folding. For ascending scales and forward foldings (ascending fifths) this property characterizes the Ionian mode among all modes (cf. [13]).
The concrete example may easily be turned into a definition for the general case. Every central word u has two periodic prefixes q and p, and we can write qp = uab. Its dual central word u′ has periodic prefixes q′ and p′, and again we
Fig. 6. Compilation of four binary trees: central words and positive standard words. Observe that the duality is manifest through the operation of path reversal.
can write q′p′ = u′xy. Figure 6 extends Figure 4. In each of the 2 × 2 boxes there are now four nodes from four different but highly related binary trees:
1. The tree of central words (displayed in the left upper fields of each node-box). The left and right successors of node u are the nodes $P_a(u) = qu = (ua)^+$ and $P_b(u) = pu = (ub)^+$, respectively.
2. The dual tree of central words (displayed in the right upper fields of each node-box). The left and right successors of node u′ are the nodes $C_a(u') = G_{x,y}(u')a$ and $C_b(u') = D_{x,y}(u')b$, respectively.
3. The tree of standard pairs (displayed in the left lower fields of each node-box). The left and right successors of node q|p are the nodes Γ(q|p) = q|qp and Δ(q|p) = pq|p, respectively.
4. The dual of this tree of standard pairs (displayed in the right lower fields of each node-box). The left and right successors of node q′|p′ are the nodes $G_{x,y}(q'|p')$ and $D_{x,y}(q'|p')$, respectively.
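The relations among the four trees can be tested exhaustively for short paths. The sketch below is ours and writes the dual words in the letters a and b, as in Figure 4. It builds each central word by iterated palindromic closure along its path, takes the dual central word to be the central word of the reversed path, builds standard pairs with Γ and Δ, and builds dual pairs with G and D. It then checks qp = uab, q′p′ = u′ab, the successor rules of all four trees, and that dual pairs are the standard pairs of the reversed path. The successor rule for the dual central tree agrees with Justin's formula for palindromic closure. All function names are hypothetical.

```python
from itertools import product


def pal_closure(w):
    for i in range(len(w) + 1):
        if w[i:] == w[i:][::-1]:
            return w + w[:i][::-1]


def central(path):
    """Iterated palindromic closure along a path (directive word)."""
    u = ""
    for letter in path:
        u = pal_closure(u + letter)
    return u


def substitution(images):
    return lambda w: "".join(images[c] for c in w)


SUB = {"a": substitution({"a": "a", "b": "ab"}),   # G
       "b": substitution({"a": "ba", "b": "b"})}   # D
CAT = {"a": lambda q, p: (q, q + p),               # Gamma
       "b": lambda q, p: (p + q, p)}               # Delta


def standard_pair(path):
    """Tree 3: apply Gamma (a) or Delta (b) along the path, starting from a|b."""
    q, p = "a", "b"
    for c in path:
        q, p = CAT[c](q, p)
    return q, p


def dual_pair(path):
    """Tree 4: apply G (a) or D (b) along the path, starting from a|b."""
    q, p = "a", "b"
    for c in path:
        q, p = SUB[c](q), SUB[c](p)
    return q, p


def check_four_trees(max_len):
    count = 0
    for n in range(max_len + 1):
        for path in map("".join, product("ab", repeat=n)):
            u, u_dual = central(path), central(path[::-1])   # trees 1 and 2
            q, p = standard_pair(path)
            qd, pd = dual_pair(path)
            assert q + p == u + "ab" and qd + pd == u_dual + "ab"
            assert (qd, pd) == standard_pair(path[::-1])      # path reversal
            assert central(path + "a") == q + u and central(path + "b") == p + u
            assert central("a" + path[::-1]) == SUB["a"](u_dual) + "a"   # C_a
            assert central("b" + path[::-1]) == SUB["b"](u_dual) + "b"   # C_b
            count += 1
    return count


print(check_four_trees(6))  # number of nodes checked
```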
Concluding Remark: In [9] we take a closer look at the way in which the central
word u (with period-prefixes q and p) interacts with the authentic standard mode
q|p and its conjugate modes. This paper revisits observations by the first author
(see [7]) in connection with Guido’s and Hermannus’s concept of affinity in
the light of the double periodicity of the central words. It also investigates the
sensitive interval between divider and leading tone by means of generalization.
References
[1] Berstel, J., de Luca, A.: Sturmian words, Lyndon words and trees. Theoretical
Computer Science 178, 171–203 (1997)
[2] Berstel, J., Lauve, A., Reutenauer, C., Saliola, F.V.: Combinatorics on words:
Christoffel words and repetitions in words. CRM Monograph Series, vol. 27. Amer-
ican Mathematical Society, Providence (2009)
[3] Berthé, V., de Luca, A., Reutenauer, C.: On an involution of Christoffel words and Sturmian morphisms. European Journal of Combinatorics 29(2), 535–553 (2008)
[4] Carey, N., Clampitt, D.: Regions: A theory of tonal spaces in early medieval treatises. Journal of Music Theory 40(1), 113–147 (1996)
[5] Carey, N., Clampitt, D.: Aspects of well-formed scales. Music Theory Spec-
trum 11(2), 187–206 (1989)
[6] Clampitt, D., Noll, T.: Modes, the height-width duality, and divider incidence.
In: Society for Music Theory national conference, Nashville, TN (2008)
[7] Clampitt, D.: Double neighbor polarity (unpublished paper) (2008)
[8] Clampitt, D.: Sensitive intervals: major third analogues in standard well-formed
words. In: Workshop on Mathematical Music Theory, South Bristol, ME (2008)
[9] Clampitt, D., Noll, T.: Regions within enveloping standard modes (unpublished
paper) (2009)
[10] Domínguez, M., Clampitt, D., Noll, T.: Well-formed scales, maximally even sets
and Christoffel words. In: Proceedings of the MCM 2007, Berlin, Staatliches In-
stitut für Musikforschung (2007)
[11] Lothaire, M.: Algebraic Combinatorics on Words. Cambridge University Press,
Cambridge (2002)
[12] Kassel, C.: From Sturmian morphisms to the braid group B4 . In: The Workshop
on Braid Groups and Applications, Banff International Research Station (2004)
[13] Noll, T.: Ionian theorem. Journal of Mathematics and Music 3(3) (to appear,
2009)
[14] Noll, T., Clampitt, D., Domínguez, M.: What Sturmian morphisms reveal about
musical scales and tonality. In: Proceedings of the WORDS 2007 Conference,
Marseille (2007), http://iml.univ-mrs.fr/words2007/
[15] Singler, F.: Zur Dualität zwischen doppelter Periodizität und binärer Intervall-
Struktur in der Theorie der Tonregionen. Thesis (final paper). Hochschule für
Musik und Theater Felix Mendelssohn Bartholdy, Leipzig (2008)
Compatibility of the Different Tuning Systems
in an Orchestra
1 Introduction
Different criteria have been used to select the sounds that music uses. A set containing
these sounds (musical notes) is called a tuning system. Most of them have been
obtained through mathematical arguments. The numerical nature of these systems
facilitates their transmission and the manufacture of instruments, etc. However, the
harshness of the mathematical arguments relegated these tuning systems to theoretical
studies while in practice musicians tuned in a more flexible way. They implicitly deal
with complex mathematical processes involving some uncertainty in the concepts. In
fact, most of the musicians in a classical orchestra must adjust their instruments to
tune well. For instance, wind instrument players modify the air pressure or the finger
positions to fit their notes to the ensemble. Because of this, many musicians feel that
the mathematical arguments that justify tuning systems are impractical.
Sometimes, probability distributions are useful for handling uncertainty (stochastic
uncertainty) [7],[8], but in other cases it cannot be justified that the given concepts
follow a predetermined distribution (fuzzy uncertainty). As musicians need flexibility
in their reasoning, the use of fuzzy logic to connect music and uncertainty is appropri-
ate (see [6],[7],[10]). Therefore, we propose to model the notes as fuzzy sets and
analyze the compatibility between them. With this idea, we can extend the concept of
tuning systems, connecting theory and practice, and understand how musicians work
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 93–103, 2009.
© Springer-Verlag Berlin Heidelberg 2009
94 A. del Corral, T. León, and V. Liern
in real life. In order to compare the notes that constitute theoretical tuning systems with those performed by musicians, we have studied the compatibility of a set of notes (recorded by a professional musician) with the corresponding notes of the Pythagorean, Zarlinean, and 12-note equal temperament systems.
$$S_{\Lambda F} = \left\{ 2^{c_n} : c_n = \sum_{i=1}^{k} \lambda_i h_i(n) - \left\lfloor \sum_{i=1}^{k} \lambda_i h_i(n) \right\rfloor,\; n \in \mathbb{Z} \right\} \qquad (2)$$
where ⌊x⌋ is the integer part¹ of x.
¹ Note that the integer part in (2) is added to gain octave equivalence.
If every element in the tuning system is a rational number, we say that it is a tuned system, whereas if some element is an irrational number, the system is a temperament. The advantage of expressing the tuned notes as $2^{c_n}$ is that, if our reference note is $2^0$, then in accordance with (1) the exponent $c_n$ provides the pitch sensation. Let us mention that the family of integer-valued functions F marks the “interval locations”. In those systems generated by one interval (for instance the Pythagorean) they are not really necessary. However, in the other systems they are. For instance, in Just Intonation $h_1(n)$ and $h_2(n)$ indicate the positions of the fifths and the thirds considered as tuned. Table 1 displays some examples of tuning systems.
Table 1. Examples of tuning systems (only partially recoverable in this copy)
S | Λ | F
Pythagorean | λ1 = log2(3/2) | h1(n) = n
 | λ3 = 7/12 | h2(n) = ⌊(an + 6)/12⌋ + ⌊(n + 7)/12⌋ + ⌊(an + 8)/12⌋ + ⌊(n + 9)/12⌋
 | | h3(n) = ⌊(an + 1)/12⌋ + ⌊(n + 4)/12⌋ + ⌊(an + 5)/12⌋
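Formula (2) can be evaluated directly. The following Python sketch (ours; `tuning_exponents` is a hypothetical name) computes the exponents $c_n$ for a list of generators $\lambda_i$ and location functions $h_i$. The first call reproduces the Pythagorean entry of Table 1 ($\lambda_1 = \log_2(3/2)$, $h_1(n) = n$). The second call is our own additional illustration, not taken from the table: a single generator 7/12 with $h(n) = n$, which yields the twelve equal-tempered exponents k/12.

```python
import math


def tuning_exponents(lambdas, hs, ns):
    """c_n = sum_i lambda_i h_i(n) - floor(sum_i lambda_i h_i(n)), as in formula (2)."""
    exponents = []
    for n in ns:
        s = sum(lam * h(n) for lam, h in zip(lambdas, hs))
        exponents.append(s - math.floor(s))   # reduce to one octave: 0 <= c_n < 1
    return exponents


pythagorean = tuning_exponents([math.log2(3 / 2)], [lambda n: n], range(12))
print([round(1200 * c, 2) for c in pythagorean[:3]])   # pitch in cents above 2^0

equal_12 = tuning_exponents([7 / 12], [lambda n: n], range(12))
print(sorted(round(12 * c) for c in equal_12))         # each k/12 appears once
```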
In our context, if we take the note A = 440 Hz (diapason) as our fixed note, a frequency of 442 Hz would, from the point of view of Boolean logic, be out of tune.
However, for every musician, or anybody who hears it, that note is only slightly out of tune, much less so than a note of 450 Hz. The transition between tuned and out of tune is represented as a fuzzy set in which a tolerance level has been fixed (see Fig. 1).
Fig. 1. Membership and characteristic functions for the fuzzy and classical membership func-
tions
Before explaining how the tolerance levels are fixed, we will introduce the idea of
a fuzzy number [4]:
In this paper we will use a particular case of fuzzy numbers, the LR-fuzzy numbers
[3], [4] and the relationship between them.
where L and R are reference functions, i.e., $L, R : [0, +\infty[\; \to [0, 1]$ are strictly decreasing in $\operatorname{supp} \tilde M = \{x : \mu_{\tilde M}(x) > 0\}$ and upper semi-continuous functions such that L(0) = R(0) = 1. If $\operatorname{supp} \tilde M$ is a bounded set, L and R are defined on [0, 1] and satisfy L(1) = R(1) = 0. Moreover, if L and R are linear functions, the fuzzy number is called trapezoidal (see Fig. 2) and its arithmetic is easy to perform. A trapezoidal fuzzy number is defined by four real numbers, $\tilde A = (a_L, a_R, \alpha_L, \alpha_R)$, and values below $a_L - \alpha_L$ and above $a_R + \alpha_R$ are not acceptable.
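With linear reference functions, the membership function of a trapezoidal fuzzy number $\tilde A = (a_L, a_R, \alpha_L, \alpha_R)$ is piecewise linear. The following sketch (ours; `trapezoid_membership` is a hypothetical name, and positive spreads $\alpha_L, \alpha_R$ are assumed) evaluates it.

```python
def trapezoid_membership(x, a_L, a_R, alpha_L, alpha_R):
    """Membership degree of x in the trapezoidal fuzzy number (a_L, a_R, alpha_L, alpha_R)."""
    if a_L <= x <= a_R:
        return 1.0                          # peak
    if a_L - alpha_L < x < a_L:
        return 1.0 - (a_L - x) / alpha_L    # linear left branch L
    if a_R < x < a_R + alpha_R:
        return 1.0 - (x - a_R) / alpha_R    # linear right branch R
    return 0.0                              # outside the support: not acceptable


print([trapezoid_membership(x, 2, 3, 1, 2) for x in (0.5, 1.5, 2.5, 4.0, 5.5)])
# [0.0, 0.5, 1.0, 0.5, 0.0]
```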
$$]f\,2^{-\varepsilon}, f\,2^{\varepsilon}[\; \cong f \qquad (4)$$
where ε > 0, and 1200ε expresses, in cents, the accuracy of the human ear in the perception of the unison.
Moreover, if the number of notes per octave is q, we can divide the octave into q intervals with a length of 1200/q cents. So, we can express the interval of the note f as
$$]f\,2^{-\delta}, f\,2^{\delta}[, \qquad (5)$$
where δ = 1/(2q); the quantity Δ = 1200δ expresses, in cents, the tolerance that we admit for every note. Actually, this is what chromatic tuners do: they assign 12 divisions per octave, δ = 1/(2 × 12), and then the tolerance corresponding to every note is
$$\Delta = 1200 \cdot \frac{1}{2 \times 12} = 50 \text{ cents}.$$
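The tolerance interval (5) is easy to evaluate numerically. This sketch (ours; all function names are hypothetical) computes the tolerance Δ in cents for q notes per octave and tests whether a frequency lies in the open interval $]f\,2^{-\delta}, f\,2^{\delta}[$ around a reference frequency f. It shows that 442 Hz and 450 Hz both fall inside the 50-cent window around 440 Hz, and that 470 Hz does not.

```python
import math


def cents(freq, ref):
    """Interval from ref to freq in cents."""
    return 1200 * math.log2(freq / ref)


def tolerance_cents(q):
    """Delta = 1200 * delta with delta = 1 / (2q)."""
    return 1200 / (2 * q)


def within_tolerance(freq, ref, q=12):
    """Is freq inside the open interval ]ref * 2^-delta, ref * 2^delta[ of (5)?"""
    delta = 1 / (2 * q)
    return ref * 2 ** (-delta) < freq < ref * 2 ** delta


print(tolerance_cents(12))                            # 50.0
print(round(cents(442, 440), 2), round(cents(450, 440), 2))
print(within_tolerance(442, 440), within_tolerance(470, 440))  # True False
```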
Remark 1. It is necessary that ε < δ, since a tuning system with more notes than the human ear can distinguish would make no practical sense.
Taking into account (4) and (5), we can express a musical note as a trapezoidal fuzzy number with peak $[f\,2^{-\varepsilon}, f\,2^{\varepsilon}]$ and support $[f\,2^{-\delta}, f\,2^{\delta}]$. Moreover, according to Definition 2, the notes are expressed as powers of two, $2^{c_n}$, so it is more practical to express the fuzzy musical notes through their exponent, $c_n$.
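$$2^{\tilde{t}} = \left(2^{t-\varepsilon},\, 2^{t+\varepsilon},\, \alpha_L,\, \alpha_R\right),$$
where $\alpha_L = 2^{t-\varepsilon}(1 - 2^{-\delta})$, $\alpha_R = 2^{t+\varepsilon}(2^{\delta} - 1)$, and its membership function is
$$\mu_{2^{\tilde{t}}}(x) = \begin{cases} 1 - \dfrac{2^{t-\varepsilon} - x}{2^{t-\varepsilon} - 2^{t-\varepsilon-\delta}}, & 2^{t-\varepsilon-\delta} < x \le 2^{t-\varepsilon},\\[4pt] 1, & 2^{t-\varepsilon} < x \le 2^{t+\varepsilon},\\[4pt] 1 - \dfrac{x - 2^{t+\varepsilon}}{2^{t+\varepsilon+\delta} - 2^{t+\varepsilon}}, & 2^{t+\varepsilon} < x \le 2^{t+\varepsilon+\delta},\\[4pt] 0, & \text{otherwise.} \end{cases}$$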
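To see the asymmetry of the fuzzy note concretely, the following sketch builds the parameters $(2^{t-\varepsilon}, 2^{t+\varepsilon}, \alpha_L, \alpha_R)$ from t, ε and δ and evaluates the piecewise membership function above. The tolerances E = 6 cents and Δ = 50 cents are taken from Section 6; the function names and the note A4 = 440 Hz are illustrative assumptions.

```python
import math

def fuzzy_note(t, eps, delta):
    """Trapezoidal parameters (a_L, a_R, alpha_L, alpha_R) of the fuzzy note 2**t~,
    where the exponent is t~ = (t - eps, t + eps, delta, delta)."""
    a_L, a_R = 2.0 ** (t - eps), 2.0 ** (t + eps)
    alpha_L = a_L * (1.0 - 2.0 ** (-delta))   # equals 2**(t-eps) - 2**(t-eps-delta)
    alpha_R = a_R * (2.0 ** delta - 1.0)      # equals 2**(t+eps+delta) - 2**(t+eps)
    return a_L, a_R, alpha_L, alpha_R

def note_membership(x, t, eps, delta):
    """Piecewise-linear membership degree of frequency x in the fuzzy note 2**t~."""
    a_L, a_R, alpha_L, alpha_R = fuzzy_note(t, eps, delta)
    if a_L - alpha_L < x <= a_L:
        return 1.0 - (a_L - x) / alpha_L
    if a_L < x <= a_R:
        return 1.0
    if a_R < x <= a_R + alpha_R:
        return 1.0 - (x - a_R) / alpha_R
    return 0.0

t, eps, delta = math.log2(440.0), 6 / 1200, 50 / 1200
a_L, a_R, alpha_L, alpha_R = fuzzy_note(t, eps, delta)
print(alpha_L, alpha_R)    # the right flank is wider: the fuzzy note is not symmetric
```

The exponent $\tilde{t}$ is symmetric, but exponentiation stretches the right flank more than the left one, which is why $\alpha_L < \alpha_R$.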
Let us point out that two trapezoidal fuzzy numbers are involved in the definition of a fuzzy musical note: one for the exponent, $\tilde{t}$, which reflects the pitch sensation and is therefore a symmetric fuzzy number, and the other for the fuzzy note itself, which is non-symmetric, as the expression of its membership function shows.
5 Measuring Compatibility
Let us recall the definition of intersection or similitude between two fuzzy sets [3].
Definition 6. The fuzzy intersection of the fuzzy sets $\tilde{A}$ and $\tilde{B}$ is a new fuzzy set $\tilde{A} \cap \tilde{B}$ with membership function
$$\mu_{\tilde{A} \cap \tilde{B}}(x) = \min\{\mu_{\tilde{A}}(x),\, \mu_{\tilde{B}}(x)\}. \qquad (6)$$
The concept of compatibility between two notes can be derived from this definition
[10].
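Definition 7. Let $2^{\tilde{t}}$ and $2^{\tilde{s}}$ be two musical notes, where $\tilde{t} = (t - \varepsilon, t + \varepsilon, \delta, \delta)$ and $\tilde{s} = (s - \varepsilon, s + \varepsilon, \delta, \delta)$. We define the degree of compatibility between $2^{\tilde{t}}$ and $2^{\tilde{s}}$ as
$$\operatorname{Comp}[2^{\tilde{t}}, 2^{\tilde{s}}] = \max_x \mu_{\tilde{s} \cap \tilde{t}}(x), \qquad (7)$$
and we say that $2^{\tilde{t}}$ and $2^{\tilde{s}}$ are α-compatible, $\alpha \in [0, 1]$, if $\operatorname{Comp}[2^{\tilde{t}}, 2^{\tilde{s}}] \ge \alpha$.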
Although the compatibility between notes could have been defined for notes with different degrees of tolerance δ and δ′ (with δ ≠ δ′), in practice we are equally tolerant of all the notes in an octave.
Figure 3 illustrates Definition 7 and shows that the intersection of two trapezoidal
numbers is not necessarily trapezoidal.
A direct calculation yields the following result, which allows us to compute the compatibility between notes.
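Proposition 1. Two musical notes $2^{\tilde{t}}$ and $2^{\tilde{s}}$, where $\tilde{t} = (t - \varepsilon, t + \varepsilon, \delta, \delta)$ and $\tilde{s} = (s - \varepsilon, s + \varepsilon, \delta, \delta)$, are α-compatible, $\alpha \in [0, 1]$, if and only if $|t - s| \le 2\delta(1 - \alpha) + 2\varepsilon$.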
Taking into account (7) and Proposition 1, the compatibility can be expressed as
$$\operatorname{Comp}[2^{\tilde{t}}, 2^{\tilde{s}}] = \max\left\{0,\; 1 - \frac{|t - s| - 2\varepsilon}{2\delta}\right\}. \qquad (8)$$
However, for musicians it is usually more convenient to calculate the compatibility between two notes in terms of their frequencies. Hence, given two notes with frequencies $f_1$ and $f_2$, for which we admit a tolerance of Δ cents, according to (8) the compatibility between $f_1$ and $f_2$ is given by
$$\operatorname{Comp}[f_1, f_2] = \max\left\{0,\; 1 - \frac{d(f_1, f_2) - 2E}{2\Delta}\right\}, \qquad (9)$$
where d is the distance defined in (1), $\Delta = 1200 \times \delta$ and $E = 1200 \times \varepsilon$.
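A minimal sketch of (9) follows. It assumes that d is the distance in cents, $d(f_1, f_2) = 1200\,|\log_2(f_1/f_2)|$, which is what (8) gives after multiplying through by 1200 with $t = \log_2 f_1$ and $s = \log_2 f_2$. It also caps the result at 1: when the notes are closer than 2E cents, the expression inside (9) exceeds 1, while a maximum of membership degrees cannot. The cap is an addition of this sketch; the function names are hypothetical, and the defaults Δ = 50 and E = 6 cents come from Section 6.

```python
import math

def cents(f1, f2):
    """Distance in cents between two frequencies (assumed form of d in (1))."""
    return 1200.0 * abs(math.log2(f1 / f2))

def compatibility(f1, f2, Delta=50.0, E=6.0):
    """Degree of compatibility (9) between notes f1 and f2, tolerances Delta and E in cents.

    Clipped to [0, 1]; the upper cap is an assumption, since (9) only has the lower bound.
    """
    value = 1.0 - (cents(f1, f2) - 2.0 * E) / (2.0 * Delta)
    return min(1.0, max(0.0, value))

print(compatibility(440.0, 442.0))   # about 7.9 cents apart: fully compatible
print(compatibility(440.0, 450.0))   # about 38.9 cents apart: partially compatible
```

With these tolerances, the 442Hz note of the introduction is fully compatible with 440Hz, while 450Hz is only partially compatible.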
In order to extend the concept of compatibility to tuning systems, we need to define
the fuzzy tuning systems.
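$$\tilde{S}^{F}_{\Lambda}(\delta) = \left\{ 2^{\tilde{c}_n} : \tilde{c}_n = \left( \sum_{i=1}^{k} \lambda_i h_i(n) - \left\lfloor \sum_{i=1}^{k} \lambda_i h_i(n) \right\rfloor,\; \delta \right),\; n \in \mathbb{Z} \right\}. \qquad (10)$$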
Next, we introduce the concept of compatibility between two systems, which reflects both the proximity between their notes and the similarity of their configurations.
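Definition 9. Let $\tilde{S}_q(\delta)$, $\tilde{T}_q(\delta)$ be two tuning systems with q notes. We say that $\tilde{S}_q(\delta)$ and $\tilde{T}_q(\delta)$ are α-compatible if for each $2^{\tilde{s}_i} \in \tilde{S}_q(\delta)$ there is a unique $2^{\tilde{t}_j} \in \tilde{T}_q(\delta)$ such that
$$\operatorname{Comp}[2^{\tilde{s}_i}, 2^{\tilde{t}_j}] \ge \alpha.$$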
The quantity α is the degree of interchangeability between $\tilde{S}_q(\delta)$ and $\tilde{T}_q(\delta)$, and the uniqueness required in Definition 9 guarantees that these systems have a similar distribution in the cycle of fifths.
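Definition 9 can be checked mechanically. The λ_i and h_i of (10) are defined in Definition 2, which is not reproduced here. This sketch therefore assumes a 12-note Pythagorean system with exponents $n \log_2(3/2) \bmod 1$ and a 12-note equal temperament with exponents $7n/12 \bmod 1$, for n = 0, …, 11 along the cycle of fifths. It compares octave-reduced exponents with a circular cents distance and uses the note compatibility (8)–(9) capped at 1. All function names and these modelling choices are illustrative assumptions.

```python
import math

def circular_cents(c1, c2):
    """Distance in cents between two octave-reduced exponents in [0, 1) (assumed circular)."""
    d = abs(c1 - c2) % 1.0
    return 1200.0 * min(d, 1.0 - d)

def comp(c1, c2, Delta=50.0, E=6.0):
    """Note compatibility (8)-(9) on octave-reduced exponents, clipped to [0, 1]."""
    return min(1.0, max(0.0, 1.0 - (circular_cents(c1, c2) - 2.0 * E) / (2.0 * Delta)))

def alpha_compatible(S, T, alpha):
    """Definition 9: every note of S has exactly one alpha-compatible note in T."""
    return all(sum(comp(s, t) >= alpha for t in T) == 1 for s in S)

fifth = math.log2(3 / 2)
pythagorean = [(n * fifth) % 1.0 for n in range(12)]    # hypothetical generator-based system
equal_12 = [(7 * n / 12) % 1.0 for n in range(12)]      # 12-note equal temperament

for alpha in (0.3, 0.9, 0.95):
    print(alpha, alpha_compatible(pythagorean, equal_12, alpha))
```

Under these assumptions the two systems are 0.9-compatible but not 0.95-compatible: the largest deviation, about 21.5 cents, occurs eleven fifths from the starting note. At α = 0.3 the uniqueness condition fails, because some Pythagorean notes become compatible with two equal-tempered neighbours.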
Remark 2. Note that α-compatibility does not define an equivalence relation on the set of tuning systems, because it is not transitive.
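A small numerical example shows how transitivity fails. It treats each note as a one-note system and uses three hypothetical notes spaced 40 cents apart, with the note compatibility (9), Δ = 50 and E = 6 cents, and α = 0.7. The first and second notes are α-compatible, and so are the second and third, but the first and third are not.

```python
import math

def compatibility(f1, f2, Delta=50.0, E=6.0):
    """Note compatibility (9) with the distance in cents, clipped to [0, 1]."""
    d = 1200.0 * abs(math.log2(f1 / f2))
    return min(1.0, max(0.0, 1.0 - (d - 2.0 * E) / (2.0 * Delta)))

# Three hypothetical notes, 0, 40 and 80 cents above 440 Hz.
a, b, c = (440.0 * 2 ** (k * 40 / 1200) for k in range(3))
alpha = 0.7
ab, bc, ac = compatibility(a, b), compatibility(b, c), compatibility(a, c)
print(ab >= alpha, bc >= alpha, ac >= alpha)   # compatible, compatible, not compatible
```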
6 Computational Results
We made 48 recordings of 23 bars² of the Concerto in E-flat major, Hob. VIIe:1, for trumpet and orchestra by Franz Joseph Haydn (1732–1809) (see Fig. 4). All the recordings were played by the same performer on a Bach® trumpet in Bb.
Our aim is to analyze the compatibility between the recorded notes and the corresponding tuned notes of the Pythagorean, Zarlinean, and 12-note equal temperament systems. In our approach both kinds of notes are fuzzy.
The frequencies corresponding to a performed note (which is played slightly differently in each recording) belong to the interval $[f_{\min}, f_{\max}]$, where $f_{\min}$ (resp. $f_{\max}$) is its lowest (resp. highest) performed frequency (see Columns 1–2 in Table 2). The exact frequencies of some tuning systems³ appear in Columns 3–6. The base-2 logarithms of the frequencies in the table provide the values of t in Definition 5, and we take Δ = 50 cents (as chromatic tuners do) and E = 6 cents as tolerances (see [8], [10]).
² The different notes that appear in the fragment are Eb3, F3, G3, Bb3, Ab3, C4, D4, Eb4, F4, G4, Bb4, B3, and A3.
³ Strictly speaking, Just Intonation is a family of tuning systems; we have selected the Zarlinean system [5].
Table 2. Exact frequencies of the notes that appear in the Haydn excerpt
In Table 3, we have underlined the compatibility intervals that are lower than 0.6, that is, compatibilities below 60%. If we fix this percentage as the minimum for accepting that the notes are compatible, we can see in the table that 50% of the notes are not acceptable in all the tuning systems.
These results are very useful to players, who may need to modify lip pressure and/or position to raise the level of compatibility.
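One possible reading of the "compatibility intervals" of Table 3 is the range of compatibility values (9) between a tuned note and all frequencies in a performed range $[f_{\min}, f_{\max}]$. The sketch below computes that range. Because compatibility decreases with the distance in cents, the highest value is attained at the point of the range closest to the tuned note and the lowest at the farther endpoint. This interpretation, the function names and the example frequencies are assumptions, not data from Table 2.

```python
import math

def cents(f1, f2):
    """Distance in cents between two frequencies."""
    return 1200.0 * abs(math.log2(f1 / f2))

def compatibility(f1, f2, Delta=50.0, E=6.0):
    """Note compatibility (9), clipped to [0, 1]."""
    return min(1.0, max(0.0, 1.0 - (cents(f1, f2) - 2.0 * E) / (2.0 * Delta)))

def compatibility_interval(f_min, f_max, f_tuned):
    """(lowest, highest) compatibility between f_tuned and any frequency in [f_min, f_max]."""
    closest = min(max(f_tuned, f_min), f_max)                            # nearest point of the range
    farthest = max((f_min, f_max), key=lambda f: cents(f, f_tuned))      # endpoint farther away
    return compatibility(farthest, f_tuned), compatibility(closest, f_tuned)

low, high = compatibility_interval(437.0, 446.0, 440.0)   # illustrative values only
print(round(low, 3), round(high, 3), "underlined" if low < 0.6 else "acceptable")
```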
7 Conclusions
Many musicians think that the technical treatment of musical concepts conflicts with their daily practice. However, this paper uses musicians' own methods to make the concept of a musical note more flexible. In this framework, fuzzy mathematical rules and musical practice coincide. In fact, the adjustments that musicians make constitute a method for increasing the compatibility level among systems. In this way, describing tuning systems as fuzzy sets allows us to include both the daily reality of musicians and their theoretical training in a mathematical structure. In our opinion, this constitutes a good model of reality.
The idea of compatibility raises the possibility of substituting one tuning system for another. Therefore, when a tuning system presents many harmonic difficulties, such as not allowing certain transpositions, we can use a compatible system to avoid these disadvantages. On the other hand, knowing the compatibility between notes allows musicians to improve their performance by choosing between different tuning positions, increasing lip pressure, etc.
Finally, we must point out that many performers and composers have difficulties when working with proposals from science, so we should continue our efforts to make them more comfortable with technical arguments.
References
1. Benson, D.: Music: a Mathematical Offering. Cambridge University Press, Cambridge
(2006)
2. Borup, H.: A History of String Intonation,
http://www.hasseborup.com/ahistoryofintonationfinal1.pdf
3. Carlsson, C., Fullér, R.: Fuzzy Reasoning in Decision Making and Optimization. Physica-
Verlag, Heidelberg (2002)
4. Dubois, D., Prade, H.: Fuzzy Sets and Systems: Theory and Applications. Academic Press, New York (1980)
5. Goldáraz Gaínza, J.J.: Afinación y temperamento en la música occidental. Alianza Editorial, Madrid (1992)
6. Hall, R.W., Josić, K.: The mathematics of musical instruments. Amer. Math. Monthly 108, 347–357 (2001)
7. Haluska, J.: Equal temperament and Pythagorean tuning: a geometrical interpretation in the plane. Fuzzy Sets and Systems 114, 261–269 (2000)
8. Haluska, J.: The Mathematical Theory of Tone Systems. Marcel Dekker, Inc., Bratislava (2005)
9. Lattard, J.: Gammes et tempéraments musicaux. Masson Éditions, Paris (1988)
10. Liern, V.: Fuzzy tuning systems: the mathematics of the musicians. Fuzzy Sets and Systems 150, 35–52 (2005)
11. Zadeh, L.A.: Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems 1, 3–28 (1978)
Formal Diatonic Intervallic Notation
Abstract. Numbers called quality modifiers are used to identify interval quali-
ties: 0 numerically represents perfect, ½ represents major, –½ represents minor,
and so on. These modifiers are linked with diatonic class intervals as ordered
pairs that mimic common interval notation. For example, a minor third is represented by (–½, 2). A binary operator is constructed that allows these ordered pairs to be added consistently with our expectations. Similarly, accidental modifiers numerically identify the number of sharps or flats attached to a given note: 0
indicates no attached accidentals, negative integers indicate the number of flats
attached, and positive integers indicate the number of sharps attached. These
modifiers are linked with diatonic classes as ordered pairs that mimic common
note names. For example, the note Gb is represented by (–1,4) and Gx by (2,4).
Intervals and notes represented by these ordered pairs are said to be in MD-
notation (MD for modifier-diatonic). A group action and generalized interval
system are defined for intervals and notes in MD-notation. An implied quarter-
tone system is also discussed.
1 Introduction
While current numerical diatonic intervallic notations reveal much about diatonic in-
tervals, these notations in general do not mimic what we call common diatonic nota-
tion; that is, the interval qualities (major, minor, perfect, …) are not immediately
apparent in the notation. As it turns out, mimicking common diatonic notation in a mathematically consistent way is surprisingly complicated. For example, sometimes
the sum of two major intervals yields an augmented interval (M2 + M3 = A4), and
other times the sum yields another major interval (M2 + M2 = M3). How would one
construct a group of intervals that mimic common intervals and overcome this seem-
ing ambiguity? This problem is addressed in Sections 2 and 3. In Section 4 Hook’s [1]
notation that models common notes (e.g., Ab, F#, …) by linking accidental modifiers
and diatonic classes is coupled with Clough and Douthett’s [2] algorithm for maxi-
mally even sets to define a set of notes on which the group described above will act.
Section 5 addresses the generalized interval system induced by this action, and Sec-
tion 6 briefly discusses extensions of this notation to other equal-tempered systems.
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 104–114, 2009.
© Springer-Verlag Berlin Heidelberg 2009
2 Quality Modifiers
In this section and all that follow, mod 7 diatonic class intervals (dcis) will be repre-
sented by italicized uppercase letters A and B, and mod 12 pitch class intervals (pcis)
will be represented by italicized lowercase letters a and b. We will shortly define
quality modifiers, which will be represented by lowercase Greek letters α and β .
For interval qualities, lowercase letters d and m will denote diminished and minor in-
tervals, while uppercase letters P, M, and A will denote perfect, major, and aug-
mented intervals.
Intuitively, the adjectives used to describe interval quality—perfect, major, minor,
augmented, diminished—convey something about how large a given interval is rela-
tive to an “average” interval of the same numerical (generic) size. For example, a m3
is a little smaller than an “average” third, while an A6 is quite a bit larger than an “av-
erage” sixth. With the aim of making this intuition precise, we first consider what is
meant by an “average” interval.
Since there are seven generic steps to the octave, it seems logical to say that the av-
erage interval of a dci, call it A, is the fraction A/7 of an octave; that is, the average
second (dci 1) is 1/7 of an octave, the average third (dci 2) is 2/7 of an octave, and so
on. Equivalently, measured in equal-tempered semitones, the average interval of dci A
is 12A/7: the average second is 12/7, the average third is 24/7, and so on. Note that we
could have arrived at the same result by averaging the interval sizes within the dia-
tonic scale. For example, in any diatonic scale there are four m3s (of size 3 semitones)
and three M3s (4 semitones), and the mathematical average of four 3s and three 4s is
again 24/7.
Table 1 tabulates the intervals of all common qualities in the usual 12-note equal-
tempered system (12-ETS). The first column gives the common names of the inter-
vals. The next two columns give the dci and pci, respectively, of the interval—the
generic and specific sizes of the interval in the terminology of Clough and Myerson
[3, 4]. The fourth column gives the size of the corresponding “average” interval as de-
scribed above; that is, 12A/7 where A is the dci, or equivalently an interval of size A in
a 7-ETS measured in 12 equal-tempered semitones. The fifth column labeled “Differ-
ence” tabulates the difference between the pci and the average 7-ETS interval,
counted positive if the pci is larger than average and negative if the pci is smaller than
average (Column 3 less Column 4). The last column is the quality modifier (qm) of
the interval, which is the difference in Column 5 rounded to the nearest half-integer.
Thus, if [·] is the function that rounds to the nearest integer, then the qm of an interval with dci A and pci a is
$$\alpha = \frac{1}{2}\left[2\left(a - \frac{12A}{7}\right)\right] = a - l(A), \quad \text{where } l(A) = \frac{1}{2}\left[\frac{24A}{7}\right]. \qquad (1)$$
Observe that (1) may be used to determine a from α and A, or to determine l(A) from a and α. Some combinations of a and α are not realized, however, as l(A) takes on a limited number of values. (The values of l(A) for A = 0, 1, …, 6 are 0, 1½, 3½, 5, 7, 8½, 10½.)
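Formula (1) translates directly into code. The sketch below uses exact rational arithmetic (Python's `fractions.Fraction`) so that half-integers are represented exactly. It reduces qms mod 12 into the range (−6, 6], matching the set of qm classes {−5½, …, 6} introduced in Section 3. The function names are hypothetical.

```python
from fractions import Fraction

def l(A):
    """l(A) = (1/2)[24A/7], where [.] rounds to the nearest integer; A is a dci (mod 7)."""
    return Fraction(round(Fraction(24 * (A % 7), 7)), 2)

def reduce_q12(x):
    """Representative of x mod 12 in the range (-6, 6]."""
    r = Fraction(x) % 12
    return r - 12 if r > 6 else r

def quality_modifier(a, A):
    """Quality modifier alpha = a - l(A) of the interval with pci a and dci A, as in (1)."""
    return reduce_q12(a - l(A))

print([float(l(A)) for A in range(7)])
print(quality_modifier(3, 2), quality_modifier(11, 5), quality_modifier(11, 0))  # m3, AA6, d1
```

The last line reproduces the minor third, the doubly augmented sixth of (2), and the diminished unison.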
106 J. Douthett and J. Hook
¹ In practice d1 can be interpreted as d8. The notation d1 is used here because the dcis are reduced mod 7.
When the differences in Column 5 of Table 1 are rounded to the nearest half-
integer, the results range from –1½ (for “much smaller than average” intervals) to
+1½ (for “much larger than average” intervals). These differences and the half-
integers to which they round (qms) are reorganized and consolidated in the first two
columns of Table 2. The last column of Table 2 reorganizes and consolidates the
common names in Column 1 of Table 1 so that they correspond to their qms in Col-
umn 2 of Table 2. The intervals whose qms are 0 are perfect (middle row in Table 2);
the intervals whose qms are +½ or –½ are major or minor, respectively, and those
with larger qms (positive or negative) are augmented or diminished; a raised or low-
ered perfect interval has a qm of +1 or –1, respectively, while a raised major or low-
ered minor interval has a qm of +1½ or –1½.
The qms of multiply augmented and diminished intervals can also be determined. For example, the dci and pci of a doubly augmented sixth (AA6) are A = 5 and a = 11, respectively. Then
$$\alpha = 11 - \frac{1}{2}\left[\frac{24 \cdot 5}{7}\right] = 11 - \frac{17}{2} = +2\tfrac{1}{2}. \qquad (2)$$
3 Group Structures
In previous work on musical intervals, Brinkman [5] and Agmon [6] define an interval as an ordered pair in which the first and second coordinates are a pci and dci, respectively. We say these intervals are in PD-notation and denote the group of all such intervals as $I_{PD} = I_{12} \times I_7$, where $I_{12}$ and $I_7$ are the groups of pcis and dcis. This group is a cyclic group of order 84 generated by $\llbracket 7, 4\rrbracket$ (the musical fifth).² The binary
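operator + sums intervals coordinate-wise: the first coordinates are summed mod 12, and the second are summed mod 7. Then sums of common intervals such as
$$\text{M2} + \text{M3} = \text{A4} \quad\text{and}\quad \text{M2} + \text{M2} = \text{M3} \qquad (3)$$
are represented in PD-notation as
$$\llbracket 2, 1\rrbracket + \llbracket 4, 2\rrbracket = \llbracket 6, 3\rrbracket \quad\text{and}\quad \llbracket 2, 1\rrbracket + \llbracket 2, 1\rrbracket = \llbracket 4, 2\rrbracket. \qquad (4)$$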
While it is easy for musicians to determine the quality of the intervals in (4), the
notation by itself gives no hint of interval quality. If, however, the pcis in the first co-
ordinates of the intervals are replaced by their corresponding qms as determined by
(1), the qualities of the intervals are immediately obvious in the notation:
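$$\llbracket \tfrac12, 1\rrbracket \oplus \llbracket \tfrac12, 2\rrbracket = \llbracket 1, 3\rrbracket \quad\text{and}\quad \llbracket \tfrac12, 1\rrbracket \oplus \llbracket \tfrac12, 1\rrbracket = \llbracket \tfrac12, 2\rrbracket. \qquad (5)$$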
While coordinate-wise addition works for the first sum in (5) ( ½ + ½ = 1 ), it does
not work for the second ( ½ + ½ ≠ ½ ). So, it is necessary to construct a binary opera-
tor that sometimes adds coordinate-wise and other times does not. Thus ⊕ is adopted
as the binary operation instead of the usual + .
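In the process of constructing this binary operation, we first let
$$\mathcal{Q}_{12} = \{-5\tfrac12, -5, -4\tfrac12, \ldots, 5\tfrac12, 6\} \qquad (6)$$
be the group of qm classes under addition mod 12, and let $I_{MD}$ be the image of the map $\tau : I_{PD} \to I_{MD}$ defined by
$$\tau(\llbracket a, A\rrbracket) = \llbracket \alpha, A\rrbracket, \quad \text{where } \alpha = a - l(A) \in \mathcal{Q}_{12}. \qquad (7)$$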
We say that the intervals in $I_{MD}$ are in MD-notation (MD for modifier-diatonic). From (1), τ is a bijection and maps each interval in PD-notation to the same interval in MD-notation. To discover how intervals are summed in MD-notation so that they mirror the sum of common intervals, we require that τ map the sum of intervals in PD-notation to the sum of the same intervals in MD-notation; that is,
$$\tau(\llbracket a, A\rrbracket + \llbracket b, B\rrbracket) = \tau(\llbracket a, A\rrbracket) \oplus \tau(\llbracket b, B\rrbracket). \qquad (8)$$
Finding the binary operator ⊕ reverses the usual pattern of problems in group theory texts, which give the binary operations of two groups and ask the reader to find
a map (homomorphism) between the groups that preserves the operations. Our task is:
given the binary operator of one group and a map that preserves binary operations,
find the binary operator of the other group.
Proposition 1. Let $A, B \in I_7$, and define f as follows:
$$f(A, B) = l(A) + l(B) - l(A + B). \qquad (9)$$
Let $\llbracket\alpha, A\rrbracket, \llbracket\beta, B\rrbracket \in I_{MD}$. If (8) holds, then
$$\llbracket\alpha, A\rrbracket \oplus \llbracket\beta, B\rrbracket = \llbracket \alpha + \beta + f(A, B),\; A + B \rrbracket \in I_{MD}. \qquad (10)$$
² We use double brackets to denote intervals to avoid confusion with parenthetical ordered pairs, which will later be used to represent notes.
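Proposition 1 can be checked exhaustively by brute force. The sketch below implements the map τ of (7), coordinate-wise addition in $I_{PD}$, and the operation ⊕ of (10). It then verifies the homomorphism property (8) for all 84 × 84 pairs of intervals and reproduces the two sums in (5). Names such as `md_oplus` are hypothetical; qms are kept as exact fractions reduced into (−6, 6].

```python
from fractions import Fraction
from itertools import product

def l(A):
    """l(A) = (1/2)[24A/7] as in (1), with A reduced mod 7."""
    return Fraction(round(Fraction(24 * (A % 7), 7)), 2)

def reduce_q12(x):
    """Representative of x mod 12 in (-6, 6]."""
    r = Fraction(x) % 12
    return r - 12 if r > 6 else r

def tau(interval):
    """Map (7): PD-notation [[a, A]] -> MD-notation [[a - l(A), A]]."""
    a, A = interval
    return reduce_q12(a - l(A)), A % 7

def pd_add(x, y):
    """Coordinate-wise sum in I_PD (first coordinate mod 12, second mod 7)."""
    return (x[0] + y[0]) % 12, (x[1] + y[1]) % 7

def md_oplus(x, y):
    """Operation (10): [[alpha, A]] (+) [[beta, B]] = [[alpha + beta + f(A, B), A + B]]."""
    (alpha, A), (beta, B) = x, y
    f = l(A) + l(B) - l(A + B)
    return reduce_q12(alpha + beta + f), (A + B) % 7

I_PD = list(product(range(12), range(7)))
homomorphic = all(tau(pd_add(x, y)) == md_oplus(tau(x), tau(y))
                  for x, y in product(I_PD, repeat=2))
M2, M3 = (Fraction(1, 2), 1), (Fraction(1, 2), 2)
print(homomorphic, md_oplus(M2, M3), md_oplus(M2, M2))   # M2 + M3 = A4, M2 + M2 = M3
```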
4 Group Actions
In what follows, pitch classes (pcs) will be represented by italicized lowercase letters m and n. The set of all pcs will be denoted $U_{12}$. Similarly, diatonic classes (dcs) will be represented by italicized uppercase letters M and N, and the set of all dcs will be denoted $U_7$. Now let $U_{PD} = U_{12} \times U_7$. We say the notes (ordered pairs) in $U_{PD}$ are in PD-notation. The action of $I_{PD}$ on $U_{PD}$ is defined by
$$\llbracket a, A\rrbracket + (m, M) = (a + m,\, A + M) \in U_{PD}, \qquad (11)$$
where the first and second coordinates are reduced mod 12 and mod 7 [5, 6].
Consider the following statements:
Statement 1. A P5 above Eb is Bb.
Statement 2. A M3 above Eb is G. (12)
In PD-notation, these statements correspond to
$$\llbracket 7, 4\rrbracket + (3, 2) = (10, 6) \quad\text{and}\quad \llbracket 4, 2\rrbracket + (3, 2) = (7, 4), \qquad (13)$$
which are consistent with (11). Note that neither the interval qualities nor the acciden-
tals attached to the notes in the statements in (12) are conveyed in the notation in (13).
While MD-notation solves this problem for intervals, another notation is needed for
accidentals attached to notes. Following Hook’s [1] work on enharmonic systems, we
$$= \left(\alpha + l(A) + \nu + J^{5}_{12,7}(M) - J^{5}_{12,7}(A + M),\; A + M\right) = \left(\alpha + \nu + g(A, M),\; A + M\right) \in U_{MD}.$$
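Only part of the MD-notation action survives in this text. The sketch below reconstructs it from the visible formula under stated assumptions. The natural pc of dc M is taken to be the Clough–Douthett function $J^{5}_{12,7}(M) = \lfloor (12M + 5)/7 \rfloor$, which gives C, D, E, F, G, A, B = 0, 2, 4, 5, 7, 9, 11. A note (ν, M) is an accidental modifier paired with a dc. An interval ⟦α, A⟧ maps (ν, M) to (α + ν + g(A, M), A + M) with g(A, M) = l(A) + J(M) − J(A + M). The sketch recovers Statements 1 and 2 in (12) and checks consistency with the PD action (11). All names are hypothetical.

```python
from fractions import Fraction

def l(A):
    """l(A) = (1/2)[24A/7] as in (1)."""
    return Fraction(round(Fraction(24 * (A % 7), 7)), 2)

def J(M):
    """Assumed J^5_{12,7}(M) = floor((12M + 5)/7): the natural (white-key) pc of dc M."""
    return (12 * (M % 7) + 5) // 7

def reduce_acc(x):
    """Accidental modifier reduced mod 12 into (-6, 6] (assumed range)."""
    r = Fraction(x) % 12
    return r - 12 if r > 6 else r

def act(interval, note):
    """[[alpha, A]] acting on (nu, M): (alpha + nu + g(A, M), A + M), g = l(A) + J(M) - J(A + M)."""
    (alpha, A), (nu, M) = interval, note
    g = l(A) + J(M) - J(A + M)
    return reduce_acc(alpha + nu + g), (A + M) % 7

Eb, P5, M3 = (-1, 2), (0, 4), (Fraction(1, 2), 2)
print(act(P5, Eb), act(M3, Eb))   # Statement 1: Bb = (-1, 6); Statement 2: G = (0, 4)
```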
Since in both PD- and MD-notations the actions of the groups of intervals on the sets
of notes are simply transitive, there are GISes [9] associated with both notations. The
construction of a GIS in PD-notation is straightforward: for $(m, M), (n, N) \in U_{PD}$, the GIS $(U_{PD}, I_{PD}, \operatorname{int}_{PD})$ in PD-notation is defined by
$$\operatorname{int}_{PD}\big((m, M), (n, N)\big) = \llbracket n - m,\, N - M\rrbracket \in I_{PD}. \qquad (23)$$
As with interval composition and group action, the definition for $\operatorname{int}_{MD}$ in the GIS $(U_{MD}, I_{MD}, \operatorname{int}_{MD})$ is a bit more complicated than $\operatorname{int}_{PD}$.
Proposition 3. Let $M, N \in U_7$, and define h as follows:
$$h(M, N) = J^{5}_{12,7}(N) - J^{5}_{12,7}(M) - l(N - M). \qquad (24)$$
From (24), (25), and the left side of (26), the qm of the resultant interval should be $0 - (-1) + h(2, 4) = 0 + 1 + 7 - 4 - 3\tfrac12 = \tfrac12$, which is consistent with the right side of (26). Although tedious, it is straightforward to verify that
for all $(\mu, M), (\nu, N), (\rho, P) \in U_{MD}$.
Like f and g in (9) and (21), the function h in (24) takes on only a few values mod 12: –1, –½, 0, ½, and 1. But unlike f and g, which measure deviations in the sums of modifiers, h measures the deviation in the difference of modifiers ν − µ.
6 Coda
There is a surprise ending for MD-notation; within this notation there is a disguised
quartertone system. Recall that there are 84 intervals in $I_{MD}$, and they all come from the set $\mathcal{Q}_{12} \times I_7$. But the cardinality of $\mathcal{Q}_{12} \times I_7$ is 168. Thus $I_{MD}$ contains only half the members of the set $\mathcal{Q}_{12} \times I_7$. So, what happens if we adopt ⊕ as the binary operator for $\mathcal{Q}_{12} \times I_7$? In fact, it can be shown that $I_{MD}$ is a subgroup of $I^{*}_{MD} = (\mathcal{Q}_{12} \times I_7, \oplus)$ and that in this parent group, every interval comes in perfect, minor, and major flavors. Moreover, $I^{*}_{MD}$ is a cyclic group of order 168 generated by $\llbracket 0, 2\rrbracket$, which can be interpreted as a P3 and lies halfway between a m3 and a M3. Thus, the length of the P3 is 3½ semitones. That this length is half that of the P5 (7 semitones) is reflected in the following composition in $I^{*}_{MD}$:
$$\llbracket 0, 4\rrbracket = \llbracket 0, 2\rrbracket \oplus \llbracket 0, 2\rrbracket. \qquad (28)$$
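The claim that $I^{*}_{MD}$ is cyclic of order 168 and generated by ⟦0, 2⟧ can be checked by iterating ⊕ from the identity until the sequence returns. The sketch reuses the operation (10) on all of $\mathcal{Q}_{12} \times I_7$, with qms as exact fractions in (−6, 6]; the helper names are hypothetical.

```python
from fractions import Fraction

def l(A):
    """l(A) = (1/2)[24A/7] as in (1), with A reduced mod 7."""
    return Fraction(round(Fraction(24 * (A % 7), 7)), 2)

def reduce_q12(x):
    """Representative of x mod 12 in (-6, 6] (half-integers allowed)."""
    r = Fraction(x) % 12
    return r - 12 if r > 6 else r

def oplus(x, y):
    """Operation (10) extended to Q12 x I7."""
    (alpha, A), (beta, B) = x, y
    return reduce_q12(alpha + beta + l(A) + l(B) - l(A + B)), (A + B) % 7

generator = (Fraction(0), 2)    # [[0, 2]], the "perfect third" of 3½ semitones
identity = (Fraction(0), 0)
elements, current = [identity], identity
while True:
    current = oplus(current, generator)
    if current == identity:
        break
    elements.append(current)

print(len(elements), len(set(elements)))
print(oplus(generator, generator))   # composition (28)
```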
To define an action for $I^{*}_{MD}$, it is necessary to double the size of $\mathcal{A}_{12}$:
$$\mathcal{A}^{*}_{12} = \{-5\tfrac12, -5, -4\tfrac12, \ldots, 5\tfrac12, 6\}. \qquad (29)$$
$$\alpha_B = \frac{1}{2}\left[2\left(a_B - \frac{20A_B}{9}\right)\right] = a_B - l_B(A_B), \quad \text{where } l_B(A_B) = \frac{1}{2}\left[\frac{40A_B}{9}\right]. \qquad (30)$$
Then, similar to the dcis in the 12-tet system, each dci in the 20-tet system comes either in major and minor intervals (qms ½ and –½) or in a perfect interval (qm 0), but not both: dcis 2, 3, 6, and 7 come in major and minor flavors, and dcis 0, 1, 4, 5, and 8 come in perfect flavors. In view of the important role the major scale plays in Western
music, it would seem reasonable to ask which rotations of Balzano’s scales might rep-
resent the “major” scales. By observing that the intervals from the root of a major
scale in the 12-tet system are either major or perfect, one might speculate that the
same is true for Balzano’s scales. Then in terms of pcs mod 20,
$$\{0, 2, 5, 7, 9, 11, 14, 16, 18\} \qquad (31)$$
would be Balzano’s “major scale” that begins on pc 0. The set in (31) can also be in-
terpreted as the set of natural notes; that is, in MD-notation (31) can be written as the
note set
$$\{(0,0), (0,1), (0,2), (0,3), (0,4), (0,5), (0,6), (0,7), (0,8)\}. \qquad (32)$$
If pc 0 represents the note C in the 20-tet system, (31) can also determine the 20-tet
keyboard configuration; (31) is the set of white keys and its complement is the set of
black keys.
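The construction of (31) can be reproduced by generalizing (1) and (30) to an n-tet system with d-note scales, $l(A) = \frac12\left[\frac{2nA}{d}\right]$. This generalization is an assumption made by analogy; the two published cases are n = 12, d = 7 and n = 20, d = 9. For each dci, the sketch takes the major interval (qm ½) when l(A) is a half-integer and the perfect interval (qm 0) otherwise. The function names are hypothetical.

```python
from fractions import Fraction

def l_general(A, n, d):
    """(1/2)[2nA/d]: nearest half-integer to the average size nA/d (assumed generalization)."""
    return Fraction(round(Fraction(2 * n * A, d)), 2)

def major_scale(n, d):
    """Pcs mod n reached from the root by major (qm +1/2) or perfect (qm 0) intervals."""
    scale = []
    for A in range(d):
        avg = l_general(A, n, d)
        qm = Fraction(1, 2) if avg.denominator == 2 else Fraction(0)
        scale.append(int(avg + qm) % n)
    return scale

print(major_scale(12, 7))   # 12-tet: the usual major scale
print(major_scale(20, 9))   # Balzano's 20-tet "major scale", cf. (31)
```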
This approach can also apply to the study of scale systems investigated by Bohlen
[11], Mathews et al. [12], Agmon [6], Clough and Douthett [2], Brinkman [5],
Zweifel [13], Krantz and Douthett [14], Hook [1], and others.
References
1. Hook, J.: Enharmonic Systems: A Theory of Key Signatures, Enharmonic Equivalence,
and Diatonicism. J. Math. Mus. 1, 99–120 (2007)
2. Clough, J., Douthett, J.: Maximally Even Sets. J. Mus. Theory 35, 93–173 (1991)
3. Clough, J., Myerson, G.: Variety and Multiplicity in Diatonic Systems. J. Mus. Theory 29,
249–270 (1985)
4. Clough, J., Myerson, G.: Musical Scales and the Generalized Cycle of Fifths. Am. Math.
Monthly 93, 695–701 (1986)
1 Introduction
There are two generic types of responses that can be collected in an experi-
mental setting where subjects are asked to make judgments on musical stimuli.
The first is a retrospective response, where the listener only makes a judgment
after hearing the musical excerpt; the second is a real-time response where judg-
ments are made while listening. The latter has become increasingly popular
among experimental psychologists as an effective means of collecting data. In
particular, studies on musical tension have often employed real-time collection
methods (Nielsen 1983; Madsen & Fredrickson 1993; Krumhansl 1996; Bigand
et al. 1996; Bigand & Parncutt 1999; Toiviainen & Krumhansl 2003; Lerdahl
& Krumhansl 2007). The validity of this type of data collection is indicated
by the high inter- and intra-subject correlation between subject responses and,
more importantly, the indication that these responses correspond to identifiable
musical structures (Toiviainen & Krumhansl 2003).
In this paper we propose a method to detect and quantify the relevance of individual features in complex musical stimuli where both the musical features describing the stimuli and the subject responses are real-valued. While the method can be used with most types of auditory or visual stimuli and most types of responses,¹ the method discussed here was developed for the purpose of understanding how musical structures affect listener responses to tension. Our analysis is based on the assumption that perceived tension is a function of various salient musical parameters varying over time, such as harmony, pitch height, onset frequency, and loudness (Farbood 2006). The objective of this paper is to formulate a mathematically sound approach to determining the relative importance of each individual feature to the perception of tension.

¹ For example, the response signal can be brain activity, as measured by imaging technology (Schoner 2000), a general biological response such as skin conductivity (Picard et al. 2001), or direct subject input by means of a computer interface.
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 115–129, 2009.
© Springer-Verlag Berlin Heidelberg 2009
116 M.M. Farbood and B. Schoner
In the following sections, we will first provide a methodology based on poly-
nomial functions and the least-mean-square error measure and then extend the
methodology to arbitrary nonlinear function approximation techniques. We verify our approach with simple artificial data before applying it to complex data from a study exploring the perception of musical tension.
2 Prior Work
In this paper we rely on prior art from two distinct fields: (A) the statistical
evaluation of experimental and continuous data, mostly using variants of lin-
ear correlation and regression (Gershenfeld 1999b) and (B) feature selection for
high-dimensional pattern recognition and function fitting in machine learning
(Mitchell 1997).
(A) is helpful for our task at hand, but its limitation stems from the assump-
tion of linearity. The importance of a feature is determined by the value of the
correlation coefficient between a feature vector and a response signal: the closer
the correlation value to 1 or to -1, the more important the feature. A variant
of this approach—based on the same mathematical correlation—uses the coeffi-
cients in a linear regression model to indicate the relevance of a feature.
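The linearity limitation is easy to demonstrate. In the sketch below, a response that is completely determined by a feature, y = x², has a Pearson correlation of essentially zero with that feature over a symmetric range. A correlation-based relevance measure would therefore judge the feature irrelevant. The data are synthetic and the helper name is hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [x / 10 for x in range(-30, 31)]    # symmetric feature values
linear = [2 * x + 1 for x in xs]          # response depending linearly on the feature
quadratic = [x * x for x in xs]           # fully determined by the feature, but not linearly
print(pearson(xs, linear), pearson(xs, quadratic))
```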
(B) offers a large amount of literature mostly motivated by high-dimensional,
nonlinear machine-learning problems facing large data sets. Computational lim-
itations make it necessary to reduce the dimensionality of the available feature
set before applying a classifier algorithm or a function approximation algorithm.
Common techniques include Principal Component Analysis (PCA), which projects the feature space onto the (linear) components of largest variance, and Independent Component Analysis (ICA), which extends PCA by seeking statistically independent rather than merely uncorrelated components (Gershenfeld 1999b). Both PCA and ICA are designed to
transform the feature set for the purpose of estimating the dependent signal,
but they do not relate an individual feature to the dependent signal. In fact,
most prior work in machine learning is focused on estimating the dependent
signal, not the significance of individual features.
Prior art can also be found in the field of information theory. Koller & Sahami
(1996) developed a methodology for feature selection in multivariate, supervised
classification and pattern recognition. They select a subset of features using a
subtractive approach, starting with the full feature set and successively removing
features that can be fully replaced by a subset of the other features. Koller &
Sahami use the information-theoretic cross-entropy, also known as KL-distance
(Kullback & Leibler 1951) in their work.
Determining Feature Relevance in Subject Responses to Musical Stimuli 117
M denotes the number of basis functions and $e_{i,m}$ depends on the order of polynomial approximation. For example, a two-dimensional quadratic model includes a total of M = 5 basis functions: $x_1$, $x_2$, $x_1^2$, $x_1 x_2$, and $x_2^2$.
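The basis functions of such a polynomial model can be enumerated through their exponent vectors. The sketch assumes that each basis function is a monomial $\prod_i x_i^{e_{i,m}}$ of total degree 1 up to the chosen order, with the constant term excluded as in the count M = 5 above. It also confirms that d dimensions and order p give $\binom{d+p}{p} - 1$ basis functions. The function name is hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

def monomial_exponents(d, order):
    """Exponent vectors (e_1m, ..., e_dm) of all monomials of total degree 1..order in d variables."""
    exps = []
    for deg in range(1, order + 1):
        for idx in combinations_with_replacement(range(d), deg):
            exps.append(tuple(idx.count(i) for i in range(d)))
    return exps

print(monomial_exponents(2, 2))                        # x1, x2, x1^2, x1*x2, x2^2
print(len(monomial_exponents(2, 2)), comb(2 + 2, 2) - 1)
```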
The parameters in this model are typically estimated in a least-mean-square fit over the experimental data set, which is computationally inexpensive for small- to medium-dimensional feature sets (Gershenfeld 1999b). Using the model we compute $\hat{y} = f(x)$ for all data points $(x_n, y_n)$ and subsequently derive
$$E = \frac{1}{N}\sum_{n=1}^{N} (\hat{y}_n - y_n)^2.$$
It is well known that we can make the error E arbitrarily small by adding more and more resources to the model, that is, by increasing the number of parameters and basis functions. However, in doing so we are likely to model noise rather than the underlying causal data structure. To avoid this problem, we cross-validate our model and introduce a global regularizer that constrains our model to the "right size."
We divide the available data into two data sets. The training data set $(x, y)_{tr}$ is used to optimize the parameters of the model, whereas the test data set $(x, y)_{test}$ is used to validate the model using $E_{test}$. As we slowly increase the number of model parameters, we find that the test-data estimation error $E_{test}$ decreases initially, but starts to increase as soon as the extra model parameters follow the randomness in the training data. We take the model with the smallest estimation error
$$E_m = \frac{1}{N_{test}}\sum_{n=1}^{N_{test}} (\hat{y}_{n,m} - y_n)^2 \qquad (3)$$
as the best model architecture for the data set at hand.

Fig. 1. (a) 1-D plot of features $x_1$, $x_2$, $x_3$, and function $y_B$; (b) 3-D plot of function $y_B$ (4)
Given these considerations, we can now provide a step-by-step algorithm to
determine the Relevance Ratio Ri :
1. Divide the available experimental data into a training set $(x, y)_{tr}$ and a test set $(x, y)_{test}$. The test set typically represents 10%–30% of the data. If the amount of data is very limited, more sophisticated bootstrapping techniques can be applied (Efron 1983).
2. Build a series of models m based on the complete feature set F, slowly increasing the complexity of the model, i.e., increasing the polynomial order.
3. For each model m compute the error $E_m = \sum_{N_{test}} (\hat{y}_m - y)^2 / N_{test}$. Choose the model architecture m that results in the smallest $E_m$. Next, build models $m_i$ for all reduced feature sets $F_i$, where $F_i$ contains all features in F except $x_i$.
4. Compute $E_i = \sum_{N_{test}} (\hat{y}_i - y)^2 / N_{test}$ for all feature sets $F_i$ and derive the Relevance Ratio $R_i = E_m / E_i$ for all features $x_i$.
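The four steps can be sketched in pure Python, without external libraries. This is a minimal sketch under stated assumptions. The models are ordinary least-squares polynomials over monomials up to a chosen order, plus a constant term that the count M above omits. They are solved through the normal equations with a tiny ridge term for numerical safety. Bootstrapping is not implemented. The synthetic function y = x₁² + 0.5 x₂ + noise, in which x₃ is irrelevant, is hypothetical and is not one of the paper's test functions $y_A$, $y_B$.

```python
import random
from itertools import combinations_with_replacement

def basis(x, order):
    """Constant term plus all monomials of x of total degree 1..order."""
    row = [1.0]
    for deg in range(1, order + 1):
        for idx in combinations_with_replacement(range(len(x)), deg):
            term = 1.0
            for i in idx:
                term *= x[i]
            row.append(term)
    return row

def solve(A, b):
    """Solve the square system A c = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        coef[i] = (M[i][n] - sum(M[i][j] * coef[j] for j in range(i + 1, n))) / M[i][i]
    return coef

def fit(X, y, order, ridge=1e-9):
    """Least-squares polynomial fit via the normal equations (tiny ridge for stability)."""
    Phi = [basis(x, order) for x in X]
    k = len(Phi[0])
    A = [[sum(p[i] * p[j] for p in Phi) + (ridge if i == j else 0.0) for j in range(k)]
         for i in range(k)]
    b = [sum(p[i] * t for p, t in zip(Phi, y)) for i in range(k)]
    return solve(A, b)

def mse(coef, X, y, order):
    """Mean squared error: sum_n (y_hat_n - y_n)^2 / N."""
    preds = [sum(c * v for c, v in zip(coef, basis(x, order))) for x in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def drop_feature(X, i):
    """Feature vectors with feature i removed (the reduced feature set F_i)."""
    return [[v for j, v in enumerate(x) if j != i] for x in X]

def relevance_ratios(X_tr, y_tr, X_te, y_te, max_order=3):
    # Steps 2-3: grow the polynomial order and keep the one with the smallest test error E_m.
    errors = {p: mse(fit(X_tr, y_tr, p), X_te, y_te, p) for p in range(1, max_order + 1)}
    order = min(errors, key=errors.get)
    E_m = errors[order]
    # Steps 3-4: refit without each feature x_i and form R_i = E_m / E_i.
    ratios = []
    for i in range(len(X_tr[0])):
        coef = fit(drop_feature(X_tr, i), y_tr, order)
        ratios.append(E_m / mse(coef, drop_feature(X_te, i), y_te, order))
    return order, ratios

# Step 1 on hypothetical synthetic data: y depends on x1 (quadratically) and x2, not on x3.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(400)]
y = [x[0] ** 2 + 0.5 * x[1] + rng.gauss(0, 0.05) for x in X]
order, R = relevance_ratios(X[:300], y[:300], X[300:], y[300:])   # 25% held out for testing
print(order, [round(r, 3) for r in R])
```

As in panels (b) and (c) of the table below, a ratio close to 1 marks a feature whose removal does not degrade the model, while small ratios mark relevant features.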
(a)
                        Polynomial order
Function  Error         1       2       3       4       5       6
A         Training set  0.8960  0.0398  0.0398  0.0397  0.0395  0.0392
          Test set      0.8938  0.0396  0.0396  0.0398  0.0399  0.0413
B         Training set  0.9989  0.1123  0.1121  0.0740  0.0728  0.0546
          Test set      1.0311  0.1204  0.1210  0.0848  0.0898  0.0924

(b) Function A
Feature set                    F1      F2      F3
Error, training set            0.8960  0.1452  0.0398
Error, test set                0.8938  0.1480  0.0396
Relevance ratio (x1, x2, x3)   0.0443  0.2674  0.9995

(c) Function B
Feature set                    F1      F2      F3
Error, training set            0.2328  0.8406  0.0745
Error, test set                0.2478  0.8590  0.0842
Relevance ratio (x1, x2, x3)   0.3423  0.0988  1.0072
Polynomial models and generalized linear models have many desirable properties,
including the fact that their parameter sets are easy to interpret. The drawback of
these models is that the number of basis terms increases exponentially with the
dimensionality of x, making them computationally prohibitive for high-dimensional
data sets.
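How fast the number of basis terms grows depends on the basis. A short sketch (the function names are hypothetical) counts the terms for a total-degree polynomial basis, $\binom{D+p}{p}$, and for a full tensor-product basis, $(p+1)^D$, where $D$ is the dimension of x and $p$ the polynomial order; the tensor-product basis is the case that is exponential in $D$.

```python
from math import comb


def total_degree_terms(dim, order):
    """Monomials of total degree <= order in `dim` variables, constant included."""
    return comb(dim + order, order)


def tensor_product_terms(dim, order):
    """Full tensor-product basis with per-variable degree <= order."""
    return (order + 1) ** dim


for d in (1, 3, 10, 50):
    print(d, total_degree_terms(d, 3), tensor_product_terms(d, 3))
```

For $p = 3$ the total-degree count is 20, 286 and 23426 for $D = 3$, 10 and 50, while the tensor-product count is $4^D$.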
The second category of nonlinear models uses variable coefficients inside the
nonlinear basis functions:

$$y(x) = \sum_{k=1}^{K} f(x, a_k). \qquad (5)$$
The most prominent examples of this class of models are artificial neural
networks, graphical networks, and Gaussian mixture models (GMMs). These models
are exponentially more powerful, but training them requires an iterative nonlinear
search. Here we demonstrate the methodology with GMMs, which, as a subclass
of Bayesian networks, have the added benefit of being grounded in probabilistic
principles.
A GMM describes the joint probability density p(x, y) of a set of data
(x, y). p(x, y) is expanded as a weighted sum of Gaussian basis terms and hence
takes the form

$$\begin{aligned} p(y, x) &= \sum_{k=1}^{K} p(y, x, c_k) \qquad (6) \\ &= \sum_{k=1}^{K} p(y \mid x, c_k)\, p(x \mid c_k)\, p(c_k). \qquad (7) \end{aligned}$$
Determining Feature Relevance in Subject Responses to Musical Stimuli 121
Table 2. Application of the GMM estimator to functions $y_A$ and $y_B$ (4): (a) gives
the error for the different models m based on x; (b) and (c) give the resulting
Relevance Ratios for features $x_1$, $x_2$, and $x_3$

(a)
                           Number of clusters
Function  Error            2      4      6      8      10     12     14     16     18     20
A         Training set     0.414  0.056  0.044  0.043  0.041  0.040  0.041  0.041  0.041  0.040
          Test set         0.413  0.056  0.046  0.044  0.041  0.041  0.042  0.042  0.043  0.041
B         Training set     0.343  0.151  0.095  0.075  0.052  0.044  0.040  0.035  0.033  0.028
          Test set         0.362  0.161  0.106  0.081  0.056  0.050  0.046  0.041  0.038  0.033

                           22     24     26     28     30     32     34     36     38     40
A         Training set     0.040  0.040  0.039  0.039  0.039  0.039  0.039  0.039  0.039  0.039
          Test set         0.043  0.042  0.041  0.041  0.041  0.041  0.041  0.043  0.042  0.041
B         Training set     0.027  0.029  0.024  0.025  0.024  0.022  0.021  0.024  0.024  0.021
          Test set         0.029  0.035  0.027  0.030  0.028  0.024  0.026  0.029  0.029  0.026

(b) Function A
                        Feature set
                        F1       F2       F3
Training set error      0.8950   0.1464   0.0403
Test set error          0.8958   0.1515   0.0408
                        x1       x2       x3
Relevance Ratio         0.0453   0.2676   0.9928

(c) Function B
                        Feature set
                        F1       F2       F3
Training set error      0.2502   0.7630   0.0209
Test set error          0.2575   0.8646   0.0233
                        x1       x2       x3
Relevance Ratio         0.0912   0.0272   1.0096
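We choose

$$p(x \mid c_k) = \frac{|P_k^{-1}|^{1/2}}{(2\pi)^{D/2}}\, e^{-(x - m_k)^T \cdot P_k^{-1} \cdot (x - m_k)/2}, \qquad (8)$$

where $m_k$ is the mean of cluster k, D is the dimension of x, and $P_k$ is the
weighted covariance matrix in the feature space. The output distribution is chosen to be

$$p(y \mid x, c_k) = \frac{|P_{k,y}^{-1}|^{1/2}}{(2\pi)^{D_y/2}}\, e^{-(y - f(x, a_k))^T \cdot P_{k,y}^{-1} \cdot (y - f(x, a_k))/2}, \qquad (9)$$

where $D_y$ is the dimension of y, $P_{k,y}$ is the output covariance matrix, and the
mean value of the output Gaussian is replaced by the function $f(x, a_k)$ with unknown
parameters $a_k$.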
From this we derive the conditional expectation of y given x,

$$\langle y \mid x \rangle = \int y\, p(y \mid x)\, dy = \frac{\sum_{k=1}^{K} f(x, a_k)\, p(x \mid c_k)\, p(c_k)}{\sum_{k=1}^{K} p(x \mid c_k)\, p(c_k)}, \qquad (10)$$

which serves as our estimator $\hat{y}$. The model is trained using the well-known
Expectation-Maximization algorithm.
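Eq. (10) can be evaluated directly once the cluster parameters are known. The sketch below assumes local linear models $f(x, a_k) = a_{k,0} + a_{k,1:} \cdot x$, which is a choice of this example (the text leaves f unspecified); the parameters are set by hand rather than trained with EM, and the function names are hypothetical.

```python
import numpy as np


def gaussian_density(x, mean, cov):
    """Multivariate Gaussian of Eq. (8), with P_k taken to be the covariance `cov`."""
    inv = np.linalg.inv(cov)
    diff = x - mean
    norm = np.sqrt(np.linalg.det(inv)) / (2.0 * np.pi) ** (len(mean) / 2.0)
    return norm * np.exp(-0.5 * diff @ inv @ diff)


def conditional_mean(x, priors, means, covs, local_coefs):
    """Estimator of Eq. (10): cluster-weighted average of the local models f(x, a_k).

    Assumption: each local model is linear, f(x, a_k) = a_k[0] + a_k[1:] . x.
    """
    weights = np.array([p * gaussian_density(x, m, c)
                        for p, m, c in zip(priors, means, covs)])
    local_predictions = np.array([a[0] + a[1:] @ x for a in local_coefs])
    return float(weights @ local_predictions / weights.sum())


# Hypothetical two-cluster model in one input dimension (hand-chosen, not trained).
priors = [0.5, 0.5]
means = [np.array([-1.0]), np.array([1.0])]
covs = [np.eye(1), np.eye(1)]
local_coefs = [np.array([0.0, 2.0]), np.array([1.0, -1.0])]  # y = 2x and y = 1 - x

print(conditional_mean(np.array([0.0]), priors, means, covs, local_coefs))   # 0.5
print(conditional_mean(np.array([-5.0]), priors, means, covs, local_coefs))  # close to -10
```

At x = 0 both clusters receive equal weight, so the estimate is the average of the two local predictions; far from one cluster the estimate follows the other cluster's local model.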
The number of Gaussian basis functions and the complexity of the local models
serve as our global regularizers, resulting in the following step-by-step algorithm
analogous to the polynomial case discussed before:
1. Divide the data into a training set $(x, y)_{tr}$ and a test set $(x, y)_{test}$.
2. Build a series of models $m$ based on the complete feature set $F$, gradually increasing the number of Gaussian basis functions.
3. For each model $m$ compute the error $E_m = \frac{1}{N_{test}} \sum_{n=1}^{N_{test}} (\hat{y}_{n,m} - y_n)^2$ and choose the model architecture $m$ that results in the smallest $E_m$. Build models $m_i$ for all reduced data sets $(x_i, y)$.
4. Compute $E_i = \frac{1}{N_{test}} \sum_{n=1}^{N_{test}} (\hat{y}_{n,i} - y_n)^2$ for all feature sets $F_i$ and derive the Relevance Ratio $R_i = E_m / E_i$ for each feature $i$.
Applying this new approach to our artificial data sets from before (4), we
obtain the results in Table 2.
5 Kullback-Leibler Distance
The linear least-mean-square error metric is without doubt the most commonly
used practical error metric; however, other choices can be equally valid. The
framework of the Gaussian mixture model allows for the introduction of a
probabilistic metric known as the relative entropy or Kullback-Leibler distance
(KL-Distance) (Kullback & Leibler 1951). The KL-Distance measures the divergence
between two probability distributions P(x) and Q(x):

$$D_{KL}(P \| Q) = \int_x P(x) \log\frac{P(x)}{Q(x)}\, dx \qquad (11)$$

For the full model p and the reduced model $p_i$ this becomes

$$D_{KL}(p \| p_i) \approx \frac{1}{N} \sum_{n=1}^{N} \left[\log p(y_n \mid x_n) - \log p(y_n \mid x_{i,n})\right].$$
Here we have replaced the integral over the density with an average over the
observed data, which are themselves assumed to be drawn from that density.
To compute $D_{KL}(p \| p_i)$ we first need to estimate $p(y_n \mid x_{i,n})$. However, this
step consists only of estimating the local model parameters, a relatively minor task.
All other parameters needed to evaluate this equation numerically are already
part of the model built in the first place.
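The sample estimate of $D_{KL}(p \| p_i)$ is an average of log-likelihood differences. The sketch below assumes, for illustration only, that both conditional densities are scalar Gaussians with known means and variances (in the GMM they would be mixtures as in Eqs. (8)–(10)); the function names and numbers are hypothetical.

```python
import math


def gaussian_log_density(y, mean, var):
    """log N(y; mean, var) for a scalar output (assumed Gaussian conditional)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def kl_distance_estimate(ys, full_means, full_var, reduced_means, reduced_var):
    """Sample estimate of D_KL(p || p_i): average of
    log p(y_n | x_n) - log p(y_n | x_{i,n}) over the observed data."""
    terms = [gaussian_log_density(y, m, full_var) - gaussian_log_density(y, m_i, reduced_var)
             for y, m, m_i in zip(ys, full_means, reduced_means)]
    return sum(terms) / len(terms)


# Identical full and reduced models give zero divergence.
print(kl_distance_estimate([1.0, 2.0], [1.0, 2.0], 0.1, [1.0, 2.0], 0.1))  # 0.0
# A reduced model with four times the variance at a perfectly predicted point: log 2.
print(kl_distance_estimate([0.0], [0.0], 1.0, [0.0], 4.0))  # about 0.6931
```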
Fig. 2. Features $x_i$ and three subject responses (same subject) for (a) the Brahms
excerpt (Fig. 5) and (b) the Bach-Vivaldi excerpt (Fig. 4). H = harmony, L = loudness,
M = melodic expectation, PB = pitch height of bass line, PS = pitch height of soprano
line.
6 Experimental Results
6.1 Data Set
Fig. 3. Score of Beethoven excerpt
6.2 Results
The key results for all of the complex tonal examples are summarized in Table 3.
We use both the polynomial models and GMMs and apply our method to various
subsets of the feature space. The results are largely robust against variations in

² Without the melodic attraction component; this factor is taken into account
separately with Margulis's model.
Table 3. Summary of experimental results for the musical tension study. For each
experiment we indicate the type of estimation (polynomial or GMM), the global
regularizer (polynomial order or number of Gaussians) and the Relevance Ratio of each
feature: H = harmony, L = loudness, M = melodic expectation, O = onset frequency,
PB = pitch height of bass line, PS = pitch height of soprano line. A dash (–) marks a
feature not included in that feature subset.

Brahms
Type   Regularizer     H       L       M       O       PB      PS
POLY   order 3         1.0166  1.0099  1.0306  1.0247  1.0251  0.9571
POLY   order 3         0.8869  0.6133  0.8527  –       1.0099  0.8366
POLY   order 4         0.8460  0.4795  –       –       0.6787  0.6367
POLY   order 4         0.8623  0.3228  0.5750  –       –       –
GMM    16 Gaussians    0.7230  0.2953  0.6583  –       0.7478  0.9509

Bach-Vivaldi
Type   Regularizer     H       L       M       O       PB      PS
POLY   order 2         0.6950  1.1549  0.9703  0.7653  0.8047  0.9472
POLY   order 3         0.7413  1.1131  0.9780  –       1.0115  0.9696
POLY   order 3         0.6436  –       0.8953  –       0.9265  0.8112
POLY   order 3         0.7514  1.0195  0.8625  –       –       0.8717
GMM    N/A             0.6667  1.0439  0.9362  –       –       0.8945

Beethoven
Type   Regularizer     H       L       M       O       PB      PS
POLY   order 2         1.0575  0.9699  0.9689  0.9644  0.9375  1.0822
POLY   order 2         1.0607  0.9749  1.0604  –       1.0580  1.0252
POLY   order 2         0.9502  0.4230  –       –       1.0448  1.0289
GMM    4 Gaussians     1.2435  0.4087  –       –       0.8488  1.1299
Fig. 4. Score of Bach-Vivaldi excerpt
Mathematically, this phenomenon can be explained by the fact that the features
are not statistically independent and that the relevance of one feature may be
entirely assumed by another feature (or a set of features) (Koller & Sahami
1996).
In the case of the Brahms excerpt, we observe that loudness is clearly the
predominant feature and hence has the smallest Relevance Ratio. In the case of
the Bach-Vivaldi excerpt, harmony is primarily responsible for perceived tension.
In the Beethoven excerpt, as in the Brahms, loudness has the most impact on the
response. This makes qualitative sense: there are no clear changes in dynamics
in the Bach-Vivaldi example, whereas in the Brahms and Beethoven excerpts
change in loudness is a salient feature.
The Relevance Ratio confirms that listeners relate salient changes in musical
parameters to changes in tension. While multiple factors contribute to how
tension is perceived at any given moment, one particular feature may predominate,
depending on the context. The Relevance Ratio reveals the overall prominence of
each feature in the subject responses over the course of a given excerpt. While it
could be argued that listeners respond more strongly to certain features (e.g.
loudness over onset frequency), it is the degree of change in each parameter that
corresponds most strongly to tension, regardless of whether the feature is purely
musical, as in the case of harmony and melodic contour, or expressive, as in the
case of tempo and dynamics.
Summary
We have introduced a new estimator, the Relevance Ratio, which is derived from
arbitrary nonlinear function approximation techniques and the least-mean-square
error metric. To demonstrate its functionality, the Relevance Ratio was first
applied to a set of artificial test functions, where it correctly identified the
relevant features. In a second step, the estimator was applied to a data set of
experimental subject responses, yielding valuable insights into the relevance of
certain salient features for perceived musical tension. Additionally, we introduced
the KL-Distance as an alternative estimator defined in purely probabilistic terms.
References
Bigand, E., Parncutt, R., Lerdahl, F.: Perception of musical tension in short chord sequences: The influence of harmonic function, sensory dissonance, horizontal motion, and musical training. Perception & Psychophysics 58, 125–141 (1996)
Bigand, E., Parncutt, R.: Perceiving music tension in long chord sequences. Psychological Research 62, 237–254 (1999)
Efron, B.: Estimating the Error Rate of a Prediction Rule: Improvements on Cross-Validation. Journal of the American Statistical Association 78, 316–331 (1983)
Farbood, M.: A Quantitative, Parametric Model of Musical Tension. Ph.D. Thesis, Massachusetts Institute of Technology (2006)
Gershenfeld, N.: The Nature of Mathematical Modeling. Cambridge University Press, New York (1999)
Gershenfeld, N., Schoner, B., Metois, E.: Cluster-Weighted Modeling for Time Series Analysis. Nature 379, 329–332 (1999)
Koller, D., Sahami, M.: Toward Optimal Feature Selection. In: Proceedings of the Thirteenth International Conference on Machine Learning, pp. 284–292 (1996)
Krumhansl, C.L.: A perceptual analysis of Mozart's Piano Sonata K. 282: Segmentation, tension, and musical ideas. Music Perception 13, 401–432
Kullback, S., Leibler, R.A.: On Information and Sufficiency. Annals of Mathematical Statistics 22, 76–86 (1951)
Lerdahl, F.: Tonal Pitch Space. Oxford University Press, New York (2001)
Lerdahl, F., Krumhansl, C.L.: Modeling Tonal Tension. Music Perception 24, 329–366 (2007)
Madsen, C.K., Fredrickson, W.E.: The experience of musical tension: A replication of Nielsen's research using the continuous response digital interface. Journal of Music Therapy 30, 46–63 (1993)
Margulis, E.H.: A Model of Melodic Expectation. Music Perception 22, 663–714 (2005)
Mitchell, T.: Machine Learning. McGraw-Hill, New York (1997)
Nielsen, F.V.: Oplevelse of Musikalsk Spaending. Akademisk Forlag, Copenhagen (1983)
Picard, R.W., Vyzas, E., Healey, J.: Toward Machine Emotional Intelligence: Analysis of Affective Physiological State. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(10), 1175–1191 (2001)
Schoner, B.: Probabilistic Characterization and Synthesis of Complex Driven Systems. Ph.D. Thesis, Massachusetts Institute of Technology (2000)
Toiviainen, P., Krumhansl, C.L.: Measuring and modeling real-time responses to music: The dynamics of tonality induction. Perception 32, 741–766 (2003)
Sequential Association Rules in Atonal Music
1 Introduction
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 130–138, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Table 2. Prototypes expressed in pitch class sets for the six categories
Table 3. Prototypes expressed in interval class vectors for the corresponding classes
of different cardinality

Prototypes (IcV)     IC1            IC2            IC3            IC4            IC5            IC6
duochord classes     [1 0 0 0 0 0]  [0 1 0 0 0 0]  [0 0 1 0 0 0]  [0 0 0 1 0 0]  [0 0 0 0 1 0]  [0 0 0 0 0 1]
trichord classes     [2 1 0 0 0 0]  [0 2 0 2 0 0]  [0 0 2 0 0 1]  [0 0 0 3 0 0]  [0 1 0 0 2 0]  [1 0 0 0 1 1]
tetrachord classes   [3 2 1 0 0 0]  [0 3 0 2 0 1]  [0 0 4 0 0 2]  [1 0 1 3 1 0]  [0 2 1 0 3 0]  [2 0 0 0 2 2]
pentachord classes   [4 3 2 1 0 0]  [1 3 1 2 2 1]  [1 1 4 1 1 2]  [2 0 2 4 2 0]  [0 3 2 1 4 0]  [3 1 0 1 3 2]
hexachord classes    [5 4 3 2 1 0]  [0 6 0 6 0 3]  [2 2 5 2 2 2]  [3 0 3 6 3 0]  [1 4 3 2 5 0]  [4 2 0 2 4 3]
heptachord classes   [6 5 4 3 2 1]  [2 6 2 6 2 3]  [3 3 6 3 3 3]  [4 2 4 6 4 1]  [2 5 4 3 6 1]  [5 3 2 3 5 3]
octachord classes    [7 6 5 4 4 2]  [4 7 4 6 4 3]  [4 4 8 4 4 4]  [5 4 5 7 5 2]  [4 6 5 4 7 2]  [6 4 4 4 6 4]
nonachord classes    [8 7 6 6 6 3]  [6 8 6 7 6 3]  [6 6 8 6 6 4]  [6 6 6 9 6 3]  [6 7 6 6 8 3]  [7 6 6 6 7 4]
decachord classes    [9 8 8 8 8 4]  [8 9 8 8 8 4]  [8 8 9 8 8 4]  [8 8 8 9 8 4]  [8 8 8 8 9 4]  [8 8 8 8 8 5]
the chosen similarity measure to calculate to which prototype a pitch class set is
closest. When doing this, the categorizations of pentachords according to the
aforementioned similarity measures IcVSIM and SATSIM are identical, so Quinn's [10]
claim about the six categories could be made even stronger. Further similarity
measures could be compared in this respect. We have compared the measures
IcVSIM [6], SATSIM [1], ASIM [8] and cosθ [12], and found that they all yield
the same classification for the duochords, pentachords, heptachords, octachords,
nonachords and decachords, while the classifications for the trichords, tetrachords
and hexachords differ by at most 3 pitch class sets. This shows that the similarity
measures differ little in this respect: with such a high overlap, they largely agree
on the classification into the six categories.
We base our choice of similarity measure on the ambiguity it produces. It turns
out that Rogers' cosθ produces the least ambiguity: when it is used to calculate
the category of a pitch class set, it almost always outputs a single category.
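The prototype-based categorization can be sketched as follows, assuming that Rogers' cosθ is the cosine of the angle between two interval-class vectors. Only the trichord prototypes from Table 3 are included (the other cardinalities would be added the same way), and the names `interval_class_vector`, `cosine` and `category` are hypothetical.

```python
import math
from itertools import combinations

# Trichord prototypes from Table 3, keyed by category (columns IC1..IC6).
TRICHORD_PROTOTYPES = {
    1: [2, 1, 0, 0, 0, 0],
    2: [0, 2, 0, 2, 0, 0],
    3: [0, 0, 2, 0, 0, 1],
    4: [0, 0, 0, 3, 0, 0],
    5: [0, 1, 0, 0, 2, 0],
    6: [1, 0, 0, 0, 1, 1],
}


def interval_class_vector(pcs):
    """Count interval classes 1..6 over all unordered pairs of distinct pitch classes."""
    pcs = sorted({p % 12 for p in pcs})
    icv = [0] * 6
    for a, b in combinations(pcs, 2):
        interval = (b - a) % 12
        icv[min(interval, 12 - interval) - 1] += 1
    return icv


def cosine(u, v):
    """Cosine of the angle between two vectors (assumed reading of Rogers' cos-theta)."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def category(pcs, prototypes=TRICHORD_PROTOTYPES):
    """Return the category whose prototype is most similar to the set's interval-class vector."""
    icv = interval_class_vector(pcs)
    return max(prototypes, key=lambda c: cosine(icv, prototypes[c]))


print(interval_class_vector([0, 4, 7]))  # [0, 0, 1, 1, 1, 0]
print(category([0, 4, 8]))               # 4: the augmented triad equals the IC4 prototype
```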
134 A. Honingh, T. Weyde, and D. Conklin
$$\mathrm{lift}(a \to b) = \frac{p(b \mid a)}{p(b)}, \qquad (1)$$
where p(x) denotes the probability of category x. The lift can be understood as
the number of observed progressions divided by the number of progressions
expected by chance. If the lift is greater than 1, there is a positive correlation;
if it is smaller than 1, there is a negative correlation.
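A lift matrix can be computed from transition counts as sketched below. The estimation choices are assumptions of this example: p(b | a) comes from row a of the counts and p(b) from the column totals. The authors may estimate p(b) from overall category frequencies instead, so applied to the counts of Table 7 this sketch gives values close to, but not always identical with, Table 8 (about 1.25 rather than 1.23 for 1 → 1).

```python
def lift_matrix(counts):
    """lift(a -> b) = p(b | a) / p(b), Eq. (1), estimated from transition counts.

    p(b | a): row a normalized by its total; p(b): column total / grand total.
    """
    total = sum(sum(row) for row in counts)
    col_totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    lifts = []
    for row in counts:
        row_total = sum(row)
        lifts.append([(row[j] / row_total) / (col_totals[j] / total)
                      for j in range(len(row))])
    return lifts


# Transition counts from Table 7 (rows: from category 1-6, columns: to category 1-6).
TABLE7 = [
    [109, 23, 49, 36, 28, 62],
    [27, 21, 12, 15, 18, 22],
    [49, 18, 30, 21, 24, 22],
    [44, 17, 29, 39, 15, 32],
    [33, 17, 15, 29, 27, 16],
    [47, 20, 28, 34, 22, 38],
]
lifts = lift_matrix(TABLE7)
print([round(v, 2) for v in lifts[0]])  # lifts of the progressions from category 1
```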
4 Results
As described in the previous section, the occurrences of each category can be
counted. Different types of music can be expected to show different occurrence
rates for each category. To start with a tonal piece, the distribution of categories
in the fourth movement of Beethoven's Ninth Symphony is shown in Table 4. One
can observe that category 5 dominates the whole piece. This turns out to be quite
typical for tonal music. In the previous section we mentioned that each category
can be seen as having a specific character, and category 5 represents the diatonic
scale. Therefore, it is not surprising that a piece of tonal music based on the
diatonic scale is dominated by category 5.
For atonal music, we expect something different. We have run the program on
atonal music by Schoenberg, Webern, Stravinsky and Boulez; the complete list
of pieces is shown in Table 5. On average, this corpus of atonal music yields the
distribution shown in Table 6. This distribution is markedly different from that
in Table 4, so the method might be useful in discrimination tasks: the music is no
longer dominated by category 5, and the categories are distributed much more evenly.
A transition matrix can be made with our method (Table 7), listing how many
times category i is followed by category j. We have calculated the lift matrix
as described in the previous section (Table 8), from which one can see which
progressions have a positive association and which have a negative one.
To determine which progressions are meaningful, we have to perform a significance
test: we would like to know which progressions have an occurrence rate that is
significantly higher or lower than chance level. We use a chi-square test on the
data of Table 7 to determine which progressions cannot be explained by our null
hypothesis, namely that the probability of class j following class i depends only
on the overall number of j's in the music. We calculate the chi-square statistic
for every progression separately by constructing a 2 × 2 contingency table (with
fields i → j, i → ¬j, ¬i → j, ¬i → ¬j), and obtain the probability from the
chi-square distribution with 1 degree of freedom (Table 9). Taking the significance
level to be 5%, the significant progressions are printed in boldface in Table 9.

Composer      Piece
Schoenberg    Pierrot Lunaire, parts 1, 5, 8, 10, 12, 14, 17, 21
Schoenberg    Piece for piano, op. 33
Schoenberg    Six little piano pieces, op. 19, parts 2, 3, 4, 5, 6
Webern        Symphony, op. 21, part 1
Webern        String Quartet, op. 28
Boulez        Notations, part 1
Boulez        Piano Sonata no. 3, part 2: “Texte”
Boulez        Piano Sonata no. 3, part 3: “Parenthese”
Stravinsky    In Memoriam Dylan Thomas, Dirge-canons (prelude)
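The per-progression test can be sketched as below. It builds the 2 × 2 table from the transition counts of Table 7 and uses the closed form of the chi-square(1) tail probability, $P(\chi^2 > x) = \mathrm{erfc}(\sqrt{x/2})$. Whether a continuity correction was applied and which marginals were used is not specified in the text, so this sketch (no correction, marginals from Table 7) need not reproduce Table 9 exactly; the function name is hypothetical.

```python
import math

# Transition counts from Table 7 (rows: from category 1-6, columns: to category 1-6).
TABLE7 = [
    [109, 23, 49, 36, 28, 62],
    [27, 21, 12, 15, 18, 22],
    [49, 18, 30, 21, 24, 22],
    [44, 17, 29, 39, 15, 32],
    [33, 17, 15, 29, 27, 16],
    [47, 20, 28, 34, 22, 38],
]


def progression_p_value(counts, i, j):
    """Chi-square test with 1 degree of freedom for the progression i -> j.

    2 x 2 table: a = i->j, b = i->not j, c = not i->j, d = not i->not j.
    No continuity correction is applied (an assumption).
    """
    total = sum(sum(row) for row in counts)
    a = counts[i][j]
    b = sum(counts[i]) - a
    c = sum(row[j] for row in counts) - a
    d = total - a - b - c
    chi2 = total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Tail probability of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


significant = [(i + 1, j + 1) for i in range(6) for j in range(6)
               if progression_p_value(TABLE7, i, j) < 0.05]
print(round(progression_p_value(TABLE7, 0, 0), 4))  # roughly 0.001 for 1 -> 1
print(significant)
```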
Now that we can identify the meaningful progressions for our corpus of atonal
music, we can make a table for categories analogous to Piston's table for chords.
The lift values in Table 8 show whether a significant progression represents a
positive or a negative association. These significant rules can be found in
Table 9 under the headings “is followed by” (positive association) and “less often
by” (negative association). One can see that categories tend to follow themselves,
so that large regions in the music are represented by just one category. This is
in accordance with observations by Ericksson [3], who describes seven categories
similar to the ones described above and says that “it is often possible to show
that one region [category] dominates an entire section of a piece”.
Besides these ‘repetitions’ of categories, one other progression constitutes a
sequential association rule: the progression from category 5 to 4. Four other
progressions show a negative association and thus constitute sequential
‘avoidance’ rules: the progressions from category 1 to 2, from 1 to 4, from 1 to 5,
and from 5 to 6.
Transition counts (Table 7)
From \ To     1      2      3      4      5      6
1           109     23     49     36     28     62
2            27     21     12     15     18     22
3            49     18     30     21     24     22
4            44     17     29     39     15     32
5            33     17     15     29     27     16
6            47     20     28     34     22     38

Lift (Table 8)
From \ To     1      2      3      4      5      6
1          1.23   0.70   1.04   0.71   0.72   1.13
2          0.82   1.70   0.68   0.79   1.24   1.07
3          1.04   1.03   1.21   0.78   1.16   0.75
4          0.87   0.90   1.08   1.35   0.67   1.02
5          0.85   1.17   0.73   1.30   1.57   0.66
6          0.85   0.97   0.96   1.08   0.91   1.11

Chi-square probabilities (Table 9)
From \ To     1      2      3      4      5      6
1          0.001  0.020  >0.5   0.009  0.026  0.111
2          0.150  0.003  0.097  0.288  0.179  >0.5
3          >0.5   >0.5   0.135  0.159  0.252  0.079
4          0.201  >0.5   >0.5   0.008  0.059  >0.5
5          0.163  0.438  0.104  0.047  0.003  0.030
6          0.167  >0.5   >0.5   0.345  1.222  0.254
5 Concluding Remarks
progression rules may reveal a structure of atonal music that was not known
before.
Acknowledgements
We wish to thank our colleague Mathieu Bergeron for presenting this paper at the
second international conference of the Society for Mathematics and Computation
in Music 2009.
References
1. Buchler, M.: Relative saturation of interval and set classes: A new model for understanding pcset complementation and resemblance. Journal of Music Theory 45(2), 263–343 (2001)
2. Conklin, D.: Melodic analysis with segment classes. Machine Learning 65(2-3), 349–360 (2006)
3. Ericksson, T.: The ic max point structure, mm vectors and regions. Journal of Music Theory 30(1), 95–111 (1986)
4. Forte, A.: The Structure of Atonal Music. Yale University Press, New Haven (1973)
5. Hanson, H.: Harmonic Materials of Modern Music. Appleton-Century-Crofts, New York (1960)
6. Isaacson, E.J.: Similarity of interval-class content between pitch-class sets: the IcVSIM relation. Journal of Music Theory 34, 1–28 (1990)
7. Lerdahl, F.: Atonal prolongational structure. Contemporary Music Review 4(1), 65–87 (1989)
8. Morris, R.: A similarity index for pitch-class sets. Perspectives of New Music 18, 445–460 (1980)
9. Piston, W., DeVoto, M.: Harmony, revised and expanded edn. Victor Gollancz Ltd. (1989)
10. Quinn, I.: Listening to similarity relations. Perspectives of New Music 39, 108–158 (2001)
11. Rahn, J.: Relating sets. Perspectives of New Music 18, 483–498 (1980)
12. Rogers, D.W.: A geometric approach to pcset similarity. Perspectives of New Music 37(1), 77–90 (1999)
13. Scott, D., Isaacson, E.J.: The interval angle: A similarity measure for pitch-class sets. Perspectives of New Music 36(2), 107–142 (1998)
14. Weyde, T.: Modelling cognitive and analytic musical structures in the MUSITECH framework. In: UCM 2005 5th Conference Understanding and Creating Music, Caserta, pp. 27–30 (November 2005)
Badness of Serial Fit Revisited
Tuukka Ilomäki
1 Introduction
The relations between twelve-tone rows have been an integral part of the
twelve-tone system from the very beginning. The early composers used informal
methods to relate rows. In the 1940s, however, Milton Babbitt initiated the process
of formalizing the theory of twelve-tone rows and their relations.
Babbitt was interested in ordered pairs throughout his writings. In particular,
he introduced the notion of the twelve-tone row as a protocol that defines the
order in which the pitch classes appear in it (Babbitt 1962).
A natural way to relate rows is to compare the ordered pairs they comprise.
Rothgeb (1967) formalized this idea as the first similarity measure for twelve-tone
rows: the more ordered pairs two rows share, the more similar they are. David
Lewin's (1976) notion of Badness of Serial Fit, or BSF, builds on the ordered pairs
of pitch classes as well. However, it does not measure the differences between two
rows; rather, it picks out their common features and counts the number of rows
that share them. The more similar two rows are, the more properties they have in
common, the more distinctive this combination of properties is and, therefore, the
fewer rows there are with these properties.
It is rather extraordinary that Lewin introduces a new similarity measure but
does not give a single nontrivial example of calculating it. He measures a row
against itself, resulting in the value 1 (a row defines a protocol that is satisfied
only by itself, since no other row contains precisely the same set of ordered pairs;
hence, BSF(X, X) = 1 for any twelve-tone row X), and a row against its retrograde
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 139–145, 2009.
© Springer-Verlag Berlin Heidelberg 2009
the exact number of its linear extensions. (This translates directly into the fact
that it is difficult to tell the Badness of Serial Fit value just by looking at two
twelve-tone rows.) However, Pruesse and Ruskey (1997) have developed an efficient
algorithm for generating linear extensions, whose running time depends on the
number of linear extensions to be generated: technically, the algorithm is O(N),
where N is the number of objects generated. Nevertheless, as the size of the set
increases, the maximum number of linear extensions of a partial order grows
exponentially, and so does the required calculation time.
The complexity associated with partial orders is also reflected in the number
of existing partial orders: the number of possible partial orders on a set grows
exponentially with respect to the cardinality of the set. For instance, Erné and
Stege (1991A) have calculated that the number of partial orders that can be
defined on a set of cardinality twelve is 414864951055853499. Incidentally,
since there are only 12! · 12! pairs of linear orders on such a set and
12! · 12! < 414864951055853499, this also shows that not all partial orders
can be expressed as the intersection of two linear orders.
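The counting argument can be checked with exact integer arithmetic: two linear orders on twelve elements can be chosen in at most 12! · 12! ways, so at most that many partial orders arise as such intersections. The constant below is the value quoted from Erné and Stege in the text.

```python
from math import factorial

NUM_POSETS_12 = 414864951055853499       # partial orders on a 12-element set (quoted above)
pairs_of_linear_orders = factorial(12) ** 2

print(pairs_of_linear_orders)                  # 229442532802560000
print(pairs_of_linear_orders < NUM_POSETS_12)  # True
```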
It is easy to show that BSF does not define a metric, for two reasons. First,
the value of BSF for two identical rows is not zero; second, the triangle
inequality does not hold. For example, using the integers 0, 1, . . . , 11 for pitch
classes (A and B standing for 10 and 11, respectively), we have
Here e(P ) denotes the number of linear extensions of partial order P . A key
observation here is that the incomparability graph of X ∩ Z is covered by the
incomparability graphs of X∩Y and Y ∩Z. In order to apply Sidorenko’s theorem
to the current setting, let us take k = 2 and thus obtain the following corollary:
Corollary 1. If X, Y , and Z are three linear orders on the same set, then the
inequality e(X ∩ Z) ≤ e(X ∩ Y )e(Y ∩ Z) holds.
Let us now examine the triangle inequality for Badness of Serial Fit in more
detail. We obtain the following inequality from Corollary 1:
Fig. 1. The distributions of BSF and LOGBSF. The distribution of BSF lies almost
along the axes and is therefore difficult to discern in the picture.
the values using logarithms does not, in a sense, give us any new information.
However, the logarithmic values give us a better perspective. As shown in
Figure 1, the distribution of BSF values is extremely skewed, whereas the
distribution of the logarithmic values forms a curve resembling a bell curve.
According to BSF, the most similar non-identical rows are those in which two
adjacent pitch classes have been exchanged. For example, the only difference
between rows 0123456789AB and 1023456789AB is the order of the adjacent
pitch classes 0 and 1. These two rows are the only ones that satisfy the protocol
they jointly define. Hence BSF(0123456789AB, 1023456789AB) = 2. If we select 2
as the base of the logarithm, we get LOGBSF(0123456789AB, 1023456789AB) = log₂ 2 = 1.
Conveniently, then, a minimal difference results in the value 1.
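BSF(X, Y) can be computed as the number of linear extensions of the partial order X ∩ Y formed by the ordered pairs the two rows share. The sketch below counts them with a dynamic program over subsets of the twelve pitch classes (2¹² states), which is a choice of this example and not the Pruesse–Ruskey generation algorithm discussed above; the function names are hypothetical, and LOGBSF uses base 2 as in the text.

```python
import math


def to_row(s):
    """Parse a row written with 0-9, A, B (base-12 digits) into a list of pitch classes."""
    return [int(ch, 12) for ch in s]


def ordered_pairs(row):
    """All ordered pairs (a, b) such that a precedes b in the row."""
    return {(row[i], row[j]) for i in range(len(row)) for j in range(i + 1, len(row))}


def bsf(x, y):
    """Number of rows satisfying every ordered pair shared by x and y,
    i.e. the number of linear extensions of the partial order X ∩ Y."""
    n = len(x)
    pred = [0] * n  # pred[v]: bitmask of pitch classes that must precede v
    for a, b in ordered_pairs(x) & ordered_pairs(y):
        pred[b] |= 1 << a
    count = [0] * (1 << n)  # count[S]: orderings of S that can form a valid prefix
    count[0] = 1
    for mask in range(1 << n):
        if count[mask] == 0:
            continue
        for v in range(n):
            if not (mask >> v) & 1 and (pred[v] & ~mask) == 0:
                count[mask | (1 << v)] += count[mask]
    return count[(1 << n) - 1]


def logbsf(x, y):
    return math.log2(bsf(x, y))


X = to_row("0123456789AB")
print(bsf(X, to_row("1023456789AB")), logbsf(X, to_row("1023456789AB")))  # 2 1.0
```

The same function illustrates the failure of the triangle inequality: with Y = 6789AB012345 and Z the retrograde of X, BSF(X, Y) = 924 and BSF(Y, Z) = 518400, yet BSF(X, Z) = 12! = 479001600, while Corollary 1 holds here with equality (924 · 518400 = 12!).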
Left invariance guarantees that distances between objects do not depend on how
they are labeled. In Definition 2, permutation τ is applied to permutations
π and σ to “relabel” the entities in them.³ Variables π and σ can be interpreted
as pitch-class rows and variable τ as a pitch-class operation. In this context,
left invariance means that we are thinking purely in permutational terms and
only the ordering relations of the twelve pitch classes matter. Correspondingly,
variables π and σ can be interpreted as order-number rows and variable τ as an
order-number operation.
All similarity measures that measure the ordering aspect of rows provide left
invariance for pitch-class rows. This is because any pitch-class operation, such
as transposition, can be seen as a relabeling of the pitch classes, even if we do
not customarily think of it in such terms: applying a pitch-class operation to
pitch-class rows changes the labels of the pitch classes but leaves the order
relations between the elements of the rows unchanged.
Assume now that we are measuring the Badness of Serial Fit of rows X and
Y. It can be proved that BSF does not depend on how the entities are labeled
(the proof would entail showing that a given row satisfies a given protocol if and
only if the relabeled row satisfies the relabeled protocol); thus it is left invariant.
Let us now relabel the pitch classes in such a way that row Y becomes row id =
0123456789AB. The new rows will then be $Y^{-1}X$ and $Y^{-1}Y$ = 0123456789AB,
and the BSF value for the original rows X and Y is identical to that for rows
$Y^{-1}X$ and $Y^{-1}Y$. Now, since order-number operations and twelve-tone rows
can both be reinterpreted as permutations, we can reinterpret row $Y^{-1}X$ as the
order-number operation YX−1 that transforms order-number row X into order-number
row Y (since pitch-class row X interpreted as a permutation is identical
² The concept of left invariance is often known as right invariance, since right
orthography is usually used; see, for example, Mannila (1985) and Estivill-Castro,
Mannila and Wood (1993). However, as I use left orthography here, I define the
concept as left invariance.
³ In fact, this strategy of relabeling the elements is used by Wong and Ruskey in the
implementation of an algorithm devised by Pruesse and Ruskey (1997) to calculate
the number of linear extensions of a partial order.
5 Conclusions
The simple idea behind Badness of Serial Fit gives rise to enormous complexity,
which makes the measure truly fascinating. Results in mathematics have regularly
informed music theory, but only rarely have music-theoretic results led to new
ideas in mathematics. That is the case here, as David Lewin's intuition about
using logarithmic values in BSF led to the search for, and definition of, a new
metric for permutations.
References
Babbitt, M.: Twelve-Tone Rhythmic Structure and the Electronic Medium. Perspectives
of New Music 1(1), 49–79 (1962)
Brightwell, G., Winkler, P.: Counting linear extensions. Order 8(3), 225–242 (1991)
Erné, M., Stege, K.: Counting finite posets and topologies. Order 8(3), 247–265 (1991)
Estivill-Castro, V., Mannila, H., Wood, D.: Right invariant metrics and measures of
presortedness. Discrete Applied Mathematics 42, 1–16 (1993)
Lewin, D.: On Partial Ordering. Perspectives of New Music 14(2)/15(1), 252–257 (1976)
Mannila, H.: Measures of Presortedness and Optimal Sorting Algorithms. IEEE
Transactions on Computers 34(4), 318–325 (1985)
Morris, R.: Composition with Pitch-Classes: A Theory of Compositional Design. Yale
University Press (1987)
Morris, R.: Class Notes for Advanced Atonal Music Theory. Frog Peak Music (2001)
Pruesse, G., Ruskey, F.: Generating Linear Extensions Fast. SIAM Journal on
Computing 23(2), 373–386 (1997)
Rothgeb, J.: Some Ordering Relationships in the Twelve-Tone System. Journal of Music
Theory 11(2), 176–197 (1967)
Sidorenko, A.: Inequalities for the number of linear extensions. Order 8(4), 331–340
(1997)
Starr, D.: Derivation and Polyphony. Perspectives of New Music 23(1), 180–257 (1984)
Ward, J.: Theories of similarity among ordered pitch class sets. Ph.D. dissertation, The
Catholic University of America (1992)
Estimating the Tonalness of Transpositional Type
Pitch-Class Sets Using Learned Tonal Key Spaces
Özgür İzmirli
1 Introduction
After hundreds of years of common practice tonality, over the 20th century,
composers of atonal music have explored ways to avoid tonal centers in their music.
They have purposefully organized the same 12 pitches in ways that perceptually
preclude any pitch from assuming a more stable and central role. Serialism and in
particular the 12-tone technique was the point at which an even pitch distribution was
spelled out by rules. Of course, in the broadest sense of atonality, not all musical
works were serial. Some composers even in the late 19th century explored the means
of avoiding tonal centers in some of their works. From that time onward, the
compositional possibilities now extending from tonal to atonal brought about the
questions of how to choose pitch sets to achieve the desired level of atonality.
Furthermore, on the analytical side there would be need for a new framework to
compose, understand and analyze these works.
Pitch-class set analysis was devised mainly for analyzing atonal music. A pitch-class
representation relies on octave equivalence and enharmonic equivalence, properties
that make it convenient when dealing with acoustical input. Together, the 12 pitch
classes form a closed modulo-12 system in which operations such as transposition and
inversion are defined. Nonetheless, the same formalism can be used to study
pitch-class sets systematically in terms of their tonal implications, whether they are
taken from a diatonic collection or chosen more arbitrarily.
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 146–153, 2009.
Given a pitch-class set of any cardinality, a transpositional type (Tn-type) [1]
is the collection of its 12 transpositions (some of which coincide when the set is
transpositionally symmetric, as {0,4,8} is). For example, the Tn-type of the major
triad comprises the pitch-class sets {0,4,7}, {1,5,8}, {2,6,9}, ..., {4,8,11}, ...,
and can be represented by its generator pattern in prime form, the one that starts
at index 0. Each transposition can be written as {0,4,7}Tn for n = 0..11.
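A Tn-type can be enumerated directly from its generator. The short sketch below (helper names `transpose` and `tn_type` are ours, for illustration only) lists T_0 to T_11 of a set and shows that a symmetric set such as {0,4,8} yields fewer distinct members.

```python
def transpose(pcs, n):
    """T_n: add n (mod 12) to every pitch class of the set."""
    return frozenset((p + n) % 12 for p in pcs)

def tn_type(pcs):
    """The transpositions T_0, ..., T_11 of a pitch-class set (list index = n)."""
    return [transpose(pcs, n) for n in range(12)]

major = tn_type({0, 4, 7})
print(sorted(major[2]))               # [2, 6, 9]
print(len(set(major)))                # 12 distinct sets
print(len(set(tn_type({0, 4, 8}))))   # 4: the augmented triad repeats every 4 semitones
```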
Temperley [2] proposed a probabilistic framework for measuring tonal implication,
tonal ambiguity and tonalness for pitch-class sets. According to Temperley, tonal
implication is the key the pitch-class set implies. Ambiguity refers to whether a set
implies a single key or more than one key. Tonalness is the degree to which a set is
characteristic of common-practice tonality. In this work we draw from these
definitions in order to quantify tonalness as a single measure.
Van Egmond and Butler [3] carried out a systematic analysis of pitch-class sets in
order to relate them to the common diatonic pitch collections. They listed the
connotations of Tn-types of cardinalities between two and six for the major, harmonic
minor and ascending minor sets.
Huron [4] has shown that the pitch-class sets providing the most consonant
interval-class collections are the major diatonic scale and the harmonic and melodic
minor scales. He further points out that consonant harmonic intervals occur more
often in these sets than in the other sets that can be drawn from the 12 equally
tempered pitch chromas.
Brown [5] suggested that there were two approaches to the perception of tonality:
structural and functional approaches. The structural approach assumes that a
distribution obtained by integration of pitches over time can be used to determine the
key. The key with the most similar distribution to the one calculated is selected as the
key of the musical fragment. Because of its long integration periods, this approach
is insensitive to the order of notes. This is in contrast to the functional view
which maintains that the sequence and organization of notes play an important role in
how people perceive the tonal center.
In this work, we concentrate on a structural approach to exploring the tonalness of
Tn-sets constructed using real audio. To quantify tonal ambiguity, we use a
low-dimensional representation obtained from spectral information accumulated over
long-term windows that span many chords and even phrases. For this reason, we do not
distinguish between successive and simultaneous occurrences of the pitches in the
Tn-sets. Most models in the literature that address key finding from audio accumulate
spectral information in a similar manner and use key profiles as reference points
([6]; see [7] for a survey of key-finding methods).
This paper outlines a method for estimating the tonalness of pitch-class sets
constructed using acoustical instrument sounds. Mindful of the multidimensional and
abstract nature of tonality, we use the term tonalness to refer to how strongly the
input suggests congruence with the pitch use of common-practice tonal music.
Tonal strength can be understood as the opposite of an atonal quality: strong
tonalness means that a listener can identify a tonal center and even follow
modulations into other keys. A lack of tonalness, or tonal ambiguity, does not allow
a clear tonal center to be established; works of this nature therefore cannot exploit
the full extent of the tonal key space. This work defines a generalized measure that
aims to quantify this property of pitch-class sets.
Chroma-based features are robust representations and have been shown to work well
in problems such as key finding, chord recognition, harmonic change detection, audio
segmentation and audio alignment (e.g. [8], [9], [10], [11], [7], [12], [13]).
Chromagrams try to capture the pitch content of the audio input by mapping the
frequency range of each pitch-class semitone onto the corresponding chromagram bin.
They are, however, susceptible to variations in timbre, especially in the spectral
distribution of partials, and therefore only approximate the pitch-class distribution
of the input music.
To obtain the key space, the audio is first filtered and down-sampled to fs = 5512.5
samples per second. An N=2048 point sliding window Fast Fourier Transform (FFT)
with a Hann window and 50% overlap is calculated for the duration of the signal of
interest. Next, a 12-element chromagram vector, C, is calculated for each FFT frame
with bin j mapping to chromagram bin cb(j):
$$ cb(j) = \operatorname{mod}\left(\operatorname{round}\left(12 \log_2\left(\frac{j\,d}{f_{A4}}\right)\right),\ 12\right), \qquad d = \frac{f_s}{N} \tag{1} $$
$$ C(i) = \sum_{j\,:\,cb(j) = i} X(j), \qquad i = 0, \ldots, 11 $$
where X(j) is the FFT magnitude of bin j and $f_{A4}$ is the reference frequency of A4, i.e.
440 Hz. These chromagrams are then averaged to form a single summary
chromagram. It has been shown that further dimensionality reduction of the
chromagrams can reveal music theoretical structures such as the circle of fifths and
toroidal tonal spaces ([14], [15], [16]). This has also been demonstrated by Burgoyne
and Saul [17] using Lerdahl's distances [18].
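Equation (1) and the averaging step can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the input is assumed to be already filtered and down-sampled to 5512.5 Hz, `np.hanning` (a symmetric Hann window) stands in for the Hann window, the DC bin is skipped because $\log_2 0$ is undefined, and the function names are hypothetical. Because Eq. (1) is referenced to A4, chroma index 0 corresponds to A and index 3 to C.

```python
import numpy as np

FS = 5512.5    # sampling rate after down-sampling (value from the text)
N = 2048       # FFT length (value from the text)
F_A4 = 440.0   # reference frequency of A4

def chroma_map(n=N, fs=FS, f_ref=F_A4):
    """FFT bin indices j = 1..n/2 and their chroma bins cb(j), as in Eq. (1)."""
    d = fs / n                                   # frequency resolution in Hz per bin
    j = np.arange(1, n // 2 + 1)                 # DC (j = 0) is skipped
    cb = np.mod(np.round(12 * np.log2(j * d / f_ref)).astype(int), 12)
    return j, cb

def summary_chromagram(x, n=N, fs=FS):
    """Average of per-frame chromagrams C(i): Hann window, 50% overlap."""
    if len(x) < n:
        raise ValueError("signal shorter than one FFT frame")
    j, cb = chroma_map(n, fs)
    window = np.hanning(n)
    hop = n // 2                                 # 50% overlap between frames
    frames = []
    for start in range(0, len(x) - n + 1, hop):
        mag = np.abs(np.fft.rfft(x[start:start + n] * window))          # X(j)
        frames.append(np.bincount(cb, weights=mag[j], minlength=12))     # C(i)
    return np.mean(frames, axis=0)

t = np.arange(int(2 * FS)) / FS                  # two seconds of a test tone
print(summary_chromagram(np.sin(2 * np.pi * 440.0 * t)).argmax())   # 0 (A)
```

`np.bincount` with weights performs the sum over all bins j with cb(j) = i in a single call.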
In this work an n-dimensional representation of a key space is obtained using
Principal Component Analysis (PCA). The input matrix Z contains observations, the
summary chromagrams of the audio recordings, in its rows. The summary
chromagrams are calculated from the first 45 seconds of each piece in a database of
180 tonal pieces recorded from the Naxos site (www.naxos.com). This is done
without regard to any modulations that might be taking place during that time. A
summary chromagram consists of the averages of the individual chromagram
calculations (Eq. 1). Although the key distribution is not completely flat, care
was taken to represent each key sufficiently. The data are standardized by subtracting
the mean and dividing by the standard deviation for each variable. Next, the
eigenvalues and eigenvectors of the covariance matrix R of the input matrix Z are
calculated. The eigenvectors are then arranged in descending order of their
eigenvalues and scaled to unit length. The mapping matrix A is constructed from the
eigenvectors of the n largest eigenvalues, and the input is transformed to the new
axes by the product ZA. Here n is the number of dimensions retained in the
transformation. When eliminating dimensions, it is important to monitor the total
variance explained by the retained dimensions, so as to know how much of the
original variation the reduced data preserve.
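The PCA steps above map onto a few NumPy calls. The sketch is illustrative only: `fit_key_space` and `project` are hypothetical names, and projecting new pitch-class-set recordings with the training mean and standard deviation is our assumption about how new data enter the learned space.

```python
import numpy as np

def fit_key_space(Z, n_dims=2):
    """Standardize Z (one summary chromagram per row), eigendecompose the covariance
    matrix R and keep the eigenvectors of the n_dims largest eigenvalues."""
    mean = Z.mean(axis=0)
    std = Z.std(axis=0, ddof=1)
    Zs = (Z - mean) / std                        # standardize each of the 12 variables
    R = np.cov(Zs, rowvar=False)                 # 12 x 12 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(R)         # R symmetric: unit-length eigenvectors
    order = np.argsort(eigvals)[::-1]            # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    A = eigvecs[:, :n_dims]                      # mapping matrix A
    explained = eigvals[:n_dims].sum() / eigvals.sum()   # share of variance retained
    return mean, std, A, explained

def project(chroma, mean, std, A):
    """Map chromagram rows into the learned space using the training statistics."""
    return ((np.atleast_2d(chroma) - mean) / std) @ A
```

Calling `project` on the training matrix itself reproduces the transformation ZA on the standardized data; `explained` is the quantity the text recommends monitoring when choosing n.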
Fig. 2. Visualization in the first two dimensions. The transformed data points from the tonal
pieces (top left). The data points for the diatonic major set of cardinality 7 projected onto the
original space (top right). The projection for the harmonic minor set (bottom left). The
projection for the chromatic set of cardinality 12 (bottom right).
with radii comparable to the structure of the learned space. On the other hand, pitch
sets that have weak tonal implications, such as the chromatic set, tend to form small
clusters (in the first two dimensions), possibly because their variance is spread over
other dimensions, their inter-distance patterns not being similar to the learned ones.
4 Evaluation
The method is evaluated by comparing its results with a generalized version of the
root ambiguity measure proposed by Parncutt [19]. We take the root ambiguity of a
pitch-class set to reflect inversely its degree of tonalness. In a Tn-type, if the root
ambiguity is low, then the implication of key will be strong for every transposition,
resulting in a pattern that resembles a diatonic set in the tonal key space. If, on the
other hand, the ambiguity is high, no such pattern will appear. Parncutt
originally proposed this measure for sequential tones but we employ it without
Table 1. Pitch-class sets of cardinality 3 used in this study. Sets are given by Forte name and
prime form without inversional equivalence.

Number  Forte name  Prime form
1       3-1         012
2       3-2         013
3       3-2B        023
4       3-3         014
5       3-3B        034
6       3-4         015
7       3-4B        045
8       3-5         016
9       3-5B        056
10      3-6         024
11      3-7         025
12      3-7B        035
13      3-8         026
14      3-8B        046
15      3-9         027
16      3-10        036
17      3-11        037
18      3-11B       047
19      3-12        048
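As a consistency check on Table 1, the 19 trichord Tn-types can be recomputed from all 220 three-note subsets of Z12. The sketch assumes a packing rule that compares candidate transpositions from the last element backwards (Rahn's convention, which agrees with the prime forms listed here for trichords); `tn_prime_form` is a hypothetical helper name.

```python
from itertools import combinations

def tn_prime_form(pcs):
    """Tn prime form (no inversional equivalence): transpose each member to 0 and
    keep the most compact result, comparing from the last element backwards."""
    candidates = [tuple(sorted((x - r) % 12 for x in pcs)) for r in pcs]
    return min(candidates, key=lambda c: c[::-1])

trichord_types = sorted({tn_prime_form(s) for s in combinations(range(12), 3)})
print(len(trichord_types))          # 19, the number of rows in Table 1
print(tn_prime_form({2, 6, 9}))     # (0, 4, 7): D major belongs to {047}Tn
```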
Fig. 3. Ambiguity measures for all Tn-types in Table 1. The proposed method is compared to
ambiguity measures using 4 other profiles.
next low-ambiguity sets: {025}Tn, {035}Tn, {037}Tn and {047}Tn. All but one of the
measures also agree on the most ambiguous Tn-type, {012}Tn.
5 Conclusions
We have outlined a method to estimate the tonalness of transpositional-type
pitch-class sets realized with real audio. A low-dimensional space is used to gain
intuition into the topological nature of the key space and the transformed data. The
method is tested on pitch-class sets of cardinality 3 and compared with measures from
other work. The results are reported as ambiguity measures, which are the inverse of
tonalness: a Tn-type with a high ambiguity measure is less likely to have strong
tonalness with respect to common-practice tonality. The model's output correlates
well with ambiguity measures derived from other key profiles and from flat diatonic
profiles. Future work will involve experiments with sets of higher cardinality and
with recorded music from different tonal and atonal styles.
References
[1] Rahn, J.: Basic Atonal Theory. Longman, New York (1980)
[2] Temperley, D.: The Tonal Properties of Pitch-Class Sets: Tonal Implication, Tonal
Ambiguity, and Tonalness. In: Hewlett, W.B., Selfridge-Field, E. (eds.) Computing in
Musicology. Tonal Theory for the Digital Age, vol. 15, pp. 24–38 (2008)
[3] Van Egmond, R., Butler, D.: Diatonic Connotations of Pitch-class Sets. Music
Perception 15, 1–29 (1997)
[4] Huron, D.: Interval-class Content in Equally Tempered Pitch-class Sets: Common Scales
Exhibit Optimum Tonal Consonance. Music Perception 11, 289–305 (1994)
[5] Brown, H.: The Interplay of Set Content and Temporal Context in a Functional Theory of
Tonality Perception. Music Perception 5(3), 219–250 (1988)
[6] İzmirli, Ö.: Audio Key Finding Using Low-Dimensional Spaces. In: Proceedings of the
International Conference on Music Information Retrieval, Victoria, Canada (2006)
[7] Gómez, E.: Tonal Description of Music Audio Signals, Ph.D. Dissertation, Pompeu Fabra
University, Barcelona (2006)
[8] Bartsch, M.A., Wakefield, G.H.: To Catch a Chorus: Using Chroma-based
Representations for Audio. In: Proceedings of the IEEE Workshop on Applications of
Signal Processing to Audio and Acoustics, New Paltz, NY (2001)
[9] Fujishima, T.: Realtime Chord Recognition of Musical Sound: A System Using Common
Lisp Music. In: Proceedings of the International Computer Music Conference, Beijing,
China, pp. 464–467 (1999)
[10] Pauws, S.: Musical Key Extraction from Audio. In: Proceedings of the Fifth International
Conference on Music Information Retrieval, Barcelona, Spain (2004)
[11] Harte, C., Sandler, M., Gasser, M.: Detecting Harmonic Change in Musical Audio. In:
Proceedings of AMCMM 2006, Santa Barbara, California, USA (2006)
[12] Sheh, A., Ellis, D.P.W.: Chord Segmentation and Recognition using EM-Trained Hidden
Markov Models. In: Proceedings of the International Conference on Music Information
Retrieval, Baltimore, Maryland, USA (2003)
[13] Hu, N., Dannenberg, R.B., Tzanetakis, G.: Polyphonic Audio Matching and Alignment
for Music Retrieval. In: Proceedings of the IEEE Workshop on Applications of Signal
Processing to Audio and Acoustics, New Paltz, NY, USA (2003)
[14] İzmirli, Ö.: Cyclic Distance Patterns Among Spectra of Diatonic Sets: The Case of
Instrument Sounds with Major and Minor Scales. In: Hewlett, W.B., Selfridge-Field, E.
(eds.) Computing in Musicology. Tonal Theory for the Digital Age, vol. 15, pp. 11–23
(2008)
[15] Purwins, H., Graepel, T., Blankertz, B., Obermayer, K.: Correspondence Analysis for
Visualizing Interplay of Pitch Class, Key, and Composer. In: Luis-Puebla, E., Mazzola,
G., Noll, T. (eds.) Perspectives in Mathematical Music Theory (2003)
[16] Purwins, H., Blankertz, B., Obermayer, K.: Pitch Class Profiles and Inter-Key Relations.
In: Hewlett, W.B., Selfridge-Field, E. (eds.) Computing in Musicology. Tonal Theory for
the Digital Age, vol. 15, pp. 73–98 (2008)
[17] Burgoyne, J.A., Saul, L.K.: Visualization of Low Dimensional Structure in Tonal Pitch
Space. In: Proceedings of the International Computer Music Conference (ICMC 2005),
Barcelona, Spain, pp. 243–246 (2005)
[18] Lerdahl, F.: Tonal Pitch Space. Oxford University Press, New York (2001)
[19] Parncutt, R.: Revision of Terhardt’s Psychoacoustical Model of the Root(s) of a Musical
Chord. Music Perception 6, 65–94 (1988)
[20] Krumhansl, C.L., Kessler, E.J.: Tracing the Dynamic Changes in Perceived Tonal
Organization in a Spatial Representation of Musical Keys. Psychological Review 89,
334–368 (1982)
[21] Temperley, D.: The Cognition of Basic Musical Structures. MIT Press, Cambridge (2001)
[22] Aarden, B.J.: Dynamic Melodic Expectancy. Ph.D. Thesis, Ohio State University, Music
(2003)
Musical Experiences with Block Designs
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 154–165, 2009.
rhythms based on the (11,6,3) design, a system worked out by Jeffrey Dinitz and
his student Susan Janiszewski. Another example is the mapping of Messiaen's
modes onto the set X: Mode 2 with (6,3,2), 10 blocks; Mode 3 with (9,3,1), 12
blocks; Mode 4 with (8,4,3), 14 blocks; Mode 5 with (8,4,6), 28 blocks; Mode 6
with (8,3,6), 56 blocks; Mode 7 with (10,4,2), 15 blocks.
As we have previously remarked, a t-design has only four parameters t-(v, k, λ).
From these quantities, we can easily derive some combinatorial properties. For
example, the number of blocks that contain any i-set is given by
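$$ b_i = \lambda \binom{v-i}{t-i} \Big/ \binom{k-i}{t-i}, \qquad i = 0, 1, \ldots, t \tag{1} $$
where $\binom{a}{b} = \frac{a!}{b!\,(a-b)!}$ denotes the binomial coefficient. In particular, the
number of blocks of a t-design is
$$ b = \lambda \frac{v!\,(k-t)!}{(v-t)!\,k!} \tag{2} $$
and, by setting
$$ r = \lambda \frac{(v-1)!\,(k-t)!}{(v-t)!\,(k-1)!} \tag{3} $$
(the number of blocks containing any given point, i.e. the case i = 1 of (1)), we get the relation
$$ bk = vr \tag{4} $$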
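As we have seen, two t-designs are isomorphic if there is a bijection between their
point sets that maps the blocks of one onto the blocks of the other; this reduces the
search to one representative per isomorphism class. From a set-theoretical
perspective, knowing a t-design $D = (X, \mathcal{B})$ means knowing its complement
$D^c = (X, X \setminus \mathcal{B})$, where $X \setminus \mathcal{B}$ is the set of complementary blocks
$$ X \setminus \mathcal{B} = \{ B^c : B \in \mathcal{B} \} \tag{5} $$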
Note that D and $D^c$ have the same number of blocks. For t = 2, the parameters of a
block design D with b blocks satisfy
$$ b = \lambda \frac{v(v-1)}{k(k-1)}, \qquad r = \lambda \frac{v-1}{k-1}, \qquad bk = vr \tag{6} $$
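Equations (1) to (4) are easy to evaluate exactly with rational arithmetic. The sketch below (hypothetical helpers `block_counts` and `design_parameters`) computes b and r and rejects parameter sets for which some $b_i$ is not an integer; integrality is a necessary condition only, so passing the check does not prove that a design exists.

```python
from fractions import Fraction
from math import comb

def block_counts(t, v, k, lam):
    """b_i = lam * C(v-i, t-i) / C(k-i, t-i) for i = 0..t, as in Eq. (1)."""
    return [Fraction(lam * comb(v - i, t - i), comb(k - i, t - i)) for i in range(t + 1)]

def design_parameters(t, v, k, lam):
    """Return (b, r) for a t-(v, k, lam) design with t >= 1, or raise ValueError
    if some b_i is not an integer."""
    counts = block_counts(t, v, k, lam)
    if any(c.denominator != 1 for c in counts):
        raise ValueError(f"{t}-({v},{k},{lam}): some b_i is not an integer")
    b, r = int(counts[0]), int(counts[1])   # Eq. (2) and Eq. (3)
    assert b * k == v * r                   # Eq. (4)
    return b, r

print(design_parameters(2, 12, 4, 3))   # (33, 11)
```

For the (12,4,3) design this gives 33 blocks with each point in 11 of them, the figures that reappear in the compositional application of Section 5; the same function reproduces the block counts quoted above for Messiaen's modes.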
For example, the D = (7, 3, 1) design has an automorphism group Aut(D) equal
to the group $L_3(2)$ of 168 elements with presentation
where a and b are the permutations (in cyclic notation) of seven elements
$$ a = (0\ 3\ 4\ 1\ 2\ 5\ 6), \qquad b = (1\ 2\ 0\ 3\ 5\ 6\ 4) \tag{8} $$
2 Drawing t-Designs
Until now t-designs have rarely been used for musical purposes. Moreover, there
exists no canonical way to draw a t-design, and musical transformations are usually
not considered in the mathematical literature on t-designs. We will restrict ourselves
to the most common musical transformations, namely
1. Transpositions:
$$ T_n(x) = x + n \pmod v \tag{9} $$
2. Inversions:
$$ I_n(x) = -x + n \pmod v \tag{10} $$
3. Affine transformations
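The three families can be written as maps on Z_v and applied block by block. The text does not give a formula for the affine case, so the form x ↦ ax + n (mod v) with a invertible mod v is our assumption; all helper names are hypothetical. The example checks which maps send the cyclic (7,3,1) design generated by {0,1,3} onto itself.

```python
from math import gcd

def transposition(n, v):
    """T_n(x) = x + n (mod v), Eq. (9)."""
    return lambda x: (x + n) % v

def inversion(n, v):
    """I_n(x) = -x + n (mod v), Eq. (10)."""
    return lambda x: (-x + n) % v

def affine(a, n, v):
    """Assumed affine map x -> a*x + n (mod v), with a invertible modulo v."""
    if gcd(a, v) != 1:
        raise ValueError("a must be invertible modulo v")
    return lambda x: (a * x + n) % v

def image(blocks, f):
    """Apply a point map to every block of a design."""
    return {frozenset(f(x) for x in block) for block in blocks}

def cyclic_design(base, v):
    """All translates T_n(base), n = 0..v-1."""
    return {frozenset((x + n) % v for x in base) for n in range(v)}

fano = cyclic_design({0, 1, 3}, 7)
print(image(fano, transposition(1, 7)) == fano)   # True
print(image(fano, affine(2, 0, 7)) == fano)       # True: 2*{0,1,3} = {6,0,2} = T_6{0,1,3}
print(image(fano, inversion(0, 7)) == fano)       # False: -{0,1,3} = {0,4,6} is not a block
```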
The Kirkman problem (see also [8]) was stated in 1850 by Thomas P.
Kirkman: Fifteen young ladies in a school walk out three abreast for seven days in
succession: it is required to arrange them daily, so that no two walk twice abreast.
A Kirkman Triple System (KTS) is now defined as a resolvable Steiner Triple System;
the problem is also known in computer science as a case of the social golfer problem.
The following theorem limits the cardinality of the point set.
For v = 15, it has been shown that there are eighty non-isomorphic Steiner Triple
Systems (15,3,1). One solution is given in Table 2:
The musical question is: how to draw this solution showing each parallel class
and considering musical transformations between them? Reinhard Laue [9] stud-
ied some visualizations of Steiner Systems which make resolvability obvious, and
Tom Johnson [6] gave some drawings considering sub-networks in t-designs. For
a simpler design such as (6,3,2), which representation is best? Is it a graph
whose vertex set is the point set, or a graph whose vertices are the blocks
(Fig. 1)?
In Figure 1, opposite borders are glued together in the direction of the arrows,
so that if you leave the bottom through the edge [3, 4] of the triangle {2, 3, 4},
you re-enter at the top through the same edge, into the triangle {1, 3, 4}. Each
triangle has three neighbours. A compositional problem would be to find Hamiltonian
paths (paths that visit each vertex exactly once) or Hamiltonian circuits (cycles
that visit each vertex exactly once and return to the starting vertex) when the
vertices are the blocks of a t-design. In Figure 1, this corresponds to the second
graph (on the right), or to the dual of the first graph (on the left).
3 Cyclic Representations
In some cases, blocks can be constructed from generators under the action of
a group. This is the case when $q = p^\alpha$ is a prime power, and the action is
the translation $T_1$ of the cyclic group. The Steiner Triple Systems 2-(q^2 + q +
The following table (Table 3) shows the first designs. Observe that there is no
Steiner Triple System for PG(2, 6), since 6 is not a prime power.
The parameters of the designs are given in the first column, the second column
gives the prime power $p^\alpha$, written PG(2, $p^\alpha$), and the last column gives a
generator. The action of the translation $T_1$ in $\mathbb{Z}_{p^2+p+1}$ yields the set of blocks.
Namely, for (7,3,1), the blocks are B = {0, 1, 3}, $T_1(B)$ = {1, 2, 4}, $T_1^2(B)$, etc.
Can this construction be generalized to (7, 3, n) designs with n > 1? Unfortunately
not. Look at the first values of n. For n = 1, the design (7, 3, 1) is
generated by B = {0, 1, 3} and the translation $T_1(x) = x + 1 \bmod 7$, which
is also the permutation σ = (0 1 2 3 4 5 6) in cyclic notation. This design is
represented by a heptagon with outer triangles corresponding to the blocks.
For n = 2, the design (7,3,2) is not generated by one block and a translation.
However, it is generated by two blocks and two actions: the block $B_1$ = {0, 1, 2}
with the permutation σ = (0 1 5 3 4 2 6), and the block $B_2$ = {0, 1, 3} with the
permutation $\sigma^2$ = (0 5 4 6 1 3 2), the square of the previous permutation.
The drawing of (7,3,2) is a triangulation of two concentric heptagons, whose
vertices are labelled in the cyclic order of these two permutations. For n = 3,
the design (7,3,3) is generated by the action of σ = (0 1 3 5 2 6 4) on the
blocks $B_1$ = {0, 1, 3} and $B_2$ = {0, 1, 2}, and the action of $\sigma^4$ = (0 2 1 6 3 4 5)
on the block $B_3$ = {0, 2, 3}. The drawing (Fig. 3)
shows the design (7, 3, 1).
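The constructions for n = 1, 2, 3 can be verified by generating each orbit and counting how many blocks contain every pair of points. The sketch uses hypothetical helpers (`perm_from_cycle`, `orbit`, `is_2_design`) and reads each permutation as a single cycle exactly as written in the text.

```python
from collections import Counter
from itertools import combinations

def perm_from_cycle(cycle):
    """Permutation (as a dict) given by one cycle in cyclic notation."""
    cycle = list(cycle)
    return {x: cycle[(i + 1) % len(cycle)] for i, x in enumerate(cycle)}

def orbit(block, sigma):
    """Images of a block under repeated application of sigma, until it returns."""
    blocks, current = [], frozenset(block)
    while current not in blocks:
        blocks.append(current)
        current = frozenset(sigma[x] for x in current)
    return blocks

def is_2_design(blocks, v, lam):
    """True if every pair of points 0..v-1 lies in exactly lam blocks."""
    counts = Counter(p for block in blocks for p in combinations(sorted(block), 2))
    return all(counts[p] == lam for p in combinations(range(v), 2))

s = perm_from_cycle((0, 1, 5, 3, 4, 2, 6))
s2 = perm_from_cycle((0, 5, 4, 6, 1, 3, 2))        # the square of s
d732 = orbit({0, 1, 2}, s) + orbit({0, 1, 3}, s2)
print(len(d732), is_2_design(d732, 7, 2))          # 14 True
```

The same three helpers confirm the (7,3,1) design generated by the 7-cycle (0 1 2 3 4 5 6) and the 21-block (7,3,3) construction.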
Another question is to determine the generators of a t-design. We now summarize
some results: Netto's theorem and Singer difference sets.
an $STS(p^n)$ on X.
The proof of this theorem is given in [3]. As an example of how the theorem
works, consider the (7,3,1) design. As p = 7, n = 1, t = 1, and α = 3 is a
generator of $\mathbb{F}_7$, the set $\{1, \alpha^2, \alpha^4\} \bmod 7 = \{1, 2, 4\}$, which equals {0, 1, 3} up to
transposition, is a cyclic generator of B.
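The worked example can be reproduced numerically. The sketch covers only this p = 7 case (the statement of Netto's theorem itself is not reproduced here), and `is_generator` is a hypothetical helper.

```python
def is_generator(alpha, p):
    """True if alpha is a primitive root modulo the prime p."""
    return len({pow(alpha, e, p) for e in range(1, p)}) == p - 1

p, alpha = 7, 3
base = sorted(pow(alpha, e, p) for e in (0, 2, 4))   # {1, alpha^2, alpha^4} mod 7
shifted = sorted((x - base[0]) % p for x in base)    # transpose so the block starts at 0
print(is_generator(alpha, p), base, shifted)         # True [1, 2, 4] [0, 1, 3]
```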
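Singer difference sets are introduced in [4]. Let p be a prime, and m a non-negative
integer. Let $f(x) = x^m + a_1 x^{m-1} + \cdots + a_m$ be a primitive polynomial of degree m
over $\mathbb{F}_p$, and define the sequence
$$ u_0 = 1, \quad u_1 = \cdots = u_{m-1} = 0, \qquad u_n = -(a_1 u_{n-1} + \cdots + a_{m-1} u_{n-m+1} + a_m u_{n-m}) \tag{15} $$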
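The recurrence (15) is a linear feedback shift register and takes only a few lines to run. The steps that turn the sequence into a difference set are not reproduced in this excerpt, so `singer_set` below assumes the standard construction for m = 3: the indices n < p² + p + 1 with $u_n = 0$ form a (p² + p + 1, p + 1, 1) difference set. All function names are hypothetical.

```python
from collections import Counter

def recurrence(coeffs, p, length):
    """u_n from Eq. (15) for f(x) = x^m + a_1 x^(m-1) + ... + a_m over F_p,
    with coeffs = [a_1, ..., a_m]."""
    m = len(coeffs)
    u = [1] + [0] * (m - 1)                   # u_0 = 1, u_1 = ... = u_{m-1} = 0
    while len(u) < length:
        n = len(u)
        s = sum(coeffs[i] * u[n - 1 - i] for i in range(m))   # a_1 u_{n-1} + ... + a_m u_{n-m}
        u.append((-s) % p)
    return u[:length]

def singer_set(coeffs, p):
    """Assumed Singer construction for m = 3: indices n < p^2 + p + 1 with u_n = 0."""
    if len(coeffs) != 3:
        raise ValueError("this sketch assumes a cubic polynomial (m = 3)")
    v = p * p + p + 1
    return [n for n, x in enumerate(recurrence(coeffs, p, v)) if x == 0]

def is_difference_set(D, v, lam=1):
    """True if every nonzero residue mod v occurs exactly lam times as a difference."""
    diffs = Counter((a - b) % v for a in D for b in D if a != b)
    return all(diffs[d] == lam for d in range(1, v))

D = singer_set([0, 1, 1], 2)       # f(x) = x^3 + x + 1 over F_2
print(D, is_difference_set(D, 7))  # [1, 2, 4] True
```

The resulting set {1, 2, 4} is the same base block, up to transposition, as the cyclic generator {0, 1, 3} of the (7,3,1) design.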
We would now like to study the relationship between Forte's pcsets and t-designs.
Precisely, we would like to investigate the question: is there a t-design t-(v, k, λ)
with v ≤ 12 whose k-blocks include all k-pcsets of the Forte classification? Such
a design is called a Forte design. If the point set is identified with pitch classes
(v ≤ 12), each block can be considered a chord; if all chords are described by the
design, the design is a Forte design. A computer program analyzing the 2-designs
given by the Encyclopedia of t-designs shows that these 2-designs do not lead to a
Forte design. In Table 4, the first column gives the parameters of the design, the
second column the number of blocks b in the design, and the third column the
complement of the design. The fourth column gives the Forte name of at least one
missing k-chord. Stars indicate self-complementary designs, and n is a positive integer.
To obtain a complete set of k-chords, at least two sets of blocks are required. For
example, under the action of $\sigma_1$ = (2 6)(3 8)(4 7)(0)(1)(5), the (9, 3, 1)-design
gives the block set (one block per column)
$$ B_1 = \begin{pmatrix} 0&0&0&0&1&1&1&4&2&3&5&2 \\ 1&2&3&4&3&4&2&6&6&5&7&3 \\ 6&8&7&5&8&7&5&8&7&6&8&4 \end{pmatrix} \tag{18} $$
which contains all of Forte's trichords except 3-7 and 3-12. Under the action of
$\sigma_2$ = (2 7 8 6 5 4 3)(0)(1) it gives
$$ B_2 = \begin{pmatrix} 0&0&0&0&1&1&1&2&3&4&2&5 \\ 1&2&3&4&2&3&4&7&5&6&3&6 \\ 7&5&6&8&6&8&5&8&7&7&4&8 \end{pmatrix} \tag{19} $$
which contains all of Forte's trichords except 3-8 and 3-11. In this way, using two sets of
blocks of the same design, a composer can use the whole set of trichords.
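Both claims can be checked by reading each matrix column as a block, verifying that every pair of the nine points occurs exactly once, and classifying each block by its Forte set class. The sketch assumes the points 0 to 8 are read directly as pitch classes and uses a Rahn-style packing rule, which agrees with Forte's prime forms for trichords; the helper names are hypothetical.

```python
from itertools import combinations

FORTE_TRICHORDS = {
    (0, 1, 2): "3-1", (0, 1, 3): "3-2", (0, 1, 4): "3-3", (0, 1, 5): "3-4",
    (0, 1, 6): "3-5", (0, 2, 4): "3-6", (0, 2, 5): "3-7", (0, 2, 6): "3-8",
    (0, 2, 7): "3-9", (0, 3, 6): "3-10", (0, 3, 7): "3-11", (0, 4, 8): "3-12",
}

def tn_prime(pcs):
    """Most compact transposition to 0, comparing from the last element backwards."""
    candidates = [tuple(sorted((x - r) % 12 for x in pcs)) for r in pcs]
    return min(candidates, key=lambda c: c[::-1])

def prime_form(pcs):
    """Tn/I prime form: the more compact of the set's and its inversion's Tn forms."""
    return min(tn_prime(pcs), tn_prime({-x % 12 for x in pcs}), key=lambda c: c[::-1])

def columns(rows):
    """Read a 3 x b matrix as b blocks, one block per column."""
    return [set(col) for col in zip(*rows)]

def covers_each_pair_once(blocks, v=9):
    """True if every pair of points 0..v-1 lies in exactly one block."""
    pairs = [p for block in blocks for p in combinations(sorted(block), 2)]
    return sorted(pairs) == list(combinations(range(v), 2))

def missing_trichords(blocks):
    present = {FORTE_TRICHORDS[prime_form(block)] for block in blocks}
    return set(FORTE_TRICHORDS.values()) - present

B1 = columns([[0, 0, 0, 0, 1, 1, 1, 4, 2, 3, 5, 2],
              [1, 2, 3, 4, 3, 4, 2, 6, 6, 5, 7, 3],
              [6, 8, 7, 5, 8, 7, 5, 8, 7, 6, 8, 4]])
B2 = columns([[0, 0, 0, 0, 1, 1, 1, 2, 3, 4, 2, 5],
              [1, 2, 3, 4, 2, 3, 4, 7, 5, 6, 3, 6],
              [7, 5, 6, 8, 6, 8, 5, 8, 7, 7, 4, 8]])

print(covers_each_pair_once(B1), covers_each_pair_once(B2))      # True True
print(sorted(missing_trichords(B1)), sorted(missing_trichords(B2)))
# ['3-12', '3-7'] ['3-11', '3-8']
```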
To conclude this section, we mention the link between t-designs and the
Mathieu groups. First, as underlined in [5], Olivier Messiaen's Île de
feu 2 uses two permutations in cyclic notation
$$ a = (1\ 7\ 10\ 2\ 6\ 4\ 5\ 9\ 11\ 12)(3\ 8), \qquad b = (1\ 6\ 9\ 2\ 7\ 3\ 5\ 4\ 8\ 10\ 11)(1\ 2) \tag{20} $$
which generate Mathieu's group $M_{12}$ of order 95040. In the same way, Les
yeux dans les roues (O. Messiaen, Livre d'orgue VI) is built on six permutations
(a permutation and five actions): $\sigma_0$ = (1 11 6 2 9 4 8 10 3 5) and,
for j = 1, ..., 5, $\sigma_j = A_j \sigma_0$, where the actions are defined by: Extrêmes au centre:
$A_1$ = (2 12 7 4 11 6 10 8 9 5 3); Centre aux extrêmes: $A_2$ = (1 6 9 2 7 3 5 4 8 10
11); Rétrograde: $A_3$ = (1 12)(2 11)(3 10)(4 9)(5 8)(6 7); Extrêmes au centre,
rétrograde: $A_4 = A_1 A_3$; and Centre aux extrêmes, rétrograde: $A_5 = A_2 A_3$. If we
set $a = A_2^{-1} A_1$ and $b = A_2^3 A_1 A_2^2 A_1$, these permutations generate the Mathieu
group $M_{12}$ of presentation
The first two Mathieu groups are built on eleven and twelve points respectively.
Neither $M_{11}$ nor $M_{12}$ is a Forte design. In $M_{11}$, 11 pcsets are missing (5-1, 5-3,
5-5, etc.), and in $M_{12}$, 12 pcsets are missing (6-1, 6-4, 6-7, etc.). In $M_{12}$, if we
take only three notes of each block, we obtain neither 3-12 nor 3-1.
5 A Compositional Application
To show a specific compositional application for all this, and also to summarize
the combinations and graphs that come together in block designs, we offer a
brief analysis of the third movement of Johnson’s Twelve for Piano (2008) (the
score is reproduced in the Annex). This one-page piece uses the precise four-note
chords produced by one of the over 17 million possible solutions of the (12,4,3)
design, where 12 elements (notes) are partitioned into 33 subsets (chords) of four
elements (notes), such that each pair of notes appears in exactly three of the
chords.
To write this music, the composer needed to map the system, so as to see
how the 33 chords related to one another, and to do this he drew a graph
by connecting chords that had no notes in common. The graph would be
different for each of the 17 million solutions, but in this case it takes the shape of
the three hexagonal formations shown here. The three shaded triangles represent
three parallel classes, three cases where chords with no notes in common come
together as a collection of all 12 notes. These nine chords form the central section
of the piece, beginning with (1,3,7,9) (4,6,10,12) (2,5,8,11). The remaining 24
chords, those in the other two hexagons, form the opening and closing sections.
The first four phrases of the piece, the first 12 chords, come from the hexagon
at the lower right, beginning with the inner ring: (5,7,8,12) (1,2,3,6) (4,9,11,12)
and (5,6,7,10) (1,3,4,8) (2,9,10,11) followed by the outer ring: (3,4,5,8) (1,2,6,11)
(7,8,9,12) and (3,5,6,10) (1,4,11,12) (2,7,9,10). The final four phrases follow the
hexagon at the lower left in the same manner. The numbers are not shown
on the accompanying score, but they are easy to decipher, since 1 is the
lowest note of the scale and 12 the highest.
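The chord lists above can be checked against the construction of the graph. The sketch below (hypothetical helper names) confirms that each ring of six chords is a cycle in the "no common notes" graph, that the first three central chords form a parallel class, and, assuming three chords per phrase (the first four phrases being the first 12 chords), which note is doubled and which is absent in each of the opening phrases.

```python
from collections import Counter

def disjoint(a, b):
    """True if two chords have no notes in common (an edge of the graph)."""
    return not set(a) & set(b)

def is_cycle_in_graph(chords):
    """True if every chord is disjoint from the next one, wrapping around."""
    return all(disjoint(chords[i], chords[(i + 1) % len(chords)]) for i in range(len(chords)))

def doubled_and_missing(phrase):
    """Notes used twice in a phrase, and notes 1..12 absent from it."""
    counts = Counter(note for chord in phrase for note in chord)
    doubled = sorted(n for n, c in counts.items() if c > 1)
    missing = sorted(set(range(1, 13)) - set(counts))
    return doubled, missing

inner = [(5, 7, 8, 12), (1, 2, 3, 6), (4, 9, 11, 12), (5, 6, 7, 10), (1, 3, 4, 8), (2, 9, 10, 11)]
outer = [(3, 4, 5, 8), (1, 2, 6, 11), (7, 8, 9, 12), (3, 5, 6, 10), (1, 4, 11, 12), (2, 7, 9, 10)]
central = [(1, 3, 7, 9), (4, 6, 10, 12), (2, 5, 8, 11)]

print(is_cycle_in_graph(inner), is_cycle_in_graph(outer))                  # True True
print(sorted(n for chord in central for n in chord) == list(range(1, 13)))  # True
for k, phrase in enumerate([inner[:3], inner[3:], outer[:3], outer[3:]], start=1):
    print(k, doubled_and_missing(phrase))
# 1 ([12], [10])  2 ([10], [12])  3 ([8], [10])  4 ([10], [8])
```

Each note doubled in one phrase is absent from an adjacent phrase, which is consistent with the description of the notes marked "a" below.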
Simply following the connections in this way produces a number of remarkable
symmetries that would be difficult to imagine in rigorous 12-tone music,
and surely impossible in any non-rigorous music.
• Consider first of all the cadences. The first and second phrases both end
on D-F-sharp, and the third and fourth phrases both end on B-flat-D. The final
four phrases in the piece rhyme in this same way.
• The notes marked “a” appear twice in the same phrase. These same notes
are omitted either in the phrase just before or in the phrase just after. Each of
the 12 notes appears exactly 11 times in the piece.
• The intervals marked “b” occur at the same place in two subsequent phrases.
• The “c” interval of the first phrase appears again in the third phrase, and
there is a similar pair of “c” intervals in the last section of the piece.
• In the middle section, containing three complete sets of 12 notes, one finds
three “d” intervals, three “e” intervals and three “f” intervals, though it is dif-
ficult to explain why they fall as they do. But then, it is difficult to explain all
these other symmetries as well. The music produced by this block design simply
does not behave like music we already know.
References
1. Andreatta, M.: Méthodes algébriques dans la musique et musicologie du XXe siècle,
Thèse, Ircam/Ehess (2003)
2. Blanchard, J.L.: A construction for Steiner 3-designs. J. Combin. Theory A 71,
60–67 (1995)
3. Colbourn, C., Rosa, A.: Triple Systems. Clarendon Press, Oxford (1999)
4. Colbourn, C., Dinitz, J.: Handbook of Combinatorial Designs, 2nd edn. Chapman
& Hall, Boca Raton (2007)
5. Jedrzejewski, F.: Mathematical Theory of Music. Ircam/Delatour, Paris (2006)
6. Johnson, T.: Networks, Harmonies found by Tom Johnson. Editions 75, Paris (2006),
www.editions75.com
7. Kirkman, T.P.: On a problem in combinations. Cambridge and Dublin Math. J. 2,
191–204 (1847)
8. Kirkman, T.P.: On the puzzle of the fifteen young ladies. London Philos. Mag. and
J. Sci. 23, 198–204 (1862)
9. Laue, R.: Resolvable t-designs. Des. Codes Crypt. 32, 277–301 (2004)
Annex. Score of the third movement (III) of Tom Johnson's Twelve for Piano (2008),
quarter note = 128, with the annotations a to f referred to in the analysis above.
A Generalisation of Diatonicism and the
Discrete Fourier Transform as a Mean for
Classifying and Characterising Musical Scales
1 Introduction
Over the centuries, Western musicians have extensively used half a dozen heptatonic
scales, but combinatorics shows that these represent only about a tenth of the
available musical material. Many catalogues exist, but they often reduce to
numerical tables that composers may find difficult to use.
The musician and composer Pierre Audétat [2] developed a numerical and
graphical representation of all 66 heptatonic scales and their 462 associated
modes. Such a cartography, called the Diatonic Bell, opens a field of experimentation
relevant to both composition and analysis, and offers interesting possibilities
for teaching.
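The counts of 66 scales and 462 modes can be reproduced by brute force. The sketch takes a "scale" to be a transposition class of 7-note subsets of Z12 and a "mode" to be a 7-note set written from a starting note transposed to 0; this reading is our assumption. Since 7 is coprime to 12, no heptachord is transpositionally symmetric, so each scale has exactly 7 modes and 66 × 7 = 462.

```python
from itertools import combinations

def tn_class(pcs, c=12):
    """Canonical representative of a transposition class: the lexicographically
    smallest sorted transposition of the set."""
    return min(tuple(sorted((x - n) % c for x in pcs)) for n in range(c))

heptachords = list(combinations(range(12), 7))
modes = [s for s in heptachords if 0 in s]      # scales written from a starting note 0
scales = {tn_class(s) for s in heptachords}
print(len(heptachords), len(modes), len(scales))   # 792 462 66
```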
The first part of this paper deals with the classification and ordering of
scales obtained with the diatonic bell, presenting a mathematical formulation
of Audétat’s original empirical work. The second part investigates scales in the
chromatic circle using the Discrete Fourier Transform (DFT) in order to exhibit
certain scales with remarkable properties.
David Lewin proposed this tool in 1958 for analysing intervallic relationships.
The idea was pursued by Ian Quinn [7] for classifying chords and by Emmanuel
Amiot [1] for redefining Clough and Douthett’s maximal evenness [4]. Inspired
by this work, we will see how DFT coefficients reflect the geometric configuration
of a scale in the chromatic circle, and how they can be used to characterise scales.
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 166–179, 2009.
© Springer-Verlag Berlin Heidelberg 2009
A Generalisation of Diatonicism and the Discrete Fourier Transform 167
The two methods differ structurally: the former is tonal, the latter atonal. In the conclusion we discuss some points of convergence between the two approaches.
where $\mathbb{1}_S$ is the indicator (or characteristic) function of the subset S. The maximally even scale maximises the modulus of the d-th coefficient:
$$S_0 := \arg\max_{S \in \mathcal{S}^d_c} \left| \mathcal{F}_S([d]_c) \right| \qquad (5)$$
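Definition (4) is not reproduced in this section. The sketch below assumes the sign convention of Fig. 6, $\mathcal{F}_S([k]_c) = \sum_{x \in S} e^{-2\pi i kx/c}$. It checks (5) by brute force for c = 12 and d = 7. The function names are illustrative, and the diatonic scale is written with D = [0]_12.

```python
import cmath
from itertools import combinations

C, D = 12, 7

def dft(scale, k, c=C):
    """k-th DFT coefficient of the indicator function of `scale` (Fig. 6 convention)."""
    return sum(cmath.exp(-2j * cmath.pi * k * x / c) for x in scale)

def transposition_class(scale, c=C):
    return min(tuple(sorted((x - t) % c for x in scale)) for t in range(c))

scales = {transposition_class(s) for s in combinations(range(C), D)}
# Equation (5): choose the scale maximising the modulus of the d-th coefficient.
best = max(scales, key=lambda s: abs(dft(s, D)))

DIATONIC = (0, 2, 3, 5, 7, 9, 10)  # D E F G A B C, with D = 0
print(best == transposition_class(DIATONIC), round(abs(dft(best, D)), 2))  # True 3.73
```

The maximum equals 2 + √3. Multiplying by 7 sends the diatonic scale to seven adjacent semitones, whose unit vectors add up to 1 + 2cos 30° + 2cos 60° + 2cos 90°.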
[Fig. 1 diagram: the chromatic circle C12 (coordinates [0]12 to [11]12) and the diatonic circle C7 (steps [0]7 to [6]7).]
Fig. 1. The diatonic scale's dorian mode is the reference centred mode $m_{S_0}^{([0]_d)}$ in the usual context (c = 12 and d = 7). It begins with a D ([0]12).
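$$m_{S_0}^{([0]_d)}([0]_d) = [0]_c. \qquad (9)$$
$$m_S^{(0)} = \arg\min_{m_S} \sum_{[k]_d \in C_d} d_{C_c}\big(m_S([k]_d),\, m_{S_0}^{(0)}([k]_d)\big) \qquad (10)$$
$$d_{C_c} : C_c \times C_c \to \mathbb{N}, \quad ([x]_c, [x']_c) \longmapsto \min_{n \in [x]_c,\ n' \in [x']_c} |n - n'|_{\mathbb{Z}} \qquad (11)$$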
[Fig. 2 diagram: the chromatic circle C12 and the diatonic circle C7, with the alteration [a]12 of the chromatic coordinate [x0]12.]
Fig. 2. Chromatic alteration. Two sharps ([a]12 = [+2]12) alter a G ([x0]12 = [5]12).
Before moving to the representation space of the diatonic spiral, modelled by the integers ℤ, we need to unfold the chromatic circle. We have already defined a distance on Cc; we still need the direction from one chromatic coordinate [x]c to another, [x']c.
$$u_{C_c} : C_c \to \mathbb{Z}, \quad [x]_c \longmapsto \operatorname{sgn}_{\mathbb{Z}_c}([x]_c) \cdot d_{\mathbb{Z}_c}([x]_c) \qquad (14)$$
It is now possible to compute the original diatonic coordinate ξ0 and the diatonic
alteration α on the diatonic spiral for every step [k]d ∈ Cd of a mode.
$$\alpha := d \cdot u_{C_c}([a]_c), \qquad \xi_0 := u_{C_c}\big([d]_c^{-1} \cdot [x_0]_c\big) \qquad (15)$$
[Fig. 3 diagram: the diatonic circle C7 and the centred mode, with the alteration mapped onto the diatonic spiral.]
Fig. 3. The diatonic alteration corresponding to Fig. 2. The diatonic spiral is modelled by the discrete line of integers. G## (G raised by two sharps) is indexed by 13 = −1 + 7 · 2.
The same relation as in (12) holds for the diatonic space. The final diatonic coordinate is given by
$$\xi := \xi_0 + \alpha, \qquad (16)$$
a process depicted in Fig. 3. Note that two different pairs (ξ0, α) can never correspond to the same ξ. On the diatonic spiral we have G# = −1 + 7 = +6 ≠ −6 = +1 − 7 = Ab. This is due to the fact that ℤ = {−3, . . . , +3} ⊕ 7ℤ.
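Equations (11) and (14) to (16) can be sketched for c = 12 and d = 7. Definitions (12) and (13) are not reproduced here, so the sketch assumes that u_{C_c} returns the signed representative of smallest absolute value. The tie x = c/2 is sent to +c/2, which only matters for tritone alterations. The function names are hypothetical, and `pow(d, -1, c)` needs Python 3.8 or later. The sketch reproduces the example of Figs. 2 and 3 and the G#/Ab distinction.

```python
C, D = 12, 7

def chromatic_distance(x, y, c=C):
    """Equation (11): shortest distance between [x]_c and [y]_c around the circle."""
    r = (x - y) % c
    return min(r, c - r)

def unfold(x, c=C):
    """Signed unfolding u_{C_c}: representative of [x]_c with the smallest absolute value."""
    r = x % c
    return r if r <= c // 2 else r - c

def diatonic_coordinate(x0, a, c=C, d=D):
    """Equations (15)-(16): xi = u([d]^-1 [x0]) + d * u([a])."""
    d_inv = pow(d, -1, c)            # [7]^-1 = [7] modulo 12
    xi0 = unfold(d_inv * x0, c)      # natural note placed on the spiral of fifths
    alpha = d * unfold(a, c)         # each sharp (+1) moves d = 7 fifths upwards
    return xi0 + alpha

G, A = 5, 7                          # chromatic coordinates with D = 0
print(diatonic_coordinate(G, +2))    # 13 (G##, Fig. 3)
print(diatonic_coordinate(G, +1), diatonic_coordinate(A, -1))  # 6 -6 (G# versus Ab)
```

Applied to the natural notes D, E, F, G, A, B and C, the same function returns 0, 2, −3, −1, 1, 3 and −2. These are the rows of Fig. 5.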
$$S > S' \;:\Leftrightarrow\; \exists k \in \mathbb{N} : \rho([k]_d) > \rho'([k]_d) \ \text{and}\ \rho([\tilde k]_d) = \rho'([\tilde k]_d)\ \ \forall \tilde k < k \qquad (18)$$
where ρ = ξ ∘ o (and likewise ρ' for S'). In the case of an antisymmetric pair, the scale containing the greatest positive coordinate is given index +1, and the scale containing the greatest negative coordinate index −1. Fig. 4 shows an example of this construction.
This procedure was first applied to the heptatonic scales; the result is shown in Fig. 5.
172 J. Junod et al.
Diatonic spread (axis of diatonic coordinates from −7 to +7):
S(12,+1): 7 6 4 4 2 2 1
S(11,−1): 7 6 3 3 2 2 1
Fig. 4. Two successive classes, 11 and 12, taken from antisymmetric pairs (indices −1 and +1), are compared by testing the spread of their diatonic distributions.
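Relation (18) is a lexicographic order. The map o is not defined in this section. This minimal sketch therefore assumes, as the rows of Fig. 4 suggest, that ρ lists the absolute diatonic coordinates of a scale in decreasing order. The function names are hypothetical. Python compares tuples lexicographically, which is exactly the rule of (18): the first difference decides.

```python
def spread_key(coordinates):
    """Hypothetical rho: absolute diatonic coordinates sorted in decreasing order."""
    return tuple(sorted((abs(x) for x in coordinates), reverse=True))

def more_altered(rho_s, rho_t):
    """Relation (18): S > S' iff the first differing entry of rho is larger for S."""
    return rho_s > rho_t  # tuple comparison is lexicographic

S12 = (7, 6, 4, 4, 2, 2, 1)  # S(12,+1), read off Fig. 4
S11 = (7, 6, 3, 3, 2, 2, 1)  # S(11,-1), read off Fig. 4
print(more_altered(S12, S11))  # True: the rows agree on 7 and 6, then 4 > 3

# The centred diatonic scale has the smallest spread seven distinct coordinates allow.
print(spread_key([0, 2, -3, -1, 1, 3, -2]))  # (3, 3, 2, 2, 1, 1, 0)
```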
[Fig. 5 grid: 38 columns, one per dihedral class (1 to 38), and 31 rows, one per diatonic coordinate from +15 to −15 (mirrored in a second label column from −15 to +15), in steps of fifths: A##, D##, G##, C##, F##, B#, E#, A#, D#, G#, C#, F#, B, E, A, D (0), G, C, F, Bb, Eb, Ab, Db, Gb, Cb, Fb, Bbb, Ebb, Abb, Dbb, Gbb.]
Fig. 5. © 2006 Pierre Audétat. His original diatonic bell for heptatonic scales, as proposed in [2]. Each cell represents a note and the mode corresponding to it. Each column contains a dihedral class, consisting either of a single symmetric scale or of a pair of inverse scales. Alterations increase from the diatonic scale on the left to the maximally altered chromatic scale on the right. Each row represents a diatonic coordinate. The origin of the vertical axis is D, and units are steps of fifths. Black cells are symmetric notes and gray cells antisymmetric notes; the bullet distinguishes the negative scale from the positive one.
The online catalogue offers many musical examples of the same melody transformed into each of the 462 heptatonic modes. There are two ways to transform pitches so that they keep their role when passing from the diatonic scale to the target scale, depending on the presence or absence of notes foreign to the diatonic scale (black keys). In the first case, only the scale (along with its complement) can be mapped, and information about a possible mode is lost. In the second case, it is possible to play with modes, and even to transpose a melody from one mode to another within the same scale.
In the diatonic scale, we can identify every pitch class [x0]c with a specific step [k]d of a given mode $m_{S_0}^{([n_0]_c)}$ of the reference scale S0, and then map it to the same step of a given mode $m_{S'}^{([l]_c)}$ in the target scale S'.
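The step-wise mapping can be sketched as follows. The sketch assumes that pitches are integers (MIDI-like, with pitch class taken modulo 12 and D = 0) and that modes are lists of pitch classes ordered by step. The function name is hypothetical. Each pitch is replaced by the pitch on the same step of the target mode, in the same octave. Notes foreign to the source mode raise an error, mirroring the restriction discussed above.

```python
def map_to_mode(melody, source_mode, target_mode, c=12):
    """Send every pitch to the same step of the target mode, keeping its octave."""
    if len(source_mode) != len(target_mode):
        raise ValueError("modes must have the same number of steps")
    result = []
    for pitch in melody:
        octave, pc = divmod(pitch, c)
        step = source_mode.index(pc)      # ValueError if pc is foreign to the source mode
        # Simplification: a step whose pitch class wraps past 11/0 would jump an octave.
        result.append(octave * c + target_mode[step])
    return result

DORIAN_ON_D = [0, 2, 3, 5, 7, 9, 10]         # D E F G A B C (reference mode, Fig. 1)
MELODIC_MINOR_ON_D = [0, 2, 3, 5, 7, 9, 11]  # D E F G A B C#

melody = [0, 3, 7, 10, 12]                   # D F A C D'
print(map_to_mode(melody, DORIAN_ON_D, MELODIC_MINOR_ON_D))  # [0, 3, 7, 11, 12]
```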
Computing the [k]c -th DFT coefficient reduces to the vector addition of d unit
vectors pointing to the (possibly multi-) set [k]c · S , as shown by [1].
If the index k is coprime with c, the sum (4) is computed on a shuffled regular c-gon. Otherwise, it is computed on a polygon with fewer vertices (c/gcd(k, c) of them), each possibly populated by more than one pitch class. Such situations are described in [7]. They are called balances, because the DFT coefficients then point to a lack of equilibrium in the pitch class distribution.
If we display all pitch classes that accumulate at a given angle, we get stars with c/k branches, as in Fig. 7. Pitch classes occupying the two ends of a diameter, or the vertices of a regular triangle, cancel each other out: their sum is the null vector and yields a null DFT coefficient. Only pitch classes in "excess", which are not balanced by other ones, contribute to the coefficient.
[Fig. 6 diagram: the twelve chromatic coordinates [0]12 to [11]12 on the unit circle, with real (Re) and imaginary (Im) axes.]
Fig. 6. The embedding of the diatonic scale S1,0 in the unit circle S^1 of the complex plane ℂ. A unit vector $e^{-i 2\pi x / c}$ points to each chromatic coordinate [x]c.
[Fig. 7 panels: the balances $\mathcal{F}_{S_{1,0}}([2]_c)$, $\mathcal{F}_{S_{1,0}}([3]_c)$, $\mathcal{F}_{S_{1,0}}([4]_c)$ and $\mathcal{F}_{S_{1,0}}([6]_c)$.]
Fig. 7. The four DFT balances of the diatonic scale S1,0. The arrow represents the [k]12-th coefficient, in this case a unit vector always pointing to a single unbalanced pitch class.
But a triple cancellation (one triangle together with two diameters) is possible in the hexagonal [2]12-th balance. It is achieved by the melodic minor, S2,0, as well as by S12,±1 and S29,0.
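The balances can be checked numerically. This minimal sketch assumes the DFT convention of Fig. 6 and D = 0, and its helper names are illustrative. The `balance` function counts how many pitch classes land on each vertex of the star. The sketch confirms that the diatonic coefficients for k = 2, 3, 4, 6 all have modulus 1 (Fig. 7). It then enumerates all heptatonic scales whose [2]12-th coefficient vanishes. It finds four transposition classes, consistent with the three dihedral classes named above, S12,±1 being an inverse pair.

```python
import cmath
from collections import Counter
from itertools import combinations

C = 12

def dft(scale, k, c=C):
    return sum(cmath.exp(-2j * cmath.pi * k * x / c) for x in scale)

def balance(scale, k, c=C):
    """Number of pitch classes accumulating on each vertex k*x mod c of the star."""
    return Counter((k * x) % c for x in scale)

def transposition_class(scale, c=C):
    return min(tuple(sorted((x - t) % c for x in scale)) for t in range(c))

DIATONIC = (0, 2, 3, 5, 7, 9, 10)
MELODIC_MINOR = (0, 2, 3, 5, 7, 9, 11)

print(balance(DIATONIC, 2))  # vertex 6 (B and F) is occupied twice
print([round(abs(dft(DIATONIC, k)), 6) for k in (2, 3, 4, 6)])  # [1.0, 1.0, 1.0, 1.0]

# Transposition classes whose [2]_12-th coefficient vanishes ("triple cancellation").
zero2 = {transposition_class(s) for s in combinations(range(C), 7)
         if abs(dft(s, 2)) < 1e-9}
print(len(zero2), transposition_class(MELODIC_MINOR) in zero2)  # 4 True
```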
The coefficients of the DFT show a particularly nice behaviour for two op-
erations common in music. Both preserve the dihedral class numbering of the
diatonic bell.
1. Inversion. It is connected to the scale's symmetry. Under inversion x ↦ −x, the real part of the DFT coefficients is an even function and the imaginary part an odd one. For a palindromic (symmetric) scale, the imaginary part must hence vanish. Corresponding phases of asymmetric pairs have opposite signs.
2. Complementation. Moving from a pentatonic scale S to the heptatonic scale S^c preserves the DFT moduli and shifts by π the phases of all non-null coefficients with [k]c ≠ [0]c. This follows from the linearity of the DFT,
$$c \cdot \delta_{k,0} = \mathcal{F}_{C_c}([k]_c) = \mathcal{F}_S([k]_c) + \mathcal{F}_{S^c}([k]_c) \quad \forall [k]_c \in C_c, \qquad (21)$$
so that
$$\mathcal{F}_{S^c}([k]_c) = -\mathcal{F}_S([k]_c) \quad \text{for } [k]_c \neq [0]_c. \qquad (22)$$
Also notice that the indicator function of a scale is a real function, so its DFT is Hermitian-symmetric, $\mathcal{F}_S([-k]_c) = \overline{\mathcal{F}_S([k]_c)}$: there are only ⌊c/2⌋ + 1 independent coefficients.
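These identities are easy to confirm numerically. The sketch assumes the DFT convention of Fig. 6, and the test scale S is arbitrary, chosen only for illustration. For every k ≠ 0 it checks four things:
- the complement relation (22);
- that inversion x ↦ −x conjugates the coefficients;
- the Hermitian symmetry $\mathcal{F}_S([c-k]_c) = \overline{\mathcal{F}_S([k]_c)}$;
- that the palindromic diatonic scale has purely real coefficients.

```python
import cmath

C = 12

def dft(scale, k, c=C):
    return sum(cmath.exp(-2j * cmath.pi * k * x / c) for x in scale)

def close(z, w, tol=1e-9):
    return abs(z - w) < tol

S = (0, 1, 3, 4, 6, 8, 9)                      # an arbitrary heptatonic scale
complement = tuple(x for x in range(C) if x not in S)
inverse = tuple((-x) % C for x in S)
DIATONIC = (0, 2, 3, 5, 7, 9, 10)              # palindromic around D = 0

assert close(dft(range(C), 0), C)              # (21) at k = 0: the full chromatic sum is c
for k in range(1, C):
    assert close(dft(complement, k), -dft(S, k))          # (22): opposite coefficients
    assert close(dft(inverse, k), dft(S, k).conjugate())  # inversion = conjugation
    assert close(dft(S, C - k), dft(S, k).conjugate())    # Hermitian symmetry
    assert abs(dft(DIATONIC, k).imag) < 1e-9              # palindromic => real
print("independent coefficients:", C // 2 + 1)            # 7 for c = 12
```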
Having restated these general principles, we now turn to the interpretation of particular phases and moduli. We set the coefficient $\mathcal{F}_S([0]_c)$ aside: it always points in the positive real direction (null phase), and its length is simply the (already known) cardinality of the scale.
3.1 Phases
The coefficient $\mathcal{F}_S([c/2]_c)$ tells whether there is an excess of even or of odd pitch classes. In the first case the phase is null; in the second case the coefficient points to the negative part of the real axis, and the phase is ±π.
For all other coefficients, the phase indicates the direction of the resulting unbalanced excess. Fig. 7 shows how class B ([9]12) is unbalanced for the second coefficient: the tritone B–F populates one corner of the hexagon twice. The coefficient thus points to −1 and its phase equals ±π.
Since the coefficients of palindromic scales are real, their phases are either 0 (positive) or ±π (negative). For asymmetric pairs, the corresponding phases are opposite.
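The parity reading of $\mathcal{F}_S([c/2]_c)$ can be checked directly. With c = 12, every term of that coefficient is +1 for an even pitch class and −1 for an odd one. A sketch assuming D = 0 and the DFT convention of Fig. 6; the unitonic scale is assumed here to be the odd whole-tone scale plus D.

```python
import cmath

C = 12

def dft(scale, k, c=C):
    return sum(cmath.exp(-2j * cmath.pi * k * x / c) for x in scale)

DIATONIC = (0, 2, 3, 5, 7, 9, 10)
UNITONIC = (0, 1, 3, 5, 7, 9, 11)   # odd whole-tone scale plus D (assumed centring)

for name, s in [("diatonic", DIATONIC), ("unitonic", UNITONIC)]:
    f6 = dft(s, C // 2)
    evens = sum(1 for x in s if x % 2 == 0)
    # F_S([c/2]_c) = (#even) - (#odd); a negative value means phase +-pi.
    assert abs(f6 - (evens - (len(s) - evens))) < 1e-9
    print(name, round(f6.real), round(abs(cmath.phase(f6)), 4))  # -1 / -5, phase 3.1416

# Fig. 7: the tritone B-F doubles one hexagon corner, so F([2]_12) = -1.
f2 = dft(DIATONIC, 2)
print(round(f2.real, 6), round(abs(f2.imag), 6))  # -1.0 0.0
```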
3.2 Moduli
We measure three different aspects of the geometric configuration of scales with the help of DFT moduli. All of them deal with the idea of a uniform distribution of pitch classes across the chromatic circle. Since the integers d and c are coprime, no perfectly regular d-gon can be found in the chromatic circle; for such a polygon the three criteria would coincide.
Symmetry. The first coefficient of the DFT is the sum of the unit vectors pointing to each of the pitch classes.
$$\sigma : \mathcal{S}^d_c \to \mathbb{R}, \quad S \longmapsto \left|\mathcal{F}_S([1]_c)\right| \qquad (23)$$
A lower index indicates a higher degree of symmetry, the perfect case being achieved when the sum is null (all vectors cancel out). In c = 12, only the double harmonic scale (S5,0) shows a perfect balance (Fig. 8). The chromatic scale (S38,0), compactly grouped on one side of the circle, obtains the worst score.
[Fig. 8 diagrams: vector sums on the unit circle for the double harmonic scale (σ = 0.0) and the chromatic scale (σ = 3.8).]
Fig. 8. Vector addition and symmetry index σ. The perfectly symmetrical double harmonic scale is built from an augmented triad [0]12, [4]12, [8]12, which forms a regular triangle, and two tritone pairs [1]12, [7]12 and [5]12, [11]12. In the least symmetrical chromatic scale, only the tritone [3]12, [9]12 is neutralised, leaving five unbalanced pitch classes.
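The symmetry index (23) can be evaluated for all 66 heptatonic scales at once. A sketch assuming D = 0 and the DFT convention of Fig. 6, with illustrative helper names. Exactly one transposition class, the double harmonic's, has σ = 0. Evaluated exactly, the chromatic value is 1 + 2cos 30° + 2cos 60° = 2 + √3 ≈ 3.73. No heptatonic scale exceeds it.

```python
import cmath
from itertools import combinations

C, D = 12, 7

def dft(scale, k, c=C):
    return sum(cmath.exp(-2j * cmath.pi * k * x / c) for x in scale)

def transposition_class(scale, c=C):
    return min(tuple(sorted((x - t) % c for x in scale)) for t in range(c))

def sigma(scale):
    """Symmetry index (23): modulus of the first DFT coefficient."""
    return abs(dft(scale, 1))

scales = {transposition_class(s) for s in combinations(range(C), D)}
perfect = [s for s in scales if sigma(s) < 1e-9]

DOUBLE_HARMONIC = (0, 1, 4, 5, 7, 8, 11)  # augmented triad plus two tritones (Fig. 8)
CHROMATIC = (9, 10, 11, 0, 1, 2, 3)       # seven semitones centred on D = 0

print(perfect == [transposition_class(DOUBLE_HARMONIC)])                 # True
print(round(max(sigma(s) for s in scales), 2), round(sigma(CHROMATIC), 2))  # 3.73 3.73
```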
3.3 Periodicity
It is well known that the DFT measures periodicity. The higher the modulus of the [k]c-th coefficient, where k | c, the more (c/k)-periodic the pitch class distribution. We define an index measuring the periodicity of a scale as follows:
$$\pi : \mathcal{S}^d_c \to \mathbb{R}, \quad S \longmapsto \max_{k \mid c,\ k < c} \left|\mathcal{F}_S([k]_c)\right| \qquad (24)$$
A higher index indicates higher periodicity. It is maximal for the unitonic scale (S4,0); see Fig. 9.
[Fig. 9 diagram: the [6]12-th balance of the unitonic scale, with the odd pitch classes [11]12 to [1]12 on one side and the even pitch classes [0]12 to [10]12 on the other; π = 5.]
Fig. 9. Periodicity π and the [6]12-th balance. The unitonic scale contains all odd pitch classes, which form one of the two whole-tone scales, whose periodicity is 12/6 = 2.
This produces an excess of 5 odd pitch classes, the maximum reachable in c = 12. The diatonic scale, whose maximal evenness ensures that no balance shows an excess greater than 1, obtains the worst score; see Fig. 7.
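The periodicity index (24) can be computed for all heptatonic scales in the same way. The sketch assumes that k ranges over the divisors of c other than c itself, since [0]c is set aside. It also assumes the unitonic scale is the odd whole-tone scale plus D. The unitonic scale is the unique maximum, with π = 5. The diatonic scale reaches the minimum, 1. No heptatonic scale can go below 1, because $\mathcal{F}_S([6]_{12})$ = #even − #odd is an odd integer.

```python
import cmath
from itertools import combinations

C, D = 12, 7

def dft(scale, k, c=C):
    return sum(cmath.exp(-2j * cmath.pi * k * x / c) for x in scale)

def transposition_class(scale, c=C):
    return min(tuple(sorted((x - t) % c for x in scale)) for t in range(c))

def periodicity(scale, c=C):
    """Index (24), assuming k runs over the divisors of c except c itself."""
    return max(abs(dft(scale, k, c)) for k in range(1, c) if c % k == 0)

DIATONIC = (0, 2, 3, 5, 7, 9, 10)
UNITONIC = (0, 1, 3, 5, 7, 9, 11)   # odd whole-tone scale plus D (assumed centring)

scales = {transposition_class(s) for s in combinations(range(C), D)}
best = max(scales, key=periodicity)
print(round(periodicity(UNITONIC), 2), round(periodicity(DIATONIC), 2))  # 5.0 1.0
print(best == transposition_class(UNITONIC))                              # True
print(round(min(periodicity(s) for s in scales), 2))                      # 1.0
```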
[Fig. 10 plots: three panels, Symmetry (0 to 4), Periodicity (0 to 5) and Chord Quality (0 to 4), against the dihedral classes 1 to 38.]
Fig. 10. Comparison of the three modulus-based DFT indices σ, π and ε with the diatonic bell's linear ordering of dihedral classes. Heptatonic scales are numbered from 1 on the left, the diatonic scale S1,0, to 38 on the right, the chromatic scale S38,0.
4 Conclusion
The diatonic bell and the DFT differ in their structure. Whereas the underlying
space of the former is infinite, the usual definition of a DFT requires finiteness.
Nevertheless, both are constructed on an analogous principle: the balance. The
idea of a physical balance lies behind the process of centring scales in the diatonic
bell, and this image also helps for interpreting Fourier coefficients.
By lifting scales from the chromatic circle to the spiral of fifths, the diatonic bell adds a tonal structure to the atonal combinatorics of musical set theory. Although the DFT is defined on the chromatic circle and is, in this sense, purely atonal, it shares some elements with the diatonic bell, namely the relevance of symmetry and the ability to pinpoint the diatonic flavour of some scales.
4.1 Symmetry
The role that pitch class D plays as a symmetry axis in both the chromatic and diatonic worlds is clearly shown. This remarkable fact was already noticed by the French music theorist and composer Camille Durutte in his treatise of 1855 [5], where he described pitch classes with 31 integers, ranging from −15 to +15, centred around D = 0, and ordered by fifths. The diatonic bell's horizontal axis thus already appeared in the first historical attempt to formalise pitch classes algebraically.
The symmetry axis is also essential for the DFT, since it lies on the real axis
of the complex plane. Inversion then corresponds to complex conjugation.
Using the centred and compact representatives of the diatonic bell has two advantages. Comparisons between transpositional classes become meaningful, and the phases of the DFT coefficients become easier to interpret, since centring eliminates many uninformative phase components that an arbitrary rotation would induce. The most striking fact is that the coefficients of palindromic scales are purely real.
The diatonic bell displays scales as deformations of the diatonic scale and arranges them according to their degree of compactness on the spiral of fifths, ranging over all dihedral classes from the diatonic to the chromatic. Our initial intention was to use this linear ordering to define a measure of a scale's diatonic or chromatic character. Since the former is the maximally even scale and the latter the minimally even one, we expected to observe the same trend in DFT coefficients measuring the regularity of geometric configurations. As shown in Fig. 10, the DFT-based indices did not confirm the bell's ordering. The chromatic or diatonic character of a scale does not reduce to a one-dimensional question, at least not in this way.
On the other hand, almost all measures succeed in isolating the diatonic and chromatic scales as poles. What happens in between is less clear, but both approaches converge in distinguishing a group of particular scales: the first six scales on the left side of the diatonic bell. They correspond exactly to those used in the Western tradition: diatonic (S1,0), melodic minor (S2,0), harmonic major (S3,−1) and minor (S3,+1), unitonic (S4,0) and double harmonic (S5,0).
One reason may be that these scales are the most compact, so that the tonic pitch class D is surrounded by its dominant A and subdominant G, a feature essential for tonal music. Note that only three other scales show a similar behaviour: S22,±1 and S28,0. The optima of the geometrical measurements defined with the help of DFT moduli in Sec. 3 systematically exhibit scales from this same harmonic block.
– Diatonic is the most even: ε(S1,0) = 3.73.
– Melodic minor is one of the three balanced scales with regard to tritone periodicity: $\mathcal{F}_{S_{2,0}}([2]_{12}) = 0$.
– Unitonic is the most periodic: π(S4,0) = 5.00.
– Double harmonic is the most symmetric: σ(S5,0) = 0.00.
In this respect, the diatonic bell's requirement for compactness seems to agree with the DFT's requirements for regularity. This follows from the property of the diatonic scale of being generated by a succession of fifths, a sequence that is not degraded too much in the first scales of the bell. The convergence between musical practice and mathematical interest personified by the diatonic scale seems to extend to its neighbouring scales as well.
We are currently implementing these two approaches (the diatonic bell and the DFT) in the OpenMusic visual programming language, as a package included in the MathTools environment. These new tools will allow the user to automatically generate diatonic bells and musical transpositions for the heptatonic and pentatonic scales. Divisions of the octave other than c = 12 will also be handled, as long as the requirements on the parameters are fulfilled (Sec. 2.1). One should simply beware of the exponential growth of the diatonic bell in a microtonal context.
References
1. Amiot, E.: David Lewin and maximally even sets. Journal of Mathematics and Music 1(3), 157–172 (2007)
2. Audétat, P.: La cloche diatonique. Jazz Department, University of Applied Sciences of Western Switzerland (2006)
3. Carey, N., Clampitt, D.: Aspects of well-formed scales. Music Theory Spec-
trum 11(2), 187–206 (1989)
4. Clough, J., Douthett, J.: Maximally even sets. Journal of Music Theory 35, 93–173
(1991)
5. Durutte, C.: Esthétique musicale. Technie, ou lois générales du système harmonique.
Mallet-Bachelier, Paris (1855)
6. Knuth, D.E.: Generating All Combinations and Partitions. The Art of Computer
Programming, vol. 4(fascicle 3). Addison-Wesley, Reading (2005)
7. Quinn, I.: A Unified Theory of Chord Quality in Equal Temperaments. PhD thesis,
Eastman School of Music, University of Rochester (2004)
The Geometry of Melodic, Harmonic, and
Metrical Hierarchy
Jason Yust
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 180–192, 2009.
© Springer-Verlag Berlin Heidelberg 2009
The Geometry of Melodic, Harmonic, and Metrical Hierarchy 181
Fig. 2–3. Plane trees with musical notes or motions between notes as objects
1998, Marsden 2005, Lerdahl and Jackendoff 1983, Rahn 1979, Smoliar 1980). This type of hierarchy involves an arbitrary assignment of either of the consonant tones as the parent of the dissonant one, obfuscating the symmetry of the musical figure. Larson 1997 criticizes this aspect of the theory of Lerdahl and Jackendoff 1983, and Lerdahl's response (1997) is dismissive and inadequate. Proctor and Riggins 1988 and Yust 2006 have pointed out that such "reductionist" theories inaccurately represent Schenker's own analytical practice. In more recent computational applications that take this formal model as the standard (Marsden 2005 and 2007), the requirement for multiply embedded arbitrary decisions results in a combinatorial explosion in the number of acceptable analyses of short musical phrases. Yust 2006 proposes a solution to the interpretive awkwardness of hierarchies with notes as objects: bracketing intervals between successive notes, rather than the notes themselves, as shown in Figure 3.
Meter presents a precisely analogous situation: it is not a bracketing on timepoints
but on timespans (intervals between timepoints). In the rhythm of Figure 1, a whole-
note duration divides into two half-note durations; the weak-beat timepoint does not
group with either the preceding or following strong beat by virtue of the meter.
Unlike models of tonal hierarchy proposed by music theorists, models of metrical
hierarchy typically take intervals between events (durations) as their objects rather
than the events themselves (timepoints). Presumably the different treatment of
parameters comes from musical notation, which reifies durations and pitches rather
than timepoints and intervals. As a result, studies that discuss relationships and
conflict between meter and tonal structure (Komar 1971; Lerdahl and Jackendoff
1983; Rahn 1979; Schachter 1976, 1980, 1987; Yeston 1976) present models in which
time and pitch are treated in fundamentally different ways. Construing tonal
hierarchy in terms of intervals rather than notes corrects this situation, so that
rhythmic and tonal patterns can be compared directly in terms of hierarchic structure.
182 J. Yust
Fig. 4. The bijection between binary plane trees and triangulations of a polygon as
representations of musical hierarchy
Property (3) provides a useful heuristic for present purposes. Although non-binary
relationships can exist in melodic, harmonic, and rhythmic hierarchies, binary
structures are the most important and demonstrate the essential properties of hierarchy
in these modalities. Therefore I will focus here on the binary plane tree, in which all
nodes that are not leaves (called internal nodes) have exactly two children, a left
child and a right child. The application of the Stasheff polytope is more general, however, describing the relationships between all possible plane trees (see the end of section 3 below).
Though the objects of the musical hierarchies explored here are intervals rather than more concrete musical objects, networks convey their musical implications more clearly when the nodes correspond to concrete objects. The bijective equivalence of binary plane trees and triangulations of polygons makes it possible to create a network on musical objects that gives a hierarchy on directed intervals between them. Figure 4 illustrates this bijection, which replaces the nodes of the tree with edges, and shows how to redraw the polygon so that the left-to-right ordering of the vertices is clear. Note that internal edges in the triangulation (which correspond to internal nodes in the tree) can be described as upward- or downward-slanting.
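The bijection of Fig. 4 can be made concrete. A minimal sketch with a hypothetical encoding:
- a binary plane tree is a nested pair, with `None` marking a leaf;
- the polygon's vertices 0..n stand for the n + 1 musical objects, so each leaf is the interval between two successive objects.

Each internal node becomes a triangle, and each non-root internal node becomes an internal edge (diagonal). The root becomes the edge joining the first and last objects.

```python
LEAF = None  # an interval between two successive musical objects

def n_leaves(tree):
    return 1 if tree is LEAF else n_leaves(tree[0]) + n_leaves(tree[1])

def triangulation(tree, start=0):
    """Triangles (left, apex, right) of the polygon on vertices start..start+n_leaves."""
    if tree is LEAF:
        return []                      # a leaf is the boundary edge (start, start + 1)
    left, right = tree
    middle = start + n_leaves(left)    # vertex where the two children meet
    end = middle + n_leaves(right)
    return ([(start, middle, end)]
            + triangulation(left, start)
            + triangulation(right, middle))

def diagonals(tree):
    """Internal edges of the triangulation (one per non-root internal node)."""
    n = n_leaves(tree)
    edges = set()
    for a, b, c in triangulation(tree):
        edges |= {(a, b), (b, c), (a, c)}
    return sorted(e for e in edges if e[1] - e[0] > 1 and e != (0, n))

balanced = ((LEAF, LEAF), (LEAF, LEAF))
print(triangulation(balanced))  # [(0, 2, 4), (0, 1, 2), (2, 3, 4)]
print(diagonals(balanced))      # [(0, 2), (2, 4)]
```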
Schenker (1987) had the fruitful insight of taking the passing-tone figure of second
species counterpoint as a paradigm for hierarchical relationships in music. However,
the passing-tone figure can only serve as a paradigm of tonal hierarchy if it is
somehow generalized, because literal passing figures alone do not embed one another
recursively. Schenker (1979, 1997) addressed this by harmonizing the dissonant
passing note, as in the Ursatz of Figure 5, and horizontalizing the resulting vertical
consonance so that he could then fill it with new passing motion, as shown in Figure
6. The process implies a tonal hierarchy, also shown in Figure 6, that generalizes the
passing-tone figure as a symmetrical binary division of step-class intervals. That is,
the passing tone divides the step-class interval of a third into two equal step-class
intervals (seconds) and similarly, the second divides into two fifths, the harmonization
of the passing tone, the fifth divides into two thirds which complete the triad that
supports the passing tone, and new passing motion then fills the horizontalized thirds.
(This process of division is an order-3 automorphism of the group of step-class intervals, a cyclic group of order seven; see Yust 2007.)
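The cycle of divisions described above can be checked with modular arithmetic. The sketch assumes step-class intervals are encoded as integers modulo 7 (unison = 0, second = 1, ..., seventh = 6). Dividing an interval into two equal halves then means multiplying by the inverse of 2 modulo 7, which is 4. Multiplication by a unit is an automorphism of the cyclic group, and applying it three times gives the identity, hence order 3. `pow` with exponent −1 needs Python 3.8 or later.

```python
STEPS = 7
NAMES = {0: "unison", 1: "second", 2: "third", 3: "fourth",
         4: "fifth", 5: "sixth", 6: "seventh"}

HALF = pow(2, -1, STEPS)  # 4, since 2 * 4 = 8 = 1 (mod 7)

def divide(interval):
    """Split a step-class interval into two equal step-class halves (mod 7)."""
    return (HALF * interval) % STEPS

chain = [2]               # start from a third
for _ in range(3):
    chain.append(divide(chain[-1]))
print(" -> ".join(NAMES[i] for i in chain))  # third -> second -> fifth -> third

# Applying the division three times is the identity on every step class.
assert all(divide(divide(divide(i))) == i for i in range(STEPS))
```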
I will call this type of hierarchy harmonic structure. Because its building blocks
include triads and series of fifths, it is useful for describing harmonic patterns in tonal
music, particularly sequential harmonic patterns. Its main shortcoming, from the
perspective of music analysis, is also one of its assets: it cuts across compositional
voices and registers, collapsing tonal content into a single hierarchy that can only
show one melodic progression at a time. Therefore, a description that recognizes
counterpoint and the continuity of voices requires a different form of hierarchy on
tones, a purely melodic type of hierarchy, to complement this one.
A melodic hierarchy is one on tones strictly ordered in time and belonging to one
voice. One particularly useful melodic hierarchy emerges from the principles, first,
that the line it represents should be conceived as constantly in motion at every level
(so that the line can include no repeated tones or neighboring motions); and second,
n-dimensional space, the vertices of the polytope correspond to binary trees with n + 2
leaves, or triangulations of an (n + 3)-gon.
For example, let us construct the two-dimensional Stasheff polytope, whose points
correspond to the binary plane trees with four leaves or the triangulations of a
pentagon. Figure 8 shows the possible hierarchies on series of five musical objects,
both as trees and as triangulations. We define a point in three-dimensional space for
each triangulation where the coordinates of the point correspond to the vertices of the
triangulation from left to right (which in turn correspond to musical events in the
sequence) excluding the first and last (uppermost) vertices. The value for each
coordinate is given by taking the highest edges to the left and right of that point,
counting the number of boundary edges below them, and multiplying the two
numbers. Figure 8 shows the resulting coordinates for triangulations of the pentagon.
The Stasheff polytope is the convex hull of these points. The dimension of the polytope is one less than the number of coordinates, because the points are constrained to lie on the hyperplane ∑xi = n(n + 1)/2, where n is the number of coordinates. In musical examples, the numbers define relative weights (e.g., metrical accent) for the corresponding objects.
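The coordinate rule can be sketched on trees rather than triangulations. The sketch assumes the tree form of the rule used in Loday's realization. Under the bijection of Fig. 4, the boundary edges below the highest edge on either side of an interior vertex are the leaves of the left and right subtrees of the corresponding node. Each coordinate is therefore a product of two leaf counts. Names and encoding (nested pairs, `None` for a leaf) are hypothetical.

```python
LEAF = None

def trees(n):
    """All binary plane trees with n leaves."""
    if n == 1:
        return [LEAF]
    return [(l, r) for k in range(1, n) for l in trees(k) for r in trees(n - k)]

def n_leaves(tree):
    return 1 if tree is LEAF else n_leaves(tree[0]) + n_leaves(tree[1])

def coordinates(tree):
    """One coordinate per internal node, left to right: (#leaves left) * (#leaves right)."""
    if tree is LEAF:
        return ()
    left, right = tree
    return coordinates(left) + (n_leaves(left) * n_leaves(right),) + coordinates(right)

points = sorted(coordinates(t) for t in trees(4))
print(points)  # [(1, 2, 3), (1, 4, 1), (2, 1, 3), (3, 1, 2), (3, 2, 1)]
assert all(sum(p) == 3 * 4 // 2 for p in points)  # sum x_i = n(n+1)/2 with n = 3
```

The five points are the vertices of the pentagon, including (1 4 1) for the balanced hierarchy.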
The edges in the geometry represent elementary "flip" operations that move exactly one internal edge in the triangulation. A leftward flip replaces an upward-slanted edge with a downward-slanted one; a rightward flip does the opposite. Figure 9 gives the 2-dimensional polytope, with arrows pointing in the direction of leftward flips; as Loday (2007) observes, they establish an important partial ordering on the hierarchies.
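On trees, a flip is the familiar rotation of a binary tree. A sketch using the same hypothetical nested-pair encoding, with `None` for a leaf. It lists every tree one flip away from a given tree. Each vertex of the pentagon (4 leaves) has two neighbours, and each vertex of the three-dimensional polytope (5 leaves) has three. The sketch does not label flips as leftward or rightward.

```python
LEAF = None

def trees(n):
    if n == 1:
        return [LEAF]
    return [(l, r) for k in range(1, n) for l in trees(k) for r in trees(n - k)]

def flips(tree):
    """All trees reachable by one rotation, i.e. by moving exactly one internal edge."""
    if tree is LEAF:
        return []
    left, right = tree
    result = []
    if left is not LEAF:                 # ((a, b), c) -> (a, (b, c))
        a, b = left
        result.append((a, (b, right)))
    if right is not LEAF:                # (a, (b, c)) -> ((a, b), c)
        b, c = right
        result.append(((left, b), c))
    result += [(l, right) for l in flips(left)]   # flips inside the left subtree
    result += [(left, r) for r in flips(right)]   # flips inside the right subtree
    return result

for leaves in (4, 5):
    print(leaves, {len(flips(t)) for t in trees(leaves)})  # 4 {2}, then 5 {3}
```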
It is possible to define operations that relate rhythmic and melodic sequences
whose hierarchies share an edge in the polytope (i.e., are related by a flip). Figure 10
presents a series of melodies that traverse the two-dimensional polytope starting at (1 4 1) and moving counterclockwise around the perimeter until arriving back at (1 4 1) with simultaneous rhythmic and tonal operations. Figure 11 shows the
Stasheff polytope in three dimensions. Figure 12 gives an (arbitrary) sample traversal
that circles the three-dimensional polytope through a series of flips in coordinated
rhythmic-melodic structures.
The Stasheff polytope structures the relationships not only between binary
hierarchies, but between any hierarchy representable by a plane tree. All types of
structure described here admit of mixed binary-ternary hierarchies, which are
necessary for dealing with, e.g., triple and compound meter. The cells of the polytope
correspond to the plane trees with n + 1 leaves, with higher-dimensional cells
corresponding to trees with fewer internal nodes. The binary trees, with a maximum
of internal nodes, correspond to the 0-cells (points) of the polytope. Mixed
binary/ternary trees correspond to higher dimensional cells (edges, faces, etc.)
depending on the number of ternary branchings they include.
background of the theme is a descending line G–F–Eb. Schenker gradually adds new
elements in coordinated rhythmic and melodic structure, and then transforms rhythmic
structure to misalign the two hierarchies. Thus, rhythm helps to clarify the structure of
the melody at each introduction of a new element, while progressive transformations
bring it gradually in line with the actual rhythm of the fugue theme. The end result is
a combination of hierarchies far apart in the five-dimensional Stasheff polytope—a
heavily left-weighted tonal structure against a right-weighted rhythm.
Fig. 14. Rhythmic transformations in Schenker’s analysis of the C minor fugue subject
Melodic structures also relate to harmonic structures through the properties of their
hierarchies. Melodic and harmonic structures, however, relate through the contraction
of their hierarchies, and therefore invoke the relationships between Stasheff polytopes
in different dimensions. The fifths-sequence figure distinguishes harmonic hierarchies
from melodic ones, so the contraction of thirds from below the fifths sequences of a
harmonic hierarchy creates an associated melodic hierarchy. Figure 17 shows this
relationship; notice that the contraction here preserves the stepwise passing motions
of the harmonic structure and introduces some new stepwise relationships, such as the
step from leading tone to tonic. The contracted edges appear as vertical intervals in
the musical notation of Figure 17.
Fig. 18. Two melodies in counterpoint (from the G minor prelude of Bach’s WTC I) as
differing contractions of a harmonic structure
prelude of the WTC I (mm. 19–21). The imitative counterpoint of the two upper
voices consists of two different contractions of the underlying harmonic structure, and
the two voices together express the complete structure of the harmonic sequence.
Each contraction in Figure 18 requires two vertex deletions. Geometrically, a vertex
deletion defines an n-simplex in the n-dimensional Stasheff polytope (consisting of all
the hierarchies reducing to a common structure with the deletion of one foreground
vertex). The intersections of these n-simplexes reflect the corresponding simplicial
structure on the (n – 1)-polytope. Two successive vertex deletions therefore define an (n – 1)-simplex of intersecting n-simplexes. Two melodic structures in counterpoint define
a precise harmonic structure when the uncontracted melodic structures correspond to
two (simplexes of simplexes of . . . ) simplexes in some Stasheff polytope that intersect
deletions of this structure for each melodic voice (G#–F#–E–C#–F# and G#–C#–B–A#–F#) each correspond to 3-simplexes (tetrahedrons) of intersecting 4-simplexes in the 4-dimensional polytope. The two sets of sixteen points encompassed by each of these intersect in one point, which corresponds to the overall harmonic structure of the passage.
5 Conclusion
The geometry of hierarchy is fascinating by virtue of its depth of mathematical structure alone. Indeed, we have only scratched the surface of that mathematical structure here. Given the myriad ways in which music is hierarchically organized, this geometry illuminates it from many angles. It shows the nature of relationships between different ways of hierarchically organizing rhythms or tonal patterns, and the conflict between the simultaneous rhythmic and tonal organization of a melody or of multiple melodies in counterpoint. It also shows the relationships between different types of hierarchical organization that apply to tonal patterns. All of these relationships not only demonstrate the importance of hierarchy in tonal music, but also give the lie to the idea, implicit in Schenkerian analysis, that musical hierarchy is a unitary phenomenon. Indeed, a single hierarchy is too simple an object to represent the complex interactions that bring life to music in the tonal idiom.
References
Cohn, R., Dempster, D.: Hierarchical unity, plural unities: Towards a reconciliation. In:
Bergeson, K., Bohlman, P.V. (eds.) Disciplining Music: Musicology and its Canons.
University of Chicago Press, Chicago (1992)
Komar, A.: Theory of Suspensions: A Study of Metrical and Pitch Relations in Tonal Music.
Princeton University Press, Princeton (1971)
Larson, S.: The problem of prolongation in 'tonal' music: Terminology, perception, and expressive meaning. Journal of Music Theory 41(1), 101–136 (1997)
Lee, C.W.: The associahedron and triangulations of the n-gon. European Journal of Combinatorics 10(6), 551–560 (1989)
Lerdahl, F.: Issues in prolongational theory. Journal of Music Theory 41(1), 141–155 (1997)
Lerdahl, F., Jackendoff, R.: A Generative Theory of Tonal Music. MIT Press, Cambridge
(1983)
Loday, J.-L.: Realization of the Stasheff polytope. Archiv der Mathematik 83, 267–278 (2004)
Loday, J.-L.: Inversion of integral series enumerating binary plane trees. Séminaire
Lotharingien de Combinatoire 53 Article B53d, 1–16 (2005)
Loday, J.-L.: Parking functions and triangulation of the associahedron. Contemporary
Mathematics 431, 327–340 (2007)
Loday, J.-L., Ronco, M.O.: Order structure on the algebra of permutations and of planar binary
trees. Journal of Algebraic Combinatorics 15, 253–270 (2002)
Marsden, A.: Generative structural representation of tonal music. Journal of New Music
Research 34(4), 409–428 (2005)
Marsden, A.: Empirical study of Schenkerian analysis by computer. In: The joint meeting of the
Society for Music Theory and the American Musicological Society, Nashville (2008)
Proctor, G., Lee Riggins, H.: Levels and the reordering of chapters in Schenker’s Free
Composition. Music Theory Spectrum 10, 102–126 (1988)
Rahn, J.: Logic, set theory, music theory. College Music Symposium 19(1), 114–127 (1979)
Schachter, C.: Rhythm and Linear Analysis: A Preliminary Study. In: Mitchell, W., Salzer, F.
(eds.) Music Forum 4 (1976); reprinted in Straus, J. (ed.): Unfoldings: Essays in Schenkerian
Theory and Analysis, pp. 17–53. Oxford University Press, New York (1999)
Schachter, C.: Rhythm and Linear Analysis: Durational Reduction. In: Mitchell, W., Salzer, F.
(eds.) Music Forum 5 (1980); reprinted in Straus, J. (ed.): Unfoldings: Essays in Schenkerian
Theory and Analysis, pp. 54–79. Oxford University Press, New York (1999)
Schachter, C.: Rhythm and Linear Analysis: Aspects of Meter. In: Mitchell, W., Salzer, F.
(eds.) Music Forum 6 (1987); reprinted in Straus, J. (ed.): Unfoldings: Essays in Schenkerian
Theory and Analysis, pp. 79–117. Oxford University Press, New York (1999)
Schenker, H.: Free Composition (2 vols.), translated by Ernst Oster. Longman, New York
(1979)
Schenker, H.: Counterpoint (2 vols.), translated by John Rothgeb and Jürgen Thym. Schirmer
Books, New York (1987)
Schenker, H.: The Masterwork in Music: A Yearbook (3 vols.), edited by William Drabkin.
Cambridge University Press, Cambridge (1997)
Smoliar, S.: A Computer Aid for Schenkerian Analysis. Computer Music Journal 4(2), 41–59
(1980)
Stasheff, J.D.: Homotopy associativity of H-spaces I. Transactions of the American
Mathematical Society 108, 275–292 (1963)
Yeston, M.: The Stratification of Musical Rhythm. Yale University Press, New Haven (1976)
Yust, J.: Formal Models of Prolongation. PhD Diss. University of Washington (2006)
Yust, J.: The Step-Class Automorphism Group in Tonal Analysis. In: Proceedings of the First
International Conference on Mathematics and Computation in Music. Springer, Berlin
(forthcoming 2007)
A Multi-tiered Approach for Analyzing
Expressive Timing in Music Performance
Panayotis Mavromatis
1 Introduction
The study of expressive musical performance has been the subject of experimental
as well as computational research [1,2]. It is generally acknowledged that
expressive timing (a performer’s deviations from an exact temporal rendering
of the score) is an important component of musical expression. By manipulating
timing, a performer is able to communicate musical structure and shape a
listener’s experience of the music. This paper presents a method for analyzing
expressive timing data extracted through audio analysis of recorded performances.
The purpose is to uncover rules which explain a performer’s systematic timing
manipulations in terms of structural features such as form, harmonic progression,
texture, and rhythm.
A fundamental assumption of this analysis is that a performer controls a hier-
archically structured metrical cycle of measure, beat, and subdivision levels [3,4].
At each point in time, the performer’s mental clock fires at a given tempo, which
is evidenced by the cumulative effect of all the levels in the metrical cycle. The
performer’s clock rate as a function of time is represented by a tempo curve.
Identifying this curve forms a natural first tier of analysis.
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 193–204, 2009.
© Springer-Verlag Berlin Heidelberg 2009
194 P. Mavromatis
Once the effect of tempo has been factored out, it is possible to examine
how the performance rendering of subdivisions between adjacent metrical layers
(e.g., the subdivision of a quarter note into two eighth notes) deviates from the
corresponding exact duration ratios (e.g., 0.5 / 0.5). In the subsequent tiers of
analysis, systematic deviations of this type are identified at each level in the
metric hierarchy.
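The second-tier measurement just described can be sketched as follows, assuming three consecutive onset times (in seconds) that bound the two halves of one subdivided beat. The function name, and the choice to scale the ratio so that an exact subdivision reads 1.0 : 1.0 (matching the 1.06 : 0.94 example in the caption of Fig. 3), are our assumptions for illustration, not the author's code.

```python
from typing import Tuple


def subdivision_ratio(t0: float, t_mid: float, t1: float) -> Tuple[float, float]:
    """Performed ratio of a two-part subdivision, scaled so the parts sum to 2.

    t0, t_mid, t1 are the onset times (seconds) of the beat, of the note that
    starts its second half, and of the next beat. An exact 0.5 / 0.5
    subdivision is therefore reported as (1.0, 1.0).
    """
    first = t_mid - t0
    second = t1 - t_mid
    if first <= 0 or second <= 0:
        raise ValueError("onset times must be strictly increasing")
    total = first + second
    return (2 * first / total, 2 * second / total)


# A 0.5 s quarter note whose first eighth lasts 0.265 s:
print(subdivision_ratio(0.0, 0.265, 0.5))  # approximately (1.06, 0.94)
```

Because the beat's own total duration is divided out, the ratio isolates the unevenness of the subdivision from the local tempo.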
As justification for the proposed multi-tiered approach, we first note that
it is supported by informal musical discourse: terms such as ritardando and
accelerando typically refer to the first tier of expressive timing, whereas terms
such as rubato, notes inégales, or “swing” most commonly represent deviations
in the subsequent tiers.
More to the point, it appears that, in principle at least, a skilled performer
can control each tier independently. For instance, a performer may be asked to
manipulate the tempo of a performance while maintaining even metric subdivisions.
Conversely, the performer may be asked to perform at a steady tempo
while producing various types of uneven metric subdivisions. Moreover, these
uneven subdivisions can be executed independently at any particular metric
level (up to a certain depth) while maintaining even timing at higher metrical
levels.
One of the challenges of expressive performance research is to understand
the cognitive mechanisms that underlie expert expressive rendering of a musical
score. In line with current views on cognitive modeling, it is natural to seek
modular rules that specialize in responding to specific features of the musical
structure (such as metric accent) by shaping the expression in specific ways
(such as lengthening the strong half of a subdivision). This modularity requirement
poses challenges for any analytical approach to expressive performance
data, which must identify and isolate the effect of individual rules from their
surrounding context, where other rules may be simultaneously contributing to
expressive deviations. It is in this spirit that the present analysis is offered; it
represents work aiming towards a complete rule system for expressive timing.
We believe that the multi-tiered analytical approach proposed in this paper can
help identify and isolate the right ingredients in this complex, multi-faceted
manifestation of expert musical skill.
general options for calculating the tempo curve from the timing data. This naturally
leads to a non-parametric regression analysis, which does not assume a
specific functional form for the tempo curve.
For the purposes of this study, we found it most flexible to use a non-linear
regression model based on radial basis functions. The technique was first proposed
in [12,13], and is a particular instance of density estimation using Parzen
windows [14]. In its simplest form, the process can be illustrated as follows.
Let $\{x_i : i = 1, \ldots, N\}$ be a set of values for the independent variable $X$, and
let $\{y_i : i = 1, \ldots, N\}$ be the corresponding values of the dependent variable $Y$,
so that $(x_i, y_i)$ are the coordinates of the $i$-th point in the data set. Then the
regression curve $y(x)$ obtained from the above data set is given by

$$
y(x) = \frac{\sum_{i=1}^{N} y_i \exp\left[-(x - x_i)^2 / 2\sigma^2\right]}{\sum_{i=1}^{N} \exp\left[-(x - x_i)^2 / 2\sigma^2\right]}
$$

where $\sigma$ sets the width of the Gaussian window centred on each data point.
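The estimator above is a Gaussian-kernel (Nadaraya-Watson) regression, and it can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names are hypothetical, and the step that subtracts the largest exponent before exponentiating is our addition for numerical stability (it cancels in the ratio).

```python
import math
from typing import List, Sequence


def kernel_regression(x: float, xs: Sequence[float], ys: Sequence[float], sigma: float) -> float:
    """Evaluate the Gaussian-weighted average y(x) of the formula above."""
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Exponents -(x - x_i)^2 / (2 sigma^2). Subtracting the largest one leaves the
    # ratio unchanged but keeps at least one weight equal to 1 (no total underflow).
    exponents = [-((x - xi) ** 2) / (2 * sigma ** 2) for xi in xs]
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    return sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)


def tempo_curve(grid: Sequence[float], xs: Sequence[float], ys: Sequence[float],
                sigma: float) -> List[float]:
    """Sample the regression curve at each position in grid."""
    return [kernel_regression(x, xs, ys, sigma) for x in grid]
```

A larger sigma averages over more neighbouring notes and therefore yields a smoother curve; a smaller one follows local fluctuations more closely.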
Since each note’s inter-onset duration reflects not only the local tempo but
also the note’s nominal duration value (e.g., quarter note, eighth note, etc.),
we must normalize each raw inter-onset duration by dividing it by the
corresponding note’s nominal value, where a whole note equals 1.0, a quarter
note equals 0.25, etc. This way, each normalized inter-onset duration is a
consistent indicator of the local tempo: its value is the whole-note duration
corresponding to the tempo at that specific point in time. Our solution is essentially
equivalent to Widmer’s representation of his timing data using percentage
deviations instead of absolute durations, but has the added advantage that it
keeps track of absolute tempo information, and not just its relation to some
average. The normalized inter-onset durations for each performance were used
as the data presented to the non-linear regression model, in order to obtain that
performance’s tempo curve.
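A minimal sketch of the normalization step, assuming onset times in seconds and nominal values expressed as fractions of a whole note, as in the text. The function name is hypothetical, and the convention of supplying one extra onset to close the final note is our assumption.

```python
from typing import List, Sequence


def normalized_iois(onsets: Sequence[float], nominal: Sequence[float]) -> List[float]:
    """Whole-note duration (seconds) implied by each note's inter-onset interval.

    onsets:  onset times in seconds, one per note plus one final onset that
             closes the last note, so len(onsets) == len(nominal) + 1.
    nominal: notated durations as fractions of a whole note
             (whole = 1.0, quarter = 0.25, eighth = 0.125, ...).
    """
    if len(onsets) != len(nominal) + 1:
        raise ValueError("need exactly one more onset than nominal durations")
    return [(onsets[i + 1] - onsets[i]) / nominal[i] for i in range(len(nominal))]


# A quarter note and two eighths played at a steady 60 quarter notes per minute:
print(normalized_iois([0.0, 1.0, 1.5, 2.0], [0.25, 0.125, 0.125]))  # [4.0, 4.0, 4.0]
```

Every note of different notated length yields the same value when the tempo is steady, which is what makes these values suitable as input to the regression.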
Figure 1 shows the application of the above analysis to a recording of Bach’s
F minor prelude, BWV 881, from the Well-Tempered Clavier, Book 2. The piece
is performed on the harpsichord by an expert, and is recorded on a commercially
available CD. This performance will be used as an illustration throughout the
paper. In Figure 1, the data points corresponding to the normalized inter-onset
durations are shown in grey. The tempo curve derived from the regression is
shown in black.
The performances of three contrasting Bach preludes (BWV 845, 863, 881)
were analyzed, each of them performed by two different harpsichordists. The
most salient factors shaping the tempo curve appear to be
Once the effect of tempo is factored out, one can examine the lengthening or
shortening of individual measures with respect to their neighbors, in response to
specific features of the music. This individual manipulation of measure lengths
is distinct from overall tempo change, and can be represented in a graph such as
that of Fig. 2. Identified variations of this type include lengthening a measure
that
σ is usually in the range of 2–4 seconds (2.0591 seconds for the curve of Fig. 1). It is
an open question whether this value holds some special significance, either
in terms of tempo or the structure of the piece, or even in terms of psychological
properties of time perception and production.
Fig. 1. Tempo curve for a recorded performance of Bach’s F minor prelude, BWV 881 (WTC, Book 2), plotted against normalized
inter-onset durations for each note in the piece. The x-axis represents position in the notated score using measure numbers (e.g., 2.5 is
the middle of the second measure). The y-axis represents local tempo, measured by the duration of a whole note in seconds.
A Multi-tiered Approach for Analyzing Expressive Timing
Fig. 2. Graph showing how individual measure durations deviate from the duration corresponding to performed tempo at each point in
time. The x-axis represents position in the notated score using measure numbers (see Fig. 1). The y-axis represents individual measure
durations in seconds. Therefore, a high spike represents a markedly lengthened measure. The graph comes from the same performance
as that of Fig. 1.
Fig. 3. Graph showing the performed subdivision of each quarter note into two eighth notes. As before, the x-axis represents position in
the notated score using measure numbers (see Fig. 1). The y-axis represents the duration ratio of each subdivision. E.g. the first quarter
note in the Figure is subdivided into two eighth-note time spans of duration ratio 1.06 : 0.94. The measurements are taken from the
same performance as that of Figures 1–2. Figure 3 focuses on mm. 1–23. The complete piece is represented in Fig. 4.
Fig. 4. The measurements presented in Fig. 3, now shown over a larger time scale that encompasses the entire piece. The axes carry the
same meaning as those of Fig. 3. From this graph, one can clearly observe how the deviations from exact subdivisions vary smoothly in
magnitude over time, as witnessed by the smooth envelope to the curve. These smooth variations are shaped by gestures (lumps in the
References
1. Gabrielsson, A.: Music Performance. In: Deutsch, D. (ed.) The Psychology of Music, 2nd edn. Academic Press, San Diego (1999)
2. Widmer, G., Goebl, W.: Computational Models of Expressive Music Performance: The State of the Art. Journal of New Music Research 33, 203–216 (2004)
3. Lerdahl, F., Jackendoff, R.S.: A Generative Theory of Tonal Music. MIT Press, Cambridge (1983)
4. London, J.: Hearing in Time: Psychological Aspects of Musical Meter. Oxford University Press, Oxford (2004)
5. Todd, N.P.M.: A Computational Model of Rubato. Contemporary Music Review 3, 69–88 (1989)
6. Honing, H.: Computational Modeling of Music Cognition: A Case Study on Model Selection. Music Perception 23, 365–376 (2006)
7. Clarke, E.F.: Generative Principles in Music Performance. In: Sloboda, J.A. (ed.) Generative Processes in Music: The Psychology of Performance, Improvisation, and Composition. Clarendon Press, Oxford (1988)
8. Friberg, A.: A Quantitative Rule System for Musical Performance. Doctoral dissertation, Royal Institute of Technology, Stockholm (1995)
9. Widmer, G.: Machine Discoveries: A Few Simple, Robust Local Expression Principles. Journal of New Music Research 31, 37–50 (2002)
10. Widmer, G., Tobudic, A.: Playing Mozart by Analogy: Learning Multi-Level Timing and Dynamics Strategies. Journal of New Music Research 32, 259–268 (2003)
11. Todd, N.P.M.: A Model of Expressive Timing in Tonal Music. Music Perception 3, 33–58 (1985)
12. Specht, D.F.: A General Regression Neural Network. IEEE Transactions on Neural Networks 2, 568–576 (1991)
13. Specht, D.F.: Probabilistic and General Regression Neural Networks. In: Chen, C.H. (ed.) Fuzzy Logic and Neural Network Handbook. McGraw-Hill, New York (1996)
14. Parzen, E.: On Estimation of a Probability Density Function and Mode. Annals of Mathematical Statistics 33, 1065–1076 (1962)
15. Clarke, E.F.: Rhythm and Timing in Music. In: Deutsch, D. (ed.) The Psychology of Music, 2nd edn. Academic Press, San Diego (1999)
16. Farbood, M.M.: A Quantitative, Parametric Model of Musical Tension. Doctoral dissertation, MIT, Cambridge, MA (2006)
HMM Analysis of Musical Structure:
Identification of Latent Variables Through
Topology-Sensitive Model Selection
Panayotis Mavromatis
1 Introduction
Hidden Markov Models have been successfully employed in the exploration and
modeling of musical structure [1,2], with applications in Music Information
Retrieval [3].
Simply put, a Hidden Markov Model is a probabilistic version of a Finite
State Machine (FSM), or formal specification of a finite-state grammar. An FSM
is formally defined by states and transitions, graphically represented by circles
and arrows respectively. An FSM generates a symbolic sequence by traversing a
path of states connected by transitions, following the direction of the arrows.
The generated sequence is the string of output symbols encountered along the path.
An FSM is a simple and flexible way to specify finite-memory constraints on
the symbolic values of variables that characterize musical structure (e.g., pitch,
duration, etc.), and as such offers useful formal characterizations of the structure
of musical sequences.
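The generative reading of an FSM just described can be made concrete with a small Python sketch. The toy machine and all names are hypothetical; we attach output symbols to transitions, which is our assumption (outputs can equally be attached to states), and we pick among available transitions uniformly, where an HMM would instead use learned transition and output probabilities.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy machine: each state maps to its outgoing transitions,
# given as (output symbol, target state). "s0" is both start and end state;
# BEGIN and END label the transitions that leave and re-enter it.
Fsm = Dict[str, List[Tuple[str, str]]]
TOY_FSM: Fsm = {
    "s0": [("BEGIN", "s1")],
    "s1": [("G", "s2"), ("END", "s0")],
    "s2": [("A", "s1")],
}


def generate(fsm: Fsm, rng: random.Random, start: str = "s0", max_steps: int = 1000) -> List[str]:
    """Walk from start, following arrows, until the walk returns to start."""
    state, output = start, []
    for _ in range(max_steps):
        symbol, state = rng.choice(fsm[state])  # any available transition
        if symbol not in ("BEGIN", "END"):
            output.append(symbol)
        if state == start:
            return output
    raise RuntimeError("walk did not return to the start state")


def accepts(fsm: Fsm, symbols: Sequence[str], start: str = "s0") -> bool:
    """Can some path from start back to start emit exactly these symbols?"""
    current = {t for s, t in fsm[start] if s == "BEGIN"}
    for symbol in symbols:
        current = {t for q in current for s, t in fsm.get(q, []) if s == symbol}
        if not current:
            return False
    return any(s == "END" and t == start for q in current for s, t in fsm.get(q, []))


print(accepts(TOY_FSM, ["G", "A", "G", "A"]))  # True
print(accepts(TOY_FSM, ["G", "G"]))            # False
```

The toy machine only ever emits repetitions of "G A": its finite memory (which state it is in) is exactly what rules out "G G".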
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 205–217, 2009.
© Springer-Verlag Berlin Heidelberg 2009
$$
P(M \mid D) = \frac{P(D \mid M)\, P(M)}{P(D)}
$$
since P(D) is constant over models M and therefore does not affect the maximization
problem. P(M) is known as the model prior probability, assigned to the
model on general grounds before the data set is consulted. Likewise, P(M|D) is
known as the model posterior probability, and represents the probability of the
model after the data have been taken into consideration.
Topology identification is achieved through a suitable choice of model prior
P(M), defined as a function of model topology alone, and designed to reward
model simplicity. For a fixed topology, P(M) is fixed, and so maximization of
the model posterior amounts to maximizing the P(D|M) part in eq. (1). This
is achieved through the Baum-Welch (BW) algorithm, which iteratively adjusts the
model parameters to (locally) maximize the probability of the data set using the
Expectation-Maximization principle. Overall, the maximization problem defined by eq. (1)
is a concrete implementation of Occam’s Razor, and achieves an optimal balance
between goodness-of-fit and model simplicity.
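In practice the maximization is carried out on negative logarithms, as in Table 1: maximizing P(D|M)P(M) is the same as minimizing −log P(D|M) − log P(M). A minimal sketch, with hypothetical class names and made-up numbers, assuming each candidate's −log P(D|M) (for example after Baum-Welch training) and −log P(M) have already been computed:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Candidate:
    name: str
    neg_log_likelihood: float  # -log P(D|M), e.g. after Baum-Welch training
    neg_log_prior: float       # -log P(M) from a topology prior

    @property
    def neg_log_posterior(self) -> float:
        # Equals -log P(M|D) up to the constant log P(D), which is the same
        # for every model and so cannot change which model wins.
        return self.neg_log_likelihood + self.neg_log_prior


def select_model(candidates: Iterable[Candidate]) -> Candidate:
    """Maximum posterior = minimum negative log posterior."""
    return min(candidates, key=lambda c: c.neg_log_posterior)


# Hypothetical numbers: the best-fitting model B loses on its prior cost.
models = [
    Candidate("A", neg_log_likelihood=100.0, neg_log_prior=10.0),
    Candidate("B", neg_log_likelihood=90.0, neg_log_prior=25.0),
    Candidate("C", neg_log_likelihood=95.0, neg_log_prior=12.0),
]
print(select_model(models).name)  # C
```

The example shows the Occam's Razor trade-off in miniature: model B fits the data best but pays more in prior cost than it gains in likelihood.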
We have shown elsewhere [8] that an optimal choice for P(M) is a model
complexity prior given by

$$
P(M) = K e^{-D(M)} \tag{2}
$$

where the function D(M) is defined by

$$
D(M) \equiv L(n_S) + L(d) + n_S \log \frac{(d + n_S + 1)!}{d!\, n_S!} + n_T \log \frac{(d + n_A + 1)!}{d!\, n_A!} \tag{3}
$$

and L(n) is the universal prior for integers [9, pp. 34–5], defined by
A D B D F E G G A F C F D E D C C D
These words appear randomly with equal probability in the sequences of our
data set D1. (Word combinations were more restricted in Saffran’s stimuli, due
HMM Analysis of Musical Structure 209
Table 1. Calculation of model posteriors for all the HMMs considered in the word
segmentation example involving data set D1. Each model is obtained from the previous
one by state splitting. The first column shows the HMM’s number of states nS . The
second column shows the state split from which that model was obtained. Negative
logarithms of probability values are used throughout. The selected model maximizes
the model posterior or, equivalently, minimizes the value in Column 5. This model is
marked with an asterisk in Column 1.
G G A / A D B / D E D / A D B / C C D / D F E
Our HMM analysis was applied to a data set D1 constructed in the above
manner, consisting of 200 randomly generated sequences with an average length
of 27.21 symbols. A state-splitting search was performed to identify the best
HMM topology. Each candidate split was followed by Baum-Welch estimation of
the HMM parameters. The results of this search are summarized in Table 1. The
model identified as the winner is the one that carries the maximum posterior
probability. This model is marked with an asterisk in the first column of the
table. The model’s graph structure is given in Figure 1.
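For readers who want to experiment, a data set of this kind can be approximated with a short generator. This is a sketch under stated assumptions, not the procedure used to build D1: the word list is read off the segmented example sequence in this section (the complete inventory of D1 is not reproduced here), and the range of 5–13 words per sequence is our guess, chosen only because it averages 9 words (27 symbols).

```python
import random
from typing import List, Sequence

# Assumption: words taken from the segmented example sequence in this section.
WORDS = ["GGA", "ADB", "DED", "CCD", "DFE"]


def make_sequence(rng: random.Random, n_words: int, words: Sequence[str] = WORDS) -> List[str]:
    """Concatenate n_words words, each drawn with equal probability."""
    return [symbol for _ in range(n_words) for symbol in rng.choice(words)]


def make_dataset(n_sequences: int = 200, min_words: int = 5, max_words: int = 13,
                 seed: int = 0) -> List[List[str]]:
    """Hypothetical generator in the spirit of data set D1.

    The word-count range is our guess; randint is inclusive at both ends,
    so 5-13 words averages 9 words (27 symbols) per sequence.
    """
    rng = random.Random(seed)
    return [make_sequence(rng, rng.randint(min_words, max_words))
            for _ in range(n_sequences)]
```

Because word boundaries leave no trace in the output, any segmentation the HMM recovers must come from the transition statistics alone.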
To illustrate how the HMM of Figure 1 performs segmentation on a data
sequence, it is helpful to consider the most likely HMM path that generates the
sequence in question, also known as the sequence’s Viterbi path [4,5, pp. 331–3].
For the sequence of example (5), this path turns out to be the following:
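$$
\begin{aligned}
& s_0 \xrightarrow{\mathrm{BEGIN}} s_1 \xrightarrow{G} s_2 \xrightarrow{G} s_3 \xrightarrow{A} s_1 \xrightarrow{A} s_8 \xrightarrow{D} s_9 \xrightarrow{B} s_1 \xrightarrow{D} s_{12} \xrightarrow{E} s_4 \xrightarrow{D} s_1 \\
& \quad \xrightarrow{A} s_8 \xrightarrow{D} s_9 \xrightarrow{B} s_1 \xrightarrow{C} s_{11} \xrightarrow{C} s_4 \xrightarrow{D} s_1 \xrightarrow{D} s_7 \xrightarrow{F} s_6 \xrightarrow{E} s_1 \xrightarrow{\mathrm{END}} s_0
\end{aligned}
\tag{6}
$$
Fig. 1. The best HMM for data set D1, obtained through state-splitting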
With the help of this Viterbi path, all word boundaries in the sequence are clearly
identified through the HMM state s1. The significance of that state as a marker
of word boundaries can also be confirmed by observing the graph structure of
Figure 1 and following the derivation path of any sequence generated by that
graph.
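Reading word boundaries off a Viterbi path is a purely mechanical step once the path is known. The sketch below assumes, as the alignment of symbols and arrows in path (6) suggests, that each symbol labels the transition from one state to the next; under that reading every word ends with a transition into s1. The function and parameter names are hypothetical.

```python
from typing import List, Sequence


def segment(states: Sequence[str], labels: Sequence[str], boundary: str = "s1",
            markers: Sequence[str] = ("BEGIN", "END")) -> List[str]:
    """Cut the output at every transition that enters the boundary state.

    labels[i] is read as the symbol on the transition states[i] -> states[i+1],
    so there must be exactly one more state than label.
    """
    if len(states) != len(labels) + 1:
        raise ValueError("need exactly one more state than labels")
    words, current = [], []
    for label, target in zip(labels, states[1:]):
        if label in markers:
            continue
        current.append(label)
        if target == boundary:
            words.append("".join(current))
            current = []
    if current:  # trailing material not closed by a boundary
        words.append("".join(current))
    return words


# The Viterbi path of example (6), transcribed state by state.
STATES = "s0 s1 s2 s3 s1 s8 s9 s1 s12 s4 s1 s8 s9 s1 s11 s4 s1 s7 s6 s1 s0".split()
LABELS = "BEGIN G G A A D B D E D A D B C C D D F E END".split()
print(segment(STATES, LABELS))  # ['GGA', 'ADB', 'DED', 'ADB', 'CCD', 'DFE']
```

The recovered words coincide with the word division of the example sequence given earlier in this section.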
This simple example serves to illustrate that, just like the experimental subjects
in the study by Saffran et al. [11], the HMM topology selection technique
presented here can exploit the statistical structure of symbolic sequences to segment
them into grouping units. This result was replicated with other, similar data
sets, and suggests that (at least in certain cases) segmentation can be performed
on the basis of statistical information alone, without recourse to other structure
such as Gestalt principles of grouping.
patterns of durations found in Palestrina’s vocal music. Table 2 lists all the
possible note and rest durations employed in the style.
It should be noted that the goal of this application is not to do meter induction
per se. Rather, we seek to model Renaissance rhythm by establishing a syntax of
note durations. With the help of the HMM topology selection technique, we hope
to identify any other variable(s) that may be most relevant in constraining and
shaping the style’s duration patterns. In this instance, the most crucial variable
turns out to be metric placement, and is identified by the interpretation of HMM
states as explained below.
The HMM analysis of the present case study was performed on a sample of
melodies taken from the corpus of Palestrina’s masses. The corpus was obtained
from the Internet in Humdrum-encoded form.1 The sample was constructed as
follows:
An example of this encoding is shown above the staff in Figure 3. The resulting
sample consisted of 190 such sequences with an average length of 34.48 symbols.
The results of the HMM inference algorithm are shown in Table 3. The HMM
with the highest posterior probability was a 6-state model, marked with an
asterisk in the first column of the table. Figure 2 shows the model in graph
form.
As in the previous case study, the model’s structure will be easier to interpret
with the help of the data sequences’ Viterbi path. As an illustration, the Viterbi
path for a typical melody in the data set is given in Figure 3.
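For reference, the Viterbi path can be computed with the standard dynamic-programming recursion. The sketch below uses the common textbook formulation in which states emit symbols [4], works in log space to avoid underflow, and runs on a hypothetical two-state "strong"/"weak" model with made-up probabilities; it is not the HMM of Fig. 2, and the paper's own graphs may attach outputs differently.

```python
import math
from typing import Dict, List, Sequence, Tuple

Table = Dict[str, Dict[str, float]]


def _log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs: Sequence[str], states: Sequence[str], start_p: Dict[str, float],
            trans_p: Table, emit_p: Table) -> Tuple[List[str], float]:
    """Most likely state sequence for obs and its log probability (state emissions)."""
    # best[s]: log probability of the best path ending in s after the current symbol
    best = {s: _log(start_p.get(s, 0.0)) + _log(emit_p[s].get(obs[0], 0.0)) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        prev, best, ptr = best, {}, {}
        for s in states:
            score, arg = max((prev[r] + _log(trans_p[r].get(s, 0.0)), r) for r in states)
            best[s] = score + _log(emit_p[s].get(symbol, 0.0))
            ptr[s] = arg
        backpointers.append(ptr)
    last = max(states, key=lambda s: best[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1], best[last]


# Hypothetical model: B = breve, S = semibreve, M = minim.
states = ["strong", "weak"]
start_p = {"strong": 1.0}
trans_p = {"strong": {"strong": 0.3, "weak": 0.7}, "weak": {"strong": 0.8, "weak": 0.2}}
emit_p = {"strong": {"B": 0.5, "S": 0.4, "M": 0.1}, "weak": {"S": 0.3, "M": 0.7}}
path, logp = viterbi(["B", "M", "S"], states, start_p, trans_p, emit_p)
print(path)  # ['strong', 'weak', 'strong']
```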
Examination of the state sequences in the model’s Viterbi paths reveals one
striking property: there is a close correspondence between the HMM states and
the various metric positions in the compositions’ underlying 4/2 meter. As can
be seen from the example of Figure 3, states s1 and s3 occur exclusively on
strong beats (1 or 3), whereas state s2 only occurs on weak beats (2 or 4);
moreover, state s4 only occurs on weak quarters, and the rare occurrence of
state s5 coincides with a weak eighth-note subdivision. In other words, the HMM
appears to be “aware” of metric placement for each duration it generates. This
1 URL: http://csml.som.ohio-state.edu/HumdrumDatabases/classical/Renaissance/Palestrina/Masses/ (last visited March 2009).
Table 2. The note and rest durations available to the Renaissance vocal style. These are
shown along with the corresponding symbolic value of the duration variable, as encoded
for the HMM analysis of the present project. The rightmost column records the possible
metric placements for each duration, as prescribed in counterpoint instruction.
Longa L beats 1, 3
Breve B beats 1, 3
Table 3. Calculation of model posteriors for all the HMMs considered in the analysis of
Palestrina rhythm. As in the earlier example, each model is obtained from the previous
one by state splitting. The columns of this table carry the same interpretation as those
of Table 1.
5 Conclusions
The two case studies presented in this paper have demonstrated how topology-
sensitive HMM training can successfully uncover hidden structure underlying
the observable behavior of symbolic data sequences. Indeed, generic application
of the Baum-Welch algorithm would not have resulted in readily interpretable
graphs such as those of Figures 1 and 2. Only when HMM training incorporates
model topology identification, in a way that is sensitive to the data set’s statistical
regularities, will the HMM states be readily interpretable in terms of the
processes underlying the data sequence’s generation. In such cases, we can interpret
the different HMM states as representing the values of hidden, or latent,
variables that are most crucial in shaping the structural constraints of the data
sequences.
More specifically, one salient latent variable underlying Case Study I could be
identified as “word completion status” with the two values ‘yes’ (corresponding
to state s2) and ‘no’ (corresponding to states s1, s3, and s4); furthermore, a
second latent variable of “word label” could account for the differences among
the non-boundary states s1, s3, and s4. For Case Study II, the most salient latent
variable seemed to be “metric position” with most HMM states representing
distinct values. A second latent variable representing “position in the phrase”
was found to differentiate between states s1 and s3 .
Of course, in both the above examples, identification of the relevant latent
variables is relatively straightforward. This is because the HMM graphs are
rather small, and so the correspondence between HMM states and latent variable
values can be directly perceived. In more complicated situations, however,
this need not be the case. We must have a way of interpreting HMM states
that is more reliable than simple inspection. In general, the interpretation process
could be systematized by compiling contingency tables that show how each
HMM state aligns, or doesn’t align, with the values of a set of candidate latent
variables along the HMM paths that generate the data set (the Viterbi path
offering the dominant contribution).
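A contingency table of this kind is straightforward to compile once Viterbi paths are available. The sketch below, with hypothetical names and toy data, pairs each state in a path with the value of one candidate latent variable (here, metric position) at the same position. The per-state "purity" score is our addition, not part of the paper: 1.0 means the state always co-occurs with a single value of the variable.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def contingency_table(state_paths: Sequence[Sequence[str]],
                      label_paths: Sequence[Sequence[str]]) -> Dict[str, Counter]:
    """Count co-occurrences of HMM states and values of one candidate latent variable."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for states, labels in zip(state_paths, label_paths):
        if len(states) != len(labels):
            raise ValueError("each state path must align one-to-one with its labels")
        for state, label in zip(states, labels):
            table[state][label] += 1
    return dict(table)


def purity(table: Dict[str, Counter]) -> Dict[str, float]:
    """Share of each state's occurrences that fall on its most frequent value."""
    return {s: max(c.values()) / sum(c.values()) for s, c in table.items()}


# Hypothetical aligned data: Viterbi states vs. metric position of each note.
paths = [["s1", "s2", "s3", "s4"], ["s3", "s2", "s1"]]
metric = [["strong", "weak", "strong", "weak quarter"], ["strong", "weak", "strong"]]
print(purity(contingency_table(paths, metric)))
# {'s1': 1.0, 's2': 1.0, 's3': 1.0, 's4': 1.0}
```

Repeating the count for several candidate variables and comparing the scores is one way to decide which variable a state encodes.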
Finally, it should be noted that, as our experiments with various data sets
indicate, our MDL prior of eqs. (2)–(4) is an essential ingredient for the identification
of the right model topology. Other priors that we have tried typically produce
smaller graphs (e.g., caused by premature termination of state-splitting) whose
states are not consistently interpretable. In general, whenever data are abundant,
the result is found to be less sensitive to the choice of prior. However,
that choice really matters when data are scarce, which is the case, for example, in
historically delimited musical corpora (e.g., “all D-mode Gregorian tracts”). The
MDL approach is a strongly motivated and principled way of choosing a prior,
which in the majority of cases leads the topology search to discover interpretable
graphs.
It should also be noted that a simple splitting/merging search over model
topologies, unaided by other search heuristics, does not always yield readily
interpretable graphs, especially for data sequences with rich symbol alphabets.
The problem is that the splitting/merging search is a form of “best-first” search
that guarantees an optimal next step, leading to a local maximum
of the model posterior; however, it cannot guarantee that the maximum reached
in this way is also the global maximum. This is, of course, a concern for
any optimization problem. We have found that, in order to produce interpretable
results in the most general cases, the search proposed in this paper has to be
augmented with heuristics that determine an appropriate starting point for the
splitting or merging. This issue is currently under investigation and will be
presented in future work.
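The local-maximum problem can be seen in a few lines. The sketch is a generic greedy (hill-climbing) search with a toy score table standing in for the model posterior; the scores and the split-only neighbourhood are hypothetical, not results from the paper. Started from one state it stops at a local peak, whereas started from five states it reaches the global one, which is why the choice of starting point matters.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def greedy_search(start: T, neighbors: Callable[[T], Iterable[T]],
                  score: Callable[[T], float]) -> T:
    """Move to the best-scoring neighbor until no neighbor improves the score.

    Every step is locally optimal, but the result is only a local maximum.
    """
    current, current_score = start, score(start)
    while True:
        best: Optional[T] = None
        best_score = current_score
        for candidate in neighbors(current):
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
        if best is None:
            return current
        current, current_score = best, best_score


# Toy stand-in for a log posterior indexed by number of states:
# a local peak at 3 and the global peak at 8.
TOY_SCORE = {1: 0.0, 2: 2.0, 3: 3.0, 4: 1.0, 5: 2.0, 6: 4.0, 7: 6.0, 8: 9.0, 9: 5.0}


def split_once(n: int) -> List[int]:
    """Hypothetical neighbourhood: one more state, if that size is scored."""
    return [n + 1] if n + 1 in TOY_SCORE else []


print(greedy_search(1, split_once, TOY_SCORE.__getitem__))  # 3
print(greedy_search(5, split_once, TOY_SCORE.__getitem__))  # 8
```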
Fig. 2. The best HMM for the data sequences of durations in the Palestrina sample. The graph’s output symbols are encoded using the
Fig. 3. A typical Palestrina vocal line (from Missa Te Deum Laudamus, Kyrie II, Tenor I) annotated with its Viterbi path. The state
sequence corresponding to that path is marked with the symbols s0–s5 above each staff. Arrows from one state to the next have been
suppressed for visual clarity. The output symbols corresponding to the encoded durations appear below the state sequence and above
the corresponding note or rest. Duration encodings are listed in column 4 of Table 2.
References
1. Raphael, C., Stoddard, J.: Functional Harmonic Analysis Using Probabilistic Models. Computer Music Journal 28, 45–52 (2004)
2. Mavromatis, P.: A Hidden Markov Model of Melody Production in Greek Church Chant. Computing in Musicology 14, 93–112
3. Bello, J.P., Pickens, J.: A Robust Mid-Level Representation for Harmonic Content in Music Signals. In: Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR 2005), London, UK (September 2005)
4. Rabiner, L.R., Juang, B.-H.: An Introduction to Hidden Markov Models. IEEE ASSP Magazine 3, 4–16 (1986)
5. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
6. Stolcke, A., Omohundro, S.M.: Hidden Markov Model Induction by Bayesian Model Merging. In: Hanson, S.J., Cowan, J.D., Giles, C.L. (eds.) Advances in Neural Information Processing Systems, vol. 5, pp. 11–18. Morgan Kaufmann, San Mateo (1993)
7. Ostendorf, M., Singer, H.: HMM Topology Design Using Maximum Likelihood Successive State Splitting. Computer Speech and Language 11, 17–41 (1997)
8. Mavromatis, P.: Minimum Description Length Modeling of Musical Structure. Submitted to the Journal of Mathematics and Music (under revision)
9. Rissanen, J.: Stochastic Complexity in Statistical Inquiry. Series in Computer Science, vol. 15. World Scientific, Singapore (1989)
10. Grünwald, P.D.: The Minimum Description Length Principle. Adaptive Computation and Machine Learning. MIT Press, Cambridge (2007)
11. Saffran, J.R., Johnson, E.K., Aslin, R.N., Newport, E.L.: Statistical Learning of Tone Sequences by Human Infants and Adults. Cognition 70, 27–52 (1999)
12. Jeppesen, K.: Counterpoint: The Polyphonic Vocal Style of the Sixteenth Century. Prentice Hall, New York (1939); reprinted by Dover (1992)
13. Gauldin, R.: A Practical Approach to Sixteenth-Century Counterpoint. Waveland Press, Long Grove (1995)
A Declarative Language for Dynamic Multimedia
Interaction Systems
1 Introduction
Process calculi provide a language in which the structure of terms represents the structure
of processes, together with an operational semantics to represent computational
steps. Concurrent Constraint Programming (CCP) [13] has emerged as a declarative
model for concurrency tied to logic. In CCP, concurrent systems are specified by means
of constraints (e.g., x + y ≥ 10) representing partial information about certain variables.
This way, agents (or processes) interact with each other by telling and asking information
represented as constraints in a global store: a process tell(c) adds the constraint
c, thus making it available to other processes. A positive ask when c do P remains
blocked until the store is strong enough to entail c; if so, it behaves like P.
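The tell/ask protocol can be illustrated with a deliberately tiny sketch, assuming a constraint system whose only constraints are strict lower bounds var > k. Real CCP constraint systems are far richer, and CCP agents run concurrently rather than through callbacks. The class and method names are hypothetical. The blocking behaviour of when c do P is simulated by queuing P until some tell makes c entailed.

```python
from typing import Callable, Dict, List, Tuple

Constraint = Tuple[str, float]  # ("pitch", 64) stands for pitch > 64


class Store:
    """Toy CCP store restricted to constraints of the form var > k."""

    def __init__(self) -> None:
        self.bounds: Dict[str, float] = {}  # strongest lower bound told so far
        self.blocked: List[Tuple[Constraint, Callable[[], None]]] = []

    def entails(self, c: Constraint) -> bool:
        var, k = c
        # var > k' entails var > k exactly when k' >= k
        return var in self.bounds and self.bounds[var] >= k

    def tell(self, c: Constraint) -> None:
        var, k = c
        # Information only accumulates: keep the stronger of old and new bounds.
        self.bounds[var] = max(k, self.bounds.get(var, k))
        ready = [p for c2, p in self.blocked if self.entails(c2)]
        self.blocked = [(c2, p) for c2, p in self.blocked if not self.entails(c2)]
        for proc in ready:
            proc()

    def ask(self, c: Constraint, proc: Callable[[], None]) -> None:
        """when c do proc: run proc once c is entailed, blocking until then."""
        if self.entails(c):
            proc()
        else:
            self.blocked.append((c, proc))


log: List[str] = []
store = Store()
store.ask(("pitch", 48), lambda: log.append("P"))  # blocks: nothing is known yet
store.tell(("pitch", 64))                          # pitch > 64 entails pitch > 48
print(log)  # ['P']
```

The coordination comes for free: the asking agent never polls, it simply resumes once the store holds enough information.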
Interactivity in multimedia systems has become increasingly important. The aim is to
devise ways for the machine to be an active partner in a collective behavior constructed
dynamically by many actors. In its simplest form, a musician signals the computer when
processes should be launched or stopped. In more complex forms, the machine is always
actively adapting its behavior according to the information derived from the activity of
the other partners. To be coherent, these machine actions must be the result of a complex
adaptive system composed of many agents that should be coordinated in precise ways.
This work has been partially supported by FORCES, an INRIA Equipe Associée between the
teams COMETE (INRIA), the Music Representation Research Group (IRCAM), and AVISPA.
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 218–227, 2009.
© Springer-Verlag Berlin Heidelberg 2009
A Declarative Language for Dynamic Multimedia Interaction Systems 219
Constructing such systems is a challenging task. Moreover, ensuring their correctness
places a great burden on the usual test-based techniques. In this setting, CCP has much to
offer: CCP calculi are explicitly designed for expressing complex coordination patterns
in a very simple way by means of constraints. In addition, their declarative nature allows
formally proving properties of systems modeled with them.
Interactive scores [3] are models for reactive music systems adapting their behavior
to different types of intervention from a performer. Weakly defined temporal relations
between components in an interactive score specify loosely coupled music processes
potentially changing their properties in reaction to stimuli from the environment (say,
a performer). An interactive score defines a hierarchical structure of processes. Musical
properties of a process depend on the context in which it is located. Although the hierarchical
structure has been treated as static in previous works, there is no reason it should
be so. A process, in reaction to a musician’s action, for example, could be programmed
to move from one context to another or simply to disappear. Imagine, for instance, a
particular set of musical materials within different contexts that should only be played
when some expected information from the environment actually arrives. Modeling this
kind of interactive score mobility in a coherent way is greatly simplified by using the
calculus described in this paper.
Musical improvisation is another natural context for interacting agents. Improvisation
is effective when the agents’ behavior adapts to what has been learned in previous
interactions. A music style-learning/improvisation scheme such as Factor Oracle (FO) [1,5]
can be seen as a reactive system where several learning and improvising agents react to
information provided by the environment or by other agents. In its simplest form, three
concurrent agents (a player, a learner, and an improviser) must be synchronized. Since
only three independent processes are active, coordination can be implemented without
major difficulties using traditional languages and tools. The question is whether such
implementations would scale up to situations involving several concurrent agents. For
an implementation using traditional languages, the complexity of such systems would
most likely impose many simplifications in coordination patterns if behavior is to be
controlled in a significant way. A CCP model, as described here, provides a compact
and simple model of the agents involved in the FO improvisation, one in which
coordination is automatically provided by the blocking ask construct of the calculus.
Moreover, additional agents could easily be incorporated in the system. As an extra
bonus, fundamental properties of the constructed system can be formally verified in
the model.
In this paper we argue for Universal Timed CCP (utcc) [10] as a declarative lan-
guage for the modeling and verification of multimedia interaction systems. The utcc
calculus is a timed extension of CCP with the ability to model mobile reactive systems,
i.e., systems that continuously interact with the environment and may change their
communication structure.
220 C. Olarte and C. Rueda
2 Preliminaries
CCP-based languages are parametric in a constraint system [13] defining the kind of
constraints that can be used in the program. Here, constraints c, d, . . . are understood as
formulae in a first-order language. If the information of d can be entailed (or deduced)
from the information represented by c we write c ⊢ d (e.g. pitch > 64 ⊢ pitch > 48).
Universal timed CCP (utcc) [9] extends Timed CCP (tcc) [12] for mobile reactive
systems. Time in utcc is conceptually divided into time intervals (or time-units). In a
particular time-unit, a utcc process P gets an input c from the environment, executes
with this input as the initial store, and, when it reaches its resting point, outputs
the resulting store d to the environment. Furthermore, the resting point determines a
residual process, which is then executed in the next time interval.
Processes in utcc are built by the following syntax:
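$$
P, Q ::= \mathbf{skip} \mid \mathbf{tell}(c) \mid (\mathbf{abs}\ x; c)\, P \mid P \parallel Q \mid (\mathbf{local}\ x; c)\, P \mid \mathbf{next}\ P \mid \mathbf{unless}\ c\ \mathbf{next}\ P \mid\ !\, P
$$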
A process skip represents inaction. A process tell(c) adds c to the store in the
current time interval, thus making it available to other processes.
In utcc, the CCP ask operator when c do P (executing P if c can be deduced)
is replaced by the abstraction operator (abs x; c) P . This construct is a parameterized
ask where P[t/x] is executed for all the terms t such that c[t/x] is entailed by the store.
A process P ∥ Q denotes P and Q running in parallel, possibly “communicating”
via the common store. The process (local x; c) P behaves like P but the information c
about the variables in x is local to P . We shall omit c in (local x; c) P when c ≡ true.
From a programming language perspective, x in (local x; c) P can be viewed as the
local variables of P while x in (abs x; c) P as the formal parameters of P . This way,
abstractions can encode recursive definitions of the form X(x) ≝ P (see [8]).
The unit-delay next P executes P in the next time interval. The (weak) time-out
unless c next P executes P in the next time-unit iff c cannot be entailed by the final
store at the current time interval. The replication ! P means P ∥ next P ∥ next² P ∥ ...,
i.e., unboundedly many copies of P, but one at a time.
A Declarative Language for Dynamic Multimedia Interaction Systems 221
We shall also use the derived operator (wait x; c) do P that waits, possibly for sev-
eral time-units, until for some t, c[t/x] holds and then it executes P [t/x] (see [10]).
An Example. The abstraction operator allows us to communicate (local) names or vari-
ables between processes, i.e., mobility in the sense of the π-calculus [7]. Let us give a
simple example of this situation. Let P be a process modeling a musician playing notes
at different time-units, and Q be an improvisation system which after “reading” the note
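played by P performs some action R. Roughly, this scenario can be modeled as follows:
$$
\begin{aligned}
P &\stackrel{\text{def}}{=} \mathbf{tell}(play(A)) \parallel \mathbf{next}\,\big(\mathbf{tell}(play(G)) \parallel \mathbf{next}\ \mathbf{tell}(play(B))\big) \ldots\\
Q &\stackrel{\text{def}}{=}\ !\,(\mathbf{abs}\ x;\ play(x))\ R
\end{aligned}
$$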
When executing P ∥ Q, we observe, e.g., R[G/x] in the second time-unit. This means
that P and Q synchronized on the constraint play(·) and the note played by P (i.e. G)
was read by Q and then processed by R. See [8] for a more involved example defining
synchronization of multiple agents.
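To make the synchronization in this example concrete, here is a toy Python sketch of P ∥ Q. It is not a utcc interpreter: the store is a set of ground atoms rebuilt in every time-unit, the abstraction is matched by plain lookup, and the tuple encoding of play(·) as well as the names `abstraction` and `run` are hypothetical.

```python
from typing import Callable

# Hypothetical encoding: the atomic constraint play(n) is the tuple ("play", n).
Constraint = tuple[str, str]


def abstraction(store: set[Constraint], predicate: str,
                body: Callable[[str], str]) -> list[str]:
    """(abs x; predicate(x)) body: run body[t/x] for every t with predicate(t) in the store."""
    return [body(t) for (name, t) in sorted(store) if name == predicate]


def run(schedule: list[list[str]], body: Callable[[str], str]) -> list[list[str]]:
    """Toy run of P || Q with Q = !(abs x; play(x)) R.

    schedule[k] lists the notes P tells in time-unit k + 1. The store starts empty in
    every time-unit (nothing persists unless told again), and the replication ! places
    a fresh copy of the abstraction in each unit.
    """
    trace = []
    for notes in schedule:                                   # one iteration = one time-unit
        store: set[Constraint] = {("play", n) for n in notes}
        trace.append(abstraction(store, "play", body))
    return trace


if __name__ == "__main__":
    p = [["A"], ["G"], ["B"]]                                # P plays A, then G, then B
    print(run(p, lambda t: f"R[{t}/x]"))                     # R[G/x] shows up in unit 2
```

Because the abstraction fires once per matching term, a unit in which P tells several notes would trigger one copy of R per note.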
Logic Characterization. The utcc calculus enjoys a declarative view of processes as
first-order linear-time temporal logic (FLTL) formulae [6]. This means that processes
can be seen, at the same time, as computing agents and as logic formulae.
Formulae in FLTL are built from the following syntax:
$$
F, G, \ldots ::= c \mid F \wedge G \mid \neg F \mid \exists x\, F \mid \circ F \mid \Box F
$$
where c is a constraint. The modalities ◦F and □F state, respectively, that F holds next
and always. We use ∀xF for ¬∃x¬F, and the eventual modality ◇F as an abbreviation
of ¬□¬F. See [6] for further details on this logic.
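Processes in utcc can be represented as FLTL formulae as follows:
$$
\begin{aligned}
&[\![\mathbf{skip}]\!] = \mathit{true} \qquad [\![\mathbf{tell}(c)]\!] = c \qquad [\![P \parallel Q]\!] = [\![P]\!] \wedge [\![Q]\!]\\
&[\![(\mathbf{abs}\ y; c)\, P]\!] = \forall y\,(c \Rightarrow [\![P]\!]) \qquad [\![(\mathbf{local}\ x; c)\, P]\!] = \exists x\,(c \wedge [\![P]\!])\\
&[\![\mathbf{next}\ P]\!] = \circ[\![P]\!] \qquad [\![\mathbf{unless}\ c\ \mathbf{next}\ P]\!] = c \vee \circ[\![P]\!] \qquad [\![!\, P]\!] = \Box[\![P]\!]
\end{aligned}
$$
Let A = [[P]]. Roughly, A ⊢ ◇c (i.e., c eventually holds in A) iff the process P
eventually outputs c (see [8,9] for further details).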
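The translation [[·]] is a plain structural recursion over processes. The sketch below uses a hypothetical dataclass AST with constraints kept as strings; it only builds the formula clause by clause and does not decide validity.

```python
from dataclasses import dataclass


# Hypothetical process AST for the utcc syntax; constraints are plain strings.
@dataclass
class Skip:
    pass


@dataclass
class Tell:
    c: str


@dataclass
class Par:
    p: object
    q: object


@dataclass
class Abs:
    x: str
    c: str
    p: object


@dataclass
class Local:
    x: str
    c: str
    p: object


@dataclass
class Next:
    p: object


@dataclass
class Unless:
    c: str
    p: object


@dataclass
class Bang:
    p: object


def fltl(proc) -> str:
    """Translate a process into an FLTL formula, one clause of [[.]] per constructor."""
    if isinstance(proc, Skip):
        return "true"
    if isinstance(proc, Tell):
        return proc.c
    if isinstance(proc, Par):
        return f"({fltl(proc.p)} ∧ {fltl(proc.q)})"
    if isinstance(proc, Abs):
        return f"∀{proc.x}({proc.c} ⇒ {fltl(proc.p)})"
    if isinstance(proc, Local):
        return f"∃{proc.x}({proc.c} ∧ {fltl(proc.p)})"
    if isinstance(proc, Next):
        return f"◦{fltl(proc.p)}"
    if isinstance(proc, Unless):
        return f"({proc.c} ∨ ◦{fltl(proc.p)})"
    if isinstance(proc, Bang):
        return f"□{fltl(proc.p)}"
    raise TypeError(f"not a utcc process: {proc!r}")


if __name__ == "__main__":
    q = Bang(Abs("x", "play(x)", Tell("r(x)")))   # !(abs x; play(x)) tell(r(x))
    print(fltl(q))                                  # □∀x(play(x) ⇒ r(x))
```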
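$$
\begin{aligned}
\mathit{BoxOperations} \stackrel{\text{def}}{=}\ & (\mathbf{abs}\ id, d;\ mkbox(id, d))\ (\mathbf{local}\ s)\ \mathbf{tell}(box(id, d, s))\\
\parallel\ & (\mathbf{abs}\ id;\ destroy(id))\ (\mathbf{abs}\ x, sup;\ in(x, id) \wedge in(id, sup))\\
& \qquad \mathbf{unless}\ play(id)\ \mathbf{next}\ \mathbf{tell}(in(x, sup))\\
\parallel\ & (\mathbf{abs}\ x, y;\ before(x, y))\ \mathbf{when}\ \exists z\,(in(x, z) \wedge in(y, z))\ \mathbf{do}\\
& \qquad \mathbf{unless}\ play(y)\ \mathbf{next}\ \mathbf{tell}(bf(x, y))\\
\parallel\ & (\mathbf{abs}\ x, y;\ into(x, y))\ \mathbf{unless}\ play(x)\ \mathbf{next}\ \mathbf{tell}(in(x, y))\\
\parallel\ & (\mathbf{abs}\ x, y;\ out(x, y))\ \mathbf{when}\ in(x, y)\ \mathbf{do}\\
& \qquad \mathbf{unless}\ play(x)\ \mathbf{next}\ (\mathbf{abs}\ z;\ in(y, z))\ \mathbf{tell}(in(x, z))
\end{aligned}
$$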
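$$
\begin{aligned}
\mathit{Constraints} \stackrel{\text{def}}{=}\ & (\mathbf{abs}\ x, y;\ in(x, y))\ (\mathbf{abs}\ d_x, s_x;\ box(x, d_x, s_x))\ (\mathbf{abs}\ d_y, s_y;\ box(y, d_y, s_y))\\
& \qquad \big(\mathbf{tell}(s_y \le s_x) \parallel \mathbf{tell}(d_x + s_x \le d_y + s_y)\big)\\
\parallel\ & (\mathbf{abs}\ x, y;\ bf(x, y))\ (\mathbf{abs}\ d_x, s_x;\ box(x, d_x, s_x))\ (\mathbf{abs}\ d_y, s_y;\ box(y, d_y, s_y))\\
& \qquad \mathbf{tell}(s_x + d_x \le s_y)
\end{aligned}
$$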
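$$
\begin{aligned}
\mathit{Persistence} \stackrel{\text{def}}{=}\ & (\mathbf{abs}\ x, y;\ in(x, y))\ \big(\mathbf{when}\ play(x)\ \mathbf{do}\ \mathbf{next}\ \mathbf{tell}(in(x, y))\\
& \qquad \parallel \mathbf{unless}\ out(x, y) \vee destroy(x)\ \mathbf{next}\ \mathbf{tell}(in(x, y))\big)\\
\parallel\ & (\mathbf{abs}\ x, y;\ bf(x, y))\ \big(\mathbf{when}\ play(y)\ \mathbf{do}\ \mathbf{next}\ \mathbf{tell}(bf(x, y))\\
& \qquad \parallel \mathbf{unless}\ out(x, y) \vee destroy(y)\ \mathbf{next}\ \mathbf{tell}(bf(x, y))\big)\\
\parallel\ & (\mathbf{abs}\ x, d_x, s_x;\ box(x, d_x, s_x))\ \big(\mathbf{when}\ play(x)\ \mathbf{do}\ \mathbf{next}\ \mathbf{tell}(box(x, d_x, s_x))\\
& \qquad \parallel \mathbf{unless}\ destroy(x)\ \mathbf{next}\ \mathbf{tell}(box(x, d_x, s_x))\big)
\end{aligned}
$$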
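$$
\begin{aligned}
\mathit{Clock}(t, v) &\stackrel{\text{def}}{=} \mathbf{tell}(t = v) \parallel \mathbf{next}\ \mathit{Clock}(t, v + 1)\\
\mathit{Play}(x, t) &\stackrel{\text{def}}{=} \mathbf{when}\ t \ge 1\ \mathbf{do}\ \mathbf{tell}(play(x)) \parallel \mathbf{unless}\ t \le 1\ \mathbf{next}\ \mathit{Play}(x, t - 1)
\end{aligned}
$$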
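$$
\begin{aligned}
\mathit{Init}(t) \stackrel{\text{def}}{=}\ & (\mathbf{wait}\ x;\ init(x))\ \mathbf{do}\ \big((\mathbf{abs}\ d_x, s_x;\ box(x, d_x, s_x))\ (\mathit{Clock}(t, 0) \parallel \mathbf{tell}(s_x = t))\\
& \qquad \parallel\ !\,(\mathbf{wait}\ y, d_y, s_y;\ box(y, d_y, s_y) \wedge s_y \le t)\ \mathbf{do}\ \mathit{Play}(y, d_y)\big)\\
\mathit{System} \stackrel{\text{def}}{=}\ & (\mathbf{local}\ t)\ \big(\mathit{Init}(t) \parallel\ !\,\mathit{Persistence} \parallel\ !\,\mathit{Constraints} \parallel\ !\,\mathit{BoxOperations} \parallel \mathit{UsrBoxes}\big)
\end{aligned}
$$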
Finally, the whole system is the parallel composition of the previously defined
processes and the specific user model, e.g.:
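$$
\begin{aligned}
\mathit{UsrBoxes} \stackrel{\text{def}}{=}\ & \mathbf{tell}(mkbox(a, 22) \wedge mkbox(b, 12) \wedge mkbox(c, 4))\\
\parallel\ & \mathbf{tell}(mkbox(d, 5) \wedge mkbox(e, 2))\\
\parallel\ & \mathbf{tell}(into(b, a) \wedge into(c, b) \wedge into(d, b) \wedge into(e, d))\\
\parallel\ & \mathbf{tell}(before(c, d))\\
\parallel\ & \mathbf{whenever}\ play(b)\ \mathbf{do}\ \mathbf{unless}\ signal\ \mathbf{next}\\
& \qquad \big(\mathbf{tell}(out(d, b) \wedge mkbox(f, 2) \wedge into(f, a)) \parallel \mathbf{tell}(before(b, f) \wedge before(f, d))\big)
\end{aligned}
$$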
This system defines the hierarchy in Figure 3(a). When b starts playing, the system asks
if the signal signal is present (i.e., if it was provided by the environment). If it was
not, the box d is taken out of the context b. Furthermore, a new box f is created such
that b must be played before f and f before d, as in Figure 3(b). Notice that when the
box d is taken out of b, the internal box e is still inside d, preserving its structure.
Verification of the Model. The processes defined by the user may lead to situations
where the final store is inconsistent, as in st < 5 ∧ st > 7, where st is the start time
of a given box. Take, for example, the process UsrBoxes above. If the box f is defined
with a duration greater than 5, the execution of f (and then that of d) will exceed the
boundaries of the box a which contains both structures.
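The f > 5 pitfall can be checked mechanically. In the sketch below, which is a simplification and not the authors' FLTL-based verification, the in and bf constraints told by Constraints are rewritten as difference constraints and Bellman-Ford reports whether they are satisfiable. The helper `reconfigured_score` is hypothetical and keeps only boxes a, b, f and d after d has left b.

```python
def feasible(durations, inside, before):
    """Satisfiability of the box constraints, via Bellman-Ford negative-cycle detection.

    inside: pairs (x, y) for in(x, y):  s_y <= s_x  and  s_x + d_x <= s_y + d_y
    before: pairs (x, y) for bf(x, y):  s_x + d_x <= s_y
    Each constraint s_v - s_u <= w becomes an edge u -> v of weight w; the system has
    a solution iff the constraint graph has no negative cycle.
    """
    edges = []
    for x, y in inside:
        edges.append((x, y, 0))                              # s_y - s_x <= 0
        edges.append((y, x, durations[y] - durations[x]))    # s_x - s_y <= d_y - d_x
    for x, y in before:
        edges.append((y, x, -durations[x]))                  # s_x - s_y <= -d_x
    dist = {box: 0 for box in durations}                     # implicit source at distance 0
    for _ in range(len(durations)):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return all(dist[u] + w >= dist[v] for u, v, w in edges)  # nothing left to relax


def reconfigured_score(d_f):
    """Hypothetical snapshot: b, f, d inside a (durations 12, d_f, 5, 22); b before f before d."""
    durations = {"a": 22, "b": 12, "f": d_f, "d": 5}
    inside = [("b", "a"), ("f", "a"), ("d", "a")]
    before = [("b", "f"), ("f", "d")]
    return durations, inside, before
```

Here the cycle a → d → f → b → a has weight 17 − d_f − 12 + 0 = 5 − d_f, so the score becomes inconsistent exactly when d_f > 5, matching the text.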
In this context, the declarative view of utcc processes as FLTL formulae provides a
valuable tool for the verification of the model: the formula A = [[P]] allows us to verify
whether the execution of P leads to an inconsistent store. Thus, we can detect pitfalls
in the user model such as trying to place a bigger box into a smaller one or taking a box
out of the outermost box.
In the following, we present some examples of temporal properties we could verify
in an interactive score represented as the process P .
Remark. For the sake of presentation we only defined here the before relation.
Our model can be straightforwardly extended to support all Allen temporal relations
[2]. Making use of the into and out operations, we can define also the operation
move(a, b) meaning, move the structure a into the structure b.
but are constantly interacting with others in controlled ways. The interactions allow
building a complex global musical process collaboratively. Interactions become effec-
tive when each partner has somehow learned about the possible evolutions of each
musical process launched by the others, i.e., their musical style. Getting the computer
involved in the improvisation process requires learning the musical style of the human
interpreter and then playing jointly in the same style. A style in this case means some
set of meaningful sequences of musical material the interpreter has played. A graph
structure called factor oracle (FO) is used to efficiently represent this set [1].
A FO is a finite state automaton constructed in an incremental fashion. A sequence of
symbols s = σ1 σ2 . . . σn is learned in such an automaton, whose states are 0, 1, 2 . . . n.
There is always a transition arrow (called factor link) labeled by the symbol σi going
from state i − 1 to state i, 1 ≤ i ≤ n. Depending on the structure of s, other arrows will
be added. Some are directed from a state i to a state j, where 0 ≤ i < j ≤ n. These
also belong to the set of factor links and are labeled by symbol σj. Some are directed
“backwards”, going from a state i to a state j, where 0 ≤ j < i ≤ n. They are called
suffix links, and bear no label (represented as ’’ in our processes below). The factor
links model a factor automaton, that is, every factor p in s corresponds to a unique factor
link path labeled by p, starting in 0 and ending in some other state. Suffix links have
an important property: a suffix link goes from i to j iff the longest repeated suffix of
s[1..i] is recognized in j. Thus suffix links connect repeated patterns of s.
The oracle (see Figure 4) is learned on-line. For each new input symbol σi , a new
state i is added and an arrow from i − 1 to i is created with label σi . Starting from
i − 1, the suffix links are iteratively followed backward, until a state is reached where
a factor link with label σi originates (going to some state j), or until there are no more
suffix links to follow. For each state met during this iteration, a new factor link labeled
by σi is added from this state to i. Finally, a suffix link is added from i to the state
j or to state 0, depending on which condition terminated the iteration. Navigating the
oracle in order to generate variants is straightforward: starting in any place, following
factor links generates a sequence of labelling symbols that are repetitions of portions
of the learned sequence; following one suffix link followed by factor links creates
a recombined pattern sharing a common suffix with an existing pattern in the original
sequence. This common suffix is, in effect, the musical context at any given time.
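The construction and navigation just described can be sketched sequentially in Python. This follows the on-line algorithm of [1] with a transition table and a suffix array, not the utcc processes below; `improvise`, `p_continue` and `seed` are hypothetical, and the navigation only replays the direct factor links i → i + 1 between suffix-link jumps.

```python
import random


def build_oracle(seq):
    """On-line factor oracle construction, following the steps described above.

    Returns (trans, sfx): trans[j][symbol] is the target of the factor link from state j
    labelled symbol, and sfx[j] is the target of the suffix link from j (-1 for state 0).
    """
    trans = [{}]
    sfx = [-1]
    for i, sym in enumerate(seq, start=1):
        trans.append({})
        trans[i - 1][sym] = i              # factor link i-1 -> i labelled sigma_i
        k = sfx[i - 1]
        while k > -1 and sym not in trans[k]:
            trans[k][sym] = i              # extra factor link from each state met
            k = sfx[k]
        sfx.append(0 if k == -1 else trans[k][sym])
    return trans, sfx


def improvise(seq, length, p_continue=0.7, seed=0):
    """Mostly replay the learned sequence; sometimes jump back along a suffix link so
    the continuation shares a common suffix (the current context) with the original."""
    assert seq, "the learned sequence must be non-empty"
    _, sfx = build_oracle(seq)
    rng = random.Random(seed)
    state, out = 0, []
    while len(out) < length:
        if state == len(seq) or (state > 0 and rng.random() > p_continue):
            state = sfx[state]             # recombination point
        out.append(seq[state])             # link state -> state+1 is labelled seq[state]
        state += 1
    return out
```

For example, `build_oracle("abb")` yields the suffix array `[-1, 0, 0, 2]`: the last b points back to state 2, the end of the earlier occurrence of the repeated suffix "b".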
In [5] a tcc model of FO is proposed. This model has three drawbacks. Firstly, it
(informally) assumes the basic calculus has been extended with general recursion in
order to correctly model suffix-link traversal. Secondly, it assumes the dynamic construction
of new variables δ_i^σ set to the state reached by following the factor link labelled σ
from state i. This construction cannot be expressed with the local variable primitive in
basic tcc. Thirdly, the model assumes a constraint system over both finite domains
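and finite sets. Below, we use the expressive power of the abstraction construct in
utcc to correct all these drawbacks (see Figure 5). Furthermore, our model leads to a
compact representation of the data structure of the FO based on constraints of the form
edge(x, y, N) representing an arc between nodes x and y labeled with N.

Figure 5:
$$
\begin{aligned}
\mathit{FO} \stackrel{\text{def}}{=}\ & \mathit{Counter} \parallel \mathit{Persistence} \parallel\ !\,(\mathbf{abs}\ Note;\ play(Note))\ \mathbf{whenever}\ ready\ \mathbf{do}\ \mathit{Step}_1(Note)\\
\mathit{Counter} \stackrel{\text{def}}{=}\ & \mathbf{tell}(i = 1) \parallel\ !\,(\mathbf{abs}\ x;\ i = x)\ \big(\mathbf{when}\ ready\ \mathbf{do}\ \mathbf{next}\ \mathbf{tell}(i = x + 1)\\
& \qquad \parallel \mathbf{unless}\ ready\ \mathbf{next}\ \mathbf{tell}(i = x)\big)\\
\mathit{Persistence} \stackrel{\text{def}}{=}\ & !\,(\mathbf{abs}\ x, y, z;\ edge(x, y, z))\ \mathbf{next}\ \mathbf{tell}(edge(x, y, z))\\
\mathit{Step}_1(Note) \stackrel{\text{def}}{=}\ & \mathbf{tell}(edge(i - 1, i, Note)) \parallel \mathit{Step}_2(Note, i - 1)\\
\mathit{Step}_2(Note, E) \stackrel{\text{def}}{=}\ & \mathbf{when}\ E = 0\ \mathbf{do}\ \big((\mathbf{abs}\ k;\ edge(E, k, Note))\ (\mathbf{tell}(edge(i, k, \text{''})) \parallel \mathbf{next}\ \mathbf{tell}(ready))\\
& \qquad \parallel \mathbf{unless}\ \exists k\, edge(E, k, Note)\ \mathbf{next}\ (\mathbf{tell}(ready) \parallel \mathbf{tell}(edge(i, 0, \text{''})))\big)\\
\parallel\ & \mathbf{when}\ E \neq 0\ \mathbf{do}\ (\mathbf{abs}\ j;\ edge(E, j, \text{''}))\\
& \qquad \big(\mathbf{when}\ \exists k\, edge(j, k, Note)\ \mathbf{do}\ (\mathbf{abs}\ k;\ edge(j, k, Note))\ (\mathbf{tell}(edge(i, k, \text{''})) \parallel \mathbf{next}\ \mathbf{tell}(ready))\\
& \qquad \parallel \mathbf{unless}\ \exists k\, edge(j, k, Note)\ \mathbf{next}\ (\mathbf{when}\ j = 0\ \mathbf{do}\ \mathbf{tell}(edge(j, i, Note)) \parallel \mathit{Step}_2(Note, j))\big)
\end{aligned}
$$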
Process Counter signals when a new played note can be learned. It can be learned
when all links for the previous note have already been added to the FO. Process Per-
sistence transmits information about already constructed arcs (factor and suffix) to all
future time-units. Process Step1 adds a factor link from i − 1 to i labelled with a just
played note and launches traversal of suffix links from i − 1. When state zero is reached
by traversing suffix links, process Step2 adds a suffix link from i to a state reached
from 0 by a factor link labelled N ote, if it exists, or from i to state zero, otherwise. For
each state k different from zero reached in the suffix links traversal, process Step2 adds
factor links labelled N ote from k to i.
The inclusion of a new agent in our FO model (e.g. a learner agent for a second
performer) entails a new process and new interactions, both with the new process and
among the existing ones. In traditional models this usually means major changes in
the synchronization scheme, which are difficult to localize and control. In utcc, all
synchronization is done semantically, through the available information in the store.
Each agent would thus have to be augmented with processes testing for the presence
of new information (e.g. a factor link with some label in the other agent’s FO graph).
The new synchronization behavior that this demands is automatically provided by the
blocking ask (abstraction) construct.
5 Concluding Remarks
Here we argued for utcc as a declarative framework for modeling and verifying dy-
namic multimedia interaction systems. We showed that the synchronization mechanism
based on entailment of constraints leads to simpler models that scale up when more
agents are added. Moreover, we showed that systems can be formally verified with
the underlying temporal logic in utcc. We modeled two non-trivial interacting systems.
The model proposed for interactive scores in Section 3 improved considerably the
References
1. Allauzen, C., Crochemore, M., Raffinot, M.: Factor oracle: A new structure for pattern
matching. In: Bartosek, M., Tel, G., Pavelka, J. (eds.) SOFSEM 1999. LNCS, vol. 1725,
p. 295. Springer, Heidelberg (1999)
2. Allen, J.F.: Maintaining knowledge about temporal intervals. Commun. ACM 26(11) (1983)
3. Allombert, A., Assayag, G., Desainte-Catherine, M.: A system of interactive scores based on
Petri nets. In: Proceedings of SMC 2007 (2007)
4. Allombert, A., Assayag, G., Desainte-Catherine, M., Rueda, C.: Concurrent constraints mod-
els for interactive scores. In: Proceedings of SMC 2006 (2006)
5. Assayag, G., Dubnov, S., Rueda, C.: A concurrent constraints factor oracle model for music
improvisation. In: CLEI 2006 (2006)
6. Manna, Z., Pnueli, A.: The Temporal Logic of Reactive and Concurrent Systems: Specifica-
tion. Springer, Heidelberg (1991)
7. Milner, R.: Communicating and Mobile Systems: the Pi-Calculus. Cambridge University
Press, Cambridge (1999)
8. Olarte, C., Rueda, C.: A declarative language for dynamic multimedia interaction systems
(April 2009), http://www.lix.polytechnique.fr/~colarte/
9. Olarte, C., Valencia, F.D.: The expressivity of universal timed CCP: Undecidability of
monadic FLTL and closure operators for security. In: Proc. of PPDP 2008. ACM, New York
(2008)
10. Olarte, C., Valencia, F.D.: Universal concurrent constraint programing: Symbolic semantics
and applications to security. In: Proc. of SAC 2008. ACM Press, New York (2008)
11. Perez, J.A., Rueda, C.: Non-determinism and probabilities in timed concurrent constraint
programming. In: ICLP 2008. LNCS (2008)
12. Saraswat, V., Jagadeesan, R., Gupta, V.: Foundations of timed concurrent constraint program-
ming. In: Proc. of LICS 1994. IEEE CS, Los Alamitos (1994)
13. Saraswat, V.A.: Concurrent Constraint Programming. MIT Press, Cambridge (1993)
Generalized Voice Exchange
Robert Peck
1 Introduction
The notion of voice exchange in ordered pitch-class space conforms closely to that of
contextual inversion in neo-Riemannian theory: the melodic dyad (a, b) in one voice
inverts in another voice, and we define an axis of inversion respectively for all such
pairs. A connection to the Parallel Exchange exists. Given a C major triad in a neo-
Hauptmannian sense [1] with an Einheit C and a Zweiheit G, these pitch-classes invert
about a contextual axis under the P operation, with G assuming the Einheit function
and C assuming that of Zweiheit in the resulting C minor triad. We substitute order
positions for chordal factors in translating this concept to voice exchange; therefore,
“Einheit” becomes “first coordinate” and “Zweiheit” “second,” and the ordered dyad
(C, G) becomes (G, C). We may thus apply many of the transformational concepts of
neo-Riemannian theory to a study of voice exchange.
We draw our musical examples from the Prelude to Richard Wagner’s Tristan und
Isolde, for which a separate analytical thread exists that considers aspects of tonality
in relation to the voice exchange (or “interchange” [2]) in the resolution of the Tristan
Chord [3,4,5]. Other recent research on the Prelude considers aspects of voice-
leading efficiency, particularly in various resolutions of the Tristan Chord, wherein
voice exchanges are viewed as surface embellishments of a more fundamental step-
wise structure [6]. This notion follows from a more general conceptualization of
voice exchanges as permutations, the attitude taken implicitly in [7].
We adopt here a transformational approach, wherein we regard voice exchanges as
being aligned with contextual inversion. To highlight this connection, we model
voice exchanges using pitch classes, hence in the integers modulo 12. The theory
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 228–235, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Generalized Voice Exchange 229
presented here, however, is easily extendable to other models of pitch in more robust
number systems, including infinite pitch space using the integers, continuous pitch
space using the reals, and so on. The transformational perspective allows us ultimate-
ly to relate voice exchanges—including various chromatic exchanges and those in
differing harmonic contexts—to one another in terms of a transformational scheme by
which we may describe networks that model processes in the music.
We find two voice exchanges, labeled X, in the first seven measures of the Tristan
Prelude (Figures 1 and 2). In the first exchange, the minor third (G#, B) of the sopra-
no voice inverts to (B, G#) in the tenor; the second presents a sequential image of the
first under T3. In both instances, we note that (a, b) and (b, a) as ordered intervals are
inverses of one another and belong to the same interval class.
Another situation exists for chromatic voice exchanges: those in which (a, b) and its
image have intervals that do not belong to the same interval class. Let us take the
subsequent, altered leg of the above sequence in the Tristan Prelude as an example
(Figure 3). Here, the previously ascending minor third in the soprano is extended to a
major third (D, F#), while the tenor's descending minor third is diminished to (F, D#),
a doubly chromatic exchange. We could define a transformation¹
$$
X' := (a, b) \mapsto (b - 1,\; a + 1), \quad \text{for all } (a, b) \in \mathbb{Z}_{12} \times \mathbb{Z}_{12} \qquad (2)
$$
in which to label this exchange, but as it does not commute with inversion, X′ fails as
a contextual inversion.
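A quick check of these claims, with pitch classes as integers mod 12 and hypothetical helper names (`x_prime`, `T`, `I`, `commutes`): X′ takes (D, F#) = (2, 6) to (F, D#) = (5, 3), commutes with every transposition T_n(a, b) = (a + n, b + n), but commutes with no inversion I_n(a, b) = (n − a, n − b).

```python
N = 12  # pitch classes; all arithmetic mod 12


def x_prime(a, b):
    """Equation (2): X'(a, b) = (b - 1, a + 1)."""
    return ((b - 1) % N, (a + 1) % N)


def T(n, a, b):
    return ((a + n) % N, (b + n) % N)


def I(n, a, b):
    return ((n - a) % N, (n - b) % N)


def commutes(f, g):
    """True iff f(g(d)) == g(f(d)) for every ordered dyad d."""
    return all(f(*g(a, b)) == g(*f(a, b)) for a in range(N) for b in range(N))


if __name__ == "__main__":
    print(x_prime(2, 6))                                                        # (5, 3)
    print(all(commutes(x_prime, lambda a, b, n=n: T(n, a, b)) for n in range(N)))  # True
    print(any(commutes(x_prime, lambda a, b, n=n: I(n, a, b)) for n in range(N)))  # False
```

Symbolically, X′(I_n(a, b)) = (n − b − 1, n − a + 1) while I_n(X′(a, b)) = (n − b + 1, n − a − 1), and these never agree because −1 ≢ 1 (mod 12).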
¹ Throughout this article, we assume all arithmetic to be performed modulo 12, unless otherwise indicated.

The centralizer of the T/I group's action on ℤ12 × ℤ12 in the symmetric group on the
same set,
$$
C(T/I) = C_{\mathrm{Sym}(\mathbb{Z}_{12} \times \mathbb{Z}_{12})}(T/I), \qquad (3)
$$
provides all 7,644,119,040 operations on ℤ12 × ℤ12 that commute with both transposition
and inversion. C(T/I) is the direct product of centralizers of two T/I orbit restrictions.
One restriction is to the union of melodic dyads that belong to set classes [0, 1]
through [0, 5], a wreath product of order 24⁵ · 5!; the other is to the union of set
classes [0, 0] and [0, 6], another wreath product of order 2² · 2! [10]. We may use an
involution from this centralizer, X9,4, to model all three of the above voice exchanges,
including the chromatic example. In the definition of such a function, we incorporate
the interval between a and b in pitch-class space
i = b – a mod 12 (4)
contextually, thereby facilitating the commutative property with both transposition
and inversion.
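$$
X_{9,4} := (a, b) \mapsto
\begin{cases}
(a - 9,\; a - 2 + ((i + 4) \bmod 5)), & \text{for } 0 < i \le 5,\\
(a + 9,\; a - 2 + ((i + 3) \bmod 5)), & \text{for } 6 < i \le 11,\\
(a, b), & \text{for } i = 0 \bmod 6.
\end{cases}
\qquad (5)
$$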
As we will see below, the subscript 9 in the label of this function associates with the
initial harmonic interval in the exchange, whereas the subscript 4 relates to a particu-
lar permutation of T/I orbits.
We note the essential difference in the actions of X9,4 on the T/I orbits of melodic
unisons and tritones (those for which i = 0 mod 6) and on those of all other dyads.
This variance is a consequence of the structure of the centralizer, as the action of T/I
restricted to an orbit from among the former set is not permutation isomorphic to one
from among the latter. Specifically, the orbits of melodic unisons and tritones have
two fixed points each under I0: (0, 0), (6, 6) and (0, 6), (6, 0), respectively, whereas
the remaining orbits have no such fixed points.
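The orbit structure, the fixed points, and the order 7,644,119,040 can be reproduced by brute force. The sketch relies on the standard fact that the centralizer of a transitive action with point stabilizer H has order |N(H)/H|, which equals the number of orbit points fixed by H. Grouping orbits by (size, fixed-point count) as a stand-in for permutation isomorphism is an assumption that happens to suffice here. Applied to T alone, the same routine gives 12¹² · 12!.

```python
from collections import Counter
from itertools import product
from math import factorial

N = 12
POINTS = list(product(range(N), repeat=2))      # ordered dyads (a, b) in Z12 x Z12


def act(g, p):
    """T_n(a, b) = (a + n, b + n); I_n(a, b) = (n - a, n - b)."""
    kind, n = g
    a, b = p
    if kind == "T":
        return ((a + n) % N, (b + n) % N)
    return ((n - a) % N, (n - b) % N)


T_GROUP = [("T", n) for n in range(N)]
TI_GROUP = T_GROUP + [("I", n) for n in range(N)]


def orbits(group):
    seen, result = set(), []
    for p in POINTS:
        if p not in seen:
            orb = sorted({act(g, p) for g in group})
            seen.update(orb)
            result.append(orb)
    return result


def centralizer_order(group):
    """Order of the centralizer of the group's action in Sym(Z12 x Z12).

    k mutually isomorphic orbits, each with |N(H)/H| = m, contribute a wreath
    product of order m**k * k!; the factors for different orbit types multiply.
    """
    types = Counter()
    for orb in orbits(group):
        base = orb[0]
        stab = [g for g in group if act(g, base) == base]
        fixed = sum(all(act(g, q) == q for g in stab) for q in orb)
        types[(len(orb), fixed)] += 1
    order = 1
    for (_, fixed), k in types.items():
        order *= fixed ** k * factorial(k)
    return order, types
```

For T/I this finds five free orbits of 24 dyads (set classes [0, 1] to [0, 5]) and two orbits of 12 whose base points are fixed by I0 together with one other point ((0, 0), (6, 6) and (0, 6), (6, 0)), giving 24⁵ · 5! · 2² · 2!.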
As these three voice exchanges involve melodic thirds, we are particularly interest-
ed in the action of X9,4 on dyads with intervals in interval classes 2 (as diminished
thirds), 3 and 4. Table 1 summarizes this action; in particular, we note the two ex-
changes within interval class 3, and that within interval classes 2 and 4. More gener-
ally, interval class 3 is stabilized under X9,4, and any melodic interval in interval class
3 + n will exchange with an interval in interval class 3 – n (where n ≤ 2).
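Table 1. Action of X9,4 on T/I orbits, by interval class
0 < i ≤ 5:     IC 1   IC 2   IC 3   IC 4   IC 5
6 < i ≤ 11:    IC 5   IC 4   IC 3   IC 2   IC 1
Exchanges in the examples: (8, 11) ↔ (11, 8) and (11, 2) ↔ (2, 11) within IC 3; (2, 6) ↔ (5, 3) between IC 4 and IC 2.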
232 R. Peck
Voice exchanges in which only one voice is altered chromatically require different, but
related, operations. One such example derives from Rothstein’s [11] reading of the
Tristan Prelude’s first two measures (Figure 4): here the bass voice’s implied ascend-
ing minor third (D, F) inverts to a descending diminished third (F, D#) in the alto.
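Table 2. Action of X9,0 on T/I orbits, by interval class
6 < i ≤ 11:    IC 1   IC 2   IC 3   IC 4   IC 5
0 < i ≤ 5:     IC 4   IC 3   IC 2   IC 1   IC 5
Exchange in the example: (5, 3) ↔ (2, 5) between IC 2 and IC 3.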
The resulting permutation of T/I orbits between functions (5) and (6) obtains from
the variance in their respective modulo 5 components. In (5), we had (i + 4 mod 5)
and (i + 3 mod 5), whereas in (6) we have (i + 0 mod 5) and (i + 2 mod 5). We note
that for each pair, we may derive the latter addend by subtracting the former from 2
mod 5. That is, in (5), we may derive i + 3 from i + 4 by observing that 2 – 4 = 3 mod
5. In the same way, in (6), we have i + 0 and i + 2. Again, the latter addend follows
from the former: i + (2 – 0 mod 5) = i + 2. Henceforth in our generalization, let p
represent the integer modulo 5 that is added to i whenever 0 < i ≤ 5, so that (2 – p
mod 5) is added to i for 6 < i ≤ 11. Therefore, in (5), p = 4, and in (6), p = 0.
We observe that X9,4 stabilizes set class [0, 3] as a set, while X9,0 stabilizes [0, 5].
We may use p in determining these fixed sets, as well. In the former case, [0, 3] con-
sists of all dyads with intervals in interval class 3; in the latter, [0, 5] consists of those
in interval class 5. Then,
$$
\big((p - ((1 - p) \bmod 5)) \bmod 5\big) + 1 \qquad (7)
$$
yields the corresponding interval class. Hence, p = 4 for X9,4; therefore, the interval
class of intervals in the stabilized set class is (4 – (1 – 4 mod 5) mod 5) + 1 = 3. For
X9,0, (0 – (1 – 0 mod 5) mod 5) + 1 = 5.
All four of the examples above feature an initial harmonic interval in interval class 3:
a major sixth in the first three examples, and a minor tenth in the last. Per our fourth
desideratum, let (a, b) represent some melodic dyad in an exchange for which 0 < b –
a ≤ 5 mod 12 , and let (c, d) represent its image for which 6 < d – c ≤ 11 mod 12.
Then, for our purposes, let
q = a – c mod 12 (8)
represent this initial interval. In each of the previous examples, q = 9. It is possible,
however, to describe voice exchanges with initial harmonic intervals in other interval
classes by incorporating q into functions like (5) and (6) above.
For example, consider Figure 5, which presents m. 20 in the Tristan Prelude. Here
we find a chromatic voice exchange for which q = 8 lies in interval class 4. We may
model this exchange using the following function:
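$$
X_{8,3} := (a, b) \mapsto
\begin{cases}
(a - 8,\; a - 1 + ((i + 3) \bmod 5)), & \text{for } 0 < i \le 5,\\
(a + 8,\; a - 3 + ((i + 4) \bmod 5)), & \text{for } 6 < i \le 11,\\
(a, b), & \text{for } i = 0 \bmod 6.
\end{cases}
\qquad (9)
$$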
noting that a – 8 and a + 8 in (9) now replace a – 9 and a + 9 in the former functions.
In other words, a – q and a + q provide the generalization, since q = a – c by (8). Further, the previous
instances of a – 2 in (5) and (6) now read in (9) as a – 1 and a – 3 for 0 < i ≤ 5 and 6 < i
≤ 11, respectively. These values may be derived from q, as well. For cases in which
0 < i ≤ 5, put a – (q – 7); for those in which 6 < i ≤ 11, put a – (11 – q). Then, as q =
9 in (5) and (6), we have q – 7 = 2 and 11 – q = 2. In (9), q = 8; therefore, q – 7 = 1
and 11 – q = 3.
Table 3 outlines the exchanges within interval classes 1 to 5 for X8,3. In particular,
we note the exchange within interval classes 4 and 3, for an initial harmonic interval q
= 8, as seen in Figure 5. Moreover, as p = 3 for this example, we observe via (7) the
stabilization of set class [0, 1], whose dyads have intervals in interval class (3 – (1 – 3
mod 5) mod 5) + 1 = 1.
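Table 3. Action of X8,3 on T/I orbits, by interval class
0 < i ≤ 5:     IC 1   IC 2   IC 3   IC 4   IC 5
6 < i ≤ 11:    IC 1   IC 5   IC 4   IC 3   IC 2
Exchange in the example: (0, 4) ↔ (4, 1) between IC 4 and IC 3.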
Thus far, we have described three operations on ℤ12 × ℤ12 in which to model voice
exchanges with varying initial harmonic intervals (the variable q) and T/I-orbit per-
mutations (the variable p). As q and p vary respectively within ℤ12 and ℤ5, we find 12
∙ 5 = 60 such operations in the following format:
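$$
X_{q,p} := (a, b) \mapsto
\begin{cases}
(a - q,\; a - (q - 7) + ((i + p) \bmod 5)), & \text{for } 0 < i \le 5,\\
(a + q,\; a - (11 - q) + ((i + ((2 - p) \bmod 5)) \bmod 5)), & \text{for } 6 < i \le 11,\\
(a, b), & \text{for } i = 0 \bmod 6.
\end{cases}
\qquad (10)
$$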
Each of these sixty operations, then, represents twelve specific voice-leading patterns
while i varies in ℤ12. Moreover, these operations are conjugate to each other in the
centralizer of the transposition group’s action on ℤ12 × ℤ12:
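$$
C(T) = C_{\mathrm{Sym}(\mathbb{Z}_{12} \times \mathbb{Z}_{12})}(T), \qquad (11)
$$
a wreath product of order 12¹² · 12!. Specifically, we may define an order 60 subgroup
R < C(T),
$$
R := \langle R_q, R_p \mid R_q^{12} = R_p^{5} = 1 \rangle, \qquad (12)
$$
where
$$
R_q := (a, b) \mapsto
\begin{cases}
(a, b), & \text{for } 0 \le i \le 6,\\
(a + 1, b + 1), & \text{for } 6 < i \le 11,
\end{cases}
\qquad (13)
$$
and
$$
R_p := (a, b) \mapsto
\begin{cases}
(a, b), & \text{for } 0 \le i \le 6,\\
(a,\; a + 7 + ((i - 1) \bmod 5)), & \text{for } 6 < i \le 11,
\end{cases}
\qquad (14)
$$
isomorphic to ℤ12 × ℤ5. The set of conjugates of any Xq,p under the members of R
consists of all sixty operations in the form of (10). Using members of R, we may thus
construct a network that relates the exchanges in all five of the previous examples
(Figure 6).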
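The claims about (10), (7), (13) and (14) can be checked exhaustively over ℤ12 × ℤ12. This sketch assumes that the first image coordinate is a − q (resp. a + q), the reading that satisfies q = a − c from (8) and reproduces the dyad pairs of Tables 1–3, and it reads (14) as cycling the intervals 7 → 8 → … → 11 → 7. The names `x_qp`, `r_q`, `r_p` and `ic` are hypothetical.

```python
N = 12
DYADS = [(a, b) for a in range(N) for b in range(N)]


def x_qp(q, p, a, b):
    """Equation (10), with the first image coordinate read as a - q (resp. a + q)."""
    i = (b - a) % N
    if i % 6 == 0:                                   # melodic unisons and tritones are fixed
        return (a, b)
    if i <= 5:
        return ((a - q) % N, (a - (q - 7) + (i + p) % 5) % N)
    return ((a + q) % N, (a - (11 - q) + (i + (2 - p) % 5) % 5) % N)


def stabilized_ic(p):
    """Equation (7): interval class of the set class that X_{q,p} stabilizes."""
    return (p - (1 - p) % 5) % 5 + 1


def r_q(a, b):
    """Equation (13): transpose dyads with 6 < i <= 11 up by one semitone."""
    return ((a + 1) % N, (b + 1) % N) if (b - a) % N > 6 else (a, b)


def r_p(a, b):
    """Equation (14): cycle the intervals 7 -> 8 -> ... -> 11 -> 7, keeping a."""
    i = (b - a) % N
    return (a, (a + 7 + (i - 1) % 5) % N) if i > 6 else (a, b)


def ic(a, b):
    i = (b - a) % N
    return min(i, N - i)


if __name__ == "__main__":
    print(x_qp(9, 4, 8, 11), x_qp(9, 4, 2, 6))      # (11, 8) (5, 3): Table 1
    print(x_qp(9, 0, 2, 5), x_qp(8, 3, 0, 4))       # (5, 3) (4, 1): Tables 2 and 3
```

Beyond the table entries, the assertions one can run over all 60 operations confirm that each X_{q,p} is an involution commuting with every T_n and I_n, that it fixes the interval class given by (7), and that conjugation satisfies R_q X_{q,p} R_q⁻¹ = X_{q−1,p} and R_p X_{q,p} R_p⁻¹ = X_{q,p+1}.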
[Figure 6: transformational network relating X9,0 (mm. 1–2), X9,4 (mm. 2–3, 6–7, 10–11), and X8,3 (m. 20) via members of R]
References
1. Gollin, E.: Some Aspects of Three-Dimensional Tonnetze. Journal of Music Theory 42(2),
195–206 (1998)
2. Mitchell, W.J.: The Tristan Prelude: Techniques and Structure. The Music Forum 1, 162–
203 (1967)
3. Harrison, D.: Supplement to the Theory of Augmented-Sixth Chords. Music Theory Spec-
trum 17(2), 170–195 (1995)
4. Rothgeb, J.: The Tristan Chord: Identity and Origin. Music Theory Online 1(1) (1995)
5. Rothstein, W.: The Tristan Chord in Historical Context: A Response to John Rothgeb. Mu-
sic Theory Online 1(1) (1995)
6. Tymoczko, D.: Scale Theory, Serial Theory, and Voice Leading. Music Analysis 27(1), 1–
49 (2008)
7. Callender, C., Quinn, I., Tymoczko, D.: Generalized Voice-Leading Spaces. Science 320,
346–348 (2008)
8. Kochavi, J.: Contextually Defined Musical Transformations. Ph.D. dissertation, State Uni-
versity of New York, Buffalo (2002)
9. Peck, R.: Transformational Preservation and Set-Multiclasses. In: The Thirty-first Annual
Meeting of the Society for Music Theory, Nashville, Tennessee (2008)
10. Hook, J.: Uniform Triadic Transformations. Ph.D. dissertation. Indiana University-
Bloomington (2002)
11. Rothstein, W.: The Tristan Chord in Historical Context: A Response to John Rothgeb. Mu-
sic Theory Online 1(1) (1995)
Representing and Estimating Musical
Expression in Melody
Christopher Raphael
1 Introduction
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 236–244, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Representing and Estimating Musical Expression in Melody 237
2 The Theremin
Our goal of expressive melody synthesis must, in the end, produce actual sound.
We introduce here an audio representation we believe provides a good trade-off
between expressive power and simplicity.
Consider the case of a sine wave in which both frequency, f(t), and amplitude,
a(t), are modulated over time:
$$
s(t) = a(t) \sin\!\left(2\pi \int_0^t f(\tau)\, d\tau\right). \qquad (1)
$$
These two time-varying parameters are the ones controlled in the early electronic
instrument known as the theremin. Continuous control of these parameters can
produce a variety of musical effects such as expressive timing, vibrato, glissando,
variety of attack and dynamics. Thus, the theremin is capable of producing a
rich range of expression. One significant aspect of musical expression which the
theremin cannot capture is tone color — as a time varying sine wave, the timbre
of the theremin is always the same. Partly because of this weakness, we have
modified the above representation to allow tone color to change as a function
of amplitude. Thus our sound is still parametrized by f (t) and a(t), while we
increase the perceived dynamic range.
A = {l−, l×, l+, l→, l←, l∗}
Fig. 1. Amazing Grace (top) and Danny Boy (bottom) showing the note-level labeling of
the music using symbols from our alphabet.
describing the role the note plays in the larger context. These labels, to some
extent, borrow from the familiar vocabulary of symbols musicians use to notate
phrasing in printed music. The symbols {l−, l×, l+} all denote stresses or points
of “arrival.” The variety of stress symbols allows for some distinction among the
kinds of arrivals we can represent: l− is the most direct and assertive stress; l× is
the “soft landing” stress in which we relax into repose; l+ denotes a stress that
continues forward in anticipation of future unfolding, as with some phrases that
end in the dominant chord. Examples of the use of these stresses, as well as the
other symbols are given in Figure 1. The symbols {l→ , l∗ } are used to represent
notes that move forward towards a future goal (stress). Thus these are usually
shorter notes we pass through without significant event. Of these, l→ is the
“garden-variety” passing tone, while l∗ is reserved for the passing stress, as in a
brief dissonance, or to highlight a recurring beat-level emphasis, still within the
context of forward motion. Finally, the l← symbol denotes receding movement
as when a note is connected to the stress that precedes it. This commonly occurs
when relaxing out of a strong-beat dissonance en route to harmonic stability. We
will write x = x1 , . . . , xN with xn ∈ A for the prosodic labeling of the notes.
These concepts are illustrated with the examples of Amazing Grace and Danny Boy in Figure 1. Of course, there may be several reasonable choices in a given musical scenario; however, we also believe that most labellings do not make interpretive sense, and we offer evidence of this in Section 7. Our entire musical collection is marked in this manner and available for scrutiny at
http://www.music.informatics.indiana.edu/papers/mcm09
[Figure 2 plot: pitch axis marked p_n and p_{n+1}; time axis marked t_n, t_{n+1} − t_bend, t_{n+1} − t_glis and t_{n+1}]
Fig. 2. A graph of the frequency function, f(t), between two notes. Pitches are bent in the direction of the next pitch and make small glissandi in transition.
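The behaviour shown in Figure 2 can be approximated by a piecewise-linear pitch curve through the time points marked on the figure. In this sketch the linear segments, the default values of t_bend and t_glis and the bend depth are hypothetical choices, not values from the paper; pitch is kept in MIDI semitones and converted to Hz only at the end.

```python
import numpy as np

def pitch_curve(t, t_n, t_next, p_n, p_next, t_bend=0.15, t_glis=0.05, bend=0.25):
    """Hypothetical pitch curve (MIDI semitones) between onsets t_n and t_next.

    Holds p_n, bends `bend` semitones toward p_next between t_next - t_bend
    and t_next - t_glis, then glides to p_next by t_next."""
    assert t_next - t_bend > t_n and t_bend > t_glis > 0
    direction = np.sign(p_next - p_n)  # bend in the direction of the next pitch
    knots_t = [t_n, t_next - t_bend, t_next - t_glis, t_next]
    knots_p = [p_n, p_n, p_n + direction * bend, p_next]
    return np.interp(t, knots_t, knots_p)

def semitones_to_hz(p):
    """MIDI convention: note 69 is A4 = 440 Hz."""
    return 440.0 * 2.0 ** ((np.asarray(p, dtype=float) - 69) / 12)

# Frequency function f(t) for a move from C4 (60) to D4 (62) over one second.
times = np.linspace(0.0, 1.0, 101)
f = semitones_to_hz(pitch_curve(times, 0.0, 1.0, 60, 62))
```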
[Figure 3 plot: “Theremin Parameters”; time in seconds (0 to 15) on the horizontal axis, scaled hz/amp (0.0 to 1.0) on the vertical axis, with the note-level labels marked above the curves]
Fig. 3. The functions f(t) (green) and a(t) (red) for the first phrase of Danny Boy. These functions have different units, so their ranges have been scaled to 0-1 to facilitate comparison.
$$p(x|y) = p(x_1|y_1)\prod_{n=2}^{N} p(x_n|x_{n-1}, y_n, y_{n-1}) = p(x_1|y_1)\prod_{n=2}^{N} p(x_n|x_{n-1}, z_n) \qquad (2)$$
where $z_n = (y_n, y_{n-1})$. The intuition behind this assumption is the observation (or opinion) that much of phrasing results from a cyclic alternation between forward-moving notes, {l→, l∗}, stressed notes, {l−, l+, l×}, and optional receding notes, {l←}. Usually a phrase boundary is present as we move from either stressed or receding states to forward-moving states. Thus the notion of state, as in a Markov chain, seems relevant. However, music of course has hierarchical structure that cannot be expressed through the regular grammar of a Markov chain.
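The factorization in Eq. (2) can be evaluated term by term. This minimal sketch assumes the two conditional distributions are supplied as Python functions (`p_first` and `p_trans` are hypothetical names); working in log space avoids underflow on long melodies.

```python
import math

def sequence_log_prob(labels, features, p_first, p_trans):
    """log p(x|y) = log p(x_1|y_1) + sum_{n>=2} log p(x_n | x_{n-1}, z_n),
    where z_n = (y_n, y_{n-1})."""
    logp = math.log(p_first(labels[0], features[0]))
    for n in range(1, len(labels)):
        z = (features[n], features[n - 1])
        logp += math.log(p_trans(labels[n], labels[n - 1], z))
    return logp
```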
We estimate the conditional distributions $p(x_n|x_{n-1}, z_n)$ for each choice of $x_{n-1} \in A$, as well as $p(x_1|y_1)$, using our labeled data. We will use the notation
$$p_l(x|z) = p(x_n = x \mid x_{n-1} = l, z_n = z)$$
for $l \in A$. In training these distributions we split our score data into $|A|$ groups, $D_l = \{(x_{li}, z_{li})\}$, where $D_l$ is the collection of all (class label, feature vector) pairs over all notes that immediately follow a note of class $l$.
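A sketch of how the training data could be partitioned into the sets D_l, assuming each melody is given as a list of labels and a parallel list of per-note features y_n (the function name is hypothetical). The first note of each melody has no predecessor; it is modelled by p(x1|y1) and so does not enter any D_l.

```python
from collections import defaultdict

def split_by_previous_label(pieces):
    """Build D_l for every label l.

    pieces: list of (labels, features) per melody, one label x_n and one
    feature y_n per note.  D[l] collects (x_n, z_n) with z_n = (y_n, y_{n-1})
    for every note whose predecessor is labelled l."""
    D = defaultdict(list)
    for labels, features in pieces:
        for n in range(1, len(labels)):
            z = (features[n], features[n - 1])
            D[labels[n - 1]].append((labels[n], z))
    return D
```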
Our estimation method makes no prior simplifying assumptions and follows the familiar classification tree methodology of CART [11]. That is, for each $D_l$ we begin with a “split,” $z_j > c$, separating $D_l$ into two sets: $D_l^0 = \{(x_{li}, z_{li}) : z_{lij} > c\}$ and $D_l^1 = \{(x_{li}, z_{li}) : z_{lij} \le c\}$. We choose the feature, $j$, and cutoff, $c$, to achieve maximal “purity” in the sets $D_l^0$ and $D_l^1$, as measured by the average entropy over the class labels. We continue to split the sets $D_l^0$ and $D_l^1$, splitting their “offspring,” etc., in a greedy manner, until the number of examples at a tree node is less than some minimum value. $p_l(x|z)$ is then represented by finding the terminal tree node associated with $z$ and using the empirical label distribution over the class labels $\{x_{li}\}$ whose associated $\{z_{li}\}$ fall to the same terminal tree node.
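The split search just described can be sketched for a single tree node as follows. This is our illustration, not the author's implementation: we read “average entropy” as the entropy of the two children weighted by their sizes (the usual CART-style choice), use midpoints between observed values as candidate cutoffs, and omit the recursive growth and the minimum-node-size stopping rule.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of the empirical label distribution."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def best_split(data):
    """Choose the split z_j > c for one tree node.

    data: list of (label, z) pairs, z a tuple of numeric features.
    Returns (j, c, score), where score is the size-weighted average entropy of
    the two children, or None if no feature takes two distinct values."""
    best = None
    for j in range(len(data[0][1])):
        values = sorted({z[j] for _, z in data})
        # Candidate cutoffs: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            c = (lo + hi) / 2
            above = [x for x, z in data if z[j] > c]
            below = [x for x, z in data if z[j] <= c]
            score = (len(above) * entropy(above) + len(below) * entropy(below)) / len(data)
            if best is None or score < best[2]:
                best = (j, c, score)
    return best
```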
Given a piece of music with feature vectors $z_1, \dots, z_N$, we can compute the optimizing labeling
$$\hat{x}_1, \dots, \hat{x}_N = \arg\max_{x_1, \dots, x_N} p(x_1|y_1)\prod_{n=2}^{N} p(x_n|x_{n-1}, z_n)$$
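Because the objective in this maximization is a product of first-order factors, the maximizing labeling can be found by dynamic programming (the Viterbi algorithm) in O(N|A|²) time. The sketch below assumes the distributions are supplied as functions `p_first(x, y1)` and `p_trans(x, prev, z)`; these names are hypothetical, and in practice they could be backed by the classification trees described above.

```python
import math

def _log(p):
    return math.log(p) if p > 0 else -math.inf

def viterbi(features, labels, p_first, p_trans):
    """Most likely labelling argmax_x p(x_1|y_1) prod_{n>=2} p(x_n | x_{n-1}, z_n)."""
    # score[l]: best log-probability of any labelling of notes 1..n that ends in l
    score = {l: _log(p_first(l, features[0])) for l in labels}
    backpointers = []
    for n in range(1, len(features)):
        z = (features[n], features[n - 1])
        new_score, pointer = {}, {}
        for x in labels:
            candidates = {l: score[l] + _log(p_trans(x, l, z)) for l in labels}
            best_prev = max(candidates, key=candidates.get)
            new_score[x], pointer[x] = candidates[best_prev], best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace the best path backwards from the highest-scoring final label.
    path = [max(score, key=score.get)]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]
```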
7 Results
We estimated a labeling for each of the M = 50 pieces in our corpus by training our model on the remaining M − 1 pieces and finding the most likely labeling, $\hat{x}_1, \dots, \hat{x}_N$, as described above. Over all melodies in the corpus, this procedure produced a total of 678 errors out of 2674 notes (25.4%); detailed results are presented in Figure 4.
true \ predicted    l∗     l→  |   l←  |   l−    l×    l+  |  total
l∗                 135    112  |    0  |   18     2     0  |    267
l→                  62   1683  |    8  |   17     0     0  |   1770
l←                   3    210  |   45  |    6     2     0  |    266
l−                  49     48  |    4  |  103    15     0  |    219
l×                   5     32  |    2  |   65    30     0  |    134
l+                   0      3  |    0  |   12     3     0  |     18
total              254   2088  |   59  |  221    52     0  |   2674
Fig. 4. Confusion matrix over the various classes. The rows represent the true labels, while the columns represent the predicted labels. The block structure indicated in the table shows the confusion among the coarser categories of stress, forward movement, and receding movement.
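The totals in Figure 4 and its coarse block structure can be checked with a few lines of NumPy. The grouping below follows the text: {l∗, l→} forward-moving, {l←} receding, and {l−, l×, l+} stressed; the variable names are our own.

```python
import numpy as np

LABELS = ["l∗", "l→", "l←", "l−", "l×", "l+"]  # row and column order of Fig. 4
# Rows: true label; columns: predicted label.
C = np.array([
    [135,  112,  0,  18,  2, 0],
    [ 62, 1683,  8,  17,  0, 0],
    [  3,  210, 45,   6,  2, 0],
    [ 49,   48,  4, 103, 15, 0],
    [  5,   32,  2,  65, 30, 0],
    [  0,    3,  0,  12,  3, 0],
])
coarse = {"forward": [0, 1], "receding": [2], "stress": [3, 4, 5]}

total = C.sum()
fine_errors = total - np.trace(C)
# Collapse rows and columns into the three coarse categories.
names = list(coarse)
B = np.array([[C[np.ix_(coarse[r], coarse[c])].sum() for c in names] for r in names])
coarse_errors = total - np.trace(B)
print(total, fine_errors, coarse_errors)  # 2674 678 409
```

This gives 678 errors among the six labels and 409 errors (about 15.3%) when only the three coarse categories are distinguished.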
then the sequence $\bar{x}_1, \dots, \bar{x}_N$ minimizes the expected number of estimation errors
$$E(\text{errors} \mid y_1, \dots, y_N) = \sum_{n} p(x_n \neq \bar{x}_n \mid y_1, \dots, y_N)$$
We have not chosen this latter criterion because we want a sequence that behaves reasonably as a whole. It is the sequential nature of the labeling that captures the prosodic interpretation, so the most likely sequence $\hat{x}_1, \dots, \hat{x}_N$ seems a more reasonable choice.
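For comparison, the label-wise alternative, choosing each $\bar{x}_n$ to maximize its own marginal $p(x_n \mid y_1, \dots, y_N)$, can be sketched as follows. Because every factor in Eq. (2) is a normalized conditional distribution, the factors for later notes sum to one, so the marginals follow from a forward pass alone. `p_first` and `p_trans` are hypothetical caller-supplied functions.

```python
def posterior_decode(features, labels, p_first, p_trans):
    """Choose each x_n to maximise its marginal p(x_n | y_1, ..., y_N)."""
    # alpha[l] = p(x_n = l | y_1..y_N); it stays a normalized distribution,
    # so no rescaling against underflow is needed.
    alpha = {l: p_first(l, features[0]) for l in labels}
    decoded = [max(labels, key=alpha.get)]
    for n in range(1, len(features)):
        z = (features[n], features[n - 1])
        alpha = {x: sum(alpha[l] * p_trans(x, l, z) for l in labels) for x in labels}
        decoded.append(max(labels, key=alpha.get))
    return decoded
```

Unlike the most likely sequence, the output of this decoder need not be a coherent sequence: each label is chosen independently of its neighbours' choices.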
In an effort to measure what we believe to be most important, the perceived musicality of the performances, we performed a small user study. We took a subset of the most well-known melodies of the dataset and created audio files from the random, hand, and estimated annotations. We presented all three versions of each melody to a collection of 23 subjects who were students in the Jacobs School of Music at Indiana University, as well as some other comparably educated listeners. We regard the cohort as highly educated and sophisticated, musically speaking. The subjects were presented with random orderings of the three versions, with different orderings for each user, and asked to respond to the statement “The performance sounds musical and expressive” with the Likert-style ratings 1=strongly disagree, 2=disagree, 3=neutral, 4=agree, 5=strongly agree, as well as to rank the three performances in terms of musicality (the ranking does not always follow from the Likert ratings). Out of a total of 244 triples that were evaluated in this way, the randomly generated annotation received a mean score of 2.96, while the hand and estimated annotations received mean scores of 3.48 and 3.46. The rankings showed no preference for the hand annotations over the estimated annotations (p = .64), while both the hand and estimated annotations were clearly preferred to the random annotations (p = .0002, p = .0003).
Perhaps the most surprising aspect of these results is the high score of the random labellings: in spite of the meaningless nature of these labellings, the listeners were, in aggregate, “neutral” in judging the musicality of the examples. We believe the reason for this is that musical prosody, the focus of this research, accounts for only a portion of what listeners respond to. All of our examples were rendered with the same sound engine of Section 4, which tries to create a sense of smoothness in the delivery with appropriate use of vibrato and timbral variation. We imagine that the listeners were partly swayed by this appropriate affect, even when the use of stress was not satisfactory. The results also show that our estimation produced annotations that were essentially as good as the hand-labeled annotations. This demonstrates, to some extent, a success of our research, though it may also reflect a limit in the expressive range of our interpretation representation. Finally, while the computer-generated interpretations clearly demonstrate some musicality, the listener rating of 3.46, roughly halfway between “neutral” and “agree,” shows there is considerable room for improvement.
References
1. Goebl, W., Dixon, S., De Poli, G., Friberg, A., Bresin, R., Widmer, G.: Sense in expressive music performance: Data acquisition, computational studies, and models, ch. 5, pp. 195–242. Logos Verlag, Berlin (2008)
2. Widmer, G., Goebl, W.: Computational models for expressive music performance:
The state of the art. Journal of New Music Research 33(3), 203–216 (2004)
3. Todd, N.P.M.: The kinematics of musical expression. Journal of the Acoustical
Society of America 97(3), 1940–1949 (1995)
4. Widmer, G., Tobudic, A.: Playing Mozart by analogy: Learning multi-level timing
and dynamics strategies. Journal of New Music Research 33(3), 203–216 (2003)
5. Hiraga, R., Bresin, R., Hirata, K., Katayose, H.: Rencon 2004: Turing Test for
musical expression. In: Proceedings of the 2004 Conference on New Interfaces for
Musical Expression (NIME 2004), pp. 120–123 (2004)
6. Hashida, Y., Nakra, T., Katayose, H., Murao, Y.: Rencon: Performance Rendering
Contest for Automated Music Systems. In: Proceedings of the 10th Int. Conf. on
Music Perception and Cognition (ICMPC 10), Sapporo, Japan, pp. 53–57 (2008)
7. Sundberg, J.: The KTH synthesis of singing. Advances in Cognitive Psychology.
Special issue on Music Performance 2(2-3), 131–143 (2006)
8. Friberg, A., Bresin, R., Sundberg, J.: Overview of the KTH rule system for musical
performance. Advances in Cognitive Psychology 2(2-3), 145–161 (2006)
9. Roads, C.: The Computer Music Tutorial. MIT Press, Cambridge (1996)
10. Palmer, C.: Music Performance. Annual Review of Psychology 48, 115–138 (1997)
11. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression
Trees. Wadsworth and Brooks, Monterey (1984)
Evaluating Tonal Distances between Pitch-Class
Sets and Predicting Their Tonal Centres by
Computational Models
Atte Tenkanen
Abstract. The pitch-class set belongs to the core concepts within musical set theory. The mathematical properties of pitch-class sets (in terms of interval-class content, evenness, etc.) as well as their mutual relations to other sets have been widely studied. In this paper, we concentrate on investigating them as carriers of tonal implications. Results provided by four algorithmic models, which propose hypothetical tonal centres for pitch-class sets, are compared. In addition to finding reference pitch class(es) for each set class of cardinality 3-9, the models are used for evaluating tonal distances between pitch-class sets. They are applied as ‘similarity measures’ in conjunction with an automated, computer-aided analysis method called comparison set analysis.
1 Introduction
In its narrowest sense, the concept of tonality is encapsulated within the concept of tonal centre (TC). Following Huovinen [1, xvii], the tonal centre is defined as a ‘reference pitch class that attains the greatest stability in a musical passage or in a tonally perceived local musical object’. A pitch-class set (PCS) may be seen as such an object. Traditionally, music theorists have not conceived of pitch-class sets primarily as carriers of tonal implications. Instead, the discussions have centred on their symmetrical properties, interval-class contents and other features that are easily verified. However, all PCSs except the empty set and the chromatic aggregate are, at least in theory, able to induce tonal implications [2].
There are two main aims in this paper: four different algorithmic models are applied 1) to predict the tonal centre(s) for any unordered PCS and 2) to evaluate a ‘tonal distance’ or ‘tonal stability’ between two PCSs. Our models take a PCS as an input vector and produce a 12-dimensional vector containing a resulting weight for each pitch class (0-11). For the first aim, one pitch-class set is entered into the model (see Figure 1a). The index (0-11) of the greatest value in the resulting vector is a hypothetical TC of the PCS. In order to evaluate the tonal distance between two PCSs, their 12-dimensional resulting vectors are compared using the correlation distance. Similar algorithms have been used for both procedures (cf. [3], [4]). The latter approach is used in conjunction with comparison set analysis (CSA) [5] in Section 4. The main aim in CSA is to create representations of extensive musical surfaces that, for their part, expose the prevalence of a particular comparison set throughout
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 245–257, 2009.
© Springer-Verlag Berlin Heidelberg 2009
[Figure 1 diagram: (a) PC-set → algorithmic model (1-4) → 12D-vector → tonal centre = index of the maximum value; (b) PC-set 1 and PC-set 2 → algorithmic model (1, 3, 4) → 12D-vectors → correlation distance]
Fig. 1. Two applications that are based on the algorithmic models. The first application (a) produces a hypothetical tonal centre or centres for any pitch-class set. The second application (b) evaluates the tonal distance between two pitch-class sets.
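Once a model has produced its 12-dimensional vectors, the two applications in Figure 1 reduce to an argmax and a correlation distance. In this sketch we assume the correlation distance is 1 minus the Pearson correlation (the definition used, for example, by SciPy's `scipy.spatial.distance.correlation`); ties for the maximum are returned as several tonal centres. The function names are hypothetical.

```python
import numpy as np

def tonal_centres(weights):
    """Application (a): the pitch class(es) with the greatest weight."""
    weights = np.asarray(weights, dtype=float)
    return np.flatnonzero(weights == weights.max()).tolist()

def correlation_distance(u, v):
    """Application (b): 1 minus the Pearson correlation of two 12-D vectors.
    0 means identical profiles up to scale and offset; 2 means opposite ones."""
    return 1.0 - np.corrcoef(u, v)[0, 1]
```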
Table 1. The basic properties of the four algorithmic models presented in Sections 2.1-2.4
1
This might be, for example, a pitch-class collection or a numerical string of rhythmic
units.
By using several different models, we aim to obtain more generic results than would be possible with only one or two. In fact, the hypotheses and models could be combined in other ways as well in order to produce even more alternatives. Regarding the tonal distance applications, if the results agree substantially, the models can be applied to CSA with greater confidence.
However, we do not suggest that there are ’correct’ and unambiguous answers
as far as TCs are concerned. On the contrary, we see alternative solutions as
being equally plausible, in the same way that different listeners may have diverse
strategies for selecting a TC for a chord [1, x]. The degree to which the results
are reliable is an issue that requires further comparisons between the model
predictions and the results of perceptual and cognitive experiments.
In Section 3, we discuss differences between the results provided by the different methods and explain our findings in Table 4 (Appendix C), which presents TC candidates for some sample set classes² along with some other information in condensed form. Although all SCs of cardinality 3 to 9 are used to make comparisons between all the approaches (Fig. 5), we have included only some results for SCs of cardinality 3 and 6 for demonstration purposes. Finally, in Section 4, we apply the distance-based method to tonal analysis of the Intermezzo from the opera Wozzeck by Alban Berg in order to show that the methods apply to all kinds of pitch-class sets, not only to the sets familiar from tonal contexts on which studies of tonality induction have often concentrated.
2 Algorithmic Models
Parncutt [3,2] uses a first-order polynomial W(t) = p · w(t) for predicting the perceptual root(s) of a PCS. Vector w contains ‘root-support weights’. In [8] he proposes w = {10,0,1,0,3,0,0,5,0,0,2,0}, which are estimates of the root-support weights for the ordered intervals between the fundamental of a harmonic complex tone and its first eight harmonics. p denotes the components of a PCS as a binary vector: for example, in the case of the C-major triad, p = {1,0,0,0,1,0,0,1,0,0,0,0}. t (t = 0, ..., 11) indicates a cyclic permutation of w. Thus, the result vector W is a series of pc-weights. For the C-major triad, W = {18,0,3,3,10,6,2,10,3,7,1,0}, which means that when a C-major triad is sounded, C (W = 18) is more salient than, for example, E or G (W = 10).³
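Parncutt's weighting can be reproduced directly. In this sketch `np.roll(w, t)` plays the role of the cyclic permutation w(t), placing the interval-0 weight on candidate root t; the function name is our own. It reproduces the W vector quoted above for the C-major triad.

```python
import numpy as np

# Parncutt's root-support weights [8], indexed by interval above the root.
w = np.array([10, 0, 1, 0, 3, 0, 0, 5, 0, 0, 2, 0])

def pc_weights(pcs, weights=w):
    """W(t) = p . w(t) for t = 0..11, where p is the binary pc vector and
    w(t) is the weight vector rotated so that its interval-0 entry lies on t."""
    p = np.zeros(12, dtype=int)
    p[list(pcs)] = 1
    return np.array([p @ np.roll(weights, t) for t in range(12)])

print(pc_weights([0, 4, 7]).tolist())  # [18, 0, 3, 3, 10, 6, 2, 10, 3, 7, 1, 0]
```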
We used Parncutt’s model in our first algorithm, but instead of estimating the weights from overtone approximations, we turned the procedure around: the weights were ‘tuned’ anew by defining the desired TCs for a limited number of PCSs and by requiring that the function satisfy all these constraints. This
2
I.e., prime-form PCSs.
3
E.g. for the C-major triad, W(0) = p · w(0): {1,0,0,0,1,0,0,1,0,0,0,0} · {10,0,1,0,3,0,0,5,0,0,2,0} = 10 + 3 + 5 = 18.
4
"Basically, a CSP is a problem composed of a finite set of variables, each of which
is associated with a finite domain, and a set of constraints that restricts the values
the variables can simultaneously take. The task is to assign a value to each variable
satisfying all the constraints." [9, 1] A more precise and formal explanation can be
found in [10, 137].
5
The resulting vector of a Monte Carlo process was normalized by the sum of the vector
components.
6
I.e., by using the generator 7 of the cyclic group Z12 under addition modulo 12, all pitch classes can be obtained from any given pitch class. This allows for distance measurements between any two pitch classes.
where c_ij ranges over all the COF-distances between the PCSs A and B. To demonstrate, the cofrel value between the PCSs A = {0, 4, 7, 10} and B = {0, 5, 9} is
√((0² + 1² + 3² + ... + 2² + 1² + 5²)/(4 · 3)) = √(91/12) ≈ 2.75.⁷
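The worked example implies that cofrel is the root mean square of the circle-of-fifths distances over all |A|·|B| pairs of pitch classes. The sketch below is our reconstruction from that example rather than the author's code; following footnote 6, multiplying an interval by 7 modulo 12 converts it into steps on the circle of fifths.

```python
import math

def cof_distance(a, b):
    """Steps between pitch classes a and b on the circle of fifths (0-6).
    Multiplying an interval by 7 (mod 12) converts semitones to fifths."""
    d = (7 * (a - b)) % 12
    return min(d, 12 - d)

def cofrel(A, B):
    """Root mean square of the COF distances c_ij over all pairs (a_i, b_j)."""
    squares = [cof_distance(a, b) ** 2 for a in A for b in B]
    return math.sqrt(sum(squares) / (len(A) * len(B)))

print(round(cofrel([0, 4, 7, 10], [0, 5, 9]), 2))  # 2.75
```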
[Figure 3 diagram. 1. Main components: core function COF-relation(A,B). 2. Learning part: desired constraints such as TC(0,4,7)=0, TC(0,4,9)=9 and TC(0,4,7,10)=5; a reference set B that satisfies all the constraints is selected, e.g. (0,2,3,4) with its transpositions. 3. Analytical part: input pitch-class set A, e.g. (2,5,9,11); output: the transposition giving the smallest value, e.g. TC(2,5,9,11)=2.]
Fig. 3. The COF-based algorithm in detail. In the example, the algorithm proposes pc2 (d) as a TC for the d-minor triad with an added major sixth {2,5,9,11}.
[Figure 4 diagram: input pattern (pc-set) 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0; desired output (subject responses) 32, 53, 0, 0, 0, 0, 0, 0, 50, 0, 0, 0]
Fig. 4. Training the MLP used in the third TC-algorithm. The binary vector {1,1,0,1,1,0,0,1,1,0,0,0} is associated with the SC 6-Z19A; cf. the first row of Table 2, Appendix A.
After training the network weights, all of the SCs of cardinality 3 to 9 were entered into the network, and for each of them a vector of TC-weights was calculated. This procedure was repeated 300 times and, finally, a normalized mean vector was assigned to each SC. These (‘tonal profile’) vectors were used as the basis for the information in column 5 of Table 4 in Appendix C.
where the PCS is represented as a row vector v_pcset = (v0, v1, ..., v11). The value of vi is 1 if pitch class i is in the PCS; otherwise the entry is 0. For example, for PCS {2,5,9,11}, v_pcset = (0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1). The columns of the matrix M consist of all 24 K-K profiles, from the C-major profile to the b-minor profile¹⁴. The values are standardized by the length of the pcset-vector, i.e. divided by the number of pitch classes in the set. The d-minor triad with added major sixth {2,5,9,11} produces a tonal profile
14
The key profile for C major is (6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66,
2.29, 2.88) and for C minor (6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69,
3.34, 3.17) [11].
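[Figure 5 diagram: the four algorithms P, COF, H and KKP with the number of shared TC candidates for each pair; P and COF share 685, and the other pairs share 523, 549, 463, 550 and 585]
Fig. 5. The number of TC candidates in common between the four algorithms. P refers to the first, COF to the second, H to the third and KKP to the fourth algorithm, as presented in Section 2.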
(3.53, 2.82, 4.38, 2.82, 3.45, 4.23, 2.92, 3.84, 2.68, 4.08, 3.67, 3.37, 3.23, 3.15, 4.79, 3.30, 3.58, 3.54, 4.01, 3.55, 3.32, 4.34, 3.30, 4.40). Hence, the greatest value is assigned to d minor (4.79).
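The K-K tonal profile computation can be sketched as below, using the C-major and C-minor profiles of footnote 14 and rotating them to obtain all 24 keys. We read “standardized by the length of the pcset-vector” as dividing by the number of pitch classes in the set; with that reading the sketch reproduces the values quoted for {2,5,9,11}, e.g. 3.53 for C major and 4.79 for d minor. Names such as `kk_tonal_profile` are hypothetical.

```python
import numpy as np

# Krumhansl-Kessler profiles for C major and C minor (footnote 14).
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
KEYS = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

# Columns of M: C major ... B major, then c minor ... b minor.
# np.roll(profile, k) moves the tonic weight onto pitch class k.
M = np.column_stack([np.roll(MAJOR, k) for k in range(12)] +
                    [np.roll(MINOR, k) for k in range(12)])

def kk_tonal_profile(pcs):
    """v_pcset @ M, divided by the number of pitch classes in the set."""
    v = np.zeros(12)
    v[list(pcs)] = 1
    return v @ M / len(pcs)

profile = kk_tonal_profile([2, 5, 9, 11])
best = int(np.argmax(profile))
mode = "major" if best < 12 else "minor"
print(f"{KEYS[best % 12]} {mode} {profile[best]:.4f}")  # D minor 4.7875
```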
In order to compare our models, we entered all the SCs of cardinality 3-9 (336 SCs) into the four TC-algorithms (henceforth abbreviated as P, COF, H and KKP) and selected the three most probable TCs¹⁵ predicted by each algorithm for each SC. The resulting values for some selected trichordal and hexachordal SCs are presented in Table 4, columns 3-6. The leftmost value always denotes the most probable TC according to the algorithm, and so forth.
The number of TC candidates in common between each pair of algorithms is given in Figure 5. The first two models have the most candidates in common (685/1008 = 68%). The order of the three TC candidates is not taken into account. Although the models are quite different, using the same constraints in both algorithms seems to produce similar results. The model based on KKP-profiles seems to differ most from the others. All of the algorithms propose the same three TC candidates for four SCs (4-7, 6-Z19B, 6-20 and 7-6A).¹⁶
15
That means 3 × 336 = 1008 TC candidates per algorithm, except that the last algorithm might propose the same TC twice, i.e. for the major and minor keys on the same tonic, e.g. F major and f minor in the case of SC 3-4A (see Table 4 in Appendix C, the KKP column). It produced 948 different TC candidates altogether.
16
If we think about the consistency of the results across the four algorithms, four cases out of 336 (1.2%) may appear quite insignificant. On the other hand, if all of the TC-combinations consisting of three candidates were equally probable in the results for each algorithm, the theoretical probability that they would all give the same three candidates would be as small as 1/C(12,3)³ = 1/220³ ≈ 9.4 × 10⁻⁸. It has to be remembered, however, that the algorithms are based on different underlying assumptions and, thus, they ‘speak with a different voice’.
Which pitch classes, then, are the best TC candidates for an SC according to the results? For example, for SC 3-1 there are 6 candidates altogether: 9, 0, 1, 7, 2 and 6 (see Appendix C; the last, ‘dispersion’, column gives the number of different TC candidates predicted by the models). To answer the question about the best TC candidates for an SC, we ranked all the candidates in each cell and counted ‘points’: the best candidates for SC 3-1 are thus pc0 and pc2 (see the ‘Rank’ column), because pc0 is ranked once in the first position (3 points) and twice in the second position (2 × 2 points), which gives 7 out of a maximum of 12 points (2+3+0+2 across the four algorithms). The second candidate, pc2, also scores 7 points, and the third candidate, pc7, scores 4 points.
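The point counting can be written as a simple Borda-style tally. We assume 3, 2 and 1 points for the first, second and third positions, which is consistent with the maximum of 12 points over four algorithms; the example rankings are invented for illustration and are not the Table 4 values.

```python
from collections import Counter

POINTS = (3, 2, 1)  # assumed points for first, second and third position

def rank_candidates(rankings):
    """rankings: one best-first list of up to three TC candidates per algorithm.
    Returns (pitch_class, points) pairs, highest total first."""
    score = Counter()
    for ranking in rankings:
        for position, pc in enumerate(ranking):
            score[pc] += POINTS[position]
    return sorted(score.items(), key=lambda item: (-item[1], item[0]))

# Invented rankings for four algorithms (not the Table 4 values).
print(rank_candidates([[2, 0, 7], [0, 2, 9], [1, 7, 2], [2, 0, 6]])[:2])  # [(2, 9), (0, 7)]
```

In this invented example pc0 again collects 2+3+0+2 = 7 points, while pc2 collects 9.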
In addition, we checked whether the winning candidates form perfect fifths between them. For SC 3-1, there is a perfect fifth between pc0 and pc7 as well as between pc7 and pc2. Thus, the ‘bottom pitch classes’ of both fifths, pc0 and pc7, are marked as ‘tonic’ in the Tonic column. Finally, we distinguished the tonic candidates according to whether they belong to the SC: tonic pitch classes that are not members of that SC are underlined. An SC that exhibits this quality is comparable to a dominant chord, whose tonic lies outside the chord.
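The perfect-fifth check amounts to looking for candidate pairs seven semitones apart (mod 12). A minimal sketch, with a hypothetical function name:

```python
def fifth_bottoms(candidates):
    """Candidates lying a perfect fifth (7 semitones, mod 12) below another candidate."""
    cands = set(candidates)
    return sorted(pc for pc in cands if (pc + 7) % 12 in cands)

print(fifth_bottoms([0, 2, 7]))  # [0, 7]
```

For the SC 3-1 winners pc0, pc2 and pc7 it returns pc0 and pc7, the ‘bottom pitch classes’ named above.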
To present a more challenging example, we considered the TC candidates of the Promethean SC 6-34A. The first and second algorithms rank pc10 as the strongest TC, but the third and fourth algorithms rank pc9 and pc0 instead. Jim Samson [12, 156-7] has pointed out that in Scriabin’s music this chord may take on a dominant quality of Eb or A (i.e. pc3 or pc9; Samson’s sample chord is in this case in prime-form transposition). Nevertheless, we can accept the pc10 interpretation given by the models as well, as can be heard by playing the Promethean chord followed by the note Bb or a Bb-major triad.¹⁷ The author is not aware whether Scriabin used his Promethean chord in this way; that would be an issue for further study.
Fig. 6. Alban Berg: Intermezzo from Wozzeck. Correlation distances (P, H, KKP)
and tonal stability values (COF) calculated using the harmonic d-minor collection
{1,2,4,5,7,9,10} as a comparison set. The comparison curve produced by using the K-K
d-minor profile is also added to the figure (KK, gray line).
distance and the cofrel-stability offer a more flexible way to evaluate tonalities in the piece. A suitable comparison set is required, against which the musical segments derived from the piece are compared.
For analysis, we chose the movement Intermezzo (‘Invention on a Key (d minor)’) from the opera Wozzeck by Alban Berg. It is found in the fourth scene of the third act, bars 320-371. We wanted to assess to what extent the d-minor tonality actually occurs in this section.
To begin with, all of the note onsets of the Intermezzo were clustered into PCSs of cardinality 7 according to their temporal proximity to their nearest neighbours. In practice, this means that a heptachord consisting of the nearest pitch classes was assigned to each unique note onset time (we follow here the segmentation method introduced in [13]). These heptachordal segments were compared with the harmonic d-minor scale {1,2,4,5,7,9,10} using the correlation distance along with the first (P), the third (H) and the fourth (KKP) algorithm, and using the cofrel-stability function (COF) as such. Musical properties like duration, voicing and loudness were not taken into consideration. Thereafter, the correlation distance values assigned to individual segments were averaged over each bar, and the resulting values were normalized to zero mean and unit variance for the purpose of comparability. To facilitate comparison, we added a curve using the K-K d-minor profile: each pitch class in each heptachordal segment¹⁸ was assigned a value according to the d-minor key profile, and these values were averaged over each bar. Thus, five different curves, seen in Figure 6, were produced.
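The segmentation and bar averaging described above can be sketched as follows. Details not given in the text are our assumptions: ties in temporal distance are broken by input order, bar numbers are supplied as one per segment, and the normalization uses the population standard deviation. The function names are hypothetical.

```python
import numpy as np

def heptachords(notes, size=7):
    """notes: list of (onset_time, pitch_class) pairs.

    For each distinct onset time, gather pitch classes from the temporally
    nearest notes until `size` distinct pitch classes are found (fewer if
    the passage has fewer).  Ties in distance keep the input order."""
    segments = []
    for t in sorted({onset for onset, _ in notes}):
        nearest_first = sorted(notes, key=lambda note: abs(note[0] - t))
        pcs = []
        for _, pc in nearest_first:
            if pc not in pcs:
                pcs.append(pc)
                if len(pcs) == size:
                    break
        segments.append((t, frozenset(pcs)))
    return segments

def bar_profile(values, bars):
    """Average per-segment values within each bar, then normalize the bar
    averages to zero mean and unit (population) standard deviation."""
    values, bars = np.asarray(values, dtype=float), np.asarray(bars)
    means = np.array([values[bars == b].mean() for b in np.unique(bars)])
    return (means - means.mean()) / means.std()
```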
D minor seems to dominate the tonality at the beginning and the end of the section, as well as in bars 345 and 365. All five approaches, even the COF-relation, which is based on a purely mathematical model, seem to correlate with each other, in most cases strongly. This is confirmed numerically by the correlation estimates calculated between the different curves (Table 3, Appendix B), which range from 0.49 (COF and H) to 0.98 (P and KKP).
Another question is how the average distance values would look if some other transposition of the harmonic minor scale, or another type of reference set, were used instead. After using every transposition of the harmonic minor scale and calculating the mean of bar averages (without normalization), the d-minor collection was found to produce the lowest mean value. Furthermore, when all heptachordal PCSs were entered into the procedure, the lowest mean value was attained by the PCS {0,1,2,5,7,9,10}, which differs from the harmonic d-minor scale only in that pc4 is replaced by pc0. Although the music is quite chromatic¹⁹ throughout, the d-minor key thus not only articulates it at the beginning and the end but also dominates it overall, in the sense of yielding the lowest mean distance.
18
Thus, the basis of calculations was the same for all approaches.
5 Conclusions
References
1. Huovinen, E.: Pitch-Class Constellations: Studies in the Perception of Tonal Centricity. Acta Musicologica Fennica 23, Turku. The Finnish Musicological Society (2002)
2. Parncutt, R.: Tonal implications of harmonic and melodic Tn-types. In: Proceedings of Mathematics and Computation in Music, MCM 2007. Springer, Berlin (in press) (2007)
3. Parncutt, R.: Revision of Terhardt’s psychoacoustical model of the root(s) of a
musical chord. Music Perception 6, 65–94 (1988)
4. Krumhansl, C.L.: Cognitive Foundations of Musical Pitch. Oxford University Press,
Oxford (1990)
5. Huovinen, E., Tenkanen, A.: Bird’s Eye Views of the Musical Surface - Methods
for Systematic Pitch-Class Set Analysis. Music Analysis 26(1-2), 159–214 (2007)
6. Chew, E.: Towards a Mathematical Model of Tonality. PhD Dissertation. MIT
(2000)
19
The distribution of pitch classes is fairly even. The numbers of summed occurrences for the twelve pitch classes are 41, 38, 43, 34, 43, 48, 29, 38, 32, 36, 37 and 29 when their occurrences are counted at most once per bar.
7. Tenkanen, A., Gualda, F.: Detecting Changes in Musical Texture. In: Extended
Abstracts of International Workshop on Machine Learning and Music 2008 (2008),
http://www.iua.upf.es/~rramirez/MML08/abstracts.pdf
8. Parncutt, R.: A model of the perceptual root(s) of a chord accounting for voicing
and prevailing tonality. In: Leman, M. (ed.) JIC 1996. LNCS, vol. 1317, pp. 181–
199. Springer, Heidelberg (1997)
9. Tsang, E.P.K.: Foundations of Constraint Satisfaction. Academic Press, London
(1993)
10. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall Series in Artificial Intelligence (2003)
11. Krumhansl, C.L., Kessler, E.J.: Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review 89(4), 334–368 (1982)
12. Samson, J.: Music in Transition: A Study of Tonal Expansion and Atonality, 1900-1920. W.W. Norton & Company, New York (1977)
13. Tenkanen, A., Gualda, F.: Multiple Approaches to Comparison Set Analysis.
Springer, Heidelberg (in press)
APPENDIX A
Table 2. A response distribution for the trials in Experiment 5 by Huovinen, table 9.4.1 in [1, 258]. The SCs are shown in their prime forms with the set members underlined. Columns give the responses for pitch classes 0-11.
SC        0   1   2   3   4   5   6   7   8   9  10  11
6-Z19A   32  53   1  11   7   5   2  10  50   0   0   1
6-Z19B   52  35   1   1  13  41   0  11  13   1   3   1
6-20     14  36   6   2  13  27   3   2  15  46   3   5
6-Z26    40  25   1  12   0  12   3  15  55   2   4   3
6-Z29    17  29   0   8   4   0  26   1  64  20   2   1
6-32     59   2  28   1  10  15   1  17   2  33   1   3
6-33A    38   1  18  14   3  55   0  12   0  20  10   1
6-33B    24   1  55   2  16   2  12  17   1  38   0   4
6-Z49    14   8   5  10  29   1   4   7   5  84   1   4
6-Z50    16  20   4   3  12   0  25  10   2  75   2   3
APPENDIX B
Table 3. Correlation estimates between the different approaches, explained in Section 4. The Average row includes the diagonal entries.
         P     COF   H     KKP   KK
P        1     0.81  0.81  0.98  0.94
COF      0.81  1     0.49  0.79  0.75
H        0.81  0.49  1     0.79  0.82
KKP      0.98  0.79  0.79  1     0.92
KK       0.94  0.75  0.82  0.92  1
Average  0.91  0.77  0.78  0.90  0.89
APPENDIX C
Three Conceptions of Musical Distance
Dmitri Tymoczko
Abstract. This paper considers three conceptions of musical distance (or inverse “similarity”) that produce three different musico-geometrical spaces: the first, based on voice leading, yields a collection of continuous quotient spaces or orbifolds; the second, based on acoustics, gives rise to the Tonnetz and related “tuning lattices”; while the third, based on the total interval content of a group of notes, generates a six-dimensional “quality space” first described by Ian Quinn. I will show that although these three measures are in principle quite distinct, they are in practice surprisingly interrelated. This produces the challenge of determining which model is appropriate to a given music-theoretical circumstance. Since the different models can yield comparable results, unwary theorists could potentially find themselves using one type of structure (such as a tuning lattice) to investigate properties more perspicuously represented by another (for instance, voice-leading relationships).
1 Introduction
We begin with voice-leading spaces that make use of the log-frequency metric.¹ Pitches here are represented by the logarithms of their fundamental frequencies, with distance measured according to the usual metric on ℝ; pitches are therefore “close” if they are near each other on the piano keyboard. A point in ℝⁿ represents an ordered series of pitches. Distance in this higher-dimensional space can be interpreted as the aggregate distance moved by a collection of musical “voices” in passing from one chord to another. (We can think of this, roughly, as the aggregate physical distance traveled by the fingers on the piano keyboard.) By disregarding information, such as the octave or order of a group of notes, we “fold” ℝⁿ into a non-Euclidean quotient space or orbifold. (For example, imposing octave equivalence transforms ℝⁿ into the n-torus Tⁿ, while transpositional equivalence transforms ℝⁿ into ℝⁿ⁻¹, orthogonally projecting points onto the hyperplane whose coordinates sum to zero.) Points in the resulting orbifolds represent equivalence classes of musical objects, such as chords or set classes, while “generalized line segments” represent equivalence classes of voice leadings.² For example, Figure 1, from Tymoczko 2006,
1
For more on these spaces, see Callender 2004, Tymoczko 2006, and Callender, Quinn, and
Tymoczko 2008.
2
The adjective “generalized” indicates that these “line segments” may pass through one of the
space’s singular points, giving rise to mathematical complications.
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 258–272, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Three Conceptions of Musical Distance 259
represents the space of two-note chords, while Figure 2, from Callender, Quinn, and
Tymoczko 2008, represents the space of three-note transpositional set classes. In both
spaces, the distance between two points represents the size of the smallest voice lead-
ing between the objects they represent.
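To make the log-frequency picture concrete, here is a minimal Python sketch. It assumes MIDI-style numbering (A4 = 440 Hz mapped to 69, one unit per semitone, so middle C is 60), and the function names are illustrative rather than taken from the paper. It sizes an ordered voice leading by the L1 ("taxicab") sum of the distances the voices move, which is only one of several possible metrics, and implements transpositional equivalence as the orthogonal projection onto the sum-zero hyperplane described above.

```python
import math

def log_pitch(freq_hz, ref_hz=440.0, ref_pitch=69.0):
    """Log-frequency pitch number: 12 units per octave, 1 unit per semitone.

    Assumes MIDI-style numbering, with A4 = 440 Hz mapped to 69.
    """
    return ref_pitch + 12.0 * math.log2(freq_hz / ref_hz)

def voice_leading_size(chord_a, chord_b):
    """Aggregate distance moved by the voices of an ordered voice leading,
    i.e. the L1 distance between two points of R^n."""
    return sum(abs(b - a) for a, b in zip(chord_a, chord_b))

def transposition_class(chord):
    """Project a point of R^n orthogonally onto the hyperplane whose
    coordinates sum to zero; all transpositions of a chord share one image."""
    mean = sum(chord) / len(chord)
    return [p - mean for p in chord]

c_major = [60, 64, 67]   # (C4, E4, G4)
a_minor = [60, 64, 69]   # (C4, E4, A4)
d_major = [62, 66, 69]   # C major transposed up two semitones

print(round(log_pitch(261.6256), 3))           # middle C: 60.0
print(voice_leading_size(c_major, a_minor))    # G4 -> A4 moves 2 semitones
print(transposition_class(c_major))
print(transposition_class(d_major))            # same point, up to float rounding
```

Taking the mean and subtracting it is exactly the orthogonal projection onto the plane $x_1 + \dots + x_n = 0$, since the normal to that plane is the all-ones vector.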
Fig. 1. The Möbius strip representing voice-leading relations among two-note chords. (Dyads are arranged in rows by interval: unisons along the boundary, then minor seconds, major seconds, minor thirds, major thirds, and perfect fourths, with tritones along the center.)
Let’s now turn to a very different sort of model, the Tonnetz and related structures,
which I will describe generically as "tuning lattices." These models are typically
discrete, with adjacent points on a particular axis being separated by the same interval.
The left-hand lattice in Figure 3 shows the most familiar of these structures, with
the two axes representing acoustically pure perfect fifths and major thirds. (One can
imagine a third axis, representing either the octave or the acoustical seventh, projecting
outward from the paper.) The model asserts that the pitch G4 has an acoustic
affinity to both C4 (its "underfifth") and D5 (its "overfifth"), as well as to E♭4 and B4
(its "underthird" and "overthird," respectively). The lattice thus encodes a fundamentally
different notion of musical distance than the earlier voice-leading models:
whereas A3 and A♭3 are very close in log-frequency space, they are four steps apart
on our tuning lattice. Furthermore, whereas chords (or more generally "musical objects")
are represented by points in the voice-leading spaces, they are represented by
polytopes in the lattices.3
Finally, there are measures of musical distance that rely on chords’ shared interval
content. From this point of view, the chords {C, C♯, E, F♯} and {C, D♭, E♭, G}
resemble one another, since they are "nontrivially homometric" or "Z-related": that is,
they share the same collection of pairwise distances between their notes. (For instance,
both contain exactly one pair that is one semitone apart, exactly one pair that
is two semitones apart, and so on.) However, these chords are not particularly close
3
For a modern introduction to the Tonnetz, see Cohn 1997, 1998, and 1999.
260 D. Tymoczko
in either of the two models considered previously. It is not intuitively obvious that
this notion of “similarity” produces any particular geometrical space. But Ian Quinn
has shown that one can use the discrete Fourier transform to generate (in the familiar
equal-tempered case) a six-dimensional “quality space” in which chords that share the
same interval content are represented by the same point.4 We will explore the details
shortly.
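The "same collection of pairwise distances" can be checked by tallying interval classes. The sketch below is illustrative (the helper name is ours) and assumes pitch classes are integers mod 12 with C = 0; it counts each unordered pair of notes by the shorter distance around the octave.

```python
from itertools import combinations

def interval_class_vector(pcs):
    """Count the unordered pairs of notes at each interval class 1..6.

    Pitch classes are integers mod 12 with C = 0 (an assumed convention).
    """
    vector = [0] * 6
    for a, b in combinations(pcs, 2):
        d = (b - a) % 12
        ic = min(d, 12 - d)        # interval class: the shorter way round
        if ic:                     # skip doubled notes (interval class 0)
            vector[ic - 1] += 1
    return vector

chord_x = [0, 1, 4, 6]   # {C, C#, E, F#}
chord_y = [0, 1, 3, 7]   # {C, Db, Eb, G}
print(interval_class_vector(chord_x))   # [1, 1, 1, 1, 1, 1]
print(interval_class_vector(chord_y))   # [1, 1, 1, 1, 1, 1]
```

Both chords contain every interval class exactly once, which is what makes them Z-related even though neither is a transposition or inversion of the other.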
Fig. 2. The cone representing voice-leading relations among three-note transpositional set
classes
Left (chromatic):               Right (diatonic):
A3  E4  B4  F♯5 C♯6             A3  E4  B4  F5  C6
F3  C4  G4  D5  A5              F3  C4  G4  D5  A5
Fig. 3. Two discrete tuning lattices. On the left, the chromatic Tonnetz, where horizontally
adjacent notes are linked by acoustically pure fifths, while vertically adjacent notes are linked
by acoustically pure major thirds. On the right, a version of the structure that uses diatonic
intervals.
Clearly, these three musical models are very different, and it would be somewhat
surprising if there were close connections between them. But we will soon see
that this is in fact the case.
4
See Lewin 1959, 2001, Quinn 2006, 2007, Callender 2007.
Fig. 4. (left) Most efficient voice leadings between diatonic fifths form a chain that runs
through the center of the Möbius strip from Figure 1. (right) These voice leadings form an
abstract circle, in which adjacent dyads are related by three-step diatonic transposition and are
linked by single-step voice leading.
Fig. 5. (left) Most efficient voice leadings between diatonic triads form a chain that runs
through the center of the orbifold representing three-note chords. (right) These voice leadings
form an abstract circle, in which adjacent triads are linked by single-step voice leading. Note
that here, adjacent triads are related by transposition by two diatonic steps.
Fig. 6. Major, minor, and augmented triads as they appear in the orbifold representing three-
note chords. Here, triads are particularly close to their major-third transpositions.
5
This graph was first discovered by Douthett and Steinbach (1998).
Fig. 7. Fifth-related diatonic scales form a chain that runs through the center of the
seven-dimensional orbifold representing seven-note chords. It is structurally analogous to the circles
in Figures 4 and 5.
Correlation     Major   Minor
Bach            .96     .95
Haydn           .93     .91
Mozart          .91     .91
Beethoven       .96     .96
Fig. 8. Correlations between modulation frequency and voice-leading distances among scales,
in Bach’s Well-Tempered Clavier and in the piano sonatas of Haydn, Mozart, and Beethoven.
The very high correlations suggest that composers typically modulate between keys whose
associated scales can be linked by efficient voice leading.
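The voice-leading side of Figures 7 and 8 can be illustrated with a brute-force sketch. This is our own illustration, not the procedure behind Figure 8: it assumes pitch classes are integers with C = 0, measures the smallest total semitone motion over every one-to-one pairing of two equal-sized pitch-class sets (each voice taking the shorter path around the octave), and applies it to diatonic scales along the circle of fifths. Function names are hypothetical.

```python
from itertools import permutations

def pc_distance(a, b):
    """Shortest distance between two pitch classes around the 12-note circle."""
    d = (b - a) % 12
    return min(d, 12 - d)

def minimal_voice_leading(chord_a, chord_b):
    """Smallest total (L1) motion over all one-to-one pairings of the notes.

    Exhaustive search over permutations; fine for up to about seven notes.
    """
    return min(
        sum(pc_distance(a, b) for a, b in zip(chord_a, perm))
        for perm in permutations(chord_b)
    )

def transpose(chord, t):
    return [(p + t) % 12 for p in chord]

C_MAJOR = [0, 2, 4, 5, 7, 9, 11]

# Scales k perfect fifths above C major (C, G, D, A, E, B, F#) for k = 0..6.
distances = [minimal_voice_leading(C_MAJOR, transpose(C_MAJOR, 7 * k))
             for k in range(7)]
print(distances)   # [0, 1, 2, 3, 4, 5, 6]
```

Each step around the circle of fifths alters one note by a single semitone, so scale distance grows by one per fifth; this is the chain shown in Figure 7.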
Fig. 9. On this three-dimensional Tonnetz, the C7 chord is represented by the tetrahedron whose
vertices are C, E, G, and B♭. The Cø7 chord is represented by the nearby tetrahedron C,
E♭, G♭, B♭, which shares the C–B♭ edge.
We will now investigate the way tuning lattices like the Tonnetz represent voice-leading
relationships among familiar sonorities. Here my argumentative strategy will
be somewhat different, since it is widely recognized that the Tonnetz has something to
do with voice leading. (This is largely due to the important work of Richard Cohn,
who has used the Tonnetz to study what he calls "parsimonious" voice leading.7) My
goal will therefore be to explain why tuning lattices are only an approximate model of
contrapuntal relationships, and only for certain chords.
The first point to note is that inversionally related chords on a tuning lattice are
near each other when they share common tones.8 For example, the Tonnetz represents
perfect fifths by line segments; fifth-related perfect fifths, such as {C, G} and {G, D},
are related by inversion around their common note, and are adjacent on the lattice
(Figure 3). Similarly, major and minor triads on the Tonnetz are represented by
triangles; inversionally related triads that share an interval, such as {C, E, G} and
{C, E, A}, are joined by a common edge. (On the standard Tonnetz, the more common
tones two chords share, the closer they will be: C major and A minor, which share two notes, are
closer than C major and F minor, which share only one.) In the three-dimensional
Tonnetz shown in Figure 9, where the z axis represents the seventh, C7 is near its
6
Similar points could potentially be made about the prevalence, in functionally tonal music, of
root-progressions by perfect fifth. It may be that the diatonic circle of thirds shown in Figure
5 provides a more perspicuous model of functional harmony than do more traditional fifth-
based representations.
7
See Cohn 1997.
8
This is not true of the voice-leading spaces considered earlier: for example, in three-note
chord space {C, D, F} is not particularly close to {F, A♭, B♭}.
inversion Cø7. The point is reasonably general, and does not depend on the particular
structure of the Tonnetz or on the chords involved: on tuning lattices, inversionally
related chords are close when they share common tones.9
The second point is that acoustically consonant chords often divide the octave
relatively evenly; such chords can be linked by efficient voice leading to those inversions
with which they share common notes.10 It follows that proximity on a tuning lattice
will indicate the potential for efficient voice leading when the chords in question are
nearly even and are related by inversion. Thus {C, G} and {G, D} can be linked by
the stepwise voice leading (C, G)→(D, G), in which C moves up by two semitones.
Similarly, the C major and A minor triads can be linked by the single-step voice leading
(C, E, G)→(C, E, A), and C7 can be linked to Cø7 by the two-semitone voice
leading (C, E, G, B♭)→(C, E♭, G♭, B♭). In each case the chords are also close on the
relevant tuning lattice. (Interestingly, triadic distances on the diatonic Tonnetz in Fig.
3 exactly reproduce the circle-of-thirds distances from Fig. 5.) This will not be true
for uneven chords: {C, E} and {E, G♯} are close on the Tonnetz, but cannot be linked
by particularly efficient voice leading; the same holds for {C, G, A♭} and
{G, A♭, E♭}. Tuning lattices are approximate models of voice leading only when one
is concerned with the nearly even sonorities that are fundamental to Western tonality.
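The voice-leading sizes quoted in this passage can be recomputed. The sketch below is an illustration under our own assumptions (pitch classes with C = 0, size measured as total semitone motion, each voice taking the shorter path, minimized over every pairing of notes); the helper names are hypothetical. It reports 2 semitones for the nearly even pairs and 4 for the uneven dyads {C, E} and {E, G♯}.

```python
from itertools import permutations

def pc_distance(a, b):
    """Shortest distance between two pitch classes around the octave."""
    d = (b - a) % 12
    return min(d, 12 - d)

def minimal_voice_leading(chord_a, chord_b):
    """Smallest total semitone motion over all one-to-one pairings of notes."""
    return min(
        sum(pc_distance(a, b) for a, b in zip(chord_a, perm))
        for perm in permutations(chord_b)
    )

PC = {"C": 0, "D": 2, "Eb": 3, "E": 4, "Gb": 6, "G": 7, "G#": 8, "A": 9, "Bb": 10}

def pcs(*names):
    return [PC[name] for name in names]

pairs = {
    "{C, G} -> {G, D}": (pcs("C", "G"), pcs("G", "D")),
    "C major -> A minor": (pcs("C", "E", "G"), pcs("C", "E", "A")),
    "C7 -> C half-diminished 7": (pcs("C", "E", "G", "Bb"), pcs("C", "Eb", "Gb", "Bb")),
    "{C, E} -> {E, G#}": (pcs("C", "E"), pcs("E", "G#")),
}
results = {label: minimal_voice_leading(a, b) for label, (a, b) in pairs.items()}
for label, size in results.items():
    print(f"{label}: {size}")
```

For {C, E}→{E, G♯} the best pairing keeps E fixed and moves C to G♯, a major third; no pairing does better, which is why lattice adjacency overstates the closeness of these uneven dyads.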
Fig. 10. On the Tonnetz, F major (triangle 3) is closer to C major (triangle 1) than F minor
(triangle 4) is. In actual music, however, F minor frequently appears as a passing chord between
F major and C major. Note that, unlike in Figure 3, I have here used a Tonnetz in which the
axes are not orthogonal; this difference is merely orthographic.
9
In the general case, the notion of “closeness” needs to be spelled out carefully, since chords
can contain notes that are very far apart on the lattice. In the applications we are concerned
with, chords occupy a small region of the tuning lattice, and the notion of “closeness” is
fairly straightforward.
10
See Tymoczko 2006 and 2008a. The point is relatively obvious when one thinks geometri-
cally: the two chords divide the pitch-class circle nearly evenly into the same number of
pieces; hence, if any two of their notes are close, then each note of one chord is near some
note of the other.
11
See Tymoczko 2006, and Hall and Tymoczko 2007. Metrics that violate the distribution
constraint have counterintuitive consequences, such as preferring “crossed” voice leadings to
their uncrossed alternatives. Here, the claim that A minor is closer to C major than F minor
leads to the F minor/F major problem discussed in Figure 10.
12
Here I use the L1 or “taxicab” metric. The correlation between Tonnetz distances and the
number of shared common tones is an even higher .9; however, “number of shared common
tones” is not interpretable as a voice-leading metric.
investigating triadic voice leading. I have argued that we should resist this
conclusion: if we use the Tonnetz to model chromatic music, then Schubert’s major-third
juxtapositions will seem very different from his habit of interposing F minor between
F major and C major, since the first can be readily explained using the Tonnetz
whereas the second cannot.13 The danger, therefore, is that we might find ourselves
drawing unnecessary distinctions between these two cases, particularly if we mistakenly
assume the Tonnetz is a fully faithful model of voice-leading relationships.
We conclude by investigating the relation between voice leading and the Fourier-based
perspective.14 The mechanics of the Fourier transform are relatively simple: for
any number n from 1 to 6, and every pitch class p in a chord, the transform assigns a
two-dimensional vector whose components are:

$$V_{p,n} = \left(\cos\frac{2\pi p n}{12},\ \sin\frac{2\pi p n}{12}\right)$$

Adding these vectors together, for one particular n and all the pitch-classes p in the
chord, produces a composite vector representing the chord as a whole—its “nth Fou-
rier component.” The length (or “magnitude”) of this vector, Quinn observes, reveals
something about the chord’s harmonic character: in particular, chords saturated with
(12/n)-semitone intervals, or intervals approximately equal to 12/n, tend to score
highly on this index of chord quality.15 The Fourier transform thus seems to quantify
the intuitive sense that chords can be more-or-less diminished-seventh-like, perfect-
fifthy, or whole-toneish. Interestingly, “Z-related” chords—or chords with the same
interval content—always score identically on this measure of chord-quality. In this
sense, Fourier space (the six-dimensional hypercube whose coordinates are the Fou-
rier magnitudes) seems to model a conception of similarity that emphasizes interval
content, rather than voice leading or acoustic consonance.
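The transform just described is short enough to write out. In this Python sketch (function names are ours), each pitch class p contributes the vector $V_{p,n}$, the vectors are summed, and the length of the sum is returned; pitch classes may be non-integers, following the continuous notation in which the octave always has size 12. The example checks two claims from the text: the augmented triad maximizes FC3, and the Z-related chords {C, C♯, E, F♯} and {C, D♭, E♭, G} occupy the same point of quality space.

```python
import math

def fourier_magnitude(chord, n):
    """Length of the nth Fourier component: the sum over pitch classes p of
    V_{p,n} = (cos(2*pi*p*n/12), sin(2*pi*p*n/12))."""
    x = sum(math.cos(2 * math.pi * p * n / 12) for p in chord)
    y = sum(math.sin(2 * math.pi * p * n / 12) for p in chord)
    return math.hypot(x, y)

def quality_point(chord):
    """The six magnitudes |FC1| .. |FC6|, i.e. a point in the quality space."""
    return [fourier_magnitude(chord, n) for n in range(1, 7)]

augmented = [0, 4, 8]
print(round(fourier_magnitude(augmented, 3), 6))   # 3.0: all three vectors coincide

z1 = quality_point([0, 1, 4, 6])   # {C, C#, E, F#}
z2 = quality_point([0, 1, 3, 7])   # {C, Db, Eb, G}
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(z1, z2)))   # True
```

The squared magnitude equals the number of notes plus a sum of cosines of the pairwise intervals, so it depends only on interval content; that is why Z-related chords always coincide.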
However, there is again a subtle connection to voice leading: it turns out that the
magnitude of a chord’s nth Fourier component is approximately linearly related to the
(Euclidean) size of the minimal voice leading to the nearest subset of any perfectly
even n-note chord.16 For instance, a chord’s first Fourier component (FC1) is ap-
proximately related to the size of the minimal voice leading to any transposition of
{0}; the second Fourier component is approximately related to the size of the minimal
voice leading to any transposition of either {0} or {0, 6}; the third component is ap-
proximately related to the size of the minimal voice leading to any transposition of
either {0}, {0, 4} or {0, 4, 8}, and so on. Figure 11 shows the location of the subsets
13
See Cohn 1999.
14
The material in this section appears in Tymoczko 2008b. It is influenced by Robinson
(2006), Hoffman (2007), and Callender (2007).
15
Here I use continuous pitch-class notation where the octave always has size 12, no matter
how it is divided. Thus the equal-tempered five-note scale is labeled {0, 2.4, 4.8, 7.2, 9.6}.
16
Here I measure voice-leading using the Euclidean metric, following Callender 2004. See
Tymoczko 2006 and 2008a for more on measures of voice-leading size.
of the n-note perfectly even chord, as they appear in the orbifold representing three-
note set-classes, for values of n ranging from 1 to 6.17 Associated to each graph is one
of the six Fourier components. For any three-note set class, the magnitude of its nth
Fourier component is a decreasing function of the distance to the nearest of these
marked points: for instance, the magnitude of the third Fourier component (FC3) de-
creases, the farther one is from the nearest of {0}, {0, 4} and {0, 4, 8}. Thus, chords
in the shaded region of Figure 12 will tend to have a relatively large FC3, while those
in the unshaded region will have a smaller FC3. Figure 13 shows that this relationship
is very nearly linear for twelve-tone equal-tempered trichords.
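The linear relationship plotted in Figure 13 can be checked numerically. The sketch below is our reconstruction, not the author's code. It assumes the 19 trichord types of 12-tone equal temperament (three-note multisets up to transposition and inversion, matching the types plotted in Figure 13), measures the Euclidean size of the smallest voice leading to any subset, with repetitions allowed, of any transposition of {0, 4, 8}, and fits a least-squares line against the FC3 magnitude. Under these assumptions it yields a slope of about −1.38, an intercept of about 3.16, and a Pearson correlation of about −0.97. All helper names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

def fourier_magnitude(chord, n):
    """|FC_n|: length of the sum of the unit vectors at angles 2*pi*p*n/12."""
    x = sum(math.cos(2 * math.pi * p * n / 12) for p in chord)
    y = sum(math.sin(2 * math.pi * p * n / 12) for p in chord)
    return math.hypot(x, y)

def vl_to_even_subset(chord, n):
    """Euclidean size of the smallest voice leading from `chord` to a subset
    (repetitions allowed) of some transposition of the even n-note chord.

    Each voice moves to its nearest note of the even chord, so only residues
    mod 12/n matter. For a fixed way of cutting that small circle, the best
    transposition is the mean of the residues, so trying each cut between
    adjacent residues finds the optimum.
    """
    step = 12 / n
    r = sorted(p % step for p in chord)
    m = len(r)
    best = math.inf
    for k in range(m):
        t = (sum(r[k:]) + sum(v + step for v in r[:k])) / m
        cost = 0.0
        for v in r:
            d = (v - t) % step
            cost += min(d, step - d) ** 2
        best = min(best, cost)
    return math.sqrt(best)

def trichord_types():
    """One representative of each 12-tone three-note multiset, up to
    transposition and inversion (19 types, from 000 to 048)."""
    reps = set()
    for c in combinations_with_replacement(range(12), 3):
        forms = [tuple(sorted((s * p + t) % 12 for p in c))
                 for s in (1, -1) for t in range(12)]
        reps.add(min(forms))
    return sorted(reps)

def fit_line(xs, ys):
    """Least-squares slope and intercept, plus Pearson's r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

types = trichord_types()
vl = [vl_to_even_subset(c, 3) for c in types]
fc3 = [fourier_magnitude(c, 3) for c in types]
slope, intercept, r = fit_line(vl, fc3)
print(len(types), round(slope, 2), round(intercept, 2), round(r, 2))   # 19 -1.38 3.16 -0.97
```

Changing the second argument of `vl_to_even_subset` and `fourier_magnitude` together gives the analogous comparison for the other Fourier components.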
Fig. 11. The magnitude of a set class’s nth Fourier component is approximately linearly related
to the size of the minimal voice leading to the nearest subset of the perfectly even n-note chord,
shown here as dark spheres. Panels are labeled by Fourier component, e.g. FC5 (subsets of
{0, 2.4, 4.8, 7.2, 9.6}) and FC6 (subsets of {0, 2, 4, 6, 8, 10}).
17
See Callender 2004, Tymoczko 2006, Callender, Quinn, and Tymoczko, 2008. These trian-
gles result from bisecting the cone in Figure 2. Every point represents a set class, while
every line segment represents an equivalence class of voice leadings.
Fig. 12. Chords in the shaded region will have a large FC3 component, since they are near
subsets of {0, 4, 8}. Those in the unshaded region will have a smaller FC3 component.
[Scatter plot: x axis, minimal voice leading (0 to 2); y axis, magnitude of the third Fourier component (0 to 4); points labeled by trichord type, from 000 to 048, with fitted line y = –1.38x + 3.16.]
Fig. 13. For trichords, the equation $\mathrm{FC}_3 = -1.38\,\mathrm{VL} + 3.16$ relates the third Fourier component
to the Euclidean size of the minimal voice leading to the nearest subset of {0, 4, 8}.
Table 1 uses the Pearson correlation coefficient to estimate the relationship be-
tween the voice-leading distances and Fourier components, for twelve-tone equal-
tempered multisets of various cardinalities. The strong anti-correlations indicate that
one variable predicts the other with a very high degree of accuracy. Table 2 calculates
the correlation coefficients for three-to-six-note chords in 48-tone equal temperament.
These strong anticorrelations, very similar to those in Table 1, show that there contin-
ues to be a very close relation between Fourier magnitudes and voice-leading size in
very finely quantized pitch-class space. Since 48-tone equal temperament is so finely
quantized, these numbers are approximately valid for continuous, unquantized pitch-
class space.18
Table 2. Correlations between voice-leading distances and Fourier magnitudes in 48-tone equal
temperament

                FC1
Trichords       -.99
Tetrachords     -.97
Pentachords     -.97
Hexachords      -.96
Explaining these correlations, though not very difficult, is beyond the scope of this
paper. From our perspective, the important question is whether we should measure
chord quality using the Fourier transform or voice leading.19 In particular, the issue is
whether the Fourier components model the musical intuitions we want to model: as
we have seen, the Fourier transform requires us to measure a chord’s “harmonic qual-
ity” in terms of its distance from all the subsets of the perfectly even n-note chord.
But we might sometimes wish to employ a different set of harmonic prototypes. For
instance, Figure 14 uses a chord’s distance from the augmented triad to measure the
Fig. 14. The mathematics of the Fourier transform requires that we conceive of “chord quality”
in terms of the distance to all subsets of the perfectly even n-note chord (left). Purely
voice-leading-based conceptions instead allow us to choose our harmonic prototypes freely (right).
Thus we can use voice leading to model a chord’s “augmentedness” in terms of its distance from
the augmented triad alone, rather than also from the tripled unison {0, 0, 0} or the doubled major third {0, 0, 4}.
18
It would be possible, though beyond the scope of this paper, to calculate this correlation
analytically. It is also possible to use statistical methods for higher-cardinality chords. A
large collection of randomly generated 24- and 100-tone chords in continuous space pro-
duced correlations of .95 and .94, respectively.
19
See Robinson 2006 and Straus 2007 for related discussion.
trichordal set classes’ “augmentedness.” Unlike Fourier analysis, this purely voice-
leading-based method does not consider the triple unison or doubled major third to be
particularly “augmented-like”; hence, set classes like {0, 1, 4} do not score particu-
larly highly on this index of “augmentedness.” This example dramatizes the fact that,
when using voice leading, we are free to choose any set of harmonic prototypes,
rather than accepting those the Fourier transform imposes on us.
5 Conclusion
The approximate consistency between our three models is in one sense good news:
since they are closely related, it may not matter much—at least in practical terms—
which we choose. We can perhaps use a tuning lattice such as the Tonnetz to repre-
sent voice-leading, as long as we are interested in gross contrasts (“near” vs. “far”)
rather than fine quantitative differences (“3 steps away” vs. “2 steps away”). Simi-
larly, we can perhaps use voice-leading spaces to approximate the results of the Fou-
rier analysis, as long as we are interested in modeling generic harmonic intuitions
(“very fifthy” vs. “not very fifthy”) rather than exploring very fine differences among
Fourier magnitudes.
However, if we want to be more principled, then we need to be more careful. The
resemblances among our models mean that it is possible to inadvertently use one sort
of structure to discuss properties that are more directly modeled by another. And
indeed, the recent history of music theory displays some fascinating (and very fruit-
ful) imprecision about this issue. It is striking that Douthett and Steinbach, who first
described several of the lattices found in the center of the voice-leading orbifolds—
including Figure 6—explicitly presented their work as generalizing the familiar Ton-
netz.20 Their lattices, rather than depicting parsimonious voice leading among major
and minor triads, displayed single-semitone voice leadings among major, minor, and
augmented triads; and as a result of this small difference, every distance can be inter-
preted as representing voice-leading size. However, this difference only became ap-
parent after it was understood how to embed their discrete structures in the continuous
geometrical figures described at the beginning of this paper. Thus one could say that
the continuous voice-leading spaces evolved out of the Tonnetz, by way of Douthett
and Steinbach’s discrete lattices, even though the structures now appear to be funda-
mentally different. Related points could be made about Quinn’s “quality space,”
whose connection to the voice-leading spaces took several years—and the work of
several authors—to clarify.
There is, of course, nothing wrong with this: knowledge progresses slowly and fit-
fully. But our investigation suggests that we may want to think carefully about which
model is appropriate for which music-theoretical purpose. I have tried to show that
the issues here are complicated and subtle: the mere fact that tonal pieces modulate by
fifth does not, for example, require us to use a tuning lattice in which fifths are
20
See Douthett and Steinbach 1998. The same is true of Tymoczko 2004, which uses the term
“generalized Tonnetz” to describe another set of lattices appearing in the voice-leading
spaces.
smaller than semitones. (Indeed, the “circle of fifths” C-G-D-… can be interpreted
either as a one-dimensional tuning lattice incorporating octave equivalence, or as a
diagram of the voice-leading relations among diatonic scales, as in Figure 7.) Like-
wise, there may be close connections between voice-leading spaces and the Fourier
transform, even though the latter associates “Z-related” chords while the former does
not. The present paper can be considered a down-payment toward a more extended
inquiry, one that attempts to determine the relative strengths and weaknesses of our
three different-yet-similar conceptions of musical distance.
References
Callender, C.: Continuous Transformations. Music Theory Online 10(3) (2004)
Callender, C.: Continuous Harmonic Spaces. Journal of Music Theory 51(2) (in press) (2007)
Callender, C., Quinn, I., Tymoczko, D.: Generalized Voice-Leading Spaces. Science 320, 346–
348 (2008)
Cohn, R.: Properties and Generability of Transpositionally Invariant Sets. Journal of Music
Theory 35, 1–32 (1991)
Cohn, R.: Maximally Smooth Cycles, Hexatonic Systems, and the Analysis of Late-Romantic
Triadic Progressions. Music Analysis 15(1), 9–40 (1996)
Cohn, R.: Neo-Riemannian Operations, Parsimonious Trichords, and their ‘Tonnetz’ Represen-
tations. Journal of Music Theory 41(1), 1–66 (1997)
Cohn, R.: Introduction to Neo-Riemannian Theory: A Survey and a Historical Perspective.
Journal of Music Theory 42(2), 167–180 (1998)
Cohn, R.: As Wonderful as Star Clusters: Instruments for Gazing at Tonality in Schubert. Nine-
teenth-Century Music 22(3), 213–232 (1999)
Douthett, J., Steinbach, P.: Parsimonious Graphs: a Study in Parsimony, Contextual Transfor-
mations, and Modes of Limited Transposition. Journal of Music Theory 42(2), 241–263
(1998)
Hall, R., Tymoczko, D.: Poverty and polyphony: a connection between music and economics.
In: Sarhanghi, R. (ed.) Bridges: Mathematical Connections in Art, Music, and Science,
Donostia, Spain (2007)
Hoffman, J.: On Pitch-class set cartography (unpublished) (2007)
Lewin, D.: Re: Intervallic Relations between Two Collections of Notes. Journal of Music The-
ory 3, 298–301 (1959)
Lewin, D.: Special Cases of the Interval Function between Pitch-Class Sets X and Y. Journal of
Music Theory 45, 1–29 (2001)
Quinn, I.: General Equal Tempered Harmony (Introduction and Part I). Perspectives of New
Music 44(2), 114–158 (2006)
Quinn, I.: General Equal-Tempered Harmony (Parts II and III). Perspectives of New Mu-
sic 45(1), 4–63 (2007)
Robinson, T.: The End of Similarity? Semitonal Offset as Similarity Measure. In: The annual
meeting of the Music Theory Society of New York State, Saratoga Springs, NY (2006)
Straus, J.: Uniformity, Balance, and Smoothness in Atonal Voice Leading. Music Theory Spec-
trum 25(2), 305–352 (2003)
Straus, J.: Voice leading in set-class space. Journal of Music Theory 49(1), 45–108 (2007)
Tymoczko, D.: Scale Networks in Debussy. Journal of Music Theory 48(2), 215–292 (2004)
Tymoczko, D.: The Geometry of Musical Chords. Science 313, 72–74 (2006)
Tymoczko, D.: Scale Theory, Serial Theory, and Voice Leading. Music Analysis 27(1), 1–49
(2008a)
Tymoczko, D.: Voice leading and the Fourier Transform. Journal of Music Theory 52(2) (in
press) (2008b)
Pairwise Well-Formed Scales and a Bestiary of Animals
on the Hexagonal Lattice
Jon Wild
Well-Formed Scales
Carey and Clampitt [1] introduced the concept of a well-formed scale to the music-
theoretical community; these collections had earlier been investigated by Erv Wilson
[2] under the name “Moments of Symmetry”. For the purposes of the present paper an
important characteristic of a well-formed scale can be expressed as follows: it is a
scale with exactly two step sizes, whose “tokenised” cyclic interval list (e.g.
aaabaaabaab) has the property that each token is maximally evenly distributed
among the other tokens. Well-formed scales are generated; that is, they are formed by
the iteration of a single generating interval, with the resulting pitches collapsed into
an octave. The properties of such collections have been extensively researched in re-
cent music-theoretical literature; see for example [3].
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 273–285, 2009.
© Springer-Verlag Berlin Heidelberg 2009
274 J. Wild
property that each token is maximally evenly distributed among the others. Only odd
cardinalities of pwwf scale are possible, and the multiplicity of each step size must be
coprime to the scale cardinality. Clampitt gives the example of the diatonic scale in
Zarlino’s syntonic tuning [5], whose scale degrees have the following frequency ratios
relative to C: 1:1, 9:8, 5:4, 4:3, 3:2, 5:3, 15:8, 2:1. Setting the step sizes a=9:8,
b=10:9, c=16:15 we obtain the token list abcabac, which is easily verified to have the
relevant property: the a’s are as evenly distributed as three items could be among
seven; the b’s are as evenly distributed as two items could be; likewise for the c’s.
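Clampitt's example can be verified mechanically. In the sketch below (helper names are ours), exact ratios are handled with Python's `fractions.Fraction`, and each distinct step ratio is labeled with a letter in order of first appearance. We read "maximally evenly distributed" as the standard balance condition on cyclic words: every window of a given length contains the token either k or k + 1 times. That reading is our assumption about the definition, not a quotation from the paper.

```python
from fractions import Fraction

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def step_word(degrees):
    """Return a word labelling each distinct step ratio with a letter, in
    order of first appearance, together with the list of step ratios."""
    steps = [hi / lo for lo, hi in zip(degrees, degrees[1:])]
    names = {}
    for s in steps:
        names.setdefault(s, LETTERS[len(names)])
    return "".join(names[s] for s in steps), steps

def is_balanced(word, token):
    """True if every cyclic window of a given length contains `token`
    either k or k + 1 times (our reading of 'maximally evenly distributed')."""
    n, doubled = len(word), word + word
    for length in range(1, n):
        counts = [doubled[i:i + length].count(token) for i in range(n)]
        if max(counts) - min(counts) > 1:
            return False
    return True

def is_pairwise_well_formed(word):
    """Each token, taken on its own, must be balanced within the word."""
    return all(is_balanced(word, token) for token in set(word))

# Zarlino's syntonic diatonic, from C up to the octave.
zarlino = [Fraction(n, d) for n, d in
           [(1, 1), (9, 8), (5, 4), (4, 3), (3, 2), (5, 3), (15, 8), (2, 1)]]
word, steps = step_word(zarlino)
print(word)                            # abcabac
print([str(s) for s in steps])         # ['9/8', '10/9', '16/15', '9/8', '10/9', '9/8', '16/15']
print(is_pairwise_well_formed(word))   # True
```

By contrast, a word such as "aaabbbc" fails the test, because a window of length 3 can hold three a's or none.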
Fig. 1.
Fig. 2.
Not all doubly-generated scales, then, are three-stepped, much less pairwise well-
formed. Later on we shall be interested in the converse question: are all pairwise well-
formed scales doubly-generated?
1
This pair is not the only choice of generators; we could just as easily have chosen the pair
(3:2, 16:15), in which case the corresponding diagram would have the upper row offset by
one position towards the left.
Pairwise Well-Formed Scales and a Bestiary of Animals on the Hexagonal Lattice 275
Fig. 3.
2
When we count rotations and reflections as distinct, we are enumerating fixed as opposed to
free animals.
3
These numbers include the animals formed by iterating only one of the two generators, along
each of the three axes of the lattice. If we wished to exclude these “one-dimensional” animals,
we would subtract 3 from these numbers.
Fig. 4.
Fig. 5.
face-connections—in other words it is not enough for each note of a scale to be repre-
sentable as some sum of the generators; the notes must form a connected subset (not
necessarily simply connected, as the first animal in Figure 5 shows). A disconnected
quasi-animal such as the second of Figure 5 is not enumerated here; of course, if we
relaxed the connectivity constraints to admit such configurations, we would have an infinite
number of scales to consider. Still, the large size of our bestiary of connected lattice
animals would initially appear to be a confounding factor in the study of the relationship
between animals and pwwf scales.
Fig. 6.
alternative diatonic tuning (this one has also previously been noted by Clampitt as
pwwf) where the note D is a syntonic comma lower than in Zarlino’s scale; this vari-
ant appears in the 16th-century treatise by Fogliano [6] which slightly predates Zar-
lino’s. A third pwwf tuning of a diatonic set, where G, E and B appear a syntonic
comma lower than in the Fogliano tuning, is one I have not come across before. It
contains four justly-tuned triads, one fewer than the other two diatonic tunings. Its
mirror image (reflected in a vertical axis) is not diatonic; it is a tuning for the Hungar-
ian gypsy scale which Clampitt [4] has previously identified as pwwf. Two other
pwwf scales on this lattice contain four justly tuned triads; all the scales mentioned in
this paragraph are shown labelled with note names in Figure 7.
Fig. 7.
In Figure 6 a list of step size tokens was associated with each scale; Clampitt [7]
has recently identified this kind of list with the words of mathematical word theory.
Each word has been put in a sort of prime form: among all the mappings of tokens to
step sizes, and all the rotations of the word, the one with lowest lexical position (i.e.
soonest in alphabetical order) has been selected as representative of the whole word
class. Reducing to these primeform words shows that the 21 pwwf scales on this lat-
tice belong to just four classes: aaabaac, aabacab, abababc, and abacabc. These in
fact exhaust the pwwf words possible for scales of cardinality 7.
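The prime-form procedure can be sketched by brute force: try every rotation of the word and every renaming of its tokens, and keep the alphabetically smallest result. This is our reading of the procedure, with a hypothetical helper name. Under it, the Zarlino word abcabac falls into the class abacabc, and each of the four class representatives listed above is its own prime form.

```python
from itertools import permutations

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def prime_form(word):
    """Alphabetically smallest word obtainable by rotating `word` and
    renaming its tokens (every mapping of tokens to letters is tried)."""
    tokens = sorted(set(word))
    best = None
    for k in range(len(word)):
        rotated = word[k:] + word[:k]
        for image in permutations(LETTERS[:len(tokens)]):
            rename = dict(zip(tokens, image))
            candidate = "".join(rename[ch] for ch in rotated)
            if best is None or candidate < best:
                best = candidate
    return best

print(prime_form("abcabac"))   # abacabc (Zarlino's diatonic)
for w in ["aaabaac", "aabacab", "abababc", "abacabc"]:
    print(w, prime_form(w) == w)
```

For seven-note words with three tokens this examines only 7 × 6 = 42 candidates, so exhaustive search is adequate.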
In none of the 21 pwwf animals are all three of the axis intervals—the perfect fifth,
major third and minor third—required to construct the scale.4 This means each of the
animals is generated by only two intervals—but they may be any two. If we are more
strict about the generators, and only enumerate animals that require exclusively the
perfect fifth and major third, we will eliminate all those that have “unsupported left-
leaning segments”, and find there are 12 animals remaining in our bestiary; they are
the ones marked with an asterisk in Figure 6, which I call the “strict list” for the gen-
erator pair (3:2, 5:4). The unmarked animals also turn out to be generated by just two
intervals—but those two intervals are the fifth and minor third; or the major and mi-
nor thirds. The corresponding scales would appear in the strict lists for the lattices
formed by those pairs of generators.5
4
To illustrate, an example of a (non-pwwf) collection that does require all three intervals is
shown in Figure 8.
5
An alternative formulation for the results of this paper would restrict itself to the strict lists. In
that formulation a square rather than hexagonal grid would be appropriate, where scales are
represented by polyominoes. This eliminates the possibility of animals that require
connections along the third axis of the hexagonal grid.
Fig. 8.
Table 1.
*: The {7:4, 11:8} lattice includes one pwwf scale that is singly (not doubly)
generated. That is, the seven-note chain formed by stacking the interval 11:7 has the
pwwf property. The resulting scale has the form aabacab. Clampitt has previously
remarked that some of these symmetrical pwwf scales may be generated by a single
interval. Since a scale formed by iterating a single generator can have at most three step
sizes, and since such scales can be pwwf (as in this example) or not (as in every other
singly generated scale on any of the lattices in Table 1), such one-dimensional animals,
which “live” on a single row, have no place in the healthy/unhealthy opposition
scheme for two-dimensional lattice animals. It is trivial to find a lattice where this scale
also appears as a two-dimensional animal: one generator is 11:7, and the other is
14641:2401, i.e. 11:7 stacked four times.
**: In the “strict list” column, only those scales are enumerated that can be built using
the given generators literally, rather than their sum or difference. This is a stricter
requirement than that the collections form connected portions of the resulting
lattices, although any collection that forms a connected portion of a lattice is strictly
generated in this sense on some lattice.
280 J. Wild
again and again to a small group of recognisable animals from among the more than
three thousand candidates, no matter the size of the generators of the lattice. First, a
few statistics on the number of three-stepped and pwwf heptatonic scales for various
combinations of generators are presented in Table 1.⁶ As shown, different generators
result in different counts.
As in the case of the lattice of fifths and thirds, pwwf animals on other lattices also
only ever require two generators out of the three axis intervals. In fact we can make a
stronger statement: pwwf animals only ever require two directed generators, and it is
possible to start from one cell and cumulatively generate all others using a pair of
directed arrows, as shown in the first animal of Figure 9. Here the animal requires
exclusively arrows to the right and arrows at an angle of 60 degrees (measured
counterclockwise from horizontal). The second animal shown also does not require any
arrows that point along the third axis, but it does need arrows pointing in both directions
on the horizontal axis in order to reach every cell from an initial cell, so it cannot be
constructed using only two directed generators. The third animal shown requires only
two directed generators, but cannot be generated from a single starting cell.⁷
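The condition illustrated by Figure 9 can be tested mechanically. This sketch uses hypothetical axial coordinates for the hexagonal lattice, with (1, 0) one step to the right and (0, 1) one step at 60 degrees counterclockwise. It reads “cumulatively generate” as follows: each new cell must be one arrow away from a cell already generated, without leaving the animal. That reading is an assumption, not the author's stated algorithm. The sample animal is the diatonic layout shown later in the paper (1 5 2 6 on the lower row, 3 7 4 above).

```python
from itertools import combinations

# Hypothetical axial coordinates: six unit moves around a hexagonal cell.
DIRECTIONS = {
    "E": (1, 0), "NE": (0, 1), "NW": (-1, 1),
    "W": (-1, 0), "SW": (0, -1), "SE": (1, -1),
}


def generated_from(animal: set, start: tuple, moves: tuple) -> set:
    """Cells reachable from `start` by repeatedly applying `moves` inside the animal."""
    seen = {start}
    frontier = [start]
    while frontier:
        q, r = frontier.pop()
        for dq, dr in moves:
            cell = (q + dq, r + dr)
            if cell in animal and cell not in seen:
                seen.add(cell)
                frontier.append(cell)
    return seen


def directed_generator_pairs(animal) -> list:
    """Every (direction pair, start cell) from which the whole animal is generated."""
    animal = set(animal)
    found = []
    for (name1, d1), (name2, d2) in combinations(DIRECTIONS.items(), 2):
        for start in sorted(animal):
            if generated_from(animal, start, (d1, d2)) == animal:
                found.append(((name1, name2), start))
    return found


# Lower row: degrees 1 5 2 6; upper row: 3 7 4, each one NE step above.
lydian = {(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1)}
print(directed_generator_pairs(lydian))
```

An empty result means no single start cell and pair of directed arrows suffices, which is the situation of the second and third animals in Figure 9.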
Fig. 9.
Healthy Animals
⁶ The generators tabulated here happen to be intervals from the harmonic series; any
generators may of course be used, with the caveat that degenerate scales may result when it
is possible to form identical sums of the two generators in more than one way.
⁷ The ways in which the second and third animals shown in Figure 9 fail to meet the condition
are in fact interchangeable: the middle animal could be redrawn to require only two directed
generators if we had two “starting” cells, and the third animal could be redrawn with a single
starting cell if we allowed one of the generating intervals to be used both upwards and
downwards.
experimentally verified for a large number of generator pairs, but the statement
remains a conjecture for now.
Healthy heptatonic animals fall into what I identify as three families, shown in
Figure 10. Type I animals (upper left of Figure 10) have two parallel chains of
generators, of lengths 3 and 4, along any of the three axes. Type II animals (right-hand
side) have three parallel chains of generators, of lengths 2, 3 and 2; in fact, as
Figure 11 shows, there are three different ways of seeing a Type II animal as three
such parallel chains. The Type III animal (lower left) has four parallel chains of
generators, of lengths 2, 2, 2 and 1, and Figure 12 shows two ways of seeing it as
four such chains.
Fig. 10.
Fig. 11.
Fig. 12.
Several symmetries obtain: if a given Type I or Type III animal is pwwf on a given
lattice, then so is its rotation by 180 degrees. Type II animals are symmetrical under
rotation by 180 degrees; each Type II animal that is pwwf on a given lattice will be
accompanied by its mirror reflection, if that reflection is distinct.
While all the examples shown in Figure 10 are positioned so that their generator
chains run along the horizontal axis, any of these animals can be transformed into any
of its rotations or reflections by permuting and/or inverting the generators used; the
different orientations, if they are distinct, represent the same scale on different
lattices. Counting all the distinct rotations and reflections of the eight free animals
shown in Figure 10, we find a total of 58 healthy animals, an upper limit for the
number of heptatonic pwwf scales on any given lattice.
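Counting distinct orientations of a free animal amounts to applying the 12 rotations and reflections of the hexagonal lattice and discarding translates. The sketch below does this for one animal only. It uses hypothetical axial coordinates and assumes that the Type Ia shape is the 4 + 3 trapezoid of the diatonic example. The eight free animals of Figure 10 are not reproduced, so the sketch does not recompute the total of 58.

```python
def rotate60(cell):
    """Rotate an axial cell by 60 degrees counterclockwise about the origin."""
    q, r = cell
    return (-r, q + r)


def reflect(cell):
    """Reflect an axial cell across the horizontal axis."""
    q, r = cell
    return (q + r, -r)


def normalize(cells):
    """Translate so the lexicographically smallest cell sits at the origin."""
    q0, r0 = min(cells)
    return frozenset((q - q0, r - r0) for q, r in cells)


def orientations(animal):
    """Distinct fixed animals among the 12 rotations and reflections."""
    shapes = set()
    for base in (set(animal), {reflect(c) for c in animal}):
        current = base
        for _ in range(6):
            shapes.add(normalize(current))
            current = {rotate60(c) for c in current}
    return shapes


# Assumed Type Ia layout: a chain of 4 with a chain of 3 on the row above.
type_ia = {(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1)}
print(len(orientations(type_ia)))  # 6
```

The trapezoid has a mirror axis but no rotational symmetry, so its 12 images collapse to 6 distinct fixed animals.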
Type I animals support the pwwf words aaabaac, abababc and abacabc. Type II
animals support exclusively the word aabacab (and I emphasise that this is true
independently of the lattice’s basis intervals). Type III animals, like Type I animals,
support the words aaabaac, abababc and abacabc. In fact, Figure 13 shows how Type III
animals may be decomposed into parallel chains of the same generator, of lengths 4
and 3; by substitution of generators, then, they correspond to Type I animals on a
different lattice.
Fig. 13.
Further, it is easy to see that the four varieties of Type I animals are equivalent to
one another under a substitution of generators, as are the three varieties of Type II.
We can conclude that if we are allowed to specify the lattice, we can construct
every heptatonic pwwf scale discovered so far using only two animals: Type Ia for
words aaabaac, abababc and abacabc, and Type IIa for the word aabacab.
It must not be imagined that the “unhealthy” heptatonic animals are uniformly
misshapen. As the gallery of all non-pwwf three-stepped animals on the lattice of fifths
and thirds in Figure 14 shows, many animals that support three-stepped scales that are
not pairwise well-formed are nonetheless symmetrical, or present double chains of
generators of lengths other than 4 + 3.
Fig. 14.
the numbers appearing on the nodes of the lattice indicate the scalar position of the
corresponding pitch, when the scale is rotated to match the primeform word abacabc
(i.e. in Lydian mode):
abacabc; a = 203.9, b = 182.4, c = 111.7

      3   7   4
    1   5   2   6
The step sizes of the scale we seek to generate are completely unrelated to those of
the syntonic diatonic, but we can use the pattern traced out by the scalar ordering to
calculate new generators. The first scale step, 1-2, consists of two horizontal
generators, as is apparent in the segment 1-(5)-2 on the lower row. Since the interval
between degrees 1 and 2 (i.e., token a) is bisected by another scale degree on the
lattice, the two horizontal generators together span a plus an octave, so we define
GEN1 = (a+1200)/2. The other generator, ascending to the right on the lattice, connects
scale degrees 1 and 3 (or 5 and 7, or 2 and 4). The word abacabc tells us that each of
these generic thirds is the sum of step sizes a and b. So we define GEN2 = (a+b).
Substituting the desired step size values a=150.7 and b=65.0 we obtain GEN1 = 694.05
and GEN2 = 213.8.
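As a check on this reasoning, the same two substitutions can be applied to the syntonic diatonic whose step sizes label the lattice diagram (a = 203.9, b = 182.4 cents). They return the familiar generators of the lattice of fifths and thirds:

```latex
\begin{aligned}
2\,\mathrm{GEN1} &= a + 1200
  &&\Longrightarrow\quad \mathrm{GEN1} = \tfrac{203.9 + 1200}{2} = 701.95 \text{ cents (perfect fifth)},\\
\mathrm{GEN2} &= a + b
  &&\Longrightarrow\quad \mathrm{GEN2} = 203.9 + 182.4 = 386.3 \text{ cents (major third)}.
\end{aligned}
```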
Likewise, by examining the mapping between scalar order and lattice arrangement
for representative scales of each of the other heptatonic pwwf words, we can obtain
suitable expressions for their generators as a function of their step sizes:⁸
abacabc: GEN1 = (a+1200)/2; GEN2 = (a+b); animal is Type Ia.
abababc: GEN1 = a+b; GEN2 = b; animal is Type Ia.
aaabaac: GEN1 = a; GEN2 = 3a+b; animal is Type Ia.
aabacab: GEN1 = 2a+b+c; GEN2 = −a; animal is Type IIa.
Since the four words account for all heptatonic pwwf scales, it follows that every
heptatonic pwwf scale is doubly generated.⁹
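The four generator formulas can be checked by building each scale from GEN1 and GEN2 and reading its step pattern back. The sketch assumes pitches in cents modulo 1200. It fixes c from a and b by the octave, and it lists cells as hypothetical (i, j) pairs meaning i·GEN1 + j·GEN2. These layouts were reconstructed for this sketch so that each word comes out in its stated rotation. They are not copied from the paper's figures. The sample step sizes other than the diatonic ones are arbitrary.

```python
# Generator formulas from the list above, as functions of the step sizes.
FORMULAS = {
    "abacabc": lambda a, b, c: ((a + 1200) / 2, a + b),
    "abababc": lambda a, b, c: (a + b, b),
    "aaabaac": lambda a, b, c: (a, 3 * a + b),
    "aabacab": lambda a, b, c: (2 * a + b + c, -a),
}

# Hypothetical layouts: (i, j) stands for the pitch i*GEN1 + j*GEN2.
ROWS_4_AND_3_ABOVE = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1)]
ROWS_4_AND_3_BELOW = [(0, 0), (1, 0), (2, 0), (3, 0), (1, -1), (2, -1), (3, -1)]
CHAINS_2_3_2 = [(-1, 0), (0, 0), (-1, -1), (0, -1), (1, -1), (0, -2), (1, -2)]
LAYOUTS = {
    "abacabc": ROWS_4_AND_3_ABOVE,
    "abababc": ROWS_4_AND_3_BELOW,
    "aaabaac": ROWS_4_AND_3_ABOVE,
    "aabacab": CHAINS_2_3_2,
}


def rebuild(word: str, a: float, b: float) -> str:
    """Generate the scale from GEN1/GEN2 and read its step pattern back as a word."""
    # The octave fixes c once a and b are chosen.
    c = (1200 - word.count("a") * a - word.count("b") * b) / word.count("c")
    gen1, gen2 = FORMULAS[word](a, b, c)
    pitches = sorted((i * gen1 + j * gen2) % 1200 for i, j in LAYOUTS[word])
    steps = [y - x for x, y in zip(pitches, pitches[1:] + [pitches[0] + 1200])]
    sizes = {"a": a, "b": b, "c": c}
    # Label each step with whichever intended size it matches most closely.
    return "".join(min(sizes, key=lambda k: abs(sizes[k] - s)) for s in steps)


samples = {"abacabc": (203.9, 182.4), "abababc": (210.0, 150.0),
           "aaabaac": (200.0, 120.0), "aabacab": (200.0, 150.0)}
for word, (a, b) in samples.items():
    print(word, rebuild(word, a, b))
```

Each line should print the same word twice. For abababc, the layout matters: placing the chain of 3 above the chain of 4 instead yields the retrograde, bababac, which has the same prime form.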
⁸ In each case these are not the only expressions that would work.
⁹ Some pwwf scales have an additional interpretation, as generated by a single interval. See the
footnote to Table 1.
ber of candidate animals. The lattice of fifths and thirds, for example, possesses only
four pwwf scales of cardinality 9 among over 77,000 candidate animals. A search for
scales of cardinality 5 yields 20 pwwf scales among 186 animals on the same lattice.
For n = 5 and n = 9 I have been able to use the above method for arbitrary pwwf scales
to find formulae for the generator pairs as a function of the desired step sizes. I
conjecture this will be possible for higher odd cardinalities, too.
References
[1] Carey, N., Clampitt, D.: Aspects of Well-Formed Scales. Music Theory Spectrum 11(2),
187–206 (Autumn 1989)
[2] Wilson, E.: On the Development of Intonational Systems by Extended Linear Mapping.
Xenharmonikon 3 (1975)
[3] Clough, J., Engebretson, N., Kochavi, J.: Scales, sets and interval cycles: a taxonomy. Music
Theory Spectrum (1999)
[4] Clampitt, D.: Pairwise well-formed scales: Structural and transformational properties,
Ph.D. dissertation, SUNY Buffalo (1997)
[5] Zarlino, G.: Le istitutioni harmoniche. Venice (1558)
[6] Fogliano, L.: Musica theorica (1529)
[7] Clampitt, D.: Mathematical and Musical Properties of Pairwise Well-Formed Scales. Paper
read at MCM 2007 (2007)
[8] Carey, N.: Coherence and sameness in pairwise well-formed scales. Journal of Mathematics
and Music (2007)
[9] Apagodu, M.: Counting Hexagonal Lattice Animals. Preprint, arXiv:math/0202295
(last accessed March 1, 2009)
Generalized Tonnetz and Well-Formed GTS:
A Scale Theory Inspired by the
Neo-Riemannians
Marek Žabka
1 Inspirations
The neo-Riemannian theory and its use of the concept of the Tonnetz (see e.g.
Cohn 1998, Gollin 1998) provided a source of inspiration for this paper. In this
context, a very special role is played by David Lewin’s (1998) illuminating analysis of
a passage from Bach’s F minor fugue.¹ The analysis is based on the original idea
that a Tonnetz might also be generated by intervals other than the fifth and the
third.² In this way, Lewin modifies the basic neo-Riemannian concept and applies
it in a different, yet very meaningful, analytical situation.
Another stream of inspiration comes from Carey and Clampitt’s diatonic
theory. Carey and Clampitt (1989) defined a very powerful property, well-formedness,
and showed that (one-dimensional) generated scales commonly
This paper was supported by the Fulbright Foundation through a fellowship awarded
to the author.
¹ For an interesting visualization of Lewin’s structural ideas see (Reed and Bain 2007).
² Clough (2002) used other generalized Tonnetze in an analysis of Kurtág’s music.
E. Chew, A. Childs, and C.-H. Chuan (Eds.): MCM 2009, CCIS 38, pp. 286–298, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Generalized Tonnetz and Well-Formed GTS 287
encountered in music usually have this property. As the main result, they proved
that there is a direct relation between the acoustical ‘closeness’ of the end points
of the generated scale and its structural well-formedness.³
2 Generalized Tonnetz
The proposed theory relies on David Lewin’s concept of Generalized Interval
Systems (GIS) and on the concept of labeled directed graphs. For a definition of
GIS see e.g. (Lewin 2007).
Definition 2. Let G = (N, I, int∗) be a commutative GIS and assume that the
group I is generated by a finite subset X. A generalized Tonnetz (g-Tonnetz)
T(I; X) is the arrow-labeled directed graph (N, A, X, int), where A denotes the
complete inverse image of X under int∗, A = (int∗)⁻¹[X], and int denotes the
restriction of int∗ to A, int = int∗|_A. Further, we say that the dimension of
the g-Tonnetz T(I; X) is n if X has exactly n distinct elements.
The complete inverse image of X under int∗ : N × N → I is the set of all ordered
pairs of nodes (p, q) such that their image int∗ (p, q) is in X. (See also Fig. 2.) The
asterisk will be omitted in the notation of int∗ if there is no risk of confusion.
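For illustration, the sketch below builds a g-Tonnetz for the familiar pitch-class GIS. The choices are assumptions, not an example from the text: N = I = Z12, int(p, q) = q − p mod 12, and X = {7, 4} (fifth and major third in semitones). The arrow set A is then every ordered pair whose interval lies in X, and each arrow carries that interval as its label.

```python
from functools import reduce
from math import gcd

NODES = range(12)  # pitch classes 0..11


def interval(p: int, q: int) -> int:
    """int*(p, q) in the pitch-class GIS: ascending distance in semitones mod 12."""
    return (q - p) % 12


def g_tonnetz(nodes, generators):
    """Labeled arrows (p, q, int(p, q)) of T(I; X): all pairs whose interval is in X."""
    X = set(generators)
    return [(p, q, interval(p, q)) for p in nodes for q in nodes if interval(p, q) in X]


X = {7, 4}  # perfect fifth and major third, in semitones
# X generates Z12 exactly when the gcd of its elements and 12 is 1.
assert reduce(gcd, X, 12) == 1
arrows = g_tonnetz(NODES, X)
print(len(arrows), "arrows; dimension", len(X))  # 24 arrows; dimension 2
```

Every node has exactly one outgoing arrow per generator, which is why the count is 12 × 2.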
³ For a related, independently formulated concept of ‘the moments of symmetry,’ see
also (Wilson 1975).
288 M. Žabka
As Kolman (2004) showed, two GIS’s are isomorphic if and only if their underly-
ing interval groups are isomorphic. Therefore, a g-Tonnetz is, up to isomorphism,
determined by the group of intervals and the selected set of generators. Thus,
a complete study of the commutative groups and their generating subsets is
sufficient for the complete understanding of the g-Tonnetze.
For our study of the g-Tonnetze, we will rely on representations of Abelian
groups as quotient groups of free Abelian groups. Any Abelian group I generated by a
finite set X with n elements may be represented as a quotient group Z(X)/K of the free
Abelian group Z(X), where $\mathbb{Z}(X) = \{\sum_{i=1}^{n} k_i \xi_i \mid k_i \in \mathbb{Z},\ \xi_i \in X\}$.
[Diagram: int∗ : N × N → I ≅ Z(X)/K, with A ⊆ N × N, X ⊆ I, and int : A → X the restriction of int∗ to A.]
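As a concrete instance of this representation (our own illustration, not one from the paper), take the twelve pitch classes generated by a fifth f ↦ 7 and a major third t ↦ 4. The commas 4f − t and 3t both map to 0, and the index of the subgroup they span is the absolute value of a determinant:

```latex
\begin{aligned}
&\varphi : \mathbb{Z}(f,t) \to \mathbb{Z}_{12}, \qquad \varphi(k_1 f + k_2 t) = 7k_1 + 4k_2 \bmod 12,\\
&K = \langle 4f - t,\ 3t \rangle \subseteq \ker\varphi, \qquad
[\mathbb{Z}(f,t) : K] = \left|\det\begin{pmatrix} 4 & -1 \\ 0 & 3 \end{pmatrix}\right| = 12,\\
&\Longrightarrow\quad K = \ker\varphi \quad\text{and}\quad \mathbb{Z}_{12} \cong \mathbb{Z}(f,t)/K .
\end{aligned}
```

Since φ is onto (for example φ(3f − 5t) = 1), its kernel also has index 12, so the inclusion K ⊆ ker φ is an equality.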
Lemma 2. Let S = (N, A, X, int, Z(X), gen) be a compact UGTS with respect
to o ∈ N and a set of commas K, and let e : K → {−1, 1} be a mapping. Assume
the following conditions.
1. K_e = {e(κ)κ | κ ∈ K}.
2. A mapping gen_e : N → Z(X) is defined in the following way. For any
t ∈ N, write $gen(t) - gen(o) = \sum_{\kappa \in K} r_t(\kappa)\,\kappa$ and consider the set
K(t) = {κ ∈ K | r_t(κ) = 0, e(κ) = −1}. The mapping gen_e assigns the value
$gen_e(t) = gen(t) + \sum_{\kappa \in K(t)} \kappa$.
Then S_e = (N, A, X, int, Z(X), gen_e) is a compact UGTS with respect to o and K_e.
Definition 6. Assume the notation from Lemma 2. UGTS’s S and S_e are called
neighboring. The elements gen_e(o) of Z(X) are called corners of S.
Definition 7. Let S be a UGTS, compact with respect to a node o and a set of
commas K. Further, consider a node t ∈ N, i.e.
$gen(t) = gen(o) + \sum_{\kappa \in K} r(\kappa)\,\kappa$, where $r(\kappa) \in [0, 1)$ for all $\kappa \in K$.
1. o is the origin.
2. t is an edge node (or a λ-edge node) if r(λ) = 0 for some λ ∈ K and
r(κ) ≠ 0 for all κ ∈ K, κ ≠ λ.
3. t is an inner node if r(κ) ≠ 0 for all κ ∈ K.
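For two commas in a two-generator lattice, the coefficients r(κ) can be found exactly by solving a 2 × 2 linear system, after which the three cases of Definition 7 apply directly. This is a minimal sketch under stated assumptions. The commas 4f − t and 3t of Z(f, t) are a hypothetical example, and gen(o) is placed at the lattice origin, so a node is given by its coordinates (k_f, k_t).

```python
from fractions import Fraction


def coefficients(vector, commas):
    """Solve vector = r1*k1 + r2*k2 exactly for two commas k1, k2 in Z^2."""
    (x, y), ((a1, a2), (b1, b2)) = vector, commas
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("commas must be linearly independent")
    # Cramer's rule, kept exact with Fraction.
    return Fraction(x * b2 - y * b1, det), Fraction(a1 * y - a2 * x, det)


def classify(r, names=("kappa", "lambda")):
    """Origin, edge or inner node, following the three cases of Definition 7."""
    if not all(0 <= value < 1 for value in r):
        raise ValueError("coefficients must lie in [0, 1) for a compact system")
    zero = [name for name, value in zip(names, r) if value == 0]
    if len(zero) == len(r):
        return "origin"
    if len(zero) == 1:
        return f"{zero[0]}-edge"
    if not zero:
        return "inner"
    return "unclassified"


commas = ((4, -1), (0, 3))  # hypothetical commas 4f - t and 3t in Z(f, t)
for node in [(0, 0), (0, 1), (1, 0), (3, 2)]:
    print(node, classify(coefficients(node, commas)))
```

Exact fractions matter here: an edge node is defined by a coefficient being exactly zero, which floating-point arithmetic could miss.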
Example 5. In a normal UGTS, the corners are accessible from an inner tone
through a pure generator. It can be shown that in a normal UGTS, any two
tones can be connected by a chain of pure generators. Figure 8 shows a UGTS
which is not normal.
We denote by R12 = [0, 12) the left-closed, right-open interval of real numbers
between 0 and 12, taken as a group under addition modulo 12. From the definition of
the free Abelian group, it follows that there is a unique group homomorphism
pitch∗ : Z(X) → R12 such that pitch(ξ) = pitch∗(ξ) for all ξ ∈ X. In notating this
homomorphism, we will omit the asterisk and also call it a pitch function if there is no
risk of confusion. For the rest of the paper we assume a GTS S = (N, A, X, int, Z(X), gen, pitch).
Notice that the regular addition of real numbers in the last condition from the
previous definition cannot be replaced by addition modulo 12. Addition modulo
12 is denoted by the symbol ‘⊕’ to distinguish it from regular addition.
Definition 11. Consider two nodes t, u ∈ N. We say that the span of the ordered
pair (t, u) is (k − 1) if there are exactly k distinct nodes between t and u, counting
t and u themselves. We denote the span of (t, u) by span(t, u). Further, the ordered
pair (t, u) is called a step if span(t, u) = 1.
Definition 12. Consider two elements α, β ∈ Z(X). The size of the ordered
pair (α, β) is the real number r ∈ [0, 12) for which pitch(α) ⊕ r = pitch(β). We
denote the size of (α, β) by size(α, β).
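The pitch homomorphism and the size function of Definition 12 can be sketched for a two-generator lattice. The pitch values are assumptions chosen for illustration: just intervals, the perfect fifth 3:2 and the major third 5:4, expressed in semitones so that R12 = [0, 12). An element α of Z(X) is written as its coefficient pair (k_fifth, k_third).

```python
import math

# Hypothetical pitch values for the two generators, in semitones.
FIFTH = 12 * math.log2(3 / 2)  # just perfect fifth, about 7.020
THIRD = 12 * math.log2(5 / 4)  # just major third, about 3.863


def pitch(alpha):
    """pitch: Z(X) -> R12 for alpha = (k_fifth, k_third), extended additively mod 12."""
    k_fifth, k_third = alpha
    return (k_fifth * FIFTH + k_third * THIRD) % 12


def size(alpha, beta):
    """The r in [0, 12) with pitch(alpha) ⊕ r = pitch(beta)."""
    return (pitch(beta) - pitch(alpha)) % 12


print(round(size((0, 0), (4, 0)), 4))  # 4.0782: four fifths, a Pythagorean ditone
print(round(size((0, 1), (4, 0)), 4))  # 0.2151: the syntonic comma
```

Because sizes are taken modulo 12, size(β, α) is 12 − size(α, β) whenever the two pitches differ, so the order of the pair matters.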
Notice that in general the span is not invariant for neighboring GTS’s. However,
the definition of well-formedness requires this invariance.
where span_e denotes the span function in the neighboring GTS S_e.
Example 6. We give an example of a GTS which is semi-WF but not WF.
Assume the GTS depicted in Fig. 6. The GTS which includes the node F is
semi-WF. The span of any fifth is 3 and the span of any third is 1. On the other hand,
the neighboring GTS containing the corner 1 B is not semi-WF. The span of
the fifth (C, G) is 2 and the span of the fifth (G, D) is 4. Therefore the system
is not WF.
Definition 14. Consider a GTS S, compact with respect to a node o and a set
of commas K. We say that:
1. S is open if for some κ ∈ K there are two nodes m, n ∈ N such that gen(m) lies
between gen(o) and gen(o) + κ, i.e. (gen(o), gen(m), gen(o) + κ), and gen(n) lies
between gen(o) + κ and gen(o), i.e. (gen(o) + κ, gen(n), gen(o)).
2. S is closed if it is not open.
Example 7. Loosely speaking, in an open GTS there is a node between two
(neighboring) corners whose distance is a comma. Therefore the comma is not
sufficiently small. The system from the previous example shown in Fig. 9 is
open (i.e. not closed) because the comma (gen(1 B) − gen(F )) is too large: G is
between F and 1 B and D is between 1 B and F . The systems from Figures 3,
4, 6, and 7 are closed. Also the non-normal system from Fig. 8 is closed.
We are ready to state the main result of the paper. It asserts that a normal
two-dimensional GTS is well-formed if and only if it is closed.
equivalence, of course). Now if we bend the lower one upwards by two śrutis and
the upper one downwards by the same amount, we obtain almost the same tone.
This is the basis of the first comma: −f + 2s ≈ f − 2s, which gives the comma
κ = 4s − 2f.
The other comma is related to the one underlying the Pythagorean pentatonic.
A tone tuned as the fifth in a chain of perfect fifths lies just a small interval below
the starting tone. Bending this fifth fifth upwards yields a comma. However,
there is an issue: should it be bent by two śrutis or by one? In the first case,
the other comma is λ1 = 5f + 2s. In the second case, it is λ2 = 5f + s. It
is fascinating that both options seem to have been (unconsciously) applied by
major music cultures: the Arabic and the Indian. Figures 7 and 8 show the
g-Tonnetze T1(Z(f, s)/{κ, λ1}; f, s) and T2(Z(f, s)/{κ, λ2}; f, s).
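In a GIS the nodes correspond one-to-one with the intervals, so the number of tones in each of these g-Tonnetze is the order of the quotient group Z(f, s)/⟨κ, λ⟩. For two independent commas in a rank-2 free Abelian group, that order is the absolute value of the determinant of their coefficient matrix. The sketch below writes each comma as a hypothetical pair (coefficient of f, coefficient of s).

```python
def quotient_order(*commas):
    """Order of Z(f, s)/<commas> for two independent commas in Z^2: |det|."""
    (a1, a2), (b1, b2) = commas
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("commas are dependent; the quotient is infinite")
    return abs(det)


# Commas written as (coefficient of f, coefficient of s).
kappa = (-2, 4)    # 4s - 2f
lambda_1 = (5, 2)  # 5f + 2s
lambda_2 = (5, 1)  # 5f + s
print(quotient_order(kappa, lambda_1))  # 24
print(quotient_order(kappa, lambda_2))  # 22
```

These two orders match the 24-tone and 22-tone systems discussed below.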
The first solution leads to a 24-tone GTS. Arabic music theory knows
a system of 24 small intervals called nīms. It is usually explained as a result of
splitting each tone of the 12-tone chromatic system into two quarter tones. Our
approach provides an alternative explanation for the structure of this system. In
this explanation, the nīms do not have to be (acoustically) uniform.
More striking is the fact that the GTS implied by the second set of commas
comprises 22 elements. It seems to model the Indian system of 22 śrutis suitably.
There is no generally accepted explanation for the number 22 in this system.⁵
Our explanation of this number is very simple and surprisingly accurate. It relies
on only four basic assumptions:
References
Lewin, D.: Generalized Musical Intervals and Transformations. Oxford University
Press, Oxford (2007) (Originally: Yale University Press, 1987)
Lewin, D.: Notes on the Opening of the F Minor Fugue from WTCI. Journal of Music
Theory 42(2), 235–239 (1998)
Reed, J., Bain, M.: A Tetrahelix Animates Bach: Revisualization of David Lewin’s
Analysis of the Opening of the F Minor Fugue. Music Theory Online 13(4) (2007)
Cohn, R.: Introduction to Neo-Riemannian Theory: A Survey and a Historical Perspective.
Journal of Music Theory 42(2), 167–180 (1998)
Gollin, E.: Some Aspects of Three-Dimensional Tonnetze. Journal of Music Theory 42(2),
195–206 (1998)
Carey, N., Clampitt, D.: Aspects of Well-Formed Scales. Music Theory Spectrum 11(2),
187–206 (1989)
Wilson, E.: Personal correspondence with John Chalmers (1975),
www.anaphoria.com/mos.PDF
Mazzola, G.: The Topos of Music: Geometric Logic of Concepts, Theory, and Performance.
Birkhäuser, Basel (2002)
Kolman, O.: Transfer Principles for Generalized Interval Systems. Perspectives of New
Music 42(1), 150–191 (2004)
Clough, J., Douthett, J., Ramanathan, N., Rowell, L.: Early Indian Heptatonic Scales
and Recent Diatonic Theory. Music Theory Spectrum 15(1), 36–58 (1993)
Clough, J.: Diatonic Trichords in Two Pieces from Kurtág’s ‘Kafka-Fragmente’: A
Neo-Riemannian Approach. Studia Musicologica Academiae Scientiarum Hungaricae
43(3/4), 333–344 (2002)
Žabka, M.: Well-Formed Two-Dimensional Generalized Tone Systems (unpublished
paper)
Author Index