royalsocietypublishing.org/journal/rstb
Downloaded from https://royalsocietypublishing.org/ on 22 March 2021
Research
Cite this article: Dellert J, Erben Johansson
N, Frid J, Carling G. 2021 Preferred sound
groups of vocal iconicity reflect evolutionary
mechanisms of sound stability and first
language acquisition: evidence from Eurasia.
Phil. Trans. R. Soc. B 376: 20200190.
https://doi.org/10.1098/rstb.2020.0190
Accepted: 11 February 2021
One contribution of 17 to a theme issue
‘Reconstructing prehistoric languages’.
Subject Areas:
behaviour, evolution
Keywords:
sound evolution, vocal iconicity, phonology,
typology, language evolution,
first language acquisition
Author for correspondence:
Gerd Carling
e-mail: gerd.carling@ling.lu.se
†Present address: Lund University, Department
of linguistics, Centre for Languages and
Literature, Box 201, 221 00 Lund, Sweden.
Electronic supplementary material is available
online at https://doi.org/10.6084/m9.figshare.
c.5324910.
Preferred sound groups of vocal iconicity
reflect evolutionary mechanisms of sound
stability and first language acquisition:
evidence from Eurasia
Johannes Dellert1, Niklas Erben Johansson2, Johan Frid3 and Gerd Carling2,†
1 Seminar für Sprachwissenschaft, Universität Tübingen, Wilhelmstraße 19, 72074 Tübingen, Germany
2 Center for Languages and Literature, Lund University, Helgonabacken 12, 223 62 Lund, Sweden
3 Lund University Humanities Lab, Lund University, Box 201, 221 00 Lund, Sweden
GC, 0000-0002-9190-9724
In speech, the connection between sounds and word meanings is mostly arbitrary. However, among basic concepts of the vocabulary, several words can be
shown to exhibit some degree of form–meaning resemblance, a feature
labelled vocal iconicity. Vocal iconicity plays a role in first language acquisition and was likely prominent also in pre-historic language. However, an
unsolved question is how vocal iconicity survives sound evolution, which
is assumed to be inevitable and ‘blind’ to the meaning of words. We analyse
the evolution of sound groups on 1016 basic vocabulary concepts in 107
Eurasian languages, building on automated homologue clustering and
sound sequence alignment to infer relative stability of sound groups over
time. We correlate this result with the occurrence of sound groups in iconic
vocabulary, measured on a cross-linguistic dataset of 344 concepts across
single-language samples from 245 families. We find that the sound stability
of the Eurasian set correlates with iconic occurrence in the global set. Further,
we find that sound stability and iconic occurrence of consonants are connected to acquisition order in the first language, indicating that children
acquiring language play a role in maintaining vocal iconicity over time.
This article is part of the theme issue ‘Reconstructing prehistoric
languages’.
1. Introduction
Human speech production involves an ability to form a range of distinct phonemes, which are a precondition for spoken language. The evolution of speech
production in pre-historic language can be approached in various ways. Studies
in the evolution of articulation can contribute to how sound distinctions
emerged [1], cross-linguistic typologies of sound systems can be used to classify
sounds and systems into types [2], and reconstructions by the comparative
method can indicate how sound systems evolve over time [3]. In addition,
first language acquisition (L1) can be viewed in terms of evolution [4,5].
The basis for sound evolution is the articulatory mechanisms of speech
production, studied in the disciplines of phonetics and phonology [6]. Cross-linguistic typologies, aiming at understanding the processes leading to global
phonological diversity, rely on basic features of articulation [7]; in a typological model [8], synchronic regularities are a result of diachronic trajectories of
change [9]. Over time, sound systems are continuously modified, a process identified as sound change. Various mechanisms govern this process, most
importantly articulation, which impacts the directionality of change [10,11].
Further, sound evolution may be connected to articulatory mechanisms of L1. This idea, known as the child-language theory by Jakobson [4,12], is continued in the theory of preference laws in phonological change [13]. The approach connects evolution and typology with acquisition, implying that a specific structure is preferred with respect to some parameter in a specific situation, following the principles that [5]:
(1) the more languages show a property cross-linguistically, the more preferred it is;
(2) the earlier and quicker it is acquired in L1, the more preferred it is;
(3) the longer it takes to become lost in aphasia, the more preferred it is.
Preference laws build on the inversion of consonantal strength, a relational measure defined as the degree of deviation from an unimpeded airflow, and the sonority hierarchy, which organizes phonemes according to their acoustic energy [6,14]. Thus, the theory sees language change as striving towards a maximal contrast in the syllable, with the strongest possible consonant and a maximally sonorous vowel [15] (table 1). Studies in L1 indicate that the variation between individual children may be substantial [16], but a general acquisition order of speech production can be supported by data from normal and impaired children [16–20]. This order can also be connected to phonological simplifications in aphasia patients [21].
Table 1. The scale of consonantal strength and sonority, as presented in the theory of preference laws [5,13,15].
INCREASING CONSONANTAL STRENGTH (towards the top)
  voiceless stops
  voiced stops
  voiceless fricatives
  voiced fricatives
  nasals
  lateral liquids
  central liquids
  glides/approximants
  high vowels
  mid vowels
  low vowels
INCREASING SONORITY (towards the bottom)
The notion of vocal iconicity or sound symbolism, a resemblance-based mapping between form and meaning, has a long history in linguistic literature [22]. To Saussure [23], the notion of an arbitrary connection between form and meaning of the linguistic sign was a precondition to his general theory of language. This notion of arbitrariness was questioned by Jespersen [24] and Jakobson [4,25], who observed that sound symbolism is not just an important feature of language itself; it is also a vital part of acquiring a language [26,27].
An important aspect of iconicity, in both speech and signs, is the role it may have played in pre-historic language. Whereas signs have a natural tendency to be motivated by iconicity, indexicality or systematicity [28,29], the connection is less evident for speech [30–32]. The discussion is about whether speech is fundamentally iconic or arbitrary and whether signs preceded speech in pre-historic language or not [33,34]. One theory argues that pre-historic language initially consisted of iconic signs, which were later successively replaced by conventionalized symbols, giving rise to arbitrariness [35,36]. A competing theory argues that speech perforce is arbitrary, manifests itself by convention [30,37], and therefore must have been arbitrary from the start. An often-invoked argument is the communication signals of vervet monkeys and other animals, which are supposed to be arbitrary [38]. However, communication signals by nonhuman primates may be iconically or indexically motivated [39], and there is evidence that chimpanzees can map white/bright to high-pitched sounds and black/dark to low-pitched sounds [40], so the question remains open.
Recent empirical research, using experiments and large datasets, challenges the notion of the fundamental arbitrariness of speech. Vocal iconicity is central to various aspects of language processing and communication [22,41]. Several studies, using large, cross-linguistic datasets and computational modelling, indicate that a substantial part of basic vocabulary (the universally common, most frequent and salient part of vocabulary) is non-arbitrarily motivated [42–44].
Our paper deals with vocal iconicity and the evolutionary mechanisms of speech production. A fundamental problem here is the cross-linguistic occurrence of vocal iconicity in spite of language change. In a Neogrammarian model [45,46], sound change is inevitable, rule-based (including conditions and constraints of change) and governed by articulatory principles [3]. The Neogrammarians did not specifically mention arbitrariness, but a prerequisite to the model is that sound change is ‘blind’ to the meaning of words. Even though the Saussurean notion of arbitrariness was questioned by later scholars, they did not address the problem of iconicity in relation to inevitable and meaning-blind sound change, which forms the baseline for our study. If speech is fundamentally arbitrary and agreed upon by convention, why does vocal iconicity emerge in core parts of the vocabulary of all languages? Conversely, if speech is fundamentally iconic, why is vocal iconicity not destroyed by meaning-blind sound change?
© 2021 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.
2. Theory
Since a vital part of basic vocabulary remains iconically motivated [42,44], we assume that the emergence and preservation of vocal iconicity is a process that is integrated with core parts of the sound evolution process. To make a cross-linguistic investigation feasible, we restrict ourselves to basic vocabulary, focusing on concepts which have demonstrated sound–meaning associations from a cross-linguistic perspective, together with concepts that have not done so [42–44].
[Figure 1 labels: /ma/, /mama/; articulatory mechanisms of speech production; L1 acquisition order in speech production; sound stability; preference in vocal iconicity.]
Figure 1. Representation of our model of emergence of vocal iconicity in relation to articulatory mechanisms of speech production, sound stability and sound preference in first language acquisition.
Using the preference theory as a backdrop [4,5,13], we assume that sound preference, controlled by articulatory mechanisms of speech production, affects sound evolution over time. Here, we assume a gradual decrease in stability of sound groups along the parameters outlined by the preference theory (relying on the sonority scale), roughly following the principle that consonants range in preference from front to back (strongest to weakest), and vowels range from back to front/mid (weakest to strongest). In addition, we assume that articulation affects L1 along these parameters. Due to the role of vocal iconicity in L1 [47], we assume that vocal iconicity, by iconic preference, is encapsulated in the general
sound evolution process by means of mechanisms of articulation. Sound groups that are more salient in evolution as well
as preferred in acquisition correlate with sound groups that
are overrepresented in iconic associations (figure 1). Using
the model of phonemic feature hierarchies [7], we identify
five basic feature classes, matching the preference theory.
These are: (1) place of articulation, (2) manner of articulation,
(3) voicing, (4) openness and (5) backness (table 2). By means
of these parameters, our different datasets can be compared
and evaluated to test our theory. Our study is confined to
contemporary, cross-linguistic data. One of our datasets has
a cross-linguistic, global coverage, picking one language per
family (iconic occurrence); the other dataset is restricted to
one continent (Eurasia), including data from 21 families
(sound evolution). We admit that the difference in data coverage is a shortcoming, but global data on sound evolution was
not available to us. However, even though the model of
uniformitarianism [48] is becoming increasingly questioned
(see other papers of this volume), we assume that our limited
sample still provides a good approximation to a general
model of the mechanisms of speech production in pre-historic
language and the general trajectories of language change
underlying the emergence of iconicity [31,34].
3. Model, method and data
Our model is empirical and quantitative, considering sound
evolution rather than sound change. For that purpose, we use
cross-linguistic data and a method that can be applied
equally to several families, independently of the degree to
which they are covered by the comparative method. The
model does not consider conditions and constraints of
sound change, which is a complex process involving factors
such as metathesis, merger, loss, lenition, epenthesis, syncope
and apocope [3].
Our linear regression model correlates three different datasets:
(1) (SSt) = a new dataset of sound group stability data, based on all International Phonetic Alphabet (IPA) transcriptions from the NorthEuraLex 0.9 database, i.e. words for 1016 concepts across 107 languages from 21 language families of Northern Eurasia.
(2) (Ico) = an existing, published dataset of iconic preference for 344 basic concepts, covering one language each from 245 families [44]. Lexemes have been coded by sound groups relevant to acoustic and articulatory mechanisms, as well as to vocal iconicity values.
(3) (L1) = a small reference dataset in the form of a matrix of sound groups by feature class, identified as earlier or later in L1.
Table 2. Scheme of sound groups, organized by classes [7] and defined as earlier and later in first language acquisition. The distinctions follow the babbling period [19] and are relative notions, not specifically distinguishing absolute ages of acquisition. They were first identified by Jakobson [4] and have been verified by different methods (diary method, day-by-day recording) in various languages (see electronic supplementary material, S3).
     sound classification                    first language acquisition
     main type    feature class^a            earlier            later                     source
1    consonants   place of articulation      labial, alveolar   palatal, velar, glottal   [16,17]
2    consonants   manner of articulation     stop               continuant                [16]
3    consonants   voicing                    voiceless          voiced                    [17]
4    vowels       openness                   low                high, mid                 [18]
5    vowels       backness                   back               front, central            [18]
^a The table contains feature classes of sound groups that are distinguished in our data [7]. Other observed groups are labial nasals (earlier)–other nasals (later) [16] and unaspirated stops (earlier)–aspirated stops (later) [20].
(a) Sound stability
In order to substantiate our assumption that the appearance of iconic form–meaning mappings may in part be explained by the higher overall stability of certain sound groups against phonetic change, we propose a way to estimate sound group stability values (SSt) from a cognacy-annotated phonetic form database and apply it to NorthEuraLex 0.9 [49]. We modify the existing code for information-weighted sequence alignment (IWSA) [50] to support the IPA segmentation underlying the iconicity dataset (Ico) [44].
Based on this re-segmentation, we first run the relevant script from the IWSA code on NorthEuraLex 0.9, in order to infer sound similarity scores for sequence alignment of word pairs. To decide which word pairs to align, we re-use the previously published automated homologue judgements returned by the IWDSC method [50]. Initially, we considered building on preliminary results of cognacy annotation based on the available literature. However, published coverage of etymologies for the relevant families remains both incomplete and uneven, and mixing partial high-quality cognacy annotations with low-quality automated judgements for the rest of the data could easily lead to difficult-to-handle statistical biases. We are aware that the automated annotations are of much lower quality than would be achievable for large parts of our data (e.g. for Indo-European and Uralic), but from the statistical point of view, the data quality issues can simply be treated as high noise levels, which will not distort results much as long as the noise is equally distributed across the entire dataset.
The central idea of our sound group stability estimates is to count how many instances of sounds from the respective group in the database are aligned to sounds from the same group in pairs of homologue forms. This provides an empirical answer to the question of how commonly phonetic evolution leads to sounds losing their group-defining properties. However, two major biases inherent in a simple tally of the aligned sound pairs in the relevant alignments need to be avoided.
The first problem is that the language pairs represent different durations of phonetic divergence. If some sound groups are overrepresented in a group of closely related languages in the database (such as the Slavic languages), simply counting all the aligned sound pairs involving that group will overestimate stability. In order to avoid this bias, we weight the counts by the average stability of sounds across all homologue pairs for the language pair in question. For instance, for Finnish and closely related Karelian, 779 word pairs were automatically detected to be homologues, and 5513 segment pairs were aligned in total by IWSA. Out of these, 3968 segment pairs consisted of identical IPA symbols. Ultimately, the stability value of Finnish
towards Karelian was computed to be as high as 0.791, i.e. only
one in five sounds will be different between etymologically
related words from the two languages. To compute stability
values for a sound group, each relevant sound pair from alignments of words from this language pair was therefore counted
as only 20.9% of a full sound pair. For Tundra Nenets, the Uralic
language which is most distant from Finnish according to the
stability measure, the equivalent value is at 74.6%, i.e. three
out of four aligned sounds are non-identical. A sound which
remains identical between Finnish and Tundra Nenets is thus
counted more than three times as much as a sound which
merely remained identical in Finnish and Karelian.
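The weighting by language pair can be illustrated with a minimal sketch. It assumes, in line with the figures quoted above, that the weight of a segment pair is one minus the average stability of the language pair; the function name `pair_weight` is hypothetical and is not taken from the published code.

```python
def pair_weight(pair_stability: float) -> float:
    """Weight of an aligned segment pair, given the average stability of
    sounds across all homologue pairs of the two languages (assumed here
    to be 1 - stability, as the Finnish-Karelian figures suggest)."""
    if not 0.0 <= pair_stability <= 1.0:
        raise ValueError("stability must lie between 0 and 1")
    return 1.0 - pair_stability


# Figures quoted in the text.
w_karelian = pair_weight(0.791)  # Finnish vs. Karelian
w_nenets = 0.746                 # Finnish vs. Tundra Nenets, weight as quoted

# How much more a preserved sound counts in the more distant pair.
ratio = w_nenets / w_karelian
print(f"Karelian weight: {w_karelian:.3f}, ratio: {ratio:.2f}")
```

The ratio of the two weights is what makes a sound preserved between distant relatives count more than one preserved between close relatives.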
The second bias to avoid is caused by the use of dictionary forms as opposed to stems which would be the most
natural etymological comparanda. Mitigating the effects of
this property of lexicostatistical databases is the main motivation for information weighting as introduced by Dellert &
Buch [51], of which Dellert [50] represents the most recent
version. The information content quantifies for each position
in an alignment how surprising the segments are in their
current context, automatically leading to low values for
recurring inflectional material such as infinitive endings.
We simply multiply the weight by which each instance in
an alignment is counted by its information content. In sum,
each segment pair count is weighted by the product of
the information content and the overall rate of phonetic
replacement between the two relevant languages.
We compute the SSt values for each sound group by dividing the weighted count of the identical or in-group alignments
by the total weighted count of alignments involving sounds
from a sound group. The exact mathematical statement of the
SSt measure with all of its components can be found in the electronic supplementary materials (S1), and the implementation is
openly available as part of our code release.1
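As a rough illustration of the ratio described above, the following sketch computes a stability value for one sound group from a handful of invented alignment records. The record layout, the names `AlignedPair`, `segment_weight` and `sound_group_stability`, and the toy numbers are assumptions made for this example; the exact definition is the one given in the electronic supplementary material (S1) and in the released code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class AlignedPair:
    group_a: Optional[str]   # sound group of the segment in language A (None = gap)
    group_b: Optional[str]   # sound group of the segment in language B (None = gap)
    info_content: float      # information content of this alignment position
    pair_stability: float    # average stability of the language pair


def segment_weight(p: AlignedPair) -> float:
    # Product of information content and the overall replacement rate.
    return p.info_content * (1.0 - p.pair_stability)


def sound_group_stability(pairs: Iterable[AlignedPair], group: str) -> float:
    """Weighted share of alignments involving `group` that stay inside it."""
    in_group = 0.0
    total = 0.0
    for p in pairs:
        if group not in (p.group_a, p.group_b):
            continue  # this alignment does not involve the sound group
        w = segment_weight(p)
        total += w
        if p.group_a == group and p.group_b == group:
            in_group += w
    if total == 0.0:
        raise ValueError(f"no alignments involve sound group {group!r}")
    return in_group / total


# Invented toy alignments (not taken from NorthEuraLex).
toy = [
    AlignedPair("lab", "lab", 1.0, 0.791),
    AlignedPair("lab", "lab", 0.5, 0.254),
    AlignedPair("lab", "alv", 1.0, 0.791),
    AlignedPair("lab", None, 0.2, 0.254),
    AlignedPair("vel", "vel", 1.0, 0.791),
]
sst_lab = sound_group_stability(toy, "lab")
print(round(sst_lab, 3))
```

Treating an alignment as involving a group when either segment belongs to it is one possible reading of the description; gaps (loss or gain) count towards the denominator only.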
(b) Iconic value
In order to investigate whether there is a correlation between the
stability of a sound group and the frequency of its usage in
iconic sound–meaning associations, we need to build on
reliable and comparable stability and iconicity values for articulatory and acoustic features. For this, we re-use the iconicity
values from Erben Johansson et al. [44], where the sampled
data were phonetically transcribed and the segments were
grouped according to salient articulatory parameters and
distinctive acoustic features relevant for studying iconicity. Proportional over- and underrepresentations of each sound group
for each concept could then be estimated. These were then
transformed into odds ratios (OR) in order to make sound
groups with different levels of granularity comparable (e.g.
unrounded-rounded vowels versus high-mid-low vowels).
A region of practical equivalence (ROPE) was then defined
around the null effect of no under- or overrepresentation. Noteworthy (strong) overrepresentations were defined as a 25%
increase of the OR which also had the 95% credible intervals
for the OR falling completely outside the ROPE. Noteworthy
(weak) overrepresentations were defined as a 25% increase of
the OR which also had the 95% credible intervals for the OR
excluding zero and the median of posterior distribution was
outside the ROPE. Altogether, this produced very conservative
estimates of the degree of under- or overrepresentation. Two
hundred and twenty-five combinations of sound groups and
concepts were judged to be iconically overrepresented, which
corresponded to only approximately 1.3% of all possible combinations. In order to ensure comparability, we re-use the sound
group categorization from this study for computing the stability
values (SSt). The only differences are that voiced glottals, voiceless nasals, voiceless laterals, voiceless vibrants and low front
rounded vowels had to be excluded due to data sparsity for
these sound groups in the NorthEuraLex database. All
sound–meaning comparisons with iconicity values lower
than 1 (i.e. underrepresentations of sounds) were removed
because most of these are redundant mirror images of overrepresentations (an overrepresentation of rounded vowels also
leads to an underrepresentation of unrounded vowels) and
would thus skew the comparison to sound stability.
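The classification into strong and weak overrepresentations can be sketched as follows. This is only an illustration under stated assumptions: the ROPE bounds of 0.9 and 1.1 are hypothetical placeholders (the bounds used in [44] are not given here), the phrase 'excluding zero' is read as excluding the null value on the log-odds scale, i.e. an OR of 1, and the function names are invented for this example.

```python
def odds_ratio(p_concept: float, p_baseline: float) -> float:
    """Odds of a sound group within one concept relative to its baseline odds."""
    return (p_concept / (1.0 - p_concept)) / (p_baseline / (1.0 - p_baseline))


def classify_overrepresentation(or_median, ci_low, ci_high,
                                rope=(0.9, 1.1), min_or=1.25):
    """Return 'strong', 'weak' or 'none' for one sound group and concept.

    or_median        -- median of the posterior distribution of the OR
    ci_low, ci_high  -- bounds of the 95% credible interval of the OR
    rope             -- hypothetical region of practical equivalence around OR = 1
    min_or           -- a 25% increase of the OR
    """
    rope_low, rope_high = rope
    if or_median < min_or:
        return "none"
    interval_outside_rope = ci_low > rope_high or ci_high < rope_low
    if interval_outside_rope:
        return "strong"
    # Weak: interval excludes the null value and the median lies outside the ROPE.
    if ci_low > 1.0 and or_median > rope_high:
        return "weak"
    return "none"


print(classify_overrepresentation(1.5, 1.2, 1.9))
print(classify_overrepresentation(1.3, 1.05, 1.6))
```

Rows with an iconic value below 1 would simply be filtered out before the comparison with sound stability, as described above.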
(c) Earlier and later in first language acquisition
To contrast our data of SSt and Ico against preference in L1,
we compile a smaller reference dataset from various sources
(electronic supplementary material, S3). First, we list hierarchical chains for sound groups to be acquired earlier or
later in first language acquisition [4]. Second, we identify
phonemic feature classes ( place and manner of articulation,
voicing, openness and backness) [7] that can be applied to
our sound groups (electronic supplementary material, S2),
matching the preference theory. Finally, we scan the literature
on acquisition order in the first language to verify these feature classes and compile a matrix, which merges the suggested chains into two relative types, ‘earlier’ and ‘later’ (with no specific reference to age). We use mainly data following the babbling period (table 2).
[Figure 2: horizontal bar plot, x-axis 0–100%; legend: stable, shift in group, shift out of group, loss or gain.]
Figure 2. Barplot demonstrating the stability rates of sound groups, divided by ‘stable’ and ‘shift in group’ (=stability rate) versus ‘shift out of group’ and ‘loss or gain’ (=instability rate), organized from most stable sound group (top) to most unstable (down) (see electronic supplementary material, S1).
To arrive at our combined dataset (electronic supplementary material, S4), we start from the iconic value dataset (Ico),
which contains 18 576 rows of concepts in combination with
individual sound groups, based on the lexical data. The calculation of iconic value in the original dataset is Bayesian and
includes a confidence interval. We use the centre of the confidence interval as the prototypical value. To these data, we add
the sound stability values of the SSt dataset for each row (col
K). We then add the reference data from the L1 matrix
(table 2) for each sound group present here. We could not
reliably assign a coding of ‘earlier’ or ‘later’ to all
sound groups of the Ico dataset, which means that these columns contain several empty cells. However, all sound groups
containing the relevant phonetic feature of the L1 matrix
(table 2), regardless of granularity level, are coded. For
example, stop consonants include all stops (from the five-level distinction of the nasal, stop, continuant, vibrant, lateral
sound groups), but also voiced nasals, voiced stops, voiced
continuants, voiced vibrants and voiced laterals (from the
ten-level distinction of the nasal, stop, continuant, vibrant,
lateral sound groups with voiced and voiceless distinctions).
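The construction of the combined dataset can be pictured as a simple lookup join. The sketch below is illustrative only: the field names, the concept labels and all numerical values are invented placeholders, and only the earlier/later coding by place of articulation follows table 2.

```python
# Placeholder rows for illustration; the values are not taken from the datasets.
ico_rows = [
    {"concept": "concept_A", "sound_group": "lab", "iconic_value": 1.8},
    {"concept": "concept_B", "sound_group": "pal", "iconic_value": 1.4},
    {"concept": "concept_C", "sound_group": "+round", "iconic_value": 2.1},
]
sst_by_group = {"lab": 0.95, "pal": 0.30, "+round": 0.60}

# Coding by place of articulation, following table 2.
l1_place = {"lab": "earlier", "alv": "earlier",
            "pal": "later", "vel": "later", "glot": "later"}


def combine(rows, sst, l1):
    """Attach the stability value and the L1 coding to every Ico row."""
    combined = []
    for row in rows:
        group = row["sound_group"]
        combined.append({
            **row,
            "sound_stability": sst.get(group),
            "C_Place": l1.get(group),  # None corresponds to an empty cell
        })
    return combined


combined = combine(ico_rows, sst_by_group, l1_place)
for row in combined:
    print(row)
```

Sound groups without a coding in the L1 matrix keep an empty value, which mirrors the empty cells mentioned above.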
We analyse the joint data by linear regression tests, using R
(electronic supplementary material, S5; see below).
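The analysis itself was run in R; the following Python sketch only illustrates the kind of model involved, a simple linear regression of the logarithm of the iconic value on sound stability. The function names and the toy data are assumptions made for this example and do not reproduce the supplementary scripts.

```python
import math


def ols(x, y):
    """Ordinary least squares with one predictor: slope, intercept, R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def f_statistic(r_squared, df_residual):
    """F statistic of a one-predictor regression, derived from R squared."""
    return r_squared / (1.0 - r_squared) * df_residual


# Invented toy data: sound stability and iconic value per row.
stability = [0.2, 0.4, 0.6, 0.8, 0.9]
iconic = [1.1, 1.3, 1.2, 1.8, 2.0]
log_iconic = [math.log(v) for v in iconic]

slope, intercept, r2 = ols(stability, log_iconic)
f_value = f_statistic(r2, len(stability) - 2)
print(f"slope={slope:.3f}, R^2={r2:.3f}, F={f_value:.2f}")
```

For a single predictor, F and R² are tied by F = R²/(1 − R²) × residual degrees of freedom. Applied to the values reported in the Results, this identity reproduces the vowel-only and consonant-only statistics (about 108.7 and 398.6). For the all-data model, R² = 0.006642 with 6891 residual degrees of freedom would correspond to an F of about 46.1, so the printed all-data figures may contain a typographical slip.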
[Figure 2, y-axis: sound groups from most stable (top) to most unstable (bottom): [lab], [lat], [nas], [nas_+v], [lat_+v], [alv], [lab_–v], [vib_+v], [vib], [stop], [lab_+v], [+voice], [–voice], [alv_+v], [vel_–v], [stop_–v], [cont_–v], [alv_–v], [vel], [–round], [cont], [+round], [front], [stop_+v], [low], [back], [cont_+v], [high], [high_back_+r], [mid], [high_bck], [high_front_–r], [high_front], [low_front_–r], [low_front], [vel_+v], [pal_+v], [pal], [glot], [low_back], [glot_–v], [low_back_+r], [high_front_+r], [low_back_–r], [high_back_–r], [central], [pal_–v].]
[Figure 3: panels (a)(i)–(iii) with legend V.C (consonant, vowel); panels (b)(i) manner with legend C_Manner (earlier, later) and (b)(ii) place with legend C_Place (earlier, later); axis names: sound_stability, log(iconic_value), density.]
Figure 3. (a) Density plot (i) and scatter plot with regression line (ii) with a corresponding density plot (iii), adapted to a logarithmic scale, comparing sound
stability (x) and iconic value (y), separating the distribution of vowels (yellow) and consonants (grey) (see electronic supplementary material, S4, S5). (b) Density
scatterplots, adapted to a logarithmic scale, comparing sound stability (x) and iconic value (y), separating the distribution of ‘earlier’ (red) and ‘later’ (blue) in first
language acquisition (relative distinctions of acquisition order of features, following the babbling period), given by the feature classes ‘manner of articulation’ (i) and
‘place of articulation’ (ii) (see electronic supplementary material S4-S6). (Online version in colour.)
5. Discussion
Our results should be considered in relation to our theory as
well as earlier literature [4,5,13]. Apparently, vocal iconicity is
overrepresented with sound groups of the higher stability spectrum, both for consonants and vowels, but with an internal
difference: high for consonants and moderately high for
vowels (figure 3a). Even though vowels are generally more
unstable (figure 2), the iconic value is overall higher for
vowels (figure 3a). This means that our assumption about the
correlation between sound stability and iconicity generally
holds for both consonants and vowels. The pattern recurs
6. Conclusion
Based on a dataset of 107 Eurasian languages, we find that
preference in speech production matches sound stability in
a relatively straight-forward way, which follows mechanisms
such as the sonority hierarchy and place and manner of
articulation. Sonorous vowels are the most unstable sounds;
continuants, vibrants and laterals show a steadily increasing
stability; alveolar and dental stops are even more stable;
and labials, nasals and laterals are the most stable sounds.
Vocal iconicity, i.e. resemblance-based form–meaning mapping of specific concepts, measured on a dataset of global
coverage, is generally restricted to sound groups, which are
higher in the sound stability spectrum. This goes for both
consonants and vowels, where a separation of consonants
and vowels considerably improves the result. This indicates
that iconic sound–meaning mappings are more likely to survive sound evolution and change compared to an average
arbitrary connection between form and meaning. For consonants, specifically in the feature classes place and manner
of articulation, sound groups of high-stability and high
iconic value co-occur with sound groups that are acquired
earlier in L1. There is no such pattern for vowels, even
though vowels are generally more frequent in iconic mappings. This indicates that for consonants, children acquiring
language trigger form–meaning mappings and reinforce
cross-linguistic patterns over time.
Data accessibility. Two of the three relevant input datasets (NorthEuraLex and the Iconicity values) were previously published and are
openly accessible (http://www.northeuralex.org/, https://osf.io/
dh7f3/). The third input dataset (L1 values) as well as our combined
dataset which was used for all analyses are available both as electronic supplementary materials (S2–S4) and via our GitHub
7
Phil. Trans. R. Soc. B 376: 20200190
If we begin with the sound stability values alone (SSt; electronic supplementary material, S1), the result distinguishes
four types: (i) stable (i.e. the percentage with an exact matching in homologues), (ii) shift in group (i.e. the percentage
with a matching within the sound group), (iii) shift out of
group (i.e. the percentage with a matching outside of the
sound group) and (iv) loss or gain (i.e. the percentage of
cases where a phoneme is lost or gained). We summarize
these into two types, stable (stable + shift in group) and
unstable (shift out of group + loss or gain) (figure 2). We conclude that the stability values of sound groups are highly
diverging, from almost complete stability (labials, laterals,
nasals) to almost complete instability (central vowels, palatals, glottals). The most striking result is the difference in
stability between consonants and vowels, where consonants
have higher stability values.
Second, we correlate the rates of SSt (stable + shift in group)
to the Ico, separating consonants and vowels. We use a density
plot and a scatter plot adapted to a logarithmic scale for a better
visualization of the results (figure 3a). The density plot shows
two clear density peaks for SSt and Ico, one for consonants
(with high stability) and one for vowels (with medium stability). Three linear regression tests (electronic supplementary
material, S5, S6) show that separating vowels and consonants
leads to a better model, as iconic value appears to be systematically higher for vowels than for consonants. The linear
regression analysis involving both SSt and Ico is significant when looking at vowels only (F1,2770 = 108.7, p < 0.001,
R 2 = 0.03777), consonants only (F1,4119 = 398.6, p < 0.001,
R 2 = 0.08823) and all data (F1,6891 = 246.08, p < 0.001, R 2 =
0.006642). The larger R 2 values for the vowels-only and consonant-only models indicate that the separation explains a larger
portion of the variance in the data, compared to an all-data
model. On the other hand, the relatively low R-values indicate
that there is a large degree of variance in the data. As expected,
the effect of iconicity, in terms of variance explained, is relatively
small, but statistically significant. Iconicity is not a main driving
force of sound evolution, but factors in a small, but steady way.
Finally, we contrast SSt and Ico, separating earlier and
later in L1, extracting rows with sound groups that have
been coded for this feature (electronic supplementary
material, S4, S5). We find that there is a clear separation by
place of articulation, where earlier is higher in stability and
iconic value (figure 3bii). Manner of articulation shows the
same pattern, but with a bimodal separation (figure 3bi). Voicing, as well as vowels (openness/backness), cannot be as
clearly separated into earlier and later with respect to SSt
and Ico (electronic supplementary material, S6).
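The row extraction described above can be sketched as a filter followed by a group-wise summary. This is a minimal illustration, not the authors' pipeline: the field names (`feature`, `l1`, `sst`, `ico`), the group labels and all numeric values are invented placeholders, and simple means stand in for the density comparison shown in the figure.

```python
from collections import defaultdict
from statistics import mean

# Toy rows; every label and number here is a made-up placeholder.
ROWS = [
    {"sound_group": "group_a", "feature": "place", "l1": "earlier", "sst": 0.75, "ico": 1.5},
    {"sound_group": "group_b", "feature": "place", "l1": "earlier", "sst": 0.5, "ico": 1.0},
    {"sound_group": "group_c", "feature": "place", "l1": "later", "sst": 0.25, "ico": 0.5},
    {"sound_group": "group_d", "feature": "place", "l1": None, "sst": 0.5, "ico": 2.0},
    {"sound_group": "group_e", "feature": "manner", "l1": "later", "sst": 0.5, "ico": 0.75},
]


def summarise_by_l1(rows, feature):
    """Mean SSt and Ico for 'earlier' vs 'later' rows of one feature type."""
    buckets = defaultdict(list)
    for row in rows:
        # Keep only rows of the requested feature that carry an L1 coding.
        if row["feature"] == feature and row["l1"] in ("earlier", "later"):
            buckets[row["l1"]].append(row)
    return {
        label: {
            "n": len(members),
            "mean_sst": mean(r["sst"] for r in members),
            "mean_ico": mean(r["ico"] for r in members),
        }
        for label, members in buckets.items()
    }


if __name__ == "__main__":
    for label, stats in summarise_by_l1(ROWS, "place").items():
        print(label, stats)
```

Rows without an L1 coding (here `group_d`) are dropped before the summary, mirroring the restriction to sound groups that have been coded for this feature.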
when we match our data with observations of earlier and
later in L1. Place and manner of articulation confirm the predictions that stability and iconicity co-occur with the sound group
types that are acquired earlier when learning a language
(figure 3b). However, vowels show no such pattern, which is
noteworthy given that vowels as such are more prominent in
sound–meaning mappings (electronic supplementary material, S6).
The results give rise to several questions which cannot be
fully answered by our study. The causality of the observed
patterns is not entirely clear, and there are several possible
explanations for pre-historic language. Vocal iconicity may be
an effect of certain high-stability sound groups randomly occurring more in words for some concepts, which would then
appear iconic because they are more resistant to change. Alternatively, only high-stability sound groups survive in iconic
mappings because they reflect the way in which words were
coined when language evolved, but the iconic low-stability
sounds did not survive until the present day. It is also possible
that there is a bias for sound laws, or sporadic changes, to
favour stabilizing iconic form–meaning mappings. Our comparison with L1 order indicates that children acquiring
language trigger iconic mappings, at least for the consonantal
onset (and possibly the coda) of the syllable, but not for the
vowel core, which on the other hand is more frequent in
iconic mappings. This indicates that the L1 theory, even
though it is supported by our results, cannot completely
explain the evolution and emergence of vocal iconicity.
4. Results
repository (https://github.com/jdellert/icon-evol), which also hosts
all of the relevant code.
Marcus and Amalia Wallenberg Foundation (MAW 2017.0050,
awarded to G.C.). The work by N.E.J. was funded by a PhD grant
Endnote
1. See https://github.com/jdellert/icon-evol.
References
1.
Dediu D, Janssen R, Moisik SR. 2019 Weak biases
emerging from vocal tract anatomy shape the
repeated transmission of vowels. Nat. Hum. Behav.
3, 1107–1115. (doi:10.1038/s41562-019-0663-x)
2. Maddieson I. 2018 Is phonological typology possible
without (universal) categories? In Phonological
typology (eds LM Hyman, F Plank), pp. 107–125.
Berlin, Germany: Walter de Gruyter.
3. Garrett A, Johnson K. 2013 Phonetic bias in sound
change. In Origins of sound change: approaches to
phonologization. Oxford, UK: Oxford University Press.
4. Jakobson R. 1941 Kindersprache, Aphasie und
allgemeine Lautgesetze. Uppsala, Sweden: Almqvist
& Wiksell.
5. Mailhammer R, Restle D, Vennemann T. 2015
Preference laws in phonological change. In
The Oxford handbook of historical phonology
(eds P Honeybone, J Salmons), pp. 450–466.
Oxford, UK: Oxford University Press.
6. Lass R. 1984 Phonology: an introduction to basic
concepts. Cambridge, UK: Cambridge University Press.
7. Clements GN. 1985 The geometry of phonological
features. Phonol. Yearbook 2, 225–252. (doi:10.
1017/S0952675700000440)
8. Greenberg JH. 1969 Some methods of dynamic
comparison in linguistics. In Substance and structure
of language: lectures delivered before the Linguistic
Institute of the Linguistic Society of America,
University of California, Los Angeles, 17 June–12
August 1966 (ed. J Puhvel), pp. 147–203. Berkeley:
University of California Press.
9. Croft W. 2003 Typology and universals. Cambridge,
UK: Cambridge University Press.
10. Garrett A. 2014 Sound change. In The Routledge
handbook of historical linguistics (eds C Bowern,
B Evans), pp. 227–248. London, UK: Routledge.
11. Bybee J. 2015 Articulatory processing and frequency
of use in sound change. In The Oxford handbook of
historical phonology. Oxford, UK: Oxford University
Press.
12. Jakobson R, Halle M. 1971 Fundamentals of
language. The Hague, The Netherlands: Mouton.
13. Vennemann T. 1988 Preference laws for syllable
structure and the explanation of sound change: with
special reference to German, Germanic, Italian and
Latin. Berlin, Germany: Mouton de Gruyter.
14. Ladefoged P, Johnson K. 2015 A course in phonetics.
Stamford, CT: Cengage Learning.
15. Restle D, Vennemann T. 2001 Silbenstruktur. In
Language typology and language universals: an
international handbook, vol. 2 (eds M Haspelmath, E
König, W Oesterreicher, W Raible), pp. 1310–1336.
Berlin, Germany: Walter de Gruyter.
16. Ferguson CA, Farwell CB. 1975 Words and sounds in
early language acquisition. Language 51, 419–439.
(doi:10.2307/412864)
17. Macken MA. 1980 Aspects of the acquisition of stop
systems: a cross-linguistic perspective. In Child
phonology (eds GH Yeni-Komshian, JF Kavanagh, CA
Ferguson), pp. 143–168. New York, NY: Academic
Press.
18. Velten HV. 1943 The growth of phonemic and
lexical patterns in infant language. Language 19,
281–292. (doi:10.2307/409932)
19. Lee SAS, Davis B, MacNeilage P. 2010 Universal
production patterns and ambient language
influences in babbling: a cross-linguistic study
of Korean- and English-learning infants. J. Child
Lang. 37, 293–318. (doi:10.1017/S0305000
909009532)
20. Ferguson CA, Garnica OK, Lenneberg EH, Lenneberg
E. 1975 Theories of phonological development. In
Foundations of language development) (eds EH
Lenneberg, E Lenneberg), pp. 153–180. New York,
NY: Academic Press.
21. Romani C, Galuzzi C, Guariglia C, Goslin J. 2017
Comparing phoneme frequency, age of acquisition,
and loss in aphasia: implications for phonological
universals. Cogn. Neuropsychol. 34, 449–471.
(doi:10.1080/02643294.2017.1369942)
22. Dingemanse M, Perlman M, Perniss P. 2020
Construals of iconicity: experimental approaches to
form–meaning resemblances in language. Lang.
Cogn. 12, 1–14. (doi:10.1017/langcog.2019.48)
23. Saussure F. 1916 Cours de linguistique générale.
Paris, France: Payot.
24. Jespersen O. 1922 Language: its nature,
development and origin. London, UK: Allen &
Unwin.
25. Jakobson R. 1965 Quest for the essence of
language. Diogenes 13, 21–37. (doi:10.1177/
039219216501305103)
26. Walker P, Bremner JG, Mason U, Spring J, Mattock
K, Slater A, Johnson S. 2010 Preverbal infants’
sensitivity to synaesthetic cross-modality
correspondences. Psychol. Sci. 21, 21–25. (doi:10.
1177/0956797609354734)
27. Massaro DW, Perlman M. 2017 Quantifying
iconicity’s contribution during language acquisition:
implications for vocabulary learning. Front.
Commun. 2, 4. (doi:10.3389/fcomm.2017.00004)
28. Goldin-Meadow S. 2002 Getting a handle on
language creation. In The evolution of language out
of pre-language (eds T Givón, BF Malle), pp.
343–374. Amsterdam, The Netherlands: John
Benjamins.
29. Perniss P, Vigliocco G. 2014 The bridge of iconicity:
from a world of experience to the experience of
language. Phil. Trans. Biol. Sci. 369, 20130300.
(doi:10.1098/rstb.2013.0300)
30. Hockett CF. 1960 The origin of speech. Sci. Am. 203,
88–111. (doi:10.1038/scientificamerican0960-88)
31. Perlman M. 2017 Debunking two myths against
vocal origins of language is iconic and multimodal
to the core. Interact Stud. 18, 376–401. (doi:10.
1075/is.18.3.05per)
32. Hockett CF. 1978 In search of Jove’s brow. Am.
Speech 53, 243–313. (doi:10.2307/455140)
33. Hewes GW et al. 1973 Primate communication and
the gestural origin of language [and Comments and
Reply]. Curr. Anthropol. 14, 5–24. (doi:10.1086/
201401)
34. Heine B, Kuteva T. 2007 The genesis of grammar: a
reconstruction. Oxford, UK: Oxford University Press.
35. Wescott RW. 1971 Linguistic iconism. Language 47,
416–428. (doi:10.2307/412089)
Phil. Trans. R. Soc. B 376: 20200190
Authors’ contributions. G.C. and N.E.J. came up with the original idea of
the study. J.D., G.C., N.E.J. and J.F. designed, piloted and revalidated
the model. N.E.J. and J.D. recoded the datasets to enable a joining. J.D.
performed the analysis on sound evolution. G.C. extracted the data
for first language acquisition. N.E.J. extracted the data on iconic
value and produced the joining of the datasets. J.F. performed the
regression analysis of the joined datasets. G.C. wrote the text,
except for Model, Method and Data, which was written by J.D.,
N.E.J., G.C. and J.F. All co-authors revised and edited the text. J.D.
produced S1; J.D. and N.E.J. produced S2; G.C. produced S3; N.E.J.
produced S4; J.F. produced S5 with contributions by J.D., and J.F.
produced S6. J.D. organized the electronic supplementary material.
Competing interests. We declare we have no competing interests.
Funding. The work by G.C. was funded by a grant from the
Marcus and Amalia Wallenberg Foundation (MAW 2017.0050,
awarded to G.C.). The work by N.E.J. was funded by a PhD grant
(2016–2020) from the Faculty of Humanities and Theology, Lund
University. The work by J.F. was funded by the Swe-Clarin
consortium, by a grant from the Swedish Research Council (VR
2017-00626, awarded to Lars Borin). The work by J.D. was funded
by the project CrossLingference, a grant from the European Research
Council (ERC) under the European Union’s Horizon 2020 research
and innovation programme (grant agreement no. 834050, awarded
to Gerhard Jäger).
Acknowledgements. We acknowledge Mechtild Tronnier, Lund University, for crucial methodological input. We also acknowledge Filip
Larsson, Simon Greenhill and Chundra Cathcart for additional
input in various phases of the project.
43.
44.
45.
46.
47.
48.
49.
50.
51.
and language evolution. Phil. Trans. R Soc. B 369,
20130298. (doi:10.1098/rstb.2013.0298)
Walkden G. 2019 The many faces of
uniformitarianism in linguistics. Glossa 4, 1–18.
(doi:10.5334/gjgl.888)
Dellert J et al. 2020 NorthEuraLex: a widecoverage lexical database of Northern Eurasia. Lang.
Res. Eval. 54, 273–301. (doi:10.1007/s10579-01909480-6)
Dellert J. 2018 Combining information-weighted
sequence alignment and sound correspondence
models for improved cognate detection. In
27th Int. Conf. on Computational Linguistics
(COLING 2018), Santa Fe, New Mexico, August
20–26.
Dellert J, Buch A. 2018 A new approach to concept
basicness and stability as a window to the
robustness of concept list rankings. Lang. Dyn.
Change 8, 157–181. (doi:10.1163/2210583200802001)
9
Phil. Trans. R. Soc. B 376: 20200190
Downloaded from https://royalsocietypublishing.org/ on 22 March 2021
42.
systematicity in language. Trends Cogn. Sci. 19,
603–615. (doi:10.1016/j.tics.2015.07.013)
Blasi DE, Wichmann S, Hammarström H, Stadler PF,
Christiansen MH. 2016 Sound–meaning association
biases evidenced across thousands of languages.
Proc. Natl Acad. Sci. USA 113, 10 818–10 823.
(doi:10.1073/pnas.1605782113)
Joo I. 2019 Phonosemantic biases found in LeipzigJakarta lists of 66 languages. Ling. Typol. 24, 1–12.
(doi:10.1515/lingty-2019-0030)
Erben Johansson N, Anikin A, Carling G, Holmer A. 2020
The typology of sound symbolism: defining macroconcepts via their semantic and phonetic features. Ling.
Typol. 24, 253–310. (doi:10.1515/lingty-2020-2034)
Paul H. 1898 Prinzipien der Sprachgeschichte,
3rd edn. Halle, Germany: Max Niemeyer.
Kiparsky P. 1982 Explanation in phonology.
Dordrecht, The Netherlands: Foris.
Imai M, Kita S. 2014 The sound symbolism
bootstrapping hypothesis for language acquisition
royalsocietypublishing.org/journal/rstb
36. Givón T. 2002 Bio-linguistics: the Santa Barbara
lectures. Amsterdam, The Netherlands: John
Benjamins.
37. Newmeyer FJ. 2015 Iconicity and generative
grammar. Language 68, 756–796. (doi:10.1353/lan.
1992.0047)
38. Cheney DL, Seyfarth RM. 1990 How monkeys see the
world. Chicago, IL: University of Chicago Press.
39. Grouchy P, D’Eleuterio GMT, Christiansen MH, Lipson
H. 2016 On the evolutionary origin of symbolic
communication. Sci. Rep. 6, 1–9. (doi:10.1038/
srep34615)
40. Ludwig VU, Adachi I, Matsuzawa T. 2011
Visuoauditory mappings between high luminance
and high pitch are shared by chimpanzees (Pan
troglodytes) and humans. Proc. Natl Acad. Sci. USA
108, 20 661–20 665. (doi:10.1073/pnas.
1112605108)
41. Dingemanse M, Blasi DE, Lupyan G, Christiansen
MH, Monaghan P. 2015 Arbitrariness, iconicity, and