Perceptual grouping explains constellations across cultures
arXiv:2010.06108v1 [physics.hist-ph] 13 Oct 2020
Charles Kemp¹, Duane W. Hamacher², Daniel R. Little¹ & Simon J. Cropper¹
¹Melbourne School of Psychological Sciences & ²School of Physics
University of Melbourne, Australia
Abstract
Cultures around the world organise stars into constellations, or asterisms, and these groupings are
often considered to be arbitrary and culture-specific. Yet there are striking similarities in asterisms across
cultures, and groupings such as Orion, the Big Dipper, the Pleiades and the Southern Cross are widely
recognized. It has been informally suggested that these shared patterns
are explained by common perceptual principles, such as the Gestalt laws of grouping, but there have
been no systematic attempts to catalog asterisms that recur across cultures or to explain the perceptual
basis of these groupings. Here we compile data from 27 cultures around the world to show that a simple
computational model of perceptual grouping accounts for many of the recurring cross-cultural asterisms.
As expected, asterisms such as Orion and the Big Dipper are common in our data, but we also find
that lesser-known asterisms such as Delphinus and the head of Aries are both repeated across cultures
and captured by our model. Our results suggest that basic perceptual principles account for more of the
structure of asterisms across cultures than previously acknowledged and highlight ways in which specific
cultures depart from this shared baseline.
Anyone who has tried to learn the full set of 88 Western constellations will sympathize with Herschel [1]
(p 156), who wrote that “the constellations seem to have been almost purposely named and delineated to
cause as much confusion and inconvenience as possible,” and that “innumerable snakes twine through long
and contorted areas of the heavens, where no memory can follow them.” Yet Herschel [2] (p 4) and others also point out that there are “well-defined natural groups of conspicuous stars” that have been picked
out and named by multiple cultures around the world [3, 4, 5]. For example, the Southern Cross is recognized as a cross by multiple cultures [15, 7], and is identified as a stingray by the Yolngu of northern
Australia [8], an anchor by the Tainui of Aotearoa/New Zealand [9], and as a curassow bird by the Lokono
of the Guianas [10].
Asterisms (e.g. the Southern Cross) are sometimes distinguished from formal constellations (e.g. the
region of the sky within which the Southern Cross lies), but in cross-cultural work these two terms are often
used interchangeably. It is widely acknowledged that asterisms reflect both universal perceptual principles
and culture-specific traditions. For example, Urton [11] (p 5) notes that “almost every culture seems to
have recognized a few of the same celestial groupings (e.g., the tight cluster of the Pleiades, the V of the
Hyades, the straight line of the belt of Orion), but the large constellation shapes of European astronomy and
astrology simply are not universally recognized; the shapes were projected onto the stars because the shapes
were important objects or characters in the Western religious, mythological, and calendrical tradition.” Even
groupings as apparently salient as the Southern Cross are not inevitable—some Australian cultures have
many names for individual stars but tend not to “connect the dots” to form figured constellations [12, 13, 14].
Although cultural factors are undeniably important, we will argue that perceptual factors explain more
of the inventory of asterisms across cultures than has previously been recognized. Krupp [3] (p 58) suggests
that a “narrow company” of asterisms is common across cultures and lists just four: Orion’s Belt, the
Pleiades, the Big Dipper, and the Southern Cross. Here we draw on existing resources to compile a detailed
catalog of asterisms across cultures, and find that the list of recurring asterisms goes deeper than the handful
of examples typically given by Krupp and others [3, 15, 16]. To demonstrate that these asterisms are mostly
consistent with universal perceptual principles, we present a computational model of perceptual grouping
and show that it accounts for many of the asterisms that recur across cultures.
Our data set includes 22 systems drawn from the Stellarium software package [17] and 5 from the
ethnographic literature. The data span six major regions (Asia, Australia, Europe, North America, Oceania,
and South America), and include systems from both oral (e.g. Inuit) and literate cultures (e.g. Chinese).
Stellarium currently includes a total of 42 systems, and we excluded 20 because they were closely related
to a system already included or because their documentation was not sufficiently grounded in the scholarly
literature. Most of our sources specify asterism figures in addition to the stars included in each asterism,
but we chose not to use these figures as they can vary significantly within a culture and because they were
not available for all cultures. Some of our analyses do not require asterism figures, and for those that do we
used minimum spanning trees computed over the stars within each asterism.
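As a rough illustration of how a minimum spanning tree can stand in for an asterism figure, the sketch below runs Prim's algorithm over great-circle separations. The function names and the approximate Orion's Belt coordinates are assumptions made for this example and are not taken from the paper's code or data files.

```python
import math

def angular_separation(p, q):
    """Great-circle angle in degrees between two (ra_deg, dec_deg) positions."""
    ra1, dec1 = map(math.radians, p)
    ra2, dec2 = map(math.radians, q)
    # Spherical law of cosines, clamped to guard against rounding error.
    c = (math.sin(dec1) * math.sin(dec2)
         + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def minimum_spanning_tree(stars):
    """Prim's algorithm. `stars` maps a star name to (ra_deg, dec_deg).
    Returns a list of edges, each a name pair in sorted order."""
    names = sorted(stars)
    if len(names) < 2:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge that links the current tree to a star outside it.
        _, u, v = min(
            (angular_separation(stars[u], stars[v]), u, v)
            for u in in_tree for v in names if v not in in_tree
        )
        edges.append(tuple(sorted((u, v))))
        in_tree.add(v)
    return edges

# Approximate, illustrative coordinates for Orion's Belt (degrees).
belt = {
    "34DelOri": (83.00, -0.30),
    "46EpsOri": (84.05, -1.20),
    "50ZetOri": (85.19, -1.94),
}
belt_edges = minimum_spanning_tree(belt)
```

For the three belt stars the tree joins each end star to the middle star and omits the longer edge between the two ends.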
Figure 1A shows a consensus system generated by overlaying minimum spanning trees for asterisms
from all 27 cultures. The thick edges in the plot join stars that are grouped by many cultures. The most
common asterisms include familiar groups such as Orion’s belt, the Pleiades, the Hyades, the Big Dipper,
the Southern Cross, and Cassiopeia. The plot also highlights asterisms such as Corona Borealis, Delphinus
and the head of Aries that are less well-known but nevertheless picked out by multiple cultures. All of these
asterisms and more are listed in Table 1, which ranks 35 asterisms based on how frequently they recur across
cultures (an extended version of the table appears as Table S1). Some cross-cultural similarities in asterisms
reflect historical relationships between cultures, and Table 1 also summarizes a mixed-effects analysis that
captures some historical relationships by including a random effect for geographic region. This mixed-effects approach prioritizes asterisms that are attested across geographic regions even if they are relatively
rare within each region, and the results suggest that asterisms including the Southern Pointers, Lyra, and
Corona Australis deserve to be listed alongside the ten singled out at the top of Figure 1A.
To explain these shared patterns in the night sky, scholars from multiple disciplines have suggested that
asterisms are shaped in part by universal perceptual principles, including the principle that bright objects
are especially salient, and that nearby objects are especially likely to be grouped [18, 19, 20]. Claims that
these principles account for star grouping across cultures are mostly anecdotal, but the relevant principles
have been carefully studied by psychologists [21, 22] and have inspired the development of formal models
of perceptual grouping [6, 24, 25, 4, 27, 28]. We build on this tradition by using a computational model
(the graph clustering model, or GC model for short) to explore the extent to which the factors of brightness
and proximity account for asterisms across cultures. The GC model constructs a graph with the stars as
nodes, assigns strengths to the edges based on proximity and brightness, and thresholds the graph so that
only the n strongest edges remain. Figure 1B shows the model graph when the threshold n is set to 320. The
connected components of this thresholded graph represent model predictions about stars that are likely to be
grouped across cultures. There is a strong resemblance between these model predictions and the consensus
system in Figure 1A. The model picks out groups that correspond closely to the ten frequently-occurring
asterisms highlighted in the inset panels of Figure 1A. Beyond these ten asterisms the model also picks out
the Southern Pointers, the teapot in Sagittarius, the head of Draco, the head and stinger of Scorpius, Lyra,
the sickle in Leo, the shaft of Aquila, and more. Table S2 lists all groups found by the model and indicates
which of them are similar to human asterisms attested in Table S1.
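The thresholding step described above can be sketched as follows, assuming that edge strengths have already been computed. This is a minimal illustration using a union-find structure; the function name and the toy strengths are hypothetical and do not come from the paper.

```python
def strongest_edge_clusters(edge_strengths, n):
    """Keep the n strongest edges and return the connected components
    of the remaining graph as sorted lists of node names.
    `edge_strengths` maps (node_a, node_b) to a numeric strength."""
    kept = sorted(edge_strengths, key=edge_strengths.get, reverse=True)[:n]
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in kept:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical strengths for a toy graph (not values from the paper).
toy = {
    ("A", "B"): 0.9,
    ("B", "C"): 0.8,
    ("D", "E"): 0.7,
    ("C", "D"): 0.1,
}
clusters = strongest_edge_clusters(toy, n=3)
```

With n = 3 the weak edge between C and D is dropped, so the toy graph splits into two groups. Raising n to 4 merges them, which mirrors how larger thresholds produce larger predicted groups.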
Figure 1: Common asterisms across cultures compared with model predictions. (A) Consensus system
created by overlaying minimum spanning trees for all asterisms in our data set of 27 cultures. Edge widths
indicate the number of times an edge appears across the entire data set, and edges that appear three or fewer
times are not shown. Node sizes indicate apparent star magnitudes, and only stars with magnitudes brighter
than 4.5 have been included. Insets show 10 of the most common asterisms across cultures, and numbers
greater than 10 identify additional asterisms mentioned in the text or Table 1: Southern Pointers (11), shaft
of Aquila (12), Little Dipper (13), head of Scorpius (14), stinger of Scorpius (15), sickle in Leo (16), Corvus
(17), Northern Cross (18), Lyra (19), Square of Pegasus (20), Corona Australis (21), head of Draco (22) and
the teapot in Sagittarius (23). (B) Asterisms according to the GC model with n = 320. The model assigns
a strength to each edge in a graph defined over the stars, and here the strongest 320 edges are shown. Edge
widths are proportional to the strengths assigned by the model.

| Rank | Human (raw) | Human (adj) | Model score | Stars | Description |
|---|---|---|---|---|---|
| 1 | 0.63 | 0.55 | 1.0 | 34DelOri, 46EpsOri, 50ZetOri | Orion’s Belt |
| 2 | 0.62 | 0.57 | 1.0 | 25EtaTau, 17Tau, 19Tau, 20Tau, 23Tau, 27Tau | Pleiades |
| 3 | 0.59 | 0.51 | 1.0 | 87AlpTau, 54GamTau, 61Del1Tau, 74EpsTau, 78The2Tau | Hyades |
| 4 | 0.57 | 0.46 | 0.88 | 50AlpUMa, 48BetUMa, 64GamUMa, 69DelUMa, 77EpsUMa, 79ZetUMa, 85EtaUMa | Big Dipper |
| 5 | 0.43 | 0.42 | 1.0 | Alp1Cru, BetCru, GamCru, DelCru | Southern Cross |
| 6 | 0.37 | 0.36 | 0.5 | 5AlpCrB, 3BetCrB, 8GamCrB, 10DelCrB, 13EpsCrB, 4TheCrB, 14IotCrB | Corona Borealis |
| 7 | 0.35 | 0.35 | 1.0 | 66AlpGem, 78BetGem | Castor and Pollux |
| 8 | 0.3 | 0.34 | 0.6 | 34DelOri, 46EpsOri, 50ZetOri, 44IotOri, 42Ori | |
| 9 | 0.29 | 0.33 | 1.0 | 9AlpDel, 6BetDel, 12Gam2Del, 11DelDel | Delphinus |
| 10 | 0.28 | 0.31 | 0.71 | 18AlpCas, 11BetCas, 27GamCas, 37DelCas, 45EpsCas | Cassiopeia |
| 11 | 0.26 | 0.3 | 0.45 | 58AlpOri, 19BetOri, 24GamOri, 34DelOri, 46EpsOri, 50ZetOri, 53KapOri | Orion |
| 12 | 0.25 | 0.29 | 0.75 | 46EpsOri, 50ZetOri, 48SigOri | |
| 13 | 0.24 | 0.31 | 1.0 | 13AlpAri, 6BetAri, 5Gam2Ari | Head of Aries |
| 14 | 0.24 | 0.38 | 1.0 | Alp1Cen, BetCen | Southern Pointers |
| 15 | 0.24 | 0.33 | 0.44 | 50AlpUMa, 48BetUMa, 64GamUMa, 69DelUMa, 77EpsUMa, 79ZetUMa, 85EtaUMa, 1OmiUMa, 29UpsUMa, 30PhiUMa, 63ChiUMa, 23UMa | |
| 16 | 0.23 | 0.27 | 1.0 | 53AlpAql, 60BetAql, 50GamAql | Shaft of Aquila |
| 17 | 0.22 | 0.25 | 1.0 | 50AlpUMa, 48BetUMa | |
| 18 | 0.22 | 0.3 | 0.62 | 1AlpUMi, 7BetUMi, 13GamUMi, 23DelUMi, 22EpsUMi, 16ZetUMi, 21EtaUMi | Little Dipper |
| 19 | 0.21 | 0.34 | 0.04 | 54AlpPeg, 53BetPeg | |
| 20 | 0.21 | 0.31 | 1.0 | 8Bet1Sco, 7DelSco, 6PiSco | Head of Scorpius |
| 21 | 0.21 | 0.29 | 1.0 | 35LamSco, 34UpsSco | Stinger of Scorpius |
| 22 | 0.2 | 0.29 | 1.0 | Iot1Sco, KapSco, 35LamSco, 34UpsSco | |
| 23 | 0.19 | 0.31 | 0.75 | 32AlpLeo, 41Gam1Leo, 17EpsLeo, 36ZetLeo, 30EtaLeo, 24MuLeo | Sickle |
| 24 | 0.19 | 0.35 | 0.0 | 21AlpAnd, 88GamPeg | |
| 25 | 0.19 | 0.28 | 0.83 | 1AlpCrv, 9BetCrv, 4GamCrv, 7DelCrv, 2EpsCrv | Corvus |
| 26 | 0.18 | 0.25 | 0.56 | 21AlpSco, 8Bet1Sco, 7DelSco, 6PiSco, 20SigSco | |
| 27 | 0.18 | 0.3 | 0.33 | 50AlpCyg, 6Bet1Cyg, 37GamCyg, 18DelCyg, 53EpsCyg, 21EtaCyg | Northern Cross |
| 28 | 0.17 | 0.22 | 0.58 | 21AlpSco, 8Bet1Sco, 7DelSco, 26EpsSco, Zet2Sco, Mu1Sco, 6PiSco, 20SigSco, 23TauSco | |
| 29 | 0.17 | 0.27 | 1.0 | 21AlpSco, 20SigSco, 23TauSco | |
| 30 | 0.16 | 0.33 | 0.83 | 3AlpLyr, 10BetLyr, 14GamLyr, 12Del2Lyr, 6Zet1Lyr | Lyra |
| 31 | 0.16 | 0.27 | 0.56 | 26EpsSco, Zet2Sco, EtaSco, TheSco, Iot1Sco, KapSco, 35LamSco, Mu1Sco, 34UpsSco | |
| 32 | 0.16 | 0.26 | 0.01 | 21AlpAnd, 54AlpPeg, 53BetPeg, 88GamPeg | Square of Pegasus |
| 33 | 0.15 | 0.22 | 0.2 | 34DelOri, 46EpsOri, 50ZetOri, 87AlpTau, 54GamTau, 61Del1Tau, 74EpsTau, 78The2Tau, 17Tau | |
| 34 | 0.14 | 0.3 | 0.0 | 43GamCnc, 47DelCnc | |
| 35 | 0.14 | 0.35 | 0.67 | AlpCrA, BetCrA, GamCrA, DelCrA | Corona Australis |

Table 1: Common asterisms across cultures. Raw human scores roughly indicate how often an asterism is
found in our data set, and adjusted scores are based on a mixed model that allows for historical relationships
between cultures. The model scores roughly indicate how well these asterisms are captured by the GC model
(1.0 indicates a perfect match).

[Figure 2 panels: 1. Construct graph over stars; 2. Compute brightness and proximity for each edge;
3. Weight brightness and proximity based on ρ; 4. Scale brightness and proximity within local neighborhood;
5. Combine brightness and proximity; 6. Remove all but n strongest edges to form clusters.]
Figure 2: Steps carried out by the graph clustering (GC) model. Each step is illustrated using a region of
the sky that includes the Southern Cross and the Southern Pointers. $b_{xy}$ and $p_{xy}$ denote brightness weights
(blue) and proximity weights (red) associated with the edge between x and y. m(x) and m(y) are the
apparent magnitudes of stars x and y, and $d_{xy}$ is the angular separation between these stars. $b_G$ denotes the
median brightness weight across the entire graph, $b_L$ denotes the median brightness weight within 60° of a
given edge, and $p_G$ and $p_L$ are defined similarly. In steps 2 through 6 edge widths are proportional to edge
weights.

The steps carried out by the model are summarized by Figure 2. The first step is to construct a graph
over stars. Existing graph-based clustering models typically operate over a graph corresponding to a
minimal spanning tree [2] or Delaunay triangulation [3, 4], and the GC model uses the union of three Delaunay
triangulations defined over stars with apparent magnitudes brighter than 3.5, 4.0 and 4.5. Delaunay-like
representations are hypothesized to play a role in early stages of human visual processing [25], and combining Delaunay triangulations at multiple scales ensures that the resulting graph includes both edges between
bright stars that are relatively distant and edges between fainter stars that are relatively close. The second
step assigns a brightness and proximity to each edge. For an edge joining two stars, proximity is inversely
related to the angular distance between the stars, and brightness is based on the apparent magnitude of the
fainter of the two stars. The third step weights brightness and proximity based on a parameter ρ. For all
analyses we set ρ = 3, which means that brightness is weighted more heavily than proximity. The fourth
step scales brightness and proximity so that the distribution of these variables within a local neighborhood
of 60° is comparable with the distribution across the entire celestial sphere. Scaling in this way allows the
impact of brightness and proximity to depend on the local context. For example, the Southern Cross lies in
a region that contains many stars in close proximity, and we propose that stars need to be especially close
to stand out in this context. Previous psychological models of perceptual grouping incorporate analogous
local scaling steps [6, 4], and the neighborhood size of 60° was chosen to match the extent of mid-peripheral
vision. The fifth step multiplies brightness and proximity to assign an overall strength to each edge, and
the final step thresholds the graph so that only the strongest n edges remain. We compared the GC model
to several alternatives, including variants that remove one of its components, k-means clustering, and the
CODE model of perceptual grouping [6]. The results reveal that the GC model performs better than all of
these alternatives, and full details are provided in the supplementary information.
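To make the roles of brightness, proximity and ρ concrete, the sketch below computes a toy edge strength. The functional forms are assumptions chosen purely for illustration: brightness is taken as the relative flux of the fainter star, proximity as the reciprocal of the angular separation, and ρ as an exponent on brightness. The paper's exact formulas (given in Figure 2 and the supplementary information) may differ, and the local scaling of step 4 and the Delaunay graph construction are omitted here.

```python
RHO = 3.0  # value of the weighting parameter reported in the text

def edge_strength(mag_x, mag_y, separation_deg, rho=RHO):
    """Toy edge strength from brightness and proximity.

    ASSUMPTIONS (not the paper's formulas): brightness is the relative
    flux of the fainter star, proximity is 1 / angular separation, and
    rho is an exponent applied to brightness. Step 4 (local scaling)
    is not modelled.
    """
    if separation_deg <= 0:
        raise ValueError("separation must be positive")
    fainter = max(mag_x, mag_y)           # larger magnitude means fainter
    brightness = 10 ** (-0.4 * fainter)   # standard magnitude-to-flux relation
    proximity = 1.0 / separation_deg
    # Multiply the two factors to obtain an overall strength.
    return (brightness ** rho) * proximity

bright_pair = edge_strength(1.0, 1.5, 4.0)
faint_pair = edge_strength(1.0, 4.0, 4.0)
```

Under these assumed forms, an edge is weakened if either endpoint is faint, because only the fainter star's magnitude enters the brightness term.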
Each human asterism can be assigned a score between 0 and 1 that measures how well it is captured
by the GC model. Scores for each culture in our data set are plotted in Figure 3. The model accounts
for some cultures well — for example, 13 of 20 Arabic asterisms, 19 of 38 Marshall Islands asterisms
and 55 of 161 Chinese asterisms are captured perfectly by the model for some value of the threshold n.
The systems captured well by the model are drawn from a diverse set of geographical regions, suggesting
that genealogical relationships between cultures are not enough to explain the recurring patterns predicted
by the model. Yet there are also many asterisms that are not captured by the model, and the Chinese
and Western systems in particular both include many asterisms with a model score of 0. Both systems
partition virtually all of the visible sky into asterisms, and achieving this kind of comprehensive coverage
may require introducing asterisms (including Herschel’s “innumerable snakes”) that do not correspond to
natural perceptual units.
Although some attested asterisms missed by the GC model will probably resist explanation by any
model of perceptual grouping, others can perhaps be captured by extensions of the model. For example,
the model tends not to group stars separated by a relatively large distance. As a result it misses the lower
arm of the Northern Cross (Cygnus) and misses the Great Square of Pegasus entirely. These errors could
perhaps be addressed by developing a multi-scale approach that forms groups at different levels of spatial
resolution [27]. Another possible extension is to incorporate additional grouping cues such as the Gestalt
principle of good continuity, which is consistent with some of the most basic processes of visual contour
detection [31, 32, 33]. The current model combines Corona Borealis with an extraneous star and does not
connect the tail of Scorpius into a single arc, and incorporating a preference for groups that form smooth
curves [4] may resolve both shortcomings.
In addition to scoring each system in our data relative to the GC model, we also examined how closely
each system resembles other systems in our data set (see Figure S13). The system most different from all
others is the Chinese system, which includes more than 300 asterisms, many of which are small and have
no counterparts in records for other cultures. In future work, the model may prove useful for evaluating
hypotheses about historical relationships between systems from different cultures [34, 35]. For example,
the model could potentially be used to ask whether Oceanic constellations are more similar to Eurasian
constellations than would be expected based on perceptual grouping alone.

[Figure 3: one histogram panel for each of the 27 cultures; x-axis: model score (0 to 1); y-axis: count.]
Figure 3: Model results for individual cultures in our data set. Scores of 1 indicate asterisms that are
perfectly captured by the GC model for some value of the threshold n, and each distribution includes scores
for all asterisms that remain after filtering at a stellar magnitude of 4.5. The cultures are ordered based on
the means of the distributions.
We have focused throughout on similarities in star groups across cultures, but there are also striking
similarities in the names and stories associated with these groups [34, 36, 37]. For example, in Greek
traditions Orion is known as a hunter pursuing the seven sisters of the Pleiades, and versions of the same
narrative are shared by multiple Aboriginal cultures of Australia [38, 39]. Perceptual grouping helps to
explain which patterns of stars are singled out for attention, and it is both surprising and satisfying that a
simple model based on brightness and proximity alone can account for so many of the asterisms commonly
found across cultures. Understanding the meanings invested in these asterisms, however, requires a deeper
knowledge of history, cognition and culture.
Data
22 of the systems were drawn from Stellarium [17] and the sources of the remaining 5 appear in the captions of
Figures S15-S41. Stellar data were drawn from version 5.0 of the Yale Bright Star Catalog [40]. For the
mixed-effects analysis, the 27 systems were organized into 6 regions: Asia (Arabic, Chinese, Indian, Indo-Malay),
Australia (Boorong), North America (Dakota, Inuit, Navajo, Ojibwe), Oceania (Anutan, Lenakel, Maori, Marshall
Islands, Tongan), South America (Lokono, Pacariqtambo, Tukano, Tupi), and Western (Babylonian, Belarusian,
Egyptian, Macedonian, Norse, Romanian, Sami, Siberian, Western).
Human scores
The match between an asterism a and a reference asterism r is defined as
$$\mathrm{match}(a, r) = \max\left(\frac{|a \cap r| - |a \setminus r|}{|r|},\ 0\right), \tag{1}$$
where |a ∩ r| is the number of stars shared by a and r, |a \ r| is the number of stars in a that are not shared by r, and
|r| is the number of stars in r. The function attains its maximum value of 1 when a and r are identical.
The match between asterism a and an entire system of asterisms S is defined as
$$\mathrm{match}(a, S) = \max_{r \in S} \mathrm{match}(a, r). \tag{2}$$
Equation 2 captures the idea that a matches S well if there is at least one asterism r in S such that the match between
a and r is high.
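Equations 1 and 2 translate directly into set operations. The sketch below is a minimal illustration in which asterisms are sets of star identifiers; the function names, and the choice to return 0 for an empty system, are assumptions made for this example.

```python
def match(a, r):
    """Equation 1: match between asterism a and reference asterism r."""
    a, r = set(a), set(r)
    if not r:
        raise ValueError("reference asterism must contain at least one star")
    # Shared stars count in favour, stars of a missing from r count against.
    return max((len(a & r) - len(a - r)) / len(r), 0.0)

def match_system(a, system):
    """Equation 2: best match between a and any asterism in the system.
    Returning 0.0 for an empty system is an assumption of this sketch."""
    return max((match(a, r) for r in system), default=0.0)

belt = {"34DelOri", "46EpsOri", "50ZetOri"}
belt_and_sword = belt | {"44IotOri", "42Ori"}

belt_vs_larger = match(belt, belt_and_sword)   # (3 - 0) / 5
larger_vs_belt = match(belt_and_sword, belt)   # (5 - 2 shared... see note)
```

The measure is not symmetric. Scoring the belt against the larger five-star group gives (3 - 0) / 5, whereas scoring the five-star group against the belt gives (3 - 2) / 3, because the two extra stars are penalized.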
In Table 1, the variable labeled Human (raw) is defined as
$$\mathrm{human\_raw}(a, S_{\mathrm{human}}) = \operatorname{mean}_{S \in S_{\mathrm{human}}} \mathrm{match}(a, S), \tag{3}$$
where $S_{\mathrm{human}}$ is the set of all 27 systems in our data set. We computed scores for all asterisms in the entire data set,
but to avoid listing variants of the same basic asterism, an asterism a is included in Table 1 only if match(a, r) < 0.5
for all asterisms r previously listed in the table.
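The raw score and the rule for avoiding near-duplicate table entries can be sketched as below. This is an illustrative reading of the procedure, assuming that candidates are visited in decreasing order of raw score; the function names and the toy systems are hypothetical.

```python
def match(a, r):
    """Equation 1."""
    a, r = set(a), set(r)
    return max((len(a & r) - len(a - r)) / len(r), 0.0)

def match_system(a, system):
    """Equation 2."""
    return max((match(a, r) for r in system), default=0.0)

def human_raw(a, systems):
    """Equation 3: mean system-level match across all systems."""
    return sum(match_system(a, s) for s in systems) / len(systems)

def select_for_table(candidates, systems, cutoff=0.5):
    """Visit candidates from highest to lowest raw score and keep one
    only if its match to every previously listed asterism is below cutoff."""
    ranked = sorted(candidates, key=lambda a: human_raw(a, systems), reverse=True)
    listed = []
    for a in ranked:
        if all(match(a, r) < cutoff for r in listed):
            listed.append(a)
    return listed

# Toy data: two systems contain a three-star group, one a four-star variant.
trio = frozenset({"d", "e", "z"})
quartet = frozenset({"d", "e", "z", "i"})
systems = [[trio], [trio], [quartet]]
table = select_for_table([quartet, trio], systems)
```

In the toy data the four-star variant is dropped because its match to the already listed three-star group is (3 - 1) / 3, which is not below 0.5.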
The adjusted scores are based on a mixed ordinal regression carried out using the brms package in R [41]. For
each asterism a, match scores (Equation 2) for all 27 systems S were mapped to 11 ordered intervals, one for zero
scores and the remaining 10 for the intervals (0, 0.1],. . . , (0.9, 1]. We then fit an ordinal regression model that aimed to
predict these interval assignments given a constant fixed effect and a random effect for geographic region (the model
formula was interval ∼ 1 + (1|region)). We used the fitted model to compute the posterior predictive distribution over
intervals for a system from a novel geographic region, and the mean of this distribution is the adjusted score in Table
1 (computing the mean requires identifying each interval with its midpoint).
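The interval coding and the final averaging step can be illustrated without fitting the regression itself. In the sketch below the probabilities passed to `expected_score` are placeholders for a posterior predictive distribution of the kind the fitted model would supply; the function names are assumptions for this example.

```python
import math

# 11 ordered intervals: index 0 holds scores of exactly zero, and indices
# 1..10 hold (0, 0.1], (0.1, 0.2], ..., (0.9, 1]. Each interval is
# identified with its midpoint (0 for the zero interval).
MIDPOINTS = [0.0] + [round(0.05 + 0.1 * k, 2) for k in range(10)]

def interval_index(score):
    """Map a match score in [0, 1] to the index of its ordered interval."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score == 0.0:
        return 0
    # Rounding before the ceiling avoids floating-point spill at boundaries.
    return min(10, max(1, math.ceil(round(score * 10, 9))))

def expected_score(probabilities):
    """Mean of a distribution over the 11 intervals, using midpoints."""
    if len(probabilities) != len(MIDPOINTS):
        raise ValueError("expected one probability per interval")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * m for p, m in zip(probabilities, MIDPOINTS))

# Hypothetical distribution: half the mass on zero, half on (0.9, 1].
example = expected_score([0.5] + [0.0] * 9 + [0.5])
```

Because the top interval is represented by its midpoint of 0.95, a score computed this way cannot exceed 0.95.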
Model scores
The model scores in Table 1 are defined as
$$\mathrm{modelscore}(a, S_{\mathrm{model}}) = \max_{S \in S_{\mathrm{model}}} \mathrm{match}(a, S), \tag{4}$$
where $S_{\mathrm{model}}$ includes model systems for all values of n between 1 and 2000. Scoring asterisms in this way avoids
having to choose a single value of the threshold parameter n.
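Equation 4 amounts to taking the best system-level match over all thresholds. The sketch below assumes the model's groups for each threshold are already available as a dictionary keyed by n; that data layout, the function names and the toy groups are hypothetical.

```python
def match(a, r):
    """Equation 1."""
    a, r = set(a), set(r)
    return max((len(a & r) - len(a - r)) / len(r), 0.0)

def match_system(a, system):
    """Equation 2."""
    return max((match(a, r) for r in system), default=0.0)

def model_score(a, model_systems):
    """Equation 4: best match to any model system across thresholds.
    `model_systems` maps a threshold n to the list of groups found at n."""
    return max(match_system(a, groups) for groups in model_systems.values())

# Toy model output: the predicted group grows as the threshold n increases.
toy_model = {
    1: [{"A", "B"}],
    2: [{"A", "B", "C"}],
    3: [{"A", "B", "C", "D"}],
}
score_abc = model_score({"A", "B", "C"}, toy_model)
score_abe = model_score({"A", "B", "E"}, toy_model)
```

An asterism receives a score of 1 as soon as any single threshold reproduces it exactly, even if it is split or merged at other thresholds.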
Acknowledgements
We acknowledge the Indigenous custodians of the traditional astronomical knowledge used in this paper, and thank
Joshua Abbott, Celia Kemp, Bradley Schaefer and Yuting Zhang for comments on the manuscript. This work was
supported in part by ARC FT190100200, ARC DE140101600, the McCoy Seed Fund, the Laby Foundation, the
Pierce Bequest, and by a seed grant from the Royal Society of Victoria.
References
[1] Herschel, J. F. W. A Treatise on Astronomy (Lea & Blanchard, 1842).
[2] Herschel, J. F. W. On the Advantages to be Attained by a Revision and Re-arrangement of the Constellations,
with Especial Reference to Those of the Southern Hemisphere, and on the Principles Upon Which such rearrangement ought to be conducted (Moyes and Barclay, 1841).
[3] Krupp, E. C. Night gallery: The function, origin, and evolution of constellations. Archaeoastronomy 15, 43
(2000).
[4] Aveni, A. People and the sky: Our ancestors and the cosmos (2008).
[5] Kelley, D. H. & Milone, E. F. Exploring ancient skies: A survey of ancient and cultural astronomy (Springer
Science & Business Media, 2011).
[6] Urton, G. Constructions of the ritual-agricultural calendar in Pacariqtambo, Peru. In Del Chamberlain, V.,
Carlson, J. B. & Young, J. M. (eds.) Songs from the Sky: Indigenous Astronomical and Cosmological Traditions
of the World (Ocarina Books, 2005).
[7] Roe, P. G. Mythic substitution and the stars: Aspects of Shipibo and Quechua ethnoastronomy compared. In
Del Chamberlain, V., Carlson, J. B. & Young, J. M. (eds.) Songs from the Sky: Indigenous Astronomical and
Cosmological Traditions of the World (Ocarina Books, 2005).
[8] Mountford, C. P. Art, Myth and Symbolism. Vol 1. (Melbourne University Press, 1956).
[9] Best, E. The astronomical knowledge of the Maori, genuine and empirical (Dominion Museum, 1922).
[10] Magaña, E. & Jara, F. The Carib sky. Journal de la Société des Américanistes 105–132 (1982).
[11] Urton, G. At the crossroads of the earth and the sky: an Andean cosmology (University of Texas Press, 1981).
[12] Johnson, D. Night skies of Aboriginal Australia: a noctuary (Sydney University Press, 2014).
[13] Maegraith, B. G. The astronomy of the Aranda and Luritja tribes. Transactions of the Royal Society of South
Australia 56, 19–26 (1932).
[14] Cairns, H. & Harney, B. Y. Dark sparklers: Yidumduma’s Aboriginal astronomy (2004).
[15] Krupp, E. C. Sky tales and why we tell them. In Selin, H. (ed.) Astronomy Across Cultures, 1–30 (Springer,
2000).
[16] Aveni, A. F. Skywatchers of ancient Mexico. (1980).
[17] Chéreau, F. & the Stellarium team. Stellarium (2020). URL stellarium.org. Version 0.20.1.
[18] Metzger, W. Laws of seeing. (MIT Press, 1936/2006).
[19] Yantis, S. Multielement visual tracking: Attention and perceptual organization. Cognitive Psychology 24, 295–
340 (1992).
[20] Hutchins, E. The role of cultural practices in the emergence of modern human intelligence. Philosophical
Transactions of the Royal Society B: Biological Sciences 363, 2011–2019 (2008).
[21] Wagemans, J. et al. A century of Gestalt psychology in visual perception: I. perceptual grouping and figure–
ground organization. Psychological Bulletin 138, 1172–1217 (2012).
[22] Wagemans, J. et al. A century of Gestalt psychology in visual perception: II. conceptual and theoretical foundations. Psychological Bulletin 138, 1218–1252 (2012).
[23] Compton, B. J. & Logan, G. D. Evaluating a computational model of perceptual grouping by proximity. Perception & Psychophysics 53, 403–421 (1993).
[24] Kubovy, M., Holcombe, A. O. & Wagemans, J. On the lawfulness of grouping by proximity. Cognitive Psychology 35, 71–98 (1998).
[25] Dry, M. J., Navarro, D. J., Preiss, K. & Lee, M. D. The perceptual organization of point constellations. Proceedings of the 31st Annual Meeting of the Cognitive Science Society 1151–1156 (2009).
[26] van den Berg, M. C. J. Grouping by proximity and grouping by good continuation in the perceptual organization
of random dot patterns. Ph.D. thesis, University of Virginia (1998).
[27] Froyen, V., Feldman, J. & Singh, M. Bayesian hierarchical grouping: Perceptual grouping as mixture estimation.
Psychological Review 122, 575 (2015).
[28] Im, H. Y., Zhong, S.-h. & Halberda, J. Grouping by proximity and the visual impression of approximate number
in random dot arrays. Vision research 126, 291–307 (2016).
[29] Zahn, C. T. Graph-theoretical methods for detecting and describing Gestalt clusters. IEEE Transactions on
Computers 20, 68–86 (1971).
[30] Ahuja, N. Dot pattern processing using Voronoi neighborhoods. IEEE Transactions on Pattern Analysis and
Machine Intelligence 336–343 (1982).
[31] Field, D. J., Hayes, A. & Hess, R. F. Contour integration by the human visual system: evidence for a local
“association field”. Vision research 33, 173–193 (1993).
[32] Das, A. & Gilbert, C. D. Topography of contextual modulations mediated by short-range interactions in primary
visual cortex. Nature 399, 655–661 (1999).
[33] Geisler, W. S., Perry, J. S., Super, B. J. & Gallogly, D. P. Edge co-occurrence in natural images predicts contour
grouping performance. Vision Research 41, 711–724 (2001).
[34] Gibbon, W. B. Asiatic parallels in North American star lore: Ursa Major. The Journal of American Folklore 77,
236–250 (1964).
[35] Berezkin, Y. The cosmic hunt: Variants of a Siberian-North American myth. Folklore: Electronic Journal of
Folklore 79–100 (2005).
[36] Baity, E. C. et al. Archaeoastronomy and ethnoastronomy so far [and comments and reply]. Current anthropology
14, 389–449 (1973).
[37] Culver, R. Astronomy. In Selin, H. (ed.) Encyclopaedia of the history of science, technology, and medicine in
non-western cultures, 292–299 (Springer, 2008).
[38] Johnson, D. D. Interpretations of the Pleiades in Australian Aboriginal astronomies. Proceedings of the International Astronomical Union 7, 291–297 (2011).
[39] Leaman, T. M. & Hamacher, D. W. Baiami and the emu chase: an astronomical interpretation of a Wiradjuri
Dreaming associated with the Burbung. Journal of Astronomical History and Heritage 22, 225–237 (2019).
[40] Hoffleit, D. & Jaschek, C. The bright star catalogue (1982).
[41] Bürkner, P.-C. brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software
80, 1–28 (2017).
[42] Hamacher, D. W. On the astronomical knowledge and traditions of Aboriginal Australians. Ph.D. thesis, Macquarie University (2012).
[43] Van Oeffelen, M. P. & Vos, P. G. Enumeration of dots: An eye movement analysis. Memory & Cognition 12,
607–612 (1984).
[44] Compton, B. J. & Logan, G. D. Judgments of perceptual groups: Reliability and sensitivity to stimulus transformation. Perception & Psychophysics 61, 1320–1335 (1999).
[45] Schreiner, J. Redefining constellations and asterisms. Available at http://www.jschreiner.com/english/stars/home.html.
[46] Xu, S., Chen, K. & Zhou, Y. Re-clustering of constellations through machine learning. Tech. Rep., Stanford
University (2014).
[47] Avilin, T. Astronyms in Belarussian folk beliefs. Archaeologia Baltica 10, 1 (2009).
[48] Stanbridge, W. E. On the astronomy and mythology of the Aborigines of Victoria. Proceedings of the Philosophical Institute of Victoria 2, 137–140 (1857).
[49] Kaye, G. R. Hindu astronomy: Ancient science of the Hindus (New Delhi, 1981).
[50] Ammarell, G. Astronomy in the Indo-Malay archipelago. In Selin, H. (ed.) Encyclopaedia of the History of
Science, Technology, and Medicine in Non-Western Cultures, 324–333 (Springer, 2008).
[51] Erdland, P. A. Die Marshall-Insulaner: Leben und Sitte, Sinn und Religion eines Südsee-Volkes (Aschendorffsche, 1914).
Supplementary Information
Contents
1 Cross-cultural data
2 Stellar data
3 Measuring the match between asterisms
4 Common asterisms
5 The GC Model
5.1 Fitting ρ
6 GC model results
7 Model comparisons
7.1 Scoring functions
7.2 GC model
7.3 CODE model
7.4 k-means clustering
7.5 Additional baselines
7.6 Model scores
8 Comparisons across cultures
A Asterism systems for 27 cultures
B Common asterisms
C Asterisms for the GC model with n = 320
1 Cross-cultural data
Appendix A shows asterisms for all 27 cultures in our data set. The majority of the systems were drawn from
the Stellarium software package, and we compiled the remainder using sources given in the figure captions
in Appendix A. Stellarium includes multiple systems for some cultures: for example, there are early and
later versions of the Babylonian sky culture, and three versions of the Chinese sky culture. In cases like
these we removed all but a single representative of each culture. We also removed a number of additional
Stellarium systems for reasons documented in Table S1. Our final data set includes 22 of the 42 Stellarium
systems available as of May 25, 2020.
Before carrying out our analyses we pre-processed each system by removing stars fainter than 4.5 in
magnitude then removing all asterisms that included no stars or just one star after filtering. For example,
the constellation Mensa is removed from the Western system because the brightest star in this constellation
has a magnitude of 5.08. Figure S4 shows the distribution of magnitudes for each system in our data set.
Sky culture                 Reason for exclusion
Almagest                    Lists 48 constellations of the Greeks, which are the source of the Western system
Arabic                      Based on the 48 constellations of the Greeks, which are the source of the Western system
Armintxe                    Identifications not sufficiently grounded in the published literature
Aztec                       Identifications not sufficiently grounded in the published literature
Boorong                     Already included in the data set
Chinese contemporary        Only one Chinese system is included
Chinese medieval            Only one Chinese system is included
Hawaiian starlines          Identifications not sufficiently grounded in the published literature
Indian                      Already included in the data set
Japanese moon stations      Identifications not sufficiently grounded in the published literature
Kamilaroi                   Includes names of single stars only
Korean                      Closely related to the Chinese system
Maya                        Identifications not sufficiently grounded in the published literature
Mongolian                   Identifications not sufficiently grounded in the published literature
Northern Andes              Identifications not sufficiently grounded in the published literature
Sardinian                   Identifications not sufficiently grounded in the published literature
Seleucid                    Only one Babylonian system is included
Western (Sky & Telescope)   Only one Western system is included
Western (Hlad)              Only one Western system is included
Western (Rey)               Only one Western system is included
Table S1: Stellarium systems excluded from our analysis.
Figure S4: Distributions of star magnitudes for all systems in our data set. The vertical line in each plot
shows the threshold value of 4.5, and the order of the systems matches Figure 3. [One panel per system; x-axis: magnitude; y-axis: count.]
Figure S5: Counts of asterisms that are connected and disconnected with respect to the GC model graph.
Each point corresponds to a system in our data, and systems with more than 3 disconnected asterisms have
been labelled. [x-axis: # connected; y-axis: # disconnected; labelled systems: Egyptian, Babylonian, Western, Chinese.]
Filtering at 4.5 removes around 25% of the stars across the entire set of systems, but the proportion of
faint stars varies across systems. Nearly 50% of the stars in the Tukano system have magnitudes greater
than 4.5, but for around half of the systems 10% or fewer of the stars have magnitudes greater than 4.5.
Many of the cultures in our data have systems of asterisms that have not been documented in full, and
the ethnoastronomical accounts that do exist naturally tend to focus on brighter stars. The distributions in
Figure S4 therefore may not reflect the full set of asterisms that would be identified by expert astronomers
from the cultures in question.
The plots in Figures S18 through S44 show all asterisms remaining after the initial filtering step. In each
plot the constellation figures are minimum spanning trees computed over the model graph using angular
distance as the edge weight. In some cases an asterism does not correspond to a connected subset of the
model graph, and in these cases minimal spanning forests are shown instead. For example, the Dakota
system in Figure S23 includes a large asterism called “Ki Inyanka Ocanku” (The Race Track) that groups
stars from Gemini, Canis Minor, Canis Major, Orion, Taurus and Auriga into a large circle. The scale of
this asterism is larger than the scale of the model graph, and as a result Figure S23 shows the asterism as a
collection of 5 disconnected components. Figure S5 shows the number of disconnected asterisms for each
culture in our data. Around 90% of asterisms in the filtered data are connected with respect to the model
graph, and the Egyptian, Babylonian and Western systems stand out as having relatively high proportions of
disconnected asterisms.
2 Stellar data
We used stellar data from version 5.0 of the Yale Bright Star catalog, which includes information about
magnitude and position (right ascension and declination) for 9110 stars. Star positions in our data use J2000
coordinates, and are therefore correct for Jan 1, 2000. Star positions change over time due to precession,
nutation, and proper motion. Precession and nutation do not affect our analyses because they do not affect
the relative positions of stars with respect to each other. Proper motion does affect the shapes of asterisms
over long periods of time; for example, Hamacher [1] describes how the shape of the Southern Cross has
changed over the past 10,000 years. J2000 coordinates are suitable for our purposes because the systems
analyzed in this paper are based on records from the last few thousand years, and because the stars with the
greatest proper motion are at or below the threshold of visibility.
We filtered the data to retain only stars brighter than 6.5 in magnitude, which roughly corresponds to the
faintest magnitude still visible to the naked eye. Some stars (e.g. double stars) are very close to each other,
and if two stars had positions that matched up to 5 decimal places we replaced them with a single star with
magnitude equal to the combined magnitude of the pair. These initial pre-processing steps yielded a set of
8258 stars. For all analyses we filtered the set further and considered only the 918 stars brighter than 4.5 in
magnitude.
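The pre-processing steps above can be sketched as follows. This is a minimal illustration rather than the code used for the paper: the tuple layout `(ra_deg, dec_deg, magnitude)`, the function names, and the use of rounding (rather than truncation) to compare positions to 5 decimal places are all assumptions. The combined magnitude of a merged pair follows from the standard fact that fluxes add while magnitudes do not.

```python
import math
from collections import defaultdict


def combined_magnitude(mags):
    """Magnitude of the summed flux of several stars (fluxes add, magnitudes do not)."""
    total_flux = sum(10 ** (-0.4 * m) for m in mags)
    return -2.5 * math.log10(total_flux)


def preprocess(stars, naked_eye_limit=6.5, decimals=5):
    """Keep stars brighter than the limit, then merge stars at matching positions.

    stars: iterable of (ra_deg, dec_deg, magnitude) tuples (hypothetical layout).
    Positions are compared after rounding to `decimals` places (an assumption).
    """
    groups = defaultdict(list)
    for ra, dec, mag in stars:
        if mag < naked_eye_limit:  # smaller magnitude means brighter
            groups[(round(ra, decimals), round(dec, decimals))].append(mag)
    return [(ra, dec, combined_magnitude(mags)) for (ra, dec), mags in groups.items()]


def brighter_than(stars, limit=4.5):
    """Final filter applied before all analyses."""
    return [s for s in stars if s[2] < limit]
```

For example, two coincident stars of magnitude 1.0 merge into a single star of magnitude 1.0 − 2.5 log10(2), roughly 0.25.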
3 Measuring the match between asterisms
The function for computing the match(a, r) between asterism a and reference asterism r appears as Equation
1 in Materials and Methods.
There are two ways in which a can differ from r: it can include extraneous stars, and it can fail to include
some of the stars in r. The match function penalizes the first of these failings more heavily than the second.
This property is especially useful when comparing an asterism against a reference that includes a relatively
large number of stars. For example, the teapot asterism includes 8 of the brightest stars in Sagittarius,
and the version of Sagittarius in our Western system includes 17 stars after thresholding at magnitude 4.5.
Intuitively, the teapot matches Sagittarius fairly well, and the function in Equation 1 assigns a match of 0.47
between the teapot (a) and Sagittarius (r). If we used an alternative match function where the numerator
included penalties for both |a \ r| and |r \ a|, then the match between the teapot and Sagittarius would be 0.
The match between an asterism a and an entire system of asterisms S is defined in Equation 2 of Materials and Methods.
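Equation 1 itself is not reproduced in this supplement, so the sketch below uses an assumed form that is merely consistent with the two numerical examples above: shared stars minus extraneous stars, divided by the size of the reference and floored at zero. The function names and the set-of-identifiers representation are illustrative choices, and the form should be checked against Materials and Methods before being relied on.

```python
def match(a, r):
    """Graded match between asterism a and reference r (sets of star identifiers).

    ASSUMED form, not copied from Equation 1: shared stars minus extraneous
    stars, normalised by the size of r and floored at zero.
    """
    a, r = set(a), set(r)
    return max(0.0, (len(a & r) - len(a - r)) / len(r))


def match_with_both_penalties(a, r):
    """Alternative whose numerator also subtracts the stars of r missing from a."""
    a, r = set(a), set(r)
    return max(0.0, (len(a & r) - len(a - r) - len(r - a)) / len(r))


# Teapot example: 8 stars, all of them among the 17 stars of Sagittarius.
teapot = set(range(8))
sagittarius = set(range(17))
print(round(match(teapot, sagittarius), 2))            # 8 / 17
print(match_with_both_penalties(teapot, sagittarius))  # floored at zero
```

Under this assumed form the teapot scores 8/17 against Sagittarius, and the alternative with both penalties falls to zero because the 9 missing stars outweigh the 8 shared ones.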
4 Common asterisms
The most common asterisms across our data set are listed in Table S2 in Appendix B, which extends Table
1 by including additional asterisms.
5 The GC Model
The Graph Clustering (GC) model begins by building a graph over the 918 stars that remained after preprocessing. We construct three Delaunay triangulations over stars with magnitudes less than 3.5, 4.0 and 4.5
respectively, and the final graph G (shown in Figure S6) is the union of all three.
An edge in G that joins stars x and y is labelled with two attributes: mxy , the apparent magnitude of the
fainter of the two stars, and dxy , the angular distance between the stars. The two attributes are on different
scales: m lies between -1.46 and 4.5, and d lies between 0.1 and 41.6 degrees. In both cases higher values
are “worse:” distant stars are relatively unlikely to be grouped, and faint stars are relatively unlikely to be
included in groupings. We convert each magnitude m to a brightness b, and each distance d to a proximity
p:
\[
b_{xy} = \exp(-m_{xy}), \qquad p_{xy} = \exp(-d_{xy}) \tag{S1}
\]
Figure S6: Graph over stars used by the GC model.
The negative exponential transformation means that higher values are now “better” (see footnote 1). We then weight brightness bxy and proximity pxy based on a parameter ρ:
\[
b \leftarrow b^{\frac{\rho}{\rho+1}}, \qquad p \leftarrow p^{\frac{1}{\rho+1}} \tag{S2}
\]
where we have dropped the subscripts of both bxy and pxy . When ρ = 1 proximity and brightness are
weighted equally, and when ρ > 1 brightness is weighted more than proximity. When ρ = 0 brightness is
effectively discarded, and when ρ = ∞ proximity is effectively discarded.
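As a small illustration of Equations S1 and S2, the sketch below converts the two edge attributes and applies the ρ weighting. The function name is hypothetical, and ρ = ∞ is handled as an explicit special case so that the limiting behaviour described above is reproduced exactly.

```python
import math


def weighted_attributes(m_xy, d_xy, rho):
    """Return (b, p) for one edge after Equations S1 and S2.

    m_xy: apparent magnitude of the fainter star on the edge.
    d_xy: angular distance between the two stars, in degrees.
    rho:  relative weight of brightness; math.inf discards proximity.
    """
    b = math.exp(-m_xy)  # Equation S1: higher is now "better"
    p = math.exp(-d_xy)
    if math.isinf(rho):
        return b, 1.0    # limit of the exponents: rho/(rho+1) -> 1, 1/(rho+1) -> 0
    return b ** (rho / (rho + 1)), p ** (1 / (rho + 1))


# With rho = 3 (the value adopted later), brightness gets exponent 0.75 and
# proximity gets exponent 0.25.
print(weighted_attributes(2.0, 4.0, 3))
```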
The next step is to scale the brightness and proximity values within a local neighborhood of 60°. For
each edge (s1 , s2 ) joining stars s1 and s2 , the local neighborhood L is the subgraph of the full model graph
G that includes all stars that lie within 60° of either s1 or s2 . The p value of the edge (s1 , s2 ) is then scaled
by the factor
\[
\frac{p_G}{p_L} = \frac{\operatorname*{median}_{e_g \in G} \{ e_g(p) \}}{\operatorname*{median}_{e_l \in L} \{ e_l(p) \}} \tag{S3}
\]
where eg is an edge in the full model graph G, el is an edge that lies within the local neighborhood L, and
ei (p) is the p value of edge ei . Scaling p in this way means that the distribution of p values within any
local neighborhood becomes comparable to the distribution over the entire graph. For example, consider a
neighborhood that includes many close stars. Before scaling, most edges in the neighborhood will have high
values of p. After scaling, only pairs of stars that are especially close relative to the neighborhood will have
high values of p. The same approach in Equation S3 is used to scale the brightness values b. When scaling
both attributes a neighborhood size of 60° was chosen so that the neighborhood corresponds roughly to the
extent of mid-peripheral vision.
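The local scaling step can be sketched as below. This is an illustrative reading of Equation S3, not the authors' implementation: the dictionary-based data layout and function names are assumptions, and an edge is taken to lie within the neighborhood L when both of its endpoints lie within 60° of either endpoint of the edge being scaled.

```python
import math
from statistics import median


def angular_distance(s1, s2):
    """Great-circle distance in degrees between two (ra_deg, dec_deg) positions."""
    ra1, dec1, ra2, dec2 = map(math.radians, (*s1, *s2))
    cos_d = (math.sin(dec1) * math.sin(dec2)
             + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))


def locally_scaled(values, positions, radius=60.0):
    """Scale one edge attribute (p or b) by median over G / median over L.

    values:    dict {(i, j): attribute value} over all edges of the graph G.
    positions: dict {star id: (ra_deg, dec_deg)}.
    """
    global_median = median(values.values())
    scaled = {}
    for (i, j), v in values.items():
        # Stars within `radius` degrees of either endpoint form the neighbourhood.
        near = {s for s, pos in positions.items()
                if min(angular_distance(pos, positions[i]),
                       angular_distance(pos, positions[j])) <= radius}
        # Edges of the induced subgraph; always includes the edge (i, j) itself.
        local = [w for (x, y), w in values.items() if x in near and y in near]
        scaled[(i, j)] = v * global_median / median(local)
    return scaled
```

The loop is quadratic in the size of the graph, which is acceptable for a graph over 918 stars but is not tuned for speed.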
After scaling, the proximity and brightness values for each edge are combined multiplicatively to produce a single strength s = bp for each edge. We then threshold the graph by removing all but the top n edges
in the graph, and the clusters returned by the model correspond to connected components of the thresholded
graph.
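The final two steps (combining the attributes and thresholding) can be sketched with a union-find pass over the strongest edges. The function name and dictionary layout are assumptions; stars that touch none of the retained edges are simply absent from the result, which corresponds to treating them as unclustered.

```python
def gc_clusters(b, p, n):
    """Return the connected components of the graph formed by the top-n edges.

    b, p: dicts {(i, j): scaled brightness / scaled proximity} over the same edges.
    n:    number of strongest edges to retain.
    """
    strength = {e: b[e] * p[e] for e in b}  # s = b * p
    kept = sorted(strength, key=strength.get, reverse=True)[:n]

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in kept:
        parent[find(i)] = find(j)

    groups = {}
    for star in list(parent):
        groups.setdefault(find(star), set()).add(star)
    return list(groups.values())
```

Sweeping n from small to large values reproduces the behaviour described for the model: clusters appear, grow, and eventually merge as weaker edges are admitted.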
To assess the contribution made by different components of the GC model we will compare the model
to three variants. First is a model that omits the local scaling step. This GC (no scaling) model can also be
viewed as a variant in which neighborhood L in Equation S3 expands to encompass the entire graph G. The
second and third variants set ρ = 0 and ρ = ∞ respectively, and we refer to them as the GC (no brightness)
and GC (no proximity) models. These labels indicate whether or not brightness and proximity contribute
to the final edge strengths s, but in both cases brightness and proximity are still used when constructing
the original graph G: each Delaunay triangulation uses proximity, and combining the three triangulations
means that only stars brighter than 4.5 in magnitude are included.
Footnote 1: Brightness could be defined as flux (i.e. $b_{xy} = 10^{-m_{xy}/2.5}$) but the natural exponential formulation is simpler and equally good for our purposes. Changing from base e to base 10 leaves the model unchanged if the parameter ρ is adjusted accordingly.

5.1 Fitting ρ
The GC model has two parameters: ρ, which determines the relative contributions of proximity and brightness, and n, which determines the number of edges in the thresholded graph. We fit ρ based on the idea that
stars connected by strong edges in the model graph should be frequently grouped across cultures. The first
step is to assemble a set of human edges by computing minimum spanning trees (MSTs) for each asterism
in our data set. All MSTs were computed over the model graph G using raw angular distance as the edge
weight. The human edges included all edges in these MSTs, and the human strength of each edge was
defined as the number of times it appeared in the set. For example, the edge joining the Southern Pointers
appears in 9 of the MSTs and therefore has a human strength of 9.
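The construction of human edge strengths can be sketched with Kruskal's algorithm restricted to the stars of each asterism. The data layout and function names are assumptions, and restricting each MST to the subgraph of G induced by the asterism's stars is one reading of "computed over the model graph G"; when that subgraph is disconnected the result is a spanning forest, as noted in Section 1.

```python
from collections import Counter


def spanning_forest(stars, graph_edges):
    """Minimum spanning forest of the subgraph of G induced by `stars`.

    graph_edges: dict {(i, j): angular distance in degrees} for the edges of G.
    Returns the list of chosen edges, keyed as they appear in graph_edges.
    """
    stars = set(stars)
    parent = {s: s for s in stars}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    candidates = [(d, e) for e, d in graph_edges.items()
                  if e[0] in stars and e[1] in stars]
    for d, (i, j) in sorted(candidates):  # shortest edges first
        ri, rj = find(i), find(j)
        if ri != rj:                      # edge joins two separate components
            parent[ri] = rj
            chosen.append((i, j))
    return chosen


def human_strengths(asterisms, graph_edges):
    """Count how many asterism MSTs contain each edge of G."""
    counts = Counter()
    for stars in asterisms:
        counts.update(spanning_forest(stars, graph_edges))
    return counts
```

Because counts accumulate over every asterism in every system, an edge that appears in more than one asterism of the same system contributes more than once.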
We then assembled a set of model edges that included all of the strongest edges according to the model.
The model edges include all edges in the MST of G, where the MST is computed using model strengths
s = bp rather than angular distance. Parameter ρ can then be set to the value that maximizes the correlation
between the model edge strengths and the human edge strengths. The best values of ρ for the GC and GC
(no scaling) models were 3.5 and 3.2, which yield correlations of 0.72 and 0.66 respectively. Both models
achieve almost identical correlations for ρ = 3, and for simplicity we set ρ = 3 for all subsequent analyses.
Because ρ > 1, this setting means that the edge strengths in the model are influenced more by brightness
than by proximity.
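Selecting ρ by correlation can be sketched as a simple search over candidate values. The callback `model_strengths_for` is hypothetical: it stands for whatever procedure rebuilds the model edge strengths for a given ρ. Treating edges that never appear in a human MST as having human strength 0 is also an assumption, since the text does not spell out which edges enter the correlation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_rho(candidates, model_strengths_for, human):
    """Return (best rho, correlation) over the candidate values.

    model_strengths_for: hypothetical callback, rho -> {edge: model strength}.
    human:               {edge: human strength}; missing edges count as 0.
    """
    best = None
    for rho in candidates:
        model = model_strengths_for(rho)
        edges = sorted(model)
        r = pearson([model[e] for e in edges], [human.get(e, 0) for e in edges])
        if best is None or r > best[1]:
            best = (rho, r)
    return best
```

A coarse grid such as 0.1, 0.2, ..., 10 would be enough to locate optima near the reported values of 3.5 and 3.2, after which a round value such as ρ = 3 can be compared directly.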
Figure S7a compares human strengths with strengths according to the GC model. The two edges with
greatest human strengths join the three stars in Orion’s belt (δ, ε and ζ Ori), and these edges have human
strengths of 33 because some of the 27 systems in our data include Orion’s belt in more than one asterism.
The same two edges are the strongest and third-strongest edges according to the model, and the second
strongest model edge joins the Southern Pointers (α and β Cen). This second edge appears as an outlier in
Figure S7a, and one possible reason is that these stars lie relatively far south and our data set is tilted towards
cultures from the Northern Hemisphere.
Corresponding plots for the three model variants are shown in Figure S7. All three perform worse than
the full GC model, suggesting that local scaling helps to account for human groupings and confirming that
both brightness and proximity are important.
6 GC model results
We began by testing the prediction that the most common asterisms in Table S2 should be relatively
well captured by the model. To avoid having to choose a single value of the threshold parameter n, we
created a set Smodel that includes model systems for all values of n between 1 and 2000. The video at
www.charleskemp.com/papers/constellations.mp4 includes a frame for each system and
shows how model asterisms emerge as n is increased. We then computed model scores for each human
asterism a using the function modelscore(a, Smodel ) defined in Equation 4 of Materials and Methods.
The model scores in Table S2 indicate that most of the common asterisms are captured fairly well by
the model. The most notable exception is the Great Square of Pegasus. Model scores for each culture in our
data set are plotted in Figure 3. Although Table S2 suggests that common asterisms are often captured by
the model, Figure 3 shows that there are many less common asterisms that the model does not explain.
The model evaluations thus far do not depend on a specific setting of the threshold parameter n. The
model system in Figure 1B, however, is based on setting n = 320, and Table S3 in Appendix C lists all 124
asterisms in this system. The column labeled Score indicates how well these asterisms match our data set,
and this column is defined using
\[
\text{human score}(a, \mathcal{S}_{\text{human}}) = \max_{S \in \mathcal{S}_{\text{human}}} \text{match}(a, S), \tag{S4}
\]
where Shuman is the set of all 27 systems in our data set. 27 of the model asterisms are identical to asterisms
from one or more cultures, and 105 of the model asterisms have scores of 0.2 or greater, indicating that they
correspond at least partially to asterisms from at least one culture. Note that the scoring function is relatively
strict, especially for larger asterisms with many variants that can be created by including or excluding fainter
stars. For example, Figure 1B suggests that the model captures Orion relatively well, but the model version
of Orion achieves a score of only 0.36.

Figure S7: Strengths of star pairs according to the cross-cultural data and four models: (A) the GC model,
(B) the GC model without local scaling, (C) the GC model with edge strengths based on brightness only,
and (D) the GC model with edge strengths based on proximity only. In each panel each point corresponds
to a pair of stars joined by an edge in the model graph, and selected edges are labelled in gray. The y-axis of
each panel shows counts across the entire data set, and the x-axis shows strengths according to the model.
[Panel correlations: (A) r = 0.72; (B) r = 0.66; (C) r = 0.35; (D) r = 0.23.]
7 Model comparisons
Our model belongs to a family of graph-based clustering algorithms that rely on a graph defined over the
items to be clustered [2, 3, 4]. The main alternative in the literature on perceptual grouping is the CODE
model [5, 6, 7] which uses a continuous spatial representation of the items to be clustered. A third possible
approach is k-means clustering, which has been previously applied to the problem of grouping stars into asterisms [8, 9]. We compared all three approaches using a set of three different scoring functions. Consistent
with our previous analyses, the input to each model includes all stars brighter than 4.5 in magnitude.
7.1 Scoring functions
Suppose that H (for human) is a set of human clusters and M is a set of model clusters. A good set M
should have high precision: each cluster in M should be similar to a cluster in H. A good set M should
also have high recall: for each cluster h in H there should be some cluster in M that is similar to h. We
formalize precision and recall as follows:
\[
\text{precision}(M, H) = \frac{1}{|M|} \sum_{m \in M} \text{match}(m, H) \tag{S5}
\]
\[
\text{recall}(M, H) = \frac{1}{|H|} \sum_{h \in H} \text{match}(h, M) \tag{S6}
\]
where the match(·, ·) function is defined in Equation 2 of Materials and Methods.
Precision and recall are typically combined using an F measure:
\[
F_\beta = (1 + \beta^2) \, \frac{\text{precision} \cdot \text{recall}}{(\beta^2 \cdot \text{precision}) + \text{recall}} \tag{S7}
\]
The standard F measure sets β = 1, but we also consider an F10 measure that sets β = 10 and weights
recall more heavily than precision. The F10 measure captures the idea that a model system M that includes
just one or two clusters (i.e. recall is low) should not score highly regardless of how well the model clusters
match attested clusters.
Our measures of precision and recall and their combination using an Fβ score are directly inspired by the
literature on information retrieval. If match(a, S) returned 1 if a belonged to S and 0 otherwise, then our
formulations of precision and recall in Equations S5 and S6 would be equivalent to the standard definitions.
Our match(·, ·) function, however, is graded, which means that our formulations of precision and recall are
extensions of the standard definitions.
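Equations S5 to S7 translate directly into code once a match function is supplied. In the sketch below the match function is passed in as a parameter, and the demonstration uses the binary membership test mentioned above (which recovers the standard definitions) rather than the graded function from Materials and Methods. All names are illustrative.

```python
def precision(M, H, match):
    """Equation S5: average match of each model cluster against the human system."""
    return sum(match(m, H) for m in M) / len(M)


def recall(M, H, match):
    """Equation S6: average match of each human cluster against the model system."""
    return sum(match(h, M) for h in H) / len(H)


def f_beta(prec, rec, beta=1.0):
    """Equation S7; returns 0 when precision and recall are both 0."""
    denom = beta ** 2 * prec + rec
    return 0.0 if denom == 0 else (1 + beta ** 2) * prec * rec / denom


def binary_match(a, S):
    """Standard information-retrieval case: 1 if a is in S, else 0."""
    return 1.0 if a in S else 0.0


# A model system with a single, perfectly matching cluster.
H = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6}), frozenset({7, 8})]
M = [frozenset({1, 2})]
prec, rec = precision(M, H, binary_match), recall(M, H, binary_match)
print(prec, rec, f_beta(prec, rec, 1), f_beta(prec, rec, 10))
```

In this toy case precision is 1 and recall is 0.25, so F1 is 0.4 while F10 stays close to the recall value, which illustrates why F10 does not reward a system containing only one or two clusters.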
Our third scoring function is the adjusted Rand index, which is a standard measure of the similarity
between two partitions. Many of the cluster systems that we consider pick out a relatively small number of
clusters against a background of unclustered stars. In order to apply the adjusted Rand index we assign all
unclustered stars to an “everything else” category.
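The adjusted Rand index with an “everything else” category can be sketched as follows, using the standard pair-counting formula. The helper `to_labels` assumes that the clusters within a system are disjoint; how overlapping asterisms are handled is not specified here, and in this sketch a star that appears in several clusters simply takes the label of the last one.

```python
from collections import Counter
from math import comb


def to_labels(clusters, all_stars, rest="everything else"):
    """One label per star; stars in no cluster share the `rest` label."""
    label = {}
    for k, cluster in enumerate(clusters):
        for star in cluster:
            label[star] = k
    return [label.get(star, rest) for star in all_stars]


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_a)
    pairs = Counter(zip(labels_a, labels_b))
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

Because every unclustered star shares one label, two systems that both leave most of the sky unclustered agree on a large number of pairs, which is one way to see why this category is awkward.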
Of the three scoring functions, the F10 measure deserves the most attention. The standard F measure (i.e.
F1 ) has the shortcoming of assigning high scores to model solutions with a very small number of clusters.
The adjusted Rand index is undesirable because of the need to include an “everything else” category. We
report results for both measures because they are standard in the literature, but will focus primarily on the
F10 measure.
Each of the measures so far scores a model system M relative to a single human system H. We evaluate
a model solution relative to the full set H of systems for 27 cultures by computing the average score across
this set.
7.2 GC model
The model includes two parameters: ρ, which controls the relative weights of brightness and proximity,
and the threshold parameter n. Previously ρ was set to 3 based on the correlation analysis summarized by
Figure S7a, and we retain that value here. The threshold is set to the value (n = 165) that maximizes model
performance according to the F10 measure. In addition to the GC model we consider the three variants of
the model previously evaluated in Figure S7, and the n parameter is optimized separately for each one using
the F10 measure. Asterisms returned by all four models are shown in Figures S8 through S11.

Figure S8: Asterisms returned by the GC model (n = 165).
Figure S9: Asterisms returned by the GC model (no scaling, n = 122).
Figure S10: Asterisms returned by the GC model (no brightness, n = 321).
Figure S11: Asterisms returned by the GC model (no proximity, n = 123).
Figure S12: Asterisms returned by the CODE model with rescaled Gaussian kernels, local distances and the
sum combination function (t = 0.86, β = 0.86, h = 0.5).
7.3 CODE model
The CODE model can be implemented by dropping a kernel function (e.g. a Gaussian) on each item, combining all of these kernel functions to form an “activation surface,” then cutting the activation surface at
some threshold t to produce clusters. Previous applications of the CODE model consider the problem of
clustering a field of perceptually identical items, but for us the items are stars with different magnitudes. To
capture the idea that brighter stars are more likely to be included in asterisms, we adapted the CODE model
to allow taller kernel functions for brighter stars. If two stars are extremely close to each other, the sum of
the kernels on the two should be identical to a single kernel for a star with apparent magnitude equivalent
to the two stars combined. To satisfy this condition we set kernel heights based on the flux (i.e. apparent
brightness) of a star. For a star with apparent magnitude m, the flux F of the star in the visual band is
\[
F = F_0 \times 10^{-m/2.5} \tag{S8}
\]
where F0 is a normalizing constant. For our purposes we can drop the constant because scaling all kernels by
a constant is equivalent to adjusting the threshold used by the CODE model. We also introduce an additional
parameter β so that the height of the kernel on a star of magnitude m is
\[
\left( 10^{-m/2.5} \right)^{\beta} \tag{S9}
\]
When β = 1 kernel height is proportional to flux, and when β = 0 all stars have the same kernel height
regardless of flux.
Our formulation of the CODE model has two additional numeric parameters: the threshold t, and the
nearest-neighbour coefficient h. The standard deviation of the kernel for star i is
\[
\sigma_i = h d_i \tag{S10}
\]
where di is the distance between the star and its nearest neighbour. In addition to these numeric parameters
Compton and Logan (1993) consider several qualitative parameters of the model:
• Gaussian vs Laplacian: kernel functions may be Gaussian or Laplacian
• sum vs max: kernels may be combined using a sum or a max function
• local vs global: if global, all distances di in Equation S10 are replaced by the global mean of di
• standard vs rescaled: if rescaled, all kernel functions are rescaled to have the same height
In our implementation the rescaling step specified by the fourth factor is carried out before adjusting the
kernels for brightness as specified by Equation S9. As a result, the kernels are equal in height at the end of
the process only when rescaling is applied and β = 0.
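A minimal sketch of the brightness-adjusted activation surface is given below for the rescaled Gaussian case with local distances. It only evaluates the surface at a point; extracting clusters additionally requires finding the connected regions where the surface exceeds the threshold t, which is not shown. The function names, the tuple layout and the caller-supplied distance function are assumptions.

```python
import math


def kernel_height(m, beta):
    """Equation S9: kernel height for a star of apparent magnitude m."""
    return (10 ** (-m / 2.5)) ** beta


def code_activation(point, stars, h, beta, dist, combine=sum):
    """Activation at `point` from rescaled Gaussian kernels with local widths.

    stars:   list of (position, magnitude, nearest_neighbour_distance).
    h:       nearest-neighbour coefficient, so sigma_i = h * d_i (Equation S10).
    dist:    distance function taking two positions (e.g. angular distance).
    combine: sum or max, applied to the list of kernel contributions.
    """
    contributions = []
    for pos, m, d_nn in stars:
        sigma = h * d_nn
        d = dist(point, pos)
        # Rescaled kernel peaks at 1 before the brightness adjustment is applied.
        gaussian = math.exp(-d * d / (2 * sigma * sigma))
        contributions.append(kernel_height(m, beta) * gaussian)
    return combine(contributions)
```

With β = 1 the heights are proportional to flux, so two coincident stars contribute the same total height as a single star of their combined magnitude, which is the condition that motivated using flux.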
We evaluated all 16 combinations of the four factors, and optimized the three numeric parameters (β, t
and h) separately for each combination using the F10 measure. The optimization began with a grid search
then used Powell’s conjugate direction method initialized using the best values found in the grid search. The
best performing version used rescaled Gaussian kernels, local distances and the sum combination function,
and the best parameters for this model were t = 0.86, β = 0.86, and h = 0.5. The clusters returned by this
model are shown in Figure S12.
7.4 k-means clustering
k-means clustering begins by randomly choosing a set of k cluster centers. The algorithm then repeatedly
assigns items to the nearest cluster and recomputes the cluster centers based on these assignments until
convergence. We used the spherecluster package in Python to implement k-means clustering with distances
computed over the celestial sphere.
We ran k-means clustering for all k between 1 and 300. For comparison, the largest system of asterisms
in our data (Chinese) includes 318 asterisms, and 161 remain when we threshold the system at a stellar
magnitude of 4.5. For each value of k we ran the algorithm using 100 different initial cluster assignments
chosen using the package default (the k-means++ algorithm). Random initialization means that there is some
noise in the results, but model performance according to the F10 measure tended to increase monotonically
with k. The best-scoring system in our simulations, however, had k = 249 and is shown in Figure S13.
As Figure S13 shows, k-means tends to partition the stars into clusters that are roughly equal in size. In
contrast, the GC and CODE models both pick out a relatively small number of clusters against a background
of “unclustered” stars. To parallel this behavior we consider a variant of k-means with a magnitude threshold
m. This k-means threshold model runs regular k-means on all stars brighter than m, and all remaining stars
are treated as unclustered. Based on the F10 measure the thresholded model achieves best performance at
m = 2.9 and k = 100, and a model result for these parameter values is shown in Figure S14.
Other than the magnitude threshold, the k-means threshold model does not take brightness into account,
and the same applies to the basic k-means model. The distance measure used by k-means could potentially
be adjusted to take brightness into account, but our focus here is on k-means clustering as it is typically
applied.
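For readers who want to see the mechanics, the sketch below implements a bare-bones spherical k-means (cosine similarity on unit vectors) together with the magnitude-threshold variant. It is not the spherecluster implementation and does not reproduce that package's API: initial centers are supplied explicitly instead of being chosen by k-means++, and all function names are hypothetical.

```python
import math


def to_unit(ra_deg, dec_deg):
    """Unit vector on the celestial sphere for a position given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def spherical_kmeans(points, centers, iterations=100):
    """Assign unit vectors to the most similar center, then renormalise the means."""
    centers = list(centers)
    labels = None
    for _ in range(iterations):
        new_labels = [max(range(len(centers)),
                          key=lambda c: sum(a * b for a, b in zip(pt, centers[c])))
                      for pt in points]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                mean = [sum(axis) for axis in zip(*members)]
                norm = math.sqrt(sum(v * v for v in mean))
                if norm > 0:
                    centers[c] = tuple(v / norm for v in mean)
    return labels, centers


def kmeans_threshold(stars, m_limit, centers):
    """Cluster only stars brighter than m_limit; the rest get the label None.

    stars: list of (ra_deg, dec_deg, magnitude).
    """
    bright = [i for i, s in enumerate(stars) if s[2] < m_limit]
    labels, _ = spherical_kmeans([to_unit(*stars[i][:2]) for i in bright], centers)
    out = [None] * len(stars)
    for i, lab in zip(bright, labels):
        out[i] = lab
    return out
```

Apart from the threshold, nothing in this sketch depends on magnitude, which mirrors the point made above about the k-means models.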
Figure S13: Asterisms returned by k-means clustering (k = 249).
Figure S14: Asterisms returned by k-means clustering (k = 100) with a magnitude threshold of 2.9.
Figure S15: Scores of nine models according to three measures: the F10 measure, the F measure, and the
adjusted Rand index. [Panels: F10 score, F score, adjusted Rand index; y-axis: score; models: GC, GC (no scaling), GC (no brightness), GC (no proximity), k means, CODE, k means (thresh), one cluster, singleton.]

7.5 Additional baselines
Two additional baselines were included in the model comparison, both of which rely on a magnitude parameter m. The “one cluster” model assigns all stars brighter than m to a single cluster, and the “singleton”
model assigns each of these stars to its own cluster. When m = 4.5 the singleton model is the limit of k-means when k approaches the total number of stars. For each baseline and each scoring metric we identified
the best-performing value of m using an exhaustive search over m ∈ {3, 3.1, . . . , 4.5}.
7.6 Model scores
Scores for all models according to the three scoring measures are shown in Figure S15. The GC model
performs best regardless of which scoring measure is used, but we focus here on results for the F10 measure.
The second best model is k-means with a magnitude threshold. Figure S14 shows that this model picks
out asterisms including the Southern Cross and Orion’s belt but the best magnitude threshold for the model
(m = 2.9) means that it misses the Big Dipper, which includes a star of magnitude 3.3. Figure S13 shows
that k-means without the magnitude threshold produces a large number of compact groupings that cover the
sky in a way that is qualitatively unlike any of the systems in our data.
The CODE model performs worse than the GC model, and the asterisms in Figure S12 reveal at least
two qualitative limitations of the model. First, the model misses asterisms (e.g. the Big Dipper) that include
stars separated by relatively large distances. Second, in relatively dense regions (e.g. the area of the Milky
Way surrounding the Southern Cross) the model tends to form groups containing relatively large numbers
of fainter stars. If the CODE activation surface lies above the threshold in a given region, then all stars in
the region are included, regardless of how faint they are. In contrast, human asterisms sometimes pick out a
handful of bright stars without including fainter stars that lie nearby. For example, Betelgeuse and Bellatrix
(the shoulders of Orion) are often grouped in our data without including a fainter star (32 Orionis) that lies
between them.

Figure S16: Distributions of “other culture” scores for the asterisms in each culture. Scores of 1 indicate
asterisms that are identical to asterisms in some other culture. The cultures are ordered based on the means
of the distributions. [One panel per culture; x-axis: asterism score; y-axis: count.]
8 Comparisons across cultures
In addition to comparing each system to the predictions of the GC model (Figure 3), we compared each
system to other systems in the data set. For each system S let $\mathcal{S}_{-S}$ be the set that includes all systems
except for S. For each asterism a in S we used Equation 4 (Materials and Methods) to compute the extent to
which a resembled an asterism in some system belonging to $\mathcal{S}_{-S}$. An “other culture” score of zero indicates
that asterism a is dissimilar from all asterisms in all other cultures, and a score of 1 indicates that a is
identical to an asterism from at least one other culture. Distributions of scores for each culture are shown
in Figure S16. As mentioned in the main text, the system that differs most from all others is the Chinese
system, but this result should be interpreted in light of genealogical relationships between cultures. There are
strong historical relationships between some systems in our data set—for example, Western constellations
are based in part on Babylonian tradition. One reason why the Chinese system stands out as different from
the others is that the genealogical relationships between Chinese culture and most other cultures in our data
are rather distant.
Figure S17 explores whether systems that tend to resemble systems from other cultures also tend to
match the predictions of the GC model. The x and y coordinates of each point in the figure correspond to
28
Norse ●
1.0
Boorong
●
Belarusian Lenekel
●
Maori
●
Tongan
Indian
●
●
●
Siberian Pacariqtambo● ●
●
other culture score
Indo−Malay
Lokono
0.8
●
Inuit
●
Sami
●
Romanian
●
Macedonian
●
● ●
Arabic
●
Marshall
Ojibwe
Anutan
●
0.6
Western
●
●
●
Dakota Babylonian
●
●
Egyptian
Tupi
●
Navajo
Tukano
●
0.4
Chinese
●
0.4
0.5
0.6
0.7
0.8
0.9
model score
Figure S17: “Other culture” scores compared to model scores. Systems above the line match other cultures
better than they match the predictions of the model.
means of distributions plotted in Figures 3 and S16. The results are again influenced by genealogical relationships between cultures. In particular, the Western tradition is overrepresented in our data set, meaning
that systems from this tradition have higher “other culture” scores than would otherwise be expected. A
related bias arises because characterizations of other systems are often influenced by the Western system.
For example, the Belarusian system achieves a very high “other culture” score partly because some of the
asterisms in this system are assumed to be identical to Western constellations such as Draco and Gemini [10].
Despite these limitations, Figure S17 suggests that model scores and “other culture” scores are highly
correlated, which is expected given that the asterisms identified by the model tend to be shared across cultures. Most of the points fall above the line, indicating that systems tend to match systems from other
cultures better than they match predictions of the model. This result is undoubtedly influenced by genealogical relationships between cultures, but may also indicate that there is scope to improve the model to better
capture common patterns that recur across cultures.
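The two summaries discussed above, the correlation between the two scores and the share of systems lying above the line, can be sketched as follows. The sketch assumes that the line in Figure S17 is the diagonal on which the two scores are equal, and the per-culture values are hypothetical placeholders, not the values plotted in the figure.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def fraction_above_diagonal(points: Dict[str, Tuple[float, float]]) -> float:
    """Share of cultures whose 'other culture' score exceeds their model score."""
    above = sum(1 for model, other in points.values() if other > model)
    return above / len(points)


# Hypothetical (model score, other culture score) pairs, one per culture.
toy_points = {
    "culture_A": (0.5, 0.7),
    "culture_B": (0.6, 0.8),
    "culture_C": (0.8, 0.9),
    "culture_D": (0.7, 0.6),
}

model_scores = [m for m, _ in toy_points.values()]
other_scores = [o for _, o in toy_points.values()]

r = pearson(model_scores, other_scores)
share_above = fraction_above_diagonal(toy_points)
```

Because genealogically related systems are not independent observations, a correlation computed this way is descriptive only.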
A Asterism systems for 27 cultures
Figure S18: Anutan (Stellarium).
Figure S19: Arabic moon stations (Stellarium).
Figure S20: Belarusian (Stellarium).
Figure S21: Boorong [11, 1].
Figure S22: Chinese (Stellarium).
Figure S23: Dakota (Stellarium).
Figure S24: Egyptian (Stellarium).
Figure S25: Indian [12].
Figure S26: Indo-Malay [13].
Figure S27: Inuit (Stellarium).
Figure S28: Lokono (Stellarium).
Figure S29: Macedonian (Stellarium).
Figure S30: Maori (Stellarium).
Figure S31: Babylonian (MUL.APIN sky culture in Stellarium).
Figure S32: Marshall Islands [14].
Figure S33: Navajo (Stellarium).
Figure S34: Norse (Stellarium).
Figure S35: Ojibwe (Stellarium).
Figure S36: Pacariqtambo [15].
Figure S37: Romanian (Stellarium).
Figure S38: Sami (Stellarium).
Figure S39: Siberian (Stellarium).
Figure S40: Tongan (Stellarium).
Figure S41: Tukano (Stellarium).
Figure S42: Tupi (Stellarium).
Figure S43: Lenakel (Vanuatu) (Netwar sky culture in Stellarium).
Figure S44: Western (Stellarium).
B Common asterisms
No. | Human (raw) | Human (adj) | Model Score | Stars | Description
1 | 0.63 | 0.55 | 1.0 | 34DelOri, 46EpsOri, 50ZetOri | Orion’s Belt
2 | 0.62 | 0.57 | 1.0 | 25EtaTau, 17Tau, 19Tau, 20Tau, 23Tau, 27Tau | Pleiades
3 | 0.59 | 0.51 | 1.0 | 87AlpTau, 54GamTau, 61Del1Tau, 74EpsTau, 78The2Tau | Hyades
4 | 0.57 | 0.46 | 0.88 | 50AlpUMa, 48BetUMa, 64GamUMa, 69DelUMa, 77EpsUMa, 79ZetUMa, 85EtaUMa | Big Dipper
5 | 0.43 | 0.42 | 1.0 | Alp1Cru, BetCru, GamCru, DelCru | Southern Cross
6 | 0.37 | 0.36 | 0.5 | 5AlpCrB, 3BetCrB, 8GamCrB, 10DelCrB, 13EpsCrB, 4TheCrB, 14IotCrB | Corona Borealis
7 | 0.35 | 0.35 | 1.0 | 66AlpGem, 78BetGem | Castor and Pollux
8 | 0.3 | 0.34 | 0.6 | 34DelOri, 46EpsOri, 50ZetOri, 44IotOri, 42Ori |
9 | 0.29 | 0.33 | 1.0 | 9AlpDel, 6BetDel, 12Gam2Del, 11DelDel | Delphinus
10 | 0.28 | 0.31 | 0.71 | 18AlpCas, 11BetCas, 27GamCas, 37DelCas, 45EpsCas | Cassiopeia
11 | 0.26 | 0.3 | 0.45 | 58AlpOri, 19BetOri, 24GamOri, 34DelOri, 46EpsOri, 50ZetOri, 53KapOri | Orion
12 | 0.25 | 0.29 | 0.75 | 46EpsOri, 50ZetOri, 48SigOri |
13 | 0.24 | 0.31 | 1.0 | 13AlpAri, 6BetAri, 5Gam2Ari | Head of Aries
14 | 0.24 | 0.38 | 1.0 | Alp1Cen, BetCen | Southern Pointers
15 | 0.24 | 0.33 | 0.44 | 50AlpUMa, 48BetUMa, 64GamUMa, 69DelUMa, 77EpsUMa, 79ZetUMa, 85EtaUMa, 1OmiUMa, 29UpsUMa, 30PhiUMa, 63ChiUMa, 23UMa |
16 | 0.23 | 0.27 | 1.0 | 53AlpAql, 60BetAql, 50GamAql | Shaft of Aquila
17 | 0.22 | 0.25 | 1.0 | 50AlpUMa, 48BetUMa |
18 | 0.22 | 0.3 | 0.62 | 1AlpUMi, 7BetUMi, 13GamUMi, 23DelUMi, 22EpsUMi, 16ZetUMi, 21EtaUMi | Little Dipper
19 | 0.21 | 0.34 | 0.04 | 54AlpPeg, 53BetPeg |
20 | 0.21 | 0.31 | 1.0 | 8Bet1Sco, 7DelSco, 6PiSco | Head of Scorpius
21 | 0.21 | 0.29 | 1.0 | 35LamSco, 34UpsSco | Stinger of Scorpius
22 | 0.2 | 0.29 | 1.0 | Iot1Sco, KapSco, 35LamSco, 34UpsSco |
23 | 0.19 | 0.31 | 0.75 | 32AlpLeo, 41Gam1Leo, 17EpsLeo, 36ZetLeo, 30EtaLeo, 24MuLeo | Sickle
24 | 0.19 | 0.35 | 0.0 | 21AlpAnd, 88GamPeg |
25 | 0.19 | 0.28 | 0.83 | 1AlpCrv, 9BetCrv, 4GamCrv, 7DelCrv, 2EpsCrv | Corvus
26 | 0.18 | 0.25 | 0.56 | 21AlpSco, 8Bet1Sco, 7DelSco, 6PiSco, 20SigSco |
27 | 0.18 | 0.3 | 0.33 | 50AlpCyg, 6Bet1Cyg, 37GamCyg, 18DelCyg, 53EpsCyg, 21EtaCyg | Northern Cross
28 | 0.17 | 0.22 | 0.58 | 21AlpSco, 8Bet1Sco, 7DelSco, 26EpsSco, Zet2Sco, Mu1Sco, 6PiSco, 20SigSco, 23TauSco |
29 | 0.17 | 0.27 | 1.0 | 21AlpSco, 20SigSco, 23TauSco |
30 | 0.16 | 0.33 | 0.83 | 3AlpLyr, 10BetLyr, 14GamLyr, 12Del2Lyr, 6Zet1Lyr | Lyra
31 | 0.16 | 0.27 | 0.56 | 26EpsSco, Zet2Sco, EtaSco, TheSco, Iot1Sco, KapSco, 35LamSco, Mu1Sco, 34UpsSco |
32 | 0.16 | 0.26 | 0.01 | 21AlpAnd, 54AlpPeg, 53BetPeg, 88GamPeg | Square of Pegasus
33 | 0.15 | 0.22 | 0.2 | 34DelOri, 46EpsOri, 50ZetOri, 87AlpTau, 54GamTau, 61Del1Tau, 74EpsTau, 78The2Tau, 17Tau |
34 | 0.14 | 0.3 | 0.0 | 43GamCnc, 47DelCnc |
35 | 0.14 | 0.35 | 0.67 | AlpCrA, BetCrA, GamCrA, DelCrA | Corona Australis
36 | 0.14 | 0.34 | 1.0 | 39LamOri, 37Phi1Ori, 40Phi2Ori |
37 | 0.14 | 0.35 | 0.56 | 6Alp2Cap, 9BetCap, 40GamCap, 49DelCap, 34ZetCap, 23TheCap, 32IotCap, 16PsiCap, 18OmeCap | Capricornus
38 | 0.13 | 0.21 | 0.27 | 58AlpOri, 24GamOri, 34DelOri, 46EpsOri, 50ZetOri, 47OmeOri, 51Ori |
39 | 0.13 | 0.21 | 1.0 | 58AlpOri, 24GamOri |
40 | 0.13 | 0.21 | 0.32 | 21AlpSco, 8Bet1Sco, 7DelSco, 26EpsSco, Zet1Sco, EtaSco, TheSco, Iot1Sco, KapSco, 35LamSco, Mu1Sco, 6PiSco, 23TauSco | Scorpius
41 | 0.12 | 0.23 | 0.33 | AlpCrA, BetCrA, GamCrA, EpsCrA, ZetCrA |
42 | 0.12 | 0.35 | 1.0 | 68DelLeo, 70TheLeo |
43 | 0.12 | 0.24 | 0.5 | 42AlpCom, 43BetCom, 15GamCom |
44 | 0.11 | 0.33 | 1.0 | 3AlpLyr, 4Eps1Lyr, 6Zet1Lyr |
45 | 0.11 | 0.29 | 0.23 | 13AlpAur, 34BetAur, 37TheAur, 3IotAur, 112BetTau | Auriga
46 | 0.11 | 0.26 | 0.57 | EpsCar, IotCar, DelVel, KapVel | False Cross
47 | 0.11 | 0.32 | 0.09 | 11AlpDra, 23BetDra, 33GamDra, 57DelDra, 63EpsDra, 22ZetDra, 14EtaDra, 13TheDra, 12IotDra, 5KapDra, 1LamDra, 25Nu2Dra, 32XiDra, 60TauDra, 44ChiDra | Draco
48 | 0.11 | 0.29 | 0.67 | 48GamAqr, 55Zet2Aqr, 62EtaAqr, 52PiAqr | Water Jar
49 | 0.11 | 0.33 | 0.53 | 16AlpBoo, 42BetBoo, 27GamBoo, 49DelBoo, EpsBoo, 30ZetBoo, 8EtaBoo, 25RhoBoo, 5UpsBoo | Boötes
50 | 0.1 | 0.21 | 0.0 | AlpCrA, BetCrA, GamCrA, DelCrA, EpsCrA, ZetCrA, Eta2CrA, TheCrA, Kap2CrA, LamCrA, 7122, 7129 |
51 | 0.1 | 0.31 | 1.0 | 9Alp2Lib, 27BetLib |
52 | 0.1 | 0.23 | 1.0 | 7BetUMi, 13GamUMi, 5UMi |
53 | 0.1 | 0.27 | 1.0 | 10AlpCMi, 3BetCMi |
54 | 0.1 | 0.18 | 0.5 | 5RhoOph, 21AlpSco, 8Bet1Sco, 7DelSco, 6PiSco, 5RhoSco, 20SigSco, 23TauSco, 6070 |
55 | 0.1 | 0.31 | 0.33 | 67AlpVir, 29GamVir, 43DelVir, 47EpsVir, 3NuVir |
56 | 0.1 | 0.3 | 0.75 | 23BetDra, 33GamDra, 57DelDra, 25Nu2Dra, 32XiDra |
57 | 0.1 | 0.25 | 0.33 | 58AlpOri, 24GamOri, 39LamOri, 37Phi1Ori, 40Phi2Ori |
Table S2: An extended version of Table 1 that includes 57 asterisms in total.
C Asterisms for the GC model with n = 320
No. | Score | Stars | Description
1 | 1.0 | 25EtaTau, 17Tau, 19Tau, 20Tau, 23Tau, 27Tau | Pleiades
2 | 1.0 | Alp1Cen, BetCen | Southern Pointers
3 | 1.0 | 10Gam2Sgr, 19DelSgr, 20EpsSgr, 38ZetSgr, EtaSgr, 22LamSgr, 34SigSgr, 40TauSgr, 27PhiSgr | Teapot
4 | 1.0 | 53AlpAql, 60BetAql, 50GamAql | Shaft of Aquila
5 | 1.0 | 34AlpAqr, 48GamAqr, 55Zet2Aqr, 62EtaAqr | Water Jar (part)
6 | 1.0 | 9AlpDel, 6BetDel, 12Gam2Del, 11DelDel | Delphinus
7 | 1.0 | AlpCrA, BetCrA, GamCrA | Corona Australis
8 | 1.0 | 5AlpSge, 6BetSge, 12GamSge, 7DelSge | Sagitta
9 | 1.0 | 13AlpAri, 6BetAri, 5Gam2Ari | Head of Aries
10 | 1.0 | 5GamEqu, 7DelEqu | Judge of right and wrong (Chinese)
11 | 1.0 | 95Her, 102Her | Textile ruler (Chinese)
12 | 1.0 | 60BetOph, 62GamOph | Official for the royal clan (Chinese)
13 | 1.0 | 67PiHer, 75RhoHer | Woman’s bed (Chinese)
14 | 1.0 | 25IotOph, 27KapOph | Dipper for solids (Chinese)
15 | 1.0 | 13EpsAql, 17ZetAql | ar in Mejleb (Marshall Islands)
16 | 1.0 | 40TauLib, 39UpsLib | Celestial spokes (Chinese)
17 | 1.0 | 7BetUMi, 13GamUMi, 5UMi | Jemenuwe (Marshall Islands)
18 | 1.0 | EpsBoo, 25RhoBoo, 28SigBoo | Celestial lance (Chinese)
19 | 1.0 | MuCen, NuCen, PhiCen | Ujela (Marshall Islands)
20 | 1.0 | 42ZetPeg, 46XiPeg | Thunder and lightning (Chinese)
21 | 1.0 | 33LamUMa, 34MuUMa | Kam Anij (Marshall Islands)
22 | 1.0 | 39LamOri, 37Phi1Ori, 40Phi2Ori | Al-Hekaah (Arabic)
23 | 1.0 | 13TheCyg, 10Iot2Cyg, 1KapCyg | Xi Zhong (Chinese)
24 | 1.0 | 20EtaLyr, 21TheLyr | Nin-SAR and Erragal (Babylonian)
25 | 1.0 | 90PhiAqr, 91Psi1Aqr, 93Psi2Aqr | Mhua (Tukano)
26 | 1.0 | 57DelDra, 63EpsDra | Celestial kitchen (Chinese)
27 | 1.0 | 1Pi3Ori, 3Pi4Ori | Lulal and Latarak (Babylonian)
28 | 0.88 | 87AlpTau, 54GamTau, 61Del1Tau, 68Del3Tau, 74EpsTau, 78The2Tau, 71Tau | Hyades
29 | 0.86 | 50AlpUMa, 48BetUMa, 64GamUMa, 69DelUMa, 77EpsUMa, 79ZetUMa, 85EtaUMa, 80UMa | Big Dipper
30 | 0.83 | 18AlpCas, 11BetCas, 27GamCas, 37DelCas, 45EpsCas, 17ZetCas, 24EtaCas | Cassiopeia
31 | 0.8 | 23BetDra, 33GamDra, 25Nu2Dra, 32XiDra | Head of Draco
32 | 0.8 | 5AlpCrB, 3BetCrB, 8GamCrB, 13EpsCrB, 4TheCrB, 49DelBoo | Corona Borealis
33 | 0.8 | 53BetPeg, 44EtaPeg, 47LamPeg, 48MuPeg | Resting palace (Chinese)
34 | 0.75 | Alp1Cru, BetCru, GamCru, DelCru, EpsCru | Southern Cross
35 | 0.75 | 32AlpLeo, 41Gam1Leo, 36ZetLeo, 30EtaLeo, 31Leo | Sickle (part)
36 | 0.75 | 37Xi2Sgr, 39OmiSgr, 41PiSgr | Establishment (Chinese)
37 | 0.67 | TheSco, Iot1Sco, KapSco, 35LamSco, 34UpsSco, 6630 | Tail of Scorpius
38 | 0.67 | 4BetTri, 9GamTri | Triangulum (part)
39 | 0.67 | 3AlpLyr, 12Del2Lyr, 4Eps1Lyr, 6Zet1Lyr | Lyra (part)
40 | 0.67 | 66AlpGem, 78BetGem, 62RhoGem, 75SigGem | Castor & Pollux
41 | 0.67 | 9IotUMa, 12KapUMa |
42 | 0.67 | 16AlpBoo, 8EtaBoo, 4TauBoo, 5UpsBoo |
43 | 0.67 | 42TheOph, 44Oph |
44 | 0.67 | 2XiTau, 1OmiTau |
45 | 0.67 | Bet1Sgr, Bet2Sgr |
46 | 0.67 | 22ZetDra, 14EtaDra |
47 | 0.67 | 67Oph, 70Oph |
48 | 0.67 | 10BetLyr, 14GamLyr |
49 | 0.67 | 16PsiCap, 18OmeCap |
50 | 0.6 | TheCar, 4050, 4140 |
51 | 0.6 | 5AlpCep, 3EtaCep, 2TheCep |
52 | 0.6 | BetAra, GamAra, ZetAra |
53 | 0.5 | 68DelLeo, 70TheLeo, 60Leo |
54 | 0.5 | 10AlpCMi, 3BetCMi, 4GamCMi |
55 | 0.5 | 4GamCrv, 7DelCrv, 8EtaCrv | Corvus (part)
56 | 0.5 | AlpMus, BetMus | Musca (part)
57 | 0.5 | 16LamAql, 12Aql |
58 | 0.5 | 7AlpLac, 3BetLac |
59 | 0.5 | 10ThePsc, 17IotPsc |
60 | 0.5 | 24GamGem, 31XiGem, 30Gem |
61 | 0.5 | 9AlpCMa, 2BetCMa |
62 | 0.5 | 13GamLep, 15DelLep |
63 | 0.5 | 11AlpLep, 9BetLep |
64 | 0.5 | 13AlpAur, 34BetAur, 7EpsAur, 8ZetAur, 10EtaAur, 35PiAur |
65 | 0.5 | 22TauHer, 11PhiHer |
66 | 0.5 | 5Alp1Cap, 6Alp2Cap, 9BetCap |
67 | 0.5 | AlpGru, BetGru, EpsGru, ZetGru | Grus (part)
68 | 0.5 | MuCep, 10NuCep |
69 | 0.5 | AlpPhe, EpsPhe, KapPhe |
70 | 0.44 | 50AlpCyg, 37GamCyg, 18DelCyg, 53EpsCyg, 58NuCyg, 62XiCyg, 31Cyg, 32Cyg | Northern Cross
71 | 0.43 | 24AlpPsA, 22GamPsA, 23DelPsA |
72 | 0.43 | 21AlpSco, 8Bet1Sco, 7DelSco, 14NuSco, 6PiSco, 20SigSco, 23TauSco, 10Ome2Sco, 9Ome1Sco | Head of Scorpius
73 | 0.43 | 25DelCMa, 21EpsCMa, 31EtaCMa, 24Omi2CMa, 22SigCMa, 28OmeCMa | Rear of Canis Major
74 | 0.43 | 11EpsHya, 16ZetHya, 13RhoHya |
75 | 0.43 | 17IotAnd, 19KapAnd, 16LamAnd |
76 | 0.4 | 78IotLeo, 77SigLeo |
77 | 0.4 | 86Aqr, 88Aqr, 98Aqr, 99Aqr |
78 | 0.4 | 44ZetPer, 38OmiPer |
79 | 0.4 | 31EtaCet, 45TheCet |
80 | 0.4 | 43BetAnd, 37MuAnd |
81 | 0.4 | 26EpsSco, Mu1Sco |
82 | 0.36 | 67BetEri, 69LamEri, 58AlpOri, 19BetOri, 24GamOri, 34DelOri, 46EpsOri, 50ZetOri, 28EtaOri, 43The2Ori, 44IotOri, 53KapOri, 48SigOri, 20TauOri, 29Ori, 32Ori, 42Ori, 1887 | Orion
83 | 0.33 | BetPhe, GamPhe |
84 | 0.33 | 51And, PhiPer |
85 | 0.33 | 40GamCap, 49DelCap |
86 | 0.33 | 26BetPer, 25RhoPer |
87 | 0.33 | 40AlpLyn, 38Lyn |
88 | 0.3 | 24AlpSer, 13DelSer, 37EpsSer |
89 | 0.29 | AlpCol, BetCol |
90 | 0.29 | 51MuPer, 48Per |
91 | 0.29 | 31DelAnd, 30EpsAnd |
92 | 0.29 | 33AlpPer, 23GamPer, 39DelPer, 15EtaPer, IotPer, 35SigPer, 18TauPer, 37PsiPer | Perseus (part)
93 | 0.25 | 28BetSer, 41GamSer, 35KapSer |
94 | 0.25 | 7EtaGem, 13MuGem |
95 | 0.25 | TheGru, IotGru |
96 | 0.25 | 27BetHer, 20GamHer |
97 | 0.25 | 27DelCep, 23EpsCep, 21ZetCep |
98 | 0.25 | EpsCar, IotCar, DelVel, KapVel, 3447, 3659, 3803 | False Cross
99 | 0.22 | 41Ups4Eri, 43Eri |
100 | 0.21 | 17EpsLeo, 4LamLeo, 24MuLeo |
101 | 0.2 | 1DelOph, 2EpsOph |
102 | 0.2 | 76DelAqr, 71Tau2Aqr |
103 | 0.2 | GamLup, DelLup |
104 | 0.2 | EtaCen, KapCen, AlpLup, BetLup |
105 | 0.2 | PhiEri, ChiEri |
106 | 0.18 | 14ZetLep, 16EtaLep |
107 | 0.18 | 23DelEri, 18EpsEri |
108 | 0.15 | 1Lac, 8485 |
109 | 0.15 | 39Cyg, 41Cyg |
110 | 0.13 | 86MuHer, 94NuHer, 92XiHer, 103OmiHer |
111 | 0.12 | GamCen, DelCen, TauCen |
112 | 0.11 | 67SigCyg, 65TauCyg |
113 | 0.06 | 46LMi, 54NuUMa, 53XiUMa |
114 | 0.0 | 37TheAur, 32NuAur |
115 | 0.0 | AlpCar, TauPup |
116 | 0.0 | 21EtaCyg, ChiCyg |
117 | 0.0 | PiPup, 2787 |
118 | 0.0 | ZetPup, 3080 |
119 | 0.0 | AlpEri, AlpHyi |
120 | 0.0 | 25TheUMa, 26UMa |
121 | 0.0 | Del1Gru, Del2Gru |
122 | 0.0 | 43PhiDra, 44ChiDra |
123 | 0.0 | 3445, 3487 |
124 | 0.0 | 65Kap1Tau, 69UpsTau |
Table S3: Asterisms picked out by the GC model with n = 320. The scores roughly indicate how similar
each asterism is to the closest asterism in the human data (1.0 indicates a perfect match). In some cases the
descriptions are approximate only; for example, the asterism labeled “Corona Borealis” includes an extra
star (49DelBoo).
References
[1] D. W. Hamacher. On the astronomical knowledge and traditions of Aboriginal Australians. PhD thesis,
Macquarie University, 2012.
[2] C. T. Zahn. Graph-theoretical methods for detecting and describing Gestalt clusters. IEEE Transactions
on Computers, 20(1):68–86, 1971.
[3] N. Ahuja. Dot pattern processing using Voronoi neighborhoods. IEEE Transactions on Pattern Analysis and Machine Intelligence, (3):336–343, 1982.
[4] M. C. J. van den Berg. Grouping by proximity and grouping by good continuation in the perceptual
organization of random dot patterns. PhD thesis, University of Virginia, 1998.
[5] M. P. Van Oeffelen and P. G. Vos. Enumeration of dots: An eye movement analysis. Memory
& Cognition, 12(6):607–612, 1984.
[6] B. J. Compton and G. D. Logan. Evaluating a computational model of perceptual grouping by proximity. Perception & Psychophysics, 53(4):403–421, 1993.
[7] B. J. Compton and G. D. Logan. Judgments of perceptual groups: Reliability and sensitivity to
stimulus transformation. Perception & Psychophysics, 61(7):1320–1335, 1999.
[8] J. Schreiner. Redefining constellations and asterisms. Available at http://www.jschreiner.com/english/stars/home.html.
[9] S. Xu, K. Chen, and Y. Zhou. Re-clustering of constellations through machine learning. Technical
report, Stanford University, 2014.
[10] T. Avilin. Astronyms in Belarussian folk beliefs. Archaeologia Baltica, 10:1, 2009.
[11] W. E. Stanbridge. On the astronomy and mythology of the Aborigines of Victoria. Proceedings of the
Philosophical Institute of Victoria, 2:137–140, 1857.
[12] G. R. Kaye. Hindu astronomy: Ancient science of the Hindus. New Delhi, 1981.
[13] G. Ammarell. Astronomy in the Indo-Malay archipelago. In H. Selin, editor, Encyclopaedia of the
History of Science, Technology, and Medicine in Non-Western Cultures, pages 324–333. Springer,
2008.
[14] P. A. Erdland. Die Marshall-Insulaner: Leben und Sitte, Sinn und Religion eines Südsee-Volkes.
Aschendorffsche, 1914.
[15] G. Urton. Constructions of the ritual-agricultural calendar in Pacariqtambo, Peru. In V. Del Chamberlain, J. B. Carlson, and J. M. Young, editors, Songs from the Sky: Indigenous Astronomical and
Cosmological Traditions of the World. Ocarina Books, 2005.