Communication and Cognition
Keith Stenning, Alex Lascarides, and Jo Calder
Cambridge, Massachusetts
London, England
MIT Press Math6X9/2003/02/27:16:11 Page 2
Contents
Preface iii
V GRAPHICAL COMMUNICATION
VII APPENDICES
Appendix A:
Bibliographical notes and further reading 519
Appendix B:
Consolidated grammars 525
Appendix C:
Glossary 539
Index 561
Preface
This book grew out of a course, and the course grew out of a very
particular set of intellectual and institutional needs. The Centre for
Cognitive Science was a department founded in 1969 (with the name,
later changed, of `Epistemics') and that department initially taught only
graduate-level courses in the interdisciplinary study of cognition. In
1994, we sought to extend the Centre's teaching into the undergraduate
syllabus. It would have been possible to `start at the end' and teach final-
year undergraduates a specialised course. Instead we decided to teach
an introductory `service' course open to students from any department
in the University.
We did this because the disciplines involved (linguistics, logic, AI,
philosophy, and psychology) are all subjects which are not much taught
in high school, and we wanted to put on a course that would give students
from any background a grounding in how these disciplines combined to
provide interdisciplinary theories of human communication.
This goal meant that surveying the various literatures was not really
an option. Instead, we would have to isolate only a few topics for study,
chosen on the basis that they brought several disciplines to bear on a
single phenomenon involving human communication. Our intention was to
explore these few topics in some depth from several disciplinary angles.
We were assured by some that this would not work: how can stu-
dents learn to put together several disciplines before they have been
inducted into any discipline? Our view is that students start out with
an interest in certain problems, and are often baffled by the way that
different disciplines slice these problems up. In Edinburgh, the various
relevant disciplines (Psychology, Philosophy, Artificial Intelligence, Com-
puter Science and Linguistics) are housed in departments with some 2.5
miles between them. These kinds of distances between departments in
the Humanities and the Sciences are not unusual in any university, even
campus-based ones. And students bouncing between departments some-
times find a radical translation problem between the languages spoken
on the different sides.
We felt that a `service' course that examined the difference in
perspectives was what was needed. How much easier for the student if we
started with target problems and tried to show how the various disci-
plines developed their distinctive views, and how those views relate, or
fail to relate. This can serve two kinds of student: ones who would never
pursue any of these disciplines any further, at least giving them an in-
for delivering the final manuscript four years late! Last but definitely
not least, we would like to thank all staff at the Human Communication
Research Centre at the University of Edinburgh for their support and
encouragement over the years and for providing such a collegial and
stimulating environment in which to do teaching and research.
We end with some remarks about the form of this book:
I PEOPLE COMMUNICATING: SOME PHENOMENA
far from clear just what the idea is that is shared after communication
has taken place.
The anthropologist Malinowski, writing in the 1920s, coined the
term phatic communication, as distinct from what he called ideational
communication, to describe this kind of communication, which functions
to establish community but which is not easily explained in terms of the
transfer of information. The paradigm cases of phatic communication are
ritual and fashion. Hemlines rise and fall, and with them the fortunes
of an international industry. People are entirely ostracised, or worse,
from some groups because the distance of the hem to the knee is wrong.
Wearing this as opposed to that communicates.
But what? We might try and reanalyse this phenomenon in terms of
ideational communication by incorporating the apparatus of symbolism.
Perhaps rising hemlines symbolise the proposition that the wearer is
sexually available. Such an analysis may or may not bear some grain of
truth, but it has caused many problems, both practical and theoretical.
In fact the alternative symbolic analysis that rising hemlines in the
population means that the wearer is less sexually available, for a given
degree of exposure, might be a better fit to the data. But there seems
something quite intellectually simplistic as well as distasteful about
trying to turn this communication into these propositions.
The problem is not that fashion dispenses with the transmission of
information. We have to see the hemline to get the message. There is
the same requirement for energy transmission as in the more obviously
ideational cases we have considered so far. Not even the 20C post-
modernist fashion industry has managed to sell clothes that remain
unseen in the wardrobe. But if we analyse the information that is
transmitted literally as "his turnups are N cms above the ankle" we are
left with the puzzle about why this is significant. After all, he could take
out a classified newspaper ad reading: "My turnups are N cms above the
ankle". This would express the same proposition, but we would not (at
least yet) regard this advertising as either fashionable or unfashionable
behaviour. It seems we can communicate the same proposition without
communicating the desired phatic result.
The information that is transmitted by fashion is, most fundamen-
tally, information about membership of community. Wearing this hem-
line communicates that the wearer is or aspires to be a member of the
community which currently wears this hemline as a fashion. The last
qualification is important. The absent-minded lecturer who falls into
fashion by failing to change his clothes since last they were in fashion,
is not a member of this community, though he might just be mistaken
for a member. Although a sentence about turnups does not achieve the
same communication as wearing the turnups, there is still an important
element of arbitrariness in fashion. This arbitrariness is crucial to the
signal's functioning as a phatic signal. If polar explorers have to have low
turnups because of frostbite problems, then it cannot be a fashion signal
amongst polar explorers. Hemline is merely functional for this group.
Phatic communication is about choosing to belong where we could have
chosen not to.
Of course, there is an immensely complicated web of weak functional
constraints which influence fashion but allow enough arbitrariness to let
in the phatic. This indirect but nevertheless ever-present background is
what people tend to appeal to when they analyse fashion symbolically
as in the example above. Coverage and exposure are not unrelated to
sexual availability, or wealth, or age in our culture. The cost of fashion
is an important non-arbitrary aspect which imposes constraints that
give symbolic meaning, quite capable of complex inversions at several
removes. Fashion is about group membership, particularly in so far as
that group membership is a matter of choice, and a matter of change.
In ritual there is less of a temporal dimension. The point of ritual
is that it is a timeless reflection of the culture and community to which
we belong. But we still see the importance of arbitrariness. Jonathan
Swift could lampoon his culture's religious bigotry with his allegory of
the culture that fought wars about which end of the boiled egg should
be eaten first. The significance of such arbitrary symbols is a phatic
significance, and to understand it we have to understand how the symbols
function in a society or sub-group.
Fashion and ritual are paradigmatic examples of phatic communica-
tion. Fashion and ritual are extreme examples, convenient for explaining
the concept because they are so recalcitrant to ideational analysis. But
phatic and ideational communication are not in general neatly separa-
ble. Communication always incorporates both aspects, though one may
overshadow the other. For example, academic lectures might, on the
face of it, appear to be pure ideational communication, with precious
little phatic aspect to them. But to fall for this appearance would be a
mistake. Undergraduate degrees are elaborate rituals of induction into
The aridity which late behaviourism had fallen into gave way to cogni-
tivism, and topics in psychology and linguistics opened up again, topics
which had been out-of-bounds for a generation or more. Behaviourism
itself laid important groundwork for the appreciation of the informa-
tional level in psychological and linguistic understanding, and continues
to have lessons for cognitivism when it strays too far from its evidence.
But behaviourism's real legacy lives on in the experimental techniques
that cognitive psychology still employs. More recently there has been a
turning back of attention onto the issue of how to extract knowledge by
statistical treatment of the data of experience. This movement has some
of the interests of behaviourism but is now far more a technical and less
an ideological approach.
Disciplinary perspectives on communication
The interdisciplinary composition of cognitive science results from a con-
viction that though it was perhaps essential to differentiate the disci-
plinary approaches during the 19C, solving the scientific problems about
the mind which face us now requires all of these approaches simultane-
ously, and that usually means that teams of researchers work together.
Their methods can usefully be classified into those that understand by
analysing and those that understand by synthesising.
Ways of understanding an X:
Analytical approaches:
- observe an X in contexts
- take an X to bits
Synthetic approaches:
- build an X
- deduce the properties any possible X must have
Of the predominantly analytic approaches, linguistics and psychol-
ogy adopt stances towards language and communication, which work
by observation of people communicating. Observation here includes ob-
servation when the environment is systematically manipulated by the
scientist experimenting. Linguistics focuses on the external representa-
tions of language: speech sounds, written sentences, characterising and
explaining their structure, and focusing on what is going on publicly.
describe their data they may well throw out the data from the people
who didn't follow the instructions, or made too many errors. Data is
`cleaned up' in accordance with sometimes quite sweeping idealisations.
At different scales of enquiry, different idealisations turn out to be
appropriate. One linguist's idealisation (to ignore speech errors, perhaps)
turns out to be another linguist's data (in analysing the mechanisms of
speech production). The logician may reject patterns of reasoning which
people frequently appear to use, while the psychologist may take these
logical `errors' as the main phenomenon to be explained. At some points
in scientific development, there are major arguments about whether an
idealisation is a good one, as we shall see, for example, when we discuss
logical analyses of human reasoning.
In the end, idealisations are justified by their fruitfulness. With
work in progress one has to rely on one's own judgements about future
usefulness. That is a good thing to keep in mind throughout this book.
Just remember that one sometimes has to leave out what appears at
first sight to be the main ingredient that should go in.
1.4 Reading
For a sociological perspective on communication, see Goffman, E. The
presentation of self in everyday life.
Gardner, H. The mind's new science.
Exercises
Exercise 1.1: Think of your own examples to illustrate the distinction
between phatic and ideational communication. Choose phenomena other
than fashion or religious ritual. Write brief notes on why they are good
examples of these concepts.
nc llstrtn f ths pnt s th prdctblt of vwls and cnsnnts. Smtc lnggs r wrttn wth
jst cnsnnts nd th vwls mmttd . . .
a ie iuaio o i oi i e eiaiiy o oe ooa. eii auae ae ie i u ooa a w oe oie . . .
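The two degraded versions above, consonants only and then vowels only, can be generated mechanically. A minimal Python sketch (the sample sentence is ours, not the book's):

```python
def keep(text, letters):
    """Drop every letter not in `letters`, keeping spaces so the word
    boundaries survive."""
    return "".join(ch for ch in text if ch in letters or ch == " ")

VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"

sentence = "redundancy makes messages robust"
print(keep(sentence, CONSONANTS))  # consonants only: still largely readable
print(keep(sentence, VOWELS))      # vowels only: almost nothing survives
```

Reading the two outputs aloud makes the asymmetry vivid: the consonant skeleton carries most of the information.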
Although natural languages display some redundancy which is cru-
cial to communication through noise, their `design' also exhibits
many efficiencies of coding which information theory would predict. Zipf
showed that natural languages obey an efficient coding scheme in infor-
mation theoretic terms. He showed that frequent words (as evidenced
by counting large samples of language) tend to be short and rare words
to be longer. This was one of the first, if rather limited, applications of
information theory to understanding natural human communication as
opposed to engineered systems.
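Zipf's relationship between frequency and length can be eyeballed even on a toy sample. The sketch below is illustrative only; the sample text and all names in it are ours, and a serious test would count millions of words of running text:

```python
from collections import Counter

# A toy sample; real tests of Zipf's law use large corpora.
text = ("the cat sat on the mat and the dog sat on the rug "
        "while the philosopher contemplated the unquestionably "
        "magnificent improbability of the circumstances")

counts = Counter(text.split())
frequent = [w for w, c in counts.items() if c > 1]  # words that recur
rare = [w for w, c in counts.items() if c == 1]     # words used only once

def mean_length(words):
    return sum(len(w) for w in words) / len(words)

print(f"frequent words {frequent} have mean length {mean_length(frequent):.1f}")
print(f"rare words have mean length {mean_length(rare):.1f}")
```

Even in this tiny sample the recurring words are the short function words, and the one-off words are the long content words, which is the direction Zipf's counts predict.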
With just this rather minimal conceptual apparatus it is possible to
show that the most effective way of transmitting information about a set
of equiprobable possibilities is by means of binary signals which divide
the set of possibilities successively into halves, and halves of halves, and
so on. Each choice point is assigned to a binary signal. Anyone who has
played 20 questions will have an intuitive grasp of the strategy. Asking
whether `it' is an ocelot, before asking whether `it' is animal, vegetable
or mineral, is not question-efficient. If there are 2^n possibilities, then
at least n binary signals are required to discriminate them all.
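The halving strategy is just binary search over the space of possibilities. A sketch in Python (function names are ours):

```python
import math

def questions_needed(n):
    """Minimum number of yes/no questions needed to single out
    one of n equiprobable possibilities."""
    return math.ceil(math.log2(n))

def ask(possibilities, target):
    """Play '20 questions' with the optimal halving strategy, counting
    how many yes/no questions are used to isolate the target."""
    questions = 0
    while len(possibilities) > 1:
        half = possibilities[: len(possibilities) // 2]
        questions += 1  # one question: "is it in this half?"
        possibilities = half if target in half else possibilities[len(half):]
    return questions

animals = [f"animal{i}" for i in range(16)]  # 16 = 2^4 equiprobable possibilities
print(questions_needed(16), ask(animals, "animal11"))  # both 4
```

Asking "is it an ocelot?" first corresponds to splitting the set into one possibility versus all the rest, which wastes a question whenever the answer is no.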
This scheme of successive binary division has already smuggled in
implicitly an interesting idea. We can expand the number of possibilities
discriminable even if we are still stuck with a physical channel which
is only binary, by transmitting a sequence of signals and interpreting
the sequence. If we have 256 possibilities, we can assign 8-bit codes
(2^8 = 256) in such a way that each is uniquely labeled. These codes range
from 00000000 through 01100001 to 11111111. But notice that signals
are now structured into `levels'. In order to interpret a bit, we have to
know where it comes in a sequence of eight signals. Rather in the way
that letters make up words, our binary signal sequences make up larger
signals. Like letters, the binary signals are no longer meaningful except
inasmuch as they contribute to differentiating words. In fact computers
are designed so that they have a basic hardware signal length, and the
codes that are fitted to this length are called words.
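This interpretation-by-position idea is exactly how character codes work; the 01100001 mentioned above happens to be the 8-bit ASCII code for the letter `a'. A small sketch:

```python
def encode(ch):
    """Turn a character into its 8-bit ASCII code, most significant bit first."""
    return format(ord(ch), "08b")

def decode(bits):
    """Interpret a sequence of eight binary signals as one character."""
    return chr(int(bits, 2))

print(encode("a"))           # '01100001'
print(decode("01100001"))    # 'a'
# A single bit only means something relative to its position in the word:
print(int("10000000", 2), int("00000001", 2))  # 128 versus 1
```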
But compactness of codes has a downside. As we noted above, natu-
ral languages are not compact like our eight-bit code for 256 possibilities.
Deleting the vowels from written language still leaves most messages in-
tact. Such non-minimal coding incorporates redundancy, and is
typical of natural communication systems. We talk and write in
environments full of random noise: acoustic, electrical, visual noise. This is
random energy which is present in the environment superimposed on the
signals. With a fully efficient (compact) code, deforming any one signal
will always guarantee that we transmit another meaningful message,
just that it will be the wrong message. For example, flipping any bit in
the eight-bit code for 256 messages mentioned above will always generate
another code member.
If we want to be able to communicate reliably in noisy environments,
then there must be combinations of signals which are not meaningful. A
common scheme for an eight-bit binary code is to have what is called a
parity bit. The parity bit is set to 0 if there is an even number of 1s
in the other 8 positions, and 1 if there is an odd number. On receiving
a message, we can examine the parity bit and see whether it is correct
relative to the eight bits it comes with. If there are oddly many 1s with
a 0 parity bit, or evenly many 1s with a 1 parity bit, then something
has gone wrong. Of course, there is a chance that more than one error
has taken place and the message has been deformed even though the
parity bit is correct, but the chance of two independent errors is much
lower than the probability of one. Introducing redundancy by `fault-
tolerant' schemes such as parity bits means that more bits have to be
transmitted, and that some possible messages are meaningless, but it is
a price worth paying if we want to communicate in noisy environments.
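The even-parity scheme just described can be sketched in a few lines (a toy illustration, not a production error-detecting code):

```python
def add_parity(bits):
    """Append an even-parity bit: 0 if the eight data bits hold an even
    number of 1s, 1 if odd, so every valid word has evenly many 1s."""
    return bits + [sum(bits) % 2]

def check(word):
    """A received word is plausible only if its total number of 1s is even."""
    return sum(word) % 2 == 0

data = [0, 1, 1, 0, 0, 0, 0, 1]   # three 1s: odd, so the parity bit will be 1
word = add_parity(data)
assert check(word)                 # intact word passes

corrupted = word[:]
corrupted[2] ^= 1                  # flip one bit: noise on the channel
assert not check(corrupted)        # a single error is always detected

corrupted[5] ^= 1                  # a second flip restores even parity...
assert check(corrupted)            # ...so a double error slips through
```

The final assertion illustrates the text's caveat: the scheme buys detection of any single error at the price of an extra bit, but two independent errors can cancel out.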
That's why natural language is redundant as we demonstrated above,
though its patterns of redundancy are more complex than parity bits.
Shannon and Weaver's framework, simple though it is, helps us to see
the abstractness of information. Our technology has become so digitised
that the idea of turning anything from product numbers to symphonies
into digital codes is all around us. Given the digitisation, we can see all
sorts of physical implementations of this information. To the bar-code
reader, the bar-code is seen as a pattern of light. The reader turns that
pattern into voltages, and then into magnetisations on the computer's
disk, and half a dozen dierent technologies besides. When the bar-code
on the baked beans doesn't read properly, the assistant types the code.
His typing is a pattern of mechanical energy. The keyboard is a device for
turning patterns of mechanical energy into patterns of electrical energy.
This is not to say that the framework has no use in the analysis of
language. Within the lower levels, finite codes are extremely important to
the processors which decode language. At the level of sounds, for example,
information theory has an important part to play in understanding
the `design' of natural language. But as an ultimate framework for
understanding discourse, something else must be found.
A discourse is a sequence of sentences, each drawn from the
indefinitely large population of possible sentences which make up a
language like English. When the receiver has decoded a sentence he
or she must do something about storing the information it carries if the
information is to connect with other sentences, as connect it must. So an
immediate consequence of infinite languages is that not all information
in the mind is uniformly present throughout. There is a problem of
connecting new information with old information in order to base action
on knowledge. We will see some concrete examples soon. This process of
inference is everywhere in the mind, and everywhere in communication.
Summary: Shannon and Weaver's model of communication, like so
much in science, is simple, if not simplistic. But its concepts of sender,
receiver, channel, signal, code and information form a tightly knit web
and serve for the elaboration of many insights into communication.
Fundamentally, information is a decrease in uncertainty. Redundancy
is a diffuseness of coding which can be exploited to give fault-tolerance.
The inadequacies of the model are as revealing as the phenomena it
does fit, a common benefit of the simplistic models science entertains.
The limitation to finite codes, and the lack of analysis of the message
(what is communicated) are the most serious problems, and are closely
related to each other. A theory of communication that can deal with
messages of indefinite length, where the messages themselves build
up the context for the messages' interpretation, requires memory. An
analysis of memory (the storage and retrieval of information during
communication) involves us immediately in questions about what the
minds of sender and receiver are like.
The only way to deal with essentially infinite possibilities is through
capturing generalisations about them. The next chapter will focus on the
treatment of rules and regularities in reasoning. We choose an example
phenomenon which requires that we take the different perspectives of
several disciplines to illustrate how they relate to each other. This will
give you some concrete examples of communication phenomena and
Exercises
Exercise 2.1: Find an example from everyday communication where
a code is redundant and where that redundancy reduces errors which
might otherwise ensue. Find an example where too little redundancy
leads to many errors.
2.3 Readings
Miller, G. A. (1967) The psychology of communication. New York: Basic Books.
Lyons, J. (1991) Chomsky (3rd. edition). London: Fontana Modern
Masters.
3.1 Introduction
This chapter will take an in-depth look at one very specific experimental
situation which appears to show widespread failure, on the part of some
very intelligent people, to understand some very simple language. We
choose this experiment for such protracted analysis because it can tell
us so much about different disciplinary perspectives on communication,
and the need for integration.
In this chapter we will focus on how people seek evidence for rules
which express certainties. In the next chapter we will look at how people
manage in a world which is largely made up of uncertainties. We will
see that many of the same issues crop up, albeit in different guises.
In what follows, we will sometimes get you to be the guinea pig
in informal experiments. We do this because it is crucial to experience
these phenomena at a personal level, and only then to try to explain
them at a scientic level. If you just read about the strange things
observed by psychologists you might be inclined to dismiss them as
something other people do. Then you would be unlikely to feel the
importance of understanding them. We the teachers did these `strange'
things in just the same way when we originally were set the same puzzles.
Sometimes we will see that scientists' theories about these phenomena
(which have been widely accepted) are not much more sophisticated
than their subjects' `errors'. Sometimes it is arguable that people's
`errors' can be seen as more rational than they at rst appear when
interpreted in the right context. But you cannot grasp what cognitive
science is all about unless you understand this to-ing and fro-ing between
the perspectives of personal, subjective experience and public objective
observation and explanation.
We make no apology for spending such a lot of time on what will
at first appear as a tiny topic. If we can give you some insight into how
the various disciplines approach this one topic, we will have succeeded
in our aims. If we succeed we will have changed the way that you reason
yourself as a practical activity. We will have given you a better idea of
how science works in this confusing area: what is theory and what is
data for each of the approaches. You may also have a rather different view
of the structure of your own mind. We would feel we had succeeded if you
come away with both some glimpses of scientific understandings to be
had, conjoined with a better grasp of how little we currently understand
of some truly baffling phenomena.
no other cards. These results are highly reliable in that they have been
repeated many times on a wide variety of student subjects with very
similar results.
In normative mode, what should subjects turn? Before answering this
question we should pause and ask what external standard there could
be. If the vast majority of sane people choose a particular combination,
must that be right? What authority could possibly tell us we were all
wrong? After all, this is English. We know what if . . . then means. Could
nearly all the folks be wrong nearly all of the time? Wason employed
classical logic as a competence theory. We return to describe this logic
in Chapter 5. According to Wason's interpretation of this standard
of reasoning, only 5% of subjects got the `right answer'. Like many
experiments in psychology, this one is famous because people don't do
what some theory says they should do. If they had done, you and we
would never have heard more of Wason's experiment. These results are
surprising. You should ask yourself whether you agree with Wason that
turning A and 7 is the right thing to do, and write down your conclusions.
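Under the material-conditional reading that Wason assumed, a card is worth turning just in case something on its hidden side could falsify the rule. A sketch that enumerates the possibilities (the candidate hidden faces and all names here are our own illustrative assumptions):

```python
VOWELS = "AEIOU"

def falsifies(letter, number):
    """A letter-number pairing breaks 'if vowel then even' only when
    the letter is a vowel and the number is odd."""
    return letter in VOWELS and number % 2 == 1

def worth_turning(visible):
    """A card is informative iff some possible hidden face could
    falsify the rule."""
    if visible.isdigit():   # a number shows, so the hidden side is a letter
        return any(falsifies(ch, int(visible))
                   for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    else:                   # a letter shows, so the hidden side is a number
        return any(falsifies(visible, n) for n in range(10))

cards = ["A", "Q", "4", "7"]
print([c for c in cards if worth_turning(c)])  # ['A', '7']
```

The enumeration makes Wason's normative answer mechanical: only the A (which may hide an odd number) and the 7 (which may hide a vowel) can possibly refute the rule; whatever is behind the Q or the 4 is consistent with it.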
Before discussing how to interpret these results, let's look at some
other probings of people's judgements that have been made in an
attempt to understand what is going on. One of the very first points to
emphasise is that we do not automatically know why we did what we
did any more than anyone else does. We can collect people's reports
of what they think about what they did, but these are just more in-
direct evidence about what went on in their minds as they made their
card choices. People may have forgotten why they chose the cards they
chose, and they may not have known why they chose them in the first
place. Nevertheless, asking subjects to explain themselves may turn up
useful information even if it cannot always be taken at face value.
The reasons given for turning A usually sound somewhat as follows:
I turned the A because if there's an even number on the back of it, then
the rule is true. What people who don't turn 7 say is usually something
like: The rule doesn't say anything about odd numbers. Were your own
explanations similar to these?
Some experiments have tried getting students to list what possibil-
ities there are for the other sides of the cards, and to explicitly judge
each possibility for what it means for the truth of the rule. Careful lis-
tening indicates that for the A card, subjects typically realise that an
even number will make it a positive case, and an odd number a negative
But it doesn't take too much thought to see that these stances are not as
separable as might at rst appear. If our standards of what people ought
to do accorded with what 95% of people did do, you would never hear
about this task. It is interesting because people get it wrong according
to the standards Wason adopted. It might be that people mostly get it
wrong, but it also might be that the standards are wrong, or at least
that they are only one possible set of standards.
But the tension between descriptive and normative stances is not
just important because it determines what phenomena psychologists
find interesting. It is also important because people themselves have
standards by which they judge their reasoning, and their behaviour is
affected by those standards. For example, when some subjects who
did not choose the 7 card initially turn the 7 and find an A on the
back, they utter some expletive, and proclaim they previously made
a mistake. So even a descriptive theory of what people do has to
acknowledge the notion of error, because the subjects acknowledge error
themselves. Whenever behaviour is goal-oriented, there is likely to be
both a descriptive theory of what people do and a normative theory of
what they ought to do, and people themselves are likely to have some
grasp of this distinction.
Before we turn to explanations of what people do in this task, we
will take a brief trip abroad.
For example, Harris (19??) has shown that even very young children
can be got to accept some approximation of the `syllogism game' by
suitable contextualisation. In Harris' experiments, children are given
syllogisms such as `All cats bark. Fido is a cat. Does Fido bark?' When
four year olds are given such problems without any clues as to how they
are to be interpreted, they experience the same kinds of clash between
their current real-world context and the content of the problem that
the Liberian peasant appears to experience. Their knowledge that cats
don't bark clashes with the problem content. However, if the child is
given some cues that the problem is to be interpreted as being about
a world of its own defined by the premisses, perhaps by prefacing the
problem with the statement "On this really strange planet . . . ", then
the child is quite likely to conclude that Fido barks.
Harris cites his findings as showing that Scribner must have been
wrong to suppose that schooling is what changes the response from
a particularistic one to a `logical' one, because his four-year-olds have
not been exposed to schooling. I think it's clear that Scribner would
have been quite happy with Harris's findings. She did not claim that
there was no context in which unschooled subjects would adopt the
interpretation intended by the experimenter, only that the context she
used in her experiments divided subjects along the lines of schooling
they had received. It is clear that Harris's four-year-old subjects do not
spontaneously adopt the relevant interpretation in Scribner's situation,
not without those crucial extra clues which tell them to isolate their
interpretation from the current real-world context. Nor is it clear that
Harris's four-year-olds are really playing the fully fledged `syllogism game'.
So where does this leave us with the issue about the importance
of discourse and of logic? And with the nature of the force of `logical
compulsion'? And what about undergraduate students' responses in the
selection task, our point of departure for this trip to Africa? Well,
Wason confidently asserted that his undergraduate subjects (or at least
95% of them) had made a fundamental logical error in their card-
choices in the selection task. Is Wason guilty of a lack of sensitivity
even greater than Evans-Pritchard's? After all, Evans-Pritchard at least
saw the problem as how to account for the clash between the extreme
sophistication of Dogon cosmology and their apparently dismal failure
to engage in logical puzzles. And Evans-Pritchard did locate the problem
in the nature of the kinds of discourse the Dogon were willing to engage
in.
If you, the (patient) reader, are at all like the undergraduate subjects
who people the extensive literature on the selection task, it is likely
that you accepted Wason's explanation of the `correct' response in the
selection task, and accepted your initial choices of cards as
an error of reasoning. But perhaps you, the reader, were too quick to
accept that you were wrong on this second count? That is, if you accepted
Wason's account of what you should do, then perhaps you were wrong to
accept so quickly that you had made an error, and that Wason's criteria
of correctness are themselves so obviously correct?
This may seem like rubbing salt in the wound: first students' reason-
ing is supposed to be wrong, and then when they accept experimenters'
explanations of why they were wrong, they are then told that this accep-
tance may be wrong as well. Many students are at this point willing to
accept their initial mistake but not willing to accept the possibility that
they are thereby mistaken. Nevertheless, it is this possibility we explore
in the next two sections as we review some explanations of the observed
card-choices in the selection task. As counseled above, you would do well
to record your own opinions on these matters so that you can compare
them with your considered opinion when we are done.
and only if' reading at the same time as the constant anaphor reading,
then the choice of cards they ought to make is different again. Again,
work out what cards they ought to turn if this is their interpretation.
We will return to more complex kinds of explanation in terms of alterna-
tive logical models presently, but having raised this question mark over
Wason's chosen definition of correct performance, let us now take a look
at the explanation he himself favoured for his striking data.
Verification and falsification
We now turn to what psychologists who have accepted Wason's standard
logical account of the meaning of if . . . then have had to say about
the selection task. Most psychologists have acknowledged that an error
has been made, and try to explain why.
Probably the dominant explanation given is in terms of a contrast
between verification and falsification. People, or so it is claimed,
seek the instances which `make the conditional true' but ignore the ones
that make it false, thus betraying a preference for `verification' at the
expense of `falsification'. Thus, the story goes, they turn the A and often
the 4, because a 4 or an A respectively will confirm the rule.
Leaving aside whether this is a coherent explanation, let us first
ask why verification is supposed to be a bad strategy. The conditional
states a universal rule: whenever there is a vowel, then there is always
an even number. Suppose we consider what evidence can be found for
and against this universal rule. An A with a 4 is evidence for the rule, in
the sense that it is one case of its truth. We might argue about whether a Q
with a 7 or with a 2 is a case where the card complies with the rule,
but at least an A with a 4 is a clear case of compliance. On the other
hand, an A with a 7 does not comply with the rule. However, there is an
asymmetry between the true case and the false one. Although one A4
case might encourage our belief in the universal rule a bit, just one A7
is enough to destroy our belief entirely (or so the argument goes). So
this explanation of people's behaviour is that they incorrectly seek to
verify instead of trying to falsify.
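This asymmetry can be made concrete with a small enumeration. The sketch below is illustrative only (the names `VOWELS`, `can_falsify`, and the candidate hidden faces are our invention); it assumes each card has a letter on one side and a number on the other, and the standard material-conditional reading of the rule. It asks, for each visible face, whether the hidden side could ever reveal a counterexample:

```python
VOWELS = set("AEIOU")
HIDDEN_LETTERS = ["A", "Q"]   # letters a hidden side might show
HIDDEN_NUMBERS = [4, 7]       # numbers a hidden side might show

def violates(letter, number):
    # "if vowel, then even" is violated only by a vowel paired with an odd number
    return letter in VOWELS and number % 2 == 1

def can_falsify(face):
    """Could turning this card reveal a counterexample to the rule?"""
    if isinstance(face, str):                                # a letter is showing
        return any(violates(face, n) for n in HIDDEN_NUMBERS)
    return any(violates(l, face) for l in HIDDEN_LETTERS)    # a number is showing

print([f for f in ["A", "Q", 4, 7] if can_falsify(f)])  # ['A', 7]
```

Only the A and the 7 can ever expose a falsifying vowel-odd pairing; the Q and the 4 cannot, whatever is on their backs, which is why turning the 4 `to confirm' gains nothing on this account.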
Before going any further, we should note that there is a problem
about terminology here. If we believe that the only way to verify a
universal rule is to seek (and fail) to falsify it, then verification is
not what people are doing when they seek only positive instances. In
fact seeking verification, on this account, can only be done by seeking
falsification. This terminology could be fixed, but I will argue that
the theory underlying it cannot be fixed so easily. If you read the
psychological papers, remember that they usually mean `seek positive
examples' when they say `verify'.
This explanatory apparatus is taken from the philosopher Karl
Popper, an interesting (and salutary) case of interdisciplinary borrowing.
Popper was interested in what distinguished scientific method from
pseudo-scientific posturings. In particular, his two ardent hates were
Marxism and Freudian psychology, which he placed firmly in the second
category. Popper's complaint against these two colossi of the 19C
was that their theories were not falsifiable. Whenever some observation
came to light which did not fit the theory, another wrinkle could
be added to explain the apparent counterexample away. Science, Popper
claimed, was always about successive falsification of theories, which produced
a gradual convergence towards truth (but no method of absolute
verification).
But Popper was wrong that this criterion can distinguish science
from non-science. The most famous historical case of adding `wrinkles'
to a theory was Ptolemaic astronomy, based on the idea that all celestial
bodies were situated on spheres which controlled their orbits around the
earth. Whenever an inaccuracy in the predictions of planetary motion
was noted, another epicycle could be added to the theory: another
sphere centred on the surface of an existing one would assimilate the
observation. By the time that Copernicus proposed that the sun was in
fact at the centre of the planets, and Kepler that the orbits were elliptical,
Ptolemaic astronomy was a mass of competing conglomerations of
epicycles, each due to a different author. The data still didn't fit all that
well in practice, but it can be shown that, allowed enough epicycles, any
observed motion can be generated as `spherical motion'. So Ptolemy and
his followers were just as guilty as Freud and Marx of holding unfalsifiable
theories. There are other differences between their theories, but a
strong tendency to non-falsifiability is something they share.
For a theory of how science works which pays more attention than
Popper's to the data of science in action, we can turn to the sociologist
Thomas Kuhn. Kuhn argues that such responses to anomaly are utterly
typical of scientific communities engaged in what he dubbed `normal
science'. Normal science is the activity that goes on within a paradigm
when there is general acceptance of a conceptual framework. Normal
had better find some way of differentiating these two cases. Let's try a
digression into ornithology.
The ravens paradox
There once was an ornithology student who abhorred getting wet. Fortunately,
considering his eccentric choice of career, he also chose to concentrate
on philosophy during his first year at university. In his elementary
logic class he learned that there was a rule for reasoning from conditional
rules which went as follows: the statement that P implies Q is logically
equivalent to the statement that not-Q implies not-P. When the first
ornithology practical class came around, the task was to find out whether
all ravens are black. Our hydrophobe friend saw immediately how to
exploit his newfound logical law. All ravens are black means `If it's a
raven, then it's black', and that is logically equivalent to the statement
`if it's a non-black thing, then it's a non-raven'. So far so good. When
his hardy classmates all piled out on a field trip into the wet Caledonian
hinterland in search of white ravens (being good falsificationists to the
last one), our friend went for a large pile of miscellaneous objects in the
corner of his professor's untidy lab. When asked what he was doing, he
explained that each non-black thing he found in the pile which turned
out to be a non-raven (say something white which turned out to be a
tennis shoe) was another piece of evidence for the ornithological law.
But our friend did not stop there. Being both a good falsificationist
and a good team player, he explained that a single instance of a non-black
thing which turned out to be a raven would prove that the rule was
false. And what is more, his evidence would be completely complementary
to his classmates'. No worries about accidentally counting the same
raven twice. So, the thinking-philosophers' ornithologist stayed dry. But
he failed his practical.
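The logic-class law the student relied on, contraposition, can be checked by brute force over the four possible truth assignments. A minimal sketch (the helper name `implies` is ours):

```python
from itertools import product

def implies(p, q):
    # material conditional: false only when p is true and q is false
    return (not p) or q

# "If it's a raven, it's black" vs. "if it's non-black, it's a non-raven":
# the two conditionals agree under every assignment of truth values.
for raven, black in product([False, True], repeat=2):
    assert implies(raven, black) == implies(not black, not raven)
print("contraposition verified for all four cases")
```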
Seeing why ornithology is an all-weather occupation helps to see one
of the influences which is at work in the selection problem. How is the
paradox of the ravens to be resolved? There is first the obvious problem
that if the student knows that the pile of non-black things which he
searches are tennis shoes, and he knows that tennis shoes are non-ravens,
then he is guilty of the drunk-under-the-lamppost fallacy of search.4 He
4. The drunk who knows he lost his keys when he fell in the ditch nevertheless
perseveres in searching beneath the lamp post. When asked why, he replies that
the light is far better there.
whether there are numbers on one side of the cards and letters on the
other sides. They would then have to turn all the cards to test the rule.
The experimenter is, moreover, an authority figure. If the rule should
turn out to be false, the experimenter could be interpreted as lying to
the subject. This is an uncomfortable social situation. None of this arises
with the deontic rule, where the subject is asked to test whether some
third parties (the drinkers described) are breaking a law.
On this account, the descriptive and deontic versions of the selection
task are just two rather different tasks, and one (the descriptive task)
should be much harder than the deontic one, as is observed. Unfortunately,
no knowledge of our Pleistocene ancestors accrues from this observation.
Mental models theory also fails to make this crucial semantic
distinction in its explanations of subjects' behaviour.
A common response from psychologists to this line of argument
is that whereas philosophers and logicians may enjoy such fastidious
arguments about the meanings of English sentences, psychologists, along
with their subjects, are more down-to-earth people who just get on with
what the task obviously requires. To quote Cosmides:
After all, there is nothing particularly complicated about the situation described
in a rule such as `If a person eats red meat, then that person drinks
red wine'. (Cosmides and Tooby 1992, footnote 14, page 223)
Here is a fundamental disagreement. We believe that in coming to
adopt one of the many possible (albeit each simple) interpretations of
such sentences, subjects actually must engage in rather complex and
sophisticated reasoning, even though they may not be aware of what
they do, nor have the terminology and concepts to report it even if
they are aware. It is a useful homework exercise to note down some of
the many different interpretations of Cosmides' example.6 The fact that
in this case we may be able to guess the commonest one from general
knowledge is neither here nor there.
Talking to the subjects, carefully
So how are we to decide between these explanations? The semantic
differences between descriptives and deontics are well established, and
they do have the consequences for reasoning in these two versions of the
6. Hint: the commonest interpretation in our culture should strictly be stated `If a
person eats red meat and drinks wine, then they drink red wine', i.e. it is not false
merely because people often eat red meat without drinking any wine.
task just described. But are they the cause (or among the causes) of
the difficulties subjects have in the descriptive tasks? And how are we
to find out? We would certainly agree with the psychologists that there
is an extra empirical question here: are these really the problems the
student subjects suffer from?
This is a nice example of the importance of being clear about the
difference between conceptual and empirical questions. The conceptual
differences between deontic and descriptive rules and their implications
for reasoning are clear enough; but the question as to whether
subjects' reasoning is actually controlled by these factors in the relevant
way is an empirical one we cannot settle in our armchairs. Equally,
we cannot interpret any set of experimental results without noticing
these semantic differences between the tasks. The whole experimental
method depends on understanding all the differences between experimental
conditions, and these semantic differences have been missed by
researchers on the selection task.
Fortunately there are ways of distinguishing possibilities in the laboratory.
Two broad approaches have been used. One is to use `socratic
tutoring' to elicit subjects' reasoning, and the other consists of a program
of modified experiments of the traditional kind.
The socratic tutor uses open-ended questions without direct feedback
to elicit the reasoning that subjects do. It is an interesting half-way house
between getting subjects to make simple introspective reports of their
reasoning, and the traditional experimental approach. Socratic dialogues
are not best thought of as reports, but rather as arguments by the subject.
Nevertheless, the elicitation may well affect the subjects' reasoning, so
it provides at best indirect evidence about subjects' reasoning in the
standard task. This variety of methods and their interrelations is an
important reason for choosing Wason's task as our example field. All too
often researchers stick with a single kind of experiment as a source of
information about mental processes; it's much easier to ring the changes
on a single paradigm. We think it is better that converging evidence
from as many different methods as possible is brought to bear. That is
another aspect of cognitive science.
The following snippets of dialogue come from several subjects in
typical socratic tutoring dialogues about Wason's task. They are chosen
to bear on some of the semantic difficulties predicted to arise from
descriptive rules in the selection task. In each case, the subject is only
. . .
S. But if that doesn't exclude the rule being true then I have to turn another
one.
E. So you are inclined to turn this over [the A] because you wanted to check?
S. Yes, to see if there is an even number.
E. And you want to turn this over [the 4]?
S. Yes, to check if there is a vowel, but if I found an odd number [on the back
of the A], then I don't need to turn this [the 4].
E. So you don't want to turn . . .
S. Well, I'm confused again because I don't know what's on the back, I don't
know if this one . . .
. . .
E. What about the 7?
S. Yes the 7 could have a vowel, then that would prove the whole thing wrong.
So that's what I mean, do you turn one at a time or do you . . . ?
Subject 10 gives evidence of the problem of contingency between the
card turnings, information gained, and further turnings. The subject
wants to give an answer which takes into account the contingencies
between what is found on the back after one turning, and whether any
more turnings are needed. You might say that it is obvious that the
experimenter is asking what cards must be turned if the information gain
turns out to be as uninformative as it can be. But again the instruction
to minimise turnings is quite naturally interpreted as implying that the
choice of turns should be contingent.
Subject 5.
revealed not the slightest trace of the problems predicted, then although
that would not be conclusive evidence against the predictions, it would
certainly have motivated us to think harder about the developing theory.
But these dialogues also strongly suggest several important points
about the mental processes which go on in doing the selection task which
have not emerged from the hundreds of papers which use more straightforward
experimental techniques. Although the snippets we quote here
are too short to reveal this, the whole dialogues strongly suggest that
different subjects have very different reasons for choosing (or not choosing)
the very same cards. This suggests that there are what psychologists
call individual differences between subjects, which are as important to
explain as the overall group tendencies. Different subjects experience
different problems and reason in different ways, and a good cognitive
theory should be able to describe and ultimately explain how and why.
Even in the standard experiments there are several common patterns of
card choice, but very little interest has been shown in the question of
what the differences tell us about the subjects.
One kind of theory of individual differences is that some people are
just more intelligent than others, and this is what strings them out along
a dimension of dumb-to-clever responses. But this doesn't help much in
the selection task. Even if we accept a particular theory of what the
`clever' choice is, how are the other possible choices to be ordered relative
to this standard? And if different subjects make the same choice for
different reasons, merely ordering the responses may not help. Surely what
we ought to seek is a theory about the different mental processes
which lead to the different responses.
There are good reasons why psychologists often begin in an area by
seeking to describe and explain regularities across groups of subjects.
With regard to the selection task, they have chiefly asked what factors
control how many people make the selection Wason's particular theory
says people should. Remember what was said in the Introduction about
different idealisations in science.
A further kind of question that can be raised about the selection task
is how subjects learn to do the task correctly. In the socratic dialogues
we see subjects changing their minds during the dialogue. These changes
are not always in the right direction toward the expected choices or
explanations, nor always permanent. Often a subject who is clear about
one thing and confused about another at one point gets into a state of
mind in which the clarity and confusion swap around. But sometimes
there is progress toward insight into the task, and sometimes insight
is lasting. We ought to be able to provide a cognitive theory of this
learning. It is very early days in the development of such a theory.
What we do know is that a theory needs to explain how new
knowledge is built on old knowledge. Subjects implicitly know a great
deal about their language and about reasoning, though not very much
of what they know before they start tutoring is explicit. They lack
terminology and concepts. Or perhaps it is better to say that they have
lots of terminology but they don't always use it consistently. Nor have
they always differentiated the many meanings they have for words like
`truth' and `falsity'. So socratic tutoring is highly suggestive, and could
be a route to studying learning, but we need to return to the original
task setting to substantiate our theory.
Experimental corroboration
Returning to our second question: how can we find out whether subjects
doing the task as Wason invented it actually encounter the problems
which the socratic dialogues reveal? In other words, does the socratic
tutoring reveal mental processes which are already going on in the
original task (whether or not subjects are aware of them), or does the
tutoring radically change what subjects think and how they reason?
One way of approaching these questions is to design variations
in the task to alleviate specific problems, and then to see whether
these changes actually facilitate subjects' reasoning. If they do, that
provides some evidence that subjects do suffer from these problems
in the standard task, whether they can report them or not. We will
illustrate with three example manipulations. One is an instruction to
alleviate misunderstandings about contingencies between card-choice
and feedback; one is an instruction to separate the source of the rule
from the source of the other information about the task; and the third is
a two-rule version of the task designed to induce a brittle interpretation
of the rules as intended to be exceptionless.
If subjects are confused about whether they are intended to specify
all the card turnings that would be necessary on the assumption that
the backs were minimally informative, or whether they should imagine
making card-choices contingent on what they find as they go, then simply
clarifying this by adding the following to the standard instructions might
help:
. . . Assume that you have to decide whether to turn each card before you get
any information from any of the turns you choose to make.
It turns out that this helps about 15% more subjects to make
Wason's `correct' choices, and raises the turning of the problematical
not-Q card from 18% to 47%. Although it may intuitively seem implausible,
this is strong evidence that this confusion is an important one in
the original task.
The second manipulation is an instruction designed to separate
the source of the rule from the experimenter, in order to clarify the
communication situation and to remove the discomfort of questioning
the truthfulness of an authority figure. Here the following italicised words
were added to the standard instructions:
. . . The rule has been put forward by an unreliable source. Your task is to decide
which cards (if any) you must turn in order to decide if the unreliable source
is lying. Don't turn unnecessary cards. Tick the cards you want to turn.
You should first ask yourself what the `correct' answer is in this case.
You should then ask yourself, on the basis of your experience of doing
this task and the original one, whether you think subjects will find this
E. So say there were an A on the back of the 4, then what would this tell you?
S. I'm not sure where the 4 comes in because I don't know if that would make
the A-one right, because it is the opposite way around. If I turned that one
[pointing to the A] just to see if there was a 4, if there was a 4 it doesn't
mean that rule two is not true.
Part of the difficulty of the standard task involving a descriptive rule
is the possibility of confusing the two relations between rule and cards.
Transferring the `truth of the card' to the `truth of the rule' may be
related to what Wason called `verification bias', but it cuts a lot deeper.
Of course, we cannot tell from these results how many subjects suffer
from combinations of these problems, or from the several other problems that
arise in the descriptive but not the deontic selection task. That is a
topic of active research. The question cannot be answered simply by
combining all the helpful instructions, for reasons you should be able to
deduce. But we can tell that the two tasks, descriptive and deontic, are
quite different tasks, because descriptive and deontic rules have quite
different meanings and quite different logical forms.
One final demonstration of the importance of a careful approach to
logical form before we move on to the broader implications of these
results. In the spirit of broad exploration, seeking as many kinds of
evidence as we can get, we should ask ourselves how much of the
problem in the selection task is due to the complexities of the meaning
of conditionals. One very simple way to find out is to use a different
`logical connective'. What would happen if we simply substitute the
following rule in the standard task:
Rule: There are vowels on one side of the cards and even numbers on the
other.
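How the required card choices depend on the rule's logical form can be computed mechanically. In the sketch below, the helper names and the particular conjunctive reading of the two-sided rule are our illustrative assumptions; it asks, card by card, whether the hidden face can change whether that card complies with the rule. Note that under the conjunctive reading the visible Q and 7 already fail the rule on their face alone, so their backs cannot matter:

```python
VOWELS = set("AEIOU")

def material(letter, number):
    # descriptive conditional: vowel on one side implies even number on the other
    return letter not in VOWELS or number % 2 == 0

def conjunctive(letter, number):
    # one possible reading of the two-sided rule: vowel AND even on every card
    return letter in VOWELS and number % 2 == 0

def informative_turns(rule, faces, letters=("A", "Q"), numbers=(4, 7)):
    """Cards whose hidden side could change whether the card complies."""
    turns = []
    for face in faces:
        if isinstance(face, str):                     # a letter is showing
            outcomes = {rule(face, n) for n in numbers}
        else:                                         # a number is showing
            outcomes = {rule(l, face) for l in letters}
        if len(outcomes) > 1:                         # the back matters
            turns.append(face)
    return turns

print(informative_turns(material, ["A", "Q", 4, 7]))     # ['A', 7]
print(informative_turns(conjunctive, ["A", "Q", 4, 7]))  # ['A', 4]
```

Different logical forms give different answers, which is the point of the substitution: what counts as the `correct' selection is relative to the form the subject assigns to the rule.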
for the subjects doing the original task. This sequence of investigations
serves as an illustration of how the different methods (socratic tutoring,
controlled experiment) can be used together to complement each other's
strengths and weaknesses.
these disciplines. Instead their view of language is that there are many
logical models of parts of natural languages, and the appropriateness of
these models is determined by very subtle contextual cues. It is highly
ironic that the psychologists who have been most adamant that logic cannot
serve as a basis for human reasoning, because people do not reason in
virtue of form, have in fact adopted the notion that a single logical model
can serve as the only standard for correct performance by which to judge
subjects' reasoning: how very 19C! In Part II, we will look further into
the nature of logical models of meaning and how they can serve as the
foundations for a psychology of reasoning and communication.
But first we will turn back and look at the implications of our arguments
for psychological interpretations of the selection task. First and
foremost, logical treatments of language assign many different logical
forms to the same sentence. Although once a form and its interpretation
have been adopted, reasoning from that form is mechanical
(blind to content), we cannot ignore the process of choosing interpretations.
Interpretation and reasoning are interactive processes. We may
interpret something one way, and then, when reasoning from that interpretation,
we may come to a conclusion that makes us doubt whether
we have got the intended interpretation. This may lead us to change our
interpretation, and to start reasoning again. We see this kind of interaction
going on in the socratic dialogues. So logic is made of two parts: an
interpretational apparatus and a reasoning apparatus. Taken together,
these two parts make logic a theory about how reasoning proceeds differently
in different contexts. Psychologists have tended to interpret logic
as a theory that reasoning is universally the same in all contexts. This is
simply a misunderstanding of logic, a failure of interdisciplinary communication.
Of course, the fact that so many reasonable interpretations of bits
of language are possible makes the study of human reasoning and
communication much more complex than it would be if we all spoke a
single language with a fixed interpretation. But there are many obvious
reasons why this is not possible, and why life (and cognitive science) are
so rich and interesting.
The second implication of our arguments for cognitive science is
that the relation between psychology and logic/computation has to
be thought of quite freshly. We will illustrate this point through a
brief discussion of the evolutionary psychologists' arguments based on
the selection task. Cosmides argues that since deontic and descriptive
conditionals are of the same logical form, the fact that we can reason so
well about drinking-age laws and so poorly about vowels and consonants
means that we evolved a special `cheater-detection' module in our brains
in the Pleistocene era, when we humans separated off from our primate
ancestors. This cheater-detection module is supposed to enable us to do
a certain class of deontic selection tasks but not descriptive ones. How
exciting that we can unlock the secrets of the Pleistocene! But what
implications do our observations that the logical forms of the rules are
different have for this argument?
The simple implication is that if descriptive rules are just difficult
because their semantics makes it complicated to reason about evidence
for them in the selection task setting, then this removes the founding
evidence for this brand of evolutionary psychology. Perhaps we should
simply note this failure and move on? But we believe that this response
misses some valuable lessons. There are many peculiarities about the
evolutionary arguments. For example, the arguments go from one very
specific behaviour in one very specific task to sweeping conclusions
about what happened to our ancestors as they roamed the savannah.
The arguments do not even stop to ask some obvious first questions
about how selection task behaviour fits into the larger pattern of modern
undergraduates' cognition.
The most obvious surprise about the selection task is that undergraduates
are clearly very adept at understanding descriptive conditional
sentences of their mother-tongues in lots of real-world communication
tasks. There are even other laboratory tasks, well known before the evolutionary
arguments were made, that show that subjects are adept at
constructing cases which make conditionals true and false, and at evaluating
conditionals in many situations. So what explains these abilities?
Are there modules for each of them? How does merely positing modules
help to explain systems of abilities?
A next question is why cheating-detectors are not enough to enable
undergraduate subjects to perform the descriptive selection task. After
all, if the rule is false, the rule tells a lie, and lying is a form of cheating.
We have already seen that when the instructions are changed slightly
and subjects are asked to find out whether a third-party source of the
rule could be lying, this makes the task substantially easier, even if
not as easy as the drinking-age task. So cheating detectors ought to be
tant. The particular way that the task is abstract relative to more normal
communication situations is very common in university-level educational
situations. Students must learn how to interpret written material
abstracted from its world of application. They must learn, on the basis
of indirect representations, how to acquire new concepts in these interpretations,
when to take something on trust, and when to test it
against evidence. Logic began in classical Greece as an `educational technology'
for teaching the skills of political and legal debate. Its emphasis
on making all information explicit, and on making sure that interpretation
remains constant throughout an argument, is critical in learning
how to learn more easily. Logic still can be an important educational
technology.
So where does this leave us with regard to the questions raised
on our anthropological field-trip to Africa? There we compared Wason
the psychological experimenter to Evans-Pritchard the anthropologist,
both accusing their `subjects' of irrationality. We have now laid out arguments
to the effect that it was Wason who was deluded about the simplicity of
his model of what people ought to do in his task, just as Evans-Pritchard
underestimated the difficulty of assessing the rationality of the Azande's
witchcraft beliefs. On our account our undergraduate subjects should be
seen as struggling to make sense of a very complicated set of information
that contains many conflicts in need of resolution. We do not argue that
they necessarily succeed in finding a lucid resolution. Nor do we argue
that they have perfect and explicit control of their knowledge of the
domain of reasoning. Just as we argued in the case of Scribner's
observations of primitive peoples' reasoning, there are kinds of discourse
which our students don't readily engage in. Our subjects are neither
infallible nor irrational, just intelligent and human.
alone; they are quite reasonably much guided by what they take the
intentions of the author of the words to be. Science places heavy weight
on words, the concepts behind words, and on making intentions explicit
in the text. Scientific concepts and terminologies, especially in the
cognitive sciences, are often confusable with ordinary uses of the same
terms. But the reader should be warned: the exact meaning is critical.
This has very direct consequences for the way that this text should be
read. If you, dear reader, have arrived at this point in the chapter after
one pass through, then it is highly unlikely that you have extracted the
moral of our tale. You should expect to have to re-read several times.
After all, we have spent forty pages discussing an experiment which
hinges on the meaning of a single innocent-looking sentence: `If there's
a vowel on one side, then there's an even number on the other'. Our
own language is novel, and not necessarily simpler.
But merely re-reading in the same fashion as a first pass is unlikely to
help much. How should the reader go about this task? What is important
and what can be discarded? The best answer to this question is that it
is the structure of the argument that is crucial, and it is the structure of
the argument that determines which details are necessary, because they
carry that structure, and which can be discarded because they don't.
Let's take this chapter as an example and review the structure of its
argument.
First, an experiment was introduced: Wason's selection task. The
task was presented and the results summarised. Second, Wason's interpretation
was described: according to Wason, almost all his
subjects got this task wrong. Third, this idea of wrongness was itself
described and given Wason's justification. Fourth, this situation, of an
observer claiming that almost every one of some subject population got
some simple task wrong, was likened to that of anthropologists who put themselves
in the same situation when commenting on the reasoning of primitive
peoples. Fifth, we turned to look at Wason's explanations of why
his subjects got things wrong, in terms of verification and falsification,
along with the origins of this explanation in Popper's philosophy. Sixth,
the possibility of a quite different standard of rightness or wrongness of
performance was raised through consideration of the ravens paradox:
perhaps subjects did what they did because of the arithmetic governing
the seeking of evidence in their natural habitat?
At this point, more experimental evidence was introduced. Drinking-age
rules make the selection task very easy, at least according to
Wason's definition of correct performance. Almost as many subjects
from the same population get this task 'right' as got the other task
'wrong'. Yet the tasks look as if they are of identical form. With these
new observations came a new interpretation and new explanations: the
evolution of 'cheating detectors' was supposed to explain the new data.
This new explanation is in turn a new argument with new questions.
Why can't cheating detectors perform the descriptive task? Are the
deontic and descriptive tasks of the same form?
At this point we turned to consider what logic and linguistics had to
say about the many meanings of conditional rules. Logic draws a very
basic distinction between deontic and descriptive semantics: how the
world should be and how it is. Paying careful attention to this distinction
reveals how the vowels-and-consonants task poses quite different
problems from the drinking-age law. The two rules are interpreted as
having different forms. Brittleness to exceptions, contingencies of
response on new evidence, anaphora, biconditionality, card-reversibility,
and the social psychology of authority figures all come into play in the
first task but not the second.
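The material-conditional reading that Wason used as his yardstick for the descriptive task can be made concrete in a short sketch. This is our illustration, not the book's: the card faces A, K, 4, 7 are the standard example from the literature, and the helper names are ours.

```python
# A sketch of Wason's 'correct' answer under the material conditional:
# a card must be turned over exactly when its hidden side could falsify
# "if there is a vowel on one side, there is an even number on the other".
# The rule, read materially, is false only for a vowel paired with an
# odd number.

VOWELS = set("AEIOU")

def is_vowel(face):
    return face in VOWELS

def is_even_digit(face):
    return face.isdigit() and int(face) % 2 == 0

def could_falsify(visible):
    """True if some possible hidden side would make the rule false."""
    if is_vowel(visible):
        return True   # hidden side might be an odd number
    if visible.isdigit() and not is_even_digit(visible):
        return True   # hidden side might be a vowel
    return False      # consonants and even numbers cannot falsify

cards = ["A", "K", "4", "7"]
must_turn = [c for c in cards if could_falsify(c)]
print(must_turn)  # the classically 'correct' selection: ['A', '7']
```

Note that the even-number card ("4") drops out: whatever is on its back, the material conditional cannot be falsified by it, which is exactly the step most subjects do not make.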
Logic can show how the semantics may make the tasks different, but
it cannot demonstrate that these differences determine how people act
differently in the two tasks. For that we needed to go back to the laboratory,
first collecting Socratic dialogues and then controlled experimental
evidence. Observing dialogues is a different kind of evidence, with different
strengths and weaknesses from experimental evidence. Again the
observations were described, then interpreted, and then explanations
based on them were offered. With those explanations we revisited the
evolutionary explanations and came to a quite different view of the relation
between psychology and logic. Finally we returned to the big picture
and argued that Wason's subjects are to be seen as struggling, quite
reasonably, to find an interpretation of all the information they have which
makes the task sensible to them.
Our different picture leads us to very different conclusions about
evolutionary psychology, and about the relation between logic and
psychology more generally. Rather than seeing two disciplines competing to
give explanations of reasoning mechanisms, we see the relation between
logic and psychology as analogous to the relation between mathematics
and physics or biology. Logic is the mathematics of information systems,
and people, at one level of analysis, are one kind of information system.
Psychology is concerned with applying the mathematical descriptions
produced by logic to understanding how information systems are
implemented in people. This implementation is as likely to be in feelings
as in symbols, and it does not mean that people cannot make errors in
reasoning. This relation between mathematics and experiments is much
more typical of science in general than the model that has been common
in psychology. Logic cannot settle empirical facts by musings in
armchairs, but psychological experiments cannot be interpreted without
coherent conceptual systems either. Logic and psychology are not
in competition; they ought to be in collaboration. A much richer
experimental science can ensue, as we hope to have illustrated in a small
way.
This has been the structure of the argument. This is what you should
be able to produce as a precis of this chapter. Why isn't the chapter
replaceable by the precis? Because the devil is in the details. It matters,
for example, exactly what wording Wason used in his experiment, and
exactly how the drinking-age task is phrased. How can one tell which
aspects of Wason's wording are critical? One can't, at least not
by simply looking at the wording. Does that mean that one should
memorise the wording of Wason's original paper? No, because not all the
features of the exact wording make important differences. And, as the
'conjunction' experiment shows, it is often not the wording alone that is
important. How can one tell the wheat from the chaff? Only by working
through which details play which roles in the argument's larger structure.
There has to be a continuous traffic between details and argument.
All this sounds like hard work? Why not just remember the details?
In Chapter 6 we will show that remembering the details is even harder
work. Yes, it is hard work understanding the argument, but once you
understand the relationship between the details and the argument you
will find the details rather easy to remember. Honest. And the argument
is the only thing worth having, in the end, because it is the generalisations
of the argument that tell us something about how the human mind
works: what we set out to find out about.
A common student response at this point is to complain that it's all
very well the professor stipulating that you have to remember the right
details and ditch the irrelevant ones, but the professor has a huge body of
specialist knowledge about the hundreds of experimental variations that
have been tried, and only that knowledge can really tell which details are
important. This is a reasonable objection. No student can be blamed for
not knowing about Bloggs & Bloggs' 1932 article in the Annals of the
Transylvanian Society for Parapsychology which shows that such-and-such
a change to the wording of the selection task makes no difference
to the results.7 All any of us can do at any stage is to understand the
details we know of, in the matrix of our general explanations, in our
current state of knowledge. But none of us has any alternative to trying
to formulate what is critical about the details and what is irrelevant
surface detail in the light of the arguments as we understand them. The
student who produces the best argument for a quite different conclusion
from the professor's should always get the highest mark on the course,
even if the professor happens to know what is wrong with the particular
conclusion arrived at, because of some piece of information the
student could not have known. The quality of argument is all-important.
Again a student might object that this is not the marking regime that
they encounter. If that is the case, then we can only commiserate.
This brings us to the issue of authority for conclusions. A minimal
amount of asking around your university, or searching a good library
or the web, will easily reveal that there are many researchers in the
field of human reasoning who do not agree with the conclusions reached
here. Some of the disagreements have been described here and some
not. Aren't we highly irresponsible, presenting controversial topics in
introductory courses? Shouldn't we be presenting consensus topics, so
that you can learn the 'facts' of the discipline before having to mess
with controversial arguments? Our answer is emphatically 'no'. As I
believe we have shown, in cognitive science the consensus 'facts'
of one discipline are often false in the next. It is a 'fact' in modern logic and
linguistics that deontic and descriptive interpretations of rules involve
different logical forms. It has been a 'fact' in psychology that they are
of the same form. In a young field, the consensus facts may be both
hard to come by and rather peripheral when found. What is important
is the process of building arguments on data, seeking new evidence and
revising arguments; on that much Popper was surely right.
It is a corollary of these 'facts' that you the student can very quickly
7. Except that students might be failed for not wondering how a 1932 paper could
precede Wason's publication by thirty-some years!
get to the point where you yourself can ask hard questions about how
the evidence fits together, and design ways of finding answers to the new
questions that arise. The one thing that is ruled out by the cognitive
science 'house rules' is to exclude some piece of evidence because it comes
from another discipline; unfortunately this move is all too common in
academia generally.
Perhaps it is worth reviewing our trip around the disciplines and
what it has taught us about both the positive and the negative contributions
on this particular small topic. Logic contributes a set of concepts
and analytical techniques, and a set of alternative mathematical systems
with their own criteria of soundness, designed for modeling information
systems such as human beings. You will see examples in Chapter 5. Wason
adopted a particularly simple such system as his yardstick for correct
performance in his task. The psychologists who have followed Wason
have simultaneously rejected the relevance of logic to human reasoning,
while continuing to accept Wason's chosen logical yardstick. In this they
were logically already out of date when they started out. On the other
hand logic, like any other conceptual scheme and attendant mathematics,
does not settle empirical questions from the armchair. Mathematics
is not science. Logicians may sometimes underestimate the distance between
the inspiration for their mathematical systems and their actual
application to the world. By and large they have been content to model
their intuitions about human reasoning (subtle and interesting as those
are) rather than engaging with the messy business of modeling the real
data of human reasoning. If the selection task is anything to go by, there
are rich logical rewards in paying much closer attention to the data of
reasoning collected in carefully controlled (albeit artificial) situations.
Psychology has contributed, above all, a reliably replicable experimental
situation in which groups of subjects do things which are not
transparently explicable, and which get us rather rapidly into deep questions
about the meaning of language, social norms, social authority, interpretation,
evidence, learning, communication and reasoning. The differences
between different versions of the task are radical in their effects
yet subtle in their explanation. However, by ignoring what is known by
other disciplines about relevant issues in human information processing,
psychologists have spent a lot of effort reinventing some old (and
rather square) wheels. This has deflected them from some of the really
important psychological questions. While holding forth about cheating
3.8 Readings
Wason, P. (1968) Reasoning about a rule. Quarterly Journal of
Experimental Psychology, 20, 273–81.
The original paper that started it all off.
Bloor, D. (1991) Knowledge and Social Imagery. Chicago University
Press. (1st ed. 1976) esp. Chapter 7
Bloor's discussion of Evans-Pritchard's classical study of the Azande and
its implications for logic.
Scribner, S. & Cole, M. (1981) The Psychology of Literacy. Harvard
University Press, Cambridge, Mass.
An introduction to the cross-cultural study of the psychology of reason-
ing.
Popper, Karl (1959) The Logic of Scientific Discovery. London: Hutchinson
(first published 1934).
The original statement of Popper's philosophy of science.
Kuhn, Thomas (1962) The Structure of Scientific Revolutions. Chicago
UP. (3rd edition 1996)
Kuhn's study of the sociology of scientic theory development.
Chater, N. & Oaksford, M. (1994) A rational analysis of the selection
task as optimal data selection. Psychological Review, 101, 608–631.
Presents the argument that an inductive Bayesian competence model
is the appropriate yardstick for judging subjects' performance in the
selection task.
Stenning, K. & van Lambalgen, M. (submitted) A little logic goes a
long way: basing experiment on semantic theory in the cognitive science
of conditional reasoning.
http://www.hcrc.ed.ac.uk/~keith/WasonSelectionTask/cognition.pdf
A fuller account of the semantic theory, the Socratic dialogues, and the
experiments described briefly here, along with a review and many more
references to the existing literature.
Cosmides and Tooby (1992) Cognitive adaptations for social exchange.
Exercises
Exercise 3.1: Since this chapter has been peppered with 'exercises' in
reasoning and justification, the obvious post-chapter exercise is to review
the notes of your own answers which you made during reading. Did
you get the task right? If not, do you think that your interpretation(s)
related to any of those discussed? Did you accept Wason's definition of
correctness? If so, was this because of the social pressure of authority?
Or was it because Wason was right? Summarise what changes, if any,
took place in your thinking during the chapter's arguments, and what
conclusions you draw now.
if we are allowed to not make it spin just a few times. A perfectly fair
coin, in the first sense, in your practised hand might not behave as
a perfectly fair coin in the second sense. There might be some causal
connection between one toss and the next, and so there would be a failure
of independence.
It is a short step from the concept of independence to the conclusion
that the gambler's fallacy is a fallacy. The reasoning behind the
gambler's fallacy is that there is dependence between different parts of
the sequence of events: if the early parts of the sequence have 'too many
heads', then the later parts will have to have more tails. Nevertheless, this
reasoning does have a certain grip. Although you are probably sophisticated
enough to see through the simple version of the fallacy, the same
failure of reasoning shows up when the situation gets more complicated.
Besides, unlike the conditional reasoning involved in the selection task,
you are likely to have had at least some elementary statistical teaching
at school. If you are so sophisticated about probability that you don't see
any of this fallacious behaviour as plausible, and maybe don't even believe
that it happens in experiments, then you should perhaps remember
that these were matters of active debate amongst Europe's finest mathematicians
until well into the seventeenth century. The debate arose in the context of
gambling games, and Pascal was one of the thinkers associated with the
first deep understanding of these problems. If you are still sceptical that
human beings find probability hard to reason about, you may be amused
that Amos Tversky demonstrated that professional statisticians still suffer
from the same illusion if the problems are a bit harder and you catch
them on their day off.
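The independence point can be checked directly. The following simulation is our illustration, not the book's (the seed and trial counts are arbitrary): it looks at what a fair coin does immediately after a run of three heads. If the gambler's fallacy were right, tails would be 'due' and heads would come up well under half the time.

```python
# Simulate a fair coin and count what follows a run of three heads.
# Because tosses are independent, the next toss is still heads about
# half the time, however 'overdue' tails may feel.
import random

random.seed(0)

def toss():
    return random.random() < 0.5  # True = heads

runs_of_three = 0
heads_after_run = 0
history = [toss(), toss(), toss()]
for _ in range(200_000):
    nxt = toss()
    if all(history):              # the last three tosses were all heads
        runs_of_three += 1
        if nxt:
            heads_after_run += 1
    history = history[1:] + [nxt]

# If the fallacy were right, this ratio would be well below 0.5.
print(round(heads_after_run / runs_of_three, 2))
```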
An example of a slightly harder problem is the question which we
left hanging about the likelihood of different sequences of ten tosses. It
is quite compelling that ten heads in a row seems freakish. People
would say 'Wow! 3 heads in a row', but they don't say 'Wow! 3 heads,
followed by a tail, followed by 4 heads, followed by 3 tails!'. Why not?
After all, if we were having to place money on the exact sequence of ten
tosses ahead of time, would you bet more money on the latter sequence
than on 10 straight heads? Your answer is probably no, though there is
evidence that in the general population there are those who still regard
the jumbled sequence as more probable. But there are more people who
can see the fallacy from the perspective of a question about a bet on a
future determinate sequence than from the way the question was put
initially.
Why should this be? Why should one determinate sequence seem
more probable than another? Before going into this question, pause
and observe that we are in a situation analogous to the one we encountered
in the earlier discussion of conditional reasoning. We observe in the
population that people make systematic errors of reasoning. Subjects can
see, at least when shown through different ways of conceptualising the
problem, that these responses are in error. The psychologist's perspective
on this situation is that the task is to build a descriptive theory about
how people do in fact reason, an explanatory theory about why, and
maybe a pedagogical theory about how they can learn, or be taught, to
avoid these errors. The statistician's (or probability theorist's) perspective,
analogous to the logician's in the previous chapter, is rather to seek
a normative theory about how they should reason.
As the perceptive reader will have noticed in the previous discussion,
there are two sorts of questions which we can ask about coin-tossing
events. One is about the probability of specific sequences of events. The
other is about characteristics of unordered populations of events. Many of
the problems people have seem to stem from confusing these two kinds of
question. So, for example, there is a population of 10-member sequences
of coin tosses. This population contains 2¹⁰ (i.e. 1024) sequences, of
which 10 heads in a row is one, 10 tails another, and all the others are
mixed sequences of heads and tails. As we have just argued, if the coin is
fair, each of these sequences is exactly as likely as each of the others. But
sequences of the kind which have five heads and five tails (in any order)
are a great deal commoner than the unique sequences of 10 heads, or
of 10 tails (because all reorderings of these are identical). If we plotted
the number of sequences containing N heads (from 0 to 10) against N,
we would find a bell curve: 1 with 10 heads, 10 with 1 tail, . . . increasing
to a peak at 5 heads and decreasing symmetrically from 5 heads to 0
heads. So muddling these two kinds of question might well be at
the bottom of people's confusions.
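The counting claims in this paragraph are easy to verify. A few lines of Python (our check, not the book's) list the number of 10-toss sequences containing N heads; the counts are the binomial coefficients, which sum to 2¹⁰ and peak at N = 5:

```python
# Every particular 10-toss sequence is equally likely, but the *number*
# of sequences with N heads follows the binomial coefficients C(10, N),
# peaking at N = 5.
from math import comb

counts = [comb(10, n) for n in range(11)]
print(counts)        # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
print(sum(counts))   # 1024 = 2**10 sequences in all
print(comb(10, 5))   # 252 ways to get five heads and five tails
```

So a 'five heads, five tails in some order' outcome is 252 times commoner than the single all-heads sequence, even though each individual sequence has probability 1/1024.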
So why does H T H H T T T H T H seem more likely than ten
heads? A related approach to this question is to ask whether you have
ever seen a coin yield such a sequence of tosses. Of course the answer
is likely to be 'I don't have a clue'. Ask the same question about the
sequence of 10 heads, and the answer is likely to be 'definitely not; if
I had I would remember'. The sequence of 10 heads has a significance
subjects may not understand the question the way the experimenter
intends. Indeed, you may feel that the argument against your judgement
has something of the flavour of the Queen of Hearts complaining about
Alice: "If you haven't had any sugar it's odd asking for more." One could
have sympathy with Alice's argument that if she's had none then it's very
easy to have more, though impossible to have less. This line of argument
suggests that subjects interpret option 8 to mean a jazz player and not
an accountant. It is not clear how much of the phenomenon is due to
this kind of misinterpretation. It is clear from 2 and 3 that professionals
can have hobbies and hobbyists can have professions, though it might be
thought that listing them as multiple choices inclines us to treat them
as mutually exclusive. It has been argued that these errors happen much
less when people reason in situations they are highly familiar with. But
that is not an argument against Tversky's claim, which is that people
are highly susceptible to these errors in certain kinds of rather abstract
contexts, and these rather abstract contexts are common in our culture.
Consider what you wrote in explanation of your judgement (supposing
you agreed with most of the subjects in the original experiment). It
usually goes something like this: "The description is a dead ringer for
an accountant, especially the boring bit. It's not part of my stereotype
that an accountant would choose jazz for a hobby, but maybe hobbies
are rather randomly chosen, so it's not too unlikely. On the other hand,
jazz playing on its own really runs counter to the description, so the 8 is
very improbable."
What seems to happen is that adding the information that Bill is an
accountant to the information that he plays jazz for a hobby makes
it easier to find a plausible scenario which fits all the data. The
information about playing jazz still depresses the judged probability of
2 relative to 1, Bill simply being an accountant (as we would expect on
both scenario and probability grounds). But what we fail to notice is
that adding information to a prediction (2 relative to 8) always makes
it less probable. So what seems to be our natural way of assessing these
things, constructing a scenario and judging its plausibility, gets us into
trouble with the probabilities.
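The probabilistic point here is the conjunction rule: a conjunction can never be more probable than either of its conjuncts, because it describes a subset of the cases. The sketch below is ours, with invented numbers, not data from the experiment:

```python
# The conjunction rule with made-up numbers: whatever probabilities we
# assign, P(accountant and jazz player) can never exceed P(accountant)
# alone, because the conjunction picks out a subset of the cases.
import random

random.seed(1)
# A hypothetical population: each person is a pair
# (is_accountant, plays_jazz), drawn independently here for simplicity.
population = [(random.random() < 0.3, random.random() < 0.1)
              for _ in range(100_000)]

p_accountant = sum(a for a, j in population) / len(population)
p_both = sum(a and j for a, j in population) / len(population)

# True for any population whatsoever, not just this simulated one.
print(p_both <= p_accountant)
```

No choice of base rates, and no correlation between profession and hobby, can make the printed comparison come out false; that is exactly what subjects' scenario-based judgements violate.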
Something like this filling-out of scenarios seems to be what makes
horoscopes work. Adding more fairly vague information enables us to
find a scenario from our current circumstances which fits (just as adding
the fact about accountancy to the jazz player makes it possible to
a 'slow virus' or a strange rogue protein called a 'prion'. BSE, CJD and
scrapie all have long incubation periods, from several years to 30.
The question of some public concern is whether BSE is transmissible
to human beings as CJD. There has been a significant increase in CJD in
the last few years. Although the disease remains very rare, it has roughly
doubled from about 25 to about 50 cases per year throughout the UK.
However, this may be because of increased attention leading to better
diagnosis, a common epidemiological phenomenon. The absolute rate of
CJD is still lower in the UK than in some other countries which do
not have BSE and are not thought to practise cannibalism, for example
Austria.
A quite different theory about the cause of BSE is that it is due to
organophosphorous insecticides, similar to the ones which have given
concern about nerve damage to agricultural workers who use sheep dips
and to soldiers in the Gulf War. The Ministry of Agriculture, Fisheries
and Food (MAFF) ran a campaign in the eighties to eradicate the warble
fly from Britain, which involved heavy use of these compounds, which
are known to cause nerve damage not wholly dissimilar to BSE. It has
been shown that organically farmed beef herds do not suffer from BSE.
But such herds neither fed offal to cows when it was permitted, nor used
organophosphorous compounds, so they do not help distinguish these
two theories. On the other hand, no mechanism is known for the
transmission of organophosphorous poisoning from cow to calf, and
there is fairly strong evidence that BSE is transmissible.
Here is a scientific problem of some depth, but also a major political
and economic one. If BSE is transmissible to humans through eating beef
products, and the incubation period varies between 5 and 30 years,
then we should not expect to be seeing much evidence yet of CJD resulting
from humans eating BSE-infected beef products. Reasoning about
causal processes which are so slow is known to be especially problematical
for human beings, and politicians in particular. I have presented
the evidence at this length because it is a case which acutely illustrates
some of the differences between political and scientific reasoning. A case
which affects us all, and which is currently unresolved, makes it harder
for us to pretend to ourselves that there are easy resolutions to these
issues.
Let us concentrate on the evidence from new cases of CJD in the
UK population. One of the puzzles of the epidemiology is that there is
sufficiently low, then even two cases nationally could be extremely suggestive.
It would not in itself establish that BSE was the agent of transmission.
Perhaps abattoir workers were exposed to organophosphorous
insecticides from the cattle they processed in the 1980s. But it would
suggest that abattoir workers' doings would be a good place to look for
the cause of CJD. Of course it is also possible that no more abattoir
workers will be affected, but that may be because transmission of
BSE to humans requires some other kind of contact to cause CJD, like
eating nervous tissue.
I have given this rather extensive summary of the science because it
is important to see how complex base-rate information is embedded in a
real context, how complex the required background information is, and
how hard it is to assess evidence without a well-understood mechanism.
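To see why a low base rate can make even two cases 'extremely suggestive', consider a back-of-envelope calculation. The numbers below are our hypothetical assumptions, not real epidemiological figures: a base rate of 1 case per million people per year, and a workforce of 50,000 observed for 10 years, using a Poisson approximation for rare events.

```python
# With rare events, the chance of k or more cases in a group follows a
# Poisson distribution with mean = rate * person-years observed.
# All figures here are hypothetical, for illustration only.
from math import exp

rate_per_person_year = 1e-6          # assumed national base rate
person_years = 50_000 * 10           # assumed workforce x years observed
lam = rate_per_person_year * person_years   # expected cases: 0.5

def poisson_p_at_least(k, lam):
    """P(X >= k) for a Poisson(lam) variable, via the complement."""
    p, term = 0.0, exp(-lam)
    for i in range(k):
        p += term
        term *= lam / (i + 1)
    return 1 - p

print(round(lam, 2))                         # expected number of cases
print(round(poisson_p_at_least(2, lam), 3))  # chance of 2+ cases by luck
```

On these assumed numbers fewer than one case is expected, and two or more cases would arise by chance less than a tenth of the time; that is the sense in which two cases in one occupation could point somewhere without proving anything.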
Against this background, let us look at what happens when politics
meets science. Late in 1995, a minister from MAFF (the Ministry of
Ag. and Fish, as it was known) announced that there was no conceivable
possibility of a link between BSE and CJD. He made this announcement
on the basis of advice that he received from the Committee on Infectious
Diseases, a group of scientific experts on epidemiology. The first reaction
of any scientist (and now I mean anyone who understands the
relation between scientific theory and evidence, rather than someone
expert on the interspecific transmission of cattle diseases) reading such
a statement must be that the minister has not understood, or wilfully
misunderstands, the advice. Science has repeatedly demonstrated links
between phenomena which were considered 'inconceivable'. Science does
not remove the conceivability of such causal links. The link was highly
conceivable, scientifically, in 1995. The dominant theory of the origin of
BSE in cattle was that it is a bovine form of a sheep disease. It was
highly conceivable that this disease might pass from cattle to humans,
even if the biological distance is much greater than from sheep to cattle.
On the other hand, there is considerable evidence that runs against
this theory at present. There was even an alternative theory, having to do
with organophosphorous compounds. And the scientists concerned might
well acknowledge that it is perfectly conceivable that neither theory is
correct: some other agent or agents could be at work.
One might take the view that the politician is simply a liar. The
government took a great risk in changing the regulations over feeding
offal to cattle. It then dragged its heels over reintroducing regulations to
control the practice. In fact these regulations were not properly policed, and the
announcement just quoted came with a re-reintroduction of procedures
to control the treatment of offal. The government has a strong motive for
believing, and having us believe, that BSE and CJD are unconnected.
Whether this particular politician is lying or not, our interest in the case
hangs on the interpretation of scientific reasoning and its comparison
with reasoning about policy. What should the politician say about the
evidence to hand?
One frequent complaint by politicians about scientific information
is that it is so technical that no one but an expert can understand it, so it
must be interpreted for the public. What the public wants to be told,
or so the politician argues, is whether to eat beef or not. Scientists are
willing to entertain possibilities so inconceivable to the rest of us that
the politician is quite justified in stating that a link between CJD and
BSE is inconceivable. Joe Public, the politician argues, knows that life
is a risky business with which he or she must get on, and the scientists'
luxury of focusing on the highly improbable is not an available, or even
safe, option. Viewed from the perspective of someone who has to act,
focusing too much on ill-understood possibilities can lead one to succumb
to the blindingly probable.
This complaint deserves serious attention. Several of its premisses at
least are true. Science, at a particular stage of development, frequently
makes no judgement on an issue. The issues science remains silent on
are frequently the life-and-death issues on which we all do have to make
decisions every day. In fact, one of the defining features of scientific
theories is that they choose their own problems, rather than having the
practicalities decide for them what phenomena they shall encompass.
This is not to say that applicability plays no part in what problems scientists
spend their time on. CJD was a minor epidemiological backwater,
perhaps preserved in the textbooks by the frisson of cannibalism, until
BSE emerged. But the theoretical core of the biology that underpins this
study is immensely agnostic about many of the everyday particularities
with which we all contend.
Here is perhaps one of the greatest contrasts between the humanities
and the scientific disciplines. The humanities (and in many ways the social
sciences are in this respect more like humanities than natural sciences)
must address the issues of the day. Not every issue in every detail: the
humanities do not have to have a position on CJD and BSE. But they
28, having grown from 3 cases identified in 1995. This new variant is
believed to be the disease transmitted by eating beef products, and shows
rather different symptoms from what is now known as 'sporadic' CJD,
the disease as it was known before the BSE outbreak. The experts are
now much more confident that prions are the mechanism of infection.
But the expert estimates of the future extent of the epidemic still range
from the hundreds to hundreds of thousands of deaths over the next
half century, such is the indeterminacy of our knowledge of the average
length of incubation of this disease. The 'good' news, as far as the size
of the epidemic goes, seems to be that new variant CJD afflicts much
younger people than sporadic CJD. This means at least some incubation
times must be shorter. If mean incubation times are shorter, then the
epidemic may be at the smaller estimates and be already waning. Human
understanding (particularly perhaps human political understanding) is
ill-adapted to phenomena that operate on this kind of time-scale. We
are animals evolved for responding to dangers that unfold rather more
quickly.
We have spent a long time on BSE in order to illustrate some relations
between what we know, what evidence we have, and what we
communicate. What we know is dependent on what discourse we are
engaged in. What is all too easily conceivable to the scientist may be
inconceivable to the politician. Learning some cognitive science means
asking this question about the relations between theoretical knowledge
and practical utterance. Taking an example that is rather topical for at
least one readership should bring home the far from academic nature of
these issues. BSE may seem a parochial UK illustration. A good homework
exercise is to write an analogous treatment of the arguments about
global warming, a somewhat less parochial case which nevertheless illustrates
many of the same points. The causal processes are even slower
and still more obscure in their totality. The question of what it is politically
reasonable to do, at each stage of development of the scientific
evidence, is even more acute.
A curious property of the political debate about global warming
is the focus on whether the agreed global increase in temperatures
is the result of natural or artificial causes. Those who argue against
early action, before the scientific evidence is unassailable, appear to have
in mind the moral argument that if it's 'not our fault', then we need
not try to do anything about it. Of course their argument might be
that there is nothing we can do about it, but if their evidence is so
strong that we're already doomed, then it would be kind of them to
reveal that evidence. Besides, the argument that we should wait for
unassailable scientific evidence often comes from those whose concept
of unassailability doesn't even accept the evidence for evolution as
unassailable. The same constituency is only too happy to point out how
political action frequently has to be based on hunch rather than proof.
Something seems awry. The issues of vested interests are perhaps even
clearer than they were with BSE.
The politician and the scientist make utterances from different evidential
positions, for different purposes and with different constraints
Figure 4.1
The Prisoner's Paradox solved by a tree-diagram of all possible situations.
Figure 4.2
The Prisoner's Paradox solved by a `roulette wheel' representation.
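The reasoning the two figures summarise can also be checked by brute simulation. The sketch below is a hypothetical Monte Carlo rendering (the function name and trial count are our own, and we assume the standard statement of the paradox): one of the prisoners A, B and C is to be released, the jailer, who never names A and never names the prisoner to be released, tells A one of the others who will be executed, and we estimate the conditional probabilities given that he says B.

```python
import random

def three_prisoners(trials=100_000):
    """Estimate the conditional probabilities behind the Prisoner's Paradox.

    One of A, B, C is chosen at random to be released; the other two
    will be executed.  The jailer tells A one of B or C who will be
    executed.  We ask: given the jailer says B, who is released?
    """
    says_b = 0
    a_released_given_b = 0
    c_released_given_b = 0
    for _ in range(trials):
        released = random.choice("ABC")
        if released == "C":
            named = "B"               # jailer must name B
        elif released == "B":
            named = "C"               # jailer must name C
        else:                         # A released: jailer picks B or C at random
            named = random.choice("BC")
        if named == "B":
            says_b += 1
            a_released_given_b += released == "A"
            c_released_given_b += released == "C"
    return a_released_given_b / says_b, c_released_given_b / says_b

p_a, p_c = three_prisoners()
print("P(A released | jailer says B) is approx", round(p_a, 2))
print("P(C released | jailer says B) is approx", round(p_c, 2))
```

With the 1/3 branch weights of the tree-diagram, the estimates come out near 1/3 for A and 2/3 for C: the jailer's announcement tells A nothing about his own fate, but a great deal about C's.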
your behaviour with that of your friend on the bus. Suppose that your
priorities are to remain together. You would both prefer to get off and
have a drink with your acquaintance, but the worst outcome for you
both is that one gets off and the other doesn't. The bus doors open and
you must independently decide whether to get off.
In this example, there is an object-level of knowledge about the
message transmitted by the third person: you both get the message from
the party outside the bus. But there is also critical meta-knowledge
about who knows what about who has this object-level knowledge. In
order to decide what to do I need to know whether you received the
message. But a moment's thought is enough to see that I also need to
know whether you know that I received the message, and so on through
indefinitely many levels of meta-knowledge. If we succeed in making
eye-contact we can perhaps establish our mutual knowledge and
both get off the bus.
One hallmark of human communication is that it often succeeds (or
fails) by the sender of the signal getting the receiver of the signal to
realise that the sender intended to communicate by the signal, and in-
tended the intention to be recognised as such by the receiver. Not all
human communication has this characteristic: we may unintentionally
communicate things (perhaps giving the game away), but much deliberate
communication has this character.
Mutuality of knowledge is also important for understanding phatic
communication. Many phatic social rituals have the characteristic that
not only shall they be done but they shall be seen to be done, and
this because their function is to ensure mutual knowledge. Such genera-
tion of mutual knowledge is important in generating mutual trust. The
importance of eye-contact in individual-to-individual communication is
often to be understood in terms of establishing mutuality of knowledge
through joint attention. Many apparently purposeless or downright ir-
rational human activities seem to require explanation in these general
terms of the attainment of mutual knowledge. In the next section we
will take a look at what a situation where communication is prevented
can tell us about communication, specifically about how communicative
conventions arise and are maintained in communities.
both will receive a sentence of say 7 years. If one confesses, and the
other doesn't, he turns state's evidence and gets only 1 year, but the
other then gets the maximum sentence of 20 years. If neither confesses,
the police have insufficient evidence so they both get off with no penalty.
Here is a communication problem with a vengeance, except no
communication is allowed. The prisoners may wish they could resort to
telepathy . . . The payoff matrix of this game is what is interesting
about it. Games with a simple payoff matrix in which players just
compete with each other, and the more one wins, the less the other gets,
are called zero sum games. The sum of rewards for all players at the
end of the game is always the same, whatever strategy they pursue, and
by choosing suitable units of reward and punishment, their sum can be
normalised to zero (hence the name). Chess is a zero-sum game: winner
takes all and draws split the points equally. The Prisoner's Dilemma is,
in contrast, a non-zero sum game. The punishments have different
totals according to which plays the players make, with a range from 0
if they both keep quiet, through 14 if they both talk, to 21 if one keeps
quiet and the other talks.
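The payoff structure just described can be set out as a small table. The sketch below (the play names and representation are our own choices) checks that the totals do indeed range over 0, 14 and 21 rather than being constant across cells, as they would be in a zero-sum game.

```python
# Years of prison for (row player, column player); "quiet" = don't confess.
penalties = {
    ("quiet",   "quiet"):   (0, 0),    # insufficient evidence: both go free
    ("quiet",   "confess"): (20, 1),   # the confessor turns state's evidence
    ("confess", "quiet"):   (1, 20),
    ("confess", "confess"): (7, 7),    # both receive the intermediate sentence
}

# In a zero-sum game every cell would have the same total; here the
# totals range from 0 through 14 to 21, so the game is non-zero sum.
for plays, (mine, yours) in penalties.items():
    print(plays, "total punishment:", mine + yours)
```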
The Prisoner's Dilemma is an extremely important abstract game
for understanding social coordination. One reason is that it clearly
shows how different ways of defining `rationality' and their different
ways of counting benefits and losses have different consequences. If we
define benefits and losses in individual terms and ask what strategy is
rational to maximise individual gain and minimise individual loss, then
we get one answer; if we define benefits and losses communally and ask
what strategy will maximise communal gain and minimise communal
losses, then we get another answer. This relation between individual
and communal concepts is one of the fundamental issues in the social
sciences. Modern economics is overwhelmingly based on the behaviour
of a theoretical species Homo economicus who seeks only to maximise
individual benefit. There are arguments that in fact Homo sapiens wisely
does not fit the theoretical model well at all, except in some limited
but ill-understood circumstances. The Prisoner's Dilemma is important
because it shows how Homo economicus loses out.
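One way to make the individual/communal contrast mechanical is to compare a cautious individual rule (minimise your own worst case, often called maximin; the choice of this particular rule is our illustration, not the chapter's) with the communal rule of minimising total punishment. With the sentence numbers used above, the two rules recommend different plays:

```python
penalties = {
    ("quiet",   "quiet"):   (0, 0),
    ("quiet",   "confess"): (20, 1),
    ("confess", "quiet"):   (1, 20),
    ("confess", "confess"): (7, 7),
}

def worst_case(my_play):
    """My longest possible sentence if I commit to my_play."""
    return max(penalties[(my_play, other)][0] for other in ("quiet", "confess"))

# The cautious individual (maximin) choice: minimise the worst case.
safe = min(["quiet", "confess"], key=worst_case)
print("individually safe play:", safe, "- worst case", worst_case(safe), "years")

# The communal choice: minimise the total years served.
communal = min(penalties, key=lambda plays: sum(penalties[plays]))
print("communal optimum:", communal)
```

The cautious individual confesses (risking at most 7 years rather than 20), while the communal optimum is mutual silence: exactly the gap between the two definitions of `rationality' at issue.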
Life is a non-zero sum game. Many activities are neither wholly
cooperative nor wholly competitive, but an interesting mixture of the
two. As the police are well aware, members of the criminal fraternity are
quite good at coming out in the optimal cell of the payoff matrix (which is
4.4 Summary
Starting out from the unpromising consideration of sequences of tosses
of a coin, we have seen how people's reasoning in uncertainty connects to
4.5 References
Kahneman, D., Slovic, P. & Tversky, A. (eds.) (1982). Judgment under
Uncertainty: Heuristics and Biases. Cambridge: Cambridge University
Press.
Classical collection of papers on reasoning in uncertainty.
Schelling, T. (1960). The Strategy of Conflict. Oxford University Press.
One of the earliest applications of game theory to understanding social
behaviour, and of game-theoretic ideas to issues about convention and
negotiation.
Lewis, D. (1969). Convention: A Philosophical Study. Cambridge, MA:
Harvard University Press.
Develops ideas from game theory to explain linguistic conventions. Now
classical philosophical account of both the arbitrariness and determina-
tion of conventions.
Chwe, M. (2001). Rational Ritual: Culture, Coordination and Common
Knowledge. Princeton University Press.
Nice discussion of the role of social ritual in establishing mutual knowl-
edge.
Exercises
Exercise 4.1: Review your notes on your own answers that you gave
to the questions asked throughout this chapter. What have you learned
by reading it? Review the similarities and differences between this area
of reasoning and that reviewed in the previous chapter.
5.1 Introduction
In Chapter 2 the limitations of Shannon and Weaver's finite codes led us
to consider the consequences for theories of communication of assuming
that languages are essentially infinite. Infinite languages can only be
studied as systems of representation and inferences over them, i.e. in
terms of computation. Chapters 3 and 4 argued the need for logical,
probabilistic and game-theoretic models (among others) as normative
standards of reasoning embodying such infinite systems. But we saw
that there is generally a plurality of possible standards for a naturally
occurring task. Interpretation of each new context into the terms of one
or other model is a major component of communication. Interpretation,
we saw, is guided both by the content of messages and by the context
of their utterance. Deriving an interpretation and reasoning from it are
processes which iterate. If our reasoning from our interpretation leads to
contradiction or implausible conclusions, then changing interpretation is
a natural response. So understanding the mind is about understanding
how content, context and formal systems interact.
Although we have seen how it may be controversial just what is the
right `gold standard' of reasoning in any particular context, nevertheless
some systematisation of reasoning is required even if only to bring order
into descriptions of what people do. Combining the empirical observation
of people's behaviour with mathematical and computational modeling
of standards of performance is at the heart of cognitive science. This
Part of the book provides an introduction to some of these systems
for modeling behaviour and examines the kinds of issues that arise in
fitting them to the data of human behaviour. This chapter introduces the
system of propositional logic and examines the tensions involved in using
it to model the behaviour of conditional reasoning. The next chapter
introduces the notion of computation as a quite general framework for
modeling behaviour. Chapter 6 introduces issues that arise in applying
this computational approach to understanding human behaviour.
Cognitive science is all about the tension between formal modeling
on the one hand and contentful behaviour and experience on the other.
Formality is about being free of content. The modern mind is
form. We reviewed this argument and concluded that it was valid but
that at least one of its premisses was false: the two rules were of different
forms. What is more, the descriptive rule has multiple possible interpre-
tations varying on a number of dimensions (biconditional/conditional,
constant/variable anaphor, brittle/robust to exceptions, etc. etc.). We
observed that the content of problems cues people to the likely form
intended. For just one example of many, with a conjunctive rule, it
appeared that subjects were influenced by the already visible falsity of a
descriptive interpretation of a rule to adopt a deontic interpretation as
more likely to be what the experimenter had in mind.
This discussion leant heavily on unexamined notions of content and
logical form. These are two ruling concepts in logical and computational
theories. We now seek some theoretical apparatus to elaborate exactly
what we mean.
5.2 Logic
Some basic concepts
The most fundamental distinction in logic is between truth and
validity. Logic studies relations between sets of premisses, which
are generally sentences, and conclusions, which are further sentences.
Validity is defined in terms of truth: an argument from a set of premisses
to a conclusion is valid if, whenever the set of premisses are all true,
then the conclusion is true. Logic is about guaranteeing that if we
start reasoning from truths, we won't get to false conclusions: valid
patterns of argument preserve truth, but they do not necessarily have
true conclusions. False premisses plugged into a valid argument may
yield either true or false conclusions. Garbage in leads to the possibility
of garbage out.
Arguments are vehicles for getting us to new conclusions whose truth
we don't yet know. But if any one of the premisses is false, then all
bets are off, even if the conclusion is true. If even one interpretation of
the premisses can be found in which the premisses are all true but the
conclusion is false, then the argument is invalid.
In these statements, the two words whenever and interpretation
are critical to understanding what logic is all about. Logic understands
languages as having two levels: syntactic and semantic. At a
analysing them into terms for things, properties, and relations. Other
more powerful logics (such as predicate logic) subsume PC but also
analyse structure within clauses and sentences. So sentences are the
contentful part of PC and the connectives are what defines the form.
The only way the contentful sentences figure in the logic is as sen-
tential variables: P, Q, R, . . . . These symbols only constrain content in a
very weak way: within an interpretation, the same letter always stands
for the same content.
A syntactic definition of PC is usually given by what is called a
recursive definition. First a finite vocabulary of connectives is listed:
this case, 2^1 = 2.
This particular connective ¬ reverses the truth value of the simpler
sentence it is prefaced to. The logical name for this operation is nega-
tion. In English, the rather stilted operator it is not the case that . . .
has a similar syntax and semantics. It can be placed on the front of any
indicative sentence, and it reverses the truth value. Correspondences to
other ways of expressing negation in English are not quite so straight-
forward. For example the sentence All the circles don't have crosses
in is ambiguous, and its two meanings depend on whether don't is in-
terpreted as if it is not the case that had been appended to the front
of the sentence (and the don't dropped), or whether the negative only
applies to the `verb phrase' have crosses in. So ¬ models some aspects
of negation in English, but by no means all.
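The recursive character of this definition, and the table semantics of negation, can be sketched as a small recursive data type (the class names are illustrative, not part of any standard library):

```python
from dataclasses import dataclass

# A PC formula is either an atomic sentential variable, or built from
# simpler formulas by a connective -- a recursive definition.
@dataclass
class Var:
    name: str
    def eval(self, row):          # row: an assignment of truth values
        return row[self.name]

@dataclass
class Not:
    sub: "Var | Not"
    def eval(self, row):          # negation reverses the truth value
        return not self.sub.eval(row)

p = Var("P")
print(Not(p).eval({"P": True}))        # False
print(Not(Not(p)).eval({"P": True}))   # True: double negation restores
```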
The tables for the other connectives each have two sentential vari-
ables, and so look like this:
P  Q     P ∧ Q     P ∨ Q     P → Q
T  T       T         T         T
T  F       F         T         F
F  T       F         T         T
F  F       F         F         T
The usual nearest correspondences to familiar English connectives
will be given presently, but first we look at what was done here purely
formally. The method of truth tabling allows you to decide for any
argument whether or not its conclusion is true whenever its assumptions
are true, that is to say whether the argument is valid. The central
concept is that of a row as one interpretation of the sentential variables:
a row defines a little world by stating some things that are true and some
that are false in that world. Tabling then offers a method of generating
a row of the table for every possible assignment of truth values to variables,
and so cashes out our notion of all possible assignments (interpretations)
mentioned above.
We use this method to assess the validity of an argument formalised
in PC. Let's take the example argument which has a single premiss
P ∧ Q and the conclusion ¬(¬Q). A table with all possible combinations
of values for the atomic sentential variables (P and Q), and with columns
for premiss and for conclusion then looks like this:
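Such a table can also be generated and checked mechanically. The sketch below is our own rendering (the helper name valid, and the representation of formulas as Python functions from rows to truth values, are assumptions of the sketch):

```python
from itertools import product

def valid(premisses, conclusion, variables):
    """Truth-table test: in every row (assignment) where all the
    premisses are true, the conclusion must be true as well."""
    for values in product([True, False], repeat=len(variables)):
        row = dict(zip(variables, values))
        if all(p(row) for p in premisses) and not conclusion(row):
            return False              # a counterexample row: invalid
    return True

# The example argument: premiss P AND Q, conclusion NOT (NOT Q).
premiss    = lambda row: row["P"] and row["Q"]
conclusion = lambda row: not (not row["Q"])

print(valid([premiss], conclusion, ["P", "Q"]))   # True: the argument is valid
```

The loop over product([True, False], repeat=N) is exactly the 2^N rows of the table, one little world per row.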
examination of the table showed that in all the rows where the pre-
misses were all true, the conclusion was also true. That would mean
that the inference from premisses to conclusion was semantically valid.
Imagine further that a given set of inference rules provided us with no
way of deriving the conclusion from the premisses. Then that set of
rules would be incomplete. Incompleteness is a certain sort of inade-
quacy of a rule system to capture all the semantic truths of the system.
Sometimes incompleteness is simply due to a missing rule or rules. But
sometimes, more fundamentally, no set of rules can actually capture all
the consequences; in other words, there may be fundamental limits to
formalisation of semantic consequences.
PC is in fact complete. Much of logic is about the relation between
the semantic facts which define a logic, and various proof theories which
capture all or part of these facts. The proof theory gives ways
of computing semantic facts, but as we shall see, not everything is
computable.
Finally, decidability is a third crucial metalogical property. A
system is decidable if there is an algorithm (a mechanically applicable
method guaranteed to reach a solution in finitely many steps) which can
always prove or disprove any target conclusion from any set of premisses.
PC is in fact decidable. But most logical systems are undecidable. One
proof of PC's decidability is based on the truth table method. It isn't
hard to see from the description above that any argument that can be
stated in PC can be truth tabled in a table of 2^N rows, where N is the
number of distinct atomic sentential variables. Although this number
gets very large very fast, it is always finite. The basic reason most
logical systems aren't decidable is easy to state informally. Whereas PC
proofs only depend on the number of atomic variables in their statement,
most logics analyse sentences into terms, and their semantics is about
the things those terms denote. Whereas the number of atoms in the
assumptions and conclusions is only ever finite, even short sentences
may be about infinitely many things. `Even numbers greater than two
are not prime' is such a sentence.
Truth-tabling won't work as a decision procedure for these logics
because an infinitely large table would be required. Sometimes there
are other methods that work. Sometimes it is possible to prove that
there is no method which will work. In the latter case, such systems are
undecidable.
scope for possibly strange beliefs, especially when we want to talk about
social practices like witch-craft and the law.
Before we turn to computing, we need to do two things. First, to look
at the correspondences between English and PC which have been so as-
siduously avoided in order to give the advantage of an uncluttered view.
Then we turn to applying some of the logical concepts just introduced to
arguments in context to see how reasoning and interpretation interact.
careful. Not being usable to tell lies may not seem a strong qualification
for truth, but it is a qualification adequate for PC.
PC rejects the need for any contentful connection between the mean-
ings of the clauses of implications but guarantees validity despite that.
This offends some logicians and psychologists. They argue that people
assert conditionals because they know there is some connection between
antecedent and consequent, and they want a logical analysis to capture
that connection. There is no dispute that people do indeed often have
this reason for asserting conditionals, but there is a good question as
to whether this is to be modeled in logic. Interestingly, the same people
usually do not protest about vee as a model of or, yet (¬P) ∨ Q is log-
ically equivalent to P → Q. People can use or to express implications
such as Either the switch is up OR the light is off. They do this because
they know of connections between these meanings. The evidence we have
for a sentence is not necessarily part of what it means, though it may
be an important part of what we communicate in using the sentence.
These are subtle but important distinctions. We return to them when
we discuss the pragmatics of natural language in Part IV.
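The claimed equivalence can itself be settled by the tabling method: define the material conditional by its table (false only in the row with a true antecedent and false consequent) and check that it agrees with (¬P) ∨ Q in all four rows. A sketch:

```python
from itertools import product

def implies(p, q):
    """The material conditional, defined directly from its truth table:
    false only when the antecedent is true and the consequent false."""
    return not (p and not q)

# (NOT P) OR Q and P -> Q agree in every one of the four rows,
# so the two forms are logically equivalent.
for p, q in product([True, False], repeat=2):
    assert ((not p) or q) == implies(p, q)
print("equivalent in all rows")
```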
It is also clear that people do sometimes venture to assert condi-
tionals merely on the grounds of their knowledge of the falsity of the
antecedent: If Margaret Thatcher is a Trotskyite, I'll eat my hat. Neither
Margaret's secret leftward leanings nor their absence is causally (or in
any other way) related to the speaker's garment-hunger. The speaker
exploits the obvious falsity of hat-eating to indirectly assert the falsity
of the antecedent. The material conditional analysis fits this case rather
well. If we develop this line of argument we can see that there are many
things going on besides the bare logic in determining why people assert
things in communication, but the best theoretical approach may not be
to try and build them into the logic. In short, the correspondences be-
tween PC and English connectives are complex but not as hopeless as
might at first appear. As with any mathematical model used in science,
fitting the model to the natural phenomenon (here the logic to natural
language use) is a complex business.
The paradoxes of material implication are closely related to some
of the issues of interpretation of Wason's rule described in Chapter 3.
If one interprets the `if ... then' of the rule as the material conditional
defined here in PC, then the truth of the conditional for the four cards is
sufficient to establish the truth of the rule.1 Similarly, the falsity of the
rule with regard to a single card is sufficient to establish the falsity of the
whole rule. Material implication is what we called brittle with regard to
exceptions. The law-like conditionals described in Chapter 3 (All ravens
are black, if the switch is down the light is on, etc.) cannot be completely
captured by material implication. So Wason's competence model of the
conditional is the material conditional, and from a logical point of view
this is strange. It is doubly strange that such a basic model should be
adopted by psychologists who were so dubious about the relevance of
logical models of reasoning to theories of human reasoning.
This much logic can also be used to throw light on the central
issue raised in Chapter 3 about the relation between interpretation and
reasoning. We saw there that there was continual interaction between
interpretation and reasoning, and we described subjects' reasoning as
mostly engaged in making sense of conflicting pieces of information about
the many possible intentions of the experimenter. We also observed
psychologists arguing that reasoning in logic proceeds only with regard
to the form of the premisses.
Now that we have some logical machinery we can see how to encom-
pass the first phenomenon and also what is wrong with the second. Logic
is made up of two parts: assignment of interpretations to sentences, and
reasoning from premisses by inference rules. Within an argument inter-
pretation must remain fixed, but in the larger view, reasoners cycle be-
tween assigning an interpretation and reasoning from it. If they come to
a conclusion that is implausible they may change the interpretation and
start reasoning again. This was our general model of what was going on
in the selection task, and the psychologists' claim that logic always pro-
ceeds mechanically in virtue of form was a misunderstanding of logic
engendered by only looking at one half of it.
Let's illustrate further by examining another argument that psychol-
ogists have made about how people do not obey logical rules. The prob-
lem is this: given if the switch is down (P), the light is on (Q) and the
switch is down (P), people naturally conclude the light is on (Q). But
then given an extra premiss, if the electricity is off, the light is off (R
→ ¬Q), they withdraw the conclusion that the light is on. How are we
1 Strictly speaking one would need predicate calculus to encode Wason's rule because
it is about four cards, but predicate calculus has the same conditional connective and
so we need not introduce its extra complexities here.
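It is worth seeing that in PC itself the conclusion does not get withdrawn: classical consequence is monotonic, so adding the extra premiss leaves Q derivable. A truth-table check (a sketch; the encoding of the premisses as Python functions is our own assumption):

```python
from itertools import product

def valid(premisses, conclusion, variables):
    """Truth-table validity test, as described in this chapter."""
    for values in product([True, False], repeat=len(variables)):
        row = dict(zip(variables, values))
        if all(p(row) for p in premisses) and not conclusion(row):
            return False
    return True

implies = lambda p, q: (not p) or q
rule       = lambda r: implies(r["P"], r["Q"])       # if switch down, light on
fact       = lambda r: r["P"]                        # the switch is down
extra_rule = lambda r: implies(r["R"], not r["Q"])   # if power off, light off

# Classically Q follows, and it still follows after adding the extra
# premiss: adding premisses never removes conclusions (monotonicity).
print(valid([rule, fact], lambda r: r["Q"], ["P", "Q"]))                   # True
print(valid([rule, fact, extra_rule], lambda r: r["Q"], ["P", "Q", "R"]))  # True
```

So the withdrawal that people naturally perform is not something PC itself can model; that is the tension between the formal system and the observed behaviour.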
5.4 Summary
In this chapter we introduced just enough logical apparatus to give an
idea of a very simple formal reasoning system|propositional calculus.
We distinguished its syntactic specication from its semantic interpre-
tation. Reasoning from premisses in a formal system proceeds entirely
in terms of the forms of the squiggles in the system and the rules which
manipulate those squiggles. However, reasoning is only useful in virtue
of having an interpretation for the squiggles, and the rules for infer-
ence have to respect the nature of the interpretation. Any given set
of squiggles can support many interpretations, just as bits of natural
language have many interpretations. So form interacts with content and
context through the interactions of interpretation and reasoning. Within
a bout of reasoning interpretation must be fixed, but between bouts it
can change.
5.5 Reading
Hodges, W. (19??) Logic. Harmondsworth: Penguin.
A highly readable introduction to the basics.
Hofstadter, D. R. (1980). Gödel, Escher, Bach: An Eternal Golden Braid.
Harmondsworth: Penguin Books.
A highly imaginative and entertaining meditation on the early 20C
discoveries of metalogic.
Stenning, K. & Monaghan, P. (in press) Strategies and knowledge
representation. In: Sternberg, R. & Leighton, J.P. (Eds.) The Nature
of Reasoning. Cambridge University Press.
An introduction to the relation between logic and the psychology of
reasoning which stresses that deduction comes to the fore in cognition
when interpretation (of our own or others' language) is at stake.
van Lambalgen & Stenning (in press) Semantics and cognition: logical
foundations for the psychology of reasoning. MIT Press.
Exercises
Exercise 5.1: Is p → q logically equivalent to (p ∨ ¬q)? That is, do they
have exactly the same truth values as each other under all assignments?
Offer an argument in terms of truth tables and one in terms of rules.
Exercise 5.2: The same question as Exercise 5.1 for the pair of sentences
(¬p ∨ ¬q) and (¬p) → q
Exercise 5.3: The following is an argument:
"The elements of the moral argument on the status of unborn life
strongly favor the conclusion that this unborn segment of humanity has
a right not to be killed, at least. Without laying out all the evidence here,
it is fair to conclude from medicine that the humanity of the life growing
in a mother's womb is undeniable and, in itself, a powerful reason for
treating the unborn with respect."
Here is an argument by analogy that the first argument relies on an
equivocation:
"The humanity of the patient's appendix is medically undeniable.
Therefore, the appendix has a right to life and should not be surgically
removed."
Does the first argument rely on equivocation? If so, can the equivo-
cation be repaired?
The example was borrowed, with full acknowledgement, from:
http://gncurtis.home.texas.net/equivoqu.html
revealed.
Attempts to formalise and mechanise human reasoning are older than
Aristotle's logic. If argument can be formalised, then disagreements due
to errors of reasoning and calculation can be avoided. Attempts to engi-
neer machines for reasoning also have a long history, and the interaction
between the mechanical developments and the mathematical theory is
an extremely interesting case for understanding the relation between
pure and applied science and engineering. Pascal was responsible for an
adding machine in the 17C. Babbage in the 19C designed, and partially
constructed, a mechanical machine for reasoning.
The late nineteenth century was a time of great theoretical mathe-
matical advance. Logic had not changed in its formalisation much since
Aristotle, but in this period suddenly took great mathematical strides
in the work of the German mathematician Frege and others. In this con-
text, Hilbert, another German mathematician, conceived the program of
mechanising the whole body of mathematical reasoning. Hilbert's pro-
gram was entirely theoretical. When Hilbert talked of mechanisation he
meant the demonstration that mathematical problems could be solved
by formally dened rules which required no human intuition for their
application. The connection between this idea and machine execution is
close, but Hilbert was a mathematician, not an engineer. Mathematical
logic and metamathematics, the study of the properties of logical
systems, had reached a point where this age-old dream (or nightmare,
according to taste) was suddenly a plausible research program. Hilbert's
program was taken seriously by many of the great mathematicians of
the day.
In 1931, another German mathematician, Gödel, proved a strange
theorem of metamathematics which stopped Hilbert's program in its
tracks. Gödel proved what has come to be known as the incomplete-
ness of arithmetic: any formal system sufficiently expressive to represent
symbols and print others. And it can move the tape one square at a time
in either direction. It can reach a state called halt, perhaps signified
by printing a certain symbol on the tape. The head does these things
according to completely specified rules. An example rule would be: If
the square beneath the head has an X, then erase it, print a Y, and move
one step left. A given machine is defined by the set of rules that the head
operates by. In the simplest case, only one rule ever applies at a time, so
there is no need for any mechanism which decides between options: the
whole process is completely deterministic.
It is not difficult to describe a very simple Turing machine for
adding two numbers. The problem is set up by representing the two
numbers on the tape|say by two blocks of 1s (simple tally notation)
separated by a blank. The machine then processes this starting tape
and eventually reaches the halt state. The state of the tape when it
halts is the answer to the problem. For an adding machine, all it has to
do in the tally representation system suggested here, is to move the 1s of
one number across the blank separating the two numbers in the initial
representation (by erasing from one space and moving and rewriting)
until it gets to the end of the number it is adding and halts. It will then
leave a string of 1s on adjacent squares which represents the answer
to the problem. Not the most exciting procedure, but as an abstract
mathematical device it is immensely productive.
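A sketch of such a machine follows (the state names, and the use of '_' for the blank, are our own conventions; the machine is a slightly more economical variant of the shuffling just described: it fills the separating blank with a 1 and then erases one 1 from the end):

```python
def turing_add(tape):
    """A tiny Turing machine that adds two tally numbers.

    The tape holds two blocks of 1s separated by one blank ('_'),
    e.g. '111_11' for 3 + 2.  Rules are keyed by (state, symbol) and
    give (symbol to write, head movement, next state).
    """
    rules = {
        ("right1", "1"): ("1", +1, "right1"),   # scan the first block
        ("right1", "_"): ("1", +1, "right2"),   # fill the separator with a 1
        ("right2", "1"): ("1", +1, "right2"),   # scan the second block
        ("right2", "_"): ("_", -1, "erase"),    # step back from the end
        ("erase",  "1"): ("_",  0, "halt"),     # erase one 1 and halt
    }
    cells, pos, state = list(tape), 0, "right1"
    while state != "halt":
        if pos == len(cells):                   # indefinitely extendable tape
            cells.append("_")
        write, move, state = rules[(state, cells[pos])]
        cells[pos] = write
        pos += move
    return "".join(cells).strip("_")

print(turing_add("111_11"))   # 11111, i.e. 3 + 2 = 5 in tally notation
```

When the machine reaches halt, the tape holds a single block of 1s representing the sum, just as the text describes.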
Starting from this notion of a machine, Turing asked what could such
machines compute? Turing approached this problem not by constructing
machines piecemeal to compute sums, products, square roots, . . . some
sample of interesting mathematical functions. He saw that he could
construct a universal machine. That is, he could specify a machine
which could have any other machine's head encoded on its tape as data,
along with the data for the encoded machine's particular problem. The
universal machine could then interpret the encoded machine and mimic
its operations on its encoded data.
To simplify matters, Turing supposed that the tape was indefinitely
extendable: whenever the head was about to move off the tape, another
square would be magically added to it. Such a machine is called an
infinite Turing machine. This extension is important because for many
problems, the machine's `workings' may take up much more tape than
either the statement of the problem or the answer. Turing's universal
machine could be shown to be able to compute any function that could
dendrites sticking out of the cell body, and one long fibre called
an axon. The dendrites and axons make connections with other cells'
dendrites, and axons connect to other neurons' bodies, at synapses.
Tens of millions of neurons in the human brain have thousands of mil-
lions of synapses with other neurons. Some of the brain's computations
are relatively well understood at some levels, particularly the computa-
tions involved in perceptual processes. This is because it is possible to
control the stimulus to an animal's sensory organs, and to record the
resulting computations in neurons in the active brain.
From such studies it is clear that neurons are active elements. Far
from being a computer with one active element and a large passive store,
the brain is a computer entirely composed of active elements, each of
which has a rather small passive store. The storage of information resides
primarily in the connections between neurons (the synapses) which are
adjustable in their resistance. The ensemble of neurons computes by
passing electrochemical impulses along axons and dendrites. All neurons
are actively ring at some frequency all of the time, though they go
through periods of more and less intense rates of activity. They adjust
their activity according to the intensity of the impulses they receive
at any given time from all their synaptic connections. Nothing could
be more dierent in physical design from a Turing machine. Turing in
fact did spend some time speculating and experimenting on biological
computation.
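The contrast drawn above, many simple active elements whose stored information lives in adjustable connection strengths, is often illustrated with the standard artificial-neuron abstraction. The sketch below is that textbook abstraction rather than anything proposed here: a unit sums its inputs, each scaled by an adjustable weight standing in for synaptic resistance, and becomes active when the total crosses a threshold. The particular weights and threshold are invented for the illustration.

```python
def fires(inputs, weights, threshold):
    """A crude abstraction of a neuron: sum the inputs, each scaled by
    an adjustable connection strength (the 'synapse'), and become
    active when the total exceeds a threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total > threshold

# Two incoming connections, one excitatory and one inhibitory.
weights = [0.9, -0.5]
print(fires([1.0, 0.0], weights, 0.5))  # True: excitation alone suffices
print(fires([1.0, 1.0], weights, 0.5))  # False: inhibition cancels it
```

Storage in such a model is exactly where the text locates it: not in any one unit, but in the weights on the connections.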
So if biological computation is so different from currently engineered
computation, why introduce the concept of computation in terms of
Turing's machines? The layman might jump to the conclusion that
science adopts Turing's model because artificial computers are so much
more powerful than the mere human brain. This should be scotched
immediately. The world's most powerful supercomputers are beaten
easily for many simple perceptual tasks by rodents' brains. Computer
designers are currently engaged in an enormous research effort into how
to control parallel computation: they would give their eye-teeth for
the brain's facility in massive parallelism. No, the reason is that cognitive
science is concerned first and foremost with the abstract concept of
computation, and the study of this concept is made much simpler by
Turing's abstractions. Cognitive science cannot entirely ignore issues of
implementation: how the physics of computational devices underwrite
the operations which are interpreted as the processing of information.
sound `dog' had been heard frequently around dogs but not around
candelabra or birthdays, then `dog' became conditioned to dogs and
evoked a suitable response when heard again.
This account of language is perhaps made more bizarre by compression,
but it is fairly bizarre even at full length. The hallmark of
language behaviour is that language often happens at a remove in both
time and space from the things it describes: dogs and candelabra. But
Chomsky's 1957 attack comes not from the need for this semantic
dislocation, but from the fact that sentences in language are not simply
made of words strung like beads on a string: they have hierarchical
structure. This is where Turing's automata come in. The theory of
abstract computing machines had developed by 1957 so that three main
powers of machine were known. By restricting the tape movement of
a Turing machine to a single direction, one gets a much less powerful
machine called a finite state machine. Such a machine literally has
no memory for what its previous computations have been, save for the
identity of the last rule applied and the current symbol under the head.
Chomsky showed that these machines provide a plausible model for what
behaviourists claimed about the structure of the human language
processor. Their lack of memory mimics the denial of mental representations.
He then went on to show why such a processor cannot compute some of
the most basic structures of human natural languages.
Finite state grammars, as automata for producing sentences, are
more usually and conveniently represented as node-and-link diagrams
than as uni-directional Turing machines. Nodes represent states. Links
are possible transitions from one state to another. One node represents
the start state and another the halt state. Each link is decorated by an
action which represents the output of a word. The machine's current
state is represented as a single activated node at any one time. The
automaton's lack of memory for the history of its computations is
re
ected in the fact that the only record accessible to the machine is
the identity of the current state. If there are two states from which the
current state can be reached, then there is no memory of which route
was taken, and therefore, if there is a choice of links out of the current
state, there is nothing on which to conditionalise which route should be
taken next.
Figure 6.1 represents a finite state machine for generating some
English sentences. To restate the rules for interpreting such a diagram:
Figure 6.1
A finite state machine
[Diagram: a start node and a stop node. Links labelled Etta lead out of
start along two paths, one passing through a link labelled chased and
the other through a link labelled caught, each ending with a link
labelled Pip into stop; a link labelled and leads from stop back to
start.]
start at start, and follow the arrows. If you have a choice, choose at will.
If you are at stop, you can finish. Each time you follow an arrow, write
down the word (if any) above the arrow. By following these rules, you
can produce any number of English sentences, consisting of one or more
occurrences of Etta chased Pip or Etta caught Pip, conjoined by and.
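The machine in Figure 6.1 can be encoded directly as a table of states and links and walked by a tiny program. The Python sketch below is our own rendering of the diagram; the middle state names s1 and s2 are invented labels, since the figure leaves those nodes unnamed. Note that the walker keeps nothing but its current state, exactly the lack of memory described above.

```python
import random

# A hypothetical encoding of the machine in Figure 6.1.  States are
# nodes; each link is a pair (target_state, word_to_output).
links = {
    "start": [("s1", "Etta")],
    "s1":    [("s2", "chased"), ("s2", "caught")],
    "s2":    [("stop", "Pip")],
    "stop":  [("start", "and")],   # looping back conjoins another clause
}

def generate(n_clauses=2):
    """Walk the links from start, choosing at will where there is a
    choice, and write down each word.  The walker's only record of its
    past is the current state: it has no memory of the route taken."""
    state, words, clauses = "start", [], 0
    while True:
        state, word = random.choice(links[state])
        words.append(word)
        if state == "stop":
            clauses += 1
            if clauses == n_clauses:
                return " ".join(words)

print(generate(2))  # e.g. "Etta chased Pip and Etta caught Pip"
```

Because the stop node links back to start via and, the finite table generates an unbounded set of sentences, one per number of conjoined clauses.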
It is fairly easy to show, as Chomsky did, that there is no way
of reflecting simple structures like relative clauses in such a device.
A sentence such as "The cat that chased the rat that ate the cheese
was a tabby" has an embedded structure: one sentence `the rat ate the
cheese' is embedded inside another `the cat chased the rat' which is in
turn embedded inside another `the cat was a tabby'. Any device that
can decode this hierarchical structure requires memory for the abstract
structure that these three sentences make. Such structure is omnipresent
in human languages. Ergo, the human processor is not a finite state
machine (or, equivalently, a Turing machine with a uni-directional tape).
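The hierarchical structure at issue can be made vivid by generating it with a recursive rule: each level of embedding must be held in memory until its continuation arrives, which is precisely what a device holding only its current state cannot do. The small Python sketch below is our own illustration, not Chomsky's formalism; it rebuilds the example sentence by recursion.

```python
def np(depth):
    """A noun phrase that may contain an embedded relative clause.
    Each level of recursion is one layer of hierarchical structure,
    and keeping track of it requires exactly the memory that a
    machine holding only its current state does not have."""
    nouns = ["the cat", "the rat", "the cheese"]
    if depth >= 2:
        return nouns[depth]        # innermost noun phrase: no embedding
    verbs = ["chased", "ate"]
    return f"{nouns[depth]} that {verbs[depth]} {np(depth + 1)}"

sentence = np(0) + " was a tabby"
print(sentence)
# the cat that chased the rat that ate the cheese was a tabby
```

The call stack here plays the role of the memory for abstract structure that the text says any adequate processor must have.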
Chomsky also made a further distinction between classes of automata
of different power. He distinguished context free from context
sensitive phrase structure grammars. Here it is a much more open
Exercises
Exercise 6.2: Explain why a Turing machine whose tape can only
move in one direction is equivalent to a finite state machine.
6.3 References
Weizenbaum, J. (1976) Computer Power and Human Reason: from
Judgement to Calculation. W. H. Freeman: San Francisco.
Still an excellent introduction to Turing machines and much else about
AI besides.
Chomsky, N. (1957) Syntactic Structures. Mouton: The Hague.
The original argument from the structure of natural language to the
necessity of abstract structures for understanding the mind.
Chomsky, N. (1959) Review of Skinner's `Verbal Behaviour'. Language,
35, 26-58.
A polemic that takes behaviourism at its word.
7.1 Introduction
Turing's definition of computation makes a clear distinction between
representations (the symbols on the tape) and operations on them (the
operations of the head). We have seen already that at the lowest level of
implementation in neural wetware, this distinction may not be nearly so
easy to apply to biological computers as to Turing's machines.
Nevertheless we still need the concept of representation, so it is important to
remember that representations are themselves theoretical abstractions. A
mental representation may not be a localised symbol or string of symbols
(in some alphabet) inscribed somewhere in the brain: it may be realised
by a complicated pattern of resistances and firings in a large number of
synapses. Nevertheless, it will have to have certain gross computational
properties, and these are what concern us most.
Our discussion of representation will be divided into two. The first
part will take the view from outside the black box and ask why we need to
posit internal representations on the basis of what the box can be seen
to do from the outside. This is one common psychological/linguistic
perspective. In the next chapter we take the synthetic perspective on
representations and look at what AI has learnt from the engineering
approach of building innards for black boxes.
representations are very different from the sentences on the page. But
they are still representations in an important sense to be explained. We
also have internal representations of regularities which we have learnt
over long periods of time (rather than on one occasion like the event
of reading). For example, a representation of the meaning of the word
travel, or the skill of riding a bicycle.
Internal and external representations interact in complicated ways.
It is dangerous to suppose that the internal ones merely mirror the
external ones; as we shall see, that merely leads to regress. We often can
only get at the internal representations (remember something) when we
are in the right external context. Success in the task then depends on
a combination of internal and external representations. Using external
representations (say pencil and paper) may change which internal
representations (say working and long-term memories) we use while doing a
task (say arithmetic). Similarly, cognitive processes may be distributed
over a group of individuals (say the crew of a ship), plus a lot of external
representations (say radar screens, charts etc.) in such a manner that
no one member has all the information needed to do the task (say
navigate). Nevertheless, we still cannot understand the processes involved in
the team's computations without postulating both internal and external
representations and their interactions.
Language was one of the domains used most powerfully to argue for
the need for internal representations. We now turn to some of the
problems encountered by positing internal representations. For this we shift
domains from language to another domain of mental representation:
mental imagery. Our discussion of computation so far has assumed that
the representations processed are drawn from `alphabets' of symbols.
But representation is a much broader issue, and comparing some other
styles of representation will help to broaden the basis of discussion.
The need for internal representations
Let us turn away now from language and look at another area of mental
representation: mental imagery. We do this to review another
contribution that Turing's understanding of computation makes to cognitive
science. Philosophers since Plato have written extensively on the nature
of mental imagery. Classical philosophers often used the metaphor of
a wax tablet receiving `impressions' from the sensory organs and
storing them away until memory recalled them. Plato used the beautiful
did not foresee being used for that purpose at the time of receiving it. We
need representations to do all these things. What better example of this
phenomenon to take (especially in a book on human communication)
than you reading this text now? Well, matters will be easier with a
somewhat simpler text, but text comprehension is one of the areas in
which these truths are most self-evident.
Consider the following paragraph:
Napoleon entered as the door opened. The commander strode across the room
to the replace. He stood in front of the ginger haired woman seated on the
sofa. The mud-spattered man addressed his immaculately dressed cousin . . .
How do you think you remember this short paragraph of forty words?
Is each word a stimulus evoking some response before you pass on to the
next? It's pretty obvious that the words are structured in a certain way
to create the message that they convey. Scramble the words within each
sentence and something quite different will result: either "word-salad"
or some other quite different message. If the units are not independent
words, what are they? And what are you doing as you read them?
Before we set about these questions, without looking back at what
you read, judge which of the following sentences occurred in the
paragraph.
1. Napoleon was mud-spattered from his travels
2. The mud-spattered man addressed his immaculately dressed cousin
3. The commander walked across the room to the fireplace
4. Napoleon addressed his immaculately dressed cousin
5. He stood in front of the woman with ginger hair seated on the sofa
6. As the door opened, Napoleon entered
7. The woman crossed the room from the fireplace
8. He stood in front of the ginger haired woman seated on the sofa
9. He stood in front of the ginger haired woman seated on the sofa to
the right of the fireplace
The sort of memory experiment we have just conducted is one of
the methods that psychologists have used extensively to analyse what
happens as we comprehend text. Sentences like 7 are generally easy
to reject: 7 is in direct conflict with the scenario described. All of
the other sentences are broadly consistent with the original, and your
the commander, and the mud-spattered man on the one hand, and
the woman and the cousin on the other. If you look back, there is no
explicit statement of these identities anywhere in the text. You happen
to bring with you to this reading task the knowledge that the most
likely Napoleon was a commander. You surmise that he has perhaps
just travelled and hence is mud-spattered. And there are no other
characters available to fill the descriptions in this fragment. You realise
that Napoleon is unlikely to be talking to himself and can't be his own
cousin, and there isn't anyone else around to be cousin other than the
woman. If you go back to the text, and you imagine some quite dierent
context for it, perhaps by adding a few sentences before the paragraph
starts, then you will have no difficulty constructing interpretations in
which Napoleon is not the same person as the commander, and perhaps
the cousin is actually a fourth character (or maybe the commander), all
without changing a word inside the original text.
What we cannot do is proceed with comprehending such a text
without assigning identities and non-identities. Mostly, with well-written
text, we do this without even noticing. Only occasionally do we make
a mistake, perhaps with hilarious results. An extreme example of such
problems is often encountered with the following story:
A father and his son were driving down the motorway late one night when they
were involved in a collision with another car. An ambulance rushed them to
hospital. The son was badly hurt though the father was only slightly injured.
The father was admitted to a ward. They took the son immediately to the
operating theatre, where the surgeon waited. The surgeon took one look at
the boy and said "I can't operate. That's my son."
You may well have heard this before, but if you haven't, the chances
are that you will entertain various hypotheses about stepfathers, or
family trees of amazing convolution. The surgeon was, of course, the
son's mother. Many people jump so firmly to the unwarranted inference
that the surgeon is male that they become quite unable to find any
consistent interpretation for the story. Going beyond the information
given is something that we have to do all the time in making sense of
the information we get, and occasionally we get royally stuck, betraying
our beliefs and prejudices in the process.
In understanding our Napoleon scenario we have to infer identities.
But we may go far beyond this in making the fragment meaningful. Old
movie freaks (on either reading) among you might well associate the
how this is achieved. Much of our knowledge about the `how' of
representation comes from engineering: from AI's attempts to engineer mental
functions. Combining the methods of psychology and AI (and a lot more
besides) is our only hope of understanding the mind. The next part of
the course looks at some of these techniques applied to understanding
the structure of natural language and how it is processed by human
beings.
Exercises
Exercise 7.1: Why are image-based theories of mental processes more
prone to regress arguments than language-based theories?
7.3 References
Pylyshyn, Z. (1973) What the mind's eye tells the mind's brain: A
critique of mental imagery. Psychological Bulletin 80(1), 1-24.
A modern example of a sceptical argument against the role of imagery
in mental processes.
Dennett, D. (1991) Consciousness Explained. The Penguin Press: London.
Dennett's concept of the Cartesian Theatre as an architecture for the
mind is another interesting discussion of related issues.
Bartlett, F. (1932) Remembering: a study in experimental and social
psychology. CUP: Cambridge.
The classical study that established that human memory is an active
construction on the basis of cultural knowledge.
Luria, A. R. (1968) The Mind of a Mnemonist. Basic Books: New York.
Fascinating insight into what normal human memory is not like, and how
8 Describing language
8.1 Preliminaries
We spend a great deal of time talking and listening, encountering tens
of thousands of words in a typical day. Our native languages are so
familiar and such an essential part of the environment we inhabit, that
we generally don't reflect on our use of language. Like breathing or
walking, it's only in rare circumstances that we have to make an effort,
perhaps when we are trying to express or understand a complex idea.
Every science needs to make assumptions about the things that it
studies. These assumptions tell us what kinds of thing we can (and can't)
expect to learn about our object of study (cf. Section 1.2). In the case
of physics for example, one has to assume that there is in fact a world
which is ordered by principles that can be uncovered by experiment and
argument. It's just the same with the study of language|we have to
have a set of ground rules that tell us what we're studying and what
methods are appropriate for conducting experiments. In this section,
we'll discuss some basic assumptions of linguistics.
Exercises
Exercise 8.1: Try out the sentences in (7) on other people and note
their reaction.
utters the same words, we will again take this to be a different utterance
of the same sentence.
A standard position in linguistics takes the view that we want to
abstract over different utterances of the same sentence, in order to
focus on the regularities that collections of such sentences present to
us. The assumption here is that speakers of a language have access to
a collection of rules which define the language that they speak (see also
Section 2.1 for discussion of an analogous distinction between system and
implementation). We can access these rules by tests of the kind shown
in examples (7) and (9). These examples are relatively clear-cut, and
one seems to get more or less unanimous judgements that one of each
of those sets of sentences has something wrong with it. We'll term these
judgements grammaticality judgements. They seem to diagnose
whether particular sentences are consistent with the rules speakers of
English have. We can then use such judgements in order to determine
what sentences our models of grammar should allow and which ones they
should rule out. To refer back to the discussion of Section 2.1, such an
approach may help us to uncover a single system underlying different
implementations. Consider the examples here:
(9) a. Which article did you file?
b. Which article did you file without reading?
c. *Which article did you file the letter without reading?
d. Which article did you file without reading the letter?
The * here is the notation used by linguists to reflect the judgement that
that sentence is odd. This fact remains, regardless of the accent in which the
sentence is pronounced, whether it is written or spoken and so on. If we
set up a model of language, we would want it to reflect such judgements.
So, our use of intuitions allows us to group sentences into those which
are grammatical and those which are not. To introduce some jargon,
we take such judgements to give us information about the underlying
grammars people have, their linguistic competence. We distinguish
this from the study of performance, that is, of particular sentences
spoken by people in actual situations. Again the claim is that there are
properties of sentences which hold regardless of the situations in which
they occur.
We've chosen our examples here quite carefully, and it's worth taking
a slightly wider perspective, to see how easy it is in general to apply the
criterion of grammaticality. Exercise 8.4 asks you to consider random
permutations of a sequence of words such as John ran up a big bill. One
such permutation is
(10) a up John bill big ran
This is clearly not something anybody would take to be a sentence of
English. You can probably find examples using those words which are
less clear-cut. Here's one with different words:
(11) These are the kinds of cars that when you leave them outside
you have to be careful that there's someone to look after them.
You may hear or produce such sentences. Taken in isolation, what's your
grammaticality judgement? Can you see what might be at issue as far
as the grammaticality of such a sentence is concerned?
There are other sources of data derived from intuitions, for example
about meaning:
Mary has a green bicycle implies Mary has a bicycle.
John jogs and John doesn't jog are contradictory.
Every student who worked hard got an A and If a student worked hard,
he or she got an A mean the same thing (or are paraphrases).
We also have intuitions about whether or not discourses "work" (or
cohere):
She left.
Mary knits jumpers. He's very good at it.
Every child has a dog. It's called Pip.
These are judgements of a different kind to those above. Each sentence
is well-formed, but there are difficulties in interpreting or differences in
interpretation according to the words involved.
One source of variation in judgements is that different dialects exploit
different rules. How do you react to the following two sentences?
(12) My shirt needs ironed.
(13) My shirt needs ironing.
Exercises
Exercise 8.3: Description vs. prescription
For each of the following statements, say whether it is a prescriptive
or a descriptive rule:
A sentence in English can consist of a name, such as John or Mary,
followed by a verb, such as walks, snores or vanished.
No word in English can begin with the sequence of letters ng.
Writing should be concise.
Don't start a sentence with a conjunction, such as and or but.
Avoid repeating words.
Exercise 8.4: Consider the following two sentences:
1. John ran up a big bill
2. John ran up a big hill
lived near the Black Sea about 6000 years ago. We'll consequently ignore
historical aspects of language in this course.
More relevant are the areas of phonetics, phonology and
morphology. In phonetics, one studies the vocal tract, how it's used
in the production of speech sounds and the resulting acoustic effects.
It's a phonetic fact about most dialects of English, for instance, that no
sounds are made by moving the tongue back so that it makes contact
with the pharynx (the back wall of the top of the throat). However, such
sounds are common-place in Arabic.
Phonology studies the patterning of sounds in language. Some
consequences of the phonology of English are that no word starts with the
sound written (in English orthography) "ng", and that there could not
be words such as sring or bnil.
Morphology looks at the way in which the form of words varies
according to the other words it combines with (put more technically,
according to its syntactic context). So, in most dialects of English, one
says: I am, You are, She is, I was . . . Here we see a particular word "be"
taking on different forms am, are, . . . as the subject (I, you, . . . ) varies.
"Be" is quite unusual in English in having so many different forms;
more typical are verbs such as "walk" (walk, walks, walked, walking).
We won't have time here to say anything much about these three
areas, and this is obviously an unfortunate gap: if we wanted to give a
complete account of the mechanisms by which people speak and hear, we
would obviously have to talk about how sequences of words are realized,
or how the acoustic signal is related to our perception of words. Similarly,
we would have to describe the relation between the dierent forms of a
word in the language in question. Although we'll brie
y touch on some
of these issues in the course of this book, we will not examine them in
detail.
8.6 Summary
We've argued that there are fundamental regularities about how words
in a given language can be grouped together to form sentences within an
`idealized' setting of language use, abstracting away from hesitations,
slip-ups, pauses, and false starts, all of which occur in spoken conversations.
These regularities are what speakers have to know in order to know
English. But how can we describe them? Certainly not by listing each
acceptable sentence of English individually. This isn't possible,
since the number of acceptable English sentences is unlimited. Instead,
we need to write down a finite number of rules which govern the way
words can be combined to form sentences. We also need an accompanying
set of rules which tell us how these sentences are interpreted; i.e., these
rules must tell us what these sentences mean. In the next chapter, we'll
begin to examine these linguistic regularities and suggest ways in which
rules can accurately express them.
1. Sag, Ivan A., and Tom Wasow (1999) Syntactic Theory: A Formal
Introduction, CSLI Publications.
Levinson (1993) gives an overview of the various areas in pragmatics:
1. Levinson, Steven (1993) Pragmatics, Cambridge University Press.
And Davis (1991) is an edited collection of some of the seminal, original
papers in pragmatics, including papers by Grice and Searle, whose work
we will examine in Chapters 16 and 17:
1. Davis, Steven (1991) (ed.) Pragmatics: A Reader, Oxford University
Press.
Eq → Ex = Ex
Ex → ( Ex Op Ex )
Ex → ( N Op N )
N → 1, 2, 3, ..., 9
Op → +, -, ×, /
Figure 9.1
A set of rules for generating arithmetic expressions
[Tree diagram: an Eq node dominating Ex = Ex; each Ex dominates
( N Op N ), with the leaves spelling out ( 7 + 3 ) and ( 2 × 4 ).]
Figure 9.2
Structure associated with the equation (7 + 3) = (2 × 4).
1: Eq: Ex = Ex, with the expressions ( 7 + 3 ) = ( 2 × 4 )
2: Eq: 10 = 8
3: false
Figure 9.3
Computing the truth or falsity of an equation
steps.) As you see, by following those rules we end up with the symbol
false, indicating that the equation is not true.
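The same two-part division of labour, structural rules that parse an equation and computational rules that reduce it to true or false, can be sketched as a small program. The following Python fragment is our own reconstruction, with its own assumptions: it uses * for the multiplication sign written × in the figures, and its tokeniser accepts multi-digit numerals where the rules in Figure 9.1 list only the numerals 1 to 9.

```python
import operator
import re

# Computational rules: each Op symbol is paired with an arithmetic
# operation over numbers.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(tokens):
    """Structural rules: an Ex is either '(' Ex Op Ex ')' (where the
    inner Ex may itself be '(' N Op N ')') or a bare numeral N.
    Parse one Ex, consuming tokens from the front, and compute it."""
    tok = tokens.pop(0)
    if tok == "(":
        left = evaluate(tokens)
        op = OPS[tokens.pop(0)]
        right = evaluate(tokens)
        assert tokens.pop(0) == ")", "ill-formed expression"
        return op(left, right)
    return int(tok)          # the N rule

def equation_truth(text):
    """Decide an equation Ex = Ex, returning True or False."""
    tokens = re.findall(r"\d+|[()+\-*/=]", text)
    left = evaluate(tokens)
    assert tokens.pop(0) == "=", "ill-formed equation"
    right = evaluate(tokens)
    return left == right

print(equation_truth("(7 + 3) = (2 * 4)"))   # False, as in Figure 9.3
print(equation_truth("(7 + 3) = (2 * 5)"))   # True
```

The parse mirrors step 1 of Figure 9.3, the arithmetic mirrors step 2, and the final comparison mirrors step 3.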
Some observations and conclusions
First of all, notice that there are two parts of the system of rules above
which deal with different kinds of things. First, there are rules dealing
with the structure of expressions, how they are organized into larger
units, and ultimately how they correspond to marks on paper. Second,
there are computations over numbers, and ultimately whether equations
between expressions are true or false. We can make this point more
strongly by pointing out that we could have used the following as rules:
(4) N → one, two, three, . . . Op → plus, times, . . .
which would allow us to produce the equation:
(5) (seven plus three) = (two times four)
In this case, we'd also need to say that one corresponds with the number
1, times with multiplication, and so on.
Another point to note about this example is that we could
consistently change the symbols in any way, and not alter what the system
does. We could say, for example, that a different symbol, instead of ×,
refers to multiplication, or we could replace the symbol Eq by XXX
throughout. Furthermore, this system operates purely in terms of symbols. To
introduce some terminology, this system is formal in the following
sense: it is explicit about the rules involved and there is no guesswork
involved in applying the rules. In other words, all that matters is the
form of an equation or expression (rather, say, than any interpretation
a human might come up with).
Finally, notice that we can
define well-formedness for unlimited collections of expressions,
define interpretations for elements of that collection and
distinguish between the well-formedness and the truth of an equation.
Exercises
Exercise 9.1: Write out all of the steps involved in Figure 9.3.
Exercise 9.2: Draw the tree for example 2 and compute the truth of
the equation.
Exercise 9.3: Can you satisfy yourself that, for any sequence of symbols
drawn from those that can appear in equations, you can
determine whether the sequence is well-formed and
apply the rules above to end up with an equation between two
numbers?
In this case, you should ignore the operation of division. What problems
might you encounter if division is allowed?
From the perspective of the course, one function of the grammar is
to get you used to manipulating systems of this kind. Such grammars are
formal in the sense defined above (Section 9.1). As an analogy, consider
the following statements:
sentence means in virtue of the words it contains and how they're put
together. This assumption helps us to abstract away from how such
sentences are interpreted in particular contexts (and so is another part
of our "laboratory conditions"; see p. 186).
So, a sentence like (7) has a literal meaning to do with being a fiend
and with coming from a particular place:
(7) Etta is a stick-crazed fiend from hell
A person who utters (7) may, on the other hand, intend to convey
something different from this literal meaning; for example, among other
things, that Etta is a nuisance (because she likes sticks so much).
Working out this intended meaning depends, however, on working out
the literal meaning, and so there is still some value in having a rigorous
and systematic way of analysing literal meaning.
Some other examples involving non-literal interpretations are shown
here:
(8) Did you forget the door?
(9) David Beckham literally took off down the left wing.
While (8) is `literally' a question, it can be used to convey a request
to shut the door (or open the door, depending on the context in which
it's uttered). The person who utters (8) doesn't want an answer to his
question about forgetting a door (whatever that means), but rather he
wants the action of closing the door to be performed. We will examine
this kind of `non-literal' use of language in Chapter 17. Sentence (9)
exhibits another way in which what a person means may be different
from `literal' meaning. The soccer player David Beckham is unlikely to
be flying during a soccer match, and so despite the claim for literalness
made in the sentence, we have to interpret it non-literally. From here
onwards, the word meaning will refer to literal meaning, unless otherwise
indicated. Later on, in Chapters 16 and 17, we'll return to the issue of
how literal and non-literal meanings might be related.
Representing literal meanings
Let's simplify things a little, and assume that we're dealing with
conversations between a couple of individuals, A and B. Suppose that A
informs B of something by saying Pip barked. If B believes what A says,
Figure 9.4
Is this what Pip barked means?
then they now have something in common: they both believe the propo-
sition which amounts to the meaning of the sentence. Perhaps what they
have in mind is something like the picture in Figure 9.4.
That's fine as far as it goes, but it presents us with a few problems. For
a start, it's not obvious how we can interpret pictures to say what their
meaning is. As discussed in section 7.1, there can't be a person inside
our head interpreting such a picture. Furthermore, there's a lot about
that picture which is irrelevant to the sentence in question. Why is the
dog facing right rather than left? What are all those bones doing there?
In sum, as a representation of the meaning of the sentence, the picture
doesn't seem to capture what is constant across different utterances and
seems to include a lot which is not implied by the sentence on its own.
It's also the case that language provides resources for combining
meanings. How would you draw a picture of a sentence such as Pip
barked, and then Etta chased sticks? Is it a single picture? If so, how
is the sequence of events to be indicated? If not, how are the pictures
related?
We therefore need a way of thinking about meaning which better
reflects the meaning of sentences. One very influential proposal is the
following:
(10) You can know the literal meaning of a sentence by knowing
the conditions under which it is (or would be) true.
So we need a way of representing such conditions, and a way of deter-
mining whether those conditions are true.
What we are going to do is adopt a specific proposal about the
representation of meaning (due independently to Hans Kamp and Irene
Heim), namely that we represent literal meanings in terms of the
individuals under discussion and conditions which hold of and between
those individuals. This work extends earlier work in semantics by Alfred
Tarski and Richard Montague, the extensions being explicitly designed
to handle anaphora.
Why the emphasis on individuals? Clearly, language can be used to
talk about groups (e.g. lottery winners), amorphous masses (e.g. piles
of sugar or sand), stretches of time (three to five is when I can make it)
and many other things which are perhaps difficult to construe in terms
of single entities, well distinguished from others. But for the sake of
simplicity, we will ignore the technical complexities that are necessary
for describing groups. Instead, we will focus on modelling anaphoric
reference for simple cases involving individuals, so as to demonstrate
how the meanings of words interact with the context in which they are
used (see Section 10.2). For example, consider (11):
(11) Etta's in the park. She likes chasing sticks.
We can think of the first sentence Etta's in the park as providing a
context in which the second she likes chasing sticks is interpreted. If we
were to omit the first sentence, you wouldn't be able to tell who she
refers to. We can take this to indicate that a word like Etta introduces
an individual into the conversation, and a word like she refers to an
individual that the speaker assumes has already been introduced. We
will want our grammar for English to reflect these facts.
x
dog(x)
bark(x)
Figure 9.5
An example of a Discourse Representation Structure (DRS)
"discourse referents"
"conditions"
Figure 9.6
Terminology to do with DRSs
the current discourse. Below the line you see the conditions that hold
of those individuals. These are called Discourse Representation Structures
because they aim to be able to represent the meaning of sequences
of sentences as well as that of individual sentences.
More formally, the definition of Discourse Representation Structures
or DRSs is given as follows:
Atomic DRS conditions are things like bark(x) (which can be read as
"x barks"), love(x,y) (which can be read as "x loves y") and so on. The
so-called conditional DRS condition given in item 2 above is used to
handle English sentences containing the words if and every, as we will
see in Chapter 10.
The above definition says that DRSs consist of two sets; the box-style
notation of a DRS given in Figure 9.6 is simply a way of showing the two
sets graphically. Using this box-style notation will make it easier to see
how constructing DRSs as semantic representations of simple English
discourses proceeds, a matter we return to in Chapter 10.
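Since a DRS is just a pair of sets, it is straightforward to model directly. Here is a minimal sketch in Python (the class name and representation choices are ours, not the book's notation), mimicking the box notation with referents above a line and conditions below it:

```python
class DRS:
    """A minimal sketch of a DRS as two sets: discourse referents
    and conditions (a predicate applied to referents)."""
    def __init__(self, referents, conditions):
        self.referents = set(referents)        # e.g. {"x"}
        self.conditions = set(conditions)      # e.g. {("dog", "x"), ("bark", "x")}

    def __repr__(self):
        top = " ".join(sorted(self.referents))
        bottom = "\n".join(f"{pred}({','.join(args)})"
                           for pred, *args in sorted(self.conditions))
        return f"{top}\n{'-' * 12}\n{bottom}"  # mimic the box notation

# The DRS of Figure 9.5, for "a dog barked":
drs = DRS({"x"}, {("dog", "x"), ("bark", "x")})
print(drs)
```

Printing the object draws an ASCII approximation of the box diagram.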
We now need to say something about how we interpret these kinds
of diagram. We can make a start on this by giving rules under which we
say whether or not such a diagram is true.
Models and truth
In discussing the literal meaning of sentences, we used the term "true"
in Section 9.3, and this clearly requires a definition. One way of being
true is to correspond to how the "real world" is. So the following is true:
(12) This sentence was typed on a computer in Buccleuch Place
around 12 noon on a Thursday.
There are obvious philosophical problems in talking about the "real
world" (whatever that might mean) and what is true within it. We are
going to side-step these by relating meanings not directly to the "real
world", but rather by relating meanings to collections of facts which
might, but which don't have to, correspond to the way the world is.
We could think of such collections as representing (a small part of)
e m p
dog(p)
Pip(p)
bark(p)
dog(e)
dog(m)
chase(p, m)
Figure 9.7
An example model
from the model. Second, we have severed any explicit link between "the
real world" and our collection of facts. As discussed above, this neatly
sidesteps some philosophical issues. It also allows us a way of cashing out
the proposal on p203 that the literal meaning of a sentence corresponds
to the conditions under which it is true. With the mechanisms to be
developed in Chapter 10, we'll be able to go from a sequence of words
to the corresponding DRS, and we can then evaluate that DRS against
possible models to see whether or not the DRS adequately captures the
literal meaning of the sentence.
We've set up our models so that they are as similar as possible to
DRSs. We draw them differently, so that we're reminded that they're
not exactly the same kind of thing. There are two respects in which
models and DRSs behave differently. First, as we'll see below, we'll need
to make the assumption that we can identify two referents in a DRS.
In the models, we'll assume that all the individuals are distinct. In the
case of Figure 9.7, then, there are three distinct individuals e, m and p. We can
now think of the referents in a DRS as referring to individuals in
the model. A second difference is that facts are the only things that can
occur within a model. In a DRS, we will use not just atomic conditions,
but also "implications" (see Definition 9.1 and also Section 10.2). You
may have noticed another minor difference: we generally use letters such
as x, y, z to stand as referents in a DRS, while we use other letters, for
example e, m, p, to represent individuals from a model.
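To make the notion of truth concrete before the formal definitions: a DRS with only atomic conditions is true in a model when its referents can be mapped to the model's individuals so that every condition becomes a fact of the model. Here is a rough operational sketch (our own code, not the book's formal definition):

```python
from itertools import product

def true_in_model(referents, conditions, individuals, facts):
    """Search for an embedding: a mapping from DRS referents to model
    individuals under which every condition is a fact of the model."""
    refs = sorted(referents)
    for values in product(sorted(individuals), repeat=len(refs)):
        g = dict(zip(refs, values))
        if all((pred, *[g[a] for a in args]) in facts
               for pred, *args in conditions):
            return True
    return False

# The model of Figure 9.7: individuals e, m, p and their facts.
individuals = {"e", "m", "p"}
facts = {("dog", "p"), ("Pip", "p"), ("bark", "p"),
         ("dog", "e"), ("dog", "m"), ("chase", "p", "m")}

# The DRS for "a dog barked": referent x with dog(x), bark(x).
print(true_in_model({"x"}, [("dog", "x"), ("bark", "x")],
                    individuals, facts))   # True: map x to p
```

Note that nothing forces two referents to map to distinct individuals; that matches the discussion of DRS referents below.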
For those readers who would like to see the formal definitions of
models and truth of a DRS with respect to a model, here they are:
Exercises
Exercise 9.6: Consider the model shown in Figure 9.7. Of the following
DRSs, which are true with respect to that model?
1. Referents: x. Conditions: Mac(x), bark(x).
2. Referents: y. Conditions: Pip(y), bark(y).
3. Referents: z, w. Conditions: Pip(z), chase(z, w).
4. Referents: w, z. Conditions: Pip(z), chase(z, w), Mac(w).
5. Referents: y. Conditions: Pip(y), dog(y).
In all cases, be explicit about how you came to your decision. To what
sentence of English might the last DRS correspond?
Exercise 9.7: Suppose you have a DRS just like Figure 9.5, except that
the order of the conditions is reversed (i.e. the one to do with barking
appears above the one to do with Pip). Can there be any models with
respect to which the example shown in the text is true, but the reversed
version is false (or vice versa)? Explain your reasoning.
sentences which are true under different sets of conditions: Either Etta
chased Pip or Pip chased Etta.
As a final word of warning, we haven't said anything about models
and consistency. Later on in the course, we'll be interested in the
difference between she and he (and her and him). With a few exceptions,
the former is used of female animals and humans, and the latter of males.
We'll want to say that in a discourse like
(13) John kissed Mary. She kicked him.
she has to refer to Mary because Mary is the only female mentioned.
However, nothing we've said rules out either of the following as models:
(14) j
John(j)
female(j)

(15) j
John(j)
female(j)
male(j)
x
aaaa(x)
bbbb(x)
Figure 9.8
An alternative semantic representation of a dog barked
that for the sentence a dog barked we'll get a DRS as shown in Figure 9.8.
Now, this DRS isn't true with respect to the model in Figure 9.7,
because we've broken the connection between the symbol that we used in
representing the meaning of the word and the symbol that appears in the
model. If we were to make the same substitution in the model, the DRS
would be true with respect to the new model. This emphasises that, as
with the situation with symbols in grammar rules, the choice of symbols
is unimportant: they have no content of their own. What is important
is their role in the system as a whole. In Section 10.1, we give one reason
why such an approach is justifiable. It does, however, leave you with
the practical question "what convention should I adopt for choosing the
symbols which will be part of the semantic representation of particular
words?" The convention we use is that described in Section 10.1.
We've seen how we can approximate some aspects of the meaning
of sentences in natural language, that is, part of the internal aspect of
language. We'll now turn to methods for characterising external aspects.
Exercise 9.9: How far does this analogy go? Are there ways in which
the performance of a piece of music is not like speaking?
Det → a, the
N → dog, vet, cat
With these rules, we can now produce (or generate) a variety of
sequences, including all of the ones discussed in the paragraph above
and some others as well. We've also replaced the symbol X with the
symbol NP for noun phrase. We use this term because we're here
dealing with a phrase (i.e. something perhaps involving more than
one word) that includes a noun.
Compare the first and the last sentences of example 16. It suggests
that the sequences the vet and Etta must have something in common.
In particular, if the first is of category NP, so must the second be. We'll
add the following to our growing list of rules.
(19) NP → PN
(20) VP → V1 NP
Our rules will now allow us to analyse sequences such as a dog chased a
cat or the vet loved Mary.
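The rewriting process these rules describe can be sketched as a small random generator. Only some of the rules and word lists appear in the text above; the PN, V0 and V1 entries below are our own assumptions for illustration:

```python
import random

# Phrase-structure rules as a dictionary: category -> list of expansions.
# Several word lists (PN, V0, V1) are assumed, not taken from the text.
RULES = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["PN"]],
    "VP":  [["V0"], ["V1", "NP"]],
    "Det": [["a"], ["the"]],
    "N":   [["dog"], ["vet"], ["cat"]],
    "PN":  [["Etta"], ["Pip"], ["Mary"]],
    "V0":  [["barked"], ["slept"]],
    "V1":  [["chased"], ["loved"]],
}

def generate(symbol):
    """Rewrite a category until only words remain."""
    if symbol not in RULES:            # a word: nothing left to expand
        return [symbol]
    expansion = random.choice(RULES[symbol])
    return [word for part in expansion for word in generate(part)]

print(" ".join(generate("S")))   # e.g. "a dog chased the vet"
```

Each call picks one expansion per category, so repeated calls produce the variety of sequences discussed above.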
Exercise
Exercise 9.10: How many sentences do the syntactic rules allow?
S
NP VP
PN V0
Pip barked
Figure 9.9
An example syntactic structure
real trees (even though we draw them upside down), and in part that of
family trees. The labels that we saw in rules appear at the nodes of a
tree. Nodes are linked by branches. The top-most node of a tree is its
root. When a node appears above another, linked to it by a branch, the
first is the mother of the second, and the second a daughter of the
first. The relationship between a mother and its daughter(s) essentially
expresses the information that there's a rule in the grammar where the
mother is on the LHS and the daughters on the right (in the order
they appear in the tree). As we mentioned in Section 9.1, viewing the
grammar rules as a device for generating all well-formed sentences of the
language is a legitimate alternative way of thinking about the grammar
system. And under this view, the relationship between a mother and her
daughters amounts to the relationship of rewriting or expansion.
Nodes which share a mother are sisters.
Exercises
Exercise 9.11: Draw trees for the following sentences:
1. Etta loved Pip
2. the vet caught Etta
3. the vet slept
Exercise 9.12: On the basis of the rules given above, say what
categories the underlined elements in the following sentences have to be:
1. Peter bit the vet
2. the vet hated John
You may have come across the terms subject, predicate and
object. For our purposes, we can define them in terms of the grammar
rules we're using. The subject of a sentence is the NP that is sister to
VP, and VP is the predicate. Objects are sisters to V1. An analysis of
9.6 Recursion
We'll now introduce a rule which shares an interesting property with the
second rule in Figure 9.1, and also with the rules discussed in Section 5.2.
In this rule,
(23) S → if S S
we can see that the same symbol S appears on both the left- and
righthand sides of the rule. What are the consequences of this? In terms
of sequences of words that the grammar generates, we can now produce
examples like:
(24) if Etta chased a bird Etta caught the bird.
Thinking more generally, you should be able to see that our grammar
now shares with human languages the property that there is no limit to
the length of sentences we can analyse. To repeat some jargon introduced
earlier, we are dealing with a recursive definition: what may constitute
a sentence involving if is itself defined in terms of sentences. Taking
the grammar as discussed so far, we can make an unlimited number of
sentences, for example if Pip caught Etta Etta barked or if John loved
Mary (then) if Etta caught a bird (then) Pip barked. (Adding the word
then sometimes helps to make such sentences clearer).
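The unboundedness the rule S → if S S creates can be seen in a few lines of code. This sketch recurses only on one of the two S positions and fixes the example clauses, purely for readability (the clause wording is ours):

```python
# The recursive rule S -> if S S means sentences can nest without limit.
def nested(n):
    """Build a sentence with n applications of the if-rule."""
    if n == 0:
        return "Pip barked"                          # a base sentence
    return f"if Etta chased a bird {nested(n - 1)}"  # wrap in another if

print(nested(0))   # Pip barked
print(nested(2))   # if Etta chased a bird if Etta chased a bird Pip barked
```

Since n can be any number, no finite list of rules about sentence length could replace the recursive rule.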
Exercise
Exercise 9.13: The syntactic rule for conditionals given here allows
some pretty bizarre sentences, particularly if you use the rule for if
a lot. Find a few such examples. What is your judgement of their
grammaticality?
Exercise 9.14: What rule would you have to add in order to allow
sentences involving then, e.g. if Pip barked, then Etta ran?
Exercise 9.15: Draw the trees associated with the last few examples.
In this section, we've seen how we can describe some aspects of the
form of sentences, specifically how words can be grouped into larger
units, and those units into still larger ones. We'll now turn to the topic
of how we can use the form of a sentence to determine its literal meaning.
(1993):
1. Kamp, Hans and Uwe Reyle (1993) From Discourse to Logic: Introduction
to Modeltheoretic Semantics of Natural Language, Formal
Logic and Discourse Representation Theory, Kluwer Academic Publishers.
The interpretation of DRSs that we gave in Section 9.3 is largely taken
from Kamp and Reyle (1993). However, DRT affords itself an alternative
dynamic interpretation. Dynamic semantics has proved very useful in
modelling many of the communication phenomena that we will discuss
in this book, particularly the phenomena discussed in Part IV, but for
reasons of space and simplicity, we don't introduce dynamic semantics
in this book. van Eijck and Kamp (1997) provide an excellent overview
of this exciting area:
1. van Eijck, Jan and Hans Kamp (1997) Representing Discourse in
Context, in J. van Benthem and A. ter Meulen (eds.) Handbook of Logic
and Language, Elsevier, pp.179-237.
Word meaning, or lexical semantics as it's also known, is currently a
hot topic in Linguistics and Computational Linguistics. The following
article lays out some of the challenges we face in analysing the meaning
of words:
1. Pustejovsky, James (1991) The Generative Lexicon, Computational
Linguistics, 17.4, pp.409-441.
The following articles also give an overview of the phenomena in lexical
semantics, but in a more introductory style than Pustejovsky (1991):
1. Briscoe, Ted (1991) Lexical Issues in Natural Language Processing, in
E. Klein and F. Veltman (eds.) Natural Language and Speech, Springer-
Verlag, 22 pages.
2. Copestake, Ann (1995) Representing Lexical Polysemy, Proceedings
of the AAAI Spring Symposium on Lexical Semantics, pp.21-26.
For further details about lexical semantics, consult Pustejovsky (1995):
1. Pustejovsky, James (1995) The Generative Lexicon, MIT Press.
For an introduction to syntax, consult O'Grady et al. (1996) and Sag
and Wasow (1999):
1. O'Grady, William, Michael Dobrovolsky and Francis Katamba (1996)
Contemporary Linguistics: An Introduction, Longmans.
2. Sag, Ivan A., and Tom Wasow (1999) Syntactic Theory: A Formal
Introduction, CSLI Publications.
Both of these books give details about the various tests for syntactic
constituency, which are mentioned in Section 9.4.
We'll use the term grammar in the following technical sense: a formal
system for relating form and meaning (see p196 for a definition of the
term "formal"). So, the term here is distinct from other senses such as
"reference grammar". We already have mechanisms for describing both
form and meaning and the question this chapter addresses is the way in
which we can link the two.
You've seen many of the techniques we'll use here in Section 9.1,
both in general terms, where we use a set of syntactic rules to produce
an analysis of a sequence of symbols which can then be interpreted to
produce the "meaning" of that sequence, and in the specifics, where we
will produce that meaning by taking apart the tree that represents the
sentence's analysis. Our point of departure will be a discussion of general
principles that might underlie such a process.
connection between the sounds (or letters) of a word and its meaning,
we should expect all of these words to be similar in form.
What's in a name? That which we call a rose by any other name would
smell as sweet.
Shakespeare (Romeo and Juliet, II.i.43)
As the above quote suggests, there is complete arbitrariness between
the form of a word and its meaning. It took a while for this idea to find
favour with linguists. Ferdinand de Saussure coined the term l'arbitraire
du signe to describe this phenomenon around the turn of the twentieth
century.
To give one further example, suppose all speakers of English agreed
one day to replace the word table with the word plurk. It's clear that
this wouldn't have any effect on the things table used to refer to, and it's
also pretty clear that, apart from this change, the grammar of English
wouldn't be affected either.
One of the many amazing things about human facility with language
is that we are able to learn so many arbitrary associations. Because
they're arbitrary, there's no rhyme or reason to particular pairings of
form and meaning and so there are no general principles which can aid
the process of memorisation. One estimate, and there are good reasons
to think that it's a conservative one, is that typical English-speaking
students at age 17 know 60,000 such pairings. If you've read a fair bit,
you're quite likely to have topped the 100,000 mark. In order to achieve
even the lower figure, you have to learn a new, arbitrary pairing every
ninety waking minutes or so from the time you're one year old. The
glossary at the end of these notes contains on the order of 250 definitions,
and that's probably a good estimate of the number of new words, or new
senses of words you'd already come across, that you will have learnt in
reading these notes.
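The ninety-minute figure is easy to check with back-of-envelope arithmetic (the sixteen waking hours per day is our own assumption, not the book's):

```python
# A rough check of the claim that 60,000 form-meaning pairings by age 17
# works out to roughly one new pairing every ninety waking minutes.
years = 17 - 1                      # learning from age one to seventeen
waking_hours_per_day = 16           # assumed
waking_minutes = years * 365 * waking_hours_per_day * 60
minutes_per_pairing = waking_minutes / 60_000
print(round(minutes_per_pairing))   # about 93 minutes per pairing
```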
Two objections are common at this point. First, what about words
which sound like what they describe? Well, cases of such onomatopoeia
are vanishingly few (at most a few hundred) compared to the typical
case. Onomatopoeia is also conditioned to a large extent by available
sound patterns in the language: cock-a-doodle-doo translates as cocorico
in French. Second, what about words such as doorknob which describe
the knobs of doors? In this case, there's a mismatch between two
definitions of words. If you take `word' to mean "a sequence of characters
Figure 10.1
Examples of predicate symbols for words of different syntactic categories.
For our purposes, we can identify the term `expression' with `constituent',
and `manner of combination' with the syntactic configuration
in which an expression appears, or (in other words) with the "shape" of
the tree the syntactic rules tell us to draw. It's probably worth qualifying
the above quotation with "and nothing else can affect the meaning of the
expression as a whole", to emphasize that, in a strictly compositional
into account factors entirely outwith the sentence, i.e. whether today is
Saturday, in determining the meaning of Eddie milked a cow. In the
second case, while the words in the sentence may affect its meaning,
they don't affect it in a way that takes account of the meaning of those
words. The third does take account of the meanings of words, but not
of how those words are put together. If the sentence in question is I
have red pyjamas, things are not so bad. The cases of I don't have red
pyjamas, or While dressing gowns are typically red pyjamas are often
blue show that how words are grouped into larger units must be taken
into account.
In a world where language was non-compositional, predicting the
meaning of an utterance would be much more difficult. Given the
unlimited number of sequences of words that constitute well-formed
sentences, we rely on considerable regularity in the way in which the
meanings of words and phrases contribute to that of sentences as a whole.
Without this regularity, we would see the same arbitrary relationship
between sentences and their meanings as holds between words and their
meanings.
The examples above have been deliberately extreme in order to drive
home the possibilities (or, more to the point, the limits) of completely
non-compositional systems. You should have a strong intuition that
English is not like the more extreme examples in (3). In fact, assuming
that the meaning of an expression can be characterised in terms of
what that expression denotes or refers to (and this is a standard
assumption in models of meaning that exploit formal systems such
as logic), there are a number of interesting ways in which English
is not completely compositional. Consider, for example, the following
statements:
(3) a. The phrase today means "Monday" on Mondays, "Tuesday"
on Tuesdays, . . .
b. The phrase the bucket refers to a previously mentioned
bucket, except in the context kick the bucket, where the
whole phrase means "die".
c. The word she refers to a female mentioned somewhere else
in the sentence or discourse.
While you might quibble with details, these statements sound plausible
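Statement (3a) can be made concrete: the denotation of today is fixed not by the word alone but by the day of utterance. A small sketch (our own illustration, not the book's):

```python
import datetime

def denotation_of_today(date):
    """The denotation of "today" depends on the context of utterance:
    here, simply the weekday name of the given date."""
    return date.strftime("%A")       # "Monday", "Tuesday", ...

# The day this chapter was typeset, according to its running header:
print(denotation_of_today(datetime.date(2003, 2, 27)))  # Thursday
```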
Exercises
Exercise 10.1: Consider the rules given in the grammar for English.
Say which aspects of the rules are compositional and which are not.
Explain your reasoning.
Exercise 10.2: How do you interpret the example Peter kept an eye on
Pip? What are the consequences of this example for compositionality?
becomes
Figure 10.2
General form of semantic rules
side (LHS) of such a rule, you will see a part of a tree, including one
or more syntactic categories. On the righthand side (RHS), you will
typically see a different tree (or no tree at all), and an indication of
information to be introduced into the DRS. The following general rules
add "name(z)" to the DRS's conditions, where "name" is the predicate
symbol associated with the word whose form is name.
VP
name(z )
becomes
V0
z
name
Figure 10.3
The semantic rule for intransitive verbs, e.g. ran, slept.
NP VP
Det N V0
a dog barked
Figure 10.4
The syntactic tree for a dog barked
NP VP
Det N V0
a dog barked
Figure 10.5
The first step in applying the semantic rules for the sentence a dog barked.
NP y
Det N
a dog
bark(y)
Figure 10.6
Step 2 in applying the semantic rules for the sentence a dog barked.
name(x)
NP becomes:
Det N
x
a(n) name
Figure 10.7
The semantic rule for a
Note that this rule involves a more complex tree than the rule we
saw before (it consists of a mother node with two daughters, rather than
one daughter), but the principle, whether we can find the tree on the
LHS of the rule within the DRS we are processing, remains the same.
So, we can apply that rule to the diagram as seen in Figure 10.6, and
will produce Figure 10.8.
The final stages in processing will involve a rule which looks a little
different from the preceding two, namely Figure 10.9. Here, the rule
tells us to substitute one discourse referent for another. This captures
the idea we mentioned earlier, that the individual introduced by the
subject NP is the same as the individual that the VP stipulates has
certain properties (e.g., in this case, that he or she barks). Indeed, we
can apply that rule in our example to give Figure 10.10. Observe how
the discourse referent y that was the argument to the predicate bark has
now been replaced with the discourse referent x; that's what the rule in
Figure 10.9 instructs us to do. In fact, this DRS is exactly the one we
x y
bark(y)
dog(x)
Figure 10.8
The next step in processing a dog barked
S
x z
becomes: substitute all occurrences of z in the DRS conditions with x
Figure 10.9
The semantic rule for sentences
x
bark(x)
dog(x)
Figure 10.10
The last but one step in processing a dog barked
were aiming for in Figure 9.5 (page 204). We saw at that point how a
model would need to look to make the DRS true, and you may want to
review that discussion now.
We introduced the rule for intransitive verbs above before that for
a because the rule for verbs is simpler. You may have noticed that we
could have done things in another order: we could have chosen to process
a dog first, before barked. Does it matter which order we do things in?
The answer is: it depends. In this case it doesn't matter; using either
of the possible orderings produces the same result. We'll see later on,
in Section 11.3, some cases where different orderings produce different
results.
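The whole derivation for a dog barked can be traced in a few lines. This sketch uses our own data structures for the DRS; it follows the rule applications of Figures 10.3, 10.7 and 10.9:

```python
# The VP rule for "barked" introduces a referent y with bark(y);
# the rule for "a dog" introduces a referent x with dog(x).
referents = ["y", "x"]
conditions = [("bark", "y"), ("dog", "x")]

# The sentence rule (Figure 10.9): substitute all occurrences of the
# VP's referent y by the subject NP's referent x, and drop y.
referents = [r for r in referents if r != "y"]
conditions = [(pred, *("x" if arg == "y" else arg for arg in args))
              for pred, *args in conditions]

print(referents, conditions)   # ['x'] [('bark', 'x'), ('dog', 'x')]
```

The result is the DRS of Figure 10.10: one referent x, with both conditions now holding of it.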
Exercises
Exercise 10.4: We haven't been explicit about why some orderings are
possible and others not. Give one or more reasons why, for example, it
is not possible to start composing the DRS of a sentence by applying
the sentence rule.
NP VP
Det N V1 NP
Det N
Figure 10.11
The first step in applying the semantic rules for the sentence a dog chased a cat.
u v
v VP
V1 u
chased
cat(u)
dog(v)
Figure 10.12
a dog chased a cat after processing the NPs.
name(x; w)
VP
becomes
V1 w
x
name
Figure 10.13
The semantic rule for transitive verbs, e.g. loved, chased, caught
We can see the tree shown in the rule within the DRS, and so we
can apply the rule, taking w in the rule to be the same as u in the DRS.
We can also see that the predicate symbol for chased, assumed to be
"chase", will appear in the result of applying the rule. That means that
the DRS (so far) will look as shown in Figure 10.14 before deleting the
part of the tree in question, and as shown in Figure 10.15 afterwards. At
this point, we have now got to the same stage in constructing the DRS
as that shown in Figure 10.8 for the previous example, and it should be
easy for you to see the steps involved in completing the example.
u v
v VP
V1 u
chased
cat(u)
dog(v)
chase(x; u)
Figure 10.14
The next step in processing a dog chased a cat.
u v
S cat(u)
v x
dog(v)
chase(x; u)
Figure 10.15
a dog chased a cat after deleting the tree rooted in VP
Exercises
Exercise 10.5: Complete the above example, and write out all of the
steps involved (including the two steps leading to the stage shown in
Figure 10.12).
Exercise 10.6: Is the DRS you compute true with respect to the model
shown in Figure 9.7? Is it true with respect to the following model?
e m p
dog(p)
Pip(p)
bark(p)
dog(e)
cat(m)
chase(p, m)
We've now seen how we can compute DRSs for simple sentences
involving transitive and intransitive verbs, and noun phrases involving
a. In the following sections, we will build on this to cover more examples.
Exercise
Exercise 10.7: For each of the semantic rules, rephrase them in
English, as we did for the V0 rule above.
NP
name(x)
becomes
PN
x
name
Reuse a referent, if you can. Otherwise introduce the referent at the top of the box.
Figure 10.16
The semantic rule for proper names, e.g. Etta, Pip, Mary
NP VP
PN V0
Etta barked
Figure 10.17
Start of the process for constructing the DRS of Etta barked
x VP
V0
barked
Etta(x)
Figure 10.18
Step one in composing the logical form of the (one-sentence) discourse Etta barked
in, composing logical form will follow essentially the same steps as those
discussed in Section 10.2, and we omit them here. The resulting DRS will
be very similar to that shown in Figure 9.5. The circumstances under
which the difference between the rules for a and for proper names makes
a difference ultimately to the resulting DRS will be discussed in the next
section.
Exercise
Exercise 10.8: Complete the process of composing the logical form for
Etta barked. Ignoring whatever letter you use for the discourse referent,
what is the single difference between this DRS and that for a dog barked?
The semantic rule for noun phrases involving the definite article
the is shown in Figure 10.19. You'll notice that the same instruction
NP
name(x)
becomes
Det N
x
the name
Reuse a referent, if you can. Otherwise introduce the referent at the top of the box.
Figure 10.19
The semantic rule for the
Exercise 10.9: Construct the DRS for the one-sentence discourse the
dog barked.
pretty similar looking DRSs. So, what's the difference? Consider the
following examples of very simple discourses:
(4) A dog barked. A dog ran.
(5) A dog barked. The dog ran.
The difference here seems to be that, in the second case, we can only be
talking about a single dog (or, in the jargon, the discourse only refers to
a single dog). Do the rules we have so far capture this? Well, first of all
we have to be a bit more specific about what a discourse is. We'll give
the following definition, which functions rather like one of the syntactic
rules for the internals of a sentence:
(6) A discourse is a sequence of one or more sentences, each
terminated by a full stop.
And, in order to provide its logical form, we need to give a semantic
rule. It goes like this:
(7) To process a discourse, process the first sentence. This will
involve creating a DRS. Process the remaining sentences in
order, adding them one at a time to the DRS containing the
first sentence.
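Rule (7), together with the "reuse a referent, if you can" instruction, can be sketched as a tiny processor for two-word noun phrases and intransitive verbs. The referent-choice heuristic and the use of surface word forms as predicate symbols are our simplifications, not the book's rules:

```python
def process(discourse):
    """Process sentences of the shape 'Det N V' one at a time,
    growing a single DRS (referents, conditions)."""
    referents, conditions = [], []
    fresh = iter("xyzuvw")                 # supply of new referent letters
    for sentence in discourse.split(". "):
        det, noun, verb = sentence.rstrip(".").lower().split()
        if det == "the" and referents:
            r = referents[-1]              # reuse an existing referent
        else:
            r = next(fresh)                # "a": introduce a new one
            referents.append(r)
        conditions += [(noun, r), (verb, r)]
    return referents, conditions

print(process("A dog barked. The dog ran."))
# (['x'], [('dog', 'x'), ('barked', 'x'), ('dog', 'x'), ('ran', 'x')])
```

Note that, exactly as in the discussion of Figure 10.22 below, following the letter of the rules leaves a duplicated dog condition in the result.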
How will these rules operate? By our syntactic rules, the following is a
sequence of two sentences, and is consequently a discourse by our rule
above:
(8) A dog barked. The dog ran.
So, our semantic rule says that we can process this by taking the first
sentence and processing that. We already know what the result will look
like (see Figure 9.5). So we know that our first step in processing the
second sentence in the discourse will be to add its tree to the same box.
This is shown in Figure 10.20. If we think about the instruction "reuse
a referent, if you can", it's clear that we are in a different situation
from the analysis of the previous example, where the dog barked was
the first sentence of the discourse (recall the discussion of Figure 10.18).
In the current case, when we come to process the NP the dog, we will
be doing this in a different context: this time there is a referent, and
so we can reuse it, as instructed by the rule. Therefore, processing
dog(x)
bark(x)
S
NP VP
Det N V0
Figure 10.20
Processing the second sentence in a discourse: step 1
the noun phrase in this example will result in the diagram shown in
Figure 10.21. Crucially, then, when composing the logical form of the
dog(x)
dog(x)
bark(x)
S
x VP
V0
ran
Figure 10.21
Processing the second sentence in a discourse: step 2
Exercise
Exercise 10.10: Complete the process of constructing the DRS for A
dog barked. The dog ran.
x
dog(x)
dog(x)
bark(x)
run(x)
Figure 10.22
The result of processing A dog barked. The dog ran.
You'll notice that we've followed the letter of the rules, and ended
up with two occurrences of the condition "dog(x)" in the DRS. Is this
important? If we think about the models that make such a DRS true
(or false), you should be able to see that having both occurrences of the
condition is redundant. If we were to delete one occurrence, exactly the
same models will make the DRS true (and similarly for models which fail
to make the DRS true).
Exercise
Exercise 10.11: Satisfy yourself that all of the claims in the following
discussion are true.
Let's consider a range of sentences, and what our grammar currently has
to say about them. You should check what we have to say against your
intuitions about these sentences.
(9) A dog barked. A dog ran.
Here, the DRS of the first sentence will introduce a referent. The second
sentence will introduce a referent too, but according to the rules of our
system, this should be a different referent. (To speak in terms of what
will appear in the DRS, we will use different letters or symbols for the
two referents.) What that means is that, relative to a model, the two
occurrences of dog in this discourse could refer to different individuals
in the model. In the following model, they do:
(10)  e  m
      dog(e)   dog(m)
      bark(e)  run(m)

But in a model that contains only one individual, such as:

(11)  e
      dog(e)   bark(e)   run(e)

you'll see that a single individual is associated with all three conditions.
Therefore, with respect to this model, the two referents in the DRS for
the sentence in question refer to the same individual, because there's
nothing else they could possibly refer to.
What if we reverse the ordering of the sentences of our first discourse,
so that we have:
(12) the dog ran. a dog barked.
In this case, there is nothing preceding the first sentence and so it will
get processed as if it were in isolation. Consequently, there will be no
existing referent within the DRS to reuse. The grammar therefore makes
the same prediction as in the case with two indefinite articles: the two
referents in question may refer to the same individual but they don't
have to.
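The reasoning about example (9) can also be checked mechanically. In this Python sketch (our own illustration; the encoding of models and DRSs is an assumption, not the book's notation), we enumerate the assignments of individuals to referents that verify the DRS:

```python
from itertools import product

# Model (10) from the text: two dogs, one barked and the other ran.
model_10 = {"individuals": {"e", "m"},
            "facts": {("dog", "e"), ("dog", "m"), ("bark", "e"), ("run", "m")}}

def assignments(referents, conditions, model):
    """All ways of mapping referents to individuals that verify every condition."""
    found = []
    for values in product(sorted(model["individuals"]), repeat=len(referents)):
        a = dict(zip(referents, values))
        if all((p, a[x]) in model["facts"] for p, x in conditions):
            found.append(a)
    return found

# DRS for "A dog barked. A dog ran.": two distinct referents.
drs = [("dog", "x"), ("bark", "x"), ("dog", "y"), ("run", "y")]
print(assignments(["x", "y"], drs, model_10))
# -> [{'x': 'e', 'y': 'm'}]: here the two referents pick out different individuals

# In a model with a single dog that both barked and ran, x and y must corefer.
one_dog = {"individuals": {"d"},
           "facts": {("dog", "d"), ("bark", "d"), ("run", "d")}}
print(assignments(["x", "y"], drs, one_dog))
# -> [{'x': 'd', 'y': 'd'}]
```

Nothing in the DRS itself forces the two referents apart or together; the model decides.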
So far, probably so good. It's quite likely that you'll agree with what
our grammar predicts. Let's take a look at some sentences where things
don't work out quite so well:
(13) John ran. the dog barked.
(14)  i
      John(i)  dog(i)  run(i)  bark(i)
First of all, check your intuitions about this prediction. If it seems plain
wrong to you, try and say why.
What can we learn from this example? First of all, it's a useful
reminder that we are mechanically following rules. For better or for
worse, we can mechanically work out what the grammar predicts. In
this case, it forces John and the dog to corefer, that is they must be
taken to refer to the same individual. We'll be pleased if what it predicts,
coreference in this case, accords with our intuitions. If what it predicts
seems odd, then we have to try and work out what that oddness is, and
whether it represents a good reason to revise the grammar (or perhaps
discard it altogether).
A second useful reminder is that the grammar "knows" only what
we've told it. We haven't said anywhere that John is quite an unusual
name to give a dog. The grammar is not permitted to hypothesize a
scenario in which, say, John is a human, and has a dog. All the grammar
can do is relate sequences of words to DRSs and all we can do with a
DRS is offer models with respect to which that DRS is true or false.
We could choose to build a constraint into our model (cf. p. 210), to the
effect that no individual is associated with both the conditions "John"
and "dog".
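Such a constraint on models is easy to state mechanically. The sketch below is an illustration of ours (the helper name and model are made up for the example); it checks whether any individual in a model carries both of two incompatible conditions:

```python
def respects_constraint(model, pred1, pred2):
    """True if no individual in the model satisfies both predicates,
    e.g. both "John" and "dog"."""
    having1 = {ind for p, ind in model["facts"] if p == pred1}
    having2 = {ind for p, ind in model["facts"] if p == pred2}
    return not (having1 & having2)

# The model forced by the DRS in (14): one individual that is both John and a dog.
model_14 = {"individuals": {"i"},
            "facts": {("John", "i"), ("dog", "i"), ("run", "i"), ("bark", "i")}}
print(respects_constraint(model_14, "John", "dog"))  # False: the model violates it
```

With such a constraint in place, no model could make the DRS in (14) true at all, which is one way of registering the oddness of the discourse.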
What intuitions do you have about reference in the following example?
(15) a dog chased a bird. the dog caught the bird.
There are in fact two things that go wrong here. Before reading on, see
how many problems you can spot. Let's deal with the perhaps more
obvious one first. Suppose a dog introduces the referent x, and a bird
the referent y. When we come to process the second sentence, both of
these referents can be found at the top of the DRS, and so what is there
to stop us reusing x for the bird and y for the dog? (This amounts to the
claim that one possible interpretation of this discourse is that the thing
that did the chasing is the thing that got caught.) The answer is: there's
nothing to stop us. We haven't put any constraints on how you pick
from the referents that are available, and so any way of choosing them is
allowed. You should note that, under that interpretation, this DRS can
only be true with respect to a model in which there are individuals that
are both dogs and birds. With the example of John and the dog above, it
seemed that our grammar was too strict in forcing us to reuse a referent.
In this case, it seems too liberal in allowing us to reuse referents in ways
that are not the "most obvious". For the time being, we'll leave things
in what may seem to you to be an unsatisfactory state of affairs. This
topic will be taken up again in considerably more detail in Chapter 15.
What's the second problem? It's one that will violate your intuitions
probably more than any we've seen so far. Consider the situation shown
in Figure 10.23 (see note 1). At this point, in processing the bird we can ask the
[Figure 10.23 (One point in processing the discourse a dog chased a bird. the dog caught the bird.): the DRS [x, y | dog(x), bird(y), chase(x, y)], with the partially processed tree [S x [VP V1 NP]] for the second sentence inside the box.]

1. Here we've used the convention of writing a triangle above the words the bird, on
the assumption that either you know what the analysis of the constituent is, or that
it's unimportant for the current discussion.
So, the thing that got caught was the thing that did the catching. This
is, most people will agree, clearly wrong. Bear this problem in mind, and
we'll return to it in the next section.
Exercises
Exercise 10.12: Repeat the kinds of analysis we've attempted above
on the following examples. That is, assess the kinds of DRSs each gives
rise to, in terms of the models which make those DRSs true, and your
intuitions about what the discourses mean.
(17) A dog ran. Etta barked.
(18) Etta caught a bird. the dog barked.
(19) John chased Etta. Etta chased John.
the rule for pronouns, shown in Figure 10.24 (see note 2). It should be clear that
[Figure 10.24 (The semantic rule for pronouns, e.g. she, her, it, he, him): the subtree [NP [PRO pronoun]] becomes a referent x, with the instruction that x must reuse a referent.]
if (20) is the first sentence of the discourse, then the semantic rule in
Figure 10.24 will predict that it's uninterpretable. This is because when
we reach the instruction "You must reuse a referent", there is no referent
around to reuse and we are unable to finish constructing the DRS for
this sentence and so, by our rules, cannot go on to process any later
sentence.
That inability to proceed captures nicely the double take we feel on
hearing such sentences as that above, and also that below:
(21) she chased Etta
But here we run into a problem: if we process Etta first, then the referent
we introduce will be available for she to reuse. Most people share the
very strong intuition that that shouldn't be allowed. Things are perhaps
even worse here:
(22) Etta chased her.
It seems to be the case that coreference is impossible in this example.
In order to fix this problem, we will introduce a restriction, the
Anaphora Constraint, which will govern how we may reuse referents.
One bit of terminology will help in phrasing it. In an atomic condition
in a DRS, such as "love(x, y)", x and y are the arguments to the
condition. The constraint can then be stated as follows (see note 3):

In constructing a DRS, a referent cannot appear more than once in the
arguments to an atomic condition.
2 We would need to have another rule for dealing with the so-called deictic use
of he, where the speaker actually points to somebody when he says he, his intention
being that he refer to the person that he's pointing to.
Exercises
Exercise 10.13: Go through the details of the above claims.
There are some interesting consequences that follow from this constraint.
Here are a couple of them:
3. There are words which violate the constraint as stated here, most notably
reflexive pronouns, such as myself, herself, himself, . . . . To cater for these, we could
revise the condition to read:
In constructing a DRS, a referent cannot appear more than once in the
arguments to an atomic condition, unless one occurrence of the referent was
introduced in the semantic rule for reflexive pronouns, in which case you must
identify that occurrence with another argument.
This would allow us to generate sentences such as
i Etta loved herself.
ii ?herself loved Etta
while not permitting referents to be identified in the following cases:
i Etta loved her
ii she loved Etta
As we will not extend the grammar to cover such pronouns, we won't pursue this
topic here.
A pronoun like he or her cannot occur within the first sentence of a
written discourse. For the pronoun to reuse a referent in this case there
are only two possibilities: it could reuse a referent from a previous
sentence, but there is none, or it could take a referent from something
else in the same sentence, but that would result in a violation of the
constraint.
A proper name occurring in the first sentence of a discourse always
introduces a new referent. If a proper name doesn't introduce a new
referent, it behaves in effect like a pronoun.
In the discourse Mary loved Etta. Etta loved her., either Mary and
Etta refer to different individuals in a model, or the individual known as
Mary is also known as Etta.
Exercise
Exercise 10.14: Justify the preceding claim.
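The Anaphora Constraint itself is a purely formal check on conditions, and can be sketched in a few lines of Python (our own illustration; the tuple encoding of conditions is an assumption, not the book's notation):

```python
def violates_anaphora_constraint(conditions):
    """Return the atomic conditions, encoded as tuples like ('love', 'x', 'y'),
    in which the same referent appears more than once among the arguments."""
    return [c for c in conditions if len(c[1:]) != len(set(c[1:]))]

ok  = [("Etta", "x"), ("love", "x", "y")]   # Etta loved her, with her distinct
bad = [("Etta", "x"), ("love", "x", "x")]   # identifying her with Etta
print(violates_anaphora_constraint(ok))     # []
print(violates_anaphora_constraint(bad))    # [('love', 'x', 'x')]
```

A DRS-construction procedure would simply refuse any reuse of a referent that makes this check come out non-empty.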
Model 1 (individuals e, b1, b2, b3): Etta(e), chase(e, b1), catch(e, b1), bird(b1), bird(b2), bird(b3)
Model 2 (individuals e, b1, b2, b3): Etta(e), chase(e, b2), catch(e, b1), bird(b1), bird(b2), bird(b3)
Model 3 (individuals e, b1, b2, b3): Etta(e), catch(e, b1), bird(b1), bird(b2), bird(b3)
Model 4 (individuals e, b1, b2, b3): Etta(e), chase(e, b1), catch(e, b1), chase(e, b2), catch(e, b2), bird(b1), bird(b2), bird(b3)
Figure 10.25
Some models involving Etta and birds
she chases but doesn't catch and so the conditional seems to be true. In
model 2, there is a bird b1 that she catches, and a different one b2 that
she chases. So b2 is a bird which she chases but doesn't catch, so the
conditional is not true. In model 3, she doesn't chase any birds. We'll
leave it to your intuition whether you think that makes the conditional
false. In model 4, there are two birds that she chases, and she catches
both of them, so the conditional seems to be true.
[Figure 10.26 (The DRSs for the left and righthand side of the conditional): lefthand, [x, y | Etta(x), bird(y), chase(x, y)]; righthand, [x, y | catch(x, y)].]
How does this discussion relate to DRSs that might represent the
meaning of two halves of the conditional? The two DRSs corresponding
to the left and righthand sides of the conditional can be seen in
Figure 10.26. One way of generalizing the discussion above is the following:
(33) if S1 S2 is true, provided that any way in which the DRS for
S1 is true is also one in which the DRS for S2 is true.
One way of making the lefthand DRS true, with respect to model 4, is
to take x to be e and y to be b1 . In that case, the righthand DRS is
also true. There's another way, involving b2 . Again the righthand DRS
is true. Now we have exhausted the ways of making the lefthand DRS
true, and seen that each of them is also a way of making the righthand
DRS true, therefore by the rule given above, the conditional is true with
respect to model 4.
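Rule (33) can be rendered directly as a check over a model. In the sketch below (our own illustration, with model 4 from Figure 10.25 encoded as a set of facts), a conditional DRS is true if every assignment that verifies the lefthand conditions also verifies the righthand ones:

```python
from itertools import product

def verifying(referents, conditions, model):
    """Yield each assignment of individuals to referents that makes
    every condition a fact of the model."""
    for values in product(sorted(model["individuals"]), repeat=len(referents)):
        a = dict(zip(referents, values))
        if all((c[0],) + tuple(a[arg] for arg in c[1:]) in model["facts"]
               for c in conditions):
            yield a

def conditional_true(referents, lhs, rhs, model):
    """Rule (33): every way of making the lefthand DRS true must also
    make the righthand DRS true."""
    return all(all((c[0],) + tuple(a[arg] for arg in c[1:]) in model["facts"]
                   for c in rhs)
               for a in verifying(referents, lhs, model))

# Model 4 from Figure 10.25: Etta chases b1 and b2 and catches both.
model_4 = {"individuals": {"e", "b1", "b2", "b3"},
           "facts": {("Etta", "e"), ("bird", "b1"), ("bird", "b2"), ("bird", "b3"),
                     ("chase", "e", "b1"), ("catch", "e", "b1"),
                     ("chase", "e", "b2"), ("catch", "e", "b2")}}

lhs = [("Etta", "x"), ("bird", "y"), ("chase", "x", "y")]
rhs = [("catch", "x", "y")]
print(conditional_true(["x", "y"], lhs, rhs, model_4))  # True
```

Exactly as in the text: the two verifying assignments take y to be b1 and b2, and each also verifies the righthand DRS.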
Exercise
Exercise 10.17: Go through the models applying this test.
So, how can we represent, within a DRS, that the truth or otherwise
of part of a DRS refers to the rule above, rather than the simpler rule
we've seen to date? We'll do this by adopting a symbol to indicate this
relation between two DRSs, as shown in Figure 10.27.

[Figure 10.27 (Representing conditionals in a DRS): two DRS boxes joined by the symbol ⇒.]

[Figure 10.28 (The semantic rule for if): a tree of the form if S1 S2 becomes a box containing S1, the symbol ⇒, and a box containing S2.]

[Figure 10.29 (Step one of constructing the DRS of a conditional): a DRS whose only content is the tree for if S1 S2.]

The last thing we need to do is to break up the conditional and place
each sentence within its own box.
See Figure 10.30.

[Figure 10.30 (Step two of constructing the DRS of a conditional): a box containing one S, the symbol ⇒, and a box containing the other S.]

Now processing of the lefthand side in this example
will be much like others we've seen before; see for example Figure 10.11.
We'll omit all these steps. One comment is necessary. As you will see
in Figure 10.31, we have placed the referent introduced by the lefthand
sentence into the top of the lefthand box. This mirrors our rules for
translation within the main DRS.
[Figure 10.31 (Step two of the translation of a conditional): the lefthand box [x, y | Etta(x), bird(y), chase(x, y)], the symbol ⇒, and a righthand box containing the unprocessed sentence Etta caught the bird.]
We can now go ahead and process the second sentence from the
conditional. The main question that will arise now is in the instructions to
reuse referents associated with Etta and the bird. For now, to make things
as simple as possible, we assume that one can reuse a discourse referent
that was introduced anywhere within the DRS so far. We'll present
more sophisticated constraints on which discourse referents you can
legitimately reuse in Section 14.2, and these more complex constraints
will better capture some of the linguistic data. But for now, let's assume
you can reuse any discourse referent in the DRS. Under that rule, we
can process the second DRS to produce the final DRS shown in
Figure 10.32. This is not the only DRS because, for the reasons discussed in
[Figure 10.32 (The final DRS of the conditional): [x, w | Etta(x), bird(w), chase(x, w)] ⇒ [Etta(x), bird(w), catch(x, w)].]
Section 10.2 above, we have a choice about how we choose to arrange the
co-reference. This translation does use the "obvious" correspondence, in
which both occurrences of Etta have to refer to the same individual.
Exercise
Exercise 10.18: Work through this example completely.
Exercise 10.19: Produce a DRS for If Etta chases a bird, she catches
it. If you take Etta and she to corefer, is there any model with respect
to which this DRS is true and the DRS we constructed in Figure 10.32
for If Etta chased a bird, Etta caught the bird is false, or vice versa?
[Figure 10.33 (The semantic rule for every): a tree [S [NP [Det every] [N name]] VP] becomes the DRS [x | name(x)] ⇒ a box containing the tree [S x VP].]
Exercises
[Figure 10.34 (The DRS of every dog barked.): the steps of the construction. Step 1: the tree [S [NP [Det every] [N dog]] [VP [V0 barked]]]. Step 2: [x | dog(x)] ⇒ a box containing the tree [S x [VP [V0 barked]]]. Steps 3 and 4: reducing the remaining tree yields the final DRS [x | dog(x)] ⇒ [bark(x)].]
Exercise 10.21: The syntactic rule for conditionals given above allows
some pretty bizarre sentences, particularly if you use the rule for if
a lot. Find a few such examples. What is your judgement of their
grammaticality?
Exercise 10.22: What are your intuitions about the truth or otherwise
of conditionals in which the first sentence is not true?
Exercise 10.23: The semantic rule for every can only interpret
configurations in which that word appears in the subject NP (i.e. the one
directly below S). Sketch a semantic rule to provide an interpretation
of NPs appearing as objects of transitive verbs (i.e. appearing next to
V1). Can you then give an interpretation for Every dog chases every cat?
Explain your reasoning.
Exercise 10.26: Consider the sentence Every angry dog barked. Write
down a DRS and construct models which should allow you to assess
whether the DRS is a good logical form of the sentence (i.e., the models
which make the DRS true should match your intuitions about how the
world would be if the discourse were true). Do the semantic rules which
you devised for the previous exercise allow you to construct the DRS
you have written from the syntactic tree? If not, why not?
We can tie together the uses of different referring expressions within the
sentence. Even though the grammar is much too liberal in the ways it
allows us to do that, we can still be confident that the interpretation
of those referring expressions which is obvious to us humans is also an
interpretation that is allowed by the grammar.
A second positive point is that, while we have restricted ourselves to
a very small number of words, and syntactic and semantic rules, they
are still quite general in what they could potentially cover. In case you
are doubtful about this claim, consider the following exercise.
Exercise
Exercise 10.28: Here is a model:
j p g r s1 s2 s3 s4
song(s1)  song(s2)  song(s3)  song(s4)
Yesterday(s1)  Tell-me-why(s2)  Yellow-Submarine(s3)  Norwegian-Wood(s4)
write(j, s2)  write(j, s4)  write(p, s1)  write(p, s3)
John(j)  Paul(p)  George(g)  Ringo(r)
beatle(j)  beatle(p)  beatle(g)  beatle(r)
sing(p, s1)  sing(j, s2)  sing(j, s4)  sing(r, s3)
You should now be in a position to write rules which allow you to decide
the truth or otherwise of sentences and discourses such as Paul wrote a
song. George sang it. and if a beatle sang a song he wrote it. Write down
the rules that you need. Note all the mappings between words in English
and symbols in a DRS that you use. Produce groups of sentences which
are true or false according to this model.
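As a worked illustration of the flavour of this exercise (a sketch of ours, not a supplied answer), the model above can be encoded as a set of facts and the conditional if a beatle sang a song he wrote it evaluated with the rule for ⇒:

```python
from itertools import product

# The model from Exercise 10.28, as a set of facts.
facts = {("song", s) for s in ["s1", "s2", "s3", "s4"]}
facts |= {("John", "j"), ("Paul", "p"), ("George", "g"), ("Ringo", "r"),
          ("beatle", "j"), ("beatle", "p"), ("beatle", "g"), ("beatle", "r"),
          ("write", "j", "s2"), ("write", "j", "s4"),
          ("write", "p", "s1"), ("write", "p", "s3"),
          ("sing", "p", "s1"), ("sing", "j", "s2"),
          ("sing", "j", "s4"), ("sing", "r", "s3")}
individuals = sorted({ind for f in facts for ind in f[1:]})

def holds(cond, assignment):
    return (cond[0],) + tuple(assignment[a] for a in cond[1:]) in facts

def conditional_true(referents, lhs, rhs):
    """True iff every assignment verifying the lefthand DRS verifies the right."""
    for values in product(individuals, repeat=len(referents)):
        a = dict(zip(referents, values))
        if all(holds(c, a) for c in lhs) and not all(holds(c, a) for c in rhs):
            return False
    return True

# "if a beatle sang a song he wrote it"
lhs = [("beatle", "x"), ("song", "y"), ("sing", "x", "y")]
rhs = [("write", "x", "y")]
print(conditional_true(["x", "y"], lhs, rhs))  # False
```

The conditional comes out false because Ringo sang Yellow Submarine but Paul wrote it.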
There are many aspects of natural languages which we have not been
able to cover, even within the limitations set out in Section 8.5. We can
cover only simple sentences. This means that we are unable to process
sentences like the dog that caught the bird barked, as well as many other
kinds of sentences. In general, it seems to be the case for English that
we can give quite a thorough characterization of its syntax using only
context free rules. Below, we'll mention some other phenomena which
are of importance to cognitive science, but for which this simple kind of
rule is inadequate.

[Figure 10.35 (The structure of a Dutch sentence): crossing lines relate subjects and objects to the verbs they form sentences with.]

You can see from that figure that we are no longer dealing with the kind of
tree that our grammar rules have allowed. In order to generate such
configurations, more powerful rules, called context sensitive
rules, are required. (See the Glossary for information about such rules.)
A consequence of this discovery is that, even though the simpler rule
format may be sufficient for English, human languages in general are
more complex than can be described using a context free grammar. On
this basis, we can argue that the human language processor is able to
deal with the more complex, context sensitive rule format.
Semantics
Just as our simple grammar will not cover most of the syntactic
phenomena in English, there is a large amount of work in the area of semantics
that we have ignored. In some cases, we have to do this because of the
simple nature of the models we've used. If a model only represents a
snapshot of the world, or of some state of affairs, we won't be able to
talk about what was true in the past, or what might be true in the
future, nor about situations that are on-going, e.g. building a house, but
that are not true at the very moment our snapshot depicts. These issues,
referred to under the heading of tense (see Section 16.4) and aspect,
have been crucial in the development of apparatus for the description of
the semantics of natural language.
Exercises
Exercise 10.29: Our models are very simple collections of facts. In
what way does ignoring tense assist that simplification? Put another
way, what information would have to be present in our models in order
to correctly distinguish between Pip is barking and Pip was barking?
Exercise 10.30: Why can't we use the system to model negative
sentences, e.g. Pip doesn't bark, or ones involving or, e.g. either Pip is in
the park or Etta is in the park?
on this work.
The interpretation of DRSs that we give in this book is taken from
Kamp and Reyle (1993). However, an alternative interpretation within
the framework of dynamic semantics is also possible. van Eijck and Kamp
(1997) provide an excellent overview of dynamic semantics:
1. van Eijck, Jan and Hans Kamp (1997) Representing Discourse in
Context, in J. van Benthem and A. ter Meulen (eds.) Handbook of Logic
and Language, Elsevier, pp. 179-237.
In Section 10.2 we discussed the semantic analysis of the definite article
the. Definites have been studied extensively over the years: Russell
and Stalnaker offer seminal papers, and conflicting views, on the
semantics of definites from the perspective of Philosophy of Language:
1. Russell, Bertrand (1905) On Denoting, Mind, 14, pp. 479-493.
2. Stalnaker, Robert (1978) Assertion, in P. Cole (ed.) Syntax and
Semantics Volume 9: Pragmatics, Academic Press, New York.
A more recent analysis of the definite that brings these two conflicting
views together and that also offers a more linguistic perspective is
described in the following book by Gennaro Chierchia:
1. Chierchia, Gennaro (1995) Dynamics of Meaning: Anaphora,
Presupposition and the Theory of Grammar, University of Chicago Press,
Chicago.
11 Ambiguity
[NP [Det a] [N service]]
In fact we used just this kind of justification above when we decided
that one instance of the word was a verb and the other a noun. As
we'll see in Section 12.1, this observation will come in useful when we
talk about automatically determining trees for sentences of English.
Categorial ambiguity may also be present when a sentence as a whole is
associated with more than one tree, a situation discussed in Section 11.4.
Exercises
Exercise 11.1: Find other examples of words which can be of more
than one syntactic category. Construct examples in which each word is
forced to have a particular category. Can you find examples where this
doesn't happen? Is there always a difference in meaning associated with
the different categories?
Exercise 11.2: Repeat the above exercise using words taken from a
few sentences in a novel or newspaper article.
Exercises
Exercise 11.3: If we adopted the proposal above, how many
interpretations will the sentence head seeks arms have?
Exercise
Exercise 11.5: Find other examples to do with bank.
(14) [S [NP [Det every] [N man]] [VP [V1 loved] [NP [Det a] [N woman]]]]
With appropriate rules for man and woman, this sentence is well-formed
according to our grammar rules. What semantic rules can we apply in
this case? The answer is that we have a choice. We can either apply the
rule for every or that for a. In all previous examples that we've looked
at, the order in which we applied semantic rules made no difference to
the resulting DRS. What will happen here? First off, let's use the rule
for every. This will result in the DRS shown here:
(15) [x | man(x)] ⇒ a box containing the tree [S x [VP [V1 loved] [NP [Det a] [N woman]]]]
and processing can then continue within the smaller righthand box, to
yield:
(16) [x | man(x)] ⇒ [y | woman(y), love(x, y)]
On the other hand, if we use the rule for a first, our first two steps will
produce the DRSs below:
(17) [y | woman(y), plus the tree [S [NP [Det every] [N man]] [VP [V1 loved] y]]]

(18) [y | woman(y), [x | man(x)] ⇒ a box containing the tree [S x [VP [V1 loved] y]]]
So are the two DRSs that we have computed different? Will there be
models which make one true and not the other? The crucial difference
is that in the second case the condition associated with woman appears
in the outermost box. So, a model which makes this DRS true must
contain at least one individual of which the condition "woman" holds.
Furthermore the other condition must also hold, namely that for every
way of making the LHS in the implication true there must also be a
way of making the RHS true. In this case, then, our choice of individual
for y is fixed with respect to the choice of individuals associated with the
condition "man". In other words, there should be (at least) one woman
that all the men love.
Turning to the first DRS that we computed, things are different.
The condition "woman" ends up in the box on the RHS. Our rules for
⇒ say that any way of making the LHS true should also be a way of
making the RHS true. In this case, for each way of being a man, you
have to be able to find a woman, such that the man loves the woman.
In the light of the previous discussion, consider the following models:
m1 m2 m3 w1 w2 w3
woman(w1 ) woman(w2 ) woman(w3 )
man(m1 ) man(m2 ) man(m3 )
(20)
love(m1 , w1 )
love(m2 , w2 )
love(m3 , w3 )
m1 m2 m3 w1 w2 w3
woman(w1 ) woman(w2 ) woman(w3 )
man(m1 ) man(m2 ) man(m3 )
(21)
love(m1 , w1 )
love(m2 , w1 )
love(m3 , w1 )
m1 m2 m3 w1 w2 w3
woman(w1 ) woman(w2 ) woman(w3 )
(22)
man(m1 ) man(m2 ) man(m3 )
love(m1 , w3 )
The first model here makes the first DRS we computed true: for each
man we can find a woman that he loves. It doesn't make the second DRS
true: we can't find a single woman such that all men love that woman.
On the other hand, this last condition is met by the second model, which
therefore makes the second DRS true. The last model fails to make either
DRS true. From these models we can see that any model which makes
the second DRS true will also make the first DRS true. A further case
is worth considering, namely that where the model contains no men and
no women. That model makes the first DRS, but not the second, true.
Some people find it difficult to see that there is more than one
interpretation for sentences like this. You may like to think about the
following sentences:
(23) Every worker must take part in a fire drill next week.
(24) Every student has to come to a meeting next week.
(25) Every person in the village has a friend (in the subpostmistress).
(26) Every officer was present at the arrest of a highly dangerous
criminal.
On the other hand, a sentence such as
(27) Every student has access to a state-of-the-art computer.
seems to require an interpretation where there is not just one computer.
The kind of ambiguity we have investigated here is termed
quantifier scope ambiguity (from the elements used in some forms of
logical notation).
Exercise
Exercise 11.6: Provide the details of how the logical forms for the two
sentences above are composed.
Exercise 11.7: What does the phrase a certain do, in every man loves
a certain woman?
Exercise 11.8: What does the phrase each do, in three lawyers bought
a house each?
(34) VP → VP PP
     NP → NP PP
where VP and PP stand for "verb phrase" and "prepositional
phrase" respectively.
With these rules, we can analyse a sentence such as Etta hit the man
with a hammer in one of two ways, corresponding to the fact that there
are two trees associated with the sentence:

(35) [VP [VP [V1 hit] [NP the man]] [PP with a hammer]]

(36) [VP [V1 hit] [NP [NP the man] [PP with a hammer]]]

The first tree, in which the PP groups with the VP, is associated
with the meaning in which Etta used a hammer to hit someone. The
second tree, where the PP groups with the NP, shows the analysis whose
interpretation is that Etta hit a man who had a hammer.
The example here just relies on syntactic rules to do with phrases
to induce ambiguity. In some cases, categorial ambiguity may also be
involved:
(37) Pip saw her duck.
This sentence could either mean that Pip saw an animal belonging to
some female, or that Pip saw some female bob down. In the first case,
we interpret her as a pronoun indicating possession, functioning in a
manner similar to Det, in the second as a pronoun of the kind we've
seen before.
the machine doesn't have to deal with the phonetic effects which result
from the sounds used in one word affecting those in another. These two
problems (oronyms, and the fact that we're not good at modelling
with computers the phonetic consequences of words being pronounced
in immediate succession) mean that computers which understand a
range of vocabulary at least as large as most humans have, and are able
to process continuous speech produced by different speakers, are still the
stuff of science fiction.
This completes our survey of the topic of ambiguity. Before we move
on, it's worth pointing out some things that follow from our discussion.
First let's consider ambiguity in another domain, that of drawings.
Figure 11.1 shows a "Necker cube", an outline drawing of a cube that
has two interpretations, according to which of the two vertices towards
the centre of the diagram we perceive as nearer. You can reinterpret

[Figure 11.1: A "Necker cube".]
some other meaning. The effect is less striking, because it doesn't relate
to an ongoing external stimulus (there is no picture that we're looking
at against which to correlate our interpretation) but we venture that
the effect is there all the same.
This observation provides some useful input into a longstanding
question, namely the issue of the relationship between language and
thought. Some people will tell you that "thought is the same as
language". The existence of ambiguity provides one method of arguing that
these people are wrong. We can entertain different interpretations of the
same word or sentence, and that implies that there's some difference
between a word or sequence of words in a sentence and the thought that
that word or those words inspire. Put another way, the fact that I can
choose how to interpret the word bank or the sentence Peter arrived at
the bank implies that there is no simple, direct correspondence between
those units of language and whatever thoughts are made of.
Ambiguity represents one crucial respect in which human languages
differ from artificial languages, for example those developed in maths,
logic or computer science. If a sentence is ambiguous and intended in a
certain way by a speaker, the possibility exists for a hearer to interpret
it in a way not intended by the speaker. So, the existence of ambiguity
represents in principle an impediment to communication. If language
were not ambiguous, speakers would be more certain that the goal of
communicating some meaning is achieved by some utterance. On the
other hand, we often don't perceive ambiguity, even though we can
demonstrate its existence in many cases. It's probably fair to say that the
ubiquity of ambiguity in natural languages was barely suspected until
people attempted to get computers to process language. We'll see in the
next two chapters that ambiguity is problematic for machines, but easy
for people, to deal with. The obvious interest is why that should be so.
Exercises
Exercise 11.9: Go through all of the examples in this chapter,
categorizing the ambiguities you can find. Which examples do you think would
be difficult to model in our grammar and why?
Exercise 11.10: What ambiguities are present in How did you find the
steak?
Exercise 11.11: Before reading these notes, what was your opinion
about the relationship between thought and language? Where did that
opinion come from? Has it been altered, and if so how, by the discussion
above?
12 Language in Machines
We've seen in the last couple of chapters how we can give a description
of some aspects of language that's formal in the sense defined on p. 196.
In exercises, you've practised constructing sequences of representations
which capture some aspects of the meaning of simple sentences. To a
large extent, this involves the mechanical application of syntactic and
semantic rules.
In this chapter we look at ways of getting computers to do a similar
kind of processing at the level of syntactic rules. We'll see a set of
instructions which are sufficient (in many cases) to produce one or more
trees from a given sequence of words. We'll see that ambiguity is a major
problem, and demonstrate this by the exponential growth in the number
of syntactic analyses of relatively simple sentences. In the next chapter,
we'll use this as an argument against a simple view of how humans
process language.
The large amount of background knowledge that humans bring to the
task of understanding language has to a great extent frustrated attempts
to use grammars of the kind we develop to produce computer systems
for automatically processing language. We'll see that one engineering
response to this problem is to rely on statistical properties of language.
We'll use these concrete examples to reexamine the discussion of Turing
machines vs. biological computers from Chapter 6.
S → NP VP
NP → PN
NP → Det N
NP → Det AP N
AP → AP A
AP → A
VP → V0
VP → V1 NP

PN → John | Mary | this
Det → a | his | the | my | her
N → dog | boy | girl | home | garden
V1 → is | loves
V0 → walks
A → angry | beautiful

Figure 12.1
Syntactic rules used in parsing example
the VP. It first considers the possibility that VP expands as V0, adding
in the corresponding tree. When it gets to step 7, it fails to match walks
to the input and so revises its choice of rules. As walks is the only choice
for V0, it goes back to its choice for VP. The parts of the tree introduced
for the VP → V0 rule are erased as the program reconsiders its choice of
rule. (Notice that several individual steps are omitted between steps 7
and 8.) In steps 9 and 10, first is is tried as a V1, and then loves is.
The next steps will be to attempt to expand the NP. If it can analyse
what's left (her dog) as an NP, then we will have an analysis for the
whole sentence.
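The backtracking procedure can be sketched as a short program. What follows is our own minimal illustration, not the program the chapter describes: a depth-first parser over a fragment of the grammar in Figure 12.1 (the AP rules are omitted for brevity), which tries each rule for a category in turn and simply abandons choices that fail to match the input.

```python
# A minimal backtracking (depth-first) parser over a fragment of the
# grammar in Figure 12.1. An illustrative sketch only.

GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["PN"], ["Det", "N"]],
    "VP":  [["V0"], ["V1", "NP"]],
    "PN":  [["John"], ["Mary"], ["this"]],
    "Det": [["a"], ["his"], ["the"], ["my"], ["her"]],
    "N":   [["dog"], ["boy"], ["girl"], ["home"], ["garden"]],
    "V1":  [["is"], ["loves"]],
    "V0":  [["walks"]],
}

def parse(cat, words, i):
    """Yield (tree, next_position) for every way of analysing
    words[i:] as beginning with category `cat`."""
    if cat not in GRAMMAR:                 # a terminal symbol (a word)
        if i < len(words) and words[i] == cat:
            yield cat, i + 1
        return
    for rule in GRAMMAR[cat]:              # try each rule in turn ...
        # ... expanding its symbols left to right; a failed match
        # contributes nothing, which is exactly the backtracking.
        results = [([], i)]
        for sym in rule:
            results = [(kids + [t], k)
                       for kids, j in results
                       for t, k in parse(sym, words, j)]
        for kids, j in results:
            yield (cat, kids), j

def parses(sentence):
    words = sentence.split()
    return [t for t, j in parse("S", words, 0) if j == len(words)]

trees = parses("Mary loves her dog")
print(len(trees))   # one complete analysis for this sentence
```

Note that, like the procedure in the text, this sketch explores blind alleys (VP as V0, is as the V1) before finding the successful analysis.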
Exercises
Exercise 12.1: Complete the sequence of diagrams that will produce
a syntactic tree for this example. Be certain that you notice all the blind
alleys that the program gets into and that you don't skip any steps.
Exercise 12.2: Think of some other sentences that the grammar al-
lows, and repeat the previous exercise.
Exercise
Exercise 12.3: Get a large piece of paper and write down all the steps
that a parallel parser would go through for this example.
12.2 Ambiguity
Recall the example we discussed in Chapter 11, concerning the words
service today written on a scrap of paper and then:
pinned to a church door,
on a post-it on a VCR,
on a post-it on a locker in a tennis club or
pinned to a ewe's stall.
This example is intended to highlight the immense amount of knowledge
we can bring to bear in interpreting bits of language. According to the
situation in which we find the words, our interpretation of them differs
radically. This example gives you something of a feeling for the kinds of
choices that a computer would have to consider in order to determine
what the appropriate interpretation of some word is. We will deal with
this topic in detail in Part IV of the book. For the time being, note first,
as discussed in the previous chapter, that ambiguities are pervasive in
natural languages, and second that ambiguities within a single sentence
interact multiplicatively. Let's look at an example:
(3) I [1]
    saw [2]
    a star [2]
    with a telescope [2]
    with a large lens [2]
    with my friends [2]
(the bracketed figures give the number of readings of each phrase)
Here we have choices about, for example, what sense of a word is used
in a particular case or how to group words together: the word saw could
mean either "to have seen someone in the past" or "to cut something
with a saw". Recall that, in the treatment we gave in Section 11.2, each
sense of a word is associated with a different syntactic rule. Either I
or a star had the telescope (in some sense of had or have); . . . One
interpretation might then be "When I'm with a group of my friends,
I use a telescope with a large lens to cut through a famous person".
That's clearly a very odd thing to say, but it's clearly also a possible
interpretation of this sentence.
Each two-way ambiguity here doubles the number of interpretations
a sentence can receive. (Note that some categorial ambiguities, for
example I vs. eye, may be removed in virtue of the syntactic context in
which a word appears, as discussed in Section 11.1 on p. 265.) That
suggests that there are at least 2⁵ = 32 interpretations of this example, and
that's twice the number of words already. If we add another ambiguity,
for instance by replacing friends with set, we will again at least double
the number of interpretations. As you can tell, this means that even for
innocuous-seeming sentences there are likely to be many analyses and, if
we use a model of grammar like that developed earlier, and the parsing
algorithm described above to find those analyses, we could be in for a
long wait. Let's see just how long.
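The arithmetic behind this claim can be checked directly: the number of interpretations is the product of the number of readings at each choice point, so every extra two-way ambiguity doubles the total. A tiny sketch, using the per-phrase counts assumed for example (3):

```python
# Interpretations multiply: the total is the product of the number of
# readings at each choice point (counts as assumed for example (3)).
readings = {"I": 1, "saw": 2, "a star": 2,
            "with a telescope": 2, "with a large lens": 2,
            "with my friends": 2}

total = 1
for n in readings.values():
    total *= n
print(total)        # 2**5 = 32

# Replacing 'friends' with an ambiguous word such as 'set' adds
# another two-way choice and doubles the count again:
print(total * 2)    # 64
```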
Exercises
Exercise 12.4: Work out a few more of the different interpretations of
this example.
Exercise 12.5: Justify the claims made about the number of ambiguities
above, and categorize them in terms of the kinds of ambiguity seen
in the previous chapter. Is there more ambiguity in this example than
we've claimed?
Exercise
Exercise 12.6: Construct other ambiguous sentences, and attempt to
compute just how many ways they are ambiguous. Note which of these
readings are plausible and which are absurd.
In the next chapter, we'll see other reasons for the implausibility of a model which
produces all syntactic analyses in the human case. In the meantime, let's
assess the current model to see what's good about it and what might be
wrong with it.
12.3 Modularity
The model in example 1 has at least one thing going for it: it's very simple.
In particular, we can design simple processes for parsing sentences,
for example, safe in the knowledge that the application of semantic rules
can't affect the operations of the parser.
As a consequence, it's easy to write a computer program to use as a
model, and to predict what the behaviour of the program will be.
In the jargon, this kind of organization is modular. In artificial
systems, modularity is a useful property; it allows one to be more certain
in one's predictions of the behaviour of some systems. To take one
example, consider an electric car clock. It must be connected to the
car electrics to work, but should be on a different circuit from the
ignition. After all, you don't want the clock to stop when the car is
turned off. In a well-designed car, the operation of the clock should not
affect the operation of the remainder of the car. You wouldn't want the
engine to rev when you advance the time. In general then, because the
relationship between the clock and the rest of the car is (or ought to be)
stable, changes in the clock shouldn't affect the behaviour of the rest
of the car. The only thing that should affect the clock is the complete
removal of power from the electrical system. On the other hand, the
braking system of a car is less strictly isolated from other systems, as a
result of the legal requirement that your lighting system indicates to road
users behind you that you have applied your brakes. When the brake
pedal is pushed, a circuit is closed to light the brake lamps, in addition
to various bits of hydraulics operating so as to bring the braking surfaces
into contact. According to the state of the brake pedal then, the state of
both the electrical and hydraulic systems changes. A consequence of the
more complex arrangement is that faults can be more complex and more
difficult to diagnose. It's possible to imagine a fault in the brake lamp
circuit which results in the engine cutting out when the brake pedal is
depressed (especially if the fuses fail to do their job). As the relationship
between two components in a system becomes more intimate, so it may
be more difficult to diagnose where a fault lies, and more difficult to
predict that, for example, replacing a particular component will cure a
fault.
As another example of a highly modular system, consider a
production line. At each step in the production process the output of the
previous step is altered in some way and passed on to the next step.
Such a system maximises throughput, by splitting the overall task up
into very simple steps, with consequent economic benefits, including
accurate estimates of the amount of time that it will take to produce one
unit of output. The system specified in example 1, in which syntactic
rules are used to construct a tree which is then passed along for
processing using the semantic rules, clearly falls under the production line
model. Where our system seems to break down is that the syntactic part
of the system has to do too much work: we can't guarantee that it will
finish within a reasonable time.
An example of a much less modular system is the production of a
book with multiple authors. As the text develops, one author may say
to another: I don't like what you've done here; rewrite it. It may be the
case that the authors have to negotiate amongst themselves in order to
reach a decision about the text. In a piece of text on which a group of
people have worked closely, it may be impossible to decide which author
is responsible for a certain piece of text.
Relative to the model of example 1, a less modular system would then
allow application of semantic rules to affect the operations of the parser.
A famous example of this is a program called SHRDLU. In processing
sentences which are potentially ambiguous, such as Put the red block
on the blue block next to the pyramid, SHRDLU was able to consult its
model of the world to check whether it could find a red block on a blue
block, or a blue block next to the pyramid, to work out which grouping
of words is appropriate.
As an engineering consideration then, modularity is a useful property
of systems, but strict modularity of the production line model may not
always be achievable. In the case of cars, the braking and electrical
systems have to be linked in some way in order to comply with the law.
In the case of artificial processing of human language, strict modularity
You'll recall that there are some translation rules (for example,
those for every and a) which interact: if one could apply one or other
of the rules, the order in which you apply them is significant, since the
different orders give rise to different conditions. So, the semantic rules
can themselves introduce ambiguity. In this case we can do the same
thing as we did for alternative syntactic rules: remember the other
choices of rule you could have made and be prepared to use the other
ones. In other words, we could use a backtracking processor which
remembers alternative rules for later investigation.
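One way to picture this is to enumerate the orders in which the interacting rules could apply; each order corresponds to one reading. The sketch below is purely illustrative (the sentence, the notation and the way readings are printed are our own assumptions, not the book's rules): it generates the two quantifier scopings for a sentence like Every man loves a woman.

```python
from itertools import permutations

# Each order of applying the translation rules for 'every' and 'a'
# yields a different quantifier scoping -- an illustrative sketch.
quantifiers = [("every", "x", "man(x)"), ("a", "y", "woman(y)")]

readings = []
for order in permutations(quantifiers):
    prefix = " ".join(f"{q} {v} [{res}]" for q, v, res in order)
    readings.append(f"{prefix} loves(x, y)")

for r in readings:
    print(r)
# every x [man(x)] a y [woman(y)] loves(x, y)
# a y [woman(y)] every x [man(x)] loves(x, y)
```

The first reading says every man loves some (possibly different) woman; the second says one particular woman is loved by every man.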
understanding of humans.
Once we have a model of grammar which can be manipulated by
a computer, there are many uses to which automatic processing of
language could be put. Even the simple grammar we've looked at so far
can be exploited by systems that perform various language-based tasks.
Applications which typically exploit rules of the kind we've seen so far
include database query, machine translation, and dialogue systems (e.g.,
tutorial systems). Other applications for language technology include:
text retrieval
question-answering systems
information extraction systems
command and control
prosthetics for the speech/hearing-impaired
text analysis
  - assessing readability
  - digesting
  - automatic categorisation and indexing
knowledge engineering and acquisition
  - inputting knowledge to and
  - explaining the reasoning of intelligent systems
computer-aided instruction
Exercise
Exercise 12.8: Add to the list above.
Exercise
Exercise 12.9: For some of these applications, it will be easier to
enforce these limitations. You might like to think about which.
one another. For example, many researchers use the Canadian Hansard
corpus to automatically acquire a probabilistic model of English-French
translations (this corpus consists of reports on parliamentary debates,
where each report is written in both English and French).
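The idea of acquiring a probabilistic model from aligned text can be illustrated with a toy calculation. The three sentence pairs and the crude co-occurrence estimate below are assumptions for illustration only; real systems trained on the Hansard corpus use far more sophisticated models.

```python
from collections import Counter

# Toy sentence-aligned "corpus" (hypothetical data, for illustration).
aligned = [
    ("the house", "la maison"),
    ("the dog",   "le chien"),
    ("a house",   "une maison"),
]

# Count how often each English word co-occurs with each French word
# in aligned sentence pairs, then normalise to get a crude estimate
# of P(french | english).
cooc = Counter()
totals = Counter()
for en, fr in aligned:
    for e in en.split():
        for f in fr.split():
            cooc[(e, f)] += 1
        totals[e] += len(fr.split())

def p(f, e):
    return cooc[(e, f)] / totals[e] if totals[e] else 0.0

print(p("maison", "house"))   # 'house' co-occurs with 'maison' in both pairs
```

Even this crude count prefers maison over chien as a translation of house, purely from the statistics of the aligned data.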
An obvious question that arises is: if such approaches are
technologically successful, then is this evidence that we should view human
processing as statistically based? There are a number of assumptions
that lie behind this question. It assumes for example that successful
technology diagnoses successful science.
Exercise
Exercise 12.10: Do you agree with this claim?
In what way does Einstein's opinion line up with the symbolic vs.
non-symbolic debate? What is your opinion on whether we can expect
to find laws that are strictly true or false? Where does your opinion
come from? What would you count as evidence for or against your
opinion? Are there other beliefs you hold that are closely related to
that opinion? How do these questions relate to the claim that "science
chooses its own problems"?
[Trees for steps 1, 2, 7 and 8 of the parse of Mary loves her dog: the VP
is first expanded as V0 and then, after backtracking, as V1 NP]
Figure 12.2
Steps of the parsing algorithm
13 Language in People
13.1 Ambiguity
One of the most striking things about human processing of language is
that it works so well in the face of ambiguity. In the course of speaking
and listening, people rarely notice ambiguities. On the other hand, if you
go looking for ambiguities, you nd them everywhere. As we saw in the
last two chapters, there are many potential sources of ambiguity, and
we have claimed that innocuous-seeming sentences may be ambiguous in
perhaps hundreds of ways. Still, in the task of processing language, we
humans don't normally get hung up on attempting to sort out which
interpretation to opt for.
How can we square these two perceptions? Is it the case that we've
manufactured our ambiguities by looking at language the wrong way? Or
can we demonstrate that at least some of these ambiguities are detected
and processed, even if we're unaware of that processing? We'll see that
the latter answer is the right one in at least some cases.
We can show that human beings consider at least two of the types of
ambiguity discussed in Chapter 11. We'll use data from an experiment
based on a technique called cross-modal priming: "cross-modal"
because it uses both the modalities of vision and hearing, and "priming"
because it makes use of the fact that, when processing a particular word,
say dog, related words (such as cat) can be retrieved from memory faster.
In that experiment, subjects were played a tape with a sentence on
it, and at some point shown some characters on a screen. They had to
indicate by pressing one of two buttons whether or not the characters
Exercises
Exercise 13.1: For each of the above cases, work out which word
causes the problem, and why. Rephrase the sentence so that the hiccough
disappears.
Exercise 13.3: Make up some garden path sentences of your own. Can
you see any variation in how strong the garden path effect is?
of rule is whether to group cotton and shirts or to group shirts are made
of. We seem to plump for the former, and to interpret the words are
made of as the main verb of the sentence. That is, when our parser
processes those words, it has already assumed that we have reached the
verb phrase within a sentence. We get stuck when we reach the real verb
in the sentence grows, and this is considerably after the point at which
the parser made the wrong choice.
So, the phenomenon of garden path sentences might be taken as some
evidence in favour of the view that humans compute a single syntactic
tree when processing language. This is clearly a tempting hypothesis: it
provides a very simple explanation of some striking data. The question
is: can we make it stick? We'll review some alternative explanations in
Section 15.5 below. For the time being, note the following examples, and
observe the way in which the sequence of words at the start of a sentence
(in this case Have the police), which is ambiguous, can be disambiguated
on the basis of the subsequent words:
(6) Have the police . . . eaten their supper?
come in and look around.
taken out and shot.
For example, in Have the police eaten their supper?, the police are the
subject of the verb have. But in Have the police come in and look around,
they are the object. We can also delay the point in the sentence where
this subject vs. object ambiguity is eliminated; for example, we could
replace . . . above to end up with the sentence Have the police who are
investigating the hideous murder come in and look around. It may be
that you `garden path' when you hear the word look. For some sentences,
though, that effect doesn't seem to be there. One way of eliminating
processing difficulty is to eliminate the ambiguity. For example, one
could eliminate the subject vs. object ambiguity in the phrase have the
police by replacing the police with they if the meaning required is
the subject one (e.g., Have they eaten their supper? vs. *Have them
eaten their supper?), or them if the meaning required is the object one. But
articulating sentences which generate no ambiguities is almost always
impossible, and even when it is possible it's often not helpful. This is
because, as these examples suggest, humans often expect to consider
multiple possible analyses, and are sometimes willing to `simultaneously'
entertain two different hypotheses about the organization of the sentence
is, it never computes more than a single tree. As noted above, this seems
to offer an explanation for the garden path phenomenon.
On the other hand, there is also contradictory evidence that the
human processor operates simultaneously on syntactic, semantic and
perhaps other aspects of the speech being processed. Some of the most
impressive evidence for this comes from experiments involving
shadowing.
Exercise
Exercise 13.4: Try the shadowing task yourself.
By tracking where people look in a particular scene, we can make
inferences, for example, about how much of a sentence has been processed.
In one such experiment, people are asked to move playing cards
around a table. The layout of the table might look something like
Figure 13.1. A possible instruction to a subject might be: move the six
[a layout of cards including a 4, the 6♣, the 6♥, the 3♦ and the K♣]
Figure 13.1
Cards on the table in an eye-tracking experiment
of clubs to beneath the three of diamonds. One result obtained from this
experiment is that, with this instruction and in a configuration like that
shown in Figure 13.1, the subject's gaze alights on the 3♦, just as, or
even just before, the word diamond is said.
One conclusion one might draw from this experiment is that, as with
shadowing, people can use their knowledge of the context to allow
early processing of semantic and discourse information. In this case, the
uniqueness of the 3 allows the subject to work out which card is being
referred to.
There are a number of open questions that these experiments raise,
however. For example, the tasks the human subjects are asked to perform
are `unnatural', in the sense that picking out a particular card from a
small set of different cards is not something one does every day (although
arguably we do closely-related tasks frequently). Furthermore, the task
is highly repetitive: subjects do the same task again and again in these
experiments, and so their performance improves with practice. To what
extent do these factors affect the conclusions that we can draw from the
experimental results? Right now, there are no clear answers.
13.5 Conclusions
The modularity of human language processing, the size of the
components involved and the extent to which those components may operate in
parallel all represent open research questions in this area. We can make
some tentative conclusions.
Exercise
Exercise 13.5: 'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe. Lewis Carroll, Jabberwocky
Think about Lewis Carroll's poem, and assess what it tells us about:
the notion of well-formedness in English (and other natural languages),
human abilities in interpreting utterances in terms of their form and
the overall process of arriving at a meaning for utterances.
13.6 Further Reading
There are some excellent textbooks on the market that cover either
psychology in general or cognitive psychology and psycholinguistics
in particular. For a general introduction to psychology consult the
following:
1. Gleitman, Henry, Alan Fridlund and Daniel Reisberg (1999) Psychology,
W. W. Norton, 5th edition.
Chapters 7, 8 and 9 of the above book cover issues concerning human
sentence comprehension and production which are discussed in this
chapter. But this book also includes information about other areas of
psychology, such as psychology in education and social psychology.
that route on this map. The features on each map may differ, and so the
two participants may have to talk about those features in order to work
out the routing. With this now in mind, consider this extract from the
corpus:
corpus:
(7) a. Neil: Right. Start from the sandy shore,
b. Chris: Okay.
c. Neil: moving down ... straight down.
d. Chris: How far?
e. Neil: Down as far as the bottom of the well.
f. Chris: I don't have a well.
g. Neil: Ah. Right, eh. Move down, eh, vertically
down about a quarter of the way down the page. Move to
the right in ... Do you have local residents?
h. Chris: I do.
i. Neil: Right, well, move up and round and above them.
j. Chris: Okay.
The NP the sandy shore is felicitous only if the context contains a unique
(salient) object that satisfies the description that it's a sandy shore. So
when Neil uses this phrase in (7a), he must be assuming that Chris
has one and only one sandy shore on his map; moreover, Chris must
know that Neil assumes this through observing that Neil has used this
NP. But Neil doesn't make this assumption explicit. He doesn't say You
have one and only one sandy shore on your map. This is all `hidden'
in his use of the sandy shore. Also, Chris has to work out what it is he's
supposed to start doing from the sandy shore, because Neil doesn't tell him
explicitly. Start running from there? Start cleaning up the environment
from there? Start shouting from there? Chris will assume Neil means
start drawing. Moreover, Neil knows Chris will assume this, because of
their shared knowledge about the purpose of the conversation, which
is to do the Map Task. So here we see that the background knowledge
people have about the world and each other influences the message that
they extract from an utterance.
We can go further. In utterance (7c), Chris interprets Neil to mean
moving down from the sandy shore. But Neil doesn't say this explicitly;
ingly, when can one use a pronoun to refer to something? And if there
is more than one possible referent for a pronoun, as there is in text (2),
for example, how do the different choices get ranked:
(2) John met Bill. He asked him a question.
Observe that although one could interpret this text so that it means
that Bill asked John the question, there seems to be a preference for
interpreting it so that John asks Bill the question.
Answers to these puzzles should ideally provide us with a systematic
procedure for finding the antecedents to pronouns. That way, we gain an
understanding of the systematic rules by which languages work. And we
also come closer to being able to get machines to understand pronouns
in the same way as humans do. Pronouns appear very frequently in text.
To convince yourself of this, just pick a newspaper article at random, and
compare the number of sentences that feature pronouns to the number
that don't. Now, to see how useful pronouns are as a communication
device, try to paraphrase that newspaper text so that all its pronouns
are removed, and observe how the result is odd and very difficult to
understand. Here is an example, taken from the BBC news website:
(3) a. Tony Blair praised his wife and accused the media of
"distortion" over its coverage of her dealings with fraudster
Peter Foster.
b. Tony Blair praised Blair's wife, and accused the media of
covering the wife's dealings with fraudster Peter Foster
in a distorted fashion.
Since pronouns are so frequent, computers must be able to handle them
if they are to communicate in human languages in a way that is useful
to humans.
Unfortunately, life's not simple, and easy strategies for finding
antecedents to pronouns don't work. Perhaps this is surprising, since people
are very good at working out what a pronoun refers to, and are hardly
even conscious of the quite complex reasoning that's involved. To see
that things are more complicated than they first appear, let's consider
what appears to be a simple and sensible strategy, and observe where it
falls short. The simple strategy is the following:
1. Look back for a noun phrase.
in (7d) has changed what he refers to, compared with (7a). In (7a),
he refers to the man who loved Mary which is introduced in the first
sentence; (7a) can be true when two men loved Mary, only one of whom
proposed to her. But in contrast, he in (7d) refers to every man that
loved Mary, and so it isn't true when two men love Mary, only one of
whom proposed to her. That's because (7d) expresses a regularity about
men that loved Mary: they all proposed to her. (7a) doesn't express
this. So what precisely are the linguistic constraints on what a pronoun
refers to? For a computer to deal successfully with these sentences, we
must devise a systematic way of building their semantic representations
and finding antecedents to pronouns, so that the difference in meaning
between (7a) and (7d) is captured, and the fact that (7a,d) are better
than (7b,c) is predicted.
Discourses (8) and (9) show that non-linguistic information
affects pronouns too.
(8) If a baby hates cow's milk, boil it.
(9) If an incendiary bomb falls near you, don't lose your head.
Put it in a bucket and bury it in sand.
It could refer to the baby in (8) and your head in (9), but with
unfortunate results!
Intuitively, the general knowledge that one shouldn't boil the baby,
or stick your head in the sand, influences our preferred interpretations of
(8) and (9). When there's a choice of what a pronoun might refer to, we
tend to prefer those choices that are `in tune' with general knowledge like
people don't boil babies, to those choices that ignore such knowledge. We
must therefore ensure that a computer which has to deal with pronouns
uses general knowledge in this way. This is actually very difficult to do,
given the current technology. Humans have a vast amount of knowledge
about the world. But computers don't, unless we tell them about these
what the magistrates did is described before the mention of her living
with her husband in (12), and it's the other way round in (10). This
makes it easier to interpret he. So arguably, there are some constraints
on nding antecedents to pronouns that are violated in (10), but aren't
violated in (12). This is why (12) sounds better.
Ideally, we would want a machine to produce texts like (12) rather
than (10), even though humans produce texts like (10) when they're in
a hurry! If a computer could do this, then it would be a useful tool in
assisting busy copy editors. It could spot the mistakes that journalists
miss, and suggest improvements. But in order to get a computer to do
this, it must be programmed with the appropriate linguistic constraints
on antecedents to pronouns.
In this chapter, we'll address the following question: Can the way
humans interpret pronouns be modeled in a mechanical way, so that we
can get computers to do it? We'll approach this question by exploring
how we might extend the grammar we constructed earlier in the course
(see Part IV), to model what pronouns mean. We will show that the
semantic representations of sentences we constructed earlier in the course
encode information that we need in order to specify linguistic constraints
on what a pronoun refers to, as illustrated in (6) and (7). We will demonstrate
this by extending the grammar to account for what's happening in these
texts. And we will discuss what kinds of mechanisms we would need,
in order to rank the possible antecedents to pronouns on the basis of
background knowledge, as shown in (8) and (9).
But this doesn't record what kinds of things (i.e., singular or plural,
masculine or feminine) the pronouns can refer to. The simplest way to
represent this information is to divide the category PRO into several
classes, according to the number and gender of the thing referred to, as
shown below (where "fem" stands for feminine, "masc" for masculine
c. PRO [num : sing, gender : neut] → it
d. PRO [num : plur] → they, them
the genders of the pronouns that it can co-refer with. So for simplicity, we
will approximate this by including an attribute gender on nouns as well.
This is to be interpreted, however, as the gender of the pronouns which
can co-refer with the noun. Thus we adopt the following rules in the
grammar (and these replace the rules involving nouns from Chapter 9;
see p.213, for example):
(18) a. N [num : sing, gender : fem] → woman, girl, doctor, lioness, dog,
ship, car . . .
b. N [num : sing, gender : neut] → table, chair, house, ship, bone,
car . . .
c. N [num : sing, gender : masc] → John, man, doctor, boy, dog, . . .
These rules reflect the fact that woman and Jane are singular and always
denote something feminine, table always denotes something neuter, man
always denotes something masculine, while ship can be viewed as either
neuter or feminine, and doctor denotes something masculine or feminine.
In what follows, we'll ignore they and them, since computing
antecedents to these plural pronouns is much more complex than for the
singular pronouns. This is because you can `sum together' things in the
context to form a plural group; this is what happens in (19) for example,
where Jane and Mary are grouped together and they refers to both of
them:
(19) Jane and Mary went to the cinema. They had a great time.
We don't want to discuss the constraints on this grouping process,
since they are very complex and not very well understood at present.
So instead, we'll concentrate on mechanising the process of finding
antecedents to he, him, she, her and it. We will also ignore indexical
NP
PRO [number : num, gender : gen]

becomes:

x
num(x)  gen(x)
x = ?
NP
PRO [number : num, gender : gen]

becomes:

x
num(x)  gen(x)
x = ?

where x hasn't been used yet.
The box at the bottom contains instructions to do more processing. It's
demarcated from the other elements of meaning. The fact that there
is something in the instruction box for sentences involving pronouns
NP                              VP
PRO [num : sing, gend : fem]    V0
she                             barked
By the rule Pronouns (Revised), we can form the following from this:
x    VP
     V0
     barked
sing(x)
fem(x)
x = ?
You can then reduce the rest of the syntactic tree to semantic
information as we've done before (see Chapter 10), so that the final semantic
representation of (20) is (20′):

x
sing(x)  fem(x)
(20′)
bark(x)
x = ?

In words, (20′) states: there is just one thing talked about, namely x,
where x is singular and feminine, and x barked, and we have to find out
who x is.
Note that our extension to the syntax of the grammar has enabled
us to explicitly encode the number and gender of discourse referents
in the semantics. In the above example, this information is attached
to the discourse referent x that was introduced when the bit of the tree
containing PRO was translated into semantic information. And we'll use
this to check which antecedents to this pronoun are acceptable.
We can check the number and gender of the discourse referent
introduced by the pronoun against other discourse referents introduced
by nouns, because these discourse referents will also have number and
gender conditions on them. These number and gender conditions will
be produced when the noun is translated into semantic information. We
NP
Det      N [number : num, gender : gen]
a(n)     name

becomes:

x
name(x)
num(x)  gen(x)
So, for example, a bone will produce a discourse referent y with the
conditions bone(y), sing(y) and neut(y) in the DRS by the above rule.
This information helps us work out when the bone can be an antecedent
to pronouns that appear subsequently in the text (if there are any), on
the basis of whether its number and gender match those of the pronoun.
If the semantics didn't include information about number and gender,
then we would basically be back to the problems we had with the rule
for pronouns given in earlier chapters; we wouldn't be able to stop the
grammar from identifying she with a bone, for example.
We must carry out the instruction x =? in (20′). That is, we must
work out what x refers to. This involves stating rules which help us
to identify x with a discourse referent y, where y has already been
introduced into the DRS. One of the constraints on this identification
procedure is that the properties of x and y are consistent. In other
words, the things already said of x (including its number and gender)
and those of y are such that if we assume x and y refer to the same
thing, we don't get a contradiction.
We illustrate how this works by means of an example:
(6) a. Etta found a bone. She barked.
To build the DRS of a text like (6a), we apply the following algorithm.
Steps 1-3 and 5 are procedures we've already assumed in the grammar
for translating text into DRSs (see Section 10.2). Step 4 is the extension,
and it tells us how (and when) to carry out instructions of the form x =?.
Algorithm for Constructing DRSs for Text:
1. Start by assuming that the first sentence of the text is the one
currently being processed. Go to step 2.
2. Use the grammar to construct a syntactic structure of the sen-
tence currently being processed. Add it to the DRS being built. If
there is more than one such syntactic structure, then construct a
separate DRS for each of these analyses (to reflect the ambiguity),
and continue with the steps in the algorithm for each of these DRSs.
Go to step 3.
3. Apply the rules which convert the syntactic structure in the DRS
into semantic information. Go to step 4.
4. Deal with instructions of the form x =?. Replace x =? in the
instruction box with a condition of the form x = y , which is added
to the main part of the box. This identity condition must meet the
following:
(a) y is a discourse referent in the DRS built already.
(b) The choice of y is constrained by the Consistency Con-
straint, Structural Constraint and Knowledge Con-
straint, which we will specify below.
If there's a choice of identity conditions, then each option is repre-
sented in a separate DRS (that is, the text is assumed to be am-
biguous). If there's no choice at all, then the procedure for building
a DRS for the sentence has failed, and so the sentence is predicted
to be odd. If it succeeds, go to step 5.
5. Make the next sentence of the text, if there is one, the sentence
that is currently being processed and go back to step 2. If there are
no more sentences in the text, then stop.
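The five steps can be sketched in code. Everything below is a toy of our own devising: the grammar (steps 2 and 3) is stubbed out by a lexicon that maps each sentence straight to its semantic contribution, so only step 4, carrying out the x =? instructions, is really implemented. For simplicity the sketch takes the first acceptable antecedent instead of forking a separate DRS per choice, and the Consistency Constraint is reduced to number and gender agreement.

```python
NUMBER = {"sing", "plur"}
GENDER = {"fem", "masc", "neut"}

def consistent(conditions, z, y):
    """Consistency Constraint (number/gender only): can z and y corefer?"""
    def feats(v):
        return {c[0] for c in conditions
                if len(c) == 2 and c[1] == v and c[0] in NUMBER | GENDER}
    merged = feats(z) | feats(y)
    return len(merged & NUMBER) <= 1 and len(merged & GENDER) <= 1

def build_drs(sentences, lexicon):
    referents, conditions = [], []
    for sentence in sentences:                     # steps 1 and 5: sentence loop
        refs, conds, pronouns = lexicon[sentence]  # steps 2 and 3, stubbed
        for z in pronouns:                         # step 4: empty the instruction box
            candidates = [y for y in referents
                          if consistent(conditions + conds, z, y)]
            if not candidates:
                raise ValueError(f"{z} =? cannot be carried out: text predicted odd")
            conditions.append(("=", z, candidates[0]))
        referents += refs
        conditions += conds
    return referents, conditions

# (6a): "Etta found a bone. She barked."
lexicon = {
    "Etta found a bone.":
        (["x", "y"],
         [("etta", "x"), ("sing", "x"), ("fem", "x"),
          ("bone", "y"), ("sing", "y"), ("neut", "y"),
          ("found", "x", "y")],
         []),
    "She barked.":
        (["z"], [("sing", "z"), ("fem", "z"), ("bark", "z")], ["z"]),
}
refs, conds = build_drs(["Etta found a bone.", "She barked."], lexicon)
print(("=", "z", "x") in conds)  # -> True: she is Etta, not the bone
```

The bone's referent y is rejected as an antecedent for z because fem and neut clash, which is exactly the check the Consistency Constraint performs in the worked example below.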
The syntactic structure of the first sentence is:

    S
      NP
        PN[num: sing, gen: fem]
          Etta
      VP
        V1
          found
        NP
          a bone
Note that so far, there is nothing in the instructions box; in other words,
nothing in the first sentence requires us to go and find out more from
preceding text, about what the sentence is about, before we can work
out if it's true or not.
Now we go back to step 2 and process the second sentence. Note that
the syntactic structure for the second sentence goes into the DRS you've
built for the first one, reflecting the fact that it continues the discourse
that the first sentence started. You should not start a new DRS!
    (22)
    x, y
    etta(x)   sing(x)   fem(x)
    bone(y)   sing(y)   neut(y)
    found(x, y)
      S
        NP
          PRO[num: sing, gend: fem]
            she
        VP
          V0
            barked
Now we're on step 3 again. Using the same rules as before, the above
reduces to the following semantic information:
    (23)
    x, y, z
    etta(x)   sing(x)   fem(x)
    bone(y)   sing(y)   neut(y)
    find(x, y)
    sing(z)   fem(z)
    bark(z)
    --------
    z =?
Now we are on step 4. We have an instruction: z =? (z corresponds to
she). We have to replace z =? with an identity condition between z and
some discourse referent already there. There are two discourse referents:
x and y. By the Consistency Constraint, we can't assume z = y,
because y is neuter and z is feminine. (This contrasts with the rule for
pronouns given in earlier chapters.) So we must assume z = x. So we
remove z =? from the instruction box (because we have carried out this
instruction), and replace it with z = x in the main semantic content:
    (24)
    x, y, z
    etta(x)   sing(x)   fem(x)
    bone(y)   sing(y)   neut(y)
    find(x, y)
    sing(z)   fem(z)
    bark(z)
    z = x
Here, we see the advantage of explicitly marking the number and gender
of pronouns in the semantics. If we didn't know that z was singular and
feminine, as asserted by sing(z) and fem(z), then we wouldn't have
known that identifying z with y would have been wrong.
The DRS (24) is equivalent to (25), where all occurrences of z are
replaced with x, and we've removed z = x from the conditions:
    (25)
    x, y
    etta(x)   sing(x)   fem(x)
    bone(y)   sing(y)   neut(y)
    find(x, y)
    bark(x)
The DRS (25) asserts what intuitions would dictate: there are two things
we're talking about, x and y, where x is Etta, and x found y, which is
a bone, and x barked. There are no more sentences in (6a), and so we
stop there. (25) is the final representation of what (6a) means, and even
though (6a) didn't explicitly say Etta barked, the DRS that represents
its meaning has captured this.
Now we can see why (6b) is odd. Using the above algorithm, we get
the semantic representation (6b′).
    (6b′)
    x, y, z
    etta(x)   sing(x)   fem(x)
    bone(y)   sing(y)   neut(y)
    find(x, y)
    sing(z)   masc(z)
    bark(z)
    --------
    z =?
There is no discourse referent that we can identify z with in (6b′)
without violating the Consistency Constraint, and so the procedure for
building the DRS fails: (6b) is predicted to be odd.
Things get more interesting when a text contains two pronouns:

(26) Etta chased Pip. She caught him.

[Figure 14.1: Adding a syntax tree to the DRS for (26)]
As we'll see, the above rule for translating pronouns means that the two
pronouns she and him in (26) produce one instruction each. The analysis
is as follows. First, we translate the first sentence Etta chased Pip. We
have translated this many times before, and so I simply give the final
DRS for this sentence below:
    (26a)
    x, y
    etta(x)   sing(x)   fem(x)
    pip(y)    sing(y)   masc(y)
    chase(x, y)
The conditions sing(x) and fem(x) are produced by the fact that Etta is
classified as a singular feminine PN. Similarly for Pip. Now we deal with
the second sentence. First we add its syntactic structure to the DRS
(26a), to produce the DRS in Figure 14.1. We have a choice of rules that
we could apply at this stage. We could either apply the pronouns rule
to the NP under the S node (i.e., to translate she), or we could apply it
to the NP under the VP node (i.e., to translate him). Here, we'll take
the latter choice. This produces the DRS in Figure 14.2. Note that the
pronouns rule has introduced something into the instruction box, and has
added some number and gender conditions to the conditions box.

[Figure 14.2: Applying the pronoun rule to the DRS for (26)]
Now we have a choice of rules to apply again. We could still apply
the pronouns rule to translate she, or we could use the rule for transitive
verbs to translate the VP node. Here, we'll use the pronouns rule first,
to produce the DRS in Figure 14.3. This second application of the
pronouns rule has added a second instruction to the instruction
box. But we can't carry out these instructions yet. We have to translate
the rest of the tree first. The only rule that we can apply at this stage
is the one for translating transitive verbs. This produces the DRS in
Figure 14.4. And now we apply the sentence rule. This tells us to
substitute u with w to produce the DRS in Figure 14.5. Now we have to
empty the instruction box. There are two instructions we have to deal
with. Step 4 in the algorithm for building DRSs doesn't stipulate which
instruction we should do first when there is a choice, as there is here. So
we could do the instructions in either order. We'll do z =? first. By the
Consistency Constraint, z = y is the only choice, because y is the only
other discourse referent that's masculine. So we remove z =? from the
instruction box, and add z = y to the conditions box, to produce the
DRS in Figure 14.6. Now we have just one instruction left. Again, by the
Consistency Constraint, w = x is the only choice, because x is the only
other discourse referent that's feminine. Carrying out this instruction
produces the DRS in Figure 14.7, which is equivalent to (26′):
[Figure 14.3: Applying the pronouns rule again to the DRS for (26)]
[Figure 14.4: Applying the rule for transitive verbs to the DRS for (26)]
[Figure 14.5: Applying the sentence rule to the DRS for (26)]
[Figure 14.6: Carrying out the instructions in the DRS from Figure 14.5]
    (26′)
    x, y
    etta(x)   sing(x)   fem(x)
    pip(y)    sing(y)   masc(y)
    chase(x, y)
    catch(x, y)

[Figure 14.7: Carrying out the last instruction in the DRS for (26)]
We've left out conditions stating that man is singular and masculine,
that Mary is singular and feminine, and the number and gender of the
pronouns, for the sake of simplicity. But bear in mind that the conditions
are really there.
(7) a. A man loved Mary. He proposed to her.
    (7a′)
    w, x, y, z
    man(w)
    mary(x)
    loved(w, x)
    proposed(y, z)
    --------
    y =?
    z =?
(7) c. Every man loved Mary. ?He proposed to her.
    (7c′)
    x, y, z
    mary(x)
    [ w | man(w) ]  ⇒  [ loved(w, x) ]
    proposed(y, z)
    --------
    y =?
    z =?
There are two things to note about (7c′). First, the semantic conditions
arising from he proposed to her are in the top level box. This is because
the syntactic tree for the sentence is introduced at this level, to reflect
the fact that it's not within the scope of every man, which was in the
previous sentence.
Second, note that the object x which bears the name Mary, and the
condition mary(x), have both been moved into the top level box, rather
than appearing in the lhs box of the ⇒-condition. This is a refinement
of the rule for translating proper names which was given earlier in the
course (see Section 10.2). We shall see why we make this refinement in
chapter 15. But briefly, the motivation for promoting x and mary(x) to
the top box is the following. It reflects the fact that in whatever context
one uses the proper name, one asserts that someone exists who bears
that name. So, for example, the sentence If Mary talks, she walks does
not imply that Mary actually does walk, but it does imply that
Mary exists. If x and mary(x) were in the lhs box of the ⇒-condition
for this sentence, as the grammar currently predicts, then the DRS would
fail to reflect this. This is because the DRS for the sentence can be true
even when all the things in the lhs of the ⇒-condition are false (cf.
the four card problem). By promoting the translation of Mary out of
the lhs DRS of the ⇒-condition, and putting it into the top level, we
ensure that our analysis of the sentence captures the implication that
Mary exists. We'll see in detail how this promotion of the proper name
to the top level DRS occurs in chapter 15. But it's important to bear
in mind now that this happens, because we'll show below that it affects
what a pronoun can refer to.
It is interesting to note that in (7a′), the antecedents we want for the
pronouns are available. Carrying out the instructions in (7a′) and (7c′)
as far as the constraints allow produces (7a″) and (7c″):
    (7a″)
    w, x
    man(w)
    mary(x)
    loved(w, x)
    proposed(w, x)
    (7c″)
    x, y
    mary(x)
    [ w | man(w) ]  ⇒  [ loved(w, x) ]
    proposed(y, x)
    --------
    y =?
(7a″) captures the natural interpretation of (7a). In (7c″), by contrast,
we can't resolve y: the only masculine discourse referent, w, sits inside
the ⇒-condition, and the Structural Constraint makes it inaccessible
from the top level box. The same point applies to DRSs that are
structured like (7c″), except that ⇒ is replaced with something else.
Intuitively, if you utter A, then B, a pronoun in A
shouldn't refer to stuff in B, because you haven't said B yet. This would
be different if you said, B, if A. But we're ignoring this ordering for the
purposes of this course.
The Structural Constraint makes the right predictions for (7a), (7b)
and (7c). It also deals effectively with (7d).
(7) d. If a man loved Mary, he proposed to her.
The grammar produces the following representation of (7d): first the
rule which converts if...then to semantic information is applied:

    (7d′)  [A man loved Mary]  ⇒  [he proposed to her]
Then the two sentences are reduced to semantic information in the usual
way:
    (7d″)
    x
    mary(x)

      w                          y, z
      man(w)            ⇒        proposed(y, z)
      loved(w, x)                --------------
                                 y =?   z =?
We must now carry out the instructions z =? and y =?. z =? can be
replaced with z = x, because x is introduced in a box which contains the
box where this instruction is, and so it meets the Structural Constraint.
Moreover, replacing y =? with y = w also satisfies the Structural
Constraint. And no other choices will do, because by the Consistency
Constraint, the number and gender information on the antecedents
and pronouns must match. Identifying the discourse referents this way
produces the following:
    (7d‴)
    x
    mary(x)

      w                 ⇒        proposed(w, x)
      man(w)
      loved(w, x)
The DRS (7d‴) is true only if the following holds: if a man w loved x
(i.e., Mary), then w proposed to x. In other words, every man that loved
Mary proposed to her. So (7d) is predicted to be false if there were two
men that loved Mary, but only one of them proposed to her. This is in
contrast to the conditions that make (7a) true. So the way information
is presented, namely whether it's in an if-phrase or not, affects how to
interpret the pronoun, and hence what the overall sentence means.
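The accessibility pattern behind the Structural Constraint can be sketched as follows. The representation is ours, not the book's: a pronoun's position is described as a path of boxes from the outermost DRS inwards, where a step into the consequent of a ⇒-condition also opens up the antecedent box. Anything off that path is invisible, which is exactly why w in (7c″) is out of reach while w in (7d″) is not.

```python
def accessible(path):
    """Referents a pronoun may take as antecedent, given the boxes on the
    path from the outermost DRS down to the box holding the pronoun.
    Each step is ("in", box) or ("consequent-of", antecedent_box, box),
    where a box is just its list of discourse referents."""
    refs = []
    for step in path:
        if step[0] == "in":
            refs += step[1]
        else:  # consequent of a =>-condition: its antecedent box counts too
            refs += step[1]  # the antecedent box
            refs += step[2]  # the consequent box itself
    return refs

# (7d''): "he" sits in the consequent of [a man loved Mary] => [...]
top_7d, ant_7d, cons_7d = ["x"], ["w"], ["y", "z"]
path_7d = [("in", top_7d), ("consequent-of", ant_7d, cons_7d)]
print("w" in accessible(path_7d))   # -> True: he can be bound to the man

# (7c''): the second sentence sits at the top level; w is buried in the
# =>-condition, so it never appears on the pronoun's path.
top_7c = ["x", "y", "z"]
path_7c = [("in", top_7c)]
print("w" in accessible(path_7c))   # -> False: he can't reach w
```

The asymmetry falls out of the path structure: boxes can look outwards and (for consequents) sideways into their antecedent, but never downwards into embedded boxes.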
    (8′)
      x, y                       z
      baby(x)                    boil(z)
      cow's-milk(y)     ⇒        -------
      hates(x, y)                z =?

This example shows that our ability to reason with uncertainty actually
affects the way utterances are interpreted. In other words, this reasoning
affects what's communicated.
The Knowledge Constraint could be rephrased as follows: when the Con-
sistency and Structural Constraints yield a choice of ways of interpret-
ing a pronoun, prefer the interpretation that denotes the situation with
the highest relative probability (relative, in this case, to the
other choices for interpreting the pronoun). But using probabilities here
to specify the Knowledge Constraint begs questions. What is the rele-
vant base rate information on which to evaluate the probabilities of the
alternative readings of a sentence? How do we rank the probabilities of
the alternative interpretations? And what should those probabilities be
conditioned on (in other words, what factors in the discourse context
should affect our probability calculations)? Further, how big a differ-
ence in probabilities do we need between the situations described by the
alternative interpretations, for humans to disambiguate the pronoun in
favour of one interpretation over the other?
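Just to make the probabilistic rephrasing concrete, here is a deliberately naive sketch of our own. The scores are invented for illustration; the open questions above (base rates, conditioning factors, how big a gap is needed) are precisely what this toy ignores by hard-coding plausibilities and taking a simple argmax.

```python
def knowledge_constraint(candidates, plausibility):
    """Among antecedents surviving the Consistency and Structural
    Constraints, prefer the one whose reading is most plausible."""
    return max(candidates, key=lambda c: plausibility[c])

# The two readings of "boil it" in (8): boiling the baby vs. the milk.
plausibility = {"the baby": 0.01, "the milk": 0.99}   # invented numbers
print(knowledge_constraint(["the baby", "the milk"], plausibility))
# -> the milk
```

Even this caricature shows where the hard work lies: everything interesting is hidden inside the plausibility table, which is exactly what a real theory would have to derive from world knowledge and discourse context.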
Another way of tackling the problem of formalising the Knowledge
Constraint is to use logic rather than probabilistic reasoning. There are
logics for reasoning with uncertainty that have been developed in AI to
model how humans conclude things on the basis of incomplete evidence.
These logics often model the same inference patterns as Probability
Theory. However, they approach the problem in a different way. These
logics exploit rules that symbolise the meaning of the various factors
involved in the inference. They don't involve computation with numbers,
like probability theory does. Rather, these logics supply interpretations
to rules such as (28), which are characterised by the fact that they have
exceptions (e.g., penguins are an exception to this rule), and they provide
the means to reason with these rules:

(28) Birds fly.
Representing general knowledge like (28) in explicit ways, so that a
computer can simulate the reasoning that humans do, is currently a
hot topic in AI.
But the amount of knowledge a computer needs to know is vast: just
think of the size of an encyclopaedia, and that only states a small fraction
of the kinds of things humans know. Encyclopaedias don't include
information like (28), which underlies the preferred interpretation of (8).
Representing all general knowledge in a logical manner that a computer
can understand is still an open research problem.
Furthermore, as we mentioned, even if one does know that people
don't boil babies, it's still unclear exactly how this rule leads us to
disprefer the reading of (8) where it is the baby. On what basis does
this happen? Sentence (8) doesn't actually stipulate that someone boils
a baby, even if it is interpreted as the baby (because boil it is in
the consequent of a conditional, i.e., of an if. . . then sentence). So one
cannot assume that the knowledge that people don't boil babies leads
us to disprefer resolving it to the baby. Rather, reasoning about what
the speaker intended to communicate, as well as reasoning with world
knowledge, is important. Presumably, we conclude that there was a very
small chance that the speaker meant boil the baby, even as a consequent
of the conditional (and this assumption about what the speaker intended
to communicate must be related to the fact about world knowledge
that people don't boil babies); the chances that the speaker meant boil
the milk are much higher. But how we reach such conclusions is as yet
not very well understood, because it involves reasoning about what the
speaker was thinking, as well as reasoning about the world. Reasoning
about what a person thinks is notoriously difficult, since we don't have
direct access to this information, but rather must infer his thoughts
through observing his behaviour. These problems are active research
topics.
Encoding the knowledge one needs, together with the necessary
inference mechanisms over it, in a way that would make the Knowledge
Constraint an effective contributing factor to interpreting pronouns is
therefore a formidable, and still unsolved, task.
A further limitation is that the DRSs we produce with our grammar are
flatter than they should be. They leave out a lot of information which,
in the case of (30), is important for constraining anaphora. In particular,
our DRSs ignore the rhetorical connections between the sentences: the
fact that the topic of conversation in (30) is the three claims made in
court, and that (30a-c) describe the three claims individually. If you add
this information to the semantic representation, then the structure
changes, in such a way that the Structural Constraint as it stands gets
the right answers about what the pronoun it can refer to.
Finally, our analysis of (7d) is inadequate in the general case. Our
grammar currently predicts that when a pronoun in the then-statement
refers to something in the if-statement, then the pronoun refers to
every object that satisfies the if-statement. But this isn't always true.
Sometimes, it refers to just one such object. This is what happens in
(31):

(7) d. If a man loved Mary, he proposed to her.
(31) If Pedro has a dime in his pocket, he'll stick it in the parking
meter.

Because of our world knowledge about the way parking meters work, we
don't interpret (31) as Pedro stuffing every dime he's got in his pocket
into the meter. Rather, we assume that he puts just some of the
dimes in. This contrasts with (7d), where the pronoun he refers to all
men that loved Mary. Given the way we've constructed the semantic
representation of noun phrases such as a dime and a man when they're
in an if-statement, we can't differentiate the interpretations of (7d) and
(31). The natural interpretation of (31) is currently actually blocked by
the grammar. And even if we were to specify the Knowledge Constraint
more explicitly, we wouldn't be able to predict that Pedro puts only
some dimes from his pocket in the meter and not all of them, because
the semantic representation blocks this from being a choice in the first
place! Rather, the grammar predicts that it must refer to all the dimes
in Pedro's pocket, or to none of them at all. Finding a way to solve this
is currently an open research question.
Exercises
Exercise 14.1: There are many expressions in English that receive
some of their meaning from the discourse context in which they appear.
Pronouns are just one example.
1. Write down four words or phrases which, like pronouns, require
knowledge about the context for their interpretation.
2. Demonstrate that these words and phrases can mean different things
in different contexts, by means of example texts.
3. Try and specify exactly what things in the context determine the
meaning of these four expressions. What are the constraints on the
effects that context can have on their meaning? Do they differ from
the constraints we've given for pronouns? If so, how?
Exercise 14.2: The account of pronouns we've given here predicts that
if an individual is introduced in a text by his proper name, then he can
subsequently be referred to with a pronoun, regardless of the length and
nature of the intervening material between the proper name and the
pronoun. Explain why the account commits us to this. Do you think
this prediction matches the facts about the way we use pronouns? If so,
why? If not, why not?
15 Presuppositions

Consider sentence (1):

(1) John's wife took an aspirin.

Sentence (1) presupposes that John has a wife. The things that trigger presupposi-
tions, such as the possessive 's (e.g., John's wife presupposes John has a
wife) and the definite article the (e.g., the man presupposes there is
a man) are known as presupposition triggers. How do we distinguish
the propositions which are taken for granted from those which are not?
And what role do presuppositions play in communication?
We have already contrasted two things that are implied by sentence
(1): that John has a wife on the one hand, and that someone took an
aspirin on the other. The latter proposition is an entailment of the
sentence (1). A necessary condition for being an entailment of a sentence
is that it must be true whenever the sentence is true. That someone took
an aspirin must certainly be true for the sentence (1) to be true, for
example. One of the uses for the grammar we introduced earlier in the
course (see part IV) is that you can compute very quickly some of a
sentence's entailments, via its DRS or logical form: just delete some
conditions in the DRS, and you have something that is entailed by the
sentence. As an exercise, assure yourself that you know why this works.
Nevertheless, thinking simply about the truth conditions of the sentence
itself isn't sufficient for capturing what we see intuitively as a difference
between the propositions that John has a wife and that someone took
an aspirin in our example, since they both have the property that they
must be true for the sentence (1) to be true.
Presuppositions are different from semantic entailments in two im-
portant ways. First, the most notable difference is their tendency to
project from embeddings. What this means is: even if a presupposition
trigger is embedded within a conditional (i.e., it's preceded by an if-
clause), a modal (i.e., it's preceded by a phrase like it's possible that...)
or a negation (i.e., it's within the scope of not), the presupposed material
can behave as if it were not embedded at all, in that it's implied by the
whole complex sentence. Semantic entailments don't project from em-
beddings in this way. For example, (1) entails someone took an aspirin.
But as we observed earlier, adding not to (1), to produce (2), results
in a sentence where this entailment doesn't survive: (2) doesn't entail
someone took an aspirin.

(2) John's wife did not take an aspirin.

But it does still follow from (2) that John has a wife, just as it did from
(1). Here we see a presupposition surviving when we insert not into the
sentence; entailments don't survive in this way.
Here are further examples that serve to illustrate how presupposi-
tions are different from entailments:

(3) a. The king of France signed a peace treaty.
b. The king of France didn't sign a peace treaty.
c. If the king of France signed a peace treaty, then all
disputes have been settled.
d. It's possible that the king of France signed a peace treaty.

The sentences in (3) all imply there is a king of France, but only (3a)
implies that someone signed a peace treaty. That there is a king of France
is presupposed because all these sentences imply it, even though
the king of France is embedded in conditionals, modals and negation.
That someone signed a peace treaty is a semantic entailment of (3a):
observe how (3b-d) don't entail that someone signed a peace treaty. So,
semantic entailments are distinguished from presuppositions in that the
latter can be implied by the sentence even when the thing that triggers
them is embedded, whereas entailments lack this `projection' property.
Sentences can in fact have several presuppositions, induced by several
different triggers.
Exercises
Exercise 15.1: Write down which expressions in the following sentences
are presupposition triggers, and also what they presuppose:
John knows when Bill arrived at the party.
John managed to forget the keys to his car again.
Hint: list propositions that are implied by these sentences (e.g., John
forgetting to do something implies that John intended to do it), and test
whether this proposition is implied by the negated form of the sentence
as well.
Modelling Presuppositions
We must model the way presupposition triggers like the and manage
affect meaning; in particular, the way they make certain propositions
into potential presuppositions, which are in turn propositions that may
be taken for granted in the utterance containing the presupposition
trigger (whether a potential presupposition is an actual presupposition
or not depends on the content of the utterance and its discourse context,
as we'll shortly see).
People can use presuppositions to good effect in communication.
First, people can save time through using them, because they allow you
to get a message across, even when part of that message is left unsaid.
2. Actually, testing for presuppositions is a matter of controversy, and there is not
even a universally agreed upon technical definition of presuppositions within the
literature. But to keep matters simple, we will assume that this test is always reliable
enough to distinguish those implications that are presupposed from those that are
not.
So, for example, you can say (7) instead of the more cumbersome (14):

(7) All of John's children are bald.
(14) There is someone called John, and he has children, all of whom
are bald.
Secondly, politicians and lawyers exploit presuppositions all the time
in order to convey what might be quite controversial as if it were taken
for granted (and hence not controversial at all). A senator asking Alan
Greenspan (15) rather than (16) is using language to much greater
effect:
(15) When did you realise that you had screwed up the economy?
(16) Did you screw up the economy?
When is a presupposition trigger, and so (15) presupposes (17). Indeed,
realise is a presupposition trigger too, and (17) presupposes (18), and so
both (17) and (18) are presuppositions of (15):

(17) You realised that you had screwed up the economy.
(18) You screwed up the economy.

So (15) `takes for granted' that Greenspan screwed up the economy and
realised this, in contrast to (16). Greenspan is put on the defensive: if he
is to refute the presuppositions in (15), a response like (19) would
be necessary:
(19) But I haven't screwed up the economy.
Presupposition triggers are ubiquitous in communication. Just pick a
sentence in a newspaper article, and count the number of presupposition
triggers and corresponding presuppositions! You will probably get more
than one of them in every sentence that you pick. Think about the
following sentence:

(20) Luke Skywalker regretted finding out that Darth Vader was
his father.

This sentence doesn't sound strange at all. And yet there are no fewer than
five presupposition triggers in it, and so there are five non-negotiable
assumptions. Can you find them? Try the above test involving negation,
In this context, the definite description the fat man doesn't mean there is
a unique fat man in the whole world, as Russell's analysis would predict.
Rather, this description serves to refer back to one of the men mentioned
earlier, and it does so successfully, because we know that one and only
one of these two men was fat. The uniqueness aspect of the meaning of
the fat man isn't that a fat man is unique in the whole world; rather, we
must try to specify that it's unique in the relevant situation, as described
by the discourse context.
These examples show that we must modify Russell's semantics, so
that definite descriptions are linked in the right way to objects that
have already been mentioned in the context. Sometimes we must ensure
that the definite description refers to an object that was referred to
earlier. The rule for translating the that we gave earlier in the course
did this. But as mentioned earlier, there are problems with this rule, in
that it gives the wrong analysis of (21). So we must change this rule.
We must automatically predict in any given discourse context when a
definite description refers to something already mentioned, and when it
introduces something new. We need to do this in a systematic way,
so that a computer can do it. This is a prerequisite to a computer
understanding texts that have definite descriptions in them.
In the next section, we'll propose how one might modify the seman-
tics of definite descriptions, so that they link to context correctly. We
review the proposal that presuppositions in general, not just definite
descriptions, behave like pronouns. Since the semantics of pronouns
already links them to the discourse context, this gives us a starting
point. Compare:
(24) a. Jack has children and all of {Jack's children / them} are bald.
     b. If Jack has children, then all of {Jack's children / them} are bald.
     c. Either Jack has no children or all of {Jack's children / them} are bald.

(25) a. John failed his exams and he regretted {that he failed his exams / it}.
     b. If John failed his exams then he regretted {that he failed his exams / it}.
     c. Either John didn't fail his exams or he regretted {that he failed his exams / it}.
One proposal in the literature is the following: presuppositions are like
pronouns, but they have more semantic content. In other words,
just as we have to find antecedents to pronouns, we should also find
antecedents in the context to bind presuppositions to. The difference
is that whereas something like them tells us very little about what is
being referred to (merely that whatever it is is plural), something like
Jack's children gives a lot more information (what's being referred to
are human non-adults, offspring of some adult male called Jack, and so
on). When one can't find an antecedent to a pronoun in the discourse
context, the utterance sounds odd:

(26) ?All of them are bald.

In contrast, presuppositions can be felicitous even when there is no
suitable antecedent in the context. In such cases, language users
simply add the appropriate content to the context. So, for example,
suppose that the hearer of (1) doesn't know that John has a wife:

(1) John's wife took an aspirin.

The hearer doesn't get confused because the speaker, in using (1), has
taken for granted that John has a wife. Rather, the hearer infers that
the speaker assumes he knows or will accept that John has a wife.
Accordingly, he adds the information that John has a wife to the context,
and then processes the semantic information that she took an aspirin in
the usual way. This process of adding information to the context so as
to ensure that an utterance is felicitous is known as accommodation.
Intuitively, accommodation is a way of giving the speaker the benefit
of the doubt: the hearer adds what's necessary to the context to ensure
that what the speaker uttered `made sense', or isn't odd in any way. Of
course, there will be occasions when the hearer cannot do this, because
it would conflict with knowledge he already has. For example, he cannot
accommodate the presupposition that John has a wife if he already
knows that John is unmarried. In this case, accommodation would fail,
and the hearer may use his next turn in the conversation to indicate this;
e.g., by uttering something like But John hasn't got a wife. In general,
accommodating antecedents to pronouns isn't possible. One might
speculate that this is because pronouns don't carry enough semantic
content for the hearer to really know what he is supposed to add to his
model of the world, although a closer look at the data indicates that
the situation is not clear cut, and the extent to which one can
accommodate antecedents to pronouns is in fact a matter of great
controversy in the literature (see Section 15.7). But by and large,
accommodation is what differentiates presuppositions from pronouns.
This difference notwithstanding, the similarities between pronouns and
presuppositions are striking: in both cases, one is attempting to bind
them to antecedents in the context.
This similarity between presuppositions and pronouns, and their differences,
isn't surprising when one thinks of the phenomena in terms of
given and new information. The given information is stuff that isn't
(b) He will not already have the new information attached to that
antecedent (because otherwise it wouldn't be new!).
So, processing given information (in other words, processing pronouns
and presuppositions) becomes a matter of finding a unique antecedent.
One can see how this explains the way the fat man is used in (23). A
unique fat man exists in the context already, and so the description the
fat man can bind to this. This is very different from Russell's proposed
semantics of the fat man, which would commit us to (23) being true only
if there was a unique fat man in the whole world. Here, we need only
find a unique antecedent in the context. There may be millions of fat
men in the world, but the constraint according to this proposal is that
only one of them is taken to be relevant in the context.
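The contrast between Russell's global uniqueness and uniqueness-in-context can be sketched in code. This is a minimal illustration, not the formal machinery of the chapter: the encoding of a context as a map from discourse referents to their accumulated conditions, and the function name, are our own assumptions.

```python
# A discourse context maps discourse referents to the set of conditions
# imposed on them so far, e.g. {"x1": {"man", "fat"}, "x2": {"woman"}}.
# A definite description like "the fat man" binds only if exactly ONE
# referent in the context satisfies all its descriptive conditions; the
# millions of fat men elsewhere in the world are irrelevant.

def find_unique_antecedent(conditions, context):
    """Return the unique referent satisfying `conditions`, or None."""
    candidates = [ref for ref, props in context.items()
                  if conditions <= props]          # subset test
    return candidates[0] if len(candidates) == 1 else None
```

For instance, in a context containing one fat man and one other man, the description {"man", "fat"} binds, whereas {"man"} alone matches two referents and so fails the uniqueness requirement.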
As already mentioned, if there's no antecedent that a presupposition
can bind to, then we add it to the context of utterance, subject to
certain constraints being met. One of these constraints is that the
result of adding the presupposition should be consistent. If adding the
presupposition would produce a contradiction, then accommodation fails
and the sentence carrying the presupposition cannot be interpreted. For
example, the discourse (27) sounds odd because the speaker at one and
the same time takes for granted that John has a wife, and asserts that
he doesn't:
(27) ?John is single. John's wife is bald.
When the hearer processes the second sentence of (27), he can't accommodate
the information that John has a wife to the context (which
contains John is single), because there is no way of adding this without
producing a contradiction (that John is single and John has a wife).
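The bind-else-accommodate procedure, with its consistency check, can be sketched as a small program. This is a minimal sketch under our own assumptions, not the formal model developed later in the chapter: contexts are sets of atomic facts, and the names `resolve_presupposition` and `CONTRADICTIONS` are illustrative.

```python
# Minimal sketch of presupposition processing: a context is a set of
# atomic facts such as "single(john)". To process a presupposition,
# first try to BIND it to material already in the context; failing
# that, ACCOMMODATE it, but only if the result stays consistent.

# Hypothetical table of mutually contradictory facts, standing in for
# real-world knowledge (being single excludes having a wife).
CONTRADICTIONS = {frozenset({"single(john)", "has_wife(john)"})}

def contradicts(fact, context):
    """True if adding `fact` to `context` would yield a contradiction."""
    return any(frozenset({fact, old}) in CONTRADICTIONS
               for old in context)

def resolve_presupposition(fact, context):
    """Bind if possible; else accommodate; else fail (infelicity)."""
    if fact in context:                    # binding: antecedent found
        return "bound", context
    if contradicts(fact, context):         # e.g. (27): John is single
        return "failed", context
    return "accommodated", context | {fact}
```

Processing John's wife took an aspirin against an empty context accommodates has_wife(john); against a context containing single(john), it fails, mirroring the oddness of (27).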
How does viewing presuppositions as pronouns with semantic content
help solve the Projection Problem? In (24) and (25), the potential
presuppositions (that Jack has children and that John failed his exams
respectively) are not presupposed by the sentences as a whole. Observe,
also, that in each of these cases, one can replace the presupposition trigger
with a construction containing a pronoun: in (24) Jack's children
is felicitously replaced with them, and in (25) regretted that he failed his
exams is felicitously replaced with regretted it. The fact that we can paraphrase
using pronouns is evidence that there is a suitable antecedent in
the context for the potential presupposition to bind to. In other words,
all these sentences are ones where an antecedent is found without accommodating
anything. Thus the context of interpretation of the presupposition
trigger doesn't change, and the potential presupposition doesn't
project out, to be presupposed by the whole sentence.
In contrast, when we utter (28) in isolation from such a discourse
context, we need to accommodate the presupposition:
(28) All of Jack's children are bald.
That is, we need to add the information that Jack has children to
the context before we process the information that all of them are
bald. Compare this with (26), where referring to Jack's children with
a pronoun doesn't work because there isn't a suitable antecedent.
These observations about replacing presupposition triggers with pronouns
suggest the following informal solution to the Projection Problem. To
test whether a potential presupposition is presupposed by the sentence
or not, investigate what happens when the relevant phrase containing
the presupposition trigger is replaced with a pronoun (as we do in
(24) and (25)). If the resulting sentence containing the pronoun is
felicitous, then the potential presupposition is not presupposed by the
sentence. Otherwise, the resulting sentence containing the pronoun is
not felicitous and the potential presupposition is a presupposition of the
whole sentence. So, none of the sentences in (24) presuppose that Jack
has children, even though they feature the presupposition trigger Jack's
children. Similarly, none of the sentences in (25) presuppose that John
failed his exams, even though they feature the presupposition trigger
regret, which potentially presupposes that John failed his exams (since
this is the sentential complement to regret). But (28) does presuppose
Jack has children, because (26) is odd.
This procedure for testing whether a sentence carries a given presupposition
or not is not a fully precise solution to the Projection Problem.
It certainly isn't a solution that is stated in a way that would help a
computer process presuppositions, since the procedure relies heavily on
human intuitions about paraphrases (does the sentence containing the
potential presupposition mean the same thing as the sentence containing
the pronoun?) and human intuitions about whether the sentence
containing the pronoun is felicitous or not.
We will, however, develop below a more formally precise model of
presuppositions that builds on the informal observations about their
behaviour that we've made so far. That is, the formal model will exploit
the close relationship, and the differences, between presuppositions and
pronouns that are exhibited in the above linguistic data.
In words, the `instruction' (32) means: try and bind the x in the DRS
(31) to an antecedent discourse referent y which has the same conditions
imposed on it already (i.e., man(y)); otherwise, if no such y exists, add
x and the condition man(x) to the DRS.
6 As a matter of fact, it may be more appropriate to think of pronouns as behaving
like presuppositions, since using a pronoun presupposes that the hearer will be able
to determine a likely antecedent for it.
The grammar must ensure that with the man, (32) is inserted in the
instruction box. This is different from the rule for translating the that
we had in our grammar earlier in the course. Recall that this rule told
you to reuse a discourse referent if you could, and if you couldn't, to add
a new one. But this old rule allowed you to reuse a discourse referent
that lacked the condition that the individual denoted by it was a man.
That's why it gave the wrong analysis of (21). This new rule for the
man fixes this: you have to bind all of the content of the presupposition
to an antecedent. This means that, with respect to (32), you can bind
the question-marked information only if you already have a discourse
referent y in your DRS, which is already accompanied by the condition
man(y). So this new semantic rule for the will refine the old one, because
the instruction constrains the reuse of discourse referents in a better way.
Furthermore, as we'll shortly see, the instruction also tells you exactly
what to add (and where in the DRS to add it) when you can't reuse a
discourse referent.
We're now in a position to state in detail the rule that converts the
into semantic information. In words, this rule triggers an instruction
to find an object with appropriate content (that is, which satisfies the
property described by name) in the context, and failing that, to add one:

The: [NP [DET the] [N name]] becomes: the discourse referent x, together with the instruction box ?[x | name(x)]

where x hasn't been used yet, and name is the translation of name.
This rule is best explained by means of an example. Consider the
sentence (33):
(33) The man talked.
(33′) [S [NP x, ?[x | man(x)]] [VP [V0 talked]]]
So, using the rules for converting intransitive verbs to semantic information,
and then using the S rule given earlier in the course, we end up
with the following semantic representation of (33):
(33″) [ | talk(x), ?[x | man(x)] ]
add such an object to the context and identify x with it. This matches
our intuitions: (33″) states that a man talked, and if there's already been
a man mentioned then it's that man that's doing the talking, and if not,
then it's a new man we're talking about. Intuitively, that's what (33)
should mean. Contrast this with what (34) means.
(34) A man talked.
Because (34) doesn't induce an instruction like that in (33″) to identify
the talking man with a man already mentioned in the discourse, the
man in (34) could be a completely new man, even if we've already been
talking about men. This isn't so for (33), because of the content of the.
Compare the rule The with the one for converting pronouns to semantic
information. With pronouns, we introduced an instruction x =?,
which in words meant: identify x with a discourse referent already in
the context. So both the Pronoun and The rules introduce instructions.
Both instructions involve identifying the thing being talked about with
something already in the context. But the difference is that with presuppositions,
if this identification (or binding) procedure fails, then accommodation
is possible for the instruction to be carried out successfully.
This option isn't available for pronouns.
We need to say more about how one fulfils the instruction in (33″).
(37) [S [NP DET N] [VP V0]]
Reducing this to semantic information as we did above, via the rule The,
we obtain the following semantic representation of the text (35):
(38) [ x, y | man(x), woman(y), walk(x), walk(y), talk(z), ?[z | man(z)] ]
But we're not finished there, because we have to carry out the instruction
expressed in (38). Strictly speaking, given the uniqueness aspect of the
meaning of the that we mentioned earlier, this instruction amounts to
the following: try and identify z with a unique object in the DRS that's a
man; and failing that, add z to the list of discourse referents of the DRS
Since there are no more instructions, we're done. We can now work
out whether (35) is true or not. In words, (39) is true if and only if
there are two objects, denoted by x and y, where x is a man that
walks and talks, and y is a woman that walks. It's important to note
that (39) doesn't imply that there is a unique man in the universe.
This is what Russell's analysis would be committed to. Rather, the
uniqueness condition was placed in the instruction: we had to find a
unique antecedent (as suggested earlier in the discussion of given and
new information), and failing that, we had to add one.
Treating presuppositions like pronouns allows us to work out systematically
when a presupposition is cancelled. The presupposition of
(33) that there is a man is cancelled in (40) because of (40)'s if-phrase.
(40) If there was a man, then the man talked.
We capture this in our analysis by the fact that the presupposition
triggered by the phrase the man is bound to the man in the if-phrase,
and we don't have to accommodate it. Here's the analysis in detail. First,
we convert the words if...then to semantic information:
(42) [x | man(x)] ⇒ [ | talk(y), ?[y | man(y)] ]
The instruction to find a man appears in the right-hand side (rhs) box
of the ⇒, rather than in the instruction for the top-level box, because of
the places in the structure where the sentence the man talked is converted
to semantic information, as shown in (41). We now have to carry out
this instruction. We must identify it with a man in the same way that
we would a pronoun, and if we can't do that, we must add it. In this
case, we can identify it with the man x in the box on the left-hand
side (lhs) of the ⇒. Note that this is permitted, because the discourse
referent x meets The Structural Constraint for finding antecedents,
which we introduced in chapter 14, when we analysed pronouns. So the
final representation is (43):
(43) [x | man(x)] ⇒ [ | talk(x)]
Note that intuitively (40) doesn't presuppose there is a man (that is,
the presupposition carried by (33) is cancelled in the context of the if-phrase
of (40)). Our analysis captures this. The instruction is carried
out via binding. Consequently, the potential presupposition that there
is a man never gets `promoted' out from the if-phrase to be implied (or,
more accurately, presupposed) by the whole. It doesn't get added to the
top-level box, and so (43) doesn't imply there is a man. That's how it
should be, according to our intuitions. Thus, viewing presuppositions
as pronouns has helped here. Analysing them this way has correctly
predicted that the whole sentence (40) doesn't entail there was a man.
Now consider a more complex example where there are several
presuppositions, some of which end up getting bound, and others which
end up getting accommodated:
(44) If the King of France comes to the party, then the party will
get press coverage.
As for (40), we first convert the words if...then to semantic information:
(46) [ | come(x, y), ?[x | King-of-France(x)], ?[y | party(y)] ] ⇒ [ z | press-coverage(z), get(w, z), ?[w | party(w)] ]
There are three instructions, resulting from the three occurrences of the
presupposition trigger the. We have to decide on the order in which we
carry out these instructions. We will assume the following: carry out
instructions in the embedded boxes before doing those in the biggest
box; and carry out instructions in the box on the lhs of a ⇒-condition
before doing those in the box on the rhs. The first part of the ordering
constraint reflects the fact that embedded information comes from the
simple sentences that form part of the complex sentence represented by
the biggest box, and the complex sentence should always be interpreted
in the light of what the simpler sentences that form it mean. The second
ordering constraint stems from the fact that as humans we process
information incrementally (cf. the discussion in Section 13.1 about the
way humans disambiguate words before they get to the end of the
sentence). That is, we work out what the words and phrases mean as
and when we hear them. And so we process S1 in a sentence of the form
If S1 then S2 before we process S2, because we hear S1 first!7
This ordering constraint means that we have to deal first with the
instruction to find an antecedent for x (the King of France) and the
instruction to find an antecedent for y (the party). Then, having carried
out these instructions, we will deal with the instruction to find an
7 For the sake of simplicity, we're ignoring utterances like S2 , if S1 here.
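The ordering constraint can be sketched as a recursive walk over a DRS: instructions in embedded boxes are carried out before those of the box containing them, and within a ⇒-condition the left box is processed before the right. The nested-dictionary encoding of boxes below is an illustrative assumption, not the book's formalism.

```python
# A DRS box is encoded as a dict: "instructions" lists the pending
# instructions attached to that box, and "implications" lists embedded
# =>-conditions, each a pair (lhs_box, rhs_box).
# The walk yields instructions in the prescribed order: embedded boxes
# first; within an implication, lhs before rhs; the containing box's
# own instructions last.

def instruction_order(box):
    order = []
    for lhs, rhs in box.get("implications", []):
        order.extend(instruction_order(lhs))   # process S1 before S2
        order.extend(instruction_order(rhs))
    order.extend(box.get("instructions", []))  # the biggest box last
    return order
```

Applied to a conditional like (46), this yields the lhs instructions (the King of France, the party) before the rhs instruction (the second the party), which is exactly the order the text assumes.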
(47) [ x | King-of-France(x), [ | come(x, y), ?[y | party(y)] ] ⇒ [ z | press-coverage(z), get(w, z), ?[w | party(w)] ] ]
(48) [ x, y | King-of-France(x), party(y), [ | come(x, y) ] ⇒ [ z | press-coverage(z), get(w, z), ?[w | party(w)] ] ]
(49) [ x, y | King-of-France(x), party(y), [ | come(x, y) ] ⇒ [ z | press-coverage(z), get(y, z) ] ]
There are no more instructions, and so we're done. In words, (49) states:
there is a King of France x, and there is a party y, and if x comes to y,
then y will get press coverage. This matches our intuitions about what
(44) means. Note that even though (44) featured two NPs of the form
the party, we still capture in the semantics that the sentence is about
only one party (which happened to be mentioned twice). Also note that
(49) can be true even if there is more than one party and more than
one king of France in the world. The important thing for processing this
sentence was that the King of France referred to a unique object in the
context.
Our analysis of (44) also indicates why presuppositions tend to
`project out' from linguistic constructions such as conditionals, to be
presupposed (and hence implied) by the whole sentence. When an
instruction in an embedded box has been dealt with via accommodation,
the material ends up in the top-level box. The result, then, is a DRS
which is true only if the presupposed material is true. This is what
happened in (44). The King of France and the party were in an if-phrase,
and stuff in an if-phrase isn't necessarily true (rather, it describes a
hypothetical situation). But nevertheless, that there is a King of France
and that there is a party is ultimately implied to be true, because these
things finally appear in the top-level box, and not in the antecedent
(or lhs) box. In other words, these things projected out from the
conditional, because they had to be accommodated.
Proper names are presupposition triggers too. And our theory predicts
that presuppositions project out of embeddings to feature as conditions
in the top-level box. This means that we can now justify why we
assumed in Chapter 14 that proper names such as Mary end up translating
into conditions in the top-level box, regardless of which box in the
DRS structure the proper name appeared in before it was translated
via the semantic rule for proper names.
So let's look now at the translation for Proper Names. The translation
for Proper Names we had before didn't deal with the presupposition.
The revised rule in Figure 15.1 rectifies this. This new translation
rule explains why, generally, x and mary(x) end up in the top-level box.
It also explains why repeated mentions of Mary in a text are assumed
to refer to the same person. Proper Names (Revised) captures this, because
the instruction triggered by the first Mary is accommodated in the
DRS, and all subsequent instructions triggered by subsequent mentions
[NP [PN name]] becomes: the discourse referent x, together with the instruction box ?[x | name(x)]
where x hasn't been used yet and name is the translation of name.
Figure 15.1
The Revised Rule for Proper Names
of Mary will bind to it. Finally, this rule rectifies our analysis of (50).
The old rule for proper names told us to reuse a discourse referent if
we could, regardless of whether or not that old discourse referent had
a condition on it already that it was the bearer of the relevant proper
name. So the old rule for proper names got the wrong analysis of (50),
in that it allowed you to build a DRS where Pip and Etta were one and
the same person.
(50) Etta chased a bird. Pip caught it.
The final translation of (50) is (50′) with our revised rule for proper
names. Compare this with the translation (50″) that we got with the
old rule:
(50′) [ x, y, z | etta(x), bird(y), chase(x, y), pip(z), catch(z, y) ]
(50″) [ x, y | etta(x), bird(y), chase(x, y), pip(x), catch(x, y) ]
(51) Possible S
(52′) Possible [The King of France signed a proclamation]
(52″) Possible [ y | sign(x, y), proclamation(y), ?[x | King-of-France(x)] ]
(52‴) [ x | King-of-France(x), Possible [ y | proclamation(y), signed(x, y) ] ]
In words this states: there is a King of France (x), and it's possible that
there is a proclamation (y) that x signed. So, we have modelled the
contrast between presuppositions and semantic entailments, with regard
to projection from embeddings, via the mechanism of accommodation.
(56) There are two types of cotton that are grown in the States. One
of them is typically used to make shirts; the other is typically
used to make skirts.
The cotton shirts are made from grows in Mississippi.
One can now re-state these ideas about how the interpretation of
words like raced in (53) is affected by context, by using our model of
the way presuppositions behave. Sentence (53) includes the word the,
and this introduces an instruction, as we've shown in the rule The. What
goes in the instruction box depends on the semantic content of the NP.
But as we parse (53) word by word, we have a choice about where to
attach the word raced: (a) it can either be a verb that forms part of
a relative clause in the NP; or (b) it can be a verb that is part of
the sentence's VP (i.e., the NP is just the horse and nothing more).
The choice at this point will affect the instruction about what kind of
antecedent we must find in the context. If we go for choice (a), then
the instruction becomes: find an antecedent discourse referent which
is a horse that was raced past the barn, and this antecedent must
be unique (because, recall, the presupposes that the object described
is unique in the context). If we go for choice (b) then the instruction
is: find an antecedent discourse referent which is a horse, and this
antecedent must be unique. If the instruction corresponded to option (b)
and there were two horses in the context, then we wouldn't be able to
carry out that instruction successfully, because the uniqueness condition
would be violated. However, if one of the horses was known to have
been raced past the barn, and the other wasn't, then the instruction
corresponding to option (a) would be carried out successfully. So the
way we interpret raced in (53) is intimately connected with whether or
not the presupposition triggered by the can be processed successfully.
So, we can predict the way humans parse (53) via the following
constraint:
Parsing Constraint:
Suppose you encounter a word which can attach at two alternative
points to the syntactic tree given so far. Suppose furthermore that this
choice of attachments affects an instruction. Then choose between these
syntactic alternatives at this point in parsing, according to the following
constraint:
- If the material you have so far in the instruction box does not
enable you to carry out the instruction successfully, then attach the
constituent so that the content of the instruction changes.
- Otherwise, attach the constituent so that the instruction doesn't
change.
This constraint amounts to: if it ain't broke, don't fix it. More specifically,
it encapsulates the intuition that if we can deal with the presupposition
without changing the instruction, we do so, and attach new words
so that the content of the presupposition (and hence the instruction) remains
unaltered. Intuitively, this is a strategy for minimising the chances
of presupposition failure, because you deal with the instruction
at the point when you know you can succeed.
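The constraint can be sketched as a decision rule: attach the new word so as to leave a satisfiable instruction unchanged, and change the instruction only when it cannot yet be carried out. This is a simplified sketch under our own assumptions (in particular, that accommodation always succeeds when no antecedent matches at all); the helper names are illustrative.

```python
# Sketch of the Parsing Constraint. `instruction_succeeds` stands in
# for the full binding/accommodation machinery: an instruction succeeds
# when exactly one referent in the context matches its conditions
# (binding), or when no referent matches (accommodation of a new one).

def instruction_succeeds(conditions, context):
    matches = [r for r, props in context.items() if conditions <= props]
    if len(matches) == 1:
        return True            # binding to a unique antecedent
    if len(matches) == 0:
        return True            # accommodate a new antecedent
    return False               # uniqueness violated: several candidates

def choose_attachment(current_conditions, new_condition, context):
    """Return 'VP' (leave the instruction alone) or 'NP' (extend it)."""
    if instruction_succeeds(current_conditions, context):
        return "VP"            # "if it ain't broke, don't fix it"
    return "NP"                # changing the instruction may rescue it
```

With a single horse in the context (or none), the rule attaches raced to the VP, predicting the garden path; with two horses, only one of which raced past a barn, it attaches raced to the NP, predicting no garden path.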
So, consider parsing (53) in a discourse context where a single horse
has been mentioned. Then, having parsed the horse, we end up with
a semantic representation that looks like (57): y is the horse that had
already been mentioned in the discourse context; Q must be filled in
with VP stuff still to be parsed; and P will be filled in with stuff if we
add more things to the NP the horse to make a bigger NP, but otherwise
it will be void of content:
(57) [ y | horse(y), Q(x), ?[x | horse(x), P(x)] ]
Now we parse raced, and we have a choice: we can attach this to the NP
or VP; that is, it can fill out P(x) in the instruction, or it can fill out
Q(x). So we check whether we can successfully carry out the instruction
in (57) as it stands: that is, we try to bind x to an existing, unique horse;
and if that fails we try and accommodate it uniquely. And we can bind
x in this case: we can identify x with y. So the instruction as it stands
can be dealt with successfully. Hence, by the above Parsing Constraint,
we attach raced to the VP, so that the content of the instruction doesn't
ultimately change. (Note that if we attached raced to the NP, then the
content of the instruction would change, because we would have to find
an antecedent that's a horse that raced, rather than an antecedent that's
just a horse simpliciter.) Making raced part of the VP turns out to be
the wrong choice, but we don't realise this until we reach the last word
fell of (53), because the phrase raced past the barn is a perfectly good
VP. Hence the Parsing Constraint predicts correctly that we garden-path
in this context. Also note we would have made the same choice for
raced if there had been no horse y in the context, because although we
couldn't have bound the horse to something in the context, we could
have successfully added the horse to the context, and so the instruction
would have been successfully carried out via accommodation. Hence the
Parsing Constraint predicts correctly that we garden-path in this context
as well; it predicts that when we read raced, we'll assume it's part of the
sentence's VP.
Now consider a situation where two horses have been mentioned,
only one of which has raced past a barn. Then, at the point at which
we've parsed the words the horse, the semantic content is as in (58):
again Q is to be filled in by VP information; the two horses y and z are
those already mentioned in the discourse context; and P may be filled
in with more content, if we end up adding more to the NP the horse to
make a bigger NP:
(58) [ y, z, w | horse(y), horse(z), barn(w), be-raced-past(z, w), Q(x), ?[x | horse(x), P(x)] ]
As before, we now parse raced, and we have a choice: we can attach this
to the NP (thereby filling out P(x)) or to the VP (thereby filling out
Q(x)). So we check whether we can successfully carry out the instruction
(60) [ y, z, w | horse(y), horse(z), barn(w), raced-past(z, w), Q(z) ]
Now we must parse the word fell. And we still have a VP to be filled
in (with the corresponding semantic content Q(z) to be filled in), which
fell can do, to produce the following:
(61) [ y, z, w | horse(y), horse(z), barn(w), raced-past(z, w), fell(z) ]
are made from) binds to a unique antecedent. This contrasts with what
would happen if the subject NP were taken to be the cotton shirts, and
one chose to parse are as the start of the sentence's VP. In this case, the
corresponding presupposition would have to be accommodated (because
we've mentioned shirts in the context, but not cotton shirts). We've
assumed in this chapter that binding presuppositions is preferred to accommodating
them. We can refine our Parsing Constraint to reflect this,
by adding to it that you prefer to parse things so that the instruction
triggered by a presupposition trigger is bound rather than accommodated.
This predicts that in the context of (56), shirts are made from
will be parsed as a relative clause as and when the interpreter hears it,
and so he won't be led up the garden path, in contrast to (55a).
This matches the psychological data in Crain and Steedman's experiments
on human sentence processing. So, this is evidence that the
above Parsing Constraint is psychologically plausible.
The Parsing Constraint is a first step towards providing principles
on how the grammar can be parsed so that it matches the way humans
process language. But the constraint is controversial, partly because
Crain and Steedman's results on which it is based are controversial.
And at any rate, it only solves a very small part of the larger problem
of specifying how humans parse sentences.
15.6 Conclusion
Some words make us interpret certain things as non-negotiable. These
non-negotiable assumptions are known as presuppositions, and the
words that trigger them are presupposition triggers. Presuppositions
are different from semantic entailments in two important respects.
First, they typically project from embeddings in phrases like It's
possible that, if and not. They don't always project, though; see (4a),
where the (potential) presupposition does project out from the conditional,
vs. (4b), where it doesn't:
(4) a. If baldness is hereditary, then John's son is bald.
b. If John has a son, then John's son is bald.
This indicates that the status of a presupposition in a discourse is dependent
on the content of the presupposition, the context and their
Exercises
Exercise 15.3: David Lewis wrote one of the seminal papers on presuppositions
and accommodation:
David Lewis [1979] Scorekeeping in a Language Game, Journal of
Philosophical Logic, 8, pp. 339–359, D. Reidel Publishing Company.
In it, he claims the following. Presuppositions can be created or destroyed
in the course of a conversation. But it's actually not that easy
to say something that will be unacceptable for lack of the required presuppositions.
This is because if you say something that requires a missing
presupposition, then straight away, that presupposition springs into
existence, making what you said acceptable after all.
This is what has come to be known as accommodation. Imagine a
world where people couldn't accommodate presuppositions in the way
Lewis describes. Then for each of the sentences (64–66) below, write
down what people would have to say prior to uttering these sentences,
in order to make them acceptable:
Exercise 15.6: If you want to find out more about bridging (e.g., the
thing that goes on in (62)), then the following contains a very accessible
discussion of the issues involved:
H. Clark [1975] Bridging, in R. C. Schank and B. L. Nash-Webber
(eds.) Theoretical Issues in Natural Language Processing, MIT Press.
In this article, Clark argues that accommodating a presupposition involves
more than just adding its content to the context. Rather, we try
and relate that added content to what's already there in the context.
The following text is an example of this phenomenon:
(67) John was murdered yesterday. The knife lay nearby.
1. Say what the knife presupposes.
2. Specify what information people infer from the expression the knife
in (67) (clue: Clark gives a specification of this in the paper).
3. How does what people infer differ from our account of what gets
accommodated in (67), in our formal model of communication? In other
words, what information is missing in this model?
Think up three linguistic texts where a presupposition cannot be bound
or accommodated, making the text sound odd. Try to list the factors
that contribute to the text sounding bad.
Explain how the grammar rule for pronouns, and in particular the con-
straints on which antecedents are accessible, makes the wrong predic-
tions about the interpretation of it in this example. Now consider the
following `paraphrase' of the above discourse:
Every chess set has a spare pawn. I bought a chess set yesterday and the spare
pawn was taped to the top of the box.
16 Juxtaposing Sentences
In (1), he can use general knowledge to conclude that the relevance between
the two facts mentioned is that the second (Keith is Alex's boss)
causes the first (Alex drinks a lot).1 Assuming this causal link between
the described events (which in turn implies that Keith is a bad boss,
making Alex drown her sorrows in drink) is sufficient for ensuring that
the speaker was adhering to a principle that his utterances are relevant
(to each other). But in contrast to (1), the hearer can't compute any relevance
between the facts in (2): assuming a causal relation between the
events seems untenable in this case, because there's no background
information that would support such an assumption. On that basis, the
hearer can't identify the relevant link between the sentences, and the
text sounds odd.
Grice suggests that the above principles, which he argues govern
the way conversation is produced and interpreted, follow from assuming
the conversational agents are rational (i.e., they are not prepared to
believe contradictions) and cooperative (i.e., they typically help other
people to achieve their goals). He did not, however, demonstrate this
formally, nor did he offer a formal model of rationality and cooperativity.
Indeed, this remains a big challenge in the study of pragmatics. An
alternative perspective one might take on these principles (although this
wasn't Grice's perspective) is to view them as a `contract' between the
speaker and hearer: they encapsulate rules for communicating effectively.
Think about how you follow such rules when writing an essay, for
example.
To get computers to converse in the way humans do, the computer
must know about and reason effectively with these conversational prin-
ciples. For example, we want a computer to produce the text (3) rather
than (4):
(3) The year 1993 will start with the world in a pessimistic frame
of mind. The gloom should soon dispel itself. A clear economic
recovery is under way. Though it will be hesitant at first, it
will last the longer for being so. If you are sitting in one of
the world's blackspots, this prediction will seem hopelessly
optimistic. But next year's wealth won't return to yesteryear's
winners; these middle-aged rich people need to look over their
1 Note that this causal relation relies on the assumption that drinking a bottle of
whisky a day amounts to drinking a lot!
the content of the clause which is extracted from the grammar. Thus the
syntactic structure of sentences becomes just one source among many to
constrain the content of the utterance.
So far, we have explored how one can represent the relationship
between the form of a sentence and its meaning in a very precise fashion.
In fact, using DRSs has already brought in some interaction between
the discourse context and the current clause: observe how the results of
carrying out instructions in the instructions box are dependent on what's
in the DRS already (both for pronouns and for presuppositions). Now
we want to extend these techniques for modelling the information flow
between context and the content of the current utterance even further.
We would also like to model how world knowledge, Grice's principles of
conversation and other factors influence the content of a discourse.
In particular, we will show in an informal, but systematic, fashion,
how one can model the principles which allow us to detect the difference
in meaning between the simple texts (6) (where the textual order of
events matches temporal order) and (7) (where there is a mismatch), in
spite of the fact that the sentences in these texts have exactly the same
syntax.
(6) Max stood up. John greeted him.
(7) Max fell. John pushed him.
The prediction that (6) and (7) mean different things will stem from
representing in a principled fashion some aspects of the non-linguistic
information that contributes to the way people interpret language in con-
versation. The form-meaning relation in (6) is the same as in (7), because
the sentences involved have the same syntax. But their interpretations
are different because other factors influence their overall meaning.
As we've mentioned, programming computers to take account of
these other `non-linguistic' factors which influence meaning remains a
major challenge in computational linguistics. Although there has been
dramatic progress over the last decade, both representing such rich
knowledge resources and reasoning with them effectively is currently
beyond the state of the art. So what we offer here is a very simple
taste of how to build a systematic procedure for building a semantic
representation of very simple texts, which goes beyond the semantics
generated by the grammar. Giving any formal detail would go beyond
the scope of this book. However, the simple account we present here
of principles for interpreting conversation is formalisable in logics
which are designed to model commonsense reasoning; these logics
have been developed within the field of Artificial Intelligence (AI) to
address problems in knowledge representation and inference.
We'll first discuss Grice's principles of conversation in more detail.
We'll then give a taste of how some aspects of these principles, together
with things like world knowledge, can be represented and exploited in
computing what a text means. We will concentrate on the task of infer-
ring how events described in a text are connected with part/whole and
causal relations, even when those relations aren't part of the semantic
representation that's generated by the grammar, as we observed in texts
(6) and (7).
(ii) do not say anything for which you lack adequate evidence.
The Maxim of Quantity
Make your contribution as informative as is required for the current
purposes of the exchange, and no more.
The Maxim of Relevance
Make your contributions relevant.
The Maxim of Manner
Be perspicuous, and specifically:
(i) avoid obscurity
(ii) avoid ambiguity
(iii) be brief
(iv) be orderly
(8′) B: No I don't know the exact time of the present moment, but
I can provide some information from which you may be able to
deduce the approximate time, namely, the milkman has come.
Here's why. According to Grice's maxims, B's response in (8) must be
relevant to the question. The questioner A, knowing this, decides to
compute what makes it relevant. He concludes that the relevance is this:
it must be that whatever time it is, it's after the milkman comes. There
seems to be no other plausible connection between the question and B's
response. Moreover, A can infer from (8) that B doesn't know the exact
time (note we've included this in (8′)). This is because if B did, then he
would have been bound, via the Maxim of Quantity (be as informative
more concise (8) shows how powerful hidden assumptions can be during
communication. We must make sure computers can follow the instruc-
tions in the contract on how to converse with language in the same way
that humans do, so that computers can produce the much more natural
(8) instead of (8′).
(7) being an exception to Be Orderly). We will propose how one can use
techniques from Artificial Intelligence to construct such an explanation.
In essence, we will be using a tool from AI to make precise an ongoing
discussion in Philosophy, about the rules that govern the way humans
converse.
rules. This logic will then be the `engine' which yields the semantic
representation of a text, augmenting the information already given by
the grammar. For example in (6), the principles which apply must be
such that the logic working over them predicts that the standing up
precedes the greeting. The form-meaning relation doesn't specify this
aspect of meaning. Rather, it only specifies that the events are connected
somehow. But we nevertheless compute this additional meaning via
reasoning with non-linguistic information, as specified by the rules which
admit exceptions and a logic which allows us to reason with these rules.
The rules that admit exceptions are known in AI as default rules, and
the logics that tell us how to reason with such rules are known as default
logics. In the next section, we will discuss how one might use these logics
to explain in a systematic fashion why we infer a lot more than what's
explicitly said in text.
so on. However, if we know that birds fly, and that Tweety is a bird, and
that's all we know, then we conclude that Tweety flies:
Defeasible Modus Ponens
Birds fly
Tweety is a bird
So: Tweety flies
We call this pattern of inference Defeasible Modus Ponens. We may
retract the conclusion if we find out more about Tweety; this is Defeat
of Defeasible Modus Ponens:
Defeat of Defeasible Modus Ponens
Birds fly
Tweety is a bird
Tweety doesn't fly
So: Tweety doesn't fly
This example exhibits an important characteristic of default reasoning:
one never draws default conclusions which contradict the premises.
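To make the pattern concrete, here is a minimal sketch in Python of defeasible modus ponens and its defeat. The string encoding of literals and the function name are our own illustration, not a fragment of any of the default logics discussed in this chapter:

```python
def apply_defaults(facts, defaults):
    """Apply each default (premise, conclusion) only when its premise is
    known and its conclusion does not contradict the hard facts."""
    conclusions = set(facts)
    for premise, conclusion in defaults:
        # the negation of "flies" is "not flies", and vice versa
        if conclusion.startswith("not "):
            negation = conclusion[len("not "):]
        else:
            negation = "not " + conclusion
        if premise in conclusions and negation not in facts:
            conclusions.add(conclusion)
    return conclusions

defaults = [("bird", "flies")]

# Defeasible Modus Ponens: all we know is that Tweety is a bird.
print(apply_defaults({"bird"}, defaults))  # "flies" is inferred

# Defeat: we also know for certain that Tweety doesn't fly, so the
# default conclusion is blocked rather than drawn and contradicted.
print(apply_defaults({"bird", "not flies"}, defaults))
```

The consistency check before adding the conclusion is exactly the characteristic noted above: a default never yields a conclusion that contradicts the premises.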
In the above, we had just one default rule that applied. But we
have argued that sometimes, background knowledge gives conflicting
default clues about what a text means, and that conflict must be resolved
whenever possible. In other words, we must think about the way humans
reason when there are several default rules that are pertinent, which
give conflicting clues about a particular fact. Which default wins in such
cases?
An example of this is known as The Penguin Principle in the AI
literature. It is exemplified as follows:
The Penguin Principle
All penguins are birds
Birds fly
Penguins don't fly
Tweety is a penguin
So: Tweety doesn't fly
In the above, there are two default rules that are relevant for working
out whether Tweety can fly or not. First, Tweety is a penguin, and we
know that (by default) penguins don't fly. Second, since Tweety is a
penguin, he's also a bird. And we know that (by default) birds do fly.
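The Penguin Principle can be sketched as a small extension of default application in which, of two conflicting defaults that both fire, the one with the more specific premise wins. The encoding below (strict subclass links as pairs, literals as strings) is our own illustration, not a fragment of any published default logic:

```python
def conflicts(c1, c2):
    """Two literals conflict when one is the negation of the other."""
    return c1 == "not " + c2 or c2 == "not " + c1

def resolve(facts, defaults, strict):
    """facts: set of literals; defaults: (premise, conclusion) pairs;
    strict: set of (sub, super) pairs, e.g. every penguin is a bird."""
    facts = set(facts)
    for sub, sup in strict:          # close the facts under strict rules
        if sub in facts:
            facts.add(sup)
    fired = [(p, c) for p, c in defaults if p in facts]
    winners = set()
    for p, c in fired:
        # a default loses only to a conflicting default whose premise
        # is strictly more specific: the Penguin Principle
        beaten = any(conflicts(c, c2) and (p2, p) in strict
                     for p2, c2 in fired)
        if not beaten:
            winners.add(c)
    return facts | winners

strict = {("penguin", "bird")}
defaults = [("bird", "flies"), ("penguin", "not flies")]
print(resolve({"penguin"}, defaults, strict))
# the penguin-default beats the bird-default: Tweety doesn't fly
```

Given only that Tweety is a bird, the same call would infer that he flies; the specificity check only matters when conflicting defaults fire together.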
movement, for example. Then the law would apply when e1 is described
by the sentence Max stumbled, Max tripped, Max moved etc.; and e2 is
described by John shoved Max, John bumped into Max etc. Actually,
getting exactly the right form of this law, so that it applies in as many
cases as is plausible, is a task which no one has solved as yet. One
attempt is given below:
Generalised Law:
If the event e1 is described in a text just before the event e2, and more-
over, e1 is an event where x undergoes a change along some dimension
(movement, creation/destruction, mental change), and e2 is an event
where y causes a change to x along the same dimension as e1, then
normally, e2 causes e1.
This Generalised Law will apply to (7) and to a lot of other examples
where the words are semantically similar to push and fall, such as those
in (12):
(12) a. Max tripped. John shoved him.
b. Max stumbled. John banged into him.
c. Max moved to the right. John gave him a big push.
That's because we'll see from the lexicon or dictionary that all these
words cause change/describe change in movement.
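The way the Generalised Law consults the lexicon can be sketched as follows. The dimension labels and lexical entries here are invented for illustration; a real lexicon would be far richer:

```python
# Toy lexical entries: the dimension of change each verb describes,
# and whether the verb describes causing a change in something else.
LEXICON = {
    "fall":    {"dim": "movement", "causal": False},
    "trip":    {"dim": "movement", "causal": False},
    "stumble": {"dim": "movement", "causal": False},
    "stand":   {"dim": "movement", "causal": False},
    "push":    {"dim": "movement", "causal": True},
    "shove":   {"dim": "movement", "causal": True},
    "greet":   {"dim": "social",   "causal": False},
}

def generalised_law_applies(verb1, verb2):
    """For a text 'x VERB1-ed. y VERB2-ed x.': does the default
    'e2 causes e1' apply? It does when e2 describes y causing a change
    along the same dimension as the change e1 describes in x."""
    e1, e2 = LEXICON[verb1], LEXICON[verb2]
    return e2["causal"] and e1["dim"] == e2["dim"]

print(generalised_law_applies("fall", "push"))    # (7): applies
print(generalised_law_applies("trip", "shove"))   # (12a): applies
print(generalised_law_applies("stand", "greet"))  # (6): does not apply
```

On this sketch the law applies to (7) and to the variants in (12), but not to (6), since greeting neither causes a change nor shares a dimension with standing up.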
This is a very sketchy discussion of one particular way in which you
can get a more general rule about causation than the Push Law. We won't
go further than this here, because solving this problem is not important
for our purposes. The important thing to note here is the structure of
the law: it's a default rule, and it's got as part of its antecedent the
statement e1 is described before e2 and that e1 and e2 are events, which
is exactly the content of the antecedent of Be Orderly. This is important
when we look at the way it interacts with this law as text is processed.
We also have indefeasible causal knowledge, such as Causes Precede
Effects:
from that syntax. (7′) ensures that two of the above default rules are
verified: Be Orderly and Push Law (because e1 is a Max falling event, and
e2 is a John pushing Max event).3 As we've already stated, when both
these rules apply, the Push Law wins over Be Orderly by the Penguin
Principle. Thus we infer its consequent: e2 causes e1. In other words, the
pushing causes the falling, and so precedes it by Causes Precede Effects.
Again, it must be stressed that the default inferences could change
when (7) is uttered in a different context, such as (13):
(13) John and Max were at the edge of a cliff. John applied a sharp
blow to the back of Max's head. Max fell. John pushed him.
Max fell over the edge of the cliff.
In (13), the falling precedes the pushing, rather than the other way
round. We would account for this via other default rules that conflict
with Push Law, and are more specific than it, which would be verified
by the semantic information in (13), but not by (7) in isolation of the
discourse context provided by (13).
Now consider text (11).
(11) Max ate a lovely meal. He devoured lots of salmon.
We have the following background knowledge that's relevant:
The Meal Law:
If an event e1 is described just before the event e2 , where e1 is the
event of x eating a meal, and e2 is the event of x devouring something,
then normally: (i) e2 is part of the event e1 , and (ii) the thing that was
devoured was part of the meal.
The Meal Law conflicts with Be Orderly, because an event e2 can't be
part of e1 and precede it at the same time. The Meal Law is more
specific, however. And both laws apply in the analysis of (11). So by
the Penguin Principle, The Meal Law wins, and we interpret the events in
(11) as connected in a part/whole relation, rather than one of temporal
precedence.
Now consider a text which sounds odd, or like a joke:
3 This assumes that the pronoun him is resolved to co-refer with Max, using the
rules for finding antecedents to pronouns that we discussed in chapter 14. We have
omitted details here.
On the one hand, the background information about what mustard is
used for favours interpreting thighs as chicken thighs. On the other, the
background information about what sun tan lotion is used for favours
interpreting thighs as human thighs. We can't resolve these conflicting
clues about how to interpret thighs. There's a Nixon Diamond here,
because two default rules apply, but neither one is more specific than the
other (there's nothing about mustard that makes it more specific than
sun tan lotion, or vice versa). So we can't decide which clue about the
interpretation of thighs should take precedence: the one about mustard,
or the one about sun tan lotion. This produces a word pun effect.
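A Nixon Diamond can be sketched in the same spirit: two defaults fire, no specificity ordering relates their premises, and so both conclusions survive as candidate readings. The premise and conclusion strings below are invented labels:

```python
def candidate_readings(premises, defaults, strict):
    """Return all conclusions of fired defaults that are not beaten by
    a rival default with a strictly more specific premise."""
    fired = [(p, c) for p, c in defaults if p in premises]
    return {c for p, c in fired
            if not any(c2 != c and (p2, p) in strict for p2, c2 in fired)}

defaults = [
    ("mustard",        "thighs = chicken thighs"),
    ("sun tan lotion", "thighs = human thighs"),
]
strict = set()   # neither premise is more specific than the other

readings = candidate_readings({"mustard", "sun tan lotion"},
                              defaults, strict)
print(readings)  # both readings survive: the source of the pun
```

With a non-empty specificity ordering the losing reading would be filtered out; with an empty one, the ambiguity persists, which is exactly the pun effect described above.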
The crucial difficulty is that there is a very delicate balance between the
way the various default clues are represented, and the way these clues
interact in the logic. If we want one clue to have precedence over another,
then making it more specific guarantees this. But knowing when we
want this prioritisation is difficult to evaluate. And so representing the
background knowledge in the right way is a very complex task. Clearly,
encoding rules that are as specific as Push Law wouldn't be practical if we
were building an automated natural language understanding system. There
would have to be a zillion rules like it! We would also be in danger of
encoding the rules so that the logic generates unwanted inferences.
It has so far proved impossible (and some would argue is impossible
in principle) to manually specify all the rules needed to process text in
a robust manner. One needs automated support for acquiring the rules
that encode our background knowledge. There is just too much of it
for us to do it manually. But no one as yet has cracked the problem of
acquiring the necessary default rules that the system needs to know in
order to understand text in the way that a human does.
Machine learning of statistical models has proved successful in other
research areas where knowledge-rich information sources are needed to
perform particular tasks: the statistics is one way of approximating such
knowledge, thereby avoiding the need to hand-craft it. In particular,
machine learning has been used very successfully to learn grammars for
various languages, such as English. And the machine learning techniques
themselves are improving all the time, as well as being applied in novel
ways to various tasks in language processing.
However, while machine learning techniques have improved the sit-
uation significantly, there is still much to be learned about how they
might apply in the area of pragmatic interpretation of discourse. One
problem that currently hampers progress in this area is that typically,
machine learning has to be trained on linguistic data that contains the
target information (this is known as supervised learning). In the realm
of discourse interpretation, then, the texts in the training data would
have to be annotated with representations of their pragmatic interpretation;
these representations would go beyond the content supplied by any gram-
mar. Manually annotating natural language texts with such rich seman-
tic information is prohibitively expensive; at least, for the quantity of
texts which are required for effective training. So to date, this richly an-
notated semantic data (e.g., texts which are annotated with part/whole
book. The important thing to note here is that (16d) has been con-
nected to (16a) and not (16c). This phenomenon of `getting back' to
something you were talking about earlier, after you've had a little di-
gression, is known as discourse popping.4 The fact that people can
cope with digressions in conversation and then perform a `discourse pop'
and go back to an earlier topic of conversation considerably complicates
the models of language and of language use. Discourse interpretation
involves not only identifying the rhetorical connections between utter-
ances, but also identifying exactly which utterances in the text are so
connected. Clearly, these two tasks are heavily dependent on one an-
other, and the suite of default rules that we introduced earlier doesn't
address the problem of how one decides which sentences in a text are
connected. They would have to be extended accordingly.
The problem of constructing the rhetorical structure of a text is
an ongoing research area in AI, computational linguistics and formal
semantics, and researchers are beginning to take advantage of recent
developments in formal models of reasoning with partial information or
uncertainty (such as default logic, probability theory and statistics) to
get some interesting results for modelling systematically how to interpret
conversation.
Exercises
Exercise 16.1: You can find the maxims of conversation that we've
discussed here in the following paper by Grice:
H. P. Grice [1975] Logic and Conversation, in Cole, P. and J. L.
Morgan (eds.) Syntax and Semantics Volume 3: Speech Acts, pp.41-58,
New York: Academic Press.
1. Can you think of other principles which guide the way we use language
in conversation?
2. Can the principles you just stated be seen to follow from Grice's
principles? If so, how? If not, what consequences does this have for
Grice's theory?
4 This term is due to Barbara Grosz, and draws on analogies about popping elements
off a stack in computer science.
Exercise 16.3:
We infer a lot more from sentence (17) than the content that's derivable
from the meaning of its words and its syntax alone. Describe what this
sentence implies.
(17) John wrote a series of sentences corresponding closely to a
description of the `problem' known as The Projection Problem.
Try to explain how Grice's maxims predict these implications.
Clue: Use be brief and the Maxim of Quality. Think also about the
content that's conveyed by putting something in scare quotes. Do Grice's
maxims have a bearing on our choice as to whether we express something
with scare quotes or not?
Many of the above papers (e.g., Asher and Lascarides (in press),
Grosz and Sidner (1990), Hobbs et al. (1993)) take the position that the
interpretation of discourse depends on a discourse structure that is
determined by discourse coherence (see Section 16.6 for discussion),
a point that was first discussed in Hobbs (1979):
1. Hobbs, J. R. [1979] Coherence and Coreference. Cognitive Science, 3,
pp.67-90.
The following articles offer various logics for reasoning with default
rules:
1. Asher, Nicholas and Michael Morreau (1991) Commonsense Entail-
ment, Proceedings of the 12th International Joint Conference on Arti-
ficial Intelligence, pp.387-392.
2. Konolige, Kurt (1988) Hierarchic Autoepistemic Logic for Nonmono-
tonic Reasoning, Proceedings of the Seventh National Conference on Ar-
tificial Intelligence, pp.439-443.
3. McCarthy, John (1980) Circumscription - A Form of Non-Monotonic
Reasoning, Artificial Intelligence, 13(1-2), pp.27-39.
4. Pearl, Judea (1988) Probabilistic Reasoning in Intelligent Systems:
Networks of Plausible Inference, Morgan Kaufmann.
5. Reiter, Ray (1980) A Logic for Default Reasoning, Artificial Intelli-
gence, 13, pp.91-132.
Following (Grosz and Sidner 1990, Litman and Allen 1990, Perrault
1990) mentioned above, the following paper uses reasoning with inten-
tions to interpret multi-sentence text:
1. Cohen, Philip and Hector Levesque (1990) Rational Interaction as
the Basis for Communication, in P. R. Cohen, J. Morgan and M. Pollack
(eds.) Intentions in Communication, pp.221-255, MIT Press.
Finally, for reasons of space, we have not covered in this chapter
corpus-based methods for acquiring automatically a model of discourse
interpretation. The following papers are just two examples of the work
in this area:
1. Stolcke, Andreas, K. Ries, N. Coccaro, E. Shriberg, R. Bates, D.
Jurafsky, P. Taylor, R. Martin, C. van Ess-Dykema and M. Meteer
17 Processing Dialogue
438 Chapter 17
an indefinite number of things in our lexicon that behave like idioms.
And this misses generalisations about indirect speech acts.
The second main problem with treating indirect speech acts as
idioms is that sometimes, both the direct speech act (as indicated by
linguistic form) and the indirect speech act are equally relevant to the
purposes of the dialogue:
(9) Can you get that suitcase down for me?
The person asking (9) may genuinely not know if the addressee can
get the suitcase down. If this is the case, then it was not only a genuine
yes/no question, but also a request to pass the suitcase, if the answer to
the question is yes. In other words, it conveys a conditional request: if
you can get the suitcase down for me, please do it. If indirect speech acts
were idiomatic, then (9) would be irreducible and couldn't have both
meanings simultaneously, much like kick the bucket can't mean literally
to kick a bucket and to die simultaneously.
The second way of addressing the puzzle of encoding the relation
between the literal force of an utterance and its indirect speech act
is to study the inferences that people undertake when listening to
utterances in dialogue. These inferences involve reasoning not only about
the meaning of the sentence that's derived from its syntax, but also
the situation in which it's uttered. For example, Can you p? conveys
a question, according to the grammar. But you can also infer that it
has the indirect force of a request, by taking into account the context
in which the question is uttered. These inferences that an interpreter
makes, about the force of a sentence s, are designed to answer two
questions. First, why did the speaker S utter s? And second, given that,
what is the hearer H supposed to do? Believe something (such as s)?
Or do something (such as pass the salt)?
To model such inferences, we need a theory that links what people
want to what they say. When someone wants something done, sometimes
the best way to achieve the goal is to communicate with somebody in a
dialogue. How do we link goals to things that are said in a dialogue? Do
we use something like Grice's maxims of conversation (see Chapter 16)
to model the link? Or is something else more appropriate?
Certainly, Grice's maxims of conversation help us to infer what's
wanted from what's said in dialogue, even if it's not the whole story.
Take (10) as an example.
(11b′) B: The treasure is at the secret valley,
and I assume you know how to get to the secret valley.
The inference that (11b) means (11b′) in this context is quite complex.
to get to the secret valley. This is then part of what (11b) means, as
stated in (11b′).
More generally, people can learn quite a bit about what people think
by talking to them, even when they're not talking about themselves
directly. Here, A and B learn that (11b) means (11b′). B learns about
doesn't violate our expectations about what B and A think, arising from
reasoning about what the utterances (11a,b) mean.
(11c′) A: ?But I know how to get there.
The above discussion about the simple dialogue (11) gives some in-
dication about how complicated processing dialogue can get. Reasoning
flows from the content of sentences that's determined by their linguistic
form, to what people think, and then back again to what the utterances
meant, in the light of what people think. For example in (11), the mean-
ing of but is dependent on the content of (11b) in the context of (11a).
Its content is (11b′) in this context; the added meaning is an inference
about what B thinks A thinks, which we obtain from the fact that he
responded to A's question in the way he did. It is this added mean-
ing that makes the use of but (which signals that some expectation is
violated) acceptable. This reasoning links meaning, beliefs, desires,
and intentions to perform action. As yet, the puzzle as to how one can
model in a formally precise way the interactions among cognitive states
and the content of utterances is not very well understood.
17.3 Conclusion
When engaging in dialogue, people have to think about what the other
participants in the dialogue believe, want and intend to do. All these
things affect the way we should interpret utterances, and the way we
should respond to them. If we come to the wrong conclusions about why
someone said something, then we may respond in inappropriate ways,
as illustrated in (2a,d):
(2) a. A: I was wondering whether I could buy two of the best
seats in the house for the opera on Saturday.
d. B: Oh really? I was wondering whether I could have a
28MB memory upgrade.
According to syntax and its link with meaning, sentence (2a) is an
assertion; either it's true that A is wondering whether he can buy two
seats to the opera or he isn't. But that's clearly not all that (2a) means
in this context. It's also a request to buy two seats to the opera. Since
the request is performed via another speech act|that of asserting|this
is an example of an indirect speech act.
The relationship between the meaning of an utterance as revealed by
its linguistic form and its intended meaning is highly complex, and needs
to be modelled via a reasoning system that stipulates how the content of
utterances, and the beliefs, desires and intentions of the dialogue agents
all interact. The pragmatic maxims that Grice suggests may provide
some informal beginnings of the rules that such a system must obey.
But as yet no one has a solid theory of how all of these factors in
communication relate to one another. In spite of the current lack of
consensus, there has nevertheless been dramatic progress in this area of
linguistic analysis over the last 15 years or so; for some examples see
Section 17.4.
Using beliefs, desires and intentions to compute the meaning that
the speaker intended to convey in his utterance poses many challenges.
One of the main challenges is that one generally doesn't have direct
access to information about cognitive states. A speaker very rarely says
I want ... explicitly. Rather, we have to make an educated guess as to
what a speaker is thinking, based on observing his behaviour and actions
(including his utterances). Drawing conclusions about cognitive states
from the agent's actions can be modelled by default rules, in line with
the strategy used in Chapter 16. But devising the appropriate default
rules about how we infer what a person thinks, and how that affects our
interpretation of what he says, is a very complex task. At present, we
Exercises
Exercise 17.1: Some of the ideas presented in this chapter are taken
from the following seminal paper on indirect speech acts:
J. Searle [1975] Indirect Speech Acts, in P. Cole and J. L. Morgan
(eds.) Syntax and Semantics, pp.59-82, Academic Press, New York.
In this paper, Searle discusses the following dialogue:
(1) Student X: Let's go to the movies tonight.
(2) Student Y: I have to study for an exam.
This is in a section entitled A Sample Case. Read this section (it's only
about 2 pages). Can you see any connections between Searle's discussion
of utterance (2) in the context of (1), and what Grice would say about
what's implicated by utterance (2) in this context? In other words, do
Grice's maxims help us infer from (2) that Y is rejecting X's proposal,
and if so, how?
Exercise 17.2: Herb Clark has done a lot of work on the way people
respond in dialogues:
H. Clark [1979] Responding to Indirect Speech Acts, Cognitive Psy-
chology, 11, pp.430-477, Academic Press.
1. Clark gives Uh - it's six and Let me see - it's six as possible responses
to the question What time is it? He says that Uh and Let me see are not
V GRAPHICAL COMMUNICATION
18 Graphical Communication
18.1 Introduction
So far this book has been heavily focussed on communication in lan-
guage. The beginning of the book dealt generally with the cognitive
approach to understanding communication in terms of information and
computation, but even there most of the examples were linguistic:
reasoning in Wason's selection task, solving Tversky's puzzles. The cen-
tral portion of the book was devoted to formalising linguistic communi-
cation, even if one of its aims was to show how much more than language
is involved. But now we turn to consider communication in modalities
other than language, and we will pay most attention to diagrams.
If you observe communication in the `wild', even when language is
the main vehicle, it is rarely language alone. With speech between partic-
ipants present (though not over the telephone) there is facial expression,
body language, and the immediately perceivable surrounding environ-
ment. Speakers point at things, and this plays a role in their commu-
nication. Even when they don't point, the visually sensed presence of
objects and people can be used in communicating. The written and
printed word is often accompanied by pictures, maps, graphs, diagrams
etc. Communication would often be virtually impossible without these
non-linguistic supports: imagine an atlas without maps, or a car main-
tenance manual without diagrams. On the other hand, graphics without
any words at all are rather unusual. Raymond Briggs' picture story `The
Snowman' is one of the very few books with no words. Instructions for
assembling flat-packed furniture are notorious examples of attempts to
dispense with language in favour of diagrams (to save producing multiple
language editions), but they are often quite hard to understand.
So one of our interests in looking at graphics is to give an antidote to
the idea that all communication is linguistic. It is also useful to compare
what we find with what we have observed about language. Diagrams
give some `distance' to stand back from language so we can see what is
unique and what is shared. A third reason for including non-linguistic
communication is that there is a technological revolution going on which
is making available media, and combinations of media, which have never
been used before. Multi-media (or maybe multi-meejah?) is the buzz
Table 18.1
Output of treacle (metric kilotonnes) by region for the years 1880–82

Region      '80   '81   '82
Bavaria      27    41    63
Bohemia      42    36    83
Bulgaria      8    88     8
Figure 18.1
A bar chart showing treacle output.
Figure 18.2
A pie chart showing treacle output in the Bavarian region.
also use either the histogram, the pie chart or the line graph. We will
call this class of graphics for representing relational data matrix graphics.
There are other kinds of matrix graphics we could use. Notice that there
are two kinds of question here. Firstly, how do we choose what kind of
graphic? And secondly, how do we design the best example of its kind?
What is there to choose between table, histogram, line graph, and
pie chart? The question turns on what our `reader' is trying to do (or perhaps
what we want them to do). Tables have the virtue (and the vice) that
the numbers entered are absolute quantities. If we want to know actual
quantities, for whatever reason, then tables are good. But if we are less
interested in absolutes, and want to know about relative trends (was
Bohemia on the up relative to Bulgaria? Did Bohemia ever get above
Bavaria? Which was the bumper year for treacle?), then a histogram or
a line graph is better than a table. We can work it out from the table.
All the same information is there. But we have to work.
But there are some relational questions which are still not easy to answer
from the histogram or the line graph. Suppose we want to emphasise the
relative importance of the regions in a year, but are uninterested in the
absolute levels? We can read the relations off a line graph or histogram,
but if there are more regions than three this can become hard, and a pie
chart makes this relational information easier to process. The pie chart
Figure 18.3
An "infographic" relating treacle output and location. The map in the background
was provided by the Xerox PARC Map viewer.
Figure 18.4
A line graph showing the evolution of treacle output over time.
and the years would have to be assigned to the two axes. Try it: it
produces a pretty hard graphic to read.
To be less bizarre, but still odd, we could have put the years on the
vertical and the tonnages on the horizontal. In `horizontal' histograms
this is common; in a line graph it is not so common: the vertical is
usually reserved for some `dependent' variable quantity (such as output,
rainfall, . . . ), and time is generally on the horizontal. We are accustomed
to seeing the increases and decreases in the quantity we are focussing
on reflected as rises and falls through time, which is portrayed as flowing
from left to right horizontally across the page. Choosing what the
quantity is that we want to focus on is, of course, the central choice in
designing these graphics. Lots more is known about the craft of selecting
kinds of graphic and designing specific examples. These design issues are
much better treated by authors such as Tufte than we have room or skill
for here.
We turn to more fundamental questions: as usual, questions whose
answers may seem extremely obvious but which repay explicitness.
How is it that a table, a histogram, a pie chart, or a line graph conveys
information at all? Tables are a good place to start. Tables are a nice
half-way house between graphics and language, but they certainly qualify
as graphics under our crude definition because they use spatial relations
one-dimensional space or time. The links in this spatial chain do not have
fixed meanings directly interpreted. Instead the chains are interpreted as
having a syntax (represented usually as a tree, or by parentheses), and
it is only these much richer syntactic relations that can be semantically
interpreted. The meaning-relation between a noun and a following verb is
quite different from the meaning-relation between a verb and a following
noun. But their spatial relations as links in the chain are identical. If
languages did not have this abstract syntax, and the single concatenation
with a single meaning-relation were all there was to interpret, they would
be able to say little indeed. All this is just to reprise what was said about
logical languages in Chapter 5 and developed in great detail in the
central parts of the course. It is so obvious as to be invisible.
Graphics interpret more than a single spatial relation, but they interpret
these relations without any intervening concatenation or abstract
syntax. We can see this in the fragment of line graph in Figure 18.1.
The vertical distance relations between the data markers and
the x-axis are interpreted in terms of size of tonnage; the horizontal
distance relation between a marker and the y-axis is interpreted in terms
of year. Both are directly interpreted: distance means time (or distance
means weight). There is no syntax. True, the graphic incorporates verbal
annotations in the legend. These are lists in our example, but they could be
sentences. But all they do in terms of the main graphical field is to define
the meaning of the graphical icons. This graphic has three dimensions
directly interpreted: year, tonnage, and region. The first two are simple
spatial relations and the third is defined by a set of unordered icon
shapes. Of course we could add more dimensions (perhaps the hue of an
icon could indicate the market price in that year in that region), and
so on. But each dimension is directly interpreted.
These graphics are limited in the number of dimensions that they
can use. A large number of dimensions for a graphic (say five or six)
is still trivial compared with the number of combinations of words and
syntactic relations which can be packed into a paragraph of text, each
having a distinct semantic interpretation. Graphics and language derive
their respective strengths and weaknesses from these fundamentally
different ways of meaning: direct interpretation vs. indirect interpretation
through syntax. To see how these far-reaching implications follow,
consider the issue of what information is required to draw a graphic.
It is an immediate consequence of the directness of interpretation
have been discussing so far. For example, the nodes may be interpreted
as kinds of components in an electronic circuit (perhaps with different
shapes for capacitors, resistors, transistors, etc., and numbers to express
values of capacitance, resistance, and so on). The links are then interpreted
as conductors: maybe wires.
Semantically, such a circuit diagram is rather like a map. Often
the map only represents the circuit's topology (its connections, not its
shape), but nevertheless its semantics is fully concrete. For example, if
two components are shown in different places, then they are distinct
components, even if they may be of the same type. If two components
are not connected by a wire, then the nodes that represent them in the
diagram will have no joining link. And vice versa: if there is no link
in the diagram, then there is no corresponding wire. We cannot draw
a picture of one wire that goes to one component or another without
making it clear which one. Or at least, if it is unclear, that will be a defect in
the diagram, perhaps caused by its being too cluttered to `read' properly.
This style of interpretation of the righthand figure (of these very
same graphical node-and-link formalisms) is essentially linguistic, and
such interpretations are capable of expressing abstractions just like
linguistic representations generally. They are common in computer science,
where they often underlie so-called `visual programming' languages. Consider
the node-and-link graphic in Figure 18.5. It is a simple example of what is
called an entity relationship diagram in computer science. We interpret
each node as denoting the person who is named by its label, and we
interpret the graphical arrow relation as indicating that the person
denoted by the node at an arrow's root loves the person denoted by
the node at its tip. So the diagram can be `read' as indicating that Pavel
loves Sue. Sue loves John. Pavel loves Sarah. Sarah loves John. The usual
kind of tangle.
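Read this way, the diagram is simply a directed graph, and its content can be written down directly as data. A minimal sketch in Python (the relation is taken from the example; the helper function and its name are our own illustration, not anything from the book):

```python
# The `loves' diagram read as data: each arrow (root, tip) in the
# graphic becomes a pair meaning "root loves tip".
loves = {
    ("Pavel", "Sue"),
    ("Sue", "John"),
    ("Pavel", "Sarah"),
    ("Sarah", "John"),
}

def loves_whom(person):
    """People the given person loves (follow arrows out of their node)."""
    return {tip for root, tip in loves if root == person}

print(sorted(loves_whom("Pavel")))  # ['Sarah', 'Sue']
```

The point is the directness of the interpretation: each graphical arrow corresponds to exactly one pair in the data, with no intervening syntax.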
How does this kind of graphic compare with the matrix graphics
for treacle which we discussed above? Does it demand completeness?
This question has to be taken at several levels, and raises questions at
each. Asked first about just the nodes, it at first seems that they do
completely determine the identity relations of the people they denote.
But this is only because we tend to assume that Sue is not Sarah and
Pavel not John. If we can assume that the nodes determine all identity
relations, then we can rule these complexities out. If we look at the
use of such diagrams, say in specifying databases, then this assumption
    A  B  C           B  A  C
A   0  2  4       C   5  4  0
B   2  0  5       B   0  2  5
C   4  5  0       A   2  0  4
least, unhelpful. This serves to remind us that while linguistics has much
to say about what is possible in language, it has had less to say about
what design of language constitutes skilled or optimal communication.
This issue of how to optimise communication comes to the fore with our
question about media/modality assignment.
In the current examples of paragraphs, tables and maps, the same
information is presented in several different ways. What is the consequence
for the reader? What tasks will these expressions facilitate, and
which will they retard? And why? This last question is the scientist's
question. It seeks an explanation. It is not satisfied with obvious `truth'.
For some people it is obvious that graphical interfaces for computers are
best, and it is obvious that that is because they are graphical. But for
other people, linguistically driven interfaces are obviously best. The only
thing that is obvious to the scientist is that what is obvious may or may
not be true, and even if true, we may or may not have an explanation
of why it is true.
The approach we will take to this question is, as with most of the
examples in this course, to look in depth at a rather simple example. The
example we will take is the use of diagrams in teaching very elementary
logic. We choose this domain, as usual, for many reasons. First, it is
one in which we can assess accurately what information is expressed
by alternative diagrammatic and linguistic expressions. If we are to
study alternative expressions of the same information, then it had better
be the same information. Second, teaching/learning logic is a difficult
kind of communication (for both teacher and learner), and so provides
something more than a toy example. Third, this is a domain in which
there has been strong controversy within the teaching profession for
several centuries: the disputes between those in favour of using diagrams
and those against sometimes feel like a microcosm of the wars over
the religious use of imagery. Fourth, logic is the discipline that has
contributed most to an understanding of sentential semantics, and so
it provides a useful place to branch out into the study of the meaning
of other representations. And finally, elementary logic has much to offer
to an understanding of communication, and so relates back to several
other parts of this course: conditional reasoning, logic and computation,
theories of discourse, conversational implicature, and so on.
Our plan is as follows. In the next section we define some of the
terms involved, particularly media and modality. In Section 18.3, we
by eye.
But this is where the terminological problems begin. Psychologists
use the term modality for sensory modalities: sight, hearing, smell, touch,
and so on. And so psychologists' modalities are closely related to what
computer scientists call `media'. The physics of the computer scientists'
medium determines which of the psychologists' modalities can perceive
them. Psychology does not have a systematic term for what is common
to language across media, or common to diagrams across media, because
psychology has not concerned itself with the systematic study of
semantics. Since this aspect of semantic interpretation is our main focus,
we will retain the term modality in the logic/computer science sense.
Just remember that psychologists often use it to mean sensory modality.
Figure 18.2 is intended to help you remember what these two terms
mean through examples. As usual, these terminological issues prove to
be rather important when discussing the `facts'. For example, when it
is claimed that `visual' computer languages are good because they are
visual, it is obvious that `visual' cannot mean perceived through
the eye, since text is read through the eye and is exactly what visual
languages are meant to be contrasted with. Probing for just what is meant
by visual helps to get nearer to well-defined empirical questions.
believe what she says, nor that the hearer know or believe it afterwards.
They may both treat it as the telling of a fictional story. Indeed, there
is no place which actually corresponds to the map used in the Map Task
experiments. What is important is that there is asymmetry of authority
for the information: the speaker knows and the hearer doesn't, and
both accept this understanding (or pretence) of the state of affairs. If the
hearer suddenly, in mid-stream, announces that she disagrees and won't
accept something said, then exposition has broken down. The hearer
is then asserting symmetrical authority, and some negotiation will have
to take place to restore the earlier situation before exposition can be
resumed.
But much more actually goes on in communication, even in the
Map Task. The participants establish a language, albeit a very local
language, which they did not share at the outset. This establishment of a
common local language is what especially involves derivation. Derivation,
in contrast to exposition, is a mode of communication in which authority
for information is symmetrical between participants. They share a set
of assumptions with regard to which they have equal authority. The
business of derivation is to represent some part of this shared set of
assumptions in a novel form. The most sustained cases of pure derivation
occur in mathematics, where proofs may proceed for many thousands of
lines. In less formalised cases, derivation frequently goes on interspersed
between bouts of exposition. Just as with exposition, it is generally not
important whether the assumptions are in fact true. What is important
is that they are shared: that all participants are on an equal footing
with regard to the base assumptions. Remember what you learnt about
the difference between truth and validity (in Chapter 5).
If a previously undetected disparity between participants' assumptions
emerges during a derivation, then repair is necessary. A `fact' may
be checked (thus appealing to some authority for a piece of exposition).
But sometimes one participant simply agrees to change their assumption
about the offending item for the sake of argument. Without agreement
about the assumptions that are going to operate (and the consequent
symmetry of authority), derivation cannot proceed.
Exposition is about passing contingent knowledge from one com-
municator to another. Derivation is about exploring the necessary con-
sequences of shared assumptions. The former is all about getting new
information to the receiver. In the latter, if any information emerges
as new, then something has gone wrong. Derivation produces only new
forms for old information content. Of course, derivation may lead to new,
surprising conclusions. But what is new is that the conclusion
re-represents some aspect of our old assumptions in a new way. The newness
is at a meta-level (we realise that something is a consequence of what
we already knew), not some new assumption at the object level given us
in exposition. This re-representation of shared knowledge is important
in learning abstract concepts and establishing mutual languages, because
often our only way of ensuring that we have the same interpretation for
our language is to ensure that we make the same inferences from the
same assumptions.
The whole goal of communication is to arrive at a community of
people who share assumptions and represent them in common forms. But
there is a tendency to think of communication on the expository model
(neglecting the importance of derivation), and we will see presently that
there is good evidence that that is what students tend to do in many
psychological experiments on reasoning. Students may have a sophisticated
ability to conduct communications including both exposition and
derivation without an explicit grasp of the difference between the two
modes. This is analogous to the way they have a sophisticated mastery
of the syntax of their native language but do not explicitly know its
rules. Much of teaching has to do with making implicit knowledge
explicit. And succeeding in doing that changes people's abilities to do
things: explicit knowledge generalises in ways that implicit knowledge
does not.
This is our model of what has to be learnt about components of
discourse in learning logic. We now turn to look at the details of a
particular logical fragment. We will return to this distinction between
exposition and derivation when we have looked at some of the differences
between graphical and sentential semantics.
Problems in interpreting quantifiers
We already saw in Chapter 3 student subjects struggling to understand
what experimenters meant by the instructions for reasoning tasks such
as Wason's selection task. For example, some students interpreted an `if
. . . then' sentence to mean `if and only if . . . then'. We argued that these
interpretational struggles arose from conflicts between the students'
understanding of the task setting and the sentences used. Similar problems
arise in even simpler tasks than the selection task. For example, if under-
graduate subjects are given the sentence `All A are B' and asked whether
it follows that `All B are A', a substantial number say that it does follow.
This pattern of inference is known as the `illicit conversion' of the
conditional and has been well known to logic teachers since classical times.
It is generally assumed that this is an error of reasoning. Again we will
argue that this and other errors arise from students' interpretations of
what they are being asked to do.
For another example, the philosopher Paul Grice, whose conversational
maxims you have heard about in Alex's lectures, was originally
inspired to develop his view of the relation between logic and
communication by his observations of problems experienced by his students
learning logic. For example, told to assume that Some A are B, they
would show evidence of inferring that Some A are not B, and when
challenged they would justify themselves by reasoning as follows:
"Some A are B must imply that Some A are not B, because otherwise the
speaker would have said All A are B."
This prompted Grice to formulate his conversational maxims as
presented in Chapter 16. Grice noticed that this pattern of inference is
based on an assumption of the speaker's cooperativeness. The hearer
assumes that the speaker is cooperating by saying just the things that
will allow the hearer to guess the particular model that the speaker has
in mind. The hearer does not take the stance of trying to find a model
which will defeat the speaker's statements. In terms of our discussion
in the previous section, the student, quite reasonably, adopts an
expositional model of communication rather than a derivational one. In other
words, there is a misunderstanding between student and teacher about
the meaning of `follows', just as there were many misunderstandings in
the selection task. The teacher meant logically follows but the students
interpreted the question as conversationally follows. And of course it
doesn't help merely to clarify the task by saying `logically follows',
because before learning logic the distinction between logical consequence
and conversational consequence is not explicitly available.
Although Grice (and subsequent discussions) focussed on coopera-
tion, it is also worth noting that the hearer appears to make a further
assumption, namely that the speaker is omniscient (as regards the mat-
ters at hand at least). If the speaker didn't know whether all As are Bs,
he might well say just Some As are Bs for that reason. The making of
the inference to all As are Bs appears to indicate that the hearer thinks
of the speaker as omniscient. Perhaps the way to think of this is in terms
of the `omniscience' of the narrator of a story?
These examples of `errors' in reasoning (illicit conversion and
conversational implicatures) are sins of commission: inferences that should
not have been made according to the experimenter's competence model
are made. But careful investigation of students' interpretations of simple
quantified sentences also reveals sins of omission. An example is failure
to convert conditionals that do logically allow conversion: given
the premiss Some A are B, the reasoner fails to conclude that Some B
are A. These errors are important for what they can tell us about what
students have to learn about logic. In these cases, although the conclusion
follows logically, it does not follow conversationally. An explanation
is given below in terms of what we will call information packaging, and
information packaging is one feature of language which sets it off from
diagrams.
Even more important than finding unnoticed errors is that, by looking
at the full range of errors of omission and commission, we observe
highly systematic individual patterns of error across quantifiers. For
example, students who make the first error (illicit conversion) rarely
make the second (failure to convert), and vice versa. There are `styles' of
interpretation of these sentences which cut across particular inferences. We
do not yet know the full basis of these styles, but they appear to be
related to the graphical/linguistic preferences which we describe below.
These problems we have described are problems which students have
with interpreting quantifiers in a logical way before learning logic. So, if
these problems are evidence about what has to be learned about logic,
how can we study the cognitive processes involved in learning logic with
and without diagrams? We now describe a study of some very simple
logic learning designed to do just that. First we describe syllogistic logic
and different ways of teaching it, before reporting the results of a study
of the effects of different assignments of modalities in teaching it.
What are syllogisms?
We choose syllogisms as a fragment of logic for several reasons. We could
take the propositional calculus which appeared in Chapter 5. But we
know of only one graphical method of teaching it, and to understand
that method takes more time than we can devote to the example here.
The syllogism has the great advantage that there are several graphical
systems, and we can compare them amongst themselves as well as
with a sentential method. Besides, learning another logic is helpful for
comparison purposes too.
Syllogisms are historically important as one of the first fragments of
logic for which there was a real logical theory, developed by Aristotle.
Syllogisms are logic problems in which two premisses about relations
between two end properties and a shared middle property license
conclusions about the relation between the two end properties. For example,
in the syllogism shown here,
(1) All artists are botanists.
All botanists are chemists.
types from the list of eight. Either at least one of the premisses is false,
or the conclusion is true; so, remembering what was said in Chapter 5,
this pattern of argument is valid.
Since syllogisms are just about whether there are or aren't things
of these eight types, there are obviously 2^8, or 256, possible syllogistic
worlds. That is just all combinations of the eight. The reader can check
for our example syllogism that indeed the conclusion is true whenever
the premisses are true, in all 256 possible worlds. Laborious?
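The laborious check is easy to mechanise. Below is a sketch in Python (our own framing, not a method from the book): we take A = artist, B = botanist, C = chemist, assume the conclusion of example (1) to be All A are C, and represent a world as the subset of the eight types that are inhabited:

```python
from itertools import product

# An individual's type is a triple (a, b, c) of truth values;
# there are eight such types.
TYPES = list(product([True, False], repeat=3))

def all_are(x, y, world):
    """`All X are Y': no individual in the world has X without Y."""
    return not any(t[x] and not t[y] for t in world)

# A world is any subset of the eight types: 2**8 = 256 worlds in all.
valid = True
for bits in product([False, True], repeat=8):
    world = [t for t, present in zip(TYPES, bits) if present]
    if all_are(0, 1, world) and all_are(1, 2, world):  # both premisses true
        if not all_are(0, 2, world):                   # conclusion false?
            valid = False
print(valid)  # True: no world makes the premisses true and the conclusion false
```

The loop is exactly the exhaustive search through the 256 possible worlds described in the text.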
If we take an example like
(4) All A are B.
All C are B.
Therefore, all A are C
we find that there is no valid conclusion. For any of the eight possible
candidate conclusions, we can exhibit a counterexample world in which
the premisses are true and the conclusion false. The world where there
are just two things, one ¬A B C and the other A B ¬C, is such a world.
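That counterexample can be checked mechanically in the same style (again our own sketch; properties are in the order A, B, C, and the world holds just the two individuals described above):

```python
# Represent an individual as a triple (a, b, c) of truth values.

def all_are(x, y, world):
    """`All X are Y': no individual in the world has X without Y."""
    return not any(t[x] and not t[y] for t in world)

# The two-individual world from the text: one thing that is
# not-A but B and C, and one that is A and B but not C.
world = [(False, True, True), (True, True, False)]

premiss1 = all_are(0, 1, world)    # All A are B
premiss2 = all_are(2, 1, world)    # All C are B
conclusion = all_are(0, 2, world)  # All A are C
print(premiss1, premiss2, conclusion)  # True True False
```

Both premisses hold in this world while the candidate conclusion fails, so the conclusion of (4) is not valid.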
Thinking about these two example syllogisms should be enough to make
it clear that exhaustive searching of this space of models is an arduous
way of deciding whether a proposition is a valid conclusion of a syllogism,
or whether a syllogism has any valid conclusion. What we have presented
here is what logicians call the model theory of the syllogism: its most
fundamental semantics. What we need is some less arduous way of
computing valid conclusions. It is even clear that we have some intuitive
methods of solving some of these problems, because we don't need to
search through the 256 models to check that the first (very easy) example
is valid.
Before going on, it is worth comparing this (your second logic)
with the propositional logic described in Chapter 5. The model theory
for the syllogism that has just been described is like truth tables for
propositional logic: a way of examining all possible worlds. The methods
we are about to look at for solving syllogisms are like the natural
deduction rules (proof theory) we gave for the propositional calculus,
except that some of the methods use diagrams instead of sentential rules.
The main difference between the two logics is that syllogisms analyse the
truth of sentences in terms of individuals that have properties in their
worlds, whereas the propositional calculus never analyses below the level of
whole atomic sentences.
Self-consistency
It is easy to construct sets of sentences which are inconsistent, as we all
know to our cost. An inconsistent set of sentences describes nothing or,
in logical parlance, has no model. Subsets of its sentences may describe
things, because they may be consistent, but there is nothing which is a
model of the whole set. For example, the pair of sentences All A are B,
Some A are not B has no model.
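Under the model theory sketched earlier for syllogisms, this claim is easy to verify by brute force: with just the two properties A and B there are four types of individual and 2^4 = 16 possible worlds, and none of them satisfies both sentences. A sketch (our own framing):

```python
from itertools import product

# With two properties there are four types of individual: (a, b) pairs.
TYPES = list(product([True, False], repeat=2))

def all_a_are_b(world):
    """True iff no individual in the world is A without being B."""
    return not any(a and not b for a, b in world)

def some_a_are_not_b(world):
    """True iff some individual in the world is A but not B."""
    return any(a and not b for a, b in world)

# Search all 2**4 = 16 worlds (subsets of the four types) for a model.
models = []
for bits in product([False, True], repeat=4):
    world = [t for t, present in zip(TYPES, bits) if present]
    if all_a_are_b(world) and some_a_are_not_b(world):
        models.append(world)
print(models)  # []: the pair of sentences has no model
```

The two truth conditions are direct negations of one another, which is why the search necessarily comes back empty.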
A remarkable property of Euler diagrams is that there are no
diagrams which correspond, in what they express, to inconsistent sets of
sentences. It is impossible to construct an inconsistent Euler diagram. For
example, try representing the pair of inconsistent sentences quoted in the
previous paragraph in Euler diagrams. We can define this property of
representational systems as self-consistency. A self-consistent
representational system cannot represent inconsistent sets of propositions.
What underlies this difference between sentences and Euler's Circles?
We might be tempted to argue that all diagrams are self-consistent. We
can't draw a picture of a square circle, although we have no trouble
producing the singleton set of sentences: There is a square circle. Maybe
this is the fundamental semantic property which distinguishes diagrams
from sentential languages? We could even have a slogan: "The diagram
never lies; or at least if it does, it always tells a good story."
But we don't have to go far to find that this is not the simple
essence of diagrammatic representation. We have only to look at Venn.
In this system there is no problem about representing inconsistent sets
of sentences. If there is a cross and a zero in the same subregion, then the
diagram is inconsistent.
Now one might try to rescue Venn by introducing some formal
rule which ruled out all such diagrams as `ill-formed', thus leaving
well-formed Venn diagrams as self-consistent. But to do so would be
to miss the fundamental point. Euler's system does not have the resources
for expressing inconsistencies. The topology of three circles, plus the
mechanism of marking individual regions for non-emptiness, just will not
express inconsistency. Euler uses only topology to express the emptiness
of categories (through the absence of subregions). Venn's system has
the required means of expressing inconsistency because it uses not
topology but an arbitrary placeable mark to express emptiness. One can
introduce conventions to prevent inconsistency (such as "don't put two
marks in one minimal region"), but these are not graphical constraints.
Self-consistency is a property of some diagrams but not others, and
adding certain expressive powers of arbitrary notation appears to be
what leads to its breakdown. Later we will return to consider what
cognitive effects this property might have, but here we turn to another
related semantic property: expressiveness.
Expressiveness: concreteness and abstraction
Expressiveness is a technical term from logic which denotes the power
of a representation system to express abstractions. In general, natural
languages such as English are highly expressive: there may be no
abstraction they cannot express. In general, diagrams are less expressive,
often extremely so. To take a very simple but crucial example, suppose
we want to express the abstraction that there is an evening star, and a
morning star, but it is not known whether they are the same or not (if
you cannot handle this degree of astronomical ignorance, then think of
Mr. Ortcutt and my bank manager. I assure you they both exist, and
you don't know whether they are the same or not). In language, expressing
these abstractions is easy. We can say: There is an evening star and
there is a morning star. If we want to be absolutely sure to avoid
confusion, we may explicitly say we don't know whether they are the same
or not.
But now try drawing a diagram with both the evening star and the
morning star in it, but without resolving whether they are one and the
same or not. Maybe, if you are resourceful enough, you can find some
way of conveying this particular abstraction, but in general you have
to resort to some very ad hoc tricks. Most diagrammatic systems force
the representation of identity relations. They cannot abstract over these
relations.
This is another side to the △ triangle problem we discussed under
conventionality. The more we treated △ like a conventional symbol, the
more it took on the powers of a word to abstract over details of the
particular triangle drawn. But the more we treated △ as a picture of a
triangle, the more its meaning could not abstract over the possibilities.
We cannot draw a triangle without completely fixing the ratios of its
sides. We can fiddle with whether or not our diagrammatic system
interprets (treats as significant) the details, but we cannot avoid them
being there.
are B, and the task at hand requires this abstraction (or is at least
made much easier by having it), then the diagrammatic system will be
a hindrance, and a sentential system preferable. But if these abstractions
are not required, then the weak diagrammatic system will make inference
easier. If this is right, then we should expect modality assignment to be a
question of horses for courses. We would need to analyse carefully what
abstractions are really useful for the task, and which representations can
express them. Going back to syllogisms, Euler (with crosses) is sufficient
for expressing all the abstractions necessary for syllogisms in a single
diagram for each problem. So we might expect this system to be effective.
Information packaging in sentences
Our last contrast between diagrammatic and sentential semantics has
to do with features of sentential semantics which do not have any
corresponding features in diagrammatic systems. Language allows us
to present the same information in many different ways. In speech (at
least in English), intonation and stress (the rising and falling of pitch and
loudness) play a part in what linguists call information packaging. In
writing, these same distinctions are expressed syntactically by changes in
the structure of the sentence. For example, a simple indicative statement
such as The pipes are rusty normally bears main stress on rusty, and the
end of the sentence has a falling intonation. Such a pronunciation of the
sentence is roughly what would be typical if it was uttered as an answer
to the question: What's the matter with the pipes? The pipes are rusty
(boldface crudely marks main stress). In contrast, if the same sentence
were uttered in answer to a different question, it receives different stress
and intonation: What's rusty? It's the pipes that are rusty. (Note the
change in syntactic organisation that might go along with this: the
sentence is now a cleft structure beginning with an it is clause.) Different
question contexts can give rise to yet different renderings: Is it the taps
that are rusty? No, the rusty ones are the pipes. Here the stress would be
contrastive and rather more pronounced than in the previous example.

Speech tends to use prosodic information to mark these distinctions.
Written language cannot, so it must use syntax. Spoken language can use
syntax too. Different languages use different resources for expressing the
distinctions. But what are these distinctions? They are not completely
understood, and they are quite subtle. They do not obviously involve
differences in the proposition expressed: in each case it is just that the pipes
are rusty. One important part of what differs from one packaging
to another is the distinction between new and old information. Taking
our first example, What's the matter with the pipes? Answer: They are
rusty, the answerer assumes, on the basis of the questioner's mention of
the pipes, that the pipes are already known to her: they are old
information. What is unknown to the questioner is the rust: this is the new
information.
In very simple sentences like these, the syntactic subject tends to
be old information; the syntactic predicate, new information. Even in a
monologue, speakers structure their sentences so that they use subjects
for old information and predicates for new information. Of course,
once information has been introduced as new, it immediately becomes
old. In the archetypal story where nothing is old information at the
very beginning, a special form is chosen that ensures that the only new
information is in predicate position: Once upon a time there was a cat.
The cat was a tabby. Whereas the cat is new information in
the first sentence, by the second sentence the cat is already
old (informationally). If you want to expose information packaging and
its power to control communication, take a simple newspaper story
and, starting at the end, rearrange the syntax of each sentence so
that it expresses the same proposition but the new/old information
packaging is changed. Then try reading the story. It usually becomes
profoundly incomprehensible. Another context in which we may notice
these structures is in the speech of young children, who may not be
completely adept at understanding the knowledge state of others, and
especially of strangers.
Even from this crude description of information packaging, it is
evident that information packaging is tailored to the expositional model
of communication. In derivation, since all assumptions are shared, all
information is technically old. In actual practice, matters may be more
complex, especially in a lengthy derivation, but there is a clear link
between the categories of information packaging and the expositional
model of communication. In exposition, Some A are B is not equivalent
to Some B are A. But we have to learn that in derivation these are
equivalent. This is our explanation of why some students refuse to
conclude Some B are A given the premiss that Some A are B. These
two sentences are not `conversationally' equivalent even though they are
logically equivalent, and these students haven't yet differentiated the
and required fewer interventions by the tutor when taught using Euler
diagrams than their peers who scored lower on this test. But when taught
the sentential method this effect was reversed: students scoring low on
the tests made fewer errors and required fewer teaching interventions
than their higher scoring peers.

The symmetry of these individual differences is particularly striking.
There are cases where higher test scorers show poorer performance with
a teaching method, and cases where lower test scorers show better
performance. This means that there is more here than merely the tendency
of any psychometric test to test `general aptitude' or, worse still, `general
test savvy'. These are real stylistic differences. Some methods suit some
people better than others.
None of our discussions here of the differences between graphical
and sentential semantics has so far indicated why one should get these
individual differences. Sometimes psychologists tell simplistic stories
that some people have images in their right brain and words in the left
brain, some people being more asymmetrical than others, and claim
that this explains why there are individual differences. We can already
show that matters are considerably more subtle than that. The students
in our study who do well with diagrams actually use less concrete
diagrams. What they are really good at is translating between diagrams
and sentences, and knowing when it is useful to do so. There may
well be differences in the brains of these different kinds of thinkers,
but the difference is not just one of preferences for representations.
And the differences may be quite easily changed by learning. At the
moment we just don't know very much, but we do know that any theory
of these differences is going to have to explain how these individual
differences interact with how diagrams and sentences differ in the way
their semantics works.
Different students have different ways of interpreting the syllogistic
premisses in the interpretation studies described earlier, and these
differences can be traced to differences in how they conceive of the
experimenter's intentions with regard to the language used: whether they
adopt an expositional or a derivational model. We have seen that
information packaging is tuned to operation in expositional language. For
students who adopt these expositional interpretations, we might expect
diagrams to be useful in learning to distinguish proposition from packaging
and conversational consequence from logical consequence. Other
Exercises

Exercise 18.1: Diagrams can be turned into texts, but texts can't always
be turned into diagrams. Illustrate this with a suitable example diagram
and a paragraph showing how it follows from the concreteness of
diagrams.

Exercise 18.3: Which of the two learning styles described here do you
think fits your own study methods most closely? Say why you think this
may be so.
18.5 Readings

Chandrasekaran, B. & J. Glasgow (eds.) Diagrammatic Reasoning:
Computational and Cognitive Perspectives on Problem Solving with
Diagrams. MIT Press.

Stenning, K. (2002) Seeing Reason: Language and Image in Learning to
Think. Oxford: Oxford University Press.
A vocabulary of symbols:
P, Q, R, . . .
&, ∨, . . .
(, ), . . .
P & Q is true just in case P is true and Q is true.
[Two node-and-arrow networks: in the lefthand network, arrows connect Pavel,
Jane, Harry and Sue directly; in the righthand network, the links between Jane,
Harry and Sue run through a node labelled vel.]
Graphical semantics is sometimes similar and sometimes different to this linguistic
semantics. In the lefthand network the spatial relation is directly interpreted and
has a uniform meaning. But in the righthand network the links between the logical
operator vel and the other nodes have a different semantic significance. So again, it
is an abstract syntax that is being interpreted in this second case.
Figure 18.5
Sentential and graphical semantics compared
[A two-by-two figure crossing the modality contrast (diagram vs. text, columns)
with the media contrast (visual vs. tactile, rows):
  visual medium:  visual diagram | visual text
  tactile medium: tactile diagram | Braille text]
Figure 18.6
The concepts of medium and modality applied to tactile and visual diagrams and to
printed and Braille texts
Figure 18.7
A `sentential' method based on the process of constructing an individual
description (ID) by conjoining terms for each predicate or its negation (`no valid
conclusion' is abbreviated NVC)
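Figure 18.7's method works by building an individual description: start from an individual whose existence an existential premiss guarantees, then conjoin each further predicate, or its negation, that the universal premisses force. The propagation step can be sketched in Python (a hypothetical rendering of the idea, not the book's exact procedure; the function name and encoding are mine):

```python
def refine(seed, universals):
    """Extend a partial individual description (a dict from predicate
    letters to True/False) with everything the universal premisses
    force: 'All X are Y' adds Y when X holds, and not-X when Y fails.
    (A fuller version would also flag contradictory descriptions.)"""
    d = dict(seed)
    changed = True
    while changed:
        changed = False
        for (x, y) in universals:                  # each 'All x are y'
            if d.get(x) is True and d.get(y) is not True:
                d[y] = True
                changed = True
            if d.get(y) is False and d.get(x) is not False:
                d[x] = False
                changed = True
    return d

# All A are B; Some C are not B -> seed individual: C and not-B.
print(refine({"C": True, "B": False}, [("A", "B")]))
# {'C': True, 'B': False, 'A': False} -- hence Some C are not A.
```

Reading the completed description off as a conjunction of signed terms gives the conclusion; if no existential premiss supplies a seed individual, the method returns NVC.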
[Three Euler diagrams: circle A inside circle B (All A are B); overlapping circles
A and B (Some A are B); disjoint circles A and B (No A are B).]
Figure 18.8
A primitive encoding of premisses in Euler's Circles
[The same three configurations, with crosses marking the non-empty subregions.]
Figure 18.9
An encoding of premisses in unique Euler's Circles by marking non-empty
subregions with crosses.
[Diagram residue omitted: the Euler diagrams for the two premisses, combined into
a single three-circle diagram with crosses marking the non-empty subregions.]
Figure 18.10
An example of the graphical algorithm applied to the syllogism All A are B, Some
C are not B.
Figure 18.11
Summary of a graphical algorithm for solving syllogisms using Euler's Circles
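The graphical algorithm gets its correctness from the underlying semantics: a conclusion follows just in case no arrangement of individuals satisfies the premisses while falsifying it. That semantic test can be sketched directly in Python (a brute-force check of my own, not the Euler's Circles procedure itself; three individuals suffice to find a countermodel for any classical syllogism):

```python
from itertools import product

def holds(q, xs, ys):
    """Evaluate one of the four syllogistic sentence forms."""
    if q == "all":
        return xs <= ys              # All X are Y
    if q == "some":
        return bool(xs & ys)         # Some X are Y
    if q == "no":
        return not (xs & ys)         # No X are Y
    if q == "some-not":
        return bool(xs - ys)         # Some X are not Y

def valid(premisses, conclusion, size=3):
    """Valid iff no small model satisfies the premisses while
    falsifying the conclusion."""
    universe = range(size)
    for bits in product([0, 1], repeat=3 * size):
        ext = {name: {i for i in universe if bits[k * size + i]}
               for k, name in enumerate("ABC")}
        sat = lambda s: holds(s[0], ext[s[1]], ext[s[2]])
        if all(sat(p) for p in premisses) and not sat(conclusion):
            return False
    return True

# Figure 18.10's example: All A are B; Some C are not B.
print(valid([("all", "A", "B"), ("some-not", "C", "B")],
            ("some-not", "C", "A")))   # prints True
```

The Euler algorithm reaches the same verdicts without enumeration, by exploiting the way the circle topology itself rules out the impossible arrangements.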
[Diagram residue omitted: the same syllogism represented first in Euler's Circles
with crosses and then in the Venn system.]
Figure 18.12
An example syllogism solved by Euler and Venn systems
[Bar chart comparing GRE-low and GRE-high students on the EC and ND teaching
methods; the vertical axis runs from 0 to 7 errors.]
Figure 18.13
Number of reasoning errors at manipulation phase by GRE score and by teaching
method.
[Bar chart comparing GRE-low and GRE-high students on the EC and ND teaching
methods; the vertical axis, mean number of interventions, runs from 0 to 35.]
Figure 18.14
Number of tutor interventions at manipulation phase by GRE score and by
teaching method.
principle tell whether the room we are dealing with by input and output
is the computational simulation (based on Searle's rule rewriting), or
merely contains a Chinese-reading human being.

But, goes the argument, he, Searle, does not understand Chinese
characters because he does not know any Chinese. All he can do is
recognise formal identities of ideograms. Since the process is computational,
all processing is done with respect only to the form of the symbols on
the paper which arrives through the chute, and to the form of the rules
in the look-up table. He, Searle, is (part of) the system, but he has no
understanding of the meaning of any of the processes that go on in it, so
the room may be an implementation of question answering, but it
simulates none of the experiences of a human Chinese question answerer.
It has no consciousness beyond Searle's completely insulated American
one. So no computational simulation can have consciousness. All
computational simulations (even robots which can move around and act in the
world) are zombies: individuals which can perfectly simulate all human
outward behaviour, but have no feelings or mental experiences within.
Searle has greatly amplified this argument in subsequent papers, and
there is an industry of commentary on it. In fact, in the early paper
little emphasis is placed on the failure to recreate conscious experience
in a perfect computational simulation. But it is clear from the way
the argument develops that this intuition of computers as zombies is
close to the heart of the philosophical intuition that computational
approaches cannot embrace conscious experience. No one, least of all
cognitive scientists, disputes that this is a powerful intuition. But up until
a century ago it was a close to universal intuition that human beings
could not possibly have evolved by blind chance processes, or that life
could not be the consequence of biochemicals. Now the contrary intuitions
are held strongly by practically all scientists. Intuitions are important, but
they are highly changeable and not enough.
Interestingly, Searle is not necessarily opposed to a materialist view
of human beings. He is prepared to accept that he may be nothing more
than a collection of cells made up of nothing more than a collection
of molecules all assembled just thus and so. But he strongly believes
that the particularities of his chemical implementation are absolutely
constitutive of his human mental life. Silicon and electricity just can't
replace the exact jelly which we are without leading to a fundamental
lack of conscious experience. Even if some mad cognitive scientist (or
human brain. But after some research, we find that our Mark I android
doesn't run well at all. We find that what the implementation doesn't
implement is the variable length of the axon connections. Unlike slow
electro-chemical signals, which travel at speeds that are functions of
the axon diameter, electricity has a nearly constant, much higher
speed. So Mark II is equipped with delay lines between synapse and
synapse which produce electrical delays exactly mimicking the delays in
the dendrites and axons. Mark II runs a bit better but still does not
report the right feelings at all, and her qualia are just deplorably pale.
The problem is traced to the fact that there are several ion channels
in the synapse which control the refractory periods of the neuron after
firing (don't worry if this is neuroscience mumbo jumbo; the details
aren't what's important, only the concept of gradual approximation
of the implementation). It's not just a single signal, but a set of parallel
signals, each with slightly different delay characteristics. The functioning
of the mind depends on these waves of signal remaining in reliable
phase relations. Having fixed this glitch, Mark IV begins to report
the normal roughness shortly after getting up, but qualia reports are
severely distorted. The problem turns out to be due to the fact that human
neurons learn: they change their resistance to signals as a function of
the temporal association of signals on their input synapses. Once this
plasticity of resistance is achieved by 8th-generation gallium arsenide
technology, Mark V really is beginning to look more promising: some
qualia reports sound just like the qualia reports in his philosophy tutor's
papers.
And so on. Laying aside facetiousness for the moment, let us suppose
that we can successively approximate the information processing of the
neuron in our semi-conductor technologies. Remember, this is an argument
in principle, not an estimation of the prudence of investing your
life savings in some new technology. I want to make two points. First,
what could we look to, to explain remaining divergences between human
and Robot Mark N, other than the information processing capacities of
the neuron replacements? Of course we might turn out to be wrong that
the brain is really what plays the main role in controlling behaviour:
maybe it's really the fluid computer in the kidneys. And certainly it
will be the embodied brain that we have to account for, rather than the
version in the vat. But short of such revelations, controlling behaviour
just is computation (implemented within some important constraints of
time), and so if Mark N isn't quite right, then we must have got some
aspect of the signal processing of neurons wrong. Of course, we could
also be wrong about the physical implementation, and it could all turn
out to depend on quantum phenomena in the mitochondria (we'll come
back to this when we discuss Penrose later). But then we will just have
to go back to the bench and replace that information processing with
some other brand.

Of course, it might turn out that when we have pursued the
development of neuron replacement therapy for many years, and we have
a thorough understanding of what information processing needs to be
replaced, we might actually be able to show that the current physical
implementation of the human computer is the only possible one: it
really does have to be the exact jelly. But if we ever reached that point,
then we would have a theory which explained exactly what kind of
computation was required, and then proved that only one physical device
could implement it. This would surely be an ultimate victory for the
computational paradigm, though a peculiarly strong and rather unlikely one.
The second point to be made about this thought experiment is
also about specification, but this time not the specification of
what computations have to be performed by neurons, but rather the
specification of what behaviour Mark N must perform to pass the test of
implementing human consciousness. Critics of strong AI like Searle tend
to vastly underestimate the subtlety of the specification of what has to be
simulated in human behaviour to pass as a perfect simulation. They talk
about it as if outward behaviour could be specified in some specification
of what our robot must be able to do, without specifying anything
about the coherence of whole patterns of behaviour which express inward
feelings. They perhaps forget that verbal reports of inner experiences,
as well as non-verbal expressions of feeling (like dancing), are overt
behaviour and will therefore have to be simulated in exact coherence
with all other behaviour. The argument always runs as if the specification
comes complete, and at the beginning of the research program. But of
course it does not. Each successive Mark N may fail some behavioural
test which we had not even thought of when we started out, because in
carrying out the program of research we would learn an immense amount
about what does allow people to detect zombies, and the mistakes they
would undoubtedly often make. Any differences in internal experience
between human and zombie would have to be in-principle inexpressible to
or experience in the first place: we don't have the specification.
Studying the specification and the implementation and how they interact is
the only real program of research on offer.

If this book has generated a little wonder at the extent of our
ignorance about what people do, and given some idea about some current
methods of analysing and understanding the cognitive phenomena of
communication, then it will already have succeeded in its most important
aim.
19.2 References

Searle, J. R. (1980) Minds, brains, and programs. Behavioral and Brain
Sciences, 3, 417–424.

Penrose, R. (1989) The Emperor's New Mind: Concerning Computers,
Minds, and the Laws of Physics. Oxford: Oxford University Press.
VII APPENDICES
Appendix A: Bibliographical notes and further reading
Appendix B: Consolidated grammars
This appendix shows all of the grammar rules used in this book.

B.2 Grammar 1

This grammar is used in Part III of this book.

Translation rules
An NP dominating PN with name becomes: a referent x with the
condition name(x). Reuse a referent, if you can. Otherwise introduce
the referent at the top of the box.

An NP dominating PRO with a pronoun becomes: a referent x. You
must reuse a referent.
A VP dominating V0 with name, applied to the subject referent z,
becomes the condition name(z).

A VP dominating V1 with name, with subject referent x and object
referent w, becomes the condition name(x, w).
An NP dominating Det a(n) and N with name becomes: a new referent
x with the condition name(x).

An NP dominating Det the and N with name becomes: a referent x with
the condition name(x). Reuse a referent, if you can. Otherwise introduce
the referent at the top of the box.
A sentence S dominating an NP (with Det every and N name) and a VP
becomes: a conditional structure of two boxes, [x : name(x)] ⇒ [x VP].

A sentence of the form if S1 S2 becomes: the boxes for S1 and S2 joined
by ⇒.
B.3 Grammar 2

This grammar is used in Part IV of this book. The translation rules are
shown in Figures B.12 to B.20.

The main differences are the instruction box and the use of features
to indicate number and gender information.
Syntactic rules

NP → PRO
NP → PN
NP → Det N
VP → V0
VP → V1 NP

Det → a, an, every, the, . . .
N[num: sing, gender: neut] → bird, stick, dog, . . .
N[num: sing, gender: fem] → lawyer, dog, cat, girl, woman, . . .
N[num: sing, gender: masc] → lawyer, dog, cat, boy, man, . . .
PRO[num: sing, gender: neut] → it
PRO[num: sing, gender: fem] → she, her
PRO[num: sing, gender: masc] → he, him
PN[num: sing, gender: fem] → Etta, Nina, . . .
PN[num: sing, gender: masc] → Pip, Mac, . . .
An NP dominating PN[number: num, gender: gen] with name becomes:
a referent x, together with an instruction box marked `?' containing the
conditions num(x), gen(x) and name(x).
An NP dominating PRO[number: num, gender: gen] with a pronoun
becomes: an instruction box containing the conditions num(x), gen(x)
and the instruction x = ?.
A VP dominating V0 with name, applied to the subject referent z,
becomes the condition name(z).

A VP dominating V1 with name, with subject referent x and object
referent w, becomes the condition name(x, w).

An NP dominating Det a(n) and N[number: num, gender: gen] with
name becomes: a new referent x with the conditions name(x), num(x)
and gen(x).
An NP dominating Det the and N[number: num, gender: gen] with
name becomes: a referent x, together with an instruction box marked `?'
containing the conditions name(x), num(x) and gen(x).
A sentence S dominating an NP (with Det every and N[number: num,
gender: gen] name) and a VP becomes: a conditional structure of two
boxes, [x : name(x), num(x), gen(x)] ⇒ [x VP].
A sentence of the form if S1 S2 becomes: the boxes for S1 and S2 joined
by ⇒.
Appendix C: Glossary
Jargon is always a problem, and more than ever so when we deal with
several disciplines. We try to minimise it, but some is unavoidable, and
learning the jargon is sometimes inseparable from learning what it's all
about. A piece of general advice: try and see if you can extract from the
context which meaning of a word is intended. Check the index for other
occurrences of the same term and see whether your understanding of the
term makes sense.

Below, a word or phrase in bold face within a definition refers to
another glossary entry. We have provided definitions only where we think
them useful. We have omitted definitions in some cases where the first
index entry refers to a page where a definition of the term appears.
Various definitions below have been adapted from other sources.
G
Galileo . . . 13
gambler's fallacy . . . 79
game theory . . . 104
garden path sentences . . . 303
gender  Gender is a linguistic property given to nouns, proper names
and pronouns. They can be male, female or neutral. . . . 321
generalisation . . . 480
generate  To produce (perhaps automatically) a sequence of words using
a set of syntactic rules. . . . 213
given . . . 372
Gödel . . . 143
"good English" . . . 179
Graduate Record Examination . . . 487
grammar  A grammar is a system which relates form and meaning.
. . . 184, 219, 326
grammars . . . 197
grammaticality  A judgement as to whether a sentence, of English say,
adheres to or violates the rules of language that we as speakers know.
. . . 184
graphical semantics . . . 455
graphics . . . 454
grounding . . . 507

H
halt . . . 145
Halting Problem . . . 144, 513
head  The `most important' element in a constituent; e.g. in a noun
phrase, the noun is the head, similarly for verb phrases and verbs.
. . . 144, 257
Heim . . . 203
Hilbert . . . 143
horoscopes . . . 87
human computer interaction . . . 139, 142
hybrid  Involving knowledge both of a formal, symbolic kind and of a
statistical kind. . . . 297

I
idealisation . . . 13
ideational  To do with the communication of propositions rather than
with the reinforcement of social groupings. See also phatic. . . . 3, 6, 8
idiomatic  A phrase is idiomatic if it has an established meaning that is
quite different from its literal meaning. For example, kick the bucket is
an idiomatic phrase, because it has the established meaning
W
Wason . . . 28, 47
Weaver . . . 9, 20
well-formed . . . 120
word  In a computing context (i.e. the organization of computer data),
a `word' is the next level of organization up from a bit. Computers
commonly use `8-bit words'. In linguistics, the smallest unit of linguistic
structure with meaning. . . . 221
words . . . 21
world knowledge  World knowl-
Index