M Thesis
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF INFORMATION ENGINEERING
OF THE TOKYO INSTITUTE OF TECHNOLOGY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
MASTER OF ENGINEERING
Timothy Baldwin
February 1998
Contents
1 Introduction
1.1 Objectives and outline
1.1.1 Statement of purpose of this research
1.1.2 Methodological outline
1.1.3 Applications of this research
1.2 A basic model of Japanese syntax
1.3 Definitions
1.3.1 Coordination, cosubordination and subordination
1.3.2 Displaceability
1.4 Thesis overview
2 Background
2.1 The structure of Japanese relative clauses
2.2 Past research
2.2.1 Descriptive accounts
2.2.2 Relativisation and thematisation
2.3 Relative clause type definitions
2.3.1 Case-role gapping relative clauses
2.3.2 Head restrictive relative clauses
2.3.3 Full clause-based idioms
2.4 Distribution of the relative clause types
2.5 The full relative clause type hierarchy
5 Miscellaneous processing
5.1 Non-gapping expressions
5.1.1 The extraction of non-gapping expressions
5.2 Time-related adjuncts
5.2.1 Temporal masking
5.2.2 Time relative constructions
5.2.3 Temporal expressions
5.2.4 Temporal vs. Durational interpretations
5.3 Cardinal adjuncts
5.4 The default rule set
6 Lexical ambiguity
6.1 Verb lexical ambiguity
6.2 Resolving verb lexical ambiguity
6.2.1 Calculation of verb scores
6.2.2 Complexity of inflectional content
6.2.3 Evaluation of verb scoring
6.3 Noun head lexical ambiguity
8 Evaluation
8.1 Evaluation criteria
8.1.1 Baseline evaluation
8.2 Overall evaluation
9 Conclusions
9.0.1 Future research
List of Figures
4.1 The full verb class hierarchy (original verb classes indicated in bold, partitioning nodes capitalised)
Acknowledgements
First and foremost, I would like to express my heartfelt gratitude to Prof. Hozumi Tanaka for
supervising me over the past two years, giving me the opportunity to come to Japan, and making
every step of the journey to reach this point smooth and enjoyable. At the same time, Assoc. Prof.
Takenobu Tokunaga has generously given of his time to help me take my first shaky steps towards
becoming a genuine researcher. Thanks go also to the long-suffering residents of Room 807 and all
other members of the Tanaka and Tokunaga laboratories, for giving opinions on reams of
example sentences, getting me and my computer back on speaking terms on numerous occasions, and
doing it all with a genuine smile all the way.
I doubt that this research would have progressed anywhere near the stage it has, without the
assistance of the NTT Machine Translation group in providing access to their unsurpassed range of
resources. In particular, I cannot thank Francis Bond enough for patiently reading papers in varying
stages, sharing his work, and generally being a good friend throughout the past two years. I look
forward to getting back to my definiteness roots with Francis now that I have this little fellow out
of the way!
And most importantly, I would like to thank family and friends for being patient and understanding
while I learnt how to do things the hard way. I would like to say that I will devote more of myself, my
time, and my energy to being a better husband, brother, son, and friend from now on, but realistically
know that I am only going to turn around and plunge myself back into the cesspool of relative clauses
and put you all through more of the same!
Abbreviations
abl Ablative
acc Accusative
all Allative
cause Causative
com Comitative
dat Dative
dim Familial Diminutive
gen Genitive
loc Locative
mutual Mutual
neg Negative
-nml Nominaliser
nom Nominative
pass Passive
past Past tense
pres Non-past tense
prog Progressive
qp Question Particle
quot Quotative
ren Renyo
top Topic
while While (nagara)
Chapter 1
Introduction
for the core case-role set. To take an English example, The party was held at David's house contains
a Subject and a Locative, in the form of the party and at David's house, respectively. Other examples
of case-roles are Direct Object, Co-actor, Perlative, and Instrument.
Japanese marks each argument (case filler) with morphological case, in the form of a phrase-final
case marker. Thus, in the Japanese equivalent for the above English example, pati-ha Debiddo no ie-de
okonawareta (party-top David gen house-loc was held), the Subject pati is marked with the topic
marker and the Locative Debiddo no ie is marked with the locative case marker. These case-role/case
marker tuples make up the content of each case slot.
That Debiddo no ie both performs the Locative case-role and is marked in the locative case is not
entirely incidental, and peripheral case-roles commonly coincide with their canonical case marking
type (another example of this is the Comitative case-role and comitative case marking), but that is
not to say that this is either a necessary or sufficient condition. The case-role schema and case marking
patterns should thus be considered as orthogonal issues, and the reader should bear in mind that in
recovering the case-role of a given case slot, word order and the semantic content of the case filler also
play integral roles.3
It is also true that a number of the proposed case-roles are associated with a unique case slot
(in particular the Durational and Instrument case-roles), and that the core roles have a default case
marker, but in no sense does the reverse apply, due to the conflation of case marking. To take
an extreme case, the dative case marker (ni) can mark almost any case-role and provides minimal
indication of the particular case-role of that case slot. Case marking variation of this kind is a further
reason to treat case-roles and case marking as separate issues.
The valency frame for a given verb and verb sense is made up of individual case slots, with the
scope of case marking for each slot often being plural. We will refer to the case marking paradigm for
all case slots, considered in isolation of case-role correspondence, as the case frame. Case frames can
thus be derived trivially from valency frames by discarding case-role information.
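The derivation of a case frame from a valency frame can be sketched as follows. This is a minimal illustration only: the representation of a case slot as a (case-role, case marker set) pair, the names `CaseSlot`, `case_frame` and `NARU_FRAME`, and the use of gloss labels as marker names are our own assumptions, not a description of the system implemented in this research.

```python
from typing import FrozenSet, List, Tuple

# Assumed representation: a case slot pairs one case-role with the set of
# case markers that may mark it (the marking scope is often plural).
CaseSlot = Tuple[str, FrozenSet[str]]


def case_frame(valency_frame: List[CaseSlot]) -> List[FrozenSet[str]]:
    """Derive the case frame by discarding the case-role of every slot."""
    return [markers for _role, markers in valency_frame]


# Illustrative frame for nar(-u): Subject-nom, Direct Object-dat.
NARU_FRAME: List[CaseSlot] = [
    ("Subject", frozenset({"nom"})),
    ("Direct Object", frozenset({"dat"})),
]

print(case_frame(NARU_FRAME))
```

The reverse mapping is not available in the same way: a marker set such as {dat} alone does not determine the case-role, which is the point made above about the conflation of case marking.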
1.3 Definitions
1.3.1 Coordination, cosubordination and subordination
Clausal relations provide a valuable mechanism when analysing complex relative clauses, and are
discussed variously throughout this thesis. In describing clausal relations, we apply a parameterised
trichotomy comprising the subordination, cosubordination and coordination types (Foley and
Van Valin 1984; Van Valin 1984).
Two parameters are used to differentiate these three types: dependence and embedding. Depen-
dence is form-based, and relates to whether the clause in question is syntactically dependent on
surrounding clauses (either for operators or distributionally), or alternatively can stand alone as a
complete sentence. Embedding, on the other hand, describes whether the clause is encapsulated
within/functions as part of another clause, or is complete and distinct (Van Valin 1984:542); herein,
embedded clauses will be used to refer to quoted clauses marked with the quotative case marker.
3 See Blake (1994:13-8) for a discussion of this process in the context of a variety of language types.
Combining these two independent parameters, we in fact produce a four-way taxonomy, of which
only the three classes given above are relevant for our purposes.
To clarify this distinction, [David ran] [and Peter rode his bike] is an instance of clause coordination,
[David ran] [because Peter was riding his bike] is an instance of clause cosubordination (distributional
dependence), and [David ran to allow Peter [to ride his bike]] is an instance of clause subordination.
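The two-parameter classification can be sketched as a small lookup. The assignment of parameter values to the three types is not spelled out in the text above; the version below follows the usual reading of Foley and Van Valin's scheme and should be taken as an assumption, as should the function name `clause_linkage`.

```python
from typing import Optional


def clause_linkage(dependent: bool, embedded: bool) -> Optional[str]:
    """Classify a clause relation from the dependence and embedding parameters.

    Assumed mapping (not stated explicitly in the text):
      -dependent, -embedded -> coordination
      +dependent, -embedded -> cosubordination
      +dependent, +embedded -> subordination
    The fourth combination is not used in this thesis and returns None.
    """
    if dependent and embedded:
        return "subordination"
    if dependent:
        return "cosubordination"
    if not embedded:
        return "coordination"
    return None


for dep in (False, True):
    for emb in (False, True):
        print(dep, emb, clause_linkage(dep, emb))
```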
1.3.2 Displaceability
Displaceability is a statement of the potential for a given case slot to be case-role gapped from a
relative clause context, a notion original to this research. We wish to claim that it is possible to make
an a priori judgement on the displaceability of a given case slot, independent of case filler type, and
that if provided with a case filler which can be expressed in a displaceable case slot for a matrix clause
instance, that case filler can also be gapped to become the noun head of a relative clause.
In the case of the verb nar(-u) 'to become', for example, and the Subject-nom Direct.Object-dat
valency frame, the Subject case slot is displaceable, but the Direct Object is not:
(1) a. syobosi-ni natta kare
fireman-dat became he
(lit.) he, who became a fireman
b. * kare-ga natta syobosi
he-nom became fireman
the fireman he became
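The contrast in (1) can be sketched by recording displaceability as a property of each case slot. The `Slot` class, the `gappable_roles` function and the encoding of the frame are hypothetical names introduced for illustration; only the displaceability values for nar(-u) are taken from the discussion above.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Slot:
    role: str           # case-role of the slot
    marker: str         # case marker (gloss label)
    displaceable: bool  # a priori judgement, independent of the case filler


# nar(-u) 'to become': the Subject is displaceable, the Direct Object is not.
NARU_FRAME = [
    Slot("Subject", "nom", True),
    Slot("Direct Object", "dat", False),
]


def gappable_roles(frame: List[Slot], filled_markers: Set[str]) -> List[str]:
    """Case-roles that the noun head could have been gapped from."""
    return [
        slot.role
        for slot in frame
        if slot.displaceable and slot.marker not in filled_markers
    ]


# (1a) syobosi-ni natta kare: the dative slot is filled clause-internally.
print(gappable_roles(NARU_FRAME, {"dat"}))
# (1b) kare-ga natta syobosi: only the non-displaceable slot is left unfilled.
print(gappable_roles(NARU_FRAME, {"nom"}))
```

Under these assumptions (1a) receives a Subject gapping reading, while (1b) receives none, in line with its unacceptability.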
Chapter 2
Background
a. Formal nouns/postadnominals (Martin 1975:664-740) such as noti 'after', tame 'for (the purpose of)'
and bai 'case/circumstance'. These act as discourse/clause-level markers and unambiguously
collocate with a clause body, deictic marker or noun specifier.
b. Relational verb stems2, as taken in the Hallidayan sense (see Halliday (1994:119-38)). This includes
constructs of the type to-iu 'called', ni-kansuru 'concerning' and ni-taisuru 'against/regarding'.
It is important to note here that our definition of relative clause for Japanese includes both NP
complexes that involve case-role gapping (an adaptation of the traditional definition of relative clauses),
and those for which the relative clause body simply restricts/exemplifies the noun head.3 That is,
our use of the term 'relative clause' corresponds to Matsumoto's 'noun-modifying construction'
(1996), and differs significantly from the restricted Transformational Grammar sense of the word,
which corresponds to the precept of case-role gapping relative clausehood in our framework.
Admittedly, this appears to go beyond the bounds of the standard sense of relative clausehood, but
the terminology is intended to reflect the syntactic parallelism that exists between these two relative
1 There is limited scope to apply the methods described here to the processing of relative clause-type constructions
produced with these operators, although this is left as a matter for future research.
2 Referred to as 'phrasal postpositions' by Martin (1975).
3 Kameyama (1995), likewise, uses the term 'relative clause' in this wider sense.
[Figure: tree diagram for manzoku-sita yuza, with an NP node dominating the S manzoku-sita and the NP yuza]
clause types. Additionally, while continual reference will be made to the notion of gapping, this is
intended in a semantic case-role based sense, and we wish to distance ourselves from the research on
syntactic gapping and movement that exists in the Chomskyan literature (Nakau 1971; Okutsu 1974;
Inoue 1976; Huddleston 1976; Shibatani 1978; Radford 1981). Clearly, similarities exist between the
Chomskyan treatment of gapping and its correlation to deep structure transformation, but, for our
purposes, we view relative clause construal as a 'focusing' or 'aboutness' (Kuno 1976; Saito 1985;
Kuno 1987) process rather than movement.
Traditional descriptions of Japanese relative clauses have divided them into two main disjunctive
categories, based on whether the noun head can be reinserted into the relative clause body to compose a
matrix sentence (Martin 1975; Teramura 1975-78). Perhaps the most famous such account is that
proposed by Teramura (1970, 1975-78, 1978, 1980, 1981), in which he describes the semantic relationship
between the relative clause body and noun head as being either of the uchi no kankei 'inner relationship'
type or the soto no kankei 'outer relationship' type, corresponding to the sentence-insertable and head
noun content-supplementing sense designations, respectively. Following this typology, [ manzoku-sita
] gakusei 'a satisfied student' is an inner relationship relative clause, and [ manzoku-sita ] wake 'a
reason for feeling satisfied' is an outer relationship relative clause.
On close observation, Teramura's schema attempts to classify inner relationship relative clauses
as being syntactically-defined and outer relationship relative clauses as being semantically-defined, a
point which is convincingly refuted by Matsumoto (1997) in citing cases where pragmatics influence
interpretation of inner relationship relative clauses, such as:
Matsumoto's frame semantic approach seems to provide valuable discriminative power to the
subdivision of Teramura's outer relationship relative clauses, both in terms of the source of the framing
and additionally through the sub-type of the framing process. It lacks credibility, however, in the
semantic tenability of determining the relevant frame, identifying the available slots, and analysing
compatibility between target arguments and the candidate slot set. Indeed, this first step of frame
determination seems to be problematic even in a cognitive context for non-situational frames. For the
well-defined experiential example of eating, such as that described by Matsumoto for the predicate
tabe(-ru) (Matsumoto 1997:61-63), the roles and frame are easily recoverable, but for a more abstract
predicate such as 'falling value', the location of an appropriate frame and role set would appear
more confused and the scope of role matching less apparent. Additionally, Matsumoto's proposed
methodology of evaluating case-role compatibility is not able to account for cases of bounded gapping.
Perhaps more serious, however, is that the discriminative nature of Matsumoto's expanded set of
relative clause types produces its own demarcation problems. That is, as compared to the claimed
difficulty in accounting for truncated relative clauses within Teramura's schema, Matsumoto's framework
has inherent difficulties in delineating nominal framing (NH-type) and mutual predicate/nominal
framing (CNH-type) relative clauses, and likewise nominal/predicate framing and predicate framing
(CH-type) relative clauses. To take an example from Matsumoto (1997:159), nioi in (6) is suggested
as both participating in the frame evoked by the modifying clause (predicate framing) and evoking its
own relational frame to describe the cause or source of the smell (nominal framing). As such, (6) is
described as being of the CNH-type.
(6) [ sakana-o yaku ] nioi
fish-acc grill smell
the smell of grilling fish
This raises the question as to the status of (7) and (8).
(7) [ sakana-o yaku ] kemuri
fish-acc grill smoke
smoke from grilling fish
That is, it is not possible to have an interpretation of the type 'the person (who) introduced (self) ( )'
or 'the person (who) introduced (self) (to) (self)' without overt reflexive pronoun instances within
the relative clause body. This leads to mutual exclusivity of case-role gapping between interpretations
(12a), (12b) and (12c).
Even in the case of the personal reflexive pronoun zibun occupying a case slot within the relative
clause, coindexing occurs clause-internally through case-role gaps/zero pronominal case-roles, rather
than directly with the noun head.
(13) [ ( )_i zibun_i-o syokai-sita ] hito
Sbj self-acc introduced person
a. the person_i (who) introduced him/herself_i
b. the person to whom ( )_i introduced him/herself_i
For (13), this correlates to zibun being coindexed with the Subject case-role, which in turn corresponds
to either the case-role gap (cf. (13a)) or some other independent, zero-pronominal discourse participant
(cf. (13b)).
One feature of case-role gapping relative clauses is that whereas the case-role gap is defined uniquely
for a given interpretation, the identity of the case slot from which gapping has occurred is not marked
either as a trace within the relative clause, or as a relative pronoun-type marker. Moreover, there
appear to be few restrictions on case-role positions from which gapping can occur, and when restrictions
are found, they tend to be localised to that case-role in the given valency frame. Indeed, the main
source of restriction is semantic, and derives from local sortal preferences defined through case
frames.5
Returning to (12) above, any case-role ambiguity observable in the relative clause complex is removed
in the matrix clause counterparts corresponding to the respective interpretations:
(14) a. (sono) hito-ga syokai-sita
that person-nom introduced
that person introduced ( )
b. (sono) hito-o syokai-sita
that person-acc introduced
( ) introduced that person
c. (sono) hito-ni syokai-sita
that person-dat introduced
( ) introduced ( ) (to) that person
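The correspondence between gapping interpretations and their matrix clause counterparts can be sketched as below. The frame encoding, the function name `gap_interpretations` and the label Indirect Object for the dative slot are assumptions made for illustration; the sketch also ignores displaceability and sortal preferences, which would further filter the candidates.

```python
from typing import Dict, List, Set, Tuple

# Assumed valency frame for syokai-suru 'to introduce': case-role -> case marker.
SYOKAI_FRAME: Dict[str, str] = {
    "Subject": "ga",
    "Direct Object": "o",
    "Indirect Object": "ni",
}


def gap_interpretations(
    frame: Dict[str, str], filled_markers: Set[str], head: str, verb: str
) -> List[Tuple[str, str]]:
    """Pair each unfilled case-role with its matrix clause counterpart,
    in which the noun head is reinserted with that slot's case marker."""
    return [
        (role, f"{head}-{marker} {verb}")
        for role, marker in frame.items()
        if marker not in filled_markers
    ]


# No overt argument in the relative clause body: three candidate gaps,
# each disambiguated by the case marking of its matrix counterpart (cf. (14)).
for role, matrix in gap_interpretations(SYOKAI_FRAME, set(), "hito", "syokai-sita"):
    print(role, "->", matrix)

# zibun-o fills the accusative slot (cf. (13)): two candidates remain.
print(gap_interpretations(SYOKAI_FRAME, {"o"}, "hito", "syokai-sita"))
```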
(24) [ au ] kikkake
meets chance
a chance to meet ( )
7 The majority of the sentences contained in the EDR corpus are from newspaper articles, with lesser numbers
of sentences from scientific texts and other miscellaneous sources.
[Pie chart: Case-role gapping 84%; Head restrictive 14%; Bound 1%; Full-clause based idioms 1%]
Figure 2.2: The relative distributions of the proposed Japanese relative clause types
Figure 2.3: The relative clause type hierarchy and associated sub-classifications
Chapter 3
Valency, Argument Types and Case
In describing case-role gapping, we clearly require a set of case-roles powerful and wide-ranging enough
to label all case slots in all valency frames. Additionally, in order to uniquely label each case slot
contained in a given valency frame, the granularity of the case-role designation must be fine enough
to account for semantic differences between case slots.
As an orthogonal issue, we introduce the concept of argument status, an expansion of the conven-
tional complement/adjunct distinction. Argument status is used to predict argument obligatoriness,
invaluable in distinguishing between zero pronominal and unrealised case slots, to introduce preferences
between valency frames according to argument content, to gauge and generalise semantic consistency
of usage, and to weight case-role gapping interpretations.
This final concern relates to which case-roles are preferred in a semantically neutral relative clause
context of the type:
(1) [ taberu ] X
eat
That is, assuming the semantics of the noun head X are inaccessible, what case-role interpretation
(ignoring head restrictive relative clause interpretations for the time being) would be most likely?
Inevitably, (obligatory) complements are preferred over optional complements, with the particular
ranking of preference of complement case-roles often relating back to topicality/accessibility hierarchies
(Keenan and Comrie 1977; Inoue 1976) such that, in the case of (1), the Subject case-role would be
preferred over the Direct Object, followed by the Locative case-role, and possibly the Instrumental
case-role. At the same time, however, the scope of case-roles available in a given clausal context is
clearly constrained by the predicate valency frame, and it would not be possible to have an Indirect
Object gapping interpretation, for example.
This illustrates the lowest level preference we can draw upon, defined by case-role immediacy and
availability.
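This lowest level preference can be sketched as a fixed ranking filtered by the valency frame. The ranking list below simply transcribes the order given for (1); the set of roles assumed to be available for taberu and the names `DEFAULT_RANKING` and `rank_gap_candidates` are illustrative assumptions.

```python
from typing import List, Set

# Default ranking for a semantically neutral head, as described for (1).
DEFAULT_RANKING: List[str] = ["Subject", "Direct Object", "Locative", "Instrumental"]

# Assumed set of case-roles available to taberu 'eat' (illustration only).
TABERU_ROLES: Set[str] = {"Subject", "Direct Object", "Locative", "Instrumental"}


def rank_gap_candidates(available_roles: Set[str]) -> List[str]:
    """Order the case-roles licensed by the valency frame by default preference.

    Roles absent from the frame are never proposed, whatever their rank;
    roles outside the default ranking are ignored in this simplified sketch.
    """
    return [role for role in DEFAULT_RANKING if role in available_roles]


print(rank_gap_candidates(TABERU_ROLES))
```

Since an Indirect Object is not part of the assumed frame, it never appears among the candidates, mirroring the constraint imposed by the predicate valency frame.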
Working against any such default ranking are affinities for particular case-roles to take certain
argument types in unmarked usages, and the potential for more specialised preferences for each case-
role when in the context of a given valency frame and predicate sense. This first issue can be seen
with the converse form of (1), in which the predicate is unspecified for a given lexical head:
(2) [ PRED ] basyo
place
Here, the most likely case-role mapping would be onto the Locative case slot, assuming locative
adjunct compatibility for the predicate. Failing any particular semantic correspondence to a predicate-
independent case-role type, however, default preferences of the type seen above would apply. This
exemplifies the opposite end of the scale of argument status, and adjuncy.
A third factor in this process is idiomatic usage, which overrides absolutely any local preferences
arising from case-role immediacy and adjuncy. For example, for the predicate utu 'to strike' with
an unmarked head X, the most accessible case-role would be the Subject, and equally for the noun
head denpo 'telegram' with an unmarked predicate PRED, adjuncy would not have any effect, but
the moment these two are combined as [ utu ] denpo 'the telegram ( ) sent', the Direct Object
case-role takes absolute preference over all others. Equally, an unmarked instance of
the predicate a(-u) 'to coincide' would subsume a Co-actor case-role, but when collocating with the
argument keisan-ga 'calculation-nom', the valency frame is reduced to a single topic position.
This chapter discusses an argument status hierarchy for use in predicting the syntactic interplay
between arguments and predicates, and a case-role schema for documenting the semantic aspect of
argument-predicate linkage.
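The interaction of the three factors introduced above (idiomatic usage, head noun affinity and default case-role immediacy) can be sketched as a layered lookup. The tables hold only the examples mentioned in this chapter, and all names (`IDIOMS`, `HEAD_AFFINITY`, `preferred_role`, `ROLES`) and the strict ordering of the layers are simplifying assumptions rather than the procedure actually implemented.

```python
from typing import Dict, List, Optional, Set, Tuple

# Idiomatic predicate/head collocations override every other preference.
IDIOMS: Dict[Tuple[str, str], str] = {("utu", "denpo"): "Direct Object"}
# Head nouns with an affinity for a predicate-independent case-role.
HEAD_AFFINITY: Dict[str, str] = {"basyo": "Locative"}
# Default preference by case-role immediacy.
DEFAULT_RANKING: List[str] = ["Subject", "Direct Object", "Locative", "Instrumental"]

# Assumed set of available case-roles, for illustration only.
ROLES: Set[str] = {"Subject", "Direct Object", "Locative"}


def preferred_role(
    verb: Optional[str], head: Optional[str], available_roles: Set[str]
) -> Optional[str]:
    """Return the preferred gapped case-role; None stands for an unmarked
    predicate (PRED) or an unmarked head (X)."""
    if verb is not None and head is not None:
        idiom = IDIOMS.get((verb, head))
        if idiom is not None:
            return idiom                    # idiomatic usage wins outright
    affinity = HEAD_AFFINITY.get(head) if head is not None else None
    if affinity in available_roles:
        return affinity                     # head noun affinity
    for role in DEFAULT_RANKING:            # default immediacy ranking
        if role in available_roles:
            return role
    return None


print(preferred_role("utu", None, ROLES))     # [ utu ] X
print(preferred_role(None, "denpo", ROLES))   # [ PRED ] denpo
print(preferred_role("utu", "denpo", ROLES))  # [ utu ] denpo
print(preferred_role(None, "basyo", ROLES))   # [ PRED ] basyo
```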
Naturally, this does not extend to cases of coordination for a given case position, but rather refers to
separate surface realisations of that case slot. Hence, the acceptability of the in-case slot coordination
in (6) and iterative -mo -mo construction in (5) do not threaten the integrity of the test.
The process of a distinct complement role being forced on a repeated element can be seen with the
verb pass below, whereby the salt in (8c) is forced into an Indirect Object role, producing a structure
paralleling that in (8b).
One limitation of this diagnostic is its inability to account for the multiple-subject construction
(Kuno 1973b:34, 68-78) in Japanese, where multiple nominative-marked constituents are generated in
the Subject position.
Smith predicts this fact in identifying the nominative case marker as the default case marker in
Japanese (p. 98). Because of this observation, however, repeatability is only applicable to non-subject
case slots.
One last test worthy of mention, which is specific to Japanese, is quantifier floating (Kuroda 1980;
Miyagawa 1988; Miyagawa 1989b). Quantifier floating occurs when a numeric classifier associated with
a noun can be transposed to the right of the associated case slot. For this process to successfully occur,
the noun phrase occupying the source case slot must necessarily be an obligatory element (Jacobsen
1992:41), and hence a complement. Quantifier floating correctly identifies the nominative case slot in
the sentence pair of (10) as a complement.
1 Integral complement
2 Shadow complement
3 Obligatory complement
4 Optional complement
5 Middle
6 Adjunct
7 Extra-peripheral
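For reference, the seven-way hierarchy above can be encoded as an ordered enumeration, with lower values denoting tighter binding to the predicate. The class and function names are hypothetical, and treating categories 1 to 4 as the complement range is our reading of the category labels.

```python
from enum import IntEnum


class ArgumentStatus(IntEnum):
    """Argument status categories; a lower value means tighter valency binding."""
    INTEGRAL_COMPLEMENT = 1
    SHADOW_COMPLEMENT = 2
    OBLIGATORY_COMPLEMENT = 3
    OPTIONAL_COMPLEMENT = 4
    MIDDLE = 5
    ADJUNCT = 6
    EXTRA_PERIPHERAL = 7


def is_complement(status: ArgumentStatus) -> bool:
    """Categories 1 to 4 carry the complement label."""
    return status <= ArgumentStatus.OPTIONAL_COMPLEMENT


def more_tightly_bound(a: ArgumentStatus, b: ArgumentStatus) -> bool:
    """True if a sits above b in the valency binding hierarchy."""
    return a < b


print(is_complement(ArgumentStatus.SHADOW_COMPLEMENT))
print(more_tightly_bound(ArgumentStatus.MIDDLE, ArgumentStatus.ADJUNCT))
```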
complements. In actual fact, as was seen for butter, they can be perceived as being so tightly bound
to the predicate as to be non-realisable in an unmarked form. However, unlike integral complements,
synonym replacement is generally allowable (see (11d) above), and restrictions on word order are
relatively relaxed in cases when a surface complement representation is possible. At the same time,
we can observe some scope for in-complement modification, as was observed for integral complements.
Shadow complements are thus less rigidly restricted in surface form, supporting the given positioning
below integral complements.
An additional feature of shadow complements is the cline of acceptance of unmarked instances of the
default complement in a matrix clause context. To take the above case from Japanese, zei-o nozei-suru
was unanimously unacceptable to the native speakers consulted, whereas the matrix collocation of the
shadow complement byoin-ni 'hospital-dat' with nyuin-suru 'to go into hospital/be hospitalised'
received a relatively neutral response. Given that our usage of shadow complements is aimed at text
analysis, we avoid making grammaticality judgements for such unmarked matrix occurrences of the
default argument. Despite this relaxation of the constrained nature of shadow complements, however,
we maintain a treatment independent of that for integral complements. That is, integral complements
are necessary to derive the associated idiomatic sense of the predicate, in comparison with surface
realisations of shadow complements which simply reinforce/extend the inherent verb sense generated
by the predicate.
This leaves open the question of the status of song in the construction sing a song. The default
argument of song is unarguably encoded in the predicate, a fact which is evoked in the intransitive usage
of sing, and only synonyms, hyponyms, and modified instances of song are allowed as Direct Object.
The exclusion of arguments in constructions of this type from the shadow complement classification
stems from the acceptability of proper hyponym replacement, such as sing a shanty or sing a rollicking
tune you heard on the radio. That is, semantic restriction on the Direct Object slot denotes a
hierarchical semantic set of both synonyms and hyponyms, with the default of song at the root,
unlike butter or zei 'tax' which are highly restricted in themselves and are replaceable only with a
limited range of synonyms, and modified instances of that default sense.
The inherent optional nature of shadow complements should be clear from the relative
ungrammaticality of an unmarked surface occurrence of the default element. Application of the repeatability
diagnostic, then, leads to the expected complement status of shadow complements, producing an
optional complement (category 4) categorisation for shadow complements. On the surface, this would
appear to cast doubt on the placement of shadow complements above obligatory complements within
the valency binding hierarchy. We justify the given analysis from the observation that whereas elimina-
tion of surface shadow complements is possible, doing so reverts its semantic content to the default; for
optional complements (see below), the same process of elimination simply leads to underspecification.
Moreover, any potential for synonym replacement is highly constrained, to a much higher degree than
for obligatory complements.
For the above reasons, the category 2 placement of shadow complements between integral comple-
ments and obligatory complements would appear to be well-founded.
Category 5 Middles
Middles are proposed by Somers as an idiosyncratic in-between classification made up of elements
which share the characteristics of both complements and adjuncts. Naturally they are non-obligatory,
but the same close association can be observed with the governing verb. Examples of middles taken
from English are Instrumentals, such as with a hammer in hit the nail with a hammer, and Beneficiaries,
such as the squire in The gamekeeper shot the squire a rabbit (taken from Somers (1987:25)). These
two case types also generally produce middle elements in Japanese.
A significant class of middles particular to Japanese is that of onomatopoeic adverbials, and
notably phonomimes and phenomimes (Shibatani 1990:153-7). The strong correspondence between
onomatopoeic expressions and particular verbs supports this view, as is seen for kusukusu 'titter', nikoniko
'grin' and kutukutu 'chuckle', which collocate only with the verb wara(-u) 'to smile/laugh'.4
Category 6 Adjuncts
Adjuncts, again, are necessarily optional, but unlike middles tend to display semantic consistency
across usage with distinct predicate classes. Naturally, pragmatic restrictions will exist as to local se-
mantic compatibility with a given predicate, but, in general, their use is unpredictable. Co-occurrence
of adjuncts of the same semantic type commonly occurs, as suggested by the repeatability test de-
scribed above, and word order restrictions are relatively relaxed.
4 All these expressions also occur with the light verb suru 'to do' in an inherent laughing sense.
An example of an adjunct is the Japanese Locative case slot, such as kono-ie-de 'this house-loc'
in:
(14) Taro-ha kono-ie-de umaresodatta
Taro-top this house-loc was born and brought up
Taro was born and brought up in this house.
Category 7 Extra-peripherals
Extra-peripherals are optional sentence modifying constituents, and constitute the outermost
argument category; as one would expect, they are almost impervious to both word order and semantic
restrictions. In both English and Japanese, adverbs form the main component of extra-peripherals.
Particular examples are suddenly and often, and wazato 'intentionally' and sorosoro 'soon'.
Subject
The Subject is traditionally defined as the general doer of the action, such as the dog in The dog
gnawed the rope. While this description is relatively uncontroversial for active clauses, it leads to
two distinct treatments of passive subjects. The first is to take a Fillmore Case-style approach and
identify that entity which corresponds to the active subject, in the underlying or logical subject
sense. The alternative method is to concentrate solely on surface syntactic marking in identifying the
grammatical subject. Thus, in The rope was gnawed by the dog, the dog comprises the logical subject
(coinciding with the grammatical subject of the active voice equivalent), and the rope the grammatical
subject.
In this research, we adopt this second, grammatical notion of subjecthood.
Subjects are necessarily obligatory or integral complements, and are characterised by nominative
case marking. Japanese Subjects cannot be unambiguously detected through word order,
inflection, case marking, or the concept of surface syntactic obligatoriness. Rather, we must fall back
on a number of linguistic tests to ascertain the Subject argument in a given sentence context.
The first such test is zibun-binding, and the observation that instances of the reflexive pronoun zibun
can generally only bind to a clausal Subject position.6
(17) Taro-ga Hanako-ni Ziro-o zibun no ie-de syokaisita.
Taro-nom Hanako-dat Ziro-acc self gen house-loc introduced
(lit.) Taro_i introduced Jiro to Hanako in self_i's house. (Shibatani 1990:283)
Hence, in (17), Taro is the Subject.
One further test is subject honorification (Shibatani 1990:283), which involves the use of honorific
o-V-ni naru marking on the main verb to indicate deference to the Subject:
(18) a. syatyo-ga waratta.
president-nom laughed
The company president laughed
b. syatyo-ga o-warai-ni natta.
(Subject honorific form of a.)
Naturally, the Subject entity must be animate and pragmatically worthy of honorification for this test
to be applicable.
Despite this seemingly overbearing constraint, subject honorification provides an unambiguous
means of determining the Subject case position through analysis of suitable situational participants,
assuming that it is possible to uniquely identify one of those candidate participants as being worthy
of honorification. That is, by identifying a case slot as containing a Subject filler in a given lexical
context, for a given case frame, we can generalise that case slot as being the Subject in other lexical
contexts, assuming consistency of case marking.
An application of this process is the identification of the Subject position of wakar(-u) to know/understand
in a dative-nominative ergative marking context.7 First, it is necessary to generate a sentence con-
text involving animate participants in all candidate case slots, one of which must be unambiguously
superior in social standing to the others. Such a sentence is given in (19a), where syatyo is the en-
tity worthy of honorification. Next, we consider the appropriateness of subject honorification (cf.
(19b)) and object honorification (cf. (19c)), and correlate these findings with our a priori honorifica-
tion judgement. In (19), the appropriateness of (19b) suggests the datively marked syatyo as occupying
the Subject position, leaving the nominatively marked syain as the Direct Object.
6
See (Iida 1996) for documentation of significant exceptional cases in which zibun binds to non-subjects.
7
Here again, note the non-coincidence between prototypical case-roles from the case marking types and the actual
case-roles, with the Subject occupying a datively marked case slot and the Direct Object occupying a nominatively
marked slot.
3.3. CASE SET 23
Direct Object
Direct Objects generally indicate the entity/entities affected by the action described by the main
verb. As such, they are expressible only as obligatory complements, or in terms of our argument
status hierarchy, as obligatory or integral complements.
One language-independent test for Direct Objects is that, in the absence of a Causee argument, they
are commonly passivisable to the Subject position. This was the process observed above for the rope
in The rope was gnawed by the dog.
Direct Objects are prototypically marked with accusative case.
Indirect Object
Indirect Objects represent the recipient or beneficiary in an action. Similarly to Direct Objects,
Japanese Indirect Objects are passivisable (cf. (20), in which the Indirect Object is transformed into
the Subject position), but only for (di)transitive verb senses (cf. (21)).
(20) a. Taro-ga Hanako-ni tegami-o okutta.
Taro-nom Hanako-dat letter-acc sent
Taro sent Hanako a letter.
b. Hanako-ga Taro-ni tegami-o okurareta.
Hanako-nom Taro-dat letter-acc was sent
Hanako was sent a letter by Taro.
Co-actors
In terms of traditional grammatical analysis, the Co-actor case-role straddles the boundary between the
Direct and Indirect Object positions. It resembles the Direct Object case-role in argument status, in
that all Co-actors are obligatory complements (but never integral complements). From the perspective
of case marking, however, the Co-actor case slot is dative or comitative case marked, and hence most
similar to Indirect Objects. As an additional concern, Co-actors are not passivisable (cf. (22b)), in
which respect they parallel intransitive Indirect Object usages.
8
Zonji(-ru) is a lexical object honorific equivalent of wakar(-u).
24 CHAPTER 3. VALENCY, ARGUMENT TYPES AND CASE
Under causativisation, on the other hand, Co-actors are transformed into Co-patients (and hence
coordinated with the Causee case-slot; see below):
One phenomenon which sets Co-actors apart from all Direct and Indirect Object usages is that
Co-actor case fillers can be coordinated within the Subject case slot while retaining the same basic
sentential semantics (ignoring focus/theme variation). That is, they occur with reciprocal verbs and are
mutually exchangeable.
This conflation of agency in the Co-actor case-role points to strong semantic resemblance between
coordinating and comitative case marking roles of the particle to, although no claim is made as to
the exact nature of this correspondence.
The crucial difference between the two case-roles lies in the optional complement status of the
Comitative, which is hence not intrinsically defined for any verb. For example, while it is perfectly
natural to say Taro-to(-issyo-ni) itta (I) went along with Taro, one would certainly not want to
claim that the Comitative is defined within the valency frame for ik(-u) to go. This informal
observation leads to the Comitative and Co-actor case-roles existing in disjunctive distribution.
The optional nature of the Comitative case-role correlates with its being displaceable only with a
dangling expanded comitative case marker in the relative clause body.
All other complement cases are optional complements, with the sole exception of the Passive Agent,
which is an obligatory complement for adversative passives.
Passive agent
The Passive Agent case-role is derived through passivisation, and marked datively or with ni-yotte.9
In the case of a direct passive (Miyagawa 1989b; Miyagawa 1989a), the Passive Agent is an optional
complement, whereas Passive Agents in adversative passive contexts are obligatory complements
(Hoshi 1994).
Causee
The Causee case-role is generated through causativisation, from Subject case slot transformation.
It constitutes an obligatory complement (cf. Passive Agents) and is marked either datively or ac-
cusatively.
(26) a. Taro-ga Hanako-ni hon-o yonda.
Taro-nom Hanako-dat book-acc read
Taro read a book to Hanako.
b. Jiro-ga Taro-ni Hanako-ni hon-o yomaseta.
Jiro-nom Taro-dat Hanako-dat book-acc made read
Jiro made Taro read a book to Hanako.
Co-patient
Co-patients mimic Co-actors in case marking (dative or comitative) and in their being non-passivisable
(cf. (27b)), but differ in that they coordinate with the Direct Object or Causee case-slots (rather than
the Subject; cf. (27c)), are optional complements and are unaffected by causativisation. That is,
Direct Objects being unaffected by causativisation leads to consistency of case-role coordination.
(27) a. Taro-ga enzin-o haikikan-to kumiawaseta
Taro-nom engine-acc exhaust pipe-com joined
Taro joined the engine with the exhaust pipe.
b. * haikikan-ga (Taro-niyotte) enzin-o kumiawasareta
exhaust pipe-nom Taro-by engine-acc joined
The exhaust pipe was joined to the engine (by Taro). (intended)
c. Taro-ga haikikan to enzin-o kumiawaseta
Taro-nom exhaust pipe and engine-acc joined
Taro joined the engine and exhaust pipe.
Target
The Target case-role describes the optional target of experiential verbs (see Section 4.5.4) and is
marked datively.
(28) a. kabu no wariate-ga atta
stock gen allotment-nom there was
There was an allotment of stock.
b. kozin-ni(taisite) kabu no wariate-ga atta
individual-dat stock gen allotment-nom there was
(lit.) There was an allotment of stock to individuals.
9
The dative case marker and ni-yotte are essentially interreplaceable, excepting that ni-yotte cannot be used with
adversative passives (Kuroda 1979). There is also a slight semantic difference between the two marking types, relating
to the degree of affectivity of the Subject of the passive clause.
Object Allative
The Object Allative case-slot refers to the optional physical medium reference associated with re-
sultative action verbs such as kak(-u) to write (on) and kake(-ru) to hang (on); it is marked
datively.
(29) Taro-ga zibun no namae-o kami-ni kaita
Taro-nom self gen name-acc paper-dat wrote
Taro wrote his name on the paper.
Locative
The Locative case slot construes the generic positional case-role, and is marked with the dative or
locative (de) case markers.
(30) Taro-ga Tokyo-ni sundeiru.
Taro-nom Tokyo-dat is living
Taro lives in Tokyo.
Ablative
The Ablative case slot indicates the local source of a directional action, and is marked with the particle
of the same name.
(32) Hanako-ga harubaru Sapporo-kara yattekita.
Hanako-nom all the way Sapporo-abl came
Hanako came all the way from Sapporo.
Allative
The Allative case slot indicates the local target of a directional action, and is marked datively or by
the allative e/made case particles.
(33) Taro-ga sengetu Kyoto-made/ni/e borosya-de unten-site kita.
Taro-nom last month Kyoto-all/dat/all old car-loc drove there and back
Taro drove to Kyoto and back in his bomb (of a car) last month.
Perlative
The Perlative case slot indicates the location through/across which the directional action of the main
verb occurs, and is marked accusatively.
(34) Taro-wa izen tomodati no hune-de Nihonkai-o watatta-koto-ga aru.
Taro-top previously friend gen boat-loc Japan Sea-acc has crossed
Taro has previously crossed the Japan Sea in a friend's boat.
Durational
The Durational case-role indicates the length of time an action or state lasted. Unmarked dura-
tional nouns such as kikan interval are generally datively marked (with or without a pre-dative zyu
durational marker), whereas durational complexes involving cardinal reference are most commonly
associated with null marking.
(35) seru no kikan-ni mise-o otozureta
sale gen interval-dat shop-acc visited
( ) visited the shop during the sale.
Temporal
The Temporal case-role indicates a distinct point in time, and is marked with the dative case marker for
cardinal date/time references, and with null/iterative (mo) marking for generic temporal expressions
such as kyo today and kyonen last year.
(36) kyo Taro-ga kuru.
today Taro-nom comes
(lit.) Today, Taro will come.
Cardinal
The quantity/degree of an action is expressed with the Cardinal case-slot, with the semantic scope
spanning from unit-based mention such as sokudo speed, to physical extent and price, such as with
kakaku price. Cardinals are marked with the locative case marker or null marking, are adjuncts,
and are by default incompatible??? with all verbs.
(39) hikoki-ga monosugoi hayasa-de sora-o tonda
plane-nom extreme speed-loc sky-acc flew
The plane flew across the sky at extreme speed.
Chapter 4
That is, full-clause based idioms are strictly preferred over integral complements, which in turn take
precedence over shadow complements, and so on. The placement of middles above obligatory and
optional complements may seem controversial in light of the higher argument status of these types.
However, the well-defined lexical nature of middles makes them more stable through relativisation
than general complements.
The remainder of argument types (obligatory/optional complements, and adjuncts) interplay on a
finer level, with selection between obligatory and optional complements being made from the mappings
between them inherent in the valency frame, and adjuncts weighted according to the prototypicality
of the noun head with that adjunct type.
CHAPTER 4. VERB CLASS-BASED RESOLUTION
Figure 4.1: The full verb class hierarchy (Original verb classes indicated in bold, partitioning nodes
capitalised). Node labels recoverable from the figure: GENERIC, Full clause-based idiom, Conflated,
Partitive, STATE, ACTION, ergative, Start, End, Copula, Natural phenomenon, PHYSICAL MOVEMENT,
Extinction/destruction, Mental action, Tool-aided action, Locational action, Conjoining, Existential,
RELATIONAL, Including, Excluding, Empathy.
4.1. VERB CLASS HIERARCHY
We claim that the proposed treatment of adjunct case slots is one of the potential strengths of the
system, in terms of consistency of application and simplicity of processing. Given that adjuncts behave
relatively consistently across all usages, it is possible to simply define adjunct type compatibilities
for each adjunct case-role, and apply a uniform semantic treatment to the calculation of adjunct
correspondence of arguments. In terms of verb class representation, this equates to associating verb
classes to each adjunct type, determining a default (in)compatibility judgement (see Section 3.3), and
marking those cases for which the particular verb sense does not coincide with this judgement.
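As a rough illustration of this default-plus-exceptions scheme, the following Python sketch stores one default (in)compatibility judgement per adjunct case-role and records only those verb senses which depart from it. The default values, the verb sense label and the function name are all hypothetical and are not taken from the system described here.

```python
# Hypothetical default judgements per adjunct case-role (True = compatible by default).
# Placeholder values only; the actual defaults are those defined in Section 3.3.
DEFAULT_COMPATIBLE = {
    "Locative": True,
    "Temporal": True,
    "Durational": True,
}

# Verb senses marked as departing from the default for a given adjunct case-role.
# Both the verb sense label and the entries are invented for illustration.
MARKED_EXCEPTIONS = {
    ("hypothetical_verb_sense", "Temporal"),
    ("hypothetical_verb_sense", "Locative"),
}


def adjunct_compatible(verb_sense: str, adjunct_role: str) -> bool:
    """Return the default judgement, reversed when the verb sense is marked."""
    default = DEFAULT_COMPATIBLE[adjunct_role]
    if (verb_sense, adjunct_role) in MARKED_EXCEPTIONS:
        return not default
    return default
```

Under this arrangement, only the exceptional verb senses need to be listed, which is the source of the claimed simplicity of processing.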
1 5463 68.71%
2 1798 22.61%
3 457 5.75%
4 144 1.81%
5 52 0.65%
6 16 0.20%
7 11 0.14%
8 4 0.05%
9 2 0.03%
10 3 0.04%
14 1 0.01%
utilised in this research. For details of the full clause-based idiom class, the reader is referred to
Section 2.3.3.
Individual descriptions of the newly developed verb classes, and associated rule sets where applicable,
are provided below.
ELSE IF (locative head AND uninstantiated Ablative case slot) RETURN Ablative;
4.2.3 Travelling
The focus for travelling verbs is on the Perlative case-role, and the route taken in the travelling motion;
this is marked with the accusative. However, similarly to proximal travelling verbs, there is potential
to refer to the destination through the use of allative nouns, for which the case marking is allative or
dative.
Tor(-u) to travel/pass through and tob(-u) to fly (across) are both instances of travelling verbs.
4.3 Relational
Relational verbs are characterised by relating a source and target entity. In the case of inter-personal
relational verbs, both entities are generally human, whereas generic relational verbs are associated
with a broader range of both animate and abstract arguments. For all relational verbs, the focus
is on the source entity, which in the context of relative clauses means that the default gapped
case-role, in cases of ambiguity between the source and target case slots, corresponds to the source. For
target case-role gapping to occur, one or more of the following conditions must be met: (a) the source
entity must be lexically realised, (b) the head must be an allative noun, or (c) there must be marked
empathy on the source entity (see below). The most commonly occurring personal allative noun is
aite opponent, although the non-personal allative saki direction/goal can be equally acceptable
for generic relational verbs in certain contexts.
Target entities can occur in any of the Co-actor, Co-patient and Indirect Object case-roles, with the
particular case-role defined by the predicate and exclusivity of these case-roles occurring in a given
valency frame.
Co-actor targets are obligatory in nature (a fact which derives directly from the definition of the
Co-actor case-role), which makes them slightly more amenable to unmarked case-role gapping than the
other two target argument types. They occur for reciprocal verbs such as a(-u) to meet (inter-
personal relational) and itti(-suru) to correspond (generic relational). The reciprocity of Co-actor
target elements can be observed in (1), where ambiguity exists between Subject and Indirect Object
case-role gapping.
(1) [ au ] hito
meets person
a. people who meet ( )
b. people ( ) meets
Clearly, the two glosses correspond to the same situation, and if there is to be any constraint on the
two case-roles, it is that the most topical/empathised entity occupies the Subject case slot for inter-
personal relational verbs. We return to this matter in discussion of the system evaluation in Section
8.2.4.
4.3. RELATIONAL 35
Similarly to Co-actors, Co-patients are obligatory and occur for reciprocal-sense verbs. Here, how-
ever, the relational correspondence occurs with the Direct Object case slot, with differing degrees of
reciprocity. In the case of kumiawase(-ru) combine, for example, full interreplaceability is possible,
whereas complications occur for replacement-sense verbs such as tyenzi(-suru) to change over and
kokan(-suru) to replace. Even with kokan(-suru), however, a higher degree of reciprocity is seen
than for the English replace, in that the Direct Object source element can indicate the replacing item
given appropriate replacing-type markedness on the case filler.2 Thus, while recognising that implicit
directionality is evident for certain Co-patient marked source types, the Direct Object and Co-patient
case slots can generally be interchanged.
Indirect Object-type targets generally refer to the Recipient or Beneficiary of the described action,
and are optional (again, obtained from the definition of Indirect Objects). Unlike Co-actors and
Co-patients, Indirect Object targets produce a definite sense of directionality of the action, are not
reciprocal (seen in the non-equivalence of (2a) and (2b)), and cannot be coordinated with the source
case-role while retaining the same sense, as was possible above for Co-actors and Co-patients (cf. (2c)).
Examples of Indirect Object targets occur with the verbs watas(-u) to hand over and aisatu(-suru)
to greet.
(2) a. Taro-ga Hanako-ni tegami-o okutta.
Taro-nom Hanako-dat letter-acc sent
Taro sent Hanako a letter.
b. Hanako-ga Taro-ni tegami-o okutta.
Hanako-nom Taro-dat letter-acc sent
Hanako sent Taro a letter. (≠ a.)
c. Hanako to Taro-ga tegami-o okutta.
Hanako and Taro-nom letter-acc sent
Hanako and Taro sent a letter (to ( ) ). (≠ a.)
IF (animate head)
IF (personal allative noun head AND uninstantiated target case slot )
RETURN ;
ELSE IF (uninstantiated source case slot ) RETURN ;
ELSE IF (uninstantiated target case slot ) RETURN ;
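The rule set above can be approximated in Python as follows. This is a sketch only: the return values have not survived in the rule set as reproduced here, so the labels "target" and "source" are assumptions inferred from the surrounding discussion (source focus by default, target gapping for personal allative heads or when the source is lexically realised). The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelativeClause:
    head_animate: bool
    head_personal_allative: bool  # e.g. a head noun such as aite
    source_instantiated: bool     # source case slot filled in the clause body
    target_instantiated: bool     # target case slot filled in the clause body


def resolve_interpersonal_relational(rc: RelativeClause) -> Optional[str]:
    """Return the assumed gapped case slot ("source" or "target"), or None."""
    if not rc.head_animate:
        return None
    if rc.head_personal_allative and not rc.target_instantiated:
        return "target"  # assumed label: allative head noun
    if not rc.source_instantiated:
        return "source"  # assumed label: default focus on the source entity
    if not rc.target_instantiated:
        return "target"  # assumed label: source entity lexically realised
    return None
```

For an unmarked animate head with both slots empty, as in (1), the sketch returns "source", matching the stated default.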
Empathy
Empathy verbs (Kuno 1978) form a proper subset of inter-personal relational verbs, and are defined
by their incompatibility with a first person pronoun in the target case slot for simple inflection usages.
(3) * Taro-ga watasi-to atta.
Taro-nom I-com met
Taro met with me.
2
Mechanical and civil engineering-related instruction manuals frequently contain sentences such as atarasii boruto-o
kokansuru (new bolt-acc replaces) replace ( ) (with) the new bolt, in syntactically unmarked usages.
This high degree of empathic focus on the source entity produces the effect that target case-role
gapping can occur without a surface realised source entity, for unmarked head nouns. Unfortunately,
this often leads to lack of focus-based preferences between the source and target case slots in the case
that both are uninstantiated, unless the head noun intension is marked allatively or empathically. At
the same time, however, temporal or locative grounding tends to weight the focus towards the target
case slot, as does the past tense.
(4) a. [ au ] hito
meets person
the person who met ( ) vs. the person ( ) met
b. [ nitiyobi-ni atta ] hito
Sunday-dat met person
the person ( ) met on Sunday
c. [ Nihon-de au ] hito
Japan-loc meets person
people (one) meets in Japan
The handling of this marginal preference for the target case slot is a somewhat brutal one, in that
simple existence of past tense inflection or local grounding is seen to generate unambiguous target
gapping. However, given that there are no factors working to reverse the preference back in the other
direction, the given treatment seems sufficient.
The algorithm for empathy verbs interfaces with that for inter-personal relational verbs through
sequentiality, in that the following rule is applied prior to the inter-personal relational verb algorithm,
and if an output is returned, that analysis type is automatically returned from the inter-personal
relational verb algorithm.
IF (animate head AND uninstantiated target case slot AND relative clause is
in past tense or is locally grounded) RETURN ;
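The sequential interface between the two algorithms might be sketched in Python as below. The empathy rule is tried first, and the inter-personal relational analysis is consulted only if the rule returns nothing. The return label "target" follows the discussion above of unambiguous target gapping; the function names and the `fallback` parameter are hypothetical.

```python
from typing import Callable, Optional


def empathy_rule(head_animate: bool, target_instantiated: bool,
                 past_tense: bool, locally_grounded: bool) -> Optional[str]:
    """Pre-pass for empathy verbs: past tense or local grounding forces target gapping."""
    if head_animate and not target_instantiated and (past_tense or locally_grounded):
        return "target"
    return None


def resolve_empathy_verb(head_animate: bool, target_instantiated: bool,
                         past_tense: bool, locally_grounded: bool,
                         fallback: Callable[[], Optional[str]]) -> Optional[str]:
    """Apply the empathy rule first; otherwise defer to the relational analysis."""
    result = empathy_rule(head_animate, target_instantiated, past_tense, locally_grounded)
    if result is not None:
        return result
    return fallback()
```

With a past tense clause body as in (4b), or a locally grounded one as in (4c), the pre-pass returns "target" and the fallback is never called.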
Including
Verbs contained in the including verb class can be used in a non-restrictive exemplification form,
realisable in the simple non-past or past tense. The exemplar set is construed in the accusative case,3
which must be present to trigger the including sense, and no further arguments can collocate with the
main verb.
(5) [ tozai-o fukumeta ] Osyu
east and west-acc included Europe
(lit.) Europe, including (both) the east and the west
Including relative clauses are head restrictive, and the system output on detection of this relative
clause type is the tag for this modifying type, i.e. Inclusive.
The fact that including relative clauses are not case-slot gapping can be seen by considering a
simplex derivation of (6a) below.
(6) a. [ Beisutazu-o fukumu ] zen yakyu timu
Baystars-acc includes all baseball teams
all baseball teams, including the Baystars
b. zen yakyu timu-ga Beisutazu-o fukumu
all baseball teams-nom Baystars-acc includes
All baseball teams include the Baystars.
Clearly, the scope of the universal quantifier zen all is not equivalent between (6a) and (6b),
excepting the case where the Subject in (6b) is treated as being textually containing through quotation,
thus restricting the scope of the quantifier to the Subject NP. However, this quotative interpretation
is not available in (6a) and hence does not constitute direct equivalence.
Members of the including verb class include fukume(-ru) and hazime-to(-suru).
IF (simple main verb inflection AND unique accusatively marked argument) RETURN
Inclusive;
Excluding
Excluding verbs extensionally restrict the modified head noun by identifying elements which are to be
excluded from the default denotation. The exclusion sense of these verbs is produced for simple tense
usages with only the accusative case slot instantiated.4
(7) [ nitiyo-o nozoku ] mainiti
Sunday-acc excludes everyday
everyday, excluding Sundays
While usages such as (7) can be related back to the unmarked simplex sense nozoku, scope differences
occur between excluding relative clauses and the corresponding simplex clause derivant, as was seen
for the including verb class above:
(8) mainiti-kara nitiyo-o nozoku
everyday-abl Sunday-acc excludes
to exclude Sunday from every day
3
Verb arguments can also be marked with the iterative (mo) marker.
4
Note that for excluding verbs, the unique verb argument cannot be marked with the iterative (mo) marker, unlike
including verbs.
This supports a head restrictive relative clause treatment for excluding relative clauses.
The excluding verb class consists uniquely of the verb nozoku.
IF (simple main verb inflection AND unique accusatively marked argument) RETURN
Exclusive;
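Since the including and excluding rules share the same conditions and differ only in the tag returned, both can be sketched with a single Python function. This is a minimal sketch under simplifying assumptions: the verb keys are illustrative romanisations, arguments are represented only by their case labels, and the iterative (mo) marking permitted for including verbs is not modelled.

```python
from typing import List, Optional

# Class membership as stated in the text; the key spellings are illustrative.
INCLUDING_VERBS = {"fukumeru", "hazime-to-suru"}
EXCLUDING_VERBS = {"nozoku"}


def resolve_inclusive_exclusive(verb: str, simple_inflection: bool,
                                argument_cases: List[str]) -> Optional[str]:
    """Return the head restrictive tag, or None if the conditions are not met.

    argument_cases lists the case labels of all arguments in the clause body.
    """
    # Both senses require simple inflection and a unique accusative argument.
    if not simple_inflection or argument_cases != ["acc"]:
        return None
    if verb in INCLUDING_VERBS:
        return "Inclusive"
    if verb in EXCLUDING_VERBS:
        return "Exclusive"
    return None
```

For a clause body such as that in (7), with nozoku and a single accusative argument, the function returns "Exclusive".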
intransitive interpretation in the case that an overt Subject is not supplied within the relative clause.
This correlates to hedging on the transitivity issue, as no assumption is made one way or the other
as to the full content of the valency frame, and the resultant Subject analysis is equally applicable
to both intransitive and transitive Subject analysis. Indeed, the only instance in which this analysis
would prove incorrect is where gapping has occurred from the Direct Object case slot of the transitive
sense of the verb in question, for a zero Subject.
4.4.2 Partitive
Partitive verbs contain a part/attribute in the nominatively marked Subject case slot, and can
optionally collocate with a topic-marked whole, to which the clausal attribution applies.
(11) a. iro-ga aseta.
colour-nom faded
The colour faded.
b. seta-wa iro-ga aseta.
sweater-top colour-nom faded
The colour faded out of the sweater.
The whole and part can alternatively be coordinated in the Subject position, suggesting the clause
initial topic construction as a major subject (Tateishi 1994), and hence a displaced whole in a
relative clause context as generating a bound gapping clause.
(12) seta no iro-ga aseta.
sweater gen colour-nom faded
(lit.) The colour of the sweater faded.
In terms of our case-role schema, the optional whole topic is classified as a (second, anchoring)
Subject. For relative clause analysis, the gapping of the anchoring whole case slot translates to
a bound gapping relative clause instance, with Subject gapping. Gapping from the part case slot
in the absence of the whole, on the other hand, constitutes simple Subject gapping, noting that
gapping of the part in the presence of the whole is not possible.
(13) [ iro-ga/no aseta ] seta
colour-nom/gen faded sweater
a sweater which has faded in colour
whole-part senses would produce this second interpretation, of strictly equivalent acceptability to a
faded colour due to the identical type restrictions on the part case slot in the two frames. For this
reason, the partitive verb class would appear semantically justified.
The current algorithm is limited in its potential to capture correspondences of this type by the lack
of a broad-coverage world knowledge source with which to derive part-whole relationships. Thus, the
actual handling of whole Bound Subject gapping of the type given above is simplified to assume
that Bound Subject gapping occurs only in the context of full complement case instantiation. What
this means in real terms is that given full complement case instantiation, the system should prefer a
Bound Subject interpretation over a head restrictive relative clause interpretation, these two analysis
types being the only two possible alternatives. Clearly, therefore, considerable scope exists to improve
the current treatment of partitive verbs, and this is left as an item for future research.
4.5.2 Conjoining
Conjoining verbs closely resemble the copula from the standpoint of case-role gapping, by way of
gapping only from the Subject position and being incompatible with both temporal and local case-
roles. As implied by the nomenclature, conjoining verbs semantically conjoin or relate concept pairs,
but differ from relational verbs in that gapping cannot occur from the target (non-subject comparator)
case slot.
Examples of conjoining verbs are uwamawar(-u) to exceed and kanren(-suru) to relate to.
4.5.3 Quantative
Quantative verbs are exempt from the default adjunct compatibility for time and cardinal adjuncts,
with quantative arguments implicitly expressible through the valency frame-defined maximally pe-
ripheral complement case slot. In real terms, the maximally peripheral complement case slot is the
final (rightmost) complement represented within the valency frame.
5
Due to the modular nature of the given verb class system in handling adjuncts, it is neither possible nor desirable
to code multiple adjunct (in)compatibilities within a single verb class, and the observed adjunct incompatibilities for
copula verbs are not applied directly from the copula verb class within our system.
4.5. OTHER VERB CLASSES 41
4.5.4 Existential
Existential verbs are stative verbs which can include mention of the locus of the state. As a direct
consequence of the adjunct status of the locative case-role, multiple mention of locus can occur, with
the separate locative case slots marked distinctly as being inner and outer positions (inner/outer
terminology taken from Halliday (1970) and Platt (1971)). The (basic) Inner Locative is marked
datively and is the default, whereas the (peripheral) Outer Locative is marked with the locative case
marker (de), and occurs only in conjunction with the Inner Locative. This marks a point of departure
from prototypical adjunct repetition, by way of repeatability not extending to the case marking level.
Allocation to the two case slots is determined according to the relative specificity or local granularity
of the locative case fillers, with the finer grained case filler occupying the inner case slot.
(17) Taro-ga Pari-de(ha) okina ikkenya-ni sundeiru
Taro-nom Paris-loc(top) large house-dat is living
Taro lives in a big house in Paris.
In (17), for example, okina ikkenya corresponds to the Inner Locative case slot, and Pari to the
outer case slot. The role of granularity in demarking these case slots is evident in that okina ikkenya is
geographically contained within the extension of Pari, and the two locatives can be coordinated by use
of the genitive connective (no) producing Pari no okina ikkenya big house in Paris. Plugging this
genitive coordinated locative back into the original clause, we see that the two locatives are conflated
within the Inner Locative case slot:
(18) Taro-ga Pari no okina ikkenya-ni sundeiru
Taro-nom Paris gen large house-dat is living
Taro lives in a big house in Paris.
Additionally, if we consider (17) in the absence of the mention of okina ikkenya, we see that Pari is
forced from the Outer Locative case slot into the Inner:
(19) a. * Taro-ga Pari-de sundeiru
Taro-nom Paris-loc is living
Taro lives in Paris. (intended)
b. Taro-ga Pari-ni sundeiru
Taro-nom Paris-dat is living
Taro lives in Paris.
From this, it is clear that the outer locative case slot occurs only in simplex conjunction with an
Inner Locative, and conversely that any singular locative case-role mention for existential verbs must
occur in the Inner Locative case slot (and hence be datively marked for existential verbs).
Returning to consideration of relative clauses, this produces an immediate result for locative gapping
existential verbs. That is, the head of a relative clause containing an Outer Locative and no Inner
Locative must have been gapped from the Inner Locative case slot, given that the Inner/Outer Locative
dichotomy is preserved under case-role gapping. Hence, given that an Outer Locative can only exist
in the presence of an Inner Locative mate, dangling Outer Locatives indicate cases of Inner Locative
gapping.
(20) [ Taro-ga Pari-de sundeiru ] ie
Taro-nom Paris-loc is living house
the house Taro lives in in Paris
Additionally, in the absence of any locative in the relative clause body, Locative gapping must
occur from the Inner Locative case slot, noting that the Outer Locative case slot is as amenable to
case-role gapping as the Inner Locative.
(21) [ Taro-ga okina ikkenya-ni sundeiru ] Pari
Taro-nom big house-dat is living Paris
(lit.) Paris, where Taro lives in a big house
Examples of existential verbs are sum(-u) to live/inhabit and kizon(-suru) to exist.
IF (locative head)
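The prose conditions on locative gapping for existential verbs can be sketched in Python as follows. The sketch is reconstructed from the discussion of (20) and (21) rather than from the rule set itself, which is incomplete as reproduced here. The function and parameter names are hypothetical, and the Outer Locative branch reflects our reading of (21).

```python
from typing import Optional


def resolve_existential_locative(locative_head: bool,
                                 has_inner_locative: bool,
                                 has_outer_locative: bool) -> Optional[str]:
    """Return the locative case slot assumed to have been gapped, or None."""
    if not locative_head:
        return None
    if not has_inner_locative:
        # Covers a dangling Outer Locative, as in (20), and a clause body
        # containing no locative at all.
        return "Inner Locative"
    if not has_outer_locative:
        # Inner Locative present, as in (21): the head fills the outer slot.
        return "Outer Locative"
    # Both slots instantiated: no locative gap is available in this sketch.
    return None
```

For (20), where only the de-marked Pari appears in the clause body, the function returns "Inner Locative".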
Experiential
Experiential verbs6 form a proper subset of existential verbs, and are additionally compatible with an
optional Target case slot, realised in the dative case.
Examples of experiential verbs are i(-ru) to be/have and ar(-u) to be/have.
IF (locative head)
ELSE IF (generic object-referring head AND simple non-past main verb tense AND
relative clause subjectless) RETURN Instrumental;
4.6 Quotative
Quotative verbs are compatible with clause quotative (subordinating) usages, with the subordinated
clause marked with the quotative (to) case marker. When nominalised, quotative verbs can generally
express indirect and direct quotation through message linking with the to-no complex case marker or
to-i(-u) relational verb.8 For quotative verbs, case-role gapping can occur from within the subordinate
clause (subordinate clause gapping - see below).
Examples of quotative verbs are i(-u) to say, omo(-u) to think and tutae(-ru) to report.
Miscellaneous processing
5.2. TIME-RELATED ADJUNCTS 47
this threshold value. However, given the limited size of the (annotated) corpus currently used, and its
closed set nature, this method could not be applied. Instead, nouns were experientially evaluated for
propensity to case-role gapping, and those for which case-role gapping was possible only in relatively
restricted domains were included in the non-gapping expression dictionary.
A further automatic learning procedure could be applied to learn verb collocations which produce
case-role gapping sense for non-gapping expression heads. Here again, however, a richer source of
annotated relative clause instances would be required than is currently available.
Instances of each of these temporal units in the target linguistic unit are marked by switching on the
corresponding slot in the vector. For example, 1998-nen no 2-gatu 1998 gen February would result
in the vector:
1 1 0 0
Our interest in temporal vectors lies in their application to the analysis of Temporal gapping relative
clauses, where an effect of concurrent case-role instantiation quite distinct to that for local adjuncts, is
produced. Recall that for locatives, we introduced the notion of inner and outer case-roles, which were
related through granularity/specificity, and interrelated at a high semantic level. For temporal case-
roles, the reverse is true, in that while there appears to be an inherent limitation of two on the number
of temporals which can be included in a given context (including the noun head in the case of relative
clauses), the only constraint on multiple Temporals is that their temporal vector representations are
not permitted to overlap. In computer hardware terms, when the two temporal vectors are logically
ANDed together, the resultant temporal vector must consist of all zeroes. This leads to the following
grammaticality judgements for relative clauses:
(3) [ 1-gatu-ni kaigi-ga okonawareta ] hi
January-dat meeting-nom was held day
days in January on which meetings were held
*[ 22-niti-ni kaigi-ga okonawareta ] hi
22nd-dat meeting-nom was held day
(lit.) days on which meetings were held on the 22nd
We return to consider temporal vectors in the next section.
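The non-overlap constraint can be sketched in a few lines of Python. This is a minimal illustration rather than the system's implementation: the slot order (year, month, day, plus an unspecified fourth slot) is an assumption inferred from the 1998-nen no 2-gatu example, and the names SLOTS, temporal_vector and compatible are hypothetical.

```python
# Assumed slot order; only the first three slots can be inferred from the
# examples in the text, and the name of the fourth is a placeholder.
SLOTS = ("year", "month", "day", "other")


def temporal_vector(*units):
    """Switch on the slot for each temporal unit present in an expression."""
    return tuple(1 if slot in units else 0 for slot in SLOTS)


def compatible(v1, v2):
    """Two temporals may co-occur only if their vectors AND to all zeroes."""
    return all((a & b) == 0 for a, b in zip(v1, v2))


feb_1998 = temporal_vector("year", "month")   # 1998-nen no 2-gatu
january = temporal_vector("month")            # 1-gatu
the_22nd = temporal_vector("day")             # 22-niti
day_head = temporal_vector("day")             # head noun hi

print(feb_1998)                        # (1, 1, 0, 0)
print(compatible(january, day_head))   # True: example (3) is acceptable
print(compatible(the_22nd, day_head))  # False: the starred example is ruled out
```

Representing each vector as a tuple of flags keeps the AND test explicit; a packed integer bitmask would serve equally well.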
48 CHAPTER 5. MISCELLANEOUS PROCESSING
sono-hi that day, yokuzitu the following day, tozitu that day
These can collocate with any main verb inflectional type to generate a time relative interpretation.
Time relative complexes are generally produced by attaching a postfix to a phrase describing a time
span. Instances of time relative complexes are:
1-kagetu-go one month later, nan-nitika-mae a few days before, 2-okunen-mae 200
million years before
The two affixes which can collocate with time relative complexes are -go after and -mae before.
For -go, the stem verb must be in the simple past tense to produce a time relative construction,
whereas -mae requires the simple present tense. If the head is a time relative complex but tense and
aspectual requirements are not met, a Temporal case-role relative clause is produced.
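The affix and tense conditions just described amount to a small decision rule, sketched below in Python. The tense labels and the function name classify_time_head are hypothetical, and the sketch assumes that the affix and the stem verb tense have already been identified upstream.

```python
# Tense each affix requires of the stem verb for a time relative reading.
REQUIRED_TENSE = {"go": "simple_past", "mae": "simple_present"}


def classify_time_head(affix, stem_tense):
    """Classify a relative clause whose head is a time relative complex.

    affix      -- "go" (after) or "mae" (before)
    stem_tense -- tense/aspect label of the stem verb (labels are assumptions)
    """
    if affix not in REQUIRED_TENSE:
        raise ValueError(f"not a time relative affix: {affix!r}")
    if stem_tense == REQUIRED_TENSE[affix]:
        return "time relative"
    # Tense/aspect requirements not met: Temporal case-role relative clause.
    return "Temporal gapping"


print(classify_time_head("go", "simple_past"))        # time relative
print(classify_time_head("mae", "simple_present"))    # time relative
print(classify_time_head("mae", "past_progressive"))  # Temporal gapping
```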
The effect of the tense and aspect of the stem verb in variously producing a time relative construction
and a non-relative temporal construction, is illustrated by:
(4) kyoryu-ga sunde-ita yaku-2-okunen-mae
dinosaurs-nom were living about 200 million years ago
about 200 million years ago, when dinosaurs lived
1 0 0 0
Hence, in applying one as a mask over the other, the initial flag remains set.
This provides a type test for time relative expressions, and allows us to draw a distinct line between
time relative clauses and Temporal gapping clauses. This is worthy of particular note, as time relative
clauses have been largely misrepresented as case-role gapping relative clauses, with gapping occurring
from the temporal case-role (Matsumoto 1997:53).
That is, they constitute the set of temporal expressions which are well defined within the context of
the surrounding text.
Generic temporal expressions are of the type:
These express generic temporal categories and are semantically restricted by the clause body. They
can be likened to lambda expressions in that they are ground time-type case slot casts, without
having the extra-clausal and intra-clausal semantic incompatibility described below for time
relative constructions.
Non-relative temporal constructions are temporal constructions which involve a time relative com-
plex head, but which do not fulfill the stem verb inflectional requirements of a time relative construction
(see above). Non-relative temporal constructions produce Temporal case-slot gapping relative clause
sense.
Note that there is a certain degree of reliance on the surrounding context as to whether a temporal
expression is absolute or generic, in that most absolute expressions can be forced to take a generic
reading. This difference is most noteworthy when analysing restrictive and non-restrictive relative
clauses, a matter which is beyond the scope of the current research.
Lexical ambiguity
This chapter describes methods of resolving lexical ambiguity in the main verb and noun head, through
statistical/representational preference and thesaurus use, respectively.
Plurality of successfully parsed entries results from a combination of both full and partial verb ho-
mophony and homography.
Full verb homophony is a direct result of the existence of multiple inter-replaceable writing systems
within Japanese (hiragana, katakana and kanji), and occurs when two distinct verb entries coincide
in both conjugational type and phonetic content of the verb stem/auxiliary verb complex. It is
distinguishable from polysemy by virtue of the fact that disambiguation is achievable through use
of the kanji form of the verb stem. An example of full verb homophony is a(-u), for which three
heterogeneous kanji forms produce the distinct entries corresponding to the generic glosses of to
meet (会う), to coincide (合う) and to encounter (遭う). Full homophony can alternatively be
produced through combinations of auxiliary verb morphemes, such that miau is ambiguous between
mi-a(-u) to see--mutual and mia(-u) to correspond.
Full verb homography is analogous to full verb homophony, except that the ambiguity exists in the
kanji-based representation for coinciding conjugational types. In this case, disambiguation is possible
through the kana phonetic version of the verb in question. An example of a full homograph occurs for
the verbs tome(-ru) to stop (transitive) and yame(-ru) to quit/put an end to, for which a common kanji
(止) corresponds to the to- and ya- prefixes, respectively.
Partial verb homophony, meanwhile, occurs for verbs which differ in conjugational type, but agree
in phonetic content of the verb stem. In this case, heteronymy of kana representation is produced for
only certain inflectional types. In the case of our example of a(-u), ar(-u) to have shares the verb
stem of a-, and a heteronym is produced in the simple past tense, in the form of atta. Here again,
however, kanji representation allows us to resolve the lexical ambiguity. Partial verb homography
closely resembles partial verb homophony, except that the lexical ambiguity is produced in the kanji
form, and resolvable through the use of kana. One example of partial homography is produced for the
simple past tense verbs i-tta to go--past and okona-tta to carry out/hold--past, in that a single
kanji (行) is used to represent both i- and okona-, respectively.
Note that in both of the classifications of partial heteronymic correspondence, the degree of coinci-
dence is usually highly restricted, unlike full verb heteronymy. For the i-tta/okona-tta ambiguity, for
example, partial heteronymy occurs only in the simple past tense or for progressive/perfective aspect.
6.2. RESOLVING VERB LEXICAL AMBIGUITY 53
\[ RP(a_f) = \frac{1 + \mathit{freq}(a_f)}{1 + \sum_{i \neq a} \mathit{freq}(i_f)} \qquad (6.2) \]
This is normalised over the representational preference for all source entries ai , to produce the nor-
malised representational preference N RP (af ).
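As a rough illustration of equation (6.2), the Python sketch below computes the representational preference from a table of counts. The normalisation step is an assumption (each RP value divided by the sum of RP over all candidate entries), since the exact form of NRP is not shown here; the function names and the counts are invented for the example.

```python
def rp(freq, entry):
    """RP for `entry`: (1 + own frequency) / (1 + summed frequency of rivals)."""
    rivals = sum(count for other, count in freq.items() if other != entry)
    return (1 + freq[entry]) / (1 + rivals)


def nrp(freq, entry):
    """Assumed normalisation: RP of `entry` over the summed RP of all entries."""
    total = sum(rp(freq, other) for other in freq)
    return rp(freq, entry) / total


# Invented counts: how often each candidate entry was seen in the form f.
freq_in_form = {"entry_a": 3, "entry_b": 0}

print(rp(freq_in_form, "entry_a"))   # 4.0
print(rp(freq_in_form, "entry_b"))   # 0.25
print(nrp(freq_in_form, "entry_a") > nrp(freq_in_form, "entry_b"))  # True
```

The add-one terms in the numerator and denominator keep the ratio defined when an entry, or all of its rivals, are unattested in the given form.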
\[ VS(a_f) = \frac{SW(a_f)}{(CIC(a_f) - \mathit{min\_infl} + 1)} \qquad (6.4) \]
Overall accuracy    Accuracy on case-role gapping clause instances
(4411)              (3650)
developing the system, and the subset of gapping relative clauses. The sizes of the two test sets are
indicated in brackets below each heading.
The baseline method for evaluation purposes simply selects the verb sense of highest probability
when multiple parses are produced, which equates to utilising the naive probability method in
computing the verb score, with the weighting parameter set to zero. The optimal achievable result for the system is determined
by testing for membership of the correct analysis in the full set of analysis types produced for all
successful parses. Given that verb scores simply rank these candidates, it is impossible for the other
methods to better this non-deterministic method.
Table 6.1 lists the comparative results for the various methods, including evaluation of varying
values of the weighting parameter for both the NPO and NRP methods. The 1.4 percentage point
difference between the overall accuracy for the baseline method and that for the NPO method with
various values of this parameter is a direct indication of the effects of weighting according to
inflectional complexity, although the ineffectiveness of an increased value is unexpected.
Likewise for the NRP method, whereas results are significantly higher than those for the baseline
method, altering the weighting parameter produced only minor improvement. Indeed, performance with the parameter set to zero (i.e.
without consideration of CIC) marginally outperformed NRP with the parameter set to one, although the statistical
significance of this difference is questionable. This would tend to suggest that there is some interference
in the choice of representational form of the verb stem given complex inflection, a fact which was borne
out on summary inspection of the data. That is, the kanji form of the verb stem is generally utilised
if auxiliary verbs are also given in a kanji representation, and full hiragana representation is generally
reserved for simple inflection uses, such that a hiragana occurrence of miau would tend to point to
the simple inflectional mia-u stem (see section 6.1).
Perhaps more noticeable, however, is that the NPO method slightly outperforms NRP, which leads
to the conclusion that representational preference in isolation is outweighed by the brute force of
likelihood of sense.
Based on these results, we adopt the NPO method for the remainder of this paper, with the weighting
parameter set to one.
separately in the thesaurus, rather than attempting to maintain a one-to-one isomorphism between
lexical form and thesaurus correspondences. This presents us with a dilemma, as we wish to not only
classify lexical arguments according to type (i.e. as animate, locative, etc.), but also to weight the
different senses so as to be able to choose between adjunct and complement senses, for example.
The method we use to weight nouns (W (N )) on class typicality, is simply to count the total number
of occurrences of that noun N in the thesaurus, and the number of occurrences which fall into the
particular sub-trees we have designated as classifying a particular type T , and calculate the ratio
thereof.
\[ W(N) = \frac{\mathit{freq}(N \subset T)}{\mathit{freq}(N)} \qquad (6.5) \]
For a noun N which never occurs in the extension of T , W (N ) thus becomes zero, whereas for an N
fully enclosed within T , W (N ) is one.
In terms of the application of this weighting scheme to class membership, we stipulate a threshold
for animacy, such that W (N ) must be greater than or equal to 0.5 for N to be judged as animate.
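The ratio and the animacy threshold can be sketched as follows. This is an illustrative Python fragment under simplifying assumptions: each thesaurus occurrence of a noun is represented by the label of the sub-tree it falls under, and the labels and data below are invented.

```python
def class_weight(noun_occurrences, type_subtrees):
    """W(N): share of a noun's thesaurus occurrences that fall under type T."""
    total = len(noun_occurrences)
    if total == 0:
        raise ValueError("noun does not occur in the thesaurus")
    in_type = sum(1 for node in noun_occurrences if node in type_subtrees)
    return in_type / total


def is_animate(weight, threshold=0.5):
    """Animacy judgement: W(N) must reach the stipulated threshold of 0.5."""
    return weight >= threshold


# Invented data: sub-tree labels for the four thesaurus occurrences of a noun.
occurrences = ["person", "organisation", "place", "place"]
animate_subtrees = {"person", "organisation"}

w = class_weight(occurrences, animate_subtrees)
print(w)              # 0.5
print(is_animate(w))  # True
```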
For the locative class, on the other hand, we skew the distribution of the produced W (N ) to
produce preference for highly prototypical locatives, over animacy and other judgements, but penalise
less clear-cut examples. The way we do this is to apply the function Loc(N ):
\[ Loc(N) = \frac{[W(N) + 1]^5}{21} \qquad (6.6) \]
What this crude and computationally expensive function does is to inflate values closer to one, to a
maximum of around 1.52, and penalise anything under a value of around 0.84 (actually, $\sqrt[5]{21} - 1$) by
way of a relatively steep parabolic curve (values near zero actually increase slightly).
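The figures quoted for equation (6.6) can be checked directly. The short Python sketch below evaluates the function at the end points and at the point where it returns exactly one; the names loc and break_even are chosen for the example only.

```python
def loc(w):
    """Loc(N) = (W(N) + 1)^5 / 21, as in equation (6.6)."""
    return (w + 1) ** 5 / 21


# Value of W(N) at which Loc(N) equals one: the fifth root of 21, minus one.
break_even = 21 ** (1 / 5) - 1

print(round(loc(1.0), 2))     # 1.52
print(round(break_even, 2))   # 0.84
print(round(loc(0.0), 3))     # 0.048
```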
Chapter 7
Despite the obvious attractions of the algorithm in the form presented up to here, and its ability to weight
interpretations, it still lacks in its ability to capture inter-clausal context, in what turns out to provide
a surprisingly rich source of restrictions on the interpretation type. Here, we discuss the processing of
cosubordinated clauses, coordinated clauses and coordinated heads.
58 CHAPTER 7. EXTENSIONS TO THE BASIC ALGORITHM
Two verb types which do not contribute to the clause sub-type, and are hence disregarded during
the resolution process, are the excluding and including types. Excluding and including clauses are
adverbial constructions, and hence exempted from consideration with hypothesis (4). Considering (6),
in which the first clause is of the excluding type, the main clause is essentially treated as a simplex
clause, and the Subject gapping sub-type can be recovered.
One fact which is clear from the original description of conjunction types is that peripheral sub-
ordinating usages exist for all conjunctions except the renyo form, suggesting difficulty in correctly
predicting the type of clause dependency in a given clause prior to being able to apply the restrictions
proposed in section 7.1.1. While this is certainly the case for te clauses, complement analysis-based
heuristics were found to be productive in correctly analysing nagara and tutu clauses. These heuristics
consist of analysing the complement content of the cosubordinated clause to determine if all non-
Subject complement case slots are instantiated. The exceptional treatment of the Subject case is
founded in the observation that these are small clauses (Radford 1981): the Subject of these
subordinate clause types is inherently coindexed to that of the superordinate clause, through a PRO
mechanism, and overt Subject mention within the nagara clause is not possible.
If full instantiation is detected, the unit clause in question is therefore discounted from the reso-
lution process, on the grounds of being adverbial. This process can be seen to correctly identify the
subordinated nagara clause in (7), with the Direct Object gap existing only in the main clause and
no Direct Object incompatibility restriction imposed by the nagara clause.
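The complement-based heuristic for nagara and tutu clauses can be sketched as below. The representation is an assumption made for illustration: a clause is reduced to the list of complement case slots in its verb's valency frame and the set of slots that are overtly filled; the function name is hypothetical.

```python
def is_adverbial(complement_slots, instantiated):
    """True if every non-Subject complement case slot of the clause is filled.

    The Subject slot is skipped, since it cannot be overtly realised in
    these clause types. A frame with no non-Subject complements counts as
    fully instantiated in this sketch (vacuous truth).
    """
    non_subject = [slot for slot in complement_slots if slot != "Subject"]
    return all(slot in instantiated for slot in non_subject)


frame = ["Subject", "Direct Object"]

# Direct Object overtly present: the clause is discounted as adverbial.
print(is_adverbial(frame, {"Direct Object"}))   # True
# Direct Object missing: the clause stays in the resolution process.
print(is_adverbial(frame, set()))               # False
```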
One additional qualification which must be made to (4) is that it does not seem to apply to the
bounded case-role for bounded relative clauses. To take an example, the Direct Object case-slot is
bound in the first clause of (8), but the final clause is a clear instance of Subject case-slot gapping. At
the present time, we have no explanation for this effect, and simply disregard bounded relative clauses
during cosubordination-based resolution. Note, however, that this does not threaten the applicability
of (4), as bounded relative clauses are included under the classification of case-role gapping relative
clauses.
Evaluation
\[ \text{F-measure} = \frac{(\beta^2 + 1) \times \text{Recall} \times \text{Precision}}{\beta^2 \times \text{Recall} + \text{Precision}} \]
Cases where a zero denominator has made any of these values incalculable are indicated in the results
as N.C. (Non-Calculable).
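A small Python sketch of these measures, including the non-calculable case, is given below. It is illustrative only: the argument names are hypothetical, None stands in for N.C., and the weighting factor defaults to one, which is an assumption rather than a value stated here.

```python
def precision_recall_f(correct, returned, expected, beta=1.0):
    """Return (precision, recall, F-measure); None marks a non-calculable value.

    correct  -- number of correct analyses of a given type
    returned -- number of analyses of that type produced by the system
    expected -- number of instances of that type in the annotated data
    """
    precision = correct / returned if returned else None
    recall = correct / expected if expected else None
    if precision is None or recall is None:
        return precision, recall, None
    denominator = beta ** 2 * recall + precision
    if denominator == 0:
        return precision, recall, None
    f_measure = (beta ** 2 + 1) * recall * precision / denominator
    return precision, recall, f_measure


print(precision_recall_f(8, 10, 16))
print(precision_recall_f(0, 0, 5))    # (None, 0.0, None)
```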
Evaluation of total performance for the given data set is calculated only in terms of precision
(accuracy), but this figure is identical to that for the system recall on the full input set.
62 CHAPTER 8. EVALUATION
of head type or verb class membership, we are able to attain an overall accuracy of 65.09%. This
forms the true baseline performance for our system (B1 ).
An alternative baseline performance figure can be obtained from the implementation of the algorithm
proposed in (Baldwin et al. 1997a), which constitutes a much simplified version of our final system, but
relies on the same basic concepts and methods for non-gapping expressions, Temporal case gapping,
time relative constructions, and case slot instantiation. First, non-gapping expression-headed
relative clauses are filtered off as generating head restrictive relative clauses. Next, the system accesses
a transitivity judgement for the main verb of the input relative clause, based on which the system
attempts to map the head onto the Direct Object case slot (assumed accusative case marking) for
transitive verbs, and the Subject case slot (assumed nominative case marking) failing this. As a
default, all relative clauses are assumed to be head restrictive.
This algorithm (B2 ) produces significant improvement over B1 above, with an overall accuracy of
75.1%, and is detailed along with B1 in Table 8.1.
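The core decision sequence of B2 can be sketched in Python as below. This is a simplified reading, not the published algorithm: the Temporal and time relative steps are omitted, a case slot is assumed to be available for the head whenever its case marker is absent from the clause body, and all names and data are placeholders.

```python
def b2_baseline(head, verb, marked_cases, non_gapping_heads, transitive_verbs):
    """Return an analysis label for one relative clause.

    marked_cases -- case markers overtly present in the clause body
    """
    # Step 1: non-gapping expression heads yield head restrictive clauses.
    if head in non_gapping_heads:
        return "head restrictive"
    # Step 2: transitive verb with a free accusative slot -> Direct Object gap.
    if verb in transitive_verbs and "acc" not in marked_cases:
        return "Direct Object"
    # Step 3: failing that, try the Subject slot (nominative marking assumed).
    if "nom" not in marked_cases:
        return "Subject"
    # Default: head restrictive.
    return "head restrictive"


# Placeholder resources standing in for the system dictionaries.
non_gapping = {"NG_HEAD"}
transitive = {"TRANS_VERB"}

print(b2_baseline("HEAD", "TRANS_VERB", {"nom"}, non_gapping, transitive))
# Direct Object
print(b2_baseline("HEAD", "TRANS_VERB", {"acc"}, non_gapping, transitive))
# Subject
print(b2_baseline("HEAD", "TRANS_VERB", {"nom", "acc"}, non_gapping, transitive))
# head restrictive
```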
The overall performance of the system on the corpus of 4615 annotated relative clauses is detailed
in Table 8.1. The overall system accuracy calculates to around 89%, as compared to 65% for the true
baseline Subject case-slot analysis method (B1 ) and 75% for the naive transitivity algorithm (B2 ).
8.2. OVERALL EVALUATION 63
Within the figure of 89%, the contribution from the case-role gapping and head restrictive relative
clause groups is approximately equivalent.
The first major result is the disparity between the recall for these two relative clause types, with
case-role gapping clauses far outperforming head restrictive clauses at 95%, as compared to 55%. This
points to an over-bias towards case-role gapping clause analysis, and to overgeneration
occurring for this type, an unsurprising result given the core focus of this research on case-role
gapping clauses. Within the case-role gapping relative clause type, however, the figures for bounded
relative clauses are slightly disappointing, and again there are signs of overgeneration occurring.
An interesting correspondence is seen between case-slot accessibility/immediacy and accuracy for the
complement case-role set, with performance gradually degrading from Subject to Direct Object, Co-actor
and Indirect Object. It must be said, however, that this trend is probably not entirely coincidental,
due to the focus placed on the more accessible case-roles during the verb class production phase.
The worst figures are seen for the Local case-role set, a sign of the frequent ambiguity between
the locality and animacy/autonomy senses, as occurs for country name references. Additionally, the
context-independence of locative detection leads to the system occasionally missing the local sense
altogether. Having said this, it is reassuring to note that the lowest F-measure values are at least
comparable to the accuracy for the true baseline of B1 , with the Allative and Perlative gapping
analyses roughly equivalent in degree to B2 .
The treatment of the time case-slots and time relatives was, if anything, better than expected,
and the main source of noise between the Temporal and Durational case-slots was mistaken mapping
between the two. That is, the system is generally able to ascertain the time-relatedness of time
case-gapping, but has slight difficulties in differentiating between the two sub-types.
Similarly, the system performed remarkably well for the well-defined head restrictive sub-classificat-
ions, with only the Inclusive sense falling below a 95% F-measure value. One conclusion which could
be drawn from this is that the successful handling of other well-defined head restricting phenomena
could well be the most efficient method of further improving system performance, and eliminating
overgeneration of case-slot gapping interpretations.
Finally, full clause-based idiom detection was predictably excellent, as was the identification of
subordinate gapping instances.
14 of the 16 displacement instances, and that the difference between the overall performance for the
two methods was well beyond the scope of this localised phenomenon.
in that the simplex algorithm is incapable of correctly analysing the 15 gapping subordinate-type
clauses. Having said this, the degree to which the subordinate gapping rule set outperformed the
simplex algorithm goes beyond the scope of these 15 examples, particularly as a result of gap incom-
patibility judgements. Perhaps more important, however, is that the subordinate gapping rule set
returned higher figures than the overall averages calculated during overall evaluation (see Table 8.1).
of the overgeneration in the Subject case slot may have been detracting from the results for Co-actors
(there were no Co-patients in the test set) is proven correct, as the number of correctly analysed
Subject case-roles remains unchanged, but an additional 8 Co-actor analyses are reproduced.
On mimicking the above test for reciprocity on generic relational verbs, some gain in performance
resulted, but not to the same extent as seen here for inter-personal relational verbs. The reciprocity
data for generic verbs is omitted from this thesis for reasons of space.
The remainder of this chapter is devoted to results for the individual verb classes developed in this
research. As general trends, the movement and locational verbs produced lower overall accuracies
than most other classes, dipping below the mean accuracy seen in the overall analysis, in the case
of distal movement verbs. The verb class which stands out as requiring further attention is that of
tool-based actions, although it is important to realise that the deflated results are no fault of the verb
class characterisation, as there was only one Implement occurrence in the entire corpus.
Conclusions
In this thesis, we have proposed an account of relative clause-hood, focusing on the precepts of case-
role construal and boundedness of case-roles within the relative clause body. Analysis of case-role
gapping was then described along the lines of three main paradigms:
First, we introduced an argument status hierarchy, defined according to affinity with the
predicate, which assigns behavioural properties to the different argument types according to
rank. This argument type hierarchy was applied to define a case-role schema, and predict the
syntactic nature of each case-role type based on the inherent features of each argument type.
Additionally, the basic ranking of argument types was used to apply preferences to argument
types to accommodate the case-role gap, for which purpose the verb class hierarchy was linked
in closely with argument status, and the valency frame dictionary referenced by the resolution
system tagged accordingly.
Next, we defined a 17-way case-role schema with which to analyse case-role gapping in the
relative clause context. All effort was made to provide tests for the less intuitive case-role
types, and document their syntactic and semantic behavioural patterns, so as to make them
reproducible in an alternate system/theoretical context. Case-roles were allocated a unique
argument status, which was used to predict basic accessibility to case-role gapping for the various
verb classes proposed. One factor which set the core component of the case-role schema apart
from traditional Case accounts was its grammatical dependence, with the central case-roles as
the Subject, Direct Object and Indirect Object, and case-role transformation occurring readily
under modal transformation.
Last, we proposed a verb class hierarchy with which to map links between case-roles intrinsic
to that verb class, and predict adjunct (in)compatibility. As an account of the basic semantics
of each class, a simple conditional-based rule set was described for each verb class, which can be
applied in case-role gapping resolution as a realisation of the interdependence between case-roles
and the prototypical verb sense.
The individual verb class rule-sets were complemented with a set of adjunct-based filters, which
interface with the verb class rule-sets through sortal preferences and relative weighting. Basic weighting
mechanisms were then described, which are combined additively in the context of multiple analyses
from the different rule-sets. Discussion was next made of clausal cosubordination, and clausal and
noun head coordination, and the roles they can play in restricting the scope of interpretation and
offsetting local sortal preferences.
Finally, a full account was given of the system resources in carrying out the resolution task, including
the means used to extract and structure the system dictionary.
Under evaluation, the proposed system produced a mean accuracy of around 89% on the test set
of 4615 relative clause instances. Separate experimentation was also documented to indicate the
effectiveness of fixed expressions in the system dictionary, and applicability of inter-clausal relations
to narrow the scope of case-slot interpretations and identify alternate host clauses for the case-slot
gapping process.
The valency dictionary utilised by the system was extracted from the NTT valency dictionary (Ikehara
et al. 1997) and subsequently modified/expanded. First, description will be given of the original
structure of the NTT valency dictionary, followed by discussion of the extraction method and data
incorporated/excluded from the system valency dictionary.
One inevitable failing of any attempt to attribute a unique valency frame to an essentially lexical
verb representation, is the inability to combine verb senses involving distinct case marking/valency.
Assuming identical complement valency, however, it is often possible to merge such multiple va-
lency frames simply by combining corresponding case slots. Naturally, this also assumes case slot
correspondence, a sometimes unrealistic expectation.
One method of overcoming valency variation in cases where subsumption of valency frame content
occurs, is to mark coordinated/optional case slots within the valency frame. This approach is generally
applicable to the Indirect Object, Co-actor and Co-patient case slots for relational verbs.
An alternative method is to take the intersection of conflicting candidate valency frames, and gen-
erate expanded derivative valency frames, as required through the use of verb classes. This is the
method applied for conflated ergative and partitive verbs.
A more serious dilemma results from overlap in case marking between distinct case slots, such that
the basic marking produces a one-to-many mapping onto the valency frame. In these cases, ambiguous
case marking is retained in the valency frame and various heuristics are applied to resolve case slot
ambiguities at run time.
A.1. THE SYSTEM VALENCY DICTIONARY 77
[Figure: system diagram; recoverable labels: "Relative clause analyser", "Case-role gap / restrictive clause type ..."]
Additionally, in merging case slots and case marking, we potentially create ungrammatical case
marker combinations. Consider the case of stative verbs (i.e. deki(-ru) can do/be done, ar(-u) to
be/have, et al.), which exhibit the case marking characteristics described in the following connection
matrix:
                 DObject
                 nom    acc
         top     yes    yes
Subject  nom     yes    yes
         dat     yes    no
That is, the Subject/Direct Object case marking patterns of top/nom, top/acc, nom/nom, and
so on are acceptable, but dat/acc produces ungrammaticality.
In collapsing these combinations into a single valency frame representation, we produce the following,
inferring the existence of this ungrammatical dat/acc marking type:
(1) Subject-{top/nom/dat} DObject-{nom/acc} deki-ru
While recognising this overmodelling characteristic of the valency frame representation used, I sug-
gest that this does not pose problems for pure analysis purposes.
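The overmodelling effect can be made concrete by comparing the marker pairs attested in the connection matrix with the pairs licensed once the markers are collapsed slot by slot. The Python sketch below is illustrative only; the variable names are invented.

```python
from itertools import product

# Subject/Direct Object marker pairs accepted by the connection matrix.
GRAMMATICAL = {
    ("top", "nom"), ("top", "acc"),
    ("nom", "nom"), ("nom", "acc"),
    ("dat", "nom"),
}

# Collapsing into one valency frame keeps only the marker set of each slot.
subject_markers = {subject for subject, _ in GRAMMATICAL}
dobject_markers = {dobject for _, dobject in GRAMMATICAL}

# The collapsed frame licenses every combination of the two marker sets.
licensed = set(product(subject_markers, dobject_markers))

print(sorted(licensed - GRAMMATICAL))   # [('dat', 'acc')]
```

The single surplus pair is the ungrammatical dat/acc pattern, which only matters if the frame is used to generate rather than to analyse.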
Fixed expressions
Fixed expressions are defined as entries which contain at least one integral complement in their valency
frame; any such complement case slots are provided with a set of lexical fillers. To be triggered, all
lexically instantiated case slots must match with the system input. As a means of ensuring no overlap
between fixed expressions, the system valency dictionary has been designed such that lexical fillers
are mutually exclusive in surface content for a given verbal stem and case slot. That is, for a given
verbal stem, no two fixed expressions share any subset of the fixed case element content. However,
this guarantee of mutual exclusivity is not sufficient in itself to guarantee at most one fixed expression
for an arbitrary system input, as case element correspondence can potentially occur between distinct
case slots in the input for separate fixed expression entries.
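The triggering condition can be sketched as follows. The data layout is an assumption made for illustration: each fixed expression maps its lexically instantiated case slots to a set of permitted fillers, and the input clause maps case slots to the lexical items found there. Entry names and fillers are placeholders, not dictionary content.

```python
def matching_fixed_expressions(entries, clause_args):
    """Return the names of fixed expressions triggered by the input clause.

    entries     -- name -> {case slot: set of permitted lexical fillers}
    clause_args -- case slot -> lexical item found in the input
    """
    return [name for name, fixed in entries.items()
            if all(clause_args.get(slot) in fillers
                   for slot, fillers in fixed.items())]


# Placeholder entries for a single verbal stem.
entries = {
    "fixed_1": {"Direct Object": {"filler_x"}},
    "fixed_2": {"Direct Object": {"filler_y"}, "Indirect Object": {"filler_z"}},
}

print(matching_fixed_expressions(entries, {"Direct Object": "filler_x"}))
# ['fixed_1']
print(matching_fixed_expressions(
    entries, {"Direct Object": "filler_y", "Indirect Object": "filler_z"}))
# ['fixed_2']
print(matching_fixed_expressions(entries, {"Direct Object": "filler_y"}))
# []
```

The function returns a list rather than a single entry because, as noted above, mutual exclusivity of fillers does not by itself rule out more than one match.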
One issue to arise from the inclusion of fixed expressions in the case dictionary is whether instantiated
case fillers can be gapped to become the noun head (displaceability). Here, there is a distinct division
between idiomatic-type fixed expressions and case element-defined fixed expressions. In general,
the semantics of idiomatic-type fixed expressions are not intuitively accessible from the independent
meanings of the verbal stem and instantiated case fillers, and displacement of any one of the case
fillers removes the idiomatic sense. For case element-defined fixed expressions, on the other hand,
the fixed case element(s) tend to be intrinsically limited in default sense, and simply restrict rather
than modify the semantic content of the root verb. Indeed, some of the case element-defined fixed
expressions extracted from the NTT dictionary were questionable as to their true idiomatic status,
but for reasons of economy and consistency, no attempt was made to filter off such usages.
In terms of argument status, most integral complements produce idiomatic-type fixed expressions.
The NTT valency dictionary does not contain information on the displaceability of fixed case el-
ements from within relative clauses, and as such, all fixed expressions were manually analysed, and
displaceable fixed case elements marked explicitly as being such. The method used to judge the
displaceability of each fixed case slot was a threefold one.
Firstly, the fixed case element in question was moved from within the valency frame to assume the
role of the head of the derived relative clause. Any non-fixed case slots were then instantiated with
appropriately non-specific arguments before comparing the semantics of the resulting relative clause
with the original semantics of the source sense to check for coincidence of sense.
Next, the relative clause complex produced above was considered with all non-fixed case slots el-
lipted. To fulfil the displaceability requirement, the gapped case slot had to be uniquely identifiable
as having been gapped from its original case slot, within the scope of the original sense of that entry.
Finally, the freedom of case slot order was tested within a matrix clause context (scrambling), with
appropriate non-specific arguments used as fillers for the non-fixed case slots. Despite Japanese being
well known as a free word order language, different permutations of case slots can produce variable
acceptability. Thus, freedom of case slot order was measured by the existence of at least one case
frame permutation which involved the fixed case slot in question, and not by requiring that all such
permutations lead to an acceptable case slot order. Idiomatic fixed expressions are characterised by
fixed case slots generally being final in the case slot ordering, and adverbs or adverbial arguments not
being insertable between the fixed case slot and predicate. In the case that either of these conditions
can be violated for a given fixed case slot, that case slot is generally displaceable in the relative clause
context.
Case slots which fulfilled all of the above requirements were judged to be fully displaceable, and
marked as such.
valency frame.
their hiragana equivalents in rematching the string. The reason that katakana-hiragana conversion is
carried out only after failure to detect a match, is that there are significant numbers of verbs for which
a substring of the stem must be in katakana. This occurs particularly for words which originated
from non-Japanese sources (commonly European languages), such as dabur-u to double up/overlap.
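The match-first, convert-on-failure order can be sketched in Python as below. The dictionary content is a toy stand-in, and the conversion relies on the fact that the katakana block sits at a fixed offset of 0x60 above the corresponding hiragana in Unicode; the function names are hypothetical.

```python
def katakana_to_hiragana(text):
    """Map katakana characters (U+30A1 to U+30F6) onto their hiragana equivalents."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def lookup(surface, dictionary):
    """Try the string as given; convert katakana only if that fails."""
    if surface in dictionary:
        return dictionary[surface]
    return dictionary.get(katakana_to_hiragana(surface))


# Toy dictionary: one stem that must stay in katakana, one hiragana entry.
toy_dictionary = {"ダブる": "dabur-u", "たべる": "tabe-ru"}

print(lookup("ダブる", toy_dictionary))   # dabur-u (matched before conversion)
print(lookup("タベる", toy_dictionary))   # tabe-ru (matched after conversion)
print(katakana_to_hiragana("ダブる"))     # だぶる
```

Converting up front would turn the first input into a string that no longer matches its katakana-stem entry, which is why conversion is deferred.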
Inflectional types
Inflectional types are given as an index to the inflectional class that verbal stem belongs to, through
which the full inflectional paradigm for that verb is defined.
Head compatibility
Each verb sense is provided with a head type. The head type is used in the detection of full relative
clause-based idioms (see above), by way of describing the lexical head for each full idiom entry. For
other dictionary entry types, this head type is necessarily given as an asterisk (don't care).
(non-fixed expression) verb senses of that same verbal stem. Comparison was based on (a) argument
status, (b) deep case correspondence, and (c) case marker correspondence for matching case slots.
Case slots were first evaluated in terms of their adjunct/complement status. This was achieved
through direct application of the deep case analysis given within each valency frame. Based on this
disambiguated argument status, any adjunct case slots were tentatively removed from the valency
frame.
Deep case comparison was facilitated by superimposing complement valency frame skeletons
onto the default valency frame skeleton. In the case of a full match, corresponding case markers
were simply merged, whereas instances of the current valency frame constituting a part of the default
valency frame were treated by similarly merging the case markers for the set of matched case slots.
Any instances of the current valency frame being a superset of the default valency frame were marked
for manual analysis. Likewise, occurrences of partial overlap/full disjunction were marked for later
analysis.
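The comparison and merging steps can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the original implementation: valency frames are modelled as mappings from deep case labels to sets of surface case markers, and the label inventory, the `ADJUNCT_CASES` set and the function names are hypothetical.

```python
# Hypothetical set of deep cases treated as adjuncts.
ADJUNCT_CASES = {"time", "manner"}

def complements(frame):
    """Copy the frame, tentatively dropping adjunct case slots."""
    return {
        case: set(markers)
        for case, markers in frame.items()
        if case not in ADJUNCT_CASES
    }

def merge_into_default(default, current):
    """Superimpose the current frame skeleton onto the default one."""
    default, current = complements(default), complements(current)
    default_cases, current_cases = set(default), set(current)
    if current_cases <= default_cases:
        # Full match, or current frame is a part of the default frame:
        # merge the case markers of the matched case slots.
        for case in current_cases:
            default[case] |= current[case]
        return "merged", default
    if current_cases > default_cases:
        return "manual: superset", default
    return "manual: partial overlap or disjoint", default

DEFAULT = {"agent": {"ga"}, "object": {"o"}, "time": {"ni"}}
print(merge_into_default(DEFAULT, {"agent": {"ga"}, "object": {"ni"}}))
```

Only the subset and full-match cases alter the default frame; every other relation is merely flagged, reflecting the manual analysis described above.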
The verb class content of the default valency frame was determined through the union of the verb
class content of other generalised (non-fixed expression) verb senses of that same verbal stem.
Following extraction of the default valency frame for each verbal stem, fixed expressions were
extracted individually, and simply converted into the desired format for inclusion in the system valency
dictionary. Within the fixed expressions contained in the NTT dictionary, there is a small proportion
of lexical overlap resulting from multiple senses being attributed to a single valency frame. In general,
the valency frame content of these overlapping entries coincides fully, and subsequent occurrences of
a given valency frame can simply be ignored. For the limited number of overlapping fixed-expression
valency frames for which this was not the case, the same merging process as applied for generalised
entries was employed.
All full clause-based idioms included in the system valency dictionary were manually added, as
clause-based expressions are not included in the NTT valency dictionary.
Essentially, this equates to labelling each case slot with a unique ID, and identifying the case slot
trace through the use of this ID. However, by way of establishing semantic equivalence classes of deep
case IDs, evaluation can be made of parallelism between the analyses of different verb stems and within
verb classes.
While there are strong correspondences between typical surface case markers and deep case IDs,
significant deviation exists.
In particular, explicit deep case identification is required to distinguish between distinct case slots
marked with an identical surface case marker, and also between valency frame-defined case slots and
verb class/algorithm-defined case slots.
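The role of explicit deep case IDs can be illustrated with a small data structure. This is a sketch only: the `CaseSlot` class, the ID labels, the equivalence class names and the sample frame are hypothetical and do not reproduce the actual dictionary format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CaseSlot:
    slot_id: str  # unique deep case ID (hypothetical label)
    marker: str   # surface case marker
    source: str   # "frame" (valency frame) or "class" (verb class/algorithm)

# Hypothetical semantic equivalence classes over deep case IDs.
EQUIVALENCE_CLASS = {"agent": "ACTOR", "goal": "LOCATIVE", "time": "TEMPORAL"}

def slots_with_marker(slots, marker):
    """All case slots sharing a surface case marker (possibly several)."""
    return [slot for slot in slots if slot.marker == marker]

def resolve_trace(slots, trace_id):
    """Identify the case slot trace by its unique deep case ID."""
    matches = [slot for slot in slots if slot.slot_id == trace_id]
    if len(matches) != 1:
        raise ValueError(f"deep case ID {trace_id!r} is not unique in frame")
    return matches[0]

FRAME = [
    CaseSlot("agent", "ga", "frame"),
    CaseSlot("goal", "ni", "frame"),
    CaseSlot("time", "ni", "class"),
]
print(len(slots_with_marker(FRAME, "ni")), resolve_trace(FRAME, "time").source)
```

The surface marker alone matches two case slots in this frame, whereas the deep case ID selects exactly one and also records whether it was defined by the valency frame or by the verb class.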
to maximise efficiency and dynamism. The principal use made of the semantic dictionary is in
classifying noun heads as being animate (person/organisation), locative or abstract. This information
is used to override the inherent preferential ordering of the valency frame, according to the verb class
of the stem verb.
Bibliography
Baldwin, T. 1998. Relative clause coordination and subordination in Japanese. In Proc. of the
Australian Natural Language Processing Postgraduate Workshop, 1–10.
Baldwin, T., H. Tanaka, and T. Tokunaga. 1997a. Analysis of head gapping in Japanese relative clauses.
In Information Processing Society of Japan SIG Notes, volume 97, no. 4, 1–8.
Baldwin, T., T. Tokunaga, and H. Tanaka. 1997b. Semantic verb classes in the analysis of head gapping in
Japanese relative clauses. In Proc. of the 4th Natural Language Processing Pacific Rim Symposium
1997 (NLPRS97), 409–14.
Bond, F., and S. Shirai, 1997. Practical and Efficient Organization of a Large Valency Dictionary.
Handout at the Workshop on Multilingual Information Processing, held in conjunction with
NLPRS97.
EDR, 1995. EDR Electronic Dictionary Technical Guide. Japan Electronic Dictionary Research
Institute, Ltd. (In Japanese).
Fillmore, C.J. 1968. The case for case. In Universals in linguistic theory, ed. by E. Bach and R.T.
Harms, 1–88. New York: Holt, Rinehart and Winston.
Foley, W., and R. Van Valin. 1984. Functional Syntax and Universal Grammar . CUP.
Good, I.J. 1965. The Estimation of Probabilities. MIT Press, Cambridge MA.
Halliday, M. A. K. 1970. Language structure and language function. In New Horizons in Linguistics,
ed. by J. Lyons, 140–65. Harmondsworth, Middlesex: Penguin.
Heinz, H. 1978. On the possibility of distinguishing between complements and adjuncts. In Valence,
Semantic Case, and Grammatical Relations, ed. by W. Abraham, 22–45. Amsterdam: John
Benjamins.
Helbig, G., and W. Schenkel. 1973. Wörterbuch zur Valenz und Distribution deutscher Verben.
Leipzig: VEB Verlag Enzyklopädie.
Hoshi, H., 1994. Passive, Causative, and Light Verbs: A Study of Theta Role Assignment. University
of Connecticut dissertation.
Ikehara, S., M. Miyazaki, and A. Yokoo. 1993. Classification of language knowledge for meaning
analysis in machine translation. Transactions of the Information Processing Society of Japan
34.1692–1704. (In Japanese).
IPA, 1987. IPA Lexicon of the Japanese Language for Computers. (In Japanese).
Jacobsen, W.M. 1992. The Transitive Structure of Events in Japanese. Kurosio Publishers.
Kameyama, M. 1995. The syntax and semantics of the Japanese Language Engine. In Japanese
Sentence Processing, ed. by R. Mazuka and N. Nagai, 153–76. Lawrence Erlbaum Associates.
Kanzaki, K. 1997. Lexical semantic relations between adnominal constituents and their head nouns.
Mathematical Linguistics 21.53–68. (In Japanese).
Kanzaki, K., and H. Isahara. 1997. Lexical semantics for adnominal constituents in Japanese. In Proc. of
the 4th Natural Language Processing Pacific Rim Symposium 1997 (NLPRS97), 573–6.
Keenan, E. L., and B. Comrie. 1977. Noun phrase accessibility and universal grammar. Linguistic
Inquiry 8.63–99.
Kuno, S. 1973b. The Structure of the Japanese Language. MIT Press, Cambridge MA.
Kuno, S. 1976. Subject, theme, and the speaker's empathy [A reexamination of relativization phenomena].
In Subject and Topic, ed. by C. Li, 419–44. New York: Academic Press.
Kuno, S. 1980. Bun no kozo. In Nichi-ei Hikaku Koza II: Bunpo, ed. by T. Kunihiro, 23–62. Tokyo:
Taishukan. (In Japanese).
Levin, B. 1993. English Verb Classes and Alternations. University of Chicago Press.
Matsumoto, Yoshiko. 1990. Role of pragmatics in Japanese relative clauses. Lingua 82.111–29.
Matsumoto, Yoshiko. 1996. Interaction of factors in construal: Japanese relative clauses. In (Shibatani
and Thompson 1996).
Miyagawa, S. 1988. Predication and numeral quantifier. In Papers from the Second International
Workshop on Japanese Syntax, ed. by W. Poser, 157–92. Stanford: CSLI.
Miyagawa, S. 1989a. Light verbs and the ergative hypothesis. Linguistic Inquiry 20.659–68.
Miyagawa, S. 1989b. Structure and Case Marking in Japanese. San Diego: Academic Press.
Muraki, M., 1970. Presupposition, Pseudo-clefting, and Thematization. University of Texas, Austin
dissertation.
Nakaiwa, H., and S. Ikehara. 1994. Japanese zero pronoun resolution in a machine translation
system using verbal semantic attributes. In Proc. of SPICIS 94, B3416.
Nakaiwa, H., and S. Ikehara. 1996. Anaphora resolution of Japanese zero pronouns with deictic
reference. In Proc. of the 16th International Conference on Computational Linguistics (COLING 96), 812–7.
Nakaiwa, H., A. Yokoo, and S. Ikehara. 1994. A system of verbal semantic attributes focused on the
syntactic correspondence between Japanese and English. In Proc. of the 15th International Conference
on Computational Linguistics (COLING 94), 672–8.
Neumann, C., 1994. La construction relative en franconien et en français et les deux fonctions de la
relativation. Master's thesis, Université Michel de Montaigne Bordeaux III.
Nishida, F., S. Takamatsu, and H. Kuroki. 1980. A proper treatment of syntax and semantics in
machine translation. In Proc. of the 8th International Conference on Computational Linguistics
(COLING 80), 447–54.
Nomura, N., and K. Muraki. 1996. An empirical architecture for verb categorization frame. In
Proc. of the 16th International Conference on Computational Linguistics (COLING 96), 640–5.
Ohno, S., and T. Shibata. 1977. Onin. Iwanami Shoten. (In Japanese).
Okumura, M., and K. Tamura. 1996. Zero pronoun resolution in Japanese discourse based on
centering theory. In Proc. of the 16th International Conference on Computational Linguistics
(COLING 96), 871–6.
Okutsu, K. 1974. Seisei Nihongo Bunpo Ron: Meisi-ku no kozo. Tokyo: Taishukan.
Platt, J. T. 1971. Grammatical form and grammatical meaning: a tagmemic view of Fillmore's
Deep Structure Case concepts. Amsterdam: North-Holland.
Saito, M., 1985. Some Asymmetries in Japanese and Their Theoretical Consequences. MIT dissertation.
Sato, R., 1989. Nihongo no Rentai-shushoku-setsu no Imi-kaiseki ni-kansuru Kenkyu. Master's thesis,
Tokyo Institute of Technology. (In Japanese).
Shibatani, M., and S.A. Thompson (eds.) 1996. Grammatical Constructions: Their Form and Meaning.
Clarendon Press: Oxford.
Shirai, S., S. Yokoo, H. Inoue, H. Nakaiwa, S. Ikehara, and A. Yagi. 1997. Nichi-ei
kikai-honyaku ni-okeru imi-kaiseki no tame no kobun jisho [A structural dictionary for semantic
analysis in Japanese-English machine translation]. In Proc. of the Third Annual Meeting of the Japanese
Association for Natural Language Processing, 153–6. (In Japanese).
Talmy, L. 1996. The windowing of attention in language. In (Shibatani and Thompson 1996),
chapter 10, 235–87.
Teramura, H. 1970. The syntax of noun modification in Japanese. The Journal of the Association
of Teachers of Japanese VI.2355.
Teramura, H. 1978, 1981. Nihongo no bunpo 1 & 2: Nihongo kyoiku shido sankosho 4 & 5. Tokyo: Research
Institute of the National Language. (In Japanese).
Teramura, H. 1980. Meishi shushoku-bu no hikaku. In Nichieigo Hikaku Koza: Bunpo 2, ed. by T. Kunihiro,
221–60. Tokyo: Taishukan. (In Japanese).
Van Valin, R. 1984. A typology of syntactic relations in clause linkage. In Proc. of the Tenth Annual
Meeting of the Berkeley Linguistics Society, 542–58.
Vance, T.J. 1987. An Introduction to Japanese Phonology. New York: SUNY Press.