ANAPHORA AND TYPE LOGICAL GRAMMAR
TRENDS IN LOGIC Studia Logica Library
VOLUME 24 Managing Editor Ryszard Wójcicki, Institute of Philosophy and Sociology, Polish Academy of Sciences, Warsaw, Poland Editors Vincent F. Hendricks, Department of Philosophy and Science Studies, Roskilde University, Denmark Daniele Mundici, Department of Mathematics “Ulisse Dini”, University of Florence, Italy Ewa Or á owska, National Institute of Telecommunications, Warsaw, Poland Krister Segerberg, Department of Philosophy, Uppsala University, Sweden Heinrich Wansing, Institute of Philosophy, Dresden University of Technology, Germany
SCOPE OF THE SERIES
Trends in Logic is a book series covering essentially the same area as the journal Studia Logica – that is, contemporary formal logic and its applications and relations to other disciplines. These include artificial intelligence, informatics, cognitive science, philosophy of science, and the philosophy of language. However, this list is not exhaustive; moreover, the range of applications, comparisons and sources of inspiration is open and evolves over time.
Volume Editor Heinrich Wansing
The titles published in this series are listed at the end of this volume.
ANAPHORA AND TYPE LOGICAL GRAMMAR by
GERHARD JÄGER University of Bielefeld, Germany
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN-10: 1-4020-3904-2 (HB)
ISBN-13: 978-1-4020-3904-1 (HB)
ISBN-10: 1-4020-3905-0 (e-book)
ISBN-13: 978-1-4020-3905-8 (e-book)
Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands. www.springeronline.com
Printed on acid-free paper
All Rights Reserved © 2005 Springer No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Printed in the Netherlands.
Contents

List of Tables
Preface
Acknowledgments

1. TYPE LOGICAL GRAMMAR: THE FRAMEWORK
   1 Basic Categorial Grammar
   2 Combinators and Type Logical Grammar
   3 Historical and Bibliographical Remarks

2. THE PROBLEM OF ANAPHORA
   1 Anaphora and Semantic Resource Sensitivity
   2 Variables in TLG
   3 Previous Categorial Approaches to Anaphora
   4 Summary

3. LAMBEK CALCULUS WITH LIMITED CONTRACTION
   1 The Agenda
   2 Contraction?
   3 The Logic LLC
   4 Relation to Jacobson's System

4. PRONOUNS AND QUANTIFICATION
   1 Basic Cases
   2 Binding by wh-operators
   3 Binding by Quantifiers
   4 Weak Crossover
   5 Precedence Versus c-command
   6 Backward Binding and Reconstruction

5. VERB PHRASE ELLIPSIS
   1 Introduction
   2 VPE: The Basic Idea
   3 Interaction with Pronominal Anaphora
   4 Interaction of VPE and Quantification
   5 VPE and Polymorphism
   6 Parallelism Versus Source Ambiguity

6. INDEFINITES
   1 Introduction
   2 Dekker's Predicate Logic with Anaphora
   3 Bringing PLA into TLG
   4 Donkey Sentences
   5 Indefinites and Scope
   6 Sluicing
   7 Summary and Desiderata

References
Index

List of Tables
1.1 The structural hierarchy
1.2 Substructural Curry-Howard correspondences
2.1 Categorial approaches to anaphora
Preface
This book discusses how Type Logical Grammar can be modified in such a way that a systematic treatment of anaphora phenomena becomes possible without giving up the general architecture of this framework. By Type Logical Grammar, I mean the version of Categorial Grammar that arose out of the work of Lambek, 1958 and Lambek, 1961. There Categorial types are analyzed as formulae of a logical calculus. In particular, the Categorial slashes are interpreted as forms of constructive implication in the sense of Intuitionistic Logic. Such a theory of grammar is per se attractive for a formal linguist who is interested in the interplay between formal logic and the structure of language. What makes Lambek style Categorial Grammar even more exciting is the fact that (as van Benthem, 1983 points out) the Curry-Howard correspondence—a central part of mathematical proof theory which establishes a deep connection between constructive logics and the λ-calculus—supplies the type logical syntax with an extremely elegant and independently motivated interface to model-theoretic semantics. Prima facie, anaphora does not fit very well into the Categorial picture of the syntax-semantics interface. The Curry-Howard based composition of meaning operates in a local way, and meaning assembly is linear, i.e., every piece of lexical meaning is used exactly once. Anaphora, on the other hand, is in principle unbounded, and it involves by definition the multiple use of certain semantic resources. The latter problem has been tackled by several Categorial grammarians by assuming sufficiently complex lexical meanings for anaphoric expressions, but the locality problem is not easy to solve in a purely lexical way. The main purpose of this book is to develop an extension of Lambek style Type Logical Grammar that overcomes these difficulties and handles anaphora in a systematic fashion. The linguistic applications of the theoretical framework that is developed here focus on three classes of
anaphora that are well-studied and well-understood as far as the empirical generalizations go. First and foremost, I will discuss the grammar of anaphoric third person singular pronouns, as illustrated by the following example.

(1) a. John_i invented a problem that he_i could not solve.
    b. [Every student]_i invented a problem that he_i could not solve.

The second empirical domain that we are going to look at is verb phrase ellipsis, i.e., constructions like (2).

(2) a. John revised his paper, and Bill did too.
    b. John is happy with his job, but Bill isn't.

As is well-known, VP ellipsis interacts with pronominal anaphora and quantification in complex ways. The logic of anaphora resolution that I will propose lends itself readily to a simple theory of this kind of ellipsis which covers the basic facts in an empirically adequate way. Finally, I will discuss a third class of anaphora, a version of ellipsis that has been called "sluicing" in the literature. By this I mean constructions in which a bare wh-phrase is interpreted as a (direct or indirect) question, as illustrated in (3).

(3) a. She's reading something, but I can't imagine what.
    b. A: She's reading something. B: What?
The main goal of this work is not so much to develop a novel descriptive theory of anaphora but rather to demonstrate that anaphora can be integrated into Type Logical Grammar without giving up the attractive design of this theory of grammar. Nonetheless, the empirical predictions that we end up with do not always coincide with those of competing analyses, and I (naturally) try to argue that my analysis also gets the facts right in these cases. So the discussion might be interesting for non-Categorial grammarians who are interested in the analysis of anaphora as well. Also, the type logical analysis of donkey anaphora led to a partially novel account of the grammar of indefinites that diverges from established theories in several respects, both theoretically and empirically. The book does not presuppose prior knowledge of Categorial Grammar or any acquaintance with proof theory that goes beyond the level of an introductory logic course. I do assume, though, a working knowledge of set theory, first order logic and the typed λ-calculus. The technical level of the book should be easily accessible to anybody who has mastered
some standard textbook on formal linguistics like Dowty et al., 1981 or Gamut, 1991. The structure of the book is as follows. Chapter 1 gives a self-contained introduction to the framework of Type Logical Grammar. It does not make any reference to the issue of anaphora whatsoever, and it can be used as an introductory text on its own. Readers who are already familiar with TLG, on the other hand, can safely skip this chapter. Chapter 2 discusses previous Categorial approaches to (pronominal) anaphora. This chapter too can be read on its own (or in combination with Chapter 1). The remainder of the book does not build on it in any significant way, so this chapter is not essential for an understanding of the subsequent material. Chapter 3 is the core of the book. There I develop the novel type logical machinery that enables us to analyze anaphora resolution. In this chapter I focus on the proof theoretic properties of the resulting type logical calculus, i.e., I present the calculus in different proof theoretic formats, establish their equivalence, and prove essential meta-logical properties like Cut elimination, decidability, the finite reading property, strong normalization, and completeness. The remaining chapters apply these theoretical tools to the empirical areas mentioned above. Chapter 4 focuses on anaphoric pronouns and their interaction with quantification. Chapter 5 discusses two options for a type logical treatment of verb phrase ellipsis. In Chapter 6 I propose another extension of the underlying formalism to accommodate certain peculiarities of indefinite NPs. This has an impact on the issue of anaphora for two reasons. First, I believe that the problem of donkey anaphora is mainly a problem of indefiniteness, less a problem of anaphora. So an adequate treatment of donkey pronouns requires a theory of indefiniteness. Furthermore, sluicing is a form of ellipsis that interacts closely with the grammar of indefinites.
I discuss these anaphora-related aspects of indefiniteness, and I also consider some empirical issues pertaining to the grammar of indefinites as such, namely their peculiar scope taking behavior. Now that I have explained what the book is intended to be, a few words about what it is not. The intended audience consists mainly of formally inclined linguists and computational linguists with an interest in logic. I try to illustrate how a logical grammar can contribute to a linguistic theory that is both formally precise and empirically comprehensive. It is not my goal to give an introduction to the Lambek calculus and related substructural calculi for logicians. Therefore, issues that are important to logicians but of lesser relevance to the linguistic applications are not covered in great depth. This concerns especially model theory and the
relation of type logics to modal logic and Linear Logic. Likewise, extensions of the Lambek calculus that are interesting from a logical point of view but without obvious linguistic applications—like negation or additive connectives—are not discussed. Neither could I deal with all facets of Type Logical Grammar that have emerged within the last decade. The focus of the book is on the treatment of anaphora. The introductory Chapter 1 gives an overview of the "classical" version of Type Logical Grammar, but various new developments that have no immediate connection to anaphora are left out. This concerns especially non-associative Categorial calculi, multimodal extensions of TLG, and the calculus of proof nets.
Acknowledgments
Most ideas that I present in this book were developed while I was a postdoc at the Institute for Research in Cognitive Science of UPenn in Philadelphia in 1997 and 1998. It is no exaggeration to say that at that time, the IRCS was one of the best institutes in the world to conduct research on formal grammar. Thanks to Aravind Joshi for making the place what it is! I profited immensely from the discussions with my colleagues there. Many people gave me inspiration and feedback, but I feel my contacts with Robin Clark, Seth Kulick, Jeff Lidz, Mark Steedman and Yael Sharvit were especially important. Last but not least, Dick Oehrle's occasional visits to Philadelphia were very rewarding. Natasha Kurtonina deserves a special mention. Due to a lucky coincidence, we came to the IRCS at the same time and wound up being office mates. She has an extraordinary gift for explaining things, and most of what I know about the "Logic" in "Type Logical Grammar" I learned from her. She never tired of pointing out the flaws in my proofs and digging out literature that might be relevant for my research. Last but not least, I thank her for always being a good friend. When I left Philadelphia, my work on anaphora in TLG consisted of a couple of half-finished papers and a lot of loose ideas. My time as a visitor at the Utrecht Institute of Linguistics in 2000 and 2001 gave me the opportunity to finally write everything down in a coherent way. It is hard to find a place with a higher concentration of excellent categorial grammarians than the OTS, and this created the right atmosphere to finish this work. Thanks to my Utrecht colleagues, especially to Michael Moortgat, for making the time in Utrecht a pleasant and productive one. While I was finally writing down the manuscript, Cornelia Endriss got me interested in the issue of indefiniteness again. We had a lively intellectual exchange on this over several months.
The last chapter of the book would have taken a different shape without this, and perhaps it
would not exist at all. Even though we finally drew different conclusions on what a correct approach to specificity should look like, many of the ideas and observations from this chapter are due to Neli. I had the opportunity to present material from this book in various talks in Berlin, at UPenn, MIT, Utrecht, Düsseldorf, Amsterdam and Leiden, and I am grateful for the comments I received on these occasions. I am also indebted to the students of my Categorial Grammar classes in Potsdam and in Utrecht for the feedback they gave me. I owe a lot to Raffaella Bernardi, Christian Ebert, Bryan Jurish, Manfred Krifka, Glyn Morrill, and Willemijn Vermaat for reading previous versions of the manuscript, or parts of it, and making numerous helpful suggestions for improvement. All remaining errors are of course mine. Special thanks go to Bryan for spending a lot of effort correcting my English. This book is a revised version of the habilitation thesis that I defended at the Humboldt University in Berlin in 2002. I would like to thank the committee members Marcus Kracht, Manfred Krifka and Michael Moortgat for their encouragement and support. It was also Marcus who suggested that I submit the manuscript to the "Trends in Logic" series. Last but not least I thank the series editor Heinrich Wansing for the very good cooperation during the final preparations for publication, and the two anonymous reviewers for their suggestions and comments.
Chapter 1
TYPE LOGICAL GRAMMAR: THE FRAMEWORK
1. Basic Categorial Grammar

1.1 Informal Introduction
All versions of Categorial Grammar that have been developed in the past 30 years can be traced back to the pioneering work of Bar-Hillel, 1953. His system, despite its obvious limitations, in nuce contains most of the features that make the Categorial approach attractive to the present day. It is thus a natural starting point for a presentation of its more sophisticated descendant, Type Logical Grammar. It rests on a fundamental intuition about the structure of languages (both natural and formal ones) which says that linguistic signs may be complete or incomplete. Under this perspective, grammatical composition can be described as the process of completing incomplete linguistic signs. Basic Categorial Grammar (BCG henceforth) is probably the grammatical framework that expresses this intuition in its purest form. Consider the sentence

(1) Walter snores.
The name Walter has a simple semantic function; it just refers to the individual called “Walter”. We may thus consider the expression Walter to be complete, since its linguistic function does not depend on its linguistic context. Similarly, the sentence Walter snores is complete insofar as its denotation is a proposition with a truth value that depends on the extralinguistic context only. The verb snores, however, is incomplete in a sense. It serves to constitute a proposition, but it needs a subject to do so. Semantically it serves as a function that turns an individual into a proposition.
Following standard practice, I use the label np for names (and phrases that have a comparable distribution) and s for sentences. The verb snores is thus an incomplete expression that turns an np into an s. Using a notation from Linear Logic (Girard, 1987), this intuition could be expressed by

snores : np −◦ s

In words this says that the expression snores has the category np −◦ s. An important piece of information is missing here though. Consider the more complex sentence

(2) Walter knows Kevin.
Here the transitive verb knows is doubly incomplete; it requires two nps to turn it into an s. We may express this with

knows : np −◦ (np −◦ s)

However, for an adequate description of the linguistic facts we also need the information that one np occurs to the right and one to the left of the verb. Following Bar-Hillel, 1953, I therefore distinguish between two kinds of incomplete expressions: forward looking functors that have a category of the form A/B (pronounced: "A over B") and expect the missing piece on their right, and backward looking functors that have a category of the form A\B ("A under B") and expect the missing piece on their left. A more adequate description of the facts collected so far would thus be

Walter, Kevin : np
snores : np\s
knows : (np\s)/np

The derivation of the two example sentences can conveniently be expressed in tree format (rendered here as labeled bracketings):

[s [np Walter] [np\s snores]]
[s [np Walter] [np\s [(np\s)/np knows] [np Kevin]]]
In the sequel, I will refer to the A in the types A/B and B\A as the goal category and to the B as the argument category. The rules that are used in this derivation are (where ab is the concatenation of the strings a and b):
1. If a has category A/B and b has category B, then ab has category A.
2. If a has category A and b has category A\B, then ab has category B.
These derivation schemes are sometimes called cancellation rules since they bear an obvious analogy to the arithmetic law¹

x/y × y = x
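The two cancellation rules can be made concrete in a short program. The following is a minimal sketch in Python; all names (Basic, Slash, Backslash, apply) are my own, not from the text. Categories are a small algebraic datatype, and apply tries forward and backward application on two adjacent categories.

```python
# A minimal sketch of BCG categories and the two cancellation rules.
# All names here (Basic, Slash, Backslash, apply) are illustrative.
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class Basic:
    name: str                          # e.g. "np", "s"

@dataclass(frozen=True)
class Slash:                           # A/B: expects a B on its right
    goal: "Cat"
    arg: "Cat"

@dataclass(frozen=True)
class Backslash:                       # B\A: expects a B on its left
    arg: "Cat"
    goal: "Cat"

Cat = Union[Basic, Slash, Backslash]

def apply(left: Cat, right: Cat) -> Optional[Cat]:
    """Rule 1: A/B, B => A (forward); Rule 2: B, B\\A => A (backward)."""
    if isinstance(left, Slash) and left.arg == right:
        return left.goal
    if isinstance(right, Backslash) and right.arg == left:
        return right.goal
    return None

np, s = Basic("np"), Basic("s")
snores = Backslash(np, s)              # np\s
knows = Slash(snores, np)              # (np\s)/np

assert apply(np, snores) == s              # Walter snores
assert apply(np, apply(knows, np)) == s    # Walter knows Kevin
```

The two assertions at the end replay the derivations of examples (1) and (2).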
Complex categories. It should be added that both the goal category and the argument category of a complex category may be complex themselves. The category of transitive verbs—(np\s)/np—already provides an example. Manner adverbs like faintly illustrate this point further; they combine with an intransitive verb phrase (category np\s) to yield an intransitive verb phrase. Both the argument category and the goal category are complex here.

(3) a. faintly : (np\s)\(np\s)
    b. Kevin snores faintly
    c. [s [np Kevin] [np\s [np\s snores] [(np\s)\(np\s) faintly]]]
Recursion. The category of adverbs also illustrates that the argument category and the goal category may be identical. If a BCG assigns such categories to an expression, the described language will display a recursive structure (but note that recursion may be realized in other ways as well). Adjectives are another case in point. They are attached to a common noun phrase (category n) to produce an expression of exactly this category. Figure 1.1 illustrates this.

¹ Bar-Hillel's system is based on the work of Ajdukiewicz, 1935, where the analogy is even more striking since it does not distinguish between forward looking and backward looking functors.
Figure 1.1 (Recursion) shows the derivation of "The old old old man snores", given here as a labeled bracketing (the original tree diagram is rendered linearly):

[s [np [np/n The] [n [n/n old] [n [n/n old] [n [n/n old] [n man]]]]] [np\s snores]]

Figure 1.1. Recursion
Semantic composition. Categories in Categorial Grammar represent two kinds of information. They encode how a sign combines with other signs both syntactically and semantically. Incomplete signs denote functions, and syntactic composition is accompanied by function application in semantics. The structure of the category of a sign is mirrored in the type of the function that it denotes. If the goal category of a sign is complex, its denotation is a curried function (i.e., a function whose values are functions themselves). Categories with complex argument categories correspond to higher order functions—functions that take other functions as arguments. If the semantic component of signs is represented by terms of the typed λ-calculus, syntactic and semantic composition can be displayed simultaneously in a tree structure, as illustrated in Figure 1.2.
1.2 The Formal System

1.2.1 Syntax
After this rather informal description of BCG, let us make these intuitions precise. I begin with a formal specification of the notion of category. A BCG comprises finitely many basic categories (also atomic categories). Most linguistic applications make do with very few—the set {s, np, n, pp} is sufficient in many cases, but this is not essential. Complex categories are formed from basic ones by means of the connectives "/" (forward looking slash) and "\" (backward looking slash).
Figure 1.2 gives the semantic composition of "Walter called Kevin faintly", with each node annotated by its category and its λ-term (the original tree diagram is rendered here in outline form):

s : faintly'(λx.call'(x, kevin'))(walter')
├─ np : walter'  (Walter)
└─ np\s : faintly'(λx.call'(x, kevin'))
   ├─ np\s : λx.call'(x, kevin')
   │  ├─ (np\s)/np : λyx.call'(x, y)  (called)
   │  └─ np : kevin'  (Kevin)
   └─ (np\s)\(np\s) : faintly'  (faintly)

Figure 1.2. Semantic composition of Walter called Kevin faintly
Definition 1 (Categories) Let a finite set B of basic categories be given. CAT(B) is the smallest set such that
1. B ⊆ CAT(B)
2. If A, B ∈ CAT(B), then A/B ∈ CAT(B)
3. If A, B ∈ CAT(B), then A\B ∈ CAT(B)

Bracketing convention. I assume that the forward slash associates to the left, i.e., A/B/C is shorthand for (A/B)/C. The backward slash associates to the right, i.e., A\B\C stands for A\(B\C). Furthermore, the forward slash takes precedence over the backward slash; A\B/C means A\(B/C). Like any formal grammar, a BCG consists of a lexical and a syntactic component. Ignoring semantics for a moment, the lexicon of a BCG is a mapping that assigns finitely many categories to each element of some finite set of strings. A lexical unit may have more than one category since linguistic units may be lexically ambiguous (as for instance walk in English, which is both a common noun and an intransitive verb).

Definition 2 ((Uninterpreted) Lexicon) Let an alphabet Σ and a finite set B of basic categories be given. A BCG-lexicon LEX is a finite relation between Σ+ (the set of non-empty strings over Σ) and CAT(B).
The syntactic component is identical for all BCGs. It consists of a series of axiom schemes and rule schemes that jointly constitute a deductive system. I choose sequent presentation as a convenient format for a description of a deductive system. A sequent consists of a sequence of formulae A1, ..., An (of some formal language)—the antecedent, and a single formula B, the succedent. Antecedent and succedent are connected by the deduction symbol ⇒:

A1, ..., An ⇒ B

This sequent expresses that the succedent B can be derived from the antecedent A1, ..., An. Applied to BCG, the formulae in the sequents are elements of CAT(B). Trivially, every category A can be derived from itself. This is expressed by the identity axiom scheme id. Here and henceforth, I use letters A, B, C, ..., possibly augmented with indices, as variables over categories.

A ⇒ A   (id)

Furthermore, it is possible to use lemmas in a derivation. In other words, a preliminary result of a derivation can be plugged into another derivation. This is expressed by the Cut rule. Letters X, Y, Z, ... are variables over (possibly empty) sequences of categories.²

X ⇒ A    Y, A, Z ⇒ B
─────────────────────  (Cut)
     Y, X, Z ⇒ B

Finally, the fraction cancellation schemes informally given above represent valid deductions:

A/B, B ⇒ A   (A>)
B, B\A ⇒ A   (A<)

I will call these axioms "forward application" and "backward application" respectively.

² The left hand side of a sequent will never be empty in BCG because all axioms have non-empty left hand sides, and applications of Cut never decrease the length of the left hand sides of sequents.
A sequent X ⇒ A is derivable iff it can be obtained from the axioms by finitely many applications of the Cut rule. This notion of deduction is summarized in the following definition. (I use the Kleene star in its usual meaning, i.e., Σ∗ is the set of (possibly empty) finite sequences of elements of Σ.)
Definition 3 (Derivability) Let a set B of basic categories be given. Then the relation ⊢_B is the smallest set with the following properties (I write X ⇒ A instead of X ⇒ A ∈ ⊢_B): For arbitrary A, B, C ∈ CAT(B) and X, Y, Z ∈ CAT(B)*
1. A ⇒ A
2. A/B, B ⇒ A
3. B, B\A ⇒ A
4. If X ⇒ A and Y, A, Z ⇒ B, then Y, X, Z ⇒ B.

Last but not least, a BCG grammar of a given language L specifies a finite set of designated categories, i.e., the categories of sentences of L. Usually, this set is tacitly assumed to be the singleton {s}. Lexicon, deductive rules and designated categories jointly determine a formal language L in the following way. Suppose a1 ... an is a sequence of strings over the alphabet Σ such that the lexicon of a given BCG grammar assigns each ai at least one category. If we replace each ai by one of its lexically assigned categories, we obtain finitely many different sequences of categories. a1 ... an is an element of L if and only if at least one designated category is derivable from at least one of these sequences. The following definition summarizes the notion of a BCG grammar:

Definition 4 (BCG Grammar) Let an alphabet Σ be given. A BCG grammar G is a triple ⟨B, LEX, S⟩, where B is a finite set (the basic categories), LEX is a finite sub-relation of Σ+ × CAT(B), and S is a finite subset of CAT(B) (the designated categories). Such a grammar determines a language over Σ in the following way:

Definition 5 Let G = ⟨B, LEX, S⟩ be a BCG grammar over the alphabet Σ. Then α ∈ L(G) iff there are a1, ..., an ∈ Σ+, A1, ..., An ∈ CAT(B), and S ∈ S such that
1. α = a1 ... an,
2. For all i such that 1 ≤ i ≤ n: ⟨ai, Ai⟩ ∈ LEX, and
3. A1, ..., An ⇒ S.

Let me illustrate this notion of string recognition with an example. We consider the formal language that comprises the well-formed arithmetical equations without variables. Strings from this language are for instance

0 = 1
(3 ∗ 5999) = (72 − 16) + (17 : 77777777)
(101 − (202 − 303)) = (0 − 1)

We use five basic categories: e (equation), t (term), lb (left bracket), rb (right bracket), and n (number). The lexicon assigns categories to atomic strings (I write "a ; A" rather than "⟨a, A⟩ ∈ LEX"):

{+, −, ∗, :}                    ; t\lb\t/rb/t
=                               ; t\e/t
(                               ; lb
)                               ; rb
{1, 2, 3, 4, 5, 6, 7, 8, 9}     ; t/n
{1, 2, 3, 4, 5, 6, 7, 8, 9}     ; n/n
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}  ; n
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}  ; t

The sample grammar contains exactly one designated category, namely e. In Figure 1.3 I give a sample derivation for the equation "12 = 1". The interested reader is invited to test the adequacy of our grammar for more complex examples. One way to replace every lexical unit in the string in question by one of its lexical categories gives us the sequence t/n, n, t\e/t, t. The string in question is recognized by the sample grammar if we can derive the designated category e from this antecedent. This means that we have to prove the sequent t/n, n, t\e/t, t ⇒ e. This can be done by connecting three application axioms by means of two applications of the Cut rule.
Figure 1.3 gives the sequent derivation of "12 = 1"; the lexical items 1, 2, =, 1 correspond to the antecedent categories t/n, n, t\e/t, t respectively.

(i)   t/n, n ⇒ t             A>
(ii)  t, t\e/t ⇒ e/t         A<
(iii) e/t, t ⇒ e             A>
(iv)  t, t\e/t, t ⇒ e        Cut, from (ii) and (iii)
(v)   t/n, n, t\e/t, t ⇒ e   Cut, from (i) and (iv)

Figure 1.3. Sequent derivation of "12 = 1"

Decision procedure. Derivability in the deductive component of a BCG is easily shown to be decidable. Observe that for each derivable sequent X ⇒ A, the succedent A is a subformula of some formula in X. (All axioms have this property, and it is preserved under Cut.) So we can do bottom up proof search by testing for a sequent in question whether (a) it is an instance of an axiom, or (b) it is the conclusion of some instance of Cut, and the two premises are derivable. The premises of a Cut rule always have a lower complexity (consist of fewer symbols) than its conclusion, so this procedure is bound to terminate after finitely many steps. If deduction is decidable, string recognition by a BCG grammar is decidable as well. Since both the lexicon and the set of designated categories are finite, recognition of a given string reduces to derivability of finitely many different sequents.
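Since every derivable BCG sequent decomposes into binary application steps, the decision procedure can also be implemented CYK-style, computing for every substring of the antecedent the set of derivable categories. The sketch below is mine, not from the text: basic categories are strings, ("/", A, B) encodes A/B, and ("\\", B, A) encodes B\A.

```python
# CYK-style recognizer for BCG sequents (illustrative sketch).
# Basic categories are strings; ("/", A, B) encodes A/B and
# ("\\", B, A) encodes B\A.

def combine(a, b):
    """Categories obtainable from adjacent a, b by one application."""
    out = set()
    if isinstance(a, tuple) and a[0] == "/" and a[2] == b:
        out.add(a[1])                          # A/B, B => A
    if isinstance(b, tuple) and b[0] == "\\" and b[1] == a:
        out.add(b[2])                          # B, B\A => A
    return out

def derivable(cats, goal):
    """Decide the sequent cats => goal by filling a CYK chart."""
    n = len(cats)
    chart = {(i, i + 1): {cats[i]} for i in range(n)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            cell = set()
            for k in range(i + 1, i + width):
                for a in chart[(i, k)]:
                    for b in chart[(k, i + width)]:
                        cell |= combine(a, b)
            chart[(i, i + width)] = cell
    return goal in chart[(0, n)]

# "12 = 1": the sequent t/n, n, t\e/t, t => e from the sample grammar
tn = ("/", "t", "n")
eq = ("\\", "t", ("/", "e", "t"))              # t\e/t, i.e. t\(e/t)
assert derivable([tn, "n", eq, "t"], "e")
assert not derivable([tn, eq], "e")
```

The chart makes the termination argument concrete: every cell covers a strictly smaller span than the cells computed from it.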
1.2.2 Relation to Context Free Grammars
Basic Categorial Grammars are closely related to Context Free Grammars. These two grammar formats in fact (weakly) recognize the same class of languages. This equivalence was established in the seminal paper Bar-Hillel et al., 1960. I will sketch the basic idea of an equivalence proof. For an in-depth discussion of this and related issues the reader is referred to the excellent overview article Buszkowski, 1997. The notions of a derivation in BCG and in CFG (Context Free Grammars) are highly similar to start with. Context free derivations can also be seen as deductions in a deductive system that contains the identity axiom and is closed under Cut. Instead of the two application schemes, however, CFGs have context free rules as additional axioms. To transform a BCG into an equivalent CFG, it is thus sufficient to demonstrate that only finitely many instances of the application schemes are used in actual derivations. These instances can then be reinterpreted as CFG rules. As was mentioned above, in every derivable BCG sequent, the succedent is a subformula of one element of the antecedent of this sequent (since this property holds of all axioms, and it is preserved under Cut).
Given this, it is straightforward to see that all categories that occur in a premise of a Cut rule are subformulae of categories that occur in the conclusion of this Cut. This in turn entails that in the derivation of a sequent X ⇒ A, only subformulae of X or of A are used. In particular, all application axioms used in the derivation consist of subformulae of the sequent to be derived. In the derivation of grammatical strings, only subformulae of lexical categories are used in the antecedents and only subformulae of designated categories as succedents. Since there are only finitely many such categories for a given BCG, in fact finitely many instances of application are sufficient. It follows from these considerations that every language that is recognized by a BCG is also recognized by some CFG. The inclusion in the other direction is much harder to establish—the proof is the central point of Bar-Hillel et al., 1960. It is easy to demonstrate though if we use the Greibach Normal Form Theorem (Greibach, 1965):
Theorem 1 (Greibach Normal Form Theorem) Every context free language L is recognized by some CFG G that only contains rules of the form A → aα (where "A" ranges over non-terminals, "a" over terminals, and "α" over (possibly empty) sequences of non-terminals).

Let L be some context free language that is recognized by some CFG G in Greibach Normal Form. An equivalent BCG can easily be constructed. We identify the set of non-terminals of G with the set of basic categories. The CFG rules of G are transformed into lexical assignments by going from

A → a   to   a ; A

and from

A → a B1 ... Bn   to   a ; A/Bn/···/B1

Finally, we identify the start symbol of G as the only designated category of the BCG.
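The construction just described is mechanical enough to write down directly. The following is a sketch under my own assumptions about rule format; the function name and the string encoding of categories are mine.

```python
# Greibach-to-BCG: a rule A -> a B1 ... Bn becomes the lexical
# assignment a ; A/Bn/.../B1. Categories are written as plain strings,
# relying on the left-associativity convention for "/".

def greibach_to_bcg(rules):
    """rules: iterable of (A, a, [B1, ..., Bn]); returns a set of
    (terminal, category) pairs, i.e. a BCG lexicon."""
    lexicon = set()
    for lhs, terminal, nonterminals in rules:
        category = lhs
        for b in reversed(nonterminals):   # append /Bn first, /B1 last
            category += "/" + b
        lexicon.add((terminal, category))
    return lexicon

# Toy grammar for a^n b^n (n >= 1), already in Greibach Normal Form:
# S -> a S B | a B,  B -> b
rules = [("S", "a", ["S", "B"]), ("S", "a", ["B"]), ("B", "b", [])]
print(sorted(greibach_to_bcg(rules)))
# [('a', 'S/B'), ('a', 'S/B/S'), ('b', 'B')]
```

Note the reversal of the non-terminal list: because the forward slash associates to the left, A/Bn/···/B1 consumes B1 first, matching the left-to-right order of the CFG rule's right-hand side.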
Type Logical Grammar: The Framework
1.2.3 Semantics

Semantic types. Despite this strong similarity between BCG and CFG, the former has at least one conceptual advantage over the latter pertaining to its natural connection to semantic interpretation. As I alluded to in the beginning, the Categorial architecture is actually strongly motivated by semantic considerations. In phrase structure grammars and related formalisms, the syntactic category of a sign on the one hand and its semantic type on the other hand are independent semiotic dimensions that have to be specified separately. In Categorial Grammar, these components are closely linked. In other words, the category of a linguistic sign codes two kinds of information: it determines the (syntactic) combinatory potential of this sign, and it specifies which type of denotation the sign has. Let us make this precise. I assume that the reader is more or less familiar with type theoretic interpretation and summarize the basic notions very briefly. Analogously to basic syntactic categories, there is a finite set of basic semantic types. The set of semantic types is the closure of the basic types under function space formation:

Definition 6 (Semantic types) Let BTYPE be a finite set (of basic semantic types). The set TYPE of semantic types is the smallest set such that
1 BTYPE ⊆ TYPE
2 If a, b ∈ TYPE, then ⟨a, b⟩ ∈ TYPE.

In linguistic applications, the basic types usually contain at least the types e (for “entity”) and t (for “truth value”), but this is not part of the general format. Semantic types correspond to ontological domains. The semantic type of a sign indicates what kind of object the denotation of this sign will be. The space of semantic domains has a recursive structure analogous to the set of semantic types; a complex type ⟨a, b⟩ always corresponds to the set of total functions from the domain of type a into the domain of type b.
Definition 7 (Domains) The function Dom is a semantic domain function iff
1 The domain of Dom is TYPE,
2 for all A ∈ TYPE, Dom(A) is a non-empty set, and
3 Dom(⟨a, b⟩) = Dom(b)^Dom(a)
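Clause 3 of Definition 7 makes every semantic domain finite whenever the basic domains are, and this can be checked concretely. The following Python sketch is entirely my own encoding (not the book's): a total function is represented as a tuple of input–output pairs, and the function space is enumerated exhaustively.

```python
# Sketch: finite semantic domains, with Dom(<a,b>) the set of all
# total functions from Dom(a) to Dom(b). Basic types are strings,
# complex types are pairs (a, b); functions are tuples of in/out pairs.
from itertools import product

def dom(type_, base):
    """base maps basic types to finite sets, e.g. {'e': {...}, 't': {0, 1}}."""
    if isinstance(type_, str):                  # basic type
        return set(base[type_])
    a, b = type_                                # complex type <a, b>
    da, db = sorted(dom(a, base)), dom(b, base)
    # choosing one output for each input yields one total function
    return {tuple(zip(da, outs)) for outs in product(db, repeat=len(da))}

base = {'e': {'j', 'm', 'b'}, 't': {0, 1}}
# |Dom(<e,t>)| = |Dom(t)| ** |Dom(e)| = 2 ** 3
assert len(dom(('e', 't'), base)) == 8
```

With three individuals, the type ⟨e, ⟨e, t⟩⟩ of transitive verbs already has 8³ = 512 inhabitants, which illustrates why the clause fixes cardinalities as exponentials.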
This definition does not restrict the assignment of domains to basic types (beyond the requirement that domains are non-empty). Following the lead of Montague, 1974, the conventional basic types e and t are usually mapped to some set E of individuals and the set {0, 1} of truth values respectively, but again, this is not part of the general architecture. Given a correspondence between basic categories and semantic types, the semantic type of a sign of an arbitrary category can be predicted from the internal structure of this category. More formally put, the semantic type of a linguistic sign is the homomorphic image of its syntactic category.
Definition 8 (Category to type correspondence) Let τ be a function from CAT(B) to TYPE. τ is a correspondence function iff τ(A\B) = τ(B/A) = ⟨τ(A), τ(B)⟩

Compositional Interpretation. Semantic types serve a double function. Primarily, they restrict the possible denotations of linguistic signs. If an expression has the syntactic category A, its denotation will be an element of Dom(τ(A)). Semantic types also identify categories that differ only syntactically but not semantically (like np/np and np\np). For practical purposes, we are rarely concerned with actual meanings (i.e., model theoretic entities) but we deal with meaning representations that are formulated in the language of the typed λ-calculus. So we are mainly interested in a compositional translation from natural language into the semantic representation language. Since the denotation of λ-expressions is unambiguous and well-understood, such a translation indirectly determines a compositional interpretation of the object language. Semantic types not only determine the range of possible interpretations of a linguistic sign, they also determine the syntactic properties of its translation. The semantic types that we use are the syntactic categories of the semantic representation language. The category-to-type correspondence restricts possible translations from natural language to the λ-calculus. The translation—and thus indirectly the meaning—of a basic expression is determined in the lexicon. So a Categorial lexicon for an interpreted language is a three-place relation, relating the form of an expression, its syntactic category, and its translation into the λ-calculus. I revise the definition of a lexicon accordingly. EXPa is the set of expressions of the typed λ-calculus that have type a.

Definition 9 ((Interpreted) Lexicon) Let an alphabet Σ, a finite set B of basic categories and a correspondence
function τ be given. An interpreted BCG-lexicon LEX is a finite subrelation of

  ⋃_{A ∈ CAT(B)} (Σ+ × {A} × EXP_{τ(A)})
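The correspondence function τ of Definition 8 is easy to compute recursively: it is fixed on basic categories and determined homomorphically on slash categories. The Python sketch below uses my own encoding of categories (strings for basic categories, triples for slash categories), which is not the book's notation.

```python
# Sketch: the category-to-type map tau of Definition 8.
# A category is a basic-category string or a triple (left, slash, right),
# e.g. ('np', '\\', 's') for np\s and ('s', '/', 'np') for s/np.

def tau(cat, basic_map):
    """basic_map fixes tau on basic categories, e.g. {'np': 'e', 's': 't'}."""
    if isinstance(cat, str):
        return basic_map[cat]
    left, slash, right = cat
    if slash == '\\':            # A\B  |->  <tau(A), tau(B)>
        arg, result = left, right
    else:                        # B/A  |->  <tau(A), tau(B)>
        arg, result = right, left
    return (tau(arg, basic_map), tau(result, basic_map))

basic = {'np': 'e', 's': 't'}
# np\s and s/np are syntactically distinct but share the type <e, t>
assert tau(('np', '\\', 's'), basic) == tau(('s', '/', 'np'), basic) == ('e', 't')
```

The nested case works the same way: a transitive verb category (np\s)/np comes out as ⟨e, ⟨e, t⟩⟩.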
The lexically determined form-category-meaning relation can be extended to all constituents that are recognized by the corresponding BCG. To this end, the axioms and rules of BCG have to be supplied with operations on the meanings of their operands (or, more precisely, with operations on their semantic representations). These are conveniently represented by means of labeled deduction. Antecedent formulae of sequents are labeled with variables and succedents with possibly complex terms of the λ-calculus. The actual translation of a constituent is obtained by replacing the free variables in the succedent term with the corresponding lexical translations of the formulae in the antecedent. The labeled BCG rules are given below. I write l : C for category C carrying the label l. First some remarks on notation are in order. I use lower case letters x, y, z, . . . as metavariables over variables of the λ-calculus and upper case letters M, N, O, . . . as metavariables over λ-terms. “N[M/x]” is the result of replacing all free occurrences of x in N by M. Variables (free or bound) that occur in different sequents are tacitly assumed to be different, so no variable clashes can arise. Finally, I omit brackets for function application; so the term “M N” is the result of applying the functor M to the argument N (which is sometimes also written as “M(N)”).

  x : A ⇒ x : A   (id)

  x : A/B, y : B ⇒ xy : A   (A>)

  x : B, y : B\A ⇒ yx : A   (A<)

  X ⇒ M : A    Y, x : A, Z ⇒ N : B
  ――――――――――――――――――――――― Cut
       Y, X, Z ⇒ N[M/x] : B
For complex sequent derivations, I will use proof trees as typographic format. There the antecedent formulae of a sequent are written in appropriate order on top of a horizontal line, and the succedent appears below it. So a sequent like (4a) is displayed as (4b).

(4) a. A1, . . . , An ⇒ B

    b.  A1 . . . An
        ―――――――――
            B

The Cut rule is compiled into the proof tree format: two derivations that meet the conditions for a Cut application can be merged into a complex tree by unifying the succedent of one derivation with an element of the antecedent of the other one that has the same name. For instance, the Cut derivation in (5a) is represented as the proof tree in (5b).

(5) a.  A, B ⇒ C    C, D ⇒ E
        ―――――――――――――― Cut
           A, B, D ⇒ E

    b.  A  B
        ―――
         C   D
        ―――――
           E
Lexical entries are considered as axiomatic premises and thus appear at the leaves of a proof tree. So in sum, Categorial derivations in proof tree format resemble ordinary phrase structure trees very closely (apart from the fact that proof trees have the leaves at the top). The complete derivation in proof tree format for our example Walter knows Kevin on page 2 now comes out as:

(6) a. Lexicon
       Walter – walter’ : np
       knows – know’ : (np\s)/np
       Kevin – kevin’ : np

    b.  know’ : (np\s)/np (lex)   kevin’ : np (lex)
        ―――――――――――――――――――――――― A>
        walter’ : np (lex)   know’kevin’ : np\s
        ―――――――――――――――――――――――― A<
              know’kevin’walter’ : s
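The two application steps of derivation (6) can be mimicked with meanings as Python callables. This is a sketch of mine, not the book's machinery; the constant know’ is simulated symbolically by a curried function over strings.

```python
# Sketch: the labeled rules A> and A< as semantic operations
# (functional application), with lexical meanings as callables/strings.

def fwd_apply(functor, argument):   # A>:  x : A/B, y : B  =>  xy : A
    return functor(argument)

def bwd_apply(argument, functor):   # A<:  x : B, y : B\A  =>  yx : A
    return functor(argument)

# Curried symbolic stand-in for know' : <e, <e, t>>
know = lambda obj: lambda subj: f"know'{obj}{subj}"
walter, kevin = "walter'", "kevin'"

vp = fwd_apply(know, kevin)         # know'kevin' : np\s
sentence = bwd_apply(walter, vp)    # know'kevin'walter' : s
assert sentence == "know'kevin'walter'"
```

Both rules reduce to functional application; the only difference is whether the functor stands to the left or to the right of its argument.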
Structural ambiguities as multiple proofs. Sometimes there is more than one way to prove that a given string is a sentence. An example is given in (7). The two possible proofs are given in Figure 1.4 on the facing page. (To simplify the derivation, I treat to talk as a single lexical entry, and I ignore the morphological distinction between finite and infinitival VPs.)

(7)
John asked Bill to talk faintly.
λ-labels have two functions. They supply a semantic representation of the constituent in question, and they record the history of a proof. If a sentence has two non-equivalent proofs, it receives two distinct labels
Figure 1.4. Derivations for (7). In the first proof, faintly’ combines with talk’ before ask’ applies, yielding ask’b’(faintly’talk’)j’ : s; in the second, faintly’ modifies the whole VP, yielding faintly’(ask’b’talk’)j’ : s.
and we thus predict two interpretations. For the example (7) this is borne out; it is an instance of a run-of-the-mill attachment ambiguity. Quite generally, structural ambiguities are treated as multiple proofs for the same sequent. Different proofs need not correspond to different meanings though. Proofs may differ in inessential aspects. Using an identity axiom as a premise for a Cut rule, for instance, is just redundant and has no impact on interpretation. So the two sub-proofs in Figure 1.5 on the next page are equivalent, and they derive the same λ-labels. In the literature, this phenomenon is called spurious ambiguity. Finally it should be added that the format of BCG (and all other Categorial systems) allows for the assignment of multiple meanings to the same form. So lexical ambiguity is another source of a multiplicity of meanings for the same form.
Figure 1.5. Two equivalent proofs: a derivation of X ⇒ M : A on its own, and the same derivation extended by a Cut against the identity axiom x : A ⇒ x : A, which again yields X ⇒ M : A with the same label.

1.3 Conclusion
I close this section by pointing out three features of BCG that are shared by the more involved systems of Categorial Grammar to be introduced later in the book and that are characteristic of the Categorial approach in general.

Surface orientation. Categorial Grammar assumes a monostratal model of grammar. This means that there is only one level of syntactic representation, namely surface structure. Accordingly, CG does without empty categories like traces and empty pronominal elements. Since there are no different levels of representation, there is no room for transformations. Also, CG does not assume syntactic structures as independent objects of linguistic theory. Constituent structures show up as structures of proofs in a derivation, but they are part of our theorizing about language, not of language itself. It is therefore impossible to formulate constraints on syntactic objects that make reference to notions of dominance, c-command, m-command etc.

Compositional interpretation. Most theories of the syntax-semantics interface assume a level of syntactic representation that serves as input for a compositional semantic interpretation. This level may be a syntactic tree structure like “Logical Form” in generative approaches (see for instance Heim and Kratzer, 1998) or specifically semantic representations like “Discourse Representation Structures” in DRT (cf. Kamp and Reyle, 1993). This semantic representation is interpreted compositionally, i.e., the meanings of complex representations are determined by the meanings of their parts and the way they are combined. Since CG assumes surface structure to be the only level of syntactic representation, it is surface structure that is interpreted compositionally. The fact that we use the λ-calculus as semantic representation language is not
at odds with this overall approach: as long as the translation from surface structure to the semantic representation language is compositional, it can in principle be dispensed with.3

Lexicalism. CG does without syntactic structures as independent objects, and it assumes a highly impoverished syntactic component of grammar. In BCG, we only have the two application schemes as syntactic rules. (The identity axiom and the Cut rule are part of the very notion of a syntactic derivation.) This syntactic component is assumed to be universal across languages.4 The sole locus of non-trivial linguistic generalizations is the lexicon. Both paradigmatic regularities within a language (that would be covered by transformations in generative approaches) and parametric variation between languages have to be formulated as constraints over possible lexical entries.
2. Combinators and Type Logical Grammar

Type Logical Grammar is an extension of Basic Categorial Grammar which reconstructs the deductive component of the grammar as the proof system of a logical calculus. In this section I will give an introduction to the simplest version of TLG, which is based on the Lambek calculus from Lambek, 1958. The structure of the section is as follows. I first present some linguistically motivated proposals from the literature that BCG should be extended by additional inference rules, so-called “combinators”. In the next step, I discuss a family of logical calculi called the “structural hierarchy”, and I point out that one representative of this class of logics, the Lambek calculus, supplies us with an elegant meta-theory of combinatory extensions of BCG. In the main part of the section, I explore the logical and linguistic aspects of the system of Categorial Grammar that uses the Lambek calculus as its deductive component.
2.1 Combinators and Coordination

Over the years, empirical research on the syntax-semantics interface revealed that the view on grammatical composition inherent in Basic Categorial Grammar is too rigid for realistic linguistic analyses. Coordination phenomena can serve to illustrate this point very clearly.

3 Janssen, 1997 contains an in-depth discussion of these issues.
4 This assumption is not shared in full generality by proponents of Combinatory Categorial Grammar, cf. Steedman, 1996.
Coordination particles like and or or are polymorphic, i.e., their arguments can be of different categories: (8)
a. John walked and Bill talked.
b. John walked and talked.
c. John loves and plays soccer.

Assigning the particle and a plethora of different categories would miss a crucial generalization, the fact that the two arguments of and are always of the same category, and that the constituent created by coordination has this very category. This can be expressed by means of an inference scheme for coordination which covers all category instances:

(9)  X ⇒ A    Y ⇒ A
    ――――――――――――― Conj
     X, and, Y ⇒ A

We restrict our attention to Boolean coordination, i.e., coordination where the two conjuncts denote functions into truth values.
Definition 10 (Boolean types and categories)
1 t is a Boolean type.
2 If a is a Boolean type, so is ⟨b, a⟩.
3 If τ(A) is a Boolean type, A is a Boolean category.

The coordination scheme given in (9) is restricted to instances where A is a Boolean category. The main motivation for assuming a coordination scheme rather than different lexical entries for and comes from the observation that the meaning of and is constant across its different category instances. It always denotes set intersection. So the polymorphic syntactic operation is accompanied by a polymorphic semantic operation.
Definition 11 (Boolean Conjunction)
1 If ϕ and ψ have type t, ϕ ∩ ψ ≐ ϕ ∧ ψ.
2 If ϕ and ψ have the Boolean type ⟨a, b⟩, then ϕ ∩ ψ ≐ λxₐ.(ϕx) ∩ (ψx).

Combining the syntactic and semantic aspects leads to the semantically labeled coordination scheme:
Definition 12 (Coordination scheme)

  X ⇒ M : A    Y ⇒ N : A
  ―――――――――――――――――― Conj
  X, and, Y ⇒ M ∩ N : A

This scheme is only applicable if the labels are defined, so it is implicitly restricted to Boolean categories. As the derivations in Figure 1.6 show, the coordination scheme enables us to derive the correct interpretations for the examples in (8) without resorting to ellipsis or other non-overt means.

Figure 1.6. Derivations for (8), yielding (walk’j’) ∧ (talk’b’) : s for (8a), (walk’j’) ∧ (talk’j’) : s for (8b), and (love’soccer’j’) ∧ (play’soccer’j’) : s for (8c).
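The recursion in Definition 11 can be rendered directly in code. In this sketch (my encoding, not the book's; since Python functions do not carry their semantic type, the Boolean type is passed along explicitly), generalized conjunction bottoms out in truth-value conjunction and otherwise lifts pointwise.

```python
# Sketch of Definition 11: generalized Boolean conjunction.
# At type t it is plain conjunction; at a Boolean type <a, b> it is
# lifted pointwise over the argument of type a.

def conj(phi, psi, type_):
    if type_ == 't':
        return phi and psi                    # truth-value conjunction
    a, b = type_                              # Boolean type <a, b>
    return lambda x: conj(phi(x), psi(x), b)  # pointwise lift

# VP meanings as characteristic functions of sets of individuals
walk = lambda x: x in {'john', 'bill'}
talk = lambda x: x in {'john', 'mary'}

walk_and_talk = conj(walk, talk, ('e', 't'))  # VP coordination as in (8b)
assert walk_and_talk('john') is True
assert walk_and_talk('bill') is False
```

Because the lift is pointwise, the same definition covers sentence coordination (type t), VP coordination (type ⟨e, t⟩), transitive verb coordination (type ⟨e, ⟨e, t⟩⟩), and so on.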
However, there are several coordination patterns in English (as well as in other languages) that seem to involve Boolean coordination, but which are not derivable in BCG plus coordination scheme. We restrict our attention to three configurations.
2.1.1 Quantifiers

At first glance, quantifier phrases like every donkey, some man, most farmers etc. have the same distribution as names, so it is tempting to
assign them the category np as well. Simple semantic considerations show, however, that this would lead to incorrect predictions. Consider the following two deductions:

(10) a.  John walked and John talked.
         ――――――――――――――――――――
         John walked and talked.

     b.  Some man walked and some man talked.
         ――――――――――――――――――――――――
         Some man walked and talked.
Even though (10a) and (10b) seem parallel, the inference scheme (10a) is valid and (10b) is not. According to the semantics of coordination predicted by the coordination scheme, this inference pattern would be valid, however, if the subject expression had type e. So due to the category-to-type correspondence, quantifiers like some man cannot have category np. On the other hand, some man yields a sentence if it is combined with a VP to its right. So the only appropriate type assignment for some man will be s/(np\s), which corresponds to the quantifier type ⟨⟨e, t⟩, t⟩. Under this category assignment, the inference in (10b) is not predicted to be generally valid. Nevertheless, names and quantifiers are conjoinable. (11)
John and somebody walked.
This entails that (a) names have a Boolean type as well (np is not Boolean), and (b) names and quantifiers have the same type. From these considerations, Montague, 1974 drew the conclusion that names have the syntactic category of quantifiers, namely s/(np\s). This in turn leads to an overly complicated meaning assignment to names. It has to be assumed that the denotation of John is the set of all properties that the individual John has, represented by λP.P j’. Several authors (like Partee and Rooth, 1983) noticed that this unsatisfactory state of affairs can be improved if we assume that the grammar admits an operation that lifts names to the category of generalized quantifiers. This Lifting Rule in its most general form can be formulated as (12)
     X ⇒ M : A
  ――――――――――――――――― T>
  X ⇒ λx.xM : B/(A\B)
Type lifting is conventionally labeled with T. Since there are two directional versions of it, I distinguish them with subscripts “>” and “<”. The inference from category np to category s/(np\s) is just one instance of this rule. The derivation in Figure 1.7 on the next page shows that this rule admits a correct derivation of example (11).
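Semantically, T> is the familiar lift from an individual to the set of its properties, λx.xM. A minimal Python sketch of mine (the lexicon entries and the two-individual domain are illustrative assumptions):

```python
# Sketch: the semantics of type lifting T>. A name denotation m becomes
# the generalized quantifier \x.x(m), which takes a VP meaning as input.

lift = lambda m: (lambda p: p(m))     # T>:  m  |->  \x.xm

walk = lambda x: x in {'john'}        # VP meaning, type <e, t>
somebody = lambda p: any(p(x) for x in {'john', 'mary'})  # type <<e,t>,t>

john_gq = lift('john')                # John lifted to s/(np\s)
assert john_gq(walk) == walk('john')  # lifted name applied to its VP
assert somebody(walk) is True         # a quantifier of the same type
```

After lifting, the name and the quantifier inhabit the same type ⟨⟨e, t⟩, t⟩, so the coordination scheme applies to them directly.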
Figure 1.7. Derivation for (11): John is lifted by T> to λQ.Qj’ : s/(np\s), conjoined with somebody (λP.∃xP x : s/(np\s)) to give λP.(P j’) ∧ ∃xP x : s/(np\s), which applies (A>) to walked (walk’ : np\s), yielding (walk’j’) ∧ ∃xwalk’x : s.
2.1.2 Right Node Raising

Even though phrase structures are not part of linguistic representations in BCG, the structure of lexical categories induces a constituent structure that determines what chunks are conjoinable. Under the customary category assignments, in a simple transitive clause such as

(13) John likes broccoli.

the substring likes broccoli can be assigned a category (namely np\s), but not the substring John likes. Nevertheless the complex “subject + transitive verb” is conjoinable:

(14) John likes and Bill detests broccoli.
This kind of construction requires the extension of the BCG core with yet another inference scheme called forward function composition. As the name indicates, the corresponding semantic operation is function composition, and the inference scheme operates with forward looking functors. (The label B> for this inference scheme is inspired by Curry and Feys’ (1958) function composition combinator.) (15)
  X ⇒ M : A/B    Y ⇒ N : B/C
  ―――――――――――――――――――― B>
    X, Y ⇒ λx.M(Nx) : A/C
The derivation of the Right Node Raising construction (14) requires lifting of the two subjects to the category of quantifiers, followed by function composition of the lifted subjects with the transitive verbs. It is given in Figure 1.8 on page 24.
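The semantics of B> is ordinary function composition, λx.M(Nx). The following sketch (mine; the symbolic verb meaning is an illustrative assumption) shows how a lifted subject composes with a transitive verb so that John likes receives a meaning of category s/np:

```python
# Sketch: the semantics of B> as function composition, and its use in
# Right Node Raising: a lifted subject composes with a transitive verb.

compose = lambda m, n: (lambda x: m(n(x)))              # B>:  \x.M(Nx)
lift = lambda m: (lambda p: p(m))                       # T>

# Transitive verb meaning, symbolically: likes(obj)(subj)
likes = lambda obj: (lambda subj: (subj, 'likes', obj))

john_likes = compose(lift('john'), likes)               # category s/np
assert john_likes('broccoli') == ('john', 'likes', 'broccoli')
```

The composed meaning waits for the shared right-peripheral object, which is exactly what (14) needs: both conjuncts end up in category s/np before and applies.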
2.1.3 Left Node Raising

Coordination of chunks that a phrase structure grammar (or BCG) would not analyze as constituents is pervasive in natural language. Right Node Raising constructions can informally be described as involving
deletion of right peripheral material in the first conjunct. The mirror image pattern exists as well. The following example, where clusters of arguments are conjoined, is a case in point. (16)
John introduced Bill to Sue and Harry to Sally.
Here apparently the sequence John introduced is missing in the second conjunct. A surface compositional derivation is possible in an extended Categorial Grammar if we make use of mirror images of the rules introduced above. Both backward type lifting T< and backward function composition B< are necessary here. Their semantics is identical to that of their forward-oriented twins, and they differ from them just by the directionality of the slashes. (17)
a.     X ⇒ M : A
    ――――――――――――――――― T<
    X ⇒ λx.xM : (B/A)\B

b.  X ⇒ M : A\B    Y ⇒ N : B\C
    ―――――――――――――――――――― B<
      X, Y ⇒ λx.N(Mx) : A\C
To derive the construction in (16), some new notational conventions have to be introduced. Following standard practice, I assume the category label pp for prepositional phrases. A single preposition like to thus has category pp/np. An expression like introduced Bill selects a pp to its right and an np to its left to constitute a clause; it thus has category (np\s)/pp. I will abbreviate this category as “tvp”. Accordingly, a bitransitive verb like introduced has category tvp/np. I will furthermore abbreviate the category of VPs, np\s, as vp wherever it is convenient. The derivation of the construction in (16) involves three operations in each conjunct prior to coordination:

1. backward lifting of the direct object to the level of tvp,
2. backward lifting of the prepositional object to the level of vp, and
3. backward function composition of the two lifted objects.

It is illustrated in Figure 1.8 on page 24. (I assume that the preposition to does not make a semantic contribution and thus denotes the identity function λx.x.) Due to their relatedness to the combinators of combinatory logic (cf. Curry and Feys, 1958), inference rules like function composition and type lifting have been dubbed combinators. A whole branch of Categorial Grammar—Combinatory Categorial Grammar—treats combinators
as the main locus of linguistic generalizations, both under a language-particular and under a universal and typological perspective.5 Alternatively, one might consider the variety of different combinators (and of sub-combinators) that have been proposed in the literature as an indication that a more profound generalization has been missed. The work of Joachim Lambek from the late fifties and early sixties (Lambek, 1958 and Lambek, 1961) supplies such a meta-theory of combinators, even though it antedates the idea of Combinatory Categorial Grammar. There, syntactic categories are conceived as propositions of a logical calculus, and combinators (alongside all valid Categorial derivations) are theorems of this logic that are provable from more basic axioms and rules.
2.2 The Lambek Calculus L

The internal structure of a syntactic category in CG determines in which syntactic environments a sign of this category can occur. In BCG, this comes down to the following generalizations:

If the category A/B is derivable from the antecedent X, then category A is derivable from X followed by a B.

If the category B\A is derivable from the antecedent X, then category A is derivable from X preceded by a B.

This is nothing more than a verbose formulation of the application axioms. They give necessary conditions for the assignment of slash categories to signs. The conditions are not sufficient though. If we turn the conditionals into biconditionals, we obtain necessary and sufficient conditions for the usage of slash categories. This step brings us from Basic Categorial Grammar to the core system of Type Logical Grammar (TLG)6 in the version of Lambek, 1958. There the behavior of slash categories is governed by the following regularities:

The category A/B is derivable from the antecedent X if and only if the category A is derivable from X followed by a B.

The category B\A is derivable from the antecedent X if and only if the category A is derivable from X preceded by a B.

These rules can concisely be expressed in the format of sequent rules:
5 This research program is carried out mainly by Mark Steedman and several of his students and coworkers. Representative references are Steedman, 1996 and Steedman, 2000.
6 This terminology will be motivated below.
Figure 1.8. Derivations of (14) and (16). In (14), both subjects are lifted (T>) and composed (B>) with their transitive verbs, yielding λz.(like’zj’) ∧ (detest’zb’) : s/np after coordination and finally (like’broccoli’j’) ∧ (detest’broccoli’b’) : s. In (16), the argument clusters are built by backward lifting (T<) and backward composition (B<), coordinated as λuv.(usue’b’v) ∧ (usa’h’v) : (tvp/np)\vp, and finally yield (introduce’sue’b’j’) ∧ (introduce’sa’h’j’) : s.

  X ⇒ A/B    Y ⇒ B
  ――――――――――――― /E
      X, Y ⇒ A

  X ⇒ B    Y ⇒ B\A
  ――――――――――――― \E
      X, Y ⇒ A

  X, A ⇒ B
  ―――――――― /I
  X ⇒ B/A

  A, X ⇒ B
  ―――――――― \I
  X ⇒ A\B
As a side condition it is required that the antecedent of a sequent is never empty. The rules /E and \E formalize the only-if direction of the informal formulation above. They are equivalent to the application axiom schemes of Basic Categorial Grammar. (We obtain the axiomatic formulation if we take identity axioms as premises of the rules given here. Conversely, the rule formulation is derivable from the application axioms via the Cut rule.) These two rules eliminate one slash occurrence, therefore they are called slash elimination rules, abbreviated as “/E” and “\E” respectively. Conversely, the rules /I and \I represent the if-direction of the biconditionals. They are examples of the method of hypothetical reasoning: to prove B/A from some antecedent X, provisionally add a hypothesis of category A to the right periphery of the antecedent and try to derive the succedent B. If you succeed, you can discharge the hypothesis and conclude B/A (and likewise for the backward looking slash). These rules create a new slash occurrence in a derivation. Therefore they are dubbed slash introduction rules, abbreviated as “/I” and “\I” respectively. Next to the two slashes, Lambek, 1958 assumes a third category forming connective, the product “•”. Intuitively, a linguistic resource has category A • B iff it consists of a component of category A, followed by a component of category B. So an argument cluster like Bill to Sue in (16) would (among others) have the category np • pp. Alternatively, the product operator can be considered the category-internal counterpart of the comma in the antecedent of sequents. These intuitions are captured by the elimination rule and the introduction rule for the product:

  X ⇒ A • B    Y, A, B, Z ⇒ C
  ―――――――――――――――――――― •E
        Y, X, Z ⇒ C

  X ⇒ A    Y ⇒ B
  ――――――――――――― •I
   X, Y ⇒ A • B
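As an illustration of hypothetical reasoning at work, the type lifting rule T> of (12), in the form A ⇒ B/(A\B), is a theorem of this calculus. The following derivation sketch is my own rendering (typeset here in LaTeX, not reproduced from the book): two identity axioms combine by \E, and /I then discharges the hypothetical A\B.

```latex
% Type lifting A => B/(A\B), derived via hypothetical reasoning:
% the hypothesis A\B is consumed by \E and then discharged by /I.
\[
\frac{\dfrac{A \Rightarrow A \qquad A\backslash B \Rightarrow A\backslash B}
            {A,\; A\backslash B \;\Rightarrow\; B}\;\backslash E}
     {A \;\Rightarrow\; B/(A\backslash B)}\;/I
\]
\]
```

Note that the antecedent stays non-empty throughout, so the side condition on the rules is respected.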
The usage of introduction rules and elimination rules, as well as the method of hypothetical reasoning, is reminiscent of systems of natural deduction for classical or Intuitionistic Logic. The Categorial slashes are
akin to directed implications (with the application rules corresponding to Modus Ponens), while the product is related to conjunction. This resemblance is not accidental; the Lambek calculus is in fact a (very lean) logical calculus. In the subsequent paragraphs, I will make this connection precise by pointing out how exactly the Lambek calculus can be obtained from ordinary classical propositional logic.
2.2.1 The Structural Hierarchy

Consider a standard natural deduction formulation of the classical propositional calculus such as the one given in Figure 1.9. (I consider disjunction a defined operation and omit the corresponding rules.)
  A ⇒ A   (id)

  X ⇒ A    Y, A, Z ⇒ B
  ―――――――――――――――― Cut
      Y, X, Z ⇒ B

  X ⇒ A
  ―――――――― M
  X, B ⇒ A

  X ⇒ A ∧ B
  ――――――― ∧E(1)
    X ⇒ A

  X ⇒ A ∧ B
  ――――――― ∧E(2)
    X ⇒ B

  X ⇒ A    X ⇒ B
  ―――――――――――― ∧I
    X ⇒ A ∧ B

  X ⇒ A → B    X ⇒ A
  ――――――――――――――― →E
        X ⇒ B

  X, A ⇒ B
  ―――――――――― →I
  X ⇒ A → B

  X ⇒ ¬¬A
  ――――――― ¬E
    X ⇒ A

  X, A ⇒ B    X, A ⇒ ¬B
  ―――――――――――――――― ¬I
        X ⇒ ¬A

Figure 1.9. Natural deduction calculus for classical propositional logic
Systems of natural deduction (“ND” henceforth) generally consist of three components. Like any deductive system, they contain the identity axiom scheme and the Cut rule. Second, there are optionally structural rules.7 These are inference rules that do not affect the internal structure of the formulae involved but only rearrange the formulae in the antecedent. In the present example, there is only one such rule, namely “Monotonicity” (abbreviated “M”). Intuitively, it says that not every antecedent formula has to be used in a valid deduction. Antecedent formulae may be redundant and can be ignored if necessary. Finally, the system contains logical rules, i.e., introduction rules and elimination rules for each logical connective.

7 The notion of structural rules is due to Gentzen, 1935.
Antecedents in the ND system for classical logic are implicitly assumed to be sets of formulae. Neither linear order nor multiplicity of a single antecedent formula plays a role. In the Lambek calculus, on the other hand, antecedents are assumed to be sequences of formulae—as in BCG. We can bring classical logic into the sequence based format if we add two more structural rules, Permutation (P) and Contraction (C) (cf. Figure 1.10). They express explicitly that order and multiplicity of resources are irrelevant in this calculus.

  X, A, B, Y ⇒ C
  ――――――――――――― P
  X, B, A, Y ⇒ C

  X, A, A, Y ⇒ B
  ――――――――――――― C
    X, A, Y ⇒ B

Figure 1.10. Additional structural rules
In the presence of these structural rules, we may give different but equivalent logical rules for conjunction and implication (Figure 1.11). (The introduction rule for implication remains unchanged.)

  X ⇒ A ∧ B    Y, A, B, Z ⇒ C
  ―――――――――――――――――――― ∧E
        Y, X, Z ⇒ C

  X ⇒ A    Y ⇒ B
  ――――――――――――― ∧I
   X, Y ⇒ A ∧ B

  X ⇒ A → B    Y ⇒ A
  ――――――――――――――― →E
       X, Y ⇒ B

  X, A ⇒ B
  ―――――――――― →I
  X ⇒ A → B

Figure 1.11. Alternative rules for ∧ and →
As the reader may verify, ∧E(1) and ∧E(2) can be derived from the alternative ∧E of Figure 1.11 by using Monotonicity and Permutation, while the alternative ∧E can be derived from ∧E(1) and ∧E(2) with the help of Contraction and Cut. Conversely, the original ∧I is derivable from the alternative ∧I with Contraction and Permutation, while the alternative ∧I can be derived from the original one with Permutation and Monotonicity. Likewise the two versions of the elimination rule for implication are interderivable by means of Permutation, Contraction and Monotonicity. If we omit the logical rules for negation, we obtain the system of (positive implicational) Intuitionistic Logic. Here conjunction and implication are the only logical connectives (hence the name “positive”). Note that this logic is weaker than just the positive fragment of classical propositional logic. For instance, Peirce’s Law ((A → B) → A) → A is derivable in classical but not in Intuitionistic Logic, even though it does not involve negation. This theorem is characteristic
of classical logic in the sense that adding it as an axiom to Intuitionistic Logic leads us to the positive fragment of classical logic.

Intuitionistic Logic still admits all structural rules. Nevertheless the notion of deduction that underlies this calculus is different from classical logic. Classical logic is concerned with truth of propositions in a Platonic sense, and deduction is basically preservation of truth. Intuitionistic Logic is concerned with proofs. It is a constructive logic; a deduction is valid iff it is possible to construct a proof of the succedent from a proof of the antecedent. So the Intuitionistic notion of deduction is akin to the notion of a computation, and antecedents can be considered as computational resources.

A further step in the direction of a logic that takes computational resources into account (i.e., a "resource conscious logic", to use a fashionable term) is made by omitting the structural rule of Monotonicity. Without this rule, we require that all resources are consumed in a computation, i.e., valid deductions do not admit redundant antecedent formulae. The logic that we obtain in this way is a version of Relevant Logic (the canonical references are Anderson and Belnap, 1975 and Anderson et al., 1992; see Dunn, 1986 for an excellent overview). The characteristic formula which is a theorem in Intuitionistic but not in Relevant Logic is

    A → (B → A)

In the absence of Monotonicity, the two ways to define conjunction in classical or Intuitionistic Logic are not equivalent anymore. In other words, Intuitionistic conjunction splits into two Relevant Logical connectives. To avoid confusion, I use two different symbols for the two Relevant conjunctions, ∧ and •. The corresponding logical rules are given in Figure 1.12.

    X ⇒ A ∧ B            X ⇒ A ∧ B            X ⇒ A    X ⇒ B
    ---------- ∧E(1)     ---------- ∧E(2)     ---------------- ∧I
    X ⇒ A                X ⇒ B                X ⇒ A ∧ B

    X ⇒ A • B    Y, A, B, Z ⇒ C              X ⇒ A    Y ⇒ B
    ----------------------------- •E         ----------------- •I
    Y, X, Z ⇒ C                              X, Y ⇒ A • B

Figure 1.12. Two Relevant conjunctions
In the sequel I reserve the term "conjunction" for ∧ and call • "product" (following the Lambek terminology; in the literature on Relevant Logic
this connective is usually called "fusion"). In Relevant Logic, it still holds that

    A ∧ B ⇒ A • B

but

    A • B ⇒ A ∧ B

is underivable. In Relevant Logic, all antecedent formulae in a deduction have to be consumed, but a given formula can be used arbitrarily many times. A more resource conscious perspective on deduction assumes that antecedent formulae are actually consumed in the process of deduction; so it makes a difference how many instances of a given proposition are available for a deduction. This amounts to dropping the structural rule of Contraction. The resulting system is (the additive-multiplicative fragment of Intuitionistic) Linear Logic (introduced in Girard, 1987). Again there is a characteristic law that is a Relevant, but not a Linear theorem:

    (A → A → B) → A → B

Furthermore product and conjunction are logically independent in Linear Logic; A ∧ B ⇒ A • B is now underivable as well. The only structural rule that is left in Linear Logic is Permutation. If we remove it as well, implication splits into two variants (analogously to the split of conjunction further above in the structural hierarchy). Alternatively to the logical rules for implication given in Figure 1.11, we could use their mirror images:

    X ⇒ A    Y ⇒ A → B              A, X ⇒ B
    --------------------- →E        ------------- →I
    X, Y ⇒ B                        X ⇒ A → B
In the presence of Permutation, the two versions are equivalent. If we do without this structural rule, we reach a logic with two directional versions of implication. Following Lambek's notation, the first one is written as "/" and the second as "\". The resulting logic is a version of the Lambek calculus. The characteristic law that is a theorem in Linear Logic but not in the Lambek calculus is

    (A → B → C) → B → A → C

The original Lambek calculus is obtained if we ignore conjunction and add the requirement that the left hand side of sequents is never empty. To sum up so far, the logical calculi that we considered here form a hierarchy of systems with increasing strength, the Lambek calculus being
the weakest and classical logic the strongest of these systems. Between the Lambek calculus and Intuitionistic Logic, the difference between the calculi is determined by the presence or absence of structural rules. This motivates the name "structural hierarchy" for this pattern, and the cover term "substructural logics" for all calculi that have fewer structural rules than Intuitionistic Logic.8 This hierarchy is summarized in Table 1.1.

    Name                   Characteristic Law           Structural rules
    --------------------------------------------------------------------
    Classical Logic        ((A → B) → A) → A            P, C, M
    Intuitionistic Logic   A → B → A                    P, C, M
    Relevant Logic         (A → A → B) → A → B          P, C
    Linear Logic           (A → B → C) → B → A → C      P
    Lambek calculus        —                            —

Table 1.1. The structural hierarchy
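The summary in Table 1.1 can also be stated as data. The following purely illustrative Python snippet (the dictionary encoding is mine) records the rule sets and checks that they shrink monotonically as we descend the hierarchy:

```python
# The structural-rule column of Table 1.1 as data, with a sanity check
# that each weaker system drops structural rules and never adds any.

RULES = {
    'Classical Logic':      {'P', 'C', 'M'},
    'Intuitionistic Logic': {'P', 'C', 'M'},
    'Relevant Logic':       {'P', 'C'},
    'Linear Logic':         {'P'},
    'Lambek calculus':      set(),
}

HIERARCHY = ['Classical Logic', 'Intuitionistic Logic', 'Relevant Logic',
             'Linear Logic', 'Lambek calculus']

for stronger, weaker in zip(HIERARCHY, HIERARCHY[1:]):
    assert RULES[weaker] <= RULES[stronger]
print('rule sets shrink monotonically down the hierarchy')
```

Note that Classical and Intuitionistic Logic share the same structural rules; what separates them is the characteristic law, not resource management.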
Having thus a variety of resource conscious logics at our disposal, one might wonder which notion of deduction is appropriate for grammatical composition. The structural rule of Monotonicity admits the usage of redundant antecedent formulae. Natural language does not tolerate redundant lexical material, though. Even pleonastic elements have a clear syntactic function and are not really redundant. So an appropriate logic of grammar will do without Monotonicity.

Logics with Contraction admit multiple usage of resources. In its unrestricted form this would be much too powerful for natural language as well. It amounts to free deletion under identity (possibly restricted to adjacent constituents, depending on whether Permutation is available or not). If the grammar of English admitted such an operation, we could wrongly conclude from the grammaticality of (18a) that (18b) is grammatical as well.

(18) a. Anybody who laughed laughed about John.
     b. *Anybody who laughed about John.
(Anaphora phenomena show that natural language sometimes does reuse resources and that therefore a limited version of Contraction might be an appropriate grammatical deductive step. This will be the main point of the subsequent chapters.)

8 A good overview of the landscape of substructural logics, their history and motivations can be found in Restall, 2000.
Finally, unrestricted Permutation is not characteristic of the grammatical resource regime either. Even though languages differ with respect to their freedom of word order, no natural language is actually closed under permutation. So similarly to Contraction, Permutation is only available as a very restricted option in grammar. Hence we can conclude that the Lambek calculus is the obvious candidate for a grammatical general purpose logic, since it embodies the resource management regime that we generally find in natural language.9

9 Arguably even the Lambek calculus is too permissive since it tacitly assumes that antecedents come as sequences rather than as trees. Therefore it is called the associative Lambek calculus. There is plenty of evidence that a notion of constituency is inevitable for adequate grammatical descriptions. A non-associative version of the Lambek calculus is presented in Lambek, 1961. However, I will ignore the issue of associativity/constituency in this book because it is orthogonal to the issues pursued here.

2.2.2 The Curry-Howard Correspondence

Intuitionistic Logic as well as all substructural logics (including the Lambek calculus) are constructive logics. The intuitive meaning of a sequent X ⇒ A is "there is a construction transforming resources X into an A". The different calculi discussed in the previous paragraph differ with respect to the methods of construction they admit, but not with respect to this overall interpretation. So generally speaking, if a sequent X ⇒ A is derivable, there is a corresponding computable function leading from X to A.

The appropriate formal language to talk about functions is the λ-calculus. The Curry-Howard correspondence establishes a profound connection between constructive logics on the one hand and the λ-calculus on the other hand. Loosely speaking, logic tells you what constructions are possible, and λ-terms express how these constructions are to be performed.

I start by recalling the syntax of the typed λ-calculus. Instead of the notation ⟨a, b⟩ for functional types used in linguistic semantics, I will write functional types as a → b. (The usage of an implication arrow for function spaces is no accident.) Also, I extend the simply typed λ-calculus with conjunctive types that express Cartesian products. So the set of types coincides with the set of formulae of Positive Intuitionistic Logic. It is given by the following definition.

Definition 13 (Types) Let a set BTYPE of basic types be given. TYPE is the smallest set such that

1 BTYPE ⊆ TYPE,
2 if A, B ∈ TYPE, then A → B ∈ TYPE, and

3 if A, B ∈ TYPE, then A ∧ B ∈ TYPE.

Functional types figure in the syntactic operations of function application and λ-abstraction. The syntactic operations corresponding to conjunctive types are pair formation ⟨·, ·⟩ and the first and second projection, (·)0 and (·)1. I assume that there are infinitely many variables of each type in TYPE. As in the previous section, I use letters x, y, z, ... as metavariables over variables and M, N, O, ... as metavariables over terms. Also I write M : A as shorthand for "term M has type A".

Definition 14 (Syntax of the typed λ-calculus)

1 Every variable of type A is a term of type A.
2 If M : A → B and N : A, then (M N) : B.
3 If M : A and x : B, then λxM : B → A.
4 If M : A and N : B, then ⟨M, N⟩ : A ∧ B.
5 If M : A ∧ B, then (M)0 : A and (M)1 : B.

The notion of the domain of a type given in the previous section has to be extended appropriately to accommodate conjunctive types.
Definition 15 (Domains) The function Dom is a semantic domain function iff

1 the domain of Dom is TYPE,
2 for all A ∈ TYPE, Dom(A) is a non-empty set,
3 Dom(A → B) = Dom(B)^Dom(A), and
4 Dom(A ∧ B) = Dom(A) × Dom(B).

Given a variable assignment function g that assigns each variable of type A an object from Dom(A), the interpretation function ⟦·⟧ extends g to all terms. I use the notation g[x → a] for the assignment function that is exactly like g except that it maps x to a.
Definition 16 (Interpretation function)

    ⟦x⟧^g       = g(x)
    ⟦(M N)⟧^g   = ⟦M⟧^g(⟦N⟧^g)
    ⟦λxM⟧^g     = {⟨a, ⟦M⟧^g[x→a]⟩ | x has type A and a ∈ Dom(A)}
    ⟦⟨M, N⟩⟧^g  = ⟨⟦M⟧^g, ⟦N⟧^g⟩
    ⟦(M)0⟧^g    = the unique a such that for some b: ⟦M⟧^g = ⟨a, b⟩
    ⟦(M)1⟧^g    = the unique a such that for some b: ⟦M⟧^g = ⟨b, a⟩
So the operation ⟨·, ·⟩ of pair formation in the object language is interpreted as pair formation in the metalanguage. The projection functions (·)0 and (·)1 pick out the first and the second element of an ordered pair, respectively. This interpretation justifies the following reduction relations on λ-terms:
Definition 17

    (λxM)N         ;β  M[N/x]    provided N is free for x in M
    λx(M x)        ;η  M         provided x is not free in M
    (⟨M, N⟩)0      ;β  M
    (⟨M, N⟩)1      ;β  N
    ⟨(M)0, (M)1⟩   ;η  M
β-reduction for implicational types is well-known under the name of λ-conversion, and η-reduction for implicational types expresses extensionality of functions. The corresponding reductions for conjunctive types arise naturally from the semantics of pair formation. Furthermore there is the so-called α-equivalence over λ-terms:

    λyM[y/x] =α λzM[z/x]    provided y, z are not free in M

α-equivalence, β-reduction and η-reduction jointly constitute the notion of αβη-equivalence:

Definition 18 (αβη-equivalence) "=αβη" is the smallest reflexive, transitive and symmetric relation such that

1 if M =α N, then M =αβη N,
2 if M ;β N, then M =αβη N, and
3 if M ;η N, then M =αβη N.

It can easily be verified that this syntactically defined equivalence entails semantic equivalence under arbitrary assignment functions.
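To make the reduction relation of Definition 17 concrete, here is a small Python sketch (the tuple encoding of terms and the function names are my own) of one-step β-reduction for the functional and the conjunctive redexes. Substitution is naive: it assumes the usual freshness conventions instead of implementing α-conversion.

```python
# Sketch: one-step beta-reduction for the term language of Definition 14.
# Terms: ('var', x), ('app', m, n), ('lam', x, m), ('pair', m, n),
# ('proj', i, m) with i in {0, 1}.

def subst(term, x, n):
    """M[N/x]: replace the free occurrences of variable x in term by n."""
    tag = term[0]
    if tag == 'var':
        return n if term[1] == x else term
    if tag == 'app':
        return ('app', subst(term[1], x, n), subst(term[2], x, n))
    if tag == 'lam':
        # x is shadowed inside lambda x, so stop there.
        return term if term[1] == x else ('lam', term[1], subst(term[2], x, n))
    if tag == 'pair':
        return ('pair', subst(term[1], x, n), subst(term[2], x, n))
    if tag == 'proj':
        return ('proj', term[1], subst(term[2], x, n))
    raise ValueError(tag)

def reduce_once(term):
    """Contract the topmost redex from Definition 17, if there is one."""
    tag = term[0]
    if tag == 'app' and term[1][0] == 'lam':     # (lambda x M) N  ~>beta  M[N/x]
        _, x, body = term[1]
        return subst(body, x, term[2])
    if tag == 'proj' and term[2][0] == 'pair':   # (<M, N>)_i  ~>beta  M or N
        return term[2][1 + term[1]]
    return term

# (lambda x . x) y  ~>beta  y
print(reduce_once(('app', ('lam', 'x', ('var', 'x')), ('var', 'y'))))
# (<m, n>)_0  ~>beta  m
print(reduce_once(('proj', 0, ('pair', ('var', 'm'), ('var', 'n')))))
```

Iterating `reduce_once` over all subterms until nothing changes yields a normal form; strong normalization for the typed calculus guarantees that this process terminates.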
A comparison between the Intuitionistic part of the ND system in Figure 1.9 on the one hand and the syntax of the typed λ-calculus (as given in Definition 14) on the other hand reveals a close similarity. The construction of typed λ-terms requires exactly the same reasoning steps as Intuitionistic deduction. The celebrated Curry-Howard isomorphism for Intuitionistic Logic makes this correspondence precise. First some auxiliary notation:
Definition 19 Let Γ be a set of terms. Then

    |Γ| = {A ∈ TYPE | there is a term M ∈ Γ such that M : A}

The syntax of the λ-calculus defines a consequence relation on types. Let M be some term of type A, and let Γ be the set of variables that occur free in M. Then M represents an operation that transforms arbitrary resources with the types |Γ| into an object of type A. According to the Intuitionistic resource management, not all resources have to be consumed in a derivation, so M also represents an operation from supersets of |Γ| to A. The following definition captures this notion of deduction.
Definition 20 X ⇒λ A iff for some term M with M : A, |FV(M)| ⊆ X

Let us illustrate this with a simple example. Suppose the variable x has type e and the variable y has type e → t. Then the term λy.yx has type (e → t) → t. x is the only free variable occurring in this term. Therefore it holds that e ⇒λ (e → t) → t.

To avoid notational confusion, I write X ⇒IL A if the sequent X ⇒ A is derivable in Intuitionistic Logic. The Curry-Howard correspondence says
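The example above can be replayed mechanically. The following Python sketch (encoding and function name are mine) synthesises the type of λy.yx from the assumptions x : e and y : e → t:

```python
# Sketch: a tiny type synthesiser for the application/abstraction
# fragment of Definition 14. Types are 'e', 't', or ('->', a, b);
# abstractions carry the type of the bound variable.

def type_of(term, env):
    tag = term[0]
    if tag == 'var':
        return env[term[1]]
    if tag == 'app':                 # M : a -> b and N : a give (M N) : b
        f, a = type_of(term[1], env), type_of(term[2], env)
        assert f[0] == '->' and f[1] == a, "ill-typed application"
        return f[2]
    if tag == 'lam':                 # binding x : a over M : b gives a -> b
        x, xty, body = term[1], term[2], term[3]
        return ('->', xty, type_of(body, dict(env, **{x: xty})))
    raise ValueError(tag)

# lambda y . (y x)  with  x : e  and  y : e -> t
term = ('lam', 'y', ('->', 'e', 't'), ('app', ('var', 'y'), ('var', 'x')))
print(type_of(term, {'x': 'e'}))   # ('->', ('->', 'e', 't'), 't')
```

The free variable x of type e is the sole resource consumed, which is exactly the content of the sequent e ⇒λ (e → t) → t.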
Theorem 2 (Curry-Howard correspondence) X ⇒λ A iff X ⇒IL A

Proof: As for the only-if direction, observe that each syntactic construction rule for λ-terms is matched by a rule of the ND calculus for Intuitionistic Logic. So this direction of the theorem can be established via induction over the complexity of terms. As for the if direction, there is a simple constructive proof that assigns every Intuitionistic ND derivation a λ-term which embodies the derived sequent. This is done using labeled deduction; every ND step is labeled with a syntactic construction step over λ-terms. The labeled calculus is given in Figure 1.13. Here
letters X, Y, Z, ... range over sets of formulae that are labeled with variables. By convention, X, Y means X ∪ Y and X, x : A is shorthand for X ∪ {x : A}. Variables are always assumed to be distinct unless otherwise stated. It is easy to see that every Intuitionistic proof has a labeled version which delivers the witness λ-term that is required by the theorem.
    x : A ⇒ x : A  (id)

    X ⇒ M : A    Y, x : A, Z ⇒ N : B             X ⇒ M : A
    --------------------------------- Cut        ------------------ M
    Y, X, Z ⇒ N[M/x] : B                         X, x : B ⇒ M : A

    X ⇒ M : A ∧ B    Y, x : A, y : B, Z ⇒ N : C
    --------------------------------------------- ∧E
    Y, X, Z ⇒ N[(M)0/x][(M)1/y] : C

    X ⇒ M : A    Y ⇒ N : B
    ------------------------- ∧I
    X, Y ⇒ ⟨M, N⟩ : A ∧ B

    X ⇒ M : A → B    Y ⇒ N : A                   X, x : A ⇒ M : B
    ----------------------------- →E             -------------------- →I
    X, Y ⇒ M N : B                               X ⇒ λxM : A → B

Figure 1.13. Labeled natural deduction for Intuitionistic Logic
Labeled deduction actually establishes a one-to-one correspondence between typed λ-terms and ND proofs. Therefore one can speak of a Curry-Howard isomorphism. This correspondence was first noted by Curry in Curry and Feys, 1958 for the purely implicational fragment of Intuitionistic Logic. In Howard, 1969 the correspondence is extended
to conjunction as well as to disjunction (the latter will be ignored in this book). From a more general perspective, the Curry-Howard correspondence provides a remarkable link between two apparently unrelated branches of mathematical logic, namely proof theory and function theory. This connection is frequently expressed by the programmatic slogan: Propositions as types, proofs as programs. The terminology "type logic" is motivated by this connection; type logics are constructive logics that admit a functional interpretation. "Type Logical Grammar" is a theory of grammar that makes crucial use of type logics. The usage of the word type here is more or less synonymous to our earlier usage of category, and I will use these terms interchangeably henceforth.

The connection between λ-calculus and Intuitionistic proof theory allows the transfer of results from one area to the other. In particular, the notion of normalization of terms (application of reduction steps to terms until a normal form is reached, i.e., a term that cannot be reduced any further) can be translated into a notion of proof normalization. Well-known properties of term reduction like the Church-Rosser property (confluence of different reduction strategies) and strong normalization (absence of infinite sequences of normalization steps) immediately carry over to ND proofs. A proof normalization step transforms an ND proof of a given sequent into a simpler proof of the same sequent. "Strong normalization" for proofs means that every sequence of normalization steps eventually terminates. To illustrate the concept of proof normalization, an example of a β-reduction and one of an η-reduction for implicational types are given in Figure 1.14 and Figure 1.15.

    X, x : A ⇒ M : B
    ------------------ →I
    X ⇒ λxM : A → B              Y ⇒ N : A
    ---------------------------------------- →E
    X, Y ⇒ (λxM)N : B

    ;

    Y ⇒ N : A    X, x : A ⇒ M : B
    -------------------------------- Cut
    X, Y ⇒ M[N/x] : B

Figure 1.14. β-normalization
There are analogous relations between ND proofs and λ-terms for substructural logics. Limiting the deductive power of the proof theory by omitting structural rules amounts to restricting the syntax of the λ-calculus in certain respects.

    X ⇒ M : A → B    x : A ⇒ x : A (id)
    ------------------------------------ →E
    X, x : A ⇒ M x : B
    --------------------- →I
    X ⇒ λx.M x : A → B

    ;

    X ⇒ M : A → B

Figure 1.15. η-normalization

Relevant Logic requires that every deductive resource is actually used. This has two repercussions for the corresponding Curry-Howard terms. First, every variable on the left hand side of a sequent must in fact occur in the term on the right hand side. Furthermore, every λ-abstractor in a term must bind at least one free variable occurrence. To see why this is so, consider the term

    (λx.y)z

Here the abstractor λx does not bind a free variable occurrence. z is free in this term, i.e., it represents an input slot for the corresponding computation. Normalizing this term leads to

    y

Now z does not occur anymore; the resource z proved to be redundant. This is illicit in Relevant Logic, and the prohibition of empty abstraction excludes such configurations.

Since Relevant Logic distinguishes conjunction and product, a complete correspondence would require two kinds of pair formation operations on the level of terms. To simplify matters, I ignore conjunction and assume that conjunctive types correspond to product formulae. So for Relevant Logic, the correspondence between terms and deductions has to be formulated as
Definition 21 X ⇒λR A iff for some term M without empty abstraction such that M : A, |FV(M)| = X.

The relation ⇒λR represents the deduction relation determined by the fragment of the λ-calculus without empty abstraction and without redundant antecedent formulae, and it coincides with derivability in the multiplicative fragment of Relevant Logic, i.e., Relevant Logic without conjunction (but with product).

Linear Logic is still more restricted because it requires that every resource is used exactly once. On the level of λ-terms this amounts to the requirement that every free variable occurs exactly once, and every λ binds exactly one free variable occurrence. There is a minor complication involved here in connection with the projection functions that come with product elimination. For instance, the sequent

    x : A • B ⇒ ⟨(x)0, (x)1⟩ : A • B

is Linearly derivable even though x apparently occurs twice in the succedent term. What we actually want is that the two occurrences of x count as one. To achieve this, the syntax of the fragment of the λ-calculus that corresponds to (the multiplicative fragment of) Linear Logic has to be defined in a way that sidesteps this problem. This alternative syntax is also closer to the ND calculus for product.
Definition 22 (Linear λ-calculus) Let VAR be the set of typed variables. The set of linear λ-terms ΛLL is the smallest set such that

1 VAR ⊆ ΛLL. FV(x) = {x}

2 If M : A → B, N : A, and FV(M) ∩ FV(N) = ∅, then MN ∈ ΛLL. FV(MN) = FV(M) ∪ FV(N)

3 If M ∈ ΛLL and x ∈ FV(M), then λxM ∈ ΛLL. FV(λxM) = FV(M) − {x}

4 If M, N ∈ ΛLL, and FV(M) ∩ FV(N) = ∅, then ⟨M, N⟩ ∈ ΛLL. FV(⟨M, N⟩) = FV(M) ∪ FV(N)

5 If M, N ∈ ΛLL, x, y ∈ FV(M), N : A ∧ B, x : A, y : B, and FV(M) ∩ FV(N) = ∅, then M[(N)0/x][(N)1/y] ∈ ΛLL. FV(M[(N)0/x][(N)1/y]) = FV(M) ∪ FV(N) − {x, y}

This fragment of the λ-calculus corresponds to a fragment of Linear Logic10 in a way that is analogous to the Curry-Howard correspondence for Intuitionistic Logic.

10 The positive multiplicative fragment of Intuitionistic Linear Logic, to be precise.
Up to this point, there is a neat correspondence between constructive logics (excluding classical logic, which is not constructive), structural rules, and constraints on λ-terms. This is summarized in Table 1.2.

    Name                   Structural rules    Constraints on λ-terms
    ------------------------------------------------------------------
    Intuitionistic Logic   P, C, M             —
    Relevant Logic         P, C                no empty abstraction
    Linear Logic           P                   no multiple abstraction /
                                               no multiple occurrences
                                               of the same variable

Table 1.2. Substructural Curry-Howard correspondences
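The constraints in the right-hand column of Table 1.2 lend themselves to a direct implementation. The sketch below (encoding and function names are mine; conjunctive types are omitted) checks a λ-term for relevance, i.e. no empty abstraction, and for linearity, i.e. every variable used exactly once:

```python
# Sketch: the term constraints of Table 1.2, for terms encoded as
# ('var', x), ('app', m, n), ('lam', x, m).

def count_free(term, x):
    """Number of free occurrences of variable x in term."""
    tag = term[0]
    if tag == 'var':
        return 1 if term[1] == x else 0
    if tag == 'app':
        return count_free(term[1], x) + count_free(term[2], x)
    if tag == 'lam':
        return 0 if term[1] == x else count_free(term[2], x)

def relevant(term):
    """Relevant regime: every lambda binds at least one free occurrence."""
    tag = term[0]
    if tag == 'var':
        return True
    if tag == 'app':
        return relevant(term[1]) and relevant(term[2])
    if tag == 'lam':
        return count_free(term[2], term[1]) >= 1 and relevant(term[2])

def linear_check(term):
    """Linear regime: returns (ok, free_vars); ok iff every lambda binds
    exactly one occurrence and the two halves of an application use
    disjoint free variables."""
    tag = term[0]
    if tag == 'var':
        return True, [term[1]]
    if tag == 'app':
        ok1, fv1 = linear_check(term[1])
        ok2, fv2 = linear_check(term[2])
        return ok1 and ok2 and not set(fv1) & set(fv2), fv1 + fv2
    if tag == 'lam':
        ok, fv = linear_check(term[2])
        return ok and fv.count(term[1]) == 1, [v for v in fv if v != term[1]]

K = ('lam', 'x', ('lam', 'y', ('var', 'x')))            # lambda x y . x
W = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))   # lambda x . x x
print(relevant(K), linear_check(K)[0])   # False False  (empty abstraction)
print(relevant(W), linear_check(W)[0])   # True False   (x used twice)
```

K encodes the characteristic Intuitionistic-but-not-Relevant theorem A → B → A, and W the Relevant-but-not-Linear use of Contraction, so both are correctly rejected by the stricter regimes.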
To extend the Curry-Howard correspondence below Linear Logic, we would have to introduce a version of the λ-calculus that distinguishes between leftward and rightward abstraction and application. Such a system is developed in Wansing, 1993. I will not pursue this line here, though, but use Linear λ-terms as labels for derivations in the Lambek calculus as well. Since the Lambek calculus is a subsystem of Linear Logic, there is a corresponding Linear λ-term for every proof in the Lambek calculus, even though not every Linear term corresponds to a Lambek proof. The official labeled ND presentation of the Lambek calculus L is thus as given in Figure 1.16 below.11

    x : A ⇒ x : A  (id)

    X ⇒ M : A    Y, x : A, Z ⇒ N : B
    ---------------------------------- Cut
    Y, X, Z ⇒ N[M/x] : B

    X ⇒ M : A    Y ⇒ N : B            X ⇒ M : A • B    Y, x : A, y : B, Z ⇒ N : C
    ------------------------- •I      -------------------------------------------- •E
    X, Y ⇒ ⟨M, N⟩ : A • B             Y, X, Z ⇒ N[(M)0/x][(M)1/y] : C

    X, x : A ⇒ M : B                  X ⇒ M : A/B    Y ⇒ N : B
    -------------------- /I           --------------------------- /E
    X ⇒ λxM : B/A                     X, Y ⇒ M N : A

    x : A, X ⇒ M : B                  X ⇒ M : A    Y ⇒ N : A\B
    -------------------- \I           --------------------------- \E
    X ⇒ λxM : A\B                     X, Y ⇒ N M : B

Figure 1.16. Labeled natural deduction for the Lambek calculus L

11 It is based on the ND calculus for the Lambek calculus from Morrill et al., 1990. Lambek, 1958 only gives an axiomatic and a Gentzen style sequent presentation.

In the sequel I will use the notation

    L ⊢ X ⇒ M : A

to indicate that the labeled sequent X ⇒ M : A is derivable in the Lambek calculus L. Likewise I write

    L ⊢ X ⇒ A

if there is a Curry-Howard labeling for the unlabeled sequent X ⇒ A that makes it L-derivable. If no confusion is likely to arise, I will omit the "L" to the left of the turnstile.

The tension between the directional base calculus and the non-directional Curry-Howard labeling has a sound linguistic motivation. The intended usage of the Lambek calculus is a description of grammatical
composition. This process is multi-dimensional; it consists at least of the composition of forms and the composition of meanings. Composition of forms discriminates between left and right, so a general logic of grammatical composition must be non-commutative. On the other hand, meanings are not directed. Curry-Howard labels in general give recipes of how antecedent resources are composed into witnesses for the succedent. For the linguistic application of the Lambek calculus, we use them only to describe the composition of lexical meanings into sentence meanings. Since meanings are not directed, a term language that is less discriminating than the grammatical base logic is sufficient for this purpose.

In the last section, I introduced the format of proof trees as a graphically more appealing notational alternative to the sequent format. I will make use of this format in the context of the Lambek calculus as well. There, ND rules are presented as transformations on graphs as given in Figure 1.17.

Figure 1.17. Natural deduction for the Lambek calculus in tree format

The resulting structures are a bit more complex than in the context of BCG. The identity axiom from the sequent presentation has no counterpart in the tree format since it comes down to the trivial claim that every single node that is labeled by a variable is a tree. Neither does the Cut rule occur explicitly. It corresponds to the general tree building operation of unifying the root node of one tree with a leaf of another tree to form a more complex tree. So we only have logical rules for the three Lambek connectives. The rule •I is a straightforward transposition of its sequent counterpart. •E is somewhat more complex since it introduces downward branching. So the resulting structures are strictly speaking not necessarily trees but only directed acyclic graphs. Downward branching is only possible in the context of a proof graph with a unique root.

The slash introduction rules \I and /I involve discharging of a hypothesis. In tree format, this amounts to a non-local transformation on trees. \I operates on a complete proof tree. It adds a new root at the root of its argument, and it marks the leftmost leaf of its argument as being discharged. This means that this leaf does not count as a premise12 of the deduction anymore. To keep track of the dependency between the discharged leaf and the position where it is discharged in case of multiple hypotheses, the discharged hypothesis and the root node are coindexed. (This mechanism is strictly speaking redundant if we use Curry-Howard labeling since the variable names of the hypothesis labels store this information too.) Rule /I works analogously except that it is the rightmost leaf of its argument proof tree that is discharged. The slash elimination rules are plain local tree admissibility conditions. They coincide with the two application schemes in BCG.

The following theorem states the equivalence between the sequent format and the tree format of the ND presentation of L. (By σ(X) we refer to the result of replacing all commas in the sequent X by products, i.e. σ(A) = A and σ(X, A) = σ(X) • A.)

12 A note on terminology: The antecedents in the sequent style ND system correspond to premises in the tree format, and likewise for succedents vs. conclusions.
Theorem 3 For any sequences X, Y it holds that L ⊢ X ⇒ σ(Y) iff there is a proof tree with X as its sequence of undischarged assumptions and Y as its sequence of conclusions.

Proof: By induction over derivations.
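The translation σ used in the theorem is a simple left fold over the antecedent. A Python sketch (the tuple encoding of product formulae is mine):

```python
# Sketch of the translation sigma from Theorem 3: fold a sequence of
# formulae into a single left-nested product formula, where the tuple
# ('•', a, b) stands for the product a • b.

def sigma(formulas):
    """sigma(A) = A and sigma(X, A) = sigma(X) • A."""
    out = formulas[0]
    for f in formulas[1:]:
        out = ('•', out, f)
    return out

print(sigma(['A']))             # 'A'
print(sigma(['A', 'B', 'C']))   # ('•', ('•', 'A', 'B'), 'C')
```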
2.2.3 Linguistic Application: Lambek Grammars

Now the ground is paved to officially introduce the Lambek calculus as a logic of grammatical composition. Grammars based on the Lambek calculus L are very similar to the Basic Categorial Grammars that were introduced in the previous section. The crucial difference is the fact that the deductive component of the grammar is now given by the logical calculus L rather than just by the application schemes. The formal definition of categories in Lambek grammars coincides with the corresponding definition for BCG and is repeated here for convenience.

Definition 23 (Categories) Let a finite set B of basic categories be given. CAT(B) is the smallest set such that

1 B ⊆ CAT(B)
2 If A, B ∈ CAT(B), then A/B ∈ CAT(B)
3 If A, B ∈ CAT(B), then A\B ∈ CAT(B)

The definition of a Lambek grammar also coincides with the corresponding BCG notion.
Definition 24 (Lambek Grammar) Let Σ be an alphabet. A Lambek grammar G is a triple ⟨B, LEX, S⟩, where B is a finite set (the basic categories), LEX is a finite sub-relation of Σ+ × CAT(B), and S is a finite subset of CAT(B) (the designated categories).

Due to the more inclusive notion of deduction based on L, the same grammar recognizes different languages depending on whether it is conceived as a BCG grammar or a Lambek grammar.
Definition 25 Let G = ⟨B, LEX, S⟩ be a Lambek grammar over the alphabet Σ. Then
α ∈ L(G) iff there are a1, ..., an ∈ Σ+, A1, ..., An ∈ CAT(B), and S ∈ S such that

1 α = a1 ... an,
2 for all i such that 1 ≤ i ≤ n: ⟨ai, Ai⟩ ∈ LEX, and
3 L ⊢ A1, ..., An ⇒ S.

Note that all axioms and rules of BCG are derivable in L. Therefore every BCG derivation is also a Lambek derivation. As a consequence, the language that is recognized by a given grammar G conceived as a BCG grammar is always a sub-language of the language that is recognized by G as a Lambek grammar.

Curry-Howard labels supply every derivation in L with a term of the λ-calculus. These terms play the same role as the labels in BCG derivations; they are interpreted as recipes for composing the meaning of a complex expression from the meanings of the lexical resources involved. Note also that term labeling of BCG derivations is preserved if they are interpreted as Lambek derivations. So switching from BCG to Lambek grammars does not only preserve string recognition but also meaning assignment.

While Lambek grammars preserve the deductive means of BCG, they properly extend it. In pure Lambek grammars, the product connective (and its corresponding logical rules) plays no major role. The main difference with respect to BCG comes from the usage of hypothetical reasoning. There are at least two kinds of phenomena that find much more natural analyses if we have the slash introduction rules at our disposal, namely unbounded dependencies and coordination.
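Since every BCG derivation is also a Lambek derivation, the application-only fragment of such a grammar can be recognised by ordinary chart parsing. The following Python sketch illustrates Definition 25 for this fragment; all names are mine, the toy lexicon is modelled on the lexical entries discussed in the next subsection, and hypothetical reasoning, hence full L-derivability, is deliberately not covered, so failure of this recogniser does not entail underivability in L:

```python
# Sketch: CYK-style recognition for the application-only (BCG) fragment.
# Categories: basic categories as 1-tuples like ('np',); A/B as
# ('/', A, B); A\B as ('\\', A, B).

def combine(a, b):
    """Forward and backward application: A/B, B => A and A, A\\B => B."""
    out = set()
    if a[0] == '/' and a[2] == b:
        out.add(a[1])
    if b[0] == '\\' and b[1] == a:
        out.add(b[2])
    return out

def recognise(words, lexicon, goal):
    n = len(words)
    chart = {(i, i + 1): set(lexicon[w]) for i, w in enumerate(words)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            cell = set()
            for k in range(i + 1, i + span):
                for a in chart[i, k]:
                    for b in chart[k, i + span]:
                        cell |= combine(a, b)
            chart[i, i + span] = cell
    return goal in chart[0, n]

NP, N, S = ('np',), ('n',), ('s',)
lexicon = {
    'the':           [('/', NP, N)],
    'mathematician': [N],
    'who':           [('/', ('\\', N, N), ('\\', NP, S))],   # (n\n)/(np\s)
    'wrote':         [('/', ('\\', NP, S), NP)],             # (np\s)/np
    'Principia':     [NP],
}
print(recognise('the mathematician who wrote Principia'.split(), lexicon, NP))  # True
```

Subject relativization goes through here because it needs only /E and \E; object relativization as in "the book which Russell wrote" would additionally require the introduction rules, which is precisely the extension that distinguishes Lambek grammars from BCG.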
2.2.4 Unbounded Dependencies

Consider a relative clause construction involving subject relativization, like

(19) the mathematician who wrote Principia
The relative clause who wrote Principia as a whole is a postnominal noun modifier, so its category should be n\n. If we assume the standard type assignments for determiners, transitive verbs and proper nouns, the simplest type assignment for the relative pronoun who is (n\n)/(np\s). Furthermore, to derive the desired interpretation ιx.mathematician′x ∧ write′principia′x, we will have to assume the interpreted lexical entry

(20) who – λP Qx.Qx ∧ P x : (n\n)/(np\s)
Figure 1.18. Derivations for (19) and (21)
An ND derivation for (19) is given in Figure 1.18. This derivation is redundant in one respect: a \E step is immediately followed by a \I step. In this configuration, η-reduction is possible. However, it is important to notice that the hypothesis marked with 1 occurs in the subject position of the embedded clause, i.e., at the position where transformational theories would place a trace. This pattern can be extrapolated to other constructions. Quite generally, hypotheses in Type Logical Grammar play a role that is very similar to the role of traces in theories like GB (cf. Chomsky, 1981). Hypothetical reasoning is triggered by lexical entries with higher order categories like (n\n)/(np\s). (This type is called "higher order" because one of its argument categories is a complex category.)

Object relativization works analogously except for the linear position of the hypothesis, which in turn is triggered by the directionality of the higher order part of the category of the relative pronoun. So to derive the complex NP given in (21a), I assume the lexical entry (21b).

(21) a. the book which Russell wrote
     b. which – λP Qx.Qx ∧ P x : (n\n)/(s/np)
Note that subject and object relative pronouns have the same semantics, namely property intersection. The derivation of (21a) is given in figure 1.18. Again, a hypothesis occurs where transformationalists would put a trace. This time the hypothetical reasoning cannot be eliminated via η-reduction. It is easy to see that the distance between the discharged hypothesis and the triggering relative pronoun has no influence on the derivability of wh-dependency. This phenomenon is correctly predicted to be unbounded. Similar analyses can be formulated for other kinds of “A’-movement” like question formation or topicalization. The interested reader is referred to Morrill, 1994 and Carpenter, 1998 for a comprehensive discussion of these issues. Crucially, the natural ND according to these analyses starts by putting a hypothesis at the “base position” of the dislocated element that is later discharged. Note, however, that hypothetical reasoning is a proof technique, not an intrinsic part of our theory of grammar. There are proof theories for the Lambek calculus that are fully equivalent to ND without using anything that would resemble traces. Let me point out two problems that arise in connection with the Lambek treatment of A’-dependencies, both of which motivated further developments of Type Logical Grammar. First, L is too restrictive in a sense because it only admits extraction if the hypothesis to be discharged
(= the “base position” of the “moved” element) is peripheral in the scope of the binding operator. This held for the subject position and—in the previous example—for the object position, but it is straightforward to construct different kinds of counterexamples, as for instance (22)
a. the book that Russell wrote passionately
b. the book that Russell gave to Whitehead
c. the book that Russell gave to Whitehead intentionally
d. the book that Russell thought Whitehead recommended to a student
e. . . .
While it might be possible to account for each of these examples individually by stipulating a new category for the relative pronoun, this strategy evidently misses a generalization: The linear position of the hypothesis is irrelevant for relativization. To cover this insight, we have to assume that the hypothesis has access to the structural rule of Permutation while the overt material hasn’t. So what is apparently needed is a hybrid logic of grammar that fine-tunes structural reasoning to certain linguistic domains. This can be done by extending the Lambek calculus with additional modes of composition to a multimodal type logic. Discussion of these techniques goes beyond the scope of this book though. The interested reader is referred to Morrill, 1994 and Moortgat, 1997. In some respects, the Lambek calculus is also too liberal to adequately formalize extraction phenomena. It treats these dependencies as unbounded in the literal sense, disregarding the fact that extraction is bounded by island constraints. It is for instance impossible to extract a relative pronoun out of a coordinate structure, cf. the following example from Moortgat, 1997: (23)
*the mathematician whom Gottlob admired Kazimierz and Jim detested
Nothing will prevent the derivation of this example in the Lambek calculus though, since the island constituent Gottlob admired Kazimierz and Jim detested has no special status that would prevent the discharging of hypotheses. Here the problem is that the Lambek calculus is too liberal, allowing unrestricted associativity. The solution is again an appropriate multimodal extension of L that blocks associativity wherever necessary.
2.2.5 Coordination and Combinators
As was demonstrated above, Categorial Grammar lends itself to an elegant treatment of non-constituent coordination, provided BCG is extended with appropriate combinators. One major advantage of Type Logical Grammar based on the Lambek calculus is the fact that almost all combinators that have been proposed in an ad hoc fashion in the literature turn out to be theorems of L. Let us start with the two directional variants of type lifting, T> and T<. Their derivation is given in Figure 1.19.

[Figure 1.19. Natural deduction derivations of type raising. Left: from x : A and a hypothesis y : A\B (marked 1), \E yields yx : B, and discharging the hypothesis by /I gives λy.yx : B/(A\B). Right: symmetrically, a hypothesis y : B/A is applied by /E and discharged by \I, giving λy.yx : (B/A)\B.]
The Curry-Howard labeling of the proof supplies the appropriate semantics for type lifting as a side effect. Function composition is derivable as well. It involves two slash elimination steps, followed by one slash introduction step (cf. Figure 1.20).

[Figure 1.20. Natural deduction derivations of function composition. Left: from x : A/B, y : B/C and a hypothesis z : C (marked 1), two /E steps yield x(yz) : A, and /I discharges the hypothesis, giving λz.x(yz) : A/C. Right: symmetrically, from y : A\B, z : B\C and a hypothesis x : A, two \E steps and one \I step give λx.z(yx) : A\C.]
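Since Curry-Howard labels are ordinary λ-terms, the semantics that these derivations assign can be checked by simply executing the terms. A minimal sketch in Python (the function names lift and compose are mine, purely illustrative):

```python
# Curry-Howard labels of type lifting (Figure 1.19) and function
# composition (Figure 1.20), written as curried Python functions.

def lift(x):
    # type lifting: x : A becomes λy.y(x) : B/(A\B)
    return lambda y: y(x)

def compose(x, y):
    # forward composition: x : A/B, y : B/C become λz.x(y(z)) : A/C
    return lambda z: x(y(z))

inc = lambda n: n + 1
dbl = lambda n: 2 * n

assert lift(3)(inc) == 4            # λy.y(3) applied to inc
assert compose(inc, dbl)(5) == 11   # inc(dbl(5))
```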
Other combinators from the literature like the Geach rules (cf. Geach, 1972)

x : A/B ⇒ λyz.x(yz) : A/C/(B/C)
x : B/C ⇒ λyz.y(xz) : (A/B)\(A/C)

and argument lowering (Partee and Rooth, 1983)

x : A/((B/C)\B) ⇒ λy.x(λz.zy) : A/C

are Lambek theorems as well. The Geach rules can immediately be derived from function composition by performing an additional step of slash introduction. Depending on whether the rightmost or the leftmost remaining premise of function composition is discharged, we end up with the harmonic or the disharmonic version of the Geach rule (I leave out the backward looking mirror images as these are essentially equivalent to their forward looking counterparts).
[Figure 1.21. Natural deduction derivations of the Geach rules. Both proofs extend the derivation of function composition (with a hypothesis z : C marked 1) by a second slash introduction step (marked 2): discharging the premise of category B/C yields λyz.x(yz) : A/C/(B/C) with x : A/B remaining, while discharging the premise of category A/B yields λyz.y(xz) : (A/B)\(A/C) with x : B/C remaining.]
The ND derivation of argument lowering essentially involves type lifting of a hypothesis that is later discharged (cf. Figure 1.22).

[Figure 1.22. Natural deduction derivation of argument lowering: a hypothesis y : C (marked 1) is lifted to λz.zy : (B/C)\B via a second hypothesis (marked 2) and fed to x : A/((B/C)\B); discharging the first hypothesis yields λy.x(λz.zy) : A/C.]

It goes without saying that the combinatory analyses of non-standard coordination phenomena sketched above carry over to Lambek grammars since the combinators involved are theorems here. So the Lambek calculus plus the Curry-Howard correspondence offers a principled explanation why these combinatory rules exist and why they have the semantics they have. I conclude this discussion with mentioning two combinators from the literature that are not derivable in L. To account for the cross-serial dependencies in Dutch, Steedman, 2000 proposes a disharmonic variant of function composition:

x : A/B, y : C\B ⇒ λz.x(yz) : C\A

To make this sequent a theorem, we would need access to the structural rule of Permutation. In other words, this combinator is a theorem of Linear Logic, and there it has the semantics Steedman assumes. In Moortgat, 1997 it is shown how Steedman’s analysis can be reproduced in a multimodal version of TLG. Furthermore Steedman (op. cit.) assumes a version of Curry and Feys’ combinator S to handle parasitic gap constructions:

x : A/B, y : (A\C)/B ⇒ λz.yz(xz) : C/B

The (Relevant) derivation of this sequent requires both Permutation and Contraction. Again, Moortgat, 1997 shows how this can be done multimodally while avoiding a collapse into full Relevant Logic.
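The two underivable combinators likewise have simple λ-term semantics, independently of their derivability in L; a small executable sketch (the names disharmonic and S are mine):

```python
# The disharmonic composition and combinator S discussed above, as
# plain λ-terms (their semantics, independent of derivability in L).

def disharmonic(x, y):
    # x : A/B, y : C\B  =>  λz.x(y(z)) : C\A
    return lambda z: x(y(z))

def S(x, y):
    # Curry and Feys' S: x : A/B, y : (A\C)/B  =>  λz.y(z)(x(z))
    # note that the argument z is used twice (hence Contraction)
    return lambda z: y(z)(x(z))

inc = lambda n: n + 1
pair = lambda b: lambda a: (a, b)   # builds a pair from two arguments

assert disharmonic(inc, lambda n: 3 * n)(2) == 7   # inc(3*2)
assert S(inc, pair)(4) == (5, 4)                   # pair(4)(inc(4))
```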
2.2.6 Sequent Presentation, Cut Elimination and Decidability
The decidability proof for BCG given above rests on the fact that in BCG-derivable sequents X ⇒ A, the succedent A is a subformula of one of the antecedent formulae in X. It is easy to see that neither the slash introduction rules nor the product introduction rule in the ND presentation of L necessarily preserve this property. In Lambek, 1958 another strategy for a decidability proof is pursued. Rather than using ND, Lambek formulates his calculus as a Gentzen style sequent system. Like the sequent presentation of ND, this proof format involves reasoning with sequents, and the two formats are very similar, but they also differ in crucial respects. Recall that the logical rules in the ND calculus either introduce or eliminate logical connectives on the succedent side of sequents. Sequent rules only introduce logical connectives, but there are also rules that introduce them on the antecedent side of sequents. So in a sequent system, there is a left rule (also called “rule of use”) and a right rule (“rule of proof”) for every connective. Together with these logical rules, the sequent system for L comprises the identity axiom scheme and the Cut rule. Structurally stronger systems also employ structural rules (that are identical to the corresponding ND rules). Curry-Howard labeling can be applied to sequent proofs as well, even though the connection between the syntax of the labels and the structure of the proof is not as close as in the ND system. The labeled version of the sequent system for L from Lambek, 1958 is given in Figure 1.23.
id:  x : A ⇒ x : A

Cut: from X ⇒ M : A and Y, x : A, Z ⇒ N : B, infer Y, X, Z ⇒ N[M/x] : B

•R:  from X ⇒ M : A and Y ⇒ N : B, infer X, Y ⇒ ⟨M, N⟩ : A • B

•L:  from X, x : A, y : B, Y ⇒ M : C, infer X, z : A • B, Y ⇒ M[(z)0/x][(z)1/y] : C

/R:  from X, x : B ⇒ M : A, infer X ⇒ λx.M : A/B

\R:  from x : A, X ⇒ M : B, infer X ⇒ λx.M : A\B

/L:  from X ⇒ M : A and Y, x : B, Z ⇒ N : C, infer Y, y : B/A, X, Z ⇒ N[(yM)/x] : C

\L:  from X ⇒ M : A and Y, x : B, Z ⇒ N : C, infer Y, X, y : A\B, Z ⇒ N[(yM)/x] : C

Figure 1.23. Labeled sequent presentation of the Lambek calculus L
The rules of proof in the sequent system coincide with the ND introduction rules. The rules of use differ from the corresponding elimination rules, but it is easy to see that the ND rules and the corresponding sequent rules are interderivable using the Cut rule. Hence both systems derive the same set of theorems. So in a sense the sequent system is a notational variant of ND. However, the former has an important feature that is crucial to establish decidability: the subformula property. Apart from the Cut rule, every formula in the premise of a sequent rule is a subformula of some formula in the conclusion. To be more precise, the premises of a sequent rule consist of exactly the same material except that they contain one connective less. So doing Cut free bottom up proof search for a given sequent will always reduce complexity and is thus guaranteed to terminate. So to establish decidability for L, it is sufficient to show that the Cut rule is admissible in the sequent system for L without Cut. (A rule is admissible if adding it does not increase the set of derivable theorems.)
The proof of this fact is the central part of Lambek, 1958. I give a sketch of Lambek’s proof here.
Theorem 4 (Cut Elimination) If ⊢L X ⇒ A, then there is a Cut-free sequent proof of X ⇒ A.

Sketch of Proof: We define the complexity of a type or of a sequence of types as the number of symbols (i.e., atomic formulae and logical connectives) occurring in it. I notate the complexity of X as d(X). Now consider a schematic instance of the Cut rule:

from X ⇒ A and Y, A, Z ⇒ B, infer Y, X, Z ⇒ B   (Cut)

The degree of this instance of Cut is given by the definition

d(X) + d(Y) + d(Z) + d(A) + d(B)

Lambek shows that every sequent proof that uses Cut exactly once can be transformed into a proof of the same sequent that is either Cut free or else uses one or two Cuts of a lower degree than the Cut in the given proof. In the latter case, this transformation can be applied to the one or two subproofs rooted by the new Cuts. Since the degree of a Cut is always finite and non-negative, repeated application of this transformation eventually eliminates all instances of Cut in the original proof. (I omit Curry-Howard labels in this proof. It can be shown that Cut elimination always leads to a proof term in β-normal form that is αβη-equivalent to the original proof term.) Some terminology: The formula in a Cut application that matches the A in the scheme above is called the Cut formula. All logical rules of the sequent presentation of L introduce exactly one new logical connective. In other words, each logical rule creates exactly one new formula, while all other formulae in the conclusion already occur in the premise. I will call this newly created formula the active formula of a logical rule. Now suppose we have a proof containing some Cut application. Then the proof contains at least one Cut that is not dominated by any other Cut. Now we may distinguish three cases.

1 At least one premise of the Cut is an identity axiom.
2 Both premises are results of logical rules, and the Cut formula is the active formula in both premises.
3 Both premises are results of logical rules, and the Cut formula is not the active formula in one premise.
Consider the first case. Schematically, this looks like

from A ⇒ A and X, A, Y ⇒ B, infer X, A, Y ⇒ B   (Cut)

or

from X ⇒ A and A ⇒ A, infer X ⇒ A   (Cut)

In either case, the conclusion is identical to one of the premises, and the Cut as a whole can be removed from the proof. Now consider the second case. Suppose the Cut formula is A ∘ B, for some logical connective ∘. If this formula is active in both premises, the two premises must be results of the logical rules ∘R and ∘L respectively. Therefore the two subformulae A and B of the Cut formula each occur once in an antecedent and once in a succedent of the premises of the Cut. I illustrate this for the case ∘ = /; the other two cases are analogous.

from X, B ⇒ A, infer X ⇒ A/B   (/R)
from Z ⇒ B and Y, A, W ⇒ C, infer Y, A/B, Z, W ⇒ C   (/L)
from X ⇒ A/B and Y, A/B, Z, W ⇒ C, infer Y, X, Z, W ⇒ C   (Cut)

We may replace the original Cut by two Cuts which have A and B respectively as active formulae. In the example above, the result of this operation is

from X, B ⇒ A and Y, A, W ⇒ C, infer Y, X, B, W ⇒ C   (Cut)
from Z ⇒ B and Y, X, B, W ⇒ C, infer Y, X, Z, W ⇒ C   (Cut)

The final conclusion here is identical to the conclusion of the original Cut, and the original Cut is replaced by two Cuts of a lower degree. Now consider the third and final case, where the Cut formula is inactive in at least one premise. Then the Cut formula occurs in the antecedent of this premise as well, and we may permute the Cut rule with the logical rule. Again I illustrate just one subcase. The other five subcases are analogous.

from X, D, W ⇒ A and Y, B, Z ⇒ C, infer Y, B/A, X, D, W, Z ⇒ C   (/L)
from U ⇒ D and Y, B/A, X, D, W, Z ⇒ C, infer Y, B/A, X, U, W, Z ⇒ C   (Cut)

becomes

from U ⇒ D and X, D, W ⇒ A, infer X, U, W ⇒ A   (Cut)
from X, U, W ⇒ A and Y, B, Z ⇒ C, infer Y, B/A, X, U, W, Z ⇒ C   (/L)

Here a Cut is replaced by one Cut of a lower degree. So in each of the three cases to be considered, it is possible to replace a Cut by at most two Cuts of a lower degree without affecting the ultimate conclusion. Repeated application of this procedure will thus transform every proof into a Cut free proof. There is a simple decision procedure for the Cut free sequent calculus, which leads to the important
Theorem 5 (Decidability (Lambek, 1958)) Derivability in L is decidable.

Proof: The conclusion sequent of each rule of the Cut free sequent calculus contains more symbols than its premises (since each formula in the premise occurs as a subformula in the conclusion and each logical rule introduces one logical connective). Furthermore there are only finitely many ways to match a given sequent with the conclusion of some sequent rule. Therefore there are always at most finitely many options to continue bottom up proof search, and every branch of the proof search tree is finite. This entails that the proof search space as a whole is finite. As an immediate corollary to this proof, we obtain the finite reading property of L:
Corollary 1 (Finite reading property) For a given unlabeled L-sequent, there are at most finitely many Curry-Howard labelings.

Linguistically speaking this means that we may derive at most finitely many different interpretations for a given string of lexical items (provided no infinite lexical ambiguities are involved). Given that different proofs of a given sequent correspond to structural ambiguities rather than vagueness etc., this is certainly a desired result. In particular, this fact indicates that Lambek grammars do not suffer from the problem of infinite spurious ambiguities, despite appearance to the contrary. Consider a simple sequence of a subject followed by a VP, i.e., a sequence of categories

x : np, y : np\s
A simple application of Modus Ponens will yield the goal type s (augmented with the semantic label (yx)). However, it is also possible to perform type lifting with the subject, yielding the sequence

λz.zx : s/(np\s), y : np\s

Now backward application leads to the succedent ((λz.zx)y) : s. Still, there are more options. Rather than performing application, we can type lift the VP, which gives us

λz.zx : s/(np\s), λw.wy : (s/(np\s))\s

From here we can derive (λw.wy)(λz.zx) : s via backward application. This iterated type lifting could be repeated arbitrarily many times, thus leading to infinitely many different derivations of the same sequent, np, np\s ⇒ s. However, all these derivations except the first one use the Cut rule. Applying Cut elimination to any of these derivations returns the original Cut free derivation which just consists of one application step.
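The Cut free bottom up proof search sketched above is straightforward to implement for the product free fragment. The following Python sketch is mine, not Lambek's: it uses only the identity axiom and the four slash rules of Figure 1.23, so every recursive call removes one logical connective and search terminates (no memoization, so it is exponential in the worst case):

```python
# Backward proof search for the product-free Lambek calculus.
# Formulae: atoms are strings; ("/", A, B) is A/B; ("\\", A, B) is A\B.

def prove(ant, goal):
    ant = tuple(ant)
    if len(ant) == 1 and ant[0] == goal:           # identity axiom
        return True
    if isinstance(goal, tuple):                    # right rules
        op, a, b = goal
        if op == "/" and prove(ant + (b,), a):     # /R: hypothesis at right
            return True
        if op == "\\" and prove((a,) + ant, b):    # \R: hypothesis at left
            return True
    for i, fm in enumerate(ant):                   # left rules
        if not isinstance(fm, tuple):
            continue
        op, a, b = fm
        if op == "/":        # fm = a/b consumes b-material to its right
            for j in range(i + 2, len(ant) + 1):
                if prove(ant[i+1:j], b) and prove(ant[:i] + (a,) + ant[j:], goal):
                    return True
        if op == "\\":       # fm = a\b consumes a-material to its left
            for j in range(i):
                if prove(ant[j:i], a) and prove(ant[:j] + (b,) + ant[i+1:], goal):
                    return True
    return False

vp = ("\\", "np", "s")
assert prove(["np", vp], "s")                      # np, np\s => s
assert prove(["np"], ("/", "s", vp))               # type lifting np => s/(np\s)
assert prove([("/", "a", "b"), ("/", "b", "c")], ("/", "a", "c"))  # composition
assert not prove([vp, "np"], "s")                  # no Permutation in L
```

Since the antecedent order is respected and no hypothesis can be permuted, the prover correctly rejects the disharmonic patterns discussed in Section 2.2.5.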
2.2.7 Axiomatic Presentation
In Lambek, 1958 L is first introduced by means of an axiomatic system. I will briefly review this perspective here too. The theorems of this system are arrows rather than sequents, i.e., pairs of formulae of L. An arrow consisting of the source A and the target B is represented as

A → B

The axiomatic presentation of L consists of a set of axioms, i.e., arrows, and a set of rules, i.e., relations between arrows. Arrows represent deductions, so we assume that the relation expressed by arrows is reflexive and transitive (i.e., a pre-order). This is covered by the identity axiom scheme and the Cut rule:

A → A   (id)
from A → B and B → C, infer A → C   (Cut)
We consider the product operator to be the primary connective. Working in the associative system L, we assume axiom schemes that guarantee that the product is associative:
A • (B • C) → (A • B) • C   (α)
(A • B) • C → A • (B • C)   (α−1)
To introduce the two implications, Lambek makes use of the notion of “residuated functions”. Suppose some set M is ordered by a pre-order ≤, and let f and g be functions on M. Then f and g are residuated functions iff the following holds for all x, y ∈ M:

f x ≤ y iff x ≤ gy

The product operator defines two unary functions for each type A, namely B ↦ B • A and B ↦ A • B. The two implications “\” and “/” define two unary functions for every type A as well, namely B ↦ B/A and B ↦ A\B. The axiomatics of L requires that these two implicational functions form residuated pairs with the two product functions (where the domain is identified with the set of types and the pre-order with the arrow). Lambek expresses this idea by means of two biconditionals:

B → A\C iff A • B → C iff A → C/B

This amounts to the following four inference rules (the labels are from Lambek, 1988):

from A • B → C, infer A → C/B   (β)
from A → C/B, infer A • B → C   (β−1)
from A • B → C, infer B → A\C   (γ)
from B → A\C, infer A • B → C   (γ−1)
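The residuation biconditionals can be tested concretely in a language model: categories as sets of strings ordered by inclusion, with • as elementwise concatenation and the slashes as its residuals. The following sketch and all its names are mine:

```python
# Sets of strings ordered by ⊆, with • as elementwise concatenation.
# C/B and A\C are the right and left residuals of the product.

def prod(X, Y):            # X • Y
    return {x + y for x in X for y in Y}

def rdiv(C, B):            # C/B: the largest X with X • B ⊆ C
    cands = {c[:i] for c in C for i in range(len(c) + 1)}
    return {x for x in cands if all(x + b in C for b in B)}

def ldiv(A, C):            # A\C: the largest Y with A • Y ⊆ C
    cands = {c[i:] for c in C for i in range(len(c) + 1)}
    return {y for y in cands if all(a + y in C for a in A)}

A, B = {"a", "ab"}, {"b"}
C = {"ab", "abb", "ba"}

# A • B ⊆ C iff A ⊆ C/B iff B ⊆ A\C: all three hold here ...
assert prod(A, B) <= C and A <= rdiv(C, B) and B <= ldiv(A, C)

# ... and all three fail together when the product leaves C
B2 = {"a", "b"}
assert not prod(A, B2) <= C and not A <= rdiv(C, B2) and not B2 <= ldiv(A, C)
```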
The inference relation defined by the axiomatic system in fact coincides with the ND presentation and the sequent presentation of L. A sequent is L-derivable iff replacing all commas in the antecedent by products (and replacing ⇒ by →) yields a derivable arrow. To be more precise:

Definition 26 (•-closure)
1 σ(A) = A
2 σ(X, A) = σ(X) • A
Theorem 6 ⊢L X ⇒ A iff σ(X) → A

It is easy to see that the axioms and inference rules of the axiomatic system are derivable in the ND calculus (β and γ correspond to slash introduction and β−1, γ−1 to slash elimination). To see that the other direction holds as well, first observe that in the presence of Cut, the slash elimination rules are equivalent to the application axioms of BCG. These are easily derivable in the axiomatic system:

A\B → A\B   (id), hence A • A\B → B by γ−1
A/B → A/B   (id), hence A/B • B → A by β−1
As mentioned above, β and γ are notational variants of the slash introduction rules. The more general sequent formulation of Cut, as well as the two logical rules for the product, come down to the requirement that the product is monotonic in both arguments. This follows from the logic of residuation in the following way. Given C → D and A → B:

B • D → B • D   (id)
D → B\(B • D)   (γ)
C → B\(B • D)   (Cut, with C → D)
B • C → B • D   (γ−1)
B → (B • D)/C   (β)
A → (B • D)/C   (Cut, with A → B)
A • C → B • D   (β−1)
2.2.8 Model Theory
The Lambek calculus and similar type logical calculi are used to reason about linguistic objects. A sequent like np, np\s ⇒ s for instance expresses the fact that a linguistic item of category np, followed by an item of category np\s, jointly constitute a complex object of category s. Model theory for the Lambek calculus studies the relation between syntactic categories and the set of objects they classify.13

13 There is a source of confusion here: model theory is a branch of semantics. The semantics of syntactic categories is to be distinguished though from the semantics of linguistic signs.
A very general approach to the semantics of substructural logics is interpretation in ternary frames. This is inspired by the possible world semantics for standard modal logic, where unary modalities are interpreted relative to a binary accessibility relation between possible worlds. Likewise, the binary operators \, •, and / in L and related calculi can be interpreted relative to a ternary relation.
Definition 27 (Frames) A ternary frame F = ⟨W, R⟩ consists of a non-empty set of points W and a ternary relation R ⊆ W³ on W.

The associative Lambek calculus L is related to a special class of ternary frames, the associative frames.
Definition 28 (Associative frames) A frame F = ⟨W, R⟩ is associative iff for all x, y, z, u, v ∈ W:

Rxyz ∧ Rzuv → ∃w ∈ W. Rwyu ∧ Rxwv

and

Rxyz ∧ Ryuv → ∃w ∈ W. Rwvz ∧ Rxuw

The set W can be thought of as the set of linguistic signs, and Rxyz intuitively means: x can be decomposed into y and z (in that order). The two associativity postulates above thus express that linguistic composition is associative. Graphically this can be depicted as follows. Triples from R are displayed as elementary trees, with the first element as root and the other two as leaves:14

[Diagram: a tree with root x and leaves y and z, where z itself branches into u and v, is equivalent to a tree with root x and leaves w and v, where w branches into y and u.]
A frame is turned into a model for L if an interpretation function for formulae is added.
Definition 29 (Model for L) M = ⟨W, R, f⟩ is a model for L iff ⟨W, R⟩ is an associative frame, and f is a function from the set of basic categories to subsets of W.

The meaning / semantics / interpretation / denotation of a category like np is the set of signs of that category, while the meaning / semantics / interpretation / denotation of a particular np, say John, is an object in the world (here the person John).

14 I owe the graphical representation of ternary frames to Natasha Kurtonina (p.c.).
We can extend the function f to an interpretation function for all syntactic categories and strings of categories:
Definition 30 (Interpretation for L) Let B be a set of basic categories and M = ⟨W, R, f⟩ a model for L.

‖p‖M = f(p) iff p ∈ B
‖A • B‖M = {x | ∃y ∈ ‖A‖M ∃z ∈ ‖B‖M. Rxyz}
‖A\B‖M = {x | ∀y ∈ ‖A‖M ∀z. Rzyx → z ∈ ‖B‖M}
‖A/B‖M = {x | ∀y ∈ ‖B‖M ∀z. Rzxy → z ∈ ‖A‖M}
‖X, Y‖M = {x | ∃y ∈ ‖X‖M ∃z ∈ ‖Y‖M. Rxyz}
We say that an object w verifies a formula or sequence X (relative to the model M) iff w is an element of the interpretation of X (relative to M). A sequent X ⇒ A is valid iff it always preserves truth, i.e., whenever an object w verifies the antecedent X in a model M, it also verifies the succedent A relative to M.
Definition 31 (Validity) |= X ⇒ A iff for each model M for L: ‖X‖M ⊆ ‖A‖M

In Došen, 1992 it is proved that L is sound and complete with respect to the class of ternary frames—the theorems of L are exactly the valid sequents. I only sketch the proof here.
Theorem 7 (Došen, 1992) ⊢L X ⇒ A iff |= X ⇒ A

Sketch of proof: Soundness: We prove this via induction over the length of axiomatic derivations. It is obvious that each identity axiom is valid. Likewise, Cut obviously preserves validity. Consider the first associativity rule, α. Suppose x verifies the left hand side. Then there must be objects y and z such that y verifies A, z verifies B • C, and Rxyz. From the latter fact we infer that there are objects u and v with Rzuv, where u verifies B
and v verifies C. Since we are talking about associative frames, there must be an object w such that Rxwv and Rwyu. Therefore w verifies A • B, and x verifies (A • B) • C, i.e., x also verifies the right hand side of the axiom. The validity of α−1 is proved analogously. Now suppose the premise of the rule β is valid. This means that in each model:

∀x, y, z. y ∈ ‖A‖ ∧ z ∈ ‖B‖ ∧ Rxyz → x ∈ ‖C‖

Suppose furthermore that w verifies the antecedent of the conclusion, i.e., w ∈ ‖A‖. It follows immediately from the interpretation of / that w also verifies C/B. By a similar argument it can be shown that γ is also validity preserving. Finally, suppose the premise of β−1 is valid. Then it holds in each model that

∀x ∈ ‖A‖ ∀y ∈ ‖B‖ ∀z. Rzxy → z ∈ ‖C‖

Simple first order reasoning transforms this into

∀z. (∃x ∈ ‖A‖ ∃y ∈ ‖B‖. Rzxy) → z ∈ ‖C‖

which means that the conclusion is also truth preserving in that model. Hence the conclusion is also valid. By a similar argument it can be shown that γ−1 is also validity preserving. These arguments prove that each arrow of the axiomatic system is valid. Since by definition ‖σ(X)‖ = ‖X‖, by Theorem 6 every theorem of L is valid.

Completeness: We start by constructing a canonical model. The set W is the set of types of L, and R is defined as

RABC iff ⊢L A ⇒ B • C

It follows directly from the associativity of the product in L that this is in fact an associative frame. The interpretation function f is defined as

f(p) = {A | ⊢L A ⇒ p}
Next we show that it holds for all types A and B that

A ∈ ‖B‖ iff ⊢L A ⇒ B

This is proved by induction over the complexity of B. For atomic B it follows directly from the construction of f. So suppose B = C • D. Assume that A ∈ ‖C • D‖. By the semantics of the product, there are A1 and A2 with A1 ∈ ‖C‖, A2 ∈ ‖D‖, and RAA1A2. By induction hypothesis, ⊢L A1 ⇒ C and ⊢L A2 ⇒ D. From the way R is defined it follows that ⊢L A ⇒ A1 • A2. Since the product is upward monotonic in both arguments, it follows that ⊢L A ⇒ C • D. Now suppose ⊢L A ⇒ C • D. Hence RACD. By induction hypothesis, C ∈ ‖C‖ and D ∈ ‖D‖. Hence A ∈ ‖C • D‖.

Let B = C/D. Assume that A ∈ ‖C/D‖. By the semantics of /, if d verifies D and ReAd, then e verifies C. By induction hypothesis, the type D verifies D. Thus for all e, if ReAD then ⊢L e ⇒ C. By the construction of R, it holds that R(A • D)AD. Hence ⊢L A • D ⇒ C. It follows directly that ⊢L A ⇒ C/D. Conversely, assume that ⊢L A ⇒ C/D. Also, assume for some d and e that d verifies D, and ReAd. Then by induction hypothesis, ⊢L d ⇒ D, and by the construction of R, ⊢L e ⇒ A • d. Since the product is upward monotonic in both arguments, we infer that ⊢L e ⇒ (C/D) • D, and hence ⊢L e ⇒ C, i.e., by induction hypothesis e verifies C. From this we conclude that A verifies C/D. The argument for B = C\D is analogous.

Finally, suppose that a sequent X ⇒ A is not derivable in L. Then σ(X) ⇒ A is not derivable either. By the above argument, σ(X) verifies X in the canonical model, but it does not verify A. Hence X ⇒ A is not valid. By contraposition it follows that every valid sequent is derivable.

The method of interpretation in relational frames is very general, and the above result can be extended in various ways. Here we considered interpretation of a family of binary residuated operators in ternary frames. In general, any family of n-ary residuated operators can be interpreted relative to an n + 1-ary relation. There is a close correspondence between structural rules and frame conditions.
The associativity of L corresponds to the associativity conditions on R. The non-associative Lambek calculus is sound and complete with respect to the class of all ternary frames. Requiring that
Rxyz ↔ Rxzy leads to the class of frames which is described by the non-associative commutative Lambek calculus, etc. Kurtonina, 1995 gives a general perspective on the correspondence between frame conditions and structural rules. L is also complete with respect to specialized sub-classes of the associative ternary frames. For instance, we can consider interpretation in ordered groupoids, where the set of objects is an algebra with a binary associative operation + and a pre-order ≤. Rxyz can then be defined as y + z ≤ x. Došen, 1992 also showed completeness of L for ordered groupoids. An even more concrete sub-class of ordered groupoids are language frames. There W is a set of strings, and Rxyz means that x = yz. Pentus, 1994 proved completeness of L in this class of frames. Another option is to identify W with the set of pairs of some set of states S. Types are then interpreted as sets of pairs, i.e., as binary relations. Our ternary relation R is definable as R(ab)(cd)(ef) iff a = c, d = e, and f = b. This interpretation was proposed in van Benthem, 1991. Kurtonina, 1995 proves completeness of L1—the variant of L where sequents with empty antecedents are permitted—in this class of frames (see also Pankrat’ev, 1994 and Andréka and Mikulás, 1994 for completeness results for L in a similar class of frames).
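The interpretation clauses of Definition 30 can be made concrete in a small finite fragment of a language frame (W a set of strings, Rxyz iff x = yz). The lexicon and sample sentence below are my own illustrative assumptions; note that because W is truncated to a finite set, strings that never combine with anything fall vacuously into slash categories:

```python
# A finite fragment of a language frame: W is a set of strings, and
# R x y z holds iff x = y + " " + z (word concatenation).

toks = "john sees mary".split()
W = {" ".join(toks[i:j]) for i in range(len(toks))
     for j in range(i + 1, len(toks) + 1)}

f = {"np": {"john", "mary"}, "s": {"john sees mary"}}

def interp(c):
    # Definition 30, restricted to the finite string set W
    if isinstance(c, str):
        return f[c]
    op, a, b = c
    if op == "*":                       # A • B
        return {y + " " + z for y in interp(a) for z in interp(b)} & W
    if op == "\\":                      # A\B
        A, B = interp(a), interp(b)
        return {x for x in W
                if all(y + " " + x in B for y in A if y + " " + x in W)}
    if op == "/":                       # A/B
        B, A = interp(b), interp(a)
        return {x for x in W
                if all(x + " " + y in A for y in B if x + " " + y in W)}

# np, np\s => s is verified in this model: every decomposable verifier
# of the antecedent verifies the succedent.
assert interp(("*", "np", ("\\", "np", "s"))) <= interp("s")
# s => np fails: the sentence is not a noun phrase.
assert not interp("s") <= interp("np")
```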
2.2.9 Generative Capacity and Complexity
It was (correctly) conjectured by Chomsky, 1963 that Lambek grammars weakly recognize exactly the context free languages,15 but the problem remained open until the breakthrough of Pentus, 1993, where the Chomsky conjecture is proved. The inclusion of the context free languages in the Lambek recognizable languages can be established using basically the same argument that is used in the corresponding proof for Basic Categorial Grammars in Bar-Hillel et al., 1960 (sketched on page 10). This fact was noted in Cohen, 1967. Crucially, transforming a context free grammar in Greibach Normal Form into a Basic Categorial Grammar in the way described above creates a grammar where all lexical categories are first order formulae, where “first order” is defined as
15 On p. 413, Chomsky writes “It is not known how Lambek’s system is related to bidirectional categorial systems or context-free grammars, although one would expect to find that the relation is quite close, perhaps as close as weak equivalence.” Thanks to Makoto Kanazawa (p.c.) for pointing out this reference to me.
Definition 32
1 Every atomic formula is first order.
2 If A is first order and p atomic, then A/p and p\A are first order as well.
3 Nothing else is first order.

Furthermore, this construction yields only atomic designated categories. Thus the string recognition task for such a grammar always boils down to the question whether a sequent of the form A1, . . . , An ⇒ s is derivable, where Ai is a first order formula and s is atomic. So we only have to consider a fragment of the full Categorial type language. It is easy to see that any application of a slash introduction rule in the ND presentation for L leads beyond this fragment, and the same holds for the product rules. Thus such a sequent can only be derived in L by means of the identity axiom scheme, Cut, and the slash elimination rules. These rules are also rules of BCG. So for the first order fragment, derivability in BCG and in L coincide, and it does not make a difference for the language recognized whether a first order grammar is conceived as a BCG or a Lambek grammar. Since every CFG can be transformed into a weakly equivalent BCG which is always first order, it can also be transformed into a weakly equivalent Lambek grammar. Pentus, 1993 showed that the inclusion holds in the other direction as well, i.e., every Lambek grammar can be transformed into a weakly equivalent CFG. Again I restrict myself to a sketch of the proof idea. For a very accessible discussion of the proof, the reader is referred to Buszkowski, 1997. Pentus’ proof crucially relies on the fact that a version of the interpolation theorem holds for L. This was proved in Roorda, 1991. Let us use the notation π(A) (π(X)) to refer to the multiset of atomic formulae occurring in the formula A (the sequence X). So it holds that
Definition 33
π(p) = {p}    (p atomic)
π(A/B) = π(A\B) = π(A • B) = π(A) ∪ π(B)
π(A, X) = π(A) ∪ π(X)

Note that ∪ means multiset union here (and ⊆ below multiset inclusion). The interpolation theorem runs as follows:
Theorem 8 (Interpolation Theorem) Let X, Y, Z ⇒ A be an L-derivable sequent. Then there is a formula B with the properties:
Type Logical Grammar: The Framework
⊢L Y ⇒ B
⊢L X, B, Z ⇒ A
π(B) ⊆ π(Y) ∩ (π(X) ∪ π(Z) ∪ π(A))

This theorem basically says that, starting from a derivable sequent X, Y, Z ⇒ A and a subsequence Y of its antecedent, it is possible to represent this sequent as the conclusion of a Cut application in such a way that the Cut formula is composed solely from the atoms that are shared between the two premises of the Cut rule. The proof of this theorem is a routine induction over sequent derivations and is omitted here. The interested reader is referred to Roorda, 1991. In the sequel, I will call a formula B with the properties given in the formulation of the interpolation theorem above an interpolant of Y.

The crucial step in Pentus’ proof is the binary reduction lemma, which strengthens Roorda’s theorem. (By |π(A)| I mean the cardinality of the multiset π(A), which is a measure of the length of A.)
Lemma 1 Let A1 , . . . , An ⇒ B be an L-derivable sequent with n ≥ 2. Then there is a k with 1 ≤ k < n and a type C such that
C is an interpolant of the sequence Ak , Ak+1
∀i(1 ≤ i ≤ n → |π(C)| ≤ |π(Ai)|)
|π(C)| ≤ |π(B)|

In words, this lemma says that you can always pick out a pair of adjacent antecedent formulae in a derivable sequent such that one of their interpolants does not exceed any of the formulae in the initial sequent in length. This interpolant is constructed solely from the atomic formulae occurring in the initial sequent. This follows from the fact that it is an interpolant. Since its length is limited, there are only finitely many formulae which are candidates for interpolation. Now consider an L-derivable sequent X, A, B, Y ⇒ C where A, B are two adjacent antecedent formulae that have the property described in the lemma. This means that there is a formula D with
⊢L A, B ⇒ D
⊢L X, D, Y ⇒ C

such that only atoms from X, A, B, Y, C occur in D, and the length of D does not exceed the length of the longest formula in X, A, B, Y, C. Note that the original sequent can be derived from these two sequents by means of a single application of Cut. The new sequent X, D, Y ⇒ C contains one antecedent formula less than the original sequent. Applying this procedure to this shorter sequent will produce a sequent with two formulae in the antecedent—a binary sequent for short—and a still shorter remaining sequent. After finitely many applications of the binary reduction lemma, we thus end up with a number of binary sequents that (a) are L-derivable, (b) are composed solely from atoms that occur in the original sequent, and (c) consist only of formulae that do not exceed the length of the longest formula in the original sequent. The original sequent can be derived from these binary sequents by means of the Cut rule only. There are only finitely many sequents with these properties. If we consider all sequents with these properties as axioms and Cut as the only inference rule, the original sequent will be a theorem of this deductive system.

For a given Lambek grammar, recognition of a string depends on the L-derivability of sequents whose antecedent formulae are lexical categories and whose succedents are designated categories. There are only finitely many lexical or designated categories for a given Lambek grammar. Thus there is an upper bound for the length of formulae that matter for string recognition of this grammar, and only finitely many atoms are involved. Now let a Lambek grammar G be given and consider the set of sequents of the form

A ⇒ C
A, B ⇒ C

which have the following properties:
1 All atoms occurring in A, B, C also occur in G (either in a lexical or in a designated category).
2 The length of A, B, and C does not exceed the length of the longest formula occurring in G.
Clearly there are only finitely many axioms with this property, and all sequents that matter for string recognition in G are L-derivable if and
only if they can be derived from these axioms via the Cut rule. This is an immediate consequence of the binary reduction lemma. (Sequents with just one formula in the antecedent have to be included into the set of axioms to cover cases where we have a lexical assignment a : A such that A ⇒ B for some designated category B.) Now a CFG is essentially a finite set of axioms closed under Cut. So to transform G into a CFG, we have to (a) conceive the above defined axioms as CFG rules, (b) create a novel start symbol S and add the CFG rules A ⇒ S for each designated category A, and (c) add the CFG rules a ⇒ A for each lexical assignment a : A. (Note that I consider A ⇒ B as a notational variant of the conventional CFG rule format B → A.) The resulting CFG recognizes exactly the same language as G.

The membership problem for context-free languages is known to be solvable in time that is cubic in the length of the string. However, Pentus, 2003 proved that the derivability problem for L is NP-complete in the size of the sequent. This means that the complexity of the parsing problem for Lambek grammars is cubic in the size of the string but NP-complete in the size of the grammar.
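The reduction can be made concrete: once the finite axiom set is fixed, closing it under Cut over a given string is just chart parsing. The following recognizer is my own toy sketch, not Pentus' construction; the category labels and axiom sets below are invented, and computing the actual axiom set would require an L-theorem prover, which is not shown. It treats a binary axiom (A, B) ⇒ C as the CFG rule C → A B and a unary axiom A ⇒ C as C → A, and runs CKY, which is cubic in the string length as stated above.

```python
# Toy CKY recognizer over a finite set of sequent "axioms" (hypothetical
# encoding): categories are arbitrary hashable labels, binary axioms are
# pairs ((A, B), C) meaning A, B => C, unary axioms are pairs (A, C).

def cky_recognize(cats, binary, unary, goal):
    """cats: lexical category of each input word, in order;
    goal: the designated category."""
    n = len(cats)
    # chart[i][j] holds the categories derivable for the span cats[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]

    def close(cell):
        # close a chart cell under the unary axioms
        changed = True
        while changed:
            changed = False
            for a, c in unary:
                if a in cell and c not in cell:
                    cell.add(c)
                    changed = True

    for i, cat in enumerate(cats):
        chart[i][i + 1].add(cat)
        close(chart[i][i + 1])
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):       # split point, i.e. a Cut
                for (a, b), c in binary:
                    if a in chart[i][k] and b in chart[k][j]:
                        chart[i][j].add(c)
            close(chart[i][j])
    return goal in chart[0][n]
```

For instance, with the two invented axioms np, np\s ⇒ s and (np\s)/np, np ⇒ np\s, the lexical category string np, (np\s)/np, np is recognized as s.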
3. Historical and Bibliographical Remarks
Categorial Grammar was originally conceived in Ajdukiewicz, 1935. It incorporated the distinction between complete and incomplete expressions developed by Frege, Husserl’s concept of “meaning categories”, and Russell’s notion of types. Formally Ajdukiewicz’s system can be considered as the subsystem of Basic Categorial Grammar that only uses the forward slash. Ajdukiewicz’s calculus was extended to the classical bidirectional BCG by Bar-Hillel, 1953. Bar-Hillel et al., 1960 explore the formal properties of BCG. Most importantly, they establish the weak generative equivalence between BCG and context free grammars. Even though the inadequacy of context free grammars for the description of natural language was firmly established only in the 1980s (see Pullum, 1991 regarding the intricate history of this issue), Chomsky’s arguments for the context sensitivity of natural language (see Chomsky, 1957) were widely believed and formal linguists (including Bar-Hillel himself) lost interest in BCG. Joachim Lambek introduced his type logical version of Categorial Grammar in Lambek, 1958 and Lambek, 1961 (the former using an associative and the latter a non-associative logic). Since he presented it as a purely syntactic calculus and its generative capacity was correctly conjectured to be context free by Chomsky in 1963, Lambek style Categorial Grammar did not receive much attention at that time either.
Montague, 1974 uses a Categorial core for his PTQ system. He makes explicit use of the category-to-type correspondence and the general parallelism between syntactic and semantic composition implicit in Categorial Grammar. The close connection between syntax and semantics inspired the work on combinatory extensions of BCG that was systematized in the program of Combinatory Categorial Grammar (started in Ades and Steedman, 1982; comprehensive accounts are Steedman, 1996 and Steedman, 2000). In van Benthem, 1983 it is pointed out that the Lambek calculus as a substructural logic displays a version of the Curry-Howard correspondence, and that this supplies the type logical version of Categorial Grammar with a very natural syntax-semantics interface. All subsequent work in the type logical tradition follows this lead. While van Benthem’s slogan “Curry-Howard terms as semantic recipes” established the general attractiveness of Type Logical Grammar for linguists interested in the syntax-semantics interface, the generative limitations of the Lambek calculus remained an obstacle to comprehensive empirical investigations. Moortgat, 1988 contains a first proposal to extend L with additional logical connectives that improve the linguistic coverage while sticking to Curry-Howard labeling. Morrill, 1990 is the earliest attempt to employ unary modal operators for this purpose. This program is carried through for substantial fragments of natural language in Hepple, 1990 and in Morrill, 1994. The latter is also a good introduction to Type Logical Grammar in general. The weak generative equivalence between Lambek grammars and context free grammars was finally established in Pentus, 1993. The context-freeness of Categorial Grammars based on the non-associative Lambek calculus was shown already in Kandulski, 1988. The usage of multimodality extends the generative capacity beyond the limits of context freeness—at a price.
Carpenter, 1999 shows that unrestricted Multimodal Type Logical Grammar has the same generative power as a Turing machine. Tiede, 1999 shows that there are Lambek grammars that are not strongly equivalent to any CFG if we conceive ND proofs in tree format as tree structures. Girard, 1987 proposes proof nets as a novel method in proof theory, next to sequent derivations and natural deduction. In Roorda, 1991 this method is adapted to the Lambek calculus. There it is also shown that the assignment of Curry-Howard terms—and this is tantamount to meaning assembly in TLG—can be conceived as a side effect of checking correctness of proof nets. A good introduction to Categorial proof nets is Lamarche and Retoré, 1996. De Groote and Retoré, 1996 show that
proof nets can themselves be conceived as semantic recipes, which makes Curry-Howard terms as semantic representations obsolete. Carpenter, 1998 presents an up-to-date application of a slightly extended Lambek style Type Logical Grammar to a vast variety of linguistic phenomena. Moortgat, 1997 summarizes recent developments in the area, with a focus on the usage of multimodal techniques.
Chapter 2

THE PROBLEM OF ANAPHORA
After the introduction to the general framework of TLG in the previous chapter, I will now turn to the main topic of this book, the treatment of anaphora within this approach to grammar. In the first section, I will point out why anaphora poses a problem for a strictly compositional theory of the syntax-semantics interface like Categorial Grammar in the first place. In Section 2.2, I discuss the fact that TLG is an essentially variable free theory, so the standard treatment of anaphora using variables is not viable there. The third section gives an overview of existing Categorial approaches to anaphora from the literature.
1. Anaphora and Semantic Resource Sensitivity
Anaphora is a challenge to any compositional theory of natural language interpretation. Let us explore this claim in some detail. Recall that the principle of compositionality requires the meaning of a complex expression to be determined by the meaning of its components and the way they are combined. The typed λ-calculus has proved to be a useful glue language to represent the operations on meanings that correspond to possible ways to combine signs. This can formally be stated as follows: for each sign S consisting of n lexemes, in each of its readings there is an expression M of the typed λ-calculus with x1 , . . . , xn each occurring exactly once such that

M[N1/x1, . . . , Nn/xn] = ⟦S⟧

where ⟦S⟧ represents the meaning of S and Ni the meaning of the i’th lexeme.
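As a toy illustration of this scheme (my own, with invented lexical constants, not an example from the text), take the sign *John walks*: the structure term is M = x2 x1, with each placeholder occurring exactly once, and substituting the lexical meanings for the placeholders yields the sentence meaning.

```python
# Toy sketch: the structure term M = x2(x1) for "John walks", with
# placeholder variables x1, x2 standing in for the lexical meanings
# N1 (John) and N2 (walks). The constants below are invented.

M = lambda x1: lambda x2: x2(x1)   # each placeholder occurs exactly once

john = "j"
walk = lambda u: ("walk", u)

# Substituting N1 for x1 and N2 for x2 yields the sentence meaning:
assert M(john)(walk) == ("walk", "j")
```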
The term M can be said to represent the semantic structure of the sign. It is an obvious question to ask whether there are restrictions on the form of these structures in natural language semantics. It is uncontroversial to assume that every λ-operator should bind at least one variable occurrence. This disallows unnatural meaning recipes such as ((λy.x1)x2), which would predict that the meaning of a sign can be completely independent of one of its lexical components. In terms of the hierarchy of resource sensitive logics introduced in the previous chapter, this amounts to the claim that semantic composition in natural language does not use the structural rule of Monotonicity. The Lambek calculus, being a subsystem of Linear Logic, imposes an even stronger constraint on semantic operations. It requires that each λ-operator in M binds at most one variable occurrence. This corresponds to the appealing intuition that each lexical resource is used exactly once. There are prima facie counterexamples to this view, but most of them can nevertheless be handled, as will be illustrated below. To do so, it is crucial to assume that the single-bind condition does not apply to lexical meanings. In the examples that I will discuss, (b) gives the meanings of the lexical items involved, (c) the desired sentence meaning after normalization, and (d) the term M in the sense of the definition above.
Reflexives. (1)
a. John shaves himself.
b. N1 = j, N2 = shave’, N3 = ?
c. S = shave’jj
d. M = (λy.x2yy)x1
At first glance, the meaning of the subject is used twice here, while the meaning of the reflexive—whatever it may be—doesn’t make any contribution at all. This puzzling situation can be overcome by assigning the meaning λT λy.T yy to the reflexive. (Note that the single-bind constraint does not apply to lexical meanings.) Now the structure of the example gives rise to the meaning recipe M = x3 x2 x1 , which is perfect.1
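Under stated assumptions, the recipe can be executed directly. The curried toy meanings below are my own invention (object argument first, so that shave'jj reads "j shaves j"); himself is the argument reducer λTλy.Tyy.

```python
# Sketch with invented constants: shave' as a curried binary relation,
# himself as the argument reducer lambda T: lambda y: T(y)(y).

shave = lambda obj: lambda subj: ("shave", subj, obj)
himself = lambda T: lambda y: T(y)(y)

# M = x3 x2 x1 with x3 = himself, x2 = shave', x1 = j:
assert himself(shave)("j") == ("shave", "j", "j")
```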
Coordination Ellipsis. (2)
a. John walks and talks.
b. N1 = j’, N2 = walk’, N3 = and’, N4 = talk’
1 This analysis of reflexivization was proposed at various places, see for instance Keenan and Faltz, 1985 and Szabolcsi, 1989.
c. S = and’(talk’j’)(walk’j’)
d. M = (λy.x3(x4y)(x2y))x1

Here again the meaning of the subject occurs twice. As already discussed in the previous chapter, we can handle this by giving and the meaning

λxλyλz.(xz) ∧ (yz)

This is basically already proposed in Montague’s PTQ system and was probably first generalized to other types in Kayne, 1978.
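The generalized meaning of and can likewise be checked on toy one-place predicates. The predicate extensions below are invented purely for illustration; only the shape of and', λxλyλz.(xz) ∧ (yz), comes from the text.

```python
# Polymorphic coordination as lambda x: lambda y: lambda z: x(z) and y(z),
# with invented characteristic functions for walk' and talk'.

and_ = lambda x: lambda y: lambda z: x(z) and y(z)

walk = lambda u: u in {"j"}
talk = lambda u: u in {"j", "b"}

assert and_(talk)(walk)("j") is True    # John walks and talks
assert and_(talk)(walk)("b") is False   # Bill talks but does not walk
```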
Other kinds of anaphora and ellipsis. (3)
a. John claims that he will win.
b. claim’(win’j’)j’
Here the representation of the matrix subject occurs twice while the embedded subject completely disappears. Things are similar in the case of VP ellipsis: (4)
a. John walks, and Bill does, too.
b. and’(walk’b’)(walk’j’)
Apparently the whole VP of the first conjunct gets recycled here. There are several ways to deal with these constructions. The burden of multiplying meanings could be transferred to the lexical semantics of the pronoun he in (3), and similarly to the auxiliary does in (4). In the case of bound anaphors, this has been proposed by Szabolcsi, 1989 and Dalrymple et al., 1997. However, these systems only capture pronouns that are syntactically bound. Since ellipsis phenomena are largely identical within one sentence and across sentence boundaries—as can be seen in (5)—syntactic binding is unlikely to extend to ellipsis. (5)
John walks. Bill does, too.
A more traditional alternative approach assumes that the output of meaning composition is an underspecified representation where each lexical resource is used exactly once. The final meaning is constructed by resolving the underspecification, thereby possibly identifying several subexpressions. Anaphoric expressions could be interpreted as free variables that get instantiated by means of a pragmatic resolution mechanism that takes place after meaning composition is completed. This is the folklore approach to the interpretation of coreferential pronouns. It has also been applied to ellipsis. A paradigmatic example of this idea is Dalrymple et al., 1991, where the compositional meaning of (4) is
supposed to be and’(P b’)(walk’j’), with P representing the meaning of does (,too). This parameter is, in a final step, pinned down to the meaning it is supposed to have by means of a system of term equations. This approach does not easily extend to bound pronouns though. Consider a variant of (3) where the antecedent is a quantifier rather than a name. (6)
a. Everybody claims that he will win.
b. every’λx.claim’(win’x)x
If we assume that the pronoun he is translated as a free variable x, we are forced to assume that semantic composition uses a variable binding device, since x is bound in the term representing the meaning of the whole sentence. Categorial grammars are essentially variable free theories of grammar though. This does not exclude the usage of variables as semantic parameters, but it does exclude variable binding as a licit semantic operation. This issue will be taken up in the next section. If variable binding is not an option, Categorial approaches to anaphora either have to locate the source of meaning multiplication in the lexicon, as discussed above, or they have to assume that the semantic operations that are used in natural language go beyond the single-bind fragment of the λ-calculus, i.e., beyond the resource regime of Linear Logic. Both options have been proposed in the Categorial literature, and I will briefly review representatives of both strategies. In the subsequent chapters, I will present my own proposal, which belongs to the second family of approaches.
2. Variables in TLG
As alluded to above, Lambek grammars—as well as all other Categorial formalisms discussed here—are variable free theories of grammar. In this section I will discuss what this means and what consequences it has for the theory of meaning composition. Let us briefly review the role variables play in semantic theories based on transformational syntax, as for instance in Heim and Kratzer, 1998. Consider a relative clause construction like
(book) which John liked
The Logical Form of this clause would come out as
(8) [CP which [λvi [C′ C [IP [NP John] [I′ I [VP [V liked] ti]]]]]]
Ignoring tense, the IP John liked ti denotes the truth value 1 in a model M under an assignment function g iff John likes g(i) in M. So its denotation properly depends on the assignment function g. The sister constituent of the operator which, on the other hand, denotes the set of objects that are liked by John in M, i.e., it does not depend on g. In other words, while the index i is free in IP (the denotation of IP depends on the value of i under g), it is bound further up in the tree: the denotations of superconstituents of IP do not depend on i in this way. This means that meanings cannot be identified with plain denotations if the computation of meanings is supposed to be compositional. To see why, have a look at the semantic clause for λ-abstraction:
(9) ⟦λvi N⟧g = {⟨a, ⟦N⟧g[vi→a]⟩ | a ∈ E}
(Here I assume vi to be of type e and E to be the domain of individuals). The denotation of λvi N under g does not just depend on the denotation of N under g, but also on N ’s denotation under different assignment functions. So meaning composition is only compositional if we identify meanings with functions from assignment functions to denotations. Let us compare this to the composition of meanings in Type Logical Grammar. At a first glance, the picture seems to be similar. Apart from the different labels at the nodes and the absence of the phonetically empty functional categories “C” and “I”, the natural deduction derivation tree for this relative clause is virtually identical to the GB-style derivation.
(10)
which – λQλPλx.Px ∧ Qx : (n\n)/(s/np)    lex
John – john’ : np    lex
liked – like’ : (np\s)/np    lex
[x : np]    hypothesis 1
like’x : np\s    /E
like’xjohn’ : s    \E
λx.like’xjohn’ : s/np    /I, discharging hypothesis 1
λPλx.Px ∧ like’xjohn’ : n\n    /E
However, the theoretical status of these two trees is entirely different. In GB and related formalisms, a syntax tree represents the internal syntactic structure of a sign, and meaning assignment is defined via recursion over such tree structures. In TLG, on the other hand, the corresponding tree depicts two kinds of facts: 1 The sequent x : (n\n)/(s/np), y : np, z : (np\s)/np ⇒ x(λw.zwy) : n\n is derivable in the Lambek calculus L, and 2 the lexicon relates the form “which” to the category (n\n)/(s/np) and the meaning λQλP λx.P x ∧ Qx, and likewise for the other lexical items involved. The tree thus represents the structure of a proof, not the structure of a sign. For Type Logical Grammar as a theory of grammar, it is inessential how we prove the derivability of sequents, so the proof theory is strictly speaking not part of our linguistic theory. To use a metaphor, the usage of free variables to mark hypotheses in natural deduction proofs has the same theoretical status as the subscripts that some people use when they do complicated additions on paper (illustrated in Figure 2.1 on the next page). These subscripts are artefacts of a certain algorithm to carry out an addition, and they are not part of number theory. Likewise, natural deduction proof trees, and especially variables marking hypotheses, are artefacts of a certain proof theory rather than ingredients of the underlying linguistic theory. The principle of compositionality requires that the meaning of a complex expression is determined from the meanings of its components and the way they are combined. In TLG, every derivable sequent represents a “syntactic rule”, i.e., a licit way to combine signs to form a possibly larger sign. Curry-Howard labeling supplies the corresponding
[Figure: a multi-digit addition carried out columnwise, with the carry digits written as subscripts]

Figure 2.1. Calculation using subscripts
semantic operation. So strictly speaking there are infinitely many syntactic operations in TLG. Proof theoretic rules like natural deduction rules or sequent rules are not rules to combine resources, but they have the status of GPSG’s meta-rules (cf. Gazdar et al., 1985): They transform syntactic rules into new syntactic rules.2 The set of the TLG rules is the closure of the identity map under the sequent rules (or, alternatively, the natural deduction rules) of the Lambek calculus. As a consequence, it is literally impossible to design a non-compositional meaning assignment to a Lambek grammar, provided Curry-Howard labels are interpreted as meaning recipes. Despite appearance to the contrary, meaning assignment via CurryHoward labeling is essentially variable free. To see why, observe that the only variables that occur free in a Curry-Howard term on the succedent side of a derivable sequent are those that are used as labels of the antecedent formulae. These variables are just place holders for lexical meanings. As a consequence, the meaning of a complex expression will never depend on the assignment function (provided the lexical meanings don’t). In terms of the syntax of λ-terms, this means that there are no semantic operations that turn a free variable occurrence into a bound one. Using the λ-calculus as a semantic glue language still requires the usage of assignment functions in intermediate steps that lead to the computation of complex meanings. However, due to compositionality, semantic representations are not essential for the theory and can in principle be dispensed with. We could augment the inference rules of L with
2 Pentus’ (1993) binary reduction lemma (which is discussed in the previous chapter in connection with the weak generative capacity of Lambek grammars) entails that for a given Lambek grammar, a finite number of unary and binary rules is sufficient. In practice a few instances of the standard combinators like type lifting, function composition and the Geach rule will do. Nonetheless, TLG in general comprises infinitely many syntactic rules.
direct operations on meanings, or we could use a variable free glue language like combinatory logic instead. There are two reasons why a variable free design of meaning assignment is ceteris paribus to be preferred. First, the meanings that such a theory assigns to linguistic expressions are simpler objects than their counterparts in an equivalent theory using variables. In the latter case, meanings are always functions from assignment functions into intuitive denotations, and this additional information is mostly redundant. Second, using variables means managing variable names. The standard way of doing this is to augment certain linguistic expressions with additional information like referential indices. This introduces a non-compositional aspect into the theory since the linguistic input simply does not contain this information. Even theories that do without such devices—like DRT (in the version of Kamp and Reyle, 1993)—assume that the linguistic input has to be disambiguated with respect to variable binding and coreference before interpretation is possible. It seems more natural to treat such ambiguities as structural ambiguities. In a variable free theory, this is the obvious route to take. The issue of variable freeness is discussed in great detail in Jacobson, 1996b, Jacobson, 1999, and Jacobson, 2000.
3. Previous Categorial Approaches to Anaphora
As mentioned above, Lambek grammars display a Linear resource management regime for semantic composition: every lexical meaning can be used exactly once in the derivation of complex meanings. There are basically two strategies to cope with the fact that anaphora phenomena in natural language do involve a re-use of resources. We can locate the resource multiplicative force in the semantics of the lexical entries that trigger it—such as anaphoric pronouns, reflexives, or coordination particles. Alternatively, one might assume that resource multiplication is in fact part of the grammatical machinery of natural language. Choosing the latter option comes down to admitting a limited use of the structural rule of Contraction in syntax. Both options have been investigated in the literature, and in the remainder of this chapter, I will briefly discuss representatives of both paradigms.
3.1 Resource Multiplication in the Lexicon
3.1.1 Szabolcsi (1989)
The first attempt to integrate anaphora resolution into the Categorial machinery is due to Anna Szabolcsi (Szabolcsi, 1989; see also Szabolcsi, 1992). Her proposal is mainly concerned with the behavior of reflexive pronouns, as in
(11)
John likes himself.
Reflexive pronouns have three properties that an adequate theory should strive to cover: 1 They require a binder. 2 The binder must be syntactically more prominent than the reflexive itself. 3 The binder must be part of the same domain of locality. There is some dispute in the literature as to how the notions of “prominence” and “locality” should be exactly defined. I leave this open here.3 Szabolcsi assumes that the meaning of the reflexive pronoun is essentially an argument reducer, viz. it is a function that takes a binary relation as argument and returns the diagonal of this relation. Formally, the meaning of himself thus comes out as (12)
λRx.Rxx
Projected into syntax, this means that himself is a functor that consumes a transitive verb and produces a VP. Its syntactic category is therefore (13)
((np\s)/np)\np\s
It is noteworthy that this category is also the result of applying the combinator T< to the base category np. So Szabolcsi correctly predicts that there are no contexts where a reflexive can occur but a name can’t.4 Given this, the derivation of (11) is straightforward. It only involves function application:

John – john’ : np    lex
likes – like’ : (np\s)/np    lex
himself – λRx.Rxx : ((np\s)/np)\np\s    lex
likes himself – λx.like’xx : np\s    A<
John likes himself – like’john’john’ : s    A<

3 In the Binding Theory of Chomsky, 1981, “prominence” would be identified with c-command and the locality domain with the smallest structure containing a subject.
4 As far as syntax is concerned; Binding Principle C rules out a certain reading of (i), not the sentence per se:
(i) John likes John.
It is obvious that this treatment of reflexives covers the first generalization given above. The reflexive is a higher order functor that can only be applied to an argument that has an open argument slot itself. So strictly speaking the reflexive is not directly bound by the subject in the derivation above but by the subject slot of the verb. This is a desired result, given that reflexives are also licit in constructions where there is no overt local binder: (14)
John tried to enjoy himself.
Here himself would be “bound” by the subject slot of enjoy, which in turn will be connected to the matrix subject John via the lexical semantics of the matrix verb. At first glance, nothing excludes the existence of a reflexive in nominative which occupies the subject position and is bound by the object. So it seems to be logically possible to assume a reflexive heself with the lexical entry (15a) which would render (15b) grammatical and assign it the meaning (15c). (15)
a. heself – λRx.Rxx : s/np/((np\s)/np)
b. Heself loves everyone.
c. ∀x(love’xx)
Szabolcsi excludes this possibility by the assumption that the category of a reflexive should always be obtainable from the category np via (possibly repeated) application of combinators. For accusative reflexives, this can be achieved by using backward type lifting. For the nominative reflexive, we would need forward type lifting and the Geach rule:

np ⇒T> s/(np\s) ⇒Geach s/np/((np\s)/np)

Szabolcsi assumes that the Geach rule is not part of the grammar of English. So her explanation of the prominence condition rests on assumptions on the inventory of combinators that are not easily reproduced in a type logical setting. Like any theory that handles the c-command constraint on reflexive binding correctly, Szabolcsi’s theory has problems coping with double object constructions as in (16), where the prominence hierarchy is apparently inverse to the intuitive c-command relation. (16)
a. *John introduced herself to Mary.
b. John introduced Mary to herself.
This problem can be overcome if some notion of wrapping is invoked. Intuitively, the verb in (16) first combines with the prepositional object and forms a discontinuous constituent, which in turn is “wrapped”
around the direct object to form a VP. Under this perspective, the direct object c-commands the prepositional object. There is no obvious way, though, to handle the locality constraint on reflexive binding correctly if the Categorial machinery comprises the combinator B (function composition, which roughly corresponds to associativity in a type logical system). So an example like (17) can get a derivation like Figure 2.2, which yields a reading where the matrix subject binds a reflexive in an embedded clause.

(17) Johni thinks Mary likes himselfi.

John – john’ : np    lex
thinks – think’ : (np\s)/s    lex
Mary – mary’ : np    lex
likes – like’ : (np\s)/np    lex
himself – λRx.Rxx : ((np\s)/np)\np\s    lex
Mary ⇒ λx.xmary’ : s/(np\s)    T>
Mary likes ⇒ λy.like’ymary’ : s/np    B>
thinks Mary likes ⇒ λy.think’(like’ymary’) : (np\s)/np    B>
thinks Mary likes himself ⇒ λy.think’(like’ymary’)y : np\s    A<
John thinks Mary likes himself ⇒ think’(like’john’mary’)john’ : s    A<

Figure 2.2. Derivation of (17)
This need not be a disadvantage. Szabolcsi speculates that the locality constraint is not part of the grammar of anaphors but rather a processing effect. Grammar thus does not distinguish between reflexives and pronouns, and the above derivation represents the sentence (18)
Johni thinks Mary likes himi .
Quite generally, Szabolcsi proposes to treat bound pronouns in a manner similar to reflexives. Of course neither a uniform category assignment nor a uniform meaning will cover all instances of bound pronouns. For instance, she proposes to assign the pronoun he in (19a) the lexical entry (19b). (19)
a. Everybody thought he saw Mary.
b. he – λxyz.y(xz)z : (((np\s)/s)\np\s)/(np\s)
In words, he is treated as a functor here that consumes a verb requiring a sentential object to its left and a VP to its right to produce a VP, thereby identifying the subject slots of the matrix verb and of the embedded VP. Analogously, the accusative pronoun him will receive the following lexical entry (which makes him synonymous with he): (20)
him – λxyz.y(xz)z : (s/np)\((np\s)/s)\(np\s)
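The argument-reducing effect of the term λxyz.y(xz)z in (19b)/(20) can be sketched as follows. The particular denotations for the embedded VP and the matrix verb are illustrative assumptions; what matters is that the matrix subject z fills both its own slot and the subject slot of the embedded VP.

```python
# Toy model of Szabolcsi's bound pronoun (19b): he = \x.\y.\z. y (x z) z,
# with x the embedded VP, y a verb taking a sentential object, z the
# matrix subject. Using z in both slots yields the bound reading.

def he(x):
    return lambda y: lambda z: y(x(z))(z)

# Hypothetical meanings for (19a) "Everybody thought he saw Mary"
# (quantification over the subject is left out for simplicity):
saw_mary = lambda z: ("see", z, "mary")        # embedded VP
thought = lambda p: lambda z: ("think", z, p)  # (np\s)/s

vp = he(saw_mary)(thought)    # \z. think (see z mary) z
print(vp("john"))             # ('think', 'john', ('see', 'john', 'mary'))
```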
The categories of bound pronouns may be considered pied-piped versions of ordinary NPs. Nominative NPs and accusative NPs are assigned the categories s/(np\s) and (s/np)\s respectively. The categories for pronouns are analogous, except that the goal category s is lifted to ((np\s)/s)\(np\s), i.e., the category of an object clause. Syntactically this guarantees that a clause containing a bound pronoun is always embedded in a matrix clause, while the semantics ensures binding by some superordinate subject. So while Chomsky's Binding Principle A remains unaccounted for, Principle B is built into the lexical entry of pronouns.

As readers may convince themselves, this approach requires a considerable proliferation of lexical entries for pronouns once binding from non-subject positions is taken into account. Szabolcsi assumes that they are all instances of a general combinatory scheme, in a way similar to the polymorphic treatment of coordination discussed in the previous chapter. All instances of this scheme obey the command constraint for similar reasons as the simple reflexive pronoun discussed above.

This leads to an empirical and a conceptual disadvantage. First, the restriction of bound pronouns to configurations where they are c-commanded by their binder (proposed in Reinhart, 1983) is empirically inadequate in many cases. The following example (from Gawron and Peters, 1990) illustrates this. (21)
The soldiers turned some citizens in [each state]i over to itsi governor.
Other counterexamples to this generalization can be produced at will. This issue will be taken up in greater detail in Chapter 4. Furthermore, Szabolcsi follows Reinhart in assuming that pronoun binding is fundamentally different from anaphoric coreference. This is also empirically questionable. The following example (again taken from Gawron and Peters, 1990) indicates this. (22)
Every student read his paper before the teacher did.
The sentence has a reading where every student x read x’s paper before the teacher read x’s paper. In other words, the pronoun his is bound
here but nevertheless gives rise to a strict reading under ellipsis. This is unexpected if bound pronouns are really nothing but argument reducers semantically. Quite independently from this fact, a Reinhart-Szabolcsi style approach considers the coreference in examples like (23) to be fundamentally different from instances of pronoun binding. (23)
The man who asked for John met him.
From the perspective of semantic resource management, there is no fundamental difference between binding and coreference. While binding involves binding of multiple variable occurrences by one λ-operator, coreference comes down to the multiple use of one lexical resource. Both phenomena are instances of a Relevant (as opposed to Linear) resource management regime, i.e., they require access to a (lexically controlled) application of Contraction. Given that these conceptually similar phenomena—binding and coreference—are not overtly distinguished in natural language, it seems somewhat artificial to separate them in linguistic theory.
3.1.2 Discontinuity

Szabolcsi's approach to pronoun binding leads to a proliferation of lexical entries, since every structural configuration in which the pronoun and its binder may occur requires a separate lexical specification. It is thus desirable to generalize this approach somehow. The proposals of Moortgat, 1996a and Morrill, 2000 can be seen as attempts to do this. While Szabolcsi only mentions the usage of a "wrap" operation in passing, both Moortgat and Morrill treat discontinuity as an essential aspect of anaphora. A detailed discussion of the type logical implementation of discontinuity would go beyond the scope of this work, so I restrict myself to an illustration of the basic intuitions.

Moortgat, 1996a. I start with the discussion of Moortgat's proposal. While the Lambek calculus only allows reasoning over continuous strings, certain linguistic phenomena are best described as invoking operations that address a string and a non-peripheral substring of it. Quantifier scope is an obvious case in point. In a sentence like (24)
John introduced everyone to Mary.
the quantifier everyone occupies the structural position of an np, but its meaning operates over the whole sentence. Since syntactic and semantic composition cannot be divorced in TLG, the syntactic placement of everyone thus has to involve the sentence as a whole as well. Intuitively, one might say that everyone is an operator that transforms the discontinuous string John introduced _ to Mary into the continuous string in (24). So the argument of everyone would be an s which contains an np gap. Arguably, the meaning of this object should be a function from np denotations to s denotations, so filling the np gap with an np in syntax amounts to function application in semantics. The meaning of everyone is thus a function from such functions to s denotations, i.e., a quantifier.

Moortgat suggests a three-place type constructor "q" to describe this behavior. Everyone is assigned the category q(np, s, s). This means that everyone can replace an np inside an s, and the result of this replacement will again be an s. Generally, a sign α has category q(A, B, C) iff replacing an item of category A inside a sign of category B by α results in a sign of category C. That the categories B and C need not be identical is illustrated by pied-piping phenomena. A wh-NP like which man transforms an ordinary PP into a prepositional wh-phrase if it replaces a sub-NP: (25)

a. to a friend of John ⇝ pp[−wh]
b. to a friend of which man ⇝ pp[+wh]
c. which man ⇝ q(np, pp[−wh], pp[+wh])

[Figure 2.3. Natural deduction rules for q, in tree format: the introduction rule qI infers λy.yx : q(A, B, B) from x : A; the elimination rule qE, i discharges a hypothesis [y : A]ⁱ from which M : B has been derived, concluding x(λy.M) : C from x : q(A, B, C).]
This intuitive content of the category q(A, B, C) is formally covered by the natural deduction rules in tree format in Figure 2.3, which extend the simple Lambek calculus L.⁵ The elimination rule roughly says the following: to use a premise of type q(A, B, C), replace it hypothetically by a premise of type A and use this together with the surrounding material to derive the conclusion B.

Footnote 5: It should be remarked that this formalization is incomplete. While the sequent q(np, s, s) ⇒ q(np, np\s, np\s) is intuitively valid, it is not derivable.
If you succeed, you can discharge the hypothesis and replace it by the original q(A, B, C), thereby changing the root node of the whole derivation from B to C. In terms of Curry-Howard labels, this rule amounts to λ-abstraction over the hypothetical A, followed by applying the label of q(A, B, C) to the resulting abstract.⁶ Since this deduction achieves nonlocal binding without involving movement, the q-constructor is dubbed "in situ binder". The introduction rule is a generalization of the combinatory type lifting rule. It says that every A can (trivially) occupy an A-position inside a larger constituent of any arbitrary type B.

The semantic type corresponding to the in situ binder is defined by the equation

  τ(q(A, B, C)) = ⟨⟨τ(A), τ(B)⟩, τ(C)⟩

The proof theoretic properties as well as the linguistic applications of the in situ binder will be discussed in greater detail in Chapter 4. For the time being, I restrict attention to Moortgat's proposal to apply q to the analysis of bound pronouns. He focuses on subject oriented reflexives as in (26)
a. John likes himself.
b. John introduced himself to Mary.
c. John dedicated the book to himself.
Szabolci’s theory of reflexives is confined to cases where the reflexive is the direct object. As (26b) shows, such a treatment does not cover all cases of reflexivization since reflexive pronouns are not confined to the direct object position.7 The correct generalization of her proposal seems to be that a subject oriented reflexive always occupies an np position inside a VP and identifies the argument slot it occupies with the subject of this VP. This behavior is correctly covered by the lexical entry (27)
himself – λRx.Rxx : q(np, np\s, np\s)
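The semantics of the in situ binder—consume the abstraction over the hypothetical A (a τ(A)-to-τ(B) function) and return a τ(C)—can be sketched over a finite domain. The domain and the verb denotation below are illustrative assumptions.

```python
# Sketch of in-situ binder semantics: a q(A, B, C) meaning consumes the
# abstract over the hypothetical A position and yields a tau(C) value.

domain = {"john", "mary", "bill"}

everyone = lambda P: all(P(x) for x in domain)   # q(np, s, s): \P. forall x. P x
himself = lambda R: lambda x: R(x)(x)            # q(np, np\s, np\s): \R.\x. R x x

# Toy verb meaning: introduce(y)(to)(x) -- here, x can introduce anyone but x.
introduce = lambda y: lambda to: lambda x: x != y

# (24) "John introduced everyone to Mary": qE abstracts over the np
# hypothesis y and hands the resulting function to everyone.
s24 = everyone(lambda y: introduce(y)("mary")("john"))
print(s24)   # False, since under this toy meaning John cannot introduce John
```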
Note that Moortgat assumes the same meaning for the reflexive as Szabolcsi. The in situ binder is not sensitive to its linear position inside its scope. So the derivations for (26a) and (b) are completely analogous. They are given in Figures 2.4 and 2.5, respectively.

Footnote 6: So the hypothesis A, together with its Curry-Howard label, plays a similar role here as the storage in Cooper's 1983 theory of quantifier scope.

Footnote 7: Szabolcsi handles such cases by invoking "simulated wrapping". To this end, she utilizes the powerful combinator of crossed function composition.
[Figure 2.4. Derivation for (26a)]

[Figure 2.5. Derivation for (26b)]
Like Szabolcsi’s proposal, Moortgat’s type assignment to reflexives does not predict the locality constraint on binding. So we may apply the same treatment to bound pronouns like he or him to cover cases like (28)
a. Everyone believes that he has a solution.
b. Everyone believes that John will talk to him.
Morrill 2000. While Moortgat’s account abstracts over irrelevant information pertaining to the linear position of the anaphoric element
itself, it is still limited in scope. Binders or antecedents are restricted to c-commanding subjects, and there is no obvious way this problem could be overcome just by using the in situ binder. Morrill, 2000 therefore invokes powerful operations of "wrap" and "secondary wrap" that abstract both over the particular position of the anaphoric element and the position of its antecedent. While Moortgat's in situ binder operates in a global fashion, wrapping decomposes discontinuity into more elementary operations.

The product operator • of the Lambek calculus models string concatenation. Morrill (drawing on previous work of Bach, 1979, Versmissen, 1991, Solias, 1992, Morrill and Solias, 1993, Morrill, 1994, Morrill, 1995, and Morrill and Merenciano, 1996) extends L with a binary operator ⊙ that models wrapping of a discontinuous string around a continuous string. Its first argument is to be thought of as a discontinuous string (i.e., a pair of strings) and its second argument as a simple string. Combining them via ⊙ yields the result of infixing the second argument into the split point of the first argument, so we end up with a continuous string.

The wrapping operator ⊙ is a product operator like the concatenative product •, and thus left and right implications can be defined by means of left and right residuation analogously to the standard slashes \ and /. So the inventory of type forming connectives is extended with two more binary operators ↑ and ↓, which obey the residuation laws

  B ↓ A → C iff A → B ⊙ C iff A ↑ C → B

A sign a has category B ↓ A iff wrapping a discontinuous constituent of category B around a yields a continuous constituent of category A. Conversely, sign a has category A ↑ C iff it is a discontinuous constituent which yields a continuous constituent of category A if it is wrapped around a sign of category C.

The category-to-type correspondence for the discontinuity operators is similar to that of the standard Lambek connectives. The discontinuous product ⊙ corresponds to pair formation, while the implications ↑ and ↓ create function spaces. So we have

  1 τ(A ⊙ B) = τ(A) ∧ τ(B)
  2 τ(A ↑ B) = τ(B ↓ A) = ⟨τ(B), τ(A)⟩
86
ANAPHORA AND TYPE LOGICAL GRAMMAR
These operators allow the definition of Moortgat's q-operator as a combination of left residuation and right residuation of the wrapping operation:⁸

  q(A, B, C) := (B ↑ A) ↓ C

So Moortgat's analysis of subject oriented reflexives and pronouns can be reproduced in a wrapping analysis by assigning he and him(self ) the lexical entry (29)
λRx.Rxx : ((np\s) ↑ np) ↓ (np\s)
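The category-to-type correspondence for the slashes and the discontinuity operators can be sketched as a recursive translation. The encoding of categories as nested triples and the atomic type assignments are assumptions of this sketch, not part of Morrill's formalism.

```python
# Sketch of the category-to-type map tau. Categories are encoded as
# strings (atoms) or triples (op, left, right).

ATOMS = {"np": "e", "s": "t"}

def tau(cat):
    if isinstance(cat, str):
        return ATOMS[cat]
    op, l, r = cat
    if op == "/":                       # l / r
        return (tau(r), tau(l))
    if op == "\\":                      # l \ r
        return (tau(l), tau(r))
    if op == "wrap":                    # wrap product: pair formation
        return ("pair", tau(l), tau(r))
    if op == "up":                      # l ^ r : tau = <tau(r), tau(l)>
        return (tau(r), tau(l))
    if op == "down":                    # l v r : tau = <tau(l), tau(r)>
        return (tau(l), tau(r))
    raise ValueError(op)

# Moortgat's q defined via wrapping: q(A, B, C) := (B ^ A) v C
def q(a, b, c):
    return ("down", ("up", b, a), c)

print(tau(q("np", "s", "s")))   # (('e', 't'), 't'), a generalized quantifier type
```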
To generalize this analysis to binding from non-subject positions, Morrill proposes to generalize the notion of wrapping. Informally, his idea runs as follows. Consider a sentence like (30)
Mary convinced everyonei that hei should participate.
Let us represent the meaning of Mary convinced x that y should participate as ϕ(y, x). The meaning of (30) is thus ∀xϕ(x, x). According to Morrill’s analysis, (30) is derived from a tripartite discontinuous constituent (31)
Mary convinced _ that _ should participate.
which contains two split points—to be occupied by NPs—and has the semantics λxyϕ(y, x). A pronoun like he still has the semantic value of an argument reducer (λRx.Rxx), but syntactically it is now infixed into the second split point of such a tripartite constituent, yielding a bipartite discontinuous constituent like (32). Note that a pronoun can only be infixed into the second split point of a tripartite constituent, thus Morrill’s system predicts that a pronoun always follows its antecedent. (32)
Mary convinced _ that he should participate.
The semantic value of (32) is obtained by applying the meaning of the pronoun to the meaning of (31). This gives us λx.ϕ(x, x). The quantifier everyone is treated in a Moortgat style fashion, i.e., it has category (s ↑ np) ↓ s and meaning λP.∀xP x. Infixing it into (32) yields (30) and the desired meaning ∀xϕ(x, x).

Footnote 8: As pointed out in Moortgat, 1996b, a decomposition of q in terms of wrapping only works if the default product is associative. Decomposition of q in a non-associative environment requires more powerful multimodal techniques.
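The composition just described can be sketched step by step. The relation ϕ and the domain below are toy assumptions; the sketch only traces how diagonalization and quantification stack up to ∀xϕ(x, x).

```python
# Sketch of the composition for (30) "Mary convinced everyone that he
# should participate".

domain = {"ann", "bob", "carol"}

phi = lambda y, x: y == x        # toy stand-in for "Mary convinced x that
                                 # y should participate"

tripartite = lambda x: lambda y: phi(y, x)   # meaning of (31): \x.\y. phi(y, x)
he = lambda R: lambda x: R(x)(x)             # argument reducer \R.\x. R x x
bipartite = he(tripartite)                   # meaning of (32): \x. phi(x, x)

everyone = lambda P: all(P(x) for x in domain)   # (s ^ np) v s
print(everyone(bipartite))   # True: forall x. phi(x, x) holds under this toy phi
```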
This intuitive idea is formalized by extending the Lambek calculus plus ordinary wrapping with a third product operator ⊙2. A ⊙2 B is to be understood as combining a tripartite discontinuous constituent A with a continuous constituent B by infixing B into the second split point of A. The result is thus a bipartite discontinuous constituent. Like the concatenative product • and the ordinary wrap product ⊙, this "secondary wrap" product comes with two implications ↓2 and ↑2 which are related to ⊙2 via the residuation laws

  B ↓2 A → C iff A → B ⊙2 C iff A ↑2 C → B

So a sign has category B ↓2 A iff infixing it into the second split point of a tripartite constituent of category B yields a bipartite constituent of category A. Likewise, a sign has category A ↑2 C iff it is a tripartite constituent which yields a bipartite sign of category A if you infix a continuous sign of category C at its second split point. Semantic type assignment for secondary wrap categories is analogous to the other two families of type forming connectives. We have

  1 τ(A ⊙2 B) = τ(A) ∧ τ(B)
  2 τ(A ↑2 B) = τ(B ↓2 A) = ⟨τ(B), τ(A)⟩

The communication between the three families of logical connectives is established by a natural deduction calculus that uses prosodic labeling (next to semantic Curry-Howard labeling). The units of the deductive system are triples consisting of a prosodic label, a semantic label and a category. I write them as p − s : c, where p and s are prosodic and semantic labels respectively, and c is a category. Intuitively, a prosodic term represents the form component of a sign, just as Curry-Howard labels represent the semantic component. Since we are dealing with three sorts of signs—continuous constituents, bipartite and tripartite discontinuous constituents—prosodic labels are categorized as belonging to the sorts T¹, T² or T³. There are operations (·, ·) and (·, ·, ·) that form bipartite and tripartite discontinuous terms from continuous ones, and there is a term operation · representing concatenation. Furthermore I assume a term constant ε representing the empty string. So the set T of prosodic terms over a set of atomic prosodic terms is defined as in the following definition, where Greek lowercase letters α, β, γ are used as meta-variables over terms from the sort T¹.

Definition 34 (Prosodic terms) Let a countably infinite set AT of atomic prosodic terms be given. The sets T¹, T², T³, T are the smallest sets such that
1 AT ∪ {ε} ⊆ T¹
2 If α, β ∈ T¹, then α · β ∈ T¹
3 If α, β ∈ T¹, then (α, β) ∈ T²
4 If α, β, γ ∈ T¹, then (α, β, γ) ∈ T³
5 T = T¹ ∪ T² ∪ T³

Furthermore there is an equivalence relation ≡ over terms which ensures that concatenation is associative and that the empty string is an identity element for concatenation.
Definition 35 ≡ is the smallest equivalence relation over T such that

1 (α · β) · γ ≡ α · (β · γ)
2 ε · α ≡ α ≡ α · ε
3 If α₁ ≡ α₂, β₁ ≡ β₂, and γ₁ ≡ γ₂, then α₁ · β₁ ≡ α₂ · β₂, (α₁, β₁) ≡ (α₂, β₂), and (α₁, β₁, γ₁) ≡ (α₂, β₂, γ₂)

The natural deduction presentation of the extension of L with wrap and secondary wrap consists of inference rules over prosodically and semantically labeled formulae (see Figure 2.6). For simplicity I restrict myself to the implicational fragment, since the product rules are not used in the linguistic applications to follow. I will use lowercase Greek letters α, β, γ, ... as metavariables over terms from T¹, and boldface lowercase Latin letters a₁, a₂, b, ... as metavariables for atomic terms.

I start with the labeled versions of the rules for the Lambek slashes. The slash elimination rules correspond to simple concatenation on the level of prosodic terms. Slash introduction involves hypothetical reasoning. The hypothetical premise comes with an atomic prosodic label that is discharged with the introduction rule. The logical rules for ↓, ↑, ↓2, and ↑2 are similar except for the fact that they perform a wrap operation on the prosodic labels rather than plain concatenation.

In the prosodically and semantically labeled system, lexical entries can in fact be identified with labeled formulae. This suggests a modified notion of string recognition—a string is recognized iff it is the prosodic label of the conclusion of a derivation that only uses lexical entries as premises. Semantic labeling still supplies the semantic composition as a side effect.
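Definitions 34 and 35 can be sketched by normalizing T¹ terms to flat tuples of atoms, so that the equivalence ≡ collapses into plain equality. This representation is an assumption of the sketch, chosen to make associativity and the identity laws hold by construction.

```python
# Prosodic terms modulo Definition 35: a T1 term is represented by the
# flat tuple of its atoms, so concatenation is associative, the empty
# string eps = () is a two-sided identity, and equivalence is equality.

EPS = ()

def atom(a):
    return (a,)

def conc(alpha, beta):           # alpha . beta on normalized T1 terms
    return alpha + beta

def pair(a, b):                  # T2 term (a, b): one split point
    return ("T2", a, b)

def triple(a, b, c):             # T3 term (a, b, c): two split points
    return ("T3", a, b, c)

a, b, c = atom("mary"), atom("convinced"), atom("that")
print(conc(conc(a, b), c) == conc(a, conc(b, c)))   # associativity: True
print(conc(EPS, a) == a == conc(a, EPS))            # identity laws: True
```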
[Figure 2.6. Natural deduction rules with prosodic labels]
Let us return to the issue of anaphora. As said before, Morrill basically treats a pronoun like he (as well as its accusative form him and the reflexive himself ) as an operator that infixes itself into the second split point of a tripartite discontinuous string. Thus it has a category of the form A ↓2 B. Its argument is a clause that contains two np gaps.
A clause with one np gap has category s ↑ np, thus a clause with two such gaps has category (s ↑ np) ↑2 np. The result of infixing he into the second gap yields a clause with one np gap. So the syntactic category comes out as ((s ↑ np) ↑2 np) ↓2 (s ↑ np). As in the Szabolcsi/Moortgat approach, the meaning of a pronoun is just the diagonalization operator, so the full (preliminary) lexical entry for he is (33)
he/him/himself − λRx.Rxx : ((s ↑ np) ↑2 np) ↓2 (s ↑ np)
This lexical entry still disregards locality constraints on the distribution of reflexive and personal pronouns. Simplifying somewhat, the antecedent of a personal pronoun must not be contained in the same local clause as the pronoun itself. Consider the example (34)
John thinks that Bill knows him.
Here only the matrix subject John is a licit antecedent of the pronoun him. The wrapping mechanism is fine grained enough to cover this generalization. We may consider him as an operator that replaces an np inside its local clause and turns this clause into a clause that needs an np antecedent. In analogy to the all-purpose pronoun category given above, a clause that needs an np antecedent (in the context of a matrix clause) has category ((s ↑ np) ↑2 s) ↓2 (s ↑ np). A VP which constitutes such an “anaphoric” clause then has category np\(((s ↑ np) ↑2 s) ↓2 (s ↑ np)). An accusative pronoun like him is an operator that infixes itself into an np gap of an ordinary VP and returns an anaphoric VP. So a more informed lexical entry for him is (35)
him − λxyzw.z(xwy)w : ((np\s) ↑ np) ↓ (np\(((s ↑ np) ↑2 s) ↓2 (s ↑ np)))
The nominative version he is restricted to subject positions. This means that it combines with an ordinary VP to form an anaphoric clause. This leads to the lexical entry (36)
he − λxyz.y(xz)z : (((s ↑ np) ↑2 s) ↓2 (s ↑ np))/(np\s)
Note that in both entries, one λ-operator binds two variable occurrences, so the job of meaning duplication is done by the lexical entries here, as in Szabolcsi’s and Moortgat’s approach. With these lexical entries in hand, we can derive bound readings for constructions where the binder does not c-command the pronoun, as in (37)
a. Mary convinced everyone that he should participate.
b. Mary convinced everyone that the teacher likes him.
[Figure 2.7. Derivation of (37a)]

[Figure 2.8. Derivation of (37b)]
The derivations in the prosodically labeled natural deduction calculus are given in Figures 2.7 and 2.8. The derivation of bound readings that involve Principle B violations, as in (38), will fail because both for he and him, the anaphoric potential of a pronoun only becomes active after the local clause containing the pronoun is assembled, so the pronoun will never have access to local binders.⁹ (38)
Every mani shaves himi .
To sum up briefly, Morrill, 2000 assumes that anaphora involves two steps of a wrapping operation. The context of an anaphoric link is a discontinuous constituent containing two split points. It wraps first around the pronoun and then around the antecedent. In this way, cases where the antecedent does not c-command the pronoun can be dealt with adequately. Furthermore, Morrill proposes type assignments for personal pronouns that take into account the blocking effects that are standardly dubbed "Principle B effects". Intuitively, he assumes that a pronoun like he as such is not anaphoric, but it constructs a clause that is anaphoric and requires an np antecedent.

This proposal is certainly the most sophisticated approach to date to deal with anaphora exclusively in the lexicon in a Categorial setting, and it handles a considerable range of empirical data adequately. On the other hand, it has to use both a highly complex deductive system—secondary wrap and labeled deduction—and highly complex lexical types for pronouns. It also has to draw a fundamental distinction between bound and free pronouns which seems intuitively unmotivated. These shortcomings—which are inherent in the binding-in-lexicon approach—make it worthwhile to consider the other option, namely extending the logical apparatus of TLG with means to do anaphora resolution in syntax.
Footnote 9: This approach to Principle B is of limited generality though, since it wrongly predicts that a pronoun sitting inside a relative clause will have no access to an antecedent from the immediate matrix clause:

(i) Everybodyi sang a song that hei knew.

It also wrongly excludes binding of a pronoun by a non-c-commanding quantifier within the same clause, as in

(ii) Everybodyi's mother loves himi.
3.2 Resource Multiplication in Syntax
All the approaches discussed up to now share the assumption that pronouns are complex higher order functors that scope over some other functor and identify two argument slots of its argument. In other words, binding is considered to be part of the meaning of the pronoun, while the overall resource management regime is sub-Linear, i.e., each lexical resource is used exactly once. However, it is an inevitable consequence of this kind of approach that the lexical entries become rather complex, and the syntactic mechanism has to be enriched with highly powerful operations like different kinds of wrapping. It is thus tempting to keep the lexical entries of anaphors simple and instead to introduce an operation of anaphora resolution—and thus of meaning multiplication—directly into the grammatical machinery. This idea was first proposed in Hepple, 1990 (see also Hepple, 1992) in a Type Logical setting. Pauline Jacobson develops a different implementation of the same idea within the framework of Combinatory Categorial Grammar (cf. Jacobson, 1992a, Jacobson, 1992b, Jacobson, 1994a, Jacobson, 1994b, Jacobson, 1996a, Jacobson, 1996b, Jacobson, 1999, Jacobson, 2000, Jacobson, 2001). In the remainder of this section I will discuss these two approaches before I develop my own proposal that incorporates important aspects of both.
3.2.1 Hepple 1990

In his thesis (Hepple, 1990), Mark Hepple pursues the program—originally formulated in Morrill et al., 1990—to extend the Lambek calculus with controlled versions of structural rules from Intuitionistic Logic. To illustrate this point with an example, consider the structural rule of Permutation, which corresponds to the axiom

  A • B → B • A

Adding this rule in its general form to the Lambek calculus leads to the multiplicative fragment of Linear Logic, a logic that is fully commutative. Accordingly, a Categorial Grammar based on this logic would only recognize languages that are closed under permutation. No natural language has this property. However, natural languages do admit a limited amount of permutation—depending both on structural and language particular triggers. It is thus desirable that Permutation be applicable in certain environments while being blocked as a general axiom. This can be achieved by means of modal operators. For instance, we may extend the inventory of type forming connectives by a unary operator "◇" (i.e., ◇A is a type if A is), and extend the Lambek calculus with the following controlled form of Permutation:

  ◇A • B → B • ◇A

The limitation of Permutation to modalized formulae enables us to fine-tune the structural operations of the grammatical system by using modal operators in lexical assignments where appropriate. Hepple uses a similar strategy to model anaphoric binding. As mentioned above, doing anaphora resolution in syntax amounts to admitting a version of the structural rule of Contraction in the grammar logic. In an axiomatic form, Contraction can be formulated as

  A → A • A

A modally controlled version of Contraction would thus be

  A → A • ◇A

(where "◇" is a unary modal operator). Hepple's proposal can in fact be reduced to something that is very close to this axiom. Before we have a closer look at the logical aspects of his system, let me briefly discuss the repercussions of this strategy for the lexical meaning of the pronoun. In a simple sentence involving anaphora, like (39), the meaning of the antecedent is prima facie used twice, while there is no obvious counterpart to the anaphor in the semantic representation. (39)
a. John shaves himself.
b. shave'john'john'
If the duplication of the meaning of the antecedent is performed in the course of semantic composition, the pronoun seems in fact to be semantically empty. This is exactly what Hepple proposes: the meaning of a pronoun is the identity function on individuals, i.e., λxₑ.x. At first glance this might seem counterintuitive, but this assumption makes perfect sense if seen in the appropriate conceptual setting. Pronouns—and all other anaphoric expressions—are context dependent items. Their actual meaning in a particular context depends on some antecedent. So their context independent meaning can be identified with a function from the meaning of their antecedent to their meaning in an actual context. For a pronoun, this meaning is in fact the identity function.

To employ another perspective, under the standard view the meaning of a pronoun is a function that maps assignment functions to one element of their range. Assignment functions can be identified with infinite sequences. To take an example, the meaning of the pronoun he₂₁ is a function that takes a sequence of individuals and returns its 21st element.
The sequence contains much more information—all other values—which is wasted when interpreting the pronoun. A more economical system would identify a pronoun's meaning with a function from partial assignment functions to values. The extreme borderline case would be a pronoun meaning that takes single-valued assignment functions as arguments. Identifying pronoun meanings with identity functions amounts exactly to this.

Hepple implements binding by extending the Lambek calculus (a) with the unary modal operator ◇ and (b) with the Natural Deduction rule given in Figure 2.9, which he dubs the "Binding Interpretation Rule".

  [x : ◇A]ⁱ
  ⋮
  M : C
  ─────────── BIR, i
  λx.Mx : C

  where C is A\B or B/A

Figure 2.9. Binding Interpretation Rule
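How the identity-function pronoun and the BIR conspire can be sketched as follows. The shave' denotation and the encoding of the undischarged hypothesis as an explicit extra argument are assumptions of this sketch.

```python
# Toy sketch of Hepple's analysis of (39a) "John shaves himself": the
# pronoun denotes the identity function; the BIR performs the copying.

pronoun = lambda x: x                          # himself: \x.x

shave = lambda y: lambda x: ("shave", x, y)    # (np\s)/np, object first

# VP whose object is the pronoun applied to a hypothetical argument hyp
# (the hypothesis is modeled as an explicit extra parameter):
vp_with_hyp = lambda hyp: lambda subj: shave(pronoun(hyp))(subj)

# BIR: discharge the hypothesis by identifying it with the np argument
# slot of the containing functor; semantically \x.(M x) x.
bir = lambda M: lambda x: M(x)(x)

bound_vp = bir(vp_with_hyp)    # \x. shave x x
print(bound_vp("john"))        # ('shave', 'john', 'john')
```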
The intuitive content of this rule is best illustrated with an example. Reconsider sentence (39a). Hepple assigns the lexical entry in (40) to the reflexive pronoun. (40)

himself – λx.x : np/◇np
So before himself can serve as argument of the transitive verb shave, it has to be combined with an argument of category ◇np. The grammar does not assign this category to any constituent. So the argument of himself has to be a hypothesis that is to be discharged later. The only way to discharge it in such a way that the result does not contain an occurrence of the modal operator is via the Binding Interpretation Rule. Due to the side condition of this rule, this amounts to identifying the hypothetical argument of himself with the np argument of some functor category which contains himself. The only candidate in this example is the subject slot of the matrix VP. This leads to the bound reading of the example. The Natural Deduction derivation is given in Figure 2.10.

[Figure 2.10. Derivation of (39a)]

While the overall architecture of Hepple's system—extending the Lambek calculus with a modally controlled version of Contraction—is intuitively appealing, it has a serious proof theoretic drawback, and the obvious way to remedy it leads to a system that is computationally inadequate. Therefore I will develop an alternative approach later on.

Let us start by considering the proof theoretic properties of the Binding Interpretation Rule ("BIR" henceforth). In a sequent formulation, it can be expressed by the following two rules:

  X, x : ◇A, Y ⇒ M : A\B
  ─────────────────────── BIR\
  X, Y ⇒ λx.Mx : A\B

  X, x : ◇A, Y ⇒ M : B/A
  ─────────────────────── BIR/
  X, Y ⇒ λx.Mx : B/A
These rules represent modalized versions of a combination of Contraction with Permutation. They cannot simply be incorporated as sequent rules into the Gentzen style sequent formulation of the Lambek calculus, because the resulting system would not enjoy Cut elimination. Suppose the conclusion of the BIR is the left premise of a Cut application, and the Cut formula (i.e., B/A or A\B respectively) is the active formula of the right premise of the Cut. Then permuting the BIR over the right premise is impossible. So to reach a system with Cut elimination, we have to reformulate the BIR.10

10 Hepple, 1990:160 discusses this problem. He presents a sequent formulation that is decidable (and has the finite reading property) provided all antecedent formulae are types from a “well-behaved” lexicon, even though the logic does not allow Cut elimination. I consider this state of affairs unsatisfactory.
It is easy to see that in the presence of the slash elimination rules and the slash introduction rules of the Lambek calculus, the two instances of the BIR given above are equivalent to the following formulations:11

y : A, X, x : ◻A, Y ⇒ N : B
──────────────────────────── BIR′\
y : A, X, Y ⇒ N [y/x] : B

X, x : ◻A, Y, y : A ⇒ N : B
──────────────────────────── BIR′/
X, Y, y : A ⇒ N [y/x] : B
In these formulations, the BIR still blocks Lambek’s Cut elimination algorithm. Suppose the conclusion of a BIR application is the right premise of a Cut, and A is the Cut formula. Then permutation of Cut with the BIR will fail. This problem can easily be overcome if we adopt slightly more involved but equivalent formulations of the BIRs:

X ⇒ M : A    Y, y : A, Z, x : ◻A, W ⇒ N : B
───────────────────────────────────────────── BIR″\
Y, X, Z, W ⇒ N [M/x][M/y] : B

X ⇒ M : A    Y, x : ◻A, Z, y : A, W ⇒ N : B
───────────────────────────────────────────── BIR″/
Y, Z, X, W ⇒ N [M/x][M/y] : B
Formulated this way, the BIRs can be incorporated into the Cut elimination procedure for L. However, these formulations point to two severe problems. First, they predict that any NP can antecede any pronoun, without any structural constraints. This would lead to wild overgeneration (and Hepple employed multimodal techniques to avoid this kind of collapse, at the price of sacrificing Cut elimination). But even if appropriate constraints can be imposed, the system does not have the subformula property. Neither ◻A nor A occurs as a subformula in the conclusions of the rules. Thus Cut elimination is of little use here, since the Cut free system still does not lead to a finite proof search space. Cut elimination leads neither to decidability nor to the finite reading property.
11 Hepple distinguishes two families of slashes. The slashes that occur in his formulation of the BIR do not have introduction rules, so our reformulation is not entirely faithful to his system. Nevertheless it is instructive to explore the consequences that arise if we only consider Lambek’s slashes.

3.2.2 Jacobson
In a series of publications (Jacobson, 1992a, 1992b, 1994a, 1994b, 1996a, 1996b, 1999, 2000, 2001), Pauline Jacobson has developed an alternative Categorial approach to pronominal anaphora resolution and applied it to a wide range of empirical phenomena. Her system is formulated in a version of the framework of Combinatory Categorial Grammar. This means that she does not assume the full power of the Lambek calculus but only certain theorems (combinators) like type lifting, function composition, the Geach rule etc. As a novel contribution, she extends the inventory of category forming connectives with a third slash that expresses anaphoric dependencies, and she introduces a series of combinatory inference schemes that govern the combinatory potential of anaphoric expressions. The central intuition underlying her approach is the idea that the meaning of a constituent containing n unbound pronouns is a (Curried) function from an n-tuple of referents to the contextualized meaning of this constituent. So the meaning of a sentence like (41)
Mary knows him.
is not a proposition but a function from individuals (i.e., potential referents of him) to propositions. Likewise, the meaning of the VP knows him is not a property but a relation. Consequently, the meaning of him itself is not an individual, but the identity function on individuals, as in Hepple’s system.12 So the semantic composition for (41) works as in Figure 2.11.

S: λx.know’xmary’
├─ NP Mary: mary’
└─ VP: λx.know’x
   ├─ V knows: know’
   └─ NP him: λx.x

Figure 2.11.
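A minimal Python sketch of this composition (the placeholder denotations for know’ and mary’ are my own; the point is only that the anaphora slot is passed up unsaturated):

```python
# Toy composition for "Mary knows him", following the tree above.
know = lambda o: lambda s: (s, "knows", o)   # Curried: object first, then subject
mary = "mary"
him = lambda x: x                            # identity function on individuals

# VP "knows him": still a function of the pronoun's antecedent x
vp = lambda x: know(him(x))                  # λx.know'x
# S "Mary knows him": a function from referents of "him" to propositions
s = lambda x: vp(x)(mary)                    # λx.know'x mary'

assert s("bill") == ("mary", "knows", "bill")
```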
12 If number and gender information is taken into account, the meaning of pronouns should actually be identified with partial identity functions. I ignore this point for simplicity.

Since CCG—like all varieties of Categorial Grammar—assumes a strict category-to-type correspondence, the difference in type that is induced by unbound pronouns has to be mirrored in the syntactic categories. While Hepple formalizes the functional character of anaphoric expressions by means of the ordinary forward slash and distinguishes types of arguments by means of a modal operator, Jacobson introduces a third slash connective that is responsible for anaphoric dependencies. Instead of her notation A^B for signs of category A that need an antecedent of category B, I will use the notation A|B to stress the similarity with the other slashes.13 Jacobson thus extends the Categorial machinery with the following definitions:

Definition 36 If A and B are categories, then A|B is a category. τ(A|B) = ⟨τ(B), τ(A)⟩

Accordingly, a pronoun receives category np|np. So the full lexical entry for him comes out as

(42) him – λx.x : np|np
him – λx.x : np|np
A more appropriate derivation for the example above is hence s|np λxknow’xmary’
(43)
np M ary mary’
(np\s)|np λxknow’x (np\s)/np knows know’
np|np λx.x him
To make this a valid CCG derivation, the combinatory rules have to admit the inheritance of anaphora slots from subconstituents to superconstituents. This is achieved by means of the combinator G. The semantic operation accompanying it is the one of the Geach rule (which motivates its name). It comes in two directional variants given below: (44)
13 I
a.
X ⇒ M : A/B X ⇒ λxy.M (xy) : A|C/B|C
G>
assume that the vertical slash takes the highest precedence among all binary operators. So A|B/C abbreviates (A|B)/C, A/B|C abbreviates A/(B|C) etc. Furthermore, vertical slashes associate to the left, so A|B|C abbreviates (A|B)|C.
101
The Problem of Anaphora
b.
X ⇒ M : B\A X ⇒ λxy.M (xy) : B|C\A|C
G<
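As a semantic operation, G is simply function composition in disguise: it threads an extra argument slot C through an application. A sketch in Python (the sample denotations are my own):

```python
# The Geach combinator G: from a functor of type B -> A it builds one of
# type (C -> B) -> (C -> A), passing the anaphora slot C upward.
G = lambda m: lambda x: lambda y: m(x(y))

know = lambda o: lambda s: (s, "knows", o)   # toy denotation for know'
him = lambda x: x                            # pronoun as identity function

# G(know) accepts the pronoun and returns a VP meaning that still
# expects the pronoun's antecedent:
vp = G(know)(him)                            # λz.know'z
assert vp("bill")("mary") == ("mary", "knows", "bill")
```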
These combinators ensure that anaphora slots can be passed on from arguments to the result of applying a function to an argument. If a functor contains an anaphora slot itself, the argument has to be turned into the functor by means of type lifting. This is illustrated in the “official” derivation of (41) given in Figure 2.12.

Mary ⊢ mary’ : np (lex)
⊢ λw.wmary’ : s/(np\s) (T>)
⊢ λuv.uvmary’ : s|np/(np\s)|np (G>)
knows ⊢ know’ : (np\s)/np (lex)
⊢ λyz.know’(yz) : (np\s)|np/np|np (G>)
him ⊢ λx.x : np|np (lex)
knows him ⊢ λz.know’z : (np\s)|np (A>)
Mary knows him ⊢ λv.know’vmary’ : s|np (A>)

Figure 2.12. Derivation for (41)
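The derivation in Figure 2.12 can be replayed step by step in Python (T, G, and the toy denotations are my own encodings, not Jacobson’s notation):

```python
T = lambda a: lambda f: f(a)                 # type lifting: a -> λw.w a
G = lambda m: lambda x: lambda y: m(x(y))    # Geach

know = lambda o: lambda s: (s, "knows", o)   # toy denotation for know'
mary = "mary"
him = lambda x: x

lifted = T(mary)          # λw.w mary'    : s/(np\s)
geached = G(lifted)       # λuv.uv mary'  : s|np/(np\s)|np
vp = G(know)(him)         # λz.know'z     : (np\s)|np
s = geached(vp)           # λv.know'v mary' : s|np

assert s("bill") == ("mary", "knows", "bill")
```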
Binding of pronouns is achieved by identifying the anaphora slot that originates from the lexical entry of the pronoun with some np argument slot of a superordinate functor. This is implemented by means of the combinator Z. Since it operates on two-place functors, it comes in four directional variants.14 (45)
a.
X ⇒ M : A/B/C
────────────────────────── Z>>
X ⇒ λxy.M (xy)y : A/B/C|B

b.
X ⇒ M : (B\A)/C
────────────────────────── Z<>
X ⇒ λxy.M (xy)y : (B\A)/C|B

c.
X ⇒ M : C\A/B
────────────────────────── Z><
X ⇒ λxy.M (xy)y : C|B\A/B

d.
X ⇒ M : C\B\A
────────────────────────── Z<<
X ⇒ λxy.M (xy)y : C|B\B\A

14 Jacobson restricts the type variable B in the combinatory schemes in (45) to the value np.
A prototypical instance of Z takes a transitive verb (phrase) as input and returns a TVP which selects an object containing a pronoun (i.e., an object of category np|np). Semantically it binds the pronoun inside the object to its subject slot. This operation corresponds to the sequent

x : (np\s)/np ⇒ λyz.x(yz)z : (np\s)/np|np

If the subject slot of such a shifted transitive verb is in turn bound by a quantifier or a wh-operator, we indirectly achieve the effect of binding the pronoun to the operator. This is illustrated in the example (46). (I simplify matters a bit and pretend that the complex NP his mother receives the category np|np and the meaning mother’—the Skolem function15 mapping individuals to their mothers—in the lexicon, since the semantics of possessive constructions is of minor interest in the present context.)
a. Every man loves his mother.
b.
every ⊢ every’ : s/(np\s)/n (lex)
man ⊢ man’ : n (lex)
every man ⊢ every’man’ : s/(np\s) (A>)
loves ⊢ love’ : (np\s)/np (lex)
⊢ λyz.love’(yz)z : (np\s)/np|np (Z<>)
his mother ⊢ mother’ : np|np (lex)
loves his mother ⊢ λz.love’(mother’z)z : np\s (A>)
every man loves his mother ⊢ every’man’(λz.love’(mother’z)z) : s (A>)
15 A remark on terminology: I use the term “Skolem function” as synonymous with “function of type ⟨e, e⟩” throughout this book, regardless of whether or not an operation of “Skolemization” is involved.

In Jacobson’s “official” theory the formulation of Z is somewhat more complicated, but I skip over this here for ease of exposition. The purpose of the G-combinators is to pass unbound anaphora slots from subconstituents to superconstituents. As I have presented G up to now, this will only work for one single slot, but of course a constituent may contain more than one unbound pronoun. Therefore a generalization of G is required as well. Jacobson assumes that there are infinitely many instances of G that are defined recursively. The definition given above represents the base case. The recursive rule takes the form of the following monotonicity rule. (Jacobson assumes that the input to this inference scheme has to be obtained by applications of G>, G<, and G∗ only. I ignore this aspect for simplicity.)

(47)
x : A ⇒ M : B
───────────────────────────── G∗
y : A|C ⇒ λz.M [(yz)/x] : B|C
Written in tree format, this rule amounts to a form of hypothetical reasoning. To derive a conclusion B|C from a premise A|C, assume some hypothesis of type A, try to derive B from it, and discharge the hypothesis. The general scheme is given in Figure 2.13.

y : A|C    [yz : A]1
        ⋮
      M : B
─────────────────── G∗, 1
   λz.M : B|C

Figure 2.13. G∗ in tree format
I illustrate the application of G∗ in example (48) below.

(48) a. His mother loves his dog.
b. (Derivation: loves undergoes G>, his dog contributes one anaphora slot, his mother is lifted by T> and geached, and G∗ discharges the hypothetical np; the result is λzv.love’(dog’z)(mother’v) : s|np|np, with both pronoun slots passed up unbound.)

To summarize this mechanism, suppose the argument in a functor-argument structure contains an anaphora slot. Then either of two options apply:

1 The functor undergoes some version of G and the anaphora slot is thus projected to the superconstituent (as illustrated in Figure 2.12).

2 The functor undergoes Z prior to applying it to its argument. As net effect, the anaphora slot in the argument is bound by some superordinate syntactic argument place of the functor (cf. (46)).

As a consequence, Jacobson’s system agrees with Szabolcsi’s in the prediction that in a binding configuration, the binder always c-commands the pronoun.16 A welcome consequence of this is that the system handles basic cases of Weak Crossover correctly. Consider the contrast in (49).

16 To apply this view on binding for double object constructions, Jacobson also employs a wrapping operation in these cases, following basically the suggestions from Bach, 1979.

(49)
a. Every Englishman_i loves his_i mother.
b. *His_i mother loves every Englishman_i.
The binding in (49a) is achieved by applying Z to the verb loves before it is combined with the object. To get a similar binding effect in (49b), we would need a mirror image of Z, something like

x : (A\B)/C ⇒ λyz.xz(yz) : (A|C\B)/C

Incidentally, this is a directional version of Curry and Feys’ (1958) combinator S. Since, according to Jacobson, the grammar of English does not contain this combinator, the subject-object asymmetry observed in connection with crossover violations is correctly accounted for. Jacobson presents a considerable list of empirical arguments to show that the view “pronouns as identity maps” is in fact superior both to the standard view using variables and to the Categorial treatments that locate the binding in the lexical entry of the pronoun. I will briefly review the most important arguments.
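The missing combinator is easy to state as a function; the point of the analysis is precisely that English grammar lacks it. A Python sketch (denotations mine):

```python
# A directional version of Curry and Feys' combinator S: λyz.xz(yz).
# If it were available, a subject pronoun could be bound by the object,
# wrongly licensing the crossover reading of (49b).
S = lambda x: lambda y: lambda z: x(z)(y(z))

love = lambda o: lambda s: (s, "loves", o)   # toy denotation
mother = lambda x: ("mother-of", x)

crossover_vp = S(love)(mother)               # λz.love'z(mother'z)
assert crossover_vp("john") == (("mother-of", "john"), "loves", "john")
```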
Functional questions. Consider a question like (50a): (50)
a. Who does no Englishman admire? b. Margaret Thatcher. c. His mother-in-law.
There is a general agreement in the literature (cf. Groenendijk and Stokhof, 1984, Engdahl, 1986, Chierchia, 1993) that such questions are ambiguous between an individual reading (which elicits answers like (b)) and a functional reading (where we expect answers like (c)). The two readings can be paraphrased as (51)
a. Which individual x is such that every Englishman y admires x? b. Which Skolem function f is such that every Englishman y admires f y?
As a consequence of this observation, it is inevitable to assume that a wh-phrase like who is semantically ambiguous, binding either an individual gap or a Skolem function gap in its sister clause. The advantage of the Jacobsonian approach is that no further apparatus is needed to handle the ambiguity. The functional gap is bound by exactly the same means as an ordinary pronoun, namely by employing Z. (This is in clear contrast to the mentioned alternative approaches, where considerable extra apparatus like internally structured traces is needed.) In a Jacobsonian approach, the interrogative pronoun receives the two lexical entries below. Here Q is the syntactic category of questions, and the formula ?xϕ is to be interpreted as the question “Which x is such that ϕ”. (I remain neutral with regard to the correct semantics of questions since this has no bearing on the issue discussed here.) (52)
a. who – λP ?x.P x : Q/(s/np)
b. who – λP ?f.P f : Q/(s/np|np)
The functional reading is now easily derived (see Figure 2.14), using Z and the lexical entry for who in (52b). (I give a simplified treatment of auxiliary inversion since this issue is inessential here.)

who ⊢ λP ?f.P f : q/(s/np|np) (lex)
does ⊢ λx.x : s/s (lex)
no ⊢ λQR.¬∃x(Qx ∧ Rx) : s/(np\s)/n (lex)
Englishman ⊢ englishman’ : n (lex)
no Englishman ⊢ λR.¬∃x(englishman’x ∧ Rx) : s/(np\s) (A>)
admire ⊢ admire’ : (np\s)/np (lex)
⊢ λyz.admire’(yz)z : (np\s)/np|np (Z<>)
no Englishman admire ⊢ λy.¬∃x(englishman’x ∧ admire’(yx)x) : s/np|np (B>)
does no Englishman admire ⊢ λy.¬∃x(englishman’x ∧ admire’(yx)x) : s/np|np (B>)
who does no Englishman admire ⊢ ?y.¬∃x(englishman’x ∧ admire’(yx)x) : q (A>)

Figure 2.14. Derivation of the functional reading of (50a)

Since the binding of functional gaps is treated analogously to the binding of pronouns here, the approach predicts that functional gaps are subject to Weak Crossover effects. This is in fact the case (as for instance discussed at length in Chierchia, 1993). A functional reading is missing if subject and object are reversed:

(53) a. Which woman admires no Englishman?
b. Margaret Thatcher.
c. *His mother-in-law.

The argument in favor of the Jacobsonian treatment can be further strengthened if the meaning of a constituent question is identified with the set of (denotations of) its correct constituent answers (as for instance proposed in Hausser and Zaefferer, 1978, Zaefferer, 1984 and, more recently, in Krifka, 1999). Then the answer his mother-in-law to the question Who does no Englishman admire has to be interpreted as the Skolem function mapping each individual to his mother-in-law. Jacobson’s system provides this as the basic meaning anyway, while a variable-based account would need an extra type shifting device that λ-abstracts over the variable corresponding to the pronoun.17
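The functional reading can be checked model-theoretically: the question meaning is a predicate of Skolem functions, and a functional answer is simply such a function. A toy Python sketch (model and denotations are my own):

```python
# Toy model: John is the only admirer, and he admires only Thatcher.
englishmen = {"john", "bill"}
admires = {("john", "thatcher")}

admire = lambda o: lambda s: (s, o) in admires
mother_in_law = lambda x: ("mother-in-law", x)   # a candidate Skolem function

# λy.¬∃x(englishman'x ∧ admire'(yx)x)
functional_reading = lambda y: not any(admire(y(x))(x) for x in englishmen)

assert functional_reading(mother_in_law) is True          # a correct answer
assert functional_reading(lambda x: "thatcher") is False  # not a correct answer
```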
Sloppy inferences. Free relative clauses display a similar polymorphism. This can be observed most clearly in connection with so-called “sloppy” inferences as in (54) (taken from Jacobson, 2000 who attributes it to Tanya Reinhart): (54)
a. John will buy whatever Bill buys.
b. Bill_i will buy his_i favorite car.
c. Therefore John_j will buy his_j favorite car.

In its most prominent reading, (54a) is to be interpreted as

∀x_e (buy’xbill’ → buy’xjohn’)

17 Needless to say this argument does not apply if the meaning of a question is identified with a set of propositions, as in Karttunen, 1977 or in Groenendijk and Stokhof, 1984.
In this reading, the inference from (54a) and (b) to (c) is not valid. (54a) has another reading though that renders the argument valid. The critical reading can be represented as

∀f_⟨e,e⟩ (buy’(f bill’)bill’ → buy’(f john’)john’)

A discussion of the semantics of free relatives would lead us too far afield here. The essential aspect of Jacobson’s analysis of the functional reading of free relatives is more or less parallel to her analysis of functional questions. In its basic meaning, a free relative pronoun like whatever binds an np gap inside the relative clause, and the free relative as a whole binds an np-position in the matrix clause. Jacobson assumes that whatever has a second reading where it binds an np|np-gap in the embedded clause and creates a free relative that binds such a position in the matrix clause. The meaning of whatever in this functional reading is basically a universal quantifier over Skolem functions. Again, binding is achieved by the same means as in the case of ordinary pronouns. In the example above, this means that the main verb both in the relative clause and in the matrix clause has to undergo Z. The same strategy can be applied if the object is sentential rather than nominal. So sentence (55a) is predicted to be ambiguous between the readings (b) and (c).
a. Every Englishman believes whatever every Frenchman believes.
b. ∀p_t (∀x(french’x → bel’px) → ∀y(english’y → bel’py))
c. ∀P_⟨e,t⟩ (∀x(french’x → bel’(P x)x) → ∀y(english’y → bel’(P y)y))
Under the second reading, the inference from (56a) and (b) to (c)— among others discussed by Chierchia, 1989 under the heading of “believe de se”—is correctly predicted to be valid. (56)
a. Every Englishman believes whatever every Frenchman believes.
b. Every Frenchman_i believes that he_i should drink lots of red wine.
c. Therefore, every Englishman_j believes that he_j should drink lots of red wine.
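The validity of the sloppy inference in (54) under the functional reading can be illustrated over a toy finite model (the model, the finite domain, and the encoding are my own assumptions; a model check, not a proof):

```python
from itertools import product

# Toy model in which both premises of (54) hold.
domain = ["car-b", "car-j"]
buys = {("bill", "car-b"), ("john", "car-b"), ("john", "car-j")}
buy = lambda o: lambda s: (s, o) in buys
favorite_car = {"bill": "car-b", "john": "car-j"}.get   # a Skolem function

# (54a), functional reading: ∀f (buy'(f bill')bill' → buy'(f john')john'),
# with f ranging over all functions from {bill, john} into the domain.
skolem_fns = [dict(zip(("bill", "john"), vals)).get
              for vals in product(domain, repeat=2)]
reading_a = all(not buy(f("bill"))("bill") or buy(f("john"))("john")
                for f in skolem_fns)

assert reading_a                                # premise (54a) holds
assert buy(favorite_car("bill"))("bill")        # premise (54b) holds
assert buy(favorite_car("john"))("john")        # conclusion (54c) holds
```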
Right Node Raising. In paragraph 2.1.2 (starting on page 21) I introduced a surface compositional Categorial treatment of Right Node Raising constructions like
(57) John likes and Bill detests broccoli.
The analysis rests on the assumptions that the strings John likes and Bill detests form constituents that denote properties (the property to be liked by John and the property to be detested by Bill), and that the coordination particle and is polymorphic and denotes the join operation on properties (i.e., set intersection) in the construction above. Now consider a somewhat more complicated example where the object contains a bound pronoun: (58)
Every man loves but no man wants to marry his mother.
The sentence has a reading where the pronoun his is simultaneously bound by both quantifiers. It can be paraphrased by (59)
Every man_i loves his_i mother but no man_j wants to marry his_j mother.
If we analyze pronouns as variables and adopt the Categorial treatment of non-constituent coordination, this reading is underivable. The closest we can get is the semantic representation

(60) λy(∀z(man’z → love’yz) ∧ ¬∃z(man’z ∧ wtm’yz))(mother’z)
Now λ-conversion would lead to the intended reading, but it is illicit without prior renaming of the quantified variables, since mother’z is not free for y here. With the renaming, we only obtain the reading where the pronoun is free. If one wants to maintain an analysis of pronouns as variables, one is forced to abandon the Categorial treatment of Right Node Raising. One has to adopt a reconstruction approach instead. So the input for the interpretation of (58) would be

(61)
Every man_i loves his_i mother but no man_i wants to marry his_i mother.
This would give us the intended reading. However, a reconstruction approach without further constraints on the management of variable names leads to considerable overgeneration. For instance nothing prevents an interpretation of (62a) as (62b), where the pronoun is bound by the matrix subject in the first conjunct and by the local subject in the second one. Such a reading is impossible. (62)
a. Each boy believes that every man loves and no man marries his mother.
b. Each boy_i believes that every man_k loves (his_i mother) and no man_j marries his_j mother.

A variable-free analysis of pronouns is compatible with the general Categorial treatment of coordination, and the resulting analysis avoids both the undergeneration of the Categorial approach and the overgeneration of the reconstruction approach that comes with the variable analysis. The critical reading (59) is derived if the conjunction operates on the category s/np|np, i.e., the two conjuncts are interpreted as properties of Skolem functions. Besides, it is possible to do coordination in the category s/np and to pass the anaphora slot up to the entire coordinated structure. This admits binding from outside. The two derivations are given in Figure 2.15. There are no further options, so non-parallel binding patterns as in (62b) are excluded.

Bound reading:
every man ⊢ λP ∀x(man’x → P x) : s/(np\s) (lex)
loves ⊢ love’ : (np\s)/np (lex); λyz.love’(yz)z : (np\s)/np|np (Z)
every man loves ⊢ λy.∀x(man’x → love’(yx)x) : s/np|np (B>)
no man ⊢ λP ¬∃x(man’x ∧ P x) : s/(np\s) (lex)
wants to marry ⊢ wtm’ : (np\s)/np (lex); λyz.wtm’(yz)z : (np\s)/np|np (Z)
no man wants to marry ⊢ λy.¬∃x(man’x ∧ wtm’(yx)x) : s/np|np (B>)
coordination ⊢ λy.∀x(man’x → love’(yx)x) ∧ ¬∃x(man’x ∧ wtm’(yx)x) : s/np|np (Conj)
his mother ⊢ mother’ : np|np (lex)
⊢ ∀x(man’x → love’(mother’x)x) ∧ ¬∃x(man’x ∧ wtm’(mother’x)x) : s (A>)

Free reading:
every man loves ⊢ λy.∀x(man’x → love’yx) : s/np (B>)
no man wants to marry ⊢ λy.¬∃x(man’x ∧ wtm’yx) : s/np (B>)
coordination ⊢ λy.∀x(man’x → love’yx) ∧ ¬∃x(man’x ∧ wtm’yx) : s/np (Conj)
⊢ λzw.∀x(man’x → love’(zw)x) ∧ ¬∃x(man’x ∧ wtm’(zw)x) : s|np/np|np (G)
his mother ⊢ mother’ : np|np (lex)
⊢ λw.∀x(man’x → love’(mother’w)x) ∧ ¬∃x(man’x ∧ wtm’(mother’w)x) : s|np (A>)

Figure 2.15. Bound and free reading of (58)
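The parallel-binding effect of coordination at category s/np|np can be checked in a toy model (model and denotations are my own): each conjunct is a property of Skolem functions, so a single function argument binds the pronoun in both conjuncts at once.

```python
# Toy model: every man loves his own mother; nobody marries anyone.
men = {"john", "bill"}
loves = {(x, ("mother-of", x)) for x in men}
marries = set()

Z = lambda m: lambda x: lambda y: m(x(y))(y)
B = lambda f: lambda g: lambda x: f(g(x))     # function composition (B>)
love = lambda o: lambda s: (s, o) in loves
wtm = lambda o: lambda s: (s, o) in marries   # "wants to marry"
every_man = lambda p: all(p(x) for x in men)
no_man = lambda p: not any(p(x) for x in men)
mother = lambda x: ("mother-of", x)

conj1 = B(every_man)(Z(love))                 # λy.∀x(man'x → love'(yx)x)
conj2 = B(no_man)(Z(wtm))                     # λy.¬∃x(man'x ∧ wtm'(yx)x)
coordinated = lambda y: conj1(y) and conj2(y) # Conj at s/np|np

assert coordinated(mother) is True            # the bound reading of (58)
```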
i-within-i effects. In the literature, this is the common name for the observation that a pronoun inside a complex definite NP cannot be coreferential with the matrix NP. So the following coindexations lead to ungrammaticality:
(63)
a. *[The wife of her_i childhood sweetheart]_i left.
b. *[The wife of her_i sister’s childhood sweetheart]_i left.
c. *[The wife of the author of her_i biography]_i left.
(64)
a. *[Her_i childhood sweetheart’s wife]_i came to the party.
b. *[The author of her_i biography’s wife]_i came to the party.
Neither is it possible that a quantificational determiner binds a pronoun inside its (complex) complement noun. So the following structures are excluded as well. (65)
a. *[Every wife of her_i childhood sweetheart]_i left.
b. *[Every wife of her_i sister’s childhood sweetheart]_i left.
c. *[Every wife of the author of her_i biography]_i left.
(66)
*[Every author of her_i biography’s wife]_i came to the party.
If indices are taken to be part of the theory, this suggests a simple generalization which Chomsky, 1981:212 formulates as follows: (67)
“*[γ . . . δ . . .], where γ and δ bear the same index.”
However, there are systematic exceptions to this generalization (which are accommodated by Chomsky in a complication of the above rule given in a footnote). The indicated indexation becomes possible if the pronoun sits inside a relative clause that modifies the head noun of the matrix NP: (68)
a. [The woman who_i married {her_i sister’s / her_i} childhood sweetheart]_i left.
b. [The woman who_i married the author of her_i biography]_i left.
(69)
a. [Every woman who_i married {her_i sister’s / her_i} childhood sweetheart]_i left.
b. [Every woman who_i married the author of her_i biography]_i left.
Under the Jacobsonian view on pronoun binding, this pattern is in fact expected. Let us start with a good example like (69b). The anaphora slot originating from the pronoun her is inherited by the NP the author of her biography by repeated application of G. So this NP will receive category np|np. The verb married undergoes Z before it is combined with its object. As a consequence, the VP married the author of her biography receives the interpretation “λx.marry’(author’(biography’x))x”.
Starting from this VP meaning, the relative clause, the matrix NP and the matrix clause are assembled in the usual fashion, leading to the final meaning

∀x(woman’x ∧ marry’(author’(biography’x))x → leave’x)

which corresponds to the coindexation in (69b). The same mechanism works ceteris paribus for all other good examples. So why is it impossible to assign the same meaning to (65c)? As far as the semantics goes, nothing prevents this. To get the reading in question, the 2-place predicate wife of has to undergo Z, just like married does in (69b). This is impossible though because the preconditions for the application of Z are defined in terms of syntactic categories rather than semantic types. While both married and wife of are semantically of type ⟨e, ⟨e, t⟩⟩, the former has category (np\s)/np and the latter category n/np. Only the former meets the preconditions for Z.
Paycheck pronouns. There is a class of pronoun occurrences that can neither be accommodated under “binding” nor under “coreference”. The name “paycheck pronouns” comes from the following example (from Karttunen, 1969). (70)
a. The man who gave his paycheck to his wife was wiser than the man who gave it to his mistress.
b. The man who_i gave his_i paycheck to his_i wife was wiser than the man who_j gave his_j paycheck to his_j mistress.
Sentence (70a) has a reading which is synonymous with (70b). The example is problematic because the pronoun it does not have a coreferential/binding antecedent, even though it is evidently anaphorically related to the NP his paycheck in the first conjunct. There are two possible strategies to analyze this kind of anaphor in the literature. One may consider the critical pronoun as an E-type pronoun in the sense of Evans, 1977, i.e., as shorthand for a definite description. The paraphrase given in (70b) would thus be the main part of the analysis. This kind of analysis is incompatible with the program of surface compositionality since a certain syntactic copy mechanism has to be invoked before interpretation can proceed. Alternatively, one may assume that the paycheck pronoun it retrieves two meaning components from the context by means of anaphora resolution, namely a Skolem function and an individual, and it denotes the result of applying the function to this individual. In the above example, these components are the function f mapping individuals to their
paychecks, while the individual slot is bound by the relative pronoun who. Analyses along these lines were among others proposed in Cooper, 1979 and Engdahl, 1986. The latter kind of analysis has the advantage of being compositional. It is faced with three problems:

1 How can a pronoun take several antecedents simultaneously?

2 How can the NP his paycheck evoke a Skolem function as value of a subsequent paycheck pronoun?

3 How exactly does anaphora resolution of the two anaphoric components of paycheck pronouns proceed?

Obviously, the second question receives an immediate answer if we assume Jacobson’s analysis—the meaning of his paycheck is the paycheck function. Let us turn attention to the first question. Syntactically, the paycheck pronoun in the example above takes his paycheck as one antecedent, and its second anaphoric slot is bound by a relative pronoun. The category of his paycheck is np|np, and the relative pronoun binds a gap of category np. The category of the paycheck pronoun should thus be (np|np)|(np|np)—an anaphor that takes first an np|np and second an np as antecedent and returns an np. This category is derivable from the basic pronoun category np|np by a variant of the well-established Geach rule:

x : A|B ⇒ λyz.x(yz) : (A|C)|(B|C)   G|

Applying this rule to the lexical entry of it gives us the derived sign

(71)
it – λf.f : (np|np)|(np|np)
(Note that λf x.f x = λf.f due to the extensionality of functions.) So the first question is answered by assuming G| as a general type shifting rule. Given this, no extra apparatus is needed to answer the third question. The value for the Skolem function is retrieved by means of accidental coreference, while the individual component is bound by Z. A sample derivation for the critical clause in the simplified example in (72) is given in Figure 2.16. The category of the clause is s|(np|np), i.e., it denotes a function from Skolem functions to propositions. The Skolem function slot is filled by the denotation of the antecedent phrase his paycheck.

(72)
Every man spent his paycheck. Mary kept it.
Mary ⊢ mary’ : np (lex)
⊢ λP.P mary’ : s/(np\s) (T>)
⊢ λuv.uvmary’ : s|(np|np)/(np\s)|(np|np) (G>)
kept ⊢ keep’ : (np\s)/np (lex)
⊢ λyz.keep’(yz)z : (np\s)/np|np (Z<>)
⊢ λrsz.keep’(rsz)z : (np\s)|(np|np)/(np|np)|(np|np) (G>)
it ⊢ λx.x : np|np (lex)
⊢ λf.f : (np|np)|(np|np) (G|)
kept it ⊢ λsz.keep’(sz)z : (np\s)|(np|np) (A>)
Mary kept it ⊢ λv.keep’(vmary’)mary’ : s|(np|np) (A>)

Figure 2.16. Derivation for (72)
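The paycheck derivation of (72) can be replayed in Python (the combinator encodings and the toy denotations are my own): G| lifts it to category (np|np)|(np|np), and the resulting clause meaning is a function from Skolem functions to propositions, saturated by the denotation of his paycheck.

```python
G_bar = lambda x: lambda y: lambda z: x(y(z))   # G| : x:A|B => λyz.x(yz)
Z = lambda m: lambda x: lambda y: m(x(y))(y)
G = lambda m: lambda x: lambda y: m(x(y))
T = lambda a: lambda f: f(a)

keep = lambda o: lambda s: (s, "kept", o)       # toy denotation for keep'
mary = "mary"
it = lambda x: x                                 # np|np
paycheck_it = G_bar(it)                          # (np|np)|(np|np), = λf.f

lifted_mary = G(T(mary))                         # s|(np|np)/(np\s)|(np|np)
vp = G(Z(keep))(paycheck_it)                     # λsz.keep'(sz)z
clause = lifted_mary(vp)                         # λv.keep'(v mary')mary'

paycheck_of = lambda x: ("paycheck-of", x)       # antecedent: his paycheck
assert clause(paycheck_of) == ("mary", "kept", ("paycheck-of", "mary"))
```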
Bach-Peters sentences. The above analysis of paycheck pronouns leads to a straightforward account of Bach-Peters sentences, i.e., sentences with two complex NPs each containing a pronoun that is coindexed with the other NP. A classical example is (73)
[The man who deserves it_i]_k gets [the prize he_k wants]_i.
Even though this coindexation pattern leads us to expect a kind of circular reference (and thus pragmatic deviance), the construction is perfectly intelligible. The question is how the interpretation is to be derived in a compositional way. Let us first note some asymmetries in these constructions. To start with, the first NP is unrestricted with regard to the form of its determiner. It may be indefinite, definite (as above) or quantified: (74)
a. [A man who deserved it_i]_k got [the prize he_k wanted]_i.
b. [Men who deserved it_i]_k got [the prizes they_k wanted]_i.
c. [Every man who deserved it_i]_k got [the prize he_k wanted]_i.
On the other hand, the second matrix NP must be definite (or specific if indefinite). (75)
a. ???[The man who deserves it_i]_k gets [a prize he_k wants]_i.
b. *[The man who deserves it_i]_k gets [every prize he_k wants]_i.
This pattern is not really surprising; the attempted backward bindings in (75) are ruled out as mundane Weak Crossover violations. Definites and specific indefinites are known to be exempted from Weak Crossover.
Furthermore, the first pronoun seems to be subject to Weak Crossover as well. Exchanging the pronoun and the gap corresponding to the relative pronoun in the first NP results in deviance. (76)
*[The man whom it_i always evaded]_k finally got [the prize he_k wanted]_i.
The second pronoun can occur both in subject position and in object position though. (77)
[The man who deserves iti ]k finally got [the prize that always evaded himk ]i .
The kind of subject-object asymmetry displayed by the first pronoun is also characteristic for paycheck pronouns: (78)
a. The man who sees [his brother]f regularly is better off than the man whoi never visits himf i . b. *The man who sees [his brother]f regularly is better off than the man whomi hef i is never visited by.
This asymmetry is predicted by Jacobson’s account; the relative pronoun binds the np slot of the paycheck pronoun just like an ordinary pronoun, and an object relative pronoun cannot bind an anaphora slot in the subject. Finally it should be observed that paycheck pronouns can precede their functional antecedent. This is not surprising either, given that accidental coreference does not involve binding and thus does not evoke Weak Crossover. (79)
The man whoi sees himf i regularly is better off than the man who never visits [his brother]f .
Putting these pieces together, we have evidence that the first pronoun in a Bach-Peters sentence is a paycheck pronoun, while the second one is an ordinary bound pronoun. This is exactly the analysis Jacobson proposes: In a sentence like (80), it is analyzed as a paycheck pronoun. Its Skolem function slot remains free on the level of sentence semantics— the category of the whole sentence is thus s|(np|np), while its np slot is bound by the relative pronoun (i.e., deserves undergoes Z). The pronoun him is treated as an ordinary pronoun that gets bound by the matrix subject (i.e., gets undergoes Z as well). The full derivation is given in Figure 2.17 on the facing page. (80)
Every man who deserves it gets the prize that pleases him.
The Problem of Anaphora
Figure 2.17. Derivation of (80), concluding in λf.∀x(man’x ∧ deserve’(f x)x → get’(ιy.prize’y ∧ please’xy)x) : s|(np|np)
The syntax-semantics interface supplies the meaning (81)
λf.∀x(man’x ∧ deserve’(f x)x → get’(ιy.prize’y ∧ please’xy)x)
The paycheck pronoun it is still unresolved, so the meaning is a function from Skolem functions to propositions. In the Bach-Peters reading, it is accidentally coreferent with the prize that pleases him, which denotes the Skolem function (82)
λxιy.prize’y ∧ please’xy
So the final interpretation is obtained by applying (81) to (82), which yields the desired (83)
∀x(man’x ∧ deserve’(ιy.prize’y ∧ please’xy)x → get’(ιy.prize’y ∧ please’xy)x)
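That applying (81) to (82) really yields the truth conditions in (83) can be checked in a toy extensional model. The sketch below is illustrative only: the individuals and the extensions of man’, deserve’, please’ and get’ are invented; only the logical form is taken from the text.

```python
# Toy model for evaluating the Bach-Peters reading (83).
# All model facts below are invented for illustration.

men = {"m1", "m2"}
prizes = {"p1", "p2"}
please = {("m1", "p1"), ("m2", "p2")}   # (x, y): prize y pleases man x
deserve = {("m1", "p1"), ("m2", "p2")}  # (x, y): man x deserves prize y
get = {("m1", "p1"), ("m2", "p2")}      # (x, y): man x gets prize y

def iota(pred, domain):
    """The iota-operator: the unique member of `domain` satisfying `pred`."""
    matches = [d for d in sorted(domain) if pred(d)]
    assert len(matches) == 1, "uniqueness presupposition failed"
    return matches[0]

# (82): the Skolem function λx.ιy(prize'y ∧ please'xy)
f = lambda x: iota(lambda y: (x, y) in please, prizes)

# (81) applied to (82), i.e. (83):
# ∀x(man'x ∧ deserve'(f x)x → get'(f x)x)
reading = all((x, f(x)) in get for x in men if (x, f(x)) in deserve)
print(reading)   # True in this model
```

The Skolem function slot is the only free parameter: swapping in a different function f changes which prize each man is claimed to get, without touching the composed meaning (81).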
Let us briefly wrap up the discussion of Jacobson’s approach. Her crucial assumptions are that the meaning of pronouns is the identity map, and that the binding of pronouns is achieved by means of a syntactic operation, namely Z. The fact that these simple assumptions suffice to explain a considerable range of quite diverse data is strong evidence that her theory is on the right track. Still, some critical remarks can be made. From a theoretical point of view, the collection of combinators that are necessary to make the system work seems ad hoc. The instances of Z that were discussed here only deal with constructions where the binder is the subject. Other configurations require other versions of Z. An empirically adequate modeling of the inheritance of pronoun slots requires an even larger proliferation of combinators; we need infinitely many instances of G. So it seems that some generalization has been missed here. In the ideal case, all these combinators should be theorems of a more general deductive system. Besides, Jacobson’s system has certain empirical shortcomings. It assumes that c-command is a structural precondition for pronoun binding. As discussed above, this is inadequate in many cases. This problem becomes more severe if we strive for a unified treatment of pronominal anaphora and ellipsis. In ellipsis constructions, c-command of the ellipsis site by the antecedent is the exception rather than the rule. In the next chapter, I will thus develop a type logical version of a Jacobson-style treatment of anaphora that avoids these problems.
4. Summary
Categorial grammars generally employ a version of the Curry-Howard correspondence for meaning assembly. This entails a variable-free conception of the syntax-semantics interface. In other words, under the Categorial perspective on meaning composition, the grammar does not include variable binding operations. Furthermore, both Lambek Categorial Grammars and most versions of Combinatory Categorial Grammar are in a sense subsystems of Linear Logic. This means that every meaning of a lexical item that occurs in a complex construction must be used exactly once in the composition of the complex meaning. This seems to be at odds with the empirical facts in connection with anaphora phenomena like pronominal anaphora and ellipsis. By definition, anaphora involves the re-use of semantic resources. If the overall variable-free design is to be maintained, there are two basic strategies to accommodate anaphora into the general picture:

1. Anaphora is triggered by certain lexical items, and the recycling of semantic resources is due to the interpretation of these lexical items. Accordingly, anaphoric lexical items have semantic representations where a λ-operator binds more than one variable occurrence. This strategy is pursued among others by Szabolcsi, 1989; Szabolcsi, 1992 in a Combinatory and by Moortgat, 1996a, and Morrill, 2000 in a Type Logical setting.

2. Anaphora resolution is handled in syntax. This means that the grammar contains operations specifically designed for this purpose. Lexical items typically have Linear meaning representations here, while the grammatical operations go beyond the resource management of Linear Logic. This approach was first explored by Mark Hepple (Hepple, 1990; Hepple, 1992), who uses a version of Type Logical Grammar. Drawing on his insights, Pauline Jacobson reformulated this idea within the framework of CCG (Jacobson, 1999; Jacobson, 2000).

Mixed approaches are possible. Jacobson, for instance, follows the general Categorial consensus in treating coordination ellipsis in the lexicon while pronominal anaphora is dealt with in syntax.
All mentioned approaches, including the second group, assume that anaphora is somehow lexically triggered. So they will not easily lend themselves to an analysis of ellipsis phenomena like stripping that are apparently not lexically triggered. (84)
Most people want to be millionaires, but not John.
The landscape of Categorial approaches to anaphora is schematically summarized in Table 2.1 on the next page.
Author      Locus of resource      Lexical entry of he                               Non-standard
            multiplication                                                           operations
-------------------------------------------------------------------------------------------------
Szabolcsi   lexicon                λxyz.y(xz)z : (s/np)\((np\s)/s)\(np\s)            none
Moortgat    lexicon                λxy.xyy : q(np, np\s, np\s)                       none
Morrill     lexicon                λxyz.y(xz)y : (((s ↑ np) ↑2 s) ↓2 (s ↑ np))/(np\s)   none
Hepple      syntax                 λx.x : np/ np                                     BIR
Jacobson    syntax                 λx.x : np|np                                      Z, G

Table 2.1. Categorial approaches to anaphora
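The two kinds of pronoun meanings contrasted in Table 2.1 can be replayed as plain λ-terms. The verb meaning `shave` and the constant `john` below are invented toy meanings; only the terms λxy.xyy and λx.x come from the table.

```python
# A lexical duplicator (the λxy.xyy style of entry): the pronoun's own
# term consumes a relation and uses the remaining argument twice.
dup_pronoun = lambda x: lambda y: x(y)(y)

# The identity-map entry (Hepple, Jacobson): no duplication in the
# lexicon; re-use of the antecedent is the job of syntax (BIR, Z, G).
id_pronoun = lambda x: x

john = "john"
shave = lambda obj: lambda subj: f"shave({subj},{obj})"

print(dup_pronoun(shave)(john))   # shave(john,john): bound reading at once
print(id_pronoun(john))           # john: resolution is mere application
```

The tradeoff discussed below is visible already here: the first entry does all the work itself, while the second leaves the duplication to a non-standard grammatical operation.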
There is an obvious tradeoff between a fairly complex lexicon in the first three approaches and a complication of the grammatical machinery in the last two. Given that Jacobson gives empirical arguments (a) that the meaning of a pronoun is in fact the identity function and (b) that something like her Z-operation is also needed in the absence of anaphora (for instance in functional questions), the second class of approaches seems to be superior. Furthermore, they do without a lexical ambiguity between bound, coreferential and free pronouns, while this complication is virtually inevitable in the first group of theories. On the other hand, both Hepple and Jacobson assume some version of a c-command constraint on anaphora resolution. As argued above, this is empirically inadequate in certain cases of pronominal anaphora, and it practically blocks the extrapolation of their anaphora machinery to ellipsis. Furthermore, Hepple’s system displays certain unpleasant formal properties (like the failure of Cut elimination), and the (infinite!) collection of combinators needed to make Jacobson’s system work seems to be ad hoc. In the following chapter, I will develop a simple extension of the Lambek calculus which enables us to derive all relevant instances of Jacobson’s Combinatory approach as theorems. The system is proof-theoretically well-behaved, and it is straightforwardly applicable to several kinds of ellipsis phenomena in natural language.
Chapter 3 LAMBEK CALCULUS WITH LIMITED CONTRACTION
1. The Agenda
After having reviewed the main Categorial approaches to anaphora from the literature, in this chapter I will develop a new proposal. My aim is to extend the Lambek style core of Type Logical Grammar in such a way that a comprehensive treatment of anaphora phenomena becomes possible. The discussion from the previous chapter leads to the following agenda:

Resource multiplication should be done in syntax (as in Hepple’s 1992 and Jacobson’s 1999, 2000 systems) rather than in the lexicon.

There are three main reasons for taking this decision:
1. doing resource multiplication in the lexicon means we have to stipulate an ambiguity between bound and free pronouns;
2. binding-in-syntax lends itself more naturally to an extension to the discourse level than binding-in-lexicon; and
3. Jacobson supplies convincing empirical evidence that the meaning of a pronoun is in fact the identity function.

This leads to the second desideratum:

The meaning of a pronoun should come out as the identity function on individuals.

The general topic of the present investigation is the analysis of anaphora in TLG, hence:

The analysis should be formulated in an extension of the Lambek calculus.
I thus want to improve on Hepple’s proposal:

The system should be proof-theoretically well-behaved, i.e., the logic should enjoy Cut elimination, decidability, the subformula property and the finite reading property. Furthermore, there should be a natural Curry-Howard correspondence as syntax-semantics interface.

Finally, I want my analysis to incorporate the insights from Morrill’s (2000) system. Neither the structural positions of anaphors nor the positions of antecedents should be limited in an empirically unjustified way:

The anaphora resolution mechanism should do without a c-command restriction.

The latter point is certainly controversial among linguists, and I will discuss the empirical aspects of this decision at length in the next chapter.
2. Contraction?
Under a type logical perspective, doing resource multiplication in syntax means that the logic of grammatical composition derives Curry-Howard terms where one λ-operator binds more than one occurrence of a variable. According to the Curry-Howard correspondences of substructural logics, this amounts to the assumption that the structural rule of Contraction is part of the logic of grammar in one way or another. The canonical version of this rule is repeated here for convenience:

    X, x : A, y : A, Y ⇒ M : B
    ---------------------------- C
    X, x : A, Y ⇒ M[x/y] : B
Looking at this rule under a bottom-up proof search perspective, it says that antecedent formulae can be multiplied at will. It is easy to see that the proof search space becomes infinite as soon as we incorporate this rule, since we can apply Contraction to the premise of this rule again (still in the bottom-up direction) etc. and thus run into an infinite regress. While logics using Contraction might still be decidable (Intuitionistic Logic and some versions of Relevant Logic are), we nevertheless lose the finite reading property.¹

¹ The simplest illustration of this point is the identity theorem f : A → A ⇒ M : A → A, where the Curry-Howard term M can be any λx.fⁿx for arbitrary n in Intuitionistic or Relevant Logic.

Such a logic would thus be a priori too
powerful as a logic of grammatical composition. Contraction thus has to be limited in a suitable way to avoid this collapse. One may try to do this by employing multimodal techniques along the lines of Hepple’s (1990) work. I will pursue another strategy though, which is inspired by Jacobson’s work. I will extend the Lambek calculus with a third version of implication, and I will compile a limited version of Contraction directly into the logical rules for this new connective. This will allow us to keep the power of Contraction under strict logical control.
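The point of footnote 1 can be made concrete: under full Contraction the single identity sequent f : A → A ⇒ M : A → A admits the proof term λx.fⁿx for every n, so one sequent has infinitely many readings. The helper `nth_reading` and the sample meaning for f below are invented for illustration.

```python
def nth_reading(f, n):
    """The Curry-Howard term λx.fⁿx: apply f to the argument n times."""
    def term(x):
        for _ in range(n):
            x = f(x)
        return x
    return term

f = lambda k: k + 1                      # a sample meaning for the A -> A resource
readings = [nth_reading(f, n) for n in range(4)]
print([m(0) for m in readings])          # [0, 1, 2, 3]: one distinct reading per n
```

Every `nth_reading(f, n)` has the same type A → A, which is exactly why the finite reading property fails once Contraction is unrestricted.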
3. The Logic LLC

3.1 Vocabulary
In this section I will introduce a conservative extension of L called Lambek Calculus with Limited Contraction (abbreviated LLC), where a limited version of the structural rule of Contraction is compiled into the logical rules of a logical connective. Starting from the Lambek calculus L, I extend the inventory of category forming connectives by a third kind of implication (written as |). So the set of categories F over a collection of atomic categories A is given by
Definition 37 (Categories)
    F ::= A, F\F, F • F, F/F, F|F

As in Jacobson’s system, the vertical slash creates categories of anaphoric items. A sign has category A|B iff it needs an antecedent of category B and, provided it finds one, behaves like an item of category A. Pronouns will thus come out as np|np. As in L, the product is interpreted as Cartesian product and the implications as function space formation. So the category-to-type correspondence for LLC is given by

Definition 38 (Category-to-type correspondence)
    τ(A • B) = τ(A) ∧ τ(B)
    τ(A\B) = τ(B/A) = τ(B|A) = ⟨τ(A), τ(B)⟩
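Definitions 37 and 38 can be sketched as a small recursive datatype. The class names `At` and `Cat` and the string encoding of the connectives are invented for illustration, and atomic semantic types are represented simply by the category’s name rather than by genuine semantic domains.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class At:
    """Atomic category, e.g. np, n, s."""
    name: str

@dataclass(frozen=True)
class Cat:
    """Complex category built with one of \\, *, /, | (Definition 37)."""
    conn: str          # one of "\\", "*", "/", "|"
    left: object
    right: object

def tau(c):
    """Definition 38: tau(A * B) = tau(A) and tau(B);
    tau(A\\B) = tau(B/A) = tau(B|A) = <tau(A), tau(B)> (argument first)."""
    if isinstance(c, At):
        return c.name
    if c.conn == "*":                      # product: pair type
        return ("and", tau(c.left), tau(c.right))
    if c.conn == "\\":                     # A\B: argument on the left
        return (tau(c.left), tau(c.right))
    return (tau(c.right), tau(c.left))     # B/A and B|A: argument on the right

np, s = At("np"), At("s")
print(tau(Cat("|", np, np)))                          # ('np', 'np'): a pronoun's type
print(tau(Cat("/", s, np)) == tau(Cat("\\", np, s)))  # True: tau(s/np) = tau(np\s)
```

As the last line shows, all three implications collapse to the same semantic type, which is what lets the identity function serve as a pronoun meaning of type np|np.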
3.2 Sequent Presentation
The Gentzen style sequent formulation extends the corresponding presentation of L by a left rule and a right rule for the new implication slash. It is given in Figure 3.1 on the following page. Let us have a closer look at the two new rules. If the left premise of the rule of use |L is instantiated with an identity axiom, we obtain the
    x : A ⇒ x : A   (id)

    X ⇒ M : A    Y, x : A, Z ⇒ N : B
    ---------------------------------- Cut
    Y, X, Z ⇒ N[M/x] : B

    X ⇒ M : A    Y ⇒ N : B
    ------------------------- •R
    X, Y ⇒ ⟨M, N⟩ : A • B

    X, x : A, y : B, Y ⇒ M : C
    ------------------------------------------- •L
    X, z : A • B, Y ⇒ M[(z)0/x][(z)1/y] : C

    X, x : A ⇒ M : B
    ------------------- /R
    X ⇒ λxM : B/A

    X ⇒ M : A    Y, x : B, Z ⇒ N : C
    ---------------------------------- /L
    Y, y : B/A, X, Z ⇒ N[(yM)/x] : C

    x : A, X ⇒ M : B
    ------------------- \R
    X ⇒ λxM : A\B

    X ⇒ M : A    Y, x : B, Z ⇒ N : C
    ----------------------------------- \L
    Y, X, y : A\B, Z ⇒ N[(yM)/x] : C

    X, x1 : A1, Y1, ..., xn : An, Yn ⇒ M : B
    ---------------------------------------------------------------------------- |R (n > 0)
    X, y1 : A1|C, Y1, ..., yn : An|C, Yn ⇒ λz.M[(y1 z)/x1]···[(yn z)/xn] : B|C

    Y ⇒ M : B    X, x : B, Z, y : A, W ⇒ N : C
    --------------------------------------------- |L
    X, Y, Z, z : A|B, W ⇒ N[M/x][(zM)/y] : C

Figure 3.1. Labeled sequent presentation of LLC
simplified formulation below (from which the original formulation can be recovered via Cut):

    X, x : B, Z, y : A, W ⇒ N : C
    ------------------------------------------
    X, x : B, Z, z : A|B, W ⇒ N[(zx)/y] : C

Intuitively this rule says: if an anaphoric resource of category A|B is preceded by an antecedent of category B, it may be resolved and thus be replaced by a resource of category A. The meaning of the resolved anaphor is obtained by applying the meaning of the unresolved anaphor to the
meaning of its antecedent. (Typically, the meaning of the anaphor is just the identity function, so the resolved meaning of the anaphor winds up being identical to the meaning of the antecedent in these cases.) Note that the metavariable Z ranges over sequences of categories, including the empty sequence, so antecedent and anaphor may or may not be adjacent. The same intuition is possibly expressed more transparently by the two axioms below, which are jointly equivalent to the sequent rule above (i.e., extending L with the two axioms has the same effect as extending L with |L):

    x : A, y : B|A ⇒ ⟨x, yx⟩ : A • B
    x : A, y : B, z : C|A ⇒ ⟨x, y, zx⟩ : A • B • C

The Curry-Howard labeling of |L reveals that this operation corresponds to three Intuitionistic operations: 1. Contraction (because M occurs twice in the proof term of the succedent), 2. the rule of Modus Ponens (corresponding to function application of z to one copy of M), and 3. Cut (corresponding to replacing x by M and y by zM). In fact, if all three implications of LLC are mapped to the Intuitionistic implication and the product of LLC to Intuitionistic conjunction, |L becomes a derivable rule of Intuitionistic Logic, as can be seen from the natural deduction derivation in Figure 3.2. This translation into Intuitionistic Logic justifies the Curry-Howard labeling used here.

Figure 3.2. Intuitionistic derivation of |L
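The rule of use can also be sketched operationally: an anaphoric item of category A|B looks leftwards for an antecedent of category B and is replaced by an A-resource whose meaning is the anaphor’s meaning applied to the antecedent’s meaning. The list encoding of sequents and the function name `resolve_anaphor` below are invented; categories are plain strings, so the sketch only handles an unnested anaphoric category like "np|np".

```python
def resolve_anaphor(items, i):
    """Apply the simplified |L at position i, scanning left for the
    needed antecedent (the intervening material Z may be non-empty)."""
    term, cat = items[i]
    result_cat, needed = cat.split("|")       # A|B -> A, B
    for j in range(i - 1, -1, -1):
        m, c = items[j]
        if c == needed:
            resolved = list(items)
            resolved[i] = (term(m), result_cat)   # meaning: term applied to m
            return resolved
    raise ValueError("no antecedent of category " + needed)

# "John sleeps ... him": the pronoun's meaning is the identity function,
# so resolution just copies the antecedent's meaning.
seq = [("john'", "np"), ("sleep'", "np\\s"), (lambda x: x, "np|np")]
print(resolve_anaphor(seq, 2)[2])   # ("john'", 'np')
```

The leftward scan mirrors the fact that in |L the antecedent precedes the anaphor, while Z between them may be any (possibly empty) sequence.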
The rule of proof |R expresses the insights that anaphora slots can percolate up inside larger constituents, and that they can be merged. The first fact is relevant for instance in answers to functional questions; we want to be able to assign the phrase (1b) the category pp|np: (1)
a. In which town is every Englishman happy?
b. In his hometown.

This percolation mechanism can be covered by two axioms and one inference rule. An anaphora slot can percolate up to a superconstituent from either of its subconstituents. This corresponds to the two axioms

    x : A|C, y : B ⇒ λz.⟨xz, y⟩ : (A • B)|C
    x : A, y : B|C ⇒ λz.⟨x, yz⟩ : (A • B)|C

Furthermore, anaphora slots are preserved under unary derivations. The corresponding inference rule is

    x : A ⇒ M : B
    --------------------------------
    y : A|C ⇒ λz.M[(yz)/x] : B|C

Second, a syntagma may contain several anaphoric expressions that are understood as being co-anaphoric (i.e., depending on the same antecedent) even though no antecedent is present. A case in point is the adjective local, which arguably has category (n/n)|np (i.e., it requires an np-antecedent to be an attributive adjective). In (2b) the two occurrences of local are preferably interpreted as co-anaphoric: (2)
a. What happened in three cities last year? b. The local press accused the local politicians of corruption.
The axiom covering this merge operation is

    x : A|C, y : B|C ⇒ λz.⟨xz, yz⟩ : (A • B)|C

These axioms and rules taken together are equivalent to the rule of proof |R given in Figure 3.1 on page 122.² As for the rule of use, the labeling is justified by the fact that the rule is Intuitionistically derivable (cf. Figure 3.3 on the next page).
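The percolation and merge axioms are just pairing combinators, and can be written down directly. The meanings `local_press` and `local_pols` below are invented stand-ins for the two anaphoric occurrences of local in (2b); only the three combinators come from the axioms.

```python
# x : A|C, y : B  =>  lambda z. <x z, y>  : (A * B)|C
perc_left  = lambda x, y: lambda z: (x(z), y)
# x : A, y : B|C  =>  lambda z. <x, y z>  : (A * B)|C
perc_right = lambda x, y: lambda z: (x, y(z))
# x : A|C, y : B|C  =>  lambda z. <x z, y z>  : (A * B)|C  (slot merge)
merge      = lambda x, y: lambda z: (x(z), y(z))

local_press = lambda city: f"press-of({city})"
local_pols  = lambda city: f"politicians-of({city})"

# Both occurrences of `local` end up depending on one shared antecedent:
coanaphoric = merge(local_press, local_pols)
print(coanaphoric("Bielefeld"))
# ('press-of(Bielefeld)', 'politicians-of(Bielefeld)')
```

The merge case is the crucial one: two unresolved slots of the same category C collapse into a single slot, which is exactly the co-anaphoric reading of (2b).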
² In Jäger, 2001 I used a rule of proof that is slightly stronger than the one given here. The present version is a generalization of a proposal from Glyn Morrill (p.c.). His rule is the special case of the one given here for n = 1.

Figure 3.3. Intuitionistic derivation of |R

3.3 Cut Elimination

Despite the fact that they incorporate Contraction, the two logical rules for the new implication have the subformula property, just as all logical rules of L. Hence the premise of any inference rule has a lower complexity than its conclusion, and bottom-up proof search reduces complexity. Again this does not hold for the Cut rule. As in L, however, Cut is admissible in the Cut-free sequent presentation of LLC.
Theorem 9 (Cut Elimination) If ⊢LLC X ⇒ A, then there is a Cut-free sequent proof of X ⇒ A.

Sketch of proof: The proof is essentially identical to Lambek’s Cut elimination proof for L (as sketched from page 51 onwards in Chapter 1), except for the fact that we have two more cases to consider for principal Cut. These are the configurations where the Cut formula is the active formula in both premises. Since the rule |R also introduces anaphora-implications on the left hand side of the sequent, there are two new configurations for principal Cut: the left premise of the Cut is a conclusion of |L and the right premise is a conclusion of |R, or both premises are conclusions of |R. The Cut elimination steps for these configurations are schematically given in Figures 3.4 and 3.5 on the following page. (For the second configuration it is assumed that 1 ≤ i ≤ n.) In either case, the principal Cut is replaced by a Cut of lower degree. Lambek’s Cut elimination algorithm is thus guaranteed to terminate.

Because every rule in the Cut-free sequent presentation of LLC has the subformula property, the bottom-up proof search space is finite. As in L, this leads to the following consequences:
Theorem 10 (Decidability) Derivability in LLC is decidable. Proof: Identical to the corresponding proof for L.
Figure 3.4. Principal Cut for |, first configuration

Figure 3.5. Principal Cut for |, second configuration
Corollary 2 (Finite reading property) For a given unlabeled LLC-sequent, there are at most finitely many Curry-Howard labelings. Proof: Identical to the corresponding proof for L.
3.4 Natural Deduction Presentation
During the discussion of L we saw that the sequent system is indispensable since it guarantees decidability, but for practical purposes it is rather awkward. A presentation in natural deduction format is better suited for concrete derivations. Besides, it bears an appealing resemblance to the tree format linguists are used to. I start with a sequent style presentation of the natural deduction system (Figure 3.6 on the next page). Besides the identity rule and the Cut rule (which are identical to the corresponding rules in the sequent system), we have an introduction rule and an elimination rule for each connective. The rules for the Lambek connectives are identical to the corresponding rules in the natural deduction presentation of L. Additionally, we have an introduction rule and an elimination rule for the anaphora slash. The |-introduction rule is a combination of the rule of proof for | in the sequent system and Cut, and thus requires no further elaboration. The elimination rule is straightforwardly derivable from the rule of use from the sequent system (and vice versa). The derivations are given in Figure 3.7 on the following page and 3.8 on page 129.
Cut elimination for sequent style natural deduction. As in the sequent system, Cut is an admissible rule in the natural deduction system in tree format. Here Cut elimination does not even affect the Curry-Howard term of the proof.

Theorem 11 (Cut Elimination) If ⊢LLC X ⇒ M : A, then there is a Cut-free natural deduction proof of X ⇒ M : A.

Proof: The proof follows the same strategy as the corresponding proof for the sequent system. There are two notable differences. First, the degree of a Cut application is measured by the complexity of the Curry-Howard term of the conclusion of the Cut. This guarantees that every Cut elimination step reduces the degree of the Cut, also if Cut is permuted with an elimination rule. Second, since there are no left rules in the natural deduction calculus, the configuration for principal Cut never arises. Since the elimination of a principal Cut is the only configuration where Cut elimination leads to a change in the Curry-Howard term, Cut elimination in the natural deduction calculus preserves Curry-Howard term assignment.
    x : A ⇒ x : A   (id)

    X ⇒ M : A    Y, x : A, Z ⇒ N : B
    ---------------------------------- Cut
    Y, X, Z ⇒ N[M/x] : B

    X ⇒ M : A    Y ⇒ N : B
    ------------------------- •I
    X, Y ⇒ ⟨M, N⟩ : A • B

    X ⇒ M : A • B    Y, x : A, y : B, Z ⇒ N : C
    ---------------------------------------------- •E
    Y, X, Z ⇒ N[(M)0/x][(M)1/y] : C

    X, x : A ⇒ M : B
    ------------------- /I
    X ⇒ λxM : B/A

    X ⇒ M : A/B    Y ⇒ N : B
    --------------------------- /E
    X, Y ⇒ MN : A

    x : A, X ⇒ M : B
    ------------------- \I
    X ⇒ λxM : A\B

    X ⇒ M : A    Y ⇒ N : A\B
    --------------------------- \E
    X, Y ⇒ NM : B

    for 1 ≤ i ≤ n: Zi ⇒ Ni : Ai|C    X, x1 : A1, Y1, ..., xn : An, Yn ⇒ M : B
    --------------------------------------------------------------------------- |I
    X, Z1, Y1, ..., Zn, Yn ⇒ λz.M[(Ni z)/xi] : B|C

    X ⇒ M : A    Y ⇒ N : B|A    Z, x : A, W, y : B, U ⇒ O : C
    ------------------------------------------------------------ |E
    Z, X, W, Y, U ⇒ O[M/x][(NM)/y] : C

Figure 3.6. (Labeled) Natural Deduction presentation of LLC
1. X ⇒ A                  (premise)
2. Z, A, W, B, U ⇒ C      (premise)
3. Z, X, W, B|A, U ⇒ C    (|L: 1, 2)
4. Y ⇒ B|A                (premise)
5. Z, X, W, Y, U ⇒ C      (Cut: 4, 3)

Figure 3.7. Derivation |L ⇝ |E
1. Y ⇒ B                  (premise)
2. A|B ⇒ A|B              (id)
3. X, B, Z, A, W ⇒ C      (premise)
4. X, Y, Z, A|B, W ⇒ C    (|E: 1, 2, 3)

Figure 3.8. Derivation |E ⇝ |L
3.4.1 Natural Deduction in Tree Format

Natural deduction proofs are more concisely presented in tree format. To give a natural representation for |-elimination, this format has to be extended somewhat in comparison to the tree format for L. Strictly speaking, a natural deduction proof tree is not necessarily a tree, but rather a sequence of finite directed acyclic graphs (DAGs) with labeled nodes. I will continue to use the term “proof tree” nevertheless. As in conventional syntax trees, the nodes in a proof tree are partially ordered by two relations, immediate dominance D and precedence <. In contradistinction to ordinary trees, one node may not only immediately dominate several nodes, but it may also be immediately dominated by several nodes. I call the transitive closure of the immediate dominance relation simply dominance. Due to the fact that a node can be dominated by several nodes, I will avoid the kinship terminology that is common when talking about trees (since this would entail that a node may have arbitrarily many parent nodes). Rather, if a immediately dominates b, I will say that a is a premise of b and b is a conclusion of a. If a and b are premises of the same conclusion, they are co-premises (and there is an analogous notion of co-conclusions). If a precedes b, a is a predecessor of b and b a successor of a. Finally, a sequence of proof trees is again a proof tree. So a proof tree need not be connected by the transitive and symmetric closure of the dominance relation.

The class of graphs underlying proof trees obeys a series of axioms that are straightforward extensions of the corresponding axioms for trees. (As notational conventions, I write “D+” for the transitive closure and “D*” for the reflexive and transitive closure of the relation “D”.)

1. Dominance is irreflexive.
   ¬∃x. xD+x
2. Precedence is irreflexive and transitive.
   ¬∃x. x < x
   ∀xyz. x < y ∧ y < z → x < z
3. Both dominance and its inverse are disjoint from both precedence and its inverse.
   ¬∃xy. (xD+y ∨ yD+x) ∧ (x < y ∨ y < x)
4. Any two distinct nodes are either related by dominance or by precedence.
   ∀xy. x = y ∨ xD+y ∨ yD+x ∨ x < y ∨ y < x
5. Precedence is inherited from premises to conclusions and vice versa.
   ∀xyzw. xDy ∧ zDw ∧ x < z → (yD*w ∨ wD*y ∨ y < w)
   ∀xyzw. yDx ∧ wDz ∧ x < z → (yD*w ∨ wD*y ∨ y < w)

The nodes in a proof tree are labeled with a rule name, a Curry-Howard term, and an LLC-category. (In the sequel, I will sometimes omit the rule name when referring to a label of a node.) Nodes may furthermore optionally be marked by indices and by an overline. Overlined nodes are called discharged. A node in a proof tree that is not dominated by any other node is called a premise of that proof tree. It follows from the above axioms that the undischarged premises of a proof tree are linearly ordered by the precedence relation. Likewise, a node that does not dominate any other node is called a conclusion of this proof tree, and the set of conclusions forms a linear sequence as well.

The set of proof trees is defined recursively by the subsequent rules. I tacitly assume that the sets of Curry-Howard variables used in distinct proof trees are mutually disjoint.

1. (id) Every node labeled by x : A; id is a proof tree (where A is a category and x a variable with type τ(A)).

2. (Cut) If α is a proof tree with M1 : A1, ..., Mn : An as its conclusions, and β1, ..., βk are proof trees with X, x1 : A1; id, ..., xn : An; id, Y as their undischarged premises, then α + β1 + ··· + βn is a proof tree as well, where α + β1 + ··· + βn is the result of replacing every occurrence of xi in βj (for 1 ≤ i ≤ n, 1 ≤ j ≤ k) by Mi, followed by a merging of the graphs by identifying nodes with identical labels.
3. (/I) If α is a proof tree with the single conclusion M : A and the sequence of undischarged premises X, x : B (where X is nonempty), then α′ is a proof tree as well, where α′ is the result of replacing x : B by its overlined (discharged) counterpart and extending the resulting graph by a node λxM : A/B; /I with M : A as its only premise.

4. (/E) If α is a proof tree with the conclusion sequence X, M : A/B, N : B, Y, then α′ is a proof tree as well, where α′ is the result of extending α with a new node MN : A; /E with M : A/B and N : B as its only premises.

5. (\I) If α is a proof tree with the single conclusion M : A and the sequence of undischarged premises x : B, X (where X is nonempty), then α′ is a proof tree as well, where α′ is the result of replacing x : B by its overlined (discharged) counterpart and extending the resulting graph by a node λxM : B\A; \I with M : A as its only premise.

6. (\E) If α is a proof tree with the conclusion sequence X, M : A, N : A\B, Y, then α′ is a proof tree as well, where α′ is the result of extending α with a new node NM : B; \E with M : A and N : A\B as its only premises.

7. (•I) If α is a proof tree with the conclusion sequence X, M : A, N : B, Y, then α′ is a proof tree as well, where α′ is the result of extending α with a new node ⟨M, N⟩ : A • B; •I with M : A and N : B as its only premises.

8. (•E) If α is a proof tree with the conclusion sequence X, M : A • B, Y, then α′ is a proof tree as well, where α′ is the result of extending α with two nodes, (M)0 : A; •E and (M)1 : B; •E, each of which has M : A • B as its only premise, such that the first new node precedes the second one.

9. (|I) Let α be a proof tree with the conclusion sequence X, M1 : A1|B, Y1, ..., Mn : An|B, Yn, and β a proof tree with X′, x1 : A1, Y1′, ..., xn : An, Yn′ as sequence of undischarged premises (where X′, Yi′ are like X, Yi except that all formulae are labeled with variables) and N : C as single conclusion. Then γ is a proof tree as well, where γ is the result of 1. replacing all occurrences of xi in β with Mi y, 2. replacing all occurrences of variables occurring in X′, Yi′ by the corresponding terms from X, Yi, 3. merging the two graphs by identifying all nodes with identical labels and having each Mi : Ai|B immediately dominate Mi y : Ai, and 4. extending the resulting graph by a new node λyN : C|B; |I with N : C as its only premise.
10 (|E) If α is a proof tree with the conclusion sequence X, M : A, Y, N : B|A, Z, then α′ is a proof tree as well, where α′ is the result of replacing M : A by [M : A]i (i being a fresh index) and extending the resulting graph with a new node [N M : B]i; |E with N : B|A as its only premise.

11 If α and β are proof trees, then the sequence α, β is a proof tree as well, where every node in α precedes every node in β.

12 Nothing else is a proof tree.

A sequent X ⇒ M : A is derivable by natural deduction in tree format iff there is a proof tree α that has X as its sequence of undischarged assumptions and M : A as its sole conclusion. As in the other two proof formats discussed so far, Cut is redundant and can be eliminated:
Theorem 12 (Cut elimination for natural deduction) If ⊢LLC X ⇒ M : A, then there is a Cut-free natural deduction proof in tree format for X ⇒ M : A.

Proof: The proof is essentially identical to the corresponding proof for sequent style natural deduction. If the second input to a Cut rule is an identity axiom, the output of Cut is identical to the first input, and the Cut can be eliminated. If the second input is itself the output of a rule R, we apply Cut to the input of this rule first and apply R to the output of Cut. Since no rule except Cut increases the set of undischarged premises, this permutation is always possible. Identifying the degree of a Cut with the complexity of the Curry-Howard term of its conclusion, every Cut elimination step reduces the degree of the Cut, and the algorithm is thus bound to terminate.

Both natural deduction calculi derive the same set of sequents. The proof is somewhat blurred by the fact that intermediate steps in the tree format may have multiple conclusions. So we again need the notion of product closure from Definition 26 on page 55, this time applied to labeled formulae:
Definition 39 (Labeled product closure)
1 σ(M : A) = M : A
2 σ(X, M : A) = ⟨N, M⟩ : B • A, where σ(X) = N : B.
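As a quick illustration, the product closure is a left fold over the sequence: labels are paired up and categories are multiplied, both left-associatively. The following Python sketch uses an encoding of my own (tuples for labels and categories), not the book's notation:

```python
# Sketch of Definition 39: fold a non-empty sequence of labeled formulae
# x1 : A1, ..., xn : An into a single labeled product formula whose label is
# the nested pair <...<x1, x2>, ..., xn> and whose category is
# (...(A1 • A2) • ...) • An.

def product_closure(seq):
    """seq: non-empty list of (label, category) pairs; returns one such pair."""
    label, cat = seq[0]
    for lbl, c in seq[1:]:
        label = ("pair", label, lbl)   # Curry-Howard label: pairing
        cat = ("prod", cat, c)         # category: product formula A • B
    return label, cat
```

A one-element sequence is returned unchanged, matching clause 1 of the definition.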
Theorem 13 (Equivalence of sequent and tree format) X ⇒ M : A is derivable in sequent style natural deduction iff there is
a proof tree with X as sequence of undischarged assumptions and Y as sequence of conclusions such that M : A is αβη-equivalent to σ(Y).

Proof: Both directions are proved by induction over derivations. I start with the only-if direction. The base case is the identity axiom, where the claim is obviously true. The rules for /I, /E, \I, \E and •I in the two formats are hardly more than notational variants of each other, and the induction steps for them are trivial. The result of a •E step in tree format can be obtained in sequent format by combining (sequent style counterparts of) the premise of the rule with the theorem x : A, y : B ⇒ ⟨x, y⟩ : A • B via •E. As for |I, suppose the induction hypothesis holds. This means that Γ ⇒ σ(X, M1 : A1|B, Y1, . . . , Mn : An|B, Yn) (where Γ is the sequence of undischarged assumptions of α), and X′, x1 : A1, Y′1, . . . , xn : An, Y′n ⇒ N : C. From the latter sequent we can derive X′, z1 : A1|B, Y′1, . . . , zn : An|B, Y′n ⇒ λyN[zi y/xi] : C|B via the sequent rule |I and n instances of the identity axiom. From this and the former sequent we can derive Γ ⇒ λyN′ : C|B via Cut and possibly several applications of •E, where N′ is a term from which N[Mi y/xi] can be obtained via a sequence of β-reduction steps. Now consider |E. By induction hypothesis we assume that Γ ⇒ σ(X, M : A, Y, N : B|A, Z), and we have to show that then Γ ⇒ O : C as well, where O : C is αβη-equivalent to σ(X, M : A, Y, N M : B, Z). Suppose the premise holds. It is easy to see that X′, x : A, Y′, y : B|A, Z′ ⇒ σ(X′, x : A, Y′, yx : B, Z′) is derivable, using one |E-step and a series of product introductions. From that and the other assumption, we derive the conclusion via Cut and possibly a sequence of product eliminations. Finally, the induction step for the sequencing rule corresponds to •I in the sequent format. Now we turn to the if-direction. Again, the induction base is obvious, and the induction steps for /I, /E, \I, \E and •I are trivial. Consider •E.
By induction hypothesis, there is a proof tree leading from the undischarged premise sequence X to the conclusion M : A • B. (If the witness for an induction hypothesis has multiple conclusions, we can always turn it into a single-conclusion tree by applying •I.) Applying one •E step leads to a proof tree with the conclusions (M)0 : A and (M)1 : B. The second induction hypothesis states that there is a proof tree leading from the premises Y, x : A, y : B, Z to the conclusion N : C. These two proof trees can be combined via Cut into a new proof tree that leads from the premises Y, X, Z to the conclusion N : C. Now consider |I. By induction hypothesis, there are proof trees leading from the premises Zi to the conclusions Ni : Ai|C, and there is a proof tree leading from X, x1 : A1, Y1, . . . , xn : An, Yn to the conclusion M : B. The former set of proof trees can be combined with the sequences of identity axioms X′, Y′i (which are like X, Yi except that they only use variables as Curry-Howard labels) into a proof tree with the premises X′, Z1, Y′1, . . . , Zn, Y′n and the conclusions X′, N1 : A1|C, Y′1, . . . , Nn : An|C, Y′n via the sequencing rule. Combining this proof tree with the proof tree corresponding to the other premise of the sequent rule via |I, we obtain a proof tree leading from the premises X, Z1, Y1, . . . , Zn, Yn to the conclusion λzM[Ni z/xi] : B|C. Finally we turn to rule |E. By induction hypothesis, there is a proof tree α leading from X to M : A, a proof tree β leading from Y to N : B|A, and a proof tree γ leading from Z, x : A, W, y : B, U to O : C. α and β can be combined with a series of identity axioms into a proof tree leading from Z, X, W, Y, U to Z, M : A, W, N : B|A, U via the sequencing rule. From there we obtain a proof tree with the same premises and the conclusion sequence Z, M : A, W, N M : B, U by applying |E once. This proof tree can be combined with γ via Cut, leading to a proof tree with the premises Z, X, W, Y, U and the conclusion O[M/x][N M/y] : C.

The graphical representation of proof trees that I used in Chapter 2 is easier to work with (but possibly somewhat ambiguous; therefore I gave the precise definition above). The rules are schematically given in Figure 3.9 on the facing page. The rules for the Lambek connectives are identical to the rules in the natural deduction presentation of L from Chapter 2. Here we (again) coindex discharged assumptions with the point in the derivation where they are discharged. It should be noted that the graphical representation may sometimes be misleading, since it might convey the impression that the inference rules are declarative graph admissibility conditions rather than procedural rules for graph construction.
The pseudo-proof tree in Figure 3.10 on page 136 may serve to illustrate this point. While the structure seems to conform to the rules from Figure 3.9 on the facing page, it is actually impossible to prove that this is a proof tree, because the definition of \I requires that it operates on a single-conclusion proof tree (therefore it can only be applied to a graph that excludes the node z : C|A), while the definition of |E requires that x : A and z : C|A are both conclusions of the same proof tree. These conditions cannot be fulfilled simultaneously, and the graph is thus not a proof tree. Accordingly, the ‘proven’ sequent y : B, z : C|A ⇒ ⟨λx⟨x, y⟩, zx⟩ : (A\(A • B)) • C is not derivable. In addition to better readability, the tree format for natural deduction has two advantages over the sequent format. First, proof trees are ambiguous with regard to the derivation history. It might occur that
[Figure 3.9. Natural deduction in tree format: schematic rules for \I, \E, /I, /E, •I, •E, |I and |E]
several sequences of rule application lead to the same proof tree. A simple case in point is the following instance of the associativity law: A • B, C • D ⇒ A • (B • C) • D
[Figure 3.10. An illicit Natural Deduction derivation]
There are two natural deduction sequent proofs for this sequent, since there is a choice whether the first or the second product is eliminated first. They are given in Figure 3.11 and 3.12 respectively. However, this proof ambiguity is spurious since both proofs produce the same Curry-Howard term, namely ⟨⟨(x)0, ⟨(x)1, (y)0⟩⟩, (y)1⟩. When working with natural deduction in tree format, both rule serializations lead to the same graph (given in Figure 3.13 on the next page).
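Reading A • (B • C) • D as (A • (B • C)) • D, the shared Curry-Howard term merely reshuffles the projections of the two input pairs. A tiny Python check (my own encoding: Python tuples for pairs, indexing for the projections) makes this concrete:

```python
# Inhabitants of A • B and C • D, encoded as Python pairs.
x = ("a", "b")
y = ("c", "d")

# The Curry-Howard term <<(x)0, <(x)1, (y)0>>, (y)1> of type (A • (B • C)) • D.
term = ((x[0], (x[1], y[0])), y[1])
```

Whichever product is eliminated first, the resulting value is the same nested pair.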
[Figure 3.11. First sequent-style proof of A • B, C • D ⇒ A • (B • C) • D]
[Figure 3.12. Second sequent-style proof of A • B, C • D ⇒ A • (B • C) • D, with the two products eliminated in the opposite order]
Quite generally, proof trees do not represent serializations of subproofs that do not affect each other, i.e., that are essentially computed in parallel. This is reminiscent of the tree format for context free derivations
[Figure 3.13. The common proof tree for A • B, C • D ⇒ A • (B • C) • D in tree format]
which skips over inessential aspects of the serialization of derivation steps. The other advantage of the tree format lies in the fact that there is a tight correspondence between the structure of proofs and the syntactic structure of the accompanying Curry-Howard terms. An inspection of the inference rules reveals that, with one exception, the proof term M is a direct subterm of the proof term N if and only if the node labeled with M is either a premise of the node labeled with N , or M is coindexed with N due to an application of |E. There is just one exception to this generalization: if a proof term M x has been constructed by means of the upper part of |I, the variable x does not correspond to any node in the proof tree.
3.4.2 Proof Normalization

Similarly to the natural deduction presentation of Intuitionistic Logic (≈ full typed λ-calculus), we can define a notion of proof normalization over proof trees in LLC. This is a procedure that removes redundant steps from proofs, i.e., sequences of proof steps where a connective is introduced and eliminated immediately afterwards or vice versa. Such detours can occur with any of the four connectives. I start with the Lambek implications. A proof where a / is introduced and eliminated in the next step leads to a Curry-Howard term that contains a β-redex. Such a proof can be replaced by a simpler proof of the same formula that avoids these two steps. On the level of proof terms, this transformation corresponds to one β-reduction step, and thus this operation is called β-normalization. It is schematically given in Figure 3.14. A similar redundancy arises if a / is eliminated and re-introduced in the next step. This leads to an η-redex in the term, and eliminating this detour is called η-normalization. It is given in Figure 3.15 on the following page.
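On the term level, the two detours are ordinary β- and η-redexes. The following Python sketch (an encoding of λ-terms of my own, not part of the book's calculus) implements the two reduction steps; it assumes bound variable names are already distinct, so no capture-avoiding renaming is needed:

```python
# Terms: ("var", x) | ("lam", x, body) | ("app", f, a)

def subst(t, x, s):
    """t[s/x]; assumes bound names do not clash with free names of s."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, s))
    return ("app", subst(t[1], x, s), subst(t[2], x, s))

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def step(t):
    """One beta or eta step, leftmost-outermost; None if t is normal."""
    if t[0] == "app":
        if t[1][0] == "lam":                          # beta: (λx.M)N → M[N/x]
            return subst(t[1][2], t[1][1], t[2])
        for i in (1, 2):
            r = step(t[i])
            if r is not None:
                return ("app", r, t[2]) if i == 1 else ("app", t[1], r)
    if t[0] == "lam":
        x, body = t[1], t[2]
        if (body[0] == "app" and body[2] == ("var", x)
                and x not in free_vars(body[1])):     # eta: λx.Mx → M
            return body[1]
        r = step(body)
        if r is not None:
            return ("lam", x, r)
    return None

def normalize(t):
    while True:
        r = step(t)
        if r is None:
            return t
        t = r
```

β-normalization contracts (λx.M)N to M[N/x]; η-normalization contracts λx.Mx to M whenever x does not occur free in M.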
[Figure 3.14. β-normalization for /]
[Figure 3.15. η-normalization for /]
Corresponding patterns arise for \ as well. They are the perfect mirror images of the ones given above and are therefore omitted. An instance of •I immediately followed by •E leads to a β-normalization configuration too, and the inverse pattern to an η-redex. The reduction schemes are given in Figures 3.16 and 3.17 on the facing page.
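On terms, the two product reductions contract a projection applied to a pair, and re-collapse the pair formed from a term's two projections. A minimal sketch, using an encoding of my own (("pair", a, b) for ⟨a, b⟩ and ("proj", i, m) for (m)i):

```python
def reduce_product(t):
    """One product-reduction step at the root of t, or None if none applies."""
    if t[0] == "proj" and t[2][0] == "pair":
        return t[2][1 + t[1]]          # beta for •: a projection of a pair
    if (t[0] == "pair"
            and t[1][0] == "proj" and t[1][1] == 0
            and t[2][0] == "proj" and t[2][1] == 1
            and t[1][2] == t[2][2]):
        return t[1][2]                 # eta for •: pair of both projections of M
    return None
```

The first clause is the β-scheme of Figure 3.16, the second the η-scheme of Figure 3.17.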
[Figure 3.16. β-normalization for •]
The β-normalization pattern for | is somewhat more complex. Again, such a pattern can arise if an introduction step is immediately followed by an elimination step. However, since |I involves not just λ-abstraction but also n instances of application on the level of terms, the |E in the redex configuration is replaced by n instances of |E after reduction (cf. Figure 3.19 on page 140).
[Figure 3.17. η-normalization for •]
Due to the fact that |I involves application on the level of terms as well as λ-abstraction, a combination of two instances of |I may also lead to a β-redex. This is illustrated in Figure 3.20 on page 141. Finally, an η-normalization configuration for | can occur if the two components of |I are not separated by a real deduction (in other words, if the second input to |I is an identity axiom). The configuration is given in Figure 3.18.
[Figure 3.18. η-normalization for |]
Due to the absence of Contraction as a structural rule in LLC, it is easy to establish that there are no infinite sequences of normalization steps starting from a given proof.
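The termination argument in Theorem 14 below rests on a simple decreasing measure: the number of introduction steps in a proof, mirrored on terms by the number of λ-abstractions and pairings. A toy check in Python (my own term encoding; the linearity enforced by the absence of Contraction is what keeps the count from growing under reduction):

```python
def intro_count(t):
    """Number of introductions (λ-abstractions and pairings) in a term."""
    kind = t[0]
    if kind == "var":
        return 0
    if kind == "lam":
        return 1 + intro_count(t[2])
    if kind == "pair":
        return 1 + intro_count(t[1]) + intro_count(t[2])
    if kind == "app":
        return intro_count(t[1]) + intro_count(t[2])
    if kind == "proj":
        return intro_count(t[2])
    raise ValueError(kind)

# A linear beta-redex and its reduct: (λx.x)<y, z> reduces to <y, z>,
# and the measure drops from 2 to 1.
redex = ("app", ("lam", "x", ("var", "x")),
         ("pair", ("var", "y"), ("var", "z")))
reduct = ("pair", ("var", "y"), ("var", "z"))
```

Since the measure is a non-negative integer and each step strictly decreases it, no infinite reduction sequence exists.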
Theorem 14 (Strong normalization) Normalization always terminates after finitely many steps. Proof: Each normalization configuration given above reduces the number of instances of introduction rules in a proof (while it may increase the number of elimination rules). Since this parameter must be a nonnegative integer, there cannot be infinite normalization sequences. Furthermore, every proof tree has a unique normal form. To establish this, we have to modify the Curry-Howard labeling slightly. In the modified calculus, both participants in a |E-step are marked on the level of terms. The antecedent of the anaphora resolution becomes the argument of a fresh variable with an identity type. So if M in the rule below
[Figure 3.19. β-normalization for |, first configuration]
has type τ(A), x has type ⟨τ(A), τ(A)⟩. The modified rule is given in Figure 3.21 on the facing page. The reason for this move is the following: With the original labeling, it might occur that a proof contains a redex configuration but normalization cannot be applied because one of the intermediate steps that would disappear by normalization is used as antecedent for |E. Therefore, according to the original labeling, there are proofs that are in normal form but that have Curry-Howard labels that are not in normal form. (Following standard terminology, I say that a proof is in normal form if no normalization step can be applied to it.) With the modified calculus, this possibility is excluded. This modified labeling simplifies the proof of the following lemma.
Lemma 2 If an LLC-proof tree is in normal form, all Curry-Howard terms occurring in it are in αβη-normal form. Proof: We prove this by contraposition. Suppose a proof tree contains a Curry-Howard term that contains a redex configuration M . M cannot be a variable, and all other subterms occurring in an LLC-proof are labels of some node in that proof. So M is the label of some node. M
[Figure 3.20. β-normalization for |, second configuration]
[Figure 3.21. Modified rule |E]
may either be a β-redex or an η-redex. Suppose it is a β-redex. We have to distinguish several cases. 1 M = (λxN )O, and this term is the result of an application of /E. Then the premises of this node are λxN : A/B and O : B, and the former is in turn the conclusion of a node N : A; /I. This is thus a β-normalization configuration for /. 2 M = (λxN )O, and this term is the result of an application of \E. Here the same argument applies as above. 3 M = (λxN )O : A, and this term is the result of an application of |E. This entails that M is coindexed with a preceding node O : B, and that it is the sole conclusion of a node λxN : A|B.
This node in turn must be the sole conclusion of a term N : A due to |I. This is thus an instance of the first configuration for β-reduction for |.

4 M = (λxN)y : A, and it is coindexed with a |I-node λyO that it dominates. This is thus an instance of the second configuration of β-reduction for |.

5 M = (⟨N, O⟩)0 : A. Then M must be the left conclusion of a node ⟨N, O⟩ : A • B due to •E, the right conclusion of this node is (⟨N, O⟩)1 : B, and the premise itself is the conclusion of the two premises N : A and O : B due to •I. This is thus a β-normalization configuration for •.

6 M = (⟨N, O⟩)1 : A. Likewise.

Now suppose M is an η-redex. Again we have to distinguish several cases. (I tacitly assume below that x does not occur in N.)

1 M = λx.Nx : A/B. Then M must be the conclusion of Nx : A due to /I, which in turn is the conclusion of N : A/B and x : B due to /E. This is an η-normalization configuration for /.

2 M = λx.Nx : A\B. Likewise.

3 M = λx.Nx : A|B. Then M is the conclusion of Nx due to |I, i, and this in turn is the conclusion of N : A|B. Since |I can only be applied to proof trees with a single conclusion, Nx : A neither precedes nor is preceded by any other node within this proof tree. Therefore it cannot be an operand of |E, and therefore the application of N to x is due to the upper part of |I. This is an η-normalization configuration for |.

4 M = ⟨(N)0, (N)1⟩. This means that M is the conclusion of the two nodes (N)0 and (N)1 via •I, which in turn are the co-conclusions of N via •E. This is an η-normalization configuration for •.

So whenever a proof tree contains a Curry-Howard label in non-normal form, it is not in normal form itself. By contraposition, normal form proofs only produce normal form labels. As with the original Curry-Howard labeling, there is a direct correspondence between the structure of a proof and the syntactic structure of its Curry-Howard term.
Lemma 3 If two proof trees Π1 and Π2 have the same sequence of undischarged premise types and the same sequence of conclusion types, and if the sequences of the Curry-Howard terms of their conclusions are identical up to uniform renaming of variables (free or bound), then Π1 and Π2 are isomorphic. Proof: By induction over the complexity of Curry-Howard terms.
From these lemmas we can immediately conclude that normal forms for LLC-proofs are unique.
Theorem 15 (Normal form theorem) Every LLC-proof has a unique normal form up to α-equivalence of Curry-Howard terms.

Proof: An inspection of the normalization configurations given above shows that every proof normalization step is accompanied by at least one βη-reduction step in one of the Curry-Howard terms of one of its conclusions. So proof normalization preserves Curry-Howard labels up to αβη-equivalence. From this and Lemma 2 we can conclude that for any LLC-proof Π and any normal form Π′ obtained from Π via proof normalization, the Curry-Howard terms of the conclusions of Π′ are the normal forms of the Curry-Howard terms of the conclusions of Π. Due to Lemma 3, any normal form Π′′ of Π must thus be isomorphic to Π′, and due to the normalization theorem for the typed λ-calculus, the Curry-Howard terms of Π′ and Π′′ are identical up to alphabetic variance. The Curry-Howard labeling of an LLC-proof according to the original calculus can be obtained from the modified labels by replacing subterms of the form (x⟨A,A⟩ MA) by MA. Since this operation is functional, normal forms for proofs according to the original calculus are unique as well (even though their Curry-Howard labels need not be in normal form). From this we can infer the corollary
Corollary 3 (Church-Rosser property) Proof normalization is confluent. Proof: Immediately from the normal form theorem.
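Confluence can be observed concretely on a toy term with two independent product redexes: contracting them in either order yields the same normal form. A small Python sketch (my own encoding: ("pair", a, b) for pairs, ("proj", i, m) for projections):

```python
def step_at(t, path):
    """Contract the projection redex reached by following path (child indices)."""
    if not path:
        assert t[0] == "proj" and t[2][0] == "pair"
        return t[2][1 + t[1]]          # a projection of a pair reduces
    kids = list(t)
    kids[1 + path[0]] = step_at(t[1 + path[0]], path[1:])
    return tuple(kids)

# Two disjoint redexes inside one pair:
t = ("pair",
     ("proj", 0, ("pair", "a", "b")),
     ("proj", 1, ("pair", "c", "d")))

left_first = step_at(step_at(t, [0]), [1])
right_first = step_at(step_at(t, [1]), [0])
```

Both reduction orders converge on the pair of "a" and "d", as the Church-Rosser property predicts.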
Let us summarize so far and see how far we got with the agenda from the beginning of the chapter. LLC is an extension of the Lambek calculus, and it admits resource multiplication in syntax. This can be seen from the fact that both |E and |I can lead to Curry-Howard terms where one
λ-operator binds several variable occurrences, or where the label of an antecedent formula of a sequent occurs several times in the succedent. Thus far LLC is similar to Hepple’s system. However, LLC has several advantages at least from a proof theorist’s point of view. It has a sequent presentation and a natural deduction presentation that both admit Cut elimination. The sequent system enjoys the subformula property, and thence the system is decidable and has the finite-reading property. The natural deduction system has a natural connection to the structure of the accompanying Curry-Howard terms, and we can define a well-behaved notion of proof normalization that is strongly normalizing and confluent.
3.4.3 Model Theory

The anaphora slash is a hyperintensional operator. Even if two categories A and B are interderivable, C|A and C|B need not be interderivable. For instance, it holds in LLC that

A • (B • C) ⇔ (A • B) • C

but nevertheless

⊬LLC C|(A • (B • C)) ⇒ C|((A • B) • C)

and

⊬LLC C|((A • B) • C) ⇒ C|(A • (B • C))

This makes it difficult to develop a sound and complete model theory for LLC along the lines of the semantics for L that was described in Chapter 1. There, the intension of a category, i.e., the set of points that verify it, is compositionally derived from the intensions of its immediate components. If validity is defined as preservation of truth in all models and the interpretation is sound and complete, two interderivable categories are verified by exactly the same points. Therefore they have the same intension and should be interchangeable in more complex categories salva veritate. Before I start to develop a hyperintensional semantics for LLC, I will briefly motivate this unusual feature. Conceptually, A|B is a substructural version of the constructive implication B → A. Implications are downward monotonic in their first argument and upward monotonic in the second:

from A ⇒ B infer B → C ⇒ A → C
from A ⇒ B infer C → A ⇒ C → B
The second inference scheme does hold for the anaphora slash:
from A ⇒ B infer A|C ⇒ B|C

Suppose we tried to add the first scheme to LLC as well. With Curry-Howard labeling, we would get

from x : A ⇒ M(x) : B infer y : C|B ⇒ λz.y(M(z)) : C|A

Imagine there are two non-equivalent proofs for A ⇒ B:

x : A ⇒ M1(x) : B
x : A ⇒ M2(x) : B
with M1 ≠ M2. With the rules of LLC plus the above monotonicity rule, we could derive

x : A, y : C|B, z : C|B ⇒ ⟨x, λw.y(M1(w)), λu.z(M2(u))⟩ : A • C|A • C|A

Applying slash elimination twice gives us

x : A, y : C|B, z : C|B ⇒ ⟨x, y(M1(x)), z(M2(x))⟩ : A • C • C

Linguistically speaking, this means that the resource x : A, which is structurally ambiguous between the readings M1 and M2 if conceived as a B, can be used as an antecedent for the first anaphor in one reading and as antecedent for the second anaphor in the second reading. Admitting Monotonicity thus predicts that a string like the boy who knows old men and women can simultaneously antecede an anaphoric pronoun referring to the boy who knows women and old men, and another pronoun that refers to the boy that knows old men and old women. Natural language anaphora does not work like this, so the second argument of the anaphora slash cannot be downward monotonic. It cannot be upward monotonic either, because then we could not interpret it as a constructive implication. But why not admit a rule like the following?

(3) from x : A ⇒ M(x) : B and y : B ⇒ N(y) : A infer z : C|A ⇒ λw.z(N(M(N(w)))) : C|B
Suppose we augment LLC with the conjunction ⊓, which has the same Curry-Howard labeling as the product but is interpreted as set intersection in relational models. The following two labeled sequents come out as theorems:

x : A ⊓ A ⇒ ⟨(x)0, (x)0⟩ : A ⊓ A
x : A ⊓ A ⇒ ⟨(x)1, (x)1⟩ : A ⊓ A

Together with the rule (3), we can derive

x : B|(A ⊓ A) ⇒ λy.x⟨(y)0, (y)0⟩ : B|(A ⊓ A)
x : B|(A ⊓ A) ⇒ λy.x⟨(y)1, (y)1⟩ : B|(A ⊓ A)

This in turn makes the following derivable:

x : A ⊓ A, y : B|(A ⊓ A), z : B|(A ⊓ A) ⇒ ⟨x, y⟨(x)0, (x)0⟩, z⟨(x)1, (x)1⟩⟩ : (A ⊓ A) • B • B

Here too the very same resource, x : A ⊓ A, is used as antecedent for two anaphors in two incompatible readings. Thus a rule like (3) is undesirable. It is nonetheless possible to give a sound and complete model-theoretic interpretation for LLC.
Definition 40 (Model for LLC) A model M for LLC is a tuple ⟨W, R, S, ∼, f, g⟩ where W is a non-empty set, R, S ⊆ W³ are ternary relations on W, ⟨W, R⟩ form an associative frame, ∼ ⊆ W² is a binary relation on W, f is a function from atomic categories to subsets of W, and g is a function from LLC-categories to W, and the following conditions hold:
1 ∀xyzwu. Rxyz ∧ Szwu → ∃v. Sxvu ∧ Rvyw
2 ∀xyzwu. Rxyz ∧ Sywu → ∃v. Sxvu ∧ Rvwz
3 ∀xyzwuv. Rxyz ∧ Sywu ∧ Szvu → ∃r. Sxru ∧ Rrwv
4 ∀xyzwu. Rxyz ∧ Szwu ∧ y ∼ u → Rxyw
5 ∀A∀w. w ∈ ⟦A⟧M → w ∼ g(A)

The interpretation of a category A under M depends both on its value under ⟦·⟧M (which is a compositional extension of f to all categories, as usual) and on its value under g. It is not required that g is defined recursively over the syntactic structure of its arguments. Therefore two interderivable categories may have different interpretations due to g. This leads to hyperintensionality as desired. The verification relation between points in W and categories is defined as follows:
Definition 41 (Interpretation for L) Let B be a set of basic categories and M = ⟨W, R, S, ∼, f, g⟩ a model for L.

⟦p⟧M = f(p) if p ∈ B
⟦A • B⟧M = {x | ∃y ∈ ⟦A⟧M ∃z ∈ ⟦B⟧M. Rxyz}
⟦A\B⟧M = {x | ∀y ∈ ⟦A⟧M ∀z. Rzyx → z ∈ ⟦B⟧M}
⟦A/B⟧M = {x | ∀y ∈ ⟦B⟧M ∀z. Rzxy → z ∈ ⟦A⟧M}
⟦A|B⟧M = {x | ∃y ∈ ⟦A⟧M. Sxy g(B)}
⟦X, Y⟧M = {x | ∃y ∈ ⟦X⟧M ∃z ∈ ⟦Y⟧M. Rxyz}
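Definition 41 can be turned directly into an evaluator over finite models. The sketch below uses a hypothetical toy encoding of my own (not from the book): R and S as sets of triples, f as a dict from atoms to sets of points, g as a dict from category terms to designated points.

```python
def interp(cat, W, R, S, f, g):
    """Compute the set of points verifying cat in a finite LLC-model."""
    k = cat[0]
    if k == "atom":
        return f[cat[1]]
    if k == "prod":                                    # A • B
        A, B = interp(cat[1], W, R, S, f, g), interp(cat[2], W, R, S, f, g)
        return {x for x in W if any((x, y, z) in R for y in A for z in B)}
    if k == "under":                                   # A\B
        A, B = interp(cat[1], W, R, S, f, g), interp(cat[2], W, R, S, f, g)
        return {x for x in W
                if all(z in B for y in A for z in W if (z, y, x) in R)}
    if k == "over":                                    # A/B
        A, B = interp(cat[1], W, R, S, f, g), interp(cat[2], W, R, S, f, g)
        return {x for x in W
                if all(z in A for y in B for z in W if (z, x, y) in R)}
    if k == "bar":                                     # A|B
        A = interp(cat[1], W, R, S, f, g)
        return {x for x in W if any((x, y, g[cat[2]]) in S for y in A)}
    raise ValueError(k)
```

For instance, in a toy model with W = {0, 1, 2} and R = {(2, 0, 1)}, the product of two atoms verified by 0 and 1 is verified exactly by point 2.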
Some comments: The ternary relation R is to be thought of as ordinary syntactic composition. Rxyz means that y and z, if they occur adjacently in that order, can be composed to x. The second relation, S, is similar, except that it models composition via anaphora resolution. Sxyz means that x can be transformed into y provided something which is similar to z is available as antecedent for anaphora resolution. The best intuitive approximation for x ∼ y is that x is similar to y, but note that this similarity relation need not be reflexive, symmetric or transitive. The first two postulates require that anaphoric slots are passed on from smaller to larger items. The third postulate expresses that several anaphoric slots can be merged without actually resolving them. The fourth postulate covers anaphora resolution. It basically says that the left component of a complex sign is a suitable antecedent for the right component if the latter is anaphoric.
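The fourth postulate is easy to check mechanically on a finite model. The following sketch (toy relations of my own invention) tests ∀xyzwu. Rxyz ∧ Szwu ∧ y ∼ u → Rxyw:

```python
def satisfies_postulate4(R, S, sim):
    """Check: whenever Rxyz, Szwu and y ~ u hold, Rxyw must hold as well."""
    for (x, y, z) in R:
        for (z2, w, u) in S:
            if z2 == z and (y, u) in sim and (x, y, w) not in R:
                return False
    return True
```

A model violating the postulate can be repaired by adding the missing resolution triple to R.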
The last postulate relates the two dimensions of interpretation. g(A) can be thought of as the designated representative of the category A. The postulate says that all points that verify a category must be similar to the representative of that category. The definition of validity is entirely standard. Before I start with the soundness proof, I give an axiomatic presentation of LLC, which makes the soundness proof easier.
Definition 42 (Axiomatic version of LLC) The axiomatic version of LLC is the system that is obtained if the following five axioms and rules are added to the axiomatic version of L that was presented on page 55 in Chapter 1.

A • B|C → (A • B)|C
A|B • C → (A • C)|B
A|C • B|C → (A • B)|C
A • B|A → A • B

from A → B infer A|C → B|C
Lemma 4 (Equivalence axiomatic/sequent presentation) ⊢LLC X ⇒ A iff the arrow σ(X) → A is derivable in the axiomatic version.

Proof: For all n ≥ 1, all instances of the following arrow are derivable in the axiomatic system:

σ(X, A1|B, Y1, . . . , An|B, Yn) → σ(X, A1, Y1, . . . , An, Yn)|B

This can be shown by induction over the length of the sequence X, A1|B, Y1, . . . , An|B, Yn. If its length is 1, we are dealing with an identity axiom. Suppose the sequence consists of more than one formula, and suppose first that X is non-empty. Then obviously X ⇒ σ(X) is derivable, and σ(A1|B, Y1, . . . , An|B, Yn) → σ(A1, Y1, . . . , An, Yn)|B is derivable by induction hypothesis. Via monotonicity of the product, the second axiom above and possibly associativity, we get
σ(X, A1|B, Y1, . . . , An|B, Yn) → σ(X) • σ(A1, Y1, . . . , An, Yn)|B → σ(X, A1, Y1, . . . , An, Yn)|B

Suppose X is empty and n = 1. Then σ(A1|B, Y1) = A1|B • σ(Y1), and via monotonicity of the product, associativity and the second axiom, we get

σ(A1|B, Y1) → σ(A1, Y1)|B

Finally, suppose X is empty and n > 1. Then, by induction hypothesis,

σ(A2|B, Y2, . . . , An|B, Yn) → σ(A2, Y2, . . . , An, Yn)|B

Via the identity axiom, associativity and monotonicity of the product, and the third axiom, this gives us

σ(A1|B, Y1, . . . , An|B, Yn) → σ(A1, Y1, . . . , An, Yn)|B

Now we can turn to the actual proof of the lemma. It is obvious that the axioms above are derivable in the sequent system, and the monotonicity rule is a direct consequence of |R. Hence the if-direction is straightforward. We prove the only-if direction via induction over sequent derivations. The lemma obviously holds for identity sequents. For the rules of L, the induction step was established in the proof of Theorem 6 on page 55 in Chapter 1. Suppose the following sequent is derivable in the sequent system:

X, A1, Y1, . . . , An, Yn ⇒ C

Then the following arrow is derivable in the axiomatic system:

σ(X, A1, Y1, . . . , An, Yn) → C

Due to the monotonicity rule above, this gives us

σ(X, A1, Y1, . . . , An, Yn)|B → C|B

Together with the result we just proved and Cut, we have

σ(X, A1|B, Y1, . . . , An|B, Yn) → C|B
So if the lemma holds for the premise of an application of |R in the sequent system, it also holds for the conclusion. It remains to be shown that the truth of the lemma is preserved by |L. As we just showed, it is derivable in the axiomatic system that σ(Z, A|B) → σ(Z, A)|B Together with the fourth axiom plus monotonicity and associativity of the product, we get σ(B, Z, A|B) → σ(B, Z, A) Suppose it is derivable in the sequent system that Y ⇒B and X, B, Z, A, W ⇒ C Then via induction hypothesis, the previous result, Cut, and associativity and monotonicity of the product, we get σ(X, Y, Z, A|B, W ) → σ(C)
This completes the proof.
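For reference, the operation σ used throughout this proof simply folds a non-empty sequence of categories into their product; associativity of the product makes the bracketing immaterial. A minimal sketch (the datatype and names below are my own encoding, not the book's notation):

```python
# Categories of LLC as a tiny algebraic datatype; sigma folds a
# non-empty sequence of categories into nested products.
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class Cat:
    pass

@dataclass(frozen=True)
class Atom(Cat):
    name: str

@dataclass(frozen=True)
class Prod(Cat):        # A . B (the product)
    left: Cat
    right: Cat

def sigma(cats):
    """Fold [A1, ..., An] into (...(A1 . A2) . ...) . An."""
    return reduce(Prod, cats)
```

A single category is left unchanged, so σ(C) = C, as the proof above tacitly uses.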
Theorem 16 (Soundness) For each LLC-sequent X ⇒ A, if ⊢LLC X ⇒ A, then for all models M, ⟦X⟧M ⊆ ⟦A⟧M.

Proof: I will prove soundness of the axiomatic version. Together with the previous lemma, this gives soundness of the sequent system as well. The soundness of the axioms and rules of the axiomatic version of L carries over from Theorem 7 on page 58, since each model for LLC is based on an associative frame. There is a straightforward correspondence between the postulates in the definition of LLC-models and the axioms in the axiomatic presentation. We start with the first axiom:

A • B|C → (A • B)|C
Lambek Calculus with Limited Contraction
151
Suppose x ∈ ⟦A • B|C⟧ (for some model M, which I leave implicit henceforth). Then there are y ∈ ⟦A⟧ and z ∈ ⟦B|C⟧ such that Rxyz. Furthermore, there is a w ∈ ⟦B⟧ such that Szwg(C). Due to the first postulate, there is a v such that Sxvg(C) and Rvyw. Hence v ∈ ⟦A • B⟧, and thus x ∈ ⟦(A • B)|C⟧.

A|B • C → (A • C)|B

Suppose x ∈ ⟦A|B • C⟧. Then there are y ∈ ⟦A|B⟧ and z ∈ ⟦C⟧ such that Rxyz, and thus there is a w ∈ ⟦A⟧ such that Sywg(B). The second postulate entails that there is a v such that Sxvg(B) and Rvwz. Hence v ∈ ⟦A • C⟧, and thus x ∈ ⟦(A • C)|B⟧.

A|C • B|C → (A • B)|C

Suppose x ∈ ⟦A|C • B|C⟧. Then there are y ∈ ⟦A|C⟧ and z ∈ ⟦B|C⟧ such that Rxyz. Therefore there are w ∈ ⟦A⟧ and v ∈ ⟦B⟧ such that Sywg(C) and Szvg(C). According to the third postulate, there is an r with Sxrg(C) and Rrwv. Hence r ∈ ⟦A • B⟧, and thus x ∈ ⟦(A • B)|C⟧.

A • B|A → A • B

Suppose x ∈ ⟦A • B|A⟧. Then there are y ∈ ⟦A⟧ and z ∈ ⟦B|A⟧ with Rxyz. Therefore there is a w ∈ ⟦B⟧ with Szwg(A). According to the fifth postulate, y ∼ g(A), and thus Rxyw due to the fourth postulate. Hence x ∈ ⟦A • B⟧.

A → B
—————————
A|C → B|C

Finally, suppose that ⟦A⟧M ⊆ ⟦B⟧M in all models M, and suppose that x ∈ ⟦A|C⟧. Then there is a y ∈ ⟦A⟧ such that Sxyg(C). By assumption, y ∈ ⟦B⟧, and hence x ∈ ⟦B|C⟧.

The completeness proof also follows closely the analogous proof for L.
Theorem 17 (Completeness) For all sequents X ⇒ A, if for all LLC-models M, ⟦X⟧M ⊆ ⟦A⟧M, then ⊢LLC X ⇒ A.
Proof: We start with the construction of a canonical model. The set W is simply the set of LLC-categories. For all atomic categories p,

f(p) = {A | A ⇒ p}

The relation R is defined as

RABC iff A ⇒ B • C

Likewise, S is defined as

SABC iff A ⇒ B|C

Finally, A ∼ B iff A ⇒ B, and g(A) = A for all categories A. The fact that ⟨W, R⟩ is an associative frame follows from the associativity of the product (see the completeness proof for L on page 58). It is straightforward to show that the first four additional postulates are fulfilled by this model:

1 If x ⇒ y • z and z ⇒ w|u, then x ⇒ y • w|u due to the monotonicity of the product, and thus x ⇒ (y • w)|u. So y • w has the properties that are required for v.

2 Likewise, w • z would be a witness for the required v for the second postulate.

3 For the third postulate, w • v is the required witness for r.

4 Suppose x ⇒ y • z and z ⇒ w|y. Due to monotonicity of the product, we have x ⇒ y • w|y, and with the fourth axiom and Cut this entails that x ⇒ y • w, hence Rxyw.

The fifth postulate requires that A ∈ ⟦B⟧ entails that A ⇒ B. We will show something stronger, namely
Lemma 5 (Truth lemma) In the canonical model CM, it holds for all categories A, B that

A ∈ ⟦B⟧CM iff A ⇒ B

We prove this by induction over the complexity of B. If B is atomic, the claim follows from the way CM is constructed. If the main connective of B is one of the three Lambek connectives, the proof of the induction step is identical to the corresponding step in the proof of Theorem 7 on page 58. It remains to be shown that the induction step also goes through for the anaphora slash.
→ Suppose that B = C|D, and suppose A ∈ ⟦C|D⟧. (Unless otherwise stated, interpretation is with respect to the canonical model.) This means that there is an E ∈ ⟦C⟧ such that SAEg(D). By the model construction and the induction hypothesis, it follows that E ⇒ C and A ⇒ E|D. Due to the monotonicity rule for the anaphora slash and Cut, it follows that A ⇒ C|D.

← Now suppose that A ⇒ C|D. By the construction of the model, we thus have SACD, and thus SACg(D). By induction hypothesis, C ∈ ⟦C⟧, and hence A ∈ ⟦C|D⟧.

This completes the proof of the truth lemma. The fifth postulate for LLC-models follows as a corollary, so the canonical model is in fact a model for LLC. Now suppose that ⊬LLC X ⇒ A. Since ⟦X⟧CM = ⟦σ(X)⟧CM, it follows from the truth lemma that σ(X) ∈ ⟦X⟧CM, but σ(X) ∉ ⟦A⟧CM. Hence ⟦X⟧CM ⊈ ⟦A⟧CM, and hence X ⇒ A is not valid. By contraposition, every valid sequent must be derivable. This completes the completeness proof.

Before we look at the linguistic applications of LLC, I will close the chapter with a brief review of the formal similarities and differences between Jacobson's system and LLC.
4. Relation to Jacobson’s System
The central innovation of Jacobson’s system is the combinator Z. There are four directional instances of Z (if we ignore wrapping, which Jacobson needs to analyze double object constructions properly). However, of these four axioms, only one is used in linguistic analyses, namely

x : (A\B)/C ⇒ λyz.x(yz)z : (A\B)/C|A

It is easy to show that this instance of Z is a theorem of LLC. The derivation is given in Figure 3.22 on the following page. Despite this kinship, there is a crucial conceptual difference between Z and |E. Z is essentially based on a notion like c-command: binding is an operation that connects argument places of an operator, and the argument structure hierarchy determines which argument place can bind which other argument place. |E, on the other hand, is purely precedence based. The only structural constraint on binding is the requirement that the binder precedes the bound element. This difference has certain empirical consequences when it comes to the linguistic applications.
[Figure 3.22. Derivation of Z in LLC]
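As a sanity check on the Curry-Howard term, the instance of Z just derived can be written out and evaluated. In this sketch (my encoding, not the book's notation: slash categories become curried functions, and the anaphora slash C|A becomes a function from the antecedent's meaning to a C-meaning):

```python
# x : (A\B)/C as a curried function c -> a -> b; y : C|A as a -> c.
# The term of Z, \y z. x (y z) z, then behaves like a meaning of
# category (A\B)/(C|A). The double occurrence of z is exactly the
# limited contraction that LLC licenses.
def z_comb(x):
    return lambda y: lambda z: x(y(z))(z)
```

For instance, with x a transitive-verb meaning and y a pronoun-containing complement, the single resource z feeds both the pronoun slot and the subject slot.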
Z roughly corresponds to |E, and there is a similar correspondence between the different instances of G in Jacobson’s system and |I in LLC. The two directional instances of G are repeated here for convenience.

X ⇒ M : A/B
———————————————————————— G>
X ⇒ λxy.M(xy) : A|C/B|C

X ⇒ M : B\A
———————————————————————— G<
X ⇒ λxy.M(xy) : B|C\A|C
In the presence of Cut, these inference rules are equivalent to the two axioms

z : A/B ⇒ λxy.z(xy) : A|C/B|C
z : B\A ⇒ λxy.z(xy) : B|C\A|C

They are both theorems of LLC, as the derivations in Figure 3.23 demonstrate.

[Figure 3.23. Derivation of G in LLC]
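Semantically, both G axioms amount to function composition. A sketch (my encoding of categories as function types, with B|C modeled as a function from C-meanings to B-meanings):

```python
# m : A/B (or B\A) as a function b -> a; composing it with
# x : B|C (a function c -> b) gives \y. m(x(y)) : A|C.
# The Geach combinator is thus just composition.
def g_comb(m):
    return lambda x: lambda y: m(x(y))
```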
The recursive part of Jacobson’s G, the inference rule given below, is obviously a special case of |I where n = 1. Thus all instances of G are in fact theorems of LLC.

x : A ⇒ M : B
———————————————————————————— G*
y : A|C ⇒ λz.M[(yz)/x] : B|C
This does not apply for the version of the Geach rule that I dubbed G|. Jacobson introduces it mainly to carry out her analysis of paycheck pronouns.

x : A|B ⇒ λyz.x(yz) : (A|C)|(B|C)    G|

This rule is not derivable in LLC, and adding it is not an option, since it destroys the finite reading property. This can be seen from the following sequent:

x : A|A, y : A|A ⇒ λz.⟨xz, y(xⁿz)⟩ : (A • A)|A

With LLC+G|, this labeled sequent is derivable for any non-negative value of n. To see why this is so, consider the derivation in Figure 3.24.

[Figure 3.24]
The two steps in brackets form a loop, since premise and conclusion have the same type. Thus they may be repeated arbitrarily many times, leading to an infinity of different proof terms for the same sequent. A more realistic example for the power of G| is the paycheck sentence in (4).

(4) Every decent man visits his father regularly, but John hasn’t seen him for years.
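The effect of the loop can be made concrete. Modeling A|A as a function from individuals to individuals (my encoding; the helper below is invented for illustration), each pass through the loop composes the anaphoric function x once more, so there is a distinct proof term ⟨xz, y(xⁿz)⟩ for every n:

```python
# One reading per n >= 0: the pair <x z, y (x^n z)>.
def readings(x, y, z, up_to):
    out = []
    v = z                        # v = x^n z, starting with n = 0
    for n in range(up_to):
        out.append((x(z), y(v)))
        v = x(v)                 # one more pass through the loop
    return out
```

With x instantiated as the father function, the successive resolutions are father, grandfather, great-grandfather, and so on.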
If we assume a Jacobsonian type assignment, LLC and G|, we predict (non-existent) readings where John hasn’t seen his grandfather, his great-grandfather etc., because we can always apply G| to him, resolve it with the father function, and repeat this loop. So if we want to reproduce Jacobson’s treatment of paycheck pronouns within LLC, we have to compile all relevant instances of G| into the lexicon. This comes down to the claim that every pronoun is lexically ambiguous between the categories np|np (denoting the identity function over individuals) and (np|np)|(np|np) (denoting the identity function over Skolem functions).3 This complication taken into account, Jacobson’s analyses of functional questions, sloppy inferences, the interaction of pronoun binding with right node raising, i-within-i effects, paycheck pronouns and Bach-Peters sentences can be reproduced using LLC and adopting Jacobson’s lexical entries. However, an LLC-based theory of binding will lead to different predictions pertaining to the structural constraints on binding (including a different account of Weak Crossover). Furthermore, LLC can be applied to a broad range of ellipsis phenomena. These empirical issues will be discussed in the subsequent chapters.
3 As Glyn Morrill (p.c.) points out, this raises the question why this ambiguity is never morphologically marked. At the present point I have to leave this issue open.
Chapter 4 PRONOUNS AND QUANTIFICATION
1. Basic Cases
In the previous chapter I introduced LLC, an extension of the associative Lambek calculus L that is formally capable of handling the kind of semantic resource multiplication that we observe in natural language in connection with anaphora. The formal tools thus being prepared, in this chapter I will start the discussion of linguistic phenomena. I begin with basic cases of coreference between a pronoun and a c-commanding proper name. The example (1) is a case in point.

(1) John said he walked.
I assume the obvious lexical entries for John, said, and walked. Furthermore I follow Jacobson in the assumption that pronouns have category np|np and denote the identity map λx.x.1 In the (only) normal form derivation for the reading where he is anaphorically related to John, the anaphoric link is established directly between the antecedent john’ : np and the pronoun. (Since Jacobson’s Z and G are theorems of LLC, it is also possible to reproduce a Jacobsonian analysis by first applying Z to said and G to walked, and combining the results via repeated Modus Ponens, but this derivation would not be in normal form. Normalizing this proof leads to the proof given below.) The natural deduction proof tree is given in Figure 4.1 on the next page. In addition, the sentence has a reading where the pronoun remains free (see Figure 4.2 on the following page). The category of the sentence 1 As alluded to at the end of the previous chapter, I also assume a paycheck assignment for each pronoun that usually plays no role in the analyses.
157
158
ANAPHORA AND TYPE LOGICAL GRAMMAR he λx.x : np|np said john
[john’ : np]i
lex
say’ : (np\s)/s
lex
john’ : np
lex |E, i
walk’ : np\s
walk’ john’ : s
say’(walk’ john’) : np\s
say’(walk’ john’)john’ : s Figure 4.1.
walked
lex \E
/E
\E
Derivation of the first reading of (1)
is then s|np (i.e., it is a sentence containing one free pronoun). Crucially, here we use |I rather than |E to fill the np-slot of the pronoun. As a consequence, the slot is not filled but inherited to the whole structure.

[Figure 4.2. Derivation of the second reading of (1)]

2. Binding by wh-operators
The second reading of the example above can be synonymous with the first one if the open slot for the pronoun is filled with the denotation of John by the context (i.e., John and he corefer accidentally). The more interesting cases are those where this possibility is excluded because the pronoun is bound by an operator. A simple instance of this configuration is binding by a relative pronoun as in (2).

(2) the man who said he walked
A subject relative pronoun like who has category (n\n)/(np\s). Thus its complement clause must be of category np\s. This amounts to saying that its complement forms a clause if preceded by an np. This hypothetical np can serve as antecedent of the pronoun via |E before it is discharged by \I. It is this interaction between hypothetical reasoning
and |E that is used to model all kinds of bound readings in the present account. The derivation for (2) is given in Figure 4.3.

[Figure 4.3. Derivation of (2)]

3. Binding by Quantifiers
Like wh-operators, quantifying expressions can bind pronouns. As we will see shortly, this is modeled by means of the same kind of interaction of hypothetical reasoning and |E as binding by wh-operators in LLC. Before we can discuss binding though, I present a brief discussion of the treatment of quantification in TLG in general.
3.1 Quantification in TLG: Moortgat’s in-situ Binder
When I touched on the issue of quantification in TLG in Chapter 1, I assigned quantifying expressions like somebody the type s/(np\s). This is appropriate for quantifiers in subject positions as in the example (3), as the reader can easily verify.

(3) Somebody left John.
This type assignment for somebody does not cover constructions where the quantifier occurs in object position, as in
(4) John left somebody.
Constructions like this can be handled if we assign quantifiers the additional type (s/np)\s. The meaning assignment (for instance λP.∃x.P x for somebody) can remain unchanged. This accusative quantifier type can be derived from the type np by means of backward lifting in L:

⊢L x : np ⇒ λy.yx : (s/np)\s

So conjoinability of accusative quantifiers with proper names is correctly predicted to be possible, cf. (5) and the corresponding derivation in Figure 4.4. (Here I assign the coordination particle "and" the polymorphic category (X\X)/X, which schematizes over all its instances where X is uniformly replaced by a Boolean category.)

(5)
John left Bill and most of the pets.

[Figure 4.4. Derivation of (5)]
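The backward lifting step used above has a one-line semantic counterpart (a sketch; the encoding of an np-meaning as a plain value and of s as a truth value is mine):

```python
# x : np  =>  \y. y x : (s/np)\s.  An individual is lifted to the
# generalized quantifier that feeds that individual to its scope.
def lift(x):
    return lambda y: y(x)
```

This is what makes lifted bill’ and most’pets’ share the category (s/np)\s, so that they can be conjoined as in (5).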
However, this two-way ambiguity only works for quantifiers that occur in a clause-peripheral position. Consider an example like (6), where the quantifier is the direct object in a double object construction.

(6) John left something to Susan.
To derive the intuitively correct reading for this sentence, we would need a third type assignment to something, namely (s/pp/np)\s/pp. Furthermore, this type assignment would come with the meaning λxy.∃z.yzx, so we have to resort to a genuine semantic ambiguity. Yet another lexical entry for something is needed when it is followed by an adverb, as in

(7) John left something yesterday.
The list of constructions, each of which requires another lexical entry for the same element, can be extended almost indefinitely. While the list is probably still finite, we obviously miss a generalization this way.
3.1.1 The Logic of q

Based on these considerations, Moortgat, 1990 proposed extending the logical apparatus of L in a way that allows a uniform treatment of quantification. I will use the somewhat more advanced version of his theory that he presented in Moortgat, 1996a.2 Derivations for quantificational structures that use either the subject type or the object type for quantifiers (s/(np\s) or (s/np)\s) all follow an inference pattern like

1 Suppose the sequent X, x : np, Y ⇒ M : s is derivable.
2 Let Q be the type of a quantifier.
3 Then X, y : Q, Y ⇒ y(λxM) : s is derivable as well.

If Q = s/(np\s), X must be empty, and conversely, if Q = (s/np)\s, then Y must be empty. Moortgat proposes two innovations:

1 One type for all quantifiers.
2 The inference scheme given above holds without restrictions on X or Y.

Getting down to the technical details, Moortgat extends the inventory of type forming connectives of L with a new three-place operator q.
Definition 43 If A, B, C are types, then q(A, B, C) is a type as well. τ(q(A, B, C)) = ⟨⟨τ(A), τ(B)⟩, τ(C)⟩.

The intuitive meaning of q(A, B, C) can be paraphrased as: replacing a premise of type A inside a structure of type B by a premise of type q(A, B, C) changes the type of the whole structure from B to C.
Generalized quantifiers in natural language receive type q(np, s, s), so they assume np-positions in the context of an s without changing the

2 To keep both Chapter 2 and the present chapter self-contained, I partially repeat the discussion of the in situ binding mechanism from pp. 81.
type of the local clause. The crucial aspect of this inference pattern for the purposes of quantification is the fact that the q(A, B, C)-premise takes semantic scope over the whole resulting C-structure. This is ensured by appropriate Curry-Howard labeling of the inference rules for q. As for the other logical connectives, there are logical rules for q in sequent format, in sequent style natural deduction, and in tree style natural deduction. The intuitive content is presumably most clearly conveyed in the sequent style natural deduction presentation. The elimination rule and the introduction rule for q are as given in Figure 4.5.

Y ⇒ M : q(A, B, C)    X, x : A, Z ⇒ N : B
————————————————————————————————————————— qE
X, Y, Z ⇒ M(λxN) : C

X ⇒ M : A
———————————————————————— qI
X ⇒ λy.yM : q(A, B, B)

Figure 4.5. Sequent style natural deduction rules for q
The elimination rule is a direct formalization of the intuitive content of q given above. Note that the resource x : A in the right premise is a hypothesis that gets bound by the operator M in the Curry-Howard term of the conclusion. Since this binding is achieved without movement operations, the q-operator is also called in situ binder. The introduction rule extrapolates the type lifting combinator for the Lambek style quantifier types to all quantifiers. It is easy to see that the proof of Cut elimination for sequent style natural deduction for LLC given in the previous chapter does extend to logics with these rules. So both L+q and LLC+q enjoy Cut elimination of their sequent style natural deduction calculi. To establish decidability and the finite reading property, we also need sequent rules for q. By some elementary transformations of the natural deduction rules, we obtain the sequent rules given in Figure 4.6.

X, x : A, Y ⇒ M : B    Z, y : C, W ⇒ N : D
—————————————————————————————————————————————— qL
Z, X, z : q(A, B, C), Y, W ⇒ N[z(λxM)/y] : D

X ⇒ M : A
———————————————————————— qR
X ⇒ λy.yM : q(A, B, B)

Figure 4.6. Sequent rules for q
Lambek’s Cut elimination algorithm extends to q without further ado. So we can immediately conclude that both L + q and LLC + q enjoy Cut
elimination in their sequent style presentation. Since the sequent rules above also have the subformula property, this leads to decidability and the finite reading property for both logics.
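On the semantic side, τ(q(A, B, C)) = ⟨⟨τ(A), τ(B)⟩, τ(C)⟩, so a q(np, s, s) meaning is an ordinary generalized quantifier, and qE applies it to the λ-abstract of its scope. A toy sketch (the domain and all names below are assumptions of this illustration, not the book's):

```python
# A q(np, s, s) meaning: a function from np-scopes (e -> t) to t.
DOMAIN = [1, 2, 3, 4, 5]

def everybody(p):
    return all(p(x) for x in DOMAIN)

# qE: the quantifier meaning M applied to \x. N
sentence = everybody(lambda x: x > 0)
```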
3.1.2 Natural Deduction for q

To conclude the introduction of the logic of in situ binding, I give natural deduction rules for q in tree format. To this end, I extend the definition from the previous chapter for LLC with the following rules.

Definition 44 (Natural deduction for q in tree format)

1 (qI) If α is a proof tree with the conclusion sequence X, M : A, Y, then α′ is a proof tree as well, where α′ is the result of adding a new node λx.xM : q(A, B, B); qI to α that has M : A as its only premise (where x is a variable of type ⟨τ(A), τ(B)⟩ that does not occur anywhere in α).

2 (qE) Let α be a proof tree with the sequence X, x : A, Y as undischarged premises and M : B as its single conclusion, and let β be a proof tree with X′, N : q(A, B, C), Y′ as conclusions (where X′, Y′ are like X, Y except that all formulae are not necessarily labeled with variables). Then α + β is a proof tree as well, where α + β is the result of

1. replacing all occurrences of variables from X, Y in α by the corresponding terms from X′, Y′,
2. merging the two graphs by having N : q(A, B, C) immediately dominate x : A, indexing the latter node with a fresh index i and identifying all nodes from X, Y with the corresponding nodes from X′, Y′, and
3. creating a new node N(λxM) : C; qE, i with M : B as its only premise.

The schematic graphical representation of the two rules is given in Figure 4.7.
[Figure 4.7. Natural deduction rules for q in tree format]
It is easy to see that Cut elimination for tree format natural deduction also works in the presence of the rules for q. Proofs using the in situ binder may lead to β-normalization configurations. The relevant pattern is given in Figure 4.8. In contradistinction to the other connectives, q never gives rise to η-normalization, since the proof term of qI can never be an η-redex.

[Figure 4.8. β-normalization for q]
Like the other normalization steps, β-normalization for q eliminates one application of an introduction rule from the proof. Thus the number of instances of introduction rules is always reduced by normalization, and therefore there cannot be infinite sequences of normalization steps. In other words, strong normalization holds for LLC+q. The proof of the normal form theorem for LLC can also easily be extended to LLC+q. As for |E, we have to change the Curry-Howard labeling for qE slightly to make the proof go through. The modified rule is given in Figure 4.9 on the facing page. The only difference from the original labeling is the change from N(λyM) to N(λy.xM) in the label of the conclusion. Here x is a variable of type ⟨τ(B), τ(B)⟩ that does not occur anywhere else in the proof. This modification ensures that the argument of N is never an η-redex. Under this proviso, it still holds that any subterm of a Curry-Howard term of a proof tree that is a redex is the label of a node in this proof tree. Now suppose the Curry-Howard term of a conclusion of a proof tree contains a redex. In addition to the possibilities that were discussed in the proof of the normal form theorem for LLC (cf. pp. 143), we have to consider the possibility that this redex has the form (λxM)N : C and is the result of some application of qE, i. Then (λxM)N : C is the conclusion of a node O : B with N = λyO, and this node in turn is dominated by some node y : A. The latter node is then indexed with i and is the single conclusion of λxM : q(A, B, C). This node must be the
[Figure 4.9. Modified rule qE]
single conclusion of an application of qI, and this means that M = xP, B = C, and the premise of this application of qI is P : A. The proof tree in question as a whole thus contains a β-normalization configuration. By contraposition, a proof tree in normal form only contains Curry-Howard labels in normal form. Furthermore, β-normalization for q is accompanied by two steps of β-reduction in the Curry-Howard label. So generally, proof normalization is accompanied by term normalization. Since normal forms of proof trees always produce normal form terms and normal forms of λ-terms are unique, normal forms of proofs must be unique as well. Hence the normal form theorem holds for LLC+q as well, and therefore LLC+q also has the Church-Rosser property.3
3.1.3 The Treatment of Scope and Scope Ambiguity with q As mentioned above, generalized quantifiers in natural language can be analyzed by assigning them the type q(np, s, s). Scoping of a quantifier in the derivation of a sentence now proceeds in three steps: 1 Replace the quantifier by a hypothetical np. 2 Derive an s using this hypothesis. 3 Discharge the hypothesis and replace it by the quantifier. It is worth remarking that on an intuitive level, this kind of reasoning has strong similarities to other scoping mechanisms like Montague’s (1974) Quantifying In, Cooper’s (1983) storage mechanism or May’s (1985) Quantifier Raising. Carpenter, 1998 contains a lucid discussion of the relationship between these approaches to scope taking. 3 This proof strategy is derived from Carpenter’s (1998) proof that proof trees for the product free fragment of L+q have a unique β-normal form.
If more than one quantifier is present in a sentence, the order of the applications of qE is underdetermined. This leads to multiple proofs corresponding to different scope readings. A simple example is (8)
John gave every student a book.
Determiners like every or a are operators that take an n to their right and return a generalized quantifier.4 Their type is thus q(np, s, s)/n. Given this, there are two derivations for (8), corresponding to the two scope readings of the sentence (Figures 4.10 and 4.11). every
student
lex
every’ q(np, s, s)/n gave (np\s)/np/np John np
n
x : np
(np\s)/np
lex
/E
q(np, s, s)
lex
1
give’xyjohn’ : s
John np
lex
lex
/E 2
/E
\E qE, 2
student
lex
n
x : np
1
give’xyjohn’ : s
a
book
lex
some’ q(np, s, s)/n
n
q(np, s, s)
/E np\s
some’book’(λygive’xyjohn’) : s
lex
/E
q(np, s, s)
(np\s)/np
y : np
lex /E
2
/E
\E
qE, 2
every’student’(λxsome’book’(λygive’xyjohn’)) : s
Figure 4.11.
lex
First reading of (8)
every
(np\s)/np/np
n
qE, 1
every’ q(np, s, s)/n gave
book
lex
some’ q(np, s, s)/n y : np
some’book’(λyevery’student’(λxgive’xy)john’) : s
Figure 4.10.
a
q(np, s, s)
/E np\s
every’student’(λxgive’xyjohn’) : s
lex
qE, 1
Second reading of (8)
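In a toy model (all domains and relations below are invented for illustration), the two orders of qE yield two truth conditions, which come apart when each student got a different book:

```python
# Students, books, and the giving relation of the toy model.
STUDENTS = [1, 2]
BOOKS = [10, 11]
GIVE = {(1, 10), (2, 11)}   # (student, book) pairs John gave

def every(dom):
    return lambda p: all(p(x) for x in dom)

def some(dom):
    return lambda p: any(p(x) for x in dom)

# every > some: each student got some book or other
surface = every(STUDENTS)(lambda x: some(BOOKS)(lambda y: (x, y) in GIVE))
# some > every: one book went to every student
inverse = some(BOOKS)(lambda y: every(STUDENTS)(lambda x: (x, y) in GIVE))
```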
The relative scope of quantifiers is unambiguous, though, if one quantifier is a sub-constituent of the other, as in the following example:

4 The analysis of indefinites is to be revised slightly in Chapter 6.
(9) A friend of every member attended the meeting.
Here q-elimination for a friend of every member has to precede q-elimination for every member. This is a consequence of the procedural definitions of proof trees given above. This leads to the reading where every takes scope over a. (The other reading, where every only takes scope over friend, can only be obtained by assuming a second type assignment for every, namely q(np, n, n)/n.) For an in-depth discussion of this and related empirical predictions that arise as a consequence of the proof theory of q, the reader is referred to Carpenter, 1998. A more advanced type logical treatment of quantifier scope in a multimodal setting is pursued in Moot and Bernardi, 2000.
3.2 Interaction of qE with |E
According to Moortgat’s mechanism, scoping of quantifiers involves the introduction of a hypothesis of type np which is later discharged. This hypothesis can be used as an antecedent for anaphora resolution, just like hypotheses that arise in the derivation of wh-constructions. This interaction of hypothetical reasoning with |E leads to binding of pronouns by quantifiers.5 Consider an elementary example like (10).

(10) Everybody loves his mother.
(To simplify the derivation somewhat, I ignore the internal structure of the NP his mother and treat it as a lexical unit with the category np|np denoting the mother function.) Scoping the quantifier everybody amounts to replacing the subject by an np-hypothesis x, proving that x loves his mother has type s, and finally discharging x and replacing it by everybody. Using x as antecedent for his mother results in a bound reading. The derivation is given in Figure 4.12 on the next page. The procedural rule for qE requires that it operates on a proof tree with a single conclusion. Thus if the temporary hypothesis used in qE is used as antecedent for |E, the other half of |E—the anaphor—must dominate this instance of qE as well. Any derivation which violates this constraint—such as that in Figure 4.13 on the following page—is thus illicit, since it cannot be proved to be a proof tree according to the recursive definition of proof trees. Note that this constraint is not an ad hoc stipulation about proof trees. Rather, it ensures that the proof tree format is equivalent to the 5 So the deductive treatment of bound pronouns from Pereira, 1990 arises as a consequence of the interaction of |E with the in situ binding mechanism here.
168
ANAPHORA AND TYPE LOGICAL GRAMMAR
his mother everybody q(np, s, s) everybody’
loves
lex 1
(np\s)/np love’
[np]i x
np|np mother’
lex
np mother’x
np\s love’(mother’x) s love’(mother’x)x
|E, i /E
\E
qE, 1
s everybody’(λx.love’(mother’x)x) Figure 4.12.
Derivation for (10)
[Figure 4.13. An illicit proof tree]
sequent formats. It thus arises naturally from the proof theories for | and q. In more linguistic terms, this restriction amounts to the requirement that every bound pronoun is inside the scope of its binder. As a consequence, the examples (11a,b) do not display a scope ambiguity, because only one order of quantifier scoping ensures that the pronoun is in the scope of its binder. In (11c), the scope of the quantifier is confined to the local clause (how this kind of restriction on scoping can be modeled in TLG goes beyond the scope of this book; see Morrill, 1994 for discussion); therefore the indicated binding pattern is excluded.

(11) a. [Every man]i saw a friend of hisi. *∃∀
     b. Everyi admirer of a picture of himselfi is vain. *∃∀
     c. *The man who knows [every customer]i treats himi politely.
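Returning to (10), the bound reading everybody’(λx.love’(mother’x)x) derived in Figure 4.12 can be evaluated in a toy model (all functions below are invented for illustration):

```python
# The np-hypothesis x fills both the pronoun slot (via |E) and the
# subject slot before qE discharges it.
DOMAIN = [1, 2, 3]

def everybody(p):
    return all(p(x) for x in DOMAIN)

def mother(x):               # a toy mother function
    return x + 10

def love(obj, subj):         # love' object subject, as in the term
    return obj == subj + 10  # everyone loves exactly their own mother

bound = everybody(lambda x: love(mother(x), x))
# For contrast, the pronoun resolved to a fixed individual (a free,
# accidentally coreferent reading):
free_to_1 = everybody(lambda x: love(mother(1), x))
```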
4. Weak Crossover
Given the way I formulated |E, the current theory of anaphora and scope predicts just one constraint on pronoun binding: the position of the antecedent has to precede the pronoun. In the case of genuine binding, the antecedent is a hypothesis that is to be discharged later. This hypothesis can be equated with the base position of the binding operator in transformational terms. We thus predict that the base position of the binder has to precede the bound pronoun. In the case of quantifiers, the linear position of the quantifier coincides with the position of the hypothesis. So we correctly expect that a subject quantifier can bind a pronoun that is embedded in the object, but not vice versa.

(12) a. Every Englishmani loves hisi mother.
     b. *Hisi mother loves every Englishmani.
The same subject-object asymmetry is predicted to arise in connection with binding by wh-operators. During the derivation of a subject relative clause like (2), a hypothetical np has to be put into the subject position of the relative clause (cf. Figure 4.3 on page 159), and this hypothesis in turn can serve as antecedent for a pronoun to its right. Matters are different with object relative clauses such as

(13) *the man whoi hisi mother loves
Here the relative pronoun has the category (n\n)/(s/np), and to prove that who his mother loves has category n\n, we have to prove that his mother loves has category s/np. This requires hypothesization of an np in the object position of loves, but this hypothesis cannot antecede the pronoun his because it does not precede it. Therefore the derivation of the bound reading of (13) fails. For the basic cases of Weak Crossover, the present system thus makes the same predictions as a c-command based account like Jacobson’s, but for entirely different reasons. As in many other theories since Postal, 1972 and Chomsky, 1976, Weak Crossover is treated here as a leftness effect which is entirely independent of hierarchical structure.
5. Precedence Versus c-command
The issue of the proper explanans for Weak Crossover is just one instance of a more general issue: Is pronoun binding determined by precedence or by c-command? The notion of c-command was introduced by Tanya Reinhart in Reinhart, 1976, and it was explicitly designed to capture the structural
conditions on pronominal anaphora. This idea has become the mainstream in generative linguistics, and much syntactic work on anaphora has been devoted to the attempt to accommodate apparent counterexamples (the most striking examples being Kayne, 1994 who argues that precedence is an epiphenomenon of c-command, and Pesetsky, 1995, who stipulates different simultaneous constituent structures to explain discrepancies between binding patterns and conventional tests for constituency). On the other hand, the idea that linear order determines binding possibilities has been revived every now and then (see for instance Barss and Lasnik, 1986, Gawron and Peters, 1990, and Bresnan, 1994). Bresnan, 1998 provides interesting evidence that crosslinguistically, both precedence and hierarchy play a role, and that the competition between these two forces is resolved in a different way in different languages. However, in the present book I restrict my attention to binding patterns in English, and here a precedence based account seems in fact to be superior. Let me briefly recapitulate the main arguments for each view. Reinhart argues that binding is impossible in configurations where a quantifier precedes a pronoun without c-commanding it. For instance, in the following examples, a quantifier embedded in a subject cannot bind a pronoun inside the VP (the examples are taken from Reinhart, 1983).

(14) a. *People from [each of the small western cities]i hate iti.
     b. *Gossip about [every businessman]i harmed hisi career.
     c. *The neighbours of [each of the pianists]i hate himi.
Likewise, binding from the object into an adjunct fails. (15)
a. *We changed the carpets in [each of the flats]i to make iti look more cheerful. b. *I placed the scores in front of [each of the pianists]i before hisi performance.
(16)
a. *So many patients called [a psychiatrist]i that hei couldn’t handle them all. b. *We fired [each of the workers]i since hei was corrupt.
On the other hand, there are well-known inverse linking constructions where a quantifier that is embedded in a matrix NP can bind a pronoun that follows the matrix NP:

(17) a. Everybodyi's mother loves himi.
     b. The policemen turned a citizen of [each state]i over to itsi governor. (from Gawron and Peters, 1990)

The derivation for (17a) is given in Figure 4.14. It does not differ from the binding pattern in Figure 4.12 on page 168 in any significant way.

[Figure 4.14. Derivation for (17a) — a natural-deduction proof tree with conclusion every'(λy.love'y(of'ymother')) : s]
The case for c-command and against precedence could be settled if there were configurations where a quantifier c-commands and follows a pronoun that it binds. Reinhart gives a construction for which this seems to be the case. Consider the following minimal pair (also taken from Reinhart, 1983):

(18) a. Near hisi child's crib, nobodyi would keep matches.
     b. *Near hisi child's crib you should give nobodyi matches.
Reinhart, 1983 assumed a phrase structure like the one given below, where the topicalized PP is a sister of S. Also, she employs a definition of c-command according to which the subject c-commands the PP in this configuration (roughly because S′ and S are not distinct enough to block c-command).

[S′ PP [S NP [VP V NP]]]
In this configuration, the subject does and the object does not c-command a pronoun inside the topicalized PP, while neither subject nor object precede it. Since binding from the subject is possible but binding from the object is not, this contrast is striking evidence in favor of c-command, it seems. On the other hand, more modern theories of phrase structure integrate topicalized constituents into the X-bar scheme, and under this assumption neither subject nor object c-command the pronoun. The contrast appears to be mysterious under either account. There are good reasons, however, to assume that anaphora resolution can take place prior to certain movement operations (or after reconstruction, if you prefer this metaphor). Under this assumption, a c-command based theory can account for the contrast if one assumes that the base position of the topicalized PP is adjoined to the VP, i.e., a position that is c-commanded by the subject but not by the object, as in the following structure:

[S′ PPi [S NP [VP [VP V NP] ti]]]
If this approach is on the right track, we expect that the same contrast shows up if the PP remains in situ. If the PP occurs clause finally, both binding from the subject and binding from the object is impeccable (which provides in itself another argument against c-command).

(19) a. Nobodyi would keep matches near hisi child's crib.
     b. You should give nobodyi matches near hisi child's crib.
A c-command based account could be maintained if we assume that the construction in (18) is derived from a different structure than (19b). If this option is taken into account, however, we may also assume that the base position of the PP in (18) is preverbal, and then precedence would predict the contrast as well. We thus conclude that the contrast in (18) is equally problematic for both accounts. The binding possibilities between the objects in double object constructions are similarly asymmetric to those between subject and object. The relevant contrast is illustrated below.
(20) a. Mary gave [every student]i a copy of hisi term paper.
     b. *Mary gave itsi author [every paper]i.

(21) a. *Mary gave a copy of hisi term paper to [every student]i.
     b. Mary returned [every paper]i to itsi author.
It is obvious that these patterns provide prima facie support for a precedence based account. Proponents of c-command as the crucial structural factor for binding have to make some additional assumptions to cover these data. For instance, Bach, 1979 and many subsequent works in Categorial Grammar employ a wrapping operation to ensure that the second object c-commands the first one. Researchers from the generative tradition have proposed to assume empty nodes to make asymmetric c-command co-extensional with linear order (Larson, 1988, Kayne, 1994), or to assume several simultaneous constituent structures for the same string (Pesetsky, 1995).

Let us briefly wrap up the discussion so far. In many prototypical cases of binding, the binder both c-commands and precedes the pronoun, so these constructions do not help to decide between the two approaches. Next, it is fairly easy to construct cases where the binder precedes the pronoun without c-commanding it. Here the evidence is not unequivocal. In some of these cases (like (14) – (16)), binding is excluded, while it is readily possible in other examples (cf. (17)). There are no undisputed instances of backward binding under c-command. The constructions in (20b) and (21a) would be cases in point if we assumed a simple concatenative phrase structure without empty categories for double object constructions, and here binding is excluded. However, it is under dispute what exactly the c-command relations in these constructions are. So it seems that a precedence based account overgenerates in certain respects while a c-command based account undergenerates. Given this state of affairs, I will continue to use the precedence based account that is implicit in LLC and I will tacitly assume that there are further constraints on anaphora resolution that are not directly linked to the logic of this operation.
A final remark on the issue of c-command: While this notion arguably plays no role for determining the structural binding configurations of personal pronouns, matters are different for reflexive pronouns. Here apparently, a combination of precedence and c-command is operative. So while the pronoun himself is a clause-mate of and follows its intended binder John in both (22a) and (22b), binding is possible in (a), but not in (b).

(22) a. Johni likes himselfi.
     b. *Johni's mother likes himselfi.

It seems that c-command is in fact a necessary condition for binding of reflexives. One might hypothesize that this is in fact an epiphenomenon of the fact that reflexivization operates on lexical items, which would also account for its clause boundedness. However, some languages have long-distance reflexives, i.e., anaphoric pronouns that require a c-commanding antecedent which may be located in a superordinate clause. (Such items occur for instance in Icelandic, see Thrainsson, 1976.) From this we conclude that reflexive pronouns, including long-distance reflexives, cannot adequately be analyzed in terms of the connective "|". Note, however, that they have exactly the properties that a Moortgat style approach to anaphoric binding predicts for all anaphors: They must be bound, and the binder must c-command them.6 Therefore I suggest that a quantificational analysis of reflexives in the sense of Moortgat, 1996a is adequate, while the resolution of personal pronouns works via |E.
6. Backward Binding and Reconstruction
A purely precedence based theory of anaphora resolution seems to be plainly falsified by the fact that there is cataphora in natural language. In this section I will briefly explore to what extent LLC is able to deal with this. Here we have to consider several subcases. Backward binding may arise by means of an interaction between anaphora resolution and hypothetical reasoning. These are cases that would be analyzed as involving reconstruction in transformational syntax, and I will adopt this term for convenience. Next, cataphora may be an instance of accidental coreference. Finally, there are certain cases of backward binding that have to be dealt with by means of lexical rules.
6.1 Reconstruction in LLC
Consider the following example:

(23) Which of hisi friends did [every student]i see?
Here the quantifier every student binds the pronoun his without preceding it (and also without c-commanding it). Intuitively, this is due to the fact that the pronoun sits inside a wh-phrase, and the base position of

6 Strictly speaking, a Moortgat style approach predicts an f-command constraint (in the sense of Bach and Partee, 1980) rather than a c-command constraint. For the purpose of the present discussion however, these notions can be identified.
this phrase is to the right of the quantifier. So prior to wh-movement (or after reconstruction), the quantifier precedes the pronoun. How can this intuition be of any use in a monostratal theory like TLG? Let us first consider a similar example without binding to clarify the general mechanism.

(24) Which of the students did every professor see?
Since I am not concerned with the semantics of possessive constructions, I will treat which of as a single lexical entry. Its meaning is a function that maps a plural entity to an interrogative quantifier over atoms from this entity (in the sense of a semantics of plurals along the lines of Link, 1991). By "interrogative quantifier" I mean a function from a property to a question, like for instance the denotation of which flower. The analysis of reconstruction I am going to pursue is independent of the semantics of questions, so I leave it open what the denotation of a question is. Syntactically, which of first combines with an np and then with a clause that lacks an object. Under the simplifying assumption that the object is always clause final in English, this leads to the following lexical entry for which of. Here Q is the category of questions, and "y ≤a x" is to be read as "y is an atomic part of x".

(25) which of – λxP.?(λy.y ≤a x)P : Q/(s/np)/np
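The effect of entry (25) can be illustrated with a small extensional sketch (this toy model and all names in it are my own invention, not from the book): the question denotation is modeled crudely as the set of true atomic answers, and the atomic parts of a plural entity as the members of a set.

```python
# Toy sketch of (25): "which of" takes a plural entity x (modeled as a
# set of atoms) and a property P, and yields the question asking which
# atomic part y of x satisfies P. A question is modeled extensionally
# as the set of its true atomic answers; this is a simplification the
# book deliberately leaves open.

def which_of(parts_of_x):
    """lambda x. lambda P. ?(lambda y. y <=a x)(P)"""
    return lambda P: {y for y in parts_of_x if P(y)}

students = {"al", "bo", "cy"}                  # invented domain
seen_by_every_prof = lambda y: y in {"al", "cy"}   # invented property

print(sorted(which_of(students)(seen_by_every_prof)))
```

On this crude model, the answer set to (24) contains exactly those students that have the property in question.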
Given this, the derivation of (24) is straightforward. It leads to the meaning representation in (26). Here, σ is Link's sum operator.

(26) ?(λy.y ≤a σx.student'x)(λy∀z(professor'z → see'yz))
In the original example (23), the complement of which of is an np containing a pronoun, so it has category np|np, and it denotes a Skolem function. The wh-phrase itself also contains a pronoun, and therefore it binds a hypothesis of category np|np, whereas the wh-phrase in (24) binds a hypothesis of category np. So the category of which of in (23) is Q/(s/np|np)/np|np. The semantics of this secondary lexical entry of which of ensures that the anaphora slot in the main clause binds the anaphora slot in the restrictor. So the second lexical entry for which of comes out as

(27) which of – λfe,e Pe,e,t.?(λg.g ≤a f)P : Q/(s/np|np)/np|np
Here the relation ≤a is type shifted from individuals to Skolem functions by the definition

g ≤a f ⇐⇒ ∀x(gx ≤a fx)
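The pointwise lifting in this definition is easy to make concrete (a sketch of my own, not from the book; the relation and domain are arbitrary stand-ins):

```python
# Sketch of the pointwise lifting: a binary relation on individuals is
# lifted to functions g, f by
#   lifted(g, f)  iff  rel(g(x), f(x)) for every x in the domain.
# Individuals are modeled as ints; any relation could play the role of
# the atomic-part relation <=a.

def lift_rel(domain, rel, g, f):
    """Lift a binary relation on individuals to Skolem functions, pointwise."""
    return all(rel(g(x), f(x)) for x in domain)

dom = [1, 2, 3]
leq = lambda a, b: a <= b   # stand-in for <=a

print(lift_rel(dom, leq, lambda x: x, lambda x: 2 * x))   # True
print(lift_rel(dom, leq, lambda x: 2 * x, lambda x: x))   # False
```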
Given this, the semantics of (23) comes out as in (28). The derivation is given in Figure 4.15. I analyze his friends as denoting the function from individuals x to the sum of x's friends. Furthermore, since I am not concerned with the syntax of auxiliary inversion, I do not distinguish between the base form of a verb and its inflected form, and I analyze do as an identity operation over clauses.

(28) ?(λg.g ≤a λxσyfriend of'xy)(λg∀z(student'z → see'(gz)z))
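The functional reading in (28) can be made concrete with a small invented model (my own sketch, not from the book): the question asks for a Skolem function g mapping each individual to one of his friends such that every student z saw g(z).

```python
# Toy model of the functional reading of (23). The names, the friend
# relation, and the seeing relation are all invented for illustration.
from itertools import product

students = ["al", "bo"]
friends = {"al": ["cy", "di"], "bo": ["ed"]}       # friend-of relation
saw = {("al", "cy"), ("al", "di"), ("bo", "ed")}   # who saw whom

def answers():
    """Enumerate Skolem functions g with g(z) a friend of z such that
    every student z saw g(z) -- the true answers to (28)."""
    out = []
    for choice in product(*(friends[z] for z in students)):
        g = dict(zip(students, choice))
        if all((z, g[z]) in saw for z in students):
            out.append(g)
    return out

print(answers())
```

In this model two such functions exist, one picking cy and one picking di as al's seen friend; both answer the functional question.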
So the example is analyzed as a functional question that can be paraphrased as Which function g from individuals to one of their friends is such that every student z saw gz? The important point here for the issue of binding is the fact that strictly speaking, binding—i.e., |E—takes place between the hypothesis corresponding to the quantifier (which is in subject position) and the hypothesis corresponding to the wh-phrase (which is in object position). Between these two positions, the linear order is as it should be, i.e., the antecedent precedes the anaphor. The connection between the surface position of the pronoun and the hypothesis that gets bound is mediated by the lexical entry of the wh-operator.

It should be noted that this treatment of reconstruction is a natural extension of Jacobson's analysis of functional questions that was discussed in Chapter 2. Jacobson argues that interrogative pronouns have the categories Q/(s/np) and Q/(s/np|np). I extrapolate this treatment to interrogative determiners, thereby covering the binding of pied-piped bound pronouns.

This account of backward binding via reconstruction works for topicalization as well. For a construction like (29a), we will have to assume the lexical entry (29b) for the preposition in.

(29) a. In New York City, John is happy.
     b. in – λxP.P(in'x) : s/(s/pp)/np

For cases where the fronted PP contains a bound pronoun as in (30a), I assume that the preposition has an additional entry (30b).

(30) a. In his hometown, everybody is happy.
     b. in – λge,e Pe,e,t.P(λx.in'(gx)) : s/(s/pp|np)/np|np
The higher order type of the preposition will trigger the assumption of a clause final hypothesis of type pp|np, which in turn can be bound (in forward direction) by the quantified subject.

[Figure 4.15. Derivation of (23) — a natural-deduction proof tree with conclusion ?(λg.g ≤a λxσyfriend'xy)(λg∀z(student'z → see'(gz)z)) : Q]

Since any arbitrary number of pronouns might be bound by means of reconstruction, the strategy advocated here leads to the conclusion
that certain lexical items are polymorphic, i.e., infinitely ambiguous. Decidability of the language that is determined by a lexicon—or, more generally, effective computability of the form-meaning map determined by a grammar—is not undermined by this though. The shifted lexical entry for a topicalized preposition need not contain more pronoun slots than its complement contains pronouns. So for a given string, there are always only finitely many lexical entries to be considered, and thus decidability is guaranteed. This consideration applies ceteris paribus to wh-operators as well.7

If the possibility of binding under reconstruction depends on lexical properties of some "moved" element, we might expect that its availability is lexically restricted. This is in fact the case: backward binding into a how-phrase is impossible, as observed in Bresnan, 1994, whence the following example is taken.

(31) a. How sure does everyone seem?
     b. *How sure of hisi ideas does everyonei seem?

6.2 Accidental Coreference
According to the present theory, coreference between a pronoun and a name may arise in two ways. It may be grammatically determined (by means of an application of |E). Furthermore, a pronoun may remain free as far as sentence grammar is concerned, and its value may be supplied by the context. It is of course possible that this value happens to be identical to the denotation of some proper name in the same sentence. If this name follows the pronoun in question, we obtain a pattern of accidental cataphora that is not determined by sentence grammar. The sentence in (32) (modeled after an example from Williams, 1997) provides an example.

(32) Hei won the race and we WELCOMED Johni / *welcomed JOHNi.
Since the referent of John has to be salient in the context in this example, the name John is preferably anaphorically de-accented here. So "coanaphora" would be a better term than cataphora for configurations like this. A more complex instance of accidental cataphora is the following (also from Williams, 1997):

7 Needless to say, it is theoretically unsatisfactory to assume an infinite ambiguity; ideally, all these entries should be captured by one higher order category in the sense of Morrill, 1994. I leave this issue for further research.
(33) Anyonej who has written iti can turn [hisj term paper]i in to me now.
Accidental coreference is apparently not a viable explanation here since the antecedent his term paper contains a bound pronoun and is thus not referential. However, this construction is a Bach-Peters sentence, and Jacobson’s (2000) treatment can be applied (see pp. 113). To repeat the basic assumptions, the first pronoun it is analyzed as a paycheck pronoun. This means it has type (np|np)|(np|np) and denotes the identity function over Skolem functions. It is accidentally coreferent with the “antecedent” his term paper, which denotes the Skolem function from individuals to their term papers. So after contextual resolution of it, the construction in (33) involves two instances of forward binding (from anyone to it and to his) and no backward binding.
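The paycheck resolution just described can be sketched in a couple of lines (a toy model of my own; the string representations are invented):

```python
# Sketch of the paycheck analysis of (33): "it" has type
# (np|np)|(np|np) and denotes the identity function over Skolem
# functions; contextual resolution supplies the Skolem function
# denoted by "his term paper" (individuals to their term papers).

term_paper_of = lambda x: f"paper-of({x})"   # Skolem fn: his term paper
it = lambda f: f                             # paycheck pronoun: identity

resolved = it(term_paper_of)                 # contextual resolution
print(resolved("anyone'"))                   # paper-of(anyone')
```

Once it is resolved this way, both remaining dependencies (anyone binding it and his) are ordinary forward binding via |E.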
6.3 Backward Binding without Reconstruction
To test whether there are genuine cases of grammatically determined cataphora in English (apart from those that can be analyzed via reconstruction), we have to look for cases of backward binding which cannot be analyzed as accidental coreference. It is not easy to settle this question though. Obvious test cases like (34) are excluded as Weak Crossover violations.

(34) *His mother gave every student a book.
If one considers the Weak Crossover constraint to be a consequence of the impossibility of backward binding (as I do), this is of course expected. However, we have to take the option into account that Weak Crossover is an independent constraint. Let us thus develop a more complicated setup. Consider the example in (35).

(35) a. Johni gave hisi mother a book and Billj gave hisj mother flowers.
     b. Johni gave hisi mother a book and Billj (gave hisj mother) flowers.
Example (35b) is an instance of gapping, a kind of ellipsis where part of a VP is deleted. In one of its interpretations, (35b) is synonymous with (35a) (with the indicated coreference pattern). So the example illustrates that gapping admits sloppy readings—the pronoun his refers to the local subject both in the source clause and in the elliptical clause. I take it that sloppy reference only arises if the anaphora pattern is grammatically determined and does not arise via accidental coreference.
Gapping thus provides a good test case for backward binding because the right peripheral remnant of the elliptical clause may in principle provide an antecedent for a cataphoric pronoun in the ellipsis. (36a) illustrates that an NP inside the direct object can antecede a cataphoric pronoun inside the indirect object. However, this cataphoric relationship does not give rise to a sloppy reading under gapping, cf. (36b).

(36) a. Mary gave hisi mother a picture of Johni.
     b. *Mary gave hisi mother a picture of Johni and Sue (gave hisj mother) a photo of Billj.
So it seems that backward binding is also excluded in constructions that cannot be accounted for as Weak Crossover violations. Backward binding from an object into a clausal subject is quite generally possible though. The following examples (taken from or modeled after examples from Williams, 1997) illustrate this point.

(37) a. That hei might someday meet the queen inspires [every British soldier]i.
     b. That hei/j had won encouraged Johni and electrified Billj.
In (37a), the object quantifier every British soldier binds the preceding pronoun he inside the subject clause. Likewise, the pronoun he in (37b) has a sloppy reading under ellipsis even though it precedes its antecedents. This problem can be dealt with in the lexicon if we assume that verbs like inspire or encourage undergo a lexical type shift that establishes the relevant binding pattern. This lexical rule takes the form

x : (s\s)/np ⇒ λyz.xz(yz) : (s|np\s)/np

As far as the semantics goes, this is an instance of Curry and Feys' (1958) combinator S.

Finally, there is a series of remaining constructions that admit backward binding without c-command that apparently neither approach can easily cope with. The following examples (taken from Gawron and Peters, 1990) are instances of this pattern.

(38) a. A plaque bearing the date of itsi incorporation can be found outside [every American city]i.
     b. Devotion to hisi country characterizes [every good soldier]i.
Invoking lexical rules here is a possibility, but a satisfactory treatment would have to investigate the factors that are responsible for this kind of pattern more closely.
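The semantic half of the type shift for inspire-type verbs above is the S combinator, λyz.xz(yz): the subject argument z is fed both to the verb meaning x and to the clausal-subject meaning y, which resolves the pronoun slot inside the subject clause. A minimal sketch (my own; the toy functions are invented):

```python
# Sketch of the S combinator underlying the lexical rule:
#   S(x)(y)(z) = x(z)(y(z))
# z goes both to the verb meaning x (as its object argument) and to
# the pronoun-containing clause meaning y, binding its pronoun slot.

def S(x):
    return lambda y: lambda z: x(z)(y(z))

# Toy check with invented functions: pair the two copies of the argument.
pair = lambda z: lambda w: (z, w)
succ = lambda n: n + 1
print(S(pair)(succ)(3))   # (3, 4)
```

The duplication of z is exactly the "meaning multiplication" that lets one syntactic argument bind a pronoun elsewhere.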
To sum up the discussion in this chapter, I demonstrated that LLC provides a theoretical base for a treatment of pronominal anaphora that covers a substantial amount of the binding patterns that are observed in English. Crucially, anaphora resolution by means of |E interacts with hypothetical reasoning in various ways. It is possible that the antecedent of anaphora resolution is a hypothesis that is to be discharged later. Depending on which kind of operator triggers the introduction and discharge of this assumption, this leads to binding of pronouns by wh-operators or by quantifiers. Besides, the anaphor itself might be hypothetical, i.e., we may work with hypotheses of a type A|B. This pattern arises in connection with functional questions and in constructions that a transformational treatment would analyze as invoking reconstruction. Anaphora resolution in LLC is governed solely by the linear order of the elements involved. This leads to a purely precedence based theory of pronoun binding. While such an approach is not entirely without problems, I argued that it is empirically adequate for a wide range of cases.
Chapter 5

VERB PHRASE ELLIPSIS

1. Introduction
The kind of meaning multiplication that LLC is designed to model is not restricted to pronominal anaphora. Another pervasive instance of anaphora is ellipsis. To be precise, meaning multiplication is characteristic for the subclass of elliptical constructions that Hankamer and Sag, 1976 subsume under "surface anaphora". These are anaphoric constructions that require a linguistically realized antecedent. The following contrast (from Hankamer and Sag, 1976) may serve to illustrate this point:

(1) a. [Hankamer attempts to stuff a 9-inch ball through a 6-inch hoop]
        Sag: #It's not clear that you'll be able to.
     b. [Same context]
        Sag: It's not clear that you'll be able to do it.
Even though there is a salient and pragmatically plausible interpretation available in both anaphoric constructions above, bare to cannot be interpreted without an overt antecedent while do it can. The former is an instance of surface anaphora and the latter one of deep anaphora. Surface anaphora thus requires the semantic re-use of linguistic resources. Surface ellipsis typically involves two phrases (usually clauses) that exhibit a certain kind of parallelism. Crucially, in one of the two clauses some syntactic material is missing, and this missing material is identified with the parallel material from the other phrase. The incomplete phrase is called the ellipsis site or the target and the other one the source.
Ellipsis can be classified according to the syntactic category of the missing material, the remaining material in the target clause, and according to the structural relation that holds between source and target. Some well-studied examples of ellipsis are (the list is not supposed to be exhaustive in any way):
Right node raising. Source clause and target clause are conjoined, the target precedes the source, and the remaining material is on the left periphery of the target.

(2) a. John likes and Bill detests corduroy.
     b. Every man loves but no man wants to marry his mother.
Gapping. Source clause and target clause are conjoined, the source precedes the target, the missing material consists of the verb (both auxiliaries and the main verb in case of periphrastic forms), possibly together with a verb-adjacent object in double object constructions.

(3) a. John met Mary and Bill Sue.
     b. John has invited Mary and Bill Sue.
     c. John gave a flower to Sue and Bill a CD to Anna.
     d. John gave a flower to Sue and Bill to Anna.
     e. John gave Sue a flower and Bill Anna a CD.
     f. He gave Sue a flower and she a CD.
Stripping. The remnant in the target clause is an argument of the verb; no special constraints on the structural relation between source clause and target clause.

(4) a. Bill opened a bottle of wine, and Harry too.
     b. Bill opened a bottle of wine. Harry too.
VP ellipsis. No special constraint on the syntactic relation between source clause and target clause (coordination or subordination within one sentence, or different sentences). The missing material is an infinite VP.

(5) a. John left, and Bill did too.
     b. John left, but Bill didn't.
     c. John left, and Bill wants to, too.
     d. John left before Bill did.
     e. John is tall, and Bill is too.
     f. John is tall. Bill is too.
Antecedent contained deletion. The target clause is a relative clause that modifies the object of the main clause, the missing material is an infinite transitive VP.1

(6) a. John read every book that Bill did.
     b. John showed Bill every place that Harry already had.
Sluicing. No special requirement on the relation between source clause and target clause, target is a constituent question, everything except the wh-phrase is missing.

(7) a. John read a book, but I don't know which one.
     b. They wanted to hire somebody who speaks a Balkan language, but I don't know which one. (from Merchant, 1999)
     c. They wanted to hire somebody who speaks a Balkan language. Which one?
As said above, this list is by no means intended to be exhaustive. These kinds of ellipsis can be grouped into three super-categories. Most of the kinds of ellipsis shown above are triggered by the presence of some lexical item (like a coordination particle in coordination ellipsis, or a wh-word in sluicing). This does not hold for stripping. Nor does stripping seem to be conditioned by a particular type of grammatical environment. Given this, this kind of ellipsis does not seem to lend itself easily to a compositional grammatical analysis.2

Among the remaining cases, we may distinguish between bounded and unbounded kinds of anaphora. All instances of coordination ellipsis, i.e., right node raising, gapping etc. are confined to coordinate structure. Therefore it seems plausible to locate the source of the meaning multiplication that comes with anaphora in the lexical entry of the coordination particle. As discussed in Chapter 1, there is a straightforward Categorial treatment of all cases where the remnant in the target clause is a continuous substring of the reconstructed clause, and there are proposals

1 Most work on antecedent contained deletion subsumes it under VP ellipsis, but under a purely descriptive perspective, what is missing is a transitive VP. See Jacobson, 1992a for arguments that this seemingly naive view is in fact the correct one.

2 A possible strategy for an analysis within the present framework could run as follows: We assume a lexical rule like

x : A/B ⇒ x : A|B

which would transform a lifted NP like Bill with category s/(np\s) and meaning λP.P bill' into a sentence that needs a VP antecedent, i.e., something of category s|(np\s) that accesses a property P from the context and assumes the meaning P bill'. Whether such an analysis is viable of course depends on whether appropriate restrictions on such a lexical rule can be formulated. For the time being I have to leave this issue open.
to extend this kind of treatment to gapping as well (see for instance Steedman, 1990 and Morrill and Solias, 1993). Antecedent contained deletion is likewise clause bounded, and an analysis that locates the trigger for ellipsis in the relative pronoun seems conceivable (even though to my knowledge it hasn't been tried yet). The remaining cases, verb phrase ellipsis (VPE henceforth) and sluicing, are both triggered by certain lexical items (auxiliaries or the infinitive marker to for VPE, wh-words for sluicing), and they are in principle unbounded. Therefore, an analysis which locates the job of meaning multiplication in the lexicon would be as complex as the corresponding theories of pronominal anaphora. This makes these kinds of ellipsis candidates for a modeling in terms of |E.

An analysis of sluicing with the apparatus of LLC requires certain assumptions about the semantics of indefinites that are to be introduced in the following chapter. In the present chapter, I will develop an analysis of VPE using LLC. The theory comes in two variants; in the next section I will introduce a fairly simple version that is similar in spirit to the theory of Sag, 1976 (who however uses a transformational syntax). It is well-known from the literature that Sag's theory undergenerates in certain respects, and so does the one to be developed here. Therefore I will propose a somewhat more complex theory variant in the sections 5 and 6 which extends to cases that are problematic for a Sag style account—admittedly at the price of overgeneration in certain respects. However, I am just concerned with the syntax and semantics of VPE here, while there are evidently pragmatic adequacy conditions that constrain ellipsis further.
2. VPE: The Basic Idea
Let us start the discussion with a simple instance of VPE like (8).

(8) John walked, and Bill did too.
Both did and too are prima facie candidates for the lexical trigger of ellipsis. Since an auxiliary (or—in the case of infinite target clauses—to) is obligatory for VPE while the presence of too isn't (as can be seen from this sentence), I take it that VPE is triggered by the auxiliary. I will ignore all issues pertaining to verbal inflection here, including the semantic impact of tense and aspect. This simplification being made, the auxiliary did occupies the position of a VP, but it requires a VP as an antecedent. The meaning of the target VP is identical to that of the source VP. Therefore the natural candidate as lexical entry for did (and other auxiliaries) is

(9) did – λP.P : (np\s)|(np\s)
187
Verb Phrase Ellipsis
In words, auxiliaries are treated as “pro-verbs” here. They are analyzed completely in parallel to pronouns, except that they occupy a VP position rather than an NP position, and they require a VP rather than an NP as antecedent. Their denotation is the identity function over VP denotations, i.e., over properties. This approach thus does not assume any internal syntactic structure of the elliptical VP. As a consequence, I do not expect that there is always a non-elliptical counterpart to a VPE construction with exactly the same meaning. This distinguishes this approach (and all other pro-verb approaches like for instance Hardt, 1993) from deletion theories (like for instance Fiengo and May, 1994). During the subsequent discussion, we will in fact encounter examples where reconstruction leads to wrong predictions.
The derivation of a construction like (8) is entirely analogous to pronominal anaphora resolution. The source VP, walked, has the appropriate category to antecede did and thus enables the application of |E. Besides, only Modus Ponens is involved. The derivation is given in Figure 5.1. (I ignore the contribution of too as inessential for the semantic composition.)

Figure 5.1. Derivation of (8) [proof tree omitted; did : λP.P : (np\s)|(np\s) is resolved against the indexed antecedent walk’ via |E, and the conclusion is walk’john’ ∧ walk’bill’ : s]

3.
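The semantic side of this derivation can be sketched in a few lines of Python. This is only an illustrative encoding of the Curry-Howard terms; the names (john, walk, and_, did) and the tuple representation of propositions are mine, not the book’s:

```python
# Toy semantic composition for (8) "John walked, and Bill did too".
john, bill = "john'", "bill'"
walk = lambda x: ("walk'", x)                 # np\s: individual -> proposition
and_ = lambda p: lambda q: ("and", q, p)      # (s\s)/s, as in the entry: lambda p q. q & p

did = lambda P: P                             # (np\s)|(np\s): identity over VP meanings

target_vp = did(walk)                         # |E: resolve did against the source VP
sentence = and_(target_vp(bill))(walk(john))  # the remaining steps are Modus Ponens
print(sentence)
```

Since did is the identity function, the target VP is literally the source VP meaning; no syntactic structure is reconstructed in the ellipsis site.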
Interaction with Pronominal Anaphora
If the elided VP contains a pronoun that can be bound by the local subject, ellipsis resolution leads to a three-way ambiguity. If the pronoun in the source is free, it will be coreferential with the elided pronoun in the target. Such a coreferential reading is also possible if the pronoun in the source is bound to the local subject. In the latter case, it is also possible that the elided pronoun refers to the subject of the target clause.
The former two readings are called strict and the latter one sloppy in the literature. They are indicated in (10) (where italicized material is to be understood as being elided). (10)
a. John revised his paper, and Bill did (too).
b. John_i revised his_k paper, and Bill_j did revise his_k paper.
c. John_i revised his_i paper, and Bill_j did revise his_i paper.
d. John_i revised his_i paper, and Bill_j did revise his_j paper.
This three-way ambiguity arises naturally from the possible interactions of |E for the VP anaphor with the interpretation of the pronoun. As a first option, we may leave the pronoun unbound. Its anaphora slot is inherited by the sentence as a whole via application of |I. This rule involves the temporary replacement of a premise of type np|np by a hypothesis of type np. This hypothesis participates in the construction of the source VP, which in turn serves as antecedent for ellipsis resolution. The derivation is given in Figure 5.2. To keep things simple, I treat his paper as a lexical unit with category np|np which denotes the Skolem function ppr’ that maps individuals to their papers.

Figure 5.2. Derivation of (10b) [proof tree omitted; conclusion: λx.rv’(ppr’x)j’ ∧ rv’(ppr’x)b’ : s|np]
Alternatively, the anaphora slot corresponding to the pronoun may be eliminated via |E. Here we have two options. We may choose the subject of the source clause, John, as antecedent for his. This leads to the derivation in Figure 5.3, corresponding to the strict reading (10c).

Figure 5.3. Derivation of (10c) [proof tree omitted; conclusion: rv’(ppr’j’)j’ ∧ rv’(ppr’j’)b’ : s]

Finally, we may fill the subject slot of the verb in the source clause with a hypothetical subject which serves as antecedent for his. After pronoun resolution, this hypothesis gets discharged via \I. This gives the type np\s for the source VP, which is thus a suitable antecedent for ellipsis resolution. While this sequence of \E followed by \I seems to lead to an η-normalization configuration, it does not, due to the intervening |E step. In this derivation, the source VP receives the interpretation to revise one’s paper. In other words, the pronoun is not bound to some overt antecedent but to the subject slot of the verb. This leads to the sloppy reading (10d). The derivation is given in Figure 5.4.

Figure 5.4. Derivation of (10d) [proof tree omitted; conclusion: rv’(ppr’j’)j’ ∧ rv’(ppr’b’)b’ : s]
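The difference between the two construals can be made concrete by evaluating the two VP meanings symbolically. The names rv (revise) and ppr (the paper-of Skolem function) follow the book’s Curry-Howard labels; the string encoding itself is an illustrative sketch:

```python
# Strict vs. sloppy construals of (10) as lambda terms.
j, b = "j'", "b'"
ppr = lambda x: f"ppr'({x})"                # Skolem function: x's paper
rv = lambda y: lambda x: f"rv'({y})({x})"   # revise

strict_vp = rv(ppr(j))                # Figure 5.3: antecedent VP rv'(ppr'j')
sloppy_vp = lambda x: rv(ppr(x))(x)   # Figure 5.4: lambda x. rv'(ppr'x)x

print(strict_vp(b))   # Bill revised John's paper
print(sloppy_vp(b))   # Bill revised Bill's paper
```

Applied to the source subject j’ both VPs yield the same proposition, which is why the source clause itself does not disambiguate; only the target clause reveals the construal.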
Cascaded ellipsis. In the present system, the ambiguity between a strict and a sloppy reading for a pronoun under ellipsis is not a property
of the pronoun per se but arises from the interaction between anaphora resolution and other operations of semantic composition. It is thus possible that the same pronoun receives a strict and a sloppy construal simultaneously if it is part of two ellipses. Gawron and Peters, 1990 discuss an example with this property. (11)
a. John [[revised his paper]_j before the teacher did_j]_i, and Bill did_i, too.
b. John revised John’s paper before the teacher revised John’s paper, and Bill revised John’s paper before the teacher revised John’s paper.
c. John revised John’s paper before the teacher revised the teacher’s paper, and Bill revised Bill’s paper before the teacher revised the teacher’s paper.
d. John revised John’s paper before the teacher revised John’s paper, and Bill revised Bill’s paper before the teacher revised Bill’s paper.
The sentence (11a) involves cascaded ellipsis. The source VP of the outer ellipsis contains an embedded clause with an elided VP (which refers back to the matrix VP of the source clause). So the pronoun his participates in two ellipses. In the indicated ellipsis pattern, (11a) is four-way ambiguous. In addition to the reading in which his remains free—which I omit from the following discussion—both ellipses may receive a strict construal (reading (11b)) or they may both receive a sloppy construal (paraphrased in (11c)). The interesting reading is given in (11d). Here, the inner ellipsis receives a strict interpretation and the outer ellipsis a sloppy one. The derivations of the first two readings in LLC are straightforward. To obtain the strict-strict reading, |E is applied to the pronoun with the np John as antecedent. After this, the inner VP is assembled and assigned the Curry-Howard term rv’(ppr’j’), i.e., the meaning to revise John’s paper. Using this VP as antecedent for the first did and then assembling the source VP of the outer ellipsis leads to the meaning before’(rv’(ppr’j’)teacher’)(rv’(ppr’j’)) for the matrix VP revised his paper before the teacher did. Using this VP as antecedent for the second did leads to reading (11b). The sloppy-sloppy reading (11c) is obtained if the inner ellipsis is given a sloppy construal. To this end, pronoun resolution is combined with hypothetical reasoning for the subject of revised as in the example derivation
of a sloppy construal above (Figure 5.4). This leads to the meaning λx.rv’(ppr’x)x for revised his paper. Using this construal to resolve the first occurrence of did and then assembling the matrix VP leads to the interpretation before’(rv’(ppr’teacher’)teacher’)(λx.rv’(ppr’x)x) for revised his paper before the teacher did. Using this VP meaning as antecedent for the second ellipsis gives us reading (11c). Finally, there is the critical reading in which the outer ellipsis receives a sloppy construal and the inner ellipsis a strict one. To see how this reading is derived, observe that the strict reading of the first conjunct, John revised his paper before the teacher did, amounts to the derivability of the following sequent in LLC:

x : np, r : (np\s)/np, p : np|np, bef : (np\s)\(np\s)/s, t : np, d : (np\s)|(np\s) ⇒ bef(d(r(px))t)(r(px))x : s

Applying \I to this sequent yields the equally derivable sequent

r : (np\s)/np, p : np|np, bef : (np\s)\(np\s)/s, t : np, d : (np\s)|(np\s) ⇒ λx.bef(d(r(px))t)(r(px))x : np\s
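The np\s meaning delivered by this second sequent can be spelled out symbolically: it is the property "to revise one’s paper before the teacher revised one’s paper", with the inner ellipsis d already resolved strictly. The encoding below is a sketch; folding bef’s subject argument into the string is my simplification:

```python
# lambda x. bef(d(r(px))t)(r(px))x, evaluated symbolically.
r = lambda y: lambda x: f"rv'({y})({x})"   # revised
p = lambda x: f"ppr'({x})"                 # his paper (Skolem function)
t = "teacher'"
d = lambda P: P                            # the inner did: identity over VPs

bef = lambda clause: lambda P: lambda x: f"before'({clause})({P(x)})"

mixed_vp = lambda x: bef(d(r(p(x)))(t))(r(p(x)))(x)

print(mixed_vp("j'"))   # John's conjunct of reading (11d)
print(mixed_vp("b'"))   # Bill's conjunct: both papers shift to Bill's
```

Because x is abstracted over after the inner ellipsis is resolved, resolving the outer did against this property makes everything inside—including the strictly resolved inner ellipsis—track the new subject, which is exactly the mixed reading (11d).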
In linguistic terms, this means that the VP revised his paper before the teacher did can have the interpretation to revise one’s paper before the teacher revised one’s paper. If this VP meaning is used as antecedent for the resolution of the outer ellipsis, we get the mixed reading (11d). Dalrymple et al., 1991 consider two more readings for (11), which are paraphrased below: (12)
a. John revised John’s paper before the teacher revised the teacher’s paper, and Bill revised John’s paper before the teacher revised the teacher’s paper.
b. John revised John’s paper before the teacher revised the teacher’s paper, and Bill revised Bill’s paper before the teacher revised John’s paper.
Native speaker judgments are notoriously shaky in examples as complex as this. Nevertheless, the majority opinion tends towards the assessment that (12a) is possible while (12b) is not. With the given type assignment, only the three readings in (11) are derivable in LLC. For the time being, there seems to be no obvious remedy for this undergeneration.
Let us next turn our attention to a puzzle that was brought up by Gawron and Peters, 1990 and received further attention by Shieber et al., 1996. Consider the example (13a). (13)
a. Madeline revised [her mother]_i’s paper before she_i did. (Gawron and Peters, 1990)
b. *Madeline revised Madeline’s mother’s paper before Madeline’s mother revised Madeline’s grandmother’s paper.
It seems impossible to interpret (13a) as (13b), i.e., to give the ellipsis a sloppy construal and simultaneously use her mother as antecedent for she. Under the type assignment np|np for she, this reading is in fact excluded in our theory. A derivation of this reading would schematically look like the one in Figure 5.5.

Figure 5.5. (Illicit) derivation of (13b) [schematic proof tree omitted; a hypothetical np subject (index 1) antecedes her, and the \I step discharging this hypothesis intervenes between her mother (index k) and she_k did]
This is not a licit proof tree because her mother is anaphorically related to the np-hypothesis with the label “1”—otherwise we do not get a sloppy reading—and thus the np-node resulting from applying |E to her mother and the hypothetical np dominates the \I node where the hypothesis 1 is discharged. According to the definition of proof trees, \I operates on a proof tree with a single conclusion. Therefore her mother cannot enter an anaphoric relationship with an anaphor outside the scope of this application of \I, such as she in the derivation above. Thus, this reading cannot be derived with the given type assignments. However, Shieber et al., 1996 point out that this kind of reading does in fact exist if the example is changed in such a way that the reading in question becomes pragmatically plausible. Their examples are given in (14)
a. Ronnie_i criticized [his_i predecessor]_j’s policy just as he_j did when he_j assumed office.
b. Mary_i heard about the layoffs from [her_i manager]_j shortly after he_j did.
(both examples from Shieber et al., 1996)
In (14a), the reading in which Ronnie (Reagan) criticized the policy of Carter just as Carter criticized Ford’s policy is possible. It becomes derivable in our system if we assume a paycheck reading for the pronominal subject of the target clause. This means its type is (np|np)|(np|np), and it requires two antecedents for resolution: a Skolem function and an individual (i.e., an np containing an as yet unresolved pronoun, and an np). In this reading, his predecessor in (14a) can perfectly well antecede he, and using Ronnie as second antecedent results in the reading in question. To account for the contradictory empirical evidence in these constructions, we have to resort to the assumption that pronouns are lexically ambiguous between an individual reading and a paycheck reading, but that the latter is strongly dispreferred and only pops up if it is pragmatically enforced. This is not entirely unsatisfactory because the same seems to hold for paycheck readings in general. I close this section with a problem that was originally brought up in Dahl, 1973 and which is sometimes called the “many pronouns puzzle” in the literature. An instance is given in (15). (15)
a. John said he talked to his mother, and Bill did, too.
b. John said John talked to John’s mother, and Bill said John talked to John’s mother.
c. John said John talked to John’s mother, and Bill said Bill talked to Bill’s mother.
d. John said John talked to John’s mother, and Bill said Bill talked to John’s mother.
e. *John said John talked to John’s mother, and Bill said John talked to Bill’s mother.
Here, the source VP contains two pronouns, and if they are both coreferent with the source subject in the source VP, we expect a strict/sloppy ambiguity for both in the ellipsis. So there are four logically possible readings. One of these readings is impossible though. The pronouns may be both strict (paraphrased in (15b)) or both sloppy (cf. (15c)). Furthermore, it is possible that the first pronoun is sloppy and the second one strict (reading (15d)). The fourth reading, where the first pronoun is strict and the second one sloppy (as in (15e)) is excluded though. The literature contains quite a few proposals to cope with this fact (see e.g. Kehler, 1993, Fiengo and May, 1994, Sem, 1994, Williams, 1997, Fox, 1998). Despite differences pertaining to framework and implementation, they all share the intuition that we are dealing with a minimality effect here. The idea runs roughly as follows: The source clause is spuriously ambiguous because the second pronoun may refer
back either to the first pronoun or to the matrix subject. Since the second pronoun is closer to the first pronoun than to the matrix subject John, the anaphoric link from his to he blocks the anaphoric link from his to John. According to the mentioned theories, sloppy readings arise because under ellipsis resolution, the referential index of a pronoun may either be maintained or replaced by the index of a parallel element of the source antecedent. If his is linked to he in the source and he receives a strict construal, his cannot be sloppy, since source antecedent and target antecedent have the same index. The intuition that we are dealing with a blocking effect here is certainly appealing, and it gains further support from the fact that it can be overruled by an appropriate contextual setting. The following example (due to Hardt, 1993), which is structurally parallel to (15), readily admits the reading corresponding to (15e). (16)
(John is suspected of murdering Bill’s mother. Bill has claimed that John was visiting Bill’s mother on the night in question. But John has presented as his alibi that he was home with his own mother that night. The district attorney says, in reference to the case against John: [...] So where WAS John last night?) John says he was at his mother’s house, but BILL does too. (from Hardt, 1993:119)
It is questionable, however, whether it is really the ellipsis resolution module wherein this blocking effect is rooted. A similar effect can be observed under de-accenting and in connection with focus. (17)
John said he talked to his mother, and Bill also said he talked to his mother.
Here the italicized material is meant to be pronounced with flat intonation. The sentence (17) has the same range of readings as (15), even though no ellipsis is involved. Likewise, the following sentence is only three-way ambiguous: (18)
Only John said he talked to his mother.
While both pronouns can in principle be either bound or coreferential, the sentence has no reading where John is the only person x such that x said John talked to x’s mother. So it seems that the interpretation λx.x said John talked to x’s mother is a highly marked interpretation for the source VP said he talked to his mother in the context of (15), and this makes it a dispreferred antecedent for ellipsis resolution, even though this reading is grammatically permitted.
4.
Interaction of VPE and Quantification
The interaction of VPE with quantification has received much attention in the literature on ellipsis, and in this section I will briefly discuss some key examples and show how LLC copes with them. As we saw above, quantifier scoping involves the temporary introduction of an np-hypothesis, and this hypothesis can be used as an antecedent of anaphora resolution. This pattern results in bound readings for pronouns. A similar pattern arises in connection with VPE as well. Here, the quantifier hypothesis may participate in the composition of a VP which serves as antecedent for ellipsis resolution. An example of this pattern is (19). (An analogous example was first discussed in Sag, 1976.) (19)
a. John met everybody before Bill did.
b. John met everybody before Bill met everybody.
c. John met everybody_i before Bill met him_i.

The sentence (19a) is ambiguous. In its first reading, the object quantifier everybody takes scope over the source VP, and this VP is used as antecedent for ellipsis resolution. This results in a reading that is synonymous with (19b). The second reading is paraphrased in (19c). Here, the quantifier everybody is replaced by a hypothesis in the process of scoping, and the second step of qE is delayed until after ellipsis resolution. This results in a reading where the quantifier binds two variables, one in the source VP and one in the target VP. The proof tree is given in Figure 5.6.

Figure 5.6. Derivation of reading (19c) of (19a) [proof tree omitted; conclusion: every’(λx.before’(meet’xbill’)(meet’x)john’) : s]
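The two-variable binding in (19c) can be made explicit by evaluating the qE-scheme over a toy domain. This is an illustrative sketch; the extensional rendering of every’ as instantiation over a finite domain and all names are my assumptions:

```python
# Reading (19c): the quantifier's np-hypothesis sits inside the source VP,
# the target VP is resolved to that same VP, and qE binds both occurrences.
john, bill = "john'", "bill'"
met = lambda x: lambda y: f"meet'({x})({y})"

def scope_body(x):
    source_vp = met(x)      # VP containing the hypothesis x
    target_vp = source_vp   # did, resolved against the source VP via |E
    return f"before'({target_vp(bill)})({source_vp(john)})"

domain = ["p1'", "p2'"]
every = lambda body: [body(x) for x in domain]  # every', instance by instance
reading_19c = every(scope_body)                 # qE discharges the hypothesis
print(reading_19c)
```

In each instance the same individual fills both the source and the target object slot, which is why the quantifier must outscope the entire construction.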
Here the hypothesis x : np that is introduced by the quantifier is part of the source VP that serves as antecedent for ellipsis resolution. This hypothesis must be discharged by qE, and this rule is only applicable if it operates on a proof tree with a single conclusion. Therefore the corresponding qE-step must be dominated both by the hypothesis and the ellipsis site. In linguistic terms, this means that such a reading is only possible if the quantifier takes scope over the whole construction, including both the source and the target VP. As mentioned before, a detailed discussion of wh-constructions in the context of TLG, including extraction from non-peripheral positions, goes beyond the scope of this book. Nevertheless, it should be mentioned in passing that a pattern comparable to the one discussed above arises in connection with wh-movement as well. The example in (20) illustrates this. (20)
a. the man who Mary met before Sue did
b. the man who Mary met before Sue met him
A wh-operator like who above triggers the introduction of a hypothesis, just like a quantifier, and this hypothesis can participate in the composition of the source VP of VPE. The final discharging of the hypothesis leads to a reading such as that paraphrased in (20b) for an example like (20a),³ where the wh-operator binds two variables. (Since in the example above this hypothesis would occur after met and thus in a non-peripheral position, we need limited access to the structural rule of Permutation to carry out such a derivation.) Note that here and in the previous example, the counterpart of the elided VP in the closest non-elliptical paraphrase is not identical with the source VP. It is even possible to construct examples that do not have a non-elliptical counterpart at all. This happens if a wh-operator binds a gap of a category that has no proform. Measure phrases are an example. (21)
How many miles are you prepared to walk if the people want you to?
This provides evidence for an interpretative theory of VPE which does not assume that ellipsis arises from the phonological deletion of syntactically present material in the ellipsis site.

³ The fact that (20b) itself is of questionable grammaticality is inessential for my point here.
Let us next consider a construction that is related to the example of cascaded ellipsis discussed in the previous section. The following sentence comes from Gawron and Peters, 1990 as well. It demonstrates that the ambiguity of bound versus coreferential interpretation of pronouns on the one hand and the strict/sloppy ambiguity on the other hand are independent phenomena: (22)
a. Every student revised his paper before the teacher did.
b. Every student_i revised his_j paper before the teacher_k revised his_j paper.
c. Every student_i revised his_i paper before the teacher_j revised his_j paper.
d. Every student_i revised his_i paper before the teacher_j revised his_i paper.
Sentence (22a) has three readings (paraphrased in (22b-d)). Next to the unproblematic cases where the pronoun is either free and strict (b) or bound and sloppy (c), there is an interpretation where the pronoun is bound but nevertheless strict (d). Gawron and Peters therefore assume a three-way ambiguity of pronoun uses—referential as in (b), role-linking as in (c), and co-parametric as in (d). In the present system, all three readings fall out immediately, even though the pronoun is unambiguous. If the pronoun is free, the derivation is analogous to the one given in Figure 5.2 on page 188. Likewise, the sloppy reading in (22c) is comparable to the sloppy reading of (10), combined with quantifier scoping. The interesting case is the reading (22d) where the pronoun his is bound by the quantifier but nevertheless strict. The existence of this reading follows from the intuitive idea behind Moortgat’s treatment of quantification here. The qE-scheme basically says: Whenever an a : np occurs in the context of an s with meaning ϕ(a), you can replace the np by a quantifier with meaning Q and obtain an s with meaning Q(λx.ϕ(x)). Using the strict reading of (10) as input to this operation yields the critical reading (22d). The full derivation is given in Figure 5.7 on the next page. Last but not least, pronoun resolution, ellipsis resolution and quantifier scope may interact and thus constrain each other. The following example (from Gawron and Peters, 1990) illustrates this. (23)
a. Alice recommended a book that she hated before Mary did.
b. (∃x(book’(x) ∧ hate’(a, x) ∧ recommend’(a, x))) < (∃x(book’(x) ∧ hate’(m, x) ∧ recommend’(m, x)))
c. (∃x(book’(x) ∧ hate’(a, x) ∧ recommend’(a, x))) < (∃x(book’(x) ∧ hate’(a, x) ∧ recommend’(m, x)))
d. ∃x(book’(x) ∧ hate’(a, x) ∧ (recommend’(a, x) < recommend’(m, x)))

Figure 5.7. Derivation of (22d) [proof tree omitted; conclusion: ∀(λx.before’(rv’(ppr’x)b’)(rv’(ppr’x))x) : s]

The sentence (23a) contains two sources of ambiguity. The pronoun she may receive a strict or a sloppy construal with respect to the VP ellipsis, and the indefinite NP a book that she hated may have narrow scope or wide scope with respect to before. So we expect four logical interpretational possibilities, while the sentence is in fact only three-way ambiguous. The sloppy reading with narrow scope of the indefinite is paraphrased in (23b). Here, the subject slot of recommended is filled by a hypothetical subject which serves as antecedent for the pronoun. Furthermore, the object quantifier is scoped over the s that is composed with this hypothetical subject. It is discharged after quantifier scoping. So in effect the quantifier takes scope only over the source VP, and after ellipsis resolution, a parallel quantifier appears in the interpretation of the target clause. The derivation of the narrow scope/strict reading (23c) is completely analogous; the only difference is that in the latter case, the overt NP Alice serves as antecedent for the pronoun. (The introduction of a hypothetical subject is nevertheless necessary to give the quantifier VP scope.) The formula in (23d) represents the reading where the pronoun is strict and the quantifier takes wide scope. Its derivation (sketched in Figure 5.9) is structurally analogous to the derivation in Figure 5.6. In both cases, a quantifier in the object position of the source VP takes wide scope over the whole construction and thus binds a variable both in the source clause and in the target clause. The current example furthermore involves pronoun resolution
(with the overt subject Alice as antecedent) as part of the composition of the quantifier.

Figure 5.8. Derivation of (23b/c) [proof tree omitted; the hypothetical subject (index 1) is discharged by \I after qE, and the elided VP (index k) is resolved against the resulting np\s]

Figure 5.9. Derivation of (23d) [proof tree omitted; the object quantifier is discharged by qE at the level of the whole construction]
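The narrow-scope readings (23b) and (23c) can be spelled out symbolically: the existential scopes only over the re-abstracted source VP, so ellipsis resolution duplicates the whole quantified VP in the target clause. A sketch with illustrative names (strict vs. sloppy depends on whether the pronoun slot inside the VP is the bound subject or Alice):

```python
# VP meanings for "recommended a book that she hated", narrow scope.
a, m = "a'", "m'"
vp_sloppy = lambda z: f"∃x(book'x ∧ hate'({z},x) ∧ recommend'({z},x))"
vp_strict = lambda z: f"∃x(book'x ∧ hate'({a},x) ∧ recommend'({z},x))"

before = lambda p, q: f"{p} < {q}"
reading_23b = before(vp_sloppy(a), vp_sloppy(m))   # sloppy, narrow scope
reading_23c = before(vp_strict(a), vp_strict(m))   # strict, narrow scope
print(reading_23b)
print(reading_23c)
```

As in the simple case, the two VPs agree on the source conjunct (z = a’) and differ only in the target conjunct, where the sloppy construal shifts hate’ to Mary.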
The interesting point about this example is that there is no sloppy reading where the quantifier has wide scope. At first glance, it seems possible to design a proof tree which represents this reading; it is given in Figure 5.10. Here, the hypothetical subject of the source VP is used as antecedent for she, resulting in a sloppy reading, and the quantifier is scoped at the end of the derivation. This leads to wide scope of the quantifier. However, there is no derivation that would result in this proof tree. According to the qE-scheme, this rule takes two independent proof trees as inputs: one which has q(np, s, s) as conclusion, and one which has the corresponding hypothetical np as premise. In the derivation in Figure 5.10, there is an anaphoric link (indicated by the index i) connecting a discharged assumption of the second input with a node in the first input.

Figure 5.10. Illicit derivation for (23) [proof tree omitted; the hypothetical subject (index i), discharged by \I, antecedes she inside the NP a book that she hated, linking the two inputs to qE]

This is illicit since the two inputs to qE must be independent from each other. It should be stressed again that the constraints on proof tree formation are not ad hoc rules; on the contrary, they are essential to establish the equivalence between the proof tree format on the one hand and the two sequent formats on the other hand. In other words, there is no sequent derivation corresponding to the wide scope sloppy reading, thus LLC does not admit it. This point is possibly further clarified by the fact that the Curry-Howard label of the conclusion of the illicit proof tree would be

∃z.book’z ∧ hate’zx ∧ before’(recommend’z mary’)(λx.recommend’zx)alice’

Here, the term corresponding to the relative clause, hate’zx, contains an occurrence of the variable x that is unbound. Licit derivations in LLC (or any other substructural logic) never produce Curry-Howard terms containing free variables that do not correspond to some premise.

Let us take stock. In the preceding two sections, I limited my attention to cases of VPE where

1. the subjects of source VP and target VP are either proper nouns or simple quantifiers,
2. the source clause is a main clause, and
3. the target clause is either also a main clause or else directly subordinated to the source clause.

Within this fragment, the logic LLC, together with the type assignment

(np\s)|(np\s)
for the auxiliary in the target clause, leads to a theory of VPE with considerable empirical coverage. Paired with the TLG analyses of pronominal anaphora and quantifier scope that were introduced in the preceding chapters, the system handles all key constructions from the VPE literature in a largely empirically adequate way. The only mispredictions occur in connection with Gawron and Peters’s (1990) example of cascaded ellipsis (11)—where we predict three readings while there are probably four—as well as in connection with the many pronouns puzzle, and the latter problem is arguably independent from VPE. While this is a fairly satisfactory result, the present theory massively undergenerates when we look at cases that lie outside the fragment defined above. These issues will be discussed in the remainder of this chapter, and I will propose a revision of the lexical type assignment for the auxiliary. This revision is conservative, though: as long as we restrict attention to the fragment that was covered up to now, the revised theory makes exactly the same predictions as the original one.
5.
VPE and Polymorphism
The approach to VP ellipsis presented in the last section belongs to the family of “identity-of-property” theories for VPE. Following basically Sag, 1976, these theories assume that the source VP and the elliptical VP express the same property at some level of derivation or representation. This idea is in sharp contrast with theories like that proposed by Fiengo and May, 1994, where VPE is basically seen as involving identical syntactic structure which is not pronounced in the elliptical part. In the sequel, I will discuss several problems for an identity-of-property approach that have been discussed in the literature, and I will demonstrate that an identity-of-meaning approach can be maintained if we admit a limited polymorphism in the lexicon, in a manner akin to the standard Categorial treatment of coordination.
5.1
The Hirschbühler Problem
Hirschbühler, 1982 notes that in the following example, the subject can take wide scope in both conjuncts. (24)
A Canadian flag was hanging in front of each window, and an American one was, too.
In the preferred reading, there is one American and one Canadian flag per window. Hirschbühler considered the option that this reading arises because the object each window scopes over the whole construction, including the conjunction. This would render the example analogous to
(19). However, such a solution would fail, as Hirschbühler points out. We observe a similar reading in (25). (25)
A Canadian flag was hanging in front of many windows, and an American one was, too.
The preferred reading here is the one where the object takes scope over the subject in both conjuncts, but the conjunction still takes scope over both objects. Identity-of-property approaches to VPE are unable to derive this reading. To see why, one has to consider what potential antecedent properties the source clause supplies here. The syntactic antecedent in the last example is was hanging in front of many windows. This VP is entirely unambiguous; the only meaning of type ⟨e, t⟩ that can be derived from it is the one where the object scopes over the VP:

λx.(many’ windows’(λy.was hanging in front of’xy))

Combining this meaning with either the source subject or the target subject inevitably yields the subject wide scope reading. Even though several attempts have been undertaken to treat this kind of example within an identity-of-property approach, none of them was really successful. The Hirschbühler problem effectively falsifies this group of ellipsis theories. It does not falsify a somewhat more general setup, something which has been called “identity-of-meaning” theories. These maintain the basic intuition that it is meaning that is shared between source and target in a VPE construction rather than syntactic structure, but they possibly give up the assumption that this meaning has to be a property. Under a flexible approach to meaning assignment, a phrase like was hanging in front of many windows may receive different meanings with different types. The key proposal for this more flexible treatment is Kempson and Cormack, 1983. They claim that the piece of meaning that is shared between source VP and ellipsis site is not a property of individuals but a property of quantifiers. A VP containing a quantified object will be ambiguous in this type, which in turn leads to the Hirschbühler ambiguity in ellipsis.
To be somewhat more specific, the VP in question is ambiguous between the lifted properties λT.T(λx.(many’ windows’(λy.Ryx))) and λT.many’ windows’(λy.T(λx.Ryx)), where R stands for the meaning of hanging in front of. The former meaning assignment leads to a reading where the subject has wide scope in both conjuncts, while the latter one gives the critical Hirschbühler reading.

Flexible meaning assignment is an essential aspect of any Categorial Grammar, so the Kempson/Cormack style treatment is easy to incorporate into the present theory of ellipsis resolution. To start with, even though Categorial meaning assignment is flexible, the category-to-type correspondence between syntax and semantics is strict. So assigning the string was hanging in front of many windows a meaning of a higher type implies assignment of a more complex syntactic category. The obvious candidate is (s/(np\s))\s, i.e., a functor that consumes a subject quantifier to its left to yield a clause. So the only adjustment that is necessary to adopt Kempson and Cormack’s analysis is a modification of the lexical assignment for the auxiliary in VPE constructions: instead of the identity function over properties, I assign it the identity function over properties of quantifiers, paired with the appropriate syntactic type. So the modified lexical entry is

(26)
did/was – λx.x : ((s/(np\s))\s)|((s/(np\s))\s)
The derivation of the lifted source VP for the Hirschbühler reading of (27) is given in Figure 5.11.

(27)
A doctor visited every patient, and a nurse did too.

Figure 5.11. visited every patient – object wide scope [natural deduction derivation: from visited of category (np\s)/np (meaning visit’) and every patient of category q(np, s, s) (meaning every patient’), the lifted VP is derived as (s/(np\s))\s with meaning λT.every patient’(λx.T visit’x)]
It should be noted that due to the built-in flexibility of Type Logical Grammar, this approach overgenerates. The Hirschbühler examples admit scope inversion, but only if it occurs both in the source clause and the target clause. A reading where the subject takes wide scope in the source clause and narrow scope in the target clause is excluded. In the present setup, such crossed readings are derivable, however. This is due to the fact that argument lowering is a theorem of L (and thus of LLC):

x : (s/(np\s))\s ⇒ λy.x(λz.zy) : np\s

Now suppose we assign the source VP the object wide scope reading as in the sample derivation in Figure 5.11, combine it with the source subject directly, but let the copy undergo argument lowering before we combine it with the target subject. This will result in a reading where the object has wide scope in the source but narrow scope in the target clause.

One might wonder though whether the kind of parallelism effects that we observe here should really be treated as a property of ellipsis resolution as such. Arguably, there is a parallelism constraint on coordinate constructions anyway, quite independently of ellipsis. So it is possible that grammar in fact admits crossed readings, while pragmatics filters them out. This would lead to a kind of hybrid theory of VPE, where the interface between syntax and semantics is fairly liberal as far as admissible readings are concerned, while pragmatic constraints that are basically independent of the ellipsis module are responsible for the fine tuning. Some evidence for such an architecture will be collected in the next subsection.
5.2 Non-subject Sloppy Readings
Even more problematic for an identity-of-property approach are cases where the antecedent for a sloppy pronoun is not the subject of the source VP. Possible antecedents can be
NPs embedded in the subject. (28)
a. John’s coach thinks he has a chance, and Bill’s coach does too. (Rooth, 1992)
b. People from LA adore it and people from NY do too. (after Reinhart, 1983)
c. The policeman who arrested John failed to read him his rights, and the one who arrested Bill did too. (after Wescoat, 1989, cited by Dalrymple et al., 1991)
NPs embedded in a topicalized constituent. (29)
If Bill was having trouble in school, I would help him. If Harry was having trouble in school, I wouldn’t. (after Hardt, 1993)
NPs from superordinated clauses.
(30)
I didn’t know that Bill was a bigamist. Mary just said he’s married to her, and Sally did, too. (from Fiengo and May, 1994)
The sloppy pronouns are marked by italic font, and their antecedents by underlining. The first descriptive hypothesis about sloppy readings that comes to mind in view of these data is that the two antecedents of a sloppy pronoun must occupy structurally parallel positions in the source clause and the target clause (this is for instance assumed in Fiengo and May, 1994). However, this is shown to be too rigid by Rooth, 1992 ((a) and (b)) and Hardt, 1993 (c): (31)
a. First John told Mary that I was bad-mouthing her, and then Sue heard that I was.
b. Yesterday the guy John works for told him to shape up, and today Bill’s boss did.
c. If John was having trouble in school, I would help him. On the other hand, if Bill was having trouble, I doubt if I would.
So apparently a notion of semantic rather than structural parallelism is called for, which may be enriched by some notion of "implicational bridging" (Rooth, 1992) to cover cases like (31a). This approach, however, turns out to be too narrow as well, as the following example from Fiengo and May, 1994 demonstrates.

(32)
First John told Mary that I was bad-mouthing her, and then Sue behaved as though I would.
I do not have a novel account to offer here of the structural/semantic/pragmatic relation that has to hold between source and target in VPE. What the examples above do show is that whatever governs the distribution of non-subject sloppy readings, it is certainly not determined by grammar in the narrow sense. The only (trivial) grammatical constraint seems to be that the sloppy pronoun has to find an antecedent in the pre-VP material of both clauses.

Even though an identity-of-property approach to VPE is incapable of covering any non-subject sloppy reading, these data are not overly problematic for an identity-of-meaning program if pronouns are analyzed in a variable free way. Let us take the intuition "the sloppy pronoun has to find an antecedent in the pre-VP material of both clauses" seriously. To put this idea slightly differently, what is shared between source clause and target clause in a VPE construction is the meaning of a VP that may contain a series of pronouns which are bound inside the source clause and in the target clause respectively. (The source clause and the target clause need not be the local clauses, as example (30) demonstrates.)
Let us restrict the discussion to cases with one pronoun for the moment. Basically, the category of a VP containing one pronoun is (np\s)|np. Let us abbreviate this category as vp1. To enforce binding of the pronoun within a superordinate clause, this type has to be lifted to

(33) (s/vp1)\s

Note that after lifting, the VP in question does not contain unresolved pronouns any longer. This can be generalized to an arbitrary number of pronouns in a simple way: Let us say that vp0 = np\s and vpn+1 = vpn|np. The general type scheme for lifted VPs is then (s/vpn)\s for arbitrary natural numbers n. Accordingly, I assume a polymorphic lexical entry for the auxiliary, namely the identity function over all instances of lifted VPs.

(34) λx.x : ((s/vpn)\s)|((s/vpn)\s)
Note that the proposal made in the last subsection is just a special case of this where n = 0. To see how this proposal works, consider a simple example like (35)
John’s father helps him, and Bill’s father does too.
The derivation of the source clause is given in Figure 5.12. In an intermediate step of the derivation, the string helps him is assigned the lifted VP category (s/vp1)\s, paired with the meaning λT.T help’. This piece of meaning serves as antecedent for ellipsis resolution. The derivation of the target clause runs completely in parallel, except for the fact that the lifted VP is not lexically founded but retrieved from the source clause via |E. So the meaning of the target clause winds up being (help’ b’(of’ b’ father’))—Bill’s father helps Bill.
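For concreteness, the bookkeeping behind the vpn scheme can be spelled out mechanically. The following sketch is my own illustration, not part of the book's formal apparatus: categories are encoded as plain strings, and the function names are invented for exposition.

```python
# Generate the polymorphic category scheme for the VPE auxiliary.
# Categories are plain strings; all names here are illustrative.

def vp(n: int) -> str:
    """vp0 = np\\s; vp(n+1) = vp(n)|np."""
    cat = r"np\s"
    for _ in range(n):
        cat = f"({cat})|np"
    return cat

def lifted(n: int) -> str:
    """The lifted VP type (s/vpn)\\s that serves as ellipsis antecedent."""
    return f"(s/({vp(n)}))" + "\\s"

def aux_entry(n: int) -> str:
    """Entry (34): the identity proform over lifted VPs."""
    return f"({lifted(n)})|({lifted(n)})"

print(vp(1))         # (np\s)|np
print(lifted(0))     # (s/(np\s))\s  -- the n = 0 case
print(aux_entry(0))  # ((s/(np\s))\s)|((s/(np\s))\s), i.e. entry (26)
```

Instantiating n = 0 reproduces the entry (26) of the previous subsection, which is the sense in which that proposal is a special case of (34).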
6. Parallelism Versus Source Ambiguity
Speaking in somewhat less technical terms, the lexical entry for the auxiliary given in (34)—paired with the general approach to anaphora presented in this book—leads to a constraint on sloppy readings of VPE: a given sloppy pronoun has to find its antecedents in the pre-VP material of the source clause and the target clause (or clauses in the case of multiple ellipsis) respectively. Since this is a very mild constraint indeed, it is not very surprising that most examples that are discussed in the literature can be derived in such a system. A notoriously difficult one is due to Dahl, 1973.
Figure 5.12. Source clause of John’s father helps him, and Bill’s father does too [natural deduction derivation: the string helps him of category vp1 (meaning help’) is lifted to (s/vp1)\s with meaning λT.T help’, and the whole source clause is derived as s with meaning (help’ j’(of’ j’ father’))]

(36)
John realizes that he is a fool, but Bill does not, even though his wife does.
The critical reading is the one where John realizes that John is a fool, Bill fails to realize that Bill is a fool, but Bill’s wife realizes that Bill is a fool. So apparently the second clause takes the first clause as antecedent and receives a sloppy reading, while the third clause is anaphoric to the second clause and strict. Under an identity-of-meaning theory, this configuration should be impossible. Another analysis is possible though. We may analyze both ellipses as taking the first clause as antecedent and receiving a sloppy construal. The second ellipsis is extremely sloppy because it takes the possessor of the subject as antecedent of the sloppy pronoun rather than the subject itself.

Liberal though the present theory may be, it is not entirely unconstrained. In particular, it predicts a fundamental asymmetry between VPE in coordination and subordination. In subordinative constructions, it is as restrictive as the traditional Sag, 1976 style theory.

To place this aspect in the right perspective, let us briefly return to the general issue: Does VPE involve identity of meaning? I have argued above that such a theory has to be paired with some theory of parallelism to cope with the problem of overgeneration. Given this, it might be suggested that we totally trivialize the operation of ellipsis resolution ("fill in whatever gives you a sentence") and locate all interesting generalizations inside the parallelism module. This idea has been pursued by many authors, most prominently by Dalrymple et al., 1991, Rooth, 1992, and Shieber et al., 1996.
As far as the syntax-semantics interface goes, the meaning of a VP anaphor like does in a VPE construction is simply a free variable over properties in such an approach. This variable is instantiated by means of a pragmatic resolution process that takes parallelism constraints into account. Recall that in a system using free variables, the meaning of a free variable is a function from assignment functions to values. So the VP anaphor does would be translated as a variable P, say, which is interpreted as the function λg.g(P).

Suppose we were to incorporate such an approach into the overall Categorial machinery. The closest counterpart of a free variable is a function that consumes only those components of the assignment function g that are relevant for the evaluation of P, and this is the identity function over properties. So in a sense, the free variable approach corresponds to the variant of the present LLC-based approach where VPE slots always remain free. The main effect of admitting binding of VPE slots is to induce a preference ordering over possible resolutions. Arguably, bound readings of anaphors are preferred over those where the interpretation of the anaphor is supplied by the context.

While the constraints on resolution that are predicted by the LLC analysis of VPE are relevant to the syntax-semantics interface, the parallelism constraints operate on the discourse level. This distinction is worked out clearly for instance in Gardent, 2000. There, it is pointed out that the constraints on de-accenting are of a similar nature to the constraints on ellipsis resolution. For instance, in the following exchange, it is infelicitous to de-accent the VP likes Sarah in the second sentence:

(37)
a. A: John likes Mary.
b. B: No, PETER [likes Sarah]. (example from Gardent, 2000)
Likewise, a resolution of does in (38b) as likes Sarah is totally infelicitous. (38)
a. A: John likes Mary.
b. B: No, PETER does.
As Gardent shows, the formal machinery to state the parallelism constraints on VPE resolution that is proposed in Dalrymple et al., 1991 and Shieber et al., 1996 can be directly extrapolated to the analysis of de-accenting. It is well-known, however, that de-accenting need not be licensed by overt material. Inferred propositions can serve as licensors as well. Gardent illustrates this point with Lakoff’s (1971) example: (39)
First John called Mary a republican, and then SHE insulted HIM.
Here the parallelism that licenses the de-accenting of insulted holds between the propositions John insulted Mary and Mary insulted John. The former proposition is not explicitly expressed but contextually inferred from the first conjunct. This suggests that parallelism is a constraint on interpretation that operates on the discourse level. It may constrain the resolution of VPE as a side effect. To motivate a structural account of VPE like the present one, it has to be shown that there are structural constraints on VPE resolution beyond parallelism. Reconsider a simple strict/sloppy ambiguity like (40)
John revised his paper, and Bill did too.
An identity-of-meaning approach has to assume that the source VP is ambiguous between to revise John’s paper and to revise one’s own paper. Outside ellipsis constructions, this ambiguity is spurious, but it leads to different truth conditions for the target clause in VPE. A purely parallelism based approach can do without this kind of spurious ambiguity. Informally put, the mentioned theories require only that replacing Bill by John in the target clause leads to the same meaning as the source clause. Clearly, both the strict and the sloppy readings fulfill this requirement, independently of the semantic derivation of the source clause. So a parallelism based theory does without the assumption of spurious ambiguity. Alongside the fact that these theories are unified—only the parallelism constraint matters—this is another strong argument in their favor.

However, it can be argued that the assumption of a spurious ambiguity is unavoidable as soon as we turn our attention to subordination constructions. So an adequate account of VPE has to be hybrid between syntax/semantics and pragmatics to some degree. Consider a comparative construction like

(41)
John revised his paper faster than Bill did.
The syntactic structure of this sentence, using traditional category labels, is given in Figure 5.13. That the comparative clause faster than Bill did cannot be attached to the matrix S node can be seen from the impossibility of giving the comparative operator scope over the matrix subject. To see why, consider the following example.

(42)
Betsy collected more rose hips than all the boy scouts.
This sentence is ambiguous between a collective reading of all the boy scouts (where Betsy outperformed the joint efforts of all the boy scouts)
Figure 5.13. [phrase structure tree: S dominates the NP John and a VP; that VP is composed of the VP revised his paper and the adjoined AdvP faster, which takes the S than Bill did as its complement]
and a distributive reading (where she was just better than every individual boy). So all the boy scouts may take wide scope or narrow scope with respect to the comparative operator. Compare this to (43)
All the girl scouts collected more rose hips than Tom.
Here the quantifier all the girl scouts can only have a distributive interpretation. This implies that the matrix subject in a comparative construction must take wide scope with respect to the comparative operator. Syntactically speaking, this means that a comparative clause like collected more rose hips than Tom cannot be construed as being attached to the matrix S node, but must be analyzed as a VP adjunct.

Given this, it is impossible to establish parallelism between source and target clause in (41), since the target clause is included in the source clause. So if parallelism plays a role here, it can only be a parallelism between VPs, not between clauses. But this means that the meaning of source VP and target VP must be identical; the subjects are excluded from parallelism. The target VP is ambiguous between a strict and a sloppy reading, thus there must be a spurious ambiguity in the source VP.

Two conclusions have to be drawn from this. First, the interface between syntax and semantics has to supply the option of a pronoun being bound "sloppily" to the subject argument place of a superordinate verb before the overt subject is supplied. That much spurious ambiguity is inevitable. Second, since the parallelism constraint in whatever form is unable to say anything about constructions like (41), but the space of possible interpretations there is neither totally free nor totally restricted, we need a non-trivial theory of VP ellipsis beyond parallelism. An analysis of VPE as hybrid in nature appears to be inevitable.

Now let us see what the present theory has to say about the sloppy reading of examples like (41). Reproducing the phrase structure given above in Categorial terms, I assume the lexical assignment

(44)
faster’ : ((np\s)\(np\s))/s
for faster than. Binding the pronoun to John right away leads to the unproblematic strict reading. But we also correctly predict a sloppy reading. The construction requires that we derive successively two goal types for the source VP revised his paper while leaving the pronoun unresolved. First, the unresolved VP has to be lifted to the ellipsis type (s/vp1)\s to supply an appropriate antecedent for the target clause. But after that, it has to be lowered to the ordinary VP type np\s to serve as argument of the operator faster than. There are two derivations for the first part, but they lead to the same result in the second part:

vp1 ⇒ (s/vp1)\s ⇒ np\s
R ⇒ λT.T R ⇒ λx.Rxx
R ⇒ λT.T(λxλy.Ryx) ⇒ λx.Rxx
So for the matrix, we derive the expected reading where John revises his own paper. As for the embedded clause, the subject Bill has to combine with the "copy" of the lifted VP to form a sentence. Here again, both solutions for the lifted type lead to the same result:

np, (s/vp1)\s ⇒ s
b’, λT.T R ⇒ Rb’b’
b’, λT.T(λxλy.Ryx) ⇒ Rb’b’
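That the two lifted meanings collapse under sloppy lowering can also be checked extensionally. The following toy computation is my own illustration, not from the book: a two-element domain, with the vp1 meaning R taking the pronoun argument first and the subject second.

```python
# Extensional check that the two lifted VP meanings collapse under sloppy
# lowering. R encodes the vp1 meaning (pronoun argument, then subject);
# the toy domain and relation are illustrative.

R = lambda x: lambda y: (x, y) == ('b', 'b')        # holds only of Bill/Bill

lift1 = lambda T: T(R)                              # lambda T. T R
lift2 = lambda T: T(lambda x: lambda y: R(y)(x))    # lambda T. T(lambda x y. R y x)

# Sloppy lowering lambda x. V(lambda z. z x x): the subject fills both the
# pronoun slot and the subject slot of the copied VP.
lower = lambda V: lambda x: V(lambda z: z(x)(x))

for x in ('j', 'b'):
    assert lower(lift1)(x) == lower(lift2)(x) == R(x)(x)

# The target subject b' lifted with a sloppily bound pronoun:
T_b = lambda v: v('b')('b')
print(lift1(T_b), lift2(T_b))  # True True -- both yield Rb'b'
```

The assertions confirm the claim in the text: whichever lifted meaning is chosen, lowering yields λx.Rxx, and combining with the target subject yields Rb’b’.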
So we correctly predict there to be a sloppy reading in subordinating VPE constructions. Matters become more interesting if we combine the kind of non-subject sloppy scenario discussed in connection with (35) with subordination. (45)
John’s lawyer defended him better than Bill’s lawyer did.
It goes without saying that this sentence has a strict reading where John was defended both by his own and by Bill’s lawyer. We are interested in the reading where Bill’s lawyer defended Bill. What I said about the two goal types of the source VP above applies here as well. If the pronoun him is not bound before ellipsis resolution (which leads to a strict reading), it can only be bound to the subject of the matrix verb.
This leads to the subject-sloppy reading, which is excluded here because it violates Binding Principle B. There is no way to derive a genuine non-subject sloppy reading. And this reading in fact does not exist. In other words, we correctly predict sloppy readings in subordination constructions to be strictly limited to subjects. Here the predictions derived with the "lifted" entry for the auxiliary are not different from the much simpler theory of Section 2.

Let us summarize the findings of this chapter. Our main aim was to demonstrate that the Categorial logic LLC is a suitable base for a linguistically informed theory of VPE. I first considered a very simple implementation, treating the stranded auxiliary in the target clause of VPE constructions as a pro-VP, i.e., as an item with the type vp|vp that denotes the identity function. Despite its simplicity, the resulting theory does a good job if we limit our attention to cases where either the target VP is subordinated or both source clause and target clause are main clauses and the subjects are simple. The account of the interaction between the three factors of VPE resolution, pronoun resolution, and quantifier scope was shown to be largely empirically adequate with respect to this fragment.

This theory proved to be too restrictive though for cases involving inverted scope or constructions in which the antecedents of a sloppy pronoun are not the subjects of the source clause and the target clause respectively. To cope with these phenomena, I modified the lexical entry of the auxiliary to a polymorphic proform over lifted VPs. The resulting theory correctly predicts a strong asymmetry between coordination and subordination. VPE in subordinated clauses follows the predictions of the original, simple theory, while VPE in coordinated constructions is subject to few structural constraints.
For the latter kind of cases, our final theory overgenerates considerably and has to resort to the assumption that ellipsis resolution is constrained by pragmatic factors.
Chapter 6 INDEFINITES
1. Introduction
The theory of pronominal anaphora that was developed in the last chapters deals with a range of phenomena that is comparable to the empirical coverage of classical Montague Grammar or one of its variants. Modern post-Montagovian semantics has focused on kinds of anaphora that transcend the limitations of Montague’s framework. The central observation (which can be traced back to medieval or even ancient philosophy) is the fact that some bound pronouns can occur outside the syntactic scope of their binders. Typically, the binder in these cases is an indefinite NP. There are several patterns of this kind of “dynamic” binding. To start with, binder and pronoun may occur in different clauses or even in different sentences. (1)
[A man]i walked. Hei talked.
Here, the existential force that is prima facie connected with the indefinite article extends beyond the sentence boundary such that the pronoun in the second sentence can be bound by the indefinite in the first one. Also, an indefinite which is embedded in another quantifier can bind a pronoun outside its scope. This is one pattern of the classical donkey sentences. (2)
a. Most farmers who own [a donkey]i beat iti.
b. No farmer who owns [a donkey]i beats iti.
Here, the indefinite takes narrow scope with respect to the subject determiner (most or no) in the preferred readings of these examples. It can nonetheless bind a pronoun inside the VP, i.e., outside its scope. Furthermore, the syntactic complement of the main subject determiner (most or no in the example) seems to appear both in the restrictive clause and the nuclear scope of the corresponding quantificational structure. While the indefinite always corresponds to an existential quantifier in the restrictor, it is ambiguous between an existential and a universal reading in the nuclear scope. These two readings have been called strong and weak in the literature. They are paraphrased for (2a) in (3a) and (3b) respectively.1

(3)
a. Most farmers who own a donkey beat every donkey they own.
b. Most farmers who own a donkey own and beat a donkey.
Finally, indefinites that occur in the if-clause of a conditional can bind pronouns in the main clause. This is the second brand of donkey sentences. Here, the quantificational force of the indefinite in question is determined by the adverb of quantification in the main clause. If no such adverb is present, the quantificational force is universal/generic. So (4a) is interpreted as the paraphrase in (4b).2

(4)
a. If [a man]i walks, hei talks.
b. Every man who walks talks.
These and related data have inspired the development of quite a few novel semantic frameworks, notably various versions of Discourse Representation Theory (Lewis, 1975, Kamp, 1981, Chapter 2 of Heim, 1982, and Kamp and Reyle, 1993) and of Dynamic Semantics (like Chapters 3 and 4 of Heim, 1982, Barwise, 1987, Rooth, 1987, Staudacher, 1987, Groenendijk and Stokhof, 1991a, Groenendijk and Stokhof, 1991b, and many descendants of the work of Groenendijk and Stokhof). They also stirred considerable controversy on the nature of linguistic meanings and the relation between syntax and semantics. Recently, Dekker, 2000 has made a proposal that seems to combine the best aspects of both families of theories and perhaps finally resolves these controversies. Like the systems of Dynamic Semantics, Dekker’s theory lends itself to a compositional treatment of the phenomena described above, and like systems of DRT, it assumes a classical, static notion of truth.

Our goal in the next three sections is a modest one. As it turns out, Dekker’s semantics of pronouns is perfectly compatible with the theory sketched in Chapter 4. His treatment of indefinites lends itself readily to a type logical reformulation as well.3 I will thus confine myself to a TLG implementation of Dekker’s system which ignores the proportion problem. It is not very difficult though to translate your favorite DRT or dynamic treatments into Dekker’s system, and thus into TLG. In other words, this part supplies Dekker’s first order system with a type logical syntax-semantics interface.

In the second part of the chapter, I will focus on the treatment of the descriptive content of indefinites. The basic idea is that constituents containing indefinites are to be interpreted as functions, and that the descriptive content of an indefinite supplies the domain of this function. So I will introduce partiality into the semantics. In the third and final part of the chapter, I will apply this grammar of indefinites to the phenomenon of sluicing, an area where anaphora and indefiniteness interact.

1 Usually each donkey sentence favors either the strong or the weak reading, but a consensus has been reached in the literature that both readings are structurally available. For a detailed discussion of this ambiguity, see Kanazawa, 1994.
2 Conditional donkey sentences also exhibit a systematic ambiguity called the "proportion problem". To keep things simple, I will skip over this point here.
2. Dekker’s Predicate Logic with Anaphora
Montague Semantics holds that the meaning of a sentence is exhaustively defined by its truth conditions, and that the meaning of complex linguistic signs is composed from the meanings of their parts. Both DRT and Dynamic Semantics challenge this view. The standard argument runs as follows: The sentences (5a) and (5b) are truth-conditionally equivalent. Nevertheless, (5c) can be a follow-up to (5a), but not to (5b), with an interpretation where the pronoun refers to the man mentioned in the first sentence. If one contends that anaphora is a semantic phenomenon, either (5a) and (5b) are not synonymous, or else the composition of sentences in discourse is not compositional.

(5)
a. A man walks.
b. It is not the case that no man walks.
c. He whistles.
Stalnaker, 1998 objects to this argument. According to him, (5a) and (5b) are semantically equivalent, but their pragmatic usage conditions differ. While (5a) can be used with referential intentions, (5b) cannot, and it is the intended referent of the indefinite description that supplies a value for the pronoun in the subsequent discourse.

Dekker, 2000 takes up Stalnaker’s argumentation, and he gives a formal reconstruction of the Stalnakerian program. If a sentence like (5a) is used with referential intention, its satisfaction can only be evaluated with respect to this referent. Satisfaction of a sentence is thus relativized to sequences of individuals which supply the referents of the referential indefinites occurring in the sentence. The existential impact of indefinites comes in by means of a distinction between satisfaction and truth: a sentence is true iff it can be satisfied. The meaning of a sentence is identified with its satisfaction conditions. Satisfaction is recursively defined as a relation between models, assignment functions, sequences of referents, and formulae. So as in Dynamic Semantics, the meaning of a sentence is richer than its truth conditions, but neither notion is dynamic in any way. Techniques from Dynamic Semantics are used though in the interpretation of the logical connectives of negation and conjunction: negation negates the truth conditions of its operand rather than its meaning, and conjunction is non-commutative in a way that allows forward binding but not backward binding.

3 The significance of Dekker’s work for the program of variable free semantics was pointed out in Szabolcsi, 2000.

Dekker carries out his program for a version of first order predicate logic that he calls Predicate Logic with Anaphora, abbreviated as PLA. The language of PLA is that of first order predicate logic without function symbols, but it divides the set of individual constants into two subsorts: the set of ordinary constants C, and the set of pronouns P = {p1, p2, . . .}. The usual abbreviational conventions apply; ¬, ∧ and ∃ are basic connectives, while ϕ ∨ ψ abbreviates ¬(¬ϕ ∧ ¬ψ), ϕ → ψ abbreviates ¬(ϕ ∧ ¬ψ), and ∀xϕ abbreviates ¬∃x¬ϕ. A central non-classical parameter of a formula is the number of referential existential quantifiers occurring in it. It is called the length of a formula and can be defined recursively:
Definition 45 (Length of a formula)
n(Rt1 . . . tm) = 0
n(∃xϕ) = n(ϕ) + 1
n(¬ϕ) = 0
n(ϕ ∧ ψ) = n(ϕ) + n(ψ)

Another way to look at the notion of length is to say that the length of a formula is the number of discourse referents that are introduced by the usage of this formula. Existential quantifiers—as the formal counterpart of indefinites—introduce discourse referents, and both existential quantification and conjunction are transparent for discourse markers introduced in their scope. That means a discourse marker which is introduced in their scope can be accessed outside this scope. Negation however closes off the referential potential of indefinites occurring in its scope. Therefore the length of a negated formula is always 0.
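Definition 45 is directly executable. The following sketch is my own tuple encoding of PLA formulas, not the book's notation; the constructor tags are invented for illustration.

```python
# A direct transcription of Definition 45 (length of a PLA formula).
# Formulas are tagged tuples; the encoding is illustrative:
#   ('atom', R, t1, ..., tm), ('exists', x, phi), ('not', phi), ('and', phi, psi)

def length(phi) -> int:
    op = phi[0]
    if op in ('atom', 'not'):       # atoms and negations have length 0
        return 0
    if op == 'exists':              # n(Ex.phi) = n(phi) + 1
        return length(phi[2]) + 1
    if op == 'and':                 # n(phi & psi) = n(phi) + n(psi)
        return length(phi[1]) + length(phi[2])
    raise ValueError(f"unknown connective: {op}")

# n(Ex(man x & Ey(donkey y & own x y))) = 2: two discourse referents.
donkey = ('exists', 'x',
          ('and', ('atom', 'man', 'x'),
                  ('exists', 'y',
                   ('and', ('atom', 'donkey', 'y'),
                           ('atom', 'own', 'x', 'y')))))
print(length(donkey))  # 2
```

Note how the clause for negation makes `length(('not', donkey))` come out as 0, mirroring the observation that negation closes off the referential potential of indefinites in its scope.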
I now turn to the compositional definition of satisfaction for PLA. Models for PLA are standard first order models, i.e., they consist of a domain of individuals D and an interpretation function E that maps ordinary individual constants to elements of D and n-place predicate symbols to n-place relations over D. The denotation of individual constants and variables is defined relative to a model M, an assignment function g and an infinite sequence of individuals e. Satisfaction is a relation between a model M, an assignment function g, a sequence of individuals e, and a formula ϕ. Following standard practice, I write ⟦ϕ⟧M,g,e = a iff the denotation of ϕ relative to M, g and e is a, and I write e ⊨M,g ϕ iff ϕ is satisfied relative to M, g, and e. I suppress the index for the model where convenient. Also, I use the notation ei to refer to the ith element of the sequence e (where the counting starts with 1). e − n is the sequence that results if you remove the first n elements from e, i.e., it is the sequence en+1, en+2, en+3, . . .. Finally, if c is a finite sequence and e is a (finite or infinite) sequence, ce is the result of concatenating c and e. The semantics of PLA is given by the following recursive definition:
Definition 46 (Interpretation of PLA)

  ⟦x⟧^{g,e} = g(x)
  ⟦p_i⟧^{g,e} = e_i
  e ⊨_g Rt_1 … t_m  ⇐⇒  ⟨⟦t_1⟧^{g,e}, …, ⟦t_m⟧^{g,e}⟩ ∈ E(R)
  e ⊨_g ∃xϕ  ⇐⇒  e − 1 ⊨_{g[x→e_1]} ϕ
  e ⊨_g ¬ϕ  ⇐⇒  ¬∃c ∈ D^{n(ϕ)} : ce ⊨_g ϕ
  e ⊨_g ϕ ∧ ψ  ⇐⇒  e ⊨_g ψ and e − n(ψ) ⊨_g ϕ
The interpretation of variables is determined by the assignment function g. A pronoun pi picks its value from the sequence of available discourse referents e. The index of the pronoun determines which referent is chosen. p1 refers to the topmost element, p2 to the second etc. The interpretation of atomic formulae is standard. The central innovation is the clause for existential quantification. To satisfy ∃xϕ, it is not sufficient that there is a witness for x that verifies ϕ, but this witness has to be present as the topmost element in the sequence of referents e. Once this witness is supplied and the quantified variable is mapped to it, ϕ is evaluated relative to the remaining sequence of referents e − 1. So as in Dynamic Semantics, an existential quantifier introduces a novel discourse referent, but this dynamic aspect is part of the evaluation procedure and not of the denotation of the formula in question.
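For a finite domain, the satisfaction relation of Definition 46 can be executed directly. The tuple encoding of formulae, the treatment of pronouns as ('p', i), and the function names below are my own illustrative choices; E maps predicate names to sets of tuples, and the sequence e is assumed long enough to supply a witness for every referential quantifier.

```python
from itertools import product

def length(phi):
    # n(phi) as in Definition 45
    tag = phi[0]
    if tag == 'atom': return 0
    if tag == 'ex':   return length(phi[2]) + 1
    if tag == 'not':  return 0
    return length(phi[1]) + length(phi[2])      # 'and'

def val(t, g, e):
    # a pronoun p_i denotes e_i (counting from 1); variables use g
    if isinstance(t, tuple) and t[0] == 'p':
        return e[t[1] - 1]
    return g[t]

def satisfies(e, g, phi, D, E):
    """e |=_g phi, with e a tuple of individuals."""
    tag = phi[0]
    if tag == 'atom':
        _, R, args = phi
        return tuple(val(t, g, e) for t in args) in E[R]
    if tag == 'ex':
        _, x, body = phi
        # the witness must be the topmost referent e_1
        return satisfies(e[1:], {**g, x: e[0]}, body, D, E)
    if tag == 'not':
        _, body = phi
        # no extension ce of e may satisfy the body
        return not any(satisfies(tuple(c) + e, g, body, D, E)
                       for c in product(D, repeat=length(body)))
    _, ph, ps = phi                              # 'and'
    # psi's referents form a prefix of e; phi is evaluated on the rest
    return (satisfies(e, g, ps, D, E)
            and satisfies(e[length(ps):], g, ph, D, E))

# a tiny model for experimentation
D = {1, 2}
E = {'walk': {(1,)}, 'talk': {(1,), (2,)}}
ex_walk = ('ex', 'x', ('atom', 'walk', ('x',)))
```

With this model, ∃x.walk'x is satisfied by the sequence (1,) but not by (2,): the witness has to sit on top of the sequence.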
ANAPHORA AND TYPE LOGICAL GRAMMAR
A negated formula ¬ϕ is satisfied if ϕ is not satisfiable, i.e., if it is impossible to extend e in such a way that it satisfies ϕ. As we will see shortly, this amounts to saying that ¬ϕ is satisfied in e iff ϕ is not true in e. The non-commutative semantics for conjunction incorporates the idea from Dynamic Semantics that discourse referents that are introduced in the first conjunct can be readdressed in the second conjunct, but not vice versa. If a conjoined formula ϕ ∧ ψ is satisfied relative to e, this means that e contains the witnesses for the existential quantifiers in both ϕ and ψ. In an intuitive sense, ψ is interpreted “later,” therefore its contribution constitutes a prefix of e. ϕ is thus interpreted relative to e − n(ψ), i.e., e with the contribution of ψ stripped off, while ψ is interpreted relative to e itself. Dekker distinguishes between satisfaction and truth. A formula ϕ is true relative to a sequence e iff it is possible to extend e with witnesses for the existential quantifiers in ϕ in such a way that the extended sequence satisfies ϕ.
Definition 47 (Truth in PLA) ϕ is true with respect to g and e iff ∃c ∈ D^{n(ϕ)} : ce ⊨_g ϕ.

So the existential impact of ∃ is not part of its meaning as such, but it comes in (1) in the definition of truth, which existentially quantifies over witnesses for existentially quantified variables, and (2) in the clause for negation, which denies the truth of the formula in the scope of the negation. This is reminiscent of the operations of existential closure in DRT, which apply both to the top level DRS and to DRSs in the scope of negation, universal quantification and implication (which are defined in terms of negation in PLA). Let us see how the key features of Dynamic Semantics are reproduced in PLA by going through the central examples.

(6) a. [A man]_i walked. He_i talked.
    b. ∃x(man’x ∧ walk’x) ∧ talk’p_1

Note that the pronoun he is translated as a pronoun in PLA, not as a variable as in standard translations. The intended interpretation of the pronoun is managed by choosing the appropriate index in PLA. Applying the semantic clauses to (6b) gives the truth conditions
  ∃x(man’x ∧ walk’x) ∧ talk’p_1 is true wrt. g and e
  ⇐⇒ ∃c ∈ D. ce ⊨_g ∃x(man’x ∧ walk’x) ∧ talk’p_1
  ⇐⇒ ∃c ∈ D. ce ⊨_g ∃x(man’x ∧ walk’x) and ce ⊨_g talk’p_1
  ⇐⇒ ∃c ∈ D. e ⊨_{g[x→c]} man’x ∧ walk’x and ce ⊨_g talk’p_1
  ⇐⇒ ∃c ∈ D. e ⊨_{g[x→c]} man’x and e ⊨_{g[x→c]} walk’x and ce ⊨_g talk’p_1
  ⇐⇒ ∃c ∈ D. c ∈ E(man’) and e ⊨_{g[x→c]} walk’x and ce ⊨_g talk’p_1
  ⇐⇒ ∃c ∈ D. c ∈ E(man’) and c ∈ E(walk’) and ce ⊨_g talk’p_1
  ⇐⇒ ∃c ∈ D. c ∈ E(man’) and c ∈ E(walk’) and c ∈ E(talk’)
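The equivalence just derived can be checked mechanically on small models. The following self-contained finite-domain interpreter (the tuple encoding of formulae is my own illustrative choice, not Dekker's notation) implements satisfaction and Definition 47's truth notion, and verifies that (6b) and the classical ∃x(man’x ∧ walk’x ∧ talk’x) are true in exactly the same models over a two-element domain.

```python
from itertools import product

def length(phi):
    tag = phi[0]
    if tag == 'atom': return 0
    if tag == 'ex':   return length(phi[2]) + 1
    if tag == 'not':  return 0
    return length(phi[1]) + length(phi[2])

def val(t, g, e):
    return e[t[1] - 1] if isinstance(t, tuple) and t[0] == 'p' else g[t]

def satisfies(e, g, phi, D, E):
    tag = phi[0]
    if tag == 'atom':
        return tuple(val(t, g, e) for t in phi[2]) in E[phi[1]]
    if tag == 'ex':
        return satisfies(e[1:], {**g, phi[1]: e[0]}, phi[2], D, E)
    if tag == 'not':
        return not any(satisfies(tuple(c) + e, g, phi[1], D, E)
                       for c in product(D, repeat=length(phi[1])))
    return (satisfies(e, g, phi[2], D, E)
            and satisfies(e[length(phi[2]):], g, phi[1], D, E))

def true(e, g, phi, D, E):
    # Definition 47: prepend a witness for every referential quantifier
    return any(satisfies(tuple(c) + e, g, phi, D, E)
               for c in product(D, repeat=length(phi)))

A = lambda R, *ts: ('atom', R, ts)
six_b   = ('and', ('ex', 'x', ('and', A('man', 'x'), A('walk', 'x'))),
           A('talk', ('p', 1)))
classic = ('ex', 'x', ('and', ('and', A('man', 'x'), A('walk', 'x')),
                       A('talk', 'x')))

D = {1, 2}
subsets = (set(), {(1,)}, {(2,)}, {(1,), (2,)})
agree = all(true((), {}, six_b, D, E) == true((), {}, classic, D, E)
            for man in subsets for walk in subsets for talk in subsets
            for E in [{'man': man, 'walk': walk, 'talk': talk}])
```

agree comes out True: on every one of the 64 models over D the two formulae have the same truth value.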
These are just the truth conditions of ∃x(man’x ∧ walk’x ∧ talk’x) (both classically and in PLA). The “dynamic binding” from the existential quantifier in the first conjunct to the pronoun in the second conjunct is possible because the existential quantifier increases the length of the formula by 1, and thus adds one referent to the sequence of evaluation. This referent can be picked up by the pronoun. Compare this to: (7)
a. It is not the case that no man walks. He talks. b. ¬¬∃x(man’x ∧ walk’x) ∧ talk’p1
Due to the fact that the length of (7b) is 0, its truth conditions are

  ¬¬∃x(man’x ∧ walk’x) ∧ talk’p_1 is true wrt. g and e
  ⇐⇒ ∃c ∈ D. c ∈ E(man’) ∧ c ∈ E(walk’) ∧ e_1 ∈ E(talk’)
In fact, there is no index of the pronoun which would render (7b) equivalent to (6b). Given the “dynamic binding” equivalence

  ∃xϕ ∧ ψ(p_{n(ϕ)+1}) ⇐⇒ ∃x(ϕ ∧ ψ(x))

the treatment of donkey sentences in PLA is straightforward. (8)
a. Every farmer who owns a donkey beats it. b. ∀x(farmer’x ∧ ∃y(donkey’y ∧ own’yx) → beat’p1 x)
Using the abbreviational conventions for universal quantification and implication, together with the equivalence given above, we can transform (8b) as follows (note that ¬¬ϕ ⇐⇒ ϕ provided n(ϕ) = 0):

  ∀x(farmer’x ∧ ∃y(donkey’y ∧ own’yx) → beat’p_1 x)
  ⇐⇒ ∀x¬(farmer’x ∧ ∃y(donkey’y ∧ own’yx) ∧ ¬beat’p_1 x)
  ⇐⇒ ∀x¬(farmer’x ∧ ∃y(donkey’y ∧ own’yx ∧ ¬beat’yx))
  ⇐⇒ ∀x¬∃y(farmer’x ∧ donkey’y ∧ own’yx ∧ ¬beat’yx)
  ⇐⇒ ∀x∀y¬(farmer’x ∧ donkey’y ∧ own’yx ∧ ¬beat’yx)
  ⇐⇒ ∀x∀y(farmer’x ∧ donkey’y ∧ own’yx → beat’yx)
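Since ∀ and → are mere abbreviations, the donkey equivalence can also be checked mechanically: build (8b) via the abbreviations and compare its PLA truth conditions against the classical ∀x∀y formula on all models over a two-element domain. The interpreter and the tuple encoding of formulae are an illustrative sketch of my own, not Dekker's notation.

```python
from itertools import product

def length(phi):
    tag = phi[0]
    if tag == 'atom': return 0
    if tag == 'ex':   return length(phi[2]) + 1
    if tag == 'not':  return 0
    return length(phi[1]) + length(phi[2])

def val(t, g, e):
    return e[t[1] - 1] if isinstance(t, tuple) and t[0] == 'p' else g[t]

def satisfies(e, g, phi, D, E):
    tag = phi[0]
    if tag == 'atom':
        return tuple(val(t, g, e) for t in phi[2]) in E[phi[1]]
    if tag == 'ex':
        return satisfies(e[1:], {**g, phi[1]: e[0]}, phi[2], D, E)
    if tag == 'not':
        return not any(satisfies(tuple(c) + e, g, phi[1], D, E)
                       for c in product(D, repeat=length(phi[1])))
    return (satisfies(e, g, phi[2], D, E)
            and satisfies(e[length(phi[2]):], g, phi[1], D, E))

A   = lambda R, *ts: ('atom', R, ts)
IMP = lambda p, q: ('not', ('and', p, ('not', q)))   # p -> q  :=  ~(p & ~q)
ALL = lambda x, p: ('not', ('ex', x, ('not', p)))    # Ax.p    :=  ~Ex~p

# (8b): Ax(farmer'x & Ey(donkey'y & own'yx) -> beat'p1 x)
donkey_8b = ALL('x', IMP(('and', A('farmer', 'x'),
                                 ('ex', 'y', ('and', A('donkey', 'y'),
                                                     A('own', 'y', 'x')))),
                         A('beat', ('p', 1), 'x')))

D = (1, 2)

def classical(E):
    # Ax Ay (farmer'x & donkey'y & own'yx -> beat'yx)
    return all((a,) not in E['farmer'] or (b,) not in E['donkey']
               or (b, a) not in E['own'] or (b, a) in E['beat']
               for a in D for b in D)

def subsets(xs):
    xs = list(xs)
    for flags in product((False, True), repeat=len(xs)):
        yield {x for x, f in zip(xs, flags) if f}

unaries  = list(subsets([(a,) for a in D]))
binaries = list(subsets([(a, b) for a in D for b in D]))

# (8b) has length 0; its PLA truth conditions coincide with the
# classical donkey reading on every model over the two-element domain:
agree = all(satisfies((), {}, donkey_8b, set(D), E) == classical(E)
            for farmer in unaries for donk in unaries
            for own in binaries for beat in binaries
            for E in [{'farmer': farmer, 'donkey': donk,
                       'own': own, 'beat': beat}])
```

The check runs over all 4096 combinations of the four extensions; agree comes out True.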
A similar analysis can be given for the conditional donkey sentence (9)
a. If a farmer owns a donkey, he beats it. b. ∃x(farmer’x ∧ ∃y(donkey’y ∧ own’yx)) → beat’p1 p2
  ∃x(farmer’x ∧ ∃y(donkey’y ∧ own’yx)) → beat’p_1 p_2
  ⇐⇒ ¬(∃x(farmer’x ∧ ∃y(donkey’y ∧ own’yx)) ∧ ¬beat’p_1 p_2)
  ⇐⇒ ¬∃x(farmer’x ∧ ∃y(donkey’y ∧ own’yx) ∧ ¬beat’p_1 x)
  ⇐⇒ ¬∃x(farmer’x ∧ ∃y(donkey’y ∧ own’yx ∧ ¬beat’yx))
  ⇐⇒ ¬∃x∃y(farmer’x ∧ donkey’y ∧ own’yx ∧ ¬beat’yx)
  ⇐⇒ ∀x∀y¬(farmer’x ∧ donkey’y ∧ own’yx ∧ ¬beat’yx)
  ⇐⇒ ∀x∀y(farmer’x ∧ donkey’y ∧ own’yx → beat’yx)
Dekker’s system thus has the same coverage as both classical DRT (in the sense of Kamp, 1981) and Dynamic Predicate Logic (Groenendijk and Stokhof, 1991b), and it combines insights from both sources. PLA is compositional (in particular it has a compositional non-commutative conjunction and a compositional counterpart of the indefinite article) like Dynamic Predicate Logic. Like DRT, it uses a Tarski style static semantics, and it makes crucial use of existential closure over sequences of individuals. In the subsequent sections, I will explore the compatibility of Dekker’s PLA with the treatment of anaphoric pronouns that was developed in Chapter 4 within the framework of LLC, and I will develop an extension of LLC which integrates crucial aspects of Dekker’s treatment of indefinites.
3. Bringing PLA into TLG
What exactly is the semantics of pronouns that is embodied in Dekker’s PLA? Consider a simple sentence containing a pronoun like (10a), which is translated as (10b). (10)
a. He walked. b. walk’pi
The formula (10b) is satisfied by a sequence e iff the ith element of e falls into the extension of walk’. So the meaning of the formula has two aspects: the descriptive part of it is just the denotation of the predicate walk’, while the structural aspect determines how the pronoun is to be resolved in a larger discourse. Compare this to the interpretation that the same sentence would receive under LLC
(11) He walked – λxwalk’x : s|np
Here the meaning of the sentence coincides with the descriptive part of its PLA-semantics. The structural aspect is missing. The reason for this is simple: In Dekker’s system, anaphora resolution is part of the translation procedure from English to PLA. In LLC, it is part of the grammar of English, and there is no need for an intermediate level of representation where anaphors are resolved. So apart from the locus of anaphora resolution, the semantic contribution of anaphoric pronouns in PLA and in LLC is virtually identical. Now compare this to the PLA-semantics for indefinites. Let us ignore the descriptive part of indefinite descriptions for the moment and limit our attention to indefinites like someone (against the background of a universe of discourse that only consists of humans). Sentence (12a) is to be translated to the PLA formula (12b). (12)
a. Someone walked. b. ∃xwalk’x
This formula is satisfied by a sequence e iff the first element of e falls into the extension of walk’. This means that the descriptive aspect of the meaning of this formula is identical with the descriptive aspect of the meaning of (11). The fact that the descriptive meaning—a function from individuals to truth values—has to be applied to the first element of the current sequence follows from the way that this descriptive meaning is syntactically expressed. In other words, (10b) and (12b) are semantically equivalent, but they implicitly belong to different syntactic categories. To make this point clear, consider the case where i = 1 in (10b). Then (10b) and (12b) will be satisfied by the same sequences, but they nevertheless make different contributions to the meaning of complex formulae, and they have different truth conditions. This is due to the fact that existential quantifiers have an impact on the length of a formula, but pronouns do not. To keep this in line with the principle of compositionality, we have to assume that the length of a formula is part of its syntactic category. To implement this in TLG, we have to assume that the semantic contribution of indefinites and pronouns are identical, i.e., that their deductive behavior leads to identical Curry-Howard terms, but that their syntactic categories are nevertheless distinct. Since the semantic contribution of the indefinite in a sentence like (12a) is identical to the semantic contribution of the pronoun in (10a), I assume that the semantic composition of the two sentences is similar. So the denotation of the indefinite NP someone is the identity function, just like the denotation
of the pronoun he. Their syntactic categories are distinct though; I thus enrich LLC with yet another type of implication to model indefinites.
Definition 48 If A and B are types, then A^B is a type as well.

The intuitive idea behind this connective is that A^np is the category of a sign that is like a sign of category A except that it contains one indefinite. The category of an indefinite NP itself is thus np^np. The corresponding semantic type is a Skolem function, which is lexically specified to be the identity function over individuals. So the natural mapping of categories to types would be τ(A^B) = ⟨τ(B), τ(A)⟩. However, the semantic impacts of walks, he walks, and someone walks differ, even though they are analyzed as having the same denotation. To take this difference into account, we have to mirror their difference in their syntactic categories as a difference in their semantic types. Therefore I assume a richer structure of semantic types from now on:⁴
Definition 49 (Semantic Types)
1 e and t are types.
2 If A and B are types, so are ⟨A, B⟩, A|B, and A^B.

The syntax of the term language has to be adjusted accordingly:
Definition 50
1 Every variable x : A is a term.
2 If M : A is a term and x : B is a variable, then λxM : ⟨B, A⟩, λxM : A|B, and λxM : A^B are terms.
3 If M : ⟨A, B⟩ and N : A are terms, then (M N) : B is a term.
4 If M : B|A and N : A are terms, then (M N) : B is a term.
5 If M : B^A and N : A are terms, then (M N) : B is a term.

I revise the category-to-type correspondence from Definition 38 on page 121 in the following way:

⁴ This is mainly a matter of convenience. Pragmatic notions like truth and entailment ultimately depend both on the denotation of a sign and its syntactic category. While the λ-terms represent the denotation of a sign, its type (in the sense used here) represents those aspects of its syntactic category that are pragmatically relevant.
Definition 51 (Category to type correspondence) Let τ be a function from CAT(B) to TYPE. τ is a correspondence function iff
1 τ(A\B) = τ(B/A) = ⟨τ(A), τ(B)⟩
2 τ(A|B) = τ(A)|τ(B)
3 τ(A^B) = τ(A)^{τ(B)}

Even though the four implications from the enriched system (henceforth referred to as LLC+∧) are now distinguished in the syntax of terms, they are uniformly interpreted as function space formation.
Definition 52 (Domains) The function Dom is a semantic domain function iff
1 the domain of Dom is TYPE,
2 for all A ∈ TYPE, Dom(A) is a non-empty set, and
3 Dom(⟨A, B⟩) = Dom(B|A) = Dom(B^A) = Dom(B)^{Dom(A)}

Since the same term may have different types in the extended version of the λ-calculus that we use here and henceforth, it is convenient to introduce some syntactic sugar into the syntax of terms such that the typing of terms becomes unambiguous again. I thus use the following conventions:
1 If the type of λxM is A^B, I write εxM instead of λxM.
2 If the type of λxM is A|B, I write πxM instead of λxM.

Note that these are just orthographic conventions; neither the official syntax of the term language nor its semantics are affected by this. The grammatical contribution of the pronoun in (10a) is governed by the rule |I. Since the indefinite in (12a) behaves similarly, there must be an analogous rule for indefinites. This holds with one qualification however: While the |I rule takes the possibility into account that several pronouns may be coreferent (without being resolved), this is impossible with indefinites. So the counterpart of the |I-rule should be restricted to n = 1. The sequent formulation and the sequent style natural deduction formulation of this rule coincide; both rules take the form

  X, x : A, Y ⇒ M : C
  ─────────────────────────────── ∧
  X, y : A^B, Y ⇒ εz.M[(yz)/x] : C^B
Anaphoric pronouns receive their interpretation from the preceding linguistic material. This is not the case for indefinites. According to Dekker (and Stalnaker), their value is fixed by the extra-linguistic context, rather than by the linguistic context. Thus there cannot be a counterpart of the |E-rule for A^B. The rule given above is the only logical rule governing the behavior of the new connective. (Therefore its label is just “∧”.) The resulting system can be seen as a variable-free reformulation of Heim-style DRT. According to Heim, both indefinites and pronouns (alongside full definites, which will be ignored here) introduce a free variable into the semantic representation. The Novelty-Familiarity Condition requires that the variable that comes with an indefinite is novel while the variable that comes with a pronoun is familiar. Here, both kinds of NPs are interpreted as identity functions which function-compose with their semantic surroundings. This is the closest approximation to the notion of a free variable in a variable-free setting. The anaphora resolution rule |E enables pronouns to find antecedents (corresponding to the Familiarity Condition). Since there is no corresponding rule for indefinites, they never have antecedents—this is the counterpart of the Novelty Condition. Since the rule ∧ is isomorphic to one instance of |L, the Cut elimination proof for LLC extends immediately to LLC+∧, the extension of LLC with the new connective for indefinites. Accordingly, decidability and the finite reading property are preserved. The tree style natural deduction version of the new rule is analogous to the corresponding rule |I:
Definition 53 (Natural deduction for ∧ in tree format) Let α be a proof tree with the conclusion sequence X, M : A^B, Y, and β a proof tree with X′, x : A, Y′ as sequence of undischarged premises (where X′, Y′ are like X, Y except that all formulae are labeled with variables) and N : C as single conclusion. Then γ is a proof tree as well, where γ is the result of
1 replacing x in β with M y,
2 replacing all occurrences of variables occurring in X′, Y′ by the corresponding terms from X, Y,
3 merging the two graphs by identifying all nodes with identical labels and having M : A^B immediately dominate M y : A, and
4 extending the resulting graph by a new node εyN : C^B with N : C as only premise.

The proofs of Cut elimination, strong normalization, and the Normal Form theorem of the natural deduction calculus for LLC can readily be extended to LLC+∧ by basically repeating the corresponding clauses for |I. So these properties are preserved as well.
In the graphical notation, the tree format natural deduction rule takes the shape given in Figure 6.1.

  M : A^B
     ⋮
  [Mx : A]^i
     ⋮
  N : C
  ─────────── ∧, i
  εxN : C^B

  Figure 6.1. Natural deduction rule for ∧ in tree format
Informally put, the rule expresses that we can temporarily ignore the exponent of a category A^B, provided we retrieve this exponent later (i.e., further down) in the derivation. As indicated above, I assume the (somewhat simplified) lexical entry (13) for the indefinite NP someone.

(13) someone – εx.x : np^np
The derivation of (12a) thus comes out as in Figure 6.2.

  someone – εx.x : np^np     (lex)
  [x : np]                    (hypothesis, i)
  walked – walk’ : np\s      (lex)
  walk’x : s                  (\E)
  εx.walk’x : s^np            (∧, i)

  Figure 6.2. Derivation of (12a) (proof tree rendered as a sequence of steps)
As was the case in connection with anaphora, the category of a sentence need not be s here, but it can for instance be s^np, or any category corresponding to a sentence containing an arbitrary number of indefinites and unresolved pronouns. Therefore, a precise definition of the notion “sentence” must be recursive. A sign is a sentence if it can be assigned a sentential category, and the latter notion is defined as follows:

Definition 54 (Sentential Category)
1 s is a sentential category
2 If A is a sentential category, so are A^np and A|np.
(I disregard unresolved ellipses here. If they are to be incorporated, the definition can easily be adjusted accordingly.) Due to the variable free character of TLG and its strict category-to-type correspondence, the polymorphism that is implicit in Dekker’s system is thus made explicit in the definition of a sentential category. It should be noted that decidability and the finite reading property are nevertheless preserved, since the complexity of the category of a sentence is always bounded by the number of indefinites and pronouns occurring in it, so we always have to consider only finitely many candidate categories for a given string. I can now give a PLA style semantics for English sentences. Two qualifications are necessary though: First, PLA employs two notions of interpretation, satisfaction and truth. I will do something similar: I will define a notion of truth on top of the standard interpretation function for λ-terms. There is thus no need for an independent notion of satisfaction here. Second, truth is a meta-notion that is defined as a property of denotations of English sentences. However, sentences may have identical denotations but different truth conditions in case they differ in their syntactic and semantic types. Truth is thus defined as a relation between a denotation and a semantic type. Since denotations depend on models and assignment functions, truth is implicitly relativized to them. Now let an interpretation function ⟦·⟧ for the typed λ-calculus (in the sense of Definition 16 on page 32) be given, and let e be a metavariable that ranges over sequences of elements of Dom(e). I write e |= α : A iff the sequence e verifies the sentence denotation α relative to the (sentential) syntactic type A.
Definition 55 (Truth)
1 e |= α : t iff α = 1
2 e |= α : A|e iff e − 1 |= αe_1 : A
3 e |= α : A^e iff ∃c ∈ Dom(e). e |= αc : A

So in short, all slots corresponding to indefinites are existentially bound, while the slots corresponding to pronouns are filled by the sequence of evaluation e. In the derivation of a sentence containing an indefinite, the type of the indefinite, np^np, is temporarily replaced by a hypothesis of type np that is discharged later. Thus the deductive behavior of indefinites is similar to that of quantifiers. Crucially, this temporary np-hypothesis can antecede subsequent pronouns. This leads to configurations where a pronoun is bound by an indefinite, as for instance in example (14). The derivation is given in Figure 6.3 on the facing page.
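The three clauses of the truth definition can be prototyped for a finite domain. In the sketch below, the semantic type t is encoded as 't', A|e as ('pro', A), and A^e as ('ind', A); denotations of the latter two are functions from individuals to A-denotations. This encoding is my own, purely for illustration.

```python
def verifies(e, alpha, A, D):
    """e |= alpha : A as in Definition 55, over a finite domain D.
    e is a tuple of individuals supplying the unresolved pronoun slots."""
    if A == 't':
        return alpha                       # clause 1: alpha must be true
    tag, B = A
    if tag == 'pro':                       # clause 2: slot filled by e_1
        return verifies(e[1:], alpha(e[0]), B, D)
    if tag == 'ind':                       # clause 3: exists c in Dom(e)
        return any(verifies(e, alpha(c), B, D) for c in D)
    raise ValueError(A)

# "Someone walked": denotation eps-x.walk'x, of type t^e
D = {1, 2}
walk = {1}
someone_walked = (lambda c: c in walk)
```

verifies((), someone_walked, ('ind', 't'), D) holds because 1 is a walker; a pronoun slot, by contrast, is filled deterministically by the first member of e.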
(14) Someone_i met his_i mother.

  someone – εx.x : np^np         (lex)
  [x : np]                        (hypothesis, i)
  his mother – mother’ : np|np   (lex)
  mother’x : np                   (|E, j: resolved to the hypothesis x)
  met – meet’ : (np\s)/np        (lex)
  meet’(mother’x) : np\s          (/E)
  meet’(mother’x)x : s            (\E)
  εx.meet’(mother’x)x : s^np      (∧, i)

  Figure 6.3. Derivation of (14) (proof tree rendered as a sequence of steps)
According to the recursive truth definition given above, the truth conditions of this sentence are computed as follows:

  e |= ⟦εx.meet’(mother’x)x⟧^g : t^e
  ⇐⇒ ∃c ∈ Dom(e). e |= ⟦εx.meet’(mother’x)x⟧^g c : t
  ⇐⇒ ∃c ∈ Dom(e). e |= ⟦meet’(mother’x)x⟧^{g[x→c]} : t
  ⇐⇒ ∃c ∈ Dom(e). ⟨c, ⟦mother’⟧(c)⟩ ∈ ⟦meet’⟧
In words, the sentence is true iff there is an individual that stands in the meeting relation to its mother. More generally, the ε-operator is finally interpreted as an existential quantifier, but its existential impact is not due to its contribution to the denotation of a term (recall that ε is just a notational variant of λ), but it is due to the truth definition. This is reminiscent of DRT, where free variables are existentially bound by default. The following fact generalizes this. Here and henceforth, I tacitly assume that the classical first order connectives ∧, ∨, ¬, ∃ and ∀ have their usual truth functional interpretation when used as part of the λ-calculus.
Fact 1 Let M be a term of type t, M a model, g an assignment function, and e a sequence of individuals. Then it holds that

  e |= ⟦εv_1 · · · εv_n M⟧^{M,g}  iff  e |= ⟦∃v_1 · · · ∃v_n M⟧^{M,g}

Proof: Immediate from the definitions.
An extension of TLG to the discourse level goes beyond the scope of this book. Therefore we cannot reproduce Dekker’s analysis of cross-sentential anaphora. The treatment of cross-clausal anaphora within one
sentence is unproblematic; it works in a manner parallel to the example given above. I simply admit that indefinites take arbitrarily wide scope (the issue of the scope of indefinites will be further discussed in Section 5); therefore indefinites can antecede any subsequent pronoun within the same sentence. Since the scopal mechanism of indefinites is formally independent of ordinary quantifier scope (i.e., of qE), a restriction of the latter to the local clause, say, would not affect this analysis of indefinites. An example of cross-clausal anaphora is given in (15) with the derivation in Figure 6.4.

(15) Someone_i walked and he_i talked.

  someone – εx.x : np^np        (lex)
  [y : np]                       (hypothesis, j)
  walked – walk’ : np\s         (lex)
  walk’y : s                     (\E)
  he – πx.x : np|np             (lex)
  y : np                         (|E, i: resolved to the hypothesis y)
  talked – talk’ : np\s         (lex)
  talk’y : s                     (\E)
  and – λpq.q ∧ p : (s\s)/s     (lex)
  λq.q ∧ talk’y : s\s           (/E)
  walk’y ∧ talk’y : s           (\E)
  εy.walk’y ∧ talk’y : s^np     (∧, j)

  Figure 6.4. Derivation of (15) (proof tree rendered as a sequence of steps)
As the reader can easily verify, the sentence is predicted to be true in this reading if and only if there is an individual that is both in the extension of walk and in the extension of talk.
4. Donkey Sentences
In Dynamic Predicate Logic, negation is externally static, and this property is inherited by all operators that are defined in terms of negation, like the universal quantifier and implication. This property is ceteris paribus inherited by Dekker’s PLA-negation. An existential quantifier that resides inside the scope of a negation cannot bind a pronoun outside the scope of this negation. This is achieved by existentially binding off
all indefinites in the scope of the negation. Negation is therefore implicitly polymorphic in PLA; its semantic clause makes reference to the length of the formula in its scope. In the type logical reformulation, this polymorphism has to be made explicit. I will define this polymorphic negation in an indirect way. First, I will introduce the auxiliary notion of static closure. This is an operation from sentential denotations to sentential denotations that neutralizes the anaphora licensing potential of indefinites without affecting the truth conditions or the anaphoric potential of the sentence. (The analogous operation in Dynamic Predicate Logic neutralizes the dynamic potential of a formula; this motivates the name.) Strictly speaking, static closure is a twofold operation: It operates both on the level of (sentential) syntactic types and on the level of denotations. As an operation on types, it simply eliminates all argument slots corresponding to indefinites. I overload the symbol “↓” by using it both for static closure on (syntactic and semantic) types and for static closure on denotations. Here and henceforth, I use the upper case letter S as a meta-variable over sentential types.
Definition 56 (Static closure)
1 ↓s = s
2 ↓t = t
3 ↓(S|e) = (↓S)|e
4 ↓(S|np) = (↓S)|np
5 ↓(S^e) = ↓S
6 ↓(S^np) = ↓S
Definition 57 (Static closure of sentential denotations)
1 ↓(α : t) = α : t
2 ↓(α : S|e) = λc.↓(αc) : (↓S)|e
3 ↓(α : S^e) = ⋁_{c ∈ Dom(e)} ↓(αc) : ↓S
Finally, we can define static closure as a syntactic operation on terms, i.e., as syntactic counterpart of the corresponding semantic operation. (The symbol “↓” is highly overloaded now; it symbolizes an operation on syntactic categories, an operation on semantic types, a functor in the term language and an operation on model-theoretic objects. The context, however, always makes clear what the intended meaning is.)
Definition 58 (Static closure of terms)
1 If M is a term of a sentential type S, then ↓M is a term of type ↓S.
2 ⟦↓M⟧ = ↓⟦M⟧

Like the global truth definition, static closure existentially binds all variables that are bound by ε, while it has no impact on variables that are bound by π. This can be made precise by the following generalization.
Fact 2 For all models M and assignment functions g, it holds that

  ⟦↓πxM⟧^{M,g} = ⟦πx↓M⟧^{M,g}
  ⟦(↓εxM) : t⟧^{M,g} = ⟦∃x↓M⟧^{M,g}

Proof: Immediate from the definitions.
The (polymorphic) negation of a sentential denotation is now easily definable as the set-theoretic complement of the static closure of this denotation.5 Formally, negation is an operation on typed Curry-Howard terms.
Definition 59 (Dekker-Negation)
1 If M is a term of type S, then ∼M is a term of type ↓S.
2 ⟦∼M : S⟧^g is the complement of ↓(⟦M⟧^g : S)

Dekker-negation performs static closure on its operand, i.e., it induces existential closure on all ε-bound variables. In the special case that its operand has type t, it coincides with classical negation.

⁵ Following the usual practice, I do not distinguish between sets and their characteristic functions. Strictly speaking, the negation of a sentence denotation α is the characteristic function of the complement of the set {x | αx = 1}. Also, I take the complement of a truth value to be the opposite truth value.
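Static closure and Dekker-negation can be prototyped over a finite domain, using an illustrative type encoding of my own ('t' for t, ('pro', A) for A|e, ('ind', A) for A^e, with functional denotations for the latter two). The domain is assumed non-empty, as Definition 52 requires.

```python
def closure_type(S):
    # the type operation ↓: drop indefinite slots, keep pronoun slots
    if S == 't':
        return 't'
    tag, A = S
    return ('pro', closure_type(A)) if tag == 'pro' else closure_type(A)

def join(a, b, T):
    # pointwise disjunction of two denotations of closed type T
    if T == 't':
        return a or b
    _, B = T
    return lambda c: join(a(c), b(c), B)

def close(alpha, S, D):
    # ↓ on denotations: existentially bind the indefinite slots
    if S == 't':
        return alpha
    tag, A = S
    if tag == 'pro':
        return lambda c: close(alpha(c), A, D)
    vals = [close(alpha(c), A, D) for c in D]   # 'ind': big join over D
    out = vals[0]
    for v in vals[1:]:
        out = join(out, v, closure_type(A))
    return out

def complement(f, T):
    # pointwise complement of a denotation of closed type T
    if T == 't':
        return not f
    _, B = T
    return lambda c: complement(f(c), B)

def neg(alpha, S, D):
    # Dekker-negation: complement of the static closure
    return complement(close(alpha, S, D), closure_type(S))

D = {1, 2}
walk = {1}
```

On type t this is just classical negation; on a type like s^np it yields a ¬∃ reading, e.g. negating the denotation of someone walked gives truth iff nobody walks.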
231
Indefinites
Fact 3 For all models M and assignment functions g it holds that

  ⟦∼↓M⟧^{M,g} = ⟦∼M⟧^{M,g}
  ⟦∼(M : t)⟧^{M,g} = ⟦¬M⟧^{M,g}
  ⟦∼πxM⟧^{M,g} = ⟦πx∼M⟧^{M,g}

Proof: The first part immediately follows from the fact that static closure on denotations is idempotent, i.e.,

  ↓(↓(α : S) : ↓S) = ↓(α : S)

To see why this is so, observe that ↓S is a type of the form t|e · · · |e for arbitrary sentential types S. It follows immediately from the definition of static closure on typed denotations that it is the identity operation if applied to denotations of such a type. The second part and the third part follow immediately from the definitions of Dekker-negation and of static closure.

Let us illustrate this notion of negation with an example. I assume the lexical entry in (16) for the negation auxiliary doesn’t. (16)
doesn’t – λx.∼x(λy.y) : q(vp/vp, S, ↓S)
This seemingly complex entry basically expresses the idea that doesn’t occupies the position of an auxiliary while its semantic impact is negation with scope over the entire clause. Now consider the following sentence. (17)
Someone doesn’t beat his donkey.
This sentence is four-way ambiguous, since the pronoun may be free or bound, and the indefinite may take narrow or wide scope with respect to negation. These four readings correspond to four different TLG-derivations.⁶ They are given in Figures 6.5–6.8. Successively applying the definitions and facts given above leads to the following derivation of the truth conditions for the reading from Figure 6.5 on the following page.
6 When the pronoun is free, a spurious ambiguity arises because the |I may be applied before or after qE, but this choice does not affect the truth conditions.
[Figures 6.5 and 6.6: the first two derivations of (17), built from the lexical entries someone – εx.x : np^np, doesn’t – λy.∼y(λz.z) : q(vp/vp, S, ↓S), beat – beat’ : (np\s)/np, and his donkey – donkey of’ : np|np. In Figure 6.5 (“First derivation of (17)”) the pronoun stays unresolved (|I, i) and the indefinite is discharged inside the scope of the negation (∧, k before qE, j), yielding ∼πwεv.beat’(donkey of’w)v : s|np. In Figure 6.6 (“Second derivation of (17)”) the pronoun stays unresolved and the indefinite is discharged after the negation has applied, yielding εv.∼πw.beat’(donkey of’w)v : (s|np)^np. The proof trees are not reproduced here.]
[Figures 6.7 and 6.8: the remaining two derivations of (17). In Figure 6.7 (“Third derivation of (17)”) the pronoun is bound by the indefinite and the indefinite is discharged inside the scope of the negation, yielding ∼εv.beat’(donkey of’v)v : s. In Figure 6.8 (“Fourth derivation of (17)”) the pronoun is bound and the indefinite takes scope over the negation, yielding εv.∼beat’(donkey of’v)v : s^np. The proof trees are not reproduced here.]
  e |= ⟦∼πwεv.beat’(donkey of’w)v⟧^g : s|np
  ⇐⇒ e |= ⟦πw∼εv.beat’(donkey of’w)v⟧^g : s|np
  ⇐⇒ e − 1 |= ⟦πw∼εv.beat’(donkey of’w)v⟧^g e_1 : s
  ⇐⇒ e − 1 |= ⟦∼εv.beat’(donkey of’w)v⟧^{g[w→e_1]} : s
  ⇐⇒ e − 1 |= ⟦∼∃v.beat’(donkey of’w)v⟧^{g[w→e_1]} : s
  ⇐⇒ e − 1 |= ⟦¬∃v.beat’(donkey of’w)v⟧^{g[w→e_1]} : s
  ⇐⇒ ¬∃c ∈ Dom(e). ⟨c, ⟦donkey of’⟧^g(e_1)⟩ ∈ ⟦beat’⟧^g
In words, the sentence is true in this reading relative to a sequence e iff the donkey of e_1 isn’t beaten by anyone. By similar calculations, we can derive that the sentence is true in the second reading relative to e iff there is someone who doesn’t beat e_1’s donkey. The third reading is true iff nobody beats his donkey, and the fourth reading is true iff there is someone who refrains from beating his donkey. So the interaction between indefinites and negation works as it is supposed to. The third essential non-classical ingredient of PLA, next to existential quantification and negation, is conjunction. I will not use it to model the semantics of the English word and here; as was shown above, a classical semantics for and is compatible with the anaphora facts, given LLC+∧. However, implication is defined in terms of negation and conjunction in PLA, and therefore we need a version of Dekker’s conjunction nevertheless. The crucial non-classical aspect of PLA-conjunction is the fact that it enables binding of pronouns in the second conjunct from indefinites (i.e., existential quantifiers) in the first conjunct. The mapping of indefinites and pronouns is determined both by the linear order of the existential quantifiers and the indices of the pronouns. The latter kind of information is absent from our type logical reformulation of PLA. On the other hand, unresolved pronouns have scope in LLC+∧, and the scopal order of pronouns can be used to manage the mapping between pronouns in the second conjunct and their binders in the first conjunct. The basic idea is that the first indefinite in the first conjunct binds the first pronoun (i.e., the pronoun with widest scope) in the second conjunct, the second indefinite binds the second pronoun and so forth. Pronoun slots that are not bound in this way are inherited by the conjunction as a whole. Likewise, pronoun slots from the first conjunct are inherited by the conjunction as a whole.
Finally, all slots corresponding to indefinites are inherited by the conjunction as a whole as well. These considerations lead to the following (re-)definition of Dekker’s conjunction. Analogously to negation, I first define an operation on types (static closure in the case of negation) before I give the corresponding definition for sentential meanings.

235
Indefinites

Both indefinite slots and pronoun slots from either conjunct are inherited by the conjunction as a whole, with the single exception of clause 3. If an indefinite slot in the first conjunct is matched by a pronoun slot in the second conjunct, the indefinite binds the pronoun, and thus only the indefinite slot is inherited by the conjunction as a whole.
Definition 60 (Dekker-conjunction on terms) Let S1 and S2 be sentential types. If M : S1 and N : S2 are terms, then S1 & S2 is defined as follows:

t & S2 = S2
S1|e & S2 = (S1 & S2)|e
S1^e & S2 = (S1 & S2 − 1)^e

where

t − 1 = t
S|e − 1 = S
S^e − 1 = (S − 1)^e
Like static closure, conjunction of types can be defined on syntactic categories as well. The definition runs completely analogously:
Definition 61

s & S2 = S2
S1|np & S2 = (S1 & S2)|np
S1^np & S2 = (S1 & S2 − 1)^np

where

s − 1 = s
S|np − 1 = S
S^np − 1 = (S − 1)^np
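Definitions 60 and 61 are simple recursions on types and can be checked mechanically. The following toy Python sketch is my own encoding, not part of the book’s formal apparatus: sentential types are nested tuples, with 't' for the base type, ('pro', S) for S|e, and ('ind', S) for S^e.

```python
# Toy encoding of sentential types: 't' is the base type,
# ('pro', S) stands for S|e (a pronoun slot),
# ('ind', S) stands for S^e (an indefinite slot).

def minus1(s):
    """S - 1: remove the widest-scope pronoun slot (if there is one)."""
    if s == 't':
        return 't'                      # t - 1 = t
    tag, rest = s
    if tag == 'pro':
        return rest                     # S|e - 1 = S
    return ('ind', minus1(rest))        # S^e - 1 = (S - 1)^e

def conj(s1, s2):
    """S1 & S2 per Definition 60."""
    if s1 == 't':
        return s2                       # t & S2 = S2
    tag, rest = s1
    if tag == 'pro':
        return ('pro', conj(rest, s2))  # S1|e & S2 = (S1 & S2)|e
    # an indefinite slot in S1 binds the widest pronoun slot of the result:
    return ('ind', minus1(conj(rest, s2)))  # S1^e & S2 = (S1 & S2 - 1)^e

# "someone walks" : t^e  conjoined with  "he talks" : t|e
print(conj(('ind', 't'), ('pro', 't')))  # -> ('ind', 't'): the pronoun is bound
```

Definition 61 is the same recursion with s and np in place of t and e, so a single implementation covers both.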
Based on this syntactic notion, we can give a recursive definition of Dekker-style conjunction as an operation on typed sentential denotations. (If both conjuncts have type t, Dekker-conjunction coincides with Boolean conjunction. The two-place operation min captures this; it takes two truth values as arguments and returns the smaller of them.)
Definition 62 (Interpretation of Dekker-conjunction)

⟦M & N⟧_g = ⟦M⟧_g & ⟦N⟧_g
236
ANAPHORA AND TYPE LOGICAL GRAMMAR
where

1. (α : t) & (β : t) = min(α, β) : t
2. (α : t) & (β : S|e) = λc.α & (βc) : S|e
3. (α : t) & (β : S^e) = λc.α & (βc) : S^e
4. (α : S1|e) & (β : S2) = λc.(αc) & β : (S1 & S2)|e
5. (α : S1^e) & (β : S2) = λc.(αc) & ((β : S2) + c) : (S1 & S2 − 1)^e

where

6. (β : t) + c = β
7. (β : S|e) + c = (βc)
8. (β : S^e) + c = λd.(((βd) : S) + c)

As with the other counterparts of the PLA-connectives, there are some useful facts about the properties of Dekker-conjunction and its interaction with the other connectives.
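Before stating them, the recursion of Definition 62 can be illustrated with a toy Python sketch (my own encoding over a finite domain, not part of the book’s formal system): a denotation is either ('t', b) for a truth value or ('pro', f) / ('ind', f) for a function f from entities to denotations, and truth existentially closes the indefinite slots.

```python
# Toy denotations: ('t', b) for a truth value, ('pro', f) for a pronoun slot,
# ('ind', f) for an indefinite slot, with f mapping entities to denotations.

def plus(beta, c):
    """beta + c: feed the witness c into beta's widest pronoun slot (cl. 6-8)."""
    tag = beta[0]
    if tag == 't':
        return beta                                # (beta : t) + c = beta
    if tag == 'pro':
        return beta[1](c)                          # (beta : S|e) + c = beta(c)
    return ('ind', lambda d: plus(beta[1](d), c))  # pass c under an ind. slot

def conj(a, b):
    """Dekker-conjunction of denotations (clauses 1-5 of Definition 62)."""
    if a[0] == 't':
        if b[0] == 't':
            return ('t', min(a[1], b[1]))          # clause 1: Boolean case
        return (b[0], lambda c: conj(a, b[1](c)))  # clauses 2-3: inherit slot
    if a[0] == 'pro':
        return ('pro', lambda c: conj(a[1](c), b))        # clause 4
    return ('ind', lambda c: conj(a[1](c), plus(b, c)))   # clause 5: binding

def true_in(d, domain):
    """Truth for pronoun-free denotations: close indefinite slots existentially."""
    if d[0] == 't':
        return d[1]
    assert d[0] == 'ind'
    return any(true_in(d[1](c), domain) for c in domain)

walk, talk = {'a', 'b'}, {'b'}
someone_walks = ('ind', lambda c: ('t', c in walk))   # epsilon x . walk'x
he_talks      = ('pro', lambda c: ('t', c in talk))   # pi y . talk'y
print(true_in(conj(someone_walks, he_talks), {'a', 'b'}))  # -> True ('b' walks and talks)
```

The sketch simplifies the type bookkeeping of clause 5: the operation plus already consumes the bound pronoun slot at the level of denotations, so no separate "− 1" step is needed.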
Fact 4 For all models M and assignment functions g, it holds that

1. ⟦M : t & N : t⟧_{M,g} = ⟦M ∧ N⟧_{M,g}
2. ⟦(πxM) & N⟧_{M,g} = ⟦πx(M & N)⟧_{M,g}, provided x is not free in N
3. ⟦εxM & πyN⟧_{M,g} = ⟦εx(M & N[x/y])⟧_{M,g}, provided x is not free in N and y is free for x in N
4. ⟦M : t & εxN⟧_{M,g} = ⟦εx(M & N)⟧_{M,g}, provided x is not free in M

Proof: Immediately from the definitions.
The third clause is especially noteworthy here, since it directly corresponds to dynamic binding in Dynamic Predicate Logic and its counterpart in PLA. We are now ready to put the pieces together and to implement the PLA analysis of donkey sentences in TLG. I start with conditional donkey sentences. Consider example (18).
(18)
If someone walks, he talks.
The only missing building block is the lexical entry for the complementizer if. Restricting attention to if-clauses in topicalized position for simplicity, I assume the following entry:

(19)
if – λpq. ∼ (p & ∼ q) : (↓ (S1 & ↓ S2 ))/S2 /S1
To improve readability, I introduce another abbreviational convention:

M → N =def ∼ (M & ∼ N)

The lexical entry for if thus becomes

(20)
if – λpq.p → q : (↓ (S1 & ↓ S2 ))/S2 /S1
The semantic label is a direct translation of the PLA-treatment of implication. Since the implicit polymorphism of conjunction and negation in PLA becomes explicit in TLG, the syntactic category (and thus the semantic type) of if is polymorphic. Its specific instantiation depends on the number of donkey pronouns occurring in its scope. In the donkey reading of (18), the type of if comes out as s/s|np/s^np. The syntactic derivation is given in Figure 6.9 (where I skip over the composition of the two clauses, since these are analogous to previous examples).

[Figure 6.9. Derivation of the donkey reading of (18)]
if – λpq.p → q : ↓(S1 & ↓S2)/S2/S1 (lex)
someone walks – εxwalk’x : s^np
if someone walks – λq.εxwalk’x → q : ↓(s^np & ↓S2)/S2 (/E)
he talks – πytalk’y : s|np
if someone walks, he talks – εxwalk’x → πytalk’y : s (/E)
The semantic representation of (18) thus comes out as

εxwalk’x → πytalk’y

According to the truth definition (Definition 55 on page 226), this term (and thus sentence (18)) is true with respect to a sequence e and an assignment g iff

⟦εxwalk’x → πytalk’y⟧_g = 1
Expanding the abbreviation for → gives us

⟦∼ (εxwalk’x & ∼ πytalk’y)⟧_g = 1

According to Fact 3, this can be rewritten as

⟦∼ (εxwalk’x & πy ∼ talk’y)⟧_g = 1

Making use of Facts 3 and 4, we get

⟦∼ εx(walk’x & ¬talk’x)⟧_g = 1

Fact 3 gives us

⟦∼ ↓ εx(walk’x & ¬talk’x)⟧_g = 1

and thus, according to Fact 2, we have

⟦¬∃x(walk’x ∧ ¬talk’x)⟧_g = 1

According to the mundane semantics of first order logic, this is true iff every walking individual is also a talking individual.

Now let us consider the classical donkey pattern, where two indefinites in the if-clause bind one pronoun each in the main clause.

(21)
If someone owns something, he beats it.
The if-clause is (spuriously) ambiguous, depending on whether the subject or the object receives wide scope. In either case the syntactic category of the clause is (s^np)^np, but the semantic representations differ. They are given in (22a) and (22b).

(22)
a. εxεy.own’xy : (t^e)^e
b. εyεx.own’xy : (t^e)^e
The main clause is ambiguous in a similar way: either the subject pronoun or the object pronoun may take wide scope. In either case the category of the clause is (s|np)|np, and the two semantic representations are (23)
a. πxπy.beat’xy : (t|e)|e
b. πyπx.beat’xy : (t|e)|e
So there are four ways to derive the final category s for the whole sentence. The accompanying Curry-Howard terms are
(24)
a. εxεy.own’xy → πxπy.beat’xy
b. εyεx.own’xy → πxπy.beat’xy
c. εxεy.own’xy → πyπx.beat’xy
d. εyεx.own’xy → πyπx.beat’xy
After a series of calculations that are similar to those of the previous example (but somewhat more complex), we end up with the truth conditions

∀c∀d(⟨c, d⟩ ∈ ⟦own’⟧_g → ⟨c, d⟩ ∈ ⟦beat’⟧_g)

both for (24a) and (24d). Likewise, we obtain the truth conditions

∀c∀d(⟨c, d⟩ ∈ ⟦own’⟧_g → ⟨d, c⟩ ∈ ⟦beat’⟧_g)

for (24b,c). (This reading does not exist because it would involve a gender clash.) This example illustrates that the structural ambiguity between nesting and crossing is dealt with in the syntax-semantics interface.

What if both pronouns (or, more generally, more than one pronoun) are dynamically bound by the same indefinite? Such a reading would lead both to a gender clash and to a binding deviance in the previous example, but good examples are easily constructed, as for instance

(25)
If something bothers someone_i, he_i turns his_i head.
There are two aspects of the corresponding derivation that are noteworthy. First, since it is the object of the if-clause that binds the pronouns in the main clause, the object has to receive wide scope in the derivation of this clause. Thus, its semantic representation has to come out as

(26)
εyεx.bother’yx
Second, the coreference between the subject pronoun and the object pronoun in the main clause has to be dealt with in the syntax. This can be achieved by using an instance of the |I rule where n = 2. The derivation is given in Figure 6.10 on the following page. The type of if is thus instantiated as s/s|np/((s^np)^np), and the semantic representation of (25) comes out as

(27)
εyεx.bother’yx → πz.turn’(head’z)z
This formula is true with respect to a sequence if everybody who is bothered by something turns his head.
[Figure 6.10. Derivation of the main clause of (25)]
he – πx.x : np|np, instantiated by hypothesis [z : np]^i (lex)
his head – head’ : np|np applied to the same hypothesis [z : np]^i: head’z : np
turns – turn’ : (np\s)/np (lex)
turns his head – turn’(head’z) : np\s (/E)
he turns his head – turn’(head’z)z : s (\E)
πz.turn’(head’z)z : s|np (|I, i, withdrawing both occurrences of z)
Finally, the question arises how to treat a configuration where the if-clause contains indefinites and the main clause contains a pronoun, but the pronoun remains free or is bound by some operator in a superordinate position. I only consider the former case; the latter is analogous. An example is given in (28).

(28)

If something_i happens, he_j will resign.
The crucial aspect in the derivation of this kind of example is that the composition of the if -clause with the main clause is done prior to the application of |I which binds the anaphora slot that comes from the pronoun. So again the relevant binding (or better: non-binding) pattern is dealt with in the syntax. The basic structure of this derivation is sketched in Figure 6.11 on the next page. The semantic representation of the sentence (28) in the intended reading is thus (29)
πy.εz(happen’z) → resign’y
This formula is true with respect to a sequence e iff either nothing happens or e1 will resign. So the slot corresponding to the pronoun remains free at the sentence level and is filled by means of the extra-linguistic context.

[Figure 6.11. Derivation of (28)]
something – εx.x : np^np (lex)
happens – happen’ : np\s (lex)
something happens – happen’z : s with hypothesis [z : np]^j (\E); εz.happen’z : s^np (∧, j)
if – λpq.p → q (lex)
if something happens – λq.εz(happen’z) → q : s/s (/E)
he – πx.x : np|np, instantiated by hypothesis [y : np]^i (lex)
will resign – resign’ : np\s (lex)
he will resign – resign’y : s (\E)
if something happens, he will resign – εz(happen’z) → resign’y : s (/E)
πy.εz(happen’z) → resign’y : s|np (|I, i)

Let us now turn our attention to the other variety of donkey sentences, where an indefinite inside the restrictor of a quantifier binds a pronoun outside its scope. A simple example is

(30)

Every farmer who owns something beats it.
PLA follows DRT and Dynamic Semantics in the assumption that this kind of construction requires the same analysis as conditional donkey sentences. Technical difficulties arise if one attempts to follow this strategy too closely in TLG, and I will thus deviate from it somewhat. It turns out, however, that this decision is linguistically well-motivated. Consider the semantics of the common noun phrase farmer who owns something in the example above. Its Curry-Howard term is (31)
(εxλy.farmer’y ∧ own’xy)
So the denotation of this phrase is a (curried) binary relation between individuals. The two argument places are of different type, though. The first argument place, which corresponds to εx, is introduced by an indefinite and creates a type A^e for some type A. The second argument place (corresponding to λy) is rooted in the semantic type of common nouns and leads to a type of the form ⟨e, A⟩. A similar distinction between argument places can be made for the VP beats it in the above example (which denotes a binary relation as well). Its semantic representation is

(32)
πxbeat’x
The outermost argument place corresponds to a pronoun and creates a type of the form A|e, while the second argument place (which is left implicit in the λ-term) corresponds to the subject position of the verb and creates a type of the form ⟨e, A⟩. In the sequel, I will call argument positions that create types of the form ⟨A, B⟩ structural, while argument positions that correspond to indefinites or anaphors (i.e., argument positions that create types of the form A|B or A^B) will be called non-structural. Notationally, structural argument positions are marked with λ, and non-structural ones with π or ε. This distinction is important for the analysis of quantificational donkey sentences because the determiner every in (30) binds the highest structural argument place both of its restrictor and of its scope, no matter how many non-structural argument places these items may have. To formalize this analysis, I introduce the operation of structural function application. Applying a function f structurally to an argument x means that x fills the first structural argument place of f, while all non-structural argument places are passed on to the result of the operation. I will write f{x} for the result of applying f structurally to x. The operation is defined recursively as follows:
Definition 63 (Structural function application)

1. (M : ⟨B, A⟩){N : B} = (M N) : A
2. (M : A|B){N : C} = πx.(M x){N} : D|B (where D is the type of (M x){N})
3. (M : A^B){N : C} = εx.(M x){N} : D^B (where D is the type of (M x){N})

Note that structural function application is a partial operation. If M does not have structural arguments, or if the highest structural argument of M has another type than N, then M{N} is undefined.

The intuitive motivation for introducing structural function application is the insight that determiners only bind the highest structural argument both of their restrictor and of their scope. This is at odds with the DRT/PLA analysis, which assumes that determiners unselectively bind both structural and non-structural arguments. As discussed above, unselective binding only predicts one of two readings for examples with every, and it predicts an entirely wrong reading for other determiners. The empirically correct generalization is that structural arguments are bound by the determiner while all other arguments are either bound universally or existentially (which leads to strong and weak readings of donkey sentences, respectively). I assume that determiners are systematically lexically ambiguous between a weak and a strong reading. These ideas are formalized in the following two lexical entries for the determiner every:
(33)
a. every_weak – λPQ.∀x(↓P{x} → ↓(P{x} & Q{x})) : q(np, S, s)/N
b. every_strong – λPQ.∀x(↓P{x} → (P{x} → Q{x})) : q(np, S, s)/N
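The operation {·} used in these entries is the structural application of Definition 63. At the level of types, its recursion can be sketched in toy Python (my own encoding, not from the book): structural arguments are ('fun', B, A) for ⟨B, A⟩, while ('pro', T) and ('ind', T) stand for pronoun and indefinite slots that are passed on to the result.

```python
# Toy types: ('fun', B, A) encodes <B, A>; ('pro', T) and ('ind', T) encode
# non-structural slots T|e and T^e (the slot's argument type is left implicit).

def struct_app(f_type, arg_type):
    """Type of f{x} per Definition 63: fill the first *structural* argument,
    passing all non-structural (pronoun/indefinite) slots on to the result."""
    tag = f_type[0]
    if tag == 'fun':
        if f_type[1] != arg_type:                 # partiality of Definition 63
            raise TypeError('highest structural argument has the wrong type')
        return f_type[2]
    if tag in ('pro', 'ind'):                     # clauses 2-3: keep the slot
        return (tag, struct_app(f_type[1], arg_type))
    raise TypeError('no structural argument left')

# "beats it": a pronoun slot over a structural e-argument, i.e. (<e, t>)|e
beats_it = ('pro', ('fun', 'e', 't'))
print(struct_app(beats_it, 'e'))   # -> ('pro', 't'), i.e. t|e
```

Filling the structural subject slot of beats it thus leaves its pronoun slot intact, exactly as the determiner entries in (33) require.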
Here, I use N as a metavariable over categories of common noun phrases that contain an arbitrary number of pronouns and indefinites. The motivation for the semantic representations for the weak and the strong readings will become clear as we go along. First, consider the syntactic category of every, q(np, S, s)/N . Both the restrictor and the scope of the determiner are assumed to be polymorphic, i.e., they may contain an arbitrary number of indefinites and pronouns. Combining every with its restrictor yields a quantifier that occupies an np-position. Scoping this quantifier has the effect of static closure; the result of scoping is a clause with category s. (Of course the scope of the quantifier may contain pronouns that are free or bound from the outside. These have to be scoped after scoping the quantifier. Likewise, specific indefinites in the restrictor or the scope of every are scoped after scoping the quantifier headed by every.) Consider the weak reading of (30). The derivation (which is structurally identical for the weak and the strong reading) is schematically given in Figure 6.12 on the following page, and it leads to the semantic representation in (34). (34)
∀z(↓ εx(farmer’z ∧ own’xz) → ↓ (εx(farmer’z ∧ own’xz) & πu.beat’uz))
Some elementary manipulations using the equivalences from the facts stated above lead to the equivalent term (35)
∀z(∃x(farmer’z ∧ own’xz) → ∃x(farmer’z ∧ own’xz ∧ beat’xz))
This first order formula can further be simplified to (36)
∀z(farmer’z ∧ ∃xown’xz → ∃x(own’xz ∧ beat’xz))
So in its weak reading, (30) is true iff every farmer who owns something beats something that he owns. Now consider the strong reading that arises if we use the second lexical entry for every. The syntactic derivation is identical to the weak reading, and replacing the first lexical entry for every by the second leads to the semantic representation
[Figure 6.12. Derivation for (30)]
every – λPQ.∀z(↓P{z} → ↓(P{z} & Q{z})) : q(np, S, s)/N (lex)
farmer who owns something – εxλy.farmer’y ∧ own’xy : n^np
every farmer who owns something – λQ.∀z(↓εx.farmer’z ∧ own’xz → ↓(εx.(farmer’z ∧ own’xz) & Q{z})) : q(np, S, s) (/E)
beats it – πx.beat’x : (np\s)|np; with pronoun hypothesis [u : np]^j: beat’u : np\s
with quantifier hypothesis [w : np]^i: beat’uw : s (\E)
πu.beat’uw : s|np (|I, j)
∀z(↓εx.farmer’z ∧ own’xz → ↓(εx.(farmer’z ∧ own’xz) & (πu.beat’uz))) : s (qE, i)

(37)

∀z(↓εx(farmer’z ∧ own’xz) → ↓(εx(farmer’z ∧ own’xz) → πu.beat’uz))
According to the abbreviational convention for →, this is shorthand for
(38)
∀z(↓ εx(farmer’z ∧ own’xz) → ↓∼ (εx(farmer’z ∧ own’xz) & ∼ πu.beat’uz))
Here we can commute Dekker-negation and πu and perform dynamic binding. This leads to (39)
∀z(↓ εx(farmer’z ∧ own’xz) → ↓∼ (εx(farmer’z ∧ own’xz & ∼ beat’xz)))
Further elementary manipulations yield the equivalent first order formula (40)
∀z(∃x(farmer’z ∧ own’xz) → ¬∃x(farmer’z ∧ own’xz ∧ ¬beat’xz))
Due to the laws of first order logic, this is equivalent to (41)
∀z∀x(farmer’z ∧ own’xz → beat’xz)
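The difference between the weak truth conditions in (36) and the strong truth conditions in (41) can be checked on a small model. The following toy Python sketch (my own illustration; the model and the relation encoding are assumptions, not from the book) evaluates both first-order formulas directly.

```python
# Toy model for (30) "Every farmer who owns something beats it":
# farmer f1 owns d1 and d2 but beats only d1; farmer f2 owns nothing.
entities = {'f1', 'f2', 'd1', 'd2'}
farmer = {'f1', 'f2'}
own = {('d1', 'f1'), ('d2', 'f1')}   # pairs (object, subject), as in own'xz
beat = {('d1', 'f1')}

def owns_something(z):
    return any((x, z) in own for x in entities)

# (36), weak: Az(farmer'z & Ex own'xz -> Ex(own'xz & beat'xz))
weak = all(not (z in farmer and owns_something(z))
           or any((x, z) in own and (x, z) in beat for x in entities)
           for z in entities)

# (41), strong: AzAx(farmer'z & own'xz -> beat'xz)
strong = all(not (z in farmer and (x, z) in own) or (x, z) in beat
             for z in entities for x in entities)

print(weak, strong)   # -> True False: f1 beats one of his donkeys, not both
```

A farmer who beats only some of what he owns verifies the weak reading but falsifies the strong one, which is exactly the intended contrast.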
So (30) is true in its strong reading iff every farmer beats everything that he owns.

I conclude the discussion of quantificational donkey sentences with the remark that this treatment is not confined to the first order definable determiner every. It easily extends to a general scheme for all determiners. If det is a determiner, I assume two lexical entries for it, namely (where det’ is the generalized determiner, i.e., the relation between sets that corresponds to det):

(42)
a. det_weak – λPQ.det’(λx↓P{x})(λx↓(P{x} & Q{x})) : q(np, S, s)/N
b. det_strong – λPQ.det’(λx↓P{x})(λx↓(P{x} → Q{x})) : q(np, S, s)/N

5. Indefinites and Scope
In the previous two sections, it was demonstrated how the binding patterns from PLA can be implemented in a type logical setting. The discussion however was confined to indefinite NPs with a trivial descriptive content. In this section, I will extend the framework to indefinites with arbitrary content. Let us start with a brief discussion of the issue of the scope of indefinites in general. Since the work of Fodor and Sag, 1982 it has been generally known that indefinites differ remarkably from genuine quantifiers with respect to their scopal behavior. While the scope of a quantifier like every movie is usually restricted to its local clause, the scope of indefinites is basically unbounded. This is illustrated by the following minimal pair.
(43)
a. Some girl will be happy if every movie is shown. [∃ > ∀] *[∀ > ∃]
b. Every girl will be happy if some movie is shown. [∃ > ∀] [∀ > ∃]
Fodor and Sag, 1982 suggest that indefinites are ambiguous between a quantificational and a referential (= specific) reading. This predicts though that indefinites take either local or global scope. The existence of intermediate readings has been established by several authors, however, notably by Farkas, 1981 and by Abusch, 1994. The following two examples are taken from Kratzer, 1998 (they are slight modifications of examples from Abusch, 1994): (44)
a. Every professor rewarded every student who read some book he had recommended. [∀ > ∃ > ∀]
b. Every one of them moved to Stuttgart because some woman lived there. [∀ > ∃ > because]
The conclusion that has to be drawn from the investigations of the authors mentioned and others is that the scope of indefinites is structurally unrestricted, even if local and global scope readings might be preferred pragmatically. The sharp contrast between indefinites and other quantifiers suggests that different mechanisms are at work here. Existential closure in the sense of DRT is an obvious candidate for a mechanism to assign scope to indefinites. It leads to mispredictions though if the indefinite has a non-trivial descriptive content. The following example from Reinhart, 1995 illustrates this point. (45)
a. If we invite some philosopher, Max will be offended.
b. ∃x((philosopher’x ∧ invite’xwe’) → offended’max’)
Analysing (45a) in a DRT-style way without employing any further scoping mechanisms leads to a semantic representation like (45b) for the specific reading of (45a), where the existential impact of the indefinite some philosopher takes wide scope, while the descriptive content remains in the antecedent of the conditional. As already observed in Heim, 1982 for a parallel example, (45b) does not represent the truth conditions of the specific reading of (45a). The former is true if there is even one non-philosopher, while (45a) in the wide-scope reading requires the existence of a philosopher x with the property that Max will be offended if we invite x. Since the existence of the non-philosopher Donald Duck is sufficient to verify (45b) but not (45a), this problem is sometimes called the Donald Duck problem in the literature.
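The mismatch is easy to verify on a toy model. The following Python sketch (my own illustration; the model is an assumption chosen to make the intended reading false) evaluates (45b) against a naive wide-scope existential paraphrase of the specific reading.

```python
# Toy model: the only philosopher (kant) is invited and Max is not offended,
# so the intended specific reading of (45a) should come out false; yet the
# representation (45b) is verified by the non-philosopher donald.
entities = {'donald', 'kant'}
philosopher = {'kant'}
invited = {'kant'}
offended_max = False

# (45b): Ex((philosopher'x & invite'x we') -> offended'max')
formula_45b = any(not (x in philosopher and x in invited) or offended_max
                  for x in entities)

# Intended wide-scope reading: Ex(philosopher'x & (invite'x we' -> offended'max'))
wide_scope = any(x in philosopher and (x not in invited or offended_max)
                 for x in entities)

print(formula_45b, wide_scope)   # -> True False: 'donald' verifies (45b)
```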
To overcome this and related problems, several authors have proposed to employ choice functions for the analysis of indefinites (see for instance Reinhart, 1992, Reinhart, 1995, Reinhart, 1997, Kratzer, 1998, Winter, 1997). To cut a long story short, according to these theories, the semantic counterpart of an indefinite determiner is a variable over a choice function, i.e., a function that maps non-empty sets to one of their elements. This variable is subject to existential closure in a way akin to the treatment of free individual variables in DRT. (45a) would therefore come out as (46)
∃f (CH(f ) ∧ (invite’f (philosopher’)we’ → offended’max’))
The extension of the predicate constant CH is the set of choice functions of type ⟨⟨e, t⟩, e⟩, i.e.,

∀f(CH(f) ↔ ∀P(∃xPx → P(fP)))

(46) in fact represents the truth conditions of (45a) in an adequate way. Generally speaking, the choice function approach solves two problems in one stroke. First, since it uses unselective binding to assign scope to indefinites, it covers the fact that the scope of indefinites is structurally unrestricted. Second, the choice function mechanism makes sure that the existential impact of an indefinite is not unduly divorced from its descriptive content. On the other hand, the choice function approach faces at least two serious problems. First, what happens if the extension of the descriptive content of an indefinite is empty? Consider a slight variation of (45a):
If we invite some Polish friend of mine, Max will be offended.
If the indefinite some Polish friend of mine receives a specific reading, the sentence can be paraphrased as There is a certain Polish friend of mine, and if we invite him, Max will be offended. Suppose I don’t have any Polish friends. In this scenario the sentence is false in the relevant reading. The choice function approach as such does not supply clear truth conditions in this case, since the argument of the choice function f in the term f (polish friend of mine’) denotes the empty set. Both Reinhart and Winter suggest mechanisms that make the smallest clause containing such a term false. This works fine for simple sentences such as (48)
We invited some Polish friend of mine.
This sentence is in fact false if there are no Polish friends of mine. This very fact would make (47) true though, while the sentence should come out as false. Let us call this problem the empty set problem. The second problem arises if the descriptive content of an indefinite contains a pronoun that is bound by some superordinate quantifier. The following example (from Abusch, 1994) can serve to elaborate this point. (49)
Every professor_i rewarded every student who read some book he_i had recommended.
According to the choice function approach, the sentence should have a reading which can be represented as (50)
∃f(CH(f) ∧ ∀x(professor’x → ∀y(student’y ∧ read’(f(λz.book’z ∧ recommend’zx))y → reward’yx)))
Suppose two professors, a and b, happened to recommend exactly the same books to their students. Then the expressions λz.book’z ∧ recommend’za and λz.book’z ∧ recommend’zb denote the same set, and thus the terms f(λz.book’z ∧ recommend’za) and f(λz.book’z ∧ recommend’zb) denote the same individual. So the reading that is described in (50) can be paraphrased as follows: Every professor has a favorite book. He recommends this book (possibly along with other books), and he rewards students that read his favorite book. Furthermore, if two professors recommend the same books, they have the same favorite book. The last condition is entirely unnatural, and the sentence has no such reading. This problem is discussed among others by Kratzer, 1998 (who attributes the observation to Kai von Fintel and P. Casalegno) and Winter, 1997. Again, the literature contains several proposals regarding how to circumvent this kind of overgeneration, but so far no suggested solution is really satisfactory. I will call this problem the bound pronoun problem.
These and a variety of other problems for the choice function approach have been pointed out by several authors, see for instance Reniers, 1997, Geurts, 2000 and Endriss, 2001. The works mentioned also contain discussions of the possible solutions of these problems, and arguments why they are not fully satisfactory. The conclusion that has to be drawn from the discussion in the literature so far is that the choice function approach is not a viable alternative to a quantificational treatment of indefinites. So an adequate theory of indefinites should avoid the problems that were discussed in this section so far. At the same time, such a theory should take the fact into account that the scope taking behavior of indefinites is virtually unrestricted, and it should be able to deal with the peculiar pronoun binding abilities of indefinites which were the subject of the previous two sections. The theory I sketched there already meets the first two requirements. The scope of indefinites is syntactically handled by the rule ∧ , and its applicability is determined by the scope of static closure. In other words, an indefinite is predicted to take scope either over an entire sentence or over a constituent that is subject to static closure. If we assume that clause embedding operators like verbs of propositional attitude etc. apply static closure to their arguments, this generalization is empirically correct. The scope of ordinary quantifiers on the other hand is syntactically modeled by means of the rule qE in our type logical framework. While I only gave a formulation of this rule that does not take domains of its applicability into account, the clause boundedness of quantifier scope can easily be modeled by means of multimodal techniques, as for instance Morrill, 1994 demonstrates. 
While multimodality goes beyond the scope of this book, it is important to point out that the scoping mechanisms for indefinites and for quantifiers are independent from each other in our framework, and it is thus not surprising that they are subject to different constraints. In the previous two sections, I demonstrated how the “dynamic” binding abilities of indefinites can be modeled in TLG. It remains to be shown how this framework can take the descriptive content of indefinites into account, thereby avoiding the Donald Duck problem, the empty set problem, and the bound pronoun problem. The basic idea underlying my proposal can be sketched as follows. Recall that I assumed that an indefinite like something denotes the identity function over individuals. Let us extend this idea to other indefinites. For some philosopher, I assume that it denotes the identity function as well, but the domain of this function is confined to the set of philosophers. The application of this function to non-philosophers is not defined. This function combines with its linguistic environment in the
manner discussed above for indefinites with trivial descriptive content. A sentence like (51)
John invited some philosopher.
will denote a partial function f from individuals to truth values. Applying f to a philosopher that was invited by John yields the value 1. Applying f to a philosopher that was not invited by John yields the value 0. The application of f to non-philosophers is not defined. The sentence is true with respect to a sequence e iff there are individuals c such that applying f to c yields the value 1. This is the case iff there is at least one philosopher that was invited by John. Likewise, the static closure of the denotation of (51) is 1 iff there is a c such that f c = 1 and 0 otherwise. So the descriptive content of an indefinite is interpreted as a domain restriction for the argument place that corresponds to this indefinite. Existentially closing such an argument place has the effect of asserting the existence of an element of this domain. This makes sure that the descriptive content of an indefinite always has the same scope as its existential impact. Thus we avoid the Donald Duck problem. If this domain happens to be empty, both static closure and the global truth definition lead to falsehood, so there is no empty set problem either. Finally, since the descriptive content of an indefinite always has the same scope as its existential impact, a wide scope reading for the indefinite in (49) is excluded, since then the pronoun could not be bound by the quantifier. So the bound pronoun problem does not arise either.7 To formalize this idea, I extend the term language. ε-abstraction now optionally comes with an explicit domain of the function that is created. I thus add the following clauses to the syntax and the semantics of the term language respectively:
Definition 64

1. If x is a variable of type A, M is a term of type t, and N is a term of type B, then εx_M N is a term of type B^A.
2. ⟦εx_M N⟧_{M,g} = {⟨c, ⟦N⟧_{M,g[x→c]}⟩ | c ∈ Dom(A) ∧ ⟦M⟧_{M,g[x→c]} = 1}

7 A very similar analysis could probably be carried out within an unselective binding framework if the descriptive content of indefinites is analyzed as a restriction on the corresponding variables, and if restricted variables are assumed to be undefined if their value does not obey their restriction. Farkas, 1999 points out that the Donald Duck problem can be avoided by using restricted variables, but she does not develop a semantics with a partial interpretation function.
So the denotation of εx_M N is similar to the denotation of λxN, except that the domain of this function is restricted to the extension of λxM. The truth definition and the definition of semantic closure on denotations have to be adjusted accordingly. Basically, while existential closure of ε-arguments previously involved existential quantification over the whole domain of the type of the bound variable, we now only quantify over the extension of the restriction.
Definition 65 (Truth)

1. e |= α : t iff α = 1
2. e |= α : A|e iff e − 1 |= α(e1) : A
3. e |= α : A^e iff ∃c ∈ Dom(α) : e |= αc : A
Definition 66 (Static closure of sentential denotations)

1. ↓(α : t) = α : t
2. ↓(α : S|e) = λc.↓(αc) : ↓S|e
3. ↓(α : S^e) = {(↓(αc) : ↓S) | c ∈ Dom(α)}

There are some noteworthy facts concerning the behavior of restricted abstraction.
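The core of Definitions 64–66 can be sketched in toy Python (my own finite-model encoding, not from the book): a restricted ε-abstract is a partial function, modeled here as a dict whose keys are exactly the elements satisfying the restriction, and truth existentially closes over that restricted domain.

```python
# Restricted epsilon-abstraction (Definition 64) as a partial function:
# [[eps x_M N]] maps exactly those c satisfying M to the value of N at c.

def eps(restriction, body, universe):
    return {c: body(c) for c in universe if restriction(c)}

def true_e(alpha):
    """Clause 3 of Definition 65 for a denotation of type t^e: true iff
    some element of the (restricted) domain yields the value 1/True."""
    return any(alpha[c] for c in alpha)

universe = {'plato', 'rex'}
philosopher = {'plato'}
invited_by_john = {'plato'}

# (53) "John invited some philosopher": eps y_{philosopher'y} . invite'y john'
d = eps(lambda y: y in philosopher, lambda y: y in invited_by_john, universe)
print(true_e(d))   # -> True: 'plato' is in the restricted domain and invited

# An empty restriction (say, no Polish friends of mine): existential closure
# quantifies over the empty domain and simply comes out false.
d_empty = eps(lambda y: False, lambda y: True, universe)
print(true_e(d_empty))   # -> False: the empty set problem does not arise
```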
Fact 5

1. e |= εx1_{M1} · · · εxn_{Mn} N : t iff e |= ∃x1.M1 ∧ · · · ∧ ∃xn.Mn ∧ N
2. (↓ εx_M N) : t = ∃x(M ∧ ↓N)
3. εx M[((εz_N O)x)/y] = εx_{N[x/z]} M[(O[x/z])/y], provided x is free for z in O and N, and no variable in N that is bound on the left-hand side is free on the right-hand side

Proof: Immediate from the definitions.
The first two parts simply state that existential closure of restricted ε-abstraction turns it into restricted existential quantification. The third part is basically a restricted version of β-reduction for ε-abstraction. If we apply an ε-abstract εz_N O to a variable x that is itself ε-bound, we may perform β-reduction (subject to the usual restrictions) and thus simplify the function application to O[x/z], but the restriction N on z has to be passed on to the ε-operator that binds x. The side condition ensures that no variable in N may become unbound by this operation.
Indefinites are still analyzed as identity functions, but these are created by restricted ε-abstraction, and the common noun phrase of an indefinite NP supplies the restriction on the abstract. So the lexical entry for the indefinite determiner some comes out as in (52). (The indefinite article a is treated analogously.) (52)
some – λP.εx_{Px} x : np^np/n
In the remainder of this section I will demonstrate that this treatment of the descriptive contents of indefinites adequately extends the treatment of indefinites of the previous two sections to the general case, and that it avoids the problems of the unselective binding approach and of the choice function approach that were discussed above. Let us start the discussion with a simple example like

(53) John invited some philosopher.
The syntactic derivation does not differ from previous examples where the descriptive content of the indefinite was empty. It is given in Figure 6.13 for completeness.

[Figure 6.13: Derivation of (53). A natural deduction proof combining the lexical entries for John, invited, some, and philosopher via /E, \E, and ∧I; its conclusion is εy.invite’((εxphilosopher’x x)y)john’ : snp.]
The semantic representation of the sentence is (54a). According to the third part of Fact 5, this is equivalent to (54b), which in turn has the same truth conditions as (54c).

(54) a. εy.invite’((εxphilosopher’x x)y)john’
     b. εyphilosopher’y.invite’yjohn’
     c. ∃y(philosopher’y ∧ invite’yjohn’)

I continue with another look at the interaction between indefinites and negation. Example (55) is analogous to (17), apart from the fact that someone has been changed to some farmer.

(55) Some farmer doesn’t beat his donkey.
The syntactic derivations of (55) are structurally identical to the four derivations of (17) (which are given in Figures 6.5–6.8). They lead to the four semantic representations in (56).

(56) a. ∼πwεv.beat’(donkey of’w)((εxfarmer’x x)v)
     b. εv ∼πw.beat’(donkey of’w)((εxfarmer’x x)v)
     c. ∼εv.beat’(donkey of’((εxfarmer’x x)v))((εxfarmer’x x)v)
     d. εv ∼beat’(donkey of’((εxfarmer’x x)v))((εxfarmer’x x)v)
According to the third part of Fact 5, these terms can be rewritten as the equivalent

(57) a. ∼πwεvfarmer’v.beat’(donkey of’w)v
     b. εvfarmer’v ∼πw.beat’(donkey of’w)v
     c. ∼εvfarmer’v.beat’(donkey of’v)v
     d. εvfarmer’v ∼beat’(donkey of’v)v
Some further elementary transformations render these representations truth conditionally equivalent to⁸

(58) a. πw¬∃v(farmer’v ∧ beat’(donkey of’w)v)
     b. πw∃v(farmer’v ∧ ¬beat’(donkey of’w)v)
     c. ¬∃v(farmer’v ∧ beat’(donkey of’v)v)
     d. ∃v(farmer’v ∧ ¬beat’(donkey of’v)v)

⁸ The calculation for (b) makes use of the fact that εxM πyN and πyεxM N are truth conditionally equivalent if y is not free in M, which follows directly from the definitions.

So in all four readings, the restriction on ε-abstraction is always turned into a restriction on an existentially quantified variable. This fact accounts for the absence of a Donald Duck problem in the present account. Reconsider the critical example (45), which is repeated here in a slightly modified form as (59a). Giving the indefinite wide scope over the conditional leads to the semantic representation (59b).

(59) a. If John invites some philosopher, Max will be offended.
     b. εx(invite’((εyphilosopher’y y)x)john’ → offended’max’)
Transferring the restriction on the inner ε to the outer ε leads to (60a). Expanding the abbreviational convention for → gives us (60b), and employing the correspondence between the Dekker connectives and the classical first order connectives makes this equivalent to (60c). This in turn is truth conditionally equivalent to (60d).

(60) a. εxphilosopher’x (invite’xjohn’ → offended’max’)
     b. εxphilosopher’x ∼(invite’xjohn’ & ∼offended’max’)
     c. εxphilosopher’x ¬(invite’xjohn’ ∧ ¬offended’max’)
     d. ∃x(philosopher’x ∧ (invite’xjohn’ → offended’max’))
Note that the truth conditional equivalence between (59b) and (60d) holds for all models, including those where philosopher’ has an empty extension. If there are no philosophers, the sentence is predicted to be false. So the present account avoids the empty set problem. The semantic reason for this is the fact that the denotation of (59a) is a function from philosophers to truth values. If there are no philosophers, this is the empty function. A truth valued function is true according to our truth definition iff there are arguments for which the function returns the value 1. The empty function never returns any value, therefore the sentence is false in such a model.

Let us now turn our attention to a run-of-the-mill conditional donkey sentence like the classical

(61) If a farmer owns a donkey, he beats it.

The syntactic derivation of this sentence is analogous to the one for (21). It leads to the semantic representation

(62) εxfarmer’x εydonkey’y own’yx → πzπwbeat’wz
Expanding the definition of → leads to (63a). Dynamic binding makes this equivalent to (63b). Employing the interaction between the Dekker connectives, static closure and the classical connectives allows us to rewrite (63b) as (63c). Using the second part of Fact 5 twice, we get (63d), and this is first-order equivalent to (63e).

(63) a. ∼(εxfarmer’x εydonkey’y own’yx & ∼πzπwbeat’wz)
     b. ∼εxfarmer’x εydonkey’y (own’yx & ∼beat’yx)
     c. ¬↓εxfarmer’x εydonkey’y (own’yx ∧ ¬beat’yx)
     d. ¬∃x(farmer’x ∧ ∃y(donkey’y ∧ own’yx ∧ ¬beat’yx))
     e. ∀x(farmer’x → ∀y(donkey’y ∧ own’yx → beat’yx))
Quantificational donkey sentences are analyzed in a similar way. Apart from the form of the indefinite, the example in (64a) is analogous to (30), which was discussed in the previous section. Again, the syntactic derivation is analogous (cf. Figure 6.12 on page 244), and we end up with the semantic representations in (64b,c) for the weak and the strong readings, respectively.

(64) a. Every farmer who owns a donkey beats it.
     b. ∀z(↓εx(farmer’z ∧ own’((εydonkey’y y)x)z) → ↓(εx(farmer’z ∧ own’((εydonkey’y y)x)z) & πu.beat’uz))
     c. ∀z(↓εx(farmer’z ∧ own’((εydonkey’y y)x)z) → ↓(εx(farmer’z ∧ own’((εydonkey’y y)x)z) → πu.beat’uz))
β-reduction leads to

(65) a. ∀z(↓εxdonkey’x (farmer’z ∧ own’xz) → ↓(εxdonkey’x (farmer’z ∧ own’xz) & πu.beat’uz))
     b. ∀z(↓εxdonkey’x (farmer’z ∧ own’xz) → ↓(εxdonkey’x (farmer’z ∧ own’xz) → πu.beat’uz))
Expanding the definition for → and performing dynamic binding (together with some minor routine manipulations) leads to the reformulations

(66) a. ∀z(↓εxdonkey’x (farmer’z ∧ own’xz) → ↓εxdonkey’x.(farmer’z ∧ own’xz) & beat’xz)
     b. ∀z(↓εxdonkey’x (farmer’z ∧ own’xz) → ∼↓εxdonkey’x.(farmer’z ∧ own’xz) & ∼beat’xz)
Crucially, all ε-operators in these representations are immediately preceded by ↓. Due to Fact 5, this amounts to existential quantification over the corresponding argument places, i.e., we get

(67) a. ∀z(↓∃x(donkey’x ∧ farmer’z ∧ own’xz) → ↓∃x(donkey’x ∧ farmer’z ∧ own’xz) & beat’xz)
     b. ∀z(↓∃x(donkey’x ∧ farmer’z ∧ own’xz) → ∼↓∃x(donkey’x ∧ farmer’z ∧ own’xz) & ∼beat’xz)

This in turn is equivalent to

(68) a. ∀z(∃x(donkey’x ∧ farmer’z ∧ own’xz) → ∃x(donkey’x ∧ own’xz ∧ beat’xz))
     b. ∀z(∃x(donkey’x ∧ farmer’z ∧ own’xz) → ∀x(donkey’x ∧ own’xz → beat’xz))
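The difference between the weak and the strong reading can be checked on a small finite model. The following sketch is my own illustration (the model, with a farmer who beats only one of his two donkeys, is an invented assumption): the weak reading demands that every donkey-owning farmer beats some donkey he owns, the strong reading that he beats every donkey he owns.

```python
# Toy first-order model for "Every farmer who owns a donkey beats it".
farmer = {'f1', 'f2'}
donkey = {'d1', 'd2'}
own  = {('f1', 'd1'), ('f1', 'd2')}   # f1 owns both donkeys, f2 owns none
beat = {('f1', 'd1')}                  # f1 beats only d1

def weak():
    # Weak reading: forall z, if z owns some donkey, z beats some donkey z owns.
    return all(
        (not any((z, x) in own for x in donkey))
        or any((z, x) in own and (z, x) in beat for x in donkey)
        for z in farmer)

def strong():
    # Strong reading: forall z and x, if farmer z owns donkey x, z beats x.
    return all((z, x) in beat
               for z in farmer for x in donkey if (z, x) in own)

assert weak()        # f1 beats one of his donkeys; f2 satisfies it vacuously
assert not strong()  # f1 does not beat d2
```

The model thus separates the two readings: it verifies the weak one and falsifies the strong one, exactly the situation in which the two first-order paraphrases come apart.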
So to sum up this point, the domain restriction on ε-bound variables is always turned into a restriction on the corresponding existential quantifier when this ε-slot is existentially bound. This avoids the Donald Duck problem, and the treatment of donkey constructions that was proposed in the previous section carries over to indefinites with non-trivial restrictions without problems. It remains to be shown how the present system handles cases where the descriptive part of an indefinite contains a bound pronoun. A simple example is

(69) Every girlᵢ visited some boy that sheᵢ fancied.
In the indicated binding configuration, the subject quantifier must take scope over the indefinite object, because otherwise the corresponding proof tree would not be well-formed (cf. the discussion of this issue on page 167 in Chapter 4). So the only derivation of (69) is the one that is sketched in Figure 6.14.⁹ The semantic representation of (69) is thus

(70) ∀x(↓girl’{x} → ↓(girl’{x} & (λuεv.visit’((εyboy’y∧fancy’yu y)v)u){x}))
According to the laws of structural function application, this is equivalent to

(71) ∀x(↓girl’x → ↓(girl’x & εv.visit’((εyboy’y∧fancy’yx y)v)x))
β-reduction leads to

(72) ∀x(↓girl’x → ↓(girl’x & εvboy’v∧fancy’vx.visit’vx))
Some elementary transformations lead to the equivalent first order formula

(73) ∀x(girl’x → ∃v(boy’v ∧ fancy’vx ∧ visit’vx))

⁹ I use only the weak reading of every here since the strong reading leads to an equivalent result.

[Figure 6.14: Derivation of (69). A natural deduction proof combining every girl, visited, some, and boy that she fancied via /E, \E, |E, qE, and ∧-rules; its conclusion is ∀x(↓girl’{x} → ↓(girl’{x} & (λuεv.visit’((εyboy’y∧fancy’yu y)v)u){x})) : s.]
According to these truth conditions, the sentence could also be true in a situation where two girls fancy the same boys but visit different boys. This is in line with the semantic intuitions. As discussed above, the
choice function approach furthermore predicts a non-existent reading where girls that fancy the same boys must visit the same boy to make the sentence true. It might be argued that this reading is actually there but hard to detect, because it is logically stronger than the ordinary narrow-scope reading (73). This is no longer the case, however, if we use a downward monotonic quantifier in subject position, as in

(74) At most three girls visited a boy that they fancied.

According to the choice function approach, this sentence should have the reading given in (75a), which is truth-conditionally equivalent to (75b).

(75) a. ∃f.CH(f ) ∧ |λx.girl’x ∧ visit’(f (λy.boy’y ∧ fancy’yx))x| ≤ 3
     b. |λx.girl’x ∧ ∀y((∃z(boy’z ∧ fancy’zx) → boy’y ∧ fancy’yx) → visit’yx)| ≤ 3
Under the assumption that every girl fancies some boy, the prediction is that the sentence has a reading that is synonymous with At most three girls visited every boy that they fancied. Intuitions are fairly solid here that such a reading does not exist. Intuitively, this bound pronoun problem of the choice function approach is similar to the Donald Duck problem of unselective binding: in both approaches, the interpretation of the descriptive content of an indefinite is divorced from its existential impact, while these two semantic components of indefinites always occur in tandem. Modelling the scope of indefinites by means of existential closure over partial functions captures this fact. It deserves to be mentioned that the bound pronoun problem of the choice function approach has been taken as evidence by Geurts, 2000 and by Endriss, 2001 that the scope of indefinites is assigned by means of some form of syntactic movement. The present solution proves that this conclusion is not inevitable. The scope of indefinites is assigned in an entirely surface compositional way here, without making reference to transformations between syntactic representations. (Recall that the manipulations of the semantic representations that I used in the discussion above are meaning preserving reformulations in the semantic representation language, without any significance for the meanings that the theory assigns to natural language expressions.)
6. Sluicing
Donkey anaphora is an empirical domain where the grammar of indefinites is intricately linked with the grammar of anaphora. The same
holds for the phenomenon of sluicing. After a brief recapitulation of the basic issues that arise in connection with this form of ellipsis, I will demonstrate that the LLC-treatment of anaphora and the analysis of indefinites that was developed in the previous sections can be combined into a natural approach to sluicing. Briefly put, sluicing is a version of ellipsis where, under certain contextual conditions, a bare wh-phrase stands proxy for an entire (embedded or matrix) question. The phenomenon was first systematically described in Ross, 1969, where the name was also coined. Typical examples are

(76) a. She’s reading something, but I don’t know what.
     b. Some guy knows how to get in here. Do you have any idea who?
     c. They hired a new system administrator. Guess who!
As with VP ellipsis, sluicing involves a source clause and a target clause. The source clause is typically a declarative clause which contains an indefinite NP. The target clause is (interpreted as) the question that is obtained if this indefinite is replaced by a wh-phrase. At the surface structure, everything but this wh-phrase is deleted. So on the face of it, the examples above are related (via deletion, reconstruction or whatever) to the non-elliptical counterparts

(77) a. She’s reading something, but I don’t know what she’s reading.
     b. Some guy knows how to get in here. Do you have any idea who knows how to get in here?
     c. They hired a new system administrator. Guess who they hired!
Interestingly, sluicing constructions remain grammatical in cases where the non-elliptical counterpart involves an island violation of the wh-phrase. Consider the following example (like some of the subsequent examples, it is taken from Merchant, 1999).¹⁰

(78) a. They wanted to hire somebody who speaks a Balkan language, but I don’t know which.
     b. *They wanted to hire somebody who speaks a Balkan language, but I don’t know which Balkan language they wanted to hire somebody who speaks.

¹⁰ I follow the standard assumption that the deletion of the common noun phrase inside the wh-phrase (i.e., which instead of which Balkan language) is independent of sluicing, and I will ignore this kind of ellipsis.

In the non-elliptical version (78b), the wh-phrase which Balkan language binds a gap inside a relative clause island. Therefore the example is ungrammatical. Nonetheless, the corresponding sluicing construction (78a) is impeccable. The same point can be made with regard to a whole range of syntactic island constraints. The following list is not meant to be exhaustive.
Adjunct islands.
(79) a. Ben will be mad if Abby talks to one of the teachers, but she couldn’t remember which.
     b. *Ben will be mad if Abby talks to one of the teachers, but she couldn’t remember which of the teachers Ben will be mad if Abby talks to. (from Merchant, 1999)
Complex NP islands.
(80) a. The administration has issued a statement that it is willing to meet with one of the student groups, but I’m not sure which one.
     b. *The administration has issued a statement that it is willing to meet with one of the student groups, but I’m not sure which one the administration has issued a statement that it is willing to meet with. (from Chung et al., 1995)
Sentential subject islands.
(81) a. That certain countries would vote against the resolution has been widely reported, but I’m not sure which ones.
     b. *That certain countries would vote against the resolution has been widely reported, but I’m not sure which ones that would vote against the resolution has been widely reported. (from Chung et al., 1995)
Embedded question islands.
(82) a. Sandy was trying to work out which students would be able to solve a certain problem, but she wouldn’t tell us which one.
     b. *Sandy was trying to work out which students would be able to solve a certain problem, but she wouldn’t tell us which one Sandy was trying to work out which students would be able to solve. (from Chung et al., 1995)
Coordinate structure constraint.
(83) a. Bob ate dinner and saw a movie that night, but he didn’t say which.
     b. *Bob ate dinner and saw a movie that night, but he didn’t say which movie Bob ate dinner and saw that night. (from Merchant, 1999)
These facts suggest that sluicing does not involve syntactic operations like reconstruction or deletion. Rather, an approach that requires some form of semantic correspondence between source clause and target clause seems viable. On the other hand, the morphological form of the remnant wh-phrase is not arbitrary. In languages with overt case marking, the wh-phrase has to have the same case as the indefinite in the source. The following German example is from Ross, 1969, where this effect (as well as the island insensitivity of sluicing) was first observed.

(84) a. Er will jemandem schmeicheln, aber sie wissen nicht {wem / *wen}.
        he wants someoneDAT flatter but they know not {whoDAT / *whoACC}
     b. Er will jemandem schmeicheln, aber sie wissen nicht, {wem / *wen} er schmeicheln will.
        he wants someoneDAT flatter but they know not {whoDAT / *whoACC} he flatter wants
     ‘He wants to flatter someone, but they don’t know who (he wants to flatter)’
(85) a. Er will jemanden loben, aber sie wissen nicht {*wem / wen}.
        he wants someoneACC praise but they know not {*whoDAT / whoACC}
     b. Er will jemanden loben, aber sie wissen nicht, {*wem / wen} er loben will.
        he wants someoneACC praise but they know not {*whoDAT / whoACC} he praise wants
     ‘He wants to praise someone, but they don’t know who (he wants to praise)’
The German verbs schmeicheln (‘to flatter’) and loben (‘to praise’) govern dative case and accusative case respectively on their object. The sluiced wh-phrases in the (a)-examples have to have the same case marking as the corresponding indefinites in the source clause. In other words, they must have the same case marking that they would have in the corresponding non-elliptical constructions. Under a pure identity-of-meaning approach, this morphological correspondence would seem mysterious.

A third peculiarity of sluicing is the fact that the wh-phrase in the target clause and the corresponding indefinite in the source clause must have parallel scope. (This has been pointed out by Chung et al., 1995.) Recall that in (43b)—repeated here as (86a)—the indefinite some movie may have narrow scope or wide scope relative to the quantifier every girl. If this sentence is used as a source clause in sluicing, only the wide scope reading is possible.

(86) a. Every girl will be happy if some movie is shown. [∃ > ∀] [∀ > ∃]
     b. Every girl will be happy if some movie is shown, but I don’t know which movie. [∃ > ∀] *[∀ > ∃]
As I will try to demonstrate in the remainder of this section, the theory of indefinites that was developed in the previous sections lends itself naturally to a TLG account of sluicing that is based on LLC and covers the empirical generalizations just discussed. Let us consider a simple example like

(87) John invited someone, but it is unclear who John invited.

The details of the semantics of questions that one adopts are of minor importance for the subsequent discussion. Therefore I remain neutral in this respect and represent the semantics of the sluiced question who John invited as (88a). The interrogative pronoun who has the lexical entry in (88b). So the missing piece of meaning that is required to interpret the ellipsis in (87) is (88c).

(88) a. ?x.invite’xjohn’
     b. who – λP ?xP x : Q/(s/np)
     c. λx.invite’xjohn’
The denotation of the term in (88c) is identical to the denotation of the source clause John invited someone according to the semantics of indefinites given above. So the adequate reading can easily be derived via anaphora resolution if we assign the interrogative pronoun who the additional lexical entry

(89) who – λP ?xP x : Q|(snp)
The semantics of the two readings of who is identical. In the sluicing version, who is an anaphor that needs a declarative clause containing an indefinite as antecedent to yield a question. This is just a formal reformulation of the informal description of sluicing patterns given above. Note that the only difference between the two lexical entries for who lies in the fact that they use different substructural versions of intuitionistic implication. This is similar to the relation between the ordinary English auxiliaries and their VPE-counterparts. They too have pairwise identical meanings, and their categories differ with regard to the implication they use (vp/vp versus vp|vp). An analogous lexical ambiguity has to be assumed for all interrogative pronouns and interrogative determiners.

What happens if the descriptive content of the indefinite is non-trivial, as in (90)?

(90) John invited some philosopher, but it is unclear which philosopher.
Here, the missing piece of meaning is also λx.invite’xjohn’, but the meaning of the source clause is the partial function

εxphilosopher’x.invite’xjohn’

We can transform this partial function into a total function by means of an operation tot’, which is defined as

∀f, c. tot’f c = 1 iff f c = 1, and tot’f c = 0 else

So for the sluicing version of the interrogative determiner which we have to assume the lexical entry¹¹

(91) which – λP λR?x.P x ∧ tot’Rx : Q|(snp)/n
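The totalizing operation tot’ can be illustrated in the same toy setting as before, with partial functions modelled as dictionaries. This is my own sketch under invented assumptions (the domain and the example values are not from the text): tot’ simply extends a partial truth-valued function to the whole domain, mapping everything outside the original domain to 0.

```python
# Sketch of tot': maps a partial truth-valued function f to the total
# function returning True exactly where f returns True, False elsewhere.

DOMAIN = {'john', 'plato', 'kant', 'donald'}

def tot(f):
    """Totalize a partial function (a dict): undefined arguments go to False."""
    return {c: bool(f.get(c, False)) for c in DOMAIN}

# eps(y)_{philosopher'y}.invite'(y)(john') with only Plato invited:
partial = {'plato': True, 'kant': False}   # defined only on philosophers
total = tot(partial)

assert total['plato'] is True
assert total['kant'] is False
assert total['donald'] is False   # outside the original domain: mapped to 0
```

This is exactly what the entry in (91) needs: the wh-operator can then quantify over the whole domain while the descriptive restriction of the source indefinite survives as the region where the function can be true at all.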
The (simplified) syntactic derivation for (90) is given in Figure 6.15. The semantic representation which corresponds to this derivation is (92a), which is truth conditionally equivalent to (92b). (I treat but as synonymous with and.)

¹¹ The correct analysis of the semantic contribution of the restrictor of which is an intricate issue, since it gives rise to a de re/de dicto ambiguity (cf. the discussion in Groenendijk and Stokhof, 1984, pp. 89ff). I think that an analysis using Boolean conjunction as in the entry in (91) can be maintained if we admit free world indexing of common nouns in the restrictor of operators, which is arguably necessary anyway. The issue is orthogonal to our present concerns, so I omit further discussion.

[Figure 6.15: Derivation for (90). A natural deduction proof combining John, invited, some, philosopher, but, it is unclear, and which via /E, \E, |E, and ∧-rules.]

(92)
a. εx.(εyphilosopher’y invite’yjohn’)x ∧ unclear’?z.philosopher’z ∧ tot’(εyphilosopher’y invite’yjohn’)z
b. ∃x.philosopher’x ∧ invite’xjohn’ ∧ unclear’?z.philosopher’z ∧ invite’zjohn’

Let us now change the example slightly to

(93) John invited some philosopher, but it is unclear who.
Here, the descriptive content of the indefinite in the source clause does not coincide with the restrictor of the wh-phrase in the target. Nevertheless, (93) is synonymous with (90). To derive this fact, I have to revise the lexical entry of who slightly. The example (93) demonstrates that the antecedents of who-sluices may be partial functions. Therefore, the totalizing function tot’ has to be incorporated into the semantics of sluicing-who as well. The modified entry is thus¹²

(94) who – λP ?x.tot’P x : Q|(snp)
Given this, the interpretation of (93) comes out as in (95), which is truth-conditionally equivalent to (92b) as well.

(95) εx.(εyphilosopher’y invite’yjohn’)x ∧ unclear’?z.tot’(εyphilosopher’y invite’yjohn’)z
The fact that the descriptive content of the source indefinite always serves as an additional restriction on the remnant wh-phrase in the sluiced question was first observed (and accounted for) in Chung et al., 1995. It provides a major stumbling block for any theory that analyzes sluicing via syntactic copying or deletion.

¹² We may as well assume that ordinary who has the same semantics and thus maintain the synonymy between the two readings, since the argument of who in ordinary questions is always a total function and the presence of tot’ does not make any difference.

The prime obstacle for a purely syntactic approach to sluicing is of course the lack of island sensitivity of sluiced constructions. Ross, 1969 suggested that syntactic island constraints only apply to phonetically non-empty structures. Versions of this idea recur at several places in the relevant literature, most recently—in a qualified form—in Merchant, 1999. A full discussion of how this problem is dealt with in TLG would require a discussion of islandhood within this framework. Like the problem of restrictions on quantifier scope, this issue goes beyond the scope of this work, and the interested reader is referred to the relevant discussion in Morrill, 1994. However, quite independently of the precise type logical analysis of islandhood that we adopt, it should be clear that sluicing is predicted to be insensitive to it. Consider again a relevant minimal pair such as

(96) a. They wanted to hire somebody who speaks a Balkan language, but I don’t know which one.
     b. *They wanted to hire somebody who speaks a Balkan language, but I don’t know which Balkan language they wanted to hire somebody who speaks.
The wh-phrase which one in (96a) has the syntactic category Q|(snp). So it acts as a question as soon as the linguistic context supplies an antecedent of category snp—a clause containing a wide scope indefinite. The source clause in (96a) has this category, provided the indefinite a Balkan language is given wide scope there. The (ungrammatical) non-elliptical question in (96b) plays no role in the analysis of (96a), so no matter how we exclude (96b), this analysis will not affect the analysis of (96a). Instead, we predict that the locality constraints in sluicing exactly mirror the locality constraints for the scope of indefinites. Since the latter is in principle unbounded, so is sluicing.

The discussion in the previous paragraph readily suggests an explanation of one half of the scope parallelism facts mentioned above. Note that the source clause in (96a), taken in isolation, is ambiguous between a narrow scope reading and a wide scope reading of the indefinite a Balkan language. According to the present theory of indefiniteness, this semantic difference is reflected in the syntactic category of this clause. If the indefinite has narrow scope, the matrix clause has the category s. Wide scope of the indefinite corresponds to the category snp for the matrix clause, and only in this category may it serve as antecedent for the sluice in the second conjunct. These considerations derive one half of the scope parallelism constraint: the indefinite in the source clause that licenses sluicing must at least have scope over the entire source clause. We predict that it may have wider scope though. The derivation in Figure 6.15 already provides an example. If cross-clausal binding of pronouns by indefinites is analyzed in the way done here, i.e., by assuming that the binding indefinite takes scope over the entire construction, this conclusion is desired and even inevitable.
Nonetheless, in examples like (97), only the reading where the indefinite and the wh-phrase take exactly parallel scope (i.e., where some girl takes narrow scope with respect to knows) is possible.

(97) Everybody knows that John wants to marry some girl, but John’s mother still doesn’t know which one (John wants to marry).
This might, however, be due to the fact that constituent questions trigger existential presuppositions. So the sluiced question in (97) triggers the presupposition John wants to marry some girl (with some girl having wide scope). If the indefinite takes wider scope than the wh-phrase, this presupposition has to be bound via bridging, while it can be directly bound if the scopes are parallel.

Let us now turn to the third empirical generalization discussed above, the morphological parallelism between the licensing indefinite in the source clause and the remnant wh-phrase in the target. I repeat Ross’ example.
a. Er will jemandem schmeicheln, aber sie wissen nicht {wem / *wen}. He wants someoneDAT flatter but they know not {whoDAT / *whoACC } ‘He wants to flatter someone, but they don’t know whom’ b. Er will jemanden loben, aber sie wissen nicht {*wem / wen}. He wants someoneACC praise but they know not {*whoDAT / whoACC } ‘He wants to praise someone, but they don’t know who (he wants to praise)
For a detailed discussion of the treatment of morphology in TLG, I have to refer the reader once again to Morrill, 1994, but for the present purposes a sketch will do. Suffice it to say that basic categories in a morphologically informed version of TLG are not unstructured atoms but (atomic) first order formulae, i.e., they consist of a predicate (unary predicates suffice) which takes complex terms as arguments. Morphological feature structures can be coded as first order terms. Underspecified aspects of the morphological structure can be represented as universally quantified individual variables. Morphological feature structures can thus be incorporated into a first order version of LLC+∧ . In such a first order version of TLG, the category of a dative NP in German will be an atomic formula of the form np(...dat...), where dat is a (possibly complex) term representing the case information “dative”. Let us abbreviate this category with np(dat). Likewise, the category of accusative NPs shall be sketched as np(acc). The German interrogative pronouns wem (dative) and wen (accusative) thus have the syntactic categories Q/(s/np(dat)) and Q/(s/np(acc)) respectively, i.e., they bind an np-position with the matching case information in the interrogative clause.
268
ANAPHORA AND TYPE LOGICAL GRAMMAR
The case features of an indefinite NP appear at two places: at the argument and the result of the substructural implication in its syntactic category. An indefinite in dative case has the category np(dat)np(dat) , and likewise for other cases. A clause containing a wide-scope indefinite in dative thus has the category snp(dat) . The sluicing version of the dative interrogative pronoun has the category Q|(snp(dat) ), i.e., it requires a clause as antecedent that contains a wide scope indefinite with dative case. The ungrammatical versions of (98) are excluded because the case features of the anaphoric wh-phrase do not match with the corresponding feature in the antecedent. Among the analyses of sluicing from the literature, the present one is probably closest to the one from Chung et al., 1995. These authors adopt a DRT style unselective binding analysis of indefinites. According to them, sluicing invokes the copying of the LF of the IP of the source clause into the target clause. So after this copying operation, our previous example (99a) would receive approximately an LF as (99b). (99)
a. John invited some philosopher, but it is unclear who. b. ∃x[IP John invited some philosopherx ], but it is unclear whox [IP John invited some philosopherx ].
So the indefinite some philosopher introduces a free variable both in the source clause and the target clause. This variable is bound by unselective existential closure in the source and by the wh-operator in the target. If the source did not contain a free variable (i.e., an indefinite), vacuous binding in the target and thus ungrammaticality would ensue. Furthermore, the copying mechanism ensures that the descriptive content of the indefinite contributes to the interpretation of the target question. Finally, the connection between the wh-operator in the target and the variable that it binds is not established via movement and thus not predicted to be sensitive to island constraints. The main problem of Chung et al.’s (1995) approach is inherited from the unselective binding approach as such—it is susceptible to the Donald Duck problem. The example (100a) will receive the LF (100b). (100)
a. Max will be offended if we invite some philosopher, but it is unclear who. b. Max will be offended if we invite some philosopher, but it is unclear whox [IP Max will be offended if we invite some philosopherx ].
So the question part of this sentence can be paraphrased as which x is such that Max will be offended if x is a philosopher that we invite. Given
Indefinites
269
that Donald Duck is not a philosopher, “Donald Duck” should be a good answer to this question, but it isn’t. To sum up the discussion of sluicing, it can be said that the present theory covers the core facts of this kind of ellipsis in a simple and adequate way. However, our theory is essentially an identity-of-meaning theory, and the literature contains quite a few instances of sluicing that prima facie do not lend themselves easily to such an analysis. The most problematic cases are those where sluicing is not licensed by an overt indefinite; implicit existentially quantified arguments in the source clause can do that job as well. Chung et al., 1995 call this version of sluicing “sprouting”. The following illustrate this phenomenon (101)
a. She served the soup, but I don’t know to whom. (from Chung et al., 1995) b. She was reading, but I couldn’t make out what. (from Chung et al., 1995) c. He’s writing, but you can’t imagine where/why/how fast. (from Ross, 1969)
While it might be tempting to assume that here the licensing indefinite is somehow incorporated into the verb, such an analysis won't work in examples like the following (also from Chung et al., 1995). (102)
Joan ate dinner but I don’t know with whom.
Here, the source clause does not entail that Joan ate dinner with someone, so the elided material in the target clause is not present in its entirety in the source clause, no matter what identity criterion we assume. So a plain identity-of-meaning theory like the present one has nothing to say about these cases. One might argue that these cases involve some version of bridging, a phenomenon that is well attested in all classes of anaphora.
7. Summary and Desiderata
The main purpose of this chapter was to demonstrate that the LLC-analysis of anaphoric pronouns can be extended to donkey pronouns. However, the difficult part of the analysis of donkey anaphora is not how to analyze pronouns but how to analyze indefinites, so most space was devoted to this problem. The chapter consisted of three parts. In the first part, I introduced Dekker's Predicate Logic with Anaphora, and I showed how the PLA-analysis of indefinites can be combined with the LLC-treatment of pronouns. I thereby extended LLC to the categorial logic LLC+∧. Basically, indefinites are treated analogously to pronouns,
with the crucial difference that indefinites cannot be resolved. Another way to look at it is to say that I gave a type logical reformulation of Heim-style DRT, where free variables are replaced by identity functions. The Novelty Condition for indefinites is reconstructed as the absence of resolution rules. By translating Dekker's analyses of the standard logical connectives of conjunction and negation into the term language accompanying LLC+∧, we were able to reproduce the core of the DRT analysis of donkey anaphora within TLG.

The second part focused on the issue of how the descriptive content of indefinites is to be analyzed. I suggested that indefinites in general denote (possibly partial) identity functions over individuals, and that the descriptive content of an indefinite supplies the domain of this function. These functions function-compose with their linguistic environment, and the descriptive content of an indefinite is thus inherited by the denotations of its super-constituents. I showed that this mechanism, paired with an operation of existential closure of argument slots, circumvents certain problems which plague other current theories of the scoping of indefinites.

The last part of the chapter applied these findings to the problem of sluicing. I showed that the functional semantics of indefinites, paired with the LLC-mechanism of anaphora, lends itself naturally to a simple identity-of-meaning theory for sluicing. The basic empirical generalizations about this kind of ellipsis fall out immediately.

Each of these three topics is of considerable complexity, and a host of issues has to remain untouched, let alone resolved. As for our account of donkey anaphora, this analysis basically reformulates "classical" Dynamic Semantics (i.e., Dynamic Predicate Logic in the sense of Groenendijk and Stokhof, 1991b), even though the philosophical underpinning is different. This of course means that the empirical weaknesses of DPL are inherited.
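The mechanism summarized above can be caricatured in a few lines of code. This is a rough sketch under simplifying assumptions (extensional, finite domain, invented lexical facts), not the type logical formalization itself: an indefinite denotes a partial identity function whose domain is its descriptive content, the domain restriction is inherited under function composition, and existential closure finally binds the open argument slot.

```python
# Sketch: indefinites as partial identity functions whose domain is
# their descriptive content, with existential closure of the open slot.
# The lexicon below is a hypothetical illustration.

philosophers = {"Kant", "Quine"}

def a_philosopher(x):
    # partial identity function, defined only on philosophers
    if x not in philosophers:
        raise ValueError("outside the domain of the indefinite")
    return x

def smokes(x):
    return x == "Quine"

def compose(f, g):
    # function composition; the result inherits g's domain restriction
    return lambda x: f(g(x))

# "some philosopher smokes" as a function of the open argument slot
vp = compose(smokes, a_philosopher)

def exists_closure(p, domain):
    # existential closure over the open argument slot
    for x in domain:
        try:
            if p(x):
                return True
        except ValueError:
            continue  # individuals outside the restriction are skipped
    return False

print(exists_closure(vp, {"Kant", "Quine", "Donald Duck"}))  # True
```

The point of the sketch is only that the restriction travels with the function under composition, so existential closure applied later still quantifies over philosophers alone.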
Our account of the scoping of indefinites is confined to singular NPs. The issue becomes considerably more intricate if plural NPs are taken into account. Weak quantifiers like three men are as unrestricted in their scope-taking behavior as singular indefinites, so one would expect the same mechanisms to be at work. However, with plural indefinites, two scoping mechanisms are involved. For instance, sentence (103a) (taken from Winter, 1997, who attributes it to Ruys, 1995) has a reading that can be paraphrased as (103b). (103)
a. If three relatives of mine die, I’ll inherit a fortune. b. There are three relatives of mine, and if each of them dies, I’ll inherit a fortune.
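In a standard logical notation (my rendering, not a formula from the text), the paraphrase in (103b) combines a wide-scope existential quantification over sets of cardinality three with a clause-bound universal quantification over the members of that set:

```latex
\exists X \,\big[\, |X| = 3
  \;\wedge\; \forall x\,(x \in X \to \mathit{relative\_of\_mine}(x))
  \;\wedge\; \big(\forall x\,(x \in X \to \mathit{die}(x)) \to \mathit{inherit\_fortune}(\mathit{me})\big) \,\big]
```

The outer existential can escape the conditional, while the universal stays inside its antecedent.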
The specific reading of three relatives of mine thus actually involves two quantifications: a wide-scope existential quantification over sets of relatives of mine with cardinality three, and a narrow-scope universal quantification over elements of this set. It seems that the former is as unrestricted as the existential impact of singular indefinites, while the universal quantification is confined to the local clause (i.e., obeys the same constraints as other non-indefinite quantifiers). Reniers, 1997 gives a TLG analysis of these facts using two versions of Moortgat's in situ binder. A reformulation into the present framework, where locally restricted quantification is handled by qE and unrestricted existential quantification by ∧, is easy to provide. However, a lot of issues remain open in connection with this issue, such as the question of which quantifiers exactly can be subject to a double scope interpretation, and what properties qualify a determiner to belong to that class. These issues have been discussed in different theoretical frameworks in Szabolcsi, 1997 and Endriss, 2001. It remains to be seen whether the findings of these authors are compatible with the present theoretical setting.

Finally, the analysis of sluicing presented here remains somewhat sketchy, for two reasons. First, a detailed semantics of questions has to take intensionality into account. While there is no fundamental obstacle against a Curry-Howard style intensional semantics,13 the syntax-semantics interface becomes considerably more complex and less transparent if relativization to possible worlds is added. Therefore, this issue was left out throughout this book. Second, I believe that an adequate treatment of sluicing requires a theory of presupposition resolution and a theory of bridging, and it has to take the effects of information structure into account.
While there is no lack of formal approaches to these phenomena, they are completely independent of the type logical aspects of anaphora. The discussion was therefore confined to those aspects that have a direct bearing on the topic of the book as a whole.
13 See for instance Morrill, 1994 for a fully worked out formalization.
References
Abusch, Dorit (1994). The scope of indefinites. Natural Language Semantics, 2:83–135.
Ades, Anthony E. and Steedman, Mark J. (1982). On the order of words. Linguistics and Philosophy, 4:517–558.
Ajdukiewicz, Kazimierz (1935). Die syntaktische Konnexität. Studia Philosophica, 1:1–27.
Anderson, Alan and Belnap, Nuel (1975). Entailment: The Logic of Relevance and Necessity, volume I. Princeton University Press, Princeton.
Anderson, Alan, Belnap, Nuel, and Dunn, Michael (1992). Entailment: The Logic of Relevance and Necessity, volume II. Princeton University Press, Princeton.
Andréka, Hajnal and Mikulás, Szabolcs (1994). Lambek Calculus and its relational semantics: Completeness and incompleteness. Journal of Logic, Language, and Information, 3:1–37.
Bach, Emmon (1979). Control in Montague Grammar. Linguistic Inquiry, 10:515–531.
Bach, Emmon and Partee, Barbara (1980). Anaphora and semantic structure. In Kreimann, K. J. and Ojeda, A. E., editors, Papers from the Parasession on Pronouns and Anaphora, pages 1–28. Chicago Linguistic Society.
Bar-Hillel, Yehoshua (1953). A quasi-arithmetical notation for syntactic description. Language, 29:47–58.
Bar-Hillel, Yehoshua, Gaifman, C., and Shamir, E. (1960). On categorial and phrase structure grammars. Bulletin of the Research Council of Israel, F(9):1–16.
Barss, Andrew and Lasnik, Howard (1986). A note on anaphora and double objects. Linguistic Inquiry, 17:347–54.
Barwise, Jon (1987). Noun phrases, generalized quantifiers and anaphora. In Gärdenfors, Peter, editor, Generalized Quantifiers. Logical and Linguistic Approaches, pages 1–29. Reidel, Dordrecht.
Bresnan, Joan (1994). Linear order vs. syntactic rank: Evidence from weak crossover. In Beals, Katie, Denton, Jeannette, Knippen, Bob, Melnar, Lynette, Suzuki, Hisami, and Zeinfeld, Erika, editors, CLS 30-I: Papers from the Thirtieth Regional Meeting of the Chicago Linguistic Society, pages 57–89. Chicago Linguistic Society, Chicago.
Bresnan, Joan (1998).
Morphology competes with syntax: Explaining typological variation in weak crossover effects. In Barbosa, Pilar, Fox, Danny, Hagstrom, Paul, McGinnis, Martha, and Pesetsky, David, editors, Is the Best Good Enough?, pages 59–92. MIT Press, Cambridge (Mass.).
Buszkowski, Wojciech (1997). Mathematical linguistics and proof theory. In van Benthem, Johan and ter Meulen, Alice, editors, Handbook of Logic and Language, chapter 12, pages 683–736. Elsevier, MIT Press.
Carpenter, Bob (1998). Type-Logical Semantics. MIT Press, Cambridge (Mass.).
Carpenter, Bob (1999). The Turing-completeness of multimodal categorial grammars. Papers presented to Johan van Benthem in honor of his 50th birthday. European Summer School in Logic, Language and Information, Utrecht.
Chierchia, Gennaro (1989). Anaphora and attitudes de se. In Bartsch, Renate, van Benthem, Johan, and van Emde Boas, Peter, editors, Semantics and Contextual Expression, pages 1–32. Foris, Dordrecht.
Chierchia, Gennaro (1993). Questions with quantifiers. Natural Language Semantics, 1:181–234.
Chomsky, Noam (1957). Syntactic Structures. Mouton, The Hague.
Chomsky, Noam (1963). Formal properties of grammars. In Luce, R. Duncan, Bush, Robert R., and Galanter, Eugene, editors, Handbook of Mathematical Psychology, volume 2, pages 323–418. Wiley, New York.
Chomsky, Noam (1976). Conditions on rules in grammar. Linguistic Analysis, 2:303–351.
Chomsky, Noam (1981). Lectures on Government and Binding. Foris, Dordrecht.
Chung, Sandra, Ladusaw, William, and McCloskey, James (1995). Sluicing and Logical Form. Natural Language Semantics, 3:239–282.
Cohen, Joel M. (1967). The equivalence of two concepts of Categorial Grammar. Information and Control, 10:475–484.
Cooper, Robin (1979). The interpretation of pronouns. In Heny, Frank and Schnelle, Helmut, editors, Syntax and Semantics, volume 10. Academic Press, New York.
Cooper, Robin (1983). Quantification and Syntactic Theory. Reidel, Dordrecht.
Curry, Haskell and Feys, Robert (1958). Combinatory Logic, volume I. North Holland, Amsterdam.
Dahl, Östen (1973). On so-called sloppy identity. Synthese, 26:81–112.
Dalrymple, Mary, Lamping, John, Pereira, Fernando, and Saraswat, Vijay (1997). Quantifiers, anaphora and intensionality.
Journal of Logic, Language and Information, 6(3):219–273.
Dalrymple, Mary, Shieber, Stuart M., and Pereira, Fernando (1991). Ellipsis and higher-order unification. Linguistics and Philosophy, 14(4):399–452.
de Groote, Philippe and Retoré, Christian (1996). On the semantic reading of proof nets. In Kruijff, Geert-Jan, Morrill, Glyn, and Oehrle, Dick, editors, Proceedings of Formal Grammar, pages 57–70. ESSLLI, Prague.
Dekker, Paul (2000). Grounding dynamic semantics. manuscript, University of Amsterdam.
Došen, Kosta (1992). A brief survey of frames for the Lambek Calculus. Zeitschrift für Mathematische Logik und Grundlagen der Mathematik, 38:179–187.
Dowty, David R., Wall, Robert E., and Peters, Stanley (1981). Introduction to Montague Semantics. Reidel, Dordrecht.
Dunn, Michael (1986). Relevance logic and entailment. In Gabbay, Dov and Guenthner, Franz, editors, Handbook of Philosophical Logic, volume III, pages 177–224. Reidel, Dordrecht.
Endriss, Cornelia (2001). The double scope of quantifier phrases. Master's thesis, University of Potsdam.
Engdahl, Elisabeth (1986). Constituent Questions. Reidel, Dordrecht.
Evans, Gareth (1977). Pronouns, quantifiers, and relative clauses. Canadian Journal of Philosophy, 7:467–536.
Farkas, Donka (1981). Quantifier scope and syntactic islands. In Papers from the 17th Regional Meeting of the Chicago Linguistic Society, pages 59–66. University of Chicago.
Farkas, Donka (1999). Scope matters. In von Heusinger, Klaus and Egli, Urs, editors, Reference and Anaphoric Relations, pages 79–108. Kluwer, Dordrecht.
Fiengo, Robert and May, Robert (1994). Indices and Identity. MIT Press, Cambridge (Mass.).
Fodor, Janet and Sag, Ivan (1982). Referential and quantificational indefinites. Linguistics and Philosophy, 5:355–398.
Fox, Danny (1998). Locality in variable binding. In Barbosa, Pilar, Fox, Danny, Hagstrom, Paul, McGinnis, Martha, and Pesetsky, David, editors, Is the Best Good Enough?. MIT Press, Cambridge (Mass.).
Gamut, L. T. F. (1991). Logic, Language, and Meaning: Introduction to Logic, volume I. University of Chicago Press, Chicago.
Gardent, Claire (2000). Deaccenting and higher-order unification. Journal of Logic, Language and Information, 9(3):313–338.
Gawron, Jean Mark and Peters, Stanley (1990). Anaphora and Quantification in Situation Semantics. CSLI, Stanford.
Gazdar, Gerald, Klein, Ewan, Pullum, Geoffrey, and Sag, Ivan (1985). Generalized Phrase Structure Grammar. Basil Blackwell, Oxford.
Geach, Peter (1972). A program for syntax. Synthèse, 22:3–17.
Gentzen, Gerhard (1935). Untersuchungen über das logische Schließen. Mathematische Zeitschrift, 39:176–210, 405–431.
Geurts, Bart (2000). Indefinites and choice functions. Linguistic Inquiry, 31:731–738.
Girard, Jean-Yves (1987). Linear logic. Theoretical Computer Science, 50:1–102.
Greibach, Sheila A. (1965). A new normal form theorem for context-free phrase structure grammars. Journal of the ACM, 12:42–52.
Groenendijk, Jeroen and Stokhof, Martin (1984). Studies on the Semantics of Questions and the Pragmatics of Answers. PhD thesis, University of Amsterdam.
Groenendijk, Jeroen and Stokhof, Martin (1991a). Dynamic Montague Grammar. In Groenendijk, Jeroen, Stokhof, Martin, and Beaver, David Ian, editors, Quantification and Anaphora I, DYANA deliverable R2.2a. Amsterdam.
Groenendijk, Jeroen and Stokhof, Martin (1991b). Dynamic Predicate Logic. Linguistics and Philosophy, 14(1):39–100.
Hankamer, Jorge and Sag, Ivan A. (1976). Deep and surface anaphora. Linguistic Inquiry, 7(3):391–426.
Hardt, Daniel (1993). Verb Phrase Ellipsis: Form, Meaning, and Processing. PhD thesis, University of Pennsylvania.
Hausser, Roland and Zaefferer, Dietmar (1978). Questions and answers in a context-dependent Montague Grammar. In Guenthner, Franz and Schmidt, Siegfried J., editors, Formal Semantics and Pragmatics for Natural Language, pages 339–58. Reidel, Dordrecht.
Heim, Irene (1982). The Semantics of Definite and Indefinite Noun Phrases. PhD thesis, University of Massachusetts, Amherst.
Heim, Irene and Kratzer, Angelika (1998). Semantics in Generative Grammar. Blackwell, Oxford.
Hepple, Mark (1990). The Grammar and Processing of Order and Dependency: A Categorial Approach. PhD thesis, University of Edinburgh.
Hepple, Mark (1992). Command and domain constraints in a categorial theory of binding. In Dekker, Paul and Stokhof, Martin, editors, Proceedings of the Eighth Amsterdam Colloquium. University of Amsterdam.
Hirschbühler, P. (1982). VP-deletion and Across-the-Board quantifier scope. In Pustejovsky, James and Sells, Peter, editors, Proceedings of NELS 12, pages 132–139. GLSA, Amherst.
Howard, William A. (1969). The formulae-as-types notion of construction. manuscript, published in Seldin and Hindley, 1980.
Jacobson, Pauline (1992a). Antecedent contained deletion in a variable-free semantics. In Barker, Chris and Dowty, David, editors, Proceedings of SALT 2, number 40 in Working Papers in Linguistics, pages 193–213. Ohio State University, Columbus.
Jacobson, Pauline (1992b). Bach-Peters sentences in a variable-free semantics. In Dekker, Paul and Stokhof, Martin, editors, Proceedings of the Eighth Amsterdam Colloquium. University of Amsterdam.
Jacobson, Pauline (1994a). Binding connectivity in copular sentences. In Harvey, Mandy and Santelmann, Lynn, editors, Proceedings of SALT IV, pages 161–178. Cornell University.
Jacobson, Pauline (1994b). i-within-i effects in a variable-free semantics and a categorial syntax. In Dekker, Paul and Stokhof, Martin, editors, Proceedings of the Ninth Amsterdam Colloquium. University of Amsterdam.
Jacobson, Pauline (1996a). The locality of interpretation: The case of binding and coordination. In Proceedings of SALT 6, Cornell Working Papers in Linguistics. Cornell University.
Jacobson, Pauline (1996b). The syntax/semantics interface in categorial grammar. In Lappin, Shalom, editor, The Handbook of Contemporary Semantic Theory, pages 89–116. Blackwell Publishers.
Jacobson, Pauline (1999). Towards a variable-free semantics. Linguistics and Philosophy, 22(2):117–184.
Jacobson, Pauline (2000). Paycheck pronouns, Bach-Peters sentences, and variable-free semantics. Natural Language Semantics, 8(2):77–155.
Jacobson, Pauline (2001).
Binding without pronouns (and pronouns without binding). manuscript, Brown University.
Jäger, Gerhard (2001). Anaphora and quantification in categorial grammar. In Moortgat, Michael, editor, Logical Aspects of Computational Linguistics, number 2014 in Lecture Notes in Artificial Intelligence, pages 70–89. Springer, Berlin, Heidelberg.
Janssen, Theo (1997). Compositionality. In van Benthem, Johan and ter Meulen, Alice, editors, Handbook of Logic and Language, pages 417–473. Elsevier, MIT Press.
Kamp, Hans (1981). A theory of truth and semantic representation. In Groenendijk, Jeroen, Janssen, Theo, and Stokhof, Martin, editors, Formal Methods in the Study of Language, pages 277–322. Amsterdam.
Kamp, Hans and Reyle, Uwe (1993). From Discourse to Logic. Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer, Dordrecht.
Kanazawa, Makoto (1994). Weak vs. strong readings of donkey sentences and monotonicity inference in a dynamic setting. Linguistics and Philosophy, 17(2):109–158.
Kandulski, M. (1988). The equivalence of nonassociative Lambek categorial grammars and context-free grammars. Zeitschrift für Mathematische Logik und Grundlagen der Mathematik, 34:41–52.
Karttunen, Lauri (1969). Pronouns and variables. In Binnick, Robert I., Davison, Alice, Green, Georgia M., and Morgan, Jerry L., editors, Papers from the Fifth Regional Meeting of the Chicago Linguistic Society, pages 108–115. University of Chicago.
Karttunen, Lauri (1977). Syntax and semantics of questions. Linguistics and Philosophy, 1:3–44.
Kayne, Richard (1978). Logical types for natural language. UCLA Occasional Papers in Linguistics 3.
Kayne, Richard (1994). The Antisymmetry of Syntax. MIT Press, Cambridge (Mass.).
Keenan, Edward L. and Faltz, Leonard M. (1985). Boolean Semantics for Natural Language. Reidel, Dordrecht.
Kehler, Andrew (1993). A discourse copying algorithm for ellipsis and anaphora resolution. In Proceedings of the Sixth Conference of the European Chapter of the Association for Computational Linguistics (EACL-93), pages 203–212. Utrecht.
Kempson, Ruth M. and Cormack, Annabel (1983). Type lifting rules and VP anaphora. In Barlow, Michael T., Flickinger, Daniel P., and Wescoat, Michael T., editors, Proceedings of WCCFL 2, pages 140–152.
Kratzer, Angelika (1998). Scope or pseudoscope? Are there wide scope indefinites? In Rothstein, Susan, editor, Events and Grammar, pages 163–196. Kluwer, Dordrecht.
Krifka, Manfred (1999). For a structured account of questions and answers. In Smith, Carlota, editor, Proceedings to Workshop on Spoken and Written Text. University of Texas at Austin.
Kurtonina, Natasha (1995). Frames and Labels: A Modal Analysis of Categorial Inference. PhD thesis, University of Utrecht.
Lakoff, George (1971). Presupposition and relative well-formedness. In Steinberg, Danny D. and Jakobovits, Leon A., editors, Semantics: An Interdisciplinary Reader in Philosophy, Linguistics and Psychology, pages 329–340. Cambridge University Press, Cambridge (UK).
Lamarche, François and Retoré, Christian (1996). Proof nets for the Lambek calculus. In Abrusci, V.
Michele and Casadio, Claudia, editors, Third Roma Workshop: Proofs and Linguistic Categories, pages 241–262. CLUEB, Bologna.
Lambek, Joachim (1958). The mathematics of sentence structure. American Mathematical Monthly, 65:154–170.
Lambek, Joachim (1961). On the calculus of syntactic types. In Jakobson, Roman, editor, Structure of Language and Its Mathematical Aspects. Providence, RI.
Lambek, Joachim (1988). Categorial and categorical grammar. In Oehrle, Richard T., Bach, Emmon, and Wheeler, Deirdre, editors, Categorial Grammars and Natural Language Structures, pages 297–317. Reidel, Dordrecht.
Larson, Richard (1988). On the double object construction. Linguistic Inquiry, 19:335–392.
Lewis, David (1975). Adverbs of quantification. In Keenan, Edward L., editor, Formal Semantics, pages 3–15. Cambridge University Press.
Link, Godehard (1991). Plural. In von Stechow, Arnim and Wunderlich, Dieter, editors, Handbook of Semantics. de Gruyter, Berlin, New York.
May, Robert (1985). Logical Form: Its Structure and Derivation. MIT Press, Cambridge (Mass.).
Merchant, Jason (1999). The syntax of silence: Sluicing, islands, and identity in ellipsis. PhD thesis, University of California at Santa Cruz.
Montague, Richard (1974). Formal Philosophy. Yale University Press, New Haven.
Moortgat, Michael (1988). Categorial Investigations. Logical and Linguistic Aspects of the Lambek Calculus. Foris, Dordrecht.
Moortgat, Michael (1990). The quantification calculus: Questions of axiomatization. In Hendriks, Hermann and Moortgat, Michael, editors, Theories of Flexible Interpretation, volume R1.2.A of Dyana deliverable. Centre of Cognitive Science, Edinburgh.
Moortgat, Michael (1996a). Generalized quantification and discontinuous type constructors. In Sijtsma, Wietske and von Horck, Arthur, editors, Discontinuous Constituency. De Gruyter, Berlin.
Moortgat, Michael (1996b). In situ binding: A modal analysis. In Dekker, Paul and Stokhof, Martin, editors, Proceedings of the Tenth Amsterdam Colloquium, pages 539–549. ILLC, University of Amsterdam.
Moortgat, Michael (1997). Categorial type logics. In van Benthem, Johan and ter Meulen, Alice, editors, Handbook of Logic and Language, chapter 2, pages 93–178. Elsevier, MIT Press.
Moot, Richard and Bernardi, Raffaella (2000). Generalized quantifiers in declarative and interrogative sentences. Proceedings of ICoS-2.
Morrill, Glyn (1990). Intensionality and boundedness. Linguistics and Philosophy, 13:699–726.
Morrill, Glyn (1994). Type Logical Grammar. Kluwer, Dordrecht.
Morrill, Glyn (1995). Discontinuity in categorial grammar. Linguistics and Philosophy, 18:175–219.
Morrill, Glyn (2000). Type-logical anaphora. Report de Recerca LSI-00-77-R, Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya.
Morrill, Glyn, Leslie, Neil, Hepple, Mark, and Barry, Guy (1990). Categorial deduction and structural operations. In Barry, Guy and Morrill, Glyn, editors, Studies in Categorial Grammar, volume 5 of Edinburgh Working Papers in Cognitive Science, pages 1–21. University of Edinburgh.
Morrill, Glyn and Merenciano, Joseph Maria (1996). Generalising discontinuity. Traitement Automatique des Langues, 37(2):119–143.
Morrill, Glyn and Solias, Teresa (1993). Tuples, discontinuity and gapping.
In Proceedings of the Meeting of the European Chapter of the Association for Computational Linguistics, pages 287–297. Utrecht.
Pankrat'ev, Nikolai (1994). On the completeness of the Lambek Calculus with respect to relativized relational semantics. Journal of Logic, Language, and Information, 3:233–246.
Partee, Barbara and Rooth, Mats (1983). Generalized conjunction and type ambiguity. In Bäuerle, Rainer, Schwarze, Christoph, and von Stechow, Arnim, editors, Meaning, Use, and Interpretation of Language, pages 361–383. de Gruyter, Berlin, New York.
Pentus, Martin (1993). Lambek grammars are context-free. In Proceedings of the 8th Annual IEEE Symposium on Logic in Computer Science. Montreal.
Pentus, Martin (1994). Language completeness of the Lambek calculus. In Proceedings of the Ninth Annual IEEE Symposium on Logic in Computer Science, Montreal.
Pentus, Martin (2003). Lambek calculus is NP-complete. Technical Report TR-2003005, CUNY Ph.D. Program in Computer Science.
Pereira, Fernando (1990). Categorial semantics and scoping. Computational Linguistics, 16(1):1–10.
Pesetsky, David (1995). Zero Syntax: Experiencers and Cascades. MIT Press, Cambridge (Mass.).
Postal, Paul (1972). A global constraint on pronominalization. Linguistic Inquiry, 3:5–59.
Pullum, Geoffrey K. (1991). Footloose and context-free. In The Great Eskimo Vocabulary Hoax, pages 131–138. The University of Chicago Press, Chicago.
Reinhart, Tanya (1976). The Syntactic Domain of Anaphora. PhD thesis, MIT, Cambridge (Mass.).
Reinhart, Tanya (1983). Anaphora and Semantic Interpretation. Croom Helm.
Reinhart, Tanya (1992). Wh-in-situ: an apparent paradox. In Dekker, Paul, editor, Proceedings of the Eighth Amsterdam Colloquium, pages 483–491. University of Amsterdam.
Reinhart, Tanya (1995). Interface Strategies. OTS Working Papers. Research Institute for Language and Speech, Utrecht University.
Reinhart, Tanya (1997). Quantifier scope: How labor is divided between QR and choice functions. Linguistics and Philosophy, 20:335–397.
Reniers, Fabien (1997). How to (s)cope with indefinites. Master's thesis, University of Utrecht.
Restall, Greg (2000). An Introduction to Substructural Logics. Routledge, London, New York.
Roorda, Dirk (1991). Resource logics: Proof-theoretical investigations. PhD thesis, University of Amsterdam.
Rooth, Mats (1987). Noun phrase interpretation in Montague Grammar, File Change Semantics, and Situation Semantics. In Gärdenfors, Peter, editor, Generalized Quantifiers. Reidel, Dordrecht.
Rooth, Mats (1992). Ellipsis redundancy and reduction redundancy. In Berman, Steve and Hestvik, Arild, editors, Proceedings of the Stuttgart Ellipsis Workshop, Arbeitspapiere des SFB 340 "Sprachtheoretische Grundlagen für die Computerlinguistik", Nr. 29. IBM Heidelberg.
Ross, John (1969). Guess who? In Binnick, Robert I., Davison, Alice, Green, Georgia M., and Morgan, Jerry L., editors, Papers from the Fifth Regional Meeting of the Chicago Linguistic Society, pages 252–286. University of Chicago.
Ruys, Eddy (1995). Weak crossover as a scope phenomenon.
manuscript, Utrecht University.
Sag, Ivan A. (1976). Deletion and Logical Form. PhD thesis, MIT.
Seldin, Jonathan P. and Hindley, J. Roger, editors (1980). To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism. Academic Press Limited.
Sem, Helle Frisak (1994). VP-ellipsis and DRT. In DYANA deliverable, Task 2.2, subtask 4. University of Amsterdam.
Shieber, Stuart M., Pereira, Fernando, and Dalrymple, Mary (1996). Interaction of scope and ellipsis. Linguistics and Philosophy, 19(5):527–552.
Solias, Teresa (1992). Gramáticas Categoriales, Coordinación Generalizada y Elisión. PhD thesis, Universidad Autónoma de Madrid.
Stalnaker, Robert (1998). On the representation of context. Journal of Logic, Language, and Information, 7(1):3–19.
Staudacher, Peter (1987). Zur Semantik indefiniter Nominalphrasen. In Asbach-Schnitker, Brigitte and Roggenhofer, Johannes, editors, Neuere Forschungen zur Wortbildung und Historiographie der Linguistik. Festgabe für Herbert E. Brekle zum 50. Geburtstag, pages 239–258. Narr, Tübingen.
Steedman, Mark (1990). Gapping as constituent coordination. Linguistics and Philosophy, 13(2):207–263.
Steedman, Mark (1996). Surface Structure and Interpretation. MIT Press, Cambridge (Mass.).
Steedman, Mark (2000). The Syntactic Process. MIT Press, Cambridge (Mass.).
Szabolcsi, Anna (1989). Bound variables in syntax (are there any?). In Bartsch, Renate, van Benthem, Johan, and van Emde Boas, Peter, editors, Semantics and Contextual Expressions, pages 295–318. Foris.
Szabolcsi, Anna (1992). Combinatory grammar and projection from the lexicon. In Sag, Ivan and Szabolcsi, Anna, editors, Lexical Matters. CSLI, Stanford.
Szabolcsi, Anna (1997). Strategies for scope taking. In Szabolcsi, Anna, editor, Ways of Scope Taking, pages 109–154. Kluwer, Dordrecht.
Szabolcsi, Anna (2000). Cross-sentential anaphora in Combinatory Grammar. manuscript, New York University.
Thrainsson, Hoskuldur (1976). Reflexives and subjunctives in Icelandic. In Proceedings of NELS 6, pages 225–239. Université du Québec à Montréal.
Tiede, Hans-Joerg (1999). Deductive Systems and Grammars: Proofs as Grammatical Structures. PhD thesis, Indiana University.
van Benthem, Johan (1983). The semantics of variety in categorial grammar. Report 83–29, Simon Fraser University, Burnaby.
van Benthem, Johan (1991). Language in Action. Elsevier, Amsterdam.
Versmissen, Koen (1991). Discontinuous type constructors in Categorial Grammar. Master's thesis, University of Utrecht.
Wansing, Heinrich Theodor (1993). The Logic of Information Structures. Springer Lecture Notes in Artificial Intelligence 681. Springer Verlag, Berlin.
Wescoat, Michael T. (1989). Sloppy readings with embedded antecedents. manuscript, Stanford University.
Williams, Edwin (1997). Blocking and anaphora. Linguistic Inquiry, 28(4):577–628.
Winter, Yoad (1997). Choice functions and the scopal semantics of indefinites. Linguistics and Philosophy, 20:399–467.
Zaefferer, Dietmar (1984).
Frageausdrücke und Fragen im Deutschen: Zu ihrer Syntax, Semantik und Pragmatik. Fink, München.
About the Author
Gerhard Jäger is professor of linguistics at the University of Bielefeld.
Index
α-equivalence, 33, 141
αβη-equivalence, 33, 141
β-redex, 135, 137, 139
β-reduction, 33, 36, 131, 135, 140, 163, 249, 253, 254
ε-operator, 225, 249
η-redex, 135, 136, 139, 140, 162
η-reduction, 33, 36, 43, 45
λ-abstraction, 31, 71, 81, 136, 137
λ-calculus, ix, x, 4, 12, 13, 16, 31, 33, 34, 36–38, 43, 67, 70, 73, 135, 141, 221, 224, 225
λ-conversion, 33, 106
A, 6, 9, 13, 14, 99–101, 103, 107, 110
A'-movement, 45
Abusch, 244, 246
accessibility relation, 57
accidental coreference, 110, 112, 156, 172, 176, 177
accusative, 76, 78, 86, 88, 158, 259, 265
Ades, 65
adjective, 3, 122
adverb, 3, 158, 212
Ajdukiewicz, 3, 65
alphabet, 5, 7, 12, 42
ambiguity, 15, 74, 103, 117, 134, 154, 158, 176, 185–187, 191, 194, 195, 200, 204, 207, 212, 261
  lexical a., 15, 53, 116, 261
  scope a., 163, 166
  spurious a., 15, 53, 207, 208, 229
  structural a., 14, 15, 53, 74, 143, 237
anaphora, ix–xii, 30, 67, 69, 70, 74, 79, 86, 91–93, 96, 98, 100, 101, 106, 108, 109, 112, 114–118, 121, 122, 125, 137, 142, 143, 145, 150, 151, 155, 165, 167, 168, 170–173, 177–179, 181, 183–187, 193, 198, 204, 211, 213, 214, 219, 222, 223, 225–227, 232, 238, 256, 260, 267–269
  deep a., 181
  donkey a., x, xi, 256, 267, 268
  surface a., 181
Anderson, 28
Andréka, 61
antecedent, 6, 8–10, 13, 14, 23, 25, 26, 28–31, 37, 39, 41, 49, 52, 55, 58, 59, 61, 63, 64, 70, 73, 83, 84, 88, 91, 93, 95, 97, 109, 110, 112, 114, 118–122, 137, 138, 142–145, 155, 156, 165, 167, 172, 175–178, 181, 183–193, 196, 198, 200, 202–205, 209, 210, 222, 244, 260, 263, 264, 266
application (function a.), 4, 6, 8–10, 13, 17, 23, 25, 31, 41, 42, 54, 56, 75, 80, 100, 108, 121, 125, 133, 135–137, 139, 140, 148, 162, 163, 176, 185, 186, 190, 238, 247–249
  structural function a., 240, 254
argument category, 2–4, 45
arrow, 31, 54, 55, 59, 146, 147
assignment function, 32, 71, 73, 206, 215, 225, 228, 229
associative, 31, 54, 57–59, 61, 65, 83, 86, 144, 148, 150, 155
auxiliary, 33, 69, 103, 173, 182, 184, 185, 198, 199, 201, 204, 209, 210, 227, 229, 261
auxiliary inversion, 103, 173
axiom, 6–10, 13, 23, 25, 27, 42, 54–56, 59, 92, 93, 122, 146–148, 150
  -atic presentation, 54, 146, 148
  -atic system, 54–56, 59, 147, 148
identity a., 6, 9, 15, 17, 26, 41, 49, 51, 54, 58, 62, 119, 130, 131, 137, 146, 147 scheme, 6, 25, 26, 49, 54, 62
B, 77, 103, 107 Bach, 83, 102, 171, 172 Bach-Peters sentence, 110, 112, 154, 176 Bar-Hillel, 1, 2, 9, 10, 61, 65 Barry, 39, 92 Barss, 168 Barwise, 212 Basic Categorial Grammar, 1, 3–11, 13, 15–17, 19–21, 23, 25, 26, 40– 43, 46, 49, 56, 61, 62, 65 BCG, see Basic Categorial Grammar Belnap, 28 Bernardi, 165 binary reduction lemma, 63, 64, 73 binding, 45, 69, 70, 74–79, 81, 82, 84, 91– 93, 98, 99, 102–104, 106, 108, 109, 112, 114, 151, 154, 156, 157, 159–161, 165–172, 175– 179, 203, 206, 209, 214, 226, 232, 238, 243, 247, 254, 264, 266 backward b., 111, 171, 172, 175–178, 214 dynamic b., 211, 217, 234, 243, 247, 252, 253 Interpretation Rule, 94–96 Theory, 75, 237 Principle A, 78 Principle B, 78, 88, 91, 209 Principle C, 75 Bresnan, 168, 176 bridging, 265, 267, 269 implicational b., 203 Buszkowski, 9, 62 c-command, 16, 75, 76, 88, 91, 114, 116, 118, 151, 167–172, 178 canonical model, 60, 150, 151 Carpenter, 45, 66, 163, 165 Cartesian product, 119 Casalegno, 246 cataphora, 172, 176, 177 Categorial Grammar, ix, x, 1, 4, 11, 16, 17, 22, 23, 25, 46, 61, 65–67, 92, 96, 114, 171, 200 CFG, see context free grammar CG, see Categorial Grammar Chierchia, 102, 103, 105 choice function, 244, 245 approach, 245–247, 250, 254, 256 Chomsky, 43, 61, 65, 75, 108, 167 conjecture, 61 Chung, 258, 260, 263, 266, 267
Church-Rosser property, 36, 163 classical logic, 26, 27, 29, 30, 38 Cohen, 61 combinator, 17, 21–23, 46–49, 73, 75–77, 81, 96, 98, 102, 114, 116, 151, 160, 178 Combinatory Categorial Grammar, 96 combinatory logic, 22, 74 common noun, 3, 5, 239, 241, 249, 257, 261 complete, 58, 60, 61, 142, 144 completeness, xi, 59, 61, 149–151, 250 complexity, 9, 34, 50, 60, 61, 65, 122, 123, 125, 130, 141, 150, 224, 268 compositional interpretation, 12, 16 compositionality, 73, 143 principle of c., 67, 72, 219 surface c., 109 concatenation, 3, 83, 85, 86 confluent, 141, 142 constituent, 13, 14, 16, 18, 20, 21, 30, 46, 71, 76, 81, 83–85, 91, 94, 96–98, 100, 104, 105, 121, 122, 168, 170, 171, 183, 202, 213, 247, 264, 268 constructive logic, 28 context free grammar, 9–11, 61, 62, 64–66 Contraction, 26, 27, 29, 30, 49, 74, 79, 92, 93, 95, 117–119, 121, 122, 137 Cooper, 81, 109, 163 coordinate structure constraint, 258 coordination, 17–22, 43, 48, 68, 74, 78, 105, 106, 115, 158, 182, 183, 199, 205, 210 Boolean c., 18, 19 non-constituent c., 46, 106 scheme, 18–20 Cormack, 200 curried, 4, 239 Curry, 21, 22, 35, 49, 102, 178 Curry-Howard correspondence, 31, 34, 35, 38, 48, 65, 114, 118 Curry-Howard isomorphism, 35 Cut, xi, 6–10, 13–17, 25–27, 36, 41, 49–54, 56, 58, 62–64, 95, 96, 116, 118, 120–126, 128, 130–132, 142, 147, 148, 150–152, 160, 162, 222 elimination, xi, 49, 51, 54, 95, 96, 116, 118, 122, 123, 125, 130, 142, 160–162, 222 Dahl, 191, 205 Dalrymple, 69, 189, 190, 202, 205, 206 dative, 259, 265, 266 de Groote, 66 de-accenting, 192, 206
decidability, xi, 49, 50, 53, 96, 118, 123, 125, 160, 161, 175, 176, 222, 224 decidable, 8, 9, 53, 95, 118, 123, 142 Dekker, 212–214, 216, 218, 219, 222, 224–226, 228, 229, 232–234, 243, 252, 267 denotation, 1, 4, 11, 12, 20, 56, 71, 110, 156, 173, 176, 185, 215, 218–220, 224, 225, 228, 239, 248, 252, 260 designated category, 7–10, 42, 61, 64, 65 determiner, 43, 107, 111, 164, 211, 212, 240, 241, 243, 245, 249, 269 interrogative d., 175, 261 discontinuity, 79, 83 Discourse Representation Structure, 16, 216 Discourse Representation Theory, 16, 74, 212, 213, 216, 218, 222, 225, 239, 244, 245, 266, 267 disjunction, 26, 35 Došen, 58, 61 dominance, 16, 127, 128 immediate d., 127 Donald Duck problem, 244, 247, 248, 251, 254, 256, 266 donkey anaphora, see anaphora donkey sentence, 211, 212, 217, 218, 226, 234, 238–240, 243, 252, 253 double object construction, 76, 102, 151, 158, 170, 171, 182 Dowty, x DRS, see Discourse Representation Structure DRT, see Discourse Representation Theory Dunn, 28 dynamic binding, see binding Dynamic Predicate Logic, 218, 226, 227, 234, 268 Dynamic Semantics, 212–216, 239, 268 E-type pronoun, 109 ellipsis, x, xi, 19, 69, 79, 114–116, 154, 177, 178, 181–196, 200, 202, 204–207, 209, 210, 224, 256, 257, 260, 266, 268 antecedent contained deletion, 183, 184 cascaded e., 187, 188, 194, 199 coordination e., 68, 115, 183 gapping, 177, 182, 183 sluicing, x, xi, 183, 184, 213, 256, 257, 259–261, 263, 264, 266–269 stripping, 115, 182, 183
verb phrase e., x, xi, 69, 181, 182, 184, 185, 193, 194, 196, 198–203, 205–210, 257 empty abstraction, 37, 38 empty category, 16, 171 empty set problem, 245, 247, 248, 252 Endriss, 247, 256, 269 Engdahl, 102, 109 Evans, 109 existential closure, 216, 218, 228, 244, 245, 249, 256, 266, 268 extensionality, 33, 110 f-command, 172 Faltz, 68, 69 Familiarity Condition, 222 Farkas, 244, 248 Feys, 21, 22, 35, 49, 102, 178 Fiengo, 185, 191, 199, 202, 203 finite reading property, xi, 53, 95, 96, 118, 153, 160, 161, 222, 224 Fodor, 243, 244 Fox, 191 frame, 57, 60, 61 associative f., 57–59, 61, 144, 148, 150 conditions, 60 language f., 61 relational f., 60 ternary f., 56–58, 60, 61 free relative, 104 Frege, 65 function composition, 21, 22, 46, 47, 49, 77, 81, 96 function space formation, 11, 119, 221 functional gap, 103 functional question, 102, 104, 116, 121, 154, 175, 179 functional reading, 102–104 G, 98–101, 107, 108, 110, 114, 115, 152–155 Gaifman, 9, 10, 61, 65 Gamut, x Gardent, 206 Gawron, 78, 168, 178, 188–190, 194, 195, 199 Gazdar, 73 Geach, 47 Geach rule, 47, 73, 76, 96, 98, 110, 153 gender, 97, 237 generative capacity, 65, 66, 73 Gentzen, 26 Gentzen style sequent presentation, see sequent presentation Gentzen style sequent system, see sequent system Geurts, 246, 256
Girard, 2, 29, 66 glue language, 67, 73, 74 goal category, 2 Greibach, 10 Greibach Normal Form, 10, 61 Groenendijk, 102, 104, 212, 218, 261, 268 Hankamer, 181 Hardt, 185, 192, 202, 203 Hausser, 104 Heim, 16, 70, 212, 244 Hepple, 39, 66, 92–97, 115–119, 142 Hirschbühler, 199–201 Howard, 35 Husserl, 65 hyperintensional, 142 hyperintensionality, 145 hypothetical reasoning, 25, 43, 45, 86, 100, 156, 157, 165, 172, 178, 188 i-within-i effects, 106, 154 in situ binder, see q indefinite, x, xi, 111, 164, 184, 196, 211–214, 218–224, 226, 227, 229, 232, 233, 236–241, 243–251, 253, 254, 256, 257, 259–261, 263–268 intension, 142 interpolation, 62, 63 interpretation function, 32, 57, 59, 215, 224, 248 Intuitionistic Logic, ix, 25, 27–31, 33–35, 38, 92, 118, 121, 135 island, 46, 257–259, 263, 266 adjunct i., 258 complex NP i., 258 embedded question i., 258 sentential subject i., 258 Jäger, 122 Jacobson, 74, 92, 96, 97, 100, 102–104, 108, 109, 112, 114–117, 119, 151–155, 167, 175, 176, 183 Janssen, 17 Kamp, 16, 74, 212 Kanazawa, 212 Kandulski, 66 Karttunen, 104, 109 Kayne, 168, 171 Keenan, 68, 69 Kehler, 191 Kempson, 200 Kleene star, 7 Klein, 73 Kratzer, 16, 70, 244–246
Krifka, 104 Kurtonina, 60, 61 L, xi, 17, 23, 25, 26, 29–31, 39–43, 45, 46, 48–51, 53–66, 68, 72, 73, 80, 83, 84, 86, 92, 94–96, 117, 119, 121–125, 127, 132, 141, 142, 145–150, 155, 158–160, 163, 201 labeled deduction, 13, 34 Ladusaw, 258, 260, 263, 266, 267 Lakoff, 206 Lamarche, 66 Lambek, ix, 17, 23, 25, 31, 39, 49, 50, 53– 55, 65, 160 Lambek calculus, see L Lambek Calculus with Limited Contraction, see LLC Lamping, 69 Larson, 171 Lasnik, 168 left node raising, 21 left rule, 49, 119 Leslie, 39, 92 Lewis, 212 lexicon, 5, 7–9, 12, 14, 17, 70, 72, 74, 91, 95, 100, 115, 117, 154, 178, 184, 199 LF, see Logical Form lifting, 20–22, 46, 48, 53, 54, 73, 76, 81, 96, 98, 160 Linear Logic, xi, 2, 29, 30, 37–39, 49, 68, 70, 92, 114, 115 Link, 173 LLC, 117, 119–121, 123–126, 128, 130, 135, 137, 138, 141–144, 146, 148–155, 157, 160–163, 171, 172, 178, 179, 181, 184, 188, 189, 193, 198, 201, 206, 210, 218–222, 232, 256, 260, 265, 267, 268 LLC+∧ , 221, 222, 232, 265, 267 Logical Form, 16, 70, 266 logical rule, 26–29, 41, 43, 49, 51–53, 56, 86, 119, 122, 160, 222 m-command, 16 many pronoun puzzle, 199 May, 163, 185, 191, 199, 202, 203 McCloskey, 258, 260, 263, 266, 267 meaning multiplication, 70, 181, 183, 184 Merchant, 183, 257–259, 263 Merenciano, 83 Mikulas, 61 modal logic, xi, 57 model, xi, 12, 16, 56–60, 71, 92, 142, 144, 145, 148–151, 157, 181, 215, 220, 225, 232, 252
theory, xi, 56, 142 Modus Ponens, 25, 53, 121, 155, 185 monostratal, 16, 172 Monotonicity, 26–28, 30, 68, 143 Montague, 12, 20, 65, 69, 163, 211, 213 Montague Grammar, 211 Moortgat, xiii, xiv, 46, 49, 66, 79–84, 87, 88, 115, 157, 159, 165, 172, 195, 268 Moot, 165 Morrill, xiv, 39, 45, 46, 66, 79, 82–84, 86, 91, 92, 115, 118, 122, 154, 166, 176, 184, 247, 263, 265, 269 multimodal, xii, 46, 49, 66, 83, 96, 119, 165, 247 Multimodal Type Logical Grammar, 66 natural deduction, 25, 26, 34, 40, 66, 71–73, 80, 85, 86, 88, 94, 121, 125, 127, 130, 132, 134, 135, 142, 155, 160–162, 221–223 negation, xii, 27, 214, 216, 226–229, 232, 235, 250, 267 nominative, 76, 78, 88 non-associative, xii, 31, 60, 65, 66, 83 non-subject sloppy reading, 203, 209 normalization, 36, 68, 137, 138, 141, 162 β-n., 36, 135, 136, 138–140, 162, 163 η-n., 36, 135, 137, 140, 162, 163, 187 proof n., 36, 135, 141, 142, 163 strong n., xi, 36, 137, 222 term n., 36, 163 Novelty Condition, 222, 267 NP-complete, 65 number (morph. category), 97 pair formation, 32, 33, 37, 83 Pankrat’ev, 61 parallelism, 65, 181, 202–208, 264, 265 Partee, 47, 172 Peirce’s Law, 27 Pentus, 61, 62, 65, 66, 73 Pereira, 69, 165, 189, 190, 202, 205, 206 Permutation, 26, 27, 29, 30, 46, 49, 92, 95, 194 Pesetsky, 168, 171 Peters, x, 78, 168, 178, 188–190, 194, 195, 199 pied piping, 80 pied-piped, 78, 175 PLA, see Predicate Logic with Anaphora plural, 173, 268 polymorphic, 17, 18, 78, 105, 158, 175, 204, 210, 227, 228, 235, 241 polymorphism, 104, 199, 224, 227, 235 possessive, 100, 172 possible world semantics, 57
Postal, 167 pre-order, 54, 55, 61 precedence, 5, 97, 127, 128, 151, 167–172, 179 Predicate Logic with Anaphora, 213–219, 224, 227, 232, 234, 235, 239, 243, 267 presupposition, 264, 265, 269 product, 25, 28, 29, 37, 38, 43, 49, 54–56, 59, 60, 62, 83, 84, 86, 119, 121, 130, 131, 134, 144, 146–148, 150, 163 elimination, 37, 131 pronoun, 69, 70, 77–79, 84, 86–88, 91, 93, 96, 98–100, 102–112, 114, 116, 117, 143, 154–156, 166–173, 175–179, 185–188, 191, 192, 195, 196, 199, 202, 203, 205, 208–211, 213–222, 224, 226, 229, 232, 233, 236–239, 246–248 anaphoric p., 74 bound p., 70, 77–79, 81, 82, 105, 112, 165–167, 175, 176, 211, 254, 256 problem, 246–248, 256 donkey p., xi, 235, 267 interrogative p., 103, 175, 260, 261, 265, 266 paycheck p., 109, 110, 112, 177 reflexive p., 68, 74–78, 81, 82, 84, 86, 88, 94, 127, 145, 171, 172 relative p., 43, 45, 46, 104, 109–112, 156, 167, 184 proof nets, xii, 66 proof search, 9, 50, 53, 96, 118, 123 proof tree, 13, 14, 40, 41, 72, 127–135, 137, 138, 140, 141, 155, 161–163, 165, 166, 190, 193, 194, 196, 198, 222, 254 proposition, 1, 23, 27, 29, 35, 97, 104, 110, 112, 206 prosodic labeling, 85 Pullum, 65, 73 q, 80–84, 115, 157, 159–166, 169, 174, 193–198, 201, 226, 229–231, 241–243, 247, 255, 268, 269 Quantifier Raising, 163 quantification, x, xi, 155, 157, 159, 160, 193, 195, 212, 214–217, 232, 249, 253, 268, 269 quantifier, 19–21, 70, 79–81, 84, 91, 99, 104, 105, 157–160, 163–169, 172, 175, 178, 179, 193–196, 198, 200, 201, 207, 210, 211, 214, 216, 219, 224, 226, 232,
238, 241, 243, 244, 246–248, 254, 260, 263, 268, 269 downward monotonic q., 256 existential q., 212, 215, 217, 225, 226, 254 interrogative q., 173 Quantifying In, 163 recipe, 39, 43, 66, 68, 73 reconstruction, 106, 170, 172, 173, 175–177, 179, 185, 213, 257, 259 recursion, 3, 72, 227 reflexive, see pronoun Reinhart, 78, 79, 167–169, 202, 244, 245 relative clause, 43, 70, 71, 91, 104, 105, 108, 167, 183, 198, 257 Relevant Logic, 28–30, 37, 38, 49, 118 Reniers, 246, 268 residuation, 56, 83, 84 resource conscious logic, 28 resource sensitivity, 67 Restall, 30 Retoré, 66 Reyle, 16, 74, 212 right node raising, 21, 105, 106, 154, 182, 183 right rule, 49, 119 Roorda, 62, 63, 66 Rooth, 47, 202, 203, 205, 212 Ross, 257, 259, 263, 267 rule of proof, 49, 121, 122, 125 rule of use, 49, 119, 122, 125 Russell, 65 Ruys, 268 Sag, 73, 181, 184, 193, 199, 205, 243, 244 Saraswat, 69 satisfaction, 213–216, 224 scope, xi, 45, 46, 79, 81, 83, 91, 160, 163–167, 190, 193–196, 198–202, 207, 208, 210–212, 214, 216, 225–227, 229, 232, 235–238, 240, 241, 243–245, 247, 248, 251, 254, 256, 260, 263–266, 268 double s., 269 inversion, 201 parallelism, 264 secondary wrap, see wrapping Sem, 191 semantic composition, 4, 65, 68, 70, 74, 79, 86, 93, 97, 185, 188, 219 sentential category, 223, 224 sequent, 6–10, 13, 15, 25, 31, 34, 36, 37, 39, 41, 49–51, 53–56, 58, 60–65, 72, 73, 80, 94, 95, 99, 119,
123–125, 130–132, 134, 142– 144, 147–149, 151, 153, 159, 160, 189, 221 calculus, 53 derivation, 13, 63, 66, 147, 198 format, 40, 41, 131, 132, 160, 166, 198 presentation, 6, 39, 41, 49–51, 55, 120, 123, 142, 146, 161 proof, 49–51, 123, 134 rule, 23, 50, 53, 73, 95, 121, 131, 132, 160, 161 system, 49, 50, 125, 142, 147, 148 Shamir, 9, 10, 61, 65 Shieber, 69, 189, 190, 202, 205, 206 Skolem function, 100, 102–104, 106, 109, 110, 112, 154, 173, 177, 186, 191, 220 slash, ix, 4, 5, 22, 23, 25, 65, 83, 86, 95–97, 119, 125, 142, 143, 150, 151 elimination, 25, 41, 46, 55, 56, 62, 86, 95, 143 introduction, 25, 41, 43, 46, 47, 49, 55, 56, 62, 86, 95 sloppy, 104, 154, 177, 178, 185, 187–192, 195, 196, 198, 202, 203, 205, 207–210 Solias, 83, 184 sound, 39, 58, 60, 142, 144 soundness, 58, 146, 148 specific, 111, 200, 235, 241, 244, 245, 268 sprouting, 267 Stalnaker, 213, 222 static closure, 227–229, 232, 233, 241, 247, 248, 252 Staudacher, 212 Steedman, 17, 22, 49, 65, 184 Stokhof, 102, 104, 212, 218, 261, 268 strict, 79, 97, 119, 185–189, 191, 192, 195, 196, 201, 205, 207–209, 224 string recognition, 8, 9, 43, 61, 64 structural hierarchy, 17, 26, 29, 30 structural rule, 26–30, 37, 38, 46, 49, 60, 68, 74, 92, 118, 119, 137, 194 subformula, 9, 49, 50, 53 property, 50, 96, 118, 122, 123, 142, 161 substructural logic, 65, 198 succedent, 6, 9, 13, 14, 25, 28, 38, 39, 49, 52, 54, 58, 73, 121, 142 surface structure, 16, 17, 257 syntactic composition, 4, 145 Szabolcsi, 68, 69, 74–79, 81, 82, 87, 88, 102, 115, 213, 269 S, 49 T, 20–22, 75, 99, 101, 110 Tarski, 218
Thrainsson, 172 Tiede, 66 TLG, see Type Logical Grammar topicalization, 45, 175 trace, 16, 43, 45, 103 transformation, 16, 17, 41, 51, 135, 256 truth, 1, 11, 12, 18, 27, 28, 58, 59, 71, 142, 148, 150, 151, 212, 214, 216, 219, 220, 224, 225, 228, 233, 235, 248, 251, 252, 261 conditions, 207, 213, 214, 216, 217, 219, 224, 225, 227, 229, 237, 244, 245, 250, 254 Turing, 66 Type Logical Grammar, ix–xii, 1, 17, 23, 35, 43, 45, 46, 49, 66, 67, 70– 73, 79, 91, 115, 117, 157, 166, 172, 194, 198, 201, 213, 218, 219, 224, 225, 234, 235, 239, 247, 260, 263, 265, 267, 268
underspecification, 69 unselective binding, 240, 245, 248, 250, 256, 266
validity, 58, 59, 142, 146 van Benthem, ix, 61, 65 variable free, 67, 70, 73, 74, 106, 203, 213, 224 Versmissen, 83 von Fintel, 246 VPE, see ellipsis, verb phrase
Wall, x Wansing, 38 weak crossover, 102, 103, 111, 112, 154, 167, 177, 178 Wescoat, 202 Williams, 176, 178, 191 Winter, 245, 246, 268 wrapping, 76, 81, 83, 84, 88, 91, 92, 102, 151, 171 secondary wrap, 83–86
Z, 98–103, 105, 107, 108, 110, 112, 114–116, 151, 152, 155 Zaefferer, 104
TRENDS IN LOGIC

1. G. Schurz: The Is-Ought Problem. An Investigation in Philosophical Logic. 1997 ISBN 0-7923-4410-3
2. E. Ejerhed and S. Lindström (eds.): Logic, Action and Cognition. Essays in Philosophical Logic. 1997 ISBN 0-7923-4560-6
3. H. Wansing: Displaying Modal Logic. 1998 ISBN 0-7923-5205-X
4. P. Hájek: Metamathematics of Fuzzy Logic. 1998 ISBN 0-7923-5238-6
5. H.J. Ohlbach and U. Reyle (eds.): Logic, Language and Reasoning. Essays in Honour of Dov Gabbay. 1999 ISBN 0-7923-5687-X
6. K. Došen: Cut Elimination in Categories. 2000 ISBN 0-7923-5720-5
7. R.L.O. Cignoli, I.M.L. D’Ottaviano and D. Mundici: Algebraic Foundations of Many-valued Reasoning. 2000 ISBN 0-7923-6009-5
8. E.P. Klement, R. Mesiar and E. Pap: Triangular Norms. 2000 ISBN 0-7923-6416-3
9. V.F. Hendricks: The Convergence of Scientific Knowledge. A View From the Limit. 2001 ISBN 0-7923-6929-7
10. J. Czelakowski: Protoalgebraic Logics. 2001 ISBN 0-7923-6940-8
11. G. Gerla: Fuzzy Logic. Mathematical Tools for Approximate Reasoning. 2001 ISBN 0-7923-6941-6
12. M. Fitting: Types, Tableaus, and Gödel’s God. 2002 ISBN 1-4020-0604-7
13. F. Paoli: Substructural Logics: A Primer. 2002 ISBN 1-4020-0605-5
14. S. Ghilardi and M. Zawadowski: Sheaves, Games, and Model Completions. A Categorical Approach to Nonclassical Propositional Logics. 2002 ISBN 1-4020-0660-8
15. G. Coletti and R. Scozzafava: Probabilistic Logic in a Coherent Setting. 2002 ISBN 1-4020-0917-8; Pb: 1-4020-0970-4
16. P. Kawalec: Structural Reliabilism. Inductive Logic as a Theory of Justification. 2002 ISBN 1-4020-1013-3
17. B. Löwe, W. Malzkorn and T. Räsch (eds.): Foundations of the Formal Sciences II. Applications of Mathematical Logic in Philosophy and Linguistics. Papers of a conference held in Bonn, November 10–13, 2000. 2003 ISBN 1-4020-1154-7
18. R.J.G.B. de Queiroz (ed.): Logic for Concurrency and Synchronisation. 2003 ISBN 1-4020-1270-5
19. A. Marcja and C. Toffalori: A Guide to Classical and Modern Model Theory. 2003 ISBN 1-4020-1330-2; Pb 1-4020-1331-0
20. S.E. Rodabaugh and E.P. Klement (eds.): Topological and Algebraic Structures in Fuzzy Sets. A Handbook of Recent Developments in the Mathematics of Fuzzy Sets. 2003 ISBN 1-4020-1515-1; Pb 1-4020-1516-X
21. V.F. Hendricks and J. Malinowski: Trends in Logic. 50 Years Studia Logica. 2003 ISBN 1-4020-1601-8
22. M. Dalla Chiara, R. Giuntini and R. Greechie: Reasoning in Quantum Theory. Sharp and Unsharp Quantum Logics. 2004 ISBN 1-4020-1978-5
23. B. Löwe, B. Piwinger and T. Räsch (eds.): Classical and New Paradigms of Computation and their Complexity Hierarchies. Papers of the conference “Foundations of the Formal Sciences III” held in Vienna, September 21–24, 2001. 2004 ISBN 1-4020-2775-3
24. G. Jäger: Anaphora and Type Logical Grammar. 2005 ISBN 1-4020-3904-2

springeronline.com