Structures and categories for the representation of meaning develops a way of representing the meanings of linguistic expressions which is independent of any particular language, allowing the expressions to be manipulated in accordance with rules related to their meanings which could be implemented on a computer. This requires a new two-dimensional notation, different from that of modern logic. The book begins with a survey of the contributions of linguistics, logic and computer science to the problem of representation, linking each with a particular type of formal grammar. Taking Frege as his guide, the author then presents a system of graphs organized by scope relations in which linguistic constituents are sub-graphs whose configuration is determined by their categories. In developing this system, he extends the notion of scope and argues that anaphoric and relative pronouns are structural signs not linguistic constituents. Certain count nouns are made the basis of this system and a new account of proper names, relating to count nouns, is given.
Structures and categories for the representation of meaning Timothy C. Potts Formerly Senior Lecturer in the Department of Philosophy, University of Leeds
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521434812

© Timothy C. Potts 1994

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 1994
Reprinted 1996
This digitally printed version 2007

A catalogue record for this publication is available from the British Library

Library of Congress Cataloguing in Publication data
Potts, Timothy C.
Structures and categories for the representation of meaning / Timothy C. Potts
p. cm.
Includes bibliographical references and index.
ISBN 0 521 43481 5
1. Grammar, comparative and general. 2. Categorization (linguistics). 3. Semantics. 4. Language and logic. 5. Computational linguistics. I. Title
P151.P66 1993
401'.43-dc20 93-788 CIP

ISBN 978-0-521-43481-2 hardback
ISBN 978-0-521-04250-5 paperback
to
Peter Thomas Geach whose lectures on the theory of syntax at the University of Leeds in 1967 originally inspired this work
Die stillschweigenden Abmachungen zum Verständnis der Umgangssprache sind enorm kompliziert. (Wittgenstein, 1921, 4.002)
[The tacit conventions on which the understanding of everyday language depends are enormously complicated.]

Quia quaesisti a me, quomodo oportet incedere in thesauro scientiae acquirendo, tale a me tibi super hoc traditur consilium: ut per rivulos, et non statim in mare, eligas introire; quia per facilia ad difficilia oportet devenire. (Aquinas, letter De modo studendi)
[Since you have asked me how one ought to set about acquiring the treasure of knowledge, this is the advice I give you: choose to enter by the little streams, and not straight into the sea; for one ought to come to the difficult by way of the easy.]
Contents

Preface
Acknowledgments

1 Linguistics: strings
1.1 String grammars
1.2 Semantic roles
1.3 Passives
1.4 Pronouns and relative clauses
1.5 The semantic component

2 Logic: trees
2.1 Tree grammars
2.2 Logic and meaning
2.3 Operator and operand
2.4 Categorial grammar
2.5 Quantification

3 Computer science: graphs
3.1 Graphs and graph grammars
3.2 Semantic networks
3.3 Conceptual dependency
3.4 Frames

4 Categorial graphs
4.1 Scopeways
4.2 Converging scope: (1) pronouns
4.3 Converging scope: (2) relative clauses
4.4 Scope at higher levels
4.5 Sub-categorization

5 Basic categories: count nouns
5.1 Frege's categorization of count nouns
5.2 Difficulties in Frege's view
5.3 Count nouns as a basic category
5.4 Generic propositions

6 Basic categories: pointers
6.1 The category of pointers
6.2 Proper names
6.3 The relationship of proper names to count nouns

7 Quantifiers, pronouns and identity
7.1 Quantification revisited
7.2 Anaphoric pronouns and relative clauses revisited
7.3 Plurative and numerical quantifiers
7.4 Identity

Epilogue
Bibliography
Index
Preface
This work addresses the representation problem - to use the jargon of computer scientists. To be sure, they speak of the representation of knowledge, but that is a misnomer, reflecting their intentions rather than the nature of the problem. What counts as knowledge must be true, yet any notation in which we can express what is true must equally allow us to express what is false. The problem, therefore, is how best to represent the meanings of linguistic expressions so that they may be manipulated in accordance with rules, such as rules of inference or of translation. One might call this the 'semantic form' of expressions, by analogy with 'logical form'.

My interest is restricted to expressions of everyday language. This is not a synonym for 'natural language'. The implied contrast is with technical language, for example the language of mathematics, which might also qualify as natural language. I also assume that, in the case of expressions which are accounted either true or false (propositions1), the central core of their meanings will be given by specifying the circumstances under which they would be true, so that semantic form or structure will relate to and should facilitate these specifications. Identifying the structure is, indeed, the very first step in such a specification, for the meaning of an expression is determined by the meanings of its parts and the manner of their combination; that much is implicit in the possibility of learning a language (see Davidson, 1965). Yet there seems to be a remarkable reluctance on the part of those concerned with the study of meaning to discuss structural issues. Time and time again one finds that an author is simply taking a certain structure for granted, in order to press on as quickly as possible to offer an account of the truth conditions of propositions containing the types of expression in which he or she is currently interested. Structures are, moreover, usually assumed to be of very simple kinds, even at the cost of very complex specifications of truth conditions.

1 This is the traditional sense of 'proposition', and the sense in which it will be used throughout this book. It should not be confused with a more recent sense, deriving from Russell, in which a proposition is an abstract entity constituting the meaning, or perhaps denotation, or perhaps reference of a proposition in my sense.

This prejudice against structural investigation is especially remarkable in view of the manifest aptitude of the human mind for grasping structure, by contrast, for example, with its poor showing at computation. Our delight in music is one evidence of this and most of us, indeed, enjoy these complex sound patterns without any theoretical understanding of them, just as we can speak and write a language without any theoretical knowledge of linguistic structures. It would be more controversial to claim that appreciation of painting or sculpture turned upon apprehension of structure, but for architecture the case requires no argument; one has only to recall the enormous importance always accorded to proportion by architectural theorists, from ancient Greece through Renaissance figures like Alberti and Palladio to modern masters such as le Corbusier. Without our apprehension of structure there would not even be any computation, for the subject-matter of pure mathematics, upon which applied mathematics depends, is, precisely, structure. So an account of meaning which emphasizes structure is a priori far more credible than one which stresses computation.

The one really great success story of modern logic should also have been a warning against this lack of interest in structure. The failure of logicians in the late middle ages to give a correct account of generality - specifically, of the logical relationships of propositions containing more than one general term, such as 'every', 'few', 'some', etc. - was overcome by Frege in the late nineteenth century thanks to a new structural description of general propositions (to be explained in detail in the sequel). And whereas medieval logicians, relying on over-simple structures, tried to compensate with more and more complex specifications of the conditions for valid inferences, Frege, operating with more complex structures, was able to offer a simple account of validity.

Taking Frege as my guide, then, I have tried to develop aspects of his structural analysis of language with respect to meaning which remain implicit within his work. This stage is reached in chapter 4, which is the pivot of the work. It is preceded by three chapters in which I survey the contributions made by linguistics, logic and computer science respectively to the representation of meaning. Conveniently, although perhaps slightly artificially, I have linked each of these disciplines with a particular type of formal grammar: linguistics with string grammars, logic with tree grammars and computer science with graph grammars. These grammars proceed in order of complexity, which has determined the order in which the contributions of the three disciplines are presented.
I have not, of course, attempted a general survey of each discipline's contribution to the study of meaning in general, but have singled out what each has to offer by way of structural analysis, as that alone is germane to my purpose. In the remaining three chapters, I diverge from Frege, calling into question, first, his treatment of count nouns as disguised intransitive verbs (chapter 5) and, second, his use of proper names as the basic category of his system (chapter 6). I propose an alternative categorization for both count nouns and proper names, so a final chapter re-works the ground covered by chapter 4 in order to make the necessary modifications, with some extensions.

It is a matter of considerable regret to me that I have been unable to include a treatment of temporal expressions, which occur in the majority of contingent propositions. This lack consequently inhibits practical application of the system of representation proposed here; I hope to remedy it at a later date. Meanwhile, I have given a brief taste in the Epilogue of how categorial graphs might be used to handle a long-recognized stumbling block for Frege's ideography, adverbial modification.

Computer scientists who look for a notation which can be implemented immediately on a machine will also be disappointed to find that, while I argue for distinct, though related, structural analyses with respect to meaning and with respect to the accepted forms of expression in a particular language, I restrict myself entirely to the former, thus leaving one side of the representation problem untouched. This is properly a task for linguists but, if the ideography which I develop here is on the right lines, their current proposals would demand substantial modification.

With the exception of the first section of each of chapters 1-3, argument and notation proceed hand in hand throughout this book. This is essential to my purpose, since a notation is, precisely, a means of representing structures of a certain kind. A discussion of structure must, therefore, involve an ongoing discussion of notation. To set out the final notation at the beginning would be tantamount to assuming from the outset everything that the book sets out to justify.

Some readers may find the structures which I discuss difficult to grasp: in my experience, some people find it much easier to apprehend auditory structures, others visual structures. I myself am in the latter group, and so my representations are primarily visual. This may present an obstacle to those who need a notation which they can pronounce.

In addition, with one exception (in section 7.1), I am only concerned with structure, so that the reader who looks for full specifications of truth conditions will be disappointed. Although structural analysis must
constantly advert to questions about truth and inference, it does not require a full specification of truth conditions; the structural analysis is, rather, a precondition for the latter. Moreover, a system for representing meaning has many uses for which a detailed specification of truth conditions may be unnecessary, for instance machine translation and expert systems (see Galton, 1988). This is fortunate, since experience shows that spelling out the truth conditions of propositions of everyday language is an enormously difficult task. To do so for the range of constructions considered in this book would be a totally unreasonable demand; I hope, by the end, to convince the reader that structural analysis with respect to meaning is both demanding and worth-while in its own right.

NOTE ON NUMBERING OF EXAMPLES

Examples are numbered consecutively, beginning anew with each chapter. Analyses of examples are given the same number as the example itself, followed by a letter: P for a phrase marker, L for a representation in linear notation (but F if the representation is based on Frege's principles), LF for the 'logical form' of transformational grammar, and S for shallow structure. Graphs are numbered in their own sequence (G1) etc.
Acknowledgments
My thanks are due to Professor David Holdcroft for encouraging me in this enterprise and for reading an earlier version; to Mrs Stella Whiting for useful criticisms, one of which persuaded me to change substantially my account of demonstratives; to Mr Brandon Bennett for comments and information which have helped me especially in the chapter on computer science; and finally to three publishers' readers, whose names I do not know, for criticisms which led to a major revision of an earlier version. I am also much indebted to a publisher's reader of the penultimate version, who submitted a very detailed report which has prompted several excisions and a few additions to the main text as well as many footnotes.
1
Linguistics: strings
1.1 STRING GRAMMARS

Throughout this book, I shall be using 'grammar' in the sense given to it by mathematicians. That calls for a word of explanation, for the terms 'grammar' and 'syntax' are in process of interchanging their meanings and the new usage has not yet crystallized, with consequent occasions of confusion.

The way in which we classify linguistic expressions and the way in which we analyse them structurally depend upon our purposes. People's jobs, for example, can be distinguished into manual versus 'white-collar' or, alternatively, into those concerned with production and those concerned with providing services. Sometimes we need one classification, sometimes the other. Language is no exception. The branches of learning whose subject-matter is language already exhibit at least three distinct (though not unrelated) purposes. Oldest, perhaps, is the study of meaning and of validity in argument. Then came grammar (now, more often, called 'syntax'), the description, roughly at the level of words and phrases, of those combinations which are used in a given language. More recently, phonology and phonetics have sought to classify sounds and their combinations, from several points of view, including, for example, the relation of sound production to the physiology of the throat and mouth. There is no a priori reason to suppose that the same structural analysis will be apposite to all of these purposes. Indeed, quite the contrary, although, to the extent to which the purposes are related, it should be possible to inter-relate the corresponding structural systems. The contrast which is of prime concern here is that between the study of meaning, on the one hand, and that of the accepted forms of expression in a particular language, on the other.

Etymologically, 'syntax' simply means 'order', 'arrangement', 'organization', so that it is precisely the study of structure. Consequently, if we distinguish more than one structural system in language, we shall correspondingly have more than one syntax, for example a syntax
relating to meaning and another relating to the forms of expression which are accepted in a particular language. But recently it has become common to contrast syntax with semantics. In the mouths of linguists, this is a contrast with the study of meaning, and syntax is roughly what grammar used to be, a classification of the accepted forms of expression for a particular language, though largely omitting the morphology of individual words. Meanwhile, grammar has been used by some philosophers for the combination of expressions in relation to their meanings, as in 'philosophical grammar'. Moreover, its application has been extended very considerably by mathematicians to embrace almost any system of rules for the combination of elements, not necessarily linguistic, into structures. An example is a (graph) grammar to describe the development of epidermal cell layers (Lindenmayer and Rosenberg, 1979); another is a (context-free) grammar for string descriptions of submedian and telocentric chromosomes (Ledley et al., 1965). In the mathematician's usage, a grammar is always a formal system, whereas philosophical grammar is usually understood to be implicit, the unwritten rules for the combination of expressions with respect to their meanings; the philosopher's task is then to make as much of it explicit as his immediate purposes may require. This slight ambiguity is unavoidable when we are dealing with everyday language, for we are not free to specify what rules we please: we are constrained by language which is already present; yet, at the same time, we want to be as explicit as possible. It seems too late to protest successfully against this reversal of the terminology, but it is with regret that I acquiesce in it. Most important, however, is to have a clear and stable usage. We do not have this at present, as the reversal is not yet complete, so that 'grammar' and 'syntax' are still sometimes used interchangeably, for instance 'universal grammar' instead of 'universal syntax'. We do need two distinct terms here, one for the study of linguistic structures in general, of whatever type, and another for the study of forms of expression which are accepted in a particular language. So, bowing to the new custom, I shall reserve 'grammar' for the former and 'syntax' for the latter. To the extent that linguists have concerned themselves with specifying grammars formally, most of the grammars which they have proposed for everyday language have been among those known to mathematicians as string grammars. String grammars, naturally, generate strings, that is, symbols concatenated in a line, a well-ordering, so that we could identify each as the first, second, third, etc., symbol in the string. Alternatively, a string is an ordered list. It is also possible for a member of a string itself
to be a string, so that we obtain a structure of nested strings. So, if the grammar is used as a basis for representing meaning, there is an implicit claim that meaning can be represented adequately by string structures, that nothing more complicated is needed.

A string grammar Gs consists of an ordered set (N, Σ, P, S), where N is a finite set of non-terminals, Σ of terminals, P of productions or rewriting rules, and S is the starting symbol. Intuitively, terminals are expressions of the language which the grammar generates, non-terminals are category symbols, of which the starting symbol S is one. V, the union of N and Σ, is the alphabet of the grammar, while V* is the closure of V, that is, the denumerably infinite set of all finite strings composed of members of V, but including the empty string (excluding the empty string, it is V+). In general, productions take the form α => β, meaning that α may be re-written as β, where α is in V+ and β is in V*.

Linguists have largely confined themselves to string grammars, of which a wide variety has now been proposed. Their interest, however, has primarily been in syntax, so we need only be concerned with these grammars to the extent that they have been expected to sustain an account of meaning. To this end, some exposition of formal syntax is unavoidable; yet, at the same time, a comprehensive survey of every theory, even from this point of view alone, would call for a book to itself. I propose, therefore, to concentrate upon the most famous and the most fully developed string grammar for everyday language, transformational grammar, due originally to Chomsky (1957), which from the start has sought to encompass an account of meaning as well.1 I shall not, however, discuss the syntactic arguments used to support the proposed structural analyses. Moreover, linguists who favour a different theory of syntax will have to ask for themselves whether the points of criticism which I raise carry over to their preferred theory.

1 My exposition is based on Borsley (1991) and Radford (1988), supplemented from Radford (1981), Jacobsen (1986) and Chomsky (1977, 1982a, 1982b, 1986a and 1986b). For the most recent version of the semantic component, May (1985) is the central text. To be exact, there have been three transformational grammars, the second dating from Chomsky (1965) and the third from Chomsky (1981). In each case continuity of notation has masked fundamental changes to the theory. Thus from 1965 to 1981 transformations were held to be meaning-preserving, with the semantic component operating upon deep structures, while since 1981 the semantic component has been attached to shallow structures, which are almost surface structures, and transformation rules merely move items within a structure derived from the phrase-structure rules. The revisions have been prompted in large measure by challenges from logic.

Transformational grammar grew out of constituent-structure analysis (Bloomfield, 1933; Wells, 1947; Harris, 1951; Postal, 1964). Sentences were first divided into their immediate constituents, typically phrases,
and then the latter were progressively divided until their ultimate constituents, typically single words, were reached. The criterion upon which the divisions were based was initially intuitive, but later a substitution test was introduced: if, for an expression occurring in a sentence others could be substituted whilst preserving grammaticality and if, further, the same substitutions were possible in any other sentence in which the expression could occur, then that expression was accounted a constituent of any sentence in which it occurred. The test was thus also a method of classifying linguistic expressions, the members of each group belonging to the same syntactic category.2 By using a symbol for each category, it became possible to describe a series of sentence-patterns.

The categories (constituting N, the finite set of non-terminals) used in most formal theories today are derived from those of traditional syntax. While there remain some differences, there is also a broad measure of agreement, starting from Noun (N), Verb (V), Adjective (A) and Preposition (P). These are known as lexical categories; investigations of constituent structure revealed a need to classify corresponding phrases, so four phrasal categories were introduced as well. Subsequently arguments were put forward for intermediate categories to cater for expressions which were more than single words or morphemes, yet smaller than the phrases already recognized. The intermediate categories are commonly indicated by a prime, the phrasal categories by a double prime, for example N', N" (or NP for the latter).

The original start symbol was S (Sentence) and there was no phrasal category corresponding to it. Subsequently it was itself recognized as a phrasal category, the head of such phrases being an Inflexion (I) catering for variations of tense, aspect and modality, so that S, accordingly, was replaced by I". A further category C (Complementizer) was introduced later to provide for subordinate and relative clauses as well as for mood; thus, examples of complementizers are relative pronouns and 'that' introducing a subordinate clause. As a result of these developments, the start symbol in the latest version of the theory is C" but, as I shall only be concerned with indicative sentences, I shall omit the first few steps in each derivation and begin with I". This will make it easier to see the essential features of derivations for the purpose in hand.

Another development which can be largely ignored here is to break down the original categories into sets of features, each of which has a value. Thus verbs and prepositions share common behaviour which
2 This test for syntactic constituents has now been supplemented by several more, which are clearly set out in Radford (1988, p. 90); see also Borsley (1991, ch. 2).
nouns and adjectives lack, such as being able to combine with a noun phrase, so are assumed to incorporate a feature, rather confusingly dubbed 'V'; they are +V, whereas nouns and adjectives are -V. Similarly, a feature 'N' is credited to nouns and prepositions but not to verbs and adjectives; while a further feature BAR (derived from an earlier notation) with 0, 1 and 2 as values can be used to differentiate lexical, intermediate and full phrasal categories respectively. The theoretical interest of this feature theory is as a basis for justifying the system of categories; but, so long as it is understood that these are syntactic categories, they are not our concern.

The form of production rules for a string grammar cited above, according to which a category symbol may be re-written as a string, combines two elements. First, there is the replacement of the category symbol by a list of symbols; second, there is the ordering of that list. So long as we use a linear notation to represent the structure of a string, these two elements are difficult to disentangle. Take, as an example, the sentence (1)
Every doctor visited at least one patient.
Let us suppose that 'every doctor' and 'at least one patient' are noun phrases, composed of an expression of a new category, Det (Determiner) ('every' and 'at least one') and a noun ('doctor', 'patient'), while 'visited at least one patient' is a verb phrase, composed of a verb 'visited' and the noun phrase 'at least one patient' already identified. We can then represent a structure for (1) in a linear notation by (1L)
(Det N) (I (V (Det N))).
This is a string whose first member is a string of two members and whose second member is also a string of two members, with its second member being in turn a string of two members, and so once more. However, it omits any information about the categories of the sub-strings. We could supply this by placing sub-scripts on the closing brackets, as follows (intermediate categories are omitted in the interest of simplicity): (1L')
((Det N)N" (I (V (Det N)N")V"))I".
This is known as a labelled bracketing, but it is a much less clear representation than the following tree, known as a phrase-marker (to which I have added the terminal symbols for (1)):
(1P)

I"
   N"
      Det: every
      N: doctor
   I'
      I: e [past]
      V"
         V: visit
         N"
            Det: at least one
            N: patient
The symbol e under the I node indicates that it is empty; tense, however, is regarded as a feature and is therefore shown in square brackets.

As we shall be dealing with trees a great deal in the sequel, a few terms will be useful. A tree consists of nodes, joined by edges. Both nodes and edges may be labelled; here only the nodes are labelled and are shown by their labels. The node at the top of the tree is called its root, whose label is always the starting symbol. The nodes at the bottom of the tree are called its leaves; here they are all terminal symbols. Phrase-markers are ordered from left to right and from top to bottom. The left-to-right ordering gives rise to a relationship of precedence between nodes: for any two distinct nodes X and Y, X precedes Y just in case X occurs to the left of Y. This relationship remains invariant in any given string, because edges in a phrase-marker may not cross. The top-to-bottom ordering produces a dominance relationship: X dominates Y just in case there is a path down the tree from X to Y; X immediately dominates Y just in case it dominates Y and no node occurs between them. Symbols which are immediately dominated by the same symbol are called sisters. Other relationships important in transformational grammar can wait until they are needed.

The first phrase-structure rules formulated by transformational grammarians combined immediate dominance and precedence, for example the following, in accordance with which we could obtain (1P):

S => N" V"
N" => Det N
V" => V N"

Since then, they have undergone continuous change and refinement. First, immediate dominance (ID) and (linear) precedence (LP) have been separated; the motivation for this is that it is more economical when making cross-comparisons between different languages, which may differ
in their word-order conventions while nevertheless breaking down phrases into the same elements. Second, the ID rules have now been reduced to three types, which can, accordingly, be stated as rule-schemas, that is to say, rule-patterns which use symbols for which the category names given above may be substituted in order to yield particular rules. I shall use Greek minuscules as such symbols; on the right-hand side of the rule they will be separated by commas, to indicate that their order is indifferent. Each of the four types of rule relates to a type of modifier, what is modified being the head of the construction (which I write first), and the four modifiers belong to the same group as the traditional terms subject, object, indirect object.

The first is that of specifier, which takes us from a full phrasal category to an intermediate category which is the head of the new string; the rule-schema is:

(S) α" => α', (β")
The parentheses indicate that β" is an optional element, so that α" may simply be re-written by α'. If β" is present, however, it is the specifier of α' and the latter its head. It is not possible to substitute just any category name for the Greek letters and still always obtain a syntactically correct rule, but some substitutions which work out all right for English are the following:

(SC) C" => C', (N")
(SI) I" => I', (N")
(SV) V" => V', (N")
(SN) N" => N', Det
(SP) P" => P', Det
(SA) A" => A', Det
The parentheses indicate that the symbol enclosed in them is optional. Thus C" may simply be re-written as C', without any branching, and similarly for I" and V". In such cases I shall often omit the intermediate category, in the interest of keeping phrase-markers as simple as possible. So far as linear precedence is concerned, in English the head always follows the specifier in applications of these rules.

The second type of modifier is a complement; the rule-schema for introducing complements takes us from an intermediate to a lexical category, the latter again the head of the new construction, and is:

(C) α' => α, (β"), (γ"), (. . .)
Thus a rule may introduce more than one complement. Some examples are:
(CC) C' => C, I"
(CI) I' => I, V"
(CV) V' => V, (N"), (A"), (P"), (I")
(CN) N' => N, (P") / (I")
(The slash in this rule indicates that P" and I" are exclusive alternatives.)

(CP) P' => P, (N")
(CA) A' => A, (N"), (P")
Thus this type of rule allows us to re-introduce phrasal categories. The LP rules for English require that a lexical category or a noun phrase precedes any phrasal category which is its sister, and that a sentence (I") follows all its sisters. By using a specifier rule for a given category followed by a complement rule, we may descend from the double-barred category to the corresponding unbarred one, the derivation proceeding thus:

α"
   specifier
   α'
      α
      complement
      complement
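The descent just pictured can be simulated mechanically. The following sketch (in Python; it is an illustration of the rule-schemas above, not machinery proposed in this book - the category spellings, the choice of rules and the function names are assumptions of the example) expands I" top-down into the pre-terminal string underlying (1), with the daughters of each category listed in the order required by the LP rules for English:

    # Instantiated ID rules for example (1); daughters are written in
    # the order fixed by the LP rules (specifier before head, head
    # before complements). Non-branching intermediate categories are
    # omitted, as in the phrase-markers above.
    ID = {
        'I"': ['N"', "I'"],   # (SI): the subject N" is the specifier
        "I'": ['I', 'V"'],    # (CI): V" is the complement of I
        'V"': ["V'"],
        "V'": ['V', 'N"'],    # (CV): the object N" is the complement of V
        'N"': ['Det', 'N'],   # (SN): Det is the specifier (N' omitted)
    }

    def expand(category):
        """Rewrite a category until only lexical categories remain."""
        if category not in ID:
            return [category]            # lexical category: a leaf
        leaves = []
        for daughter in ID[category]:
            leaves.extend(expand(daughter))
        return leaves

    print(expand('I"'))
    # ['Det', 'N', 'I', 'V', 'Det', 'N'] - the pre-terminal string into
    # which 'every doctor e[past] visit at least one patient' is inserted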
The third type of modifier is an adjunct, which is simply added to an intermediate category; thus the rule-schema is:

(A) α' => α', β"
The following are examples: (AV) (AN) (AP) (XA)
V' N' P' A'
=» =» => =*
V, A" / N" / P" N\ N" / A" / P" / I" P', A"/P" A',A"/P"
These rules are optional. The arguments for distinguishing adjuncts from specifiers and complements are mainly syntactic and need not concern us here.

There is one type of construction for which the three rule-schemas described above do not provide, namely, that involving coordinating conjunctions such as 'and' and 'or'. According to transformational grammarians, coordinating conjunctions can be used to join together any two expressions of the same category; moreover, they can be used to form lists, so, if their category be K, we need a rule-schema on the following lines:
(K) α => K, α1, α2, (α3, . . ., αn)
with the LP rule for English that K must occur in the penultimate position. A typical example is when a sentence is re-written as a disjunction of two sentences, i.e. I" => I" or I", and another occurs when a complex subject is formed, such as 'Jack and Jill', which requires: N" => N" and N". (I have ignored, here, the complication which arises when the conjunction as shown is the second part of a two-part expression, as in 'either . . . or' and 'both . . . and'.)

1.2 SEMANTIC ROLES

So far, these rules will yield phrase-markers whose leaves are lexical category symbols, but they do not provide for the introduction of terminal symbols; hence the resulting structure is known as a pre-terminal string. In order to obtain a deep structure from a pre-terminal string, the non-terminals must be replaced by terminals. This is effected by lexical insertion, the substitution of words, phrases or other signs having a fixed meaning for the category symbols (including an empty or null symbol, 0). At its simplest, this could be done by providing, for each category symbol, a list of linguistic expressions which might be substituted for it. In practice, a more complicated arrangement is needed.

Suppose, for example, that we had the pre-terminal string (1L'); clearly, a transitive verb must be substituted for V, because of the second (Det N). By contrast, if the pre-terminal string had been (Det N) (I V) instead, an intransitive verb would have to be substituted for V. In view of this and many similar cases with other categories, linguistic expressions were sub-categorized, an idea due originally to Matthews and Chomsky (see Chomsky, 1965, p. 79), which was effected by showing what kinds of string they may fit into. These strings, in turn, were specified by recourse to the phrase-structure rules, since the pre-terminal strings may contain more detail (often variable) than is needed for the purpose. Thus a transitive verb like visit could be listed as V, +[— N"], where the brackets enclose a sub-categorization frame into which the expression would fit at the horizontal line, and the plus-sign indicates that the specified frame must be present.
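Lexical insertion under sub-categorization can be pictured as a dictionary lookup together with a frame check. A minimal sketch, assuming a toy lexicon and a simplified frame notation modelled on the V, +[— N"] entry just given (none of this is the theory's own machinery):

    # Each entry pairs a lexical category with a sub-categorization
    # frame: the list of category symbols that must follow the
    # insertion point '_' in the pre-terminal string.
    LEXICON = {
        'visit': ('V', ['_', 'N"']),   # transitive: V, +[_ N"]
        'sleep': ('V', ['_']),         # intransitive: no complement
        'doctor': ('N', ['_']),
        'every': ('Det', ['_']),
    }

    def may_insert(word, category, sisters_after):
        """May 'word' replace 'category', given the categories that
        follow the insertion point in the pre-terminal string?"""
        cat, frame = LEXICON[word]
        return cat == category and sisters_after == frame[1:]

    # In (Det N) (I (V (Det N))) the V node is followed by an N":
    print(may_insert('visit', 'V', ['N"']))   # True
    print(may_insert('sleep', 'V', ['N"']))   # False: frame not satisfied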
A subsequent development, however, has now made it possible to eliminate sub-categorization in the lexicon (though some linguists prefer to retain it). This is case grammar,3 which is concerned with the semantic relationships between verbs and their subjects, direct and indirect objects, etc. Borrowing from mathematics via logic, Fillmore calls these the arguments of a verb or predicate. Neither he nor other transformational grammarians, however, use the term precisely in its mathematico-logical sense, so it is better to define it, as they do, in terms of phrase-markers: an argument is any N" which is dominated either by another N" or by an I". These relationships between verbs and their arguments are expressed in everyday language, according to case-grammar, by cases, prepositions or postpositions, and may be characterized as role-types:

   human languages are constrained in such a way that the relations between arguments and predicates fall into a small number of types . . . these role types can be identified with certain quite elementary judgments about the things that go on around us: judgments about who does something, who experiences something, who benefits from something, where something happens, what it is that changes, what it is that moves, where it starts out, and where it ends up. (1968b, p. 38)

Fillmore eventually settled for nine of these roles (1971, pp. 42, 50-1). He does not suppose a one-to-one correspondence between roles and (in English) prepositions; a single role can be indicated, in different sentences, by more than one preposition and the same preposition may be used for more than one role. Typical examples of prepositions for each role, however, are:

Agent        'by'       Source      'from'    Location  'in'
Experiencer  'by'       Goal        'to'      Path      'along', 'through'
Object       -          Instrument  'with'    Time      'at', 'during'
There is no preposition for the object-role in English.4 These roles are most easily understood from examples. It would be difficult to construct a single example exhibiting all nine, but (2)
Margot opened the cupboard outside her sister's bedroom with her key at 9:15 p.m. on her way from the kitchen to the attic along the landing
crams in eight of them. Margot is the Agent, the cupboard is the Object, outside her sister's bedroom is the Location, her key is the Instrument,
3 Due originally to Gruber (1965) but largely developed by Fillmore (1966, 1968a, 1968b, 1971, 1975a) and Anderson (1971, 1977).
4 A more recent addition to this list is Benefactive, the role of something which benefits from an action. Indeed, some authors posit as many as twenty-five distinct roles, whereas others restrict them to four.
9:15 p.m. is the Time, the kitchen is the Source, the attic the Goal and the landing the Path. For an example of Experiencer, (3)
Margot felt a sharp pain in her left foot
will do; although 'Margot' is the subject of both (2) and (3), Margot is not represented as an agent in the latter, but rather as undergoing something, experiencing it. Her left foot is another example of a Location.

Fillmore's term 'case-roles' is unfortunate, for it uses a syntactic phenomenon to characterize a semantic feature. These roles are to be found, indeed, just as much in uninflected languages like English which use prepositions where inflected languages use cases. Moreover, in speaking of cases we are referring to inflexions of nouns, whereas the roles belong to what is described or named by the nouns: proposition (2) presents Margot the woman, not her name, as Agent, the cupboard and not the expression 'the cupboard' as Object, etc. All the same, it is the proposition which presents Margot, the cupboard, etc., in these roles, which they are not committed to playing perpetually (Margot is Agent in (2) but Experiencer in (3), while in another proposition the cupboard might be Agent, for instance if it were represented as falling upon somebody). So, instead of 'case-roles', it will be more appropriate to speak of semantic roles.5

These semantic roles are properly to be regarded as going with verbs, in spite of their original introduction via inflexions of nouns. For each verb carries a selection of semantic roles with it and excludes others: thus giving involves a Source (the giver) and a Goal (the recipient). Verbs might even be classified by the roles which they carry; thus 'open' might be an Agent/Object/Instrument verb and 'give' an Object/Source/Goal verb. This idea has been incorporated into transformational grammar by including a semantic grid in the lexicon entry for each verb, for example [Agent, Object] for 'visit', [Experiencer, Object, Location] for 'feel' and [Source, Object, Goal] for 'give'. One semantic role is underlined in each of these lists to indicate that it relates to the subject of the verb, called its external argument.6 The remaining, internal arguments of the verb are its complements, so, given the semantic grid, we no longer need a separate sub-categorization frame, provided that we know how each role is to be expressed (for example the Instrument role by a P" whose P is 'with'). It
5 I first suggested this in (1978a). Chomsky adopts the same terminology in (1986a, p. 60), but most transformational grammarians use thematic roles (θ-roles) instead.
6 Chomsky has argued that the semantic roles of external arguments are determined by the whole verb phrase (1986b, pp. 60-2; cf. Radford, 1988, pp. 386-8).
has also been found that certain other syntactic information can be predicted from semantic grids.

Yet there are problems in taking over this account of semantic roles as it stands. First and foremost is the question how each role is to be identified and characterized, and how many there are. Fillmore's list of nine has not met with universal acceptance. No wonder, for its justification is far from clear: he describes the roles in very general terms and in formulations which do not stand up to close scrutiny, whereas the roles are presented to us in a language by its verbs. Thus his methodology is inverted; it is not that we first have a set of abstract role concepts and then proceed to recognize instances of them as being presented in propositions. Rather, it is through our understanding of the meanings of propositions that we come to formulate descriptions of roles presented by them.7 Hence the correct method of identifying the roles would be by studying groups of verbs. A start has been made on this by a computer scientist, Roger Schank, whose work will be described in section 3.3. Meanwhile, any list of semantic roles must be regarded as tentative.

A second problem for semantic roles is raised in a point made by Kenny a generation ago. If the meaning of a verb is partly characterized by the number and type of semantic roles which it introduces, then we should expect a fixed number of semantic roles to go with each verb, unless the verb had more than one meaning. But Kenny pointed out that, from a proposition like (4)
Brutus killed Caesar with a knife in Pompey's Theatre on the Ides of March
we can validly infer any proposition derived from it by deleting one or more of its prepositional phrases (1963, pp. 156-62). So if we were to classify 'kill' in (4) as, say, an [Agent, Object, Instrument, Location, Time] verb, how would we classify it in, for example 'Brutus killed Caesar on the Ides of March'? If as an [Agent, Object, Time] verb, and the meaning of a verb is partly determined by its associated semantic roles, then 'kill' cannot have exactly the same meaning in the two propositions. But not only would that be a totally arbitrary conclusion, without any independent evidence to support it; the validity of the inference would also be called into question by the equivocation in the meaning of 'kill'. Davidson (1967a) proposed to overcome this difficulty by analysing propositions like (4) so that each semantic role is introduced by a
7 This criticism is argued in detail in Potts (1978a).
conjunction, thus: 'There is a killing and it is by Brutus and it is of Caesar and it is in Pompey's Theatre and it was on the Ides of March'. One of the simplest of all logical inferences is that from 'p and q' to 'p' (or to 'q'), so it is easy to see that any of the conjoined clauses may be dropped. This analysis certainly achieves its intended object, but at a heavy price. It destroys the category distinction between the verb and the names or descriptions of the players of the various semantic roles. Moreover, it is unclear what would be represented if we were to drop the very first (verbal) clause from the conjunction, as we are entitled to do; or, again, if we were to drop all but the first and second conjuncts, since, if 'killed' is a transitive verb, *'Brutus killed' is not a sentence. In one case, that of Location, the analysis is also inadequate. For the most part, the various actors in an action are in the same place, but not always. On Davidson's analysis, however, they must be, since the place is given as an attribute of the action, not of the individual actors. Communication provides common counter-examples to this assumption, such as (5)
Andrew telephoned from Leeds to Philip in New York.
If there was, then, a telephoning, was it in Leeds, in New York, in the transatlantic cable or now, perhaps, in the satellite and pathways through space? Clearly, all we can safely say is that Andrew telephoned to Philip and Andrew was in Leeds and Philip was in New York. Now this does, indeed, give us three conjuncts, so perhaps Davidson was on the right lines in that aspect of his analysis. Only we are left with an irreducible sentence 'Brutus killed Caesar', since neither *'Brutus killed' nor *'killed Caesar' is a sentence. However, this tallies with the intuition that it is part of the meaning of 'kill' that in any killing there must be both a killer and a killed (not necessarily distinct, though). We should then have to say that 'Caesar was killed', in which there is no explicit mention of the agent, has the same meaning as 'Something killed Caesar', but that we do not need to add 'by something' to the former precisely because it is implied in the meaning of the verb. The conclusion to be drawn from this discussion of Kenny's problem is that Fillmore's list of semantic roles would have to be pruned so long as we insist that a semantic role must be associated with a verb. Transformational grammar can handle this, however, by its distinction between complements and adjuncts. Thus in (4) 'Caesar' is a complement of 'kill' but 'with a knife', 'in Pompey's Theatre' and 'on the Ides of March' are all adjuncts. So the semantic grid for 'kill' would be just [Agent, Object]. We could then have semantic grids for prepositions as well, such as [Instrument] for 'with'. And a semantic role which was
introduced by a prepositional adjunct to one verb might be introduced as a complement to another; thus it might be held that 'unscrew' demands the semantic grid [Agent, Object, Instrument].

A third and final difficulty remains: Fillmore requires that the same semantic role shall not occur more than once in the same sentence. As it stands, this requirement clearly will not do: where a proposition contains more than one verb, through the use of connectives, relative clauses, psychological verbs and so on, the same semantic role may be introduced by each of two different verbs and with respect to a different participant. So the requirement would at least have to be re-phrased to the effect that no verb shall have the same semantic role associated with it more than once. This is how it has been put in the Theta Criterion of recent transformational grammar, with the complementary stipulation that not more than one semantic role shall be assigned to each argument of a verb: each argument bears just one semantic role and each semantic role is assigned to just one argument (see Chomsky, 1981, p. 36). Even then, however, a justification for this restriction is still lacking.

Were the restriction not present, there would often be room for dispute about which semantic role a participant was playing. Thus, in stating a semantic grid for 'give', I assumed that the giver and recipient of a gift play the roles of Source and Goal respectively, with the gift itself as Object. But why not, following the hint offered by the syntactic distinction between direct and indirect object, say that the giver is Agent, the gift the first Object and the recipient the second Object? Only because duplication of semantic roles has been excluded. The requirement is clearly convenient; but convenience hardly amounts to a justification. Moreover, it is prima facie counter-intuitive to insist that the giver of a gift is not, in respect of his action, an agent, whatever we may say about gift and recipient.

This example brings out a further important point. The original application of Source and Goal was to movements of bodies (in the sense of Newtonian mechanics), the Source being their starting-place and the Goal their end-place. Now of course it is perfectly legitimate to look at giving as analogous to this: from the point of view of the gift, it 'moves' from being in the ownership of the giver to being in the ownership of the recipient. However, this is not the original sense of 'Source' and 'Goal', but a new, analogous one, which arises precisely from looking at giving in this way. And the 'movement' of ownership may not be accompanied by any physical movement of the gift itself, for example the gift of a house, so this is also movement in an analogous sense. Yet it is surely also legitimate to regard giving as an action in which the giver is the agent and the recipient the object, with the gift either as a further object or perhaps
as analogous to the instrument of a causal action, that by means of which a giving is effected.

In spite of the uncertainty which still surrounds both the number and distribution of semantic roles, the notion promises to be of great importance in grammar, both syntactic and semantic. It will surface several times throughout this book, and one must hope that more intensive study will be devoted to it in the near future. Its immediate application, however, is to lexical insertion. When a verb is inserted under a V node, its complements will, where they are its internal arguments, simultaneously be assigned semantic roles and the whole verb-phrase will assign a semantic role to its subject, where that is an external argument. Clearly, then, it will only be possible to re-write in a context where the correct number of complements, to match the number of semantic roles, is available; hence sub-categorization frames are no longer needed.

Strictly speaking, semantic grids are only assigned to syntactic structures at the point at which they are semantically interpreted. Nevertheless, semantic roles interact with syntactic operations which take place earlier, so it will be convenient to show semantic grids in phrase-markers. They will be written underneath the lexical category symbols to which they pertain, though abbreviating each semantic role to a single letter. I shall use only three semantic roles, A (Agent), P (Patient) and I (Instrument). Thus the semantic grid for 'visit' will be [A,P]. Correspondingly, the role assigned to each argument will be shown under its category symbol; the argument is then said to be semantically marked. So, for instance, if a prepositional phrase contains a noun phrase which, in context, describes an instrument, I shall write [I] under PP.

Sub-categorization has been retained in head-driven phrase-structure grammar (Pollard and Sag, 1988), where it plays a large role. In this grammar, sub-categorization is introduced as a feature of lexical categories, with a list of phrasal categories as its value; these are the categories of its complements. In the case of verbs, if the list is empty, we have an intransitive verb, whereas for 'visit' we should have
V[SUBCAT:(NP)] and for 'give' V[SUBCAT:(NP,PP)], etc. This is less informative than a semantic grid, because it tells us nothing of the semantic roles associated with the complements, and nothing about the external argument.

1.3 PASSIVES

A construction which is of especial interest with respect to meaning is the passive. The reason for this is as follows. If we compare the active
sentence 'Dr Patel visited Mrs Wilson' and the corresponding passive, 'Mrs Wilson was visited by Dr Patel', we notice that they will be true under just the same circumstances. In order to understand an indicative sentence, however, we must at least know in what circumstances it would be true; that is a part, indeed a large part, of its meaning. Consequently, if we ask what is the structure of each of the above sentences with respect to its meaning (its semantic structure), we should expect the two structures at least to have much in common, even if they were not exactly the same. But now consider the passive sentence corresponding to (1), (6)
At least one patient was visited by every doctor.
A possible interpretation of (6) is that there is at least one patient such that every doctor visited him or her. This would be strained as an interpretation of (1), whose more likely meaning is that, taking each doctor in turn, you will find at least one patient whom he or she visited. That meaning could also be given to (6), especially in a context where the first suggested interpretation is unlikely to be true. For although any circumstances which satisfy the first interpretation will also satisfy the second, the converse does not hold. If the doctors in question were Dr Patel, Dr Hawbrook and Dr Farbridge, it would be enough to make the second true that Dr Patel visited Mrs Wilson, Dr Hawbrook Mr Oddy and Dr Farbridge Miss Verity, but not enough to make the first true; for that, all three doctors must have visited the same patient. From the point of view of truth conditions, therefore, propositions like (1) and (6) have a double complexity: first, they incorporate a distinction between visitor and visited and, second, a distinction between relating every doctor to some patient or other, perhaps different for each, and relating one particular patient to every doctor, the same for each. Moreover, the second type of complexity is absent from pairs like 'Dr Patel visited Mrs Wilson' and 'Mrs Wilson was visited by Dr Patel'. The two interpretations thus have different truth conditions and, hence, should be represented by different semantic structures. However, there is no difference upon one point: both sentences and both interpretations present the doctors as the visitors and the patients as the visited. English syntax gives priority to this point; according to the voice of the verb, the positions in which we place the noun phrases relative to it determines which represents the visitors and which the visited. How, then, can we account for the two interpretations with regard to the relationship between every doctor and at least one patient? For the grammar so far expounded there is a prior problem, since it will not even provide a structural analysis of (6), never mind relate it to
(1). The difficulty lies with the semantic grid for 'visit', for, if we try to use a modified version of (1P), we shall get:
?

I"
   N"
      Det: at least one
      N: patient
   I'
      I: e [past]
      V"
         V: be visited  [A,P]
         P" [A]: by every doctor
The semantic grid under 'V' requires the external argument of 'was visited' to be associated with the Agent role, and the internal argument with the Patient role, but here they are the wrong way round. This is but one example of a much more general problem: there are many other constructions, too, for which the grammar will not provide.

Now string grammars are classified into four types, known as the Chomsky hierarchy. These differ in the kinds of productions which are allowed. The first, type 0, is unrestricted. The second, type 1, is context-sensitive. Its productions must take the form γAδ => γμδ, where strings of mixed terminals and non-terminals may be substituted for the Greek letters but only non-terminals for the Latin capital; γ and δ may be empty but μ may not. These grammars are called 'context-sensitive' for the obvious reason that A can be re-written as μ only when it occurs in the context of γ and δ. By contrast, type 2 grammars are context-free, and have productions of the form A => β. Finally, type 3 grammars are regular and have productions of the form A => bC or A => b, where only terminals may be substituted for the small Latin letters. Each of these grammars is a special case of the type one lower than it, but the languages which it generates are labelled as being of the type of the most restricted grammar which generates them.8

The grammar as so far presented is context-free; yet 'there is little reason to doubt that all natural languages can be . . . generated by context-sensitive phrase-structure grammars'9 (Chomsky, 1966, p. 29). So
9
For further details of string grammars, see Gonzalez and Thomason (1978), Gross (1972), Aho and Ullman (1972), Hopcroft and Ullman (1969), Marcus (1967), Chomsky and Miller (1956) and Chomsky (1956). A detailed attempt on these lines has since been made by Gazdar et al. (1985).
18
Linguistics: strings
why not use a context-sensitive grammar instead? The answer originally given was that grammar as a whole can . . . be regarded as a device for pairing phonetically represented signals with semantic interpretations, this pairing being mediated through a system of abstract structures generated by the syntactic component. Thus the syntactic component must provide for . . . each interpretation of each sentence . . . a semantically interpretable deep structure and a phonetically interpretable surface structure, and, in the event that these are distinct, a statement of the relation between the two structures. (Chomsky, 1962, p. 52) The motivation for transformation rules is thus a belief that the structure determining the meaning of a sentence will normally differ from that determining its form of expression in a given language and, hence, that the syntactic part of a linguistic theory must consist of two components, one generating deep structures (the phrase-structure rules) and the other relating them to surface structures (the transformation rules). The latter are so called, accordingly, because they allow us to transform one derivation of a structure into another. The original version of transformational grammar contained a Passive Transformation rule which derived passive sentences from the deep structures which, unaltered, would have yielded the corresponding active ones, by swapping the positions of the two noun phrases and passivizing the verb. A rough equivalent in terms of the syntax outlined above is: C N"i I V N"2
=*
C N"2 I (be + en) V by N"i.
If we apply this rule to the example above, the output will be: At least one patient past (be + en) visit by every doctor, and subsequent rules inverted the order of past be to yield be past and then was, as also of en visit, which becomes visit en and then visited. Thus the Passive Transformation rule took a string which would otherwise yield the sentence (1) and gave us a string which would yield the sentence (6). According to the theory, there should be no difference of meaning between these two sentences. In that case, we must say that both are ambiguous as between the two interpretations and, moreover, that the ambiguity cannot be resolved in terms of deep structures. Yet that is hardly satisfactory, for the second interpretation is unlikely for (1), so how does it become so much more probable for (6)? Moreover, if we start with an active sentence like 'Dr Patel visited Mrs Wilson', the rule gives us 'Mrs Wilson was visited by Dr Patel'; now these do have the same meaning (at least, to the extent that they will be true under just the same conditions), so how is it that the rule preserves meaning in this case but not in the transformation of (1) into (6)?
Passives
19
The Passive Transformation rule also has its drawbacks from a purely syntactic point of view. It does not offer any explanation of the structural changes which it introduces, so that we do not know why the object of the active verb is moved into subject position, nor why this cannot happen to the subject of an active verb, nor why the changes are just what the rule says they are. It is, moreover, heavily biased towards English, while even in English some passive forms are found for which it will not account, and there are many more such cases in other languages (see Keenan, 1979; Chomsky, 1981a, pp. 117-27). Transformational grammarians now prefer, accordingly, to decompose passive formation into a number of steps each of which has a much wider range of instances and is, hence, syntactically more fundamental. Within this framework, transformation rules have become essentially movement rules, allowing a constituent of a given category to be moved to an empty position of the same category (substitution rules)10 or allowing a constituent to be adjoined to another to form a constituent of the same category as that to which it is adjoined (adjunction rules). But this alone would be too permissive, allowing ungrammatical expressions to be generated, so it is tempered by various restrictions. One of these has already been mentioned, that no argument may have more than one semantic role, but most of the restrictions are expressed in terms of government, an application of a traditional syntactic concept (according to which verbs, prepositions, etc., govern different cases) to nodes of phrase-markers. The basic idea is then that an expression may only move to a position which governs it. The definition of government is based upon that of ccommanding: X c-commands Y just in case X does not dominate Y and every Z that dominates X dominates Y. Thus a node will c-command its sisters and everything which they dominate. If Z in this definition be restricted to full phrasal categories, then X is said to m-command Y, so that if one node c-commands another it m-commands it, but not conversely. M-commanding is the weaker notion; for example, a head introduced by the complement rule-schema will m-command, but not c-command, the intermediate category node which immediately dominates it. Government is the same as m-commanding, except that there may be barriers to it. These barriers are always full phrasal category nodes. The basic case in which such a node is a barrier to government is when it is not semantically marked, that is, has not been assigned any of the 10
The F-movement rule is an exception to this, because it moves a verb from being under a F-node to being under an /-node.
20
Linguistics: strings
semantic roles; another way of putting this is that a full phrasal category is a barrier unless it is V" or the complement of a verb or adjective. This has an immediate application to passive constructions, for, if we suppose that the 'by'-phrase of a passive sentence originates in deep structure as a complement of the verb, it cannot be semantically marked, because the semantic role which should be associated with it is the external argument of the active verb. So the P" node above the 'by'-phrase would be a barrier to government, for example by the head of the verb phrase. This account of barriers is supplemented by a provision for barrierhood to be inherited by a full phrasal node higher up the tree from one which it dominates; but I shall not go further into this complication here. The notion of barrier is used to formulate a further restriction upon transformational movements, the Subjacency Condition: a movement may not cross more than one barrier. Reverting, now, to (6), one proposal for generating passive constructions is to categorize 'visited' as an adjective, with 'at least one patient' and 'by every doctor' as its complements, the former being moved into an empty N" position generated by the (SI) rule (Jacobsen, 1986, p. 158). The problem with this is that adjectives do not carry semantic roles, so it leaves 'at least one patient' and 'every doctor' with no roles assigned to them. Yet it is evident that these expressions do have semantic roles in (6). Most transformational grammarians, however, draw a distinction between syntactic and lexical passives. The latter, which do not concern us here, are illustrated in English by words like 'untaught' and 'unknown'; they are simply treated as adjectives and provided as such by the lexicon for insertion into appropriate deep-structure phrasemarkers. In (6), by contrast, 'visited' is a syntactic passive, and the supposition is that, in the deep structure, it carries with it the direct object of its active form, which is subsequently moved into an empty subject position to become the subject of the eventual passive sentence. The latter thus derives from a different deep structure from the active one. According to another proposal, then, 'be' is the main verb of a passive sentence and the passive participle is the verb of a subordinate clause. Let us consider how this might apply to (6). If'visited by every doctor' is the remnant (after deletion of empty positions) of a subordinate clause, what is the latter's category? On this account, it is a small clause. Small clauses are effectively sentences lacking an inflexion and sometimes, also, lacking a verb, for instance the italicized phrase in 'Romeo expected Juliet {to come) before noon'; but they always have a subject. Opinions differ as to their structure; one view, in which they are compatible with the three ruleschemas, is to assign them to a full phrasal category X", where Xmay be
Passives
21
/, V, A or P. We then have X" => N" X' by rule-schema (S) and X' then decomposes into a head and complements in the normal way.11 For a passive sentence we should want the structure of the small clause to be N" V. The suggestion is then that the object of the (active) verb in V is moved first to the (empty) N" position, and thence outside the small clause altogether to become the new subject of the whole sentence (see Radford, 1988, p. 445). Thus, for 'Mrs Wilson was visited' we might have the deep structure: I" 1 N" V"
r
I
e[past] V
be
I
I" I N"
I
I
V
I visited [A,P]
V" 1
N" [P] Mrs Wilson
A transformation is also required in order to inflect the verbs; this is effected by moving the verb under the /-node whenever the latter is empty. 12 It does not concern us here and I shall take it for granted in future examples. This solution only caters for passives which lack a 'by'-phrase, and it is obvious that it will run into difficulties if we supply, for example, 'Dr Patel' as the subject of the small clause; for then the movement of 'Mrs Wilson' to that position is blocked, while there is nowhere for 'Dr Patel' to go in order to yield 'visited by Dr Patel'. The most recent proposal is therefore that the passive participle be analysed into two constituents, the first the verb root, which carries with it the semantic roles of its internal 11 12
See Stowell (1981). This condition is subsequently relaxed to allow movement of the verb under the /-node when the latter is occupied by the past participle suffix -en (see below).
22
Linguistics: strings
arguments, and the second the suffix -en which carries the semantic role of the verb's external argument. Where, however, a 'by'-phrase is also present in the sentence, the semantic role of the external argument of the verb is assigned to it by the passive suffix; this is achieved by providing an optional sub-categorization for the latter with the frame / by N"] (Jaeggli, 1986). This idea has been further developed by Baker, Johnson and Roberts (1989). Instead of splitting past participles under a Fnode into verb root and suffix, they propose that the latter be considered an inflexion in deep structure, to be combined with the verb root later by a transformation. 'Be' in the passive voice is treated as a separate verb, which thus requires its own inflexion for tense, etc. The resulting deep structure for (6) would then be given by the phrase-marker (6P). (6P)
I" I N" e
1 r l
r
; past
1
1 r
V
1
1
be
1
1 r 1
I I
I
I
V"
i
i
-en [A]
V I I
l_
r
i
P" I
N" [P]
I
I
visit by every doctor | [A>p] at least one patient Where, as here, a 'by'-phrase is present, the external semantic role assigned by the verb is considered to be transferred down the tree from the passive suffix to the noun phrase within the prepositional phrase. The relevant transformation is movement of 'at least one patient' to the empty TV" position. However, a trace is left in the vacated positions, and this is to be regarded as filling the place which it occupies, so that the latter is no longer available to be filled by a subsequent transformation movement. Moreover, the moved constituents are indexed and the same
Passives
23
index is attached to the trace marking their original positions (or intermediate positions, in the case of successive moves); constituent and traces are then said to be co-indexed. Using 't' for traces, and numeric sub-scripts for co-indexing, (6P) will yield the following shallow structure: (6S)
at least one patienti was2 t2 visited3 t3 by every doctor t\.
From a semantic point of view, there is a serious difficulty in this proposal which its authors do not even consider: what sense can we attach to the notion of a semantic role being associated with the passive suffix? It makes sense to associate a semantic role with, for example, one of the arguments of a verb, because the noun phrase occurring in that argument describes whatever plays that role. In the case of the passive suffix, however, nothing is described which could play the role of for instance, Agent. Consequently we cannot attach a meaning to the proposal, despite its conformity with the technicalities of transformational grammar; the language in which it is set out has simply 'gone on holiday', to use Wittgenstein's phrase. The essence of the current analysis of passives by transformational grammarians is to see them as containing a subordinate clause (on which the small clause is a variant). The analogy is with sentences like Thilip reckons the computer (to be) good value', which can be paraphrased by Thilip reckons that the computer is good value', in which the subordinate clause is explicit. The intended parallel is then: Philip reckons the computer (to be) good value Mrs Wilson was Dr Patel -visited in which the Agent of the passive sentence is the analogue of the subject of the subordinate clause in the first example; I have tried to bring this out by forming a composite predicate 'Dr Patel-visited' which is something that Mrs Wilson can be. But some transformational grammarians remain unhappy with small clauses (Radford, 1988, pp. 515-20; cf. Williams, 1983, and Postal and Pullum, 1988), while another has written of their supposed analogues: 'It is one of the enduring embarrassments of the field that this apparently routine question of constituency has proven so difficult to resolve' (McCloskey, 1988, pp. 556). Head-driven phrase-structure grammar also adopts a two-verb analysis of passives, but avoids transformations by sub-categorization which makes the grammar context-sensitive. The central provision is to sub-categorize 'be' in passives so that its subject is the same as the
24
Linguistics: strings
(missing) subject of the verb phrase which follows it; this is done by introducing a variable as the value of the feature 'subject', as follows: V [SUBCAT,(VP [ +PASSIVE; SUBJ,Y]>; SUBJ,Y]. There is then no need to generate a structure with empty N" positions, but this complication is avoided only at the price of further, extensive subcategorization instead. The detail need not concern us, because the basic structural analysis is the same as in transformational grammar.
1.4 P R O N O U N S AND RELATIVE CLAUSES Two other types of construction which put a theory of meaning to the test are propositions containing anaphoric pronouns (those which have antecedents within the proposition) and relative clauses. Transformational grammar treats pronouns as noun phrases, that is, insertions directly under N'\ but distinguishes them into anaphors and pronominals. Anaphors comprise reflexive pronouns and expressions like 'each other'; pronominals include personal pronouns, whether anaphoric or not. In deep and shallow structures, the antecedent of an anaphoric expression is shown by co-indexing, which must, accordingly, be regulated. This is done by principles of binding, which is defined in terms of ccommanding: X is bound | 1 3 X is an argument and for some argument Y: Y c-commands X and Y is co-indexed with X. Arguments which are not bound are said to btfree. Since both X and Y in this definition have to be arguments, they will both be noun phrases. There are three principles of binding, which relate it immediately to the treatment of pronouns: (Bl) (B2) (B3)
an anaphor is bound in its governing category; a pronominal is free in its governing category; other noun phrases are free everywhere.
This means that an anaphor must be co-indexed with an antecedent related to it in a particular way, determined by the notion of governing category, which is defined as follows: X is a governing category for Y J X is the minimal I" or N" containing both Y and a governor of Y. 13
Following Frege, I use 'I' as a sign for a definition.
Pronouns and relative clauses
25
The effect of the binding principles on the treatment of pronouns may best be illustrated by examples. Consider, then, the deep structure required in order to generate (7)
Octavian defeated Antony and he killed himself.
The phrase-marker, with co-indexing shown, as before, by sub-script numerals, would be as in (7P). (7P)
I"
F
I
I
I" I
I
N"[A] Octavian
I
and
I e[past] V
I
I" 1
r
N"[A]
V"
he, N" [P]
defeat | [A, P] Antony2
I
r
I
V"
e[past] V I
N" []
kill [A, P] himself,
If we look at the branch of this phrase-marker to the right of'and', we see that 'kill' governs [N\V]. The minimal T' or N" dominating both of these is the / ' to the right of 'and', so this is the governing category for [Nf\V], We have inserted an anaphor under that node, so by (Bl) it must be co-indexed with an argument dominated by the same /" node and which c-commands it. The only candidate is [N"J']y under which we have inserted a pronominal. The latter, accordingly, must be co-indexed with the anaphor. However, by (B2), the pronominal is free with respect to the F node. We are, though, free to co-index the pronominal with either of the nouns in the branch of the tree to the left of the 'and'. Here it is shown co-indexed with 'Antony', giving the sense that Antony killed himself, the most natural interpretation of (7). But we could have coindexed it with 'Octavian' instead, and then the sense would be that Octavian defeated Antony and Octavian killed himself - a possible but unlikely reading of (7). Finally, if the pronominal were not co-indexed with either of the preceding nouns, an alternative for which the binding rules allow, the sense would be that Octavian defeated Antony and
26
Linguistics: strings
someone else (exactly who being grasped, presumably, from the context) killed himself. In treating relative clauses, transformational grammarians distinguish between restrictive and appositive ones. The former are sometimes also called defining relative clauses, as in this characterization by Fowler: A defining relative clause is one that identifies the person or thing meant by delimiting the denotation of the antecedent: Each made a list of books that had influenced him; not books generally, but books as defined by the that-clause. Contrast with that: / always buy his books, which have influenced me greatly; the clause does not limit his books, which needs no limitation; it gives a reason (= for they have), or adds a new fact (= & they have). (1926, entry that, rel. pron.) This is a semantic explanation of the difference between the two types of clause; transformational grammarians would no doubt prefer a more syntactic characterization. Thus restrictive clauses may be introduced by a relative pronoun inserted under an TV" node, by the complementizer 'that' (finite clauses only), or may have no overt introducing expression; they can sometimes be separated from their antecedents, but they cannot modify proper nouns. Appositive clauses, by contrast, are always introduced by an overt relative pronoun inserted under an N' node, cannot be separated from their antecedents but can modify proper nouns; they are by way of parenthetical comments or afterthoughts, and this is often indicated by intonation or punctuation (commas, hyphens, parentheses) (Radford, 1988, pp. 480-1). Sometimes the intonation or punctuation is essential to mark the difference; thus the relative clause in (8)
Eskimos, who live in igloos, have lots of fun
is appositive, but in (9)
Eskimos who live in igloos have lots of fun
is restrictive. On other occasions a sentence containing a relative clause may be ambiguous: (10)
Serious works on Russia from Polish sources, which are not intended as merely propagandist pamphlets, are a valuable contribution towards a better understanding of that country
on which Fowler comments: 'If the clause is non-defining, . . . none of these serious works are propagandist, and all are valuable. The real meaning is that some of them are free of propaganda, and are therefore valuable' (1926, entry which) (that) (who, 3).
Pronouns and relative clauses
27
Fowler, indeed, proposes to mark this difference of meaning by reserving 'who' and 'which' for appositive relative clauses, and 'that' for restrictive ones: The two kinds of relative clause, to one of which that and to the other of which which is appropriate, are the defining and non-defining; and if writers would agree to regard that as the defining relative pronoun, & which as the non-defining, there would be much gain both in lucidity & in ease. Some there are who follow this principle now; but it would be idle to pretend that it is the practice either of most or of the best writers. (1926, entry that, rel.pron.)
He gives a long list of examples to show that the practice which he recommends would aid in clarity; but contrary usage both before and since he wrote makes its general adoption, however advantageous, highly improbable. Yet it can still be a useful convention in the presentation of linguistic examples and I shall follow it where possible. From the present point of view the importance of this distinction is that it sometimes carries with it a difference in the circumstances under which a sentence is true and, hence, a difference of meaning. Thus (8) says of eskimos in general both that they live in igloos and that they have lots of fun, whereas (9) says only of eskimos that live in igloos that they have lots of fun. Similarly for the two readings of (10). Our question must therefore be whether this difference of meaning can be accounted for by a structural difference. Jackendoff (1977, section 7.2) notes that where a noun is qualified both by a restrictive and by an appositive relative clause, the former must come first and the latter after it, thus: The man that came to dinner, who was drunk, fainted'. Moreover, restrictive relative clauses can be concatenated, for instance, 'The man that came to dinner that gobbled his soup fainted', whereas appositive ones must be joined with an 'and', for example 'Eskimos, who live in igloos and who hunt seals, have lots of fun'. Again, when the noun modified by the relative clause is in object position, the sentence can be negated if the clause is restricted but not if it is appositive. Thus we can have 'Baldwin did not greet the man that was sitting opposite him' but not *'Baldwin did not greet the man, who was sitting opposite him'. From all this, Jackendoff concludes that restrictive relative clauses are more intimately tied to the noun which they qualify than are appositive ones. His version of X-bar syntax allows for triple-barred categories, so he accounts for these differences by generating appositive clauses as complements of N" (immediately dominated by triple-bar N), but restrictive clauses as complements of N' (immediately dominated by
28
Linguistics: strings
N").14 This resort is not available in a system restricted to double-barred categories but, since part of the argument for distinguishing adjuncts from complements is that adjuncts always follow complements, it is open to us to generate restrictive relative clauses as complements of N, and appositive ones as adjuncts of N'. Since a head can have more than one complement, this allows for concatenated restrictive clauses; but as a head can only have one adjunct (at a time), multiple appositive clauses at the same level would have to be formed by conjunction. This yields the phrase-marker (9P) for (9). (9P)
I" I
I
I
N" [A]
I
I
N"
I
I
eskimos
I
C"
I
N" I
I
I C I
have lots of fun [A]
I
C 1
I
I I" I
I
N" r [A] I live in igloos [A] whThe reason for the empty N" here is that, where the verb of the relative clause is transitive, its object may give rise to the relative pronoun, as in 'someone whom I met'. In that case, 'wh-' will be inserted under the N" node under /' (not shown here) and subsequently moved by a transformation rule to the empty N" position. In order to produce Fowler's canonical form of (9), 'Eskimos that live in igloos have lots of fun', we insert 'that' as complementizer under the C node; 'wh-' is also inserted as above, and moved into the empty N" position, being deleted in the phonetic component. It is therefore supposed that, even in the present case, 'wh-' is also moved to the empty N" position. 14
Van Benthem (1988, p. 40) asserts that there is strong syntactic evidence in many languages for the structure '(Determiner Noun) Relative-Clause' rather than 'Determiner (Noun Relative-Clause)', but he presents no evidence and transformational grammarians evidently disagree with him.
Pronouns and relative clauses
29
The phrase-marker for (8) is only slightly different from (9P), but places the relative clause at one further remove from the modified noun. (8P)
I" I N" [A] have lots of fun [A]
1
N' I C
N' N
1
1*
imos
e;
1
C 1
1
I"
C
r N" [A]
1
live in igloos [A]
wh-
This makes no difference to the transformation and traces which it leaves, so that the shallow structure for both (8) and (9) is: (8/9S) eskimos wh-i e tx live2 t2 in igloos have3 /3 lots of fun, with 'that' replacing 'e' as an option for (9S). I shall now consider how transformational grammar would handle three examples which combine reflexive, personal and relative pronouns and which are especially challenging when we ask how to represent their meanings. In the first example, the relative clauses are restrictive (according to the criteria given by transformational grammarians) and are concatenated: (11)
Anyone that hurts anyone that hurts him hurts himself.
I assume that 'hurts' is an [Agent, Patient] verb. The two restrictive relative clauses here are nested, not concatenated: the clause 'that hurts him' qualifies the second 'anyone', not the first. So, using a broken line for omitted steps unimportant to the present example and which the reader can supply for himself, the phrase-marker of the deep structure should be as shown in (IIP).
30
Linguistics: strings
(IIP)
I
N"[A] I
Det
N'
I any
I
r N
I
I
C"
I
one! N"
hurt [A,P]
1
I
N"[P]
V
c
that
V I
himselfj
I
N"[AJ V
whI V
I N"[P]
_J
hurt Det [A,P] I any
N'
r N one
I N"
C" I
r
c
that
e l
r N"[A] wh-
H
V
V N"[P]
hurt hinij [A,P]
The anaphor at the end of the sentence is compulsorily co-indexed with the first occurrence of 'anyone', by the same reasoning as in (7P). The pronominal 'him' is free in its governing category, which is the first /"
Pronouns and relative clauses
31
above it, so we are permitted to co-index it also with the first occurrenee of 'anyone', as the most natural sense of (11) requires. By way of transformation, each occurrence of 'wh-' will be moved to the nearest empty N" position to its left, leaving, of course, a trace. Assuming also movements of the verbs to their respective / nodes (omitted above), so that they can be inflected for tense, we shall then have as the shallow structure: (1 IS) anyonei wh-2 that t2 hurts3 /3 anyone wh-4 that /4 hurts515 himi hurts6 t6 himself!, which will yield (11) by elimination of 'that' and the traces, together with appropriate inflexion of 'wh-' (as 'that', rather than 'who', under the Fowler convention). My second example is a Bach-Peters sentence, which is interesting because the subject noun phrase contains a pronominal which relates to the object noun phrase, so that the pronominal has a 'postcedent' rather (12P)
Det
fool [A,P]
love [A,P]
hini2
32
Linguistics: strings
than an antecedent, while the object noun phrase also contains a pronominal whose antecedent is in the subject noun phrase: (12)
A boy that was fooling her kissed a girl that loved him.
For the present, I shall assume that these relative clauses are also restrictive, on the ground that they, too, satisfy the criteria described above; but this will be called into question in chapter 4. The phrasemarker for the deep structure will then be (12P). Following the same transformational and co-indexing procedures as in the previous example, we shall obtain from this the shallow structure (12S) A boy! wh-2 that t2 was fooling her3 kissed4 /4 a girl3 wh-5 that t5 loved6 t6 My final example comes from the genre of 'donkey'-sentences, so-called (13P)
I" I
I
N"[A] I
Det
I
N' I
every N
onei
I
shall [past]
C"
r
I
I c
N"[P]
I
N"
i that
I
I
N"[A]
I
return [A,P]
it2
F
r
I
V"
wh-i I
r V
V 1
I I borrow Det [A,P]
N"[P] I
I
N' N
book2
The semantic component
33
after the medieval example 'Every man who owns a donkey beats it'. For a change, I offer the following variant: (13)
Everyone that borrows a book should return it.
The full interest of this example will only emerge in chapter 4, but as a foretaste we may note that an attempt to form a corresponding passive sentence, *'It should be returned by everyone that borrows a book' is inadmissible given that the pronoun is to be anaphoric. The phrasemarker (13P) for the deep structure presents no special difficulties, however. With the usual transformations again, the shallow structure is: (SI3) Every onej wh-2 that /2 borrows3 t$ a. book4 should return it4. Transformational grammar is thus able to generate all of these examples; in head-driven and generalized phrase-structure grammar, they are again catered for by sub-categorization. In the case of relative clauses, a feature SLASH is introduced which has categories as its values. Rules then provide for introducing this feature at the / node of the relative clause and passing it down to each subsequent node on a path which eventually leads to e. There is thus no fundamental difference between the structures posited by these grammars and the shallow structures of transformational grammar described above. Our next question, therefore, must be whether it can interpret them satisfactorily. In order to answer that, we have to look at the semantic component of the grammar.
1.5 THE SEMANTIC COMPONENT In the original version of transformational grammar, semantic interpretation was applied to deep structures, as an appendage to the system of syntactic features which governed lexical insertion. Interpretative semantics, as it was called, aimed to explain how one or more meanings may be attached to a sentence given one or more meanings for each of its constituents (Katz and Fodor, 1963; Katz and Postal, 1964). The method adopted was, first, to specify a form for dictionary entries of lexical items which distinguished different meanings of each item by a tree, each meaning corresponding to one path along the tree; and, second, to specify rules telling us how such paths may be combined. This approach was criticized by Weinreich on the ground that it tried to combine a recognitional theory of meaning with a generative syntax, whereas 'Semantic theories should be formulated so as to guarantee that deep structures (including their lexical components) are specified as
34
Linguistics: strings
unambiguous in the first place . . . and proceed from there to account for the interpretation of a complex expression from the known meanings of its components' (1966, pp. 398-9) Instead of modifying the semantic component on the lines suggested by Weinreich, however, transformational grammarians now propose a completely different account. To begin with, semantic interpretation is applied to shallow structures, 15 that is, the output of the transformation rules, and no longer to deep ones; the most important reason for this change is that the interpretation requires the traces introduced by the transformation rules, since it is assumed that the assignment of semantic roles is left undisturbed by the latter. Next, semantic interpretation is separated into two stages. The first maps a shallow structure onto LF ('logical form' 16 ) which is restricted to 'those aspects of semantic representation that are strictly determined by grammar, abstracted from other cognitive systems' (Chomsky, 1977, p. 5) and whose status and properties are empirical matters, 'not to be settled in terms of considerations of valid inference and the like' (Chomsky, 1986b, pp. 67, 205n.). The second stage brings to bear such extra-grammatical knowledge as is required to complete an explanation of the meaning of the expression concerned. The semantic component of the new transformational grammar deals only with the first stage, and is thus considerably less ambitious than the earlier version. The basic rule for deriving LF from S-structure, termed by May (1985, p. 5) QR, allows us to move a noun phrase occurring under an S node and adjoin it to the left of the latter. It leaves a trace, with which it is coindexed. May sees QR as analogous to the rule for forming questions (or relative clauses) which moves 'wh-' to the C position under S' in deriving surface from deep structures. Thus we may compare: (14)
John saw who?
yielding (14LF) ((whoOc (John saw (/,)N0s)s with (15)
John saw everyone
yielding 15
16
Alternatively, shallow structures can be fed to a phonetic-form component which yields surface structures proper, that is, actual sentences. This component will not concern us here. This turns out to be different enough from what logicians would call 'logical form' to warrant retention of the scare quotes whenever the term is used.
The semantic component
35
(15LF) ((everyoneON- (John saw (/i)N-)s)s This, of course, pre-dates the latest modifications to transformational grammar. The movement of 'wh-' is now presumed to be to a leftmost node under C" which may be any of N\ P" or A" (see Radford, 1988, section 9.10, especially p. 504). Thus (14LF) becomes: (14LP)
«who,) N . (John saw (/,)N")e)c".
If the analogy is to be preserved, (15LF) would have to be altered to: (15LF)
((everyonei)N« (John saw (/i) N ")c)c.
the place for 'everyone' being provided by the specifier rule (SC) and no longer by an adjunction rule. We could then formulate the QR rule as follows: (QR)
(. . .(X)N-. . .)c => ((Xi)N"(. • .(/s)N-. • .)c)c-.
May imposes certain restrictions on QR. One is that Xj c-commands t\\ this is already built into the formulation above. Another is that QR may not be applied to proper nouns, only to a noun phrase consisting of a determiner and count noun. May's initial approach to multiple quantification is to allow repeated applications of QR, the first application yielding the leftmost noun phrase. Now the shallow structure deriving from (IP) will be (IS)
(every doctor)N» (visited\ t\ (at least one patient)N")r,
where t is the trace left after moving 'visit' under the empty /-node. With one application of QR, this would become: (every doctor2)N» (h visited i t\ (at least one patient)N»)r and then, by a second application: (lLFa) (every doctor2)N» (at least one patient3)N» (/2 visited! t\ ty)v But it would be equally legitimate to apply QR to 'at least one patient' first, giving the result: (lLFb) (at least one patient3)N» (every doctor2)N» (h visitedi t\ t^r It would thus be possible to obtain two distinct LF representations from (IS), corresponding to the two interpretations which its syntactic form leaves open. (In (lLFa) 'every doctor' c-commands 'at least one patient' but, in (lLFb), conversely.) Exactly the same applies to (6); from (6S) we can obtain either (6LFa) (at least one patient3)N» (every doctor2)N» (/2 was4 /4 visitedi by t\ ty)v
36
Linguistics: strings
or (6LFb) (every doctor2)N- (at least one patient3)N» (t2 was4 t4 visited i by t\ /3)i', with no explanation why the latter is much less probable than the former. May allows that in a particular case only one interpretation might be possible but, if so, then the other will be ruled out on grounds which are not purely syntactic. He does acknowledge a difficulty, though, with Bach-Peters sentences like (12). Beginning with a simplification corresponding to (16)
A boy kissed a girl that loved him,
he temporarily accepts the LF representation (16LF) (a boyi)N" (a girl that loved him2)N» ('i kissed /2)r but rejects (a girl that loved him2)N' (a boyi)N» (t\ kissed t2)v on the ground that the pronoun 'him' is not c-commanded by 'a boy' in the latter, whereas it is in (16LF). When we try to apply QR to (12S), however, there is no way in which we can so represent its LF that 'him' is c-commanded by 'a boy that was fooling her' and 'her' c-commanded by 'a girl that loves him', for the only possibilities are (a boy that was fooling heri)N» (a girl that loved him2)N» (/i kissed /2)r, in which 'him' is c-commanded by 'a boy that was fooling her' but 'her' is not c-commanded by 'a girl that loved him', and (a girl that loved him2)N» (a boy that was fooling heri)N« (t\ kissed /2)i', in which the converse obtains. His solution is to propose an Absorption of two noun phrases, one of which c-commands the other, into 'something like a conjoined constituent' (1985, p. 21) in which each c-commands the other. This requires a new expedient in indexing, in which an index is attached to the conjoined constituent: (12LF) ((a boy that was fooling heri)N» (a girl that loved him2)N»)N"2 (ti kissed r2)r. Donkey-sentences are treated in the same manner (pp. 74, 152). He rejects the analogue of a booki ((everyone that borrows
/I)N"2
(h should return i
The semantic component
37
for (13) on the ground that the 'wh-' movement would produce *'Which book should everyone that borrows return it?', and proposes instead: (13LF) ((a booki)N- (everyone that borrows /I)N")N-2 (f2 should return iti)r. This appears to be more of an ad hoc than a principled solution, for it assumes a grammatical rule quite foreign to every version of transformational grammar, N" =» N" N", which is crucially unlike a conjoined constituent in having no conjunction. Moreover, so far as I can see, there is no way in which Absorption might be integrated into the latest version of transformational grammar as a case of attribution or of adjunction, for that would have to take place at the N' level, at which the quantifying phrase has been stripped of its determiner. Absorption would also apply, of course, to examples like (1) and (6), so that we finally have (lLFc) (every doctori (at least one patient2)N')N»2 ('i visited /2)r and (6LFc) (at least one patient3 (every doctor2)N»)N"2 (h was4 t4 visitedi by tx h)v But (lLFc) would be susceptible both of the interpretation that it is true of every doctor that he or she visited at least one patient, and of the interpretation that it is true of at least one patient that every doctor visited him or her, and a similar ambiguity would be intrinsic to (6LFc). So there would be no way of distinguishing these two meanings. May now discovers a difficulty in his representations of LF for sentences involving multiple quantification, prompted by the analogy which he sees between quantifiers and interrogative or relative pronouns. In order to rule out *'What did who admire?' (in contrast to 'Who admired what?'), transformational grammarians have proposed an Empty Category Principle (ECP) to the effect that an empty category must be governed by the expression to which it is co-indexed. This means that when the empty category occurs in a subject position, the co-indexed phrase must be adjacent to it. Example (lLFb) fulfils this condition, but (lLFa) and (lLFc) do not, so if the ECP applies at LF, the only LF corresponding to (1) would be (lLFb). How, then, are we to provide for the two interpretations of (1)? May's answer is to modify the definition of 'c-command' so that the two quantifying phrases mutually govern each other and then to provide alternative interpretations of (lLFa). This is to claim, in effect, that the difference between them is not structurally
38
Linguistics: strings
determined by syntax. Yet previously he thought that it was so determined. What caused him to change his mind? Not, it seems, any clearer view of what aspects of meaning are determined by syntax, but the presumed analogy with interrogative pronouns and their treatment in the syntactic theory. The modification also raises a further difficulty, which is brought out by the example: (17)
Some student admires every professor, but John doesn't.
May assumes that 'but' conjoins two sentences here. Consequently, (every professori (some student2 (/2 admires /i)r))r but (John doesn't (admire /i)v")r must be rejected, because the second occurrence of t\ is not c-commanded by 'every professor'. Instead, he proposes that 'every professor' be adjoined to the verb phrase, being repeated on the latter's second occurrence: (some student2 (f2 (every professori (admires fi)y)r)r)r but (John doesn't (every professor3 (admire f3)v")r)r« May finds further support for this proposal in examples like (16), where (16LF) would commit him, as one possible interpretation, to *'there is at least one girl who loved him such that a boy kissed her'; in order to avoid this, he proposes to adjoin 'a girl that loved him' to the verb phrase. In the most recent version of transformational grammar, 'every professor' and 'a girl that loved him' would have to be attributes of F \ with the effect that / occurred to the left of it. So now we have two positions to which QR may move a noun phrase: it may become an attribute of C" or of V'\ and QR would, accordingly, require amendment. May acknowledges that quantifying phrases as adjuncts of verb phrases are awkward, in that a quantifying phrase is customarily regarded as qualifying a sentence frame (frame, because it will contain at least one empty category trace). Although he argues that quantifying phrases adjoined to verb phrases do also c-command the traces in subject position immediately preceding them, he subsequently discovers further exceptions to the Empty Category Principle when extended to quantification (1985, pp. 115-17). This leads him to replace it by a Path Containment Condition (PCC). A path is here understood to be 'a set of occurrences of successively immediately dominating categorial nodes' in a phrase-marker from an empty category trace to the quantifying phrase with which it is co-indexed (p. 118). The condition is then that paths must not cross over each other, though they may share common segments
The semantic component
39
(parts). This still rules out (lLFa), since paths 1 and 2 would cross, but allows (lLFb), in which they do not. Similarly, it allows for verb-phrase adjunction, since the paths do not then intersect at all. So far, paths have been defined as starting with a co-indexed empty category symbol, but May then broadens the definition to allow them to begin with co-indexed pronouns. With this extension, the PCC differentiates between LFs for examples like (16) and (17) in which the quantifier phrase is adjoined to the verb phrase, which conform to it, and those in which it is adjoined to a sentence-node, which do not. Moreover, while allowing (13), it excludes the corresponding passive *4It should be returned by everyone that borrows a book' in which the pronoun is to be understood as anaphoric. He also claims that the PCC legitimizes (12LF), for the path from t2 to wa girl . . .' does not cross that from t\ to 'a boy . . .', while the paths from the two pronouns to their related quantifier phrases are internal to the combined noun phrase. However, as he does not analyse the restrictive relative clauses in each of the quantifier phrases, we need to look at this more closely, applying QR to (12S). The result would be: (12LF')
(a boy } wh- 2 that t2 was fooling her3)N» (a girl3 wh-4 that
loved
N'^N"! (/ 3 kissed t\)\>.
Now in (12LF) 'a boy' and 'a girl' are not indexed, but according to the rules for co-indexing pronouns, they must be so when the phrasemarkers for the quantifying phrases are spelled out in full. Paths 2 and 4 are duly nested within paths 3 and 1 respectively and pose no problem. But there is no way of avoiding a crossing of paths 1 and 3 at the N" node which dominates 'a girl that loved him', and that is a categorial node. This will be clear from inspection of a simplified phrase-marker corresponding to (12LF'): C" I
i
Det
N"
N"
N"
N'
D
N' N
C"
boy!
that was fooling her 3
a
r N girl3
I
I'
7 kissed t\
C" that loved hi
40
Linguistics: strings
Path 1 (from 'him') =
, but path 3 (from t3) = and the two paths must cross either at N"2 or at some point below it on the 'a girl that loved him' branch. Thus May has no way of representing an LF for this example which conforms to his own rules. Other transformational grammarians treat pronouns in LF rather differently from May. If we consider again the first example of section 1.4, the shallow structure is the same as the deep structure, so far as the pronouns are concerned: (7S)
(Octavian (defeated2 t2 Antonyi))i» and (hei (killed3 t3 himselfi))r'.
Here the antecedent of the pronouns is a proper noun. It seems from examples (see Chomsky, 1977, p. 195; Jacobsen, 1986, p. 346) that such an antecedent is considered on a par with a quantifying phrase and preposed in the form 'for x = Antony'. The pronouns themselves can then be interpreted in accordance with the anaphora rule, to the effect that a pronoun whose antecedent is a quantifier phrase is to be re-written as the letter which is added to the latter when it is preposed. The numerical sub-scripts then become redundant and the LF for (7S), omitting the traces of the verb movements, will be: (7LF) for x = Antony [Octavian defeated x and x killed x]. It is unclear whether 'Octavian' should be treated in the same manner; examples in the literature containing proper nouns which are not the antecedents of pronouns are left unchanged. There is some dispute whether relative pronouns should be represented on a par with interrogative ones, but Chomsky prefers to treat a relative clause as a predicate of its head; thus, given a shallow structure the mani [wh-2 hisi mother loved /2 best], a rule of predication identifies T and '2' to yield: the mani [wh-! t\'s mother loved t\ best] (1982, pp. 13 and n. 11, 92-5). This would simplify slightly the structures proposed by May, for example by identifying sub-script 2 with sub-script 1 in (12LF'), and sub-script 4 with sub-script 3, but would make no essential difference. Chomsky has subsequently extended this treatment to clauses from which the relative pronoun has been omitted; the empty category sign 0 is then replaced by a letter and 'wh-' by an operator 'O' (1986b, p. 84). The difficulty with (12) also remains under Chomsky's proposals. The anaphora rule only allows replacement of a pronoun by a co-indexed
The semantic component
41
trace when the pronoun occurs after the noun phrase (its antecedent) to which its meaning relates (see Chomsky, 1977, p. 202). In (12), 'her' relates to 'a girl' and precedes it (its postczdznt). Lasnik (1976) tried to tackle this problem, but Evans (1980) has shown that the solution which he proposes will not work. Evans himself, however, proposes a rule having the effect that a pronominal may be co-indexed with a noun phrase provided that it does not both precede the latter and c-command it (1985, p. 244). This covers (12), since 'her', though preceding 'a girl', does not c-command it {vide (12P)). None of these proposals for the treatment of relative clauses in LF will serve to distinguish the meanings of (8) and (9), for, as we concluded in section 1.4, both have the same shallow structure: (8/9S) eskimos wh-i e t\ live2 t2 in igloos have3 /3 lots of fun, which would yield (8/9LF)(eskimos4 (t4 wh-i e t\ live2 t2 in igloos have3 /3 lots of fun)) on May's principles, with sub-script 1 changed to 4 for Chomsky. Yet, by analogy with May's representation of other examples of restrictive relative clauses, it seems that he would want the LF for (9) to be: (9LF) (eskimosi wh-2 / t2 Hve3 /3 in igloos)N» (t\ have4 t4 lots of fun)r. But this would mean that the distinction between restrictive and appositive relative clauses was introduced at LF, rather than different LF representations being 'triggered' by a difference in shallow structures. Finally, we have to consider how (11) would be represented at LF. The shallow structure was (1 IS)
anyonei wh-2 that t2 hurts3 /3 anyone wh-4 that t4 hurts5 t5 himj hurtS6 t6 himself i.
The literature offers no guidance on how to deal with an example like this, so we shall have to improvise. Supposing the relative clauses to be restrictive, the second appears to be embedded within the first, so that a representation consonant with May's general approach to restrictive relative clauses would be: (anyonei wh-2 that t2 hurts3 /3 (anyone wh-4 that /4 hurts5 t5 himi)N»)N" (ti hurts6 But this is not yet in LF, because the second occurrence of 'anyone', although a quantifying phrase, is not co-indexed. The problem, however, is with what to co-index it, since the subject position of the second occurrence of 'hurts' is already filled by the co-indexed trace f3. The best
42
Linguistics: strings
we can do is probably to give it the same index, 4, even though there is no rule to justify this: (11LF) (anyonei wh-2 that t2 hurts3 /3 (anyone4 wh-4 that /4 hurts5 t5 himi)N»)N" (t{ hurtS6 /6 himselfi)i' At least it is clear that none of the paths will then cross. A final judgment upon this new semantic component of transformational grammar must be postponed until we have considered the structural analyses of these examples which logicians can offer, for transformational grammar leaves it to other disciplines to tell us how to proceed from this point onwards in assigning a meaning to the structures which it delivers. But it should be clear already that the semantic component is still very inchoate and is beset with many unanswered problems. Part of the trouble undoubtedly lies in the methods of transformational grammarians: they are constantly modifying their notation and rules (which is perfectly legitimate and, indeed, necessary to progress) but do not then go back to re-work earlier examples (which is not). This often happens even within a single work (as illustrated several times in this chapter). The result is that the pieces do not fit together and anyone who tries to extract a coherent whole becomes extremely frustrated. But there are also problems of substance. Thus the analogy which transformational grammarians see between the treatment of relative clauses in shallow structures and of quantifying phrases in LF may be questioned. So far as LF is concerned, the sub-script letter attached to 'wh-' appears redundant, in that it does no work. This can be seen if we replace sub-scripted 'wh-'s by 'such', for example in (12LF"): (12LF") ((a boyi such that t\ was fooling her3)N» (a girl3 such that /3 loved himi)N»)N"i (/3 kissed /i) r .
The phrase 'such that' - a logicians' device - is perhaps pedantic, but clear enough, and shows that nothing is lost to the meaning by eliminating the sub-script letter. Semantics is undoubtedly the Achilles heel of transformational grammar. This is now its second attempt at adding a semantic component to the syntactic theory, and it seems destined to be no more succesful than the first. Transformational grammar has now been around for over thirty years, so this deficiency can hardly be attributed to growing pains. It must surely prompt us to ask whether the theory is not mistaken about the relationship of syntax to semantics. In retrospect, the relegation of semantics as an appendage of syntax, a feature common to most theories due to linguists, has historical rather than theoretical justification:
The semantic component
43
modern synchronic linguistics began in the nineteenth century with phonology, and only discovered formal syntax in the thirties of this century, while its excursions into semantics remained at a pre-formal stage until very recently. Moreover, linguists were concerned to claim a certain autonomy for syntax. In transformational grammar this takes the form of a principle to the effect that no syntactic rule can refer to pragmatic, phonological or semantic information (see Chomsky, 1977, pp. 36-59).17 But that does not exclude an appeal to semantic arguments in favour of a particular syntactic rule, which are sometimes given quite explicitly by transformational grammarians. Moreover, the principle itself presupposes that we can always distinguish pragmatic, phonological and semantic information from syntactic. Yet it is admitted that the native speaker, upon whose judgments of acceptability syntactic theory is ultimately based, is for the most part unable to tell whether a deviant expression is meaningless or asyntactic18 or merely false (see Chomsky, 1977, p. 4; Radford, 1988, pp. 13-16). What settles this question is a theory: if an example violates the rules of my semantic theory, I say it is meaningless, but if the rules of my syntactic theory, then I say it is syntactically ill-formed. Indeed, this very expression presupposes a system of rules by the implied contrast with well-formed, that is, formed in accordance with certain rules. But is not this position viciously circular, with theory dependent upon judgments which in turn are dependent upon the same theory? It could be saved from this charge if, instead of being for the most part unable to tell whether a deviant expression were meaningless or asyntactic, we were for the most part able to do so. Then the theory would be built on the agreed judgments and used only to settle borderline cases. A good case can be made out for constructing a theory of meaning on this basis. There is a long tradition of philosophical criticism which takes the form of arguing that what an opponent has written is meaningless. This builds upon a type of protest which any language speaker quite naturally makes from time to time: 'It doesn't make sense'. In both cases, of course, the critic must be prepared, if challenged, to support the charge 17
18
It is to observe that transformational grammar does not itself entirely conform to this principle, since it embraces the assignment of semantic roles at the point of lexical insertion to yield deep structures, and subsequently appeals to semantic roles in defining government. This is not a term used by transformational grammarians, but it is cited in the OED in just the sense required here, with a quotation from Mark Pattison. There is no corresponding term 'asemantic', doubtless because 'asyntactic' transliterates a Greek word meaning 'disordered' and thus pertaining to the old sense of 'syntax'.
44
Linguistics: strings
of meaninglessness with argument, and may turn out in the end to have been wrong: such judgments are not infallible. It is much less plausible to try to construct a theory of syntax upon some notion of syntactic ill-formedness. The reason for this is that, whereas we are familiar with meaningless expressions that cannot be faulted on the ground that their authors do not know how to speak or write their language, we are very largely quite unfamiliar with syntactically ill-formed expressions whose sense cannot be impugned. One can give enough examples, perhaps, to show that there is a difference between semantic and syntactic ungrammaticality, but syntactic illformedness does not have to go far before it begins to impair intelligibility, so that, without a theory for a guide, one soon becomes uncertain whether the deviance is semantic or syntactic in origin. The ground of this asymmetry between syntax and semantics is that syntax is the handmaid of semantics. This may be unpalatable to many linguists brought up in neglect of semantics, but the primary purpose of knowing how to express oneself in English, Spanish or German, say, according to the generally accepted forms in those languages, is to communicate with others who share that knowledge. We study the syntax of a foreign language primarily because we want to be understood in it, to be able to convey our meaning to others. This does not exclude other purposes for syntax which have nothing to do with meaning, nor does it in any way deny that syntactic rules should for the most part be formulated in terms of syntactic structure. It does, however, explain why attempts to explain syntactic ill-formedness as a notion upon which to build a syntactic theory are doomed to failure, and one only has to look at any of these attempts to see how feeble they are. 19 It is possible, nevertheless, to extract structures for sentences containing multiple quantifying phrases upon which a correct account of meaning can be based from those generated in transformational grammar. A method for doing this has been developed by computer scientists and will be described in section 3.2. But the complexity is considerable even with a restricted vocabulary and quantification is only one aspect of a wider problem, to be explained in the next chapter under the heading of scope ordering. A serious attempt to extend the method to a wider range of linguistic structures would almost certainly turn out to be messy and enormously complicated. The basic reason for this is that in seeking to transform syntactic into semantic representations we are moving from simpler to more complex structures. There is therefore a 19
See, for example Radford, 1988, pp. 7-17, which explains well enough what grammaticality is not, but says almost nothing about what it is.
The semantic component
45
strong case for working the other way round instead, starting with the more complex semantic structures and projecting them onto simpler syntactic ones. We need, therefore, a different model of the relationship of syntax to semantics. The centrepiece of the model will be a theory of semantic structures, that is, an explanation of the meanings of expressions in so far as that can be given by reference to structures assigned to them. The task of a theory of syntax will then be to show how these semantic structures can be mapped onto forms which are generally acceptable in various languages. Since the syntactic theory will carry no burden of semantic interpretation as an appendage, it should be simpler than current transformational grammar. It is an open question whether the syntax would have to be specified by a tree grammar, or whether a string grammar would suffice. Transformational syntax cries out for a tree grammar, but that may be because of its (partly hidden) semantic content. It is idle to speculate further about syntax, however, until we have a much clearer idea of semantic structures, which brings us to the contribution of logic to our problem. Meanwhile, thirty years' work in transformational grammar, together with the rivals which it has engendered, has made linguists think of language in formal terms as never before and produced a wealth of syntactic insights and observations which will surely be incorporated into a syntax related to semantics in the way just outlined, even though many generalizations may demand modification to fit a different theoretical framework. The work on LF in transformational grammar described above makes much more sense as a contribution to the problem of parsing, that is, of obtaining a semantic structure or structures from syntactic ones - the converse of generation. For it sets out to extract from shallow structures a system of structures upon which an account of meaning can be based, which is, precisely, semantic parsing. In this perspective, the theory of LF has still not taken the measure of Weinreich's criticism of interpretative semantics: it, too, is still working the wrong way round, trying to attach a theory of semantic parsing to one of syntactic generation. But as part of a wider theory of syntactic and semantic parsing, it could have a future. As Wittgenstein said of his Tractatus, it is not like a box of junk (say, parts of old clocks) but, rather, like a clock that tells the wrong time.
2 Logic: trees
2.1 TREE GRAMMARS

In logic, what corresponds to grammar is the set of formation rules for the symbols, that is, the rules which define a formula. Although I shall eventually argue that some of the structures exemplified in the formulas of modern logic are more complex than trees, many of them certainly are trees and probably the majority of logicians has assumed that all of them are. So it is fitting to take tree grammars as typical of logic in the way that string grammars have been taken as typical of linguistics. A tree is a finite set of one or more nodes which fulfil two conditions: first, that just one node is called the root and, second, that the remaining nodes are partitioned into disjoint trees, called sub-trees of the whole. Each root is connected to the roots of each of its sub-trees by a line and, intuitively, the resulting diagrams look like upside-down trees, with the root at the top and the branches spreading out downwards. The bottommost nodes are called the leaves of the tree and the set of leaves constitutes the tree frontier. As with a string grammar, a distinction is drawn between nonterminals (category symbols) and terminals but, in addition, each of these symbols is assigned a degree (sometimes also called a 'rank'), which is the number of sub-trees which issue from it. A tree grammar Gt consists of an ordered set (V,r,P,S), where V is the alphabet (the union of nonterminals C and terminals E) and r a set of integers giving the degree of each member of V. P is again the set of productions, which are of the form Ti => Tj, where Ti and Tj are trees. Finally, S is a finite set of starting trees, each of whose nodes must be labelled with an item from the alphabet. Thus, instead of allowing us to re-write one string by another string, a tree grammar allows us to re-write one tree by another tree. The rules of the simplest tree grammars - which will be enough for our needs - are in expansive form, viz.
    C0  =>       T
              /  |  \
            C1  ...  Cn
where C0, . . ., Cn are category symbols and T is a terminal symbol of degree n. C1, . . ., Cn may be empty; application of such a rule will terminate a branch of a tree. In order to facilitate comparison and contrast with string grammars, let us see how a tree grammar might be used to generate the sentence used as an example in chapter 1, 'Every doctor visited at least one patient', with the absolute minimum of changes to the category and terminal symbols. The process by which the sentence is generated will consist of a series of trees and, as it is possible to express trees in a linear notation, to do so will make for a more compact exposition. Thus the general expansive form given above can be written: C0 => T(C1, . . ., Cn) and, as rules of this form, we may lay down the following, using the categories of transformational grammar:

(R1) I" => Past(V")
(R2) V" => visit(N", N")
(R3) N" => every(N), at least one(N)
(R4) N => doctor, patient.
Thus 'visit' is given a degree of 2, 'Past', 'every' and 'at least one' a degree of 1, and 'doctor' and 'patient' a degree of 0. Here, then, is the derivation of 'Every doctor visited at least one patient', using I" (sentence) as the start symbol:

1 I"
2 Past(V")  by (R1)
3 Past(visit(N", N"))  by (R2)
4 Past(visit(every(N), at least one(N)))  by (R3)
5 Past(visit(every(doctor), at least one(patient)))  by (R4)

In the original notation, the final tree will be:

              Past
                |
              visit
            /       \
        every      at least one
          |             |
       doctor        patient
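To see the mechanics of these rules in one place, here is a minimal sketch in Python - my illustration, not part of the text - in which a tree is a nested (label, children) pair and (R1)-(R4) are entries in a table; the derivation above is reproduced by rewriting category symbols and printing the final tree in the linear notation:

    # Trees as (label, children) pairs; category symbols are unexpanded leaves.
    RULES = {
        'I"': [('Past', ['V"'])],                           # (R1)
        'V"': [('visit', ['N"', 'N"'])],                    # (R2)
        'N"': [('every', ['N']), ('at least one', ['N'])],  # (R3)
        'N':  [('doctor', []), ('patient', [])],            # (R4)
    }

    def expand(symbol, choices):
        """Rewrite a category symbol, popping the next recorded choice
        wherever a rule offers alternatives, then recurse on the sub-trees."""
        options = RULES[symbol]
        label, kids = options[choices.pop(0) if len(options) > 1 else 0]
        return (label, [expand(k, choices) for k in kids])

    def linear(tree):
        """Render a tree in the linear notation T(C1, ..., Cn)."""
        label, kids = tree
        return label if not kids else '%s(%s)' % (label, ', '.join(map(linear, kids)))

    # The choices [0, 0, 1, 1] pick 'every'/'doctor' for the first noun phrase
    # and 'at least one'/'patient' for the second, reproducing steps 1-5:
    print(linear(expand('I"', [0, 0, 1, 1])))
    # -> Past(visit(every(doctor), at least one(patient)))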
The difference between this tree and the phrase-marker for the corresponding string is patent. Unlike the latter, it contains no category symbols and terminal symbols are not only to be found as leaves, but also at all the other nodes, including the root. There the sentence consisted of the tree frontier only, the remainder of the tree showing how it was derived, whereas here the sentence itself is displayed as having a tree structure, and the history of its derivation is given by a series of trees.

2.2 LOGIC AND MEANING

Even if we grant, however, that logic demands a tree grammar, by what right do we suppose that logic has anything to contribute to the representation of meaning? Traditionally, logic is the study of arguments (or of types of argument) in order to determine whether they are valid, and in practice it has concentrated upon theoretical deductive arguments. The attraction of using logical methods in order to represent everyday language is that they are geared to deduction, making it easier to determine what follows from a given premiss or premisses. But argument is not the only use of language, not even the principal one, so we need some guarantee that logical form has wider import or, at the very least, that it is capable of supplementation to cater for other uses. The way forward is shown by a method of establishing that an argument is invalid which is in such common use that we seldom reflect upon its implications. When we do not know independently whether the conclusion of an argument with which we are presented is true, we look for a parallel, a similar argument. If we suspect that the original is invalid, then we try to find a parallel the truth of whose premisses and the falsity of whose conclusion will not be contested. But even this does not settle the matter, for it may not be accepted that our new argument is parallel to the original, so this forces us to consider more carefully what we mean by 'parallel' in this context. By what process did we obtain the second argument from the first? The answer, clearly, is by changing some parts of the original but leaving others intact and this, in turn, implies that we saw the first argument as having some pattern or structure, which is preserved in the second. In some cases, producing the parallel argument may involve changing entire propositions among those constituting the original, but far more commonly we only change parts of its constituent propositions, thus recognizing the latter as having an internal structure which is relevant to the validity of the argument. So already we find ourselves committed in logic to a structural analysis of propositions. However, supposing that our second argument can be related to the first by a series of changes in
the latter, so that it cannot be denied that both have the structure which we saw in the first, it may still be objected that we have not shown the first argument to be invalid. The objection may be correct; if so, our mistake will have been that we changed too much in the original, that we treated something as mere content which should have been recognized as belonging to the structure or pattern. Of course, if we have found but one example of the structure which we saw in the original argument which has true premisses but a false conclusion, we have shown that that pattern of argument is invalid; but that does not establish that every argument in which the pattern can be seen is also invalid, for it may also have another structure of which there is no such example. One need only instance the pattern consisting of any two premisses and a conclusion, of which invalid examples can be thought up in a moment. Yet almost all of the valid forms of argument catalogued by Aristotle in his Prior Analytics have this pattern, too. Finding invalid patterns of argument is therefore also of limited use, and will never yield conclusive proof of the invalidity of any particular argument. But in practice it is often effective in convincing someone to withdraw an argument, by throwing upon him the burden of finding a valid pattern in it, which he may not be able to do. The logician, accordingly, is primarily interested in cataloguing valid patterns of argument. Yet he cannot do this simply by going through examples of each pattern. True, the more examples he tries out, failing on each occasion to find a counter-example, the more confident he can be that the pattern is a valid one, but at no stage does he have a guarantee that he has not simply overlooked an example which would show the pattern to be invalid. In the last resort, he has two alternatives. The first consists in breaking down the argument into a series of minimal steps, and then justifying each step. The justification proceeds in two stages. First, each of the minimal steps is represented as a simple argument pattern. A representation of an argument pattern is known as a schema, the Greek word for 'pattern', but in order to avoid any confusion, it will be convenient here to distinguish between an argument schema and an argument pattern, the latter being the structure which the argument is seen as having and the former a representation of that structure. Logicians customarily use a notation for schemas in which the parts of the argument whose structure is being represented and which are replaceable by others without affecting the structure are indicated by letters or perhaps other symbols; these are, accordingly, called schematic symbols.¹

¹ I find this term much preferable to 'variable', an unfortunate loan-word from mathematics which has caused some confusion in logic.

An actual argument may therefore be obtained from a schema
by substituting linguistic expressions for the schematic symbols which it contains. Yet we cannot allow any linguistic expression we please to be substituted for a schematic symbol, for the result of doing so might not make sense. Consequently, a schema must always be accompanied by a key to its schematic symbols stating what kind of expression may be substituted for each. The key gives us the directions for using the schema in order to obtain actual arguments from it; without the key, it has no use and, hence, no meaning. So, together with the logician's commitment to structural analysis goes a commitment to classifying linguistic expressions, that is, assigning them to categories. Since a theoretical deductive argument consists of propositions, the proposition will be the first of these categories and, in order to illustrate the method presently under consideration, it will be enough to introduce the schematic symbols 'P', 'Q' and 'R', for each of which a proposition may be substituted. Ex hypothesi a minimal step in argument will consist solely of one or more premisses and a conclusion. If there were any intermediate propositions, then it would not be a minimal step. Let us, then, use the following convention for argument schemas which represent minimal steps and, more widely, for patterns of argument from which any intermediate steps are omitted. The premisses will be separated by commas, and the premisses from the conclusion by '⊢'. As an example of an argument schema, we may then cite:

(S1) if P then Q, P ⊢ Q
Each schematic symbol occurs twice in this schema. That it does so is part of the pattern represented. So, in order to do justice to this feature of the pattern, a further restriction is imposed upon substitutions for the schematic symbols; not only must a proposition be substituted for each, but the same proposition must be substituted for every occurrence of the same schematic symbol. We can claim that (S1) represents a minimal step in argument because there is evidently no way of breaking it down into still simpler steps. So now we can proceed to the second stage, in which its validity is justified. One might well urge that no justification is needed or even possible in this case, that the schema is self-evidently valid. Anyone with experience of teaching logic, however, will know that it is perilous to rely upon what people consider to be self-evident. The schema

(S2) if P then Q, Q ⊢ P
is invalid, but to many a student beginning logic, this has been far from self-evident. Aristotle, too, would hardly have gone to the trouble of
baptizing the move in argument represented by this schema as the fallacy of asserting the consequent (of a conditional) if its invalidity were so universally self-evident. A justification of the validity of (S1) is therefore required. Yet since ex hypothesi it represents a minimal step in argument, it cannot be justified by further argument. We have to ask, therefore, what is lacking in anyone who fails to appreciate that it is valid. The only answer left is that he does not understand the meaning of 'if'. For it is already assumed that he understands the key to the schematic symbols and, since any proposition may be substituted for each of them, we cannot appeal to any particular choice of substitutions. Only 'if' then remains, though taken in the context of the whole pattern represented by the schema, and not in isolation, as the invalidity of (S2) shows. Thus the pursuit of logic leads rapidly into the study of meaning, and this, in turn, demands that the structures represented by schemas be relevant to meaning. These argument schemas can be regarded as rules of inference, in accordance with which valid arguments are constructed. We can thus define a method of proof in terms of them which allows us to show that many other argument patterns are valid. Given any argument schema consisting just of premisses and conclusion, we see whether the latter can be derived from the former by a series of steps each of which consists in an application of one of the rules.²

² In practice, the basic argument schemas have to be supplemented, for this purpose, with some further rules of inference of which the premisses and the conclusion are themselves argument schemas. An example is the rule that at least one of a set of premisses from which a contradiction is derivable must be false. Such rules, following the Stoics, are called argument themas.

The second method of determining the validity of a pattern of argument ignores any intermediate steps, concentrating upon the premisses and the conclusion. This is a defect vis-à-vis linguistic arguments, because someone might well reach a conclusion which follows validly from his premisses, yet do so by invalid steps. In other words, it does not test the validity of a chain of reasoning. It makes essential use of the notion of truth conditions, the truth conditions of a proposition being the circumstances under which it is true. This is extended also to apply to proposition schemas. The latter, of course, are neither true nor false and so, strictly, have no truth conditions. However, where we have an argument schema, it is possible to specify the truth conditions for the premisses and for the conclusion relative to one another, although not absolutely, and that is enough for the purpose of investigating validity. For the schema will be valid just in case there are no substitutions for its schematic symbols (legitimized by the key to
them) which will yield true premisses but a false conclusion. Hence, if we can find a way of classifying the possible substitutions for the schematic symbols by which each type of substitution will yield a true conclusion providing that it yields true premisses, we shall have excluded the possibility of a counter-example and so have shown that the schema is valid. It is easiest to explain how the method is applied by way of an example, for which the following schema will do:

(S3) If either P or Q then R ⊢ If Q then R
For each of the schematic symbols here, a proposition may be substituted. But, ex hypothesi, propositions are either true or false. So a simple way of classifying the possible substitutions in the schema is immediately to hand, according to whether the propositions substituted are true or are false. Since the schema contains three schematic symbols, there will be eight types of substitution, as there are two types of substitution for each. But there is no need to consider all of these, for, if the schema is invalid, there will be at least one which yields a false conclusion from a true premiss. We therefore suppose that the conclusion is false and see where this leads. Now, if we can relate the truth or falsity of substitutions for 'Q' and 'R' in the proposition schema 'If Q then R' to the truth or falsity of the resulting proposition, it will be possible to infer from the falsity of the conclusion of the schema to the truth or falsity of the substitutions in it. In the present example, this is customarily done by stipulating that a proposition whose structure is representable by 'If Q then R' will be false if the proposition substituted for 'Q' is true and that substituted for 'R' is false, but will otherwise be true. Thus, on our supposition with regard to (S3), we only need to consider the cases in which a true proposition is substituted for 'Q' and a false one for 'R'. The next move is to consider whether the premiss will be true or false for this substitution. Here we are looking at the premiss in relation to the conclusion, because the schematic symbols which occur in the latter also occur in the former and we are only considering the type of substitution for them which will yield a false conclusion. Thus, so far as the premiss is concerned, we have only two cases to consider: in both, a true proposition will be substituted for 'Q' and a false one for 'R' but, in the first, a true proposition will be substituted for 'P' and, in the second, a false one. The problem is then to specify whether the premiss as a whole will be true or false in each of these two cases; if it is true in either case, then we have our counter-example to the argument schema and it is invalid.
The premiss has the same over-all structure as the conclusion but, instead of the schematic symbol 'Q', has the (subordinate) schema 'either P or Q'. Since we are only considering the case in which a false proposition is substituted for 'R', it follows from our previous stipulation of the truth conditions of propositions having the structure represented by 'If Q then R', that we have now to consider whether a substitution in the (subordinate) schema 'either P or Q' which is consonant with our substitution of a true proposition for 'Q' in the conclusion could yield a false proposition from 'either P or Q'. For, if the latter is true, the whole premiss will be false, which is not a counter-example to the schema. The next step, therefore, is to stipulate the truth conditions of propositions having the structure represented by 'either P or Q'. So we lay down that a proposition having the structure represented by 'either P or Q' is to be accounted false if a false proposition is substituted for 'P' and a false proposition for 'Q', but is otherwise to be accounted true. In the example under test, it is given that the substitution for 'Q' is a true proposition; consequently the result of substituting in 'either P or Q' will be true whether the proposition substituted for 'P' is true or is false. Hence, every substitution in the schema which yields a false conclusion will also yield a false premiss, and so there is no counter-example to it: it is valid. Looking back at this method in the light of the example, we see that it involves structural analysis just as much as the first method. The structural analysis is, of course, partially accomplished already in the argument schema, but a further step is taken in the course of applying the method: the premiss 'If either P or Q then R' is treated as being constructed from two proposition schemas, by substituting 'either P or Q' for 'P' in 'If P then R'. The depth to which the analysis is pursued is also the same in both methods. It terminates when we reach the simplest proposition schemas which can be given for each piece of language which is regarded as reflecting the structure of the argument - minimal schemas, as I shall call them. Rule-schemas and themas prescribe how proposition schemas having the same over-all structure may be introduced or eliminated in the course of argument: thus (S1) can be regarded as a rule for eliminating 'if' in the context 'If P then Q'. A proof, therefore, which only uses rule-schemas will take the form of breaking down the premisses into these minimal units and then building up the conclusion from the latter; the picture is overlaid, and so more difficult to discern, when the proof also involves the use of rule-themas. In an exactly parallel way, the method of truth conditions constructs an account of the truth conditions of the conclusion and then of the premisses from those
which have been stipulated for the minimal units of which they are composed. If validity as defined by the method of truth conditions is to answer to validity as we should recognize it in linguistic arguments, the stipulations of truth conditions for the minimal schemas cannot, of course, be arbitrary. Hence these, like rule-schemas, must be justified by appeal to the meanings in everyday language of the 'structural' words in the minimal schemas. So this method, too, leads us straight into considerations about meaning and, accordingly, also demands a structural analysis which relates to meaning. Whichever method we prefer, therefore, the study of validity, that is, logic, is central to the analysis of meaning, and valid inferences will afford us some of the most important clues to that structure.
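The method of truth conditions lends itself to mechanical checking. In this sketch (again my illustration, not part of the text), the stipulations made above for 'if' and 'either . . . or' are encoded as truth functions, and a schema is tested by running through every type of substitution, exactly as was done by hand for (S3):

    from itertools import product

    # Stipulations from the text: 'if P then Q' is false just when P is true
    # and Q false; 'either P or Q' is false just when both are false.
    IF = lambda p, q: not (p and not q)
    OR = lambda p, q: p or q

    def valid(premisses, conclusion, n):
        """A schema is valid iff no assignment of truth values to its n
        schematic symbols makes every premiss true and the conclusion false."""
        return not any(
            all(prem(*vals) for prem in premisses) and not conclusion(*vals)
            for vals in product([True, False], repeat=n))

    # (S1) if P then Q, P |- Q                  -- valid
    print(valid([lambda p, q: IF(p, q), lambda p, q: p], lambda p, q: q, 2))
    # (S2) if P then Q, Q |- P                  -- invalid
    print(valid([lambda p, q: IF(p, q), lambda p, q: q], lambda p, q: p, 2))
    # (S3) if either P or Q then R |- if Q then R    -- valid
    print(valid([lambda p, q, r: IF(OR(p, q), r)], lambda p, q, r: IF(q, r), 3))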
2.3 OPERATOR AND OPERAND

Modern logic was inaugurated by Frege and the definitive advance which he achieved depended essentially upon a new structural analysis of propositions. Frege himself was quite clear about this from the start and made explicit to his readers the importance which he attached to it:

A distinction between subject and predicate does not occur in my way of presenting a judgment . . . In this I follow strictly the ideography of mathematics in which, also, one can distinguish subject and predicate only by doing violence. (1879, para. 3)

I hope that logicians, if they do not allow themselves to be frightened off by the first impression of unfamiliarity, will not refuse their assent to the innovations to which I have been driven by a necessity inherent in the subject-matter itself . . . In particular, I believe that the replacement of the concepts of subject and predicate by argument and function will prove itself in the long run. (1879, Preface, p. xiii)
Yet this manifesto does not explain what difference to semantic structures will be made by the replacement, and indeed there has been much confusion over the question since. The first task must therefore be to elucidate the structures proposed by Frege, in order to compare and contrast them with those of traditional logic and syntax, and to understand just how they prepared the ground for a new era in the history of logic. If argument and function are to replace subject and predicate in an account of semantic structure, then we should expect them to play a comparable role. Now, in the first place, subject and predicate are always linguistic expressions, so argument and function should both be linguistic expressions, too. Second, though, neither subjects nor predicates are
kinds of expression, in the way that, for example, nouns and verbs, adjectives and adverbs are. Sometimes the same expression can occur as a subject and as a predicate or, at least, as part of a predicate; for instance, 'smoking' in 'Smoking can damage your health' and 'Elizabeth is smoking'. 'Subject' and 'predicate' are relative terms; given a sentence, we can distinguish its subject and its predicate. Similarly, then, function and argument should not be categories but, rather, be distinguishable in a given proposition or, perhaps, other expression. Originally, Frege quite explicitly defined argument and function as expressions, but his discussions of examples do not entirely tally with this; thus, he says, if we imagine 'Cato' in the sentence 'Cato killed Cato' as being replaceable by some other expression at both of its occurrences, then 'killing oneself' is the function (1879, section 9). Yet the expression 'killing oneself' does not even occur in the sentence chosen as example. Writing on the same topic twelve years later, Frege had sorted out this muddle. He begins with the distinction between a numeral and a number: 'One could imagine the introduction some day of quite new numerals, just as, for example, the Arabic numeral superseded the Roman. Nobody is seriously going to suppose that in this way we should get quite new numbers . . . with properties still to be investigated' (1891, p. 4). Numerals, he says, are signs which signify numbers; thus 'VIII' and '8', being different signs, are different numerals, but both signify the same number. Now, if we take a simple example of what Frege calls a 'calculating expression' (Rechnungsausdruck), like '3 + 4', it indicates that the numbers 3 and 4 are to be added together - not the numerals, because the value of the expression is the number 7 and not the numeral '34'. If we regard the calculating expression as a command, then it will be executed by doing something with the two numbers, namely, adding them, and 'III + IV' would express the same command in a different notation. In mathematical terminology, addition is a function which is applied to at least two arguments when an appropriate calculating expression is executed, the arguments being the numbers which are added. The expressions from which the whole calculating expression is built up are, however, the two numerals '3' and '4' and the sign '+'. Consequently, if Frege wanted to replace the terms 'subject' and 'predicate' by some terminology which would correspond to the mathematician's use of 'function' and 'argument', he needed a distinction which would contrast the numerals in a calculating expression like '3 + 4' with the remaining expression. Although he eventually recognized this, he never introduced such a terminology. He did, indeed, distinguish between proper names and
function names, but that is a categorial distinction; in some contexts, a function name can signify an argument and even a proper name a function. Perhaps Frege's failure to supply this lack was responsible for his subsequent reversion to the subject/predicate terminology, though using it, now, in a totally new way. At any rate, it has since become quite commonplace to speak of 'logical predicates' and 'logical subjects'. This witnesses to a need for an appropriate pair of terms, but it can only be confusing to impress terms from linguistics and then to endow them with different senses in logic (for example a sentence may have more than one logical subject). It also lacks the generality required; thus it goes against the grain to call '3' and '4' the logical subjects of '3 + 4' and '+' its logical predicate, because '3 + 4' is not a sentence, not even a mathematical one. Yet '3 + 4' expresses the composition of function and argument(s) which is Frege's paradigm. Fortunately, there is an alternative to hand which, though not very widespread, has gained some currency: operator and operand.³

³ The only jarring note in previous uses of this pair of terms is to be found in Wittgenstein (1922, 5.21-5.251), who tried to contrast operations with functions, though, on closer inspection, this turns out to be a sub-division of functions in the accepted sense.

The etymology of 'operator' and 'operand' also suggests, very nicely, the intuitive content of Frege's basic assumption about semantic structure. An operator is something which works upon something else, while an operand is its correlative, something which is worked upon. The seed of this idea is already present in traditional grammar, which tells us that adjectives modify the meanings of nouns while adverbs modify the meanings of verbs, that is, that adjectives and adverbs work upon the meanings of nouns and verbs respectively, so that, in a phrase consisting of an adjective and a noun, the adjective is the operator and the noun its operand, and similarly for a phrase consisting of an adverb and a verb. In that case, then, the phrase is ordered, with respect to its meaning, by the notion works upon. This ordering, moreover, does not always coincide with the order of the words in speech or writing; thus, in English, an adverb usually comes after the verb which it qualifies, so that, in such a case, the works upon relation goes from right to left across the written page (the adverb is a suffix operator). And, in other languages, adjectives are placed after the nouns which they qualify. Frege extended this structural principle to all complex linguistic expressions, giving the following prescription for analysing them: Suppose that a word or phrase occurs in one or more places in an expression. If
we imagine this word or phrase as replaceable by another (the same one each time) at one or more of its occurrences, then the part of the expression that shows itself invariant under such replacement is called the function; and the replaceable part, the argument of the function. (1879, section 9)

Here we should substitute 'operator' for 'function' and 'operand' for 'argument'. 'Replaceable' means 'replaceable salva congruitate', that is, supposing the original expression to be meaningful, the new expression obtained by the replacement will also be meaningful, though it will not necessarily have the same meaning as the original one, nor, if the expression in question happens to be a proposition, will the replacement necessarily preserve the truth or falsity of the original. In short, it is a replacement which preserves semantic coherence. It should be distinguished from a replacement which preserves syntactic coherence but may yield an expression to which no meaning has been given. Frege's prescription will not yield any determinate results, however, for the simple reason that almost any part of a linguistic expression is replaceable salva congruitate. This can easily be seen from Frege's own examples. Thus he suggests that, first, we regard 'hydrogen' in (1)
Hydrogen is lighter than carbon dioxide
as replaceable by 'oxygen' or 'nitrogen'. Then 'is lighter than carbon dioxide' will be the operator and 'hydrogen' its operand. Second, we can regard 'carbon dioxide' as replaceable by 'hydrogen chloride' or by 'ammonia'. 'Hydrogen is lighter than' will then be the operator and 'carbon dioxide' its operand. Finally, we could regard both 'hydrogen' and 'carbon dioxide' as replaceable, in which case 'is lighter than' will be the operator, which will now have two operands, 'hydrogen' and 'carbon dioxide' (1879, section 9). That these three alternatives should be available is part of Frege's intention. But there are also other possibilities. Thus 'light' can be imagined as replaceable by 'heavy' (allowing ourselves the morphological change from 'heavyer' to 'heavier'), and this immediately creates a problem. 'Hydrogen' and 'carbon dioxide' are rather conveniently placed at the ends of the sentence, so that what remains when either or both of them are removed is still a single phrase. If 'light' is imagined as replaceable by 'heavy', though, we are left with two pieces when we try to say what the operator is, 'Hydrogen is' and 'er than carbon dioxide'. Well, perhaps we could get round that difficulty by allowing dots of omission to feature in operators, so that 'Hydrogen is . . .er than carbon dioxide' could be accounted the operator. But the possibilities do not end there. We could imagine 'is' as replaceable by 'weighs' or even 'looks' and, although 'than' cannot be replaced salva congruitate, '. . .er than'
could be replaced by 'and so is' or by 'as . . . as'. Already, then, we have been able to imagine each part of the sentence in turn as replaceable by another expression, so it would appear that virtually any word or phrase can be regarded as an operand and similarly, therefore, any word or phrase as an operator. This offers an extreme contrast to the subject/predicate distinction, which allows us only one way of dividing each sentence. It is, indeed, too great a contrast. The analysis of an expression into operator and operands is intended to elucidate its meaning and it is, on the face of it, unlikely that any consistent account of meaning could be built upon an absolutely free choice of operator. At the same time, Frege had good reason for allowing alternative analyses, so we do not want the operator/operand distinction to be as rigid as the subject/predicate one. Intuitively, certain alternatives are compatible, whereas others are not. Thus, if one takes 'is lighter than carbon dioxide' as the operator in (1) and 'hydrogen' as its operand, it is open to us to analyse the operator, in turn, as itself consisting of an operator 'is lighter than', with 'carbon dioxide' as operand. It is not implausible to regard this result as equivalent to an initial analysis of 'is lighter than' as operator, with two operands, 'hydrogen' and 'carbon dioxide'. Again, if 'hydrogen is lighter than' is taken as the operator, with 'carbon dioxide' as operand, this operator can be further analysed, with the same result as before. Thus there is a sense in which all of these analyses are mutually compatible. By contrast, if we were to take 'hydrogen' as the operator and 'is lighter than carbon dioxide' as the operand, that would not be compatible with the previous series of analyses, because, in positing 'hydrogen' as operand, they supposed that its meaning is worked upon, either by 'is lighter than' or by 'is lighter than carbon dioxide'. To call 'hydrogen' the operator is, however, to posit that its meaning works upon that of 'is lighter than carbon dioxide'. But if, as it seems, working upon is an asymmetric relation, then either of these analyses will exclude the other. So our aim should be to characterize the operator/operand distinction in a way which excludes incompatible analyses yet finds room for compatible alternatives. This would be feasible if we had at our disposal a classification of different kinds of expression, or system of categories, for, given a combination of expressions of known categories, we could state which was the operator and which its operands. Indeed, a much more modest apparatus will suffice, because any operator can be characterized by stating the kinds of expression which it requires as operands and the kind of expression which it forms by combining with them. Thus all that we need to begin with are some expressions which can only occur as operands, and a category or categories for them. I shall call the categories to which
these expressions are assigned basic categories. But a warning must be issued that this carries no implication that expressions belonging to basic categories are not susceptible of further structural analysis. Indeed propositions themselves will be expressions of a basic category, for it is propositions whose meanings semantics must elucidate in the first instance, while, as may be seen from propositional logic, they always occur as operands when set in the context of larger propositions. Yet if we were to deny structural complexity to propositions, our whole enterprise would be cut off at the root. To offer an example, then, if propositions form a basic category and also names of substances like 'hydrogen' and 'carbon dioxide', say the categories P and S respectively, then we should no longer be free to regard 'hydrogen' as the operator in (1) and 'is lighter than carbon dioxide' as its operand. Yet this move does not deprive us of the compatible alternative analyses described above. If 'hydrogen' and 'carbon dioxide' are both designated as expressions of the basic category S, then it will follow that 'is lighter than' is an operator which will form a proposition from two expressions of that category. Since we can characterize operators in terms of the categories of their operands and the categories of the expressions which they form, this yields a notation for category names by writing the latter first, followed by a bracketed list of the former: in the present case, therefore, P(S,S). But if 'is lighter than' be combined with just one expression of category S - whether preceding or following it does not matter - then we shall have an expression which will combine with one expression of category S to form a proposition, that is, an operator of category P(S). Thus it still remains open to us to analyse (1) into 'hydrogen' as operand and 'is lighter than carbon dioxide' as operator, or into 'hydrogen is lighter than' as operator and 'carbon dioxide' as operand. Now Frege came very close to this solution. Indeed he simplified it even further, by having only a single basic category, that of proper name (Eigenname). The price of this simplification was that the category of proper name had to be very hospitable, including propositions as well as most proper nouns. Inevitably this has been controversial, for it is counter-intuitive to suppose that propositions relate to meaning in the same way as proper nouns. But, for the moment, the particular choice of a basic category or categories does not concern us. The essential point is that at least one category should be designated as basic, in the sense that expressions of that category may only occur as operands.⁴
⁴ Frege's system does conform to this requirement, but there is a complication. We cannot say without further ado that proper names can only occur as operands, because this is ambiguous. By 'proper name' we can understand a certain expression, after which a category is named, or an expression qua member of a category, in virtue of which membership it is called a proper name. Thus Frege held that a proper name can sometimes occur as an operator (1893, section 22); though, in that case, he no longer assigned it to the category of proper names, but to a different category. The difficulties presented by this view will be considered later. For the moment, the point is that an expression may sometimes be assignable to more than one category. In such cases, we really need a neutral way of describing it, but, because it is relatively rare (even in Frege's view), for a proper name to occur as an expression of a category other than that of proper name, it is called after the category to which it usually belongs. In spite of this, however, it remains true for Frege that an expression which occurs as a proper name, that is, as a member of that category, can only occur as an operand. So we can disregard this complication and proceed on the understanding that Frege has a basic category in the sense explained above.
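Before moving on, a minimal computational sketch may help to fix the category notation just introduced; the representation is my own, not anything proposed in the text. A category name is its result category together with the list of its operand categories, and combining an operator with some or all of its operands yields the derived category - for instance P(S) from P(S,S):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Cat:
        """A category name C0(C1, ..., Cn); n = 0 gives a basic category.
        For simplicity the result C0 is kept atomic here, though in general
        it may itself be complex."""
        result: str
        operands: tuple = ()

        def __str__(self):
            if not self.operands:
                return self.result
            return '%s(%s)' % (self.result, ','.join(map(str, self.operands)))

    P, S = Cat('P'), Cat('S')
    LIGHTER = Cat('P', (S, S))      # 'is lighter than' : P(S,S)

    def combine(op, *args):
        """Apply an operator to initial operands of the right categories;
        supplying fewer than all of them yields a derived operator."""
        assert op.operands[:len(args)] == args, 'category mismatch'
        return Cat(op.result, op.operands[len(args):])

    print(combine(LIGHTER, S, S))   # P    : 'hydrogen is lighter than carbon dioxide'
    print(combine(LIGHTER, S))      # P(S) : 'is lighter than carbon dioxide'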
The structures to which the notion works upon gives rise are most clearly represented as trees. Trees can be read in two directions, horizontally and vertically, each of which corresponds to a distinct type of ordering. Frege partly recognized this, adopting a planar notation in order to represent 'if. . ., then . . .', 'both . . . and . . .' and 'either . . . or . . .', but otherwise retaining a linear notation. Each of these expressions forms a proposition from two (subordinate) propositions. If we go along with Frege for the time being in assigning propositions to the category of proper names and calling that category E (for Eigenname), then each of the three expressions will be an operator of category E(E,E). Now, supposing 'A' and 'B' to be propositions, Frege's representation for 'if A, then B' is
──┬──── B
  └──── A

and that is a tree, although branching from left to right instead of from top to bottom, the more usual convention. In Frege's notation, the horizontal dimension represents the notion works upon from left to right, while the vertical dimension from bottom to top represents the 'direction' of 'if A, then B': the latter orders 'A' and 'B', since 'if B, then A' does not mean the same, although the meanings of both 'A' and 'B' are worked upon by 'if . . ., then . . .'. It is, of course, also possible to represent this structure in the linear notation. The usual convention is to write the operator first (as a prefix operator) with its operands enclosed in parentheses, for example 'if (A,B)'. The way in which the order of the operands relates to the meaning of the operator is then fixed by a convention. The convention which I
shall follow when using a linear notation is that the order of the operands will be the same as in English, except that, where in English we have an infix operator (that is written between the first operand and the remainder), its first operand will be written at the end. Thus 'if' is infix in 'B if A' and so I write 'if (A,B)', but it is prefix in 'if A, B', so again we have 'if (A,B)'. Similarly, (1) becomes: 'is lighter than (carbon dioxide, hydrogen)'. The reader should note that this is not the most common convention for this purpose among logicians. Frege represented 'both . . . and . . .' and 'either . . . or . . .' by combining his representation for 'if . . ., then . . .' with a sign for 'not'. This was a small vertical line placed under the horizontal one, as it were dividing the horizontal line into two halves. He understood 'if A, then B' to be true in every case except that in which 'A' is true and 'B' false, so that 'both A and B', for example, could be represented as 'not if A then not B'. Now 'not' is also an operator here, whose operand Frege takes to be a single proposition, so that it belongs to his category E(E). In linear notation, therefore, we can express 'both A and B' as 'not (if (A, not (B)))'. But it is not very easy, visually, to disentangle the works upon relationship (expressed by the parentheses) from the semantic direction of 'if' in relation to its operands (expressed by the ordering of items between commas). A planar notation is much clearer, whether Frege's own or a tree with its root at the top. I place the two versions side by side for comparison, setting out the second as a labelled tree:
──┬──┬──┬── B
     └───── A

(Frege's notation for 'not (if (A, not (B)))': works upon runs from left to right, the first and third short vertical strokes are his sign for 'not', and the descending stroke carries the condition 'A')

   not
    |
    if
   /  \
  A    not
        |
        B

(the same structure as a labelled tree with its root at the top: here works upon runs from top to bottom)
Although Frege used a linear notation to represent semantic structures for examples like (1), it is evident that they, too, can be shown in the planar notation. The virtues of a planar as against a linear notation can be argued on both sides, but there can be little to be said for a mixed notation which is partly linear and partly planar. The verdict of history has so far gone against Frege's effort to sustain a partly two-dimensional notation. Yet although a linear notation may be much more convenient for working in, for parsing and synthesizing actual sentences and for setting out proofs, a planar one may be far better for theoretical
purposes, when it is of primary importance to obtain a clear view of the structures with which we are dealing or, perhaps, are positing. So it is that, here, the planar notation for Fregean structures is more apposite to our purposes than the linear one. It will be evident that it can be extended, where Frege failed to extend it, to the style of analysis which he proposed for (1), for which we simply have:

(1F)      is lighter than
           /           \
     hydrogen      carbon dioxide

Moreover, at the bottom of each branch of the tree, as its leaves, we shall always have expressions of a basic category, that is, in Frege's system, always proper names. In order to generate this type of structure, we shall evidently need a tree grammar, for example, one with the rules:

E =>   is lighter than
         /        \
        E          E

E => hydrogen, carbon dioxide
The principle of vertical ordering introduced by a tree grammar is nowadays called scope. An expression is said to fall or lie within the scope of another, just in case the latter lies above it on the path from it to the root of the tree. Thus in the tree representation of 'both A and B' given earlier, 'B' lies within the scope of 'if' but not within the scope of 'A'. In the linear notation, everything which is enclosed in the parentheses following an operator lies within its scope, but items separated by commas do not fall within each other's scope. I shall find it useful to distinguish one expression as falling within the immediate scope of another when the latter is at the next node above it in the tree; lying within the immediate scope of an expression is then the same as being worked upon by it. Frege's replacement of subject and predicate by argument and function may thus be interpreted as putting trees in the place of strings, ordering linguistic structures by scope instead of by concatenation. But this is to confine attention to linguistic structures without any thought about how those structures are to be interpreted. Although Frege seems to have begun by thinking of argument and function as expressions, he had certainly abandoned this view by 1891, and thereafter he held them to be the respective Bedeutungen - meanings - of what I have called operands and operators. The Bedeutungen of proper names were christened objects (in the case of proper nouns, these were to be their bearers, in the case of numerals, the corresponding numbers and, in the case of propositions,
truth and falsity). Among operators, those which form propositions were singled out as having a special kind of function, concepts, for their Bedeutungen, the values of concepts being always truth or falsity. This allowed Frege to provide an explanation of the truth or falsity of propositions exactly parallel to that of the values of calculating expressions: only, instead of the functions mapping numbers onto numbers, as in the simplest kind of calculating expression, in the simplest kind of proposition, concepts map objects onto truth or falsity. This aspect of Frege's appeal to argument and function in linguistic analysis was certainly original but, although it has been widely influential, it remains controversial. However it can hardly be regarded as a replacement for subject and predicate, since the latter are indubitably expressions, whereas function and argument in Frege's mature sense are explicitly distinguished from the expressions which signify them. Operator and operand replace predicate and subject, not function and argument and, although the replacement was necessary so that Frege could apply the function/argument distinction to everyday language, there is no complementary necessity to interpret a tree grammar in terms of functions and their arguments, however widespread the custom. Frege's theory of meaning was put forward as a package deal, but we are not compelled to take it or leave it as a whole. It seems, then, that at the time when Frege made such large claims for the advantage of replacing subject and predicate by argument and function, he was by no means clear about what that replacement comprised. If, indeed, he thought of it as more than a structural innovation, we must nevertheless distinguish sharply between the structural analysis and the functional account of truth conditions built upon it. It is then evident that the latter does not come up for assessment until we are satisfied with the former. Yet if all that we are concerned with initially is the suggestion that, in asking for the meaning of a proposition, we must see it as having a tree structure, it is dubious whether that marks any advance upon or even any major innovation to previous logic. Perhaps Frege was the first logician to make the idea explicit, but it has surely been implicit at least since the Stoics inaugurated propositional logic. The only really new feature is that, where an operator has more than one operand, no distinction is made between the operands corresponding to the syntactic distinction between subject and object, or between direct and indirect object. But that is just a more thorough-going application of the idea of a tree structure, so the principal contrast lies not between Frege and previous logicians, but rather between those who think that a string grammar is enough to sustain a theory of meaning and those, including Frege, who think that it is not.
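Frege's functional account, whatever its merits, is easy to picture in code: a concept is simply a function mapping objects onto truth values. A toy sketch of mine (the density figures are merely illustrative placeholders):

    # On Frege's mature view, the Bedeutung of 'is lighter than' is a concept:
    # a function mapping pairs of objects onto truth or falsity.
    DENSITY = {'hydrogen': 0.09, 'carbon dioxide': 1.98}  # kg/m3, illustrative

    def is_lighter_than(a, b):
        """The concept: maps a pair of objects onto a truth value."""
        return DENSITY[a] < DENSITY[b]

    # The proposition (1) is mapped onto a truth value by applying the
    # concept to the two objects:
    print(is_lighter_than('hydrogen', 'carbon dioxide'))   # True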
2.4 CATEGORIAL GRAMMAR

The first attempt to construct a tree grammar inspired by Frege's replacement of subject and predicate by operator and operand was made by Ajdukiewicz (1935). Ajdukiewicz began by proposing a system of category names, which I have already anticipated, though using a different notation; whereas I have used linear tree-notation, he used a horizontal line on the model of division in arithmetic, for example

 E               E
 ─  for E(E),   ───  for E(E,E),  etc.
 E              E E
Ajdukiewicz differed from Frege in distinguishing two basic categories, sentence and name. 'Sentence', of course, translates 'Satz' in the German version of Ajdukiewicz's article; but in spite of the title, which uses 'syntactical' in the logician's sense, he is certainly concerned with meaning. Both sentence and name, for example, are described as basic semantic categories; so we need to amend 'sentence' to 'proposition'. As to names, he avers that at least two semantic categories must be distinguished under this head in everyday language, 'the singular names of individuals' and 'general names . . . as names of universals'. But he nevertheless proposes to follow Leśniewski in having only a single category of names. From the sole linguistic example which he gives, (2)
Lilac smells very powerfully and roses bloom,
in which 'lilac' and 'roses' are assigned to the category of names, it appears that he is including count nouns in this category. Frege would certainly have demurred at this assimilation but, as Ajdukiewicz himself points out, the idea of characterizing semantic coherence by means of a system of categories is independent of any particular choice of basic categories. The choice of basic categories will be considered later; meanwhile, it will be convenient to follow Ajdukiewicz while expounding his theory. If C0 . . . Cn are categories, then the general form of a category name is C0(C1, . . ., Cn), where n is the degree (or rank) of expressions belonging to that category (for basic categories, n = 0) and, for each of C0 . . . Cn any category name may be substituted. The category names of the two basic categories are S (Satz, proposition) and N (Name, name). Taking his example, Ajdukiewicz assigned 'lilac' and 'roses' to category N, 'smells' and 'bloom' to category S(N), 'and' to category S(S,S), while the adverb 'powerfully' is assigned to category S(N)(S(N)) and 'very' to category S(N)(S(N))(S(N)(S(N))). The grammar then consisted of a single rule-schema,
(A) C0(C1, . . ., Cn) C1 . . . Cn » C0
If this be written out in Ajdukiewicz's own notation, it is analogous to the multiplication of fractions in arithmetic, the 'numerator' operands cancelling out with the 'denominators' of the operator; hence his choice of that notation. I have used '»' instead of '=>' in stating this rule because it is the converse of a production for a tree grammar, a combination rule instead of a re-write rule. It works, moreover, the opposite way round from the rules considered in section 2.1, that is, it is a parsing rather than a generative rule. In order to apply it, Ajdukiewicz assumes that the expression concerned has first been re-written so that each operator precedes its operands; thus, for his example,

and      (very                      (powerfully   (smells (lilac))), bloom (roses))
S(S,S)    S(N)(S(N))(S(N)(S(N)))     S(N)(S(N))    S(N)     N         S(N)   N
Then, for the first application of the rule-schema, we take C0 = C1 = S(N)(S(N)) to yield the rule

(A1) S(N)(S(N))(S(N)(S(N))) S(N)(S(N)) » S(N)(S(N))
which allows us to combine 'very' and 'powerfully' to give 'very powerfully' as an expression of category S(N)(S(N)). In a second application, we take C0 = C1 = S(N), yielding the rule

(A2) S(N)(S(N)) S(N) » S(N),
so that we can combine 'very powerfully' with 'smells', giving S(N) as the category of 'very powerfully smells'. Next, with C0 = S and C1 = N, we get the rule

(A3) S(N) N » S,
which can be applied twice: first, to license combining 'very powerfully smells' with 'lilac' to form a proposition; second, to yield another proposition by combining 'bloom' with 'roses'. Finally, by taking C0 = C1 = C2 = S in the rule-schema, we have the rule

(A4) S(S,S) S S » S,
which allows us to combine 'and' with both 'very powerfully smells lilac' and 'bloom roses' to obtain the result that (2) is a proposition. Unfortunately Ajdukiewicz never spelled out any procedure for rearranging the parts of a proposition into the correct order for applying the combination rule-schema. Instead, Bar-Hillel (1953) proposed a modification to the notation for category names which would allow the combination procedure to be carried out on a proposition with its words
in the correct order for the language concerned. This was to distinguish operands occurring before (on the left of) an operator from those occurring after (on the right of) it. Thus D1 . . . Dm\C0/C1 . . . Cn is the category name of an operator which will combine with m operands preceding it and n operands following it. If m = 0, then it is a prefix operator; if n = 0, it is a suffix operator; and if both m > 0 and n > 0, then it is an infix operator. A corresponding modification of the rule-schema is required, to:

(B) D1 . . . Dm D1 . . . Dm\C0/C1 . . . Cn C1 . . . Cn » C0
We can then write category names in the new notation under the words of (2) just as they stand and apply the new rule-schema with the same result as before:

Lilac   smells   very                          powerfully    and     roses   bloom
N       N\S      ((N\S)\(N\S))/((N\S)\(N\S))   (N\S)\(N\S)   S\S/S   N       N\S
                 very powerfully
                 (N\S)\(N\S)
        smells very powerfully
        N\S
Lilac smells very powerfully                                  roses bloom
S                                                             S
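To make rule (B) concrete, here is a hedged sketch - my own simplification, not Bar-Hillel's procedure - in which a basic category is a string and an operator category is a triple (operands required on the left, result, operands required on the right). A backtracking search tries the possible orders of combination, since the order matters: 'very' must combine with 'powerfully' before 'smells' can meet 'lilac':

    NS   = (('N',), 'S', ())        # N\S
    ADV  = ((NS,), NS, ())          # (N\S)\(N\S)
    VERY = ((), ADV, (ADV,))        # ((N\S)\(N\S))/((N\S)\(N\S))
    AND  = (('S',), 'S', ('S',))    # S\S/S

    LEXICON = {'lilac': 'N', 'roses': 'N', 'smells': NS, 'bloom': NS,
               'very': VERY, 'powerfully': ADV, 'and': AND}

    def reductions(cats):
        """Every way of applying (B) once: an operator consumes exactly the
        operands it requires immediately to its left and right."""
        out = []
        for i, c in enumerate(cats):
            if isinstance(c, tuple):
                left, result, right = c
                m, n = len(left), len(right)
                if (i - m >= 0 and tuple(cats[i - m:i]) == left
                        and tuple(cats[i + 1:i + 1 + n]) == right):
                    out.append(cats[:i - m] + [result] + cats[i + 1 + n:])
        return out

    def is_proposition(cats):
        """Backtracking search over the possible orders of combination."""
        if cats == ['S']:
            return True
        return any(is_proposition(nxt) for nxt in reductions(cats))

    words = 'lilac smells very powerfully and roses bloom'.split()
    print(is_proposition([LEXICON[w] for w in words]))   # True: (2) is a proposition

Each successful step removes an operator together with its adjacent operands and replaces them by the result category, exactly as in the derivation displayed above.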
There are, however, quite common examples for which even this modification does not cater. Bar-Hillel himself mentions one of them, of which an example (though not his) is the placement of 'not' in English immediately before a verb, as in (3)
The cat did not move.
From the point of view of meaning, 'not' is best regarded as an operator upon a proposition, that is, as of category S(S), for it says that something is not the case, is not true. So the standard logical analysis of (3) is 'not (the cat moved)', with 'not' as the operator and 'the cat moved' as its operand. But neither an assignment to category S/S nor an assignment to category S\S will allow us to write the category names under the words in (3) in the order in which they stand and so that we can show that it is a proposition by using schema (B). The other three operators of propositional logic provide a further exception, for, if we take the forms 'both . . . and . . .', 'either . . . or . . .' and 'if . . ., then . . .', each operator consists of two separated parts; under which, then, do we write the category name and what do we write under the other? Many subsequent writers on categorial grammar have pursued this attempt to make it fit the forms of expression of a particular language, usually English (for example Lambek, 1958; Montague, 1974, ch. 8; Ades
and Steedman, 1982; Bach, 1984; Huck, 1988). We see, here, the obverse of transformational grammar, in which a syntactically inspired structural analysis was extended to embrace meaning. Now a semantically inspired analysis is being extended to embrace syntax. Of course it would be convenient, and economic, too, if a single structural analysis would simultaneously serve for both syntax and semantics, but the more one goes into detail the more the difficulties pile up. Those which arise for an account of meaning affixed to a syntactic analysis have been discussed in the previous chapter, while the ultimate source of difficulty in affixing syntax to a semantic analysis is that, whereas semantic structures may be expected to be relatively independent of the vagaries of any particular language, syntax will differ from one language to another. Have we, in any case, any right to expect that semantic structures will be the same as syntactic ones? 'Every doctor visited at least one patient' has already been used to show that a proposition or sentence can be seen as having a string structure, but can also be seen as having a tree structure. When we consider it as generated by a string grammar, we see it as having a string structure; when we consider it as generated by a tree grammar, we see it as having a tree structure. It is no use asking which of these structures it 'really' has. The way in which we see anything complex as being organized - what we count as its parts and how they are interrelated - will depend upon our purposes. Giving an explanation of the meaning of a proposition is a different purpose from explaining why it is a correct form of expression in a given language. Hence there is no a priori reason to suppose that we need to see it as having the same type of structure qua proposition and qua sentence. On the contrary, our expectation should be that semantic structures will be different from syntactic, and we should require cogent arguments to persuade us otherwise. That the divergence of purpose between semantics and syntax should give rise to two different ways of discerning a structure within an expression, each legitimate in its own context, but immediately productive of confusion when used for the wrong purpose, is a very ancient idea, which is already quite explicit in Aristotle: 'In fallacies arising from the form of expression, the cause is the similarity of the expression. For it is difficult to distinguish kinds of thing expressed similarly and kinds of things expressed differently (indeed he who can do this is very near the contemplation of the truth and is especially wise in judgment)' (De sophisticis elenchis 7, 169a29-36). It has, indeed, been a persistent theme throughout the history of philosophy, even though the emphasis which it has received has varied; in our time it has again come to the fore: thus Wittgenstein echoes Aristotle when he says: 'Our
68
Logic: trees
investigation . . . is a grammatical one. Such an investigation sheds light on our problems by clearing misunderstandings away. Misunderstandings concerning the use of words, caused, among other things, by certain analogies between the forms of expression in different regions of language . . . Philosophy is a battle against the bewitchment of our intelligence by means of language' (1953, 1.90,109). The phenomena of paraphrase and ambiguity lend further support to the supposition that two different kinds of structure are discernible in expressions, each of which is necessary to a full account of language. Indeed, paraphrase and ambiguity then appear as converses. One expression is a paraphrase of another if both have the same meaning; hence if there is one structure relevant to meaning and another relevant to the particular form of expression employed, we might expect to discern in two expressions which paraphrase each other the same semantic structure but different syntactic structures. By contrast, when a single expression has two different meanings, we should be able to find in it two semantic structures although there is only one syntactic structure. The distinction between semantic and syntactic structure will appear more clearly if we extend the notion of paraphrase to include translation. For one expression to be a translation of another, in a different language, is also for both to have the same meaning. Although two such expressions, because they have the same meanings, will have the same semantic structure, their syntactic structures might differ so considerably that a man might be able to understand the one without being able to understand the other, even though he had the requisite vocabulary. This often happens with a foreign language which we do not know well: we know the translation of each word in the expression but cannot see precisely how their meanings fit together. How could this occur if semantic and syntactic structure were the same? Ambiguity is the other side of the coin: not the ambiguity which occurs when one element in a sentence has more than one meaning, but structural ambiguity, when it is possible to see the sentence as being constructed in more than one way. Here is an example from a modern hymn: 'Though storm-clouds threaten the day he will set me free'. None of the individual words in this example is ambiguous; it is a question of how we read the structure, as a complete sentence with an implicit comma after 'the day', or as a clause in which 'he will set me free' qualifies 'the day'. Another example is the following notice: 'No parking in front of these gates at all times'; yet another, an advertisement for a newspaper: 'Everything you need to know about sport every Monday'. On the face of it, we have a single syntactic structure in each of these examples, for I do not know what syntactic structure is if two equiform expressions of the
Categorial grammar
69
same language do not ipso facto have the same syntactic structure; but ex hypothesi we have two semantic structures, since the expression has two meanings. Moreover, for centuries it has been a staple of philosophical criticism to argue that an author has degenerated into writing nonsense, yet without in the least implying that he was ignorant of the syntax of his native language and unable to write correctly formed expressions in it. Typically, though not invariably, the criticism is directed to expressions which make liberal use of abstract nouns ('pain* and 'pleasure' are classical examples). So what is often being contended is that certain items of vocabulary may be combined in ways which are syntactically coherent, but still do not yield, in combination, an over-all meaning. The implication of this must be that an expression can have a syntactic structure without having any semantic structure; and, hence, that the elements of which semantic structures are composed are not always the same as those of which syntactic structures are composed. Many linguists, too, are committed by their principles to support philosophical tradition on this point, for they have been at pains to develop arguments for certain types of syntactic structure which are independent of any appeal to meaning (see Akmajian and Heny, 1975, passim). To argue that semantic and syntactic structures are distinct is not to say that they are unrelated. The contrary, indeed, follows if we understand an expression through its syntactic structure; for then the latter will represent the meaning of the expression too, but by the use of certain conventions which are peculiar to the language concerned and to which other languages offer alternative conventions. So there must be some systematic means by which syntactic structures can be related to semantic ones, and these will eventually have to be spelled out. Here, however, it is my purpose to elucidate semantic structures and so I shall assume that categorial grammar, when used in the service of a theory of meaning-representation, need not concern itself with syntax. The categorial grammar which I have expounded so far is that of Ajdukiewicz, but although Ajdukiewiz was to some degree inspired by the logic of Frege, it contains a feature, to which bar-Hillel has drawn attention, which contrasts with Frege's analysis: 'It makes a considerable difference . . . whether we treat loves (say) as an operator which out of a left N John and a right N Mary forms a sentence, John loves Mary, IN ONE COMPLEX STEP, or as an operator which out of a right N Mary forms an operator, loves Mary, which out of a left N John forms a sentence IN TWO SIMPLE STEPS' (1964, p. 70). The distinction between left and right combination here is an unnecessary complication and can be ignored. The point at issue remains if we take the simpler system in which
70
Logic: trees
operators are assumed to precede their operands. The difference is then between assigning 'loves' to category S(N,N), which is the category of expressions requiring two names to form a proposition, and thus of degree 2, and assigning it to category S( N) (N), the category of expressions requiring one name to form an expression of category S(N)9 and thus of degree 1. These are distinct categories in Ajdukiewicz's grammar; in Frege's system, they are not. The reason for this is that Frege classifies proposition schemas, not operators. In the previous section, I observed that it was convenient for Frege's purposes that 'hydrogen' and 'carbon dioxide' occur at the ends of (1), so that when both of them are removed we are still left with a single phrase, 'is lighter than'; for if 'light' be removed instead, the remainder of the proposition falls apart into two pieces, 'hydrogen is' and 4-er than carbon dioxide'. To overcome this difficulty, I suggested that dots of omission might be allowed to occur in operators, so that we could write, for example 'Hydrogen is . . .er than carbon dioxide' as an operator. But this is only a makeshift, which will not serve in the extreme case, illustrated by the power notation in mathematics, when nothing remains if all of the operands are removed. That notation consists in writing the power to which the number is to be raised as a super-script to its right, so that we have two numerals, either or both of which can be considered operands. The pattern in which they are arranged has been given a determinate meaning by mathematicians. There is, however, a low limit to the number of distinct patterns in which operands alone can be arranged. The same goes for speech: a sound pattern is a sequence. The example of music shows that, potentially, sound patterns can be much more than mere sequences of sounds, but these potentialities are largely undeveloped in spoken language, perhaps because it is important to us to be able to write a language as well as to speak it, and writing is strictly sequential. So, in order to be able to construct distinct patterns freely, we incorporate signs as parts of a pattern. Thus a common computer notation for powers is illustrated by '3 A 2\ The pattern has not been eliminated here in favour of the new sign, because the numerals still have to be related to the sign in a particular way: one must immediately precede it, the other immediately follow it, and the latter indicates the power to which the number designated by the former is to be raised. Consequently '2*3' arranges '2' and ' 3 ' in a different way from '3*2', and each combination of signs has a different meaning. Pocket calculators which embody a power function overcome the problem of having no special sign for it in standard mathematical notation by writing two letters, each of which relates to a working register
Categorial grammar
71
on the calculator, in the same pattern as the numerals should be written. This representation of a pattern of numerals is clearly a schema in the sense introduced in section 2.2, although not a proposition schema: the user is invited to substitute numerals for the letters written on the key, by entering them into the appropriate registers, so the letters are also schematic symbols. This shows us that what is essential to a schema is no more than schematic symbols arranged in a determinate pattern; it need not contain any words or other special signs. The solution to the problem raised by treating 'light' as the operand in proposition (1) is, of course, to use schemas instead of operators, for example 'Hydrogen is 9er than carbon dioxide', together with a specification of what expressions may be substituted for '9'. From 1891 onwards, Frege adopted this course. He distinguished between complete and incomplete expressions and by 1893 was using 'proper name' versus 'function name' to mark the same distinction. A Fregean incomplete expression or function name is the same as a schema; it always contains at least one schematic symbol. Frege used Greek letters for schematic symbols, because he did not want them to be confused either with the link letters of his quantifier notation or with the letters used in algebra. As this convention is now familiar, I shall follow it. In section 2.2 I introduced a distinction between schema and pattern according to which a schema is a representation of a pattern. Now Frege remarked that the essential contrast to be discovered in a (complete) expression lies between what is present in a schema 'over and above' its symbols, and the expressions which are substituted for them (1891, p. 6; 1893, section 1). Wittgenstein subsequently drew out this hint that the distinction should be between pattern and operands, rather than between operator and operands: what signifies that the meanings of the operands are worked upon in a certain way is that they are arranged in a certain pattern (see 1922, 3.1431-2). Looking at an expression as being structured in this kind of way, therefore, implies that its only parts are its operands. A schema from which the expression may be obtained (by substitution for its schematic symbols) is not a part of the resulting expression; it is merely a representation of the arrangement of the parts, whether or not it contains any words or signs in addition to the schematic symbols. Notwithstanding this conclusion, what we have hitherto called the operator is normally part of a schema. It will be convenient to retain 'operator' in this sense, for we can ask about the meaning of a word or phrase in an expression whether it is an operand or an operator in that context. Moreover, although we may conceive of a proposition as having been constructed by substitution in a schema, it is at the very least dubious whether we can consider the schema as occurring in the
72
Logic: trees
proposition, for a schema by definition contains schematic symbols, whereas a proposition does not. The operator of the schema, however, certainly does occur in the proposition. Linguistic description would be crippled without any means of referring to certain words or phrases in an expression, so it will still be valuable to have a term which indicates that the role of some words with respect to the meaning of an expression is fundamentally different from that of others. Yet, if the contrast to be found in a proposition lies between its operands and the pattern in which they are arranged, is it not perverse to represent operators as well as operands as nodes on trees? Surely this is to invite misrepresentation, for a structure consists of inter-related parts, so that, where it is represented by a tree, what could be more natural than to suppose that the nodes represent the parts and the edges (the lines between nodes) their inter-relationships? In that case, ought not the operands to be the only nodes, and everything else to be shown by the way in which the nodes are inter-connected by the edges? In theory, this sounds absolutely right, but in practice it cannot be carried through. Some examples will show why. First, we could no longer represent linguistic structures by trees of any kind. Consider the proposition: (4)
Plato taught Aristotle and Aristotle admired Plato.
This contains two subordinate propositions, each of which in turn contains two proper names. Hence each is constructed from a schema: the first from % taught £', and the second from '( admired
taught
Aristotle
Aristotle
admired
Plato
These are trees of a sort, in that they are strings (ignoring the labels of the edges) and strings are a limiting case of trees. Yet they are not trees of the kind we require, since we do not want to represent 'Plato' as working upon the meaning of 'Aristotle' or conversely but, rather, 'taught' as working upon the meaning of both. Moreover, if we now go on to consider how the whole proposition (4) might be represented, 'and', being an operator, will also have to label an edge, which must, presumably, join the two representations above. The result
Categorial grammar Plato
73
Aristotle and
taught Aristotle
admired Plato
is not a tree at all, because the 'and' edge does not join the other edges at a node. So much the worse for trees, someone will doubtless say; why not develop, instead, the notation just sketched out? Well, this new notation depends upon an absolute distinction between operator (or schema) and operand, since operands only label nodes and operators only label edges. When considering the replacement of 'subject' and 'predicate' by 'operand' and 'operator', I observed that both pairs are relative terms, which implies that an expression which is an operator in one context may be an operand in another. No examples have yet been cited, but they will make their appearance shortly, and this tells decisively against any method of representation which is based upon the schema/operand distinction rather than upon category differences. Nevertheless, we can give due weight to the argument for drawing the primary distinction between schema and operand by insisting that what corresponds, in a tree, to a schema is not just an operator labelling a node, but the operator together with the edges associated with it in virtue of its degree and also the category symbols (which are schematic symbols) at the other nodes of those edges. In other words, the sign for a schema in tree notation is everything on the right-hand side of a production rule in expansive form which introduces a terminal symbol of degree > 0. The operator is a part of this sign just as it is a part of a schema. To return, now, to bar-Hillel's point about the difference between assigning 'loves' to category S(N,N) and to category S(N)(N)\ if, following Frege, we regard 'loves' as the operand of a schema '( loves £' of degree 2, then we have three alternative ways of using it. First, we may simultaneously, 'in one complex step', as bar-Hillel puts it, substitute 'John' for '(' and 'Mary' for '£' to obtain the proposition 'John loves Mary'. That corresponds to Ajdukiewicz's category S(N,N). Second, we may simply substitute 'Mary' for '£', to obtain the schema '£ loves Mary', of degree 1. That corresponds to bar-Hillel's first 'simple step' when 'loves' is assigned to category S(N)(N), which combines with one expression of category TV to yield another operator, of category S(N). Third, we may just substitute 'John' for '(', obtaining the schema 'John loves
74
Logic: trees
But, for Frege, it is the same schema which has been used in all three cases, so, for him, there could be no distinction between category S(N,N) and category S(N)(N). Moreover, the flexibility offered by Frege's account is essential to logic, because there may be some argument schemas in which '( loves Mary' occurs but others in which 'John loves £' occurs, but no further analysis of either is needed. Thus 'Everyone loves Mary; ergo John loves Mary' is an instance of the schema: 'Everyone (/>; ergo John >', whereas 'John loves everyone; ergo John loves Mary' is an instance of the schema '> everyone; ergo (j) Mary'. In neither schema does the substitution for (j> have to have the inner structure exemplified by 'loves Mary' and 'John loves' respectively. 'Yawns' could be substituted for (f) in the first schema and 'There is a devil in' for <> / in the second. This flexibility can be secured for categorial grammar by two modifications. The first is a simplification: in a category name, only basic category names may be substituted for Co. This corresponds to the result of substitution for each of the schematic symbols in a schema, which will always be an expression of a basic category, precisely because it no longer contains any schematic symbols. This rules out Ajdukiewicz's category S(N)(N), leaving only S(N,N). But now, second, we must modify the rule of combination to allow for a partial combination, that is, as well as S(N,N) N N »
S,
we must allow S(N,N) N »
S(N).
The obvious way to do this is to break up the first combination into two steps, allowing only one operand to be combined with an operator at a time, viz. (C) C 0 (C b . . .,Civ . .,Cn) Q
»
C0(C,,. . .,Q-i,C l+1> . . .,Cn).
I have called the system which results from these modifications Fregean categorial5 grammar in order to distinguish it from Ajdukiewicz's variety 5
Actually, the term 'categorial' is unfortunate in that every grammar is necessarily categorial in the sense of employing categories. Ajdukiewicz used 'functor' as equivalent to my 'operator', so 'functorial grammar' would have been better. But 'categorial grammar' is now the accepted usage. At any rate, I shall count as categorial any grammar which embodies the idea that expressions have an operator/operand structure. Martin (1984) has argued that Frege's Grundgesetze system is not a categorial grammar, but his case rests upon an idiosyncratic notion of the latter. He claims, first, that Frege does not set out a grammar 'in the sense of generating all well formed expressions from a limited lexicon and a restricted set of formation rules' (p. 151). But nearly all of what has been called 'grammar' throughout history would fail this criterion, while the demand for
Quantification
75
(Potts, 1978b), but I shall call (C) 'Ajdukiewicz's rule' because it is the rule-schema which corresponds to (A) in Fregean grammar. 2.5 Q U A N T I F I C A T I O N The divergence between Ajdukiewicz and Frege is even more marked over generality, that is, the analysis of propositions containing one or more quantifying phrases - typically, a count noun and a qualifying numerical adjective, like 'three lorries'. But their interest centred upon two unspecific numerical adjectives, 'every' and 'some' or 'a' (in the sense of 'at least one') and, in Frege's case, the 'dummy' count noun 'thing', giving us the combinations 'everything' and 'something' respectively, the universal and existential quantifiers.6 Ajdukiewicz assigned the quantifiers and other quantifying phrases to category S(S), the same as that of 'not'. On the face of it, this will not work at all. Take a very simple example like (5)
A dog barked.
We can argue as follows that the category of 'barked' is S(N). Given that 'Fido' is the name of a dog (category N), then 'barked' in (6)
Fido barked
must be of category S(N), since (6) is a proposition (category S). But there is a valid inference from (6) to (5), so 'barked' must have the same meaning in both, or the inference would be invalid by reason of ambiguity. Now our categories are semantic, that is, categories relative to meaning, so a difference of category automatically carries with it a difference of meaning. Hence the category of 'barked' in (5) must also be S(N). So far, Ajdukiewicz would have agreed. Yet there is no way in
6
explicit formation rules is blind to the importance of notation, to which Frege always devoted great attention. The point of Frege's ideography is that it should be impossible to express in it a structure which could represent nonsense and, with such a notation, explicit formation rules are redundant. Martin also appears to hold that an account of the truth conditions of propositions in terms of the Bedeutungen of their parts is integral to categorial grammar, for he writes 'The explicit versions of the categorial interpretation all agree that the operators, both the connectives and the variable binding operators, are to be read as referring expressions standing for semantic operations on references' (p. 147). It will be clear from the end of section 2.3 that my use of 'categorial grammar' carries no such implication; at most it supposes that the meaning of an expression is to be explained in terms of the meanings of its parts (operands) and the manner of their combination (the schema), but leaves quite open just how the explanation should proceed. The term 'quantifier' relates to the question 'How many?' and not, as its affinity to 'quantity' might suggest, to the question 'How much?' As to Frege's representation of the specific count nouns in quantifying phrases, that will engage us in a later chapter.
76
Logic: trees
which 'S(S) S(N)9 can be combined by rule (A), (B) or (C) to yield a proposition. So how could Ajdukiewicz have thought that the category of quantifiers was S(S)1 The answer is that Ajdukiewicz based his assignment on Frege's notation for generality but, like many others, misunderstood it. This will become clear by explaining Frege's notation. 7 To begin, if we are justified in assuming that 'barked' has the same category in (5) as in (6), namely S(N), it seems that we must take it to be the operand in (5), although it is the operator in (6). For the expression 'a dog' is not a proper name, whereas the category of 'barked' shows that, if it occurs as an operator, it requires a proper name as its operand. Hence, the operator in (5) must be 'a dog' and, since (5) as a whole is a proposition (category S), while the operand 'barked' is of category S(N)> it follows that the category of 'a dog' will be S(S(N)). In that case, the structure of (5) should be simply: a dog (barked). But this is inadequate, and it is extremely important to see why. Wittgenstein observed, of such a notation, that 'we should not know what was generalized' (1922, 4.0411). We need a more complex example than (5) in order to bring out the force of this criticism, such as example (6) of section 1.3, viz.: (7)
At least one patient was visited by every doctor.
By analogy with our approach to the analysis of (5), we begin by categorizing the verb 'visit', which can be done by showing how it will combine with proper names to form a proposition, as, for instance, in (8)
Dr Patel visited Mrs Wilson.
This is comparable to example (1). Since 'visit' requires two proper names to form a proposition, its category will be S(N,N). So a semantic structure for (8) can be displayed in linear notation as (8A)
visited (Mrs Wilson, Dr Patel).
Now we have already decided that 'a dog' should be assigned to category S(S(N)). So presumably the same will go for other quantifying phrases, of which 'at least one patient' and 'every doctor' will be examples. For in each case we can use them to construct propositions which are strictly 7
A parallel account will be found in Dummett (1973, chapter 2, pp. 8-33), though with category assignments remaining implicit. Lewis (1970, p. 193) assigns quantifiers to category S(N,S), a possibility which Ajdukiewicz (1935, p. 17) discusses and rejects. But this shows how necessary it still is to expound Frege's notation.
Quantification
77
comparable to 'a dog barked', for example 'at least one patient died' and 'every doctor was incompetent'. If we now compare (7) with (5), we see that the structure which was assigned to the latter is also to be found in the former, with 'at least one patient' corresponding to 'a dog' and 'was visited by every doctor' to 'barked'. Hence, if 'at least one patient' belongs to category S(S(N)) then 'was visited by every doctor' will belong to category S(N) and be the operand of 'at least one patient'. However, 'was visited by every doctor' has a complexity which 'barked' lacks, and we have already decided that 'visited' belongs to category S(N,N) while 'every doctor' belongs to category S(S(N)). Of course, 'was visited by every doctor' is not a simple combination of 'visited' and 'every doctor'; the verb is in the passive voice. But the whole expression 'was visited by' can also be assigned to category S(N,N)9 since we can have, for example, 'Mrs Wilson was visited by Dr Patel'. So it seems that an expression of category S(S(N)) ('every doctor') must be able to combine with one of category S(N,N) ('was visited by') to form an expression of category S(N) ('was visited by every doctor'), even though we cannot yet offer a justification for this result. If this combination is to be approved, it still remains to decide which expression is the operator and which the operand. Analogy suggests that the quantifying phrase is the operator and the verb its operand, for although 'was visited by every doctor' is not a proposition, it is composed, like (5), of a quantifying phrase and a verb, only now a transitive instead of an intransitive verb. This would mean that the works upon relationship goes from right to left, but that is a phenomenon which we have already encountered: the quantifying phrase is a suffix operator. In any case, (7) could be rearranged in the same way: At least one patient, every doctor visited is a little unusual, but not out of the question in English syntax. At the same time, it puts the verb back into the active voice, so if we took our cue from this, we might display a semantic structure for (7) by: (7')
at least one patient (every doctor (visited)).
This brings us at last to Wittgenstein's criticism. Visiting requires a visitor and something visited. In this case both are generalized, and our analysis forces us to write both of the general expressions in front of the verb. Normally, however, we can tell which is the visitor and which the visited by the positions of the corresponding expressions relative to the verb: when the verb is in the active voice, the name of the visitor is written before it and the name of what is visited after it, and the opposite
78
Logic: trees
when the verb is in the passive voice. This indication of the meaning is lost when operators are always written before operands, so how do we know from our structure for (7) that the latter is about doctors visiting patients rather than conversely? Are we talking, now, of doctors visiting patients or patients visiting doctors? There appears to be a quite straightforward reply to this difficulty. Our representation of (8) used a convention to determine which of Dr Patel and Mrs Wilson is visitor and visited, so why not lay down a corresponding convention for representations of (7) and similar propositions? Thus we could agree that the immediate operator upon 'visited' is to describe the visitor and the operator upon that, the visited. But this will not work. Looking at (7) again, there is no reason why we should not analyse it, in the first instance, into 'every doctor' as operator and 'at least one patient was visited by' as operand; if we then go on to analyse the latter into 'at least one patient' as operator and 'was visited by' as operand, we obtain the structure: (7")
every doctor (at least one patient (was visited by)).
Now we agreed that the immediate operator upon the active verb should describe the visitor, so the corresponding convention for the passive verb would be that its immediate operator should describe the visited. That is exactly what we have here, with the more remote operator describing the visitor, as we should also expect. It was shown in section 1.3, however, that (9)
Every doctor visited at least one patient
and (7) will then have different truth conditions and that, in consequence, we have to account for a double complexity in these propositions: we have to show the semantic relationship of each quantifying expression to the verb and also that between each of the quantifying expressions. Now the analyses (7A) and (7B) show that it is not enough merely to distinguish between the categories of proper names and of quantifying expressions in order to give a correct structural account of generality. But it is an essential first move, because it permits us to exhibit the scope relationships of the two quantifying expressions. A price, however, has been paid; by contrast with the original propositions, the analyses no longer show how the quantifying expressions relate to the meaning of the verb (for that, we had to invoke a supplementary convention). So the next step must be to restore an explicit indication of this.
Quantification
79
Frege's solution was to write the verb as it is represented in (8A), but to link8 the positions occupied there by operands to the quantifying expressions. In the case of the simple example (5) with which we began, there is just one such link, so that we have: a dog (barked ( I ))
I
Instead of (8B), however, we obtain: every doctor (at least one patient (visited (I , ))) I I It is to observe that we no longer need the passive voice of the verb, whose syntactic function is to enable us to alter the relative linear order of the quantifying phrases, whereas in the semantic analysis we are only concerned with their scope order. The right-hand ends of the links show that the patient is the visited and the doctors the visitors, given only the convention regarding the meaning of 'visited' established for (8A). The right-hand ends of the links can also be switched; thus there are four possible combinations of the two quantifying phrases and the verb, each of which has a different meaning. To draw out the links in the manner shown above would yield a notation clumsy to work with and even, on occasion, difficult to read, for instance if there were several quantifying phrases and perhaps crossing links. So it is neater to label each link - say as the x-link, the y-link and the z-link, with further letters as required - and then just to write the label at each end of the link, leaving it to the reader to connect the two ends in his imagination. We should then get, instead of the two representations above, a dog:x (barked (x)) every doctonx (at least one patient:y (visited (y,x))). This is, in essentials, Frege's notation for generality. Moreover, since each quantifying expression requires a link, the latter is an integral part of the notation for such an expression. The quantifying expressions in the examples above, for example, would have to be written as 'a dog:x (0(x))\ 'every doctor:* (>(*))' a n d (*))', and the 8
The notation for links used in this paragraph derives from the lines of identity' in Peirce's existential graphs (1960, vol. 4, sections 385 and 407-8), as subsequently used by Bourbaki (1954) and, in a numerical sub-script variant, Quine (1965, pp. 73-4). It is not found in Frege.
80
Logic: trees
two quantifiers as 'everything:* ($(x))' a n d 'something:* (4>(x)y respectively. We can now see how Ajdukiewicz came to assign quantifying phrases to category S(S). When a schema of category S(N) is supplied as operand to a quantifying phrase, the second link letter in the latter takes the place of a proper name which, in the simplest case, would be required in order to obtain a proposition from the schema. Frege himself was responsible for borrowing the term 'variable' from mathematics (a loan which he later regretted) and the second occurrences of link letters were subsequently called 'bound variables'. This has encouraged people to think of them as constituents of logical formulas, rather than merely as structural signs showing how the constituents are combined. It then seems reasonable to ask to what category the bound variables belong and to assign them to that of names. So we get, for example: A dog:x (barked (x)) S(N) N S(S) S S
in which the categories combine to yield a proposition. But, according to this, 'barked (x)' is a proposition. Yet it is neither true nor false, so how can it be one? The issue is commonly fudged by calling it an 'open sentence', but what that really means is a schema, for the V in 'barked (x)' marks a place where a proper name may be placed to yield a proposition.9 If it is open, it is not a proposition; and, if it is a proposition, it is not open. Moreover, as Geach has pointed out in his translation of Ajdukiewicz's paper (1967, p. 635; see Ajdukiewicz, 1935), if'barked (x)' belongs to category S and 'A dog:*' to category S(S), then 'A dog:z (barked(x))' should also represent a proposition. Ajdukiewicz's category assignments offer no explanation why the two link letters must be the same.10
The assignment of quantifying phrases to category S(S(N)) introduces a new distinction between schemas (and also, hence, between 9 10
See Dummett (1973, pp. 16-17). To be fair to Ajdukiewicz, he does impose as a condition upon quantifying phrases (here called 'operators') 'dass in dem Argument eines jeden Operators, d.h. in dem Ausdruck, auf welchen sich der Operator bezieht, jeder durch die Gestalt des Operators angezeigten Veranderlichen eine innerhalb dieses Arguments nicht gebundene gleichgestaltete Variable entspricht' (1935, pp. 22-3) ('that in the argument of each operator, that is, in the expression to which the operator relates, to every variable indicated by the form of the operator, there corresponds an equiform variable that is unbound within this argument'). And, to signal this, he wrote
Quantification
81
operators). So far they have been differentiated by degree, but now we can also classify them by level. The level of a schema is always one greater than the level of its operands, and a schema whose operands are expressions of a basic category (for Frege, proper names) is of first level. Schemas whose operands are of different levels are of mixed level. So 'barked' and 'visited' are the operators of first-level schemas, the first of degree 1 and the second of degree 2, while quantifying phrases are the operators of second-level schemas of degree 1. Another, simple way of determining the level of a schema is to count the number of nested pairs of parentheses in its category name. 'SfN)9 and 'S(N,Ny both have only one pair, and so are first level, 'S(S(N)y has two and so is second level, while 'S(S(S(N))y would be third level. Schemas of second or greater level are customarily called higher-level schemas. The classification of schemas by levels is a way of showing that the notion of a part or constituent in a semantic structure is relative. Operands are parts relative to the schema of which they are operands. Expressions of basic categories are the only ones which are always parts, because they can only occur as operands. To some extent, this is reflected in Frege's notation, in which there is a sharp distinction between expressions of basic categories and schemas, so much so that, in his mature work, he never writes operators apart from their schemas. Thus by his precept we should write, for example, '( barked' instead of just 'barked', '( visited £' instead of just 'visited' and 'every doctor:* (>(*))' instead of 'every doctor'. It is to observe that the schema always shows what kinds of expression are required as operands in order to form a proposition (or other expression of a basic category). Thus the link letters in 'every doctor.x (<^>(x))' show that a first-level schema must be substituted for '>(. . .)', while in the third-level schema 'every property:f (Q:x (f(x))', the link letters again show that a second-level schema must be substituted for '£l:x (. . .(x))'. Once we introduce higher-level schemas into categorial grammar, Ajdukiewicz's rule becomes inadequate. We can see this by considering a proposition in which 'not' occurs within the scope of a quantifying phrase, that is, one whose semantic structure could be represented by the schema 'everything:x (not ($(x)))' or 'something:x (not (<£(x)))'. Suppose \— Is
instead of
—
s
for the category of quantifiers. But there is nothing in this notation which shows that the operator is linked to its operand, and in the examples he gives the category name preceded by the vertical stroke combines with other category names in exactly the same way as that without the vertical. Moreover, the second and subsequent occurrences of the link letter are still assigned to category N.
82
Logic: trees
that the schema be completed with a first-level schema 'F(', of category S(N). But the operand of the quantifier, 'not (F()\ must also be a schema of that same category, while the category of 'not' is S(S). So negation should be able to operate upon a schema of category S(N) (and, in general, upon any propositional schema) to yield a new schema of the same category as its operand. Yet Ajdukiewicz's rule does not license S(S) S(N) » S(N). In order to provide for this combination, a xvXt-thema is required, that is, a rule which says that, given a certain combination, a new combination is permitted. 11 Geach (1972, p. 5) proposed to remedy this defect with the rule: (D)
If Co C, » C2 then Co C,(C3) » C2(C3).
This will, indeed, produce the combination cited above, but it does not cater for operators of degree > 1. Suppose, for example, we want to combine QS(S,S) S(N) S(N)\ It will give us S(S,S) S(N) » S(S,N), but then leaves us high and dry with no means of combining 'S(S,N) S(N)\ In order to deal with this problem, Geach provided a further rule, (E)
If Co Ci C, » C2 then Co Q(C 3 ) C,(C3) » C2(C3).
But this produces the wrong result. It gives us, for instance, S(S,S) S(N) S(N) » S(N), whereas the correct combination is S(S,S) S(N) S(N) » S(N,N).12 But by what right do I speak here of a 'wrong result' and the 'correct combination'? By reference, of course, to Frege's ideography, which, 11
12
The motivation which Geach gives for adding a rule-thema is that we should want to account 'does not fly' a coherent sub-string of 'Socrates does not fly' and 'not every man' a coherent sub-string of 'not every man flies' (1972, p. 484). But although Ajdukiewicz cannot do this, he can show that the two sentences are of category S, so the extra rule is not absolutely necessary on these grounds alone. It is only when the sub-string is used as the operand of a higher-level operator, as in 'some man does not fly' or 'some people can drive every make of car', that the rule-thema becomes essential. Some other attempts at a comparable rule are even more disastrous. Thus Ades and Steedman (1982, pp. 526-7) have a rule of Forward Partial Combination, for which Geach is cited as a precedent, which will not combine lS(S.S) S(N) S(N)' at all. Allowing the Ajdukiewicz category 'S(S)(S)' for 'S(S,S)\ the rule permits combination with one *S(Ny to yield *S(S)(N) S(N)\ but this is irreducible. Nor would it help to reverse the order of these two category names, for although Ades and Steedman include
Quantification
83
providing we stick to Fregean categorial grammar, is a touchstone by which our rules can be judged.13 So, now, consider a schema like 'if n thenCT\Instead of substituting a proposition for each of the schematic letters, let us substitute a propositional schema of category S(N), say '( snores' and '£ yawns'. Then we shall obtain 'if ( snores, then £ yawns'. This contains two schematic letters for each of which an expression of category N may be substituted and, clearly, we must be allowed to substitute different names for each, in order to obtain, for example, a proposition like 'If Philip snores, then Andrew yawns'. It seems that Geach was misled on this point by cases in which a complex expression of this kind appears to serve as the operand of a quantifying phrase, as in 'Everyone who snores, yawns', represented in Fregean notation by: everyonerx (if x snores, then x yawns). For if it is the operand of a quantifier, must it not be of category S(N), that is, of degree 1? The answer is that there is a difference between the schema 'if ( snores, then £ yawns', which is of degree 2, and the schema 'if C snores, then ( yawns', which is of degree 1. In the latter, the same schematic symbol occurs twice, which signifies that the same name must be substituted for both of its occurrences; in the former, we may substitute the same name for both schematic symbols, but need not do so. The operand of the quantifier in the above example is the schema of degree 1, not that of degree 2. In categorial grammar, a special operator will be required to convert the schema of degree 2 into the corresponding schema of degree I.14 The opportunity of re-formulating Geach's rules may be taken to revert to production rules for a tree grammar instead of combination rules. A minor variation on the expansive form rules of section 2.1 will, however, be that category symbols will be used throughout, so that a tree is generated whose nodes are labelled entirely by category symbols. Propositions may be thought of as being obtained by substitution for the category symbols in the tree; this leaves the assignment of terminal
13
14
rules of backward combination, they say, astonishingly, that 'English does not appear to include a rule of 'Backward Partial Combination" (p. 528) and hence do not provide one. The source of the inadequacy in their treatment seems to be that they have only thought about operators of degree 1. Levin (1982) complains that Geach does not justify his combination rule except on the ground of utility, and offers an alternative justification based on a functional interpretation of the notation. That, however, stands or falls with the interpretation (with which Geach has little sympathy), and Levin has missed the implicit reliance upon Fregean principles of analysis. How this would work is spelled out in detail in Potts (1979).
84
Logic: trees
symbols (linguistic expressions) to categories for separate discussion in a later chapter. A notational convention will allow considerable simplification of the rules: if a rule yields a category name of the form C0(C],. . .,Cn)(Cm), this is to be taken as equivalent to C0(CIt. . .,Cn,Cm). The production rule corresponding to Ajdukiewicz's original combination rule then becomes the rule-schema: (Rl)
Co =» Co(C,)Ci.
So, starting with *S\ we could obtain 4S(N) N' and then, taking Cj = N again and C0 = S(N), obtain *S(N)(N) N N \ which reduces to 4S(N,N) N N' This, in general, will be the technique for obtaining trees containing operators of degree > 1. The production rule-thema corresponding to Geach's intentions but, now, yielding results which are correct on Fregean principles, is: (R2)
If Co =» C, C2, then C0(C3) => Q C2(C3).
Since we have
S => S(S,S) S S S(N) => S(S,S) S(N) S
by 2 applications of (Rl), by (R2).
In order to allow a second application of (R2), one further modification is necessary: it must be legitimate, in the course of a derivation, to switch the order of operands. This cannot have any significance, because the category symbols are not interpreted. So the last line above may be rewritten as: S(N) => S(S,S) S S(N) and then S(N,N) => S(S,S) S(N) S(N)
follows by (R2).
This rule also solves another problem, that a quantifying phrase should be able to combine with a first-level schema of category S(N,N) to form a first level schema of degree 1: Since we have
15
S =* S(S(N)) S(N) S(N) =» S(S(N)) S(N,N)
by (Rl), by (R2).15
Introducing (R2) does not solve a corresponding problem about the meanings of 'not' and the propositional connectives which arises as soon as we assign them to categories S(S) and S(S,S) respectively. Wittgenstein stated the problem as follows: If logic has primitive ideas these must be independent of one another. If a primitive idea is introduced it must be introduced in all contexts in which it occurs at all. One cannot therefore introduce it for one context and then again for another. For example, if denial is introduced, we must understand it in propositions of the form 'not (p)\ just as in propositions like 'not (or (q, p))\
Quantification
85
Unfortunately, the addition of this production rule-thema is still not enough. In classical logic, it is possible to define either one of the quantifiers in terms of the other plus negation, for example something:x 0(x) | not (everything:x (not (>(x)))). Now we know that negation does not change the category of its operand, so, since the category of the definiendum here is S(S(N)), that must also be the category of 'everything:x (not (0(x)))\ as may be seen from inspection of the Fregean notation. But rules (Rl) and (R2) are not enough to prove that S(S(N)) =* S(S(N)) S(S). For that, we need a third rule, again a thema 16 : (R3)
If Co(Ci) =* C2(C3) C4, then Co(C,(C5)) =* C2(C3(C5)) C4.
The proof is then: [1] [2] [3]
S =» S(S) S S(S) => S(S) S(S) S(S(N)) =» S(S(N)) S(S)
by (Rl) by (R2) by(R3)
Of the two rule-themas which have now been introduced, (R2) cannot apply to expressions of basic categories and (R3) cannot apply either to expressions of basic categories or to first-level schemas. This prompts the 'something:x (not (Fx))' and others. We may not first introduce it for one class of cases and then for another, for it would then remain doubtful whether its meaning in the two cases was the same. (1922, 5.451, with notation amended to that used in this book)
16
So if we explain the meaning of 'not' by saying that it turns a true proposition into a false one and vice versa, we have no legitimate way of explaining its meaning in the schema 'not (barked ())', for example. This is, of course, an implicit criticism of Frege. Martin (1984) tries to rescue Frege from it by giving a substitutional interpretation of his ideography. That may be a promising approach, but Martin's attempt is unfortunately vitiated by definitions of negation, implication and the universal quantifier which are impossible on Frege's principles, since they have schemas as deflnienda but complete expressions as definientes (p. 148). This is a glaring example of the confusion between concept and object which Frege so often denounced. It is to observe that the motivation for introducing this rule is different from the motivation for (R2). That 'everything:x not $(x)' does not correspond to any sub-string is a merely superficial difference, to do with English word-order conventions, resulting in a gap between 'everything' and 'not', to be filled by an auxiliary which is part of the operand; but the schema shows that it belongs to category S(N). However, the grammar with only rules (Rl) and (R2) will be enough to show that propositions to which this schema contributes are such. The reason for introducing a further rule is, rather, that 'nothingix $(x)' can be defined as 'everything:x not 4>(\)\ so the latter must constitute a semantic unit. In general, our grammar must allow us to explain the meanings of expressions, where appropriate, by breaking them down into components which are combined in a certain way (componential analysis).
86
Logic: trees
question whether (R2) and (R3) are just the first two members of a series of rule-themas which will ascend through the hierarchy of levels, a rule (R4) being required for the correct expansion of third-level schemas, and so on. I do not know the answer to this question, and it is difficult to see how it could be decided in advance of encountering structures in everyday language for which (Rl) to (R3) do not provide. 17 An alternative method of allowing a wider range of combinations than Ajdukiewicz's original rule provides for is by type-raising (see Lambek, 1958; Van Benthem, 1988; Dowty, 1988; Steedman, 1988). This has its origin in Frege, though its recent exponents appear to be unaware of the prototype. Frege held that a proper name can sometimes occur as an operator: thus he cites '0(2)' explicitly as a second-level function name, representing 'property of the number 2', and 'not (if (everything:x (if (0(x), x = 2)), not (0(2))))' as another, representing 'property of the number 2 that belongs to it exclusively' (1893, section 22). There is a further, implicit example in his representation of the proposition 'What holds for every object, holds for A' by 'Everything:f (if (everything:x (f(x)), f(a)))\ since the first occurrence of 'everything' is there the operator of a third-level schema (1893, section 20). So in these examples a proper name has been 'raised' from its normal (basic) category N to the (second-level) category S(S(N)). This has been generalized to the effect that a category Co may be raised to category CjfCjfCo)), for any category C/. The effect of Geach's rule (D) can also be obtained by allowing the type of any schema of category C0(Cj) to be raised to category Co(C2)(Cj(C2)), for any category C2- Thus, in order to obtain S(S) S(N) » S(N), we raise S(S) to S(N)(S(N)).
Similarly, in order to obtain
S(S) S(S(N)) » S(S(N)), we raise S(S) to S(S(N))(S(S(N))). the degree 2 case
However, this will not cater for
S(S,S) S(N) S(N) » S(N,N). for which it would be necessary to raise S(S,S) to
17
S(N,N)(S(N),S(N)).
In the above, I have expounded the three rules in a somewhat simplified form in order to bring out their essential features more clearly. A more exact version, which takes account of the context in which an expression to be expanded may occur, and indicates scope relations by a system of super-scripts and sub-scripts corresponding to the edges in trees, will be found in Potts (1978b).
Quantification
87
That would call for a further type-raising rule, and so on for each successive degree or, alternatively, a complicated generalization of the original rule. Even then, we could not be confident that type raising gave us the full power of the converse of (R2). Some authors, for instance, have assigned certain complement verbs to category S(N,S) (for example Prior, 1971, pp. 18-19), so we should want to be able to show that S(N,S) S(N) » S(N,N), for example, when such a verb fell within the scope of a quantifying phrase. It is also difficult to see how any type raising could achieve the effect of the converse of (R3). One could, indeed, raise the category of C4 so that it became the operator, but we do not need (R3) to justify, for instance, the combination S(S) S(S(N)), while in the complex schema 'something:x (not (0(x)))\ 'not' is the operand. Since I first introduced (R3) in (1978), it has received neither mention nor comment in the literature on categorial grammar. Perhaps the reason is that it may be unnecessary in a grammar which does not allow for alternative analyses, which is fairly typical of syntactically based categorial grammars which have appeared since. I have already argued, however, that alternative analyses are necessary to a structural system which is to support an account of meaning, and in the present case it seems evident that we do not want to deprive ourselves of the possibility of defining one of the quantifiers in terms of the other, which in turn demands that the schema 'not (something:x (not (0(x))))' can be shown to belong to category S(S(N)). However, these are largely practical objections to type raising as an alternative to rule-themas and might, perhaps, be overcome by typeraising arrangements of sufficient complexity. Far more serious is that type raising embodies several confusions. First, what exactly is meant by 'raising'? In the case of changing a category Co to category CJ(CJ(C0)), it is a question of raising the level of a category; indeed, it is raised by two levels. In the case of using type raising instead of (R2) and (R3), it is more difficult to say what is going on because we do not know quite how far type raising must go and also because the Fregean distinctions of degree and level become blurred in non-Fregean categorial grammars. Is category S(N)(S(N))9 for example, of degree 1 or 2, and of level 2 or mixed level? The corresponding Fregean category, S(N,S(N))9 is of degree 2 and mixed first/second level. What we are to say about the degree of S(N)(S(N)) is unclear; if it is of degree 1, then it is straightforwardly of level 2, but if of degree 2, then mixed level. So the most that can be said about type raising in this context is that it raises the
88
Logic: trees
level of a category by 1, at least in part, and may also increase its degree by I.18 Bearing in mind that our categories relate to meaning, difference of category implies difference of meaning. One proponent of type raising, at any rate, is clear on this point, paraphrasing Geach's rule (D): 'if an expression occurs in category A(B), then it can also occur in A(C)(B(C)) (with an evident transfer of meaning)' (Van Benthem, 1988, p. 36). But, of course, this is precisely not the intent of Geach's rule or of other rule-themas in categorial grammar. We do not want to say, for example, that the meaning of 'not' is different in 'Some dogs did not bark' from its meaning in 'Fido did not bark'. This is not like the extension of our notion of number when we move, for example, from natural to rational numbers or from rational to real numbers.19 There is absolutely no evidence that someone who understands the meaning of 'not' in 'Fido did not bark' and also understands the meaning of 'Some dogs did bark' must learn something further in order to understand the meaning of 'Some dogs did not bark'. And, lest anyone should consider looking for such evidence, if 'not' and other terms are multiply ambiguous in the way that type raising would demand, we can say goodbye to most logical inference: it will be invalid for ambiguity. The whole point of rule-themas is to allow a wider range of generation or combination than the rule-schema alone would provide, without recourse to re-categorization; that the same results regarding semantic coherence can be achieved by type raising instead, is irrelevant. Type raising is also a misleading way of explaining differences of two levels in category assignments. An example commonly cited is Montague's assignment of proper names to category S(S(N)) instead of to category N, as though he first of all assigned them to the latter 18
19
Some versions of categorial grammar resolve this ambiguity by only allowing categories of degree 1. The level of a category C0(C]) is then defined as whichever is the greater, the level of C/ plus 1, or the level of Co. So, on this account, type raising is always of level and never of degree, there are no mixed-level categories and category S(N)(S(N)) is of level 2. A double price must be paid for this solution. First, the flexibility of analysis which, as argued at the end of section 2.4, logic requires, is lost. Second, the linguistic generalization expressed by distinctions of degree is also lost, even if we allow that Frege's assertion that 'Functions of two arguments are just as fundamentally different from functions of one argument as the latter are from objects' (1893, section 21; cf. 1892, p. 193) is difficult to sustain if one admits functionals (functions which have functions as their values). Though some categorial grammarians suppose that it is, seduced, perhaps, by analogies which they have found between their systems and parts of the lambda calculus (see Van Benthem, 1986, chapter 7). But mathematical ingenuity is no substitute for philosophical argument and the lambda calculus is, in any case, worthless as a tool in the theory of meaning (a justification of this judgment may be found in Potts, 1979).
Quantification
89
category, then allowed them to be 'raised' to the former when testing for semantic coherence (see Dowty, 1988, p. 159; Van Benthem, 1988, p. 37). But this is simply incorrect. He never assigned proper names to category N (his category e) nor, indeed, anything else to that category: it was an empty category, while proper names were assigned with quantifying phrases to category S(S(N)) (his category t(t(e))) from the start, with a corresponding account of their meanings (Montague, 1974, pp. 249-50).20 The examples from Frege cited above are more serious contenders for type raising but, even there, we must not lose sight of an important distinction. Frege always accounts numerals proper names (except when used adjectivally), and what he assigns as second level is not the proper name '2' but the schema '>(2)\ The latter, like the schema 'everything:x (0(x))\ contains a schematic symbol for which a first-level expression must be substituted and, hence, is by Frege's normal criterion for levels a schema of level 2. But it is a degenerate second-level schema, because it contains no link letters. So, here again, there is no question of raising or, indeed, in any way changing the category of an expression. Nevertheless, the Fregean examples present a difficulty and, lest anyone should baulk at their mathematical nature, Dummett points out that 'higher-level quantification is extremely common in natural language, and would not be regarded as puzzling or odd by its unsophisticated speakers' (1973, p. 218). From a variety of examples which he gives, I cite (10)
(10) Paul is what I am not,
namely (perhaps), thrifty. As a Fregean representation of (10) he gives (p. 215):

(10F) Something:f (and (not (f(I)), f(Paul))),

which is strictly comparable to Frege's 'φ(2)' example. Now can we insist that the proper name 'Paul' does not occur in (10F), but only the second-level schema 'φ(Paul)'? It hardly seems plausible to claim that the proper name 'Paul' does not occur in (10) and it will not help much to say that it only occurs there as the operator of a second-level schema. For surely we can form the first-level schema 'ξ is what I am not' from (10) by removing 'Paul', which is then being treated as a proper name? Moreover, if it is the same proper name which can be considered in (10) either as occurring as a
90
Logic: trees
proper name or as the operator of a second-level schema, how can it belong to two different categories without change of meaning? Ultimately Frege does seem to be committed to re-categorization in this case, with its attendant problem of a change in meaning where, intuitively, there appears to be none. Now re-categorization could be avoided by adding a further rule which will allow a schema to take an operand which is three levels, instead of one, below it (our example is of a third-level schema taking an expression of a basic category - level 0 - as operand). In the case of (10), we want to be able to derive

S ⇒ S(S(S(N))) N

instead of

S ⇒ S(S(S(N))) S(S(N))

The latter is, however, in a sense the condition of the former, and the operand of the latter must be at least of second level. So, generalizing, we obtain the rule-thema:
(R4) if C0 ⇒ C1 C2(C2(C3)), then C0 ⇒ C1 C3
Examples involving operators of degree > 1 will be taken care of by successive applications of the rule, as with the previous rules (R1)-(R3). This rule marks a slight departure from Frege, but licenses structures which he found it necessary to admit without positing the presence of schemas raising problems about meaning. As to type raising, we may dispense with it for the purposes of semantic analysis. One concession, however, is in order. It may be that a system of categorial grammar with type raising is formally equivalent to one with recursive rules. In such a case, it may be technically convenient to prove some property of the latter indirectly, by proving it first directly of the former. Clearly there could be no objection to invoking a system with type raising for this purely technical purpose. A similar remark applies to the Lambek calculus (Lambek, 1958), which exploits an analogy between rules (R1) and (R2) and part of intuitionist propositional logic. In this system we can, indeed, prove the two derivations which I cited earlier as obstacles to type raising alone, but it seems that neither (R3) nor (R4) would be obtainable, though the converse of (R4) is. Again, there can be no objection to such a calculus if it is used for purely mathematical purposes, but there must be every objection to it if used as a straitjacket to confine everyday language.
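The working of such rules can be sketched in a few lines of code. The following is a minimal sketch only, not an implementation of the system developed in this book: a category A(B) of degree 1 is written as the Python pair (A, B), the plain cancellation stands in for the rule-schema, and a Geach-style composition (deriving A(C) from A(B) and B(C)) stands in for a recursive rule-thema; the function names are mine.

def cancel(functor, operand):
    # rule-schema: A(B) combined with an operand of category B yields A
    if isinstance(functor, tuple) and functor[1] == operand:
        return functor[0]
    return None

def cancel_rec(functor, operand):
    # Geach-style recursion: if A(B) will not cancel with C0(C1) directly,
    # cancel A(B) against C0 and carry the argument C1, giving A(C1)
    direct = cancel(functor, operand)
    if direct is not None:
        return direct
    if isinstance(functor, tuple) and isinstance(operand, tuple):
        inner = cancel_rec(functor, operand[0])
        if inner is not None:
            return (inner, operand[1])
    return None

S, N = 'S', 'N'
neg = (S, S)              # 'not': S(S)
barked = (S, N)           # 'barked': S(N)
every_dog = (S, (S, N))   # 'every dog': S(S(N))

print(cancel(barked, N))            # S: 'Fido barked'
print(cancel(every_dog, barked))    # S: 'Every dog barked'
print(cancel_rec(neg, barked))      # (S, N): 'did not bark', 'not' still S(S)

On this treatment 'not' combines with an expression of category S(N) without itself being re-categorized, which is the point at issue.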
Little interest was shown in categorial grammars during the 1960s and early 1970s because of a result published by Bar-Hillel, Gaifman and Shamir (1960), now known as Gaifman's theorem, to the effect that context-free string grammars are equivalent in power to Ajdukiewicz's categorial grammar. This was unfortunate: in the first place, because the
possibility of generating the same expressions by a string grammar as by a tree grammar does not show that it is indifferent so far as the meanings of those expressions are concerned whether we regard them as having tree structures or as merely having string structures; and, in the second place, because the result does not hold for Fregean categorial grammar - as soon as the recursive rule-thema (R2) is introduced, the categorial grammar becomes more powerful than any context-free string grammar. Some, indeed, have complained that Fregean categorial grammar is too powerful, generating structures of such complexity that no linguistic examples of them can be found. I cannot see that this matters greatly: with each minimal linguistic expression assigned a category in the lexicon, the grammar could be used both to generate and to parse complex expressions even though infinitely many possible categories were never called into play. Of course, it would be very interesting to know that language only uses certain categories of the grammar and, even more, to know why; but it is not possible at this stage in the development of categorial grammars even to put a tentative limit to category usage, and it is surely better to have a tool box so well equipped that some tools are seldom, if ever, used, than to have so few tools in it that jobs are constantly turning up which cannot be undertaken. Quantification, however, poses a difficulty for tree grammars which seems to exclude them altogether as a means of representing semantic structures. Trees are, indeed, an improvement upon strings because they embody the notion of one expression working upon the meanings of others with which it combines, but they are still inadequate to represent quantification as conceived by Frege. This has been concealed up to now because we have been using a linear notation for tree structures, but now let us revert to the planar notation with which this chapter began. Remembering that we decided that a schema should be represented by an operator plus the edges associated with it in virtue of its degree, we obtain the following diagrams for propositions (5), (7) and (9) respectively:
[Diagrams (5A), (7A) and (9A): (5A) has 'A dog' above 'barked'; (7A) has 'at least one patient' above 'every doctor' above 'visited'; (9A) has 'every doctor' above 'at least one patient' above 'visited'. In each, the operators carry the edges associated with their degrees, the edges of the lowest operators being left dangling.]
None of these is a tree, because the leaves are missing. Moreover, it is evident that we cannot tell from (7A) whether at least one patient was visited by every doctor or at least one patient visited every doctor.
92
Logic: trees
Similarly, we cannot tell from (9A) whether every doctor visited at least one patient or was visited by at least one patient. Some authors (for example McCawley, 1971, section 3; Montague, 1974, chapter 8) have dealt with this by dispensing with the edges associated with the lowest operators in the tree, but including the link letters of Frege's notation, for example (5B)
(5B) A dog:x
        |
     x barked

(7B) At least one patient:y
        |
     every doctor:x
        |
     x visited y

(9B) Every doctor:x
        |
     at least one patient:y
        |
     x visited y
This does, indeed, resolve the ambiguity of (7A) and (9A) and it could be claimed that inclusion of the link letters with the operators now at the leaves of the tree shows that they are the operators of first-level schemas. However, those link letters belong, not to the operators at the leaves of the tree, but to the schemas of the second-level operators (the schema of which 'every doctor' is the operator is 'every doctor:x (φ(x))', which includes both link letters). So this convention is, at best, misleading and, to see the structures with which we are dealing more clearly, we need to replace the link letters with actual links. Assuming that the edge from the left of 'visited' relates to the visitor and the edge from the right to the visited, this yields the following results:
[Graphs (5C), (7C) and (9C): as (5B), (7B) and (9B), but with the link letters replaced by actual links: in each, edges run from the first-level operator ('barked' or 'visited') back up to the second-level operators 'A dog', 'at least one patient' and 'every doctor'.]
This also restores the edges showing the degree of each of the first-level operators. But now, instead of being descending edges like those in a tree, these edges ascend from the first-level operator back to a second-level operator - that is, if we think of the edges as having a direction. These representations, however, are no longer trees, though they are graphs. A grammar powerful enough to generate such structures will, accordingly, have to be a graph grammar and Fregean categorial grammar, in spite of initial appearances, is indeed a graph and not just a tree grammar.
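The point can be put concretely in code (the encoding is mine, and only a sketch): once the edges are directed, the two readings of 'Every doctor visited at least one patient' are two different directed graphs over the same three labelled nodes, and the scope order can be read off by following the edges.

# Each key is a node; an entry u -> [v] says that v lies within the scope of u.
every_doctor_wide = {
    'every doctor': ['at least one patient'],
    'at least one patient': ['visited'],
    'visited': [],
}
patient_wide = {
    'at least one patient': ['every doctor'],
    'every doctor': ['visited'],
    'visited': [],
}

def scope_order(graph, root):
    # follow the directed edges from the widest-scope operator downwards
    order, node = [], root
    while graph[node]:
        order.append(node)
        node = graph[node][0]
    order.append(node)
    return order

print(scope_order(every_doctor_wide, 'every doctor'))
print(scope_order(patient_wide, 'at least one patient'))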
We could always sacrifice quantification in order to remain within the simpler world of tree grammars; but it was precisely this structural analysis which enabled Frege to give a correct account of the logical relationships between propositions containing more than one quantifying phrase. Frege's originality with respect to structural analysis did not lie in the replacement of subject and predicate by operator and operand, but in his quantifier notation, although introducing the operator/operand distinction was a necessary preliminary. To logicians, to computer scientists and to most philosophers it would be unthinkable to abandon such hard-won gains now.
3 Computer science: graphs
3.1 GRAPHS AND GRAPH GRAMMARS

A graph is a structure consisting of nodes (represented as small circles) connected by edges. The limiting case of a graph is a single node. The edges may be given a direction, in which case the structure is called a digraph. The nodes may be labelled, for example with words, and so may the edges. So both strings and trees will be special cases of graphs. Graphs allow for any connexions between their nodes and, in particular, for circuits, that is, a sequence of distinct edges and distinct nodes in which, however, the first and last nodes are the same (if they are different, it is just a path). The simplest case of a circuit is provided by a triangular graph; thus (G1)-(G3), all of which are graphs, each contain several circuits.
[Figures (G1), (G2) and (G3): three graphs, each containing several circuits. (G1) is the cube graph drawn with some edges crossing; (G2) is another drawing of the same graph in which no edges cross; (G3) is a graph which cannot be drawn without crossings.]
Graph (G2) differs from the cube graph (G1), however, in that none of the edges crosses in the former, whereas some of the edges do cross in the latter. On this account, (G2) is a plane graph, whereas (G1) is not. However, if we consider a graph to be defined simply by the number of its nodes and the connexions between them, rather than by the way in which it is drawn, we can see that (G1), though not plane, is another way of drawing the same graph as (G2). Thus the notion of a plane graph is not very useful and we need, instead, that of a planar graph, viz. any graph which can be drawn in such a way that none of its edges crosses. Both (G1) and (G2) are then planar, but (G3) is not: there is no alternative way
of drawing it on a plane surface so as to eliminate all of the edge crossings. This notion has been generalized in order to yield a classification of graphs. Planar graphs can also be embedded on the surface of a sphere; it is evident that the cube graph (G1) fulfils this condition. From a sphere we proceed to a torus, that is, a ring. Graphs which can be embedded on the surface of such a body but not on the surface of a sphere are termed graphs of genus 1, or toroidal graphs, of which (G3) is an instance. If we think of the torus as a sphere with a hole in it, then we can think of more complex surfaces obtained by making further holes; for example, one with two holes would be like a closed pot with two handles. So a graph of genus n will be one which can be drawn without edge crossings on the surface of a sphere with n holes or handles, but not on the surface of one with n-1 holes or handles. We can build onto this an intuitively based definition of dimensionality for the structures which things have. If the structure can be represented by a planar graph but not by a linear graph, we can regard it as two-dimensional; if by a graph of genus 1 but not by a planar graph, as three-dimensional; if by a graph of genus 2 but not by a graph of genus 1, as four-dimensional; and so on. The graphs to be employed in this book will mostly be planar, but there will be a few cases in which we shall need a toroidal graph. I do not know what is the highest genus of graph needed to represent semantic structures for everyday language as a whole, since the scope of the present work is severely restricted. In particular, it would be unwise even to make a conjecture about what may be required until a satisfactory representation of semantic structures for propositions containing psychological verbs has been found.
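For readers who wish to experiment, planarity can be tested mechanically. The sketch below uses the Python library networkx (an assumption: any graph library with a planarity test would serve), with the complete graph K5 standing in for a non-planar example, since (G3) itself is not reproduced here.

import networkx as nx

cube = nx.hypercube_graph(3)   # the cube graph on eight nodes, as in (G1)/(G2)
k5 = nx.complete_graph(5)      # K5: every pair of five nodes joined by an edge

print(nx.check_planarity(cube)[0])   # True: embeddable in the plane or on a sphere
print(nx.check_planarity(k5)[0])     # False: K5 is of genus 1 and needs a torus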
Graphs are generated by graph grammars. Naturally, the rules of graph grammars allow us to replace graphs by graphs, a single node being recognized as the limiting case of a graph. Thus a typical graph will contain sub-graphs: not only individual nodes, but also configurations of nodes and edges. For example, several square sub-graphs can be extracted from the cube graph (G1). At their most general, then, the rules of a graph grammar will allow us to replace either an entire graph or a sub-graph by a new graph. This may complicate, or it may simplify, the original graph (replacement of a complex sub-graph by a single node would simplify). Such rules are clearly likely to be much more complicated than the rules of a string or of a tree grammar, because they must ensure that the new structure is a correctly constructed graph. Obviously we want to restrict these rules to what is absolutely necessary in order to generate semantic structures, so that they be kept as simple as possible. It would therefore be very premature to essay a graph grammar
for the semantics of everyday language until we have a fairly clear idea of the range of structures involved. Graph grammars, however, are divided into sequential and parallel. Sequential grammars only allow replacement of one sub-graph at a time, whereas parallel grammars allow for several to be replaced simultaneously. The primary application of the latter is to biology, where it may be important to represent simultaneous changes in different cells of an organism. But a linguistic grammar is not intended to represent any actual process, only to determine what configurations of symbols are correctly formed from a semantic, syntactic or phonetic point of view. So there is no theoretical reason why we should need a parallel grammar for linguistic purposes and, indeed, the syntactic grammars so far proposed are all sequential, as is the application of the recursive definitions of a formula specified in logic. From a practical point of view, it might, of course, be more convenient, if one wanted to use parallel processing, to have a parallel grammar. But that need not concern us here and so we may safely conclude that a sequential graph grammar will take care of our needs. That is good news, since sequential grammars are less complicated than parallel ones.

3.2 SEMANTIC NETWORKS

Computer scientists began to use graphs for the representation of meaning from 1966 onwards, for the most part quite independently of formal methods in linguistics and logic, under the title of semantic networks (Quillian, 1966). The edges and even the nodes, however, have been very variously labelled. The earliest semantic networks represented definitions of concepts, drawn from dictionaries (Quillian, 1966, 1967). Later, these were combined with proper names and, sometimes, logical operators in order to represent propositions. A persistent theme is that of breaking down concepts into structured primitive elements, so that one could regard semantic networks as making a contribution to componential analysis (see section 3.3, note 7). Commonly, the nodes of the graphs are taken to represent concepts and the edges relationships between them. The relationships include semantic roles (as in case grammar), but much more besides, such as propositional connectives, modalities, types of attribute (for example: size, shape, having as a part) and set relations, like being a sub-set of (see Simmons, 1973). In particular, they include a relationship being a token of. The terminology is borrowed from Peirce (1906), but is not used in the sense of his type/token distinction; rather, being a token of some concept is being an instance of it. Sometimes this is used in contexts such as 'token
(Rex, dog)', to say that Rex is a dog, but also 'token (dog, animal)', to say that the dog is an animal (with 'is a' as an alternative to 'token'; see Scragg, 1976, p. 104). At other times, however, it is used with numbered nodes, as in 'token (C2, John)', where the intention seems to be to pick out an instance of people named 'John', that is, 'C2 is a John'; of course, the individual, thus conceived, cannot be given an ordinary proper name, since proper names are here being deemed common count nouns (on the ground that many people are called by the same name: more of this in section 6.2). All this is confused enough: in the first case, an object falling under a concept being muddled with one concept falling within another (to use Frege's way of stating the difference); in the second case, a categorial confusion between proper names and count nouns. But to make matters worse, the token relationship is also used for instances of actions, such as 'token (C1, break)', to say that C1 is a breaking, that is, an instance of breaking. An example shows how this is applied (Simmons, 1973; Scragg, 1976, p. 122; the semantic roles have been altered to accord with section 1.2):
(1) John broke the window with a hammer
(G4) [semantic network diagram for (1): nodes for the tokens C1 (a breaking), C2 (a hammer), C3 (a window) and C4 (a John), together with nodes for 'break', 'hammer', 'window', 'John', 'with' and 'INDEF', connected by edges labelled with the semantic roles and with TOKEN, NBR, DET and PREP.]

This ignores the tense of the proposition. NBR (number, values SINGular or PLURal) and DET (determiner, values DEFinite and INDEFinite) are borrowed from syntax; PREP (presumably preposition) appears in the example but is not otherwise explained. Finally, we have a token of a breaking (C1), of a window (C3) and of a hammer (C2), and of a John (C4). Such semantic networks can also be represented in the notation of the programming language LISP (LISt Processing), in which lists are shown
98
Computer science: graphs
within parentheses (which may be nested) and the first item in any list may be defined as a function with the remaining items naming its arguments. Following Allen (1987, ch. 7), (1) could be represented by (1L)
(PAST C1 BREAK (AGENT C4 (INDEF/SING C7 JOHN)) (OBJECT C3 (DEF/SING C6 WINDOW)) (INSTR C2 (INDEF/SING C5 HAMMER))).
This is not a straight 'translation' of (G4).1 First, the TOKEN relationship is omitted in favour of a convention that the second item in each string is always a token. Second, DETerminer and NUMBer are also omitted and their values combined into a new operator, which also demands a new token-name as its first operand. It is then quite unclear what the extra token-names name; for example, either 'C4' or 'C7' is the name of the Agent, but what, then, does the other name? Third, the preposition 'with' is omitted from the representation; this is an improvement, since INSTRument tells us the semantic role of the hammer and provision of 'with' can be considered a matter for syntax.

1 Allen derives representations such as (1L) from a syntactic description of the sentence or expression by means of semantic interpretation rules which are applied in turn to each of the elements in the syntactic description to yield partial semantic interpretations. The latter are then merged into a single semantic representation for the whole expression. Merging resolves any ambiguities of single words in the expression and is clearly modelled upon the projection rules of interpretative semantics described in section 1.5.

It will be evident that the nodes in graph (G4) represent a remarkable hotch-potch of disparate elements and that there is no attempt to distinguish between categories of expression. Similarly, the relationships between the nodes strike one as an arbitrary collection, many of which would be difficult to justify as relationships between concepts. Indeed, most of the nodes themselves seem no longer to represent concepts: in particular the nodes C1-C4, but also nodes like 'with' and 'INDEF'. No justification is offered for introducing a token of breaking, rather than simply replacing the label 'C1' with 'break', although this may be inspired by Davidson's analysis of action-sentences (see Allen, 1987, p. 213). Finally, there is no indication of how the meaning of the proposition represented is organized by scope. But I do not need to labour these criticisms, for the ambiguities in semantic network notation and its indeterminacy of meaning have already been more thoroughly exposed and criticized by Woods (1975).
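For concreteness, a representation like (1L) can be held outside LISP as nested lists; the sketch below (mine, not Allen's code) uses Python tuples and prints them back in the parenthesized notation.

rep_1L = ('PAST', 'C1', 'BREAK',
          ('AGENT', 'C4', ('INDEF/SING', 'C7', 'JOHN')),
          ('OBJECT', 'C3', ('DEF/SING', 'C6', 'WINDOW')),
          ('INSTR', 'C2', ('INDEF/SING', 'C5', 'HAMMER')))

def to_sexpr(x):
    # print a nested tuple in the LISP style, the first item being the operator
    if isinstance(x, tuple):
        return '(' + ' '.join(to_sexpr(item) for item in x) + ')'
    return str(x)

print(to_sexpr(rep_1L))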
Woods also observed that most systems of semantic networks are unable to handle quantification properly. At that time, he found only two exceptions. The first (Kay, 1973) represents universal quantifiers by nodes labelled with the second and subsequent occurrences of their link
letters. Existential quantifiers are eliminated in favour of edges labelled by Skolem functions. These edges are directed from a node labelled with the link letter of the existential quantifier to that labelled by the link letter of a universal quantifier within whose scope it lies. Skolem functions may be explained as follows. Since an existentially quantified proposition will be true just in case something falls under the concept expressed by its operand, we can replace the quantifier by a function whose value is the something in question. That is, instead of 'something:x (f(x))', we may write 'f(g)', where g is a function (in this case, without arguments) of whose value f is true. When existential quantification occurs within the scope of universal quantification, however, the value of which the predicate is true will depend upon which case covered by the universal quantifier we happen to be considering. Thus, if every girl loves a sailor, we cannot determine the sailor until we know which girl is in question, for each girl may love a different sailor. So the Skolem function which replaces the existential quantifier in this case must be differentiated for each girl. To this end we write: 'every girl:x (loves (x, g(x)))', where g is a function whose arguments are girls and whose values are sailors. Of course, we may not be able to specify the Skolem function, any more than we can specify what object satisfies an existential proposition. The procedure for replacing existential quantifiers by Skolem functions presupposes that our propositions are in prenex normal form, that is, with all the quantifiers at the beginning. The universal quantifiers are then removed, leaving the second and subsequent occurrences of their link letters to represent them. The example thus becomes: 'loves (x, g(x))'. According to Woods, we can then 'obtain a semantic network notation based on this Skolem function analogy by simply including with every existentially quantified object a link which points to all of the universally quantified objects on which this one depends' (1975, p. 77). Reduced to its barest form,2 though with the count nouns of the quantifier phrases restored, the semantic network which this yields is:
(G5) [network: a 'girl' node and a 'loves' node each with an edge to the 'x' node, a 'sailor' node and the 'loves' node each with an edge to the 'y' node, and a directed edge labelled with the Skolem function from the 'y' node to the 'x' node.]

2 Kay's network is more complicated (1973, p. 183). He introduces extra nodes in order to show this case of loving as a member of the class of lovings, and then duplicates the Skolem function edge from the individual loving node to the universal quantifier node. But this does not affect my criticism. (G5) contains two innovations. First, duplication of nodes with the same label is avoided by having an edge both from the 'girl' and the 'loves' nodes to the 'x' node, and similarly both from the 'sailor' and the 'loves' nodes to the 'y' node. Second, the network has directed edges, unlike (G4). These are now usual features of semantic networks. I shall develop them in the next chapter. Meanwhile, it is worth observing that no general explanation is given of what a directed edge signifies. Often it appears to relate to the way in which the network is stored on a computer rather than to any logical features of the representation itself.
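The procedure just described is mechanical, and a rough sketch in code may help to fix ideas; the encoding of the quantifier prefix and the restriction to single-letter link letters are simplifying assumptions of mine.

def skolemize(prefix, matrix):
    # prefix: list of ('every' | 'some', link letter), outermost quantifier first;
    # matrix: the quantifier-free part, written with the link letters
    universals, n = [], 0
    for kind, letter in prefix:
        if kind == 'every':
            universals.append(letter)
        else:
            n += 1   # replace the existential letter by a new Skolem function
            term = f"g{n}({', '.join(universals)})" if universals else f'g{n}'
            matrix = matrix.replace(letter, term)
    return matrix

# 'every girl:x (something:y (loves (x, y)))'
print(skolemize([('every', 'x'), ('some', 'y')], 'loves (x, y)'))
# -> loves (x, g1(x))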
In order to represent the proposition obtained by switching the order of the quantifiers, 'There is a girl whom every sailor loves', Kay simply changes the direction of the edge labelled with the Skolem function (cf. the example in 1973, p. 184). But this fails to distinguish the above from 'Every sailor loves a girl'. On closer inspection, indeed, one can see that his notation lacks the necessary multiplicity, because a single sign (the edge labelled with the Skolem function) is simultaneously trying to show the scope order of the quantifiers and whether each is universal or existential. A further difficulty (noted by Woods) is that one cannot obtain a network for the negation of a proposition so represented by attaching a negation sign to it; instead, one must first work out the prenex normal form of the new proposition and then construct a new network. More generally, then, such a network cannot be incorporated as it stands into another as a sub-graph. Since Woods's article appeared, several attempts have been made to make the notation more rigorous and to provide for quantification. Thus Hendrix (1975a, 1975b, 1979) severely restricts the labels available for edges and introduces partitions, boxes enclosing groups of nodes (not necessarily forming sub-graphs), as a means of representing logical operators. A link from a node within a given box to one outside it is regarded as falling within the box if its label is written within the latter. Conjunction is then represented by enclosing nodes representing individual states or events within the same box. Disjunction, however, requires a much more complex apparatus: first, separate boxes each containing a node representing an individual state or event are connected to a node D which in turn is linked to a node Disjunctions by a set-membership edge. This is supposed to mean that D is a member of the set of disjunctions (by analogy with this man being a member of the set of men)! The edges from the boxes to the node D are also labelled by set-membership, but qualified in this case to make the boxes disjoint members of D (p. 67, fig. 10). Negation is represented similarly, though with only one box and no node corresponding to D (p. 68, fig. 11). These
representations are barely intelligible, and it is quite unclear how transformations such as those in de Morgan's laws might be effected. Hendrix requires no special notation for existential quantification, since 'the occurrence of a structure (that is, a node or arc) in a space [ = box] is taken to be an assertion of the existence with respect to that space of the entity represented by the structure' (p. 69). Since his nodes can represent virtually anything, this introduces implicit existential quantification on a vast scale, for example over the node D described above, but he does not pause to consider any problems which this might raise. Universal quantification is represented in connexion with implication. The latter is shown in a similar way to disjunction, with a node I and another node Implications. But now the edges are from I to the boxes and are labelled ante[cedent] and conse[quent] respectively. For universal quantification, a node representing an individual (whose label is not a proper name) is included within two overlapping boxes one of which is the antecedent and the other the consequent of an implication. This notation is inspired by Frege's use of unrestricted quantification, so that, for example, 'Every man owns a car' is analysed as 'everything:M (if (M is a man, M owns a car))'. The node labelled 'M' would then be placed in the two overlapping boxes (cf. p. 83, fig. 18). The treatment of quantification again raises many unanswered questions. How may 'everything:x (not (F (x)))' be transformed into 'not (something:x (F (x)))'? How can universal quantification be combined with operators other than implication? Moreover, if universal quantification is to be analysed in the Fregean manner, then why is, for instance, 'Some man owns a car' not represented so as to show a hidden conjunction ('something:M (and (M is a man, M owns a car))')? Again, it is totally unclear how other quantifiers, whether numerical adjectives or inexact quantifiers like 'few', 'several', etc., should be represented. These difficulties spring from a single source. Hendrix has not thought about the categories of the expressions which he is seeking to represent, and in consequence has quite disparate and, indeed, incommensurable representations for expressions of the same category: for conjunction and disjunction, both of category S(S,S); and for universal and existential quantification, both of category S(S(N)). Moreover, the interpretation of his networks is uncertain, and would become increasingly so the more complex their content, because they contain no indication of scope. Numerous references to Woods's article in the work of Cercone and Schubert show that they have taken note of his criticisms (Cercone, 1975; Schubert, 1976; Schubert, Goebel and Cercone, 1979). In their system, each proposition is represented by a node which is connected to a node for its main verb by an edge labelled 'PRED', and also connected to nodes
for each of the verb's operands. Propositional operators are represented by nodes with an edge from a new propositional node which, in turn, is connected by edges to the operands of the propositional operator, for example (2)
(2) Either Jim sat or Sue stood.

(G6) [network: an 'or' node with edges to two proposition nodes, one joined by a PRED edge to 'sat' and by a further edge to 'Jim', the other joined by a PRED edge to 'stood' and by a further edge to 'Sue'.]
Quantification is expressed using Skolem functions, but with one important modification to Kay's method of representing them in semantic networks. Schubert (to whom the notation is primarily due; see 1976) distinguishes universal from existential quantification by using dotted lines for the nodes of the former, leaving nodes representing quantifiers unlabelled. Thus, taking (3)
every girl loves a sailor
again as an example, the Skolemization is, in linear notation, (3F)
if (girl (x), and (sailor (f(x)), loves (f(x),x)))
(G7) [network: a square node for the universally quantified variable joined to 'girl', a round node for the existentially quantified variable joined to 'sailor', proposition nodes for 'if', 'and' and 'loves' with PRED edges, and an edge labelled 'scope' between the two quantifier nodes.]
and the corresponding semantic network, (G7). (Instead of Schubert's dotted node for universal quantification I have used a square node and instead of his dotted edge between the quantifier nodes, simply labelled the edge 'scope'. His dotted edges are unlabelled, the Skolem function which is explicit in Kay's notation becoming implicit. Schubert gives this edge the opposite direction to Kay, so that it is now also the direction of the scope.) Schubert introduces an abbreviated notation which absorbs the propositional nodes, turning predicates and propositional operators at the same time into labelled edges. This greatly simplifies the appearance of his networks. It comes close enough to the notation used in (G7) for us to ignore the differences for present purposes. The abbreviated network for (3) is then (G8). In Schubert's notation the corresponding network for (4)
There is a girl whom every sailor loves
would be obtained simply by deleting the edge from the universally quantified node to the existentially quantified one. But it would surely be clearer to leave the scope edge, though switching its direction, as in (G9). The notation then has the necessary multiplicity: the different types of node distinguish between universal and existential quantification, while the scope edge shows their mutual scopes.

[Graphs (G8) and (G9): abbreviated networks for (3) and (4) respectively. In each, a square (universal) quantifier node joined to its count noun and a round (existential) quantifier node joined to its count noun are connected by an edge labelled 'loves' and by an edge labelled 'scope'; the 'scope' edge in (G9) runs in the opposite direction to that in (G8).]
The presupposition that each network corresponds to a proposition in prenex normal form still prevents us from incorporating networks as sub-graphs of larger ones. It can also exclude representation of some propositions containing modal operators altogether. For example, if it is possible that there will (someday) be a woman Pope, it does not follow that there is (already) a woman of whom it is possible that she will (someday) be Pope. Thus there is no way of expressing the first of these propositions in prenex normal form. Schubert proposed to overcome this limitation by using scope edges (subsequently called scope inclusion links)
to show when a quantifier falls within the scope of a propositional or modal operator. One of Schubert's examples (1976, p. 180) is: (5)
Mary will receive a scholarship provided that she passes all of the exams.
He represents this in linear notation as: (5F)
iff (something:x (and (scholarship (x), receives (x, Mary))), everything:y (if (exam (y), pass (y, Mary))))
(using 'iff' for 'if and only if' and slightly simplifying his treatment of 'all of the exams'). This example could, of course, be put into prenex normal form, though the result 'is twice as complicated as the original version and quite incomprehensible when stated in English' (1976, p. 180). The semantic network corresponding to (5F) is (G10).
(G10) [network: an 'iff' node whose operands are proposition nodes for 'receive' and 'pass'; 'Mary', a round (existential) node joined to 'scholarship' and a square (universal) node joined to 'exam' are connected to these proposition nodes, and edges labelled 'scope' run from the 'iff' node to each quantifier node.]
Similarly, a scope inclusion link would be drawn from a modal operator to the first quantifier node which fell within its scope, for instance from the 'possibly' node in the example given above to an existentially quantified node. What Schubert has done is to introduce into semantic networks explicit indications of scope relationships. These are essential to any adequate representation of quantification and of propositional and modal operators, so his notation works where the attempts of his predecessors failed.
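The multiplicity in play can be shown with a small sketch (the encoding is mine, not Schubert's): the node types carry the universal/existential distinction, while a single directed 'scope' edge carries the scope order, so that reversing that one edge alone switches the reading.

def schubert_net(wide, narrow):
    # nodes carry a quantifier type and a count noun; one edge carries scope
    return {'nodes': {'q1': ('universal', 'girl'),
                      'q2': ('existential', 'sailor')},
            'edges': [('q1', 'q2', 'loves'),
                      (wide, narrow, 'scope')]}

g8 = schubert_net('q1', 'q2')   # the universal quantifier has the wider scope
g9 = schubert_net('q2', 'q1')   # the existential quantifier has the wider scope
print(g8['edges'][1])
print(g9['edges'][1])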
But it is still deficient in one respect. Because he relies upon different kinds of node in order to distinguish universal from existential quantification, he would have to use a different type of node for each possible quantifier. That might seem feasible, if complicated, if we
restricted ourselves to quantifiers like 'several', 'most', 'many' and 'few'. Once numerical adjectives are included, however, infinitely many different kinds of node would be required. Another version of semantic networks, under the title of conceptual graphs, has been developed by Sowa (1984). Sowa's graphs contain two distinct types of node, one representing concepts and the other relationships between them. But the edges of the graphs are unlabelled, and a relationship node must occur between any two concept nodes, so this appears to be no more than a notational variant upon labelling the edges. Concepts are, however, classified into types, and rules are given for thirty-seven relationships stating which types of concept they may inter-relate. Thus the agent relationship links an action concept to a concept of an animate being, and the cause relationship links two concepts of states (Sowa, appendix B). Many of the concept classifications, however, as well as the relationships posited between them, are controversial, yet are simply presented without supporting argument. Indeed, the original idea that semantic roles are relationships between an action and its participants is mistaken, for we could not imagine an instance of the action in which it was differently related to the participants; the semantic roles are, rather, part of the meaning of the verb (see Potts, 1978). Sowa also provides rules for constructing more complex graphs from those obtainable by applying the rules for relationships noted above (Sowa, section 3.4). They do not, however, allow for introducing negation and propositional connectives, or quantifiers, so he provides for these by adopting the existential graphs of Peirce (1960, vol. 4, pp. 320-410). These contain representations for negation, conjunction and existential quantification only, and are thus limited to classical logic. Perhaps another source of representational inadequacy is the attempt to by-pass language in order to represent meaning directly. From a true premiss that semantic representations should not be tied to a particular language, the false conclusion has been drawn that semantic representations are independent of any means of expression, that they somehow transcend language. Yet the notation in which they are expressed is, of course, itself a language, albeit a technical and not an everyday one. It may, indeed, be justified for the purpose in hand but, since that purpose is to represent the meanings of linguistic expressions, it does not excuse us from carefully relating the notation to the language which it purports to represent. Once we begin to imagine ourselves free of this duty because we are dealing directly with concepts, unjustified assumptions lead to nonsense, while the relationship of the proposed formalisms to everyday language is no longer clear.
It appears that the lesson has still not been fully learned that semantic networks must exhibit scope relationships if they are to handle quantification satisfactorily. Thus, to recur to (1L), from a logical point of view, the indefinite article in (1) is a quantifier: not, perhaps, an existential quantifier ('at least one'), but a numerical quantifier ('just one'). That difference, however, is not to the point here; the trouble is that in (1L), 'a hammer' is represented as falling within the scope of 'break', which is impossible, since it is not a proper name. The standard Fregean representation of (1) would be:
(1F) a hammer:x (broke with (x, the:y (window(y), John))),
assuming the definite article to be a name-forming operator. It is, therefore, highly misleading for Allen to describe representations such as (1L) as presenting the 'logical form' of propositions, etc.; at the very least we must say, as with the transformational grammarians' notion of logical form, that it is not to be confused with the term as used by logicians. Allen does, indeed, admit that 'the current LF [logical form] cannot represent scoping distinctions' (1987, p. 217). In particular, he has in mind that when a proposition contains two or more quantifying phrases, the notation cannot show which of them lies within the scope of the others.3 Thus my example from previous chapters,
(6) Every doctor visited at least one patient
would be represented by (6L)
(PAST e1 VISIT-EVENT (AGENT a1 (EVERY d1 DOCTOR)) (PATIENT p1 (INDEF/SING p2 PATIENT))),
which does not distinguish between every doctor visiting the same patient and every doctor visiting some patient or other, perhaps a different one for each doctor. Allen claims that this is actually an advantage of his notation, in that the ambiguity is resolved by context and should therefore be dealt with separately. But he is not consistent on this score, for when it comes to ambiguous words, such as the adjective 'green' (colour, unripe (of fruit), simple (of persons)), he is careful to distinguish them. Yet here, too, it might be necessary to resort to context in order to resolve the ambiguity, for an apple could be called 'green' to indicate either its colour or that it is unripe.

3 This is a consequence of the way in which semantic representations are related to the syntactic ones by the semantic interpretation rules, for the syntactic system uses noun phrase as a category and includes quantifiers as parts of it. Thus the syntactic analysis imposes a framework within which the semantic structure must be developed, and so constrains it.

This leaves Allen's notion of 'logical form' wholly unclear; it clearly does not represent the structure of the proposition relevant to its meaning, for then we should require two (or more) distinct structures for a proposition which was structurally ambiguous, even though it might be left to a contextual component of the formal system to choose between them. He proposes, nevertheless, a notation to indicate the scope of quantifiers. It consists in attaching the token-name which follows one of the quantifiers as an index to another quantifier; this is to be understood to show that the indexed quantifier lies within the scope of the other. Thus, in order to represent the sense of (7)
At least one patient was visited by every doctor
in which 'every doctor' is taken to be within the scope of 'at least one patient', we should write: (7L)
(PAST e1 VISIT-EVENT (AGENT a1 (EVERYp2 d1 DOCTOR)) (PATIENT p1 (INDEF/SING p2 PATIENT))),
whereas to represent (6) or the sense of (7) in which 'at least one patient' is taken to lie within the scope of 'every doctor', we have: (6L')
(PAST e1 VISIT-EVENT (AGENT a1 (EVERY d1 DOCTOR)) (PATIENT p1 (INDEF/SINGd1 p2 PATIENT))).
This device has the appearance of an ad hoc resort to avoid a difficulty. In a proposition containing many quantifiers, it would be extremely unperspicuous. Moreover, since it does not tell us when one quantifier does not lie within the scope of another, there will be cases in which it fails to determine scope unambiguously. Take, for instance, the following type of structure: A:x (B:y (C:z ( . . . (x,y,z))) . . . (D:v (E:w ( . . . (x,v,w))))). We can use the indexing device to show that B and D both fall within the scope of A, and that C falls within the scope of B and likewise E within the scope of D. This will be enough to show that C and E fall within the scope of A, but does not show that E does not fall within the scope of B nor C within the scope of D. Moreover, indexing some of the quantifiers in a representation is not enough to show the scope relationships in the structure as a whole. Allen implicitly acknowledges this when he extends indexing to negation (1987, p. 288). He cites a very common type of example in which 'not' is attached syntactically to the verb in a proposition when its scope is intended to be the whole sentence, for example
(8) Everyone is not going to York
for 'Not everyone is going to York' rather than 'No one is going to York'. Without resort to indices, Allen has only one way of representing (8), viz. (8L)
(NOT n1 (PRES g1 GO-EVENT (AGENT a1 (EVERY p1 PERSON)) (TO-LOC l1 (NAME y1 CITY "York")))).4

4 One may well ask what the token-name 'n1' means in this representation. The idea that an event could have a proper name, as 'e1' in (6L), is at least plausible, but of what could 'n1' possibly be a proper name? A not-ing?
In order to resolve the ambiguity, 'EVERY' must be indexed as 'EVERYn1', to show that it lies within the scope of 'NOT', or 'NOT' indexed as 'NOTp1', to show that it lies within the scope of 'EVERY'. In principle, then, the full scope structure of the proposition would be shown by indexing every expression in the representation written in capitals except the main operator. Moreover, if the meaning of a proposition is in part determined by its scope structure, this would eventually be necessary in order to represent that meaning. Even ignoring uncertainty in the notation, the result would be excessively difficult to read. And why use a basic notation which is designed to express the operator/operand distinction in such a way as positively to obscure it? What is happening here is, of course, that an account of syntactic structure is being arbitrarily imposed upon semantic representations - essentially the same mistake as transformational grammar makes. Not all computer scientists, however, treat quantifying phrases as Allen does. One of the most famous question-answering programs, Woods's LUNAR, for use in conjunction with a database of moon rock samples, extracts quantifying phrases from syntactic descriptions based on transformational grammar and preposes them to verb schemas (Woods, Kaplan and Nash-Webber, 1972; Woods, 1977). The schema for quantifiers in this system is

(FOR (QUANT) X / (CLASS) : (p X) ; (q X)),
from which a schema for a quantifying phrase may be obtained by substituting a quantifier (including the definite article and numerical adjectives) for '(QUANT)' and a count noun for '(CLASS)'. The Xs are link letters, while for p and for q an expression of category S(N) may be substituted, the former being a restrictive relative clause. Thus Woods assigns quantifying phrases to category S(S(N),S(N)). This works providing that every class name (count noun) is qualified by a restrictive
relative clause, as in 'Every rock that contains silicon contains sodium', but is unable to handle simpler examples like 'A dog barked'. Quantifying phrases are then extracted from phrase-markers of transformational grammar by the following stages:

1 The main verb is turned into a PREDicate schema by a semantic translation rule which specifies the number and kinds of its operands.
2 Expressions in the phrase-markers corresponding to operands of the PRED which contain determiners are turned into quantifier schemas on the pattern above.
3 If a syntactic structure contains more than one quantifying expression, a decision is made about their relative scopes. This is effected by a procedure based upon the commonest orderings implicit in everyday English (see Hobbs and Scheiber, 1987). The quantifier schemas are then combined by substitution for q in the outermost.
4 Finally, the PRED is substituted for the q of the combined quantifier expression.
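As a sketch of how these stages compose, take Woods's own example sentence; the string manipulation below is mine and shows only the pattern of substitution, LUNAR itself operating on LISP structures.

def for_schema(quant, var, cls, p, q):
    # Woods's pattern: (FOR <QUANT> X / <CLASS> : (p X) ; (q X))
    return f'(FOR {quant} {var} / {cls} : {p} ; {q})'

pred = '(CONTAIN X1 SODIUM)'                       # stage 1: the PRED schema
quant = for_schema('EVERY', 'X1', 'ROCK',
                   '(CONTAIN X1 SILICON)', '{q}')  # stage 2: quantifier schema,
                                                   # p a restrictive relative clause
# stage 3 does not arise here: there is only one quantifying expression
final = quant.format(q=pred)                       # stage 4: PRED substituted for q
print(final)
# (FOR EVERY X1 / ROCK : (CONTAIN X1 SILICON) ; (CONTAIN X1 SODIUM))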
The problem addressed in the third of these stages engages Allen, too, but at the stage of appealing to context in order to resolve ambiguous semantic representations rather than as a problem in deriving semantic from syntactic representations. Either way, it is a genuine and an important problem, but not one which need detain us here; each meaning of an ambiguous proposition should be represented by a distinct semantic structure, but which meaning we select in a given case may be a matter of the relationship between syntactic and semantic structures or be a matter of context. It is not, however, a question of the kinds of structure used to represent meaning. Shapiro (1971) developed a semantic network analogue of Woods's treatment of quantifiers. In Shapiro's system, quantifiers were represented by special nodes with some new types of edge to the verbs or other quantifiers which lie within their scope.5 Woods allows that it can handle quantification satisfactorily, but objects to it on the ground that it 'breaks up the chains of connections from node to node that one finds attractive in the more customary semantic network notations' (1975, p. 75). He gives the following example:

(9) Three look-outs saw two boats.

5 I have not cited Shapiro's own example, because it is confusing in many respects. He represents 'Every man is human', but describes it as a deduction rule, not as a proposition. Moreover, his network contains some unlabelled nodes whose significance is unclear. Shapiro later abandoned this notation, apparently under the influence of Schubert, and for unexplained reasons (see Shapiro, 1979, p. 189).
110
Computer science: graphs
The most simple-minded graph representation of this, he says, would be:

[diagram: a 'lookouts' node and a 'boats' node joined by an edge labelled 'saw'.]

From this, we move to a representation showing semantic roles of the form:

[diagram: a proposition node 'p' labelled with 'saw', with an edge labelled 'agent' to a 'lookouts' node and an edge labelled 'patient' to a 'boats' node.]

where 'p' labels a special proposition node. Finally, with extra nodes for the quantifiers, we have:6

[diagram: as before, but with a quantifier node labelled '3' joined by a 'class' edge to the 'lookouts' node, a quantifier node labelled '2' joined by a 'class' edge to the 'boats' node, and 'prop' edges placing the quantifier nodes in scope order above the proposition node.]

6 Woods does not actually mention the edges from the 'p' node to the 'lookouts' and 'boats' nodes respectively, which indicate semantic roles, but he must have intended them, as the third network is supposed to be a progression from the second. These edges could go to the quantifier nodes instead of to the nodes for the count nouns of the quantifier phrases. I have, however, followed the precedent set by the second network as being the most likely interpretation of Woods's intention.
Now this final network is extremely close to the graphs proposed at the end of section 2.5, and the directions of the edges can all be interpreted as scope indicators, with the exceptions of the two semantic-role edges. We have only to eliminate the 'p' node in favour of a 'saw' node and to join each quantifier with the count noun which it qualifies to obtain the same graph as (9C) of that section, except that the edges of the latter are not directed. No doubt this final graph does have more complex paths than the first one, but they are not so complex as to impede intelligibility, while the first and second networks are completely unable to handle quantification. Woods, indeed, eventually admits that greater complexity
of the paths 'may be an inevitable consequence of making the networks adequate for storing knowledge in general' (ibid.). Compared with Schubert's system of representation, Shapiro's has the great advantage that we do not need different styles of node for different quantifiers, since they are differentiated by their labels. Nor are any special edges to indicate scope required, because the quantifier nodes are already placed in the correct scope order with respect to the verb and each other. However, the example cited by Woods does leave a residual problem. He notes that it is susceptible of three different interpretations:

1 a group of three lookouts saw a group of two boats;
2 each of three lookouts saw a group of two boats, not necessarily the same group in each case;
3 a group of three lookouts saw each of two boats separately.

Shapiro's system of representation does not differentiate these, nor does the one which I proposed at the end of section 2.5. Woods himself does not comment further on this problem. I shall return to it much later, in section 7.3.

3.3 CONCEPTUAL DEPENDENCY

Schank's conceptual dependency diagrams are often classed as semantic networks, but I treat them separately here because they raise special questions which do not arise for semantic networks generally. Nor are they presented in the form of graphs, and it is not entirely straightforward to adapt Schank's notation in order to present them as such. Conceptual dependency (Schank, 1975a) is so called because it aims to represent the structure of thoughts, independently of any particular language. It has concentrated upon thoughts of actions, distinguishing sharply between actions and states. States, indeed, feature only in the context of changes of state, brought about by actions. Moreover, the participants in actions are envisaged almost exclusively as bodies (in the sense of Newtonian mechanics). Many details of the theory are controversial and could be discussed at length, but the heart of it is a claim that the meaning of every verb of action can be expounded in terms of not more than eleven basic notions, or thereabouts (originally sixteen; see Schank, 1972). It therefore belongs, perhaps more than any other form of semantic network, within the context of componential analysis.7
7 Componential analysis was first developed by anthropologists interested in colour vocabularies, native ethnobotanical terminologies, disease taxonomies, primitive cosmologies, systems of religious concepts, and so on (see Conklin, 1962). The vocabularies were selected on the basis that there is a common element in the meaning of each word in the vocabulary, which defines the semantic field under investigation. The remainder of the meaning was then analysed into elements which, in different combinations, would yield the meanings of other words in the field. Names of common animals provide a simple example (Lyons, 1968, 10.5): thus 'woman' = 'human + adult + female', whereas 'bitch' = 'canine + adult + female' and 'calf' = 'bovine + non-adult'. The idea was further developed by Bendix (1966), who applied it to a non-specialized vocabulary and envisaged the components as being combined in particular ways; thus the meaning of 'get' was expounded, in one sense, as 'come to have', with 'come to' modifying 'have' ('John got a letter') and, in another, as 'cause to come to have', with 'cause to' modifying 'come to have' ('John got a hammer from the cupboard'). Schank's work combines both strands in previous componential analysis.
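The componential idea can be put schematically in a few lines of code; the feature inventory below is illustrative only.

components = {
    'woman': {'human', 'adult', 'female'},
    'bitch': {'canine', 'adult', 'female'},
    'calf': {'bovine', 'non-adult'},
}

# the components shared by 'woman' and 'bitch'
print(components['woman'] & components['bitch'])   # {'adult', 'female'}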
Although states are peripheral to the theory, they provide a suitable starting-point because their representations are the simplest. To indicate an attribute of a body, a triple-barred arrow is used, the attribute itself being cited as a state, followed, in brackets, by a value of that state, thus:

[diagram: body, joined by a triple-barred arrow to state (value); e.g. Nigel, joined by a triple-barred arrow to health (ill),]

where italicized words are category symbols (Schank does not distinguish between proper names of bodies and names of kinds of body). A change of state is then represented by Schank as follows:

[diagram: body, with one arrow pointing to state (value 2) and another pointing from state (value 1); e.g. Nigel, with an arrow to health (ill) and an arrow from health (OK),]
where the example represents the proposition 'Nigel became ill' (ignoring tense). Of course this only serves to represent changes of states which are susceptible of intensive magnitude, but no pretence is made of an exhaustive treatment of states. The central idea embodied in this representation, that a change of state can be described by citing the initial state and the end state, is surely basically correct, although in one respect it goes too far and in another not far enough. Too far, in insisting upon specifying the initial state: we are usually much more interested in the end state of a change, and many changes of state can be described in the form 'became F , where ' F is the end state, such as the example above, 'Nigel became ill'. To this, Schank would reply that, if Nigel became ill, then the implication is that he was OK previously, and our representation should make this explicit so that, if need be, the inference can be drawn; but that, as the initial values of the state are not always given, we simply need to provide a slot indicating that some initial value is presupposed.
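How such records might be held as data can be sketched briefly; the field names are mine, not Schank's, and the optional initial value provides the slot just mentioned.

from dataclasses import dataclass
from typing import Optional

@dataclass
class StateChange:
    body: str
    state: str
    end_value: str
    initial_value: Optional[str] = None   # presupposed, but often not given

nigel = StateChange(body='Nigel', state='health',
                    end_value='ill', initial_value='OK')
print(f'{nigel.body} became {nigel.end_value}')   # Nigel became ill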
The representation does not go far enough because it is unclear how it would be extended to deal with a case in which there was a significant distinction between 'Cambridge' change8 and real change, and would show what type was in question.

8 A term introduced by Geach because this notion of change was utilized by Russell (1937, chapter 54, sections 442-7).

A classic example is Plato's
(10) Theaetetus became taller than Socrates.
There is a sense in which 'Socrates became shorter than Theaetetus' is true just in case (10) is true, but in that sense there is no real change in Socrates: boys grow, but men do not (usually) shrink. Now if we were to expound the meaning of (10) merely in terms of initial and end states, for example by 'Theaetetus was not taller than Socrates and then Theaetetus was taller than Socrates', nothing in the exposition would tell us in which of the two the real change occurred. We cannot appeal to the occurrence of 'Theaetetus' in the subject position, since the exposition given has precisely the same truth conditions as 'Socrates was not shorter than Theaetetus and then Socrates was shorter than Theaetetus'. Presumably we could substitute 'height' for 'state' in the conceptual dependency diagram, setting value 1 = 'not taller than Socrates' and value 2 = 'taller than Socrates'. It could then be claimed that the triple-barred arrow pointing to 'Theaetetus' would show that the real change took place in Theaetetus and not in Socrates. But this does not seem a wholly satisfactory solution, for 'Socrates' is now tucked away as part of the specification of the value of Theaetetus's height, although certain inferences about Socrates follow from (10), for instance that he is now shorter than Theaetetus, which would be difficult to extract from this representation, because it totally omits to show scope relationships. In cataloguing actions, Schank employs a slightly different system of semantic roles from Fillmore. These include the already familiar Agent and Object, with the requirement that every type of action have both. Source and Goal as such, however, disappear. Instead, some actions have a Direction, specified by means of an initial and an end location (coordinates in space); these are, respectively, the equivalents of Fillmore's Source and Goal for movements of bodies. But other actions involve a Recipient role, specified as donor or sender, on the one hand, and a recipient, on the other; these are bodies, not locations, so Schank makes explicit the difference of sense obscured by Fillmore when he applied the roles of Source and Goal indifferently to changes of place and of ownership or possession. But the analogy between the two roles is preserved in their respective representations:
[diagrams: Direction is represented by an arrow labelled 'D' with location 2 at its head and location 1 at its tail; Recipient by an arrow labelled 'R' with body 2 (recipient) at its head and body 1 (donor, sender) at its tail.]
It will be useful still to distinguish the initial and terminal items of these roles as source and goal, for example 'Direction goal', 'Recipient source'. Schank retains the role of Time, and also Location outside the directive role, so that it may be applied to an action as a whole. He also retains the role of Instrument, but radically modifies it so that an instrument is always described by a proposition: roughly speaking, the latter tells us what the agent did with the instrument in the more ordinary sense of the latter. But none of these last three roles is needed for classifying types of action. The eleven primitive kinds of actions comprise, first, a group of five types of human physical action, each of which involves an agent, a patient and a direction. INGEST and EXPEL are converses, the agent moving the patient inside or outside his body respectively. To PROPEL is (surprisingly, in view of the point made above) for an agent to apply a force to a patient, thus causing the latter to move; to MOVE, by contrast, is restricted to movements of parts of the agent's own body. Finally, there is to GRASP, which in the original version involved only an agent and a patient, but in the later version also includes a direction, the goal of which is the part of the agent's body doing the grasping (usually the hand). The idea is presumably that, when a person grasps something, its place (whatever it was before) becomes his hand; but this seems a strange application of direction: think, for example, of someone grasping a handrail in order to climb a staircase. It will be evident from this list that the primitive actions are not to be understood exactly in the everyday senses of the English words used to describe them; as a reminder of this, I have followed Schank's practice of writing them in capitals. There follows a group whose members have clearly artificial names, PTRANS, ATRANS and MTRANS. These are all transfers, the first physical, the second abstract and the third mental. To PTRANS is to change the place of a body, so involving agent, patient and direction. To ATRANS is to change an abstract relationship holding between a patient and a second organism (the recipient), so it involves agent, patient and recipient roles. The last in this group, MTRANS, is one of a pair of mental actions, the other being MBUILD. MTRANS is used for all transfers of information and so involves agent, object and recipient (the recipient source is the sender). It is associated with a controversial theory in which each person is regarded as having a conceptual processor, an intermediate memory and
a long-term memory, considered as parts of his head; thus intra-personal as well as inter-personal mental transfers are possible. Examples of propositions in whose analysis MBUILD is used are 'John considered kicking a rock', 'John thought about hitting Mary but realized that he would hurt his hand' and 'Bill became aware of John's poverty when he saw the state of his house'. It also involves a sender and a recipient, apparently always the agent's long-term memory as the former and his conceptual processor as the latter.

Finally there are two actions, SPEAK and ATTEND, which, we are told, occur almost exclusively as instruments of other actions. To SPEAK is to produce a sound and, as sounds can be directed, it has a direction, with the agent as its source. An example in whose analysis it occurs as the principal action and not as an instrument is: 'Who yelled at me at six in the morning?' To ATTEND is to direct a sense organ towards a stimulus, the former being the object or patient, while the place of the stimulus is the goal; the source is simply left empty in the examples given, so it remains unclear what it would be.

The reduction of descriptions of types of action to eleven is achieved largely by the first basic principle of conceptual dependency, 'that an ACT is something done by an actor to an object. All verbs that leave out the actual ACT that was done must be treated as DOs with causals connected to a state of change' (Schank, 1975a, p. 49). A simple but nevertheless interesting example of this is Schank's representation (p. 34) for
(11) John grew the plants with the fertilizer,

[Schank's diagram for (11): 'John' is joined by a double arrow to 'DO', with 'fertilizer' as the instrument, and a vertical triple-barred arrow leads to the plants' change of state from size (x) to size (> x).]

which says, in effect, 'John did something with fertilizer as a result of which the plants increased in size'. Now it so happens that the verb 'grow' in English has both a transitive and an intransitive use, the former causal but the latter not, and so related that the transitive use can be defined as 'cause to grow (intransitive)'. There are many other examples of this, and in Semitic languages there is a regular productive process whereby the
equivalent of 'cause to' may be prefixed to any suitable intransitive verb to turn it into the corresponding transitive one. Schank goes further than this, as may be seen by introducing the intransitive 'grow' into the paraphrase of his analysis: 'John did something with fertilizer as a result of which the plants grew'. He requires that causality always be represented as an operation upon whole propositions, here 'John did something with fertilizer' and 'the plants grew'. In my terminology, his vertical triple-barred arrow labelled 'r' is an operator of category S(S,S), with the restrictions that the cause proposition must always describe a type of action in his limited sense, and the effect proposition a change of state.

The idea that causality is a relation between events is common currency among philosophers. Yet it presents serious difficulties which often pass unnoticed. The first warning sign is that the tenses of the sentences describing the two events are not independent, as they should be if propositions were indeed the operands of 'as a result of which'. We cannot say *'John will do something with fertilizer as a result of which the plants grew', although 'John did something with fertilizer as a result of which the plants will grow' is in order. Perhaps this objection could be met by a relatively small modification to the effect that the operands of 'as a result of which' are propositions in all but tense and, perhaps, aspect. In Galton's terminology they would be event-radicals (1984, p. 5). But, if this is correct, the causal sign in the representation of (11) should fall within the scope of a tense operator. (One can see, intuitively, that the same cause and effect might be placed at any point, or at the limits of any period, in the time series, wherever the present happened to fall in relation to them.) Schank provides no way of doing this, although his notation might, perhaps, be extended to cater for it.

Unfortunately, however, there is a much more fundamental objection to this analysis. Where the main operator of a proposition is of category S(S,S), it is not possible to negate the entire proposition by negating the main verb of its first operand.⁹ Thus 'John did not laugh but Mary cried' is not true in precisely the same circumstances as 'It is not the case that John laughed but Mary cried' - if Mary did not cry, the latter will be true but not the former. The same holds, mutatis mutandis, if we substitute 'and' or 'or' (whether understood inclusively or exclusively) for 'but'.

[Footnote 9: I am obliged to my colleague Mr Peter Long for this test.]

Now apply this test to Schank's analysis of (11). It will be simplest, perhaps, to switch the order of the operands so that we can replace 'as a result of which' by 'because', that is,
(11a) The plants grew because John did something with fertilizer.

Negating the main verb of the first operand, we get:

(11b) The plants did not grow because John did something with fertilizer.

This is ambiguous. It could be understood as asserting that the plants did not grow; this interpretation creates no difficulty. However, the 'not' can also be read as qualifying the 'because', and then the sense is 'It is not because John did something with fertilizer that the plants grew', which is the negation of (11a), that is, true just in case (11a) is false. Hence 'because' (and similarly 'as a result of which') cannot belong to category S(S,S). I think it is clear, moreover, that this test would not be invalidated if the operands were event-radicals rather than full propositions.

A criticism of Schank's representation of (11) also arises parallel to that made earlier of his analyses of changes of state, when I observed that it is often enough to describe a change in terms of its end state without saying anything about the initial state. In the same way, causal verbs are very widely used without telling us how the result was obtained. Schank gives us part of the answer in (11), but the even less informative 'John grew these plants' is still quite in order as a proposition. Although the dummy action 'DO' is used in the representation, Schank tells us that it is fairly obvious what John did to the fertilizer, namely, moved it to the plants. Yet in many cases where causal verbs are used, the possible means of effecting the result are manifold and a guess may be very far from the truth.

Schank does not distinguish between what is explicitly part of the meaning of a proposition and what may be inferred from it; in his representations all possible inferences must be made explicit from the start. This leads not only to unnecessary complexity in the representations, but also forces drastic restrictions upon the system of representation which unduly limit the range of everyday language which it can represent. The requisite modification is to offer a method of representing only what is explicit in a proposition yet which allows its content to be pursued, either by inference or by question and answer. So, in the present instance, Schank is quite right to insist that, for example, any representation of a proposition containing a causal verb shall allow for the question: 'How did the agent produce this effect?' At the same time, though, it should remain legitimate to represent 'John grew these plants' simply as 'grew (these plants, John)' because, for some purposes, no more detailed analysis may be required.
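Long's test, used two paragraphs back, can be checked mechanically for the genuinely truth-functional cases. The following sketch (my own, in Python; 'but' is modelled as sharing the truth conditions of 'and') confirms that, for an operator of category S(S,S), negating the first operand never coincides in all circumstances with negating the whole:

```python
from itertools import product

# Truth-functional stand-ins for the S(S,S) operators discussed above.
connectives = {
    "but/and": lambda p, q: p and q,
    "or (inclusive)": lambda p, q: p or q,
}

for name, op in connectives.items():
    # Does 'not-p OP q' have the same truth conditions as 'not (p OP q)'?
    coincide = all(op(not p, q) == (not op(p, q))
                   for p, q in product([True, False], repeat=2))
    print(name, "-> internal negation = external negation?", coincide)

# Both lines print False: e.g. with 'but/and', let p and q both be false
# ('Mary did not cry') - the external negation is then true but the
# internal negation false, just as the text says.
```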
Schank distinguishes three types of causation, the one illustrated above being result causation. In addition, there is reason causation, as in 'Socrates kicked the dog because it bit him', but he regards this as an abbreviation for a more detailed analysis in terms of mental states, which cannot be pursued here. Finally, there is enabling causation, represented again by a vertical triple-barred arrow, but this time labelled 'E'. Enabling causation is 'a relationship between a state and an action where the performance of that action depends upon the existence of the state', examples being 'John bought the book to give to Mary' and 'John stole the apple in order to eat it' (1975a, p. 36). It appears, then, that enabling causation is expressed in English by 'in order that'. But matters are not so simple, for Schank also invokes it in his analysis of 'John strangled Mary' which is, roughly: 'John grasped Mary's neck tightly in his hand in order that Mary might not ingest air into her lungs from outside' (1975a, p. 57). This reveals a feature in which enabling causation is not fully expressed by 'in order that'. It may be true that I went to the shops in order to buy some eggs but, having arrived, do not buy any: perhaps there are none on sale, or perhaps I see some other food which I decide to buy instead. However, if Mary managed to go on breathing, John did not strangle her; he only tried to strangle her. Schank's original definition of enabling causation is not of much help in this connexion, because his analysis yields two inter-related actions (GRASP, INGEST), whereas his definition required an action and a state or change of state. (In fact, his representation violates his own grammar for enabling causation; see p. 39, (13).)

The special interest of the example, however, lies in the negation, 'in order that she might not ingest'. Schank actually represents this by a stroke through the 'E' which labels the causation arrow, and calls it 'unenabling'. This is a rare example of negation in conceptual dependency, and prompts the questions how the system would handle negation more generally, as well as conjunction, disjunction and implication. No answer can be given from Schank's text, but the present example suggests that problems may lie in store. It would be natural to construe the struck-through 'E' as representing an external negation of the whole proposition which would have been represented without it, that is, so that 'not' is the main operator. But the meaning would then be 'It was not in order that Mary might ingest air that John grasped her neck etc.'. There is no indication how Schank would represent the latter proposition, but in the light of his example we have to construe the stroke as negating, not the enabling causation, but the proposition describing what is enabled; that is, it negates the proposition following 'in order that' to form 'in order that not'. The notation is thus extremely unperspicuous.
Schank remarks that enabling causation is easily confused with his version of the instrumental case. Yet here it seems, rather, that the distinction between enabling and result causation is at issue, for, if Mary was strangled by John, was she not prevented from breathing by him? And is not preventing something from happening the same as causing it not to happen, where the type of causing in question is result causation? To paraphrase, if John strangled Mary, then he placed a constriction around her neck (not necessarily his hands, as Schank gratuitously assumes: perhaps he garotted her) with the result that she was unable to breathe with the result that she died. The reply to this is that result causation, as defined by Schank, requires that the effect always be a change of state and the cause always an action. This rules out chains of causes and effects, since an effect would be of the wrong category to be the cause of a further effect; moreover, breathing (ingesting air into the lungs and expelling it) is an action, so neither it nor, presumably, its absence, can be an effect of result causation.

Actions can, however, have other actions as their instruments; indeed, Schank tells us that, since every action has an instrument which is part of it, not only is instrumentality transitive, but each action has an infinite series of instruments, for example 'John ATRANSed the book to Mary by moving the book towards Mary, by moving his hand which contained the book towards Mary, by grasping the book, by moving his hand, moving the muscles, by thinking about moving his muscles' (1975a, p. 33). Were this latter point correct, no action could ever be started; it does not make sense to ask 'And how did he do that?' ad infinitum. In particular, we do not generally move our muscles by thinking about moving them; if, indeed, I were to think about moving the muscles which I am now using to write this sentence, my handwriting would immediately be gravely impaired and the production of each letter take an unconscionable time.

Schank approaches semantic analysis as though it were a branch of ergonomics, trying to break down each action into minimal physical steps. This is mistaken in principle, because with so many actions we have a wide choice of means by which to accomplish them, and it is not part of the meaning of the description of the action that it was performed by one means rather than another. I have already drawn attention to examples in which Schank posits one means in his analysis where others are also possible. A further example is his analysis of 'John ate a frog' to include 'by moving his hand, containing the frog, to his mouth' (1975a, p. 26). But is it really part of the meaning of 'John ate a frog' that he ate it with his fingers? Or, for that matter, that he ate it with a knife and fork or again, perhaps, with chopsticks? This passion for specificity continually
misleads Schank into importing information into his analyses which is not present in the propositions which he purports to analyse. The question 'How did the agent do such-and-such?' is nevertheless often in order, and can sometimes be repeated. So it might be a first step in analysing the meaning of 'John strangled Mary' to expound it as 'John killed Mary by preventing her from breathing by placing a constriction around her neck', the result of asking 'How did John kill Mary?' and then 'And how did he prevent her from breathing?' But we have to remember that killing is not an action for Schank; it is doing something with the result that somebody dies. Similarly, preventing someone from breathing is not an action either but, rather, doing something with the result that the person cannot breathe. So the only action here is constricting Mary's neck and, since the conditions imposed upon result causation are not fulfilled, another type of causation, enabling, must be invoked instead.

I hope that the foregoing discussion of some of Schank's examples has brought out that his distinctions between result and enabling causation and instrumentality are more a product of his notion of an ACT as 'a series of little actions that make up a whole. That is, writing, running, shooting, talking and so on, are actually achieved by doing a great many small actions' (1975a, p. 36), than of an objective analysis of causality. All the same, his contention that we need to distinguish more than one type of causality may well be correct. Wittgenstein (1937) also argued in the same sense, although his distinctions are quite different from Schank's.

More widely, enough of Schank's notation has now been shown to establish that it is inadequate for representing everyday language in general. Schank states explicitly that he has 'chosen to ignore some of the stickier issues like quantification' (p. 49). That would be perfectly legitimate were it possible to extend his system of representation to encompass it, but there is no way, for example, of removing multiple occurrences of a proper name from one of his diagrams and then using the resulting first-level schema as the operand of a quantifier. Indeed, his notation often forces repetition of a proper name, so that the only pronouns which can be represented are those which go proxy for a repeated proper name. It may be that these difficulties could be overcome, but that would certainly demand extensive revision of the notation.

In spite of these criticisms of conceptual dependency, though, it serves notice upon philosophers and logicians that first-order, and even modal, logic is not enough to represent the meanings of many everyday expressions in such detail as will allow for many of the manipulations - inferential, question-and-answer, or those mimicked in expert systems - which are ubiquitous in everyday life and which computational
linguists seek to reproduce. In particular, the aspects of language which we use to talk about change demand urgent attention and, if conceptual dependency is an inadequate response to this need, at least it is a response, with some positive features which could be developed. Foremost among these is the idea that the meanings of many verbs of action can be expounded as combinations of other verbs and various auxiliaries; but here a further pitfall may trap the unwary. I have already criticized Schank's ergonomic view of semantic analysis, the notion that because an action is often composed of a series of smaller actions, the meaning of a description of an action will be given by a combined description of the smaller actions which make it up. Yet although Schank conflates this with the idea that there is a set of primitive actions in the sense of a basic action-vocabulary in terms of which the meanings of all other descriptions of action can be expounded, the latter is quite independent of the former. Thus we might hold that 'cause to' and 'die' were semantic primitives, whereas 'kill' is not, without any commitment to holding that dying (or causing, for that matter) is an ergonomic primitive.

All the same, there is a difficulty in assuming a priori that there are semantic primitives. It is that we may reach a point in componential analysis where further analysis is not impossible, but takes us round in circles. Classical propositional logic affords a simple model. We can expound the meaning of 'if . . ., then . . .' in terms of 'not . . .' and either 'both . . . and . . .' or 'either . . . or . . .' (if p then q just in case not both p and not q, just in case either not p or q). Equally, we can expound the meaning of 'either . . . or . . .' in terms of 'not . . .' and either 'both . . . and . . .' or 'if . . ., then . . .'. Finally, we can expound the meaning of 'both . . . and . . .' in terms of 'not . . .' and either 'if . . ., then . . .' or 'either . . . or . . .'. So when our analysis has reached the stage of 'not' and one of the three connectives, it has not touched rock bottom; we can always continue, but only within the circle of alternatives just listed. There is no 'rock bottom' of analysis. Similarly, the existential quantifier can be defined in terms of negation and the universal quantifier, or the latter in terms of negation and the existential quantifier, while possibility can be defined in terms of negation and necessity and that, in turn, be defined in terms of negation and possibility. So it seems that this circularity is a typical, rather than an exceptional, feature of a given semantic field. However, all of this applies only to classical logic; in intuitionist logic the definitions listed above all break down and their terms become independent. Still, we know about the interdependency of the terms in classical logic and their independence in intuitionist logic only because the notions
represented have been formalized and systems set up which have been thoroughly investigated. The lesson is therefore that no assumptions about semantic primitives should be made in advance; we must first undertake such componential analysis as seems feasible, and hope that results with regard to independence of the terms or otherwise will emerge after investigation.
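The classical circle of interdefinitions can be exhibited mechanically. Here is a small check (my sketch, in Python, not part of the text; the material 'if/then' is encoded as 'not p or q') that each of the three expositions listed above holds under every assignment of truth values - so that, classically, analysis can always continue round the circle:

```python
from itertools import product

# The three classical interdefinitions:
#   if p then q    =  not both p and not-q
#   either p or q  =  not both not-p and not-q
#   both p and q   =  not either not-p or not-q
expositions = [
    ("if/then via not + and", lambda p, q: (not p or q) == (not (p and not q))),
    ("or via not + and",      lambda p, q: (p or q)     == (not (not p and not q))),
    ("and via not + or",      lambda p, q: (p and q)    == (not (not p or not q))),
]

for name, holds in expositions:
    assert all(holds(p, q) for p, q in product([True, False], repeat=2))
    print(name, "holds in every case")

# All three assertions pass. As the text notes, the corresponding
# intuitionist claims fail, but a truth-table check cannot show that.
```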
3.4 FRAMES

The notion of frames originated in a remarkably obscure yet influential article by Minsky (1975) - obscure, because, as one commentator has aptly said: 'It is not at all clear now what frames are, or were ever intended to be' (Hayes, 1980, p. 46). They were originally a contribution to the representation of knowledge, rather than of meaning, but have subsequently also been enlisted in accounts of meaning. The representation of knowledge only concerns us indirectly here, in that background knowledge other than purely linguistic is sometimes necessary in order to understand the meaning of a proposition. An excellent example of this is the proposition 'There is someone in every room'. We know that the existential quantifier is not intended to be the main operator here because the sense that this would impose is physically impossible, although multilocation might remain a logical possibility. We take 'every' as the main operator because of what we know about rooms in buildings: that they are considerably bigger than people and typically separated by internal walls. Alter 'room' to 'office', though, and there is a perfectly plausible interpretation with 'There is someone' as the main operator, viz. if 'office' is understood as a job, not a room, since one person can have several jobs at once.

Wittgenstein also drew attention to a more profound sense in which the meaning of an expression can be dependent upon 'very general facts of nature' or even upon human agreement and, thus, shared beliefs (1953, I, paragraph 142 and note; II.xii). An example of a relevant very general fact of nature would, I think, be gravity. Consider how the vocabulary of someone born and brought up on a space ship in a condition of weightlessness might contrast with ours. First, weight could not have the role in trading which it has with us, so that, for example, an expression like 'Five pounds of potatoes, please!' would have no use, even if potatoes were available on the space ship. Second, bereft of any experience of weight, the many analogical uses (especially psychological) of terms connected with weight would have no application, such as 'depression', 'he felt as though a great burden had been lifted from his shoulders', 'over
the moon'. These presuppositions of meaning, as we may call them,¹⁰ would doubtless have to be made explicit in a full formal theory, so knowledge representations would be needed to back up our semantic representations. Such knowledge could, of course, be expressed propositionally, and thus be represented by any system capable of representing propositions. Frames are intended as an alternative method of representation. A single frame is capable of representing all of the system's knowledge about a given body, event, action, institution, etc. (of which it constitutes a complex description), and thus corresponds to a whole set of propositions. From the point of view of the computer scientist, the use of frames would be justified if they were merely a more convenient way of representing the information, making it more easily accessible or manipulable. That, however, is not relevant to the problem addressed by this book; our question must be whether frames offer a more perspicuous way of representing the information than, for example, first-order logic.

Two well-worn examples of frames are a house and a child's birthday party. What is being described provides the title of the frame, the body of which consists of a list of slots and values. The values are the parts or aspects of what is being described and the slot names state their relationships to it. Thus a frame for a house might be as follows:
The Cedars
    living room       G1
    dining room       G2
    kitchen           G3
    master bedroom    F1
    guest bedroom     F2
    bathroom          F3
[Footnote 10: Presuppositions of meaning create a serious difficulty for possible-worlds semantics, in which logical necessity is defined, following Leibniz, as truth in every possible world, and is then used to define the meanings of terms. This assumes that the meaning of an expression will not vary from one possible world to another, whereas it may well do so, for example in a world without gravity compared with the environment in which we live. The point, simply, is that human languages were developed to facilitate our lives in this world and not in any possible world; imagine a world different enough from ours, and they could no longer be relied upon to fulfil their function. Of course one can restrict the possible worlds in the definition of necessity to those which bear a specified relation to our world, but how would one decide what, precisely, to build into this relation and what to leave out? Some expressions vary in meaning from one human society to another in such a way that one needs certain background knowledge in order to understand the difference in meaning.]
The values here are (dummy) proper names of the rooms ('G' for 'ground floor', 'F' for 'first floor'; thus my room in the University is G22), so that G1 is the living room of The Cedars and the slot name, 'living room', describes the relationship of G1 to the house. Strictly speaking, the example above is of a frame instance; the frame itself is a kind of schema obtainable by removing the values and, since the title 'The Cedars' names an instance of a house, replacing it by 'House' as title. The frame is then, as its name indicates, a framework for describing a typical house. Moreover, just as we could substitute another schema instead of an operand for a schematic symbol, so we can substitute another frame for a slot; thus there might be a frame for describing a typical kitchen, for example

Kitchen
    cooker
    refrigerator
    sink
    window
    cupboard 1

so that this list of slots (each, of course, requiring a value in any instance of the frame) would replace the value position marked by 'G3' above.

It is, I think, clear that a frame instance can provide a much more compact description of something than can be given propositionally. In order to stuff all of the information into a single proposition, we should need a complex embedding of relative clauses, such as 'The Cedars has a living room, a dining room, a kitchen which has a cooker, refrigerator, sink, . . ., a master bedroom, . . .'. Alternatively, we could split the description into a set of propositions, but at the cost of much repetition: 'The Cedars has a living room. . . . It has a kitchen. The kitchen has a cooker. It has a refrigerator . . .'.¹¹ So frame instances have evident
[Footnote 10, continued: Thus the Greek expression τὸ ἀπόγευμα is customarily translated 'afternoon', but the visitor to Greece soon discovers that τὸ ἀπόγευμα begins at about 6 p.m. and goes on to about 9 p.m. Yet 'afternoon' is not a mistranslation; τὸ ἀπόγευμα should not be translated 'evening' instead, because Greek simply has no word for the time between lunch and 6 p.m. The reason is simple: that is the siesta time and so everyone is asleep. Nothing of any social consequence happens between 3 p.m. and 6 p.m., so no term is required to refer to that part of the day. The first social time after the midday meal (τὸ ἀπό-γευμα) is the period beginning at 6 p.m., so that is the after-noon for a Greek.]

[Footnote 11: I have glossed over the values in the frame instance. Strictly, the example should translate into: 'G1 is the living room of The Cedars', etc. But we can regard slot and value as tantamount to a Skolem function replacing an existential quantifier and thus licensing 'The Cedars has a living room', etc.]
advantages when we want to represent more information about something than can conveniently be expressed in one or two propositions. Moreover, there seems to be no great difficulty in extracting propositions from frame instances, so that information represented in frames could easily be converted into propositional form in order to use it in conjunction with representations of other propositions (see Hayes, 1980).

Frame instances merely represent knowledge, for the most part of contingent matters. The extension to meaning only arises when we move from frame instances to frames proper. Thus, there has been a suggestion that frames can serve as definitions: 'if we find fillers [values] for all the slots of a frame', then we may 'infer that an appropriate instance of the concept does indeed exist' (Hayes, 1980, p. 49, the 'criteriality' inference). So it would be a sufficient condition of something being a house that it should have a living room, a dining room, etc. If the slots also specify necessary conditions, then the frame becomes a definition. Now this suggestion must be firmly resisted, on three grounds.

First, very few, if any, expressions can be defined by giving detailed descriptions of what they express. The more detailed the description, the more exceptions there will be. The number and types of room in a house may, for instance, vary enormously: compare and contrast a palace, a cottage, a bungalow and a large suburban house divided into flats, just to make a start. But the more economical the description, the more things of other kinds will satisfy it; for example, if the net is cast wide enough to catch every house, it will catch many other things as well, say the flat in the office block provided for company guests.

Second, descriptions of the type given in frames are not exclusive. We can always construct an alternative frame for the same expression, which looks at it from a different point of view. The reason for this is that frames treat what they describe as a structure, the values are its parts and the slots state their relationships to it. But the way we see a thing as being organized relates to our purposes; we call it a structure because we see it as having a structure, but it is not determined in advance what structure we see it as having. There are always different ways of enumerating the parts. This is very evident with the house example. No doubt most computer scientists would find it natural to take the rooms as the parts, because they are primarily interested in houses as places to live in. But a structural engineer might have a very different point of view. He might be more interested in classifying houses by their structural elements, so that, for him, two houses with identical accommodation might be classified quite differently because one had conventional foundations, load-bearing walls and a pitched roof, while the other rested on a concrete raft, was
supported by pillars and had a flat roof. Someone else, again, might want to classify houses by the materials of which they were made, for instance brick, stone or wood. Their frames for 'house' would have totally different slots and values from the one given above.

Third, the attempt to use frames as definitions runs counter to the basic intuition upon which the idea was originally based. That idea was that a frame gives us a typical description schema for something, the elements that a person in our society would expect to find in it. It is partly developed by allowing default values for slots. One cannot illustrate this very easily from the house example. A clearer case is a frame for travellers for use by a travel agency which contains a slot for the place from which the journey is to start. The default value might then be the traveller's home town (Bobrow and Winograd, 1977). However, travellers sometimes book tickets for a journey which is to start elsewhere, so the default value makes an assumption which stands, in the absence of any evidence to the contrary, until it is over-ridden.

This idea needs to be extended to the slots themselves. They, too, are defaults. The typical house has, perhaps, a living room, a dining room and a kitchen; but in some houses one room will serve both as living and dining room, in others one room both as dining room and kitchen. The typical house may have a separate bathroom and WC, but many have a combined bathroom and WC. The typical house may have only one WC, on the first floor, but many have a second WC on the ground floor. So the slots also represent what we may expect to find in the absence of evidence to the contrary. Of course someone may argue that there is no such thing as a typical house; if so, we could only use frame instances of houses and frames would at best represent description schemas for possible houses.
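To fix ideas, here is a minimal sketch (mine, in Python; not Minsky's, Hayes's or Bobrow and Winograd's notation, and the class, slot and instance names are my own) of a frame whose defaults stand until over-ridden, together with the easy extraction of propositions from a frame instance mentioned above:

```python
class Frame:
    """A frame: a title and a list of slots, each with an optional default."""
    def __init__(self, title, slots):
        self.title = title
        self.slots = slots                      # slot name -> default (or None)

    def instantiate(self, name, fillers):
        # Explicit fillers over-ride defaults; defaults stand otherwise.
        return FrameInstance(name, {**self.slots, **fillers})

class FrameInstance:
    def __init__(self, name, values):
        self.name, self.values = name, values

    def propositions(self):
        # Each slot/value pair licenses a proposition; the dummy value
        # names play the part of Skolem constants (cf. footnote 11).
        return [f"{value} is the {slot} of {self.name}"
                for slot, value in self.values.items() if value is not None]

# The house frame and the instance from the text:
house = Frame("House", {"living room": None, "dining room": None, "kitchen": None})
cedars = house.instantiate("The Cedars",
                           {"living room": "G1", "dining room": "G2", "kitchen": "G3"})
print(cedars.propositions())     # ['G1 is the living room of The Cedars', ...]

# Bobrow and Winograd's default: the journey starts from the traveller's
# home town unless the booking says otherwise.
traveller = Frame("Traveller", {"start of journey": "home town"})
print(traveller.instantiate("booking 1", {}).values)                       # default stands
print(traveller.instantiate("booking 2", {"start of journey": "Leeds"}).values)  # over-ridden
```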
Yet I think that there is, in fact, very wide application for the notion of a typical so-and-so, and, indeed, often in cases where it is impossible to give a definition. Thus the typical chair has four legs, a seat and a back, but some chairs have a single pedestal, in others the seat and back are not distinct, while some are low and others high, some have arms and others not, some are hard and others soft. Again, the typical bed has a frame and a mattress and is either single or double, but try to define 'bed' and you will soon be plagued by the exceptions! This goes for natural as well as inanimate things. The typical man has two arms and two legs, can see and can hear; but some people have lost an arm or a leg, are blind or deaf. Philosophers have been trying to define 'man'¹² for two and a half millennia, but have not yet succeeded to everyone's satisfaction.

[Footnote 12: ἄνθρωπος or homo, not ἀνήρ or vir.]
In spite of this, we should have little difficulty in describing a typical human being and frames would be a good medium for doing so, although perhaps we should need several frames, for example one for physical and another for psychological description.

Fillmore, who has criticized what he calls 'check-list' theories of meaning, that is, those which specify truth conditions, seems to have been inspired by this notion of a prototype in his informal sketch of a theory of meaning based upon a distinction between frames and scenes (Fillmore, 1975b, 1977). He says that he is using 'frames' in one of Minsky's senses - presumably, for frames as opposed to frame instances. What he means by 'scene' is more difficult to pin down. His general characterization, 'any kind of coherent segment, large or small, of human beliefs, actions, experiences or imaginings' (p. 63), lumps together mental phenomena (beliefs and imaginings) with events in the external world (actions) and the very slippery notion of experience, which could be either (the experience as a mental phenomenon, or what it is that is experienced). Nor do his examples resolve this ambiguity. He distinguishes three senses of 'scene'. First, a prototypic sense, covering the flora and fauna of one's garden, the artifacts in one's kitchen and the observable parts of the human body. Second, a cinematic sense, covering events and activities such as a person eating, a child drawing a picture, people engaged in acts of commerce and carrying on a correspondence. Third, a stage-direction sense, of which examples are imagining that a closed box contains sweets or imagining that someone is hiding behind a curtain (p. 72). One could describe the first and second of these as scenes of the external world, static and dynamic respectively, but the third is mental.

His classification of scenes becomes even more confused as it progresses. Many of his examples are stated to be the meanings of words: some whose understanding requires a history, like 'scar' and 'widow'; others for which we must know certain background conditions, like 'poison' and 'wound'; others which relate to institutions, like 'buy', 'promise' and 'negotiate'; and yet others which relate to our body image, like 'left' and 'front', 'crawl' and 'smile', 'hunger' and 'fever', or to mental experiences, like 'anger', 'surprise' and 'impatience' (pp. 73-4). The reader must wonder, at the end of all this, what would not qualify as a scene, for the notion appears to be all-embracing. Indeed Fillmore himself asks why frames should be needed in addition to scenes, but answers that sometimes we are familiar with something, and its role in our lives, but do not know what it is called; that is, we have a scene for it but not a frame. Frames are linguistic, but scenes are non-linguistic; Fillmore characterizes frames as 'any system of linguistic choices (the easiest cases
being collections of words, but also including choices of grammatical rules or categories) that can get associated with prototypical instances of scenes' (p. 63).

Yet in spite of this indeterminacy in the notion of a scene, some of the applications which Fillmore describes suggest that it may have merit. One application is to the statement of selection restrictions. Fillmore notes that the selection of a pair from the scalar terms 'tall'/'high' and 'short'/'low' depends upon context. Thus, for people we use 'tall' and 'short', for buildings 'tall' and 'low', but for cloud 'high' and 'low'. Instead of the customary procedure of specifying, for each noun to which scalar measures apply, which pair it takes, he suggests that we should specify an associated scene for each pair, such as 'high' and 'low' for measuring vertical distance from a horizontal base line. If, then, the scene associated with a given noun includes vertical distance from such a base, it will collect that pair of scalar terms (p. 71). The essential point here is that, in order to select the correct pair in a given case, we need to know the context of measurement, so that, in accounting for the uses of scalar expressions, we need to describe, as generally as possible, the context in which each pair is applicable.

Now this immediately recalls Wittgenstein's language-games, as he first introduced them: 'language and the actions into which it is woven' (1953, I.7). Indeed, specifying a frame and its associated scene seems to be, under another name, exactly the same as describing a language-game. This would explain why Fillmore's scenes exclude nothing but linguistic expressions themselves: they are the actions into which the language of a frame is interwoven. They could be mental as well as physical actions and even, by extension, events other than actions or static states of affairs, since some linguistic fields are less anthropocentric than others.

Moreover, as well as claiming that, in order to understand the meaning of an expression, one must know the context of its use, Wittgenstein also argued for the semantic significance of the typical case or, in Aristotle's terminology, of what holds for the most part. A salient example of this is that intentions are for the most part, though not invariably, executed. We therefore plan on the basis that we and others will carry out their intentions, even though we know that some intentions will not be fulfilled. A large part of our knowledge of the contingent future rests on this. One has only to think of diaries: they would lose much of their present point if the exception became the rule and for the most part our own intentions and those of others were not executed. One might still record what befell each day, but it would be silly to write down appointments most of which would be broken, both by oneself and the other parties.
There would be little use for a system of frames, however, unless we were able to reason from typical cases. But it is evident that we do often reason in such ways. Indeed this is in large part what is meant by Bishop Butler's famous dictum, which so impressed Newman, that probability is the guide of life, and which inspired his account of historical reasoning (Newman, 1845, chapter 3). Reasoning of this kind is non-monotonic; that is to say, it does not hold, as in standard logic, that if a conclusion follows from a given set of premisses, it will follow from any larger set. Practical reasoning is perhaps the clearest case of non-monotonic argument, when additional information can render a previous conclusion invalid. Thus the record and prospects of a given company might lead one to conclude that its shares would be a good investment, but subsequent information that the directors had recently sold large amounts of their holdings makes the earlier conclusion suspect.

Now typical cases can be regarded as defaults: thus a description of a typical house incorporates assumptions about the house's structure which stand until we have evidence to the contrary, for example that it has a dining-room. One could, of course, say that if we know of a particular house that it does not have a dining-room, then the original premisses embodied in our description of a typical house are simply changed, and in either case the reasoning is monotonic. But it is also possible to accommodate exceptions by adding premisses to those already assumed in the typical case. Our rules of inference must then be non-monotonic, allowing certain types of conclusion to be drawn from specified types of premiss provided that some further specified type of proposition is consistent with all the available premisses (see Genesereth and Nilsson, 1987, section 6.6). So, for example, we might be able to infer from our description of the typical house that the dining-room is next to the kitchen, provided that all available premisses are consistent with the house having been built since 1900. Given the extra information regarding a particular house that it was built in the eighteenth century, this inference would be blocked, although none of the original premisses has been cancelled.

Neither Wittgenstein nor Fillmore provides a formal account of language-games, or frames-and-scenes. But if the two ideas which I have picked out in this discussion - that the meaning of an expression is given in the context of its use, and that we reason largely on the basis of typical cases - are to be given formal embodiment, it certainly seems that some system of representation on the lines of computer scientists' frames will be needed. To state the requirement minimally, we need a way of representing as a unit what would otherwise be expressed in a whole set of propositions with a common subject-matter.
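The proviso-based inference cited above from Genesereth and Nilsson - draw the conclusion provided a stated justification is consistent with all available premisses - can be sketched very simply (my illustration in Python; the premisses and the crude consistency test are toy stand-ins, not their formalism):

```python
def apply_default(premisses, prerequisite, justification, conclusion):
    # A non-monotonic rule: from the prerequisite, draw the conclusion,
    # provided the justification is consistent with every available
    # premiss. Consistency is modelled crudely as 'the negation of the
    # justification is not among the premisses'.
    if prerequisite in premisses and ("not " + justification) not in premisses:
        return conclusion
    return None

rule = ("typical house",
        "built since 1900",
        "the dining-room is next to the kitchen")

# With only the typical-case description, the inference goes through:
print(apply_default({"typical house"}, *rule))
# -> 'the dining-room is next to the kitchen'

# Add the information that this house was built in the eighteenth
# century (recorded here as the negation of the justification): the
# inference is blocked, although no original premiss is cancelled.
print(apply_default({"typical house", "not built since 1900"}, *rule))
# -> None
```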
Yet frames are probably only a first step in this direction. First, it is unclear at present upon what principles duplication is to be eliminated. To cite an example, if we can have frames for bodies and for actions, then would one incorporate the selection restrictions discussed by Fillmore into a frame for Measurement (an activity), or into some rather general frame or frames for bodies? Second, it might well be necessary to incorporate quantifiers (including second-order ones) and modal operators into frames; this would put devices like Skolem functions under great strain and the representations would certainly be very unperspicuous. Some method of representing these notions more directly in frames will have to be developed.

Although, then, I envisage a role for frames in the representation of meaning, at the end of the day what they contain must be convertible into propositions, together with expressions of intention, command, question, etc. Equally, information expressed propositionally would have to be convertible into frame format. So frames would not be an alternative to propositional representations in the sense of supplanting them; rather, they would be an auxiliary to them. Moreover, since each frame would hold the information contained in a whole set of propositions, a fully satisfactory notation for frames is unlikely to be developed until outstanding problems concerning the representation of propositions have been settled. The main issue of this book therefore remains the prior task.
4 Categorial graphs
4.1 SCOPEWAYS

My principal criticism of the graphs proposed by computer scientists for the representation of meaning has been that they do not provide for scope relationships and that, in consequence, they are unable to comprehend a straightforward account of quantification. At the end of section 2.5 I also argued that Frege's account of quantification involves graph structures, although this is concealed by his mixture of a planar and a linear notation. We need, therefore, to take up the computer scientists' idea of using graphs to represent meaning, but to give it a sound foundation by using the notion of scope as their principle of organization. This will lead to two extensions of the notion of scope.

In my original account of Fregean grammar (Potts, 1973), I added numerical super-scripts to the letters occurring in category names as scope indicators. The reason for this was to show, for operators of degree > 1, which operand corresponded to which of their operand-places. This meant that the linear order of C₁ . . . Cₙ in a category name was of no consequence to the structure represented. (We have already seen in section 2.5 that certain derivations require that we be able to change the linear ordering of these items.) Thus, to take the simplest example, instead of representing a structure for
(1) Dr Patel visited Mrs Wilson
simply by 'S(N,N) N N', it is represented by 'S(N¹,N²) N² N¹'. The method was also applied to category names of degree 1, the super-script then attaching to the C₀ of the category name of the operand. So, as a structure for
(2) Dr Patel did not visit Mrs Wilson,
we get:

not      (visited      (Mrs Wilson,   Dr Patel))
S(S¹)    S¹(N²,N³)     N³             N²
It should be evident that the pairs of super-scripts in this notation correspond to the edges of a tree or graph in planar notation. If we now revert to the three graphs with which chapter 2 ended, viz.

[(G1a): 'a dog' with an edge descending to 'barked', and an edge running from 'barked' back up to 'a dog'. (G2a): 'at least one patient' with an edge descending to 'every doctor' and thence to 'visited', and edges running from 'visited' back up to each quantifying phrase. (G3a): the same, but with 'every doctor' outermost and 'at least one patient' within its scope.]
the question arises: what do the edges from the verbs to the quantifying phrases represent? Originally, I called them 'back linkages' and represented them by sub-scripts in the linear notation, for example

for (G1): S(S¹(N₂)) S¹(N₂)
for (G2): S(S¹(N₄)) S¹(S²(N₃)) S²(N₃,N₄).

But I then went on to say that 'both kinds of link, however, serve the same purpose' and to propose, for occasions when super-scripts and sub-scripts were inconvenient, a notation which did not distinguish between them. This implies that all the edges in the three graphs represent scope. Yet it certainly goes against normal usage to say that 'a dog' in 'A dog barked' lies within the scope of 'barked' and that in 'Every doctor visited at least one patient', 'every doctor' and 'at least one patient' both lie within the scope of 'visited'. Indeed, since we also want to say that 'barked' lies within the scope of 'a dog' in the first example, and 'visited' within the scope both of 'every doctor' and 'at least one patient' in the second, the idea of scope as an organizing principle for graphs will be wiped out if back linkages also indicate scope.

Yet the verbs 'barked' and 'visited', being the operators of schemas of the first-level categories S(N) and S(N,N) respectively, have scope. We have no hesitation in saying that 'Fido' lies within the scope of 'barked' in 'Fido barked', and we represent a structure for it by 'barked (Fido)'. So, if we represent a structure for 'A dog barked' by 'A dog:x (barked (x))', and if both link letters belong to the second-level schema 'A dog:x (φ(x))', must we not admit that the latter falls within the scope of 'barked' in the example given? And the same will follow, pari passu, for 'visited' in 'Every doctor visited at least one patient'.

We have, then, a sophisma, an argument on the one hand that the quantifying phrases do not fall within the scope of the verbs and an argument on the other hand that they do. The solution to this dilemma is that, although the quantifying phrases do indeed fall within the scope of the verbs, they do so in a different way from that in which the verbs fall
within the scope of the quantifying phrases. Scope as used so far has been indicated, in linear notation, by writing an expression which lies within the scope of another to its right, with a bracketing convention for branching scope; this is the same as the convention that an operand is always written to the right of its operator. So what scope expresses is the operator/operand relationship, and it must remain absolutely clear that in (G1), for example, the quantifying phrase is the operator and the verb the operand. On that, there can be no compromise, or we should never be able to assign the operator and operand(s) in any expression without danger of confusion. Yet the very category of a verb shows that it has scope. When, as in (G1), the place normally occupied by a proper name as operand of the verb is occupied instead by the link letter of a quantifying phrase, it would be natural to call this 'back scope'. For the link takes us back, as it were, from the verb to the quantifying phrase. Thinking in terms of the linear notation, the 'forward' direction, through which ordinary scope is expressed, is from left to right, whereas the link letters take us back, from right to left, to the quantifying phrase. Similarly, in (G1)-(G3), the links from the verbs take us back (up, now) to the quantifying phrases. The kind of scope to which we have been used so far could then be termed, where it was necessary to distinguish it, forward scope.

But these are mere labels. We still need to know whether back scope is scope in a different sense from forward scope, or merely scope in a different direction. In the linear notation, it is expressed by a different convention. Forward scope is shown by an ordering convention, back scope by link letters. If we were to follow that precedent in planar notation, we should have to distinguish two different kinds of edge, one for forward and the other for back scope. The implication would then be that we are dealing with scope in two different senses. But the linear notation is not decisive in this regard, for, once given the use of left-to-right ordering to express forward scope, no other ordering convention is available to a linear notation, so some alternative means must be found to express any other principle of ordering. The method of super-scripts and sub-scripts represents both forward and back scope (the linear order of the category labels then becomes quite indifferent, although I have kept the usual forward-scope order in the examples above), so that, if it is not essential to distinguish the sub-scripts from the super-scripts, a single convention does both jobs.

It is of some importance, however, that the left-hand end of a sub-script link never goes to the C₀ of a category name, while the right-hand end of a super-script link always does so. This difference cannot be mirrored in (G1)-(G3), because each verbal label constitutes, with respect
to the edges, a single node with no internal structure. In order to obtain a parallel, it will be necessary to represent each expression as a labelled sub-graph with a structure corresponding to that of the category name of the expression. For expressions of basic categories, that would simply be a single node, the limiting case of a graph. For the other categories which we have so far encountered, the following sub-graphs would be requisite:

[Sub-graphs: for S(N), an S-node joined by a thick edge to an N-node; for S(N,N), an S-node joined by thick edges to two N-nodes; for S(S(N)), an S-node joined by a thick edge to a second S-node, which is joined in turn to an N-node.]
I shall allow these to be written in any order, right to left, downwards or upwards, in order to keep the pattern of edges as simple as possible. Thick lines have been used for the edges, for two reasons: first, because these edges do not indicate scope, whether forward or back; second, so that we can tell the extent of each sub-graph to which a linguistic label is attached. Since these labelled sub-graphs will not be the only sub-graphs that can be distinguished in our representations, it will be useful to have a special name for them. I shall coin hypograph for this purpose, as it has not been pre-empted by mathematicians. For hypographs in the series S(N), S(N,N), . . ., we need to provide, further, an indication of the semantic roles which they introduce; this is easily done by attaching a semantic-role indicator to each N-node ('A' for Agent, 'P' for Patient). If we now re-write (G1a)-(G3a) using these hypographs, we obtain:
[(G1b): the hypograph for 'a dog' with its lower S-node joined to the S-node of the hypograph for 'barked', whose N-edge runs back up to the quantifying phrase's N-node. (G2b) and (G3b): the corresponding graphs for 'at least one patient', 'every doctor' and 'visited', with semantic-role indicators 'A' and 'P' attached to the N-nodes of the verb's hypograph.]
Without the semantic-role indicators, we should not be able to tell from (G2b) and (G3b) who did the visiting and who was visited. (I have omitted the indicator from (G1b), where no confusion can arise.) It is
also to be observed that an economy of representation can be effected by labelling the links instead of the nodes with a basic category name. It is then only necessary to label one node, that which, in the linear notation, has no super-script or sub-script.

In these modified graphs, we are still relying upon a convention in order to determine forward scope, namely, that the graph is to be 'read' downwards. Moreover, if the N-edges represent back scope, from the verbs to the quantifying phrases, there is nothing to show this. Evidently, we need digraphs (directed graphs), in which the edges are given a direction. This will not only distinguish forward from back scope, but make the orientation of the graph on the page indifferent. The result is then:
[(G1c)-(G3c): the same three graphs redrawn as digraphs: forward-scope edges are directed from each quantifying phrase down towards the verb, and back-scope edges are directed from the verb's N-nodes back up to the quantifying phrases.]
This leaves just one problem, which has so far been glossed over. If we compare the hypograph for category S(N,N) with that for category S(S(N)), the only difference between them is in the labelling of the nodes. How, then, do we know that the first hypograph represents category S(N,N) rather than category S(N(N)), and that the second represents category S(S(N)) rather than category S(S,N)? These differences could be made clear if we were allowed to give a direction to edges within hypographs. But what justification can be offered for this innovation and upon what principle should it proceed?

Consider again the linear notation offered above to represent a structure for (2). The category label corresponding to 'visited' represents a branch in the structure; edge 1 comes into it from the label representing 'not' and edges 2 and 3 go out of it to the labels representing the two proper names. So, if we think of an expression as being represented by a hypograph, then one would expect any internal direction to the edges of such a hypograph to carry us from the nodes to which an external edge is input to those from which an external edge is output. Thus with an expression of category S(N,N) the internal direction would be from the S-node to each of the N-nodes, while for an expression of category S(N)
we should have just one directed edge, from the S-node to the N-node. For (2), this gives us:

[(G4): the hypograph for 'not' with a directed edge into the S-node of the hypograph for 'visited', from which directed edges lead out to the N-nodes labelled 'Dr Patel' and 'Mrs Wilson'.]
The effect is thus to create a directed path, taking the form of a tree, whose root is the C₀ of the main operator and whose leaves are all nodes of category N. The new directions just introduced also carry us along the path of forward scope; so the corresponding direction in a hypograph representing a quantifying expression will be from its first to its second S-node, with the edge between the latter and the N-node remaining undirected. The pathway then terminates, as before, in N-nodes, for example:
[The graphs for 'A dog barked', 'At least one patient was visited by every doctor' and 'Every doctor visited at least one patient' redrawn once more, now with directed edges inside the hypographs; the root node of each, drawn as a square, is the first S-node of the outermost quantifying phrase.]
Each structure is now held together by a directed tree path, starting from a root (the root node is now shown by a square, as is customary) and embracing every node. I shall dub such a path a scopeway and recognize the directed edges within hypographs corresponding to category labels as also representing a kind of scope, which may be termed internal scope to distinguish it from the external scope, whether forward or back, with which we have been concerned hitherto. Thus both back scope and internal scope are extensions to the customary notion of scope, justified, I believe, by the structural role which they play when compared with that of forward scope.
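As a data structure, a categorial graph of this kind is simply a labelled digraph whose edges are typed by the three kinds of scope. The sketch below (my own, in Python, not Potts's notation; the node numbering is arbitrary) records my reading of (G1) for 'A dog barked' and recovers its scopeway by walking the directed edges from the root:

```python
from collections import defaultdict

class CategorialGraph:
    """A digraph whose edges are typed 'internal', 'forward' or 'back' scope."""
    def __init__(self, labels):
        self.labels = labels                  # node -> label
        self.edges = defaultdict(list)        # node -> [(successor, scope kind)]

    def add_edge(self, src, dst, kind):
        self.edges[src].append((dst, kind))

    def scopeway(self, root):
        # The scopeway is the directed tree path from the root that
        # embraces every node; walk it depth-first.
        visited, stack, path = set(), [root], []
        while stack:
            node = stack.pop()
            if node in visited:
                continue
            visited.add(node)
            path.append(self.labels[node])
            stack.extend(dst for dst, _ in self.edges[node])
        return path

# (G1) 'A dog barked'. The quantifier is a hypograph of category S(S(N));
# its first S-node is the root, and its own S-N edge stays undirected,
# so it contributes nothing to the walk.
g1 = CategorialGraph({1: "a dog [S]", 2: "[S]", 3: "barked [S]", 4: "[N]"})
g1.add_edge(1, 2, "internal")   # within the quantifier's hypograph
g1.add_edge(2, 3, "forward")    # the quantifier's operand place -> the verb
g1.add_edge(3, 4, "back")       # the verb's N-edge, running back to the
                                # quantifying phrase's N-node
print(g1.scopeway(1))           # ['a dog [S]', '[S]', 'barked [S]', '[N]']
```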
Although it lies outside the remit of this book to discuss syntactic questions, a brief comment on the relationship of (G3) to 'Every doctor visited at least one patient' and of (G2) to 'At least one patient was visited by every doctor' is appropriate in view of section 1.3. To get from graph to proposition, the former must be mapped onto a linear structure. Suppose, for the sake of argument, that the linear structure onto which (G3) and (G2) are to be mapped is N″₁ V N″₂, using the categories discussed in chapter 1. Then, if the scope order of the two noun (quantifying) phrases is the same as the linear order of the semantic-role indicators in the graph, the verb will be in the active voice, but otherwise in the passive. This will yield the most likely interpretation of each proposition; it would have to be modified to yield less likely senses. The passive voice would have to be defined, of course, to include the preposition 'by', so that the passive of 'visit' is 'be visited by'.

This sketch should be enough to reveal one of the major sources of difficulty in current string grammars, namely, that they single out the subject of a sentence for special treatment by contrast with the complements of the verb, whereas all the operands of a verbal schema are structurally on the same level. It also shows why Frege's abandonment of the subject/predicate distinction marked an important advance. Yet it remains an open question whether the subject/predicate distinction should be retained in syntax, provided that no attempt is made to base an explanation of meaning upon it.

As to the representation of expressions like 'At least one patient was visited', my view is that they are incomplete, being short for 'At least one patient was visited by something'. The 'by something' can be omitted because, 'visit' being a verb which requires two operands, it is embodied in its meaning that there must always be a visitor. So the graph for this incomplete expression would be (G2) with 'something' replacing 'every doctor'. It would be mapped onto 'At least one patient was visited by something' and then 'by something' would be deleted. Similar deletions of 'somewhere', 'sometime' and 'somehow' are possible, but they must occur in the last linear position in the syntactic structure.

4.2 CONVERGING SCOPE: (1) PRONOUNS

Quantifying phrases are not the only examples of expressions which introduce into Fregean structures a level of complexity greater than that of trees; anaphoric pronouns also do so. Pronouns are commonly distinguished into those which have an antecedent (or, more rarely, a postcedent) and those which do not. I shall consider the latter in chapter 6 as deictic expressions: their meanings can often be given by pointing or
4.2 CONVERGING SCOPE: (1) PRONOUNS
Quantifying phrases are not the only examples of expressions which introduce into Fregean structures a level of complexity greater than that of trees; anaphoric pronouns also do so. Pronouns are commonly distinguished into those which have an antecedent (or, more rarely, a postcedent) and those which do not. I shall consider the latter in chapter 6 as deictic expressions: their meanings can often be given by pointing or an equivalent gesture, for example with 'He's the one' in an identity parade, or by a non-linguistic context as with 'I think he's charming' said just after someone has left the room. The former are anaphoric pronouns; I shall argue that they are structural signs, that is, signs whose function is to show how the parts of the expression in which they occur are put together with respect to its meaning. Consequently, they will not occur as constituents of semantic structures at all.1 Evans suggested a division of anaphoric pronouns into those whose antecedent is a proper name (group 1) and those whose antecedent is a quantifying expression (group 2), the latter being further sub-divided into those which fall within the scope of the quantifying expression (group 2a) and those which do not (group 2b), which he called 'E-type' pronouns (1980, pp. 337-8). He held that a unified account should be given of the meanings of anaphoric pronouns (and in this I agree with him), but that only group 2a pronouns are susceptible of explanation as structural signs. I intend to show that, although this is correct so long as we are limited to Fregean linear notation for representing semantic structures, it is no longer so once categorial graphs are at our disposal.2 I begin, then, with group 1 pronouns. Geach introduced an apt term, now widely used, pronoun of laziness, for these; his characterization is 'any pronoun used in lieu of a repetitious expression' (1962, section 76). Thus in everyday language, we often repeat a proper name, as in (3)
Plato taught Aristotle and Aristotle admired Plato,
or, alternatively, use a pronoun: perhaps, even, simply omit the proper name. The following alternative to (3) illustrates both of these devices: (4)
Plato taught Aristotle and was admired by him.
1 I do not think that anyone else has defended precisely this view before. The nearest approach is Geach's (1962) treatment. Some linguists have argued that the linguistic context of an anaphoric pronoun provided by its antecedent is a limiting case of a non-linguistic context and that, hence, the meaning of anaphoric pronouns should be explained along the same lines as that of non-anaphoric ones (see especially Lasnik, 1976). I do not propose to argue myself against this view here, as I am content with the refutation of it by Evans (1980).
2 Evans's own account was that a group 1 pronoun refers to whatever its antecedent refers to; that for a group 2a pronoun, if we replace the quantifying phrase which is its antecedent by a singular term, then the resulting sentence can be interpreted as though the pronoun were in group 1; and that group 2b pronouns are singular terms whose reference is fixed by a description recoverable from the clause containing the quantifier antecedent. This account leans heavily upon a notion of reference, and to some extent also upon a notion of singular term, both of which I find obscure in spite of their ubiquity in the literature. Their use appears to constitute an attempt to extend the notion of a proper name and of its bearer to other expressions, yet somehow without quite averring that the latter - especially pronouns - are proper names.
Such anaphoric pronouns and repetitions of proper names are only needed because speech and writing are linear. In a planar notation, they can be completely eliminated if we allow scopeways to converge at their leaves. Thus both (3) and (4) can be represented by a single graph (G5).
[Graph (G5): the two first-level operators 'taught' and 'admired' each send N-edges to the single N-nodes labelled 'Plato' and 'Aristotle', so that the two scopeways converge at their leaves.]
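The force of (G5) can be conveyed by a toy encoding (the data layout is my own assumption, following the book's verb(object, subject) ordering of operands): the two operators share their operand nodes, so each proper name occurs exactly once.

    # Operands listed in the book's order, verb(object, subject):
    operators = {
        'taught':  ('Aristotle', 'Plato'),
        'admired': ('Plato', 'Aristotle'),
    }
    # The leaves are shared: two N-nodes serve four operand positions.
    leaves = {name for pair in operators.values() for name in pair}
    for name in sorted(leaves):
        scopes = [verb for verb, args in operators.items() if name in args]
        print(name, 'lies within the converging scope of', scopes)
    # Each of 'Aristotle' and 'Plato' lies within the scope of both
    # 'taught' and 'admired' -- no pronoun or repetition is needed.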
In order to represent this in the style of Frege's notation, we should have to write: (3F)
and (admired (Plato, Aristotle), taught (Aristotle, Plato)),
since his ideography contains no analogues of pronouns.3 It is, indeed, evident that there is no way of exhibiting in a linear notation a structure such as that shown in (G5) without either repeating the proper names or using some equivalent of anaphoric pronouns. A linear notation thus disguises a feature which stands out clearly in the graph: that each proper name falls within the scope of both first-level operators. I call this phenomenon converging scope, since the scope of (at least) two operators intersects at the node representing each proper name. The scopeway is also no longer a tree (in which scope always diverges), since some of the branches join up again at the leaves. Thus these group 1 pronouns can be seen as structural signs which show that the expression to which they relate falls within the converging scope of at least two operators. The above description requires a slight modification for reflexive pronouns. For example, the pair
(5) Canning killed Canning
(6) Canning killed himself,
3 The top half of (G5) could be used to represent corresponding propositions with intransitive verbs, for example 'Plato laughed and coughed'.
which are related to each other much as (3) and (4), can both be represented by the graph: (G6)
[Graph (G6): a single first-level operator 'killed' whose two N-edges converge upon the one N-node labelled 'Canning'.]
Only one first-level operator occurs here, but the diverging scopes of the edges issuing from the operator node converge again upon the operand node, the operand thus falling within both branches of the operator's scope. Converging scope from two or more operators always calls for a personal (or relative) pronoun at the syntactic level, whereas multiple edges from the same first-level operator to one operand call for a reflexive pronoun.4 (Reflexive pronouns also occur within the scope of psychological verbs, but the latter lie outside the remit of the present enquiry; they can also sometimes be used for emphasis.) The graph notation avoids a difficulty concerning pronouns which afflicts the linear notation. Apropos the example 'Cato killed Cato', Frege observed that 'if we imagine 'Cato' as replaceable at both occurrences, then 'killing oneself' is the function' (1879, section 9). This implies that there is only one operand in 'Cato killed Cato', for killing oneself is an action involving only one person. But surely 'killed' is an operator of category S(N,N), and thus dyadic, since it can form a proposition from two different proper names, as in 'Brutus killed Caesar'? Frege tried to avoid this contradiction by distinguishing between the schemas 'ξ killed ζ' and 'ξ killed ξ', where the use of the same schematic letter indicates that the same proper name must be substituted for each occurrence. So 'ξ killed ζ' is a schema of degree 2, in accordance with which 'Brutus killed Caesar' is constructed, whereas 'ξ killed ξ', in accordance with which 'Cato killed Cato' is constructed, is of degree 1, and hence of category S(N).
4 This corresponds to the rules of binding in transformational grammar, that anaphors are always bound in their governing category but pronominals always free in their governing category.
Moreover, while the use of different symbols in a schema allows us to make different substitutions for them, there is no accepted convention (even in Frege's book) which forces us to do so. Consequently it would be quite legitimate to construct 'Cato killed Cato' from 'ξ killed ζ' by substituting 'Cato' for 'ξ' and also for 'ζ'. How, then, can we tell in accordance with which schema 'Cato killed Cato' is constructed? In the graph notation, it is clear even in the reflexive case that 'killed' is of degree 2, but both edges converge upon the same operand, which thus completes the graph. As an example combining both personal and reflexive pronouns, we can show how (7) of section 1.4, repeated here as (7)
Octavian defeated Antony and he killed himself,
would be represented by a categorial graph, (G7).
[Graph (G7): categorial graph for (7), with the N-edges of 'defeated' and of 'killed' (twice over) converging on the N-node labelled 'Antony'.]
In this case, there is a triple convergence upon the N-node labelled 'Antony'. With group 2a pronouns, those which occur within the scope of a quantifying phrase which is their antecedent, the meaning is often changed if the quantifying phrase is repeated instead of using the pronoun, for instance instead of 'A politician killed himself', 'A politician killed a politician': in the second, it need not be the same politician, in the first, it must be. So far as structural representations in the planar notation are concerned, however, these cases introduce no new principle. They are just further examples of converging scope. Theoretically, however, they are important as exhibiting a further group of structures which are more
complex than trees, for, whereas converging scope induced by pronouns of laziness can be superficially5 avoided by repetition of the operands, with these examples involving quantifying phrases, converging scope is essential to the semantic structure. A simple illustration of this is provided by the proposition (8)
If anyone makes a sound, he dies.
The syntactic structure of this example is not a sound guide to its semantic structure: 'anyone' does not fall within the scope of 'if', and 'if anyone' means the same as 'everyone if'. That is, (8) says of everyone that, if he makes a sound, then he dies. The linear representation of this is (8F)
everyone:x (if (makes a sound (x), dies (x))).
It is to observe that we have here a triple of link letters instead of just a pair. That means that both of the positions occupied by the two right-hand letters are tied back to the same quantifying phrase, that is, that the scopes of the two first-level operators 'makes a sound' and 'dies' converge upon it. The graph for (8), accordingly, is (G8)
[Graph (G8): root 'everyone', with hypographs for 'if', 'makes a sound' and 'dies'; the scopes of the two first-level operators converge on the quantifier's N-node.]
with the letters 'a' and 'c' marking the antecedent and consequent of 'if' respectively. In this case, moreover, we cannot replace the pronoun by a quantifying phrase salva veritate. The meaning of 'If anyone makes a sound, anyone dies', if indeed it has one, is unclear, while we could not validly infer (8) from 'If everyone makes a sound, everyone dies'. Thus
5 It is only superficially avoided because we must then tacitly accept the convention that each occurrence of the proper name has the same bearer; there is nothing which shows this in the linear notation.
converging scope is an ineliminable feature of the semantic structure of this example.6 Pronouns of group 2b were identified by Evans using a syntactic criterion taken from transformational grammar: they relate to quantifier phrases, with the exception of those whose quantifier is 'any' or 'a certain', which either do not precede or do not c-command them (1977, p. 110; 1985, pp. 219, 223). However, he is not committed to this particular criterion, though he does regard 'the possibility of a principled, syntactic demarcation of E-type pronouns from bound pronouns' as 'the strongest weapon' with which his position may be defended (1977, p. 509). At the same time, he also offers two semantic criteria: first, that the pronouns lie outside the scope of the quantifier phrase to which they relate; second, that it is not possible to substitute 'no' for the occurring quantifier salva congruitate. His first examples involve plural pronouns: (9)
Few MPs came to the party, but they had a marvellous time.
This cannot be analysed by 'Few MPs both went to the party and had a wonderful time', since the latter would be true if many MPs had come to the party but only a few had enjoyed themselves, whereas (9) requires that all of those who did come had a marvellous time. Similarly, although (10)
John owns some sheep and Harry vaccinates them
might be construed to mean the same as 'There are some sheep such that John owns and Harry vaccinates them', the more natural understanding
6 According to Evans, Geach is committed to representing propositions containing group 1 pronouns on the same lines as those containing these group 2a pronouns, because he believes that 'pronouns must be the manifestations of a device . . . for the formation of complex predicates' (1977, p. 481). So proper names would be treated as 'singulary quantifiers' (p. 489), presumably on the lines of the 'logical form' of transformational grammarians, for example (7') for x = Antony (and (defeated (x, Octavian), killed (x,x))) for (7), as in section 1.5. Evans considered this a disadvantage of Geach's account, because it produces difficulties for representing pronouns occurring in intentional contexts. But Geach never employs any such 'singulary quantifier', nor does he ever speak of proper names as a special type of quantifying expression. It is also difficult to see how this type of representation could be squared with saying that 'Antony' is a pronoun of laziness in (7). A Fregean interpretation of Geach's position is much simpler and more likely, that is, that (7) is formed by taking 'Antony' as the operand of 'Octavian defeated ξ and ξ killed ξ', which is of category S(N) and distinct from 'Octavian defeated ξ and ζ killed η', which is of category S(N,N,N). In the representation for (7), 'Antony' would, of course, occur three times, corresponding to a pronoun of laziness on the second and third occurrences.
is that Harry vaccinates all the sheep which John owns. (It is to observe that 'some' is being taken here to mean 'at least two'.) Again, (11)
Mary danced with many boys and they found her interesting
is not equivalent to 'Mary danced with many boys who found her interesting', since the latter allows that she may also have danced with other boys who did not find her interesting. (But it is to observe that an appositive relative clause yields an equivalent to (11): 'Mary danced with many boys, who found her interesting'.) According to Evans, these considerations show that the pronoun in each example does not fall within the scope of the quantifying phrase. Moreover, substitution of 'no' for the quantifier in each example is inadmissible: *No MPs came to the party, but they had a marvellous time. *John owns no sheep and Harry vaccinates them. *Mary danced with no boys and they found her interesting. Yet in spite of these considerations, we may have a lingering doubt that the source of the problem lies not in the narrow scope of the quantifiers, that is, that the connective lies outside their scope, but in that they are plural, and so require plural pronouns. Evans dispels this doubt by giving some examples of singular pronouns belonging to group 2b, in particular: (12)
Just one man drank champagne and he was ill,
pointing out that 'Just one man both drank champagne and was ill' will not do as an analysis, because it is consistent with more than one man having drunk champagne, whereas (12) entails that just one man drank champagne. A similar example may be obtained from each of (9)-(11) by substituting 'just one' for the quantifier and changing the pronoun to the singular form.7
Socrates owns a dog and it bit Socrates
is of this kind. Curiously, though, he does not apply his earlier criterion of substituting 'no' for the quantifier, which produces the desired result: *'Socrates owns no dog and it bit Socrates'. Instead, he proposes substitution of 'every' for the quantifier, arguing that the result ought,
7 Evans rejects a suggestion by Geach that the category of 'just one man' is S(S(N),S(N)), so that it requires two operands of category S(N) in order to form a proposition, on the very reasonable ground that 'just one man drank champagne' is an acceptable proposition in its own right (1977, pp. 501-2).
were the pronoun of type 2a, to mean the same as 'Every dog is such that Socrates owns it and it bit Socrates', whereas *'Socrates owns every dog and it bit Socrates' appears to be ill-formed. Yet we should be cautious about this example, for two reasons. The first is that, if Evans is right about it, must not the role of the pronoun in (4) be reconsidered? Evans draws back at this point: we can, he says, link operand positions of verbs across a connective, as in '( ) loves ( ) and ( ) loves ( )' (with lines linking the corresponding operand positions) but, if we insert pronouns into the right-hand positions, may only insert proper names into the left-hand ones (e.g. in 'Mary loves John and he loves her') and not quantifying phrases, 'because their scope will not be interpreted as reaching across the co-ordinate structure to bind the pronouns' (1977, p. 497). Yet does not 'Plato taught a Stagirite and was admired by him', which is closely akin to (13), say of a Stagirite what (4) says of Aristotle? And does not (13) say of a dog what 'Socrates owns Fido and he bit Socrates' says of Fido? But this is precisely Evans's own argument, against Lasnik, for giving a unified account of group 1 and group 2a pronouns, so should it not also apply to these pronouns whose antecedents are 'on the other side' of a connective? (1980, pp. 215-16). The second reason for caution is that the indefinite article in English is often taken to mean 'just one' rather than the 'at least one' of the existential quantifier, and it could be argued that this implication is carried by the pronoun in (13). Evans, at any rate, thinks so: 'When a pronoun is in a clause coordinate with the clause containing the quantifier, as in 'Socrates owned a dog and it bit Socrates', there is a clear implication that Socrates owned just one dog'. And he goes on to claim that 'There is a doctor in London and he is Welsh' implies that there is just one doctor in London, and cannot be used to say that there is at least one Welsh doctor in London (1980, p. 222). From a semantic point of view, then, 'a dog' in (13) would be equivalent to 'just one dog' and we have to consider whether the scope of 'just one dog' in 'just one dog is such that Socrates owns it and it bit Socrates' is wide or narrow when the latter is taken as a paraphrase of (13). Evans also offers other examples containing the indefinite article, in which he takes it to represent the existential quantifier ('some' = 'at least one'):
(14) If a man enters the room, he will trip the switch.
(15) If there is a man in the garden, John will tell him to leave.
(16) If Mary has a son, she will spoil him.
He also supposes that the connective is the main operator, rejecting Harman's suggestion that 'a' here means 'any' and that the connective lies within its scope (1972, pp. 44-6). Yet (14) and (15) are, surely, comparable to (8), the first being equivalent to 'Any man, if he enters the room, will trip the switch' and the second to 'Any man, if he is in the garden, will be told by John to leave'? The temporal element in (16) makes it ambiguous: does the antecedent refer to the present or the future? On the former assumption, the sense appears to be that, of any boy, if she is his parent, then she will spoil him; the latter assumption posits a coming-into-existence which raises special problems of representation anyway. Evans's reasons for rejecting these analyses are unconvincing. First, he notes that 'some' may be substituted for the indefinite article, for instance 'If someone enters the room, he will trip the switch', and seizes upon Harman's admission that 'it is not clear how one might give a general characterization of the relevant contexts' (p. 45) in which 'any' is changed to 'some' in the course of the generation of the sentence. But that is a problem for syntax, not one of semantic representation. Following Harman again, he then asks how we should deal with plural quantifiers in contexts such as 'If several/few/many/two/three/. . . men come, they will be disappointed', but curiously does not mention Harman's claim that his suggestion can be extended to them. Thus Harman proposes that the numerical quantifiers be prefixed by 'any', for example 'If any three men come, they will be disappointed'. He does not, however, tell us how to analyse the propositions with 'several', 'few' and 'many', nor is the extension obvious. However, it may be significant that these difficulties again arise with plurals where they do not arise with singular quantifiers. We are left, then, with (12) as the only certainly recalcitrant example to the view that pronouns are structural signs: Evans is surely right that we cannot take 'just one man' as the main operator, for that would give us the wrong truth conditions. Moreover, we can sympathize with him when he baulks at Geach's solution, which is to say that there are really two quantifiers here, collapsed into a single syntactic form. Geach actually proposes this solution for (17)
The only man who ever stole a book from Snead made a lot of money by selling it,
which he paraphrases as 'The man who stole a (certain) book from Snead, in fact the only man who ever stole any book from Snead . . .' etc. (1972, p. 100). This is not precise enough to make it clear how the two quantifiers would be related in the semantic structure; consequently it is also uncertain how it would be carried over to (12), which, unlike (17), does not contain a definite description. What is wanted is that a first quantifier 'at least one man' would have wide scope, extending to the end of the proposition, while a second quantifier, 'at most one man' would have narrower scope, extending only to the end of 'drank champagne'. But how would the link letters belonging to each quantifier go? That remains a mystery upon which neither Geach nor Evans enlightens us. There is, however, a possible solution which satisfies Evans's requirement that the pronoun not be within the scope of the quantifier and yet also still exhibits the pronoun as a structural sign. This solution is not available within the current conventions of the linear notation for quantification, but it is a possible way of constructing a categorial graph, and may be illustrated for (12) by (G9):
[Graph (G9): root 'and'; its scope diverges to the hypographs for 'just one man' (with 'drank champagne' within its scope) and 'was ill', whose N-edge converges on the N-node of 'just one man'.]
Here, 'was ill' does not fall within the scope of 'just one'; 'and' is the main operator, yet the scopeway is correctly constructed, with convergence upon the N-node of the quantifier and no possibility of pathway crossover. Indeed the N-path from 'was ill' to 'just one' represents very clearly just what we want, namely, that he who was ill was he who drank champagne.
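The division of labour in (G9) between forward scope and the back-scope link can be made concrete. The sketch below uses invented names and a deliberately crude containment test; it illustrates the convention, it does not implement the book's notation:

    forward_scope = {                      # operator -> expressions in its scope
        'and': ['just one man', 'was ill'],
        'just one man': ['drank champagne'],
    }
    back_scope = [('was ill', 'just one man')]   # N-edge back to the N-node

    def within_forward_scope(operator, expression):
        return any(expression == item or within_forward_scope(item, expression)
                   for item in forward_scope.get(operator, []))

    print(within_forward_scope('just one man', 'drank champagne'))   # True
    print(within_forward_scope('just one man', 'was ill'))           # False
    # 'was ill' carries only a back-scope link to the quantifier, so in
    # generating the sentence it must surface as the pronoun 'he'.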
The nearest equivalent to this in linear notation is (12F) and (just one man:x (drank champagne (x)), was ill (x)), but on the normal interpretation we should have to regard this as representing a schema of category S(N), since the last occurrence of 'x' is not within the scope of the quantifier. Now that we have extended the notion of scope to include back scope, however, there is no reason why we should not re-interpret the linear notation so as to bring it into accord with categorial graphs. That is to say, although a quantifier's link letter must occur at least once within its forward scope, it may also be used subsequently outside its forward scope to indicate back scope only. Under this convention, which I shall henceforth adopt, (12F) will represent a proposition and not a schema, indeed the same proposition as (G9). Thus pronouns of group 2b, although perhaps not as ubiquitous as Evans supposed, are indeed to be found, but can be regarded as structural signs no less than other anaphoric pronouns. In generating propositions from our representations it would, of course, be necessary to ensure that the back-scope link always produced a pronoun, for example in the case of (12) to prevent 'Just one man drank champagne and was ill' being generated from (G9) or (12F); for that, we should need a representation in which the scope of the quantifier phrase extended to the end of the whole sentence. Meanwhile this discussion leaves a residual problem, the representation of plural quantifiers, which I shall take up again in section 7.3.
4.3 CONVERGING SCOPE: (2) RELATIVE CLAUSES
We have seen from section 1.4 that the difference between appositive and restrictive relative clauses carries with it a difference of truth conditions, and we should therefore expect the distinction to be marked also in semantic structures. Evans attempts to provide for this by treating 'all natural language quantifiers as binary, taking two predicates . . . to make a sentence' (1977, p. 521). Since, by 'predicate', he means an expression of category S(N), he is thereby assigning quantifiers to category S(S(N),S(N)). The first operand of a quantifier is then the count noun which it qualifies, treated in Fregean style as an expression of category S(N), so that the quantifying phrase is of category S(S(N)), as before. The difference between restrictive and appositive relative clauses is that a restrictive relative clause forms, together with the count noun which it qualifies, a complex (first) operand of category S(N), whereas an appositive relative clause is incorporated into the second operand. In Evans's words, 'the first predicate of the binary structure has the role of
identifying the objects whose satisfaction of the second is relevant to the sentence's truth or falsity' (1977, p. 522). Before pursuing this in detail, let us consider a little more closely the proposed categorization of quantifiers. Although it respects Frege's treatment of count nouns, Frege himself assigned quantifiers to category S(S(N)). This was made possible by 'discovering' disguised binary connectives in propositions of the four traditional forms, 'Every S is P', 'Some S is P', etc. Thus the former was analysed as 'Everything, if it is S, is P' and the latter as 'Something both is S and is P'. Evans's objection to this is that, as well as being unintuitive, it does not cater for other quantifiers, such as 'most', 'many', 'few', 'several' and specific numbers, where we should be at a loss to find a suitable binary connective. He could have added that there is a precedent in Russell's theory of descriptions, that is, Russell's analysis of propositions containing the definite article where it bears the sense 'the one and only' (Russell, 1905). For, as Dummett has pointed out, Russell's analysis treats the latter as if it were of category S(S(N),S(N)), although the notation which he defines does not (1973, p. 162). Thus, to cite an example, Russell's notation for (18)
The governor of Cyprus lives in Nicosia
is tantamount to (18F) lives in (Nicosia, the:x (governs (Cyprus, x],8 which assigns 'the' to category N(S(N)), that is, a second-level operator of degree 1, forming a proper name from a schema of category S(N). But Russell's analysis tells a quite different story: (18F') something:x (and (and (governs (Cyprus, x), lives in (Nicosia, x)), everything:y (if (governs (Cyprus, y), y = x]. The existential quantifier at the beginning ensures that at least one thing governs Cyprus, the final, universally quantified clause, that at most one thing does so, but nowhere can we find a complex proper name corresponding to 'the governor of Cyprus'. Apart from the quantifiers, connectives and identity-sign, just two schemas occur in (18F'), 'governs (Cyprus, ξ)' and 'lives in (Nicosia, ξ)'. Each of these is of category S(N), so it follows that the remainder must constitute a second-level schema of degree 2 and category S(S(N),S(N)).
8 I adopt, here and subsequently, a convention of the computer-programming language LISP whereby a square closing bracket closes all round opening brackets which are still open.
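The footnote's convention is entirely mechanical, and a few lines of code make it precise. The following expander is my own sketch (the function name is invented, not part of any LISP system); it turns a super-bracketed string into one with ordinary round brackets:

    def expand_superbrackets(text):
        out, open_round = [], 0
        for ch in text:
            if ch == '(':
                open_round += 1
            elif ch == ')':
                open_round -= 1
            elif ch == ']':
                out.append(')' * open_round)   # close every open '('
                open_round = 0
                continue
            out.append(ch)
        return ''.join(out)

    print(expand_superbrackets('lives in (Nicosia, the:x (governs (Cyprus, x]'))
    # lives in (Nicosia, the:x (governs (Cyprus, x)))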
Instead of (18F), therefore, Russell's notation should have been: (18F") just one thing:x (governs (Cyprus, x), lives in (Nicosia, x]. In order to obtain a proposition from this, the first operand of the quantifier must be nominalized, for instance from 'governs Cyprus' we form 'governor of Cyprus'. It remains controversial whether Russell's analysis of propositions containing the definite article in the sense of 'the one and only' is correct. Frege offers an alternative analysis in which the definite article is an operator forming a proper name and in which the notation and its definition are, unlike Russell's, consistent (1893, section 11). Many philosophers have preferred to follow Frege rather than Russell here and to say that the existence of the one and only thing mentioned in the proposition is a presupposition of its truth (or falsity) rather than one of its truth conditions. It is not my intention to come down on one side or the other of this controversy here. Russell's theory of descriptions, however, is instructive not only in showing that the idea of assigning quantifiers to category S(S(N),S(N)) is not new, but also because it uses the Fregean quantifiers 'everything' and 'something', assigned to category S(S(N)), in order to define 'the (one and only)' as a binary quantifier. So, too, then, a Fregean could accept Evans's categorization of 'every' and 'some' but go on to define them in terms of 'everything' and 'something' respectively plus the appropriate connective. Evans proposes to analyse propositions containing appositive relative clauses entirely on Fregean lines and in conformity with Geach's account. So they will not demand any internal analysis of quantifying phrases, which can simply be shown as expressions of category S(S(N)). These cases generate perfectly straightforward, if sometimes complicated, categorial graphs. Thus for several examples of the previous section, the same thought could have been expressed using a relative clause instead of a personal pronoun. Instead of (4), one could say: 'Plato taught Aristotle, by whom he was admired', instead of (7), 'Octavian defeated Antony, who killed himself' and, instead of (8), 'Anyone who makes a sound dies'. Since these will be true in just the same circumstances as (4), (7) and (8) respectively, we should not introduce distinct semantic structures for them unless there are compelling reasons to do so. In general, it seems that an appositive relative pronoun replaces a connective followed by a personal pronoun, such as 'and he' in (7). But how do we know which connective to use? When the antecedent is a proper name, the connective is always 'and', but when it is a quantifying phrase, the matter is more complicated. Whereas 'anything which . . .' or
'everything which . . .' can be paraphrased by 'everything if it . . .', 'something which . . .' must be paraphrased by 'something both . . . and (it) . . .'. So the graph for (19)
Someone who made a sound died
would accordingly be the same as (G8), except that, with the change of the root label to 'someone', the label of the lower hypograph would have to be changed to 'and' (together with appropriate changes to the tenses of the verbs). So far, the examples of propositions containing personal and relative pronouns which I have given have been restricted to cases in which only one quantifying expression occurs. Examples with multiple quantifying expressions do not raise any new issues of principle regarding the semantic representation of pronouns, but merely give rise to more complicated graphs. Here is an uncontroversial example, taken from Evans (1980, p. 218): (20)
Few men despise those who stand up to them.
In this context, 'those' means 'any persons', so the Fregean linear representation of a semantic structure for this will be: (20F) few men:x (every person:y (if (stands up to(x,y), despises (y,x]. It is instructive to draw a graph, (G10), corresponding to (20F) in which only a single node, instead of a proper hypograph, corresponds to each label:
[Graph (G10): single nodes labelled 'few men', 'every person', 'if', 'stands up to' and 'despises', with edges tracing the scope relations of (20F).]
The difficulty with this is that it allows us not only to trace a circuit from 'every person' to 'if' to 'stands up to' to 'every person' again, but also from 'if' to 'stands up to' to 'every person' and back to 'if' again, or from 'stands up to' to 'every person' to 'if' and back to 'stands up to' again. Corresponding possibilities exist on the 'despises' side of the graph. Nothing of the sort is possible with the node labelled 'few men', because it labels the root of the graph, so that circuits must begin and end at that node, and may not pass through it; but the node labelled 'every person' is not the root of the graph and so we cannot insist that circuits in which it occurs should begin and end with it. There must, indeed, be one path which passes through it, namely that from 'few men' to 'if', which indicates two of the principal scope relationships in the structure. The paths from 'stands up to' and 'despises' to 'every person', however, should end at the latter, just as the corresponding paths from them to 'few men'. These problems disappear immediately once we represent each label by a hypograph corresponding to its category, viz. (G11).
[Graph (G11): the same structure with each label expanded into a hypograph of its category; one edge-crossing is unavoidable.]
The only path which passes through the hypograph labelled 'every person' is now that from 'few men' to 'if', which is as it should be. It is to observe, moreover, that we no longer have any circuits, since the paths originating from the two quantifying phrases now originate from a distinct node from that at which they terminate, and there is no path between the two nodes. But the price paid is a more complex graph, since there is an edge-crossing. Consequently (G11) is not planar. However, there is a theorem
that the genus of a graph never exceeds its crossing-number, where the latter is the smallest number of crossings occurring when the graph is drawn in the plane (see Wilson, 1979, pp. 69-70). Thus (G11) is a graph of genus 1, that is toroidal.
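Planarity claims of this kind can be checked mechanically. The sketch below assumes the Python library networkx is available and uses K3,3 as a stand-in, since the exact edge set of (G11) was carried by the figure rather than the text:

    import networkx as nx

    G = nx.complete_bipartite_graph(3, 3)   # K3,3, the smallest non-planar graph
    is_planar, certificate = nx.check_planarity(G)
    print(is_planar)                         # False
    # K3,3 can be drawn with one crossing; by the theorem just cited, its
    # genus is at most its crossing number, so it embeds on the torus.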
When we turn to propositions containing restrictive relative clauses, Evans proposes that we should regard the count noun which is qualified by the quantifier together with the relative clause as forming a complex expression of category S(N). In such cases the relative pronouns are not in general equivalent to a connective plus a personal pronoun, but are 'devices of predicate abstraction, enabling us to form a predicate: '(who) loves (John)' from a sentence frame with one free singular term position: '( ) loves John'' (1977, p. 522). We are allowed to attach these predicates to count nouns, for example 'girl: (who) loves (John)', and then use any of the resulting expressions as the first operand of a quantifier. If we think of this procedure in categorial terms, it is clear that it is incorrectly described. For the sentence frame with which we begin is an expression of category S(N), while 'predicate' is Evans's term for an expression of the same category. But the count noun is also, according to him, an expression of category S(N). Merely juxtaposing two expressions each of category S(N) cannot yield a single expression of that category nor, indeed, a single expression of any category.9 In order to obtain an expression of category S(N), an operator is needed. Neither of the given expressions of category S(N) can fill this role, since neither could have the other as operand. Thus the only solution is to make 'who' the operator, and its category will then be S(S(N),S(N),N), that is, that of an expression which will combine two predicates (in Evans's sense) into a single one. Moreover, this makes it an operator of second-level, so his notation for the complex predicate is also wrong and should be who:x (loves (John, x), girl (x), ξ). We now have to consider the consequences of these category assignments for quantifiers and for restrictive relative pronouns. First, suppose that we complete the above expression of category S(N) into a proposition by supplying a proper name - say, 'Mary'. The result should then represent the proposition 'Mary is a girl who loves John'. However, we cannot construe the relative clause here as appositive, because the
9 An apparent counter-example to this claim is the juxtaposition of two count nouns, as in 'telephone bill'. However, the first word here is functioning as an adjective, as may be seen from the expansion 'bill for using a telephone'; to account for its meaning in this context, we should therefore have to introduce an operator upon 'telephone' forming an expression of a new category from the count noun 'telephone', which could then combine with 'bill'. In other cases it may be the second noun which is a disguised operator, for example 'toy train', which is a toy having the form of a train.
sense would then be 'Mary is a girl and she loves John', which is represented by the much simpler first-level structure and (loves (John, Mary), girl (Mary)). Thus we must distinguish between 'Mary is a girl, who loves John' (appositive) and 'Mary is a girl that loves John' (restrictive). If we allow ourselves to use 'which (who)' in restrictive relative clauses, we must also say that it has a different meaning (because a different category) in them from its meaning in appositive relative clauses. This seems to be a welcome consequence, for it reinforces Fowler's convention of using 'that' instead of 'which' for the restrictive relative pronoun - a convention which Evans does not follow, but to which I shall now return (in examples). A further consequence is that, since both operands of quantifiers are of the same category, there is no way in which we can exclude appositive relative pronouns from their first operands nor restrictive relative pronouns from their second operands. The second of these possibilities may be deemed to add flexibility to the analysis. Thus, for example, something:x (woman (x), that:y (loves (John, y), linguist (y), x] could be construed as representing 'Some woman is a linguist that loves John', although one could also argue that the same thought could be expressed more economically as 'A (woman) linguist loves John', which has a much simpler semantic structure. However, the apparent flexibility in this case depends upon a special feature of the existential quantifier, that the order of its operands can be switched without altering the meaning; thus, if we replace 'something' by 'everything', the structure can only represent 'Every woman is a linguist that loves John' and not 'Every (woman) linguist loves John'. A serious difficulty is raised, though, by the correlative case in which the first operand of a quantifier contains an appositive relative clause, which, ex hypothesi, is analysable as a connective plus a personal pronoun. So, for example, everything:x (and (loves (John, x), girl (x)), grossly deceived (x] is quite correctly formed on Evans's criteria, but what does it represent? It cannot represent 'Every girl that loves John is grossly deceived', because the first operand of the quantifier does not contain the restrictive relative pronoun; nor can it represent 'Every girl both loves John and is grossly deceived' ('Every girl, who loves John, is grossly deceived'), because the conjunction between 'loves John' and 'is grossly deceived' is
missing and can only be provided by putting 'loves John' into the second operand: everything:x (girl (x), and (loves (John, x), grossly deceived (x]. So we are landed with a set of correctly formed representations which we cannot construe. This is an object-lesson in the incoherence into which a notation can lead when category assignments are not borne in mind right from the beginning, for Evans's proposal originally seemed very promising. Evans does consider, in great detail, a different objection to his proposal, namely, that expressions of the form 'A that F', where a count noun is substituted for 'A' and an expression of category S(N) for 'F', do not form semantic units. He ascribes the objection to Geach, and replies to four arguments which he finds in Geach (Evans, 1977, pp. 777-97; 1985, pp. 153-75). Geach, however, denies that 'a defining relative clause goes along with its antecedent to form a complex general term' (1962, section 70, p. 114), which is not quite the same thing, for although he, too, does not give categorial analyses, he argues that common nouns, which are general terms, can be used as logical subjects, and are then used as names, albeit not proper names (1962, section 34). In terms of categories, this would mean that general terms used as logical subjects would be expressions of a basic category, though distinct from category N. Strictly, then, Geach's position is that a restrictive relative clause together with its antecedent does not form an expression of a basic category, and Evans in fact agrees with this, since he says that it forms an expression of category S(N), which is not a basic category. However, it is true that Geach also denies that such a phrase is a 'genuine logical unit' (ibid., section 71, p. 115) and claims that it is 'a sort of logical mirage' (ibid., section 72, p. 118), so perhaps Evans can be excused for thinking that his view is incompatible with Geach's. Yet in spite of these remarks, I think that we must understand Geach's position strictly, because the analysis which he proposes for propositions containing these expressions allows us to form a schema corresponding to them. Thus, as an analysis of (21)
Any man that owns a donkey beats it,
which parallels my example (13) of section 1.4, 'Everyone that borrows a book should return it', he proposes: (21F) Any man, if he owns a donkey, beats it, that is:
(21F') Any man:x (a donkey:y (if (owns (y,x), beats (y,x)))).10 Well, we have only to remove the quantifier and the second operand from this to form a schema whose operand is equivalent to 'man that owns a donkey', viz.: Q man:x (a donkey:y (if (owns (y,x), φ (y,x)))). The link letters show that the expression substituted for 'Q' must be of second-level. The category of the schema will depend on the category to which 'man' is assigned, so, if we follow Geach's intentions and assign it to a basic category C of common nouns, the category of the whole schema will be S(S(C,S(N)),S(N)), which makes it a mixed third- and second-level category of degree 2. This is admittedly a much more complex result than the categorization proposed by Evans, but it does show that Geach's position would be self-contradictory unless understood strictly, as denying only that restrictive relative clauses form complex general terms from general terms. Now I shall eventually argue that Geach is wrong about this, too, and that will be the place to consider Geach's detailed arguments for his position and Evans's replies to them. Let us now suppose that we have found some way to exclude any complex expression of category S(N) from being the first operand of a quantifier unless it has been formed by using the restrictive relative pronoun 'that', categorized as above. We could then bring out the distinction between our previous examples
(22) Eskimos, who live in igloos, have lots of fun
(23) Eskimos who live in igloos have lots of fun,
using 'PL thing:x (φx, ψx)' as a quantifier for plurals which are not preceded by any overt quantifier, with the following respective analyses: (22F) PL thing:x (eskimo (x), and (lives in igloo (x), has lots of fun (x] (23F) PL thing:x (that:y (lives in igloo (y), eskimo (y), x), has lots of fun (x]. The analysis will also accommodate restrictive relative clauses in which the relative pronoun has been applied recursively, as in another example used already:
(24) Anyone that hurts anyone that hurts him hurts himself,
10 Or, perhaps: Any man:x (every donkey:y (if (owns (y,x), beats (y,x)))). I shall consider this alternative later; the difference is not relevant to the present point.
for which we need: (24F) anything:x (that:z (any:y (that:w (hurts (z,w), person (w), y), person (z), x), hurts (x,x]. But it may be objected against both (23F) and (24F) that the meanings of (23) and (24) can be captured quite adequately by an analysis using the connective 'if' and a pronoun. Thus we can paraphrase (23) as 'Eskimos, if they live in igloos, have lots of fun', which does not force us to dissect the quantifying phrase or, if we still prefer to do so, allows the relative clause to occur in the second operand of the quantifier: (23F') PL thing:x (eskimo (x), if (lives in igloo (x), has lots of fun (x]. The difference in meaning between (22) and (23) would then be accounted for, not by any structural difference, but by a difference in the underlying connective. It is not so easy to give a simple paraphrase of (24) with 'if' and personal pronouns replacing the relative pronouns, but the analysis Geach offers is: (24F') everyone:x (if (everyone:y (if (hurts (x,y), hurts (y,x)))), hurts (x,x] (1962, section 68). Dissecting the quantifying phrases, this becomes: (24F'') everything:x (person (x), if (every:y (person (y), if (hurts (x,y), hurts (y,x))), hurts (x,x], so that the first 'if' occurs within the second operand of the first quantifier, and the second 'if' within the second operand of the second quantifier. There are two replies to this objection. The first is that, unless an analysis in the style of Geach is universally applicable, one in the style of Evans will be preferable provided that it can also accommodate the examples which Geach can handle. But we know that there are quantifiers for which there is no obvious choice of connective to replace a restrictive relative pronoun; moreover, it would be possible to interpret 'that' as introducing a condition which must be satisfied for the proposition to be true, otherwise making it false, so that (23F) and (23F') would be true in just the same circumstances, and similarly (24F) and (24F''). Second, whereas on Geach's analysis failure to satisfy the restrictive relative clause automatically makes the proposition false, on Evans's analysis we have the option, instead, of specifying a third truth
value which the proposition takes if the restrictive relative clause is not satisfied, thus leaving room for one type of presupposition theory. On the first of these two scores, Evans has a further argument. He holds that we can also find propositions with restrictive relative clauses in which a subsequent pronoun belongs to group 2b and, indeed, that 'it' in (21) is, pace Geach, such a pronoun. His arguments against (21F') are nevertheless obscure. First, 'If the sentence is to express the intended restriction upon the major quantifier - that of being a . . . donkey-owner - it would appear that the second quantifier must be given a scope which does not extend beyond the relative clause' (1977, p. 504). Why? Some further support is surely needed for this claim. Second, propositions such as (21) 'do not entail the sentence which results when the existential quantifier is given wide scope' (ibid., p. 505). Unfortunately he does not specify this proposition, but from what follows it seems that he has in mind, not (21F), but, rather, 'A donkey is such that every man who owns it beats it', since he claims that '"The only man who owns a donkey beats it" . . . entails a wide-scope sentence "A donkey is such that the only man who owns it beats it" not in virtue of its form, but in virtue of particular semantic properties of the quantifier "The"' (ibid.). Geach's analysis of (21), however, does not give 'a donkey' wide scope in this sense, that is, does not make it the main quantifier, nor does (21F') entail the wide-scope proposition. Yet Geach's analysis of (21) has proved very controversial and given rise to a large literature.11 The objection to it springs in part from transformational grammarians on the ground that 'a donkey' does not c-command the pronoun 'it' and so cannot bind it. However, this is taking 'bind' in the sense which it bears in transformational grammar, and we have no guarantee, whatever the intentions of transformational grammarians may be, that this corresponds exactly to the sense of 'bind' in logic. Geach's analysis, if wrong, must be shown to be so on strictly logical grounds. As a start in that direction, it fails Evans's 'no' test for bound pronouns: *'Any man who owns no donkey beats it'. But we can substitute 'no donkey:y φy' for 'a donkey:y φy' in (21F'). The sense is then that if you take any man, you will not find a donkey such that, if he owns it, he beats it. This may arouse our suspicions about the meaning of (21F'); considering it more closely, the sense is: if you take any man, you will find at least one donkey such that, if he owns that donkey, he beats it. This cannot be right, for it says that we can identify the donkey which is
11 Kamp (1981), Heim (1982), Haïk (1984), Hornstein (1984), May (1985), Reinhart (1984), Lappin (1988/9).
potentially in for a beating by the man we are considering, before we know whether he owns it. But what we want to say is he will beat any donkey which he happens to own. So 'a' must be short for 'any', in the sense of 'every' here, not short for 'at least one', and we need a second universal quantifier in the analysis: (21F'') every man:x (every donkey:y (if (owns(y,x), beats(y,x)))). An objection which has been made to this analysis is that it requires that every man beats every donkey that he owns and that it is not clear that this represents the correct reading, at least for some speakers (Cooper, 1979, p. 81). This is confused. What lies behind the objection is, presumably, that the indefinite article can sometimes be taken to mean 'just one', so that (21) could be understood in the same sense as (25)
Any man who owns just one donkey beats it.
However, if we do not so interpret the indefinite article in (21), it must mean either 'at least one' or 'any', of which the former is excluded by the argument above. At best, then, this objection shows that (21) is ambiguous, but that is no bar to acceptance of (21F'') as an analysis of one of its senses. We may then go on to enquire, however, how (25) should be analysed. It would be natural to suppose that the only difference between (25) and the other interpretation of (21) lies in the choice of the second quantifier, so that both share the same structure. But (25F) every man:x (just one donkey:y (if (owns(y,x), beats(y,x)))) is open to the same objection as (21F'): it says that for each man you will find just one donkey such that, if he owns that donkey, he beats it. Again, then, the donkey in question is said to be identifiable before we know whether he owns it, whereas, in (25), it is identified via its ownership. This time, the situation cannot be saved by changing the second quantifier, so we must look for a different structure. Evans's proposal is then immediately attractive, for it sets aside the first operand of a quantifier precisely to identify the subject of which something is said by the second operand. So, here, the first operand would be 'ξ is a man that owns just one donkey'. But, in that case, the scope of 'just one' must be confined to the first operand and cannot bind the pronoun 'it' which occurs in the second operand. Consequently the latter must be a pronoun of group 2b. Using the convention introduced in the last section, we can represent this by: (25F') everything:x (that:z (man(z), just one:y (donkey(y), owns(y,z)), x), beats(y,x],
but at this level of complexity a categorial graph, (G12), will probably be clearer.
[Graph (G12): categorial graph for (25F'); the patient N-edge of 'beats' links back, outside the forward scope of 'just one', to the N-node of the quantifier.]
The link from the patient N-node of 'beats' to the N-node of 'just one thing' corresponds to the pronoun of group 2b, but 'beats' does not lie within the scope of 'just one thing'. It may be of interest that this example - like the only other one, (12), which quite certainly contained a pronoun of group 2b - employs the quantifier 'just one'. As to the question whether corresponding examples with other quantifiers should be assigned the same structures, it is not necessary to settle it here: it is enough to have shown that, whichever analysis is chosen, it can be represented by means of categorial graphs and that, either way, the pronouns can be understood as structural signs. It would, of course, help to settle the question if we could be certain, in the case of quantified examples, which relative clauses were restrictive and which appositive. This must, in the last resort, be a question of meaning, but so long as we can give an analysis of a proposition which does not require dissection of quantifying phrases and yet gives a correct
account of its meaning, what ground have we for holding that the relative clause is restrictive rather than appositive? Take, for example, the Bach-Peters proposition cited in section 1.4, (26)
A boy that was fooling her kissed a girl that loved him.
In spite of the double use of 'that' to indicate restrictive relative clauses, we can represent the circumstances under which it will be true by (26F) a boy:x (a girl:y (and (and (loved(x,y), kissed(y,x)), was fooling(y,x], and it is not clear that we effect any difference in meaning by writing, instead, 'A boy, who was fooling her, kissed a girl, who loved him'. As Evans says, 'it is always possible to regard the relative clause appended to a simple existential quantifier as a non-restrictive clause' (1977, p. 527). Indeed, he argues that we cannot construe the relative clauses in Bach-Peters propositions as restrictive. To this we may add that (26) fails Jackendoff's negation test: *'A boy that was fooling her did not kiss a girl that loved him' does not make sense, because we must understand 'a girl' as meaning 'any girl' so that, ex hypothesi, there is no 'her' for the boy to be fooling. So I conclude that (26) must be amended to (26')
A boy, who was fooling her, kissed a girl, who loved him.
The corresponding graph (G13) is interesting for having two ineliminable edge-crossings, one of which reflects the 'cross-over' of the pronominal anaphora which is palpable in the proposition itself.
[Graph (G13): categorial graph for (26'), drawn with two edge-crossings.]
So far as their use of pronouns is concerned, both (24) and (26) are of quite unusual complexity; if our structural analysis can handle them, it should be able to meet any demands upon it in this direction.
4.4 SCOPE AT HIGHER LEVELS
Anaphoric pronouns have higher-level analogues. Chomsky observed long ago that in a proposition like (27)
Bill has a chance to live and so has John
'so' is a pro-verb (phrase), 'in much the same sense in which he is a pronoun' (1957, p. 66). We might, then, call this a pro-verb of laziness, since it saves us from having to repeat the first-level operator 'has a chance to live'. The question therefore arises whether only one representation of the latter should occur in a graph for (27). The matter is no longer so simple as it was for basic expressions, since operators have scope, and in (27) a different operand lies within the scope of each occurrence of 'has a chance to live' - 'Bill' in the first case and 'John' in the second. With only a single representation of 'has a chance to live' (category S(N)), therefore, the scopeway is no longer a tree even up to its penultimate nodes. We should have a graph as in (G14), in which the path could be followed so as to 'cross over' from left to right or right to left. Moreover, we have a hypograph of category S(N) which appears to have two operands, and an N-node with an out-degree of 2. Clearly this will not do.
[Graphs (G14) and (G15): the defective graph for (27) with a single hypograph for 'has a chance to live', and its revision in which the hypograph is duplicated, the copies being joined by dotted edges.]
It might be possible to surmount the difficulty by dividing the nodes of hypographs, and the directed edges between them, so as to allow 'insulated' pathways through them. But this would be very complex visually and extremely unperspicuous; moreover, the result would no
longer, strictly, be a graph. The alternative is to duplicate nodes or hypographs (as required). If such duplicated nodes are joined by edges, then we should simply be replacing single nodes by sub-graphs and sub-graphs by more complex ones, thus preserving the status of the representation as a graph. These edges, however, represent neither external scope nor the binding together of nodes into a hypograph in order to represent a single expression of a given category. They should, accordingly, have a special style, such as the dotted line used in (G15), which shows how (27) would now be represented. It is to observe that this innovation restores the original account of a scopeway; it will now always be a tree whose root is an S-node (assuming that a representation of a proposition is in question) and whose leaves are N-nodes. A further example of converging scope at the first level, but now with quantifying expressions instead of proper names, is the following: (28)
Most undergraduates are impecunious and so are some graduates.
This example, for which the graph is (G16), shows very clearly what damage would be done if the two first-level hypographs were to be merged, for there would then be a path from the S-nodes of 'most undergraduates' via 'are impecunious' to the N-node of 'some graduates', together with a corresponding path from the S-nodes of 'some graduates' to the N-node of 'most undergraduates', both of which would clearly be incorrect.
[Graph (G16): root 'and', with hypographs for 'most undergraduates' and 'some graduates' and two copies of 'are impecunious' joined by dotted edges.]
These are both examples of pro-verbs of laziness. The corresponding modification necessary to the graphs in the previous section exemplifying pronouns of laziness, where only expressions of a basic category lie within converging scope, is minimal; instead of two or more N-edges converging on a single node, we shall have two or more nodes connected by dotted edges, each with an in-degree of 1 for a single N-edge. But the matter is more complicated when pronouns which are not pronouns of laziness are involved, because the N-nodes lying within converging scope are then not individually labelled (that is, with proper names), but are the N-nodes of first-order (second-level) quantifying expressions.
Yet it is only the N-nodes of these graphs which should be duplicated (triplicated, etc.), because their other nodes do not lie within converging scope. Consequently the thick lines will branch, instead of all of them being duplicated (etc.), as illustrated in the revision of (G8), (G8a).
(G8a) [graph not reproduced: revision of (G8); node label 'dies']
From a theoretical point of view, nodes joined by a dotted edge should be regarded as representations of a single node, as indeed the single label for the hypograph is intended to indicate, and such duplications should be ignored for the purpose of determining the category of a hypograph. Similarly, thick edges connecting nodes at least one of which has been duplicated should be regarded as representations of a single edge. The duplication (triplication, etc.) of nodes is solely to avoid cross-over of pathways; it seems better to allow this complication in hypographs than to obscure scopeways. One may think of the two nodes as being superimposed in a third dimension which cannot be shown in a planar representation: from above, the graphs would look as in sections 4.2 and 4.3, but no paths would cross because, in the third dimension, the nodes would be duplicated (triplicated, etc.). Thus the method of representation adopted here may be regarded as a concession to the limitations of a sheet of paper as a plane surface and, since a duplicated node is a double representation of the same node, it will still be appropriate to speak of converging scope.
The higher-level analogue of pronouns which are not pronouns of laziness arises when a first-level expression is replaced by a quantifying expression, the latter then being of third level (that is, a second-order quantifier). An example of this was given in section 2.5 when discussing type raising, but as there was no second-level expression falling within the scope of the third-level one in that example, it will be easier to begin the discussion of third-level operators with the formula
(29F)
∀f(∃x(fx)),
which could roughly be taken to represent the proposition (29)
Every property is to be found in something.
Admittedly the probability of this proposition occurring in everyday speech is extremely small, but by considering it the way will be paved to represent other, more commonplace examples. In linear categorial notation it would be represented by S(S1(S2(N3))) S1(S2(N3)), from which the corresponding graph (G17) can immediately be constructed:
(G17) [graph not reproduced: node labels 'everything', 'something']
It is quite clear from the graph that 'everything' must be of third level, because it has four nodes and three external edges, by contrast with the second-level quantifiers in (G16), which each have three nodes and two external edges.
Turning, now, to the case mentioned in section 2.5, our graph notation shows more clearly than Frege's linear notation that a genuine second-level schema always requires an external S-edge and an external N-edge. Moreover, although, in representing propositions like (30)
Paul is what I am not
we might use second-level hypographs to represent 'I' and 'Paul', if the hypographs were then labelled with these expressions, that would be equivalent to assigning them to category S(S(N)). But rule (R4), introduced in section 2.5 to obviate type raising, allows a third-level schema to take an expression of a basic category as its operand and, in general, a schema of level n to take operands of level n−3 as well as n−1. This calls for only a minor modification to the planar notation. If we look at the representation of a third-level operator in (G17) and imagine its external N-edge joining it to an isolated node, labelled with a proper name, we should be left with two unattached S-edges, one going out and the other coming in. In (G17) itself, the second-level operator meshes with these to form a path from the second S-node of the third-level operator, via the two S-nodes of the second-level operator, back to the third S-node of the third-level operator.
So it would be a small modification to allow this path to be contracted into a single, internal edge within the third-level hypograph from the second S-node to the third (or, perhaps, an external loop). This, accordingly, is what corresponds in the planar notation to rule (R4) in the linear notation. Where an operator intervened between the quantifying phrase and the proper name, as in the formula '∀f(¬(fa))', the path between the two S-nodes of the quantifying phrase would be external again, viz.
(G18) [graph not reproduced: node labels 'everything', 'not']
the path from and back to the third-level operator being completed now by a first- and not, as in (G17), a second-level operator.
Now in (30) the convergence of scope is not upon two repetitions of a first-level operator, but upon what might be called the first-level part of a third-level operator, so the nodes of that part must be duplicated, just as the N-node of the quantifying expression was in (G8a):
(G19) [graph not reproduced: node label 'not']
The duplication of nodes in the hypograph representing the third-level operator here is not occasioned by its combination with proper names instead of with second-level operators. Exactly the same necessity would have arisen had its operands been the latter.
Perhaps enough examples have now been given to allow some generalizations about categorial graphs. They make two innovations, neither of which is to be found in previous uses of graphs to represent meaning. First, linguistic expressions are represented by sub-graphs according to their categories; in the system used in this chapter, the only such sub-graphs which consist of a single node are those representing expressions of category N.
Second, the sub-graphs are combined into a single graph by means of a scopeway, which incorporates the standard notion of scope but also two extensions of it, back scope and internal scope. By specifying the hypographs and scopeways in more detail, we can in effect define, albeit informally, a categorial graph grammar.
Hypographs may be constructed mechanically from category labels. They contain a node for each letter in the latter; if the letter be a, then it is an a-node. An edge connects one node to the next, except when parentheses occur in the category label; in that case, the edges branch, to nodes corresponding to the first letter of each item in the list in parentheses. The root of the scopeway is the first node of the main operator, hence always a node of the same category as the whole expression represented by the graph (so far always S). If we think of each of the nodes of a hypograph as having a level, starting with the level of the category label as a whole and going down to level 0, then internal scope proceeds alternately, starting with the first node to the next level down, but not from that level to the subsequent one. Thus with an expression of the third-level category S(S(S(N)),S(S(N))), we shall have two directed edges branching from the first S-node to two further S-nodes, and from each of these an undirected edge to another pair of S-nodes, followed by a directed edge each to an N-node, that is,
(G20) [graph not reproduced: seven nodes, N-S-S-S-S-S-N]
External scope proceeds from one node to another of the same category, the edge being directed and labelled with the category in question.
The scopeway must form a tree connecting every node of the graph and terminating in nodes from which there is no internal pathway. Finally, where more than one path converges upon a node, that node is to be duplicated (triplicated, etc.) so that the in-degree of each representation of the node is 1. Moreover, if the converging edges are S-edges, every subsequent node and edge within the hypograph is to be duplicated (etc.). Duplicated nodes are to be connected with dotted edges, indicating that they are to be considered representations of the same node.
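Taken together, these specifications amount to a small algorithm, at least for the hypograph part. The following sketch is illustrative only - no such program forms part of the system, and the representation of nodes and edges by Python lists and index pairs is simply an assumption made for the example:

    # Illustrative sketch only: constructing a hypograph mechanically from a
    # linear category label, one node per letter, edges branching at parentheses.

    def parse(label, pos=0):
        """Parse a label like 'S(S(N),S(S(N)))' into (letter, [operand items])."""
        letter = label[pos]
        pos += 1
        operands = []
        if pos < len(label) and label[pos] == '(':
            pos += 1                      # consume '('
            while label[pos - 1] != ')':  # items separated by ',' until ')'
                item, pos = parse(label, pos)
                operands.append(item)
                pos += 1                  # consume ',' or ')'
        return (letter, operands), pos

    def hypograph(label):
        """Return (nodes, edges): an a-node for each letter a in the label."""
        tree, _ = parse(label)
        nodes, edges = [], []
        def walk(t):
            letter, operands = t
            idx = len(nodes)
            nodes.append(letter)
            for op in operands:           # edges branch to each item's first letter
                edges.append((idx, walk(op)))
            return idx
        walk(tree)
        return nodes, edges

    print(hypograph('S(N)'))                 # (['S', 'N'], [(0, 1)])
    print(hypograph('S(S(S(N)),S(S(N)))'))   # seven nodes, as in (G20)

The scopeway, and the duplication of nodes where paths converge, would then be further passes over the same structure.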
These are, of course, instructions for creating categorial graphs once one has already decided upon the structural analysis which is appropriate to the meanings of the expressions under consideration. If it be asked, now, how we arrive at these analyses in the first place, the answer must be that, at this stage in the investigation, they are reached by reflexion upon and argument about the meanings of the expressions in question. At a later stage, when a corresponding theory of syntactic structures for a given language has been constructed, and a mapping from categorial graphs to these syntactic structures defined, the time would have come to define a converse mapping, allowing sentences of that language to be translated automatically into representations of their meanings.
4.5 SUB-CATEGORIZATION
A system of semantic representation ought, inter alia, to demarcate the border between sense and nonsense. The way in which this is done by a schema/operand or operator/operand system has already been explained in outline: any representation of a semantic structure should have a meaning and, conversely, it should be possible to represent a semantic structure for all and only those expressions which have a meaning. This is the ideal, but it is a goal which can only be approached by slow stages. Thus the assignment of propositions and proper names to distinct categories excludes some forms of nonsense, but still lets in others. Moreover, the relationship between semantic and syntactic structures has not been spelled out, so that it is not fully determinate what our semantic representations represent at the syntactic level. Still, if the system could be carried through in detail, it would eventually yield an absolutely sharp separation of sense from nonsense.
In everyday language, however, the borderline between sense and nonsense is not sharp. Quite often an expression which, at a first hearing, one is inclined to dismiss as nonsense, can be construed in the light of some rather unusual background circumstances. The literary context is also relevant: an expression which might be deemed nonsense if it occurred in an academic work may be quite acceptable in a fairy tale or in a poem. Moreover, languages change over time. New words are coined, and old ones are used in new ways. A system of representation for everyday language must, accordingly, have a certain flexibility. Even if it does not actually cater for these phenomena, it must be capable of extensions or modifications to handle them.
A schema/operand theory can certainly be adapted to some kinds of linguistic change. New expressions can be assigned to categories as they come into use. Old expressions which change their meanings can be reassigned to a different category if necessary. If it were required, categories to which no expressions had hitherto been assigned could be brought into use. It would even be possible to introduce new basic categories, although a priori it is extremely unlikely that everyday language ever develops in a way which would demand this, because of the upheaval which it creates throughout the whole system.
Finally, now that provision has been made for complex schemas, new simple schemas can always be defined by means of complex ones.
But when all this has been said, it still remains that the theory will draw a sharp boundary between sense and nonsense at any one time. It answers to our intuitions that some expressions should be judged irredeemable nonsense. Ajdukiewicz cites 'the sun if', 'the sun green' and 'the sun although' as examples (1935; 1967, p. 637). But these would also be deemed syntactically incoherent, as, too, Chomsky's example 'Furiously sleep ideas green colourless' (1957, p. 15). By contrast, both Wittgenstein and Chomsky noticed a kind of nonsense which preserves an illusion of sense, of which Lewis Carroll's poem 'Jabberwocky' and Edward Lear's nonsense-rhymes are sustained examples. Wittgenstein said that expressions of this kind have a Satzklang, the ring of a sentence or proposition. Chomsky has wavered over them. Of his first and most famous example of this genre, 'Colourless green ideas sleep furiously', he said that it was nonsense but syntactically coherent. Later, though, citing a number of examples of which 'the dog looks barking' is one, he deemed them syntactically incoherent, yet not so badly incoherent as, for example, 'looks dog barking the' (1965, p. 76). This shows how extremely unreliable our intuitions are unless the question is whether an expression is both semantically and syntactically coherent. Yet if it is possible for an expression to be, on the one hand, semantically incoherent but syntactically coherent or, on the other, semantically coherent but syntactically incoherent, then we need some way of distinguishing between the two cases. If there were a sharp boundary between either kind of coherence and incoherence, intuition might be a better guide, but is even more enfeebled if we have to recognize degrees of incoherence at both levels. It is then not at all unreasonable to let the system of representation decide borderline cases, so long as it does not produce outrageous results.
Moreover, it seems possible at least to cite examples in which the fault lies on the side of meaning rather than of syntax, and these might provide a starting point. For instance, Wittgenstein's 'The chair is thinking to itself: "..."' is a likely candidate, because we can imagine circumstances (for example a fairy story) in which we might accept it without demur (1953, 1.361). Intuitively, the kind of objection which we should feel to this phrase in a normal conversation is quite different from that which we should feel against 'These my hairs they need washing'. The problem about Wittgenstein's example is how we are to make sense of it, if at all. By contrast, we understand quite well what a foreigner who produces the other example is trying to say, although that is not the way we put it in English.
Prima facie, Wittgenstein's example does not have a meaning but is syntactically coherent, whereas the other has a meaning but is not syntactically coherent. It should be possible to pin-point the difference between these examples if we try to say exactly what is wrong with each of them. First, then, 'The chair is thinking to itself: "..."' will not do because chairs are not the kind of things that can think. A passage in Frege suggests a closely related example, 'Julius Caesar is prime' (see 1884, section 56), which we might reject on the ground that only numbers can be prime, not men. The objection to 'These my hairs they need washing' is totally different in character. Hairs are very much the kind of things that can need washing, and the difficulty does not lie in an incongruity of meaning between noun and verb, both of which remain when we have made the necessary correction, 'My hair needs to be washed'. In the first two cases, the trouble cannot be put right by some kind of rearrangement of the words nor by modifications to their inflexions. If it is to be put right at all, what is required is a new way of understanding the meanings of the words which leaves the expressions just as they stand, for example imagining intelligent beings which have all the physical properties of chairs and so were called chairs, or an extended sense of 'prime' which would be applicable to humans. Alternatively, either the verb or the noun must be replaced by a different one.
Computing science has a well-chosen term for the incongruity which we sense in these examples drawn from Frege, Wittgenstein and Chomsky: a mismatch. 'Caesar' does not match 'is prime', 'the chair' does not match 'is thinking' and 'barking' does not match 'looks'. It is clear that our system of representation as it stands does not exclude such mismatches, for if proper nouns like 'Julius Caesar', numerals like 'two' and phrases consisting of the definite article followed by a count noun, like 'the chair' and 'the professor', are all expressions of a basic category, they must all be assigned to the same category, N, since no one will argue that any of them is a proposition. Then, since 'Two is prime' and 'The professor is thinking' are propositions, 'A is prime' and 'A is thinking' will both have to be assigned as schemas of category P(N). It follows that 'Julius Caesar' is a legitimate substitution for 'A' in the first of these schemas and 'the chair' in the second.
Within the schema/operand system, there are two ways of blocking these mismatches. There may be independent evidence that some of the expressions assigned to category N do not belong to a basic category. Reassignment may then block the mismatches but will not necessarily do so. Thus, although 'the chair' and 'the professor' are proper names according to Frege, they were held to be quantifying phrases by Russell (1919, chapter 16).
Re-assignment to category P(P(N)), in accordance with Russell's view, does not block the mismatches, since a schema of that category combines with one of category P(N) as its operand. The other method is to divide category N into two or more new categories, but this can easily be too effective, blocking combinations which are not mismatches. Suppose we were to assign 'the professor' and 'the chair' to different categories, say X and Y respectively. Then, on the basis of 'The professor is thinking', 'A is thinking' is assigned to the category P(X). We can now no longer substitute 'the chair' for 'A', and that immediately rules out any minor modification to the system which would accommodate 'The chair is thinking to itself: "..."' in the fairy tale. It also has other awkward consequences. How shall we now categorize the operator 'was moved into the next room' in 'The chair was moved into the next room'? If as P(X), then we cannot replace 'the chair' by 'the professor'. Perhaps, then, we also assign 'was moved into the next room' to category P(Y). But this entails that it has two different senses. That might not be unreasonable if 'The professor was moved into the next room' meant that he was asked, or even required, to change the room which he occupied, say in his university or in an hotel. However, he could also have been moved into the next room just like a chair, for example supposing that he had had a heart-attack or was bed-ridden. Not even a prima facie case can then be made out for regarding the operator as ambiguous in the two propositions. Moreover, if every mismatch is to be blocked by the creation of new categories, we should rapidly develop an enormously complex category system, especially bearing in mind the whole range of new schema categories which is generated by the introduction of a single new basic category.
Thus multiplication of categories is not a panacea for all mismatches, although it might be the appropriate solution in a particular case. Some new way of distinguishing different kinds of expression is needed, which will be local rather than global in its effect and so can be modified without far-reaching results which are difficult to control. To this end, we can adapt the linguists' device of sub-categorization, which was mentioned in chapter 1. There, however, we were concerned with its use to specify the degree of a verb; here, that is irrelevant, since the degree of a verb is always reflected in its category. The relevant starting-point is, rather, sub-categorization of basic expressions, though, as with sub-categorization of verbs, the method is to designate a number of features and assign basic expressions to sub-categories in accordance with whether they possess each feature or lack it.
So, for example, we might designate 'numeral' as a feature and assign 'two' (qua noun) to category N, sub-category NM (for 'numeral'). It will be useful to have a term distinct from 'category' to indicate a classification by category and sub-category, so let us call this the type of the expression. Now if we want to prescribe that only proper names with the feature NM may be substituted for a given schematic symbol in a schema, all we have to do is to add that requirement to the key to substitution supplied with the schema. Thus, for the schema 'A is prime', we say that any proper name which has the feature NM may be substituted for 'A'. For the schema 'A barked', we can do the opposite, requiring that any proper name substituted for 'A' shall lack the feature NM. Suppose, now, that proper names other than numerals are further sub-categorized by means of a feature PS (for 'animate'), with 'the professor' having that feature but 'the chair' lacking it. Then, for the schema 'A is thinking', we say that any proper name having the feature PS and lacking the feature NM may be substituted. Thus the first schema would accept 'the professor' but not 'the chair', while the second would accept both, but neither schema would accept 'two'.
What, then, if we encounter an example, like Wittgenstein's, which contravenes these substitution rules? If the category of the putative operand is wrong, we can reject the expression as irredeemable nonsense. But if only the sub-category is wrong, we could regard the substitution as forcing a feature onto the operand in that context which it does not normally have and, if the sub-categorization features are well chosen, this could be an excellent way of bringing out the implications of regarding the expression as having a sense. Moreover, if a permanent change of this kind occurred in a language, it would be a simple matter to alter the sub-categorization of an expression without wide-reaching effects. Even the set of sub-categorization features could be varied without disrupting the entire system. One can imagine that this might be an appropriate way of handling linguistic change consequent upon the development of machines in the last hundred years and the increasingly important role which they play in everyday life.
In principle, sub-categorization could also be extended to higher-level schemas. Thus, for the schema 'many dogs:x (Fx)' we could prescribe that substitutions for 'F' be restricted to operators whose operand must lack the feature NM, thus blocking 'is prime' and other operators of category P(N) which can take numerals as operands.
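The operation of such keys to substitution can be made concrete in a few lines. The sketch below is illustrative only: the lexicon entries, and the rendering of a key as a pair of required and excluded feature sets, are assumptions made for the example, following the NM and PS features just introduced:

    # Illustrative sketch only: types as category plus feature set, and keys to
    # substitution as required/excluded features.
    LEXICON = {
        'two':           ('N', {'NM'}),   # numeral
        'the professor': ('N', {'PS'}),   # animate, not a numeral
        'the chair':     ('N', set()),    # neither feature
    }
    SCHEMAS = {
        # schema: (operand category, features required, features excluded)
        'A is prime':    ('N', {'NM'}, set()),
        'A barked':      ('N', set(), {'NM'}),
        'A is thinking': ('N', {'PS'}, {'NM'}),
    }

    def may_substitute(name, schema):
        cat, feats = LEXICON[name]
        op_cat, required, excluded = SCHEMAS[schema]
        if cat != op_cat:
            return 'irredeemable nonsense'     # wrong category
        if required - feats or feats & excluded:
            return 'sub-categorial violation'  # forces a feature onto the operand
        return 'well-formed'

    print(may_substitute('two', 'A is thinking'))        # sub-categorial violation
    print(may_substitute('the chair', 'A is thinking'))  # sub-categorial violation
    print(may_substitute('the professor', 'A barked'))   # well-formed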
A further interesting possibility latent in sub-categorization is that there could be some schemas which yield expressions of the same category as their operands, but of different type because they change the sub-category. Supposing that the schema 'the birthplace of A' were assigned to category N(N) and we had a sub-categorization feature LC (for 'location'), then it could be specified that substitutions for 'A' must lack the feature LC but possess the feature PS, but that the expressions produced by this schema would possess the feature LC but lack the feature PS.
It is difficult to devise a concise notation for sub-categorization features. To be sure, we can use '+' and '−' to indicate that an expression possesses or lacks a given feature respectively, for example 'NM+' for a numeral and 'LC−' for a proper name which is not the name of a place. But if sub-categorization features are logically independent, they produce a cross-classification, so that for n features we get 2^n sub-categories. If, as a notation for types, we use category name/feature 1, ..., feature n, then the two features NM and LC will divide the category N into four sub-categories, N/NM+,LC+, N/NM+,LC−, N/NM−,LC+, N/NM−,LC−. There will then be 2 to the power of 2^n ways in which substitution into a schema of category P(N) can be restricted, that is, sixteen in the example. One of these can be ignored, the case in which every sub-category is excluded. At the other extreme, where every sub-category is included, the key to substitution in the schema is at its simplest. That leaves fourteen other possibilities, eight of which are comprised of specifications that substitutions for 'A' must be or, alternatively, must not be, of one of the four types listed above. An example of the remainder is that an expression either of type N/NM+,LC− or of type N/NM−,LC+ may be substituted for 'A'. It will be clear from this that keys to substitution in schemas could soon become extremely complicated, for a schema of category P(N) is, after all, the simplest possible kind of schema.
In fact, the two features cited in this example are not logically independent. If a numeral were used as the name of a place, then it would no longer bear the same sense as it does when used as the name of a number. Take a map grid reference, for example: you cannot multiply grid references, or subtract one from another so as to obtain a negative result. So the question whether a proper name is the name of a place or not only arises once we know that it is not a numeral. This means that the features NM and LC are ordered, the cross-classification of sub-categories being reduced to a tree classification in this example. The first two of the four original types then merge into one, which we could represent simply as N/NM+, while the other two could be given just as N/LC+ and N/LC− provided that it is specified separately that both LC+ and LC− entail NM−. This would materially simplify keys to substitution in schemas.
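The arithmetic of the preceding paragraphs, and the saving effected by ordering the features, can be checked with a short computation (again purely illustrative, under the assumptions of the example):

    # Illustrative sketch only: counting sub-categories and keys to substitution.
    from itertools import product

    n = 2                                    # features NM and LC
    subcats = list(product('+-', repeat=n))  # cross-classification: 2**n sub-categories
    print(len(subcats))                      # 4
    print(2 ** len(subcats))                 # 2**(2**n) = 16 possible keys to substitution

    # With the features ordered (LC only applicable when NM-), the cross-
    # classification collapses to a tree: N/NM+, N/LC+ and N/LC- (both entailing NM-).
    tree_types = ['N/NM+', 'N/LC+', 'N/LC-']
    print(len(tree_types))                   # 3 types instead of 4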
The ordering of features could additionally be used as a basis for grading violations of substitution restrictions as more or less serious. If PS sub-classifies NM−, for example, it would be more nonsensical to substitute a numeral into the schema 'A is thinking' than 'the chair', or - to put it the other way round - more difficult to construe the expression resulting from the former substitution than that resulting from the latter. So sub-categorization could also provide a way of grading nonsense which answers to our intuitions. The stronger the illusion of sense produced, the less serious should the sub-categorial violation be. Candidates for the administrative grade of the civil service, for example, are set a test consisting of sentences in which two words have been interchanged in such a way that the resulting expression almost makes sense and the interchange is quite difficult to spot. A good theory of meaning should exhibit such examples as minimal violations of its rules.
Where sub-categorization features are logically inter-related, the whole system of features for a given category will have to be introduced at a single blow. New features cannot be added as an afterthought any more than new basic categories, for, although local in their effects, any one feature will interact with the remainder in the system for that category. Moreover, although the system of features for a category cannot be decided until the category itself has been determined, the possibilities afforded by sub-categorization have to be borne in mind when deciding upon category distinctions, because it offers a way of distinguishing between kinds of expression which is less radical in its effects than a categorial distinction.
On the face of it, sub-categorization is a means of continuing the classification of kinds of expression which is simply less pervasive in its effects than full categorization, a device for making minor as opposed to major distinctions. Yet more may be going on here than is immediately apparent. Some examples of sub-categorization features (though not all) suggest a different kind of classification from that exemplified in categories treated so far. A case in point is the distinction between proper names of animate (PS+) and inanimate (PS−) things. One may readily suppose that expressions of type N/PS+ would require further sub-division in accordance with a feature 'human', since there are certainly some propositions in which names of humans occur which would not retain a sense if, say, names of plants were substituted instead. But that might be just the beginning of a classification mirroring (though doubtless in much less detail) the botanist's and zoologist's classification of the plant and animal kingdoms. Now a sub-categorization of this kind contrasts very sharply with the categorial distinction between propositions and proper names and of the schemas built upon them. The latter does not have any particular subject-matter, whereas the former is akin to a scientific classification of specific phenomena.
The question is, therefore, whether the considerations which prompt sub-categorization are not, at base, an argument for a double classification in semantics, one which is closer to syntactic categories in having no restriction of subject-matter, and another which is closer to scientific categories in dividing a particular subject area. If that is so, perhaps both systems of classification should be given equal status, rather than one being made subordinate to the other, and their interactions charted. Long ago, Trier (1931) urged the necessity of distinguishing 'fields' in semantics, that is, broad areas of subject-matter, and of investigating their sub-divisions and boundary changes over time. More recently, Dixon (1989, pp. 92, 94) has claimed that some thirty semantic types would be required for this purpose, of which about twenty would be used, inter alia, for classifying verbs. At the present stage, this must remain just a suspicion, for we have far too little evidence to take the matter further. That suspicion would, however, be strengthened if we were to find that some of the same sub-categorization features were required for different categories. For it would be an acknowledgment that the features were not just serving to make minor distinctions within a category, but were effecting trans-categorial distinctions; and that would be the bud of a parallel system of classification.
5 Basic categories: count nouns
5.1 FREGE'S CATEGORIZATION OF COUNT NOUNS
In previous chapters, I have confined myself either to making explicit ideas which, in my view, are implicit in Frege's ideography, or to extending them. Rather little attention, accordingly, has been paid to determining the basic categories of everyday language. At first I simply went along with Frege in providing only one basic category, that of proper name, but subsequently divided it, following Ajdukiewicz, into a category of propositions and another of proper names. But even that can be regarded as primarily a matter of expository convenience, for the question of the number and kinds of basic category is independent of the structural system. That is to say, a commitment to Frege's ideography to the extent of acknowledging a schema/operand structure in everyday language, organized by scope and incorporating differences of polyadicity and level between schemas, does not of itself tie us to any particular choice of basic categories. To that choice we must now turn; it will lead to a substantial disagreement with Frege.
I shall take it as established that we require a distinct basic category of propositions, the expressions whose meanings we wish, in the first instance, to represent.1 That leaves for our immediate consideration the remaining expressions which Frege called 'proper names'. Although a few philosophers and logicians have offered alternative characterizations of these expressions or included further expressions in the same category, most have fallen into line behind Frege.2
1 If we think of a proposition as the semantic correlate of the syntactic notion sentence, then propositions would have to include a correlate of complementizers in transformational grammar, to indicate semantic mood. Many subordinate propositions, for example the antecedents of conditionals, would then not strictly be propositions, but propositions stripped of their mood. Alternatively we could regard a proposition as not having a mood; it would then be the correlate of I'' (the old S) in transformational grammar, and we should need a new term for proposition plus mood. In chapter 1, I avoided complications arising from different complementizers by restricting myself to indicative examples. That will not serve here, because I cannot avoid examples containing subordinate propositions; but I shall duck the issue nevertheless, in the interest of simplicity, while acknowledging that my analysis is consequently incomplete.
His position has been commended above all by the success of quantification theory in giving a correct account of the logical relationships between propositions containing more than one quantifying phrase, because taking proper names to be expressions of a basic category seemed essential to the structural analysis by which quantifying phrases were assigned as expressions of second level. Yet, because a certain structural analysis of propositions yields the correct logical results, it does not follow that it is a wholly satisfactory representation with respect to meaning. There may be still better alternatives. We should indeed be foolish to reject it on that score alone, but if its acceptance, while yielding a neat account of some expressions, raises difficulties for others, there might be good reason to look for an alternative. Any such alternative would most certainly lack credibility unless it preserved gains achieved by the original; thus, today, no one could be expected to entertain proposals which sacrificed quantification theory.3
Frege's category of proper names marked a sharper break with logical tradition even than his introduction of scope. To be sure, no categorial distinction is so firmly entrenched in the history of philosophy and of linguistics as that between a noun or name and a verb. It goes right back to the claim of Plato (Sophist, 261C-262E) and Aristotle (De interpretatione, 1-3) that a sentence consists of an ὄνομα and a ῥῆμα, the latter carrying the tense; and the distinction between adjective and adverb is built upon it. It still lies at the heart of logic and of formal linguistics; since Frege, the issue has been where the line between noun and verb should be drawn, not whether it should be drawn. Linguistics has for the most part remained faithful to the older tradition, with a category of noun sub-categorized into proper versus common, count versus mass and concrete versus abstract. Frege, on the other hand, took proper nouns as the core of his category of proper names, while he disposed of the other kinds of noun in a variety of ways.
2 I shall review some of the exceptions later in this chapter.
3 It is for this reason that my survey will not include Sommers (1982). Sommers attempts to revive 'traditional' logic, and so has no place for quantification theory. He maintains that traditional logic, supplemented by a theory about pronouns which he propounds, can give a correct account of the logical relationships of propositions containing more than one quantifying phrase. However, he cites but one example (pp. 144-7) and makes no attempt to prove that the method he employs could be generalized. Moreover, the method involves decomposing propositions containing more than one quantifying phrase into sentences none of which contains more than one; these are used as distinct premisses in proofs, but contain pronouns with cross-indexations to quantifying expressions in other premisses. At best this is extremely clumsy by comparison with first-order logic; it also prohibits specification of the truth conditions of the original proposition as a whole.
Abstract nouns were in general to be eliminated by paraphrase. His view of mass nouns is uncertain, as he hardly ever instanced them in examples of later work, but it is possible that he regarded them as proper names. Count nouns, however, he assigned as the operators of first-level monadic schemas, thus treating them as semantically on a par with intransitive verbs. This is the point upon which he made a decisive break with logical tradition, and whose consequences, accordingly, we should examine, with the aid of one of his examples (using the linear notation). He tells us that, in general, propositions of the form 'Every X is a P' may be represented by (1F)
everything:x (if (X(x), P(x)))
(1879, section 12)
and he cites as an instance of it (1)
All mammals have red blood,
where we are to substitute 'is a mammal' for 'X' and 'has red blood' for 'P'. Frege justifies this by observing that (1) could be paraphrased by 'Whatever is a mammal has red blood' or, again, by 'If anything is a mammal, then it has red blood' (1892, pp. 197-8). He thus held that count nouns, by contrast with proper names, are essentially predicative. Thus one can assert of something, for example, that it is a mammal, the word 'is' serving here 'as a mere verbal sign of predication', whereas, if one asserts of something that it is Alexander the Great, the 'is' abbreviates 'is no other than' and is analogous to the equals-sign in mathematics (1892, pp. 193-4). His analysis of (1) thus brings out the predicative nature of 'mammals', which is obscured by the English syntactic form but can be revealed by paraphrase. The analysis also permits of an exactly symmetrical treatment of, syntactically speaking, the subject-term ('mammals') and the predicate-term ('have red blood') of the proposition, both being assigned to the category S(N).
Frege's analysis has the further merit of simplifying the treatment of commonly encountered patterns of inference into which propositions like (1) may enter. Thus the simplest way of obtaining a proposition from the schemas 'ξ is a mammal' and 'ξ has red blood' is to substitute a proper name for the schematic symbol 'ξ'. Suppose that we do this, using the proper name 'Fido' to yield: (2)
Fido is a mammal.
If (1) and (2) are now taken as premisses of an argument, the conclusion 'Fido has red blood' will follow validly and, on Frege's analysis of (1), the pattern of inference involved is easy both to show and to justify.
Frege himself formulated an axiomatic system of inference in Part II of Begriffsschrift (Frege, 1879), in which the only rule is modus ponens and everything else is done by substitutions in the axioms. Axiomatic systems are not very intuitive, and deductions carried out in them are often far removed from the way in which we should ordinarily argue from the same premisses to the same conclusion. It will be clearer, therefore, to present inferences as the application of rules representing minimal steps in argument, of the kind discussed in section 2.2. A rule-schema for the elimination of 'if', modus ponens, was cited there (S1) and will be needed again here:
(If−)
if (P, Q)
P
----------
Q
The premisses of the rule are set out on separate lines and the conclusion to be drawn follows a horizontal line. When applying the rule in a deduction, the horizontal line is omitted but each line of the deduction is numbered and the rule by which each line, apart from the premisses, is obtained, is cited at the end of the line together with the lines which have been taken as premisses of the rule. In order to illustrate this for the present example, we also need a rule for eliminating 'everything', which can be set out as follows:
(∀−)
everything:x (F(x))
----------
F(a)
where 'F(a)' is any proposition containing one or more occurrences of a proper name 'a'. (It is presupposed throughout that each proper name has just one bearer.) The derivation may then be given:
[1] everything:x (if (is a mammal(x), has red blood(x)))   Premiss
[2] if (is a mammal(Fido), has red blood(Fido))   ∀−, 1
[3] is a mammal(Fido)   Premiss
[4] has red blood(Fido)   If−, 2,3
This corresponds, fairly closely, to the following form of words in which one might spell out the argument. First, if it holds of anything that, if it is a mammal, then it has red blood, then this holds of Fido in particular, that is, if Fido is a mammal, then Fido has red blood. The second premiss, however, states the antecedent of this conditional, and so the consequent follows, which is the required conclusion.
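The mechanical character of the two rules may likewise be illustrated (a sketch only: the rendering of the linear notation as nested Python tuples is an assumption made for the example, not part of Frege's or the present apparatus). If− is ordinary modus ponens, and ∀− substitutes a proper name for the bound variable:

    # Illustrative sketch only: propositions as nested tuples in the linear
    # notation, with the rules If- and the rule for eliminating 'everything'.

    def if_elim(conditional, antecedent):
        """If-: from if(P, Q) and P, conclude Q."""
        op, p, q = conditional
        assert op == 'if' and p == antecedent
        return q

    def subst(formula, var, name):
        """Replace the variable by a proper name throughout."""
        if formula == var:
            return name
        if isinstance(formula, tuple):
            return tuple(subst(part, var, name) for part in formula)
        return formula

    def every_elim(universal, name):
        """From everything:x (F(x)), conclude F(a)."""
        op, var, body = universal
        assert op == 'everything'
        return subst(body, var, name)

    # The derivation for 'Fido has red blood':
    line1 = ('everything', 'x', ('if', ('is a mammal', 'x'), ('has red blood', 'x')))
    line2 = every_elim(line1, 'Fido')   # corresponds to step [2]
    line3 = ('is a mammal', 'Fido')     # Premiss
    line4 = if_elim(line2, line3)       # corresponds to step [4]
    print(line4)                        # ('has red blood', 'Fido')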
Similar considerations apply to the types of argument studied in Aristotle's syllogistic. Thus, if the argument be
(3) Every mammal has red blood
(4) Nothing red-blooded dwells in the sea
(5) Ergo: No mammal dwells in the sea,
its validity can again be demonstrated by appeal to a small number of rules, each of which is of general application. Frege represented propositions of the form 'No X is a P', of which (4) and (5) are instances, in essentially the same way as those of the form 'Every X is a P', the only difference being that the predicate-term is negated, that is, (3F)
everything:x (if (X(x), not (P(x))))
(1879, section 12)
In addition, two more rules are required in order to deduce the conclusion. First, a rule for introducing 'everything', because the strategy of the deduction will be to eliminate 'everything' from each of the premisses, use propositional logic to manipulate the results, and finally to restore 'everything' in order to form the conclusion. The rule for introducing 'everything' is simply the converse of that for eliminating it, but with a restriction imposed upon its use, for it would clearly be invalid if we could quite generally infer from a particular instance of some property to everything having that property. We therefore stipulate:
(∀+)
F(a)
----------
everything:x (F(x))
provided that 'a' does not occur in any premiss or assumption upon which 'everything:x (F(x))' depends.
The other rule required is a propositional one, for introducing 'if', and is a rule-thema, that is, one which allows us to replace one (or more) arguments by another argument. Suppose, then, that we have an argument whose premisses include 'P' and whose conclusion is 'Q'. The rule allows us to replace this by an argument from the other premisses (that is, without 'P') to a conclusion 'if P, then Q'. In order to be able to use a thema like this in the same system of deduction as we have already illustrated for schemas, it has to be dressed up to look more like a schema:
(If+)
| P
| :
| Q
----------
if (P, Q)
The vertical line and the vertical dots, however, give the game away. The dots show that we have an argument as the 'premiss' of the rule, while the vertical line and indentation show that it is a subordinate argument. This is also the explanation of the phrase 'premiss or assumption' in the restriction placed upon the ∀+ rule: 'P' in If+ will not be a premiss of an argument in which that rule is used, but it will be a temporary assumption, introduced 'for the sake of argument' and discharged upon use of the If+ rule. This presentation of deductions is based upon Fitch (1952), where further details may be found. The deduction may now be set out as follows:
[1] everything:x (if (mammal(x), has red blood(x)))   Premiss
[2] everything:x (if (has red blood(x), not (dwells in the sea(x))))   Premiss
[3] | mammal(Moby Dick)   Assumption
[4] | if (mammal(Moby Dick), has red blood(Moby Dick))   ∀−, 1
[5] | has red blood(Moby Dick)   If−, 3,4
[6] | if (has red blood(Moby Dick), not (dwells in the sea(Moby Dick)))   ∀−, 2
[7] | not (dwells in the sea(Moby Dick))   If−, 5,6
[8] if (mammal(Moby Dick), not (dwells in the sea(Moby Dick)))   If+, 3-7
[9] everything:x (if (mammal(x), not (dwells in the sea(x))))   ∀+, 8
The restriction on ∀+ is observed, since line 8 does not depend upon line 3 (the only premiss or assumption containing 'Moby Dick'), but only upon lines 1 and 2. The essential work in this deduction is done by the two rules for 'if', and thus that pattern of syllogistic argument is justified by the much more general principle of the transitivity of 'if'. This illustrates a pervasive feature of Frege's logic, to which his treatment of count nouns makes a major contribution, that it re-unites the logic of propositions and the logic of predicates or concepts, which had been separate branches of traditional logic. Naturally this unification of logic together with the possibility of justifying syllogistic patterns of argument by appeal to more basic steps have been feathers in Frege's cap and have commended his categorization of count nouns.
A final advantage which has been claimed for treating count nouns as the operators of schemas is that it dissolves the traditional philosophical problem of universals (see Dummett, 1973, pp. 174-9). That problem is not raised solely by count nouns but, as only they concern us here, I shall not go further afield for examples. Briefly, then, the problem is how a property (universal) can inhere in different individuals (particulars): if, in the proposition 'William is a priest', 'William' names an individual and 'priest' names a property, viz. priesthood, it appears that the proposition only succeeds in listing individual and property, without binding them together. And if it be replied that the binding element is the copula 'is',
then we still require an explanation of what, precisely, it effects: how (in what manner) does it inter-relate individual and property? But if, by contrast, 'priest' is the operator of a schema 'ξ is a priest', of category S(N), whose meaning is not exponible by separating the copula from the noun, then no 'linguistic glue' is required to hold 'William' and the schema together in the completed proposition. Moreover, if we do not allow ourselves to say that the schema names the property of priesthood (except, perhaps, as a kind of shorthand), but say instead that it signifies being a priest, then our very turn of phrase mirrors, so far as is possible without invoking devices like schematic symbols, the incompleteness of the schema. Frege, of course, wanted to go further than this and claim that schemas signify functions of various kinds, but his resolution of the problem of universals seems not to depend upon regarding them as function names. It is enough that we do not regard a property as an abstract object but, rather, as separable from an individual only in thought. Thus the property of being a priest is not isolable in some noetic heaven, but is only to be found in individual priests, such as William, and the proposition 'William is a priest' signifies William being a priest. There is a close parallel between this notion of a property and Aristotelian forms, which cannot exist separated from matter and are also invoked in order to explain the meanings of count nouns. So perhaps Frege's dissolution of the problem of universals is original more in its premisses than in its conclusion.
5.2 DIFFICULTIES IN FREGE'S VIEW
The main difficulty in Frege's assimilation of count nouns to intransitive verbs is posed by negation. If the subject-term of (3) is, from a logical point of view, an expression of category S(N), then it should make sense to negate it. In terms of the paraphrase offered by Frege, given that 'If anything is a mammal, then it has red blood' makes sense, 'If anything is not a mammal, then it has red blood' must also make sense. But how are we to 'convert' the latter into a syntactic form comparable to (3)? The only possibility which suggests itself is (6)
Every non-mammal has red blood.
It is natural to suppose that this means that every animal that is not a mammal has red blood, if only because having red blood or not having red blood cannot be sensibly attributed to plants or to inanimate things. But this is not how Frege would have us interpret it. He is quite serious about the 'anything' in his paraphrase of (3): it is to mean absolutely
anything, with this proviso only, that it must always be possible to give any of the things in question a proper name. The reason for this requirement is that the simplest way of obtaining propositions from the schemas 'ξ is a mammal' and 'ξ has red blood' is by substituting proper names for the schematic letters; hence when either is taken as the operand of 'everything' or 'anything', the things in question must be such as could bear proper names. And these, Frege calls 'objects'.
It is perhaps a relatively trivial objection that there is no natural way of representing a negated subject-term in English syntax so that the proposition will be readily understood as Frege would wish. For the prefix 'non-' customarily negates only part of the meaning of the noun to which it is attached. You would not call your dog a non-Catholic, nor your motor car a non-juror. A non-Catholic is normally understood to be a Christian who is not a Catholic, or at least a person who is not so, while a non-juror was a beneficed clergyman in Britain who refused to take the oath of allegiance to William and Mary. In general, the prefix 'non-' is applicable to nouns whose meaning can be expounded in the form 'A that F', so that, if 'B' be such a noun, 'non-B' will mean 'A that not F'. 'A' will always be another count noun, 'F' some verb phrase which may or may not contain a noun. Thus the natural interpretation of (6), mentioned above, construes 'non-mammal' as equivalent to 'animal that is not a mammal'.4
An apparently slight extension to this account of 'non-' will accommodate Frege's analysis. If the meaning of a count noun 'A' can be expounded as 'B that F',5 then we can ask of 'B', in turn, whether its meaning can be expounded by a similar phrase. If so, then the question may be repeated; but eventually we shall reach the end of the line and the answer will be 'No'. Frege's position would then be that the end of the line will always be 'object', whatever noun we start with. That is to say, 'If anything is a mammal, then it has red blood' means, more exactly, 'If any object is a mammal, then it has red blood', for, as we have seen, 'anything' does not mean literally anything (for example patience), but only what can be the bearer of a proper name. So the contention will be that any count noun can be expounded as meaning 'object that F', but that 'object' itself cannot be expounded in this manner. Hence, although we tend to construe expressions consisting of a count noun prefixed by 'non-' in a more restrictive way, that is, taking a much less general term
5
These expressions can often be used as adjectives as well as nouns, and for that reason are also called 'absolute adjectives'. Another example is 'socialist'; non-socialists can be counted because a socialist is a person who holds socialist principles, whereas a nonsocialist is still a person, though now not holding socialist principles. Pace Geach, whose view was noted in section 4.3; I shall expound Geach's reasoning and reply to it in section 7.2.
184
Basic categories: count nouns
than 'object' as the 'B' of 'B that not F', Frege's analysis merely pushes this interpretation to the limit and we could quite legitimately specify that 'non-' is to be so understood. If 'object' is the ultimate count noun, then at least we must record that it appears to be a very different kind of noun from 'animal', 'mammal' and so on. On the face of it, the most general count noun in the series 'mammal', 'animal', . . ., would appear to be 'body', in the Newtonian sense in which a body must have extension and mass and occupy a place, and may be animate or inanimate but is always concrete and never abstract. Yet, on reflexion, we may hesitate to commit ourselves so far. Is 'body' specific enough to give us a criterion for counting? Suppose, for example, that I point at the desk at which I am now writing and ask: 'How many bodies are there here?' It so happens that the top of the desk will lift off the frame containing the legs; moreover, under the lid there are four drawers, each of which will pull out. Is this then one body, or six, or perhaps as many as there are pieces of wood in the desk, for why should the fact that some pieces are permanently attached to each other while others are not, mark a difference of principle for counting? While we are about it, too, we must not forget the piece of leather on the writing surface, nor the hinges, nails and screws. By this time it should be clear that there is no determinate answer to our question, by contrast with such questions as 'How many desks are there here?' or 'How many pieces of wood are there here?' In other words, 'body' (in its Newtonian sense) is not a count noun, and so cannot serve as a substitution for 'B' in 'any B that F \ The 'end of the line' is already reached at a much more specific point, probably with 'organism'. In philosophy today, 'object' (sometimes 'material object') is used as equivalent to 'body' in the Newtonian sense. The Fregean sense of 'object' is much wider: every body would indeed be a Fregean object, but not conversely, since Frege allowed abstract objects such as directions and numbers, explicitly inveighing against those who would restrict objects to the perceptible (for example 1893, Introduction, p. xiii). He even laid down a procedure for the creation of new objects (more exactly, new types of object), which was primarily intended to cater for abstract objects (1884, sections 62-9; 1903, sections 138-47). In view of all this, there is a substantial difficulty in attaching any meaning to phrases like 'any object that is not a mammal', for how are we to determine of anything whether or not it is a non-mammalian object? In asking whether it could be given a proper name, we cannot appeal to a predetermined stock of expressions which are available as proper names, but have to conjure with the possibility of defining new ones with the aid of abstract nouns. In addition to this, to speak oVevery object that is not a mammal'
Difficulties in Frege's view
185
implies a totality of non-mammalian objects, which would be even more impossible to specify (see Wittgenstein, 1969, II, section 7). Even without the complication introduced by Frege's procedure for creating new objects, though, 'every object that is not a mammal' would still be uninterpretable. The reason for this is that 'everything that can be given a proper name' remains indeterminate even supposing proper names to be restricted, say, to proper nouns. Thus, given a pile of shoes, I might assign a proper name to each shoe, or to each pair of shoes, or perhaps one name to each shoe and another to its lace (if it had one), and so on. The totality of objects which are not mammals would differ from one type of baptism to another. Moreover, Frege would be the first to agree: he was always insisting that objects fall under concepts, so that shoe, pair of shoes and shoelace (or pair of shoelaces) would all be bona fide concepts in his book, each determining a corresponding type of object. Indeed, in his view a set can only be introduced as the extension of a concept, and a totality can be seen as a set. The difficulty is not removed even by imposing the still more drastic restriction that only bodies are to count as objects, thus excluding all abstract objects, for 'body', too, does not provide us with a principle for counting. 6 Frege's use of unrestricted quantification is thus inconsistent with his own principles. Indeed, there is worse, a vicious circularity at the heart of his system. For he attempts to take both proper names and count nouns as basic, and yet simultaneously to make each, though in a different way, dependent upon the other. In his structural analyses, as we have seen, proper names are basic and count nouns dependent. Count nouns are represented by schemas which, when one proper name is supplied, will yield propositions, that is, by schemas of category S(N). But, at the same time, the bearers of proper names are objects and objects are countable, which demands that they be describable by some count noun, since we must be able to answer the question: 'One (two, etc.) whatT It has been disputed to what extent Frege committed himself to the possibility of spelling out in words the meaning (Sinn1) of a proper name. 6
7
6 Perhaps someone will object to the preceding argument: but we know that (6) is false, since we can cite some non-mammals which do not have red blood, such as squids; hence it cannot lack a sense. The reply, of course, is that the objector is tacitly assuming 'non-mammal' to mean 'animal which is not a mammal' and not trying to give it a Fregean sense as 'object which is not a mammal'.
7 'Sinn' in Frege's writings is customarily translated into English as 'sense', because he distinguishes the Sinn of an expression from its Bedeutung and the latter is the ordinary German word for 'meaning'. But he holds that the Bedeutung of a proper name is its bearer, which is certainly not its meaning (cf. Wittgenstein, 1953, 1.40); so what corresponds in Frege to the way in which I have been using 'meaning' is 'Sinn' and not 'Bedeutung'.
The occasional examples which he gave do at least imply, though, that one essential for understanding a proper name is to know what kind of thing it is a name for, that is, what kind of thing it would name, if it had a bearer. And this will be given by a count noun. Whether or not Frege held this view (though I think he did), it appears to be correct. We do not, perhaps, always advert to it because there are conventions that certain types of name are usually bestowed upon certain kinds of thing, for example names of people and of places. It strikes us only when we hear an unfamiliar proper name in a context from which we cannot guess the kind of bearer which it is supposed to have, and have to ask in order to be able to use it sensibly ourselves. From this it follows that, in order to explain the meaning of a proper name, we have to appeal to a count noun, which makes some count nouns, at least, more basic than proper names from a semantic point of view.

A logician might attempt to extricate Frege from this vicious circle by urging that his theory makes proper names syntactically prior to count nouns, but count nouns semantically prior to proper names, and that these two theses are not necessarily incompatible. I do not think that this defence can stand. In this context, we can only interpret 'syntactically' as equivalent to 'structurally'; it does not contrast with 'semantically' as syntax contrasts with semantics when the former is concerned with the borderline between what is correct and what is not correct in a particular language or for languages as a whole (universal syntax). Ex hypothesi, the kind of structure that Frege investigated was that relevant to inference and, eventually, to meaning; so an account of linguistic structure in which proper name is a basic category must imply that proper names are basic with respect to meaning. Consequently, the circularity at the heart of Frege's system is vicious, for it removes any starting-point for a theory of meaning.

Enough has now been said to call into serious question Frege's application of his ideography to everyday language. By way of corollaries, three further problems which it raises may be mentioned briefly.8

8 A further possible objection is spurious: that Frege's assignment of count nouns to category S(N) licenses expressions like 'Every dances man'. What it licenses is, rather, 'Everything, if it dances, is a man'; English syntax requires a count noun (phrase) after 'every', so we are allowed to turn 'Everything, if it is a man, dances' into 'Every man dances'. By the same token, then, we should be allowed to turn 'Everything, if it dances, is a man' into 'Every dancer is a man'.

The first concerns adjectives and adverbs. Traditional syntax presents adjectives to us as qualifying nouns and adverbs as qualifying verbs, so that the distinction between adjective and adverb is founded upon that between noun and verb. With the assimilation of count nouns to intransitive verbs, one would expect some revision of the adjective/adverb distinction to be necessary, adjectives qualifying count nouns, at any rate, being categorized together with adverbs which can qualify intransitive verbs. Now on Fregean principles, adverbs which qualify verbs as distinct from whole propositions must be assigned to mixed-level categories which are partly of second level. Frege did not himself discuss adverbs or words like 'very' which qualify adverbs, so we have to ask what resources are available in his ideography for representing them. To begin, we should note that there are two kinds of adverb (see Geach, 1970, pp. 9-10). The first kind modifies the meanings of entire propositions, for instance 'probably', 'possibly', 'certainly'. In English these adverbs can occur just as naturally at the beginning of a sentence as just after the verb, and they all qualify the truth of the proposition in some way, saying, for example, that it is probably true, possibly true or certainly true. There seems, accordingly, to be no problem about assigning them with 'not' to category S(S).9 But there is also another kind of adverb which does not modify the meaning of a whole proposition, but only that of the verb. 'Strongly' in
(7) Fido smells strongly
is an example, for it cannot be paraphrased as: 'It is very strongly true that Fido smells'. So 'strongly' cannot be assigned to category S(S), yet, if we are to accommodate it within Frege's ideography, it must be the operator of a proposition schema. A solution to this problem is available within Frege's ideography. In the phrase 'smells strongly', 'smells' is assigned as the operator of a schema of category S(N), and 'strongly' as the operator of a schema of category S(N,S(N)) or S(S(N),N) - the order of the operands in the category name does not matter - so that, if we supply 'smells' as the second operand of that schema, we shall be left with a schema of category S(N), since the first operand of the original schema has not been supplied. Thus the Fregean schema for 'strongly' will be 'strongly:x (φx, a)'. Frege's insistence that the composition in a proposition is between operands and the pattern in which they are arranged, the latter represented in a proposition schema, thus leads us rapidly into highly complex structures once we go beyond the categories represented in first-order logic.

9 This is the category to which 'necessarily' and 'possibly' are (implicitly) assigned in modal logic, in which 'necessarily p' and 'possibly p' are formulas, that is, propositional schemas.
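By way of illustration only, the following minimal sketch (mine, in Python; the extensions REEKS and SMELLY are invented) mimics these category assignments, treating a schema of category S(N) as a function from names to truth-values and 'strongly' as an operator which consumes such a schema together with a name. Supplying the schema alone leaves a schema of category S(N), just as described above.

```python
REEKS = {"Fido"}             # hypothetical: things that smell strongly
SMELLY = {"Fido", "Rover"}   # hypothetical: things that smell at all

def smells(x):               # category S(N): a name yields a proposition
    return x in SMELLY

def strongly(phi, x):        # category S(N,S(N)): operands of S(N) and N
    # crude stand-in semantics, for illustration only
    return phi(x) and x in REEKS

def smells_strongly(x):      # category S(N): only the S(N) operand supplied
    return strongly(smells, x)

print(smells_strongly("Fido"))   # True
print(smells_strongly("Rover"))  # False
```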
Now, if count nouns belong to category S(N), adjectives which qualify them should presumably belong, with adverbs, to category S(N,S(N)) also. Yet it seems that this is not what Frege intended. One can only say 'seems' at this point, because no systematic treatment of adjectives is provided in his work. There are only minor indications, such as the citation of 'green' in a context which suggests that it is to be placed in the same category as count nouns (1892, p. 193). This, at any rate, is the procedure adopted in virtually every introductory book on modern logic nowadays. For example,
(8) There is a green suitcase in the attic
would be analysed via the paraphrase
(8') Something which is both green and is a suitcase is in the attic,
which would be represented in the linear notation by:
(8F) something:x (and (and (is a suitcase (x), is green (x)), is in the attic (x))).
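The nesting of (8F) can be made vivid by a small sketch (again mine, over an invented toy domain), with each first-level schema a predicate and 'something' an existential sweep over the domain:

```python
GREEN = {"case1"}                       # hypothetical extensions
SUITCASES = {"case1", "case2"}
IN_ATTIC = {"case1", "trunk1"}
DOMAIN = GREEN | SUITCASES | IN_ATTIC

is_green = lambda x: x in GREEN         # category S(N)
is_suitcase = lambda x: x in SUITCASES  # category S(N)
is_in_attic = lambda x: x in IN_ATTIC   # category S(N)
conj = lambda p, q: p and q             # 'and'

# something:x (and (and (is a suitcase (x), is green (x)), is in the attic (x)))
prop_8F = any(conj(conj(is_suitcase(x), is_green(x)), is_in_attic(x))
              for x in DOMAIN)
print(prop_8F)                          # True on this toy domain
```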
On this analysis, we could validly infer from (8'):
(9) Something which is green is in the attic
and, if this makes sense, so should
(10) Something which is not green is in the attic.
The difficulty previously raised with regard to (6) applies also to (10) and, in this adjectival case, even to (9), because 'green' does not provide a criterion for counting. This account of adjectives presents no new problem that is not already raised by the Fregean treatment of count nouns. It merely extends the cases in which the intelligibility of the analysis depends upon the possibility of enumerating objects as such to positive as well as negative ones.

But the analysis is also suspect upon quite independent grounds. It destroys, of course, any parallel between adjectives and adverbs, since it assigns the former to the first-level category S(N). Perhaps, though, there is nothing sacrosanct about that parallel, anyway: the analogy may be just a syntactic feature which has no semantic parallel. A more serious criticism, though, which tells in favour of the analogy, is that it is semantically implausible to treat all noun phrases composed of an adjective and a count noun as disguised conjunctions. Some adjectives, for example, have the effect of cancelling part of the meaning of any count noun which they qualify. Thus a toy train is not a train, and so we cannot expound the meaning of 'Toy trains are expensive' as 'Whatever both is a toy and is a train, is expensive'. This does not show that conjunction is to be excluded from the analysis of every noun phrase containing an adjective, but it argues for a structure which represents the adjective as qualifying the noun directly, which yet allows for further expansion, in some cases, in which conjunction could appear. But it is unlikely that all adjectives will belong to the same semantic category and, indeed, Frege himself argued that numerical adjectives belong to the second level (see 1884, section 46).

The general assimilation of count nouns, adjectives and intransitive verbs is intuitively unconvincing. The traditional syntactic distinction corresponds roughly (though only roughly) to a distinction of meaning. Typically, a count noun tells us what kind of thing is in question, an adjective one of its qualities, while an intransitive verb tells us something which it does or undergoes. There is also a difference in permanence between these three, and in how closely they are bound up with the very existence of the thing in question. It cannot change its kind at all, whereas many of the things it does or undergoes are transient and have little effect on what it is; qualities lie between these extremes, themselves on a scale. As Wittgenstein wrote of Russell's logic:

the old logic contains more convention and physics than has been realised. If a noun is the name of a body, a verb is to denote its movement, and an adjective to denote a property of a body, it is easy to see how much that logic presupposes; and it is reasonable to conjecture that those original presuppositions go still deeper into the application of the words, and the logic of propositions. (1969, p. 204)

Yet the syntactic distinction only points to these differences; it does not encapsulate them. Thus instead of saying that an animal is sleeping (using an intransitive verb), we can say that it is asleep (using an adjective). Again, there is a whole group of count nouns ending in '-er' and '-or', such as 'actor', whose meaning would clearly be given in part by a verb phrase. A logic which can make no distinction between kinds, qualities and doings or undergoings, however, must at the very least be seriously deficient, in urgent need of supplementation if not of emendation.

The second corollary concerns temporal qualifications. As was mentioned earlier, when Aristotle distinguished between ὄνομα and ῥῆμα, he said that tense attaches to the latter. By implication, therefore, nouns or names do not carry any temporal qualification. But if count nouns belong to the same category as intransitive verbs, it should be possible to qualify them with respect to time. So, if Frege's analysis of propositions like (1) is correct, it ought to make sense to say, not only 'If anything is a mammal, then it has red blood', but 'If anything was a mammal, then it will have red blood' and 'If anything will be a mammal, then it had red blood'.
Perhaps, with ingenuity, these sentences could be interpreted, but they pose fairly obvious difficulties to the would-be interpreter. As they stand, however, they constitute no direct problem for Frege, since his view of truth as timeless requires that tenses be eliminated in favour of dates. But they can easily be amended in order to meet this requirement, for example 'If anything is a mammal at 12 noon GMT on 1 January 1990, then it has red blood at 12 noon GMT on 1 January 1890'. This sounds very exact, as though, indeed, its truth conditions were quite determinate. Suppose, however, that a mammal from 1990 did not yet exist in 1890: can we say either that it did or did not have red blood at that time? The question does not arise; of course, we can stipulate that it did not, and that might be quite legitimate in a formal system, but if we are investigating everyday language it is not open to us to stipulate what expressions shall mean. So no gain in intelligibility is achieved by replacing tenses with dates.

The possibility of attaching a temporal qualification to a count noun has nevertheless one attraction. We have a certain number of temporal adjectives available for this very purpose, notably 'past' and 'future', as in 'past president' and 'future students'. Surely a past president is one who was president (or who presided) and future students are those who will be students (or will be studying), so an analysis in which these adjectives are expounded by quite straightforward tenses is exactly what we want? Indeed it is, but the correct parallel here is with our earlier discussion of the prefix 'non-'. 'President' and 'student' are count nouns whose meaning can be expounded in a phrase of the form 'A that F', so 'past' and 'future' could be incorporated into the account as tenses in the substitution for 'F', leaving a count noun ('person' in the two examples given) to be substituted for 'A' which had no temporal qualification. Thus, if 'The past president will now address the meeting' is paraphrased as 'The person that was president will now address the meeting', we do not have to go on to enquire whether past, present or future persons are meant: that information is already encoded in the 'that was' and 'will now', and 'person' neither has a specific tense nor yet is omnitemporal. It gives us a kind, and nothing more.

It would, of course, always be possible to avoid this whole difficulty about tense by sub-categorization. Thus, supposing for the sake of argument that tense were assigned, as in standard tense-logic, to category S(S), and that we introduce a feature TM for temporal expressions, then the type of tense operators would be S(S/TM+) and that of intransitive verbs S/TM+(N), whereas count nouns would be of type S/TM-(N). This would prevent count nouns from being used as the operand of a tense operator.
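A rough sketch of such feature-checking (my own illustration of the TM proposal, not anything in Frege's ideography):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cat:
    result: str      # category of the whole once the operand is supplied
    operand: str     # category demanded of the operand, features included

def apply(op: Cat, arg: str) -> str:
    if op.operand != arg:
        raise TypeError(f"wanted operand {op.operand}, got {arg}")
    return op.result

PAST = Cat('S', 'S/TM+')     # tense operator: S(S/TM+)
SLEEPS = Cat('S/TM+', 'N')   # intransitive verb: S/TM+(N)
HORSE = Cat('S/TM-', 'N')    # count noun: S/TM-(N)

print(apply(PAST, apply(SLEEPS, 'N')))   # 'S': tensed verb licensed
try:
    apply(PAST, apply(HORSE, 'N'))       # tense on a count noun is blocked
except TypeError as e:
    print(e)
```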
But Frege's own ideography did not include any provision for sub-categorization; moreover, if, as I suggested in my discussion of sub-categorization in section 4.5, its purpose is to avoid mismatches of subject-matter, then it does not seem an appropriate resort in the present difficulty. For the difference between count nouns and intransitive verbs cuts right across any distinction of subject-matter, and most of the things described by count nouns are subject to time and change.

As a final corollary, preliminary mention should be made of the type of proposition called 'indefinite' in traditional logic. An example cited by Frege is
(11) Horses are herbivorous animals
or 'The horse is a herbivore'. He treats this in exactly the same way as (1), offering as a paraphrase: 'If something is a horse, then it is a herbivorous animal' (1969, p. 104; 1979, p. 95). Another, closely parallel example is 'the horse is a four-legged animal', which, he says, 'is probably best regarded as expressing a universal judgment, say "all horses are four-legged animals" or "all properly constituted horses are four-legged animals"' (1892, p. 196). Frege is probably justified in treating this, (11) and (1) alike, for 'Mammals have red blood' or 'The mammal has red blood' would have done just as well as the latter. But the truth conditions imposed by his analysis are then not correct for this type of proposition, diverging in two directions from what is required.

Schema (1F) represents a proposition which is contingently true (or false, as the case may be). The reason for this is that Frege's explanation of the meaning of 'if' in this context is as follows: any proposition obtained from the schema 'if P then Q' is to be false just in case the proposition substituted for 'P' is true but that substituted for 'Q' is false. If we apply this first to, for example, 'If Neddy is a horse, then Neddy is herbivorous', the latter is to mean the same as 'It is not both the case that Neddy is a horse and Neddy is not herbivorous'. There is nothing here to suggest that it is in the nature of a horse to be herbivorous: the proposition could be true even though every other horse were carnivorous. It just has to be a matter of fact that if Neddy is a horse, then Neddy is herbivorous, although it might quite conceivably have been otherwise. The generalization to 'If something is a horse, then it is herbivorous' is then equally a simple matter of fact, stating no more than that it just happens to be so. In order to bring out more clearly what is at issue here, (11) may be compared with
(12) Everyone in this room is sitting down.
This is the kind of proposition that Frege's analysis suits, for it is evident that, if true, it just happens to be so; nor is there any possibility of expressing it in a form like that of (11). The truth conditions provided by Frege's analysis are thus in one way too weak for propositions such as (1) and (11); but in another way they are also too strong. Thus, it is not necessary to the truth of (11) that every horse should be a herbivore. If some horse-breeder gradually induced a taste for meat in his horses, (11) would still be true, just as it is true in the sense intended that men can see although some are blind, or that men have two legs although some have lost one or both or perhaps, even, like thalidomide victims, been born without them. On the other hand, we could not truly say 'The horse is a herbivore' if it were merely that most horses were herbivores. In sum, 'every, most, some' is the wrong dimension for expressing the intended connexion.

As (12) shows, schema (1F) may have some applications to everyday language, so the purpose of discussing indefinite propositions at this stage has only been to show that Frege's application of his ideography does not represent their truth conditions correctly. His choice of examples for (1F) was unfortunate, as they turn out to be types of proposition for which he cannot offer any adequate analysis.

Because singular and plural are usually interchangeable in indefinite propositions, Frege also discusses 'the Turk besieged Vienna' in this context, but suggests a totally different analysis. There, he says, 'it is clear that . . . "the Turk" is the proper name of a people' (1892, p. 196). On this account, 'The Turk besieged Vienna' would have the same semantic structure as, for example, 'Napoleon besieged Vienna'. But it is far from clear that 'The Turk' is the proper name of a people, the truth conditions of this example being exponible in terms of a group of individual Turks - admittedly, also constituting an army under the aegis of the Turkish government - besieging Vienna, whereas it certainly does not mean that the entire Turkish people besieged Vienna. However, the very truth conditions just outlined show that the example does not belong together with the indefinite propositions discussed above.

Dummett, however, applies Frege's proposal regarding this example to cases like (11), claiming that words for kinds of organism are sometimes used as proper names (1973, p. 144). Thus, in 'The horse is a herbivore', 'the horse' would be a proper name of a race of animals, that is, animals descended from a common stock. But what, then, is the logical relationship between 'the horse' as an expression of category N and the schema 'ξ is a horse' of category S(N)? And how should we represent the form of the argument:
The horse is a herbivore;
Neddy is a horse;
Ergo: Neddy is a herbivore?

Dummett agrees that there must be a connexion between proper name and predicate, but offers no explanation of it.

Geach proposes a variant of this solution. For him, many count nouns10 can be used as names (though not proper names), both in acts of naming and also as 'logical subjects', that is, operands of a basic category other than S, in propositions. He cites three types of example: first, where the count noun is combined with a demonstrative, as in 'That woman is an architect' (1962, section 32); second, where we tell a story about, say, a cat, though no particular cat (ibid., section 34); third, where the count noun is combined with an applicative (quantifier) (ibid., section 105). In the first case he thinks that the count noun occurs as part of an act of naming, in the second that 'cat' names any and every cat indifferently, but when it comes to the third case, draws a distinction based upon Aquinas (Summa theologiae 1.13.12; 1.29.4 ad 1). A proposition like 'A fish swims in the sea', he assimilates to his second case: 'fish' is here 'a subject of predication and relates to the objects . . . called "fish"'. By contrast, 'if I say "A dolphin is not a fish", my proposition relates to no individual fish - but rather to the nature of fish'. Geach does not make it clear whether or not he regards 'fish' as a name in this context.

Translating all this into my terminology, it seems that Geach wishes to assign count nouns in some of their uses to a basic category, albeit not that of proper names. But if they can also be used as expressions of category S(N), that is predicatively, then a fortiori they will have different meanings in each category and we need an explanation of how the two are related. If, on the other hand, they can only be used as expressions of a new basic category, then we need an explanation of how this category is related to category N, that of proper names. Geach does indeed argue that the use of certain count nouns underlies that of proper names (ibid., section 34), so he owes us an explanation on this score.

10 He calls them substantival terms, those count nouns 'A' for which 'the same A' supplies a criterion of identity, that is, a means of judging when one encounters the same A again (1962, section 31).
5.3 COUNT NOUNS AS A BASIC CATEGORY

Lewis (1970) was one of the first to propose, within the context of a categorial grammar, a distinction between the two basic categories of
proper name (N) and common noun (C). Thus he assigns the count noun 'pig' to category C and the adjective 'yellow' to category C(C), but 'Porky' to category N. Quantifiers (among which he includes the definite article) are assigned to category S(S(N),C),11 so that, when 'every', 'a' or 'the' are combined with a common noun, the resulting quantifying expression is of category S(S(N)) and thus able to combine with intransitive verbs like 'grunts', assigned as usual to category S(N). There is also a facility for forming complex common nouns of the form 'A that F', 'that' ('which') being assigned to category C(C,S(N)). Unfortunately Lewis seems not to have considered propositions like (11), which he would only be able to parse as on all fours with (12). Nor does he give any examples to show us how he would deal with apparently predicative uses of count nouns, as in 'If Porky is a pig, then he can grunt', or 'Porky is a black pig'. He assigns 'is' to the same category as transitive verbs, S(N,N), but that, of course, will not allow it to combine with a count noun, so his only resort would be the desperate one of treating the indefinite article in these two examples as a quantifier.12 Finally, the categories of common noun and proper name are wholly separate, with no semantic connexion established between them.

11 Here and subsequently I give the nearest Fregean category; actually Lewis uses Ajdukiewicz-style categories which have no exact Fregean equivalent. But for present purposes the difference does not matter.
12 Desperate, because if 'Socrates is a philosopher' is to be analysed as 'Some philosopher:x (Socrates is x)', then 'Socrates became a philosopher' will be analysed as 'Some philosopher:x (Socrates became x)' and it should then make sense to ask: 'So which philosopher did he become?' Yet this analysis has actually been proposed, for example by Montague (1974, p. 213).
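A minimal sketch (my reconstruction in Python, not Lewis's own formalism) shows how these assignments let 'Every pig grunts' compose to a proposition while 'every Porky' is blocked; a category is modelled as a result paired with the operands still wanted:

```python
def supply(cat, operand):
    """Supply one operand to a category (result, [operands still wanted])."""
    result, operands = cat
    rest = list(operands)
    rest.remove(operand)             # ValueError if the operand is not wanted
    return result if not rest else (result, rest)

GRUNTS = ('S', ['N'])                # intransitive verb: S(N)
EVERY = ('S', [('S', ['N']), 'C'])   # quantifier: S(S(N),C)

every_pig = supply(EVERY, 'C')       # ('S', [('S', ['N'])]), i.e. S(S(N))
sentence = supply(every_pig, GRUNTS) # 'S': a proposition
print(sentence)

# 'Porky' is of category N, not C, so supply(EVERY, 'N') raises ValueError:
# 'every Porky' does not compose.
```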
Gupta (1980) developed a logical system in which common nouns are distinct from predicates. As he allows schemas as well as propositions to count as formulas, some corrections to his formation rules would be necessary before assigning the symbols of his system to Fregean categories. Assuming this done, and using C again for the category of common noun, he assigns the quantifiers to the same category as Lewis, but the definite article to category N(C). He also provides the same facility as Lewis for forming complex common nouns by qualifying a common noun with a relative clause, but makes no provision for adjectives (1980, pp. 6-16). Again, he does not consider examples like (11) or apparently predicative uses of common nouns, while the categories of common noun and term (which includes proper names) remain unrelated.

Both of these authors in any case cast their nets wider than we need to do, since many common nouns are not count nouns. But neither they nor Dummett nor Geach afford us any help with the basic problem in
categorizing count nouns, namely, that, chameleon-like, they sometimes behave predicatively and sometimes not. Geach is the most aware of this tension, but does not resolve it. One possibility which they do not consider is that only some count nouns are to be assigned to a basic category, proper names being explained with reference to these, and the remaining count nouns introduced in the Fregean manner. That presupposes a distinction between two kinds of count noun, so how is it to be made out? Suggestions on this score have been made by Wiggins and by Dummett.

Wiggins first introduces the notion of a sortal as a concept which yields a principle of counting (1967, p. 1); thus the linguistic expression corresponding to a sortal will be a count noun or count noun phrase. He then distinguishes, among sortals, between substance-concepts and phase-sortals. Phase-sortals are concepts under which individuals fall only for part of their existence, whereas an individual which falls under a substance-concept does so for every moment of its existence (ibid., p. 7). Subsequently he characterizes substance-concepts in more detail, including, as an extra criterion, that if f is a substance-concept for a, then the proposition that a is not f is self-contradictory (ibid., p. 37). So 'miner' is a phase-sortal noun, because a man is not a miner for every moment of his existence, but 'spastic' is not, since people are born spastic. However, 'spastic' is not a substance-concept noun either, because there would be nothing contradictory in supposing someone who was in fact a spastic not to be such. Thus the effect of adding the extra criterion is that some sortals are neither substance-concepts nor phase-sortals and what started as an exhaustive distinction ceases to be so.

Dummett draws a comparable distinction between basic and derivative count nouns. The context is a discussion of Geach's views on identity, in which Frege's explanation of the meaning of the schema 'A is the same F as B' as 'A is an F and A is the same as B' is compared and contrasted with Geach's explanation of the meaning of the schema 'A is an F' as 'A is the same F as something'. (Proper names are to be substituted for each of 'A' and 'B' in these schemas, and a count noun for 'F'.) Dummett points out, first, that abstract nouns cannot easily be expounded in Geach's manner and, second, that the meanings of many concrete count nouns are learned without having to learn a special criterion of identity ('A is the same F as B') to go with them. Thus the meanings of 'spaniel' and 'collie' would normally be learned after that of 'dog', so that, if we already understand 'A is the same dog as B', it is only necessary to learn the features which distinguish spaniels, on the one hand, and collies, on the other, from the remaining species of dog. Those count nouns whose meanings can be
explained without recourse to a special criterion of identity are, by contrast, derivative (1981a, pp. 202ff.).

Basic count nouns, as Dummett understands them, do not contrast solely with derivative ones. They also contrast with those which can be expounded by means of Frege's procedure for the creation of new objects. This requires, as a starting-point, an equivalence relation defined over a given domain, which is then transformed into an identity statement containing the new count noun. Examples appear to be exclusively of abstract count nouns; thus Frege himself illustrates it by the equivalence relation expressed in 'line A is parallel with line B', yielding 'the direction of line A = the direction of line B', the new count noun thus introduced being 'direction' (1884, section 65). Dummett's point is that this presupposes a domain with an associated criterion of identity: in the example, lines constitute the domain and we must already know under what circumstances A is the same line as B. Thus either the count noun 'line' cannot itself be introduced by this procedure or, if it is, we must eventually come to some count noun that is not. (One may have doubts whether 'line' is a count noun, but that does not affect the general argument.)

Dummett's distinction is less satisfactory for present purposes than Wiggins's, because it appears to have a psychological element which could give rise to endless disputes. Are the meanings of 'chimpanzee' and 'gorilla', for example, normally learned after that of 'monkey', or might somebody learn to identify gorillas and only later learn that they were monkeys? The spaniel and collie example seems to derive a lot of its force from the contingent fact that we live in a society in which there are many species of dog as well as many mongrel dogs around. If, on the other hand, the point is that spaniels and collies are species of dog, then dogs and monkeys are also species of mammal, mammals species of animals and animals species of organism, so we should end with only the names of summa genera as basic count nouns, which is neither what Dummett intends nor useful to semantic analysis.

Neither Wiggins nor Dummett addresses our question how count nouns should be categorized. Dummett, indeed, evades it. It is the practice of contemporary logicians, when using first-order logic, to specify a domain of objects over which the quantifiers may range. Thus they escape the charge of relying upon an ultimately unintelligible unrestricted quantification. Once the domain has been specified by means of a count noun with an associated criterion of identity, however, smaller parts of that domain can be delimited by count nouns analysed in the Fregean manner. There can be no objection to their negation, since the specification of the domain leaves it determinate. For example, if the
domain be that of animals, (6) has a determinate interpretation, since 'every non-mammal' now means 'every animal that is not a mammal'. So another way of characterizing basic count nouns will be as those which are necessary in order to specify a domain of quantification and which cannot be introduced within an already specified domain (see Dummett, 1981a, p. 212).

The main shortcoming of this account is that it simply by-passes the central structural question. Basic count nouns are withdrawn from the scope of structural representations and are provided for in a piece of preliminary stipulation which precedes any attempt at structural analysis. Thus, for example, if we stipulate a domain restricted to animals, schema (1F) will represent any proposition of the form 'If any animal is X, then it is P', but there is nothing in the representation to show us how the count noun 'animal' fits into its structure. Perhaps this may not matter in particular applications of first-order logic; it is, however, tantamount to abandoning the present enterprise.13

Could we, then, taking Wiggins's distinction in preference to Dummett's, assign his substance nouns to a basic category, perhaps on the lines of Lewis or Gupta, but assign other count nouns to a derived category, presumably the usual S(N)? There would be an initial difficulty in recognizing the type of count noun with which we were dealing, but perhaps that might be overcome by preparing a list of substance nouns which could be referred to as required. But then suppose that we have a proposition containing a count noun of the other type and that we want or need to give a componential analysis of its meaning. Thus we might be engaged upon a machine-translation project between two languages in one of which our count noun had no counterpart, so that it had to be rendered by a suitable combination of components in its meaning. What, then, if one of these components were a substance noun? We have, let us say, a proposition in which 'miner' occurs, but there is no word for 'miner' in the other language (perhaps it is the language of a people who have no mineral deposits). We can, however, translate 'man that excavates minerals for a living'. In that case, we shall need a semantic representation for the proposition which includes a componential analysis of the meaning of 'miner'.
13 These criticisms apply, mutatis mutandis, to many-sorted logics. Thus Wang (1952) offers two (equivalent) formulations of a many-sorted logic. In the first, link letters are distinguished into different types, which is effectively sub-categorization of category N; the count noun specifying each type would then have to be given in a preliminary stipulation outside the formal system itself. In the second formulation, the count nouns occur in the system as schemas of category S(N), but quantification is then unrestricted.
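Schematically, and only as a sketch (the decomposition table is invented, following the 'miner' example above), such a componential rendering might look like this:

```python
# hypothetical decompositions: noun -> (substance noun, differentia)
COMPONENTS = {
    'miner': ('man', 'excavates minerals for a living'),
}

def expound(noun: str) -> str:
    """Expand a non-substance noun into the form 'A that F'."""
    if noun in COMPONENTS:
        substance, differentia = COMPONENTS[noun]
        return f'{substance} that {differentia}'
    return noun            # substance nouns are left unanalysed

print(expound('miner'))    # 'man that excavates minerals for a living'
```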
This will present no difficulty provided that the original count noun and the substance noun which occurs in the analysis of its meaning both belong to the same category. Supposing that both belong to category S(N), then 'ζ is a miner' can be analysed as 'ζ is a man and ζ excavates minerals for a living'; supposing, however, that both belong to a basic category, say C, we could assign 'that' as the operator of a schema of category C(C,S(N)) so that 'man that excavates minerals for a living' will come out as an expression of category C. Now we can, of course, always devise a category which will form a schema of category S(N) from an expression of a basic category and another of category S(N). In the present case it will be category S(N,C,S(N)) and the corresponding schema would be 'ζ is a K that:x φ(x)' or, perhaps, 'ζ is a K and φ(ζ)'. But this could be achieved more simply by introducing an operator of category S(N,C) which would convert a substance noun to predicative use, because that is really all that the other schemas do. They tacitly resolve the original dilemma about categorization by allowing us to convert an expression of a basic category into one of the predicative category S(N). Admittedly they only allow this conversion in conjunction with another expression of category S(N), but is there any principle which would exclude conversion of a substance noun on its own? Moreover, if there are any predicative uses of count nouns, this would cater for them.
Let us press this argument one stage further. Given any proposition containing a non-substance noun in the exposition of whose meaning one would have to appeal to a substance noun,14 the latter could always be substituted for the former salva congruitate. Thus, if 'In the past, some miners were often underpaid' makes sense, so does 'In the past, some men were often underpaid'. Now, ex hypothesi, 'miners' in the foregoing proposition is an expression of category S(N). Does 'men' in the corresponding proposition then occur as an expression of a basic category or is it used there predicatively? If the former, then the second proposition has a different semantic structure from the first, which is counter-intuitive. If the latter, though, how is unrestricted quantification to be avoided? The difficulty is immediately removed if we suppose, instead, that both 'miners' and 'men' in the two propositions are expressions of a basic
category. But since there can be no argument but that non-substance nouns are sometimes used predicatively, we should need the operator of category S(N,C) to convert them to predicative use. Why should it not be extended to them, though? The question is then whether it should be applicable to substance nouns, for this will depend upon whether they, too, have genuine predicative uses.

14 Wiggins holds - I think - that this applies to all non-substance nouns; cf. 1967, p. 30, where he says that all phase-sortals are restrictions of underlying more general sortals. This is a piece of medieval terminology revived by Geach, who explains it by saying that any substantival term (count noun) whose meaning can be expounded in the form 'A that F' is restricted (1962, section 36).

Let us reconsider the Fregean assumption that all count nouns are (sometimes) used predicatively. This seems obvious to us today partly because Frege has taught us to see things his way: proper names have objects as their bearers, and objects fall under concepts, by which they are classified. So to say, for example, that
(13) Neddy is a horse,
is a very paradigm of predication, and one of the 'atomic' propositions upon which all the rest are built. This, I imagine, is what Wittgenstein meant by his comment 'Frege's 'Concept and Object' is the same as subject and predicate' (1969, p. 205), shocking as it would be to a Fregean conscious of his master's deliberate rejection of the subject/predicate distinction.

Yet although (13) bears all the grammatical marks of the simplest kind of predication, a doubt may be raised whether the matter is so straightforward when considered from the point of view of meaning. If it is true that Neddy is a horse, then it is false that Neddy is not a horse and, if false, then at least meaningful. But, given that Neddy is a horse, are we free to suppose that Neddy is not a horse, just as we are free to suppose that Neddy is sleeping although in fact Neddy is cantering round a paddock? It is the same Neddy which is now cantering round the paddock and now sleeping, but could it be the same Neddy which is now a horse and now not a horse? We are sometimes invited to imagine such metamorphoses in fiction, but it is an open question whether we can consistently do so, whether enough is identified which persists through the changes to allow us to think of the same thing being there in different guises, or whether our enjoyment of the story rests upon an illusion of sense.

Even if a coherent account of metamorphoses could be given, however, it remains disputable whether what remains the same under such changes is the bearer of one of our customary proper names. The first step in explaining the meaning of 'Neddy' to one who was wholly ignorant of the use of that name in our example would be to tell him that it was the name of a horse. More generally, in real life Neddy cannot cease to be a horse without ceasing to exist. But that is what Wittgenstein would have called a grammatical remark: it shows us that being a horse is not like being
asleep nor even like having a chestnut coat, and does not tell us an interesting and curious fact about Neddy. It will, moreover, yield a criterion for substance nouns which will close the gap between them and phase-sortal nouns. Let us lay down that a count noun 'B' is a substance noun just in case, given any individual a which is B, if a ceases to be B, then ipso facto a ceases to exist.15 Then, for example, a man might cease to be a postman without ceasing to exist, might even cease to have a skill like typing, through accident or amnesia; and while his past cannot be taken away from him, we can easily conceive that it might have been very different, so that, although he is a past president of such-and-such a society until the day of his death, he might never have been president of it in the first place. With this qualification, which effectively restricts us to what something which comes into existence and ceases to exist has to be (not just is) throughout its existence, we have delimited a group of count nouns which are essential for explaining the meanings of proper names of things which come into existence and cease to exist.16

It is necessary to insist that only count nouns which satisfy this criterion are in question, since other kinds of expression can also satisfy it. Differentiae are the most obvious example. If the meaning of 'horse' can be expounded as 'animal that F', for some substitution for 'F', then Neddy cannot cease to be that, either, without ceasing to exist. In addition, the possession of certain parts of the body, at least, is inalienable from an animal, while everything which comes into existence and can be described by a count noun will have some qualities which, though subject to change, cannot be totally absent. Neddy must have some shape, size and weight, and his coat must have some colour. But these qualities will typically be expressed either by adjectives or by verb phrases; if count nouns are sometimes obtainable from them by a recognized syntactic production, then we should have to exclude those, too.

15 Wiggins mentions this criterion as an alternative to those which he gives (1967, p. 30). Why he does not develop it is unclear. To my mind, it has the advantage of eliminating the temporal element in his definition of phase-sortals. It also reinforces the connexion between substance-concepts and Aristotle's notion of secondary substance, a connexion which is part of the intention behind Wiggins's distinction. Any of Dummett's basic count nouns will satisfy this criterion, but so, also, will some of his derivative ones. (A spaniel cannot cease to be a spaniel without ceasing to exist.)
16 The distinction between substance and phase-sortal nouns is not fixed forever by this criterion. Thus, supposing advances in genetic engineering, we could imagine that it became possible to change a person's sex completely, not just at the superficial level of present 'sex-change' operations. Then 'man' (in the sense of 'vir', not of 'homo') and 'woman' would no longer be substance nouns. But such cases will be very rare and thus not affect the general utility of the distinction.
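The criterion can be put schematically. In the following sketch (mine; the entries are invented illustrations), a count noun is tested by whether an individual could survive ceasing to fall under it:

```python
# Hypothetical data: can an individual which is B cease to be B yet go on existing?
CAN_SURVIVE_CEASING_TO_BE = {
    'horse': False,      # Neddy cannot cease to be a horse without ceasing to exist
    'spaniel': False,    # likewise (see footnote 15)
    'postman': True,     # a man may give up the job and carry on existing
    'president': True,   # a past president still exists
}

def is_substance_noun(b: str) -> bool:
    """'B' is a substance noun iff ceasing to be B is ceasing to exist."""
    return not CAN_SURVIVE_CEASING_TO_BE[b]

print(sorted(n for n in CAN_SURVIVE_CEASING_TO_BE if is_substance_noun(n)))
# ['horse', 'spaniel']
```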
Returning, now, to sentences like (13), must we deny that they are propositions: that is, deny that they are either true or false? Well, even granting that their use is to explain to someone, in part, the meaning of the proper name which occurs in the subject position, that they are preparatory to the everyday use of language (see Wittgenstein, 1953, 1.49), it is still possible for such an explanation to go wrong and be deemed false. Suppose, for example, that Neddy is actually a donkey and not a horse; then, notwithstanding that if he were a horse we could not conceive of him as a donkey (and conversely), it is still false to say that Neddy is a horse. It seems, then, that (13) is, after all, a proposition, albeit out-of-the-run, for at the same time it is dubious whether 'horse' occurs predicatively in it. This is a topic to which I shall return in section 6.3.

5.4 GENERIC PROPOSITIONS

The conclusions reached so far are, first, that all count nouns must be assigned to the same category; second, that a distinction can be drawn between two kinds of count noun, substance nouns and the rest; third, that, while non-substance nouns are sometimes used predicatively, one apparently predicative use of substance nouns turns out upon investigation not to be such; and, fourth, that, in order to explain the meaning of a proper name, one would have to appeal to a substance noun. We have also seen that if count nouns were assigned to a basic category, they might be converted to predicative use by an operator, whereas it is difficult to see how the converse could be effected. If we allow ourselves to be guided by meaning in structural analysis, as we surely should do when the structures in question are those which relate to meaning, then we must take substance nouns as prior to any proper names. This argues strongly for assigning them to a basic category and, hence, together with the other conclusions above, for assigning all count nouns to that category. But I propose to be rather more cautious at first, restricting the assignment to substance nouns and other count nouns whose meanings can be expounded in the form 'A that F' where 'A' is a substance noun. This will embrace the 'concrete' count nouns of linguists, but it will remain open to us to dispose of abstract count nouns, if there are any, by paraphrase.

This new basic category, then, will be one of names of kinds of substance (understanding 'kinds' widely to include the non-substance nouns). Clearly, we do not want to risk any confusion between expressions of this category and proper names. Nor, in speaking of names of kinds of substance, do I wish to suggest that these names have
bearers in the way that proper names do, or even in any analogous way. It is a very large assumption that proper names are paradigmatic of the rest of language (they may well be exceptional), but so commonplace that it has become necessary to state explicitly that one is not making it. If I use 'name' in this connexion, I am doing no more than someone wholly innocent of philosophical or linguistic theories who might say, for example, '"kangaroo" is the name of an animal' to mean no more than that the kangaroo is a species of animal. It is also liable to cause confusion today to use 'substance' in its Aristotelian sense, because a substance in the modern sense (for example plastic, wood) would be matter and not substance for Aristotle. The modern equivalent of Aristotle's '(material) substance' is, I think, 'body' in the sense that it bears in Newtonian mechanics, in which a body may be inanimate as well as animate, but must have extension. So I shall call the new category B, the category of names of kinds of body. Because of the important role of substance nouns in relation to proper names, it will be necessary to divide B into two sub-categories, corresponding to substance nouns according to the criterion laid down in section 5.3, on the one hand, and the remaining (phase-sortal) nouns, on the other.

The initial question which then faces us is whether expressions of category B are to be allowed to combine with those schemas which, in the Fregean system, are assigned to categories S(N), S(N,N), etc. At first sight, it would seem that no such combinations should be permitted, that expressions like 'animal is a horse', 'cow is a herbivore', 'mammal dwells in the sea' or 'doctor visited patient' do not form semantically coherent units.17 Yet that would leave us with just two alternatives, neither of which is attractive.

17 The reader is reminded that a semantically coherent unit does not have to be a proposition, but can be an expression of any category.

The first is to assign words like 'every', 'some' and numerical adjectives to category S(B,S(N)), thus making them dyadic. This alternative stays closest to Frege, for whom 'everything:x (if (F(x)), (G(x)))' and 'something:x (and (F(x)), (G(x)))' are both schemas of the dyadic category S(S(N),S(N)). But it retains the basic category of proper names without affording us any hint of how they are related to expressions of category B, although we know that substance nouns have to be invoked in explaining the meanings of proper names. Moreover, if we take a pair of examples like those cited earlier,

At least one patient was visited by every doctor
and

Every doctor visited at least one patient,

at an intuitive level both appear to contain a common core (both are about doctors visiting patients), which could be represented by 'doctor visited patient'. (On a Fregean analysis they do not contain any common core: this will be spelled out later.)

The second alternative is to introduce a category of operators which converts names of kinds of body into proper names, that is, a category N(B), and to assign 'every', 'some' and numerical adjectives to this category. This would indeed allow quantifying expressions to combine with first-level operators of categories S(N), S(N,N), etc., while at the same time preventing expressions of category B from doing so, but the price paid would be that quantifying expressions were included among proper names and thus were no longer of second level. And, with that, we should sacrifice quantification theory.

We have therefore to consider seriously the idea that the examples cited above should be deemed semantically coherent units, although not necessarily propositions. The simplest solution would certainly be to recategorize first-level operators previously assigned to the Fregean category S(N) to category S(B), and similarly with category S(N,N), etc. Nor is that absolutely out of the question. At the end of the last section, I cited an example of an 'indefinite' proposition for which Frege's analysis was incorrect. Where the difficulties which I raised have been noticed, one solution has been to compensate for the contingency expressed in Frege's analysis by adding a necessity operator, thus increasing the complexity of the structure. More generally, we might hold that (11) is suitably paraphrased as 'Whatever is a horse, is a herbivorous animal' and that we simply have a special schema of category S(S(N),S(N)) here, 'Whatever:x (F(x),G(x))', which will form a true proposition just in case it is in the nature of the thing described by its first operand to have the property described by the second. But this demands that the first operand does indeed describe something which can be deemed to have a nature, a demand which would not be satisfied so long as we are free to substitute any operator of category S(N) for 'F'. What, for example, is the nature of being next to Fido?

Now I noted in section 5.2 that (11) might be paraphrased as 'The horse is a herbivorous animal', and there are a few nouns which, in similar contexts, allow us to drop even the definite article. The central example is 'man', as in 'Man is a carnivore' or 'Man eats meat'. It could, then, be that this form is a better guide to the semantic structure of such propositions than that with the definite article or that employing a plural
noun. Certainly, the definite article in 'The horse is a herbivorous animal' does not signify any particular horse, as it does in 'Take the horse for a canter', while there is a parallel contrast between the use of the plural in (11) and in 'There are mice in the cellar'. If, then, propositions like (11) have the simplest possible structures, just 'S(B) B', then they can only be formed with the aid of expressions of category B and it will always make sense to ascribe a nature to what they signify.

This solution also has its problems. It requires that any expression of category B combine with a schema of category S(B) to form a proposition. So far, I have illustrated the proposal with a few carefully chosen examples, but now we must go on to consider a wider range. There is no use in doing this, however, until we have a clearer view of the type of proposition which is in question. A test which helps in this regard is whether, salva congruitate, we can insert the phrase 'by nature' after the expression of category B, for example 'Man is by nature carnivorous', 'The horse is by nature herbivorous'. Now this phrase can be inserted into all of the examples given at the outset, but in two of the four it simply results in a false proposition. Mammals do not, by nature, dwell in the sea, although whales do so; nor is the animal, by nature, a horse.

This test must be applied with a certain sensitivity, for it is an intuitive rather than a mechanical one. One must be especially careful with plural forms. Thus 'Dogs are in my garden' also yields the false, but not meaningless, 'Dogs are by nature in my garden'. Yet someone may object that the original proposition is obviously not generic.18 However, if we are indeed to take it as non-generic, then we must, I think, understand it as equivalent to 'There are some dogs in my garden' and as being said, perhaps, by someone whose native language was not English. But 'There are some dogs by nature in my garden' does not make sense. It is also possible, though, to take the original in a generic sense, made explicit by the test, in which case it is false: doubtless, indeed, so obviously false that one instinctively prefers to take it in the first way.

18 I owe this objection, and the example, to the publisher's reader.

Though the animal is not, by nature, a horse, the horse is by nature an animal, so it appears that classificatory propositions fall within this group, including those which give us the genus of some species. Herewith we also have the answer to a question still outstanding, whether there are any genuinely predicative uses of substance nouns. For it seems difficult to deny that, in
(14) The horse is an animal,
being an animal is predicated, whether we say, with Frege, that it is predicated of whatever is a horse, or give some other account of the subject of predication. If, then, 'animal' is an expression of category B, we shall have to invoke an operator to convert it to predicative use, as mentioned in section 5.3, although it might now be an operator of category S(B,B), so that 'ξ is a horse' would belong to category S(B). As a further step, supposing that 'the horse' occurs in (14) as the name of a kind of body, we could assign to (14) the structure S(B,B) B B.

It may be objected that this simply revives the old theory of the copula in a categorial disguise. But that is not so, for the copula featured indifferently in generic and non-generic propositions, whereas there is no commitment here to any use of the new operator in propositions about individuals. Indeed, its most natural interpretation would be as relating the name of a species to the name of its genus, that is, as relating two natures. In that case, it would be most important to distinguish this operator from any indication that an individual belonged to some kind. Nevertheless, if we adopt this analysis, we must eventually take up Frege's challenge to show the pattern of argument in, for example, 'The horse is an animal; Neddy is a horse; ergo Neddy is an animal'. Meanwhile, I shall allow myself the use of an operator 'GEN(ζ,ξ)', of category S(B,B), to mean 'ξ is a species of the genus ζ'.
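As a sketch only (mine, over an invented toy taxonomy of category-B names), 'GEN' might be evaluated like this:

```python
# hypothetical taxonomy: each kind mapped to its immediate genus
GENUS = {
    'horse': 'mammal',
    'cow': 'mammal',
    'mammal': 'animal',
}

def GEN(zeta, xi):
    """Operator of category S(B,B): true iff xi is a species of the genus zeta."""
    g = GENUS.get(xi)
    while g is not None:
        if g == zeta:
            return True
        g = GENUS.get(g)
    return False

print(GEN('animal', 'horse'))   # True:  'The horse is an animal'
print(GEN('horse', 'animal'))   # False: 'The animal is a horse' fails the test
```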
'The cow is by nature a herbivore' is also classificatory, but does not give us a genus for 'cow'. 'Herbivore', indeed, is not an expression of category B at all, because 'is a herbivore' can be paraphrased as 'eats only plants'; it is one of those nouns, all too frequent in English, which go proxy for notions which are essentially verbal and predicative. So 'is a herbivore' should either simply be assigned as a whole to category S(B), yielding a structure S(B) B for (11), or should be subjected to componential analysis of its meaning, which would involve deciding upon a category for 'eat'. Even if this turned out to be S(B,B), with 'plant' assigned to category B, 'eat' would still be a quite distinct operator from 'GEN'.

The remaining example is more tricky. Is it in the nature of doctors to visit patients, or not? Here we can take a broader or a more restrictive view of what constitutes the nature of a doctor. On the more restrictive view, the nature of a doctor is the same as the nature of a man, what a doctor could not cease to be without ceasing to exist; in that case, the proposition will be false. On the broader view, the nature of a doctor will include everything that is essential to a doctor if he is not to cease to be a doctor; and in that case one could imagine circumstances in which the proposition is true: a society, say, in which it was not enough to be medically qualified in order to be called a doctor and in which medically qualified persons who either had no patients or did not visit their patients were not recognized as doctors. It is not necessary to decide here whether the wider or the narrower view is to be taken; the point is that the insertion of 'by nature' makes sense in this context, but of course there will be borderline cases when it comes to settling the question of truth or falsity.

If we can insert the phrase 'by nature' into a proposition salva congruitate, as illustrated above, I shall henceforth call it a generic proposition. We also, however, sometimes credit the individual with a nature, such as when we say that someone has a gentle nature or is ill-natured. So one can also say, for example, 'Alex is by nature a liar'. In this use, nature is contrasted with nurture; it means that a person was born that way. In spite of passing the test, therefore, this is not a generic proposition in my sense, nor is any proposition about an individual generic. As the term suggests, a generic proposition is a proposition about a kind.

Generic propositions are relatively rare, because most would be false. However, proverbs are quite a good source of examples, especially if we take the wider view of what constitutes a nature; but, of course, proverbs are not always to be taken literally. Here is a selection which have a prima facie case for being accounted generic propositions: 'Dog does not eat dog', 'Empty vessels make the most noise', 'The fool wanders, the wise man travels', 'Fortune favours fools', 'The fox knows much, but more he that catcheth him', 'Friends are thieves of time', 'Great talkers are great liars', 'Honest men marry soon, wise ones not at all', 'The lion is not so fierce as he is painted', 'The tongue is not steel, yet it cuts'.

Traditional logic was also interested in generic propositions, but under the heading of indefinite propositions, where they were put together with other examples which would not be generic in my sense. Thus Aristotle gives examples with abstract nouns (Prior Analytics I.1, 24a21-2), but William of Sherwood, discussing simple suppositio, cites 'Homo est dignissima creaturarum' ('Man is the most worthy of creatures') (Introduction to Logic, chapter 5), while Peter of Spain contributes 'Omne animal praeter hominem est irrationale' ('Every animal except man is irrational') (Summulae logicales, section 6). In the latter, 'Every animal' is not, of course, a straightforward quantifying phrase; it means 'every kind of animal', by contrast with 'Every philosopher' in 'Every philosopher except Wittgenstein wrote in red ink', which is not generic.

This should suffice for an introduction to generic propositions, but perhaps rather more detail is required about their truth conditions or, what amounts to the same, the manner of their justification. It will be
instructive to take an example which can be interpreted both as a generic proposition and as a contingent one: (15)
Homosexuals are more promiscuous than heterosexuals.
'Homosexual' and 'heterosexual' are, of course, adjectives, but 'homosexuals' and 'heterosexuals' are often used as shorthand for the count noun phrases 'homosexual people' and 'heterosexual people', as here. Now of those who care to defend this allegation, some will maintain that it is a result of social pressures designed to encourage stable sexual relationships between people of opposite sex but to discourage them between people of the same sex, whereas others will hold that it is in the nature of heterosexual attraction to create a stable bond but of homosexual attraction to create an unstable one. For the first group, (15), although true, is only contingently true, whereas for the second group it is a generic proposition (in the wider sense distinguished above). The distinction is sometimes presented as though it were only a matter of aetiology and not of meaning, a contrast of nurture with nature. Environment and heredity are seen as alternative, although not necessarily exclusive, causes, or the contrast may be between two kinds of environment: thus someone who held that homosexual people are by nature more promiscuous than heterosexual people might yet consistently believe that sexual orientation is acquired and not inherited. Yet, either way, there is a radical difference between formal causality, as Aristotle would have called it, and efficient. This can be brought out by an analogy with computers. Suppose that we have a computer upon which a certain programming language has been implemented but which, so far, contains no programs written in that language. Then somebody writes a program in the language concerned and compiles it on the machine. This brings about a change in the machine's internal state, in virtue of which it is now able to perform specific computations, when required, which hitherto it could not have carried out. In Aristotelian terminology, the programmer is the efficient cause of the machine's new state, but the latter the formal cause of its new computational ability.
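The analogy can be made quite concrete. In the following deliberately crude sketch (mine, not part of the text's argument; the class and its names are invented), 'compiling' a program alters a machine's internal state, after which the machine can perform a computation it previously could not:

    class Machine:
        """A toy computer whose internal state is a store of compiled programs."""
        def __init__(self):
            self.programs = {}                  # initially, no programs

        def compile(self, name, source):
            """Compiling brings about a change in the machine's internal state."""
            self.programs[name] = eval(source)  # source: a lambda expression

        def run(self, name, *args):
            if name not in self.programs:
                raise RuntimeError(f"no program named {name!r}")
            return self.programs[name](*args)

    m = Machine()
    # Before compilation, m.run('double', 3) would raise RuntimeError.
    m.compile('double', 'lambda x: 2 * x')      # the programmer: efficient cause
    print(m.run('double', 3))                   # the stored program: formal cause -> 6

Here the caller who loads the program is the efficient cause of the new state, while the stored program is the formal cause of the machine's new ability to double numbers.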
Now having a certain sexual orientation, even if it has been acquired, is to have a certain internal state - whether physical or mental or a combination of the two. So one who claims that (15) is true 'by nature' asserts that the relative difference between the behaviour of homosexual and heterosexual people which is in question flows from their respective internal states, notwithstanding that each state may have an efficient cause. By contrast, one who claims that (15) is true simply because of social discrimination against homosexual people thinks, indeed, that the latter suffer discriminatory treatment because of their internal state, but that their greater promiscuity is provoked by that treatment and does not flow from the internal state as such. Moreover, the circumstances under which (15) is true will differ according to the way in which it is interpreted, and so, too, the methods appropriate for verifying it. If it be understood contingently, then we have to look at unbiased samples of heterosexual and homosexual people respectively and find out how many different sexual partners each has had, etc. (The notion of promiscuity is also ambiguous, but that need not concern us here.) The question would then be decided by statistical procedures. If (15) be understood generically, however, it could be true even if homosexual people were in practice statistically less promiscuous than heterosexual ones, as they might be, for example, in a country like China, where the opportunity for homosexual behaviour is so far as possible denied to people. In order to establish the truth of (15) taken as a generic proposition, we should have to analyse the two sexual orientations, homosexual and heterosexual, and show what characteristics of the former encouraged greater promiscuity than the latter, or what characteristics of the latter inhibited the tendencies to promiscuity given freer rein by the former. Now these are surely differences in the method of verification under the two interpretations, and differences of truth conditions; a fortiori, then, there is a difference of meaning. Ambiguity, of course, is not always reflected in structural differences, but we can conveniently posit two distinct semantic structures for (15), and only in the generic sense would it have the structure
(15F) is more promiscuous than (heterosexual, homosexual),
as a case of S(B,B) B B, while the other sense would be represented by a more complex structure than (15F).19 Does this imply that the meanings of contingent propositions about individuals are operations upon the meanings of generic propositions? On the face of it, that would be counter-intuitive. We should therefore consider an alternative solution which takes its rise from a difference between generic and contingent propositions which has not yet been mentioned. Generic propositions are not usually significantly tensed, whereas contingent propositions are. This may be verified by checking the examples of generic propositions which have been given above. It is also illustrated by the two interpretations of (15).
19 To be detailed in section 7.3.
One who takes it as a contingent proposition understands it as describing how things stand now, without prejudice to the past or the future; indeed, there it carries a suggestion that, were society to mend its ways, homosexual people would be no more promiscuous in the future than heterosexual ones. But if, by contrast, it is in the nature of homosexual attraction to be inherently unstable, no practical measures will avail to alter the situation; it is timelessly true. To this, it may be objected that we sometimes regard a nature as changeable over time, as, for example, in evolutionary accounts of the animal kingdom. That may be so; but I must reiterate that my concern is with the representation of the meaning of everyday language, not of natural language as a whole. Everyday language relates primarily to the human life-span, secondarily to recorded history, but merges into technical language when we talk about change on an evolutionary scale. So we must think of the nature of something as envisaged in a generic proposition as being its nature as we know it now; and what we then go on to say about that nature seems to be envisaged without regard to time. We must not expect everyday language to reflect the world with scientific accuracy, and it may be that where change is very slow and very difficult, it is ignored altogether in certain applications of language. If we take the broader view of nature distinguished above, then some distinction must be drawn between a thing's primary or basic nature and its secondary, or acquired, nature. Thus a joiner, for example, has learned a trade in virtue of which he is able to do certain jobs which the rest of us are either altogether unable to do or only able to do very badly. If we think of a nature as a kind of internal state, then we share with the joiner a common condition, that of being human, which is basic to us all, but he has acquired the special internal state of a joiner, which we do not share with him. In organisms, an acquired nature is clearly much more easily changeable than a basic one; for the most part, indeed, a basic nature will at best be changeable over many generations and not in the individual, though advances in genetic engineering may eventually alter this. Acquired nature will also range from the relatively permanent and unchangeable dispositions which are virtually impossible to lose, like the ability to interpret a photograph three-dimensionally, to slight variations of internal state which come and go, like an ephemeral piece of information casually acquired and quickly forgotten. Yet however permanent or impermanent a given nature in some individual may be, what we say about the nature as such rather than about the individual is regarded without respect for time. Thus that the great talker is a great liar, if true at all, is so notwithstanding that some men may learn to bridle their tongues, having been at one time great talkers but now no longer so, and applies equally to the great talkers of past and future generations as to those of the present.
In contingent propositions, therefore, there appears to be a more complex verbal structure than in generic ones. The former contain a temporal and perhaps also an aspectual element which is absent from the latter. It is tempting to suppose that this correlates with the difference between proper names and names of kinds of body, that is, that tense turns an expression of category S(B), S(B,B), etc., into one of category S(N), S(N,N), etc., so that names of kinds of body combine only with untensed first-level schemas, while proper names combine only with tensed ones. For names of kinds of body do not form propositions when combined with tensed first-level operators and the results, indeed, may not even be semantically coherent. But this still leaves us with several problems. What would the category of the tense operators be? Would quantifiers be dyadic operators, each with a name of a kind of body as one operand, and then how could the latter lie within the scope of the verb? Finally, it gives us no indication of how proper names are related to substance nouns, which name basic natures. This is the issue which must occupy us next.
Basic categories: pointers
6.1 THE CATEGORY OF POINTERS
If names of kinds of body are assigned to a basic category, it will immediately be asked how names of individual bodies, the paradigms of proper names, are to be categorized. Yet it will help to place that question in perspective if we postpone it in order to consider deictic expressions first. Some central examples of deictic expressions are the demonstrative pronouns 'this' and 'that', but also tenses and temporal adverbials like 'yesterday'. In order to determine whether a sentence containing a deictic expression is true or false, we have to know what the latter indicates, which depends in turn upon the context (usually non-linguistic). Thus the sentence 'That man is a Pole' may be true said at one time in a certain place, but false at another time or in a different place. Similarly for a sentence like 'Fred went to York yesterday'. The term 'deictic expression' is current among linguists; many logicians use 'indexical' instead, but I prefer the linguists' term, because it is less theory-laden: to call these expressions 'indexical' already suggests an analogy with indexes in mathematical notation, whereas 'deictic' comes from the Greek verb 'δεικνύω', meaning 'show' or 'point out' (Latin 'monstro', incorporated into 'demonstrative pronoun'). This only implies what has been said already, that the role of a deictic expression is to point to something outside the sentence in which it occurs (see Geach, 1962, section 22: 'it works like a pointer, not like a label'). Following this, we may Anglicize the terminology and simply call these expressions pointers from now on. Proper names, in a sense, replace pointers in special cases: that is to say, we only give proper names to a tiny fraction of the bodies upon which we could confer them, and for the most part pick out individual bodies either by means of simple pointing phrases, consisting of a pointer and a count noun (in their presence, for example 'this plant-pot') or by more complex ones containing a description expressed as a relative clause (in their absence, for example 'that plant that you bought at Homebase
last week'). It would be very tedious to have to use the latter constantly for bodies which we need to mention frequently, so proper names, which are usually much shorter, are convenient. If we are to understand proper names, then, it seems sensible to look at pointers first, and subsequently enquire how proper names relate to them. Frege excluded sentences containing pointers from the status of propositions, on the ground that they do not express a complete sense until any pointers which they contain be replaced by proper names. So he made no provision for representing them in his ideography and, by implication, excluded them from semantics altogether. His arguments on this score, however, concern temporal pointers (tenses and temporal adverbials), and turn on his notion of a thought as the sense of a sentence, so that it would take us rather far afield to pursue them here. Moreover, pointers of many kinds are ubiquitous in everyday language and are sometimes involved in analyses of the meanings of expressions of other categories. For example, going and coming might be contrasted as moving from here and to here respectively, where 'here' is a locative pointer. The exclusion of pointers could thus impose serious limitations on semantics. Frege's position has also been criticized on other grounds, notably by Perry (1977), who has stressed its unacceptable corollary that thoughts about oneself are incommunicable. Meanwhile, under the influence of Morris (1938), a branch of logical/linguistic theory called pragmatics has developed, which embraces pointers ('indexicals'). It is widely assumed that pointers and pointing phrases are to be categorized with proper names as expressions of a basic category. In part, this may be due to the influence of Russell, who at one time held that 'this' and 'that' are the only real proper names (1918, II (1956, p. 201)), although he later modified this view (1940, p. 110). In part it may also be attributable to the lingering influence of the picture theory of meaning in Wittgenstein's Tractatus, according to which the pointers (names) in propositions correspond to the objects of thoughts and are simple signs (Wittgenstein, 1922, 3.2-3.221). Whether or not this is historically correct, it is at any rate now difficult to find any discussion, explicit or implicit, of their category.1 Yet this is an essential first step if we hold that an account of their meanings depends, like that of other words, on the contribution which they make to the semantic structure of propositions in which they occur. It would, though, be unwise to assume that all pointers belong to the same category.
1 Thus, for example, not one contributor to a recent anthology of articles on demonstratives (Yourgrau, 1990) so much as raises the question.
In the preceding section, temporal and locative pointers have featured in the discussion, but I shall exclude them from now on, confining myself to expressions which can be used to point to bodies (body-pointers). Accordingly, my main concern will be with 'this' and 'that', supplemented with certain uses of the definite article. The truth or falsity of propositions which we form with pointing phrases, for example (1)
That persimmon was taken from this bowl,
depends upon where the speaker is and when they are said. They may also call for some accompanying gestures, like pointing with the arm or hand, or a glance in a certain direction. In the absence of these, the direction in which the speaker is facing may be important. Phrases of this type, accordingly, direct the hearer's attention away from the speaker towards a body or bodies in his environment. They point, as it were, out of the sentence into the speaker's and hearer's surroundings. The prototype of pointers is the arrow-sign, but an arrow, by itself, only directs our attention in a certain direction, so we need to be told also what lies in that direction. This can be done by coupling it with another graphic sign, and also by labelling it with a word, which may sometimes be the name of a kind, such as the word 'toilet' with an arrow above, below or beside it. The whole sign then tells us roughly where to find a toilet; it picks out an individual of a given kind by directing us to its place. Phrases like 'this bowl' serve a closely comparable function and so, too, do certain sentences. To illustrate the latter, suppose that I go into a shop and ask the shopkeeper: 'Show me a persimmon'. He goes to the back of the shop, takes a fruit from a box and brings it to me, saying (2)
This is a persimmon,
or perhaps (3)
Here is a persimmon,
possibly, even, (4)
Look, a persimmon!
Or, instead of bringing me the persimmon, he may just go to the back of the shop, look around, and say any one of these three sentences when he has found it, so that I have to go over to where he is in order to see it. Again, he may say 'Come with me', take me to the back of the shop with him, and point towards or otherwise indicate a shelf on which there is a fruit, using one of the three sentences at the same time. It is to observe that the shopkeeper need know nothing of my purpose in asking to see a persimmon. He will doubtless assume, or at least hope,
that I am thinking of buying one, and that my intent, therefore, is to find a persimmon. But it may be that I heard some friends talking about persimmons and, not knowing what they were, wanted to find out. So he may, without realizing it, be giving me an ostensive definition of 'persimmon'. Still, in either case what he says is true or false - the definition is not stipulative.2 If the fruit to which he directs my attention is, in fact, a Cape gooseberry, then (2)-(4) are false, whatever the purpose of my quest; so, if it really is a persimmon, they are true. Moreover, such sentences can be negated, for example 'This is a persimmon; that is not (a persimmon)'. They can, therefore, be used as propositions, even though they can also be used to give ostensive definitions. Both (3) and (4) confirm the connexion of linguistic pointers with place. The former does so because 'here' means 'in this place' (and 'there', 'in that place'). In (1) we cannot substitute 'there' and 'here' for 'that' and 'this', but we can say (5)
The persimmon there was taken from the bowl here,
and in some dialects the places would be mentioned without changing the original pointers to the definite article: 'That there persimmon was taken from this 'ere bowl'. In (4), the appeal is to sight. Well, in order to look at something we have to look in a certain direction and focus upon a certain distance, which is to say, look in a certain place. The vocabulary of sight presupposes that of place. If someone claims that he saw a persimmon yesterday and we ask him where, he may perhaps have forgotten, but he cannot reply 'Nowhere'. To see a persimmon nowhere is the same as not to see a persimmon (anywhere). Example (1) could also be re-cast so that it makes an explicit appeal to sight: (6)
The persimmon that you see there was taken from the bowl that you see here.
2 Geach holds that a sentence like (2) 'often has the logical role not of an asserted proposition but of a simple act of naming' (1962, section 22). That is, it would name the body indicated as a persimmon. However, this notion of naming seems to be ambiguous. There is a strong sense of 'naming', in which a name is conferred for the first time, as when Adam named all the animals (Genesis 2: 19-20). Acts of naming of this type cannot be either true or false, so, if Adam had said 'This is an elephant', he would not have asserted a proposition. But there is also a weaker sense of 'naming', in which we teach someone the meaning of a name which already has an established use. Call this an act of naming if you wish, but if a father says to his child today 'That is a horse' when pointing to a donkey, what he says is false, and hence a proposition. There is room for a mistake on such occasions. Acts of naming in the strong sense are rare with count nouns.
If my purpose is to find a persimmon, it would not greatly matter if there were more than one in the place indicated by the shopkeeper, and even a philosopher would not complain to the management if he found several toilets at the end of the corridor after following the direction of the arrow under which 'toilet' was written in the singular. If my purpose is to discover what a persimmon is, though, it would matter. If more than one persimmon is in that place - indeed if more than one body of any kind is there - I may draw the wrong conclusion from what the shopkeeper says. Supposing that there were two persimmons, I might conclude that 'persimmon' was the name of a pair of fruits of that kind. Languages are quirky in these matters: a pair of trousers is one garment, so why should a persimmon not be a pair of fruits (perhaps they always grow in pairs)? Supposing that there were several fruits, each of a different kind, it would depend upon whether I knew the names of the others; otherwise, I should have to ask which of those unknown to me was meant. Similarly with (1): if more than one persimmon or more than one bowl are in the places indicated, it will be necessary to ask 'Which persimmon?' or 'Which bowl?' One who prepares a scenario for an ostensive definition of the name of a kind of body will, of course, do his best to see that there is only one body in the place which he indicates, and anyone who uses 'this' or 'that' coupled with the name of a kind of body, as in (1), will also try to avoid situations in which his indications of place leave it uncertain which bodies are meant. This prompts a comparison with another kind of non-linguistic pointer, an 'address' in computing. A computer has registers in which signs may be stored, so it is necessary, in order to retrieve the contents of the registers, to give each register a name, which thus points to or addresses that register. The address on an envelope also serves this function: it enables us to get a message to a person by sending it to the place where he lives or works. Now if only one person lived in each house and we always knew if anyone had moved, we should only need to write the address of the house on the envelope and could omit the name of the person to whom the letter was being sent; every letter would be addressed, in effect, to 'The Occupier'. That is how it is with a computer register, which can only hold one sign at a time. The same goes for 'this' and 'that'. It is part of their meanings that they should pick out places in which there is just one body (or just one of the stated kind) and so, when things go wrong, we have to ask: 'What do you mean? There's nothing here (there)' or 'Which one do you mean?' We charitably assume that the speaker meant something, but the meaning of his words does not tally with the situation, even to the extent of making them false. Yet 'this place' and 'that place' also contrast with computer
addresses in that they tailor the place to the body and have no fixed size or location. An address, in the sense of 'address' in which we write an address on an envelope, does not itself have a location, and a computer address, similarly, may have no location on the computer. A labelled arrow-sign, on the other hand, must be placed somewhere, and that is our starting-point in following the sign. In this respect, 'this' and 'that' resemble arrow-signs rather than addresses; the hearer starts from the place where the speaker is. But in which direction is he to go, and how far? The direction will be shown by the speaker's posture or gestures. To this extent, 'this' and 'that' are unlike arrow-signs, being pointers which do not, in themselves, point in any particular direction, and more like the arrow on a weather-vane, whose direction is determined by the wind. Sometimes, though, 'that' contrasts with 'this' as a second, different direction. At other times, the contrast is of distance: this place is normally in the immediate vicinity of the speaker, that place farther away. You would shout 'Pick up that ball!' to someone at whose feet it lay, but only 'Pick up this ball!' if it were near to you yourself. With regard to distance and direction, there is one important difference between 'this' and arrow-signs. 'This' is also used when the distance from the speaker is nil and then, of course, the question of direction does not arise either. We can regard this as a limiting case, but there is then no longer any comparison with the arrow-sign. When we get to the toilet, the room just has 'toilet' on the door and an arrow there would be counterproductive. But if I am holding a persimmon in my hand, I still say 'This is a persimmon' or 'This persimmon is ripe', and this place is then the place where I am. Yet it does still indicate a direction and a distance to the hearer: that, namely, from himself towards and up to the speaker. 'This' and 'that' are mobile signs of which we all carry a supply, so everyone who reads them has first to direct his attention to where they are placed; after that, he may or may not have to direct it further away from the 'base'. In view of these considerations, it would be natural to use (7)
In this place is a persimmon
as a springboard for representing a semantic structure for (2)-(4). That, in turn, appears similar to (8)
In this shop is a persimmon.
The similarity, however, is merely grammatical. 'In this shop is both a persimmon and a Cape gooseberry' makes sense and so, too, does 'In this place is both a persimmon and a Cape gooseberry'. But now 'place'
cannot have the same meaning as in (7), considered as a paraphrase of (2)-(4), because it does not make sense to say 'This is both a persimmon and a Cape gooseberry'. Thus (7) is a misleading paraphrase, and the reason is the one I gave earlier, that although 'this' and 'that' pick out individuals by pointing to a place, the place is tailored to the body which occupies it. That is to say, there is no way of identifying the place independently of the body: the body defines the place: it is the place of the body. Consequently, the considerations about place advanced above are by way of an elucidation of the meanings of 'this' and 'that' but are not a first step in a structural analysis of their meanings. The comparison with computer addresses is also helpful in showing that we do not need to exhibit the locative aspect of the demonstrative pronoun in semantic representations. Because a computer register can only contain one sign at a time, it is possible to regard the register's address as a sign which has, as its value, the sign stored in the register. Thus 'PI', for example, may be the address of a register in which the numeral '3.141592654' is stored and, in a command containing 'PI', the latter will be interpreted as if it meant 'the numeral stored in the register whose address is "PI"'. So the register as a place drops out of consideration; it is essential to the working of the machine, but the user need not bother himself about it. Linguistic pointers behave similarly in this respect: they address a place, but, because the place is defined by what occupies it, a speaker can use them as if they pointed directly to a body and not to its place.
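The point can be restated in code. In this sketch (illustrative only; the register name 'PI' follows the example just given), retrieval goes through the address alone, so the register as a place never appears to the user:

    registers = {'PI': '3.141592654'}    # one register, addressed by name

    def value_of(address):
        """Interpret an address as 'the sign stored in the register whose
        address this is'; the register's location plays no part."""
        return registers[address]

    print(value_of('PI'))                # -> 3.141592654

A linguistic pointer is used in just this indirect but transparent way: it addresses the place of a body, yet the speaker treats it as standing for the body itself.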
The grammatical form of (2) suggests that 'this' is its operand and 'is a persimmon' its operator; that, however, is difficult to reconcile with yet a further alternative to (2)-(4), (9)
This fruit is a persimmon,
which supplies the name of a kind of body for 'this' to qualify. We may, indeed, go further than regarding (9) as merely an alternative to (2). Since there is always the logical possibility of misunderstanding what is being pointed to, a demonstrative which does not qualify a count noun is equally beset by ambiguity; the hearer may take it as it is intended, but also may not. If he is unsure what is being pointed to, he can always ask 'This what is a persimmon?' - to which (9) would be the natural answer. (It might not always wholly eliminate ambiguity, but would certainly reduce it, by focussing the hearer's attention.) In that case, (2) seems to be incomplete: a perfectly good sentence, certainly, but not quite a proposition, for we could not determine whether it was true or false if what the 'this' pointed to were indeterminate. So (2) must be regarded as
shorthand for (9), which is the 'canonical' form for which we need to give a semantic structure. But now it may be objected that we are set upon an infinite regress, for we can always use the count noun supplied in answer to the question to construct a new sentence in which the demonstrative again does not qualify any count noun, for example 'This is a fruit'. Of course that could hardly serve on its own as an ostensive definition of 'fruit', because there are so many kinds of fruit, with very different visual appearances. But said frequently enough, using a wide variety of fruits, preferably still on the tree, as examples, it might do so. Yet it is very unclear how we could answer the question 'This what is a fruit?' The fruit is not itself a tree or a plant and, even if it were, the same question would arise again for 'This is a plant'. Perhaps we could reply 'This organism is a fruit'; but then we have to answer the question 'This what is an organism?' At this point there is little left to say, for if we fall back on 'this body', we are no longer supplying a count noun. Ultimately, then, it seems that we have to acknowledge irreducible uses of pointers which do not qualify count nouns and, if ultimately, why not straight away, with examples like (2)? We have absolutely no evidence that the meaning of 'this' in (2) is different from its meaning in (9), so the presumption must be that it belongs to the same category in both uses. Moreover, if 'this' in (9) really does qualify the count noun 'fruit', then the demonstrative must be an operator.3 Now the role of pointer is not necessarily incompatible with being an operator. The direction of a pointer with respect to its scope must not be confused with its direction qua pointer. There is no reason why it should not point out of any proposition in which it occurs into its (non-linguistic) environment, while yet having a scope extending into the proposition, holding it together. These two notions of direction have often been assimilated in the past (cf. the picture theory of meaning), the idea being that pointers must be termini of linguistic structures, having no scope because they latch onto the external world instead of latching onto other parts of the proposition. Yet body-pointers cannot be first-level operators forming pointing phrases from count nouns, because then they could not occur in contexts where there is no count noun for them to qualify. Moreover, whereas there is only one way in which (2) can be false, viz. when the demonstrative does not point to a persimmon, there are two ways in which a sentence like (9) can be false. This will be clearer with another example,
(10)
That vegetable is a tomato.
3 That holds even with the Fregean account of count nouns, though the demonstrative would then be a second-level operator.
The first way is exactly parallel to (2), when the body indicated is not a tomato. But even if it is a tomato, (10) is still false, because the tomato is a fruit and not a vegetable.4 In that case, we must, it seems, regard (9) and (10) as disguised conjunctions, so that the latter, for example, is to be analysed as 'That is a vegetable and it is a tomato', with 'vegetable' having a predicative use in this context. An objection to this analysis is that, 'and' being commutative, 'That is a tomato and it is a vegetable', which will be the analysis of 'That tomato is a vegetable', will be true in just the same circumstances as (10). Yet saying that that tomato is a vegetable seems not at all the same thing as saying that that vegetable is a tomato. However, this is because we tend to think first of the non-propositional use of (10), as an ostensive definition.5 If 'That tomato is a vegetable' were used as part of an ostensive definition, it would contribute to an attempt to teach someone the meaning of 'vegetable', whereas (10) would be used to define 'tomato'. Here, though, we are only concerned with (10) as a proposition, and it will indeed be true that that vegetable is a tomato just in case it is also true that that tomato is a vegetable. Even if the best we can manage is 'That red object on the shelf is a tomato', what we have said will be true just in case that tomato is a red object on the shelf. This leaves us with just two alternatives for the category of body-pointers. Either they are expressions of a basic category, or they are second-level operators. They cannot be first-level operators, because then they would require operands of a basic category, whereas we have concluded that 'vegetable' is used predicatively in (10). They could, however, be dyadic second-level operators, having two first-level schemas as their operands (for example one for the predicative use of 'vegetable', the other for the predicative use of 'tomato').
4 This consideration is what motivated Russell's analysis of propositions containing the definite article in the sense of 'the one and only', his Theory of Descriptions. Those who reject Russell's analysis would no doubt wish to say that (10) presupposes, but does not assert, that what is being indicated is a vegetable (see Keenan, 1973, for a system of logic incorporating presupposition). But as presuppositions can be false just as much as assertions, the difference does not affect the present issue.
5 Geach holds that (10) combines an act of naming ('That is a vegetable') fused with a predication about the body then and there named by the count noun 'vegetable' (1962, section 32). But, in this context, 'vegetable' is a misnomer and, as this is certainly not an act of naming in the strong sense distinguished earlier, it is not excluded from being either true or false. Which might, incidentally, work even though as a proposition it is false.
But a serious objection to regarding a pointer as an operator is that pointers, like proper names but in contrast to quantifying phrases, do not give rise to scope ambiguities. An advertisement for a newspaper which I saw recently exemplifies the latter: 'Everything you need to know about sport every Monday'. The intention, of course, is to claim that every Monday's issue contains all the reader needs to know about sport at that time, but the wording prompts one to imagine the paper setting out some arcane information which sporting enthusiasts need to know every Monday - what could it be? Or, to describe the ambiguity in structural terms, it is intended that 'every Monday' be the main operator, with all the rest falling within its scope, whereas the alternative reading places 'every Monday' within the scope of 'know', so that the combination 'know every Monday' is qualified by the rest of the proposition. By assigning quantifying phrases as the operators of second-level schemas, such scope ambiguities are readily exponible and, with suitable rules of inference, logical fallacies avoided. No such ambiguities ever arise with pointers. The recipe used in the example above is a fairly reliable method of creating a scope ambiguity in a proposition containing two quantifying phrases, namely, to place what is intended as the main operator at the end of the sentence, without separating it from the rest by a comma. But take a sentence containing two pointers, say 'That woman adores this dog'. We can invert the order of the pointers by putting the verb into the passive voice, viz. 'This dog is adored by that woman', without introducing any hint of ambiguity or, indeed, any difference of truth conditions between the two propositions. And it is, I think, clear that the relative order of pointers in a proposition will never, as such, affect its meaning.
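The contrast can even be checked mechanically. The following sketch (my illustration, with an invented 'adores' relation over a toy domain) evaluates the two scope orders of a pair of quantifying phrases, which come apart, and then a pair of pointers, which reorder without any change of truth conditions:

    people, dogs = ['Ann', 'Beth'], ['Fido', 'Rex']
    adores = {('Ann', 'Fido'), ('Beth', 'Rex')}      # toy relation

    # Two quantifying phrases: relative scope matters.
    every_some = all(any((p, d) in adores for d in dogs) for p in people)
    some_every = any(all((p, d) in adores for p in people) for d in dogs)
    print(every_some, some_every)    # -> True False: a genuine scope ambiguity

    # Two pointers are constants: 'That woman adores this dog' and
    # 'This dog is adored by that woman' evaluate identically.
    that_woman, this_dog = 'Ann', 'Fido'
    print((that_woman, this_dog) in adores)          # the only reading -> True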
It may well be possible, technically, to give a correct account of logical inferences from propositions containing pointers by assigning them as the operators of second-level schemas. Montague, after all, did as much for proper names (1974, chapter 8). But we should never assign a more complex structure to a proposition than is necessary in order to explain its meaning; otherwise we shall engender differences of structure which do not correspond to any differences of meaning. Moreover, it will be only too evident by now that semantic structures are much more complex than syntactic ones, anyway, so it would hardly be sensible to add gratuitous multiplicity to them. We are thus driven to the conclusion that pointers must belong to a basic category. It does not follow from this, however, that pointers are to be assimilated to proper names, because we have not yet decided the category of the latter. The introduction of a new basic category is always a momentous occasion, because of the wide implications which it always carries for the whole system of categories. It is not to be undertaken lightly and its consequences must be carefully monitored. Hence the justification above.
As I do not want pointers to be prematurely confused with proper names, I shall avoid the old Fregean category N, using instead D (deictic, demonstrative) for the category of pointers. The first consequence of this decision is that propositional schemas which can take pointers as operands will belong to one of the series of categories S(D), S(D,D), etc. Moreover, 'ξ is a persimmon', 'ξ is a fruit', 'ξ is a tomato' and 'ξ is a vegetable' will be examples of schemas of category S(D). I am assuming, here, that these are tensed schemas, for one can say, for example, 'That was a tomato', pointing at a red splodge on the floor and even 'That will be a tomato', pointing at a blossom on a tomato plant.6 Moreover, where a pointing phrase occurs as syntactic subject of a sentence, the predicate is often a verb phrase and indisputably tensed, for instance 'This girl is drawing a picture'. The semantic structure posited for (2) will, accordingly, be (2F)
S(B,D) D B,
where the operator of category S(B,D) combines two functions: first, that of converting the count noun to predicative use; second, that of endowing the predicate with a tense. Given an analysis of tenses, it could be further analysed into these distinct factors; both would presumably be represented by operators. We cannot represent a structure for (10) in linear notation, because it embodies converging scope. 'That is a vegetable and that is a tomato' is not equivalent to (10), because 'that' could point to something different on each occasion of its use; rather, we need (G1).
[Graph (G1): the schemas for 'vegetable' and 'tomato' converge on a single node for 'that'.]
Finally, in order to represent (1) we need to make explicit the Agent role associated with 'take': (1')
That persimmon was taken (by someone) from this bowl
or 'Someone took that persimmon from this bowl'. In order to simplify the graph, I leave 'someone took' unanalysed; this absorbs the Agent role, thus leaving Patient and Source (indicated below by P and S respectively) to be represented.7 It will also embody converging scope twice, so that we have (G2).
[Graph (G2): 'someone took' carries Patient (P) and Source (S) arcs; the Patient arc converges with 'persimmon' on a single node for 'that', and the Source arc converges with 'bowl' on a single node for 'this'.]
6 The tense is converted into an adjective when the pointer qualifies (from a syntactic point of view) the count noun, for example 'that erstwhile tomato', 'this prospective tomato'.
Pointers occurring in propositions can often be replaced by non-anaphoric pronouns, such as 'She adores it', said in sight of a woman cuddling a dog. Such pronouns are not structural signs like anaphoric ones, but are themselves pointers, understood in conjunction with some gesture. They should, accordingly, also be assigned to category D.
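Converging scope of this kind is naturally encoded as a directed graph in which two operators share a single operand node rather than each holding its own copy. A minimal sketch, assuming a simple adjacency-list encoding (the node names follow (G1) and (G2), but the encoding itself is mine, not the book's graph notation):

    # (G1): the schemas for 'vegetable' and 'tomato' share one 'that' node.
    g1 = {'vegetable': ['that'], 'tomato': ['that']}

    # (G2): 'someone took' has Patient (P) and Source (S) arcs, each of which
    # converges with a count noun on a pointer node.
    g2 = {'someone-took': [('P', 'that'), ('S', 'this')],
          'persimmon': [('', 'that')],
          'bowl': [('', 'this')]}

    def convergences(graph):
        """Operand nodes reached from more than one operator."""
        first_seen, shared = {}, set()
        for op, edges in graph.items():
            for edge in edges:
                node = edge[-1] if isinstance(edge, tuple) else edge
                if node in first_seen and first_seen[node] != op:
                    shared.add(node)
                first_seen.setdefault(node, op)
        return shared

    print(convergences(g1))    # -> {'that'}
    print(convergences(g2))    # -> {'that', 'this'} (set order is arbitrary)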
6.2 PROPER NAMES
As the use of pointers and pointing phrases consisting only of a demonstrative and a count noun is largely limited to those occasions when the individual bodies with which we are concerned are perceptible to speaker and hearer, the facility which they provide has to be extended to embrace bodies which are outside the range of our perception, either because those bodies are removed in place, that is, are too distant to be seen or are hidden from view by intervening bodies, or else because they are removed in time, the extreme case of this being when they have ceased to exist. One way of overcoming this limitation is to combine a pointing phrase with a relative clause, as in (11)
The dog that bit you has been put down.
7 Perhaps Goal, too, but I shall ignore that here.
I shall not attempt a semantic analysis of propositions of this type until the ground has been prepared for a review of relative clauses in the light of the assignment of count nouns to category B and of pointers to category D. At this stage, it is merely to observe that this device also has its limitations, at least from a practical point of view. They are illustrated by two responses which (11) might elicit from the person to whom it is addressed. The first is 'What dog? I don't remember being bitten by any dog', the second 'Which dog? I was bitten by two'. Of course, the same difficulties can arise when pointing phrases alone are used. 'This persimmon is ripe' may prompt the retort 'What persimmon? I don't see any persimmon here' or, again, 'Which persimmon? There are several here'. Yet, granting this, pointing phrases which have been embellished with relative clauses are much more likely to fail in their purpose in either of the two ways illustrated. In general, the further removed a body is in place or time from the speaker's range of perception, the more uncertain this method of picking it out. To compensate for this, there will be a tendency to use more complicated and elaborate relative clauses, which are eventually counter-productive. On the speaker's part, the risk of a mistake in phrasing is increased; while, for the hearer, they become more difficult to take in correctly. Moreover, in an extended conversation concerning, perhaps, several bodies, there will be the added inconvenience of repeating these long phrases. In some cases we are able to circumvent these problems because the body in question has a name, which we can then use instead of a pointing phrase, whether or not the latter would have to be supplemented by a relative clause. It thus appears that the proper name of a body replaces a pointing phrase. Let us, then, consider in more detail how names of bodies work. When a body first comes into existence, it does not have a name, and it remains without one until it has been named. Most bodies remain unnamed throughout their existence, so that, if we want to speak of them individually, we have to use pointing phrases or pointing phrases supplemented by relative clauses. But those bodies which are important to us, and of which we often want to speak in their absence, we name. In theory, any word or phrase, even a numeral or a made-up word, can be used to name a body; in practice, for reasons which will be spelled out later, this freedom is often restricted. Society requires that bodies of some kinds be named: first and foremost, that each human being be named, either at birth or very shortly afterwards. For bodies of many other kinds, naming is optional, and may be a private rather than a public matter: that is to say, the naming may not be ratified by society, with the consequence that the name does not become a part of the language of the whole
community, although it can be used within a smaller circle of people who recognize it as such. Although the names of human beings are the central examples of proper names of bodies, the procedure by which they are nowadays named has become overlaid by many accessory features which obscure what is essential to naming. I shall, therefore, start with the example of naming a ship instead. When a ship is to be named, its owners invite someone to name it, though they themselves usually choose the name. On the appointed day, all gather at the shipyard and the person deputed says, for example: (12)
I name this ship 'Caronia'.
It is customary, simultaneously, to break a bottle of champagne across the ship's bow, but that is a pseudo-baptism and is not essential to naming it. What is essential is only that a person with the right to do so utters a sentence having the form of (12), in the presence of the body to be named, and publicly. This is an act of naming in the strongest sense. Henceforth society acknowledges that, for example, 'Caronia' is the name of that ship, and a new name of a body has thereby been introduced into the language. It has often been pointed out that sentences like (12) are not propositions, that is, are neither true nor false. They are more akin to stipulative definitions, in that they introduce a new term in such a way that it can henceforth be used in propositions. The propositions most closely related to sentences like (12) are the corresponding reports, for instance that such-and-such a person named this ship 'Caronia' or, more impersonally, that this ship has been named 'Caronia'. There is a slight, though not unimportant, difference between saying that the ship was named 'Caronia' and that it has been named 'Caronia'. The former leaves it open, and may well suggest, that the name of the ship has since been changed, whereas the latter excludes any intervening change of name. The custom of introduction is connected with the naming of bodies. It is, indeed, usually the way in which we learn their names, not having been at the naming ceremony. So, going down to the harbour with a friend, one may be told: (13)
That ship is called 'Caronia',
or perhaps just (14)
That ship is the Caronia.
These introductions are also propositions: they can sometimes be false. Provided this be granted, it is indifferent whether we also dub them acts
of naming. Both will be true, however, just in case that ship has been named 'Caronia'. Their truth, that is to say, derives from an historical event. This is closely relevant to their semantic structure. Two differences between (13) and (14) can safely be ignored. In English, some names of bodies are preceded by the definite article, but most are not. If the introduction had been, say, to a horse instead of to a ship, the proposition corresponding to (14) would have been, for instance, 'That horse is Bucephalus', with no definite article. In Greek, by contrast, all names of bodies must be preceded by the definite article when occurring in sentences. So the presence or absence of the definite article is a matter of syntax, not of semantics. The second difference is that the name is enclosed in quotation marks in (13) but not in (14). Although some philosophers are very fussy about this, the quotation marks are optional in this context. The preceding words 'is called' or 'is named' show quite clearly that what follows is the name of a body, given the pointing phrase which in turn precedes them. If the quotation marks serve any purpose at all, it is to show where the name ends, because it might have been a phrase and not a single word. One has only to look at any classical English novel to see that, more often than not, quotation marks are not used after 'is called'. Omitting them cannot effect a change of sense like that which would be produced, say, in dropping the quotation marks from the question: 'How many letters has "John"?' If we wanted to incorporate the definite article into (13), moreover, we should hesitate about which side of the opening quotation mark to put it. 'That ship is called "the Caronia"' looks better than 'That ship is called the "Caronia"', yet what is written on the bows and stern of the ship is just 'Caronia'. So, in the end, we should probably write: 'That ship is called the Caronia', which is still quite all right and cannot be construed in any sense except the one intended. Some have thought that the 'is' in (14) is an identity-sign, that is, that the proposition may be paraphrased by 'That is a ship and it is no other than the Caronia'. Our assessment of this suggestion will depend upon the accompanying account of identity. Since Frege, identity has commonly been conceived as an equivalence relation (and thus as reflexive, symmetrical and transitive) between the (potential) bearers of proper names. Thus one implication of treating the 'is' in (14) as a sign of identity is that proper names are to be assigned to the same category as pointers, namely, D. That would demand revision of one standard feature in the current account of identity, that it provides a licence for substitution in propositions salva veritate. It is evident enough that 'Caronia' cannot be substituted salva veritate for 'that' (or even 'that ship') in whatever
proposition the latter may occur, but only when the ship singled out by the pointing phrase happens to be the Caronia. In particular, we cannot substitute one for the other in (13), even dropping the quotation marks: *'(The) Caronia is called that ship' is nonsense unless we take 'that ship' as the name of a ship and not as a pointing phrase. Or, if we paraphrase (13) by 'That is a ship and it is called "Caronia"', we cannot interchange to obtain *'(The) Caronia is a ship and it is called that'. This creates a strong presumption that 'Caronia' and 'that ship' are expressions of different categories, but is not absolutely conclusive. The distinction could be sub-categorial, imposing restrictions upon the operands of 'is called' and 'is named' respectively. It is evident enough that the way in which proper names of bodies lead us to individual bodies is very different from the way in which 'this' and 'that' do so. Kripke (1972, pp. 253-355) has expressed one aspect of the difference between them by calling names of bodies rigid designators, to bring out that they always designate the same bodies, in whatever context they are used. This implies some kinship with pointers, which would presumably be, by contrast, flexible designators. Should proper names, then, also be assigned to category D? A strong consideration in favour is that they will combine with schemas of categories S(D), S(D,D), etc. to form propositions. This combination must, indeed, be possible, so, if they are not assigned to category D, they would probably have to be assigned to the second-level category S(S(D)) in the manner of Montague. Yet the kinship between proper names and pointers is still closer, for proper names are also pointers, albeit historical pointers, introduced into language by an historical event. Their rigidity derives from their historical nature, so that the latter is the more fundamental characteristic. The contrast between proper names and demonstratives is well expressed by terminology used in linguistics: proper names are diachronic pointers, demonstratives etc. synchronic ones. Diachronic pointers are historical in a double sense. Not only was their introduction into the language an historical event, but they also presuppose that the bodies which they name - their bearers - have histories. From the time of their assignment onwards, they always name the same body, and are necessary to language for reasons which have more to do with time than with place. In principle, a body could always be picked out by a pointing phrase in the present, if only our range of perception were not limited and unable to penetrate intervening bodies and other obstacles. But we cannot, physically, point into the past or into the future, so a different kind of pointer is necessary for the purpose. In particular, to use Frege's well-known phrase, we often need to know whether a body is the same again; but bodies change over time in many
ways, so we cannot decide this on a 'snap-shot' basis, that is, by comparing a body described as it was at an earlier time with a present body. A body cannot, however, change its kind, so substance nouns yield a touchstone for comparing bodies over time, with the help of phrases of the form 'is the same B as'. Nevertheless, the truth conditions of propositions formed from the latter vary both with the substitution for 'B' and with the time-span of the comparison. Were it not for naming bodies, it would in general be much easier to determine that they were false than that they were true, and we should be guided very largely by our knowledge of the ways in which bodies of the kind in question characteristically change, and how quickly or slowly. If I saw a ship here ten years ago and see a ship here today which looks rather like it, the possibility that it is the same ship again is not ruled out by a new colour scheme or even a different arrangement of rooms inside (it could have been re-fitted in the meantime), but if it has twice the tonnage of the ship of ten years ago, then it cannot be the same. If, however, I saw a man here ten years ago and see a man here today who resembles him, he could have doubled his weight and his hair could have become white in the meantime; yet if this man's eyes are blue, say, and those of the man of ten years ago were brown, then it cannot be the same man again. But if I compare him with a man whom I saw only last week, then this man cannot be the same, supposing him to be twice the weight. Where our knowledge of change does not rule out that this is the same B again, the proper name of the body becomes important. If it has the same name, that creates a defeasible presumption that it is the same body; but if a different name, a defeasible presumption that it is not the same body. Probability, as Bishop Butler said, is the guide of life, and for the most part we are content to decide the matter on the basis of the name. We do not enquire as a matter of routine whether the name of a body has been changed, but only if our suspicions are aroused. Thus we rely very considerably upon the presumption created by the name, and so, in order to understand how names of bodies work, we must ask how they create such presumptions, and also why those presumptions are nevertheless defeasible. First, naming a body is an irrevocable action. Once named it cannot be un-named. There are no un-naming ceremonies. Henceforth, the body bears that name forever, even after it has ceased to exist, unless and until it is exchanged for another. It is this, above all else, which makes proper names of bodies diachronic pointers. In the last resort, they enable us to identify individual bodies by leading us back to the time and place when the name was conferred; thus the historian, if necessary, goes to the
parish register (or perhaps to Somerset House) to identify a person, and follows his life on the assumption that the name will continue to point to the same person, unless he finds evidence that it was subsequently changed. Similarly the Passport Office demands a person's birth certificate before it will issue him with his first passport. When we are introduced to someone we do not, of course, insist that he produce his birth certificate, but we rely, nevertheless, upon the testimony of the person performing the introduction that the man he is introducing has been named as he states; and that testimony will usually rest upon further testimony, but the chain of testimony will eventually terminate in the original naming. Dummett has objected to this account that it does not apply to nick-names (1981a, pp. 192-5). In reply, two points are to be made. First, that the description of naming given above only has to describe the typical case, so that expressions whose conferral merely approximates to it may also be regarded as names of bodies. And, second, that nick-names do so approximate. The nick-name will have been given by someone in the first place, on a particular occasion. But the person who conferred it lacked any authority to do so, and the original occasion has doubtless been forgotten. Still, it 'caught on': other people started to use it, so that, within their limited community, it acquired the authority of custom. Thus there are analogues in the adoption of nick-names to what happens when a name is conferred officially; the final stage, perhaps more common with nick-names of places than of persons, occurs when the relevant authorities themselves adopt the name. Second, society takes certain precautions to ensure that names of bodies are, on the whole, reliable pointers. Where misidentification could have serious consequences (usually financial), it insists upon public ratification of the original naming; the name must be registered in a public record. Thereafter, obstacles are placed in the way of changing a name. Marriage affords the opportunity to take the family name of one's spouse; otherwise, it can only be done by a legal process and payment of a fairly stiff fee, together with public registration of the new name showing which name it is replacing. Moreover, more than one name at a time is very seldom allowed. People, admittedly, have both forenames and a family name, but these are regarded as constituting a single complex name, which will be cited in toto on official documents. Unofficially, people and sometimes bodies of other kinds do collect alternative names, such as nick-names and aliases. But nick-names are only current within a limited and private circle, while aliases (even noms-de-plume) are, precisely, more or less serious attempts to avoid
Proper names
229
identification and hence to subvert the normal purpose of giving names to bodies. Thus, to give a false name is in many circumstances an offence. These precautions for the most part ensure that each body has at most one name. If the system were to be really foolproof, it would also be necessary, apart from the tightening-up on the exceptions noted above, to prevent more than one body of the same kind from being given the same name. Society compromises upon this point, for two very good reasons. The first is that the number of distinct names required would be vast, and constantly increasing, since the name of a body does not lose its currency when that body ceases to exist. The need to devise a new name for each new body of the same kind would greatly tax our ingenuity and, in any case, the provision could not be enforced without a world government, together with a search, on the occasion of each new naming, of all names for bodies of that kind already registered to be sure that the new one did not duplicate one which was already registered. Although these are practical difficulties, it is pertinent to mention them, because they bring out how far removed from names of bodies in everyday language is the logician's convention that there is a one-to-one correspondence between such names and their bearers. The other reason for compromise is that language-learning in this area is greatly simplified by conventions that certain kinds of word are reserved for use as names for certain kinds of body. The conventions are not legally enforced and are quite often broken. Nevertheless, more often than not one can tell from a name for what kind of thing it is a name. Names for places are frequently recognizable as such from a prefix or suffix, such as in English 'Lower-', '-ton', '-borough', '-mouth'. Reverting to bodies, forenames of people are mainly drawn from quite a small pool, but there we surmount the problem by a method which serves two purposes simultaneously. By combining forenames with a family name, a relatively small stock of each yields a much larger number of distinct combinations. But as the family name is inherited (except where changed at marriage), it also helps to identify a person as the child of certain parents; thus a birth certificate provides a relative rather than an absolute identification, relying upon the names of the parents and the date of birth instead of upon a naming ceremony. This is why I avoided it as an example to illustrate what is essential to naming, for if the names of the parents were also registered by reference to their parents, we are set upon a regress which, though yielding a very practical method for naming people, must eventually terminate in naming ceremonies, even if they have now long been forgotten. This compromise has two important results. The first is that more than one body may have the same name. Even the use of a combination of
230
Basic categories: pointers
names, though designed to minimize such cases, does not eliminate them altogether, and can never afford a guarantee that no other body bears the same name. It just makes it less likely that a body will be misidentified in a given context. But, as Aristotle says, the improbable sometimes happens, and then we have to fall back upon devices very like those which are used to embellish pointing phrases. Thus two people both called 'Peter' may be distinguished in conversation, for example as 'Elizabeth's Peter' and 'Helen's Peter', where Elizabeth and Helen are the respective wives or perhaps mothers of each Peter. In effect, we are then resorting to relative clauses, such as 'the Peter that is married to Elizabeth', 'the Peter that lives in Leeds', 'the Peter that is an architect'. Even where these qualifications have been added to the name, they can never rule out the theoretical possibility that it still names more than one body, but that does not matter provided that they distinguish the bodies which might otherwise have been confused in that context. The name o/a body does not cease to be such because it has more than one bearer; that idea comes of taking the logician's model of how names of individuals ought to work instead of looking at how they do work, and then using it as a straight-jacket for everyday language. For such a name to have more than one bearer is the rule, not the exception, and it does not thereby become the name of a kind of body, as some have thought. That it is not the name of a kind in the strict sense of 'kind' is evident, since bodies come into existence unnamed and can change their names. But even in a loose sense, we do not think of a body's name as being the name of a kind. 'What kind of man is he?', we sometimes ask, but we do not expect to be told, for example, 'He's a Peter'. Names for kinds of body have quite a different purpose from proper names for bodies; the former are related to classification, the latter to identification, especially over time. To repeat, proper names of bodies are diachronic pointers. If both Peters (in a given context) are ill, then that means nothing more mysterious than that both men named 'Peter' are ill, and there has been no categorial metamorphosis by which 'Peter' has been turned into the name of a kind of body. To be fair to logicians since Frege, it is obviously essential to the validity of arguments containing proper names that the latter shall have the same bearers throughout any given argument in which they occur. Logical systems were originally devised to test the validity of arguments, so it was convenient and, indeed, justified to assume a one-to-one correlation between proper names and bearers of proper names or, at least, that no proper name had more than one bearer. Now that the purposes of logic have become much wider, in particular to aid in representing everyday language on computers, the restriction can no
Proper names
231
longer be justified, although it would still be necessary to demand that whenever a deduction was made, any proper names occurring in it were assumed to have just one bearer each throughout that deduction. This corresponds to what we do in everyday life, where we know perfectly well that many proper names have multiple bearers but, in a given context, rarely need to press for further specification. We can put it this way: that a proper name has the same bearer in multiple occurrences is a defeasible presumption, but not a prior requirement. Frege was so much a captive of the single-bearer model of names of bodies that he thought their meanings were given by relative clauses drawn up so that just one body satisfied the descriptions which they contain. Thus he suggested 'the pupil of Plato and teacher of Alexander the Great', that is, the man that was taught by Plato and taught Alexander the Great, as giving the meaning of 'Aristotle' (1892, p. 27 note). Even if we overlook the use of two further names of bodies in this example, Frege's view has the awkward consequence (from which, indeed, he did not shrink - see 1918, p. 65) that two people will more often than not attach different meanings to the same name although the body which each set of descriptions picks out is the same. But the more fundamental objection is that it reverses the proper relationship of the names of bodies and the histories of those bodies. Aristotle was Aristotle long before he sat at the feet of Plato and very long before he taught Alexander. Are we to believe that the meaning of his name changed as his life unfolded, perhaps, even, that the meaning of a name of a body only becomes definitive when that body ceases to exist? On the contrary, it is because Aristotle was taught by Plato and because Aristotle taught Alexander that we can infer that the same man both was taught by Plato and taught Alexander. If historians could not in general rely upon names of bodies in this way, their task would be impossible. 'Aristotle' acquired its meaning when Aristotle was named 'Aristotle': when, lying in his cradle, he was indistinguishable except to his close relations from many another baby, and then by marks which they would doubtless have found it difficult to describe to the satisfaction of the police if he had been stolen. The other effect of the compromise is that certain words are recognized as being reserved in the language for use as names of individuals, but yet may never have actually been used to name one. In a useful terminology introduced by Geach8 and already anticipated, they are names for bodies 8
1979, p. 145. The distinction should not be confused with Noonan's distinction between unitalicized and italicized 'of (1980, p. 13). Noonan's * "N" is a name o/an F' is the same as Geach's 'of, but is contrasted with the case where *F* does not provide a criterion of identity for the use of 'N\ but it is presupposed in both cases that something has been named by *N\
232
Basic categories: pointers
but not necessarily names of bodies (or perhaps names for places, etc.). Until recently, logic has only admitted names of bodies, and such systems as are now proposed which admit names for bodies which are not names o/bodies remain controversial. Yet once our purposes in giving structural representations for everyday language go beyond the analysis of inferences, it seems that we must provide for names for bodies which are not also names of bodies. For it is admitted on all hands that there will be meaningful sentences containing such names, even though it is disputed whether the sentences could ever be true or false. However, it does not follow that proper names which are not names of bodies should be represented differently from those which are, so we must now take up the issue of representation. 6.3 THE RELATIONSHIP OF PROPER NAMES TO COUNT NOUNS Let us recall at this point that certain proper names are conventionally reserved for certain types of bearer. In these cases we can tell from the name itself what it is a name for. Names of people are typical examples, where even the sex of the person can usually be divined from the name, while characteristic prefixes and suffixes of place-names have already been mentioned. When we stray outside these reserved areas, or where the name has multiple bearers of different kinds, it is quite common for it to be accompanied quite explicitly by the corresponding count noun. There are many examples in fiction: 'man Friday', because 'Friday' is normally reserved for a day of the week; 'Pooh bear' and, more recently, 'Paddington bear'. Other non-fictional examples are 'London town', 'New York city' and 'Washington state'. Thus, to a limited extent everyday language itself recognizes that proper names embody count nouns when cited in full. This accords exactly with the contention that, in order to explain the meanings of proper names, appeal must be made to substance nouns. A further indication that count nouns are embedded in the meanings of proper names is given by the possibility of using them, with the definite article, in place of pronouns. Thus, instead of 'Socrates kicked Fido, who promptly bit him', we can say: 'Socrates kicked Fido, and the dog promptly bit him'. Now whence does that count noun come, if it is not already implicitly present in the occurrence of 'Fido'?9 All this would be neatly explained if we suppose that proper names which do not qualify a count noun are a shorthand, that the count noun 9
This also answers Evans (1985, p. 103).
The relationship of proper names to count nouns
233
is omitted because those proper names are conventionally reserved for particular kinds of body and that this is already known to those who use them. Since proper names can form propositions from schemas of category S(D), it then follows that the compound of proper name and count noun will be an expression of category /), that is, a pointer. Moreover, the proper name alone will be an operator of category D(B), because the count noun belongs to category B. To take an example, I am claiming that the jingle (15)
London town is burning down
is a better guide to semantic structure than the more modern equivalent 'London is burning down', and our representation for it would, accordingly, be (15F) is burning down (London (town)), where the pointer is the whole sub-structure 'London (town)'. This is, incidentally, an (historical) pointer to a body in this context and not the name of a place, since places cannot burn down and London, qua place, would survive the destruction of any buildings built in that place. So far I have argued only that some count noun is to be regarded as implicitly present in the full specification of a proper name. It is possible, however, to be more definite than that. It will, of course, be a substance noun, since the individual could cease to be whatever a non-substance noun indicated without any change in the meaning of its proper name. But normally more than one substance noun could be cited, for example Quentin is not only a person, but also a mammal and an animal. These substance nouns are related as the names of genera and species, however, and we normally try to give the maximum information about the meaning of a proper name; so the appropriate substance noun will always be that which gives the infima species. At this point we are in danger of getting in a muddle over terminology, since there are considerations which prompt us to call both the operator and the compound a proper name. The compound, because the substance noun is suppressed when there is a convention about what the proper name is a name for; the operator, in order to distinguish the proper name from the count noun. I shall take the latter option, but make a concession to the former by calling the compound of proper name and count noun a proper name phrase. (When a proper name phrase occurs in a proposition the count noun may not, of course, be overtly present and may remain implicit.) My conclusion is thus that proper name phrases belong to the same category as demonstratives and are pointers, though diachronic
234
Basic categories: pointers
pointers (whereas demonstratives are synchronic), while proper names themselves are operators forming pointers from substance nouns. (It could be specified that their operands must be drawn from that subcategory of B.) This, moreover, makes the minimal modification to Frege's logic which is necessary in order to overcome the difficulties raised in section 5.1, thus allowing us to take over much which has already been thoroughly investigated, including, as will appear in the sequel, quantification theory. It remains to take up a piece of unfinished business from section 5.3. What is the semantic structure of sentences used to introduce proper names? In the light of the distinction drawn above between names for bodies and names of bodies, there will be two cases to consider. Since a proper name may either be introduced into the language, or merely to a person who does not yet know its meaning, there will also be two subcases under each heading. Case 1, then, is the introduction of a proper name for something, and sub-case la will be when it is introduced into the language for the first time - a relatively rare event, except, perhaps, among writers of fiction or owners of domestic pets and vintage motor-cars. This is the very first move in preparing an expression for use in a language as a proper name. It stipulates what kind of thing the expression is a name for. This stipulation will be neither true nor false, hence not a proposition, but it undoubtedly has a meaning and so must have a semantic structure. Subcase lb is an explanation of what a proper name is a proper name for, when there is an existing convention to that effect. Truth or falsity will be present in this case, since it is clearly possible to make a mistake in giving such an explanation. Case 2 is the introduction of a proper name of something, that is, of making an expression a name of an individual; in Geach's terminology, this is an act of naming. Again there will be two sub-cases; 2a is an act of naming in the strongest sense, when a name is conferred upon an individual. This, too, is a stipulation and so is neither true nor false, yet, in virtue of having a meaning, must have a semantic structure. Sub-case 2b is an explanation of what a proper name is a proper name of where it has been conferred previously (as in an introduction); one may call this an act of naming, but only if it is recognized that it leaves room for a mistake, and so will be either true or false. In case 2 the proper name will usually already be a name for something, but sometimes both stipulations or explanations will be combined in a single sentence. There is sometimes nothing in the sentence used to show to which case it relates; 2a is the most distinctive, for instance (12), but even the occurrence of a demonstrative in the sentence does not
The relationship of proper names to count nouns
235
conclusively make it an example of case 2, since the sentence might occur in a fictional context. The example which I gave in section 5.3, (16)
Neddy is a horse
was intended as a 2b case, but could equally be an example of lb. However, this is only to say that the meaning of the sentence will have to be gleaned from the context, linguistic or otherwise, and we shall still require a distinct representation for each sub-case, though 2b should entail lb and 2a have a corresponding relationship to la. (It cannot be entailment because the a sub-cases are not propositions.) It will be convenient to discuss examples of each case, which must, accordingly, be taken as examples of the case cited even though they might be construed as examples of another case. So, for case la, let us take (17)
Paddington is a bear
as said for the first time by the author of the story. The same example will do for case lb, but now given as an explanation to someone who has heard part of a conversation relating to the story but is only familiar with 'Paddington' as the name of a railway station. In both of these sub-cases there is no reason to regard the sentence as significantly tensed; it is not saying that Paddington is now a bear, because, so long as we are only concerned with a name for a body, we do not know whether or not it has a bearer which could exist at a particular time. The two a sub-cases appear to be related to the corresponding b subcases rather as a command is related to the proposition describing its execution. Thus the author who first introduced 'Paddington' as the name of a bear said, in effect, 'Let "Paddington" be a name for a bear' (or 'Let Paddington be a bear'). We might, then, represent this relationship by using an operator 'LET', which takes a proposition as its operand, meaning roughly 'Let it be the case that . . .'. So, if (17) be taken in the lb sense only, LET (Paddington is a bear) would represent the la sense. Similarly, taking (14) in the 2b sense, LET (That ship is the Caronia) could represent (12). Notice, however, that the case 2 example contains a demonstrative and that LET (Neddy is a horse)
236
Basic categories: pointers
could not represent an act of naming, because it would not tell us which horse bore the name 'Neddy1, only that 'Neddy' is to be the name of some horse or other. However, if we can represent the two b sub-cases, the LET operator will take care of the two a sub-cases.10 The LET operator does not, of course, form a proposition and hence cannot be assigned a category given only our present set of basic categories, S, B and D. But (17), taken in the lb sense, is a proposition, so we cannot represent it simply as 'Paddington (bear)', because that is a proper name phrase and thus of category D instead of S. Both Taddington' and 'bear' must, then, occur as operands, and we shall need an operator of category S(D(B),B) to form a proposition from them. It would be natural to call this operator 'FOR', thus representing (17) by (17F) FOR:X (Paddington (x), bear), which could also be translated as '"Paddington" is a name for a bear'. It remains to find a representation for (14). In accordance with what has been said about demonstratives, we must construe (14) as equivalent to (14')
That is a ship and it is (the) Caronia,
where 'ship' is used predicatively. The only issue, then, is how to represent the second limb of the conjunction. I mentioned in section 6.2 that some people regard the 'is' in the latter as signifying identity, but we can now dismiss that suggestion definitively, for the expressions flanking the identity sign would then be of different categories, even of different levels (D and D(B) respectively), whereas the very least we can expect of an identity sign is that it should be flanked by expressions of the same category. A more promising avenue would be to look for an analogue to the operator 'FOR'. However, if we take (14) rather than (16) as our guide, it would have to be of category S(D(B),D) rather than of category S(D(B),B). Let us then introduce an operator 'OF' belonging to the former category. We really need a graph in order to represent (14), because of the converging scope introduced by the pronoun 'it' which is explicit in (14F); but that has nothing to do with the 'OF' operator and would add nothing from a structural point of view to (Gl). I shall 10
My LET operator is, fairly obviously, inspired by the operator of the same name in the programming language BASIC, which is used to assign values to variables (although, being optional, is more often omitted). There seems to be an analogy between assigning values to variables and names to bodies (though only an analogy); at any rate, 'LET' is listed among the commands of BASIC.
The relationship of proper names to count nouns
237
therefore use a hybrid form to illustrate what is new in the representation of (14): (14F) That is a ship and OF:X (Caronia (x), it). I think it is evident from this that the first attempt, above, to represent (12) was not quite right; we should not bring the whole of (14) within the scope of 'LET', but only the second limb, viz. (12F) That is a ship and LET (OF:X (Caronia (x), it)), that is, T h a t is a ship and let Caronia be a name of it' as opposed to 'Let that be a ship and Caronia be a name of it'. Given this analysis, cases like (16) can be represented by resorting to quantification. The precise form that this will take must await the modification of quantification theory which is imposed by our new account of the categories of count nouns and of proper names, but the paraphrase (16')
For some horse, Neddy is a name of it
illustrates the general idea. If we compare the schemas 'FOR:X (Neddy(x), ()' and 'OF:X (Neddy(x), Q', the significant difference between them is that the first is of category S(B) while the second is of category S(D). Bearing in mind what has been said previously about these two categories, we should expect the 'OF' operator to be significantly tensed but the 'FOR' operator not to be. I have already argued for the latter point; confirmation of the former is provided by examples such as: (18)
Bucephalus was a horse,
which imply, by contrast, that (16) is explaining that Neddy is now the name of a horse. But could we represent (18) by (18F) For some horse, Bucephalus was a name of it? Well, there is a certain ambiguity in 'Neddy is a name of it' which only comes out with the past tense. 'Bucephalus was a name of it' suggests not that the bearer of 'Bucephalus' has ceased to exist, but that 'Bucephalus' has ceased to be the name of that bearer, that is, that the latter now has a different name. That is clearly not the sense of (18). However, if we turn round the translation of the 'OF' operator to 'it bears the name Bucephalus', then the corresponding past version 'it bore the name Bucephalus' could quite naturally be understood as expressing the same as (18). So if tenses are sensitive to voices of verbs, we should be able to express both senses.
238
Basic categories: pointers
To express the relationship between the 'FOR' and 'OF' operators, we also need quantification, for it is evident that we can infer 'Bucephalus is a name for a horse' from 'Bucephalus is a name of a horse' without knowing of which horse it is the name. In hybrid form again, the consequence will be For some B, OF:X
(N(X),
it) ^=
FOR:X (N(X),
B)
where the same substitution of a count noun is made for both occurrences of 'B', and the same substitution of a proper name for both occurrences of 'N(C)'. With this, our account of the semantic structures of sentences used to introduce proper names is complete.
Quantifiers, pronouns and identity
7.1 QUANTIFICATION REVISITED Quantifying phrases, like proper name phrases, explicitly include a count noun or count noun phrase; but, unlike pointers, they have scope. We may therefore expect that, if we are to be able to give a correct account of inferences involving propositions containing quantifying phrases, then our structural representations for the latter will at least preserve the distinctive feature introduced by Frege, namely, that they are the operators of higher level schemas. Now the simplest way of assuring this, and the one closest to Frege, is to assign quantifying phrases to category S(S(D)). Propositions containing quantifying phrases would then be represented exactly as described in the previous chapter, with the sole exception that the category TV is replaced by the category D. The most straightforward way of decomposing a quantifying expression into its constituents, quantifier and count noun, is then to make the latter a further operand of the former, so that the category of 'every', 'some', etc., becomes S(B,S(D)). The resulting representation of one of the examples used in section 4.1, (1)
Every doctor visited at least one patient
will then be (Gl). (Gl)
visited
239
240
Quantifiers, pronouns and identity
With some re-labelling, the same digraph will serve to represent the paired example (2)
At least one patient was visited by every doctor.
It is only necessary to interchange 'every' and 'some', 'doctor' and 'patient', 'A' and 'P'. In these representations, however, each of the count nouns goes with its quantifier and does not lie within the scope of the verb (there is no directed path from 'visited' either to 'doctor' or to 'patient'). Of course, this holds, too, of the Fregean representations of these propositions, viz.: (IF) (2F)
everything:x (if (doctor(x), something:y (and (patient(y), visited(y,x] something:y (and (patient(y), everything:x (if (doctor(x), visited(y,x]
which differ only in the interchange of the two expressions beginning with a quantifier and ending with the next comma. We have perhaps become so accustomed to the Fregean analysis as no longer to think of questioning it. Yet, if we compare the two propositions of which these purport to be semantic representations, there seems intuitively to be a common core to the meaning of both which is represented neither in the Fregean analysis nor in the digraphs. It is the notion of doctor-visiting-patient, which is then differently quantified in each proposition. In other words, the count nouns, while they are indeed semantically connected with their respective quantifiers, are also semantically connected with the verb. In everyday language, the latter connexion is preserved by the change in the voice of the verb when the order of the quantifiers is interchanged; indeed, it has priority over the order of the quantifiers in construing the proposition: (2) is actually ambiguous with respect to the scope of the quantifiers, but neither (1) nor (2) is ambiguous with respect to doctor visiting patient rather than patient visiting doctor. Nor, I think, could one even construct a sentence of everyday language in which that ambiguity occurred. Thus both the Fregean analysis and the one modelled on it that was suggested above do violence to one aspect of the meaning of everyday language. But the opposite extreme, in which the count nouns were tied to the verb but separated from their respective quantifiers, would be still worse. The quantifying phrases 'every doctor' and 'at least one patient' move as wholes from (1) to (2), and it is essential to understanding the meaning of each proposition that we know which quantifier qualifies which count noun. Moreover, there are strong semantic grounds for not representing doctor-visiting-patient as a sub-graph. For this would have to be a generic proposition, with 'visit' untensed as the operator of a schema of category S(B,B), which was subsequently given a tense while
Quantification revisited
241
each of the count nouns was brought within the scope of a quantifier. But generic propositions do not appear to be contained within quantified propositions. This is clear from the simpler example, (3)
A dog barked.
The corresponding generic proposition would be 'Dogs bark' understood as saying something timeless about the nature of dogs. Yet the meaning of 'A dog barked' has nothing to do with the nature of dogs, any more than 'A dog sang', so it is difficult to see how any operation could yield one from the other. No: the tensed verb is first formed on the one hand, and the quantifying expression on the other, and then the two are combined to produce a generalization over individuals. To this extent, then, the Fregean analysis must be correct. The graph notation, however, contains a possibility of simplifying the representation so as to bring the count noun within the scope of the verb, while retaining the degree of multiplicity necessary for a correct account of the logical relationships of propositions containing more than one quantifying phrase. Let us first present this solution and then reflect upon it; the representation for (1) is (G2), (G2)
patient
and that for (2) may be obtained by the same re-labelling as before. Our first question must be to what category this representation assigns the quantifiers. The comparison lies with the third-level schemas discussed in section 4.4. Graph (G2) corresponds to the special case of (G18) of that section, in which a second-level operand is replaced by two operands (one within the scope of the other),first-leveland basic respectively. The only difference lies in the categories of the last two nodes of the quantifier, which are now D and B respectively instead of P and N. So we can see from the representation itself that the quantifier would combine with a second-level operand of category S(D(B)) to form a proposition and, hence, that its category must be S(S(D(B))).
242
Quantifiers, pronouns and identity
If we now consider how a schema of category S(D(B)) might be formed, we see that it could be done by removing a proper name from a proposition, for example by removing 'Fido' from 'barked (Fido (dog))\ Since Tido' is the operator of a first-level schema, the schema which results when it is removed from a proposition is of second level, and an operator which can take that, in turn, as its operand will be of third level. Moreover, 'every' or 'some' is now seen as replacing a proper name, just as in standard quantification theory, except that, proper names now no longer being basic expressions but the operators of first-level schemas, 'every' and 'some' become the operators of third-level schemas. Of course, when the schema for 'every' or 'some' is combined with such a second-level schema, the result will be the same type of structure as in (G2), for example (G3)
some S B
D
6 barked dog Finally, it is evident that the B-edge of (G3) can be contracted, giving the result that the quantifying phrase 'some dog' is the operator of a schema of category S(S(D))y which was our starting-point. So everything is as it should be. I now give a graph to represent example (16) of section 6.3: (4)
Neddy is a horse,
in the sense, not merely that Neddy is a name FOR a horse, but that it is the name OF a horse, that is, has been conferred upon a horse. As indicated, the analysis takes the form 'For some horse, "Neddy" is a name OF it', which yields (G4). some
(G4)
D horse O OF D
Neddy
Quantification revisited
243
Although it is not in general part of the aim of this work to set up logical systems, anyone who proposes an alternative to standard firstorder logic faces an implicit challenge to show that it correctly ensures the validity of arguments involving generality. Exceptionally, therefore, I shall specify rules of inference for 'every' and 'some' (in the sense of 'at least one') when they are the operators of schemas of category S(S(D(B))). The reader will probably find these rules easier to assimilate, and to compare with the standard quantifier rules, if they are expressed in a linear notation, so I shall use 'F(N(B))' as a formula in which ' F ' is an operator of category S(D), 'N' a proper name of a. body and 'B' an expression of the basic category B (that is, a count noun for a kind of body). Then we have: Every +
Every —
F(N(B))
every:n provided that 'NT does not occur in any premiss upon which 'every:n (F(n(B)))' depends Some +
every:n F(N(B))
Some-
F(N(B)) some:n
F(N(B»
P
provided that VN' does not occur in T ' nor in any premiss upon which *P* depends, except T(N(B))'
some:n
It will be seen that the rules for 'every' are exact analogues of the standard rules for the universal quantifier; similarly, the rules for 'some' are exact analogues of the standard rules for the existential quantifier. Accordingly, it will not be necessary to prove that the rules preserve consistency and completeness. For comparison, I also set out the Every- rule in graph notation: every
B
B
It should be obvious from this how the other three rules would look. At the end of part I of Begriffsschrift, Frege sets out the traditional square of opposition, though with his own formulas at the corners (1979, section 12), that is to say:
244 A: E: I: O:
Quantifiers, pronouns and identity Every S is P N o S is P Some S is P Some S is n o t P
everything.x (if (S(x), P(x] everything:x (if (S(x), not (P(x] not (everything:x (if (S(x), not (P(x] not (everything:x (if (S(x), P(x]
The two diagonals of the square are labelled 'contradictory', which is borne out by Frege's formulas, since they run from the A formula to the O formula and from the E formula to the I formula respectively. But the two sides of the square are labelled 'subaltern', which meant that the formula at the bottom should follow logically from that at the top. Frege seems not to have realized that, with his representations, these two relationships no longer hold. The left-hand side of the square goes from the A formula to the I formula and the right-hand side from the E formula to the O formula; the reason for the failure can most easily be seen if we replace the representation of the I formula by the equivalent 'something:x (and (S(x), P(x]'. For this to be true, there must be at least one S, but Frege's representation of the A formula is true providing that there is no case of an S that is not P, and this includes the (trivial) case in which there is no S. This difference has considerable consequences for Frege's logic as compared with Aristotle's. Thus a number of syllogistic patterns of inference recognized as valid by Aristotle and subsequently, are invalidated (the details may be found in Quine (1952, p. 77)). Nor is there any straightforward way of representing the A and E formulas in Frege's notation so as to include the existential commitment which they were traditionally thought to carry; Strawson has shown that the minimal formulas needed to restore all the relationships of the square of opposition are so lengthy and complex as to be most unconvincing as representations of the four traditional formulas (1952, pp. 163-79). Well, it is perhaps a matter of dispute whether the A and E formulas should be true when there is no S, but at any rate the source of the divergence is clear. It does not lie in any difference between Aristotle and Frege over the meaning of 'every(thing)' and 'something)'. Even in Frege's logic there is a valid inference from 'Everything:x (A(x))' to 'Something:x (A(x))'. The source of the divergence is, rather, Frege's treatment of the subject-term as a disguised intransitive verb, so that the propositional operator 'if is introduced into the analysis of the A and E formulas. Given Frege's definition of 'if, as forming a true proposition in every possible case except when its antecedent is true and its consequent false, the result then follows, for such a proposition is automatically true whenever its antecedent is false.
Quantification revisited
245
The representations proposed here have, of course, eliminated propositional operators from the analysis of A, E, I and O propositions. Thus (G3), for instance, represents an I proposition. It is a consequence of this representation that there is no way of saying, for example There are no dogs' unless we introduce a new expression of category B, for example by saying: 'No animal is a dog'. The same point would then apply to 'animal' in this new example. It seems, then, that we must regard it as a presupposition of the truth of A, E, I and O propositions that their subject-terms be not empty. That, in turn, will validate the relationships of the square of opposition as well as the patterns of syllogistic argument in dispute between Aristotle and Frege. The elimination of propositional operators from these analyses has a further advantage. In representing everyday language, we cannot ignore other expressions which may be substituted salva congruitate for the standard quantifiers, such as 'many', 'most', 'several' and 'few'; but 'it has recently been widely recognized that the way of reducing the superficial binary structures of "Some As are Bs" and "All As are Bs" to the unary structures which are familiar from the classical predicate calculus cannot be generalized to all quantifiers' (Evans, 1977, p. 788).1 One would probably be forced, therefore, to assign some of these quantifiers, at any rate, to category S(S(N),S(N))> corresponding to the combination of one of the standard quantifiers with a propositional operator. Yet there is no convincing reason for holding that they belong to a different category from 'every' and 'some'. On the analysis proposed here, by contrast, all quantifiers will be assigned to category S(S(D(B))). A problem remains, however, with regard to syllogisms and some other traditional rules of inference. Syllogisms have a 'middle term'; in some of the figures, it must be possible for this term to occur syntactically as subject in one premiss but as predicate in the other. That presents no difficulty for Frege's representations, since his treatment of both terms is completely symmetrical. On the analysis proposed here, however, names of kinds of body and first-level schemas cannot be interchanged. The difficulty is removed in the special case where a count noun is used predicatively, since we proposed an operator of category S(D,B) to effect the conversion. Thus a structure for (5) would be given by (G5). (5)
Every philosopher is a writer.
Altham (1971, p. 13) argues for 'many things:x (and (man (x), lover (x)))' to represent 'many men are lovers'. By contrast, 'nearly all men are lovers' is represented by 'nearly all thingsix (if (man(x), lover(x)))'.
246
Quantifiers, pronouns and identity
(G5)
every
D
6
philosopher
(is a)
writer
If this were the major premiss of a syllogism, it is evident that 'writer' could be the middle term, occurring again, for example, in a minor premiss 'No writer is wealthy'. In the latter, however, the predicate term is adjectival, so 'wealthy' could not be used as a middle term. But this difficulty was latent in traditional logic, too; in Greek and Latin it could be hidden by using the neuter form of the adjective as subject term, covertly transforming it into a noun, while verbal middle terms were for the most part conveniently avoided. The example given is rather similar, in that 'writer' is a count noun formed from a verb. In general, these cases can be accommodated by the use of relative clauses, to which, together with anaphoric pronouns, I now turn. 7.2 ANAPHORIC PRONOUNS AND RELATIVE CLAUSES REVISITED The general principles governing representations of propositions containing anaphoric pronouns remain unchanged from chapter 4, but the structures are now naturally more complex in view of the developments which have been introduced in the meantime. The source of each graph lies, as before, in a propositional operator, but the sinks, in which the pathways terminate, are now count nouns of category B. Moreover, converging scope requires duplication of the nodes of proper name phrases in order to avoid merging scopeways. From any of the graphs finally approved in section 4.2, the corresponding new representations can be constructed mechanically by replacing each N-node of the former by the appropriate sub-graph for a proper name and operand of category B, or by extending the sub-graphs for quantifiers, replacing their N-nodes with D- and B-nodes and attaching an operand of category B to the latter, together with, in each case, the requisite duplication of nodes. It should not, then, be necessary to go through all of the examples of section 4.2 again, but I take (4) and (8) of that section, shown now as (6) and (7), to illustrate the procedure for an unquantified and a quantified example respectively.
Anaphoric pronouns and relative clauses revisited (6)
247
Plato taught Aristotle and was admired by him
(G6)
taught
man
admired Because of the converging scope onto the D-nodes of Tlato' and 'Aristotle', those nodes are duplicated; similarly for their B-nodes and for the 'man' node. But the latter duplications are unnecessary, for, unlike (G19) of section 4.4, there can be no question of each proper name having two different operands. Yet, so long as we restrict ourselves to examples containing appositive relative clauses or their equivalents, the duplications are harmless enough. The reader may verify this by making the modifications to (G13) of section 4.3 which would now be necessary in view of the developments of chapters 5 and 6; in spite of triple convergence, it is quite straightforward to modify the original graph. But were the operand of the proper name to be a complex expression of category B, perhaps with convergence upon some of its nodes, the resulting complications to the representation might be quite unacceptable. Consequently it will be safer not to duplicate nodes unnecessarily, allowing internal scopeways to converge in order to prevent this. This is illustrated in my graph (G7) for the second, quantified, example. (7)
If anyone makes a sound, he dies
The modified treatment of examples with pronouns of group 2b is exactly parallel; an example will be given in section 7.3. As was noted in section 4.3, both of the examples above can be paraphrased by propositions containing appositive relative clauses, so it will not be necessary either to spell out any further the modified
248
Quantifiers, pronouns and identity
treatment of the latter. But we need to consider the representation of examples containing restrictive relative clauses in more detail. On the (G7)
Oman
account of restrictive relative clauses developed in section 4.3, the analysis of 'girl that loves John' was: that:x (loves (John, x), girl (x), 0> but this has now to be adapted to the assignment of names of bodies to category B and proper names to category D(B). 'Girl' must now be assigned to the basic category B, as well as the whole resulting phrase, while 'John' moves up from basic to first level. The first of these changes effects a simplification. Thus, if 'girl that loves John' and 'girl' are both assigned to category B, with 'loves' assigned to category S(D,D) - so that ioves John (man)' belongs to category S(D) - the category of 'that' will be B(S(D),B) and the analysis: that:x (loves (John (man), x), girl). This presupposes that the verb of the restrictive relative clause is significantly tensed. Because, however, a restrictive relative clause is alternatively called a defining clause, one might be led to suppose that it is
Anaphoric pronouns and relative clauses revisited
249
always untensed. But there are plenty of counter-examples, for example Fowler's 'Among the distinguished visitors [that] the Crawfords had at Rome was Longfellow', or 'Each made a list of books that had influenced him'. Tensed examples are much commoner in practice than untensed ones - a further reason for calling these relative clauses 'restrictive' rather than 'defining'. In particular, the meanings of many count nouns can be spelled out by means of another count noun qualified by a restrictive relative clause. A cobbler, for example, is a man that mends shoes - perhaps we should add 'for a living', but let us not be too fussy about that; in any case, one can surely speak of an amateur cobbler, just as we speak of an amateur photographer (a person that takes photographs, though not for a living). So, (8)
Every man that mends shoes is poor,
having, ex hypothesi, the same meaning as 'Every cobbler is poor', should have the same semantic structure as the latter. But that is an Aproposition, for which an analysis was proposed in the last section, in which 'cobbler' is represented as an expression of category B. Consequently, in the representation of (8), 'man that mends shoes' must also be a (complex) expression of category B. But is 'mends shoes' significantly tensed? It contains & frequentative or repetitive element in its meaning which was more explicit in the older English form 'man who is wont to mend shoes'. Now it seems that this frequentative sense is more strictly an aspect rather than a tense of the verb (see Galton, 1984, p. 155). Nevertheless, 'man who mends shoes' contrasts with 'man who used to (was wont to) mend shoes' as the corresponding past form (which would yield 'ex-cobbler' or, perhaps, 'retired cobbler'). So we must, it seems, regard it as being in the present tense of the frequentative aspect rather than as untensed. For the most part, then, the first-level schemas in restrictive relative clauses will be of category S(D). This account of restrictive relative clauses is, of course, precisely the one which Geach denies; according to him, such a clause together with its antecedent does not form an expression of a basic category (see section 4.3; Geach, 1962, sections 70-71), After noting a logical difference between restrictive and non-restrictive relative clauses, Geach goes on to canvas the proposal made above and, after saying that it seems to give 'quite a good explanation of the difference between defining and qualifying relative clauses' cites, as a further feather in its cap, that it can deal with propositions containing a relative clause which could be interpreted either as defining or as qualifying. Thus (forgetting, for the
250
Quantifiers, pronouns and identity
moment, the Fowler convention on 'that'), we might treat 'that mends shoes' in (18) either as forming a phrase of category B from 'man' (restrictive) or as equivalent to 'if he mends shoes' (non-restrictive). Then, however, Geach asks what is the logical structure of 'B that F' phrases. He begins by suggesting that the structure must be 'logically posterior' to the predicational structure 'F(B)'; this is explained by means of the example 'pink pigs', meaning 'pigs that are pink': 'and this depends for its intelligibility on "Pigs are pink", not vice versa. We may thus expect that the analysis of a proposition containing the complex term "pink pigs" should contain the predication ". . . pigs are pink"'. It seems to me that Geach has here conflated two distinct notions of logical posteriority. The first is, in context, perfectly reasonable: 'pigs that are pink' will only have a meaning if 'pigs are pink' does so (this would be a springboard for categorizing 'that'). The second is a much stronger and, to my mind, unreasonable demand, that the analysis of any proposition containing the phrase 'pigs that are pink' must contain the unit 'pigs are pink'. A proposition like (3) only has a meaning if the combination of 'barked' with a pointer has a meaning: the latter is logically prior to (3) because the category of'barked' is S(D), that is, that of phrases which combine with a pointer to yield a proposition. But nobody has ever thought that the analysis of propositions like (3) must contain a unit comprised of a pointer and a first-level operator; quite the contrary, indeed: the quantifying phrase replaces the pointer. The present case differs in that 'pigs' occurs in the proposition to be analysed, but why should it not be combined with 'are pink' in a different way in 'pigs that are pink' from 'pigs are pink'? Just so, I argued in section 6.3, proper names are combined with names of kinds of body differently when we explain that a proper name is a name for such-and-such a kind of body from when it is combined with a proper name in a proper name phrase. I think we can say with some confidence that this second requirement by Geach is inconsistent with Fregean grammar as a whole, which would be crippled if any two expressions of different categories could only be combined in at most one way, for this is what the demand amounts to. Geach then queries whether an analysis which conforms to his second requirement must 'contain a part that can be picked out and identified as the analysis of the phrase 'pink pigs'.' As a counter-example, he cites 'Some pink pigs squeal', for which he proposes the analysis 'Some pigs are pink and the same pigs squeal', claiming that if we delete 'some' and 'squeal' from the latter, what remains does not form a logical unit. But, in the first place, this is not a proposition containing a restrictive relative clause; it has an adjective instead, and Geach's proposed analysis treats
Anaphoric pronouns and relative clauses revisited
251
the adjective as equivalent to a «o«-restrictive relative clause, by contrast with 'Some pigs that are pink squeal'. Moreover, the notion of a logical unit to which Geach here appeals is obscure. Is a schema a logical unit (it is, ex hypothesi, semantically coherent)? If so, removal of the words cited by Geach does leave a logical unit, for both expressions removed are themselves logical units, the first a quantifier and the second a first-level operator; it would, then, be a schema of category S(S(S(D(B))),S(D)). If, however, a schema is not a logical unit, Geach owes us an explanation of this term. Up to this point, Geach claims only to have raised a suspicion that 'B that F' phrases are not semantically coherent. He thinks this suspicion is confirmed by comparison of the 'obviously equivalent propositions' (9) (10)
Any gentleman who is so grossly insulted must send a challenge Any gentleman, if he is so grossly insulted, must send a challenge,
because the words 'gentleman, if he is so grossly insulted' in the latter do not even look like a logical unit. Again, this does not seem to be a counter-example, for, if we do indeed treat (10) as equivalent to (9), then we are regarding the relative clause in the latter as non-restrictive, whereas the original claim was only that restrictive relative clauses form expressions of category B when attached to a count noun of that category. Geach's subsequent demonstration that substitution of a phrase like 'gentleman, if he is so grossly insulted' for 'A' in 'any A' leads to paralogisms is thus not to the point. Geach offers a further argument drawn from propositions containing pronouns of group 2b, using example (21) of section 4.3: (11)
Any man that owns a donkey beats it.
The argument then runs as follows: 'man that owns a donkey' means the same as 'donkey-owner'; but *'Any donkey-owner beats it' is nonsense; ergo 'man that owns a donkey' is not a semantic unit in (11) (Geach, 1962, section 72). The implicit criterion for a semantic unit in this argument is that an expression A is a semantic unit in an expression B just in case a further expression C, having the same meaning as A, may be substituted for A salva congruitate. But this is not strong enough to achieve the desired result, for 'Any donkey-owner beats it' is not nonsense; if we interpret the pronoun non-anaphorically, the sentence is quite in order. At the very least, we must strengthen the criterion by replacing salva congruitate with salva veritate. Even then, the conclusion of the argument is relative, and does not show that 'man that owns a donkey' is not a semantic unit in any context - just as an expression's
252
Quantifiers, pronouns and identity
being a semantic unit in one context does not guarantee that it is so in every other context. 2 Evans, however, replies to Geach's argument by denying that one can always substitute for an expression in a proposition another having the same meaning salva congruitate; if the two expressions do not also have the same structure, the substitution may fail. This strikes me as a very controversial reply, but Evans also offers another, ad hoc argument. Since Geach's own analysis of (11) is (1 IF) any man:x (if (owns a donkey (x), beats (x)), Evans says we are entitled, on Geach's principle, to substitute 'is a donkey-owner' for 'owns a donkey', but actually get a similarly unacceptable result, *'Any man, if he is a donkey-owner, beats it'. So 'Geach is hoist with his own petard'. Subsequently, however, Geach has presented two more examples of propositions which, in his view, create difficulties for those who want to hold that expressions of the form 'B that F' are to be assigned to a basic category. The first example is: (12)
Only a woman who has lost all sense of shame will get drunk.
The premiss of his argument is that (12) does not entail that a man will not get drunk. Hence we cannot regard it as obtainable by substituting 'woman who has lost all sense of shame' for B in 'Only a B will get drunk'. Instead, he maintains, the correct analysis is: 'A(ny) woman will get drunk only if she has lost all sense of shame', in which even the appearance of a unit 'woman who has lost all sense of shame' has disappeared (Geach, 1965). Evans observed that (12) is in fact ambiguous, and can be understood so as to entail that a man will not get drunk. But he does not press the point, preferring instead to cite other examples of 'only' sentences with senses parallel to Geach's interpretation of (12) which pose the same problem although their complex terms would be logical units even on Geach's principles. So, for example, (13)
Only a large woman will get drunk
has the same ambiguity as (12), yet 'large' is, according to Geach, an attributive adjective, that is, one that cannot be expounded by recourse to conjunction: thus 'large woman' does not mean 'is large and is a woman' (Geach, 1956, p. 33). We cannot, therefore, analyse (13) as 'A(ny) woman will get drunk only if she is large'. 2
I owe these comments on Geach's argument to an anonymous publisher's reader.
Anaphoric pronouns and relative clauses revisited
253
I do not find this reply very convincing. The rejected analysis of (13) does indeed seem to render its intended sense, and problems would only arise were we to press it a step further to: (13F) any woman:x (if (will get drunk (x), large (x))), because this presupposes a notion of largeness which is intelligible without reference to what kind of thing is in question. However, it may be that we understand the original analysis without difficulty because we implicitly take 'large' as short for iarge woman', so that the corresponding Fregean-style analysis should be: (13'F) any woman:x (if (will get drunk (x), large woman (x))). What is more to the point is the ambiguity of all these sentences. Let me repeat, at this point, that I am only defending the view that restrictive relative clauses can be considered to form a complex expression of category B from the count noun which they qualify. Now the distinction between a restrictive and an appositive relative clause is not at all clear intuitively with 'any . . . who' and 'only . . . who' propositions. However, where they are ambiguous and one sense is given correctly by taking the count noun plus relative clause as a unit, while the other is not, it seems wholly reasonable to say that the clause is interpreted restrictively in the first case but appositively in the second. Thus, using the Fowler convention, (12A) Only a woman that has lost all sense of shame will get drunk, with the relative clause taken restrictively, will entail 'No man will get drunk', but (12) itself, with the relative clause understood appositively, will not. So we may agree with Geach that count noun plus relative clause is not a unit when the proposition is taken in the sense that he wishes to analyse, but disagree that this shows that it is never an expression of a basic category. His final example is: (14)
His final example is:

(14) The one woman whom every true Englishman honours above all other women is his mother.
The argument is that 'woman whom every true Englishman honours above all other women' cannot be an expression of a basic category, say 'B' for short, because it would then entail that the one and only B is the mother of each true Englishman (Geach, 1968; 1972, pp. 120-6). Now, strictly speaking, this is correct, for 'his' is anaphoric and relates to 'every true Englishman', so we must take the latter to be the main operator of the proposition. But this does not prevent us from representing the restrictive relative clause as a schema obtainable from an expression of category B by removing a proper name. There is an additional but extraneous complication in this example which I propose to ignore, namely, the tie between 'all other women' and 'The one woman'; so I shall just shorten 'honours above all other women' to 'honours'. If we then represent

(15)
The one woman whom Peter honours is his mother
by

just one:x (bore (Peter (man), x (that:y (honours (Peter (man), y), woman],

we have but to remove the two occurrences of the proper name, change the count noun 'man' to 'true Englishman' and use the resulting schema of category S(D(B)) as operand to 'every':

(14F) every:z (just one:x (bore (z (true Englishman), x (that:y (honours (z (true Englishman), y), woman].

As Evans comments, 'to argue on the basis of sentences like (14) that a common noun plus its relative clause does not form a genuine logical unit seems to require the absurd assumption that a genuine logical unit cannot be quantified into' (1977; 1985, p. 165). I conclude that we have not encountered any solid objection to treating expressions of the form 'B that F', where 'that' introduces a restrictive relative clause, as being of category B.

The Fregean analysis of (8), by contrast, actually gives us the wrong truth conditions. It was pointed out in the previous section that, under the analysis proposed there, A-propositions have existential import. Thus, from 'Every cobbler is poor' it will follow that some cobbler is poor and, hence, that there is at least one cobbler. But from 'Every man, if he mends shoes, is poor' it will only follow that some man, if he mends shoes, is poor and, thus, only that there is at least one man, though perhaps there may be no cobblers. Consequently, we cannot expound the 'that' of a restrictive relative clause as meaning 'if (s)he/it'. This conclusion is confirmed by our recognition of a difference in meaning according to whether a relative clause is taken as restrictive or appositive.

So far, I have pursued the account of restrictive relative clauses which results from adapting Evans's analysis to the re-categorization of count nouns and proper names. But there is also an alternative possibility: 'that' could be analogous to 'every' and 'some', the operator of a third-level schema of category B(S(D(B))). Let us canvass this in the context of the simplest type of example, where the relative clause qualifies a proper name. According to transformational grammarians, this is not allowed: we are forced to construe it as non-restrictive in such a context. If, however, a count noun qualified by a restrictive relative clause is an expression of category B, it must be legitimate to use it as the operand of an expression of category D(B) such as a proper name. This possibility can be justified, pace transformational grammarians. Since even historical pointers may often have more than one bearer, a restrictive relative clause may often be needed for the hearer to identify the bearer correctly, for example

(16)
Peter that mends shoes is poor,
to distinguish, perhaps, Peter the cobbler from Peter the butcher, and in contrast to 'Peter, who mends shoes, is poor', where two distinct pieces of information are offered about someone who is supposed to be independently identifiable. A Fregean would object that (16) is really elliptical for a proposition containing a definite description, 'The (one and only) Peter that mends shoes is poor', and that the definite article shows that 'Peter' is not being used as a proper name here, but as a count noun (cp. 'Paris' in 'Edinburgh is no Paris'). But whereas 'Paris' in the example can reasonably be supposed to describe a kind of city, 'Peter' in (16) cannot be understood as describing a kind of man. The function of the relative clause is to identify one man among several called 'Peter', whereas nobody supposes that Edinburgh is also called 'Paris'. Of course, in Fregean logic each proper name is assumed to have just one bearer, so a restrictive relative clause could never serve any purpose by being attached to a proper name. In our actual everyday language, however, it can and often does.

Let us then compare two representations for (16), the first using the previous assignment of 'that' to category B(S(D),B) and the second to category B(S(D(B))). They can be given in linear notation, using 'd' as a link letter of category D. I have also written the category name of each constituent underneath it:

(16F)  is poor (Peter (that:d (mends shoes (d), man]
       S(D)    D(B)   B(S(D),B)   S(D)        B

(16F') is poor (Peter (that:d (mends shoes (d (man].
       S(D)    D(B)   B(S(D(B)))  S(D)        B
At first sight the first analysis appears much more natural than the second. After all, it presents 'man that mends shoes' as a unit, just as it occurs in (8), where 'that' looks precisely like an infix dyadic operator, tying together 'man' and 'mends shoes'.
However, English word order can often be quite misleading with regard to semantic structure. Moreover, we must never forget that an account of semantic structure must sustain an account of truth conditions. Now it is surely requisite for the truth of (16) that a man called Peter mends shoes, even though the proposition may contain no assertion that there is any such man. Well, this could certainly be specified on the basis of the first analysis; all the elements from which 'Peter (man) mends shoes' is constructed are present in it. However, they are not present in anything approximating the required arrangement, since 'man' is not within the scope of 'mends shoes', whereas in the second analysis 'man' is within the scope of 'mends shoes' and we can see 'Peter (that:d (mends shoes (d (man]' as having been constructed by removing 'Peter' from 'mends shoes (Peter (man]', taking the resulting schema as operand of 'that', and operating upon the result with 'Peter'. This also shows us how we can construct a relative clause from any proposition by removing from it one proper name or demonstrative. Thus, upon reflexion, the second analysis appears the better. It will be as well to illustrate this with a graph, so I give graph (G8) for (8).
[Graph (G8), with nodes labelled, from the root downwards: 'is poor', 'mends shoes', 'man'.]
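How the second analysis composes can be mimicked in code. The following sketch is my own gloss, not the book's machinery: count nouns are modelled as sets of bodies, 'that' restricts a noun by a schema, and the proper name then selects its bearer within the resulting B-expression, which is just what lets (16) distinguish the two Peters. All the names in it are invented.

    # Count nouns denote sets of bodies; 'that' restricts a count noun by a
    # schema; a proper name picks out its bearer within its operand noun.
    man = {'peter_the_cobbler', 'peter_the_butcher', 'hugh'}
    mends_shoes = {'peter_the_cobbler'}
    is_poor = {'peter_the_cobbler'}
    called_peter = {'peter_the_cobbler', 'peter_the_butcher'}

    def that(schema, noun):
        """'that': form a new B-expression from a schema and a count noun."""
        return {x for x in noun if schema(x)}

    def peter(noun):
        """'Peter': the bearer called Peter among the noun's instances."""
        bearers = noun & called_peter
        assert len(bearers) == 1, "no unique bearer for 'Peter'"
        return next(iter(bearers))

    # 'is poor (Peter (that:d (mends shoes (d (man]': 'man' lies within the
    # scope of 'mends shoes', and the relative clause secures a unique bearer.
    print(peter(that(lambda d: d in mends_shoes, man)) in is_poor)  # True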
Finally, a further word about syllogisms with adjectival or verbal middle terms. There are really only two ways of handling these in English. The first is illustrated by the following examples:

(17) What has no Thought cannot reason; but a Worm has no Thought; therefore a Worm cannot reason. (Watts, 1724, p. 288)

(18) All battles are noisy; What makes no noise may escape notice. Therefore some things, that are not battles, may escape notice.

(19) Some, who deserve the fair, get their deserts; None but the brave deserve the fair. Therefore some brave persons get their deserts. (Carroll, 1896b, II.7)
In each case the middle term occurs (syntactically) as a verb phrase in one premiss but as part of a relative clause in the other. If we are allowed to treat the relative clauses as non-restrictive, then we have no problem, for the structural analyses of the arguments would be essentially the same as Frege's. But that demands, at the very least, a count noun which the relative clause may qualify. Carroll, without any prompting from a theory such as I have put forward, supplies count nouns to delimit the universe (of discourse): 'things' for (18) and 'persons' for (19); these feature explicitly in his conclusions (ibid., III.7). 'Person' may be allowed as an expression of category B, but not 'thing'; however, (18) concerns events and not bodies, and so goes beyond the scope of the present enquiry anyway. Watts does not employ the notion of a universe of discourse, but it would not misrepresent his intentions to regard organisms (or creatures) as the universe for (17) and to treat the major premiss as equivalent to 'Any organism (creature), if it has no Thought, cannot reason'.

Given a right to supply a suitable count noun in these cases, however, are we also entitled to treat the relative clauses as non-restrictive rather than restrictive? Well, although I have insisted upon a difference of meaning in propositions containing relative clauses in accordance with the interpretation of the latter as restrictive or non-restrictive respectively, it seems that in many cases the appropriate interpretation is determined by the context (linguistic or non-linguistic) in which the proposition occurs rather than by the proposition alone. Often, indeed, the difference in meaning does not matter in the context, so that there is no sharp boundary between contexts demanding the restrictive interpretation and contexts demanding the non-restrictive one. We should then be free to treat a syllogistic argument as a context imposing a non-restrictive interpretation where the verb phrase occurring in the relative clause is the middle term.

The second way of handling these cases is to allow the middle term to occur in one of the premisses as an adjective qualifying a count noun. The following are examples:

(20) Dictionaries are useful; Useful books are valuable. Therefore dictionaries are valuable. (Carroll, 1896a, VIII.I.7, no. 2)³

(21) No experienced person is incompetent; Jenkins is always blundering; No competent person is always blundering. Therefore Jenkins is inexperienced. (Carroll, 1896a, VIII.1.9, no. 8)
With this method, the count noun must be introduced explicitly, so we have no problem on that score. Looking at these examples syntactically, however, the adjectives seem to be bound to the count nouns more intimately than the relative clauses in the previous set of examples, which may incline us to the view that adjectives always correspond to restrictive rather than non-restrictive relative clauses, that is, that we may only paraphrase 'Useful books are valuable' as 'Books that are useful are valuable' and 'No competent person is always blundering' as 'No person that is competent is always blundering'. But this is to be over-influenced by syntax. In the context of (20), 'Useful books are valuable' can quite well be paraphrased as 'Books, if useful, are valuable'; no commitment to the existence of any useful books in this premiss is necessary to the argument, since the first premiss can be analysed to include a commitment to the existence of dictionaries if required. Similarly, the validity of the sorites (21) (a kind of double syllogism) does not depend upon there being any competent persons, so we are free to paraphrase 'No competent person is always blundering' as 'No person, if he is competent, is always blundering'.

To conclude, then, adjectives can be restrictive or non-restrictive just like relative clauses and, in general, we have the same latitude of interpretation. Only a restrictive adjective would be of category B(B); to non-restrictive adjectives, Geach's remarks about 'pink pigs' would apply. I am unable to include a thorough treatment of adjectives in this enquiry, but enough has been said to indicate that the topic should be approached via relative clauses; it is fairly obvious that, from the point of view of meaning and hence of semantic structure, there is a great variety by comparison with the single traditional syntactic category of adjective.
³ It may be objected that this example is invalid, on the ground that 'AIDS is infectious; Infectious diseases are on the decrease. Therefore AIDS is on the decrease.' is a counter-example. However, this has nothing to do with the treatment of adjectives in syllogisms; the difficulty arises, rather, from the use of plural nouns with no explicit quantifier. Thus, if we re-cast (20) as 'Every dictionary is useful; Every useful book is valuable. Therefore every dictionary is valuable.', the corresponding syllogism about AIDS is no longer a counter-example.
7.3 PLURATIVE AND NUMERICAL QUANTIFIERS

Logicians have concentrated upon the quantifiers 'every' and 'some' ('at least one'), but in everyday language many other expressions of the same category are in constant use and so cannot, from our point of view, be passed by without mention. Perhaps the major syntactic distinction within this category is between those quantifiers which form quantifying phrases taking a singular verb and those which form quantifying phrases requiring a plural one, but there are three reasons for thinking that this does not coincide with any categorial distinction. First, in many contexts 'every', which takes a singular verb, is interchangeable with 'all', which takes a plural one, without any alteration of meaning. Second, the quantifiers 'no' ('zero'), 'at least one' and 'just one' take a singular verb, while the remaining numerical quantifiers ('two', 'three', etc., when used adjectivally) require a plural one, but it would be very strange if the numerical quantifiers did not all go together semantically. Third, a quantifier forming a quantifying phrase taking a singular verb can always be changed for one taking a plural verb salva congruitate, provided only that we also change the number of the verb.

A minor distinction can be drawn, however, between quantifiers which specify an exact number of things and those which do not. This is not a category distinction, but examples from the two groups raise different issues and so it is convenient to discuss them separately. Apart from 'every' and 'some', the second group includes the plurative quantifiers 'nearly all', 'many' and 'few', together with 'most' and 'several'. Altham, after noting an analogy between the first three in this list and 'every', 'some' and 'no' respectively, proposes to define them by appeal to the notion of a manifold (Altham, 1971). This is a set containing not less than n members, n being specified for the context in question. 'Many Fs are G' is then defined as 'At least n Fs are G'. 'Few' is defined as 'not many' and 'nearly all . . .' as 'not many . . . are not'. The latter definitions may be debatable, but what is important for current concerns is that the whole analysis places these plurative quantifiers firmly in the same category as the two standard ones.

Of the remaining plurative quantifiers, 'most' (together with 'half') has been treated by Geach, who defines 'Most Fs are G' as 'More things are both Fs and G than are Fs and not G' and 'Half the Fs are G' as 'At least as many things are both Fs and G as are both Fs and not G' (Geach, 1976, pp. 61-4).
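Since every one of these definitions turns on comparing cardinalities of the sets determined by 'F' and 'G', they can be stated exactly in a few lines of code. The sketch below is my own illustration, not anything in the text: finite sets stand in for concepts, and the function names and example data are invented.

    # Plurative quantifiers as relations between finite sets; n is Altham's
    # manifold threshold, fixed by the context of utterance.

    def many(F, G, n):
        """Altham: 'Many Fs are G' iff at least n Fs are G."""
        return len(F & G) >= n

    def few(F, G, n):
        """Altham: 'Few Fs are G' iff not many Fs are G."""
        return not many(F, G, n)

    def nearly_all(F, G, n):
        """Altham: 'Nearly all Fs are G' iff not many Fs are not G."""
        return len(F - G) < n

    def most(F, G):
        """Geach: more things are both F and G than are F and not G."""
        return len(F & G) > len(F - G)

    def half(F, G):
        """Geach: at least as many things are F and G as are F and not G."""
        return len(F & G) >= len(F - G)

    cobblers = {'a', 'b', 'c', 'd', 'e'}
    poor = {'a', 'b', 'c'}
    print(many(cobblers, poor, 3), most(cobblers, poor))  # True True
    print(few(cobblers, poor, 4), half(cobblers, poor))   # True True

If the context fixes the manifold by reference to the Fs, setting n to len(F)//2 + 1 makes many agree exactly with most, which is in effect the extension of Altham's analysis canvassed in the next paragraph.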
In the absence of an account of comparative constructions it remains uncertain whether these definitions place the two quantifiers in the same category as the others, though the plurative propositions which Geach considers have the same apparent form as those with standard quantifiers which he treats, and he proposes a parallel decision procedure for arguments in which each, and sometimes both, kinds occur. However, it is a limitation of Geach's account that he does not consider propositions containing multiple quantification. But we can easily see that Altham's analysis could be extended to 'most' and 'half'. Thus, to take the former, if the number of things constituting a manifold is specified to be half of those in the context in question, plus one if there is an even number of the latter and plus a half if there is an odd number, then 'many' will mean 'most'. Thus we can also be confident that 'most' and fractional adjectives belong with the other quantifiers, and it would surely be quite extraordinary if 'several' were an exception.

We should also account some plural count nouns among unspecific plurative quantifying phrases. Propositions in which these occur may, of course, be generic, in which case the analysis expounded in section 5.4 applies. In other cases they may be tantamount to universal quantifying phrases. But (23) of section 4.3, for example, 'Eskimos who live in igloos have lots of fun', is neither generic nor universal. It would not normally be taken to mean that it is in the nature of eskimos who live in igloos to have lots of fun, nor yet to be true only if all eskimos who live in igloos, absolutely without exception, do so. The sense is, rather, that for the most part eskimos who live in igloos have lots of fun. We cannot pin this down to saying that most such eskimos have lots of fun, either, because its truth may well demand more than a bare majority. All the same, as the preceding discussion shows, the plural signifies quantification, however vague. I anticipated this in section 4.3 by introducing an operator 'PL' to represent it.

The writers on plurative quantifiers mentioned above implicitly assume, of course, that the standard quantifiers belong to category S(S(N)), as I also did in chapter 4. Their accounts of the meanings of plurative quantifiers would therefore have to be adapted to re-categorization as S(S(D(B))) in the way that I adapted that of the standard quantifiers in section 7.1, and this applies equally to my analyses of (22) and (23) in section 4.3. In section 5.4 I also promised an analysis of the non-generic sense of (15), 'Homosexuals are more promiscuous than heterosexuals'. This is given by graph (G2), with suitable re-labelling: the two quantifiers both become 'PL', 'doctor' and 'patient' become 'homosexual' and 'heterosexual' respectively, while 'visited' is replaced by 'is more promiscuous than'.

Having settled, then, the categorization of unspecific plurative quantifiers, we may return to two examples from Evans deferred from section 4.2 because they involved plurals:
(22)
Few MPs came to the party, but they had a marvellous time
and (23)
John owns some sheep and Harry vaccinates them.
Evans claimed that the pronouns in these propositions belong to group 2b, so that the connective and not the quantifier must be taken as the main operator. Since we do not wish to draw any semantic distinction between plurative and other quantifiers, we can now agree with this, so the solution proposed in section 4.2 for pronouns of this group should apply here too; we can, accordingly, offer (G9) for (22).
[Graph (G9): converging-scope graph for (22), in which the D-node of the quantifier 'few' (over 'MP') is duplicated so that 'came to the party' and 'had a marvellous time' share its scope.]

By contrast with (G19) of section 4.4, we only have to duplicate the D-node of the quantifier here and not the B-node as well, because there can be no confusion of the scopeways. The same graph will serve to represent (23), with suitable re-labelling: 'and' instead of 'but', 'some' instead of 'few', 'sheep' instead of 'MP', 'John (man) owns' instead of 'came to the party' and 'Harry (man) vaccinates' instead of 'had a marvellous time'. The last two expressions are, of course, susceptible of further analysis on the lines expounded in chapter 6, but that is not germane to the present issue.

Turning, now, to numerical adjectives, there are two extra reasons for discussing them here: first, because in concerning ourselves with count nouns, we are implicitly committed to giving an account of the numerical terms which are combined with them when we count bodies (the question 'How many?' always makes sense when asked of bodies); second, because even the quantifiers which do not specify an exact number of things are nevertheless numerical, as is most evident from the expression of the existential quantifier as 'at least one'. Moreover, since to have just one horse is to have at least one and at most one, there is no room to drive a semantic wedge between quantifiers which specify an exact number of things and those which do not.

However, there are other numerical terms which need not be treated here: first and foremost, the numerical terms used in arithmetical propositions such as 'Six divided by three equals two', which are not adjectives although equiform with numerical adjectives. Arithmetical propositions are better regarded as part of a specialist, technical language than of everyday language; moreover, they are necessary, whereas numerical adjectives are mostly used in contingent propositions. In any case, the use of number in arithmetical propositions is derivative from the adjectival use: in Frege's words, numerical adjectives bring out the basic use of number (1884, section 46). A third group of numerical terms, occurring both in contingent and in arithmetical propositions, are the numerical adverbs such as 'twice', 'ten times', etc. The most appropriate place to discuss the contingent uses of these is in connexion with temporal expressions, since one of their most basic applications is to repetition of actions. So they, too, will be omitted here.

The reasons why numerical adjectives should not be treated as semantic predicates were set out definitively by Frege, whose conclusion that 'the content of a statement of number is an assertion about a concept' (ibid., section 46) is essentially what is being proposed here, when allowance is made for the displacement of Fregean proper names by count nouns as a basic category. He also observes that existence is analogous to number, indeed that 'Affirmation of existence is in fact nothing but denial of the number 0' (ibid., section 53). We may consider all this in relation to two examples which he cites:

If I say 'Venus has 0 moons', there simply does not exist any moon or agglomeration of moons for anything to be asserted of; but what happens is that a property is assigned to the concept 'moon of Venus', namely that of including nothing under it. If I say 'the King's carriage is drawn by four horses', then I assign the number four to the concept 'horse that draws the King's carriage'. (ibid.)

In the case of the first example, the quantifier 'no' could replace the numeral '0'; indeed, 'no' would be more colloquial. But Frege never actually gave a formal analysis of these or of any other comparable examples, his interest being centred upon arithmetical propositions, for which consideration of these was only a preparation. But here we should not leave that matter at such an informal level, however conclusive the general arguments may seem. Let us, then, first consider the representation of

(24)
Venus (planet) has 0 moons.
This can be given in linear notation:

(24F) 0:d (has (d (moon), Venus (planet)]

with 'no' instead of '0' if we prefer. But what, precisely, is the concept to which the number 0 is being assigned in this analysis? It is that represented by the schema 'has (Venus (planet), D (moon))' or, colloquially, that of Venus having so many moons. This differs from Frege's formulation 'moon of Venus', but Frege seems to have been careless here, for his example contains the word 'has', which has inexplicably disappeared from his expression for the corresponding concept (by contrast, 'draws' remains in the other example). The account offered here diverges from Frege's only in that the schema is of second instead of first level; that, of course, is the result of the role now assigned to names of kinds of body.

Frege's other example raises a new problem, which he himself did not notice. Compare:

(25) The king's carriage is drawn by four horses

(26) The king's carriage was drawn by four famous artists.
Of course, (26) forces a different interpretation of 'drawn', but that is not the point at issue here. The rule for eliminating 'every' given in section 5.1 allows us to say of each thing what was said of everything: if every man respects the king, then Tom, Dick and Harry in particular respect him. Now supposing we were to add to (26) 'namely, Sickert, Landseer, Sutherland and Nash'. Would it then follow that the king's carriage was drawn by Sickert, that it was drawn by Landseer, and so on? Under the most obvious interpretation of the proposition, it would. To cite a yet better example, if four men climbed the stairs, namely A, B, C and D, then A climbed the stairs, B climbed the stairs and . . . etc. Now return to (25): does it follow that the king's carriage is drawn by each of the four horses? Well, certainly not in the same way as it follows that each of the four men climbed the stairs, for drawing the king's carriage is a cooperative effort, which none of the horses could have performed alone (the carriage is not a gig which each could draw in turn). Similarly, although when Barchester United won the football game, eleven men won it, it would not be true to say that Bright (a member of the team) won the game.
We have, then, different truth conditions in propositions containing numerical adjectives in the cases where a cooperative effort is being described and those where it is not (or, to use a more traditional terminology, where the quantifier is taken collectively and where it is taken distributively). The question which this poses is then whether a different structure should be assigned to the propositions for each of the two cases. Against doing so, it may be argued that the reason why we are likely to understand (25) as describing a cooperative effort and (26) as describing four separate actions has nothing to do with ways of construing the structures of the two sentences but, rather, with our background knowledge of contingent matters. Normally an artist draws the whole of his picture himself; normally a royal carriage is so large and heavy that a team of horses is necessary to draw it. But our interpretation of both propositions could have been wrong. Sir Godfrey Kneller, after all, only painted the main features of the portraits attributed to him and left all the rest to assistants; so perhaps the king persuaded four famous artists to cooperate upon this drawing of his carriage, perhaps choosing one who was particularly good at carriages themselves, another at horses, and so on. Again, perhaps the king was hard up and his carriage was a gig, the four horses taking it in turns to draw it. So the conclusion would be that such propositions are structurally ambiguous, but that we normally impose one interpretation in view of background information available to us.

If we analyse (25) on the same lines as (24), viz.

(25F) 4:d (draw (the king's carriage, d (horse],

there is evidently no way in which this ambiguity can be expressed, so we should be obliged to posit greater structural multiplicity. Carlson (1982) has proposed a solution on these lines, apropos an example which, although it uses quantifiers which do not specify an exact number of things, raises the same issue:

(27)
Some detectives have solved many crimes,
where we can understand the detectives as working either individually or cooperatively. Carlson challenges the assumption that plurative quantifying phrases correspond to single quantifiers; thus he distinguishes the individual from the cooperative reading of (27) by the following respective analyses:⁴

(27aF) something:x (and (detectives (x), everything:y (if (y ∈ x, has solved many crimes (y]

(27bF) something:x (and (detectives (x), has solved many crimes (x].

⁴ I have here simplified Carlson's (8) (1982, p. 168) and (15) (p. 169), in order not to have to represent a corresponding complexity which he finds in 'many crimes'. This feature of his example obscures the principle at issue.

Assuming the sign '∈' to have its normal meaning of set membership, Carlson is here using 'x' as a link letter corresponding to sets. In that case, we must understand 'detectives (x)' in both representations to mean 'x is a set of detectives'. Then (27bF) says that the set of detectives has solved many crimes, whereas (27aF) says that each member of the set has done so. Carlson's claim is thus that, if we take a plurative quantifying phrase as corresponding to only one quantifier, we must understand it as, effectively, a collective noun, whereas if we want to understand it as applying to individuals, we must acknowledge a double quantification, existential over a set followed by universal over individual members of the set.

This analysis presents two difficulties. First, it should make sense to replace the existential quantifier with which each representation begins by any other quantifier. But suppose we replace it in (27aF) by the universal quantifier and, to ease interpretation, 'and' by 'if'. The meaning will then be that, for every set of detectives, every member of the set has solved many crimes. Yet that is not a possible reading of 'Every detective has' (or even 'all detectives have') 'solved many crimes'. So it seems that the existential quantifier of the analysis does not correspond to the 'some' in (27).

Second, Carlson's attempt to represent other quantifiers in examples is inadmissible. Thus, in order to represent the sense of (27) according to which a set of detectives jointly solved a particular set of crimes, he proposes:

(27cF) something:x (something:y (and (and (detectives (x), and (crimes (y), many (y))), x has solved y].

The offending clause in this is 'many (y)'. If 'crimes (y)' means 'y is a set of crimes', then pari passu 'many (y)' should translate as 'y is a set of many'. But that is just nonsense. If, however, it is supposed to mean 'y has many members', then it is wrongly represented as a first-level operator when it should be of second level: in Fregean terms, it represents number as a property of an object instead of a mark of a concept. Moreover, why is there no corresponding 'some (x)' clause to show that the set of detectives has some members? Otherwise it might be the empty set, for all we know, for on this understanding the initial existential quantifier merely guarantees the existence of a set, not that it has any members.
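The complaint about the empty set can be verified directly. In this sketch, which is my own and not Carlson's (the detectives and both predicates are invented, and finite subsets stand in for the values of his link letter), the (27aF) reading comes out true with the empty set as its only witness:

    from itertools import combinations

    detectives = {'Marlowe', 'Spade', 'Poirot'}
    solved_many_crimes = set()                            # no individual did
    team_solved_many = {frozenset({'Spade', 'Poirot'})}   # but a team did

    def sets_of_detectives():
        """Every subset of the detectives, the empty set included."""
        for r in range(len(detectives) + 1):
            for c in combinations(sorted(detectives), r):
                yield frozenset(c)

    # (27aF): some set x of detectives is such that every member of x
    # has solved many crimes.
    a_reading = any(all(y in solved_many_crimes for y in x)
                    for x in sets_of_detectives())

    # (27bF): some set x of detectives has itself solved many crimes.
    b_reading = any(x in team_solved_many for x in sets_of_detectives())

    print(a_reading)  # True -- but only because the empty set is a
                      # vacuous witness: nothing guarantees members
    print(b_reading)  # True, via the team {Spade, Poirot}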
Although Carlson's analysis will not do as it stands, we can build upon its basic idea that plural quantifying phrases can be construed as collective nouns. Now collective nouns are most commonly prefixed by the definite article, for example 'the committee', 'the government'. But we could, if we wanted, give a proper name to the sets described, such as 'Parliament' instead of 'the UK parliament'. Moreover, such proper names of sets could be substituted salva congruitate for proper names of individuals wherever the members of the sets were of the same kind as the individuals. Thus, if there are any limitations to a set of people doing or undergoing what an individual person may do or undergo and conversely, they are physical rather than logical. We should, then, assign proper names of sets to category D(B). But what, then, is the substitution for B? One possibility is the kind of body of the members of the set, such as people or horses; another, the kind of set itself, such as a team, a family, a flock, a committee, etc. It may help here to consider some paraphrases of the relevant senses of previous examples, for instance

(25') The king's carriage is drawn by a team of four horses

(27') A task-force of several detectives solved many crimes.
These suggest that our analysis should at least provide for the name of a kind of set; of course, there may be occasions when no specific name is available and all that we can say is that it is a set. Yet that, too, is important, for there should be some way of telling from the representation that we are concerned with a set and not an individual. Now a team of four horses is a team consisting of four horses, that is, that consists of four horses or, in logicians' jargon, a team such that just four horses are members of it. So we have a restrictive relative clause forming an expression 'team that consists of four horses'. This clause must, furthermore, contain a numerical quantifier 'four'. On our account of restrictive relative clauses, the analysis of 'team consisting of four horses' will be:

that:x (4:y (y (horse) ∈ x (team].

Returning to (25), which does not specify that the four horses are called a 'team', we shall need, in order to represent the cooperative interpretation, a neutral description for a group, such as 'set'. The analysis will then say that there is a set consisting of four horses which draws the king's carriage:

(25'F) some:z (draws the king's carriage (z (that:x (4:y (y (horse) ∈ x (set].
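To see how (25'F) differs in truth conditions from the distributive (25F), here is a small model of my own (the horses and both predicates are invented). A collective predicate is true of a set as a whole, a distributive one of its members severally:

    from itertools import combinations

    horses = {'h1', 'h2', 'h3', 'h4', 'h5'}
    # Collective predicate: true of a team (a set) as a whole.
    draws_the_carriage = {frozenset({'h1', 'h2', 'h3', 'h4'})}
    # Distributive predicate: true of individuals one by one.
    climbed_the_stairs = {'h1', 'h2', 'h3', 'h4'}

    def sets_of_four(B):
        """'that:x (4:y (y (B) ∈ x (set]': sets consisting of just four Bs."""
        return (frozenset(c) for c in combinations(sorted(B), 4))

    # Collective, (25'F)-style: some set of four horses draws the carriage.
    print(any(t in draws_the_carriage for t in sets_of_four(horses)))  # True

    # Distributive: four horses each climbed the stairs individually.
    print(any(all(h in climbed_the_stairs for h in t)
              for t in sets_of_four(horses)))                          # True

    # But the collective predicate is not inherited by the members:
    # no one-horse 'team' draws the carriage.
    print(any(frozenset({h}) in draws_the_carriage for h in horses))   # False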
If this analysis is correct, it must make sense to replace either of the quantifiers by any other quantifier. That poses no problem. For example, we may replace the numerical quantifier by an unspecific one, say 'many': 'The king's carriage is drawn by (a set consisting of) many horses'. Similarly, we could replace the initial quantifier by a numerical one, say 'three': 'The king's carriage is drawn by three teams of four horses', that is, sometimes by one team, sometimes by another. Moreover, it must also make sense to give a set a proper name, eliminating the second quantifier in favour of it. But that, too, though unusual, is quite possible, for example 'The king's carriage is drawn by Leeds United', where 'Leeds United' would be analysed as 'Leeds United (team)'. Of course, the underlying presupposition is that sets are freely interchangeable salva congruitate with individuals as the participants in states and actions described by first-level schemas of categories S(D), S(D,D), etc. But that, too, seems justifiable; thus even an example like 'Emma Lathen writes detective stories' is in order, even though 'Emma Lathen' is the nom-de-plume of two women whose books are cooperative undertakings. If there are limits to what groups of individuals can be or do, as contrasted with their individual members, those limits are physical rather than logical.

The analysis proffered above is simpler than Carlson's, in that it does not require double quantification in representing the individual interpretation(s) of propositions which are also susceptible of a group interpretation. Representations of the latter can also be simplified by replacing unchanging expressions in the full analysis by a single operator. Thus, if we consider the analysis of 'set of four horses' as:

that:x (4:y (y (horse) ∈ x (set],

contraction of 'horse' with the quantifier will yield:

that:x (4 horses:y (y ∈ x (set].

The category of the quantifying phrase 'four horses' is S(S(D)); if we were to remove it, we should then be left with a third-level schema of category B(S(S(D))), which could therefore be represented as

set:f (Q:d (f (d],

where 'Q' marks a place for the insertion of an operator of category S(S(D)). In order to separate the quantifying phrase into its constituents, we have to turn 'Q' into a schematic symbol for a quantifier (now third level, so that 'd' labels a first-level link) and insert a schematic symbol for an expression of category B (for example 'horse'). The result is:

set:f (Q:d (f (d (B].
This account will also apply to non-numerical quantifiers which can bear a collective sense, such as 'all'. In some cases, such as 'All the men dug a trench', the collective sense, 'A team that consisted of every one of the men dug a trench', is the natural one; in others, such as 'All the foreign delegates visited Chatsworth', it is left open whether each went individually or whether they went as a group. We then have two possible analyses, 'Every foreign delegate visited Chatsworth' and 'A group that consisted of every foreign delegate visited Chatsworth'. It can also apply to examples in which a quantifying phrase is conjoined to a proper name, as in 'Some economists and Mrs Thatcher oppose ratifying the Maastricht Treaty'. We can understand this distributively to mean the same as 'Some economists oppose ratifying the Maastricht Treaty and so does Mrs Thatcher', but we can also understand it collectively to mean 'A group that consists of some economists and Mrs Thatcher opposes ratifying the Maastricht Treaty', with 'group that consists of some economists and Mrs Thatcher' analysed as:

that:x (and (Mrs Thatcher ∈ x (group), some:y (y (economist) ∈ x (group].

It is thus unnecessary to invoke type raising of the proper name in these cases, pace Cresswell, in order to conjoin the quantifying phrase and the proper name into a complex quantifying phrase.

In discussing examples of propositions containing more than one quantifier, it has so far been assumed that only one will not lie within the scope of any of the others. However, there are examples which call this into question, of which the most straightforward use numerical adjectives, such as:

(28)
Two Greeks were fighting three Turks,
which has been discussed by Geach (1973). Woods's example, 'Three look-outs saw two boats', which I cited in section 3.2, is exactly parallel to this. The difficulty posed by (28), according to Geach, is that if we take 'two Greeks' as the main operator, we impose the reading 'As regards each of two Greeks it holds good that he was fighting three Turks', while, if we take 'three Turks' as the main operator, we impose the reading 'As regards each of three Turks it holds good that two Greeks were fighting him'; yet there is no good reason for preferring either interpretation. So he suggests that neither operator falls within the scope of the other, but that their respective scopes converge upon 'were fighting', a solution which is also favoured by Hintikka for (27) (1979, p. 142).

The present system of representation could be modified to accommodate this solution, but not without cost. First, we should have to allow graphs with multiple roots. Second, the rule that converging scope forces duplication of nodes would have to be modified, since, in this case, we should want the scopeways from each root to merge, not to keep them apart. So the graph for (28) would be (G10).
[Graph (G10): a graph with two roots, one for each quantifier; their scopeways converge upon the S-node of 'were fighting', with nodes for 'Greek' and 'Turk'.]
If the operator 'were fighting' were duplicated, we should have, in addition to the pair of S-nodes, two pairs of D-nodes, of which only one in each pair could be used. But it remains unclear just what form the modification should take, that is, in what general circumstances scope convergence upon an S-node is still to force duplication of nodes, and in what circumstances it is to be allowed without duplication. In view of these complications, we may enquire further whether an adequate semantic analysis of (28) really demands converging scope.

The two readings of (28) which Geach rejects are certainly to be rejected, for they would be true in circumstances which he does not mention. His worry is that the fight is a cooperative effort, both by the two Greeks on the one side and by the three Turks on the other. Hence it is not necessary that each of the Greeks was fighting all three Turks. But 'As regards each of two Greeks it holds good that he was fighting three Turks' would still be true if the three Turks were not even the same three Turks in each case; for instance, one Greek might be fighting three Turks over here and the other Greek another three Turks over there. This is a very unlikely interpretation of (28), which by itself would be a quite misleading description of the latter situation. However, the central issue is how we are to represent the cooperative enterprise interpretation of (28), under which it means that a group of two Greeks was fighting a group of three Turks. My answer will be evident; the analysis is exactly parallel to that of (25), except that we now have to represent two groups. Using the abbreviated notation introduced above, the analysis will be:
(28F) a:x (a:y (was fighting (set:f (3:d (f (d (Turk)))), set:f (2:d (f (d (Greek].

I have used the indefinite article for the two initial quantifiers here in order to leave it open whether 'at least one' or 'just one' is to be understood; there is a good case for supposing the latter, as also in the analysis of (25).
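On this analysis the cooperative reading of (28) needs no converging scope: each quantifying phrase simply contributes a group, and 'was fighting' relates the two groups. A sketch in the same style as before, my own and with invented individuals:

    from itertools import combinations

    greeks = {'g1', 'g2', 'g3'}
    turks = {'t1', 't2', 't3', 't4'}
    # The fight as a relation between a set of Greeks and a set of Turks.
    was_fighting = {(frozenset({'g1', 'g2'}), frozenset({'t1', 't2', 't3'}))}

    def sets_of(B, n):
        """Sets consisting of just n Bs."""
        return (frozenset(c) for c in combinations(sorted(B), n))

    # (28F): a set of two Greeks was fighting a set of three Turks.
    print(any((gs, ts) in was_fighting
              for gs in sets_of(greeks, 2)
              for ts in sets_of(turks, 3)))   # True
    # Yet no individual Greek need be fighting all three Turks,
    # which was Geach's worry about the two rejected readings.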
Even the standard quantifiers can sometimes call for a group interpretation, as another example which Geach has provided shows. If we compare

(29) You may have each object on this tray for 49p
with (30)
You may have every object on this tray for 49p,
the former can be represented by the universal quantifier with its normal individual interpretation, whereas the latter is most likely to be understood as meaning that the totality of objects on the tray is offered for 49p (although it could also be taken in the same sense as (29)). There is clearly no difficulty in representing the group interpretation of (30) with the apparatus that has been introduced in this section.

7.4 IDENTITY

Identity is a difficult and complicated topic, so I can only touch upon it lightly here. In Fregean logic, expressions for identity are assigned to category E(E,E), the category to which transitive verbs and other relational expressions (for example comparatives) are also assigned. The vast majority of writers on identity simply assume that this is the correct categorization, though, if they distinguish category E into categories S and N, it becomes S(N,N). The corresponding category in the system developed here is thus S(D,D). A commonplace use of identity in a contingent proposition is to reveal the bearer of a nom-de-plume, for example

(31)
Lewis Carroll was the same as Charles Dodgson.
Williams claims that in such examples at least one of the proper names is merely being mentioned, not used as a proper name (1989, pp. 7, 21). That would be so in the present case if the hearer already knew the works of Lewis Carroll, say, and the purpose was to tell him that his real name was 'Charles Dodgson'. Or, again, 'Lewis Carroll' would be mentioned, not used, if the hearer already knew of the nineteenth-century Oxford mathematician Charles Dodgson and the point was to tell him that the latter also wrote under another name. But what if the hearer was already familiar both with the Oxford mathematician and with the children's books of Lewis Carroll? In that case, (31) cannot be used to introduce him to either name, only to tell him that Lewis Carroll and Charles Dodgson were the same person, by using that person's two names. However, for present purposes it does not really matter whether the names are being used or only mentioned. In either case it is true, since it is not used to introduce either name for the first time and, as such, must have a semantic structure. We are therefore justified in enquiring what that structure is.

Now if we were to accept the Fregean categorization of identity expressions, but retain the account of proper names developed in chapter 6, we should have to analyse this as:

(31F) Lewis Carroll (man) was the same as Charles Dodgson (man).

Generalizing, this would be an instance of the schema:

(I)
idem (d2 (b2), d1 (b1)),
where 'idem' means 'is the same as' and tense is ignored. An obvious objection to this is that it allows of independent substitutions for 'b1' and 'b2', although, for the truth of a proposition of this form, either the same substitution must be made for both, or one kind of body must be a species of the other. If this condition is not fulfilled, it would not be open to us to say that the resulting expression did not make sense. It would have to make sense, but be false and, presumably, necessarily false. Yet it is difficult to imagine a logical system in which that would be provable.

This difficulty arises from the account which I have given of proper names, but it hints at a deeper unease with the Fregean explanation of identity which is quite independent of my treatment of pointers. It is, essentially, that identity is not a relation between two bodies (or objects) because, in claiming that two pointers have the same bearer, an identity proposition presents us with only one body. Or, to put the matter in another way, it does not tell us, of two bodies, that they are one (which would be contradictory) but, rather, that one body is picked out by two pointers. The information which it gives us is, accordingly, partly linguistic, yet, at the same time, usually also contingent. Williams calls this 'the paradox of identity' (1989, pp. 1-4). Perhaps the heart of the difficulty with identity is that it hovers between language and the world, telling us partly about pointers but also partly about their bearers.

The most radical solution was proposed by Wittgenstein, to dispense with a sign of identity altogether (1922, 5.53ff.). This demands, however, a logician's ideal language in which proper names are correlated one-to-one with their bearers, so that examples like (31) would be eliminated from the language by disallowing multiple proper names for the same bearer, while no name would be allowed more than one bearer. As I argued in section 6.3, this is hopeless as an account of proper names in everyday language, so it need not detain us further.⁵

Taking equality in arithmetic as his inspiration, Frege thought that identity is posited absolutely, not noticing that, in the context, its usual meaning of 'is the same number as' is being taken for granted, just as we say 'Lewis Carroll was the same as Charles Dodgson' because we can assume that everyone will know that each pointer is a name for a man. My account of proper name phrases makes this explicit, but then produces redundancy in (31F) which is a source of difficulty. This could be avoided, however, if 'man' only occurred once in the representation of a semantic structure for (31), as in the paraphrase:

(31')
Lewis Carroll was the same man as Charles Dodgson.
Moreover, 'was the same man as' should form a semantically coherent unit in this proposition. In that case, we are left with 'Lewis Carroll' and 'Charles Dodgson', each an expression of category D(B), as its operands. The category of 'is the same man as' would then be S(D(B),D(B)), which is of second level.⁶

It would be a natural assumption that 'man' is an operand in 'is the same man as', but this has been denied by Geach:

We shall treat 'the same' in 'is the same A as' not as a syntactically separable part, but as an index showing we have here a word for a certain sort of relation: just as 'of' in 'is brother of' does not signify a relation by itself (as if the phrase were 'is a brother, who belongs to') but serves to show that the whole 'is brother of' stands for a relation. (1973, p. 291)

Now an index is an additional sign which serves to distinguish two otherwise uniform signs. The paradigms are the subscript numbers which mathematicians attach to constants or variables, for example 'a1', 'a2'.
6
In any case, Hintikka has since argued that the system proposed by Wittgenstein can be translated into standard first-order logic with identity, and vice versa, so that Wittgenstein's proposal is no more than a notational variant upon the latter (1956, 1957). Williams, however, claims that the reverse translation cannot be carried out in every case, citing 'Vx (x = x)' as a counter-example (1989, p. 31). Yet he also argues, on the basis of an example involving a belief-context, that a sign for identity is still required in certain circumstances (1989, chapter 3). These turn out to be cases in which 'the same B' is an alternative to a reflexive or (anaphoric) personal pronoun, for example 'Socrates kicked a dog and the same (dog) bit him' instead of'Socrates kicked a dog and it bit him'. His sign for this use of 'the same' is tantamount to the /-operator introduced by myself (Potts, 1979, section 2) before arriving at the notion of converging scope, which here supersedes it. Cp. Frege (1969, p. 132 (translation, 1979, p. 121)), where second-level identity of functions is introduced.
Identity
273
is in this sense that Wittgenstein distinguished indices from arguments, citing 'Julius' in 'Julius Caesar' as an index and commenting The index is always part of a description of the object to whose name we attach it, e.g. The Caesar of the Julian gens' (1922, 5.02). It is, then, quite baffling how Geach can regard 'the same' as an index, for his comparison with 'of in 'is brother of is quite unlike expounding 'the same A as' as 'ASAME' (I assume that he intends 'as' to go with 'the same', although he does not say so.) What would be alternative indices? There must be others, or this one would be redundant: no point in calling Julius Caesar 'Julius' if there is only one Caesar. The only alternative which springs to mind is 'is a different A from', but this and 'is the same A as' are quite clearly not related as 'af and 'a 2 \ since a is a different A from b just in case a is not the same A as b, that is,if 'ai (c,b)' represented 'b is the same A as c', then 'not (ai (c,b))' would represent 'b is a different A from c'. Let us, then, dismiss the comparison with an index and look at that with 'of in 'is brother of. According to Geach, the preposition shows that the latter stands for a relation. But that, by itself, is a very inadequate account of its role. For a start, in an inflected language there would be no preposition but, instead, one of the two nouns in the completed proposition would be in the genitive case. So the role of 'of is to show us of whom the brother is a brother. Moreover, if anyone understands the meaning of 'brother' he must know that it signifies a relationship, so he does not need the 'of to tell him so. Indeed, we could easily imagine a language lacking both case-prepositions and caseinflexions, in which the semantic roles associated with a relation were indicated solely by word order (see Potts, 1978a). Again, there are many relational expressions which do not contain prepositions, for instance transitive verbs. The role of 'of in Geach's example is syntactic rather than semantic: it has to be added because 'brother' is a noun, not because its meaning would otherwise not be relational. This amendment to his account of 'of may, however, suit Geach's book nicely, for he holds that 'is an A' may be defined as 'is the same A as something', just as we might define 'is a brother' as 'is a brother of someone'. The claim, then, is that 'A' is not an operand in 'is the same A as'; that the meanings of names of kinds of body are essentially relational; and that the frame 'is the same . . . as' is a syntactic requirement, because 'A' is a noun, to show that it does, in fact, express a relation. My guess is that Geach dubbed 'same' an index because Wittgenstein, in the passage referred to above, said that it is natural to confuse the arguments of functions with the indices of names, although, in this case, it would have been a confusion of operator with index. However, although 'brother' is not an operand in 'is brother of, it does not follow that 'of is an index.
274
Quantifiers, pronouns and identity
Geach's view is clearly incompatible with the account of names of kinds of body which I have developed in this book. But it also has internal difficulties. The reason for this is that it ascribes too great a logical multiplicity to expressions of the form 'is an A\ For, if the latter means 'is the same A as something', then there are two ways in which we can insert a 'not', namely: not (something:x (( is the same A as x] and something:x (not (£ is the same A as x]. The first is quite straightforward, defining 'is the same A as nothing', that is, 'is not an A'. But the second is equivalent to 'is a different A from something', which might be a roundabout way of saying that there are at least two ,4s, but cannot be paraphrased by any combination of 'is an A' and 'not'. So how is it that 'different', although it can be defined in terms of 'same', is not eliminable in favour of some operator of degree I?7 Support for Geach's view of names of kinds of body as essentially relational in meaning might be sought from a comparison with adjectives. Thus, in an expression such as 'is the same length as', 'same' is eliminable in favour of'is just as long as'. Moreover, if we say simply that something is long, there is an implied comparison with some average, that is, that it is long for a so-and-so, so the relational sense appears to be primary. In this case, though, we can not only substitute 'is a different. . . from' for 'is the same . . . as' salva congruitate, but also 'is . . .er than' for 'is just as . . . as'. Further, we can define 'a is just as <> / as b' as 'a is not $er than b and b is not $er than a', but there is no way of defining 'a is $er than b' in terms of 'a is just as <j) as b'. So it seems that the primary sense of such adjectives is the comparative one, which has no parallel with names of kinds of body. The latter, indeed, stand in such marked contrast with adjectives which are susceptible of comparatives that, semantically, they are almost inverses: whereas the comparative sense of these adjectives is primary and apparently absolute uses are derivative, it seems that the absolute use of names of kinds of body is primary and that in statements of identity or difference derivative. One reason for resisting this view seems to be the 7
Williams denies that 'same' and 'different' are inter-definable (1989, p. 75), but this seems to fly in the face of everyday language. Even Frege often used 'is no other than' as a synonym for 'is the same as', and I cannot detect any difference in meaning between 'Lewis Carroll was not a different man from Charles Dodgson' and (30'), or between 'Lewis Carroll was not the same man as Charles Dodgson' and 'Lewis Carroll was a different man from Charles Dodgson'.
Identity
275
demand, made by Geach and many other philosophers, that a proper name must be associated with a criterion of identity. By this is meant that, in order to understand the meaning of a proper name, one must be able to identify its bearer, that is, be able to distinguish that bearer from other bodies of the same kind. Thus the expression of category B which is implicit in the meaning of the proper name must carry with it a principle for deciding when we have, or have not, the same B. Hence names of kinds of body are essentially relational in meaning. This argument goes too fast. The ability to distinguish one body from another of the same kind is something over and above the ability to distinguish it merely as a body of a certain kind. It is quite conceivable that one might understand the meaning of a count noun 'B' without knowing under what circumstances propositions of the form 'a is the same B as c' would be true. We are, indeed, quite familiar with such cases. For example, people often find it very difficult to distinguish persons of a different race one from another, and consequently are in immediate difficulties when it comes to using their proper names. But someone who says 'All black people look alike to me' presupposes his ability to recognize someone as a black person, though not as the same black person again. We could extend this type of case, which is even more marked in our lack of discrimination between animals of other species, and imagine a situation in which we could never recognize any bodies as the same again. This would doubtless change our lives very fundamentally, since we should be deprived of proper names - think only of the effects on property, or on human relationships! - but it would not prevent us from using count nouns as we do today. It may be objected that the difficulties in using proper names which I have envisaged are merely contingent, and that we should still know in principle when a is the same B as c. Some would appeal to spatiotemporal continuity as the principle in question, others to the same matter for synchronic identity of bodies, the same form for diachronic identity. But of what avail would such principles be if we could never apply them: if we were never in a position to determine spatio-temporal continuity, never able to determine the same matter or form, as the case required? Our criterion of identity would then be no more than an empty boast. It is enough to envisage, as I have done, circumstances in which we could not use proper names for want of such a criterion, yet could continue to use names of kinds of body, to show that the meaning of the latter does not incorporate a criterion of identity. It is irrelevant that the circumstances are contingent. If, then, the meaning of a name of a kind of body is basically absolute and not relational, the way is open to consider such a name as an operand
in schemas of the form 'a is the same B as c'. This is in one sense to account identity as relative, that is, as qualifying the name of a kind of body. But it is important not to confuse this with the further sense in which Geach maintains that identity is relative, that, if a is the same B as c, it does not follow that a is the same D as c, for every legitimate substitution for 'D'.⁸ It is not necessary to decide this question here, since we are only concerned with the representation of identity propositions. If 'B' is an operand in schemas of the form 'a is the same B as c', then we have to decide whether it lies within the scope of the proper names or not. (This is analogous to the question raised in section 7.1 whether the expression of category B included in a quantifying phrase lies within the scope of the latter's operand.) If we say 'no', then the category of 'is the same . . . as' will be S(D(B),D(B),B), a mixed first/second-level category, and the representation for (31) will be (G11).
(G11) [categorial graph: 'same' as an operator of degree 3, with D-nodes for 'Lewis Carroll' and 'Charles Dodgson' and a separate B-node for 'man']

But it is surely unsatisfactory that 'man' should lie within the scope of neither proper name, and unintuitive that identity should be represented by an operator of degree 3 rather than 2. So the solution which I propose, (G11'), is analogous to that for quantifiers, bringing 'man' within the scope of the two proper names by essentially the same device used to bring the expression of category B within the scope of the verb phrase. However, the question then arises: to what category does this representation assign 'same'? Well, we assigned the quantifiers to category S(S(D(B))), so 'same' will belong to the third-level category of degree 2, S(D(B(B)),D(B(B))). But that is not quite enough, because it does not show that the two in-most Bs are identified, and there is
obviously no straightforward way of doing this in a linear notation, which can represent divergences but not convergences. Some additional convention is needed, and a numerical subscript is as simple as any: the two subscripts can be seen as marking the ends of a link, thus: S(D(B(B₁)),D(B(B₁))). In linear notation, this would mean that the same substitution must be made for 'B₁' in both parts of the schema, that is, for (31):

(31F') idem:x,y (x (Charles Dodgson (man)), y (Lewis Carroll (man)))

⁸ This is what is usually called 'the relative identity thesis'. It is the central concern of Noonan (1980), who nevertheless presupposes throughout that the expression of identity belongs to category S(N,N).
(G11') [categorial graph: 'man' brought within the scope of both 'Lewis Carroll' and 'Charles Dodgson', whose inner B-nodes are linked through 'idem']

Of course a schema of this category must also be computable by two operands each of category D(B(B)), but I think there is little danger of encountering any expressions of this second-level category which would prove an embarrassment. Moreover, if 'idem' is used as the operand of 'not', we obtain another expression of the same category, which is just what is required to express 'a is a different B from c'. I conclude, then, that identity of bodies is a third-level concept and requires the name of a kind of body for its full specification.
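Though nothing in what follows depends on it, the force of the link subscripts can be glossed computationally: B₁ behaves like a shared variable which must receive one and the same substitution in both argument places of 'idem'. The following is a minimal sketch of my own in Python, not part of the notation itself; the function name and representation are invented for illustration:

```python
# The inner-most B in each operand of 'idem' is represented here simply
# by the count noun it contains; the link subscript B1 demands that the
# two operands agree on it.

def bind_link(noun1: str, noun2: str) -> str:
    """Return the common binding of B1, failing if the link is violated."""
    if noun1 != noun2:
        raise TypeError(f"link violated: B1 = {noun1!r} vs {noun2!r}")
    return noun1

# 'Charles Dodgson (man)' and 'Lewis Carroll (man)' both supply 'man',
# so (31F') is well-formed:
assert bind_link("man", "man") == "man"

# An 'idem' whose operands supplied different count nouns would not be:
try:
    bind_link("man", "river")
except TypeError as err:
    print(err)          # link violated: B1 = 'man' vs 'river'
```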
Epilogue
In this book, I have largely been re-working old ground, in the sense that I have only dealt with the representation of a very limited vocabulary, namely, that which standard first-order logic aims to represent. The main burden of the book is that the standard representations are unsatisfactory on certain counts and must be modified - especially the ways in which count nouns and proper names are represented. But the purpose of the exercise has been to prepare the ground for representation of a wider vocabulary and especially of expressions which first-, or even second-order logic is effectively unable to handle. I should like to leave the reader with at least a foretaste of these possibilities, and there is one topic which allows me to do so in a brief compass: the representation of adverbs. This will also provide an occasion to introduce a final structural innovation: one which, I believe, will eventually have large consequences. Adverbs are awkward for Fregean logic because they appear to qualify verbs, and it is clearly out of the question to represent them by means of conjunction in the way that is plausible for some adjectives. Thus it is obvious nonsense to try to analyse 'Mabel spoke slowly' as *'Mabel spoke and Mabel slowly' in a way that it is not obvious nonsense to try to analyse 'Diana is a brown cow' as 'Diana is a cow and Diana is brown'. Indeed, those who have tried to fit adverbs into the model provided by Fregean logic have introduced quantification over events, states, processes, etc. in order to do so. The example given above could then be analysed as 'There is a speaking and it was by Mabel and it was slowly'. I have already criticized this style of analysis, which stems from Davidson (1967a), and shall not pursue it further. Now I noted in section 5.2 that there are two kinds of adverb, of which the first can happily be assigned to category S(S). These adverbs operate, like negation, upon propositions and, as we might expect from that, can be regarded as qualifying their truth. They have appropriately been termed propositional adverbs (see Cresswell, 1985, p. 4). Thus Fido probably smells just in case it is probably true that Fido smells. 'Necessarily' and 'possibly', together with 'not', also satisfy this criterion
and belong in the same category. By contrast, although we might give a sense to 'It is very strongly true that Fido smells', it would not be the sense of (1)
Fido (dog) smells very strongly,
in which it appears that 'very strongly' does not qualify the whole expression 'Fido smells', but just the verb 'smells'.¹ To what category, then, should 'very strongly' be assigned? Geach falls back upon Ajdukiewicz-style categories in answering this question (1972, pp. 489-500). Using a categorial system with the two basic categories S and N, he follows Ajdukiewicz in assigning 'passionately' in 'passionately protested' to category S(N)(S(N)); assuming that 'protested' belongs to category S(N), this has the effect that 'passionately' will combine with it to form a compound expression of the same category. In virtue of Geach's recursive rule (section 2.5, (D)) this also allows 'passionately' to combine with an operator of category S(N,N), for example 'loved', to form a compound expression of that category. The same will apply to the remaining categories in the series S(N), S(N,N), S(N,N,N), . . . etc., but an expression of category S(N)(S(N)) will not combine with one of category S. This categorization therefore shows that adverbs which belong to it can qualify verbs but not propositions. Cresswell takes a similar view,² supporting it by a standard argument (1985, p. 22). In order to show that a given adverb is not of category S(S), we take an example in which it qualifies a verb of category S(N,N), and then the corresponding passive proposition; if their truth conditions differ, then the adverb cannot be of category S(S). His pair of examples is 'John precedes Arabella willingly' and 'Arabella follows John willingly' (1985, p. 22). This certainly shows that 'willingly' cannot be assigned to category S(S), but it may not always be easy to find counterexamples to exclude S(S) (for example for 'passionately'), so the test proposed above is preferable. Cresswell also allows for the possibility of adverbs belonging to category S(N,N)(S(N,N)), and so on in the series, but he does not give examples or tell us what kind of argument would be needed to show that an adverb belonged to one of these categories (1985, p. 23).
¹ There may be more than two categories of adverbs. Thus 'discourse' adverbs, like 'frankly' in 'Frankly, the man is a bore', seem to relate either to psychological verbs or to verbs explicitly concerned with language, so that we could paraphrase 'I tell you frankly that the man is a bore'. These lie outside the scope of the present enquiry.
² As do the majority of those who have written on this topic: see Cresswell (1985, especially p. 60, n. 5), who gives many references.
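To make Geach's recursive rule concrete, here is a minimal sketch in Python; the representation and function names are mine, not the book's. It checks that an adverb of category S(N)(S(N)) combines with 'protested' of category S(N) directly, and with 'loved' of category S(N,N) only via the lifted category which the rule licenses:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cat:
    """A category: 'result' produced from the tuple of 'args' required."""
    result: object          # a Cat, or a basic-category name such as "S"
    args: tuple = ()

    def __repr__(self):
        if not self.args:
            return str(self.result)
        return f"{self.result}({','.join(map(repr, self.args))})"

S, N = Cat("S"), Cat("N")
SN  = Cat(S, (N,))          # S(N): 'protested'
SNN = Cat(S, (N, N))        # S(N,N): 'loved'
ADV = Cat(SN, (SN,))        # S(N)(S(N)): 'passionately'

def combine(functor: Cat, operand: Cat) -> Cat:
    """Apply a functor category to one operand category."""
    if not functor.args or functor.args[0] != operand:
        raise TypeError(f"{functor!r} cannot combine with {operand!r}")
    rest = functor.args[1:]
    return functor.result if not rest else Cat(functor.result, rest)

def geach(functor: Cat, extra: Cat) -> Cat:
    """Rule (D): add a further operand of category 'extra' to both the
    result and the argument categories of the functor."""
    lift = lambda c: Cat(c.result, c.args + (extra,))
    return Cat(lift(functor.result), tuple(lift(a) for a in functor.args))

assert combine(ADV, SN) == SN               # 'passionately protested' : S(N)
assert combine(geach(ADV, N), SNN) == SNN   # 'passionately loved' : S(N,N)
```

An expression of bare category S, by contrast, never matches the S(N) argument slot, which is precisely the point of the categorization: such adverbs qualify verbs but not propositions.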
As I explained in section 5.2, the nearest equivalent in Fregean categorial grammar to the Ajdukiewicz-style category S(N)(S(N)) is the mixed first/second-level category S(N,S(N)), which is of degree 2. Replacing category N by category D, the representation of (1) in linear notation, with the category assignments written below, would then be:
(1F)  very strongly:d (smells(d), Fido(dog))
      S(D,S(D))        S(D)      D(B)  B
This produces the correct result in that 'smells very strongly' works out as an expression of category S(D), but it fragments the proposition 'Fido smells'; that proposition, indeed, is not to be found as a subordinate unit in (1), in which 'Fido' and 'smells' are bound together by the adverb, whose operands they are. Yet (1) surely entails 'Fido smells' (though that would not be so for every adverb), which is slightly counter-intuitive if no such unit occurs in (1). This is still not a decisive consideration against (1F). But now consider how 'very' might be categorized in order to split up 'very strongly' into its components. 'Strongly' must clearly belong to the same category as 'very strongly' since 'Fido smells strongly' is quite in order. But 'very' requires an adverb as its operand; however, it can qualify a propositional adverb, as in 'very probably', so its Ajdukiewicz-style category would be S(S)(S(S)), to which S(S,S(S)) is the nearest Fregean equivalent. This yields the following categorial graph (G1) for (1).
(G1) [categorial graph: 'very' with separate operands 'strongly' and 'smells', and 'Fido' as operand of 'smells']
The proposition is again fragmented, 'strongly' and 'smells' being separate operands of 'very'. Yet it may be urged, quite independently, that this is the wrong categorization of 'very', since, in allowing the latter to have any operand of category S(S), it would license the combination *'very not'. If we were to prevent this combination by assigning 'very' to category S(D,S(D),S(D,S(D))), a mixed first/second/third-level category of degree 3, it would split up 'Fido smells strongly' into three components and then bind them together separately, although, once again, (1) entails 'Fido smells strongly'. In linear notation, we should have:
(1F')  very:d₁,s,d₂ (strongly:d₃ (s(d₃), d₁), smells(d₂), Fido(dog))
       S(D,S(D),S(D,S(D)))  S(D,S(D))          S(D)       D(B)  B
but at this stage of complexity the categorial graph (G1') is much clearer.
(G1') [categorial graph: 'very' of degree 3 binding 'strongly', 'smells' and 'Fido' separately, with its D-edges numbered to match the subscripts d₁, d₂, d₃]
(I have numbered the D-edges to correspond with the link-letter subscripts in the linear notation, in order to facilitate comparison.) This analysis does, indeed, give the required result that 'very strongly' is an expression of category S(D,S(D)), but it is too complex to be credible, and one would hesitate even to construct a graph for 'Fido smells very, very strongly'. Moreover, it leaves us with no way of representing 'very probably', unless we posit a distinct sense of 'very' in the latter, an ad hoc and unconvincing solution which would, in any case, still leave us with the problem of explaining why we can have 'very probably' but not *'very not'. Now a little reflexion suggests that the reason *'very not' is
nonsense is that 'very' requires as its operand an expression signifying a quality subject to intensive magnitude. Thus Fido can smell more or less strongly, an event be more or less probable. This is an aspect of the meaning of an adverb (or adjective) which is not captured by the present category system, and I must here leave it as an open question whether the system could be developed so as to discriminate between adverbs on this ground, or whether it would have to be supplemented by some quite distinct apparatus. Must we, then, in order to avoid the fragmentation illustrated above, revert to a system of Ajdukiewicz-style categories, in spite of the objections advanced against it in section 2.4? There is a third alternative, though it would never have occurred to Frege, or, if it did, he would have dismissed it as pointless. Schemas and operators, as they have so far been presented, are wholly insensitive to the semantic structures of their
operands. In assigning negation, for instance, to category S(S) and in using the schema 'not p', we require only that the operand be a proposition. The internal semantic structure of that proposition is a matter of total indifference to us so far as regards its suitability to be an operand of negation. In the case of negation, there are good reasons for such indifference, but there is no a priori reason why, in general, we should not sometimes require that an operand have a specified structure, excluding other expressions of the same category. Now Fregean structures were designed to be interpreted in terms of functions and their arguments, as was noted at the end of section 2.3, and the argument of a function is the value of the expression which is written as the operand of the corresponding schema (function name), so it cannot matter what structure that expression has: only its value is relevant. For example, given a function like the square root, it can make no difference whether we write 'square root (3 + 1)' or 'square root (4)', because '3 + 1' will simply be evaluated to 4 before the square-root function is applied to it. So, if a subordinate structure is important, the only way to allow for it is to split the structure and make each of its components a separate operand of a new schema, which is exactly what we find in (1F) and (1F'). If that fragmentation were correct, however, it is difficult to understand how a statement like 'Fido smells' might be greeted with the comment: 'Yes, and very strongly, too!' For the latter seems to be a comment upon the statement as a whole, and not severally upon its parts. Let us then consider whether a better account of the meaning of (1) can be given if we extend Frege's ideography so that an internal structure may be specified for the operands of any operator. Thus the simplest assignment for 'strongly' would be to the category S(S), with the requirement that its operand must have the internal structure 'S(D) D'. I
shall indicate this by writing the category label as S([S(D) D]), thus showing the structure required for the operand in square brackets, and shall say that expressions of such a category take a structured operand. Since a structure of the form S(D) D is, as a whole, of category S, S([S(D) D]) is a special case of category S(S). It will be possible to have nested square brackets in category labels, though such complications will probably be unusual. In the planar notation, the obvious way of showing that an operand is restricted in this way is to connect each required constituent of it to the operator by a separate edge. However, this must not be allowed to interfere with the scopeway, nor to prevent the polyadicity of the operator from being determined by its out-degree. Thus, in order to represent (2)
Fido smells strongly,
we should have:
(G2) [categorial graph: 'strongly' with a directed edge to the S-node of 'Fido smells' and an undirected edge to the D-node of 'Fido']
The undirected edge here connects an S-node to a D-node and, clearly, this will be commonplace with structured operands. Hitherto, this has only occurred with undirected internal edges, so we can now only require that directed external edges shall connect nodes of the same basic category. It may be that an operand of such a schema has an internal structure which is more complex than the restriction upon the operand demands. Thus we might have a case in which 'strongly' qualifies a transitive verb, such as 'Alex rebuked Nick strongly'. So far, there would be nothing to prevent the undirected edge from 'strongly' going to 'Nick' rather than to 'Alex', but that would clearly be wrong, for Alex is represented by the proposition as the source of strength, not Nick. In order to prevent this, it would be possible to make use of the semantic-role indicators like 'A' (Agent) and 'P' (Patient) which have already featured in categorial graphs. In this case, they could be incorporated into the structural specification of the operand, for example S([S(DA) D]). Should we encounter any adverbs which can only qualify transitive verbs, their category label would be S([S(D,D) D D]), but this is again
a special case of category S(S). It must also be possible to combine adverbs with quantifying phrases, as in (3)
Every dog smells strongly.
The method of representing this will evidently be to take the additional undirected edge from the second S-node of 'strongly' to the D-node of the quantifier, as in (G3).
(G3) [categorial graph: as (G2), but with the undirected edge from 'strongly' running to the D-node of the quantifying phrase 'every dog']
Thus the original specification, that an operator of category S([S(D) D]) must have an operand with the structure 'S(D) D', puts the matter too strongly, and requires exactly the same qualification as the specification that an operator of category S(S) must have an operand of category S. In terms of the planar notation, the requirement is that an operator of category S([S(D) D]) shall have, in addition to a directed external edge to an S-node, an undirected external edge to a D-node. The same technique will provide for the representation of expressions like 'very', which qualify adverbs. Their category will be S([S(S) S]), thus ensuring that they can only qualify adverbs. So for (1) we shall have:
(G4) [categorial graph for (1): 'very' applied to 'Fido smells strongly' as a structured operand, with undirected edges to its required constituents]
This categorization of 'very' allows it to qualify propositional as well as non-propositional adverbs. We could restrict it to the latter by specifying its category as S([S(S) [S(D) D]]); this would call for two undirected edges from 'very' in the graph, one to an S-node and the other to a D-node. In the present case, however, that categorization is undesirable.
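The force of a structured category label can be brought out with a small computational gloss - a toy sketch of my own, not part of the notation: the operand must not merely be of category S, but must decompose into constituents of the categories named in the square brackets.

```python
from dataclasses import dataclass

@dataclass
class Expr:
    words: str
    cat: str            # category of the whole expression
    parts: tuple = ()   # immediate constituents, if any

fido        = Expr("Fido", "D")
smells      = Expr("smells", "S(D)")
fido_smells = Expr("Fido smells", "S", (smells, fido))

def accepts(required: tuple, operand: Expr) -> bool:
    """Check a structured-operand restriction such as [S(D) D]:
    the operand must be of category S and contain constituents of
    each required category (roles and multiplicity ignored here)."""
    cats = [p.cat for p in operand.parts]
    return operand.cat == "S" and all(r in cats for r in required)

# 'strongly', of category S([S(D) D]), accepts 'Fido smells' ...
assert accepts(("S(D)", "D"), fido_smells)

# ... but not 'not (Fido smells)', whose structure is S(S) S:
negated = Expr("not Fido smells", "S", (Expr("not", "S(S)"), fido_smells))
assert not accepts(("S(D)", "D"), negated)

# 'very', of category S([S(S) S]), accepts 'Fido smells strongly',
# 'strongly' being (a special case of) an S(S) constituent:
strongly = Expr("strongly", "S(S)")
assert accepts(("S(S)", "S"), Expr("Fido smells strongly", "S",
                                   (strongly, fido_smells)))
```

The 'roles and multiplicity ignored' caveat matters: as noted above for 'Alex rebuked Nick strongly', a full treatment would consult the semantic-role indicators in order to attach the undirected edge to the right D-node.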
The effect of specifying the structures of operands is not only to reduce the polyadicity of 'very' and 'strongly', but also their levels, for both are now of first level instead of third and second respectively. In this regard, the present solution differs as much from Ajdukiewicz's categorizations as from the Fregean ones. But that is an advantage, because it simplifies the semantic structures: a comparison between (G1) and (G4) shows that, even with the extra edges, (G4) is still substantially simpler than (G1), with nine nodes and eleven edges as against thirteen nodes and fourteen edges. A representation should ideally contain no greater degree of multiplicity than is essential to its purpose, so the only pertinent question is whether (G4) is complex enough to carry all of the information requisite to representing the meaning of (1) (or, more exactly, those aspects of the meaning of (1) with which we are presently concerned). Well, inferences are a good test of that and, if Fido smells very strongly, then it follows that he smells strongly; while, if he smells strongly, it also follows that he smells. The validity of these inferences is readily comprehensible from (G4), for it would be a simple matter to provide rules which allowed us to detach the remaining sub-graph from 'very', and then to detach a further sub-graph from 'strongly'. (Corresponding rules would not, of course, necessarily be valid for every other adverb or adverb-qualifier.) But once we remove 'very' from (G1), we are left with three separate sub-graphs and should require further rules in order to re-combine them. This extension of Frege's ideography, accordingly, commends itself both by comparison with the original and with Ajdukiewicz's variant. It may also be a mark of intensional contexts. Cresswell points out that, even supposing that everyone who sings dances and conversely, it will not follow that everyone sings beautifully if and only if he or she dances beautifully (1985, p. 42). Hence, he believes, an extensional logic is unable to provide an account of adverbs, unless by a Davidsonian analysis of propositions reporting events, etc. However, I suspect that 'intensional' is currently used to comprise a variety of linguistic phenomena which should be distinguished, and so am hesitant to tie structured category names to intensionality until we have a much clearer view of the latter.

In conclusion, I must explain why I have not formally specified a graph grammar which will generate all and only logical graphs of the type which I have described at an intuitive level and illustrated with examples. The reason is simple: it would be premature to do so. The notation has developed progressively throughout this book, right up to the end; there can be no guarantee that it will not demand further development and, especially until the representation of temporal expressions, which occur
in almost all contingent propositions, has been satisfactorily treated, much time and effort could simply be wasted on spelling out the details of a formal grammar. Moreover, one feature of the present notation is not wholly satisfactory and demands further consideration. The representation of alethic propositional connectives (the so-called 'truth functions') needs revision. I have just taken over the operators of propositional logic without modification. However, there is reason to query whether they pertain to semantic structure at all. The interdefinability of the propositional operators has already been noted in another connexion, but it has serious implications for our concept of semantic structure, too. Suppose we want to represent 'neither . . . nor . . .'. One way of doing so is 'not (either . . . or . . .)', but another is 'and (not . . ., not . . .)'. Yet these are quite distinct structures, the first a monadic operator with a dyadic schema as its operand, the second a dyadic operator with two monadic schemas as its operands. Worse, propositional operators can sometimes disappear, for example a double negation in classical logic and two of any odd number of negations (greater than one) in intuitionistic logic. Yet, again, there is an evident difference of structural complexity between 'not (not p)' and merely 'p'. How, then, can these operators be a part of semantic structure?

We may also query whether the familiar propositional operators are well chosen to represent the meaning of everyday language. Criticism has mainly focussed upon 'if p, then q', as defined by the customary matrix, as a representation of the meaning of 'if' in everyday language, but the standard accounts of 'both p and q' and of 'either p or q' also raise problems. In everyday language, we commonly use 'and' and 'or' to form lists, for example 'Denys invited Theo, Alice and Roland to dinner' and 'Alex may have roast beef, lamb cutlets or loin of pork for lunch'. Yet the 'and' and 'or' of propositional logic are dyadic operators. Perhaps this does not matter much with a conjunctive list, because we can always represent 'p, q and r' as '(p and q) and r' - the truth conditions of both will be the same. Similarly, we could define a triadic operator 'and*(p,q,r)' by 'and(and(p,q),r)'. However, the 'or' of propositional logic represents inclusive disjunction, whereas disjunctive lists far more commonly bear an exclusive sense in everyday language, as does the example just given. In the dyadic case there is, of course, no problem about defining exclusive disjunction. Using Latin, which distinguishes inclusive disjunction as vel from exclusive disjunction as aut, we can say: 'p aut q iff both p vel q and not both p and q'; or, alternatively, 'p aut q if and only if not (p if and only if q)'. Or we can use the corresponding matrix:
aut          q
            T     F
p     T     F     T
      F     T     F
Unfortunately, however, this runs into trouble the moment we try to extend it to lists, for, with aut so defined, 'aut (aut (p, q), r)' does not mean the same as 'p, q aut r'. The latter says that just one of the three propositions is true; the former says that an odd number of them is true. This pattern repeats itself as the list is extended, so the method fails to yield a definition of an operator forming an exclusive disjunctive list. This is not to say that such an operator cannot be defined. Here is a matrix definition. Suppose a list of n propositions (n > 1). Suppose, furthermore, that we list the truth possibilities for each proposition in the following order: FTFT . . . etc. for the first, FFTT . . . etc. for the second, and so on (with 2ⁿ 'T's or 'F's in each case). Let M(n) be the value of 'aut' for n operands. Then M(2) = FTTF, and M(n) = M(n−1) + TF . . ., where the '+' indicates concatenation of the two strings and the dots are filled by further 'F's to make up the total of 2ⁿ 'T's or 'F's. Thus M(3) = FTTFTFFF, M(4) = FTTFTFFFTFFFFFFF.

Now what we have here is a definition of an operator of variable polyadicity; it forms a proposition from any number of propositions greater than one. If we are forced to do this to cater for exclusive disjunctive lists, however, why not do the same for conjunctive lists and (if there is any use for it) for inclusive disjunctive lists, too? In these two cases the matrix is evidently much simpler to specify, anyway. The value of the matrix for 'p, q, . . . vel r' is 'F' followed by 2ⁿ−1 'T's; that for 'p, q, . . . and r' is 2ⁿ−1 'F's followed by 'T'.

Our category system has not hitherto provided for operators of variable polyadicity, so this is an innovation which would have been prevented if we had insisted too precipitately upon formulating a graph grammar based upon it. Nor is the addition just a trivial matter; the precise form of these operators demands careful consideration. It is tempting to say that they have but one operand, a set of propositions (with two or more members). In that case, however, they will not be iterable, since the result of the operation is a proposition, not the set required as the operand of a further application. Yet we surely want them to be iterable, in order to be able to represent examples like 'Adrian visited Fountains, Jervaulx and either Mount Grace, Rievaulx or Byland (I forget which)'. So it seems that there is a difference between an operator of variable polyadicity and one which takes a set of expressions as its operand.
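Since these matrices grow quickly, it may help to see the definitions mechanized. The sketch below is mine, not the author's; it merely transcribes the matrix definitions just given and checks them against the values stated in the text, including the point that iterating the dyadic aut yields parity rather than 'exactly one':

```python
# Truth possibilities are listed as in the text: FTFT... for the first
# proposition, FFTT... for the second, and so on; so in row k the i-th
# proposition is true exactly when bit i of k is 1.

def aut_matrix(n: int) -> str:
    """M(n): 'T' in row k iff exactly one of the n propositions is true."""
    assert n > 1
    return "".join("T" if bin(k).count("1") == 1 else "F" for k in range(2 ** n))

def vel_matrix(n: int) -> str:
    """Inclusive disjunctive list: 'F' followed by 2**n - 1 'T's."""
    return "F" + "T" * (2 ** n - 1)

def and_matrix(n: int) -> str:
    """Conjunctive list: 2**n - 1 'F's followed by 'T'."""
    return "F" * (2 ** n - 1) + "T"

# The values given in the text:
assert aut_matrix(2) == "FTTF"
assert aut_matrix(3) == "FTTFTFFF"
assert aut_matrix(4) == "FTTFTFFFTFFFFFFF"

# The recurrence M(n) = M(n-1) + TF... (dots filled by further 'F's):
for n in range(3, 8):
    assert aut_matrix(n) == aut_matrix(n - 1) + "T" + "F" * (2 ** (n - 1) - 1)

# Iterating the dyadic aut gives 'an odd number are true', not 'just one':
parity = "".join("T" if bin(k).count("1") % 2 else "F" for k in range(8))
assert parity != aut_matrix(3)   # they differ when p, q and r are all true
```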
If we insist upon the former for 'and', 'vel' and 'aut', though, the question must be faced whether each of these connectives has the same meaning whatever the number of its operands. Two considerations argue that it does not. First, we adopted the principle that difference of category carries with it difference of meaning; but what else is an operator of variable polyadicity but one which can take on, in application, any one of a series of categories? (by contrast with an operator whose operand is a set of expressions). Second, the matrices for each polyadicity differ; all our definitions do is to give a rule for constructing each matrix. Admittedly there is a kinship between the matrices for each operator, but that only establishes a relationship between their meanings, not that they have the same meaning whatever the polyadicity. Against this, we can argue that someone who learns the meaning of any of these expressions in the context of a list with a given number of members will henceforth be able to use it correctly in the context of a list with any other number of members. We can even find a single formula to express the meaning of each connective: for example, 'aut' means that just one of the propositions which it connects is true. So 'and', 'vel' and 'aut' each have the same meaning whatever the length of the list it forms.

The purpose of a philosophical book is to stimulate thought, not to put it to rest with solutions to every problem. So I shall not try to resolve this sophisma here, nor to answer the previous question whether alethic connectives belong to semantic structure. These issues have not, in any case, been raised here for their own sakes so much as to substantiate my claim that it is still premature to formulate a graph grammar for semantic representation of everyday language. By this time I trust that I have also vindicated the claim made in the Introduction that the representation problem is commonly not accorded the respect which it deserves.
Bibliography
Ades, A. E., and Steedman, M. J. (1982), 'On the Order of Words', Linguistics and Philosophy 4, pp. 517-58.
Aho, A. V., and Ullman, J. D. (1972), The Theory of Parsing, Translation, and Compiling. Vol. 1: Parsing. Englewood Cliffs, Prentice-Hall.
Ajdukiewicz, K. (1935), 'Die syntaktische Konnexität', Studia Philosophica 1, pp. 1-27. Part I translated by P. T. Geach (1967), 'On Syntactical Coherence', The Review of Metaphysics 20, pp. 635-47 (from the Polish text in K. Ajdukiewicz (1960), Język i poznanie, Warsaw).
Akmajian, A., and Heny, F. (1975), An Introduction to the Principles of Transformational Syntax. Cambridge, Massachusetts, The MIT Press.
Allen, J. (1987), Natural Language Understanding. Menlo Park, Benjamin/Cummings.
Altham, J. E. J. (1971), The Logic of Plurality. London, Methuen.
Anderson, J. M. (1971), The Grammar of Case: towards a Localistic Theory. Cambridge University Press.
(1977), On Case Grammar: Prolegomena to a Theory of Grammatical Relations. London, Croom Helm.
Bach, E. (1984), 'Some Generalizations of Categorial Grammars', in T. Landman and F. Veltman (eds.), Varieties of Formal Semantics. Dordrecht, Foris. Pp. 1-23.
Baker, M. C., Johnson, K., and Roberts, I. (1989), 'Passive Arguments Raised', Linguistic Inquiry 20, pp. 219-51.
Bar-Hillel, Y. (1953), 'A Quasi-Arithmetical Notation for Syntactic Description', Language 29, pp. 47-58. Re-printed in Bar-Hillel (1964), pp. 61-74.
(1964), Language and Information. Reading, Massachusetts, Addison-Wesley.
Bar-Hillel, Y., Gaifman, C., and Shamir, E. (1960), 'On Categorial and Phrase-Structure Grammars', The Bulletin of the Research Council of Israel 9F, pp. 1-16. Re-printed in Bar-Hillel (1964), pp. 99-115.
Bartsch, R. (1979), 'The Syntax and Semantics of Subordinate Clause Constructions and Pronominal Coreference', in Heny and Schnelle (eds.), pp. 23-59.
Belletti, A., Brandi, L., and Rizzi, L. (1981), Theory of Markedness in Generative Grammar. Proceedings of the 1979 GLOW Conference. Pisa: Scuola Normale Superiore di Pisa.
Bendix, E. H. (1966), Componential Analysis of General Vocabulary: the Semantic Structure of a Set of Verbs in English, Hindi and Japanese. Bloomington: University of Indiana Press, and The Hague: Mouton (International Journal of American Linguistics 32, part 2).
Bloomfield, L. (1933), Language. New York, Holt, Rinehart and Winston.
Bobrow, D. G., and Winograd, T. (1977), 'An Overview of KRL, a Knowledge Representation Language', Artificial Intelligence 8, pp. 155-73. Re-printed in Brachman and Levesque (eds.) (1985), pp. 263-85.
Borsley, R. D. (1991), Syntactic Theory: a Unified Approach. London, Edward Arnold.
Bourbaki, N. (1954), Éléments de mathématique, Part 1: Les structures fondamentales de l'analyse, Book 1, Théorie des ensembles. Paris, Hermann et Cie.
Brachman, R. J., and Levesque, H. (eds.) (1985), Readings in Knowledge Representation. Palo Alto, California, Morgan Kaufmann.
Buszkowski, W., Marciszewski, W. and Van Benthem, J. (1988), Categorial Grammar (Linguistic and Literary Studies in Eastern Europe, 25). Amsterdam, John Benjamins.
Carlson, L. (1982), 'Plural Quantifiers and Informational Independence', Acta Philosophica Fennica 35, pp. 163-74.
Carnap, R. (1937), The Logical Syntax of Language. London, Kegan Paul, Trench, Trübner.
Carroll, L. (1896a, 1896b), Symbolic Logic and The Game of Logic. Re-printed New York, Dover, 1958. [a = Symbolic Logic; b = The Game of Logic].
Cercone, N. (1975), 'Representing Natural Language in Extended Semantic Networks'. Edmonton, University of Alberta, Department of Computing Science Technical Report TR75-11.
Charniak, E. (1975), 'Organisation and Inference in a Frame-like System of Common Sense Knowledge', in R. C. Schank and B. Nash-Webber (eds.).
Chomsky, A. N. (1956), 'Three Models for the Description of Language', Proc. Group. Inf. Th. 2, no. 3, pp. 113-24.
(1957), Syntactic Structures. The Hague, Mouton.
(1962), 'A Transformational Approach to Syntax', in A. A. Hill (ed.), Proceedings of the Third Texas Conference on Problems of Linguistic Analysis in English. Austin, The University of Texas. Re-printed in Fodor and Katz (eds.) (1964), pp. 211-45.
(1965), Aspects of the Theory of Syntax. Cambridge, Massachusetts, The MIT Press.
(1966), Cartesian Linguistics. New York, Harper and Row.
(1977), Essays on Form and Interpretation. New York: North-Holland.
(1981; page references are to revised edition, 1982), Lectures on Government and Binding. Dordrecht, Foris.
(1982), Some Concepts and Consequences of the Theory of Government and Binding. Cambridge, Massachusetts: The MIT Press.
(1986a), Barriers. Cambridge, Massachusetts: The MIT Press.
(1986b), Knowledge of Language: its Nature, Origin and Use. New York and Eastbourne: Praeger.
Chomsky, A. N. and Geach, P. T. (1969), 'Should Traditional Grammar be Ended or Mended?', Educational Review 22, pp. 5-25.
Chomsky, A. N., and Miller, G. A. (1956), 'Finitary Models of Language Users', in R. D. Luce, R. D. Bush, and E. Galanter (eds.), Handbook of Mathematical Psychology, Vol. 2. New York, John Wiley. Ch. 13.
Church, A. (1956), Introduction to Mathematical Logic, Vol. I. Princeton University Press.
Claus, V., Ehrig, H. and Rozenberg, G. (eds.) (1979), Graph Grammars and their Application to Computer Science and Biology (Lecture Notes in Computer Science, 73). Berlin, Springer-Verlag.
Conklin, H. C. (1962), 'Lexicographical Treatment of Folk Taxonomies', in F. W. Householder and S. Saporta (eds.), Problems in Lexicography (Publications of Indiana Research Center in Anthropology, Folklore and Linguistics 21). Baltimore. Pp. 119-41.
Cooper, R. (1979), 'The Interpretation of Pronouns', in Heny and Schnelle (eds.), pp. 61-92.
Cresswell, M. J. (1973), Logics and Languages. London, Methuen.
(1977), Categorial Languages. Bloomington, Indiana University Linguistics Club.
(1985), Adverbial Modification: Interval Semantics and its Rivals (Studies in Linguistics and Philosophy 28). Dordrecht, D. Reidel.
Davidson, D. (1965), 'Theories of Meaning and Learnable Languages', in Y. Bar-Hillel (ed.), Logic, Methodology and Philosophy of Science. Amsterdam, North-Holland. Pp. 384-94.
(1967a), 'The Logical Form of Action Sentences', in N. Rescher (ed.), The Logic of Decision and Action. Pittsburgh, University of Pittsburgh Press. Pp. 81-95.
(1967b), 'Causal Relations', The Journal of Philosophy 64, pp. 691-703.
(1969), 'The Individuation of Events', in N. Rescher (ed.), Essays in Honor of Carl G. Hempel. Dordrecht, Reidel. Pp. 216-34.
Davidson, D., and Harman, G. (eds.) (1972), Semantics of Natural Language. Dordrecht, D. Reidel.
(eds.) (1975), The Logic of Grammar. Encino, California, Dickenson.
den Besten, H. (1981), 'A Case Filter for Passives', in Belletti, Brandi and Rizzi (eds.), pp. 65-122.
Dixon, R. M. W. (1989), 'Subject and Object in Universal Grammar', in D. Arnold, M. Atkinson, J. Durand, C. Grover and L. Sadler (eds.), Essays on Grammatical Theory and Universal Grammar. Oxford, Clarendon Press. Ch. 4, pp. 91-118.
Dowty, D. (1988), 'Type Raising, Functional Composition, and Non-Constituent Conjunction', in Oehrle, Bach and Wheeler (eds.), pp. 153-97.
Dummett, M. (1973), Frege: Philosophy of Language. London, Duckworth.
(1981a), The Interpretation of Frege's Philosophy. London, Duckworth.
(1981b), 'Objections to Chomsky', London Review of Books, 3-16 September.
Ehrig, H., Nagl, M. and Rozenberg, G. (eds.) (1983), Graph Grammars and their Application to Computer Science (Lecture Notes in Computer Science, 153). Berlin, Springer-Verlag.
Evans, G. (1977), 'Pronouns, Quantifiers and Relative Clauses', The Canadian Journal of Philosophy 7, pp. 467-536, 777-97. Re-printed (with minor changes) in M. Platts (ed.) (1980), Reference, Truth and Reality: Essays on the Philosophy of Language. London, Routledge and Kegan Paul. Re-printed in Evans (1985), pp. 76-175.
(1980), 'Pronouns', Linguistic Inquiry 11, pp. 337-62. Re-printed in Evans (1985), pp. 214-48.
(1981), 'Understanding Demonstratives', in H. Parrett and J. Bouveresse (eds.), Meaning and Understanding. Berlin, de Gruyter. Re-printed in Evans (1985), pp. 291-321.
(1985), Collected Papers. Oxford, Clarendon Press.
Fillmore, C. J. (1966), A Proposal Concerning English Prepositions (Monograph Series on Language and Linguistics, 19). Washington, Georgetown University. Pp. 19-34.
(1968a), 'The Case for Case', in E. Bach and T. Harms (eds.), Universals in Linguistic Theory. New York, Holt, Rinehart and Winston. Pp. 1-88.
(1968b), 'Lexical Entries for Verbs', Foundations of Language 4, pp. 373-93.
(1971), 'Some Problems for Case-Grammar', in R. O'Brien (ed.), Report of the 22nd Annual Round Table Meeting on Linguistics and Language Studies (Monograph Series on Language and Linguistics, 24). Washington, Georgetown University. Pp. 35-56.
(1975a), 'The Future of Semantics', in R. Austerlitz (ed.), The Scope of American Linguistics. Peter de Ridder Press.
(1975b), 'Against a Checklist Theory of Meaning', in Proceedings of the First Annual Meeting of the Berkeley Linguistics Society. Berkeley, Institute of Human Learning.
(1977), 'Scenes and Frames Semantics', in A. Zampolli (ed.), Linguistic Structures Processing. Amsterdam, North-Holland. Pp. 55-82.
Findler, N. V. (ed.) (1979), Associative Networks: the Representation and Use of Knowledge in Computers. New York, Academic Press.
Fitch, F. B. (1952), Symbolic Logic. New York, The Ronald Press.
Fodor, J. A. (1970), 'Three Reasons for Not Deriving "Kill" from "Cause to Die"', Linguistic Inquiry 1, pp. 429-38.
Fodor, J. A., and Katz, J. J. (eds.) (1964), The Structure of Language. Englewood Cliffs, Prentice-Hall.
Fowler, H. W. (1926), A Dictionary of Modern English Usage. Oxford, Clarendon Press.
Frege, G. (1879), Begriffsschrift. Halle, L. Nebert. Re-printed Hildesheim, Georg Olms, 1964. Translation by T. W. Bynum (1972), Conceptual Notation and Related Articles. Oxford, Clarendon Press.
(1884), Die Grundlagen der Arithmetik. Breslau, Wilhelm Koebner. Translated by J. L. Austin (1950; second edition 1953), The Foundations of Arithmetic. Oxford, Basil Blackwell.
(1891), 'Funktion und Begriff', in Frege (1967), pp. 125-42. Translated by P. T. Geach (1952) in P. T. Geach and M. Black (eds.), Translations from the Philosophical Writings of Gottlob Frege. Oxford, Basil Blackwell. Pp. 21-41. (Page references are to the 1891 printing, reproduced in the 1967 reprint and in the 1952 translation.)
(1892), 'Über Begriff und Gegenstand', in Frege (1967), pp. 167-78. Translated by Geach (1952) in Geach and Black (eds.), pp. 42-55.
(1893, 1903), Grundgesetze der Arithmetik, Vol. I 1893, Vol. II 1903. Jena, H. Pohle. Re-printed Hildesheim, Georg Olms, 1962. Partial translation by M. Furth (1964), The Basic Laws of Arithmetic. Berkeley and Los Angeles, University of California Press.
(1896), 'Über die Begriffsschrift des Herrn Peano und meine eigene', in Frege (1967), pp. 230-33.
(1918), Logische Untersuchungen: I. Der Gedanke, in Frege (1967), pp. 342-62. Translated by Geach (1977), Logical Investigations. Oxford, Basil Blackwell. 'Thoughts', pp. 1-30.
(1967), Kleine Schriften, ed. I. Angelelli. Darmstadt, Wissenschaftliche Buchgesellschaft.
(1969), Nachgelassene Schriften. Hamburg, Felix Meiner Verlag. 'Logik', pp. 137-63. Translated by P. Long and R. M. White (1979), Posthumous Writings. Oxford, Basil Blackwell. 'Logic', pp. 126-51.
Galton, A. (1984), The Logic of Aspect. Oxford, Clarendon Press.
(1988), 'Formal Semantics: is it Relevant to Artificial Intelligence?'. The University of Leeds (duplicated).
Gazdar, G., Klein, E., Pullum, G. and Sag, I. (1985), Generalized Phrase Structure Grammar. Oxford, Basil Blackwell.
Geach, P. T. (1956), 'Good and Evil', Analysis 17, pp. 33-42. Re-printed in P. Foot (ed.), Theories of Ethics. London, Oxford University Press, 1967, pp. 64-73.
(1962, second edition), Reference and Generality. Ithaca, Cornell University Press. Third edition 1979.
(1965), 'Complex Terms Again', Journal of Philosophy 72, no. 23, pp. 716-17. Re-printed in Geach (1972), pp. 106-8.
(1968), 'Quine's Syntactical Insights', Synthese 19, no. 1/2. Re-printed in Geach (1972), pp. 115-27.
(1970), 'A Program for Syntax', Synthese 23, pp. 3-17. Re-printed in Davidson and Harman (eds.) (1972), pp. 483-97.
(1972), Logic Matters. Oxford, Basil Blackwell.
(1973), 'On Operators of Common Scope'. The University of Leeds (duplicated).
(1976), Reason and Argument. Oxford, Blackwell.
(1979), 'Existential or Particular Quantifier?', in P. Weingartner and E. Morscher (eds.), Ontology and Logic (Grazer Philosophische Studien). Berlin: Duncker and Humblot.
(1980), 'Some Problems about the Sense and Reference of Proper Names', Canadian Journal of Philosophy, Supplementary Volume 6, pp. 83-96.
Genesereth, M. R., and Nilsson, N. J. (1987), Logical Foundations of Artificial Intelligence. Los Altos, California, Morgan Kaufmann.
Gonzalez, R. C. and Thomason, M. G. (1978), Syntactic Pattern Recognition. Reading, Massachusetts, Addison-Wesley.
Goodenough, W. H. (1956), 'Componential Analysis and the Study of Meaning', Language 32, pp. 195-216.
Gross, M. (1972), Mathematical Models in Linguistics. Englewood Cliffs, Prentice-Hall.
Gruber, J. (1965), 'Studies in Lexical Relations', Ph.D. dissertation, Massachusetts Institute of Technology.
Gupta, A. (1980), The Logic of Common Nouns. New Haven, Yale University Press.
Haïk, I. (1984), 'Indirect Binding', Linguistic Inquiry 15, pp. 185-223.
Heim, I. (1982), 'The Semantics of Definite and Indefinite Noun Phrases', Ph.D. dissertation, University of Massachusetts, Amherst.
Harary, F. (1969), Graph Theory. Reading, Massachusetts, Addison-Wesley.
Harman, G. (1972), 'Deep Structure as Logical Form', in Davidson and Harman (eds.), pp. 25-47.
Harris, Z. (1951), Methods in Structural Linguistics. Chicago, University of Chicago Press.
Hayes, P. J. (1980), 'The Logic of Frames', in Metzing (ed.), pp. 46-61. Re-printed in Brachman and Levesque (eds.) (1985), pp. 287-95.
Hellen, L. (1981), 'An Argument for a Transformational Derivation of Passives', in Belletti, Brandi and Rizzi (eds.), pp. 217-86.
Hendrix, G. G. (1975a), Partitioned Networks for the Mathematical Modeling of Natural Language Semantics. Austin, University of Texas, Department of Computer Science Technical Report NL-28.
(1975b), 'Expanding the Utility of Semantic Networks by Partitioning', Proceedings of the 4th International Conference on Artificial Intelligence. Pp. 115-21.
(1979), 'Encoding Knowledge in Partitioned Networks', in Findler (ed.), pp. 51-92.
Henkin, L. (1964), Completeness (The Voice of America Forum Lectures, Philosophy of Science Series, 5).
Heny, F. and Schnelle, H. S. (eds.) (1979), Syntax and Semantics. Vol. 10: Selections from the Third Groningen Round Table. New York, Academic Press.
Hintikka, J. (1956), 'Identity, Variables and Impredicative Definitions', Journal of Symbolic Logic 21, pp. 225-45.
(1957), 'Vicious Circle Principle and the Paradoxes', Journal of Symbolic Logic 22, pp. 245-9.
(1979), 'Rejoinder to Peacocke', in E. Saarinen (ed.), Game-Theoretical Semantics. Dordrecht: D. Reidel.
Hobbs, J. R., and Shieber, S. M. (1987), 'An Algorithm for Generating Quantifier Scopings', Computational Linguistics 13, 1-2.
Hopcroft, J. E., and Ullman, J. D. (1969), Formal Languages and their Relation to Automata. Reading, Massachusetts, Addison-Wesley.
Hornstein, N. (1984), Logic as Grammar. Cambridge, Massachusetts, MIT Press.
Huck, G. J. (1988), 'Phrasal Verbs and the Categories of Postponement', in Oehrle, Bach and Wheeler (eds.), pp. 249-63.
Huddleston, R. (1976), An Introduction to English Transformational Syntax. London, Longman.
Jacobsen, B. (1986), Modern Transformational Grammar, with Particular Reference to the Theory of Government and Binding (North-Holland Linguistic Series 53). Amsterdam, North-Holland.
Jackendoff, R. (1977), X̄ Syntax: A Study of Phrase Structure. Cambridge, Massachusetts, The MIT Press.
Jaeggli, O. A. (1986), 'Passive', Linguistic Inquiry 17, pp. 587-622.
Kamp, H. (1984), 'A Theory of Truth and Semantic Representation', in J. Groenendijk, T. M. V. Janssen, and M. Stokhof (eds.), Truth, Interpretation and Information. Dordrecht, Foris.
Katz, J. J. (1964), 'Analyticity and Contradiction in Natural Language', in Fodor and Katz (eds.), pp. 519-43.
(1964), 'The Structure of a Semantic Theory', in Fodor and Katz (eds.), pp. 479-518.
Katz, J. J., and Postal, P. M. (1964), An Integrated Theory of Linguistic Descriptions. Cambridge, Massachusetts, The MIT Press.
Kay, M. (1973), 'The MIND System', in R. Rustin (ed.), Natural Language Processing. New York: Algorithmic Press. Pp. 115-88.
Keenan, E. L. (1973), 'Presupposition in Natural Logic', The Monist 57, pp. 344-70.
(1979), 'Passive: a Case Study in Markedness'. Los Angeles, University of California, and Tel Aviv University (duplicated).
Kenny, A. J. P. (1963), Action, Emotion and Will. London, Routledge and Kegan Paul.
Kimball, J. P. (1973), The Formal Theory of Grammar. Englewood Cliffs, Prentice-Hall.
Kripke, S. (1972), 'Naming and Necessity', in D. Davidson and G. Harman (eds.), Semantics of Natural Language. Dordrecht, Reidel. Pp. 253-355.
Lakoff, G. (1965), 'On the Nature of Syntactic Irregularity', in Mathematical Linguistics and Automatic Translation, Report No. NSF-16. Cambridge, Massachusetts, Harvard University Computation Laboratory.
(1969), 'On Generative Semantics', in D. D. Steinberg and L. A. Jakobovits (eds.), Semantics: an Interdisciplinary Reader in Philosophy, Linguistics, Anthropology and Psychology. Cambridge University Press.
Lambek, J. (1958), 'The Mathematics of Sentence Structure', American Mathematical Monthly 65, pp. 154-69. Re-printed in Buszkowski, Marciszewski and Van Benthem (eds.) (1988), pp. 153-72.
(1961), 'On the Calculus of Syntactic Types', in R. Jakobson (ed.), Structure of Language and its Mathematical Aspects. Providence, American Mathematical Society. Pp. 166-78.
(1988), 'Categorial and Categorical Grammars', in Oehrle, Bach and Wheeler (eds.), pp. 297-317.
Lappin, S. (1989), 'Donkey Pronouns Unbound', Theoretical Linguistics 15, pp. 263-86.
Lasnik, H. (1976), 'Remarks on Coreference', Linguistic Analysis 2, pp. 1-22.
Ledley, R. S., et al. (1965), 'FIDAC: Film Input to Digital Automatic Computer and Associated Syntax-Directed Pattern-Recognition Programming System', in Tippett, Beckowitz, Koester and Vanderburgh (eds.), Optical and Electro-Optical Information Processing, 591-613. Cambridge, Massachusetts, The MIT Press.
Levin, H. D. (1982), Categorial Grammar and the Logical Form of Quantification. Naples, Bibliopolis.
Lewis, D. (1970), 'General Semantics', Synthese 22, pp. 18-67. Re-printed in Davidson and Harman (eds.), pp. 169-218, to which cited pages refer.
(1974), 'Tensions', in Munitz and Unger (eds.), pp. 49-61.
Lindenmayer, A. and Rozenberg, G. (1979), 'Parallel Generation of Maps: Developmental Systems for Cell Layers', in Claus, Ehrig and Rozenberg (eds.), pp. 301-16.
Lorenz, K. (1963), Das sogenannte Böse. Vienna, Dr G. Borotha-Schoeler Verlag. Translated by M. Latzke (1966), On Aggression. London, Methuen.
Lounsbury, F. G. (1956), 'A Semantic Analysis of the Pawnee Kinship Usage', Language 32, pp. 158-94.
(1984), 'The Structural Analysis of Kinship Semantics', in H. G. Lunt (ed.), Proceedings of the Ninth International Congress of Linguists. The Hague: Mouton. Pp. 1073-93.
Lyons, J. (1968), Introduction to Theoretical Linguistics. Cambridge University Press.
Marcus, S. (1967), Algebraic Linguistics; Analytical Models. New York, Academic Press.
Martin, J. N. (1984), 'The Semantics of Frege's Grundgesetze', History and Philosophy of Logic 5, pp. 143-76.
May, R. (1985), Logical Form: its Structure and Derivation. Cambridge, Massachusetts, The MIT Press.
McCawley, J. D. (1971), 'Where do Noun Phrases Come from?', in Steinberg and Jakobovits (eds.), pp. 217-31.
(1982), Thirty Million Theories of Grammar. London, Croom Helm.
McCloskey, J. (1988), 'Syntactic Theory', in F. J. Newmeyer (ed.), Linguistics: the Cambridge Survey, vol. I. Cambridge University Press.
Metzing, D. (ed.) (1980), Frame Conceptions and Text Understanding (Research in Text Theory, 5). Berlin, de Gruyter.
Minsky, M. (1975), 'A Framework for Representing Knowledge', in P. Winston (ed.), The Psychology of Computer Vision. New York, McGraw-Hill. Pp. 211-277. Condensed version re-printed in Metzing (ed.) (1980), pp. 1-25. Another version in J. Haugeland (ed.) (1981), Mind Design. Cambridge, Massachusetts, The MIT Press. Pp. 95-128. Re-printed in Brachman and Levesque (1985), pp. 245-62.
Montague, R. (1974), Formal Philosophy. New Haven, Yale University Press.
Morris, C. W. (1938), Foundations of the Theory of Signs (International Encyclopedia of Unified Science, vol. I, no. 2). University of Chicago Press.
Munitz, M. K., and Unger, P. K. (eds.) (1974), Semantics and Philosophy. New York University Press.
Newman, J. H. (1845), An Essay on the Development of Christian Doctrine. Re-printed Harmondsworth, Penguin, 1974.
Nilsson, N. J. (1980), Principles of Artificial Intelligence. Los Altos, California, Morgan Kaufmann.
Noonan, H. (1980), Objects and Identity. The Hague, Nijhoff.
Oehrle, R. T., Bach, E. and Wheeler, D. (eds.) (1988), Categorial Grammars and Natural Language Structures. Dordrecht, D. Reidel.
Partee, B. H. (ed.) (1976), Montague Grammar. New York, Academic Press.
Peirce, C. S. (1906), 'Prolegomena to an Apology for Pragmatism', The Monist 16, pp. 492-546.
(1960), Collected Papers of Charles Sanders Peirce, vol. 4, ed. C. Hartshorne and P. Weiss. Cambridge, Massachusetts, Harvard University Press.
Perry, J. (1977), 'Frege on Demonstratives', The Philosophical Review 86, pp. 474-97. Re-printed in Yourgrau (ed.) (1990), pp. 50-70.
Pollard, C., and Sag, I. (1988), Information-Based Syntax and Semantics. Vol. 1: Fundamentals. Stanford, CSLI.
Postal, P. M. (1964), Constituent Structure. Bloomington, Indiana University Research Center in Anthropology, Folklore and Linguistics.
Postal, P. M., and Pullum, G. K. (1988), 'Expletive Noun Phrases in Subcategorized Positions', Linguistic Inquiry 19, pp. 635-70.
Potts, T. C. (1968), 'The Logical Delineation of Constructions following Psychological Verbs', Oxford D.Phil thesis (copy in Bodleian Library).
(1973), 'Fregean Categorial Grammar', in R. J. Bogdan and I. Niiniluoto (eds.), Logic, Language and Probability. Dordrecht, Reidel. Pp. 245-84.
(1974), 'Modal Logic and Auxiliary Verbs', in C. Heidrich (ed.), Semantics and Communication. Amsterdam, North-Holland. Pp. 180-207.
(1975), 'Model Theory and Linguistics', in E. Keenan (ed.), Formal Semantics of Natural Language. Cambridge University Press. Pp. 241-50.
(1976), 'Montague's Semiotic: a Syllabus of Errors', Theoretical Linguistics 3, pp. 191-208.
(1978a), 'Case Grammar as Componential Analysis', in W. Abraham (ed.), Valence, Semantic Case and Grammatical Relations. Amsterdam, John Benjamins. Pp. 399-457.
(1978b), 'Fregean Grammar: a Formal Outline', Studia Logica 37, pp. 7-26.
(1979), 'A General Theory of the Meaning of Anaphoric Pronouns', in Heny and Schnelle (eds.), pp. 141-98.
Prior, A. N. (1971), Objects of Thought. Oxford, Clarendon Press.
Putnam, H. (1975), Language, Mind and Reality: Philosophical Papers, Vol. 2. Cambridge University Press.
Quillian, M. R. (1966), Semantic Memory. Cambridge, Massachusetts, Bolt, Beranek and Newman. Report AFCRL-66-189. Re-printed in M. Minsky (ed.) (1968), Semantic Information Processing. Cambridge, Massachusetts, The MIT Press. Pp. 227-70.
(1967), 'Word Concepts: a Theory and Simulation of Some Basic Semantic Capabilities', Behavioral Science 12, pp. 410-30. Re-printed in Brachman and Levesque (eds.) (1985), pp. 98-118.
Quine, W. V. O. (1952), Methods of Logic. London, Routledge and Kegan Paul.
(1965), Elementary Logic (revised edition). New York, Harper Torchbook; also Cambridge, Massachusetts, Harvard University Press, 1966.
Radford, A. (1981), Transformational Syntax: a Student's Guide to Chomsky's Extended Standard Theory. Cambridge University Press.
(1988), Transformational Grammar. Cambridge University Press.
Reinhart, T. (1984), 'A Surface-Structure Analysis of "Donkey" Anaphora'. Tel Aviv University (duplicated).
Russell, B. (1905), 'On Denoting', Mind 14, pp. 479-93. Re-printed in B. Russell (1956), Logic and Knowledge. London, Allen and Unwin. Pp. 41-56.
(1918), The Philosophy of Logical Atomism. Re-printed in Russell (1956).
(1919), Introduction to Mathematical Philosophy. London, Allen and Unwin.
(1937), The Principles of Mathematics. London, Allen and Unwin. Second edition (first edition 1903).
(1940), An Inquiry into Meaning and Truth. London, Allen and Unwin.
(1956), Logic and Knowledge. London, Allen and Unwin.
Schank, R. C. (1972), 'The Sixteen Conceptual Actions Underlying Natural Language'. Stanford, Department of Computer Science and Committee on Linguistics, Stanford University.
(1975a), Conceptual Information Processing. Amsterdam, North-Holland.
(1975b), 'Using Knowledge to Understand', in Schank and Nash-Webber (1975).
Schank, R. C., and Nash-Webber, B. (eds.) (1975), Workshop on Theoretical Issues in Natural Language Processing (TINLAP), vol. 1.
Schubert, L. K. (1976), 'Extending the Expressive Power of Semantic Networks', Artificial Intelligence 7, pp. 163-98.
Schubert, L. K., Goebel, R. G. and Cercone, N. J. (1979), 'The Structure and Organization of a Semantic Net for Comprehension and Inference', in Findler (ed.), pp. 121-75.
Scragg, G. (1976), 'Semantic Nets as Memory Models', in E. Charniak and Y. Wilks (eds.), Computational Semantics. Amsterdam, North-Holland. Pp. 101-27.
Shapiro, S. C. (1971), 'A Net Structure for Semantic Information Storage, Deduction and Retrieval', Proceedings of the Second International Joint Conference on Artificial Intelligence. Pp. 512-23.
(1979), 'The SNePS Semantic Network Processing System', in Findler (ed.), pp. 179-203.
Simmons, R. F. (1973), 'Semantic Networks', in R. C. Schank and K. Colby (eds.), Computer Models of Thought and Language. San Francisco, W. H. Freeman.
Sommers, F. (1982), The Logic of Natural Language. Oxford, Clarendon Press.
Sowa, J. F. (1984), Conceptual Structures. Reading, Massachusetts, Addison-Wesley.
Steedman, M. J. (1983), 'A Categorial Syntax for Subject and Tensed Verb in English and some Related Languages'. Duplicated.
(1985), 'Dependency and Coordination in the Grammar of Dutch and English', Language 61, pp. 523-68.
(1988), 'Combinators and Grammars', in Oehrle, Bach and Wheeler (eds.), pp. 417-42.
Steinberg, D. D., and Jakobovits, L. A. (1971), Semantics: an Interdisciplinary Reader in Philosophy, Linguistics and Psychology. Cambridge University Press.
Strawson, P. F. (1952), Introduction to Logical Theory. London, Methuen; New York, John Wiley.
Tarski, A. (1936), 'Der Wahrheitsbegriff in den formalisierten Sprachen', Studia Philosophica 1, pp. 261-405. Translated by J. H. Woodger (1956), 'The Concept of Truth in Formalized Languages', in A. Tarski, Logic, Semantics, Metamathematics. Oxford, Clarendon Press. Pp. 152-278.
Tinbergen, N. (1951), The Study of Instinct. Oxford, Clarendon Press.
Trier, J. (1931), Der deutsche Wortschatz im Sinnbezirk des Verstandes. Heidelberg, Winter (second edition 1973).
Van Benthem, J. (1986), Essays in Logical Semantics. Dordrecht, Reidel.
(1988), 'The Lambek Calculus', in Oehrle, Bach and Wheeler (eds.), pp. 35-68.
Vesey, G. (ed.) (1976), Communication and Understanding. London, Harvester Press.
Wang, H. (1952), 'Logic of Many-Sorted Theories', Journal of Symbolic Logic 17, pp. 105-16.
Watts, I. (1724), Logick. London, Buckland, Rivington, Rivington, Longman, Field, Dilly, Robinson, Robinson, Flexney and Goldsmith.
Weinreich, U. (1966), 'Explorations in Semantic Theory', in T. Sebeok (ed.), Current Trends in Linguistics. Vol. 3: Theoretical Foundations. The Hague, Mouton. Pp. 395-477.
Wells, R. S. (1947), 'Immediate Constituents', Language 23, pp. 81-117.
Wiggins, D. (1967), Identity and Spatio-Temporal Continuity. Oxford, Basil Blackwell.
Wilks, Y. (1978), 'Frames, Scripts, Stories and Fantasies', in E. Stegentritt (ed.), Proceedings of the Regensburg Romanistentag, University of Regensburg.
Williams, C. J. F. (1989), What is Identity? Oxford, Clarendon Press.
Williams, E. (1983), 'Against Small Clauses', Linguistic Inquiry 14, pp. 287-308.
Wilson, R. J. (1979), Introduction to Graph Theory. Harlow, Longman.
Wittgenstein, L. (1922), Tractatus Logico-Philosophicus. London, Routledge and Kegan Paul.
(1937), 'Ursache und Wirkung: intuitives Erfassen'. Text and English translation in Philosophia 6 (1976), pp. 391-445.
(1953), Philosophische Untersuchungen / Philosophical Investigations. Oxford, Basil Blackwell.
(1958), The Blue and Brown Books. Oxford, Basil Blackwell.
(1964), Philosophische Bemerkungen. Oxford, Basil Blackwell. Translated by R. Hargreaves and R. M. White (1975), Philosophical Remarks. Oxford, Basil Blackwell.
(1969), Philosophische Grammatik. Oxford, Basil Blackwell. Translated by A. Kenny (1974), Philosophical Grammar. Oxford, Basil Blackwell. (The page numbers are the same in the translation as in the German edition.)
Woods, W. A. (1975), 'What's in a Link: Foundations for Semantic Networks', in D. G. Bobrow and A. Collins (eds.), Representation and Understanding. Vol. 2: Studies in Cognitive Science. New York, Academic Press. Re-printed in Brachman and Levesque (eds.) (1985), pp. 217-41.
(1977), 'Lunar Rocks in Natural English: Explorations in Natural Language Question Answering', in A. Zampolli (ed.), Linguistic Structures Processing. New York, Elsevier North-Holland.
(1978), 'Semantics and Quantification in Natural Language Question Answering', in M. C. Yovits (ed.), Advances in Computers, vol. 17. New York, Academic Press.
Woods, W. A., Kaplan, R. M. and Nash-Webber, B. (1972), The Lunar Sciences Natural Language Information System: Final Report. Cambridge, Massachusetts, Bolt, Beranek and Newman. BBN Report no. 2378.
Yourgrau, P. (ed.) (1990), Demonstratives. Oxford University Press.
Index
'A that F' 155, 183, 190, 198, 201, 249 Absorption 36 Act of naming, see under Naming, act of Actions basic/primitive 114 in conceptual dependency 115 vs. states 111 thoughts of 111 Address, in computing 215, 217 Ades, A.E., and Steedman, M.J. 66, 82 Adjectives 186, 256 absolute 183 numerical 261 Adverbial modification xiii Adverbs 186, 278 numerical 262 Agent role 10, 11, 14, 15, 17, 113, 134 in Sowa's conceptual graphs 105 Ajdukiewicz, K. 64, 69, 73-5, 80, 82, 90, 169, 176, 279 grammar, rule (A) 65 Akmajian, A., and Heny, F. 69 Aliases 229 Allen, J. 98, 106, 109 Alphabet, of grammar 3, 46 Altham, J.E.J. 245, 259 Ambiguity 68, 208 Analyses alternative 74 compatible vs. incompatible 57 componential 96, 111 structural 48, 53; semantic vs. syntactic 67 syntactic 106 Anaphors 24 'and' 8 Aquinas, St Thomas 193 Argument bound vs. free in transformational grammar 24
in case-grammar 10 external (in transformational grammar) 11 internal (in transformational grammar) 11 Arguments invalidity of 48 minimal steps in 49 patterns 48 schema vs. pattern 49 valid 49 Arguments and logic 48 Aristotle 67, 128, 177, 182, 200, 202, 206, 207, 244 Arrow-signs 216 Article indefinite 145, 158 Assumption, in logical deduction 181 ATRANS 114 ATTEND 115 Attribute in semantic network 96 'aut' (exclusive disjunction) 286-7 Bach, E. 67 Bach-Peters propositions 31, 36, 161 Baker, M.C., Johnson, K., and Roberts, I. 22 Bar-Hillel, Y. 65, 69, 73 Bar-Hillel, Y., Gaifman, C., and Shamir, E. 90 Barriers to government 19 BASIC 237 Basic categories, see under Categories, basic Bearer of proper name 138, 199 Bedeutung in Frege 63 Bendix, E.H. 111 Benefactive role 10 Binding 140 ambiguity 158
in transformational grammar 24 Bloomfield, L. 3 Body attribute 112 name of kind 215 Newtonian 111, 184, 202 Body-pointers 213 Bourbaki, N. 79 Butler, Bishop 129 'by nature' 204
C-command 143 Calculating expression 55, 63 'Cambridge' change 113 Carlson, L. 264 Carroll, L. (= Dodgson, C.) 169, 256-8 Case-grammar 10 Case-roles 9-15 Categories basic 59, 168, 176-238; B (names of bodies) 202; D (deictic, demonstrative) 221; P (proposition) 176; count nouns 193; in Ajdukiewicz 64; pointers 211-38 difference entails meaning difference 75, 88 Fregean 58 governing 140 of identity 276 logic and 50 names 59, 64 of proper name 234 of proper name phrase 234 of quantifiers 239, 241 of quantifying phrases 239 in semantic networks 101 S(S(D(B))) 241 S(S(D)) 239 sub-, see under Sub-categorization symbol 47 type 172 Category name basic, on links 135 definition modified 74 Category symbol, in conceptual dependency 112 Causality 115 formal vs. efficient (Aristotle) 207 in Sowa's conceptual graphs 105 Causation (Schank) result vs. reason vs. enabling 118 unenabling 118 Cercone, N. 101
Change 121 Chomsky, A.N. 3, 17, 19, 34, 40, 41, 43, 169 Chomsky hierarchy (of string grammars) 17 Circuit, of graph 94 Co-indexing 23, 24 Coherence semantic 64 semantic vs. syntactic 57, 169 syntactic 69 Commanding c-command 19, 24, 37-8, 143 m-command 19 Complement 11 Complementizer 176 Componential analysis 96 Concept Fregean 63, 185 in semantic network 96, 98 Conceptual dependency basic notions 111 category symbol in 112 and ergonomics 119 first basic principle of 115 negation and connectives in 118 quantification in 120 Conjunction coordinating 8 lists 286 in semantic networks 100 Conklin, H.C. 111 Connectives alethic propositional 286 binary 149 in semantic network 96 in Sowa's conceptual graphs 105 Constituents, substitution test for 3 Converging scope, see under Scope, converging Cooperative effort 263 Coordinating conjunctions 8 Copula 205 Count noun, see under Nouns, count Cresswell, M.J. 89, 268, 279, 285 Davidson, D. xi, 12, 98, 278 De Morgan's laws 101 Deep structure 18 Defaults, frames as 129 Definite article, before names of bodies 226
Definite descriptions (Russell), see under Descriptions Definition, ostensive 219 Degree 46 Deictic expressions 137, 211 Descriptions (definite), Russell's theory of 149, 219, 255 Differentia 200 Digraph 94, 99, 135 Direction 113 ambiguity 218 Disjunction exclusive 286 in semantic networks 100 Dixon, R.M.W. 175 Donkey-sentences 36, 155 Dowty, D. 86, 89 Dummett, M.A.E. 76, 80, 89, 149, 181, 192, 194, 195, 200, 229 e (empty symbol) 5, 9 Edges directed 99 dotted 163 external, undirected 283 of graphs 94, 132 Empty Category Principle 37, 38 English, LP rules for 8 Evans, G. 41, 138, 245, 252, 260 Event-radical 116 Existence analogous to number 262 Existential graphs, Peirce's 79 Expansive form 46 EXPEL 114 Experiencer role 10, 11 Expression incomplete 71 parts of 71 Features 4 Fillmore, C.J. 10, 113, 127-9 Fitch, F.B. 181 Fodor, J.A., and Katz, J.J. 33 FOR operator 237 Forms of expression in a particular language xiii Fowler, H.W. 26, 154 Frames 122-30 as definitions 125 frame instance 124 sub-categorization 9, 11 Frege, G. xii, xiii, 54-9, 61-4, 69, 71, 73-5, 79, 80, 82, 84, 86, 88-91, 93, 97, 101,
131, 137, 140, 149, 150, 170, 171, 176, 195, 196, 199, 205, 212, 227, 231, 232, 243, 262, 272, 282 Frontier of tree 46 Function application of 55 vs. operator in Wittgenstein 56 Skolem 99 Functionals 88 Gaifman's theorem 90 Galton, A. 116 Geach, P.T. 80, 82, 83, 86, 88, 113, 138, 143, 144, 146, 155, 157, 183, 187, 193, 194, 195, 211, 214, 219, 232, 235, 249, 259, 268, 270, 272, 276, 279 Generality xii; see also under Quantification Generalized phrase-structure grammar, relative clauses in 33 Generic proposition, see under Proposition, generic Genesereth, M.R., and Nilsson, N.J. 129 Gestures with pointing 213, 216 Goal role 10, 11, 14, 113 Government 19 barriers to 19 governing category 24, 140 Grammar categorial 64-75 defined 1 difference between Fregean and Ajdukiewicz's 70 Fregean 91-2, 74, 131 graph xii, 92, 94-6, 285; categorial 167; sequential vs. parallel 96 phrase-structure 5-9 string xii, 1-45 transformational 15-33, 140, 158 tree xii, 46; and quantification 91 Graphs 92, 94-6 categorial 131-75 circuit 94 classification 95 conceptual 105 cube 94 edge 132 edge crossing 95 existential (Peirce) 105 hypographs 134-6 non-planar 161 path 94 plane vs. planar 94
root node 136 sub-graphs 95 toroidal 95, 152 triangular 94 GRASP 114 Gravity 122 Gupta, A. 194, 197 Haik, I. 158 Haim, I. 158 Harman, G. 146 Harris, Z. 3 Hayes, P.J. 122, 125 Head 7 Head-driven phrase-structure grammar passive construction 23 relative clauses 33 sub-categorization 15 Hendrix, G.G. 100, 101 'here' 214 Hintikka, J.K.K. 268, 272 Hornstein, N. 158 Huck, G.J. 67 Hypograph 134-6 Identity 237, 270-7 Dummett on Geach on 195 Fregean view of 270 and proper names 226 relative 276 Wittgenstein on 271 Implication in semantic networks 101 Incomplete expression 71 Index vs. argument 272 indexicals 211-12 Inference rules of 51 Infima species 234 INGEST 114 Instrument role 10, 13, 15, 114 Intensional contexts 285 Introduction and naming 225 'is' as identity-sign 226 Jackendoff, R. 27, 161 Jacobsen, B. 40 Jaeggli, O.A. 22 Kamp, H. 158 Katz, J.J., and Postal, P.M. 33 Kay, M. 98, 99, 102, 103 Keenan, E.L. 19, 219 Kenny, A.J.P. 12 Kripke, S. 227
Label vs. pointer 211 Labelled bracketing 5 Labelling 213 Lambda calculus 88 Lambek, J. 66, 86, 90 Lambek calculus 90 Language, everyday vs. technical, vs. natural xi, 209 Lappin, S. 158 Lasnik, H. 41, 138, 145 Leaf, of tree 46 Leibniz, G.W. 123 LET operator 236 Levels 81 higher: and sub-categorization 172; scope at 162-8 Levin, H.D. 83 Lewis, D. 76, 193, 197 Lexical insertion 9, 15 LF ('logical form') 34, 41, 106 Link-letters 80-1, 89, 92 Links 132, 135 LISP 149 semantic networks in 97 Lists 286 Location role 10, 11, 13, 114 Logic many-sorted 197 and meaning 48-54 propositional, classical 121 of propositions and of predicates 181 the study of arguments 48 and tree grammar 46 Logical form 48, 106, 107 Logical unit 251 Long, P. 116 Lyons, J. 111 Machines and language 172 Manifold 259 Martin, J.N. 74, 84 Matrices 286-7 May, R. 34, 40, 42, 158 MBUILD 114 McCawley, J.D. 92 McCloskey, J. 23 Meaning and logic, see under Logic of 'not' and propositional connectives 84 picture theory of 218 and structure xi Minsky, M. 122, 127
Mis-match 170 Modality, in semantic network 96 Modifier 7 Montague, R. 66, 88, 89, 92, 194, 220, 227 Morris, C.W. 212 'Most' 259 MOVE 114 MTRANS 114 Names 202 and bearer: one-to-one correspondence between 230 of bodies 224 category 64, 74 common nouns as (Geach) 155 and count nouns 232-8 FOR 236 of kinds of body, absolute vs. relational 273 OF 242 proper xiii, 59, 97, 199, 202, 212, 223; (in Frege) 59; vs. function 55, 71; and bearer of 138; change of 228; for a body 230; logician's model of 231; of a body 231; phrases 234; vs. demonstratives 227; vs. quantifying expressions 78; use vs. mention 270 singular vs. general, in Ajdukiewicz 64 Naming act of 214 ceremonies 230 relative clauses and 231 strong vs. weak sense 234 Nature basic vs. acquired 209 'by nature' 204 Negation and count nouns 182 notation for (Allen) 107 in semantic networks 100 in Sowa's conceptual graphs 105 Networks, see under Semantic networks Newman, J.H. 129 Nick-names 229 Node 94 Noms-de-plume 229 Non-terminals 3, 4, 46 Nonsense 69 and sense 168 Noonan, H. 232, 276 Notation Ajdukiewicz's 64 Bar-Hillel's 65
convention 84 dotted edges 163 linear 60, 65; vs. planar 61; for group 2b pronouns 148; for trees 60 link 79, 132 mathematical, for powers 70 for negation (Allen) 107 planar 60, 91 quantifier 76, 93 for quantifier scope (Allen) 107 (R4) modification 165 root node of graph 136 for schematic symbols 71 in semantic networks 103; ambiguity of 106 and structure xiii sub-categorization 173 Nouns abstract 178, 195 collective 266 common 194 concrete 195, 201 count xiii, 64, 97, 176; as basic category 193; basic vs. derivative (Dummett) 195; juxtaposed 153; and negation 182; and proper names 233 mass 178 phase-sortal 200 substance 200, 201, 233 substance vs. phase-sortal 200 vs. verb 177, 189 Null symbol 9 Number vs. numeral 55 Object 11, 113 Fregean 62, 183-4, 196 semantic role 10, 14 OF operator 237 Open sentence 80 Operand 54-63 structured 282-3 switching order in derivations 84 Operators 54-63, 218 FOR 237 FOR and OF, how related 239 LET 236 OF 237 PL 260 prefix, infix 60 propositional 286 suffix 77 'or' 8
Paraphrase 68 Parentheses, convention for closing 149 Parsing 45 Partition in semantic networks 100 Passive construction 15-24 in head-driven phrase structure grammar 23-4 and small clauses 20 syntactic vs. lexical 20 transformation rule 18 Path 10, 11 of graph 94 Path Containment Condition 38 Patient role 15, 17, 134 Peirce, C.S. 79 existential graphs 105 type/token distinction 96 Perry, J. 212 Peter of Spain (= Pope John XXI) 206 Phase-sortals 198 vs. substance-concepts 195 Phonetics 1 Phonology 1 Phrase marker 5 dominance 6 precedence 5 sisters 6 Place 214, 217; see also under Location role Plato 177 Pointers 211-38 category (D) of 223 diachronic/historical/proper names vs. synchronic 227 locative 212 vs. quantifying phrases 219 and scope 218 temporal 212 Pointing 213 Pointing phrases 211, 224 Polyadicity, variable 287 Postal, P.M. 3 Postal, P.M., and Pullum, G.K. 23 Potts, T.C. 75, 83, 86, 88, 105, 131, 272, 273 Pragmatics 212 Pre-terminal string 9 Predicate, logical 56 Predicative nature of count nouns (Frege) 178 Predicative uses of substance nouns 204 Prenex normal form 99, 100, 103, 104 Presuppositions 219
of meaning 123 of truth 150, 158, 245 Prevention 119 Prior, A.N. 87 Pro-verb 162 Productions 3, 5; see also under Rules adjunct rules 8 complement rules 7 for coordinating conjunctions 8 phrase structure 6 rule-schemas 7 specifier rules 7 in string grammars 17 transformation, see under Transformation rules in tree grammars 46 Pronominal 24 Pronouns 137-48, 246-7 anaphoric 25, 137, 246; groups 138; group 1 138; group 2a 141; group 2b (E-type) 138, 143, 251, 261 anaphors 24 demonstrative 211 of laziness 138, 142-3 non-anaphoric 223 personal 25-6, 140 plural 143 pronominal 25 reflexive 25, 140 Proof 51 PROPEL 114 Proper name, see under Name, proper Proper name phrase 234 Propositions xi, 176 arithmetical 262 'atomic' 199 generic 191, 201-10, 240 indefinite 191 schema 51 Proverbs as examples of generic propositions 206 Pseudo-baptism 225 PTRANS 114 Quantification 75-93, 102, 131, 239-46 in conceptual dependency 120 multiple, in transformational grammar 35 in semantic networks 98 and tree grammars 91 unrestricted 196 Quantifiers 111, 239 categorization (Lewis) 194, 202
category according to Evans 148 'just one' 144, 147, 160 most 259 non-numerical collective 268 non-standard 245 notation 80; in LUNAR (Woods) 108 plural 146, 156 plurative and numerical 259-70 second-order 164 in Sowa's conceptual graphs 105 Quantifying expressions/phrases 239 characterized 75 and converging scope 163 extraction from phrase markers 109 multiple 44, 106, 151 notation for 79 vs. proper name 78 and scope 132 scope of 78, 132 in transformational grammar 38 Quillian, M.R. 96 Quine, W.V.O. 79, 244 Quotation marks and names of bodies 226 Radford, A. 23, 35, 43 Rank 46 Rechnungsausdruck 55 Recipient 113 Reference 138 Reinhart, T. 158 Relative clauses 24, 26-32, 148-62, 247-58 added to pointing phrases 224 appositive 150, 247; vs. restrictive 26, 148, 161 defining 26 qualifying proper name 254 restrictive 153, 248 in transformational grammar 26 'which'/'that' convention 27 Replacement test for operand 57 Representation problem xi, xiii Rigid designators 227 Root, of tree 46 Rules; see also under Productions (A) (Ajdukiewicz's) 65, 81 combination vs. production 65 (D), (E) (Geach's) 82 Forward Partial Combination 82 for graph grammar 95 (If+) 180 (If-) (modus ponens) 179 of inference 51
parsing vs. generative 65 projection 98 QR 34-5, 38 quantifier 179-80, 243 (R1) 84 (R2) 84 (R3) 85, 87 (R4) 90 re-writing 46 semantic interpretation 98 thema 51, 53, 82, 84-6, 88, 90 transformation 18 Russell, B.A.W. 113, 149, 171, 189, 212, 219 'Same' and identity 272 'is the same B as' 228 misidentification 229 Schank, R.C. 111 Schema 49, 124 higher-level 81 key to schematic symbols in 50 level 81 minimal 53 vs. operator 71 parts 71 proposition 51, 70 second-level, degenerate 89 signifies function (Frege) 182 tensed 221 third-level 86 Schematic symbol 49 Schubert, L.K. 101, 104, 109, 111 Schubert, L.K., Goebel, R.G., and Cercone, N. 101 Scope 44, 98, 100, 101, 111, 131 ambiguity 220 back 133, 135, 148 converging 137-63, 246 defined 62 edges or inclusion links 103 forward 133, 135 at higher levels 162-8 immediate 62 of quantifying expression 78 in semantic networks 104 Scopeways 131-7 Scragg, G. 97 Selection restrictions 128 Semantic component of transformational grammar 32-45 Semantic field 111, 175
circularity in 121 Semantic grid 11, 13, 17 in phrase markers 15 Semantic interpretation 34, 98 Absorption 36 QR rule, see under Rules, QR Semantic mood 176 Semantic networks 96-111 ambiguity of notation 98 categories in 101 conceptual dependency 111-22 conjunction, disjunction and negation in 100 implication in 101 in LISP notation 97 modality 103 partitions in 100 and quantification 98 quantification in 102, 106 scope in 104, 106 Semantic representation, relation to languages 105 Semantic roles 9-15, 17, 105, 110, 113, 134; see also under individual roles characterization 12 number 10, 12 occurrence restrictions 14 and passive construction 19 Semantic structures of operands 282 parts relative 81 Semantic unit 251 Semantically marked argument in phrase marker 15 Semantics interpretative 33 possible-worlds 123 relation to syntax 42 Sense and nonsense 168 Sentence, open 80; see also under Propositions Sentence-patterns 4 Sets 185 category of proper names of 266 set relation in semantic network 96 Shallow structure 34 Shapiro, S.C. 109 Sight 214 Simmons, R.F. 97 Singular term 138 Sinn 185 Skolem function 99, 102, 124 SLASH 33
Slots in frames 123, 126 Small clause 20 Sommers, F. 177 Sorites 258 Sortals 195 Source role 10, 11, 14, 113 Sowa, J.F. 105 SPEAK 115 Square of opposition 243 Start symbol 3-5, 46 States end and initial 112 vs. actions 111 Steedman, M.J. 86 Strawson, P.F. 244 String grammars 1-45 characterized 17 Chomsky hierarchy 17 comparison with tree 47 context-free 17 context-sensitive 17 regular 17 unrestricted 17 Structural analysis, see under Analyses Structural sign 138 Structure deep 18 dimensionality of 95 and meaning xi and notation xiii and purpose 1 semantic 91 semantic vs. syntactic 67 shallow 34 surface 18 and syntax 1 of thoughts 111 and truth conditions xiii Sub-categorization 9, 168-75 in head-driven phrase structure grammar 23 for tense 190 Sub-graph 134 Sub-scripts and back scope 132 in category labels 277 Super-scripts, numerical, as scope indicators 131 Subject logical 56 and predicate 54, 56, 58, 62, 63, 137 Substance, names of kinds of 201 Substance-concepts vs. phase-sortals 195
Substantival terms (Geach) 193 Substitution restrictions, violations of 174 Suppositio 206 Surface structure 18 Syllogisms 180, 245, 256 Symbol, schematic 49 Syntactic analysis, see under Analyses Syntactic category 4 Adjective 4, 5 Complementizer 4 Determiner 5 Inflexion 4 intermediate 4 lexical 4, 8 and non-terminals 4 Noun 4 phrasal 4, 8 Preposition 4 Sentence 4 Verb 4 Syntactic structure 108 Syntax defined 1 and graphs 137
Temporal expressions xiii Tense 5, 189 and generic propositions 208 Terminals 3, 46 Themas, see under Rules, thema 'there' 214 Theta Criterion 14 'this' and 'that' 212 This what? 217 Thoughts of actions 111 Fregean 212 structure of 111 Time 10, 11 Time role 114 Token, in semantic network 96 Trace 22 Transformation rules 18 adjunction 19 passive 18 substitution 19 trace 22 Transformational grammar 3-33, 42-5 semantic component of 32-45 Translation 68 Tree 5, 92 defined 46 degree of node in 46 edge 5
frontier 46 leaf 5, 46 node 5 rank of node in 46 root 5, 46 start symbol 5 subtree 46 Tree grammars 46-8 defined 46 and logic 46 Trier, J. 175 Truth conditions 51 Type of an expression 172 Type-raising 86, 90 Typical description 126 Unit logical 251 semantic 251 Universals, problem of 181
Values, in frames 123 Van Benthem, J. 86, 88, 89 Variable 49 bound 79, 80 in Frege 80 Verb causal 115 intransitive 9, 178 vs. noun 177, 189 number of semantic roles accompanying 12 psychological 140 transitive 9 Wang, H. 197 Watts, I. 256 Weinreich, U. 33, 45 Wells, R.S. 3 Wiggins, D. 195, 198, 200 William of Sherwood 206 Williams, C.J.F. 270, 272, 274 Williams, E. 23 Wittgenstein, L. 45, 56, 67, 71, 76, 77, 84, 120, 122, 128, 129, 169, 185, 189, 199, 201, 212, 271, 272 Woods, W.A. 98-101, 108, 109, 111, 268 Work upon 56, 58, 77 and immediate scope 62 how signified 71 and trees 60 unclear in linear notation 61 Yourgrau, P. 212