JOURNAL OF SEMANTICS
AN INTERNATIONAL JOURNAL FOR THE INTERDISCIPLINARY STUDY OF THE SEMANTICS OF NATURAL LANGUAGE

MANAGING EDITOR: Peter Bosch (University of Osnabrück)
ASSOCIATE EDITORS: Nicholas Asher (University of Texas, Austin), Rob van der Sandt (University of Nijmegen)

EDITORIAL BOARD:
Manfred Bierwisch (MPG and Humboldt University Berlin)
Branimir Boguraev (IBM T.J. Watson Research Center)
Keith Brown (University of Essex)
Gennaro Chierchia (University of Milan)
Ann Copestake (University of Cambridge)
Östen Dahl (University of Stockholm)
Kees van Deemter (University of Brighton)
Paul Dekker (University of Amsterdam)
Kurt Eberle (linguatec-es, Heidelberg)
Regine Eckardt (University of Konstanz)
Claire Gardent (CNRS, Nancy)
Bart Geurts (University of Nijmegen)
Laurence R. Horn (Yale University)
Joachim Jacobs (University of Wuppertal)
Philip N. Johnson-Laird (Princeton University)
Hans Kamp (University of Stuttgart)
Graham Katz (University of Tübingen)
Sebastian Löbner (University of Düsseldorf)
Sir John Lyons (Verneuil-en-Bourbonnais)
Marc Moens (University of Edinburgh)
Francis J. Pelletier (University of Alberta)
Manfred Pinkal (University of Saarbrücken)
Arnim von Stechow (University of Tübingen)
Mark Steedman (University of Edinburgh)
Anatoli Strigin (ZAS, Berlin)
Henriette de Swart (University of Utrecht)
Bonnie Webber (University of Edinburgh)
Henk Zeevat (University of Amsterdam)
Thomas E. Zimmermann (University of Frankfurt)
EDITORIAL ADDRESS: Journal of Semantics, c/o Dr P. Bosch, Lerchenstr. 76, 70176 Stuttgart, Germany. Phone: (49-711-) 2262616. Telefax: (49-711-) 2262614. Email: [email protected]
© Oxford University Press 2000
All rights reserved; no part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise without either the prior written permission of the Publishers, or a licence permitting restricted copying issued in the UK by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1P 9HE, or in the USA by the Copyright Clearance Center, 222 Rosewood Drive, Danvers, Massachusetts 01923, USA
Journal of Semantics (ISSN 0167-5133) is published quarterly in February, May, August and November by Oxford University Press, Oxford, UK. Annual subscription is US$173 per year. Journal of Semantics is distributed by MAIL America, 2323 Randolph Avenue, Avenel, New Jersey 07001, USA. Periodical postage paid at Rahway, New Jersey, USA and at additional entry points.
US POSTMASTER: send address corrections to Journal of Semantics, c/o MAIL America, 2323 Randolph Avenue, Avenel, New Jersey 07001, USA.
For subscription information please see inside back cover.
JOURNAL OF SEMANTICS
Volume 17 Number 3

Special Issue on Optimization of Interpretation (Part I)
Guest Editors: Petra Hendriks, Henriette de Swart and Helen de Hoop

CONTENTS

PETRA HENDRIKS, HENRIETTE DE SWART AND HELEN DE HOOP
Introduction

REINHARD BLUTNER
Some Aspects of Optimality in Natural Language Interpretation

PAUL DEKKER AND ROBERT VAN ROOY
Bi-Directional Optimality Theory: An Application of Game Theory

HENK ZEEVAT
The Asymmetry of Optimality Theoretic Syntax and Semantics

(Part II to follow in vol. 17.4)

Please visit the journal's world wide web site at http://jos.oupjournals.org and the editorial web site at http://journal-of-semantics.org
Subscriptions: The Journal of Semantics is published quarterly.
Institutional: UK and Europe £99; USA and Rest of World US$173. (Single issues: UK and Europe £31; USA and Rest of World US$54.)
Personal:* UK and Europe £42.50; USA and Rest of World US$79. (Single issue: UK and Europe £13; USA and Rest of World US$25.)
*Personal rates apply only when copies are sent to a private address and payment is made by personal cheque/credit card.
Prices include postage by surface mail or, for subscribers in the USA and Canada, by Airfreight or, in Japan, Australia, New Zealand and India, by Air Speeded Post. Airmail rates are available on request.
Back Issues. The current plus two back volumes are available from the Oxford University Press, Great Clarendon Street, Oxford OX2 6DP. Previous volumes can be obtained from Dawsons Back Issues, Cannon House, Park Farm Road, Folkestone, Kent CT19 5EE, tel +44 (0)1303 850101, fax +44 (0)1303 850440. Volumes 1-6 are available from Swets and Zeitlinger, PO Box 830, 2160 SZ Lisse, The Netherlands.
Payment is required with all orders and subscriptions are accepted and entered by the volume. Payment may be made by cheque or Eurocheque (made payable to Oxford University Press), National Girobank (account 500 1056), Credit cards (Access, Visa, American Express, Diners Club), or UNESCO coupons.
Please send orders and requests for sample copies to the Journals Subscriptions Department, Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, UK, tel +44 (0)1865 267907, fax +44 (0)1865 267485, [email protected].
Scope of this Journal
The Journal of Semantics publishes articles, notes, discussions, and book reviews in the area of academic research into the semantics of natural language. It is explicitly interdisciplinary, in that it aims at an integration of philosophical, psychological, and linguistic semantics as well as semantic work done in logic, artificial intelligence, and anthropology. Contributions must be of good quality (to be judged by at least two referees) and must report original research relating to questions of comprehension and interpretation of sentences, texts, or discourse in natural language. The editors welcome not only papers that cross traditional discipline boundaries, but also more specialized contributions, provided they are accessible to and interesting for a general readership in the field of natural language semantics. Empirical relevance, sound theoretic foundation, and formal as well as methodological correctness by currently accepted academic standards are the central criteria of acceptance for publication. It is also required of contributions published in the Journal that they link up with currently relevant discussions in the field of natural language semantics.
Information for Authors: Papers for publication should be submitted to the Managing Editor ([email protected]) as a PDF file or PS file attachment. Only if this is not feasible, please send three paper copies by post to the editorial address and, if possible, enclose a DOS-formatted 3.5 inch disk with a PDF or PS file, or text processing source file. Papers are accepted for review only on the condition that they have neither as a whole, nor in part, been published elsewhere, are elsewhere under review, or have been accepted for publication. In case of any doubt authors must notify the editor of the relevant circumstances at the time of submission. The style requirements of the Journal of Semantics are found in the style sheet http://journal-of-semantics.org/style.html and are binding for the final version to be prepared by the author when the paper is accepted for publication. For initial submission it suffices if the following minimal requirements are met. The page size should be A4 (or similar format). The paper must be headed by its title and must carry the name and affiliation of the author along with the author's correspondence address (post and email) at the end of the text. All submissions must be accompanied by an approx. 200 word abstract. Detailed bibliographical references must appear at the end of the paper in alphabetical order of authors' names, abbreviated in the text by author's surname and year of publication. Diagrams must be submitted in electronic files or camera-ready on paper.
Copyright: It is a condition of publication in the Journal that authors assign copyright to Oxford University Press. This ensures that requests from third parties to reproduce articles are handled efficiently and consistently and will also allow the article to be as widely disseminated as possible. In assigning copyright, authors may use their own material in other publications provided that the Journal is acknowledged as the original place of publication, and Oxford University Press is notified in writing and in advance.
Advertising: Advertisements are welcome and rates will be quoted on request. Enquiries should be addressed to Helen Pearson, Oxford Journals Advertising, PO Box 347, Abingdon SO, OX14 sXX, UK. Tel/fax: +44 (0)1235 201904, [email protected].
Journal of Semantics 17: 185-187
© Oxford University Press 2000
Guest Editors' Introduction PETRA HENDRIKS, HELEN DE HOOP, AND HENRIETTE DE SWART
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
Optimality Theory (OT) was developed in the 1990s by Alan Prince and Paul Smolensky as a general theory of language and grammar. Crucial for OT is Smolensky's idea of identifying a connectionist notion of well-formedness (Harmony) with linguistic well-formedness. In OT a grammar consists of a set of well-formedness constraints. These constraints apply to representations of linguistic structures simultaneously. Moreover, they are soft, which means violable and potentially conflicting. At least an important subset of these constraints is assumed to be shared by all languages. Individual languages rank these universal constraints differently, in such a way that higher-ranked constraints have total dominance over lower-ranked constraints. Possible output candidates for each underlying form are evaluated by means of these constraint rankings. The output that best satisfies the constraints is the optimal candidate and will be realized.
Although OT was applied to semantic and pragmatic analysis only recently, the last two years have shown a remarkable growth in the use of soft, conflicting constraints to characterize natural language interpretation. In the OT semantic theory developed by Hendriks & De Hoop (1997, to appear), each grammatical expression is associated with an, in principle, infinite number of interpretations. These candidate interpretations are tested against the ranked constraints in a parallel fashion. One of the advantages of such an approach is that constraints of various kinds (syntactic, pragmatic, etc.) interact with each other in a truly cross-modular way. This view crucially differs from the classical compositional approach, where one interpretation is computed on the basis of the syntactic input, making use of context only when necessary.
One aspect that receives a lot of attention in this special issue is the adequate treatment of the roles of the speaker's perspective (generation) and the hearer's perspective (comprehension). Whereas OT syntax optimizes syntactic structure with respect to a semantic input (one might say that OT syntax takes the perspective of the speaker, who has a certain thought and wants to express this correctly and optimally through a syntactic structure), OT semantics, on the other hand, takes the point of view of a hearer, who hears (or reads) a certain utterance and wants to interpret it correctly and optimally.
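The evaluation scheme summarized above, with ranked violable constraints and the two perspectives of speaker and hearer, can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the candidate set `GEN`, the constraints `high` and `low`, and the violation counts are hypothetical and are not taken from any of the papers in this issue. Strict domination is modelled by comparing violation profiles lexicographically.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]              # (form, meaning)
Constraint = Callable[[Pair], int]  # returns a number of violations


def violation_profile(pair: Pair, ranking: Sequence[Constraint]) -> Tuple[int, ...]:
    """Violation counts, ordered from the highest-ranked constraint down."""
    return tuple(constraint(pair) for constraint in ranking)


def optimal(candidates: List[Pair], ranking: Sequence[Constraint]) -> List[Pair]:
    """Return the candidates with the lexicographically smallest profile.

    Lexicographic comparison of tuples models strict domination: a single
    violation of a higher-ranked constraint outweighs any number of
    violations of lower-ranked constraints.
    """
    if not candidates:
        return []
    best = min(violation_profile(p, ranking) for p in candidates)
    return [p for p in candidates if violation_profile(p, ranking) == best]


def produce(meaning: str, gen: List[Pair], ranking: Sequence[Constraint]) -> List[Pair]:
    """Speaker's perspective: fix the meaning, optimize over forms."""
    return optimal([p for p in gen if p[1] == meaning], ranking)


def interpret(form: str, gen: List[Pair], ranking: Sequence[Constraint]) -> List[Pair]:
    """Hearer's perspective: fix the form, optimize over meanings."""
    return optimal([p for p in gen if p[0] == form], ranking)


# Hypothetical toy data: two forms, two meanings, two ranked constraints.
GEN: List[Pair] = [("f1", "m1"), ("f1", "m2"), ("f2", "m1"), ("f2", "m2")]


def high(pair: Pair) -> int:
    """Hypothetical higher-ranked constraint: penalizes pairing f1 with m2."""
    return 1 if pair == ("f1", "m2") else 0


def low(pair: Pair) -> int:
    """Hypothetical lower-ranked constraint: the form f2 costs three violations."""
    return 3 if pair[0] == "f2" else 0


RANKING = [high, low]

print(interpret("f1", GEN, RANKING))
print(produce("m2", GEN, RANKING))
```

In this toy setting the hearer interprets `f1` as `m1`, while the speaker expresses `m2` with `f2` even though that pair incurs three violations of `low`, because its only competitor violates the higher-ranked `high`. The two functions are evaluated independently here; how the two directions should be related is the question taken up by the bi-directional proposals in this issue.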
Several papers in this special issue argue in favour of a bi-directional OT, where the speaker's and hearer's perspectives are taken simultaneously. Reinhard Blutner establishes a conceptual framework that realizes the integration of the two perspectives. A bi-directional approach explains interpretative preferences that are problematic from the speaker's point of view as well as blocking effects that cannot be explained from the hearer's perspective. Blutner argues that his bi-directional framework captures the essence of the Gricean maxims and the balance between informativeness and efficiency in natural language processing.
Henk Zeevat argues for a slightly different combination of syntax and semantics that avoids certain problems of Blutner's bi-directionality. In Zeevat's view, OT syntax is the basic framework, which also deals with interpretation. This program is extended with a bi-directional pragmatic component in the spirit of Blutner. The resulting asymmetry between OT syntax and OT semantics is consistent with the vast differences between what people can say and what they can understand.
Another case of syntax/semantics interaction is the Finnish partitive construction discussed by Arto Anttila and Vivienne Fong. This construction exhibits a case alternation that is partly semantically and partly syntactically driven. The crucial syntactic and semantic constraints conflict with each other, leading to various kinds of outcomes, including free variation and ambiguity, as well as preferences in expression and preferences in interpretation. An OT analysis of these facts is developed based on partially ordered grammars. Partial ordering is argued to be crucial in deriving ambiguity and blocking effects.
An important question in OT semantics is whether we can account for cross-linguistic variation in interpretation as a result of different rankings among the different types of constraints that relate form and meaning.
Alice ter Meulen accounts for differences in reflexivization strategies of Dutch and English by supplementing binding principles applied to Dutch reflexives with optimality considerations and a general principle of linguistic economy. Dutch SE-reflexives optimally encode coreference in contrast to English ordinary bound pronouns. The framework of OT naturally suggests itself for dealing with a wide range of problems in semantics and pragmatics, according to Bart Geurts. Geurts' paper can be viewed as a reply to the OT treatments of presupposition proposed by Blutner and Zeevat. Geurts compares the Informativeness Principle (which states that more informative readings are preferred to less informative ones) to his own Buoyancy Principle (which states that backgrounded material tends to float up to the main context) and concludes in favour of the BP.
The papers in this issue share the goal of elucidating the processes of natural language interpretation, but the theoretical perspectives differ from one another. In the paper by Paul Dekker and Robert van Rooy some parallels are pointed out between principles employed in OT interpretation and notions from the field of Game Theory. OT interpretation is defined as what Dekker and Van Rooy call an 'interpretation game', and optimality itself is the solution concept for a game. More specifically, optimality is characterized in terms of the game-theoretical notion of a 'Nash Equilibrium'.
We hope that the present collection of papers will bring the project of OT semantics to the attention of a broad linguistic community. The papers in this issue represent some of the major developments in OT semantics and they will hopefully form a basis for future research in this exciting new field.

Acknowledgements
Helen de Hoop gratefully acknowledges support by the Netherlands Organization for Scientific Research, NWO (grant 300-75-020).

HELEN DE HOOP
Rijksuniversiteit Utrecht
Trans 10
3512 JK Utrecht
The Netherlands
email: [email protected]

REFERENCES
Hendriks, Petra & Hoop, Helen de (1997), 'On the interpretation of semantic relations in the absence of syntactic structure', in P. Dekker, M. Stokhof, & Y. Venema (eds), The Proceedings of the 11th Amsterdam Colloquium, ILLC/Department of Philosophy, Amsterdam, 157-62.
Hendriks, Petra & Hoop, Helen de (to appear), 'Optimality theoretic semantics', Linguistics and Philosophy.
Journal of Semantics 17: 189-216
© Oxford University Press 2000

Some Aspects of Optimality in Natural Language Interpretation
REINHARD BLUTNER
Humboldt University Berlin

Abstract
In a series of papers, Petra Hendriks, Helen de Hoop, and Henriette de Swart have applied optimality theory (OT) to semantics. These authors argue that there is a fundamental difference between the form of OT as used in syntax on the one hand and its form as used in semantics on the other hand. Whereas in the first case OT takes the point of view of the speaker, in the second case the point of view of the hearer is taken. The aim of this paper is to argue that the proper treatment of OT in natural language interpretation has to take both perspectives at the same time. A conceptual framework is established that realizes the integration of both perspectives. It will be argued that this framework captures the essence of the Gricean maxims and gives a precise explication of Atlas & Levinson's (1981) idea of balancing between informativeness and efficiency in natural language processing. The ideas are then applied to resolve some puzzles in natural language interpretation.

1 INTRODUCTION
The popularity of Optimality Theory (OT) is notably different in the various fields of linguistics. In phonology it has become the dominant theoretical paradigm. The main reason that OT grew so rapidly in this field is that constraint ranking was silently present in the phonological literature for many years. After the idea was brought from the periphery to the foreground its need in phonology was quite clear. In syntax, the predominant research tradition has given typically negative answers to the question whether a conflict between constraints is resolved by ranking one constraint over the other. Constraints were assumed to be hard and there is ample evidence that conflicts block the existence of any acceptable output (cf. the discussion in Pesetsky 1997). The recent interest in OT syntax is obvious in the investigation of some non-standard phenomena, especially concerning the interaction between syntax, pronunciation and reference (e.g. Pesetsky 1997). Other motivation came from language typology and from the view that the parser and the grammar are not very different objects. Furthermore, a closer look at the 'absolute' principles has made clear that their violability is actually quite widespread (Speas 1997).
In natural language interpretation the idea of optimization is quite
obvious and there is much evidence in favour of competition and constraint ranking in this field. However, the field is rather divergent. Looking at the different conceptions of discourse coherence gives an impression of the heterogeneity of the field. What is essential is a kind of integrative framework that makes it possible to formulate the different conceptions in one scientific language and thus to make comparisons between different models transparent. In my opinion, OT is an opportunity for realizing such an integrative framework. However, in its present form OT is insufficient to do this job. So, what we have to do first is to adjust OT to the specific demands of natural language interpretation. Then we can come back to the task of integrating different aspects and different views of natural language interpretation. In OT it is common to assume three formal components: the Generator, the Evaluator and a system of (ranked) Constraints. These components are characterized by three basic assumptions. First, a set of inputs A is assumed. For each input, Gen creates a candidate set of potential outputs B. The second assumption is that from the candidate set Eval selects the optimal output for that input. The third assumption is that there is a language particular ranking of constraints from a universal set of constraints. Constraints are absolute and the ranking of the constraints is strict in the sense that outputs that have at least one violation of a higher-ranked constraint can never win over outputs that have arbitrarily many violations of lower-ranked constraints (cf. Prince & Smolensky 1993; Kager 1999). Each of these three assumptions has to be adjusted or revised in order to satisfy the demands of natural language interpretation. With respect to Gen, I think, it is best to take a dynamic picture of natural language semantics and to describe it in terms of a context change semantics. 
This adjustment is especially important in order to deal with the context dependency of natural language interpretation (e.g. Kamp & Reyle 1993).
Next, consider Eval. The direction of optimization is usually taken to be unidirectional (from A to B, where the elements of A are sometimes called inputs and the elements of B outputs). One of my main arguments is that in the case of interpretation bidirectional optimization is inevitable (from A to B and from B to A). The two directions are not independent of each other; instead, they should be interrelated in a particular way.
Third, with regard to Con we have to acknowledge the role of graded constraints. Graded constraints also appear in other domains, for example in phonology (cf. Prince & Smolensky 1993; Boersma 1998). However, in natural language interpretation the role of graded constraints seems to be much more important than in other domains. Another point is that in natural language interpretation the relevant pragmatic constraints are always ranked universally within the set of pragmatic constraints. As a consequence, typological differences between languages are not triggered by a reranking of the constraints within the pragmatic domain. Instead, typological effects are triggered, among other things, by variations that concern the relative importance of pragmatic constraints with regard to other types of constraints. Choi (1996) supports this point in an indirect way by comparing scrambling phenomena in German and Korean.
The paper is structured as follows. In section 2 some arguments are put forward as to why bidirection of optimization is of central importance when we try to apply OT to natural language interpretation. Section 3 introduces my proposals for a proper treatment of optimality in natural language interpretation. The starting point is the context change potential of an (underspecified) expression, which is described as a relation between input and output contexts. The effect of optimality is simply to constrain this relationship in a way that involves both optimization for interpretation and optimization for production. In section 4 the general framework is put in concrete terms by modelling contexts as DRSs. It is demonstrated that van der Sandt's/Geurts' projection mechanism for presuppositions can be reconstructed and extended as a consequence of the present form of OT.

2 TWO PERSPECTIVES OF OPTIMALITY
De Hoop & de Swart (1998), Hendriks & de Hoop (to appear), and de Hoop (2000) applied OT to sentence interpretation. These authors argue that there is a fundamental difference between the form of OT as used in syntax on the one hand and its form as used in semantics on the other. Whereas in the former case OT takes the point of view of the speaker (production perspective), in the latter case the point of view of the hearer is taken (comprehension perspective).1 This idea is an important one and I think most of the existing analyses conform to it.

1 By using the terms 'comprehension' and 'production' we do not refer to performance but rather to abstract functions in a mathematical sense that pair certain pairs of representations (cf. Smolensky 1996).

Moreover, the picture can be extended to OT phonology and morphology. For example, in phonology Gen clearly takes the production perspective and creates a candidate set of potential outputs (= speech sounds as they occur in utterances) for a given input (= speech sounds as they occur in the mental lexicon). From the candidate set, Eval selects the best (optimal) output for that input. A similar picture can be found in OT morphology (e.g. Bresnan, to appear). Here the input
represents language-independent 'content' in the multidimensional space of possible grammatical and lexical contrasts and Gen enumerates a set of concrete realizations of the input that are available across languages (expressing the 'content' with varying fidelity).
However, the one-way tableau typically assumed in phonology may be insufficient. One reason for this shortcoming has to do with the nature of the input under OT. In contrast to standard generative phonology, where numerous constraints were imposed on the input, in OT constraints on the input are typically lacking. In principle, the set of inputs to the grammars of all languages is assumed to be the same (richness of the base). As a consequence, in many cases it is easy to construct multiple inputs that converge on a single output. Which of the multiple inputs should be selected? This question is important when we assume that the relevant inputs must be stored somewhere in the mental lexicon. The economy of the lexicon requires that corresponding inputs must be selected carefully. Prince & Smolensky (1993: section 9) introduced an algorithm called lexicon optimization (further developed by Ito, Mester, & Padgett 1995) which optimizes the inputs. The algorithm examines the constraint violations incurred by the winning output candidate corresponding to each competing input. The input-output pair with the fewest violations is selected as the optimal pair. Thus, lexicon optimization works from the inputs A to the outputs B and back from B to A. As a consequence, the 'input' set A is restricted in an indirect way, by means of the system of ranked constraints and the possible outputs.
OT syntax is another case where the production perspective is taken exclusively. It optimizes syntactic structure with respect to a semantic input. Now we have to notice human sentence parsing as a related area in which optimality has always been assumed.
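Returning briefly to lexicon optimization as summarized above, the following Python sketch illustrates that summary only; it is not Prince & Smolensky's formulation. The inputs `A` and `B`, the outputs `a` and `b`, and the two constraints are hypothetical, and 'fewest violations' is read here, as an assumption, as the lexicographically smallest violation profile under the ranking.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Constraint = Callable[[str, str], int]  # (input, output) -> number of violations


def profile(inp: str, out: str, ranking: Sequence[Constraint]) -> Tuple[int, ...]:
    """Violation counts of an input-output pair, highest-ranked constraint first."""
    return tuple(constraint(inp, out) for constraint in ranking)


def winning_output(inp: str, outputs: List[str], ranking: Sequence[Constraint]) -> str:
    """Forward direction (A to B): the optimal output for a given input."""
    return min(outputs, key=lambda out: profile(inp, out, ranking))


def optimize_lexicon(target: str, inputs: List[str], outputs: List[str],
                     ranking: Sequence[Constraint]) -> Optional[str]:
    """Backward direction (B to A): choose among the inputs converging on `target`.

    Only inputs whose winning output is `target` compete; the one whose
    pairing with `target` has the smallest profile is selected.
    """
    converging = [i for i in inputs if winning_output(i, outputs, ranking) == target]
    if not converging:
        return None
    return min(converging, key=lambda i: profile(i, target, ranking))


# Hypothetical toy grammar: a markedness constraint outranks a faithfulness constraint.
def no_b(inp: str, out: str) -> int:
    """Hypothetical markedness constraint: the output 'b' is penalized."""
    return 1 if out == "b" else 0


def faith(inp: str, out: str) -> int:
    """Hypothetical faithfulness constraint: the output should match the input."""
    return 0 if inp.lower() == out else 1


INPUTS = ["A", "B"]
OUTPUTS = ["a", "b"]
RANKING = [no_b, faith]

print(winning_output("A", OUTPUTS, RANKING), winning_output("B", OUTPUTS, RANKING))
print(optimize_lexicon("a", INPUTS, OUTPUTS, RANKING))
```

In this toy grammar both inputs converge on the output `a`, and the sketch selects `A` as the stored input because the pair (`A`, `a`) incurs no violations, whereas (`B`, `a`) violates the faithfulness constraint. No input is selected for `b`, since no input has `b` as its winning output; this is the indirect restriction of the input set by the ranked constraints and the possible outputs.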
According to the nature of parsing, in this case the comprehension perspective comes in. Consequently, the parser optimizes underlying structures with respect to a surface input. Gibson & Broihier (1998) and Fanselow, Schlesewsky, Cavar, & Kliegl (1999) have shown that parsing preferences can be explained in this way. Furthermore, Fanselow, Schlesewsky, Cavar, & Kliegl (1999) have tried to demonstrate that the same constraints seem to be used both in OT syntax and parsing. If this is right, it demonstrates that both directions of optimization are relevant. OT syntax normally ignores the phenomenon of syntactic ambiguities and does not try to explain the preferences for the different readings that suggest themselves (cases in point are quantifier scope and PP attachment). I see it as an opportunity for OT syntax to explain the relevant preferences with the help of syntactic constraints, which are motivated independently. If we consider optimality under the production perspective exclusively, we lose this opportunity to give a syntactic explanation for the preferences. This does not exclude the relevance of pragmatic factors that arguably interact with the syntactic factors.
Now let us address natural language interpretation. Ambiguity, polysemy, and other forms of flexibility are much more obvious in this area, and manifest themselves much more broadly, than in the realm of syntax. The assumption that OT in sentence interpretation takes the point of view of the hearer is mainly motivated by this observation and the aim to explain the interpretive preferences. Using this perspective a mechanism for preferred interpretations is constituted that provides insights into different phenomena of interpretation, such as the determination of quantificational structure (Hendriks & de Hoop, to appear), nominal and temporal anaphorization (de Hoop & de Swart 1998), and the interpretational effects of scrambling (de Hoop 2000). However, I think there are reasons to regard this design of OT as inappropriate and too weak in a number of cases. The reasons have to do with the fact that Gen can pair different forms with one and the same interpretation. The existence of such alternative forms may raise blocking effects that strongly affect what is selected as the preferred interpretation.
It is not difficult to see that the arguments for a bidirectional view in syntax and the arguments for a bidirectional view in interpretation are complementary. In the case of syntax, we cannot explain interpretative preferences when we take the production perspective alone. In the case of semantics/pragmatics we cannot explain blocking effects when we take the comprehension perspective alone.
Blocking effects are essential for the explanation of pragmatic anomalies. This may be illustrated with an example. Consider the well-known phenomenon of 'conceptual grinding', whereby ordinary count nouns acquire a mass noun reading denoting the stuff the individual objects are made of, as in 'Fish is on the table' or 'Dog is all over the street'. One of the essential factors that restrict the grinding mechanism is lexical blocking. For example, in English the specialized mass terms pork, beef, wood usually block the grinding mechanism in connection with the count nouns pig, cow, tree. This explains the contrasts given in (1).

(1) a. I ate pork/?pig
    b. Some persons are forbidden to eat beef/?cow
    c. The table is made of wood/?tree

Blocking effects need not be absolute. Instead, they may be cancelled under special contextual conditions. Nunberg & Zaenen (1992) give the following example of what they call deblocking:

(2) Hindus are forbidden to eat cow/?beef
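The contrast in (1) and its reversal in (2) can be pictured with a small Python sketch. It assumes, as one simple reading of lexical blocking, that a form is blocked for an interpretation whenever an alternative form expresses the same interpretation at lower cost. The meaning labels and all cost values below are hypothetical, chosen only to reproduce the pattern of the examples; this is a sketch under those assumptions and not a formal definition of bidirectional optimality.

```python
from typing import Dict, List, Tuple

# (form, meaning) -> hypothetical cost of expressing the meaning with the form
Costs = Dict[Tuple[str, str], float]


def is_blocked(form: str, meaning: str, costs: Costs) -> bool:
    """True if some alternative form expresses `meaning` more cheaply."""
    own_cost = costs[(form, meaning)]
    return any(
        other_meaning == meaning and other_form != form and cost < own_cost
        for (other_form, other_meaning), cost in costs.items()
    )


def available_readings(form: str, costs: Costs) -> List[str]:
    """Meanings that Gen pairs with `form` and that are not blocked."""
    return [m for (f, m) in costs if f == form and not is_blocked(f, m, costs)]


# Hypothetical default context: the specialized mass term is the cheaper form.
DEFAULT: Costs = {
    ("beef", "meat of cow"): 1.0,
    ("cow", "meat of cow"): 2.0,
    ("cow", "animal"): 1.0,
}

# Hypothetical special context, as in (2): the cost values are reversed.
SPECIAL: Costs = {
    ("beef", "meat of cow"): 2.0,
    ("cow", "meat of cow"): 1.0,
    ("cow", "animal"): 1.0,
}

print(available_readings("cow", DEFAULT))
print(available_readings("cow", SPECIAL))
print(available_readings("beef", SPECIAL))
```

With the default costs the ground reading of 'cow' is blocked by 'beef', so only the animal reading survives. With the reversed costs 'cow' regains the ground reading and 'beef' is the blocked form, mirroring the pattern in (2). Note that the check compares forms for a fixed interpretation, which is exactly the production perspective that a purely hearer-oriented evaluation leaves out.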
Nunberg & Zaenen argue that what makes beef odd in (2) is that the interdiction concerns the status of the animal as a whole, and not simply its meat. That is, Hindus are forbidden to eat beef only because it is cow-stuff. Copestake & Briscoe (1995) provide further examples that substantiate this claim.
The simplest explanation for blocking (and also deblocking) is a bidirectional OT that takes into account the production perspective. An expression is blocked with regard to a certain interpretation if this interpretation can be generated more economically by an alternative expression. Linguistic and contextual factors can trigger deblocking in case they reverse the corresponding cost values (cf. Copestake & Briscoe 1995; Blutner 1998).
The binding behaviour of pronominal expressions gives another illustration of the importance of blocking in natural language interpretation.

(3) a. John_i washes himself_i
    b. *John_i washes him_i
    c. John_i expected Mary to wash him_i

In (3b) the coreferential reading is impossible because this interpretation is blocked by the form (3a), which is assumed to be more cheaply generated (because of a weak constraint saying 'bound NPs are marked reflexive'). In (3c) this blocking effect is cancelled out by a higher-ranked constraint 'A reflexive must be bound locally' (Burzio 1998). The version of (3c) with a reflexive will now be taken to violate this constraint, while the one with the pronoun only violates the lower-ranked constraint 'bound NPs are marked reflexive', thus representing the optimal candidate.
Appreciating the basic findings of Petra Hendriks, Helen de Hoop and Henriette de Swart concerning the selection among interpretations, the conclusion can only be that we have to consider bidirectional optimization. This appears to be almost a conceptual necessity.

Figure 1 Syntax and semantics as the two directions of bidirectional OT (diagram labels: SYNTAX, syntactic representation, semantic representation, SEMANTICS)

A careful argument in favour of bidirectionality has to take into account the important distinction between a semantic representation (= formal meaning) and an interpretation (content). If we identify semantic representations and interpretational content, then we simply have to state that a bidirectional OT is established by combining OT syntax and OT semantics
Reinhard Blutner
[Figure 2. The two directions of optimization in a model without bidirection. Labels: SYNTAX, PRAGMASEMANTICS; nodes: semantic representation, syntactic representation, interpretation.]

[Figure 3. A model with two modes of bidirection. Labels: SYNTAX, SEMANTICS, PRAGMATICS; nodes: syntactic representation, semantic representation, interpretation.]
(see Figure 1). OT semantics takes syntactic representations as inputs and yields optimal semantic outputs, and OT syntax takes semantic representations as inputs and yields optimal syntactic outputs. To say that we need bidirectionality is then simply to say that we need OT syntax and OT semantics. Presumably, this is the view taken by the pioneers of OT semantics. Several schools of linguistics regard the distinction between formal meaning and interpretational content as a very important issue. For example, Bierwisch (1983, 1996) proposed his two-level semantics, Carston (1998) made a similar point from the perspective of relevance theory, and many people in computational linguistics draw a related distinction based on the idea of underspecification (e.g. van Deemter & Peters 1996). Assuming this distinction could lead us to an architecture combining the ideas of optimal production and optimal interpretation in a way that does not make use of bidirection (Figure 2). It is not difficult to see that this architecture is unable to explain the blocking of interpretations in the general case. It describes the blocking of interpretations only in those cases where the corresponding semantic representations are blocked. The example of 'conceptual grinding' and other phenomena within the realm of lexical pragmatics (cf. Blutner 1998) suggest that one and the same semantic representation may be connected with a variety of different interpretations. Nevertheless, certain interpretations can be blocked without blocking the corresponding semantic representations. It is not difficult to suggest an architecture that does not suffer from these shortcomings. It is shown in Figure 3. Here we have to consider two modes of bidirection: one for relating syntactic and semantic representations and one for relating semantic representations and interpretations. It goes without saying that this architecture does not really conflict with the
Some Aspects of Optimality in Natural Language Interpretation
2 Another nice example where a bidirectional competition technique can help to explain empirical generalizations is discussed by Lee (2000). Based on the constraints assumed by Choi (1996), Lee shows that a bidirectional model can explain some types of 'freezing effects' concerning word order in German and Korean (looking at sentences with ambiguous case marking). For further examples and references, see Kuhn (2000) and the web page of Bresnan: http://www.lfg.stanford.edu/lfg/bresnan/.
ideas of the pioneers of OT semantics. Instead it broadens their view in a straightforward way. Not surprisingly, it is sometimes rather unclear which phenomenon should be treated within which mode of bidirection. Consider the case of binding phenomena. Building on Burzio (1989), Colin Wilson (1998) develops a theory of anaphora incorporating two types of competition. Assuming the interface between syntax and semantics to have a particular 'direction', Wilson takes both directions into account: the direction that maps from semantic structures to syntactic ones and the opposite direction that maps from syntactic structures to semantic ones. Clearly, Wilson's account refers to the mode of bidirection shown on the left-hand side of Figure 3. In contrast, there is Levinson's pragmatic theory of anaphora (e.g. Levinson 1987), which can be seen as operating in the pragmatic mode of bidirection (right-hand side of Figure 3). It is not the aim of this paper to judge which decision is the better one. Independent of the position we take with regard to the distinction between meaning and interpretation, the advantage of the bidirectional view becomes clear now: it integrates interpretational preferences and blocking effects and it keeps OT simple: 'What is best expressed as a generation principle is expressed as a generation principle, what is best expressed as an interpretation principle is expressed as an interpretation principle' (Zeevat, this volume). Under the present perspective of integrating production and comprehension optimality we can account both for ineffability and for pragmatic anomaly. The first case occurs when the optimal production can be triggered more efficiently by an alternative interpretational input. The second case occurs when the optimal interpretation can be expressed more efficiently by an alternative form.2

The final remark has to do with the foundation of OT in Harmony Theory. Harmony Theory is a formalism which abstracts away from the details of connectionist networks and seeks general mathematical techniques for analysing classes of connectionist networks (Prince & Smolensky 1993; Smolensky 1986). One essential feature of Harmony Theory is that it is founded on a two-layer scheme which allows a combination of simplicity with uniformity. On the lower layer we find representational nodes that encode the different kinds of information involved in language processing
3 AN INTEGRATIVE FRAMEWORK
In this section an attempt is made to integrate optimal interpretation and optimal production. A look at the area of pragmatics seems useful, since an analogous optimality metric plays an indispensable role there. The Gricean conversational maxims are widely recognized as a (rather informal) expression of this metric. With Zipf (1949) as a forerunner we have to
(phonological, morphological, syntactic, semantic). On the upper layer we find knowledge nodes, hidden units that encode certain 'patterns' relating particular configurations of representational units. A connectionist network is a dynamical system that is controlled by a certain Ljapunov function. When activation spreads dynamically, this function always decreases or remains constant. In other words, Harmony Theory says that, starting from any incomplete representational vector, this vector is always completed in a minimalistic/optimal way. Harmony Theory does not say that the different optimizations converge when we start with different parts of a lucid representational vector. The theory says only that one and the same Ljapunov function (= system of ranked constraints in OT) can be used when the system operates like a hearer (starting with a natural language form and ending with an interpretation) or when it operates like a speaker (starting with an activated interpretation and ending with a form). The theory does not say that we come back to the original expression when we execute both operations in succession. Everyone can describe numerous situations in which he was unable to produce what he understands. More drastically, the phenomenon of aphasia illustrates possible asymmetries in production and comprehension (e.g. Jakobson 1941/1968). A related asymmetry is found in language acquisition. It is well known that children's abilities in production lag dramatically behind their abilities in comprehension. In overcoming this lag, a kind of bootstrap mechanism seems to apply that depends crucially on the robustness of comprehension, possibly by using a technique called robust interpretative parsing (Smolensky 1996; Tesar & Smolensky 2000). Consequently, when it comes to relating the two perspectives within a bidirectional OT, we have to acknowledge the close interrelation between them in the OT learning algorithm.
In summary, Harmony Theory per se does not give any argument in favour of bidirection. Instead, the arguments come from OT learning theory. I will come back to this important conceptual point in the next section.
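The claim that spreading activation never increases the Ljapunov function, and that an incomplete vector is thereby completed, can be illustrated with a toy network. The sketch below assumes a Hopfield-style network with symmetric weights and one-unit-at-a-time threshold updates; this is a stand-in chosen for simplicity and not the Harmony Theory architecture itself. The weight matrix, the choice of clamped units, and all names are hypothetical.

```python
# Toy Hopfield-style network (an assumed stand-in for a harmony network).
# Weights are symmetric with a zero diagonal; unit states are +1 or -1.
W = [
    [0, 1, 1, -1],
    [1, 0, 1, -1],
    [1, 1, 0, -1],
    [-1, -1, -1, 0],
]

def energy(state):
    """Ljapunov function E(s) = -1/2 * sum_ij W_ij * s_i * s_j (harmony is -E)."""
    n = len(state)
    return -0.5 * sum(W[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))

def complete(state, clamped):
    """Update the free units one at a time until no unit changes.

    Returns the completed state and the energy recorded after every sweep."""
    state = list(state)
    trace = [energy(state)]
    changed = True
    while changed:
        changed = False
        for i in range(len(state)):
            if i in clamped:
                continue  # clamped units carry the given part of the vector
            net = sum(W[i][j] * state[j] for j in range(len(state)))
            new = state[i] if net == 0 else (1 if net > 0 else -1)
            if new != state[i]:
                state[i] = new
                changed = True
        trace.append(energy(state))
    return state, trace

# Units 0 and 1 are given (the 'incomplete vector'); units 2 and 3 are filled in.
final, trace = complete([1, 1, -1, 1], clamped={0, 1})
print(final)
print(trace)
```

Clamping a different subset of units corresponds to starting from a different part of the vector, which is the sense in which one and the same function can serve both the hearer and the speaker direction.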
(4) Q-principle: Say as much as you can (given I) (Horn 1984: 13). Do not provide a statement that is informationally weaker than your knowledge of the world allows, unless providing a stronger statement would contravene the I-principle (Levinson 1987: 401).
I-principle: Say no more than you must (given Q) (Horn 1984: 13). Say as little as necessary, i.e. produce the minimal linguistic information sufficient to achieve your communicational ends (bearing the Q-principle in mind) (Levinson 1987: 402). Read as much into an utterance as is consistent with what you know about the world (Levinson 1983: 146-7).

Obviously, the Q-principle corresponds to the first part of Grice's quantity maxim (make your contribution as informative as required), while it can be argued that the countervailing I-principle collects the second part of the quantity maxim (do not make your contribution more informative than is required), the maxim of relation and possibly all the manner maxims. In a slightly different formulation, the I-principle seeks to select the most coherent interpretation, and the Q-principle acts as a blocking mechanism and blocks all the outputs that can be derived more economically from an alternative linguistic input (for a detailed discussion see Blutner 1998). This formulation makes it quite clear that the Gricean framework can be understood in a bidirectional optimality framework which integrates production and comprehension optimality. At first glance, using a bidirectional competition technique can be seen merely as re-establishing the ideas presented in Blutner (1998) on a more widely acknowledged and well-known basis. However, that is not the whole story. We have to acknowledge that the framework of OT gives us a much wider
acknowledge two basic and competing forces, one force of unification, or Speaker's economy, and the antithetical force of diversification, or Auditor's economy. The two opposing economies are in extreme conflict, and we have to look for an optimal way to resolve this conflict. An important step in reformulating and explicating the Gricean framework has been made by Atlas & Levinson (1981) and Horn (1984), who have tried to clarify the consequences of these opposing economies. Taking Quantity as a starting point, they distinguish between two principles, the Q-principle and the I-principle (termed R-principle by Horn 1984). The I-principle can be seen as the force of unification minimizing the Speaker's effort, and the Q-principle can be seen as the force of diversification minimizing the Auditor's effort. Simple but informal formulations of these principles are as follows:
(5) $\mathrm{Gen}_\sigma = \{(\mathrm{sem}(A), \tau) : \sigma[\mathrm{sem}(A)]\tau\}$

For convenience, we will simply write $A$ instead of $\mathrm{sem}(A)$ from now on. The effect of the Gricean maxims is simply to constrain this relation in a particular way, and we have already given some initial motivation that this constraint can best be formulated in a bidirectional OT framework. In OT there is a cost function (harmony function) that evaluates the elements of the generator. For the present aims it is sufficient to assume an ordering relation $\succ$ (being more harmonic, being more economical) that ranks the elements of the Generator.3 Now the following formulation of the Q- and the I-principle comes immediately to mind and brings us to a bidirectional optimality view:
(6) Bidirectional OT (strong version)
(Q) $(A, \tau)$ satisfies the Q-principle iff $(A, \tau) \in \mathrm{Gen}_\sigma$ and there is no other pair $(A', \tau)$ such that $(A', \tau) \succ (A, \tau)$

3 Being more pedantic, we should write $\succ_\sigma$ in order to indicate the dependence on the actual context $\sigma$. We can drop the index because here and in the following we assume the actual context to be fixed.
perspective on relating natural language comprehension, language acquisition (Tesar & Smolensky 2000) and language change (e.g. Haspelmath 1999). Furthermore, there are interesting mathematical results concerning the computational capacity of OT systems (see Kuhn 2000 for further references). Taking the broader perspective and the more rigorous formalization, the use of OT may give the enterprise of Radical Pragmatics in general and Lexical Pragmatics in particular a new impulse. With the Gricean maxims as Eval, we now have to make the status of Gen more explicit. Following current trends in semantics, we see the formal meaning of a natural language expression $A$ as its context change potential (e.g. Heim 1982; Kamp 1981; Kamp & Reyle 1993; Groenendijk & Stokhof 1991). It describes the way $A$ (or better, the semantic form $\mathrm{sem}(A)$ that is associated with $A$) updates the current context $\sigma$, leading to a new context $\tau$. In standard dynamic semantics the context change potential is assumed to be a function, with the argument of the function usually written to the left: $\sigma[\mathrm{sem}(A)] = \tau$. Taking into account that the semantics is highly underspecified (e.g. Reyle 1993) and that it seldom specifies a definite outcome, we assume that the context change potential is a relational notion. If $\tau$ is one of the potential outcomes of updating $\sigma$ with $\mathrm{sem}(A)$, this is written as $\sigma[\mathrm{sem}(A)]\tau$. The Generator $\mathrm{Gen}_\sigma$ is now identified with the set of input-output (form-interpretation) pairs $(\mathrm{sem}(A), \tau)$ such that $\tau$ is a potential result of updating $\sigma$ with $\mathrm{sem}(A)$; more formally:
(I) $(A, \tau)$ satisfies the I-principle iff $(A, \tau) \in \mathrm{Gen}_\sigma$ and there is no other pair $(A, \tau')$ such that $(A, \tau') \succ (A, \tau)$
$(A, \tau)$ is called optimal iff it satisfies both the Q-principle and the I-principle.4
4 In terms of game theory, the solution concept that underlies the formulation of (strong) optimality is that of a 'Nash Equilibrium' (see Dekker & van Rooy, this volume).
Obviously, a pair $(A, \tau)$ satisfies the Q-principle just in case $A$ is an optimal production that can be generated starting with $\tau$. On the other hand, a pair $(A, \tau)$ satisfies the I-principle just in case $\tau$ is an optimal outcome of interpreting $A$. Seeing both principles as being part of the real mechanism of natural language comprehension, the I-principle can be considered as a sub-mechanism for finding preferred interpretations, and the Q-principle can be considered as an (absolute) blocking mechanism that suppresses the interpretations that are connected more economically with an alternative form. In standard OT the ordering relation between elements of the generator is established via a system of ranked constraints. These constraints are typically assumed to be output constraints, i.e. they may be either satisfied or violated by an output form. In the bidirectional framework just presented, it is possible to change perspectives. This means that an output under one perspective can be seen as an input under the other perspective. Therefore, it is plausible to assume both output and input constraints. However, we should avoid (relational) constraints that refer to inputs and outputs simultaneously. Seeing the input as a linguistic form that conveys phonological, syntactic, and semantic information, input constraints are typically markedness conditions that evaluate the 'harmony' of the form. On the other hand, the output (i.e. the resulting context $\tau$) is evaluated by constraints that determine its coherence and informativeness (with regard to the initial context $\sigma$). Let me now give a very schematic example in order to illustrate some characteristics of the bidirectional OT (labelled strong version in order to distinguish it from a weak version introduced later). Assume that we have two constraints called F and C. F is a constraint on linguistic forms and collects the effects of linguistic markedness. C is a constraint on resulting contexts and refers to coherence and informativeness. There is no reason to introduce a ranking between F and C. Let us assume two forms $A_1$ and $A_2$ which are semantically equivalent. That means $\mathrm{Gen}_\sigma$ associates the same relations of context change with them. With $\sigma$ as initial context, let us assume the possible outcomes are $\tau_1$ and $\tau_2$. Further, we assume that no other form updates $\sigma$ to one of these outcomes. Let us stipulate that $A_1$ satisfies F but $A_2$ does not, and that $\tau_1$ satisfies C but $\tau_2$ does not. That makes the form
$A_2$ less well-formed than the form $A_1$ and the resulting context $\tau_2$ more complex than the resulting context $\tau_1$. The bidirectional view can be demonstrated by the following tableau, where two super-columns are introduced, one for each result of context change.
(7)
Forms   | $\tau_1$      F    C  | $\tau_2$      F    C
$A_1$   | ☞ »                   | ☞                  *
$A_2$   | »             *       |               *    *
I use Smolensky's (1996) repertoire of symbols here: ☞ indicates the optimal candidate when the production perspective is taken (find an optimal expression starting with $\tau_i$) and » indicates the optimal candidate when the comprehension perspective is taken (find an optimal interpretation starting with $A_i$). Super-optimal pairs are those that are both production and comprehension optimal. This is indicated by the simultaneous occurrence of ☞ and ». The tableau shows that only the form $A_1$ survives, with $\tau_1$ as its only interpretative outcome. Obviously, the form $A_2$ is blocked in all its (semantically admissible) interpretations.5 The scenario just described is the case of total blocking, where some forms (e.g. *furiosity, *fallacity) do not exist because others do (fury, fallacy). However, blocking is not always total but may be partial. According to Kiparsky (1982), partial blocking is realized in the case where the special (less productive) affix occurs in some restricted meaning and the general (more productive) affix picks up the remaining meaning (consider examples like refrigerant - refrigerator, informant - informer, contestant - contester). To handle these and other cases Kiparsky (1982) formulates a general condition Avoid Synonymy. Working independently of the Aronoff-Kiparsky line, McCawley (1978) collects a number of further examples demonstrating the phenomenon of partial blocking outside the domain of derivational and inflectional processes. For example, he observes that the distribution of

5 Zeevat (personal communication) has proposed using pictures of the following kind, where arrows indicate the optimal candidate that arises when the indicated direction of optimization is taken. A link with arrows in both directions indicates a super-optimal pair.
[Picture: $A_1$ ↔ $\tau_1$; $A_2$ → $\tau_1$; $\tau_2$ → $A_1$]
productive causatives (in English, Japanese, German, and other languages) is restricted by the existence of a corresponding lexical causative. Whereas lexical causatives (e.g. (8a)) tend to be restricted in their distribution to the stereotypical causative situation (direct, unmediated causation through physical action), productive (periphrastic) causatives tend to pick up more marked situations of mediated, indirect causation. For example, (8b) could have been used appropriately when Black Bart caused the sheriff's gun to backfire by stuffing it with cotton.

(8) a. Black Bart killed the sheriff.
    b. Black Bart caused the sheriff to die.
Typical cases of total and partial blocking are found not only in morphology, but in syntax and semantics as well (cf. Atlas & Levinson 1981; Horn 1984; Williams 1997). The general tendency of partial blocking seems to be that 'unmarked forms tend to be used for unmarked situations and marked forms for marked situations' (Horn 1984: 26), a tendency that Horn (1984: 22) calls 'the division of pragmatic labour'. There are two principal possibilities for avoiding total blocking within the bidirectional OT framework. The first possibility is to make some stipulations concerning Gen that exclude equivalent semantic forms. Such a case is demonstrated in (9):

(9) [Tableau: forms $A_1$, $A_2$ against interpretations $\tau_1$, $\tau_2$ under the constraints F and C; the pair $(A_1, \tau_2)$ is excluded from Gen.]
In this case the unmarked form $A_1$ is stipulated to be used for the unmarked situation only. (This seems plausible when we assume that the child learns the meaning of kill in stereotypical, unmarked situations.) The interpretation of the marked form $A_2$ remains open. Unfortunately, the bidirectional OT described in (6) does not select any situation for $A_2$. Starting with $\tau_2$, expressive optimization selects $A_2$, as desired. However, we do not come back to the marked situation $\tau_2$ when the inverse perspective (interpretative optimization) is taken. Instead, the unmarked situation $\tau_1$ is selected. Consequently, there is no output that is paired super-optimally with $A_2$. That means $A_2$ is blocked in all interpretations.
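The strong version in (6) can be checked mechanically for the two scenarios discussed so far: the full Generator of tableau (7) and the restricted Generator of (9). The following is a minimal sketch under stated assumptions: because F and C are unranked, the cost of a pair is taken here to be its total number of violations, and the names `t1`/`t2` (for $\tau_1$/$\tau_2$), `cost`, `satisfies_q`, `satisfies_i`, and `strong_optimal` are all introduced for illustration only.

```python
# Violations of the form constraint F and of the context constraint C.
F_VIOLATIONS = {"A1": 0, "A2": 1}
C_VIOLATIONS = {"t1": 0, "t2": 1}

def cost(pair):
    """Assumed cost of a form-interpretation pair: total number of violations."""
    form, interp = pair
    return F_VIOLATIONS[form] + C_VIOLATIONS[interp]

def satisfies_q(pair, gen):
    """Q-principle: no other form expresses the same interpretation more cheaply."""
    form, interp = pair
    return pair in gen and not any(
        i == interp and f != form and cost((f, i)) < cost(pair) for f, i in gen)

def satisfies_i(pair, gen):
    """I-principle: no other interpretation of the same form is cheaper."""
    form, interp = pair
    return pair in gen and not any(
        f == form and i != interp and cost((f, i)) < cost(pair) for f, i in gen)

def strong_optimal(gen):
    """Pairs that satisfy both principles of the strong version (6)."""
    return {p for p in gen if satisfies_q(p, gen) and satisfies_i(p, gen)}

# Tableau (7): both forms are compatible with both outcomes.
full_gen = {("A1", "t1"), ("A1", "t2"), ("A2", "t1"), ("A2", "t2")}
# Scenario (9): the unmarked form is restricted to the unmarked situation.
restricted_gen = full_gen - {("A1", "t2")}

print(strong_optimal(full_gen))
print(strong_optimal(restricted_gen))
# Production from t2 does select A2 in the restricted scenario ...
print(satisfies_q(("A2", "t2"), restricted_gen))
# ... but interpreting A2 leads back to t1, so the pair is not optimal.
print(satisfies_i(("A2", "t2"), restricted_gen))
```

In both scenarios the marked form ends up without an optimal partner, which is the total blocking described in the text.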
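The only possibility to account for Horn's division of pragmatic labour is to stipulate it as a property of the Generator. This is indicated by the following tableau:

(10) [Tableau: forms against interpretations, with the division of pragmatic labour stipulated as a property of Gen.]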
Obviously, this solution is completely ad hoc, and we should look for an alternative solution.6 The bidirectional OT we have considered until now is a very strong and absolute one. We have assumed (i) that an input-output pair $(A, \tau)$ is super-optimal just in case $\tau$ is optimal for $A$ and $A$ is optimal for $\tau$, and (ii) that the two directions of optimization are independent of each other. This means that the results of optimization under one perspective are not assumed to influence which structures compete under the other perspective. Our initial motivation for developing a bidirectional OT was the formulation of the Gricean maxims in Radical Pragmatics (Atlas & Levinson 1981; Horn 1984). Already the informal formulations given in (4) make it completely clear that we need a formalization where the two directions of optimization refer to each other. Such a formalization has been given in Blutner (1998):
(11) Bidirectional OT (weak version)
(Q) $(A, \tau)$ satisfies the Q-principle iff $(A, \tau) \in \mathrm{Gen}_\sigma$ and there is no other pair $(A', \tau)$ satisfying the I-principle such that $(A', \tau) \succ (A, \tau)$
(I) $(A, \tau)$ satisfies the I-principle iff $(A, \tau) \in \mathrm{Gen}_\sigma$ and there is no other pair $(A, \tau')$ satisfying the Q-principle such that $(A, \tau') \succ (A, \tau)$
$(A, \tau)$ is called super-optimal iff it satisfies both the Q-principle and the I-principle.7
6 As suggested by an anonymous referee, there is a further argument showing that it is problematic to use hard constraints for excluding total blocking. In fact, a sentence like (8a) CAN be used in situations where Black Bart caused the sheriff's gun to backfire by stuffing it with cotton. This possibility is excluded when hard constraints are used as in (9) and (10).
7 Recently, Gerhard Jäger (Jäger 1999; see also Jäger & Blutner, to appear) has presented a more transparent formulation of bidirectional OT:
I call this variant of the bidirectional OT the weak version. The important point is that the structures that compete in one perspective of optimization are constrained by the outcomes of the other perspective and vice versa. The purpose of this kind of recursive dependence can be demonstrated by coming back to our original example, which now leads to the following tableau:
(12)
Forms   | $\tau_1$      F    C  | $\tau_2$      F    C
$A_1$   | ☞ »                   |                    *
$A_2$   |               *       | ☞ »           *    *
Let us first take the comprehension perspective starting with $A_1$. The structures that compete are $\{\tau_1, \tau_2\}$ (the marked form $A_2$ does not block any of them). From the fact that $\tau_1$ is less expensive (more stereotypical) than $\tau_2$ it follows that the little arc » has to select $\tau_1$. Now take the production perspective starting with $\tau_1$. An analogous argument shows that the little hand ☞ selects $A_1$. Consequently, the pair $(A_1, \tau_1)$ is super-optimal, just as in tableau (7) where we discussed the strong view. Next consider the comprehension perspective starting with $A_2$. In this case the structures that compete are restricted to the singleton $\{\tau_2\}$, since the unmarked form $A_1$ blocks $\tau_1$, and we get that the little arc » has to select $\tau_2$. An analogous argument applies to the production perspective starting with $\tau_2$. In this case the competition set is restricted to the singleton $\{A_2\}$, and the little hand ☞ selects $A_2$. In contrast to the strong view, now the pair $(A_2, \tau_2)$ comes out as super-optimal as well. And this demonstrates that the weak view can

$(A, \tau)$ is super-optimal iff $(A, \tau) \in \mathrm{Gen}_\sigma$ and
(Q) there is no super-optimal $(A', \tau) < (A, \tau)$
(I) there is no super-optimal $(A, \tau') < (A, \tau)$.
Jäger has shown that there is a unique super-optimality relation in case $<$ is well founded. Furthermore, this formulation of super-optimality is equivalent to that presented in (11) if $<$ satisfies transitivity. Jäger's results demonstrate that the circularity inherent in definition (11) is only apparent. If the preference relation $<$ is well founded, then both definition (11) and Jäger's definition of super-optimality come out as sound recursive definitions (cf. also Dekker & van Rooy, this volume). Does the recursive variant of bidirectionality (i.e. weak bidirection) extend the computational capacity of the generator and, if so, in which way? These are important but largely unsolved problems even for unidirectional OT. (For some interesting results concerning the system OT-LFG, cf. Kuhn 2000.) Gerhard Jäger (p.c.) has a proof that under the same conditions that are assumed in Karttunen (1998), weak bidirection does not extend the generative capacity of the generator.
account for the good old idea that unmarked forms tend to be used for unmarked situations and marked forms for marked situations. One consequence of the strong mode of optimization in (6) can be summarized as follows: what we produce we are able to understand adequately, and what we understand we are able to produce adequately. At least the second part of this consequence is clearly false when we consider children's ability in natural language production, which lags dramatically behind their ability in comprehension. Smolensky (1996) has demonstrated that OT gives a plausible explanation for this lag. OT predicts that in comprehension relatively marked forms can be understood appropriately. However, when we consider generation, highly unmarked forms are produced that differ significantly from the initial forms. The lag between comprehension and production is overcome by learning. According to OT learning theory (Smolensky 1996; Tesar & Smolensky 2000), learning results in a state of the system that satisfies the demands of strong bidirection. It is easy to prove that a pair that is optimal (strong bidirection, cf. (6)) is super-optimal (weak bidirection, cf. (11)) as well. However, weak bidirection gives a chance to find additional super-optimal solutions. This is demonstrated by tableau (12). Is it possible to give a natural interpretation for these additional solutions? I want to propose the idea that these additional solutions are due to the flexibility and ability to learn to which the weak formulation alludes. In my opinion, the weak version of the bidirectional OT can be taken to describe the possible outcomes of self-organization before the learning mechanism has fully realized the equilibrium between productive and interpretative optimization.
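The relation between the two versions can be made concrete for the schematic example of tableau (12). The sketch below follows the recursive formulation attributed to Jäger in footnote 7 and is offered under assumptions, not as the algorithm of any of the cited authors: the cost of a pair is taken to be its total number of violations (F and C being unranked), pairs are classified from cheapest to most expensive, and the names `t1`/`t2`, `cost`, `competes`, `strong_optimal`, and `super_optimal` are hypothetical.

```python
F_VIOLATIONS = {"A1": 0, "A2": 1}   # form constraint F
C_VIOLATIONS = {"t1": 0, "t2": 1}   # context constraint C

def cost(pair):
    """Assumed cost of a pair: total number of constraint violations."""
    form, interp = pair
    return F_VIOLATIONS[form] + C_VIOLATIONS[interp]

def competes(p, q):
    """Two distinct pairs compete if they share the form or the interpretation."""
    return p != q and (p[0] == q[0] or p[1] == q[1])

def strong_optimal(gen):
    """Strong version: any cheaper competitor in Gen blocks a pair."""
    return {p for p in gen
            if not any(competes(p, q) and cost(q) < cost(p) for q in gen)}

def super_optimal(gen):
    """Weak version: only super-optimal competitors can block a pair.

    Pairs are visited from cheapest to most expensive, so every pair that
    could block the current one has already been classified."""
    found = set()
    for p in sorted(gen, key=cost):
        if not any(competes(p, q) and cost(q) < cost(p) for q in found):
            found.add(p)
    return found

gen = {("A1", "t1"), ("A1", "t2"), ("A2", "t1"), ("A2", "t2")}
strong = strong_optimal(gen)
weak = super_optimal(gen)

print(sorted(strong))
print(sorted(weak))
print(strong <= weak)   # every optimal pair is also super-optimal here
```

Visiting the pairs in order of increasing cost is what makes the apparent circularity harmless on a finite Generator: the status of a pair depends only on pairs that are strictly cheaper.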
Jäger (1999) and Dekker & van Rooy have proposed algorithms that update the ordering (preference) relation $\succ$ such that (i) optimal pairs are preserved and (ii) a new optimal pair is produced if and only if the same pair was super-optimal at earlier stages. Consequently, we can take the solutions of weak bidirection to be identical with the solutions of strong bidirection considering all the systems that result from updating the ordering relation. Arguably, updating the ordering relation in the style of Jäger describes a kind of self-organization which is very close to certain mechanisms of self-organization in language change. This point may be clarified when we (re)consider the principle of iconicity (called 'the division of pragmatic labour' within the domain of pragmasemantics). This principle can be proven to result from weak bidirection (Gerhard Jäger, p.c.). In the school of natural morphology (for references cf. Wurzel 1998), the same principle plays an important role in describing the direction of language change.
Constructional iconicity: A semantically more complex, derived morphological form is unmarked regarding constructional iconicity if it is symbolized formally more costly than its semantically less complex base; it is the more marked, the stronger its symbolization deviates from this (Wurzel 1998: 68).
Analogies of this kind give substance to the claim that weak bidirection can be considered as a principle describing (in part) the direction of language change: super-optimal pairs are tentatively realized in language change. This relates to the view of Horn (1984), who considers the Q-principle and the I-principle as diametrically opposed forces in inference strategies of language change.
4 PRESUPPOSITION PROJECTION

In the previous section we have outlined two general ideas that determine the shape of Gen in natural language interpretation: underspecification and dynamic semantics. Within the realm of underspecification we can discriminate between structural underspecification and lexical underspecification. Structural underspecification is related, for example, to scope, ellipsis, and presupposition. Lexical underspecification, on the other hand, relates to polysemy, metonymy, and other aspects of the 'Generative Lexicon'. Although it is seldom made completely explicit in OT, the choice of a particular representational format is unavoidable in order to give a sound formulation of the constraints and their ranking. With regard to the representational format, we will proceed by modelling contexts as DRSs. Moreover, the initial DRSs of presupposition-inducing expressions are treated in the particular framework of van der Sandt (1992) and Geurts (1995). This framework combines the idea of dynamics with the aspect of underspecification that relates to presupposition projection. The aim of this section is to demonstrate that van der Sandt's/Geurts' projection mechanism for presuppositions can be reconstructed (in important aspects) and improved (in secondary aspects) as a consequence of the I-principle. Moreover, it can be explained why accommodation is sometimes blocked. This is an important consequence of the Q-principle, and its integration realizes an effective extension of the van der Sandt/Geurts proposal. As usual, we consider a DRS $K$ as a pair $\langle U(K), \mathrm{Con}(K) \rangle$, where $U(K)$ is a set of reference markers and $\mathrm{Con}(K)$ is a set of DRS-conditions. If $P$ is an $n$-place predicate, and $x_1, \ldots, x_n$ are reference markers, then $P(x_1, \ldots, x_n)$ is a simple DRS-condition. If $K$ and $K'$ are DRSs, then $\neg K$, $K \lor K'$, $K \Rightarrow K'$ are (complex) DRS-conditions (cf. Kamp & Reyle 1993; Kadmon 1990; Geurts 1995, 1999).
(13) $\sigma[\mathrm{sem}(A)]\tau$ just in case $\tau$ is the result of merging8 $\sigma$ with the result of projecting the presupposed material of $\mathrm{sem}(A)$ such that the resulting DRS is a proper one (it may not contain any free reference markers).9
Using the conception of Gen as defined in (5), the formulation in (14) results, where the Generator is considered for a specific input form $A$:

(14) $\mathrm{Gen}_\sigma(A) = \{\tau : \tau$ is the result of merging $\sigma$ with the result of projecting the presupposed material of $\mathrm{sem}(A)$ such that the resulting DRS is a proper one$\}$
The part of the proj ected DRS that factors with part of the superordinated DRS/initial context (u) will be called bound (or resolved) material; the part that does not factor will be called accommodated material. For convenience, in the corresponding DRSs, the part of the presupposition which counts as bound when projected is underlined, and the part which has to be accommodated is underlined twice. 1 995): IfKis a set ofDRSs, then EBK ( UK E K U(K ) , UK E K Con(K )}. A necessary condition is that presupposed material projects to a DRS that subordinates the origin position. 8 9
DRS-merge (c£ Geurts
=
Geurts 1 995): ::; is the smallest preorder (transitive, reflexive) for which all of the following hold, for any K, K', K":
Subordination (c£
a. b. c. d.
If -,K' E Con(K), then K ::; K' If K ' V K " E Con(K), then K ::; K' and K ::; K " If K ' => K" E Con(K), then K ::; K' ::; K" If B/K ' E Con(K), then K ::; K' (Read K' ::; K as K' subordinates K ).
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
In order to account for presupposition inducers we introduce a further type of complex DRS-conditions: conditions of the form B/K, where K is a DRS and B is a DRS-condition. Conditions B/K have a special status and are called slash-conditions. They induce presuppositions and mark them as material behind the slash. Though not identical, this notation is very similar to that of Geurts ( I 995 ). The role of slash conditions is to indicate that a presupposition may be bound or accommodated in any DRS that subordinates the DRS in which it originates. Since the structural position where the presupposition is resolved/accommodated is not specified semantically, an element of structural underspecification is introduced into the whole framework. More formally, let u and T be ordinary DRSs and sem(A) be a DRS that may contain slash conditions (introducing presupposed material). Then the idea can be expressed by the following notion of context change:
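The definitions of a DRS as a pair of universe and conditions, of DRS-merge, and of properness can be made concrete in a small Python sketch. It is only an illustration under stated assumptions, not part of the original proposal: the names DRS, drs, pred, imp, merge and is_proper are hypothetical, reference markers are assumed to be lower-case strings and individual constants capitalised ones, and slash-conditions are not modelled (the three projections of example (15) are written out by hand rather than generated).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DRS:
    universe: frozenset = frozenset()    # U(K): reference markers
    conditions: frozenset = frozenset()  # Con(K): DRS-conditions


def drs(markers=(), conditions=()):
    """Convenience constructor (hypothetical helper)."""
    return DRS(frozenset(markers), frozenset(conditions))


def pred(name, *args):
    """Simple condition P(x1, ..., xn)."""
    return ("pred", name, args)


def imp(k1, k2):
    """Complex condition K1 => K2."""
    return ("imp", k1, k2)


def merge(*drss):
    """DRS-merge: componentwise union of universes and conditions."""
    return DRS(
        frozenset().union(*(k.universe for k in drss)),
        frozenset().union(*(k.conditions for k in drss)),
    )


def is_marker(arg):
    # Assumption of this sketch: markers are lower-case, constants capitalised.
    return arg[:1].islower()


def is_proper(k, accessible=frozenset()):
    """True if no reference marker occurs free in k."""
    scope = accessible | k.universe
    for cond in k.conditions:
        kind = cond[0]
        if kind == "pred":
            if any(is_marker(a) and a not in scope for a in cond[2]):
                return False
        elif kind == "not":
            if not is_proper(cond[1], scope):
                return False
        elif kind == "or":
            if not (is_proper(cond[1], scope) and is_proper(cond[2], scope)):
                return False
        elif kind == "imp":
            antecedent, consequent = cond[1], cond[2]
            # Markers of the antecedent are accessible in the consequent.
            if not is_proper(antecedent, scope):
                return False
            if not is_proper(consequent, scope | antecedent.universe):
                return False
    return True


# Example (15): If Peter has a dog, then his cat is gray.
antecedent = drs(["x"], [pred("dog", "x"), pred("have", "Peter", "x")])
presupposition = drs(["y"], [pred("have", "Peter", "y"), pred("cat", "y")])
assertion = drs([], [pred("gray", "y")])

tau1 = drs([], [imp(antecedent, merge(assertion, presupposition))])  # local
tau2 = drs([], [imp(merge(antecedent, presupposition), assertion)])  # intermediate
tau3 = merge(presupposition, drs([], [imp(antecedent, assertion)]))  # global
unresolved = drs([], [imp(antecedent, assertion)])  # y would be free
```

In this sketch all three hand-built projections pass the properness check, whereas the DRS in which the presupposed material is simply dropped does not, because the marker y is then free.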
Let us give two simple examples. In (15) a conditional A is given and its semantic form sem(A) is indicated. With regard to an initial context that is empty (∅), three projections of the presupposed material are possible. They are indicated by τ1, τ2, τ3 and correspond to what is usually called local, intermediate, and global accommodation, respectively. Binding is not possible in these situations.

(15) A: If Peter has a dog, then his cat is gray
     sem(A) = [ : [x: dog(x), have(Peter, x)] ⇒ [ : gray(y) / [y: have(Peter, y), cat(y)]]]
     Gen(A) = {τ1, τ2, τ3}, where
     τ1 = [ : [x: dog(x), have(Peter, x)] ⇒ [y: gray(y), have(Peter, y), cat(y)]]
     τ2 = [ : [x, y: dog(x), have(Peter, x), have(Peter, y), cat(y)] ⇒ [ : gray(y)]]
     τ3 = [y: have(Peter, y), cat(y), [x: dog(x), have(Peter, x)] ⇒ [ : gray(y)]]

Intuitively, the interpretation given by τ3 (global accommodation) seems to be strictly preferred. This conforms to our intuition, which interprets A by assuming that Peter has a cat and saying that it is gray in case Peter has a dog. Another example is the following:

(16) A: If Peter has a cat, then his cat is gray
     sem(A) = [ : [x: cat(x), have(Peter, x)] ⇒ [ : gray(y) / [y: have(Peter, y), cat(y)]]]
     Gen(A) = {τ1, τ2, τ3}, where
     τ1 = [ : [x: cat(x), have(Peter, x)] ⇒ [y: gray(y), have(Peter, y), cat(y)]]
     τ2 = [ : [x: cat(x), have(Peter, x)] ⇒ [ : gray(x)]]
     τ3 = [y: have(Peter, y), cat(y), [x: cat(x), have(Peter, x)] ⇒ [ : gray(y)]]

In this case, the local projection (τ1) and the global projection (τ3) require accommodation. In contrast, the intermediate projection allows factoring, which is already realized in τ2. (Bound material is indicated by single underlining.) In example (16) the intuitively correct interpretation corresponds to the intermediate projection (τ2). In order to account for the intuitively correct interpretations of complex sentences that contain presupposition inducers, van der Sandt (1992) assumes that the projection process is restricted by general preferences. Geurts (1995) has reformulated and improved van der Sandt's account. His preferences are as follows:

(i) If a presupposition can both be bound or accommodated, there will in general be a preference for the first option, and
(ii) If a presupposition can be accommodated at two different sites, one of which is subordinate to the other, the higher site will, ceteris paribus, be preferred. (Geurts 1995: 27ff)
Moreover, Geurts provides a clear motivation for these preferences. The rationale behind (i) is that hearers generally aim at interpretations that are maximally coherent, and (ii) is explained by the assumption that hearers tend to prefer the strongest interpretation that is consistent with what the speaker says (Geurts 1995: 28).¹⁰

My suggestion for an OT treatment of presupposition projection is simply to take the rationale behind Geurts' preferences more seriously than the preferences themselves. Consequently, the following constraints can be formulated:

C1: Avoid Accommodation (AvoidA): It counts the number of discourse markers that are involved in accommodation.
C2: Be Strong (BeStrong): It evaluates pairs ⟨A, τ⟩ with stronger outputs τ higher than pairs with weaker ones.

Their ranking is R: AvoidA » BeStrong.

The first constraint prefers binding presupposed material to accommodating it. Moreover, the present formulation of AvoidA gives a partial explanation for the preference for bridging and partial resolution over pure accommodation.¹¹ The notion of strength, on the other hand, is based on the entailment relation, which is well defined within DRT (cf. Geurts 1995). As demonstrated in Blutner (1998), this notion can be refined by introducing a probabilistic measure. In any case, what is important is the fact that BeStrong is a graded constraint, not an absolute one. The ranking AvoidA » BeStrong is necessary to validate van der Sandt's/Geurts' first preference.¹² It is not difficult to see how interpretation optimality (I-principle) solves the selection task with regard to the examples given in (15) and (16). The respective OT tableaus are presented in (17) and (18) in a schematic form.

(17)  ∅; If p then q/r              AvoidA    BeStrong
      ☞ r, p ⇒ q (global)             *          u
         (r ∧ p) ⇒ q (interm.)        *          v
         p ⇒ (q ∧ r) (local)          *          w
      where u > v, w > v

10 In a footnote, Geurts tells us that this is true only as long as we ignore bridging. In the present paper, we are susceptible to this ignorance.
11 By introducing probabilistic notions such as salience and cue validity the formulation of the constraint can be refined (perhaps along the lines outlined in Blutner 1998).
12 I am convinced that this strict ranking system must be replaced by a cumulative constraint weighting system when it comes to considering the bulk of bridging phenomena.
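The way the strict ranking AvoidA » BeStrong selects a winner in the schematic tableaus (17) and (18) can be illustrated with a short Python sketch. It rests on assumptions that go beyond the text: candidates are reduced to propositional formulas over p, q, r (reference markers are ignored, as in the schematic tableaus), the AvoidA counts are entered by hand, and BeStrong is operationalised as the number of competing candidates that are strictly stronger under entailment. The names optimal, tableau_17 and tableau_18 are hypothetical.

```python
from itertools import product

ATOMS = ("p", "q", "r")


def valuations():
    """All truth-value assignments to the atoms."""
    for bits in product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, bits))


def entails(f, g):
    """f entails g if g holds in every valuation where f holds."""
    return all(g(v) for v in valuations() if f(v))


def strictly_stronger(f, g):
    return entails(f, g) and not entails(g, f)


def optimal(candidates):
    """Return the candidates with the best (AvoidA, BeStrong) profile.

    candidates maps a label to (number of accommodated markers, formula).
    Python compares tuples lexicographically, which mirrors the strict
    ranking AvoidA >> BeStrong.
    """
    def profile(name):
        avoid_a, formula = candidates[name]
        # Assumed grading: one BeStrong mark per strictly stronger competitor.
        be_strong = sum(
            strictly_stronger(other, formula)
            for other_name, (_, other) in candidates.items()
            if other_name != name
        )
        return (avoid_a, be_strong)

    best = min(profile(name) for name in candidates)
    return [name for name in candidates if profile(name) == best]


# (17) If p then q/r: every candidate accommodates the presupposition r.
tableau_17 = {
    "global": (1, lambda v: v["r"] and (not v["p"] or v["q"])),
    "intermediate": (1, lambda v: not (v["r"] and v["p"]) or v["q"]),
    "local": (1, lambda v: not v["p"] or (v["q"] and v["r"])),
}

# (18) If p then q/p: the intermediate projection binds, so no accommodation.
tableau_18 = {
    "global": (1, lambda v: v["p"] and (not v["p"] or v["q"])),
    "intermediate": (0, lambda v: not v["p"] or v["q"]),
    "local": (1, lambda v: not v["p"] or (v["q"] and v["p"])),
}

print(optimal(tableau_17))  # ['global']
print(optimal(tableau_18))  # ['intermediate']
```

With equal AvoidA marks in (17) the decision falls to BeStrong and the global candidate wins; in (18) the intermediate candidate wins on AvoidA alone, even though the global candidate is stronger.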
In the first case all the possible outcomes (τ1, τ2, τ3) violate the constraint AvoidA (with regard to the reference marker y). Consequently, BeStrong is the critical constraint. Because global accommodation gives the strongest outcome, it wins the competition.

(18)  ∅; If p then q/p              AvoidA    BeStrong
         p, p ⇒ q (global)            *          u
      ☞ p ⇒ q (interm.)                          v
         p ⇒ (q ∧ p) (local)¹³        *          w
      where u > v, w = v

In the second case, global and local projection give outcomes that violate the constraint AvoidA. In contrast, intermediate projection allows factoring, and that is why it avoids accommodation. Because the constraint AvoidA ranks higher than the constraint BeStrong, intermediate projection is the winner. Obviously, there is no necessary connection between how close the projection is to the main DRS and how strong the resulting interpretation is. A case in point where the two criteria diverge is given by the following example:

(19) a. Every German is proud of his car
     b. Every German who owns a car is proud of it
     c. Every German has a car and is proud of it

In (19a) global accommodation is excluded¹⁴ and we have to select between intermediate and local accommodation only. Local accommodation yields the stronger interpretation, and intermediate accommodation is accommodation at the higher site. Consequently, if we take the criterion that prefers the higher site, then the interpretation of (19a) is identified with that of (19b). In contrast, the criterion that prefers the stronger interpretation identifies the interpretation of (19a) with that of (19c). Unfortunately, it is not easy to determine what the intuitively correct interpretation of (19a) is, since the proposition that Germans have cars is nearly tautological. Beaver (1994) gives an example where the judgement is easier. The following is a slightly simplified version.

13 In this schematic formulation (ignoring reference markers) the intermediate and the local version seem to be logically equivalent, which is not really the case.
14 The presupposition triggered by his car contains a reference marker that is bound by the quantifier and it would be free if the presupposition were accommodated globally (resulting in an improper DRS).
(20) a. ??Few of the team members can drive, but every team member will come to the match in her car.
     b. Few of the team members can drive, but every team member who owns a car will come to the match in her car.
     c. ?Few of the team members can drive, but every team member owns a car and will come to the match in her car.

Intuitively the interpretation of (20a) is rather strange while (20b) is a perfectly acceptable sentence. According to Beaver (1994), this demonstrates that the van der Sandt/Geurts proposal must be wrong, since their criterion identifies the interpretation of (20a) with that of (20b). In contrast, the present OT proposal identifies the interpretation of (20a) with that of (20c), which I think is a much better choice.¹⁵ A further point is that we should explain why in many examples intermediate accommodation is clearly dominant, such as in the following:

(21) a. Birds lay eggs (preferred: female birds lay eggs)
     b. Most ships unload at night (preferred: most ships that unload do it at night)

My feeling is that intermediate accommodation is partial in these cases and can outrank local accommodation, which is less partial.¹⁶ The kind of partiality I have in mind is probabilistic in nature. A possible way to approach this phenomenon is by adopting an OT framework that is controlled by cue validity and other probabilistic factors (cf. Blutner (1998) for realizing such a framework using a Generator based on abduction). Further research seems necessary to clarify this point.

So far we have almost exclusively considered interpretation optimality (I-principle). Is it necessary to make use of the other way of optimization (Q-principle)? The answer is clearly affirmative. The point is that accommodation is not always possible although the I-principle demands it. Accommodation can be blocked. The following example by Asher & Lascarides (1998) gives a demonstration. Let us compare the two dialogues (22abc) and (22abd):

(22) a. A: Did you hear about John?
     b. B: No, what?
     c. A: He had an accident. A car hit him.
     d. A: He had an accident. ??The car hit him.

15 This is a somewhat unfair and roughly simplifying look at the van der Sandt/Geurts proposal. Geurts and van der Sandt (1999) demonstrate that with a little use of abstraction rules and propositional reference markers the data of Beaver (1994) can be handled. My point here is only to demonstrate that the problems can be resolved in a different way if we take the rationale behind the preferences more seriously than the preferences themselves.
16 Note also the importance of stress and focus, especially in example (21b) (cf. Hendriks & de Hoop, to appear).
The van der Sandt/Geurts approach does not predict any difference between these two discourses and would find them both acceptable. But (22abd) is unacceptable, while (22abc) is acceptable. As a matter of fact, the presupposition of the car cannot be accommodated in (22abd). With the help of the Q-principle this observation is easy to explain. Starting with a neutral context σ (neutral with regard to cars), the outcome of context change is the same for (22c) and for (22d). Consequently, the two sentences constitute simple expression alternatives. The difference is that in the second case, but not in the first one, accommodation is necessary to yield the output context. This makes the second case the more complex one, and as such it is blocked by the simpler alternative (Q-principle).¹⁷ Zeevat (1999) formulated and substantiated the following theorem, which generalizes a series of related facts. It can be proved in the very same way we have just sketched.

(23) A trigger for presuppositions does not accommodate iff any occurrence of it has a simple expression alternative that does not trigger.

Based on the availability of expression alternatives and the logical requirements of the presupposition, a fine-grained classification of presupposition triggers can be proposed. Even more interesting, an understanding of presupposition triggers like discourse particles, which are typically outside the scope of most standard theories, becomes feasible (cf. Zeevat 1999). The semantics and pragmatics of focus provide a further challenge for applying the present ideas. Adding only one new constraint, Avoid Focus, which is ranked lower than Avoid Accommodation, it is a simple exercise to demonstrate that Schwarzschild's deaccenting theory of congruence (Schwarzschild 1999) is a natural consequence of the present ideas, crucially making use of the Q-principle.

In the first part of this paper I have outlined some theoretical reasons that recommend the weak version of bidirectional OT. From an empirical point of view it is not trivial to find data where the weak version is clearly preferred over its strong counterpart. The investigation of phenomena where Q-based effects (blocking) interact with I-based effects (interpretational preferences) may be an opportunity to make the comparison conceivable. As a first step in this direction, Jäger & Blutner (to appear) investigated the interaction between polysemy and focus. Dealing with the German adverb of repetition 'wieder' (again), the specific linguistic puzzle that was envisaged concerned the selection of the repetitive vs. the restitutive readings, depending on focus and scrambling. The results appeared to favour the weak version of bidirectional OT. It seems important to me to pursue the problem of discriminating between the weak and the strong version in depth.

17 Bart Geurts (p.c.) argues that the discourse (22d) is unacceptable because the proposition made by the second part is rather uninformative (supposed appropriate bridging). Though this idea is interesting, it cannot be the whole story. In particular, the idea cannot explain the contrast between the following examples:
   c'. He had a bike accident. A car hit him seriously.
   d'. He had a bike accident. ?The car hit him seriously.
Furthermore, the contrast does not disappear when dropping the material that according to Geurts can trigger bridging:
   c". A car hit him (seriously).
   d". ?The car hit him (seriously).

Acknowledgements
This paper is dedicated to Manfred Bierwisch on the occasion of his 70th birthday. This work was supported by the Deutsche Forschungsgemeinschaft (DFG). Parts of this paper were first presented at a DIP colloquium in Amsterdam. My special thanks go to Henk Zeevat and Helen de Hoop, who have encouraged me to pursue this line of research and gave valuable impulses and stimulation. Furthermore, I have to thank Anton Benz, Manfred Bierwisch, Paul David Doherty, Werner Frey, Bart Geurts, Gerhard Jäger, Paul Law, Klaus Robering, Paul Smolensky, and Rob van der Sandt. I am grateful to two anonymous referees for their very helpful comments.

Received: 05.04.00
Final version received: 25.07.00

REINHARD BLUTNER
Humboldt University, Berlin
Prenzlauer Promenade 149-152
D-13189 Berlin
Germany
[email protected]
http://wwwz.rz.hu-berlin.de/asg/blutner/

REFERENCES
Asher, Nicholas & Lascarides, Alex (1998), 'The Semantics and Pragmatics of Presupposition', Journal of Semantics, 15, 239-99.
Atlas, Jay David & Levinson, Stephen C. (1981), 'It-clefts, informativeness and logical form', in Peter Cole (ed.), Radical Pragmatics, Academic Press, New York, 1-61.
Beaver, David (1994), 'Accommodating Topics', in Peter Bosch & Rob van der Sandt (eds), Focus and Natural Language Processing. Volume 3: Discourse, IBM, Heidelberg, 439-48.
Bierwisch, Manfred (1983), 'Semantische Einheiten und konzeptuelle Repräsentation lexikalischer Einheiten', in Untersuchungen zur Semantik, Akademie-Verlag, Berlin, 61-99.
Bierwisch, Manfred (1996), 'Lexical information from a minimalistic point', in Chris Wilder (ed.), The Role of Economy Principles in Linguistic Theory, Akademie Verlag, Berlin, 227-66.
Blutner, Reinhard (1998), 'Lexical pragmatics', Journal of Semantics, 15, 115-62.
Boersma, Paul (1998), Functional Phonology, Holland Academic Graphics, The Hague.
Bresnan, Joan (to appear), 'Explaining morphosyntactic competition', in Mark Baltin & Chris Collins (eds), Handbook of Contemporary Syntactic Theory, Blackwell, Oxford.
Burzio, Luigi (1989), 'On the non-existence of disjoint reference principles', Rivista di Grammatica Generativa, 14, 3-27.
Burzio, Luigi (1998), 'Anaphora and soft constraints', in Pilar Barbosa, Danny Fox, Paul Hagstrom, Martha McGinnis, & David Pesetsky (eds), Optimality and Competition in Syntax, MIT Press, Cambridge, MA, 93-113.
Carston, Robyn (1998), 'The semantics/pragmatics distinction: a view from relevance theory', UCL Working Papers in Linguistics, 10, 1-30.
Choi, Hye-Won (1996), 'Optimizing Structure in Context', Ph.D. dissertation, Stanford University.
Copestake, Ann & Briscoe, Ted (1995), 'Semi-productive polysemy and sense extension', Journal of Semantics, 12, 15-67.
Deemter, Kees van & Peters, Stanley (eds) (1996), Semantic Ambiguity and Underspecification, CSLI Publications, Stanford, CA.
Dekker, Paul & Rooy, Robert van (this volume), 'Optimality theory and game theory: some parallels'.
Fanselow, Gisbert, Schlesewsky, Matthias, Cavar, Damir & Kliegl, Reinhold (1999), 'Optimal parsing', MS, University of Potsdam.
Geurts, Bart (1995), 'Presupposing', Ph.D. dissertation, University of Osnabrück.
Geurts, Bart (1999), Presuppositions and Pronouns, Elsevier, Oxford.
Geurts, Bart & Sandt, Robert A. van der (1999), 'Domain restriction', in Peter Bosch & Robert A. van der Sandt (eds), Focus: Linguistic, Cognitive and Computational Perspectives, Cambridge University Press, Cambridge.
Gibson, Edward & Broihier, Kevin (1998), 'Optimality theory and human sentence processing', in Pilar Barbosa, Danny Fox, Paul Hagstrom, Martha McGinnis, & David Pesetsky (eds), Optimality and Competition in Syntax, MIT Press, Cambridge, MA, 157-91.
Groenendijk, Jeroen & Stokhof, Martin (1991), 'Dynamic predicate logic', Linguistics and Philosophy, 14, 39-100.
Haspelmath, Martin (1999), 'Optimality and diachronic adaptation', Zeitschrift für Sprachwissenschaft, 18, 180-205.
Heim, Irene (1982), 'The semantics of definite and indefinite noun phrases', Ph.D. thesis, University of Massachusetts, Amherst.
Hendriks, Petra & Hoop, Helen de (to appear), 'Optimality theoretic semantics', Linguistics and Philosophy.
Hoop, Helen de (2000), 'Optimal scrambling and interpretation', in H. Bennis, M. Everaert, & E. Reuland (eds), Interface Strategies, KNAW, Amsterdam, 153-168.
Hoop, Helen de & Swart, Henriette de (1998), 'Temporal adjunct clauses in optimality theory', MS, OTS Utrecht.
Horn, Laurence R. (1984), 'Toward a new taxonomy for pragmatic inference: Q-based and R-based implicatures', in D. Schiffrin (ed.), Meaning, Form, and Use in Context, Georgetown University Press, Washington, 11-42.
Ito, Junko, Mester, Armin & Padgett, Jaye (1995), 'Underspecification in optimality theory', Linguistic Inquiry, 26, 571-613.
Jäger, Gerhard (1999), 'Optimal syntax and optimal semantics', handout for talk at DIP-colloquium. Available from http://www.zas.gwz-berlin.de/mitarb/homepage/jaeger/
Jäger, Gerhard & Blutner, Reinhard (to appear), 'Against lexical decomposition in syntax', in Proceedings of IATL 15, University of Haifa.
Jakobson, Roman (1941/1968), Child Language, Aphasia and Phonological Universals, Mouton, The Hague.
Kadmon, Nirit (1990), 'Uniqueness', Linguistics and Philosophy, 13, 273-324.
Kager, René (1999), Optimality Theory, Cambridge University Press, Cambridge.
Kamp, Hans (1981), 'A theory of truth and semantic representation', in Jeroen Groenendijk et al. (eds), Formal Methods in the Study of Language, Mathematisch Centrum, Amsterdam.
Kamp, Hans & Reyle, Uwe (1993), From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory, Kluwer Academic Publishers, Dordrecht.
Karttunen, Lauri (1998), 'The proper treatment of optimality in computational phonology', Xerox Research Centre Europe manuscript. ROA-258-0498, Rutgers Optimality Archive, http://ruccs.rutgers.edu/roa.html.
Kiparsky, Paul (1982), 'Word-formation and the lexicon', in F. Ingeman (ed.), Proceedings of the 1982 Mid-America Linguistic Conference.
Kuhn, Jonas (2000), 'Generation and parsing in optimality theoretic syntax: issues in the formalization of OT LFG', to appear in Peter Sells (ed.), Formal and Empirical Issues in Optimality-theoretic Syntax, CSLI Publications, Stanford, CA.
Lee, Hanjung (2000), 'Markedness and word order freezing', to appear in Peter Sells (ed.), Formal and Empirical Issues in Optimality-theoretic Syntax, CSLI Publications, Stanford, CA.
Levinson, Stephen C. (1983), Pragmatics, Cambridge University Press, Cambridge.
Levinson, Stephen C. (1987), 'Pragmatics and the grammar of anaphora', Journal of Linguistics, 23, 379-434.
McCawley, James D. (1978), 'Conversational implicature and the lexicon', in Peter Cole (ed.), Syntax and Semantics 9: Pragmatics, Academic Press, New York, 245-59.
Nunberg, G. & Zaenen, A. (1992), 'Systematic polysemy in lexicology and lexicography', in K. Varantola, H. Tommola, T. Salmi-Tolonen, & J. Schopp (eds), Euralex II, Tampere, Finland.
Pesetsky, David (1997), 'Optimality Theory and syntax: movement and pronunciation', in Diana Archangeli & D. Terence Langendoen (eds), Optimality Theory: An Overview, Blackwell, Oxford, 134-70.
Prince, Alan & Smolensky, Paul (1993), 'Optimality theory: constraint interaction in generative grammar', MS, Rutgers University, New Brunswick, NJ and University of Colorado, Boulder (to appear, MIT Press, Cambridge, MA).
Reyle, Uwe (1993), 'Dealing with ambiguities by underspecification: construction, representation and deduction', Journal of Semantics, 10, 123-79.
Sandt, Robert A. van der (1992), 'Presupposition projection as anaphora resolution', Journal of Semantics, 9, 333-77.
Schwarzschild, Roger (1999), 'GIVENness, AvoidF and other constraints on the placement of accent', Natural Language Semantics, 13, 87-138.
Smolensky, Paul (1986), 'Information processing in dynamical systems: foundation of harmony theory', in David E. Rumelhart & James L. McClelland (eds), Explorations in the Microstructure of Cognition, MIT Press, Cambridge, MA, 194-281.
Smolensky, Paul (1996), 'On the comprehension/production dilemma in child language', Linguistic Inquiry, 27, 720-31.
Speas, Margaret (1997), 'Optimality theory and syntax: null pronouns and control', in Diana Archangeli & D. Terence Langendoen (eds), Optimality Theory: An Overview, Blackwell, Oxford, 134-70.
Tesar, Bruce & Smolensky, Paul (2000), Learnability in Optimality Theory, MIT Press, Cambridge, MA.
Williams, Edwin (1997), 'Blocking and anaphora', Linguistic Inquiry, 28, 577-628.
Wilson, Colin (1998), 'Bidirectional optimization and the theory of anaphora', MS, Johns Hopkins University, to appear in Jane Grimshaw, Geraldine Legendre, & Sten Vikner (eds), Optimality Theoretic Syntax, MIT Press, Cambridge, MA.
Wurzel, Wolfgang U. (1998), 'On markedness', Theoretical Linguistics, 24, 53-71.
Zeevat, Henk (1999), 'Explaining presupposition triggers', MS AC99, University of Amsterdam, available from http://www.hum.uva.nl/computerlinguistiek/henk/
Zeevat, Henk (this volume), 'Semantics in optimality theory'.
Zipf, George K. (1949), Human Behavior and the Principle of Least Effort, Addison-Wesley, Cambridge.
Journal of Semantics 17: 217-242
© Oxford University Press 2000

Bi-Directional Optimality Theory: An Application of Game Theory

PAUL DEKKER AND ROBERT VAN ROOY
University of Amsterdam

Abstract

Optimality Theory is catching on in linguistics, first in phonology, then in syntax, and recently also at the semantics/pragmatics interface. In this paper we point to some parallels between principles employed in optimality theoretic interpretation and notions from the well-established field of Game Theory. Optimality theoretic interpretation can be defined as what we call an 'interpretation game', and optimality itself can be viewed as a solution concept for a game. More particularly, optimality can be characterized in terms of the game-theoretical notion of a 'Nash Equilibrium'.

1 INTRODUCTION

If John says that OTS is possibly right, we can infer from this that he thinks it is not obviously, or necessarily, right. What kind of inference is this? Suppose that from 'Possibly A' we can infer semantically that it is possible that A is false. By this assumption we can easily account for the above inference, but we can no longer account for the fact that we might appropriately say 'OTS is possibly right, if not necessarily'. The latter example makes clear that the above inference to the possibility that OTS is wrong cannot be conventionally associated with all sentences in which the sentential clause 'OTS is possibly right' occurs. But how then should we account for the intuition that we can conclude that OTS might be wrong from what John says?

Following Grice (1975), it has become a common practice in the area of pragmatics to distinguish what is said by the speaker's use of a sentence (the semantic or truth-conditional meaning of a sentence) from what is meant by it on a particular occasion. Thus conceived, pragmatics is concerned with the study of what is meant by an utterance above its semantic, or truth-conditional, content, taking into account whether the utterance is appropriate in its conversational context, i.e. with respect to the (common) beliefs and intentions of the participants of the conversation. The main motivation for this division of labour between semantics and pragmatics is to keep the semantics as simple as possible; it allows us to determine the semantic content of a sentence in a compositional way based on its syntactic structure, without making reference to the attitudes of speakers and hearers.
Following Gazdar (1979), the following general pipe-line architecture of the semantics/pragmatics interface has emerged:

1. What is said by a (declarative) sentence, its semantic content, is equated with its truth-conditions.
2. Truth-conditional content can be determined in a rather simple way compositionally without making reference to either what is (or could be) pragmatically implicated by what is said, or the attitudes of the participants of the conversation.
3. To determine what is pragmatically implicated we can, and have to, make use of the truth-conditional content of the sentence; what is potentially implicated might be overruled, or cancelled, if it conflicts with what is semantically entailed, as in our above example 'OTS is possibly right, if not necessarily'.

Thus, according to Gazdar, the semantics/pragmatics interaction goes only one way; although what is pragmatically presupposed or implicated might depend on the semantic content of the sentence, semantics is autonomous from pragmatics. It seems clear to us that this strong Gazdarian picture of the interface must be wrong for the following reason: not only does what is pragmatically implicated depend on the attitudes of the participants of the conversation, but this might also be the case for the truth-conditions that a sentence has. Pragmatic notions like appropriateness, expectation/naturalness and relevance are used both to determine what is conversationally implicated and to determine what is asserted by a sentence. It is clear that this dependence of what is said, or asserted, on pragmatic notions undermines the goal of determining the truth-conditions of sentences in a compositional way. Natural-language sentences are highly context-dependent; their truth conditions depend not only on the words used, but also on the circumstances in which they are used. The crucial point is that it seems impossible to explain systematically the truth-conditions that sentences have without referring to the beliefs, presuppositions and intentions of the participants of the conversation. For an illustrative example, let us consider briefly the process of anaphora resolution for a sentence like 'He is tall'. It is clear that this sentence is highly underspecified or ambiguous; in different contexts the pronoun might refer to different individuals. Its resolution involves reference to such things as focus (Sidner 1983), the syntactic position (subject/non-subject) of the antecedent (Grosz et al. 1995), but also to the scenarios/prototypical situations involved (e.g. Sanford & Garrod 1981).

Although the meaning of the sentence is highly context-dependent, the sentence has a more constant meaning, too; we might say that in all contexts the pronoun refers to the most salient male individual in that context. A Gazdarian might then propose to represent this contextual information in a more or less objective way, without referring to the attitudes of the agents. What is the most salient individual in a context? For some contexts we can give rather objective criteria. For instance, it seems clear that when we utter the above sentence in the context where 'Bill is next to John' has just been uttered, the pronoun will refer to Bill, but when the foregoing sentence would have been 'John is next to Bill', the pronoun would refer to John. The objective criterion in this case is that the (individual denoted by the) subject of a preceding sentence is more salient than the (individual denoted by the) object. But now consider the following discourse: 'Bill tickled John. He squirmed.' According to the above rule the pronoun should refer to Bill. It is clear, however, that according to its most reasonable interpretation the pronoun does not refer to Bill, but to John. Why? Because we assume that it is the tickled person who has reason to squirm; the assertion that John squirmed is more in accordance with the expected scenario triggered by the previous sentence than the assertion that it is Bill who squirmed. We conclude that the speaker asserted that John squirmed, i.e. the constraint that the pronoun refers to the most salient person in its context of interpretation is overruled by the constraint that demands that what is said should be natural in its context of interpretation, i.e. in accordance with the relevant scenario. The triggered scenarios depend on world-knowledge and expectations of the participants of a conversation, which suggests that the relevant contextual parameters cannot be given without making reference to the attitudes of the speakers. But now we are running ahead of ourselves.
For we might think of representing the relevant contextual parameter in the context of interpretation of the sentence in which the pronoun occurs as an 'objective' salience order, when we allow with Lewis (1979) for a rule of accommodation of comparative salience. In principle this is feasible, but note that in this case it is the process of accommodation that is governed by notions like appropriateness, naturalness or relevance that cannot be described without making reference to the attitudes of agents. Notice that according to this variant the relevant contextual parameter that helps to determine what is said (its truth conditions) by an utterance, the salience ordering, crucially depends on the utterance itself; whether and how the salience order should be accommodated depends on what would have been said by this utterance according to the different possible salience orderings. Observe also that in this variant some constraints can be overruled by our general pragmatic notions; in this case not that a pronoun should refer to the most salient individual in its context, but rather that the salience order determined after the interpretation of the first sentence of a discourse will function as the relevant salience order to interpret the anaphoric pronouns of the following sentence.
Bi-Directional Optimality Theory: An Application of Game Theory
2 OPTIMALITY THEORETIC INTERPRETATION
Recently, various phenomena on the semantics/pragmatics interface, like the ones discussed above, have been given an optimality theoretic formulation (Blutner, Hendriks & de Hoop, de Hoop & de Swart, Jäger, Zeevat). In this section, and in section 4, we give a short overview of the various types of analyses that have been proposed, and illustrate these by means of a few examples.
2.1 One-dimensional optimality
According to the proposed application of Optimality Theoretic principles by de Hoop & de Swart (to appear) and Hendriks & de Hoop (2001) to the theory of interpretation, what compositional semantics gives us is a radically underspecified notion of meaning represented by a possibly infinite set of
The above example shows that we cannot systematically determine the semantic content of a sentence in a compositional way based on its syntactic structure, without making reference to the attitudes of speakers and hearers, if we equate the semantic content of a sentence with its truth-conditions. So what should we do? Give up compositionality, or give up the assumption that what should be determined compositionally are the truth-conditions of a sentence? The former, radical, option would result almost surely in giving up the distinction between semantics and pragmatics, as has been proposed in the old days of generative semantics. According to the latter option, compositional semantics still has a role to play. However, the semantic content of a sentence is not fully determined and does not give rise to clear-cut truth-conditions; it is left underspecified. We have only discussed pronouns above, but similar remarks can be, and have been, made for the interpretation of other context-dependent constructions like modals (Kratzer 1977), presuppositions (van der Sandt 1992), quantifier scope (Parikh 1991), tenses (Asher & Lascarides 1993), adjectives (Blutner 1998), and quantified constructions (Hendriks & de Hoop, 2001). For all those cases it has been proposed that what should be determined compositionally should be left rather underspecified, and that to determine the actual truth-conditions of a sentence we have to rely on constraints motivated by principles of rational communication as given, for instance, by Grice's maxims of conversation. This results, obviously, in a new formulation of the semantics/pragmatics interface.
(1) Often when I talk to a doctor$_i$, the doctor$_{\{i,j\}}$ disagrees with him$_{\{i,j\}}$.
In the interpretation of this example two constraints are at work:
(B) If two arguments of the same semantic relation are not marked as being identical, interpret them as being distinct
(DOAP) Don't Overlook Anaphoric Possibilities
In example (1), the two constraints have conflicting effects. If (DOAP) is fully satisfied, that is, if both 'the doctor' and 'him' are interpreted as anaphoric upon 'a doctor', then (B) is violated. And if (B) is satisfied, then at least either 'the doctor' or 'him' remains unresolved. Intuitively, this seems the best solution, and Hendriks & de Hoop therefore use this example to show that constraint (B) is harder than (DOAP). The (DOAP)-principle can be
interpretations of a well-formed syntactic structure. In addition, optimality theory gives us a ranked set of constraints that allow us to select the optimal interpretation associated with a particular syntactic structure. These constraints should of course be as general as possible, and also the rankings between those constraints should, if possible, be valid for a wide range of languages, based on general principles of rational communication. In order to illustrate how things might work out in such a theory, consider again the example that we discussed above with an anaphoric pronoun. The example is of the form aRb. He is P, where in the first sentence a and b are both names for male individuals. Discourses of this form are potentially ambiguous, or underspecified, because the pronoun might refer back to either a or b. But we can say something more; on the basis of empirical data we might observe that the pronoun will typically refer back to the subject expression, i.e. a. We can state this observation explicitly in a constraint. This constraint is very particular, but we might embed this particular constraint within a more general one, if we make use of the notion of comparative salience. In whatever way we do this, the important point is that the relevant constraint should not be too hard; in some circumstances it might be overruled. In the above discussed discourse Bill tickled John. He squirmed, for instance, it does not seem natural to state that Bill squirmed after the first sentence. Because it seems reasonable, with an eye upon the communicative aims, to assume that the constraint on naturalness is more important than the constraint on salience, the constraint that in our case demands that the pronoun should refer to the subject expression of the previous sentence becomes overruled. 
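The selection of an optimal interpretation by ranked, violable constraints can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: the constraint names NATURALNESS and SALIENCE, the candidate labels, and the violation counts are hypothetical stand-ins for the two constraints discussed above, not part of any of the cited analyses.

```python
# Ranked, violable constraints: candidates are compared lexicographically
# on their violation counts, strongest constraint first.
# All names and violation counts below are illustrative assumptions.

RANKING = ["NATURALNESS", "SALIENCE"]  # strongest constraint first

# Candidate interpretations of 'He' in "Bill tickled John. He squirmed."
VIOLATIONS = {
    "he = Bill": {"NATURALNESS": 1, "SALIENCE": 0},  # subject, but odd scenario
    "he = John": {"NATURALNESS": 0, "SALIENCE": 1},  # natural, but not the subject
}


def profile(candidate, ranking=RANKING):
    """Violation counts of a candidate, ordered by constraint strength."""
    return tuple(VIOLATIONS[candidate].get(c, 0) for c in ranking)


def optimal(candidates, ranking=RANKING):
    """Return all candidates whose violation profile is lexicographically minimal."""
    best = min(profile(c, ranking) for c in candidates)
    return [c for c in candidates if profile(c, ranking) == best]


# Naturalness outranks salience, as assumed in the text.
print(optimal(list(VIOLATIONS)))
# Reversing the ranking flips the choice, which shows that the ranking does the work.
print(optimal(list(VIOLATIONS), ["SALIENCE", "NATURALNESS"]))
```

Because the comparison is lexicographic, a single violation of the stronger constraint outweighs any number of violations of the weaker one, which is what it means for the salience constraint to be overruled.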
Thus, although pronouns are meant to refer back to subject expressions of previous sentences, this will only result in an optimal interpretation in case the stronger constraint of naturalness is also met. Another example, discussed in Hendriks & de Hoop, is the following:
2.2 The Q- and I-principles
In his seminal paper on Logic and Conversation, Grice (1975) tried to account for so-called pragmatic inferences by making use of four maxims of conversation: the maxims of quality, quantity, relation, and manner. More recently, some attempts have been made to reduce and explicate these maxims to some more principled rules of, or constraints on, rational behaviour in communication. Valuable contributions in this direction have been made especially by Atlas & Levinson (1981) and Horn (1984), who seek to reduce the maxims of quantity, relation, and manner to the following two principles: the Q-principle (implementing Grice's first maxim of quantity), which advises the speaker to say as much as he can to fulfil his communicative goals, and the I-principle (called R-principle by Horn 1984, and implementing the rest of the Gricean maxims except for quality), which advises the speaker to say no more than he must to fulfil his
1 The idea to compare not only different outputs with each other to determine the optimal interpretation, but also to take different inputs into account, can be traced back to Prince & Smolensky's (to appear) principle of Lexicon Optimization (section 9.3). A bi-directional view on optimality implicitly also plays an important role in the OT learning algorithm (Tesar & Smolensky, to appear). According to this algorithm each piece of positive evidence (structural description) about the correct ordering of constraints brings with it a body of implicit negative evidence; the chosen description is preferred to the given competitors.
overruled in order to satisfy (B), and the 'optimal' interpretation is that either 'the doctor' and not 'him' is anaphoric upon the antecedent 'a doctor', or the pronoun and not the definite description is. So far we have sketched an optimality theoretic formulation of only one of the two types of pragmatic inferences which we discussed in the first section of this paper. So how should we account for the case with which we began our story: the scalar implicature from $\Diamond A$ to $\neg\Box A$? Our intuitive explanation for this implicature was that the speaker did not think it was necessary that OTS was right, because otherwise he would have said so, i.e. he would have used another expression. It is not entirely clear how to account for this reasoning in terms of the above sketched one-dimensional search for optimality where the input is given by a single syntactic structure, and no reference is made to alternative expressions that the speaker might have used. Blutner (MS) has recently argued that an account of scalar implicatures requires us to take into consideration what the speaker could have said, and proposed to go from a one-dimensional to a two-dimensional search for optimality.1 This two-dimensional view was mainly motivated by a reduction of Grice's maxims of conversation to two principles.
2 Notice the resemblance with Sperber & Wilson's (1986) Relevance Theory, according to which meaning (optimal relevance) can be thought of as a balance between the two competing forces of maximization of contextual effect and minimization of processing effort.
communicative goals. By means of the I-principle we can explain, for instance, why in many contexts we can use (short, and thus efficient) pronouns to refer to individuals, instead of long eternal definite descriptions, and it can also help to explain why in many cases the conjunctive connective and gives rise to a temporal, or even causal, interpretation. The Q-principle is responsible for the so-called scalar implicatures, and makes essential reference to alternative expressions the speaker could have used. Although both principles have the effect that the hearers can conclude more from the utterance than what is explicitly said by it, the strengthenings due to the I- and Q-principles typically go in opposite directions. As a result, the two principles sometimes advise the speaker to do opposite things, and thus we would expect that the hearers sometimes do not know what to make of the utterance. For instance, if you say John was able to solve the problem, I can conclude by means of the I-principle that John actually solved the problem, while the Q-principle gives rise to the opposite conclusion that John actually did not solve the problem. (For otherwise you should have said he did so.) Horn (1984), following Zipf (1949), gives an interesting motivation for why the I- and Q-principles seem to give rise to opposite conclusions. He argues that the principles can be seen as representations of rational goals of competing forces to minimize their efforts: the I-principle represents the speaker's goal to minimize the effort to communicate as much as possible, while the Q-principle can be seen to represent the hearer's goal to minimize his effort to understand.2 Looking at both principles from a minimization point of view has the effect that the I-principle and the Q-principle should be seen from two different perspectives: the I-principle from the speaker's perspective, and the Q-principle from the hearer's perspective.
Interestingly, the principles can be viewed, equivalently it seems, from a maximization point of view when we switch roles. That is, an I-maxim requiring a cooperative speaker to say no more than needed will lead a rational hearer to get as much as possible out of what the speaker says; that is, in such a cooperative setting, the I-principle relates to a hearer's goal to maximize the relevance, or informativity, of a given utterance. Conversely, the Q-principle advises the speaker to maximize his contribution to the goal of being as informative as he can (as it indeed was upon Grice's formulation). The two points of view thus collaborate to achieve two mutually dependent goals of the interlocutors: to maximize the cooperative and mutual goal of informativity, and to minimize individual efforts.
2.3 Two-dimensional optimality theoretic interpretation
(2) Two-dimensional OT (Strong Version)
a representation-meaning pair (r, m) is optimal iff it satisfies both the Q- and the I-principle, where:
(Q) (r, m) satisfies the Q-principle iff there is no other pair (r', m) such that (r', m) > (r, m)
(I) (r, m) satisfies the I-principle iff there is no other pair (r, m') such that (r, m') > (r, m)
How does this blocking due to the Q-principle work? Consider the scalar implicature again from Possibly A to Not necessarily A. Let us suppose that the speaker knows all about the possibility of A, and that he has the opportunity to say Possibly A ($\Diamond p$), Necessarily A ($\Box p$) and the negation of these modal possibilities ($\neg\Diamond p = \Box\neg p$ and $\neg\Box p = \Diamond\neg p$, respectively). Let us also assume, as seems quite natural, that $\Box p \models \Diamond p$. Given these
3 Boersma (1998) has recently made a similar move in phonology. He argues that sound structures reflect an interaction between the articulatory and perceptual principles of efficient and effective communication: the speaker-oriented principle of minimization of articulatory effort and the hearer-oriented principle of minimization of perceptual confusion.
Blutner (1998, 1999) has recently given the I- and Q-principle a slightly different formulation such that the Gricean maxims can be seen as being part of a two-dimensional optimality theoretic framework of disambiguation. The I-principle is formulated much like it was above from a maximization point of view, and helps to select the most coherent, or relevant, interpretation. This principle corresponds to the one-directional view on optimality theoretic interpretation as proposed by Hendriks & de Hoop (2001) and de Hoop & de Swart (to appear), which adopt exclusively the hearer's perspective on disambiguation. What is interesting is that Blutner also implements the Q-principle within an Optimality Theoretical framework, thereby also taking the speaker's perspective into account. Where the I-principle compares different possible interpretations for the same syntactic expression, the Q-principle compares different possible syntactic expressions that the speaker could have used to communicate the same meaning. The interesting feature of Blutner's formulation of the Q-principle within two-dimensional OT is that although it compares alternative syntactic inputs to one another, it still helps to select the optimal meaning among the various possible outputs of the single actual syntactic input given, by acting as a blocking mechanism.3 The strong version of Blutner's two-dimensional OT can be formulated as follows (we here relate pairs (r, m) of possible representations (r) and meanings (m), by means of an ordering relation '>', 'being more efficient'):
assumptions, the speaker knows that only one of three logical possibilities obtains: (i) that $\Box p$ (and, hence, $\Diamond p$), (ii) that $\Diamond p \wedge \Diamond\neg p$ (so $\neg\Box\neg p \wedge \neg\Box p$), or (iii) that $\Box\neg p$ (i.e.
3 GAME THEORY AND STRONG OPTIMALITY
The ranking and judging of representations and meanings in optimality theoretic interpretation has a structure which resembles principles developed in the well-investigated field of Game Theory. In this section we present a game-theoretical formulation of Blutner's notion of optimality. (For an in-depth introduction to game theory, cf. e.g. Osborne & Rubinstein 1994.) The first subsection presents an introduction to some of the basics of Game Theory, in particular to that of a strategic game. In the next subsection we present the notion of a 'Nash Equilibrium', a renowned solution concept in Game Theory. In the third subsection we then show how optimality theoretic interpretation can be given a formulation in terms of an interpretation game, and that Blutner's concept of optimality corresponds to precisely this concept of a Nash Equilibrium.
3.1 A formal definition of games
In Game Theory, a 'strategic game' is the formal rendering of a game that can be played with a specific number of players, who can play various roles in the game. In strategic games it is assumed that the players all make one choice at the beginning of the game. The players (simultaneously) choose a strategy, and then they play the game, each according to the strategy chosen.
Furthermore, there is a common preference to communicate as much as possible, that is, a preference for (i) and (iii) over (ii). In this situation, saying Possibly A ($\Diamond p$) implicates $\Diamond\neg p$. For if the speaker had information to the effect that $\neg\Diamond\neg p = \Box p$, he would have said Necessarily A, which is more informative. As he has not done so, and as long as there is no reason to suppose otherwise, the hearer is entitled to infer $\Diamond\neg p$. So, although the sentence Possibly A is logically consistent with both Necessarily A and Possibly not A, the first is blocked (by the Q-principle), because of the existence of an alternative syntactic form that would express that meaning in a more efficient way.
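The blocking effect just described can be made concrete with a small Python sketch of the strong version of two-dimensional OT. The set of admissible pairs, the meaning labels, and the single ordering fact are assumptions chosen for illustration; in particular, the ordering encodes only the assumption that Necessarily A expresses the stronger meaning more efficiently than Possibly A does.

```python
# Strong two-dimensional OT: (r, m) is optimal iff no (r', m) and
# no (r, m') is strictly better. Pairs and ordering are illustrative assumptions.

# Admissible representation-meaning pairs: 'possibly' is consistent with
# both meanings, 'necessarily' only with the stronger one.
PAIRS = {
    ("possibly", "box p"),
    ("possibly", "dia p and dia not p"),
    ("necessarily", "box p"),
}

# BETTER contains (x, y) when pair x is more efficient than pair y.
# Assumption: 'necessarily' expresses 'box p' more efficiently than 'possibly'.
BETTER = {(("necessarily", "box p"), ("possibly", "box p"))}


def satisfies_q(pair):
    """No other representation expresses the same meaning more efficiently."""
    r, m = pair
    return not any(
        ((r2, m), pair) in BETTER for (r2, m2) in PAIRS if m2 == m and r2 != r
    )


def satisfies_i(pair):
    """No other meaning of the same representation is more efficient."""
    r, m = pair
    return not any(
        ((r, m2), pair) in BETTER for (r2, m2) in PAIRS if r2 == r and m2 != m
    )


def strongly_optimal():
    """All pairs that satisfy both the Q- and the I-principle."""
    return {p for p in PAIRS if satisfies_q(p) and satisfies_i(p)}


print(sorted(strongly_optimal()))
```

Under these assumptions the pair of Possibly A with the stronger meaning fails the Q-principle, so the only optimal meaning left for Possibly A is the one that includes Possibly not A.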
It is assumed that the players know what options are available to them and to the other players, and what the outcomes of the game are once the chosen actions are known. A strategic game is formalized as a triple $\langle N, (A_i), (\geq_i) \rangle$ which consists of a set of players $N$, and, for each player $i \in N$, a non-empty set of possible actions $A_i$, and a preference relation $\geq_i$ over the product $\times_{j \in N} A_j$ of possible actions of all players. The intuitive idea behind this definition can be put as follows. Each player $i$ can choose any action from his alternatives $A_i$. If all the players have made their choice, we get what is called an 'action profile'. Intuitively, such a profile is one of the possible courses which a game may take. If our players are $1, \ldots, n$ and if they choose actions $a_1, \ldots, a_n \in \times_{j \in N} A_j$ then that's one possible 'run' of the game. Players are assumed to choose an action which has a preferred result. Preferences over results are given by the preference relations $(\geq_i)$ which are taken to depend wholly and only on the particular actions which the players may choose. Thus, if the players $1, \ldots, n$ choose actions $a^* = a_1, \ldots, a_n$, respectively, then the result may be better for one player $i$ than when they choose $b^* = b_1, \ldots, b_n$. In that case, we find that $a^* >_i b^*$, that is, $a^* \geq_i b^*$ and not $b^* \geq_i a^*$. Obviously, it may be the case that $a^* >_i b^*$ and $a^* <_j b^*$ for two profiles $a^*$ and $b^*$ and players $i$ and $j$. (This is the case, typically, when two players have competing or conflicting interests.) In general it is assumed that preference relations are reflexive, transitive, and complete.
It may be clear, even from these introductory comments, that the consequences of a particular choice of player $i$ for action $a_i$ generally depend, not only on this particular choice, but also on the choices which the other players make. Thus, if the players $1, \ldots, n$ choose the action profile $a^* = a_1, \ldots, a_n$, respectively, then player $i$ may be happy about the result, but if player $i$ sticks to his choice $a_i$, while the others $1, \ldots, i-1, i+1, \ldots, n$ happen to choose $b_1, \ldots, b_{i-1}, b_{i+1}, \ldots, b_n$, the result may be less welcome for $i$, of course. On the other hand, if we may assume that the other players $1, \ldots, i-1, i+1, \ldots, n$ choose $a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n$ respectively, then player $i$ is assumed to choose an action $a_i$ such that outcome or profile $a^* = a_1, \ldots, a_n$ is at least as good as any alternative profile $a_1, \ldots, a_{i-1}, b_i, a_{i+1}, \ldots, a_n$ which may result from an alternative choice of $i$ for $b_i$.
A note on notation: if we have a profile $a^* = a_1, \ldots, a_n$, then we use $a^*_{-i}$ to indicate the list of the profile's strategies of all players except $i$, i.e. $a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n$, and we use $(a^*_{-i}, b_i)$ to indicate the profile which is like $a^*$ with the sole difference that $i$ chooses $b_i$ instead of $a_i$. Typically, of course, $a^* = (a^*_{-i}, a_i)$. In order to clarify these notions a bit more, consider the following somewhat stylized example. A famous two-player game is a 'coordination
game' called 'Bach or Stravinsky'.4 In this game two persons want to go out. They can choose between the performance of a concert of Bach and the performance of a concert of Stravinsky. One player (Bonnie) prefers to go to Bach, the other (Clyde) prefers Stravinsky, but the main concern of both players is to go out together. Formally, this corresponds to a game $\langle N, (A_i), (\geq_i) \rangle$, where
(3) the set of players $N = \{b, c\}$ consists of Bonnie and Clyde
(4) the sets of possible actions of Bonnie and Clyde, $A_b = A_c = \{B, S\}$, consist of (a choice for) Bach and Stravinsky
(5) $(B, B) >_b (S, S) >_b (B, S) >_b (S, B)$
(6) $(S, S) >_c (B, B) >_c (B, S) >_c (S, B)$
A convenient representation of two-player games can be given in a two-dimensional matrix, in which the various rows represent the possible actions of player one (Bonnie) and the columns the possible actions of player two (Clyde):
(7)
          B         S
   B    (3, 2)    (1, 1)
   S    (0, 0)    (2, 3)
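The relation between the payoff pairs of matrix (7) and the preference orderings of the two players can be checked mechanically. The following Python sketch is an illustration only; the dictionary layout and the function name `ranking` are our own choices, and the payoff values are copied from the matrix.

```python
from itertools import product

ACTIONS = ("B", "S")  # Bach, Stravinsky

# Payoff pairs (Bonnie, Clyde) for each profile (Bonnie's action, Clyde's action),
# copied from matrix (7).
PAYOFF = {
    ("B", "B"): (3, 2), ("B", "S"): (1, 1),
    ("S", "B"): (0, 0), ("S", "S"): (2, 3),
}

# The action profiles are the elements of the product A_b x A_c.
PROFILES = list(product(ACTIONS, repeat=2))


def ranking(player):
    """Profiles sorted from most to least preferred for player 0 (Bonnie) or 1 (Clyde)."""
    return sorted(PROFILES, key=lambda prof: PAYOFF[prof][player], reverse=True)


print(ranking(0))  # Bonnie's ordering
print(ranking(1))  # Clyde's ordering
```

Only the relative order of the numbers matters here: any other payoff assignment that induces the same two rankings represents the same strategic game.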
In this matrix, we have filled in payoff pairs (n, m) which indicate the relative payoff of a specific action profile (x, y) for Bonnie and Clyde, respectively. Thus, the pair (3, 2) indicates the relative payoff of Bonnie (3) and Clyde (2) when Bonnie and Clyde both choose Bach. For Bonnie this constitutes a better payoff than the one in which both choose Stravinsky, because in that case we find a relative payoff pair (2, 3) where Bonnie's payoff (2) is less than 3. For a similar reason, the last profile is better for Clyde, because he prefers a joint choice for Stravinsky over a joint choice for Bach. However, both of these profiles are better than the two in which
4 Originally known as 'The Battle of the Sexes'.
The profiles of this game are (B, B), (B, S), (S, B), and (S, S), where (x, y) indicates the profile which obtains when Bonnie chooses x and Clyde chooses y. Since Bonnie and Clyde definitely prefer to go out together, they both prefer (B, B) and (S, S) over the two other profiles (B, S) and (S, B). Since Bonnie moreover prefers Bach, she also prefers (B, B) over (S, S) and (B, S) over (S, B). Similarly, Clyde prefers (S, S) over (B, B), and (B, S) over (S, B). The preferences of Bonnie and Clyde, $>_b$ and $>_c$, can thus be summarized as follows:
they do not go out together, and in which they at best reach a payoff of only one.
3.2 Nash equilibria as solutions
(8) $\forall i \in N$ and $a_i \in A_i$: $a^* \geq_i (a^*_{-i}, a_i)$
Intuitively, this says the following. A Nash Equilibrium is a profile in which each player's action is a best response to the choices of the other players in that profile. For no player $i$ is there any alternative $a_i$ for the action $a^*_i$ which he chooses in $a^*$, by means of which he can get a better payoff, given that all the other players choose as they choose in $a^*$. A Nash Equilibrium clearly need not give the best possible result which one player might prefer. A player gets the best payoff relative to the choices of the other players in the profile, and this really is an equilibrium because this holds for all players. If we now return to the example which we discussed above we can see that it has two Nash Equilibria, the one in which both Bonnie and Clyde choose Bach, and the one in which both choose Stravinsky. It is expedient to see why these profiles qualify as equilibria. The profile (B, B) is a Nash Equilibrium because, given that Bonnie chooses Bach, the best possible outcome for Clyde obtains when he chooses Bach as well (since $(B, B) >_c (B, S)$), while given that Clyde chooses Bach, Bach is also the very best choice for Bonnie (since $(B, B) >_b (S, B)$). Something analogous holds of the (S, S) equilibrium. In both profiles, neither of the two players has reason to deviate from the choice he actually makes. Surely, when Bonnie considers the Nash Equilibrium (S, S) she might reason as follows: 'well, I better choose Bach rather than Stravinsky, because given that choice, it is better for Clyde to choose Bach as well, and I like (B, B) better than (S, S)' and therefore choose Bach after all. However, this type of reasoning does not by itself constitute a sound solution concept, because if Clyde also reasons this way, he will choose Stravinsky, and the outcome is (B, S), a profile that is worse, for both Bonnie and Clyde, than the outcome of each of the two mentioned equilibria.
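The equilibrium condition can be checked by brute force for a finite game. The Python sketch below does so for the Bach or Stravinsky game; the function names `deviate`, `is_nash` and `nash_equilibria` are our own, and numerical payoffs stand in for the preference relations, which is an assumption of convenience rather than part of the definition.

```python
from itertools import product

# Bach or Stravinsky: payoffs (Bonnie, Clyde) per profile (Bonnie's action, Clyde's action).
ACTIONS = [("B", "S"), ("B", "S")]  # action sets of player 0 and player 1
PAYOFF = {
    ("B", "B"): (3, 2), ("B", "S"): (1, 1),
    ("S", "B"): (0, 0), ("S", "S"): (2, 3),
}


def deviate(profile, player, action):
    """The profile that is like `profile`, except that `player` plays `action`."""
    changed = list(profile)
    changed[player] = action
    return tuple(changed)


def is_nash(profile, actions=ACTIONS, payoff=PAYOFF):
    """True iff no player has a strictly better unilateral deviation."""
    return all(
        payoff[profile][i] >= payoff[deviate(profile, i, a)][i]
        for i in range(len(actions))
        for a in actions[i]
    )


def nash_equilibria(actions=ACTIONS, payoff=PAYOFF):
    """All action profiles that satisfy the equilibrium condition."""
    return [p for p in product(*actions) if is_nash(p, actions, payoff)]


print(nash_equilibria())
```

The check also confirms the point about Bonnie's reasoning: the profile (B, S), which results when both players insist on their favourite concert, gives each of them a lower payoff than either equilibrium.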
The nice point about the two Nash Equilibria in the Bach or Stravinsky game is that the two equilibria are not
One of the central notions in game theory is that of a solution concept. In general, solution concepts are abstract and formal specifications of certain optimality recipes. They relate to the reasonable choices which players may make, given some notion of rationality and common knowledge. A very well known solution concept is that of a 'Nash Equilibrium'. A Nash Equilibrium of a strategic game $\langle N, (A_i), (\geq_i) \rangle$ is an action profile $a^* \in \times_{j \in N} A_j$ such that:
absolutely optimal profiles for both players, but optimal profiles relative to the other's choices. Both equilibria are satisfying for both players in this sense, or 'stable'. In the definition of a Nash equilibrium, the only preferences that really count are those between two action profiles $a^*$ and $b^*$ if their only difference lies in the choice of $i$, i.e. if $a^*_{-i} = b^*_{-i}$. Furthermore, non-strict preferences, where both $a^* \geq_i b^*$ and $b^* \geq_i a^*$, do not count either. (In a Nash Equilibrium, players may have alternative options which are equally good, as long as they are not strictly better.) For this reason, Nash Equilibria in two-player games can be visualized by drawing arrows between two profiles on the same row, or in the same column, with the following meaning: ← means 'player 2 strictly prefers the left profile,' → means 'player 2 strictly prefers the right profile,' ↑ means 'player 1 strictly prefers the top profile,' and ↓ means 'player 1 strictly prefers the bottom profile.' The Bach or Stravinsky game then boils down to the following table:
(9)
          B       S
   B      ∘   ←
          ↑       ↓
   S          →   ∘
If in such a table no arrow leaves from a certain cell, then the corresponding profile is a Nash Equilibrium, here indicated by ∘. This diagram clearly shows the dependence of the two preferences of each player upon the possible choices of the other. Player 1 (Bonnie) has ↑ in case Clyde chooses Bach, and ↓ if Clyde chooses Stravinsky. Similarly, Clyde's preferences (← and →) vary with the possible choices of Bonnie (Bach and Stravinsky, respectively).
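The arrow notation lends itself to a direct computation for two-by-two games. The sketch below, written in Python with ASCII arrows (`<-`, `->`, `^`, `v`) in place of the printed ones, derives the arrows from numerical payoffs and lists the cells that no arrow leaves. The payoff values are those of the Bach or Stravinsky game; the function names are our own.

```python
# Arrows between neighbouring cells of a 2x2 game.
# Rows are player 1's actions, columns are player 2's actions.
PAYOFF = [
    [(3, 2), (1, 1)],  # row B: profiles (B, B) and (B, S)
    [(0, 0), (2, 3)],  # row S: profiles (S, B) and (S, S)
]


def horizontal(row):
    """Player 2 compares the two cells of a row."""
    left, right = PAYOFF[row][0][1], PAYOFF[row][1][1]
    return "<-" if left > right else "->" if right > left else "  "


def vertical(col):
    """Player 1 compares the two cells of a column."""
    top, bottom = PAYOFF[0][col][0], PAYOFF[1][col][0]
    return "^" if top > bottom else "v" if bottom > top else " "


def leaves(row, col):
    """True iff some arrow points away from cell (row, col)."""
    h, v = horizontal(row), vertical(col)
    leaves_h = (h == "->" and col == 0) or (h == "<-" and col == 1)
    leaves_v = (v == "v" and row == 0) or (v == "^" and row == 1)
    return leaves_h or leaves_v


# Cells without outgoing arrows are the Nash Equilibria.
stable = [(r, c) for r in range(2) for c in range(2) if not leaves(r, c)]
print([horizontal(0), horizontal(1)], [vertical(0), vertical(1)], stable)
```

Ties produce no arrow at all, which mirrors the remark that non-strict preferences do not count in the definition of a Nash Equilibrium.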
3.3 Interpretation games
From these introductory remarks the reader may already feel some connection between the notion of a solution concept and that of optimality. Both rely on a notion of 'better than' and both acknowledge a form of non-perfect optimality. Actually, we can formulate the optimality theoretic interpretation as an interpretation game. An interpretation game is played between two players, an (abstract) speaker (S) and an (abstract) hearer (H). On the one hand, the speaker wants to communicate a certain meaning and she has to choose a suitable formulation for it; on the other, the hearer gets confronted with a certain formulation, and he has to assign it a suitable interpretation. Thus, the speaker's possible actions are given by the set of possible representations, the
(10)
           B       S
   'b'     ∘   ←
           ↑       ↓
   's'         →   ∘
Trivially, this interpretation game of the Bach or Stravinsky variety has two Nash Equilibria, which also constitute two optimal interpretations ('b', B) and ('s', S). For given that S wants to refer to B, he had better use 'b' and given that H hears 'b', the interpretation better be B. Similarly, for the profile or interpretation ('s', S). As trivial as the example may be, it certainly shows the parallel in the type of reasoning involved in the determination of optimality as an equilibrium. Let us now turn to two more interesting examples reminiscent of one we discussed above, viz. (1):
(11) Bill loves himself.
(12) Bill loves him.
In a matrix, the interpretation of the two sentences can be rendered as follows:
(13)
             Lbb     Lbx
   'self'     ∘   ←
              ↑
   'him'          →   ∘
There are two possible representations, 'self', which is short for (11) and 'him', short for (12). Assuming that these are evaluated in a context where Bill is salient already, there are two possible interpretations: that Bill loves himself (Lbb) and that Bill loves someone else (Lbx), a person who
hearer's actions are given by the set of possible meanings, and the profiles are pairs of representations and possible meanings. Optimality theoretic preferences > can next be used to define preference relations $>_S$ and $>_H$ over these pairs, and given these preference relations, some pairs of representation and meaning come out as optimal. Finally, when we evaluate for optimality, we always look along one dimension at a time. An optimal profile is one for which no player has a strictly better alternative, given that the other dimension remains fixed. By way of illustration, consider a very simple and stylized example. Suppose that we have two names, 'Bach' and 'Stravinsky', or 'b' and 's', for short, and two possible referents, Bach (B) and Stravinsky (S). Suppose that we also have two semantic constraints, according to which 'b' preferably refers to B, and 's' to S. This game can be displayed as follows:
presumably is to be found in the context. The arrows indicate the preferences resulting from principles (B) and (DOAP):⁵

(B) If two arguments of the same semantic relation are not marked as being identical, interpret them as being distinct
(DOAP) Don't Overlook Anaphoric Possibilities

As we said, it is assumed that (B) is stronger than (DOAP). Given this, the profile ('him', Lbb) is ruled out by ('self', Lbb) because it violates (B) and there is a better alternative, and this is indicated by ↑. Similarly, and as ← indicates, ('self', Lbx) is ruled out by ('self', Lbb), because it violates (DOAP). Finally, although ('him', Lbx) violates (DOAP), it is better than ('him', Lbb), since the latter violates (B), which is judged a stronger constraint. As the picture shows, the matrix has two Nash Equilibria, ('self', Lbb) and ('him', Lbx), precisely the two representation-meaning pairs argued for.⁶

Before we carry on, it is expedient to inspect some general properties of interpretation games. It is easily seen that the following holds:

Observation 1 (Optimality Subsumes Nash)
• a profile is strongly optimal if and only if it is a Nash Equilibrium
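The 'self'/'him' game discussed above offers a quick check of this observation. The following is a minimal sketch, assuming the violation marks stated in the text and the ranking of (B) above (DOAP). The dictionary encoding and the function names are hypothetical choices made for illustration.

```python
# Constraint ranking, strongest first, as assumed in the text: (B) >> (DOAP).
RANKING = ["B", "DOAP"]

# Violation marks as described in the discussion of the 'self'/'him' game.
# The encoding as a dictionary is an illustrative assumption.
VIOLATIONS = {
    ("self", "Lbb"): set(),
    ("him", "Lbb"): {"B"},
    ("self", "Lbx"): {"DOAP"},
    ("him", "Lbx"): {"DOAP"},
}


def cost(profile):
    """Violation vector ordered by constraint strength (smaller is better)."""
    return tuple(int(c in VIOLATIONS[profile]) for c in RANKING)


def better(p, q):
    """p > q: p and q share exactly one coordinate and p costs strictly less."""
    share_exactly_one = (p[0] == q[0]) != (p[1] == q[1])
    return share_exactly_one and cost(p) < cost(q)


def nash_equilibria():
    """Profiles with no strictly better alternative along either dimension."""
    return sorted(p for p in VIOLATIONS
                  if not any(better(q, p) for q in VIOLATIONS))


print(nash_equilibria())  # [('him', 'Lbx'), ('self', 'Lbb')]
```

Comparing the violation vectors lexicographically is what makes (B) count as the stronger constraint: ('him', Lbx) beats ('him', Lbb) even though it violates (DOAP).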
The next observation relies on the assumption that the ordering relation > is well-founded, an assumption enforced by Jäger's requirement that it is (cf. below):

Observation 2 (no Nash, no optimality)
• every interpretation game has a Nash Equilibrium
Proof. Given that > is well-founded, there is at least one (r, m) such that there is no (r', m') > (r, m); a fortiori, there is no (r', m) or (r, m') such that (r', m) >_S (r, m) or (r, m') >_H (r, m), so (r, m) is a Nash Equilibrium. End of Proof.

5 The arrows in these matrices thus do not show the rankings of the constraints themselves, but the effects of their rankings on the preferences of the speaker and the hearer, respectively. As will be shown in more detail below, different constraints and different rankings may eventually yield the same preferences for speaker and hearer.
6 We thank Reinhard Blutner for pointing out a flaw in an earlier presentation we gave of de Hoop's analysis.

The last observation is of interest from a linguistic perspective. In Game Theory, the absence of Nash Equilibria is not at all unusual, for instance in the case of zero-sum games like 'Heads or Tails', which can be displayed as follows:
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
Bi-Directional Optimality Theory: An Application of Game Theory
          H         T
H      (0, 1)    (1, 0)
T      (1, 0)    (0, 1)

[The same game is also shown as an arrow diagram over H and T; the diagram is not recoverable here.]
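That this game has no equilibrium among its four profiles can be checked mechanically. The sketch below reads the payoff pairs off the matrix above, taking the first number as the row player's payoff and the second as the column player's (an assumption about the layout), and tests every profile for unilateral deviations.

```python
# Payoffs (row player, column player) for 'Heads or Tails', copied from the
# matrix above. The variable and function names are illustrative only.
PAYOFF = {
    ("H", "H"): (0, 1), ("H", "T"): (1, 0),
    ("T", "H"): (1, 0), ("T", "T"): (0, 1),
}
ACTIONS = ["H", "T"]


def is_nash(row, col):
    """True if neither player gains by deviating on their own."""
    u_row, u_col = PAYOFF[(row, col)]
    row_ok = all(PAYOFF[(r, col)][0] <= u_row for r in ACTIONS)
    col_ok = all(PAYOFF[(row, c)][1] <= u_col for c in ACTIONS)
    return row_ok and col_ok


pure_equilibria = [(r, c) for r in ACTIONS for c in ACTIONS if is_nash(r, c)]
print(pure_equilibria)  # []
```

In every profile the losing player does better by switching, so no profile is stable. The check concerns the four profiles only, which is the notion relevant for interpretation games.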
Well-foundedness of > means that we are dealing with a particular type of game, in which solutions are guaranteed to exist. It is easily acknowledged that this makes sense: if an interpretation game were to have no solutions, then communication would be quite a void enterprise indeed. A couple of other more general observations can be made at this point. Of course, an interpretation game would also be void if all profiles were Nash Equilibria. In that case any representation could be associated with any interpretation. With an eye on the use of language in communication, the ideal situation would obtain if the set of solutions is a one-to-one relation between the set of possible representations and the set of possible meanings. Interesting mixed cases can be characterized as well. Ambiguity obtains in situations in which the set of solutions is one-to-many; when the solutions are many-to-one we have synonymy; and when certain possible meanings do not occur in solutions we have expressive incompleteness.

4 GAMES AND WEAK OPTIMALITY

We have seen above that Blutner's strong version of two-dimensional OT can be neatly formulated using the game-theoretical concept of a Nash Equilibrium. However, Blutner (1998), and subsequently Jäger (1999) and Zeevat (1999), have employed a 'weak' notion of optimality which is more subtle than the one we discussed in section 2. In this section we discuss this refinement, and show that it also can be given a very intuitive Game Theoretical formulation.

4.1 Blutner/Jäger optimality

In his (1998) paper, Blutner argues that the strong notion of optimality presented in section 2 is not entirely satisfactory. This notion does not enable us to account for Horn's (1984) division of pragmatic labour, the intuition that unmarked forms tend to be used for unmarked situations and marked forms for marked situations. To account for cases where Horn's division of pragmatic labour is relevant, Blutner (1998) then proposes a weak version of two-dimensional OT,
according to which the two dimensions of optimization are mutually related:

(15) Two-dimensional OT (Weak Version)
a representation-meaning pair (r, m) is super-optimal iff it satisfies both the Q- and the I-principle, where:
(Q) (r, m) satisfies the Q-principle iff there is no other pair (r', m) which satisfies the I-principle such that (r', m) > (r, m)
(I) (r, m) satisfies the I-principle iff there is no other pair (r, m') which satisfies the Q-principle such that (r, m') > (r, m)

(Notice that this definition employs strict preferences over representation-meaning pairs.) A possibly more transparent formulation of super-optimality has been proposed by Jäger (ms):

(16) a representation-meaning pair (r, m) is optimal iff
(Q) there is no other optimal pair (r', m): (r', m) > (r, m)
(I) there is no other optimal pair (r, m'): (r, m') > (r, m)

Under the assumption that > is transitive and well-founded, Jäger observes

(17) a representation-meaning pair is optimal in the Jäger sense if and only if it is super-optimal in the Blutner sense

Jäger's assumptions about > can be argued to be pretty harmless. Transitivity, of course, is a very natural property of the 'better than' relation >, and well-foundedness is natural, too. The important difference between the weak and strong notions of optimality is that the weak one accepts (super-)optimal representation-meaning pairs that would not be optimal according to the strong version. It typically allows marked expressions to have an optimal interpretation, although both the expression and the cases they describe have a more efficient, or more typical, counterpart. Consider, for instance, the following minimal pair discussed by Horn (1984):

(18) Lee stopped the car.
(19) Lee made the car stop.

The use of the unmarked lexical causative 'stopped' in (18) intuitively has the result that the sentence will be about an event where the car stopped in the stereotypical way, i.e. where the driver of the car stepped on the brake pedal. This by itself can be explained by means of the strong version of two-dimensional OT, and corresponds to a Nash Equilibrium; unmarked is preferred to marked, and stereotypical ways of stopping cars are easier to understand than alternative unusual methods. But the strong version cannot explain why also the marked form, (19), has an interpretation; the
interpretation where the car was stopped in an unusual way (pulling the emergency brake, telekinesis, etc.). It is easy to see, however, that the weak version of two-dimensional OT can explain why (19) gets this interpretation. The marked form gets the atypical interpretation, because this form-meaning pair is optimal: (i) the alternative sentence (18) does not get this atypical interpretation, and (ii) we prefer to refer to the typical situation by using (18) instead of (19).

For another example where, because of the division of pragmatic labour, the more specialized, or more complex, form of two in principle coextensive expressions will be associated with the less preferred reading, look at the two following sentences discussed by Levinson (1987):

(20) He_i wants PRO_{i, j} to win.
(21) He_i wants him_{i, j} to win.

Although a full pronoun like 'him' could in principle refer to the same object as the null PRO, the selection of the full pronoun over its empty counterpart in fact signals the absence of the coreferential reading. On the assumption that coreferentiality is the preferred, or typical, option, strong optimality can explain why (20) gets the coreferential reading. But we need weak optimality to explain why also (21) gets a reading, namely the less preferred non-coreferential one. The reason is, again, that the preferred coreferential reading is blocked due to the existence of the less lexicalized expression (20) that could have been used.

Before we turn to the game-theoretical formulation of the Blutner/Jäger notion of (weak) optimality, it is expedient to present Jäger's algorithm for computing optimal representation-meaning pairs. The algorithm computes which pairs are optimal and which are blocked, in a recursive manner. It starts off with empty sets OPT and BLO of optimal and blocked pairs and terminates when all pairs are either optimal or blocked. It is convenient to indicate the pairs which have not yet been classified as pairs which are still in the game: GAM is the complement of OPT ∪ BLO. (Thus, at the start of the algorithm, all pairs are in the game; in the end GAM is empty.) The algorithm is defined as follows:

(22) OPT = ∅; BLO = ∅;
     while GAM ≠ ∅:
       OPT = OPT ∪ {(r, m) ∉ BLO | ¬∃(r', m') ∈ GAM: (r', m') > (r, m)};
       BLO = BLO ∪ {(r, m) ∉ OPT | ∃(r', m) or (r, m') ∈ OPT};
     return OPT;

By means of this procedure, first all the strongly optimal representation-meaning pairs are selected as OPT; then those pairs are selected as blocked for which there is an optimal alternative along the Q- or I-dimension; then those for which there is no better alternative in the game are selected as OPT, etc. When all pairs are thus categorized, the algorithm returns the set of Jäger optimal (i.e. Blutner super-optimal) pairs as output. In what follows, these are called BJ-optimal.

4.2 A game-theoretical definition of BJ-optimality

We have seen that the notion of strong optimality corresponds to that of a Nash Equilibrium. Now, although weak or BJ-optimality and Nash are also closely related, they are not the same, of course. BJ-optimality is a weaker (or softer) notion, so that the set of Nash Equilibria of an interpretation game is or can be a proper subset of the optimal solutions. For instance, for some representation-meaning pairs (r, m) there may be 'better' alternatives (r', m) or (r, m'), which however do not qualify as optimal, if there are yet other alternatives (r', m') which are. A nice illustration can be given by means of a reanalysis of de Hoop's case of 'self' versus 'him', which was suggested to us by Reinhard Blutner. According to this analysis, there are two constraints at work, an expressive constraint 'referential economy' (RE) and an interpretive constraint 'local antecedent' (LA):

(RE) a reflexive element is preferable to a pronoun
(LA) a syntactic domain must contain a pronoun's antecedent

The effect of these constraints can be modelled by means of the following matrix for the corresponding interpretation game:

(23)             Lbb         Lbx
     'self'       o     ←     •
                  ↑           ↑
     'him'        •     ←     •

where 'self' is again short for the sentence 'Bill loves himself' and 'him' for 'Bill loves him'. In this case there is a clear preference for using sentence 'self' (the two ↑'s), and a preference for interpreting 'self' and 'him' as Bill (the two ←'s). As can be seen from the diagram, this game has only one Nash Equilibrium: ('self', Lbb), the only profile from which no arrow leaves. However, there is also a BJ-optimal profile ('him', Lbx) which is not a Nash Equilibrium. For, although there are better alternatives ('him', Lbb) and ('self', Lbx), these are themselves both overruled by the alternative ('self', Lbb). In other words, although ('him', Lbb) >_H ('him', Lbx), and ('self', Lbx) >_S ('him', Lbx), these preferences do not count because the
preferred alternatives are each blocked by the Nash/optimal ('self', Lbb), since ('self', Lbb) >s ('him', Lbb) and ('self', Lbb) >H ('self', Lbx). In the representation of an interpretation game we can visualize this kind of blocking by removing arrows. That is, if a profile points to a Nash Equilibrium, then all pointers to that profile can be removed. If we, thus, remove the arrows pointing to profiles which point to the equilibrium o in the example above, then we get the following, derived game:
(24)             Lbb         Lbx
     'self'       o     ←     •
                  ↑
     'him'        •           o
In the resulting interpretation game we find two Nash Equilibria, corresponding to the two BJ-optimal solutions in the original game. This result can be generalized for more involved games with more than two representations and meanings. In such more involved games, the removal of preferences may yield games with new equilibria, and these in their turn may block yet other alternatives. Thus, if we successively keep on removing preferences for blocked profiles, then we collect more and more possible solutions, and if this process reaches a fixed point, then all the resulting Nash Equilibria of the fixed point correspond to the BJ-optimal pairs in the original game. As a matter of fact, such a procedure is the Interpretation Game Theoretical counterpart of Jäger's algorithm.

Formally, this procedure can be specified as follows. Let I_0 be an interpretation game (N, (A_S, A_H), (>_{S,0}, >_{H,0})), with >_i a strict preference relation. Then we define the game I_{n+1}, which is the game I_n with updated preferences, as follows:

(25) I_{n+1} = (N, (A_S, A_H), (>_{S,n+1}, >_{H,n+1})) with
     1. >_{S,n+1} = >_{S,n} \ {(y, z) | ∃x ∈ NE_{I_n}: x >_{H,n} y} and
     2. >_{H,n+1} = >_{H,n} \ {(y, z) | ∃x ∈ NE_{I_n}: x >_{S,n} y}

(In this definition NE_{I_n} indicates the set of Nash Equilibria of game I_n.) If we now construct a sequence of interpretation games I_0, ..., I_n, ... and if we find that I_{n+1} = I_n, then:

Observation 3 (BJ-solutions are Nash in updated games)
• the BJ-optimal solutions of I_0 are the Nash Equilibria of I_n
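Observation 3 can be tried out on game (23). The sketch below implements the update in (25) alongside one reading of algorithm (22) and compares their outputs. Three assumptions are built in and should be kept in mind: preferences are stored as (better, worse) pairs, the relation > is taken to hold only between profiles that share a form or a meaning, and a profile counts as blocked only if it has a strictly better optimal alternative. All names are hypothetical.

```python
# Game (23): profiles are (form, meaning) pairs. Each preference set holds
# (better, worse) pairs; S compares forms for a fixed meaning, H compares
# meanings for a fixed form. The encoding is an illustrative assumption.
PROFILES = [("self", "Lbb"), ("self", "Lbx"), ("him", "Lbb"), ("him", "Lbx")]
S_PREF = {(("self", "Lbb"), ("him", "Lbb")), (("self", "Lbx"), ("him", "Lbx"))}
H_PREF = {(("self", "Lbb"), ("self", "Lbx")), (("him", "Lbb"), ("him", "Lbx"))}


def nash(profiles, s, h):
    """Profiles from which no arrow leaves."""
    dominated = {worse for (_, worse) in s | h}
    return {p for p in profiles if p not in dominated}


def update(profiles, s, h):
    """One step of (25): drop preferences for profiles an equilibrium beats."""
    ne = nash(profiles, s, h)
    below_ne_for_h = {y for (x, y) in h if x in ne}
    below_ne_for_s = {y for (x, y) in s if x in ne}
    s_next = {(y, z) for (y, z) in s if y not in below_ne_for_h}
    h_next = {(y, z) for (y, z) in h if y not in below_ne_for_s}
    return s_next, h_next


def bj_by_update(profiles, s, h):
    """Iterate (25) to a fixed point and return its Nash Equilibria."""
    while True:
        s_next, h_next = update(profiles, s, h)
        if (s_next, h_next) == (s, h):
            return nash(profiles, s, h)
        s, h = s_next, h_next


def bj_by_jaeger(profiles, s, h):
    """One reading of algorithm (22); needs well-founded preferences."""
    better = s | h
    opt, blo = set(), set()
    while opt | blo != set(profiles):
        gam = set(profiles) - opt - blo
        new_opt = {p for p in gam if not any((q, p) in better for q in gam)}
        if not new_opt:
            raise ValueError("preferences are not well-founded")
        opt |= new_opt
        # Assumption: blocking requires a strictly better optimal alternative.
        blo |= {p for p in set(profiles) - opt
                if any((q, p) in better for q in opt)}
    return opt


print(sorted(bj_by_update(PROFILES, S_PREF, H_PREF)))
print(sorted(bj_by_jaeger(PROFILES, S_PREF, H_PREF)))
```

Both calls return ('him', Lbx) together with ('self', Lbb). The same two functions can be applied to larger games, for instance one in which forms and meanings are each linearly ordered, where both return the diagonal.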
This fact can be proved by comparing the update of preferences with Jäger's algorithm for computing optimal solutions. Jäger's procedure involves the iterated generation of optimal and blocked profiles. In the first run of this procedure, profiles are accepted as optimal that are Nash Equilibria in I_0,⁷ and next those are blocked that have an optimal alternative. It is relatively easily seen that:

1. updates of preferences preserve Nash Equilibria;
2. if an update produces a new Nash Equilibrium, then the same profile was BJ-optimal at earlier stages;
3. if we reach a fixed point I_n, then all profiles either are a Nash Equilibrium (have no arrow leaving that profile), or are blocked (point at a Nash Equilibrium).

7 Since the procedure starts with empty sets of blocked and optimal profiles, the selected optimals (r, m) are those for which there is no preferred alternative (r', m'); of course it may be that there is such an alternative for a Nash Equilibrium, in case r' ≠ r and m' ≠ m. However, if (r, m) really is a Nash Equilibrium, then it will never get blocked, and as soon as (r', m') is qualified as either optimal or blocked at some stage, then (r, m) gets accepted as optimal at the next stage. Well-foundedness of Jäger's > guarantees this effect.
8 The Game Theoretical formulation of BJ-optimality is close in spirit to von Neumann & Morgenstern's (1944) notion of a Stable Set in a coalitional game. Stable Sets are minimal sets of outcomes for which there are no other preferable stable outcomes. Although the concept is framed in terms of outcomes of coalitional games, the idea is clearly similar. Cf. e.g. Osborne & Rubinstein (1994: 278ff) for more discussion.
Here we witness one merit of viewing optimality theoretic interpretation in terms of (interpretation) games: BJ-optimal solutions can be characterized by means of the independently motivated and well-studied notion of a Nash Equilibrium.⁸

The update procedure defined above can be illustrated by means of a somewhat artificial but illuminating example. Suppose the possible representations are linearly ordered, so that we can number them: r_0, r_1, ..., and that the possible meanings are linearly ordered, too: m_0, m_1, .... In this game I_0 there is one Nash Equilibrium, which is (r_0, m_0). If we update the preferences in this game, then all H's preferences for (r_1, m_0), (r_2, m_0), ... are removed, because (r_0, m_0) is a better Nash Equilibrium for S, and S's preferences for (r_0, m_1), (r_0, m_2), ... are removed because (r_0, m_0) is a better Nash Equilibrium for H. Thus, in I_1, profile (r_1, m_1) comes out as a Nash Equilibrium as well, because the preferences for (r_1, m_0) and (r_0, m_1) have been removed. But then we can update again, and remove all H's preferences for (r_2, m_1), (r_3, m_1), ... and S's preferences for (r_1, m_2), (r_1, m_3), .... Thus, in I_2, profile (r_2, m_2) comes out as a Nash Equilibrium as well. In short, we will find that in game I_n we have Nash Equilibria (r_i, m_i) for all i ≤ n, so that we construct the diagonal as the solution of I_0. The last example also constitutes inspiration for the following proposition:
Observation 4 (linearizing unambiguous interpretation games)
• if the set of solutions of an interpretation game is a one-to-one relation between representations and meanings, then the preferences in the game can be equivalently stated by means of a linear order of representations and meanings

Proof. If the solutions constitute such a one-to-one relation, and if we order the solutions, then we can identify the i-th representation r_i with the representation in the i-th solution, and the i-th meaning with the meaning in the i-th solution; then we can take H's preferences to be defined by precedence in the sequence of meanings, and S's preferences by precedence in the sequence of representations, and the resulting set of solutions is the diagonal, the set of solutions we started out with. End of Proof.

4.3 On two × two interpretation games

In this section we give a systematic study of two × two interpretation games, that is, games with four profiles. If we thus restrict our attention, we can in principle distinguish seven possible types: one in which there is no solution, one in which there is one solution, one in which there are four, one in which there are three, and three in which there are two:

(26) [Diagram: seven two × two matrices, one for each type]

(All other types are logical permutations of these types of games.) As we already observed above, the first case is excluded by Jäger's well-foundedness of > and the second two are void. A three-solutions game is in a sense a combination of the first two two-solutions games. The first two-solutions game models ambiguity, the second synonymy and (expressive) incompleteness, and the last is the (ideal) diagonal type. It is interesting to note that the last type of interpretation can again be obtained in a variety of ways. All of the following matrices have the diagonal as a solution:

[Diagram: two × two matrices that have the diagonal as a solution]

(Besides, any matrix that is a mirror of these matrices along one of the two diagonals yields the same result as well.) In all matrices (and their mirror images) except (the mirror images of the) first one, one solution is not Nash, that is, in these cases the BJ-optimality of that profile is obtained by
blocked preferences. This is interesting, because it shows that one and the same result can be obtained by a variety of preferences. However, this does not mean that any statement of preferences which gives the right results is equally good. In order to appreciate this point, consider the pair of examples discussed in Hendriks & de Hoop (2001), under the analysis suggested by Blutner:
(27) Often when I talk to a doctor_i, the doctor_{i, j} disagrees with him_{i, j}.
(28) Often when I talk to a doctor_i, the doctor_{i, j} disagrees with himself_{i, j}.

A BJ-optimal interpretation of example (27) is one in which the indices on the noun phrases 'the doctor' and 'him' are different, so that either 'the doctor' or 'him' is interpreted as anaphoric upon 'a doctor', not both. An optimal interpretation of example (28) is one in which both 'the doctor' and 'himself' are interpreted as anaphoric upon 'a doctor'. These results can be obtained by the joint effect of the two constraints (RE) and (LA), which we repeat here for convenience:

(RE) a reflexive element is preferable to a pronoun
(LA) a syntactic domain must contain a pronoun's antecedent

The relevant preferences are displayed in the following diagram:

(29) [Diagram: two × two matrix with the forms 'the doctor-self' and 'the doctor-him' and the interpretations (i, i) and {i, j}; the arrows encode the (RE)-preferences for 'the doctor-self' and the (LA)-preferences for (i, i)]

This is a diagram of the third diagonal type, in which ('the doctor-him', {i, j}) is a BJ-optimal solution because the (LA)-preference for ('the doctor-him', (i, i)) is blocked by the (RE)-preference of ('the doctor-self', (i, i)) over this alternative, and because the (RE)-preference for ('the doctor-self', {i, j}) is blocked by the (LA)-preference of ('the doctor-self', (i, i)) over this alternative. However, as we argued, we could have obtained the very same result if the preferences were spelled out, alternatively, as indicated by the following diagram:

(30) [Diagram: the same two × two matrix with all arrows reversed]

In this diagram, we have encoded the effect of the converse of the principles (RE) and (LA), and we have obtained a mirror image of the original matrix.
This time the solution ('the doctor-him', {i, j}) is optimal (Nash), and the interpretation of ('the doctor-self', (i, i)) turns out BJ-optimal, but the resulting BJ-optimal pairs are the same. Does this mean that we can get away with using the converses of any two or more principles? Certainly not. This can be appreciated when we look at a more general case, where we take more possibilities ((j, j) and {j, k}) into account:
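(31) [Diagram: matrix with the forms 'the doctor-self' and 'the doctor-him' and the interpretations (i, i), {i, j}, (j, j) and {j, k}, with arrows for the (RE)- and (LA)-preferences]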
With the principles (RE) and (LA) we get the right solutions ('the doctor-self', (i, i)) and ('the doctor-him', {i, j}). If, instead, we had adopted their counterintuitive converses, the solutions would have been, incorrectly, ('the doctor-self', (j, j)) and ('the doctor-him', {j, k}). This exercise thus shows that not any way of getting certain interpretation results is fine. It also shows that one should be careful with the notion of (BJ-)optimality, or that of a solution in interpretation games. Optimal profiles can get blocked if more options get considered (and if more constraints are involved).
5 PROSPECTS AND CONCLUSIONS

In this paper we have pointed out some parallelisms between some notions studied in Optimality Theory and in Game Theory. Optimality theoretic interpretation can be modelled in terms of an interpretation game, and both Blutner's notion of (strong) optimality and the Blutner/Jäger notion of (weak) optimality can be defined as a Nash Equilibrium of the interpretation game, or of an update of it. We have restricted ourselves here in two respects. Of the various types of games studied in Game Theory we have studied only one, and we have concentrated upon only one type of solution concept. The natural question that arises is whether optimality theoretic interpretation would not gain if we employed other kinds of games (extensive, instead of strategic, games; games with imperfect, rather than complete, information) and other solution concepts. In this respect we must mention Parikh (1991), who applies Game Theory to an analysis of the process of disambiguation, and who employs an extensive cooperative game with partial information. It remains an open question how Parikh's approach relates to the one discussed in this paper.
Another restriction is that we have concentrated mainly on the formal parallelism between optimality games. However, the parallel with the work of Parikh, and the intuitions behind the Q- and I-principles, suggest that the parallelism goes deeper. Optimality crucially involves both the speaker and the hearer, conceived of as rational agents with possibly opposing preferences. An optimal interpretation of a sentence can thus be seen as the result of (hypothetical) negotiation between two players who, with their particular beliefs and desires, engage in a communication game. Here lies an interesting parallel with the approach advocated in Merin (1997). Merin construes verbal interaction as a game in which speaker and hearer have strictly opposing preferences. It would be interesting to see if this can be given an optimality-style formulation. After all, in strictly competitive games the players' strategies are also guided by the intended optimization of the results.

Acknowledgements

Earlier versions of this paper were presented at the Optimality Theoretic Semantics Meeting at the Utrecht Institute of Linguistics OTS in January 2000, and at the Ninth CSLI Workshop on Logic, Language and Computation, Stanford, May 2000. As well as the audiences on these occasions, the first author wishes to thank the organizers of the DIP Colloquium in Amsterdam for providing a platform for Reinhard Blutner, Petra Hendriks, Helen de Hoop, Gerhard Jäger, and Henriette de Swart to present their views upon optimality theoretic interpretation in Amsterdam; the second author wishes to thank the above-mentioned for presenting their views, and the first author for being a cooperative player. We both thank Maria Aloni and Marie Nilsenova for additional comments. The first author is financially supported by a fellowship from the Royal Netherlands Academy of Arts and Sciences (KNAW), and the second by the Dutch Organization for Scientific Research (NWO), which are gratefully acknowledged.

PAUL DEKKER and ROBERT VAN ROOY
ILLC/University of Amsterdam
Department of Philosophy
Nieuwe Doelenstraat 15
1012 CP Amsterdam
dekker, [email protected]

Received: 05.04.2000
Final version received: 28.08.2000

REFERENCES

Asher, Nicholas & Lascarides, Alex (1993), 'Temporal interpretation, discourse relations and commonsense entailment', Linguistics and Philosophy, 16, 437-93.
Atlas, Jay D. & Levinson, Stephen C. (1981), 'It-clefts, informativeness and logical form', in P. Cole (ed.), Radical Pragmatics, AP, New York.
Blutner, Reinhard (1998), 'Lexical pragmatics', Journal of Semantics, 15, 115-62.
Blutner, Reinhard (1999), 'Some aspects of optimality in natural language interpretation', in H. de Hoop & H. de Swart (eds), OTS2• Papers on Optimality Theoretic Semantics, Utrecht Institute of Linguistics OTS, Utrecht.
Blutner, Reinhard & Jäger, Gerhard (1999), 'Competition and interpretation: the German adverbs of repetition', MS, Humboldt University Berlin.
Boersma, Paul (1998), 'Functional phonology: formalizing the interactions between articulatory and perceptual drives', Ph.D. thesis, University of Amsterdam.
Gazdar, Gerald (1979), Pragmatics, AP, New York.
Grice, Paul (1975), 'Logic and conversation', in P. Cole & J. L. Morgan (eds), Syntax and Semantics, 3: Speech Acts, AP, New York.
Grosz, Barbara J., Joshi, Aravind K. & Weinstein, Scott (1995), 'Centering', Computational Linguistics, 21, 203-25.
Hendriks, Petra & Hoop, Helen de (2001), 'Optimality theoretic semantics', Linguistics and Philosophy, 24, 1-32.
Hoop, Helen de & Swart, Henriette de (to appear), 'Temporal adjunct clauses in optimality theory', Rivista di Linguistica.
Horn, Laurence R. (1984), 'Towards a new taxonomy for pragmatic inference: Q-based and R-based implicatures', in D. Schiffrin (ed.), Meaning, Form, and Use in Context, Georgetown University Press, Washington, 11-42.
Jäger, Gerhard (1999), 'Optimal syntax and optimal semantics', handout for talk at DIP-colloquium, 1999.
Kratzer, Angelika (1977), 'What 'Must' and 'Can' must and can mean', Linguistics and Philosophy, 1, 337-75.
Levinson, Stephen C. (1987), 'Minimization and conventional inference', in J. Verschueren & M. Bertuccelli-Papi (eds), The Pragmatic Perspective, John Benjamins, Amsterdam, 61-129.
Lewis, David (1979), 'Scorekeeping in a language game', Journal of Philosophical Logic, 8, 339-59.
Merin, Arthur (1997), 'Information, relevance, and social decisionmaking: some principles and results of Decision-Theoretic Semantics', in L. Moss, J. Ginzburg, & M. de Rijke (eds), Logic, Language, and Computation, Vol. 2, CSLI, Stanford.
Neumann, John von & Morgenstern, Oskar (1944), Theory of Games and Economic Behavior, John Wiley & Sons, New York.
Osborne, Martin J. & Rubinstein, Ariel (1994), A Course in Game Theory, MIT Press, Cambridge, MA.
Parikh, Prashant (1991), 'Communication and strategic inference', Linguistics and Philosophy, 14, 473-514.
Prince, Alan & Smolensky, Paul (to appear), Optimality Theory: Constraint Interaction in Generative Grammar, MIT Press, Cambridge, MA.
Sandt, Rob van der (1992), 'Presupposition projection as anaphora resolution', Journal of Semantics, 9, 333-77.
Sanford, Tony & Garrod, Simon (1981), Understanding Written Language, John Wiley & Sons, Chichester, UK.
Sidner, Candy L. (1983), 'Focusing in the comprehension of definite anaphora', in M. Brady & R. C. Berwick (eds), Computational Models of Discourse, MIT Press, Cambridge, MA, 267-330.
Sperber, Dan & Wilson, Deirdre (1986), Relevance: Communication and Cognition, Blackwell, Oxford.
Tesar, Bruce & Smolensky, Paul (to appear), 'Learnability in Optimality Theory', Linguistic Inquiry.
Zeevat, Henk (1999), 'Explaining presupposition triggers', in P. Dekker (ed.), Proceedings of the Twelfth Amsterdam Colloquium, ILLC, Amsterdam, 19-24.
Zipf, George Kingsley (1949), Human Behaviour and the Principle of Least Effort, Addison-Wesley, Cambridge, MA.
Journal of Semantics 17: 243-262
© Oxford University Press 2000

The Asymmetry of Optimality Theoretic Syntax and Semantics

HENK ZEEVAT
University of Amsterdam

Abstract

This paper argues for a combination of semantics and syntax in an optimality theoretic framework that avoids the rat/rad problem and provides simultaneously a certain amount of bidirectionality, in the spirit of Blutner, for an approach to ineffability. It can be succinctly described as taking the program of optimality theoretic syntax as basic, also as a theory of interpretation, and extending it with a bidirectional pragmatic component that is closely related to existing ideas about natural language interpretation. The paper argues for the priority of the direction from content to form, develops the pragmatic component, and argues for the bidirectionality of the pragmatic component on the basis of Grice's principle of cooperation. It applies the resulting theory to a small set of relevant examples. The asymmetry in the title is consistent with, but goes beyond, the asymmetry between syntax and semantics used in Smolensky (1996).

1 OT SEMANTICS AND SYNTAX

Optimality theoretic syntax (OT syntax) is the proposal to think of the knowledge of natural language syntactic structures as an ordered sequence of constraints that decide which are the best candidate sentences for expressing some given content¹ (the input). Optimal candidates are the ones that do better on the ordered constraints than all the other competing candidates. S1 is a better candidate than S2 if there is a strongest constraint C such that S1 and S2 do equally well on the constraints that are stronger than C but S1 does better on C itself. Moreover, OT syntax makes the following assumptions. First, the set of constraints is the same for all languages, but languages differ in the ordering of the constraints. Second, constraint satisfaction is scored discretely. Both of these assumptions can be given up in principle without changing the essence of the theory as a descriptive device for a particular language, but they have an important methodological value since the first assumption militates against language-particular constraints and the second keeps the theory formally simpler. Though there is as yet no consensus about a particular set of constraints for syntax, there is a lot of promising work going on in the area, like e.g. Grimshaw (1997), Choi (1998), and Bresnan (MS A and MS B).

1 Though this plays only a minor role in the argument, I wish to make clear my assumption that content is a semantic representation in some suitable logical formalism against the background of discourse context representing the common ground and the current discourse situation. The semantic features referred to by the constraints can therefore equally well be properties that the object identifiers have in virtue of their role within the discourse context. This goes against some proposals for the input, which favour underspecified representations or even quasi-syntactic inputs.

OT syntax suffers from a problem. The prediction, which arises from the formal conception itself, is that for any input there is a set of optimal candidates, i.e. any content can be expressed. This prediction is easily refuted by showing that some sentences are untranslatable. For example (1)
(I) Who ate what? (2) *Chi ha mangiato che cosa? It is a natural assumption that the input of the English sentence is also available to Italian language users. Yet there does not seem to be an Italian form (except complicated paraphrases) that expresses this input. This problem is known as the ineffability problem. Contrary to what OT syntax predicts, not everything can be said in any language. The same problem has been noticed by Pesetsky (I997) using ungrammatical sentences that do not allow repair. Optimality Theoretic Semantics (OT semantics) is a more recent enterprise in which the traditional methods for natural language interpreta tion are replaced by systems of ordered constraints. Given the problems that natural language semantics faces, this is a natural and wise move and has led to interesting approaches to when-sentences (de Hoop & de Swart 2ooo) and to presupposition (Blutner 2ooo). But there is a natural question to ask about the enterprise as such. If there is an OT semantics, how is it related to OT syntax? It is clear that we do not want a conflict: the OT semantics should not assign an optimal interpretation to a sentence for which the sentence is not optimal according to OT syntax. And also we do not want the OT syntax to assign a sentence to the input that does not have the input as an optimal interpretation. The problem is that both OT syntax and OT semantics are complete theories about the relation between form and content and it would therefore seem that they cannot be independent of each other. Blutner has pioneered a first version of bidirectional OT which over comes these problems. In his conception of superoptimality there is a single ordered set of constraints that regulates the relation between form and content. But the constraints are used twice: a pair ( Form, Content ) is superoptimal iff there is no better pair ( Form1 , Content ) and no better pair (Form, Content1 ). 
In weak superoptimality-the notion he really favours-we find also some recursion: A pair (Form, Content ) is weakly
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
is a proper English sentence but does not have an Italian translation, like (2).
superoptimal iff there are no weakly superoptimal better pairs (Form1, Content) or (Form, Content1). Both of these notions are highly interesting and lead to important results, like a solution of the ineffability problem and treatments of presupposition and lexical semantics.

But superoptimality labours under its essentially symmetric character. One prediction that can be derived from weak superoptimality is that both synonymy and ambiguity are dying phenomena in natural languages: they tend to disappear. Now it is true that synonymy is not a stable phenomenon. It is a linguistic commonplace that 'real' synonymy does not exist. Though debatable, the point about synonymy can certainly be defended and it seems the sort of fact that needs explanation in the kind of theory that we are discussing. But ambiguity seems ever on the increase. It is the major problem for computational linguistics and a remarkably ubiquitous and robust phenomenon. Moreover, it increases whenever a language loses phonological, morphological or configurational properties, i.e. almost whenever language change occurs.

The OT literature also contains a formal argument against the symmetric view: the rat/rad problem. The Dutch word rat (meaning rat) is homophonous with the Dutch word rad (meaning wheel) in its singular form. The pronunciation of rad (but not rat) is derived by a faithfulness violation: the underlying feature +voiced is lost at the end of Dutch words. In a treatment like Blutner's, this has consequences for the interpretation of the sound /rat/. If it is interpreted as wheel there is a better form-content pair, namely (/rat/, rat). According to both notions of superoptimality, this means that (/rat/, wheel) is thrown out of the competition, not just in interpretation but also in generation.

The rat/rad problem is a simple phonological problem, but it would arise in any ambiguity where, of two pairs (Form, Content1), (Form, Content2), Form is in one case derived by more serious syntactic constraint violations than in the other. A simple case is perhaps² (3), assuming that (b) involves two violations of the constraint STAY enforcing constituents to stay in their canonical position rather than (at most) one as in (a).

(3) a. Wie slaat Hans?
    a'. Who beats Hans?
    b. Wie slaat Hans?
    b'. Whom does Hans beat?

Superoptimality would predict not just that reading (a') is preferred but that it is the only reading, of course under the assumption of the analysis in terms of STAY. This does not match the facts of Dutch. There is a preference for reading (a'), but the other is also available. More serious than this particular example is the fact that given any particular syntactic system of constraints, examples of this kind can be found at will.

2 I want to remain strictly uncommitted to any syntactic analysis in this paper. Not in life.

This paper is an attempt to develop a competing theory of the combination of syntax and semantics in optimality theory, which maintains as much as possible of the insights of Blutner, while avoiding the problems. It moreover aims at being a naturalistic theory of these matters, i.e. a theory that can be interpreted as constraining actual processes of language production and interpretation.

It has been my view for a long time that the asymmetry between speaking and listening should be taken more seriously than theories generally do. Different parts of the body are involved and there can be vast differences between what people can say and what they can understand. Moreover listening and speaking differ in their very nature. Speaking is an active process in which the speaker has control, whereas listening is essentially a passive activity, in which the listener tries to make the most of the signal she receives.

Equally important is the naturalistic character of an optimality theoretic account of speaking or understanding. OT was inspired by the consideration of processes in the brain and still derives much of its psychological plausibility from its interpretation as a theory about brain processes. A theory of the relation between form and content should therefore primarily be a theory of speaking and understanding, as these are the processes in which the brain uses the constraints. According to Smolensky (1996), the naturalistic interpretation still does not give a theory of the actual processes in performance (which would involve other mechanisms as well) but only a description of the grammatical norm. Therefore, naturalism here only means that we can think of the theory as a part of an overall account of the actual production and understanding mechanisms.

In the next two sections, the paper explores some general reasons for assuming that OT syntax is the basic theory. They are far from being conclusive arguments but they make that view plausible. That an OT syntax is needed at all (but possibly in conjunction with OT semantics) follows from the phenomenon of semantic blocking. For semantic blocking, see Zeevat (2000) and Bresnan (MS A). Section 4 tries to make it clear that interpretation cannot be handled by OT syntax on its own, because certain necessary constraints do not allow a proper reformulation in OT syntax. A minimal system of interpretation constraints that cannot be reduced to syntactic constraints is developed and defended. Sections 5 and 6 discuss the way in which syntax and semantics are connected. Section 6 applies the resulting theory to some key problems.
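The comparison of candidates on ranked constraints and the two notions of superoptimality discussed in this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not an implementation taken from the literature: violation profiles are encoded as tuples with the strongest constraint first, all function names are hypothetical, and the rat/rad data use a single assumed faithfulness count and only the one pronounceable form /rat/.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]      # (form, content)
Profile = Tuple[int, ...]   # violation counts, strongest constraint first


def better(p: Profile, q: Profile) -> bool:
    """p beats q: equal on every constraint stronger than some C, fewer violations on C.

    Python compares tuples lexicographically, which is exactly this relation.
    """
    return p < q


def competitors(pair: Pair, pairs: Iterable[Pair]) -> List[Pair]:
    """Other pairs that share the form or the content of `pair`."""
    form, content = pair
    return [o for o in pairs if o != pair and (o[0] == form or o[1] == content)]


def superoptimal(pairs: Dict[Pair, Profile]) -> List[Pair]:
    """Pairs with no better competitor on the same form or the same content."""
    return [p for p in pairs
            if not any(better(pairs[o], pairs[p]) for o in competitors(p, pairs))]


def weakly_superoptimal(pairs: Dict[Pair, Profile]) -> List[Pair]:
    """Pairs with no better competitor that is itself weakly superoptimal.

    Visiting pairs from most to least harmonic makes the recursion well founded:
    every potential blocker of a pair has been decided before the pair itself.
    """
    winners: List[Pair] = []
    for p in sorted(pairs, key=pairs.get):
        if not any(better(pairs[w], pairs[p]) for w in competitors(p, winners)):
            winners.append(p)
    return winners


# Assumed toy data: one constraint (faithfulness to underlying voicing).
RAT_RAD: Dict[Pair, Profile] = {
    ("/rat/", "rat"): (0,),     # faithful pronunciation
    ("/rat/", "wheel"): (1,),   # underlying +voiced is lost
}
```

On these toy profiles both `superoptimal(RAT_RAD)` and `weakly_superoptimal(RAT_RAD)` keep only the pair (/rat/, rat), so /rat/ can no longer be paired with wheel, which is the unwanted prediction described above.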
2 CHICKEN OR EGG

What did evolution achieve when it created language? I think the right answer is the creation of a system of forms in which contents can be coded. Though the creation of the forms doubtlessly helped extend the richness of the contents that can be expressed by means of them, nothing suggests that the everyday thoughts we have and that we routinely transmit to our fellow humans are that different from the thoughts of somebody who lacks language or even from the thoughts of our closer biological cousins. After all, our basic drives are the same and so is the information we gather in order to satisfy these drives.

The wrong answer is surely that evolution created a stronger power of understanding that allows us to make sense of the complex contents expressed by the forms found in natural languages. This is the wrong answer if we assume that the new power of understanding is prior to or independent of the creation of the system of forms. I do not think the system of understanding had to adapt very much. Already before language evolved, it was possible to interpret the behaviour of other humans and of animals and to interpret the environments. These are the hard problems, not language understanding.³

3 Work on visual languages, especially Marriott & Meyer (1998), shows conclusively that going from the discrete to simple graphical diagrams leads to an immediate explosion of computational complexity. A tiger has to make predictions about the behaviour of its prey, birds need to orient themselves in their treks, and all these tasks are seriously more complex than context-free parsing. Recursive structure, e.g. squares within squares, arises as much in the visual field as in natural languages.

Understanding limits the diversification of the production of acoustic signals: if a differentiation cannot be cashed in by a corresponding differentiation in understanding, it is not functional and will not become part of the language. The development of language use therefore cannot be understood in isolation from the process of decoding the language tokens. But the biological achievement is the differentiation of the acoustic signals, which in combination with the recognitional and understanding capacities of the producing organisms use the differentiation to a biological advantage. Nothing rules out that the understanding powers grow as a consequence of the development of language, and that this growth then allows further differentiation in understanding for which new forms are developed. But the initiative is on the side of the forms. This can be underpinned to some extent by physiological considerations. Whereas the ear is largely what it was before language as we know it, there are physiological changes in the larynx and in the way it is used.

The point of these remarks is that, as linguists interested in the nature of language, we should be primarily concerned with the production of language and develop theories of the production process. Producing language would not make sense without understanding, but it is not clear that the understanding needed to develop that much. It seems to follow that if we want to develop an empirical theory of the relationship between forms and meanings we find in natural languages we should be primarily concerned with the direction that goes from meaning to form. The other direction is like other perception problems where one reasons from a perceptual content to its causes.

The argument of this section is speculative and can at most underpin a certain bias towards the primacy of OT syntax and against an independent OT semantics. The argument in the next section has more substance, though it is not compelling either.

3 CONFLICT IN PRODUCTION AND UNDERSTANDING

Following Boersma (1998), we can make the following observations. As in the production of speech, the production of sentences stands under two opposing principles. The first principle (expressiveness) is that the receiver of the sentence should be able to take out the message that the speaker has coded into the sentence. That is after all the purpose of language use. This goal is served by marking every semantically relevant property of the input by some syntactic feature, such as morphology, word order, lexical items, etc. At the same time, the speaker stands under a principle of minimal effort. There is no point in marking a feature that is inferable and often the available means of marking will be conflicting. The requirements conflict and the optimal realisation is a particular way of solving the conflict. The OT syntactic constraints reflect economy and expressiveness requirements and their ordering is the standard conflict resolution mechanism adopted by a language.

It is not clear that in interpretation the same conflict between different interests of the interpreter repeats itself. If the interpreter wants to minimize her effort, she runs the risk of not finding the speaker's intention. Of course, it does not pay off to put in more effort than is needed to recognize the speaker's intention, but economizing on effort cannot go below the effort required, on the pain of disfunctioning. There is of course the same principle of expressiveness: everything that is in the signal must be interpreted. But there does not seem to be a conflict between doing that to the maximal extent and the principle of not doing more than is required to find the speaker's intention.
From this, I want to conclude that whereas there is a naturalistic interpretation of conflicting constraints in language production, there is no such naturalistic interpretation for conflicting constraints in interpretation. If there are conflicting constraints in language interpretation they must derive from constraints about language production.⁴

4 Later on I defend some defeasible interpretation constraints that are ordered with respect to each other. I accept the conclusion that their ordering is not a result of language users learning how their language resolves a conflict between opposing principles. This conclusion is also unavoidable given that nothing indicates that different languages could have them in different orderings. In fact, bizarre concepts of language use result if one tries alternative orderings, which seems to indicate that we should look for rational rather than empirical explanations of the ordering.

The situation can be fruitfully compared to the habit of hiding Easter eggs for one's children on Easter Sunday. The parents engaged in hiding the eggs balance the amount of effort with the desired amount of difficulty in finding the egg. (They also picture the child looking for it and try to keep it possible for the child to find the egg, without spoiling the fun.) For the child it is another matter. It just has to throw in the effort required for finding the eggs. Not more of course, but definitely not less. It is not a complicated balancing act.

This would be the argument that shows that the process of language production has to find a balance between conflicting constraints. Languages are an inventory and a conventionalized way of establishing the balance: the language particular ordering of the constraints. A similar argument for underpinning this balance in understanding cannot be given.

If it could be shown that the task of interpreting is in fact always an instance of an ongoing hermeneutic process of refinement (as some would perhaps argue) the situation changes. Assuming the existence of a process of ongoing refinement, it is indeed possible to argue for a conflict between economy of effort on the one hand and the need for quick results on the other. It is, however, not easy to see how the semantic constraints that have been proposed can be seen as embodying a compromise between these conflicting needs. My intuition also tells me that the hermeneutic circle is normally quickly closed. The communication of everyday thoughts (What time is it? Give me a coffee! Do you have something to eat?) quickly results in the grasp of the speaker intention. Negative feedback can result in further reflection, but unprompted further reflection is pointless once a plausible and relevant speaker intention is found.

4 PROPER OPTIMALITY THEORETIC SEMANTICS

The previous two sections may be read as arguments against assuming that there should be an OT semantics in addition to or side by side with OT
syntax. My prejudice has in fact always been that there should not be a separate OT semantics. The proposed constraints of OT semantics and their ordering are really syntactic constraints in disguise and their ordering is the ordering of the disguised syntactic constraints. I tried to show the plausibility of this view by reconstructing the analysis of when-clauses of de Hoop & de Swart (2000) within OT syntax (Zeevat 2000). But my plan of showing this ran up against the problem that there are some interpretation constraints that do important work and do not appear to allow a reformulation as syntactic constraints. These are the ones I know about: *ACCOMMODATION, *INVENT, STRENGTH, ANCHOR, CONSISTENCY, and FAITH-INT. I do not expect there to be many others and these ones also seem to form an interesting natural class, as I will try to show at the end of this section. I should also say at this point that my general solution does not depend on the question of which semantic constraints must be assumed or on the formulation of those constraints. The only requirement is that there should be some, otherwise the theory collapses into optimality theoretic syntax. Though the precise content of the system is not essential, I believe the system I present in this section has some independent merit.

The first constraint is *ACCOMMODATION. It (fallibly) prohibits accommodation of the antecedents of presupposition triggers. A presupposition trigger such as regret requires that its complement is already true in the context in which it is used. If it is not true, the content of the complement needs to be added to the context, a process called accommodation. Nothing should be added if the context (or one of the local contexts) already has the material and *ACCOMMODATION does just that. I cannot imagine anything in syntax that has the effect of *ACCOMMODATION. It cannot be a prohibition against using the trigger in a context that does not have the antecedent: that occurs frequently and appropriately.

If one wants, *ACCOMMODATION can be taken as a special case of a principle that forbids us to add material to the context of the utterance or to the content of the utterance without a proper reason (like external evidence or the material supplied by the sentence). *INVENT seems a good name for such a constraint. It is quite unclear how the speaker can rule out this bad behaviour of the listener by adding some feature to the sentence. For example, *INVENT forbids us to start thinking that John is ill, if all that the speaker said is that Mary had an ice cream. It is the principle that asks us not to overinterpret.

STRENGTH expresses the preference for informationally stronger readings of the sentence. It is the odd man out here, because it does not seem to allow a discrete evaluation measure and also makes a couple of
wrong predictions, as Geurts (2000) has pointed out. Nevertheless, a version of STRENGTH is needed for the interpretation of presupposition triggers and, as Dalrymple et al. (1998) have argued, for the interpretation of reciprocals. It is obvious that there is no generation principle that can capture the effect of STRENGTH. From the generation perspective, it seems that the weaker inputs that STRENGTH rules out as an interpretation will nevertheless be optimally realised by the sentence.

ANCHOR is the principle that interpretations should be anchored. In essence, this means that all the pronouns, ellipses, tenses, and topics should find proper antecedents and that a discourse relation must be constructed from the current sentence to the appropriate earlier element of the discourse or dialogue. Accommodation occurs because of the needs of ANCHOR. There is something in generation that corresponds to this: the principles that select the proforms, ellipsed versions, the presupposition triggers, the topic focus articulation and connectives based on the speaker's estimate of the context. It seems that ANCHOR can be reformulated as a syntactic principle that prevents the choice of a reduced form (a pronoun, an ellipsis, deaccented pronunciation, zero connective) when this is not appropriate. In principle, we could have a generation principle *REDUCE that prevents such reductions when the context does not licence them. (*REDUCE would have to be ordered below the constraints that force the reductions.) But as will become clear, it suffices to have ANCHOR to get this effect and that seems the more natural choice. An additional argument is that reduced forms are not really required in the inventory of the language. E.g. the Latin homo (a man or the man) is not reduced with respect to the indefinite when it means the man and has a linguistic antecedent. Languages that have no reduced forms are just less efficient for the generator. The interpreter would still be trying to identify as much material as possible in the preceding context or relate objects by bridging and discourse relations, by the principle ANCHOR. Without reduced forms, there is no effect of *REDUCE in generation and consequently there would be no reason for finding an antecedent.

CONSISTENCY prefers interpretations that do not conflict with the context. It plays a role in ambiguity resolution, selecting between different resolutions of anaphoric elements and in ruling out certain accommodations. It can be violated, since it is certainly possible to contradict the given context. Once more, there is no good generation constraint that rules out the expression of thoughts that contradict the context. It can just be done and the OT syntax tells us what is the best way of doing it. In certain cases, there is obligatory marking of inconsistency, using contrastive and concessive devices. Overt corrections have a number of syntactic features that make them recognisable. A language that would, however, not have
such syntactic devices (or that does not always mark inconsistency with the context) does not seem impossible. Obligatory marking of inconsistency is therefore not an alternative to the assumption of CONSISTENCY as a principle.

For these violations of consistency, we need the principle of faithful interpretation FAITH-INT. This principle forces us to interpret all that the speaker has said. FAITH-INT could in principle⁵ be a generation constraint ('do not mark any features that are not in the input') but the positive formulation is an interpretation constraint and that makes it more natural to think of it as one. In the scheme I am presenting in the next two sections, the principle is superfluous: it is captured by the first step of recovering the set of inputs that could lead to the sentence.

5 As Bresnan does as part of the faithfulness constraint in Bresnan (MS B).

The ordering between the constraints is also fairly obvious. Readings can be inconsistent with the context if they are faithful and accommodation is only allowed because of the need to anchor. Accommodation is restricted to consistent additions to the context and selects the strongest reading when different ones are possible. This is just a rephrasing of the standard views on presupposition accommodation. This gives us the following picture of what, if I am right, is the whole of OT semantics.

FAITH-INT > CONSISTENCY, ANCHOR > *INVENT, *ACCOMMODATION > STRENGTH

An example illustrating FAITH-INT > CONSISTENCY, ANCHOR is (4).

(4) A: John hates Bill.
    B: He hates SUZY.

The second sentence, interpreted as a correction, violates consistency. Corrections would be impossible, if the ordering were reversed. The same example also illustrates that ANCHOR is not weaker than CONSISTENCY. If it were, the pronoun used by B could not refer to John. So we have CONSISTENCY ≯ ANCHOR.

*ACCOMMODATE explains the contrast between (5a.) and (5b.).

(5) a. If John is in Berlin, he regrets that he is in Berlin.
    b. If Mary is in Amsterdam, John regrets that he is in Berlin.
The (b.) example entails that John is in Berlin, but not the (a.) example, due to the presupposition trigger regret. In the (a.) example the presupposition is resolved to the condition of the implication. In (b.) that is not possible and the only interpretation is obtained by anchoring the trigger through the addition of the presupposition to the main context. This addition is ruled out in (a.) by *ACCOMMODATE. Addition to the context given by the condition in (b.) is ruled out by STRENGTH, as the resulting interpretation would be entailed by the addition to the main context. From this it follows that ANCHOR > *ACCOMMODATE and further that *ACCOMMODATE > STRENGTH.
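The reasoning about (5a.) and (5b.) can be replayed as a small ranked-constraint evaluation. The sketch below is an illustration only: the candidate readings and their violation counts are an assumed encoding of the argument just given, STRENGTH is treated as a 0/1 violation for simplicity although it does not really allow a discrete measure, and only the three constraints whose ordering is derived here are included.

```python
from typing import Dict, List, Tuple

# Strongest constraint first, following ANCHOR > *ACCOMMODATION > STRENGTH.
RANKING = ["ANCHOR", "*ACCOMMODATION", "STRENGTH"]

Violations = Dict[str, int]


def profile(violations: Violations) -> Tuple[int, ...]:
    """Violation counts in ranking order; unmentioned constraints count as 0."""
    return tuple(violations.get(constraint, 0) for constraint in RANKING)


def optimal_readings(candidates: Dict[str, Violations]) -> List[str]:
    """Readings whose profile is lexicographically minimal."""
    best = min(profile(v) for v in candidates.values())
    return [name for name, v in candidates.items() if profile(v) == best]


# (5a.) If John is in Berlin, he regrets that he is in Berlin.
EXAMPLE_5A: Dict[str, Violations] = {
    "resolve to the condition": {},
    "add to the main context": {"*ACCOMMODATION": 1},
}

# (5b.) If Mary is in Amsterdam, John regrets that he is in Berlin.
EXAMPLE_5B: Dict[str, Violations] = {
    "leave the trigger unanchored": {"ANCHOR": 1},
    "add to the main context": {"*ACCOMMODATION": 1},
    # Weaker reading: it is entailed by the addition to the main context.
    "add to the condition": {"*ACCOMMODATION": 1, "STRENGTH": 1},
}
```

Under these assumed profiles, `optimal_readings(EXAMPLE_5A)` selects resolution to the condition, and `optimal_readings(EXAMPLE_5B)` selects the addition to the main context, in line with the judgement that only (5b.) entails that John is in Berlin.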
It should be clear that without support from OT syntax the semantics given by these principles is unable to interpret any sentence whatsoever. But OT syntax exists and how it is integrated with the semantic constraints is the subject of the next two sections.

There is, however, one more aspect of the system that should be pointed out. It turns out to be no more than an OT reformulation of the essence of the received interpretation theory from the 1970s. There we had the compositional semantics of Lewis (1970) and Montague (1974), supplemented with Karttunen's (1973, 1974) and Stalnaker's (1978) ideas about presupposition and assertion. In the 1980s these have been supplemented by establishing that anaphoric resolutions and discourse relations can be best thought of as special cases of presupposition. The combination of FAITH-INT and *INVENT restores important aspects of compositional semantics (not the full principle, but essential aspects). The combination of CONSISTENCY and STRENGTH is (a strengthening of) Stalnaker's principles of assertion, and ANCHOR and *ACCOMMODATION together give a reconstruction of the field of discourse, including insights from discourse representation theory (e.g. Kamp 1981 and Heim 1982) and the analysis of presupposition (Heim 1983; van der Sandt 1992).

The set of constraints itself is almost nothing more than the received theory. My proposal adds to the received theory by ordering the constraints and by allowing exceptions. It is extremely unlikely that there would be reasons for changing the constraints and their ordering if one moves from language to language.

What is missing is not the rational argumentation for the constraints (that argumentation is just part of the literature) but the rational argumentation for their ordering. It is fairly clear from the empirical point of view that the ordering is as I sketched above. It seems that it is not hard to see that alternative orderings lead to problems. E.g. if CONSISTENCY were weaker than STRENGTH, we would be hunting for strong but false interpretations whenever possible, which does not seem a good idea. Or if FAITH-INT were weaker than CONSISTENCY, we could
not correct each other. Given the communication protocol that we seem to have adopted, alternative orderings would lead to a loss of functionality. A proper rational foundation is, however, a complicated matter. It should show in detail why each of the constraints is there, why each ordering statement must be there and, importantly, why it is rational to have defeasible constraints, etc. This task must be deferred to future work.

This section presented the case for preserving some OT semantics in the face of the criticism that OT semantics is not necessary or desirable given OT syntax. It is only a modest semantics that remains. In the next two sections I will only assume that OT semantics is a system of constraints that help us in deciding between the different readings predicted by OT syntax. I refer to that system as the interpretation constraints.

5 THE BASIC CONNECTION

The prediction of OT syntax is that an optimal interpretation of a sentence S is any semantic input I that beats its competitors among the candidate set {(S, J) : J is a semantic input} by the system consisting of the normal syntactic constraints and their ordering. Smolensky (1996) points out that the winner of the interpretation competition for a sentence S is not necessarily going to be optimally generated as S by the same system and thereby explains observed asymmetries between production and comprehension in child language, since the competition in the other direction involves the different candidate set {(S, I) : S is a syntactic form}.

Given what we have done so far, we can define the optimal interpretation of a sentence Form in two steps. First we take our OT syntax system and determine the set {Content : Form is an optimal form for the content Content}. In a second step, we determine which of the elements of that set optimally satisfy the interpretation constraints. Those are then the best interpretations.

This can be understood as the evaluation of pairs (Content, Form) over two systems of constraints: the syntax constraints G = CG1, ..., CGn and the interpretation constraints I = CI1, ..., CIm. The fact that we first take the set {Content : Form is an optimal form for the content Content} orders the interpretation constraints after the generation constraints, if we take both constraints as constraints on pairs. In the table below, the evaluation starts with all pairs in which Form is the input. The optimal pairs are found before the evaluation by the semantic constraints begins and form the set GEN for semantic evaluation.
[Table: candidate pairs (Content1, Form), ..., (Contentm, Form); the constraint columns and cell contents are not recoverable.]

The pairs that are optimal by the generation constraints give the optimal interpretations of Form.

Since the generation and interpretation constraints form disjoint systems we have no problem with harmonizing between the interpretation and the generation process. We can assume that an interpreter proceeds in this way (in an efficient implementation of it). But it is not wild to assume that the speaker does the same. Why say something knowing that it will be understood in the wrong way? It is also standard in natural language generation systems to check that the semantic representation from which generation started also comes out when the generated sentence is interpreted. One can even wonder whether a natural language speaker who, after all, is also a natural language understander can avoid interpreting her own words.

This basic system already suffices for an explanation of the ineffability problem: ineffable contents are those whose optimal realisation is misinterpreted by the interpretation constraints. I will give a more subtle account of ineffability later on.
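The two-step definition of optimal interpretation, together with the resulting diagnosis of ineffability, can be sketched as follows. This is a minimal illustration under assumptions: the function names, the dictionary encoding and all violation counts are hypothetical. The Dutch data loosely follow example (3), with one and two STAY violations, an invented losing competitor form, and invented interpretation profiles.

```python
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]                   # violations, strongest constraint first
GenTable = Dict[str, Dict[str, Profile]]    # content -> {candidate form: profile on G}
IntTable = Dict[Tuple[str, str], Profile]   # (content, form) -> profile on I


def optimal_forms(content: str, gen: GenTable) -> List[str]:
    """OT syntax: the candidate forms that do best on G for this content."""
    candidates = gen[content]
    best = min(candidates.values())
    return [form for form, prof in candidates.items() if prof == best]


def interpret(form: str, gen: GenTable, interp: IntTable) -> List[str]:
    """Step 1: recover the contents optimally realised as `form`.
    Step 2: keep those that do best on the interpretation constraints I."""
    recovered = [c for c in gen if form in optimal_forms(c, gen)]
    if not recovered:
        return []
    best = min(interp[(c, form)] for c in recovered)
    return [c for c in recovered if interp[(c, form)] == best]


def is_ineffable(content: str, gen: GenTable, interp: IntTable) -> bool:
    """True if no optimal form for `content` is interpreted back as `content`."""
    return all(content not in interpret(form, gen, interp)
               for form in optimal_forms(content, gen))


FORM = "Wie slaat Hans?"
GEN: GenTable = {
    "who beats Hans": {FORM: (1,), "OTHER FORM": (3,)},        # one STAY violation
    "whom does Hans beat": {FORM: (2,), "OTHER FORM": (3,)},   # two STAY violations
}
# Hypothetical contexts: neutral, and one that disfavours the object reading.
NEUTRAL: IntTable = {("who beats Hans", FORM): (0,), ("whom does Hans beat", FORM): (0,)}
BIASED: IntTable = {("who beats Hans", FORM): (0,), ("whom does Hans beat", FORM): (1,)}
```

Because the syntactic competition is run separately for each content, the extra STAY violations do not remove the object reading: with the neutral profiles both readings of (3) survive. With the biased profiles the object reading is still generated as the same form but is never recovered from it, which is what ineffability amounts to in this basic system.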
6 COOPERATIVITY
An important aspect of pragmatics we did not incorporate so far is Grice's principle of cooperation.6 Language use is a special kind of cooperative behaviour and the speaker has a cooperative obligation when she speaks. In particular, the speaker has a responsibility for what the listener will make of her sentence. That makes it plausible to assume that the speaker goes through the interpreter's part of the process and makes sure that at least she would get the interpretation she intends. But there is something more to it. The speaker can make sure that interpretation is as painless as possible by
6 Charity of the interpreter is coded in the interpretation principle of consistency with the context and in the principle of going for the most informative reading. But this is only one aspect of cooperativity.
(Content_m, Form)
The Asymmetry of Optimality Theoretic Syntax and Semantics
avoiding violations of the interpretation constraints.7 This gives us the following picture. (G is the system of generation constraints, I the system of interpretation constraints.)
Computationally, there are differences. In interpretation, (b.) and (c.) are evaluated first, and in the course of step (b.) step (a.) is already carried out. In generation, step (a.) comes first and steps (b.) and (c.) are checks on the outcome. But theoretically, optimal generation and interpretation are the same, as they should be. The asymmetry of Smolensky (1996) can only be a property of the emerging comprehension and production systems, if one adopts this version of bidirectional OT. This is what people seem to do when they carry out the task of generating from a fixed content, as in literary translation. Real generation is probably better understood as a process starting from an only partially specified content. A succinct formulation of the system is to say that we first do normal OT syntax and, after that, superoptimality over the interpretation constraints. The cooperativity of the speaker gives us superoptimality in the semantics. The advantage of cooperativity is that we keep some of the effects of Blutner's bidirectionality. In particular, we preserve Blutner's theorem which offers revolutionary insights into the analysis of presupposition triggers, at least if you want to believe Zeevat (1999) or Zeevat (2000). We also get a diagnosis for what is wrong with full superoptimality. In superoptimality, it is not just the speaker who is cooperative, but also the listener. The listener must select a reading taking into account the effort of the speaker: the reading is deselected if the speaker has to violate a stronger
7 I am not sure of my equation of pain and constraint violation, but it is a natural idea. At least in syntax, it should be testable whether there is a relation between understanding times and the amount of constraint violation that goes on in sentences. Certainly the violations of the interpretation constraints that are the standard examples in the presupposition literature are not easy to understand.
Form is an optimal generation for Content (Content is an optimal interpretation of Form) iff
a. Form wins the competition with respect to the generation constraints G for Content within the set of all forms.
b. Within the set {Content_i : Content_i is an arbitrary content such that Form wins for Content_i using G within the set of all forms}, Content wins the competition under the interpretation constraints I for the form Form.
c. There is no other form Form_1 that is better by I for Content than Form is.
Here better refers to a standard OT competition between forms for a certain content using the interpretation constraints.
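The three clauses of this definition can be sketched as a small check over finite candidate sets. This is only an illustration under stated assumptions: constraints are modelled as functions returning violation counts, rankings as lists ordered from strongest to weakest, and clause (c) is read as ranging over all forms. The toy data use the rad/rat case; the constraint names `no_voiced_coda` and `faith` are hypothetical stand-ins and are not taken from the paper.

```python
from typing import Callable, Sequence

# A constraint maps a (content, form) pair to a number of violations.
Constraint = Callable[[str, str], int]

def profile(ranking: Sequence[Constraint], content: str, form: str) -> tuple:
    """Violation counts, listed from the strongest constraint to the weakest."""
    return tuple(constraint(content, form) for constraint in ranking)

def winning_forms(ranking, content, forms):
    """Forms with the lexicographically smallest violation profile for content."""
    best = min(profile(ranking, content, f) for f in forms)
    return {f for f in forms if profile(ranking, content, f) == best}

def winning_contents(ranking, form, contents):
    """Contents with the lexicographically smallest violation profile for form."""
    best = min(profile(ranking, c, form) for c in contents)
    return {c for c in contents if profile(ranking, c, form) == best}

def is_optimal(content, form, contents, forms, G, I) -> bool:
    # (a) Form wins for Content under G within the set of all forms.
    if form not in winning_forms(G, content, forms):
        return False
    # (b) Among the contents for which Form wins under G, Content wins under I.
    survivors = [c for c in contents if form in winning_forms(G, c, forms)]
    if content not in winning_contents(I, form, survivors):
        return False
    # (c) No other form does better than Form for Content under I.
    return form in winning_forms(I, content, forms)

# Toy rad/rat data; both constraints are illustrative assumptions.
def no_voiced_coda(content: str, form: str) -> int:
    return 1 if form.endswith("d/") else 0

def faith(content: str, form: str) -> int:
    # Compare the final segment of the input with that of the surface form.
    return 1 if content[-1] != form[-2] else 0

CONTENTS = ["rad", "rat"]      # 'wheel' and 'rat'
FORMS = ["/rad/", "/rat/"]
G = [no_voiced_coda, faith]    # devoicing outranks faithfulness
I = []                         # no interpretation constraint separates the readings
```

With these assumptions both `is_optimal("rad", "/rat/", CONTENTS, FORMS, G, I)` and `is_optimal("rat", "/rat/", CONTENTS, FORMS, G, I)` hold: the mark incurred in generation plays no role once /rat/ has won for both contents.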
constraint or the same constraint more severely for it than for another reading. But that does not make sense at all. The speaker will just spend the effort to express the content in question and the listener does not have the control necessary to reduce the speaker effort.
7 APPLICATIONS
Rat and rad
The last point of the last section is the solution to the rad/rat problem.
Italian WH-phrases
Let us assume that Italian wants its WH-phrases fronted, i.e. it has a strong constraint FRONT-WH, which is violated by WH-phrases that are not in the first position. Let us also assume that it wants to mark semantic WH-phrases (variables bound by the question operator) by the typical morphology of WH-phrases, but not as much as it wants to front them. This means that we have a constraint PARSE-WH that is weaker than FRONT-WH.
It then follows that the optimal candidate for ?xy eat(e, x, y) is something like (6) (assuming qualcosa is the default NP of Italian).
(6) Chi ha mangiato qualcosa?
The WH-constituent is fronted and the subject, but not the object, is WH-marked. The object therefore violates PARSE-WH, but the damage is smaller than marking it and violating FRONT-WH. The generation competition gives, as always, an optimal candidate. But in interpretation, by *INVENT the semantic correspondent of the WH-feature cannot be recovered. That means that the optimal candidate is in fact not a good expression of the input. It wins the syntactic competition, but its WH-interpretation always loses out in the interpretation contest.
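The Italian case can be traced in a toy model. The sketch below makes several assumptions of its own: a content is represented as the set of arguments bound by the question operator, a form as a sequence of (argument, WH-marked) pairs, and the competitor with an in-situ WH-object is a hypothetical candidate added only so that there is a competition. In this simplified setting *INVENT counts the same mismatches as PARSE-WH, but from the hearer's side.

```python
# A content is the set of arguments bound by the question operator;
# a form is a sequence of (argument, wh_marked) pairs in surface order.
WHO = frozenset({"subj"})               # ?x: who ate something
WHO_WHAT = frozenset({"subj", "obj"})   # ?xy: who ate what

FORMS = {
    "Chi ha mangiato qualcosa?": (("subj", True), ("obj", False)),
    # Hypothetical competitor with an in-situ WH-object, for illustration only.
    "Chi ha mangiato che cosa?": (("subj", True), ("obj", True)),
}

def front_wh(content, form):
    """One mark per WH-marked phrase that is not in first position."""
    return sum(1 for i, (_, wh) in enumerate(form) if wh and i > 0)

def parse_wh(content, form):
    """One mark per questioned argument that is not WH-marked."""
    return sum(1 for arg, wh in form if arg in content and not wh)

def star_invent(content, form):
    """One mark per questioned argument the hearer would have to invent."""
    return sum(1 for arg, wh in form if arg in content and not wh)

G = [front_wh, parse_wh]   # FRONT-WH outranks PARSE-WH
I = [star_invent]

def profile(ranking, content, form):
    return tuple(constraint(content, form) for constraint in ranking)

def generate(content):
    """Sentence with the best violation profile under G."""
    return min(FORMS, key=lambda s: profile(G, content, FORMS[s]))

def interpret(sentence, contents=(WHO, WHO_WHAT)):
    """Best reading under I among the contents that generate the sentence."""
    survivors = [c for c in contents if generate(c) == sentence]
    return min(survivors, key=lambda c: profile(I, c, FORMS[sentence]))

def effable(content):
    """A content is effable if its optimal form is interpreted as that content."""
    return interpret(generate(content)) == content
```

Under these assumptions `generate(WHO_WHAT)` returns the sentence in (6), `interpret` maps that sentence back to the single question `WHO`, and `effable(WHO_WHAT)` is false.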
From the interpretation point of view rad (wheel) and rat (rat) are equally good interpretations for /rat/. Neither incurs a mark by any of the interpretation constraints. The mark incurred in the generation component is unimportant once /rat/ has become the optimal realisation of rad and rat. The same applies to my syntactic version of the rad-rat problem. After Wie slaat Hans? has become the optimal realisation of both ?x beat(x, Hans) and ?x beat(Hans, x) the STAY violations become irrelevant.
Killing and causing to die
We lose the ability to predict the semantic difference between kill and cause to die in this framework. A use of kill tends to be interpreted as a 'standard killing' while cause to die indicates that the killing is indirect, or at least non-standard. Blutner8 explains this selection of meanings with weak superoptimality, using only the way generation and interpretation are combined.
Reflexives
Grice (1975) remarks that if you say (7), you imply that the woman is not his wife, his mother, or his sister.
(7) I saw John in town yesterday with a woman.
We might add that the woman is also not the speaker or the listener or any other high salience item in the discourse situation. A natural explanation for
8 As Blutner points out, there is another problem. If there are not two possibilities, the prediction from superoptimality is that only the simple reading remains. That would predict that make laugh only has the direct interpretation, or that in Frisian, which has no reflexives, normal pronouns would only have reflexive meanings.
It is a pity we lose this explanation, but there is no reason for despair because a simple alternative explanation is available. Let us assume that there is an ECONOMY constraint active in the OT syntax. This constraint militates against long and infrequent ways of expression. If the sheriff killed Bill in a normal way, ECONOMY will prevent the selection of cause to die. For the interpreter, that means that the interpretation 'standard killing' is not available for the form cause to die. That form is not a survivor, since for simple killing kill must be used due to the ECONOMY constraint. Suppose that we also have a (stronger) constraint PARSE-MARKED which requires a marked way of expression when an input item is semantically marked, i.e. it belongs to the extension of a certain predicate, but it is an unusual member of that extension. Assume, moreover, that long and/or infrequent expressions are marked ways of expression and so fulfil the constraint when the input is semantically marked. The interpreter can then only interpret cause to die as the expression of a marked way of killing. The generator would violate PARSE-MARKED by simply using kill, if there was something strange about the way the sheriff proceeded. Though I appreciate the beauty of the explanation by weak superoptimality, I am worried by the fact that the interpreter actually overinterprets cause to die in Blutner's account. As I see it, the interpreter would violate *INVENT. I avoid this problem by having an input feature that distinguishes the two readings.
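The interaction of ECONOMY and PARSE-MARKED can be made concrete in a few lines. The sketch rests on simplifying assumptions that are not in the text: the input feature is encoded in the content label, a multi-word expression counts as a marked form, and ECONOMY assigns one mark per word beyond the first as a crude stand-in for length and frequency.

```python
FORMS = ["kill", "cause to die"]
CONTENTS = ["standard killing", "marked killing"]

def is_marked_form(form):
    # Assumption: a multi-word expression counts as long, hence marked.
    return len(form.split()) > 1

def parse_marked(content, form):
    """Violated when a semantically marked input gets an unmarked expression."""
    return 1 if content == "marked killing" and not is_marked_form(form) else 0

def economy(content, form):
    """One mark per word beyond the first."""
    return len(form.split()) - 1

G = [parse_marked, economy]   # PARSE-MARKED outranks ECONOMY

def generate(content):
    """Form with the best violation profile under G."""
    return min(FORMS, key=lambda f: tuple(c(content, f) for c in G))

def readings(form):
    """Contents for which the form is the optimal realisation."""
    return [c for c in CONTENTS if generate(c) == form]
```

Here `readings("cause to die")` contains only the marked killing and `readings("kill")` only the standard one, so the interpreter needs no overinterpretation to separate the two.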
8 MORALS
In this paper, I have shown that a theory of semantic interpretation on the basis of OT syntax is feasible, if it is supplemented with some quite general semantic and pragmatic principles. The place of the Gricean maxims within this scheme has so far not been explored properly. It is clear that relevance and quantity must play a role at some point. Superoptimality (or weak superoptimality) and the speaker and listener games developed by Dekker & van Rooy (2000) continue to be relevant, but do not penetrate syntax as such. The treatment opens perspectives for the further development of the field of semantics as such. If I am right, compositionality does not need to be as much a straitjacket as it was in the heyday of the rule-to-rule
9 A counterexample is Isherwood's title (8), the disciple being the author himself:
(8) My Guru and his Disciple.
Is this incorrect? Certainly there is a suggestion of respect and modesty that would be absent in My Guru and Me. Another literary effect seems to be that the topic of the book is neatly described: it is about the guru and Isherwood himself, but only in his capacity as the guru's disciple.
10 An exception should be made here for Panini, who by his general architecture and elsewhere principle is clearly a precursor of OT.
this within OT is the assumption of a sequence of parsing constraints that force us to indicate in the output that the referent of an NP is the speaker9 or the listener, c-commanded, currently in the discourse topic, in the visible surroundings of the utterance, has been mentioned before, is related to a highly salient discourse item by a relation expressed by the common noun of the NP, is uniquely described by the common noun of the NP, etc. We further have to assume that first- and second-person pronouns express the person, reflexives c-commanding (or, in English, perspective), personal pronouns membership of the discourse topic, demonstratives the presence in the visible surroundings, the definite article either previous mention or a relation to an object in the discourse topic or uniqueness. The use of default rules for NP-selection is the standard technique in natural language generation and the only reason they have not found their way into linguistics is that most grammatical formalisms before OT syntax cannot accommodate them in a natural way.10 In combination with *INVENT and ANCHOR the hierarchy of parse constraints gives us precisely the effects that Grice predicts: that we can rule out all the properties higher up in the hierarchy.
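The hierarchy of parse constraints described above behaves like an ordered list of default rules, which can be sketched as follows. The encoding is an assumption made for illustration: property labels are informal strings, the ranking follows the order in which the text lists the properties, and the indefinite article is treated as the elsewhere case, which the text suggests through example (7) but does not state.

```python
# Ordered from the strongest parse constraint to the weakest (illustrative encoding).
HIERARCHY = [
    ("speaker", "first-person pronoun"),
    ("listener", "second-person pronoun"),
    ("c-commanded", "reflexive"),
    ("in discourse topic", "personal pronoun"),
    ("visible", "demonstrative"),
    ("previously mentioned", "definite article"),
    ("related to salient item", "definite article"),
    ("unique", "definite article"),
]
DEFAULT = "indefinite article"   # assumed elsewhere case

def select_np(properties):
    """Pick the NP type for the highest-ranked property the referent has."""
    for prop, np_type in HIERARCHY:
        if prop in properties:
            return np_type
    return DEFAULT

def excluded_properties(np_type):
    """Properties the hearer may rule out on hearing an NP of this type."""
    if np_type == DEFAULT:
        return [prop for prop, _ in HIERARCHY]
    first = next(i for i, (_, t) in enumerate(HIERARCHY) if t == np_type)
    return [prop for prop, _ in HIERARCHY[:first]]
```

On this encoding an indefinite such as a woman in (7) excludes every property in the hierarchy, which is the Gricean effect: had the referent been the speaker, the listener, or a salient item, a stronger parse constraint would have forced a different NP.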
hypothesis. A traditional problem is that of idiomatic expressions. The rule-to-rule hypothesis predicts that both sentences in (9) mean the same, i.e. that the speaker wants to know the time.
(9) What time is it? How late is it?
Now the fact of the matter is that in English the second expression, though grammatical, is merely a source of wonder, while only the first actually expresses the wish to know the time. (This is reversed for the two Dutch equivalents.) It should be easy to configure the English OT syntax so that only the first is an optimal expression of the input (avoiding low frequency items would already seem to do that). The second sentence is then correctly predicted to be uninterpretable. An important feature of OT syntax is that it can easily underspecify the full content of the semantic input. It is reasonable to assume that the representations in (10) are both optimally generated by Every man likes a woman.
(10) $\forall x(\mathrm{man}(x) \rightarrow \exists y(\mathrm{woman}(y) \wedge \mathrm{like}(x, y)))$
$\exists y(\mathrm{woman}(y) \wedge \forall x(\mathrm{man}(x) \rightarrow \mathrm{like}(x, y)))$
The syntax parses the grammatical function of the two quantifiers and their quantificational force, but not their relative scope. The function of polarity sensitive items also becomes clearer: they parse a semantic feature of the environment of the semantic NP. What we need is a weaker interpretation of the principle of compositionality. Frege does not say much more than that the meaning of a complex expression is a function of the meaning of its parts. What we need are slightly more liberal formulations. Parts must be taken to be the smallest meaningful part, which can include fixed combinations of words. And though we must admit that the meaning of a complex expression is determined by applying a function to the meaning of its parts, it does not follow that natural languages make it clearer what the precise logical content of that function is on a particular occasion than they make it clear what shade of blue is involved in my daughter's new blue dress. Though we can go for more precision in both cases, such precision is not required or desirable for everyday communication.
Acknowledgements
This paper stems from remarks I made as a member of the panel at the OTS Conference in January 2000, Paul Smolensky's acute criticism of superoptimality, dinner-time discussion after the conference, and email discussion with Reinhard Blutner. I wish to thank especially Robert van Rooy, Reinhard Blutner, Marie Nilsenova, and Anna Pilatova for their many useful comments. Many thanks also to the two anonymous reviewers for their excellent suggestions.
HENK ZEEVAT
ILLC/Computational Linguistics
University of Amsterdam
Spuistraat 134
1012 VB Amsterdam
The Netherlands
Received: 05.04.2000
Final version received: 29.08.2000
REFERENCES
Blutner, R. (2000), 'Some aspects of optimality theory in interpretation' (this volume).
Boersma, P. (1998), 'Functional phonology', Ph.D., Amsterdam.
Bresnan, J. (MS A), 'The emergence of the unmarked pronoun: Chichewa pronominals in optimality theory', BLS23.
Bresnan, J. (MS B), 'Optimal Syntax', MS, Stanford.
Choi, Hye-Won (1998), 'Optimizing structure in communication: scrambling and information structure', Ph.D., Stanford.
Dalrymple, M., Kanazawa, M., Kim, Y., Mchombo, S., & Peters, S. (1998), 'Reciprocal expressions and the concept of reciprocity', Linguistics & Philosophy, 21, 159-210.
Dekker, P. & Rooy, R. van (2000), 'Bidirectional optimality theory: an application of game theory' (this volume).
Geurts, B. (2000), 'Buoyancy and strength' (this volume).
Grice, H. P. (1975), 'Logic and conversation', in P. Cole & J. L. Morgan (eds), Syntax and Semantics 3: Speech Acts, Academic Press, New York, 41-58.
Grimshaw, J. (1997), 'Projection, heads and optimality', Linguistic Inquiry, 28, 373-422.
Heim, I. (1982), 'The semantics of definite and indefinite noun phrases', Ph.D., University of Massachusetts, Amherst.
Heim, I. (1983), 'On the projection problem for presuppositions', WCCFL, 2, 114-26.
Hendriks, P. & Hoop, H. de (2000), 'Optimality theoretic semantics', Linguistics and Philosophy (to appear).
Hoop, H. de & Swart, H. de (2000), 'Temporal adjunct clauses in optimality theory', in H. de Hoop & H. de Swart (eds), OTS2, OTS, Utrecht.
Kamp, H. (1981), 'A theory of truth and semantic representation', in J. A. G. Groenendijk, T. M. V. Janssen, & M. B. J. Stokhof (eds), Formal Methods in the Study of Language, Mathematical Centre, Amsterdam.
Karttunen, L. (1973), 'Presuppositions of compound sentences', Linguistic Inquiry, 167-93.
Karttunen, L. (1974), 'Presupposition and linguistic context', Theoretical Linguistics, 1, 181-94.
Lewis, D. (1970), 'General semantics', Synthese, 22, 18-67.
Marriott, K. & Meyer, B. (1998), 'The CCMG visual language hierarchy', in K. Marriott & B. Meyer (1998), Visual Language Theory, Springer Verlag, New York/Berlin/Heidelberg, 129-171.
Montague, R. (1974), 'The proper treatment of quantification in ordinary English', in R. Thomason (ed.), Formal Philosophy, New Haven/London, 247-71.
Pesetsky, D. (1997), 'Optimality in syntax', in D. Archangeli & T. Langendoen (eds), Optimality Theory, Blackwell, Oxford.
Sandt, R. van der (1992), 'Presupposition projection as anaphora resolution', Journal of Semantics, 9, 333-77.
Smolensky, P. (1996), 'On the comprehension/production dilemma in child language', Linguistic Inquiry, 27.
Smolensky, P. (2000), Handout OTS Conference, Utrecht 3 January 2000.
Stalnaker, R. (1978), 'Assertion', in Peter Cole (ed.), Syntax and Semantics 9: Pragmatics, Academic Press, New York, 315-32.
Zeevat, H. (1999), 'Explaining presupposition triggers', in P. Dekker (ed.), AC 99 Proceedings, University of Amsterdam.
Zeevat, H. (2000), 'Optimal semantics', in H. de Hoop & H. de Swart (eds), OTS2, OTS, Utrecht.