JOURNAL OF SEMANTICS
Volume 26, Number 4, November 2009
www.jos.oxfordjournals.org
ISSN 0167-5133 (Print), ISSN 1477-4593 (Online)

FORTHCOMING ARTICLES
Dorit Abusch: Presupposition Triggering from Alternatives
Donka F. Farkas and Kim B. Bruce: On Reacting to Assertions and Polar Questions
Kristen Syrett: Meaning and Context in Children's Understanding of Gradable Adjectives
JOURNAL OF SEMANTICS
An International Journal for the Interdisciplinary Study of the Semantics of Natural Language
MANAGING EDITOR:
Philippe Schlenker (Institut Jean-Nicod, Paris; New York University)

ASSOCIATE EDITORS:
Danny Fox (Massachusetts Institute of Technology), Manfred Krifka (Humboldt University Berlin; ZAS, Berlin), Rick Nouwen (Utrecht University), Robert van Rooij (University of Amsterdam), Yael Sharvit (University of Connecticut), Jesse Snedeker (Harvard University), Zoltán Gendler Szabó (Yale University), Anna Szabolcsi (New York University)

ADVISORY BOARD:
Gennaro Chierchia (Harvard University), Bart Geurts (University of Nijmegen), Lila Gleitman (University of Pennsylvania), Irene Heim (Massachusetts Institute of Technology), Laurence R. Horn (Yale University), Beth Levin (Stanford University), Barbara Partee (University of Massachusetts, Amherst), François Recanati (Institut Jean-Nicod, Paris), Roger Schwarzschild (Rutgers University), Arnim von Stechow (University of Tübingen), Thomas Ede Zimmermann (University of Frankfurt)
EDITORIAL BOARD:
Maria Aloni (University of Amsterdam), Pranav Anand (University of California, Santa Cruz), Nicholas Asher (IRIT, Toulouse; University of Texas, Austin), Chris Barker (New York University), Sigrid Beck (University of Tübingen), David Beaver (University of Texas, Austin), Rajesh Bhatt (University of Massachusetts, Amherst), Maria Bittner (Rutgers University), Peter Bosch (University of Osnabrück), Richard Breheny (University College London), Daniel Büring (University of California, Los Angeles), Emmanuel Chemla (Institut Jean-Nicod, Paris; LSCP, Paris), Jill G. de Villiers (Smith College), Paul Dekker (University of Amsterdam), Josh Dever (University of Texas, Austin), Regine Eckardt (University of Göttingen), Martina Faller (University of Manchester), Delia Fara (Princeton University), Lyn Frazier (University of Massachusetts, Amherst), Jeroen Groenendijk (University of Amsterdam), Elena Guerzoni (University of Southern California), Martin Hackl (Pomona College), Pauline Jacobson (Brown University), Andrew Kehler (University of California, San Diego), Chris Kennedy (University of Chicago), Jeffrey C. King (Rutgers University), Angelika Kratzer (University of Massachusetts, Amherst), Peter Lasersohn (University of Illinois), Jeffrey Lidz (University of Maryland), John MacFarlane (University of California, Berkeley), Lisa Matthewson (University of British Columbia), Julien Musolino (Rutgers University), Ira Noveck (L2C2, CNRS, Lyon), Francis Jeffry Pelletier (University of Alberta), Colin Phillips (University of Maryland), Paul M. Pietroski (University of Maryland), Christopher Potts (Stanford University), Liina Pylkkänen (New York University), Gillian C. Ramchand (University of Tromsoe), Maribel Romero (University of Konstanz), Mats Rooth (Cornell University), Uli Sauerland (ZAS, Berlin), Barry Schein (University of Southern California), Bernhard Schwarz (McGill University), Benjamin Spector (Institut Jean-Nicod, Paris), Robert Stalnaker (Massachusetts Institute of Technology), Jason Stanley (Rutgers University), Mark Steedman (University of Edinburgh), Michael K. Tanenhaus (University of Rochester), Jos van Berkum (Max Planck Institute for Psycholinguistics, Nijmegen), Rob van der Sandt (University of Nijmegen), Yoad Winter (Utrecht University), Henk Zeevat (University of Amsterdam)
EDITORIAL CONTACT:
[email protected] © Oxford University Press 2009 For subscription information please see back of journal.
Editorial Policy Scope Journal of Semantics aims to be the premier generalist journal in semantics. It covers all areas in the study of meaning, and particularly welcomes submissions using the best available methodologies in semantics, pragmatics, the syntax/semantics interface, cross-linguistic semantics, experimental studies of meaning (processing, acquisition, neurolinguistics), and semantically informed philosophy of language. Types of articles Journal of Semantics welcomes all types of research articles–with the usual proviso that length must be justified by scientific value. Besides standard articles, the Journal will welcome ‘squibs’, i.e. very short empirical or theoretical contributions that make a pointed argument. In exceptional circumstances, and upon the advice of the head of the Advisory Board, the Journal will publish ‘featured articles’, i.e. pieces that we take to make extraordinary contributions to the field. Editorial decisions within 10 weeks The Journal aims to make editorial decisions within 10 weeks of submission. Refereeing Articles can only be accepted upon the advice of anonymous referees, who are asked to uphold strict scientific standards. Authors may include their names on their manuscripts, but they need not do so. (To avoid conflicts of interest, any manuscript submitted by one of the Editors will be handled by the head of the Advisory Board, who will be responsible for selecting referees and making an editorial decision.) Submissions All submissions are handled electronically. Manuscripts should be emailed in PDF format to the Managing Editor [
[email protected]], who will forward them to one of the Editors. The latter will be responsible for selecting referees and making an editorial decision. Receipt of a submission is systematically confirmed. Papers are accepted for review only on the condition that they have neither as a whole nor in part been published elsewhere, are elsewhere under review or have been accepted for publication. In case of any doubt authors must notify the Managing Editor of the relevant circumstances at the time of submission. It is understood that authors accept the copyright conditions stated in the journal if the paper is accepted for publication.
All rights reserved; no part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise without prior written permission of the Publishers, or a licence permitting restricted copying issued in the UK by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1P 9HE, or in the USA by the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. Typeset by TnQ Books and Journals Pvt. Ltd., Chennai, India. Printed by Bell and Bain Ltd, Glasgow, UK
SUBSCRIPTIONS
A subscription to Journal of Semantics comprises 4 issues. All prices include postage, and for subscribers outside the UK delivery is by Standard Air. Journal of Semantics Advance Access contains papers that have been finalised, but have not yet been included within the issue. Advance Access is updated monthly.

Annual Subscription Rate (Volume 26, 4 issues, 2009)
Institutional:
Print edition and site-wide online access: £187/$365/€281
Print edition only: £178/$347/€267
Site-wide online access only: £178/$347/€267
Personal:
Print edition and individual online access: £70/$137/€105
Please note: £ Sterling rates apply in Europe, US$ elsewhere.

There may be other subscription rates available; for a complete listing please visit www.jos.oxfordjournals.org/subscriptions. Full prepayment, in the correct currency, is required for all orders. Orders are regarded as firm and payments are not refundable. Subscriptions are accepted and entered on a complete volume basis. Claims cannot be considered more than FOUR months after publication or date of order, whichever is later. All subscriptions in Canada are subject to GST. Subscriptions in the EU may be subject to European VAT. If registered, please supply details to avoid unnecessary charges. For subscriptions that include online versions, a proportion of the subscription price may be subject to UK VAT. Personal rate subscriptions are only available if payment is made by personal cheque or credit card and delivery is to a private address. The current year and two previous years' issues are available from Oxford University Press. Previous volumes can be obtained from the Periodicals Service Company, 11 Main Street, Germantown, NY 12526, USA. Email:
[email protected]. Tel: +1 (518) 537 4700. Fax: +1 (518) 537 5899. For further information, please contact: Journals Customer Service Department, Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, UK. Email:
[email protected]. Tel (and answerphone outside normal working hours): +44 (0)1865 353907. Fax: + 44 (0)1865 353485. In the US, please contact: Journals Customer Service Department, Oxford University Press, 2001 Evans Road, Cary, NC 27513, USA. Email:
[email protected]. Tel (and answerphone outside normal working hours): 800 852 7323 (toll-free in USA/Canada). Fax: 919 677 1714. In Japan, please contact: Journals Customer Services, Oxford University Press, 1-1-17-5F, Mukogaoka, Bunkyo-ku, Tokyo, 113-0023, Japan. Email:
[email protected]. Tel: (03) 3813 1461. Fax: (03) 3818 1522. Methods of payment. Payment should be made: by cheque (to Oxford University Press, Cashiers Office, Great Clarendon Street, Oxford, OX2 6DP, UK); by bank transfer [to Barclays Bank Plc, Oxford Office, Oxford (bank sort code 20-65-18) (UK);
overseas only Swift code BARC GB22 (GB£ Sterling Account no. 70299332, IBAN GB89BARC20651870299332; US$ Dollars Account no. 66014600, IBAN GB27BARC20651866014600; EU € Euro Account no. 78923655, IBAN GB16BARC20651878923655]; or by credit card (Mastercard, Visa, Switch or American Express). Journal of Semantics (ISSN 0167-5133) is published quarterly (in February, May, August and November) by Oxford University Press, Oxford, UK. Annual subscription price is £187/$365/€281. Journal of Semantics is distributed by Mercury International, 365 Blair Road, Avenel, NJ 07001, USA. Periodicals postage paid at Rahway, NJ and at additional entry points. US Postmaster: send address changes to Journal of Semantics (ISSN 0167-5133), c/o Mercury International, 365 Blair Road, Avenel, NJ 07001, USA.

Abstracting and Indexing
Annual Bibliography English Language Literature (ABEL), INSPEC, International Bibliography Sociology, Linguistics Abstracts, Linguistics and Language Behaviour Abstracts (LLBA), MLA: International Bibliography Books, Articles and Modern Language Literature, periodicals Contents Index, Philosopher's Index, Social Planning Policy and Development Abstracts, Bibliographie Linguistique/Linguistic Bibliography and BLonline.

Permissions
For information on how to request permissions to reproduce articles/information from this journal, please visit www.oxfordjournals.org/jnls/permissions.

Advertising
Inquiries about advertising should be sent to Linda Hann, E-mail: lhann@lhms.fsnet.co.uk. Phone/fax: 01344 779945.

Disclaimer
Statements of fact and opinion in the articles in Journal of Semantics are those of the respective authors and contributors and not of Journal of Semantics or Oxford University Press. Neither Oxford University Press nor Journal of Semantics makes any representation, express or implied, in respect of the accuracy of the material in this journal and cannot accept any legal responsibility or liability for any errors or omissions that may be made.
The reader should make his/her own evaluation as to the appropriateness or otherwise of any experimental technique described.
JOURNAL OF SEMANTICS Volume 26 Number 4
CONTENTS CHRISTOPHER DAVIS Decisions, Dynamics and the Japanese Particle yo
329
NINA GIERASIMCZUK AND JAKUB SZYMANIK Branching Quantification v. Two-way Quantification
367
ALEX LASCARIDES AND MATTHEW STONE A Formal Semantic Analysis of Gesture
393
Please visit the journal’s web site at www.jos.oxfordjournals.org
Journal of Semantics 26: 329–366 doi:10.1093/jos/ffp007 Advance Access publication July 17, 2009
Decisions, Dynamics and the Japanese Particle yo CHRISTOPHER DAVIS University of Massachusetts Amherst
Abstract

I provide an account of the Japanese sentence-final particle yo within a dynamic semantics framework. I argue that yo is used with one of two intonational morphemes, corresponding to sentence-final rising or falling tunes. These intonational morphemes modify a sentence's illocutionary force head, adding an addressee-directed update semantics to the utterance. The different intonational contours specify whether this update is monotonic or non-monotonic. The use of yo is then argued to contribute a pragmatic presupposition to the utterance saying that the post-update discourse context is one in which the addressee's contextual decision problem is resolved. This proposal is shown to account for a range of constraints on the felicitous use of yo, including its restriction to addressee-new and addressee-relevant information in assertions, as well as its behaviour in imperatives and interrogatives.

1 INTRODUCTION

1.1 Basic facts

The particle yo is one of a number of sentence-final particles (SFPs) in Japanese.1 A typical example of a sentence with yo is given in (1).

(1) densha-ga ki-ta yo
train-NOM come-PAST yo
'The train is here yo.'

As noted by McCready (2005, 2009), the presence of yo has no obvious effect on the truth conditions of a sentence in which it occurs. Thus, the sentence in (1) is true just in case a train arrived; these truth conditions hold with or without the presence of yo.

1 That is, standard Tokyo Japanese. The inventory of SFPs and their range of uses are subject to dialectal variation. The facts reported in this article reflect the judgments of native speakers of Tokyo Japanese.

© The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected].

Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
Like other SFPs, the syntactic position of yo is extremely rigid. It must occur sentence-finally2 and cannot be embedded, as seen in (2), which is ungrammatical unless the complement of omot-ta 'thought' is interpreted as a direct quotation of John's thoughts.

(2) *densha-ga ki-ta yo to John-ga omot-ta
train-NOM come-PAST yo COMP John-NOM think-PAST
'John thought [that the train had come yo].'

2 SFPs can co-occur in Japanese, as in the following example with the particles wa and yo, so that strictly speaking it is the particle cluster that has to appear sentence-finally.

mou kaet-ta wa yo
already return-PAST wa yo
'(He) already went home.'

When multiple particles occur, their relative ordering is rigid. The only particles that can follow yo are ne and na, which are themselves in complementary distribution. This article does not deal with particle clusters, focusing instead on sentences containing only the particle yo. The question of other particle meanings, and a compositional account of particle clusters, remains for future research.

If yo does not affect the truth conditions of a sentence, what does it do? The literature is rich in data and insights into the conditions governing the felicitous use of yo, but with the exception of the proposals of McCready (2005, 2006, 2009), these accounts are not typically presented in the context of a formal semantic and pragmatic theory. McCready (2005) identifies three main perspectives on the use of yo; these are summarized below, along with a non-exhaustive list of references. Uttering (1) is infelicitous if the hearer already knows that the train has arrived, reflecting the observation (Kamio 1994; Suzuki Kose 1997a,b) that a sentence with yo marks information that is new to the hearer, or which the hearer has forgotten. Also, uttering (1) often carries a sense of urgency or insistence, a fact that can be attributed to the use of yo (Suzuki Kose 1997a; McCready 2009). Finally, (1) is odd unless the hearer is assumed to have some interest in the arrival of the train, as noted by McCready (2005), citing Noda (2002), reflecting the observation that yo marks the relevance of the asserted content to the addressee. The account of yo I propose in this article captures all of these observations. The formal proposal builds on the intuition that yo marks the relevance of an utterance's content to the addressee. However, I formalize this notion in a way that encompasses the use of yo in imperatives and questions, whose characterization is not captured by the above generalizations, since imperatives and questions do not encode informational content in the same way that assertions do. The
intuition that yo gives a sense of urgency or insistence to an utterance is argued to stem from one of two intonational morphemes with which yo occurs. Before giving my account of the semantics and pragmatics of yo, I review the account of McCready (2009), which most closely resembles that proposed in this article.
1.2 McCready’s (2009) account: dynamics and relevance
McCready (2009) proposes the two-part semantics of yo in (3), with a dynamic component (3a) in which yo contributes a specific kind of update semantics, and a presuppositional component (3b) that captures the intuition that the use of yo indicates the relevance of the utterance for the hearer.

(3) ⟦yo(φ)⟧ =
a. Semantics: σ[S-ASSERT(φ)]σ′
b. Presupposition: B_S(IV_H(Q, φ) > d_s)

The dynamic component of yo's meaning involves a 'strong assertion' of yo's propositional complement φ, S-ASSERT(φ), defined in (4).

(4) σ[S-ASSERT(φ)]σ′ =
a. σ[φ]σ′ if σ[φ] ≠ ∅
b. σ[↓¬φ ; φ]σ′ otherwise.

S-ASSERT(φ) is an instruction to the interpreter to update its information state σ with φ if the post-update information state σ′ is non-empty. If an update of σ with φ would result in an empty information state, then the interpreter is instructed to first downdate with the negation of φ (written ↓¬φ), then update with φ. Taking the information state σ to be a set of propositions, the downdate operator is an instruction to remove ¬φ from σ. In simple cases, this corresponds to set subtraction, so that the new information state is σ − {¬φ}. In cases where the removed proposition is entailed by other propositions in the information state, or is itself an important premise, more extensive and often non-deterministic revisions are required (see, e.g. Gärdenfors 1988 for discussion). McCready argues that the S-ASSERT component of yo's meaning is responsible for the sense of strength or insistence that yo contributes to an utterance. The presuppositional component of McCready's semantics for yo is intended to capture the intuition that yo marks assertions whose propositional content is taken by the speaker to be relevant to the hearer. The formula in (3b) says that it is a presupposition of yo(φ) that
the speaker believes (B_S) that the information value for the hearer of φ with respect to some contextual question Q, IV_H(Q, φ), is above some contextual relevance threshold d_s. This formulation of relevance builds on proposals of van Rooy (2003a,b) in which the relevance of a proposition φ is associated with φ's informativity for the interpreter with respect to a contextually specified question Q, which is understood as a partition on the set of worlds and can be identified with the Question Under Discussion (QUD, Roberts 1996, 2004). At an intuitive level, the informativity metric in (3b) measures the extent to which the proposition φ helps to reduce the hearer's uncertainty with respect to the question Q. More technically, the measure considers the entropy, an information theoretic measure of uncertainty, of the hearer's information state with respect to Q before and after update with φ. This difference is the informativity value for the hearer of φ with respect to Q. The presupposition in (3b) requires that this value be above some contextual threshold. McCready's proposal is the starting point for my own analysis. I take the idea underlying his proposal as essentially correct, in that it attempts to account for both the dynamic effect of using yo and the intuition that yo has something to do with marking the relevance of the utterance to the addressee. I further explore both of these aspects of yo's meaning. In section 2, I argue that the dynamic component of yo's meaning depends on the intonational pattern with which it is used: yo with a falling intonational pattern is associated with a non-monotonic update similar to that contributed by McCready's S-ASSERT, while a rising intonational pattern is associated with a standard, monotonic update. Both intonational morphemes encode an update to the addressee's, rather than the speaker's, public beliefs in the discourse context, accounting for the addressee-directedness of utterances with yo.
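To make the two components of McCready's proposal concrete, the following is a deliberately simplified Python sketch (my own illustration, not McCready's formalism): information states are modelled as sets of string propositions with a '~' prefix marking negation, S-ASSERT performs the conditional downdate of (4) in the simple set-subtraction case, and informativity is computed as entropy reduction over the cells of Q, with the hearer's state represented as a probability distribution. All function names and representational choices are assumptions of this sketch.

```python
import math

# --- Dynamic component: S-ASSERT over a set-of-propositions state ---
# Propositions are strings; "~p" is the negation of "p".

def neg(p):
    """Negate a proposition, collapsing double negation."""
    return p[1:] if p.startswith("~") else "~" + p

def consistent(state):
    """A state counts as absurd here if it contains both p and ~p."""
    return all(neg(p) not in state for p in state)

def update(state, p):
    """Monotonic update of the state with p."""
    return state | {p}

def downdate(state, p):
    """Downdate (simple case): remove p by set subtraction."""
    return state - {p}

def s_assert(state, p):
    """(4): update with p if the result is consistent; otherwise
    first downdate with the negation of p, then update with p."""
    new = update(state, p)
    if consistent(new):
        return new
    return update(downdate(state, neg(p)), p)

# --- Presuppositional component: entropy-based informativity ---

def entropy(dist):
    """Shannon entropy (bits) of the hearer's distribution over Q's cells."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def informativity(prior, posterior):
    """Informativity as entropy reduction: uncertainty before minus after."""
    return entropy(prior) - entropy(posterior)
```

For instance, s_assert({"~train_arrived"}, "train_arrived") first retracts the contradictory belief and then updates, yielding {"train_arrived"}. And if Q partitions the worlds into four equiprobable cells while φ eliminates two of them, the hearer's entropy drops from 2 bits to 1 bit, so informativity is 1 bit; the presupposition in (3b) then requires this value to exceed the contextual threshold d_s.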
I then present evidence that the same generalization holds in imperatives. This leads to an extension of yo’s dynamic semantics to imperatives. In section 3, I present data showing that yo is used with utterances that are intended by the speaker to guide the action of the addressee, and moreover that in such contexts bare declaratives are infelicitous in Japanese. On the basis of these observations, I propose a non-assertive component of yo’s meaning that captures its use as a guide to action and at the same time accounts for the intuition that yo marks information that is in some sense relevant to the addressee. This proposal accounts for the behaviour of yo not only in assertive utterances but also in imperatives and questions as well. Section 4 concludes with discussion about the architecture of the semantic system developed in this article
and the way that semantics and pragmatics interact in governing the felicity conditions of discourse particles like yo.

2 INTONATION AND THE DYNAMICS OF YO
3
Koyama (1997) claims that yo can occur with a distinct falling–rising contour intonation, in addition to simple falling and simple rising intonations. However, in terms of the contextual restrictions and interpretations that I discuss in this article, the rising intonation and the falling–rising contour intonation pattern seem to pattern identically, so I treat them as a group. Further work may, though, reveal interpretational differences between the simple rising and the falling–rising intonations.
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
At the very least, one can identify distinct rising and falling intonational patterns with which yo can occur (Shirakawa 1993; Matsuoka 2003).3 I use yo[ to designate yo with rising intonation, and yoY to designate yo with falling intonation. Koyama (1997) argues that the meaning of yo (and other SFPs in Japanese) should be distinguished from the meaning attributable to the intonational contour with which it occurs. I follow Koyama in this respect, arguing that yo[ and yoY are morphologically complex, consisting of the morpheme yo and one of two other morphemes that I represent as [ and Y, reflecting their phonological manifestation on the intonational tier. The rising and falling tunes associated with [ and Y can be identified phonologically with two of the five boundary phrase markers (BPMs) discussed by Venditti et al. (2008) within the framework of XJToBI [extended Japanese ToBI, an extension/revision of the J_ToBI schema (Venditti 2005), which in turn is based on the theory of Japanese intonational structure of Pierrehumbert & Beckman (1988)]. BPMs are a robustly attested phenomenon in spoken Japanese and are generally considered to be ‘pragmatic morphemes’ (see Venditti et al. 2008:13, for discussion and extensive references). Under the analysis implicit in the X-JToBI labelling schema, BPMs are associated with the right edge of an accentual phrase and can thus occur both sentencemedially and sentence finally. The dialogues from the ‘core’ portion of the Corpus of Spoken Japanese [Kokuritsu Kokugo Kenkyuujo (National Institute for Japanese Language) 2006] contain 55 utterance-final occurrences of yo with associated X-JToBI intonational labelling, 33 of which are associated with the rising BPM tune L%H% and 12 with the falling BPM tune L%. Based on these data, I assume that the morpheme I represent as [ is realized phonologically by a tune on the intonational tier which in the X-JToBI system is represented as the tune L%H%. 
334 Decisions, Dynamics and the Japanese Particle yo

The morpheme I represent as Y is phonologically manifested as the tune L%. This makes the phonology of Y indistinguishable (at least in the X-JToBI labelling conventions) from the default final fall associated with the right edge of accentual and intonational phrases in the phonology of Japanese. The justification for positing a distinct morpheme Y comes from semantic arguments presented just below. In this section, I provide data suggesting that the dynamic component of yo's meaning is a function of the intonational pattern with which it is used. I argue that the morpheme [ encodes an addressee-directed monotonic update, while Y encodes an addressee-directed non-monotonic update. In section 2.1, I describe the contribution of intonation to the dynamic component of yo's meaning in assertions. In section 2.2, I show that yo's intonation has a function in imperatives that is analogous to its function in assertions.

2.1 Intonation of yo in assertions

My analysis of the dynamic effects of yo's intonation in assertions is situated within the model of discourse context proposed by Gunlogson (2003), in which the Common Ground (Stalnaker 1978) is derived from the discourse participants' public beliefs. PBA(C) represents the public beliefs of agent A in context C.4 Gunlogson's definition of public beliefs is given in (5).

(5) Let PBA and PBB be sets of propositions representing the public beliefs of A and B, respectively, with respect to a discourse in which A and B are the participants, where:
    a. p is a public belief of A iff 'A believes p' is a mutual belief of A and B
    b. p is a public belief of B iff 'B believes p' is a mutual belief of A and B

A conversational context C is a tuple whose elements are the public beliefs of each conversational participant. In case there are two participants A and B, C = ⟨PBA, PBB⟩. I will use the notation PBX(C) to refer to the set containing the public beliefs of a discourse participant X in discourse context C. The Common Ground can be reconstructed by taking the intersection of the public beliefs of each discourse participant in a given discourse context C. In case there are just two discourse participants A

4 In Gunlogson's model, the public beliefs of an agent are identified with what are called discourse commitments. In this article, I expand the notion of discourse commitments to include an agent's public intentions, a topic addressed in section 2.2 when I turn to imperatives.

Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
Christopher Davis 335
and B, a simplifying assumption I make throughout this article, this reduces to the following:

(6) CG{A,B}(C) = PBA(C) ∩ PBB(C)

Throughout the article, I abbreviate CG{A,B}(C) to CG(C), with the understanding that this refers to the common ground constructed from the intersection of the public beliefs of both discourse participants in context C. Gunlogson follows Heim (1982) and others in treating the meaning of a sentence in terms of its context-change potential (CCP). Gunlogson adapts the CCP idea to her more articulated model of contexts by arguing that a given sentence corresponds to an update to a particular substructure of the context. Since in her system the common ground consists of the intersection of the public beliefs of the discourse participants in that context, the semantics of a declarative sentence is interpreted as an update to the public beliefs of some discourse participant X. Gunlogson leaves the discourse participant whose public beliefs are updated with the content of the declarative unspecified, arguing that intonation serves to specify the target of the update. I depart from Gunlogson's approach on this point, arguing that an assertive declarative sentence in a context C is uniformly associated with an update to the speaker's public beliefs in C. The update function + operates on a subpart of a discourse context and adds to that subpart a proposition. For example, PBX(C) + p is a context that is just like C in every respect, except that PBX(C) now contains p. An assertion encodes such an update to the speaker's public beliefs. This is encoded in the assertive operator ASSERT in (7).

(7) CCP of assertions: ⟦ASSERT⟧ = λpλC.PBspkr(C) + p

ASSERT is a monotonic function taking a propositional argument and returning a function from contexts to contexts (i.e. a CCP), whereby the input context C is updated by adding the proposition p to the speaker's public beliefs in C. There is nothing in the semantics in (7) directly encoding an update to the addressee's public beliefs, unlike the proposal in, for example, Portner (2007b), in which an expressive analysis of force markers is outlined. Typically, in a dialogue, an assertion is made not only to achieve an update to the speaker's public beliefs but also to influence the public beliefs of the addressee, and thus the contents of the common ground. I propose that a bare assertion (i.e. one without the particle yo) is typically intended and understood as an attempt to get the addressee to update his public beliefs. More concretely, an utterance of ASSERT(p) is interpreted as a function from contexts to contexts, in which the resulting context has p as a public belief of the speaker. The conversational function of such an utterance is to update the current discourse context according to the semantics of ASSERT(p). This has the effect of publicly committing the speaker to the truth of p. Further pragmatic considerations will in many circumstances lead one to interpret the speaker's assertion as an invitation (or suggestion, demand, etc.) to the addressee to update his own public beliefs with p.

Turning to the dynamic contribution of yo[ and yoY in assertions, I first note a number of facts that motivate the analysis. Koyama (1997: 105) says that yo[ exhibits the 'most typical' of the meanings associated with yo, including 'notification, information-transmission, and attention-calling'.5 On the other hand, Koyama argues, yoY gives to an utterance a sense that there is some kind of conflict or incompatibility in the speaker's and addressee's understanding.6 This is illustrated by the dialogue in (8).

(8) A: souridaijin-ga nakunat-ta
       prime.minister-NOM die-PAST
       'The prime minister died.'
    B: sin-de-nai yoY/#yo[
       die-INF-NEG yoY/#yo[
       '(No), he did not die.'

In the dialogue in (8), B indicates that he disbelieves A's assertion that the prime minister has died. In this context, B must use yoY in his rebuttal to A's utterance. It is infelicitous for B to use yo[. In dialogues where the speaker is not challenging any of the addressee's commitments, the use of yo[ becomes natural, as illustrated by the dialogue in (9).

(9) A: go-han mou tabe-ta?
       HON-rice already eat-PAST?
       'Did you eat already?'
    B: tabe-ta (yo[/yoY)
       eat-PAST (yo[/yoY)
       '(Yeah,) I ate.'

5 The original Japanese is 'kono taipu ga yo no youhou no naka de mottomo tenkeiteki de aru to omoware, iwayuru kokuchi, jouhoudentatsu, yobikake nado ni bunrui sareru'.
6 The original Japanese is 'kouchou no intoneeshon o tomonau baai ni wa, hanashite to kikite no ninshiki no sa ga kyouchou sareru dake de naku, mushiro sore ga kuichigatteiru koto ga shimesareru'.
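The context machinery in (5)–(7) can be sketched as a small computational toy model. This is my own illustration, not part of the paper's formalism: propositions are modeled as atomic labels, and a context maps each participant to a set of such labels.

```python
# Toy model of the Gunlogson-style contexts assumed in (5)-(7):
# a context maps each discourse participant to a set of public
# beliefs; the Common Ground in (6) is the intersection of those
# sets, and ASSERT in (7) adds its propositional argument to the
# speaker's public beliefs only.

def common_ground(context):
    """CG(C): propositions publicly believed by every participant."""
    belief_sets = list(context.values())
    cg = set(belief_sets[0])
    for pb in belief_sets[1:]:
        cg &= pb
    return cg

def assert_op(p):
    """ASSERT: returns a CCP that adds p to the speaker's public beliefs."""
    def ccp(context, speaker):
        new = {x: set(pb) for x, pb in context.items()}
        new[speaker].add(p)
        return new
    return ccp

# A and B both publicly believe q; A additionally believes p.
C = {"A": {"p", "q"}, "B": {"q"}}
C1 = assert_op("r")(C, "A")   # A asserts r: only A's commitments change
```

Note that assert_op leaves B's public beliefs untouched: as the surrounding discussion emphasizes, any effect of a bare assertion on the addressee is pragmatically inferred, not semantically encoded.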
By asking an information-seeking yes/no question, A indicates that he is uncommitted regarding whether B has eaten. Unlike in the dialogue in (8), there is nothing in A's assertion that is incompatible with B's response, and the use of yo[ is felicitous. If B nevertheless uses yoY, she conveys an objection to something about A's question. Informants report that B's use of yoY in this context implies that there is something A is taking for granted that B thinks A should not take for granted (e.g. the possibility that B has not yet eaten, or that it is appropriate to be asking this question of B). In this particular context, B's response is likely to convey something like 'Of course I ate, why are you asking me all these questions?!'7 For this reason, in neutral, non-confrontational contexts, the use of yoY is perceived as infelicitous. The use of yo[ has no such implications. I propose that the differences in use of yo[ and yoY illustrated above are due to the semantic contributions of the morphemes [ and Y. These morphemes, I argue, attach to the force head ASSERT and return a function of the same type, from propositions to CCPs. They are essentially adverbial modifiers of the force head. The denotations are given in (10).8

(10) a. ⟦[⟧ = λFλpλC.F(p)(PBaddr(C) + p)
     b. ⟦Y⟧ = λFλpλC.F(p)((PBaddr(C) Y q) + p)

The denotation of [ encodes a standard monotonic update to the addressee's public beliefs, parallel to the update to the speaker's public beliefs encoded by the ASSERT operator in (7). The denotation of Y also encodes an update to the addressee's public beliefs, but only after an initial downdate with another proposition q, which is a free propositional variable whose value must be contextually supplied. In cases like (8), this will be the negation of the propositional argument of Y. In cases like (9), it will have to be inferred more indirectly from the context. The data in (8) can now be accounted for as follows: in this dialogue, A first indicates his belief in the truth of the proposition p = 'The prime minister has died' by uttering ASSERT(p). This has the effect of adding p to the public beliefs of A. B responds by asserting that the prime minister did not die, ASSERT(¬p). This constitutes a contradiction to A's public belief in p and thus requires the use of yoY; yo[ is infelicitous. This is explained by the fact that since B's assertion is inconsistent with A's public beliefs, A must first downdate with the negation of B's assertion before he can update. If he were instructed simply to update his belief state with the content of B's assertion, the resulting set of public beliefs would be inconsistent, since it would contain both the proposition 'The prime minister has died' and 'The prime minister has not died'.9 While an assertion of p with yo[ is infelicitous in contexts in which the addressee has a public belief that ¬p, it is completely natural in a context in which the addressee has not made any commitment to the truth of p or ¬p. Thus B's assertion in (9) is felicitous. We also account for the fact that, if B uses yoY in (9), he is understood to mean that there is some proposition q to which A is publicly committed that B is instructing A to renounce. This proposition is contextually determined, and in the case at hand might be resolved to q = 'It is possible that B has not eaten' or q = 'It is appropriate for A to ask B whether he has eaten'. The use of [ and Y with an assertion explicitly encodes the update that the speaker wants the addressee to perform to his public beliefs. In bare assertions, by contrast, this update is indicated indirectly, via pragmatic reasoning. This allows us to account for the observation that, unlike bare assertive declaratives, assertions with yo are limited to dialogues and are not used in monologues, except in quotative contexts, a fact noted by Katagiri (2007). Related to this fact is the claim made by Shirakawa (1992) (cited in Koyama 1997: 104) that the use of yo (with either intonational pattern) serves to 'emphasize the fact that an utterance is being directed toward the addressee'.10 These observations follow from the semantics of [ and Y, which make explicit the fact that the speaker's assertion is intended as an update to the addressee's public beliefs.

7 I thank Eric McCready for pointing out to me that the use of yoY is not simply infelicitous in such cases.
8 F is a variable over force heads, of type ⟨st, ⟨C,C⟩⟩.
9 A consistent public belief set is one in which the set of worlds consistent with every proposition in the set is non-null. Since there is no possible world in which both p and ¬p hold, any set of public beliefs containing both of these propositions is inconsistent.
10 Original Japanese: 'hatsuwa ga kikite ni mukerareteiru koto no kyouchou'.

2.2 Intonation of yo in imperatives

Portner (2005) argues that the dynamics of imperatives can be understood with respect to what he calls an agent's To-Do List (TDL). Just as the function of assertions is to update a discourse participant's public beliefs, the function of imperatives is to update a discourse participant's TDL. In Portner's model, an agent A's TDL is a set of properties that A is publicly committed to trying to make true of herself, assuming rationality. I adopt a slightly modified version of Portner's model, in which the TDL is expanded to what I will call an agent A's public intentions in a context C, PIA(C), which are taken to consist of a set of propositions (not properties), to whose realization in the actual world the agent is publicly committed in C. The contents of an agent's public intentions are interpreted, per Portner's TDL, as providing a guide to what a rational and cooperative agent can be expected to do, relative to the contents of their public beliefs. The model of discourse contexts with two agents A and B is expanded to the four-tuple ⟨PBA, PBB, PIA, PIB⟩. I use the term discourse commitments to refer to both the public beliefs and the public intentions of a given agent. Thus, in a context C, the discourse commitments of X consist of the two sets PBX(C) and PIX(C). The update semantics of the imperative operator IMP is then defined in a manner parallel to ASSERT, except that the update targets the speaker's public intentions, rather than his public beliefs.

(11) ⟦IMP⟧ = λpλC.PIspkr(C) + p

In the same way that an utterance of ASSERT(p) serves to update the speaker's public beliefs with p, an utterance of IMP(p) serves to update the speaker's (not the addressee's) public intentions with p. The speaker A can make his intention that B enter his study public by uttering the imperative sentence Come in. This has the effect of placing the proposition come-in(addr) in the set of A's public intentions. With a bare imperative (one without yo), the fact that the speaker intends for the addressee to add p to his public intentions as well will follow from pragmatic considerations, in the same way that a bare assertion pragmatically indicates the speaker's intention of having the addressee update his public beliefs with the propositional content of his assertion. In imperatives of various sorts (commands, requests, etc.), the contribution of yo's intonation parallels that found in assertive sentences. Koyama (1997) argues that an imperative with yoY presupposes that the addressee is intent on not doing the action encoded by the imperative. An imperative with yo[, according to Koyama, carries no such presupposition and indicates only that the addressee's understanding is not in line with that of the speaker, or that the addressee is not even aware of the issue. This contrast is seen in the following example, due to Shirakawa (1993), taken from the comic book Maison Ikkoku (gloss and translation are mine):
(12) Context: Mitaka, trying to climb to a high spot, has Godai on all fours, and is using him as a footstool. He says to Godai:
     sikkari sasae-te-te-kure yo[/#yoY
     firmly support-INF-PROG-give.IMP yo[/#yoY
     '(Be sure and) Keep steady!'

Here, it is most natural for yo to have rising intonation. Shirakawa (1993) notes that if yo is used with falling intonation here, it would indicate that Mitaka believes that Godai is in fact failing or will fail to support him securely; in the context of the story, however, Mitaka is simply checking to make sure that Godai will support him, with no implication that Godai is failing or will fail to do so. On the other hand, if the context is changed to one in which Godai is failing to hold Mitaka up, then the use of yoY becomes felicitous. Shirakawa gives another example from Maison Ikkoku illustrating the same point (gloss and translation are mine).

(13) Context: Godai's grandmother has asked Godai to take her to a class reunion. Godai tries to refuse, saying he already has other plans. At this point, Kyouko says the following:
     sonna. . . Godai-san tuiteit-te oage-nasai yoY/#yo[
     that.kind.of Godai-san go.with-INF HON-please yoY/#yo[
     'Hey now - Godai, go with her.'

Shirakawa claims that the request by Kyouko to Godai would naturally have falling intonation on yo. It is clear from the context in which this sentence occurs that the addressee, Godai, does not want or plan to take the grandmother to the reunion, since he has other plans that conflict with Kyouko's request. The use of yoY here is natural because the hearer obviously does not want or intend to carry out the request. According to Shirakawa, using yo[ in this context is infelicitous. The contribution to imperatives of the intonational morphemes associated with yo is parallel with that in assertions. We need only modify the denotations in (10) so that the contextual object whose update is targeted is sensitive to whether the complement is ASSERT or IMP. The denotations in (10) are extended to handle imperative sentences by the denotations in (14).

(14) a. ⟦[⟧ = λFλpλC. F(p)(PBaddr(C) + p)        if F = ASSERT
              λFλpλC. F(p)(PIaddr(C) + p)        if F = IMP
     b. ⟦Y⟧ = λFλpλC. F(p)((PBaddr(C) Y q) + p)  if F = ASSERT
              λFλpλC. F(p)((PIaddr(C) Y q) + p)  if F = IMP
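The force-sensitive behaviour of the intonational morphemes can be sketched in a computational toy model. This is my own illustration, not the paper's formalism: contexts carry a public-belief set "PB" and a public-intention set "PI" per participant, propositions are atomic labels, and the contextually supplied q of Y is passed in explicitly.

```python
# Sketch of (10)/(14): [ wraps a force head and adds a monotonic
# update to the addressee's commitments; Y first downdates with a
# contextual proposition q, then updates. The targeted component is
# PB when the force head is ASSERT and PI when it is IMP.

def copy_ctx(context):
    return {x: {k: set(v) for k, v in rec.items()}
            for x, rec in context.items()}

def assert_op(p):                      # ASSERT: speaker's PB + p
    def ccp(context, speaker):
        new = copy_ctx(context)
        new[speaker]["PB"].add(p)
        return new
    return ccp

def imp_op(p):                         # IMP: speaker's PI + p
    def ccp(context, speaker):
        new = copy_ctx(context)
        new[speaker]["PI"].add(p)
        return new
    return ccp

def target_of(force):
    """Which component of the addressee's commitments gets updated."""
    return "PB" if force is assert_op else "PI"

def rise(force):                       # the morpheme written [ in the text
    def modified(p):
        def ccp(context, speaker, addressee):
            new = force(p)(context, speaker)
            new[addressee][target_of(force)].add(p)
            return new
        return ccp
    return modified

def fall(force, q):                    # the morpheme written Y; q is contextual
    def modified(p):
        def ccp(context, speaker, addressee):
            new = force(p)(context, speaker)
            slot = new[addressee][target_of(force)]
            slot.discard(q)            # downdate with q first
            slot.add(p)
            return new
        return ccp
    return modified

# A toy run of (13): Godai (G) publicly intends not to go; Kyouko's (K)
# imperative with yoY downdates that intention before adding the new one.
C = {"K": {"PB": set(), "PI": set()},
     "G": {"PB": set(), "PI": {"~go-with(grandmother)"}}}
C1 = fall(imp_op, "~go-with(grandmother)")("go-with(grandmother)")(C, "K", "G")
```

Running the yo[ variant instead on this context would add the new intention without removing the conflicting one, leaving Godai's public intentions inconsistent, which mirrors the infelicity of yo[ in (13).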
Now the imperative facts can be explained in the same way as the assertion facts in section 2.1 were. We expect for pragmatic reasons that an utterance of yoY(IMP(p)) should be used in a context C if the addressee's public intentions in C already contain ¬p, or some other proposition that is inconsistent with p. In such a context, a straightforward update to PIaddr(C) with p will lead to inconsistency. Thus, the speaker must first request that the addressee remove ¬p, or some other proposition q inconsistent with p, from his set of public intentions before updating it with p. This is precisely the dynamic effect that an utterance of yoY(IMP(p)) has. This accounts for the observation of Koyama (1997) and Shirakawa (1993) that imperatives with yoY are felicitous just in case the hearer is taken to be committed to not doing the action encoded by the imperative. The example in (13) is thus felicitous with falling intonation on yo, since in this context Godai has made it clear that his prior plans and intentions are inconsistent with the request that he escort his grandmother. The infelicity of yo[ in this context is also predicted, since in this situation an update with the content of the imperative to Godai's public intentions would lead to inconsistency. On the other hand, when the addressee is not contextually committed to a proposition inconsistent with p, the use of yo[(IMP(p)) is felicitous, while yoY(IMP(p)) is infelicitous. This fact was illustrated by the example from Shirakawa in (12). Recall that in the context of this example, it is not the case that Godai has indicated his unwillingness to keep steady for Mitaka. If Mitaka were to utter his imperative with yoY, it would instruct Godai to remove some proposition, most likely one that conflicts with the requested update, from his public intentions; but since Godai has not indicated any other intentions in this context, yoY is infelicitous, and yo[ must be used instead.
The above account of the semantics of yo’s intonational associate in imperative utterances commits us to an account in which the dynamic effect of bare imperatives is an update to the speaker’s, but not the addressee’s, public intentions. An imperative with either yo[ or yoY, on the other hand, encodes both an update to the speaker’s public intentions (due to the IMP operator) and an update to the addressee’s public intentions (due to [ or Y). This distinction is supported by evidence from imperatives used to grant permission to the addressee. This use of the imperative is discussed briefly by Han (1999), who mentions the case of the imperative sentence Come in used in a context where the addressee has just knocked on the speaker’s door. Han argues for a semantics of imperatives in which the IMP operator encodes an update to the addressee’s plan set (analogous to the public intentions of the present article), but notes that permission imperatives
are not handled straightforwardly by such a semantics, since the addressee has, by knocking, publicly conveyed an intention to come in. Thus, a request that she update her public intentions would be redundant. Han argues that such cases should be handled as indirect speech acts. I would like to suggest instead that in such cases the speaker is using the imperative in the usual way, to update his own public intentions with the proposition come-in(addr). As mentioned earlier, an update to the addressee's public intentions is not conventionally encoded by the imperative itself, but is pragmatically inferred as the intent of the utterance, depending on the circumstances of the utterance. In the present circumstance, no such inference is made, since the addressee's public intentions already include this proposition. If, however, either yo[ or yoY is used, then the sentence encodes an update to the addressee's public intentions, due to the semantics of [ or Y. As illustrated in (15), the use of yo[ and yoY is infelicitous in contexts where the imperative is interpreted as granting permission.

(15) Context: The addressee knocks on the speaker's door. The speaker says:
     hait-te kudasai (#yo[/#yoY)
     come.in-INF please
     'Come in please (#yo[/#yoY).'

The infelicity of yo[ and yoY in this context can be explained by the fact that the intonational morphemes [ and Y encode an update to the addressee's public intentions, and it does not make sense to request that the addressee update her public intentions with a proposition to which she is already publicly committed. The felicity of the bare imperative falls out from the fact that it does not encode an addressee-directed update.

2.3 Interim conclusion

In this section I have argued for a semantics of the intonational morphemes associated with yo according to which they encode an update to the addressee's discourse commitments. In assertions, this update is to the addressee's public beliefs, while in imperatives it is to his public intentions. Bare assertions and imperatives, by contrast, encode only an update to the speaker's own public beliefs or intentions. The semantic object returned by the combination of [ and Y with one of the force operators is a function from contexts to contexts, a CCP. It is up to the addressee whether this change goes through.
I have argued for a morphological decomposition of yo[ and yoY, according to which the morphemes [ and Y contribute to the update semantics of the sentence. The idea that intonational morphemes are used to relate the propositional content of an utterance to mutual beliefs of the discourse participants is found in Pierrehumbert & Hirschberg (1990), who say that in a dialogue between S and H, 'S may seek to inform H of some proposition x by communicating that x is to be added to what H believes to be mutually believed between S and H—via the tune S chooses'. This idea is reflected in my account of the morphemes [ and Y, which encode an update to the addressee's discourse commitments. When combined with one of the force heads ASSERT or IMP, the semantics of the resulting complex force head encodes an update to the discourse commitments of both the speaker and the addressee with its propositional complement. The approach to the meaning of [ and Y contrasts with the approach to intonational meaning taken by Pierrehumbert & Hirschberg (1990), however, in that these authors argue for a one-to-one mapping between intonational phonemes and meaning in English. In particular, the boundary tone H% is argued (p. 305) to indicate that the speaker wishes the hearer to 'interpret the utterance with particular attention to subsequent utterances'. In contrast, I do not assume that every L%H% tune in Japanese carries the semantics of the morpheme [. In bare declaratives, for example, rising intonation gives rise to an interrogative interpretation, rather than the addressee-oriented update semantics contributed by [ in yo[. Moriyama (1997) notes that rising intonation is ungrammatical with 'command' imperatives in standard (as well as Kyoto) Japanese, but that when yo is added to the imperative, the rising intonation becomes available. I take this as evidence that the morphemes [ and Y are licensed by the presence of yo and that not all occurrences of sentence-final rising and falling intonation should be identified with the morphemes [ and Y. I leave to future research the question of whether the morphemes [ and Y are completely parasitic on yo or whether they are found in other contexts as well.

3 RELEVANCE AND YO

In this section I argue that the use of yo indicates that the speaker's utterance is intended to guide the addressee in decision making. While in English the relevance relation between an assertion and the addressee's decision problem may be left implicit, in Japanese these relations must often be overtly indicated by the use of a discourse particle. I propose a formal analysis of yo's meaning that captures the intuition that the use of yo serves to point out the relevance of an assertion to the addressee, and I extend the account to imperatives and questions.
3.1 yo as a guide to action

Grice (1975) gives the following dialogue, whose interpretation relies crucially on the Maxim of Relation (i.e. Be Relevant!):

(16) Context: A is standing by an obviously immobilized car and is approached by B.
     A: I am out of petrol.
     B: There is a garage around the corner.

Grice (1975) places special emphasis on the unstated connection between A's and B's contributions:

In this example, [. . .] the unstated connection between B's remark and A's remark is so obvious that, even if one interprets the supermaxim of Manner, 'Be perspicuous,' as applying not only to the expression of what is said but to the connection of what is said to adjacent remarks, there seems to be no case for regarding that supermaxim as infringed in this example.

The obvious connection linking B's remark to A's remark (given the non-linguistic context) is simply that B's assertion is made in order to help A get some petrol into his car. While the connection is not direct, Grice argues that it is so obvious as to not constitute a violation of any of the maxims. Replicating this dialogue in Japanese brings the obviousness of the connection between B's assertion and A's problem into doubt. The sentence in (17) is made by B in response to A's situation11:

(17) B: kono miti-o zutto it-ta tokoro ni
        this road-ACC straight go-PAST place at
        gasorinsutando-ga ari-masu #(yo[)
        gas.station-NOM be-HON #(yo[)
        'There's a gas station straight down the road yo[.'

11 This example contains yo[, rather than yoY, because the speaker's assertion does not contradict any of the public beliefs of the addressee in this context. Similar considerations apply to the other examples in this subsection, all of which are felicitous with yo[ as opposed to yoY for the same reason.
The plain declarative without yo in this context is felt by informants to be rather less natural than the version with yo. Native speakers report that if B uses the bare declarative without yo, it sounds as if B is simply stating a fact, with no connection to A's problem, and with no implication that this information will help A to resolve his problem (by getting gas at the station). The infelicity disappears if B follows his assertion with yo. This pattern is robust; in both of the following cases, the context is such that A faces some kind of dilemma, and B's assertion is meant to provide information that will guide A in making a decision. In each case, B's assertion is infelicitous as a plain declarative with no particle, but becomes completely natural with yo.

(18) A: aa, mayot-ta. dono susi-ni si-you ka na.
        Oh at.a.loss-PAST which sushi-DAT do-HORT Q PRT
        'I'm stuck—I wonder which sort of sushi I should get?'
     B: koko-no maguro-wa oisi-i #(yo[)
        here-GEN tuna-TOP tasty-NONPAST #(yo[)
        'The tuna here is good yo[.'

(19) A: tabe-te-kara eiga-o mi ni ik-ou ka na
        eat-INF-from movie-ACC see to go-HORT Q PRT
        'I wonder if I should eat before going to the movie?'
     B: mou 7-ji sugi deshou? eiga-wa 8-ji kara hajimar-u #(yo[)
        already 7-o'clock past right movie-TOP 8-o'clock from start-NONPAST #(yo[)
        'It's already 7, right? The movie starts at 8 yo[.'

In the dialogue in (18), the implication is that, since the tuna is good, the hearer should get the tuna. But if B uses the bare declarative in this context, the sequence is infelicitous. Native speakers strongly prefer the version with yo. A similar situation is seen in example (19). The implication here is that there is not enough time to eat before going to the movies and that A should therefore go straight to the movies without eating. By using yo with the second assertion, the speaker indicates that the fact that the movie starts at eight, in conjunction with the fact that it is already seven, is sufficient to rule out the possibility that the speaker goes to eat before going to the movie. Just as in (18), the bare assertion is infelicitous in this context. A similar generalization is found with assertions that are used to suggest that the addressee do something other than what she is currently doing. The following examples all show that in such situations, the bare declarative is infelicitous, while the same sentence with yo is perfectly natural.
(20) Context: The addressee is waiting for a train, and wants to get on, but doesn't notice that it has arrived. The speaker knows this, and says:
     densha ki-ta #(yo[)
     train come-PAST #(yo[)
     'The train is here yo[.'

(21) Context: The speaker knows that the addressee must attend a meeting, but even though the meeting time is fast approaching, the addressee is not getting ready to go. The speaker says:
     miitingu-wa san-ji kara desu #(yo[)
     meeting-TOP 3-o'clock from be.HON #(yo[)
     'The meeting starts at 3 yo[.'

In both of these examples, the assertion is not made primarily in order to transmit the information encoded by the sentence to the hearer. Rather, the purpose is to guide the hearer's action. In (20), the speaker knows that the hearer wants to get on the train when it comes and expects that the information that the train has arrived will be sufficient to cause the addressee to stop what she is doing and get on the train. In (21) the speaker knows that the addressee plans to go to the meeting and that the information that the meeting is starting at three will be sufficient to cause the hearer to stop what she is doing and go. The example in (20) is one in which the addressee's expected reaction to the information conveyed is based on the speaker's assumptions regarding the addressee's desires. But the example in (21) shows that this expectation can also be based on the speaker's assumptions regarding the addressee's obligations, in this case her obligation to attend the meeting. More generally, yo can engage the law, morality, pleasure or any other contextually salient guide to action for the addressee. The examples in this subsection serve to show that when an assertion is made in order to guide the addressee's action, the relevance relation between the information asserted and the consequence for the addressee's action must be explicitly indicated by the use of a discourse particle. I will argue in the rest of this section that yo serves to make these assertions felicitous precisely because it encodes the kind of relevance relation that, in English, is left implicit.
3.2 Formal analysis

I propose that yo serves to indicate the optimality for the addressee of some contextually salient action, given the post-update common ground. This account borrows from van Rooy (2003b) the idea of a contextually salient decision problem. I do not adopt all of the decision-theoretic apparatus (world-action pairs, utility functions, etc.) used by van Rooy, but just the notion that a context C has a salient set of possible actions, A(C), from which the agents in that context must choose.12 As I show in section 3.4, this object can be seen as an extension of Roberts's (1996) QUD to dialogues driven not by the question 'What is the world like?', but by the question 'What should we do?'. Formally, I treat the elements of A as properties; the decision problem amounts to the question of which property it is optimal for an agent to have. The calculation of optimality, I argue, is relative to the contextual ordering source (Kratzer 1981), understood to be a set of propositions, such as the set of laws (deontic ordering source), desires (bouletic ordering source) or the like. This set of propositions imposes a partial order on the set of worlds compatible with the common ground. I adopt the ordering relation in (22), slightly modified from Portner (2007a):13

(22) Partial Ordering of Worlds [modified from Portner (2007a)]: For all worlds wi, wj ∈ ∩CG(C), wi

The semantics of yo is defined with respect to the ordering on the context set defined in (22).

(23) a. ⟦yo⟧(CCP)(C) is defined iff:
        ∃a ∈ A(C′) ∀wi, wj ∈ ∩CG(C′) [(a(addr)(wi) & wi
     b. Where defined, ⟦yo⟧(CCP)(C) = CCP(C).

12 The set A(C) might be added as a primitive element to the context, so that C is expanded to the five-tuple ⟨PBA, PIA, PBB, PIB, A⟩. Alternatively, A(C) might be derived from the other elements of C in a systematic way.
13 This ordering is in Portner's article relativized to the properties in an agent's TDL, which Portner argues contribute to the contextual ordering source. I use the definition more generally, as a definition of what the ordering source does. In a given context, the ordering source may be contextually specified with the contents of an agent's TDL, or a similar discourse object, like the public intentions introduced in section 3.1. I should note that the partial ordering defined in (22) handles the cases of inconsistent premise sets discussed by Kratzer (1977), and can thus plausibly be used as a general way of ordering the modal base with respect to an ordering source for the purposes of modal interpretation.
14 The presupposition in (23a) is taken here as a definedness condition, but might also be understood as a condition on felicitous contexts of utterance, in which cases the semantic value of utterances that violate the condition in (23a) would not be undefined, but the utterance as a whole would be pragmatically infelicitous.
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
The denotation in (23) is broken into two parts. The denotation in (23b) is a simple identity function, taking a CCP-type complement and returning an identical CCP (function from contexts to contexts) as a value. The presupposition14 in (23a) restricts the set of utterance contexts in which the value of (23b) is defined to those in which there is some contextually relevant action a such that there are no worlds compatible with the common ground in the post-update context (i.e. the context returned by applying the function denoted by the complement of yo to the utterance context) in which the addressee chooses action a that are lower ranked according to the ordering source than a world in which the addressee does not choose action a. Or more simply put, all worlds in which the addressee chooses action a are at least as good in terms of the contextual ordering source as ones in which he does not. I describe this situation by saying that the action a is optimal in the post-update context, given the common ground and ordering source. The contribution of yo, then, is to indicate to the addressee that there is a contextually salient optimal action, if he accepts the context update associated with yo's complement. I now give an extended discussion of how the semantics in (23) applies to the example in (18). For the sake of simplicity, consider a context in which there are only two kinds of sushi on the menu, tuna and salmon. A is trying to decide which of these to order. The set of contextually salient actions for A is thus A = {t, s}, consisting of the actions t = 'order the tuna' and s = 'order the salmon'. Assume that as far as A's decision is concerned, the main criterion is taste. Assume further that A has never eaten at this restaurant and does not know whether the tuna is tasty or whether the salmon is tasty.
The CG in this context can be partitioned into three subsets T, S and B, where T is the set of all worlds in which only the tuna tastes good, S is the set of worlds in which only the salmon tastes good and B is the set of worlds in which both the tuna and the salmon taste good. Each of these cells can be further partitioned into those worlds in which A chooses tuna and those worlds in which A chooses salmon (ignoring for simplicity those worlds in which A chooses neither or both). We can indicate this further subdivision with the subscripts t(A) and s(A), respectively. Thus, for example the subset of
worlds compatible with the common ground in which only tuna is tasty but in which A chooses salmon is written Ts(A), while the subset of worlds in which only the tuna is tasty and A chooses tuna is written Tt(A). The ordering source in this context ranks worlds in which A eats tasty sushi over worlds in which A eats sushi that is not tasty. This gives us the following partial ordering on worlds in the pre-update context, expressed in the subset notation just introduced: (24)
{Tt(A), Ss(A), Bt(A), Bs(A)} > {Ts(A), St(A)}

It can be seen from the ordering in (24) that neither of the two actions is optimal in the pre-update context. B now makes a yo↑-marked assertion of u = 'The tuna here is good'. As per the discussion in section 2, the intonational morpheme ↑ combines with ASSERT, which then combines with u to return a function from contexts to contexts in which both the speaker's and addressee's public beliefs, and hence the CG, in the output context are updated with u. The contextual update encoded by B's utterance has the effect (if the addressee accepts and applies the update encoded) of eliminating from the output CG those worlds in which only the salmon is tasty, that is, the subsets Ss(A) and St(A). The ordering on the worlds consistent with the (potential) post-update context is thus as in (25).

(25) {Tt(A), Bt(A), Bs(A)} > {Ts(A)}

The ordering in (25) satisfies the requirements imposed by the semantics of yo, since action t is optimal. We do not require for the felicitous use of yo a context in which all maximal worlds in the ordering are tuna-eating worlds; the possibility exists in this context that both the tuna and salmon are tasty, and in this case worlds in which A chooses salmon are maximal according to this ordering source. The optimality of action t in this context is in virtue of the fact that there are no worlds in which A chooses t that are ordered below worlds in which A chooses s. This shows how use of yo indicates the relevance of an assertion. In this case, use of yo indicates that the asserted proposition is sufficient to resolve the addressee's decision problem, since the post-update context contains only worlds in which it is optimal to order tuna. The semantics of yo does not make explicit which action is optimal; this must be inferred by the same pragmatic mechanisms that apply in English, where the equivalent assertion without yo is understood to resolve the addressee's decision problem, in this case by suggesting that the addressee order tuna. The difference between English and Japanese is that in Japanese this relevance relation between the asserted content and the addressee's decision problem must be made explicit by a particle like yo, while in English it can be left implicit.15

15 At least if we follow Grice. J. Kingston (personal communication) suggests that the English facts are not as straightforward as this discussion suggests. It seems likely that a natural discourse in English would require a particular intonational pattern, the use of a particle like well, or similar devices in order to make sequences like this one felicitous.

A similar analysis applies to the example in (17), in which the contextually salient decision problem is how A can get gas in the car. In a realistic context, there are going to be any number of potential options for A, including calling a tow truck, walking down any number of streets in search of a gas station, syphoning gas from a car in a nearby parking lot, etc. By using yo with his assertion, B indicates that, after learning the asserted information, the addressee's decision problem is resolved, since he can walk to the gasoline station to purchase gasoline. In the preceding examples, the addressee's contextual decision problem was more or less explicitly given by the preceding linguistic context, but we also have examples in which yo is used without any previous linguistic clue as to the decision problem being referenced. In these examples, the non-linguistic context plays a particularly crucial role in understanding the meaning of yo. Without a preceding linguistic context that sets up a decision problem faced by the addressee, use of yo typically indicates that the addressee should do something other than what he or she is currently doing. This can be seen in example (20). We can represent the set of alternative actions for the addressee in this context as A = {b, s}, where b = 'get ready to board the train' and s = 'keep sitting'. In the context in which the speaker makes the assertion in (20), the addressee is sitting down, and there is no indication that she is going to get ready to board the train. But an ordering source based on the addressee's desires in this context ranks worlds in which the addressee boards the train above those in which she misses it.
We can further assume that once the train has arrived it is necessary to get ready to board the train in a timely fashion in order not to miss it. The assertion in (20) thus serves to update the common ground in such a way that all worlds consistent with the post-update context are ones in which it is optimal for the addressee to stop sitting and get ready to board the train. The same pattern is seen in the example in (21). In this context, the set of relevant alternative actions to which yo makes reference is something like A = {g, ¬g}, where g = 'go to the meeting now' and ¬g = 'do not go to the meeting now'. The behaviour of the addressee
suggests that, without intervention, she is not going to get ready to go to the meeting, and her behaviour is thus consistent with ¬g rather than g. The speaker expects her assertion in (21) to be sufficient to get the addressee to go to the meeting. The addressee's obligations are such that she must attend the meeting, and so in all worlds consistent with the common ground in which the meeting is starting soon, the optimal action for the addressee, given an ordering source based on the addressee's obligations, is to get ready to go, since otherwise she will be late or miss the meeting.
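The optimality check in (23a) and the sushi scenario discussed above can be rendered as a small executable sketch. This is purely illustrative: the world encoding, the single bouletic proposition and the helper names (eats_tasty, strictly_below, optimal) are my own modelling choices, not part of the paper's formalism.

```python
from itertools import product

# Worlds for the sushi example: (tuna_tasty, salmon_tasty, A's choice).
# Requiring at least one tasty kind mirrors the three-way partition T, S, B.
Worlds = [(t, s, c)
          for t, s, c in product([True, False], [True, False], ["t", "s"])
          if t or s]

# Ordering source: a single bouletic proposition, 'A eats tasty sushi'.
def eats_tasty(w):
    tuna, salmon, choice = w
    return tuna if choice == "t" else salmon

ordering_source = [eats_tasty]

def at_most_as_good(wi, wj):
    # wi <=_C wj: every ordering-source proposition true at wi is true at wj
    return all(p(wj) for p in ordering_source if p(wi))

def strictly_below(wi, wj):
    return at_most_as_good(wi, wj) and not at_most_as_good(wj, wi)

def does(action, w):
    return w[2] == action

def optimal(action, cg):
    # (23a): no world where A takes the action sits strictly below
    # a world where A does not take it
    return not any(does(action, wi) and not does(action, wj)
                   and strictly_below(wi, wj)
                   for wi in cg for wj in cg)

pre_update = Worlds                         # A doesn't know what is tasty
post_update = [w for w in Worlds if w[0]]   # update with 'the tuna is good'

print(optimal("t", pre_update), optimal("s", pre_update))
print(optimal("t", post_update), optimal("s", post_update))
```

Running the sketch shows that neither ordering action is optimal before the update, while ordering tuna becomes optimal once the salmon-only worlds are eliminated, mirroring the move from the ordering in (24) to the one in (25).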
3.3 Imperatives with yo
We have seen that, in assertions, yo is used to indicate that all worlds consistent with the post-update common ground are such that a particular action is optimal for the addressee. Unlike assertions, the CCP of imperative sentences does not serve to update the common ground. Instead, imperatives encode an update to the speaker's public intentions. When combined with yo↑ or yo↓, an imperative also encodes an update to the addressee's public intentions, due to the semantics of ↑ and ↓. When used with imperatives, yo seems to indicate that the common ground is such that the addressee should do the action encoded by the imperative. That is, the use of yo indicates that the action encoded by the imperative is optimal with respect to some contextually specified ordering, given the common ground. While the optimal action must be inferred in assertions with yo, in imperatives it is resolved to the action encoded by the imperative. An imperative with yo thus makes explicit the action that is indicated by the semantics of yo to be optimal according to the common ground and ordering source. Consider the following example: A has made dinner for B, putting a lot of effort into the process. A notices that B does not seem to be eating his food and gets upset about this, since she worked so hard to cook dinner for B. She then says:16

(26) tabe-te yo↓
eat-IMP yo↓
'Eat yo↓.'

16 The use of ↓ here follows from the fact that the addressee's public intentions seem to be incompatible with the request that the speaker is making.

Native speakers report that by using yo in this context, A is not only telling B to eat but is also pointing to the fact that B should eat, in this context because A has gone to the trouble to make the food for him. The imperative without yo does not have this implication. In this context, it is already common ground that the speaker went to a lot of trouble to make dinner for the addressee. The use of yo indicates that it follows from this (and other facts in the common ground) that it is optimal for the addressee to eat his dinner. Optimal in what sense? In this case, the optimality is determined by the speaker's desires or perhaps the addressee's obligations. We can compare the imperative in (26) with the assertion in (27).
(27) isshoukenmei tsukut-ta nda yo↑
with.much.effort make-PAST PRT yo↑
'I put a lot of effort into this yo↑.' (Implied: 'And therefore you should eat it.')

The sentence in (27) asserts that the speaker went to a lot of trouble to make the addressee dinner, and the use of yo indicates that from this it follows that the addressee should eat. By contrast, as we saw, the imperative in (26) directly encodes the action that the addressee should do, and the use of yo indicates that the addressee should do this on the basis of what is already in the common ground. The example in (28) combines the assertion with the imperative.

(28) A: isshoukenmei tsukut-ta nda kara tabe-te yo↓
with.much.effort make-PAST PRT so eat-IMP yo↓
'I put a lot of effort into this, so eat yo↓.'

Here, A explicitly indicates the basis for her suggestion that B really should eat the food, namely, because A has made it for him, and therefore it follows from politeness, consideration for A's feelings, or the like, that B should eat. The example in (29) is from McCready (2005).

(29) mata nanika at-tara soudan ni ki-te kudasai (yo↑)
again something be-COND consultation for come-IMP please (yo↑)
'If anything else happens, please come talk to me again (yo↑).'

McCready notes that if the sentence in (29) occurs with yo, then 'the speaker seems to have personal reasons for wanting the hearer to consult with him', while the same sentence without yo has no such implication. This can be understood by assuming that in this context optimality is defined relative to the speaker's desires, so that the use of yo says that all worlds consistent with the common ground are worlds in which it is optimal according to the speaker's desires that the addressee come to talk to the speaker.
These examples all lead to the following generalization: with assertions, yo is used to indicate that the asserted content is sufficient, given the common ground, to make some action optimal for the addressee. With imperatives, on the other hand, yo indicates that the pre-update common ground is sufficient to make the action encoded by the imperative optimal, relative to some contextually specified ordering, for the addressee. This follows from the fact that, with imperatives, the post-update common ground is the same as the pre-update common ground, because the CCP of imperatives with yo targets the addressee's public intentions rather than her beliefs. The semantics of yo says that all the worlds compatible with the post-update common ground are ones in which a particular action is optimal. But since with imperatives the post-update common ground is the same as the pre-update common ground, this is the same as saying that an imperative with yo conveys that all worlds compatible with the pre-update common ground are ones in which a particular action, namely, the one encoded by the imperative, is optimal.17

17 The fact that it is the action encoded by the imperative that is understood as the optimal one referred to by yo does not fall out directly from the semantics, since yo merely serves to say that there is some action that is optimal in the post-update common ground. I assume that the identification of this action with that encoded by the imperative is handled pragmatically; since the imperative itself encodes an action, its utterance serves to make that action salient.

3.4 Questions and yo

Compared to declaratives and imperatives, little has been written about the behaviour of yo in interrogatives. In fact, with canonical information-seeking questions, yo seems to be simply ungrammatical, as noted by Shirakawa (1993). He gives the following examples:

(30) a. mada, ame, fut-te-ru ka (*yo)
still rain fall-PROG-NONPAST Q (*yo)
'Is it still raining?'
b. nomimono, nani-ga ar-u ka (*yo)
drink what-NOM be-NONPAST Q (*yo)
'What do you have to drink?'
c. ima, nan-ji da ka wakari-masu ka (*yo)
now what-time COP Q know-HON Q (*yo)
'Do you know what time it is now?'

These are all information-seeking questions, with the canonical syntax for questions in Japanese using the question particle ka. The fact that the sentences in (30) are ungrammatical with yo cannot be attributed to just the form of the sentence, however. Shirakawa gives the following examples of sentences containing the question marker ka that are grammatical with yo.

(31) a. kimi-no kyuuryou de ie-ga tate-rare-ru ka (yo↓)
you-GEN salary with house-NOM build-can-NONPAST Q (yo↓)
'You think you can build a house with your salary!?'
b. konna hon, dare-ga ka-u ka (yo↓)
this.kind.of book who-NOM buy-NONPAST Q (yo↓)
'Who the hell would buy a book like this!?'

The questions in (31) are not information seeking, but rhetorical, as I have tried to indicate in my translations. The sentence in (31a) can be used if the speaker is convinced that the hearer cannot in fact build a house with his salary, while the sentence in (31b) can be used if the speaker is convinced that no one would buy the kind of book in question. The syntax of the rhetorical questions in (31) is no different from that of the standard information-seeking questions in (30), suggesting that the restriction of yo to rhetorical questions must be accounted for in terms of meaning rather than syntax.18

18 The sentences in (31) can be interpreted as rhetorical questions in the absence of yo, so that yo is not required for a rhetorical reading. If yo is present, however, only the rhetorical reading is possible.

One way to understand these facts would be to treat rhetorical questions as assertions and collapse the treatment of yo in rhetorical questions with its treatment in assertions. Then one could state a restriction to the effect that yo is infelicitous in interrogatives, but that rhetorical questions are assertive rather than interrogative. This analysis would commit us to the view that rhetorical questions are semantically assertions. Han (2002) has argued, though, that the assertive character of rhetorical questions is due to pragmatic reasoning. I would like to suggest a way in which the behaviour of yo in rhetorical questions can be explained without assuming that rhetorical questions are semantically assertions. A special case of a decision problem is one in which an agent is trying to decide what she should believe about the world. In a possible-worlds setting, this amounts to the question of which of all the possible worlds the agent in fact inhabits. This can be cast as a decision problem in which each possible action corresponds to picking a cell of the partition to believe. We can represent such actions as bp for each proposition p that picks out a cell in the partition induced by the question under consideration. For a question with n cells (possible answers), there will be n distinct actions bpi of believing in the proposition corresponding to the ith cell, giving the action set A = {bp1, bp2, . . . , bpn}. Viewed in this way, the QUD of Roberts (1996, 2004) is reduced to a special case of a decision problem in which the discourse participants are all trying to decide which answer to a contextually salient question they should believe. For every question Q = {p1, p2, . . . , pn} there is a corresponding decision problem A = {bp1, bp2, . . . , bpn}. If the set of worlds consistent with the common ground in a context C entails some pi ∈ Q, then the question Q is resolved in C, as is the corresponding decision problem as to which proposition in Q the agent should believe. By asking a question Q, the speaker is introducing the decision problem of which of the elements of Q should be believed. This fact can be used to understand the behaviour of yo in rhetorical questions as well as its infelicity in non-rhetorical questions. The question Q gives rise to a contextual decision problem A. The use of yo indicates that all worlds in the post-update context set are ones in which a particular action a ∈ A is optimal. Optimality in this case can be equated with truth; it is optimal for the agent to believe the proposition p in just those worlds in which p is true. This amounts to saying that using yo with a question indicates that the post-update context set entails an answer to the question. But a question does not serve to add a proposition to any discourse participant's discourse commitments and does not eliminate worlds from the common ground.19 Thus the post-update common ground is identical to the pre-update common ground when a question is used. We thus predict that the use of yo in a question indicates that the pre-update common ground entails an answer to the question being asked. This is precisely what we find. The use of yo in a question forces a rhetorical interpretation, indicating that the answer to the question is already known to all discourse participants.

19 For a concrete proposal on the dynamics of interrogatives which affects the structure of the CG without eliminating worlds, see Groenendijk (1999).

The example in (32) seems to violate the generalization that yo cannot occur with information-seeking questions.

(32) dare-ga boku-no biiru-o non-da nda yo↓
who-NOM me-GEN beer-ACC drink-PAST nda yo↓
'Who drank my beer nda yo↓?'

The sentence in (32) constitutes a request for information; it is used in situations where the speaker does not know who drank his beer and wants the hearer to tell him who did. This seems to violate the restriction against yo in information-seeking questions that was suggested by the ungrammaticality of the examples in (30) and was argued to follow directly from the semantics of yo and the dynamics of questions. The sentence in (32) contains nda, which I have left unglossed, rather than the question particle ka. Questions are canonically formed in Japanese with the sentence-final question particle ka.20 Examples of canonical yes/no and wh-questions are given in (33).

20 When the verb has non-honorific morphology, the use of ka sounds slightly unnatural. In general, the particle ka can be left absent, as long as the utterance is given rising intonation.

(33) a. Tarou-ga ki-ta ka
Taro-NOM come-PAST Q
'Did Taro come?'
b. dare-ga ki-ta ka
who-NOM come-PAST Q
'Who came?'

The examples in (30) and (31) that formed the basis of the generalization that yo occurs only with rhetorical questions all have this canonical syntax. Sentence (32) has a different syntax, with nda replacing the question particle ka. It is rather clear that nda is not simply a variant question particle. For one thing, the question particle in the wh-question (33b) can be replaced with nda and still be interpreted as a question, while the question particle in the yes/no question in (33a) cannot be replaced with nda while retaining its interrogative meaning.

(34) a. Tarou-ga ki-ta nda
Taro-NOM come-PAST nda
'Taro came.' (Cannot mean 'Did Taro come?')
b. dare-ga ki-ta nda
who-NOM come-PAST nda
'Who came?'

This reflects a general pattern: nda may be used to form wh-questions, but not yes/no questions. Another contrast between questions with nda and questions with ka is that the latter can be embedded, while the former cannot:

(35) dare-ga tabe-ta ka/*nda sira-nai
who-NOM eat-PAST ka/*nda know-NEG
'I don't know who ate.'

Morphologically, nda seems to be a contracted form of no da, which in turn consists of the nominalizer no and the copula da. This suggests that a sentence ending with nda is syntactically a declarative and might have a different CCP than canonical questions with ka. This view is supported by examples like (36) showing that nda is used to form assertions. The example in (37) is interpreted pragmatically as an imperative but seems to be syntactically and semantically an assertion, as indicated in the gloss.

(36) boku-no biiru-o maiku-ga non-da nda
me-GEN beer-ACC Mike-NOM drink-PAST nda
'Mike drank my beer.'

(37) omae-ga tabe-ru nda
you.ANTI.HON-NOM eat-NONPAST nda
'You will eat.'

These examples suggest the possibility that wh-interrogatives with nda might not be associated with the same CCP as canonical questions. I tentatively suggest that wh-interrogatives with nda should be treated as a kind of assertion or imperative. A sentence like that in (32) would then correspond to an assertion like The question is who drank my beer or an imperative like Tell me who drank my beer. The use of yo in wh-interrogatives formed with nda would then work as it does in assertions or imperatives. Some further notes are in order regarding the behaviour of yo in questions. First, yo must have falling intonation in order to be felicitously used with questions; yo↑ in these examples is bad to the point of ungrammaticality. This holds for both the rhetorical questions in (31) and nda interrogatives like that in (32). Also, the examples in (31) and (32) have a distinctly aggressive or confrontational flavour to them. This hint of aggressiveness or anger is one that seems to hold for all cases of questions with yo, even though other sentence types with yo do not have this character. I have no explanation for these facts at this time.
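The reduction of a question to a decision problem described in this subsection can also be sketched in code. The toy worlds and the helper names (belief_actions, resolved, yo_felicitous_with_question) are my own illustrative assumptions, not the paper's notation.

```python
# A question Q is a partition of the worlds under consideration; each cell p
# yields a belief-action b_p, and yo requires the (unchanged) common ground
# to already make one such action optimal, i.e. to entail an answer.

worlds = {"w_taro", "w_hanako", "w_jiro"}      # who drank the beer
Q = [{"w_taro"}, {"w_hanako"}, {"w_jiro"}]     # partition cells = answers

def belief_actions(question):
    # each cell p gives an action b_p of believing p
    return [("b_p%d" % i, p) for i, p in enumerate(question, start=1)]

def resolved(question, cg):
    # Q is resolved in a context iff the common ground entails some cell pi
    return any(cg <= p for p in question)

def yo_felicitous_with_question(question, cg):
    # asking a question leaves the CG unchanged, so yo's requirement on the
    # post-update CG must already hold pre-update: a rhetorical question
    return resolved(question, cg)

open_cg = {"w_taro", "w_hanako"}   # genuinely open: speaker doesn't know
settled_cg = {"w_taro"}            # everyone already knows the answer

print(yo_felicitous_with_question(Q, open_cg))
print(yo_felicitous_with_question(Q, settled_cg))
```

On this toy model, yo is predicted to be bad with the information-seeking question (the open common ground entails no answer) and acceptable only when the common ground already settles the answer, i.e. on the rhetorical reading.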
3.5 Comparison with McCready’s account I have traced the connection between yo and relevance to the addressee’s decision problem, represented as a choice from a contextually salient set of actions. I showed that this idea accounts for the behaviour of yo not only in assertions but also in imperatives and questions. McCready’s (2009) presuppositional component of yo’s
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
(36)
358 Decisions, Dynamics and the Japanese Particle yo meaning, repeated in (38), captures the intuition that yo is used in assertions whose propositional content is informative to the addressee. (38) BS IVH ðQ; uÞ > ds
4 FURTHER ISSUES
4.1 Structural considerations The decomposition of yo[ and yoY into the single morpheme yo plus one of two intonational morphemes [ or Y gives rise to the structure in (39). (39)
The structure in (39) shows the syntactic organization of a sentence containing yo[ or yoY. At the lowest node a force head combines with one of the intonational morphemes [ or Y, returning a function of the
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
The present account can also derive the restriction of yo to assertions of hearer-new, informative propositions. This follows from the fact that yo requires that the post-update common ground resolve some contextually salient decision problem of the addressee. Since this will only occur if the post-update common ground differs from the pre-update common ground, it follows that an assertion with yo will be informative. Moreover, I have shown that the contribution of yo in assertions goes beyond an indication that the asserted proposition is informative to the addressee. It must also be relevant, in a sense made precise in this section. An advantage of the present account is that it extends naturally to the behaviour of yo in non-assertive utterances. It is unclear how the meaning in (38) should be extended to the behaviour of yo in imperatives and questions. The present proposal builds on the intuition underlying McCready’s denotation in (38), but is better able to capture yo’s contribution in a range of contexts and clause types.
Christopher Davis 359
4.2 Other particles I have provided evidence in this article that the particle yo functions as a modifier of CCP meanings. Moreover, the intonational associate of yo functions as a modifier of the illocutionary force head itself. This analysis may offer a useful approach to a host of particles which have been argued to affect meaning at the level of speech acts. Karagjosova (2003) suggests that the German modal particles (MPs) ja, doch, eben/ halt and auch carry meaning in two ways: by affecting the speech act performed by the utterance and by affecting the discourse function of the utterance. Thus, for example declaratives with doch are said to encode an illocutionary remind-act, as opposed to the assert-act conventionally conveyed by a bare declarative. Moreover, the use of doch in a declarative serves to convey (in a context-dependent way) the intended discourse function of the declarative, for example as a correction of the addressee’s previous linguistic act. This dual nature of discourse particles is reflected in semantics of yo[ and yoY, but with the two kinds of meaning decomposed. The intonational morphemes [ and Y encode a change to the illocutionary force of the utterance by directly modifying the semantics of the illocutionary force head. The morpheme yo conveys, in a contextdependent way, information about the intended discourse function of the utterance: to guide the addressee in the resolution of a contextually salient decision problem. There is evidence that, language internally, different discourse particles occupy different syntactic positions, with attendant differences in meaning and semantic composition. Zimmermann (2009), following Jacobs (1991), argues that the German particle ja is a modifier of
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
same type as the original force head. The intonational morphemes thus act as a kind of adverbial modifier to the force head. The modified force head FORCE# then combines with a propositional complement, returning a CCP, of type ÆC,Cæ. The morpheme yo then combines with this CCP, returning the same CCP with an added presuppositional meaning, as described in section 4.1. One nice consequence of the semantics proposed for yo in this article is that it is able to explain why yo cannot be embedded. I have argued that yo takes as its argument an object that is a function from contexts to contexts. The force head of a sentence is responsible for taking a propositional object and returning a CCP, so that yo is structurally higher in the syntax than the force head. To the extent that force heads cannot be embedded, it follows that yo is also unembeddable.
360 Decisions, Dynamics and the Japanese Particle yo
4.3 Discourse particles and pragmatics

We saw that the dialogue in (16), repeated in (40), was argued by Grice to exhibit such an obvious relevance relation between B's remark and the prior remark of A that no maxim violations were risked.

(40) Context: A is standing by an obviously immobilized car and is approached by B.
A: I am out of petrol.
B: There is a garage around the corner.

But in Japanese, bare assertions are not felicitous in these kinds of contexts. Rather, the relevance relation has to be overtly indicated with the use of a particle. Bare assertions seem to become infelicitous in Japanese to the extent that the preceding linguistic context fails to explicitly indicate the question or decision problem that the assertion addresses. The generalization can be seen by comparing B's assertion in (41) in response to each of the two preceding utterances of A.

(41) A: a. Nihon-no sinbun doko de ka-e-ru?

[21] I thank an anonymous reviewer for pointing out this issue.
illocutionary operators, while the particles doch and wohl are not. Evidence for this position comes from the fact that the latter particles can be embedded and are interpreted in their embedded environment, while ja resists embedding except in reported speech acts and is always interpreted relative to the utterance context. If Zimmermann is right, then ja might be given an analysis either as a modifier of the illocutionary force head itself, like the intonational morphemes ↑ and ↓, or as a function on the CCP meaning of the entire utterance, like yo. Law (2002) provides evidence that a subset of SFPs in Cantonese occupy the Force head at the periphery of the clause structure, providing further support for the idea that some discourse particles function semantically at the level of illocutionary force operators. A final issue of cross-linguistic relevance is the decomposition of particle meanings from their associated intonational or stress patterns. The German MPs, such as doch and ja, can be stressed or unstressed, with attendant differences in meaning. The question arises whether a decomposition of particle meanings into a basic core and an additional intonational meaning can be applied to these particles as well.[21]
Japan-GEN newspaper where at buy-can-NONPAST
'Where do they sell Japanese newspapers?'
b. Nihon-no sinbun yomi-tai na
Japan-GEN newspaper read-want PRT
'I really want to read a Japanese newspaper.'
B: eki de ka-e-ru (yo↑)
station at buy-can-NONPAST (yo↑)
'You can buy one at the station (yo↑).'
(42) a. Mary-ga ringo-o tabe-masi-ta
Mary-NOM apple-ACC eat-perf.hon-PAST
i. 'Mary ate the apple.'
ii. 'I am speaking nicely to you.'
b. Mary-ga ringo-o tabe-ta
Mary-NOM apple-ACC eat-PAST
If A asks the question in (41a), then B’s assertion is of a form that directly picks out one of the propositions that constitutes a resolving answer to the question. In this case, native speakers report that B’s answer without yo is not so bad, although there seems to be a preference for the response with yo. At a more subtle level, speakers report an intuition that if B’s answer does not have yo, then it is just answering the question asked by A, while using yo seems to indicate more directly that the speaker expects the addressee to go to the station as a result of learning the information asserted. If A makes a statement like that in (41b), then native speakers consistently report that B’s assertion without yo is completely infelicitous. By using yo, B’s assertion becomes felicitous in this context, and moreover conveys the fact that B expects this information to help A get a Japanese newspaper. We thus see that the felicity of bare assertions in Japanese degrades rapidly in so far as the assertion is meant to resolve a decision problem that is implicit in the context, but which has not been directly encoded in a preceding question by the addressee. Why should there be cross-linguistic variability in the pragmatic licensing conditions of basic assertions? It may be that if a language has a robust system of discourse particles that provide scaffolding for pragmatic inference, then failure to use an available particle is not free of pragmatic implications. A similar situation is seen in the system of honorific marking in Japanese. The example in (42a) from Potts & Kawahara (2004) contains what Harada (1976) calls performative honorification. This example can be contrasted with the one in (42b) which differs minimally in that it does not contain the performative honorific.
'Mary ate an apple.'
A proper analysis of the particles requires . . . an analysis of speech acts in terms of the conditions under which it can be carried out, the effects that are achieved if the act is taken seriously by the hearer together with the effects that the speaker intends to achieve. Discourse particles are means for indicating that these are not the normal ones and that other conditions or intended effects apply.
The contribution of the performative honorific is roughly described by the gloss in (42a-ii). As indicated, the effect of the morpheme is to express politeness to the addressee. In contrast, the sentence in (42b), which lacks an honorific, does not contain any meaning component indicating the speaker's politeness, or lack thereof, towards the addressee. But using a non-honorific sentence like the one in (42b) has the effect of being non-polite. That is, if the addressee is someone to whom polite speech is socially appropriate, the use of a non-honorific sentence like that in (42b) will sound rude. This is not because the sentence encodes anything like non-politeness. Rather, since the option in (42a) exists, if the speaker fails to use it, he indirectly indicates that he is being non-polite, and often (by implicature) impolite, to the addressee. The use of sentences without direct encoding of addressee honorification thus implicates that the speaker does not show the addressee the kind of politeness that the performative honorific encodes. One cannot opt out of the system; a speaker cannot simply choose to never use honorifics, and allow other linguistic and nonlinguistic features of his behaviour to indicate the politeness that is conventionally encoded by the honorific. Once a system of grammaticized honorification is in place, failure on the part of the speaker to use an honorific is interpreted as non-politeness on the part of the speaker. The same pattern holds in the examples of assertions in Japanese whose pragmatic felicity requires the use of a particle. Since the Japanese particle yo conventionally encodes a relevance relation between the content of an utterance and some decision facing the addressee, failure to use yo tends to indicate a lack of such a relationship.
The system of particles provides scaffolding supporting the pragmatics of communicative intent, in the same way that the system of honorifics provides scaffolding for the pragmatics of politeness. The current account embodies an intuition about the contribution of particles expressed by Zeevat (2003):
Acknowledgements I am very grateful to John Kingston, Angelika Kratzer and Christopher Potts for their help, as well as for their criticism of earlier instantiations of this work. I am also grateful to Yurie Hara, Masashi Hashimoto, Misato Hiraga, Shigeto Kawahara, Eric McCready, Yasutada Sudo, Shoichi Takahashi and Yoko Takahashi for judgments and prolonged discussion. This article benefited greatly from the comments and criticism of two anonymous reviewers, due to whose insights the analysis was greatly improved. Thanks also to Maria Biezma, Kai von Fintel, Peggy Speas, Adrian Staub and the UMass Evidentials Group. This work is supported, in part, by National Science Foundation (NSF) grants BCS-0642752 and HSD-0527509. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the NSF.
CHRISTOPHER DAVIS Department of Linguistics University of Massachusetts Amherst 225 South College 150 Hicks Way Amherst, MA 01003 e-mail:
[email protected]
The present account gives a denotation of yo which indicates 'the effects that are achieved if the act is taken seriously by the hearer'; yo tells the hearer that, if he accepts the update to his commitment set encoded by yo's intonational associate ↑/↓, then there will be some action in a contextually salient decision problem that is optimal in all worlds consistent with the updated common ground relative to some contextually supplied ordering. The use of yo thus provides a kind of glue linking the conventional semantics of an utterance with its pragmatics and intended role in discourse. This link is explicitly connected to a representation of the discourse participants' beliefs, intentions and goals, giving the present account a strong conceptual connection to Belief-Desire-Intention (BDI) models of practical reasoning (Bratman 1987), and related proposals for the intentional structure of discourse (Grosz & Sidner 1986; Lochbaum 1998). The meaning contributed by yo helps the hearer understand the speaker's intended discourse contribution in saying what he said. The computation involved is pragmatic; the hearer must deduce the speaker's intent on the basis of the sentence's propositional content, the content contributed by particles and other non-assertive elements, and the discourse context. Using yo helps to narrow the range of possible speaker intentions.
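The optimality condition just described can be made concrete in a small sketch. This is my own formalization, not the paper's: a decision problem is a set of actions, a `utility` function stands in for the contextually supplied ordering, and yo's condition requires that a single action be optimal in every world consistent with the updated common ground.

```python
def yo_condition(updated_cg, actions, utility):
    """True iff some one action is optimal in all worlds of the updated
    common ground, relative to the given (world-indexed) utility."""
    def optimal_actions(w):
        best = max(utility(a, w) for a in actions)
        return {a for a in actions if utility(a, w) == best}

    shared = set(actions)
    for w in updated_cg:
        shared &= optimal_actions(w)   # keep only globally optimal actions
    return bool(shared)
```

For instance, if the updated common ground settles that newspapers are sold at the station, going to the station is optimal in every remaining world and the condition holds; if some worlds favour a different action, it fails.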
REFERENCES

Bratman, Michael E. (1987), Intention, Plans, and Practical Reason. Harvard University Press. Cambridge, MA.
Gärdenfors, Peter (1988), Knowledge in Flux: Modeling the Dynamics of Epistemic States. MIT Press. Cambridge, MA.
Grice, H. Paul (1975), Logic and conversation. In Peter Cole and Jerry L. Morgan (eds.), Syntax and Semantics, Volume 3: Speech Acts. Academic Press. New York. 41–58.
Groenendijk, Jeroen (1999), The logic of interrogation: classical version. In T. Matthews and D. L. Strolowitch (eds.), Proceedings of the Ninth Conference on Semantics and Linguistic Theory (SALT-9). CLC Publications. Stanford, CA. 109–26.
Grosz, Barbara J. & Candace L. Sidner (1986), Attention, intention and the structure of discourse. Computational Linguistics 12:175–204.
Gunlogson, Christine (2003), True to Form: Rising and Falling Declaratives as Questions in English. Routledge. New York.
Han, Chung Hye (1999), The contribution of mood and force in the interpretation of imperatives. In P. Tamanji, M. Hirotani, and N. Hall (eds.), Proceedings of the 29th North East Linguistics Society. GLSA. Amherst, MA. 97–111.
Han, Chung Hye (2002), Interpreting interrogatives as rhetorical questions. Lingua 112:201–29.
Harada, S. I. (1976), Honorifics. In Masayoshi Shibatani (ed.), Syntax and Semantics, Volume 5: Japanese Generative Grammar. Academic Press. New York. 499–561.
Heim, Irene (1982), The Semantics of Definite and Indefinite Noun Phrases. Doctoral Dissertation, University of Massachusetts, Amherst, MA.
Jacobs, Joachim (1991), On the semantics of modal particles. In Werner Abraham (ed.), Discourse Particles. John Benjamins. Amsterdam. 141–62.
Kamio, Akio (1994), The theory of territory of information: the case of Japanese. Journal of Pragmatics 21:67–100.
Karagjosova, Elena (2003), Modal particles and the common ground: meaning and functions of German ja, doch, eben/halt, and auch. In Peter Kühnlein, Hannes Rieser, and Henk Zeevat (eds.), Perspectives on Dialogue in the New Millennium. John Benjamins. Philadelphia, PA. 335–49.
Katagiri, Yasuhiro (2007), Dialogue functions of Japanese sentence-final particles 'Yo' and 'Ne'. Journal of Pragmatics 39:1313–23.
Kokuritsu Kokugo Kenkyuujo [National Institute for Japanese Language] (2006), Nihongo hanashi kotoba koopasu no koochikuhou [Construction of the Corpus of Spontaneous Japanese]. NIJL report No. 124. Tokyo (http://www2.kokken.go.jp/~csj/public/).
Koyama, Tetsuhara (1997), Bunmatsusi to bunmatsu intoneeshon [Sentence-Final Particles and Sentence-Final Intonation], Volume 1 of Bunpou to Onsei [Speech and Grammar], Chapter 6. Kuroshio, Japan. 97–119.
Kratzer, Angelika (1977), What "must" and "can" must and can mean. Linguistics and Philosophy 1:337–55.
Kratzer, Angelika (1981), The notional category of modality. In Hans-Jürgen Eikmeyer and Hannes Rieser (eds.), Words, Worlds, and Contexts. New Approaches in Word Semantics. de Gruyter. Berlin. 38–74.
Law, Ann (2002), Cantonese sentence-final particles in the CP domain. In Ad Neeleman and Reiko Vermeulen (eds.), UCLWPL, vol. 14. 375–98.
Lochbaum, Karen E. (1998), A collaborative planning model of intentional structure. Computational Linguistics 24:525–72.
Matsuoka, Miyuki (2003), Danwaba ni okeru shuujosi yo no kinou [The function of the sentence-final particle yo in conversational context]. Kotoba to Bunka (Language and Culture) 4:53–70.
McCready, Eric (2005), The Dynamics of Particles. Doctoral Dissertation, University of Texas, Austin.
McCready, Eric (2006), Japanese yo: its semantics and pragmatics. Sprache und Datenverarbeitung 30:25–34.
McCready, Eric (2009), Particles: dynamics vs. utility. In Yukinori Takubo, Tomohide Kinuhata, Szymon Grzelak, and Kayo Nagai (eds.), Japanese/Korean Linguistics 16. University of Chicago Press. Chicago, IL.
Moriyama, Takuro (1997), Meirei hyougen to sono intoneeshon: Kyoutoshi hougen o chuushin ni [The Intonation of Directive Expressions in the Kyoto City Dialect], Volume 2 of Onsei to Bunpou [Speech and Grammar], Chapter 3. Kuroshio, Japan. 39–55.
Noda, Harumi (2002), Syuuzyosi no kinou [The functions of sentence-final particles]. In Modariti [Modality]. Kuroshio, Japan. 261–88.
Pierrehumbert, Janet B. & Mary E. Beckman (1988), Japanese Tone Structure. MIT Press. Cambridge, MA.
Pierrehumbert, Janet B. & Julia Hirschberg (1990), The meaning of intonational contours in the interpretation of discourse. In Philip R. Cohen, Jerry Morgan, and Martha E. Pollack (eds.), Intentions in Communication, Chapter 14. MIT Press. Cambridge, MA. 271–311.
Portner, Paul (2005), The semantics of imperatives within a theory of clause types. In Kazuha Watanabe and Robert B. Young (eds.), Proceedings of Semantics and Linguistic Theory 14. CLC Publications. Ithaca, NY.
Portner, Paul (2007a), Imperatives and modals. Natural Language Semantics 15:351–83.
Portner, Paul (2007b), Instructions for interpretation as separate performatives. In Kerstin Schwabe and Susanne Winkler (eds.), On Information, Meaning and Form. John Benjamins. Amsterdam, The Netherlands. 407–26.
Potts, Christopher & Shigeto Kawahara (2004), Japanese honorifics as emotive definite descriptions. In Kazuha Watanabe and Robert B. Young (eds.), Proceedings of Semantics and Linguistic Theory 14. CLC Publications. Ithaca, NY. 235–54.
Roberts, Craige (1996), Information structure in discourse: towards an integrated formal theory of pragmatics. In Jae Hak Yoon and Andreas Kathol (eds.), OSUWPL Volume 49: Papers in Semantics. The Ohio State University Department of Linguistics. Columbus, OH. 91–136. (http://ling.osu.edu/croberts/infostr.pdf; 1998 revision available online.)
Roberts, Craige (2004), Context in dynamic interpretation. In Laurence Horn and Gregory Ward (eds.), Handbook of Contemporary Pragmatic Theory. Blackwell. Oxford. 197–220.
Shirakawa, Hiroyuki (1992), Shuujoshi yo no kinou [The function of the sentence-final particle yo]. Nihongo Kyouiku [Journal of Japanese Language Education] 77:178–200.
Shirakawa, Hiroyuki (1993), Hatarakikake toikake no bun to shuujoshi yo [Imperatives, interrogatives, and the sentence-final particle yo]. Nihongo Kyouiku Gakka Kiyou 3:7–14.
Stalnaker, Robert (1978), Assertion. Syntax and Semantics 9:315–32.
Suzuki Kose, Yuriko (1997a), Japanese Sentence-Final Particles: A Pragmatic Principle Approach. Doctoral Dissertation, University of Illinois at Urbana-Champaign.
Suzuki Kose, Yuriko (1997b), A nonscalar account of apparent gradience: evidence from Yo and Ne. In Alexis Dimitriadis, Laura Siegel, Clarissa Surek-Clark, and Alexander Williams (eds.), Proceedings of the 21st Annual Penn Linguistics Colloquium, University of Pennsylvania Working Papers in Linguistics, vol. 4. Penn Linguistics Club. Philadelphia, PA.
van Rooy, Robert (2003a), Quality and quantity of information exchange. Journal of Logic, Language, and Information 12:423–51.
van Rooy, Robert (2003b), Questioning to resolve decision problems. Linguistics and Philosophy 26:727–63.
Venditti, Jennifer J. (2005), The J_ToBI model of Japanese intonation. In Sun-Ah Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing. Oxford University Press. Oxford. 172–200.
Venditti, Jennifer J., Kikuo Maekawa, & Mary E. Beckman (2008), Prominence marking in the Japanese intonation system. In Shigeru Miyagawa and Mamoru Saito (eds.), Handbook of Japanese Linguistics, Chapter 17. Oxford University Press. Oxford. 456–512.
Zeevat, Henk (2003), Particles: presupposition triggers, context markers or speech act markers. In Reinhard Blutner and Henk Zeevat (eds.), Optimality Theory and Pragmatics. Palgrave. New York. 91–111.
Zimmermann, Malte (forthcoming), Discourse particles. In Claudia Maienborn, Klaus von Heusinger, and Paul Portner (eds.), Semantics: An International Handbook of Natural Language Meaning. Mouton de Gruyter. Berlin.

First version received: 24.11.2008
Second version received: 02.04.2009
Accepted: 24.04.2009
Journal of Semantics 26: 367–392 doi:10.1093/jos/ffp008 Advance Access publication July 27, 2009
Branching Quantification v. Two-way Quantification NINA GIERASIMCZUK AND JAKUB SZYMANIK University of Amsterdam
Abstract
1 HINTIKKA'S THESIS

Hintikka (1973) claims that the following sentences essentially require non-linear quantification for expressing their meaning.

(1) Some relative of each villager and some relative of each townsman hate each other.
(2) Some book by every author is referred to in some essay by every critic.
(3) Every writer likes a book of his almost as much as every critic dislikes some book he has reviewed.
Throughout the paper, we will refer to sentence (1) as Hintikka sentence. According to Hintikka, the interpretation of sentence (1) can only be expressed using Henkin's quantifier as follows:

(4) ∀x∃y
    ∀z∃w  ((V(x) ∧ T(z)) → (R(x, y) ∧ R(z, w) ∧ H(y, w))),

where the unary predicates V and T denote the set of villagers and the set of townsmen, respectively. The binary predicate symbol R(x, y) denotes the relation 'x and y are relatives' and H(x, y) the symmetric relation 'x and y hate each other'. Branching quantification (also called partially ordered quantification, Henkin quantification) was proposed by Henkin (1961) (for a survey, see Krynicki & Mostowski 1995). Informally speaking, the idea of such constructions is that for different rows of quantifiers in a prefix, the
The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email:
[email protected].
We discuss the thesis formulated by Hintikka (1973) that certain natural language sentences require non-linear quantification to express their meaning. We investigate sentences with combinations of quantifiers similar to Hintikka’s examples and propose a novel alternative reading expressible by linear formulae. This interpretation is based on linguistic and logical observations. We report on our experiments showing that people tend to interpret sentences similar to Hintikka sentence in a way consistent with our interpretation.
values of the quantified variables are chosen independently. According to Henkin's semantics for branching quantifiers, formula (4) is equivalent to the following existential second-order sentence:

∃f∃g∀x∀z((V(x) ∧ T(z)) → (R(x, f(x)) ∧ R(z, g(z)) ∧ H(f(x), g(z)))).

∃A∃B∀x∀z((V(x) ∧ T(z)) → (∃y ∈ A R(x, y) ∧ ∃w ∈ B R(z, w) ∧ ∀y ∈ A ∀w ∈ B H(y, w))).

The existential second-order sentence is not equivalent to any first-order sentence (see the Barwise–Kunen theorem in Barwise 1979). Not only universal and existential quantifiers can be branched; the procedure of branching works in a very similar way for other quantifiers. Some examples are discussed in the next section of this paper. The reading of Hintikka sentence given by formula (4) is called the branching reading. However, it can also be assigned weaker readings, that is linear representations which are expressible in elementary logic. Let us consider the following candidates:

(5) ∀x∃y∀z∃w((V(x) ∧ T(z)) → (R(x, y) ∧ R(z, w) ∧ H(y, w))) ∧ ∀z∃w∀x∃y((V(x) ∧ T(z)) → (R(x, y) ∧ R(z, w) ∧ H(y, w))).
(6) ∀x∃y∀z∃w((V(x) ∧ T(z)) → (R(x, y) ∧ R(z, w) ∧ H(y, w))).
(7) ∀x∀z∃y∃w((V(x) ∧ T(z)) → (R(x, y) ∧ R(z, w) ∧ H(y, w))).

In all these formulae, the choice of the second relative depends on the one that has been previously selected. To see the difference between the above readings and the branching reading, consider the second-order formula equivalent to the sentence (6):

∃f∃g∀x∀z((V(x) ∧ T(z)) → (R(x, f(x)) ∧ R(z, g(x, z)) ∧ H(f(x), g(x, z)))).

[1] The idea of branching is more visible in the case of simpler quantifier prefixes, like in sentence (8) discussed in section 3.2.
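On a finite model, the difference between the branching reading (4) and the linear reading (6) can be checked by brute force. The following sketch uses my own encoding of a model (lists of villagers and townsmen, and the relations R and H as sets of pairs) and is illustrative only; it enumerates Skolem-style choices directly.

```python
from itertools import product

def relatives(x, R):
    return [y for (a, y) in R if a == x]

def branching_reading(V, T, R, H):
    """Reading (4): Skolem f over villagers and g over townsmen,
    chosen independently of each other."""
    for f in product(*[relatives(x, R) for x in V]):
        for g in product(*[relatives(z, R) for z in T]):
            if all((y, w) in H for y in f for w in g):
                return True
    return False

def one_way_reading(V, T, R, H):
    """Reading (6): for each villager x a relative y such that every
    townsman z has a relative w (possibly depending on x) with H(y, w)."""
    return all(
        any(all(any((y, w) in H for w in relatives(z, R)) for z in T)
            for y in relatives(x, R))
        for x in V)

# A model where (6) holds but (4) fails: the townsman's relative
# must vary with the villager, so no independent choice works.
V, T = ["v1", "v2"], ["t1"]
R = {("v1", "a"), ("v2", "b"), ("t1", "c"), ("t1", "d")}
H = {("a", "c"), ("b", "d")}
```

Here the branching reading fails because no single relative of t1 is hated by both a and b, while the one-way reading succeeds by letting w depend on the villager.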
Functions f and g (so-called Skolem functions) choose relatives for every villager and every townsman, respectively. Notice that the value of f (respectively g) is determined only by the choice of a certain villager (townsman). In other words, to satisfy the formula, relatives have to be chosen independently.[1] This second-order formula is equivalent to the following sentence with quantification over sets:
Hintikka's thesis. Hintikka sentences don't have an adequate linear reading. They should be assigned the strong reading and not any of the weaker readings.

Because of its many philosophical and linguistic consequences, Hintikka's claim has sparked lively controversy (see e.g. Jackendoff 1972; Gabbay & Moravcsik 1974; Guenthner & Hoepelman 1976; Hintikka 1976; Stenius 1976; Barwise 1979; Bellert 1989; May 1989; Sher 1990; Mostowski 1994; Liu 1996; Beghelli et al. 1997; Janssen 2003; Mostowski & Wojtyniak 2004; Szymanik 2005; Schlenker 2006; Gierasimczuk & Szymanik 2007). In relation to that, there has also been a vivid discussion on the ambiguity of sentences with multiple quantifiers (see Kempson & Cormack 1981a, 1981b, 1982; Tennant 1981; Bach 1982; May 1985; Jaszczolt 2002; Bott & Radó 2009; Robaldo 2009). In the present article, some of the arguments presented in the discussion are analysed and critically assessed. We propose to interpret Hintikka sentence by the first-order formula (5):

∀x∃y∀z∃w((V(x) ∧ T(z)) → (R(x, y) ∧ R(z, w) ∧ H(y, w))) ∧ ∀z∃w∀x∃y((V(x) ∧ T(z)) → (R(x, y) ∧ R(z, w) ∧ H(y, w))).

In the rest of this paper, we will refer to this reading as the two-way reading of Hintikka sentence, as opposed to the one-way reading expressed by formula (6). Our proposal turns out to agree with speakers' intuitions, as we show in the next section, and it is also consistent with speakers' behaviour. The latter fact is supported by empirical data, which we
It is enough to compare the choice functions in this formula with those in the existential second-order formula corresponding to sentence (4) to see the difference in the structure of dependencies required in both readings. Of course, dependencies in sentences (5) and (7) are analogous to (6). As a result, all the weaker readings are implied by the branching reading (4) (where both relatives have to be chosen independently). Therefore, we sometimes refer to the branching reading as the strong reading. Formulae (5)–(7) are also ordered according to the inference relation which holds between them. Obviously, formula (5) implies formula (6), which implies formula (7). Therefore, formula (5) is the strongest among the weak readings. By Hintikka's thesis, we mean the following statement:
present in section 4.[2] Our main conclusion is that sentences with multiple quantifiers, including the Hintikka sentence, allow for a linear reading. This of course clearly contradicts Hintikka's thesis.

2 MULTIQUANTIFIER SENTENCES
(8) Most villagers and most townsmen hate each other.
(9) One-third of the villagers and half of the townsmen hate each other.

These sentences seem to be more frequent in our everyday language and more natural than Hintikka's own examples, even though their adequate meaning representation is no less controversial.
[2] It is worth noticing that our proposal is reminiscent of the linguistic representation of reciprocals. For example, according to the seminal paper on 'each other' by Heim et al. (1991), Hintikka sentence has the following structure: EACH[[QP and QP]][V the other], where 'each' quantifies over the two conjuncts, which turns the sentence into [QP1 V the other and QP2 V the other], where 'the other' picks up the rest of the quantifiers anaphorically. This interpretation is similar to the two-way reading.
Every branching quantifier can be expressed by some single generalized quantifier, so in the sense of definability Hintikka's thesis cannot be right. However, the syntax of branching quantification has a particular simplicity and elegance that is lost when translated into the language of generalized quantifiers. The procedure of branching does not employ new quantifiers. Instead, it enriches the syntactic means of arranging existing quantifiers, at the same time increasing their expressive power. Therefore, the general question is as follows: Are there sentences with simple determiners such that non-linear combinations of quantifiers corresponding to the determiners are essential to account for the meanings of those sentences? The affirmative answer to this question, suggested by Hintikka, claims the existence of sentences with quantified noun phrases which are always interpreted scope-independently. We show that for sentences similar to those proposed by Hintikka, the claim is not true. Before we move on to the central problem, let us consider more sentences with combinations of at least two determiners. We are mainly interested in sentences whose branching interpretation is not equivalent to any linear reading. They all fall into the scope of our discussion and we will call them all 'Hintikka sentences'. Interesting examples of Hintikka sentences, which we will discuss later, were given by Barwise (1979).
Nina Gierasimczuk and Jakub Szymanik 371
Many more examples have been given to justify the existence of non-linear semantic structures in natural language [see e.g. sentences (10)–(12)].

(10) I told many of the men three of the stories. (Jackendoff 1972)
(11) A majority of the students read two of those books. (Liu 1996)
(12) We have been fighting for many years for human rights in China. I recount the story of our failures and successes and say: 'Whenever a representative from each country fought for the release of at least one dissident from each prison, our campaign was a success'. (Schlenker 2006)
3 THEORETICAL DISCUSSION OF HINTIKKA’S THESIS
3.1 A remark on possible readings

Let us start with the following remark. It was observed by Mostowski (1994) that from Hintikka sentence (1), we can infer that

(13) Each villager has a relative.

This sentence obviously has the following reading:

∀x(V(x) → ∃y R(x, y)).

It can be false in a model with an empty town, if there is a villager without a relative. However, the strong reading of Hintikka sentence [see formula (4)], which has the form of an implication with a universally quantified antecedent, is true in every model with an empty town. Hence, the reading of (13) is not logically implied by proposed readings of Hintikka sentence. Therefore, the branching meaning of Hintikka sentence should be corrected to the following formula with restricted quantifiers:

(14) (∀x : V(x))(∃y : R(x, y))
     (∀z : T(z))(∃w : R(z, w))  H(y, w),

which is equivalent to

∃A∃B(∀x(V(x) → ∃y ∈ A R(x, y)) ∧ ∀z(T(z) → ∃w ∈ B R(z, w)) ∧ ∀y ∈ A ∀w ∈ B H(y, w)).

Observe that similar reasoning can be used to argue for restricting quantifiers in formulae expressing different possible meanings of all our sentences. However, applying these corrections uniformly would not change the main point of our discussion. We still would have to choose between the same number of possible readings, the only difference being the restricted quantifiers. Therefore, for simplicity, we will
forgo these corrections. From now on, we will assume that all predicates in our formulae have non-empty denotation.
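The contrast between the unrestricted strong reading and the restricted reading (14) can be checked on a small model. The sketch below is my own encoding and is illustrative only: it searches for witness sets A and B in the second-order form of (14), and shows that on a model with an empty town and a relative-less villager the restricted reading fails, exactly as the remark above requires.

```python
from itertools import chain, combinations

def subsets(dom):
    """All subsets of a finite domain."""
    return chain.from_iterable(combinations(dom, r) for r in range(len(dom) + 1))

def restricted_branching(V, T, R, H, dom):
    """Second-order form of (14): find sets A, B such that every villager
    has a relative in A, every townsman has a relative in B, and everyone
    in A hates everyone in B."""
    for A in map(set, subsets(dom)):
        for B in map(set, subsets(dom)):
            if (all(any((x, y) in R for y in A) for x in V)
                    and all(any((z, w) in R for w in B) for z in T)
                    and all((y, w) in H for y in A for w in B)):
                return True
    return False

# Empty town, one villager with no relatives: (14) correctly fails,
# even though the unrestricted reading would be vacuously true.
V, T = ["v"], []
R, H, dom = set(), set(), ["v"]
```

This makes concrete why the restriction matters: (14) forces each villager to actually have a relative, so it implies (13).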
3.2 Hintikka sentences are symmetric

It has been observed that there is a strong intuition that the two following versions of Hintikka sentence are equivalent (Hintikka 1973):

(1) Some relative of each villager and some relative of each townsman hate each other.
However, if we assume that formula (6), repeated here:

(6) ∀x∃y∀z∃w((V(x) ∧ T(z)) → (R(x, y) ∧ R(z, w) ∧ H(y, w))),

is an adequate reading of sentence (1), then we also have to assume that an adequate reading of sentence (15) is represented by the formula:

(16) ∀z∃w∀x∃y((V(x) ∧ T(z)) → (R(x, y) ∧ R(z, w) ∧ H(y, w))).

However, (6) and (16) are not logically equivalent, and therefore it would be wrong to treat them as correct interpretations of sentences (1) or (15). Therefore, we have to reject readings (6) and (16) from the set of possible alternatives. Notice that a similar argument applies when we consider other Hintikka sentences. For instance, it is enough to observe that the following sentences are also equivalent:

(8) Most villagers and most townsmen hate each other.
(17) Most townsmen and most villagers hate each other.

However, the possible one-way linear reading of (8),

(18) MOST_x(V(x), MOST_y(T(y), H(x, y))),

is not equivalent to an analogous reading of (17). Hence, the one-way linear reading in (18) cannot be right. One of the empirical tests we conducted was aimed at checking whether people really consider pairs like (8) and (17) to be equivalent. The results that we will present prove that this is the case. Therefore, the argument from symmetry is also cognitively convincing (see section 4.4.1 for a description of the experiment and section 4.4.2 for our empirical results). Despite this observation, we cannot conclude the
(15) Some relative of each townsman and some relative of each villager hate each other.
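The non-equivalence of readings (6) and (16) can be verified on a small finite model. The model below is my own construction, built so that (16) holds while (6) fails: each townsman can be paired off separately, but no single relative of the villager works for all townsmen.

```python
def rel(x, R):
    return [y for (a, y) in R if a == x]

def reading_6(V, T, R, H):
    """(6) ∀x∃y∀z∃w: one relative y per villager must work for all townsmen."""
    return all(any(all(any((y, w) in H for w in rel(z, R)) for z in T)
                   for y in rel(x, R)) for x in V)

def reading_16(V, T, R, H):
    """(16) ∀z∃w∀x∃y: one relative w per townsman must work for all villagers."""
    return all(any(all(any((y, w) in H for y in rel(x, R)) for x in V)
                   for w in rel(z, R)) for z in T)

# One villager v with relatives a, b; townsmen t1 (relative c), t2 (relative d).
V, T = ["v"], ["t1", "t2"]
R = {("v", "a"), ("v", "b"), ("t1", "c"), ("t2", "d")}
H = {("a", "c"), ("b", "d")}
```

Since the two formulae come apart on this model, neither can serve as the reading of both (1) and (15), which is exactly the symmetry argument.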
validity of Hintikka's thesis so easily. First, we have to consider the remaining weak candidates, that is formulae (5) and (7):

(5) ∀x∃y∀z∃w((V(x) ∧ T(z)) → (R(x, y) ∧ R(z, w) ∧ H(y, w))) ∧ ∀z∃w∀x∃y((V(x) ∧ T(z)) → (R(x, y) ∧ R(z, w) ∧ H(y, w)));

(7) ∀x∀z∃y∃w((V(x) ∧ T(z)) → (R(x, y) ∧ R(z, w) ∧ H(y, w))).
(20) ∃A∃B(MOST_x(V(x), A(x)) ∧ MOST_y(T(y), B(y)) ∧ ∀x ∈ A ∀y ∈ B H(x, y)),

but also the two-way meaning

(21) MOST_x(V(x), MOST_y(T(y), H(x, y))) ∧ MOST_y(T(y), MOST_x(V(x), H(y, x))).
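The asymmetry of the one-way reading (18), and hence the need for the two-way conjunction (21), can be illustrated on a concrete model. The model below is my own construction: most villagers hate most townsmen, but it is not the case that most townsmen are hated by most villagers.

```python
def most(domain, pred):
    """MOST: more than half of the domain satisfies pred."""
    return sum(1 for x in domain if pred(x)) * 2 > len(domain)

def one_way(V, T, H):
    """(18): MOST_x(V(x), MOST_y(T(y), H(x, y)))."""
    return most(V, lambda x: most(T, lambda y: (x, y) in H))

def two_way(V, T, H):
    """(21): (18) conjoined with its mirror image."""
    return one_way(V, T, H) and most(T, lambda y: most(V, lambda x: (x, y) in H))

# v1 hates t1, t2; v2 hates t1, t3; v3 hates no one.
V, T = ["v1", "v2", "v3"], ["t1", "t2", "t3"]
H = {("v1", "t1"), ("v1", "t2"), ("v2", "t1"), ("v2", "t3")}
```

On this model the one-way reading (18) comes out true while its mirror image fails, so (18) is not symmetric under swapping the two noun phrases, unlike the two-way reading (21).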
Notice that for proportional sentences, like (8), there is no interpretation corresponding to the weakest reading of Hintikka sentence, formula (7), as proportional sentences contain only two simple determiners and not four as in Hintikka’s original example. This observation already indicates that the two-way form, as a uniform representation of all Hintikka sentences, should be preferred to the weakest reading. We will present a further argument against the weakest reading (7) in the next section. To sum up, the symmetry argument rules out readings with asymmetric scope dependencies. At this point, the adequacy of the weakest reading is also controversial since it is not uniform: it cannot be extended to proportional sentences. Our space of possibilities now consists of the branching and the two-way reading. In the next section, we give further reasons to reject the weakest reading of Hintikka’s sentence.
3.3 Inferential arguments Let us now move on to Mostowski’s (1994) argument against the weakest reading of Hintikka sentences. Consider the following:
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
Hintikka does not consider either of these, and other authors focus only on formula (7) (see e.g. Barwise 1979; Mostowski & Wojtyniak 2004). Also for different Hintikka sentences, we still have to differentiate between some possibilities. As an alternative for formula (18), we can consider not only the branching reading (19) [equivalent to (20)] MOST x : VðxÞ (19) Hðx; yÞ MOST y : TðyÞ
Some relative of each villager and some relative of each townsman hate each other.
Mark is a villager.
Therefore: Some relative of Mark and some relative of each townsman hate each other.

In other words, if we assume that Mark is a villager, then we have to agree that the Hintikka sentence implies that some relative of Mark and some relative of each townsman hate each other. If we interpret the Hintikka sentence as having the weakest meaning (7),

(7) ∀x∀z∃y∃w((V(x) ∧ T(z)) → (R(x, y) ∧ R(z, w) ∧ H(y, w))),

then we have to agree that the following sentence is true in Figure 1:

(1) Some relative of Mark and some relative of each townsman hate each other.

Mostowski (1994) observes that this is a dubious consequence of the weakest reading. He claims that sentence (1) intuitively has the following reading:

(2) ∃x(R(Mark, x) ∧ ∀y(T(y) → ∃z(R(y, z) ∧ H(x, z)))).

Formula (2) is false in the model of Figure 1. Therefore, it cannot be implied by the weakest reading of the Hintikka sentence, which is true in the model. However, it is implied by the strong reading, which is also false in the model. Hence, Mostowski concludes that the Hintikka sentence cannot have the weakest reading (7).

Figure 1 Relatives of Mark are on the left; on the right are two town families.
If Mostowski’s intuition is correct (which could be established by experimental means), then we can conclude from this argument that the weakest reading, (7), should be eliminated from the set of possible alternatives. Then we are left with two propositions: the branching and the two-way interpretation. Both of them have the desired inference properties.
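Mostowski's argument can be checked mechanically on a small finite model. The sketch below is our own illustration (the element names m1, a1 and so on are hypothetical stand-ins in the spirit of Figure 1, not the paper's actual model): the weakest reading (7) comes out true while reading (2) comes out false, so (7) cannot imply (2).

```python
# Hypothetical finite model in the spirit of Figure 1 (our illustration):
# two relatives of Mark on one side, two town families on the other.
T = {'t1', 't2'}                               # townsmen
R = {('Mark', 'm1'), ('Mark', 'm2'),           # relative-of pairs
     ('t1', 'a1'), ('t2', 'b1')}
H = {('m1', 'a1'), ('m2', 'b1')}               # the hate relation

def rel(p):
    """Relatives of p in the model."""
    return {y for (x, y) in R if x == p}

def weakest(V):
    """Reading (7): every villager x and townsman z have some relatives
    y, w such that H(y, w)."""
    return all(any((y, w) in H for y in rel(x) for w in rel(z))
               for x in V for z in T)

def reading_2():
    """Reading (2): some single relative of Mark hates a relative of
    EVERY townsman."""
    return any(all(any((x, z) in H for z in rel(y)) for y in T)
               for x in rel('Mark'))

print(weakest({'Mark'}), reading_2())   # True False
```

Each of Mark's relatives hates a relative of only one town family, so the witness demanded by (2) does not exist even though (7) holds.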
3.3.1 Negation normality. In his paper on Hintikka's thesis, Barwise (1979) refers to the notion of negation normality in defence of the claim that the proper interpretation of the Hintikka sentence is an elementary formula. He observes that negations of some simple quantifier sentences, that is, sentences without sentential connectives other than 'not' before a verb, can easily be formulated as simple quantifier sentences. In some cases this is impossible: the only way to negate some simple sentences is by prefixing them with the phrase 'it is not the case that' or an equivalent expression of a theoretical character. Sentences of the first kind are called negation normal. For example, sentence

(3) Everyone owns a car.

can be negated normally as follows:

(4) Someone doesn't own a car.

As an example of a statement which is not negation normal, consider the following (see Barwise 1979):

(5) The richer the country, the more powerful its ruler.

It seems that the most efficient way to negate it is as follows:

(6) It is not the case that the richer the country, the more powerful its ruler.

Barwise proposes to treat negation normality as a test for first-order definability with respect to sentences with combinations of elementary quantifiers. This proposal is based on the following theorem.

Theorem 1. If φ is a sentence definable in Σ¹₁, the existential fragment of second-order logic, and its negation is logically equivalent to a Σ¹₁-sentence, then φ is logically equivalent to some first-order sentence.

Barwise claims that the results of the negation normality test suggest that people tend to find the Hintikka sentence to be negation normal, and hence definable in elementary logic. According to Barwise, people
tend to agree that the negation of the Hintikka sentence can be formulated as follows:

(7) There is a villager and a townsman that have no relatives that hate each other.

Barwise's claim excludes the branching reading of the Hintikka sentence but is consistent with the two-way interpretation. Therefore, in the case of the Hintikka sentence, we are left with only one possible reading: the two-way reading. However, Barwise's argument does not apply to proportional sentences, as proportional quantifiers are not definable in first-order logic. Therefore, in the case of proportional sentences, we still have to choose between the branching and the two-way interpretation.

3.4 Complexity arguments

Mostowski & Wojtyniak (2004) claim that native speakers' inclination towards a first-order reading of the Hintikka sentence can be explained by means of computational complexity theory (see e.g. Papadimitriou 1993). The authors prove that the problem of recognizing the truth value of the branching reading of the Hintikka sentence in finite models is an NPTIME-complete problem.3 It can also be shown that proportional branching sentences define an NPTIME-complete class of finite models (see Sevenster 2006). Assuming that the class of practically computable problems is identical with the PTIME class (i.e. the tractable version of the Church–Turing thesis; see Edmonds 1965), it may be argued that the human mind is not equipped with mechanisms for recognizing NPTIME-complete problems.4 In other words, in many situations an algorithm for checking the truth value of the strong reading of the Hintikka sentence is intractable. According to Mostowski & Wojtyniak (2004), native speakers can only choose between meanings which are practically computable. The two-way reading is PTIME computable5 and therefore, even taking computational restrictions into account, is more plausible than the branching reading.

3 NPTIME-complete problems are computationally the most difficult problems in the NPTIME class. In particular, PTIME = NPTIME if any NPTIME-complete problem is PTIME computable. PTIME (NPTIME) is the class of problems which can be solved by a (non-deterministic) Turing machine in a number of steps bounded by a polynomial function of the length of the query. See Garey & Johnson (1979) for more details.
4 This statement can be given independent psychological support (see e.g. Frixione 2001).
5 As model checking for first-order sentences is PTIME computable (see e.g. Immerman 1998).
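The complexity contrast can be made concrete with a small model checker (our own illustration, not code from the paper). The two-way reading (21) is verified by two nested counting loops, while a naive checker for the branching reading (20) enumerates candidate majority sets A and B, a search space that grows exponentially with the model:

```python
from itertools import combinations

def most(xs, pred):
    """MOST x in xs: pred(x), read as a strict majority."""
    return 2 * sum(1 for x in xs if pred(x)) > len(xs)

def two_way(V, T, H):
    """Reading (21): polynomial-time nested counting."""
    return (most(V, lambda v: most(T, lambda t: (v, t) in H)) and
            most(T, lambda t: most(V, lambda v: (v, t) in H)))

def branching(V, T, H):
    """Reading (20) by brute force: look for majority sets A, B with
    every pair in A x B hating each other. The number of candidate
    sets is exponential in |V| and |T|."""
    for ka in range(len(V) // 2 + 1, len(V) + 1):
        for A in combinations(V, ka):
            for kb in range(len(T) // 2 + 1, len(T) + 1):
                for B in combinations(T, kb):
                    if all((a, b) in H for a in A for b in B):
                        return True
    return False

# A 6-cycle model: every villager hates most townsmen and vice versa,
# yet no two majorities are pairwise connected.
V, T = ['v1', 'v2', 'v3'], ['t1', 't2', 't3']
H = {('v1', 't1'), ('v1', 't2'), ('v2', 't2'),
     ('v2', 't3'), ('v3', 't3'), ('v3', 't1')}
print(two_way(V, T, H), branching(V, T, H))   # True False
```

On this model the two-way reading (21) holds while the branching reading (20) fails, which is exactly the kind of situation exploited in Experiment II below. The NPTIME-completeness result says that, in the worst case, no checker for (20) does essentially better than such an exhaustive search (unless PTIME = NPTIME).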
3.5 Conclusions

In the foregoing, we discussed possible obstacles to various interpretations of Hintikka sentences. Our two-way reading for Hintikka sentences is the only reading satisfying all of the following properties:

• It is symmetric.
• It ensures a uniform reading for all Hintikka sentences.
• It passes Mostowski's inferential test.
• It is negation normal for the Hintikka sentence.
• Its truth value is practically computable in finite models.
In the next section, we will present empirical arguments that the two-way reading is consistent with the interpretation people most often assign to Hintikka sentences.

4 EMPIRICAL EVIDENCE FOR THE TWO-WAY READING

Many of the authors taking part in the dispute on the proper logical interpretation of Hintikka sentences have argued not only from their own linguistic intuitions but also from the universal agreement of native speakers. For instance, Barwise claims that 'there is almost universal agreement rejecting Hintikka's claim for a branching reading' (Barwise 1979). However, none of these authors have provided genuine empirical data to support their claims. In the rest of this section, we present experimental work supporting the two-way reading.
4.1 Experimental hypotheses

Our hypotheses are as follows.

Hypothesis 1. People treat Hintikka sentences as symmetric sentences.

This was theoretically justified by Hintikka (1973) and discussed in section 3.2. To be more precise, we predict that subjects will treat sentences like (8) and (9) as equivalent.

(8) More than 3 villagers and more than 5 townsmen hate each other.
(9) More than 5 townsmen and more than 3 villagers hate each other.
Hypothesis 2. In an experimental context, the preferred reading of Hintikka sentences is best represented by the two-way formula.

Based on the arguments of the last section, we predict that subjects will tend to assign the two-way reading to Hintikka sentences, that is, they will accept a Hintikka sentence when confronted with a model that satisfies its two-way interpretation. We also predict that the comprehension of Hintikka sentences is similar in English and Polish: in both languages, native speakers accept the two-way reading.
4.2 Subjects

Subjects were native speakers of English and native speakers of Polish who volunteered to take part in the experiment. They were undergraduate students in computer science at Stanford University and in philosophy at Warsaw University. All subjects had elementary training in logic, so they could understand the instructions. The experiment was conducted with 32 computer science students and 90 philosophy students.

4.3 Materials

It was suggested by Barwise & Cooper (1981) and empirically verified by Geurts & van der Slik (2005) (see also Szymanik 2009) that the monotonicity of quantifiers influences how difficult they are to comprehend. In particular, sentences containing downward monotone quantifiers are more difficult to reason with than sentences containing only upward monotone quantifiers.6 For this reason, in the experiment we only used (combinations of) monotone increasing quantifiers of the form 'More than n' in otherwise simple sentences. In our tasks, the quantifiers referred to the shapes of geometrical objects (circles and squares). The sentences were Hintikka sentences (e.g. see sections 4.4.1 and 4.4.3).

4.4 Experiments

The study was conducted in two languages and consisted of two parts. It was a paper-and-pencil study. There were no time limits, and it took 20 minutes on average for all students to finish the test. Below we present descriptions of each part of the English version (Appendix A) of the test. The Polish test was analogous.

6 A quantifier QM is upward monotone (increasing) if the following holds: if QM(A) and A ⊆ B ⊆ M, then QM(B). The downward monotone (decreasing) quantifiers are defined analogously, as being closed under taking subsets.
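The monotonicity property from footnote 6 can be checked exhaustively on a small universe. The snippet below is our own sketch for the determiner 'more than 2' used in the experiment:

```python
from itertools import chain, combinations

def more_than(n, A):
    """'More than n' as a property of a predicate's extension A."""
    return len(A) > n

def subsets(M):
    """All subsets of the universe M."""
    s = sorted(M)
    return (set(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1)))

# Exhaustive check of upward monotonicity on a 6-element universe:
# whenever 'more than 2' holds of A and A is contained in B, it also
# holds of B.
M = set(range(6))
assert all(more_than(2, B)
           for A in subsets(M) if more_than(2, A)
           for B in subsets(M) if A <= B)
```

Enlarging the extension can only increase its cardinality, so 'more than n' can never be falsified by moving to a superset; the exhaustive check just confirms this on one finite universe.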
4.4.1 Experiment I: are Hintikka sentences symmetric? The first part of the test was designed to check whether subjects treat Hintikka sentences as symmetric (see section 3.2 for a discussion). Recall the notion of symmetry for our sentences. Let Q1, Q2 be quantifiers and ψ a quantifier-free formula. We will say that a sentence Q1x Q2y ψ(x, y) is symmetric if and only if it is equivalent to Q2y Q1x ψ(x, y). In other words, switching the whole quantifier prefix (determiner + noun phrase) does not change its meaning.

In order to check whether subjects treat sentences with switched quantifier prefixes as equivalent, we presented them with sentence pairs φ, φ′ and asked whether the first sentence implies the second. There were 20 tasks. Ten of them were valid inference patterns, provided symmetry holds. The rest were fillers, six of which were invalid patterns similar to the symmetric case. In three of these, we changed the order of nouns, that is, we had Q1x Q2y ψ(x, y) and Q1y Q2x ψ(x, y). In the remaining three, we switched determiners (rather than complete quantifier phrases), that is, Q1x Q2y ψ(x, y) and Q2x Q1y ψ(x, y). Four of the tasks were simple valid and invalid inferences with the quantifiers 'more than', 'all' and 'some'.

We constructed our sentences using non-existing nouns to eliminate pragmatic influence on subjects' answers. For example, in the English version (Appendix) of the test, we used nouns proposed by Soja et al. (1991): mells, stads, blickets, frobs, wozzles, fleems, coodles, doffs, tannins, fitches and tulvers. In Polish, we had the following nouns: strzew, memniak, balbasz, protorożec, melarek, krętowiec, stular, wachlacz, fisut, bubrak and wypsztyk. Our subjects were informed that they were not supposed to know the meanings of the common nouns occurring in the sentences. Figure 2 gives examples of each type of task in English. We excluded the possibility of interpreting the sentences as being about the relations between objects of the same kind (e.g. '68 coodles hate each other') by explicitly telling the subjects that in this setting the relation can occur only between objects from two different groups.

Figure 2 Four tasks from the first experiment: symmetry pattern, two invalid patterns and simple inference.

4.4.2 Results. Our main finding was that 94% [χ² = 709.33, degrees of freedom (df) = 1, P < 0.001] and 98% (χ² = 286.90, df = 1, P < 0.001) of the responses were in agreement with our symmetry hypothesis in the group consisting of philosophy undergraduates at Warsaw University and among Stanford University computer science students, respectively. Moreover, for the sake of completeness, we report results for the remaining parts of the experiment. In simple inferences, 83% (χ² = 153.4, df = 1, P < 0.001) and 97% (χ² = 110.63, df = 1, P < 0.001) of the answers were logically correct. For invalid symmetry inferences, the results were 86% (χ² = 286.02, df = 1, P < 0.001) and 93% (χ² = 138.38, df = 1, P < 0.001) (see Figure 3). This is a statistically significant result for both groups.7 Therefore, our first hypothesis, that people treat Hintikka sentences as symmetric, was confirmed.

We also compared the performance of the two groups (philosophers v. computer scientists) with respect to the three kinds of tasks and found no statistically significant differences. To be more precise, there was no difference in the symmetry tasks (χ² = 6.583, df = 6, P = 0.361), in the simple inferences (χ² = 8.214, df = 4, P = 0.084), or in the invalid arguments (χ² = 3.888, df = 4, P = 0.421).

7 We were only interested in the frequency of correct answers among all answers to the tasks based on the valid symmetric inference pattern (simple inferences and inferences based on the logically invalid schema were treated as fillers); that is why we used χ² to analyse our data rather than a statistical model, like multivariate analysis of variance (MANOVA), in which the observed variance is partitioned into components due to different independent (explanatory) variables (e.g. two groups of subjects, four types of tasks). We did not analyse the data with MANOVA because the following assumptions were violated (see e.g. Ferguson & Takane 1990): according to our hypothesis, we had expected that the number of 'valid' answers would dominate. In other words, the normality assumption of MANOVA was not satisfied; the distribution of the answers is not normal but skewed (4.728) towards validity, which was a further reason for using a non-parametric test. Additionally, the conditions (within subject) for each kind of task were different (the number of problems varied between 10, 4, 3 and 3) and the groups were not equal (90 philosophers, 32 computer scientists), which also indicated the use of a non-parametric statistical model.
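The three task types of Experiment I can be replayed on a toy model (our own illustration; the generalized-quantifier encoding is ours, not the paper's). Under the two-way reading, switching the whole quantifier prefixes preserves the truth value, while switching only the determiners, as in the invalid filler pattern, can flip it:

```python
def more_than(n):
    """The determiner 'more than n' as a generalized quantifier."""
    return lambda xs, pred: sum(1 for x in xs if pred(x)) > n

def two_way(Q1, A, Q2, B, R):
    """Two-way reading of 'Q1 As and Q2 Bs are R-related'."""
    return (Q1(A, lambda a: Q2(B, lambda b: (a, b) in R)) and
            Q2(B, lambda b: Q1(A, lambda a: (a, b) in R)))

A, B = {1, 2, 3}, {'a', 'b', 'c'}
R = {(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'b'), (2, 'c')}
R_inv = {(b, a) for (a, b) in R}

# Valid pattern: swapping the whole prefixes ('more than 1 As and more
# than 2 Bs' vs 'more than 2 Bs and more than 1 As') preserves truth.
assert two_way(more_than(1), A, more_than(2), B, R) == \
       two_way(more_than(2), B, more_than(1), A, R_inv)

# Invalid filler pattern: swapping only the determiners can change truth.
print(two_way(more_than(1), A, more_than(2), B, R),   # True
      two_way(more_than(2), A, more_than(1), B, R))   # False
```

Symmetry of the whole-prefix swap holds in general because the two-way reading is a conjunction of the two scope orders, and conjunction is commutative.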
Figure 3 Percentage of correct answers in the first test.

4.4.3 Experiment II: branching v. two-way interpretation. The second questionnaire was the main part of the experiment, designed to discover whether people agree with the two-way reading of Hintikka sentences. Subjects were presented with nine non-equivalent Hintikka sentences. Every sentence was paired with a model. All but two sentences were accompanied by a picture satisfying the two-way reading but not the branching reading. The remaining control tasks consisted of pictures in which the associated sentences were false, regardless of which of the possible interpretations was chosen.8 Every illustration was black and white and showed irregularly distributed squares and circles. Some objects of different shapes were connected with each other by lines. The number of objects in the pictures varied between 9 and 13, and the number of lines was between 3 and 15. All critical sentences were of the following form, where 1 ≤ n, m ≤ 3:

(10) More than n squares and more than m circles are connected by lines.
(11) Więcej niż n kwadraty i więcej niż m koła są połączone liniami.

Notice that some Hintikka sentences contain the phrase 'each other'. However, we decided not to use this phrase in the sentences tested in the main part of the experiment. This was because our previous experiments (Gierasimczuk & Szymanik 2007) indicated that the occurrence of reciprocal expressions in these sentences made people interpret them as statements about the existence of lines between figures of the same geometrical shape, which is not the interpretation we wanted to test. In the first test, the use of the phrase 'each other' made the 'hating' relation symmetric. In this experiment, the relation 'being connected by a line' is already symmetric in itself. Moreover, interviews with native speakers suggest that in the context of the relation 'being connected by lines', omitting 'each other' leads to more natural sentences. Additionally, in the Polish version of the sentences, there is no grammatically possible phrase corresponding to 'each other'.

Figures 4 and 5 show two examples of our tasks. In the first picture, the two-way reading is true and the branching reading is false. In the second picture, the sentence is false on either reading. The subjects were asked to decide whether the sentence is a true description of the picture.

8 Bott & Radó (2007) empirically assessed this methodology for studying quantifier scope and demonstrated its reliability and validity.

4.4.4 Results. We got the following results9: 94% (χ² = 444.19, df = 1, P < 0.001) of the answers of the philosophy students and 96% (χ² = 187.61, df = 1, P < 0.001) of the answers of the computer science students were two-way, that is, 'true' when the picture represented
a model for a two-way reading of the sentence. For the two sentences that were false no matter how subjects interpreted them, the rates of correct answers were 92% (χ² = 136.94, df = 1, P < 0.001) and 96% (χ² = 50.77, df = 1, P < 0.001). These results are statistically significant. Therefore, our second hypothesis, that in an empirical context people assign to Hintikka sentences meanings which are best represented by two-way formulae, was confirmed.

Further analysis of the individual subjects' preferences revealed that 94% of the philosophers (χ² = 71.11, df = 1, P < 0.001) and 97% of the computer scientists (χ² = 28.12, df = 1, P < 0.001) agreed on the two-way reading in more than half of the cases. Moreover, 67 philosophers (74%, χ² = 21.51, df = 1, P < 0.001) and 28 computer scientists (88%, χ² = 18, df = 1, P < 0.001) chose two-way readings in all tasks (see Table 1 for a presentation of all the data). Once again, we did not observe any differences between our two subject groups, either in judging obviously false situations (χ² = 0.188, df = 1, P = 0.664) or in the two-way preferences (χ² = 3.900, df = 7, P = 0.791). Therefore, we conclude that, with respect to the interpretation of quantifier combinations in Hintikka sentences, there is no difference between English and Polish.

9 We used a non-parametric statistical test for the same reasons as in the first experiment.

Figure 4 Two-way task from the second part of the experiment.

Figure 5 An example of a false task from the second part of the experiment.
Table 1 Results of the second test

Groups                         Number of subjects   Two-way answers   Recognized falsity
Polish philosophers            90                   94%               92%
American computer scientists   32                   95%               96%
5 CONCLUSIONS AND PERSPECTIVES

5.1 Conclusions

Contrary to what Hintikka (1973) and many of his followers have claimed, we argue that Hintikka sentences have readings expressible by linear formulae that satisfy all the conditions which prompted the introduction of the branching interpretation. The reasons for treating such natural language sentences as having Fregean (linear) readings are twofold. In section 3, we discussed a number of theoretical arguments, which can be summed up as follows.

i. For the Hintikka sentence, we should focus on four possibilities: the branching reading (4) and three weak readings: (5), (6) and (7).

ii. Hintikka's argument from symmetry, given in section 3.2, together with the results of our first experiment, allows us to reject asymmetric formulae. A similar argument leads to rejecting the linear readings of other Hintikka sentences.

iii. What about the weakest reading? It does not exist for some Hintikka sentences, so it cannot be viewed as a universal reading for all of them. Moreover, the inferential argument from section 3.3 suggests that the weakest meaning is also not an appropriate reading of the Hintikka sentence.

iv. Therefore, there are only two alternatives: we have to choose between the two-way (5) and the branching (4) readings.

In section 4, we discussed our empirical results. They indicate that people interpret Hintikka sentences in accordance with the two-way reading, at least in an experimental context. Additionally, we observed no statistically significant differences in the preferences of native English and native Polish subjects. Moreover, our experimental arguments are supported by the following observations.
i. The argument by Barwise from negation normality, discussed in section 3.3.1, agrees with our empirical results.

ii. Branching readings, being NPTIME-complete, may be too difficult for language users. Two-way readings, which are PTIME computable, are much easier in this sense.

Hence, even though we in principle agree that Hintikka sentences are ambiguous between all the proposed readings, our experiments and theoretical considerations convince us that in some situations the proper reading of Hintikka sentences can be given by two-way formulae. This clearly contradicts Hintikka's thesis.
5.2 Perspectives

We have tested one of the best known non-Fregean combinations of quantifiers, the so-called Hintikka sentences. We have presented arguments that these sentences can be interpreted in natural language by Fregean combinations of quantifiers. However, there is still some research to be done here. One can find and describe linguistic situations in which Hintikka sentences demand a branching analysis [recall example (12)]. For example, the work of Schlenker (2006) goes in this direction. Moreover, it is interesting to ask which determiners allow a branching interpretation at all (see e.g. Beghelli et al. 1997). Finally, we did not discuss the interplay of our proposal with the collective reading of noun phrases (see e.g. Lønning 1997) and the various interpretations of reciprocal expressions (see Dalrymple et al. 1998). As to the empirical work, we find a continuation towards covering other quantifier combinations exciting and challenging. Some ideas we discussed in the context of Hintikka sentences, such as inferential meaning, negation normality and the computational complexity perspective, seem universal and potentially useful for studying other quantifier combinations.

Acknowledgements

We would like to thank two anonymous reviewers of the Journal of Semantics and Johan van Benthem, Tadeusz Ciecierski, Paul Dekker, Bogdan Dziobkowski, Bart Geurts, Justyna Grudzińska, Tikitu de Jager, Theo Janssen, Dick de Jongh, Allen Mann, Marcin Mostowski, Rick Nouwen, Eric Pacuit, Ingmar Visser, Yoad Winter, Łukasz Wojtyniak and Marcin Zajenkowski. The first author is a recipient of the 2009 Foundation for Polish Science Grant for Young Scientists. The second author was supported by a Marie Curie Early Stage Research fellowship in the project GLoRiClass (MEST-CT-2005-020841).
NINA GIERASIMCZUK
Institute for Logic, Language and Computation
Universiteit van Amsterdam
Science Park 904
1098 XH Amsterdam
The Netherlands
e-mail: [email protected]

JAKUB SZYMANIK
Institute for Logic, Language and Computation
Universiteit van Amsterdam
Science Park 904
1098 XH Amsterdam
The Netherlands
e-mail: [email protected]
APPENDIX: ENGLISH VERSION OF THE TEST

A. First test

Instruction: Over the next pages, you will find 20 tasks. Each task represents some inference. Your aim is to decide whether this inference is valid. In other words, each task consists of two sentences with a horizontal line between them. You must decide whether the sentence above the line implies the sentence below the line. If you think that the inference pattern is valid (the second sentence is implied by the first one), encircle 'VALID'; otherwise encircle 'NOT VALID'.

Example 1:

Example 2:

The following pairs of sentences were used in the test.

• More than 6 fleems are tulvers.
  More than 5 fleems are tulvers.
• More than 12 fleems and more than 13 coodles hate each other.
  More than 13 coodles and more than 12 fleems hate each other.
• More than 16 stads and more than 9 blickets hate each other.
  More than 9 blickets and more than 16 stads hate each other.
• More than 16 mells and more than 25 blickets hate each other.
  More than 25 blickets and more than 16 mells hate each other.
• More than 10 mells are fleems.
  More than 11 mells are fleems.
• More than 9 frobs and more than 8 coodles hate each other.
  More than 8 coodles and more than 9 frobs hate each other.
• More than 20 wozzles and more than 35 fitches hate each other.
  More than 20 fitches and more than 35 wozzles hate each other.
• All wozzles are fleems.
  All fleems are wozzles.
• More than 100 wozzles and more than 150 stads hate each other.
  More than 150 stads and more than 100 wozzles hate each other.
• More than 105 wozzles and more than 68 coodles hate each other.
  More than 68 wozzles and more than 105 coodles hate each other.
• More than 6 doffs and more than 5 fitches hate each other.
  More than 5 fitches and more than 6 doffs hate each other.
• More than 47 stads and more than 55 tannins hate each other.
  More than 47 tannins and more than 55 stads hate each other.
• More than 58 frobs and more than 49 tannins hate each other.
  More than 49 frobs and more than 58 tannins hate each other.
• More than 7 coodles and more than 6 doffs hate each other.
  More than 6 doffs and more than 7 coodles hate each other.
• Some tulvers are mells.
  Some mells are tulvers.
• More than 99 coodles and more than 68 tulvers hate each other.
  More than 68 tulvers and more than 99 coodles hate each other.
• More than 7 tannins and more than 8 fitches hate each other.
  More than 8 fitches and more than 7 tannins hate each other.
• More than 19 frobs and more than 11 fleems hate each other.
  More than 11 fleems and more than 19 frobs hate each other.
• More than 159 stads and more than 25 fitches hate each other.
  More than 159 fitches and more than 25 stads hate each other.
• More than 8 frobs and more than 27 doffs hate each other.
  More than 27 frobs and more than 8 doffs hate each other.
B. Second test

Instruction: Over the next few pages, you will find nine tasks to solve. Each task consists of a picture. Above every picture, there is exactly one sentence. Encircle TRUE if and only if the sentence is a true description of the picture. Otherwise, encircle FALSE.

• More than 1 square and more than 2 circles are connected by lines.
• More than 1 square and more than 1 circle are connected by lines.
• More than 3 circles and more than 2 squares are connected by lines.
• More than 3 circles and more than 1 square are connected by lines.
• More than 2 circles and more than 3 squares are connected by lines.
• More than 3 circles and more than 3 squares are connected by lines.
• More than 3 circles and more than 3 squares are connected by lines.
• More than 2 squares and more than 1 circle are connected by lines.
• More than 2 squares and more than 1 circle are connected by lines.

REFERENCES

Bach, K. (1982), 'Semantic nonspecificity and mixed quantifiers'. Linguistics and Philosophy 4:593–605.
Barwise, J. (1979), 'On branching quantifiers in English'. Journal of Philosophical Logic 8:47–80.
Barwise, J. & Cooper, R. (1981), 'Generalized quantifiers and natural language'. Linguistics and Philosophy 4:159–219.
Beghelli, F., Ben-Shalom, D., & Szabolcsi, A. (1997), 'Variation, distributivity, and the illusion of branching'. In A. Szabolcsi (ed.), Ways of Scope Taking, vol. 65, Studies in Linguistics and Philosophy. Kluwer Academic Publishers. The Netherlands. 29–69.
Bellert, I. (1989), Feature System for Quantification Structures in Natural Language. Foris Publications. Dordrecht.
Bott, O. & Radó, J. (2007), 'Quantifying quantifier scope: a cross-methodological comparison'. In S. Featherston & W. Sternefeld (eds.), Roots: Linguistics in Search of Its Evidential Base, vol. 96, Studies in Generative Grammar. Mouton de Gruyter. Berlin. 53–74.
Bott, O. & Radó, J. (2009), 'How to provide exactly one interpretation for every sentence, or what eye movements reveal about quantifier scope'. In S. Winkler & S. Featherston (eds.), The Fruits of Empirical Linguistics, vol. 1. Walter de Gruyter. Berlin.
Dalrymple, M., Kanazawa, M., Kim, Y., Mchombo, S., & Peters, S. (1998), 'Reciprocal expressions and the concept of reciprocity'. Linguistics and Philosophy 21:159–210.
Edmonds, J. (1965), 'Paths, trees, and flowers'. Canadian Journal of Mathematics 17:449–67.
Ferguson, G. A. & Takane, Y. (1990), Statistical Analysis in Psychology and Education. McGraw-Hill. New York.
Frixione, M. (2001), 'Tractable competence'. Minds and Machines 11:379–97.
Gabbay, D. M. & Moravcsik, J. M. E. (1974), 'Branching quantifiers, English and Montague grammar'. Theoretical Linguistics 1:140–57.
Garey, M. R. & Johnson, D. S. (1979), Computers and Intractability. W. H. Freeman and Co. San Francisco.
Geurts, B. & van der Slik, F. (2005), 'Monotonicity and processing load'. Journal of Semantics 22:97–117.
Gierasimczuk, N. & Szymanik, J. (2007), 'Hintikka's thesis revisited'. The Bulletin of Symbolic Logic 17:273.
Guenthner, F. & Hoepelman, J. P. (1976), 'A note on the representation of branching quantifiers'. Theoretical Linguistics 3:285–89.
Heim, I., Lasnik, H., & May, R. (1991), 'Reciprocity and plurality'. Linguistic Inquiry 22:63–101.
Henkin, L. (1961), 'Some remarks on infinitely long formulas'. In Infinitistic Methods. Pergamon Press. Warsaw. 167–83.
Hintikka, J. (1973), 'Quantifiers vs. quantification theory'. Dialectica 27:329–58.
Hintikka, J. (1976), 'Partially ordered quantifiers vs. partially ordered ideas'. Dialectica 30:89–99.
Immerman, N. (1998), Descriptive Complexity. Springer. New York.
Jackendoff, R. (1972), Semantic Interpretation in Generative Grammar. MIT Press. Cambridge, MA.
Janssen, T. (2003), 'On the semantics of branching quantifier sentences'. In P. Dekker & R. van Rooij (eds.), Proceedings of the 14th Amsterdam Colloquium. University of Amsterdam. 147–51.
Jaszczolt, K. (2002), Semantics and Pragmatics: Meaning in Language and Discourse. Longman. London.
Kempson, R. M. & Cormack, A. (1981a), 'Ambiguity and quantification'. Linguistics and Philosophy 4:259–309.
Kempson, R. M. & Cormack, A. (1981b), 'On "formal games and forms for games"'. Linguistics and Philosophy 4:431–35.
Kempson, R. M. & Cormack, A. (1982), 'Quantification and pragmatics'. Linguistics and Philosophy 4:607–18.
Krynicki, M. & Mostowski, M. (1995), 'Henkin quantifiers'. In M. Krynicki, M. Mostowski, & L. Szczerba (eds.), Quantifiers: Logics, Models and Computation. Kluwer Academic Publishers. 193–263.
Liu, F.-H. (1996), 'Branching quantification and scope independence'. In J. van der Does & J. van Eijck (eds.), Quantifiers, Logic and Language. Center for the Study of Language and Information. Stanford. 155–68.
Lønning, J. T. (1997), 'Plurals and collectivity'. In J. van Benthem & A. ter Meulen (eds.), Handbook of Logic and Language. Elsevier. The Netherlands. 1009–53.
May, R. (1985), Logical Form: Its Structure and Derivation. MIT Press. Cambridge.
May, R. (1989), 'Interpreting logical form'. Linguistics and Philosophy 12:387–435.
Mostowski, M. (1994), 'Kwantyfikatory rozgałęzione a problem formy logicznej' ['Branching quantifiers and the problem of logical form']. In M. Omyła (ed.), Nauka i język. Biblioteka Myśli Semiotycznej. Warsaw. 201–42.
Mostowski, M. & Wojtyniak, D. (2004), 'Computational complexity of the semantics of some natural language constructions'. Annals of Pure and Applied Logic 127:219–27.
Papadimitriou, C. H. (1993), Computational Complexity. Addison-Wesley. California.
Robaldo, L. (2009), 'Independent set readings and generalized quantifiers'. Journal of Philosophical Logic, forthcoming.
Sevenster, M. (2006), Branches of Imperfect Information: Logic, Games, and Computation. Ph.D. thesis, Universiteit van Amsterdam. Amsterdam.
Sher, G. (1990), 'Ways of branching quantifiers'. Linguistics and Philosophy 13:393–442.
First version received: 25.8.2008
Second version received: 16.1.2009
Accepted: 25.4.2009
Journal of Semantics 26: 393–449 doi:10.1093/jos/ffp004 Advance Access publication May 4, 2009
A Formal Semantic Analysis of Gesture ALEX LASCARIDES University of Edinburgh MATTHEW STONE Rutgers University
Abstract

The gestures that speakers use in tandem with speech include not only conventionalized actions with identifiable meanings (so-called narrow gloss gestures or emblems) but also productive iconic and deictic gestures whose form and meanings seem largely improvised in context. In this paper, we bridge the descriptive tradition with formal models of reference and discourse structure so as to articulate an approach to the interpretation of these productive gestures. Our model captures gestures’ partial and incomplete meanings as derived from form and accounts for the more specific interpretations they derive in context. Our work emphasizes the commonality of the pragmatic mechanisms for interpreting both language and gesture, and the place of formal methods in discovering the principles and knowledge that those mechanisms rely on.
© The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

1 INTRODUCTION

Face-to-face dialogue is the primary setting for language use, and there is increasing evidence that theories of semantics and pragmatics are best formulated directly for dialogue. For example, many accounts of semantic content see extended patterns of interaction rather than individual sentences as primary (Kamp 1981; Asher & Lascarides 2003; Ginzburg & Cooper 2004; Cumming 2007). Likewise, many pragmatic theories derive their principles from cognitive models of interlocutors who must coordinate their interactions while advancing their own interests (Lewis 1969; Grice 1975; Clark 1996; Asher & Lascarides 2003; Benz et al. 2005). But face-to-face dialogue is not just words. Speakers can use facial expressions, eye gaze, hand and arm movements and body posture intentionally to convey meaning; see, for example, McNeill (1992). This raises the challenge of fitting a much broader range of behaviours into formal semantic and pragmatic models. We take up this challenge in this paper. We focus on a broad class of improvised, coverbal, communicative actions, which seem both particularly important and particularly
challenging for models of meaning in face-to-face dialogue. We distinguish communicative actions from other behaviours that people do in conversation, such as practical actions and incidental ‘nervous’ movements, following a long descriptive tradition (Goffman 1963; Ekman & Friesen 1969; Kendon 2004). This allows us to focus on a core set of behaviours—which we call gestures following Kendon (2004)—that untrained viewers are sensitive to (Kendon 1978), that linguists can reliably annotate (Carletta 2007) and that any interpretive theory must account for. Gestures have what Kendon calls ‘features of manifest deliberate expressiveness’ (Kendon 2004: 15), including the kinematic profile of the movement, as an excursion from and back to a rest position; its dynamics, including pronounced onset and release; and the attention and treatment interlocutors afford it. Coverbal gestures are those that are performed in synchrony with simultaneous speech. Gestures can also be performed without speech [see Kendon (2004, Ch. 14) for examples], in the pauses between spoken phrases [see Engle (2000, Ch. 3) for example] or over extended spans that include both speech and silence [see Oviatt et al. (1997) for examples]. Coverbal gestures show a fine-grained alignment with the prosodic structure of speech (Kendon 1972, 2004, Ch. 7). The gesture typically begins with a preparatory phase, where the agent moves the hands into position for the gesture. It continues with a stroke (which can involve motion or not), which is that part of the gesture that is designed to convey meaning—we focus in this paper on interpreting strokes. Finally, it can conclude with a post-stroke phase where the hands retract to rest. Speakers coordinate gestures with speech so that the phases of gesture performance align with intonational phrases in speech and so that strokes in particular are performed in time with nuclear accents in speech.
This coordination may involve brief pauses in one or the other modality, orchestrated to maintain synchrony between temporally extended behaviours (Kendon 2004, Ch. 7). The active alignment between speech and gesture is indicative of the close semantic and pragmatic relationship between them. Finally, we contrast improvised gestures both with other gestures whose content is emblematic and completely conventionalized, such as the ‘thumbs up’ gesture, and with beat gestures, which merely emphasize important moments in the delivery of an utterance. Improvised gestures may involve deixis, where an agent designates a real, virtual or abstract object, and iconicity, where the gesture’s form or manner of execution mimics its content. Deixis and iconicity sometimes involve the creative introduction of correspondences between the body and depicted space. Nevertheless, as Kendon’s (2004) fieldwork shows, even in deixis and
iconic representation, speakers recruit specific features of form consistently to convey specific kinds of content. The partial conventionalization involved in these correspondences is revealed not only in consistent patterns of use by individual speakers but also in crosscultural differences in gesture form and meaning. Researchers have long argued that speakers use language and gesture as an integrated ensemble to negotiate a single contribution to conversation—to ‘express a single thought’ (McNeill 1992; Engle 2000; Kendon 2004). We begin with a collection of attested examples which lets us develop this idea precisely (Section 2). We show that characterizing the interpretation of such examples demands a finegrained semantic and pragmatic representation, which must encompass content from language and gesture and formalize scope relationships, speech acts and contextually inferred referential connections. Our approach adopts representations that use dynamic semantics to capture the evolving structure of salient objects and spatial relationships in the discourse and a segmented structure organized by rhetorical connections to characterize the semantic and pragmatic connections between gesture and its communicative context. We motivate and describe these logical forms (LFs) in Section 3. We then follow up our earlier programmatic suggestion (Lascarides & Stone forthcoming) that such LFs should be derivable from underspecified semantic representations that capture constraints on meaning imposed by linguistic and gestural form, via constrained inference, which reconstructs how language and gesture are rhetorically connected. Our underspecified semantic representations, described in Section 4, capture the incompleteness of meaning that is revealed by gestural form while also capturing, very abstractly, what a gesture must convey given its particular pattern of shape and movement. 
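Schematically, and simplifying Asher & Lascarides (2003), the default rules that drive this constrained inference are defeasible conditionals of their glue logic; a representative default for purely linguistic discourse, in our gloss of their notation, is:

\[
\big(?(\alpha, \beta, \lambda) \wedge \mathit{occasion}(\alpha, \beta)\big) > \mathit{Narration}(\alpha, \beta, \lambda)
\]

where $?(\alpha, \beta, \lambda)$ states that $\beta$ is to be rhetorically attached to $\alpha$ within segment $\lambda$, and $>$ is defeasible implication: the consequent holds unless there is better evidence for a different rhetorical connection.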
We describe the resolution from underspecified meaning to specific interpretation in Section 5: a glue logic composes the LF of discourse from underspecified semantic representations via default rules for inferring rhetorical connections; and as a by-product of this reasoning, underspecified aspects of meaning are disambiguated to pragmatically preferred, specific values. These formal resources parallel those required for recognizing how sentences in spoken discourse are coherent, as developed by Asher & Lascarides (2003) inter alia. The distinctive contribution of our work, then, is to meet the challenge, implicit in descriptive work on non-verbal communication, of handling gesture within a framework that is continuous with and complementary to purely linguistic theories. This is both a theoretical and methodological contribution. In formalizing the principles of coherence that guide the interpretation of gesture, we go beyond previous work—whether descriptive (McNeill 1992; Kendon 2004), psychological (Goldin-Meadow 2003; So et al. forthcoming) or applied to embodied agents (Cassell 2001; Kopp et al. 2004). Such a logically precise model is crucial to substantiating the theoretical claim that speech and gesture convey an integrated message. Such a model is also crucial to inform future empirical research. The new data afforded by gesture call for refinements to theories of purely linguistic discourse, potentially resulting in more general and deeper models of semantics and pragmatics. But a formal model is often indispensable for formulating precise hypotheses to guide empirical work. For instance, our framework raises for the first time a set of logically precise constraints characterizing reference and content across speech and gesture and sequences of gesture in embodied discourse. Testing these constraints empirically will have a direct influence not only on the development of formal theory but also on our understanding of the fundamental pragmatic principles underlying multimodal communication. A hybrid research methodology, combining empirical research and logically precise models to mutual benefit, has proved highly successful in analysing language. We hope the same will be true for analysing gesture.

2 DIMENSIONS OF GESTURE MEANING IN INTERACTION WITH SPEECH

We begin with an overview of the possible interpretations of improvised coverbal gestures. We emphasize that the precise reference and content of these gestures is typically indeterminate, so that multiple consistent interpretations are often available. It is the details and commonalities of these alternative interpretations that we aim to explain. We argue that they reveal three key generalizations about gesture and its relationship to speech.

1. Gestures can depict the referents of expressions in the synchronous speech, inferentially related individuals or salient individuals from the prior context.
2. Gestures can show what speech describes, or they can complement speech with distinct but semantically related information.
3. Gesture and speech combine into integrated overarching speech acts with a uniform force in the dialogue and consistent assignments of scope relationships.
These principles—which we defend in more detail in Lascarides & Stone (2006, forthcoming)—underpin the formalism we present in the rest of the paper. Our discussion follows McNeill (2005: 41) in characterizing deixis and iconicity as two dimensions of gesture meaning, rather than two kinds of gesture. Deixis is that dimension of gesture meaning that locates objects and events with respect to a consistent spatial reference frame; our first examples highlight the semantic interaction of this spatial reference with the words that accompany coverbal gestures. Iconicity, meanwhile, is the dimension of gesture meaning which depicts aspects of form and motion in a described situation through a natural correspondence with the form and motion of the gesture itself; we consider iconicity in relatively ‘pure’ examples later in this section. So characterized, of course, deixis and iconicity are not mutually exclusive, so our formalism must allow us to regiment and combine the deictic and iconic contributions to gesture interpretation. We start with utterance (1), which is taken from the corpus of face-to-face direction-giving dialogues of Kopp et al. (2004) and visualized in Figure 1 (in this paper, we use square brackets to indicate the temporal alignment of speech and gesture, and where relevant small capitals to mark pitch accents):1

(1) And [Norris]1 [is exactly across from the library.]2
First: The left arm is extended into the left periphery; the left palm faces right so that it and fingers are aligned with the forearm in a flat, open shape. Meanwhile, the right hand is also held flat, in line with the forearm; the arm is held forward, with elbow high and bent, so that the fingers are directly in front of the shoulder.
Second: The left hand remains in its position while the right hand is extended to the extreme upper right periphery.

Figure 1 Hand gestures place landmarks on a virtual map.

1 Video is available at homepages.inf.ed.ac.uk/alex/Gesture/norris-eg.mov
The utterance concludes a direction-giving episode in which the speaker has already used gestures to create a virtual map of the Northwestern University campus (more of this episode appears later as examples (10) and (11)). Throughout the utterance, the speaker’s left hand is positioned to mark the most salient landmark of this episode—the large lagoon at the centre of the campus, which she has introduced in previous speech and gesture. The gesture on the left of Figure 1 positions the right hand at a location that was initially established for Norris Hall, while the next gesture moves the right hand to resume an earlier demonstration of the library. One could interpret these gestures as demonstrating the buildings, their locations or even the eventualities mentioned in the clause. But any of these interpretations result in a multimodal communicative act where the right hand indexes entities referenced in the accompanying speech and indicates the spatial relationship that the sentence describes. To capture the deictic dimension of gesture form and meaning here, we characterize the form of the gesture as targeting a specific region of space, and then use a referent to that region of space to characterize the content that the gesture conveys. Example (2), from Engle (2000: 37), illustrates a less constrained relationship between the contents conveyed by speech and gesture (pitch accents are shown with small capitals).

(2) They [have SPRINGS.]
Speaker places right pinched hand (that seems to be holding a small vertical object) just above left pinched hand (that seems to be holding another small vertical thing).

The speaker here describes how the cotter pins on a lock are held in position. The utterance refers to the set of pins with they and the whole set of corresponding springs with springs. The gesture, however, depicts a single spring and pin in combination, highlighting the vertical relationship through which a spring pushes its corresponding pin into the key cylinder to help hold the cylinder in place. As is common, the gesture remains ambiguous; it is not clear which hand represents the spring and which the pin.2 But even allowing for this ambiguity, we know that the gesture elaborates on the speech by showing the vertical spatial relationship maintained in a prototypical case of the relationship described in speech. Furthermore, as Engle notes, the gesture serves to disambiguate the plural predication in the accompanying sentence to a distributive interpretation.

2 In fact, springs push pins down into the cylinder, as diagrammed in the explanation this subject relied on, and as required by the need to make locks robust against gravity.
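Schematically, the distributive reading that the gesture selects for (2) can be glossed in first-order terms (an illustrative paraphrase of ours, not the representation language developed below):

\[
\forall x\,\big(\mathit{pin}(x) \rightarrow \exists y\,(\mathit{spring}(y) \wedge \mathit{have}(x, y))\big)
\]

that is, each pin has its own spring, as opposed to a collective reading on which the set of pins as a whole stands in the have relation to the set of springs.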
Gestures maintain semantic links to questions and commands, as well as assertions. Such examples underscore the need to integrate the reference and content of gesture precisely with semantic and pragmatic representations of linguistic units. Consider the following example, taken from the Augmented Multi-party Interaction (AMI) corpus (dialogue ES2002b, Carletta 2007), in which a group of four people are tasked with designing a remote control:

(3) C: [Do you want to switch places?]
While C speaks, her right hand has its index finger extended; it starts at her waist and moves horizontally to the right towards D and then back again to C’s waist, and this movement is repeated, as if to depict the motion of C moving to D’s location and D moving to C’s location.

Intuitively, the gesture in (3) adds content to C’s question: it is not about whether D wants to switch places with someone unspecified, but rather switch places with C (the speaker). So overall, the multimodal action means ‘Do you want to switch places with me?’. A different gesture, involving C’s hand moving between agents D and A, would have resulted in a different overall question: Do you want to switch places with A? To capture this interaction in LF, the interrogative operator associated with the question must have a referential or scopal dependence which allows its contribution to covary with the content of the gesture. The following example is taken from the same dialogue as (1):

(4) [You walk out the doors]
The gesture is one with a flat hand shape and vertical palm, with the fingers pointing right, and palm facing outward.

The linguistic component expresses an instruction. And intuitively, the interpretation of the gesture is also an instruction: ‘and then immediately turn right’. The inferential connection here must integrate both the semantic and pragmatic relationships between gesture and speech. Semantically, the two modalities respect a general presentational constraint that the time and place where the first event ends (in this case, where the addressee would be once he walks out the door) overlaps with where the second event starts (turning right). Pragmatically, they are interpreted as presenting an integrated instruction. (Note that if we were to replace the clause in (4) with an indicative such as ‘John walked out the doors’, then although the content of the utterance and the gesture would exhibit the same semantic relationship, the gesture would now have the illocutionary effects of an assertion.) Both aspects can be characterized by representing the content of the two as connected by
a rhetorical relation: this reflects both their integrated content and the integrated speech act that the speaker accomplishes with the two components of the utterance (Asher & Lascarides 2003). In this case, that relation is Narration, the prototypical way to present an integrated description of successive events in discourse. Our examples thus far show how the hands can establish a consistent deictic frame to indicate objects and actions in a real or virtual space. Many gestures use space more abstractly, to depict aspects of form and motion in a described situation. A representative example is the series of gestures in (5)—Example 6 and Figure 8.4 in Kendon (2004: 136)—an extract of the story of Little Red Riding Hood:

(5) a. and [took his] [HATCHET and
First: Speaker’s right hand grasps left hand, with wrists bent.
Second: Speaker lifts poised hands above right shoulder.
b. with] [a mighty SWEEP]
First: Hands, held above right shoulder, move back then forward slightly.
c. [(pause 0.4 sec)] [SLICED the wolf’s stomach open]
First: Speaker turns head.
Second: Arms swing in time with sliced; then are held horizontally at the left.

In this example, the speaker assumes what McNeill (1992: 118) calls character viewpoint: she mirrors the actions of the woodsman as he kills the wolf. In (5a) and (5b), the speaker’s hands, in coming together, depict the action of grabbing the handle of the hatchet and then the action of drawing the hatchet overhead, ready to strike; in (5c), the speaker’s broad swinging motion depicts the woodsman’s effort in delivering his blow to the wolf with the hatchet. The whole discourse thus exhibits a consistent dramatization of the successive events, with the speaker understood to act out the part of the woodsman, and her hands in particular understood as holding and wielding the woodsman’s hatchet. However, there seems to be no implication that these actions are demonstrated in the same spatial frame as previous or subsequent depictions of events in the story. Cross-cultural studies, as in the work of Haviland (2000) among others, suggest that narrative traditions differ in how freely gestures can depart from a presupposed anchoring to a consistent real or virtual space, with examples like (5) in English representing the most liberal case. Following the descriptive literature on gesture, we represent the form of such gestures with qualitative features that mirror elements of the English descriptions we give for these gestures. In (5a), for example, we indicate that the speaker’s hands are held above the right shoulder, that the right is grabbing the left and that both hands are in a fist shape as though grabbing something.
Iconicity is captured by the relationship between these gesture elements and a naturally related predication that each element contributes to the interpretation of the gesture as a whole. Gestures with iconic meanings, like those with deictic meanings, must be interpreted in tight semantic and pragmatic integration with the accompanying utterances. Consider utterance (6) from the AMI corpus (dialogue ES2005b):

(6) D: And um I thought not too edgy and like a box, more kind of hand-held more um . . . not as uh [computery] and organic, yeah, more organic shape I think.
When D says ‘‘computery’’, her right hand has fingers and thumb curled downward (in a 5-claw shape), palm also facing down, and she moves the fingers as if to depict typing.

The content of D’s gesture is presented in semantic interaction with the scope-bearing elements introduced in the sentence. Intuitively, the gesture depicts a keyboard, not anchored to any specific virtual space. There is nothing within the form of the gesture itself that depicts negation. Nevertheless, D’s overall multimodal act entails not with a keyboard. This requires the negation that is introduced by the word not to semantically outscope the content depicted by the gesture. This scope relation can be achieved only via a unified semantic framework for representing verbal and gestural content. In fact, we will argue in Section 4.3 that example (6) calls for an integrated, compositional description of utterance form and meaning that captures both linguistic and gestural components. That way, established and well-understood methods for composing semantic representations from descriptions of the part–whole structure of communicative actions can determine the semantic scope between gestured content and linguistically introduced negation. Iconicity gives rise to the same interpretive underspecification that we saw with deixis. For example, consider the depiction of ‘trashing’ in the following example from a psychology lecture:3

(7) I can give you other books that would totally trash experimentalism.
When the speaker says ‘‘trash’’, both hands are in an open flat handshape (ASL 5), with the palms facing each other and index fingers pointing forward. The palms are at a 45-degree angle to the horizontal. The hands start at the central torso, and move in parallel upwards and to the left.

3 See www.talkbank.org/media/ClassTalk/Lecture-unlinked/feb07/feb07-1.mov
While there is ambiguity in what the speaker’s hands denote—they could be hands metaphorically holding experimentalism itself, a representation of a statement of experimentalism such as a book or the content of such a book—the gesture is clearly coherent and depicts experimentalism being thrown away. The following example (8) illustrates the possibility for deictic and iconic imagery to combine in complex gestures. It is extracted from Kendon’s Christmas cake narrative (2004, Figure 15.6: 321–2), where a speaker describes how his father, a small-town grocer, would sell pieces of a giant cake at Christmas time:

(8) a. and it was [pause 1.02 sec] this sort of [pause 0.4 sec] size
During the pauses, the speaker frames a large horizontal square using both hands; his index fingers are extended, but other fingers are drawn in, palms down.
b. and [he’d cut it off in bits]
The speaker lowers his right hand, held open, palm facing to his left, in one corner of the virtual square established in the previous gesture.

The gesture in (8b) involves both iconic and deictic meaning. The iconicity comes in the configuration and orientation of the speaker’s right hand, which mirrors a flat vertical surface involved in cutting: perhaps the knife used to cut the cake, the path it follows through the cake or the boundary thereby created. (We are by now familiar with such underspecification.) The deixis comes in the position of the speaker’s hand, which is interpreted by reference to the virtual space occupied by the cake, as established by the previous gesture. We finish with an example, taken from a lecture on speech,4 that—like all our examples—underscores how gesture interpretation is dependent on both its form and its coherent links to accompanying linguistic context (Figure 2):

(9) So there are these very low level phonological errors that tend to not get reported.
The hand is in a fist with the thumb to the side (ASL A) and moves iteratively in the sagittal plane in clockwise circles (as viewed from left), below the mouth.

4 http://www.talkbank.org/media/Class/Lecture-unlinked/feb02/feb02-8.mov

Figure 2 Hand gestures depicting speech errors.

One salient interpretation is that the gesture depicts the iterative processes that cause low-level phonological errors, slipping beneath everyone’s awareness. In previous utterances, the speaker used both words and gestures to show that anecdotal methods for studying speech errors are biased towards noticeable errors like Spoonerisms. Those
noticeable errors were depicted with the hand emerging upward from the mouth into a prominent space between the speaker and his audience. If we take the different position of the gesture in (9), below the mouth, nearer the speaker, as intended to signal a contrast with this earlier case, then we derive an interpretation of this gesture as depicting low-level phonological errors as less noticeable. At the same time, as in (8b), we might understand the hand shape iconically, with the fist shape suggesting the action of processes of production in bringing forth phonological material. This ensemble represents a coherent interpretation because it allows us to understand the gesture as providing information that directly supports what is said in the accompanying speech—the fact that these errors are less noticeable explains why anecdotal methods would not naturally detect them. Of course, this interpretation is just one of several coherent alternatives for this gesture. Another plausible interpretation of the gesture in (9) is that it depicts the low level of the phonological errors, rather than the fact that these errors are less noticeable. This alternative interpretation is also coherently related to the linguistic content: like (1) it depicts objects that are denoted in the sentence. In fact, this interpretation would be supported by a distinct view of the gesture’s form, where instead of conceiving it as a single stroke (as our prior interpretation requires since the repeated movement was taken to depict an iterative process), it is several strokes—a sequence of identical gestures, each consisting of a fist moving in exactly one circle, and each circle demonstrating a distinct low-level phonological error. This alternative interpretation demonstrates how ambiguity can persist in a coherent discourse at all levels, from form to interpretation. 
But crucially, all plausible interpretations must satisfy the dual constraints that (i) the interpretation offer a plausible iconic rendition of the gesture’s form and (ii) the interpretation be coherently related to the content conveyed by its synchronous speech. Accordingly, while
404 A Formal Semantic Analysis of Gesture
3 THE LF OF MULTIMODAL COMMUNICATION The overall architecture of our formalism responds to the claim, substantiated in Section 2, that gesture and speech present complementary, inferentially related information as part of an integrated, overarching speech act with a uniform force and consistent assignments of scope relationships. We formalize this integration of gesture and speech by developing an integrated logical form (LF) for multimodal discourse, which, like the LF of purely linguistic discourse, makes explicit the illocutionary content that the speaker is committed to in the conversation. As in theories of linguistic discourse, we give a central place in LF to rhetorical relations between discourse units. Here
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
computing the interpretation of gesture via unification with the content of synchronous speech may suffice for examples where gesture coherence is achieved through conveying the same content as speech (Kopp et al. 2004), on its own it cannot account for the gestures in examples such as (4) and (6) that evoke objects and properties distinct from, but related to, those in the speech. Rather, computing an interpretation of the gesture that is coherently related to the content conveyed in speech will involve commonsense reasoning. Whether the full inventory of rhetorical relations that are attested in linguistic discourse is also attested for relating a gesture to its synchronous speech is an empirical matter. But we rather suspect that certain relations are excluded—for instance, interpreting a gesture so that it connects to its synchronous speech with Disjunction seems implausible (although Disjunction could relate one multimodal discourse unit that includes a gestural element to another). But this does not undermine the role of coherence relations in interpreting gesture any more than it does for interpreting other purely linguistic constructions that signal the presence of one of a strict subset of rhetorical connections. For example, sense-ambiguous discourse connectives such as and are like this: and signals the presence of a rhetorical relation between its complements; it underspecifies its value, but it cannot be Disjunction (Carston 2002). Similarly, Kortmann (1991) argues that the interpretation of free adjuncts (e.g. 'opening the drawer, John found a revolver') involves inferring coherence relationships between the subordinate and main clauses, but certain relationships such as Disjunction are ruled out. It is not surprising that synchronous speech and gesture likewise signal the presence of a coherence relation whose value is not fully determined by form, although certain relations are ruled out.
Alex Lascarides and Matthew Stone 405
(10) a. Norris is like up here— The right arm is extended directly forward from the shoulder with forearm slightly raised; the right palm is flat and faces up and to the left.
b. And then the library is over here. After returning the right hand to rest, the right hand is re-extended now to the extreme upper right periphery, with palm held left.
The speaker evokes the same virtual space in (1) as in the preceding (10) by designating the same physical positions when naming the buildings. The rhetorical connection Overlay captures the intuition that commonalities in the use of space mark the coherent use of
rhetorical relations must not only link linguistic material together but also link gestures to synchronous speech and to other material in the ongoing discourse. A rhetorical relation represents a type of (relational) speech act (Asher & Lascarides 2003). Examples include Narration (describing one eventuality and then another that is in contingent succession), Background (a strategy like Narration’s save that the eventualities temporally overlap) and Contrast (presenting related information about two entities, using parallelism of syntax and semantics to call attention to their differences). The inventory also includes metatalk relations that relate units at the level of the speech acts rather than content. For instance, you might follow ‘Chris is impulsive’ with ‘I have to admit it’—an explanation of why you said Chris is impulsive, not an explanation of why Chris is impulsive. These are symbolized with a subscript star—Explanation* for this example. To extend the account to gesture, we highlight an additional set of connections which specify the distinctive ways embodied communicative actions connect together. The examples from Section 2 provide evidence for three such relations. First, Depiction is the strategy of using a gesture to visualize exactly the content conveyed in speech. Example (1) is an illustrative case. The speaker says that Norris is across from the library at the same time as she depicts their relative locations across from one another. Technically, Depiction might be formalized as a special case of Elaboration, where the gesture does not present additional information to that in the speech. We distinguish Depiction, however, because Depiction does not carry the implicatures normally associated with redundancy in purely spoken discourse (Walker 1993)—it is helpful, not marked. Second, Overlay relates one gesture to another when the latter continues to develop the same virtual space. 
Example (1), which is preceded in the discourse by (10), illustrates this:
3.1 Spatial content
We begin by formalizing the spatial reference that underpins deixis as a dimension of gesture meaning. Our formalization adds symbols for places and paths through space, variables that map physical space to virtual or abstract spaces and predicates that record the propositional information that gestures offer in locating entities in real, virtual and abstract spaces. This section presents each of these innovations in turn. We begin by adding to the model a spatio-temporal locality $L \subseteq \mathbb{R}^4$ within which individuals and events can be located. We also add to the language a set of constants $\vec{p}_1, \vec{p}_2, \ldots$ which are mapped to a subset of $L$ by the model's interpretation function $I$—that is, $[\![\vec{p}]\!]^M = I^M(\vec{p}) \subseteq L^M$. Whenever we need to formalize the place or path in physical space designated by the gesture, we use a suitable constant $\vec{p}$. We will return shortly to how physical locations map to locations in the situation the speaker describes. In Section 2, we used the gestures in (1) as representative examples of spatial reference, since the position of the right hand in space signals the
gesture. Here, an LF that features Overlay between the successive gestures in (10ab) and (1) captures the correct spatial aspects of the discourse's content. Our third new relation is Replication, which relates successive gestures that use the body in the same way to depict the same entities. The gestures of example (5) illustrate Replication. The initial gesture adopts a figuration in which the speaker represents the woodsman, with her hands modelling his grip on the handle of the hatchet. While the subsequent speech no longer explicitly mentions the hatchet or even the woodsman, subsequent gestures continue with the imagery adopted in the earlier gesture. Connecting subsequent gestures to earlier ones by Replication captures the coherence of this consistent imagery. Our plan for this section is to formalize this programmatic outline by developing representations of LF that combine these rhetorical relations with appropriate models of spatial content (Section 3.1), dynamic semantics (Section 3.2) and discourse structure (Section 3.3). We then show that these LFs allow for the interpretive links in reference, content and scope that we observed in Section 2. The section culminates in the presentation of a language Lsdrs (Section 3.4) for describing the content of multimodal discourse that is based on that of Segmented Discourse Representation Theory (SDRT; Asher & Lascarides 2003).
(11) It’s the weird-looking building over here. The left-hand shape is ASL 5 open, the palm facing right and fingers facing forward; the hand sweeps around to the right as though tracing the surface of a cylinder.
This trajectory is meant to represent the cylindrical exterior of the library—a fact that must be captured in LF via a suitable constant $\vec{p}_s$. As we saw in Section 2, such alternative methods of spatial reference give rise to ambiguities in the form and interpretation of gestures—ambiguities that may never be fully resolved. Accordingly, it may not be possible or desirable to draw inferences from LFs involving spatial constants such as $\vec{p}_s$ that depend on the constants' exact values. A further cross-cutting distinction is whether the speaker indicates the location of the hand itself, as in (1) and (11), or uses the hand to designate a distant region. The typical pointing gesture, with the index finger extended (the ASL 1-index hand shape), is often used in this way. Take the case of deferred reference illustrated in (12), after Engle (2000, Table 8: 38):
(12) [These things] push up the pins. The speaker points closely at the front-most wedge of the line of jagged wedges that runs along the top of a key as it enters the cylinder of a lock.
It seems clear that the speaker aims to call attention to the spatial location $\vec{p}_w$ of the first wedge, not the spatial location of the finger.5 Utterance (12) also contrasts with (1) and (11) in whether they link up with the real places or establish a virtual space that models the real
5 This demonstrative noun phrase and accompanying demonstration is attested in Engle's data. Unfortunately Engle does not report the entire sentence context in which the gesture is used; the continuation is based on other examples she reports. Further examples of the variety of spatial reference in gesture are provided by Johnston et al. (1997), Johnston (1998) and Lücking et al. (2006).
(relative) locations of Norris Hall and the library in interpretation. To formalize this, we can use a spatial reference $\vec{p}_n$ denoting a place in front of the speaker's shoulder, and $\vec{p}_l$ denoting a place up and to her right. The contrast is now evident between the use of space in (1) and the mimicry in examples such as (5) and (7). Whereas (5) and (7) portray 'non-spatial' content through qualitative aspects of movement, (1) expresses intrinsically spatial information through explicit spatial reference. For now, we remain relatively eclectic about how a speaker uses movement to indicate a spatio-temporal region. In (1) the speaker designates the position of the hand. But in (11)—a description of the library from the discourse preceding (1)—the speaker designates the trajectory followed by the hand:
world. In (12) the speaker locates the actual wedge on the key. In (1) and (11), however, the speaker is not actually pointing at the buildings—she marks their imagined locations in the space in front of her. The information that (12) gives about the real world can therefore be characterized directly in terms of the real-world region $\vec{p}_w$ that the speaker's gesture designates. By contrast, the content of (1) and (11) can only be described in terms of context-dependent mappings $v_c$ and $v_s$ from the space in front of the speaker to (in this case) the Northwestern University campus. The relationship between the real positions of the speaker's hands during her demonstrations $\vec{p}_n$ and $\vec{p}_l$ in (1) thus serves to signal the corresponding relationship between the actual location of Norris Hall $v_c(\vec{p}_n)$ and the actual location of the library $v_c(\vec{p}_l)$. A related (perhaps identical) mapping $v_s$ is at play in (11) when the speaker characterizes the actual shape of the library facade $v_s(\vec{p}_s)$ in terms of the curved path $\vec{p}_s$ of her hand. These spatio-temporal mappings are the second formal innovation of the language to represent gesture meaning. Formally, variables such as $v_c$ and $v_s$ are part of an infinite family of variables that denote transformations over $L$. They simplify the relationship between the form of a gesture and its semantics considerably. We do not have to assume an ambiguity between reference to physical space v. virtual space. Rather, gesture always refers to physical space and always invokes a mapping between this physical space and the described situation—for example the gesture in (12) makes the relevant mapping the identity function $v_I$. The values of these variables $v_1, v_2, \ldots$ are determined by context. Some continuations of discourse are coherent only when the speaker continues to use the space in his current gesture in the same way as his previous gestures.
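The family of transformation variables can be given concrete form. The following is our illustrative sketch: the affine decomposition and its parameters are assumptions on our part, not the paper's definition of these mappings.

```latex
% Illustrative form for the spatial mappings (a sketch; the affine
% decomposition is our assumption, not the paper's definition).
% Restricting attention to the spatial part of L:
v(\vec{p}) = c\,R\,\vec{p} + \vec{t},
  \qquad c > 0,\; R \in SO(3),\; \vec{t} \in \mathbb{R}^3
% For the deictic gesture in (12) the relevant mapping is the identity:
% v_I(\vec{p}) = \vec{p}, i.e. c = 1, R = \mathrm{Id}, \vec{t} = \vec{0}.
```

On this rendering, requiring $R \in SO(3)$ (so $\det R = +1$) permits rotation and, via $c$, re-scaling, while excluding mirroring; since human gesture is not metrically exact, such conditions would further be expected to hold only approximately.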
Other continuations are coherent even though the current gesture uses space in a different way. The values of $v_1, v_2, \ldots$ are therefore provided by assignment functions, which in our dynamic semantics mediate the formal treatment of context dependence and context change since they are a part of the context of evaluation (see Section 3.4). To respect the iconicity, the possible values for a mapping $v$ are tightly constrained: they can rotate and re-scale space but not effect a mirroring transformation. At the same time (given their origin in human cognition and bodily action), we would not expect mappings to realize spatial relationships exactly. Here we simply assume that there is a suitably constrained set of mappings $T$ in any model, and so where $f$ is an assignment function, $f(v) \in T$. As is standard in dynamic semantics (Williamson 1994; Kyburg & Morreau 2000; Barker 2002), we understand the context dependence
(13) a. $loc(e_1, n, v_c(\vec{p}_n)) \wedge loc(e_2, l, v_c(\vec{p}_l))$
b. $loc(e_3, f, v_s(\vec{p}_s)) \wedge facade(l, f)$
c. $loc(e_4, w, v_I(\vec{p}_w))$
In words, $e_1$ is the state of $n$, the discourse referent for Norris Hall that is introduced in the clause, being contained in the spatio-temporal image $v_c(\vec{p}_n)$ on the speaker's virtual map of a point $\vec{p}_n$ in front of her left shoulder; $e_2$ is the state of the library $l$ being contained in the spatio-temporal image $v_c(\vec{p}_l)$ of the designated point $\vec{p}_l$ further up and to the right. Example (13b) states that the facade $f$ of the library lies in the real-world cylindrical shell $v_s(\vec{p}_s)$ determined by the speaker's hand movement $\vec{p}_s$ in (11). The predication $facade(l, f)$ is not contributed by an explicit element of gesture morphology but is, as we describe in more detail in Section 3.2, the result of pragmatic inference that connects individuals introduced in the gesture to antecedents from the verbal content (i.e. the library). The LF (13c) of the deictic gesture in (12) locates the front-most wedge $w$ at the (distant) location $\vec{p}_w$ where
of mapping variables to offer a solution to the problem of vagueness in spatial mappings. These variables take on precise values, given a precise context. However, interlocutors do not typically nail down a precise context, and if the denotation of a variable $v$ is not determined uniquely, the LF will be correspondingly vague about spatial reference. A range of spatial reference will be possible and the interpretation of the gesture will fit a range of real-world layouts. For instance, utterances (1) and (11) can either exemplify a statement at a particular time or extend their spatio-temporal reference throughout a wider temporal interval (Talmy 1996). We also discussed in Section 2 that the gestures in (1) can be interpreted as demonstrating the buildings or the locations of the buildings. These alternatives correspond to distinct values for the mapping $v$ that might both be supported by the discourse context. Interlocutors can converse to resolve these vagaries (Kyburg & Morreau 2000; Barker 2002). But eventually, even if some vagueness in interpretation persists, interlocutors understand each other well enough for the purposes of the conversation (Clark 1996). We complete this formalization of spatial reference in gesture by introducing two new predicates, loc and classify, which respectively describe the literal and metaphorical use of space to locate an entity. loc is a three-place predicate, and $loc(e, x, \vec{p})$ is true just in case at each moment spanned by the temporal interval $e$, $x$ is spatially contained in the region specified by $\vec{p}$. For example, (13a) represents the interpretation of the gestures in (1):
(14) We have this one ball, as you said, Susan. The speaker sits leaning forward, with the right-hand elbow resting on his knee and the right hand held straight ahead, in a loose ASL L gesture (thumb and index finger extended, other fingers curled) pointing at his addressee.
Example (14) is part of an extended explanation of the solution to a problem in physics. The speaker's explanation describes the results of a thought experiment that his addressee Susan had already introduced into the dialogue, and he works to acknowledge this in both speech and gesture. More precisely, both the gesture in (14) and the adjunct clause 'as you said' function as meta-comments characterizing the original source of 'We have this one ball'. The gesture, by being performed close to Susan and away from the speaker's body, shows that his contribution here is metaphorically located with Susan; that is, it recapitulates the content she contributed. Formally, we handle such metaphorical reference to space by assuming a background of meaning postulates that link spatial coordinates with the corresponding properties. The predicate classify is used to express instantiations of this link. For example, corresponding to the conduit metaphor is a virtual space $v_m$ that associates people with their contributions to conversation. If $\vec{p}_i$ is the location of any interlocutor $i$, then $classify(e, u, v_m(\vec{p}_i))$ is true exactly when utterance $u$ presents content that is originally due to a contribution by $i$. The generative mapping $v_m$ between interlocutors' locations and contributed content shows why the metaphor depends on spatial reference.
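The link that classify expresses can be written as a meaning postulate. In the following sketch, source(u, i) is auxiliary notation we introduce for 'utterance u presents content originally contributed by interlocutor i'; it is not a predicate from the paper's language.

```latex
% Meaning postulate for the conduit metaphor (illustrative sketch;
% source(u, i) is auxiliary notation introduced here).
\forall e\,\forall u\,\forall i\;
  \bigl( classify(e, u, v_m(\vec{p}_i)) \leftrightarrow source(u, i) \bigr)
% where \vec{p}_i is the location of interlocutor i and v_m is the
% virtual space associating interlocutors with their contributions.
```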
the speaker is pointing. The deferred reference from that one wedge $w$ to the entire set of wedges at the top of the key as denoted by the linguistic phrase these things is obtained via pragmatics: context resolves to a specific value an underspecified relation between the deictic referent and the NP's referent, this underspecified relation being a part of the compositional semantics of the multimodal act (see Section 4.2). Indeed, identifying the gesture's referent and spatial mapping as $w$ and $v_I$, resolving the referent of these things and resolving this underspecified relation between them (to exemplifies) are logically codependent (see Section 5). Speakers can also use positions in space as a proxy for abstract predications. Such metaphorical uses are formalized through the predicate classify. A representative illustration can be found in the conduit metaphor for communication (Reddy 1993), where information contributed by a particular dialogue agent is metaphorically located with that agent in space, as shown in the naturally occurring utterance (14):
3.2 Dynamic semantics
We proceed by formalizing a shared set of constraints on coreference in discourse that describes both deictic and iconic gesture meaning. Our formalization distinguishes between entities introduced in speech and those introduced in gesture. As a provisional account of our empirical data and linguistic intuitions, we articulate a model in which entities introduced in gesture must be bridging-related to entities introduced explicitly in accompanying speech (Clark 1977). These inferred entities can figure in the interpretation of subsequent gestures but do not license pronominal anaphora in subsequent speech. This provisional model offers a lens with which to focus future empirical and theoretical research. In Section 2, we presented a range of examples in which gestures seem most naturally understood as depicting entities that are not directly referenced in the accompanying speech, but which stand in close relations to them given commonsense knowledge. For instance in (2), the prototypical spring and the prototypical pin depicted in gesture are inferable from the set of springs and the set of pins explicitly referenced in words. Conversely in (12), the referent of 'these things' is inferable from but not identical to the specific individual wedge that is denoted by the speaker's gesture. The grip of the woodsman's hands on the handle of the hatchet, we suggested, is inferable but not explicitly referenced in the words 'took his hatchet' of (5a). The knife-edge depicted by the hand in (8b) is also inferable but not explicitly
The LF of (14) will therefore use the formula $classify(e_6, u', v_m(\vec{p}_S))$, where $u'$ denotes an utterance whose content entails 'We have this one ball', and $\vec{p}_S$ denotes Susan's location. This treatment of metaphor is continuous with models of linguistic metaphor as context-dependent indirect reference (Stern 2000; Glucksberg & McGlone 2001). This indirect reference depends on the conventional referent (here a point in space) and a generating principle taken from context that maps the conventional referent into another domain (here the domain of contributing to conversation). As revealed within cognitive linguistics (Lakoff & Johnson 1981; Gibbs 1994; Fauconnier 1997), such mappings are typically richly structured but flexible and open-ended. Thus, interlocutors will no more fix a unique way to understand the metaphor than they will fix other aspects of context-dependent interpretation. So metaphorical interpretations on our account—though represented precisely in terms of context-dependent reference—will remain vague.
referenced in the accompanying statement about the grocer's work with the cake, 'he'd cut it off into bits'. While we do not believe gestures refer only to entities evoked in speech, we do think some inferential connection to prior discourse is necessary. Not only does this characterize all the examples we have investigated; going beyond inference would also place an uncharacteristically heavy burden on gesture meaning, which is typically ambiguous and open-ended except when interpreted in close connection to its discourse context. Thus, we will formalize initial references to new entities in gesture by analogy to definite references to inferable entities in discourse—what is known as bridging in the discourse literature (Clark 1977). Our techniques are standard; see, for example Chierchia (1995). But nevertheless, they offer the chance to engage future empirical work on reference in gesture with a related formal and empirical approach to linguistic discourse. Entities depicted initially in gesture remain available for the interpretation of subsequent gestures. For example, the lagoon, located as a landmark on the speaker's left in the initial segment of the direction-giving episode excerpted in (1), serves as the referent demonstrated by the speaker's left hand in both gestures of (1)—despite the fact that the speaker does not continue to reference the lagoon in speech. Similarly, once introduced in (5a), the grip of the woodsman's hands on the handle of the hatchet continues to guide the interpretation of the gestures of (5b), even though the speaker does not mention the hatchet again. By contrast, inferable entities evoked only in gesture seem not to be brought to prominence in such a way as to license the use of a pronoun in subsequent speech. Such examples would be very surprising in light of the tradition in discourse semantics—see, for example Heim (1982)—that sees pronominal reference as a reflex of a formal link to a linguistic antecedent.
In fact, we have found no such examples. And our intuitions find hypothetical attempts at such reference quite unnatural. Try, for example, following (8b) 'he'd cut it off into bits' with ?'and it would get frosting all over it', with it understood to pick out the cutting edge demonstrated in gesture in (8b). We show in this section how to formalize such a constraint. Our formalism builds on an existing dynamic semantic model of anaphora in discourse, since with dynamic semantics—unlike alternatives such as Centering Theory (Grosz et al. 1995) and Optimality Theory (Buchwald et al. 2002)—we can build on previous work that integrates anaphoric interpretation with rhetorical relations and discourse structure (Asher & Lascarides 2003). We use a dynamic
6 Our framework aims for the simplest possible dynamic semantic formalization. This involves taking up generalizations from linguistic discourse as a provisional guide to the behaviour of gesture. For example, we assume that a gesture that is outscoped by a negation, like the keyboard gesture of (6), does not license subsequent anaphora, just as an indefinite in speech that is outscoped by a negation does not license a subsequent pronoun (Groenendijk & Stokhof 1991). Other analyses are possible, with more complex dynamic semantics, modelled, for example after treatments of so-called specific indefinites in language; see Farkas (2002). More generally, our developments are compatible with more complex architectures for dynamic semantics. But the formalism we provide is already sufficient for interpreting the key examples we consider here.
semantics where context is represented as a partial variable assignment function (van Eijck & Kamp 1997). As a discourse is interpreted the model remains fixed but the input assignment function changes into a different output one. The basic operations are to test that the input context satisfies certain conditions (with respect to the model), to extend the input variable assignment function by defining a value for a new variable and to sequence two actions together, thereby composing their individual effects on their input contexts. These primitives are already used to model the anaphoric dependencies across sentence boundaries; here we will use them to model anaphoric dependencies between spoken utterances and gesture and between sequences of gestures.6 To distinguish between a set of prominent entities introduced explicitly in speech and a set of background entities that is depicted only in gesture, we take our cue from Bittner's (2001) formalization of centring morphology as distinguishing foreground and background entities. This involves splitting the context into two assignment functions $\langle f, g \rangle$ to distinguish referents of different status. For us, the first function $f$ records the entities that can be used to interpret pronouns and other anaphors in speech; the second one $g$ records the entities that are the basis for referential depiction in gesture. To ensure that linguistic indefinites can license depiction in gesture (e.g. the indefinite springs in (2) licenses the depiction of a prototypical spring in gesture), existential quantifiers in the logic will trigger an update to both $f$ and $g$. Meanwhile, to delimit the scope of gesture, we introduce an operator $[G]$ over formulae, which semantically restricts the context-updating operations that happen within its scope to only the second function $g$ (see Section 3.4 for details). $[G]$ is extensional, not modal.
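The behaviour of the split context and the $[G]$ operator might be rendered with update clauses along the following lines. This is a sketch anticipating Section 3.4, not the official definitions.

```latex
% Sketch of update clauses over split contexts <f, g> (illustrative;
% anticipating Section 3.4). Existential quantification updates both
% functions, so linguistic indefinites license depiction in gesture:
\langle f, g\rangle\, [\![\, \exists x \,]\!]\, \langle f', g'\rangle
  \;\text{iff}\; \exists a:\ f' = f[x \mapsto a]\ \text{and}\ g' = g[x \mapsto a]
% [G] confines updates to the second function, so entities introduced
% only in gesture do not license pronouns in later speech:
\langle f, g\rangle\, [\![\, [G]\phi \,]\!]\, \langle f, g'\rangle
  \;\text{iff}\; \langle f, g\rangle\, [\![\, \phi \,]\!]\, \langle f'', g'\rangle
  \ \text{for some } f''
```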
It allows us to capture the different status of referents introduced in different ways while allowing us to treat gesture and speech within a common logical representation—as we must, given the inferential and scopal dependencies we observed in Section 2. The need for this formal device is independent of the use of rhetorical relations to integrate content from different modalities together. In particular, the
need for a suitable dynamic semantics does not undermine our claim that gesture and speech are rhetorically connected. Modal subordination and grammatically marked focus systems in language also block individuals in one clause from being an antecedent to anaphora in another, even when there is a clear rationale for a rhetorical connection. So the anaphoric constraints across speech and gesture are no more a counterargument to rhetorical connections than modal subordination and focus give counterarguments to using rhetorical relations to model linguistic discourse.
There are several existing frameworks that use rhetorical relations; for example Mann & Thompson (1987) and Hobbs et al. (1993). We will use SDRT (Asher & Lascarides 2003) as our starting point, for three main reasons. First, SDRT fully supports semantic underspecification. This is useful because the meaning of a gesture as revealed by its form is highly underspecified—we can reuse SDRT's existing techniques for resolving underspecified content to pragmatically preferred values in our model of gesture interpretation (see Section 5). Secondly, SDRT acknowledges that ambiguity can persist in a coherent discourse. Its motivation for doing so stems originally from observing that there may be more than one maximally coherent interpretation of linguistic discourse. We have seen that the same is true of gesture, and so a framework where coherence constrains interpretation but does not necessarily resolve it uniquely is essential. Finally, SDRT offers a dynamic semantic interpretation of LFs. We have seen already in Sections 3.1 and 3.2 that dynamic semantics offers an elegant way of modelling both the vagueness in gesture interpretation and constraints on coreference. In SDRT, LFs consist of labels $\pi_1, \pi_2, \ldots$ that each represent a unit of discourse, and a function that associates each label with a formula that represents the unit's interpretation—these formulae can be rhetorical relations between labels. We will treat individual clauses and gestures as units of discourse and so they each receive a label. Rhetorical connections among units of discourse create discourse segments: $\pi_1$ immediately outscopes $\pi_2$ if $\pi_1$'s formula includes $R(\pi, \pi_2)$ or $R(\pi_2, \pi)$ for some $R$ and $\pi$. While a segment may consist of (or outscope) a continuous set of discourse units, this is not necessary; see Asher & Lascarides (2003) for many examples. Gestures likewise structure discourse in flexible but constrained ways.
As we have seen, gestures like those in (10ab) followed by (1) will bear a rhetorical
3.3 Rhetorical relations and discourse structure
relation both to simultaneous speech and to previous gestures. In all cases, however, the outscopes relation over labels in an LF cannot contain cycles and must have a single root—that is the unique segment that is the entire discourse. Each rhetorical relation symbol receives a semantic interpretation that is defined in terms of the semantics of its arguments. For instance, a discourse unit that is formed by connecting together smaller units $\pi_1$ and $\pi_2$ with a veridical rhetorical relation $R$ entails the content of the smaller units, as interpreted in dynamic succession, and goes on to add a set of conditions $\phi_{R(\pi_1,\pi_2)}$ that encode the particular illocutionary effects of $R$. For example, Explanation$(\pi_1, \pi_2)$ transforms an input context $C$ into an output one $C'$ only if $K_{\pi_1} \wedge K_{\pi_2} \wedge \phi_{Explanation(\pi_1,\pi_2)}$ also does this, where $\wedge$ is dynamic conjunction and $K_{\pi_1}$ and $K_{\pi_2}$ are the contents of labels $\pi_1$ and $\pi_2$, respectively. The formula $\phi_{Explanation(\pi_1,\pi_2)}$ is a test on its input context. Meaning postulates constrain its interpretation: for example $K_{\pi_2}$ must be an answer to the question Why $K_{\pi_1}$? The formalization of the three new rhetorical relations is straightforward in this setting; see Section 3.4. The content of the entire discourse is then interpreted in a compositional manner, by recursively unpacking the truth conditions of the formula that is associated with the unique root label. This natural extension of the formal tools for describing discourse coherence fits what we see as the fundamental commonality in mechanisms for representing and establishing coherence across all modalities. The structure induced by the labels and their rhetorical connections imposes constraints and preferences over interpretations: in other words, discourse structure guides the resolution of ambiguity and semantic underspecification that is induced by form.
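Schematically, this veridicality condition can be written as follows. The Narration postulate below is our illustration, with $e_{\pi}$ abbreviating the eventuality described by $\pi$ and $\prec$ marking temporal precedence; neither abbreviation is notation from the paper.

```latex
% Veridical rhetorical relations: schematic dynamic clause (restating
% the condition in the text) plus one illustrative meaning postulate.
C\, [\![\, R(\pi_1, \pi_2) \,]\!]\, C'
  \;\text{iff}\;
  C\, [\![\, K_{\pi_1} \wedge K_{\pi_2} \wedge \phi_{R(\pi_1, \pi_2)} \,]\!]\, C'
% e.g. for Narration, which describes one eventuality and then another
% in contingent succession (e_\pi and \prec are our abbreviations):
\phi_{Narration(\pi_1, \pi_2)} \Rightarrow e_{\pi_1} \prec e_{\pi_2}
```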
For example, SDRT gives a characterization of attachment that limits the rhetorical links interpreters should consider for a new discourse unit, based on the pattern of links in preceding discourse. Rhetorical connections also restrict the availability of referents as antecedents to anaphora. Both of these ingredients of SDRT carry over to embodied discourse (Lascarides & Stone forthcoming). Anticipating the arguments of Section 4, we assume a distinction between identifying gestures, which simply demonstrate objects, and more general visualizing gestures, which depict some aspect of the world. Interpreting a gesture may involve resolving an ambiguity in its form, as to whether it is identifying or visualizing. Identifying gestures are interpreted in construction with a suitable linguistic constituent. (We remain agnostic about the temporal and structural constraints between speech and gesture that may apply here.) The joint
416 A Formal Semantic Analysis of Gesture
7 Parallel, Contrast and discourse subordination relax this right-frontier constraint, but we ignore this here.
interpretation introduces an underspecified predicate symbol—call it id_rel—that relates the referent of the identifying gesture to the semantic index of the corresponding linguistic unit. Constructing the discourse's LF then involves resolving the underspecified relation id_rel to a specific value. The complex demonstrative in (12) represents such a case. The speaker's identifying gesture refers to the front-most wedge on the key. That referent exemplifies the set denoted by the demonstrative NP these things; so in this case id_rel resolves to exemplifies. Discourse structure and commonsense reasoning guide this process of resolution. Similarly, visualizing gestures are also interpreted in construction with a suitable linguistic constituent (often a clause). The joint interpretation introduces an underspecified rhetorical connection vis_rel(ps, pg) between the spoken part ps and the gesture part pg. Thus, for a visualizing gesture, constructing the LF of the discourse involves achieving (at least) four logically codependent tasks, all designed to make the underspecified LF derived from the gesture's form more specific. First, one resolves the underspecified content of the gesture to specific values. Second, the underspecified rhetorical connection vis_rel is resolved to one of a set of constrained values (e.g. Narration is OK, Disjunction is not). Third, one identifies a label for this rhetorical connection: if it is a new label, which in turn attaches to some label in the context, then ps and pg start a new discourse segment; otherwise it is an existing label, and ps and pg continue that existing segment. And finally, one computes whether pg and/or ps are also rhetorically connected to other labels. Discourse structure imposes constraints on all of these decisions.
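To fix ideas, the codependent resolution tasks just listed can be caricatured as filtering over joint candidate resolutions. This is our own illustrative sketch, not part of SDRT; all names in it (VIS_REL_VALUES, resolve_visualizing_gesture, the toy candidate values) are hypothetical.

```python
# Our own illustrative sketch (not part of SDRT): three of the codependent
# tasks for a visualizing gesture, cast as enumeration over joint candidates.
from itertools import product

# Task 2: vis_rel resolves only to a constrained set of relations
# (Narration is OK, Disjunction is not).
VIS_REL_VALUES = ("Narration", "Elaboration", "Depiction", "Overlay")

def resolve_visualizing_gesture(content_candidates, attachment_labels):
    """Enumerate joint resolutions: a specific gesture content, a value for
    the underspecified relation vis_rel, and a label to attach to."""
    return [
        {"content": c, "vis_rel": r, "attach_to": a}
        for c, r, a in product(content_candidates, VIS_REL_VALUES, attachment_labels)
    ]

# One candidate content, two candidate attachment points.
solutions = resolve_visualizing_gesture(["depicts-road"], ["p_s", "p_new"])
assert len(solutions) == 8
assert all(s["vis_rel"] != "Disjunction" for s in solutions)
```

A real resolver would of course score these joint candidates against discourse structure and commonsense knowledge rather than enumerate them exhaustively.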
In SDRT, the available labels in the context for connections to new ones are either (i) the last label pl that was added or (ii) any label that dominates pl via a sequence of the outscopes relation and/or subordinating relations (e.g. Elaboration, Explanation, Background are subordinating, while Narration is not). This corresponds to the right frontier when the discourse structure is diagrammed as a graph with outscoping and subordinating relations depicted with downward arcs.7 However, extending SDRT to handle gesture introduces a complication, because the last label is not unique: while a linguistic discourse imposes a linear order on its minimal discourse units (one must say or write one clause at a time), this linear order breaks down when one gestures and speaks at the same time. As the most conservative possible working hypothesis, we
assume that new attachments remain limited to the right frontier, the only difference being that instead of one last label, there are two: the label p_l^s for the last minimal spoken unit, and the label p_l^g for its synchronous gesture (if there was one). Since there are two last labels, there are now two right frontiers. The 'spoken' right frontier P_s is the set of all labels that dominate p_l^s via outscopes and subordinating relations, and the 'gesture' frontier P_g is the set of all labels that dominate p_l^g. Thus, the available labels are P_s ∪ P_g (note that P_s ∩ P_g is nonempty and contains at least the root p0). In other words, when an utterance attaches to its context, the dependencies of its speech and gesture are satisfied through the connection to the discourse as a whole, to one another, or to the continued organization of communication in their respective modalities. Given this definition of attachment, antecedents to anaphora can be controlled as in Asher & Lascarides (2003)—roughly, the antecedent is in the same discourse unit or in one that is rhetorically connected to it. This definition combines with the dynamic semantics from Section 3.2 to make precise predictions about how logical structure and discourse structure constrain coreference. For instance, the coreference across gestures that we observed in the extended discourse (10a–b) followed by (1) satisfies these constraints—each spoken clause is connected to its prior one with the subordinating relation Background, placing all the spoken labels on the 'spoken' right frontier; and each gesture is connected to the prior one with the subordinating relation Overlay (this entails that space is used in the same way in the gestures), placing them on the 'gesture' right frontier. So all labels remain available. Each gesture also connects to its synchronous clause with Depiction—a coordinating relation, because the content of one argument is not more fine-grained or 'backgrounded' relative to the other.
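The two-frontier notion of availability can be sketched computationally. This is our own toy rendering (the function names frontier and available are ours), under the simplifying assumption that dominance is given directly as (dominator, dominated) pairs.

```python
# A toy sketch (ours, not the paper's) of availability with two last labels:
# a label is available iff it dominates the last spoken label or the last
# gesture label via outscopes and/or subordinating relations.

def frontier(last_label, dominance_edges):
    """All labels dominating `last_label` (plus itself); edges are
    (dominator, dominated) pairs from outscopes or subordinating relations."""
    front, stack = {last_label}, [last_label]
    while stack:
        node = stack.pop()
        for parent, child in dominance_edges:
            if child == node and parent not in front:
                front.add(parent)
                stack.append(parent)
    return front

def available(last_spoken, last_gesture, dominance_edges):
    # P_s ∪ P_g: the union of the 'spoken' and 'gesture' right frontiers.
    return frontier(last_spoken, dominance_edges) | frontier(last_gesture, dominance_edges)

# Toy discourse: root p0 outscopes p1 (speech) and p2 (gesture), and
# Background(p1, p3) subordinates p3 to p1; the last labels are p3 and p2.
edges = {("p0", "p1"), ("p0", "p2"), ("p1", "p3")}
assert available("p3", "p2", edges) == {"p0", "p1", "p2", "p3"}
assert frontier("p3", edges) & frontier("p2", edges) == {"p0"}  # P_s ∩ P_g holds the root
```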
Nevertheless, all labels are on the right frontier because of the Background and Overlay connections. Conversely, availability constrains the interpretation of the gestures in (15), which is taken from the same dialogue as (10) and (1), in ways that match intuitions. To see this, we discuss the interpretation of each multimodal action in turn. Intuitively, the gesture in (15a) is related to its synchronous clause with Depiction, since it demonstrates the direction to turn into. Example (15b) attaches to the spoken and gesture labels of (15a) with the subordinating relation Acknowledgement—thus A's utterance (15c) can connect to (15a). And indeed it does: the speech unit in (15c) is related to that in (15a) with the coordinating relation Narration—so it is interpreted as a command to keep walking after turning right. Furthermore, the gesture connects to
(15a)'s gesture with Overlay, so that it conveys keep walking rightward from the position where you turned.
(15) a. A: So then, once you get to that parking lot you turn [right]. When A says the word right, her right hand is held in a flat open shape (ASL-5) with the palm facing forward and fingers pointing right, and the hand is held far out to the right of her torso.
b. B: Right c. A: And you [keep walking]. When A says keep walking, she repeats the gesture from (15a).
d. A: And then there's like a [road]. When A says road, both her hands are brought back towards the centre of her torso, with flat open hand shapes (ASL-5) and palms held vertically and facing each other, with fingers facing forwards.
e. B: U-huh
f. A: [It will just kinda like consolidate, you know, like come into a road]. A's hands are in ASL-5, and they start with the palms facing each other at an angle (as if the hands form two sides of an equilateral triangle), very close to her central torso. They then sweep out towards B, and the hands go from being at an angle to being parallel to each other.
g. A: [Just stay on the road and then walk for a little bit]. A's right hand starts at the centre of her torso and sweeps out to the right.
h. A: [There are buildings over here]. A's right hand goes to her right, to the same place where her hand was in (15a)'s gesture. Her hand is in a loose claw shape with the palm facing down.
But (15d) marks a change in how the physical location of the hands maps to the locations of landmarks on the university campus. The discourse cue phrase and then in (15d) implicates that the linguistic unit attaches to the prior one with Narration, and its gesture attaches to the content of this clause with Depiction, capturing the intuition that it depicts the road—the object introduced in the clause. But crucially, the physical location of this depiction bears no relation to the spatial setup given by the prior two gestures: technically, it does not attach to (15c) with Overlay, reflecting the fact that the mapping v′ from physical space to virtual space that is a part of its interpretation is different from the mapping v that was used earlier. To put this another way, the road demonstrated in (15d) is not to the left of the walking path demonstrated in (15c), even though the hands in (15d) are to the left of where they were in (15c). These rhetorical connections mean that the gestures in (15a–c) are no longer on the right frontier. And thus, according to our model, the mapping v that is evoked by (15a–c) is no longer available for interpreting subsequent gestures (while the mapping v′ that is used in (15d) is available). Interestingly, this prediction matches our intuitions about how the gestures in (15f–g) are interpreted. In particular, even though the gesture in (15g) places the right hand in the same place as it was in (15a), it does not demonstrate that the buildings it denotes are co-located with the place where the agent is to turn right (i.e. at the parking lot). The right-frontier constraint likewise predicts that the clause in (15g) cannot connect to the clause in (15a); it cannot be interpreted in the same way as the discourse 'So then, once you get to the parking lot you turn right; there are buildings over here'.
3.4 Summary: formalism and examples

We now complete our presentation of the LF for embodied discourse by giving formally precise definitions. We start with the syntax of the language Lsdrs for expressing LFs, which is based on that of SDRT. It is extended to include spatial expressions (see Section 3.1) and the two last labels (see Section 3.3). The dynamic semantics of Lsdrs is similar to that in Asher & Lascarides (2003), except that a context of evaluation is refined to include two partial variable assignment functions rather than one; these track salient entities for interpreting language and gesture, respectively (see Section 3.2). We close with worked examples to illustrate the formalism.

Definition 1 Vocabulary and Terms
The following vocabulary provides the syntactic atoms of the language Lsdrs:
A set P of predicate symbols (P1, P2, …), each with a specified sort giving its arity and the type of term in each argument position;
A set R of (two-place) rhetorical relation symbols over labels (e.g. Contrast, Explanation, Narration, Overlay, …);
Individual variables (x1, x2, y, z, …), eventuality variables (e1, e2, …) and constants for spatio-temporal regions (p⃗1, p⃗2, …);
Variables over mappings between spatio-temporal regions (v1, v2, …);
The boolean operators (¬, ∧), the operator [G] and the quantifiers ∀ and ∃;
Labels p1, p2, ….

We also define terms from this vocabulary, each with a corresponding sort. Individual variables are individual terms; eventuality variables are eventuality terms; and if p⃗ is a constant for a spatio-temporal region and v is a variable over mappings, then p⃗ and v(p⃗) are place terms.

An LF for discourse is a Segmented Discourse Representation Structure (SDRS). This is constructed from SDRS-formulae in Lsdrs, as defined in Definition 2:
Definition 2 SDRS-Formulae
The definition of the SDRS-formulae Lsdrs starts with a definition of a subset Lbase ⊆ Lsdrs of SDRS-formulae that feature no rhetorical relations:
1. If P ∈ P is an n-place predicate and i1, i2, …, in are terms of the appropriate sort for P, then P(i1, …, in) ∈ Lbase.
2. If φ, ψ ∈ Lbase and u is a variable, then ∃uφ, ∀uφ ∈ Lbase.
3. If R is a rhetorical relation and p1 and p2 are labels, then R(p1, p2) ∈ Lsdrs.
4. If φ, ψ ∈ Lsdrs, then φ ∧ ψ, ¬φ, [G]φ ∈ Lsdrs.

An SDRS is a set of labels (up to two of which are designated last), and a set of SDRS-formulae associated with each label:

Definition 3 SDRS
An SDRS is a triple ⟨A, F, last⟩, where:
A is a set of labels;
F is a mapping from A to Lsdrs; and
last is a set containing at most two labels {ps, pg} ⊆ A, where ps labels the content of a token linguistic unit and pg the content of a token gesture (intuitively, this is the last multimodal act, and last will contain no gesture label if the act had no gesture).

We say that p immediately outscopes p′ iff F(p) contains p′ as a literal. Its transitive closure, outscopes, must be a well-founded partial order with a unique root (i.e. there is a unique p0 ∈ A such that ∀p ∈ A, p0 outscopes p). The unique root makes an SDRS the LF of a single discourse: the segment that the root label corresponds to is the entire discourse. The outscopes relation need not form a tree, reflecting the fact that a single communicative act can play multiple illocutionary roles in its context (see Section 3.3). When there is no confusion, we may omit last
from the specification of an SDRS, writing it ⟨A, F⟩. We may also write F(p) = φ as p: φ. And we will continue occasionally to use K_p as notation for the content F(p). An example SDRS is shown in (9′); this represents one of the plausible interpretations of (9) (see Section 2)—the gesture depicts the subconscious nature of the processes that sustain low-level phonological errors:
(9)
So there are these very low level phonological errors that tend to not get reported. The hand is in a fist with the thumb to the side (ASL A) and moves iteratively in the sagittal plane in clockwise circles (as viewed from left), below the mouth.

(9′) p1: ∃y(low-level(y) ∧ phonological(y) ∧ errors(y) ∧ go-unreported(e, y))
p2: [G]∃x(continuous(x) ∧ below-awareness(x) ∧ process(x) ∧ sustain(e′, x, y))
p0: Explanation(p1, p2)

We will shortly discuss the much more incomplete representation of meaning that is revealed by (9)'s form, and how commonsense reasoning uses that together with contextual information to construct the SDRS (9′). But first, we give details of the model theory of SDRSs, ensuring in particular that the dynamic semantics of (9′) is as intended.

Definition 4 Model
A model is a tuple ⟨D, L, T, I⟩ where:
D consists of eventualities (D_E) and individuals (D_I);
L ⊆ ℝ⁴ is a spatio-temporal locality;
T is a set of constrained mappings from L to L (i.e. they can expand, contract and rotate space, but not invert it);
I is an interpretation function that maps non-logical constants from Lbase to denotations of the appropriate type (e.g. I(p⃗) ⊆ L).

Note that I does not assign denotations to rhetorical relations; we will return to them shortly. But the semantics of any SDRS-formula φ relative to a model M will specify a context-change potential that characterizes exactly when φ relates an input context to an output one. A context is a pair of partial variable assignment functions ⟨f, g⟩ (see Section 3.2 for motivation); these define values for individual variables (f(x) ∈ D_I), eventuality variables (f(e) ∈ D_E) and spatial mappings (f(v) ∈ T).
As is usual in dynamic semantics, all atomic formulae and ¬φ are tests on the input context. The existential quantifier ∃x extends the input functions ⟨f, g⟩ to be defined for x, and dynamic conjunction is composition. Hence, ∃xφ is equivalent to ∃x ∧ φ. The operator [G] for gesture ensures that all formulae in its scope act as tests or updates only on the function g in the input context ⟨f, g⟩, but leave f unchanged. This means that the denotations for each occurrence of x in ([G]∃xP(x)) ∧ ([G]Q(x)) are identical, but they do not corefer in ([G]∃xP(x)) ∧ Q(x). This matches intuitions about coreference in discourse (across gestures v. from gesture to subsequent speech, respectively) that we discussed in Section 3.2.
Definition 5 Semantics of SDRS-formulae without rhetorical relations
1. Where i is a constant term, ⟨f, g⟩⟦i⟧_M = I(i).
2. Where i is a variable, ⟨f, g⟩⟦i⟧_M = f(i).
3. Where v(p⃗) is a spatial term, ⟨f, g⟩⟦v(p⃗)⟧_M = f(v)(I(p⃗)).
4. For a formula Pⁿ(i1, …, in): ⟨f, g⟩⟦Pⁿ(i1, …, in)⟧_M ⟨f′, g′⟩ iff ⟨f, g⟩ = ⟨f′, g′⟩ and ⟨⟨f, g⟩⟦i1⟧_M, …, ⟨f, g⟩⟦in⟧_M⟩ ∈ I(Pⁿ).
5. ⟨f, g⟩⟦∃x⟧_M ⟨f′, g′⟩ iff:
(a) dom(f′) = dom(f) ∪ {x} and ∀y ∈ dom(f), f′(y) = f(y) (i.e. f ⊆_x f′);
(b) dom(g′) = dom(g) ∪ {x} and ∀y ∈ dom(g), g′(y) = g(y) (i.e. g ⊆_x g′);
(c) f′(x) = g′(x).
6. ⟨f, g⟩⟦φ ∧ ψ⟧_M ⟨f′, g′⟩ iff ⟨f, g⟩⟦φ⟧_M ∘ ⟦ψ⟧_M ⟨f′, g′⟩.
7. ⟨f, g⟩⟦¬φ⟧_M ⟨f′, g′⟩ iff ⟨f, g⟩ = ⟨f′, g′⟩ and there is no ⟨f″, g″⟩ such that ⟨f, g⟩⟦φ⟧_M ⟨f″, g″⟩.
8. ⟨f, g⟩⟦[G](φ)⟧_M ⟨f′, g′⟩ iff f = f′ and there is a g″ such that ⟨g, g⟩⟦φ⟧_M ⟨g″, g′⟩.

Finally, we address the semantics of rhetorical relations. Unlike the predicate symbols in P, these do not impose tests on the input context. As speech acts, they change the context just like actions generally do. We emphasize veridical relations:

Definition 6 Semantic Schema for Rhetorical Relations
Let R be a veridical rhetorical relation (i.e. Narration, Background, Elaboration, Explanation, Contrast, Parallel, Depiction, Overlay, Replication). Then:
⟨f, g⟩⟦R(p1, p2)⟧_M ⟨f′, g′⟩ iff ⟨f, g⟩⟦K_p1 ∧ K_p2 ∧ φ_R(p1,p2)⟧_M ⟨f′, g′⟩
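The schema in Definition 6 can be sketched in the same style: a veridical relation denotes the dynamic composition of its arguments' contents with an illocutionary test. This is our own toy rendering; contexts are flattened to single dicts and the Narration test is a crude stand-in for the overlap postulate given below.

```python
# A sketch (ours) of Definition 6: R(p1, p2) relates ctx to ctx' iff
# K_p1 ; K_p2 ; phi_R does. Updates map a context to a list of outputs;
# contexts are flattened to a single dict for brevity.

def compose(*updates):
    def run(ctx):
        ctxs = [ctx]
        for u in updates:
            ctxs = [c2 for c1 in ctxs for c2 in u(c1)]
        return ctxs
    return run

def veridical(K1, K2, phi_R):
    """Semantic schema for a veridical rhetorical relation R(p1, p2)."""
    return compose(K1, K2, phi_R)

# Toy contents, and a stand-in for the Narration postulate: the poststate of
# the first event must meet the prestate of the second.
K1 = lambda ctx: [{**ctx, "e1_poststate": 3}]
K2 = lambda ctx: [{**ctx, "e2_prestate": 3}]
phi_narration = lambda ctx: [ctx] if ctx["e1_poststate"] == ctx["e2_prestate"] else []

narration = veridical(K1, K2, phi_narration)
assert narration({}) == [{"e1_poststate": 3, "e2_prestate": 3}]
```

Because phi_narration is a test, the relation succeeds exactly when the contents of both arguments, interpreted in succession, satisfy it.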
So, for instance, representing (16) with p0: Narration(p1, p2), where p1 and p2 label the contents of the clauses, ensures that its dynamic interpretation matches intuitions: John went out the door, and then from the other side of the door he turned right.
(16) p1. John went out the door.
p2. He turned right.
It is important to stress, however, that these interpretations of rhetorical relations are defined only with respect to complete interpretations: K_p1 and K_p2 must be SDRS-formulae, with all underspecified aspects of content that are revealed by form fully resolved. Accordingly, the type of speech act that is performed is a property of a contextually resolved interpretation of an utterance (or a gesture) rather than a property of its form. This contrasts with linguistic discourse, where it is possible to align certain linguistic forms with certain types of speech acts: for example, indicatives tend to be assertions while interrogatives tend to be questions. Such alignments are not possible with gesture, and our theory reflects this: the form of a gesture on its own is insufficient for inferring anything about its illocutionary effects; it is only when it is combined with context that clues about the speech act are revealed. In short, the syntax and model theory of Lsdrs are designed only for representing the pragmatically preferred interpretations. And when there is more than one pragmatically plausible interpretation, there is more than one LF expressed in Lsdrs. We will examine shortly how these pragmatic interpretations are inferred from form and context.
8 In fact, this axiom is stated here in simplified form; for details, see Asher & Lascarides (2003).
In words, R(p1, p2) transforms an input context ⟨f, g⟩ into an output one ⟨f′, g′⟩ if and only if the contents K_p1 followed by K_p2 followed by some particular illocutionary effects φ_R(p1,p2) also do this. Meaning postulates then impose constraints on the illocutionary effects φ_R(p1,p2) for various relations R. For instance, the meaning postulate for φ_Narration(p1,p2) stipulates that individuals are in the same spatio-temporal location at the end of the first described event e_p1 as they are at the start of the second described event e_p2, and so e_p1 temporally precedes e_p2 (we assume prestate and poststate are functions that map an eventuality to the spatio-temporal regions in L of its prestate and poststate, respectively):8
Meaning Postulate for Narration
φ_Narration(p1,p2) ⇒ overlap(poststate(e_p1), prestate(e_p2))
Rhetorical relations that are already a part of SDRT—like Explanation—can now relate contents of gesture. We also argued earlier for three relations whose arguments are restricted to gesture: Depiction, Overlay and Replication. These are all veridical relations, and the meaning postulates that define their illocutionary effects match the informal definitions given earlier. For instance, Depiction(p1, p2) holds only if p1 labels the content of a spoken unit, p2 labels the content of a gesture, and K_p1 and K_p2 are non-monotonically equivalent; we omit the formal details because they require a modal model theory for Lsdrs, as described in Asher & Lascarides (2003). We assume that a discourse unit labels speech only if all the minimal units outscoped by it label speech; similarly for gesture. Overlay(p1, p2) holds only if p1 and p2 are gestures, and K_p2 continues to develop the same virtual space as K_p1: in other words, K_p1 and K_p2 entail contingent formulae containing the same mapping v. Finally, φ_Replication(p1,p2) holds only if p1 and p2 are gestures, and they depict common entities in the same way. More formally, there is a partial isomorphic mapping l from the constructors in K_p1 to those in K_p2 such that for all constructors c from K_p1, c and l(c) are semantically similar. We forgo a formal definition of semantic similarity here. Definition 7 now formalizes the interpretation of an SDRS:

Definition 7 The Dynamic Interpretation of an SDRS
Let S = ⟨A, F, last⟩ be an SDRS, and let p0 ∈ A be its unique root. Then:
⟨f, g⟩⟦S⟧_M ⟨f′, g′⟩ iff ⟨f, g⟩⟦F(p0)⟧_M ⟨f′, g′⟩
If the SDRS features only veridical rhetorical relations, then it transforms an input context into an output one only if the contents of each clause and gesture also do this.

As discussed in Section 3.3, we minimize the changes to SDRT's original constraints on the parts of a discourse context to which new material can be rhetorically connected—the so-called notion of availability. In other words, the available labels in embodied discourse are those on the right frontier of at least one last label:

Definition 8 Availability for Multimodal Discourse
Let S = ⟨A, F, last⟩ be an SDRS for multimodal discourse (and so by Definition 3, last is a non-empty set of at most two labels). Where p, p′ ∈ A, we say that p > p′ iff either p immediately outscopes p′ or there is a label p″ ∈ A such that F(p″) contains the literal R(p, p′) for some subordinating rhetorical relation R (e.g. Elaboration or Explanation but not Narration). Let >* be the transitive closure of >. Then p ∈ A is available in S iff p >* l, where l ∈ last.

In keeping with our strategy for minimizing changes that are engendered by gesture, the constraints on anaphora match exactly SDRT's constraints for purely linguistic discourse:

Definition 9 Antecedents to Anaphora
Suppose that K_β contains an anaphoric condition u. Then the available antecedents to u are terms that are:
1. in K_β and DRS-accessible to u (as defined in Kamp & Reyle 1993); or
2. in K_α, DRS-accessible to any sub-formula in K_α, and there is a formula R(α, γ) in the SDRS such that γ ≥ β.

In other words, an antecedent must be in the same discourse unit as the anaphor, or accessible in a distinct unit that is rhetorically connected to a unit that contains the anaphor. To illustrate this formalism, we give the precise semantics for some key discourses that guided its development. We start with the discourse (5); its SDRS is shown in (5′)—with some simplification, since we have ignored tense and presuppositions:

(5) a. p1: and took his hatchet
p2: Speaker's right hand grasps left hand, with wrists bent.
p3: Speaker lifts poised hands above right shoulder.
b. p4: and with a mighty sweep
p5: Hands, held above right shoulder, move back then forward slightly.
c. p6: SLICED the wolf's stomach open
p7: Arms swing in time with sliced; then are held horizontally at the left.

(5′) ⟨{p0, p, p1, p2, p3, p4, p5, p6, p7}, F, {p6, p7}⟩, where F is as follows:
p1: ∃hw[took(e1, w, h) ∧ hatchet(h)]
p2: [G](∃lra[left-hand(l, w) ∧ right-hand(r, w) ∧ handle(h, a) ∧ grab(e2, w, a) ∧ instrument(e2, l) ∧ instrument(e2, r)])
p3: [G](∃d[right-shoulder(d, w) ∧ lift(e3, w, h) ∧ goal_location(e3, d) ∧ instrument(e3, l) ∧ instrument(e3, r)])
p4: ∃s[sweep(s) ∧ mighty(s) ∧ with(e4, s)]
p5: [G](coiled-backswing(e5, w) ∧ instrument(e5, l) ∧ instrument(e5, r))
p6: ∃ft[slice-open(e4, w, t) ∧ stomach(t, f) ∧ wolf(f)]
p7: [G][slice-open(e4, w, t)]
p0: Elaboration(p1, p) ∧ Narration(p1, p4) ∧ Explanation(p4, p5) ∧ Replication(p, p5) ∧ Narration(p, p5) ∧ Background(p4, p6) ∧ Depiction(p6, p7) ∧ Replication(p5, p7) ∧ Narration(p5, p7)
p: Replication(p2, p3)
This LF is built from the underspecified content revealed by linguistic and gestural form via complex pragmatic inference. In fact, resolving the underspecified content and identifying the rhetorical connections are logically codependent. We will discuss such inference in Section 5. Here, we simply explain (5′)'s dynamic semantic interpretation. First, observe the referential connection between the initial grasping gesture and the clause in (5a): the woodsman w and the hatchet h are bound by quantifiers in the content F(p1) of the linguistic component of (5a). But the dynamic semantics of Elaboration(p1, p) (and the value of F(p), which outscopes the content F(p2) of the gesture) then ensures that the functions f and g in the input context ⟨f, g⟩ for interpreting the gesture F(p2) assign the same values to w and h as are used to satisfy the body of the formula F(p1)—the speech and gesture are about the same woodsman and hatchet. Furthermore, by Definition 9, w and h are available antecedents for the bridging references to the woodsman's hands l and r and the handle a of the hatchet that form part of the content F(p2) of the gesture. Similarly, the continued references to these individuals throughout the rest of the gestures are licensed by the sequence of Replication relations connecting p2 to p3 and then to p5 and finally to p7 (and these rhetorical connections are licensed by Definition 8). These connections also entail that all the gestures depict the same mimicry—here, the woodsman's embodied actions in attacking the wolf with the hatchet. The fragments of speech also rhetorically connect together: the first clause p1 describes a first event; the next adjunct p4 continues the narrative, indicating that the sweep immediately follows the taking described in p1; and the Background relation between p4 and p6 entails that the sweep is part of the action that accomplishes the slicing.
Finally, we have an additional layer of rhetorical connections that describe the interaction of gesture and speech. We assume that the two gestures in p2 and p3 show how the woodsman takes his hatchet: by grabbing the handle with his hands and hoisting it over his right shoulder. Then, we assume that the coiled backswing demonstrated in gesture p5 shows how the woodsman is able to deliver such a mighty swing—so this
gesture serves as an Explanation of the synchronous speech. The final ‘slicing’ gesture of p7 is a direct Depiction of the event described in the utterance segment p6 that accompanies it, and the Narration connection to the gesture p5 entails that the slicing happens after the coiled backswing. Again, Definition 8 makes this rhetorical structure possible. Now consider an example with identifying spatial reference: (12) [These things] push up the pins. The speaker points closely at the front-most wedge of the line of jagged wedges that runs along the top of a key as it enters the cylinder of a lock.
The construction rules for multimodal utterances introduce an underspecified anaphoric condition for these things, and an underspecified condition id_rel relating the denotation of these things to the referent of the synchronous identifying gesture. Resolving id_rel to exemplifies, identifying the denotation of these things with the set of wedges s on the top surface of the key, identifying the demonstrated object w as the front-most wedge, and identifying the gesture as directly demonstrating a copresent object in real space (so that the spatial mapping is v_I) are all logically codependent tasks.

(12′) p0: ∃sp(things(s) ∧ pins(p) ∧ push_up(e, s, p)) ∧ [G]∃w(exemplifies(w, s) ∧ loc(e, w, v_I(p⃗_w)))

The individual w that is referenced in the identifying gesture but not in the synchronous speech is outscoped by [G]; via Definition 5, this predicts the anomaly of continuing (12) with, for example, ??It has the right height.

The LF (14′) of example (14) formalizes the metaphorical use of spatial reference. We divide the speech into two segments: p1 labels 'we have one ball', and p2 elaborates the speaker's act in providing this information—it is something Susan has already said.

(14) We have this one ball, as you said, Susan. The speaker sits leaning forward, with the right-hand elbow resting on his knee and the right hand held straight ahead, in a loose ASL L gesture (thumb and index finger extended, other fingers curled) pointing at his addressee.

(14′) p1: ∃wb(we(w) ∧ have(e, w, b) ∧ one(b) ∧ ball(b))
p2: ∃us(susan(s) ∧ said(e′, u, s))
p3: [G]classify(e″, u, v_m(p⃗_i))
p: Depiction(p2, p3)
p0: Elaboration*(p1, p)

The gesture p3 offers a metaphorical depiction of 'as you said, Susan': it classifies the speaker's utterance u as associated with the virtual space of Susan's contributions. In fact, given the illocutionary effects of Elaboration*(p1, p) (formal details of which we omit here), it is satisfied only if the content u of what Susan said entails K_p1.

4 UNDERSPECIFIED MEANING FOR GESTURE
The LFs presented in Section 3 capture specific interpretations in context. These result from inference that reconciles the abstract meanings that are revealed by linguistic and gestural forms with overarching constraints on coherent communication and commonsense background knowledge. In this section, we formalize the abstract level of gesture meaning that is revealed by its form. The formalization follows Kendon (2004), Kopp et al. (2004) and McNeill (2005) in locating iconicity and deixis within individual aspects of the form of a gesture. For example, Kendon (2004) finds interpretive generalizations across related gestures with similar handshapes or hand orientations. Kopp et al. (2004) describe image description features that offer an abstract representation of the iconic significance of a wider range of form features. Section 4.1 reviews how the descriptive literature analyses gestures as complexes of form features and gives a formal realization. Section 4.2, meanwhile, formalizes the significance of these form features as constraints on interpretation. We emphasize that principles of gesture meaning such as iconicity cannot be formalized transparently at the level of truth-conditional content. Consider, for example, the interpretive effect of holding the right hand in a fist while performing a gesture. The fist itself might depict a roughly spherical object located in a virtual space. For example, McNeill (1992, example 8.3: 224) offers a case where a speaker narrating the events of a cartoon uses a fist to depict a bowling ball. Alternatively, the fist might mirror the described pose of a character's hand as a fist. Threatening a punch is such a case, as when Jackie Gleason, playing the role of Ralph on The Honeymooners, holds up a fist to his wife and announces 'You're going to the moon!'.
Alex Lascarides and Matthew Stone 429

Finally, the fist can depict a grip on a (perhaps abstract) object, as in the woodsman's grip on the hatchet in (5) or our metaphorical understanding of low-level processes as carrying speech errors with them in (9). Logically, the different cases involve qualitatively different relationships, with different numbers and sorts of participants; so the iconicity shared by all these examples must reside in an abstract description of iconic meaning, rather than any specific iconic content shared by all the cases. Finally, in Section 4.3, we formalize the additional constraints on interpretation that emerge when gesture and speech are used together in synchrony. Our formalism represents these constraints through abstract, underspecified relationships that connect content across modalities. The semantic constraints contributed by iconicity, deixis and synchrony describe LF and provide input to the processes of establishing discourse coherence described in Section 5. The semantic constraints identify a range of alternative possible specific interpretations. Recognizing why the communicative action is coherent and identifying which of the possible specific interpretations are pragmatically preferred are then logically codependent tasks.

4.1 The form of gesture

By the form of gesture, we mean the cognitive organization that underlies interlocutors' generative abilities to produce and recognize gestures in an unbounded array. This definition shows a clear parallel to natural language grammar, and we build on that parallel throughout. But the differences between gesture form and natural language syntax are also important. In particular, gesture seems to lack the arbitrary correspondence between the order of actions and their interpretation as an ensemble, as mediated by a hierarchical formal structure, which is characteristic of natural language syntax (McNeill 1992). Instead, gesture form is at heart multidimensional. A gesture involves various features of performance—the hand shape, the orientations of the palm and finger, the position of the hands relative to the speaker's torso, the paths of the hands and the direction of movement. These form features are interpreted jointly; not through arbitrary 'syntactic' conventions, but through creative reasoning about the principles of iconicity, deixis and coherence. Following Kopp et al. (2004), we represent this multidimensionality by describing the form of each gesture stroke with a feature structure. The feature structure contains a list of attribute–value pairs characterizing the physical makeup of the performance of the gesture. For example, we represent the form of the right-hand gesture identifying Norris Hall in (1) with the feature structure (17).

(17) [ identifying-gesture
       right-hand-shape : loose-asl-5-thumb-open
       right-finger-direction : forward
       right-palm-direction : up-left
       right-location : ~c ]
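For concreteness, a feature structure like (17) can be rendered directly as a small typed record. The following Python sketch is illustrative only (it is not the authors' implementation, and the coordinate value given for right-location is a made-up stand-in for the point ~c):

```python
# Illustrative rendering of a typed gesture feature structure, modelled
# on (17). The type name and attribute values are taken from the text;
# the numeric coordinate is a hypothetical stand-in for ~c in R^4.
from dataclasses import dataclass, field

@dataclass
class FeatureStructure:
    type: str                      # e.g. 'identifying-gesture'
    features: dict = field(default_factory=dict)

norris_hall_gesture = FeatureStructure(
    type='identifying-gesture',
    features={
        'right-hand-shape': 'loose-asl-5-thumb-open',
        'right-finger-direction': 'forward',
        'right-palm-direction': 'up-left',
        'right-location': (0.4, 1.6, 0.3, 2.1),  # hypothetical spatio-temporal coordinate
    })
```

The flat attribute–value list, rather than a recursive tree, reflects the claim above that gesture form is multidimensional rather than hierarchical.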
Here ~c is the spatio-temporal coordinate in R4 of the tip of the right index finger (i.e. up and to the right of the speaker's shoulder) that, together with the values of the other attributes, serves to identify the region ~pn in space that is designated by the gesture. Unlike Kopp et al. (2004), our representations are typed: for example, (17) is typed identifying-gesture. We particularly use this to distinguish between form features that are interpreted in terms of spatial reference, like the feature right-location in (17), and those that are interpreted via iconicity, perhaps like the feature right-hand-shape in (17). Kendon (2004, Ch. 11) observes that hand shape in pointing gestures often serves as an indication of the speaker's communicative goal in demonstrating an object—distinguishing, presenting, orienting, directing attention—through a broadly metaphorical kind of meaning-making.

The organization of gesture is recognized or constructed by our perceptual system. So parsing a non-verbal signal into a contextually appropriate description of its underlying form is a highly complex task where many ambiguities must be resolved—or left open—just as in parsing language. We can see this by recapitulating the ambiguity in form associated with the iterative circling gesture of (9).9 One account of its form is (18):

(18) [ qualitative-characterising-gesture
       right-hand-shape : asl-a
       right-finger-direction : down
       right-palm-direction : left
       right-trajectory : sagittal-circle
       right-movement-direction : {iterative, clockwise}
       right-location : central-right ]

This treats the hand movement in (9) as one stroke, and it abstracts away from the number of circlings and from the exact spatial trajectory followed in each circling of the hand. We might also analyse this gesture as a composition of several identical strokes. Furthermore, the repetition of the movement is captured in (18) via the value iterative; another licenced representation lacks this value but instead makes the trajectory exactly two sagittal circles. Finally, this gesture has a licenced representation whose values are spatio-temporal coordinates as in (17) rather than qualitative as in

9 Note that some of the values are expressed as sets (e.g. the movement direction). This allows us to capture generalizations over clockwise movements on the one hand (iterative or not) and iterative movements on the other (where iterative can represent a finite repetition of movement, as in this case). More generally, if features change during a stroke, we can specify feature values as sequences as well.
(18) (and accordingly it will have a distinct root type); this form, in contrast to (18), yields spatial constants in semantics. Our theory tolerates and indeed welcomes such ambiguities.

Similarly, we regard synchrony as an underlying perceptual judgement about the relationship between gesture and speech. Because we regard form as an aspect of perceptual organization, we do not need to assume that perceived synchrony between speech and gesture necessarily involves strict temporal constraints. In fact, there is no clear answer about the conditions required for a gesture to be perceived as synchronous with a linguistic phrase (Oviatt et al. 1997; Sowa & Wachsmuth 2000; Quek et al. 2002). Interlocutors' judgements are influenced by the relative time of performance of the gesture to speech, the type of syntactic constituent of the linguistic phrase (and possibly the type of gesture), prosody and perhaps other factors. We remain neutral about these details.

4.2 The meaning of gesture

We formalize gesture meaning using the technique of underspecification from computational semantics (Egg et al. 2001). With underspecification, knowledge of meaning determines a partial description of the LF of an utterance. This partial description is expressed in a language Lulf distinct from the language Lsdrs of LFs. Each model M for Lulf corresponds to a unique formula in Lsdrs, and M satisfies φ ∈ Lulf if and only if φ (partially) describes the unique formula corresponding to M. Semantic underspecification languages are typically able to express partial information about semantic scope, anaphora, ellipsis and lexical sense (see, for example, Koller et al. 2000). For instance, the meaning of a pronoun will stipulate that the LF must include an equality between the discourse referent that interprets the pronoun and some other discourse referent that is present in the LF of the context, but it will not stipulate which contextual discourse referent this is. Thus, the partial description is compatible with several LFs: one for each available discourse referent in the LF of the context. Such partial descriptions are useful for ensuring that the grammar neither under-determines nor over-determines content that is revealed by form.

We illustrate the technical resources of underspecification with a sentence, (19a), whose syntax under-determines semantic scope. In a typical underspecified representation, so-called Minimal Recursion Semantics (Copestake et al. 1999), the description given in (19b) underspecifies semantic scope: (i) each predication is labelled (l1, etc.); (ii) scopal arguments are holes (h1, h2, etc.); (iii) there are scope
constraints (h > l means that h outscopes l's predication) and (iv) the constraints admit two ways of equating holes with labels.

(19) a. Every black cat loved some dog.
b. l1 : _every_q(x, h1, h2)
   l2 : _black_a_1(e1, x)
   l2 : _cat_n_1(x)
   l3 : _loved_v_1(e2, x, y)
   l4 : _some_q(y, h3, h4)
   l5 : _dog_n_1(y)
   h2 > l2, h3 > l5
c. l1 : a1 : _every_q(x), RESTR(a1, h1), BODY(a1, h2)
   l21 : a21 : _black_a_1(e1), ARG1(a21, x1)
   l22 : a22 : _cat_n_1(x2)
   l3 : a3 : _loved_v_1(e2), ARG1(a3, x3), ARG2(a3, y1)
   l4 : a4 : _some_q(y), RESTR(a4, h3), BODY(a4, h4)
   l5 : a5 : _dog_n_1(y2)
   h2 > l2, h3 > l5
   x = x1, x = x2, x = x3, x1 = x2, x2 = x3, y = y1, y = y2, y1 = y2, l21 = l22

Intersective modification is achieved by sharing labels across predications (e.g. l2 in (19b)), meaning that in any fully specific LF, _black_a_1(e1, x) and _cat_n_1(x) are connected with logical conjunction. Observe also the naming convention for the predicate symbols, based on word lemmas, part-of-speech (POS) tags and sense numbers. Our approach to gesture also leverages this ability to regiment the links between form and meaning.

An extension of these ideas is explored in Robust Minimal Recursion Semantics (RMRS; Copestake 2003): RMRS can also underspecify the arity of the predicate symbols and what sorts of arguments they take. Since the iconic meaning of gesture constrains, but does not fully determine, all these aspects of interpretation, we adopt RMRS as the underlying semantic formalism Lulf. RMRS is fully compatible with the language Lsdrs of SDRT (Asher & Lascarides 2003: 122)—indeed, SDRT's existing glue logic supports any description language, and so can construct from the RMRSs of discourse units an SDRS (or a set of them if ambiguities persist) that captures the pragmatic interpretation. [We do not use the specific description language from Asher & Lascarides (2003) here because it does not underspecify arity.] The RMRS corresponding to (19b) is (19c): RMRSs offer a more factorized representation where the base predicates are unary and the
other arguments are represented by separate binary relations on the unique anchor of the relevant predicate symbols (a1, a2, . . .) together with variable and label equalities (e.g. x = x1, l21 = l22). This factored representation allows one to build semantic components for shallow parsers, where lexical or syntactic information that contributes to meaning is absent. An extreme example would be a POS tagger: one can build its semantic component simply by deriving lexical predicate symbols from the word lemmas and their POS tags, as given in (20):

(20) a. Every_AT1 black_JJ cat_NN1 loved_VVD some_DD dog_NN1
b. l1 : a1 : _every_q(x), l21 : a21 : _black_a(e1), l22 : a22 : _cat_n(x2)
   l3 : a3 : _loved_v(e2), l4 : a4 : _some_q(y), l5 : a5 : _dog_n(y2)

Semantic relations, sense numbers and the arity of the predicates are missing from (20b) because the POS tagger does not reveal information about syntactic constituency, word sense or lexical subcategorisation. But the RMRSs (19c) and (20b) are entirely compatible, the former being more specific than the latter. In particular, the model theory of RMRS restricts the possible denotations of _lemma_tag_sense (which are all constructors in the fully specific language Lsdrs) to being a subset of those of _lemma_tag.

To regiment the interpretation of gesture formally in RMRS, we assume that interpretive constraints apply at the level of form features. Each attribute–value element yields an RMRS predication, which must be resolved to a formula in the LF of gesture in context.10 If the element is interpreted by iconicity, we constrain the resolution to respect possibilities for depiction. If the element is interpreted by spatial reference, we interpret it as locating an underspecified individual via an underspecified correspondence to the physical point in space-time that is designated by that feature of the gesture. We treat attributes with set values (e.g. the movement-direction attribute in (18)) like intersective modification in language (e.g. black cat in (19b)). This captures the intuition that the different aspects of the direction of movement in (18) must all depict properties of the same thing in interpretation.

10 This suggests that the form of an iconic gesture is like a bag of words. Kopp et al. (2004) liken it to a bag of morphemes, on the grounds that the resolved interpretations of the features cannot be finitely enumerated. But word senses cannot be enumerated either (Pustejovsky 1995); hence, our analogy with words is also legitimate.
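The scope underspecification illustrated in (19) can be simulated with a toy 'plugging' enumerator: assign each hole a label, keep only tree-formed assignments, and the survivors are the readings. The sketch below is our illustration, not the authors' machinery; it simplifies by plugging the restrictor holes (RESTR in (19c)) directly with the nominal content, where Minimal Recursion Semantics uses the weaker qeq constraint:

```python
# Toy scope resolver for (19b). Each label lists the holes its
# predication introduces; a plugging maps every hole to a label.
from itertools import permutations

holes_of = {'l1': ['h1', 'h2'], 'l2': [], 'l3': [], 'l4': ['h3', 'h4'], 'l5': []}
labels = list(holes_of)
holes = ['h1', 'h2', 'h3', 'h4']
restrictors = {'h1': 'l2', 'h3': 'l5'}  # every's restrictor holds the cat content, some's the dog

def well_formed(plug):
    """A plugging is well formed if the one unplugged label dominates
    every label exactly once (i.e. the result is a tree)."""
    top = (set(labels) - set(plug.values())).pop()
    seen, frontier = set(), [top]
    while frontier:
        lab = frontier.pop()
        if lab in seen:
            return False            # re-entrant: not a tree
        seen.add(lab)
        frontier.extend(plug[h] for h in holes_of[lab])
    return seen == set(labels)

readings = []
for perm in permutations(labels, 4):
    plug = dict(zip(holes, perm))
    if all(plug[h] == l for h, l in restrictors.items()) and well_formed(plug):
        readings.append(plug)
```

Exactly two pluggings survive, one with _every_q on top and one with _some_q on top: the two ways of equating holes with labels mentioned under (19).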
So, more formally, each iconic attribute–value pair introduces an underspecified predication that corresponds directly to it; for instance, the hand shape in (2) introduces the predication (21) to the RMRS for this gesture:

(21) l1 : a1 : right_hand_shape_asl-a(i1)
Figure 3: Possible resolutions for hand_shape_asl-a.
Here, l1 is a unique label that underspecifies the scope of the predication; a1 is the unique anchor that provides the locus for specifying the predicate's arguments; i1 is a unique metavariable that underspecifies the sort of the main argument of the predication (it could be an individual object or an eventuality); and hand_shape_asl-a underspecifies reference to a property that the entity i1 has and that can be depicted through the gesture's fist shape.

We then represent the possible resolutions of the underspecified predicates via a hierarchy of increasingly specific properties, as in Figure 3. The hierarchy of Figure 3 captures the metaphorical contribution of the fist to the depiction of the process in (9), by allowing right_hand_shape_asl-a to depict a holding event, metaphorically interpreted as the event e of a process x sustaining errors y in speech production ('bearing them with it', as it were).

Following Copestake & Briscoe (1995), this treats metaphorical interpretations as a specialization of the underspecified predicate symbol that is produced by form, as opposed to coercion on a specific literal interpretation of a word that in turn contradicts information in the context (see, for example, Hays & Bayer 2001). We have argued elsewhere for treating metaphor in linguistic discourse in this way (Asher & Lascarides 1995) and choose to treat metaphor in gesture in the same way so as to maintain as uniform a pragmatics for speech and gesture as possible.
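A hierarchy of increasingly specific resolutions, in the spirit of Figure 3, can be modelled as a simple subsumption graph. In the sketch below the node names other than sustain, marker_point and top (standing in for the vacuous resolution ⊤) are our own invented stand-ins for the figure's labels:

```python
# Illustrative resolution hierarchy for underspecified iconic predicates,
# modelled on Figure 3. Node names other than sustain, marker_point and
# top are hypothetical.
hierarchy = {
    'hand_shape_asl-a': ['spherical_object', 'fist_pose', 'hold', 'marker_point', 'top'],
    'hold': ['grip', 'sustain'],       # metaphorical specialization of holding
    'hand_shape_asl-5': ['marker_point', 'top'],
}

def can_resolve(pred, target):
    """True if `target` is reachable from `pred` in the hierarchy,
    i.e. `target` is one legitimate specific resolution of `pred`."""
    if pred == target:
        return True
    return any(can_resolve(child, target) for child in hierarchy.get(pred, []))
```

On this sketch the fist can resolve metaphorically to sustain, both the fist and the flat hand share the resolution marker_point, and every predicate can resolve to top, matching the sharing and ⊤-resolution described below.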
At the same time, we can capture the contribution of a fist as depicting something held by resolving right_hand_shape_asl-a accordingly; for example, if the gesture in (9) were to accompany the utterance 'the mouse ran round the wheel', then the underspecified predicate symbol would resolve to a marker_point x indicating a designated location on the mouse's spinning wheel. Finally, all underspecified predications are resolvable to validity (⊤), since any form feature in a gesture may contribute no meaning in context (e.g. the clockwise motion in (9)). We assume that resolving all predications to ⊤ is pragmatically dispreferred compared with logically contingent resolutions (see Section 5). Underspecified predicates may also share certain specific resolutions: for example, marker_point is also one way of resolving the underspecified predicate corresponding to a flat hand, hand_shape_asl-5.

Figure 3 reflects the fact that, like all dimensions of iconic gesture, the fist shape underspecifies how many entities it relates in its specific semantic interpretation. The predicates in Figure 3 vary in the number of arguments they take, and the factorized notation of RMRS lets us express this. For example, sustain is a three-place relation, and so l : a : sustain(e) entails l : a : sustain(e), ARG1(a, x), ARG2(a, y) for some x and y, while marker_point is a one-place property, and therefore l : a : marker_point(x), ARG1(a, y) is unsatisfiable.

Figure 3 represents a special kind of commonsense background knowledge; namely, general possibilities for iconic representation. Technically, the interpretation of hand_shape_asl-a is not defined at all with respect to the dynamic semantics given in Definition 5 because it is not a part of the language Lsdrs of fully specific LFs; rather, the distinct and static model theory for RMRS ensures that hand_shape_asl-a denotes a constructor from Lsdrs, or a combination thereof, that is compatible with the hierarchy in Figure 3.
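The arity reasoning just described can be stated as a one-line satisfiability check. The formulation is our own; the arities of sustain (three-place) and marker_point (one-place) are those given above:

```python
# Arity check for RMRS predications: a predication carries one intrinsic
# argument, and each further ARGn relation on its anchor consumes one
# more argument slot; exceeding the predicate's arity is unsatisfiable.
arity = {'sustain': 3, 'marker_point': 1}

def satisfiable(pred, extra_args):
    """`extra_args` counts the ARGn relations on the predication's anchor,
    beyond the intrinsic argument written inside the parentheses."""
    return 1 + extra_args <= arity[pred]
```

So sustain(e) with ARG1 and ARG2 is fine, while marker_point(x) with an ARG1 is ruled out, as in the text.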
More informally, you can compare predications like hand_shape_asl-a to the image description features of Kopp et al. (2004)—an abstract representation that captures gesture meaning. While some of the leaves in this hierarchy correspond to fully specific interpretations, others represent vague ones. [Following Kopp et al. (2004), we do not believe that the specific interpretations that are licensed by a (unique) underspecified semantic representation can be finitely enumerated.] We envisage that either the speaker and hearer sometimes settle on a coherent but vague interpretation or additional logical axioms will resolve a vague interpretation to a more specific one in the particular discourse context. Let us now consider the compositional semantics of a spatially locating component of gesture meaning. In words, (22) states that x
denotes an individual which is spatially located at the coordinates v(~p), where ~p is the physical location actually designated by the gesture and v is a mapping from this physical point to the space depicted in meaning (with the value of v being resolved through discourse interpretation).

(22) l2 : a2 : sp_ref(e), ARG1(a2, x), ARG2(a2, v(~p))
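The step from a feature structure like (18) to a bundle of predications is mechanical: each attribute–value pair yields one underspecified predication, set values share a label (like intersective modification), and an outscoping constraint is recorded for each. The toy rendering below uses our own naming and anchor scheme, which differs in detail from the RMRSs shown in the text; the location feature, which would instead yield an sp_ref condition as in (22), is not treated:

```python
# Toy translation of iconic attribute-value pairs to underspecified
# RMRS-style predications (our naming scheme, for illustration only).
def rmrs_of(features):
    preds = ['l0 : a0 : [G](h)']   # the scopal [G] operator for the stroke
    constraints = []
    for j, (attr, value) in enumerate(features.items(), start=1):
        values = list(value) if isinstance(value, (set, list, tuple)) else [value]
        for k, v in enumerate(sorted(values), start=1):
            # set-valued attributes share a label, mirroring intersective modification
            anchor = f'a{j}{k}' if len(values) > 1 else f'a{j}'
            preds.append(f'l{j} : {anchor} : {attr}_{v}(i{j})')
        constraints.append(f'h > l{j}')   # [G]'s hole outscopes every predication
    return preds + constraints
```

For example, `rmrs_of({'right_hand_shape': 'asl-a', 'right_move_dir': ('iterative', 'clockwise')})` produces one predication for the hand shape and two label-sharing predications for the movement direction, each outscoped by [G].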
The predicate sp_ref in (22) can resolve to the constructor loc or to the constructor classify in Lsdrs, with its dynamic semantics defined in Section 3.1. The constant ~p in (22) is determined as a complex function of the grounding of gesture form in physical space. For instance, a pointing gesture (with hand shape 1-index) will make ~p denote a cone whose tip is the hand-location coordinate ~c, with the cone expanding out from ~c in the same direction as the value of finger-direction (Kranstedt et al. 2006).

Finally, in Section 3.2 we motivated the introduction of an operator [G] for each stroke, which must outscope the content conveyed by the gesture's form features. This was necessary for constraining coreference. We translate each gesture overall using an instance of this operator, constrained to outscope each predication that is contributed by its form features. Thus, the (underspecified) content arising from the form of the visualizing gesture in (18) is the RMRS (23):

(23) l0 : a0 : [G](h)
     l1 : a1 : right_hand_shape_asl-a(i1),
     l2 : a2 : right_finger_dir_down(i2),
     l3 : a3 : right_palm_dir_left(i3),
     l4 : a4 : right_traj_sagittal_circle(i4),
     l5 : a51 : right_move_dir_iterative(i5),
     l5 : a52 : right_move_dir_clockwise(i5),
     l6 : a6 : right_loc_central-right(i6),
     h > lj, for 1 ≤ j ≤ 6

To handle identifying gestures, we add an overall layer of quantificational structure as in (24), so that we model the gesture as identifying an appropriate entity x.

(24) l0 : [G](h1),
     l1 : a1 : deictic_q(x), RESTR(a1, h2), BODY(a1, h3)
     l2 : a2 : sp_ref(e), ARG1(a2, x), ARG2(a2, v(~p))
     h1 > l2, h2 > l2

More generally, anywhere the context-resolved interpretation of a gesture introduces an individual that is not coreferent with any
individual in the synchronous speech (by the bridging inference described in Section 3.2), then we must have a quantifier and inference relation to introduce this individual, outscoped by [G] so that availability (Definition 8) and semantic interpretation (Definition 5) constrain anaphoric dependencies correctly—that is, the gestured individual cannot be an antecedent to a pronoun in subsequent speech. This scopal constraint can be expressed as part of discourse update—the logic that builds a specific LF of discourse from the underspecified LFs of its units—although we forego details here.

Mapping the syntactic representation of gestures such as (18) to their unique RMRS (23) is very simple. Computing the predications from each attribute–value pair in (18) involves exactly the same techniques as used by Copestake (2003) to build the semantic component of a POS tagger. Adding the scopal predication [G] and its outscoping (>) conditions is triggered by the gestural type of the feature structure: for example, qualitative-characterizing-gesture introduces [G] via scopal modification as defined in the semantic algebra for RMRS (Copestake et al. 2001).

Our representations of gesture meaning are analogous, both formally and substantively, to the underspecified meanings that computational semanticists already use to represent language. In particular, as we show in Section 5, we can therefore build reasoning mechanisms that combine information from language and gesture to derive integrated LFs for multimodal discourse. Differences remain across modalities, however, in the kinds of semantic underspecification that are present in the RMRS representation of a phrase versus that of a gesture. A complete and disambiguated syntactic representation of a linguistic phrase fully specifies predicate–argument structure. But gestures lack hierarchical syntactic structure, and individual form features, unlike words, do not fix their subcategorization frames. Consequently, a complete and disambiguated syntactic analysis of gesture underspecifies all aspects of content, including predicate–argument structure.

4.3 Combining speech and gesture in the grammar

Like Kopp et al. (2004), we believe that we need a formal account of the integration of speech and gesture that directly describes the organization of multimodal communicative actions into complex units that contribute to discourse. Kopp et al. argue for this on the grounds of generation; our motivation stems from issues in interpretation. First, our observations about the relative semantic scope of negation and the content depicted by the gesture in (6) suggest that scope-bearing elements introduced by language can outscope gestured content. It is
very straightforward to derive such a semantics from a single derivation of the structure of a multimodal utterance: use a construction rule to combine the gesture with computery and a further construction rule to combine the result with not. Standard methods for composing semantic representations from syntax—for example, Copestake et al. (2001)—would then make the (scopal) argument of the negation outscope both the atomic formula computery(x) and the gesture modality [G], as required. Of course, other analyses may be licensed by the grammar: for instance, the gesture might combine with the phrase not computery. (We do not address the resolution of such ambiguities of form here.)

Secondly, we assume, as is standard in dynamic semantic theories, that discourse update has access to semantic representations but no direct access to form. But synchrony is an aspect of form that conveys meaning, and consequently we need to give a formal description of this form–meaning relation as part of utterance structure. For instance, we suggested in Section 2 that the content of a characterizing gesture must be related to its synchronous linguistic phrase with one of a subset of the full inventory of rhetorical connections (for instance, Disjunction is excluded). This means that the synchrony that connects speech and gesture conveys semantic information that is similar to that conveyed by a highly sense-ambiguous discourse connective or a free adjunct in language: they both introduce rhetorical connections between their syntactic complements, but do not fully specify the value of the relation. We must represent this semantic contribution that is revealed by form.

For identifying gestures, meanwhile, synchrony identifies which verbally introduced individual y is placed in correspondence with the individual x designated in gesture. Moreover, synchrony serves to constrain the relationship between x and y. Sometimes the relationship is equality, but not always—as in 'these things' said while pointing to an exemplar (see (12)). We treat the semantic relationship as underspecified but not arbitrary (Nunberg 1978).

While we do not give details here, we assume a unification or constraint-based representation of utterance structure, following Johnston (1998). Construction rules in this specification describe the form and meaning of complex communicative actions including both speech and gesture. We assume that the construction rule for combining a characterizing gesture and a linguistic phrase contributes its own semantics. In other words, as well as the daughters' RMRSs being a part of the RMRS of the mother node (as is always the case), the construction rule introduces the new predication (25) to the
mother's RMRS, where hs is the top-most label of the content of the 'speech' daughter, and hg the top-most label of the gesture:

(25) l : a : vis_rel(hs), ARG1(a, hg)
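The effect of such a construction rule can be pictured as concatenating the daughters' RMRSs and adding one underspecified relation between their top labels. The sketch below is our own illustration of that idea, not the grammar's actual machinery:

```python
# Toy multimodal construction rule in the spirit of (25): the mother's
# RMRS is the union of the daughters' RMRSs plus one new underspecified
# relation between the speech and gesture top labels.
def combine(speech_rmrs, gesture_rmrs, speech_top='hs', gesture_top='hg'):
    new_pred = f'l : a : vis_rel({speech_top}), ARG1(a, {gesture_top})'
    return speech_rmrs + gesture_rmrs + [new_pred]
```

As in the text, vis_rel stays underspecified at this point; discourse update later resolves it to a specific rhetorical relation.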
The underspecified predicate vis_rel must then be resolved via discourse update to a specific rhetorical relation, where we assume at least that Disjunction is not an option. Similarly, we assume that the construction rule that combines an identifying gesture with an NP contributes its own semantics: the labelled predication (26), where l2 is the label of the spatial condition sp_ref introduced by the RMRS of the gesture (see (24)) and y is the semantic index of the NP.

(26) l2 : a21 : id_rel(x), ARG1(a21, y)

So, for instance, the RMRS for a multimodal constituent consisting of the NP these things combined with the pointing gesture in (12) is the following:

(27) l0 : [G](h1),
     l1 : a1 : deictic_q(x), RESTR(a1, h2), BODY(a1, h3)
     l2 : a2 : sp_ref(e), ARG1(a2, x), ARG2(a2, v(~p))
     l2 : a21 : id_rel(x), ARG1(a21, y)
     l3 : a3 : _these_q(y), RESTR(a3, h4), BODY(a3, h5),
     l4 : a4 : _things_n_1(y)
     h1 > l2, h2 > l2, h4 > l4

Identifying gestures that combine with other kinds of linguistic syntactic categories, such as PPs and VPs, are also possible in principle, although we leave the details to future work.

5 ESTABLISHING COHERENCE THROUGH DEFAULT INFERENCE

SDRT describes which possible ways of resolving an underspecified semantics are pragmatically preferred. This occurs as a byproduct of discourse update: the process by which one constructs the LF of discourse from the (underspecified) compositional semantics of its units. So far, SDRT's discourse update has been used to model linguistic phenomena. Here, we indicate how it can resolve the underspecified meaning of both language and gesture.

Discourse update in SDRT involves non-monotonic reasoning and is defined in a constraint-based way: where φ ∈ Lulf represents the (underspecified) old content of the context, which is to be updated
with the (underspecified) new content ψ ∈ Lulf of the current utterance, the result of update will be a formula χ ∈ Lulf that is entailed by φ, by ψ and by the consequences of a non-monotonic logic—known as the glue logic—when φ and ψ are premises in it. This constraint-based approach allows an updated interpretation of discourse to exhibit ambiguities. This is well established in the literature as being necessary for linguistic discourse; here, we observed via examples like (1), (7) and (9) that it is necessary for gesture as well.

The glue logic axioms specify which speech act k : R(a, b) was performed, given the content and context of utterances. And being a non-monotonic consequence of the glue logic, k : R(a, b) becomes part of the updated LF. Formally, the glue logic axioms typically have the shape schematized below, where A > B means 'if A then normally B' (note that without loss of generality, we omit anchors when the arguments are specified; so, for instance, k : R(a, b) is a notational variant of k : a : R(a), ARG1(a, b)):

Glue Logic Schema: (k : ?(a, b) ∧ some stuff) > k : R(a, b)

In words, this axiom says: if b is to be connected to a with a rhetorical relation, and the result is to appear as part of the logical scope labelled k, but we do not know what the value of that relation is yet, and moreover 'some stuff' holds of the content labelled by a and b, then normally the rhetorical relation is R. The 'some stuff' is derived from the (underspecified) LFs (expressed in Lulf) that a and b label (in our case, this language is that of RMRS), and the rules are justified on the basis of underlying linguistic knowledge, world knowledge or knowledge of the cognitive states of the dialogue agents.

For example, the glue logic axiom Narration stipulates that one can normally infer Narration if the constituents that are to be rhetorically connected describe eventualities which are in an occasion relation. That is, there is a 'natural event sequence' such that events of the sort described by a lead to events of the sort described by b:

Narration: (k : ?(a, b) ∧ occasion(a, b)) > k : Narration(a, b)

The scripts of Schank and Abelson (1977) attempted to capture information about which eventualities occasion which others; in SDRT such scripts are default axioms. For example, we assume that the underspecified LFs of the clauses in (16) verify the antecedents of a default axiom whose consequent is occasion(p1, p2), yielding p0 : Narration(p1, p2) via Narration:

(16) p1. John went out the door.
     p2. He turned right.
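A defeasible axiom of this shape can be caricatured as a pattern over evidence about the contents of the two units. The sketch below is ours; it encodes Narration (via occasion) and Explanation (via causeD, introduced below), ignores the non-monotonic logic's handling of conflicting defaults, and simply returns every default that fires, so that surviving alternatives remain as ambiguity:

```python
# Toy glue-logic step: each axiom pairs a piece of evidence about units
# a and b with the rhetorical relation it defeasibly licenses.
def glue_update(a, b, facts):
    axioms = [
        (('occasion', a, b), 'Narration'),
        (('causeD', b, a), 'Explanation'),
    ]
    return [rel for evidence, rel in axioms if evidence in facts]
```

For (16), evidence that going out the door occasions turning right yields Narration; with no evidence at all, no relation is inferred and the underspecification persists.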
Indeed, the glue logic axioms that do this should be neutral about sentence mood, so that they also predict that the imperatives in (28) are connected by Narration:

(28) Go out the door. Turn right.

In the model theory of SDRT, such an LF for (28) entails that (i) both imperatives are commanded (because Narration is veridical) and (ii) the command overall is to turn right immediately after going out the door.11 This is exactly the discourse interpretation we desire for the multimodal act (4) from the NUMACK corpus:

(4) You walk out the doors
[The gesture is one with a flat hand shape and vertical palm, with the fingers pointing right, and palm facing outward.]
So our aim now is to ensure that discourse update in SDRT supports the following two codependent inferences in constructing the LF of (4): (i) the contents of the clause and gesture are related by Narration and (ii) the (underspecified) content of the gesture as revealed by its form resolves to turn right in this context. We will see how shortly.

Explanation is inferred on the basis of evidence in the discourse for a causal relation:
Explanation: (k : ?(a, b) ∧ causeD(b, a)) > k : Explanation(a, b)
Note that causeD(b, a) does not entail that b actually did cause a; the latter causal relation would be inferred if Explanation is inferred. The formula causeD(b, a) is inferred on the basis of monotonic axioms (monotonic because the evidence for causation is present in the discourse, or it is not), where the antecedents to these axioms are expressed in terms of the (underspecified) content of a and b. We assume that there will be such a monotonic axiom for inferring causeD(p2, p1) for the discourse (29) (we omit details), which bears analogies to the embodied utterance (9).

(29) p1. There are low-level phonological errors which tend not to get reported.
     p2. They are created via subconscious processes.

In SDRT the inferences can flow in one of several directions. If the premises of a glue logic axiom are satisfied by the underspecified semantics derived from the grammar, then a particular rhetorical

11 We did not give the model theory for imperatives in order to stay within the confines of the extensional version of SDRT in this paper. See Asher & Lascarides (2003) for details.
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
(4)
442 A Formal Semantic Analysis of Gesture
(30) ½G$xðcontinuousðxÞ ^ belowawarenessðxÞ ^ processðxÞ^ sustainðe#; x; yÞÞ This particular interpretation is licensed by the predications in the RMRS (23), via hierarchies such as the one shown in Figure 3 (using y
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
relation follows and its semantics yields inferences about how underspecified parts of the utterance and gesture contents are resolved. Alternatively, there are cases where the underspecified compositional semantics is insufficient for inferring any rhetorical relation. In this case, discourse update allows inference to flow in the opposite direction: one can resolve the underspecified content to a more specific interpretation that supports an inference to a rhetorical relation. If there is a choice of which way to resolve the underspecified content so as to infer a rhetorical relation from it, then one chooses an interpretation which maximizes the quality and quantity of the rhetorical relations; see Asher & Lascarides (2003) for details. There may be more than one such interpretation, in which case ambiguity persists in the updated discourse representation. Of course, this inferential flow from possible resolved interpretations of speech and gesture to rhetorical connections represents a competence model of discourse interpretation only. Any implementation of SDRT’s discourse update would have to restrict drastically the massive search space of fully resolved interpretations that are licensed by an underspecified LF for discourse. We have just begun to explore such issues in related work (Schlangen & Lascarides 2002). Let us illustrate the inference from underspecified content to complete LF with the example (9). As described in Section 4.2, the grammar yields (23) for the content of the gesture, an RMRS for the compositional semantics of the clause (which is omitted here for reasons of space) and the construction rule that combines them contributes the predication l : vis_rel(hs, l0), where hs outscopes all labels in the RMRS of the clause and l0 labels the scopal modifier ½G in (23). 
Producing a fully specific LF from this therefore involves, among other things, resolving the underspecified predications in (23), and the underspecified predicate vis_rel must resolve to a rhetorical relation that is licensed by it—so not Disjunction. Even though the RMRS (23) fails to satisfy the antecedent of any axiom for inferring a particular rhetorical relation, one can consider alternative ways of resolving it so as to support a rhetorical connection. One alternative is to resolve it to denote a continuous, subconscious process which sustains the phonological errors as shown in (30) where y is the low-level phonological errors introduced in the clause:
Alex Lascarides and Matthew Stone 443
Co-reference: ðid relðx; yÞ ^ locðe; y;~ pÞ ^ PðxÞ ^ PðyÞÞ > x ¼ y
In words, Co-reference stipulates that if x and y are related by id_rel, and moreover, the individual y that is physically located at ~ p shares a property with x, then normally x and y are coreferent. Other default axioms can be articulated for inferring exemplifies(x, y) rather than x ¼ y in the interpretation of (12). 6 CONCLUSIONS We have provided a formal semantic analysis of coverbal iconic and deictic gesture which captures several observations from the descriptive literature. For instance, three features in our analysis encapsulate the observation that speech and gesture together form a ‘single thought’. First, the content of language and gesture are represented jointly in the same logical language. Secondly, rhetorical relations connect the content of iconic gesture to that of its synchronous speech. And finally, language and gesture are interpreted jointly within an integrated architecture for linking utterance form
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
in (30) is also licensed by Definitions 8 and 9). And similar to discourse (29), this and the compositional semantics of the clause satisfy the antecedent of an axiom whose consequent is causeD(b, a). And so an Explanation relation is inferred via Explanation, resulting in the LF (9#) shown earlier. As stated earlier, an alternative specific interpretation of the gesture (in fact, one that can stem from an alternative analysis of its form) entails that the gesture depicts the low level of the phonological errors. This specific interpretation would validate an inference in the glue logic that the gesture and speech are connected with Depiction (this would be on the general grounds in the glue logic that the necessary semantic consequences of a rhetorical connection are normally sufficient for inferring it). If both of these interpretations are equally coherent, then discourse update predicts that the multimodal utterance, while coherent, is ambiguous. If, on the other hand, the interpretation given in (9#) yields a more coherent interpretation (and we believe that it does because it supports additional Contrast relations with prior gestures that are in the context), then discourse update predicts this fully specific interpretation. Finally, let us examine deictic gesture: discourse update must support inferences which resolve the underspecified relation id_rel between the denotations of an NP and its accompanying deictic gesture to a specific value. This is easily achieved via default axioms such as the following (we have omitted labels and anchors for simplicity):
444 A Formal Semantic Analysis of Gesture
Acknowledgements This work has been presented at several workshops and conferences: the Workshop on Embodied Communication (Bielefeld, March 2006); Constraints in Discourse (Maynooth, June 2006); the Pragmatics Workshop (Cambridge, September 2006),
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
and meaning. Our theory also substantiates the observation that iconic gesture on its own does not receive a coherent interpretation. Its form produces a very underspecified semantic representation; this must be resolved by reasoning about how it is coherently related to its context. Finally, we exploited discourse structure and dynamic semantics to account for coreference across speech and gesture and across sequences of gestures. One major advantage of our approach is that all aspects of our framework are already established for modelling purely linguistic discourse, and consequently we demonstrate that existing mechanisms for representing language can be exploited to model gesture as well. We showed that they suffice for a wide variety of spontaneous and naturally occurring coverbal gestures, ranging from simple deictic ones to much more complex iconic ones with metaphorical interpretations. Furthermore, our model is sufficiently formal but flexible that one can articulate specific hypotheses that can then guide empirical investigations that deepen our understanding of the phenomena. Ultimately, we hope that the empirical and theoretical enquiries that this work enables will support a broader perspective on the form, meaning and use of non-verbal communication. This will require describing the organization of gesture in relationship to speech, where theoretical and empirical work must interact to characterize ambiguities of form and describe how they are resolved in context. It also requires further research into other kinds of communicative action. For example, we believe that formalisms for modelling intonation and focus—for example Steedman (2000)—offer a useful starting point for an analysis of beat gestures. We have also ignored body posture and facial expressions. But as Krahmer & Swerts (2007) demonstrate, they can interact in complex ways not only with speech but also with hand gestures. 
Finally, we have focussed here almost entirely on contributions from single speakers. But in conversation, social aspects of meaning are important: we need to explore how gestures affect and are affected by grounding and disputes, for instance. This is another place where empirical research such as those of Emmorey et al. (2000) and formal methods such as those of Asher & Lascarides (2008) have been pursued independently and can benefit from being brought into rapport.
Brandial (Potsdam, September 2006); the Workshop on Dynamic Semantics (Oslo, September 2006); Dialogue Matters (London, February 2008) and the Rutgers Semantics Workshop (New Jersey, November 2008). We would like to thank the participants of these events for their very useful comments and feedback. We would also like to thank the many individuals who have influenced this work through discussion and feedback: Nicholas Asher, Susan Brennan and her colleagues at Stony Brook, Justine Cassell, Herb Clark, Susan Duncan, Jacob Eisenstein, Dan Flickinger, Jerry Hobbs, Michael Johnston, Adam Kendon, Hannes Rieser and his colleagues at Bielefeld, Candy Sidner and Rich Thomason. We owe a special thanks to Jean Carletta, who provided support in our search of the AMI corpus, including writing search scripts that helped us to find what we were looking for. Finally, we would like to thank two anonymous reviewers for this journal for their very detailed and thoughtful comments on earlier drafts of this paper, and the editors Anna Szabolcsi and Bart Geurts. Any mistakes that remain are our own. Much of this research was done while Matthew Stone held a fellowship in Edinburgh, funded by the Leverhulme Trust. This research was also supported by National Science Foundation grants HLC-0308121, CCF-0541185 and HSD-0624191.

ALEX LASCARIDES
School of Informatics
University of Edinburgh
10 Crichton Street
Edinburgh EH8 9AB
Scotland, UK
e-mail: [email protected]

MATTHEW STONE
Computer Science
Rutgers University
110 Frelinghuysen Road
Piscataway, NJ 08854-8019
USA
e-mail: [email protected]

REFERENCES
Asher, N. & Lascarides, A. (1995), 'Metaphor in discourse'. In Proceedings of the AAAI Spring Symposium Series: Representation and Acquisition of Lexical Knowledge: Polysemy, Ambiguity and Generativity. Stanford. 3–7.
Asher, N. & Lascarides, A. (2003), Logics of Conversation. Cambridge University Press. Cambridge.
Asher, N. & Lascarides, A. (2008), 'Commitments, beliefs and intentions in dialogue'. In Proceedings of the 12th Workshop on the Semantics and Pragmatics of Dialogue (Londial). London. 35–42.
Barker, C. (2002), 'The dynamics of vagueness'. Linguistics and Philosophy 25:11–36.
Benz, A., Jäger, G. & van Rooij, R. (eds.) (2005), Game Theory and Pragmatics. Palgrave Macmillan. Basingstoke, United Kingdom.
Bittner, M. (2001), 'Surface composition as bridging'. Journal of Semantics 18:127–77.
Buchwald, A., Schwartz, O., Seidl, A. & Smolensky, P. (2002), 'Recoverability optimality theory: discourse anaphora in a bidirectional framework'. In Proceedings of the International Workshop on the Semantics and Pragmatics of Dialogue (EDILOG). Edinburgh. 37–44.
Carletta, J. (2007), 'Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus'. Language Resources and Evaluation Journal 41:181–90.
Carston, R. (2002), Thoughts and Utterances: The Pragmatics of Explicit Communication. Blackwell. Oxford.
Cassell, J. (2001), 'Embodied conversational agents: representation and intelligence in user interface'. AI Magazine 22:67–83.
Chierchia, G. (1995), Dynamics of Meaning: Anaphora, Presupposition and the Theory of Grammar. University of Chicago Press. Chicago.
Clark, H. (1977), 'Bridging'. In P. N. Johnson-Laird and P. C. Wason (eds.), Thinking: Readings in Cognitive Science. Cambridge University Press. New York. 411–20.
Clark, H. (1996), Using Language. Cambridge University Press. Cambridge.
Copestake, A. (2003), 'Report on the Design of RMRS'. Technical Report, EU Deliverable for Project IST-2001-37836, WP1a, Computer Laboratory, University of Cambridge.
Copestake, A. & Briscoe, E. J. (1995), 'Semi-productive polysemy and sense extension'. Journal of Semantics 12:15–67.
Copestake, A., Flickinger, D., Sag, I. A. & Pollard, C. (1999), 'Minimal recursion semantics: an introduction'. (http://www-csli.stanford.edu/~aac).
Copestake, A., Lascarides, A. & Flickinger, D. (2001), 'An algebra for semantic construction in constraint-based grammars'. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL/EACL 2001). Toulouse. 132–39.
Cumming, S. (2007), Proper Nouns. Ph.D. thesis, Rutgers University, NJ.
Egg, M., Koller, A. & Niehren, J. (2001), 'The constraint language for lambda structures'. Journal of Logic, Language, and Information 10:457–85.
Ekman, P. & Friesen, W. V. (1969), 'The repertoire of nonverbal behavior: categories, origins, usage, and coding'. Semiotica 1:49–98.
Emmorey, K., Tversky, B. & Taylor, H. (2000), 'Using space to describe space: perspective in speech, sign and gesture'. Spatial Cognition and Computation 2:157–80.
Engle, R. (2000), Toward a Theory of Multimodal Communication: Combining Speech, Gestures, Diagrams and Demonstrations in Structural Explanations. Ph.D. thesis, Stanford University, Stanford.
Farkas, D. (2002), 'Specificity distinctions'. Journal of Semantics 19:1–31.
Fauconnier, G. (1997), Mappings in Thought and Language. Cambridge University Press. Cambridge.
Gibbs, R. W. (1994), The Poetics of Mind: Figurative Thought, Language, and Understanding. Cambridge University Press. Cambridge.
Ginzburg, J. & Cooper, R. (2004), 'Clarification, ellipsis and the nature of contextual updates in dialogue'. Linguistics and Philosophy 27:297–366.
Glucksberg, S. & McGlone, M. S. (2001), Understanding Figurative Language: From Metaphors to Idioms. Oxford University Press. Oxford.
Goffman, E. (1963), Behavior in Public Places. The Free Press. New York.
Goldin-Meadow, S. (2003), Hearing Gesture: The Gestures We Produce when We Talk. Harvard University Press. Cambridge.
Grice, H. P. (1975), 'Logic and conversation'. In P. Cole and J. L. Morgan (eds.), Syntax and Semantics Volume 3: Speech Acts. Academic Press. New York. 41–58.
Groenendijk, J. & Stokhof, M. (1991), 'Dynamic predicate logic'. Linguistics and Philosophy 14:39–100.
Grosz, B., Joshi, A. & Weinstein, S. (1995), 'Centering: a framework for modelling the local coherence of discourse'. Computational Linguistics 21:203–26.
Haviland, J. (2000), 'Pointing, gesture spaces, and mental maps'. In David McNeill (ed.), Language and Gesture. Cambridge University Press. New York. 13–46.
Hays, E. & Bayer, S. (2001), 'Metaphoric generalization through sort coercion'. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (ACL). Berkeley, California. 222–8.
Heim, I. (1982), The Semantics of Definite and Indefinite Noun Phrases. Ph.D. thesis, University of Massachusetts, Amherst.
Hobbs, J. R., Stickel, M., Appelt, D. & Martin, P. (1993), 'Interpretation as abduction'. Artificial Intelligence 63:69–142.
Johnston, M. (1998), 'Unification-based multimodal parsing'. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and International Conference on Computational Linguistics (ACL/COLING). Montreal, Canada.
Johnston, M., Cohen, P. R., McGee, D., Pittman, J., Oviatt, S. L. & Smith, I. (1997), 'Unification-based multimodal integration'. In ACL/EACL 97: Proceedings of the Annual Meeting of the Association for Computational Linguistics. Madrid.
Kamp, H. (1981), 'A theory of truth and semantic representation'. In J. Groenendijk, T. Janssen, and M. Stokhof (eds.), Formal Methods in the Study of Language. Mathematisch Centrum. Amsterdam. 277–322.
Kamp, H. & Reyle, U. (1993), From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer Academic Publishers. Dordrecht, the Netherlands.
Kendon, A. (1972), 'Some relationships between body motion and speech: an analysis of an example'. In W. Siegman and B. Pope (eds.), Studies in Dyadic Communication. Pergamon Press. Oxford. 177–210.
Kendon, A. (1978), 'Differential perception and attentional frame: two problems for investigation'. Semiotica 24:305–15.
Kendon, A. (2004), Gesture: Visible Action as Utterance. Cambridge University Press. Cambridge.
Koller, A., Mehlhorn, K. & Niehren, J. (2000), 'A polynomial-time fragment of dominance constraints'. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL 2000). Hong Kong.
Kopp, S., Tepper, P. & Cassell, J. (2004), 'Towards integrated microplanning of language and iconic gesture for multimodal output'. In Proceedings of ICMI. State College, PA.
Kortmann, B. (1991), Free Adjuncts and Absolutes in English: Problems in Control and Interpretation. Routledge. Abingdon, United Kingdom.
Krahmer, E. & Swerts, M. (2007), 'The effects of visual beats on prosodic prominence: acoustic analyses, auditory perception and visual perception'. Journal of Memory and Language 57:396–414.
Kranstedt, A., Lücking, A., Pfeiffer, T., Rieser, H. & Wachsmuth, I. (2006), 'Deixis: how to determine demonstrated objects using a pointing cone'. In Gestures in Human-Computer Interaction and Simulation. Springer. Berlin. 300–11.
Kyburg, A. & Morreau, M. (2000), 'Fitting words: vague words in context'. Linguistics and Philosophy 23:577–97.
Lakoff, G. & Johnson, M. (1981), Metaphors We Live By. University of Chicago Press. Chicago.
Lascarides, A. & Stone, M. (2006), 'Formal semantics for iconic gesture'. In Proceedings of the 10th Workshop on the Semantics and Pragmatics of Dialogue (Brandial). Potsdam.
Lascarides, A. & Stone, M., 'Discourse coherence and gesture interpretation'. Gesture, forthcoming.
Lewis, D. (1969), Convention: A Philosophical Study. Harvard University Press. Cambridge.
Lücking, A., Rieser, H. & Staudacher, M. (2006), 'SDRT and multi-modal situated communication'. In Proceedings of BRANDIAL. Potsdam.
Mann, W. C. & Thompson, S. A. (1987), 'Rhetorical structure theory: a framework for the analysis of texts'. International Pragmatics Association Papers in Pragmatics 1:79–105.
McNeill, D. (1992), Hand and Mind: What Gestures Reveal about Thought. University of Chicago Press. Chicago.
McNeill, D. (2005), Gesture and Thought. University of Chicago Press. Chicago.
Nunberg, G. (1978), The Pragmatics of Reference. Indiana University Linguistics Club. Bloomington, Indiana.
Oviatt, S., DeAngeli, A. & Kuhn, K. (1997), 'Integration and synchronization of input modes during multimodal human-computer interaction'. In Proceedings of the Conference on Human Factors in Computing Systems: CHI '97. Los Angeles.
Pustejovsky, J. (1995), The Generative Lexicon. MIT Press. Cambridge.
Quek, F., McNeill, D., Bryll, R., Duncan, S., Ma, X., Kirbas, C., McCullough, K. & Ansari, R. (2002), 'Multimodal human discourse: gesture and speech'. ACM Transactions on Computer-Human Interaction 9:171–93.
Reddy, M. J. (1993), 'The conduit metaphor: a case of frame conflict in our language about language'. In Andrew Ortony (ed.), Metaphor and Thought. Cambridge University Press. New York. 164–201.
Schank, R. C. & Abelson, R. (1977), Scripts, Plans, Goals, and Understanding. Lawrence Erlbaum Associates. Hillsdale, NJ.
Schlangen, D. & Lascarides, A. (2002), 'Resolving fragments using discourse information'. In Proceedings of the 6th International Workshop on the Semantics and Pragmatics of Dialogue (Edilog). Edinburgh.
So, W., Kita, S. & Goldin-Meadow, S., 'Using the hands to keep track of who does what to whom'. Cognitive Science, forthcoming.
Sowa, T. & Wachsmuth, I. (2000), 'Coverbal iconic gestures for object descriptions in virtual environments: an empirical study'. In Post-Proceedings of the Conference on Gestures: Meaning and Use. Portugal.
Steedman, M. (2000), The Syntactic Process. MIT Press. Cambridge.
Stern, J. (2000), Metaphor in Context. MIT Press. Cambridge.
Talmy, L. (1996), 'Fictive motion in language and "ception"'. In Paul Bloom, Mary Peterson, Lynn Nadel and Merrill Garrett (eds.), Language and Space. MIT Press. Cambridge. 211–76.
van Eijck, J. & Kamp, H. (1997), 'Representing discourse in context'. In Johan van Benthem & Alice ter Meulen (eds.), Handbook of Logic and Language. Elsevier. Amsterdam. 179–237.
Walker, M. (1993), Informational Redundancy and Resource Bounds in Dialogue. Ph.D. thesis, Department of Computer & Information Science, University of Pennsylvania, Philadelphia.
Williamson, T. (1994), Vagueness. Routledge. London.

First version received: 22.05.2008
Second version received: 09.01.2009
Accepted: 16.02.2009