Salience
Trends in Linguistics Studies and Monographs 227
Editor: Volker Gast
Founding Editor: Werner Winter
Editorial Board: Walter Bisang, Hans Henrich Hock, Matthias Schlesewsky, Niina Ning Zhang
Editor responsible for this volume: Walter Bisang
De Gruyter Mouton
Salience Multidisciplinary Perspectives on its Function in Discourse
Edited by
Christian Chiarcos Berry Claus Michael Grabski
ISBN 978-3-11-024072-6
e-ISBN 978-3-11-024102-0
ISSN 1861-4302

Library of Congress Cataloging-in-Publication Data
Salience : multidisciplinary perspectives on its function in discourse / edited by Christian Chiarcos, Berry Claus, Michael Grabski.
p. cm. - (Trends in Linguistics. Studies and Monographs ; 227)
Includes bibliographical references and index.
ISBN 978-3-11-024072-6 (alk. paper)
1. Discourse analysis. 2. Computational linguistics. 3. Psycholinguistics. I. Chiarcos, Christian. II. Claus, Berry. III. Grabski, Michael.
P302.S24 2011
401'.41-dc22 2011000754
Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Internet at http://dnb.d-nb.de.

© 2011 Walter de Gruyter GmbH & Co. KG, Berlin/New York
Typesetting: PTP-Berlin Protago TEX-Production GmbH, Berlin
Printing: Hubert & Co. GmbH & Co. KG, Göttingen
Printed on acid-free paper
Printed in Germany.
www.degruyter.com
Contents

Introduction: Salience in linguistics and beyond
Christian Chiarcos, Berry Claus, and Michael Grabski

Part I. Entity-based salience in discourse

Demonstratives and salience: Towards a functional taxonomy
Olga Krasavina

Parenthetical agent-demoting constructions in Eastern Khanty: Discourse Salience vis-à-vis referring expressions
Andrey Y. Filchenko

Joint information value of syntactic and semantic prominence for subsequent pronominal reference
Ralph L. Rose

The Mental Salience Framework: Context-adequate generation of referring expressions
Christian Chiarcos

Part II. Beyond entities in discourse

Discourse-structural salience from a cross-linguistic perspective: Coordination and its contribution to discourse (structure)
Wiebke Ramm

Rhetorical relations and verb placement in Old High German
Roland Hinterhölzl and Svetlana Petrova

Part III. Beyond purely linguistic salience

Visual salience and the other one
John D. Kelleher

Salience in hypertext: Multiple preferred centers in a plurilinear discourse environment
Birgitta Bexten

Establishing salience during narrative text comprehension: A simulation view account
Berry Claus

Index
Language index
Index of determinants, manifestations and aspects of salience
Subject index
Introduction: Salience in linguistics and beyond Christian Chiarcos, Berry Claus, and Michael Grabski
1. Introduction

quod punctum salit iam et movetur ut animal
Aristotle, Hist. Anim. 6.3
“A point that hops and jumps like a living being”. The Latin translation of Aristotle’s description of the heart of an embryo in a hen’s egg (a red point that stands out from the yellow yolk) has been the source of metaphors across the languages of Europe. The salient point, le point saillant, der springende Punkt all refer to things that are particularly important or relevant. Starting from this metaphor, salient and salience in English came to refer to things that stand out from the ground, can be easily recognized, are in the focus of attention, or are foremost in a person’s mind.¹

¹ For a detailed description of the history of the term and the metaphor, see von Heusinger (1997).

This volume addresses perspectives and functions of salience in discourse. It grew out of the 6th International Workshop on Multidisciplinary Approaches to Discourse (MAD ’05), held in October 2005 in Chorin, Germany, under the theme Salience in Discourse. The goal of the workshop was to illustrate the differences and commonalities in research and perspectives on salience within and between various areas of research, including computational linguistics, discourse studies, and psycholinguistics.

In looking for a general definition of what it means for an entity to be salient, one may start with the familiar function salience has in linguistics. In the context of the anaphoric binding of noun phrases, the salience of possible antecedents is an important contextual feature that may be established by complex linguistic means. A related phenomenon is the salience of discourse segments, which allows subsequent segments to be linked to them by means of some discourse relation. Moreover, the use of the term salience can be stretched further to extra-linguistic entities: for example, non-linguistic objects can be defined as salient, as they restrict linguistic reference to real entities in the discourse-external context. As a generalization over these different uses of ‘salience’, a working definition emerged from discussions at the MAD ’05 workshop:

“Salience defines the degree of relative prominence of a unit of information, at a specific point in time, in comparison to the other units of information.”

The contributions in this volume are organized in accordance with the different views on salience in discourse identified above: the salience of entities in discourse (Part I, see Section 2 for an overview), discourse-structural salience (Part II, see Section 3 for an overview), and aspects of salience beyond linguistics (Part III, see Section 4 for an overview).

1.1. Input and output contexts

A useful picture that has evolved in discourse semantics is that a text utterance and the presuppositions it makes exploit an input context that is induced by the preceding text. This picture also covers anaphora, which were given the status of presupposing expressions in the influential work of van der Sandt (1992). As an utterance always adds some information, it also provides an output context that can be exploited by subsequent text. This function of an utterance has been stressed in dynamic semantics, which identifies an utterance’s meaning with its Context Change Potential (e.g., Groenendijk and Stokhof, 1991; Kamp, 1981; Lascarides and Asher, 2007; Muskens, 1991).

The salience of noun phrase anaphora can be related to this picture as follows: at an utterance U that contains an anaphoric expression, the relative salience of any possible antecedent is determined by the input context for U. The picture also explains why the salience of linguistic expressions is subject to change: as the text proceeds, input contexts change, and with them the salience status that the expression in question may have as an antecedent in some anaphoric relation.

1.2. Entity-based salience in discourse

For many researchers, the notion of salience is particularly closely related to discourse referents and their realization in discourse. This field of research has been particularly productive in the last 20 years, because a large part of the community concentrated on the computational treatment of referring expressions during the 1990s. In part, these developments culminated in Centering Theory (Grosz, Joshi and Weinstein, 1995): here, a number of linguistic phenomena pertaining to the appearance of discourse referents, as well as the notion of local coherence, were traced back to the concept of centers (discourse referents) and their salience ranking. Centering also formalized an important insight: entities in discourse have both a backward-looking and a forward-looking aspect, which correspond to the schema of input and output contexts of a given utterance.

1.3. Salience and information structure

Information structure pertains to the functional structuring of utterances into partitions that are attributed different textual functions, such as topic/comment, focus/background, theme/rheme, given/new, etc. (e.g., Krifka, 2007; Molnár, 1993; Vallduví, 1992; Vallduví and Engdahl, 1996). In the literature, this partitioning has partly been defined in terms of salience, or even reduced to it, as proposed in Sgall, Hajičová, and Panevová (1986). But the general relationship is far from clear. One point is that such partitions may concern sentence constituents that do not exclusively refer to entities. For instance, the focus part of a sentence, which is often considered to be more salient than the background part, may very well be a VP that does not contain an NP. Salience in information structure then extends beyond entity-based salience.

On the other hand, a definition of salience might start with these partitions, taking, e.g., topic and focus constituents as candidates for salient expressions (Arnold, 2005). But then these constituents are salient in different ways, as the respective partitions to which they belong are interpreted differently. A further problem is that the two parts are very often complementary. As a consequence, salient material would cover the whole clause even in rather ordinary utterances, rendering the notion of salience void. As an alternative, different dimensions of salience may be distinguished. In this regard, Centering Theory, with its differentiation between forward-looking and backward-looking functions of centers, provides a promising point of departure.
These two functions specify two dimensions of salience that linguistic objects regularly have in discourse and that could be exploited for definitions of information-structural distinctions. In Subsection 2.3, the multidimensionality of salience in discourse will be discussed specifically with respect to referring expressions.

1.4. Salience and discourse structure

Another concept of salience deals with structures at or above clause level. We then have the relative salience of discourse segments, seen either as surface text utterances or as their semantic counterparts, such as propositions or corresponding semantic objects. The global structure of discourse is often represented by means of a hierarchical structure, e.g., a tree (Grosz and Sidner, 1986; Mann and Thompson, 1988). Discourse segments correspond to nodes in such a tree and are related by coordinating or subordinating discourse or coherence relations. Discourse segments that are subordinated by such relations are assumed to be less salient, in that they are less accessible or less important, than non-subordinated segments (Brandt, 1996).

From a processing point of view, the identification of discourse structure bears a remarkable resemblance to tasks like the resolution of anaphoric references: for a given utterance U, a discourse segment has to be identified to which this utterance may be attached. With the prior analysis of the discourse as input context, the attachment operation modifies the discourse structure and creates an updated output context. In this sense, the salience of discourse segments or propositions and the salience of entities may be compared with each other. The parallel between these different kinds of salience in discourse is further described in Section 3.

1.5. Extra-linguistic salience

Section 4 below extends the scope beyond purely linguistic phenomena. An obvious example where non-linguistic factors play a role in language use is situated communication, i.e., physically grounded communication. In situated communication, the interlocutors typically refer to objects that are actually present in the immediate environment. Hence, it can be expected that situated language production and comprehension are determined by conditions of the physical environment, such as its visual properties.

A second, less obvious, example that is dealt with in Section 4 is visual text marking. For instance, words in written texts may be typographically highlighted to signal information focus (McAteer, 1992). Furthermore, visual marking can also indicate a particular functionality, as is the case with the visual marking of cross-references in dictionaries and of links in hypertexts.

The third example, which may be surprising from a linguistic perspective, is the effect of properties of the situation that is described in a text, in addition to genuine textual factors. The salience of an entity that is mentioned in a text may depend not only on its linguistic salience but also on its salience in the situation being described. At least, this is what is suggested by empirical results that demonstrate effects of situational factors such as spatial and temporal variables (e.g., Glenberg, Meyer, and Lindem, 1987; Kelter, Kaup, and Claus, 2004).
2. Entity-based salience in discourse
Early research on salience and referentiality includes Fillmore’s (1977) Saliency Hierarchy and its relevance to the assignment of grammatical roles, Lewis’ (1979) characterization of definite expressions on the basis of salience considerations, Osgood and Bock’s (1977) psycholinguistic study of salience and sentencing, and Hajičová and Vrbová’s (1982) computational model of salience in the stock of shared knowledge, i.e., the common ground established between hearer and speaker. These approaches, however, belong to a field marked by great terminological diversity, with related conceptions existing under different terms, e.g., referential activation (Chafe, 1976), familiarity (Prince, 1981), topicality (Givón, 1983), accessibility (Ariel, 1990), or givenness (Gundel, Hedberg, and Zacharski, 1993).

2.1. Centering Theory

Nowadays, Centering Theory (Brennan, Friedman, and Pollard, 1987; Grosz, 1981; Grosz et al., 1995; Joshi and Weinstein, 1981; Sidner, 1978, 1981, 1983; Walker, Joshi, and Prince, 1998) is probably the most influential account of entity-based salience in discourse. Its appeal and spread across several communities are due in particular to the introduction of an independent, self-contained terminology. Centering operates on the assumption that attention has to be focused, or “centered”, in discourse. While this insight also underlies the definition of “topic” (Tomlin, 1995), Centering theorists used it to develop a theory-specific terminology. Important notions in their theory are (following Grosz et al., 1983, 1995):

centers: entities that serve to link an utterance U with other utterances

forward-looking centers: the set Cf(U) of centers that are realized in U and that can be referred to in subsequent utterances

backward-looking center: a unique center Cb(Uk), defined for each utterance Uk (except the segment-initial one), that refers back to a forward-looking center of the preceding utterance Uk−1 and that, intuitively, represents the discourse entity which is the center of attention at the utterance of Uk

The backward-looking center is selected from the forward-looking centers of the preceding utterance; it thus anchors an utterance in the preceding discourse. Now, the fundamental claim of Centering is that in order to establish (local) coherence in discourse, speakers need to make sure that the backward-looking center can be identified. Forward-looking centers are thus organized in a salience ranking that reflects their likelihood of serving as the backward-looking center of the following utterance. The backward-looking center Cb(Uk) is identified with the most salient forward-looking center of the preceding utterance Uk−1 that is realized in Uk.

salience ranking of forward-looking centers: the elements of Cf(Uk−1) are organized in a partial order according to their realization in Uk−1, i.e., their grammatical roles: subject > object > other

preferred center: the highest-ranking element of Cf(Uk−1) is defined as its “preferred center”, Cp(Uk−1); intuitively, it is the most likely candidate for the backward-looking center of the following utterance, Cb(Uk)

Assuming that a hearer needs to keep track of the backward-looking center, Centering then specifies the conditions that establish (local) coherence in discourse. These conditions include preferences among transitions between adjacent utterances and constraints on the realization of the backward-looking center. Grosz et al. (1995) posit two rules which represent concise predictions of Centering Theory. These concern the usage of pronouns and aspects of local referential coherence.

Rule 1 (Pronominalization rule): If any element of Cf(Uk) is realized by a pronoun in Uk+1, then Cb(Uk+1) must be realized by a pronoun also.

Rule 1 formulates the insight that the most salient referent, the attentional center of the hearer, is likely to appear as a pronominal expression. If the identification of the current backward-looking center is crucial for the flow of attention in discourse, then pronominalizing an element other than the backward-looking center could produce misleading interpretations of an utterance.
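The definitions above are operational enough to be sketched in a few lines of code. The following Python fragment is only an illustration under stated assumptions, not the formulation of Grosz et al. (1995): the names `Mention`, `rank_cf`, `preferred_center`, `backward_center`, and `satisfies_rule1` are hypothetical, utterances are hand-annotated lists of mentions, ties in the partial order are broken by order of mention, and the two-sentence discourse is invented for the purpose.

```python
from dataclasses import dataclass

# Grammatical-role ranking as stated in the text: subject > object > other.
ROLE_RANK = {"subject": 0, "object": 1, "other": 2}


@dataclass(frozen=True)
class Mention:
    """One realization of a center in an utterance (hypothetical annotation format)."""
    referent: str          # the discourse referent (center)
    role: str              # "subject", "object", or "other"
    pronoun: bool = False  # True if realized by a pronoun


def rank_cf(utterance):
    """Forward-looking centers Cf(U), most salient first.

    Assumption: a referent mentioned several times keeps its best role, and
    equally ranked referents stay in order of first mention (sorted() is stable).
    """
    best = {}
    for m in utterance:
        rank = ROLE_RANK[m.role]
        if m.referent not in best or rank < best[m.referent]:
            best[m.referent] = rank
    return sorted(best, key=best.get)


def preferred_center(utterance):
    """Cp(U): the highest-ranking element of Cf(U), or None for an empty utterance."""
    cf = rank_cf(utterance)
    return cf[0] if cf else None


def backward_center(prev, cur):
    """Cb(Uk): the most salient element of Cf(Uk-1) that is realized in Uk."""
    realized = {m.referent for m in cur}
    for referent in rank_cf(prev):
        if referent in realized:
            return referent
    return None


def satisfies_rule1(prev, cur):
    """Rule 1: if any element of Cf(prev) is a pronoun in cur, Cb(cur) must be one too."""
    cb = backward_center(prev, cur)
    prev_cf = set(rank_cf(prev))
    pronominalized = {m.referent for m in cur if m.pronoun and m.referent in prev_cf}
    return not pronominalized or cb in pronominalized


# Invented mini-discourse: "Anna met Ben." / "She greeted him." / "Anna greeted him."
u1 = [Mention("Anna", "subject"), Mention("Ben", "object")]
u2 = [Mention("Anna", "subject", pronoun=True), Mention("Ben", "object", pronoun=True)]
u3 = [Mention("Anna", "subject"), Mention("Ben", "object", pronoun=True)]

print(backward_center(u1, u2))   # Anna
print(satisfies_rule1(u1, u2))   # True
print(satisfies_rule1(u1, u3))   # False: Ben is pronominalized, but the Cb (Anna) is not
```

Keeping the ranking in a single table makes it easy to substitute other salience rankings (for instance, ones based on word order) without touching the rest of the sketch.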
As for local referential coherence, Centering postulates preferences on the transitions between adjacent utterances. Following Grosz et al. (1995), the following types of center transitions are to be distinguished:

continue: The backward-looking center of the previous utterance is maintained and it is the preferred center of the current utterance.

retain: The backward-looking center of the previous utterance is maintained, but it is not the preferred center of the current utterance.

shift: The backward-looking center of the current utterance differs from that of the preceding utterance.

Rule 2 (Transition rule): Sequences of continue are preferred over sequences of retain; and sequences of retain are preferred over sequences of shift.

Rule 2 states a direct relationship between center indication and the local coherence of two utterances: shifts of attention (i.e., shifts of the backward-looking center, shift) have to be minimized, and the continuity of the current backward-looking center has to be signaled by cohesive means (i.e., continue > retain). Both requirements support the identification of the backward-looking center.

2.2. An example

(1)
(a) For insurance agent Toni Johnson, dealing with the earthquake has been more than just a work experience.
(b) She lives in Oakland, a community hit hard by the earthquake.
(c) The apartment she shares with her sister was rattled, but nothing was severely damaged.
RST Discourse Treebank (Carlson, Marcu, and Okurowski, 2003), file 3 (slightly simplified)
The centers in this short text are the discourse referents Toni, the earthquake, Oakland, the apartment, and Toni’s sister. Consider sentences (1.b) and (1.c). Sentence (1.b) contains three forward-looking centers: Toni, Oakland, and the earthquake. Of these, both Toni and the earthquake are equally feasible candidates for the backward-looking center. Thus, the backward-looking center Cb(U1.b) cannot be identified on grounds of grammatical roles alone. However, this uncertainty can be resolved by means of Rule 2, as Toni is the backward-looking center of (1.c). Identifying Toni with the backward-looking center Cb(U1.b) results in a continue transition between (1.b) and (1.c), whereas assuming the earthquake to be Cb(U1.b) means that a shift occurred. Thus, Rule 2 predicts that the (preferred) backward-looking center of (1.b) is Toni. In fact, this prediction is also consistent with Rule 1, as the referent pronominalized in (1.b) is Toni. With respect to the forward-looking aspect of Centering, Toni is realized as the subject referent, the other referents being oblique, and thus Toni represents the preferred center.
(1.b) Cf: {Toni} > {Oakland, earthquake}
      Cb: Toni, Cp: Toni, transition: continue

Sentence (1.c) realizes the apartment, Toni, and Toni’s sister, with Toni being clearly identifiable as the backward-looking center. The backward-looking center is thus maintained from the preceding utterance, and the transition between (1.b) and (1.c) is to be regarded as continue. As for the Cf ordering, Toni and the apartment are realized as subject referents in (1.c) and are thus more salient than Toni’s sister.

(1.c) Cf: {Toni, apartment} > {Toni’s sister}
      Cb: Toni, Cp: Toni or apartment, transition: continue
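The transition classification and the use of Rule 2 in this analysis can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function names `classify_transition` and `preferred_reading` are hypothetical, Rule 2 is applied to a single transition instead of a sequence, and Cp(1.c) is assumed to be Toni, although the analysis above leaves it open between Toni and the apartment.

```python
# Preference order stated by Rule 2: continue > retain > shift.
PREFERENCE = ["continue", "retain", "shift"]


def classify_transition(cb_prev, cb_cur, cp_cur):
    """Classify the transition into the current utterance.

    cb_prev: backward-looking center of the previous utterance
    cb_cur:  backward-looking center of the current utterance
    cp_cur:  preferred center of the current utterance
    """
    if cb_cur != cb_prev:
        return "shift"       # the backward-looking center changed
    if cb_cur == cp_cur:
        return "continue"    # maintained, and also the preferred center
    return "retain"          # maintained, but not the preferred center


def preferred_reading(candidates, cb_next, cp_next):
    """Pick the candidate Cb that yields the most preferred transition (Rule 2).

    Assumption: one transition is compared at a time; ties go to the first candidate.
    """
    return min(
        candidates,
        key=lambda cb: PREFERENCE.index(classify_transition(cb, cb_next, cp_next)),
    )


# Example (1): two candidates for Cb(1.b); Cb(1.c) is Toni, Cp(1.c) assumed to be Toni.
for candidate in ["earthquake", "Toni"]:
    print(candidate, "->", classify_transition(candidate, "Toni", "Toni"))
# earthquake -> shift
# Toni -> continue

print(preferred_reading(["earthquake", "Toni"], "Toni", "Toni"))  # Toni
```

If the apartment is taken as Cp(1.c) instead, the same call classifies the Toni reading as retain, which Rule 2 still prefers over the shift that the earthquake reading produces.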
In this way, Centering represents a model of local discourse coherence (Centering transitions), the assignment of grammatical roles (Cf ordering), the establishment of the topic (Cb), and pronominalization preferences (Rule 1), thus grounding these phenomena in utterance-internal salience. Centering has gained a lot of attention in recent decades, as it provides a consistent terminological framework in a field notoriously plagued by terminological difficulties. Moreover, it was formulated as an operationalizable framework for the (heuristic) treatment of discourse phenomena, thus encouraging the formulation of algorithms, e.g., for anaphor resolution (Brennan et al., 1987). For both reasons, it was widely adopted throughout different communities in linguistics and has also stimulated psycholinguistic research (e.g., Almor, 1999; Brown-Schmidt, Byron, and Tanenhaus, 2005; Gordon, Grosz, and Gilliom, 1993; Gordon and Hendrick, 1998; Stevenson, 2002).

However, with the wide acceptance of Centering in different linguistic communities, its formulation and understanding have also advanced. Centering Theory evolved into a family of theories that differ with respect to certain assumptions and parameters. These parameters include definitional issues (‘utterance’, ‘center’, ‘realized in an utterance’, ‘pronoun’), the assumed salience rankings (e.g., based on word order and the types of referring expressions rather than grammatical roles), but also fundamental claims (e.g., whether or not a unique backward-looking center is assumed, what types of transitions are considered, and whether the backward-looking center is to be searched for only in the immediately preceding utterance). Some of these parameters have been studied by Poesio, Stevenson, Di Eugenio, and Hitzeman (2004).
2.3. Multidimensional models of salience

Besides the parameters of salience and its realization in discourse, the nature of salience itself has also been studied. In the light of newer experimental and empirical findings, this issue has evolved into a major research question in salience research (e.g., Arnold, 2005; Kaiser, 2006; Mulkern, 2007; Navaretta, 2002). The research asks whether a single dimension of salience is to be assumed as the basis for the choice of referring expressions, whether salience involves multiple functional aspects that are inherently independent of each other, or whether other factors besides salience have to be taken into consideration.

Influential models such as the Givenness Hierarchy proposed by Gundel et al. (1993), but also Centering Theory, postulate one single scale or hierarchy of degrees of salience along which discourse referents are organized, but alternatives to these unidimensional models were proposed very early. Givón’s concept of topicality (1983, 2001) involves two dimensions (‘anaphoric topicality’ and ‘cataphoric topicality’) that represent distinct functional and cognitive aspects of the processing of referring expressions in discourse. ‘Anaphoric topicality’ concerns memory representations of the structure of the preceding discourse, and is therefore comparable to the notion of salience in Centering Theory. ‘Cataphoric topicality’ relates to the structure of the subsequent discourse and the speaker’s current focus of attention. Both dimensions can be compared to the differentiation between backward-looking and forward-looking aspects of centers (referring expressions), but are formalized here by means of different conceptions of topicality or salience.

This functional differentiation between two dimensions of salience was later recast as similar dichotomies between inherent salience and imposed salience (Clamons, Mulkern, and Sanders, 1993; Mulkern, 2007), activation and prominence (Chafe, 1994), or givenness and relevance (Gundel and Mulkern, 1997; Pattabhiraman and Cercone, 1990). Such multi-dimensional models of salience receive support from newer experimental and empirical studies. There is evidence that pronouns and demonstratives differ systematically in their sensitivity to different salience factors, as shown for Finnish and Dutch (Kaiser and Trueswell, 2004, 2011; see also Brown-Schmidt et al., 2005). If such differences between salience factors are preserved, the cognitive representation of these factors cannot be collapsed into a single dimension of salience.
Navaretta (2002), Arnold (2005), and Kaiser (2006) showed that functionally different constructions, i.e., topic-marking constructions as opposed to focus-marking constructions, have similar implications for the choice of pronominal as opposed to nominal expressions in the forthcoming discourse. This is taken as evidence that both topic-marking and focus-marking constructions have a specific forward-looking function that exists independently of backward-looking aspects of salience. Moreover, a psycholinguistic study by Stevenson, Crawley, and Kleinman (1994; see also Miltsakaki, 2007) points to a crucial role of semantic factors in focusing, such as the assignment of thematic roles, which can be interpreted as a forward-looking aspect of discourse coherence.

2.4. The contributions

The section on entity-based salience comprises four contributions that document the recent trend to investigate the multidimensionality of salience in discourse.

Olga Krasavina’s contribution concerns the characterization of Russian demonstratives in terms of salience (activation). A standard assumption in unidimensional accounts of salience in discourse (e.g., Gundel et al., 1993) is that demonstratives are characterized by a degree of salience intermediate between (highly salient) pronominals and (non-salient) non-demonstrative nominals. This hypothesis is analyzed and further developed on the basis of a case study of narratives. In addition, a series of experiments is described that assesses factors that affect the use of demonstratives. One main conclusion of Krasavina’s study is that the relationship between salience and the use of Russian demonstrative NPs seems to be rather loose.

Andrey Filchenko’s contribution is a typological study that presents material that can be interpreted in terms of Centering Theory. However, Centering is originally restricted to textual cues and expressions of salience, whereas the contexts examined in Filchenko’s contribution are highly situation-dependent. He observes the use of certain grammatical functions by speakers of Khanty, a Uralic language in North Siberia, in situations where speakers refer to themselves and at the same time are interested in concealing their agenthood in certain activities. To do this, they use a locative-marked ergative construction to express an ‘intransitive/transitive subject relation’, in terms of Dixon (1994). Pragmatically, this amounts to a ‘demoting’ construction, as compared with an agentive subject construction: it allows their agenthood to be inferred while their reports avoid expressing it directly. The motivation is that, in the situations described, expressing agenthood directly would break a cultural taboo. The use of the specific locative-marked ergative constructions can thus be analyzed as being motivated by an act of intentionally diminishing the salience of an entity (i.e., the speaker, in the examples brought forward).

Ralph Rose’s contribution addresses the question whether a discourse entity’s syntactic salience is indeed the major determinant of the use of referring expressions. Starting from the observation that in English, syntactic role is often confounded with semantic role, he conducted a corpus study to investigate the relative contribution of syntactic and semantic factors to the use of pronominal reference. Semantic role was operationalized in terms of FrameNet (Baker, Fillmore, and Lowe, 1998) as well as in terms of PROTO-role entailments (Dowty, 1991). To analyze the corpus data, Rose adopted the notion of information value from Information Theory (Shannon, 1948). The results indicate that syntactic prominence and semantic prominence independently affect the salience of discourse entities.

Finally, Christian Chiarcos’ contribution introduces the Mental Salience Framework, a computational framework of salience metrics for the context-adequate generation of referring expressions, i.e., the choice of referring expressions, the assignment of grammatical roles, and word order preferences. The Mental Salience Framework describes the realization of referring expressions on the basis of two dimensions of salience: backward-looking (hearer-centered) salience and forward-looking (speaker-centered) salience. On this basis, an integrated architecture for aspects of attention control in discourse is proposed, which provides necessary preconditions for the technical operationalization of multi-dimensional accounts of salience in discourse.

3. Beyond entities in discourse

3.1. Salience of propositions

We turn now to the salience of text utterances. The similarity of these (or rather their semantic counterparts, propositions) to entities becomes apparent when their processing is modeled by means of a hierarchically structured text representation in which so-called rhetorical relations (or discourse relations) specify the linking of each text utterance to the preceding text (Asher and Lascarides, 2003; Lascarides and Asher, 2007; Mann and Thompson, 1988). Whenever a text utterance is processed, its attachment site has to be determined, i.e., some node in the representation constructed so far. Attachment to that node may not be trivial, and in fact resembles the choice of a salient antecedent.
Salience of propositions is regulated both by input and output contexts. On the one hand, there is a ‘backward orientation’: the choice of an attachment site means exploiting the salience contours of the actual input context. The input context consists of a tree-like structure, to one of whose nodes the constituent is to be attached. An important hypothesis here is that salient nodes are on the rightmost branch of the tree, which leads down from the root node to the node k of the constituent that was attached just before. This branch makes up the so-called Right Frontier of the actual text representation (cf. Webber, 1991). The most salient node on this branch is k, its ‘leaf’. Nodes that are above k in the tree can likewise be attachment sites, but require that a more general level of the text is aimed at, a step called discourse popping. Nodes that are not on the Right Frontier can be used for attachment only by specific linguistic means, one of them being the use of it-clefts (cf. Knott, Oberlander, O’Donnell, and Mellish, 2000). Clearly, such nodes are much less salient within the salience contour that is presented by an input context. As will become apparent in what follows, the output contexts of text utterances are also determined with respect to coherence structure.

3.2. An example

Consider the following text, taken from Lascarides and Asher (2007):

(2)
a. John had a great evening last night. b. He had a great meal. c. He ate salmon. d. He devoured lots of cheese. e. He won a dancing competition. f. ??It was a beautiful pink.
To describe the coherence structure in (2) we use the terminology of Segmented Discourse Representation Theory (SDRT; cf. Asher, 1993; Asher and Lascarides, 2003; Lascarides and Asher, 2007). SDRT combines classical Discourse Representation Theory (DRT; Kamp, 1981; Kamp and Reyle, 1993) with discourse relations that determine, in a stepwise processing of a text, the type of coherence of each of its utterances with a preceding text segment. In (2) the first relation to be established is between text utterances (2.a) and (2.b). Here one can exploit the fact that (2.a) presents an event of which the event mentioned in (2.b) is a part. This normally licenses the inference of the relation Elaboration (cf. Mann and Thompson, 1988). Utterances (2.c) and (2.d) can both attach to (2.b), using the latter as an input context for linking them, again by Elaboration. There is also a temporal sequence of the events
mentioned in (2.c) and (2.d), which establishes a Narration relation (Asher, 1993). A discourse structure as in (3) is obtained, with each of the four utterances abbreviated by a characteristic noun. These utterance contents specify the nodes ka–kd of a graph structure, with the subordinating relation Elaboration creating a dominance relation between ka and kb, etc. In contrast, Narration, which is coordinating, establishes a sister relation between kc and kd:
(3)
Now, in (2.e) the text mentions an event that cannot be interpreted as a sub-event of the event in (2.b). But the input context made up by (2.a–2.d), and represented in (3), offers the possibility of interpreting it as a sub-event of the event mentioned in (2.a). The node ke introduced by (2.e) is therefore attached, by Elaboration, to the higher node ka, i.e., ‘discourse popping’ occurs. The event is also related to (2.b) by Narration, cf. (4).
(4)
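The attachment steps that lead to (3) and (4) can be traced with a small Python sketch. It is an illustration under simplifying assumptions, not an implementation of SDRT: the discourse structure is reduced to a plain tree, a subordinating relation makes the new node a daughter of its attachment site, a coordinating relation makes it a sister of that site, and only one relation is recorded per attachment (so the additional Narration link between ke and kb is left out). The names `Discourse`, `attach`, and `right_frontier` are hypothetical, and the two relation sets contain only the relations named in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Relation types as classified in the text (illustrative, not exhaustive).
SUBORDINATING = {"Elaboration", "Explanation"}
COORDINATING = {"Narration", "Result"}


@dataclass
class Node:
    label: str
    parent: Optional["Node"] = None
    relation: Optional[str] = None  # relation used when this node was attached


class Discourse:
    """Simplified tree model of a discourse structure (an assumption of this sketch)."""

    def __init__(self, root_label: str) -> None:
        self.root = Node(root_label)
        self.last = self.root  # most recently attached node, the 'leaf' k

    def right_frontier(self) -> List[str]:
        """Labels on the branch from the root down to the last attached node."""
        path, node = [], self.last
        while node is not None:
            path.append(node.label)
            node = node.parent
        return list(reversed(path))

    def attach(self, label: str, site_label: str, relation: str) -> Node:
        """Attach a new node to a site, which must lie on the Right Frontier."""
        site = self.last
        while site is not None and site.label != site_label:
            site = site.parent
        if site is None:
            raise ValueError(f"{site_label} is not on the Right Frontier")
        if relation in SUBORDINATING:
            parent = site            # new node becomes a daughter of the site
        elif relation in COORDINATING:
            parent = site.parent     # new node becomes a sister of the site
            if parent is None:
                raise ValueError("this sketch cannot coordinate with the root")
        else:
            raise ValueError(f"unknown relation: {relation}")
        node = Node(label, parent, relation)
        self.last = node
        return node


d = Discourse("ka")                      # (2.a)
d.attach("kb", "ka", "Elaboration")      # (2.b) elaborates (2.a)
d.attach("kc", "kb", "Elaboration")      # (2.c) elaborates (2.b)
d.attach("kd", "kc", "Narration")        # (2.d) continues (2.c)
frontier_3 = d.right_frontier()          # Right Frontier of structure (3)
d.attach("ke", "ka", "Elaboration")      # (2.e): discourse popping to ka
frontier_4 = d.right_frontier()          # Right Frontier of structure (4)
```

In this sketch the Right Frontier is ka, kb, kd after (2.d) and ka, ke after (2.e); a later call such as `d.attach("kf", "kc", "Elaboration")` raises a `ValueError`, because kc is no longer on the Right Frontier.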
The discourse structure obtained so far poses a problem for attachment of (2.f). By its content, it is best interpreted as elaborating the ‘Salmon constituent’, (2.c). But that constituent is beyond the Right Frontier, which by now consists
of the branch ka–ke. The oddness of (2.f) is thus explained by the specific input context that has been built up so far.
We may now sketch the effect of the choice of specific coherence relations on output contexts. In several approaches, relations are classified according to the ‘weight’ of their arguments. In Rhetorical Structure Theory (RST; cf. Carlson and Marcu, 2001; Mann and Thompson, 1988), most relations distinguish between their nucleus and satellite arguments; in SDRT, relations are either coordinating, i.e., they relate sister nodes (Narration, Result, etc.), or subordinating, i.e., they relate a daughter node to its mother node (Elaboration, Explanation, etc.). This difference shapes different salience contours in the output context in the following way: With a coordinating relation, the newly attached node becomes the lowest element of the Right Frontier and the most salient node for the next discourse constituent to be attached. In the example above, this happened when (2.d) was attached by Narration, that is, by a coordinating relation. In (3) the node kd has become the most salient node, and kc, being no longer on the Right Frontier, has lost its salience for subsequent text utterances. In contrast, a subordinating relation preserves the position of an attachment site k on the Right Frontier, as instantiated by the link of (2.b) to (2.a) by Elaboration, cf. the position of nodes ka and kb in (3) and (4).
3.3. Correspondence between salience of entities and of propositions
Indirectly, an interaction between salience of propositions and of entities has been acknowledged, as rhetorical structure restricts anaphor resolution across discourse constituents (Asher, 1993; Cristea, Ide, and Romary, 1998; Fox, 1987; Grosz and Sidner, 1986; for an overview see Chiarcos and Krasavina, 2008). Classical Discourse Representation Theory (DRT; Kamp, 1981; Kamp and Reyle, 1993) already discusses restrictions on the accessibility of antecedents due to sentence-internal embedding of clauses. Conversely, anaphor resolution has been used as a testbed for classifying specific discourse relations as coordinating or subordinating (Asher and Vieu, 2005). Anaphoric accessibility has also been used for the automated parsing of discourse structure (Schauer, 2000). Yet a proposal that relates the salience of entities and of propositions in a principled way is still a desideratum. However, Knott et al. (2000) and von Heusinger (2007) provide promising first attempts in this regard.
3.4. The contributions
In the present volume, two papers address issues of salience in discourse structure and its relationship to the salience of entities.
The contribution by Wiebke Ramm addresses the translation of discourse relations as expressed by a given connective. In a corpus of translations between Norwegian and German, she looks at the role of the ‘additive’ Norwegian connective og (Engl. and) and its German counterpart, und. Both og and und express coordinating relations. Interestingly, however, the actual distribution of the two connectives differs considerably between the two languages. In many cases Norwegian og corresponds to a different connective in German, one that signals a subordinating relation. Since coordination and subordination are correlated with salience differences, these facts invite questions about the universality of a direct coupling of connectives and discourse relations, and about a possible influence of literary style.
In the contribution by Roland Hinterhölzl and Svetlana Petrova, different linguistic means, namely positions of verb arguments, are related to the salience of both entities and propositions. Analyzing the Old High German Tatian translation from the 9th century, they observe a correlation between the fronting of NPs and their status as discourse-old, i.e., given. As an effect, V2 constructions arise. This contrasts with the regularity that discourse-new entities are apparently referred to in a post-verbal position, yielding V1 constructions. Interestingly, there is a second observation, namely that the coordination/subordination difference between discourse relations may override the first regularity: V2 appears in subordinated constituents, V1 in coordinated ones. The paper thus discusses an aspect of the relationship between two major types of salience: utterance-internal salience, which plays a role in information structure and in the ranking of forward-looking centers in centering, and the salience of whole utterances, which serves to establish coherence relations within a text.
4. Beyond purely linguistic salience
The issue of salience is not specific to linguistic information processing. Salience has been a topic of research in a wide range of disciplines and areas of study, including judgments of similarity (e.g., Tversky, 1977), social cognition (e.g., Higgins, 1996), causal attribution (e.g., Taylor and Fiske, 1978), and music perception (Parncutt, 1994; see also Noll, 2005). In fact, salience is of relevance for all kinds of mental processes. A common problem in linguistic as well as in non-linguistic cognition is capacity overload due to too much information. Hence, for all mental processes, there is the need to select which part of the available information is to be processed at a given point in time. It is a common assumption that this selection is affected by the salience of stimuli. For instance, some stimuli may “pop out” because
they are novel or one-of-a-kind. Being more salient, these stimuli will capture attention.
4.1. Visual salience and situated language processing
The need for selection of information is particularly important in visual processing. Thus, it is not surprising that the issue of salience has been a topic of much research in the area of visual perception. Many studies investigated patterns of eye movements, based on the fact that eye movements are an overt manifestation of selective attention. A highly influential account of visual salience is the biologically motivated saliency map approach of Itti and Koch (2000). In their computational model, salience is established in a bottom-up fashion from perceptual factors such as contrast in color, intensity, and orientation. Indeed, there is empirical support that such purely stimulus-driven factors affect the location of gaze fixations in free viewing of meaningless visual patterns (Parkhurst, Law, and Niebur, 2002). However, selective attention in visual processing is by no means governed solely by bottom-up salience. Rather, attention allocation in scene viewing appears to be affected by both perceptual, bottom-up factors and cognitive, top-down factors. Recent evidence suggests that during active viewing of meaningful scenes, fixation location is primarily determined by cognitive factors, such as scene knowledge and task knowledge (Henderson, Brockmole, Castelhano, and Mack, 2007; see also Einhäuser, Rutishauser, and Koch, 2008). A naturally occurring activity that involves active viewing of meaningful scenes is situated language processing. Findings of studies using the so-called visual-world paradigm point to an interaction between visual processing and language processing during situated language comprehension (e.g., Ellsiepen, Knoeferle, and Crocker, 2008; Kamide, Altmann, and Haywood, 2003; Tanenhaus, Spivey-Knowlton, Eberhard, and Sedivy, 1995).
On the one hand, utterance comprehension directs attention in the visual scene and on the other hand, scene information guides the interpretation of utterances. Remarkably, there is also empirical evidence that language processing in a situated context is not only affected by visual information per se but also by action-based affordances of the situation. It was found that the interpretation of noun phrases depends on the compatibility between to-be-performed actions and objects within the situation (Chambers, Tanenhaus, Eberhard, Filip, and Carlson, 2002; Chambers, Tanenhaus, and Magnuson, 2004). Moreover, experimental studies on situated communication show that when interlocutors refer to objects of the immediate physical environment, the choice of the referring expression is determined by visibility and spatial distance. For
example, the proportion of deictic expressions is considerably higher when the reference objects are visible to both interlocutors than when they are not visible to the addressee (Clark and Krych, 2004), and the use of deixis accompanied by pointing gestures increases when the reference objects are spatially close (at arm's length) compared with when they are further away (Bangerter, 2004).
4.2. Non-linguistic salience and text comprehension
What might be somewhat surprising at first glance is that non-linguistic salience plays a role not only in situated communication but also in plain, non-situated text comprehension. We consider two very different examples of this: typographical properties and properties of the described situation. Typographical marking (e.g., underlining, boldface, italicization, and capitalization) is a means of accentuation in written language, similar to prosodic marking in spoken language. Experimental findings indicate that both typographical marking (italicization) and prosodic marking (focus-driven word stress) capture attention and enhance depth of processing (A.S.J. Sanford, A.J. Sanford, Molle, and Emmott, 2006). There is also empirical evidence that different kinds of typographical marking have different functions: capitalization signals modulatory stress whereas italicization signals contrastive stress (McAteer, 1992). According to the saliency map approach (Itti and Koch, 2000), typographically marked words are highly visually salient: they stand out from the rest of the text. However, the fact that the visual marking of a word conveys its informational salience is driven, beyond bottom-up perceptual salience, by a top-down factor, namely the readers’ knowledge of the communicative function of typographic signals. This is especially true for particular cases of typographically marked information, namely cross-references in dictionaries and hyperlinks in electronic documents.
Readers of dictionary entries or hypertexts expect that the target texts of cross-references or hyperlinks contain information that is related to the words that are marked as cross-references or hyperlinks, and this expectation is based upon their knowledge of conventions.
We now turn to the second example of non-linguistic salience in text comprehension, properties of the described situation. As was mentioned before, the resolution of ambiguous pronominal anaphoric reference is determined not only by structural factors but also by other factors such as the thematic roles of the possible antecedents (Stevenson et al., 1994). Stevenson and her co-authors attribute this finding to the focusing of entities in comprehenders’ models of described events by proposing a default focus on the thematic role that is associated with the endpoint of the event. Results from other studies demonstrate
that anaphor resolution is also affected by properties of the described situation, for example, the presence of the reference entity in the described situation (e.g., Glenberg et al., 1987) or the temporal distance between events in the described world (Kelter et al., 2004). Hence, empirical evidence points to effects of the salience of entities in the described situation over and above linguistic salience. This is perfectly in line with the simulation view of language comprehension (e.g., Barsalou, 1999; Glenberg, 2007; Zwaan, 2004), a theoretical approach that has gained increasing importance in language comprehension research in recent years. A core assumption of this view is that representations of described situations share the same representational format and recruit the same mental subsystems as representations that are constructed during perception and interaction with the world. A corollary of this assumption is that factors that affect perception and action (such as spatial and temporal variables) should have parallel effects on language comprehension.
4.3. The contributions
The contribution by John Kelleher is concerned with dialogue situated in a visual context. He devises an approach to integrate linguistic salience marking and visual salience with regard to their impact on reference resolution. In a situated dialogue, referring expressions can refer anaphorically to aforementioned entities or denote entities of the visual context that have not been introduced during previous discourse. To account for reference resolution in a situated dialogue, Kelleher proposes a framework that uses integrated scores of linguistic and visual salience for each object that is located in the visual context. In this framework, linguistic salience is computed by an algorithm that is inspired by Centering Theory and is based on the forward-looking centers of the preceding discourse.
The visual-salience algorithm is based on two factors that determine the visibility of the objects in the scene: object size and distance from the point of focus. These basic saliencies are weighted according to the given referring expression to be resolved, taking into account both its content and its form. The result is an integrated, reference-relative salience score for each of the objects in the dialogue situation, with the highest-scoring object being selected as the referent of the referring expression. Hence, in Kelleher’s framework, the reference resolution process exploits both the linguistic context and the visual context of the situated dialogue.
Birgitta Bexten’s contribution focuses on hypertexts. She proposes an integrated account of both linguistic and hypertextual salience marking within the framework of Centering Theory. A characteristic feature of hypertexts is their branching structure of information through hypertext links. Readers of
hypertexts act on the assumption that the target text of a hypertext link provides additional information on the link-marked entity. To put it in Bexten’s words, a hypertext link allows for predictions of the content of the target text. In this respect, a hypertext link resembles Centering Theory’s notion of a forward-looking center. It is by virtue of this parallel that Bexten proposes that hypertext links should be regarded as preferred centers as well. What makes hypertext an interesting theoretical case is that hypertext links do not necessarily coincide with the linguistic criteria that define forward-looking centers. As a consequence, a single utterance of a hypertext can contain more than one preferred center. In her contribution, Bexten develops a descriptive approach to coherence in hypertexts, taking into account the commonalities and differences between hypertextual and linguistic salience marking.
The contribution by Berry Claus takes a look at the issue of salience from the perspective of the simulation view of language comprehension. Proponents of this view (e.g., Barsalou, 1999; Glenberg, 2007; Zwaan, 2004) assume that language comprehension is tantamount to mentally simulating the actual experience of the described situations. In her contribution, Claus proposes that (non-linguistic) salience of discourse entities derives from the mental simulations constructed during language comprehension. Her considerations are confined to narrative text comprehension, and she argues that what makes an entity salient, over and above linguistic factors, may depend on facets of the narrated situation. The main implication of the simulation view with regard to non-linguistic salience is that the most salient entities are those that are present in the described situation.
Claus reviews empirical support for this notion stemming from studies that investigated whether the mental accessibility of entities mentioned in narrative text is affected by properties of the described situation.
References
Almor, Amit 1999 Noun-phrase anaphora and focus: the informational load hypothesis. Psychological Review 106: 748–765.
Ariel, Mira 1990 Accessing noun-phrase antecedents. London: Routledge.
Arnold, Jennifer 2005 Marking salience: The similarity of topic and focus. [On-line] Available: http://www.unc.edu/~jarnold/papers/top.foc.html
Asher, Nicolas 1993 Reference to abstract objects in discourse. Dordrecht: Kluwer.
Asher, Nicolas and Alex Lascarides 2003 Logics of conversation. Cambridge: Cambridge University Press.
Asher, Nicolas and Laure Vieu 2005 Subordinating and coordinating discourse relations. Lingua 115: 591–610.
Baker, Collin F., Charles J. Fillmore, and John B. Lowe 1998 The Berkeley FrameNet project. In Proceedings of the COLING-ACL ’98 Conference, 86–90. Montreal: Association for Computational Linguistics.
Bangerter, Adrian 2004 Using pointing and describing to achieve joint focus of attention in dialogue. Psychological Science 15: 415–419.
Barsalou, Lawrence W. 1999 Perceptual symbol systems. Behavioral and Brain Sciences 22: 577–660.
Brandt, Margarethe 1996 Subordination und Parenthese als Mittel der Informationsstrukturierung in Texten. In Ebenen der Textstruktur. Sprachliche und kommunikative Prinzipien, Wolfgang Motsch (ed.), 211–240. Tübingen: Niemeyer.
Brennan, Susan E., Marilyn W. Friedman, and Carl J. Pollard 1987 A Centering approach to pronouns. In Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, 155–163. Stanford, Cal.
Brown-Schmidt, Sarah, Donna K. Byron, and Michael K. Tanenhaus 2005 Beyond salience: interpretation of personal and demonstrative pronouns. Journal of Memory and Language 53: 292–313.
Carlson, Lynn and Daniel Marcu 2001 Discourse tagging manual. ISI Tech Report ISI-TR-545.
Carlson, Lynn, Daniel Marcu, and Mary Ellen Okurowski 2003 Building a discourse-tagged corpus in the framework of rhetorical structure theory. In Current directions in discourse and dialogue, Jan van Kuppevelt and Ronnie Smith (eds.), 85–112. New York: Kluwer Academic Publishers.
Chafe, Wallace 1976 Givenness, contrastiveness, definiteness, subjects, topics, and point of view. In Subject and topic, Charles N. Li (ed.), 25–56. New York: Academic Press.
Chafe, Wallace 1994 Discourse, consciousness, and time. The flow and displacement of conscious experience in speaking and writing. Chicago: University of Chicago Press.
Chambers, Craig G., Michael K. Tanenhaus, Kathleen M. Eberhard, Hana Filip, and Greg N. Carlson 2002 Circumscribing referential domains during real-time language comprehension. Journal of Memory and Language 47: 30–49.
Chambers, Craig G., Michael K. Tanenhaus, and James S. Magnuson 2004 Actions and affordances in syntactic ambiguity resolution. Journal of Experimental Psychology: Learning, Memory, and Cognition 30: 687–696.
Chiarcos, Christian and Olga Krasavina 2008 Rhetorical distance revisited: A parameterized approach. In Constraints in Discourse, Anton Benz and Peter Kühnlein (eds.), 97–115. Amsterdam: John Benjamins.
Clamons, C. Robin, Ann E. Mulkern, and Gerald Sanders 1993 Salience signaling in Oromo. Journal of Pragmatics 19: 519–536.
Clark, Herbert H. and Meredyth A. Krych 2004 Speaking while monitoring addressees for understanding. Journal of Memory and Language 50: 62–81.
Cristea, Dan, Nancy Ide, and Laurent Romary 1998 Veins Theory: A model of global discourse cohesion and coherence. In Proceedings of the 36th Meeting of the Association for Computational Linguistics and 17th Conference on Computational Linguistics, 281–285. San Francisco.
Dixon, Robert M. W. 1994 Ergativity. Cambridge: Cambridge University Press.
Dowty, David 1991 Thematic proto-roles and argument selection. Language 67: 547–619.
Einhäuser, Wolfgang, Ueli Rutishauser, and Christof Koch 2008 Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. Journal of Vision 8: 1–19.
Ellsiepen, Emilia, Pia Knoeferle, and Matthew W. Crocker 2008 Incremental syntactic disambiguation using depicted events: Plausibility, co-presence and dynamic presentation. In Proceedings of the 30th Annual Conference of the Cognitive Science Society, Brad C. Love, Ken McRae, and Vladimir M. Sloutsky (eds.), 2398–2403. Austin, TX: Cognitive Science Society.
Fillmore, Charles J. 1977 Topics in lexical semantics. In Current issues in linguistic theory, Roger W. Cole (ed.), 76–138. Bloomington: Indiana University Press.
Fox, Barbara A. 1987 Discourse structure and anaphora: Written and conversational English. Cambridge: Cambridge University Press.
Givón, Talmy 1983 Introduction. In Topic continuity in discourse: a quantitative cross-language study, Talmy Givón (ed.), 5–41. Amsterdam: John Benjamins.
Givón, Talmy 1995 Functionalism and grammar. Amsterdam: John Benjamins.
Givón, Talmy 2001 Syntax (2nd edition). Amsterdam: John Benjamins.
Glenberg, Arthur M. 2007 Language and action: Creating sensible combinations of ideas. In Oxford Handbook of Psycholinguistics, Gareth Gaskell (ed.), 361–370. Oxford, UK: Oxford University Press.
Glenberg, Arthur M., Marion Meyer, and Karen Lindem 1987 Mental models contribute to foregrounding during text comprehension. Journal of Memory and Language 26: 69–83.
Gordon, Peter C., Barbara J. Grosz, and Laura A. Gilliom 1993 Pronouns, names, and the centering of attention in discourse. Cognitive Science 17: 311–347.
Gordon, Peter C. and Randall Hendrick 1998 The representation and processing of coreference in discourse. Cognitive Science 22: 389–424.
Groenendijk, Jeroen and Martin Stokhof 1991 Dynamic predicate logic. Linguistics and Philosophy 14: 39–100.
Grosz, Barbara 1981 Focusing and description in natural language dialogues. In Elements of discourse understanding, Aravind K. Joshi, Bonnie L. Webber, and Ivan A. Sag (eds.), 85–105. Cambridge: Cambridge University Press.
Grosz, Barbara J., Aravind K. Joshi, and Scott Weinstein 1983 Providing a unified account of definite noun phrases in discourse. In Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics, 44–50.
Grosz, Barbara J., Aravind K. Joshi, and Scott Weinstein 1995 Centering: A framework for modelling the local coherence of discourse. Computational Linguistics 21: 203–225.
Grosz, Barbara J. and Candace L. Sidner 1986 Attention, intentions, and the structure of discourse. Computational Linguistics 12: 175–204.
Gundel, Jeanette K., Nancy A. Hedberg, and Ron Zacharski 1993 Cognitive status and the form of referring expressions in discourse. Language 69: 247–307.
Gundel, Jeanette K. and Ann Mulkern 1997 Relevance, referring expressions and the Givenness Hierarchy. In Proceedings of the Workshop on Relevance Theory. University of Hertfordshire.
Hajičová, Eva and Jarka Vrbova 1982 On the role of the hierarchy of activation in the process of natural language understanding. In COLING 82 – Proceedings of the Ninth International Conference of Computational Linguistics, Jan Horecký (ed.), 107–113. Amsterdam: North Holland.
Henderson, John M., James R. Brockmole, Monica S. Castelhano, and Michael Mack 2007 Visual saliency does not account for eye movements during visual search in real-world scenes. In Eye movements: A window on mind and brain, Roger P. G. van Gompel, Martin H. Fischer, Wayne S. Murray, and Robin L. Hill (eds.), 537–562. Oxford: Elsevier.
von Heusinger, Klaus 1997 Salienz und Referenz: Der Epsilonoperator in der Semantik der Nominalphrase und anaphorischer Pronomen. Berlin: Akademie Verlag.
von Heusinger, Klaus 2007 Accessibility and definite noun phrases. In Anaphors in text. Cognitive, formal and applied approaches to anaphoric reference, Monika Schwarz-Friesel, Manfred Consten, and Mareile Knees (eds.), 123–144. Amsterdam: John Benjamins.
Higgins, E. Tory 1996 Knowledge activation: Accessibility, applicability, and salience. In Social psychology: Handbook of basic principles, E. Tory Higgins and Arie W. Kruglanski (eds.), 133–168. New York: The Guilford Press.
Itti, Laurent and Christof Koch 2000 A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research 40: 1489–1506.
Joshi, Aravind K. and Scott Weinstein 1981 Control of inference: the role of some aspects of discourse structure-centering. In Proceedings of the Seventh International Joint Conference on Artificial Intelligence, 385–387.
Kaiser, Elsi 2006 Effects of topic and focus on salience. In Proceedings of Sinn und Bedeutung 10, Christian Ebert and Cornelia Endriss (eds.), 139–154. Berlin: ZAS Working Papers in Linguistics, Vol. 44.
Kaiser, Elsi and John Trueswell 2004 The referential properties of Dutch pronouns and demonstratives: Is salience enough? In Proceedings of the Conference „sub8 – Sinn und Bedeutung“, Cecile Meier and Matthias Weisgerber (eds.), 137–149. Universität Konstanz: Arbeitspapier Nr. 177, FB Sprachwissenschaft. Konstanz, Germany.
Kaiser, Elsi and John Trueswell to appear 2011 Investigating the interpretation of pronouns and demonstratives in Finnish: Going beyond salience. In The processing and acquisition of reference, Edward Gibson and Neal J. Pearlmutter (eds.). Cambridge, Mass.: MIT Press.
Kamide, Yuki, Gerry T. M. Altmann, and Sarah L. Haywood 2003 The time-course of prediction in incremental sentence processing: Evidence from anticipatory eye movements. Journal of Memory and Language 49: 133–156.
Kamp, Hans 1981 A theory of truth and semantic representation. In Formal methods in the study of language, Jeroen A. G. Groenendijk, Theo M. V. Janssen, and Martin B. J. Stokhof (eds.), 277–322. Amsterdam: Foris.
Kamp, Hans and Uwe Reyle 1993 From discourse to logic. Dordrecht: Kluwer.
Kelter, Stephanie, Barbara Kaup, and Berry Claus 2004 Representing a described sequence of events: A dynamic view of narrative comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition 30: 451–464.
Knott, Alistair, Jon Oberlander, Mick O’Donnell, and Chris Mellish 2000 Beyond elaboration: The interactions of relations and focus in coherent text. In Text representation: linguistic and psycholinguistic aspects, Ted Sanders, Joost Schilperoord, and Wilbert Spooren (eds.), 181–196. Amsterdam: John Benjamins.
Krifka, Manfred 2007 Basic notions of information structure. In Interdisciplinary studies of information structure 6, Caroline Féry and Manfred Krifka (eds.), Potsdam.
Lascarides, Alex and Nicolas Asher 2007 Segmented Discourse Representation Theory: Dynamic semantics with discourse structure. In Computing meaning, Volume 3, Harry Bunt and Reinhard Muskens (eds.), 87–124. Berlin: Springer.
Lewis, David 1979 Scorekeeping in a language game. In Semantics from different points of view, Rainer Bäuerle, Urs Egli, and Arnim von Stechow (eds.), 172–187. Berlin: Springer.
Mann, William C. and Sandra A. Thompson 1988 Rhetorical structure theory: Toward a functional theory of text organization. Text 8: 243–281.
McAteer, Erica 1992 Typeface emphasis and information focus in written language. Applied Cognitive Psychology 6: 345–359.
Miltsakaki, Eleni 2007 A rethink of the relationship between salience and anaphora resolution. In Anaphora: Analysis, algorithms and applications, António Branco (ed.), 91–96. Berlin: Springer.
Molnár, Valéria 1993 Zur Pragmatik und Grammatik des TOPIK-Begriffes. In Wortstellung und Informationsstruktur, Marga Reis (ed.), 155–202. Tübingen: Niemeyer.
Mulkern, Ann 2007 Knowing who’s important: Relative discourse salience and Irish pronominal forms. In The grammar-pragmatics interface: Essays in honor of Jeanette K. Gundel, Nancy A. Hedberg and Ron Zacharski (eds.), 113–142. Amsterdam: John Benjamins.
Muskens, Reinhard 1991 Anaphora and the logic of change. In Logics in AI, Proceedings of JELIA ’90, Volume 478 of Lecture Notes in Computer Science, Jan van Eijck (ed.), 412–427. Berlin: Springer.
Navarretta, Costanza 2002 Combining information structure and Centering-based models of salience for resolving intersentential pronominal anaphora. In Proceedings of the 4th Discourse Anaphora and Anaphora Resolution Colloquium, Tony McEnery, António Branco, and Ruslan Mitkov (eds.), 135–140. Lisbon: Edições Colibri.
Noll, Thomas 2005 Salience and musical discourse. In Salience in discourse. Proceedings of the 6th Workshop on Multidisciplinary Approaches to Discourse, Manfred Stede, Christian Chiarcos, Michael Grabski, and Luuk Lagerwerf (eds.), 5–6. Amsterdam: Stichting Neerlandistiek; Münster: Nodus Publikationen.
Osgood, Charles E. and J. Kathryn Bock 1977 Salience and sentencing: Some production principles. In Sentence production: Developments in research and theory, Sheldon Rosenberg (ed.), 89–140. Hillsdale: Erlbaum.
Parkhurst, Derrick, Klinton Law, and Ernst Niebur 2002 Modeling the role of salience in the allocation of overt visual attention. Vision Research 42: 107–123.
Parncutt, Richard 1994 A perceptual model of pulse salience and metrical accent in musical rhythms. Music Perception 11: 409–464.
Pattabhiraman, Thiyagarajasarma and Nick Cercone 1990 Selection: Salience, relevance and the coupling between domain-level tasks and text planning. In Proceedings of the 5th International Workshop on Natural Language Generation, 79–86. Pittsburgh.
Poesio, Massimo, Rosemary Stevenson, Barbara Di Eugenio, and Janet Hitzeman 2004 Centering: a parametric theory and its instantiation. Computational Linguistics 30: 309–364.
Prince, Ellen F. 1981 Toward a taxonomy of given-new information. In Radical Pragmatics, Peter Cole (ed.), 223–256. New York: Academic Press.
van der Sandt, Rob 1992 Presupposition projection as anaphora resolution. Journal of Semantics 9: 333–377.
Sanford, Alison J. S., Anthony J. Sanford, Jo Molle, and Catherine Emmott 2006 Shallow processing and attention capture in written and spoken discourse. Discourse Processes 42: 109–130.
Schauer, Holger 2000 From elementary discourse units to complex ones. In Proceedings of 1st SIGdial Workshop on Discourse and Dialogue, Laila Dybkjær, Kôiti Hasida and David Traum (eds.), 46–55. Hong Kong.
Sgall, Petr, Eva Hajičová, and Jarmila Panevová 1986 The meaning of the sentence in its semantic and pragmatic aspects. Dordrecht: Reidel.
Shannon, Claude E. 1948 A mathematical theory of communication. The Bell System Technical Journal 27: 379–423, 623–656.
Sidner, Candace L. 1978 The use of focus as a tool for disambiguation of definite noun phrases. In Theoretical issues in natural language processing, TINLAP 2, David L. Waltz (ed.), 86–95. Association for Computing Machinery, University of Illinois at Urbana-Champaign.
Sidner, Candace L. 1981 Focusing for the interpretation of pronouns. American Journal of Computational Linguistics 7: 217–231.
Sidner, Candace L. 1983 Focusing in the comprehension of definite anaphora. In Computational models of discourse, Michael Brady and Robert Berwick (eds.), 267–330. Cambridge: MIT Press.
Stevenson, Rosemary 2002 The role of salience in the production of referring expressions. In Information sharing: Reference and presupposition in language generation and interpretation, Kees van Deemter and Rodger Kibble (eds.), 167–192. Stanford: CSLI Publications.
Stevenson, Rosemary J., Rosalind A. Crawley, and David Kleinman 1994 Thematic roles, focus and the representation of events. Language and Cognitive Processes 9: 519–548.
Tanenhaus, Michael K., Michael J. Spivey-Knowlton, Kathleen M. Eberhard, and Julie C. Sedivy 1995 Integration of visual and linguistic information in spoken language comprehension. Science 268: 1632–1634.
Taylor, Shelley E. and Susan T. Fiske 1978 Salience, attention, and attributions: Top of the head phenomena. In Advances in experimental psychology – Vol. 11, Leonard Berkowitz (ed.), 249–287. New York: Academic Press.
Tomlin, Russel S. 1995 Focal attention, voice, and word order. An experimental, cross-linguistic study. In Word order in discourse, Mickey Noonan and Pamela Downing (eds.), 517–554. Amsterdam: John Benjamins.
Tversky, Amos 1977 Features of similarity. Psychological Review 84: 327–352.
Vallduví, Enric 1992 The informational component. New York: Garland.
Vallduví, Enric and Elisabeth Engdahl 1996 The linguistic realization of information packaging. Linguistics 34: 459–519.
Walker, Marilyn A., Aravind K. Joshi, and Ellen F. Prince 1998 Centering theory in discourse. Oxford: Clarendon Press.
Webber, Bonnie Lynn 1991 Structure and ostension in the interpretation of discourse deixis. Natural Language and Cognitive Processes 6: 107–35.
Zwaan, Rolf A. 2004 The immersed experiencer: toward an embodied theory of language comprehension. In The psychology of learning and motivation, Brian H. Ross (ed.), 35–62. New York: Academic Press.
Part I. Entity-based salience in discourse
Demonstratives and salience: Towards a functional taxonomy Olga Krasavina
Abstract. The current article focuses on the use of demonstratives in Russian within the broader discourse phenomenon of referential choice. It has been repeatedly claimed that referential choice and the activation level (salience) of a referent in the memory of the speaker/listener are interconnected (e.g. Chafe 1994; Tomlin and Pu 1991; Kibrik 1996; 2000). In this article, the cognitive-psychological model of Gundel et al. (1993, 2001) that specifies this connection is assessed and formulated more precisely. Following this analysis, I determine several cases of demonstrative NP use and summarize them as a model of referential choice. This study combines corpus analysis with experimental methods and presents the results obtained in three experiments. The experiments employ a questionnaire method, and involve a forced choice task and a written text-continuation exercise. The results of Experiment 1 demonstrate the effect of a time shift factor on demonstrative noun phrase (NP) use; the results of Experiments 2 and 3 cast doubt on the prevalent hypothesis that it is the prospective relevance of a discourse-new referent that stimulates demonstrative NP use when the referent is mentioned for the second time.
1. Introduction
In the last few decades, studies in referential choice¹ have enjoyed great popularity. This article specifically examines the discourse use of demonstratives in Russian. Despite considerable progress in explaining the use of full noun phrases (NPs) and pronouns, the use of demonstratives has not received proper attention. The question of what specific factors stimulate the use of demonstrative noun phrases still remains open. The subject of this study was restricted to proximal demonstratives, since the factors leading to the use of proximal and distant demonstratives may differ in nature. Proximal demonstratives have an important place in the model of referential choice – in Russian they are the third most frequent anaphoric device² after non-demonstrative full NPs and personal pronouns. This article is based on Russian data, and I cite only Russian examples below. It is quite likely, however, that some results obtained here may be relevant for a more general understanding of the use of demonstratives in discourse.

The current article presents a combination of corpus and experimental approaches. Unlike the previous studies of demonstratives in Russian, which used a limited number of casual examples (Padučeva 1985, Boguslavskaja and Murav’jeva 1987), the present study involves several hundred examples from natural discourse and aims at describing all these examples. To verify the role of some potential factors, several psycholinguistic experiments were conducted.

The term demonstrative NP, or briefly ètot X in this paper, is used to refer to a non-pronominal NP in anaphoric use³ and consists of the following lexical items: 1) proximal demonstrative ètot ‘this’ in adnominal use followed by a head noun with or without attributes, e.g. ètot mal’čik ‘this boy’ or 2) proximal demonstrative ètot ‘this one’ in nominal use with or without attributes.⁴ The term “pronoun” will refer to the third-person pronouns (on).⁵ With respect to non-demonstrative full NPs, I will henceforth use the term “plain NP” for the sake of brevity. The paradigm of ètot is presented below in Table 1.

Some influential explanatory models of referential choice emerged within the cognitive approach to discourse (Chafe 1976, Givón 1983, Gundel et al. 1993, 2001, Ariel 1994, Grosz et al. 1995). Within this approach it is claimed that the choice of a referential expression is constrained by the speaker’s evaluation of the referent’s representation in memory, or activation (salience), in the mind of the listener. Gundel et al. (1993: 275) argue that each “memory and/or attention state” in the Givenness Hierarchy (see below) “is a necessary and sufficient condition for appropriate use of a different form or forms”, and that demonstrative NPs correspond to the medium memory and/or attention state.

1. Referential choice is a selection of one referential device from a number of referential devices available (e.g., the student – this student – he) made by a speaker at the moment of utterance.
2. “Anaphoric device” is defined in this work as a form used to refer to a recently mentioned referent.
3. “Anaphora” is understood as a cohesion which points back to some previous item (Halliday and Hasan 1976).
4. Demonstratives in both syntactic uses have the same morphological forms in Russian.
5. The system of the third-person pronouns in Russian is represented by three different forms for three genders: on (masculine), ona (feminine) and ono (neuter); pl. oni ‘they’ for all genders. Russian grammatical gender is not semantically transparent, so on, for example, translates into English as ‘he’, ‘she’, or ‘it’ depending on the grammatical gender of the corresponding noun.
Table 1. Paradigm of ètot.

                Nominative  Accusative   Dative  Instrumental  Locative  Genitive
Singular
  masculine     ètot        ètot/ètogo   ètomu   ètim          ètom      ètogo
  neuter        èto⁶        èto          ètomu   ètim          ètom      ètogo
  feminine      èta         ètu          ètoj    ètoj          ètoj      ètoj
Plural          èti         èti/ètix⁷    ètim    ètimi         ètix      ètix
Since activation level is a cognitive category, it cannot be seen or directly measured (at least without developing online testing methods). Attempts to explain the use of a certain referential expression on the basis of an activation level and vice versa often result in circular reasoning. In the qualitative model of Kibrik (1996, 2000 for Russian and English), judgments as to the activation level are based on independent factors. An activation level receives a numerical characteristic. A value from zero to one is attached to the factors that prove to be relevant for the use of pronouns/full NPs, such as the distance to the antecedent, the referent’s role as a protagonist, the grammatical role of the antecedent, etc. The activation level is a sum of these values. This model is designed to explain and predict the use of a referential device on the basis of a set of observable factors.

This work follows Kibrik (2000) in adopting a number of basic assumptions (for the model of referential choice as a whole):
– referential choice is a process, conducted by a speaker at a certain moment;
– referential choice is a multi-factorial process;
– the model of referential choice should be a) predictive, and b) oriented towards the cognitive structures of the speaker at the moment of utterance.

The basic questions addressed in this article are:
1) Is the conclusion of Gundel et al. (1993) valid as far as demonstratives are concerned?
2) If not, what are the factors determining the use of a demonstrative NP?

The structure of this article is as follows. In Section 2 a connection between activation level and demonstrative NPs is investigated, and the results of the corpus study are summarized. In Section 3 the cases that remained unexplained in the corpus study are considered and the experiments that served to verify several hypotheses are discussed. The final conclusions are reported in Section 4.

6. The form èto in nominal use is a specialized device which has been excluded from consideration in this work for the sake of simplicity. The nominal èto often denotes events and situations rather than referents, e.g. Ja byl gotov k otvetu, i èto pridavalo mne uverennosti. – I was prepared to answer, and this made me feel confident. Nominal èto should not be confused with the neuter form of ètot. Nominal èto has been studied by Padučeva (1985).
7. The accusative forms ètot/ètogo and èti/ètix correspond to inanimate and animate objects, respectively.

2. A corpus study of demonstrative NP uses
The corpus study used the texts of Russian authors such as F. Iskander, K. Simonov, etc. (see the list of authors at the end of this article). All texts were written fiction. The study sample consisted of 254 examples, out of which 217 were demonstrative NP uses and 37 were plain NP uses that are interchangeable with a demonstrative NP. A demonstrative NP proved to be a relatively infrequently used device compared to the basic referential devices – plain NPs and pronouns (see Table 2 for the numbers). Examples that were functionally marked and demanded a separate consideration were excluded. The excluded cases mostly constituted lexicalized expressions like na ètot raz ‘this time’, na ètot sčet ‘in this respect’, and occurrences of demonstrative NP in contexts of internal and direct speech.⁸

Table 2. The frequency of demonstrative NP, plain NP, and pronoun uses per 1000 words (in a sample text).

plain NP   pronoun   ètot X
180        39        9
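To put the figures in Table 2 in proportion, the per-1000-word frequencies can be converted into shares of the three devices taken together. The sketch below is plain arithmetic on the published numbers; the variable names are illustrative and are not part of the original study.

```python
# Frequencies per 1000 words, copied from Table 2.
per_1000_words = {"plain NP": 180, "pronoun": 39, "ètot X": 9}

# Mentions made with any of the three devices, per 1000 words.
total = sum(per_1000_words.values())

# Share of each device among those mentions.
shares = {device: count / total for device, count in per_1000_words.items()}

for device, share in shares.items():
    print(f"{device}: {share:.1%}")
```

On these figures ètot X accounts for roughly 4% of the mentions made with the three devices, against about 79% for plain NPs and 17% for pronouns.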
2.1. Evaluation of Gundel et al. (1993)

The Givenness Hierarchy by Gundel et al. (1993) (see also Gundel et al. 2001 for the use of demonstratives) is a well-known approach which claims to provide an explanation of how referential expressions are chosen. I focus on this model in this study because of its general character. The approach in question suggests six cognitive statuses or memory and/or attention states of referent representation, each status being necessary and sufficient for the appropriate use of a certain referential device. The opposite is also true: the use of a certain referential form suggests a certain cognitive status of a referent, so that the addressee is informed about it. The formal means corresponding to a certain cognitive status can vary in different languages. According to the Givenness Hierarchy (see Table 3), demonstratives overtly signal a medium referent activation in memory (e.g. at most “familiar” or “activated”) being situated between pronouns that correspond to the highest activation (“in focus”) and plain NPs that correspond to the lowest activation (“uniquely identifiable”, “referential”, “type identifiable”).

8. More than 30% of demonstrative NP uses were excluded in total.

Table 3. The cognitive statuses and the corresponding referential forms in Russian (from Gundel et al. 1993: 284).

Statuses                                                   Forms
in focus >                                                 Ø; on ‘he’
activated >                                                èto ‘this’; to ‘that’
familiar >                                                 ètot X ‘this X’; tot X ‘that X’
uniquely identifiable > referential > type identifiable    Ø X
Some examples in my corpus appear to confirm this generalization. According to Gundel et al. (1993), the referent is “familiar” if “the addressee is able to uniquely identify the intended referent because he already has a representation of it in memory (in long-term memory if it has not been recently mentioned or perceived or in short-term memory if it has)”. The referent is “activated” if it is “represented in current short-term memory”; “activated representations may have been retrieved from long-term memory, or they may arise from the immediate linguistic or extra-linguistic context”. The referent is “in focus” if it includes “at least the topic of the preceding utterance, as well as any still relevant higher-order topics” (topic is understood as “what the speaker intends a sentence to be primarily about”; Gundel et al. 1993: 278–279).⁹ The occurrence of ètot X in the following example appears to conform to what one can assess at least as “familiar”:

(1) Oni emu ne poverili, no očen’ obradovalis’, kogda on, naščupav v karmane šineli banku konservov, predložil im poest’ pered dorogoj. V banke okazalis’ kil’ki, i oni vtroem s”eli èti kil’ki bez xleba i vody ([3]).¹⁰
‘They didn’t believe him but they were very glad when he, having discovered a tin in the pocket of his overcoat, suggested that they eat before the journey. In the tin there were sprats, and the three of them ate these sprats without bread and water.’

9. Gundel et al. (1993, 2001) often make judgements as to what memory and/or attention state a referent may have in a certain case, on the basis of the context or the referent’s properties, as quoted above. A general model of identifying cognitive statuses for certain cases has not been proposed, though.
10. I do not use word-by-word glosses here since the details of the Russian examples are irrelevant for the topic of this article.

Gundel et al. (1993) claim that the referent of ètot must be at least familiar, so this condition would also be met by the higher cognitive statuses (“activated” or “in focus”). This claim is supported by my material (see (2), where ètot X is used for a referent “in focus” in the terms of Gundel et al. (1993)). It is not difficult to find contexts, however, in which under a low activation level – where only a plain NP is expected (“uniquely identifiable” or “referential” or “type identifiable”) – a demonstrative NP is used, as in examples where the distance from a demonstrative NP to its antecedent is high (more than three clauses)¹¹ or as in (3):

(2) Mne stalo stydno za sebja, potomu čto ni razu v žizni ja ne projavil nastojaščego interesa k tomu, što on delal. Kak i vse my, pogloščennyj svoimi zabotami, ja ne pridaval dolžnogo značenija žiznennoj celi ètogo ognennogo mečtatelja [1].
‘I felt ashamed for myself because I didn’t express real interest in what he was doing, not even once. Like all of us swallowed by everyday problems, I never attached due importance to the life goal of this fired-up dreamer.’

(3) – Na sever sejčas nevozmožno – čerez šosse ne proskočim. Noči nado ždat’ … I xoronit’ aby kak – grešno. My što – boimsja ix? Ili udiraem? Ladno, davaj na xutor k Šandoru Borce. (Beginning of a new chapter) Ètogo vengra xorošo znali vse okrest. On razvodil xmel’, deržal neskol’ko korov … [7].
‘– [To go] to the north is impossible now – we’ll not make it through the highway. We need to wait for the night … And to bury [him] improperly is no good. We are not afraid of them, are we? Or are we running away? Okay, let’s go to the farm of Shandor Borec. [Beginning of a new chapter] This Hungarian was famous all around (lit. This Hungarian [Obj] knew everyone [Subj] around). He cultivated hops, had a few cows …’
In (3), the referent Hungarian was mentioned previously only once by means of a proper name in the end of the preceding chapter in a non-prominent syntactic position before it was mentioned by means of a demonstrative NP. Ètot X occurred at the beginning of the first sentence of a new chapter.¹² The nominal part of the demonstrative NP is not the same as that of the antecedent. Thus, the identity of the referents encoded by the proper name and the demonstrative NP is far from obvious. This identity may be obvious for the speaker (writer, in this case), but not for the addressee. Moreover, both mentions of the referent are separated by the chapter border which lowers the referent’s activation level even more. The demonstrative NP in question corresponds to what Gundel et al. (1993) would call “referential”: the speaker intends to say something about the particular Hungarian. As seen in (3), ètot X can be used to encode the referents which have both higher than “familiar” status (consistent with the prediction of Gundel et al. (1993)) and lower than “familiar” status. That means that the scope of the possible appropriate uses of ètot X is wider than is predicted by Gundel et al. (1993). Thus, within the model of Gundel et al. (1993) a more precise statement regarding the use of demonstrative NPs, at least the Russian ètot X, needs to be made. The resulting model can be summarized as follows (see Figure 1).

11. I avoid citing such examples here, in order to save space: these examples are extremely lengthy.
12. Long distances (e.g. in clauses, sentences, over paragraph borders) are important factors lowering the referent activation (cf. Fox 1987; Givón 1983, 1990; Kibrik 1996, 2000). Chapter borders may be an even stronger factor than paragraph borders.

low activation: plain NPs; middle activation: demonstrative NPs; high activation: pronouns

Figure 1. The prototypical correspondences between cognitive statuses and referential forms according to Gundel et al. (1993), and directions in which the forms can expand the scope of their appropriate uses.

Low activation is necessary and sufficient for appropriate use of a plain NP, but it can also expand to referents that have middle or high activation. High activation is necessary and sufficient for appropriate use of pronouns. Middle activation is necessary and sufficient for appropriate use of a demonstrative NP ètot X, yet ètot X can expand the scope of its possible uses in both directions, that is, it can also be appropriately used for coding referents enjoying either high or low activation. In other words, a pronoun cannot expand the scope of its possible appropriate uses, while a demonstrative NP and a plain NP can – to all other statuses.
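The correspondences in Figure 1 amount to a small lookup, which the following sketch spells out. It only restates the paragraph above: each form has a prototypical activation level and a set of levels to which its use can expand. The names `PROTOTYPE`, `EXPANSION`, and `licensed_forms` are illustrative assumptions, and the three-way low/middle/high scale is the simplification used in Figure 1 rather than the six statuses of the Givenness Hierarchy.

```python
# Prototypical activation level of each referential form (Figure 1).
PROTOTYPE = {
    "plain NP": "low",
    "demonstrative NP": "middle",
    "pronoun": "high",
}

# Levels to which each form can additionally expand, as stated in the text:
# plain NPs and demonstrative NPs expand to all other levels, pronouns to none.
EXPANSION = {
    "plain NP": {"middle", "high"},
    "demonstrative NP": {"low", "high"},
    "pronoun": set(),
}


def licensed_forms(level):
    """Return the forms that can appropriately code a referent at this level."""
    return sorted(
        form
        for form in PROTOTYPE
        if PROTOTYPE[form] == level or level in EXPANSION[form]
    )


for level in ("low", "middle", "high"):
    print(level, licensed_forms(level))
```

Under this reading the pronoun is the only form restricted to a single level, while the demonstrative NP is licensed at every level.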
Now, it seems clear that activation level is irrelevant for the use of demonstrative NPs. Rather demonstrative NPs appear to occur under very different circumstances in terms of activation level. In this respect, they differ from plain NPs and pronouns, assuming that the distribution of the latter is determined by an activation level of a referent (Chafe 1994; Tomlin and Pu 1991; Kibrik 1996, 2000). Therefore, the question remains: what specific factors govern the use of a demonstrative NP?

2.2. Results of the corpus study of demonstrative NP uses

In the study of demonstratives in Dutch, Maes and Noordman (1995) hypothesize that a demonstrative NP produces modification in the referent’s representation that existed in the preceding discourse, activating relevant contextual properties of a referent. This modification is reflected in the formal-lexical relationship between the antecedent and anaphor NPs. After adapting this idea to the Russian data and considering the vast number of cases where demonstratives are used (for more details see Krasavina 2004), I came to the conclusion that a number of ètot X occurrences can be explained by a modification procedure, as in (2). The fact that the referent was a fired-up dreamer is activated by mentioning it within the demonstrative NP. Yet, in contrast to the theory of Maes and Noordman (1995), not all examples in my sample could be explained this way. I propose an alternative typology which consists basically of two classes: functionally determined uses and substitutionally determined uses. As functionally determined I consider the cases in which the use of ètot NPs is basically determined by one of its fundamental functions: 1) selection of one element from a set (as in (4)); 2) pejorative function (as in (5)), and 3) identification (of not obviously identical referents) (as in (6) and (7)). The functionally determined uses make up 45% of all cases.

(4) Nemcy mogli sbrasyvat’ trupy počti u vxoda, a ètot ležal daleko ([3]).
‘The Germans could throw out the corpses beside the entrance, but this (one) lay in the distance ([7]).’

(5) Matematičke kakoj-to binom Njutona dorože vsej poetiki Puškina … I nikto ne podumaet, što ètot binom, možet, nikogda emu v žizni ne ponadobitsja … ([4]).
‘For a math teacher some Binomial theorem is of a higher value than Pushkin’s poetry … And no one would think that this Binomial theorem may never be of any use to him [=the student] in his life …’

(6) Poka čto on byl spisan i čislilsja voenrukom v Tyrnyauze, v škole. Iz ètoj školy prišel i Griša ([5]).
‘So far, he was retired and was listed as a military teacher in Tyrnyauz, at school. From this school came Grisha as well.’

(7) I tut že migom odna ruka lezet v ledjanoj sumrak za butylkoj, v to vremja kak drugaja vytiraet trjapkoj prilavok, pološčet gromadnuju lituju kružku s žul’ničeskim tolstym dnom, perevoračivaet ètu kružku i so stukom stavit ee pered pokupatelem ([6]).
‘And immediately one hand reaches into the icy darkness for the bottle, while the other wipes the counter with a cloth, rinses a huge cast mug with a thick false bottom, turns this mug over and puts it loudly in front of the client.’

In (4), nominal ètot points out that specifically this corpse is meant. Thus the corpse in question is contrasted to those that could have been thrown out beside the entrance. In (5), the negative evaluation of the Binomial theorem and the speaker’s ironical attitude to its exaggerated importance to a math teacher is expressed by means of ètot X. Russian does not use articles to denote whether the NP refers to a unique object or any object of the sort. Consider (6), where at its first mention, the school can be interpreted as referring either to some school or to a specific school. Ètot X can only refer to a specific school. Moreover, ètot can be substituted by ètot že ‘the same’, without a change of meaning. Thus the use of ètot marks that the referents in the first and the second predications in (6) are identical. In (7), all three devices are interchangeable – a demonstrative NP, a pronoun, and a plain NP. The use of a demonstrative NP in this example shows the speaker’s intention to identify the mug in question with the one that is “huge”, “cast”, and “with a thick false bottom”, rather than some other mug.

Another class of demonstrative NP uses is that of substitutionally determined uses (41% of all cases). In these cases a demonstrative NP is used when neither a pronoun nor a plain NP can be used for a certain reason. In other words, some restricting factors prevent the use of both basic referential devices (the use of the latter on the basis of the activation level is explained in detail in Kibrik 1996; 2000). As a result, a demonstrative NP substitutes for the corresponding device, as for example, in (8). A plain NP cannot be used in (8) because of a stylistic constraint in Russian on full repetition of a plain NP at short distances. The distance at which this constraint functions requires further investigation. If the activation level (see Kibrik 1996, 2000) is calculated for examples of this type, such as (8), its value would be high enough for a pronoun to be used. The introductory character of the antecedent, however, lowers the activation level. For this reason it is unlikely that a pronoun can be used.
(8) Ja raskryl knigu, kotoruju prines s soboj. Èto byl kakoj-to xrestomatijnyj učebnik dlja vtorogo ili tret’ego klassov s nebolšimi otryvkami iz klassičeskix rasskazov i povestej. Ja stal gromko čitat’ èti otryvki (*nebol’šie otryvki, ? otryvki, *ix) isključitel’no dlja togo, čtoby obratit’ vnimanie učitelej na beglost’ moego čtenija ([1]).
‘I opened the book that I brought with me. It was some textbook for the second or the third form containing short extracts from some classic short stories and novels. I began reading these extracts (*the short extracts, ? the extracts, *them) loudly, in order to draw the teachers’ attention to the fluency of my reading.’
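The activation-level argument applied to (8) relies on the additive model of Kibrik (1996, 2000) outlined in the Introduction: each relevant factor receives a value between zero and one, and the activation level is the sum of these values. The sketch below illustrates only that arithmetic. The factor names echo the factors mentioned in this article, but every numerical value and the `pronoun_threshold` cut-off are hypothetical placeholders, not figures taken from Kibrik's model or from this study.

```python
def activation_level(factor_values):
    """Sum per-factor values, each of which must lie between 0 and 1."""
    for name, value in factor_values.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"factor {name!r} must lie in [0, 1], got {value}")
    return sum(factor_values.values())


def basic_device(activation, pronoun_threshold=1.0):
    """Choose between the two basic devices; the threshold is hypothetical."""
    return "pronoun" if activation >= pronoun_threshold else "plain NP"


# Invented values for a referent whose antecedent is close by and is not
# the first mention of the referent.
established = {
    "distance_to_antecedent": 0.5,
    "grammatical_role_of_antecedent": 0.25,
    "antecedent_is_not_introductory": 0.5,
}

# Same configuration, but the antecedent is an introductory mention, as in (8),
# so that factor contributes nothing.
introductory = dict(established, antecedent_is_not_introductory=0.0)

for label, factors in (("established", established), ("introductory", introductory)):
    level = activation_level(factors)
    print(label, level, basic_device(level))
```

With these placeholder values the introductory antecedent pulls the sum below the cut-off, which mirrors the reasoning that a pronoun is unlikely in (8) even though the antecedent is close.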
Figure 2 below summarizes the model of referential choice which accounts for functionally and substitutionally determined uses:
Figure 2. Referential choice heuristics.
Step 1: This stage involves checking if there are any conditions for a demonstrative NP to be used in one of its functions. If such conditions are present and no blocking filters are present, a demonstrative NP is used (see examples (4) – (7)). If there are no such conditions or if such conditions are present, but blocking filters are also present, then one proceeds to Step 2.
Step 2: According to the activation level, either a plain NP or a pronoun is chosen (see Kibrik 1996, 2000).
Step 3: If there are no filters that can block the use of a pronoun or a plain NP, then a corresponding referential device is used. If such filters are present, then one proceeds to Step 4.
Step 4: A substitutionally determined demonstrative NP is used (see (8)).

At this point a question may arise: what happens if not only the basic referential devices are blocked, but a demonstrative NP as well? Other ways of referring to the target object may exist. At this time no definitive answer to this question is available.

2.3. Unaccounted cases

In Section 2.2, I have demonstrated that functionally and substitutionally determined uses make up most of the cases of demonstrative NP use. There are some cases, however, that are neither functionally, nor substitutionally determined, namely 14% of all cases. For example, what governs the referential choice in cases like (9)?

(9) Poezd opazdyval, i ja progulivalsja po perronu, razgljadyvaja okružajuščix. V osnovnom èto byli dačniki i studenty. Sredi tolpy ja razgljadel neskol’ko turistov s sobakoj. Na ètix turistax (=nix) byla jarkaja sportivnaja odežda i professionalnye botinki na tolstoj podošve.
‘The train was delayed, and I walked along the platform watching the people around me. For the most part these were summer residents and students. Among the crowd I noticed several tourists with a dog. These tourists (= they) wore bright tracksuits and professional shoes with a thick sole.’
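To see why a case like (9) falls outside the model, it may help to write the four steps of Figure 2 as a procedure. The following is a minimal sketch under stated assumptions, not the author's implementation: the `Context` fields, the numeric `pronoun_threshold`, and the reading of Step 3 as checking only the device selected in Step 2 are all illustrative choices, and the `None` result stands for the question the text leaves open.

```python
from dataclasses import dataclass


@dataclass
class Context:
    # Step 1 input: does one of the functions of ètot X (selection,
    # pejorative, identification) apply in this context?
    function_applies: bool
    # Step 2 input: activation level of the referent (hypothetical scale).
    activation: float
    # Devices ruled out by blocking filters, e.g. the ban on repeating
    # a plain NP at a short distance.
    blocked: frozenset = frozenset()


def choose_device(ctx, pronoun_threshold=1.0):
    """Follow Steps 1-4 of the referential choice heuristics."""
    # Step 1: functionally determined demonstrative NP.
    if ctx.function_applies and "demonstrative NP" not in ctx.blocked:
        return "demonstrative NP"
    # Step 2: basic device selected according to the activation level.
    basic = "pronoun" if ctx.activation >= pronoun_threshold else "plain NP"
    # Step 3: use the basic device unless a filter blocks it.
    if basic not in ctx.blocked:
        return basic
    # Step 4: substitutionally determined demonstrative NP.
    if "demonstrative NP" not in ctx.blocked:
        return "demonstrative NP"
    # All devices blocked: the text gives no definitive answer for this case.
    return None
```

For a context like (9), where no function applies and the pronoun is not blocked, `choose_device(Context(False, 1.5))` returns `"pronoun"` on these assumptions, whereas the attested form is ètot X. This is the sense in which such cases remain unaccounted for.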
Many claims have been made concerning the use of demonstrative NPs that signal different kinds of shifts (cf. Kleiber 1988, Gundel 1993, Himmelmann 1996, Cornish 1999), with the only difference from plain NPs being that plain NPs have a presupposition of uniqueness, whereas “the notion of contrast is built into the conditions of use regulating demonstratives” (Cornish 1999: 59). In my study sample, 3% of demonstrative NPs occur under the condition of a time shift. It has also been repeatedly argued, on the other hand, that various kinds of shifts are the typical conditions for plain NP use as contrasted to pronoun use (cf. Walker 1998; Cornish 1999). Most likely such contradictions in conclusions are due to the difference in unaccounted variables that could have had different values in the texts tested. One needs to carefully consider all conditions that could have had any weight in referential choice and make the material as uniform as possible.

In Section 3, I consider the cases where 1) a demonstrative NP is used to indicate a topic shift, and 2) a demonstrative NP is used after a time shift. In the first case, the animate and inanimate referents were considered separately, in two different experiments: animate and inanimate referents can have different activation levels.¹³ In the second case, referents were locations. Since this study has been conducted in the production-oriented perspective, the experimental tasks were modeled in such a way that utterance production process was imitated. The questionnaire method was used. Experiments 1 and 2 involved a forced choice task, and Experiment 3 a written text-continuation exercise. The following section is devoted to these experiments.

3. Experimental evidence

3.1. Experiment 1. The use of a demonstrative NP after a time shift
3.1.1. Purpose of experiment

The purpose of this experiment was to discern whether the factor of time shift between the antecedent clause and the anaphor clause affects the use of a demonstrative NP as an anaphor. According to Padučeva (1985), a shift in the temporal perspective can be a reason for a demonstrative NP to be used in (11) as contrasted to (10):

(10) Za rekoj rasstilalsja lug. Na lugu paslis’ korovy.
‘Across the river there was a meadow. Cows grazed on the meadow.’

(11) Za rekoj rasstilalsja lug. Na ètom lugu v prošlom godu paslis’ korovy.
‘Across the river there was a meadow. Last year cows grazed on this meadow.’
In the examples from my sample, all usages of ètot X under consideration have two common features:
1) the antecedent of the demonstrative NP is the introductory mention of the referent;
2) the predications preceding the one containing the demonstrative NP are connected by the rhetorical relation of “sequence”¹⁴ (in terms of Rhetorical Structure Theory, RST; see Mann, Matthiessen, and Thompson 1992).

13. As mentioned before, referential choice is sensitive to activation level of a referent. Animacy contributes to the activation level.

The hypothesis tested in this experiment can be formulated as follows. Let us assume that the context meets criteria 1) and 2). Then a demonstrative NP is used after a time shift (Condition 1). If there is no time shift, mostly plain NPs are used (Condition 2).

Condition 1. Nakonec, kogda počti sovsem stemnelo, on vyšel iz lesa na perekopannoe protivotankovym rvom pole. Ix otrjad kopal ètot rov včera noč’ju.
‘Finally, when it got almost completely dark, he went out of the forest to the field with a tank ditch. His detachment had been digging this ditch the night before.’

Condition 2. Nakonec, kogda počti sovsem stemnelo, on vyšel iz lesa na perekopannoe protivotankovym rvom pole. On perebralsja čerez rov i došel do kakix-to vyselok – trex domikov s tjanuvšimisja szadi nix pletnjami.
‘Finally, when it got almost completely dark, he went out of the forest to the field with a tank ditch. He got over the ditch and reached some settlement – there were three little houses with wicker fences stretching behind them.’

3.1.2. Participants

The participants were 18 to 32 years old. They were divided randomly into two groups of 16 people each. Only individuals with no linguistic background were admitted. This was meant to prevent problems connected with the so-called “effect of study”, that is, the possibility that a participant would figure out exactly what the experimenter expected to see and would unconsciously either help the experimenter to obtain the expected result, or prevent him from doing so.

3.1.3. Materials and procedure

In this experiment, each participant was involved in tasks based on two different conditions. One group of participants was tested independently of the other group.
The purpose of this division was to ensure that no subject variables were disregarded. Each text was presented in two variants corresponding to the two above-mentioned conditions. Participants were not given two versions of the same text, so they had no chance to compare the two variants while choosing a referential device.

14. The sequence relation specifies that a succession relationship must exist between the related spans.

Each experimental sheet consisted of four texts, one of which was the target text with or without a time shift; the other three were filler texts. The filler texts imitated the target texts. Each experimental text consisted of two sentences. In the first sentence the target referent was mentioned for the first time, and in the second sentence for the second time. At the second mention of the referent the two variants (a plain NP and a demonstrative NP) were offered in parentheses. Thus the task involved a forced choice between these variants.

Each participant received four test sheets, two with a time shift and two with no time shift. This yielded 4 (text types) × 32 (participants) = 128 test sheets. The test texts and filler texts were typed out in a random order on the test sheets. The time for completing the task was not limited. Some of the texts used for this experiment were original extracts from the corpus and some were constructed for the experiment.

Below is an example of an experimental sheet. The target text is the second one from the top; it includes a time shift. The other texts are filler texts.

Task: Below you see some extracts from Russian fiction. Please underline the variant given in parentheses that you think sounds best in this context. Complete the tasks in order. After finishing a task, proceed to the next one; do not return to any of the previous ones.

Nakonec vygljanulo solnce. (Ono, èto solnce) osvetilo komnatu, brosaja vyzov ploxomu nastroeniju.
‘At last the sun came out of the clouds. (It, this sun) lit up the room, challenging the bad mood.’

Miša vošel v komnatu i podošel k stolu. Kogda-to za (ètim stolom, stolom) sidel ego deduška.
‘Misha entered the room and came to the table. Some time ago his grandfather used to sit at (this table, the table).’

Vernuvšis’, ja zastal ix za jarostnym sporom. Ponabl’udav nemnogo za (ètoj scenoj, nej), ja rešil vmešat’sja.
‘As I returned, I found them in a furious dispute. After watching (this scene, it) for a while, I decided to interrupt.’

Odin student rešil podšutit’ nad svoim odnokursnikom. (Ètot student, on) napisal čto-to na kločke bumagi i položil v sumku svoemu drugu.
‘Once a student wanted to make fun of his classmate. (This student, he) wrote something on a sheet of paper and dropped it into his friend’s bag.’

Figure 3. Example of an experimental sheet (translation from Russian)
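The sheet design described above (random split of 32 participants into two groups of 16, one target text plus three fillers per sheet in random order, four sheets per participant) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the participant numbering and the fixed random seeds are hypothetical and do not come from the study.

```python
import random

def split_participants(participant_ids, seed=0):
    """Randomly split participants into two equal-sized groups (hypothetical helper)."""
    rng = random.Random(seed)          # fixed seed only to make the sketch repeatable
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def build_sheet(target_text, filler_texts, rng):
    """Return one sheet: the target text and the fillers in random order."""
    texts = [("target", target_text)] + [("filler", f) for f in filler_texts]
    rng.shuffle(texts)
    return texts

# 32 participants, numbered 1..32 for illustration only
group_a, group_b = split_participants(range(1, 33))

# One sheet: 1 target text + 3 filler texts
sheet = build_sheet("target (time shift)",
                    ["filler 1", "filler 2", "filler 3"],
                    random.Random(1))

# 4 sheets per participant, as stated in the text
total_sheets = 4 * 32
print(len(group_a), len(group_b), len(sheet), total_sheets)
```

The sketch only mirrors the counts given in the text (two groups of 16, four texts per sheet, 128 sheets); how texts were actually assigned to sheets is not specified in the source.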
3.1.4. Results and discussion
The results conformed to the expectation (see Table 4): under Condition 1 demonstratives were chosen in 69% of cases and plain NPs in only 31% of cases. Conversely, under Condition 2 ètot X was chosen in only 34% of cases, whereas plain NPs were used in 66% of cases. The use of the two referential devices (demonstrative NP and plain NP) differs significantly depending on the presence or absence of a time shift (χ²(1) = 6.96, p < 0.01). Thus, the results obtained in this experiment support the hypothesis that the presence of a time shift between sequential events favors the use of a demonstrative NP. The results obtained in Experiment 1 with the second group were similar to those obtained with the first group, and for this reason are not presented here. The similarity of the results indicates that no subject variables were disregarded.

Table 4. Experiment 1 with the first group of participants. Frequency of demonstrative and plain NP uses in the presence and absence of a time shift.

                               ètot X      plain NP
Condition 1 (time shift)       22 (69%)    10 (31%)
Condition 2 (no time shift)    11 (34%)    21 (66%)

3.2. Experiment 2. Demonstrative NP use in cases with an animate referent
3.2.1. Purpose of experiment
The use of a demonstrative NP after the first mention of a referent is considered to be a common strategy for establishing important discourse participants. As soon as a new participant is established, further mentions are coded by means of third-person pronouns and plain NPs. For example, as Himmelmann (1996: 229) indicates, in languages both with and without definite articles demonstrative NPs are used after the first mention of a thematically prominent referent that will be mentioned again in the subsequent discourse. In the following experiment I test this observation on Russian material.

3.2.2. Participants
The participants involved in this experiment were the same individuals who took part in Experiment 1. The subjects were randomly subdivided into two groups of 16 subjects each, so that each group received different versions of the same texts. These two versions corresponded to the two conditions described in 3.2.3.
3.2.3. Experimental design and procedure
The target context can be described as follows: the referent is animate, and its first mention in the discourse occurs one discourse unit before its second mention. At its second mention a pronoun or a demonstrative NP can be employed. A plain NP cannot be used because of certain constraints in Russian. The hypothesis tested is as follows: demonstrative NPs are used when a new referent important for the subsequent discourse is established. I assume that the importance of the referent for the subsequent discourse correlates with the presence of subsequent mentions of this referent in the discourse. Thus we have two conditions. In Condition 1 (thematically prominent), the referent is present in the following context after its second mention. In Condition 2 (not thematically prominent), the referent is not mentioned in the following context. The target referent is double-underlined in the examples:

(12) Kak-to večerom ja pošel guljat’ s sobakoj. Projdjas’ po parku, ja pošel obratno, podumav, čto pora vozvraščat’sja domoj. Vdrug otkuda-to iz kustov vyšel kakoj-to čelovek. (Ètot čelovek, on) sprosil u menja dokumenty, pokazav mne milicejskoe udostoverenie.
‘One evening I went to walk the dog. I took a walk around the park and went back with the intention of going home. Suddenly a man stepped out from somewhere in the bushes. (This man, he) showed me a police card and demanded my identity document.’
{The continuation follows according to either Condition 1 or Condition 2.}

Condition 1 (the referent is mentioned in the following context)
Konečno že, ja ničego ne vz’al s soboj, no on ne stal vyslušivat’ menja.
‘Certainly I had no document with me, but he wouldn’t listen to me.’

Condition 2 (the referent is not mentioned in the following context)
Sueta suet, vse sueta, podumal ja. Xodim, čto-to delaem, suetimsja, a vo vsem ètom net nikakogo smysla.
‘Vanity of vanities, I thought. We’re moving around, doing so much useless stuff, and all this makes no sense.’

Each participant received a test sheet with 4 test texts on it, each 45–60 words long: two texts according to Condition 1, and two according to Condition 2. Filler material was not used. In the target position two variants were offered. Participants were instructed to choose one, either a pronoun or a demonstrative NP. This yielded 4 (texts) × 32 (participants) = 128 test texts.
The hypothesis would be supported if demonstrative NPs were chosen noticeably more often under Condition 1, while under Condition 2 demonstrative NPs and pronouns were chosen equally often or pronouns prevailed. The participants were given the test sheets for both experiments at the same time. The test tasks were carried out consecutively, without a break.

3.2.4. Results and discussion
In this experiment I found no significant difference between the uses of the referential devices under the two conditions (χ²(1) = 4.04, n.s.). On the whole, pronouns were preferred (73% of all cases under Condition 1 and 81% under Condition 2), see Table 5. The preference for pronouns may have been caused by a factor that was not considered when the experiment was planned: participants could have made their choices before they had read the text to its end. Thus there was no guarantee that it was the information located in the further context that affected the subjects’ choice. To avoid this effect, the task was changed: the subjects were specifically instructed to read the texts to the end before making their choice.

Table 5. Number of ètot X and on ‘he’ choices under Conditions 1 and 2.

                                            ètot X      on
Condition 1 (thematically prominent)        17 (27%)    47 (73%)
Condition 2 (not thematically prominent)    12 (19%)    52 (81%)
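How forced-choice responses turn into the counts and row percentages of Table 5 can be illustrated with a short Python sketch. The response list below is reconstructed from the published cell counts (it is not the raw data), and the function name and the ASCII labels "etot X" and "on" are assumptions made for the illustration.

```python
from collections import Counter

def tabulate(responses):
    """Count (condition, device) pairs and add row percentages.

    Returns {condition: {device: (count, percent_of_row)}}.
    """
    counts = Counter(responses)
    conditions = sorted({cond for cond, _ in responses})
    devices = sorted({dev for _, dev in responses})
    table = {}
    for cond in conditions:
        row_total = sum(counts[(cond, dev)] for dev in devices)
        table[cond] = {
            dev: (counts[(cond, dev)],
                  round(100 * counts[(cond, dev)] / row_total))
            for dev in devices
        }
    return table

# Responses rebuilt from the cell counts of Table 5 (not the original raw data)
responses = ([(1, "etot X")] * 17 + [(1, "on")] * 47 +
             [(2, "etot X")] * 12 + [(2, "on")] * 52)

table5 = tabulate(responses)
for cond, row in table5.items():
    print(cond, row)
```

Each condition contributes 64 choices (32 participants × 2 texts), so the percentages are taken per row, which is how the figures in Table 5 add up to 100% within each condition.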
Twelve new individuals took part in a modified version of Experiment 2. All conditions remained the same. The results of this modified version (see Table 6), like the results of the initial experiment, showed no significant difference between the uses of the referential devices at the second mention of an animate referent under the two conditions (χ²(1) = 0.4, n.s.). Thus the hypothesis regarding the influence of the “thematic prominence” factor on the choice between a demonstrative NP and a pronoun was not supported. There is a tendency for demonstrative NPs to be used more often under Condition 1 than when the referent is not mentioned in the further context. Still, the results showed that a referent’s relevance to the subsequent discourse was not sufficient to warrant the use of a demonstrative NP at the second mention of this referent.

Table 6. Frequency of demonstrative NP and pronoun use under Conditions 1 and 2.

                                            ètot X     on
Condition 1 (thematically prominent)        8 (33%)    16 (66%)
Condition 2 (not thematically prominent)    6 (25%)    18 (75%)

Table 7. Summary frequency of ètot X and pronoun use at the second mention of the referent under Conditions 1 and 2.

                              ètot X      on          Total
Condition 1 or Condition 2    14 (29%)    34 (70%)    48 (100%)

Moreover, Table 7 provides another illustration that pronouns on the whole were used more often than demonstrative NPs at the second mention of an animate referent (pronouns made up more than two thirds
of the chosen devices). There must be either some additional, more powerful factors that regulate the choice between a pronoun and a demonstrative NP in this situation, probably depending on the speaker’s intention, the genre, etc., or a different experimental design needs to be chosen, in order to ensure that the participants understand the relevance of the target referent for the subsequent discourse.

3.3. Experiment 3. Demonstrative NP use in cases with an inanimate referent
3.3.1. Purpose of experiment
The hypothesis tested in this experiment is similar to that of the previous experiment. There were several important differences, however. First, in this experiment contexts with an inanimate referent were tested. Second, the experimental design was changed: the experiment involved a story continuation task. In the original text, the referent ‘hole’ was coded by a pronoun at its second mention (see (13)). According to the hypothesis, a pronoun and not a demonstrative was used because the referent was not mentioned again in the following context. So the pronoun was most likely used because the speaker evaluated this referent as irrelevant for the development of the further discourse.

(13)
Vskore ona naučilas’ vypolnjat’ nesložnuju xozjajstvennuju rabotu: molotit’ kolotuškoj kukuruzu, taskat’ na mel’nicu meški… Pod otkrytym navesom, gde ona žila, Zana vyryla jamu, obložila ee paporotnikom i, takim obrazom, ustroila sebe dovol’no ujutnuju spal’nju ([2]).
‘Soon she learned to do simple household work: threshing maize with a mallet, dragging sacks to the mill… Under the open shed where she lived, Zana dug a hole and put some ferns around it, and thus she made herself quite a cozy bedroom.’

3.3.2. Participants
12 subjects took part in the experiment. The subjects had never taken part in this kind of experiment. The participants met the same criteria as the participants in Experiments 1 and 2.

3.3.3. Material and experimental procedure
Each participant was given one test text consisting of two sentences. The target referent was introduced within the second sentence. The participants were instructed to continue this text with two or three sentences. The time for task completion was not limited. No hints about what referent should be used were provided, so participants were free to choose the subject they wrote about, to mention or not to mention a certain referent, and, if they did mention one, to do so as many times as they wished. This yielded 1 × 12 = 12 test sheets.

The question was by what means the further mentions of the inanimate referent would be coded. The predictions were as follows:
(1) If the referent was mentioned only once in the following context, namely in the next sentence, then a pronoun would be used;
(2) If the referent was mentioned more than once in the following context, then at the first of these mentions (i.e. at the second mention altogether) this referent would be coded by a demonstrative NP.

Figure 4 presents the test sheet used in this experiment. Before the test text there was a short “introduction”, where the protagonist was introduced.

Task: continue the story in 2 or 3 sentences.

Kogda mir zalixoradilo poiskami snežnogo čeloveka, Viktor Maksimovič neredko prixodil v kofejnju s žurnalami ili gazetnymi vyrezkami, v kotoryx govorilos’ ob ètom. I vdrug èti poiski obrušilis’ na Abxaziju. Okazyvaetsja, v abxazskom sele Txina v prošlom veke pojmali dikuju, ili lesnuju, kak govorjat abxazcy, ženščinu. Vskore ona naučilas’ vypolnjat’ nesložnuju xozjajstvennuju rabotu: molotit’ kolotuškoj kukuruzu, taskat’ na mel’nicu meški … Pod otkrytym navesom, gde ona žila, Zana vyryla jamu … ([2]).
‘When the world was in snowman-search fever, Viktor Maximovich often came to the coffeehouse with magazines or newspaper clippings about it. Suddenly the search fell upon Abkhazia. It turned out that in the Abkhazian village called Thina, a wild, or forest woman was caught in the past century. She was named Zana. Soon she learned to do simple household work: threshing maize with a mallet, dragging sacks to the mill … Under the open shed, where she lived, Zana dug a hole …’

Figure 4. Example of the test sheet (translation from Russian)

The results were as follows: the subjects either did not mention the target referent at all, or used relative pronouns or adverbial demonstratives for coding it (see Table 8). Consequently, the test material was modified in the following way: at the end of each text that was to be continued, there was an instruction in parentheses as to what the end of the story should be. The referent jama ‘hole’ was not explicitly mentioned. The instruction stated, “Describe her place” or “Describe her subsequent actions.” Initially 16 subjects participated in the experiment; the number of participants was increased until there were 16 relevant mentions of the referent by means of a demonstrative NP or a pronoun. The final number of participants was 70. In half of the experimental texts, subjects were asked to continue the text after a comma; in the other half, after a full stop. Punctuation marks were neglected regardless of the instructions: sometimes the participants started a new sentence after a comma and continued the previous one after a period. The introduction and the experimental texts remained the same.

Table 8. Referential devices used by the subjects for referring to the referent ‘the hole’ in the first variant of Experiment 3.

Referential devices                            Number    %
Plain NP                                       0         0
Pronoun                                        1         8
Demonstrative NP                               0         0
Adverbial demonstrative (such as “there”)      2         17
Relative pronoun (i.e. “where, which”)         8         67
Referent is not mentioned anymore              1         8
3.3.4. Results and discussion
The final results can be seen in Tables 9 and 10. Table 9 lists the referential devices that the participants used when they mentioned ‘hole’. The devices that are not relevant for our discussion and are excluded from consideration here are marked by an asterisk.

Table 9. Experiment 3. Text continuations.

                                       Number    %
*Plain NP at linear distance = 1       7         10
*Plain NP at linear distance > 1       9         13
Pronoun                                10        14
Demonstrative NP                       6         9
*Adverbial demonstrative               7         10
*Relative pronoun                      8         11
*Target referent is not mentioned      23        33
Total                                  70        100

Table 10. Correlation of pronoun use and referent mention in the following context (16 total).

                 referent is mentioned in the    referent is not mentioned in the
                 following context (5)           following context (11)
pronoun          3 (60%)                         7 (63%)
demonstrative    2 (40%)                         4 (37%)

As for the relevant uses, the number of cases with more than one subsequent mention of ‘hole’ was only 5 (31%) out of 16, three of which were realized as pronouns and two as demonstrative NPs (Table 10, first column). Under the condition of no further mentions of the referent, demonstratives were still used (Table 10, second column), although the number of pronouns used under this condition was higher (7 cases, or 44% of all 16 cases) than the number of demonstratives (4 cases, or 25%). At the same time, the number of demonstratives on the whole (6 cases, or 38%) was lower than the number of pronouns (10 cases, or 63%). The statistical analysis shows no significant difference between pronoun and demonstrative NP use under the two conditions (χ²(1) = 0.02, n.s.). This suggests that the “thematic prominence” of a new discourse referent in the subsequent discourse does not affect the choice of demonstratives.

The number of demonstrative NPs turned out to be quite low. As can be seen in Table 9, jama ‘hole’ was often referred to by means of adverbial demonstratives and by plain NPs at a linear distance of one clause, despite the constraint on full NP repetition at such a distance. This constraint can be lifted in cases where the referent is a location, as in the case considered in Experiment 1. The high occurrence of adverbial demonstratives and plain NPs here can be accounted for in terms of referent semantics.
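The χ² values for the 2 × 2 tables can be recomputed from the published counts. The sketch below assumes an uncorrected Pearson χ² test with one degree of freedom; the source does not say which variant of the test was used, so this is an illustration rather than a reconstruction of the original analysis. The function name is hypothetical.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability for df = 1,
    computed as erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Table 10: rows = referent mentioned / not mentioned, columns = pronoun / demonstrative
chi2_t10, p_t10 = pearson_chi2_2x2(3, 2, 7, 4)

# Table 6: rows = Condition 1 / Condition 2, columns = ètot X / on
chi2_t6, p_t6 = pearson_chi2_2x2(8, 16, 6, 18)

CRITICAL_05_DF1 = 3.841  # critical value of chi-square for alpha = 0.05, df = 1

print(round(chi2_t10, 2), chi2_t10 < CRITICAL_05_DF1)
print(round(chi2_t6, 2), chi2_t6 < CRITICAL_05_DF1)
```

Under this assumption the statistic for Table 10 rounds to 0.02 and the one for Table 6 to 0.40, both far below the critical value, in line with the “n.s.” verdicts reported for these two tables. With cell counts as small as those in Table 10, an exact test might be preferred; that choice is outside what the source describes.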
4. Conclusions

The present study has focused on the use of an important but little-studied referential device in Russian: the demonstrative NP ètot X. One of the most essential points made in this study is the clarification of the connection between demonstrative NPs and activation level. This connection turned out to be quite weak: demonstrative NPs can be used under any activation level of a referent. It was demonstrated that ètot X can encode referents that have a status either higher than “familiar” (in line with the prediction of Gundel et al. (1993, 2001)) or lower than “familiar”. The latter can be interpreted as an important addition to the theory of Gundel et al. (1993, 2001).

Using the corpus material, two classes covering most of the demonstrative NP uses were singled out. The first class comprises cases in which demonstratives are used because neither of the basic referential devices can be used: certain factors block the use of a pronoun under a high activation level and of a plain NP under a low activation level of a referent, so that the only possible referential form left is a demonstrative NP. The second class is represented by cases that can be accounted for by one of the basic functions underlying the use of demonstratives: selection from a set, identification, or pejorative function.

Several hypotheses concerning the factors that lead to the use of demonstrative NPs were examined in experimental studies. The results show that the demonstrative NP is the preferred referential device after a temporal shift. An important “negative” result was also obtained: the experiments indicate that the referent’s relevance in the following context does not affect the choice between a demonstrative NP and a pronoun.

This study calls for further empirical studies of the use of Russian demonstratives. The questions that remain to be answered are:
– What governs the referential choice in cases like those described in Experiments 2 and 3?
– Within the class of substitutional uses, what happens if not only a pronoun and a plain NP are blocked, but a demonstrative NP as well?
Additionally, the nature of the constraints on the use of referential expressions requires further investigation. For example, the distance from the antecedent at which a plain NP is blocked needs to be clarified. Future work will include a study of a larger data set, as well as reaction time experiments.
Acknowledgements
The research presented in this paper was supported by grant 03-06-80241a of the Russian Fund of Fundamental Research.

Abbreviations
The literature from which the examples were taken (URL: http://www.lib.ru):
[1] Iskander F. “Školnyj val’s, ili Energija styda”;¹⁵
[2] Iskander F. “Stojanka čeloveka”;
[3] Simonov K. “Živye i mertvye”;
[4] Bykov V. “Obelisk”;
[5] Vizbor J. “Legenda sedogo El’brusa”;
[6] Kataev V. “Beleet parus odinokij”;
[7] Akimov I. “Legenda o malom garnizone”.
15. Examples from this and the other sources used in this study were abridged as far as possible.

References
Ariel, Mira. 1994. Interpreting Anaphoric Expressions: A cognitive versus a pragmatic approach. In: Journal of Linguistics (30), 3–42.
Boguslavskaja, Olga and Muravjeva, Irina. 1987. Mexanizm anaforičeskoj nominacii. In: Modelirovanie jazykovoj dejatel’nosti v intellektual’nyx sistemax (red. A.E. Kibrik, A.S. Narinjani). Moskva: Nauka, 78–127.
Chafe, Wallace. 1976. Givenness, contrastiveness, definiteness, subjects, topics and point of view. In: C.N. Li (ed.), Subject and topic. New York: Academic Press, 25–55.
Chafe, Wallace. 1994. Discourse, Consciousness, and Time. The Flow and Displacement of Conscious Experience in Speaking and Writing. Chicago: University of Chicago Press.
Cornish, Francis. 1999. Anaphora, Discourse and Understanding. Evidence from English and French. Oxford: Clarendon.
Fox, Barbara. 1987. Discourse Structure and Anaphora. Cambridge: Cambridge University Press.
Givón, Talmy, ed. 1983. Topic Continuity in Discourse: A quantitative cross-linguistic study. Amsterdam: Benjamins.
Givón, Talmy. 1990. Syntax: A Functional-typological Introduction. Vol. 2. Amsterdam and Philadelphia: John Benjamins.
Grosz, Barbara, Weinstein, Scott and Joshi, Aravind. 1995. Centering: A framework for modeling the local coherence of discourse. In: Computational Linguistics 21(2), 203–25.
Gundel, Jeanette, Hedberg, Nancy and Zacharski, Ron. 1993. Cognitive status and the form of referring expressions in discourse. In: Language 69 (2), 274–307.
Gundel, Jeanette, Hedberg, Nancy and Zacharski, Ron. 2001. Cognitive Status and Definite Descriptions in English: Why Accommodation is Unnecessary. In: English Language and Linguistics 5, 273–295.
Halliday, Michael and Hasan, Ruqaiya. 1976. Cohesion in English. London: Longman.
Himmelmann, Nikolaus. 1996. Demonstratives in Narrative Discourse: A taxonomy of universal uses. In: B. Fox (ed.), Studies in Anaphora. Amsterdam and Philadelphia: John Benjamins, 205–254.
Kibrik, Andrej. 1996. Anaphora in Russian narrative prose: A cognitive account. In: B. Fox (ed.), Studies in Anaphora. Amsterdam and Philadelphia: John Benjamins, 255–303.
Kibrik, Andrej. 2000. A Cognitive Calculative Approach towards Discourse Anaphora. In: Paul Baker, Andrew Hardie, Tony McEnery and Anna Siewierska (eds.), Proceedings of the Discourse anaphora and reference resolution conference (DAARC 2000). Lancaster University: University Center for Computer Corpus Research on Language, Technical Papers 12, 72–82.
Kleiber, Georges. 1988. Sur l’anaphore démonstrative. In: G. Maurand (ed.), Nouvelles recherches en grammaire: Actes du Colloque d’Albi, Université de Toulouse-Le Mirail, 52–74.
Krasavina, Olga. 2004. Upotreblenije ukazatel’noj imennoj gruppy v russkom pis’mennom narrativnom diskurse. Voprosy jazykoznanija 3.
Maes, Alfons and Noordman, Leo. 1995. Demonstrative nominal anaphors: A case of nonidentificational markedness. Linguistics 33, 255–282.
Mann, William, Matthiessen, Christian and Thompson, Sandra. 1992. Rhetorical structure theory and text analysis. In: W. Mann and S. Thompson (eds.), Discourse Description. Diverse Linguistic Analyses of a Fund-raising Text. Amsterdam and Philadelphia: John Benjamins, 39–78.
Padučeva, Elena. 1985. Vyskazyvanije i ego sootnesennost’ s dejstvitel’nostju. (Utterance and its interrelationship with reality). Moskva: Nauka.
Tomlin, Russell and Pu, Ming. 1991. The management of reference in Mandarin discourse. Cognitive Linguistics 2: 65–93.
Walker, Marilyn, Joshi, Aravind and Prince, Ellen (eds.). 1998. Centering Theory in Discourse. Oxford: Clarendon Press.
Parenthetical agent-demoting constructions in Eastern Khanty: Discourse salience vis-à-vis referring expressions¹
Andrey Y. Filchenko
Pragmatic salience of referents in natural discourse is a matter of gradience and dynamics. These dynamics in the pragmatic salience of discourse entities are correspondingly manifested in their morphosyntactic form. Most frequently, the pragmatic and syntactic dynamics are directed towards promotion: a discourse entity becomes more prominent in discourse and receives correspondingly prominent formal syntactic coding, such as a reduced referring expression (cf. Rose, this vol.; Lambrecht 1994, inter alia) or particular word order alterations (Hinterhölzl and Petrova, this vol.). The opposite dynamics, towards demotion, are also possible, rendering a referent decreasingly salient in a stretch of discourse. Such demotion is often temporary: an entity’s discourse prominence or its semantic properties (particularly agenthood) may be parenthetically disrupted or contested, which is reflected in a special morphosyntactic arrangement: Agent-demoting constructions. These constructions signal dissonance in the canonical cross-mapping of the referent’s semantic roles, grammatical relations and pragmatic status.

1. Introduction

Khanty is one of the indigenous languages traditionally genetically affiliated with the Uralic language family. It is spoken by fewer than 7000 indigenous hunter-gatherers and reindeer herders in north-western Siberia. The present-day territory settled by the Khanty lies to the east of the Ural Range along the Ob’ river in the Tyumen and Tomsk regions of Russia. Though considered to be a single language, Khanty is a dialectal continuum with a large conventional division into western and eastern (Decsy, 1965; Jääsalmi-Krüger, 1998). The dialects of interest in this study are the adjacent easternmost river varieties of Vasyugan, Alexandrovo and Vakh, totaling fewer than 200 speakers. All Eastern Khanty speakers are bilingual, with Russian being the language of daily communication across ethnic groups. The functional sphere of Khanty is steadily shrinking: the language is reserved primarily for occasional family use and rare peer communication. These dialects are particularly interesting as they are the least documented and reportedly represent more archaic and richer systems (Gulya 1970, Honti 1982, Kulonen 1989, Decsy 1990).

Dialectal variation in Khanty is so considerable that even within the cluster of Eastern Khanty many varieties are mutually incomprehensible to their speakers. In typological terms the variation is extensive, with the Vasyugan, Alexandrovo and Vakh dialects demonstrating the most distinct features: vowel harmony vs. the vowel length of western dialects; up to 12 grammatical cases vs. 5 in other eastern and 3 in western Khanty; 5 individual tense forms vs. 3 in other dialects; “ergative-like” (Comrie 1975; Kulonen 1989) or Loc-Agent (Filchenko 2006) constructions apart from a set of passive constructions in other dialects; conceptual variation in numerals: “18” in the east is jöɣərki nɨləɣ ‘eight after/over ten’ vs. “18” in the west nijɛl-xus ‘8 towards 20’; and the unique eastern Khanty use of the nominalizer taɣɨ ‘place’ (Potanina 2007).

1. The work leading to this publication was supported by the NEH-NSF Documenting Endangered Languages Fellowship, 2005–2006. Any views, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect those of the National Endowment for the Humanities and the National Science Foundation.
Figure 1. North-western Siberia, Eastern Khanty language area.
The empirical base for the discussion is a corpus of Eastern Khanty narratives, collected and transcribed between 2000 and 2003, supplemented by some Eastern Khanty texts published between 1900 and 1995.

In the discussion of the structural properties and information structuring of Eastern Khanty clauses, I will differentiate the grammatical relations from the semantic roles of the arguments of propositions, and from the pragmatic functions of the referents and the pragmatic operations they are involved in. Grammatical relations are henceforth indicated following Dixon (1994) as: S – intransitive subject; A – transitive subject; O – transitive non-subject. The main semantic roles of the arguments of propositions (relevant for the purpose of discussion) are generally defined after Van Valin and LaPolla (1997) as either Agent or Target, each representing a host of semantic features, or entailments (cf. also Rose, this vol., for a more in-depth discussion).

The notions of discourse-pragmatic status (salience), pragmatic operations and topicality are central to the discussion of the features of the constructions in the narrative. A referent is defined as pragmatically central if it shows the following properties: it belongs to the presuppositional part of the proposition; it is contextually accessible and active; in dislocation tests (“as for” and “about”) it produces the target clause; it normally does not carry the clause accent; and the rest of the proposition appears to carry a relation of “aboutness” towards it (Strawson, 1964; Kuno, 1972; Gundel, 1976; Lambrecht, 1994). In the discussion of the pragmatic structure of the data, I will also use the terminology and premises of the Centering framework as in Grosz et al. (1995); cf. the Introduction of this volume for a more detailed discussion of Centering.

I will also use the less conventional terms foregrounded center and backgrounded center to mark special pragmatic states resulting from the voice operations in the Eastern Khanty parenthetical constructions (cf. Hinterhölzl and Petrova, this vol., for similar notions). By the backgrounded center in Eastern Khanty parenthetical Agent-demoting constructions I will understand a primary topical referent (Cb) whose pragmatic status is temporarily demoted (backgrounded) (C[backgrnd] = C[n−1] (= C[n+1]) ≠ C[n+2]), while another referent with a competing discourse status but a non-typical semantic role appears temporarily promoted (foregrounded) as a pragmatic center for an interval of 1–2 clauses, the foregrounded center (C[foregrnd] ≠ C[n−1] (= C[n+1]) ≠ C[n+2]).

Section 2 will provide a brief outline of the canonical Eastern Khanty active-direct clause and consider the main features of the organization of grammatical relations important for the discussion to follow. Section 3 will introduce the Eastern Khanty agent demotion constructions, first presenting the so-called “ergative constructions” (3.1), which are then compared to the “agented passive constructions” (3.2). The description of the key morphosyntactic, semantic and pragmatic features of these constructions will be supplemented in section 4 with insight into their cultural contexts in naturally occurring discourse, offering a possible pragmatic explanation of their usage. In conclusion, section 5 will provide a multi-factorial comparison of these constructions and posit their common nature, namely the demotion of the agent participant of the event (cf. Krasavina, Kelleher, Rose, Chiarcos, this vol., for related multi-dimensional models).
2.
Canonical Active-Direct Constructions
The typical Eastern Khanty simple clause shows a general tendency towards the SOV pattern. The unmarked, neutral Khanty simple clause is verb-final and, generally, the subject precedes the object. The position of the O constituent may vary contingent upon its pragmatic properties: brand new, inactive, unidentifiable O referents are always rigidly fixed in the SOV order, whereas pragmatically active and identifiable O referents may result in other orders, OSV and SVO. The semantic role of Agent is typically mapped to the A grammatical relation, while the semantic role mapped to the O relation is typically the Target, an entity saliently affected in the event. The semantic role mapped to the S grammatical relation is understood as that of the single core NP of an intransitive verb.
(1)
a. ↓
   mä m@n-l-im
   1sg walk-prst-1sg
   ‘I walk’
b. ↓ ↓
   mä ajr1t-äm t1Gl-a qar1-mta-s-1m
   1sg canoe-1sg/sg det-ill pull-intn-pst2-1sg/sg
   ‘I pulled my canoe here’
The referent with the semantic role of Agent appears clause-initially, expressed by the argument in the S/A relation marked by Nom. case, which controls S/A-V agreement inflection on the predicate. S/A-V agreement and O-V agreement refer to the Khanty single and double conjugation, i.e. co-referential agreement inflection on the transitive predicate controlled by the S/A relation and the O relation respectively. Agreement controlled by the S/A grammatical relation is obligatory (1a–b). Agreement between the grammatical relation of O and the transitive predicate is contingent upon the pragmatic properties of the O (1b), i.e. pragmatic identifiability and activation of this referent in the interlocutors’ discourse universe.
The argument in the S/A grammatical relation normally has all the traditional subjecthood properties, such as control over referential relations clause-internally and -externally: control over embedded non-finite clauses; control over zero anaphora across conjoined clauses; control over reflexivization; quantifier movement control. Eastern Khanty nouns in the O relation are zero-marked for case, i.e. morphologically indistinguishable from the Nominative. In the pronouns, the Accusative case has a marker /-t/. Thus: mä ämp tuG@m ‘I brought a dog’ vs. ämp mänt por ‘a dog bit me’.
Pragmatically, a new referent is introduced or re-activated by a full NP or a free pronoun in the S/A grammatical relation. Once the referent is identifiable as topic, its discourse salience is then expressed by elision and by verbal agreement – a preferred topic expression (Filchenko 2006). There is a strong correlation: Topic = S/A = ClauseInitial = Agent, which falls within a general typological information-structural pattern (Lambrecht, 1994). Topicality associates strongly with minimal morphological complexity and implicitness, while new information and focus correlate with morphological explicitness. In Centering framework terms (Grosz et al. 1995), the Eastern Khanty canonical clause adheres well to the cross-linguistic prototypical correlation of formal referring expressions with attention state and inference load, i.e. in center continuation, a persisting preferred center Cp(n) is typically realized by a zero pronoun and is strongly associated with the grammatical relation of S/A and subjecthood, clause-initiality and pronominalization. On the other hand, in center shifts, a preferred center Cp(n) that associates with C(n+1) is typically realized by a morphologically explicit full NP.
3.
Parenthetical Agent-demoting Constructions
3.1. “Ergative” Construction
Khanty ergative constructions display structural similarity to the canonical active-direct clause type, with an important exception: the S/A argument is always overt and inflected for Loc case, which typically marks spatial and temporal relations: kat-n@ ‘in (the) house’, it-n@ ‘in the evening’.
(2)
a. tS1laGt-@s-m rut’ saG1: “medved!”
   cry-pst2-1sg Russian manner “bear”
   ‘I cried2 in Russian “bear!”’
2. The small caps are used to denote the clause constituent bearing the clausal accent.
b. moS@t j1G1-n@ qol-waGta-l-1l
   “maybe” 3pl-loc hear-atten-prst-3pl/sg
   ‘Maybe they would hear it’
c. nu jem-aki, j1Gata-l-1m j1Gata-l-1m, aGa, wajaG
   “nu” good-prd look-prst-1sg/sg OK, animal
   ‘Ok, I look, there it is, the animal’
In (2a), the canonical active-direct clause with the elided 1sg topic is followed by the ergative (2b) with a new 3pl referent. Counter to the preferred topic expression pattern discussed in the section above, the topical Agent referent ‘they’ in (2b) appears coded by a free Loc-marked pronoun in the S/A relation and by S-V agreement inflection. The Message “Bear!” now has high discourse activation status, marked in (2b) by 3sg O-V agreement on the predicate. However, the narrative resumes canonically in the immediately subsequent active-direct clause (2c), where the topical status of the 1sg referent has the preferred topic expression. That is, the demoted 1sg topic referent of (2a) reappears expressed in (2c) by elision and co-referential agreement inflection on the predicate, thus retaining its overall discourse salience.
Superficially, nothing in the structure/grammar of these clauses, or in their immediate discourse environment, precludes the use of the canonical active-direct construction type to express the same semantic content. These features correlate with the general pragmatics of topic-comment (2a) vs. marked predicate-focus (2b) with the Loc-marked S/A agent pronoun.
With respect to discourse information structuring, the “ergative” events appear parenthetical and consequential (reactive) in nature, representing a cause-effect or action-consequence dependence upon the event in the preceding active-direct clause: (2a) implies the projected effect of (2b). Although the effect is often implicit, the affectedness of the Target referent (expected in the typical transitive event) is never specified.
(3)
män-n@ tSimläli tSi-näm joGo-s-im, tSut-na-pa @nt-im-äki
1sg-loc a.little det-lat shoot-pst2-1sg det-com-top neg-pp-prd
‘I shot there a little, nothing happens’
Thus, the Loc-marked ergative S/A referents, although mainly inherently agentive (definite human/animate), are deprived, at least in part, of some of the subjecthood properties: control/volition, which correlates with the fact of their increased morphological complexity: oblique case marking of the Agent. This is
yet another departure from the canonical Topic-Agent-S/A arrangement. Pragmatically, the Eastern Khanty ergative clause’s preferred center Cp(n) appears realized implicitly, whereas the current foregrounded center Cf(n) does not normally correspond to either C(n−1) or C(n+1). The fact that, for a typical ergative clause U(n), the subsequently salient entity C(n+1) is typically the same as the previously salient entity C(n−1) shows that “ergative” clauses are largely a center-retaining relation, which complies with the cross-linguistic preference (ergative clauses do not sequence), that is, Center[foregrnd] = Prn(typical) = Cf(n) ≠ C(n+1) = Subjecthood(−/+). A more complete list of properties of this construction is as follows:
– Agent is typically clause-initial, marked for Loc case and mapped to the A grammatical relation controlling co-referential verbal agreement;
– the matrix predicate is a transitive verb in active morphological form, normally expressing a perfective action; however, the affectedness of the Target is uncertain and the underlying resultant transitivity of the event is low;
– the argument with the semantic role of Target is expressed by a full Ø-marked NP or the Acc-marked pronoun;
– prosodically, the Loc-marked ergative Agent, particularly when pronominal, does not carry the sentence stress, whereas the active-direct A arguments have sentence accent of some kind;
– ergative constructions mark a temporary alteration of the discourse center, foregrounding a Loc-marked Agent other than the current discourse topic;
– the primary discourse topic preceding the ergative clause reappears in the preferred topic expression (elision and verbal inflection), thus maintaining its overall discourse topicality (salience);
– ergatives code events in a reactive semantic relation to the preceding clause;
– overall type frequency: average 12%.
3.2. Agented Passive Clauses
One of the most typical Eastern Khanty passive constructions has an overt agent referent marked with Loc case and mapped onto a non-S/A grammatical relation. The referent mapped to the S grammatical relation is instead the one in the semantic role of Target.
(4)
min-n@ tü taG1 jöG-ä ert@l-s-i
2du-loc det place 3ag-ill tell-pst2-ps.3sg
‘We (two) told him all about this’
In passive (4), the ‘message’ is essentially equated to Target and promoted to the S relation, whereas the Agent-speaker is oblique case marked. The analysis of the sequence (5a–b) reveals some of the structural features and discourse functional patterns of the Eastern Khanty passive constructions. In the active-direct (5a), the topical agentive referent, 1sg S/A, is expressed predictably by elision and the predicate agreement inflection.
In the adjoined passive (5b), the Target referent ‘small dog’ is expressed by the full NP in the S relation controlling 3sg verbal agreement, whereas the Agent referent is expressed by the free Loc-marked pronoun in the non-S relation. The topic in (5b) is the S argument ‘small dog’, whereas the 1sg agent is apparently temporarily demoted to an oblique-like role. The relevance of the agent in the proposition (5b) is still manifested through its overt presence, to minimize potential ambiguity. However, in the active-direct (6), the demoted 1sg agent reappears in the S/A relation, remaining topical, coded appropriately by elision and predicate agreement inflection, and controlling the S/A relation over the non-finite modifiers (inflected for 1sg); it is thus unaffected at the discourse level by the passive demotion.
With regard to information structure, the correlation of [pragmatic function – to semantic role – to grammatical relation] (TOPIC = Agent = S/A) translates in passive into (TOPIC = Target = S). However, the referent with the role of Agent, demoted in passive from A to O grammatical relation, maintains some pragmatic properties testifying to its retained discourse centrality that allows it to emerge as topic in the immediately subsequent discourse without any special
topic promotion means, i.e. expressed by elision and co-referential agreement on the predicate. Pragmatically, Eastern Khanty agented passives typically have referents with competing discourse centrality. One, a primary topic, a backgrounded center Cb (often corresponding to a C(n−1)), has the semantic role of Agent realized by the Loc-marked grammatical relation O, typically a free pronoun, which appears to have some of the subjecthood properties and which typically corresponds to the subsequently salient entity C[n+1]. That is, O = Prn(typical) = Agent = Subjecthood(+/−) = Cb = C[n+1] = Cp(n). Another referent with competing discourse centrality, a secondary topic, a foregrounded center Cf with the role of Target, is realized by the Nom-marked grammatical relation S, typically a full NP or a free pronoun, clause-initial, also having some subjecthood properties, such as predicate agreement control, and is highly improbable as C(n+1), i.e. S = NPfull = Target = Cf ≠ C[n+1] = Subjecthood(−/+). This manifests the center-retaining relation of agented passive clauses, adhering to the cross-linguistic center-continuation sequencing preference constraint (agented passives do not normally sequence over 2 clauses in Eastern Khanty).
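As a rough illustration of the Centering notions invoked here, the following Python sketch classifies the transitions between adjacent utterances in the manner of Grosz et al. (1995). It is a minimal sketch under stated assumptions and not the procedure used in this study: the referent labels (`speaker`, `small_dog`), the ranking of forward-looking centers by grammatical relation (S/A first), and the treatment of an undefined previous backward-looking center as compatible are all illustrative choices. Note that `cf` in the code stands for the standard list of forward-looking centers, not for the foregrounded center of this chapter.

```python
from typing import List, Optional


def backward_center(prev_cf: List[str], current_cf: List[str]) -> Optional[str]:
    """Cb(Un): the highest-ranked element of Cf(Un-1) that is realized in Un."""
    for referent in prev_cf:
        if referent in current_cf:
            return referent
    return None


def transition(prev_cb: Optional[str], cb: Optional[str], cp: Optional[str]) -> str:
    """Classify one transition as CONTINUE, RETAIN or SHIFT."""
    if cb is None:
        return "NO-CB"  # no referent is carried over from the previous utterance
    # Assumption: an undefined previous Cb (start of discourse) counts as compatible.
    same_cb = prev_cb is None or cb == prev_cb
    if not same_cb:
        return "SHIFT"
    return "CONTINUE" if cb == cp else "RETAIN"


def classify(utterances: List[List[str]]) -> List[str]:
    """Each utterance is its Cf list, ranked with the S/A argument first (assumed)."""
    results = []
    prev_cb = None
    for prev_cf, cf in zip(utterances, utterances[1:]):
        cb = backward_center(prev_cf, cf)
        cp = cf[0] if cf else None  # preferred center: highest-ranked Cf element
        results.append(transition(prev_cb, cb, cp))
        prev_cb = cb
    return results


# Hypothetical active-passive-active sequence modelled on the description above.
narrative = [
    ["speaker"],               # active-direct clause, elided 1sg topic
    ["small_dog", "speaker"],  # agented passive: Target in S, Loc-marked Agent
    ["speaker"],               # active-direct clause resumes
]
print(classify(narrative))
```

Under these assumptions the agented passive comes out as a RETAIN transition followed by CONTINUE, which is in line with the center-retaining behaviour described above.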
A summary of the key structural and discourse-pragmatic features of Eastern Khanty passive constructions is as follows (Filchenko 2006):
– Target is typically a full NP unmarked for case, mapped onto the S grammatical relation, controlling the S-V agreement on the predicate;
– Agent is typically a free pronoun or full NP, mapped onto a non-S grammatical relation, marked by oblique Loc case;
– passive marks a change in the pragmatic status of the referents, temporarily foregrounding the non-Agent, promoting it to the S relation, and backgrounding the Agent, demoting it to the non-S relation (oblique);
– however, while at the clausal level the pragmatic status of the referents is altered in the course of passive clauses, at the level of overall discourse the agent referent maintains high pragmatic status (discourse salience), which follows from its canonical preferred topic expression by elision and agreement inflection in the subsequent active-direct clauses;
– passive predicates are prototypically transitive verbs, implying two core arguments, one of which, the demoted O, is high in agentivity status, while the promoted S is reduced in agentivity, being affected in the event;
– passive-active sequences testify that passive is a marked construction type, requiring a special arrangement of the referents, which is outside the canonical pattern of mapping pragmatic functions to semantic roles to grammatical relations;
– passive manifests the Eastern Khanty tendency for Topic initiality, i.e. the strongest alignment appears to be <pragmatic function = grammatical relation>, overriding that of <pragmatic function = semantic role> or <semantic role = grammatical relation>;
– type frequency in the narratives: ∼13%.
The Eastern Khanty passives with overt Agent appear to resonate with the general fundamental function of passives “having to do with defocusing of agents” (Shibatani, 1985). The structural properties of the Eastern Khanty voice constructions listed above are not typologically unique, nor are they new to the description of the language. However, the exact motivation for their usage, i.e. the functional explanation of these grammaticalized form-function correspondences, has not been accounted for. Meanwhile, the explanation of the speakers’ choice of strategies for mapping the propositional-semantic content to the structural features and discourse functions in the passive constructions may be aided by insights into the specifics of the cultural context that typically correlates with frequent use of these constructions, that is, passive voice talk.
4.
Cultural Context and Pragmatic Explanation
4.1. Passive voice talk
It can be posited that the use of the passive voice talk in Eastern Khanty correlates most typically with the cultural context of marriage and appears illustrative of the conventionalized cultural frame. The choice of grammatical resources appears to correlate with the conventionalized cultural practice (patriarchal/patrilocal residence and strict exogamous marriage).
Thus, within the cultural context of marriage/family, the woman is “given” by her family to go and live with the husband, and acts as a Target of ‘giving’, ‘taking’ and ‘keeping’. The man, on the other hand, is the one who is ‘taking’ and ‘keeping’ the woman, and acts as a volitional agent.
In the Eastern Khanty conventionalized cultural frame of marriage, the social role of wife towards the man typically correlates in linguistic discourse with the semantic role of the Target, while the social role of husband typically correlates with the semantic role of the Agent. The only consistent discrepancy from the above correlation is the ‘wife = agent’ in motion constructions, where ‘bride’3 is the Agent of motion events. This motion implies both the literal sense (physical relocation) and a more abstract sense of transferring oneself from the general domain of the Father to the general domain of the husband: ikija m@nta ‘to marry (Lit. towards husband go)’, i.e. becoming associated with the general domain of Husband, as implied by the frame:4
Figure 2. Marriage - ‘social role’ transition = ‘space’ transition.
3. The Eastern Khanty cultural frame implies that the status of ‘wife’ is complete only upon arrival and the observance of certain rituals at the new family location (new house), the ‘husband’s’ clan residence.
4. The Eastern Khanty have a patrilineal, patrilocal setting, where the oldest son normally resides with his Father and inherits from him.
This spatial/abstract transition (motion) is not genuinely agentive, however, in the sense of volition and control, as the cultural frame implies that the agent (woman) is not the one who is truly controlling and volitional in the motion event, but complies with an external will. The Agent (woman) never acts alone, of her own accord, being rather taken/accompanied to a new location by the man. Thus, it can be posited that the key linguistic features (passive voice) of the Eastern Khanty marriage ‘talk’ are informed and conditioned by the dominant cultural patterns and norms of social behavior. This perhaps somewhat anecdotal exemplification of the cultural groundedness of the Eastern Khanty agented passive constructions was not meant to identify their functional range or in any way to predict a possible grammaticization route, but rather to illustrate possible cultural mechanisms underlying the usage of these constructions, which appears to be the tendency to demote the agent referent. The demotion of the agent here appears to be an operation having to do with de-emphasizing the core participant in the event, moving it towards the periphery of the semantic frame of the event, manifested by an increase in the morphological complexity of the argument, that is, Loc case-marking of the passive Agent, a fairly consistent typological pattern (Shibatani 1985).
4.2. “Ergative” voice talk
Eastern Khanty “ergative” constructions show similar general features of agent demotion. That is, agent arguments here are more like adjunct clause constituents marked with oblique case, generally rendering the agent and the whole event as less volitional and less controlled. Similar “ergative” constructions in remotely related Finnic languages were referred to in early descriptions as logically impersonal sentences, where events were conceptualized by speakers as caused by other unapparent forces.
A human here is not granted adequate agentive status, merely marking a locus of an event, and appears in essence a semi-responsible performer of an act (Bubrix 1946; Balandin 1946). The most frequent cultural context for the “ergative” voice talk in Eastern Khanty appears to be that describing interactions with bears.
The bear is an extremely significant cultural agent for many of the Siberian native cultures, and for the Eastern Khanty in particular. From the available ethnographic accounts (Tschernetsov 1974, Kulemzin 1984, Lukina 1990) and the surviving oral folk tradition (Filchenko 2009) it is known that behaviour towards the bear is highly ritualized, with omnipresent taboos and restrictions. Having basically equal status with a human, the bear is the biggest and most dangerous local predator, referred to as wont-iki ‘forest master’, or qaq1 wajaG ‘brother animal’, or just wajaG ‘animal’, and almost never by a proper taboo term jiG ‘bear’. The bear is the only animal that has a complete set of taboo terms for body parts unrelated to the proper somatic nomenclature of humans and other animals (cf. kil ‘bear’s stomach’ vs. qon ‘human/animal stomach’; laGl’ip ‘bear’s tooth’ vs. pöNk ‘human/animal tooth’, etc.). The bear’s bones are never broken and are specially disposed of out of the reach of dogs and other animals, while the skulls are kept by the hunters on house roofs (cf. photo below). The events of hunting the bear and feasting over it are of extreme significance and ritual value, characterized importantly by concealment of the identities of the hunters (mask (cf. photo below) and nicknames), interpreted as a need to avoid possible retaliation from the bear’s spirit (Kulemzin 1984, Filchenko 2007). On the other hand, the ability to hunt bear successfully and thus provide for the community translates into a prominent social status for the hunter (cf. photo, Figure 3). Thus, within the Eastern Khanty cultural convention, on the one hand, there is an apparent tendency towards considerable caution and avoidance of personal association with the bear, and particularly with affecting the bear. On the other hand, there is an apparent social significance of the status of a successful bear hunter.
Figure 3. Cultural significance of the bear for Eastern Khanty.
This special cultural status and the behaviour conventions towards the bear correlate with its special linguistic treatment, in that special pragmatically motivated morphosyntactic operations are implemented, aimed at demoting the status of the agent in the otherwise typical transitive event. In other words, the culturally conditioned tendency towards avoidance of association with affecting the bear is manifested in the linguistic avoidance of discourse salience of the agent in the propositions describing this type of event (cf. also Claus, this vol., outlining experiential effects on language processing). The above, however, should not be seen as a statement that this construction is restricted exclusively to the contexts involving the bear. In (15), the context involves hunting for the biggest local fish, pike.
(15)
a. mä sart wel-s-@m, ¨@ll¨@
   1sg pike kill-pst2-1sg big
   ‘I caught a pike-fish, a big one’
b. ¨Oll¨@ sart män-n@ löGöli-s-im
   big pike 1sg-loc cut-pst2-1sg/sg
   ‘I prepared the big pike’
c. terkä-s-im iwes-n@
   fry-pst2-1sg/sg stick-loc
   ‘I fried it on sticks’
In (15a), the human Agent appears clause-initial, controlling predicate agreement. The established pattern would predict further maintenance of this referent as topical by elision and agreement inflection. However, counter to this expectation, in (15b), this referent appears expressed by the free 1sg Loc-marked pronoun and 1sg predicate inflection. The referent “pike” has become identifiable and accessible textually in (15b), which is also evident from the marked O-V agreement. After the temporary alteration by the “ergative” (15b), the narrative discourse resumes in the expected canonical way in the immediately following active-direct (15c), where the topicality of the 1sg Agent referent is canonically expressed by elision and the 1sg agreement on the predicate. Thus, the pragmatic, semantic and structural features of (15) comply with the established pattern for the “ergative” agent-demoting construction, which is used contextually at exactly the point in the event structure where an apparently agentive affecting of the Target by the Agent is coded at the formal level in a manner that avoids agentivity and salience. Reflexive or middle events may also be coded in Eastern Khanty by the “ergative” constructions, consistently with the lower control/volition context; in (16), adverbials specifying the degree of intentionality appear optional if not redundant.
(16)
a. män-n@ köt-äm (mil-näm / %toGoj) öGö-käs-@m kötSäG-nä
   1sg-loc hand-1sg (touch-rfl / away) cut-pst3-1sg knife-com
   ‘I cut my hand with a knife (incidentally / %on purpose)’
b. mä köt-äm kötSäG-nä (mil-näm / toGoj) öGö-käs-@m
   1sg hand-1sg knife-com (touch-rfl/away) cut-pst3-1sg
   ‘I cut my hand with a knife (incidentally/on purpose)’
Notice that in the Agent-demoted (16a), the ‘incidental’ interpretation and respective adverbial use is preferred to the ‘intentional’. However, in the reflexive event with the active-direct transitive clause and canonical Agent coding (16b), both the ‘incidental’ and the ‘intentional’ interpretations and respective adverbial use are acceptable. Unlike (16b), which allows for a volitional, purposeful event of acting on oneself, the “ergative” voice in (16a) codes a less intentional, defocused Agent and typically has a ‘reading’ of the less volitional, unintentional Event/Action. The “ergative” agent-demotion construction is also attested with modal verbal predicates, particularly of cognition (17), which are in many features similar to the already exemplified perception predicates (‘look / aim at’).
(17)
män-n@ onql-l-@m tom qu ju-w@l
1sg-loc know-prst-1sg det man walk-prst.3sg
‘I know the man, who is walking there’
In their features, these examples are consistent with the established pattern of low Target affectedness and reduced/unapparent agentivity and control. In referential terms, the preference for the contextual use of the construction also holds for non-SAP (non-speech-act-participant) referents.
(18)
Igorenka SaSka-n@ sam-a tSi-näm joGo-w@l
Igorenko Sashka-loc mug-ill det-lat shoot-prst.3sg
‘Sashka Igorenko shot at the (bear’s) mug’
In (18), the 3sg non-SAP agentive referent is demoted in the context of bear hunting, implying the agent’s affecting the bear, but the agent’s control and volition in the proposition are backgrounded by Loc case-marking. The “ergative” agent-demoting constructions are also attested in less culturally significant contexts; however, the general pragmatic tendency of agentivity and salience avoidance holds consistently (19):
(19)
a. Matrena Jakowlewna, temi nuN rabota-n? muGuli t@m w@r-s-@n?
   Matrena Jakovlevna det 2sg work-2sg what here do-pst2-2sg
   ‘Matrena Jakovlevna, is this your job? What did you do here?’
b. (temi) @nt@ män-n@, metali-p @ntu-s-@m
   (det) neg 1sg-loc some-top neg-pst2-1sg
   ‘(That’s) not me, I did not do anything (nothing happened)’
Notice that in the reply utterance (19b), the Agent is coded as the clause-initial argument marked by the Loc case, with the main pragmatic function being agent demotion, making it less volitional, controlling and affecting and, importantly, less topical for the given stretch of the discourse. The non-topicality, or rather the focus pragmatic relation, of the Agent argument towards the whole of the proposition (19b) is an important feature, the one by which the presupposition of (19a) is falsified, or made absent. Thus, it is not incidental or random that the morphosyntactic properties of the Agent argument in (19b) are consistent with those of the demoted Agent voice construction. On the other hand, the non-topicality of the role of Agent here and its coding by the Loc-marked NP align it with another marked Agent-demotion construction, the agented passive.
In some of the examples (20), the Target referent may be elided from explicit coding, being expressed only by predicate agreement inflection, whereas the Agent is overt and marked formally by the Loc case. These examples testify additionally to the Agent’s demotion from the core of the proposition towards the periphery, while the elided Target appears more topical and pragmatically foregrounded.
(20)
män-n@ tSäs qötS@G-näti tuG1 tSoG-l-uj-@n
1sg-loc now knife-instr away cut-prst-ps-2sg
‘I’ll cut you up with a knife now’
This could be seen as a further pragmatic and overall semantic convergence of the two reviewed Agent-demoting voice constructions, the “ergative” and the agented passive.
5.
General Characteristics of the Agent-demoting Constructions
The Eastern Khanty Agent-demoting constructions (Loc-marked agents) demonstrate a mixture of features typical of both subject and non-subject arguments. Though they demonstrate regular S/A-V agreement control and agency-subjecthood features, the Locative case marking aligns these Agents with non-agentive locative arguments of motion/posture/state propositions, which are essentially intransitive in nature. These features correlate consistently with the cross-linguistic observations on constructions with non-canonically marked Agent arguments, particularly with the fact that among the predicates requiring the non-canonical marking of the Agents, those expressing uncontrollable activities are typical to the extent that control vs. non-control may be “a generally applicable semantic feature” (Onishi 2001). It is also observed cross-linguistically that, in general, oblique case marking of the core arguments “reflects decreased transitivity status of the whole clause” (Onishi 2001). This tendency could be represented as in Figure 4 below, in an adaptation of the Onishi continuum (Aikhenvald et al. 2001). Typologically, the well-documented variety of manifestations of ergativity introduces less discreteness to the category, allowing for “ergative-like” behaviour in otherwise prototypically nominative languages, making it less a category than a scalar pattern of organisation of grammatical relations, which can be present to a varying extent at different levels of a language system (Comrie, 1978; Dixon, 1994).
(+) agent's subjecthood, (+) control/volition, (+) clause/event transitivity, (+) agent’s pragmatic salience: Nominative case, canonical
(-) agent's subjecthood, (-) control/volition, (-) clause/event transitivity, (-) agent’s pragmatic salience: Locative case, agent-demotion
Figure 4. Continuum of pragmatic, semantic and morphosyntactic features vs. canonical/noncanonical Agent coding.
Demotion of the Agent referent in both of the reviewed Eastern Khanty constructions appears to be an operation having to do with defocusing of the core participant in the event, moving it towards the periphery of the semantic frame of the event, typically manifested by an increase in the morphological complexity of the Agent referent (cf. cross-linguistic examples of passive constructions in Shibatani, 1985). Among the essential factors that underlie attested ergative, nominative or split systems, it is often suspected that larger discourse-pragmatic and/or semantic considerations might be the key conditioning factors. In particular, the degree of topicality/referentiality, as well as the volition/control of referents, may affect the inclination of a language’s grammatical relations towards either ergativity or nominativity. Morphological complexity of the Agent argument of the Eastern Khanty “ergative” construction quite consistently correlates with prototypical ergative continuity between S and O at the deeper level, i.e. A of the “ergative” approximates the O of the active in its pragmatic and semantic features: decreased topicality, low agentivity/control, approaching the semantics of experiencer/undergoer. In “ergative” clauses, the overt Loc-marked human/animate Agent, the low transitivity of the morphologically active verbal predicate, predicate agreement control, and the parenthetical character of the clause (one clause in length, followed by canonical active-direct clauses with the continuing topic expressed by elision) signal temporary pragmatic demotion of the low control/volition Agent in the consequential event, where the agentive nature of the Agent is de-emphasised, consistent with specific cultural conventions and practices.
In passive clauses, such features as the demoted overt Loc-marked Agent, the high semantic transitivity of the morphologically passive verb, the promoted case-unmarked Target controlling predicate agreement, and the parenthetical character of the clause (1–2 clauses in length, followed by a canonical active-direct clause with the continuing topic expressed by elision) communicate temporary pragmatic prominence of the Target in the spontaneous/consequential event, where the pragmatic topicality (salience) and causer nature of the Agent are de-emphasized. The Eastern Khanty Agent-demoting constructions, the passive and the “ergative”, are similar in that they manifest parenthetical establishment of an alternative, secondary topical discourse referent, whose pragmatic status (topicality) briefly competes with that of the primary topical agentive referent, which is expressed by the temporary demotion (backgrounding) of the Agent referent. The secondary discourse topic is typically coded by a full NP or a free pronoun, which, as shown at the outset, is not a preferred primary topic expression in Eastern Khanty. What appears to differ in these Agent-demoting constructions, motivating their co-existence in Eastern Khanty, is the variation in the pragmatic status of the two core roles in the proposition within the discourse context. That is, the Agent-demoting constructions temporarily demote (background) the current topical Agent, rendering it less controlling/volitional, and possibly parenthetically promote (foreground) another referent for the length of the utterance. The passive construction, while also demoting (backgrounding) the Agent, is not primarily concerned with its agentivity features (as in the case of the Loc-Agent construction), but rather aims to promote (foreground) the non-Agent, Target role to the discourse fore. More broadly, within the Eastern Khanty system, Agent-demoting constructions vs. canonical active-direct ones indicate a general consistency of the Loc marking of the Agent with particular pragmatic and semantic environments. The identification of the two Agent-demoting constructions based on discourse-pragmatic parameters is also supported by the indication of what appears to be their complementary distribution in the narrative discourse.
That is, these constructions have comparable type frequency in the narratives (12%–13%); however, they appear to show counter-proportional or mutually exclusive frequency within the same narratives, as is evident in the original corpus and from prior studies (Kulonen 1989).
Conclusion

This case study of two agent-demoting constructions in an indigenous language of Siberia offers an empirical contribution to the study of the wider issue of discourse salience. Contrasting the structural features, discourse-pragmatic functions and propositional-semantic content of two types of referring expressions in their narrative environment, I attend to the issues of information structure and the dominant cultural contexts of the expressions. Based on the analysis, I posit that a wide cognitive faculty, facilitating discourse coherence in the given cultural context, is at play in structuring the information and ultimately affecting the linguistic form, governing the choice of referring expressions for the arguments of propositions. The system’s specific grammatical resources consistently correlate with identifiable and predictable pragmatic and semantic properties, manifesting the speaker’s choices in representing the salient entities across a stretch of discourse. The Eastern Khanty voice constructions typically code ultimately de-transitive events with multiple referents potentially competing for discourse salience. The “agented passive” and “ergative” constructions manifest parenthetical shifts in the salience of the agentive discourse referents, allowing for gradience in discourse prominence, that is, for primary and secondary topicality. This is expressed by promotion/demotion voice operations, oblique case marking and grammatical relation shifts of agents.

The implications of the above for the issue of discourse salience are various. What is salient in the discourse appears to be a matter of gradience and to be underlyingly culturally motivated. The salient entity in a stretch of discourse is a persistent center in a sequence of utterances. Centering is a multifactorial discourse phenomenon controlled by an interaction of cultural convention and pragmatic, lexical-semantic and syntactic features.
“Ergative” and passive constructions show that: i) center continuation sequencing is indeed preferred over center retention sequencing; ii) Cp(n) does strongly associate with clause-initiality and pronominalization, which confirms the correlation of low morphological complexity with high pragmatic salience; iii) the association of Cp(n) with subjecthood, or the grammatical relation S, is strong but not impenetrable to such factors as cultural preferences, speaker intentions and pragmatic pressures; iv) in the non-canonical constructions the S relation is indeed firmly tied to a relatively high degree of discourse salience, albeit occasionally temporarily foregrounded compared to the primary persisting discourse center; v) in a stretch of utterances more than one simultaneous discourse center is possible, in which case vi) the choice of referring expression is determined by the speaker’s intentions interacting with prevailing patterns in the linguistic system, such as word order, clause-initiality of the center, and the association of semantic role (Agent), grammatical relation (S/A) and grammatical role/function (Subject).

List of abbreviations (glosses)

acc – Accusative case
Ag – Agent
atten – Attenuative affix
com – Comitative case
cond – Conditional affix
det – Determiner
dim – Diminutive affix
du – Dual number
el – Elative case
Ep – Epenthetic vowel/consonant
ill – Illative case
impp – Imperfective participle
impr – Imperative affix
inf – Infinitive affix
instr – Instrumental case
intn – Intensive
lat – Lative case
loc – Locative case
neg – Negative particle
np – Noun phrase
mmnt – Momentative affix
pl – Plural number
1sg – 1 person singular
1sg/sg – verbal agreement or possessive affix (1sg – Agent/Possessor, sg – Object/Possessed)
3pl – 3 person plural
3pl/sg – verbal agreement or possessive affix (3pl – Agent/Possessor, sg – Object/Possessed)
3sg/pl – verbal agreement or possessive affix (3sg – Agent/Possessor, pl – Object/Possessed)
pp – Perfective participle
prd – Predicator affix
prst – Present-Future tense
ps – Passive voice affix
pst1 – Past tense affix #1
pst2 – Past tense affix #2
pst3 – Past tense affix #3
pst0 – Suffixless past tense affix
rfl – Reflexive particle/affix
sap – Speech act participant
sg – Singular number
top – Topicality marker
tr – Trajector
tr – Transitivizer affix
References

Aikhenvald, A.Y., Dixon, R.M.W., Onishi, M. 2001. Non-canonical marking of subjects and objects. Amsterdam/Philadelphia: Benjamins.
Comrie, B. 1978. Ergativity. In “Syntactic Typology” ed. W.P. Lehmann. UT Austin Press.
Dixon, R.M.W. 1994. Ergativity. CUP.
Du Bois, J. 1987. Discourse basis of ergativity. Language 63.
Filchenko, A. 2006. The Eastern Khanty Loc-Agent Constructions. Functional Discourse-Pragmatic Perspective. In: Demoting the Agent, ed. Torgrim Solstad and Benjamin Lyngfelt. John Benjamins. Amsterdam-New York.
Filchenko, A.Y. 2009. Landscape Perception and Sacred Places amongst the Vasyugan Khanty. In: ed. P. Jordan: Landscape and Culture in the Siberian North. UCL Press.
Filtchenko, A.Y. Field notes from ethno-linguistic research of Eastern Khanty. The Field Archive of the Laboratory of Siberian Indigenous Languages at TSPU. Tomsk.
Givon, T. 2001. Syntax. An Introduction. Amsterdam/Philadelphia: John Benjamins.
Grosz, B., A. Joshi and Sc. Weinstein 1995. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics 21. pp. 203–225.
Gundel, J.K. 1976. Topic-comment structure and the use of tože and takže. Slavic and East European Journal 19. pp. 174–176.
Jordan, P. and Filtchenko A. 2005. Continuity and Change in Eastern Khanty Language and Worldview. In: “Rebuilding Identities: Pathways to Reform in Post-Soviet Siberia” ed. Erich Kasten. Dietrich Reimer Verlag.
Karjalainen, K.F. 1927. Die Religion der Jugra-Völker. Porvoo.
Klimov, G.G. 1984. Nominativnoe i ergativnoe predlozhenia. Moscow.
Kulemzin, V.M. 1984. Celovek i priroda v verovanijakh Khantov. Tomsk.
Kulemzin, V.M. 1995. Mirovozzrencheskie aspekty ohoty i rybolovstva. In: V.I. Molodin, N.V. Lukina, V.M. Kulemzin, E.P. Martinova, E. Schmidt, N.N. Fedorova (eds.) Istoria i kultura Khantov. Tomsk. pp. 45–64.
Kulonen, U.-M. 1989. The Passive in Ob-Ugrian. Helsinki.
Lambrecht, K. 1994. Information Structure and Sentence Form. Cambridge: CUP.
Li, C. (ed.) 1977. Subject and Topic. London: Academic Press.
Lukina, N.V. 1990. Obshee i osobennoe v kulte medvedja u obskix ugrov. Obrjady narodov severo-zapadnoj Sibiri. Tomsk.
Sarkany, M. 1989. Female and Male in Myth and Reality. Uralic Mythology and Folklore, Bp., Helsinki.
Shibatani, M. 1985. Passives and Related Constructions. Language. V-61, #4. pp. 821–848.
Trask, R.L. 1979. On the origins of ergativity. In: Plank F. (ed.), Ergativity. Towards a theory of grammatical relations. London – New-York.
Tschernetsov, W.N. 1974. Bärenfest bei den Ob-Ugriern. Acta Ethnographica Academiae Scientiarum Hungaricae, t.23 (3–4). Budapest.
Van Valin, R. and Lapolla, R.J. 1997. Syntax. Structure, meaning, and function. CUP.
Joint information value of syntactic and semantic prominence for subsequent pronominal reference

Ralph L. Rose
1. Introduction

Many studies of discourse production and perception have observed that entities evoked in subject position are treated somewhat differently than those evoked in other positions when those entities are referred to subsequently. For instance, consider the short discourse in (1).

(1) a. Luke_i hit Max_j.
    b. Then he_i/#j ran home.
    b’. Then #Luke/Max ran home.
While the pronoun in (1b) is ambiguous and could be interpreted as referring to either luke or max, the preferred interpretation is luke, the subject of the preceding sentence (cf., Hudson-D’Zmura and Tanenhaus 1997; Mathews and Chodorow 1988). Similarly, repeated reference to luke by name as in (1b’) is more marked than repeated reference to max by name (Gordon, Grosz, and Gilliom 1993; Almor 1999; Almor and Eimas 2008). These observations are from the hearer’s perspective, but even from the speaker’s perspective, similar preferences have been observed. Brown (1983) observed that entities introduced as subjects persisted longer than those introduced in other syntactic positions: that is, there were more contiguous utterances in which the entity was referred to again.

Many models of discourse production and processing capture these observations through two assumptions. First, the salience of entities evoked in a discourse determines how subsequent reference to those entities should be performed or interpreted (Ariel 1988; Gundel, Hedberg, and Zacharski 1993; Gordon and Hendrick 1997, 1998). Second, syntactic information¹ is a primary or even the sole factor which determines salience (Grosz, Joshi, and Weinstein 1995; Lappin and Leass 1994). Thus, according to this kind of model, the first sentence in (1) introduces two entities into the discourse representation, luke and max. With respect to the syntactic prominence hierarchy shown in (2), in this representation, luke is more salient because it was realized in subject position while max is less salient, having been realized in object position.

1. An alternative to syntactic prominence is word order, which, for simplex clauses at least, often results in the same ordering. This is the approach taken in Gernsbacher and Hargreaves (1988). Further, Hinterhölzl and Petrova (this volume) show that in Old High German, word order involving verb placement indicates the discourse status of referents.

(2) subject > object > oblique > none

One problem with this account is that in such languages as English, syntactic information is often conflated with semantic information. That is, syntactic subjects are often semantic agents and carry more Proto-Agent entailments (e.g., sentience, volition; Dowty 1991), while syntactic objects are often semantic patients and carry more Proto-Patient entailments (e.g., undergo change-of-state, causally affected). Thus, assuming a semantic prominence hierarchy as in (3) (cf., thematic hierarchies in Dorr, Habash, and Traum 1998; Jackendoff 1972, 1990; Speas 1990), it could be the case that luke is more salient than max in (1a) not because it is realized in subject position, but rather because it is realized as an agent (of the hitting event).

(3) agent > patient > others

Along these lines, Stevenson et al. (2000) argue that semantic focusing is a crucial factor helping to explain such inter-utterance coherence effects. However, Miltsakaki (2007) shows that semantic focusing alone is insufficient to explain related phenomena in Greek and suggests that things may be more complicated.

The purpose of this paper is to investigate syntactic and semantic prominence. Specifically, I am seeking to answer the question, “What is the relative contribution of syntactic prominence and semantic prominence to the salience of entities evoked in a discourse?” I investigate this question with a corpus investigation which looks at coreference across adjacent utterances and the form of referring expression (pronoun or description) used in subsequent reference. The results are presented in terms of Information Theory (Shannon 1948) and suggest that while syntactic and semantic prominence are comparably informative about the form of subsequent reference, taken together, syntactic and semantic prominence are more informative than either is alone.
In the next section, I describe the basic discourse model I assume in this paper and then in Section 3, I describe the corpus used in this study. Section 4 contains an overview of Information Theory and particularly the concept of the value of information. I report the results of the study in Section 5 along with interleaved discussion.

2. Discourse model
In this paper, I assume a model of discourse processing in which the current utterance is processed with respect to the context; that is, the representation of the discourse so far (Kamp and Reyle 1993; Kehler 2002). I assume that the context contains representations of the entities evoked in the discourse. Following Karttunen (1976) and Heim (1982, 1983), I call them discourse referents (or just referents for short). The set of referents is a partially-ordered list, the order of which is determined by salience – “the degree of relative prominence of a unit of information, at a specific point in time, compared to the other units of information” (cf. the Introduction, this volume). A number of factors contribute to salience, including syntactic role and recency of linguistic expressions in a discourse (see Hirst 1981 and Mitkov 2002 for an overview of these and many other factors). Non-linguistic factors may also contribute to the salience of referents, including visual salience (e.g., Kelleher, this volume) and possibly prominence within a mental simulation (e.g., Claus, this volume). However, in this paper I will focus only on linguistic factors.

I take the highest ranking referent to be the most salient referent in the current context. As such, if this referent is evoked in the current utterance, then it should be realized pronominally (cf., Rule 2 of the Centering Framework of Grosz, Joshi, and Weinstein 1995). Pronominalization in subsequent reference is thus a useful metric for determining which referents in the context are more salient than others (see Krasavina in this volume for more extensive discussion of referential choice and how this works in Russian). This is the approach I use in the corpus analysis in order to examine which referents are most salient and subsequently which syntactic and semantic features are most informative for determining their salience.
However, one simplification I make is to assume that recency determines that all referents evoked in the most recent utterance are more salient than those evoked in earlier utterances. Thus, while inter-utterance coreference could conceivably span multiple utterances, the present study only considers coreference in adjacent utterances.
The theoretical approach which I take in this study embodies the speaker’s point of view in discourse processing. In other words, I am investigating what the speaker takes as salient in the discourse and the encoding decisions made as a result of that. However, I take salience to be a feature of discourse representation which is ultimately used by both hearer and speaker in their respective tasks. The precise way in which each uses salience may be different, but I assume that they rely on the same core notion of salience in the process of discourse production or perception (cf., Prince 1986; Blutner 1998, 2000; though see Chiarcos in this volume for a detailed view of how speaker and hearer salience may be distinguished).

3. Corpus design
The corpus is composed of texts selected from an on-line, refereed magazine of fiction called InterText (http://www.intertext.com/magazine). At present the corpus contains five complete texts of varying length comprising a total of 5,480 words. The selected texts are third-person narratives with minimal quoted passages. These texts were manually marked up using XML. In this section, I describe the relevant mark-up elements and how the corpus was analyzed in order to answer the main research question. It is important to note here that the corpus mark-up was performed entirely by me. Thus, at present there is no inter-rater validation. However, my numerous passes over the corpus have likely ensured a high degree of intra-rater consistency.

3.1. Utterances

The texts were first parsed into sentence nodes, <s>, based on their appearance in the text: word strings terminated by a period (except of course for periods marking an abbreviation). The <s> nodes were further marked with a relatively shallow parse based on clause relations. Each clause node contained at most one verb node as a child. The noun-phrase nodes and clausal arguments of a verb were marked as siblings of the verb node. The text shown in (4) was thus tagged as in (5) (leaving out currently irrelevant details).

(4) John hit Matt. He told his teacher that John did so.
(5) <s> John hit Matt . <s> He told his teacher that John did so .
In the analyses which follow, I will be investigating instances of inter-utterance coreference. In terms of the corpus, I define an utterance as a clause node which is the immediate child of an <s> node. Thus, the embedded clause in (4), John did so, is not an utterance. On the other hand, conjoined clauses (e.g., [S [C The building is tall] and [C it is old.]]) are treated as separate utterances. One final note here is that this study looks only at coreference between noun phrases (see below for discussion of coreference mark-up in the corpus). Thus, such things as event references, as in John secretly pinched Matt but the teacher saw it, are not included. It is doubtful that this exclusion has much effect on the overall results since there were only a handful of such cases in the corpus.

3.2. Syntactic information

The syntactic role of each argument was marked as “subject”, “object”, or “oblique”. Any other noun-phrase nodes which were not arguments of a verb were marked as “none” (i.e., not subject, object, or oblique). In each clause, the nearest noun-phrase node preceding the verb was marked as the subject; the nearest noun-phrase node following the verb but not immediately preceded by a preposition was marked as the object (so-called double-object constructions like give Mark the pen were marked with two objects); and any noun-phrase node immediately preceded by a preposition was marked as an oblique. Thus, (6) was tagged as in (7).

(6) Ken threw the frisbee to Jaime.
(7) <s> Ken threw the frisbee to Jaime .
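The positional rules just described can be sketched in code. This is only an illustration under stated assumptions, not the annotation procedure itself (the corpus was marked up by hand): the flat clause representation, the labels `NP`, `V` and `P`, and the function name `assign_roles` are all hypothetical.

```python
def assign_roles(clause):
    """Assign a syntactic role to each noun phrase in one clause.

    `clause` is a list of (label, text) pairs. The labels "NP", "V" and
    "P" (preposition) are hypothetical stand-ins for the corpus mark-up.
    Returns a list of (text, role) pairs in clause order.
    """
    verb_idx = next((i for i, (lab, _) in enumerate(clause) if lab == "V"), None)
    if verb_idx is None:
        # Without a verb, no noun phrase is a verbal argument.
        return [(text, "none") for lab, text in clause if lab == "NP"]

    # Subject: the nearest noun phrase preceding the verb.
    subject_idx = None
    for i in range(verb_idx - 1, -1, -1):
        if clause[i][0] == "NP":
            subject_idx = i
            break

    roles = []
    for i, (lab, text) in enumerate(clause):
        if lab != "NP":
            continue
        after_preposition = i > 0 and clause[i - 1][0] == "P"
        if i == subject_idx:
            roles.append((text, "subject"))
        elif after_preposition:
            roles.append((text, "oblique"))
        elif i > verb_idx:
            # Simplification: every bare post-verbal noun phrase counts as
            # an object, which also covers double-object constructions.
            roles.append((text, "object"))
        else:
            roles.append((text, "none"))
    return roles


example_6 = [("NP", "Ken"), ("V", "threw"), ("NP", "the frisbee"),
             ("P", "to"), ("NP", "Jaime")]
roles_6 = assign_roles(example_6)
```

For (6), the sketch labels Ken as subject, the frisbee as object and Jaime as oblique; for give Mark the pen, both post-verbal noun phrases come out as objects.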
3.3. Semantic information
The semantic role of each argument was marked with respect to two semantic systems: the FrameNet (Baker, Fillmore, and Lowe 1998) system of frames and elements and the Proto-role entailments of Dowty (1991). Here, I briefly explain each of these.

3.3.1. FrameNet

Based on the Frame Semantics of Fillmore (1968, 1976), the FrameNet system defines a large number of conceptual frames (e.g., intentionally affect, transitive action), each of which incorporates a set of frame elements (i.e., thematic roles: agent, patient, etc.) which participate in that frame. Each frame encompasses a number of lexical items which invoke that frame and therefore define the particular roles that the arguments of each item play. For instance, the verb throw invokes the cause_motion frame and therefore takes several participants including an agent, a theme, and a goal. In the present study, for each verb, the semantic role of each argument of that verb was determined by consulting the FrameNet database for the frame which encompassed that verb and then assigning the respective frame element labels to the argument nodes. If a verb was not in the FrameNet database, then the database was searched for a suitable alternative (e.g., via synonym or hypernym relations). Thus, the sentence in (6) was tagged as in (8).

(8) <s> Ken threw the frisbee to Jaime .
3.3.2. Proto-roles

Dowty (1991) proposes an alternative view of the linking between lexical conceptual structure and syntax through semantic entailments placed on arguments by a verb. He posits two sets of Proto-role entailments as in (9).

(9) Proto-Agent entailments
    – sentience
    – volition
    – cause event or change-of-state
    – undergo movement
    Proto-Patient entailments
    – undergo change-of-state
    – causally affected
    – incremental theme
    – stationary

Under Dowty’s theory, arguments of a verb may carry any number of these entailments. A selection principle then determines that the argument which carries the most Proto-Agent entailments becomes the surface subject. The remaining argument with the most Proto-Patient entailments becomes the object. Any other arguments become obliques. It is important to notice then that under this system, arguments may take on the Proto-Agent or Proto-Patient roles in varying degrees. With one verb, the argument realized as subject may carry all four Proto-Agent entailments while with another verb, the argument realized as subject may carry only one or two. Furthermore, some crossover between the roles is possible: an argument realized as a subject may carry some Proto-Patient entailments while an argument realized as an object may carry some Proto-Agent entailments.

In the corpus, Proto-role entailments for every argument were marked. The entailments associated with any particular verb were determined using a series of linguistic tests described in Rose (2005). Thus, (6) was marked as in (10).

(10) <s> Ken threw the frisbee to Jaime .
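The selection principle summarized above lends itself to a short sketch. The code below is an illustration only: the function name `select_roles`, the tie-breaking behaviour, and the entailments given to the arguments of throw are assumptions made for the example, not values taken from the corpus or from Rose (2005).

```python
# Entailment inventories as listed in (9).
PROTO_AGENT = {"sentience", "volition",
               "cause event or change-of-state", "undergo movement"}
PROTO_PATIENT = {"undergo change-of-state", "causally affected",
                 "incremental theme", "stationary"}


def select_roles(arguments):
    """Map each argument to subject, object or oblique.

    `arguments` maps an argument name to the set of entailments it carries.
    Ties are broken by insertion order, which is an arbitrary choice of
    this sketch rather than part of the selection principle.
    """
    remaining = list(arguments)
    roles = {}

    # The argument with the most Proto-Agent entailments becomes the subject.
    subject = max(remaining, key=lambda a: len(arguments[a] & PROTO_AGENT))
    roles[subject] = "subject"
    remaining.remove(subject)

    # Of the rest, the one with the most Proto-Patient entailments
    # becomes the object.
    if remaining:
        obj = max(remaining, key=lambda a: len(arguments[a] & PROTO_PATIENT))
        roles[obj] = "object"
        remaining.remove(obj)

    # Any other arguments become obliques.
    for name in remaining:
        roles[name] = "oblique"
    return roles


# Hypothetical entailments for "Ken threw the frisbee to Jaime".
throw_args = {
    "Ken": {"sentience", "volition", "cause event or change-of-state"},
    "the frisbee": {"undergo movement", "causally affected"},
    "Jaime": set(),
}
throw_roles = select_roles(throw_args)
```

Note that the frisbee carries one Proto-Agent entailment (undergo movement) in this made-up assignment yet still surfaces as object, which mirrors the crossover between roles mentioned above.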
3.3.3. FrameNet vs. Proto-roles

The two different semantic systems used in this study provide an interesting contrast. In Frame Semantics, upon which FrameNet is based, case roles are seen as derived from primitive, psychologically real semantic concepts (Fillmore 1968). Proto-roles, on the other hand, are seen merely as labels for flexible configurations of semantic entailments (Dowty 1991). If one or the other of these two views could be shown as more closely linked to salience, this may suggest different things about the nature of salience. For instance, if the FrameNet approach can be shown to be better, this may suggest an interesting link between salience and semantic primitives via the roles that entities are seen to play in conceptual frames.

3.4. Coreference information
In order to be able to examine coreference relationships across adjacent utterances, every referential noun phrase (i.e., excluding such things as expletive it) was marked with an identifier string. Within any given text, all noun phrases which were interpreted as referring to the same real-world referent were given the same identifier. Thus, (11) was marked as shown in (12).

(11) Louis watched a ballerina. She was graceful.
(12) <s> Louis watched a ballerina . <s> <s> She was graceful .
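Once every referential noun phrase carries an identifier, cases of coreference across adjacent utterances can be counted mechanically. The following is a minimal sketch under stated assumptions: the representation of an utterance as a list of `(referent_id, is_pronoun)` pairs, the identifiers themselves, and the decision to count each referent once per utterance by its first mention are hypothetical choices made here, not details given in the text.

```python
def adjacent_coreference(utterances):
    """Count inter-utterance coreference between adjacent utterances.

    `utterances` is a list of utterances, each a list of
    (referent_id, is_pronoun) mentions in textual order.
    Returns (cases, pronominalized): the number of referents in an
    utterance that were also mentioned in the immediately preceding
    utterance, and how many of those are realized as pronouns.
    """
    cases = 0
    pronominalized = 0
    for previous, current in zip(utterances, utterances[1:]):
        previous_ids = {rid for rid, _ in previous}
        counted = set()
        for rid, is_pronoun in current:
            # Assumption: only the first mention of a referent within
            # the current utterance is counted.
            if rid in previous_ids and rid not in counted:
                counted.add(rid)
                cases += 1
                if is_pronoun:
                    pronominalized += 1
    return cases, pronominalized


# Example (11) with made-up identifiers.
text_11 = [
    [("louis", False), ("ballerina", False)],  # Louis watched a ballerina.
    [("ballerina", True)],                     # She was graceful.
]
counts_11 = adjacent_coreference(text_11)
```

For (11), the sketch finds one case of inter-utterance coreference, and that case is pronominalized.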
Coreference was determined as cases where (in the coder’s opinion) the author intended for two noun phrases to have the same extensional meaning (and intended the reader to make the same interpretation). Anaphoric dependence was not a sufficient cause to determine coreference. Hence, in “Although John saw the students raise their hands, his remained down”, although “his” is anaphorically dependent on “their hands”, these phrases were not taken as coreferent because they have different extensional meanings. Another potential difficulty in marking coreference is the possibility of ambiguous coreference. In this corpus, there were surprisingly few cases of ambiguous coreference. As a result, this was one complication that did not need to be dealt with.

3.5. Notes on analysis
The results given in Section 5 are based on an analysis which takes each utterance as a whole, intact unit consisting of a list of all the unique discourse referents realized within the boundary of that utterance. Furthermore, the syntactic and semantic information attached to each discourse referent is the cumulative information for that referent within that utterance. This procedure has the advantage of enriching the data set by allowing the syntactic and semantic roles of referents in embedded clauses to be included (rather than only those in the matrix clause). However, it does lead to some other difficulties in the analysis which will be discussed in greater detail in Section 5.

4. Information Theory
The corpus analysis which follows makes use of one fundamental concept in Information Theory (Shannon 1948): the value of information (hereafter, EIV). EIV is based on the entropy, H – an estimate of the uncertainty of the outcome – of a given probability space. H for a probability space with N possible outcomes can be calculated as shown in (13) where P(n) is the probability of the n-th outcome.

(13) $H = -\sum_{n=1}^{N} P(n) \cdot \log_2 P(n)$
For a given question in which all possible outcomes are equally likely (e.g., the flip of a fair coin), the entropy is very high. However, if we learn some information, x, that causes one outcome to be far more likely to occur, then our uncertainty will decrease: H will be reduced. The amount of entropy reduction as a result of learning x, Hr(x), is thus calculated as the difference between the initial entropy, H, and the conditional entropy H(x) (i.e., H given x).²

2. The value amounting to the reduction in entropy has also been referred to as entropy value (van Rooy 2004).

To illustrate, consider the following problem: If I open a novel to a random page and point to a random letter on the page, what is the probability, P, that the letter is “u”? Without any other information, P is simply the prior probability of the occurrence of “u” in the language as a whole. Using this prior probability we could calculate the entropy, H, of the problem. However, imagine we learn that the preceding letter is “q”. Then we can be much more certain that the letter in question is “u”. Thus, the conditional entropy, H(“q”), will be less – a reduction in entropy.

Entropy reduction may be either positive or negative: learning that x is true may make us more certain while learning that x is false may make us less certain about some outcome. It is therefore useful to calculate the value of learning whether or not x is true. In other words, it is useful to know the overall value of asking whether x is true or false. In Information Theory, this value is estimated as the weighted sum of the entropy reductions for all possible outcomes of x (here, true or false). This value is known as the estimated information value, EIV. Formally, the EIV of learning whether or not x is true is calculated using the formula shown in (14), where P(x) is the prior probability of the occurrence of x.

(14) $\mathrm{EIV}(x) = P(x) \cdot H_r(x) + P(\neg x) \cdot H_r(\neg x)$
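The definitions in (13) and (14) translate directly into a few lines of code. The sketch below is illustrative: the function names `entropy` and `eiv`, the pool of 1,024 equally likely candidates, and the assumption that exactly half of them are male are all made up for the example, which anticipates the guessing game discussed next.

```python
import math


def entropy(probs):
    """Entropy in bits of a probability distribution, as in (13).

    Outcomes with zero probability contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)


def eiv(prior, p_x, given_x, given_not_x):
    """Estimated information value of learning whether x holds, as in (14).

    `prior` is the distribution before asking; `given_x` and `given_not_x`
    are the distributions after learning that x is true or false;
    `p_x` is the prior probability that x is true.
    """
    h = entropy(prior)
    reduction_if_true = h - entropy(given_x)
    reduction_if_false = h - entropy(given_not_x)
    return p_x * reduction_if_true + (1 - p_x) * reduction_if_false


# Hypothetical pool of 1,024 equally likely famous people (10 bits).
n = 1024
prior = [1 / n] * n

# A question that splits the pool in half leaves 512 candidates either way.
half = [2 / n] * (n // 2)
eiv_gender = eiv(prior, 0.5, half, half)

# A question that names a single candidate: "yes" settles the matter,
# "no" removes only one of the 1,024 candidates.
eiv_einstein = eiv(prior, 1 / n, [1.0], [1 / (n - 1)] * (n - 1))
```

Under these assumptions the even split is worth one full bit, whereas the single-candidate question is worth only about 0.011 bits, because its large entropy reduction is weighted by a very small probability.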
A good illustration of information value comes from the game “Who am I?” in which one person pretends to be some famous person and others must ask yes/no questions to find the identity of the person. In this scenario, what is an informative (i.e., having a large information value) first question, assuming that there is no bias in the choice of famous person? One candidate would be Are you a male/female? In this case, both terms in the sum of (14) will be at a maximum and thus EIV will be large. However, a question like Are you Albert Einstein? will be much less informative: while the entropy reduction if the answer is yes, Hr(x), is large, the probability that the answer is yes, P(x), is very small. If the answer is no then the converse is true. Thus both terms in the sum of (14) will be small and EIV will be small. Of course, if after several questions we have learned that the mystery person is male, is a scientist, lived in the 20th century, and won a Nobel Prize, then the EIV of this question would be much larger.

In the present study, I am investigating the information value of syntactic and semantic prominence toward determining the salience of discourse referents. This is done by asking, for example, the following question: What is the information value of learning whether or not a particular discourse referent was a subject for predicting whether it will be pronominalized in subsequent reference? This information value, EIV(subject), can be calculated using the formulas above. Likewise, the information values for the other syntactic and semantic features can be calculated. Finally, I will calculate the net information value, EIVtot, for syntactic prominence as the total of the EIVs for the various syntactic features (i.e., EIV(subject), EIV(object), etc.). Similarly, I will calculate the EIVtot for semantic prominence as the total of the EIVs for the various semantic features. Therefore, the central question becomes whether either information about syntactic prominence or semantic prominence is more informative (i.e., larger EIVtot) than the other or if they are equally informative. A second question is whether syntactic and semantic information together is more informative than either is alone. These two questions are formally summarized in (15)–(16).

(15) Is the syntactic prominence EIVtot greater than, equal to, or less than the semantic prominence EIVtot?

(16) Is the joint syntactic and semantic prominence EIVtot greater than either the syntactic or semantic prominence EIVtot?
With respect to (15), if results show that syntactic and semantic prominence are equally informative, then another question may be posed: Are syntactic and semantic prominence redundant with each other or are they at least somewhat independent but equally informative? An answer to this question may be found by looking at the answer to (16). If the joint information value is higher than either is alone, then they cannot be redundant and must therefore be at least somewhat independent.
5. Results and Discussion
In the corpus there are 291 cases of inter-utterance coreference. In 224 (77%) of these coreference cases, the coreferent noun phrase in the latter utterance is pronominalized. Thus, the entropy of pronominalization is calculated as shown in (17) where P(pro) is the probability of pronominalization.

(17) $H = -[P(\mathrm{pro}) \cdot \log_2 P(\mathrm{pro}) + P(\neg \mathrm{pro}) \cdot \log_2 P(\neg \mathrm{pro})]$
     $H = -[\frac{224}{291} \cdot \log_2 \frac{224}{291} + \frac{67}{291} \cdot \log_2 \frac{67}{291}]$
     $H = 0.778$
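The baseline in (17) can be checked numerically from the two counts reported above. This is just a verification sketch; the function name `binary_entropy` is an assumption introduced here.

```python
import math


def binary_entropy(successes, total):
    """Entropy in bits of a two-outcome distribution given raw counts."""
    p = successes / total
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


# 224 of the 291 coreference cases are pronominalized.
baseline = binary_entropy(224, 291)
print(round(baseline, 3))  # 0.778
```

The result agrees with the value of H given in (17).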
This value serves as the baseline for entropy reduction: How much is entropy reduced from H = 0.778 by learning some information about syntactic or semantic prominence? In this section, I will present these results along with some interleaved discussion. However, before presenting the results, it is necessary to deal with one complication. The referents in the current context may have been realized in multiple syntactic positions and semantic roles. For instance, in (18), as a verbal argument, john has been realized as a subject and an object, an experiencer and a recipient, and carries the entailments sentience, volition, and stationary.
(18) <s> John wants his father to give him a bicycle .
In short, there is an overlap of information caused by such co-occurrences within an utterance. This raises the question of how, or whether, these occurrences should be handled in the analysis (e.g., should a doubly-realized referent be treated differently from a singly-realized referent?). Accounting for this fully requires a sophisticated mathematical model. For the present research, I will therefore make certain simplifying assumptions about syntactic prominence and the two semantic prominence approaches. These assumptions will be clarified in greater detail in the respective sections below.

5.1. Syntactic Prominence
For syntactic prominence information, I assume that for any given referent, the role highest on the syntactic hierarchy shown in (2) determines that referent’s salience. Thus, a referent realized as both a subject and an object within an utterance would be regarded as having its salience determined by its status as a subject for that utterance. Given this, the results shown in Table 1 indicate that learning that a referent was realized as a subject is much more informative about whether subsequent reference to that referent will be pronominalized than learning that it was realized in any other role.

Table 1. Information Value of Syntactic Prominence

x          EIV(x)
subject    0.059
object     0.021
oblique    0.010
none       0.011
EIVtot     0.101

The result that the information value of subject-hood is much higher than that of other syntactic roles is especially interesting in that it resembles the binary nature of many information-packaging theories (e.g., topic-comment in Gundel 1974; topic-focus in Sgall 1967; focus-ground in Vallduví 1990). The concept of the value of information may provide a useful method for quantifying these theories.

It should be noted here that there is some evidence (Miltsakaki 2003) that referents introduced in main clauses are more salient than those introduced in subordinate clauses regardless of grammatical role. In the present study, the syntactic information was also analyzed with respect to a hierarchical approach that distinguishes the prominence of referents according to level of embedding. However, the information value of syntactic prominence under hierarchical marking was only EIVtot = 0.060. Thus, for expository reasons, the details of this approach were excluded from the present paper, but can be seen in Rose (2005).

5.2. Semantic Prominence
5.2.1. FrameNet Roles

In the corpus, 158 different frame elements occur. An exhaustive treatment of these elements is beyond the scope of this paper and is also unwarranted because many elements have only one or two occurrences. Therefore, I collapsed these elements into seven groups as shown in (19). Each group is shown with a word that briefly describes the central property of the elements in that group as well as some examples of elements in that group.

(19) 1. agentivity: agent, deformer, driver
     2. perception: cognizer, experiencer
     3. movement: theme, impactor, message
     4. affected: created entity, victim
     5. movement parameters: direction, ground
     6. events: activity, event
     7. other: specifier, none
While groups 1–6 include elements defined in the FrameNet system, group 7 includes ad hoc labels assigned to noun phrases in roles not defined under FrameNet. This included genitive noun phrases (e.g., a teacher in a teacher’s chair) and arguments in copular constructions (e.g., John is a teacher). The ordering of the groups shown in (19) parallels orderings given in thematic hierarchies proposed in the literature on syntactic linking theories (cf. Dorr, Habash, and Traum 1998; Jackendoff 1972, 1990; Speas 1990). Similar to the simplifying technique for syntactic information above, for a given referent, the one of its semantic roles which is highest on this hierarchy is regarded as the role which determines the salience of that referent. Thus, a referent realized as both a cognizer and a victim in the same utterance would be regarded as having its salience determined by its role as a cognizer, a group 2 element. Based on this simplification, the results are shown in Table 2.

Table 2. Information Value of Semantic Prominence via FrameNet Roles

group      EIV(group)
1          0.013
2          0.045
3          0.012
4          0.002
5          0.005
6          0.004
7          0.019
EIVtot     0.101
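The "highest role wins" simplification used for both syntactic roles and FrameNet groups can be sketched as follows. This is only an illustration: the ordering subject > object > oblique > none is assumed from the column order of Table 1 (the hierarchy in (2) is not repeated in this section), the element-to-group mapping lists only the example elements from (19), and the function names are hypothetical.

```python
# Assumed ordering, most prominent first (taken from Table 1, not from (2) itself).
SYNTACTIC_HIERARCHY = ["subject", "object", "oblique", "none"]

# Example elements from (19) mapped to their group number; not the full set of 158.
FRAMENET_GROUP = {
    "agent": 1, "deformer": 1, "driver": 1,
    "cognizer": 2, "experiencer": 2,
    "theme": 3, "impactor": 3, "message": 3,
    "created entity": 4, "victim": 4,
    "direction": 5, "ground": 5,
    "activity": 6, "event": 6,
    "specifier": 7, "none": 7,
}


def highest_syntactic_role(roles):
    """Return the realized role that ranks highest on the assumed hierarchy."""
    return min(roles, key=SYNTACTIC_HIERARCHY.index)


def highest_framenet_group(elements):
    """Return the lowest-numbered (most prominent) group among the realized elements."""
    return min(FRAMENET_GROUP[element] for element in elements)


# The two cases mentioned in the text.
print(highest_syntactic_role(["object", "subject"]))   # subject and object
print(highest_framenet_group(["victim", "cognizer"]))  # cognizer and victim
```

Under these assumptions, a referent realized as both subject and object is counted once, as a subject, and a referent that is both cognizer and victim is counted as a group 2 element.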
Three results are notable. First, it is interesting that the perception roles in group 2 are more informative than the agentive roles in group 1, in spite of the fact that agentive roles are usually posited to be highest on many thematic hierarchies. This suggests that sentience is more important to the salience of entities evoked in a discourse than agentivity. This would seem to parallel other results showing the importance of animacy to the salience of discourse entities (Prat-Sala and Branigan 1999). The second interesting result is that the total information value of semantic prominence under FrameNet is equal to that of syntactic prominence. I will discuss the implications of this below. Before that, I present the results for the other semantic prominence approach used in this study. The third result that requires some discussion is the moderately high EIV for the group 7 roles. As noted above, these are roles that fall outside the scope of the FrameNet system. A cursory examination of these cases in the corpus shows that many are instances where a particular referent is being retained across a sequence of utterances (in a manner not unlike the retain transition in Centering Theory of Grosz, Joshi, and Weinstein 1995) by being placed in a genitive construction (e.g., John likes apples. His mother does too. So, he bought her a whole bushel.). While interesting, these cases are not particularly relevant here to the question of how semantic information influences salience in terms of the FrameNet system and are therefore not discussed further.

5.2.2. Proto-roles

A particular discourse referent may carry more than one Proto-role entailment. In order to avoid the overlap problems that this generates in the present analysis, I use a simple transformation. For each referent, I calculate a parameter I call Proto-Agency as the total number of (unique) Proto-Agent entailments on that referent minus the total number of Proto-Patient entailments. Thus, Proto-Agency ranges in integer values from +4 to −4 (although in this corpus, there were no instances of −4). For example, a referent which carries, say, three Proto-Agent entailments and one Proto-Patient entailment within a single utterance would be regarded as having a Proto-Agency value of 3 − 1 = 2. In short, then, Proto-Agency might be regarded as a measure of how agent-like (in Dowtian terms) a particular referent is entailed to be: The higher the value, the more agent-like the referent is. Under this transformation, the results are as shown in Table 3.

Table 3. Information Value of Semantic Prominence via Proto-role Entailments

Proto-Agency    EIV(Proto-Agency)
+4              0.000
+3              0.001
+2              0.045
+1              0.005
 0              0.045
−1              0.000
−2              0.001
−3              0.002
−4              ***
EIVtot          0.098
Results here show that semantic prominence with respect to Proto-roles is comparably informative to semantic prominence with respect to FrameNet as well as to syntactic prominence.
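The Proto-Agency transformation described in Section 5.2.2 amounts to a difference of two set sizes. A minimal sketch follows; the function name and the entailment labels are illustrative only (the labels are chosen in the spirit of Dowty 1991 and are not the annotation scheme of the corpus).

```python
def proto_agency(agent_entailments, patient_entailments):
    """Number of unique Proto-Agent entailments minus unique Proto-Patient entailments."""
    return len(set(agent_entailments)) - len(set(patient_entailments))


# Worked example from the text: three Proto-Agent entailments and one
# Proto-Patient entailment give 3 - 1 = 2. The labels are hypothetical.
value = proto_agency(
    ["volition", "sentience", "causation", "sentience"],  # duplicate counted once
    ["stationary"],
)
print(value)
```

Using sets mirrors the restriction to unique entailments, so a referent that receives the same entailment from two predicates in one utterance is not counted twice.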
5.3. Joint Information Value
While the above results have looked at the information value of learning about the syntactic or semantic prominence of a referent (e.g., learning that it was realized as a subject or as a group 1 FrameNet element or with a Proto-Agency of +4 or so on), in this section, I look at the value of learning some joint information. That is, what is the value of learning that a referent was realized as, say, a subject and a group 1 role? The fact that the EIVtot values of syntactic and semantic prominence are essentially equal suggests that either they are essentially redundant or that they are at least somewhat independent, but comparably informative. If the former is the case, then the joint information value should be no different than that of each alone. However, if there is some independence between the two pieces of information, then the joint information value may increase.

The joint information value of syntactic and semantic prominence was calculated by crossing the four syntactic roles against the seven FrameNet groups or the nine levels of Proto-Agency, and then calculating the EIV for each of the pairings (e.g., subject/group 1, subject/group 2, etc.). The total information value, EIVtot, was then calculated as the total of these individual EIVs. The final results are shown in Table 4.

Table 4. Joint Information Value of Syntactic and Semantic Prominence

                                   EIVtot
syntactic role × FrameNet group    0.165
syntactic role × Proto-Agency      0.141
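The crossing procedure can be illustrated with a small sketch. It rests on an assumption that this section does not spell out: that each EIV(x) is the entropy reduction obtained by learning x, weighted by the probability of x, so that EIVtot is the baseline entropy minus the conditional entropy. The data, the function names, and the variable names below are invented for illustration and are not taken from the corpus.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def information_value(pairs):
    """Total information value of a feature about an outcome.

    `pairs` is an iterable of (feature, outcome) observations. The result is
    the sum over feature values x of P(x) * (H - H_x), under the assumption
    stated in the lead-in.
    """
    pairs = list(pairs)
    baseline = entropy(Counter(outcome for _, outcome in pairs).values())
    by_feature = defaultdict(Counter)
    for feature, outcome in pairs:
        by_feature[feature][outcome] += 1
    return sum(
        sum(counts.values()) / len(pairs) * (baseline - entropy(counts.values()))
        for counts in by_feature.values()
    )


# Hypothetical observations: (syntactic role, FrameNet group, pronominalized?).
TOY_ROWS = (
    [("subject", 2, True)] * 4
    + [("subject", 1, True)] * 2
    + [("subject", 1, False)] * 2
    + [("object", 2, True)] * 2
    + [("object", 3, False)] * 4
    + [("oblique", 3, False)] * 2
)

syntactic = information_value((role, y) for role, _, y in TOY_ROWS)
semantic = information_value((group, y) for _, group, y in TOY_ROWS)
# Crossing the two factors: each (role, group) pairing is treated as one feature value.
joint = information_value(((role, group), y) for role, group, y in TOY_ROWS)

print(f"syntactic={syntactic:.3f} semantic={semantic:.3f} joint={joint:.3f}")
```

Under this reading, the joint value can never fall below either individual value on the same data; the informative question is how far it rises above them.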
The joint information value of syntactic and semantic prominence is higher than that of either factor alone. Thus, the results suggest that syntactic and semantic prominence are not redundant with each other and that each provides at least some unique information with respect to the pronominalization of subsequent reference. This conclusion should be regarded as tentative, however, because the differences noted above are not statistically confirmed. One attempt to evaluate the strength of these findings used a boot-strapping procedure in which the original sample of inter-utterance coreference instances was resampled with replacement 10,000 times. Under this procedure, the differences between the joint EIVs and the individual EIVs were not shown to be significant. However, this procedure is suspect because measures of skewness and kurtosis show that the bootstrap distribution is non-normal. It is clear that a larger data set will be required to confirm this conclusion.
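A resampling check of this kind might look like the sketch below. It is not the procedure used in the study: the percentile interval is substituted here as one common option that does not rely on a normal bootstrap distribution, the information measure again assumes that EIVtot is the probability-weighted entropy reduction, and the data and all names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def information_value(pairs):
    """Baseline entropy of the outcome minus its entropy given the feature."""
    pairs = list(pairs)
    baseline = entropy(Counter(outcome for _, outcome in pairs).values())
    by_feature = defaultdict(Counter)
    for feature, outcome in pairs:
        by_feature[feature][outcome] += 1
    return sum(
        sum(counts.values()) / len(pairs) * (baseline - entropy(counts.values()))
        for counts in by_feature.values()
    )


def bootstrap_gain_interval(rows, n_resamples=10_000, seed=0):
    """95% percentile interval for (joint value - syntactic value).

    `rows` holds (syntactic role, semantic category, outcome) triples; each
    resample draws len(rows) triples with replacement.
    """
    rng = random.Random(seed)
    gains = []
    for _ in range(n_resamples):
        sample = rng.choices(rows, k=len(rows))
        joint = information_value(((r, g), y) for r, g, y in sample)
        syntactic = information_value((r, y) for r, _, y in sample)
        gains.append(joint - syntactic)
    gains.sort()
    low = gains[int(0.025 * (n_resamples - 1))]
    high = gains[int(0.975 * (n_resamples - 1))]
    return low, high


# Hypothetical observations, not corpus data.
TOY_ROWS = (
    [("subject", 2, True)] * 4
    + [("subject", 1, True)] * 2
    + [("subject", 1, False)] * 2
    + [("object", 2, True)] * 2
    + [("object", 3, False)] * 4
    + [("oblique", 3, False)] * 2
)

low, high = bootstrap_gain_interval(TOY_ROWS, n_resamples=500, seed=1)
print(f"95% percentile interval for the gain: [{low:.3f}, {high:.3f}]")
```

Because the gain computed this way is non-negative in every resample, an interval of this kind cannot by itself show that the gain differs from zero; with small samples such estimates also tend to be biased upward, which is consistent with the call for a larger data set.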
One interesting result here, though, is the fact that the Proto-role information is not quite as informative as the FrameNet group information when taken together with syntactic role. This, however, could be a by-product of the transformation on the Proto-role information described in Section 5.2.2. This transformation is a mathematical convenience and glosses over semantic distinctions between the various entailments. Perhaps a more sophisticated transformation would result in a greater information value.
6. General Discussion
Under the discourse model presented above, the results presented in this corpus analysis suggest that the salience of referents in a discourse is influenced by both syntactic and semantic information: Taking both into account results in greater predictive ability for the form of subsequent reference. These results are thus in line with a view of discourse processing in which salience represents information about discourse structure: the more salient a referent is in the current context, the greater the information value about the structure of subsequent discourse, particularly the form of referring expressions. Information Theory thus potentially offers another view of the relative value of the different factors known to affect discourse salience and may provide another means by which to narrow down which factors are most crucial. The finding that syntactic and semantic information seem to be at least partly independent in their influence on salience suggests that models of discourse salience may benefit by including some account of semantic information as distinct from syntactic information. This is especially relevant to modular approaches in which one module is responsible for structure while an independent module is responsible for interpretation. The results here may be relevant for determining how these modules interact for the purpose of determining salience. The improvement in the joint information value suggests that computational implementations of discourse salience models (e.g., parameterized systems, such as the Mental Salience Framework; Chiarcos, this volume) might see some performance improvement by the inclusion of semantic prominence information. If the assumption that salience is a core notion common to both speaker and hearer is correct, then the present results would indicate that pronoun resolution algorithms might also benefit from the inclusion of semantic prominence as a contributing factor.
In this study, two different semantic systems were employed to evaluate semantic prominence. The joint information values suggest that the FrameNet
system may be more informative than the Proto-role system. However, as noted above, this difference may not be real. If it is real, then an interesting line of future investigation would be to look more closely at the relationship between salience and the notion of primitive semantic roles as assumed in Frame Semantics. On the other hand, if the difference between the two systems turns out not to be real, then there is a practical conclusion to make: Technologically speaking, the Proto-role system is less cumbersome than the vast network of frames and roles in FrameNet and therefore may be more efficient in the implementation of mechanisms for discourse processing and salience. Throughout this study, I have compared the semantic prominence hierarchy to thematic hierarchies used in syntactic-semantic linking theories. While it would be very interesting if it were to turn out that these hierarchies are parallel, there is no theoretical reason to presume that this is so. These two different research areas use these hierarchies for completely different reasons so it would not be problematic for the present study if the semantic prominence hierarchy is different. In fact, the evidence from corpus analysis suggests that this might be the case: In particular, it seems that roles entailing sentience (e.g., cognizer, experiencer) are higher on the hierarchy than roles entailing agency. Hence, validation of the semantic prominence hierarchy is required.3 The results presented in this corpus analysis are perhaps somewhat premature to conclude firmly that syntactic and semantic information contribute independently to salience (though see Rose 2005, 2006 for converging evidence from a series of psycholinguistic experiments using different experimental paradigms). 
One possibility that needs to be investigated is whether semantic prominence has a broad effect across the whole range of linguistic items or whether only certain verbs (or certain classes of verbs) influence the salience of their arguments via semantic information. Another angle for further investigation of the role of syntactic and semantic prominence along the lines presented here might include looking at different languages. In English, the language used in this study, syntactic and semantic roles are often conflated, as noted early in this paper. However, in languages where word order is freer, such as Spanish or Japanese, the distinction between syntactic and semantic prominence may be easier to observe.4 Such work may provide a clearer view of the degree to which syntactic and semantic prominence each determine the salience of discourse referents.

3. An interesting side note is a consideration of the possible connection between the semantic prominence hierarchy and the relative prominence of referents that results as readers construct a mental simulation (see Claus, this volume) of the ongoing discourse. In an extreme view, these could be seen as one and the same thing. Further investigation is warranted to verify whether they are the same and if not, how they differ.

4. Another language in which syntactic and semantic prominence effects may be more easily delineated is Eastern Khanty (Filchenko, this volume), in which agent-demoting constructions differ minimally from canonical agent constructions.

7. Conclusion

In this paper, I have sought to compare the relative information value of syntactic and semantic prominence to the salience of discourse referents. Results show that both contribute to salience. On its face, the fact that semantic prominence contributes to salience is perhaps unsurprising. However, this research has also shown ways that semantic prominence can be computed via traditional thematic role labels or via Dowtian Proto-role entailment configurations. The results further suggest how semantic prominence information may be combined with syntactic prominence information to yield more informative measures of salience.

Acknowledgments

The work presented here is based on my Ph.D. dissertation. I am indebted to Stefan Kaufmann, my adviser, and Michael Dickey for discussions about the theoretical background and data analysis. I am also grateful to two anonymous reviewers for helpful comments on earlier drafts of this paper.

References

Almor, Amit. 1999. Noun-phrase anaphora and focus: The informational load hypothesis. Psychological Review 106: 748–765.
Almor, Amit and Eimas, Peter. 2008. Focus and noun phrase anaphors in spoken language comprehension. Language and Cognitive Processes 23: 201–225.
Ariel, Mira. 1988. Referring and accessibility. Journal of Linguistics 24: 65–87.
Baker, Collin, Fillmore, Charles, and Lowe, John. 1998. The Berkeley FrameNet project. Proceedings of the 17th International Conference on Computational Linguistics, 86–90.
Blutner, Reinhard. 1998. Lexical pragmatics. Journal of Semantics 15: 115–162.
Blutner, Reinhard. 2000. Some aspects of optimality in natural language interpretation. Journal of Semantics 17: 189–216.
Brown, Cheryl. 1983. Topic continuity in written English narrative. In: Givón, Talmy (ed.), Topic Continuity in Discourse: A Quantitative Cross-Language Study, 313–342. Amsterdam: John Benjamins.
Dorr, Bonnie, Habash, Nizar, and Traum, David. 1998. A thematic hierarchy for efficient generation from lexical-conceptual structure. Proceedings of the Third Conference of the Association for Machine Translation in the Americas, 333–343.
Dowty, David. 1991. Thematic proto-roles and argument selection. Language 67: 547–619.
Fillmore, Charles. 1968. The case for case. In: Bach, Emmon and Harms, Robert (eds.), Universals in Linguistic Theory, 1–90. New York: Holt, Rinehart and Winston.
Fillmore, Charles. 1976. Frame semantics and the nature of language. Annals of the New York Academy of Sciences: Conference on the Origin and Development of Language and Speech 280: 20–32.
Gernsbacher, Morton Ann and Hargreaves, David. 1988. Accessing sentence participants: The advantage of first mention. Journal of Memory and Language 27: 699–717.
Gordon, Peter, Grosz, Barbara, and Gilliom, Laura. 1993. Pronouns, names, and the centering of attention in discourse. Cognitive Science 17: 311–347.
Gordon, Peter and Hendrick, R. 1997. Intuitive knowledge of linguistic co-reference. Cognition 62: 325–370.
Gordon, Peter and Hendrick, R. 1998. The representation and processing of coreference in discourse. Cognitive Science 22: 389–424.
Grosz, Barbara, Joshi, Aravind, and Weinstein, Scott. 1995. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics 21: 203–225.
Gundel, Jeanette. 1974. The role of topic and comment in linguistic theory. Ph.D. dissertation, University of Texas, Austin.
Gundel, Jeanette, Hedberg, Nancy, and Zacharski, Ron. 1993. Cognitive status and the form of referring expressions. Language 69: 274–307.
Heim, Irene. 1982. The semantics of definite and indefinite noun phrases. Ph.D. dissertation, University of Massachusetts, Amherst.
Heim, Irene. 1983. File change semantics and the familiarity theory of definiteness. In: Bäuerle, Rainer, Schwarze, Christoph and von Stechow, Arnim (eds.), Meaning, Use, and Interpretation of Language, 164–189. Berlin: Walter de Gruyter.
Hirst, Graeme. 1981. Anaphora in Natural Language Understanding: A Survey. Berlin: Springer-Verlag.
Hudson-D’Zmura, Susan and Tanenhaus, Michael. 1997. Assigning antecedents to ambiguous pronouns: The role of the center of attention as the default assignment. In: Walker, Marilyn, Joshi, Aravind, and Prince, Ellen (eds.), Centering Theory in Discourse, 199–226. Oxford, UK: Clarendon Press.
Jackendoff, Ray. 1972. Semantic Interpretation in Generative Grammar. Cambridge, Massachusetts: MIT Press.
Jackendoff, Ray. 1990. Semantic Structures. Cambridge, Massachusetts: MIT Press.
Kamp, Hans and Reyle, Uwe. 1993. From Discourse to Logic. Dordrecht: Kluwer Academic.
Karttunen, Lauri. 1976. Discourse referents. In: McCawley, James (ed.), Syntax and Semantics, Vol. 7: Notes from the Linguistic Underground, 363–385. New York: Academic Press.
Kehler, Andrew. 2002. Coherence, Reference, and the Theory of Grammar. Stanford, CA: CSLI Publications.
Lappin, Shalom and Leass, Herbert. 1994. An algorithm for pronominal anaphora resolution. Computational Linguistics 20: 535–561.
Mathews, Alison and Chodorow, Martin. 1988. Pronoun resolution in two-clause sentences: Effects of ambiguity, antecedent location, and depth of embedding. Journal of Memory and Language 27: 245–260.
Miltsakaki, Eleni. 2003. The syntax-discourse interface: Effects of the main-subordinate distinction on attention structure. Ph.D. dissertation, University of Pennsylvania.
Miltsakaki, Eleni. 2007. A rethink of the relationship between salience and anaphora resolution. Proceedings of the 6th Discourse Anaphora and Anaphor Resolution Colloquium, 91–96.
Prat-Sala, Mercè and Branigan, Holly. 1999. Discourse constraints on syntactic processing in language production: A cross-linguistic study in English and Spanish. Journal of Memory and Language 42: 168–182.
Prince, Ellen. 1986. On the syntactic marking of presupposed open propositions. Papers from the Parasession on Pragmatics and Grammatical Theory, 22nd Regional Meeting, Chicago Linguistic Society, 208–222. Chicago, Illinois: University of Chicago.
Rose, Ralph. 2005. The relative contribution of syntactic and semantic prominence to the salience of discourse entities. Ph.D. dissertation, Department of Linguistics, Northwestern University.
Rose, Ralph. 2006. Evidence for gradient salience: What happens with competing nonsalient referents during pronoun resolution. Proceedings of the Australian Language Technology Workshop, 91–98.
Sgall, Petr. 1967. Functional sentence perspective in a generative description. Prague Studies in Mathematical Linguistics 2: 203–225.
Shannon, Claude. 1948. A mathematical theory of communication. The Bell System Technical Journal 27: 379–423, 623–656.
Speas, Margaret. 1990. Phrase Structure in Natural Language. Dordrecht, Netherlands: Kluwer Academic Publishers.
Stevenson, Rosemary, Knott, Alistair, Oberlander, Jon and McDonald, Sharon. 2000. Interpreting pronouns and connectives: Interactions among focusing, thematic roles and coherence relations. Language and Cognitive Processes 15: 225–262.
Vallduví, Enric. 1990. The informational component. Ph.D. dissertation, University of Pennsylvania.
van Rooy, Robert. 2004. Relevance and bidirectional optimality theory. In: Blutner, Reinhard and Zeevat, Henk (eds.), Optimality Theory and Pragmatics, 173–210. Oxford, UK: Palgrave Macmillan.
The Mental Salience Framework: Context-adequate generation of referring expressions

Christian Chiarcos
Abstract. Here, a general architecture for mechanisms of attention control in discourse is suggested, based on a meta-theoretic notion of salience. I consider two dimensions of salience: hearer salience, indicating the status quo or “givenness” of an entity, and speaker salience, underlying the attempts of the speaker to manipulate this status. This framework is applied to three information-packaging phenomena: choice of referring expressions, word-order preferences, and assignment of grammatical roles. The adequacy of this proposal is illustrated by providing reconstructions of two theories, Givón’s topicality approach and different instantiations of Centering. The proof for Centering-adequacy is sketched, and the framework is compared to related proposals. As a result, a parameterized architecture for modeling linguistic variability in discourse is presented. It provides a powerful, simple and intuitive mechanism to integrate cognitive-pragmatic aspects of coding preferences in the field of natural language generation (NLG).
1. Mechanisms of attention control
It was noticed very early that the choice among syntactically well-formed expressions is by no means determined by semantic constraints alone. Consider the following classical text (Grosz et al. 1995, ex. (5), shortened) with possible truth-semantically equivalent variations as illustrated in (4’), (1’) and (5’):

(1) Terry_te really goofs sometimes.
(2) He_te wanted Tony_to to join him on a sailing expedition.
(3) He_te called him_to at 6AM.
(4) Tony_to was sick and furious at being woken up so early.
(5) He_to told Terry_te to get lost and hung up.
(6) Of course, Terry_te hadn’t intended to upset Tony_to.
Here, I concentrate on referring expressions, in particular on three interacting levels of variability. Note that for this example text, the textual function of the alternatives is roughly identical to that in the original examples.

choice of referring expressions (REF)
(4’) This guy_to was sick and furious at being woken up so early.

assignment of word-order preferences (WO), e.g. topicalization
(1’) Sometimes, Terry_te really goofs.

assignment of grammatical roles (GR), e.g. passive vs. active clauses
(5’) Terry_te was told to get lost and Tony_to hung up.
Taking up popular assumptions among functional linguists, I regard these grammatical devices as serving (at least partly) as means a speaker can use to guide the hearer’s flow of attention in discourse (Chafe, 1976; Tomlin, 1995; inter alia). The flow of attention is a key mechanism controlling any kind of mental activity. It is motivated by a “bottle-neck effect”: The world surrounding us (and even our internal world) is far too rich to be realized, understood, or described as a whole. Rather, just relevant or especially significant elements are chosen to build up a finite symbolic representation describing the situation sufficiently but sparsely enough to be held in mind or to be communicated. In this view, attention selects only a small subset of the information to be processed (Chafe 1976, “center of attention”), but shifts rapidly across the scene. Thus, complex representations arise not from the current center of attention alone, but from the sequence of attention shifts as well.

Applied to text production, a speaker needs to make sure that the hearer’s center (or “focus”) of attention moves along the lines he had in mind. Otherwise, the hearer cannot obtain the mental representation the speaker wants him to construct. To prevent such a failure of communication, the speaker has to be aware of the hearer’s state of mind and of the effects a given utterance might have on the hearer’s model of discourse.

In Cognitive Linguistics, iconic form-function mappings between mental states and grammatical devices are assumed as a basis of a general framework for mechanisms of attention control in discourse. However, these correlations have remained notoriously vague, which prohibits their practical application in the field of natural language generation (NLG). To overcome this problem, I introduce salience as a cover term for properties of mental states such as “discourse prominence” (Pustet, 1997), “activation” (Chafe, 1976), “topicality” (Givón, 1983), etc., that have been defined in an abstract manner only. I adopt a definition from the field of visual attention control (Koch and Itti, 2000): Salience is a situation-bound, dynamic property of entities within a mental model. Opposed to this, attention is a binary property of a selected sub-set of entities, but it tends to be attracted by a high degree of salience. Thus, attention is an epiphenomenon of salience.1 Depending on the salience-induced topological structure (ranking, order) over the entities within a (discourse) model, packaging preferences are assigned.

1. The original definition as used by Koch and Itti is based on visual fields of neurons, i.e. on areas within a scene, not entities. Though it seems that a similar dynamic notion of salience is appropriate on a higher level of abstraction as well, researchers on the interface of visual and linguistic salience modeled it in terms of size and absolute position (Kelleher, this volume), denying the possibility of shifts of attention at all. However, it seems to be generally accepted that this static notion of salience is a heuristic approximation only, a generalization over a longer period of time describing the likelihood of an area to be salient.

2. Salience in discourse

2.1. A generalized conception of salience

In linguistics, psychology, artificial intelligence and neighboring fields, different (and partly contradictory) traditions using the notion of salience evolved during the last 30 years. Two extreme bounds in the usage of this term can be seen in the discussion of focal prosody (e.g. Davis and Hirschberg, 1988) and in the discussion of referential accessibility in discourse (e.g. Sgall et al., 1986).

    Pitch accents mark items as intonationally prominent and convey the relative ‘newness’ or ‘salience’ of items in the discourse. (Davis and Hirschberg, 1988)

    … salience, [i.e.] foregrounding, or relative activation (in the sense of being immediately ‘given’, i.e. accessible in memory). (Sgall et al., 1986, p.54f.)

I take these examples to be prototypes of two different dimensions of salience corresponding to the two most elementary perspectives on information intended to be uttered:

speaker salience (importance/newsworthiness)
Speaker salient information is speaker-private and relevant, e.g. new for the hearer, not predictable or something the speaker wants to put special emphasis on.
hearer salience (accessibility/givenness)
Hearer salient information is known and easily retrievable for the hearer.

Successful communication crucially depends on the availability of both perspectives to a speaker. The aspect of speaker salience or importance is a necessary pre-condition for any conversation, as it covers a speaker’s motivations to produce an utterance. Besides this, if a speaker aims to produce text that is directed to the hearer, s/he must have some ideas about the hearer’s current attentional state, i.e., hearer salience. Assume an idealized scenario where the speaker has no specific information about the hearer’s state of mind. Then, we can characterize both dimensions of salience as follows:

attention control
Hearer salience reflects the current status quo, e.g. the attentional state assigned to a discourse referent. Speaker salience arises from intentions to modify this state.

intentionality
Speaker salience is induced by the intentions of the speaker. As no specific assumptions on the intentions of the hearers are available, hearer salience depends on contextual information available to both speaker and hearer; thus, it is a property of the common ground between them.

temporal scope
Due to the lack of additional information, hearer salience must be approximated from situational factors, world knowledge and the previous discourse; thus, it is “backward-looking”. As opposed to this, speaker salience or the underlying intentions affects the planning of the further discourse; so, it can be estimated heuristically from properties of the forthcoming discourse. In this respect, speaker salience is “forward-looking”.

stimulus-dependence
As speaker salience arises from intentional states, it can be independent of the current situational context, whereas hearer salience is stimulus-induced.

Previously, similar classifications have been proposed:

cognitive vs. surface-based
Pattabhiraman (1992) introduced a distinction between “canonical salience” as a property of surface forms and “instantial salience” of cognitive concepts. He presented an algorithm for the assignment of grammatical devices such that the canonical salience of the resulting expression corresponds to the instantial salience of the underlying concept as much as possible. As a consequence, canonical salience of surface forms uttered before (i.e. encoded instantial salience) and instantial salience of cognitive concepts to be uttered in the forthcoming discourse can be distinguished where the former is available to both hearer and speaker but the latter is private to the speaker alone. However, he did not explore the implications of this distinction for communication in general but concentrated on the production perspective only.

perspective
In their application of the Prague model of salience onto dialogue, Hajičová et al. (1998) proposed a distinction between two different knowledge stocks (discourse models): The individual stock of dialogue knowledge (ISDK) and the shared stock of dialogue knowledge (SSDK). Accordingly, the activation or salience of entities in the SSDK is based on the common ground between the discourse participants, whereas the “activation degrees of entities in the ISDK depend on the participant’s own attention, dialogue intentions, etc.” (p.386). Thus, it is possible to account for different uses of referring expressions according to perspectives of different discourse participants.

salience indication vs. foregrounding
Navaretta (2002) identified two components of salience affecting the interpretation of pronominal anaphors in Danish: givenness and explicit salience marking of the antecedent (cf. the dichotomy of “inherent salience” and “imposed salience” by Mulkern 2003). While givenness (or accessibility) derives from discourse factors such as frequency and previous mention, explicit salience marking can be used to boost the salience of a referent that is not sufficiently given. Accordingly, two functions of grammatical devices can be distinguished: indication (by using canonical constructions when referring to a referent) and modification (by using special marked constructions) of the inherent salience, i.e. the attentional state a referent has for the hearer.
Following a general psychological conception of salience, I consider it not necessarily a property of linguistic expressions and the perceived environment, but a general cognitive conception, and thus a matter of the perception of situational factors, of the interpretation of linguistic cues, and of the mental representation of intentional and emotional states. In particular, salience is a necessary condition for shifts of attention as described above.

2.2. Phenomenology

Here, I focus on three phenomena: choice of referring expressions (REF), assignment of grammatical roles (GR) and word order effects (WO). Where salience is defined in terms of givenness or accessibility, it has frequently been remarked that the more salient a discourse referent is, the less complex, the less semantically rich and the less emphasized referring expressions
are expected. In particular, pronouns are expected to denote more salient referents than full descriptions. Similarly, salience rankings of grammatical roles (Grosz et al., 1995; Givón, 2001) follow a conventional hierarchy of markedness, with the subject (nominative case) being more frequent and less phonologically complex than the direct object, etc. Following Givón (1995, pp. 25–69), markedness is defined in terms of complexity and non-conventionality (inverse frequency). So, for REF and GR, an iconic mapping can be assumed that correlates surface or empirical measures of markedness with the underlying degree of relative salience: the more salient a discourse referent is, the less marked its encoding is expected to be.

The third dimension under consideration is word order. Both the assignment of grammatical roles and the choice of referring expressions are entangled with word order preferences. Generally, less complex forms (e.g. pronouns) tend to precede more complex forms (Hawkins, 1992), and subjects tend to precede other grammatical roles (Greenberg, 1963). To integrate these tendencies into the iconicity principle, a gradient increase of markedness (and a decrease of underlying salience) is assumed along the sequential order of elements within a clause from left to right (Sgall et al., 1986). With this hypothesis, a unified model for the planning of the choice of referring expressions, grammatical roles and word-order preferences can be developed, with iconic mappings as illustrated in Fig. 1.

markedness    REF                 WO                  GR             salience
−             pronoun             left-peripheral     subject        +
+             full description    right-peripheral    non-subject    −

Figure 1. Simplified markedness hierarchies and salience.
The notion of salience as defined here involves two different temporal or participant perspectives, and indeed, similar claims on multi-dimensionality have been made for the levels of referring expressions (cf. deviations from iconic mapping; Ariel, 1990, p. 191ff.) and grammatical roles (cf. the aspects of “discourse prominence”: indicating a horizon and fore-/backgrounding; Pustet, 1997). With respect to word-order preferences, multiple factors have been considered that relate to both dimensions of salience. On the one hand, it is claimed that given elements tend to precede new elements (Sgall et al., 1986); on the other hand, it has been shown that, at least for German, this tendency is not absolute (Weber and Müller, 2004). Instead, Strube and Hahn (1999) suggested that relative ordering has an effect on the accessibility in forthcoming discourse; it is thus a device of foregrounding similar to grammatical roles.

2.3. A general framework

I suggest that coding preferences can be predicted from the interaction of two dimensions of salience. A generalized framework is sketched that allows for the reconstruction of two major theories of referential coherence in discourse: Givón’s (1983; 2001) topicality approach and Centering (Grosz et al., 1995). Both approaches distinguish two perspectives on discourse, a backward-looking/anaphoric aspect on the one hand and a forward-looking/cataphoric aspect on the other, which can be related to hearer salience and speaker salience respectively. Generalizing over the observations made in the last two sections, I propose the following characteristics for an operationalizable framework of attention control and referential coherence in discourse:

– Salience induces a ranking over entities within a mental model, e.g., a discourse model. Here, I distinguish
  • hearer salience, i.e., the degree of attention/prominence a speaker assumes that a hearer assigns to a given discourse entity, and
  • speaker salience, i.e., the degree of attention/prominence/emphasis a speaker puts on an entity.
– The prototypical function of hearer salience is the indication of the (assumed) degree of attention that a referent is assigned according to situation and previous discourse.
– The prototypical function of speaker salience is to announce shifts of attention; it is thus sensitive to speaker-private knowledge and properties of the subsequent discourse.
– Hearer salience and speaker salience interact and are mapped iconically onto grammatical devices according to underlying markedness hierarchies, cf. Pattabhiraman’s (1992) mapping from instantial onto canonical salience.
– The parameters, hearer salience, speaker salience, and packaging preferences can be represented by numerical scores, with salience scores and packaging preferences defined as the normalization of the weighted sum (linear combination) of parameter values and of hearer and speaker salience, respectively. Then, different configurations can be implemented by the assignment of different weights.
– Both grammar-dependent parameter values and preference deduction are based on markedness hierarchies.

For presentational purposes, salience scores are modeled as real numbers in the interval [0, 1], with 0 representing the lowest degree of salience and 1 the highest. As a general form for salience scores, the following standard representation is proposed:

\[ \mathrm{sal}(r) = \frac{1}{1 + \sum_i w_i \, x_i(r)} \tag{1} \]

In eq. (1), $x_i$ denotes the value of the $i$-th salience factor for the referring expression $r$ in the current utterance and $w_i$ the corresponding weight. This set of assumptions constitutes the Mental Salience Framework.
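As a minimal sketch of eq. (1), the following Python function computes a salience score as the reciprocal of one plus a weighted sum of factor values. The function name `salience` and the dictionary-based representation of weights and factors are assumptions made for illustration; they are not part of the framework as described here.

```python
import math


def salience(weights, factors):
    """Eq. (1): sal(r) = 1 / (1 + sum_i w_i * x_i(r)).

    `weights` and `factors` map (hypothetical) factor names to numbers.
    Factors with weight 0 are skipped, so an infinite value of an
    unused factor does not produce 0 * inf = nan.
    """
    total = 0.0
    for name, w in weights.items():
        if w != 0:
            total += w * factors[name]
    if math.isinf(total):
        return 0.0  # the score converges to 0 for an infinite sum
    return 1.0 / (1.0 + total)


# All factors at 0 give the maximal score of 1.
print(salience({"a": 1, "b": 1}, {"a": 0.0, "b": 0.0}))  # 1.0
# A larger weighted sum gives a lower score.
print(salience({"a": 1, "b": 1}, {"a": 1.0, "b": 0.0}))  # 0.5
```

Different theories then differ only in the `weights` they pass in, which is the sense in which configurations are "implemented by the assignment of different weights".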
Figure 2. A minimal parameterized framework, schematically.
3. Adequacy
The proposal sketched in Section 2.3 is a meta-theoretical abstraction. Thus, I do not claim that the framework is cognitively valid, only that it is adequate with respect to existing theories that provide independent empirical evidence. This adequacy claim is justified by the reconstruction of two leading theories.
3.1. Reconstructing Centering Theory

Centering Theory is a model of local discourse coherence that defines relationships between centers (referring expressions) in subsequent utterances, often applied with special emphasis on its effect on pronominalization and anaphor resolution. In Canonical Centering Theory (CCT) (Grosz et al., 1995), the entities of an utterance $U_n$ constitute the set of forward-looking centers $C_f(U_n)$, which subsumes possible antecedents for anaphoric references in the forthcoming discourse. The forward-looking centers are ordered according to their relative salience as represented by their grammatical roles (SBJ > DIR-OBJ > INDIR-OBJ > OTHER). The backward-looking center $C_b(U_{n+1})$ of the following utterance $U_{n+1}$ is then defined as the highest-ranked entity from $C_f(U_n)$ that is realized in $U_{n+1}$. Centering Theory posits a weak constraint on the usability of pronouns: if a pronoun occurs in $U_{n+1}$, then $C_b(U_{n+1})$ must be pronominalized, too (“Rule 1”). Further, it states preferences for transitions between utterances based upon salience characteristics in $U_n$ and $U_{n+1}$ (“Rule 2”): keep the same entity as backward-looking center in both $U_n$ and $U_{n+1}$, interpret the subject of $U_n$ as the backward-looking center of $U_n$, and keep the subject (“preferred center”) of $U_n$ as the backward-looking center of the following utterance $U_{n+1}$ (Kibble, 2003).

In Centering, two aspects of salience are distinguished: salience of potential backward-looking centers resulting from the assignment of grammatical roles in the preceding utterance, and salience of forward-looking centers as expressed by the assignment of grammatical roles in the current utterance. The dichotomy of two types of “centers” follows a similar criterion of temporal scope as the distinction between hearer salience and speaker salience introduced above. Hearer salience $\mathrm{hsal}_{CCT}(r)$ of a referring expression $r$ in utterance $U_n$ (i.e. salience of $r$ as a potential backward-looking center) can be modeled as the relative grade of the grammatical role of the antecedent of $r$ ($GR_{ante}$, cf. Fig. 3) if it occurred in the directly preceding utterance $U_{n-1}$. Accordingly, speaker salience $\mathrm{ssal}_{CCT}(r)$ should be predictable from pronominal references to $r$ in the directly following utterance $U_{n+1}$ ($REF_{ana}$).

The restriction of the canonical model to relations between directly neighboring sentences seems unnatural, so it was suggested that more distant utterances contribute to the salience of a discourse referent, but to a lower degree than the last utterance. This assumption yields Left-Right-Centering (LRCT; Tetreault, 1999). To model hearer salience and speaker salience respectively, an explicit measurement of distance must be integrated. Referential distance is defined as the number of clauses between an antecedent and an anaphor (Givón, 1983); $RD_{ante}$
is the referential distance of an anaphoric link with the anaphor in $U_n$, and $RD_{ana}$ is the distance of a link whose antecedent is in $U_n$. Using this definition, hearer salience and speaker salience for LRCT can be modeled as follows:²

\[ \mathrm{hsal}_{LRCT}(r) = \frac{1}{1 + RD_{ante}(r) + (1 - GR_{ante}(r))} \tag{2} \]

\[ \mathrm{ssal}_{LRCT}(r) = \frac{1}{1 + RD_{ana}(r) + (1 - REF_{ana}(r))} \tag{3} \]
As demanded in the preceding section, the outcome is normalized so that 1 denotes the highest possible salience score. If no antecedent (anaphor) exists, $RD_{ante}$ ($RD_{ana}$) is infinite, and thus $\mathrm{hsal}_{LRCT}$ ($\mathrm{ssal}_{LRCT}$) converges to 0. For Canonical Centering, the locality constraint can be implemented by replacing RDante with 1/1/RDante.

As an alternative to the canonical salience ranking, in Functional Centering (FCT; Strube and Hahn, 1999), the ordering of potential backward-looking centers is replaced by a ranking based on information status, embedding depth and relative word order. Following Rambow’s (1993) account of Centering and word order in German, I concentrate on word order as a determinant of the ranking of forward-looking centers ($WO_{ante}$); this variant is abbreviated as WOCT in what follows.³

\[ \mathrm{hsal}_{WO}(r) = \frac{1}{1 + RD_{ante}(r) + WO_{ante}(r)} \tag{4} \]
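Equations (2) to (4) can be sketched in Python as follows. The function names are hypothetical. The treatment of Canonical Centering in `hsal_cct` is an assumption derived from the locality constraint described in the prose (only an antecedent in the directly preceding utterance counts); it is not the exact substitution given in the text.

```python
import math


def hsal_lrct(rd_ante, gr_ante):
    """Eq. (2): hearer salience in Left-Right-Centering."""
    return 1.0 / (1.0 + rd_ante + (1.0 - gr_ante))


def ssal_lrct(rd_ana, ref_ana):
    """Eq. (3): speaker salience in Left-Right-Centering."""
    return 1.0 / (1.0 + rd_ana + (1.0 - ref_ana))


def hsal_wo(rd_ante, wo_ante):
    """Eq. (4): hearer salience with a word-order-based ranking (WOCT)."""
    return 1.0 / (1.0 + rd_ante + wo_ante)


def hsal_cct(rd_ante, gr_ante):
    """Assumed locality constraint for Canonical Centering: any
    referential distance other than 0 is treated as infinite."""
    local_rd = rd_ante if rd_ante == 0 else math.inf
    return hsal_lrct(local_rd, gr_ante)


# A subject antecedent in the directly preceding utterance: maximal score.
print(hsal_lrct(0, 1.0))         # 1.0
# The same antecedent one utterance further back.
print(hsal_lrct(1, 1.0))         # 0.5
print(hsal_cct(1, 1.0))          # 0.0
# No antecedent at all: infinite distance, score 0.
print(hsal_lrct(math.inf, 0.0))  # 0.0
```

Because Python floats follow IEEE 754, dividing by an infinite denominator yields 0.0, which matches the statement that the score converges to 0 when no antecedent or anaphor exists.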
Again, speaker salience (i.e. salience of forward-looking centers) can be approximated by the choice of referring expressions for an anaphor in the following utterance ($REF_{ana}$). Then, grammatical roles and word order are predicted from the relative ranking of discourse referents according to their speaker salience, whereas referring expressions are predicted from hearer salience directly.

2. In this formalization, referential distance is the most influential factor on salience (step-width is 1), with $GR_{ante}$ and $REF_{ana}$ providing minor distinctions among cases with equal distance ($(1 - GR_{ante}) < 1$, $(1 - REF_{ana}) < 1$).

3. WOCT covers only one of Strube and Hahn’s (1999) original salience ranking determinants. However, it is generally assumed that word order in German (with the possible exception of the Vorfeld) also reflects information status (Kruijff et al., 2001), which is consistent with the simplification of Functional Centering proposed here. For the sake of clarity, however, it should be noted that WOCT is not to be confused with Strube and Hahn’s (1999) original proposal.
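Consider an utterance $U_n$ and a referring expression $r$ with antecedent $q$ in a preceding utterance $U_k$ and anaphor $s$ in a subsequent utterance $U_l$ ($k < n < l$).

properties of antecedent

\[ RD_{ante}(r) = \begin{cases} \infty & \text{iff } r \text{ has no antecedent} \\ n - k - 1 & \text{else} \end{cases} \]

\[ GR_{ante}(r) = \begin{cases} 0 & \text{iff } r \text{ has no antecedent} \\ 1 & \text{iff } q \text{ is subject} \\ 0.9 & \text{iff } q \text{ is direct object} \\ 0.8 & \text{iff } q \text{ is indirect object} \\ 0.7 & \text{else} \end{cases} \]

\[ WO_{ante}(r) = \begin{cases} 0 & \text{iff no antecedent} \\ \dfrac{\#\text{words in } U_k \text{ before } q}{\#\text{words in } U_k - \#\text{words in } q} & \text{else} \end{cases} \]

properties of anaphor(s)

\[ RD_{ana}(r) = \begin{cases} \infty & \text{iff no anaphor to } r \text{ exists} \\ l - n - 1 & \text{else} \end{cases} \]

\[ REF_{ana}(r) = \begin{cases} 0 & \text{iff } r \text{ has no anaphor} \\ 1.0 & \text{iff } s \text{ is pronominal} \\ 0.5 & \text{else (i.e. } s \text{ is a full description)} \end{cases} \]

\[ TP(r) = \frac{\#\text{mentions of } r \text{ within the next 20 utterances}}{20} \]

Figure 3. Parameters considered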
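The parameter definitions in Fig. 3 translate fairly directly into code. The sketch below assumes that the caller supplies utterance indices, role labels and word counts; the function names and the role labels (`"subject"`, `"direct_object"`, `"indirect_object"`) are hypothetical. Returning 0 for an utterance that consists only of the antecedent is a further assumption, made to avoid a division by zero.

```python
import math

# Grades for the grammatical role of the antecedent (values from Fig. 3).
GR_GRADES = {"subject": 1.0, "direct_object": 0.9, "indirect_object": 0.8}


def rd_ante(n, antecedent_index=None):
    """Referential distance to the antecedent in U_k (k < n), or infinity."""
    return math.inf if antecedent_index is None else n - antecedent_index - 1


def rd_ana(n, anaphor_index=None):
    """Referential distance to the anaphor in U_l (l > n), or infinity."""
    return math.inf if anaphor_index is None else anaphor_index - n - 1


def gr_ante(role=None):
    """0 without an antecedent; otherwise the grade of the antecedent's role."""
    if role is None:
        return 0.0
    return GR_GRADES.get(role, 0.7)


def wo_ante(words_before=None, words_in_utterance=None, words_in_antecedent=None):
    """Relative position of the antecedent q within its utterance U_k."""
    if words_before is None:
        return 0.0  # no antecedent
    remaining = words_in_utterance - words_in_antecedent
    if remaining == 0:
        return 0.0  # assumption: the utterance consists of q only
    return words_before / remaining


def ref_ana(anaphor_is_pronoun=None):
    """0 without an anaphor, 1.0 for a pronoun, 0.5 for a full description."""
    if anaphor_is_pronoun is None:
        return 0.0
    return 1.0 if anaphor_is_pronoun else 0.5


def topic_persistence(mentions_in_next_20):
    """TP: mentions of r within the next 20 utterances, divided by 20."""
    return mentions_in_next_20 / 20


# Antecedent in U_3, current utterance U_5: one utterance lies in between.
print(rd_ante(5, 3))  # 1
```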
However, as the pronominalization rule (Rule 1) of Centering is underspecified with respect to coding decisions, a stronger formulation is needed for practical application in NLG (Kibble and Power, 2000). As a referential distance lower than 1 is a necessary (but not sufficient) condition for the use of pronouns in CCT, a pronominalization threshold of 0.5 is suggested as a first approximation: if $\mathrm{hsal}(r) > 0.5$, use a pronoun, unless this is prohibited by ambiguity of reference or a higher-ranked referent has already been encoded as a full description; otherwise, use a full description.

3.2. Reconstructing Topicality

Following Givón (2001), topicality is a cognitive dimension that has to do with attention control mechanisms and discourse prominence. The two functional dimensions underlying topicality are anaphoric topicality (“givenness”) and cataphoric topicality (“importance”). Heuristically, anaphoric topicality is approximated by referential distance (RD), whereas cataphoric topicality can be approximated by topic persistence (TP), i.e. the number of mentions of the referent in the following (up to) 20 clauses. In the reconstruction, hearer salience is equated with anaphoric topicality, with referential distance as its only factor, whereas speaker salience is equated with cataphoric topicality, with topic persistence as its only factor. Topic persistence is normalized.

\[ \mathrm{hsal}_{TOP}(r) = \frac{1}{1 + RD_{ante}(r)} \tag{5} \]

\[ \mathrm{ssal}_{TOP}(r) = \frac{1}{1 + 1 - TP(r)} \tag{6} \]
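A minimal sketch of eqs. (5) and (6), together with the pronominalization rule stated at the end of Section 3.1. The function names and the two boolean flags are hypothetical; how ambiguity of reference is detected is left open here, as it is in the text.

```python
PRONOMINALIZATION_THRESHOLD = 0.5  # first approximation suggested in the text


def hsal_top(rd_ante):
    """Eq. (5): anaphoric topicality from referential distance."""
    return 1.0 / (1.0 + rd_ante)


def ssal_top(tp):
    """Eq. (6): cataphoric topicality from normalized topic persistence."""
    return 1.0 / (1.0 + 1.0 - tp)


def choose_ref(hsal, ambiguous=False, higher_ranked_is_full=False):
    """Return "pronoun" or "full description" for a referent.

    A pronoun requires hsal > 0.5 and is blocked by ambiguity of reference
    or by a higher-ranked referent already encoded as a full description.
    """
    blocked = ambiguous or higher_ranked_is_full
    if hsal > PRONOMINALIZATION_THRESHOLD and not blocked:
        return "pronoun"
    return "full description"


print(choose_ref(hsal_top(0)))  # pronoun
print(choose_ref(hsal_top(1)))  # full description
```

Note that the comparison is strict: a score of exactly 0.5 does not license a pronoun, which matches the requirement that the referential distance be lower than 1.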
The interaction of both dimensions on the preference deduction layer seems to be complex but is not explicitly described. Instead, Givón argues that both dimensions of topicality form one single and homogeneous dimension of topicality, illustrating effects by revealing correlations between grammatical devices and topicality measures directly. Here, hearer salience is taken as the main determinant of REF (a strong correlation of pronominalization with referential distance has been shown), speaker salience is taken as the only determinant of WO (Givón claims that the impact of cataphoric topicality is greater than the impact of anaphoric topicality), and GR preferences are calculated from the interaction of both dimensions (according to Givón 2001, both factors contribute). For this combination, addition of hearer salience and speaker salience is suggested. As the pronominalization threshold, 0.5 is assumed again.

3.3. A minimal instantiation

For a minimal instantiation of the framework described above, the set of parameters shown in Fig. 3 is considered. Topicality and the instantiations of Centering Theory can be represented by the choice of weights for these parameters. Then, hearer and speaker salience are calculated as the reciprocal of one plus the weighted sum of parameter values, as in eq. (1). As result values, GR and WO preferences are assigned according to relative differences in salience scores, whereas derived REF preferences depend on absolute values (and ambiguity interference) as described above. For the purpose of illustration, consider example sentence (5). Using the parameter weights summarized in Fig. 4, hearer and speaker salience are calculated, and preferences can be derived.
                 weights for hearer salience    weights for speaker salience
parameters       LRCT    WOCT    TOP            LRCT/WOCT    TOP
RDante           1       1       1
1 − GRante       1       0       0
WOante           0       1       0
RDana                                           1            0
1 − REFana                                      1            0
TP                                              0            1

Figure 4. Reconstructing Left-Right-Centering (LRCT), Centering with word-order-based salience ranking of forward-looking centers (WOCT), and topicality (TOP).
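The weight configurations of Fig. 4 can be written down as plain data and plugged into eq. (1). In this sketch, the dictionary layout, the key names and the sample referent are hypothetical. One assumption should be flagged: the topic-persistence factor is entered as 1 − TP, so that the TOP configuration reproduces eq. (6).

```python
# Hearer-salience weights per reconstruction (columns of Fig. 4).
HEARER_WEIGHTS = {
    "LRCT": {"rd_ante": 1, "one_minus_gr_ante": 1, "wo_ante": 0},
    "WOCT": {"rd_ante": 1, "one_minus_gr_ante": 0, "wo_ante": 1},
    "TOP": {"rd_ante": 1, "one_minus_gr_ante": 0, "wo_ante": 0},
}

# Speaker-salience weights; LRCT and WOCT share one configuration.
# Assumption: the TP factor is stored as 1 - TP, following eq. (6).
SPEAKER_WEIGHTS = {
    "LRCT/WOCT": {"rd_ana": 1, "one_minus_ref_ana": 1, "one_minus_tp": 0},
    "TOP": {"rd_ana": 0, "one_minus_ref_ana": 0, "one_minus_tp": 1},
}


def score(weights, factors):
    """Eq. (1), restricted to factors with non-zero weight."""
    total = sum(w * factors[name] for name, w in weights.items() if w != 0)
    return 1.0 / (1.0 + total)


# Hypothetical referent: direct-object antecedent with one utterance in
# between, located in the middle of its utterance.
factors = {"rd_ante": 1, "one_minus_gr_ante": 0.1, "wo_ante": 0.5}
for name, weights in HEARER_WEIGHTS.items():
    print(name, round(score(weights, factors), 3))
```

Keeping the configurations as data makes it straightforward to add a hybrid model: a new entry with a different combination of weights is enough.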
Considering Terry (te), we find that $RD_{ante}(te) = 1$ (his last mention was in sentence (3)) and $GR_{ante}(te) = 1.0$ (subject in (3)). Inserting these values into equation (1) with the parameter weights for hearer salience (hsal) in the LRCT reconstruction as summarized in Fig. 4, we obtain a formula identical to equation (2). Thus, $\mathrm{hsal}_{LRCT}(te)$ can be calculated as $1/(1 + 1 + 0) = 0.5$. As the proposed pronominalization threshold is not exceeded, we predict a nominal description. Under the locality constraint of Canonical Centering, $\mathrm{hsal}_{CCT}(te)$ converges to 0, so the coding preferences in Canonical Centering would be identical. The antecedent of Terry is sentence-initial in (3), so $\mathrm{hsal}_{WO}(te)$ is 0.5, too. In the topicality reconstruction, where referential distance is the only parameter of hearer salience, the same prediction is obtained.

The corresponding parameter values for Tony (to) in (5) are: $RD_{ante}(to) = 0$ (last mention in (4)), $GR_{ante}(to) = 1.0$ (subject) and $WO_{ante}(to) = 0$ (sentence-initial). So, the hearer salience of Tony is calculated as 1 for LRCT, CCT, WOCT and TOP alike. This exceeds the pronominalization threshold. As the only possible interfering referent, Terry, has a sufficiently lower degree of hearer salience, no restrictions arise from ambiguity avoidance strategies. So, we can safely refer to Tony with a pronoun, just as in Grosz et al.’s original example.

For speaker salience (ssal), we find that the anaphoric references to Terry and Tony are in the directly following utterance ($RD_{ana} = 0$), both with full descriptions ($REF_{ana} = 0.5$), and that each referent is mentioned only once in the forthcoming discourse ($TP = 1/20$). Thus, speaker salience is identical for Terry and Tony: $\frac{1}{1.5}$ in the Centering reconstructions and $\frac{20}{39}$ in the topicality reconstruction. As grammatical role and word order preferences are determined in the Centering reconstructions by relative differences between speaker salience scores, no preferences for GR or WO can be derived here. The same is true for WO preferences in the topicality reconstruction.

However, GR preferences in the topicality reconstruction are calculated from the interaction (e.g. addition) of hearer and speaker salience scores, not from speaker salience alone. As speaker salience is identical for both referents, the difference in hearer salience decides: Tony’s score ($\mathrm{hsal}_{TOP}(to) = 1$) exceeds Terry’s ($\mathrm{hsal}_{TOP}(te) = 0.5$), and we predict Tony to be the preferred subject and Terry to be a non-subject. In fact, the opposite decision was taken in Grosz et al.’s constructed example. However, this is very likely due to constraints from verbal semantics, as a more agentive realization of Tony in a sentence semantically roughly equivalent to ex. (6) would be rather odd (cf. ex. (6’)).

(6’) #Of course, Tony$_{to}$ has not been intended to get upset by Terry$_{te}$.
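The calculations for Terry and Tony can be retraced with a few lines of Python. This is only a sketch that hard-codes the parameter values given above; the variable and function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F


def sal(*terms):
    """Eq. (1) with weights already applied: 1 / (1 + sum of the terms)."""
    return 1 / (1 + sum(terms, F(0)))


# Parameter values from the text: (RD_ante, GR_ante, WO_ante).
antecedent = {"Terry": (F(1), F(1), F(0)), "Tony": (F(0), F(1), F(0))}
# Identical for both referents: RD_ana = 0, REF_ana = 0.5, TP = 1/20.
rd_ana, ref_ana, tp = F(0), F(1, 2), F(1, 20)

results = {}
for name, (rd, gr, wo) in antecedent.items():
    hsal_top = sal(rd)      # eq. (5)
    ssal_top = sal(1 - tp)  # eq. (6)
    results[name] = {
        "hsal_lrct": sal(rd, 1 - gr),                # eq. (2)
        "hsal_wo": sal(rd, wo),                      # eq. (4)
        "hsal_top": hsal_top,
        "ssal_centering": sal(rd_ana, 1 - ref_ana),  # eq. (3)
        "ssal_top": ssal_top,
        "gr_top": hsal_top + ssal_top,  # GR preference in TOP: addition
    }

for name, scores in results.items():
    print(name, {key: str(value) for key, value in scores.items()})
```

With these values, the summed score for Tony (1 + 20/39) is higher than the one for Terry (1/2 + 20/39), which is the basis for predicting Tony as the preferred subject in the topicality reconstruction.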
Figure 5 summarizes the preferences for the whole text. Besides the effects of a heuristic ambiguity resolution rule⁴ and partial indistinguishability, few crucial deviations from the original coding decisions have been found. Here, (3) seems to be the most critical instance, where actual word order and grammatical role assignment deviate from both Centering and topicality preferences. However, the interpretation of the pronouns in (3) depends on parallelism with the previous utterance. A sentence like He$_{to}$ was called (by him$_{te}$) at 6 AM. is nearly incomprehensible. It would be necessary to use a nominal such as the name for either Tony or both referents (as suggested by the theories).
Figure 5. Example: Parameter values, hearer salience (hsal), speaker salience (ssal) and coding preferences.
4. In Fig. 5, full* means to use a non-pronominal form to avoid ambiguity if the absolute salience score is sufficient for pronominalization; subject* and unspec indicate that Tony and Terry are ranked equally; deviations from the original are marked in bold.

This short example already reveals some limitations of approaches of this kind. First, pragmatic preferences for word order, the assignment of grammatical roles, and possibly the choice of referring expressions, too, are by no means unrivaled. Rather, they are most likely to apply in cases where no other constraints arising from syntax (e.g. binding restrictions), semantics (e.g. valency frames of applicable verbs) or higher communicative goals (e.g. to add further hearer-new information about a referent within a noun phrase) interfere. Second, the theories and the corresponding reconstructions rely on surface-oriented heuristics that are often too coarse-grained to generate clear distinctions, as shown for word-order preferences in ex. (5). Third, other factors, such as parallelism effects, might contribute to salience, too.

4. The Mental Salience Framework: A summary
4.1. General characteristics

The Mental Salience Framework described in this paper consists of essentially three components:

– a differentiation between hearer salience (reference to attentional states of the hearer) and speaker salience (modification of attentional states of the hearer),
– the modeling of hearer salience and speaker salience as normalized linear combinations of different contextual factors, and
– the derivation of coding preferences from the weighted sum of hearer salience and speaker salience scores.

4.2. Adequacy with respect to existing theories

As the linear combination of contextual factors and the interaction between hearer salience and speaker salience involve a number of parameters, different parameter configurations can be considered, and as argued above, different variants of Centering and Givón’s approach can be reconstructed by choosing appropriate parameter values. The idea of an adequacy proof for these reconstructions (for a full proof see Chiarcos, 2009) is as follows:

– Provide a definition of adequacy, i.e. all predictions of the original formulation of the theory are predicted by the reconstruction (completeness), and no prediction of the reconstruction is incompatible with the predictions of the original formulation of the theory (compatibility).⁵
– Prove completeness of the reconstruction: Identify the set of empirically verifiable assumptions and predictions made in the original theory, and prove that these are predicted by the reconstruction as well. For Centering, we have to show that
  • any difference in the ordering of two potential backward-looking centers entails a difference between the hearer salience scores of the corresponding referents (for Canonical Centering by definition, for Left-Right Centering by induction),
  • if one element is pronominalized, then the backward-looking center is pronominalized (proof by contradiction: assume that the backward-looking center, i.e. the most hearer-salient referent, is not pronominalized. As it is more salient than the other, pronominalized element, it must exceed the pronominalization threshold. As it is the most salient element, ambiguity does not require nominal realization),⁶ and
  • if one element in the following utterance is pronominalized, its grammatical role should be higher in the current utterance than those of non-pronominalized elements appearing in both clauses (proof by contradiction: in the reconstruction, grammatical roles are assigned depending on the speaker salience scores. The only factor of speaker salience in the Centering reconstruction is pronominalization in the following utterance. In the reconstruction, a violation of this preference is possible only if the semantics of the current utterance do not permit this relative ranking of grammatical roles. This can be easily contradicted by enumerating the bandwidth of grammatical devices which allow the pragmatically adequate generation of grammatical roles).
– Prove compatibility of the reconstruction with the original formulation: For Centering, we have to show that
  • if two elements differ in their hearer salience scores, then the lower-ranked one must not have been more salient according to Centering (proof by contradiction),
  • if a non-pronominal description is predicted by the reconstruction, then Centering must not predict pronominalization (proof by contradiction, analogous to the completeness proof above), and
  • if the reconstruction predicts the highest possible grammatical role, Centering predicts the preferred center (proof by contradiction: assume that Centering unambiguously predicts another element to be the preferred center; then it must have been the only pronoun in the following utterance, but then it must have been the backward-looking center of the following utterance, and then it must preferably have been the preferred center of the last utterance).

For reasons of brevity, I restrict myself to this short sketch of the ideas behind the proof.

5. This definition of adequacy is inspired by the formal definition of equivalence. However, equivalence differs from adequacy in that it requires soundness rather than compatibility. A reconstruction is sound if “all predictions of the reconstruction are predictions of the original formulation as well”. However, it has been recognized before that neither Centering Theory nor Givón’s approach is a fully specified model of discourse processing. As such, Centering does not provide a model describing the cognitive underpinnings of the assignment of grammatical roles or other grammatical devices which indicate the ranking of forward-looking centers, and thus, it explains only the effect of these grammatical devices, but not their assignment in discourse. Nevertheless, the preference to keep the backward-looking center over a sequence of utterances (cf. the notion of “preferred center”; Strube and Hahn, 1999) can be exploited to predict the assignment of grammatical roles (Kibble and Power, 2004). It should be noted, however, that these preferences are deduced from preferences of transitions between utterances only within the same discourse segment (Grosz and Sidner, 1986), and that it is not clear to what degree these preferences extend to the discourse as a whole. Similarly, Givón’s approach involves a differentiation between anaphoric and cataphoric aspects of topicality, but he does not describe how these two dimensions interact in the deduction of concrete coding decisions. Therefore, like any practical application of a theoretical construct, a reconstruction within a formal framework relies on an interpretation which is maximally predictive in order to achieve concrete predictions, and thus, researchers are usually not interested in equivalent reconstructions, but in reconstructions which involve a gain in predictive power. However, such a reconstruction cannot be equivalent, as it systematically violates the soundness criterion. As an example, Beaver’s (2004) equivalence proof between Centering (as formulated by Brennan et al. (1987)) and his reconstruction of Centering in Optimality Theory in fact represents a proof of adequacy, as he claims that the reconstruction of Centering entails additional predictions that were not entailed by the original formulation: “This declarativity means that COT is equally suited for generation or interpretation. In contrast, the BFP algorithm is suited for interpretation only. It could not be used to generate texts directly …”. As an alternative to partial equivalence proofs as provided by Beaver, I suggest disentangling equivalence and adequacy and focusing on the adequacy between the original formulation and the reconstruction rather than on equivalence.
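The adequacy criteria can be stated compactly. The notation below is an assumption introduced for illustration only: $P(T)$ stands for the set of predictions of the original theory $T$, $P(R)$ for the set of predictions of the reconstruction $R$, and $p \perp q$ for two predictions being incompatible.

```latex
\begin{align*}
\text{completeness:}  \quad & P(T) \subseteq P(R) \\
\text{compatibility:} \quad & \neg\exists\, p \in P(R),\ q \in P(T) :\ p \perp q \\
\text{soundness:}     \quad & P(R) \subseteq P(T) \\
\text{adequacy}       \quad & = \text{completeness} + \text{compatibility} \\
\text{equivalence}    \quad & = \text{completeness} + \text{soundness}
\end{align*}
```

Under this reading, a reconstruction that adds predictions can be adequate without being equivalent, since $P(R)$ may then be a proper superset of $P(T)$ as long as none of the additional predictions contradicts the original theory.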
Based on these considerations, however, I conclude that the reconstructions described above are adequate with respect to Centering. A similar proof for Givón’s approach can be made,⁷ thus proving that the respective reconstructions within the Mental Salience Framework are adequate, and thus, the framework allows the reconstruction of two classical approaches.

6. However, this argument is dependent on the concrete definition of ambiguity. If ambiguity is defined on morphological agreement only, then violations of Centering predictions are possible. However, ambiguity is often resolved from verbal semantics, and thus, these factors have to be considered as well.

7. For Givón, only compatibility can be proven, as Givón’s model is concerned with the analysis of empirical preferences, without specifying concrete predictions.

4.3. Fields of application

The description of the Mental Salience Framework provided here focused on methodological and theoretical aspects. Accordingly, only a minimal set of parameters was considered, sufficient to reconstruct two classical approaches. In particular, the current speaker salience metrics are incomplete: it cannot be expected that the speaker’s intentions can always be recovered from frequency measures such as topic persistence. Nevertheless, important results can be achieved, as partially elaborated by Chiarcos (2009). Whereas speaker salience and hearer salience can be plausibly extrapolated from the original formulation of the theories, the derivation of concrete coding preferences is underspecified (especially for Centering), the interaction of hearer and speaker salience is not fully clear (TOP) or controversial (derivation of word-order preferences in WOCT and TOP), and the sets of factors considered are incompatible. An integrated framework as suggested here can be used

– to perform a comparative empirical evaluation of different theories and their reconstructions,
– to identify elementary factors considered in different theories and investigate their respective effect on salience scores,
– to evaluate hybrid or modified models by introduction or re-weighting of parameter values,
– to provide further insights into the interaction between speaker salience and hearer salience based on empirical results, and finally
– for practical application in natural language generation (NLG).

With respect to the last point, it seems reasonable to implement speaker salience in NLG systems as an external parameter providing an interface to integrate external “importance” assignments. Such importance assignments can be used by a system designer to guide the attention of a user in a goal-directed way.
Besides this, hearer salience provides a mechanism for cohesive coding decisions based on text-oriented measures. One of the most important results to be achieved in empirical research is the clarification of the interaction of hearer and speaker salience and their respective influence on the choice of different grammatical devices.
4.4. Extensions and challenges

The original motivation underlying the Mental Salience Framework was the insight that a model of the attentional states of the hearer does not sufficiently constrain the choice of referring expressions, but that at least one additional dimension interfering with “givenness” must be considered as well (Chiarcos, 2011). This observation has been made before; however, different candidates for this additional, interfering dimension affecting the use of referring expressions have been proposed, e.g. contrastiveness, emphasis, importance (Givón, 1983; Levelt, 1989; Chafe, 1994), etc. While the differentiation between common ground (as reflected in hearer salience) and speaker-private knowledge (from which speaker salience arises) is well justified and probably uncontroversial, the question remains whether these aspects of salience, hearer salience and speaker salience, form by themselves uniform dimensions of attentional states. Instead of defending this specific hypothesis, I motivate this assumption from theoretical minimalism, i.e., methodological considerations. The postulation of another distinction between, say, two kinds of speaker salience must be justified by empirical findings which cannot be covered by the existing model. Additional dimensions of salience arising from other modalities, e.g. visual salience, do exist, but with respect to salience in discourse, I am currently not aware of any empirical data that make a differentiation between more than two dimensions of hearer salience or speaker salience for discourse referents necessary.

A challenging question, however, is whether the grammatical devices of one type, say referring expressions, can be characterized by only one linear combination of salience scores or whether hearer salience and speaker salience differ in their impact on different grammatical devices.
In fact, it has been suggested for demonstratives as compared to personal pronouns that the conditions licensing the use of demonstratives are more specific than those of personal pronouns. For Finnish, Kaiser and Trueswell (to appear 2011) found a preference for personal pronouns to co-refer with the subject of the preceding utterance, whereas demonstratives preferred the last-mentioned possible antecedent. They explained this difference with different interoperating dimensions, i.e. linguistic structure (as indicated by grammatical roles) and information structure (as indicated by word order in Finnish), that differ in their relevance to the choice of a pronoun as compared to the choice of a demonstrative. In the Mental Salience Framework, this configuration could be modeled by defining hearer salience in terms of grammatical roles, while assuming a greater influence of word order (indicating non-salience, i.e. non-givenness) on speaker salience scores. Then, the observed pattern can be achieved by defining that personal pronouns depend on
hearer salience alone, whereas demonstrative pronouns are sensitive to speaker salience in addition to hearer salience. This specific model of demonstratives, however, requires that no single cumulated salience score is generated for the generation of referring expressions, but that for certain smaller classes of referring expressions, individual scores are calculated and then interpreted as the probability of using a specific kind of referring expression. Thus, the association between a salience score and a certain grammatical device is no longer a direct one, say, a mapping from a certain score on a scale to a preference for a certain form, but a mapping from a two-dimensional space onto the preference for a given form, guided by the proximity between the canonical salience of that form and the scores currently achieved. The Mental Salience Framework permits this kind of extension, though it currently concentrates on the most elementary classes of grammatical devices, abstracting from more fine-grained differentiations such as the one between pronouns and demonstratives.
Another possible extension is the combination of the Mental Salience Framework with learning algorithms. As factors, salience scores and coding preferences are specified by numerical scores which are retrieved from linear combinations, this network can be interpreted as a multi-layer perceptron whose weights (parameters) can be set by backpropagation. As a result, the Mental Salience Framework allows not only for the comparative representation and evaluation of different theories, but also for data-driven parameter weighting.
5. Related research
The Mental Salience Framework represents a model of a specific insight into the nature of attention control in discourse, that is, the distinction between different dimensions of the salience of discourse referents, associated with different functions in the flow of discourse: hearer salience (part of the speaker’s hearer model, exploited by the speaker to generate expressions in such a way that a hearer can relate them to elements introduced in the discourse before) and speaker salience (a correlate of the speaker’s intention to guide the hearer’s attention to certain referents, e.g. their role for the further development of the discourse).
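With these two dimensions, the mapping from a two-dimensional salience space onto form preferences that Section 4.4 describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's implementation: the canonical salience points are invented values, and turning Euclidean distances into preferences by normalizing `exp(-distance)` is only one of several possible choices.

```python
import math

# Hypothetical canonical salience of each form as a point
# (hearer salience, speaker salience); illustrative values only.
CANONICAL_SALIENCE = {
    "personal_pronoun": (0.9, 0.1),
    "demonstrative": (0.6, 0.8),
    "full_np": (0.1, 0.3),
}


def form_preferences(hearer, speaker):
    """Preference per form, summing to 1; closer canonical points score higher.

    Assumption: preferences are obtained by normalizing exp(-distance).
    """
    closeness = {
        form: math.exp(-math.dist((hearer, speaker), point))
        for form, point in CANONICAL_SALIENCE.items()
    }
    total = sum(closeness.values())
    return {form: value / total for form, value in closeness.items()}


def preferred_form(hearer, speaker):
    """Form whose canonical salience lies closest to the current scores."""
    preferences = form_preferences(hearer, speaker)
    return max(preferences, key=preferences.get)
```

A referent with high hearer salience and low speaker salience then falls closest to the pronoun point, whereas raising its speaker salience moves it towards the demonstrative point.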
5.1. Multidimensional models of salience in the generation of referring expressions
While proposals for multidimensional models of salience have been made before (e.g. Givón, 1983; Clamons et al., 1993; Mulkern, 2003), these often remained merely theoretical and, to my knowledge, have not been formalized within a model for the prediction of the choice of referring expressions, the assignment of grammatical roles, and the deduction of word order preferences. The differentiation between two types of salience in NLG contexts as proposed by Pattabhiraman (1992) concerns another distinction, that is, the relationship between the degree of (instantial) salience a cognitive representation has and the degree of (canonical) salience a grammatical device, or a given lexeme, is capable of expressing. In his terminology, hearer salience and speaker salience are both different aspects of instantial salience, whereas canonical salience is concerned with the mapping between grammatical devices and salience scores. In fact, Pattabhiraman’s model of salience in NLG can be used as an alternative to the deductive linear combination approach presented here.
Pattabhiraman’s canonical salience is related to the notion of salience as developed in the field of semantics of comparisons and metaphors. In her investigation of metaphorical and literal readings of potentially metaphorically interpretable expressions, Giora (1999) introduced the notion of salience as an assessment of the likelihood of a semantic meaning that a given sequence of words can be assigned. Similarly, in his classical work on comparisons, Tversky (1977) postulated that semantic features differ in their relative salience for different elements, and that these differences have an effect on the ordering of elements in a comparison. In a later extension of Tversky’s work, Ortony (1979) found that feature salience has an effect on the well-formedness of metaphoric expressions.
Horacek’s algorithm for the generation of referential descriptions (Horacek, 1997) broadened Tversky’s and Ortony’s understanding of salience by identifying the role of property salience in the generation of referring expressions in general, that is, by accounting for the observation that referring expressions often involve attributes that are not primarily motivated by their capability to distinguish the given referent from a set of semantically compatible distractors, but by independent considerations. Property salience is, however, a feature of attributes, not of discourse referents (unlike hearer salience and speaker salience). Property salience and object salience are independent of each other, and, as suggested by van der Sluis and Krahmer (2001), it can be assumed that both dimensions co-operate with other dimensions of salience in the production of the form of referring expressions. The third dimension of salience considered by van der Sluis and Krahmer comes from environmental factors, especially the
visual surrounding. Effects of the situational context on the choice of referring expressions have been observed frequently before. Similar to Bühler’s (1934) interpretation of deixis as an extension of anaphora, Prince (1981) considers “situationally evoked” and “textually evoked entities” to form a homogeneous group of highly activated (evoked) referents. The interaction between visual salience and linguistic salience has also been investigated in the context of multi-modal generation of referring expressions (cf. Kelleher, this volume).
With respect to other existing multi-dimensional models of salience, we may conclude that besides hearer salience and speaker salience, additional dimensions of salience can be assumed which differ from salience as understood here (entity-based salience in discourse) in their domain (canonical salience/feature salience/property salience) or their modality (visual salience), and that are thus independent of the dimensions of salience discussed here, which are more strictly concerned with the flow of discourse. Due to this independence, however, these are compatible with the differentiation between hearer salience and speaker salience, and thus they can be regarded as potential augmentations of the Mental Salience Framework. The differentiation between hearer salience and speaker salience is theoretically well justified (Clamons et al., 1993; Mulkern, 2003), but has not been formalized before. Accordingly, the Mental Salience Framework can be regarded as providing a more precise model of linguistic salience than the older mono-dimensional accounts currently employed by existing models for the generation of referring expressions, including those for multi-modal contexts, which concentrate on hearer salience, e.g., van der Sluis and Krahmer (2001).
5.2. Centering in Optimality Theory
It has been shown above that the Mental Salience Framework is capable of adequately representing existing theories such as classical variants of Centering and related theories such as Givón’s bi-dimensional account of topicality. Similar attempts at the integration of previously independent lines of research within one framework have been proposed before,8 but the Mental Salience Framework differs from these in that its theoretical implications are fairly minimal, that is, essentially only that speaker-private intentions and beliefs have to be
8. Previous proposals include the attempts of Hajičová and Kruijff-Korbayová (1997), Krahmer and Theune (2002) and Navaretta (2002) to bring together the Praguian notion of salience developed by Hajičová and Vrbova (1982) and Centering (Grosz et al., 1995).
separated from the assumptions that the speaker has about attentional states of the hearer. In this theoretical minimalism, the Mental Salience Framework shares a certain resemblance with Optimality Theory, which can also be viewed as a formal apparatus within which existing theories such as Centering (Beaver, 2004) can be reconstructed.9
Optimality Theory (OT) relies on the observation that grammars contain constraints on the well-formedness of linguistic structures, and that these constraints are often in conflict. The rapid and systematic resolution of such conflicts, however, entails that constraints are not equal in their violability, and thus the existence of a ranking. According to OT, constraints are components of the universal grammar (UG), and language-specific grammars are instantiations of the UG in that they represent different possible rankings of universal constraints. Formally, constraints in OT are conditions on the relationship between an underlying form, or input, and a set of possible surface candidates, i.e. possible outputs. For the generation of referring expressions, the input is an underspecified logical form of an utterance, and the output is a candidate utterance. The optimal candidate output is selected based on the ranking of violated constraints. Given two candidate forms A and B, A is more optimal than B if the highest-ranking constraint which is violated by B is not violated by A, and no violations of higher-ranked constraints occur for A.
In his Centering in OT (COT), Beaver proposes a set of constraints which capture the main ideas of Centering following Brennan et al. (1987, BFP):
– pro-top The topic is pronominalized. (Rule 1)
– cohere The topic of the current sentence is the topic of the previous one. (dis-preference of shifts, Rule 2)
– align The topic is in subject position. (dis-preference of shifts, Rule 2)
Further, Beaver provides a constraint-based definition of the backward-looking center (“topic”):
– one-sentence-window Only discourse entities mentioned in the previous sentence are salient. (salience definition)
– arg-salience One discourse entity is more salient than another if the first was referred to in a less oblique argument position than the second in the same sentence. (salience definition)
9. The concrete claims by Optimality Theory are more rigid, but only concern the nature of constraints as a component of Universal Grammar.
– unique-topic With respect to any sentence, there is exactly one discourse entity which is the topic of that sentence. (definition backward-looking center)
– salient-topic The topic of a sentence is the most salient discourse entity referred to in that sentence, and undefined if no previously salient entities are referred to. (definition backward-looking center)
The minimal version of COT also involves further constraints which are not directly motivated from Centering:
– fam-def Each definite NP is familiar. This means both that the referent is familiar, and that no new information about the referent is provided by the definite.
Using this reconstruction of Centering, Beaver shows the equivalence between COT and Brennan et al.’s original account with respect to pronominalization. However, it should be noted that, like the reconstruction of Centering within the Mental Salience Framework, the predictions made by COT are more specific and more elaborate than the predictions of Brennan et al. (1987). The constraint fam-def, though a reasonable assumption, is not motivated from Centering, and pro-top differs from Rule 1 in that it does not distinguish between two critical cases, i.e., (a) no pronominalization in the output, and (b) pronominalization of a non-backward-looking center in the output, but not of the backward-looking center. For his equivalence proof, Beaver concentrated on proper names and pronouns only (excluding fam-def), and proves equivalence between COT and Centering with respect to three critical cases:
– Purely anaphoric resolutions breaking syntactic constraints are never COT optimal, and never correspond to preferred BFP transitions.
– Fully anaphoric resolutions which violate Rule 1 are never COT optimal, and never correspond to preferred BFP transitions.
– Suppose two fully anaphoric resolutions A and B of a sentence satisfy syntactic constraints and Rule 1.
If COT ranks candidate A above candidate B then BFP ranks candidate A above candidate B and vice versa.
For the third case, however, Beaver’s proof relies on the assumption that “Since Rule 1 is satisfied by A and B and there are pronouns, PRO-TOP is also satisfied by A and B.” Earlier, he described the motivation of pro-top: “PRO-TOP has essentially the effect of Centering’s Rule 1. … If there are pronouns, then PRO-TOP will function comparably to Rule 1, providing a preference for interpretations that make the topic (i.e., CB) into a pronoun.” However, the formulation “if there are pronouns” involves a great abstraction, in that it assumes that pronominalization is triggered only by salience (and agreement filters). As noted in the sketch of the adequacy proof above, this assumption predicts the same results as the original Centering rule only if the definition of agreement filters may extend beyond strict morpho-syntactic congruency. Further, aside from the critical cases identified above, we can construct an example in which pro-top and Centering make different predictions about pronominalization:
Mary_m watched how Sue_s crossed the street over to Harry’s_h house. (Mary, Sue > Harry)
(7) She_m/s wondered about the low traffic today.
(8) He/Harry_h did not realize her_m/s.
(9) He_h did not realize Mary_m/Sue_s.
(10) Harry_h did not realize Mary_m/Sue_s.
The examples (7) to (10) are possible continuations of the first sentence. The well-formedness of example (9) for both interpretations illustrates that Mary and Sue are equally possible antecedents of a pronoun in the following sentence; thus, a feminine pronoun would be ambiguous between Mary and Sue. Therefore, (7) is fully ambiguous between both readings, and from cooperativity considerations we may conclude that it is not a feasible candidate output. As a consequence, only (9) and (10) are to be considered by COT and Centering, respectively. However, Harry is clearly more oblique than Mary and Sue in the first utterance, and thus it cannot be the backward-looking center. Therefore, (9) violates both Rule 1 and pro-top. However, (10) violates pro-top, but does not violate Rule 1. Therefore, if (8) is not available for reasons of ambiguity, the Centering-optimal output is (10), whereas COT does not distinguish between (9) and (10).
At this point, I would like to emphasize that because of the unconditional formulation of pro-top in COT and the existence of the fam-def constraint, COT makes predictions beyond the original Centering, and thus must be deemed adequate with respect to Centering, rather than equivalent. The equivalence proof provided by Beaver is concerned with a subset of critical cases and only with the differentiation between pronouns and the use of proper names. It is possible to construct a critical example in which Centering and COT make different pronominalization predictions.10 Thus, Centering in OT and the reconstruction of Centering within the Mental Salience Framework are comparable with respect to their adequacy (“equivalence”).
However, the theoretical implications of an OT modeling should not be underestimated. Essentially, all possible constraints must be part of the universal grammar. Postulating a constraint like fam-def entails the assumption that definite NPs form a universal syntactic category, which is clearly contradicted by the existence of languages which have no explicit definiteness markers. Further, the OT reconstruction of Centering, like the original formulation of Centering, is an inherently symbolic, categorial account, which is capable of predicting only a finite and fixed set of possible categories of referring expressions. One of the central criticisms of categorial accounts of givenness brought forward by Mira Ariel (1990; 2001) states that the number of grammatical devices distinguished in a specific language is theoretically unlimited. If all relevant distinctions among referring expressions are to be captured in an extension of COT by the postulation of corresponding constraints, similar to the familiarity criterion for definite NPs, the formulation of these categories and their salience characterization in OT also entails that these categories are present in universal grammar, which is probably misleading. As opposed to this, in the Mental Salience Framework, the number of possible referring expressions is not a priori limited, but can be justified in terms of their salience characterization.
In the OT and in the anaphor resolution communities, further instantiations of Centering in OT have been developed (Buchwald et al., 2002; Bouma, 2003; Byron and Gegg-Harrison, 2004; Hardt, 2004).
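The divergence between pro-top and Rule 1 for the continuations (9) and (10) can be checked mechanically. The following Python sketch is a simplification under stated assumptions: it models only these two constraints (cohere, align and fam-def are left out), it fixes the reading in which Mary is the topic, and the names `Candidate`, `pro_top`, `rule_1` and `optimal` are hypothetical. Violation profiles are compared as tuples, so that a violation of a higher-ranked constraint always outweighs violations of lower-ranked ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str
    topic: str                 # backward-looking center of the candidate
    pronominalized: frozenset  # referents realized as pronouns


def pro_top(c):
    """pro-top: one violation unless the topic is pronominalized."""
    return 0 if c.topic in c.pronominalized else 1


def rule_1(c):
    """Rule 1: violated only if something is pronominalized but the topic is not."""
    return 1 if c.pronominalized and c.topic not in c.pronominalized else 0


def optimal(candidates, ranking):
    """Labels of all candidates that share the best violation profile.

    A profile lists violations from the highest-ranked constraint down,
    so tuple comparison mirrors strict constraint ranking.
    """
    profiles = {c.label: tuple(con(c) for con in ranking) for c in candidates}
    best = min(profiles.values())
    return [label for label, profile in profiles.items() if profile == best]


# Reading in which Mary is the topic; the reading with Sue is analogous.
CANDIDATES = [
    Candidate("(9)", topic="Mary", pronominalized=frozenset({"Harry"})),
    Candidate("(10)", topic="Mary", pronominalized=frozenset()),
]
```

With `[pro_top]` as the ranking, both candidates incur one violation and `optimal` returns both labels; with `[rule_1]`, only (10) remains, which is the asymmetry described for this example.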
Of these, the conceptual motivations underlying the Recoverability Optimality Theory (ROT) model (Buchwald et al., 2002) are very closely related to the underlying insights of the Mental Salience Framework. Both share a production perspective which leads to the assumption of two discourse models, the model of the speaker’s private intentions and the discourse model, the salience list or “common ground”. Both share the assumption that cues from the following discourse must be considered in order to generate referring expressions in a proper way. And, like the Mental Salience Framework, ROT is a parameterized framework in the sense that the set of constraints considered is subject to possible extensions. Indeed, the Mental Salience Framework could be applied for the ranking of the current and the following salience list, and thus serve as a complement to ROT with respect to the concrete model of salience, which is left unspecified so far. Nevertheless, the Mental Salience Framework is less constrained in its theoretical implications and in its adaptive character. In particular, it supports language-specific categories of referring expressions whose treatment in Optimality Theory is uncertain. In the best case, the integration of additional categories of referring expressions only requires associating them with certain salience scores. Accordingly, the Mental Salience Framework is more oriented towards a broad-scale practical application.
10. Of course, this can be compensated by a formulation of pro-top which is closer to the original Rule 1, and thus it provides no counter-evidence against the reliability of a reconstruction of Centering in OT in general. Note that also for the Mental Salience Framework, Centering-conformant behaviour can be achieved only by the use of specialized ambiguity filters. For this example, the Mental Salience reconstruction of Centering predicts (9), if ambiguity is determined by morphological agreement only. However, the Centering-optimal prediction can be achieved if ambiguity is defined without any morphological restrictions, which ultimately leads to the following Centering-conformant, but not very natural, strategy: pronominalize nothing but the backward-looking center (Kibble and Power, 2004).
5.3. Centering as a parametric theory
Besides approaches dealing with the reconstruction of different theories within a more general framework, the variation of parameters within one theory has also been considered. Owing to its impressive acceptance across different disciplines of linguistics, Centering Theory has been widely adopted throughout a large community. However, as a necessary consequence of this wide spread, the theory was modified in certain contexts. As one example, OT approaches abstract from the formulation of transitions between utterances (Beaver, 2004), and even from the concept of the backward-looking center (Hardt, 2004), thus leaving essentially nothing of the original theory but the metaphor that attention has to be “centered” during discourse processing.
But also in more conservative formulations of Centering Theory, parameters such as the definition of utterance, the definition of possible forward-looking and backward-looking centers, the criterion for forward-looking centers to be realized within an utterance, and different salience rankings vary throughout the literature. Some of these parameters have been empirically investigated by Poesio et al. (2004), who considered empirical effects of variation in the definition of utterance (sentence, finite clauses, all clauses with a verb, …), realization (indirect realization: consider not only anaphoric, but also bridging relations between forward-looking centers and potential backward-looking centers; considering non-third person pronouns as forward-looking centers), and different salience rankings.
While the empirical evaluation of different parameters of Centering is a valuable and important achievement, it raises the question of what concrete claims of the original theory really remain. From their study, Poesio et al. (2004) motivate a re-formulation of certain aspects of Centering, which, however, is not compatible with radical approaches such as Hardt’s Dynamic Centering (2004). One of the most important results of the study is, however, that Centering cannot be evaluated without considering concrete instantiations of the different parameters it involves. As long as these parameters remain not fully specified, it is unclear to what degree Centering can be falsified at all. Therefore, the central criticism of Centering does not concern any of its specific claims, but only its theoretical status. That is, essentially it must be regarded as a framework which proposes a certain terminology and formalism, but not as a theory in the strict sense. On the other hand, the achievements of Centering cannot be denied. For the first time, a common terminology for several discourse phenomena has been established across different disciplines of linguistics. The Mental Salience Framework, however, differs from Centering in that it does not claim to represent a theory, but merely a formalism, or a framework. The crucial difference is that a theory must be falsifiable, whereas a “parametric theory”, as long as it cannot be evaluated independently from its parameters, is nothing but a metaphor.
A major difference between Centering and the Mental Salience Framework is that within the Mental Salience Framework, a numerical account of salience is provided and explicitly modeled with respect to the choice of referring expressions, grammatical roles and word order preferences, whereas in Centering, pronominalization is seen as a by-product of entity coherence with only very weak consequences for the choice of referring expressions at all.
In the Mental Salience Framework, however, this relationship is formulated in a very explicit way, in that numerical scores are mapped onto specific coding preferences. Further, it applies beyond the scope of pronominalization as opposed to the choice of full nominal NPs, in that it is compatible with the fine-grained specification of an arbitrary number of different grammatical devices in terms of the salience conditions their appropriate use depends on. Further, Centering does not provide a model for the assignment of grammatical roles, but only for their effect on local coherence. In functional linguistics, this function is identified as “foregrounding”, and speaker salience can thus be described as the need of the speaker to place entities in the foreground of a scene, e.g. for processing of the following discourse. As Centering relies on surface-oriented factors indicating foregrounding, it implicitly takes an interpretation perspective on the discourse it is applied to. As opposed to this, the Mental Salience Framework clearly takes a production perspective in that it includes an explicit model of attentional states of the speaker, and thus it is more specialized for the needs of Natural Language Generation.
5.4. Centering Games
As an extension of the identification of the parameters of Centering (Poesio et al., 2004) and the existence of different reconstructions of instantiations of Centering Theory in Optimality Theory, Kibble (2003) proposed a Game-theoretic reconstruction of Centering Theory as a framework for collaborative reference resolution, modeled as a non-cooperative game of incomplete information. With our approach, the Game-theoretic reconstruction of Centering shares the assumption that two perspectives, hearer perspective and speaker perspective, have to be distinguished.
The relevant processing modules of the hearer perspective include:
– discourse modeler maintains a record of entities mentioned in the discourse which will be candidates for anaphor resolution. Possible discourse models include (a) a Centering model, (b) the list of focal referents from the previous clause, or (c) a fully specified discourse model.
– reference resolver identifies the referent of a referring expression with an entity in the discourse model.
The relevant processing modules of the speaker perspective include:
– planner/content determination organizes input propositions into a text structure and plans sentences by e.g. choosing verb forms to realize the preferred order of arguments. Possible strategies include to (a) promote arguments within a clause according to their perceptual salience, (b) plan consecutive clauses to align salience rankings, or (c) plan a sequence of clauses to maximize referential continuity, in addition to salience alignment.
– realizer generates appropriate referring expressions to denote arguments of predicates.
Some details of Kibble’s approach remain abstract, and the adequacy of this approach has not been proven so far.
However, the concepts of the Mental Salience Framework can be interpreted in terms of Kibble’s Game-theoretic framework; only the reference resolver has no direct parallel in the Mental Salience Framework. Hearer salience is clearly a part of the discourse modeler, though a fully specified discourse model involves additional aspects beyond the modeling of attentional states of the hearer. The strategies enumerated in the planner/content determination module are partly concerned with the assignment of grammatical roles.
Strategy (a) is concerned with perceptual salience only, but is roughly parallel to the word order and grammatical role strategies specified for speaker salience in TOP. Strategies (b) and (c) involve an “alignment of salience rankings” with utterances from the following discourse, and seemingly, this corresponds to the extrapolation of speaker salience from coding decisions in the following discourse according to the Centering reconstructions. Finally, the realizer covers the determination of coding preferences (from the linear combination of salience scores) and their application. Hence, the conceptions of the Mental Salience Framework seem to be closely related to Kibble’s Game-theoretic reconstruction (“elimination”) of Centering, and it might be regarded as a more concrete framework for the formulation of the strategies suggested by Kibble.
6. Summary and outlook
A generalized parameterized framework was sketched, providing an architecture for mechanisms of attention control by the salience-based assignment of coding preferences for referring expressions in discourse. Relying on the previously noticed multi-dimensionality of salience, the distinction of two dimensions of salience was suggested, which is consistent with different terminological traditions relating the notion of salience to accessibility/givenness and importance/newsworthiness, respectively. As an illustration of theoretical adequacy, a minimal instantiation has been proposed that is capable of representing Givón’s topicality approach and two instantiations of Centering. Further, a proof of the adequacy of these reconstructions was sketched. Hence, the Mental Salience Framework provides a proper basis for the comparative evaluation of these and related theories. Beyond this, the numerical character of the parameters allows for the application of learning algorithms, e.g. based upon an interpretation of the architecture as illustrated in Fig. 2 as a neural network. Thus, a supervised learning algorithm can be applied to assign parameter weights according to empirical data.
As a result, an integrated architecture for cognitive-pragmatic aspects of attention control in discourse has been suggested. Due to its appealing simplicity and intuitiveness, its implementation in NLG systems seems likely and is the prospective aim of this research. In this domain, it provides key mechanisms for both optimizing coherence/cohesion of automatically generated texts (by coding preferences due to hearer salience) and the assignment of judgments of emphasis, relevance or importance (speaker salience, if interpreted as relevance, provides an interface to guide the hearer’s attention onto certain aspects or entities according to external parameters).
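The data-driven assignment of parameter weights mentioned above can be illustrated with a deliberately reduced Python sketch. It is the simplest special case, a single logistic unit trained by gradient descent, and not the multi-layer network of the architecture. The observations in `TOY_DATA` are invented for illustration and are not corpus data; the function names, the learning rate and the number of epochs are assumptions as well.

```python
import math

# Invented toy observations, NOT empirical data:
# (hearer salience, speaker salience, 1 if a pronoun was used else 0)
TOY_DATA = [
    (0.9, 0.2, 1), (0.8, 0.6, 1), (0.7, 0.1, 1),
    (0.2, 0.3, 0), (0.1, 0.8, 0), (0.3, 0.5, 0),
]


def predict(weights, bias, hearer, speaker):
    """Probability of a pronoun: logistic function of a linear combination."""
    z = weights[0] * hearer + weights[1] * speaker + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=2000, rate=0.5):
    """Fit the weights of the linear combination by stochastic gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for hearer, speaker, label in data:
            # Gradient of the log loss with respect to the linear term.
            error = predict(weights, bias, hearer, speaker) - label
            weights[0] -= rate * error * hearer
            weights[1] -= rate * error * speaker
            bias -= rate * error
    return weights, bias
```

Calling `train(TOY_DATA)` yields a weight pair and a bias that can be passed to `predict`; with annotated corpus data in place of the toy observations, the learned weights would take the place of hand-set parameters.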
Part II. Beyond entities in discourse
Discourse-structural salience from a cross-linguistic perspective: Coordination and its contribution to discourse (structure)
Wiebke Ramm
1. Introduction
This paper approaches the topic of this volume from a cross-linguistic perspective, with Norwegian, German and English as example languages. Our aim is to contribute to a clarification of discourse-structural concepts like the distinction between subordinating vs. coordinating discourse relations as described in Segmented Discourse Representation Theory (SDRT, Asher and Vieu 2005) or nucleus-satellite vs. multinuclear discourse relations in Rhetorical Structure Theory (RST, Mann and Thompson 1988) and their relation to information-structural (focus vs. background) and syntactic distinctions (coordination vs. subordination) in a cross-linguistic perspective. Taking non-correspondences regarding (clause or verb phrase) coordination in translation as an observational point of departure, we discuss the interpretation of coordinated structures as compared to non-coordinated alternatives (sentence sequences and syntactic subordination) with a view to the relative salience of the conjuncts in discourse.¹ We are concerned with two types of translation discrepancy involving Norwegian and German or English, namely coordinated clauses in the source language (SL) translated as a sequence of sentences in the target language (TL) (Norwegian > German, Section 3.1), and syntactic subordination (adjunction) rendered as (verb phrase or clausal) coordination in the TL (German > Norwegian, Section 3.2, and English > Norwegian/German, Section 3.3). Our data are taken from three different parallel corpora: the Oslo Multilingual Corpus (OMC)² and two smaller corpora of non-fictional texts.
1. To some degree, the questions addressed in this contribution are similar to those addressed by Hinterhölzl and Petrova (this volume): both papers investigate the relation between structural linguistic features – syntactic coordination vs. non-coordinated structures in the present contribution, V1 vs. V2 word order in Hinterhölzl and Petrova’s article – and how they show up in hierarchical discourse structure, as modelled in discourse theories such as SDRT. There is also a certain relationship to the contribution by Krasavina (this volume) in that questions of choice between different linguistic options and their implications for the marking of salience are addressed: Krasavina deals with choices between different types of referential expressions in Russian, whereas the present paper addresses language-specific choices (showing up as translation mismatches) concerning the linking of discourse units.
2. See http://www.hf.uio.no/iln/tjenester/sprak/korpus/flersprakligekorpus/omc/index.html (visited 16 Sep 2010).
It is a well-known fact that coordination, despite its apparent syntactic symmetry (the conjuncts belonging to the same syntactic category), may encode or “explicate” an asymmetrical relation at the semantic-pragmatic level.³ What we want to show is that coordination tends to be exploited somewhat differently in Norwegian than in English or German: Norwegian apparently uses coordination more productively, as a kind of compensation for other grammatical resources (e.g. adjunction or non-coordinated paratactic structures) used in English or German; and Norwegian also seems to be less constrained with respect to what kind of discourse units the coordination marker can link as well as regarding the order of foregrounded and backgrounded information in a coordinated structure. From a theoretical viewpoint our observations raise interesting questions about the correlation between syntactic coordination/subordination and coordinating/subordinating discourse relations (cf. e.g. Asher and Vieu 2005) as well as the status of the latter across languages. Our data suggest either that the use of coordinating/subordinating discourse relations in Norwegian differs from their use in German and English, or that syntactic coordination signalled by a coordination marker (og/und/and) does not necessarily imply a coordinating discourse relation between the conjuncts, contrary to what Asher and Lascarides (2003) and Asher and Vieu (2005), following Txurruka (2000), seem to assume. A further – both theoretically and empirically interesting – implication of our contrastive analyses is that they shed light on the backgrounding role/function of (certain types of) adjuncts.
3. Asymmetry in coordination is taken up in several of the contributions in Fabricius-Hansen and Ramm (eds., 2008); see the editors’ introduction (Fabricius-Hansen and Ramm 2008: 7–11) for an overview.
In Section 2 we give a brief overview of theoretical concepts that bear on our topic. Section 3 presents and discusses our translational data. Our conclusions are summarized in Section 4.
2. General concepts
2.1. Views on (non-)salience
According to a working definition, salience is “the degree of relative prominence of a unit of information, at a specific point in time, compared to the other units of information” (cf. the introduction of this volume, pp. 2ff.). Applied to the translation scenario for written texts on which the present study builds, our concern is the weighting or relative prominence of portions of information in syntactically coordinated constructions vs. syntactically subordinated or juxtaposed equivalents, i.e. the interpretation of these constructions at a specific point in the SL vs. TL text. The scenario touches upon various concepts at the syntactic, semantic and discourse levels which emphasise either the nature of the relation between (adjacent) pieces of discourse – e.g. “coordination vs. subordination” or “symmetry vs. asymmetry” – or the status of some element as being more or less prominent than some other element, including the linguistic means employed to give it this status – e.g. marking some element as “foreground(ed)”, “focus(ed)”, “background(ed)” or “downgraded” in some way.
Section 2.2 discusses different notions of “background”: background as an information-structural notion used to characterise the weighting of linguistic units within a clause and as a discourse-structural notion used to assign (in a theory-dependent way) a particular relation holding between discourse units of (at least) clause size. Section 2.3 takes up syntactic coordination at clause and verb phrase level and how its relation to discourse structure is modelled in different theoretical approaches, in particular with respect to the relative weighting of the conjuncts. Section 2.4 deals with clause linkage, i.e. the options for complex sentence formation, and the dimensions which are affected in the corpus examples to be discussed in Section 3.
2.2. Some relevant information-structural and discourse-structural notions
In the discussion of information structure and discourse relations, “background” is an important but fuzzy term. As part of the so-called focus-background partition (Büring 1997; Rooth 1992), the notion of background concerns information structure at sentence level. It is commonly illustrated by question-answer sequences like (1a–b).⁴
(1) a. When did you arrive?
    b. I arrived yesterday evening.
    c. I arrived yesterday evening with some friends.
4. What follows is a very much simplified description, disregarding additional partitions like topic vs. comment (or theme vs. rheme) and the notorious ambiguity of the term focus itself; see e.g. Vallduví and Engdahl (1996) for a very useful survey.
The part of (1b) that answers the question posed in (1a) – i.e. the adverbial adjunct – expresses focus information; the remaining part is background. Representing one option among a set of alternative answers, focus information is new information whereas the background is given from the context. In the question-answer sequence (1a–c), however, the manner adjunct with some friends – which is post-focal according to Lambrecht (1994) – encodes information that is new, i.e. not part of the background, but does not contribute to answering the relevant question and thus cannot be part of the focus in the strict sense either. The adjunct, in a way, answers a question that has not been asked. We suspect that, typically, this type of information represents background information in the wide discourse-structural sense of that term (see below), as seems to be the case with the adjuncts discussed in Sections 3.2 and 3.3. But to our knowledge, the focus-background partition – and information structure at sentence level in general – has not been thoroughly discussed with respect to sentences enriched by optional adjuncts and occurring in real discourse.⁵ So we shall leave it at the level of suspicion.
5. Asher (1999) discusses some aspects of the relation between sentential focus and discourse focus. The issue of optional adjuncts, however, is not taken up here.
The notion of background in the focus-background partition discussed above is quite different in nature from discourse-structural background. As we see it, “background” or “backgrounding” can be understood in (at least) two ways at discourse level, namely a) as the discourse relation defined in SDRT and RST, and b) as referring to discourse subordination in general, i.e. covering all subordinating discourse relations in the SDRT model, and all nucleus-satellite relations in RST (see below).
The discourse relation Background, as defined in the framework of SDRT,⁶ is taken to hold “whenever one constituent provides information about the surrounding state of affairs in which the eventuality mentioned in the other constituent occurred” (Asher and Lascarides 2003: 460). It is generally exemplified by sentence sequences like (2), where it is the second sentence that describes a state temporally overlapping the event introduced by the first sentence; that is, S₂ conveys background description relative to S₁ – Background(S₁, S₂) (cf. Asher and Lascarides 2003: 166–167, 460–461).
(2) Max opened the door. The room was pitch dark.
6. Since most of this paper was written before July 2007, recent work on the discourse relation Background in SDRT, in particular Asher, Prévot and Vieu (2008), could only partially be taken into account. In the following, we will comment (in notes) where Asher, Prévot and Vieu (2008) deviate from what is said about the SDRT relation Background here.
At one point, in fact, Asher and Lascarides (2003: 207–208) distinguish between two Background relations: Background₁, exemplified by (2), and Background₂, which holds when it is the first segment of a sequence that provides information about the “surrounding state of affairs” relative to the subsequent segment: Background₂(S₂, S₁). However, Asher and Lascarides do not give any examples of Background₂, and in practice, they seem to understand Background as illustrated by (2), i.e. in the narrow sense of Background₁.⁷
7. Background₂, “forward-looking Background”, and its relation to Background₁, “backward-looking Background”, is addressed in more detail in Asher, Prévot and Vieu (2007). They now propose a single Background relation, but differentiate the discourse structures in which the relation appears: whereas in backward-looking Background situations (Background₁) two constituents are directly linked by a Background relation, in forward-looking Background situations an additional constituent representing a so-called “Framing Topic” is constructed in order to account for the characteristic function of setting a (temporal, spatial etc.) frame against which the following discourse unit(s) is/are to be interpreted.
According to Asher and Lascarides (2003), the discourse relation Background is a coordinating discourse relation, but it differs from the prototypical coordinating discourse relation Narration by allowing a subsequent segment S₃ (e.g. He looked cautiously around him.) to attach to S₁ – which is a diagnostic property of subordinating discourse relations (cf. Asher and Vieu 2005). Asher and Lascarides (2003: 166–167) overcome the difficulty by assuming that the text consisting of S₁ and S₂ “has a topic whose content is constructed by repeating (rather than summarizing) the contents” of the two segments. The topic is understood as related to the background segment (i.e. S₂) by a relation called Foreground-Background Pair – which is classified as a subordinating discourse relation (Asher and Lascarides 2003: 462). In the end, then, Asher and Lascarides (2003) have it both ways: S₂ is related to the preceding segment S₁ by a coordinating discourse relation (Background), but related by a subordinating discourse relation (Foreground-Background Pair) to the topic constructed by repeating the contents of the DRSes assigned to S₁ and S₂.
At any rate, Asher and Lascarides (2003) concede that “[i]ntuitively, a discourse structure containing Background(π₁, π₂) where Kπ₁ describes a (foregrounded) event and Kπ₂ describes the (background) state, should encode the fact that Kπ₁ is the ‘main story line’ or the foreground; Kπ₁ is the thing that ‘matters’ in that events from subsequent utterances will be related to it” (Asher and Lascarides 2003: 166).⁸ Thus understood, the foreground-background distinction seems related to the distinction between Hauptstruktur (‘main structure’) and Nebenstruktur (‘side structure’) made by Klein and von Stutterheim (1987) within the so-called quaestio model.
As a cover term for forward-looking and backward-looking Background, i.e. as a discourse relation that can attach in both directions, the SDRT relation Background would also be similar to the discourse relation Background as defined in RST (Mann and Thompson 1988). In RST Background is an (asymmetric) nucleus-satellite relation – roughly corresponding to what is called a subordinating discourse relation in SDRT but not formally defined – where the function of what is presented in the satellite is to increase the reader’s ability to comprehend what is presented in the nucleus (see definitions on the RST webpage).⁹ Although the RST definition is not restricted to a particular order of nucleus and satellite, in typical RST examples of the Background relation the satellite precedes the nucleus (see examples on the RST webpage).¹⁰ This means that, after all, Background as a discourse relation is understood quite differently in SDRT and RST.¹¹ In any case, the SDRT notion of Background is a narrower concept, being defined solely by way of temporal overlap between an event(uality) and a state – which makes it problematic for the analysis of non-narrative texts, i.e. texts that are not primarily structured by temporal relations.¹²
8. The question of whether Background is coordinating or subordinating is also taken up in Asher, Prévot and Vieu (2007). As a reaction to Vieu and Prévot (2004) and Fabricius-Hansen et al. (2005), who (independently) challenged the hypothesis that Background is coordinating, they seem to adopt the view that Background is in fact subordinating, rather than coordinating (Asher, Prévot and Vieu 2008: 9).
9. URL: http://www.sfu.ca/rst/01intro/definitions.html (visited 7 Oct 2009)
10. URL: http://www.sfu.ca/rst/02analyses/index.html (visited 7 Oct 2009)
11. The relation between the discourse relations Background as defined in RST vs. SDRT is also discussed in Asher, Prévot and Vieu (2008: 5–6). Referring to the definitions of RST relations given in Carlson and Marcu (2001), they view the SDRT relation Background as covering two relations in RST, Circumstance (requiring cotemporality of the events described) and Background (also allowing the events to occur at distinctly different times).
12. Still another Foreground(ing)-Background(ing) pair, related to salience and attention, is found in the work of Talmy (e.g. Talmy 2000). To him a concept or a category of concepts like Manner (of motion) is backgrounded, i.e. less salient, if it is expressed as part of – “conflated with” – the main verb root, but foregrounded if it is encoded as an independent constituent; cf. I flew to Hawaii last month vs. I went by plane to Hawaii last month (Talmy 2000 II: 128).
2.3. Syntactic coordination in discourse representation
Having addressed different notions of background at sentence and discourse level, we now turn to the question of how syntactic coordination (clause/verb phrase) fits into this picture. The SDRT model seems to presuppose a strong correlation between syntactic coordination and coordinating discourse relations, see e.g. the (narrative) examples containing and-coordination in Asher and Vieu (2005: 604, 605, 606). In fact, Asher and Lascarides (2003: 170) follow Txurruka (2000) in assuming that “and is a discourse marker for a coordinating relation; it doesn’t correspond to a single rhetorical relation, but rather it signals a number of different possibilities such as Narration or Result”; Asher and Vieu (2005) apparently maintain this assumption, although they do point to data suggesting that the inference from and-coordination to discourse coordination may be defeasible (Asher and Vieu 2005: 598–599). Also in text analyses based on RST – e.g. those published on the RST web site¹³ – coordinated structures (not containing other discourse markers) are typically assigned a multinuclear discourse relation, e.g. Joint or Conjunction,¹⁴ i.e. the two conjuncts are assigned the same discourse salience and they are not hierarchically related by a nucleus-satellite relation.
13. URL: http://www.sfu.ca/rst/02analyses/index.html (visited 7 Oct 2009)
14. Conjunction is not defined as a discourse relation on the “official” RST web site, but is contained in the “extMT – extended Mann/Thompson” set of RST relations in the RST tool developed by O’Donnell (URL: http://www.wagsoft.com/RSTTool/index.html, visited 7 Oct 2009) which is widely used for text analyses across the RST community. Although not precisely defined there either, according to the developer of the tool (answer to a mail request, June 04) this relation is meant to cover constructions with and connectives. See also the related discussion on the definitions of Conjunction and Disjunction on the RST mailing list, September 2006: URL: http://lloyd.emich.edu/cgi-bin/wa?A1=ind0609&L=rstlist (visited 7 Oct 2009).
We would like to question whether this actually always holds cross-linguistically, and whether this appropriately represents actual discourse structure. The examples in Section 3 indicate that coordination can also be used to link elements with different salience in discourse, for example, with respect to the continuation in discourse in the following sentence.
Interesting in this context is research on coordinated vs. non-coordinated sentences within the framework of Relevance Theory (Blakemore 1987, 2002; Blakemore and Carston 2005), pointing out the (possibly) non-symmetric nature of coordination and showing that coordination is possible in certain cases while blocked in others. In particular, they show that using coordination instead of a sequence of non-coordinated (“full stop”) sentences sends two types of signal to the reader, namely, (i) that the two conjuncts should be processed as a unit, both conjuncts functioning together as premises in the derivation of a joint cognitive effect, and (ii) that certain inferences are licensed regarding the semantic-pragmatic relations holding between them, the first conjunct always functioning as a background to the processing of the second. In narrative examples, for instance, a temporal-causal relation (of “consequentiality”, cf. Sandström 1993) is often inferred without any explicit mention of such a relation; and a non-narrative use of coordination can be seen in argumentative examples, where the conjuncts make a joint contribution as steps in an argumentation (Blakemore and Carston 2005). Relevance Theory does not distinguish between coordinating and subordinating discourse relations – and prefers to avoid the notion of discourse relations altogether (Blakemore 2002: Sect. 5.3) – but most of the narrative as well as the argumentative examples given in Blakemore and Carston (2005) would probably be classified as coordinating discourse relations in the SDRT framework.
Further relevant research on the discourse properties of sentential and-coordinations vs. asyndetic sentence connections has recently been done by Jasinskaja (2009), who combines ideas from SDRT and Relevance Theory in order to account for the inference of implicit discourse relations, i.e. the inference of discourse relations which may be communicated without explicit signalling by a connective or other linguistic means.
Whereas theories such as SDRT focus on the restrictions that and-coordinations impose on the semantic relations that may hold between the conjuncts, Jasinskaja (2009: 68) argues that it is rather asyndetic connection which is associated with some non-trivial constraints on the inference of implicit discourse relations, namely towards interpreting the second sentence as an Elaboration or Explanation of the first. Central chapters of her thesis concentrate on discourse relations in oral communication, where prosody makes an important contribution to the signalling of discourse relations. But the two principles Jasinskaja argues to be guiding discourse interpretation (of utterances not containing additional connectives) might also be interesting for the interpretation of written texts: The Principle of Exhaustive Interpretation (Jasinskaja 2009: 289) says that, by default, an utterance is interpreted exhaustively, i.e. as if it were a complete answer to the “question under discussion” (QUD), which is defined similarly to the notion of “quaestio” in Klein and von Stutterheim’s (1987) quaestio model mentioned above. The Principle of Topic Continuity (Jasinskaja 2009: 291) says that, by default, discourse topics do not change. These default principles can only be overridden by special linguistic mechanisms (Jasinskaja 2009: 300). In combination, the two principles imply that a new utterance should be interpreted as addressing the original
Discourse-structural salience from a cross-linguistic perspective
151
“unsettled” question unless there is a discourse marker that indicates something else (Jasinskaja 2006: 291). In this view, Restatement would emerge as a default discourse relation, i.e. a relation that can be inferred without explicit signalling and that fulfils both the Exhaustivity and the Topic Continuity principle. The coordination marker and, however, functions as a (weak) linguistic marker that a non-default discourse relation (e.g. Narration) holds. Although basically developed with reference to English, Jasinskaja’s ideas can be interesting for one of the translation scenarios to be discussed in the following section, namely the translation of sentence coordination by juxtaposed (asyndetically and syndetically connected) sentences (Section 3.1).

2.4. Coordination, subordination, and clause linkage

In his typology of clause linkage Lehmann (1988) describes the options for complex sentence formation cross-linguistically along six syntactic-semantic parameters. Some of these parameters seem to be useful to characterize the structural changes found in the translation examples we are going to discuss in the following section and may help to relate the syntactic concepts of subordination/coordination15 and hypotaxis/parataxis16 to their discourse-structural counterparts: “Hierarchical downgrading” describes the degree to which a hierarchical relation between the linked segments holds (“parataxis” and “embedding” forming the two poles of this continuum), “desententialisation” refers to the degree to which the subordinate clause is expanded or reduced (with “sententiality” and “nominality” as its extremes), and “explicitness of linking” refers to the presence/absence and type of a connective device between two clauses/segments (with “syndesis” and “asyndesis” at the two ends of the continuum).17 The examples presented in 3.1, sentential coordination translated as a sentence sequence, show changes along the syndesis-asyndesis continuum: In those examples where the discourse relation holding between the sentences in the translation is not explicitly signalled, e.g. by a discourse connective, the translation is more asyndetic than the original. In the cases where the discourse relation is explicitly signalled by a discourse connective other than the coordinator og, however, the translation is more explicit/syndetic than the original, where the relation holding between the conjuncts (typically a narrative/temporal-causal one) has to be inferred from the propositional content of the conjuncts (see Blakemore and Carston 2005, and Section 2.3 on which relations might be licensed in a sentential coordination without an explicit mention of it). In the examples discussed in 3.2 and 3.3 the structural changes are more visible: In both cases one of the linked elements is both hierarchically upgraded (i.e. less dependent on the other) and more sentential in the translation.

15. Lehmann (1988: 182) conceives subordination as a form of clause linkage, while coordination is seen as a “relation of sociation combining two syntagms of the same type and forming a syntagm which is again of the same type” and is thus not restricted to hold on clause-level only.
16. Hypotaxis is understood by Lehmann (1988: 182) “as the subordination of a clause in the narrow sense (which probably includes its finiteness)”, while parataxis refers to the coordination of clauses, with no further restrictions “on the kind or structural means of coordination. In particular, parataxis may be syndetic or asyndetic”.
17. Lehmann (1988: 210) points out that explicitness of linking has nothing to do with parataxis vs. hypotaxis. As examples for linking devices with decreasing explicitness of linking he mentions the following: anaphoric subordinate clause referring back to the preceding discourse (maximal syndesis), gerundial verb, prepositional phrase, connective adverb, specific conjunction, universal subordinator, and nonfinite verb form (asyndesis) (Lehmann 1988: 211).

3. Syntactic coordination and discourse subordination: Three contrastive perspectives

What happens at the level of discourse structure when syntactically coordinated structures are translated as non-coordinated sequences of sentences, or subordinated structures are translated as coordinate, and why do translators choose these options in certain cases? In this section we present and discuss certain types of translation mismatch that might challenge the discourse representation approaches presented in Section 2.

3.1. From coordinated clauses to sentence sequences (Norwegian > German)

One case in point is sentential coordination in Norwegian translated as a non-coordinated sequence of sentences in German. The corpus contains several examples of clause coordination in the Norwegian original such as (3) and (4) below, where coordination would sound odd in German, obviously due to language-specific differences regarding the use of sentential coordination in the two languages, as we will show.18

(3)
a. Legene hadde sitt eget reisemønster, som er analysert[i]. Studiereiser til utlandet var viktige for profesjonell anseelse og autoritet[ii], og totalbildet av reisemønsteret er entydig[iii]: Tyskspråklige universiteter var de viktigste reisemål for norske leger som ønsket videreutdannelse eller spesialisering[iv].
‘The doctors had their own travel pattern, which is analysed[i]. Educational trips abroad were important for professional reputation and authority[ii], and the overall picture of the travel pattern is clear[iii]: German-speaking universities were the most important destinations for Norwegian doctors who wanted further education or specialisation[iv].’19
b. Die Ärzte hatten ihr eigenes, heute analysiertes, Reisemuster[i]. Studienreisen ins Ausland wurden als wichtig für berufliches Ansehen und Autorität angesehen[ii]. Das Gesamtbild der Reisen ist eindeutig[iii]: Deutschsprachige Universitäten waren die wichtigsten Reiseziele norwegischer Ärzte, die eine Weiterbildung oder Spezialisierung wünschten[iv].
‘The doctors had their own, today analysed, travel pattern[i]. Educational trips abroad were viewed as being important for professional reputation and authority[ii]. The overall picture of the travels is clear[iii]: German-speaking universities were the most important destinations for Norwegian doctors who wanted further education or specialisation[iv].’

18. In a corpus study investigating sentence boundary adjustments in translations of popular science texts between Norwegian and German, sentential coordinations turned out to be among the most frequent causes of sentence splitting (i.e. the translation of one SL clause or clause complex as two or more independent sentences in the TL) for the translation direction Norwegian-German (Ramm 2010). Moreover, the analysis has shown that 48.7% of all sentential coordinations in the Norwegian original texts are not translated by a corresponding coordination in German. Two general translation strategies can be identified for these examples of translation mismatch: In 22.4% of the Norwegian coordination examples the coordination marker is dropped, and the discourse relation holding between the clauses is left implicit in the German version (as in ex. (3), (4) and (8) in this paper); in the remaining 26.3% some alternative element (e.g., a pronominal adverb or a demonstrative nominal phrase) is used to signal the discourse relation holding between the clauses; this strategy is illustrated by ex. (5), (6), (7) and (9) in this paper. For some of the non-correspondence examples in the corpus, translation by coordination in German certainly could have been an option. The frequency of this type of translation mismatch in the corpus, however, should be seen as a strong indication of some language-specific difference regarding the use of coordination.
19. In this and the following examples, English glosses of the Norwegian and German text examples that keep the word order particularities of the original languages are given in single quotes. Due to the length of the examples, we refrained from presenting interlinear glosses and idiomatic translations.
In (3a) the lack of a common topic between the two conjuncts seems to block the use of coordination in the German translation (3b). A further problem is the fact that the second conjunct alone is elaborated by the sentence following the colon. In the translation the coordinated clauses are split into two separate sentences, which leads to a change of the discourse structure assigned to the text: In the RST model, the German translation can be analysed as a Background or Circumstance relation – with (3b[ii]) as satellite, its nucleus covering (3b[iii]) and (3b[iv]) –, and the span (3b[ii])–(3b[iv]) functioning as Elaboration to (3b[i]). The analysis of the Norwegian original, however, would possibly have to assign a (multinuclear) Conjunction (or Joint) relation to (3a[ii]) and (3a[iii]). But where does this span attach to its discourse context? To the left (as Elaboration or Background of (3a[i]) – which does not fit very well), or to the right (as Background)? But then – at least as a non-native speaker of Norwegian – one runs into problems with how to coherently interpret the sentence following the colon, since (3a[iv]) clearly elaborates the second conjunct (3a[iii]), but not the first (3a[ii]). Thus, the grouping of (3a[ii]) and (3a[iii]) as a joint, non-hierarchical span leads to attachment problems with the following discourse segment. Using the SDRT approach one runs into similar problems: In the Norwegian version the reader probably first tries to interpret (3a[ii]) as providing background information (Background1 in the sense of Asher and Lascarides 2003, backward-looking Background in the sense of Asher, Prévot and Vieu 2008)20 or as adding some kind of explanation (Explanation being one of the subordinating discourse relations in SDRT) to the preceding sentence (3a[i]). But which relation holds between (3a[ii]) and (3a[iii])? In English or German the use of the coordination marker would presuppose the existence of some kind of common topic between the linked elements, but obviously Norwegian is not that strict in this respect. For the German version, an SDRT-style analysis is less problematic: a relation of Background1/backward-looking Background or Explanation may be assigned between the independent sentence corresponding to the first conjunct (3b[ii]) and the sentence preceding it (3b[i]), whereas the counterpart of the second conjunct (3b[iii]) can be interpreted as elaborating sentence (3b[i]). Jasinskaja’s (2009) suggestion to treat the coordination marker as a signal that the current utterance (sentence) is not yet completed (i.e. does not conform to the exhaustiveness condition) would not work properly for the interpretation of the Norwegian og-coordination either. For the German version, however, it does work: the independent sentence corresponding to the first SL conjunct (3b[ii]) can be interpreted as a complete utterance, connecting it to the preceding context (3b[i]). And also the counterpart of the second conjunct (3b[iii]) can be interpreted as an utterance with independent discourse contribution, namely as an elaboration of the preceding sentence (3b[i]), an interpretation which is not compatible with a coordinated structure.

20. A problem with the relation Background in this example could be that, according to the definition in SDRT, it is restricted to holding between an event and a state.

(4)
a. Andre problemer var ikke mindre alvorlige[i]. Malmforekomstene holdt ikke hva de lovet[ii], og tapte raskt sin edelhet nedover i fjellet[iii]. Driften gikk med underskudd, og innskyterne trakk seg etter hvert ut[iv].
‘Other problems were not less serious[i]. The ore deposits were not what they promised[ii], and lost quickly their preciousness downwards in the mountain[iii]. The operation ran with deficit, and the financial supporters gradually backed down[iv].’
b. Andere Probleme waren nicht weniger gravierend[i]. Die Vorkommen hielten nicht, was sie versprachen[ii]; der Metallgehalt nahm mit zunehmender Tiefe rasch ab[iii]. Die Erzgewinnung war ein Zuschussgeschäft und die Geldgeber machten nach und nach einen Rückzieher[iv].
‘Other problems were not less serious[i]. The deposits did not hold what they promised[ii]; the metal content decreased quickly with increasing depth[iii]. The ore winning was a lossmaking business and the financial supporters gradually backed down[iv].’
Similar problems occur in (4), where the second conjunct (4a[iii]) should be subordinated (as an Explanation in SDRT, and as an Evidence satellite in RST) in relation to the first conjunct (4a[ii]), since the following sentence (4a[iv]) obviously is related only to (4a[ii]) and not to (4a[iii]). This discourse representation is precisely what we get in the German translation (4b), where the coordination marker og (and) is replaced by a semicolon. But which discourse structure should be assigned to the Norwegian version, where both SDRT and RST would be forced to assign a coordinating/multinuclear discourse relation to the sentential coordination, blocking the right frontier (in the SDRT framework) or not providing an appropriate nucleus (in the RST framework) to attach (4a[iv])? The two examples above illustrate that Norwegian seems to be less restricted as to the types of elements that can be coordinated. They call into question the universality of the definitions of discourse relations in theories like SDRT or RST (cf. 3.4). Our examples show that at least the function of the coordination marker (og/und/and) is not precisely the same cross-linguistically: syntactic coordination seems to be compatible with discourse relations like Background or Explanation in Norwegian, while this is blocked in German.21

(5)
a. Reformasjonen bragte etterhvert denne direkte norsk-tyske forbindelse til opphør. Den dansk-norske konge ønsket å sentralisere presteutdannelsen til universitetet i København (grunnlagt 1479), og det tok slutt med at norske studenter dro til tyske universiteter for å få sin utdannelse. 1500- og 1600-tallets universitet ble et instrument for å befeste den sentraliserte fyrstestat, ved å gi utdannelse for statens embedsmenn.
‘The reformation brought eventually this direct Norwegian-German connection to a stop. The Danish-Norwegian king wished to centralise the priest education to the university in Copenhagen (founded in 1479), and it took an end with that Norwegian students went to German universities to get their education. The 16th and 17th century’s university became an instrument to stabilise the centralised princely state, by giving education to the officials of the state.’
b. Diese direkte norwegisch-deutsche Verbindung wurde von der Reformation nach und nach zum Erliegen gebracht. Der dänisch-norwegische König wünschte eine Konzentration der Pfarrerausbildung an der 1479 gegründeten Universität Kopenhagen. Damit reisten keine norwegischen Studenten mehr zur Ausbildung an deutsche Universitäten. Die Universitäten des 16. und 17. Jahrhunderts wurden durch die Ausbildung höherer Staatsbeamter zu Instrumentarien der Stärkung des zentralisierten Fürstenstaates.
‘This direct Norwegian-German connection was by the reformation eventually brought to a stop. The Danish-Norwegian king wished a concentration of the priest education to the in 1479 founded university in Copenhagen. Damit (‘with/by this’) travelled no longer Norwegian students for education to German universities. The universities of the 16th and 17th century became by the education of higher state officials instruments for the strengthening of the centralised princely state.’

21. We are aware of the fact that the data presented in this paper are based on parallel corpora only, i.e. we are comparing linguistic features of original texts with features of their respective translations only, and that the properties of translations might deviate from the properties of original texts of a language – e.g., in translations the original language may “shine through” (cf. Teich 2003: 61) in some way. We have not analysed the use of sentential coordination in comparable corpora of Norwegian and German, but if we assume that there is at least some “shining through” from the Norwegian SL texts regarding the use of coordination, it can be expected that the differences of use are even clearer in comparable corpora (see again note 18 on the frequency of non-correspondence regarding coordination in Norwegian and German).

(6)
a. Den andre kraftlinjen, like fra middelalderen av, har gått til England og den angel-saksiske verden. Bruddet etter 1945 førte til at denne forbindelsen ble dominerende, både politisk og kulturelt, og Norge fremstår i dag som trolig et av de mest amerikaniserte samfunn i Europa. Men Tyskland får igjen økende betydning, som Norges største handelspartner og den viktigste støttespiller innen EU-systemet.
‘The second line of power, directly from the Middle Ages on, has gone to England and the Anglo-Saxon world. The breaking-off after 1945 led to that this connection became dominating politically as well as culturally, and Norway appears today as probably one of the most americanised societies in Europe. But Germany gets again increasing importance, as Norway’s largest trade partner and the most important supporter within the EU system.’
b. Die zweite Kraftlinie, ebenfalls seit dem Mittelalter, nimmt ihren Ursprung in England und der angelsächsischen Welt. Der Bruch nach 1945 führte dazu, dass diese zweite Verbindung zur wichtigsten wurde, sowohl politisch als auch kulturell. Deswegen ist Norwegen heute wahrscheinlich eines der am stärksten amerikanisierten Länder Europas. Doch Deutschland gewinnt wieder als der größte Handelspartner Norwegens und als die wichtigste Stütze im EU-System an Bedeutung.
‘The second line of power, also from the Middle Ages on, has its origin in England and the Anglo-Saxon world. The breaking-off after 1945 led dazu (pron.adv., lit. ‘there-to’), that this second connection became the most important one, both politically as well as culturally. Therefore is Norway today probably one of the most americanised countries in Europe. But Germany gains again as Norway’s largest trade partner and the most important support within the EU system.’
Also in examples (5) and (6) a sentential coordination in the Norwegian original text is translated by a sequence of independent sentences, but in these examples a connective (pronominal adverb) explicitly signalling the temporal-causal relation between the sentences corresponding to the two conjuncts in the Norwegian version is added. The same relations (of “consequentiality”, cf. Section 2.3) hold between the conjuncts in the Norwegian text, but here they have to be contextually inferred, i.e. they are less explicitly marked. A structurally equivalent translation by a coordination would have been possible for both examples, i.e. would not have been in contradiction with the discourse relations licensed by und-coordination in German. However, the option chosen in the actual translation seems to be more natural with respect to the standard patterns of text organisation in German. (7)
a. I Bergen hadde det mektige tyske Kontoret hindret tyskere i å ta norsk borgerskap av frykt for at de skulle bli konkurrenter[i]. I 1560 måtte Kontoret oppgi denne politikken[ii], og en stadig strøm av tidligere hanseater tok i den følgende tida frivillig norsk borgerskap[iii]. I 1766 ble den siste vintersitteren borger i Bergen, det var Jochen Krämer fra Bremen[iv].
‘In Bergen had the powerful Comptoir hindered the Germans to acquire Norwegian citizenship because of fear that they might become competitors[i]. In 1560 the Comptoir had to give up this politics[ii], and a continuous stream of previous Hanseats acquired in the following time deliberately Norwegian citizenship[iii]. In 1766 became the last winter-sitter a citizen of Bergen, it was Jochen Krämer from Bremen[iv].’
b. In Bergen hatte das mächtige Comptoir aus Angst vor deren Konkurrenz Deutsche daran gehindert, norwegische Bürger zu werden[i]. 1560 musste das Comptoir diese Politik aufgeben[ii]. Während der darauf folgenden Zeit ließ ein ständiger Strom ehemaliger Hanseaten sich freiwillig einbürgern[iii]. 1766 wurde der letzte Wintersitzer, Jochen Krämer aus Bremen, Bürger von Bergen[iv].
‘In Bergen had the powerful Comptoir because of fear of competition hindered Germans to acquire Norwegian citizenship[i]. In 1560 the Comptoir had to give up this politics[ii]. In the following time acquired a continuous stream of previous Hanseats deliberately Norwegian citizenship[iii]. In 1766 became the last winter-sitter, Jochen Krämer from Bremen, a citizen of Bergen[iv].’
A similar type of sentence splitting of a Norwegian sentential coordination, leading to a more explicit marking of the discourse relations and the progression of the text in the German version, is illustrated by (7). As in (5) and (6), the discourse relation between the Norwegian conjuncts (here a purely temporal one) is compatible with the meaning of og/and/und, so sentential coordination would in principle have been an option for the German translation (in contrast to (3) and (4)). Nevertheless the sentential coordination is split up and the constituent order in the sentence corresponding to the second conjunct is changed in the translation, placing the temporal adverbial während der darauf folgenden Zeit (‘in the following time’) in sentence-initial position. This leads to a more explicit signalling of the thematic progression in this text fragment (temporal sequence), indicated by the parallelism of three temporal adverbials in sentence-initial position in (7b[ii]–[iv]): 1560 – während der darauf folgenden Zeit (‘in the following time’) – 1766. The temporal progression is not as explicitly marked in the Norwegian version, since the choice of a coordinated construction only leaves a thematically less prominent position for the adverbial i den følgende tida (‘in the following time’). So this example seems to give further evidence for the preference of German texts (or at least of translations from Norwegian) to signal the type of text progression and the relations holding between discourse units more explicitly than the corresponding Norwegian original texts.

(8)
a. Riktignok prøvde noen av bergmennene å drive videre i 1550-årene[i]. Men det er tvilsomt om de lyktes[ii], og driften var nok i alle tilfelle svært beskjeden[iii]. Noen av dem slo seg ned i Skien[iv].
‘Indeed tried some of the mineworkers to continue to run (the mine) in the 1550s[i]. But it is doubtful whether they succeeded[ii], and the running was in any case very limited[iii]. Some of them settled in Skien[iv].’
b. Zwar versuchten einige der Bergleute, den Betrieb nach 1550 noch fortzusetzen[i]. Doch es ist nicht sicher, ob sie Erfolg hatten[ii]. In jedem Fall war die Ausbeute recht bescheiden[iii]. Einige der Deutschen ließen sich in Skien nieder[iv].
‘Indeed tried some of the mineworkers to continue running (the mine) after 1550[i]. But it is not sure whether they succeeded[ii]. In any case was the profit very limited[iii]. Some of the Germans settled in Skien[iv].’

(9)
a. Jeg vil ikke påta meg å besvare spørsmålet[i]. Mange har vært opptatt av det unike ved den tyske universitetsmodell[ii], og det foreligger en stor litteratur som det vil føre for langt å gjøre rede for her[iii]. Men det kan være interessant å peke på noen forhold som kan ha betydning[iv].
‘I will not try to answer the question[i]. Many have been engaged to stress the uniqueness of the German university model[ii], and there exists big literature which it would take too long to discuss here[iii]. But it might be interesting to point to some circumstances that might have importance[iv].’
b. Ich werde hier nicht versuchen, diese Frage zu beantworten[i]. Viele Forscher haben sich damit beschäftigt, worin das Einmalige des deutschen Universitätsmodells bestand[ii]. Zu diesem Thema liegt eine umfangreiche Literatur vor, deren eingehendere Erörterung hier zu weit führen würde[iii]. Doch dürfte es von Interesse sein, auf einige Verhältnisse, die für die Beantwortung der Frage von Bedeutung sein könnten, etwas genauer einzugehen[iv].
‘I will not try to answer the question[i]. Many researchers have been engaged to define what characterised the uniqueness of the German university model[ii]. On this topic exists a vast amount of literature the discussion of which would take too long here[iii]. But it might be of interest, for some circumstances which are important for answering the question, to go into more detail[iv].’
Sentence splitting in the translations of (8) and (9) above seems to be motivated by different preferences regarding whether the conjoined clauses are expected to contribute to the incrementally constructed discourse representation as one joint unit or whether it is also possible that only one of the conjuncts constitutes a discourse relation with the preceding or following discourse units. In (8a) the first conjunct (8a[ii]) is in a (concessive) discourse relation to the previous sentence (8a[i]), indicated by the sequence of the connectives riktignok (‘although, indeed’) and men (‘but’), but it is not clear whether this concessive relation also holds for the second conjunct (8a[iii]). This indeterminacy does not exist in the German version, where only the sentence corresponding to the first conjunct (8b[ii]) is in a concessive relation to the previous sentence (8b[i]) (zwar ‘although, indeed’ – doch ‘but, however’). The sentence corresponding to the second conjunct (8b[iii]) rather implies an Evidence (RST) or Explanation (SDRT) relation to the sentence corresponding to the first conjunct, signalled by es ist nicht sicher (‘it is not sure’) and the (topicalised) adverbial in jedem Fall (‘in any case’). Thus, again, the discourse relations holding between some of the discourse units are more clearly inferable in the German version of the text. In (9) the coordination in the Norwegian version leads to some indeterminacy with respect to the interpretation (attachment in discourse structure) of the
following sentence: The following sentence (9a[iv]), starting with men (‘but’), is in a contrastive relation to the non-restrictive relative clause som det vil føre for langt å gjøre rede for her (‘which it would take too long to discuss here’), which is a part of the second conjunct (9a[iii]), but this is somewhat blurred in the Norwegian version due to the existence of the coordination, which might imply some joint contribution of the two conjuncts to the further development of the discourse. By dropping the coordination in the German translation, the sentence corresponding to the second conjunct (9b[iii]) is interpreted as an elaboration of the sentence corresponding to the first conjunct (9b[ii]) – further emphasised by adding zu diesem Thema (‘on this topic’) and placing it in sentence-initial position. The inference of the contrastive connection between the relative clause deren eingehendere Erörterung hier zu weit führen würde (‘the discussion of which would take too long here’) and the following sentence (9b[iv]) is not disturbed by the attempt to assign some joint relevance to the conjuncts of a sentential coordination. So, as in (8), sentence splitting in the German translation guarantees a clearer correlation between sentence boundaries and the attachment of discourse units to the incrementally growing discourse representation.

These examples of Norwegian sentential coordination translated by non-coordinated sequences of sentences in German indicate that sentential coordination serves somewhat different functions in discourse in the two languages:

1. The discourse relations compatible with (licensed by) coordination seem to be more constrained in German than in Norwegian, as illustrated by (3) and (4). In German the use of the coordination marker und appears to be restricted to the types of relations (e.g. additive and temporal-causal) also compatible with and in English (cf. Blakemore and Carston 2005, see Section 2.3), whereas in Norwegian these constraints obviously are not taken that seriously.
2. In cases where sentential coordination would be compatible with a discourse relation, this is often not the preferred realisation in German. Rather, an option is chosen which more explicitly signals the discourse relation holding between adjacent discourse units, e.g. by using a connective as in (5) and (6).
3. German also seems to take und as a means to signal that two conjuncts should be processed as a joint unit and jointly contribute to discourse structure more seriously than Norwegian does, as illustrated by (8) and (9).

Paratactic clause linkage with og seems to function as a kind of default sequentialisation strategy in Norwegian, which is applied without imposing too much meaning on the coordination marker. In this way the use of og in written Norwegian texts seems to be similar to the functions and (and its equivalents in other languages) can take in oral narratives, i.e. signalling that the story
goes on without being too specific about the relation holding between the discourse units, or explicitly marking the transitions between discourse units (cf. e.g. Schiffrin 1986 on the functions of and in (English) conversations). This observation would also fit into the picture of written Norwegian as still being under the pressure of the norms holding for oral language (see e.g. Torp and Vikør 2000: Chapt. 14; Solfjeld 2000: 46–48). In German written genres, however, sentential coordination and sentence boundaries in general appear to be taken more seriously as signals indicating the structuring of the discourse into hierarchically and non-hierarchically related units.

3.2. From verb phrase/nominal phrase adjunction to coordination (German > Norwegian)

In this and the following section examples are discussed where coordinated structures are found as translations of syntactically subordinated structures. (10) and (11) below are typical examples of what Fabricius-Hansen (1999) has termed backward information extraction, which occurs quite frequently in translations from German into Norwegian (Solfjeld 2004): Syntactically downgraded information encoded in an adjunct at verb phrase level in the source sentence is rendered in a conjunct to the left of the conjunct corresponding most closely to the main predicate of the source sentence, the latter having neutral focus. (The source-text adjunct and its target-text counterpart are given in bold face.)

(10)
a. Für die Trennung des Kindes von der Mutter wurden medizinische und pädagogische Begründungen angeführt und anhand […] beglaubigt. Eine perfekte medizinisch-technische Versorgung bekam die größte Bedeutung. Im Interesse der Infektionsverhütung […] wurde die Sterilität groß geschrieben.
‘For the separation of the child from its mother were medical and pedagogical reasons given and by means of […] supported. A perfect medical-technical care got vital importance. In the interest of infection prevention […] was sterility emphasized.’
b. Det ble anført medisinske og pedagogiske grunner til at mor og barn skulle skilles ad, og dette ble forklart ved […]. En perfekt medisinsk-teknisk omsorg ble av største betydning. Infeksjoner skulle unngås […], og steriliteten ble skjøvet i forgrunnen.
‘It were given medical and pedagogical reasons for that mother and child should be separated, and this was explained by […]. A perfect medical-technical care became of vital importance. Infections were to be avoided […], and sterility was moved into the foreground.’

(11)
a. Als es feststand, daß die Alliierten nicht hier, sondern an der Kanalküste landen würden, disponierte man um und schickte alle Boote dorthin. Der Gegner, uns überhörend, faßte seine Beobachtungen präzise zusammen.
‘When it was clear that the Allies not here, but on the Channel coast would land, we reorganized and sent all the boats there. The opponent, us bugging, summarised his observations precisely.’
b. Da det nå ble klart at de allierte ikke ville lande her, men i Normandie, ble vi omdirigert dit. Motstanderne våre avlyttet våre radiomeldinger og samlet omhyggelig sammen opplysninger.
‘As it now got clear that the Allies would not land here, but in Normandy, we were redirected to-there. Our opponents bugged our radio messages and gathered information carefully.’
In both examples a structurally equivalent Norwegian translation would not have been possible or would at least have been stylistically marked. Although both languages are V2, Norwegian is less open to placing informationally “heavy” constituents in Vorfeld position than German is, making it difficult to render the prepositional phrase in (10a), which furthermore is based on nominalisations (Interesse ‘interest’, Infektionsverhütung ‘infection prevention’), by a corresponding prepositional phrase in Norwegian. Neither is it possible to translate the present participle construction in (11a), uns überhörend (‘us bugging’), by a corresponding participle construction in Norwegian. Choosing a coordinated structure in (10) and (11) in the Norwegian translation can be seen as a strategy which tries to compensate for the lack of equivalent structural options or preferences in Norwegian. The Norwegian versions are more sentential than the original texts and exploit the inference mechanisms triggered by the coordinative structure (cf. 2.3, Blakemore 2002) in order to arrive at an interpretation similar to that of the German version. The syntactically downgraded function of the German adjunct is “simulated” by the first conjunct, which gets the discourse function of “leading up to” the second, i.e. entering into a consequentiality relation with the second conjunct. In this way coordination works as a backgrounding device, establishing the second conjunct as part of the “main story” – equivalent to the source text. The frequent use of coordination also illustrates the tendency of Norwegian to organize discourse paratactically where German tends to use hypotactic/hierarchical structures (Fabricius-Hansen 1996).
3.3. From ing-adjuncts to coordination (English > German/Norwegian)

Free ing-adjuncts are adjuncts of some sort but more “sentential” and less integrated (see 2.4, Lehmann 1988) than the German adjectival/adverbial adjuncts translated as a sentential coordination in (10) and (11) above. Quite often such adjunct constructions are rendered as verb phrase coordination in German and Norwegian (cf. Behrens 1998 for English/Norwegian). This is the case in (12), for instance, where the ing-adjunct, representing backgrounded information, precedes its matrix clause and is rendered as first conjunct in both target texts.

(12) a. Then, using a flat pack of slim steel files from his top pocket he started to work on the softer metal of the skeleton.
     b. Dann holte er einen Satz dünner Stahlfeilen aus der Brusttasche und bearbeitete damit den Weichmetallteil des Dietrichs.
        ‘Then took he a set of thin steel files from his top pocket and worked with it (lit. ‘there-with’) the soft metal part of the skeleton key.’
     c. Så tok han en flat pakke tynne stålfiler opp av brystlommen og ga seg til å arbeide på det bløtere metallet i nøkkelen.
        ‘Then took he a flat pack of thin steel files up from his top pocket and started to work with the softer metal in the key.’
However, also when postponed to their matrix clause, ing-adjuncts are often subordinated from a discourse structural point of view, describing e.g. an “accompanying circumstance”22 to the matrix clause eventuality as in (13a) and (14a). In such cases, German translations by coordination may preserve the order of the two segments but explicitly mark the relation of temporal overlap between them by adding the connective dabei (lit. ‘there-by’, ‘at the same time / on the same occasion’) in the second conjunct, as in (13b), thus blocking a (con)sequential interpretation which might otherwise be preferred. But the order of presentation may also be switched so that the first conjunct in the translation corresponds to the postponed ing-adjunct in the original, as in (14b) and (15b). The Norwegian translations in (14c) and (15c), on the other hand, use coordination without changing the order of the verb phrases corresponding to the matrix clause and the ing-adjunct of the source text – and without overtly marking the temporal relation between the eventualities described in the two conjuncts. It may be objected that the translations are ambiguous and/or not particularly good. But nevertheless these examples seem to give further evidence for the hypothesis that coordination functions somewhat differently in Norwegian than in German and English. The dispensability of a marker of the temporal overlap in (13c) indicates that Norwegian may be less biased to interpreting clause/verb phrase coordination as a temporal sequence (in narration) than German is. And (14c) and (15c) show that Norwegian possibly is also more open to placing background(ed) information in the second conjunct, the position where focused/foregrounded information is strongly preferred in German.

(13) a. He smiled slyly, nodding.
     b. Er lächelte verstohlen und nickte dabei.
        ‘He smiled furtively and nodded thereby.’
     c. Han smilte litt lurt og nikket.
        ‘He smiled somewhat slyly and nodded.’

(14) a. Tony went home, taking his tool box with him.
     b. Tony griff nach seinem Werkzeugkasten und ging nach Hause.
        ‘Tony reached for his tool box and went home.’
     c. Tony gikk hjem og tok med seg verktøykassen sin.
        ‘Tony went home and took his tool box with him.’

(15) a. Things suddenly got very tense in the bar and Dad drank heavily, sweating.
     b. Auf einmal wurde die Atmosphäre in der Bar äußerst angespannt, und Papa schwitzte und trank immer mehr.
        ‘Suddenly the atmosphere got very tense in the bar, and Dad sweated and drank more and more.’
     c. Stemningen i baren ble plutselig meget spent, og pappa drakk tett og svettet.
        ‘The atmosphere in the bar got suddenly very tense, and Dad drank heavily and sweated.’

22. The relation “accompanying circumstance” is discussed in more detail in Behrens and Fabricius-Hansen (2010).
3.4. Discussion: coordination and the marking of (non-)salience in discourse

The examples discussed in the previous sections illustrate that clause and verb phrase coordination can serve various discourse functions in Norwegian (as source and target language), and some of them seem to be different from those in German or English. This allows for some cross-linguistic reflections on how coordination relates to the marking of (non-)salience in discourse. In Section 3.1 we related “salience” to the use of coordination with respect to the assignment of a coordinating/multinuclear vs. subordinating/nucleus-satellite discourse relation between two clauses. The examples presented in this section indicate that sentential coordination can be used in Norwegian in cases where only a subordinating/nucleus-satellite discourse relation would be possible in German. This observation may lead to two conclusions: a) that the coordination marker og does not always function as a marker of equal salience of two clauses in Norwegian (where “equal salience” is understood as implying a coordinating/multinuclear discourse relation), or b) that the differentiation between the two types of discourse relations simply is not as strict (is not taken as seriously) in Norwegian as it is in German. The translation changes made in the examples in 3.1 illustrate a further aspect of discourse structur(ing) that might be interesting in the context of a discussion of the notion of salience. It seems that sentence boundaries marked by a full stop are taken more seriously as a discourse segmentation signal in German than in Norwegian, i.e. as a delimited step in the incremental construction of the discourse representation, or – viewed from the opposite direction – that not using such a segmentation signal, as in the case of sentential coordination, is also taken more seriously as a signal that the incremental construction of a representation of the respective discourse segment is not yet finished. If this observation is correct, this would also put Jasinskaja’s (2009) claim that the most interesting constraints on discourse relations (i.e. which relations are inferred by default) are imposed by the full stop, not by the coordination marker, into a new perspective: maybe Norwegian behaves a bit differently from other languages in this respect as well (cf. Fabricius-Hansen 1999: 212, for a similar view)?
In any case, the examples in 3.1 support the view that discourse-structural salience and the necessity to mark the salience status of a discourse unit (such as a clause) might be a relative or language-dependent concept: whereas some languages (such as German) operate with clear structuring signals (segmentation, hierarchical and non-hierarchical organisation of discourse structure) to indicate how pieces of discourse should be put together, others (such as Norwegian) organise text by relying less on sentence boundaries and explicit structuring signals such as the coordination marker or discourse connectives. The examples in Section 3.2 and 3.3 illustrate the choice of coordination as a translation strategy compensating for language-systematic differences regarding the realisation of certain types of adjuncts. Here we assumed that the adjuncts in the SL version function as some kind of “background(ed)” or less salient type of information (correlating with its syntactically downgraded function in syntax), and that this downgradedness is remodelled/simulated by exploiting the inference mechanisms triggered by the use of the coordination marker in the TL. As mentioned before, the adjuncts in Section 3.2 and 3.3
differ as regards the clause linkage type they realise (cf. Section 2.4) – the adjuncts in 3.3 being more sentential and less integrated than the adjuncts in 3.2.23 This means that more hierarchical upgrading (towards parataxis) and more sententialisation is required in (10) and (11) than in (12) to (15). This implies also that – at least in (10) and (11) – the relations expressed in the SL vs. TL text change their status from semantic relations holding between units/constituents within a clause to discourse relations holding between propositions/clauses. This can be viewed as a change in salience associated with a piece of information in the SL vs. TL text – from non-propositional (or less propositional), contributing to sentence semantics (in the first place) to propositional, contributing to discourse semantics/structure. In sum, discourse salience emerges as a concept that can have many facets in a contrastive perspective. Such a multi-dimensional nature of salience (more specifically, of “nuclearity”) – yet not in a contrastive perspective – has also been argued for by Stede (2008), who demonstrates that discourse units may be assigned salience on different levels of description and that various factors – such as referential structure, thematic development, intentional structure or explicit linguistic markers – come into play here. Another “factored” approach to discourse is pursued by Webber and her colleagues (e.g. Webber, Knott, and Joshi 1999, 2003). Working in the framework of Tree-Adjoining Grammar, they distinguish between (discourse) relations that are induced structurally by punctuation or (coordinating or subordinating) conjunctions like and, although on the one hand, and relations that are established by presupposition-bearing anaphoric adverbials like then, instead, otherwise on the other hand. 
Whereas relations of the former type hold between the interpretation of adjacent or conjoined discourse units, thus creating a (discourse) structure in the strict sense, anaphoric adverbials signal “a relation between the interpretation of their matrix clause and an entity in or derived from the discourse context” (Webber, Knott, and Joshi 2003: 547) which may cross such structural dependencies. Webber, Knott, and Joshi suggest that this “factored” approach may have “a better chance of providing a cross-linguistic account of discourse than one that relies on a single premise” (Webber et al., 1999: Sect. 5). Their approach does not, as we see it, offer an immediate solution to the specific problems discussed in connection with examples (3) and (4) (Sect. 3.1). But the proposed distinction between structurally induced discourse relations (triggered by punctuation and conjunctions) creating discourse structure in the strict sense and the relations triggered by presupposition-bearing anaphoric adverbials may give a lead as to how the “strange” discourse behaviour of the Norwegian conjunction og could be explained: possibly og doesn’t always function as a conjunction, i.e. doesn’t (always) create discourse structure in the same way as the corresponding conjunctions und and and in German or English do.

23. The semantics and discourse properties of adjuncts of various kinds are taken up in detail in Fabricius-Hansen and Haug (eds., in prep.). The problem of “competing structures” across languages is particularly discussed in Chapter 5.
4. Conclusions
We have shown that special conditions seem to hold as regards the use of sentential and verb phrase coordination with (counterparts of) and in Norwegian as compared to German and English. In translations from German or English into Norwegian, coordination is often used as a compensation for language-specific – structural and stylistic – restrictions on hypotactic complexity at sentence level (Sections 3.2 and 3.3). Apparently, Norwegian is also less constrained as to which kinds of (discourse) elements can be linked by the coordination marker (Section 3.1) and in which order the conjuncts appear (Section 3.3). To put it the other way round, it appears that the function of the coordination marker (og/und/and) is not precisely the same cross-linguistically, so that e.g. syntactic coordination may be compatible with discourse relations like Background, Explanation or Elaboration in Norwegian, while blocked in German or English. These observations cast some doubt on the cross-linguistic validity of the definition of discourse relations in theories like SDRT or RST. In particular, they seem to challenge the assumption (see 2.3) that syntactic coordination with (equivalents of) the connective and necessarily implies a coordinating/multinuclear discourse relation. The translation examples furthermore illustrate that salience can be expressed by various linguistic means and that these means may differ cross-linguistically (3.4). Salience may be assigned by the hierarchical vs. non-hierarchical organisation of discourse in the form of subordinating vs. coordinating discourse relations holding between clauses/propositions. But salience can also manifest itself in the choice of the size/granularity of the linguistic unit used to communicate a piece of information, in particular in the choice between propositions (clauses) and linguistic units that do not have proposition status, e.g. phrases.
Finally, discourse segmentation into (complex) sentences separated by a full stop (or other “major” punctuation marks such as question mark and exclamation mark) also relates to the assignment of salience in discourse. Segmentation into sentences provides the “temporal” dimension of discourse interpretation, by determining which “portions” of information should be integrated into the incrementally constructed (mental) discourse representation at a certain point in the development of the text.

Acknowledgements

This article is a modified and extended version of Ramm and Fabricius-Hansen (2005), and many of the ideas presented here have been developed in cooperation with Cathrine Fabricius-Hansen. Moreover, the work has profited from cooperation with Bergljot Behrens (Univ. of Oslo) and Kåre Solfjeld (Østfold Univ. College, Halden) who have contributed with examples and helpful discussions. I am also grateful to the Faculty of Humanities at the University of Oslo, for supporting me by a PhD scholarship (2003–2006). The research has been carried out in connection with the project SPRIK (Språk i kontrast / Languages in Contrast)24 at the University of Oslo, Faculty of Humanities funded by the Norwegian Research Council under project number 158447/530 (2003–2008).

24. Project URL: http://www.hf.uio.no/ilos/forskning/projekter/sprik//index.html (visited 16 Sep 20109)

References

Asher, Nicholas 1999 Discourse and the focus/background distinction. In: Peter Bosch and Rob A. van der Sandt (eds.), Focus: Linguistic, Cognitive, and Computational Perspectives, 247–267. Cambridge/New York: Cambridge University Press.
Asher, Nicholas and Alex Lascarides 2003 Logics of Conversation. (Studies in Natural Language Processing.) Cambridge/New York: Cambridge University Press.
Asher, Nicholas, Laurent Prévot and Laure Vieu 2007 Setting the background in discourse. Discours 1: 1–29. URL: http://discours.revues.orig/index301.html
Asher, Nicholas and Laure Vieu 2005 Subordinating and coordinating discourse relations. Lingua 115: 591–610.
Behrens, Bergljot 1998 Contrastive discourse: An interlingual approach to the interpretation and translation of free ING-participial adjuncts. Ph.D. dissertation, Department of Linguistics, University of Oslo.
Behrens, Bergljot and Cathrine Fabricius-Hansen forthc. The discourse relation Accompanying Circumstance across languages. Conflict between linguistic expression and discourse subordination? In: Dingfang Shu and Ken Turner (eds.), Contrasting Meaning in Languages of the East and West. Frankfurt: Peter Lang.
Blakemore, Diane 1987 Semantic Constraints on Relevance. Oxford: Blackwell.
Blakemore, Diane 2002 Relevance and Linguistic Meaning: The Semantics and Pragmatics of Discourse Markers. (Cambridge studies in linguistics 99.) Cambridge: Cambridge University Press.
Blakemore, Diane and Robyn Carston 2005 The pragmatics of sentential coordination with “and”. Lingua 115: 569–589.
Büring, Daniel 1997 The meaning of topic and focus: The 59th Street Bridge accent. (Routledge studies in German linguistics.) London: Routledge.
Carlson, Lynn and Daniel Marcu 2001 Discourse tagging manual. Technical Report ISI-TR-545, ISI.
Fabricius-Hansen, Cathrine 1996 Informational density: A problem for translation and translation theory. Linguistics 34: 521–565.
Fabricius-Hansen, Cathrine 1999 Information packaging and translation. Aspects of translational sentence splitting (German – English/Norwegian). In: Monika Doherty (ed.), Sprachspezifische Aspekte der Informationsverteilung, 175–213. Berlin: Akademie-Verlag.
Fabricius-Hansen, Cathrine and Dag T. T. Haug (eds.) in prep. Big Events, Small Clauses: The Grammar of Elaboration. (Language, Context and Cognition.) Berlin/New York: Mouton de Gruyter.
Fabricius-Hansen, Cathrine, Wiebke Ramm, Kåre Solfjeld and Bergljot Behrens 2005 Coordination, discourse relations and information packaging – cross-linguistic differences. In: Aurnague, M., Bras, M., Le Draoulec, A., and Vieu, L. (eds.), First International Symposium on the Exploration and Modelling of Meaning (SEM-05), 85–93.
Fabricius-Hansen, Cathrine and Wiebke Ramm 2008 Editor’s introduction: Subordination and coordination from different perspectives. In: Cathrine Fabricius-Hansen and Wiebke Ramm (eds.), ‘Subordination’ versus ‘coordination’ in sentence and text. A cross-linguistic perspective. (Studies in Language Companion Series 98). Amsterdam/Philadelphia: John Benjamins.
Jasinskaja, Ekaterina 2009 Pragmatics and prosody of implicit discourse relations: The case of restatement. Ph.D. dissertation, University of Tübingen.
Klein, Wolfgang and Christiane von Stutterheim 1987 Quaestio und referentielle Bewegung in Erzählungen. Linguistische Berichte 109: 163–183.
Lambrecht, Knud 1994 Information Structure and Sentence Form: Topic, Focus, and the Mental Representations of Discourse Referents. (Cambridge studies in linguistics 71.) Cambridge: Cambridge University Press.
Lehmann, Christian 1988 Towards a typology of clause linkage. In: John Haiman and Sandra A. Thompson (eds.), Clause Combining in Grammar and Discourse, 181–225. Amsterdam/Philadelphia: John Benjamins.
Mann, William C. and Sandra A. Thompson 1988 Rhetorical Structure Theory: Toward a functional theory of text organization. Text 8: 243–281.
Ramm, Wiebke 2010 Satzgrenzenveränderungen in der Übersetzung: Satzverbindung und lokale Diskursorganisation im Norwegischen und Deutschen. Ph.D. (submitted), University of Oslo.
Ramm, Wiebke and Cathrine Fabricius-Hansen 2005 Coordination and discourse-structural salience from a cross-linguistic perspective. In: Manfred Stede, Christian Chiarcos, Michael Grabski and Luuk Lagerwerf (eds.), Salience in Discourse: Multidisciplinary Approaches to Discourse 2005, 119–128. Münster: Stichting/Nodus.
Rooth, Mats 1992 A theory of focus interpretation. Natural Language Semantics 1: 75–116.
Sandström, Görel 1993 When-clauses and the temporal interpretation of narrative discourse. Ph.D. dissertation, Department of General Linguistics, University of Umeå.
Schiffrin, Deborah 1986 Functions of ‘and’ in discourse. Journal of Pragmatics 10: 41–66.
Solfjeld, Kåre 2000 Sententialität, Nominalität und Übersetzung. Eine empirische Untersuchung deutscher Sachprosatexte und ihrer norwegischen Übersetzungen. Frankfurt M.: Peter Lang.
Solfjeld, Kåre 2004 Informationsspaltung nach links in Sachprosaübersetzungen Deutsch-Norwegisch. In: Eva Lambertsson Björk and Sverre Vesterhus (eds.), Kommunikasjon, 111–130. Halden: Høgskolen i Østfold.
Stede, Manfred 2008 RST revisited: Disentangling nuclearity. In: Cathrine Fabricius-Hansen and Wiebke Ramm (eds.), ‘Subordination’ versus ‘Coordination’ in Sentence and Text. A Cross-linguistic Perspective. (Studies in Language Companion Series 98.) Amsterdam/New York: John Benjamins.
Talmy, Leonard 2000 Toward a Cognitive Semantics. Volume I: Concept Structuring Systems. Volume II: Typology and Process in Concept Structuring. Cambridge, MA: The MIT Press.
Teich, Elke 2003 Cross-Linguistic Variation in System and Text. A Methodology for the Investigation of Translations and Comparable Texts. (Text, Translation, Computational Processing 5.) Berlin/New York: Mouton de Gruyter.
Torp, Arne and Lars S. Vikør 2000 Hovuddrag i norsk språkhistorie. Oslo: Gyldendal.
Txurruka, Isabel G. 2000 The semantics of ‘and’ in discourse. Technical Report ILCLI-00-LIC9, University of the Basque Country.
Vallduvi, Enric and Elisabeth Engdahl 1996 The cross-linguistic realization of information packaging. Linguistics 34: 459–519.
Vieu, Laure and Laurent Prévot 2004 Background in Segmented Discourse Representation Theory. In: Workshop Segmented Discourse Representation Theory, 11th Conference on Natural Language Processing (TALN), 485–494.
Webber, Bonnie, Alistair Knott, Matthew Stone and Aravind Joshi 1999 Discourse relations: A structural and presuppositional account using lexicalised TAG. Paper presented at 1999 Meeting of the Association for Computational Linguistics, College Park MD.
Webber, Bonnie, Alistair Knott, Matthew Stone and Aravind Joshi 2003 Anaphora and discourse structure. Computational Linguistics 29: 545–587.
Rhetorical relations and verb placement in Old High German
Roland Hinterhölzl and Svetlana Petrova
1. Introduction
The present paper approaches the issue of salience in discourse from the perspective of historical linguistics and the theory of language change. In particular, we are interested in discerning and describing linguistic phenomena which are formal correlates of salience and related notions in the system of Old High German (henceforth OHG). We are also interested in finding out how the expression of features related to salience influences the development of novel forms and patterns in the history of German. According to the common definition employed in this volume, salience reflects “the degree of relative prominence of a unit of information, at a specific point in time, compared to other units of information” (Introduction, p. 2ff.). A variety of linguistic factors which determine the referent’s current degree of salience have been discussed in the literature, foremost cognitive status (given vs. new), grammatical role (subject vs. non-subject) and animacy (animate vs. non-animate). It is also claimed that there is a special matching relation between the referent’s current degree of salience and the form of the linguistic expression used to refer to it (Gundel et al. 1993), also called ‘referential choice’ (Krasavina, this volume). At the same time, languages employ special strategies to mark shifts in the degree of salience with respect to the preceding context, e.g. when a referent with a lower degree of salience is promoted to a higher degree of prominence at a particular stage of the discourse, also called ‘salience promotion’ (see also Filchenko, Chiarcos, both this volume). Addressing the issue of referential choice and the form of anaphoric expressions in OHG, Petrova and Solf (2010) have argued that salience promotion, a main principle governing the use of demonstratives vs. personal pronouns in modern German (see Bosch et al. 2003, Bosch and Umbach 2007), already applied at the earliest stages of the language.
Yet the use of anaphoric expressions is only one domain in which salience-related features find a formal expression in the system of OHG. In the following contribution, we will argue that pragmatic factors related to salience and discourse coherence take formal realization in syntax as well, more precisely in the structure of the left periphery of main clauses in OHG. In particular, we will focus on the principles determining the position of the finite verb in the sentence. In this respect, the notion of salience and its realization are crucial for the explanation of structural variation in the left periphery of main clauses in OHG. On the basis of evidence from the OHG Tatian, a major representative of the OHG corpus (see section 2 below), we distinguish verb-initial (V1) and verb-second (V2) as the two basic word order patterns at this particular stage of the development of German. In approaching the principles governing the distribution and functional properties of these patterns, we first draw attention to the correlation between salience and syntactic position in the clause. Following initial observations outlined in Hinterhölzl et al. (2005), we show that the positional realization of referring expressions in OHG is sensitive to the degree of salience of the particular referent in the sense of givenness and accessibility in the discourse. So expressions referring to salient, i.e. pre-mentioned or situationally inferable, referents are realized in clause-initial position followed immediately by the finite verb, which results in V2 structures on the surface. In contrast, non-salient, i.e. discourse-new, referents are placed postverbally, yielding V1 on the surface. Following this, we conclude that V2 is used as a means of marking prominence on the constituent placed in clause-initial position and separated from the rest of the utterance by the finite verb. However, this correlation can be overridden by discourse-structural factors, as is evidenced by the occurrence of V1 orders with given discourse referents. In some of the cases, the factors leading to V1 clearly pertain to discourse organization proper, i.e. they mark the beginning of a new chapter or episode in the structure of the text. With Grüning and Kibrik (2005), we can assume that referential distance across episode/paragraph boundaries lowers the status of salience of the antecedent, which results in postverbal realization of the referring expression. In this case, the process of ‘salience demotion’ takes place (see also Filchenko, this volume). But in other cases, V1 with given referents occurs within one and the same episode. In these cases, however, the sentence conveys an especially important event or state which is crucial to the further development of the discourse. In attempting to provide a unified account for all cases of V1, we invoke the distinction between coordination vs. subordination in discourse as outlined in the Segmented Discourse Representation Theory (SDRT, Asher and Lascarides 2003; see also Ramm, this volume). We analyze the instances of V1 and V2 from the perspective of the features viewed as constitutive for the definition of two basic types of rhetorical relations in discourse.
As a result, we relate V2 to subordination, while all types of V1 are attributed to the realization of coordination in discourse. We conclude that word order and especially verb placement in OHG contribute to the realization of a dynamic, multi-layered discourse structure and are therefore best described as a formal correlate of text coherence and discourse relations in the system of OHG. The implications of this study are twofold. For language theory, it outlines the interaction between the word order of constituents and their rhetorical and discourse-functional contribution in the text. For historical linguistics, it proposes an alternative approach to the research on word order variation and the development of V2 in the Germanic languages which sheds new light on these issues.
2. Philological issues and empirical data base
The OHG corpus comprises texts of different length, genre, and quality of transmission composed in the time between around 750 and 1050. Of course, not all of them are equally appropriate for syntactic research (cf. Fleischer 2006). One of the largest prose texts from the beginning of the OHG period is the Tatian text, a gospel harmony translated from Latin and written down in the scriptorium of Fulda by at least six scribes. This text has been deliberately chosen for the purpose of the present investigation. Although long considered a slavish word-for-word translation of the Latin original and therefore unsuitable for any investigation of word order, this text has been rediscovered as a good basis for research due to novel insights into the main principle of translation applied in it. In the manuscript, as Figure 1 of the Appendix shows, the Latin source and the OHG translation are attested as two juxtaposed columns. Only recently has it been observed that each line in the OHG text translates exactly the same material found in the corresponding Latin line; departures from this basic principle are extremely rare within the whole text. A new diplomatic edition made available by Masser (1994) reflects these major characteristics and makes it possible to compare the source and target text, cf. Figure 2 of the Appendix. The translating technique applied in the Tatian text certainly imposes restrictions on the possibility of rendering genuine word order patterns in the translation (cf. Masser 1997 a and b), while the deviations from the Latin source can be viewed as evidence for genuine OHG structures (cf. Dittmer and Dittmer 1998; Fleischer, Hinterhölzl and Solf 2008). Therefore, we base our study on deviating examples from the Tatian text exclusively. The corpus of the study comprises the complete sample of deviations in constituent order found in the text portions of three scribes, a total of 1,658 clausal structures. These examples were fed into a corpus and annotated for various morpho-syntactic and information-structural features by project B4 of the Collaborative Research Center (SFB) 632 “Information Structure” at Humboldt University Berlin. The corpus is searchable via the ANNIS database (Chiarcos et al., 2008; Zeldes et al., 2009) developed by project D1 of SFB 632 (University of Potsdam, Humboldt University Berlin). For more details concerning the design of this corpus and the use of the ANNIS database see Petrova et al. (2009).
3. The point of departure
3.1. Distribution of patterns and aim of the study

Some of the most puzzling questions in the diachronic syntax of the Germanic languages in general, and of German in particular, concern the principles determining the placement of the finite verb in the earliest records as well as the subsequent establishment of the word order regularities in the modern systems of these languages. To illustrate the degree of word order variation in early Germanic, we provide some examples from one of the earliest OHG records, the Isidor translation dated back to the time around 800. Here, the finite verb may occur in any position in a main declarative clause, for example in initial position (1), in second position (2), or in a later position, following more than two and sometimes all of the remaining constituents of the clause (3). Note that all sentences deviate in word order from the corresponding Latin original:1

(1) Quhad got, see miin chnecht (V1)
    spoke God behold my child
    ‘God spoke: “Behold my child”’ (I 330)
    Latin Ecce, inquit, puer meum

(2) Ih faru dhir fora (V2)
    I go you-dat. before
    ‘I’ll go before you’ (I 156)
    Latin Ego ante te ibo

(3) Dher selbo forasago auh in andreru stedi chundida (Vend)
    the same prophet also in another place announced
    ‘The same prophet announced in another place too’ (I 348)
    Latin […] alias […] testatur idem propheta

1. The examples from the Isidor [I] text are cited by line number according to the edition of Eggers (1964). The examples from the Tatian [T] text are cited by manuscript page and line number according to Masser (1994). A slash in the Tatian examples represents the end of line according to the manuscript. The inflected verb in both OHG and Latin is underlined for clarity throughout the paper.
Table 1 provides the absolute number of word order patterns in main declarative clauses formed against the Latin original in Isidor. This overview shows that patterns like those in (2) and (3) appear with considerable frequency in the document, while V1 is found only rarely in clauses formed against the Latin word order.2

Table 1. Frequency of word order patterns in main declarative clauses in Isidor formed against the Latin original

type of pattern    number of occurrences in Isidor (against the Latin structure)
V1                 6
V2                 74
Vlate/end          45
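The raw counts in Table 1 are easier to compare once they are expressed as shares of the total. The following is a minimal sketch in Python; the names `isidor_counts` and `pattern_shares` are illustrative assumptions and not part of the study, and the only data used are the three counts reported in Table 1.

```python
# Counts copied from Table 1 (Isidor, main declarative clauses formed
# against the Latin original). The variable and function names are
# hypothetical, chosen only for this illustration.
isidor_counts = {"V1": 6, "V2": 74, "Vlate/end": 45}


def pattern_shares(counts):
    """Return each pattern's share of the total in percent (one decimal)."""
    total = sum(counts.values())
    return {pattern: round(100 * n / total, 1) for pattern, n in counts.items()}


shares = pattern_shares(isidor_counts)
for pattern, share in shares.items():
    print(f"{pattern}: {share}%")
# V1: 4.8%
# V2: 59.2%
# Vlate/end: 36.0%
```

On these figures, V2 and Vlate/end together account for more than 95% of the 125 clauses, which is the sense in which V1 is rare here. The same helper can be applied to counts from any other text.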
Exploring the frequency of these word order types in the Tatian database described in section 2 above, we discover a rather different picture. Here, mainly V1 and V2 occur in considerable numbers against the structure of the Latin original.3 Patterns in which the verb occurs in a position later than the second one, as in (4), are formed against the original only rarely, and cases with the verb at the absolute end of the sentence, as in (5), are mere exceptions:4

(4) thanan tho zacharias uuard gitruobit
    then then Zacharias became troubled
    ‘Then, Zacharias was troubled’ (T 26, 20)
    Latin & zacharias turbatus est

(5) min tohter/ ubilo fon themo tiuuale giuuegit ist
    my daughter badly by the devil.DAT tortured is
    ‘My daughter is badly tortured by the devil’ (T 129, 10–11)
    Latin filia mea/ male a demonio uexatur

2. Here, we only briefly refer to some previous accounts of some of these patterns in Isidor. First, we do not subscribe to the view expressed by Robinson (1994), who claims that V1 represents a foreign pattern used exclusively in the translation of the biblical citations rather than of the commentary parts of the treatise in order to signal foreign speech. Rather, we regard V1 as a common Germanic pattern which abounds both in the remaining texts of the OHG tradition and in all other early Germanic languages, i.e. in Old English, Old Saxon and Old Norse. Second, with respect to Vlate/Vend, we reject the view of Tomaselli (1995), who reduces such examples to cases involving pronominal or other prosodically light constituents, which she explains as clitics attached to the left of the verb after a full constituent in initial position. As our example in (3) shows, Vlate/Vend in main clauses in Isidor also appears in sentences with full constituents before the finite verb.
3. Note that the cases of V1 included in these statistics do not comprise elliptic non-initial conjuncts sharing the subject of the preceding clause and therefore showing surface V1-order.
4. In this example, the synthetic passive of the Latin original is represented by an analytic construction involving the finite form of the auxiliary sîn ‘be’ + Past Participle. As the semantics of the Latin main verb is reflected in the OHG participle, the finite auxiliary has to be regarded as an additional constituent not present in the original. Therefore, its placement in the OHG part is a matter of free choice.

Table 2. Frequency of word order patterns in main declarative clauses in Tatian formed against the Latin original

type of pattern    number of occurrences in Tatian (against the Latin structure)
V1                 96
V2                 382
Vlate/end          11

From this we can conclude that a process towards stricter verb fronting in main declarative clauses and a considerable reduction of the Vlate/end pattern has taken place already within the OHG period. One question arises from this observation, namely whether the distribution of the main competing patterns, V1 and V2, obeys certain rules in the system that emerges in the Tatian, and if so, what kind of principle may be held responsible for the choice of one pattern over the other. This question will be addressed in the following section.

3.2. Previous accounts

In the most recent investigation of the structure of the sentence left periphery in OHG, Axel (2007) claims that the verb-second property typical of modern German has already developed at this early stage of the language. In the generative framework which Axel adopts, a constitutive feature of the verb-second rule is that the inflected verb obligatorily moves to the head C° of the maximal projection CP. Additionally, in main clauses, the specifier position of CP is filled either by i) movement of a phrase bearing one of the operator features +topic/+focus/+wh (operator movement), or ii) movement of a phrase that occupies the highest position in the middle field of the corresponding structure (stylistic fronting, cf. Fanselow, 2003). If none of these movement operations applies, a non-referential expletive es is merged in SpecCP.
Turning to OHG, Axel shows that both operator movement and stylistic fronting occur, while the third option, the placement of a base-generated expletive in SpecCP, has not emerged yet. As a consequence, sentences in which neither operator movement nor stylistic fronting can apply remain V1 (analyzed as the verb moving to C° with SpecCP remaining empty). This implies that the rule of V2 was not yet fully grammaticalized in OHG. But what, then, is constitutive of the word order in OHG? To explain why SpecCP remains empty in OHG, Axel refers to the fact that in most cases of V1 the sentence contains the adverbial tho ‘then’ in postverbal position, which takes the function of a narrative-expressive particle indicating sentence type just like other particles, e.g. the interrogative particle inu/eno, the affirmative particle ia or the imperative particle nu. Once sentence type has been indicated by the particle, the application of stylistic fronting is unnecessary, leaving SpecCP empty in the corresponding cases.

Expressivity as a factor leading to V1 in early Germanic is known from a number of previous works on the matter. In his very influential study, Fourquet (1974) put forward the idea that verb fronting in early Germanic is used to highlight the entire content of a sentence. Much earlier, Ries (1880, 19) had observed for Old Saxon that V1 occurs in sentences reporting an outstandingly important event or property. As for Old English, van Kemenade (1987, 44) reports that in the Anglo-Saxon Chronicle, V1 is especially characteristic of one particular section which is “famous for its lively narrative style”. But expressivity and stylistic vividness are rather vague terms when it comes to differentiating the domains in which the two main patterns in declaratives in OHG apply. All accounts mentioned above shift the attention to the broad field of pragmatics as the source of additional factors influencing word order in early Germanic. In this respect, they are representative of a long tradition of research whose attempts at explaining this issue should be reconsidered from the perspective of modern linguistic theory. We therefore want to analyze more thoroughly the functional domains in which the two main patterns of OHG main-clause syntax occur, in order to isolate operational features associated with each of them.
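The options for filling SpecCP that this section attributes to Axel (2007) can be read as a small decision procedure. The following Python sketch is only an illustration of that reading: the function name, its parameters and the reduction of the analysis to three yes/no conditions are assumptions made here, not notation or claims taken from Axel (2007).

```python
def surface_verb_order(operator_movement, stylistic_fronting, expletive_available):
    """Predict the surface order of a main declarative clause.

    Assumption for this sketch: the finite verb has moved to C, so the
    result depends only on whether anything ends up in SpecCP.
    """
    if operator_movement or stylistic_fronting:
        return "V2"  # a moved phrase fills SpecCP
    if expletive_available:
        return "V2"  # an expletive (modern German 'es') is merged in SpecCP
    return "V1"      # SpecCP stays empty


# OHG as described above: the expletive option has not emerged yet.
print(surface_verb_order(False, False, expletive_available=False))  # V1
# Modern German: the expletive fills SpecCP when nothing moves there.
print(surface_verb_order(False, False, expletive_available=True))   # V2
```

The sketch makes explicit that, on this account, V1 is simply the residue left when no fronting applies; it says nothing about which discourse conditions block fronting, which is the question pursued in the following sections.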
4. Information structure and word order in OHG
Hinterhölzl et al. (2005) launch a large-scale investigation into the sensitivity of word order in OHG to factors pertaining to information structure. In line with the account proposed by Molnár (1993) and Krifka (2007) among others, information structure is understood as a complex linguistic phenomenon comprising functional distinctions of categories on the following three layers: i) the informational status of referents (theme vs. rheme or given vs. new); ii) the predicational structure of the utterance (topic vs. comment); and iii) the communicative weight or relevance of sentence constituents (focus vs. background). These layers of information structure are taken to function independently in the language but to interact with each other in yielding the full picture of the information-structural shape of an utterance.

In a first step, Hinterhölzl et al. (2005) investigate the relationship between the informational status of discourse referents and their positional realization with respect to the finite verb in the clause. The notion of ‘discourse referents’ is understood in the sense of Karttunen (1976), who applies this term to individuals (persons, events, facts) that can be referred back to in a coherent discourse by coreferential definite expressions, i.e. pronouns or full noun phrases. The identification of the informational status of discourse referents is based on taxonomies proposed by Prince (1981) and Dik (1989), who argue for a more fine-grained system in which ‘given’, i.e. explicitly pre-mentioned material, and ‘new’, i.e. novel, non-inferable information, represent the two endpoints of a scale including different sub-types of textually or situationally accessible entities in between.

The investigation of a possible correlation between verb placement and discourse status of constituents in instances of the OHG Tatian text reveals two striking tendencies. On the one hand, there is a regular preference for V1 in presentational sentences which introduce new referents to the context. This is shown in (6) through (8). It can be observed that V1 in OHG is the constant pattern corresponding to a variety of different orders in the Latin original:

(6) [The forty-day-old Infant is presented to the Lord in the temple in Jerusalem and blessed there by Simeon. After that, the holy family meets the prophetess Anna.]
    uuas thô thâr anna uuizzaga
    was then there Ann prophetess
    ‘There lived there at that time the prophetess Anna’ (T 38, 22)
    Latin & erat anna proph&issa
(7) [in the Nativity of Christ]
    uuarun thô hirta In thero lantskeffi
    were then shepherds in this region
    ‘There were shepherds in the same country’ (T 35, 29)
    Latin Et pastores erant In regione eadem
(8) [Jesus tells a parable about an unjust judge who was asked by a widow to avenge her against her adversary]
    uuas thar ouh sum uuitua/ In thero burgi
    was there also certain widow in this town
    ‘There was a widow too in that city’ (T 201, 2)
    Latin Vidua autem quaedam erat/ In ciuitate illa
On the other hand, sentences maintaining an already introduced discourse referent as in (9) or involving a referent considered accessible via a bridging relation to an already established entity as in (10) show a regular tendency for V2 against the underlying word order of the Latin original. In other words, V2 appears to be bound to referents that are already salient in the discourse:

(9) [Jesus compares himself with a shepherd. ih bin guot hirti = ‘I am a good shepherd’]
    guot hirti/ tuot sina sela furi siniu scaph
    good shepherd does his soul for his sheep
    ‘The good shepherd gives his soul for his sheep’ (T 225, 16–17)
    Latin bonus pastor/ animam suam dat pro ouibus suis

(10) [The previous sentence introduces Zacharias who is married to one of the daughters of Aaron]
     Inti ira namo uuas elisab&h
     and her name was Elizabeth
     ‘and her name was Elizabeth’ (T 26,2)
     Latin & nomen eius elisab&h
The text also provides numerous examples of ‘minimal pairs’ where the initial placement of the verb in the first sentence introducing new discourse referents is immediately abandoned in favor of a V2 clause in the following utterance, which makes a statement on the referents just established. Consider the following small discourse:

(11) [the beginning of the story about the Nativity of John the Baptist]
     a. uuas In tagun herodes […]/ sumer biscof […]/ Inti quena Imo
        was in days Herod.GEN some bishop and wife him.DAT
     b. siu uuarun rehtiu beida fora gote
        they were righteous both before God.DAT
     ‘In the days of Herod […], there was a certain priest […] and his wife […]. They were both righteous before God’ (T 26, 3)
     Latin a. Fuit in diebus herodis regis/ […] quidam sacerdos/ […]/ & uxor illi […]/
           b. erant autem iusti ambo ante deum

This evidence strongly supports an interdependence between verb placement and information structure in OHG. It shows that new referents follow the verb, while referents already salient in the context precede it. What kind of generalization can we draw from these observations? Looking at the data from the perspective of the model developed by Sasse (1995), we discover that the sentences we are dealing with are typical representatives of the thetic vs. categorical type of judgments. By definition, categorical sentences have a bipartite structure divided into a predication base, or topic of the sentence, and a comment on this topic. This is the case in (9), (10), and (11b). Here, the finite verb separates from the rest of the utterance exactly that constituent which provides the sentence topic (in line with both the familiarity and the aboutness concept; for a discussion see Frey 2000, 137–138). By contrast, the presentational sentences in (6) through (8) and in (11a) are typical instances of the thetic type of judgments. The most significant feature of such instances is that they represent “monominal predications” (Sasse 1995, 4) in which no particular constituent is taken as the predication base of the utterance; rather, the entire sentence, including all participants, is asserted as a unitary whole.5 Therefore, we can conclude that the position of the finite verb in OHG is firmly related to the realization of the topic-comment structure in a sentence. As a rule, the finite verb separates the topic expression from the comment of the utterance.
In most cases, this position is occupied by an expression referring to the most salient referent in the context, which is either previously mentioned or situationally accessible at that particular point of the discourse. Remarkably, novel referents serving as the predication base of a categorical utterance also share the positional properties of canonical (i.e. salient) topics in OHG. Consider the bare plural fohún ‘foxes’ in (12), which is not previously established in the context but is nevertheless placed in preverbal position. The sentence receives an interpretation according to which it makes a statement about a set of individuals of the denoted kind. Thus, the kind-referring bare plural fohún is the aboutness topic of the utterance:6

(12) [a chain of coordinate conjuncts claims that every creature has a home to stay overnight except the Son of the Lord]
     fohún habent loh
     foxes have holes
     ‘The foxes have holes’ (T 85, 25)
     Latin vulpes foueas habent

5. Drubig (1992) and Lambrecht (1994, 137–146) also argue that in thetic utterances no topic-comment division applies.
6. In this respect, we follow Endriss and Hinterwimmer (2007) who argue that givenness is not necessary for topicality. They argue that novel constituents may provide the aboutness topic of an utterance if the utterance allows for a topic-comment division in which the respective constituent takes the role of the subject of the predication.

In case no topic-comment distinction applies, the verb moves to the position in front of all arguments to indicate that none of them functions as the sentence topic and that the entire proposition has to be interpreted as wide (sentence) focus. These observations are summarized in (13):

(13) a. [Vfin … DRnew …]FOCUS (V1)
     b. [DR]TOP [Vfin …]COMMENT (V2)

Lenerz (1984, 151–153) and Ramers (2005, 81) also observe that V1 is typical for presentational sentences in OHG. They conclude that V1 in OHG is used when the sentence conveys discourse-new, or rhematic, material only. Looking at the examples above, we nevertheless discover that new information is established only in the subject expressions, while the remaining part of the sentence is given; see e.g. the adverbials In thero lantskeffi ‘in this country’ in (7), or In thero burgi ‘in this town’ in (8). From this perspective, the notion that V1 occurs in all-new sentences cannot be maintained. Rather, verb fronting signals that none of the constituents provided in the sentence takes over the function of the sentence topic because no topic-comment division applies in these utterances.

5. Discourse structure and the distribution of word order patterns in OHG

5.1. Evidence for discourse relations

On a closer look, it turns out that V1 is frequent in sentences with given arguments as well. Consider the subjects in (14) and (15):
(14) [A Pharisee invites Jesus to dine in his house. Jesus enters the house and sits down to eat. The Pharisee realizes that Jesus has not washed his hands before dinner and criticizes him on that occasion]
     bigonda ther phariseus […] quedan
     began this Pharisee speak.INF
     ‘The Pharisee began to speak’ (T 126, 5–6)
     Latin Phariseus autem coepit […] dicere

(15) [Jesus starts telling a parable on whether it is allowed to heal on the Sabbath]
     Quad her tho zi then giladoten/ ratissa
     spoke he then to the invited parable
     ‘Then he told the guests a parable’ (T 180, 9–10)
     Latin Dicebat autem & ad Inuitatos/parabolam
The full definite expression ther phariseus ‘the Pharisee’ in (14) as well as the personal pronoun her ‘he’ in (15) refer to entities already introduced in the previous discourse. But although they display pragmatic properties of sentence topics like givenness/accessibility, definiteness and referentiality, they fail to occupy the topic position established in (13b) above. To explain these data, we need to find a common basis to account for the postverbal placement of both given and new referents in OHG. In our opinion, this may be achieved if one broadens the account of information packaging beyond the scope of the informational status of individual discourse referents in the sentence and takes into consideration the discourse-functional role of the utterance in the narrative structure of the text.

5.2. Basic notions of discourse analysis

We shall briefly outline some basic notions and distinctions in current research on discourse structure in order to show in our analysis that important discourse-related features of utterances correlate with the two main word order patterns in OHG, thus allowing the conclusion that variation in verb placement in OHG is pragmatically driven. A particular model relating word order in early Germanic to discourse organization is proposed by Hopper (1979a, b), who distinguishes between the part of main action, i.e. foregrounding, and the part of supportive information, i.e. backgrounding, in text structure. Hopper identifies some distinctive features associated with these notions. Typically, foregrounding is conveyed by dynamic, perfective verb meanings providing temporal progression on the level of main action. By contrast, backgrounding establishes temporal relations of simultaneity to main actions induced by the durative semantics of the predicates involved. In this way, Hopper establishes a relation between discourse structure and aspectuality of the verb in the sentence, a feature which shall turn out to be important in our interpretation of the examples as well. Moreover, in his survey of formal realizations of foregrounding and backgrounding in a variety of non-related languages, Hopper comes across a fundamental matching relation between word order, especially verb placement, and discourse structure in the text of the Old English Anglo-Saxon (Parker) Chronicle as a representative of the early Germanic tradition. He observes that backgrounding parts employ SVO order, i.e. medial verb placement, whereas foregrounding parts generally display peripheral verb placement, either verb-final or verb-initial. The distribution of the latter two patterns is said to be a matter of further “discourse considerations” (cf. Hopper 1979b, 221): verb-initial is taken to occur in introductory parts, that is, at the beginning of new episodes, whereas verb-final is bound to episode-internal sentences.

Recent approaches to discourse semantics also take into consideration the temporal relation between clauses as a major device for text organization and coherence (see Claus, this volume, for the role of discourse participants in imposing a temporal structure on the narrated world). Two approaches that we will take into account are Rhetorical Structure Theory (RST) by Mann and Thompson (1988) and Segmented Discourse Representation Theory (SDRT) by Asher and Lascarides (2003). A basic assumption in both of them is that discourse coherence is achieved only if each utterance makes an illocutionary contribution to another utterance in the context. This is achieved when discourse units establish different kinds of rhetorical relations among each other, thus creating a dynamic hierarchical structure in discourse.
According to RST and SDRT, the rhetorical relations linking together the contents of single discourse units can be of the following two kinds:
– two units can display no dependency relation between each other but share the same level of discourse hierarchy, thus creating a multi-nuclear relation in the terms of RST or a relation of coordination in the terms of SDRT
– two units can build a dependency relation creating a hierarchical structure in discourse, i.e. a nucleus-satellite relation in RST or a relation of subordination in SDRT.

In order to show how verb placement participates in achieving discourse hierarchy in texts of the early Germanic tradition, we chose the model of SDRT. Although the inventory of individual discourse relations is still under discussion, there is overwhelming agreement on the basic features distinguishing coordination vs. subordination as the two basic types of linking. Both are associated with a particular prototypical rhetorical relation displaying some well-defined, complementary features (Asher and Vieu 2005). Subordination is typically represented in elaboration, i.e. when a unit β provides more detail on another unit α situated on a higher level of discourse structure. In this case, the two events (α, β) temporally overlap. Further, the rhetorical relation of continuation applies if two or more subsequent units β and γ are equally situated on a lower level of dependency with respect to a higher unit α such that both β and γ elaborate on α. By contrast, coordination, which holds between units situated on the same level of discourse hierarchy, is typically represented in the relation of narration. Narration is established e.g. if two discourse units (α, β) display a temporal relation of succession and β continues the narrative sequence in discourse. Looking at the distinctive features of coordination vs. subordination in SDRT, we discover a number of parallels between them and the discourse properties of the word order patterns discussed in the foregoing data analysis. These will be discussed in turn in the following two sections.

5.3. V2 as a means of subordination in discourse

From the perspective of the discourse relations distinguished above, the instances of V2 in (9), (10) and (11b) immediately evoke parallels to the subordinative type of linking. Consider also the following small discourse:7

(16) [Jesus and his disciples approach the gates of a city called Nain and witness the following scene]
     a. senu arstorbaner/ uúas gitragan einag sun sinero muoter
        behold dead man was carried only son his.GEN mother
     b. Inti thiu uuas uuituuua
        and she was widow
     ‘behold, a dead man was being carried out, the only son of his mother and she was a widow’ (T 84, 22–24)
     Latin a. ecce defunctus/ efferebatur. filius unicus/ matris suae.
           b. & haec uidua erat.
7. Unfortunately, any significant reordering of constituents in the first sentence of this small discourse is impossible for reasons of the line-for-line principle of translation outlined in section 2 above. Therefore, the placement of the indefinite subject expression arstorbaner ‘a dead man’ introducing a new referent does not illustrate the distributional properties of such constituents outlined in this study.
In (16b), the finite verb is shifted from the sentence-final position in the Latin source to the position between the topic and the comment in the OHG translation. Clearly, the sentence in (16b) provides additional information on the discourse referent muoter ‘mother’ introduced by the preceding sentence. With respect to the temporal relation of the two sentences, we can observe that the event in (16b) overlaps with the event in (16a). Taken together, all these features favor the identification of elaboration between (16b) and (16a) as the prototype of the subordinating kind of linking in discourse. In other parts of the text, we discover chains of utterances equally depending on a higher unit in discourse structure. Consider (17b–e), which assign different properties to the referent scrîbera ‘the scribes’ introduced in the opening sentence (17a). V2 is established by the regular insertion of the pronominal subject referring to the topic referent of the entire text portion (topic continuity):

(17) a. obar stuol/ moyses sâzzun scrîbera/ Inti pharisej […]
        over seat Mose.GEN sat scribes and Pharisees
     b. sie quedent/ Inti nituont
        they say and NEG.PRT.do
     c. sie bintent suuara burdin […]/
        they bind heavy burdens
     d. sie breitent Iro ruomgiscrib/ […]
        they make broad their phylacteries
     e. sie minnont furista sedal
        they love first seats
     ‘in Moses’ seat sit the scribes and the Pharisees. They say and they do not do, they bind heavy burdens, they make their phylacteries broad, they love the best places at feasts’ (T 242, 18–243, 5)
     Latin a. super cathedram/ moysi sederunt scribe/ & pharisej. […]
           b. dicunt enim/ et non faciunt.
           c. Alligant autem onera grauia […]
           d. dilatant enim philacteria sua/ […]/
           e. Amant enim primos recubitos
We interpret instances like these as cases of continuation, i.e. as a series of utterances serving to elaborate on the same unit situated higher in the discourse. In other cases, a discourse unit provides additional, explanatory information with respect to a previous event. Consider (18b) which provides a motivation for the proposition denoted in the previous utterance (18a):
(18) [an angel prophesies to Zacharias the near birth of his son, John the Baptist, and explains that he will be a person of special qualities]
     a. Inti manage in sineru giburti mendent
        and many in his birth have joy
     b. her ist uuârlihho mihhil fora druhtine
        he is truly great before God.DAT
     ‘And many people will rejoice at his birth. For he will be great in the eyes of the Lord’ (T 26, 29–30)
     Latin & multi in natiuitate eius gaudebunt/ erit enim magnus coram domino

To conclude, we relate the distribution of V2 in OHG to sentences establishing relations of subordination in discourse. First, V2 appears in sentences assigning properties to individuals or explaining the circumstances of events or actions established in previous discourse units. Second, the events provided by V2 sentences temporally overlap with those of the discourse units on which they elaborate. In terms of discourse hierarchy, V2 creates units that depend on higher units in discourse structure, thus instantiating subordination in discourse.

5.4. V1 as a means of coordination in discourse
Previous descriptive accounts, summarized in Schrodt (2004, 144–145), provide the following two conditions favoring the use of V1 in OHG: first, V1 occurs in text-initial sentences or at the beginning of new episodes; and second, V1 is frequent with certain types of predicates like verbs of motion, verbs of saying etc. We shall look in more detail for a unified explanation of these functions of V1 in OHG, especially with respect to the kind of rhetorical relations they constitute in discourse.

5.4.1. V1 signals episode boundaries

The use of V1 as an indication of episode boundaries directly invites the assumption that this pattern functions as a discourse-structuring device. As reported for some modern colloquial registers as well as for some orally transmitted genres like jokes etc. (Lenerz 1984, 153; Önnerfors 1997, 53), V1 has survived in text-opening sentences to the present day. The most numerous examples of this function in the Tatian involve the introductory formula uuard thô for Latin factum est ‘it happened’ followed by an extraposed subject clause. In the following example, both the original and the translation involve the construction ‘auxiliary + past participle’. However, the scribe of the OHG text opted for V1 although a precisely corresponding linearization pattern would have been possible by leaving the participle in the sentence-initial position, as in the original:

(19) uuard thô gitân In then tagon
     became then done in those days
     ‘It happened in those days’ (T 35, 7)
     Latin Factum est autem In diebus illis
But also apart from this introductory formula, V1 applies more widely as a text-structuring device in OHG (cf. Petrova 2006; Petrova and Solf 2008). In the Tatian text, which combines the events of the four gospels in one harmony, episode onsets are signaled by concordance notes in the left-hand margin of the Latin column or between the Latin and the OHG text (see Figure 1, Appendix). Additionally, as is known for both Latin and vernacular manuscripts of Carolingian provenance, the beginnings of new text units are marked by different size and color of the initial letter (cf. Bästlein 1991, 59 and 214–242). As for the Tatian manuscript, Simmler (1998, 306–307) observes that the strategy of dividing episodes and sub-episodes by means of initial capital letters predominantly applies to the Latin section of the text and only rarely occurs in the OHG part. Petrova (2006, 158–159) notices that the graphical distinction of new episodes in the Latin original correlates with the regular preposing of the finite verb in the OHG translation. Consider (20), next to (14) and (15) given above, which demonstrates that the syntactic means of verb fronting systematically applies for marking episode boundaries in OHG as a functional equivalent to the graphical highlighting of the episode onsets in the Latin original:

(20) [Joseph of Arimathea and Nicodemus take the body of Jesus to conduct a Jewish burial]
     Intfiengun sie tho thes heilantes lichamon
     took they then the.GEN Saviour.GEN body
     ‘Then they took the body of Jesus’ (T 321, 29)
     Latin Acceperunt autem corpus ihesu
This example is remarkable in some more respects. First, it shows that the strong preference for V1 at the beginnings of new episodes does not only account for the post-verbal position of full subject constituents as in (14), but quite obviously also affects the positioning of pronominal subjects inserted against the Latin original like sie ‘they’ in (20) or her ‘he’ in (15) above. Second, it shows that V1 in episode-initial position applies generally, not only with impersonal intransitive predicates as in (19) but also with transitive verbs like the one in (20).
The fact that V1 is used to indicate the beginning of a new episode is suggestive of the role of this pattern in the structuring of discourse. In particular, it is clear that no elaboration on the discourse referents involved in the sentences is at issue here. Rather, the information in the sentences under scrutiny is part of the core scheme of the narrative, providing the basis for further elaboration in the discourse.

5.4.2. Types of predicates favoring V1

Next to its function of marking episode boundaries, V1 is said to be frequent with certain groups of predicates. According to our empirical investigation, the most common groups of predicates favoring V1 – apart from existential verbs in presentational sentences discussed in section 4 above – are motion verbs, verbs of saying as well as perfective, inchoative verbs signaling the initiation of a new state of affairs, very often a new physical or cognitive state of the referent. Among these predicate groups, verbs of motion constitute the largest class. Some of the examples, as in (21), introduce novel discourse referents and thus functionally overlap with the type of presentational sentences. But in a great number of other cases, the appearance or withdrawal of a given discourse referent is reported, cf. (22) and (23):

(21) [Zacharias conducts service as a priest when suddenly an angel appears in the temple]
quam thara gotes engil
came there God.GEN angel
‘There came God’s angel’ (T 35, 32)
Latin & ecce angelus domini
(22) [A centurion asks Jesus to heal his servant. Jesus demands his faith and sends him back to his house.]
uuarb tho ther centenary in sin hús
returned then this centurion to his home
‘Then the centurion returned to his home’ (T 84, 8)
Latin & reuersus est centurio in domum suam
(23) [The archangel Gabriel departs from Mary after the revelation]
Inti arfuor tho/ fon Iru ther engil
and flew away then from her this angel
‘And then the angel left her’ (T 29, 6–7)
Latin & discessit/ ab illa angelus
Rhetorical relations and verb placement in Old High German
Furthermore, V1 is attested in clauses with motion verbs selecting an inanimate subject as in (24). It is not the appearance or withdrawal of a discourse referent that is reflected here, but rather the establishment of a new state in the overall development of the plot:

(24) [Jesus has healed lots of people and performed many miracles]
Inti argieng thó úz thiu liumunt
and spread then out this fame
‘And this fame spread around’ (T 97, 5)
Latin & exiuit fama haec
Next to verbs of motion, verbs of saying form another group of stable V1 occurrences in sentences involving context-given referents. The instances indicate a change of interlocutors in a dialogue sequence and therefore a shift in perspective. Consider (25):

(25) [Within a dialogue scene]
antlingota thô sîn muoter Inti quad
responded then his mother and said
‘Then his mother responded and said’ (T 30, 24)
Latin & respondens mater eius & dixit
Finally, V1 regularly occurs in contexts where a previously given discourse referent undergoes a transition into a new mental or physical state. Verbs of cognitive or sensory perception are common representatives of this group of predicates triggering V1:

(26) [A woman suffering from a flow of blood is healed by secretly touching the garment of Jesus]
furstuont siu thó in ira lihhamen/ thaz siu heil uuas
realized she then in her body that she healed was
fon theru suhti
from this.DAT plague
‘She realized in her body that she had recovered from this plague’ (T 95, 14–15)
Latin & sensit corpore/ quod sanata ess& a plaga

(27) [Jesus heals a paralyzed boy]
uuard tho giheilit ther kneht in thero ziti
became then healed the boy in this moment
‘Then the boy was healed at this very moment’ (T 84, 7)
Latin & sanatus est puer in illa hora

(28) [Salomé demands from King Herod the head of John the Baptist on a platter. The king is troubled because he has promised to fulfill any wish of the girl]
Inti uuard gitroubit ther kuning
and became troubled this king
‘And the king was troubled’ (T 116, 21)
Latin & contristatus est rex
These instances show that V1 is a widespread syntactic pattern in OHG, which at first glance appears to be highly heterogeneous in use. But from the perspective of discourse relations, the uses of V1 in the examples above actually allow for a unified interpretation. On the one hand, it is evident that the sentences with verbs of motion and verbs of saying affect the narrative setting of the situation with respect to the participants involved in the action or the speaker from whose perspective the event or action is reflected. As such, sentences including a predicate of one of these groups automatically indicate a change in the narrative situation. On the other hand, the inchoative predicates convey important, extraordinary or unexpected events which reveal a turning point in the course of the story and therefore establish the initiation of a new situation in the structure of the narrative. From this perspective, sentences with V1 do not provide more information on a discourse referent distinguished as the predication base of the utterance, but assert the contents of the entire proposition, including all participants, as new information representing a unitary whole. In this respect, V1 sentences with these predicates represent thetic judgments with no topic-comment division. From the point of view of temporal relation to the previous context, the examples with V1 discussed here also reveal one important common feature. Without exception, they establish relations of temporal succession with respect to the previous context, quite often indicated by temporal adverbials like tho ‘then, after that’ included in the sentence. From this, we can conclude that sentences with V1 serve to establish new situations by providing narratively important information and carrying the discourse forward.
We assume that they continue the discourse on the level of main action and share important properties with coordinative discourse linking like temporal succession and progress in narration.
6. Implications for the generalization of V2 in modern German
If the distribution of V1 and V2 was ruled by discourse-organizational principles and each of these patterns was associated with one particular, well-defined functional field in the system of early German, then the question arises why and how this functional opposition was lost in the course of language development and how V2 became generalized in main clauses. We assume that the reason for this development is already present in the system of OHG. Note that V2 has already been generalized in wh-interrogatives at the stage of development represented in the Tatian text (cf. Petrova and Solf 2009). Apart from this, we encounter cases of variation in one functional domain of the opposition described for V1, namely in the domain of the coordinative type of discourse relations. Here, next to V1, V2 structures with a sentence-initial adverbial co-occur. This pattern mainly applies to thô ‘then’ used as a connective marking the coordinative relation to the previous event. Note that (29) through (31) have the same discourse function as the V1 clauses discussed above:

(29) tho uuas man In hierusalem
then was man in Jerusalem
‘There was a man in Jerusalem’ (T 37, 23)
Latin & ecce homo erat In hierusalem

(30) thó uuvrdun sie gifullte […]/ gibuluhti
then became they filled anger.DAT
‘then they became full of anger’ (T 115, 7)
Latin & repl&i sunt omnes/ in sinagoga ira

(31) tho fragata inan petrus
then asked him.DAT Peter
‘then Peter asked him’ (T 128, 18)
Latin interrogabat eum p&rus
This means that we encounter competition between V1 and thô+V2 in the domain of coordinative linking in OHG. This is represented in (32):

(32) coordination in discourse:
a. [Vfin … DRnew/giv …]FOCUS (V1)
b. thô [Vfin … DRnew/giv …]FOCUS (thô+V2)
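The two surface patterns in (32) differ only in what precedes the finite verb, so they can be told apart mechanically once the finite verb has been marked. The following Python sketch is an illustration only and is not the annotation procedure used for the Tatian database: the Clause type, the finite_verb index, and the decision to skip clause-initial Inti ‘and’ when counting positions (as the treatment of (24) and (28) as V1 suggests) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

# Assumption: clause-initial 'Inti' (and) does not count as a position,
# since examples such as (24) and (28) are treated as V1 in the text.
CONJUNCTIONS = {"inti"}
# Spellings of the connective 'then' that occur in the cited examples.
THO_FORMS = {"tho", "thô", "thó"}


@dataclass
class Clause:
    tokens: List[str]   # word forms in surface order
    finite_verb: int    # index of the finite verb (hypothetical hand annotation)


def classify(clause: Clause) -> str:
    """Label a clause as 'V1', 'thô+V2', 'V2' or 'other' by verb position."""
    tokens = [t.lower() for t in clause.tokens]
    start = 1 if tokens and tokens[0] in CONJUNCTIONS else 0
    position = clause.finite_verb - start
    if position == 0:
        return "V1"
    if position == 1:
        return "thô+V2" if tokens[start] in THO_FORMS else "V2"
    return "other"


# Word order taken from the glossed examples; verb indices set by hand.
examples = {
    "(22)": Clause("uuarb tho ther centenary in sin hús".split(), 0),
    "(28)": Clause("Inti uuard gitroubit ther kuning".split(), 1),
    "(31)": Clause("tho fragata inan petrus".split(), 1),
    "(33)": Clause("thar uuas thes heilantes muoter".split(), 1),
}
for number, clause in examples.items():
    print(number, classify(clause))
```

On these four clauses the sketch labels (22) and (28) as V1, (31) as thô+V2 and (33) as plain V2. It deliberately says nothing about discourse function, which in the analysis above is what the two patterns in (32) share.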
We have to consider these two structures as optional variants in OHG. This can be inferred from the fact that according to the database, in 52 of the 96 instances involving V1, the adverbial thô is placed, independently of the original, in the position after the finite verb, thus supporting V1 on the surface, see (6), (7) as well as (24) through (27) above. However, in 122 of the 382 V2-cases included in the database, the structure in (32b) occurs. We assume that this situation shows the beginning of a process whereby the initial position in a sentence, which was originally reserved for the most salient constituent of sentences with a topic-comment division, was reanalyzed and extended by analogy to adverbials used to link the sentence to the discourse situation established in the previous discourse. Note that adverbials in anaphoric relation to a previously mentioned location or goal share the positional properties of nominal referential expressions as topics described so far. See thar ‘there’ referring to the pre-established place of the wedding ceremony in (33):

(33) [at the wedding at Cana]
thar uuas thes heilantes muoter
there was the.GEN Saviour.GEN mother
‘The mother of the Saviour was also there’ (T 81, 15)
Latin erat mater ihesu & ibi
As a result of this unification process, the preverbal position can no longer be identified with any specific information-structural category and is neutralized, leading to V2 in modern German declaratives. Note that there was a different preference for one or the other structure in (32) among the different scribes of the Tatian text. Although it has to be clarified whether the scribes are the actual translators of the text, we can detect some interesting patterns. First, within the text portion supplied by scribe ε, there is 100 per cent consistency in using the structure in (32b) in sentences indicating a change of speaker in dialogue. The investigation of the same amount of text in the portions of three different scribes reveals quite different preferences for V1 against thô+V2 in sentences with verbs of saying, namely 16 to 3 for scribe α, 3 to 9 for scribe β and 1 to 12 for scribe ζ, respectively. The fact that we encounter variation within one and the same functional domain indicates a language change in progress. In the framework of Lightfoot (1999), language change is viewed as a new type of parameter setting in the internal grammar of young generations of speakers resulting from a shift in the frequency relation of competing structures in the input data during language acquisition. In this sense, the existence of competing structures in the domain of sentences attributed to the coordinative type of discourse relations can be viewed as a pre-condition and indication of language change.
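The counts reported in this section are easier to compare as proportions. The short Python sketch below only restates the figures given in the text (52 of 96, 122 of 382, and the per-scribe ratios for verbs of saying); the helper name percent and the variable names are introduced here for illustration. Scribe ε is left out because the text reports only the 100 per cent consistency for (32b), not the underlying counts.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Database counts as reported in the text.
v1_with_inserted_tho = percent(52, 96)    # V1 clauses with thô added after the verb
v2_with_initial_tho = percent(122, 382)   # V2 clauses showing pattern (32b)

# Verbs of saying per scribe: (V1, thô+V2) counts as reported in the text.
scribe_counts = {"α": (16, 3), "β": (3, 9), "ζ": (1, 12)}
v1_share = {
    scribe: percent(v1, v1 + tho_v2)
    for scribe, (v1, tho_v2) in scribe_counts.items()
}

print(v1_with_inserted_tho, v2_with_initial_tho)
print(v1_share)
```

Read this way, thô follows the verb in roughly 54 per cent of the V1 clauses and opens roughly 32 per cent of the V2 clauses, while the share of V1 with verbs of saying ranges from about 84 per cent (scribe α) through 25 per cent (scribe β) to under 8 per cent (scribe ζ).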
7. Conclusion
In the Old High German (OHG) Tatian text we find systematic variation between V1 and V2 clauses that is pragmatically driven. In particular, the distribution of V1 and V2 clauses correlates with coordination and subordination as the two basic types of discourse relations in the framework of SDRT by Asher and Lascarides (2003). First, instances of V2 are regularly found in structures providing additional descriptive or explanatory information on a discourse referent representing the topic of the sentences. These clauses provide additional information about elements located higher in discourse structure. From this we conclude that V2 correlates with elaboration and continuation, more precisely with the realization of subordination in discourse structure. In contrast, V1 comes in two main functions signaling main line sequentiality and progress in narration: i) it provides information which constitutes the basis for a subsequent elaboration on a lower level of discourse hierarchy; or ii) it signals that a previous chain of subordinative units is suspended and that the discourse returns to the main line of the story. In both cases, we assign to V1 properties of coordination in discourse. Thus, we arrive at the conclusion that verb placement in the earliest stages of German was governed by pragmatic, more precisely, by discourse-related properties. Our main claim is that at a certain stage in the history of the Germanic languages, the position of the verb was a means for distinguishing the type of rhetorical relation which the sentence holds with respect to the previous context. In this way, word order and verb placement were involved in the creation of dynamic text structure and discourse coherence.

Acknowledgements

This is a modified and extended version of Hinterhölzl and Petrova (2005). The investigation was conducted during our work in Research Project B4 of the Collaborative Research Center 632 “Information Structure” at Humboldt-University Berlin⁸ and was presented at the International Workshop on Salience in Discourse in Chorin, Germany, on 5th–8th Oct 2005. We are grateful to all participants in the workshop for a fruitful exchange of ideas. For useful comments and discussions we also thank Karin Donhauser, Milena Kühnast, Sonja Linde, Michael Solf, Eva Schlachter as well as two anonymous reviewers.

8. Project URL: http://www.linguistik.hu-berlin.de/sprachgeschichte/forschung/informationsstruktur/index.php; http://www.sfb632.uni-potsdam.de/projects_b4ger.html.
Appendix
Figure 1. The beginning of Luke 2, 8 in the manuscript of the St. Gallen, Stiftsbibl. Cod. 56. Facsimile, pag. 35. In Sonderegger (2003, 130).
Figure 2. The same part of the text in the edition of Masser (1994, 85).
References

Asher, Nicholas and Alex Lascarides
2003 Logics of Conversation. Cambridge: Cambridge University Press.

Asher, Nicholas and Laure Vieu
2005 Subordinating and coordinating discourse relations. Lingua 115:591–610.

Axel, Katrin
2007 Studies in Old High German Syntax. Left sentence periphery, verb placement and verb-second. Amsterdam / Philadelphia: John Benjamins Publishing Company.

Bästlein, Ulf Christian
1991 Gliederungsinitialen in frühmittelalterlichen Epenhandschriften. Studie zur Problematik ihres Auftretens, ihrer Entwicklung und Funktion in lateinischen und volkssprachlichen Texten der Karolinger- und Ottonenzeit. Frankfurt a. M.: Peter Lang.

Bosch, Peter, Tom Rozario and Yufan Zhao
2003 Demonstrative Pronouns and Personal Pronouns. German der vs. er. In Proceedings of the EACL Workshop on the computational treatment of anaphora in Budapest 2003.

Bosch, Peter and Carla Umbach
2007 Reference Determination for Demonstrative Pronouns. ZAS Papers in Linguistics 48:39–51.

Chiarcos, Christian
this vol. The Mental Salience Framework: Context-adequate generation of referring expressions. this volume, 105–139.

Chiarcos, Christian, Stefanie Dipper, Michael Götze, Ulf Leser, Anke Lüdeling, Julia Ritz and Manfred Stede
2008 A Flexible Framework for Integrating Annotations from Different Tools and Tagsets. TAL (Traitement automatique des langues) 49.

Claus, Berry
this vol. Establishing salience during narrative text comprehension: A simulation view account. this volume, 291–277.

Dik, Simon C.
1989 The Theory of Functional Grammar. Part I: The Structure of the Clause. Dordrecht: Foris Publications.

Dittmer, Arne and Ernst Dittmer
1998 Studien zur Wortstellung - Satzgliedstellung in der althochdeutschen Tatianübersetzung. Göttingen: Vandenhoeck & Ruprecht.

Drubig, H. Bernhard
1992 Zur Frage der grammatischen Repräsentation thetischer und kategorischer Sätze. Linguistische Berichte Sonderheft 4 / 1991–92:142–195.

Eggers, Hans ed.
1964 Der althochdeutsche Isidor. Nach der Pariser Handschrift und den Monseer Fragmenten. Altdeutsche Textbibliothek 63. Tübingen: Niemeyer.

Endriss, Cornelia and Stefan Hinterwimmer
2007 Direct and Indirect Aboutness Topics. In The notions of information structure, eds. Caroline Féry, Gisbert Fanselow and Manfred Krifka, 83–96. Potsdam: Universitätsverlag.

Fanselow, Gisbert
2003 Münchhausen-style head movement and the analysis of verb-second. In Syntax at Sunset: Head movement and Syntactic Theory. UCLA Working Papers in Linguistics 10, ed. Anoop Mahajan, 40–76.

Filchenko, Andrey Y.
this vol. Parenthetical Agent-demoting Constructions in Eastern Khanty: Discourse Salience vis-à-vis Referring Expressions. this volume, 57–79.

Fleischer, Jürg
2006 Zur Methodologie althochdeutscher Syntaxforschung. Beiträge zur Geschichte der deutschen Sprache und Literatur 128:25–69.

Fleischer, Jürg, Roland Hinterhölzl and Michael Solf
2008 Zum Quellenwert des AHD-Tatian für die Syntaxforschung: Überlegungen auf der Basis von Wortstellungsphänomenen. Zeitschrift für germanistische Linguistik 36:210–239.

Fourquet, Jean
1974 Genetische Betrachtungen über den deutschen Satzbau. In Studien zur deutschen Literatur und Sprache des Mittelalters. Festschrift für Hugo Moser zum 65. Geburtstag, eds. Werner Besch, Günther Jungbluth, Gerhard Meissburger and Eberhard Nellmann, 314–323. Berlin: Erich Schmidt.

Frey, Werner
2000 Über die syntaktische Position der Satztopiks im Deutschen. ZAS Papers in Linguistics 20:137–172.

Grüning, André and Andrej A. Kibrik
2005 Modelling Referential Choice in Discourse: A Cognitive Calculative Approach and a Neural Network Approach. In Anaphora Processing. Linguistic, Cognitive and Computational Modelling, eds. António Branco, Tony McEnery and Ruslan Mitkov, 163–197. Amsterdam / Philadelphia: John Benjamins Publishing Company.

Gundel, Jeanette K., Nancy Hedberg and Ron Zacharski
1993 Cognitive status and the form of referring expressions in discourse. Language 69 (2):274–307.

Hinterhölzl, Roland and Svetlana Petrova
2005 Rhetorical Relations and Verb Placement in Early Germanic Languages. Evidence from the Old High German Tatian translation (9th century). In Salience in Discourse. Multidisciplinary Approaches to Discourse, eds. Manfred Stede, Christian Chiarcos, Michael Grabski and Luuk Lagerwerf, 71–79. Münster: Stichting / Nodus.

Hinterhölzl, Roland, Svetlana Petrova and Michael Solf
2005 Diskurspragmatische Faktoren für Topikalität und Verbstellung in der althochdeutschen Tatianübersetzung (9. Jh.). In Interdisciplinary Studies on Information Structure (ISIS) 3, eds. Shinichiro Ishihara, Michaela Schmitz and Anne Schwarz, 143–182. Potsdam: Universitätsverlag.

Hopper, Paul J.
1979a Some Observations on the Typology of Focus and Aspect in Narrative Language. Studies in Language 3.1:37–64.

Hopper, Paul J.
1979b Aspect and Foregrounding in Discourse. In Syntax and Semantics, ed. Talmy Givón, 213–241. San Diego / New York / Berkeley / Boston / London / Sydney / Tokyo / Toronto: Academic Press, INC.

Karttunen, Lauri
1976 Discourse Referents. In Syntax and Semantics 7: Notes from the Linguistic Underground, ed. James McCawley, 363–385. New York / San Francisco / London: Academic Press.

Kemenade, Ans van
1987 Syntactic Case and Morphological Case in the History of English. Dordrecht: Foris Publications.

Krasavina, Olga
this vol. Demonstratives and salience: Towards a functional taxonomy. this volume, 31–55.

Krifka, Manfred
2007 Basic notions of information structure. In The Notions of Information Structure, eds. Caroline Féry, Gisbert Fanselow and Manfred Krifka, 13–56. Potsdam: Universitätsverlag.

Lambrecht, Knud
1994 Information structure and sentence form. Topic, focus and the mental representations of discourse referents. Cambridge: Cambridge University Press.

Lenerz, Jürgen
1984 Syntaktischer Wandel und Grammatiktheorie. Eine Untersuchung an Beispielen aus der Sprachgeschichte des Deutschen. Tübingen: Max Niemeyer.

Lightfoot, David
1999 The Development of Language. Acquisition, Change, and Evolution. Malden / Oxford: Blackwell.

Mann, William C., and Sandra A. Thompson
1988 Rhetorical Structure Theory: Toward a functional theory of text organization. Text. An interdisciplinary journal for the study of discourse 8:243–281.

Masser, Achim ed.
1994 Die lateinisch-althochdeutsche Tatianbilingue Stiftsbibliothek St. Gallen Cod. 56. Göttingen: Vandenhoeck & Ruprecht.

Masser, Achim
1997a Syntaxprobleme im althochdeutschen Tatian. In Semantik der syntaktischen Beziehungen. Akten des Pariser Kolloquiums zur Erforschung des Althochdeutschen 1994, ed. Yvon Desportes, 123–140. Heidelberg: Carl Winter.

Masser, Achim
1997b Wege zu gesprochenem Althochdeutsch. In Grammatica Ianua Artium. Festschrift für Rolf Bergmann zum 60. Geburtstag, eds. Elvira Glaser and Michael Schlaefer, 49–70. Heidelberg: Carl Winter.

Molnár, Valéria
1993 Zur Pragmatik und Grammatik des TOPIK-Begriffs. In Wortstellung und Informationsstruktur, ed. Marga Reis, 155–202. Tübingen: Max Niemeyer.

Önnerfors, Olaf
1997 Verb-erst-Deklarativsätze. Grammatik und Pragmatik. Stockholm: Almqvist & Wiksell International.

Petrova, Svetlana
2006 A discourse-based approach to verb placement in early West-Germanic. In Interdisciplinary Studies on Information Structure (ISIS) 5, eds. Shinichiro Ishihara, Michaela Schmitz and Anne Schwarz, 153–185. Potsdam: Universitätsverlag.

Petrova, Svetlana and Michael Solf
2008 Rhetorical relations and verb placement in early Germanic. A cross linguistic study. In ‘Subordination’ vs. ‘coordination’ in sentence and text – from a cross-linguistic perspective, eds. Cathrine Fabricius-Hansen and Wiebke Ramm, 329–351. Amsterdam / Philadelphia: John Benjamins Publishing Company.

Petrova, Svetlana and Michael Solf
2009 Die Entwicklung von Verbzweit im Fragesatz. Die Evidenz im Althochdeutschen. Beiträge zur Geschichte der deutschen Sprache und Literatur 131 (1):6–49.

Petrova, Svetlana and Michael Solf
2010 Pronominale Wiederaufnahme im ältesten Deutsch. Personal- vs. Demonstrativpronomen im Althochdeutschen. In Historische Textgrammatik und historische Syntax des Deutschen: Traditionen, Innovationen, Perspektiven, eds. Arne Ziegler and Christian Braun, 339–365. Berlin: Walter de Gruyter.

Petrova, Svetlana, Michael Solf, Julia Ritz, Christian Chiarcos and Amir Zeldes
2009 Building and using a richly annotated interlinear diachronic corpus: the case of Old High German Tatian. In Natural Language Processing for Ancient Languages, eds. Joseph Denooz and Serge Rosmorduc, special issue of Traitement Automatique des Langues / Natural Language Processing 5 (2):47–71.

Prince, Ellen F.
1981 Toward a Taxonomy of Given-New Information. In Radical Pragmatics, ed. Peter Cole, 223–255. New York: Academic Press.

Ramers, Karl Heinz
2005 Verbstellung im Althochdeutschen. Zeitschrift für Germanistische Linguistik 33:78–91.

Ramm, Wiebke
this vol. Discourse-structural salience from a cross-linguistic perspective: Coordination and its contribution to discourse (structure). this volume, 143–173.

Ries, John
1880 Die Stellung von Subject und Prädicatsverbum im Heliand. Nebst einem Anhang metrischer Excurse: Quellen und Forschungen zur Sprach- und Culturgeschichte der germanischen Völker. Strassburg / London: Karl J. Trübner.

Robinson, Orrin W.
1994 Verb-First Position in the Old High German Isidor Translation. Journal of English and Germanic Philology 93:356–373.

Sasse, Hans-Jürgen
1995 “Theticity” and VS order: A case study. Sprachtypologie und Universalienforschung 48, 1/2:3–31.

Schrodt, Richard
2004 Althochdeutsche Grammatik II. Syntax: Sammlung kurzer Grammatiken germanischer Dialekte. A. Hauptreihe, Nr. 5/2. Tübingen: Max Niemeyer.

Simmler, Franz
1998 Makrostrukturen in der lateinisch-althochdeutschen Tatianbilingue. In Deutsche Grammatik. Thema in Variationen. Festschrift für Hans-Werner Eroms zum 60. Geburtstag, eds. Karin Donhauser and Ludwig M. Eichinger, 299–335. Heidelberg: Carl Winter.

Sonderegger, Stefan
2003 Althochdeutsche Sprache und Literatur. Eine Einführung in das älteste Deutsch. Darstellung und Grammatik. 3. durchgesehene und wesentlich erweiterte Auflage. Berlin, New York: Walter de Gruyter.

Tomaselli, Alessandra
1995 Cases of Verb Third in Old High German. In Clause Structure and Language Change, eds. Adrian Battye and Ian Roberts, 345–369. New York / Oxford: Oxford University Press.

Zeldes, Amir, Julia Ritz, Anke Lüdeling and Christian Chiarcos
2009 ANNIS: A Search Tool for Multi-Layer Annotated Corpora. In Proceedings of Corpus Linguistics 2009, July 20–23, Liverpool, UK.
Part III. Beyond purely linguistic salience
Visual salience and the other one
John D. Kelleher
This paper describes a salience based approach to visually situated reference resolution. The framework uses the relationship between referential form and preferred mode of interpretation as a basis for a weighted integration of linguistic and visual salience scores for each entity in the multimodal context. The resulting integrated salience scores are then used to rank the candidate referents during the resolution process, with the highest-scoring candidate selected as the referent. One advantage of this approach is that the resolution process occurs within the full multimodal context, insofar as the referent is selected from a full list of the objects in the multimodal context. As a result, situations where the intended target of the reference is erroneously excluded, due to an individual assumption within the resolution process, are avoided.

1. Introduction
In a dialog, human participants expect their discourse partner to construct and maintain a model of the evolving discourse context. This discourse model provides a context against which the references in the dialog can be understood. A referring expression is a natural language expression that denotes an entity, called a referent, in the discourse context. Each referring expression introduces a representation into the semantics of its utterance and this representation must be bound to an element in the context for the utterance’s semantics to be fully resolved. Situated dialog occurs within a shared visual context. In a situated dialog, a referring expression may denote not only entities introduced through the linguistic interaction but also entities within the spatio-temporal context of the dialog. As a result, the scope of the discourse model must expand to include both linguistic and perceptually available entities and the resolution process must be able to match linguistic referring expressions to both linguistic and perceptual entities. Referring expressions come in a variety of forms, including definite descriptions, indefinites, pronouns, and demonstratives. Referring expressions that access
11.7: M: I see an engine_i and a boxcar_j both at Elmira_k
12.1: S: right
13.1: M: this looks like the best thing to do
13.2: M: so we should get
13.3: M: ... the eng / engine_i to picks up the boxcar_j
13.4: M: and head for Corning_l
13.5: M: ’s that sound reasonable
14.1: S: sure
Figure 1. Excerpt from dialog d91-1.1 of the TRAINS-91 corpus.
Figure 2. Map of TRAINS Domain
a representation in the linguistic context are interpreted anaphorically. Referring expressions that access a representation of an object that has not previously been referred to in the dialog but has entered the context through a non-linguistic modality (such as vision) are interpreted exophorically. The dialog excerpt listed in Figure 1, taken from the TRAINS corpus (Allen and Schubert 1991; Heeman and Allen 1995), illustrates the distinction between anaphoric and exophoric references. The excerpt is taken from a collaborative dialog between two participants, M and S, who are trying to ship goods within a railroad freight system. Figure 2 illustrates the schematic representation of the railroad freight system that provided the visual context for the dialog. In this example, the indices i, j, k and l indicate that all the referring expressions marked by a particular index refer to the same entity. The references an engine, a boxcar, and Elmira in 11.7 and Corning in 13.4 are examples of exophoric references. The entities these expressions denote have not been previously mentioned in the dialog. As a result, these references must be resolved relative to a set of representations in the context model that
entered the model via the non-linguistic modalities, in this instance the visual context of the dialog. By contrast, the references the engine and the boxcar in 13.3 are examples of anaphoric references. The reference the engine can be resolved relative to the linguistic context by binding it to the representation of an engine introduced to the linguistic context by the resolution of 11.7. Similarly, the reference the boxcar can be resolved relative to the linguistic context by binding it to the representation of a boxcar introduced by the resolution of 11.7. Most forms of referring expression have a preferred mode of interpretation, either anaphoric or exophoric. For example, pronouns are typically interpreted anaphorically. However, there is no one-to-one relationship between form and mode of interpretation. For example, definite descriptions can be used either anaphorically or exophorically. Indeed, the two most common cases of definite descriptions in the TRAINS corpus of situated dialogue were anaphoric and exophoric definites (Poesio 1993). One consequence of the one-to-many relationship between referential form and mode of interpretation is that a multimodal reference resolution process should define a strategy to deal with cases where different modes of interpretation are suggested for the same reference. One solution to this issue is to define a preference ordering over the different interpretation rules. In Sect. 2 several reference resolution frameworks that adopt such an approach are reviewed: Poesio (1993); Kievit et al. (2001); Salmon-Alt and Romary (2001); Landragin and Romary (2003); Gorniak and Roy (2004); Kelleher et al. (2005). In contrast with these rule based approaches, in this paper, we develop a probabilistic framework that addresses the issue of selecting between different modes of interpretation through a saliency based modelling of the attentional spread across the set of entities in the discourse domain.
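The core idea of such a saliency based selection, weighting per-modality salience scores according to the referential form and then picking the top-scoring entity, can be illustrated with a small Python sketch. It is not the implementation described in this paper: the form labels, the weights in FORM_WEIGHTS, the salience values in context and the function name resolve are all invented for illustration, and normalising the weighted sums so that they add up to one is just one simple way of reading the scores as probabilities.

```python
from typing import Dict, Tuple

# Hypothetical weights: how strongly each referential form draws on each
# modality. The numbers are invented for illustration only.
FORM_WEIGHTS: Dict[str, Dict[str, float]] = {
    "pronoun": {"linguistic": 0.8, "visual": 0.2},
    "definite": {"linguistic": 0.5, "visual": 0.5},
    "demonstrative": {"linguistic": 0.2, "visual": 0.8},
}


def resolve(
    form: str, salience: Dict[str, Dict[str, float]]
) -> Tuple[str, Dict[str, float]]:
    """Return the top-ranked entity and the normalised attention scores."""
    weights = FORM_WEIGHTS[form]
    # Weighted integration of the per-modality salience scores of each entity.
    raw = {
        entity: sum(w * scores.get(modality, 0.0) for modality, w in weights.items())
        for entity, scores in salience.items()
    }
    total = sum(raw.values())
    attention = {e: (s / total if total else 0.0) for e, s in raw.items()}
    return max(attention, key=attention.get), attention


# Invented per-modality salience scores for three entities of the TRAINS scene.
context = {
    "engine": {"linguistic": 0.9, "visual": 0.3},
    "boxcar": {"linguistic": 0.4, "visual": 0.6},
    "Corning": {"linguistic": 0.0, "visual": 0.8},
}

for form in FORM_WEIGHTS:
    referent, scores = resolve(form, context)
    print(form, "->", referent)
```

With these made-up numbers, the linguistically weighted forms select the previously mentioned engine, whereas the visually weighted form selects Corning, which has only visual salience. Because every entity in the context is scored, no candidate is excluded in advance by a rule about the mode of interpretation.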
This attention based approach is inspired by psycholinguistic findings (see Sect. 2) that point to a strong interaction between attention and linguistic reference. We will use the concept of salience to describe the factors and associated processes that direct attention. The framework consists of a set of salience models, one for each modality, and a reference resolution process that, for each entity in the discourse context, computes an overall attention score by integrating the scoring of that entity by each of the salience models. The attention score assigned to an entity represents the probability of that entity being the intended referent of the referring expression. Thus, the entity with the highest overall attention score after the processing of a referring expression is selected as the referent for that reference.

The paper is structured as follows: Sect. 2 reviews related work; Sect. 3 describes the reference resolution framework; Sect. 4 describes an implementation of the framework; Sect. 5 contains a worked example illustrating how the framework functions; the paper finishes, in Sect. 6, with conclusions and future work.

2. Related work
In recent years a number of psycholinguistic experiments have pointed to the interaction between language and vision. For example, Spivey-Knowlton et al. (1998) and Tanenhaus et al. (1995) indicate that language comprehension affects visual attention. More recently, the interaction between visual attention and linguistic reference has been highlighted. Studies, such as Duwe and Strohner (1997) and Strohner et al. (2000), have shown that people often use perceptual salience to resolve linguistic references. These experimental findings support attention based theories of discourse processing. Grosz (1977) is arguably the seminal work on language and vision integration. Grosz’s work highlighted that attention constrained and structured the processing of discourse. Moreover, Grosz was the first to observe the relationship between focus of attention and the use of exophoric definite descriptions: when an object is in the current mutual focus of attention it can be referred to by means of a definite description even though other objects fulfilling the description have been introduced into the linguistic discourse or are present in the shared visible context. Building on this work, Grosz and Sidner (1986) developed a focus stack model of global discourse attentional state. According to this model the common ground1 can be divided into three parts: the linguistic structure, which contains information about the linguistic structure of utterances in the dialog; the intentional structure, which contains information about the goals of the participants in the conversation; and the attentional structure, which contains information about the objects introduced into the discourse and their relative salience. Furthermore, due to attentional constraints, discourse is segmented or chunked and when a definite description is used anaphorically, the only antecedents2 considered are those in the same discourse segment. 
Assuming Grosz and Sidner’s (1986) focus stack model to be generally correct as a model of global discourse structure,³ the issue of how focus of attention and reference interact within a discourse segment must still be addressed.
1. The dialog participants’ mutually developed public view of what they are talking about.
2. The antecedent of an anaphoric reference is the representation of the reference’s referent that was introduced to the discourse model by a prior referring expression.
3. For alternate models see Hobbs (1985); Mann (1987); Asher and Lascarides (2003).
Several theories of discourse reference have attempted to address this issue by providing accounts of the relationship between types of referential expressions on the one hand, and degrees of mental activation of discourse referents on the other (e.g. Alshawi 1987, Ariel 1990, Gundel et al. 1993, Hajicová 1993, Lappin and Leass 1994, Grosz et al. 1995).⁴ A common theme among these accounts is that referential expressions need more coding material as the referent is less activated. However, none of these models explicitly accommodates multimodal contexts.
Poesio (1993) reformulates the attentional model of Grosz and Sidner (1986) in situation theoretic terms. Interestingly, Poesio’s framework separates the attentional common ground into several anchoring resource situations. For example, one anchoring resource is called the discourse situation and consists of a record of what has been said. This anchoring resource is used to interpret anaphoric references. Another anchoring resource situation, called the situation of attention, models the subset of information in the visual field of the discourse participants that they are attending to and is used to interpret exophoric⁵ definite descriptions. Furthermore, he defines rules within a default logic, called principles for anchoring resource situations, that predict whether a definite description is going to be interpreted anaphorically or exophorically. However, one of the issues with this approach is how to deal with conflicting defaults. Consequently, the framework cannot handle situations in which two principles of anchoring resource situations apply, one suggesting an anaphoric interpretation and the other an exophoric interpretation.
Many computational frameworks for multi-modal reference resolution have also been developed. Recent systems that focus on multi-modal reference resolution include: Kievit et al. (2001), Salmon-Alt and Romary (2001), Landragin and Romary (2003), Gorniak and Roy (2004) and Kelleher et al. (2005).
4. See Kruijff-Korbayová and Hajicová (1997) for a comparison of several of these approaches.
5. Poesio uses the term visible situation use to describe exophoric definite descriptions.
Kievit et al. (2001) define separate resolution strategies for each form of referring expression. A strategy consists of one or more resolution steps applied in a predefined order. A resolution step consists of 4 stages: (1) the selection of possible referents from a single sub-context (dialog, visual domain, etc.), (2) the filtering of this set of candidates, (3) the ordering of the candidates based on saliency, (4) an evaluation of the result. The algorithm halts as soon as one of the resolution steps finds a unique object or finds several objects and cannot choose which is the intended one. This approach is equivalent to a preference
ordering being defined over the different modes of interpretation for each form of reference. One issue with this approach is that the set of candidates considered during any one resolution step is constrained to the set of entities within the sub-context the resolution step uses to construct the initial set of candidates. As a result, the system cannot recognise situations where a reference may be ambiguous between two entities in different sub-contexts, and, consequently, it may resolve a reference incorrectly rather than initiate a clarification process. Gorniak and Roy (2004) focus on the resolution of references containing spatial descriptions. They propose a feed-forward filtering process to reference resolution. In their framework, each lexical item in the system’s lexicon is associated with one or more composer functions. A composer function takes one or more candidate referents as input and filters this set of candidates by computing how well each of the candidates fulfils the semantic model defined for the lexical term. Reference resolution is carried out by chaining the composer functions associated with the lexical terms in the reference together, i.e. the filtered set of candidates output by one composer function is used as the input set by the next composer function in the chain. Gorniak and Roy note that this strategy can fail if one of the composer functions excludes the target object from the set of candidates. For example, when interpreting “the leftmost one in the front” the composer for “leftmost” selects the leftmost objects in the scene, not including the obvious example of “front” that is not a good example of “leftmost”. The reference resolution frameworks presented in Salmon-Alt and Romary (2001), Landragin and Romary (2003) and Kelleher et al. (2005) use the notion of a reference domain. A reference domain is a structured contextual subset of the multi-modal dialog context. 
Reference domains are created in the context model due to perceptual or linguistic events or conceptual knowledge and are intended to reflect the mental representation of the event they model. In these frameworks the resolution process involves: (1) the construction of an underspecified reference domain, using templates associated with the form of the reference given; (2) the unification of this underspecified domain with a suitable reference domain within the context model; (3) the selection of one of the elements within the unified reference domain to function as the referent. However, similar to the frameworks proposed in Kievit et al. (2001) and Gorniak and Roy (2004), there is the potential for these frameworks to overcommit to a particular subset of the context during the resolution process. As the resolution process occurs within a sub-context, whose selection is at least partially driven by the form of the reference being interpreted, if the wrong reference domain is selected the intended target object and/or plausible distractor referents, that may indicate the need for reference clarification, may be excluded from consideration.
3. Approach
Resolving a referring expression involves two main tasks:
1. creating and maintaining a model of the discourse context (this model should contain representations for all the entities that are available for reference);
2. matching/binding the representation introduced by a given referring expression to an element (or elements) in the set of possible referents.
The matching or binding of a referring expression’s representation to an element in the discourse model depends to a large extent on how the system searches the model. The multi-modal frameworks reviewed in Sect. 2 search the discourse model by incrementally identifying a subset of the model that fulfils a set of constraints defined by the form of reference being resolved and then selecting an element within that subset as the referent. Building on the psycholinguistic evidence and theoretical models of discourse, reviewed in Sect. 2, that relate attention and linguistic reference, we propose a reference resolution framework that treats a given referring expression as a set of instructions that directs how the spread of attention across the set of objects within the discourse context should be modified before the selection of the referent. The binding of the representation of the referring expression consists of selecting the entity with the highest attention within the updated context as the referent.
A concept closely related to attention is salience. In this paper, the concept of salience is used to describe the factors and associated processes that direct attention. From a computational perspective, a model of an entity’s salience within a context predicts the amount of attention that will be given to that entity. Moreover, the mechanism that drives the focusing of attention towards the intended referent during the interpretation of a referring expression is the aligning of the parameters of the salience models underpinning the framework with the features described within the reference.
For example, interpreting a referring expression such as the blue house will result in all the blue entities in the context becoming more salient. Once the salience models underpinning the framework have been updated to reflect the selection preferences encoded in the referring expression, the next stage in the interpretation process is to integrate the salience scores for each entity in the discourse model. This integration enables the framework to predict the overall focus of attention within the context provided by the referring expression. Reflecting the relationship between the form of referring expression and preferred mode of interpretation, noted in Sect. 1, the integration process weights the salience scores of an entity before integration. Equation 1 defines how the
linguistic and visual salience models are integrated. The weighting factors α and β accommodate the preferential relationship between the form of referring expression and the mode of interpretation. These weights range in value between 0 and 1. As an example of the possible values these weights might take, when resolving a pronominal reference, setting α = 0.1 and β = 0.9 would bias the resolution process towards entities with a high linguistic salience. Ideally, however, these weights should be derived from an empirical analysis of the preferred mode of interpretation for each form of reference. The resulting integrated salience scores are then used to rank the candidate referents during the resolution process, with the highest-scoring candidate selected as the referent.

$$IS = \frac{\alpha \cdot VS + \beta \cdot LS}{2} \qquad (1)$$

The framework distinguishes three stages of salience integration.
Stage 1 This stage computes the salience of the entities in the context prior to the processing of a given referring expression. It includes the basic visual salience (i.e. the prominence of an entity due to bottom-up visual cues) and linguistic salience (i.e. the prominence of an entity due to previous discourse) of an object.
Stage 2 This stage computes the salience of each entity within the context provided by the referring expression being resolved. For each entity two salience scores result from this stage of processing: a reference relative visual salience and a reference relative linguistic salience. These salience scores are computed by integrating each entity’s basic visual salience and basic linguistic salience with a rating of how well the entity fulfils the description provided in the referring expression.
Stage 3 For each entity in the context this stage computes an overall attention score by integrating the entity’s salience scores that resulted from Stage 2.
This overall salience is computed using a weighted integration, where the weights used reflect the biasing associated with different forms of reference toward a particular information source.
The flow of information during reference resolution is from Stage 1 to Stage 3. Algorithm 1 lists the basic steps in reference resolution.
Algorithm 1 Reference Resolution Algorithm
1. compute the reference relative saliences for each object in the context
2. compute the attention score for each object in the context
3. return the object with the highest attention score as the referent
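As a minimal sketch of Equation 1 and the final step of Algorithm 1, the following Python fragment integrates two salience scores and picks the highest-scoring candidate. The `Candidate` type, the function names and the example scores are hypothetical illustrations, not part of the LIVE system; the weights α = 0.1 and β = 0.9 are the pronominal example values mentioned in the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    visual: float      # reference relative visual salience (VS)
    linguistic: float  # reference relative linguistic salience (LS)


def integrated_salience(vs: float, ls: float, alpha: float, beta: float) -> float:
    """Equation 1: IS = (alpha * VS + beta * LS) / 2."""
    return (alpha * vs + beta * ls) / 2


def resolve(candidates, alpha: float, beta: float) -> Candidate:
    """Algorithm 1, step 3: return the candidate with the highest score."""
    return max(
        candidates,
        key=lambda c: integrated_salience(c.visual, c.linguistic, alpha, beta),
    )


# Illustrative scores only (assumed, not taken from the chapter's tables).
candidates = [
    Candidate("H1", visual=0.2, linguistic=0.0),
    Candidate("H2", visual=0.5, linguistic=0.9),
    Candidate("H3", visual=0.3, linguistic=0.1),
]
pronoun_referent = resolve(candidates, alpha=0.1, beta=0.9)
```

Because every candidate is scored before the maximum is taken, no entity is filtered out early, which is the property the approach relies on.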
One advantage of this approach is that the resolution process occurs within the full multi-modal context of the dialog, in so far as the referent is selected from a full list of the objects in the multi-modal context ordered by a model of integrated salience. Consequently, none of the objects in the context are excluded from consideration. As a result, situations where the intended target of the reference is erroneously excluded, due to an individual assumption within the resolution process, are avoided. Also, the framework can recognise cross-modal ambiguity by comparing the integrated salience of the primary candidate with the integrated salience of all the other objects in the context. In these ambiguous cases the initiation of a clarification dialog may be a better system response than the selection of the primary candidate referent. By contrast, many of the previous multi-modal resolution frameworks exclude entities in the multimodal context model from consideration before the selection of the referent. In some cases, for example Kievit et al. (2001); Salmon-Alt and Romary (2001); Landragin and Romary (2003); Kelleher et al. (2005), the initial set of candidate referents is restricted to a sub-set of the context based on preferences with respect to the mode of interpretation relative to the form of reference. In other frameworks, for example Gorniak and Roy (2004), candidate referents are incrementally excluded from consideration as the resolution process progresses due to the sequential manner in which the semantics of the terms within the reference are processed.
Moreover, from a functional perspective this approach has the advantage of modularity and the potential to accommodate learning within the system. The modularity of the framework stems from the fact that the only information required by the resolution process from each of the information sources (language and vision) within the context is the salience score for each entity.
As a result, the resolution process is, to a large extent, decoupled from the representations and processes used within the linguistic and visual context models. The learning aspect of the system arises from the ease (relative to rule-based approaches) with which the integration weightings associated with a particular form of reference could be updated, for example, using machine learning techniques such as reinforcement learning. Finally, from a cognitive perspective, an attention-based model fits the theoretical and psychological data that point to the role of attention within human reference resolution.
4. Implementation
The resolution framework described in Sect. 3 has been used to update the LIVE system’s (Kelleher et al. 2005) approach to reference resolution. In the following sections the data structures, salience algorithms and reference resolution algorithms used by the updated system are described.
4.1. Data Structures
The basic data structure used by the framework is called a coreference class. Each coreference class stores the saliency information for one object in the context model. Figure 3 illustrates the internal structure of a coreference class. The coreference class id is a unique string identifier. Each coreference class contains components for storing the basic and reference relative visual and linguistic salience scores and the integrated salience scores for the object the class represents in the context model.

id = String value
visual salience
   a) basic = [0 … 1]
   b) reference-relative = [0 … 1]
linguistic salience
   a) basic = [0 … 1]
   b) reference-relative = [0 … 1]
integrated salience = [0 … 1]
Figure 3. A Coreference Class
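The structure in Figure 3 could be represented along the following lines. This is only a sketch: the field names and the range check are assumptions made for illustration, and the chapter does not say how the LIVE system stores these values internally.

```python
from dataclasses import dataclass


@dataclass
class CoreferenceClass:
    """One record per object in the context model (after Figure 3)."""
    id: str                            # unique string identifier
    basic_visual: float = 0.0          # basic visual salience
    reference_visual: float = 0.0      # reference relative visual salience
    basic_linguistic: float = 0.0      # basic linguistic salience
    reference_linguistic: float = 0.0  # reference relative linguistic salience
    integrated: float = 0.0            # integrated salience

    def __post_init__(self):
        # Every salience component is documented as lying in [0, 1].
        for name, value in vars(self).items():
            if name != "id" and not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


house = CoreferenceClass(id="H2", basic_visual=0.4)
```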
New coreference classes are added to the context model as a result of visual processing. Each time an object is detected in the visual scene the context model is queried for the coreference class representing the object. If there is no coreference class for the object, a new coreference class is created and is assigned the id used by the vision processing. The basic visual salience component is initialised to the value created by vision processing when the object was detected. This is updated after each scene is rendered. All the other salience scores are initialised to 0. These components are updated after each utterance has been processed. Coreference classes are removed from the context model when both their basic visual and linguistic saliences fall below a threshold (.0001). In the following sections the algorithms that provide and use the information stored in these structures are described.
4.2. Modelling Basic Visual Salience
Most computational models of visual attention focus on bottom-up processing; see Koch and Itti (2001) and Heinke and Humphreys (2004) for reviews. In most of these models several feature maps (such as colour, intensity etc.) are
computed in parallel across the visual field and these are then combined into a single saliency map. Then a selection process deploys attention to locations in decreasing order of salience. In Kelleher and van Genabith (2004) a simple model of visual salience (based on object size and centrality relative to a focus of visual attention) was presented. In this paper we adopt this model to capture the information entering the discourse through vision.
The visual salience algorithm uses a false-colouring technique. Each object in the simulation is assigned a unique colour or vision-id. This colour differs from the normal colours used to render the object in the world; hence the term false colouring. Each frame is rendered twice: firstly, using the objects’ normal colours, textures and shading, and secondly, using the vision-ids. The first rendering is on screen (i.e. the user sees it); the second rendering may be off screen. After each frame is rendered, a bitmap image of the false colour rendering is created. The bitmap is then scanned and a list of the colours in the image is created. Using this list the system can recognise which objects are visible and which are not. Moreover, the system can identify, at the pixel level, the area covered by each object in the scene. This pixel information is used to compute the basic visual saliency of each object. Mimicking the spread of visual acuity across the retina, the algorithm weights each pixel in the image based on its distance from the point of visual focus. The weighting is computed using Equation 2. In this equation, D equals the distance between the pixel being weighted and the point of focus, and M equals the maximum distance between the point of focus and any point on the border of the image. The point of focus can be determined using eye tracking technology to compute the user’s gaze at each scene rendering.
However, if eye tracking is not being used the point of focus defaults to the centre of the image or to the centre of the silhouette of the last object referred to. Algorithm 2 lists the procedure used to compute basic visual saliency and to update the coreference classes. For each scene processed the algorithm returns a list of objects in the scene, each with a relative salience between 0 and 1, with 1 representing maximum salience.
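The pixel weighting and the update procedure described above can be sketched as follows. The weighting is one minus the ratio of D to M + 1, as defined in the text for Equation 2. The representation of the false-colour bitmap as a nested list of vision-ids (with `None` for background), the function names and the dictionary-based bookkeeping are all assumptions made for this illustration.

```python
import math


def pixel_weight(pixel, focus, width, height):
    """Weighting = 1 - D / (M + 1), with D and M as defined for Equation 2."""
    d = math.dist(pixel, focus)
    # M is the largest distance from the focus to the image border, which is
    # always reached at one of the four corners of a rectangular image.
    corners = [(0, 0), (width - 1, 0), (0, height - 1), (width - 1, height - 1)]
    m = max(math.dist(focus, corner) for corner in corners)
    return 1 - d / (m + 1)


def average_weights(id_map, focus):
    """Average pixel weighting per vision-id in a false-colour bitmap."""
    height, width = len(id_map), len(id_map[0])
    totals, counts = {}, {}
    for y, row in enumerate(id_map):
        for x, vision_id in enumerate(row):
            if vision_id is None:  # background pixel, no object
                continue
            w = pixel_weight((x, y), focus, width, height)
            totals[vision_id] = totals.get(vision_id, 0.0) + w
            counts[vision_id] = counts.get(vision_id, 0) + 1
    return {vid: totals[vid] / counts[vid] for vid in totals}


def update_basic_visual(previous, averages):
    """Halve old scores, add each visible object's share, then normalise."""
    total_aw = sum(averages.values())
    updated = {}
    for vid in set(previous) | set(averages):
        score = previous.get(vid, 0.0) / 2
        if vid in averages:
            score += averages[vid] / total_aw
        updated[vid] = score
    total = sum(updated.values())
    return {vid: s / total for vid, s in updated.items()} if total else updated


# A tiny 3x3 "bitmap": object "a" near the centre, object "b" in a corner.
bitmap = [["a", "a", None], [None, "a", None], [None, None, "b"]]
averages = average_weights(bitmap, focus=(1, 1))
salience = update_basic_visual({}, averages)
```

Objects that are no longer visible keep only half of their previous score at each update, so their salience decays geometrically until it falls below the removal threshold.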
$$\mathit{Weighting} = 1 - \frac{D}{M + 1} \qquad (2)$$

Algorithm 2 The basic visual salience algorithm.
Let TotalAW = 0, Totalbvs = 0
for each object Oi in the scene do
   AW(Oi) = average weighting of the pixels covered by Oi
   TotalAW = TotalAW + AW(Oi)
endfor
for each coreference class CRi do
   if CRi is the coreference class representing Oi then
      CRi.basic_visual = (CRi.basic_visual/2) + (AW(Oi)/TotalAW)
   else
      CRi.basic_visual = CRi.basic_visual/2
   endif
   Totalbvs = Totalbvs + CRi.basic_visual
endfor
for each coreference class CRi do
   CRi.basic_visual = CRi.basic_visual/Totalbvs
endfor

4.3. Modelling Basic Linguistic Salience
The basic linguistic salience of objects in the context is computed using an algorithm that is similar to that of Krahmer and Theune (2002). The algorithm is based on the ranking of the so-called forward looking centers (Cf) of an utterance. The set of forward looking centers of an utterance contains the objects referred
to in that utterance. This set is partially ordered to reflect the relative prominence of the referring expressions within the utterance. Grammatical roles are a major factor here, so that subject > object > other. The central component of the algorithm is a function sf that maps the objects in a domain D to the set {0, …, 1}, with the intuition that 0 represents non-salience and 1 maximal salience. Figure 4 defines the salience function sf used by the framework. The algorithm assumes that in the initial situation s0 all the objects in the domain are equally (not) salient: sf(s0, d) = 0 for all d ∈ D. It should be noted that, although this algorithm is inspired by the Centering framework of Grosz et al. (1995), it does deviate from Centering in so far as it is recursive and assigns salience scores to entities not mentioned in the immediately preceding utterance. In this aspect, it is similar to Hajicová’s (1993) framework. Also, it is not claimed that the function sf is the best way to assign linguistic salience. However, it does provide a reasonable, transparent and operational model of linguistic salience. Algorithm 3 defines the procedure used to update linguistic salience after an utterance has been processed.

Let Ui be a sentence uttered in state si, in which reference is made to {di, …, dn} ⊆ D. Let Cf(Ui) (the forward looking center of Ui) be a partial order defined over {di, …, dn} ⊆ D. Then the salience weight of objects in si+1 is determined as follows:

$$sf(s_{i+1}, d) = \begin{cases} 1 & \text{if } d = \mathit{subject}(U_i) \\ (sf(s_i, d)/2) + .5 & \text{if } d = \mathit{object}(U_i) \\ (sf(s_i, d)/2) + .25 & \text{if } d = \mathit{other}(U_i) \\ sf(s_i, d)/2 & \text{if } d \notin C_f(U_i) \end{cases}$$

Figure 4. Linguistic Salience Weight Assignment

Algorithm 3 The basic linguistic salience algorithm. BLS = basic linguistic salience.
Let TotalDS = 0
for each coreference class CRi do
   CRi.BLS = sf(sj, CRi)
   TotalDS = TotalDS + CRi.BLS
endfor
for each coreference class CRi do
   CRi.BLS = CRi.BLS/TotalDS
endfor
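The salience function of Figure 4 and the normalisation step of Algorithm 3 can be sketched as below. The function names, the dictionary representation of a state and the keyword arguments for grammatical roles are assumptions for illustration; the guard against a zero total is also an addition, since the printed algorithm does not say what happens before any entity has been mentioned.

```python
def update_linguistic_salience(previous, subject=None, obj=None, others=()):
    """Apply sf once: map state s_i to state s_(i+1) for one utterance."""
    updated = {}
    for entity, score in previous.items():
        if entity == subject:
            updated[entity] = 1.0
        elif entity == obj:
            updated[entity] = score / 2 + 0.5
        elif entity in others:
            updated[entity] = score / 2 + 0.25
        else:
            # Not in the forward looking centers: salience decays by half.
            updated[entity] = score / 2
    return updated


def normalise(scores):
    """Algorithm 3, second loop: divide each score by the total."""
    total = sum(scores.values())
    if total == 0:  # assumption: leave an all-zero state unchanged
        return dict(scores)
    return {entity: score / total for entity, score in scores.items()}


s0 = {"H1": 0.0, "H2": 0.0, "H3": 0.0}               # initial situation
s1 = update_linguistic_salience(s0, subject="H2")    # H2 mentioned as subject
s2 = update_linguistic_salience(s1, subject="H3")    # H3 mentioned next
basic_linguistic = normalise(s2)
```

The recursion is visible in `s2`: H2 is not mentioned in the second utterance, yet it keeps half of its earlier score rather than dropping to zero.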
4.4. Computing Reference Relative Saliences
The first step in resolving a reference is to compute, for each object in the context, the salience of that object within each modality within the context provided by the reference. These reference relative saliences are computed for each object by integrating each object’s basic visual and linguistic saliences with a rating of how well the object fulfils the selectional preferences⁶ encoded in the reference. The rating of how well an object fits the description provided by a reference is called an f-score. Two f-scores are computed for each object for each reference: a visual and a linguistic f-score. Currently, the system can rate objects relative to their type, colour, size⁷ and location.⁸ Table 1 lists the ratings ascribed to an object for each type of selectional preference. An object’s visual f-score is initialised to 0 and its ratings are integrated using addition. An object’s linguistic f-score is initialised to 1 and its ratings are integrated using multiplication. Once the f-scores have been computed the object’s reference relative visual and linguistic saliences are computed by integrating the f-scores with its basic visual and linguistic salience. Again, addition is used for integration in the visual context and multiplication is used for integration in the linguistic context. Consequently, an object’s reference relative visual salience will be > 0 if it fulfils any of the selectional preferences in the description, and its linguistic reference relative salience will be = 0 if it does not fulfil all of the selectional preferences in the description. Algorithm 4 lists the algorithm for computing the reference relative saliences.
6. The semantics of the descriptive terms used in the reference.
7. An object’s size rating is based on the number of pixels it covers relative to the other objects in the scene.
8. An object’s location rating is computed using the AVS model described in Regier and Carlson (2001).

Table 1. Selectional Preferences Scores
            Fulfils     Does not fulfil
TYPE        1           0
COLOUR      1           0
SIZE        [1 … 0]
LOCATION    [1 … 0]

Algorithm 4 Computing the reference relative saliences. RLS = reference relative linguistic salience, RVS = reference relative visual salience, BLS = basic linguistic salience, BVS = basic visual salience.
Let Totalrls = 0, Totalrvs = 0
for each coreference class CRi do
   f_score_linguistic = 1
   f_score_vision = 0
   for each selectional preference spj in the description do
      f_score_linguistic = f_score_linguistic ∗ rating(CRi, spj)
      f_score_vision = f_score_vision + rating(CRi, spj)
   endfor
   CRi.RLS = CRi.BLS ∗ f_score_linguistic
   Totalrls = Totalrls + CRi.RLS
   CRi.RVS = CRi.BVS ∗ f_score_vision
   Totalrvs = Totalrvs + CRi.RVS
endfor
for each coreference class CRi do
   CRi.RLS = CRi.RLS/Totalrls
   CRi.RVS = CRi.RVS/Totalrvs
endfor
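A small Python sketch of the f-score computation may help. It follows the prose description, in which the visual f-score is combined with the basic visual salience by addition and the linguistic f-score by multiplication; the printed listing of Algorithm 4 multiplies in both cases, so the additive visual step here should be read as an assumption. The scene description, the binary `rating` function and all names are hypothetical, and graded size and location ratings are left out.

```python
# Hypothetical scene: features per object, used only by this sketch.
SCENE = {
    "H1": {"type": "house", "colour": "red"},
    "H2": {"type": "house", "colour": "blue"},
    "H3": {"type": "house", "colour": "red"},
}


def rating(obj_id, preference):
    """Binary rating (as for TYPE and COLOUR in Table 1): 1 if fulfilled."""
    attribute, value = preference
    return 1.0 if SCENE[obj_id].get(attribute) == value else 0.0


def _normalise(scores):
    total = sum(scores.values())
    # Assumption: an all-zero modality is left as zeros instead of dividing.
    return {k: v / total for k, v in scores.items()} if total else dict(scores)


def reference_relative_saliences(basic, preferences):
    """basic maps id -> (BVS, BLS); returns normalised (RVS, RLS) dicts."""
    rvs, rls = {}, {}
    for obj_id, (bvs, bls) in basic.items():
        f_linguistic, f_vision = 1.0, 0.0
        for preference in preferences:
            r = rating(obj_id, preference)
            f_linguistic *= r   # any unfulfilled preference zeroes the score
            f_vision += r       # each fulfilled preference adds to the score
        rls[obj_id] = bls * f_linguistic
        rvs[obj_id] = bvs + f_vision
    return _normalise(rvs), _normalise(rls)


# "the blue house", with no previous discourse (all BLS are 0).
basic = {"H1": (0.3, 0.0), "H2": (0.4, 0.0), "H3": (0.3, 0.0)}
rvs, rls = reference_relative_saliences(
    basic, [("colour", "blue"), ("type", "house")]
)
```

With these illustrative inputs the blue house ends up with the largest visual share because it fulfils two preferences while the red houses fulfil one.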
4.5. Creating the Integrated Context and Selecting the Referent
The final step before the selection of the referent is the integration of each object’s reference relative saliencies. This is done using a weighted combination. The weightings are dependent on the form of referring expression (e.g. definite descriptions versus pronominal references) being resolved and reflect the preferential interpretation associated with each type of reference. For example, in general, a pronoun is used to refer to a referent that is prominent within the linguistic context. By contrast, a definite description can be used to refer to an object from the visual scene and to previously mentioned objects. Ideally, these weights would be set based on an empirical examination of a multimodal corpus. However, the system currently uses predefined weights for this integration. When resolving a definite description, visual and linguistic salience are integrated evenly. When resolving a pronominal reference, the integration weightings used are biased towards linguistic salience. Algorithm 5 defines the procedure used to construct the integrated context and select the referent. It also defines the mechanism used to check for ambiguous references. This ambiguity check uses a predefined confidence interval and simply checks that, within the context provided by the referring expression, the integrated salience of the object selected as the referent is sufficiently larger than that of the other objects in the context to ensure that the reference is not ambiguous. In situations where the ambiguity check fails the algorithm returns 0.
Algorithm 5 Constructing the integrated context and selecting the referent. RVS = reference relative visual salience, RLS = reference relative linguistic salience.
Let index = 0, max = 0, interval = 0.3
for each coreference class CRi in the context model do
   if reference = definite description then
      CRi.integrated = (CRi.RVS ∗ 0.5) + (CRi.RLS ∗ 0.5)
   elseif reference = pronominal reference then
      CRi.integrated = (CRi.RVS ∗ 0.1) + (CRi.RLS ∗ 0.9)
   endif
   if CRi.integrated > max then
      index = i
      max = CRi.integrated
   endif
endfor
for each coreference class CRj in the context model do
   if j <> index then
      if CRj.integrated > CRindex.integrated − interval then
         return 0 //Reference Deemed Ambiguous
      endif
   endif
endfor
return CRindex
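Algorithm 5 can be sketched in Python as follows. The weights (0.5/0.5 and 0.1/0.9) and the interval of 0.3 are the values given in the listing; the function name, the dictionary input and the use of `None` in place of the returned 0 for an ambiguous reference are choices made for this illustration only.

```python
# (visual weight, linguistic weight) per form of reference, as in Algorithm 5.
WEIGHTS = {"definite": (0.5, 0.5), "pronoun": (0.1, 0.9)}


def select_referent(saliences, form, interval=0.3):
    """saliences maps id -> (RVS, RLS). Returns (referent or None, scores)."""
    w_visual, w_linguistic = WEIGHTS[form]
    integrated = {
        obj_id: rvs * w_visual + rls * w_linguistic
        for obj_id, (rvs, rls) in saliences.items()
    }
    best = max(integrated, key=integrated.get)
    # Ambiguity check: every other object must trail the best candidate
    # by at least the confidence interval.
    for obj_id, score in integrated.items():
        if obj_id != best and score > integrated[best] - interval:
            return None, integrated
    return best, integrated


# Illustrative values only: a pronoun after H2 has just been mentioned.
clear, _ = select_referent(
    {"H1": (0.2, 0.0), "H2": (0.6, 1.0), "H3": (0.2, 0.0)}, "pronoun"
)
# Two visually similar candidates and no discourse history.
ambiguous, _ = select_referent(
    {"H1": (0.4, 0.0), "H2": (0.5, 0.0), "H3": (0.1, 0.0)}, "definite"
)
```

In the second call the two leading scores differ by far less than the interval, so a system built on this sketch would ask for clarification instead of committing to H2.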
5. Worked Example
The functioning of the framework can be illustrated using a worked example. The example uses Figure 5 as a visual context, and the utterances (1) through (4) as the example discourse.
1. make the blue house bigger
2. make it smaller
3. make the red house on the right bigger
4. make the other one bigger
The example will illustrate how the framework interprets a definite description, a pronominal reference, a locative expression, and an other-anaphoric definite description with a pronominal head. Within a visually situated discourse this last form of reference is particularly interesting, as the interpretation of words like other and one illustrates the need to closely interrelate visual and linguistic contexts. Other-anaphora occurs when a definite description contains the modifier other, e.g. the other house. Other designates an object that has been excluded from a specified or implied grouping. For example, the other house presupposes that the context contains a set of houses within which a subset has already been specified in some manner (e.g. the blue house) and the referent of the NP the other house is then selected from the set of houses that are not elements of this subset (e.g. the houses that are not blue). Importantly, interpreting an other-anaphoric NP may require both visual and linguistic information. The definition
Figure 5. Example visual context. H1,H3 = red house, H2 = blue house.
of the set of elements that other designates its referent as being excluded from often requires information from prior discourse, and the set of elements that the referent may be selected from may contain elements from the visual context that have not been mentioned previously.
It has long been observed that the nominal anaphor one can be resolved only in relation to the discourse context. However, it gains new importance where references of the form the green one or the other one get evaluated. For one thing, it is perfectly possible that while the resolution of the pronoun one is to a noun used in the antecedent discourse, the referent of the noun phrase (NP) belongs only to the visual context. Thus the red one can refer to a house that is visible in the scene but has not yet been mentioned, although it is only because the word house occurs in an earlier utterance that the red one can be interpreted as the red house.
Neither of these phenomena affects the weightings used to integrate the visual and linguistic salience. However, one-anaphora does affect the selectional preferences used in the computation of the overall salience of its NP, and other-anaphora affects the selection of its NP’s referent once the overall saliencies have been computed. When interpreting a one-anaphoric NP, the pronoun one is resolved to the most recent NP in the discourse that referred to the referent with the highest linguistic salience in the context model. It inherits all the selectional preferences specified in its antecedent NP that do not contradict selectional preferences specified in the one-anaphoric NP. For example, assuming the red house on the right is the most recent reference in the discourse, the reference the green one is interpreted as the green house on the right.
Other also functions as an anaphor of sorts: it presupposes that the context contains a set X of objects of the relevant kind and a proper subset Y of X that its referent is excluded from.
The referent of the NP containing other is then selected from the difference between these two sets, X − Y. Consequently, in order to resolve other-anaphora the subset of the context that other specifies its referent is excluded from (i.e. Y) must be defined. In this framework this subset is defined as the set containing the element with the highest overall referential salience computed relative to the selectional preferences encoded in the NP modified by other. Assuming a reasonable distribution of visual salience, the updating of linguistic salience and the constraint that an object’s reference based linguistic salience can only be > 0 if it fulfils all the selectional preferences encoded in the current NP impose a partial ordering on the reference resolution context:
1. objects that have previously been mentioned and that fulfil the selectional preferences encoded in the NP being interpreted (internally ordered by linguistic and visual salience)
2. objects that fulfil the selectional preferences encoded in the NP being interpreted (internally ordered by visual salience)
3. other objects in the context (internally ordered by visual salience)
Consequently, by excluding the most salient element in the reference resolution context, the most recently mentioned object that fulfils the selectional preferences encoded in the NP is excluded. For example, during the interpretation of a reference the other house, the most recently mentioned house would be excluded from the set of candidate referents and the reference would be resolved to the next most salient house in the context.
Table 2 lists the data computed by the framework during the different stages of this interaction. Rows 1, 2 and 3 present the visual and linguistic saliencies of the objects in Figure 5 before any commands are input; H2’s visual salience is higher than H1’s and H3’s because it is closer to the centre of the image, which is the default point of visual focus in the visual saliency algorithm.
Rows 4, 5 and 6 present the data computed during the interpretation of the blue house. This is a definite description, so visual and linguistic salience are weighted equally in the integration process. However, at this point, there has been no previous discourse, so each object’s overall salience is dependent on the interaction between its visual salience and its visual f-score. H2 fulfils two of the selectional preferences encoded in the reference: blue and house. Consequently, its visual f-score is 2. The other houses in the scene only fulfil the type selectional preference and their visual f-scores are 1. Primarily due to the difference in f-scores, H2 achieves the highest overall salience and is selected as the referent.
The asterisk in the overall salience column (os column) of row 5, H2’s row, highlights this. Rows 7, 8 and 9 list the data computed when the reference it is processed. Note that H2 has a linguistic salience of 1 because it was selected as the referent for the subject in the preceding utterance. The pronoun it does not encode any selectional preferences. Consequently, all the objects’ visual and linguistic f-scores are set to the default values. The biasing towards linguistic salience in the computation of the overall saliencies is apparent in the dominance of H2’s integrated salience and again it is selected as the referent. Rows 10 to 12 list the data computed when the red house on the right is processed. There are three selectional preferences in this reference (red, house and on the right) and the visual and linguistic f-scores are computed accordingly. As H2 does not fulfil all the selectional preferences, its linguistic f-score is 0. This
results in its linguistic salience being discounted in the resolution process. Consequently, H3 achieves the highest overall salience due to its dominant visual f-score. Rows 13 to 15 list the data computed during the processing of the other one. As noted in Sect. 4.5, the pronoun one inherits its selectional preferences from the most recent NP in the preceding utterance that referred to the referent with the highest linguistic salience in the context model. This results in the reference being interpreted as the other red house on the right. In Sect. 4.5, the effect of the modifier other on its NP was also presented. It does not introduce a selectional preference into the process of computing the overall saliencies; rather, it affects the selection of the referent once these saliencies have been computed. Consequently, the overall saliencies are computed using the same selectional preferences as the preceding reference: red, house and on the right. As a result, H1, H2 and H3 have the same visual and linguistic f-scores as before. However, as H3 was selected as the referent of the preceding subject NP, it now has the highest linguistic salience in the context (0.8889). Moreover, it has the maximum visual f-score. These two factors result in H3 achieving the highest overall salience. However, due to the modifier other in the reference, it is excluded as a candidate referent. The next most salient object is H1 and it is selected as the referent. H1’s higher overall ranking relative to H2 is due to two factors: (1) H2 did not fulfil all the selectional preferences, so its linguistic salience was discounted from the resolution process; (2) H1 fulfilled more of the selectional preferences than H2, which resulted in its visual reference based salience being higher. Figure 6 illustrates the visual context at the end of the interaction.
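The final selection step described above can be sketched in a few lines of Python. This is only an illustration, not the framework's implementation: the function name `resolve_other` is hypothetical, the excluded set Y is assumed to contain just the single most salient candidate, and the overall salience values are copied from rows 13 to 15 of Table 2.

```python
# Overall salience (os) values for "make the other one bigger",
# copied from rows 13-15 of Table 2.
overall_salience = {"H1": 0.1571, "H2": 0.0963, "H3": 0.7476}


def resolve_other(saliences):
    """Resolve an NP modified by 'other' (hypothetical helper).

    Assumption: the excluded set Y holds only the single most salient
    candidate; the referent is the most salient remaining candidate.
    """
    ranked = sorted(saliences, key=saliences.get, reverse=True)
    if len(ranked) < 2:
        raise ValueError("'other' needs at least two candidate referents")
    excluded, remaining = ranked[0], ranked[1:]
    return excluded, remaining[0]


excluded, referent = resolve_other(overall_salience)
print(f"excluded: {excluded}, referent: {referent}")
```

With the Table 2 values, H3 is excluded and H1 is returned, which matches the asterisk in row 13.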
Figure 6. The final state of the visual context.
Table 2. Salience scores computed during the example interaction. Acronyms: bvs = basic visual salience, bls = basic linguistic salience, vfs = visual f-score, lfs = linguistic f-score, rvs = reference based visual salience, rls = reference based linguistic salience, os = overall salience.
Row  Obj  bvs     bls     vfs     lfs     rvs     rls     os
Initial Context
1    H1   0.2812  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
2    H2   0.4376  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
3    H3   0.2812  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
make the blue house bigger
4    H1   0.2812  0.0000  1.0000  0.0000  0.2556  0.0000  0.2556
5    H2   0.4376  0.0000  2.0000  1.0000  0.4887  0.0000  0.4887*
6    H3   0.2812  0.0000  1.0000  0.0000  0.2556  0.0000  0.2556
make it smaller
7    H1   0.2879  0.0000  0.0000  1.0000  0.2879  0.0000  0.1438
8    H2   0.4348  1.0000  0.0000  1.0000  0.4348  1.0000  0.7124*
9    H3   0.2879  0.0000  0.0000  1.0000  0.2879  0.0000  0.1438
make the red house on the right bigger
10   H1   0.2812  0.0000  2.0000  0.0000  0.3259  0.0000  0.3259
11   H2   0.4376  1.0000  1.0000  0.0000  0.2054  0.0000  0.2054
12   H3   0.2812  0.0000  3.0000  1.0000  0.4687  0.0000  0.4687*
make the other one bigger
13   H1   0.1990  0.0000  2.0000  0.0000  0.3141  0.0000  0.1571*
14   H2   0.3477  0.1111  1.0000  0.0000  0.1925  0.0000  0.0963
15   H3   0.4534  0.8889  3.0000  1.0000  0.4833  1.0000  0.7476
6. Conclusions and Future Work
This paper presented an attention-based reference resolution framework for visually situated discourse. The framework uses a weighted integration of visual and linguistic attention to order the candidate referents within the context. The candidate with the highest integrated attention score is taken to be the referent. One advantage of this approach is that the resolution process occurs within the full multimodal context. As a result, situations where the intended target of the reference is erroneously excluded, due to an individual assumption within the resolution process, are avoided. Moreover, the system can recognise situations where attentional cues from different modalities make a reference potentially ambiguous. From a cognitive perspective, the framework meshes well with psycholinguistic results that point to the role of attention within human reference resolution processes.
Finally, it should be noted that the framework as it currently stands is intended to represent an abstract and preliminary attempt. Several issues need to be addressed if it is to be used as a component within a dialog system for less constrained contexts. In particular, the use of predefined weights for salience integration is overly simplistic. This issue could be addressed by using a machine learning algorithm, such as reinforcement learning, to automatically compute these weights. The visual and linguistic salience algorithms should also be improved. For dialog systems interfacing with virtual environments, the visual salience algorithm should be extended to at least handle attentional cues such as colour, motion and location of gaze. If the framework were to be used within a real-world system, such as a robot dialog system, a computer vision saliency algorithm, such as Itti and Koch (2000), could be adopted. The linguistic saliency algorithm should also be revised and extended. In particular, the relationship between the framework’s model of local level attention and a more global model of discourse structure, such as Grosz and Sidner’s focus stack model or Asher and Lascarides’ SDRT framework, should be clarified. Fortunately, the modular nature of the framework makes such modifications possible without major changes to the overall approach.
References
J.F. Allen and L.K. Schubert. 1991. The TRAINS project. Technical report, University of Rochester, Department of Computer Science.
H. Alshawi. 1987. Memory and Context for Language Interpretation. Cambridge University Press, Cambridge, UK.
M. Ariel. 1990. Accessing noun-phrase antecedents. London: Routledge.
N. Asher and A. Lascarides. 2003. Logics of Conversation. Cambridge University Press.
I. Duwe and H. Strohner. 1997. Towards a cognitive model of linguistic reference. Report: 97/1 – Situierte Künstliche Kommunikatoren 97/1, Universität Bielefeld.
P. Gorniak and D. Roy. 2004. Grounded semantic composition for visual scenes. Journal of Artificial Intelligence Research, 21:429–470.
B.J. Grosz and C.L. Sidner. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3):175–204.
B.J. Grosz, A.K. Joshi, and W. Weinstein. 1995. Centering: A framework for modelling local coherence of discourse. Computational Linguistics, 21(2):203–255.
B.J. Grosz. 1977. The Representation and Use of Focus in Dialogue Understanding. PhD thesis, Stanford University.
J.K. Gundel, N. Hedberg, and R. Zacharski. 1993. Cognitive status and the form of referring expression in discourse. Language, 69:274–307.
E. Hajicová. 1993. Issues of Sentence Structure and Discourse Patterns, volume 2 of Theoretical and Computational Linguistics. Charles University Press.
P.A. Heeman and J.A. Allen. 1995. The TRAINS 93 dialogues. Trains Technical Note 94-2, Department of Computer Science, University of Rochester.
D. Heinke and G. Humphreys. 2004. Computational models of visual selective attention: A review. In G. Houghton, editor, Connectionist Models in Psychology. Psychology Press.
J.R. Hobbs. 1985. On the coherence and structure of discourse. Technical Report CSLI-85-37. Center for the Study of Language and Information.
L. Itti and C. Koch. 2000. A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40(10–12):1489–1506.
J. Kelleher and J. van Genabith. 2004. Visual salience and reference resolution in simulated 3d environments. AI Review, 21(3-4):253–267.
J. Kelleher, F. Costello, and J. van Genabith. 2005. Dynamically structuring, updating and interrelating representations of visual and linguistic discourse context. Artificial Intelligence, 167(12):62–102, September 2005.
L. Kievit, P. Piwek, R.J. Beun, and H. Bunt. 2001. Multimodal cooperative resolution of referential expressions in the denk system. In H. Bunt and R.J. Beun, editors, Cooperative Multimodal Communication, Lecture Notes in Artificial Intelligence 2155, pages 197–214. Springer-Verlag, Berlin Heidelberg.
C. Koch and L. Itti. 2001. Computational modelling of visual attention. Nature Reviews Neuroscience, 2(3):194–203, March 2001.
E. Krahmer and M. Theune. 2002. Efficient context-sensitive generation of referring expressions. In K. van Deemter and R. Kibble, editors, Information Sharing: Reference and Presupposition in Language Generation and Interpretation. CSLI Publications, Stanford.
I. Kruijff-Korbayová and E. Hajicová. 1997. Topics and centers: A comparison of the salience-based approach and the Centering Theory. Prague Bulletin of Mathematical Linguistics, 67:25–50, Charles University, Prague, Czech Republic.
F. Landragin and L. Romary. 2003. Referring to objects through sub-contexts in multimodal human-computer interaction. In DiaBruck 7th Workshop on the Semantics and Pragmatics of Dialogue, Sept 4th-6th 2003, University of Saarland, Germany.
S. Lappin and H. Leass. 1994. An algorithm for pronominal anaphora resolution. Computational Linguistics, 20(4):535–561.
W.C. Mann and S.A. Thompson. 1987. Rhetorical Structure Theory: Description and construction of text structures. In G. Kempen, editor, Natural Language Generation: New Results in Artificial Intelligence, Psychology and Linguistics, pages 83–96. Nijhoff, Dordrecht.
M. Poesio. 1993. A situation-theoretic formalization of definite description interpretation in plan elaboration dialogues. In P. Aczel, D. Israel, Y. Katagiri, and S. Peters, editors, Situation Theory and its Applications, volume 3. CSLI.
T. Regier and L. Carlson. 2001. Grounding spatial language in perception: An empirical and computational investigation. Journal of Experimental Psychology: General, 130(2):273–298.
S. Salmon-Alt and L. Romary. 2001. Reference resolution within the framework of cognitive grammar. In Proceedings of the Seventh International Colloquium on Cognitive Science (ICCS-01), pages 284–299, Donostia, Spain.
M. Spivey-Knowlton, M. Tanenhaus, K. Eberhard, and J. Sedivy. 1998. Integration of visuospatial and linguistic information: Language comprehension in real time and real space. In P. Olivier and K.P. Gapp, editors, Representation and Processing of Spatial Expressions, pages 201–214. Lawrence Erlbaum Associates.
H. Strohner, L. Sichelschmidt, I. Duwe, and K. Kessler. 2000. Discourse focus and conceptual relations in resolving referential ambiguity. Journal of Psycholinguistic Research, 29:497–516.
M. Tanenhaus, M. Spivey-Knowlton, K. Eberhard, and J. Spivey. 1995. Integration of visual and linguistic information in spoken language comprehension. Science, 268:1632–1634.
Salience in hypertext: Multiple preferred centers in a plurilinear discourse environment
Birgitta Bexten
In this paper, I demonstrate that in hypertext, coherence relies on linguistic as well as paratextual salience marking. Linguistically marked entities, e.g. discourse-old ones, promise a coherent discourse connection within the current hypertext node, whereas paratextual link marks promise that the link-marked entity is coherently connected to the target node. Thus, both kinds of salience markers make it possible to predict the ‘aboutness’ of the subsequent discourse. Bridging the gap between linguistic centers and hyperlinks with findings concerning common paralinguistic ways of salience marking, like typographic marks of prosodic attributes, I propose an integrative description of the two kinds of salience marks within the framework of Centering.
1. Introduction
Hypertexts are special. Not only do they increasingly combine semiotic functions and create new ways of associative writing and reading. In addition, their mere tree- or networklike structure provides intellectual challenges for hypertext authors, readers and, of course, for discourse linguists. None of them can completely rely on already existing, polished conventions. Accordingly, all of them wrestle with new attempts to get a grip on text in hypertext. What are the criteria for an appropriate text unit? How can I combine those units to form a coherent whole? How can I mark – and recognise – various text connections, various link types? How can I decide where to read on when I come across a hyperlink? From a discourse linguistic point of view, one of the central questions is what a hypertext’s network structure does to the text. What remains the same if one compares it to traditional linear texts? What changes? Using the framework of Centering, in this paper, I demonstrate how hypertext differs from linear text in its capability and its need to provide multiple so-called preferred centers in a single utterance.
2. The starting point
Hypertexts, narrative just as well as scientific ones, go beyond the scope of traditional, linear texts. While traditional texts usually only come up with a single reading sequence,1 hypertexts are bound to provide a selection of different sequences. Without this possibility, neither treelike nor networklike hypertexts would be possible. This quality affects the text itself just as well as the hypertext’s author and reader. As far as the text is concerned, it has to adapt to the exceptional discourse structure. Due to the hyperlinks, the text splits up2 and becomes plurilinear.3 Consequently, to form a coherent whole, not only the text within every single hypertext node has to be coherent but also the connections between the linked nodes. For the hypertext author, this means that he has to arrange text and links very carefully. Especially so as he cannot tell for sure which units the reader has already perceived. This is only possible in extremely well-planned hypertexts. The easiest way to master a networked text surely is to cut off direct pronominal connections between the information units and to use referring nouns instead (Kuhlen 1991), but this way of granulating text is surely not desirable for every kind of hypertext. Especially authors of fictional hypertexts can fall back on direct utterance-connecting devices. Surely, the author can add hypertext-specific coherence cues like overviews, lists of currently available target units and so forth (Storrer 2002), but the readability of the hypertext mainly depends on coherent connections between the single units. The sequence of the hypertext units depends on each individual reader’s decision to follow hyperlinks and this might vary every time the hypertext is being read. From the reader’s point of view, this means that the plurilinear structure requires the reader to decide whether he wants to read on in the current unit or in the link’s target unit. 
1. Some texts do offer various reading sequences via footnotes but, in contrast to hypertext units, footnotes are not constitutive for the text (Nielsen 1995).
2. This description only applies to treelike hypertexts with a simple structure. In a more complex, networklike hypertext, the text not only bifurcates, but also merges at various points. Even if the reader relinearises (parts of) the text while reading, the text would structurally remain a network. For the purpose of this paper, however, the idea of recombining text strings can be left aside as it depends more on the backward-looking aspect of text connections than on the forward-looking one that is relevant here.
3. For the concept of plurilinear texts see Harweg (1974) and for an application of his model to hypertexts Bexten (2006).
The reader’s decision is influenced by his expectations about how the two offered text strings might proceed. He can deduce clues about this from the utterance he is reading at the very moment: usually the most salient entities of a
given utterance are most likely to be the topic of the next one. What does this mean for the two text strings in hypertext? As for the string announced by the hyperlink, the reader’s curiosity can be satisfied quite easily even without the need of following the link: he can expect the target unit to provide additional information about the link-marked entity. Hyperlinks are of course quite salient just because they differ from the surrounding text by their highlighted design: conventionally, links are visually marked, e.g. by a different colour.4 In addition to the link marks, there are also hints that concern the continuation of the current unit’s text. This is because not only non-linguistic, paratextual 5 features can increase an entity’s degree of salience but also linguistic ones. Which entities might be central in the ongoing text, the reader can tell from their grammatical role or from their information value, as has been widely described in Centering Theory. Obviously, this kind of discourse marking does not only apply to plurilinear hypertext structures but also to traditional linear discourses. With regard to hypertexts, however, I consider a combination of the two ways of salience marking a decisive factor for the establishment of coherence. The first type of salience facilitates coherent discourse processing in the current hypertext node, and the second for cross-node discourse continuation. What is of interest here, is the question in how far the available results from research on linguistic salience can be used to describe paratextual salience. To answer this question, it is necessary to find out, whether it is justifiable to equate linguistic and paratextual salience. For that purpose, I consider the framework of Centering a promising candidate, for it takes into account the forward-looking, promise-making character of text. In other words: it detects how the text enables the reader to conjecture about the text’s continuation. 
What is more, the various approaches to Centering indicate that salience can depend on varying factors. Therefore, to cover more than traditional linear texts, an integration of a paratextual way to mark salience is, as the following considerations will point out, not only possible but desirable. The basic Centering model, as originally developed by Grosz et al. (1995) and extended for example by Strube & Hahn (1999), only covers questions of linguistic salience. Therefore, though Centering is a thought-out method to identify the most salient discourse entity, for the purpose of my argumentation, its focus needs to be extended to non-linguistic ways of salience marking. After a short exemplification of the hypertextual double salience, I will illustrate the fundamental comparability of linguistic and non-linguistic salience in the framework of Centering.6
4. The question of labeling various link types will remain out of consideration here. For an overview of research on the impact of labeled hyperlinks on the cognitive load placed upon the reader see DeStefano & LeFevre (2007). For a model of bottom-up visual salience in a situated-dialog context, see Kelleher (this volume).
5. Genette (1987) defines as paratext those linguistic elements (index, title, etc.) and non-linguistic elements (fonts, illustrations, etc.) that accompany and present text in a medium-specific way.
3. Hypertext and multiple salient centers in the framework of Centering
3.1. A first example
Consider the following example (1) from the fictional hypertext “About time”.7 The sequence
(1) a.–c. “There are two more continents,” Mouth said. “Maybe more.” Wow. Continents?
is continued in the same hypertext unit with
    d. Those are really big, aren’t they?”
and in another unit8 with
    d.’ Tuber asked where these continents were.
6. For different adoptions of Centering Theory see for example Chiarcos, this volume, who incorporates a variant of Centering Theory into his Mental Salience Framework, and Kelleher, this volume, who computes basic linguistic salience by using an algorithm that is inspired by the Centering framework.
7. Swigart and Strange (2002, “Discoveries” in “Mouth’s Journey 40.000 Years Ago. Part 1.”).
8. Swigart and Strange (2002, “Continents” in “Mouth’s Journey 40.000 Years Ago. Part 1.”).
The discourse bifurcates due to the hypertext link Continents. It offers the reader a connection to a second text string in addition to the one in the current node. Obviously, the word Continents is salient because of its link marks. It is, however, also marked linguistically because of its information value “discourse-old”. In terms of Centering, this last kind of salience qualifies Continents as the preferred forward-looking center of the current utterance. This kind of salience guarantees that the reader will expect the next utterance to be about continents,
too; the link marks, on the other hand, call the reader’s attention to the fact that in the link’s target node he will find a subsequent text string about continents as well.9 Naturally, the two types do not always coincide in one and the same entity. If different entities are salient in the same utterance, one linguistic and one paratextual, the reader can expect that not only the discourse but also the topic bifurcates. The linguistically and the paratextually marked entities are equal insofar as both enable the reader to predict the ‘aboutness’ of subsequent utterances. Hence, both can be regarded as preferred centers. In hypertext, their promising10 character is far more prominent than in linear texts: linear texts allow the reader simply to read on while they draw his attention, more or less unnoticed, to the preferred centers. Hypertext forces the reader at least partly to give up this comparatively passive role. With every paratextually marked hyperlink, the hypertext obliges the reader’s attention to undergo a rapid oscillation between looking through and looking at the text (Bolter 1991), for hypertext offers two different promises about two separate text strings by two different kinds of salience. Text bifurcations in hypertext are founded on the possibility of providing a single utterance with various preferred centers. According to Grosz et al. (1995), in principle, every utterance can have more than one preferred center. Centers are constructs of the discourse: different discourse situations in which one and the same utterance may be made can lead to different centers. This multiplication of centers is nevertheless not on the same level as multiple preferred centers in hypertext. While in Grosz et al.’s argumentation it is a question of either / or, of various discourse situations, discourse in hypertext construes several parallel discourse situations.
In a networklike hypertext environment, i.e., in a discourse with a multitude of bifurcations and mergers, some utterances on all accounts structurally belong to different discourse situations at the same time. Therefore, multiple preferred centers are an essential presupposition for hypertextuality.
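The idea of several parallel preferred centers can be made concrete with a small data-structure sketch. It is purely illustrative and not part of the Centering literature: the class `Utterance`, its fields and the labels for the two continuations are hypothetical names, and the ranking of the forward-looking centers is assumed to be given.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """Hypothetical record of one utterance inside a hypertext node."""
    text: str
    cf_ranked: list                                   # forward-looking centers, most salient first
    link_marked: list = field(default_factory=list)   # entities carrying a hyperlink

    def preferred_centers(self):
        """Map each possible continuation to the entity predicting its 'aboutness'."""
        centers = {}
        if self.cf_ranked:
            # Linguistic salience: promise about the current node.
            centers["current node"] = self.cf_ranked[0]
        for entity in self.link_marked:
            # Paratextual salience: promise about the link's target node.
            centers[f"target node of '{entity}'"] = entity
        return centers


# Example (1): both kinds of salience coincide in one entity.
u1 = Utterance("Wow. Continents?", cf_ranked=["continents"], link_marked=["continents"])
print(u1.preferred_centers())

# Invented utterance in which the two kinds of salience diverge.
u2 = Utterance("Mouth showed Tuber a map.", cf_ranked=["Mouth", "Tuber", "map"], link_marked=["map"])
print(u2.preferred_centers())
```

In the second, invented utterance the topic bifurcates together with the discourse: the current node is expected to continue with Mouth, the link's target node with the map.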
9. Admittedly, not all connections between information units are as smooth as this one. Especially in scientific hypertexts, they are rather rare. For the purpose of my argumentation, however, a quantitative inventory is less important than highlighting the possibilities. In addition, the discussion below will show, that the observations from this neat example are transferable to less direct connections. 10. Genette (1987) ascribes all paratexts to some degree an illocutional character: they inform, demand, promise, etc. Hammwöhner (1997), who describes the changes that paratext undergoes in hypertext, points at the promising character of hypertext links.
Before going into detail about forward-looking centers in hypertext, I will give a brief overview of the aspects of Centering Theory which are relevant here, focussing primarily on the question of how preferred forward-looking centers can be identified.
3.2. The basic Centering model
The aspect of Centering Theory that is most relevant for the argumentation in this paper is that it examines the possibilities of discourse to give hints about its own continuation. For this purpose, Centering Theory focuses on the interaction between so-called centers of attention and the attentional state. It depends on the connection of these two whether a discourse is perceived as coherent or not. Centers of attention are those discourse entities a sequence of utterances is about. The attentional state dynamically focuses on the most salient center at each point of the progressing discourse. It has direct influence on the inference load placed upon the language user. Usually, a recipient can more or less predict how a discourse will proceed: people tend to continue speaking or writing about the same entity for a while by referring to it intersententially. The attentional state, then, continues centering around a single topic. In this case, the inference load is quite low. But when the discourse flips back and forth between different centers, the thematic progression becomes less predictable. The attentional state undergoes rough shifts, and the discourse is very likely to be perceived as less coherent. What is true for linear discourses definitely holds for connected hypertext units. If the topic that is announced by the link-marked entity is not presented in the link’s target node, the inference load placed upon the reader is even higher than with a rough shift in linear discourses, because the reader relies on the convention that hyperlinks are placed to announce additional information about the marked entity.
While a thematic shift in linear discourse is possible because several centers are eligible for being topicalised in the next utterance, in hypertext the link marks determine the topic of the connected utterance bindingly. The attentional state is directly related to the most salient center of a given utterance. Usually, it is this prominent discourse entity that is semantically linked to an entity of the following utterance. To describe the relation between these connected entities, Grosz et al. (1995) distinguish between forward- and backward-looking centers. The forward-looking centers, Cf, of an utterance Un are those discourse entities that can be referred to in the following utterance. A set of them can be found in every utterance. Accordingly, the backward-looking center, Cb, of an utterance Un+1 refers to one of the preceding forward-looking centers.11 Consequently, it cannot be part of the initial utterance. The following sequence
(2) a. John loves Mary.
    b. He wants to kiss her.
is connected by the anaphoric resolution of the forward-looking center John. John, in Grosz et al.’s model, is marked linguistically by its subject position and is therefore more salient than Mary. Naturally, the entity Mary, too, is a forward-looking center. But because of its lower degree of salience, it does not function as antecedent of the backward-looking center in (b): (b) is about John, not about Mary. If the second utterance were But she does not like him, we would be dealing with a rather smooth thematic shift which Grosz et al. would call retaining. The pronoun him still functions as the Cb, but the chance that the next utterance would be about John has clearly decreased. She becomes the most salient forward-looking center of the current utterance. Walker et al. (1998) state that the most salient forward-looking center, the preferred center, Cp, can be seen as a prediction about the Cb of the following utterance. Hence, identifying the Cp helps the recipient to form expectations about the continuation of the discourse. Marking preferred centers, therefore, has a fundamental impact on how smoothly readable a text becomes. The question of how forward-looking centers are marked and which entity is ranked highest in terms of salience is widely discussed in Centering Theory. Here, I will briefly introduce two approaches. One of them, namely the predominantly grammatical model by Grosz et al. (1995), represents the basic idea of Centering, while the other, namely Strube and Hahn’s (1999) functional model, seems most applicable to hypertext because of its flexibility.
3.3. The preferred center
According to Grosz et al., Cf ordering is considerably affected by the grammatical role and the surface position of an entity. The hierarchy they suggest is subject > object(s) > other, for in English, the surface position often corresponds to the grammatical role on the one hand and to topic position on the other.
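The grammatical ranking just described can be sketched as follows. This is a minimal illustration under stated assumptions, not Grosz et al.'s formulation: the function names and the pair-based input format are hypothetical, and ties between entities with the same role are assumed to be broken by surface order.

```python
# Role hierarchy from the text: subject > object(s) > other.
ROLE_RANK = {"subject": 0, "object": 1, "other": 2}


def rank_cf(entities):
    """Order forward-looking centers (Cf) by grammatical role.

    `entities` is a list of (name, role) pairs in surface order.
    Assumption: Python's stable sort keeps surface order among
    entities that share a role.
    """
    ordered = sorted(entities, key=lambda pair: ROLE_RANK[pair[1]])
    return [name for name, _role in ordered]


def preferred_center(entities):
    """The preferred center (Cp) is the highest-ranked element of Cf."""
    return rank_cf(entities)[0]


# Utterance (2a): "John loves Mary."
utterance_2a = [("John", "subject"), ("Mary", "object")]
print(rank_cf(utterance_2a))           # ['John', 'Mary']
print(preferred_center(utterance_2a))  # John
```

For (2a) the sketch ranks John above Mary, so John is the predicted backward-looking center of (2b).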
11. Backward-looking centers are not the same as anaphora, e.g. pronouns. Grosz et al. illustrate that every utterance can contain several pronouns while only a single entity can function as backward-looking center.
Furthermore, the authors postulate a transition rule that constrains the
movement of centers between utterances. In short: continuation of the center of attention to the subsequent utterances is preferred over every form of change. Thus, the reader can assume that the subject will be the preferred center, i.e., in this model, the preferred antecedent of a successive backward-looking center, or, informally speaking, the topic of subsequent utterances.12 While Grosz et al. provide syntactic principles (even though they admit that several factors may have influence on the ranking of the Cf), Strube and Hahn (1999) focus on functional mechanisms. Their model of Centering Theory is language independent due to the fact that functional criteria can be applied to both fixed- and free-word-order languages. In contrast to Grosz et al., Functional Centering does away with backward-looking centers13 and traditional transition types as far as anaphor resolution is concerned. Starting from the relatively free-word-order language German, Strube and Hahn ground the ordering of the forward-looking center list in the functional information structure of utterances in a discourse. If an entity provides information that has already been introduced, i.e., if it is discourse-old or hearer-old, it is ranked higher than a discourse-new or hearer-new entity that does not refer to another discourse entity.14 In addition to this basic Cf ranking, which is valid for texts in which pronominal reference is dominant, Strube and Hahn introduce an extended version. The refined criteria of this ranking help in analysing texts where pronouns occur rather infrequently, like texts from technical domains. This criterion applies to many hypertexts, too. In a hypertext network, one and the same hypertext node can usually occur at several moments of the reading process. It is therefore easier for the author to avoid the usage of entities without autonomous reference, like pronouns, for cross-node reference. Kuhlen (1991) advises using referring nouns instead.
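The basic functional ranking mentioned above (old entities before new ones) can be sketched in the same spirit. Again this is only an illustration: the function name, the status labels as strings and the example entities are hypothetical, and surface order is assumed as the tie-breaker within each group.

```python
# Information statuses that count as "old" in the basic ranking.
OLD_STATUSES = {"discourse-old", "hearer-old"}


def rank_cf_functional(entities):
    """Order forward-looking centers by information status.

    `entities` is a list of (name, status) pairs in surface order.
    Old entities are ranked above new ones; the stable sort keeps
    surface order within each group (an assumption of this sketch).
    """
    ordered = sorted(entities, key=lambda pair: 0 if pair[1] in OLD_STATUSES else 1)
    return [name for name, _status in ordered]


# Invented utterance: a new entity precedes an old one on the surface.
example = [("a map", "discourse-new"), ("the continents", "discourse-old")]
print(rank_cf_functional(example))  # ['the continents', 'a map']
```

Unlike the grammatical ranking, the result here does not depend on subject or object position, which is what makes the criterion usable for free-word-order languages.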
How exactly the discourse connection between hypertext nodes 12. The interpretation of the centers can differ between languages. In his research on Eastern Kanthy, Filchenko (this volume) resorts to the use of the terms foregrounded center and backgrounded center to mark special pragmatic states. 13. For a first attempt of Centering without backward-looking centers see Strube (1998). 14. The authors make reference to the terms of information status proposed by Prince (1981) and (1992). The sets of discourse-old and discourse-new entities can be further categorised in terms of Prince’s (1981) familiarity scale: hearer-old entities can be split into evoked > unused while the hearer-new entities can be split into inferable > containing inferable > anchored brand-new > brand-new. An investigation that compares the relative contribution of syntactic and semantic prominence to the salience of entities is presented by Rose in this volume. Rose’s corpus analysis indicates that the prediction of subsequent reference is enhanced when joining the information of two salience factors (syntactic and semantic prominence).
is construed anaphorically cannot be discussed in this paper. Here, I focus on the forward-looking part of discourse connections rather than on the question of anaphora resolution.15 Strube and Hahn adapt the basic Cf ranking to, e.g., technical texts by separating a third set of expressions from the hearer-new discourse entities: mediated discourse entities.16 This helps to incorporate in more detail the availability of world knowledge, which is needed to understand inferables. Mediated discourse entities have a status between the set of hearer-old discourse entities (which remains the same) and hearer-new discourse entities (which in this model consists only of brand-new entities). Thus, the extended Cf ranking prefers old discourse entities over mediated ones, and mediated ones over new ones. What is important here is that both models allow for projecting preferences for interpretation in the subsequent discourse, which, according to Walker et al. (1998), is one of the key aspects of Centering Theory in general. Because of the forced decision-making, i.e., the fact that the networked text forces the reader to decide at every link where to read on, this forward-looking aspect of coherence becomes more dominant in hypertext. In the next section, I illustrate the function of preferred centers in hypertext and explain why both Centering models, the grammatical as well as the functional one, would fail to describe them sufficiently.
3.4. … and its application in hypertext
The degree of a discourse element’s salience is directly linked to its ability to attract the addressee’s attention. Scharl (2000) points out that in hypertext, salience is considerably affected by the design of the hypertext units. The degree of an element’s salience depends on its placement, its relative size, colour, etc.
When it comes to the text itself, it is the link-marked words or phrases that catch the reader’s attention and, therefore, can be assumed to be most salient.17 But a hypertext link does not necessarily correspond to the preferred forward-looking center. Nevertheless, I regard link-marked entities as high-ranked forward-looking centers as well, as they, too, allow predicting the ‘aboutness’ of subsequent utterances. The only difference is that these utterances do not occur in the same hypertext node as the link-marked entity itself but in the link’s target node. Thus, in hypertext, a single utterance can contain multiple preferred centers. In terms of Grosz et al.’s model, each of the salient centers of an utterance Un, i.e. the one that is ranked highest by grammatical – or functional – criteria as well as the one that is salient purely by means of link marks, can be connected to its own backward-looking center. This does not mean that a single utterance Un+1 could have multiple backward-looking centers at the same time. The Cb’s belong to different utterances which, again, usually belong to different hypertext nodes: one to the node in which the link occurs and one to the target node.18 Link marks can be conceived of as a promise to the reader that he can find more information about the link-marked entity in the target node. This promise can only be kept if the target node contains a coreferent entity, or at least one that shows some kind of connection with the link-marked entity. In other words: whether the promise is kept depends on a coherent connection between the hypertext nodes. But even if the reader’s prediction does not come true – and this is crucial here – the link-marked entity has to be regarded as a preferred center at the time he comes across the hypertext link. What is more, such a case would support my argumentation because it illustrates the impact that hints about the discourse continuation have on the question of connectivity. The link mark attracts the reader’s attention and offers him an additional way to read on. The discourse bifurcates. The reader has to decide which sequence he wants to follow. When choosing the link, he relies on the link’s promise that there is a coherent connection with the target node.
15. Nevertheless, I assume that Strube and Hahn’s extended version of the Cf ranking is an applicable Centering model for hypertexts. It does at least allow for the fact that pronominal anaphor resolution is hard to combine with the use of hypertext links.
16. This set consists of inferable, containing inferable, and anchored brand-new entities.
17. The difference between linguistic and non-linguistic salience is also described by Kelleher (this volume), who provides an approach that integrates linguistic and visual salience to account for situated reference resolution.
When staying in the current node, he can anticipate how the discourse proceeds on the basis of the usual, syntactically or functionally marked preferred center of the current utterance. In the remainder, I discuss some examples illustrating this phenomenon. As already mentioned, it is not necessarily the preferred center which is marked as a hyperlink. I can think of two basic positions a hypertext link can have with respect to the Cp of an utterance. Either the link corresponds to the preferred forward-looking center, as was the case in example (1) – here repeated as (3) –, or it does not, as in example (4).19 Naturally, both cases can occur in one and the same utterance, for there can be several link-marked entities at once. To illustrate the phenomenon, however, it is sufficient to concentrate on occurrences of the two positions mentioned. To make the centers more visible, I have added relevant elliptic forms in square brackets.
18. For a discussion of the question whether an utterance can have more than one backward-looking center, see Kruijff-Korbayová and Hajičová (1997).
19. Swigart and Strange (2002, “Who’s on First” in “The Granville Files. Present Day. Part 2.”)
(3)
a. b. c. d.
“There are two more continents,” Mouth said. “Maybe more [of them].” “Wow. Continents? Those are really big, aren’t they?”
(4)
a. “[…] Conquering another nation is easy.
b. There are models [for conquering another nation].
c. People had done it before.”
d. “Yeah [they did it before].”
e. “I’m thinking about the ones who did it the very first time.”
In example (3), the link-marked entity Continents is the preferred center of the given utterance. At the same time, the typical marks show that it serves as a link anchor. In example (4), the center of attention in utterance (e) is obviously not the link-marked verb thinking. The forward-looking centers are I, the ones who, and it. In Grosz et al.’s grammatical model, the subject I would be ranked highest. One could argue that the finite verb agrees with the subject and therefore could at least be considered closely connected to the preferred center. But obviously, the author decided not to mark the subject but only the verb. Apart from that, it can be argued in terms of Functional Centering that the situationally evoked entity I is not the preferred center of the utterance after all. Without the link marks, the reader’s attention would be focused neither on I nor on thinking but on the discourse-old entity people evoked by the ones who.20 The reader would not expect the subsequent utterances to deal with the speaker himself or with his process of thinking, but with the people who conquered another nation for the very first time. I’m thinking is no more than a prelude. But is it generally impossible for a verb like thinking to function as a traditional Cp? Technically, it is possible. This requires, however, an adequate linguistic and paralinguistic environment, for example a suitable initiation that enables the speaker to emphasise the verb – or any other entity that is ranked low in terms of salience – contrastively. I return to this phenomenon below.
20. In Strube and Hahn’s model, forward-looking centers of the same type are ranked according to their position in the text. Thus, in (4e) the ones who is ranked higher than it.
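The extended functional ranking (old before mediated before new, with ties broken by position in the text, as in footnote 20) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the status labels are assigned by hand, the names Entity, STATUS_RANK and rank_cf are hypothetical, the second utterance is invented, and the speaker pronoun I from (4e) is left out because its ranking is argued separately in the text.

```python
from dataclasses import dataclass

# Illustrative encoding of the extended Cf ranking: old > mediated > new.
# The numeric values are an assumption of this sketch, not part of the model.
STATUS_RANK = {"old": 0, "mediated": 1, "new": 2}


@dataclass(frozen=True)
class Entity:
    text: str      # surface form of the discourse entity
    status: str    # "old", "mediated" or "new" (assigned by hand here)
    position: int  # order of mention within the utterance


def rank_cf(entities):
    """Order forward-looking centers by status class, then by text position."""
    return sorted(entities, key=lambda e: (STATUS_RANK[e.status], e.position))


# Two centers of utterance (4e); both are treated as discourse-old (assumption).
utterance_4e = [Entity("it", "old", 2), Entity("the ones who", "old", 1)]

# A purely hypothetical utterance mixing all three status classes.
mixed = [
    Entity("a new map", "new", 0),
    Entity("its legend", "mediated", 1),
    Entity("the continents", "old", 2),
]

print([e.text for e in rank_cf(utterance_4e)])
print([e.text for e in rank_cf(mixed)])
```

In this sketch the first element of the returned list plays the role of the linguistically marked preferred center; link marks are deliberately not modelled here.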
Apart from these two basic cases, two more arrangements can be found. It can also happen that, as in example (5),21 not a whole constituent is link-marked but only, e.g., an adjective. In such a case, too, the link may or may not coincide with – or rather be part of – the preferred center. The other possibility is that the link marks go beyond the borders of one constituent, as is the case in example (6).22
(5)
a. b.
Bei Simulationen handelt es sich um spezielle interaktive Programme, die dynamische Modelle von Apparaten, Prozessen und Systemen abbilden.
By simulations be it itself about special interactive programs, which dynamic models of devices, processes and systems represent.
Simulations are special interactive programs which represent dynamic models of devices, processes and systems.
(6)
a. b.
Er nimmt die Sonnenbrille ab, lässt seinen Blick ins Wolkenkratzergetümmel tauchen, sieht Details und nimmt sein Schreiben wieder auf. Die Buchstaben rennen den Ereignissen hinterher.
He takes the sunglasses off, lets his gaze in_the skyscraper_turmoil dive, sees details and takes his writing again on. The letters run the events after.
He takes off his sunglasses, lets his gaze dive into the turmoil of skyscrapers, sees details and resumes his writing. The letters run after the events.
21. Blumstengel (1998, /Klassifikation-computerunterstuetzter-Lehr-Lernsysteme.html)
22. Ohler (1995, /anfang.htm)
In example (5), the link only promises more information about interaktive and not about spezielle interaktive Programme. In example (6) – apart from the linguistically marked forward-looking center the letters –, the utterance as a whole functions as a preferred center. In all examples, due to the links, the discourse forks into two distinct text strings. In example (3), for instance, the reader can predict that the Cp Continents is connected to two different subsequences of utterances. Whether this prediction is right, the reader can only find out by reading on either in the current hypertext node or in the target node. For the time being, he can only consider the linguistic and paratextual marks of Continents as promises of such connections. Strictly speaking, Continents should therefore be analysed as two Cp’s, a linguistic center Cp1 and a paratextual center Cp2. Each belongs to its own text string. At first, both strings are combined in the same hypertext node. Then they split up: one continues in the current node and the other in the link’s target node. This phenomenon becomes more obvious in example (4). Here, the two different Cp’s are separated. The linguistically marked Cp1, the ones who, is connected to the discourse segment of the current node, while Cp2, thinking, opens a discourse connection to the segment of another node. At this point, one could argue that in hypertext it is not utterances that are connected but longer sub-texts, and that the question of cross-node forward-looking centers therefore goes beyond traditional23 Centering. Nevertheless, for two reasons, I consider the Centering model a reasonable approach to revealing coherent discourse connections in hypertext. On the one hand, one can doubt that Centering in hypertext is necessarily about connected sub-texts. Especially – but not exclusively – in fictional24 hypertext like the one in example (3), hypertext links are quite often constructed as direct connections between utterances. Due to the network structure, the linked utterances cannot occur as a visible linear sequence but have to be written down in different hypertext nodes. Nevertheless, the structural linguistic connection between the link-marked entity and the coreferent entity in the – usually first – utterance in the target node supports an interpretation as a sequence of utterances. On the other hand, even if this is not the case, Centering can be used to describe the role link marks play in the coherence of a discourse.
Coherence, in the Centering framework, can be seen as the discourse’s means of supporting a prediction about how it continues and of having that prediction fulfilled. Both prediction and fulfillment are founded in the discourse itself. The predicting part of this process, which is central here, is fundamentally affected by link marks.
23. Apart from traditional Centering theories that focus on local coherence, there are attempts to concentrate on global discourse structure as well (e.g. Hahn and Strube (1997) and Walker (2000)).
24. In non-fictional hypertexts, information retrieval is often more important than a coherent text. Links, then, target text units that do not continue the text, but rather define the link-marked words. Todesco (1997) even advises writing hypertexts in such a lexicon-like manner. In this case, the hypertext nodes are rather independent of each other. An interpretation as linked sub-texts, therefore, seems indeed more appropriate.
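The analysis of one utterance as carrying a linguistic Cp1 and one or more paratextual Cp’s, each opening its own text string, can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the names Utterance and preferred_centers, the node identifiers, and the fact that the Cf list is simply taken as already ranked according to the reading given in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    # Forward-looking centers, assumed to be ranked already (highest first).
    cf: list
    # Link-marked text mapped to a target node id (ids are hypothetical).
    link_anchors: dict = field(default_factory=dict)


def preferred_centers(utt):
    """Return (kind, center, continuation) triples for one utterance.

    The highest-ranked Cf is the linguistic Cp and continues in the current
    node; every link anchor is a paratextual Cp that continues in its target.
    """
    centers = []
    if utt.cf:
        centers.append(("linguistic", utt.cf[0], "current node"))
    for anchor, target in utt.link_anchors.items():
        centers.append(("paratextual", anchor, target))
    return centers


# Example (3): both Cp's coincide on the same entity.
ex3 = Utterance(cf=["Continents"], link_anchors={"Continents": "node_continents"})

# Example (4e): the two Cp's fall on different entities.
ex4e = Utterance(cf=["the ones who", "it"], link_anchors={"thinking": "node_thinking"})

for kind, center, continuation in preferred_centers(ex4e):
    print(kind, center, continuation)
```

The sketch only records where each predicted continuation would be found; whether the target node actually keeps the link’s promise would have to be checked against the first utterance of that node.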
The interpretation of link-marked entities as preferred centers seems to extend the traditional perspective in quite a substantial way. In addition, there are, as the examples have demonstrated, parallels with link marks in lexicons and with the use of emphasis in spoken language. Footnote marks and typographic attributes of speech,25 like bold, italics or underlining, can also be considered in line with these findings. Therefore, and also against the background of the question of how special hypertexts really are, it seems appropriate to look more closely at the parallels between preferred centers and link marks.
4. Preferred centers and hyperlinks
The most obvious properties on which to compare preferred centers and hyperlinks are their form and their function. Concerning their function, the preceding discussion has already demonstrated that both phenomena match considerably. Structurally, both function as a forward-looking connecting element between two discourse elements. From a pragmatic viewpoint, they function as ways to assess the text’s continuation. But hyperlinks go beyond these functions. Apart from the fact that they represent an executable connection between virtual information units, they also announce the bifurcation of the current text string. The same is true for footnote marks and links in lexicons, but only the latter always announce the same kind of connected unit, namely relatively context-independent information about the marked word. Hyperlinks as well as footnote marks can be connected to a much broader range of targets. The difference between them is that footnotes can only announce side strings of the text, while hyperlinks often connect equally important discourse units.26
25. I limit the argumentation here to those typographic attributes that map emphasis to written language. On the one hand, taking typographic marks in general into consideration would go beyond the scope of this paper – just think of typographic indicators of propositional macro-structures or the marking of technical terms. As far as I know, there are no studies which concentrate on the effects those markings have on connections between utterances. On the other hand, results from research on verbal salience, which are even at hand in the framework of Centering, can be adopted. Which marks are to be regarded as typographic reproductions of prosodic qualities depends on the given discourse context (Wehde 2000). Therefore, it is impossible to include or exclude specific ways of marking.
26. For a further discussion of this matter, see Bexten (2006).
As for the form, the marked entity and the way it is marked can be distinguished.27 First, I will concentrate on the marked entity itself. The preferred centers and hyperlinks in the examples above both mark linguistic entities. How large these entities are is variable for both kinds. As for linguistic centers, Grosz et al. (1995) point out that centers are semantic objects and therefore must not be confused with words, phrases, etc. Other discourse theories also suggest that anaphoric connections can be heterosyntactical: anaphora can hold, for example, between phrases, subphrases or even subordinate clauses introduced by conjunctions (Harweg 1979).28 Here the question is whether centers are as flexible in size as hyperlinks. In hypertext, every entity can theoretically be used as a link anchor, no matter how small or large it is. The examples have already shown that whole sentences or mere adjectives can serve as such.29 Whether a whole constituent or just a noun or an adjective is link-marked depends on several factors. One thing that plays a role is the link’s target: if it provides a definition or explication of the marked word, the surrounding words are most likely not link-marked. This kind of connection usually comes along with a change of dimension (Harweg 1979), which means that a concrete phenomenon, like the interactive character of computer programs in example (5), is linked to a general explanation of the concept of interactivity. Thus, especially non-fictional hypertexts show obvious parallels to printed lexicons.30 In fictional texts, or generally when a concrete referent is to be referred to, it makes more sense to link-mark the whole constituent.
27. For pragmatic reasons, I exclude links which are placed outside the actual text and non-linguistic elements, like pictures, that function as link anchors. The fact that non-linguistic elements can be used for thematic connections points to the smooth transition between the various medial elements, though.
28. In addition to the focus on traditional, linguistic centers, some theories extend the concept to other conceptions of salience, esp. discourse-structural salience (as used by Ramm and by Hinterhölzl and Petrova, this volume). They describe salience for both discourse referents and events/discourse segments.
29. Even smaller entities, like letters, are link-marked on some websites. In alphabetical lists, this even makes sense. In cases like the proudly presented superlink (http://www.stangl-taller.at/4711/SIEB.10/NETERATUR/LITTERATUR/Litteratur.html), however, where nearly every letter works as a hyperlink but lacks any thematic connection to its target, the function of hyperlinks is completely undermined. The reader can by no means tell which website he will be sent to when clicking on one of the links. These kinds of hyperlink only function on very experimental websites where text-constitutive phenomena such as coherence play a secondary role.
30. The fact that the link marks in print lexicons conventionally follow the word in question is of secondary importance here, especially as this kind of mark can be found in hypertexts, too. Grether (n.d.), for example, uses them to distinguish between intra- and extrahypertext links.
How about linguistic preferred centers, then? First, I will focus on their ability to cross the borders of a single constituent. In the following dialogic sequence of utterances31
(7)
a. A: But you said that you’d agree!
b. B: I never said that!
the whole indirect quotation functions as a forward-looking center. To be a realistic candidate32 for anaphora resolution in the next utterance, however, such a linguistic center, in contrast to larger hyperlinks, makes certain demands: it has to be preceded by an appropriate preamble, like But you said. This preamble, in turn, causes the whole quotation to have the value of a constituent. As example (6) has shown, this is not necessary for hyperlinks.
31. For an application of the Centering concept to dialogues, see for example Byron and Stent (1998).
32. Theoretically, every clause can be referred to by a meta-statement and can therefore count as a forward-looking center, but it has to be regarded as being ranked very low in the salience hierarchy.
The next question is whether forward-looking centers can be smaller than constituents. As already implied in the discussion of the verb thinking in example (4), this seems possible, but here, too, certain demands have to be met. Consider the following example:
(8)
a. In his opinion, the blue sweater did not suit her.
b. Blue just wasn’t her colour.
In (8a), blue can only be judged to be a preferred center if the thematic context allows it and if, in addition, it is emphasised contrastively. As for the context, this requires that several sweaters in various colours were mentioned before. As for the emphasis, it could – leaving the context out of consideration – fall on her or even on sweater just as well, which would cause the preferred center to shift to these entities. Thus, emphasis as a paralinguistic instrument can influence the salience hierarchy. The matter of an entity’s size and the ways of marking it apparently overlap at this point. As this example indicates and as Grosz et al. (1995) have already conceded, preferred centers can be marked in various ways. Certainly, it does not necessarily have to be a small entity, like an adjective, that is made more salient by means of emphasis. The same holds, as Navarretta’s (2002) supplement to Centering Theory makes clear, for forward-looking centers in general. On the basis of spoken Danish, Navarretta identifies word order, prosodic marking, and syntax as methods of salience marking.33 According to her, only entities that are explicitly focally marked have the highest degree of salience. By marking entities, the language producer announces to the addressee that the ‘aboutness’ of the discourse will change. This allows the recipient to prepare to shift his attention as well. Navarretta’s approach refers to spoken, linear discourses, but can without difficulty be adapted to written language in general and to hypertexts in particular, if one takes into account the possibility of using the already mentioned ways of typographically marking prosodic qualities. In the example above, the adjective blue could be highlighted typographically in bold, italics, underlining, etc. Both emphasis and typographic marks are paralinguistic phenomena that can be used to influence an entity’s degree of salience. Typographic marks in particular are, in this regard, aligned with hyperlinks. Hyperlinks, too, depend on typographic highlighting to fulfill their function. In both cases, even an entity that normally would not be expected to be salient can form the center of attention. In example (4), the writer has explicitly marked thinking, an entity that under normal circumstances would hardly serve as a preferred center. By doing so, he pushes the low-ranked entity to a very salient position and informs the reader that, apart from the ones who (i.e. people), there is a second entity the ensuing discourse – or rather one of the two different ensuing text strings – can be about. The reader can prepare to go in one of the two different directions the discourse offers: one with an entity that has a coherent connection with people and one that continues the string evoked by the paratextually marked entity thinking.
In example (3), the two different Cp’s are not separated but coincide. The doubly salient entity Continents promises the reader that there are two different sequences of utterances which are both about continents. There are, however, also clear differences because hyperlinks are more flexible than linguistic centers. On the one hand, the possibility of emphasising an entity depends on the entity’s discourse environment, whereas this does not apply to hyperlinks (remember example (4)). The same holds for the question of marking larger entities: here, too, linguistic centers depend on
33. Her analysis is supported by findings outside Centering Theory. Caldwell (2002), for example, emphasises that in English, syntax alone is not enough to foreground an entity and names verbal emphasis among other devices.
their surroundings, while this does not necessarily apply to hyperlinks.34 On the other hand, emphasis or typographic marks force both the forward-looking center and the reader’s attention to shift. In contrast, in hypertext, the center of attention only shifts in the reader’s mind but not in the discourse itself. In the discourse segment, the first Cp remains where it is while, in addition, a second Cp arises from the link marks. The comparison of form and function of linguistic centers and hyperlinks reveals many parallels. What is more, this comparison demonstrates that it is a small step from purely linguistic to paralinguistic devices of salience marking. As mentioned, the latter are already partly integrated in Centering approaches. Against the background of the discussion presented here, including hyperlink marks is only consistent.
5. Conclusions and future research
The analysis of utterances in hypertexts shows that the limit of one preferred forward-looking center per utterance can be exceeded. Apart from the normal linguistically marked Cp, the writer can explicitly highlight additional Cp’s. The first Cp corresponds to the most salient forward-looking center usually analysed in Centering Theory. The additional Cp’s, the hypertext links, can be entities with a low degree of linguistic salience. They are usually marked paratextually by being, for example, underlined and coloured. Both kinds of Cp’s are equal insofar as both permit the reader to predict the ‘aboutness’ of subsequent utterances. Therefore, both contribute to the coherence of discourse in a plurilinear hypertext environment. In this paper, I proposed to describe the two types integratively in the Centering framework. While linguistically salient Cp’s can be analysed sufficiently with traditional Centering theories, I used Navarretta’s description of explicit salience marking as a theoretical background to show that an integration of paralinguistic phenomena is perfectly reasonable. Hypertext links are not necessarily salient in a linguistic way. Hence, in traditional Centering, they would not be regarded as preferred centers. In traditional Centering, referring to less salient discourse entities would mean risking that the discourse becomes incoherent. Here, I pointed out that with regard to hypertext links this interpretation is not accurate. Discourse in hypertext allows for rougher shifts than traditional linear discourse because, by highlighting discourse entities as links, the writer warns the reader to prepare for the thematic shift.
34. At least, this is true from a purely descriptive viewpoint. Keeping the question of coherent connections in mind, it is arguable whether it really is advisable to tap the full potential of hyperlink placement.
Without link
marks, the reader could focus his attention only on the linguistically marked preferred center in the linear sequence of utterances in one hypertext node. Only the paratextual marks tell him that there is a connection with another text string and that he should shift his attention if he wants to follow it. Without link marks there would be no such information and, what is more, there would be no hypertext. The hypertext network depends on both kinds of preferred centers. Hence, in contrast to linear texts, coherence in hypertext can only be described sufficiently in a model that includes linguistic as well as non-linguistic, i.e. paratextual, salience. In future research, such a model could adapt results from Centering research to the analysis of hypertexts. It is, for example, desirable to take a closer look at the impact that the placement of hyperlinks in an utterance has on cross-node coherence. Starting from the assumption that every center shift between utterances entails a decrease in coherence, one could assume that this holds for cross-node coherence, too. Therefore, it would be plausible to let the linguistic and the paratextual center coincide as often as possible. On the other hand, it is conceivable that this would confuse the reader, because the added information value would be less comprehensible. An integrative application of Centering Theory, as suggested above, provides a framework to deal with these questions.
References
Bexten, Birgitta 2006 Hypertext and Plurilinearity. Challenging an old-fashioned discourse model. International Symposium. Discourse and Document. Schedae. Prépublications de l’Université de Caen Basse Normandie. Presses universitaires de Caen, 117–121.
Blumstengel, Astrid 1998 Entwicklung hypermedialer Lernsysteme. Ph.D. thesis, University of Paderborn, Paderborn. http://dsor.uni-paderborn.de/forschung/publikationen/blumstengel-diss/.
Bolter, David J. 1991 Writing space: The computer, hypertext, and the history of writing. Hillsdale etc.: Lawrence Erlbaum Associates, Inc.
Byron, Donna K. and Stent, Amanda J. 1998 A preliminary model of Centering in dialog. Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics, Morristown: Association for Computational Linguistics, 1475–1477.
Caldwell, Thomas P. 2002 Topic-Comment Effects in English. Meisei Review, 17: 49–69.
Chiarcos, Christian this volume The Mental Salience Framework: Context-adequate generation of referring expressions. This volume, 105–139.
DeStefano, Diana and Jo-Anne LeFevre 2007 Cognitive load in hypertext reading. A review. Computers in Human Behavior, 23: 1616–1641.
Genette, Gérard 1987 Seuils. Paris: Éditions du Seuil.
Grether, Reinhold n.d. Die Weltrevolution nach Flusser. http://www.flusser.de/.
Grosz, Barbara J., Aravind K. Joshi, and Scott Weinstein 1995 Centering: A Framework for Modeling the Local Coherence of Discourse. Computational Linguistics, 21(2): 203–225.
Hahn, Udo and Michael Strube 1997 Centering in-the-Large: Computing Referential Discourse Segments. Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and the 8th Conference of the European Chapter of the Association for Computational Linguistics, Madrid, 104–111.
Hammwöhner, Rainer 1997 Offene Hypertextsysteme. Das Konstanzer Hypertextsystem (KHS) im wissenschaftlichen und technischen Kontext. Universitäts-Verlag, Konstanz (= Schriften zur Informationswissenschaft; 32).
Harweg, Roland 1974 Bifurcations de textes. Semiotika 12: 41–59.
Harweg, Roland 1979 [1968] Pronomina und Textkonstitution. München: Wilhelm Fink Verlag.
Hinterhölzl, Roland and Svetlana Petrova this volume Rhetorical relations and verb placement in Old High German. This volume, 173–201.
Kelleher, John D. this volume Visual Salience and the Other One. This volume, 205–228.
Kruijff-Korbayová, Irina and Eva Hajičová 1997 Topics and Centers. A Comparison of the Salience-Based Approach and the Centering Theory. The Prague Bulletin of Mathematical Linguistics, 67: 25–50.
Kuhlen, Rainer 1991 Hypertext. Ein nicht-lineares Medium zwischen Buch und Wissensbank. Springer, Berlin, Heidelberg, New York etc. (Edition SEL-Stiftung).
Navarretta, Constanza 2002 Combining Information Structure and Centering-based Models of Salience for Resolving Intersentential Pronominal Anaphora. In: António Branco, Tony McEnery and Ruslan Mitkov, editors, Proceedings of DAARC 2002 – 4th Discourse Anaphora and Anaphora Resolution Colloquium, Lisbon, September, 135–140.
Nielsen, Jakob 1995 Multimedia and hypertext: the internet and beyond. Boston etc.: Academic Press Professional, Inc.
Ohler, Norman 1995 Die Quotenmaschine. http://home.ph-freiburg.de/kepser/qm/.
Prince, Ellen F. 1981 Toward a Taxonomy of Given-New Information. In: Peter Cole, editor, Radical Pragmatics. Academic Press, New York, 223–255.
Prince, Ellen F. 1992 The ZPG Letter: Subjects, Definiteness, and Information-Status. In: William C. Mann and Sandra A. Thompson, editors, Discourse Description: Diverse Analyses of a Fund Raising Text. John Benjamins B.V., Philadelphia, Amsterdam, 295–325.
Ramm, Wiebke this volume Discourse-structural salience from a cross-linguistic perspective: Coordination and its contribution to discourse (structure). This volume, 143–172.
Rose, Ralph L. this volume Joint Information Value of Syntactic and Semantic Prominence for Subsequent Pronominal Reference. This volume, 81–103.
Scharl, Arno 2000 Evolutionary Web Development. Automated Analysis, Adaptive Design, and Interactive Visualization of Commercial Web Information Systems. London: Springer.
Storrer, Angelika 2002 Coherence in Text and Hypertext. Document Design, 3(2): 156–168.
Strube, Michael 1998 Never Look Back: An Alternative to Centering. Proceedings of the 17th International Conference on Computational Linguistics, Association for Computational Linguistics, Morristown, NJ, USA, 1251–1257.
Strube, Michael and Udo Hahn 1999 Functional Centering: Grounding Referential Coherence in Information Structure. Computational Linguistics, 25(3): 309–344.
Swigart, Rob and Allen Strange 2002 About Time. A Digital Interactive Hypertext Fiction. Two Braided Parallel Paths. A Double Helix. http://www.wordcircuits.com/gallery/abouttime/.
Todesco, Rolf 1997 Die Definition als Textstruktur im Hyper-Sachbuch. In: Dagmar Knorr and Eva Maria Jakobs, editors, Textproduktion in elektronischen Umgebungen. Frankfurt/M: Peter Lang (= Textproduktion und Medium; 2), 109–120.
Walker, Marilyn A., Aravind K. Joshi and Ellen F. Prince 1998 Centering in Naturally Occurring Discourse: An Overview. In: Marilyn A. Walker, Aravind K. Joshi and Ellen F. Prince, editors, Centering Theory in Discourse. Oxford University Press, Oxford, England, 1–30.
Walker, Marilyn A. 2000 Toward a Model of the Interaction of Centering with Global Discourse Structures. Verbum. http://www.dcs.shef.ac.uk/ walker/centcache.pdf
Wehde, Susanne 2000 Typographische Kultur. Eine zeichentheoretische und kulturgeschichtliche Studie zur Typographie und ihrer Entwicklung. Tübingen: Niemeyer (= Studien und Texte zur Sozialgeschichte der Literatur, Bd. 69).
Establishing salience during narrative text comprehension: A simulation view account
Berry Claus
1. Introduction
Salience as a theoretical concept has received little attention in psychological research on text comprehension, that is, in research on how comprehenders mentally represent what is described in a text. However, more than fifteen years ago, salience was one of the controversial issues in a dispute in this field. What had happened? McKoon and Ratcliff published a paper in which they argued for their minimalist hypothesis, according to which “readers do not automatically construct inferences to fully represent the situation described by a text” (McKoon and Ratcliff, 1992, p. 440). They also provided a minimalist account of a by-now classical empirical finding of Glenberg, Meyer, and Lindem (1987). In the study by Glenberg and colleagues, participants read short narratives such as (1) in one of two versions and were then tested for the accessibility of one of the mentioned entities, the target entity (e.g., sweatshirt). The target entity was found to be more accessible when it was spatially associated with the narrative’s protagonist (put on) than when it was spatially dissociated from the protagonist (took off).
(1) John was preparing for a marathon in August. After doing a few warm-up exercises, he put on (associated) / took off (dissociated) his sweatshirt and went jogging. He jogged halfway around the lake without too much difficulty.
    Probe: sweatshirt
This finding received much attention and has been considered an elegant support for the notion that comprehenders construct situational representations which guide comprehension. However, McKoon and Ratcliff claimed that there is an alternative interpretation of the finding. They argued that the difference in accessibility between the associated and dissociated condition can be explained without assuming that comprehenders construct situational representations. According to their interpretation, the result does not reflect the spatial distance in
the described situation. Rather, the differential accessibility should be attributed to a difference between the two conditions with regard to salience in a propositional representation. According to this account, the associated entity was more accessible than the dissociated entity because it was more salient. It is true that the two conditions do not only differ with regard to the spatial structure. Hence, McKoon and Ratcliff may be right that the difference in accessibility is not due to the manipulation of spatial distance but rather reflects a difference in salience. Yet, there are two problems with their alternative interpretation in terms of salience. First, McKoon and Ratcliff do not provide an adequate explanation of why the target entity was more salient in the associated condition compared with the dissociated condition. It should be noted that the two conditions did not differ with regard to linguistic salience. The target entity was mentioned only once and at the same syntactic position in both conditions. According to McKoon and Ratcliff, the two conditions differed with regard to salience in a propositional representation because the associated target entity was more relevant to the discourse topic than the dissociated target entity (see Hinterhölzl and Petrova, this volume, and Ramm, this volume, for discussions of the term topic from a linguistic perspective). Though it seems intuitively quite plausible that the two conditions differed with regard to discourse topicality, it remains unclear what this difference is due to. The second problem with McKoon and Ratcliff’s alternative interpretation is the general denial of situational representations. They thereby ignore the possibility that salience may emerge from a situational representation. The notion of salience is by no means incompatible with situational representations. 
On the contrary, the salience of an entity which is mentioned in a text may well derive from the representation of the described situation.¹ Roughly speaking, the target entity in the study by Glenberg and colleagues might have been more salient in the associated condition due to its relation to the protagonist and the affordances it provides. Attributing the assumed difference in salience between the two conditions to the situational representation does not only provide an explanation of what the difference is due to. It also offers a more parsimonious account of the finding by Glenberg and colleagues in terms of salience than McKoon and Ratcliff’s interpretation. Hence, it seems worthwhile to pursue and elaborate an account of salience in terms of situational representations. The aim of the present paper is to convince the reader that what makes an entity salient – over and above linguistic factors – may indeed depend on the representation of the described situation (for approaches that take into account other extra-linguistic factors, see Bexten, this volume, who addresses the issue of hypertextual salience marking; Kelleher, this volume, who developed a model of reference resolution in situated communication that takes into account visual salience; see also Rose, this volume, for an account of pronominal reference that joins the information of syntactic and semantic prominence). The theoretical point of departure is the simulation view of language comprehension. According to this view, language comprehension is tantamount to mentally simulating the experience of the described situation. It should be noted in advance that the paper is not intended to provide a full account of salience. Its purpose is to outline a simulation view account of salience which claims that salience may derive from mental simulations constructed during narrative text comprehension. The next section will provide a brief overview of the simulation view of language comprehension. Section 3 will give an account of how mental simulations during language comprehension may affect salience. Section 4 will report empirical support for the simulation view account of salience. Section 5 will conclude with some final remarks.
1. It should be noted that this is also one of the arguments in Glenberg’s reply (see Glenberg and Mathew, 1992) to McKoon and Ratcliff (1992).
2. Language comprehension as mental simulation
Currently, there is wide agreement among language comprehension researchers that narrative comprehension involves the construction of a representation of the described situations (e.g., Johnson-Laird, 1983; van Dijk and Kintsch, 1983; Zwaan and Radvansky, 1998). A situational representation constructed during language comprehension is a referential representation. It is a representation of the nonlinguistic entities which constitute the described situation (cf. Heim, 1982: file change semantics; Kamp and Reyle, 1993: discourse representation theory; Karttunen, 1976: discourse referents). Yet the representational format of a referential representation remains a controversial issue. Traditionally, language comprehension – and cognition in general – has been viewed as being based on the manipulation of amodal, abstract symbols. This traditional view assumes clear demarcations between language processing on the one hand and perception and action on the other hand. Recently, however, a radically different view has been gaining in importance in psychological language comprehension research. According to embodied theories (e.g., Barsalou, 1999; Glenberg, 1997; Zwaan, 2004), referential representations constructed during language comprehension recruit the same modality-specific mental subsystems as representations constructed during nonlinguistic cognition. Thus, referential representations are assumed to be modal representations, which are grounded in perception and action. Proponents of embodied theories adopt a simulation view of language comprehension. Language comprehension is assumed to involve mentally simulating the described states of affairs. Consider, for example, the utterance in (2).
(2) A pit bull is attacking a little girl.
According to the simulation view, the utterance is understood by mentally simulating the experience of the described situation. This simulation would utilize the same mental subsystems as are involved when actually seeing and/or hearing a pit bull attacking a little girl. It should be noted that the simulation view does not imply that simulations constructed during language comprehension are “life-like”. Simulations are always vague and incomplete (cf. Barsalou, 1999; Zwaan, 2004). A simulation of described states of affairs can be considered to be a model of a part of a world. This model cannot be complete; it has to be partial. Language never specifies all aspects of a described situation. Moreover, comprehenders are economical processors and usually do not fill in what language left undetermined (e.g., Graesser, Singer and Trabasso, 1994). There is growing evidence for the simulation view of language comprehension. Neuroscientific studies have revealed a considerable overlap between the pattern of brain activation that occurs when a particular linguistic expression is processed and the pattern of activation that is involved in actually experiencing the object or doing the activity denoted by the linguistic expression (e.g., Buccino, Riggio, Melli, Binkofski, Gallese, and Rizzolatti, 2005; González, Barros-Loscertales, Pulvermüller, Meseguer, Sanjuán, Belloch, and Ávila, 2006; Hauk, Johnsrude, and Pulvermüller, 2004; Moscoso del Prado Martín, Hauk, and Pulvermüller, 2006). For example, the study by Hauk and colleagues (2004) indicated that processing verbs that refer to actions, like pick or kick, activates areas of the motor cortex which overlap with the areas that are activated when actually performing the actions.
Further empirical support for the simulation view comes from behavioural studies (e.g., Claus and Kelter, 2009; Glenberg, Havas, Becker, and Rinck, 2005; Glenberg and Kaschak, 2002; Glenberg, Sato, and Cattaneo, 2008; Meteyard, Bahrami, and Vigliocco, 2007; Pecher, van Dantzig, Zwaan, and Zeelenberg, 2009; Taylor and Zwaan, 2008; Zwaan and Taylor, 2006). For example, the study by Zwaan and Taylor (2006) revealed an interaction between comprehending sentences implying a particular motor action and concurrently performing a corresponding motor action. In one of their experiments, participants read sentences that implied either a clockwise manual rotation, as in (3a), or a counterclockwise manual rotation, as in (3b). The sentences were presented frame-by-frame (in the example below, the frame boundaries are indicated by the slashes). Participants advanced through the sentences by turning a knob either clockwise or counterclockwise. Reading times for the frame that contained the critical verb (e.g., screwed in, turned down) were longer when the direction of the rotation implied by the verb mismatched the direction of the knob turning required to advance through the sentence than when the two directions matched.
(3) a. To attach / the boards / he / took out / his / screwdriver / and / screwed in / the / screw.
    b. He / realized / that the / music / was / too loud / so he / turned down / the / volume.
Finally, there is also empirical evidence for the simulation view of language comprehension from behavioural studies concerned with the representation of abstract information, such as descriptions of non-physical transfer (Glenberg and Kaschak, 2002), desiderative sentence mood (Claus, 2008), or negation (Kaup, Lüdtke, and Zwaan, 2006; Kaup, Yaxley, Madden, Zwaan, and Lüdtke, 2007). Taken together, the findings from neuroscientific and behavioural studies provide strong empirical support for the view that language comprehension involves embodied mental simulations. The findings are difficult to align with amodal theories of language comprehension. To be sure, amodal theories could account for the findings by adding further assumptions. However, such an account would be a completely post hoc explanation. Moreover, amodal theories suffer from two inherent problems. The transduction problem refers to the lack of an account as to how amodal abstract symbols emerge in the mind, that is, how perceptual experiences are transduced into arbitrary symbols (Barsalou, 1999; see also Brooks, 1987). The reverse of this problem is the symbol grounding problem. It pertains to the question as to how amodal abstract symbols are mapped back onto the world, that is, how the meaning of arbitrary symbols is grounded (Harnad, 1990). Neither problem arises in embodied theories of cognition, which assume that meaning is grounded in perception and action. At present, the embodied account of language comprehension is not yet a full-fledged theory. Most of the studies that investigated predictions of the simulation view were concerned with the processing and representation of narrated concrete situations. The presently available evidence for the simulation view is limited to narrative text comprehension. There are currently no studies within this framework which address the issue of expository texts. What is also still lacking are substantial theoretical approaches and empirical evidence regarding the question as to how embodied theories of language comprehension can account for issues such as abstract concepts and function words. However, the results of the above-mentioned studies concerning abstract transfer, sentence mood, and negation are promising with regard to future research within the simulation view framework. The next section considers what the simulation view of language comprehension in its present state can contribute to the issue of salience. The scope of the considerations is limited to nonlinguistic aspects of the described situations during the comprehension of narrative texts.
3. Salience is derivable from mental simulations
According to the simulation view of language comprehension, comprehenders understand the description of a situation by running a mental simulation. With regard to the comprehension of a narrative text that describes an ongoing occurrence consisting of temporally contiguous events, it can be assumed that comprehenders construct a coherent dynamic representation as they do when experiencing an evolving event sequence (cf. Kelter, Kaup, and Claus, 2004). That is, they would start with the simulation of the first event, and then continue with the simulation of the second event, and so on, gradually constructing a coherent representation. Only when a temporal shift is encountered is the current simulation discontinued and a new simulation initiated (cf. Kelter et al., 2004). Hence, as long as the narrative describes a temporally contiguous sequence of events, the representation consists in a continuously growing simulation. However, it is beyond question that the entire hitherto constructed simulation cannot be available at a given moment during text processing. Due to working-memory capacity limits, only a few elements of the described event sequence are available at any one time. It seems reasonable to conceive of these elements as the most salient ones at a particular time during text processing.² What does the simulation view imply with regard to the question of which entities of a narrative constitute the available and hence salient elements at any one time? The answer to this question emerges from two characteristics of mental simulations constructed during language comprehension. Simulations are assumed to be experiential and perspectival. Let’s first consider these two characteristics and then turn back to the question.
2. Indeed, there seems to be wide agreement across different theories on text comprehension that the most salient entities are part of the available working-memory representation. However, there is disagreement on the question of which factors constrain the available set of entities.
3.1. Mental simulations of described situations are experiential
Mental simulations constructed during language comprehension rest upon experiences. Roughly speaking, it is assumed that incoming words re-enact multimodal memory traces of previous experiences with the entities which they denote (cf. Zwaan, 2004; see also Barsalou, 1999). Combinations of words govern the activation of mutually compatible experiential traces and guide their integration in a simulation of the described situation. Take, for example, the utterance in (2) about the pit bull and the girl. When comprehending this utterance, the words pit bull, attacks, and girl will each re-activate experiential traces of different encounters with pit bulls, girls, and attacking events (originating from actual experience as well as, for example, from language or films). Thus, according to the simulation view, language comprehension is strongly affected by the comprehender’s experiences. Hence, mental simulations are biased. Consider the short narrative in (4), adapted from Sanford and Garrod (1981, p. 114).
(4) John was on his way to school. He was terribly worried about the maths lesson. He thought he might not be able to control the class again today. It was not a normal part of a janitor’s duties.
When reading the first sentence, most people will simulate a pupil on his way to school. From an experiential simulation view this can be attributed to the fact that for most comprehenders, the majority of memory traces of way-to-school experiences originate from their own school days. Hence, most comprehenders are led up the garden path by an experientially biased simulation, resulting in difficulties when processing the third sentence. According to the simulation view, a teacher who read the narrative in (4) might construct a differently biased simulation, resulting in processing difficulties with the last sentence, when John turns out to be a janitor. Mental simulations of described situations are not only shaped by unique experiences. They are also constrained by basic principles underlying human experience of the world such as temporal and spatial organization. Time plays a central role in how we experience the world. The temporal dimension can be considered to be the most important one in structuring our experiences (cf. Navon, 1978). In experiencing, we conceive time as continuously extending from past to present to future. The present is mentally set off against the past and the future. More precisely, the situation that exists at the now point is mentally highlighted as it is given in perception and can be acted upon. However, the now point is not fixed but moves forward continuously. Empirical findings suggest that during narrative comprehension, comprehenders similarly act on the assumption of a continuous progression of the now point in the described world. Reading times for sentences implying a discontinuous shift of the narrative Now are prolonged compared with reading times for sentences implying a continuous movement of the narrative Now (e.g., Bestgen and Vonk, 2000; Rinck and Weber, 2003; Speer and Zacks, 2005). Human experience is also organized and affected by the spatial dimension. It is constrained by the scope of the human perceptual and motor apparatus. Perception and action are confined to a limited spatial region. Objects within this region are mentally organized by a spatial framework which is constructed by the three axes of the body (head/feet, front/back, left/right). The axes differ in accessibility depending on their perceptual and physical asymmetries and their relation to gravity. Empirical findings indicate that during language comprehension, people likewise impose a spatial framework on the described world (e.g., Bryant, Tversky, and Lanca, 2001; Hörnig, Claus, and Eyferth, 2000; Franklin and Tversky, 1990). Inevitably, human experience never results in objective representations of states of affairs. Rather, the representations are interpretations of states of affairs, which are governed by the experiencer’s point of view. Hence, representations constructed during nonlinguistic cognition are always perspectival. According to the simulation view, this also holds for language comprehension (cf. MacWhinney, 1977, 2005).
3.2. Mental simulations of described situations are perspectival
Narratives are usually centred around a protagonist.
Hence, with regard to narrative text comprehension, it can be assumed that the mental simulation of the described events is biased by the stated or inferred perspective of the protagonist. Indeed, empirical findings indicate that comprehenders adopt the protagonist’s spatial point of view (Black, Turner and Bower, 1979; Franklin and Tversky, 1990; Rall and Harris, 2000; Ziegler, Mitchell and Currie; but see O’Brien and Albrecht, 1992). For example, in the study by Black and colleagues (1979), participants read sentences, such as (5a) and (5b), which consisted of two clauses. The main clause introduced a character and his or her location; the subordinate clause described a movement of a second character toward this location. The movement was either referred to by a deictic term of motion (come) that was consistent with the point of view of the first character, (5a), or by a deictic term of motion (go) that implied a perspective shift, (5b).
(5) a. Bill was sitting in the living room reading the paper when John came into the living room.
    b. Bill was sitting in the living room reading the paper when John went into the living room.
Reading times for the perspective-shift sentences were found to be prolonged compared with reading times for the perspective-consistent sentences, indicating that the participants adopted the spatial point of view of the first character. This conclusion is further bolstered by the additional finding that participants made systematic errors in recalling the perspective-shift sentences by replacing went by came. However, spatial point of view is merely one type of perspective. There is empirical evidence that comprehenders track the protagonist’s mental perspective as well. Studies concerning the representation of emotions suggest that comprehenders infer unmentioned emotional states of protagonists (Gernsbacher, Goldsmith and Robertson, 1992; Gernsbacher and Robertson, 1992). Other findings indicate that comprehenders also infer non-explicitly stated goals of protagonists (Long and Golding, 1993; Poynor and Morris, 2003). In addition, there is evidence that comprehenders keep track of the protagonist’s knowledge/ignorance (Barquero, 1999; de Vega, Díaz and León, 1997). A study by Sanford, Clegg, and Majid (1998) suggests that states of affairs being mentioned in a narrative are generally mentally coded in terms of their significance to the protagonist. The results indicate that a background information sentence, such as (6), is interpreted as being experienced by the protagonist.
(6) The air was hot and sticky.
As mental simulations that are constructed during language comprehension are assumed to be experiential in nature, they are biased by the comprehender’s current and past personal experiences. Accordingly, mental simulations constructed during language comprehension should be biased also by the comprehender’s perspective. Indeed, empirical studies indicate that language comprehension is affected by the comprehender’s personality. Findings of Zwaan and Truitt (1998) indicate that smokers and non-smokers differ with regard to processing smoking-related sentences. A study by Holt and Beilock (2006) suggests that novice and expert ice hockey players and novice and expert football players construct different mental simulations of described hockey-specific situations and football-specific situations, respectively. Let’s now turn back to the question as to what the simulation view implies with regard to the issue as to which entities of a narrative constitute the salient elements at a given moment during text comprehension.
3.3. Implications for salience
As outlined at the beginning of this section, capacity limits constrain the availability of elements of the narrated world. Only a restricted part of the narrated world is available at any one time during comprehension. Elements which belong to that part can be considered to be the currently most salient entities of the unfolding narrative. As to the question which part of the narrated world can be assumed to be available at a given time, the prediction of the simulation view should be straightforward by now (and might, at first glance, even appear to be trivial). According to the simulation view, what constitutes the available part is the current Here and Now of the protagonist. Hence, the simulation view implies that the entities which make up the protagonist’s current situation are the most salient ones at a given time in the course of comprehension. Thus, entities pertaining to the protagonist’s Now and entities pertaining to the protagonist’s Here should, in principle, be highly salient, whereas temporally or spatially remote entities should be (more or less) low in salience. However, mental simulations constructed during narrative text comprehension are assumed to be perspectival. They are biased by the protagonist’s perspective, which additionally determines which entities of the narrated world compose the available set of entities at a given time during comprehension. First, the available set is not confined to entities which are physically present in the protagonist’s current situation. It is molded by the protagonist’s mental perspective. Hence, the available set should also comprise the protagonist’s mental states such as thoughts, emotions or goals. Second, the available set does not consist of all entities which are present in the protagonist’s current situation. It is constrained by the protagonist’s spatial point of view and his or her needs and goals. As a result, the available set should first and foremost include those entities of the current situation which are visible to the protagonist, which he or she could act upon, or which are of functional importance to his or her current situation.
4. Empirical Evidence
This section will report findings which provide empirical support for the claim that salience may derive from mental simulations during language comprehension. Before turning to these findings, some remarks on the measurement of salience have to be made.
4.1. Measuring salience by testing accessibility
The empirical findings reported below come from studies that investigated whether the mental accessibility of entities mentioned in a narrative text is affected by properties of the described situation.³ Here, the findings of these studies are considered relevant to the issue of salience, as it can be assumed that highly salient entities, which are part of the available set of elements at the time of testing, are more accessible than non-salient entities. It should be noted that a difference in accessibility does not necessarily imply a difference in anticipation of anaphoric reference. Manipulating the temporal distance in the described world (large vs. less large) between a past event and the current narrative Now at the time of testing affected the mental accessibility of an entity involved in the past event but had no effect on ratings of the likelihood that the upcoming text would anaphorically refer to the entity (Claus and Kelter, 2006, control experiment). This suggests that mental accessibility is not a reliable predictor of the degree to which an anaphoric reference is expected – at least not in the case of entities which do not pertain to the protagonist’s current Now. However, this may not affect the scope of the findings reported below. The findings stem from studies which compared the mental accessibility of entities which are present in the protagonist’s current situation to the mental accessibility of entities which are absent from the current situation. A result by Glenberg and Mathew (1992) indicates that entities that are spatially and temporally associated with the protagonist and entities that are spatially dissociated differ in mental accessibility as well as in ratings of perceived salience.
3. The majority of these studies tested mental accessibility by measuring reaction times on a probe-recognition task. In a typical probe-recognition task experiment, participants read texts sentence by sentence at a self-paced rate. At a given moment (either during or at the end of the text presentation), they are presented with a probe word. Their task is to indicate as quickly and accurately as possible whether or not the word was mentioned in the text.
4.2. Mental accessibility of elements of the protagonist’s current situation
According to the simulation view, the available set of elements at a given time during text processing includes those entities which make up the protagonist’s current situation. Hence, entities pertaining to the narrative Now at the time of testing should be highly accessible. Indeed, numerous studies have shown that states of affairs that obtain at the current Now are especially easy to access, whereas states of affairs that obtained in the described world prior to that time are less accessible (e.g., Anderson, Garrod, and Sanford, 1983; Bestgen and Vonk, 1995; Carreiras, Carriedo, Alonso, and Fernández, 1997; Magliano and Schleich, 2000; Speer and Zacks, 2005; Zwaan, 1996; Zwaan, Madden, and Whitten, 2000). In one of the experiments of the study by Carreiras and colleagues (1997, Experiment 1), participants read short narratives and were tested for the mental accessibility of a job description (e.g., economist) that was mentioned in the narrative either in a sentence describing the protagonist’s current situation, as in (7a), or in a comparable sentence referring to the protagonist’s past, as in (7b).
(7) a. Now she works as an economist for an international company.
    b. Sometime in the past she worked as an economist for an international company.
The job description was found to be more accessible when it was presented as currently applying to the protagonist, as in (7a), compared to when it was presented as not currently applying to the protagonist, as in (7b). Remarkably, this effect was obtained even when the accessibility was tested immediately after the manipulated sentence, that is, immediately after reading the sentence in which the job description was introduced. Findings by Zwaan, Madden, and Whitten (2000) also indicate that the presence or absence of states of affairs at the protagonist’s Now immediately affects accessibility. Participants read sentence pairs such as (8a) or (8b). (8)
a. Thomas was programming his computer. When his drink spilled, he continued.
b. Thomas was programming his computer. When his drink spilled, he stopped.
After reading the second sentence, participants were tested for the accessibility of the activity mentioned in the first sentence (e.g., programming). The activity proved to be more accessible when the second sentence stated that it was still going on at the protagonist’s Now at the time of testing, as in (8a), compared to when the second sentence stated a discontinuation of the activity, as in (8b). Additional evidence for the effect of temporal presence on mental accessibility stems from studies examining the role of verb aspect (Carreiras et al., 1997, Experiment 3; Magliano and Schleich, 2000, Experiment 3). For example, in the experiment by Magliano and Schleich (2000), participants were presented with narratives containing a sentence which described an activity of the protagonist either in imperfective aspect, as in (9a), or in perfective aspect, as in (9b).
Hence, the sentence either implied that the activity was ongoing or that it was completed. (9)
a. Stephanie was changing the flat tire.
b. Stephanie changed the flat tire.
The manipulation of verb aspect had an impact on the accessibility of the activity. The activity was more accessible when it was conveyed with an imperfective aspect compared to when it was conveyed with a perfective aspect. The results of the studies reported so far indicate that states of affairs pertaining to the protagonist’s Now are easier to access than states of affairs pertaining to the (far or distant) past. This finding provides strong empirical evidence for the assumption that the available set of elements at a given time during text processing is composed of those entities that make up the protagonist’s current situation.
Additional support for the assumption comes from studies which compared the mental accessibility of entities that are present in the current situation to the mental accessibility of spatially distant entities. Most of the studies which investigated the impact of spatial distance on mental accessibility employed an experimental paradigm introduced by Morrow, Greenspan, and Bower (1987) or Rinck and Bower’s (1995) variant of this paradigm (e.g., Dutke, 2003; Haenggi, Kintsch, and Gernsbacher, 1995; Morrow, Bower, and Greenspan, 1989; Rinck, Hähnel, Bower, and Glowalla, 1997; Rinck, Williams, Bower, and Becker, 1996). Participants first memorize the layout of rooms in a building and the various objects in the rooms. They then read a narrative containing several motion sentences, such as (10), describing the protagonist’s movement from one room (source room) through an unmentioned path room to a goal room. After the presentation of a motion sentence, reading is interrupted by a test of the mental accessibility of the objects in the rooms. (10)
Then he walked from the library to the storage room.
The typical result of studies using this “Morrow paradigm” is that objects of the goal room, that is, objects at the protagonist’s current location at the time of testing, are more accessible than objects of the path room, which in turn are more accessible than objects of the source room. This finding fits well with the assumption that entities pertaining to the protagonist’s current Here are highly salient. Yet, it is questionable whether the results from studies involving memorizing a spatial layout before reading a text about it can be generalized to naturalistic reading conditions. The layout learning may have directed the participants’
attention to the spatial properties of the described world (cf. Zwaan, Radvansky, Hilliard, and Curiel, 1998).4 This objection does not hold for the study by Glenberg and colleagues (1987) which has been mentioned already in the introduction of this paper. Glenberg and colleagues presented their participants with short narratives such as (1), repeated here as (11), without especially encouraging them to attend to spatial relations. (11)
John was preparing for a marathon in August. After doing a few warmup exercises, he put on (associated) / took off (dissociated) his sweatshirt and went jogging. He jogged halfway around the lake without too much difficulty.
Participants were tested for the accessibility of an entity mentioned in the manipulated sentence (e.g., sweatshirt). It was found that the entity was more accessible after reading the associated version than after reading the dissociated version. That is, the probed entity was more accessible when it pertained to the protagonist’s current Here at the time of testing compared to when it was spatially distant from the protagonist. According to the simulation view account of salience, this result can be attributed to the salience of the protagonist’s Here. However, one might object that the difference in mental accessibility does not reflect an effect of spatial presence. Indeed, the associated and dissociated conditions differed not only with regard to spatial distance but also with regard to the functional importance of the probed entities for the protagonist’s current situation. Yet, an explanation of the results in terms of functionality (cf. Radvansky and Copeland, 2000) would by no means be incompatible with the simulation view account of salience. As Glenberg and Mathew (1992) pointed out in their reply to McKoon and Ratcliff (1992), “what make an object or event salient are its relations to the other objects and events and our knowledge about what those relations imply for further action”. Mental simulations of described situations are assumed to be perspectival and highly selective, as are representations of directly experienced situations. Hence, the available set of entities at a given time during processing is assumed to be determined by the protagonist’s spatial and mental perspective. Thus, it should include, in particular, those entities which are functionally related to the protagonist.
4. Findings from research on spatial text information suggest the conclusion that comprehenders do not normally construct spatial representations (at least not detailed ones), unless such representations are necessary with regard to the specific task demands or personal reading goals or are easy to construct. However, this conclusion may be premature considering that in virtually all the studies the material was presented visually. Visual text presentation as opposed to auditory text presentation can be expected to be disadvantageous with regard to representing spatial information about a described situation (cf. Brooks, 1967; Eddy and Glass, 1981). As reading already involves a spatial task, namely the control of eye movements, it should interfere with the processing of spatial text information. Indeed, empirical results indicate that comprehenders construct detailed spatial representations of described situations with auditory text presentation but not with visual text presentation under conditions where neither the instruction, nor the materials, nor the experimental task highlighted spatial information (Claus and Kelter, 2009).
4.3. Effects of the spatial and mental perspective of the protagonist on mental accessibility
Mental simulations constructed during language comprehension are assumed to be biased by the protagonist’s point of view. Thus, according to the simulation view account of salience, which entities of the narrated world compose the available set is affected by the protagonist’s spatial and mental perspective. As was mentioned in section 3.2, several studies indicate that comprehenders adopt the protagonist’s spatial point of view. There are also empirical findings suggesting that the spatial point of view of the protagonist affects the mental accessibility of entities of the described situation. In a study by Borghi, Glenberg, and Kaschak (2004, Experiment 1), participants were presented with sentences describing an object either from an inside perspective, as in (12a), or from an outside perspective, as in (12b). (12)
a. You are driving a car.
b. You are washing a car.
After each sentence, participants had to respond to a part verification task. A probe was presented and the participants’ task was to verify if the probe named a part of the object mentioned in the sentence. Some of the probes named parts usually found inside the object (e.g., fuel gauge) and some of them named parts usually found outside the object (e.g., trunk). Reaction times on the part verification task revealed an interaction between the manipulated perspective and the type of the probed part. Reaction times were shorter when the location of the probed part was consistent with the perspective location (e.g., fuel gauge – driving) compared to when it was inconsistent (e.g., trunk – driving). This finding suggests that perspective-consistent entities of the described world are more accessible than perspective-inconsistent entities. Admittedly, the study by Borghi and colleagues tested the mental accessibility of unmentioned entities. Hence, it remains an open question, whether similar results would be obtained
when testing the mental accessibility of explicitly mentioned entities. However, findings of a study by Horton and Rapp (2003) lend some credence to the conjecture that this question might receive a positive answer in future research. The study by Horton and Rapp was concerned with the question of whether the mental accessibility of an entity mentioned in a text is affected by its visibility to the protagonist. Participants were presented with short narratives like the one in (13). Each narrative mentioned a critical entity that was visible from the protagonist’s current point of view (e.g., mailbox). There were two versions of the last sentence of the narrative, manipulating the visibility of the critical entity without referring to it. In the unblocked version, the sentence described an event that did not block the protagonist’s view of the critical entity, such that it remained visible. In the blocked version, the sentence described an event that blocked the view, such that the critical entity became occluded from the protagonist’s view. (13)
Mr. Ranzini was sitting outside on his front stoop. He had lived on this block for over 30 years. Next door was a local playground for the children. Directly across the street was the mailbox that he used. As usual, Mrs. Rosaldo was taking her poodle for a walk.
unblocked: Suddenly, a man on a bike rode up in front of Mr. Ranzini.
blocked: Suddenly, a large truck pulled up in front of Mr. Ranzini.
At the end of the narrative, participants had to respond to a probe question about the critical entity (e.g., Was there a mailbox across the street?). Response latencies to the probe question were found to be shorter after reading the unblocked version of the last sentence compared to after reading the blocked version. This result suggests that entities which are visible to the protagonist are mentally more accessible than entities which are occluded from the protagonist’s view. Hence, there is some, albeit limited, support for the assumption that the available set of entities at a given time during narrative comprehension is determined by the protagonist’s perceptual perspective.
Let’s now turn to studies which demonstrate effects of the protagonist’s mental perspective on accessibility. Results of a study using the “Morrow paradigm” (see section 4.2) indicate that the mental accessibility of entities does not depend only on the protagonist’s current spatial location but mainly on his or her mental location, that is, a location pertaining to the protagonist’s thoughts (Morrow et al., 1989). Participants were presented with sentences, such as (14), describing the protagonist’s thoughts which involved a particular room.
(14)
He thought the library should be rearranged to make room for a display of current research.
It was found that after reading such a sentence, objects located in the room that the protagonist was thinking about were more accessible than objects in any other room.
Additional support for the impact of the protagonist’s mental perspective on accessibility comes from studies concerning the significance of protagonists’ goals. Empirical findings of several studies indicate that protagonists’ goals are highly accessible (e.g., Dopkins, Klin and Myers, 1993; Huitema, Dopkins, Klin and Myers, 1993; Suh and Trabasso, 1993). Other findings suggest that this only holds as long as the protagonist’s goal is not satisfied. Goal-related information was found to be more accessible when it pertains to a failed goal compared to when it pertains to a completed goal (Lutz and Radvansky, 1997; Radvansky and Curiel, 1998; Suh and Trabasso, 1993).5 In the study by Radvansky and Curiel, participants read narratives such as (15). There were two versions of the narrative which differed with respect to whether an initially mentioned goal of the protagonist (buying a retirement gift) was failed or completed. (15)
Once there was a bank teller named Roy. Roy realized his boss was retiring in four days. He wanted to give her a retirement gift. He went to the department store.
failed goal: He couldn’t find anything nice enough. He felt discouraged.
completed goal: He bought a nice big-screen TV for his boss. He felt pretty good.
Accessibility of the protagonist’s (ongoing or completed) goal was tested by measuring response latencies to a probe question (e.g., Had Roy wanted to buy his boss a gift?). Response latencies were found to be shorter after reading that the goal was failed compared to after reading that the goal was satisfied.
The findings of the studies reported in this section fit well with the assumption that the available set of entities is shaped by the protagonist’s point of view. They indicate that the mental accessibility of an entity is affected by its relation to the protagonist’s spatial and mental perspective.
5. This resembles an effect found for non-linguistic cognition. People remember uncompleted tasks better than completed tasks (Zeigarnik, 1927).
5. Conclusion
The present paper took a look at the issue of salience from the perspective of the simulation view of language comprehension. According to the simulation view, extra-linguistic salience derives from the mental simulations constructed during language comprehension. The available set of entities at a given moment during processing is assumed to be determined by the mental simulation of the situation at the current narrative Now. The studies reported in the previous sections provide empirical support for this assumption.
It should be noted that the simulation view account of salience by no means implies a denial of the impact of linguistic devices on salience. Mental simulations of described situations are constructed through language. They are instructed by linguistic means. Hence, the simulation view account of salience is, in principle, not incompatible with accounts of salience in terms of linguistic factors. It would be interesting to see whether and how both types of accounts might benefit from each other.
In future research, it needs to be clarified whether and how simulation-based salience affects the resolution of referential expressions and how this interacts with effects of linguistic salience. There are already some promising empirical findings in this regard. Results of a sentence-completion study by Stevenson, Crawley and Kleinman (1994) indicate a preference to resolve an ambiguous pronoun with the thematic role that is associated with the consequences of the previously described event (i.e., goal, patient) rather than with the thematic role that is associated with the starting point (i.e., source, agent) – even when order of mention/syntactic function is controlled for. Results of Morrow (1985) also suggest that reference resolution depends more on the event structure than on the linguistic surface structure.
He investigated the resolution of ambiguous definite noun phrases (e.g., He noticed the room was dark) after reading a sentence that described that a character moved from one room to another room. Antecedent choices were determined more by grammatical aspect and preposition that implied the mover’s current location (e.g., John walked past the living room into the kitchen vs. John was walking past the living room to the kitchen) than by the order of mention. Remarkably, an additional experiment indicated that the referential interpretation of proximal and distal demonstratives (e.g., this room vs. that room) was affected by spatial and/or temporal distance in the described world rather than by surface linguistic distance. However, Morrow’s findings as well as the findings by Stevenson and colleagues are based on offline measures. In future studies on the effects of linguistic versus simulation-based salience, it would be expedient to investigate which factors influence the time course of reference resolution during online comprehension. For example, findings of a recent study by Kaiser, Runner, Sussman and Tanenhaus (2009) suggest that fine-grained online measures may yield more differentiated results. In a visual-world eye-tracking experiment, they found early effects of both structural and semantic constraints (syntactic vs. semantic role) on the interpretation of pronouns and reflexives (e.g., Peter {told / heard from} Andrew about the picture of {him / himself} on the wall), with the two anaphoric forms being differentially sensitive to structural and semantic information.
The present paper focused exclusively on issues concerning language comprehension. It will be a crucial task to account for the production side, that is, the issue of whether simulation-derived extra-linguistic salience affects the choice of referring expressions and how such effects may interact with purely linguistic factors (see Filchenko, this volume, for culturally conditioned effects on speakers’ choice of grammatical constructions that were found for the indigenous language Eastern Khanty). An experimental study by Arnold (2001) is germane to this question. She found that speakers more frequently used pronouns when referring to the goal of a transfer verb than when referring to the source. However, this effect primarily occurred for object referents rather than for subject referents6 (see also Rose, this volume, who provides corpus-based evidence that both syntactic and semantic prominence affect subsequent pronominal reference). Arnold’s findings do suggest that situational factors, such as event structure, may co-determine the choice of referring expressions. Yet, it remains to be seen in future studies whether and to what extent simulation-based salience can affect the form of anaphoric reference. The choice as well as the interpretation of referring expressions is largely governed by linguistic conventions.
For instance, pronominal reference is usually not licensed when there is no explicit linguistic antecedent.7 Hence, it is not surprising that the form and resolution of anaphoric expressions are crucially determined by purely linguistic factors. However, recent findings strongly support a form-specific multiple-constraints approach, which claims that the resolution of anaphora is guided by multiple constraints (e.g., syntactic, semantic, discourse based) that are weighted differently for different anaphoric forms (e.g., Brown-Schmidt, Byron, and Tanenhaus, 2005; Kaiser and Trueswell, 2008; Kaiser et al., 2009). The point of this paper is that extra-linguistic, simulation-based salience may also play a role – at least in narrative comprehension.
6. Overall, the proportion of pronoun uses was higher for subject referents than for object referents.
7. Exceptions are, for example, focussed entities in situated language processing and probably also nuclear implicit referents in dialogue utterances (for empirical evidence, see Cornish, Garnham, Cowles, Fossard, and André, 2005).
References
Anderson, Anne, Simon C. Garrod, and Anthony J. Sanford 1983 The accessibility of pronominal antecedents as a function of episode shifts in narrative text. Quarterly Journal of Experimental Psychology 35A: 427–440. Arnold, Jennifer E. 2001 The effects of thematic roles on pronoun use and frequency of reference. Discourse Processes 31: 137–162. Barquero, Beatriz 1999 Mentale Modelle von mentalen Zuständen und Handlungen der Textprotagonisten [Mental models of text protagonists’ mental states and actions]. Zeitschrift für Experimentelle Psychologie 46: 243–248. Barsalou, Lawrence W. 1999 Perceptual symbol systems. Behavioral and Brain Sciences 22: 577–660. Bestgen, Yves and Wietske Vonk 1995 The role of temporal segmentation markers in discourse processing. Discourse Processes 19: 385–406. Bexten, Birgitta this vol.
Multiple preferred centers in a plurilinear discourse environment. this volume, 229–250.
Black, John B., Terrence J. Turner, and Gordon H. Bower 1979 Point of view in narrative comprehension, memory, and production. Journal of Verbal Learning and Verbal Behavior 18: 187–198. Borghi, Anna M., Arthur M. Glenberg, and Michael P. Kaschak 2004 Putting words in perspective. Memory & Cognition 32: 863–873. Brooks, Lee R. 1967
The suppression of visualization by reading. Quarterly Journal of Experimental Psychology 19: 289–299.
Brooks, Rodney A. 1987 Intelligence without representation. Artificial Intelligence 47: 139–159. Brown-Schmidt, Sarah, Donna K. Byron, and Michael K. Tanenhaus 2005 Beyond salience: Interpretation of personal and demonstrative pronouns. Journal of Memory and Language 53: 292–313. Bryant, David J., Barbara Tversky, and Margaret Lanca 2001 Retrieving spatial relations from observation and memory. In Conceptual structure and its interfaces with other modules of representation, Emile van der Zee and Urpo Nikanne (eds.), 116–139. Oxford: Oxford University Press.
Buccino, Giovanni, Lucia Riggio, Gabriele Melli, Ferdinand Binkofski, Vittorio Gallese, and Giacomo Rizzolatti 2005 Listening to action-related sentences modulates the activity of the motor system: A combined TMS and behavioral study. Cognitive Brain Research 24: 355–363. Carreiras, Manuel, Núria Carriedo, María A. Alonso, and Angel Fernández 1997 The role of verb tense and verb aspect in the foregrounding of information during reading. Memory & Cognition 25: 438–446. Claus, Berry 2008
Comprehending descriptions of non-factual desired situations: Discourse referents and motor actions. In Proceedings of the Workshop Constraints in Discourse 3, Anton Benz, Peter Kühnlein, and Manfred Stede (eds.). Potsdam, Germany.
Claus, Berry and Stephanie Kelter 2006 Comprehending narratives containing flashbacks: Evidence for temporally organized representations. Journal of Experimental Psychology: Learning, Memory, and Cognition 32: 1031–1044. Claus, Berry and Stephanie Kelter 2009 Embodied language comprehension: The processing of spatial information during reading and listening. In Advances in Psychology Research, Vol. 59, Alexandra M. Columbus (ed.), 1–44. Hauppauge, NY: Nova Science. Cornish, Francis, Alan Garnham, H. Wind Cowles, Marion Fossard, and Virginie André 2005 Indirect anaphora in English and French: A cross-linguistic study of pronoun resolution. Journal of Memory and Language 52: 363–376. de Vega, Manuel, José M. Díaz, and Immaculada León 1997 To know or not to know: Comprehending protagonists’ beliefs and their emotional consequences. Discourse Processes 23: 169–192. Dopkins, Stephen, Celia Klin, and Jerome L. Myers 1993 Accessibility of information about goals during the processing of narrative texts. Journal of Experimental Psychology: Learning, Memory, & Cognition 19: 70–80. Dutke, Stephan 2003
Anaphor resolution as a function of spatial distance and priming: exploring the spatial distance effect in situation models. Experimental Psychology 50: 270–284.
Eddy, John K., and Arnold L. Glass 1981 Reading and listening to high and low imagery sentences. Journal of Verbal Learning and Verbal Behavior 20: 333–345.
Filchenko, Andrey Y. this vol. Parenthetical agent-demoting constructions in Eastern Khanty: Discourse salience vis-à-vis referring expressions. this volume, 57–79. Franklin, Nancy and Barbara Tversky 1990 Searching imagined environments. Journal of Experimental Psychology: General 119: 63–76. Gernsbacher, Morton Ann, H. Hill Goldsmith, and Rachel R.W. Robertson 1992 Do readers mentally represent fictional characters’ emotional states? Cognition & Emotion 6: 89–111. Gernsbacher, Morton Ann and Rachel R.W. Robertson 1992 Knowledge activation versus sentence mapping when representing fictional characters’ emotional states. Language and Cognitive Processes 7: 353–371. Glenberg, Arthur M. 1997 What memory is for. Behavioral and Brain Sciences 20: 1–55. Glenberg, Arthur M., David Havas, Raymond Becker, and Mike Rinck 2005 Grounding language in bodily states: The case for emotion. In Grounding cognition: The role of perception and action in memory, language, and thinking, Diane Pecher and Rolf A. Zwaan (eds.), 115– 128. Cambridge: Cambridge University Press. Glenberg, Arthur M. and Michael P. Kaschak 2002 Grounding language in action. Psychonomic Bulletin & Review 9: 558–565. Glenberg, Arthur M., and Shashi Mathew 1992 When minimalism is not enough: Mental models in reading comprehension. Psycoloquy 3(64), reading-inference-2.1 Glenberg, Arthur M., Marion Meyer, and Karen Lindem 1987 Mental models contribute to foregrounding during text comprehension. Journal of Memory and Language 26: 69–83. Glenberg, Arthur M., Marc Sato, and Luigi Cattaneo 2008 Use-induced motor plasticity affects the processing of abstract and concrete language. Current Biology 18: R290–R291. González, Julio, Alfonso Barros-Loscertales, Friedemann Pulvermüller, Vanessa Meseguer, Ana Sanjuán, Vicente Belloch, and César Ávila 2006 Reading cinnamon activates olfactory brain regions. NeuroImage 32: 906–912. 
Graesser, Arthur C., Murray Singer, and Tom Trabasso 1994 Constructing inferences during narrative text comprehension. Psychological Review 101: 371–395.
Haenggi, Dieter, Walter Kintsch, and Morton Ann Gernsbacher 1995 Spatial situation models and text comprehension. Discourse Processes 19: 173–199. Harnad, Stevan 1990
The symbol grounding problem. Physica D 42: 335–346.
Hauk, Olaf, Ingrid Johnsrude, and Friedemann Pulvermüller 2004 Somatotopic representation of action words in human motor and premotor cortex. Neuron 41: 301–307. Heim, Irene 1982
The semantics of definite and indefinite noun phrases. Unpublished Dissertation. University of Massachusetts, Amherst.
Hinterhölzl, Roland and Svetlana Petrova this vol. Rhetorical relations and verb placement in Old High German. this volume, 173–201. Holt, Lauren E. and Sian L. Beilock 2006 Expertise and its embodiment: Examining the impact of sensorimotor skill expertise on the representation of action-related text. Psychonomic Bulletin & Review 13: 694–701. Hörnig, Robin, Berry Claus, and Klaus Eyferth 2000 In search of an overall organizing principle in spatial mental models: a question of inference. In Spatial Cognition: Foundations and applications. Selected papers from Mind III, Annual conference of the Cognitive Science Society of Ireland, 1998, Seán Ó’Nualláin (ed.), 69–81. Amsterdam: John Benjamins. Horton, William S. and David N. Rapp 2003 Out of sight, out of mind: Occlusion and the accessibility of information in narrative comprehension. Psychonomic Bulletin & Review 10: 104–110. Huitema, John S., Stephen Dopkins, Celia M. Klin, and Jerome L. Myers 1993 Connecting goals and actions during reading. Journal of Experimental Psychology: Learning, Memory, & Cognition 19: 1053–1060. Johnson-Laird, Philip N. 1983 Mental models: Towards a cognitive science of language, inference, and consciousness. Cambridge: Cambridge University Press. Kaiser, Elsi, Jeffrey T. Runner, Rachel S. Sussman, and Michael K. Tanenhaus 2009 Structural and semantic constraints on the resolution of pronouns and reflexives. Cognition 112: 55–80.
Kaiser, Elsi and John C. Trueswell 2008 Interpreting pronouns and demonstratives in Finnish: Evidence for a form-specific approach to reference. Language and Cognitive Processes 23: 709–748. Kamp, Hans and Uwe Reyle 1993 From discourse to logic. Dordrecht: Kluwer Academic Publishers. Kaup, Barbara, Jana Lüdtke, and Rolf A. Zwaan 2006 Processing negated sentences with contradictory predicates: Is a door that is not open mentally closed? Journal of Pragmatics 38: 1033–1050. Kaup, Barbara, Richard H. Yaxley, Carol J. Madden, Rolf A. Zwaan, and Jana Lüdtke 2007 Experiential simulations of negated text information. Quarterly Journal of Experimental Psychology 60: 976–990. Karttunen, Lauri 1976 Discourse referents. In Syntax and semantics, Vol. 7: Notes from the linguistic underground, James D. McCawley (ed.), 363–386. New York: Academic Press. Kelleher, John D. this vol. Visual salience and the other one. this volume, 209–228. Kelter, Stephanie, Barbara Kaup, and Berry Claus 2004 Representing a described sequence of events: A dynamic view of narrative comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition 30: 451–464. Long, Debra L. and Jonathan M. Golding 1993 Superordinate goal inferences: Are they automatically generated during comprehension? Discourse Processes 15: 55–73. Lutz, Mark F. and Gabriel A. Radvansky 1997 The fate of completed goal information in narrative comprehension. Journal of Memory and Language 36: 293–310. MacWhinney, Brian 1977 Starting points. Language 53: 152–187. MacWhinney, Brian 2005 The emergence of grammar from perspective. In Grounding cognition: The role of perception and action in memory, language, and thinking, Diane Pecher and Rolf A. Zwaan (eds.), 198–223. Cambridge: Cambridge University Press. Magliano, Joseph P. and Michelle C. Schleich 2000 Verb aspect and situation models. Discourse Processes 29: 83–112. McKoon, Gail and Roger Ratcliff 1992 Inferences during reading. Psychological Review 99: 440–466.
Meteyard, Lotte, Bahador Bahrami, and Gabriella Vigliocco 2007 Motion detection and motion verbs. Psychological Science 18: 1007– 1013. Morrow, Daniel G. 1985 Prepositions and verb aspect in narrative understanding. Journal of Memory and Language 24: 390–404. Morrow, Daniel G., Gordon H. Bower, and Steven L. Greenspan 1989 Updating situation models during narrative comprehension. Journal of Memory and Language 28: 292–312. Morrow, Daniel G., Steven L. Greenspan, and Gordon H. Bower 1987 Accessibility and situation models in narrative comprehension. Journal of Memory and Language 26: 165–187. Moscoso del Prado Martín, Fermín, Olaf Hauk, and Friedemann Pulvermüller 2006 Category specificity in the processing of color-related and form-related words: An ERP study. NeuroImage 29: 29–37. Navon, David 1978
On a conceptual hierarchy of time, space, and other dimensions. Cognition 6: 223–228.
O’Brien, Edward J. and Jason E. Albrecht 1992 Comprehension strategies in the development of a mental model. Journal of Experimental Psychology: Learning, Memory, and Cognition 18: 777–784. Pecher, Diane, Saskia van Dantzig, Rolf A. Zwaan, and René Zeelenberg 2009 Language comprehenders retain implied shape and orientation of objects. Quarterly Journal of Experimental Psychology 62: 1108–1114. Poynor, David V. and Robin K. Morris 2003 Inferred goals in narratives: Evidence from self-paced reading, recall, and eye movements. Journal of Experimental Psychology: Learning, Memory, and Cognition 29: 3–9. Radvansky, Gabriel A. and David E. Copeland 2000 Functionality and spatial relations in memory and language. Memory & Cognition 28: 987–992. Radvansky, Gabriel A. and Jacqueline M. Curiel 1998 Narrative comprehension and aging: The fate of completed goal information. Psychology & Aging 13: 69–79. Rall, Jaime and Paul L. Harris 2000 In Cinderella’s slippers? Story comprehension from the protagonist’s point of view. Developmental Psychology 36: 202–208.
Ramm, Wiebke this vol. Discourse-structural salience from a cross-linguistic perspective: Coordination and its contribution to discourse (structure). this volume, 143–172.
Rinck, Mike and Gordon H. Bower 1995 Anaphora resolution and the focus of attention in situation models. Journal of Memory and Language 34: 110–131.
Rinck, Mike, Andrea Hähnel, Gordon H. Bower, and Ulrich Glowalla 1997 The metrics of spatial situation models. Journal of Experimental Psychology: Learning, Memory, and Cognition 23: 622–637.
Rinck, Mike and Ulrike Weber 2003 Who when where: An experimental test of the event-indexing model. Memory & Cognition 31: 1284–1292.
Rinck, Mike, Pepper Williams, Gordon H. Bower, and Eni S. Becker 1996 Spatial situation models and narrative understanding: Some generalizations and extensions. Discourse Processes 21: 23–55.
Rose, Ralph L. this vol. Joint information value of syntactic and semantic prominence for subsequent pronominal reference. this volume, 81–103.
Sanford, Anthony J., Michael Clegg, and Asifa Majid 1998 The influence of types of character on processing background information in narrative discourse. Memory & Cognition 26: 1323–1329.
Sanford, Anthony J. and Simon C. Garrod 1981 Understanding written language: explorations of comprehension beyond the sentence. New York: John Wiley.
Speer, Nicole K. and Jeffrey M. Zacks 2005 Temporal changes as event boundaries: Processing and memory consequences of narrative time shifts. Journal of Memory and Language 53: 125–140.
Stevenson, Rosemary J., Rosalind A. Crawley, and David Kleinman 1994 Thematic roles, focus and the representation of events. Language and Cognitive Processes 9: 519–548.
Suh, Soyoung and Tom Trabasso 1993 Inferences during reading: Converging evidence from discourse analysis, talk-aloud protocols, and recognition priming. Journal of Memory and Language 32: 279–300.
Taylor, Larry J. and Rolf A. Zwaan 2008 Motor resonance and linguistic focus. Quarterly Journal of Experimental Psychology 61: 896–904.
van Dijk, Teun A. and Walter Kintsch 1983 Strategies of discourse comprehension. New York: Academic Press.
Zeigarnik, Bluma 1927 Das Behalten erledigter und unerledigter Handlungen [Remembering completed and uncompleted tasks]. Psychologische Forschung 9: 1–85.
Ziegler, Fenja, Peter Mitchell, and Gregory Currie 2005 How does narrative cue children’s perspective taking? Developmental Psychology 41: 115–123.
Zwaan, Rolf A. 1996 Processing narrative time shifts. Journal of Experimental Psychology: Learning, Memory, and Cognition 22: 1196–1207.
Zwaan, Rolf A. 2004 The immersed experiencer: Toward an embodied theory of language comprehension. In The psychology of learning and motivation, Brian H. Ross (ed.), 35–62. Academic Press, New York.
Zwaan, Rolf A., Carol J. Madden, and Shannon N. Whitten 2000 The presence of an event in the narrated situation affects its availability to the comprehender. Memory & Cognition 28: 1022–1028.
Zwaan, Rolf A. and Gabriel A. Radvansky 1998 Situation models in language comprehension and memory. Psychological Bulletin 123: 162–183.
Zwaan, Rolf A., Gabriel A. Radvansky, Amy E. Hilliard, and Jacqueline M. Curiel 1998 Constructing multidimensional situation models during reading. Scientific Studies of Reading 2: 199–220.
Zwaan, Rolf A. and Larry J. Taylor 2006 Seeing, acting, understanding: motor resonance in language comprehension. Journal of Experimental Psychology: General 135: 1–11.
Zwaan, Rolf A. and Timothy P. Truitt 1998 Smoking urges affect language processing. Experimental and Clinical Psychopharmacology 6: 325–330.
Index
Language index
Danish 109, 245
Dutch 9, 38
English 1, 11, 32–33, 82–98, 143–168, 235, 245
Finnish 9, 123
German 15, 111, 114, 143–169, 173–178, 193–195, 236
Japanese 98
Khanty, Eastern (Uralic) 10, 57–76, 99, 269
Latin 1, 175–184, 186–194
Norwegian 15, 143–169
Old English 177, 179, 185
Old High German 15, 82, 173–196
Old Norse 177
Old Saxon 177, 179
Russian 10, 31–53, 58, 83, 143
Spanish 98
Index of determinants, manifestations and aspects of salience
aboutness 35, 37, 59, 119, 128, 146–147, 182–183, 195, 229–242, 245–246, 265–267
accessibility 4–5, 14, 19, 59, 71, 107–111, 134, 174, 180–184, 251–252, 258, 261–267
activation 5, 9–10, 31–42, 52, 59–60, 62, 107, 109, 208, 254, 257
agenthood (agentivity, see semantic roles) 10, 57, 60–65, 68, 71–76, 82, 86–87, 93–95, 117
animacy (animate, inanimate) 33, 42, 45–50, 62, 74, 94, 173, 191
clause linkage 145, 151, 161, 167
cognitive status (attentional state, discourse status) 34–37, 59, 61, 82, 108–109, 119, 123, 127, 133, 173, 180, 190, 208, 234
contrast (contrastivity, contrastiveness) 16–17, 39, 41, 123, 161, 239, 244
coordination 4, 13–15, 143–169, 174–175, 183, 185–186, 188, 192–195
definiteness 5, 45, 62, 128–130, 180, 184, 186, 205, 207–209, 219–220, 222, 268
discourse prominence 9, 57, 76, 107, 110–111, 115
distance (recency of mention) 16, 18, 32–33, 35–37, 39, 51–52, 83, 113–117, 174, 215, 221–223, 242, 251–252, 261, 263–264, 268
emphasis 107, 111, 123, 134, 239, 242, 244–246
familiarity, information status (as defined by Ellen F. Prince) 5, 114, 182, 236
  discourse-new (see new) 15, 31, 174, 183, 236
  discourse-old (see given) 15, 229, 232, 236, 239
  hearer-new (see new) 119, 236–237
  hearer-old (see given) 236–237
feature salience 125–126
focusing (semantic focusing) 10, 17, 82, 145, 165
given (givenness) 3, 5, 9, 15, 32, 34, 35, 105, 107–109, 111, 115, 123, 130, 134, 146, 173–174, 180, 182–184, 190–191, 193, 258
grammatical roles (syntactic relations) 5–8, 11, 33, 57, 59–66, 73–74, 76, 81–97, 105–134, 144, 173, 216, 231, 235, 269
hearer salience 11, 84, 107–134
identifiability 35–36, 60, 71
importance 1, 4, 45–46, 94, 107–108, 115, 122–123, 134, 174, 179, 192, 242, 260, 264
imposed salience 9, 109
inherent salience 9, 109
markedness 81, 109–112, 163, 229, 232, 233, 235, 238, 240–241, 244–247
new (newness, newsworthiness) 3, 15, 31, 45–46, 51, 60–62, 107, 111, 119, 128, 134, 146, 150, 173–174, 180–184, 186, 190–193, 236–237, 256
nuclearity (nucleus-satellite relationships in RST) 14, 143, 146, 148–149, 154–155, 166–168, 185
perceptual salience 15–17, 109, 133–134, 205, 208, 258
persistence (thematic prominence) 46–48, 51, 61, 76, 81, 115, 122
pronominalization 6–8, 61, 76, 90–92, 96, 113, 115–118, 120–121, 127–130, 132
property salience 125–126
prosody 17, 63, 107, 229, 150, 229, 242, 245
referential choice (choice of referring expressions) 9, 11, 31–52, 76, 83, 105–134, 269, 173
referentiality 5, 35–37, 74, 88, 184, 253–254, 268
relevance 1, 9, 31, 35, 38, 47–49, 52, 59, 64, 106–107, 134, 146, 149–150, 161, 180, 230, 252
selective attention 16
semantic roles (thematic roles) 10–11, 17, 57, 59–60, 63–67, 76, 81–99, 150, 167, 268–269
socio-cultural factors 66–73
speaker salience 11, 84, 107–134
subordination 4, 13–15, 93, 143–169, 174–175, 185–188, 195, 243, 258
topicality 5, 9, 59, 61–64, 71–77, 105, 107, 111, 115–118, 120, 126, 134, 182, 252
typographic marks 4, 17, 229, 242, 245–246
visual salience 16, 18, 83, 107, 123, 126, 205–225, 231, 237, 253
word order (order of mention) 8, 11, 57, 60, 76, 81–82, 98, 105–134, 143–144, 153, 159, 164, 168, 174–192, 195, 236, 245, 268
Subject index
active-direct voice 59–65, 71, 74–75, 106
attention 1, 5–7, 9, 11, 16–17, 32, 34–35, 61, 105–109, 111, 115, 119, 122–124, 127, 131, 133–134, 148, 179, 207–209, 211–215, 224–225, 233–234, 236–239, 245–247, 264
Background (discourse structure) 145–150, 154, 156, 168
background (information structure) 3, 110, 143, 145–146, 149, 180
backgrounding (salience demotion, downgrading, defocusing) 10, 57, 59–62, 64–66, 68–69, 71–76, 99, 110, 144–146, 151, 162–166, 174, 184–185
backward-looking (anaphoric aspects of discourse processing) 3, 5–11, 108, 111, 113–114, 120–121, 127–131, 147–148, 154, 234–236, 238
bridging 131, 181
center of attention (see focus of attention) 6, 106, 131, 239, 245
center (Centering) 5–8, 19, 59, 61, 63, 65, 76, 113–114, 120–121, 127–131, 216, 232–241, 245–247
  backgrounded center 59, 65, 236
  backward-looking center 5–8, 113–114, 120–121, 127–131, 234–236, 238
  foregrounded center 59, 63, 65, 236
  forward-looking centers 5–7, 15, 18–19, 113–114, 116, 120, 132, 234–237, 239–241, 244–246
  preferred center 6–7, 61, 63, 113, 120–121, 229–247
Centering Theory 2–3, 5–10, 18–19, 59, 61, 76, 83, 105, 111, 113–122, 126–134, 216, 229–247
Centering transitions 6–8, 94, 113, 120, 128, 131, 235–236, 243
coherence 3, 6–8, 10, 12, 19, 76, 82, 111, 113, 229, 132, 134, 174–175, 180, 185, 195, 229–231, 237, 241, 243, 246–247, 256
common ground 5, 108–109, 123, 131, 208–209
continuity
  continuation (discourse relation) 186–187, 195
  continue (Centering transition) 6–8, 61, 65, 74, 76, 133, 234, 236
  topic continuity 150–151, 187
deixis 17, 126, 258
demonstrative 9–10, 31–52, 123–124, 153, 173, 205, 268
discourse relations (coherence relations, rhetorical relations) 1, 4, 11–15, 43, 144–155, 159–164, 166, 168, 173–195
Discourse Representation Theory (DRT) 12, 14, 253
discourse structure 2–4, 11–15, 97, 143–169, 174–175, 183–190, 195, 208, 225, 230, 241, 243
ergative voice construction 10–11, 58, 59, 61–63, 68–76
focus (information structure) 3–4, 10, 17, 61–62, 72, 92, 143, 146, 162, 178, 180, 183, 193
focus (of attention) 1, 5, 9, 18, 35–36, 106, 145, 208, 211, 214–215, 222, 234, 239, 247
focus stack model 208, 225
foreground 132, 145, 147–148
foregrounding (salience promotion) 57, 59, 63–65, 73–76, 107, 109, 110–111, 132–133, 144–145, 147–148, 165, 173, 184–185, 236, 245
forward-looking (cataphoric aspects of discourse) 3, 5–7, 10–11, 15, 18–19, 111, 113–114, 116, 120, 131, 147–148, 230–232, 234–237, 239–242, 244–246
Game Theory 134–135
Givenness Hierarchy 9, 32, 34–35
hypertext 4, 17–19, 229–247, 253
iconicity 106, 110–111
information structure 3, 15, 59, 61–62, 64, 75, 123, 145–148, 176, 179–183, 194, 236
Information Theory 11, 82–83, 89–91, 97
memory 9, 31–32, 34–35, 107, 256–257
Optimality Theory 120, 126–131, 133
passive voice 58–59, 63–66, 68, 72–77, 106, 177
quaestio, question under discussion (QUD) 148, 150
reference resolution (anaphor resolution) 4, 8, 14, 17–18, 97, 113, 128, 130, 133, 205–225, 236–237, 243–244, 253, 268–269
referent (discourse referent) 2–3, 6–9, 18, 31–39, 42, 44–52, 57, 59–65, 68, 71–76, 82–83, 88–99, 108–120, 123–126, 128, 133, 173–174, 180–182, 184, 186–187, 190–192, 195, 205, 207–213, 218–224, 243–244, 253, 269
Relevance Theory 149–150
Rhetorical Structure Theory (RST) 14, 43, 143, 146, 148–149, 154–155, 160, 168, 185
right frontier principle 12–14, 155
saliency map 17, 215
Segmented Discourse Representation Theory (SDRT) 12, 143–169, 174, 185–186, 195, 225
shift (of attention) 7, 41, 76, 106–107, 109, 111, 173, 179, 244–247
shift (Centering transition) 7, 61, 127, 234, 247
  perspective shift 258, 191, 258–259
  temporal shift 31, 41–45, 52, 256
  thematic shift 235, 246
  topic shift 42
simulation view (of language comprehension) 18–19, 251–269
situated communication 4, 16–18, 205, 207, 220, 224, 231, 237, 253, 269
topic 3, 5, 8, 10, 35, 42, 59, 61–66, 74–75, 92, 127–128, 145, 147, 178, 180, 182–184, 187, 192, 194–195, 231, 233–236, 252
visual-world paradigm 16, 269