Lexis in Contrast: Corpus-based Approaches (Studies in Corpus Linguistics)


Author: Bengt Altenberg | Sylviane Granger



Lexis in Contrast

Studies in Corpus Linguistics

Studies in Corpus Linguistics aims to provide insights into the way a corpus can be used, the type of findings that can be obtained, the possible applications of these findings as well as the theoretical changes that corpus work can bring into linguistics and language engineering. The main concern of SCL is to present findings based on, or related to, the cumulative effect of naturally occurring language and on the interpretation of frequency and distributional data.

General Editor Elena Tognini-Bonelli

Consulting Editor Wolfgang Teubert

Advisory Board
Michael Barlow, Rice University, Houston
Robert de Beaugrande, UAE
Douglas Biber, North Arizona University
Chris Butler, University of Wales, Swansea
Wallace Chafe, University of California
Stig Johansson, Oslo University
M. A. K. Halliday, University of Sydney
Graeme Kennedy, Victoria University of Wellington
John Laffling, Herriot Watt University, Edinburgh
Geoffrey Leech, University of Lancaster
John Sinclair, University of Birmingham
Piet van Sterkenburg, Institute for Dutch Lexicology, Leiden
Michael Stubbs, University of Trier
Jan Svartvik, University of Lund
H-Z. Yang, Jiao Tong University, Shanghai
Antonio Zampolli, University of Pisa

Volume 7 Lexis in Contrast: Corpus-based approaches Edited by Bengt Altenberg and Sylviane Granger

Lexis in Contrast Corpus-based approaches

Edited by Bengt Altenberg University of Lund

Sylviane Granger Université Catholique de Louvain

John Benjamins Publishing Company Amsterdam/Philadelphia


The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

Cover design: Françoise Berserik Cover illustration from original painting Random Order by Lorenzo Pezzatini, Florence, 1996.

Library of Congress Cataloging-in-Publication Data
Lexis in contrast : corpus-based approaches / edited by Bengt Altenberg, Sylviane Granger.
p. cm. (Studies in Corpus Linguistics, ISSN 1388–0373 ; v. 7)
Includes bibliographical references and index.
1. Lexicology--Data processing. 2. Contrastive linguistics--Data processing. 3. Lexicography--Data processing. 4. Translating and interpreting--Data processing.
I. Altenberg, Bengt. II. Granger, Sylviane, 1951- III. Series.
P326.5.D38 LA495 2002
413’.028--dc21
2001037885
ISBN 90 272 2277 0 (Eur.) / 1 58811 090 7 (US) (Hb; alk. paper)

© 2002 – John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA

Table of contents

Preface

List of contributors

Introduction

Recent trends in cross-linguistic lexical studies
Bengt Altenberg and Sylviane Granger

Cross-Linguistic Equivalence

Two types of translation equivalence
Raphael Salkie

Functionally complete units of meaning across English and Italian: Towards a corpus-driven approach
Elena Tognini Bonelli

Causative constructions in English and Swedish: A corpus-based contrastive study
Bengt Altenberg

Contrastive Lexical Semantics

Polysemy and disambiguation cues across languages: The case of Swedish få and English get
Åke Viberg

A cognitive approach to Up/Down metaphors in English and Shang/Xia metaphors in Chinese
Lan Chun

From figures of speech to lexical units: An English-French contrastive approach to hypallage and metonymy
Michel Paillard

Corpus-based Bilingual Lexicography

The role of parallel corpora in translation and multilingual lexicography
Wolfgang Teubert

Bilingual lexicography, overlapping polysemy, and corpus use
Victòria Alsina and Janet DeCesaris

Computerised set expression dictionaries: Analysis and design
Sylviane Cardey and Peter Greenfield

Making a workable glossary out of a specialised corpus: Term extraction and expert knowledge
Christine Chodkiewicz, Didier Bourigault and John Humbley

Translation and Parallel Concordancing

Translation alignment and lexical correspondences: A methodological reflection
Olivier Kraif

The use of electronic corpora and lexical frequency data in solving translation problems
François Maniez

Multiconcord: A computer tool for cross-linguistic research
Patrick Corness

General index

Author index



Preface

Most of the articles in this volume represent a selection of papers presented at the ‘Contrastive Linguistics and Translation Studies. Empirical Approaches’ conference organised by Sylviane Granger at the Catholic University of Louvain in February 1999. All the contributions have been revised to fit the special theme of the volume. In addition, two contributions have been added to the original selection of papers: the introductory survey by Bengt Altenberg and Sylviane Granger and Wolfgang Teubert’s article on the importance of translations in cross-linguistic lexical research.

The contributions reflect three striking tendencies that emerged during the conference. One is the rapidly growing interest in corpus-based approaches to the study of lexis, in particular the use of multilingual corpora, shared by researchers working in widely differing fields: contrastive linguistics, lexicology, lexicography, terminology, computational linguistics, machine translation and other branches of natural language processing.

The second tendency finds its expression in the wealth of methodological approaches represented at the conference, especially as regards the kinds of corpora used and the ways in which multilingual lexical information can be extracted from corpora and exploited for various purposes. This methodological diversity reflects to some extent the types of monolingual and multilingual corpora available at the time of the conference, but it is above all a healthy and promising sign of the vitality and desire for reorientation in a number of related fields where not only the object of research (lexis) but also the methodology (the use of corpora) are rapidly expanding and demanding increasing attention.

However, no matter what the purpose of the individual contributions may be, whether theoretical or practical, the driving force that unites them all is easily recognisable as the third, and perhaps most fundamental, tendency to have emerged from the conference: a common desire to give the cross-linguistic study of lexis a firm empirical foundation.

We have divided the articles into four main groups reflecting what we regard as some major concerns and aspects of the field: the exploration of cross-linguistic equivalence, contrastive lexical semantics, corpus-based multilingual lexicography, and translation and parallel concordancing. The conference brought together researchers from a wide range of countries and this is reflected in the diversity of the languages covered in the articles: English, Catalan, Chinese, Czech, Finnish, French, German, Italian, Lithuanian, Spanish and Swedish.

In preparing this volume we have benefited from the generous help of several people. Apart from the contributors themselves, we wish to thank an anonymous reviewer for many valuable comments and suggestions, Helen Swallow for her meticulous examination of the manuscript, and Kees Vaes and Elena Tognini-Bonelli for the confidence they have shown in entrusting us with the task of editing this volume.

Bengt Altenberg and Sylviane Granger
Lund and Louvain-la-Neuve, Autumn 2001

List of contributors

Victòria Alsina
Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra, Barcelona

Bengt Altenberg
Department of English, University of Lund

Didier Bourigault
Centre National de la Recherche Scientifique, Equipe de recherche en syntaxe et sémantique, Université Toulouse-le-Mirail

Sylviane Cardey
Centre de recherche en linguistique Lucien Tesnière, Université de Franche-Comté

Christine Chodkiewicz
Centre National de la Recherche Scientifique, Centre de terminologie et de néologie, Laboratoire de linguistique informatique, Université Paris 13

Janet DeCesaris
Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra, Barcelona

Peter Greenfield
Centre de recherche en linguistique Lucien Tesnière, Université de Franche-Comté

John Humbley
Centre de terminologie et de néologie, Laboratoire de linguistique informatique, Université Paris 13

Patrick Corness
School of International Studies and Law, Coventry University

Sylviane Granger
Centre for English Corpus Linguistics, Université Catholique de Louvain

Olivier Kraif
Laboratoire d’Ingénierie Linguistique et de Linguistique Appliquée, Université de Nice Sophia Antipolis

Lan Chun
Department of English, Beijing Foreign Studies University

François Maniez
Centre de Recherche en Terminologie et Traduction, Département des Langues Etrangères Appliquées, Université Lumière Lyon II

Michel Paillard
Département d’Études Anglophones, Université de Poitiers

Raphael Salkie
School of Languages, University of Brighton

Wolfgang Teubert
Department of English, University of Birmingham

Elena Tognini Bonelli
Università degli Studi di Lecce and The Tuscan Word Centre

Åke Viberg
Department of Linguistics, University of Lund

P I

Introduction

Recent trends in cross-linguistic lexical studies Bengt Altenberg and Sylviane Granger

.

Lexis and contrastive linguistics

1.1 Lexis: an expanding universe

The days are long gone when lexis was thought of as an unruly chaos, “a prison”, to use Di Sciullo & Williams’ (1987: 3) words, “[which] contains only the lawless, and [where] the only thing the inmates have in common is their lawlessness”. Following this period of neglect, during which lexis was most definitely the poor relation of grammar and syntax, there has been a radical restructuring of priorities, and the lexicon now features high on the agenda, in both theoretical and applied linguistics. As a result, there is a general trend towards lexically oriented approaches to language in which what were formerly regarded as syntactic phenomena have increasingly come to be viewed as projections of lexical properties. This development is noticeable in most branches of linguistics, formal as well as functional.1

One influential strand of this development is the empiricist movement that is sometimes called ‘British contextualism’, most clearly represented by John Sinclair and his colleagues. Sinclair (1987a) attributes this dramatic turnabout to two concurring factors: Halliday’s model of language and the advent of computers. In 1966, in an article entitled ‘Lexis as a Linguistic Level’, Halliday called for recognition of a lexical level alongside the universally recognised grammatical level. From the start, however, he insisted that lexis was not to be viewed as totally separate from grammar: “If therefore one speaks of a lexical level, there is no question of asserting the ‘independence’ of such a level, whatever this might mean; what is implied is the internal consistency of the statements and their referability to a stated model” (1966: 152). Alongside the grammatical and the lexical levels, there is also a lexico-grammatical level where lexical restrictions intersect with grammatical ones.

The main argument offered by Halliday in support of a lexical level is the existence of collocations, i.e. combinatory restrictions which are neither grammatical nor semantic but which reflect “the habitual or customary places” of words (Firth 1957: 12). The acceptability of ‘strong tea’ and ‘powerful car’ and the relative unacceptability of ‘powerful tea’ and ‘strong car’ demonstrate the existence of restrictions which depend on the syntagmatic relations into which words enter. Collocations are essentially based on probabilities, with words having a higher or lower likelihood of occurring together. But on the whole this probability is extremely low and, as a result, verification of Halliday’s probabilistic approach relies on the existence of large corpora and computational techniques.2

Without the advent of computers the approach to lexis propounded by Halliday would never have had the tremendous impact it has already had and continues to have on the field of linguistics. Computers have made it possible to store ever larger collections of texts in electronic form and to analyse them using increasingly sophisticated, versatile and user-friendly software tools. But whereas grammar and semantics involve a high degree of abstraction, and are therefore relatively difficult to access using computer technology, lexis lends itself perfectly to the form-based research at which computers excel, whether those forms be letters, word spaces, punctuation marks or, indeed, words. Take frequency counts for example: an ideal field of enquiry in which to use computational techniques. For the first time ever, linguists have been able to rely on non-impressionistic large-scale frequency data.
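The probabilistic view of collocation described above can be made concrete with a small sketch. It is illustrative only: it scores adjacent word pairs with pointwise mutual information (PMI), one common association measure which the text itself does not name, and the five-sentence corpus and the function name `pmi_scores` are invented for the purpose. With so little data the scores carry no real weight, which is precisely why the approach depends on large corpora.

```python
import math
from collections import Counter

def pmi_scores(sentences, min_pair_count=1):
    """Score adjacent word pairs by pointwise mutual information (log base 2).

    `sentences` is a list of token lists. This is an illustrative measure,
    not the method of any study cited in the text.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs only
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_pair_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI > 0: the pair occurs more often than chance would predict.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

# Invented toy corpus (an assumption, far too small for real conclusions).
corpus = [
    "she drinks strong tea".split(),
    "he drinks strong tea".split(),
    "she drives a powerful car".split(),
    "he drives a powerful car".split(),
    "the tea is in the car".split(),
]
scores = pmi_scores(corpus)
print(scores[("strong", "tea")] > 0)       # the attested pair scores above chance
print(("powerful", "tea") in scores)       # the unattested pair never occurs
```

A realistic study would count co-occurrences within a window rather than only adjacent pairs, and would set a frequency threshold (here `min_pair_count`) to avoid inflated scores for rare words.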
Although the reliability of frequency studies was questioned from a relatively early stage, this did not put an end to them but, instead, merely prompted corpus linguists to gather bigger and more tightly controlled corpora.

These two factors have contributed to bringing the study of words to the forefront of linguistic research, along with a change of name from vocabulary to lexis. But it is not only the name which has changed. The object of study itself has become an altogether different phenomenon, in three ways in particular.

First and foremost, lexis and grammar are now seen as interdependent. This idea, first introduced by Halliday, was further developed by Sinclair, who criticised the traditional decoupling of lexis and grammar and claimed that it was “more fruitful to start by supposing that lexical and syntactic choices correlate, than that they vary independently of each other” (1991: 104). This interrelation of grammar and lexis is one of the key features in the new corpus-based Longman Grammar of Spoken and Written English (Biber et al. 1999), which gives pride of place to lexico-grammatical associations, both grammatical associations of lexical words and lexical associations of grammatical structures.

Closely linked with this development is the fact that lexis has now been firmly placed on the syntagmatic axis. While paradigmatic relations for a long time dominated lexical studies, the pendulum now seems to have swung in the opposite direction, so that attention is now focused on the analysis of co-occurrence relations. This new emphasis on the company words keep, to use Firth’s expression, has led to the discovery of a wide range of word combinations or multi-word units, which vary in fixedness and idiomaticity.

The third major change which has taken place in perceptions about lexis is that it is now recognised as displaying a much higher degree of stylistic differentiation than had previously been thought. In the case of English, the analysis of corpora has led to the discovery of a wide range of dialectal differences related to regional provenance (American English, Indian English), age (teenager English), sex (female lexis), time (Middle English lexis), social class, as well as diatypic differences in terms of field, mode and tenor (spoken lexis, ESP lexis, informal lexis).

Lexis has undergone a dramatic transformation and come out less autonomous, more open to other layers of language, notably grammar, composed of both single words and multi-word units and entering into a complex network of paradigmatic and syntagmatic relations.

1.2 The revival of Contrastive Linguistics

Like lexicology, contrastive linguistics now also occupies a dominant position in linguistics, but it has reached this position via a rather different route.
Whereas the time for lexis had simply come, contrastive linguistics had already had its glory days back in the 1960s, before falling into disfavour, principally because of its association with structuralism. What we are now witnessing is thus more of a revival, and a dramatic one at that.

When Contrastive Analysis (CA) emerged as a scholarly discipline in the decades after World War II, it was regarded mainly as an applied branch of linguistics serving practical pedagogical purposes in foreign and second language teaching. In accordance with the linguistic climate of the time (structuralism, early generative grammar), phonology and grammar held centre stage, while lexis played a subordinate role.3 The high hopes it had raised (that similarities and differences between languages could predict, or at least explain, problems in foreign and second language learning and make language teaching more efficient) were largely thwarted. For a time CA became a suspect field of study, especially in the United States (on the history and deficiencies of CA, see Ringbom 1994, Sajavaara 1996, Chesterman 1998). However, in Europe CA continued to thrive and large contrastive projects were established in the 1970s, comparing English and other European languages. There, in particular, the view persisted that CA still had much to offer, not only to language pedagogy, but also to translation theory, the description of particular languages, language typology and the study of language universals (on various early approaches, see Di Pietro 1971, James 1980 and Krzeszowski 1990).

Now CA, or contrastive linguistics (CL) as it is increasingly called, is again an active and expanding field which generates lively theoretical and methodological discussion. A large number of research projects, conferences and journals are devoted to cross-linguistic work of various kinds, especially in Europe. And lexis, moreover, is very much the focus of attention. Broadly speaking, there are three main reasons for this, although they are closely interrelated and difficult to separate.

Internationalisation and the gradual integration of Europe have created an increasing demand for multilingual and cross-cultural competence, for translation, interpreting and foreign language teaching. The importance of accurate and efficient communication across language boundaries has become a concern not only of linguists and teachers but of governments, commercial institutions and international organisations. As a result, there has been a rapidly increasing awareness of the need for large-scale cross-linguistic research.

At the same time, there have been important developments within linguistics.
A growing interest in real-life communication has shifted the focus away from the earlier preoccupation with abstract language (sub)systems and the reliance on the native speaker’s intuition as the main source of linguistic knowledge in the direction of natural discourse and empirical data as evidence for linguistic observations. The earlier tendency, fostered by structuralism and early generative grammar, to regard language as consisting of autonomous systems (with phonology and grammar in the centre) has given way to a more complex and dynamic view of language which allows greater interaction between the systems and fuzzier boundaries between them. As mentioned, lexis has acquired a more central position in several respects: the concept of the lexical item has expanded and the interdependence between lexical choice and contextual factors has led to a growing tendency to enrich the lexicon with information of a grammatical, semantic and pragmatic nature (see e.g. Atkins


et al. 1994). These tendencies have had a profound influence on lexical CL. A third important reason for the revival of contrastive studies is the computer revolution and the possibility of analysing natural language on the basis of large text corpora. This has opened up new possibilities of research on the basis of bilingual or multilingual corpora and experiments in natural language processing, e.g. in the fields of machine translation, information retrieval and computational lexicography. Corpora provide empirical data for linguistic theories and practical applications or serve as testing grounds for linguistic and computational models. The information gained from corpora is both richer and more reliable than that derived from introspection. These new developments have brought about a revival of interest in CL. CL now permeates a number of fields inside and outside linguistics and its impact has been especially strong in areas concerned with natural language processing, such as machine translation and computational lexicography. Indeed, the analysis of individual languages has even been described as forming a part of CL (Weigand 1998b: vii). The new tendencies have also given rise to increased cooperation between experts from a number of fields: linguistics, lexicography, translation, computer science, psychology and cognitive science. Even if the problems of describing and relating many languages are as formidable as ever, great advances have been made in identifying and addressing the issues and there is new hope and great vitality in the field.

2. Multilingual corpora

2.1 Types of corpora

As we have seen, one factor that has influenced the contrastive study of lexis more than any other is the computer revolution and the development of multilingual corpora. Several types of multilingual corpora need to be distinguished. Unfortunately, the terminology used to describe the different types is inconsistent and confusing (for some different typologies, see Baker 1995, 1999 and Hartmann 1996). We shall here use the typology and terms set out in Figure 1 (cf. Johansson 1998:4–7).

Multilingual corpora
    Comparable corpora
    Translation corpora
        Unidirectional
        Bidirectional

Figure 1. Types of multilingual corpora

Depending on the number of languages involved, one distinction that can be made is that between bilingual and multilingual corpora. To simplify matters, we shall use ‘multilingual’ as a general inclusive term and only be more specific when necessary. A more important distinction is that between comparable corpora and translation corpora. Comparable corpora consist of original texts in each language, matched as far as possible in terms of text type, subject matter and communicative function. Corpora of this kind can either be restricted to some specific domain (e.g. genetic engineering, contract law, job interviews) or be large ‘balanced’ corpora representing a wide range of text types. Translation corpora consist of original texts in one language and their translations into one or several other languages. If the translations go in one direction only (from language A to language B) they are unidirectional; if they go in both directions (from language A to language B and from language B to language A) they are bidirectional. The term ‘parallel corpus’ is sometimes used as an umbrella term for both comparable and translation corpora, but it seems more appropriate for aligned translation corpora, where a unit (paragraph, sentence or phrase) in the original text is linked to the corresponding unit in the translation (see Section 2.2).4

Each of these types has its advantages and disadvantages (see Aijmer et al. 1996, Teubert 1996, Johansson 1998). Comparable corpora represent natural language use within the genres they contain and are unaffected by various translation effects (see below). Domain-specific corpora are especially useful for terminological studies. If ‘comparability’ is taken in a broad sense, very large ‘balanced’ corpora representing a wide range of genres and text types can serve as comparable corpora. Since corpus size and large quantities of data are important factors in contrastive lexical research, they are especially useful in collocation studies and as control corpora for results derived from translation corpora.

The problem with comparable corpora is, somewhat paradoxically, the comparability of the data. It is difficult, and in some cases impossible, to know what to compare, i.e. to relate expressions with comparable meaning and function in the languages compared. Moreover, unlike translation corpora, comparable corpora cannot reveal sets of cross-linguistic equivalents in cases where one or both languages provide a choice of alternatives (unless these have been identified in advance). Another problem with comparable corpora is their functional and stylistic comparability. If the source texts of the corpora are not selected according to the same principles, any comparison is bound to be uncertain. For these reasons, the use of comparable corpora is limited either to restricted domains or to very large balanced corpora where such factors as topic, register, and communicative function can be controlled.

Translation corpora have the advantage of keeping meaning and function constant across the compared languages.5 They also make it possible to discover cross-linguistic variants, i.e. alternative ways of rendering a particular meaning or function in the target language. By reversing this process, i.e. starting from the range of variants discovered in language B and observing how these are rendered in language A, it is possible to discover paradigms of cross-linguistic correspondences (see Section 5.2).

The disadvantage of using translation corpora is that translations tend to retain traces of the source language (‘translationese’; see e.g. Gellerstam 1986, 1996) or display other general characteristics of translated texts (see Baker 1993, Schmied and Schäffler 1996). The results based on translation corpora therefore have to be verified on the basis of original text corpora. Another disadvantage of translation corpora is that they rarely provide a full or balanced representation of the languages compared. By definition they are restricted to genres and text types that are translated, which tends to confine them to certain written text types. Moreover, what is translated tends to vary from one language to another: for reasons of cultural dominance certain text types may be translated in one direction but not in the other.
As a result, translation corpora are seldom large and well balanced, a fact which limits their usefulness for certain types of cross-linguistic studies.

It is obvious from this comparison of the advantages and disadvantages of the two main types of multilingual corpora that they should be seen as complementary sources of cross-linguistic data. The possibility of combining comparable and translation corpora, thus taking advantage of the specific merits of both types, has also been recognised in various contrastive projects, e.g. in the composition of the English-Norwegian Parallel Corpus (see Johansson 1998) and the English-Swedish Parallel Corpus (see Altenberg and Aijmer 2000) and in the cross-linguistic methodology advocated by Teubert (1996).

The cross-linguistic insights gained from translation corpora obviously increase considerably if more than two languages can be compared. One interesting example of a multilingual bidirectional translation corpus involving a number of languages is the Oslo Multilingual Corpus.6 The basis of this corpus is the English-Norwegian Parallel Corpus (ENPC) (Johansson 1998), which is closely linked to similar English-Swedish and English-Finnish translation corpora. By extending the ENPC to include translations between English, German, Dutch and Portuguese, it will be possible to compare six languages using English original texts as a starting point.

2.2 Text alignment and search tools

To be maximally useful translation corpora must be aligned in such a way that a unit in the original text is linked to the corresponding unit in the translated text. The linked units can then be displayed together and compared, and parallel concordancers and other multilingual search tools can be applied to the aligned texts. Translation corpora can be aligned paragraph by paragraph or, more commonly, sentence by sentence, but experiments are also being made to align translation corpora at phrase and word level.7

Automatic sentence-level alignment, which was first developed for the French and English versions of the Canadian Hansard (see e.g. Brown et al. 1991, Gale and Church 1991), is normally based on statistical matching of features that link corresponding sentences in the source and target texts, such as sentence length (in terms of words or characters), typographical features (e.g.
initial capitals, punctuation marks) and cognate words, but there are also programs that make use of a combination of statistical feature matching and a bilingual lexicon of unambiguous equivalents in the languages involved (see Hofland 1996, Hofland and Johansson 1998).8 The main obstacle to automatic sentence alignment is represented by cases where a sentence in the original text has been divided into two (or more) sentences in the translation or, conversely, where two (or more) sentences in the original text have been combined into one in the translation. Sentence-level alignment programs generally achieve a high degree of accuracy, but the result has to be checked and corrected manually. Multilingual alignment, i.e. alignment of a source text and its translations into several languages, has also been carried out with good results (see e.g. Hofland and Johansson 1998:98f.). Efforts have also been made to align parallel texts at word or phrase level (see e.g. Church and Gale 1991, Kay and Röscheisen 1993, Merkel 1999:113ff.). This is a much more difficult task than sentence alignment, since a given word in the source text may be rendered by many translation equivalents and structural


paraphrases, and sometimes none at all. Word alignment programs must therefore rely heavily on bilingual lexicons, contextual pattern matching and sophisticated statistical techniques. Since perfect word alignment is difficult to achieve, most text alignment programs used today are sentence-based. A survey of various alignment techniques and an examination of two major problems confronting word alignment, viz. the lack of isomorphism of lexical units across languages and the semantic discrepancy between source and target expressions that is often found in translation corpora, is presented by Kraif in this volume. Text alignment is a prerequisite for parallel concordancers and other multilingual tools. These vary in approach and degree of sophistication. Here we shall distinguish two main types: (1) parallel concordancers and search tools (‘browsers’) which operate on previously aligned corpora and which identify and present a search word (or expression) in its context together with the corresponding aligned unit in the other language, and (2) word-based concordancers pairing lines of text on the basis of computed word correspondences in the compared languages. In the first type the user selects a search item in L1 or L2 as input and either (a) leaves the equivalents in the other language open, or (b) pre-selects one or several potential equivalents in the other language. In the former case the program presents all the aligned sentence pairs containing the search item in one of the languages and it is up to the user to identify any relevant equivalents in the aligned output. 
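A rough sketch may help to make the two steps discussed in this section concrete: sentence alignment by length, followed by a search of the aligned output. The code is a deliberately simplified illustration and not the procedure of Brown et al., Gale and Church, or any of the concordancers mentioned here. The function names, the cost function and the `merge_penalty` value are assumptions, the sentences in the usage example are invented, and sentences are assumed to be non-empty. The search uses plain substring matching, which is cruder than a real concordancer.

```python
import math

# Allowed link types: (source sentences, target sentences) per aligned unit.
# 1-2 and 2-1 links cover sentences split or merged in translation.
LINKS = [(1, 1), (1, 2), (2, 1)]

def align_by_length(src, tgt, merge_penalty=0.5):
    """Align two sentence lists by character length with dynamic programming."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in LINKS:
                if i + di > n or j + dj > m:
                    continue
                s_len = sum(len(s) for s in src[i:i + di])
                t_len = sum(len(t) for t in tgt[j:j + dj])
                # Hypothetical cost: how far the length ratio is from 1.
                step = abs(math.log(t_len / s_len))
                if (di, dj) != (1, 1):
                    step += merge_penalty  # prefer one-to-one links
                if cost[i][j] + step < cost[i + di][j + dj]:
                    cost[i + di][j + dj] = cost[i][j] + step
                    back[i + di][j + dj] = (di, dj)
    if cost[n][m] == inf:
        raise ValueError("no alignment with the allowed link types")
    pairs, i, j = [], n, m
    while i > 0 or j > 0:  # walk back from the end along the cheapest path
        di, dj = back[i][j]
        pairs.append((" ".join(src[i - di:i]), " ".join(tgt[j - dj:j])))
        i, j = i - di, j - dj
    return pairs[::-1]

def concordance(pairs, term, equivalents=None):
    """Return aligned pairs whose source side contains `term`.

    With `equivalents`, keep only pairs whose target side also contains
    one of the pre-selected candidates (case b); otherwise all hits (case a).
    """
    hits = []
    for source, target in pairs:
        if term.lower() not in source.lower():
            continue
        if equivalents and not any(e.lower() in target.lower() for e in equivalents):
            continue
        hits.append((source, target))
    return hits

# Invented example: the second source sentence is split in translation.
source = ["A short one.",
          "This sentence is quite a bit longer than the first and was split."]
target = ["Une courte.",
          "Cette phrase est nettement plus longue.",
          "Elle a été coupée."]
pairs = align_by_length(source, target)
for s, t in concordance(pairs, "sentence"):
    print(s, "|", t)
```

Calling `concordance(pairs, "sentence", ["phrase"])` corresponds to a search with a pre-selected equivalent. As the text notes, output of this kind would still have to be checked manually in practice.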
This is illustrated in the following example, which shows a small sample of a search for drug(s) (in bold) in the sentence-aligned English-French Canadian Hansard corpus using the web-based TransSearch interface.9

–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Police have to comfort and question the victims of murderers, rapists, armed bandits, drug dealers
Les policiers doivent réconforter et interroger les victimes de meurtriers, de violeurs, de bandits armés et de trafiquants de drogue
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
It means that cheaper generic drugs will not be available to them.
Cela veut dire qu’ils ne pourront plus obtenir de médicaments génériques bon marché.
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Each time they stop a car they never know whether the driver is armed, on drugs, a hood or an upstanding member of the community.
Chaque fois qu’il arrête une voiture, il ne sait jamais si le conducteur est armé, drogué, ou s’il s’agit d’un truand ou d’un membre respecté de la collectivité.
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Many young people feel either rejected or marginalized in society which creates additional problems of crime and drug and alcohol abuse.
Dans notre société, bien des jeunes se sentent rejetés ou marginalisés, ce qui occasionne d’autres problèmes de criminalité, de toxicomanie et d’alcoolisme.
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––

If a pre-selected equivalent of the search item is specified, the program only presents the aligned sentence pairs that contain the search item and the pre-selected equivalent in the other language. This is illustrated in the following example, which shows a small sample in KWIC format from a TransSearch bilingual query for drug(s) translated either as médicament(s) or drogue(s).

drug(s)/médicament(s)

...withdrawal of Bill C-91 which gives brand name drugs a 20-year market monopoly
... du projet de loi C-91 qui donne aux fabricants de médicaments brevetés un monopole de 20 ans...

...to the Canadian people access to information as to drug safety and efficacy.
...à l’information sur l’innocuité et l’efficacité des médicaments

drug(s)/drogue(s)

A lot of the drugs that come into this country....
Bon nombre de drogues introduites dans notre pays...

...who were lured into prostitution, hooked on drugs and exploited...
...dans la prostitution, rendues dépendantes de la drogue et exploitées...

...organized crime hides the profits of the drug trade, international smuggling,...
...camoufler les profits du commerce de la drogue, de la contrebande internationale....

...moving to coastal communities if the drug trade continues the way it has.
...dans les localités côtières, au train où va le trafic de drogues.
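
The two query modes of the first type of concordancer (open search versus search with pre-selected equivalents) can be sketched in a few lines of Python. This is only an illustrative sketch and not the TransSearch implementation: the function name parallel_concordance, the regular-expression patterns and the in-memory list of aligned pairs are assumptions made for the example, while the sentence pairs themselves are taken from the samples above.

```python
import re
from typing import List, Optional, Sequence, Tuple

# One aligned unit: (sentence in L1, aligned sentence in L2).
AlignedPair = Tuple[str, str]

# A tiny sentence-aligned sample, copied from the concordance lines above.
ALIGNED: List[AlignedPair] = [
    ("It means that cheaper generic drugs will not be available to them.",
     "Cela veut dire qu’ils ne pourront plus obtenir de médicaments génériques bon marché."),
    ("A lot of the drugs that come into this country....",
     "Bon nombre de drogues introduites dans notre pays..."),
    ("Many young people feel either rejected or marginalized in society which "
     "creates additional problems of crime and drug and alcohol abuse.",
     "Dans notre société, bien des jeunes se sentent rejetés ou marginalisés, ce qui "
     "occasionne d’autres problèmes de criminalité, de toxicomanie et d’alcoolisme."),
]


def parallel_concordance(
    pairs: Sequence[AlignedPair],
    search: str,
    equivalents: Optional[Sequence[str]] = None,
) -> List[AlignedPair]:
    """Return the aligned pairs whose L1 side matches `search`.

    Mode (a): with no `equivalents`, every pair containing the search item is
    returned and the user inspects the L2 side for equivalents.
    Mode (b): with `equivalents`, only pairs whose L2 side also matches at
    least one of the pre-selected patterns are returned.
    """
    source_re = re.compile(search, re.IGNORECASE)
    target_res = [re.compile(e, re.IGNORECASE) for e in (equivalents or [])]
    hits = []
    for source, target in pairs:
        if not source_re.search(source):
            continue
        if target_res and not any(r.search(target) for r in target_res):
            continue
        hits.append((source, target))
    return hits


if __name__ == "__main__":
    drug = r"\bdrugs?\b"
    print(len(parallel_concordance(ALIGNED, drug)))                          # 3
    print(len(parallel_concordance(ALIGNED, drug, [r"\bmédicaments?\b"])))   # 1
    print(len(parallel_concordance(ALIGNED, drug, [r"\bdrogues?\b"])))       # 1
```

Note that the third pair (drug abuse rendered as toxicomanie) is found only by the open search: restricting the query to pre-selected equivalents is more precise but hides renderings the user did not anticipate.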

These bilingual concordances yield a wealth of information, notably on the most frequent multiword units (drug abuse/dealers/cartels/smugglers/barons/trafficking/ trade) and their equivalents in the other language. In the case of drogue and drug they are the ideal starting point from which to uncover the rules governing the choice between the singular and plural form in the two languages.10 The sophistication of sentence-based concordancers or browsers varies, but most programs allow the user to choose which of the languages he wishes to regard as the source language (L1) and which as the target language (L2), to use ‘wildcards’, and to restrict the search by means of various contextual conditions

or word-class tags (if the corpus is tagged for word-class). Some examples of various types of (paragraph-based or sentence-based) multilingual browsers are ParaConc (Barlow 1995), Multiconcord (Wools 1998), the Translation Corpus Explorer (Ebeling 1998) and the Pedant Bilingual Concordance (Ridings 1998). A detailed demonstration of how Microsoft Word can be used to align source texts and translations and be combined effectively with a mark-up program and the parallel concordancer Multiconcord is given by Corness in this volume. Word-based concordance programs are closely related to word alignment and are consequently more problematic. This type makes use of a statistical matching technique which creates an index indicating which words in L1 tend to correspond to which words in L2. It takes just one search word as input and uses the pre-computed index of word correspondences to align concordance lines in L1 with their translations in L2 (see e.g. Church and Gale 1991). Obviously, this is a complicated statistical task and the outcome depends on the efficiency of the index and on the ‘closeness’ of the translation. Parallel concordance programs of this type are still in an experimental stage, and the most robust and immediately useful multilingual search tools available today are therefore concordancers and browsers of the first type. Even if fully automatic and accurate word alignment and word-based concordancing programs may be a utopian goal, there is no doubt that multilingual research tools, however constructed, are extremely useful instruments for anyone concerned with lexical CL, for theoretical as well as practical purposes. By allowing the user to compare an L1 keyword in its context with its counterpart in another language they make it possible to arrive at empirically founded, richer and much more delicate descriptions of translation equivalents. 
This is also amply demonstrated in the studies in the present volume, many of which depend, implicitly or explicitly, on various kinds of alignment and parallel concordance techniques.

Some uses of multilingual corpora

Multilingual text corpora can be used for a variety of purposes in contrastive lexical studies. Their main uses can be summarised as follows (cf. Johansson 1998):

– they offer a firm empirical basis for cross-linguistic lexical studies, providing richer and more reliable information about the degree of correspondence between lexical items in different languages than comparisons based on introspection;
– they give new insights into the lexis of the languages compared, insights that are likely to be missed in studies of monolingual corpora;
– they can be used for a range of comparative purposes and increase our knowledge of language-specific, typological and cultural differences, as well as of universal features;
– they can be used to study lexical systems as well as the contextual use of lexical items, and thus provide information about paradigmatic as well as syntagmatic lexical relations;
– they can serve to disambiguate polysemous items, reveal the degree of mutual correspondence of lexical items in different languages, and uncover cross-linguistic sets of translation equivalents in the languages compared;
– they are of theoretical as well as practical importance: theoretically, they provide input data for lexical models and serve as testbeds for lexical theories and hypotheses; practically, they are essential for applications in a number of fields, such as multilingual lexicography and terminology, natural language processing, machine-assisted translation, translator training, information retrieval, and language teaching;
– they illuminate lexical differences between original texts and translations and can be used for studies of individual translation problems and strategies, as well as of language-related and universal translation effects.

In the following sections we shall give a brief survey of some of these uses and indicate some major tendencies in corpus-based contrastive studies of lexis in the last decade. The emphasis will be on theoretical and methodological approaches to the study of lexis, but we shall also touch briefly on some developments in multilingual lexicography (Section 6) and machine-assisted translation (Section 7).

3. Theoretical and methodological issues

3.1 Some contrastive approaches

Traditionally, CL has been described as involving three methodological steps: description, juxtaposition and comparison (see e.g. Krzeszowski 1990:35). The description includes selection of the items to be compared and a preliminary characterisation of these in terms of some language-independent theoretical model. The juxtaposition involves a search for, and identification of, cross-linguistic equivalents. In the comparison proper the degree and type of correspondence between the compared items are specified.

Modern lexical CL often follows this procedure, but a characteristic feature of recent corpus-based contrastive work is the great variety of approaches employed. This is largely due to the expansion of the field and the new research possibilities that multilingual corpora and search tools offer. The methodology chosen and the delicacy of the analysis depend to a large extent on the purpose of the analysis, e.g. whether it is primarily ‘theoretical’ (focusing on a contrastive description of the languages involved) or ‘practical’ (intended to serve the needs of a particular application). This in turn may determine the role that the corpus is allowed to play in the analysis. One distinction that is sometimes made in corpus linguistics, and which is also applicable to CL, is that between ‘corpus-based’ and ‘corpus-driven’ approaches (see e.g. Francis 1993 and Tognini Bonelli 2001 and in this volume). The former may involve any work — theory-driven or data-driven — that makes use of a corpus for language description, but it is also used in a restricted sense to refer to studies which start from a model postulating a cross-linguistic difference or similarity on theoretical grounds and use a multilingual corpus to confirm, refute or enrich the theory. The latter approach, on the other hand, may start from an implicit or loosely formulated assumption but uses the corpus primarily to discover types and degrees of cross-linguistic correspondence and to arrive at theoretical statements. In practice, however, the distinction may be slight. The difference lies rather in the importance attached to the initial assumptions and the role that the data play in the analysis. Here we shall use the term ‘corpus-based’ as an umbrella term covering both types of corpus-informed studies. In the following sections we shall briefly examine some of the theoretical and methodological issues involved and how these have been approached in some recent corpus-based contrastive studies of lexis. 
3.2 Tertium comparationis and translation equivalence

Any cross-linguistic comparison presupposes that the compared items are in some sense similar or comparable. That is, to be able to say that certain categories in two languages are similar or different it is necessary that they have some common ground, or tertium comparationis. For lexis it is obvious that the compared items should express ‘the same thing’, i.e. have the same (or at least similar) meaning and pragmatic function (see James 1980: 90f.). However, what exactly this ‘thing’ is is not always obvious, and the problem of identifying a tertium comparationis in CL has been discussed a great deal in the past (see e.g. James 1980:169ff., Krzeszowski 1990, and Chesterman 1998:27ff.).

Krzeszowski (1990: 23f.) has distinguished seven types of equivalence: statistical equivalence, translation equivalence, system equivalence, semanticosyntactic equivalence, rule equivalence, substantive equivalence and pragmatic equivalence. However, although there is something to say for this taxonomic approach, it seems that the only way we can be sure that we are comparing like with like is to rely on translation equivalence (see James 1980: 178). Chesterman (1998: 37ff.) develops this in the following way. Any notion of equivalence is a matter of judgement. Similarly, cross-linguistic equivalence is not absolute, but a matter of judgement or, more precisely, translation competence. “On this view, estimations of any kind of equivalence that involves meaning must be based on translation competence, precisely because such estimations require the ability to move between utterances in different languages. Translation competence, after all, involves the ability to relate two things” (ibid.: 39). The fact that equivalence is a relative concept also has another consequence. It is not realistic to proceed from a tertium comparationis that is based on ‘identity of meaning’. For one thing, this would be putting the cart before the horse and we would run the risk of methodological circularity: the result of the contrastive analysis would be no more than the initial assumption (cf. Krzeszowski 1990: 20). For another, the area we want to explore is often fuzzy and impossible to define satisfactorily (e.g. epistemic modality or pragmatic particles). In such cases we cannot start from a tertium comparationis that is founded on equivalence in a strict sense (identity of meaning). Instead, what we have to do — and what we generally do — is to start from a perceived or assumed similarity between cross-linguistic items (cf. James 1980: 168f.). Viewed in this way, CL becomes a way of refining initial assumptions of similarity. 
Chesterman (1998:58) expresses this as follows: In this methodology, the tertium comparationis is thus what we aim to arrive at, after a rigorous analysis; it crystallizes whatever is (to some extent) common to X and Y. It is thus an explicit specification of the initial comparability criterion, but it is not identical with it — hence there is no circularity here. Using an economic metaphor, we could say that the tertium comparationis thus arrived at adds value to the initial perception of comparability, in that the analysis has added explicitness, precision, perhaps formalization; it may also have provided added information, added insights, added perception.

The crucial role that translation equivalence plays in CL has important methodological consequences. We have already described the differences between comparable corpora and translation corpora (Section 2.1). When
items are compared across comparable corpora, it is difficult to know if we are comparing like with like. Any judgement about cross-linguistic equivalence (or similarity) must be based on the researcher’s ‘translation competence’. This is true at both ends of the analysis: initially, when items are selected for comparison, and finally, when the results of the comparison are evaluated. When we use translation corpora the situation is different. Although we normally start with an initial assumption about cross-linguistic similarity — the very basis for comparing anything at all — we can place more reliance on the translations found in the corpus. The corpus can be said to lend an element of empirical inter-subjectivity to the concept of equivalence, especially if the corpus represents a variety of translators. However, despite the usefulness of translation corpora, to what extent can we trust the translations we find in them? Can we treat all the translations that turn up as cross-linguistic equivalents? There does not seem to be a simple answer to this question. In one sense, every translation is worth considering as a potential translation equivalent as it reflects the translator’s ‘competence’. However, translations are rarely literal renderings of the original. Translators transfer texts from one language (and culture) to another and the translation therefore tends to deviate in various ways from the original. We have already mentioned possible translation effects — traces of the source language or universal translation strategies — and they may involve additions, omissions and various kinds of ‘free’ renderings that are either uncalled for or motivated by cultural and communicative considerations.11 How, then, can we determine which translations should be regarded as ‘equivalents’ in a stricter sense? One solution has been to resort to the procedure of ‘back-translation’ (see Ivir 1983, 1987), i.e. 
to restrict the comparison to forms in L2 that can be translated back into the original forms in L1. This is likely to eliminate irrelevant differences that are due to the translator’s idiosyncrasies or motivated by particular communicative or textual strategies. Another solution is to rely on recurrent translation patterns, i.e. to resort to a quantitative notion of translation equivalence (cf. Krzeszowski 1990:27). If several translators have used the same translation, this obviously increases its relevance. However, this too implies a risk: by restricting the comparison to recurrent translations we may throw away valuable evidence and miss the cross-linguistic insights that ‘unexpected’ translations often provide. A variant of this approach which combines Ivir’s idea of back-translation and a quantitative notion of equivalence is to calculate what has been called the ‘mutual correspondence’ (or translatability) of two items in a bidirectional translation corpus (see Altenberg 1999). If an item x in language A is always translated by y in language B and, conversely, item y in language B is always translated by x in language A, they will have a mutual correspondence of 100%. If they are never translated by each other their mutual correspondence will be 0%. In other words, the higher the mutual correspondence value is, the greater the equivalence between the compared items is likely to be. Although the mutual correspondence of categories in different languages seldom reaches 100% in a translation corpus (even 80% seems to be a comparatively high value), a statistical measure of translation equivalence can be a valuable diagnostic of the degree of correspondence between items or categories in different languages (see e.g. Altenberg 1999 and Ebeling 1999: 257ff.). However, it does not tell us where to draw the line between equivalence and non-equivalence. Ultimately, the notion of equivalence is a matter of judgement, reflecting either the researcher’s or the translator’s bilingual competence.12 Both involve a judgement of translation equivalence.

3.3 Language system vs. language use

In the past, contrastive analysis was chiefly concerned with comparisons of abstract systems across languages. However, corpora reflect language use, and translation equivalence is always equivalence-in-context (Chesterman 1998:31). This broadens the scope of contrastive analysis. The aim is to account for both language systems and language use, i.e. the task is not only to identify translation equivalents and ‘systematic’ correspondences between categories in different languages, but to specify to what extent and in what respect they express ‘the same thing’ and where similarities and differences should be located in a model of linguistic description. The extended scope of corpus-based CL creates theoretical as well as methodological problems.
As has been pointed out by Salkie (1997) in a comparison of English but and French mais, translation equivalents in two languages seldom have the same distribution and seldom have 100% correspondence in multilingual corpora. This raises a number of important questions. For example, how regular does an observed difference have to be in order to count as systematic (rather than random or unpredictable)? Where should the difference be located — in the language system (langue) or in language use (parole)? To what extent can linguistic (sub)systems be isolated from each other, and in what ways do they interact? (See Salkie in this volume for further discussion of this question.)
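
Since the discussion that follows relies on mutual correspondence values, a small worked sketch of the measure may be helpful. It assumes that mutual correspondence is computed as (At + Bt) × 100 / (As + Bs), where As and Bs are the frequencies of the two items in the original texts of each language and At and Bt the number of times each is translated by the other; this formulation is consistent with the description given above, but the function names and the counts in the usage example are hypothetical and are not drawn from any corpus.

```python
def directional_correspondence(translated_by_other: int, source_total: int) -> float:
    """Percentage of source occurrences of one item rendered by the other item."""
    if source_total <= 0:
        raise ValueError("the item must occur at least once in the source texts")
    if not 0 <= translated_by_other <= source_total:
        raise ValueError("translated count must lie between 0 and the source total")
    return 100.0 * translated_by_other / source_total


def mutual_correspondence(a_by_b: int, a_total: int, b_by_a: int, b_total: int) -> float:
    """Mutual correspondence (in %) of item x (language A) and item y (language B).

    a_by_b / a_total: x translated by y, out of all x in language A originals.
    b_by_a / b_total: y translated by x, out of all y in language B originals.
    Assumed formula: (At + Bt) * 100 / (As + Bs).
    """
    if a_total + b_total <= 0:
        raise ValueError("at least one item must occur in the source texts")
    if not (0 <= a_by_b <= a_total and 0 <= b_by_a <= b_total):
        raise ValueError("translated counts must not exceed source totals")
    return 100.0 * (a_by_b + b_by_a) / (a_total + b_total)


if __name__ == "__main__":
    # Hypothetical counts: x occurs 40 times and is rendered by y 30 times;
    # y occurs 60 times and is rendered by x 20 times.
    print(directional_correspondence(30, 40))   # 75.0  (A -> B)
    print(mutual_correspondence(30, 40, 20, 60))  # 50.0
```

Reporting the two directional values alongside the combined figure is a deliberate choice in this sketch: a single mutual correspondence value can conceal an asymmetrical translation tendency between the two directions.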

The fact that translation equivalents seldom have 100% correspondence in translation corpora has been demonstrated in a number of studies. In Altenberg’s (1999) comparison of adverbial connectors in English and Swedish not even cognate or functionally similar items like instead : i stället and on the other hand : å andra sidan reach a mutual correspondence of 80%. The correspondence of cognate or functionally similar verb pairs across languages tends to be surprisingly low. For example, Altenberg’s comparison of the prototypical causative verbs make in English and få in Swedish (this volume) reveals a mutual correspondence of only 52%. Similarly, Viberg’s (1996a:161) comparison of the cognate verb pairs go/gå and give/ge in English and Swedish shows that they are only translated by each other in about a third of the cases, and the mutual correspondence of the primary ‘possession’ verbs get and få in the same languages is shown to be as low as 15% (Viberg, this volume). It is obvious that a low degree of mutual correspondence between functionally related items has several explanations. In the case of Viberg’s verb pairs the reason is the diverging polysemy and the different meaning extensions that verbs tend to develop in different languages (see Section 4.1). In the case of the English and Swedish connectors examined by Altenberg, some of the differences are clearly system-related. For example, connectors with zero correspondence reveal the existence of lexical gaps in either language: the Swedish explanatory connector nämligen has no exact counterpart in English and the English transitional connector now has no counterpart in Swedish. Items with intermediate correspondence values often illustrate differences in the stylistic or functional status of the connectors in the two languages. This is typically revealed by an asymmetrical translation tendency. 
For example, English therefore is more often translated into Swedish därför than the other way round, because därför is a more common and stylistically more neutral resultive connector in Swedish than therefore is in English. However, there is also evidence of system interchange. This is clearly revealed in Altenberg’s comparison of causative English make and Swedish få in the present volume. In both languages the ‘periphrastic’ causative verb construction with make and få can be replaced by alternative constructions, such as a synthetic causative verb or a structurally reorganised causative construction. Epistemic modality is another area where different subsystems tend to interact. For example, as shown by Aijmer (1999) in her comparison of epistemic possibility in English and Swedish, when there is a gap in the Swedish system of modal auxiliaries, it can be filled by a modal adverb. Similarly, when English may and Swedish kan are not good equivalents, the translators tend to





choose a corresponding adverb or a combination of modal elements. A similar tendency is revealed in Johansson’s (1997) multilingual comparison of the generic pronoun man in German and Norwegian and its counterparts in English. Many languages have a generic pronoun (e.g. man in German and the Scandinavian languages, one in English, and on in French), but their frequency and stylistic status vary from language to language. Consequently, translations between such languages tend to display different tendencies depending on the direction of the translation. When a generic pronoun is translated from a language where it is comparatively infrequent (such as English) into a language where it is relatively frequent (such as the Scandinavian languages and, in particular, German and French), it is generally rendered by a generic pronoun in the target language. However, translations in the opposite direction show a different tendency. The generic pronoun in the source language is less often translated by a generic pronoun in the target language. Instead, it tends to be rendered by a range of syntactically restructured impersonal expressions, such as non-finite clauses, agent-less passives, imperatives and nominalisations. These cross-linguistic differences suggest that the tertium comparationis needs to be defined at the intersection of several structural systems. Further examples of system interaction will be given in Section 4.2. The shift from one construction in the source language to another in the target language is often accompanied by a change of viewpoint. For example, in changing an original active clause with generic man as subject into either a construction with a specific personal pronoun (e.g. I, he or she) or an impersonal passive or non-finite construction, the translator can in some sense be said to view the situation expressed in the source language from a different perspective. 
A shift in perspective of a different kind is examined by Salkie (this volume) under the term ‘modulation’ and used as a way of explaining the various ‘unexpected’ translations of the German adverb kaum into English and of the English verb contain into French. We see, then, that translation corpora confront the researcher with a wealth of different translation ‘types’ reflecting various degrees of cross-linguistic correspondence. Broadly speaking, these can be said to range from highly recurrent ‘expected’ translation equivalents to a bewildering variety of ‘unexpected’ renderings, many of which cross the boundaries between linguistic subsystems and at first sight seem to defy classification. It may be tempting to dismiss such ‘unexpected’ cases as products of the translator’s ‘performance’, but there is generally a good reason behind the choice of translation. It is the task of the contrastive researcher to evaluate the corpus data as far as possible

and try to see the patterns lurking behind the translator’s resourcefulness and behind the most ‘unexpected’ renderings that turn up in translation corpora.

4. Types of cross-linguistic correspondence

Languages divide up semantic space in different ways. This is a natural consequence of the fact that the conceptual world evolves differently in different languages, for historical, cultural, geographical and social reasons. As a result, complete equivalence between words and expressions in different languages is rather unusual, just as it is unusual to find exact synonyms within one language. This lack of cross-linguistic correspondence is manifested in different ways. The number of concepts encoded in the vocabulary may differ from one language to another. Moreover, the conceptual systems may differ in structure. Familiar examples of this are the ways in which colours and kinship are encoded in different languages. Swedish, for example, has no common term corresponding to English uncle or French oncle but has to make a distinction between farbror ‘father’s brother’ and morbror ‘mother’s brother’. One consequence of this is that words that are treated as translation equivalents in bilingual dictionaries tend to have different ranges of meaning. An example of this is the relationship between the French, English and German words bois : wood : Holz and forêt : forest : Wald (see Svensén 1993:141). Bois has a wider meaning than wood, and wood a wider meaning than Holz; conversely, Wald has a wider meaning than forest, and forest has a wider meaning than forêt. As a result, the meanings of wood and Wald only partly overlap, and the same is true of forest and bois. In other words, there is not complete equivalence between any of the words. Partial overlap of a similar kind is revealed by Teubert (1996) in his analysis of English diary and calendar and German Tagebuch, Kalender and Almanach. The divergent meaning extensions that have evolved in different languages are especially striking in high-frequency words expressing certain basic meanings.
This is clearly illustrated by verbs of motion, perception, and cognition, which occur in most languages with roughly the same basic meanings. At the same time, they are highly polysemous owing to various types of universal and language-specific meaning extensions (see e.g. Viberg 1996a).13 The complex cross-linguistic differences these give rise to can be described in terms of such general processes as lexical specification (or elaboration), schematisation (or abstraction), grammaticalisation, metaphorical extension, and idiomatisation.





Cross-linguistically, these developments result in complex patterns of partially overlapping polysemy. Differences of this kind are not only a major problem for language learners, they have also become one of the major stumbling blocks for machine translation and one reason why the lexicon is often described as the ‘bottleneck’ of natural language processing (see e.g. Calzolari 1996: 3 and Sinclair et al. 1996: 174). To identify and describe these patterns is a challenge for lexical CL. However, cross-linguistic equivalence is not only a matter of semantic content. Since the meaning of words is also determined by their grammatical and lexical environment (syntagmatic relations like colligation and collocation) as well as by the situation in which they are used (style, pragmatics), similarities and differences in these respects must also be considered when cross-linguistic equivalence is determined. In other words, equivalence is a complex phenomenon: it involves several levels of linguistic description, and both paradigmatic and syntagmatic relations. We shall not attempt to give a detailed description of various types of cross-linguistic correspondence here. Instead, we shall make a broad distinction between three types of cross-linguistic relationships:

(a) overlapping polysemy (items in two languages have roughly the same meaning extensions)
(b) diverging polysemy (items in two languages have different meaning extensions)
(c) no correspondence (an item in one language has no obvious equivalent in another language)

It should be added that polysemy is not a clear-cut notion. Whether a lexical item can be assigned a certain number of meanings (polysemy) or should be regarded as vague or underspecified with regard to particular items in another language is often difficult to determine. However, it is obvious that translation corpora offer a fertile basis for exploring issues of this kind.
In the rest of this section we shall give examples of some recent studies that have explored various types of correspondence. Since overlapping polysemy (in its strictest sense) is relatively uncommon (see however Alsina and DeCesaris, this volume), we shall concentrate on the last two types distinguished above. The difference between paradigmatic and syntagmatic relations will be discussed separately in Section 5.

4.1 Diverging polysemy

Diverging polysemy is a very common phenomenon in contrastive studies of lexis. In a series of studies focusing on high-frequency verbs with similar basic meanings in English and Swedish, Viberg (1996a and b, 1998, 1999, this volume) has explored the divergent patterns of polysemy characterising verbs of motion (such as go : gå and verbs for ‘running’, ‘putting’ and ‘pulling’) and physical contact verbs (verbs for ‘hitting’) in the English-Swedish Parallel Corpus. Using a general typological framework, partly inspired by Miller and Johnson-Laird (1976), Talmy (1985) and the frame semantics model proposed by Fillmore and Atkins (1992), he demonstrates that verbs that are usually treated as translation equivalents in dictionaries display surprisingly low mutual correspondence in the corpus, a fact which is due to their various divergent meaning extensions and reflected in a wide range of translations in both languages. Viberg’s studies are a good illustration of how theory and cross-linguistic data can interact in a fruitful way. The data serve to test the validity of a language-independent semantic framework, while the framework provides a stable basis for refined descriptions of language-specific and typological lexical differences, as well as of universal semantic categories and principles of meaning extension. In his contribution to the present volume Viberg compares the Swedish possession verb få with its closest English equivalent get and, more briefly, with its equivalents in Finnish and French. Starting from basic sense distinctions of få and get established on the basis of the original texts, he uses their translation equivalents to determine their degree of cross-linguistic correspondence. Viberg finds great conceptual similarities, as regards both their basic and their extended meanings, but the lexicalisation patterns are very language-specific and their mutual translatability low.
Another important finding is that the meanings of both verbs can to a large extent be disambiguated by the syntactic frames in which they occur. Some meanings, however, have to be inferred from semantic and pragmatic cues in the linguistic and extra-linguistic context.

A good example of the complexity of cross-linguistic (and intralinguistic) lexical relationships is provided by the multiple correspondences revealed by Chodkiewicz et al. (in this volume) in their comparison of the French legal term procédure and the English term proceedings in the French and English versions of the European Convention on Human Rights. Both terms are highly polysemous and consequently have multiple equivalents in the other language: proceedings has no fewer than twelve translation equivalents in the French subcorpus and procédure has six in the English subcorpus. Although these correspondences do not create any problems of comprehension, their description is a great challenge for the lexicographer and terminologist.

A special variant of diverging polysemy can be said to exist when a single word in one language is variously rendered by several items in another language. A well-known example of this is the verb think in English which regularly corresponds to several verbs in other languages (cf. Nuyts 1997). German, for instance, has to use at least three different verbs (denken, glauben and finden) to express the main meanings covered by think. A similar differentiation is required in Swedish where the main counterparts are tänka (‘cogitation’), tycka (‘subjective evaluation’) and tro (‘belief’), as illustrated in the following examples (from Aijmer 1998:278):

What are you thinking of?

Vad tänker du på?

I think Stockholm is a beautiful city

Jag tycker Stockholm är en vacker stad

I think Stockholm is the capital of Sweden

Jag tror Stockholm är Sveriges huvudstad

English think thus represents a complex case of polysemy and semantic fuzziness that forces a semantic (and lexical) distinction in other languages.14 The different meanings must be distinguished pragmatically by means of contextual cues and background knowledge. In cases of this kind translation corpora can help to specify not only the choices that have to be made in other languages, but also the conditions that determine the choices and the semantic range covered by the different alternatives. This is well illustrated in three closely related studies by Simon-Vandenbergen (1998), Aijmer (1998) and Mauranen (1999), who examine the Dutch, Swedish and Finnish equivalents of the epistemic use of I think (i.e. excluding its dynamic ‘cogitation’ sense corresponding to Swedish tänka or German denken). Since Aijmer also makes use of translations into German and Norwegian, the three studies give a broad multilingual picture of the main equivalents of think in five languages. As indicated in Table 1, the field covered by the Germanic verbs can be seen as describing a continuum between two poles: ‘verifiable probability-based opinion’ (the verbs in the left-hand column) and ‘impression-based subjective evaluation’ (the verbs in the right-hand column). Contextual factors determining the choice of verb along the continuum (as well as the use of other related verbs like Dutch dunken and lijken and Swedish tyckas ‘seem’) include such features as type of evidence involved (e.g. direct observation or experience), type of certainty and verifiability and type of speaker authority.

Table 1: Translation equivalents of ‘epistemic’ think in some Germanic languages

English      think
German       glauben           finden
Dutch        denken/geloven    vinden
Norwegian    tro               synes
Swedish      tro               tycka

No correspondence

There are also cases where an item in one language has no obvious equivalent in another language. One familiar grammatical example of this is the English progressive, which has no equivalent in many languages and is therefore difficult to translate in a systematic way. Broadly speaking, this kind of cross-linguistic difference is revealed in two ways in translation corpora (although the distinction is not clear-cut and there is a great deal of overlap between the two tendencies): the lack of a clear equivalent in the target language results either in a large number of zero translations, indicating that the translators have great difficulties finding a suitable target item, or in a wide range of translations, indicating that the translators find it necessary to render the source item in some way but, in the absence of a single prototypical equivalent, vary their renderings according to the context. We shall illustrate both tendencies briefly here.

A characteristic feature of the Germanic languages is the frequent use of lightly stressed ‘pragmatic particles’ of various kinds, especially in spoken discourse. Familiar examples are the German modal particles ja, doch and schon. The meaning of these particles is difficult to pinpoint and describe in dictionaries, partly because they tend to be multifunctional and partly because their function tends to be pragmatic and highly context-dependent. Although many of them can be described as having a modal function, often corresponding to modal auxiliaries or modal adverbs in other languages, they also tend to have various interactive or interpersonal functions without any direct lexical counterpart in other languages. They are therefore interesting to study contrastively on the basis of translation corpora. Some recent studies will illustrate this.
In a study based on the English-Swedish Parallel Corpus, Aijmer (1996) examines the Swedish particles ju, väl, nog and visst and their translations into English. For each particle the translations display a great variety of renderings representing a wide range of categories, from adverbs and modal auxiliaries to full verbs and comment clauses. In some cases (especially in the case of väl)
questions and tag questions are used as approximate renderings. However, the most striking result is the high frequency of zero translations, especially in the case of ju, nog and visst. The difficulty of rendering these particles into English is particularly clearly illustrated by ju (‘as you know’), which lacks a translation in 71% of the cases. Moreover, a great proportion of the renderings are unique (singleton) translations and the most common English rendering (after all) only represents 5% of the examples. Yet, despite the lack of an obvious English counterpart, each particle has a translation ‘profile’ of its own, reflecting its complex pragmatic function. Hence, the translations can help to specify the functional ‘identity’ of the particles. This identity cannot be described in terms of a single dimension but, like the translations of epistemic think into Dutch or Swedish, rather in terms of a combination of grammatical, modal and interactive features, involving syntactic position, type of evidence (e.g. belief, inference, hearsay, etc), type of authority involved (first, second and third person) and interactive appeal (e.g. soliciting the listener’s confirmation). A very similar picture emerges from Johansson and Løken’s (1997) study of Norwegian particles and their correspondences in English and Johansson’s (1998) study of the noun mind and its Norwegian translations.
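The idea of a translation ‘profile’ can be illustrated with a short sketch. It assumes that each occurrence of a particle has been aligned with its rendering in the translation and that a zero translation is recorded as None. The function name and the sample data are invented and are not Aijmer's figures.

```python
from collections import Counter


def translation_profile(renderings):
    """Summarise how a source item is rendered in translation.

    renderings -- one entry per source occurrence; None marks a zero translation.
    """
    total = len(renderings)
    if total == 0:
        raise ValueError("no occurrences to profile")
    zero = sum(1 for r in renderings if r is None)
    counts = Counter(r for r in renderings if r is not None)
    # Renderings that occur only once ('singleton' translations).
    singletons = sum(1 for n in counts.values() if n == 1)
    top, top_n = counts.most_common(1)[0] if counts else (None, 0)
    return {
        "zero_rate": zero / total,
        "singleton_types": singletons,
        "top_rendering": top,
        "top_share": top_n / total,
    }


# Invented sample: 20 occurrences of a particle, 14 of them left untranslated.
sample = [None] * 14 + ["after all", "after all",
                        "of course", "you know", "as you know", "surely"]
profile = translation_profile(sample)
print(profile)
```

Comparing such profiles across particles would show, for instance, which particle is most often left untranslated and how widely its overt renderings are scattered.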

Paradigmatic and syntagmatic perspectives

The lexical unit

So far we have tacitly assumed that the lexical items compared across languages are easy to define and identify. However, although the definition and demarcation of a ‘lexical unit’ may be fairly straightforward in theory (see e.g. Cruse 1986: 23ff.), it is often problematic in practice and notoriously difficult for a computer. Teubert (1996:243f.), for example, compares the task confronting a computer with that of a human being trying to make sense of a totally unfamiliar language. In contrast with the ‘orthographical word’, which has no consistent relationship to meaning, a lexical unit (or ‘lexical item’) can be defined as a stable pairing of form and meaning (cf. Sinclair 1998). What complicates the picture is that lexical units may consist of several words and that multiword units tend to be unstable in form, lexically as well as grammatically (see e.g. Moon 1996). Moreover, many meanings are difficult to specify without considering co-occurrence phenomena in the linguistic context. As a result, in corpus-based studies the researcher must either know what to look for, or rely on collocation software for spotting potential lexical units in corpora.

As mentioned, the meaning of lexical units must be determined with respect to two linguistic dimensions, the paradigmatic and the syntagmatic. The paradigmatic dimension reflects how the senses of the words in a language are related to each other and, cross-linguistically, to the senses distinguished in other languages. Monolingually, these relations are typically described in terms of such relations as synonymy, antonymy, hyponymy, meronymy, etc. (see Cruse 1986:84ff.). Closely related to this dimension are various ways of organising vocabulary in terms of lexical sets or fields or in terms of prototypes. From a contrastive point of view it has been attractive to work with typological and universal categories and to use these as a basis for the comparison.

The syntagmatic dimension relates words to the linguistic context, lexically, semantically, and grammatically. Syntagmatic phenomena are typically described in terms of lexical co-occurrence (collocation), semantic preferences (e.g. case roles, selection restrictions, semantic prosody) and syntactic function (e.g. syntactic dependency or valency). Cross-linguistically, this dimension has also led to various attempts to establish language-independent or universal categories (e.g. in frame semantics) against which the vocabulary of different languages can be compared.

Although the paradigmatic and syntagmatic axes are two clearly distinguishable lexical dimensions in theory, they are closely related and difficult to separate in practice. The reason for this is that the meaning of a lexical item (its paradigmatic status) can only be determined on the basis of the context in which it occurs (its syntagmatic status). In fact, it is the syntagmatic patterning of words that determines what we can regard as a lexical unit in the first place. However, it is important to add that the two dimensions affect the interpretation of lexical categories in different ways. As pointed out by Sinclair et al.
(1996: 176), closed-class words (or ‘grammatical’ words) tend to have little independent meaning and therefore have to be accounted for mainly in terms of their co-text and grammatical or textual function. At the other end of the spectrum are rare and specialised open-class words, such as technical terms, which “usually have little to do with the phraseology that surrounds them” (ibid.: 176) and which can therefore easily be listed in multilingual term banks. Most other open-class words exist somewhere in between these two extremes: they tend to be polysemous and their contrastive description generally has to take account of several ‘layers’ of conceptual and contextual factors.

In the rest of this section we shall describe some approaches that have a predominantly paradigmatic or syntagmatic bias and conclude with some attempts to reconcile the two perspectives. It should be added that a clear distinction between the two dimensions is often difficult to make, and in some cases our account is simply based on the lexical difference mentioned above: open-class items tend to invoke a paradigmatic approach, closed-class items a syntagmatic approach.

Some paradigmatic approaches

As we have seen, in order to be able to describe the type and degree of correspondence between lexical items in different languages, the basis of the comparison (the tertium comparationis) has to be primarily semantic or functional. Moreover, it is essential that the model of description that is used is language-independent and, preferably, based on typologically interesting or universal categories. In addition, the comparison must consider various principles of lexical and semantic organisation, including the structure of the vocabularies of natural languages, the nature of lexical relations, the relationship between meaning, concepts and the world, and the nature of polysemy (cf. Kittay and Lehrer 1992: 2). A few paradigmatic approaches of this kind that have been used in lexical CL with some success will be mentioned briefly here: semantic features, prototypicality and semantic fields.

One concept that has long been central to the idea of the lexicon as an organised system is semantic decomposition. Interconnections within the lexicon have often been analysed in terms of shared primitive components or features. As pointed out by James (1980: 89) and Kittay and Lehrer (1992: 9), one motivation for this has been economy of description: a small number of components can be used to define a large number of words; another, which is of particular interest for lexical CL, is that semantic primitives offer an attractive (and potentially universal) basis for lexical comparisons across languages.
Semantic primitives have consequently been used a great deal in lexical CL, implicitly or explicitly, either as a tertium comparationis or as part of the contrastive analysis (see James 1980:89ff.). However, semantic decomposition has been criticised and is still a controversial issue in lexical semantics (see Kittay and Lehrer 1992: 9). One objection has been that the number of semantic primitives is arbitrary (and theory-dependent), another that differences in meaning between lexical items are conceptual and cannot be captured in terms of abstract linguistic features.15 Some of the problems inherent in semantic decomposition can be avoided by resorting to the notion of prototypicality (see Rosch 1975, Taylor 1995). Prototypicality indicates degrees (or the ‘best fit’) of category membership and is a
fuzzier notion than other semantic relations. If we accept that meanings can be fuzzy and are better described in cognitive (rather than purely linguistic) terms, certain lexical relations can be characterised more adequately in terms of prototypes. Prototypicality is therefore often used in cognitively oriented taxonomies and semantic field studies based on typological universals (cf. Viberg 1996:159f.). Another concept with a long history is that of semantic field, i.e. the conceptual domain within which lexemes are organised by specific semantic relations such as synonymy, hyponymy, incompatibility, antonymy, etc. (see e.g. Lehrer 1974, Schwarze 1985, Kittay 1987; on semantic relations, see Cruse 1986). Familiar examples of such fields are ‘colour terms’, ‘cooking’, ‘parts of the body’, ‘visual perception’ (see James 1980: 86ff.) and verbs of motion (Viberg 1996a). The claim of this approach is that “the meaning of words must be understood, in part, in relation to other words that articulate a given content domain and that stand in the relation of affinity and contrast to the word(s) in question. Thus to understand the meaning of the verb to sauté requires that we understand the contrastive relation to deep fry, broil, boil, and also to affinitive terms like cook and the syntagmatic relations to pan, pot, and the many food items one might sauté” (Kittay and Lehrer 1992:3–4). Many of the corpus-based multilingual studies mentioned in the previous sections have been essentially paradigmatic in character (e.g. those by Viberg). A translation-based variant of great theoretical and methodological interest is Dyvik’s (1998) demonstration of how sense distinctions between English and Norwegian lexemes can be made by means of successive bidirectional comparisons of translation correspondences in the English-Norwegian Parallel Corpus. Developing Ivir’s (1983) notion of back-translation, Dyvik proceeds in three steps. 
Starting from polysemous Norwegian lexemes like tak (‘cover’, ‘roof’, ‘grip’), selskap (‘companionship’, ‘society’, ‘firm’, ‘party’) and god (‘good’, ‘nice’, ‘fine’, etc.), he first examines their translations into English (tak, for instance, is rendered by roof, ceiling, cover, grip and hold). He then reverses the perspective and examines how these English translations are rendered in Norwegian. Reversing the procedure a second time (from Norwegian into English), he arrives at a structured picture of the senses and sense relations of both the Norwegian and English lexemes. The method can be used to define such lexical properties as ambiguity, vagueness and synonymy, as well as lexical fields, feature-specified hierarchies and overlap relations within these fields (e.g. prototypicality, hyponymy). Like Viberg, Dyvik uses translation data to ‘objectify’ criteria and distinctions derived from, or supporting, a semantic model, giving lexical semantics an empirical foundation.
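The three steps just described can be approximated in a few lines of code. The sketch below is a simplification and not Dyvik's actual procedure: it groups the English translations of a Norwegian word according to whether they share a Norwegian back-translation other than the starting word. The function names are hypothetical, and the toy correspondences (including the Norwegian alternatives hvelv, dekke and grep) are invented placeholders, not data from the English-Norwegian Parallel Corpus.

```python
def invert(lexicon):
    """Turn a source -> targets mapping into a target -> sources mapping."""
    inverse = {}
    for source, targets in lexicon.items():
        for target in targets:
            inverse.setdefault(target, set()).add(source)
    return inverse


def t_image(words, lexicon):
    """Union of the translations of all `words` found in `lexicon`."""
    image = set()
    for word in words:
        image |= lexicon.get(word, set())
    return image


def sense_groups(word, src_to_tgt, tgt_to_src):
    """Group the translations of `word` that share another back-translation."""
    groups = []  # list of (members, linking back-translations)
    for target in sorted(t_image({word}, src_to_tgt)):
        members = {target}
        links = tgt_to_src.get(target, set()) - {word}
        untouched = []
        for other_members, other_links in groups:
            if links & other_links:      # shared back-translation: same group
                members |= other_members
                links = links | other_links
            else:
                untouched.append((other_members, other_links))
        groups = untouched + [(members, links)]
    return sorted(sorted(members) for members, _ in groups)


# Invented toy correspondences (assumption, for illustration only).
NO_TO_EN = {
    "tak": {"roof", "ceiling", "cover", "grip", "hold"},
    "hvelv": {"roof", "ceiling", "vault"},
    "dekke": {"cover", "layer"},
    "grep": {"grip", "hold", "grasp"},
}
EN_TO_NO = invert(NO_TO_EN)

first = t_image({"tak"}, NO_TO_EN)      # step 1: Norwegian -> English
inverse = t_image(first, EN_TO_NO)      # step 2: English -> Norwegian
second = t_image(inverse, NO_TO_EN)     # step 3: Norwegian -> English again
groups = sense_groups("tak", NO_TO_EN, EN_TO_NO)
print(groups)  # [['ceiling', 'roof'], ['cover'], ['grip', 'hold']]
```

With these toy data the translations of tak fall into three groups, which mirror the three senses glossed in the text. In a real corpus the two translation directions would be observed independently rather than derived by inversion.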





Some syntagmatic approaches

As mentioned earlier, the corpus revolution has brought the contextual patterning of words into focus in recent years. The use of idioms and collocations tends to be highly language-specific and the syntagmatic aspect of lexis is consequently of great contrastive interest, theoretically as well as practically. It is of great importance for the foreign language (FL) learner (see e.g. Roos 1976, Bahns 1993, Granger 1998, Howarth 1996, 1998) and it is absolutely essential in natural language processing (NLP) fields, such as machine-assisted translation and the creation of multilingual lexical databases.

Interesting attempts have been made to extract collocations from existing bilingual dictionaries and monolingual corpora for storage in multilingual electronic lexicons (see Section 6). One example is the DECIDE project (see Grefenstette et al. 1996), which used a combination of these sources to collect speech act nouns and their verb collocates in English, French and German (e.g. make/proffer etc. an apology, einen Vorschlag machen/annehmen etc.) together with information about corpus frequencies and syntactic behaviour in a multilingual database.

An interesting alternative to this approach is an experiment carried out at Pisa (see Peters 1996), in which collocations in two domain-specific and topic-controlled monolingual corpora (in English and Italian) were matched with the aid of a bilingual dictionary. The strategy was (a) to identify collocates of nouns in a corpus representing one language on the basis of their mutual information value (see Church and Hanks 1989), (b) to select potential ‘translation blocks’ in the other language on the basis of an English-Italian lexical database, and (c) to identify similar sets of contexts in a comparable corpus for the other language.

Like pragmatic particles, prepositions are closed-class items whose meanings are difficult to define without considering the context in which they occur.
Their functional importance varies from language to language and since they tend to have many language-specific and idiosyncratic uses they are interesting to study in a contrastive perspective. Three recent studies based on translation corpora illustrate this: Schmied’s (1998) comparison of German mit and English with, Paulussen’s (1999) contrastive investigation of English preposition/particle on/up, Dutch op and French sur, and Fabricius-Hansen’s (1999) study of German bei and its translations into English and Norwegian.

These prepositional studies are interesting in several ways. They all demonstrate the usefulness of translation corpora in specifying the functions of items that derive their ‘meanings’ largely from the context. The great diversity of translation equivalents encountered in the corpora also underlines the inadequacy of earlier contrastive and lexicographical descriptions that are not based on natural corpus data. Two of the studies (Paulussen and Fabricius-Hansen) also demonstrate the usefulness of a cognitive framework in describing prepositional meanings, at least those at the less idiosyncratic end of the semantic spectrum. Some of the uses of the items examined in these studies are characterised as metaphorical extensions of a prototypical literal or concrete ‘core’ meaning.

Although the difference between the literal and figurative uses of an item can be regarded as a paradigmatic phenomenon, it can normally only be established on the basis of the linguistic context, i.e. syntagmatically. This is clearly illustrated in two studies in the present volume comparing various figurative uses of lexis in different languages. Lan Chun demonstrates the usefulness of a cognitive approach in describing the metaphorical uses of up/down in English and shang/xia in Chinese. On the basis of random samples from two monolingual corpora, one English and one Chinese, she reveals remarkable similarities between the metaphorical domains of the examined items in the two languages, a finding which supports the idea that there is a universal spatial metaphorical system and that our abstract reasoning is at least partially metaphorical.

Paillard compares the use of two other types of figurative expression, hypallage and metonymy, in English and French. Hypallage involves constructions in which the normal function of an element is changed (by syntactic transposition, conversion or ellipsis) to create a marked effect (e.g. Melissa shook her doubtful curls), while metonymy involves the replacement of a term by another term that is closely associated with it (e.g. redneck). Both types exist on a cline from complete lexicalisation to linguistic creativity.
Corpus-based investigations of these types of figurative language are problematic since they cannot be searched for on the basis of form, and Paillard therefore uses a mixture of sources in his study: dictionaries, textual examples and a sample from a translation corpus. Paillard demonstrates that the use of the two types tends to be diametrically opposed in English and French, in terms of both frequency and availability: while hypallage is more common in English, metonymy is in some respects more readily tolerated in French. This divergence appears to reflect interesting cross-linguistic differences: English allows greater syntactic flexibility in terms of movement, part-of-speech conversion and ellipsis, whereas French permits greater semantic freedom in the relationship between argument and predicate.
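To return to the collocation-matching strategy described earlier in this section, the mutual information value used to identify collocates can be sketched as follows. The code assumes the pointwise measure of Church and Hanks (1989), log2 of P(x,y) / (P(x) P(y)), with probabilities estimated over a list of observed noun-collocate pairs. This is a simplification of their windowed corpus counts, and the function name and sample pairs are invented.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Pointwise mutual information for each observed (noun, collocate) pair."""
    pair_counts = Counter(pairs)
    noun_counts = Counter(noun for noun, _ in pairs)
    collocate_counts = Counter(collocate for _, collocate in pairs)
    total = len(pairs)
    scores = {}
    for (noun, collocate), count in pair_counts.items():
        # P(x,y) / (P(x) * P(y)) reduces to count * total / (f(x) * f(y)).
        ratio = count * total / (noun_counts[noun] * collocate_counts[collocate])
        scores[(noun, collocate)] = math.log2(ratio)
    return scores


# Invented observations of speech act nouns with their verb collocates.
pairs = (
    [("apology", "make")] * 3
    + [("apology", "offer")]
    + [("proposal", "make")] * 2
    + [("proposal", "accept")] * 2
)
scores = mutual_information(pairs)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

Pairs with a high score co-occur more often than their individual frequencies would predict. A frequent general-purpose verb such as make scores low here even though it is the most common collocate, which is why frequency information is usually stored alongside the association score.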





Combining the paradigmatic and syntagmatic perspectives

Since corpora always present lexical items in their linguistic context, corpus-based contrastive studies of lexis can hardly avoid paying at least some attention to the syntagmatic patterning of the compared items, even if the primary concern is their paradigmatic relationship. Hence, the paradigmatic and syntagmatic perspectives are often fused and the distinction is to a large extent a matter of emphasis. However, it may be useful to end this section with a brief account of an approach that consciously attempts to reconcile the two perspectives.16

A clearly corpus-driven approach to the study of lexis is that advocated by Sinclair (1998). Following the tradition of J. R. Firth (1957) in his definition of meaning as function in context, Sinclair proposes a model in which the paradigmatic and syntagmatic dimensions of lexical items can be determined by studying the contextual patterning (or co-selection) of words in text corpora. Five categories of co-selection are posited as components of a lexical item (ibid.: 14f.): a formally invariable ‘core’, its ‘semantic prosody’ (roughly, its associated attitudinal or pragmatic meaning), collocation (lexical co-occurrence), colligation (grammatical patterning) and semantic preference (link to a co-occurring lexical field).

Cross-linguistic applications of this corpus-driven approach which explore the possibility of identifying units of meaning on the basis of their contextual environment in one language and linking them with functionally equivalent units in another have been tested in several European projects involving several languages (see e.g. Sinclair et al. 1996, Sinclair 1996, Teubert 1996). Although the ultimate goal of these projects has generally been to create multilingual lexicons for machine translation (MT) or machine-assisted translation (MAT) (see Section 7), they are of considerable contrastive interest.
In the projects reported by Sinclair (1996) and Teubert (1996) the starting point is a number of pre-selected words which are studied on the basis of concordances from monolingual corpora representing the compared languages. For each word, recurrent contextual patterns specifying the different meanings of the word are identified and a translation equivalent is defined for each meaning by the researchers. A characteristic feature of these projects is that the translation equivalents, though inspired by monolingual corpora, are established on the basis of the researcher’s translation competence. An interesting variant of this approach is illustrated in Tognini Bonelli’s (1996) comparison of the English adjective real
and its Italian counterparts reale and vero. The comparison, which is based on two broadly comparable monolingual corpora, involves several steps. First, the meanings and functions of the English adjective are established on the basis of its formal patterning in the concordance of an English corpus. Second, the same process is repeated for the most likely Italian translation equivalent, reale, on the basis of an Italian corpus. The meanings and functions of the two items are then compared and cross-linguistic matches and mismatches identified. Since the uses of reale do not cover all the uses of real, another Italian equivalent is postulated and tested to see if it can fill the functional gaps that reale fails to match. The result is a cross-linguistic description of the compared items that can be stored in a bilingual database of comparable units of meaning. The limitation of this approach is that the comparison is confined to prima facie translation equivalents postulated by the researcher, either on the basis of intuition, a bilingual dictionary or a translation corpus. The strength is that it can take full advantage of the large amounts of data provided in the two monolingual corpora to identify and describe syntagmatic and paradigmatic patterns of the compared items. In her contribution to this volume Tognini Bonelli explores the functional equivalence of expressions containing the English word case and the Italian word caso in concordances from broadly comparable English and Italian corpora. Starting with an analysis of the contextual patterning of the English multiword forms in the case of, in case of and in case in the English corpus, she repeats the analysis for their prima facie translation equivalents nel caso di, in caso di and se per caso in the Italian corpus. For each of these pairs the functionally complete units of meaning are identified and compared. 
The items are found to be surprisingly similar in function, especially in the case of the first two pairs.
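The concordance-based analyses described in this section all start from keyword-in-context (KWIC) lines. The following sketch shows how such lines, and the word immediately following the node, might be extracted for a multiword form. It is an assumption-laden illustration: the function names and the sample text are invented, and real studies would work on tokenised corpora rather than a raw string.

```python
import re
from collections import Counter


def kwic(text, phrase, width=30):
    """Return (left context, node, right context) for each match of `phrase`."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(0), right))
    return lines


def right_collocates(lines):
    """Count the first word to the right of the node in each concordance line."""
    counts = Counter()
    for _, _, right in lines:
        words = re.findall(r"\w+", right)
        if words:
            counts[words[0].lower()] += 1
    return counts


# Invented sample text.
TEXT = (
    "In the case of minors, consent is required. "
    "In case of fire, break the glass. "
    "Take an umbrella in case it rains. "
    "In the case of Italy, the figures differ."
)

for left, node, right in kwic(TEXT, "in the case of"):
    print(f"{left:>30} [{node}] {right}")
print(right_collocates(kwic(TEXT, "in case of")))
```

Repeating the same extraction on a comparable corpus in the other language, for the postulated equivalents, would give the two sets of contextual patterns that are then compared by the researcher.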

Bilingual and multilingual lexicography

Since the publication of the Collins Cobuild English Language Dictionary in 1987, text corpora have become a well-established ingredient in monolingual lexicography. The advantages of using corpora to ensure authenticity and empirical adequacy in lexicography are well documented in Sinclair (1985 and 1987b). In 1994 this practice was extended to bilingual lexicography with the appearance of the Oxford-Hachette English-French, French-English Dictionary, which makes use of two monolingual corpora, one in English and one in
French (see Atkins 1994:xix). Since then, the use of translation corpora has also become an important supplementary feature in bilingual lexicography, e.g. in the Bilingual Canadian Dictionary project (see Roberts and Montgomery 1996). Both types of corpora are invaluable resources in bilingual lexicography: monolingual corpora in the structuring of the lexical entries, in supplying natural examples and in verifying the target language equivalents, and translation corpora in enriching the inventory of target language equivalents (cf. e.g. Dickens and Salkie 1996, Teubert 1996 and the many projects described in Gellerstam et al. 1996). In the 1990s, the main emphasis has been on the creation of multilingual databases and terminology systems for machine translation as well as for human use (see e.g. Bläser 1995). Most of these projects have been devoted to specific lexical domains, such as perception verbs and nouns (e.g. Ostler 1995, Heid 1995). An interesting approach to multilingual lexicography is used in the Contrastive Verb Valency Dictionary of Dutch, French and English project at the University of Ghent (see Simon-Vandenbergen et al. 1996). Although corpora play a subordinate role in this project, it deserves to be mentioned here because it represents a recent and very interesting contrastive development of a long tradition of valency studies (for a good survey of earlier work, see Devos 1996). It has several distinctive features: (a) it is multilingual, comparing verbs in three languages; (b) it is multidirectional and truly contrastive in that it pays equal attention to all three languages and gives each the same systematic treatment (on this requirement, see Fisiak 1981:3); (c) it is multi-layered, providing syntactic, semantic and stylistic information for each entry. The starting point of the comparison is verbal lexemes that are judged to be prototypical equivalents in the three languages. 
The lexemes of each language are then analysed separately in terms of their syntax and semantics on the basis of monolingual corpora, bilingual dictionaries and introspection. In the final stage, the descriptions arrived at in this way are contrasted and different degrees of cross-linguistic similarity established. The result is a three-way multilingual electronic lexicon which provides rich contrastive information about the lexical structure of verbs in the three languages and about such phenomena as translation equivalence, overlapping and diverging polysemy, and ranges of meaning.

The multidirectional approach also produces a more fine-grained cross-linguistic analysis of the lexemes than would otherwise have been possible. This is illustrated by Devos’s (1996: 35–37) comparative sketch of Dutch kijken, French regarder and English look. A unidirectional analysis that takes, say, Dutch as its point of departure would result in a lexical entry separating only
five out of the nine meanings that need to be distinguished in a multidirectional perspective. Moreover, a multidirectional approach gives a clearer picture of the conceptual ranges of the three verbs: English look and French regarder have more meaning extensions than Dutch kijken. Many of the contributions to the present volume demonstrate convincingly how bilingual and multilingual lexicography can be enriched by means of corpora of various kinds. Alsina and DeCesaris examine two interesting issues in bilingual lexicography: how the degree of overlapping polysemy and the use of data from a monolingual corpus can be used to improve the structure of lexical entries in three general-purpose English-Spanish dictionaries and one English-Catalan dictionary. By comparing the sense distinctions made for three polysemous English adjectives, cold, high and odd, in existing monolingual and bilingual dictionaries, they identify potential areas for improvement in the design of the dictionary entries, the ordering of equivalents, and the treatment of idioms and set phrases. This information is then compared with the distribution of senses in the British National Corpus. Their conclusion is that bilingual lexicography should pay more attention to overlapping polysemy, i.e. symmetrical equivalence relations established on the basis of the languages involved. Although a single monolingual corpus is of limited use in this respect, it plays an important role for the ordering of equivalents and for the selection of fixed expressions in a bilingual dictionary. Cardey and Greenfield report on experiences gained from the construction of a multilingual dictionary system intended for the automatic recognition and translation of set expressions in four languages. Focusing on set expressions involving names of animals and parts of the body collected from various dictionaries, they examine the variability and various types of ambiguity associated with such multiword expressions. 
They conclude that, although only human researchers can identify and disambiguate set expressions and organise the entries in a multilingual electronic lexicon, the computer is a very useful tool in collecting, organising and verifying the use of multiword expressions.

NLP systems depend heavily on powerful lexical databases that provide language-specific as well as cross-linguistic information about the paradigmatic and syntagmatic patterning of lexical items. An important development in this direction is the creation of databases storing networks of lexical relations within and across languages. An example of this is the EuroWordNet project (see e.g. Vossen 1998, Ide et al. 1998). It is a multilingual development of the American (Princeton) WordNet system and is intended to serve a variety of applications such as machine-assisted translation and information retrieval.





Bengt Altenberg and Sylviane Granger

So far, however, the wordnets of this project are derived from existing machine-readable dictionaries rather than from corpora, and their usefulness has been called into question (see Teubert this volume). Different possibilities of creating multilingual databases from various sources (machine-readable dictionaries, corpora, language-specific wordnets) are discussed in Steffens (1995) and Teubert (1998).

.

Machine translation and machine-assisted translation

Much of the recent upsurge of interest in corpus-based CL has its origin in the growing need for machine translation (MT) or machine-assisted translation (MAT). When computers began to be used for language processing in the 1950s and 1960s one of the first priorities was MT. However, the results were disappointing, partly because the computational resources were insufficient, but mainly because the complexity of simulating human translation of unrestricted text was underestimated. Neither the mentalist linguistic models nor the statistical processing techniques — the two competing approaches that were used — could cope adequately with the problem of transferring one language in use to another (see e.g. Sinclair et al. 1996). MT and MAT rely heavily on large multilingual computational lexicons or databases (see Section 6). In the 1980s great efforts were made to extract and formalise the lexical information contained in conventional machine-readable dictionaries. However, traditional dictionaries are rarely detailed, systematic, explicit and reliable enough for NLP, and although valuable experience was gained in the development of large-scale lexical databases from such sources, the reliability and coverage of the resulting lexicons proved to be inadequate (see e.g. Ide and Véronis 1995, Steffens 1995:2f.). Maniez (this volume) identifies lexical ambiguity as one of the main problems confronting machine translation. Taking three concrete problems as a point of departure — the translation into French of the English compound sedimentation rate, the ambiguity of the expression based on, and discontinuous collocations — he demonstrates the need for lexical databases that include frequently used compounds and collocations and information about the frequency of the different meanings of polysemous lexical items. 
The emphasis of his article is not so much on the usefulness of computer tools and corpora in the field of machine-assisted translation, which is taken for granted, but rather on the need to collect data that can improve expert systems and tools for translators. To automate the disambiguation process it is necessary to build lexical databases which include a comprehensive description of the syntactic and lexical ambiguities that are likely to appear in particular domains, together with statistical information about various co-occurrence phenomena derived from corpora.

The main outcome of the experiments with machine-readable dictionaries has therefore been a growing awareness of the need for text corpora as an additional source of data and of closer collaboration between computational linguists, lexicographers and corpus linguists (see Ide and Véronis 1995). As a result, more recent work on MT has increasingly turned to text corpora for such tasks as lexical acquisition, disambiguation and analysis. Corpus-based methods are central to example-based and statistical MT (e.g. Brown et al. 1990), to the extraction and structuring of multilingual term banks, and to the creation of translation memories (e.g. Heyn 1998) and various support tools for translators (Isabelle et al. 1993, Merkel 1999:25ff.).

Multilingual corpora have revived the hopes of achieving, if not automatic translation (except in specific domains), at least robust MAT. A number of projects are now at work trying to recover the wealth of information stored in translation corpora in various ways. This work involves at least two tasks. One is to recover, organise and recycle the information available in translation corpora; the other is to develop a very rich network of meaning relationships between categories in the languages involved or, to use the words of Sinclair et al. (1996:174), to internalise “the expertise of bilingual humans”.

The first task relies heavily on the fact that the meaning and use of words can generally be deduced from the linguistic context in which they occur and that the translations serve to distinguish these meanings. Several approaches have been used. One is to create ‘translation banks’, i.e. recurrent source-target pairings stored in a translation memory that can be called upon as an aid in translating new texts. This approach is especially useful for the translation of domain-specific texts where words and expressions tend to recur (see Ahrenberg and Merkel 1996, Merkel 1998: 43–61, Heyn 1998). Another is the procedure proposed by Sinclair et al. (1996), which extends the tradition of monolingual collocational studies to multilingual corpora: words and word combinations are disambiguated by means of their translations and by their linguistic context in both languages. The distinctive patterns recognised in this way are then formalised and stored in a large multilingual database which can be used for machine-assisted and human translation. Variants of this approach, involving a number of languages (English, French, German, Italian, Spanish and Swedish), are described in Sinclair et al. (1996) and Teubert (this volume).
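The idea of a translation memory that is "called upon as an aid in translating new texts" can be illustrated with a minimal sketch. It is not the design of any system cited above: the class name `TranslationMemory`, its methods `add` and `lookup`, the example sentence pairs and the similarity threshold of 0.7 are all hypothetical, and the fuzzy matching simply uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for whatever matching a real tool performs.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory (hypothetical): stores source-target segment pairs."""

    def __init__(self):
        self._pairs = []  # list of (source, target) tuples

    def add(self, source, target):
        """Store one recurrent source-target pairing."""
        self._pairs.append((source, target))

    def lookup(self, query, threshold=0.7):
        """Return (score, source, target) for stored sources similar to the query.

        The threshold is an illustrative assumption, not a recommended value.
        Results are sorted with the best match first.
        """
        matches = []
        for source, target in self._pairs:
            # Character-based similarity between 0.0 and 1.0, case-insensitive.
            score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
            if score >= threshold:
                matches.append((score, source, target))
        return sorted(matches, reverse=True)


# Invented example pairs for a domain-specific (instruction manual) text.
tm = TranslationMemory()
tm.add("Press the red button.", "Appuyez sur le bouton rouge.")
tm.add("Close the cover.", "Fermez le couvercle.")

# A new sentence that closely resembles a stored one retrieves its translation.
for score, source, target in tm.lookup("Press the green button."):
    print(f"{score:.2f}  {source}  ->  {target}")
```

The retrieved target is only a suggestion for the translator to adapt, which is why such a memory is most helpful where, as noted above, words and expressions tend to recur.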






The way forward

There is no doubt that the use of text corpora has revolutionised contrastive lexical analysis. The wealth of natural language data represented in corpora not only provides a more detailed and accurate picture of the cross-linguistic correspondences of lexical items, it also greatly improves the quality and usefulness of multilingual lexicons and various types of translation tools.

However, having said this, it is necessary to emphasize some of the challenges that lie ahead. In his criticism of earlier models of (monolingual) lexical description, Sinclair (1998:14) points out that an analysis that aims to account for both the paradigmatic and syntagmatic patternings of lexical items “calls for nothing less than a comprehensive redescription of each language.” In the same vein, Teubert (1996: 238), commenting on the impressive developments in multilingual lexicography and NLP, states that “further improvement depends on reanalysing the languages involved from scratch with the aid of multilingual corpora.” These statements do not necessarily imply that earlier contrastive work is useless, but they underline the inadequacy of much research in the past and, in particular, the magnitude of the work that lies ahead. Considering the vast size and complexity of the vocabulary of a single language and the enormous task of comparing the lexis of even two languages, these statements serve as a healthy reminder that the ‘revolution’ in lexical CL has only just begun.

Despite the revitalisation of the field that multilingual corpora have created and the many useful tools that are now available, a host of problems remain to be solved and many challenges need to be overcome if real progress is to be made. It may be useful to mention some of these challenges here. In particular, there is a need for

– a stronger coordination of activities and cooperation across related disciplines, in particular corpus linguistics, computational linguistics, lexicography, natural language processing and translation studies;
– increased efforts to integrate theoretical modelling and empirical studies of language use, to incorporate both the paradigmatic and syntagmatic dimensions of lexis, and to relate language-internal and cross-linguistic lexical relations in a systematic way;
– further refinement of corpus-based contrastive methodology, especially as regards the combined use of comparable and translation corpora;
– intensified efforts to create larger, more comprehensive and generally accessible multilingual corpora, especially translation corpora relating more than two languages;
– further development of multilingual software tools in such areas as word and phrase alignment, parallel concordancing, lexical databases, translator’s workbenches, computer-assisted translation, and multilingual systems in which corpora, electronic lexicons and grammars are linked in a user-friendly way.

To achieve these goals much research and development will be required in the future. To judge from the great vitality in the field, the many promising corpus-based projects that are in progress all over the world and, not least, the variety of approaches represented in the present volume, there are good reasons to be hopeful about the future of lexical CL. There are indeed exciting times ahead.

Notes

1. For an overview of this lexical reorientation, see Faber and Mairal Usón (1999: Chapter 1). Some approaches representing different theoretical traditions reflecting this development are Lexical Functional Grammar (see e.g. Bresnan 1982), Generalised Phrase Structure Grammar (e.g. Gazdar et al. 1985) and its descendant Head-Driven Phrase Structure Grammar (e.g. Pollard and Sag 1987), Word Grammar (Hudson 1984), Functional Grammar (Dik 1978, 1989), Systemic Functional Grammar (Halliday 1994), Role and Reference Grammar (Van Valin 1993, Van Valin and LaPolla 1997), the Functional Lexematic Model (Faber and Mairal Usón 1999, Butler 1998, 1999), Cognitive Grammar (Langacker 1987, 1991), Frame Semantics (e.g. Fillmore and Atkins 1992) and Construction Grammar (Fillmore 1988, Goldberg 1995). It is also clearly manifested in such lexicological enterprises as the MIT Lexicon Project (e.g. Rappaport and Levin 1988, Levin 1993) and the WordNet Project (e.g. Miller and Fellbaum 1991).

2. “By far the majority of lexical items have a relative frequency in current English of less than 20 per million. The chance probability of such items occurring adjacent to each other diminishes to less than 1 in 2,500,000,000!” (Clear 1993:274)

3. In his summary of the development of CA up to 1980, James (1980: 83) says that the “structuralist movement in linguistics, and the allied Audio-Lingual Method, with their emphasis on the priority of grammatical patterns, tended, in contrast to the layman’s view, to neglect the role which vocabulary undoubtedly plays in the process of communication.”

4. In addition to these types there are other corpora involving translations: corpora of original texts and translations in the same language and corpora of translated texts in different languages. These are especially useful for translation studies and for investigations of systematic translation effects (see Baker 1993, Schmied and Schäffler 1996) but of little use in CL.

5. The use of translations for contrastive studies of lexis has a long history. Viberg (this volume) mentions the work of Wandruszka (1969), who used a non-electronic corpus of 60 publications in six Germanic and Romance languages, partly inspired by Bally (1950). The use of electronic translation corpora in CL is a comparatively recent phenomenon, but its roots can be traced back to the 1960s, i.e. the first decade of computer corpora. The first attempt to assemble a bidirectional electronic translation corpus for contrastive studies seems to have been made by Rudolf Filipovic and his collaborators in the Yugoslav Serbo-Croatian-English Contrastive project at the University of Zagreb (see e.g. Filipovic 1969, 1971). The corpus compiled for this project consisted of half the Brown Corpus (Francis and Kučera 1979), which was translated into Serbo-Croatian, and a smaller corpus of original Serbo-Croatian texts translated into English.

6. For a description of the Oslo project, see http://www.hf.uio.no/german/sprik/index.html. Another interesting example of a truly multilingual translation corpus is that created by the Trans-European Language Resources Infrastructure (TELRI). The corpus uses Plato’s Republic as a point of departure and includes aligned translations into more than twenty languages (see Erjavec et al. 1998; for studies of translation equivalence on the basis of this corpus, see Teubert et al. 1997 and Teubert this volume).

7. For some useful surveys of different approaches to text alignment, see e.g. Merkel (1999: 28ff.), Oakes and McEnery (2000), Simard et al. (2000) and Véronis (2000). For experiments in pairing corresponding units in comparable corpora, see Peters (1996).

8. For some pioneering work on sentence alignment, see Brown et al. (1990), Brown et al. (1991), Gale and Church (1991), Simard et al. (1992).

9. The example is inspired by an illustration in Church and Gale (1991). For more information on the TransSearch bilingual concordancing tool, see Simard et al. (1993) and the TransSearch website: http://www-rali.iro.umontreal.ca/TransSearch/TS-project.en.html.

10. In English the underlying principle is mainly syntactic. The singular form is used in premodifying position (drug abuse, drug prevention, drug-related, drug-free) and the plural form in all the other cases (to import drugs, the war against drugs). In French the main factor is the countable/uncountable status of the noun. When it is uncountable the singular is used: lutter contre la drogue, les barons de la drogue, le milieu de la drogue. The plural form is only used when the referent is countable: 10% de toutes les drogues, la cocaïne et les autres drogues. In some cases, both forms are possible: trafic de drogue(s), saisies de drogue(s).

11. Translation effects, whether induced by the source language or universal strategies, are seldom violations of the target language system in professional translations, but quantitative deviations from the target language norm (see Schmied and Schäffler 1996). As such they are of course eligible as potential translation equivalents. What importance should be attached to them depends on their ‘naturalness’, which can only be evaluated against the norm provided by a large reference corpus of original texts representing the target language.

12. Alternatively, the definition of equivalence may be determined by the theoretical model used for the contrastive description. If the model requires a certain kind of formal correspondence, or if it draws the line between semantics and pragmatics in a particular way, this may be a legitimate reason for having a stricter definition of cross-linguistic equivalence. However, it is important that the grounds for the definition are made clear.

13. This is usually demonstrated by the number of subentries that are needed to describe them in monolingual dictionaries. A good English example is the verb run, which is given 42 numbered senses in WordNet and 31 in the Longman Dictionary of Contemporary English (cf. Viberg 1998:346).

14. The polysemy of think can be explained diachronically. Historically the English verb think represents a merger of two Old English verbs, þyncan ‘seem’ and its causative (or factitive) variant þencan ‘cause to seem to oneself’ (cf. Persson 1993).

15. The controversy over lexical decomposition is well reflected in Cruse’s (1986: 22) dismissal of the terms ‘semantic features’ and ‘semantic components’: “Representing complex meanings in terms of simpler ones is as problem-ridden in theory as it is indispensable in practice. I would like my semantic traits to carry the lightest possible burden of theory. No claim is made, therefore, that they are primitive, functionally discrete, universal, or drawn from a finite inventory. Nor is it assumed that the meaning of any word can be exhaustively characterised by any finite set of them.”

16. Two other approaches that have attempted to broaden the view of lexical meaning and given it a more lexico-grammatical and cognitively oriented basis are frame semantics (Fillmore 1985, Fillmore and Atkins 1992) and the pragmatically oriented contrastive model suggested by Weigand (1998a). Frame semantics has been used as a language-neutral lexical framework in the creation of corpus-based multilingual dictionary fragments for machine translation and multilingual lexicography (see e.g. Heid 1995, 1996, Ostler 1995). Several (partly corpus-based) studies exploring the conceptual field of ‘emotion’ are presented in Weigand (1998b). Another interesting attempt to combine syntagmatic and paradigmatic approaches is the Functional Lexematic Model, as demonstrated in Faber and Mairal Usón (1999) and Butler (1998, 1999).

References

Ahrenberg, L. and Merkel, M. 1996. “On translation corpora and translation support tools: A project report”. In Aijmer et al. (eds), 183–200.
Aijmer, K. 1996. “Swedish modal particles in a contrastive perspective”. Language Sciences 18: 393–427.
Aijmer, K. 1998. “Epistemic predicates in contrast”. In Johansson and Oksefjell (eds), 277–295.
Aijmer, K. 1999. “Epistemic possibility in an English-Swedish perspective”. In Hasselgård and Oksefjell (eds), 301–326.
Aijmer, K., Altenberg, B. and Johansson, M. (eds). 1996. Languages in Contrast. Papers from a Symposium on Text-based Cross-linguistic Studies. Lund: Lund University Press.
Aijmer, K., Altenberg, B. and Johansson, M. 1996. “Text-based contrastive studies in English. Presentation of a project”. In Aijmer et al. (eds), 73–85.
Altenberg, B. 1999. “Adverbial connectors in English and Swedish: Semantic and lexical correspondences”. In Hasselgård and Oksefjell (eds), 249–268.
Altenberg, B. and Aijmer, K. 2000. “The English-Swedish Parallel Corpus: A resource for contrastive research and translation studies”. In Corpus Linguistics and Linguistic Theory, C. Mair and M. Hundt (eds), 15–33. Amsterdam and Atlanta: Rodopi.
Atkins, B. T. S. 1994. “A corpus-based dictionary”. In The Oxford-Hachette French Dictionary, xix–xxvi. Oxford and Paris: Oxford University Press/Hachette Livre.
Atkins, B. T. S., Levin, B. and Zampolli, A. 1994. “Computational approaches to the lexicon: an overview”. In Computational Approaches to the Lexicon, B. T. Sue Atkins and A. Zampolli (eds), 17–45. Oxford and New York: Oxford University Press.
Bahns, J. 1993. “Lexical collocations: a contrastive view”. ELT Journal 47: 56–63.
Baker, M. 1993. “Corpus linguistics and translation studies: Implications and applications”. In Baker et al. (eds), 233–250.
Baker, M. 1995. “Corpora in translation studies – An overview and some suggestions for future research”. Target 7: 223–243.
Baker, M. 1999. “The role of corpora in investigating the linguistic behaviour of professional translators”. International Journal of Corpus Linguistics 4: 281–298.
Baker, M., Francis, G. and Tognini Bonelli, E. (eds). 1993. Text and Technology. In Honour of John Sinclair. Amsterdam and Philadelphia: Benjamins.
Bally, Ch. 1950. Linguistique générale et linguistique française. 3rd ed. Berne: A. Francke.
Barlow, M. 1995. “ParaConc: a concordancer for parallel texts”. Computers and Texts 10: 14–16.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Longman.
Bläser, B. 1995. “TransLexis: An integrated environment for lexicon and terminology management”. In Steffens (ed.), 159–173.
Botley, S. P., McEnery, A. M. and Wilson, A. (eds). 2000. Multilingual Corpora in Teaching and Research. Amsterdam and Atlanta: Rodopi.
Bresnan, J. (ed.). 1982. The Mental Representation of Grammatical Relations. Cambridge, Mass.: MIT Press.
Brown, P. F., Cocke, J., Della Pietra, S. A., Della Pietra, V. J., Jelinek, F., Lafferty, J. D., Mercer, R. L. and Roossin, P. S. 1990. “A statistical approach to machine translation”. Computational Linguistics 16: 79–85.
Brown, P., Lai, J. and Mercer, R. 1991. “Aligning sentences in parallel corpora”. In Proceedings of 29th Annual Meeting of the Association for Computational Linguistics (Morristown, NJ), 169–176.
Butler, C. S. 1998. “Enriching the Functional Grammar lexicon”. In The Structure of the Lexicon in Functional Grammar, H. Olbertz, K. Hengeveld and J. Sánchez García (eds), 171–194. Amsterdam and Philadelphia: Benjamins.
Butler, C. S. 1999. “Some contributions of corpus linguistics to the functional lexematic model”. In Estudios funcionales sobre léxico, sintaxis y traducción. Un homenaje a Leocadio Martín Mingorance, M.-J. Feu Guijarro and S. Molina Plaza (eds), 19–37. Cuenca: Universidad de Castilla La Mancha.
Calzolari, N. 1996. “Lexicon and corpus: a multi-faceted interaction”. In Gellerstam et al. (eds), 3–16.
Chesterman, A. 1998. Contrastive Functional Analysis. Amsterdam: Benjamins.
Church, K. W. and Gale, W. A. 1991. “Concordances for parallel texts”. In Proceedings of the Seventh Annual Conference of the UW Centre for the New OED and Text Research (Oxford), 40–62. Oxford: Oxford University Press and Waterloo, Ontario: UW Centre for the New OED and Text Research.
Church, K. W. and Hanks, P. 1989. “Word association norms, mutual information and lexicography”. Proceedings of the 27th Annual Meeting of ACL (Vancouver, B.C.), 76–83.
Clear, J. 1993. “From Firth principles. Computational tools for the study of collocation”. In Baker et al. (eds), 271–292.


Cowie, A. P. (ed.). 1998. Phraseology. Theory, Analysis, and Applications. Oxford: Clarendon Press.
Cruse, D. A. 1983. Review of A. Wierzbicka, “Lingua mentalis: The semantics of natural language”. Journal of Linguistics 19: 265–272.
Cruse, D. A. 1986. Lexical Semantics. Cambridge: Cambridge University Press.
Devos, F. 1996. “Contrastive verb valency: Overview, criteria, methodology and applications”. In Simon-Vandenbergen et al. (eds), 15–81.
Di Pietro, R. J. 1971. Language Structures in Contrast. Rowley, Mass.: Newbury House.
Dickens, A. and Salkie, R. 1996. “Comparing bilingual dictionaries with a parallel corpus”. In Gellerstam et al. (eds), 551–559.
Dik, S. C. 1978. Functional Grammar. Amsterdam: North-Holland.
Dik, S. C. 1991. “Functional grammar”. In Linguistic Theory and Grammatical Description, F. Droste and J. Joseph (eds), 247–274. Amsterdam and Philadelphia: Benjamins.
Di Sciullo, A.-M. and Williams, E. 1987. On the Definition of Word. Cambridge, Mass.: MIT Press.
Dyvik, H. 1998. “A translational basis for semantics”. In Johansson and Oksefjell (eds), 51–86.
Ebeling, J. 1998. “The Translation Corpus Explorer: A browser for parallel texts”. In Johansson and Oksefjell (eds), 101–112.
Ebeling, J. 1999. Presentative Constructions in English and Norwegian. A Corpus-based Contrastive Study. Oslo: Unipub forlag.
Erjavec, T., Lawson, A. and Romary, L. (eds). 1998. East Meets West: A Compendium of Multilingual Resources. Mannheim: Institut für Deutsche Sprache/TELRI Association e.V.
Faber, P. B. and Mairal Usón, R. 1999. Constructing a Lexicon of English Verbs. Berlin and New York: Mouton de Gruyter.
Fabricius-Hansen, C. 1999. “Bei dieser Gelegenheit – on this occasion – ved denne anledningen. German bei – a puzzle in a translational perspective”. In Hasselgård and Oksefjell (eds), 231–248.
Filipovic, R. 1969. “The choice of the corpus for the contrastive analysis of Serbo-Croatian and English”. The Yugoslav Serbo-Croatian-English Contrastive Project, B. Studies 1, 37–46. Institute of Linguistics, University of Zagreb.
Filipovic, R. 1971. “The Yugoslav Serbo-Croatian-English Contrastive Project”. In Papers in Contrastive Linguistics, G. Nickel (ed.), 107–114. Cambridge: Cambridge University Press.
Fillmore, C. J. 1985. “Frames and the semantics of understanding”. Quaderni di Semantica 6: 222–254.
Fillmore, C. J. 1988. “The mechanisms of ‘Construction Grammar’”. Proceedings of the 14th Annual Meeting of the Berkeley Linguistic Society, 35–55. Berkeley: University of California.
Fillmore, C. J. and Atkins, B. T. 1992. “Toward a frame-based lexicon: The semantics of RISK and its neighbors”. In Lehrer and Kittay (eds), 75–102.
Firth, J. R. 1957. “A synopsis of linguistic theory 1930–1955”. Studies in Linguistic Analysis (special volume of the Philological Society), Oxford, 1–32.
Fisiak, J. 1981. “Some introductory notes concerning contrastive linguistics”. In Contrastive Linguistics and the Language Teacher, J. Fisiak (ed.), 1–11. Oxford: Pergamon.
Francis, G. 1993. “A corpus-driven approach to grammar”. In Baker et al. (eds), 137–156.






Francis, W. N. and Kučera, H. 1979. Manual of Information to Accompany a Standard Sample of Present-Day Edited American English, for Use with Digital Computers. Department of Linguistics, Brown University, Providence, RI.
Gale, W. and Church, K. W. 1991. “A program for aligning sentences in bilingual corpora”. In Proceedings of 29th Annual Meeting of the Association for Computational Linguistics (Morristown, NJ), 177–184.
Gazdar, G., Klein, E., Pullum, G. and Sag, I. 1985. Generalized Phrase Structure Grammar. Cambridge, Mass.: Harvard University Press.
Gellerstam, M. 1986. “Translationese in Swedish novels translated from English”. In Translation Studies in Scandinavia, L. Wollin and H. Lindquist (eds), 88–95. Lund: CWK Gleerup.
Gellerstam, M. 1996. “Translations as a source for cross-linguistic studies”. In Aijmer et al. (eds), 53–62.
Gellerstam, M., Järborg, J., Malmgren, S-G., Norén, K., Rogström, L. and Röjder Papmehl, C. (eds). 1996. Euralex ’96 Proceedings I–II. Papers submitted to the Seventh EURALEX International Congress on Lexicography in Göteborg, Sweden. Göteborg: Department of Swedish, University of Göteborg.
Goldberg, A. 1995. A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press.
Granger, S. 1996. “From CA to CIA and back: an integrated approach to computerized bilingual and learner corpora”. In Aijmer et al. (eds), 37–51.
Granger, S. 1998. “Prefabricated patterns in advanced EFL writing: collocations and formulae”. In Cowie (ed.), 145–160.
Grefenstette, G., Heid, U., Schultze, B. M., Fontenelle, T. and Gerardy, C. 1996. “The DECIDE project: Multilingual collocation extraction”. In Gellerstam et al. (eds), 93–107.
Guillemin-Flescher, J. 1981. Syntaxe comparée du français et de l’anglais. Problèmes de traduction. Paris: Ophrys.
Halliday, M. A. K. 1966. “Lexis as a linguistic level”. In In Memory of J. R. Firth, C. E. Bazell, J. C. Catford, M. A. K. Halliday and R. H. Robins (eds), 148–162. London: Longmans.
Halliday, M. A. K. 1994. An Introduction to Functional Grammar. 2nd ed. London: Edward Arnold.
Hartmann, R. R. K. 1996. “Contrastive textology and corpus linguistics: On the value of parallel texts”. Language Sciences 18: 947–957.
Hasselgård, H. and Oksefjell, S. (eds). 1999. Out of Corpora. Studies in Honour of Stig Johansson. Amsterdam and Atlanta: Rodopi.
Heid, U. 1995. “Relating parallel monolingual lexicon fragments for translation purposes”. In Steffens (ed.), 231–251.
Heid, U. 1996. “Creating a multilingual data collection for bilingual lexicography from parallel monolingual lexicons”. In Gellerstam et al. (eds), 573–590.
Heyn, M. 1998. “Translation memories: insights and prospects”. In Unity in Diversity. Current Trends in Translation Studies, L. Bowker, M. Cronin, D. Kenny and J. Pearson (eds), 123–136. Manchester: St. Jerome Publishing.
Hofland, K. 1996. “A program for aligning English and Norwegian sentences”. In Research in Humanities Computing, S. Hockey, N. Ide and G. Perissinotto (eds), 165–178. Oxford: Oxford University Press.


Hofland, K. and Johansson, S. 1998. “The Translation Corpus Aligner: A program for automatic alignment of parallel texts”. In Johansson and Oksefjell (eds), 87–100.
Howarth, P. A. 1996. Phraseology in English Academic Writing. Some Implications for Language Learning and Dictionary Making. Tübingen: Niemeyer.
Howarth, P. A. 1998. “The phraseology of learners’ academic writing”. In Cowie (ed.), 161–186.
Hudson, R. 1984. Word Grammar. Oxford: Blackwell.
Ide, N., Greenstein, D. and Vossen, P. (eds). 1998. Special issue on EuroWordNet. Computers and the Humanities 32 (2–3).
Ide, N. and Véronis, J. 1995. “Knowledge extraction from machine-readable dictionaries: An evaluation”. In Steffens (ed.), 19–34.
Isabelle, P., Dymetman, M., Foster, G., Jutras, J-M., Macklovitch, E., Perrault, F., Ren, X. and Simard, M. 1993. “Translation analysis and translation automation”. Proceedings of the Fifth International Conference on Theoretical and Methodological Issues in Machine Translation (TMI’93), Kyoto, 201–217.
Ivir, V. 1983. “A translation-based model of contrastive analysis”. Jyväskylä Cross-Language Studies 9: 171–178.
Ivir, V. 1987. “Functionalism in contrastive analysis and translation studies”. In Functionalism in Linguistics, R. Dirven and V. Fried (eds), 471–481. Amsterdam and Philadelphia: Benjamins.
James, C. 1980. Contrastive Analysis. London: Longman.
Johansson, S. 1997. “Using the English-Norwegian Parallel Corpus – a corpus for contrastive analysis and translation studies”. In Lewandowska-Tomaszczyk and Melia (eds), 282–296.
Johansson, S. 1998. “On the role of corpora in cross-linguistic research”. In Johansson and Oksefjell (eds), 1–24.
Johansson, S. and Løken, B. 1997. “Some Norwegian discourse particles and their English correspondences”. In Sounds, Structures and Senses. Essays Presented to Niels Davidsen-Nielsen on the Occasion of his Sixtieth Birthday, C. Bache and A. Klinge (eds), 149–170. Odense: Odense University Press.
Johansson, S. and Oksefjell, S. (eds). 1998. Corpora and Cross-linguistic Research. Amsterdam and Atlanta: Rodopi.
Kay, M. and Röscheisen, M. 1993. “Text-translation alignment”. Computational Linguistics 19: 121–142.
Kittay, E. F. 1987. Metaphor: Its Cognitive Force and Linguistic Structure. Oxford: Clarendon Press.
Kittay, E. F. and Lehrer, A. 1992. “Introduction”. In Lehrer and Kittay (eds), 1–18.
Krzeszowski, T. P. 1990. Contrasting Languages. Berlin: Mouton de Gruyter.
Langacker, R. W. 1987. Foundations of Cognitive Grammar, Vol. I: Theoretical Prerequisites. Stanford: Stanford University Press.
Langacker, R. W. 1991. Foundations of Cognitive Grammar, Vol. II. Stanford: Stanford University Press.
Lehrer, A. 1974. Semantic Fields and Lexical Structure. Amsterdam: North-Holland.
Lehrer, A. and Kittay, E. F. (eds). 1992. Frames, Fields, and Contrasts. New Essays in Semantic and Lexical Organization. Hillsdale, N.J.: Lawrence Erlbaum.






Levin, B. 1993. English Verb Classes and Alternations: A Preliminary Investigation. Chicago: University of Chicago Press.
Lewandowska-Tomaszczyk, B. and Melia, P. J. (eds). 1997. Practical Applications in Language Corpora. Lodz: Lodz University.
Mauranen, A. 1999. “Form and sense relations as seen through parallel corpora”. Paper presented at the Third European TELRI Seminar on Translation Equivalence – Theory and Practice (Montecatini, 1997). Mannheim: TELRI. http://solaris.idsmannheim.de/telri/proceedings/MAURANE.html
Merkel, M. 1999. Understanding and Enhancing Translation by Parallel Text Processing. Department of Computer and Information Science, University of Linköping.
Miller, G. A. and Johnson-Laird, P. N. 1976. Language and Perception. Cambridge, Mass.: Harvard University Press.
Miller, G. A. and Fellbaum, C. 1991. “Semantic networks in English”. Cognition 41: 197–229.
Moon, R. 1996. “Data, description, and idioms in corpus lexicography”. In Gellerstam et al. (eds), 245–256.
Nuyts, J. 1997. “How do you think?” In A Fund of Ideas: Recent Developments in Functional Grammar, C. S. Butler, J. H. Connolly, R. A. Gatward and R. M. Vismans (eds), 3–18. Amsterdam: IFOTT.
Oakes, M. and McEnery, T. 2000. “Bilingual text alignment – an overview”. In Botley et al. (eds), 1–37.
Ostler, N. 1995. “Perception vocabulary in five languages – towards an analysis using frame elements”. In Steffens (ed.), 219–230.
Paulussen, H. 1999. A Corpus-based Contrastive Analysis of English on/up, Dutch op and French sur within a Cognitive Framework. Ph.D. dissertation, Faculty of Letters and Philosophy, University of Gent.
Persson, G. 1993. “Think in a panchronic perspective”. Studia Neophilologica 65: 3–18.
Peters, C. 1996. “From parallel to comparable text corpora”. In Gellerstam et al. (eds), 173–180.
Pollard, C. and Sag, I. 1994. Head-Driven Phrase Structure Grammar. Chicago: University of Chicago Press.
Rappaport, M. and Levin, B. 1988. “What to do with theta-roles”. In Syntax and Semantics 21: Thematic Relations, W. Wilkins (ed.), 7–36. New York: Academic Press.
Ridings, D. 1998. “PEDANT: Parallel texts in Göteborg”. Lexikos 8: 243–268.
Ringbom, H. 1994. “Contrastive analysis”. In The Encyclopedia of Language and Linguistics, R. E. Asher and J. M. Y. Simpson (eds), 737–742. Oxford: Pergamon Press.
Roberts, R. P. and Montgomery, C. 1996. “The use of corpora in bilingual lexicography”. In Gellerstam et al. (eds), 457–464.
Roos, E. 1976. “Contrastive collocational analysis”. Papers and Studies in Contrastive Linguistics 5: 65–75.
Rosch, E. 1975. “Cognitive representations of semantic categories”. Journal of Experimental Psychology 104: 192–233.
Sajavaara, K. 1996. “New challenges for contrastive linguistics”. In Aijmer et al. (eds), 17–36.
Salkie, R. 1997. “Naturalness and contrastive linguistics”. In Lewandowska-Tomaszczyk and Melia (eds), 297–312.

Introduction

Schmied, J. 1998. “Differences and similarities of close cognates: English with and German mit”. In Johansson and Oksefjell (eds), 255–275. Schmied, J. and Schäffler, H. 1996. “Approaching translationese through parallel and translation corpora”. In Synchronic Corpus Linguistics. Papers from the Sixteenth International Conference on English Language Research on Computerized Corpora (ICAME 16), C. E. Percy, C.F. Meyer and I. Lancashire (eds), 41–56. Amsterdam and Atlanta: Rodopi. Schwarze, C. (ed.). 1985. Beiträge zu einem kontrastiven Wortfeldlexikon Deutsch — Französisch. Tübingen: Gunter Narr. Simard, M., Foster, G., Hannan, M-L., Macklovitch, E. and Plamondon, P. 2000. “Bilingual text alignment: where do we draw the line?” In Botley et al. (eds), 38–64. Simard, M., Foster, G. F. and Isabelle, P. 1992. “Using cognates to align sentences in bilingual corpora”. In Proceedings of the Fourth International Conference on Theoretical and Methodological Issues in Machine Translation (TMI92) (Montreal), 67–81. Simard, M., Foster, G. F. and Perrault, F. 1993. “TransSearch: un concordancier bilingue”. Centre d’innovation en technologies de l’information. Laval: Canada. Simon-Vandenbergen, A-M. 1998. “I think and its Dutch equivalents in parliamentary debates”. In Johansson and Oksefjell (eds), 297–317. Simon-Vandenbergen, A-M., Taeldeman, T. and Willems, D. 1996. “Introducing CONTRAGRAM or why we need contrastive verb valency”. In Simon-Vandenbergen et al. (eds), 7–13. Simon-Vandenbergen, A-M., Taeldeman, J. and Willems, D. (eds). 1996. Aspects of Contrastive Verb Valency. Studia Germanica Gandensia 40, University of Gent. Sinclair, J. 1985. “Lexicographic evidence”. In Dictionaries, Lexicography and Language Learning, R. Ilson (ed.), 81–92. Oxford: Pergamon. Sinclair J. 1987a. “Collocation: a progress report”. In Language Topics. Essays in Honour of Michael Halliday, R. Steele and T. Threadgold (eds), 319–331. Amsterdam and Philadelphia: Benjamins. Sinclair, J. (ed.). 
1987b. Looking Up: An Account of the COBUILD Project in Lexical Computing and the Development of the Collins COBUILD English Language Dictionary. London: HarperCollins. Sinclair J. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press. Sinclair, J. 1996a. “An international project in multilingual lexicography”. In Sinclair et al. (eds), 179–196. Sinclair, J. 1996b. “Cross-language semantic links. Data-driven multilingual lexicons”. Unpublished report from Workshop on Multilingual Lexical Semantics, 19–21 June 1998. Institut für deutsche Sprache/The Tuscan Word Centre/PAROLE German Subconsortium. Sinclair, J. 1998. “The lexical item”. In Weigand (ed.), 1–24. Sinclair, J., Payne, J. and Pérez Hernández, C. (eds). 1996. Corpus to corpus: A study of translation equivalence. Special issue of International Journal of Lexicography 9: 179–276. Steffens, P. 1995. “Introduction”. In Steffens (ed.), 1–15. Steffens, P. (ed.). 1995. Machine Translation and the Lexicon. Third International EAMT Workshop, Heidelberg, April 26–28 1993. Berlin/New York: Springer. Svensén, B. 1993. Practical Lexicography. Principles and Methods of Dictionary-Making. Oxford: Oxford University Press.





Bengt Altenberg and Sylviane Granger

Talmy, L. 1985. “Lexicalization patterns: semantic structures in lexical forms”. In Language Typology and Syntactic Description, Vol. 3, T. Shopen (ed.), 57–149. Cambridge: Cambridge University Press. Taylor, J. 1995. Linguistic Categorization. Prototypes in Linguistic Theory. 2nd ed. Oxford: Clarendon Press. Teubert, W. 1996. “Comparable or parallel corpora?” In Sinclair et al. (eds), 238–264. Teubert, W. (ed.). 1998. Workshop on Multilingual Lexical Semantics. Mannheim: Institut für deutsche Sprache/The Tuscan Word Centre. (http://solaris3.ids-mannheim.de/workshop.html) Teubert, W., Tognini Bonelli, E. and Volz, N. (eds). 1998. Proceedings of the Third European Seminar ‘Translation Equivalence’, Montecatini Terme, Italy, October 16–18, 1997. Mannheim/The Tuscan Word Centre: The TELRI Association e. V. Tognini Bonelli, E. 1996. “Towards translation equivalence from a corpus linguistic perspective”. In Sinclair et al. (eds), 197–217. Tognini Bonelli E. 2001. Corpus Linguistics at Work. Amsterdam & Philadelphia: Benjamins. Van Valin, R. D. 1993. “A synopsis of Role and Reference Grammar”. In Advances in Role and Reference Grammar, R. D. Van Valin (ed.), 1–164. Amsterdam and Philadelphia: Benjamins. Van Valin, R. D. and LaPolla, R. J. 1997. Syntax: Structure, Meaning and Function. Cambridge: Cambridge University Press. Véronis, J. (ed.). 2000. Parallel Text Processing. Alignment and Use of Translation Corpora. Berlin: Kluwer Academic Publishers. Viberg, Å. 1993. “Crosslinguistic perspectives on lexical organization and lexical progression”. Progression and Regression in Language. Sociocultural, Neuropsychological and Linguistic Perspectives, K. Hyltenstam and Å. Viberg (eds), 340–385. Cambridge: Cambridge University Press. Viberg, Å. 1996a. “Cross-linguistic lexicology. The case of English go and Swedish gå”. In Aijmer et al. (eds), 151–182. Viberg, Å. 1996b. “The meanings of Swedish dra ‘pull’: a case study of lexical polysemy”. In Gellerstam et al. (eds), 293–308. 
Viberg, Å. 1998. “Contrasts in polysemy and differentiation. Running and putting in English and Swedish”. In Johansson and Oksefjell (eds), 343–376. Viberg, Å. 1999. “Polysemy and differentiation in the lexicon. Verbs of physical contact in Swedish”. Cognitive Semantics. Meaning and Cognition, J. Allwood and P. Gärdenfors (eds), 87–129. Amsterdam: Benjamins. Vinay J. P. and Darbelnet, J. 1969. Stylistique comparée du français et de l’anglais. Paris: Didier. Vossen, P. (ed.). 1998. EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Dordrecht: Kluwer Academic. Wandruszka, M. 1969. Sprachen. Vergleichbar und unvergleichlich. München: Piper. Weigand, E. 1998a. “Contrastive lexical semantics”. In Weigand (ed.), 25–44. Weigand, E. (ed.). 1998b. Contrastive Lexical Semantics. Amsterdam and Philadelphia: Benjamins. Wools, D. 1998. Multiconcord. Birmingham: CFL Software Development.

P II

Cross-Linguistic Equivalence

Two types of translation equivalence*

Raphael Salkie

.

Introduction

Translation equivalence is an elusive notion which has been debated vigorously in the literature. If a source text and target text diverge in some way, we need to set up two levels of analysis so that they are different on one level but equivalent on the other. The difficult challenge is to define these levels rigorously: as Gutt (1991) rightly points out, a popular solution which distinguishes the ‘meaning’ of a text from its ‘communicative effect’ is flawed because the former notion is vague and the latter is untestable. This paper argues that translation corpora offer a new perspective on this old issue.

One of the advantages of corpora is that they reveal patterns which would be difficult to find otherwise. With a monolingual corpus, the patterns usually involve a phenomenon occurring more frequently than expected. The phenomenon in question is typically one of these:

– linguistic items such as words and phrases

– ‘association patterns’ (Biber et al. 1998: 5–8) between items

If a word occurs more frequently in a corpus than we would expect, then that is information in the corpus which is of interest to researchers.1 Similarly, if two items occur together more frequently than we would expect, we have found a regular association pattern which calls for explanation.2 This information was not deliberately entered into the corpus by the writers of the texts. What happens is that language users make a series of unconscious choices which the corpus incorporates and which the analyst can find using frequency counts — assuming that the corpus is large enough to yield patterns which are statistically significant.
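The idea of two items occurring together ‘more frequently than we would expect’ can be illustrated with a minimal sketch. It assumes a simple independence baseline: the expected number of co-occurrences of an item with a word w is the frequency of w multiplied by the item's share of the corpus. The function names and all the counts below are hypothetical illustrations, not figures from any corpus discussed in this paper.

```python
def expected_cooccurrences(item_freq: int, corpus_size: int, w_freq: int) -> float:
    """Co-occurrences of the item with w expected if the two were independent."""
    return w_freq * item_freq / corpus_size


def association_ratio(observed: int, item_freq: int, corpus_size: int, w_freq: int) -> float:
    """Observed co-occurrences divided by the expected number (1.0 = as expected)."""
    return observed / expected_cooccurrences(item_freq, corpus_size, w_freq)


# Hypothetical toy counts: the item makes up one word in every 12.
CORPUS_SIZE = 1_200_000
ITEM_FREQ = 100_000   # 1_200_000 / 12
W_FREQ = 240          # occurrences of some word w
OBSERVED = 60         # times the item is found together with w

expected = expected_cooccurrences(ITEM_FREQ, CORPUS_SIZE, W_FREQ)
ratio = association_ratio(OBSERVED, ITEM_FREQ, CORPUS_SIZE, W_FREQ)
print(f"expected {expected:.1f}, observed {OBSERVED}, ratio {ratio:.1f}")
```

A ratio well above 1 only flags a candidate association pattern. Whether it is statistically significant would require a proper test, which this sketch does not attempt.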




With a translation corpus, the patterns that are revealed involve correspondences between words and expressions in different languages. Unlike monolingual corpora, the interesting cases with translations tend to be those where the correspondence is less frequent than anticipated. For example, if an English word in our corpus is translated by the ‘expected’ French word less frequently than we would anticipate, and other translations are used instead, then we have found a puzzle which needs an explanation. With translation corpora our expectations are based on our linguistic competence as bilinguals. For more thorough statements of correspondence we turn to a good bilingual dictionary, though we often find that the corpus contains correspondences which are not mentioned in dictionaries, as we shall see below. This paper looks at two examples of unexpected correspondences that were found in a translation corpus. The patterns that came to light were different, and we examine why this might be. We also consider the implications for lexicographers, translators and contrastive linguists. We made use of the INTERSECT corpus (Salkie 1997, 2000), consisting of about 1.5 million words in French and English, and about 800,000 words in German and English. Details of the corpus are given in Appendix 1.

2. The data

We looked at two words in the corpus: the German word kaum and its equivalents in English (Appendix 2), and the English word contain with its counterparts in French (Appendix 3).3 For kaum the dictionaries lead us to expect English translations using hardly, along with the less common alternatives scarcely or barely. The corpus produced 61 instances, 38 of them in the fiction texts and 23 in non-fiction. The fiction examples contain one of the expected equivalents 32 times:

hardly 18
scarcely 10
barely 4

The other correspondences were a negative expression (13–14 in Appendix 2), almost + negative (15–16), but (17–18), and three instances of time expressions using as soon as or upon (19–24). As the fiction is mostly from the 18th or 19th century, these findings are not surprising, since scarcely and barely perhaps have an old-fashioned feel to them in contemporary English.


In the non-fiction examples, on the other hand, the expected equivalents are far less frequent:

hardly 5
scarce 1

(The scarce example is from the Communist Manifesto, another 19th century text.) The remaining 17 cases cluster as follows:

negative expression 5
little 4
almost + negative 3
hard/difficult/impossible 3
less and less 1
largely + negative 1

Thus only a quarter (26%) of the non-fiction examples are ‘expected’. Note, however, that the ‘unexpected’ translations are not random: they fall into four groups with only two singleton translations (of which less and less is closely akin to little, and largely + negative is similar to almost + negative). A set of patterns thus emerges from the non-fiction equivalents of kaum, though it is not the single pattern that the dictionary leads us to expect, and which we find in the fiction equivalents. Compare this with the results for contain and its French counterparts. A total of 295 examples were found, of which 171 (58%) were translated by the expected contenir. In the remaining 124 examples only two significant groupings emerged: 32 examples using figurer + preposition, and 9 using publier (all the latter were in United Nations documents, which suggests a house style). Some of the other renderings are given in Appendix 3: they contain a large number of singleton equivalents, including about 20 where there is no apparent expression corresponding to contain in the other language (41–8 are a small sample).
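The proportions quoted above can be checked with a short tally. This is only a sketch: the counts are those reported in the text, the category labels are shorthand, and the 83 contain examples that the text does not break down are lumped under an "other" label introduced here for illustration.

```python
from collections import Counter

# Equivalents the dictionaries lead us to expect (scarce is counted with them, as in the text).
EXPECTED_KAUM = {"hardly", "scarcely", "barely", "scarce"}
EXPECTED_CONTAIN = {"contenir"}

kaum_fiction = Counter({
    "hardly": 18, "scarcely": 10, "barely": 4,
    "negative expression": 1, "almost + negative": 1, "but": 1,
    "time expression": 3,
})
kaum_nonfiction = Counter({
    "hardly": 5, "scarce": 1,
    "negative expression": 5, "little": 4, "almost + negative": 3,
    "hard/difficult/impossible": 3, "less and less": 1, "largely + negative": 1,
})
# "other" lumps together the renderings that the text does not itemise.
contain_french = Counter({
    "contenir": 171, "figurer + preposition": 32, "publier": 9, "other": 83,
})


def expected_share(counts: Counter, expected: set) -> float:
    """Proportion of examples translated by one of the expected equivalents."""
    total = sum(counts.values())
    hits = sum(n for label, n in counts.items() if label in expected)
    return hits / total


for name, counts, expected_set in [
    ("kaum, fiction", kaum_fiction, EXPECTED_KAUM),
    ("kaum, non-fiction", kaum_nonfiction, EXPECTED_KAUM),
    ("contain", contain_french, EXPECTED_CONTAIN),
]:
    total = sum(counts.values())
    print(f"{name}: {total} examples, {expected_share(counts, expected_set):.0%} expected")
```

The three shares come out at roughly 84%, 26% and 58%, which matches the figures given for the non-fiction kaum examples and for contain.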

3. Labelling the types of equivalence

We thus need to distinguish a case like kaum → English, where the expected translation equivalent (at least in the non-fiction texts) is relatively rare, from a case like contain → French, where the expected equivalent may be relatively common but where there are many unique equivalents. One possible pair of labels would be to call kaum translationally ambiguous into English, but contain translationally vague into French. These labels capture one important part of the distinction: a translator who has to translate kaum into English is in a similar situation to a linguist who wants to state the meanings of an ambiguous word. The linguist is faced with multiple senses, and the translator has to consider multiple equivalents. A translator who is required to translate contain into French is like a linguist confronted by a vague word: in the monolingual case, just listing all the possible interpretations of the word that have been found so far in different contexts misses the point that it is the contexts rather than the word which are doing most of the work. In the translation situation, simply listing all the translation equivalents found so far misses the point that it is the translator who is doing most of the work by creating a new solution each time.

I would argue, however, that these labels are not suitable because they suggest that the translational behaviour of kaum and contain is directly linked to their semantics: an ambiguous word might be thought to be translationally ambiguous into any L2, and a vague word might be thought to be translationally vague into any L2. These suggestions are not correct: the English word hard, for example, is ambiguous between the senses ‘not easy’ and ‘not soft’. Whether hard is also translationally ambiguous depends on the L2: it is not translationally ambiguous into French, where dur has both senses, but it is translationally ambiguous into German, where schwer / schwierig correspond to the first sense and hart to the second. Similar counterexamples can easily be constructed for vagueness and translational vagueness.

A better pair of labels emerges if we focus instead on the demands that kaum and contain make on translators.
For a word like kaum, the strategy that skilled translators seem to have adopted can be stated informally like this: Don’t choose the expected equivalent, because the corpus shows that it is not very common. Instead, look at a wider range of equivalents in a bilingual dictionary based on a translation corpus.4

For a word like contain, on the other hand, the implicit strategy looks like this: Use the expected equivalent if you can. If you can’t, don’t bother to look at a wider range of equivalents in a bilingual dictionary, because they are unlikely to work in your specific context. Instead, invent a new equivalent of your own.

The two strategies are near the extreme ends of a spectrum of translation strategies. At one extreme are items which are always translated the same way: an example might be television → French. At the other end is the unlikely but logically possible case of items which have different equivalents each time they occur.5 Items at the former extreme can be called translationally systematic, while items with unpredictable equivalents can be called translationally unsystematic. Thus kaum → English is closer to the systematic end of this spectrum, while contain → French is nearer the unsystematic end. (The difference between the kaum and television cases is that television → French is simple while kaum → English is complex.) To translate kaum successfully you need a good (= exhaustive) dictionary. To find an equivalent for contain a dictionary will help up to a point, but not as much as a good (= creative) translator.
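One rough way to place an item on this spectrum is to ask what share of its examples is covered by equivalents that recur. The indicator below is an assumption of this sketch, not a measure proposed in this paper, and the function name is hypothetical. The counts are those reported in the data section. For contain only the three groupings named there are known, so the figure is a lower bound: some of the remaining 83 renderings may recur as well.

```python
def recurrent_coverage(groups: dict, total: int, min_count: int = 2) -> float:
    """Share of all examples covered by equivalents occurring at least min_count times."""
    covered = sum(n for n in groups.values() if n >= min_count)
    return covered / total


# Non-fiction equivalents of kaum, as counted in the data section (23 examples).
kaum_nonfiction = {
    "hardly": 5, "scarce": 1,
    "negative expression": 5, "little": 4, "almost + negative": 3,
    "hard/difficult/impossible": 3, "less and less": 1, "largely + negative": 1,
}

# Only the groupings named in the text; the other 83 of the 295 examples are not itemised.
contain_named = {"contenir": 171, "figurer + preposition": 32, "publier": 9}

kaum_cov = recurrent_coverage(kaum_nonfiction, total=23)
contain_cov = recurrent_coverage(contain_named, total=295)
print(f"kaum (non-fiction): {kaum_cov:.0%} covered by recurrent equivalents")
print(f"contain: at least {contain_cov:.0%} covered by recurrent equivalents")
```

On these figures kaum sits nearer the systematic end than contain, in line with the discussion above. A single number of this kind is crude, though, and it depends on how equivalents are grouped and on the size of the corpus.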

4. Reasons for the two types of equivalence

In the case of kaum we seem to have learned something about the linguistic systems of English and German: information that could in principle be captured in an enriched bilingual dictionary which used corpus findings to capture lexical information about these two systems. With contain the emphasis is on unique creative solutions by translators, and this takes us away from the underlying systems and firmly into textual practice. It appears, then, that translational systematicity is best regarded as a relation between two linguistic systems, whereas translational unsystematicity is a relation between textual practice in two languages. There are, however, some reasons for doubt.

Firstly, the distinction between translationally systematic and unsystematic items is to some extent an artefact of the size of the corpus. If we had a far larger translation corpus, some of the ‘unique’ solutions in the contain cases would no doubt occur more than once, in which case we could argue that consistent tendencies had emerged which could in principle be recorded in a bilingual dictionary. When we call an item translationally unsystematic, is that just a way of saying that we haven’t looked hard enough yet to find the system?

Secondly, the distinction between ‘the underlying linguistic system’ and ‘textual practice’ is highly problematic when we look at translations. Since Saussure distinguished langue from parole it has been a fundamental principle of linguistics that these two domains are distinct. When a translation corpus reveals systematic differences in the textual practice of two languages, however, the question arises of where these differences are located. Does the textual difference that we have proposed between English and French reflect a difference in the underlying systems of the two languages? Either possible answer to this question leads to difficult problems. If the answer is yes, then we are committed to the principle that the underlying linguistic system of a language can include frequency rules which determine how often the resources in that system are used. Many linguists would feel uncomfortable with such a principle. If the answer is no, on the other hand, we have to find an alternative way to account for differences in textual practice between languages. It is not clear what this alternative way might be.

Thirdly, the differences between the kaum case and the contain case are perhaps not as clear-cut as we have claimed so far. The two words occupy very different places in the linguistic systems of their respective languages. Kaum is a degree modifier, and so is its expected equivalent hardly, but what they can modify is not the same. This is not surprising: items in closed grammatical classes normally behave differently across languages. Contain, on the other hand, is a verb that enters into relations of hyponymy with other expressions: at the more general level we have the verb have (cf. example (34) in Appendix 3, where the French uses avoir), and at the more specific level there are cases where ‘X contains Y’ can be specified as ‘Y is published in X’ (cf. example (18) in Appendix 3):

(33) Each document can [[contain]] one header, or one footer, or both.

(34) Un document peut avoir un en-tête ou un pied de page ou les deux à la fois. [INSTRS\INSTR]

(17) Mr. LABERGE (Canada), on behalf of the sponsors, who had been joined by Pakistan and Thailand, introduced the draft resolution [[contained]] in document A/C.3/43/L.77 entitled “Human rights and mass exoduses”.

(18) M. LABERGE (Canada) présente au nom des auteurs, auxquels se sont joints le Pakistan et la Thaïlande, le projet de résolution publié sous la cote A/C.3/43/L.77, intitulé “Droits de l’homme et exodes massifs”. [INTORGS\UN2]

These differences between kaum and contain clearly influence the type and amount of creativity that translators deploy when they deal with these words. Here we see that the underlying systems of different languages influence textual practice.

5. Understanding translators’ resourcefulness

The two types of equivalence identified here thus raise difficult conceptual issues. If we are to gain insight into the issues, two types of further research are necessary. Firstly, we need larger translation corpora and more studies which try to systematise the data that they produce. Different types of translation equivalence, of the kind discussed here, have not been identified in the past because the data was not available. As we remarked earlier, large and representative corpora are necessary if we hope to find significant linguistic patterns in them. As empirical work of this kind with translation corpora progresses, the distinction between translational systematicity and translational unsystematicity may become clearer or turn out to be illusory; and other types of translation equivalence may emerge.

Secondly, we need conceptual clarification. The following remarks are intended to be a small step in this direction. In both kaum-type examples and contain-type examples, the translator has departed much of the time from the most direct translation and has been ‘resourceful’ in finding an equivalent which works for this text in this context. The skill involved in creating a good translation is partly linguistic and partly literary; in this respect, translating is like any type of writing. For contrastive linguistics the literary dimension is not a primary concern: our task is to find linguistic patterns and explain them. Is there, though, a way of isolating the linguistic part of translators’ resourcefulness?

Here I think that the notion of modulation is helpful. Vinay and Darbelnet (1958: 51) define this term as a change in the point of view from which a situation is regarded. This is a rather vague definition, and the examples given by Vinay and Darbelnet and others such as Chuquet and Paillard (1987: 26–38) and Van Hoof (1989: 126–130) cover a very wide area. I think that it is nonetheless an accurate description of much translational resourcefulness. Consider again the German word kaum. If we were to attempt to specify the sense of this word, it would be something like ‘zero plus a small increment on some scale’. If we now outline the sense of the various English equivalents of kaum, we get something like this:

zero (38 in Appendix 2)
small quantity (48–54)
negative (38–46)
almost zero (56–58)
almost not (60)
mostly not (70)

These are modulations in the sense used here: different ways of viewing the same situation. In some cases the meaning is arguably identical to the meaning of kaum: a small quantity of something is the same as ‘zero plus a small increment’. In other cases the meaning is not identical: ‘zero’ is not the same as ‘zero plus a small increment’. If we can compare all the translations of kaum in this way, and then look at the same conceptual area with other language-pairs, we can start to get a picture of how many ways this same situation can be viewed. We will be compiling a kind of multilingual thesaurus, where a large number of ways of representing the same concept is displayed. In another paper I have outlined a practical framework for compiling such a multilingual thesaurus (cf. Salkie 1999).

For the issues raised in this paper, taking modulation as a starting point has two advantages. Firstly, the notion of modulation is located conveniently in between linguistic systems and textual practice. In our analysis of kaum we are not just talking about semantic equivalents of kaum in other languages, which would be a comparison of linguistic systems. Nor are we simply talking about textual creativity in two languages, which would be a matter of textual practice. We are talking about different ways of viewing the same situation, which is partly a semantic matter but is also partly textual and stylistic. The semantic part probably involves some process of semantic decomposition as a result of which the ‘point of view’ is shifted. We need to collect evidence about equivalents of kaum in various languages, and we will then be able to draw links between systemic constraints and textual creativity. In the case of English we will be able to explain more accurately the avoidance of hardly which is evident in the (non-fiction) data: we will have a basis for conceiving of the linguistic system and its textual realisation as separate but related.

Secondly, modulation falls just on the right side of the line which separates linguistic resourcefulness from literary resourcefulness. It is ‘creative’, but not to such an extent that it cannot be systematised.
Translations where the source text and target text diverge more radically than modulation are on the other side of the line, and can be ignored by contrastive linguists. Thus Vinay and Darbelnet’s most radical translation strategy, which they call adaptation, involves replacing the situation referred to in the source text by a new situation in the target text (1958: 52–4). As Chuquet and Paillard (1987: 10) note, this goes beyond the kind of phenomenon which a linguistic approach to translation should aim to analyse. With modulation, on the other hand, at least we are dealing with the same situation. It remains to be seen, of course, whether these conceptual distinctions can be maintained in the light of data from large translation corpora, but at least they offer a starting point.

Consider now the equivalents of contain. The question once again is whether any particular translation into French is just a case of modulation, or whether a different situation is being represented. In example (25–6) from Appendix 3, where contain corresponds to a construction with reposer, it seems clear that the English and French sentences are not semantically equivalent, but this is nonetheless an instance of modulation: the situation is being described from a different point of view (arguably the French sentence is also more specific than the English):

(25) The big kitchen table was covered with wicker baskets [[containing]] the dough.

(26) La grande table de la cuisine était couverte de panetons d’osier où reposait la pâte. [FICTION\FRENCH]

Compare this with example (45–6):

(45) Amongst the number of letters we found waiting for us at Naples was one [[containing]] an unexpected piece of information — a chair at the College de France had fallen vacant and my name had been several times mentioned in connexion with it;

(46) Dans l’important courrier qui, depuis longtemps, nous attendait à Naples, une lettre m’apprenait brusquement que, se trouvant vacante une chaire au Collège de France, mon nom avait été plusieurs fois prononcé; [FICTION\GIDE]

Here we might be reluctant to accept that this is modulation: to ‘contain some information’ is not the same situation as ‘to tell someone something’. They are very close, however, and in context perhaps they are ‘the same’ in the relevant respect. Finally, consider example (47–8):

(47) Apart from this, although the CGT and CTC did not have problems similar to those [[contained]] in the CSTC complaint, it was pointed out, at the meeting with UTC officials, that this organisation had also lost several trade unionists who had been murdered or disappeared.

(48) Cela mis à part, alors que la CGT et la CTC ne connaissaient pas des problèmes similaires à ceux qui sont dénoncés dans la plainte de la CSTC, les dirigeants de l’UTC ont indiqué que cette organisation avait également perdu divers syndicalistes qui étaient morts ou avaient disparu. [INTORGS\ILO]

In this case we would probably be even less willing to accept that the words contained and dénoncés are modulations of each other: but taking the sentences as a whole, perhaps this continues to be an instance of modulation. By distinguishing modulation from other cases we thus have a criterion for deciding which translations should form raw material for our multilingual thesaurus and which are too divergent and idiosyncratic. The criterion is not always easy to apply, as the contain examples show, but it offers a starting point for delimiting and organising the types of translation equivalence where we are likely to find systematic regularities.6

6. Conclusion

The problems of translation equivalence, and of langue versus parole (the underlying system of a language as opposed to the use of this system in texts), have usually been seen as conceptual ones. One of the benefits of using translation corpora as sources of data is that they bring a new empirical dimension to bear on these problems. Ideally the findings from such corpora can clarify the conceptual aspects of the problem; at the same time, this conceptual clarity will bring new vigour to our empirical work. In the process, contrastive linguistics will be able to make distinctive contributions to translation theory and to linguistic theory in general.

Notes

* Constructive criticism by the editors of this volume improved this paper a great deal, and I thank them both. They are not responsible for any remaining defects.

1. The notion ‘more frequently than we would expect’ is usually itself based on data from a corpus. Normally we take a large, representative corpus as the standard against which we compare a new corpus. We might, for instance, take the British National Corpus (BNC) as such a standard for English. If we then take a new corpus we would compare normed word frequencies in our corpus with the frequencies of the same words in the BNC. Any statistically significant differences would be of interest to researchers.

2. With association patterns we base the expected frequency of co-occurrence on the frequency of the two items in the corpus as a whole. If one word in every 12 in a corpus is the word the, then we would ‘expect’ the to collocate with any word w once for every 12 instances of w. If we discover that the collocates much more often than this with w, then we have found a linguistically interesting association pattern between the and w.

3. The choice of the word contain was prompted by the interesting discussion of the semantic field of inclusion in Chesterman (1998).

4. No such dictionary exists at the moment, so this is a pious hope rather than a practical strategy. In the meantime a translator can be advised to consult a translation corpus directly, or to build a corpus using translation memory software such as TRADOS.

5. Something approaching this situation occurs in the translation of poetry, where the need to find a rhyme with a neighbouring word can be the prime consideration. Gutt (1991: 106–7) discusses this poem by Morgenstern:

Ein Wiesel sass auf einem Kiesel inmitten Bachgeriesel. Das raffinierte Tier tat’s um des Reimes Willen.

A weasel sat on a pebble in the middle of a ripple of a brook. The shrewd animal did it for the sake of the rhyme.

Gutt suggests various English translations, for example: a weasel perched on an easel; a ferret nibbling a carrot; a mink sipping a drink; a hyena playing a concertina; a lizard shaking its gizzard. Thus the translations of Wiesel and Kiesel can be regarded as almost completely unpredictable in this text. Whether this still counts as ‘translation’ is, of course, debatable.

6. For more discussion of modulation, see Salkie (to appear).

References Biber, D., S. Conrad & R. Reppen. 1998. Corpus linguistics. Cambridge: Cambridge University Press. Chesterman, A. 1998. Contrastive functional analysis. Amsterdam: John Benjamins. Chuquet, H. & M. Paillard. 1987. Approche linguistique des problèmes de traduction. Gap: Ophrys. Gutt, E.-A. 1991. Translation and relevance. Oxford: Blackwell. Salkie, R. 1997. “INTERSECT: parallel corpora and contrastive linguistics”. Contragram Newsletter 11 (Oct 1997), 6–9. Available on the Web: http://bank.rug.ac.be/contragram/newsle11.html#INTERSECT Salkie, R. 1999. “How can linguists profit from parallel corpora?” Paper presented at the Parallel Corpus Symposium, University of Uppsala, April 1999. (To appear in the proceedings). Salkie, R. 2000. “Quelques questions méthodologiques dans l’exploitation des corpus multilingues”. In Corpus: méthodologie et applications linguistiques, M. Bilger (ed), 180–195. Paris: Champion. Salkie, R. To appear. “A new look at modulation”. In Proceedings of Maastricht Conference on





Raphael Salkie

Translation and Meaning, April 2000, M. Thelen (ed.).
Van Hoof, H. 1989. Traduire l’anglais: théorie et pratique. Louvain-la-Neuve: Duculot.
Vinay, J.-P. & J. Darbelnet. 1958. Stylistique comparée du français et de l’anglais. Paris: Didier.

Appendix 1: Corpus texts

Each entry gives the folder, then each file in it followed by details of its contents.

French–English

Folder: Bible
  Bible: Extracts from Genesis, Exodus and Psalms.

Folder: Canhans (Canadian Hansard)
  Hans1, Hans2: Extracts from Canadian Hansard (Reports of proceedings in the Canadian Parliament).

Folder: Fiction
  Celine: Céline, Voyage au bout de la nuit.
  English: Extracts from B. Stoker, Dracula and H.G. Wells, The Invisible Man.
  French: Extracts from J. Verne, Le tour du monde en quatre vingt jours; S. Germain, Jours de colère; A. de Saint-Exupéry, Le petit prince; A. Camus, L’hôte.
  Gide: Gide, L’immoraliste.
  Malmau: Extracts from stories by Malraux and Maupassant.
  Robbeg: Extracts from Robbe-Grillet, La jalousie.
  Sartre: Extracts from Sartre, La Nausée.

Folder: Instrs (Instructions)
  Instr: Instructions for Xerox ScanWorx User Manual and various domestic appliances: Braun MultiPractic deluxe food processor; Fisher-Price All-in-one Kitchen Centre; Concertmate-750 keyboard; Sony Radio Cassette-Corder.

Folder: Intorgs (international organisations)
  Esprit: EU document: Proposal For A Council Decision Adopting The First European Strategic Programme For Research And Development In Information Technology (Esprit).
  ILO: International Labour Organisation. Reports of the Committee on Freedom of Association, 246th Report.
  Maast: EU document: Maastricht Treaty.
  UN1: United Nations: Report on committee meeting.
  UN2: United Nations: Report on committee meeting.

Folder: Misc
  Bankcan: Royal Bank of Canada newsletter.
  Canfgnp: Reports of the Joint Canadian House of Commons/Senate Special Committee on Canadian Foreign Policy.

Two types of translation equivalence

Appendix 1 (continued). Each entry gives the folder, then each file in it followed by details of its contents.

French–English

Folder: Misc (continued)
  Canlib: Canadian National Library newsletter.
  Cfengin: Canadian armed forces discussion documents.
  cffores: Canadian forestry information.
  cfnews: More Canadian armed forces discussion documents.
  cfpdiss: More Canadian foreign policy discussion documents.
  franinfo: Information about France from the French embassy in London.

Folder: News
  LM92/GW92: Extracts from Le Monde 1992 and their translation in Guardian Weekly.
  LM93/GW93: Extracts from Le Monde 1993 and their translation in Guardian Weekly.

Folder: Sci-tech
  ISO: Information from the International Organization for Standardization website.
  Pasteur: Information from the Institut Pasteur website.
  telecom: Extracts from the International Telecommunication Union CCITT Blue Book SECTION 10.

German–English

Folder: Comps (Company information)
  Hoechst: Information from Hoechst website.
  Deutel: Information from Deutsche Telecom website.
  Siemens: Information from Siemens website.

Folder: Fiction
  Gerfict: Büchner, Lenz & Leonce und Lena; Kafka, Die Verwandlung.
  Dickens: Dickens, A Christmas Carol.

Folder: Intorgs (international organisations)
  Esprit: EU document: Proposal For A Council Decision Adopting The First European Strategic Programme For Research And Development In Information Technology (Esprit).
  UN: United Nations documents: General Assembly Resolutions.

Folder: Misc
  Sapman: Manual for employees of SAP (German translation and localisation company) in use of software.

Folder: News
  Newap96: Short news items from the “German News” website, April 1996.

Folder: Politics
  Consts: Constitutions of the FRG, Austria and Switzerland.
  Herzog: Speeches by Roman Herzog, President of the FRG.
  Manifest: Marx-Engels, Communist Manifesto.






Appendix 2: Equivalents of kaum in English Fiction Hardly (a selection from 18 examples) (1) [[Kaum]] hatte sie sich umgedreht, zog sich schon Gregor unter dem Kanapee hervor und streckte und blähte sich. (2) Hardly had she turned her back when Gregor came from under the sofa and stretched and pulled himself out. [FICTION\GERFICT] (3) Obgleich schon so ziemlich an gespenstische Gesellschaft gewöhnt, bangte Scrooge vor der stummen Erscheinung doch so sehr, daß seine Knie wankten und er [[kaum]] noch stehen konnte, als er sich ihr zu folgen bereit machte. (4) Although well used to ghostly company by this time, Scrooge feared the silent shape so much that his legs trembled beneath him, and he found that he could hardly stand when he prepared to follow it. [FICTION\DICKENS] Scarcely (a selection from 10 examples) (5) Nicht doch, meine Liebe, die Blumen sind ja [[kaum]] welk, die ich zum Abschied brach, als wir aus dem Garten gingen. (6) LENA: Not so, my dear, these flowers, which I picked in parting as we left the gardens, are scarcely wilted. [FICTION\GERFICT] (7) Als Scrooge wieder erwachte, war es so finster, daß er das Fenster [[kaum]] von den Wänden seines Zimmers unterscheiden konnte. (8) When Scrooge awoke, it was so dark, that looking out of bed, he could scarcely distinguish the transparent window from the opaque walls of his chamber. [FICTION\DICKENS] Barely (a selection from 4 examples) (9) Immerfort nur auf rasches Kriechen bedacht, achtete er [[kaum]] da auf, daß kein Wort, kein Ausruf seiner Familie ihn störte. (10) Intent on crawling as fast as possible, he barely noticed that not a single word, not an ejaculation from his family, interfered with his progress. [FICTION\GERFICT] (11) Er gab dem Löschhut einen letzten Druck und fand [[kaum]] Zeit, in das Bett zu wanken, bevor er in tiefen Schlaf sank. (12) He gave the cap a parting squeeze, in which his hand relaxed; and had barely time to reel to bed, before he sank into a heavy sleep. 
[FICTION\DICKENS] A selection of other translations (13) [[Kaum]] zu glauben, wie rasch und munter die beiden Jungen darangingen. (14) You wouldn’t believe how those two fellows went at it! [FICTION\DICKENS] (15) Man versuche es einmal und senke sich in das Leben des Geringsten und gebe es wieder in den Zuckungen, den Andeutungen, dem ganzen feinen, [[kaum]] bemerkten Mienenspiel…


(16) People should try to plunge themselves into real life and to reproduce it in the tiny movements, the little hints, and in the fine, almost imperceptible play of features. [FICTION\GERFICT] (17) Obgleich sie die Schule [[kaum]] einen Augenblick hinter sich gelassen hatten, befanden sie sich doch plötzlich mitten in den lebendigsten Straßen der Stadt … (18) Although they had but that moment left the school behind them, they were now in the busy thoroughfares of a city… [FICTION\DICKENS] (19) Aber [[kaum]] war er wieder heraus, als er, obgleich noch keine Tänzer dastanden, wieder aufzuspielen begann, … (20) But scorning rest, upon his reappearance, he instantly began again, though there were no dancers yet… [FICTION\DICKENS] (21) Nun aber warteten oft beide, der Vater und die Mutter, vor Gregors Zimmer, während die Schwester dort aufräumte, und [[kaum]] war sie herausgekommen, mußte sie ganz genau erzählen, wie es in dem Zimmer aussah … (22) But now, both of them often waited outside the door, his father and his mother, while his sister tidied his room, and as soon as she came out she had to tell them exactly how things were in the room … [FICTION\GERFICT] (23) Und [[kaum]] hatten die Frauen mit dem Kasten, an den sie sich ächzend drückten, das Zimmer verlassen, als Gregor den Kopf unter dem Kanapee hervorstieß, um zu sehen, wie er vorsichtig und möglichst rücksichtsvoll eingreifen könnte…. (24) As soon as the two women had got the chest out of his room, groaning as they pushed it, Gregor stuck his head out from under the sofa to see how he might intervene as kindly and cautiously as possible. [FICTION\GERFICT]

Non-fiction Hardly (5 examples) (25) [[Kaum]] einer wisse, wofür die SPD stehe und wogegen sie sei. (26) He said that hardly anyone knows what the SPD stands for and what it is against. [NEWS\NEWAP96] (27) Allerdings dürfte der Mannschaftskapitaen [[kaum]] von Anfang an aufgeboten werden. (28) To be sure, the team captain could hardly be mobilized at once. [NEWS\NEWAP96] (29) Die Risiken, mit denen wir es heute zu tun haben, sind [[kaum]] geringer. (30) The risks confronting us today are hardly of lesser magnitude. [POLITICS\HERZOG] (31) Man wird der deutschen Öffentlichkeit wohl [[kaum]] Unrecht tun, wenn man behauptet, daß zu viele bei der Nennung des Wortes “Islam” vor allem Begriffe wie “inhumanes Strafrecht” … assoziieren. (32) It would hardly be doing the German public an injustice to claim that too many of






us mainly associate terms such as “inhumane penal law” … with the word “Islam”. [POLITICS\HERZOG] (33) Die Schülerzahlen stiegen in den alten Bundesländern um 2.5 Prozent, dieser Wert sei aber bei der Schaffung neuer Planstellen [[kaum]] berücksichtigt worden. (34) The number of pupils increased by 2.5 per cent, but this hardly had been taken into account for the establishment of new posts. [NEWS\NEWAP96] Scarce (1 example) (35) Die Bourgeoisie hat in ihrer [[kaum]] hundertjährigen Klassenherrschaft massenhaftere und kolossalere Produktionskräfte geschaffen als alle vergangenen Generationen zusammen. (36) The bourgeoisie, during its rule of scarce one hundred years, has created more massive and more colossal productive forces than have all preceding generations together. [POLITICS\MANIF] Negative expression (5 examples) (37) Dies bedeutet mit anderen Worten, dass zwar [[kaum]] Zweifel hinsichtlich der strategischen Bedeutung der fünf festgestellten breiten Bereiche und bezüglich des Umfangs der Gesamtanstrengungen bestehen, die in den nächsten zehn Jahren erforderlich sind, um mit den Wettbewerbern gleichzuziehen, dass aber für die detaillierten FuE-Ziele … [INTORGS\ESPRIT] (38) In other words whereas there are no doubts about the strategic importance for the next 10 years of the five broad areas identified and of the size of the overall effort necessary to catch up with the competition, the detailed R & D objectives … (39) Streitkräfte dieser Art werden in absehbarer Zeit [[kaum]] zur Verfügung stehen. (40) Such forces are not likely to be available for some time to come. [INTORGS\UN] (41) Wenn die Bonner Pläne wahr gemacht würden drohe ein [[kaum]] wiedergutzumachender Schaden, schreibt Schulte in einem Brief an den Cher der Koalitionsfraktion im Bundestag, Schäuble. (42) In a letter to Mr. Schäuble, head of the coalition’s parliamentary group, Schulte writes that the realization of the government plans contains the risk of irreparable damage. 
[NEWS\NEWAP96] (43) Schwierigkeiten machen vor allem zwei Phosphatersatzstoffe in Waschmitteln und aus Papierfabriken, die biologisch [[kaum]] abbaubar sind. (44) Two phosphate substitutes in particular were problematic because they are not bio-degradable. These substitutes are two laundry detergent ingredients and pulp mill by-products. [NEWS\NEWAP96] (45) Der Steuerzahlerbund erwartet 1996 [[kaum]] Entlastungen bei den Abgaben. (46) The union of the tax payers does not expect a reduction of taxes for the year 1996. [NEWS\NEWAP96] Little (4 examples) (47) Die vorgesehene Lockerung des Kündigungsschutzes wirke sich in der Metall- und Elektroindustrie [[kaum]] aus.


(48) The planned relaxation of laws regarding layoffs will likely have little effect in the steel and electronics industries. [NEWS\NEWAP96] (49) Das Bruttoinlandsprodukt lag damit [[kaum]] noch über dem Vorjahreswert. (50) The gross national product therfore was only little higher than last year. [NEWS\NEWAP96] (51) Er sei Ausdruck reiner Machtpolitik und habe mit den religiösen Grundlagen [[kaum]] etwas gemein. (52) Fundamentalism was an expression of power politics and had little in common with religious fundamentals. [NEWS\NEWAP96] (53) Danach gibt es für die Arbeitslosen in Deutschland [[kaum]] Aussicht auf Besserung. (54) According to them, there is only little positive prospect for unemployed people in Germany. [NEWS\NEWAP96] Almost + negative (3 examples) (55) Die Belastung des Abwassers mit Schwermetallen hat ein [[kaum]] noch nennenswertes Niveau erreicht. (56) The contamination of the wastewater with heavy metals has fallen to an almost insignificant level. [COMPS\HOECHST] (57) Sie haben [[kaum]] eine Chance, ein menschenwürdiges Leben zu führen. (58) These people have almost no chance of a life in dignity. [POLITICS\HERZOG] (59) [[Kaum]] ein Unterzeichner des Pamphlets habe je ein Buch von Annemarie Schimmel gelesen. (60) Almost none of the persons who signed the letter ever read a book of hers, he added. [NEWS\NEWAP96] Hard/difficult/seems impossible (3 examples) (61) Haemischer Kommentar von SPD-Fraktionsvize Wolfgang Thierse: “Die Politik schwächt eben so ungemein, dass man sich [[kaum]] auf den Beinen halten kann.” (62) SPD-faction vice-president Wolfgang Thierse sneered: “Politics can debilitate so much that it is hard to keep going.” [NEWS\NEWAP96] (63) Vor diesem Hintergrund ist ein effektives Gebäudemanagement ohne DVUnterstützung [[kaum]] noch vorstellbar. (64) Against this backdrop, it is difficult to imagine efficient building management without computer support. 
[COMPS\SIEMENS] (65) Nach dem “nein” aus Kiel ist die erforderliche 2/3-Mehrheit im Bundesrat [[kaum]] noch zu erreichen, da auch die Länder Hessen, Berlin, NordrheinWestfalen und Sachsen-Anhalt mit “nein” stimmen oder sich der Stimme enthalten wollen. (66) After the “no” coming from Kiel, the required two-thirds majority seems impossible to achieve in the Bundesrat, as Hesse, Berlin, North Rhine-Westphalia and Saxony Anhalt also want to vote “no” or abstain. [NEWS\NEWAP96]






Less and less (1 example) (67) In der Tat legt sich diese Bundesregierung [[kaum]] mehr für die ärmsten in der Gesellschaft ins Zeug. Aber verursacht hat sie die Konjunkturflaute nicht”, stellt die RHEIN-NECKAR ZEITUNG fest. (68) Government indeed is less and less interested in backing the poorest in society, but that hasn’t caused the recession.” [NEWS\NEWAP96] Largely + negative (1 example) (69) Manche nehmen die ermutigenden Tendenzen und Fortschritte aber auch einfach nicht zur Kenntnis, weil Erfolge, so spektakulär sie auch sein mögen, weniger dramatische Bilder abgeben als Katastrophen und deshalb von den Medien [[kaum]] beachtet und berichtet werden. (70) Some people simply fail to appreciate encouraging trends and progress because success stories, no matter how spectacular, do not provide such dramatic pictures as disasters and are therefore largely ignored by the media. [POLITICS\HERZOG]
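The spread of non-fiction equivalents listed above can be made easier to compare by tabulating the counts given in the headings. The following is only a minimal sketch: the counts are copied from the headings of this appendix, while the dictionary name and the helper functions are illustrative assumptions, not part of the original study.

```python
# Counts copied from the non-fiction headings of Appendix 2.
# The names NON_FICTION, shares and report are hypothetical helpers.
NON_FICTION = {
    "hardly": 5,
    "scarce": 1,
    "negative expression": 5,
    "little": 4,
    "almost + negative": 3,
    "hard/difficult/seems impossible": 3,
    "less and less": 1,
    "largely + negative": 1,
}


def shares(counts):
    """Return each equivalent's share of all listed examples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def report(counts):
    """Format the counts as a plain-text table, most frequent first."""
    total = sum(counts.values())
    rows = sorted(counts.items(), key=lambda item: -item[1])
    return "\n".join(
        f"{name:<34}{n:>3}{100 * n / total:>7.1f}%" for name, n in rows
    )


if __name__ == "__main__":
    print(report(NON_FICTION))
```

The fiction counts are not treated the same way here because the number of "other translations" in fiction is not stated in the appendix.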

Appendix 3: Equivalents of contain in French Contenir (a selection from 171 examples) (1) “And my baggage [[contains]] apparatus and appliances.” (2) - Et mes bagages contiennent des appareils, un matériel. [FICTION\ENGLISH] (3) Unconverted document file [[containing]] formatting information. (4) Fichier de document non converti contenant des informations de formatage. [INSTRS\INSTR] (5) In 1984, the database [[contained]] close to three million bibliographic records, and was growing at an annual rate of 400 000 records. (6) En 1984, la base de données contenait près de 3 millions de notices bibliographiques et augmentait de quelque 400 000 notices par an. [MISC\CANLIB] Figurer (a selection from 32 examples) (7) - No “WRU” signals should be [[contained]] within the pre-recorded message up to the last code expression CI (8) - Aucun signal «WRU» ne doit figurer dans le message préenregistré jusqu’à la dernière expression de code CI. [SCI-TECH\TELECOM] (9) The complaint presented by the Central Organisation of Workers (CGT) is [[contained]] in a communication dated 30 May 1985. (10) La plainte figure dans une communication de la Centrale générale des travailleurs (CGT) du 30 mai 1985. [INTORGS\ILO] (11) These comments, [[containing]] important information on the situation, were made by those we interviewed and I have done my utmost to transcribe them as faithfully as possible.


(12) Ces commentaires, parmi lesquels figurent des informations importantes sur la situation, ressortissent entièrement à la responsabilité des personnes rencontrées et je me suis efforcé d’en rendre compte aussi fidèlement que possible. [INTORGS\ILO] (13) Immediately after the Committee’s consideration of the case the Government’s reply [[contained]] in a communication dated 12 May 1986 was received. (14) Immédiatement après avoir examiné le cas, le comité a reçu la réponse du gouvernement qui figurait dans une communication datée du 12 mai 1986. [INTORGS\ILO] (15) In that connection, he drew attention to the relevant explanations [[contained]] in paragraphs 60 to 65 of document E/CN.4/1988/24 and in paragraph 60 of the interim report. (16) A cet égard, M. Pohl renvoie la Commission aux explications figurant dans les paragraphes 60 à 65 du document E/CN.4/1988/24 et dans le paragraphe 60 du rapport intérimaire. [INTORGS\UN1] Publier (a selection from 9 examples) (17) Mr.LABERGE (Canada), on behalf of the sponsors, who had been joined by Pakistan and Thailand, introduced the draft resolution [contained] in document A/C.3/43/L.77 entitled “Human rights and mass exoduses”. (18) M. LABERGE (Canada) présente au nom des auteurs, auxquels se sont joints le Pakistan et la Thaïlande, le projet de résolution publié sous la cote A/C.3/43/L.77, intitulé “Droits de l’homme et exodes massifs”. [INTORGS\UN2] A selection of other translations (about 50 types) (19) It [[contains]] in all some twenty acres, quite surrounded by the solid stone wall above mentioned. (20) Il comprend quelque vingt âcres de terres entièrement ceintes, comme je l’ai dit, par un solide mur de pierres. 
[FICTION\ENGLISH] (21) I confessed that … her country terrified me quite definitely more than the whole sum total of threats, actual, hidden and unforeseen which I found it [[contained]]… (22) et quant à son pays il m’épouvantait tout bonnement plus que tout l’ensemble de menaces directes, occultes et imprévisibles que j’y trouvais… [FICTION\CELINE] (23) … He observed that the butchers stalls [[contained]] neither mutton, goat, nor pork… (24) Il avait bien remarqué que moutons, chèvres ou porcs, manquaient absolument aux étalages des bouchers indigènes… [FICTION\FRENCH] (25) The big kitchen table was covered with wicker baskets [[containing]] the dough. (26) La grande table de la cuisine était couverte de panetons d’osier où reposait la pâte. [FICTION\FRENCH] (27) These works, and the pleasure they [[contain]], can be “learned” like a foreign language …






(28) Ces oeuvres, et le plaisir qu’elles apportent, peuvent être “apprises” comme une langue étrangère … [FICTION\MALMAU] (29) In order to enjoy the features and functions of this unit to their fullest, be sure to carefully read this manual and follow the instructions [[contained]] herein.. (30) Afin d’apprécier au mieux les fonctions et les caractéristiques de cet instrument, lisez attentivement ce manuel et suivez les instructions y inclues. [INSTRS\INSTR] (31) The preview window [[contains]] pause options at the top of the window. (32) Le haut de la fenêtre de visualisation comporte des options de pause. [INSTRS\INSTR] (33) Each document can [[contain]] one header, or one footer, or both. (34) Un document peut avoir un en-tête ou un pied de page ou les deux à la fois. [INSTRS\INSTR] (35) It is simply a cache in the department of printed books, to which only our librarians have a key, and which [[contains]] a number of books which although extremely evil, are sometimes very precious to bibliophiles and have a high market value. (36) C’est tout simplement une cachette du département des imprimés dont les conservateurs ont seuls la clef et dans laquelle on enferme certains livres fort mauvais, mais quelquefois très précieux pour les bibliophiles, et de grande valeur vénale. [NEWS\GW92] (37) … General Assembly resolution 2248(S-V), which [[contained]] the political mandate and framework for the activities of the United Nations Council for Namibia. (38) … la résolution 2248(S-V) de l’Assemblée générale, qui définit le mandat du Conseil des Nations Unies pour la Namibie et le cadre politique de ses activités. 
[INTORGS\UN1] (39) … the committee would like to recall the principle [[contained]] in the Workers’ Representatives Recommendation … (40) … le comité tient à rappeler le principe énoncé dans la recommandation … [INTORGS\ILO] (41) When asked if she had the letters [[containing]] the death threats she had received, she replied … (42) A la question de savoir si elle possédait les lettre de menaces de mort qu’elle avait reçues, Mme Avella a répondu … [INTORGS\ILO] (43) …he had not hidden the fact that Chile wished to keep those territories because of the wealth they [[contained]]. (44) … il n’a pas caché que si le Chili tenait à garder ces territoires, c’était en raison de leur richesse. [INTORGS\UN1] (45) Amongst the number of letters we found waiting for us at Naples was one [[containing]] an unexpected piece of information — a chair at the College de France


had fallen vacant and my name had been several times mentioned in connexion with it; (46) Dans l’important courrier qui, depuis longtemps, nous attendait à Naples, une lettre m’apprenait brusquement que, se trouvant vacante une chaire au Collège de France, mon nom avait été plusieurs fois prononcé. [FICTION\GIDE] (47) Apart from this, although the CGT and CTC did not have problems similar to those [[contained]] in the CSTC complaint, it was pointed out, at the meeting with UTC officials, that this organisation had also lost several trade unionists who had been murdered or disappeared. (48) Cela mis à part, alors que la CGT et la CTC ne connaissaient pas des problèmes similaires à ceux qui sont dénoncés dans la plainte de la CSTC, les dirigeants de l’UTC ont indiqué que cette organisation avait également perdu divers syndicalistes qui étaient morts ou avaient disparu. [INTORGS\ILO]
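The examples in Appendices 2 and 3 follow a regular notation: the search word is enclosed in double square brackets, and each pair ends with a FOLDER\FILE label referring to Appendix 1. A reader who wants to process such pairs automatically could start from a sketch like the one below. The regular expressions and the function name parse_pair are assumptions made for illustration; they are not tools used in the study.

```python
import re

# Hypothetical patterns for the notation used in the appendices.
NODE = re.compile(r"\[\[(.+?)\]\]")              # e.g. [[containing]]
TAG = re.compile(r"\[([A-Z-]+)\\([A-Z0-9]+)\]")  # e.g. [INSTRS\INSTR]


def parse_pair(text):
    """Extract the marked search word and the corpus label from one pair.

    Fields that cannot be found are returned as None, which also helps to
    spot pairs whose markers are missing or inconsistent.
    """
    node = NODE.search(text)
    tag = TAG.search(text)
    return {
        "node": node.group(1) if node else None,
        "folder": tag.group(1) if tag else None,
        "file": tag.group(2) if tag else None,
    }


if __name__ == "__main__":
    sample = (
        "(3) Unconverted document file [[containing]] formatting information. "
        "(4) Fichier de document non converti contenant des informations de "
        r"formatage. [INSTRS\INSTR]"
    )
    print(parse_pair(sample))
```

A pair in which the search word carries only single brackets yields no node, so the same function can serve as a simple consistency check on the transcription.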



Functionally complete units of meaning across English and Italian
Towards a corpus-driven approach

Elena Tognini Bonelli

If meaning is function in context, as Firth used to put it, then equivalence of meaning is equivalence of function in context. What the translator is doing when translating or interpreting is taking decisions all the time about what is the relevant context within which this functional equivalence is being established. (Halliday 1992a: 16)

.

Introduction

This study addresses the issue of comparing words and expressions across languages. It proposes an approach in which meaning (whether denotational, connotational and/or pragmatic) is seen as encoded by, and intertwined with, formal lexico-grammatical realisations in the verbal context. From such a perspective it would not make sense to identify a certain function in a language solely from a grammatical or lexical point of view and expect an equivalent grammatical or lexical match in another language. It is argued that, whether the starting point is lexical or grammatical, an analyst sensitive to the cumulative effect of usage (what Firth called “repeated language events”) will be led by the evidence to identify multiword lexico-grammatical items that operate within well-defined semantic platforms and perform specific functions at the pragmatic level. From the comparative angle, it is proposed that these multiword units only become available for comparison across languages, or for translation, when they are “functionally complete” (Tognini Bonelli 1996a), that is, when all the components that are necessary for the unit to function have been identified. This study will try to demonstrate that this is possible and, indeed, the only way forward.

The approach adopted here is a step towards what has been referred to as “corpus-driven” (Tognini Bonelli 2001). I will start by identifying the tenets of such an approach (Section 2) and differentiating it from a more traditional “corpus-based” approach, with a view to outlining the implications for language description in general and for contrastive linguistics and translation in particular.1 I will then (Section 3) go on to define and exemplify what I mean by “functionally complete units of meaning”, which I take to be the minimal currency units when comparing languages. In Section 4 I will discuss the implications for translation and contrastive linguistics. In Section 5 I will illustrate the approach by comparing a given function, and its formal realisations, across two languages.

2. The corpus-driven approach

It should be noted that general work which makes use of a corpus as evidence for language description is usually referred to as corpus-based. I use this term in a more restricted sense, to refer specifically to work where the corpus is used mainly to expound on, or exemplify, existing theories, that is, theories which were not necessarily derived with initial reference to a corpus. Although the evidence of the corpus may indeed seem to support, at least partially, a pre-existing non-corpus-derived theoretical statement, the corpus-based approach does not go as far as querying traditional units of investigation; these are taken as given, even though they could be questioned in the light of corpus evidence. Traditional distinctions such as the one between lexis and grammar are taken for granted, and so this type of corpus-based investigation is usually happy to go along with the distinction between lexicons dealing with lexical units (usually words) on the one hand and grammars (studying grammatical frames) on the other.2 This approach does not allow for the fact that the enormous amount of evidence now available is bound to challenge language description and offer fascinating new insights into language (Sinclair 1991:4). To start, therefore, with units derived from traditional descriptions, often based on very little evidence,3 is not only no longer sufficient, it is dangerous.

Perhaps the most important change brought about by corpus work, and one which is not recognised by the corpus-based approach, is a change in the


unit of currency, that is, in the unit of linguistic investigation. The traditional water-tight separation between lexis and grammar does not really hold in the light of corpus evidence. This is why the linguist who decides to investigate corpus evidence with an open mind, rather than with pre-set beliefs, will accept that even the units of investigation have to be re-defined, and s/he will have to come to terms with units which are neither fully grammatical nor purely lexical, but a mixture of the two.4 This type of unit is not the type that has been studied and analysed in traditional grammar books, nor listed in traditional dictionaries. The reason why we can now call it a unit, and indeed adopt it as the new currency unit, is that the interrelations between the lexical and the grammatical elements in it are so strong and systematic that they cannot be ignored any more. Frequency distributions and patterns of co-selection determine the size and shape of the unit. But a new approach is needed to account for this new unit.

The corpus-driven approach (discussed in some detail in Tognini Bonelli 1996b and 2001; see also Hunston and Francis 2000 for a corpus-driven approach to grammar), in contrast to the corpus-based approach, constitutes a methodology that uses a corpus beyond the selection of examples to support linguistic argument or to validate a theoretical statement. The commitment of the scholar is to the integrity of the data as a whole, and descriptions aim to be comprehensive, rather than selective, with respect to the corpus evidence for a particular topic of research. Here the corpus is not used just as a repository of examples to back pre-defined theories. The theoretical statements, as well as the comments or recommendations made, arise directly from, and reflect, the evidence provided by the corpus. The new unit of description can be safely posited and explored in this framework.
Linguistic description is arrived at, step by step, from the observation of language usage; recurrent language events and frequency distributions are expected to form the basis of linguistic categories; the absence of a pattern is considered potentially meaningful. Of course, many issues and queries related to the corpus itself become relevant when one adopts a corpus-driven approach. The representativeness of the corpus has to be assessed, and so should the sampling criteria used in the creation of the corpus. Now that corpora containing hundreds of millions of words are available, even the question of what corpus size is adequate, and for what type of enquiry, should be addressed. Indeed, as Halliday points out, the corpus should be seen as “a theoretical construct” (1992b) because what may seem to be just evidence, and more evidence, contains the parameters of a very specific view of language. Querying pre-defined theoretical statements does not mean to say that the






activity of analysing corpus evidence should be a-theoretical. The initial assumptions of the enquiry, though, should always be made clear and, above all, should be testable against the evidence of the corpus. The sections below will try to exemplify the corpus-driven approach and, in particular, explore a methodology for the identification of what I have called the new currency unit in linguistic description and, more particularly, the description and identification of equivalent units across languages.

3. The new currency: Functionally complete units of meaning

The central proposal of the theory is (…) to split up meaning or function into a series of component functions. Each function will be defined as the use of some language form or element in relation to some context. (Firth, A Synopsis of Linguistic Theory, 1968: 173)

The initial assumption here relates to a view of form and meaning as strictly and systematically interconnected, indeed as two aspects of the same phenomenon: language seen as function in context. This view of language, originally proposed by J. R. Firth, is adopted as a fundamental tenet by Sinclair (1991: 7), who, reporting on corpus work of the 80s, explains:

Soon it was realised that form could actually be a determiner of meaning, and a causal connection was postulated, inviting arguments from form to meaning. Then a conceptual adjustment was made, with the realisation that the choice of a meaning, anywhere in a text, must have a profound effect on the surrounding choices. It would be futile to imagine otherwise. There is ultimately no distinction between form and meaning.

From the perspective adopted in this study, the implications of the above claim are very important. We are assuming that, given certain formal parameters in the context of a word, it is possible to arrive at a reliable meaning by formalising the evidence of language usage. We are assuming that a variation in the formal profile of a word or an expression will always lead us to a change in meaning. I will now briefly discuss the use of the word fork, as a noun and as a verb, to show how a series of steps in formalisation of the context reliably indicate meaning differentiations. The complete concordance of fork from a corpus of The Economist (9.38 million words) and the Wall Street Journal (6.36 million words) shows a total of 28 instances. Below I will present and discuss just a few examples that illustrate the interrelation between form and meaning.


Of the instances present in the corpus, five show a very consistent collocation with the word knife in the left co-text as in: Use a knife (right hand) and a fork (left hand) …0 00000000000 conservatives who use a knife and fork to eat their red meat …00000000

The meaning here is indeed the implement we use to eat, and this specific collocational patterning is associated with all instances with this meaning. Another meaning of fork is the point at which a road or a path divides into two parts. This is the meaning of fork in three instances of the concordance and it is always associated with the word road as a collocate as in: We are really at a fork in the road in terms of …0 000000 000000000000 At every fork of the road there were …000000000 0000

As a variation of this meaning of fork as bifurcation we find in the concordance six instances where fork always appears in capital letters and shows a very strong collocation in the left co-text with the adjectives North and South:

rally of his supporters at the South Fork Ranch …
the real estate division of North Fork Bank & Trust Co. …

It is interesting to note that although the meaning of fork here is the same as ‘fork in the road’, the contextual patterning is different. It consistently forms part of place names and institutions. This usage is mainly American and, in the concordance, strictly confined to the Wall Street Journal corpus. All the remaining instances of fork in the concordance are uses of the phrasal verb fork out. The examination of these instances leads us to consider the issue of co-selection, that is the habitual selection of two or more items together, beyond the simple patterns of collocation seen above. As I mentioned, patterns of co-selection in text shown up by corpus work are so strong and can usually be identified so clearly that they lead us to question the extent of the unit of meaning (Sinclair 1991, 1996, 1998) traditionally associated with the word, and, only in the case of well established idioms, with the phrase. This is where we can observe more clearly what I have called the change in currency which becomes evident when adopting a corpus-driven approach. It is not as if traditional linguistics has totally ignored the issue of co-selection. Indeed idiomatic expressions and phrasal verbs (such as fork out), which are examples of co-selection, have been studied from all angles, but until the advent of large corpora it has not been possible to see how all-pervasive this issue is. The evidence from the corpus tends to point to the fact that what Sinclair (1991) calls the ‘phraseological tendency’ is not limited to standard idiomatic expressions





Elena Tognini Bonelli

and phrasal verbs, but affects all words.5 Moreover, in the case of phrasal verbs, for example, traditional grammars have tended to identify the fixed core of the phrasal verb, but little if any attention has been given to other, perhaps not as immediately visible, patterns of co-selection obtaining between the fixed core and its own co-text. A corpus-driven approach aims to go beyond the identification of a fixed idiomatic core: it considers closely the patterns that link up the core to its environment and tries to quantify and assess their inbuilt variability. Let us consider now the series of steps whereby we can place the phraseological core fork out in relation to its co-text and identify the ultimate function associated with the unit. If we consider the right co-text of fork out we find a strong collocation with words such as Pounds, Dollars followed by numerals (the quotations are unchanged from the electronic form of the text, and words denoting currencies are clearly a way of avoiding the special characters £, $, etc.):

Taurus member firms may be reluctant to fork out Pounds sterling 50m–70m …
it had to fork out a further Dollars 2.4m …
tax payers might ask why they should fork out for the benefit of shareholders …
and his employer will have to fork out an extra Pound sterling 7..95 in nics.
business travellers who are prepared to fork out for the full fare.
In Japan, Big Mac fans have to fork out Yen 391 for this feast.
BAT would have to fork out close to Dollars 1 billion to raise its stake …
several hundred people, ready to fork out the Pounds sterling 8 (Dollars 12.50) admission …
means losing medical benefits and having to fork out for expensive child-care …
the Germans and the Japanese will be wise to fork out even if America profits.

This patterning is supported by other words such as dm, Yen, cash, fare, money and establishes what Sinclair (1996, 1998) has called “semantic preference”: to fork out is related to the activity of paying money, usually exact amounts. If we consider now the left co-text we note that the verb fork out is preceded mainly by different forms of the modal verb have to. Other collocates are may, might, would, could, should and establish a strong co-selection pattern with modals in general; some of their lexical equivalents, such as reluctant to, are prepared to, will be wise to, agreed to, refused to, should make it easier to, support this tendency. The cumulative effect of such instances points to a “semantic prosody” (Louw 1993, Sinclair 1996, Stubbs 1996) which has to do with pressure and unwillingness. People who have to fork out are certainly not pleased about it and do it only under pressure or in real need. Semantic prosodies represent “the functional choice which links meaning to purpose” (Sinclair 1996); they delineate, in other words, the outer limit of the unit of meaning where the co-text merges with the context and a certain item achieves a purpose in a certain environment. Looking at the systematicity of the patterns we have identified above, we are led to support the notion of an extended unit of meaning where collocational and colligational patterning (that is lexical and grammatical choices respectively) are intertwined to build up a multi-word unit with a specific semantic preference, associating the formal patterning with a semantic field, and an identifiable semantic prosody, performing an attitudinal and pragmatic function in the discourse. The unit thus identified is truly functionally complete (Tognini Bonelli 1996a) in that it merges the two dimensions, the contextual one and the functional one.
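The two observations about fork out (money words to the right, have to to the left) can be checked mechanically. The sketch below retypes the left and right co-texts from the concordance above as two separate lists, so nothing depends on which left fragment belongs to which right fragment; the split of the right fragments, the currency word list and the `HAVE_TO` pattern are assumptions made for illustration only.

```python
import re

# Co-texts of "fork out", retyped from the concordance in this section.
LEFT_COTEXTS = [
    "Taurus member firms may be reluctant to",
    "it had to",
    "tax payers might ask why they should",
    "and his employer will have to",
    "business travellers who are prepared to",
    "In Japan, Big Mac fans have to",
    "BAT would have to",
    "several hundred people, ready to",
    "means losing medical benefits and having to",
    "the Germans and the Japanese will be wise to",
]
RIGHT_COTEXTS = [
    "Pounds sterling 50m–70m",
    "a further Dollars 2.4m",
    "for the benefit of shareholders",
    "an extra Pound sterling 7..95 in nics.",
    "for the full fare.",
    "Yen 391 for this feast.",
    "close to Dollars 1 billion to raise its stake",
    "the Pounds sterling 8 (Dollars 12.50) admission",
    "for expensive child-care",
    "even if America profits.",
]

CURRENCY_WORDS = {"pound", "pounds", "dollars", "yen"}  # assumed word list
HAVE_TO = re.compile(r"\b(have|has|had|having) to$")    # assumed colligation pattern

def words(text):
    """Lower-cased alphabetic words of a co-text, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))

# Semantic preference: right co-texts that name a currency.
money_lines = [r for r in RIGHT_COTEXTS if words(r) & CURRENCY_WORDS]
# Colligation: left co-texts ending in a form of "have to".
have_to_lines = [l for l in LEFT_COTEXTS if HAVE_TO.search(l.lower())]

print(len(money_lines), "of", len(RIGHT_COTEXTS), "right co-texts name a currency")
print(len(have_to_lines), "of", len(LEFT_COTEXTS), "left co-texts end in a form of 'have to'")
```

Counts of this kind only support the formal side of the analysis; the step from "currency words" to a semantic preference, and from obligation markers to a prosody of unwillingness, remains the analyst's interpretation.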

Implications for contrastive linguistics and translation: Progressive steps between form and function

From the point of view of the comparison of two languages this study argues that the assumptions of a correlation between form and meaning on the one hand and the postulation of a functionally complete unit of meaning on the other are the crucial stepping stones to identifying a network of equivalences. This approach of course entails a communicative view of language, where the linguistic choices made are seen as primarily functional. This is where it becomes crucial to identify systematically the formal patterns associated with the semantic preference and the semantic prosody: only when functionally complete will a unit of meaning be available as a possible choice to the translator or for comparison to the contrastive linguist.

Before I go on to propose a methodology that makes use of a set of corpora for translation or contrastive linguistics, I would like to say a few words on the process of translation itself (see also Tognini Bonelli 1996a). The main — and perhaps the most obvious — point to be made is that both the text, encoding meaning, and the context in which the text itself is embedded, vary. Translation presupposes ‘displaced situationality’ (Neubert 1985, Viaggio 1992) at both the linguistic and the extra-linguistic levels. At the purely linguistic level the translator will negotiate equivalence of meaning in a displaced context, that is from SL to TL. This will involve assessment of the two different linguistic systems and analysis of the formal contextual features that realise the same function; the linguist will identify two units of meaning which are comparable in spite of the displaced context.






At the extra-linguistic level the situational features will also be displaced in that the context will invariably refer to, and reflect, a different culture, a different situation and different participants. Two different levels of interaction will also have to be assessed and accommodated: the original interactive process between SL writer and his/her SL audience and the one between translator and his/her TL audience. The translator here has the task of reproducing, re-creating, the original interaction to a different audience, in a different situation, taking into account the fact that the original text and the translation may even have a different purpose altogether. The steps that the translator will take to negotiate equivalence at the extralinguistic level will account for the strategies s/he will adopt in order to transfer and report the original interaction to a new target audience. This stage can be seen as a reporting strategy, whereby the original interactive process has the status of a report within another, new, interactive process.6 This framework is taken to allow for shifts and changes of purpose between source text and translation, and for specific interventions of the translator vis-à-vis his own target audience, for example. Given these two levels in the translation process, it is important to understand that I am assuming a difference between what I called a unit of meaning, whether in the source or in the target language, and a unit of translation: I maintain that while units of meaning are defined contextually — that is by examining the verbal co-text of the chosen word or phrase and identifying the patterns of co-selection — units of translation are defined mainly strategically by means of explicit balancing decisions taken by the translator in order to achieve an effect or purpose equivalent to the original (Nida 1964). 
These balancing decisions will be possible (a) once comparable units of meaning have been isolated in the source and the target language, in the light of (b) the perceived role of the translator as “go-between” linking two cultures and two specific situations, as well as of (c) the function of the translated text vis-à-vis the new target audience. Although both the linguistic and the extra-linguistic levels must be taken into account for the translation to be successful, this article will present only a methodology for identifying and evaluating sets of comparable units of meaning. The strategic steps that may influence the translator to opt for one unit of translation rather than another and the linguistic realisation of these steps are excluded from the present enquiry.7 In this respect, the methodology illustrated here may be of relevance to other linguists who also work across languages but are not necessarily interested in a translated output as such.


The methodology I will illustrate tries to locate the words and phrases that encode a function in L1 and that of other words and phrases, inevitably different from the first set, that will yield a comparable unit of meaning in L2. In other words, my aim here is to trace, through a series of steps correlating formal patterning with function in L1 and L2, the boundaries of sets of functionally complete units of meaning in the two languages. We should note that the initial hypothesis positing one or more tentative matches between two or more prima facie units of meaning in SL and TL has to rely on the translator’s intuition or past experience. Traditionally, standard reference works such as bilingual dictionaries attempted to provide this information. Recently we have witnessed the emergence of translation corpora (also referred to as parallel corpora), which are corpora of texts that stand in a translational relationship to each other, that is to say the texts can each be a translation of an absent original or one of them can be the original and the other(s) translation(s). I maintain that the use of a translation corpus at this stage, if available, will give us the benefit of such input in a more reliable manner and provide us with a range of possible translation pairs that have already been identified and used by translators, in other words verified by actual translation usage. I maintain, however, that in the framework of a corpus-driven approach the definition of a functionally complete unit of meaning cannot be confined solely to the evidence from a translation corpus. Each unit of meaning has to be contextualised and its formal components identified in the light of a type of corpus evidence which is not subject to the restrictions of mediated language. 
At this stage, therefore, the linguist will need to substantiate his/her observations using two comparable corpora (one L1 and one L2); these are corpora whose components are chosen to be similar samples of their respective languages in terms of external criteria such as spoken vs. written language, register, etc. The identification and matching of form and function of the equivalence pair will take place in each of the two sets of comparable corpora.8 The formalisation of the regularities exhibited by the evidence will allow a series of progressive steps which will deconstruct an initial chosen function into its formal components, and vice versa. The aim is to ascertain functional equivalence, that is, the equivalence obtaining between functionally complete units of meaning. Here I will distinguish three methodological stages (see Table 1 below). The first step works within L1 and consists in identifying and classifying the formal patterning in the context of a given word or expression against the evidence of an L1 corpus (see Johns 1991: 4); this is followed by the matching of a specific meaning/function to each specific pattern. Step 2 in the






process will consider both L1 and L2 and will posit a prima facie translation equivalent for each meaning/function. If a translation corpus is available, the process will be enriched by access to translations. If no translation corpus is available, as in the case of this study, this step has to rely on information taken from reference books or intuition on the part of the analyst. Step 3 will start from a function in L2, realised by the prima facie equivalent, and will deconstruct it into its formal realisations (collocational and colligational patterning); in a way it will replicate the process of step 1, but the other way around.

Table 1: Methodological steps

Step 1. Comparable Corpus (L1): from Formal Patterning/L1 to Function(s)/L1
Step 2. Translation Corpus / Translator’s Experience (SL/TL): identify a prima facie translation equivalent for each Function/L1 → Function/L2
Step 3. Comparable Corpus (L2): from Function/L2 (as realised by a translation equivalent) to Formal Patterning/L2
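The three steps in Table 1 can be laid out as a small pipeline. The sketch below is an illustration under stated assumptions rather than the procedure actually used in the study: `right_collocates`, `three_steps` and the one-entry `PRIMA_FACIE` lexicon are hypothetical names, the citations are short fragments of examples quoted in this chapter, and the code only gathers the formal patterning; assigning a function to each pattern remains the analyst's task.

```python
import re
from collections import Counter

def right_collocates(citations, expression, span=2):
    """Count the words found up to `span` tokens to the right of `expression`."""
    node = expression.lower().split()
    counts = Counter()
    for citation in citations:  # each citation is handled on its own
        tokens = re.findall(r"\w+", citation.lower())
        for i in range(len(tokens) - len(node) + 1):
            if tokens[i:i + len(node)] == node:
                start = i + len(node)
                counts.update(tokens[start:start + span])
    return counts

# Hypothetical stand-in for a bilingual dictionary, a translation corpus
# or the analyst's own experience (Step 2 in Table 1).
PRIMA_FACIE = {"in case of": "in caso di"}

def three_steps(l1_citations, l2_citations, expression, lexicon=PRIMA_FACIE):
    profile_l1 = right_collocates(l1_citations, expression)  # Step 1: patterning in L1
    candidate = lexicon[expression]                          # Step 2: prima facie equivalent
    profile_l2 = right_collocates(l2_citations, candidate)   # Step 3: patterning in L2
    return profile_l1, candidate, profile_l2

# Fragments of citations quoted in this chapter (node plus right co-text only).
english = ["in case of engine failure",
           "in case of massive calamity",
           "in case of accident or theft"]
italian = ["in caso di gran necessità",
           "in caso di incendio",
           "in caso di morte o invalidità permanente"]

profile_l1, candidate, profile_l2 = three_steps(english, italian, "in case of")
print(candidate)
print(profile_l1.most_common(3))
print(profile_l2.most_common(3))
```

Keeping Step 2 as a simple lookup makes it easy to swap in a different source for the prima facie equivalent without touching the two profiling steps.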

I believe it is very important to be strict and systematic about the specific formal patterning associated with a given item. By looking at the patterns on the vertical axis of a concordance and identifying larger syntagmatic units on the horizontal axis and by considering the frequency distributions, the researcher — whether s/he is a translator, a bilingual lexicographer or a contrastive linguist — will not only be able to assess what is possible, but also what is likely within two different linguistic systems; specific appropriateness to context will be evaluated against the evidence, the full value of a translator’s own chosen or inadvertent deviations from the norm can be assessed against the range of variations present in the L2 corpus. Issues that could not really be addressed before the advent of corpora because of the need for a large amount of evidence — cumulative connotational tendencies or specific register characteristics, for example — can now be observed and become tangible, often being simply identified by alphabetising the context of a word. These points become very relevant when one considers the implications of translating from and into one’s own mother tongue. The process leading from formal patterning to function and vice versa can be related to the process of decoding and encoding in language. In translation the norm is for the translator to translate — that is to encode — into his/her own mother tongue where it is assumed s/he can be more sensitive to the demands of appropriateness. With corpus evidence at hand and with a methodology to identify systematically the


relevant lexical and grammatical profiles of a word or expression and relate them to connotational weight and pragmatic function, this approach will reduce the gap existing between translating from and into one’s own mother tongue. Even when dealing with a language other than their own mother tongue translators will be able to identify tangibly the norm and the range of variation from it and make choices in the light of that evidence.
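The remark about alphabetising the context of a word can be made concrete with two sort keys. This is a minimal sketch; the helper names `sort_by_right` and `sort_by_left` are assumptions, and the three lines are the complete fork out citations quoted earlier in the chapter.

```python
# Concordance lines as (left co-text, node, right co-text) triples.
lines = [
    ("several hundred people, ready to", "fork out",
     "the Pounds sterling 8 (Dollars 12.50) admission"),
    ("means losing medical benefits and having to", "fork out",
     "for expensive child-care"),
    ("the Germans and the Japanese will be wise to", "fork out",
     "even if America profits."),
]

def sort_by_right(concordance):
    """Alphabetise by the right co-text, reading outwards from the node."""
    return sorted(concordance, key=lambda line: line[2].lower())

def sort_by_left(concordance):
    """Alphabetise by the left co-text, starting with the word nearest the node."""
    return sorted(concordance,
                  key=lambda line: list(reversed(line[0].lower().split())))

for left, node, right in sort_by_right(lines):
    print(f"{left:>45}  {node}  {right}")
```

Sorting to the right groups repeated continuations (for example several lines beginning with for), while sorting to the left groups what immediately precedes the node, which is where patterns such as having to, ready to and wise to line up.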

5. Navigating across English and Italian: The expression in (the) case (of)

Sections 5.1 and 5.2 below propose an analysis of two expressions that incorporate the words case and caso in order to introduce a circumstantial element. Section 5.3 will present a third example, where the conjunction in case will be compared with the Italian se per caso. I will use two sets of general corpora, one of English and one of Italian, which at the time of writing were the best I could access in terms of comparable corpora, although not explicitly put together according to the same criteria.9 Step two, positing a prima facie translation equivalent, which should ideally make use of a translation corpus, will rely here on standard reference works and the linguist’s experience and intuition. The findings reported below are to be taken as indicative of the methodological steps proposed, but they would still need to be explored further and validated in the light of more exhaustive evidence. The citations discussed below are reported in order to illustrate patterning and they represent a reduced sample of the overall concordance analysed from the corpus.

5.1 From in the case of to nel caso di

The expression in the case of is, from a grammatical point of view, a complex preposition introducing what Halliday called a circumstance of matter (1985: 142). Our first step will be to consider its co-text and analyse it into its formal lexico-grammatical constituents. Looking at the right co-text of the concordance below, we find a noticeable presence of the definite article the.10 This, coupled with the other strong pattern — the presence of proper names — points to a strong function related to specificity. The function here is to present individual examples, considered for their particular characteristics.

of subsidies can be illustrated in the case of Australia, where the est
on shifts in values. As we shall see in the case of London’s motorways, the
period is likely to be lengthier than in the case of Spain, because of the
end in itself. This is especially so in the case of we experiments which can
even a reasonable thing to assume; and in the case of relatively minor ills
awakening of enlightened optimism in the case of the Liberals, and the
allowed myself to break this rule in the case of the USSR — the data
an expert witness on the truth drug in the case of the Boston strangler.
children and primitive artists, but in the case of the caricaturist the
not be able to perform efficiently. in the case of the distance runner

The semantic preference associated with this is very varied: people alternate with countries, tangible objects with less tangible ones. In terms of semantic prosody we are not associating a particular evaluation with the instance presented. We could perhaps see the introduction of specificity as the ultimate function of this complex preposition. To understand better the neutrality attached to the specific cases introduced by in the case of it might be interesting here to open a brief parenthesis and consider the collocational profile of a grammatically parallel expression in English, namely in the event of, extracted from the same corpus. For lack of space, I will not present the whole concordance here, only a few examples of the collocates: unavoidable nationalisation, a major disaster, a breach of the rules, great national emergency, a war, company failure, hostilities, trouble, etc. The negative semantic prosody attached to the collocates of this expression is very strong and regular, so much so that even the only neutral word — an election — is turned negative by what follows: … of a politically hostile party. The Italian prima facie translation equivalent of in the case of posited in stage two is nel caso di and stage three will go through the same process of deconstruction in the Italian concordance, identifying the formal patterning present in the co-text:

peggiorativa, come, per esempio, nel caso degli “homines novi”, uomini
ambienti simulanti l’ acqua di mare nel caso degli acciai superferritici e le
del rumore termico) nel caso degli algoritmi a minima
formaggio. Diverso è il nel caso dei bambini. Poiché il loro
il discorso si riproduce, altre volte come nel caso del terzo sonetto vediamo
la collaborazione del paziente: nel caso del fumo, deve voler smettere.
l’ approccio più diretto, almeno nel caso del carcinoma midollare della
olidaristiche e corporative come nel caso del “Lord Spleen” di Giovanni
rambe le eventualità si presentano nel caso dell’ “Orlando furioso”,
informazioni di tipo diagnostico nel caso della patologia neoplastica

Here we note the merging of the preposition di with the definite article which gives rise to del, dell’, dei, degli, delle as substitutable for di. The function of specificity is very obvious because this merging between the preposition and the article is present in all the instances. In the right co-text we find nouns which, as in the English material, show quite a lot of variation with no strong collocational pattern. In terms of semantic preference it is interesting to note two rather prominent areas. Firstly, the area of technical and scientific terminology (acciai superferritici, algoritmi, carcinoma midollare, amminoacido, patologia neoplastica, etc.) which accounts for 31% of the instances; secondly the area of literary analysis (“homines novi”, terzo sonetto, “Lord Spleen”, l’Orlando Furioso, etc.) which accounts for 21.5% of the instances. At the level of semantic prosody again we could say that a fairly objective function of specificity is the only identifiable one, as in the English equivalent. We have now established a first set of translation equivalents. The correspondence is not only between two multi-word units which incorporate the same lexical word case/caso, or indeed between grammatical functions. The equivalence has been evaluated at the level of functionally complete units of meaning. The evidence of a divergence in semantic preference and/or prosody will be of great help to the translator, for example, and will allow him/her to avoid those instances of rather infelicitous ‘translationese’ (Gellerstam 1986) which may stem from an involuntary contravening of the unstated semantic preference. The case discussed above is, in spite of the differences in semantic preference, a fairly felicitous case of equivalence. One word of warning. The difference in semantic preference apparent between the English and the Italian in the concordance discussed above needs to be confirmed, to ensure that it is not the result of an imbalance in the selection of the texts included in the corpus and therefore skewed towards a language variety or reflecting a specific topic. Semantic prosodies are often linked to language varieties and seem to become more systematic and restricted the more specific and restricted the variety is.
This point raises the issue of representativeness of a given corpus and, in our case here, the comparability of the L1 and L2 corpora. Unfortunately, it is still often the case that an analyst will be presented with a set of L1/L2 “comparable” corpora as a fait accompli, without any real access to information on the criteria according to which these corpora have been assembled and certainly without any say in corpus design. This is to a certain extent inevitable given the fact that corpora tend to be very large nowadays and beyond the undertaking of a single individual. However, it also means that the user will all too often not be in a position to evaluate the evidence properly for lack of information and will not be able to influence the representativeness of the texts included in the corpus. My position with respect to representativeness is rather pessimistic; I would go along with Leech (1991: 27) in saying that the assumption of representativeness “must be regarded largely as an act of faith” as at present we still have no way of ensuring it or evaluating it objectively, although a lot of work is being done in this direction. I believe it is of paramount importance that the analyst, who will have to judge in the end whether the semantic preferences and prosodies reflect topic-dependency or are inbuilt in the language, be at least able to assess the specific criteria used in corpus building, and access a list of all the texts included in the corpus and information on the sampling criteria adopted. Last but not least, I think we should remember that corpus work — whether monolingual or bilingual — is above all comparative work, where the analyst must never tire of comparing across different varieties, different situations, different languages, different corpora. Only thus will s/he arrive at a balanced statement in language description.

5.2 From in case of to in caso di

From a grammatical point of view in case of is, like in the case of above, a complex preposition introducing, in Hallidayan terms, a circumstance of cause or condition (1985: 140). It is interesting to note that the difference between the two — the first one introducing a circumstance of matter, the second one of cause or condition — is only brought about by the presence or absence of the definite article as an explicit signal of specificity. From the corpus we get some information on frequency: compared with in the case of (550 instances in 20m words) this complex preposition is not very frequent (56 instances in total), a fact that points to its “specialised” function:

man being, and I wanted to be sure in case of a sudden emergency that we gave
ever. This will minimize your loss in case of accident or theft. One pound
of milk, a jar of pureed prunes in case of constipation. Other tips.
mast and sail were constructed in case of engine failure. There were
the place up kept an eye on it in case of further vandalism or moves from
it in polythene kitchen wrap in case of involuntary incontinence, and
be there to pick up the pieces in case of massive calamity. Such
er not. One of us should be here, in case of more Lady Alices, do n’t you
don’t,’’ and closed her eyes in case of something terrible. Nothing fata
Under-ripe berries were preferred in case of transport hold-ups, and were
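The frequency comparison above can be normalised to occurrences per million words. The sketch assumes that both counts come from the same 20-million-word corpus, which the text implies but does not state outright; `per_million` is a hypothetical helper name.

```python
def per_million(count, corpus_size):
    """Normalised frequency: occurrences per million running words."""
    return count * 1_000_000 / corpus_size

CORPUS_SIZE = 20_000_000  # assumed to apply to both expressions

in_the_case_of = per_million(550, CORPUS_SIZE)
in_case_of = per_million(56, CORPUS_SIZE)
ratio = 550 / 56  # how many times more frequent "in the case of" is

print(f"in the case of: {in_the_case_of:.1f} per million words")
print(f"in case of:     {in_case_of:.1f} per million words")
print(f"ratio:          {ratio:.1f} to 1")
```

Normalised figures matter mainly when the two corpora being compared differ in size, as is often the case with L1 and L2 comparable corpora; the raw ratio is enough when both counts come from one corpus.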

Starting now with the first step focusing on formal patterning, we identify some of the most frequent collocates as repeated co-occurrences in the right co-text: accident/s, attack, emergency, fire, trouble, need and difficulty are among the more frequent. This collocational profile already shows a strongly negative semantic preference for what could be termed ‘disaster areas’ and this is reinforced by other words which belong to the same semantic field:

a burn, danger, immediate need, an urgent telegram, resistance, renewed difficulty, distress, showers, overbalancing, shortages, further questioning, loss, massive calamity, death, war, problems

What we have called semantic prosody, the overall function of the expression, could here be termed ‘provision for disaster’. This prosody is so strong that the only instance where the noun following in case of is rather neutral, viz. One of us should be here, in case of more Lady Alices, don’t you think?

is understood along the same lines, and the possibility of a “Lady Alice”, or someone like her, appearing at the door is interpreted as unappealing to say the least. The statement here carries an obvious ironical intention and the clash in semantic prosodies can be seen as the formal realisation of this ironical intention (Louw 1993).11 Having thus identified an extended unit of meaning in English, the second step will posit, as a prima facie equivalent, the Italian in caso di. As a start we can say that in caso di has the same grammatical function as the English counterpart, and again, in terms of frequency, it is rather rare (the Italian corpus contains 83 instances in total, compared with 319 nel caso di). The third step will de-generalise the prima facie equivalent into its formal components:

posporre l’ intervento dell’ esercito. In caso di calamità naturali dirige
questa castagna e non aprirla se non in caso di gran necessità. Cammina
riguarda l’ arruolamento volontario in caso di guerra. A sedici anni Giovan
avrebbe preso e come le avrebbe usate in caso di incendio. Accennava i movime
fossero più pericolose di quelle grandi in caso di incidenti a catena a velocità
adeguati alle loro esigenze di vita in caso di infortunio, malattia, invalidità
epilessia che garantisce una copertura in caso di morte o invalidità permanente
giova ricordare che il sistema bancario in caso di riduzione del personale , non
con Mitterrand alla presidenza poiché in caso di successo non sarebbe “ né 12
meglio avere i capelli super-puliti. In caso di un invito ultimo-momento,

At the collocational level we find repeated instances of the words necessità, guerra and urgenza. The negative semantic preference for ‘disaster areas’ is clearly noticeable and the other words present in the right co-text emphasise this preference for the negative:






bisogno, contrasto, dissenzo, disubbidienza, dubbio, riduzione del personale, siccità, brusca frenata, cellulite, controversie, emergenze, eruzione, impedimento permanente, rilascio accidentale, caduta, conflitto, debolezza organica, giudizio negativo, malattie, risposta negativa, urgenza

The semantic prosody can again be termed ‘provision for disaster’ as the extended unit of meaning has the overall function of hypothesising the possibility of something unpleasant happening and offering a guarded damage-limitation statement about it. It is interesting to note that — as in the instance with more Lady Alices — we find here an example that is apparently neutral: con Mitterrand alla presidenza poiché in caso di successo non sarebbe né servito a …

This instance, though, is seen to fit the pattern once we look at the wider context, where it becomes apparent that the spokesman who is talking is conservative and, for him, the success of Mitterrand would be certainly perceived as a major disaster. The equivalence thus established between in case of and in caso di takes account of the general semantic field and semantic prosody and can be said to be satisfactory at the wider functional level. In the light of evidence from larger corpora it will be possible to validate a stronger collocational profile which would certainly provide a welcome guide to appropriateness for the researcher who works across languages.

5.3 From in case to se per caso

Next I would like to consider briefly the conjunction in case.13 The conjunction works at the level of the clause and therefore it can be expected that the local patterning in terms of collocation will be less strong than with the prepositions. In my data there is no specific collocational restriction. At the colligational level one notices a strong presence of personal pronouns: I, you, he, she, we, which point to a certain interactiveness and colloquialness of the texts14 on the one hand and an association with the narrative genre on the other. The element of colloquialness seems to be confirmed by the frequent use of phrasal verbs, lay ahead, put off, turns out, catch up, mixed in, brushed against, etc. In terms of semantic preference there is again a lot of variation, but a number of words and expressions certainly confirm the feeling of informality. In most cases the analysis requires a wider context to identify the functionally complete unit and I will not present the full concordance of this expression for lack of space. The citations below can help identify the overall semantic prosody:

He looked at her now with alarm, in case she might do the room an injury...
it is better to be bribed than to bribe in case something goes wrong down the line...
Claude, Fernet and I will be there too in case the wolves start snapping...
Tear gas, small arms, in case they won’t come back by themselves...
He poured himself another in case the abbess forgot to suggest it...
I left a bundle in my bed in case anyone looked but they were both snoring...
avoid passing them too close to the male’s body in case they brush against him accidentally...
He didn’t tell you how to get in touch with him in case I should arrange another party?
The pistol’s right there beside the bed, just in case the pimp has an attack of amnesia...
cannibals are perfectly nice people and just in case you are wondering what this team is doing in the bush...

In these instances the tone, and what we have called the semantic prosody, is clear: there is an element of guarded damage limitation and provision for disaster (as with in case of), but the speaker/writer/narrator also seems to be smiling, half ironically or ‘tongue-in-cheek’, at some situation and sharing with his/her reader a knowing wink. A prima facie translation equivalent in Italian is se per caso:

           Una torre senza ragni è sospetta:  se per caso  ne trovaste una, fuggite via subito...
           E soprattutto non cadermi addosso  se per caso  c’è un urto violento...
            si guardavano intorno per vedere  se per caso  il lupo li seguiva, ma lui ovviamente...
calzati con le scarpe da footing infangate e  se per caso  non si posino sul divano rivestito di un materiale
             Venne la ragazza per chiedergli  se per caso  il cibo non fosse stato cucinato...
                   chiede notizie di Mozart,  se per caso  lei conosce la sua “Marcia Turca”...
                          decise di scoprire  se per caso  si erano trovate a New York nello stesso periodo...
    una pentola, dico una, con il coperchio?  se per caso  gli riesce a convincere una a salire, pardon, a...
                 Attraversava il sentiero; e  se per caso  un qualche grosso scarabeo zuccone...
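Concordances such as the two above are normally produced by a keyword-in-context (KWIC) routine. The following is only a minimal sketch of that general technique, not the software used for this study: the function names, the context width and the crude filter that sets aside in case of are all assumptions made for illustration, and the second sentence of the sample is invented.

```python
import re

def kwic(text, node, width=45):
    """Return (left, node, right) triples for each occurrence of `node`.

    `width` is the number of characters of context kept on each side
    (an arbitrary choice for this sketch).
    """
    # Word boundaries stop the node from matching inside longer words.
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()].strip()
        right = text[m.end():m.end() + width].strip()
        hits.append((left, m.group(0), right))
    return hits

def conjunction_only(hits):
    """Crude, assumed filter: drop hits where the node is followed by 'of'."""
    return [h for h in hits if not h[2].lower().startswith("of ")]

def format_hits(hits, width=45):
    """Right-align the left context so that the node forms a column."""
    return [f"{left:>{width}}  {node}  {right}" for left, node, right in hits]

# First sentence from the citations above; the second is a made-up example.
sample = ("The pistol's right there beside the bed, just in case the pimp "
          "has an attack of amnesia. In case of fire, use the stairs.")

for line in format_hits(conjunction_only(kwic(sample, "in case"))):
    print(line)
```

Sorting the resulting triples on the first word to the left or right of the node is the usual next step when looking for collocational and colligational patterning.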

In the citations above, again we find that the collocational patterning is not strong, but the verbs vedere (‘to see whether’) and chiedere (‘to ask whether’) are often present in the left context of the conjunction. Other verbs which occur only once, but reinforce a general semantic preference for ‘discovering whether’, are scoprire and spiare, often in the context of a fairy tale (see also the use of words such as lupo/lupa, ‘wolf’ and ‘she-wolf’). Perhaps the most noticeable difference between the English and the Italian, though, is the fact that in case is used when someone is mentioning a possible future situation or hypothetical event as a reason for doing something. In Italian this reference to a future/hypothetical event is not there, so we have instances like si guardavano intorno per vedere se per caso il lupo li seguiva, where the best translation would perhaps be a more neutral ‘… to see if by any chance the wolf was following/had followed them …’, which would allow a reference to a past situation. At the level of the semantic prosody we find some instances which share the tone of ‘tongue-in-cheek’ with the English instances, like the first one above: ‘A tower without spiders is suspect; se per caso you were to find one you should run away immediately …’. Others, though, do not seem to take this light, semi-ironical attitude to the subject presented. In the example “she decided to discover se per caso they had been in New York at the same time …”, we no longer have a conditional clause but an indirect interrogative clause; here the writer is really talking of a fairly neutral possibility, and simply adding to it the element of discovery. A translator would probably opt for a translation that does not include in case, which is consistently associated with irony. We have to conclude that the prosody which was remarkably regular in English is not as systematic in Italian. The element of tongue-in-cheekness remains as a possible choice at the paradigmatic level, allowing the linguist or the translator to identify se per caso as an equivalent of in case when the time reference allows it. However, the type of special effects identified by Louw (1993), such as irony or a hidden attitudinal stance, which depend on a clash with “a sufficiently expected background of expected collocations” (ibid.), cannot be reliably identified in the Italian data.
A translator working with English as his/her TL would have to be aware that the neutral possibility introduced by se per caso could become tinted with ironic connotations when introduced by in case. In terms of functionally complete units of meaning we have analysed here what seemed a possible translation equivalent, but the function of the two expressions has been shown to differ quite a lot after all. At the grammatical level the difference in time reference is quite noticeable and could give rise to mistakes in the translation. At the level of the semantic prosody it could generate a trap for the unaware translator because the correspondence is similar but not as systematic.


6. Conclusion

The approach to establishing functional equivalence proposed in this article, whether for contrastive or translation purposes, advocates the use of comparable corpora in stages one and three, and a translation corpus in stage two. The use of comparable (or even relatively comparable) corpora is seen as an absolute necessity in order to establish equivalence, and it is argued that it would be impossible to identify functionally complete units of meaning reliably without the help of the evidence from the corpus. I maintain that it is also necessary to use a translation corpus to posit a set of prima facie equivalents, but such corpora are still not widely available and I have not been able to access one for this study. As a result this study only partially exemplifies the model it advocates, but I hope it points a way forward to a methodology that will bring together the translator’s experience (as from the translation corpus) and the input, the richness and the variability of two natural languages (as from the comparable corpora).

This methodology is offered as potentially useful to all researchers working across languages: contrastive linguists, bilingual lexicographers and translators alike. Of course the input of the translators is given a certain priority, and it is more directly channelled into the procedure at the level of the translation corpus.

What I have tried to show in this study is a way of establishing and evaluating the comparability of units of meaning across languages which takes into account language events which, in Firth’s words, are “typical, recurrent and repeatedly observable” (1957: 35). The assumption that words do not live in isolation but in strict semantic and functional relationship with other words has led me to posit the notion of functionally complete units of meaning. To sum up, we can characterise them in this way:

1. They can be identified by looking at patterns of co-selection in the context of a word or expression. They involve collocational (lexical) and colligational (grammatical) choices and therefore cannot be defined solely in lexical or grammatical terms. They also involve a semantic preference, realised by words which belong to the same semantic field, and a specific semantic prosody at the pragmatic and connotational level.
2. They are syntagmatic units in that they interrelate with other words and, through a process of co-selection, they form a multi-word unit which becomes available as a single choice on the paradigmatic axis.15
3. Only when these multi-word units are functionally complete do they become available as translation equivalents or as comparable units of meaning between two languages.

I hope to have demonstrated that the information gathered from corpus evidence, simply by observing the repeated patterns of co-selection, cannot be found in standard works of reference. The examples of multi-word units chosen here share the same lexical core, the word case. What accounts for their varying degrees of correspondence in terms of their semantic preference and semantic prosody cannot be severed from their very individual pattern of co-selection. It would not make sense to attempt a translation without first being fully aware of that specific semantic preference and that specific semantic prosody. In the examples discussed in the context of English and Italian we have found that the match was good in most respects. But this should never be assumed, and a comparison with other languages will indeed prove this point.

Notes

1. See Tognini Bonelli (2001) for an account of the integration of this approach with the building up of a network of translation equivalents.
2. This is not always the case. Some models which recognise a lexicon and a grammar attempt to integrate the two (for an overview, see Faber & Mairal Usón 1999). One may wonder, however, whether the theoretical positing of a dichotomy between lexis and grammar does not in itself affect the actual appreciation of the strict interconnections and overlaps between the two.
3. Stubbs (1993: 8–9) explicitly points out that much linguistics is based on invented sentences. In addition, often only a very small number of invented sentences are discussed and he goes on to warn us that “it is easy to forget or ignore how little data, either invented sentences or real texts, is actually analysed in the most influential literature in twentieth century linguistics”. The linguists he quotes in this context range from Saussure, Bloomfield, Chomsky and Lyons to Austin and Searle; but even Firth and Halliday turn out to analyse very little text.
4. Sinclair (1991 and ff.) is fond of saying that a corpus can prove anything and the opposite of anything, and a theory that can account for corpus evidence specifically is needed in order to do justice to the data.
5. Sinclair defines the idiom principle or phraseological tendency and points out that “The principle of idiom is that a language user has available to him or her a large number of semipreconstructed phrases that constitute single choices, even though they might appear to be analysable into segments” (1991: 110).
6. The view of this second stage in the translation process I am proposing is based on Sinclair’s position on the function of reporting structures in discourse (1981). Applied to translation, at the extra-linguistic level, this strategy will account for the role the translator has in (1) assessing his bridging task between SL and TL, source audience (i.e. the audience of the original writer, in the SL) and target audience (i.e. the audience of the translator in the TL), as well as the specific genre and function of the text (narrative, technical, persuading, advertising, etc.); (2) taking into account the correspondences between function and formal realisations across different languages established at the purely linguistic level; (3) reporting the original message (as negotiated in the interaction between source writer and his own audience) to his/her (the translator’s) own target audience.
7. A translation corpus (see below) can be used to shed light on the process of translation itself. For an interesting discussion on the process of translation and the use of corpora see Baker (1996) and (1998). The most common use of a translation corpus, however, remains the access to translations as products where the translated corpora reveal cross-linguistic correspondences and differences that are impossible to discover in a monolingual corpus.
8. Access to a set of two (or more) truly comparable corpora is not always possible. At the time of writing, although some monolingual corpora which claim to be representative exist and are accessible, sets of comparable corpora in different languages are still difficult to obtain.
9. The English corpora used here are the Economist corpus (containing 9.38 million words from the journal of the same name) and the Wall Street Journal corpus (containing 6.36 million words from the journal of the same name). As a general corpus I refer to the Birmingham corpus, which is the original 20-million-word corpus of contemporary English on which the Cobuild project was initially based. These are now part of the holdings of the Bank of English. The Italian corpus I use contains 4.5 million words of contemporary Italian and is a part of the holdings of the Istituto di Linguistica Computazionale at the Università degli Studi di Pisa. I would like to acknowledge here the generosity of all those who provide corpora for research, and in this case Prof. Zampolli, Director of ILC, Pisa and Jeremy Clear, Director of Cobuild Ltd.
10. The concordance for in the case of is taken from the Birmingham Corpus of Contemporary English (20 million words).
11. In his seminal article (1993) Louw defines semantic prosody as “a consistent aura of meaning with which a form is imbued by its collocates” and discusses the special effects due to a clash in semantic prosodies: “Evidence is emerging that departures in speech or writing from the expected profiles of semantic prosodies, if they are not intended as ironic, may mark the speaker’s real attitude even where s/he is at pains to conceal it” (ibid.: 157).
12. “Un invito-ultimo-momento” is an invitation at the last moment. In Italy some people are rather sensitive to this because it implies not really having been planned in at the party and having been invited only because someone has called out or, worse, because the host has suddenly realised that 13 people were going to sit at the table. This instance goes along with the trend of in case + something unpleasant taking place.
13. I found a total of 532 instances of the conjunction in case in the Birmingham Corpus (20 million words).
14. Biber (1994) identifies text types, in contrast to register and genre, on the basis of shared linguistic co-occurrence patterns. Among the linguistic features analysed to identify text types one is pronouns; Biber points out that they are “relatively interactive and colloquial in communicative function” (ibid.: 389).
15. This is the case even when a very simple collocational pattern is the only expansion on the core word: consider the example of fork, the meaning of which was differentiated, at the collocational level, by a different pattern of co-selection: knife and fork on the one hand and fork in the road on the other.

References

Baker, M. 1993. “Corpus linguistics and translation studies”. In Baker et al. (eds), 233–250.
Baker, M. 1996. “Corpus-based translation studies: the challenges that lie ahead”. In Terminology, LSP and Translation: Studies in Language Engineering, in Honour of Juan Sager, H. Somers (ed.), 175–186. Amsterdam and Philadelphia: Benjamins.
Baker, M. 1998. “Réexplorer la langue de la traduction: une approche par corpus”. Meta 43: 480–485.
Baker, M., Francis, G. and Tognini Bonelli, E. (eds) 1993. Text and Technology. In Honour of John Sinclair. Amsterdam and Philadelphia: Benjamins.
Biber, D. 1994. “Representativeness in corpus design”. In Current Issues in Computational Linguistics in Honour of Don Walker, A. Zampolli, N. Calzolari and M. Palmer (eds). Linguistica Computazionale IX.X. Giardini Editori e Stampatori in Pisa and Kluwer Academic Publishers.
Faber, P. B. and Mairal Usón, R. 1999. Constructing a Lexicon of English Verbs. Berlin and New York: Mouton de Gruyter.
Firth, J. R. 1957. Papers in Linguistics 1934–1951. London: Oxford University Press.
Firth, J. R. 1968. “A synopsis of linguistic theory: 1930–55”. In Selected Papers of J. R. Firth 1952–59, F. R. Palmer (ed.), 168–205. London and Harlow: Longmans.
Francis, G. 1993. “A corpus-driven approach to grammar”. In Baker et al. (eds), 137–156.
Gellerstam, M. 1986. “Translationese in Swedish novels translated from English”. In Translation Studies in Scandinavia, L. Wollin and H. Lindquist (eds), 88–95. Lund: CWK Gleerup.
Halliday, M. A. K. 1985. An Introduction to Functional Grammar. London: Edward Arnold.
Halliday, M. A. K. 1992a. “Language theory and translation practice”. In Rivista Internazionale di Tecnica della Traduzione, No. 0, 27–58. Udine: Campanotto Editore.
Halliday, M. A. K. 1992b. “Language as a system and language as an instance: the corpus as a theoretical construct”. In Directions in Corpus Linguistics, J. Svartvik (ed.), 61–77. Berlin and New York: Mouton de Gruyter.
Hunston, S. and Francis, G. 2000. Pattern Grammar: a Corpus-driven Approach to the Lexical Grammar of English. Amsterdam and Philadelphia: Benjamins.
Johns, T. 1991. “Should you be persuaded. Two samples of data-driven learning materials”. In Classroom Concordancing. ELR Journal 4: 1–16. University of Birmingham.
Louw, B. 1993. “Irony in the text or insincerity in the writer? The diagnostic potential of semantic prosodies”. In Baker et al. (eds), 157–176.
Neubert, A. 1985. Text and Translation. Leipzig: VEB Verlag Enzyklopädie.
Nida, E. 1964. Towards a Science of Translating. Leiden: J. Brill.
Sinclair, J. M. 1981. “Planes of discourse”. In The Two-fold Voice: Essays in Honour of Ramesh Mohan, S. N. A. Rizvi (ed.), 70–89. Salzburg: University of Salzburg.
Sinclair, J. M. (ed.) 1987. Looking Up: an Account of the COBUILD Project in Lexical Computing. London: Collins.
Sinclair, J. M. 1991. Corpus Concordance Collocation. Oxford: Oxford University Press.
Sinclair, J. M. (ed.) 1996. Corpus to Corpus. Studies in Translation Equivalence. Special issue of the International Journal of Lexicography 9 (3).
Sinclair, J. M. 1996. “The search for units of meaning”. TEXTUS 9: 75–106.
Sinclair, J. M. 1998. “The lexical item”. In Contrastive Lexical Semantics, E. Weigand (ed.), 1–24. Amsterdam and Philadelphia: John Benjamins.
Stubbs, M. 1993. “British traditions in text analysis”. In Baker et al. (eds), 1–33.
Tognini Bonelli, E. 1996a. “Towards translation equivalence from a corpus linguistics perspective”. In Sinclair (ed.), 197–217.
Tognini Bonelli, E. 1996b. Corpus Theory and Practice. TWC Monographs, Birmingham: TWC.
Tognini Bonelli, E. 2000. “Il corpus in classe: da una nuova concezione della lingua a una nuova concezione della didattica”. In Linguistica e Informatica: Corpora, Multimedialita’ e Percorsi di Apprendimento, R. Rossini Favretti (ed.), 93–108. Roma: Bulzoni.
Tognini Bonelli, E. 2000. “Things that can and do go wrong in language teaching: Revisiting ‘the seven sins’. In the light of corpus evidence”. Linguistica e Filologia 11.
Tognini Bonelli, E. 2001. Corpus Linguistics at Work. Amsterdam and Philadelphia: Benjamins.
Viaggio, S. 1992. “Contesting Paul Newmark”. In Rivista Internazionale di Tecnica della Traduzione, No. 0: 27–58. Udine: Campanotto Editore.



Causative constructions in English and Swedish
A corpus-based contrastive study

Bengt Altenberg

Background

High-frequency verbs are often problematic for foreign language learners. The reason is that, while they tend to express basic universal meanings and consequently have equivalents in most languages, they have also undergone various meaning extensions resulting in a high degree of polysemy and language-specific uses (cf. Viberg 1996). As a consequence, superficial cross-linguistic similarities often conceal treacherous differences. An interesting example of this was revealed in a recent study by Altenberg and Granger (2001) of the lexical and grammatical patterning of the verb make in the International Corpus of Learner English, which showed that French-speaking and Swedish EFL learners deviated in interesting ways from native American students’ use of the verb.1 While both learner groups underused (and misused) ‘delexical’ make (e.g. make a decision, make a point), they were clearly differentiated in their treatment of causative make (e.g. make sb happy, make sb believe sth). As shown in Table 1, the French-speaking learners significantly underused causative make with adjective and noun complements (e.g. make sth possible, make sb a star), whereas the Swedish learners revealed an equally significant overuse of causative make with adjective and verb complements (e.g. make sth easier, make sb understand).

Table 1. Causative uses of make by EFL and native US students

Complement     FR     SW     US
Adjective      98    179    130
Verb           67    125     80
Noun           10     23     26
Total         174    327    236

Another interesting finding was that the learners’ treatment of causative make seldom resulted in clear errors but in a number of rather clumsy constructions, suggesting that the learners tended to opt for a semantically and grammatically ‘decomposed’ make + object + complement pattern in cases where a native writer would prefer a ‘synthetic’ causative verb alternative (e.g. make people come closer instead of bring people closer):

(1) So a recession can actually make people come closer to each other (bring people closer)
(2) The most difficult thing about this is … to make its inhabitants open their eyes (open its inhabitants’ eyes)
(3) ... the differences are made to vanish (are eliminated)
(4) There will always be pressure from the outside to make us change (change us)

From a Swedish perspective, these results raise several interesting questions. How can the Swedish learners’ overuse of causative make be explained? Do they overgeneralise a dominant English pattern (intralingual influence) or are they affected by transfer from Swedish (interlingual influence)? A purely intralingual explanation is not very plausible, since the Swedish and French-speaking learners display fundamentally different tendencies: if intralingual influence had been the main conditioning factor, we would expect both learner groups to behave in the same way. This leaves interlingual influence, i.e. transfer from L1, as a more likely explanation.

Causatives in English and Swedish

The assumption that the Swedish learners’ overuse of causative make may be the result of transfer from Swedish is intuitively supported by the similarity between the basic causative constructions in the two languages. As shown in Table 2, English causatives with make can be divided into three main ‘analytical’ types, as I will call them, depending on whether the complement following the object is an adjective phrase (type A), an infinitive clause (type B) or a noun phrase (type C). Swedish has corresponding constructions with the verbs göra (types A and C) and få (type B).

Table 2. Main causative constructions in English and Swedish

          English                               Swedish
Type A    make + Object + Adjective phrase:     göra + Object + Adjective phrase:
          She made him happy                    Hon gjorde honom lycklig
Type B    make + Object + Infinitive:           få + Object + Infinitive:
          He made her laugh                     Han fick henne att skratta
Type C    make + Object + Noun phrase:          göra + Object + Prep. phrase:
          They made it their home               De gjorde det till sitt hem

Semantically and syntactically the constructions are very similar in the two languages. They are all ‘complex-transitive’ structures (cf. Quirk et al. 1985: 1195) in which the ‘raised’ object and the complement are notionally equivalent to the subject and predication of a related clause which expresses the result of the causative event (cf. Juffs 1996 and Song 1996). The differences are relatively superficial: English has one prototypical causative verb, Swedish has two; the English B construction has a bare infinitive, the Swedish infinitive is preceded by the marker att ‘to’; the complement of the C construction is a noun phrase in English but a prepositional phrase in Swedish. Apart from these analytical constructions, both languages have various other ways of expressing causative relations. For example, in English there are many synthetic causative verbs in which the resulting state or event is fused with the causative meaning of make into a single verb form: make sth fall = fell sth, make sb believe = convince sb. In addition, causative relations can be expressed by verbs other than make, such as cause, force, get, have and let. Moreover, cause-effect relations can be expressed by conjunctions (e.g. because, so that), by adverbial expressions (e.g. because of NP), by verbs (e.g. cause, result in) and in a number of other ways. The same applies in Swedish. Although both languages have various resources to express causative relations, it is reasonable to describe the analytical patterns as the basic or prototypical causatives in the two languages. Since there is a striking cross-linguistic parallelism between these constructions, it is natural to assume that Swedish learners might be tempted to use the semantically and grammatically ‘decomposed’ make + NP + complement pattern even in cases where a native writer would prefer a synthetic alternative, with examples like (1)–(4) as a result. 
However, in the absence of good contrastive descriptions of causative constructions in English and Swedish this can only be a hypothesis. Our knowledge of the relative frequency of various alternatives and of the ‘prototypicality’ of the analytical constructions in the two languages is very limited, and hardly anything is known about the degree of correspondence of the various alternatives across the two languages. The purpose of this study is to investigate these matters and, if possible, to throw some light on the Swedish learners’ overuse of causative make constructions. For this purpose, the learner study will here be supplemented with a contrastive examination of the main causative options available in English and Swedish and their distribution in a parallel corpus of English and Swedish texts.

Aim and material

For practical reasons, I will limit my study to the B construction in the two languages.2 The following questions will be explored:

– How ‘dominant’ are the B constructions in the two languages?
– To what extent are the B constructions retained in translations between the two languages?
– Which are the main causative alternatives and how often are they used?
– How can contrastive data help to explain the Swedish learners’ overuse of English B constructions?

The study is based on the English-Swedish Parallel Corpus (see Aijmer et al. 1996). As shown in Table 3, the corpus consists of 40 English text samples and their translations into Swedish and 40 Swedish text samples and their translations into English. The samples are 10,000–15,000 words in length, and half of them are drawn from fiction, half from non-fiction texts. Within each genre, the source texts from the two languages have been matched as far as possible in terms of purpose, subject matter and register, which means that the corpus can be treated both as a ‘comparable corpus’ and as a ‘translation corpus’ (on this distinction, see Johansson 1998).

Table 3. Size and composition of the English-Swedish Parallel Corpus

                                          Text samples
Direction                           Fiction  Non-fiction  Total   No. of words
Eng. original → Swe. translation       20        20         40      1,043,000
Swe. original → Eng. translation       20        20         40      1,031,000

Method

The composition of the corpus makes it possible to compare the languages in several ways (cf. Aijmer et al. 1996):

(a) Source texts → source texts. By comparing the use of the B constructions in the original English and Swedish texts we can get an indication of their frequency and relative importance in each language.

(b) Source texts → translations. Using the original texts as a starting point and comparing them with the corresponding translations into the other language, we can find out how English causative make is translated into Swedish and how Swedish causative få is translated into English. This will give an indication of the main translation equivalents used to render the B constructions in each language and the relative importance of these equivalents.

(c) Translations → source texts. Using the translations as a starting point and comparing them with the corresponding source texts in the other language, we can find out which Swedish source constructions have ended up as causative make constructions in the English translations and which English source constructions have ended up as causative få constructions in the Swedish translations. This ‘reversed’ approach will give an indication of the range of source constructions that have been used as a point of departure for the B constructions in the target language. Studying the translations in this direction will be a useful supplement to approach (b) and serve as a check on possible translation effects (cf. Johansson 1998).
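Approaches (b) and (c) both amount to tallying what corresponds to a given construction in aligned text pairs, starting either from the originals or from the translations. The sketch below illustrates that bookkeeping only; it is not the procedure or software used with the corpus. The record layout (source language, English construction label, Swedish construction label), the function names and the toy data are all assumptions introduced for the example.

```python
from collections import Counter

def tally_equivalents(pairs, node, node_lang, node_in):
    """Count the constructions that correspond to `node` in the other language.

    pairs     -- iterable of (source_lang, english_label, swedish_label)
    node      -- construction to start from, e.g. "make" or "få"
    node_lang -- "en" or "sv": the language the node belongs to
    node_in   -- "source" for approach (b), "translation" for approach (c)
    """
    counts = Counter()
    for source_lang, en, sv in pairs:
        node_label, other_label = (en, sv) if node_lang == "en" else (sv, en)
        node_is_in_source = (source_lang == node_lang)
        if node_label == node and node_is_in_source == (node_in == "source"):
            counts[other_label] += 1
    return counts

def as_percentages(counts):
    """Convert raw counts to rounded percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total) for label, n in counts.items()}

# Invented toy data, for illustration only.
toy_pairs = [
    ("en", "make", "få"), ("en", "make", "få"),
    ("en", "make", "tvinga"), ("en", "make", "other"),
    ("sv", "make", "få"), ("sv", "make", "other"),
    ("sv", "force", "få"),
]

# Approach (b): how is make in English originals translated into Swedish?
print(as_percentages(tally_equivalents(toy_pairs, "make", "en", "source")))
# Approach (c): which Swedish sources lie behind make in English translations?
print(as_percentages(tally_equivalents(toy_pairs, "make", "en", "translation")))
```

Keeping the two directions in separate tallies is what allows translation effects to be checked: a construction that is frequent as a translation but rare as a source is a candidate for overuse by translators.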

Analytical B constructions in source texts and translations

The relative frequencies of English make and Swedish få in analytical B constructions in the English and Swedish source texts and translations are shown in Table 4.

Table 4. Causative make and få (type B) in source texts and translations (n/100,000 words)

                 Make     Få
Source texts     31.6    16.7
Translations     23.5    40.0






Two striking tendencies emerge from the figures. First, make has a much more dominant position as a causative B verb in the English source texts than få has in the Swedish source texts. This suggests that causative få has greater competition from alternative expressions in Swedish than make has in English. One way of uncovering these alternatives will be to look at the Swedish sources of causative make in the English translations. Second, in the translations this tendency is reversed: causative få is much more common in the Swedish translations than make is in the English translations. This indicates somewhat paradoxically that, whereas the Swedish translators regard få as a natural means of rendering causative B constructions in their translations and tend to overuse it as a result, the English translators display the opposite tendency: they seem to underuse analytical make, evidently preferring other alternatives. Another reason could of course be that its main source — Swedish få — is relatively infrequent in the Swedish original texts. To determine this we shall have to look more closely at the English sources and translations of causative få.
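The two tendencies can be made explicit by dividing each verb’s rate in translations by its rate in source texts. The following is a small sketch under stated assumptions: the rates are those given in Table 4, while the helper names, the choice of a simple ratio as an indicator, and the counts in the normalisation example are illustrative assumptions, not part of the study.

```python
# Rates per 100,000 words, as given in Table 4.
TABLE_4 = {
    "make": {"source": 31.6, "translation": 23.5},
    "få": {"source": 16.7, "translation": 40.0},
}

def per_100k(count, corpus_words):
    """Normalise a raw count to occurrences per 100,000 words."""
    return 100_000 * count / corpus_words

def translation_ratio(rates):
    """Rate in translations divided by rate in source texts.

    Values above 1 point towards overuse in translations, values below 1
    towards underuse (an informal indicator, not a significance test).
    """
    return rates["translation"] / rates["source"]

for verb, rates in TABLE_4.items():
    print(f"{verb}: {translation_ratio(rates):.2f}")

# Normalisation with made-up figures: 50 hits in a 250,000-word subcorpus.
print(per_100k(50, 250_000))
```

On the Table 4 figures the ratio is about 0.74 for make and about 2.4 for få, which is the reversal described above.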

Swedish equivalents of English causative make

To find out more about the causative options in the two languages, let us first examine the Swedish equivalents of English B constructions in the corpus. As mentioned, these can be established by looking at how make has been translated into Swedish and by looking at the Swedish sources of make in the English translations. As shown in Table 5, five main types of equivalents can be distinguished, four involving a causative verb of some kind and one ‘miscellaneous’ category in which the causative relation is expressed in various other ways.

Table 5. Swedish equivalents of English causative make (type B)

Types of Swedish equivalents                 Swedish translations   Swedish sources      Total
                                                N      %               N      %          N      %
a. Congruent construction with få              89     54              41     32        130     44
b. Other causative verb + NP + Vinf/Vfin       31     19              14     11         45     15
c. Causative verb + NP + Adj (type A)           5      3               9      7         14      5
d. Synthetic causative verb                     1      1               8      6          9      3
e. Miscellaneous other constructions           40     24              56     44         96     33
Total                                         166    100             128    100        294    100

The most common Swedish equivalent in the corpus is a congruent B construction with få as a causative verb. It is especially common in the Swedish translations (54%) but accounts for only about a third of the examples in the source texts (32%). This disproportion confirms the picture of Swedish få given in Table 4. The Swedish translators obviously regard analytical få as the prototypical equivalent of the corresponding English constructions, overusing it as a result. However, despite its status as the most natural Swedish equivalent of make, it has strong competition from other alternatives, especially in the Swedish source texts, where the miscellaneous category is very common (44%).

The second most common equivalent (disregarding the miscellaneous category) is the use of a causative verb other than få (15%) appearing either in a congruent construction with an infinitive complement or with a following finite object clause. This type, too, is especially common in the Swedish translations. The following list includes all the verbs of this kind in the corpus (source texts as well as translations):

komma NP + Vinf                   13
tvinga NP + Vinf                   9
göra att + finite clause           6
låta NP + Vinf                     6
se till att + finite clause        4
säga till NP + finite clause       2
be NP + Vinf                       1
göra så att + finite clause        1
ha NP + Vinf                       1
tillhålla NP + Vinf                1
vinnlägga sig om + Vinf            1
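The disproportion between translations and source texts reported for få can be re-derived from the counts in Table 5. The following is a minimal Python sketch and not part of the original study: the row labels are shorthand introduced here, and the chi-square check is an added assumption about how the difference might be tested (the chapter itself reports only percentages).

```python
# Counts read from Table 5: (Swedish translations, Swedish sources).
# Row labels are illustrative shorthand, not the chapter's terminology.
TABLE5 = {
    "få (congruent)": (89, 41),
    "other causative verb": (31, 14),
    "göra + NP + Adj": (5, 9),
    "synthetic verb": (1, 8),
    "miscellaneous": (40, 56),
}


def column_percentages(table, col):
    """Return each row's share of the column total, rounded to whole percent."""
    total = sum(row[col] for row in table.values())
    return {label: round(100 * row[col] / total) for label, row in table.items()}


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


translations = column_percentages(TABLE5, 0)
sources = column_percentages(TABLE5, 1)

# Collapse the table to "få" versus "everything else" in each text type.
fa_t, fa_s = TABLE5["få (congruent)"]
other_t = sum(row[0] for row in TABLE5.values()) - fa_t
other_s = sum(row[1] for row in TABLE5.values()) - fa_s
chi2 = chi_square_2x2(fa_t, other_t, fa_s, other_s)

print(translations)
print(sources)
print(f"chi-square (1 df) = {chi2:.2f}")
```

With one degree of freedom, a value above 3.84 would conventionally be read as significant at the 5% level; whether such a test is appropriate for translation data of this kind is a separate question that the sketch does not settle.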

With the exception of the high-frequency verbs komma ‘come’ and göra ‘make’, these verbs are generally more specific in meaning than få, indicating varying types of coercion, manner and modality (cf. tvinga ‘force’, tillhålla ‘admonish’, låta ‘let’, be ‘ask’). In addition, the choice of verb is determined by selection restrictions, most of them requiring a human subject (e.g. se till ‘see to’, säga till ‘tell’, be ‘ask’, göra så att ‘do so that’). A third, less common, Swedish alternative is to use the causative verb göra followed by an adjective complement instead of an infinitive, i.e. an A construction rather than the B construction (cf. Table 2):





English version:            Swedish equivalent:
make NP feel dizzy (2)      göra NP yr (2) ‘make NP dizzy’
make NP feel better         göra NP bättre till mods ‘make NP better at heart’
make NP feel calmer         göra NP lugnare ‘make NP calmer’
make NP feel cheerful       göra NP glad ‘make NP cheerful’
make NP feel less grim      göra NP mindre tryckt ‘make NP less depressed’
make NP feel uneasy         göra NP underlig till mods ‘make NP uneasy at heart’
make NP feel worse          göra NP olustig ‘make NP uneasy’
make NP look foolish        göra NP löjlig ‘make NP ridiculous’
make NP look handsome       göra NP vackrare ‘make NP more handsome’
make NP look whiter         göra NP vitare ‘make NP whiter’
make NP look real           få NP riktig ‘get NP real’
make NP sweat               göra NP svettig ‘make NP sweaty’

Interestingly, in the great majority of these cases the only difference between the English and Swedish versions is that a copular verb of perception (feel or look) is present in the English version and absent in the Swedish one. In other words, it seems as if these verbs tend to be redundant in Swedish (see note 3). In one case the Swedish verb is få rather than göra (få NP riktig ‘get NP real’). Få is used as an alternative causative A verb to indicate that some degree of effort is involved in the action and that the outcome is successful (cf. Viberg, this volume).

Another rather rare Swedish alternative is to use a synthetic verb conflating the resulting state or event with the causative meaning of make:

English version:                 Swedish equivalent:
make NP eat                      mata NP ‘feed NP’
make NPs differ                  skilja NPs ‘distinguish NPs’
make NP emerge (by washing)      tvätta fram NP ‘wash forth NP’
make NP go further               dryga ut NP ‘make-last NP’
make NP stand                    ställa NP ‘put NP’
make NP stay behind              hålla NP kvar ‘keep NP behind’
make NP think of                 påminna NP om ‘remind NP of’
make NP turn down                dra NP neråt ‘pull NP down’
make NP wet the hair             vattenkamma NP ‘watercomb NP’

As this list indicates, the synthetic alternatives are mainly restricted to cases where the complement verb of the corresponding analytical construction is intransitive (e.g. make NP stand = ställa ‘put’ NP). When the complement verb is transitive, the object has to be incorporated into the synthetic verb in some way (e.g. make NP wet the hair = vattenkamma ‘watercomb NP’, where kamma ‘comb’ implies ‘hair’). This complication may be one of the reasons why synthetic verbs are generally less common as alternatives to B constructions than to A constructions in the corpus (cf. Altenberg 1998).

In addition to these Swedish alternatives, all of which contain a causative verb of some kind, there is a large number of other Swedish variants (called ‘miscellaneous’ in Table 5) in which the causative elements are reorganised grammatically in various ways. These variants are especially common as Swedish sources of analytical English constructions. Three recurrent subtypes can be distinguished in the material:

i. the cause is implied and omitted
ii. the result is expressed in a finite clause and the cause by a different syntactic element
iii. the result is nominalised or replaced by a nominal expression

i. When the cause is unspecified or implied in the context there is often no need to use a causative construction as long as the result is expressed. This is illustrated in (5), where the agentless passive causative verb (is made) in the English original is left out in the Swedish translation, and in (6), where the causative verb in the English translation corresponds to a modal auxiliary expressing obligation (måste ‘had to’) in the Swedish original:

(5) Brand, the hero of the poems is made to say … (RH)
    Brand, diktens hjälte, säger …
    ‘Brand, the poem’s hero, says …’

(6) Dag måste lova att inte föra det vidare. (MG)
    ‘Dag had to promise ...’
    She made Dag promise not to pursue the matter any further.

ii. Generally, however, the cause is specified in both versions but encoded by a syntactic element other than the subject in the Swedish text. A common Swedish strategy (in source texts as well as translations) is to express the causative result in a finite clause and indicate the cause in the form of an adverbial of reason:

English version:                Swedish equivalent:
X makes NP feel like crying     NP blir gråtfärdig av X ‘NP becomes cry-ready of X’
X makes NP feel bad             NP mår inte bra av X ‘NP does not feel well from X’
X made NP groan                 NP jämrade sig vid X ‘NP groaned at X’
X made NP laugh                 NP skrattade åt X ‘NP laughed at X’
X made NP blow up               därför svällde NP ut ‘therefore NP swelled out’
X almost made NP explode        Då höll NP på att smälla av ‘then NP almost exploded’

Something similar is illustrated in the following examples, where the interrogative wh-pronoun (the causative subject) in the English version is rendered by an interrogative adverb (varför ‘why’, hur ‘how’) in the Swedish version:

‘What makes you say that?’                           “Varför säger du det?”
‘What makes you think I don’t know?’                 “Varför tror du inte jag skulle göra det?”
‘What made you think I was looking for him?’         “Hur visste du att det är honom jag letar efter?”

Alternatively, the cause and the result can be expressed in two finite clauses linked by a subordinator indicating result or purpose:

(7) Normally it takes a lot more than that to make me feel outnumbered. (JB)
    I normala fall tarvas det betydligt fler än så för att jag ska känna mig i minoritet.
    ‘so that I shall feel myself in a minority’

iii. Another common Swedish alternative is to nominalise the result and encode it syntactically as the direct object in a monotransitive or ditransitive construction. The ‘causee’ is either implied or expressed as the indirect object:

English version:          Swedish equivalent:
make NP change            åstadkomma förändringar ‘achieve changes’
make NP think of          föra tanken till ‘bring the thought to’
make NP appear to         ge intryck av att ‘give the impression that’
make NP stink             ge NP dåligt rykte ‘give NP a bad reputation’

Alternatively, the result can be rendered by a prepositional phrase indicating the goal or result of the causative event while the ‘causee’ is retained as the direct object of the causative verb. Most of these variants are set expressions in Swedish:

make NP look like a fool     göra NP till åtlöje ‘make NP to ridicule’
make NP go                   hålla NP i gång ‘keep NP in motion’
make NP eat their words      sätta NP på plats ‘put NP in place’
make NP observe rituals      tvinga NP in i ritualer ‘force NP into rituals’
make the money go round      sätta pengarna i rörelse ‘put the money in motion’

The fact that these miscellaneous constructions are more common in the Swedish source texts than in the translations suggests two things. First, despite their formal variation, they represent important causative alternatives in Swedish. At the same time, the Swedish translators obviously find it easier to render them as analytical causatives than to retain them in their translations. As we shall see, the same tendency is evident in the English texts.

.

English equivalents of Swedish få

Let us now reverse the perspective and look at the English equivalents of analytical Swedish B constructions with få. Table 6 shows the main English types of equivalents used either as translations of Swedish få or as English sources of få in the Swedish translations. Table 6. English equivalents of Swedish causative få (type B) Types of English equivalents

a. b. c. d.

English translations

English sources

Total

0N

0%

0N

0%

0N

0%

040

049

088

043

128

044

Congruent construction with make Other causative verb + NP + Vnfin/Vfin Synthetic causative verb Various other constructions

023

028

038

018

061

021

005 013

006 016

030 051

014 025

035 064

012 022

Total

081

100

207

100

288

100
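The percentages in Table 6 can be recomputed from the raw counts, and the overall share of congruent make can be set against the overall share of congruent få in Table 5 (130 of 294). The snippet below is only an illustrative check; the variable names and row labels are introduced here and do not come from the study.

```python
# Counts read from Table 6: (English translations, English sources).
TABLE6 = {
    "make (congruent)": (40, 88),
    "other causative verb": (23, 38),
    "synthetic verb": (5, 30),
    "various other": (13, 51),
}

# Totals for congruent få taken from the Total column of Table 5.
FA_CONGRUENT_TOTAL = 130
TABLE5_GRAND_TOTAL = 294


def share(count, total):
    """Share of `count` in `total`, rounded to whole percent."""
    return round(100 * count / total)


make_total = sum(TABLE6["make (congruent)"])
grand_total = sum(t + s for t, s in TABLE6.values())

make_share = share(make_total, grand_total)
fa_share = share(FA_CONGRUENT_TOTAL, TABLE5_GRAND_TOTAL)

print(f"make as equivalent of få: {make_total}/{grand_total} = {make_share}%")
print(f"få as equivalent of make: {FA_CONGRUENT_TOTAL}/{TABLE5_GRAND_TOTAL} = {fa_share}%")
```

Both shares round to the same whole percentage, although the underlying fractions are not identical.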

The most common English equivalent is a congruent analytical construction with make. Hence, make can indeed be described as the main English equivalent of Swedish få and the picture of make and få as mutually corresponding causative B verbs is confirmed. In fact, the relative frequency of make as an equivalent of få in the English translations and source texts is exactly the same (44%) as that of få as an equivalent of make in the Swedish texts (cf. Table 5). The proportion of make is slightly higher in the English translations (49%) than in the English source texts (43%), which suggests that the translators regard it as a particularly natural and handy alternative and tend to overuse it as a result. This contradicts the picture given in Table 4 where make appears to be underrepresented in the English translations. However, the higher proportion of analytical make in the English translations demonstrated in Table 6 clearly indicates its attractiveness to the translators, and even if the increased use is not so dramatic as that of få in the Swedish translations (cf. Table 5), the term ‘overuse’ seems justified in both cases.

Yet, despite its favoured position as the most common alternative, the relative frequency of make does not reach 50% even in the translations. Just like få in Swedish, English make has strong competition from other causative variants.

The second most common English alternative (disregarding the miscellaneous category) is to use a causative verb other than make, followed either by a non-finite clause or, exceptionally, a finite object clause. As shown in Table 7, these verbs are especially frequent in the English source texts, but their proportion is in fact higher in the translations, where they account for no less than 28% of the examples. This means that the English translators tend to rely rather heavily on a smaller set of causative verbs in their renderings of Swedish få.

Table 7. Other causative English verbs

Type of causative verb             English translations   English sources   Total
get NP + Vinf                               9                    7            16
cause NP + Vinf                             5                    4             9
lead NP + Vinf                              2                    6             8
ensure (that) + finite clause               3                    0             3
set (off) NP + Ving                         0                    3             3
stop NP + Ving                              0                    3             3
allow NP + Vinf                             1                    1             2
encourage NP + Vinf                         2                    0             2
have NP + Ving                              0                    2             2
send NP + Ving                              0                    2             2
adapt NP + Vinf                             0                    1             1
compel NP + Vinf                            1                    0             1
enable NP + Vinf                            0                    1             1
get + NP + Ving                             0                    1             1
have NP + Vinf                              0                    1             1
have NP + Ved                               0                    1             1
induce NP + Vinf                            0                    1             1
leave NP + Vinf                             0                    1             1
persuade NP + Vinf                          0                    1             1
render NP + Vinf                            0                    1             1
rouse NP + Vinf                             0                    1             1
Total                                      23                   38            61
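The verb patterns in Table 7 can be grouped by the type of complement they take. The sketch below is a hypothetical tally written for illustration: the complement labels are assigned here from the pattern names (Vinf, Ving, Ved, finite clause), and the counts are those read from the table.

```python
from collections import defaultdict

# (pattern, complement type, N in English translations, N in English sources)
# The complement-type labels are assumptions derived from the pattern names.
TABLE7 = [
    ("get NP + Vinf", "infinitive", 9, 7),
    ("cause NP + Vinf", "infinitive", 5, 4),
    ("lead NP + Vinf", "infinitive", 2, 6),
    ("ensure (that) + finite clause", "finite clause", 3, 0),
    ("set (off) NP + Ving", "ing-participle", 0, 3),
    ("stop NP + Ving", "ing-participle", 0, 3),
    ("allow NP + Vinf", "infinitive", 1, 1),
    ("encourage NP + Vinf", "infinitive", 2, 0),
    ("have NP + Ving", "ing-participle", 0, 2),
    ("send NP + Ving", "ing-participle", 0, 2),
    ("adapt NP + Vinf", "infinitive", 0, 1),
    ("compel NP + Vinf", "infinitive", 1, 0),
    ("enable NP + Vinf", "infinitive", 0, 1),
    ("get + NP + Ving", "ing-participle", 0, 1),
    ("have NP + Vinf", "infinitive", 0, 1),
    ("have NP + Ved", "past participle", 0, 1),
    ("induce NP + Vinf", "infinitive", 0, 1),
    ("leave NP + Vinf", "infinitive", 0, 1),
    ("persuade NP + Vinf", "infinitive", 0, 1),
    ("render NP + Vinf", "infinitive", 0, 1),
    ("rouse NP + Vinf", "infinitive", 0, 1),
]


def tally_by_complement(rows):
    """Sum translation and source counts for each complement type."""
    totals = defaultdict(lambda: [0, 0])
    for _pattern, complement, n_trans, n_src in rows:
        totals[complement][0] += n_trans
        totals[complement][1] += n_src
    return dict(totals)


by_complement = tally_by_complement(TABLE7)
for complement, (n_trans, n_src) in by_complement.items():
    print(f"{complement:16s} translations={n_trans:2d} sources={n_src:2d}")
```

Summing the groups reproduces the column totals of Table 7, which is a quick way to confirm that no row has been misread.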

Like the corresponding Swedish verbs, these English variants are generally more specific in meaning than make, indicating various types of coercion and modality (cf. compel, persuade, encourage, enable, allow), requiring a particular type of subject (human or non-human), or specifying the outcome of the causative event (e.g. prevention). The majority of the verbs take an infinitive complement (the outstanding choice in the translations) but quite a few take an ing-participle. Have also occurs with a past participle and ensure with a finite object clause.

A third English alternative is to use a synthetic causative verb conflating the causative meaning of make and the meaning of the complement verb or predicate (e.g. make NP rise = lift NP). This alternative is much more common as an English source (14%) than as a translation (6%) of the analytical Swedish få construction, which indicates that the translators find it more difficult to retrieve a synthetic English equivalent than the readily available analytical construction. Yet, to judge from the corpus, synthetic verbs are a more important causative alternative in English than in Swedish (cf. Table 5). Some examples of synthetic English verbs in the material are:

Swedish version:                   English equivalent:
få NP att slappna av (3)           relax NP (3)
få NP att brista (2)               break NP (2)
få NP att lossna (2)               remove/loosen NP
få NP att öka takten/farten        quicken NP (2)
få NP att mjukna/vekna             soften NP (2)
få NP att slå ner/bort blicken     outstare NP (2)
få NP att bibehållas               keep NP
få NP att explodera                explode NP
få NP att fladdra                  ruffle NP
få NP att fällas ihop              collapse NP
få NP att gapa                     astonish NP
få NP att gå (snett)               head NP (diagonally)
få NP att hoppa högt               startle NP
få NP att inse                     teach NP
få NP att lyfta                    lift NP
få NP att mörkna                   thicken NP
få NP att resa sig                 rouse NP
få NP att slå sig                  warp NP
få NP att spricka                  burst NP
få NP att tappa koncepterna        unnerve NP
få NP att tova sig                 mat NP
få NP att tystna                   silence NP
få NP att övergå till              turn NP to
få tiden att gå                    pass the time

As this list demonstrates, the great majority of the synthetic English verbs correspond to an analytical Swedish construction with an intransitive (or reflexive) verb complement. Hence, the pattern of the Swedish synthetic verbs has a clear parallel in English: analytical constructions with transitive complements cannot easily be transformed into synthetic verbs unless the object of the transitive verb can be incorporated into the synthetic verb in some way (e.g. få NP att slå ner blicken ‘make NP turn down the gaze’ = outstare NP; få NP att tappa koncepterna ‘make NP lose his nerve’ = unnerve NP). But even in cases where the analytical construction has an intransitive verb, a synthetic alternative is often not lexically available (cf. make sb cry → *cry sb). The choice of a synthetic verb is thus restricted in both languages, lexically as well as grammatically, whereas the analytical construction is nearly always possible.

In addition to these English variants the corpus also contains a large group of miscellaneous alternatives in which the causative relation is reorganised grammatically in various ways. These alternatives are especially common as English sources of Swedish analytical causatives (25%) and, like their Swedish counterparts, they highlight the formal variation of causative expressions. Although they are not as common as their Swedish counterparts in the corpus (cf. Table 5), their structural patterning is very similar to that of the Swedish ones. If we ignore cases where the cause is implied and therefore omitted, the same subtypes can be distinguished as in the Swedish texts.

i. The result of the causative event can be expressed in a finite clause and the cause rendered by an adverbial of reason:

Swedish version:                               English equivalent:
Ett ekonomiskt avgörande fick NP att åka       NP went for financial reasons
X fick NP att ramla ur stolen                  NP fell about in his chair at this
En impuls fick NP att öppna Y                  On impulse NP opened Y

Alternatively, the cause and the result can be expressed in separate clauses linked by a subordinator indicating result:

(8) Han hade sinnesnärvaro nog – och turen – att rulla över på rygg, och det fick honom att känna sig lugnare. (JC1T)
    ‘… made him feel safer’
    He had the sense, and luck, to roll this time onto his back so that [...] he was more safe.

ii. In the great majority of cases, however, the result of the causative event is rendered by a nominal expression acting either as the direct object or as a prepositional complement of the verb. In the latter case, the English verb is typically causative, the causee functions as the direct object, and the prepositional complement expresses the goal or result of the causative event. Many of these examples are set expressions or restricted collocations:

Swedish version:                      English equivalent:
få NP att framstå som X (2)           turn/make NP into X
få NP att gråta (2)                   bring NP to tears, have NP in tears
få NP att leva                        bring NP to life
få NP att upphöra                     bring NP to an end
få NP att klättra uppför väggarna     drive NP up the wall
få NP att fälla tårar                 have NP in tears
få NP att acceptera                   lull NP into acceptance
få NP att läsa                        lure NP into reading
få NP att råka i panik                push NP into panic
få NP att spänna av                   put NP at ease
få NP att gå samman                   turn NP into a group
få NP att övergå till terrorism       turn NP to terrorism

When the nominalised result is represented as the direct object in the English version, the causee generally acts as the indirect object:

Swedish version:                      English equivalent:
få NP att tro                         give NP cause to believe
få NP att tveka                       give NP pause for thought
få NP att känna självförtroende       give NP confidence
få NP att likna                       give NP the look of
få NP att rysa                        give NP a chill
få NP att se ut som                   give NP the air of
få NP att verka                       give NP the appearance of
få NP att förtvivla                   bring despair to NP

Sometimes the causee appears as a possessive modifier:

Swedish version:                      English equivalent:
få NP att visa förtrolighet           invite NP’s confidence
få NP att sluta flina                 wipe the smile off NP’s face
få NP att börja gråta                 bring tears to NP’s eyes
få NP att lyssna                      get the ear of NP

If we compare the ‘miscellaneous’ categories in the English and Swedish texts, we find surprisingly similar structural patterns. The main difference lies in the proportion of the subtypes used: in the Swedish texts the preferred strategy is to express the result in a finite clause and indicate the causative relation by an adverbial of reason or by a subordinator indicating result or purpose, while the tendency to nominalise the result is less prominent. The English texts display the opposite tendency: finite alternatives to the analytical construction are less common, while nominalised results are very frequent. But all subtypes are used in both languages, and they are more common in the source texts than in the translations of both languages. Hence, the translators often find it easier to render them by analytical constructions than to retain them.

Contrastive summary

As this contrastive survey has demonstrated, English and Swedish have a surprisingly similar range of resources for expressing causative relations. The main types in both languages are:

– analytical constructions with make in English and få in Swedish
– other causative verbs + NP + Vnfin/Vfin
– synthetic causative verbs
– miscellaneous other constructions

These have roughly the same rank order in the two languages but their proportions differ somewhat. The English texts display a more frequent use of other causative verbs and synthetic verbs, whereas the Swedish texts have more constructions of the miscellaneous type, especially finite causative variants. In addition, the Swedish texts have a tendency to convert analytical B constructions into A constructions (with an adjective complement) in cases where a copular verb of perception (feel, look) can be omitted.

In both languages the proportion of these types also differs in the source texts and the translations. Broadly speaking, the source texts display a more even distribution of the different alternatives, while the translations tend to rely more on the two most common types, especially the analytical construction. This suggests that the latter are easier to use and more readily retrieved by the translators of both languages.

In both languages the analytical construction is the most common alternative, even if it seldom accounts for more than 50% of the examples. It is especially common in the translations, where it is not only used to render analytical counterparts in the source language but frequently replaces other source constructions (see note 4). In other words, the analytical construction tends to be ‘overused’ by the translators in both directions. In this respect the translators behave much like the advanced Swedish learners. The difference is that while the learners overuse the analytical construction in their L2, the translators do it when they translate into their own language.

The reason for this is no doubt the different status of the causative options and the restrictions that determine the choice between them. In terms of frequency alone, the analytical construction represents the prototypical or ‘unmarked’ choice in both languages. This, in turn, reflects linguistic conditions of various kinds. For example, synthetic verbs are not always lexically available in either English or Swedish, and when they are, they are mainly used as alternatives to analytical constructions with an intransitive verb complement. Constructions with causative verbs other than make and få are also restricted, mainly because such verbs tend to have more specific meanings and be subject to various selection restrictions. The use of the miscellaneous other constructions revealed in the material is also constrained in various ways, for example because they restructure the causative elements or because they tend to be idiomatic or collocationally restricted. Many alternatives are also stylistically marked, being either more formal or informal than the analytical variant (cf. induce sb to do sth and drive sb up the wall). By contrast, the analytical construction is linguistically and contextually unmarked and can therefore nearly always be used: it is lexically unrestricted (always available), semantically more general, grammatically more versatile, and stylistically more neutral than the other alternatives.
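The observation that source texts show a more even distribution of the causative alternatives than translations can be given a rough numerical form. The sketch below uses normalised Shannon entropy as an evenness index; this measure is an assumption introduced here for illustration and is not used in the chapter. The counts are the N columns of Tables 5 and 6.

```python
import math

# N columns of Tables 5 and 6, one list per text type (rows in table order).
COUNTS = {
    "Swedish translations": [89, 31, 5, 1, 40],
    "Swedish sources": [41, 14, 9, 8, 56],
    "English translations": [40, 23, 5, 13],
    "English sources": [88, 38, 30, 51],
}


def normalised_entropy(counts):
    """Shannon entropy divided by its maximum, so 1.0 means a perfectly even spread."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(len(counts))


evenness = {label: normalised_entropy(values) for label, values in COUNTS.items()}
for label, value in evenness.items():
    print(f"{label:22s} {value:.3f}")
```

On this index a higher value means that the alternatives are used in more equal proportions, so the comparison of interest is between the translation and source figures within each language.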

Conclusions

The main contrastive conclusion that can be drawn from this study is that, on the whole, English and Swedish provide a very similar range of causative options. In both languages the dominant choice is the analytical construction with equivalent high-frequency verbs (make and få), but there are also a number of competing alternatives (other causative verbs, synthetic verbs and various grammatically ‘reorganised’ causatives), all of which tend to be lexically, grammatically or stylistically restricted and therefore more difficult for learners.

Overuse of a target structure can either be explained as the result of overgeneralisation of an L2 pattern (intralingual influence) or as the result of transfer from L1 (interlingual influence). As we have seen, the analytical construction can be regarded as the unmarked causative in English. It is therefore reasonable to assume that learners, even advanced learners, will tend to overgeneralise this construction at the expense of more marked alternatives. The problem with this explanation is that French-speaking learners do not overuse this construction in their L2 writing, as would be expected if intralingual influence had been the decisive factor. Consequently, we have to turn to transfer as a more plausible explanation.

Transfer, too, can be linked to the notion of markedness. As we have seen, the analytical construction can also be regarded as the unmarked form in Swedish. According to Hyltenstam (1984: 43), learners are likely to substitute unmarked categories from their native language for corresponding marked categories in the target language, whereas marked structures are seldom transferred, especially when the corresponding target category is unmarked (see note 5). This prediction is clearly applicable to the Swedish learners’ overuse of the analytical construction in English.

However, transfer can also be explained by the concept of prototypicality and by learners’ judgements of the similarity between L1 and L2. What they perceive as prototypical and semantically transparent in their L1 determines what they transfer to their L2 (see Ellis 1994: 326 and Kellerman 1983, 1986). This perception does not seem to be affected by their experience of or proficiency in L2, which would explain why advanced Swedish learners tend to overuse the analytical construction, while French-speaking learners do not. To Swedish learners the similarity between the prototypical causatives in English and their L1 is obviously more striking than it is to French-speaking learners.
This ‘psychotypology’ — to use a term from Kellerman — can also be expected to retard second language development. Categories that are perceived as prototypical, unmarked or transparent are usually adopted early by learners and run the risk of becoming linguistic ‘teddy bears’ that continue to be favoured in later stages of the learning process at the expense of less common and more differentiated target alternatives (cf. Hasselgren 1994). The contrastive picture that emerges from this study thus suggests that the Swedish learners’ overuse of causative make with verb complements is the effect of transfer supported by cross-linguistic similarity. Learners who are unfamiliar with less common causative alternatives in English are likely to overuse the dominant target pattern and treat it as a lexico-grammatical ‘teddy bear’, especially if it is easy to transfer from their native language.


Methodologically, the study has demonstrated two other things. One is the usefulness of combining corpus-based interlanguage research with contrastive investigations based on parallel corpora. As Granger (1996: 46) has pointed out, results derived from learner corpora “can only be reliably interpreted as being evidence of transfer if supported by clear [contrastive] descriptions.” Such descriptions require empirical bilingual data from comparable corpora or translation corpora. Even if it has not been possible to make a detailed investigation of the factors determining the choice between the causative options in English and Swedish in the present study, the corpus has clearly revealed the causative ‘paradigms’ in the two languages and the degree of correspondence between them. This is a good starting point for further contrastive research of causatives in the future.

Notes . For a description of the International Corpus of Learner English (ICLE) and the methodology of corpus-based interlanguage research, see Granger (1993, 1998). . For a contrastive study of the A construction in English and Swedish, see Altenberg (forthcoming). . An inspection of the A consructions in the corpus shows a corresponding drift in the opposite direction: A constructions without a copular verb of perception in the Swedish texts tend to be represented by B constructions with such verbs in the English versions (cf Altenberg forthcoming). . Despite the competition from other causative options in both languages, the analytical constructions are often translated into each other: a calculation of their mutual ‘translatability’ in the corpus shows a cross-linguistic correspondence of 52% (on this concept, see Altenberg 1999). . Chinese ESL learners provide a good example of this. Chinese, being poor in derivational morphology, has no synthetic causative verbs. As a result, Chinese learners tend to transfer the analytical Chinese shi ‘make’ construction to their L2, greatly overusing make causatives in their English (see Wong 1983 and Juffs 1996:152).

References

Aijmer, K., Altenberg, B. and Johansson, M. (eds). 1996. Languages in Contrast. Papers from a Symposium on Text-based Cross-linguistic Studies. Lund: Lund University Press.

Aijmer, K., Altenberg, B. and Johansson, M. 1996. “Text-based contrastive studies in English. Presentation of a project”. In Aijmer et al. (eds) 1996: 73–85.

Altenberg, B. 1999. “Adverbial connectors in English and Swedish: Semantic and lexical correspondences”. In Out of Corpora. Studies in Honour of Stig Johansson, H. Hasselgård and S. Oksefjell (eds), 249–268. Amsterdam: Rodopi.

Altenberg, B. forthcoming. “Advanced Swedish learners’ use of causative make: A contrastive background study”. In Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching, S. Granger, J. Hung and S. Petch-Tyson (eds). Amsterdam and Philadelphia: Benjamins.

Altenberg, B. and Granger, S. 2001. “The grammatical and lexical patterning of make in native and non-native student writing”. Applied Linguistics 22: 173–194.

Ellis, R. 1994. The Study of Second Language Acquisition. Oxford: Oxford University Press.

Granger, S. 1993. “The International Corpus of Learner English”. In English Language Corpora: Design, Analysis and Exploitation, J. Aarts, P. de Haan and N. Oostdijk (eds), 57–69. Amsterdam: Rodopi.

Granger, S. 1996. “From CA to CIA and back: An integrated approach to computerized bilingual and learner corpora”. In Aijmer et al. (eds) 1996: 37–51.

Granger, S. 1998. “The computerized learner corpus: a versatile new source of data for SLA research”. In Learner English on Computer, S. Granger (ed.), 3–18. London and New York: Addison Wesley Longman.

Hasselgren, A. 1994. “Lexical teddy bears and advanced learners: a study into the ways Norwegian students cope with English vocabulary”. International Journal of Applied Linguistics 4: 237–260.

Hyltenstam, K. 1984. “The use of typological markedness conditions as predictors in second language acquisition: The case of pronominal copies in relative clauses”. In Second Language: A Crosslinguistic Perspective, R. Andersen (ed.), 39–58. Rowley, Mass.: Newbury House.

Johansson, S. 1998. “On the role of corpora in crosslinguistic research”. In Corpora and Cross-linguistic Research: Theory, Method, and Case Studies, S. Johansson and S. Oksefjell (eds), 3–24. Amsterdam and Atlanta: Rodopi.

Juffs, A. 1996. Learnability and the Lexicon. Theories and Second Language Acquisition Research. Amsterdam and Philadelphia: John Benjamins.

Kellerman, E. 1983. “Now you see it, now you don’t.” In Language Transfer in Language Learning, S. Gass and L. Selinker (eds), 112–134. Rowley, Mass.: Newbury House.

Kellerman, E. 1986. “An eye for an eye: Crosslinguistic constraints on the development of the L2 lexicon”. In Crosslinguistic Influence in Second Language Acquisition, E. Kellerman and M. Sharwood-Smith (eds), 35–48. New York: Pergamon Institute of English.

Quirk, R., Greenbaum, S., Leech, G. and Svartvik, J. 1985. A Comprehensive Grammar of the English Language. London: Longman.

Song, J. J. 1996. Causatives and Causation. A Universal-Typological Perspective. London: Longman.

Viberg, Å. 1996. “Cross-linguistic lexicology. The case of English go and Swedish gå”. In Aijmer et al. (eds) 1996: 151–182.

Wong, S. C. 1983. “Overproduction, underlexicalisation, and unidiomatic usage in the ‘make’ causatives of Chinese speakers”. Language Learning and Communication 2: 151–163.

P III

Contrastive Lexical Semantics

Polysemy and disambiguation cues across languages* The case of Swedish få and English get Åke Viberg

.

Introduction

Languages are at the same time very similar and very diverse. At a fundamental, cognitive level, there are certain similarities even between languages that are genetically and geographically widely separated. Simultaneously, there are often important semantic differences between cognates in closely related languages such as English go and Swedish gå (Viberg 1999a). Crosslinguistic lexicology (Viberg 1996) is concerned with this complex relationship of similarity and divergence between languages at the lexical level. It combines and tries to strike a balance between a number of approaches such as lexical universals (Berlin & Kay 1969, Goddard & Wierzbicka 1994, Newman 1996), linguistic relativity (Gumperz & Levinson 1996), lexical typology (Talmy 1985) and contrastive lexical analysis (Schwarze 1985). Crosslinguistic studies of the lexicon are relevant also to applied fields such as the second language lexicon (Hatch & Brown 1995, Singleton 1999) and the lexicon in machine translation (Dorr 1993, Wanner 1996). This paper will be concerned in particular with the nature of multiple meanings from a crosslinguistic perspective and with the interaction between word meaning and linguistic context in the disambiguation process. Words with multiple meanings are analyzed differently in various theoretical frameworks. The term multiple meanings is intended to be neutral with respect to the notions polysemy and homonymy. Polysemy is in general used to refer to the case where the ‘same’ word (lemma) is used with multiple meanings that



are somehow related, whereas homonymy is used to refer to the case where different words (lemmas) with totally unrelated meanings happen to be expressed by the same form. Studies concerned with the polysemy of words are concerned with the principles for linking various meanings and with the explanation and motivation of the links. A basic difference between theories concerns the nature of the primary meaning (taken as a neutral term): whether it is an ideal case (prototype) from which the other meanings represent deviations or of a more general or abstract type, which in some sense covers all the others and is not necessarily realized in pure form in any context. The former position has been predominant in more recent times, in particular with reference to prototypical meaning (e.g. Tsohatzidis (ed.) 1990, Taylor 1995, Geeraerts 1997). The second position was taken by Roman Jakobson (1936) in his famous study of the Russian case system, in which he aimed at describing an invariant general meaning (Gesamtbedeutung) which was independent of all the varying individual meanings (Sonderbedeutungen) induced by the contexts in which a case was used. The general meaning of a case was primarily dependent on the oppositions into which it entered with the other cases in the language system. More recently, a similar position has been taken by Pustejovsky (1995) and Poesio (1996) with respect to lexical meaning. Recent theories of the latter type have introduced the term underspecification. Lexical representations are underspecified with respect to the actual meanings of words which appear in actual text, where meaning is filled in from the linguistic context. The terms ambiguity and disambiguation often appear in contexts where comprehension is the major concern and are in principle neutral with respect to the distinction between polysemy and homonymy. 
When a listener or reader is confronted with a word form with multiple meanings, it is not possible to decide whether the word is homonymous or polysemous until the appropriate meaning has been identified. In order to make the distinction, disambiguation must already have been achieved. Disambiguation is also the first step in the translation of a word with multiple meanings. This paper will mainly be concerned with the contrasting use of syntactic and semantic cues in the disambiguation process of words with multiple meanings which are the primary translation equivalents across languages. The analysis is based on the assumption that the primary meaning could best be represented as a prototype, but the problems with establishing a primary meaning and links to the various extended meanings will only be briefly discussed in this paper. (See Viberg 1999b for an analysis along these lines of the polysemous Swedish verb slå ‘strike, hit, beat’.)

Multiple meanings are common in verbs, particularly the most frequent ones. Among the 20 most frequent verbs in English, we find the four basic verbs of possession have, get, take and give. With a concrete object such as a camera, these verbs are readily interpreted as verbs of possession: Jane has a camera, Mary gave Jane a camera, etc. Very often, however, these verbs have an atypical object such as a headache or an idea: Peter has an idea, The noise gave Peter a headache. Such uses are often referred to as abstract possession, but this is only a cover term which conceals the problems involved in interpreting such expressions. There are also meanings which extend into other semantic fields such as motion: Eve got up early in the morning, The plane took off. In addition, some of these basic verbs have grammatical meanings such as Peter has left, Alexandra has to go or Mary got Jane to collaborate. The verbs of possession, especially the most frequent ones, are therefore a good testing ground for various approaches to multiple meanings. In this paper, the Swedish possession verb få will be compared with its closest equivalent in English, get, and, more briefly, with its correspondents in French and Finnish. The analysis is based on translation corpora — corpora of original texts and their translations (Johansson 1998). The availability of computerized translation corpora is likely to breathe new life into the method known as comparison of translations. A major earlier work B.C. (before computer corpora) is Wandruszka (1969), which is based on 60 publications in six Germanic and Romance languages. Wandruszka identifies Bally (1950) as the originator of the technique of comparing translations.

2. The Swedish verb få

2.1 A brief look at major meanings and syntactic frames

The comparison of Swedish and English that will be presented in this paper is based on the complete English-Swedish Parallel Corpus, ESPC (Aijmer et al. 1996, Altenberg & Aijmer 2000), which contains original text samples in English and Swedish together with their translations. The text samples represent both fiction and non-fiction and the total number of words from each source language is about half a million. The distribution of the meanings and syntactic frames of få (2043 occurrences in all) in the complete set of Swedish originals in the ESPC is shown in Table 1. For comparison, the distribution in a monolingual Swedish corpus, the ‘Stockholm-Umeå Corpus’ (SUC 1997), is also shown. The texts in the SUC represent a wide range of genres and were chosen according to principles similar to the ones used for the Brown corpus. The total number of words is also in the same range: 1 million. In this corpus, there are 4588 occurrences of the verb få. 1009 of these were selected randomly and coded for meaning and syntactic frame.

Table 1. The major meanings of få in the English-Swedish Parallel Corpus (ESPC) and in the Stockholm-Umeå Corpus (SUC)

| Meaning | Swedish example (word-for-word gloss) | Syntactic frame | English equivalent | ESPC % (N=2043) | SUC % (N=1009) |
|---|---|---|---|---|---|
| Possession | Per fick en kamera. (Per got a camera) | få + NP_Concrete | Per got a camera. | 12.7 | 11.9 |
| Abstract possession | Per fick en idé. (Per got an idea) | få + NP_Abstract | Per got an idea. | 29.1 | 32.9 |
| Modal: Permission/Obligation | Per fick sälja kameran. (Per got sell camera-the) | få + VP_Infinitive | 1. Per was allowed to sell his camera. 2. Per had to sell his camera. | 39.8 | 33.3 |
| Inchoative | Per fick se en älg. (Per got see an elk) | få + VP_Infinitive [V: se, höra, veta] | Per caught sight of an elk. | 4.5 | 4.3 |
| Causative | Per fick oss att skratta. (Per got us to laugh) | få + NP + att VP_Infinitive | Per made us laugh. | 3.9 | 4.0 |
| Attempt=>Success | Per fick upp dörren. (Per got up door-the) | få + Particle + NP | Per managed to open the door. | 4.7 | 5.8 |
| Attempt=>Success | Per fick benen fria. (Per got legs-the free) | få + NP + ADJ_Result | Per got his legs free. | 0.4 | 0.5 |
| Beneficiary/Maleficiary | Per fick bilen reparerad/stulen. (Per got car-the repaired/stolen) | få + NP + Participle | Per got his car repaired/stolen. | 1.6 | 2.1 |
| Various other alternatives | | | | 3.3 | 5.2 |

In the following sections, the individual meanings of få will be briefly presented. When the verb refers to Possession it has an NP as object with a concrete noun as head. The NP can also consist of a pronoun referring to such a concrete object but pronominalized NPs will not be dealt with here, since they involve problems of a general nature, which are not specifically related to verbs of possession. The NP can also have an abstract noun as head as in Per fick en idé ‘Per got an idea’. This case is referred to as ‘abstract possession’ which, as mentioned above, is only a cover term for a number of problematic cases, some of which will be commented on in greater detail in Section 3.2. In any event, the interpretation in this case is closely related to the meaning of the abstract noun.

There are several frequent uses where få is combined with an infinitive. The most important cases are the ones where the infinitive is bare (without the infinitive marker att). With one important exception mentioned below, få in this construction expresses deontic modality, but the interpretation is ambiguous in principle and can either be permission or obligation. Certain main verbs may strongly suggest one of the two alternative interpretations, but the choice is ultimately motivated by pragmatic factors. The modal meaning of få is always deontic (root-, agent-oriented), never epistemic (possibility or certainty). When the following verb is one of the two perception verbs se ‘see’ or höra ‘hear’ or the cognitive verb veta ‘know’, there is an alternative interpretation which is more frequent than the modal ones. In combination with these verbs, få usually has an inchoative sense, even if the modal interpretations are still possible.

When få is combined with an NP as object followed by an infinitive combined with the infinitive marker att, the interpretation is causative: Per fick oss att skratta ‘Per made us laugh’. In this use, the subject of få may be agentive, which is an excluded reading in the constructions discussed earlier. The agentive interpretation is more or less obligatory when få is combined with a spatial particle such as upp ‘up’, ut ‘out’ or in ‘in’: Per fick upp dörren ‘Per managed to open (lit. got up) the door’.
The meaning of få in this type of construction is close to ‘succeed’. It implies an active attempt on the part of the subject. The same applies when there is a resultative adjective as complement: Per fick benen fria ‘Per got his legs free’. When the object of få is combined with a past participle, the subject of få is psychologically affected by the outcome of the event described by the participle and can be interpreted as a Beneficiary or Maleficiary: Per fick bilen reparerad/ stulen ‘Per got his car repaired/stolen’. What is notable about få is the extent to which the syntactic frame can be used as a cue for disambiguation except in cases involving the distinction between concrete and abstract possession or between the two modal meanings of permission and obligation.
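
The way the syntactic frame narrows down the reading of få can be sketched as a small lookup. This is only an illustration of the generalization stated above and in Table 1, not a procedure used in the study: the `Frame` class, the frame labels and the function name are hypothetical, and the input is assumed to be hand-coded.

```python
from dataclasses import dataclass
from typing import List, Optional

# Verbs named in the text as triggering the inchoative reading after få.
INCHOATIVE_VERBS = {"se", "höra", "veta"}


@dataclass
class Frame:
    """Hypothetical hand-coded description of what follows få in a clause."""
    complement: str                    # e.g. "NP", "VP_INF", "NP_ATT_VP"
    concrete: Optional[bool] = None    # is the head noun concrete? (only for "NP")
    infinitive: Optional[str] = None   # lemma of the bare infinitive, if any


def candidate_meanings(frame: Frame) -> List[str]:
    """Return the meanings of få that are compatible with a syntactic frame."""
    if frame.complement == "NP":
        # The frame alone cannot separate concrete from abstract possession;
        # that depends on the semantics of the head noun.
        if frame.concrete is None:
            return ["Possession", "Abstract possession"]
        return ["Possession"] if frame.concrete else ["Abstract possession"]
    if frame.complement == "VP_INF":
        # Permission vs. Obligation is resolved pragmatically, not by syntax.
        modal = ["Permission", "Obligation"]
        if frame.infinitive in INCHOATIVE_VERBS:
            # Inchoative is the more frequent reading, modal ones stay possible.
            return ["Inchoative"] + modal
        return modal
    unambiguous = {
        "NP_ATT_VP": "Causative",
        "PARTICLE_NP": "Attempt=>Success",
        "NP_ADJ": "Attempt=>Success",
        "NP_PARTICIPLE": "Beneficiary/Maleficiary",
    }
    return [unambiguous.get(frame.complement, "Other")]


print(candidate_meanings(Frame("NP_ATT_VP")))                # Per fick oss att skratta
print(candidate_meanings(Frame("VP_INF", infinitive="se")))  # Per fick se en älg
```

The two cases where the function returns more than one candidate are exactly the ones singled out in the text: concrete versus abstract possession, and Permission versus Obligation.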





2.2 Major translation equivalents in English, French and Finnish

In Table 2, representative examples of the major meanings of få are given together with their translations into English, French and Finnish. The Swedish and English versions are taken from the ESPC, whereas the other versions are extracted from a small corpus prepared by the present author. The source of the examples from the ESPC is indicated by a text code. For an explanation of these codes, see Altenberg et al. (1999). The examples are taken from Ingmar Bergman’s (1987) autobiography Laterna magica (IB) and the novel Blackwater (KE) by Kerstin Ekman (1993) and their translations into the languages mentioned above.

Table 2. Translations of få associated with various meanings

| Meaning | Swedish | English | French | Finnish |
|---|---|---|---|---|
| 1. Concrete Possession | Nu kommer det här med kinematografen. Det var min bror som fick den. (IB) | That was when the cinematograph affair occurred. My brother was the one who got it. | Alors arrive cette histoire du cinématographe. Le cinématographe c’est mon frère qui l’a eu. | Nyt on tämän kinematografiasian vuoro. Kojeen sai minun veljeni. |
| 2. Abstract possession | Han fick kväljningar. (KE) | He felt sick. | Une nausée monta en lui. | Johania alkoi oksettaa. |
| 3. Modal: Permission | Annie visste inte ens om hon fick lämna köket. (KE) | Annie didn’t even know if she was allowed to leave the kitchen. | Annie ne savait même pas si elle pouvait quitter la cuisine. | Annie ei tiennyt edes, saiko hän lähteä pois keittiöstä. |
| 4. Modal: Obligation | Så fick Annie sätta sig i en fåtölj vid kaffebordet. (KE) | So Annie had to sit down in an armchair by the coffee table. | Annie dut donc s’asseoir dans un fauteuil près de la table basse. | Annie siis sai luvan istua nojatuoliin kahvipöydän viereen. |
| 5. Inchoative | Men jag får veta det snart. (KE) | But I’ll find out soon. | Mais je le saurai bientôt. | Mutta saan kyllä tietää. |
| 6. Causative | Nånting – var det en lukt? – fick honom att tänka på fisk. (KE) | Something – was it a smell? – made him think of fish. | Quelque chose – fut-ce une odeur? – le fit penser à du poisson. | Jokin – hajuko ehkä? – sai hänet ajattelemaan kalaa. |
| 7. Beneficiary/Maleficiary | nu har jag fått fyra nya Hakkapeliitta stulna i natt. (KE) | I had four new Hakkapeliitta tyres stolen last night. | Cette nuit, on m’a volé quatre pneus Hakkapeliitta tout neufs. | multa on viime yönä viety neljä uutta hakkapeliittaa. |

When få is used in its basic meaning involving receiving a concrete object as possession, the most common equivalent in English is get and in Finnish saada ‘get, receive’. The most common translation into French is some form of avoir ‘have’. With an abstract noun as object such as kväljningar ‘nausea’, a change of construction is relatively frequent in the translations. What literally means ‘He got nausea’ in Swedish is rendered as ‘He felt sick’ in English, ‘A nausea arose in him’ in French and ‘Him (partitive case) began nauseate’ in Finnish. When få is used as a modal verb indicating either Permission as in (3) or Obligation as in (4), English and French generally use modal verbs as translations, whereas Finnish to a great extent uses saada, the primary equivalent even in contexts involving concrete possession. The inchoative meaning of få appearing with certain mental verbs as in (5) is often left unexpressed, or signalled by the choice of an inchoative mental verb instead of a stative one (e.g. find out instead of know in English). The use of få as a periphrastic causative is exemplified in (6). In this case the most frequent equivalent in English is make and in French faire ‘make’; in Finnish saada is also used to express this meaning. In (7), the construction with an object followed by a past participle is shown.

In the comparison of original and translated texts, the word in the translated text which corresponds to a specific instance of the word under discussion in the source text will be called a ‘translation’. The term ‘translation equivalent’ will be used in a more restricted sense.
The most frequent translations of få into English are shown in Table 3, where the English verbs have been classified into semantic fields according to their basic meaning, which means that extended uses are included in the counts, except for have to, which is counted separately. As can be observed, the two major fields are Possession and Modality. When få is used as a periphrastic causative, the most common translation is make. The major meanings of Swedish få are thus reflected in the basic meaning of the most frequent translations. In total, there are 2043 occurrences of få in the Swedish originals. The most frequent translations in English shown in Table 3 below account for 60.3% of all translations. In addition, there is a considerable number of other translations which occur only a few times (many occur only once). The most frequent English translation is get, which is the closest semantic equivalent of få in its prototypical meaning (see 3.1), but it occurs in only 11.7% of the cases. Another frequent translation is have as a main verb, which accounts for 7.6%. If the occurrences of have to, which reach 6.6%, had not been accounted for separately, have would actually be the most frequent translation.

Table 3. The most frequent translation equivalents of få in English

| Field | Translation | Occurrences |
|---|---|---|
| Possession | get | 239 |
| | have | 156 |
| | give | 77 |
| | receive | 66 |
| | acquire | 25 |
| | obtain | 19 |
| | Total | 582 |
| Modality | must | 59 |
| | have to | 134 |
| | can | 79 |
| | be allowed | 64 |
| | may | 194 |
| | shall | 29 |
| | Total | 559 |
| Causative | make | 61 |
| | Total | 61 |
| Other | catch | 19 |
| | come | 10 |
| | Total | 29 |
| All listed translations | | 1231 |
| Various other alternatives | | 812 |
| Total | | 2043 |

A similar survey of the most frequent translations into French is given in Table 4, based on extracts from six Swedish novels.

Table 4. The most frequent translations of få in French (based on extracts from 6 Swedish novels)

| Field | Translation | Gloss | Occurrences |
|---|---|---|---|
| Possession | avoir | ‘have’ | 58 |
| | donner | ‘give’ | 19 |
| | recevoir | ‘receive’ | 15 |
| | obtenir | ‘acquire’ | 11 |
| | trouver | ‘find’ | 11 |
| | offrir | ‘offer’ | 5 |
| | prendre | ‘take’ | 5 |
| | Total | | 124 |
| Modality | falloir | ‘must’ | 38 |
| | devoir | ‘ought’ | 25 |
| | pouvoir | ‘can’ | 26 |
| | obliger* | ‘oblige’ | 5 |
| | Total | | 94 |
| Causative | faire | ‘make’ | 17 |
| | Total | | 17 |
| All listed translations | | | 235 |
| Various other alternatives | | | 261 |
| Total | | | 496 |

*including être obligé

The most frequent translations represent 47.4% of all the translations. The most notable result is that få does not have any direct equivalent in French even in its prototypical use as a possession verb. The use of avoir ‘have’ (usually in a perfective form) to indicate a change of possession represents a semantic extension of the prototype. The verbs recevoir ‘receive’ and obtenir ‘acquire’ correspond semantically more closely to få but do not have a frequency in French which is comparable to that of få in Swedish, even if only the uses as a possession verb are counted.

The pattern of polysemy which distinguishes få has an interesting areal distribution. The Norwegian cognate få shares most of the meaning patterns with the Swedish verb, whereas the Danish cognate få has similar uses only as a verb of possession. The closest correspondent of få as a modal verb in Danish is må, a cognate of English may. Interestingly, Finnish has the etymologically unrelated verb saada, which has a pattern of polysemy that closely resembles the Swedish and Norwegian one. As seen in Table 5, saada is the only translation that reaches a relatively high frequency. It represents 50.8% of the translations, which is a high percentage for a verb with a complex pattern of polysemy.

Table 5. The most frequent translation equivalents of få in Finnish (based on extracts from 6 Swedish novels)

| Field | Translation | Gloss | Occurrences |
|---|---|---|---|
| Possession | saada | | 199 |
| | Total | | 199 |
| Modality | voida | ‘can’ | 6 |
| | pitää | ‘must’ | 6 |
| | täytyy | ‘must’ | 3 |
| | joutua | ‘ought to’ | 12 |
| | Total | | 27 |
| Other | tulla | ‘come’ | 6 |
| | Total | | 6 |
| All listed translations | | | 232 |
| Various other alternatives | | | 160 |
| Total | | | 392 |
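
The percentages quoted in connection with Tables 3 to 5 follow directly from the counts. The short sketch below recomputes them as a consistency check; the dictionary layout and the function names are illustrative assumptions, and only the numbers come from the tables.

```python
# Field subtotals and grand totals copied from Tables 3-5.
TABLES = {
    "English (Table 3)": {"Possession": 582, "Modality": 559,
                          "Causative": 61, "Other": 29, "total": 2043},
    "French (Table 4)": {"Possession": 124, "Modality": 94,
                         "Causative": 17, "total": 496},
    "Finnish (Table 5)": {"Possession": 199, "Modality": 27,
                          "Other": 6, "total": 392},
}


def share(count: int, total: int) -> float:
    """Percentage of `total` accounted for by `count`, to one decimal."""
    return round(100 * count / total, 1)


def coverage(fields: dict) -> float:
    """Percentage of all occurrences covered by the listed translations."""
    listed = sum(n for field, n in fields.items() if field != "total")
    return share(listed, fields["total"])


for language, fields in TABLES.items():
    print(language, coverage(fields))

# Individual verbs mentioned in the running text.
print("get", share(239, 2043))
print("have", share(156, 2043))
print("have to", share(134, 2043))
print("saada", share(199, 392))
```

Note that the 50.8% given for Finnish is the share of saada alone, whereas the 60.3% and 47.4% given for English and French are the combined shares of all listed translations.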

3. Individual meanings of få

In this section, the individual meanings of få will be discussed in greater depth although many problems have to be dealt with rather cursorily since the verb cuts across a number of complex semantic areas, such as possession, causation and modality, which have been studied intensively in recent years. The major translations of the various meanings of få will also be presented. In this case, too, it will only be possible to present a broad outline. There is an earlier study of få by Wagner (1976) which forms part of a contrastive analysis of modal verbs in Swedish and German.





3.1 Possession

The concept of Possession is complex and can only be discussed briefly in this paper. Even in the most straightforward case, where the object is concrete, possession can be construed in various ways. Miller & Johnson-Laird (1976: 565) use the following example to illustrate this: He owns an umbrella but she’s borrowed it, though she doesn’t have it with her (see also Heine, 1997, on possession). Using partly different terms than Miller & Johnson-Laird, we can say that He owns an umbrella refers to Ownership, whereas she’s borrowed it refers to Temporary Possession. Ownership presupposes certain socially regulated rights to use an object which is regarded as the property of a certain individual. These rights can be transferred permanently (e.g. as a gift) or temporarily (e.g. as a loan). The exact social norms motivating the lexicalization patterns are complex and vary a great deal between different cultures. The last part of the example, she doesn’t have it with her, refers to Physical Possession. Availability for immediate use seems to be the crucial notion behind this meaning. In the prototypical case, Possession involves both Ownership and Physical Possession, which can be combined as in the traditional text-book example: Peter gave Mary an apple (in her hand, which she could keep). Temporary Possession is a possible but marked interpretation with a verb such as give (Peter gave Mary a book as a loan).

When få refers to some aspect of concrete possession, the translation is predominantly a verb of possession. In English the most frequent translations of this meaning are (absolute frequency within parentheses): get (74), have (51), give (33), receive (19), acquire (7) and obtain (4). Together these verbs account for 72.3% of the total number of occurrences of få as a concrete verb of possession (N=260). When give is used as a translation, it usually appears in the passive.
The French verb donner ‘give’ is also a rather frequent translation but generally appears in the active form with the generic subject on ‘one’ as in the following example (Finnish, as is usually the case, uses saada ‘get, receive’):

(1) Jag fick varm choklad och smörgås med ost. (IB)

I was given hot chocolate and cheese sandwiches.

On m’a donné du chocolat et une tartine avec du fromage.

Sain kuumaa kaakaota ja juustovoileivän.

The passive is in fact used in the translations with a rather wide range of (mostly) verbs of possession such as be granted, be handed. The frequent use of passives in the translations is a reflection of the fact that få is inherently non-agentive (cf. 4.1 concerning get).

3.2 Abstract possession

All verbs belong to a small number of dynamic classes which form a Dynamic System that cuts across all verbal semantic fields. In essence, a verb can either designate a state (no change) or a change, for example know (State) – realize (Change) or have, own (State) – get, lose (Change). Changes can either be inchoatives, which means they are pure changes without any indication of the cause, or causatives, which indicate a cause. Compare Harry died (Inchoative) and Peter killed Harry (Causative) or Harry lost his camera (Inchoative) and Peter stole the camera from Harry (Causative). Within a language, there are a number of ways to form complex (surface) predicates which fulfill the same function as a simple verb and in several cases can be used to paraphrase simple verbs (usually with some change in meaning). One such device is the use of Verb + Abstract Noun instead of a simple verb: ask => put a question, visit => pay a visit to, etc. In Swedish and English, the most basic verbs of possession meaning ‘have’, ‘get’ and ‘give’ in combination with abstract nouns form a very productive system generating complex predicates which represent states, inchoatives or causatives. The same dynamic contrasts are basic when complex predicates are formed with adjectives: ‘be’, ‘become’ and ‘make’. Sometimes it is possible to form a complete set of parallel predicates as shown schematically in Table 6 below (taken from Viberg 1981), in which various ways to form emotive predicates related to happiness are shown. To express the inchoative, for example, it is possible to say either X fick glädje ‘X got happiness’, X blev glad ‘X became (got) happy’ or to use a passive (gladdes) or reflexive (gladde sig) form of the verb glädja, which in its basic form has a causative meaning.
Even though the discussion in this paper will be focused around the use of få and get, it is important to stress that the use of these verbs to form complex inchoative predicates in combination with abstract nouns is part of the more general pattern involving ‘have’ and ‘give’.

Table 6. Basic possession verbs as dynamic operators with an abstract noun as object.

| Dynamic meaning | Word class: Noun | Word class: Adjective | Word class: Verb |
|---|---|---|---|
| State | X hade glädje av Y | X var glad (åt Y) | |
| Inchoative | X fick glädje av Y | X blev glad (åt Y) | X gladdes/gladde sig (åt Y) |
| Causative | Y gav X glädje | Y gjorde X glad | Y gladde X |
| (Gloss) | have/get/give happiness | be/become/make happy | Emotion verb |

In spite of the fact that this use is very productive, there are obviously restrictions as to which abstract nouns can appear in such combinations. A much larger corpus than the one used in this study is required to pin down these restrictions. However, it is possible to identify certain semantic fields of abstract nouns which are frequent in such combinations. One distinct group is constituted by the nouns belonging to the field Physical Contact such as ‘a blow’, ‘a punch’, ‘a kick’. Such nouns can be combined with the basic possession verbs (except the stative ‘have’) to form complex predicates in all four languages considered here:

(2) Han sparkade och han bet en av dem i armen och fick ett slag i nacken. (KE)

(kicking out, biting one of them in the arm, and received a blow on the back of his neck.

Il donna des coups de pied et en mordit un au bras, et reçut un coup sur la nuque.

Hän potki ja hän puri jotakuta käsivarteen ja sai iskun niskaansa.

The translations use parallels to the Swedish construction although the closest equivalent of få appears only in the Finnish translation with the verb saada. Even if such expressions exist in all four languages, a simple verb of Physical Contact in the passive form is a common translation. The French translation has an active form with the generic subject on ‘one’ (cf. 3.1):

(3) Jag avstängdes från skolgång och fick mycket stryk. (IB)

I was removed from school and severely beaten.

On m’a renvoyé de l’école et on m’a beaucoup battu.

Koulunkäyntini keskeytettiin ja minua kuritettiin ankarasti.

The largest group appears to be nouns of Verbal Communication such as ‘order’, ‘offer’, ‘promise’ or ‘answer’ as in the following example:

(4) Åke frågade på nytt om krattskaftet men fick samma svar. (KE)

Åke again asked about the rake handle, but was given the same answer.

Åke posa une nouvelle fois une question sur le manche de râteau mais obtint la même réponse.

Åke kysyi uudestaan haravanvarresta mutta sai saman vastauksen.

Even if be given corresponds to få in the English translation of this particular example, get an answer is also possible in English. French uses obtenir ‘obtain’ as a translation, while Finnish uses the primary equivalent saada. At a general level, the languages are rather similar with respect to the formation of complex predicates of verbal communication using basic possession verbs and abstract nouns, even if important contrasts can be found with respect to specific combinations of Verb + Abstract Noun. Verbal communication verbs in the passive form are quite frequent in the English translations, for example:

(5) Kungens tjänare och de som tjänade andra stormän med häst och rustning fick här löften om stora privilegier. (AA)

The King’s servants and those who served other great men of the realm with horses and weapons were promised great privileges,

The following list contains a number of similar cases, most of them representing Verbal Communication:

få + N_abstr → Verb_passive
få besked → be advised
få namnet → be named
få nej ‘no’ → be refused
få kritik → be criticized
få beröm → be praised
få löfte → be promised
få tillstånd → be allowed
få stöd → be supported
få tröst → be consoled
få besök → be visited
få stryk → be thrashed, be beaten

In addition to examples of this kind, expressions with a verb of possession in the passive form followed by an abstract noun are also found:

få + N_abstr → PossVerb_passive + N_abstr
få råd → be given advice
få order → be given orders
få fria händer → be given a free hand
få uppgift → be given a task
få tillstånd → be granted permission

There are also a number of cases where the translation involves a total change of grammatical roles. Such examples are sometimes used even when there is a ‘normal’, direct translation. The following example, which represents the ‘normal’ case, involves an abstract noun from the field Cognition, which is also fairly frequent in the material:

(6) Medan han sakta promenerade genom de smutsiga gatornas snöslask fick han en idé. (GT)

While he slowly strolled through the slush in the dirty streets, he got an idea.

Besides this more or less direct translation of få en idé as get an idea, there is also a translation such as the one found in the following example, where ‘I got the idea’ is translated as ‘The idea came to me’. This is an example of a change from Experiencer as subject to Stimulus as subject which is characteristic of Mental Verbs in general:





(7) Jag fick tanken tidigt på morgonen (AP)

The idea came to me in the morning.

Various other examples of radical change of role structure in the translations are given below. Such changes appear to be particularly frequent when Abstract Possession is involved, although they cannot be said to reach a very high frequency even in this case. The following example literally reads something like: ‘Sweden got a changed military situation’:

(8) Efter förlusten av Finland fick Sverige ett helt förändrat militärt läge. (AA)

After the loss of Finland Sweden’s military situation was completely changed.

In the next example the literal translation of the original is ‘She got increased blood pressure from rhododendrons’:

(9) Hon avskydde allt som var spikrakt i trädgårdssammanhang och fick förhöjt blodtryck av rhododendron och silvergranar. (ARP)

She loathed anything dead straight in gardens, and rhododendrons and silver spruce made her blood pressure soar.

3.3 Modal

The verb få in combination with a verb in the infinitive generally has a modal meaning. Få signals primarily what van der Auwera & Plungian (1998) analyze as participant-external modality of the deontic type (deontic possibility = permission and deontic necessity = obligation). Following the interesting proposals in Winter & Gärdenfors (1995), the external power could either be a participant of the speech situation or a third party. These distinctions are expressed in subtle ways, as for example in the interaction between the choice of pronouns and sentence mood: Får jag gå? ‘May I leave?’ (listener in power), Du får gå ‘You may leave’ (speaker in power) vs. Jag får gå ‘I may leave’ (third party in power).

The translation corpus is particularly well suited to studying the contrast between Permission and Obligation, since this distinction usually requires different translations. Which alternative applies is a pragmatic question. An example like Han fick åka hem ‘He få-PAST go home’ can be translated either as He was allowed/could go home or He had to go home depending on the context. In an example like Han hoppades få åka hem ‘He was hoping to be allowed to go home’ Permission (or perhaps Possibility) is involved. In the translation corpus, it is possible to find examples which come close to minimal pairs. In the following example, Obligation is the correct interpretation and this is also reflected in the English translation. The passage is taken from a novel (P. C. Jersild, Babels hus 1985) and describes what happens when someone arrives at a hospital. The presupposition is that someone who feels ill wants to stay at the hospital:

(10) Den som inte är sjuk är följaktligen frisk och får åka hem igen.

The person who is not ill is consequently well and has to go back home.

In the following example taken from the same novel, another patient wants to leave the hospital after an operation. In this case, Permission is the appropriate interpretation, which is reflected in the translation:

(11) Han skulle förmodligen snart få åka hem.

He would presumably be allowed [to go] home soon.

The ambiguity is quite obvious to native speakers of Swedish. For example, if a parent happens to tell the children to keep quiet using the phrase Nu får ni hålla tyst! ‘Now you must(/may) keep quiet’, the children are likely to answer Får vi? ‘May we?’ (with stress on få ‘may’ and mockingly surprised intonation). Intuitively, Permission appears to be the default interpretation, even if the children are well aware of the intended meaning in the preceding example. Although both Permission and Obligation are frequent as meanings of få, Permission appears to be most frequent. In legal texts, where ambiguity is not tolerated, få can only express Permission (at least in the present corpus).

The semantic relations are set out in Figure 1 along the lines of Langacker’s (1988) usage-based model. As a modal, få has Permission as a default interpretation (symbolized by a box with double lines), which can be extended to cover Obligation (box enclosed in single lines and semantic extension symbolized by a broken arrow). What both of these meanings have in common is that some external actor is in power, usually a human agent with social authority, but power could also reside in some other actor such as a natural force: Vi fick gå hem på grund av regnet ‘We had to go home because of the rain’. This more schematic meaning which is shared by Permission and Obligation is symbolized as a box with broken lines. It is related to the more specific meanings via specialization (unbroken arrow).

[Figure 1. Schematic network representing the modal meanings of få. Boxes: External power, Permission, Obligation.]

The contrast between Permission and Obligation is usually clearly reflected in the English translations. The major translations of modal få are shown in Table 7. When the interpretation is Permission, the major translations are be allowed to and can as in the following examples:

(12) Får man meta i sjön, sa han. (SC)

“Are you allowed to fish in the lake?” he asked.

(13) Jo, sa Pettersson. För tio kronor får ni hyra roddbåten. (SC)

“Yes,” said Pettersson. “You can hire the rowing boat for ten kronor.”

In legal Swedish, få seems to be used exclusively in the Permission sense. The major translation in such texts is may, which is actually the most frequent translation of få in the Permission sense, but it is primarily found in this text type, which is the motivation for regarding can and be allowed to as the major translations in this sense. May is a domain-specific translation, dominating in legal language from which the following example is taken: (14) Visering får begränsas även i övrigt och får förenas med de villkor som kan behövas. (UTL)

The issue of a visa may be restricted in other respects and may be subject to such conditions as may be necessary.

Table 7. English translation equivalents of få as a modal verb

Permission                        Obligation
can                     55        have to                121
be allowed to           56        must                    13
must (negated)          44        Paraphrase              23
should (negated)        20        ZERO **                 26
may                    184 *      Various other cases     41
Paraphrase              80
ZERO **                 63
Various other cases     88
Total                  590        Total                  224

* Predominantly in legal texts
** An instance is counted as ZERO only when få has been specifically omitted. Cases where få is contained in a longer passage which has been omitted are marked ‘untranslated passage’ in the original coding (included under ‘Various other cases’ in the tables).

Polysemy and disambiguation cues across languages

The most common translations account for 61% in the Permission sense and 60% in the Obligation sense. The dominant translation of få referring to Obligation is have to but must is also used to some extent: (15) Det var så djup snö att han fick leda cykeln sista biten. (AP)

The snow was so deep he had to push the bike the last bit.

(16) Jag kan inte påstå att jag tyckte om att höra Siiri vräka ur sig detta, men man får komma ihåg att hon var upprörd. (AP)

I can’t say I liked hearing Siiri pouring all this out, but you must remember that she was upset.
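
The coverage figures quoted for Table 7 can be rechecked with a few lines of arithmetic. The Python sketch below assumes that "the most common translations" means the named lexical equivalents, that is, every row except Paraphrase, ZERO and Various other cases. That reading is an assumption, and the variable and function names are illustrative only.

```python
# Counts copied from Table 7 (English translations of modal få).
PERMISSION = {
    "can": 55,
    "be allowed to": 56,
    "must (negated)": 44,
    "should (negated)": 20,
    "may": 184,
    "Paraphrase": 80,
    "ZERO": 63,
    "Various other cases": 88,
}
OBLIGATION = {
    "have to": 121,
    "must": 13,
    "Paraphrase": 23,
    "ZERO": 26,
    "Various other cases": 41,
}

# Assumption: these three rows are not counted as "most common translations".
RESIDUAL = {"Paraphrase", "ZERO", "Various other cases"}


def named_share(counts):
    """Percentage of instances covered by the named lexical equivalents."""
    total = sum(counts.values())
    named = sum(n for label, n in counts.items() if label not in RESIDUAL)
    return 100 * named / total


permission_share = named_share(PERMISSION)
obligation_share = named_share(OBLIGATION)
print(f"Permission: {permission_share:.1f}%  Obligation: {obligation_share:.1f}%")
```

Under that reading the two shares round to 61% and 60%, which agrees with the figures given in the text.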

When få is negated, there is a well-known difference between Swedish and English, which is mentioned in most school grammars of English. In Swedish, negated permission is expressed as ‘X is not permitted to do S’, for example Peter får inte röka ‘Peter is not allowed to smoke’, whereas in English negated permission is rather expressed as an obligation not to do something: Peter must not smoke = Peter is obliged not to smoke. (17) Här får du inte rensa, Aron! (GT)

You must not weed here, Aron!

A negated form of should is also relatively frequent as a translation of negated permission: (18) Slåttern fick inte ta en timme. (SC)

Hay-making shouldn’t take an hour!

In legal texts, negated permission is translated by may not:

(19) En utlänning får inte hållas i förvar med stöd av 2 § första stycket 2 längre tid än 48 timmar. (UTL)

An alien may not be detained pursuant to Section 2(1)(2) for more than 48 hours.

In a relatively large number of cases marked as ZERO in Table 7, få as a modal verb is not translated, which may be taken as a sign that it sometimes has a rather weak modal force in Swedish, as in the following example: (20) I morgon ska det bli skönt att få tala med Stanley. (LH)

Tomorrow it will be a joy to Ø talk to Stanley.

3.3 Inchoative

When the main verb is one of the perception verbs se ‘see’ or höra ‘hear’ or the cognitive verb veta ‘know’, få usually has an inchoative reading as in the following example with se ‘see’:

(21) Då kom Alfrida, mora hans, ut och fick se att han stod där (TL)

Then his mother, Alfrida, came out and saw him standing there

Once one has the possibility of seeing something because it comes within the field of vision, one usually also sees it. The correlation between the inchoative meaning and the combination of få with the perceptual/cognitive verbs se, höra and veta is not total. Occasionally, få can have a modal meaning even in combination with these verbs. The most common translation is Zero, i.e. få does not have a translation, as in the following English and Finnish translations. In the French example, the verb voir ‘see’ appears in a perfective form which signals the inchoative meaning:

(22) Utanför Lill-Olas bod stod det bilar och när Åke fick se att det var öppet ville han in och köpa nya flugor. (KE)

There were cars outside Lill-Ola’s fishing-tackle booth and when Åke saw the shop was open, he said he wanted to get some more flies.

[…] quand Åke vit que c’était ouvert […]

[…] kun Åke näki kaupan olevan auki […]

The major English translations are displayed in Table 8. A fairly frequently used possibility is that of changing the main verb into another main verb which incorporates an inchoative meaning. The choice of translation depends on the main verb. Få se may be translated as catch sight of, whereas få veta can be translated as find out, learn or be told. The last two alternatives can also translate få höra.

Table 8. English translation equivalents of få + VInfinitive[Cogn, Perc]

ZERO                              38
find out (‘få veta’)               6
be told (‘få veta, få höra’)       5
catch (a glimpse, sight of)        4
Various                           39
Total                             92

3.4 Causative

When få appears in the syntactic frame få + NP + att VPInfinitive, it has a causative meaning (see also Altenberg, this volume). The most common translation is make in English and faire ‘make’ in French, whereas Finnish in most cases uses the general equivalent saada:

(23) Han var vid sitt muntraste lynne […] och fick alla att skratta. (IB)

He was in his most merry mood […] and made everyone laugh.

Il était de son humeur la plus joyeuse […] et faisait rire tout le monde.

Hän oli hilpeimmällä tuulellaan […] ja sai kaikki nauramaan.

As can be observed in Table 9, which gives a summary of the most frequent translations in English, the most common of these in the causative use is make, but get and cause are also used with some frequency.

Table 9. English translation equivalents of få as a periphrastic causative

make                   42
get                     9
cause                   5
Various other cases    25
Total                  81

3.5 Success and related senses

Usually få has a non-agentive (non-intentional) subject but there are a few syntactic frames where the subject has a strong tendency to be agentive. Actually, an agentive interpretation is possible but not very frequent when få is used as a periphrastic causative, as in the following example, where the matrix verb försöka ‘try’ explicitly signals intention:

(24) Jag förstod att de inbillade sig att världen var som lärarna eller föräldrarna försökte få dem att tro: utan hemligheter. (AP)

I realized they all imagined the world was just what the teachers and parents tried to get them to believe: with no secrets.

Non-agentive readings are common even when the subject is human in the frame få + NP att VPInfinitive. In addition, inanimate subjects which do not allow an agentive interpretation are fairly frequent in this construction. There is, however, a set of syntactic frames where the subject has a strong tendency to be agentive. Most of these frames have a low frequency. The most frequent frame of this type, which accounts for almost 5% of the occurrences of få, is få + Particle + NP:





(25) Med ett mjukt ryck fick roddaren upp farten igen. (KE)

With a soft jerk, the oarsman got up speed again to keep it at a distance.

An example like this one implies that the subject made an active attempt to achieve something and succeeded. In Table 1, this meaning is labelled Success. The attempt is intentionally controlled, but whether the attempt succeeds or not cannot be controlled, and a further implication is that the act required a greater than usual amount of effort or skill. A sentence such as Peter fick upp dörren, which literally means ‘Peter got up the door’, should be translated as Peter managed to open the door or Peter got the door open rather than simply Peter opened the door, which has the straightforward equivalent Peter öppnade dörren in Swedish. Although, strictly speaking, a human agent can never control all the conditions that affect the outcome of a certain intended act, we normally take it for granted that a simple act like opening a door will succeed. Attempt and success are invoked only when the outcome is in some sense unlikely or problematic. The intentional reading is often explicitly signalled by other linguistic cues in the immediate context. No less than 17 (out of a total of 95) examples appear in the wider frame för att VPInfinitive ‘(in order) to VPInfinitive’, which marks intention. In 10 cases, a verb marking Attempt or Success/Failure appears in a matrix clause governing få. Although the majority (81%) of the examples of få + Particle have the Intentional Success reading, there are some clear exceptions. In most of them få + Particle serves as a Mental predicate. The most frequent cases (9 examples) consist of the phrase få för sig att S ‘get the (wrong or weakly motivated) idea that S’:

(26) I samma veva fick en del personer för sig att de måste informera Nora om mamma och pappa. (MG)

Then a number of people got it into their heads that they had to inform Nora about her mother and father.

The most common translation of få in combination with a particle is get, which accounts for 35% of the translations. That is a higher proportion than for most of the other uses of få and this indicates that get also has a strong association with Success (or Human Interest in a more general sense). The rest of the translations consist of a wide range of verbs, many of which are characterized as taking an agentive subject. Attempt and Success are also involved in most occurrences of få in the frame få + NP + ADJresult, in which få is combined with an object followed by a resultative adjective. This use is, however, infrequent, which makes the generalization tentative.

(27) Sedan gäller det bara att komma på hur vi skall få hunden fri. (PCJ)

Then it’s only a matter of how we’ll set the dog free.

The appearance of a PP (usually spatial) after the object often serves as an extra cue for the interpretation and may change the meaning in various directions. In certain cases, it has an effect similar to that of a spatial particle and introduces a success interpretation: (28) Men Birger fick honom på benen. (KE)

Birger got him upright.

A use related to the ones discussed earlier in this section is få in the frame få + NP + Participle. The NP in this frame serves semantically as an object of the verb in past participle form, whereas the subject of få has an interest in the outcome and could best be characterized as an Experiencer of either benefit or harm (Beneficiary or Maleficiary). Usually, the subject is simultaneously the Possessor of the object. In the following two examples, the subject is a Beneficiary (the second example literally means ‘get a prototype financed’): (29) Medan de uppfinnare inte har den upplevelsen som strävat flera år för att få sin senaste idé accepterad och som äntligen fått ett erbjudande av att få en prototyp finansierad. (BB)

An inventor, who has been struggling for several years to have his/her latest idea accepted and who has finally got an offer to get financing for his/her prototype based on this idea, may not have the same feeling of uncertainty.

The following is a clear example of the subject as Maleficiary: (30) jag minns inte vad jag företog mig, antagligen klättrade jag på hyllor och hängde i krokar för att slippa få tårna uppätna. (IB)

I don’t remember what I did, probably climbed on to shelves or hung from hooks to avoid having my toes devoured.

The combination of Beneficiary and Maleficiary meaning is well-known from crosslinguistic studies of the Dative. Like the Success reading, it focuses on the Human Interest domain.

4. English get

The verb get is “particularly versatile” with respect to the number of clause types it can enter into (Quirk et al. 1985:720) and it also has a large number of senses which are both lexical and grammatical. There are several detailed studies, two of which will be mentioned here. Johansson & Oksefjell (1996) focus on the verb’s constructional flexibility and also account for its distribution in various text categories. Get turns out to be particularly characteristic of spoken English and of less formal fiction, whereas it is underrepresented in informative prose. Gronemeyer (1997, 1999) centers on the polysemy of the verb and also accounts for the diachronic development of its meanings from Middle English to present-day English. Since the polysemy and the varied syntactic frames of get have already been treated in considerable detail in earlier studies, this section will concentrate on the most important contrastive relationships as reflected in data from the English-Swedish Parallel Corpus. The major meanings and syntactic frames of get in this corpus are set out in Table 10.

Table 10. The major meanings of get. English originals

Meaning              Frame                        Example                              % (N = 967)
Possession           get + NP                     Peter got a book                     30.1
                     have got + NP                Peter has got a book                  8.0
Modal: Obligation    have got to + VPInfinitive   Peter has got to come                 1.8
                     gotta + VPInfinitive         Peter gotta come
Inchoative           get + ADJ/Participle         Peter got angry                      11.2
Passive              get + PastPart (by NP)       Peter got killed (by a gunman)        2.6
Causative            get + NP + to VPInfinitive   Peter got Harry to leave              1.7
Motion:
  Subject-centered   get + Particle               Peter got up/in/out…                 30.1
                     get + PP                     Peter got to Berlin
  Object-centered    get + NP + PP                Peter got the buns out of the oven    7.1
                     get + Particle + NP
Various other cases                                                                     5.8

The most common translations of get in Swedish are shown in Table 11. The most frequent translation is få but this verb does not cover more than 20.9% of the total number of cases. Some other possession verbs, in particular ha ‘have’ (9.5%) and ta ‘take’ (5.7%), also reach a relatively high frequency as translations. However, the second most frequent equivalent is the motion verb komma ‘come’, which represents 11.3% of the translations, and the inchoative verb bli ‘become’ translates get in 8.3% of the cases. In the following sections, the translations will be discussed in relation to the major meanings of get.

Table 11. The most frequent Swedish equivalents of English get

Possession                  Motion                       Inchoative
få      ‘get’       202     komma     ‘come’    109      bli  ‘become’   80
ha      ‘have’       92     gå        ‘go’       29
ta      ‘take’       55     stiga     ‘step’     20
ge      ‘give’       18     kliva     ‘stride’   11
skaffa  ‘acquire’    18     resa sig  ‘rise’     15
hämta   ‘fetch’      15
Total               400     Total               184      Total           80

Total of the equivalents listed above   664
Total other equivalents                 303
Total                                   967
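
The percentages quoted in the text for the main Swedish equivalents of get follow directly from the counts in Table 11. The short Python sketch below recomputes them as shares of all 967 instances; the names used are illustrative, and rounding to one decimal is an assumption about how the published figures were obtained.

```python
# Counts copied from Table 11; N is the total number of instances of get.
N = 967
EQUIVALENTS = {"få": 202, "komma": 109, "ha": 92, "bli": 80, "ta": 55}


def share(count, total=N):
    """Share of all translations, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {verb: share(n) for verb, n in EQUIVALENTS.items()}
for verb, pct in shares.items():
    print(f"{verb:6s}{pct:5.1f}%")
```

The same function can be applied to any other row of the table, for example `share(15)` for hämta ‘fetch’.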

4.1 Possession

Like Swedish få, the verb get in its prototypical use as a possession verb combines the notions of CHANGE and POSSESSION and, as noted above, få is the dominant translation of get in this meaning. One of the major differences in comparison with få is that English get can refer to an intentional, controlled action even in its basic use as a verb of possession. The closest translation in this case is skaffa, which is often used in the reflexive form, as in the following example, but can also be used as a simple transitive verb:

(31) Why don’t we get a microwave? (DL)

Varför skaffar vi oss inte en mikrovågsugn?

That get can be used with an agentive subject is also reflected in the fact that it can appear in a ditransitive syntactic frame. In this case, too, skaffa can be used as a translation. Another quite close translation of agentive get is hämta ‘fetch’, which is relatively frequent when get has an active meaning:

(32) and so — she had told the maid to get her some champagne. (RDA)

och därför hade hon bett kammarjungfrun att hämta lite champagne åt henne.

4.2 Motion

The uses of get as a motion verb, which are displayed in Table 12, are particularly interesting and represent 37% of the total number of occurrences of the verb. Verbs of motion can be divided into subject-centered verbs of motion such as walk and run, which describe the displacement of the subject, and object-centered verbs of motion, such as throw and put, which describe the displacement of the object. Get is primarily used as a subject-centered motion verb, which is a meaning the Swedish verb få does not have. The few cases where få is used as a translation of get in this meaning are not equivalent in this respect. The table also includes cases of abstract (or metaphorical) motion which tend to require a rather free translation. On the other hand, when get is used as an object-centered motion verb, få is the dominant translation, usually in the frame få + Particle. The second most frequent alternative ta ‘take’ is less common and the remaining translations only appear once or twice.

Table 12. Major Swedish equivalents of get as a motion verb

Subject-centered motion                 Object-centered motion
komma ‘come’                  90        få              31
ta (sig) ‘take’ + refl.       19        ta ‘take’        6
gå ‘go’                       26        Various         33
stiga ‘step, rise’            20
kliva ‘step, stride’          11
resa sig ‘rise, get up’       15
hinna ‘get ... in time’        7
få                             8
Various                       95
Total                        291        Total           70

The dominant translation of get as a subject-centered motion verb is komma ‘come’. The reason for this is probably related to the fact that the semantics of komma involve a point of view tied to ego or a main character in various ways. This is related to the human interest domain, something which is characteristic of the basic verbs of possession in general. (33) Help me to get to the crossroads safely. (PDJ)

Hjälp mig att komma till vägkorsningen utan att något händer.

Another relatively frequent translation is ta sig, which requires the subject to be active and implies a certain effort on the part of the subject: (34) She had to get to the crossroads and catch the bus. (PDJ)

Hon måste ta sig till vägskälet och bussen.

The verb hinna, which is a language-specific hyponym of ‘succeed’ (‘get … in time’) and appears as a translation with moderate frequency, is also related to the human interest domain:

(35) Diana heard her say, “But I must get to Marks and Spencer before they close. (ST)

Diana hörde hur hon sa: “Men jag måste hinna till Marks and Spencer innan de stänger.

The translations discussed so far are neutral with respect to Manner of Motion. There are two closely related meanings of get as a subject-centered motion verb, which appear in the frame get + Particle (+PP) and tend to be translated with verbs indicating Manner of Motion. The reason for this is that the displacement is of very limited extent, while, at the same time, the movement of the human body is extensive. The first subtype is related to the entrance into and exit from vehicles (get on/get out,off) and is usually translated with kliva (på/av) ‘step, stride’, stiga (på/av) ‘step, rise’ or gå (på/av) ‘go, walk’: (36) Dalgliesh got out of the Jaguar (PDJ)

Dalgliesh klev ur Jaguaren

(37) The train stopped and more people got on. (AT)

Tåget stannade och fler passagerare steg på.

The other subtype is the meaning get up, get out of bed/get to bed which is translated with the same set of verbs. When get up refers to a change from sitting to standing position, resa sig ‘rise’ in the reflexive form is the dominant translation:

(38) He got up and put on the kettle and he sat down again where my ma always sat. (RDO)

Han reste sig, satte på tekitteln och satte sig ner igen där min mamma alltid satt.

Observe that the Source of the Motion (such as ‘car’, ‘bed’, ‘chair’) is usually not explicitly mentioned in the syntactic frame of the verb but must be inferred from the wider context in order to yield a correct translation. As a motion verb, get is also used fairly frequently in metaphorical expressions. Usually the PP in the syntactic frame refers to an abstract Place or even to an event as in the example below (cf. the event structure metaphor treated in Lakoff (1993), which involves many spatial concepts): (39) Impetuous, he rages on: ‘It really turned bad soon after the divorce, when I tried to get down to writing again. (BR)

Han fortsätter med en plötslig häftighet: “Det blev verkligt illa strax efter skilsmässan när jag försökte komma igång med skrivandet igen.

As already mentioned, få is the dominant translation when get is used as an object-centered motion verb. When it appears it usually has the success reading described in Section 3.5:

(40) Ma and Pa were at the front door of a dirty old house, trying to get a key in the lock. (ST)

Mamma och pappa stod framför ett smutsigt gammalt hus och försökte få in en nyckel i låset.

(41) There’s enough petrol for this afternoon, I expect, but how am I going to get the children to school tomorrow morning? (FW)

Det finns väl bensin så det räcker för i eftermiddag, tror jag, men hur ska jag kunna få barnen till skolan i morgon?

The relative prominence of the feature human interest in the meaning of get is most probably also reflected in the feature Success associated with få + Particle in Swedish, but this component represents a much stronger degree of human interest and in many cases få cannot be used as a translation. The most frequent alternative in this case is ta ‘take’, but very often some more specific verb is used such as plocka ‘pick’. Various types of spatial metaphor are also relatively frequent when get is used as an object-centered motion verb: (42) I can’t get any sense out of her. (RDA)

Jag kan inte få ett vettigt ord ur henne.

4.3 Grammaticalized meanings

Get has acquired a number of grammaticalized uses in English but only the ones with an inchoative meaning reach relatively high frequencies in the corpus. Consequently, the grammatical uses will only be discussed rather briefly here in spite of their theoretical interest.

4.3.1 Modal

The forms (have) got to, (have) gotta can be used to express modal obligation. All but one of the 17 examples are translated by måste ‘must’, which expresses strong obligation in Swedish:

(43) But when you’re in big business like I am, you’ve got to be hot stuff at arithmetic. (RD)

Men när man gör affärer i den här storleksklassen så måste man vara slängd i matematik.

(44) To Eisenhower he [Patton] exploded: “My men can eat their belts, but my tanks have gotta have gas. (MH)

Patton rasade mot Eisenhower: “Mina karlar kan äta sina livremmar, men tanksen måste ha soppa.

4.3.2 Causative

In the frame get + NP + to VPInfinitive get has a causative meaning but there are only 16 examples of this type. Exactly half of them are translated by få:

(45) You should get Stuart to narrate our schooldays together. (JB)

Ni borde få Stuart att berätta om vår skoltid.

As was shown earlier, få in its use as a periphrastic causative was most frequently translated by make. There is another frame where get has primarily a causative meaning: get+NP+Participle. The most common translation is få in this case also (10 out of 23 examples): (46) But how was I going to get the check replaced? (SG)

Men hur skulle jag få checken utbytt?

4.3.3 Inchoative

When get is combined with an adjective, bli ‘become’ is the dominant translation (40 out of 58 get + ADJ) as in the following simple example:

(47) I don’t ever want to get old. (JB)

Jag vill aldrig någonsin bli gammal.

Bli is quite common also when get is combined with an adjectival participle, but in that case (according to varying lexical constraints) the most frequent type of translation is a reflexive verb, for example gifta sig ‘get married’, skilja sig ‘get divorced’, klä (på) sig ‘get dressed’, intressera sig ‘get interested’, vänja sig ‘get used to’. The following is a typical example: (48) And get involved he does, daily, in the lives of all. (LT)

Nog engagerar han sig alltid; dagligen och i alla bybornas liv.

The reflexive in these examples serves to topicalize the NP that ends up as subject. Sometimes the inchoative element is strengthened with the aspectual verb börja ‘begin’:

(49) Before Baby got interested in boys, she would help my mother make dresses for her. (NG)

Innan Baby började intressera sig för pojkar brukade hon hjälpa min mor att sy klänningar åt henne.

In Swedish, there is also a semi-productive inchoative verbal suffix -na, which appears in a few translations: kallna (from the adj. kall) ‘get cold’, tröttna (adj. trött) ‘get tired’, fastna (adj. fast) ‘get stuck’.





There are also examples of two other frames with an inchoative meaning in the corpus. The frame get + to + VPInfinitive is primarily found in the phrase get to know (translated lära känna), but there are a few other examples like the following: (50) People in communities like his own, in other areas of the Transvaal, got to hear of him; (NG)

Folk i samhällen liknande hans eget, i andra delar av Transvaal, fick höra talas om honom;

Another, related, frame with inchoative meaning is get + to + VPing. It is, however, only attested once in the present corpus: (51) But this morning, when Mr Harris didn’t turn up, and Marion didn’t either, we got to wondering (FW)

Men i morse, när varken mr Harris eller Marion kom, började vi undra,

4.3.4 The passive

Clear cases of the so-called get-passive are represented in examples with an explicit by-phrase expressing the agent. Such examples are, however, infrequent and in many cases it is hard to draw a clear line between the get-passive and the inchoative use of get. Out of the 25 examples classified as passive, 9 are translated by the bli-passive in Swedish and 4 by the morphological s-passive:

(52) Did he get picked up? (SG)

Blev han tagen av polisen?

(53) We were pen pals after he got sent to San Luis. (SG)

Vi blev brevvänner efter att han skickats till San Luis.

5. Conclusion: universal and language-specific structuring

To resume the theme of the opening lines of this paper, human languages are at once characterized by universality and an enormous variability both across languages and within languages. At a general level, få and get resemble one another with respect to their semantic extension. Etymologically få is derived from a physical action verb fånga meaning ‘catch’, whereas get is derived from ‘seize’. The latter meaning is also a common source for verbs meaning ‘have’ in European languages. The rise of the possession verb meaning represents a focusing of the result and a gradual bleaching of the components related to manner of action and agentivity. The latter component has virtually disappeared from få, whereas get can still have an agentive reading as a possession verb. The further extensions into areas of grammatical meaning such as modal, causative and inchoative also show many parallels at a general level. The pattern represents quite a common path of meaning extension cross-linguistically. Matisoff (1991) describes a pattern of grammaticalization characteristic of Southeast Asian languages such as Thai, Vietnamese, Khmer and Lahu which in many respects resembles that of Swedish få, Finnish saada and English get. In Southeast Asian languages, verbs meaning ‘get’, ‘obtain’ have characteristically developed meanings such as ‘manage/get to’, ‘have to/must’ and ‘be able to’. The various senses are related to quite a high degree to distinct syntactic frames. The first two meanings tend to appear when ‘get’ is a pre-head auxiliary, whereas ‘be able to’ appears in post-head auxiliary position. It appears, however, that the meaning ‘get’ (‘come to possess’) is not generally lexicalized as a simple verb in the world’s languages. The situation found in French, where there is no direct equivalent of få, seems to be rather common. (In particular, it seems that ‘take’ can be extended to cover this meaning in a number of languages.) The pattern of meaning extension characteristic of ‘get’ is closely related to the patterns found for other basic possession verbs, in particular for ‘give’. According to Newman’s (1996) systematic study of ‘give’ across a wide range of languages, it is clear that this verb tends to extend into the grammatical areas of benefactive, permission/enablement and causation, which all have parallels in languages where ‘get’ is (being) grammaticalized. This indicates that there are great similarities across languages with respect to the conceptual core.
In spite of the strong universality at the conceptual level, the lexicalization patterns are very language-specific at a more detailed level. The overall mutual translatability of få and get is remarkably low. Få is translated with get in only 12% of the cases and even if get has få as a translation almost twice as often, in 21% of the cases, that is still a relatively low figure. This prompts a detailed contrastive analysis, which can be used for applied purposes such as translation and language teaching. In this study, attention has been paid in particular to the cues for disambiguation. It turns out that the syntactic frame plays an important role in narrowing down the range of possible meanings of få and get to an extent that is unusual in the case of other words. However, additional cues often have to be taken into consideration in order to identify the exact sense. For example, the choice between a permission and an obligation reading of få is decided primarily on the basis of pragmatic factors, which have to be worked out in more detail. For abstract possession, the semantic composition of the head noun of the object plays an important role in the choice of an appropriate translation. Since such uses are numerous and involve a wide range of abstract nouns, a more complete description requires the study of much larger corpora, which are available only in monolingual form. It is clear, however, that translation corpora serve an important function in sharpening the questions we would like to answer using the large monolingual corpora that are now available at the touch of a key.

Note

* This work has been carried out within the Crosslinguistic Lexicology (Swed. Tvärspråklig lexikologi) project, which receives financial support from the Swedish Council for Research in the Humanities and Social Sciences. For a presentation of the project, see Viberg (1996).

References Aijmer, K., Altenberg, B. & Johansson, M. 1996. ”Text-based contrastive studies in English. Presentation of a project”. In Languages in contrast. Papers from a Symposium on Textbased Cross-linguistic Studies [Lund Studies in English 88], K. Aijmer, B. Altenberg & M. Johansson (eds), 73–85. Lund: Lund University Press. Altenberg, B. & Aijmer, K. 2000. “The English-Swedish Parallel Corpus: A resource for contrastive research and translation studies”. In Corpus Linguistics and linguistic theory, Christian Mair and Marianne Hundt (eds.), 15–33. Amsterdam and Atlanta: Rodopi. Altenberg, B., Aijmer, K. & Svensson, M. 1999. The English-Swedish Parallel Corpus: Manual. Department of English, University of Lund. (Also at http://www.englund.lu.se/ research/espc.html.) Bally, Ch. 1950. Linguistique générale et linguistique française. 3e éd. Berne: A. Francke. Berlin, B. & Kay, P. 1969. Basic color terms. Berkeley: University of California Press. Dorr, B. 1993. Machine translation. A view from the lexicon. Cambridge: The MIT Press. Geeraerts, D. 1997. Diachronic prototype semantics: a contribution to historical lexicology. Oxford: Clarendon. Goddard, C. & Wierzbicka, A. (eds). 1994. Semantic and lexical universals. Theory and empirical findings. Amsterdam: Benjamins. Gronemeyer, C. 1997. A semantic and syntactic account of the polysemy in get. Licentiate of Philosophy Thesis. Dept. of Linguistics, Lund University. Gronemeyer, C. 1999. “On deriving complex polysemy: the grammaticalization of get”. English Language and Linguistics 3(1): 1–39.


Gumperz, J. & Levinson, S. (eds). 1996. Rethinking linguistic relativity. Cambridge: Cambridge University Press.
Hatch, E. & Brown, C. 1995. Vocabulary, semantics and language education. Cambridge: Cambridge University Press.
Heine, B. 1997. Possession. Cambridge: Cambridge University Press.
Jakobson, R. 1936. “Beitrag zur allgemeinen Kasuslehre: Gesamtbedeutungen der russischen Kasus”. Reprinted in: Selected writings II: words and language, 23–71. The Hague: Mouton.
Johansson, S. 1998. “On the role of corpora in cross-linguistic research”. In S. Johansson & S. Oksefjell (eds), 3–24.
Johansson, S. & Oksefjell, S. 1996. “Towards a unified account of the syntax and semantics of get.” In Using corpora for language research, J. Thomas & M. Short (eds), 57–75. London & New York: Longman.
Johansson, S. & Oksefjell, S. (eds). 1998. Corpora and cross-linguistic research. Theory, method, and case studies. Amsterdam: Rodopi.
Lakoff, G. 1993. “The contemporary theory of metaphor.” In Metaphor and thought, A. Ortony (ed), 202–251. Cambridge: Cambridge University Press.
Langacker, R. 1988. “A usage-based model”. In Topics in cognitive linguistics, B. Rudzka-Ostyn (ed), 127–161. Amsterdam: Benjamins.
Matisoff, J. 1991. “Areal and universal dimensions of grammatization in Lahu.” In Approaches to grammaticalization. Vol. II [Typological studies in language 19:2], E. C. Traugott & B. Heine (eds), 383–453. Amsterdam and Philadelphia: John Benjamins.
Miller, G. A. & Johnson-Laird, Ph. 1976. Language and perception. Cambridge: Cambridge University Press.
Newman, J. 1996. Give. A cognitive linguistic study. Berlin & New York: Mouton de Gruyter.
Poesio, M. 1996. “Semantic ambiguity and perceived ambiguity.” In Semantic ambiguity and underspecification [CSLI Lecture Notes 55.], K. van Deemter & S. Peters (eds), 159–201. Stanford.
Pustejovsky, J. 1995. The generative lexicon. Cambridge, MA: Bradford.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. 1985. A comprehensive grammar of the English language. London & New York: Longman.
Schwarze, C. (Hrsg.) 1985. Beiträge zu einem kontrastiven Wortfeldlexikon Deutsch-Französisch. Tübingen: Gunter Narr Verlag.
Singleton, D. 1999. Exploring the second language mental lexicon. Cambridge: Cambridge University Press.
SUC (1997). SUC 1.0. Stockholm-Umeå Corpus. Produced by Dept. of Linguistics, Umeå University and Dept. of Linguistics, Stockholm University. CD-Rom.
Talmy, L. 1985. “Lexicalization patterns: semantic structures in lexical forms”. In Language Typology and Syntactic Description. Vol. III, T. Shopen (ed), 57–149. Cambridge: Cambridge University Press.
Taylor, J. 1995. Linguistic categorization. Prototypes in linguistic theory. 2nd ed. Oxford: Clarendon Press.
Tsohatzidis, S. (ed). 1990. Meanings and prototypes. Studies in linguistic categorization. London: Routledge.
van der Auwera, J. & Plungian, V. A. 1998. “Modality’s semantic map.” Linguistic Typology 2: 79–124.






Viberg, Å. 1981. “Emotiva predikat i svenskan och några andra språk.” In Studier i kontrastiv lexikologi, 61–99. (In Swedish. Studies in contrastive lexicology.) Ph. D. diss. Dept. of Linguistics, Stockholm University.
Viberg, Å. 1996. “Crosslinguistic lexicology. The case of English go and Swedish gå.” In Languages in contrast. Papers from a Symposium on Text-based Cross-linguistic Studies [Lund Studies in English 88], K. Aijmer, B. Altenberg & M. Johansson (eds), 151–182. Lund: Lund University Press.
Viberg, Å. 1999a. “The polysemous cognates Swedish gå and English go. Universal and language-specific characteristics.” Languages in Contrast 2: 87–115.
Viberg, Å. 1999b. “Polysemy and differentiation in the lexicon. Verbs of physical contact in Swedish.” In Cognitive semantics. Meaning and cognition, J. Allwood & P. Gärdenfors (eds), 87–129. Amsterdam: Benjamins.
Wagner, J. 1976. “Eine kontrastive Analyse von Modalverben des Deutschen und Schwedischen.” IRAL XIV(1): 49–66.
Wandruszka, M. 1969. Sprachen. Vergleichbar und unvergleichlich. München: Piper.
Wanner, L. (ed). 1996. “Lexical choice.” Special issue of Machine Translation 11(1–3): 1–216.
Winter, S. & Gärdenfors, P. 1995. “Linguistic modality as expressions of social power.” Nordic Journal of Linguistics 18(2): 167–199.

A cognitive approach to Up/Down metaphors in English and Shang/Xia metaphors in Chinese

Lan Chun

1. Introduction

This is a contrastive study of spatial metaphors in English and Chinese carried out within the framework of cognitive semantics. It is assumed in the study that there exists an intermediate level ‘cognition’ between language and the physical world (Svorou 1994, Gärdenfors 1996, Geiger & Rudzka-Ostyn 1993, Langacker 1987, Lakoff 1987), and an experiential view of cognition is adopted. This view, also known as ‘experiential realism’, hypothesizes that basic-level categories and image schemas are the two kinds of preconceptual structure directly meaningful to us. One way in which abstract conceptual structure arises from these two kinds of preconceptual structure is by metaphorical mapping.

The cognitive approach ascribes the following basic features to metaphor:

1. Metaphor is conceptual in nature: it is a cognitive device which enables us to organize our conceptualization of the world.
2. Metaphor is composed of two domains, a relatively clearly structured source domain and a relatively less clearly structured target domain. It is a mapping of the schematic structure of the source domain onto that of the target domain.
3. Metaphorical mappings are not arbitrary but are grounded in our physical experience. Once a metaphorical mapping is set up, it will impose its structure on real life and be made real in different ways.

Two English spatial terms, namely up and down, and two Chinese spatial terms,




namely shang (‘up’) and xia (‘down’), constitute the main research issues of this study. Following one of the basic assumptions of cognitive semantics, namely that semantic structure is equated with conceptual structure, which commonly gives rise to a prototype-based network (Smith 1993: 531, Geiger & Rudzka-Ostyn 1993: 1), each of the four spatial terms is regarded as capturing a conceptual structure with prototypical models, and metaphorical extensions developed out of those prototypical models. To distinguish the linguistic term from the conceptual structure, the former will be referred to as up, down, shang and xia, and the latter as UP, DOWN, SHANG and XIA.

The study is based on a Chinese corpus and an English corpus and has the following objectives:

1. to determine the metaphorical extensions along which UP/DOWN and SHANG/XIA develop;
2. to explicate the experiential bases of the metaphorical extensions uncovered on the one hand, and the realizations of those metaphorical extensions in everyday life on the other, which, according to Lakoff (1993: 244), are two sides of the same coin;
3. to discover the similarities and differences between the ways English and Chinese speakers conceptualize other domains via their UP/DOWN and SHANG/XIA metaphors.

As recognized by Yu (1996) and Stibbe (1996), the cognitive approach to metaphor now faces three main challenges. First, more cross-linguistic and cross-cultural research needs to be done before sound evidence can be produced for the claim of the cognitive approach that abstract reasoning is partly metaphorical. Secondly, the extent to which, and the manner in which, cognitive universals and variations exist across cultures and languages still remain to be explored. Thirdly, during the past two decades research into the cognitive approach to metaphor has relied heavily on a narrow range of unnatural data, sometimes invented on the spot to fit a pre-set theory. A closer look at a representative range of contemporary examples taken from natural language sources, considered in as full a context as possible, is therefore called for (cf. Schönefeld 1999).

In view of these challenges, the present study contributes to cognitive semantic research in metaphor in the following ways. First, it offers a systematic contrastive analysis of the metaphorical extensions of two English spatial terms and two Chinese spatial terms. Second, evidence is provided from the analysis for the cognitive claim that metaphorical mapping of the image-schematic structure of the source domain onto that of the target domain gives rise to


abstract concepts and abstract reasoning. Evidence is also provided for the possible existence of a universal spatial metaphorical system, which has so far largely remained a speculation. Third, the study contributes to research methodology: it shows that, handled properly, a corpus-based approach towards data collection and analysis can be fruitfully exploited in the field of cognitive semantics; it also demonstrates how two typologically different languages can be brought together for comparative purposes within a cognitive framework.

2. UP, DOWN, SHANG and XIA as image-schematic concepts

UP, DOWN, SHANG and XIA each activate an image-schematic concept depicting a movement or a particular location of a trajector in relation to a landmark along the vertical axis. When UP/DOWN and SHANG/XIA depict a movement of the trajector, they will be referred to as dynamic UP/DOWN and dynamic SHANG/XIA. When they depict a particular location of the trajector, they will be referred to as static UP/DOWN and static SHANG/XIA. Figures 1 and 2 are graphic representations of dynamic UP/SHANG and dynamic DOWN/XIA.

Figure 1: Schema for dynamic UP/SHANG [diagram: trajector and landmark plotted against a vertical and a horizontal axis]

Figure 2: Schema for dynamic DOWN/XIA [diagram: trajector and landmark plotted against a vertical and a horizontal axis]

Examples of the dynamic type are:

(1) The camera is panning up a girl’s body.
(2) The unemployment rate has gone up to 4%.






(3) women pashang shanding.
    we climb up mountain top
    ‘We climbed up to the top of the mountain’
(4) qiwen shangsheng dao 38 du.
    temperature up rise to 38 degrees
    ‘The temperature has risen to 38 degrees’
(5) She sat down, perching on the edge of the armchair.
(6) Cut your shopping down to twice a week.
(7) women zouxia shanpo.
    we walk down mountain slope
    ‘We walked down the mountain’
(8) qiwen xiajiang dao lingxia 10 du.
    temperature down drop to zero down 10 degrees
    ‘The temperature has dropped to 10 degrees below zero’

When the trajector is stationary, we get static UP/SHANG and static DOWN/XIA as captured in Figures 3 and 4.

Figure 3: Schema for static UP/SHANG [diagram: trajector and landmark plotted against a vertical and a horizontal axis]

Figure 4: Schema for static DOWN/XIA [diagram: trajector and landmark plotted against a vertical and a horizontal axis]

Examples of the static type are:

(9) He is up in his own bedroom.
(10) They were two goals up at half time.
(11) hongqi zai caochang shangkong piaoyang.
     red flag at playground up sky fly
     ‘The red flag is flying in the wind over the playground’


(12) nanxing diwei zai nuxing diwei zhi shang.
     man status at women status of up
     ‘Men’s status is above women’s status’
(13) He could see the house down below.
(14) Brazil was two down against France at half time.
(15) zhongzi maizai dixia.
     seed bury at ground down
     ‘The seeds are buried deep down in the earth’
(16) nuxing diwei zai nanxing diwei zhi xia.
     woman status at man status of down
     ‘Women’s status is below men’s status’

When static SHANG and XIA depict a contact between the trajector and the landmark, they constitute a special case for which there is no counterpart in the case of static UP and DOWN. I call this special use of SHANG and XIA contact SHANG and contact XIA, which are represented by Figures 5 and 6. It should be noted that in the case of contact SHANG, the trajector not only touches, but is also supported by, the landmark; and in the case of contact XIA, the trajector is covered or pressed by the landmark.

Figure 5: Schema for contact SHANG [diagram: trajector and landmark in contact, plotted against a vertical and a horizontal axis]

Figure 6: Schema for contact XIA [diagram: trajector and landmark in contact, plotted against a vertical and a horizontal axis]

Examples of contact SHANG and XIA are:

(17) baozhi shang fangzhe yi zhi bi.
     newspaper up-contact place-ing one NC pen
     ‘There is a pen on the newspaper’
(18) baozhi shang you yi pian wenzhang.
     newspaper up-contact have one NC article
     ‘There is an article in the newspaper’
(19) hui shang you yi ge fayan.
     meeting up-contact have one NC speech
     ‘There is a speech at the meeting’
(20) baozhi xia mian you yi zhi bi.
     newspaper down-contact side have one NC pen
     ‘There is a pen under the newspaper’
(21) gangban zai juda de yali xia bian xing.
     steel board at huge pressure down-contact change shape
     ‘The steel board bent under the enormous pressure’
(22) zai shichang jingji zuoyong xia, wujia you sheng you jiang.
     at market economy function down-contact, price have rise have fall
     ‘Under the influence of the market economy, prices rise and fall’

When the image schemas of UP/DOWN and SHANG/XIA are used to structure other domains outside space, i.e. when we give other non-spatial domains a vertical axis, a trajector and a landmark, as in examples (2), (4), (6), (8), (10), (12), (19), and (22), they will be regarded as metaphorical extensions of UP/DOWN or SHANG/XIA.

3. Research methodology

This study is based on samples from a Chinese corpus and an English corpus. The English corpus chosen is the 5-million-word Word Bank of the Collins Cobuild English Language Dictionary (1996), from which 5728 instances of up and 4781 instances of down were retrieved. This English corpus is mainly made up of written material taken from three sources, viz. newspapers, magazines and books published in the UK after 1990.

The Chinese corpus, which is made up of about 1.8 million characters of written material, was assembled by the author by downloading Chinese newspapers and magazines published between 1 April and 30 June 1998 from their websites and by downloading books by contemporary Chinese writers published after 1995 that can be read on the internet. From this corpus, 7621 instances of shang and 4387 instances of xia were retrieved.

The software Microsoft Access was used to process the data. A database was built up for up, down, shang and xia separately. A random list was created and about 10% of the instances of up/down and shang/xia were randomly


selected. 529 instances of up, 431 instances of down, 750 instances of shang and 434 instances of xia formed the final database. Each record of up, down, shang and xia was analysed in accordance with the following parameters: prototype model (static or contact or dynamic), trajector, landmark, path, and metaphorical extension. When analyzing up or down in a particular verb-particle construction, such as pick up, or cut down, the present study followed Lindner (1981) and Morgan (1997) in recognizing the contribution of up or down to the meaning of the whole phrase. However, since my interest was not in up/down as a word, but in up/down as encoding the concept UP/DOWN, I did not make a distinction between up/down as a preposition, adjective or adverb.
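The sampling and coding procedure described above can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the study itself used Microsoft Access, and the names `Record` and `draw_sample`, the fixed seed and the integer line indices are hypothetical and do not come from the original database. The numbers of retrieved and sampled instances are those reported in the text.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Retrieved instances and final sample sizes as reported in the text.
RETRIEVED = {"up": 5728, "down": 4781, "shang": 7621, "xia": 4387}
SAMPLED = {"up": 529, "down": 431, "shang": 750, "xia": 434}


@dataclass
class Record:
    """One analysed instance; the fields mirror the parameters named in the text."""
    term: str                         # up, down, shang or xia
    line_id: int                      # hypothetical index of the concordance line
    model: Optional[str] = None       # "static", "contact" or "dynamic"
    trajector: Optional[str] = None
    landmark: Optional[str] = None
    path: Optional[str] = None
    extension: Optional[str] = None   # e.g. "QUANTITY"; None for a literal use


def draw_sample(term: str, seed: int = 0) -> List[Record]:
    """Randomly select the reported number of instances without replacement."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    chosen = rng.sample(range(RETRIEVED[term]), SAMPLED[term])
    return [Record(term=term, line_id=i) for i in sorted(chosen)]


if __name__ == "__main__":
    for term in RETRIEVED:
        sample = draw_sample(term)
        share = len(sample) / RETRIEVED[term]
        print(f"{term}: {len(sample)} of {RETRIEVED[term]} ({share:.1%})")
```

With the reported figures, the sampled share lies between 9.0% and 9.9% for each term, which matches the description "about 10%". The analytic fields are left empty here because filling them in is the manual coding step of the study.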

. SHANG and XIA . Prototypical vs. metaphorical meanings SHANG and XIA originated as purely spatial concepts. This is reflected in the earliest pictographic characters inscribed on oracle-bones excavated from Yin (capital of the Shang Dynasty). Evidence in the Chinese corpus shows that SHANG and XIA are mainly used for the conceptualization of a certain stage or a certain process in the following four target domains: QUANTITY, SOCIAL HIERARCHY, TIME, and STATES. The metaphorical extensions identified are: 1. A Larger Quantity Is Shang

A Smaller Quantity Is Xia

2. A Higher Status Is Shang

A Lower Status Is Xia

3. An Earlier Time Is Shang

A Later Time Is Xia

4. A More Desirable State Is Shang

A Less Desirable State Is Xia

The mapping of the image-schematic structures of SHANG and XIA onto that of their target domains and the relationship this mapping has with its experiential grounding and its realizations in real life are roughly represented in Figure 7. In the figure we see that the image-schematic structures of SHANG and XIA emerge directly from our everyday bodily experience. They are then projected onto the abstract target domains through metaphorical mappings. As a result, the target domains receive a spatial structure and become indirectly meaningful to us. The metaphorical mappings, once established, then impose their structures on real life and become realized in various ways.






Figure 7: Mapping of SHANG and XIA onto their target domains [diagram: the Shang and Xia schemas (trajector, landmark, horizontal axis) in the source domain of SPACE, directly emerging from the experiential basis, are linked by metaphorical mappings to Shang and Xia in the target domains, which are in turn being realized in the realisation]


Among the 750 occurrences of shang analysed, only 34.7% are found to be cases of the dynamic model. This shows that, literally or metaphorically, SHANG is less often used to depict the trajectory followed by a moving trajector than the location of a stationary trajector. As many as 72.3% of the 750 instances analysed carry metaphorical meanings. This demonstrates how often SHANG is used metaphorically. The percentages of the three models of SHANG and the distribution of the metaphorical extensions detected are presented in Tables 1 and 2.

Table 1: The three prototypical models of SHANG

Prototype model           Number   Percentage of 750
Non-dynamic SHANG
  (a) contact SHANG        356      47.4%
  (b) static SHANG         134      17.9%
Dynamic SHANG              260      34.7%
Total                      750      100%

Table 2: The metaphorical extensions of SHANG

Target domain   Metaphorical extension             Number   Percentage of 543   Percentage of 750
STATES          A More Desirable State Is Shang    328      60.4%               43.7%
QUANTITY        A Larger Quantity Is Shang          88      16.2%               11.7%
TIME            An Earlier Time Is Shang            78      14.4%               10.4%
HIERARCHY       A Higher Status Is Shang            49       9.0%                6.5%
Total                                              543      100%                72.3%

Of the 434 occurrences of xia analysed, about 45.9% are instances of the dynamic model. The remaining 54.1% are either cases of the static model or of the contact model. This shows that XIA is quite well balanced between its non-dynamic side and its dynamic side, although the former occurs slightly more often than the latter. As many as 77.7% of the 434 instances of xia carry metaphorical meanings. The statistical findings are presented in Tables 3 and 4.

Table 3: The three prototypical models of XIA

Prototype model           Number   Percentage of 434
Non-dynamic XIA
  (a) contact XIA           89      20.5%
  (b) static XIA           146      33.6%
Dynamic XIA                199      45.9%
Total                      434      100%






Table 4: The metaphorical extensions of XIA

Target domain   Metaphorical extension            Number   Percentage of 337   Percentage of 434
STATES          A Less Desirable State Is Xia     174      51.6%               40.1%
TIME            A Later Time Is Xia               101      30.0%               23.3%
HIERARCHY       A Lower Status Is Xia              34      10.1%                7.8%
QUANTITY        A Smaller Quantity Is Xia          28       8.3%                6.5%
Total                                             337      100%                77.7%

4.2 Four metaphorical extensions

In this section, we shall discuss the four metaphorical extensions observed for SHANG and for XIA in turn. Following the claims of experiential realism that conceptual metaphors arise from bodily experience and, once set up, will then impose their structures on real life, in presenting the metaphorical extensions I shall try to work out their experiential grounding on the one hand and their realizations in real life on the other.

(a) Quantity

A Larger Quantity Is Shang. A Smaller Quantity Is Xia.

Experiential Grounding: When more of a substance or of physical objects is added to a container or pile, the level goes up (see Lakoff & Johnson 1980:15–16, Lakoff 1993:240, Johnson 1987, Goatly 1997).

Realizations of the Metaphor: Man-made objects like thermometers and stock market graphs exhibit a clear correlation between ‘Larger Quantity’ and SHANG and between ‘Smaller Quantity’ and XIA.

The following special cases have been identified in the data:

– Increase in salary is shang/ Decrease in salary is xia.
– Increase in costs is shang/ Decrease in costs is xia.
– Increase in prices is shang/ Decrease in prices is xia.
– Increase in inflation rate is shang/ Decrease in inflation rate is xia.
– Increase in temperature is shang/ Decrease in temperature is xia.
– Increase in speed is shang/ Decrease in speed is xia.
– Increase in volume/pitch of voice is shang/ Decrease in volume/pitch of voice is xia.


Examples of the above special cases are:

(23) gongzi shangtiao
     salary up adjust
     ‘a rise in the salary’
(24) xiaofei xia jiang
     consumption down fall
     ‘a drop in consumption’
(25) wujia shangzhang
     price up rise
     ‘a rise in the price’
(26) wujia xia die
     price down drop
     ‘a drop in the price’
(27) tonghuo pengzhang shangyang
     inflation rate up rise
     ‘a rise in the inflation rate’
(28) tonghuo pengzhang xia jiang
     inflation rate down fall
     ‘a drop in the inflation rate’
(29) wendu shangsheng
     temperature up rise
     ‘a rise in the temperature’
(30) sudu xia jiang
     speed down fall
     ‘a drop in the speed’
(31) shengyin shangyang
     voice up rise
     ‘a rise in the voice’
(32) shuliang xia tiao
     number down adjust
     ‘a decrease in the number’

(b) Social Hierarchy

A Higher Status Is Shang. A Lower Status Is Xia.






Experiential Grounding: In ancient society, a man’s status was associated with his physical strength, and the latter in turn was typically correlated with his physical size. A man who is bigger and taller is usually stronger and hence in a better position to win a fight than a shorter and smaller man. The victor in a fight is typically on top of the loser (Lakoff & Johnson 1980: 15–16, Lakoff 1993, Johnson 1987).

Realizations of the Metaphor:

Architecture: Take the halls in the Forbidden City as an example. To go to any of the halls, one needs to climb a lot of stairs. Those halls symbolize the emperor’s status and power in people’s eyes and are therefore raised far above ground level.

Rituals: In ancient China the throne of the emperor was always situated in a place several steps higher than the seats for his ministers. Within family households, the seat for the patriarch was also situated in a higher place or in a place considered to be higher. People kowtow in front of officials to acknowledge their humbleness. The rebellious were forced to kneel down to repent of their sin.

Social practices: In a name list the names of VIPs come at the top of a page. In the prize-giving ceremonies at sporting events the champion stands a step higher than the contestant who came second, who in turn stands a step higher than the contestant in third place.

Below are some examples:

(33) shangdiao
     up move
     ‘move to a higher social position’
(34) xiafang
     down place
     ‘be moved to a lower social position’
(35) shangqing xiada
     up feeling down reach
     ‘for the feelings of those at the top to reach those at the bottom’
(36) shangji bumen
     up step bureau
     ‘those bureaus of a higher level’
(37) xiaji bumen
     down step bureau
     ‘those bureaus of a lower level’


(38) shangzuo
     up seat
     ‘seat for VIP’
(39) xiazuo
     down seat
     ‘seat for less important people’
(40) shangliu shehui
     up stream society
     ‘the upper class society’
(41) xialiu shehui
     down stream society
     ‘the lower class society’

(c) Time

An Earlier Time Is Shang. A Later Time Is Xia.

These two metaphors fit into the larger system of TIME-AS-SPACE metaphor noted by many researchers (see e.g. Lakoff & Johnson 1980, Lakoff 1993, Alverson 1994, Svorou 1994, Allan 1995, Yu 1996). In particular, they arise from two special cases of TIME PASSING IS MOTION ALONG THE VERTICAL AXIS.

Special case 1: Times are fixed locations arranged along a vertical landscape. An earlier time is above a later time. It is reflected in expressions like the ones listed below:

(42) shang yi dai
     up a generation
     ‘the older generation’
(43) xia yi dai
     down a generation
     ‘the younger generation’
(44) shangci
     up time
     ‘last time’
(45) xia ci
     down time
     ‘next time’






(46) shang ban nian
     up half year
     ‘the first six months of a year’
(47) xia ban nian
     down half year
     ‘the second six months of a year’
(48) shangxun
     up ten-days
     ‘the first ten days of a month’
(49) xia xun
     down ten-days
     ‘the last ten days of a month’

Special case 2: Human beings (with their belongings) move downwards towards the future. They can nevertheless go upwards to revisit an earlier time. It is reflected in expressions like:

(50) you ci shangsu dao hanchao
     from here up trace to han dynasty
     ‘trace to the Han Dynasty from this point’
(51) yanzhe lishi de changhe ni liu er shang
     along history long river against stream up
     ‘to go up stream against the river of history’
(52) jianchi xia qu
     insist down go
     ‘carry on till the future’
(53) yi dai yi dai chuan xia lai
     one generation one generation pass down come
     ‘to pass down generation after generation’

The two cases are consistent with each other in that both entail an earlier time being above a later time.

Experiential grounding:

1. Human beings have detectors for motion and for objects/locations in their visual systems, yet they have no detectors for time. It thus makes sense from a biological point of view that time should be understood in terms of space (Lakoff 1993:218).
2. In the history of human evolution, the conceptions of spatial relations are


developed much earlier than those of temporal relations (Akhundov 1986:171).
3. In the process of individual growth, the conception of spatial relations is also acquired before that of temporal relations (Akhundov 1986:21–22).

Realizations of the Metaphor:

Man-made objects: In a typical calendar, an earlier time is usually put either in front of or above a later time.

Rituals: Offerings to the ancestral spirits were always placed on top of a sacrificial altar raised above ground level.

Social practices: When drawing a family tree, one always puts the oldest generation at the top of the page and then traces down to the youngest generation, rather than vice versa.

(d) States

A More Desirable State Is Shang. A Less Desirable State Is Xia.

This is a special case of the Event Structure Metaphor (Lakoff 1993, Yu 1996), which claims that “various aspects of event structure, including notions like states, changes, processes, actions, causes, purposes, and means are characterized cognitively via metaphors in terms of space, motion, and force” (Lakoff 1993:220).

Experiential grounding: The human body stands upright, with the head at the top and the feet at the bottom. Humans and most other mammals lie down when they sleep and stand up when they wake. Dead people are in a physically recumbent position.

Realizations of the Metaphor:

Physical symptoms: A drooping posture is typically associated with sadness and depression; an erect posture is typically associated with more positive emotional states such as happiness and cheerfulness.

Literary works: In literary works it is common for the pursuit of a desirable purpose to take the form of an actual upward journey, such as mountain climbing.

The following specific metaphorical extensions are identified within the target domain of STATES:






– Higher Morality Is Shang/Lower Morality Is Xia.
– Better Quality Is Shang/Poorer Quality Is Xia.
– In Public Is Shang/In Private Is Xia.
– Greater Intensity Is Shang/Lesser Intensity Is Xia.
– Fulfilment Of A (Positive) Action Is Shang/ Fulfilment Of A (Negative) Action Is Xia.

(54) shang de
     up virtue
     ‘grand virtue’
(55) xia jian
     down humble
     ‘of low morality’
(56) shang shi
     up gentleman
     ‘gentleman with high morality’
(57) xia shi
     down gentleman
     ‘gentleman with low morality’
(58) shang pin
     up rank
     ‘of the best quality’
(59) xia pin
     down rank
     ‘of the poorest quality’
(60) shang shi
     up market
     ‘be on sale’
(61) xia shi
     down market
     ‘be off sale’
(62) shang ban
     up office
     ‘go to work’
(63) xia ban
     down office
     ‘leave work’


(64) shang ke
     up class
     ‘have class’
(65) xia ke
     down class
     ‘class is over’
(66) dang shang jingli
     become up manager
     ‘get to the post of manager’
(67) diu xia haizi
     drop down child
     ‘leave the child unattended’

. UP and DOWN . Prototypical vs. metaphorical meanings In their prototypical dynamic and static models, UP and DOWN are used to denote the physical position or the changes in the physical position of a trajector along a vertical axis. Extended from these two prototypical models, UP and DOWN are also used to talk about and to construct a certain stage or changes over a period of time in other abstract domains. Evidence in the English corpus shows that UP and DOWN are mainly used for the conceptualization of changes in the same four target domains as SHANG and XIA, namely QUANTITY, SOCIAL HIERARCHY, TIME, and STATES. The metaphorical extensions identified are: 1. 2. 3. 4.

A Larger Quantity Is Up. A Higher Status Is Up. A Later Time Is Up. A More Desirable State Is Up.

A Smaller Quantity Is Down. A Lower Status Is Down. A Later Time Is Down. A Less Desirable State Is Down.

Since these metaphorical mappings for the most part share the same experiential grounding as their Chinese counterparts, I will not repeat their experiential bases in the following descriptions. As for the realizations of those metaphorical mappings, attention will only be paid to cases where a distinctively English way of realizing a particular metaphor has been detected. The analysis of the 529 instances of up shows that UP is used in its dynamic model in 97.7% of the cases. This is certainly different from SHANG and seems






to suggest that while SHANG is used to structure both a certain stage of its target domains and a certain change taking place in its target domains (with a bias towards the former), UP is almost always used to denote the changes going on in its target domains. Altogether, 87.6% of the records of up analysed are found with metaphorical extensions. This is even higher than the 72.3% found in the case of shang. Tables 5 and 6 present the statistical findings.

Table 5: The two prototypical models of UP

Prototype model   Number   Percentage of 529
Dynamic UP        517      97.7%
Static UP          12       2.3%
Total             529      100%

Table 6: The metaphorical extensions of UP

Target domain   Metaphorical extension          Number   Percentage of 463   Percentage of 529
STATES          A More Desirable State Is Up    313      67.6%               59.2%
QUANTITY        A Larger Quantity Is Up         126      27.2%               23.8%
HIERARCHY       A Higher Status Is Up            13       2.8%                2.5%
TIME            A Later Time Is Up               11       2.4%                2.1%
Total           –                               463      100%                87.6%

Of the 431 instances of down analysed, 94.4% belong to the dynamic model. Comparing this with XIA, which is well balanced between its static side and its dynamic side, we notice a sharp contrast. 45.4% of all the instances of down are found to have metaphorical extensions. This is much less than the 77.7% of xia. It is interesting to notice that while up is more often used metaphorically than shang, down is less often used metaphorically than xia. The reason for this will only be established by further research. Tables 7 and 8 present the statistical results.

Table 7: The two prototypical models of DOWN

Prototype model   Number   Percentage of 431
Dynamic DOWN      407      94.4%
Static DOWN        24       5.6%
Total             431      100%


Table 8: The metaphorical extensions of DOWN

Target domain    Metaphorical extension            Number    Percentage of 196    Percentage of 431
STATES           A Less Desirable State Is Down    113       57.7%                26.2%
QUANTITY         A Smaller Quantity Is Down        54        27.6%                12.5%
HIERARCHY        A Lower Status Is Down            25        12.7%                5.8%
TIME             A Later Time Is Down              4         2.0%                 0.9%
Total            –                                 196       100%                 45.4%
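The relation between the two percentage columns in Tables 6 and 8 can be checked with a short calculation. The sketch below is not part of the original study: it simply recomputes shares from the counts reported in Tables 5 to 8, and the names `COUNTS`, `share` and `summarise` are illustrative. Because each figure is rounded independently, recomputed values may differ from the printed ones by a tenth of a percentage point.

```python
# Counts copied from Tables 5-8; the data structure and names are illustrative.
COUNTS = {
    "UP": {
        "total": 529,
        "models": {"dynamic": 517, "static": 12},
        "extensions": {"STATES": 313, "QUANTITY": 126, "HIERARCHY": 13, "TIME": 11},
    },
    "DOWN": {
        "total": 431,
        "models": {"dynamic": 407, "static": 24},
        "extensions": {"STATES": 113, "QUANTITY": 54, "HIERARCHY": 25, "TIME": 4},
    },
}


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def summarise(particle):
    """Recompute the percentage columns for one particle (UP or DOWN)."""
    data = COUNTS[particle]
    metaphorical = sum(data["extensions"].values())
    rows = {
        # (count, share of metaphorical instances, share of all instances)
        domain: (n, share(n, metaphorical), share(n, data["total"]))
        for domain, n in data["extensions"].items()
    }
    return {
        "metaphorical": metaphorical,
        "models": {m: share(n, data["total"]) for m, n in data["models"].items()},
        "extensions": rows,
    }


if __name__ == "__main__":
    for particle in COUNTS:
        summary = summarise(particle)
        print(particle, summary["metaphorical"], summary["models"])
        for domain, (n, of_metaphorical, of_all) in summary["extensions"].items():
            print(f"  {domain:<10}{n:>5}{of_metaphorical:>7}%{of_all:>7}%")
```

In each row the first percentage is relative to the metaphorical instances only (463 for UP, 196 for DOWN) and the second to all instances analysed (529 and 431).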

In what follows, each of the metaphorical extensions observed for UP and DOWN is discussed briefly.

Four metaphorical extensions

(a) Quantity

A Larger Quantity Is Up. A Smaller Quantity Is Down.

The following specific cases have been found among the corpus data:

– Increase In Salary Is Up/ Decrease In Salary Is Down.
– Increase In Costs Is Up/ Decrease In Costs Is Down.
– Increase In Price Is Up/ Decrease In Price Is Down.
– Increase In Inflation Rate Is Up/ Decrease In Inflation Rate Is Down.
– Increase In Temperature Is Up/ Decrease In Temperature Is Down.
– Increase In Speed Is Up/ Decrease In Speed Is Down.
– Increase In Size Is Up/ Decrease In Size Is Down.

(68) The football star can expect up to 300,000 pounds a week.
(69) The nurses have offered to scale down their pay demands to a lower figure.
(70) The costs have been multiplied up many times.
(71) Is there any way we can prune the costs down still further?
(72) The dealers bid up all the good pieces, to keep out private buyers.
(73) The price of milk should be down next week.
(74) The inflation rate is going up again.
(75) The new government promised to bring the inflation rate down.
(76) The sun warmed up the seat nicely.
(77) After a warm and sunny day, the temperature will be down to 10 degrees tomorrow.
(78) You’ll have to speak up a bit, we can’t hear you above the noise of the traffic.
(79) The radio station faded the music down to give a special news broadcast.
(80) She has blown up the pictures she took with her mom.
(81) You’ve slimmed down such a lot since we last met!

(b) Social Hierarchy

A Higher Status Is Up. A Lower Status Is Down.

One special way of realizing this pair of metaphors was noted in the English material:

Religious beliefs: In Christianity, God and Jesus are up in Heaven, Satan and the other devils are down in Hell.

Below are a few examples:

(82) The upper strata of society
(83) Paleo is an upmarket resort.
(84) Your request will be handed up to the board of directors.
(85) Are the citizens still refusing to yield up the town?
(86) He has moved up the social ladder quite a lot since we last met.

(87) The downfall of a dictator
(88) We sell a lot of down-market books.
(89) A national strike would bring the government down.
(90) Why do the English look down on everything foreign?

(c) Time

A Later Time Is Up. A Later Time Is Down.

Two special cases have also been found with TIME PASSING IS MOTION ALONG VERTICAL AXIS in English. Consider the following examples:

(91) from 1918 up to 1945
(92) from the Middle Ages up to the present day
(93) They were using charcoal right up to my day.
(94) Up until the early sixties there was no shortage of power.
(95) Up to now they’ve had very little to say.

Examples like the above suggest the existence of Special Case 1: time is moving upward from the past towards the future. This is different from special case 1 noted in Chinese. Now consider some other examples:

(96) It had been occupied as a palace by all our kings and queens down to James I.
(97) There has been a chapel here down all the years my family has lived in this house.
(98) The custom has been carried down from the 18th century.

Expressions like these suggest the existence of Special Case 2: human beings (with their belongings) move downward from the past toward the future. This is the same as special case 2 noted in Chinese. Unlike the situation in Chinese, the two special cases in English are not consistent with each other. This inconsistency results in a conflict between Towards A Later Time Is Up and Towards A Later Time Is Down.

(d) States

A More Desirable State Is Up. A Less Desirable State Is Down.

This is a piece of evidence for the existence of the Event Structure Metaphor in English. The following specific cases have been observed in the data:

– Into Consciousness Is Up/ Into Unconsciousness (or Death) Is Down.
– Into A More Active State Is Up/ Into A Less Active State Is Down.
– Virtue Is Up/ Depravity Is Down.
– Into A State Of Cheerfulness Is Up/ Into A State Of Depression Is Down.
– Improvement In Appearances Is Up/ Worsening In Appearances Is Down.
– Increase In Brightness Is Up/ Decrease In Brightness Is Down.
– Increase In Force Is Up/ Decrease In Force Is Down.
– Increase In Thickness Is Up/ Decrease In Thickness Is Down.
– Into Existence Is Up/ Out Of Existence Is Down.
– Into A State Of Operation Is Up/ Out Of A State Of Operation Is Down.
– Towards Completeness Is Up/ Towards Finality Is Down.

(99) When did you wake up this morning?
(100) One of the brothers was gunned down outside his home in London.
(101) Now that I’m the mother of two children I’m up at 6 every morning.
(102) Jane was down with a cold last week, so she didn’t come to work.
(103) She is an upstanding citizen.
(104) That was a low-down thing to do.
(105) You need a holiday to cheer you up.
(106) The young man seemed to be loaded down with the worries of fatherhood.
(107) Are we going to dress up for the wedding, or is it informal?
(108) The model dressed down so that nobody could recognize her on the streets.
(109) The new paint will brighten up the house.
(110) Dim the stage lights down during scene 3.
(111) The wind is up.
(112) I hope the wind keeps down, or the sea will be too rough for sailing.
(113) The mist has thickened up since this morning. I don’t think it’s safe to go out now.
(114) The paint has been thinned down too much.
(115) New towns are sprouting up all over the country as part of the government’s plan to find homes for the increasing population.
(116) She waited until the laughter had died down.
(117) I hated that old car, I had to crank it up every morning to get it started.
(118) Make sure you shut down the computer before leaving the room.
(119) I’m sorry, the hotel is booked up.
(120) The shop will be closing down for good on Saturday, so everything is half price.

.

Conclusion

From the above analysis of both the Chinese and the English data, it can be seen that remarkable similarities mark the metaphorical extensions detected for SHANG/XIA and UP/DOWN. The similarities are mainly reflected in the following three ways:

1. Both SHANG/XIA and UP/DOWN are used to structure the same four target domains, namely QUANTITY, SOCIAL HIERARCHY, TIME and STATES.
2. Within these four target domains, what is oriented ‘xia’ is also oriented ‘down’, and what is oriented ‘shang’ is also oriented ‘up’ (except that An Earlier Time Is Shang, but A Later Time Is Up).
3. The metaphorical extensions detected are found to be arranged in largely comparable order of frequency, with A More Desirable State Is Shang/Up//A Less Desirable State Is Xia/Down being the most frequently occurring metaphorical extension for all the four concepts.

Some discrepancies between Chinese and English have also been observed:

1. It has been found that UP and DOWN are predominantly used in their dynamic model while SHANG and XIA are well balanced between their non-dynamic side and their dynamic side, with a bias towards the former. To put this in another way, while SHANG and XIA are used to denote both the location of a stationary trajector and the orientation of a moving trajector, UP and DOWN are predominantly used to capture the latter rather than the former.
2. SHANG and XIA carry a special prototypical model called contact SHANG and contact XIA. With contact SHANG, the trajector rests upon and is supported by the landmark; with contact XIA, the trajector stays below and is covered or pressed by the landmark. No such special case has been found with UP or DOWN.
3. Within the domain of TIME, Chinese has a pair of conceptual metaphors in agreement with each other, namely An Earlier Time Is Shang/A Later Time Is Xia; English, by contrast, has a pair of metaphors in the reverse direction, namely A Later Time Is Up/A Later Time Is Down.

It must be emphasized that these discrepancies do not diminish the overall remarkable similarities between the metaphorical extensions found for SHANG and XIA and for UP and DOWN. This study suggests the following conclusions:

1. Since STATES, QUANTITY, SOCIAL HIERARCHY and TIME are all basic abstract domains important to our thinking, the fact that in both Chinese and English they are organized by the metaphorical mappings of the image-schematic structures of SHANG/XIA and UP/DOWN illustrates that our abstract reasoning is at least partly metaphorical.
2. That Chinese and English exhibit remarkable similarities in the metaphorical extensions of SHANG/XIA and UP/DOWN is a piece of evidence that there may indeed exist a universal spatial metaphorical system as predicted by Johnson (1992) and Sinha (1995). The Event Structure Metaphor and the TIME-AS-SPACE metaphor in particular may be strong candidates for universal metaphorical mappings.





Lan Chun

References

Akhundov, M. 1986. Conceptions of Space and Time. Cambridge, MA: MIT Press.
Allan, K. 1995. “The anthropocentricity of the English word(s) back”. Cognitive Linguistics 6: 11–33.
Alverson, H. 1994. Semantics and Experience: Universal Metaphors of Time in English, Mandarin, Hindi, and Sesotho. Baltimore: Johns Hopkins University Press.
Bickel, B. 1997. “Spatial operations in deixis, cognition, and culture: Where to orient oneself in Belhare”. In Language and Conceptualization, Nuyts & Pederson (eds), 46–83. Cambridge: Cambridge University Press.
Gärdenfors, P. 1996. “Mental representation, conceptual spaces and metaphors”. Synthese 106: 21–47.
Geiger, R. and Rudzka-Ostyn, B. (eds) 1993. Conceptualizations and Mental Processing in Language. Berlin: Mouton de Gruyter.
Goatly, A. 1997. The Language of Metaphors. London: Routledge.
Johnson, M. 1987. The Body in the Mind. Chicago: University of Chicago Press.
Johnson, M. 1992. “Philosophical implications of cognitive semantics”. Cognitive Linguistics 3 (4): 345–366.
Lakoff, G. and Johnson, M. 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Lakoff, G. 1987. Women, Fire, and Dangerous Things. Chicago: University of Chicago Press.
Lakoff, G. 1993. “The contemporary theory of metaphor”. In Metaphor and Thought, Ortony (ed.), 202–251. Cambridge: Cambridge University Press.
Langacker, R. 1987. Foundations of Cognitive Grammar. Stanford: Stanford University Press.
Leech, G. 1983. Principles of Pragmatics. London: Longman.
Lindner, S. 1981. A Lexico-semantic Analysis of English Verb Particle Constructions with OUT and UP. Ph.D. dissertation. University of California, San Diego.
Morgan, P. 1997. “Figuring out figure out: Metaphor and the semantics of English verb-particle construction”. Cognitive Linguistics 8 (4): 327–359.
Schönefeld, D. 1999. “Corpus linguistics and cognitivism”. International Journal of Corpus Linguistics 4: 137–171.
Sinha, C. 1995. “Introduction”. Cognitive Linguistics 6: 7–9.
Smith, M. 1993. “Cases as conceptual categories: Evidence from German”. In Conceptualizations and Mental Processing in Language, R. Geiger and B. Rudzka-Ostyn (eds), 530–545. Berlin: Mouton de Gruyter.
Stibbe, A. 1996. Metaphor and Alternative Conceptions of Illness. Ph.D. dissertation. Lancaster University.
Svorou, S. 1994. The Grammar of Space. Amsterdam: John Benjamins.
Yu, N. 1996. The Contemporary Theory of Metaphor: A Perspective from Chinese. Ph.D. dissertation. University of Arizona.

From figures of speech to lexical units

An English-French contrastive approach to hypallage and metonymy

Michel Paillard

Introduction: methodological issues

The aim of this paper is to examine specific instances of the figures of speech known as hypallage and metonymy, whether from textual or lexicographic material, and to show that in some areas the availability of these syntactico-semantic patterns differs substantially in English and in French. The reason for examining both corpus and dictionary data is that while contrast observed in translated passages of fiction may be partly attributed to stylistic choice and subjectivity, lexicalized examples of the same contrastive patterns provide recognized evidence.

Using corpus data to investigate these syntactico-semantic patterns is far from straightforward as they cannot be searched for on the basis of form. They are special cases of Adjective+Noun or Noun+Noun phrases which even elaborate tagging procedures could not adequately sort out. Besides, systematic scrutiny of a 100,000-word bilingual press corpus only yielded a dozen occurrences of those types, several of which are discussed below. This seems to indicate that they are less frequent in journalistic prose than in either literary style or everyday vocabulary.¹ The systematic study of a much larger corpus might modify the conclusions reached in this paper.
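The difficulty of searching by form can be illustrated with a minimal sketch. It assumes a toy, hand-tagged input: the tag labels, the `adj_noun_pairs` helper and the second, invented sentence are illustrative and do not come from the corpus described here. A purely formal Adjective+Noun query returns the hypallage careful way from example (5) alongside an ordinary attributive phrase, and nothing in the output distinguishes the two.

```python
# Toy hand-tagged tokens; the tags and the second sentence are illustrative assumptions.
TAGGED_SENTENCES = [
    # From example (5): "careful" describes the manner of picking, not the way.
    [("I", "PRON"), ("picked", "VERB"), ("a", "DET"), ("careful", "ADJ"),
     ("way", "NOUN"), ("through", "ADP"), ("the", "DET"), ("lobby", "NOUN")],
    # Invented control sentence: "narrow" really does describe the way.
    [("I", "PRON"), ("found", "VERB"), ("a", "DET"), ("narrow", "ADJ"),
     ("way", "NOUN"), ("through", "ADP"), ("the", "DET"), ("lobby", "NOUN")],
]


def adj_noun_pairs(sentence):
    """Return every adjacent (adjective, noun) pair in one tagged sentence."""
    pairs = []
    # Walk over consecutive token pairs and keep those tagged ADJ + NOUN.
    for (word, tag), (next_word, next_tag) in zip(sentence, sentence[1:]):
        if tag == "ADJ" and next_tag == "NOUN":
            pairs.append((word, next_word))
    return pairs


if __name__ == "__main__":
    for sentence in TAGGED_SENTENCES:
        print(adj_noun_pairs(sentence))
```

Separating the two hits requires a judgement about what the adjective actually qualifies, which a form-based query cannot supply.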




. Hypallage . Figure of speech and syntactic shift Hypallage is defined as an “interchange in syntactic relationship between two terms” (Webster’s Collegiate Dictionary, e.g. You are lost to joy for Joy is lost to you) or as “the transposition of the natural relations of two elements in a proposition” (Concise Oxford Dictionary, e.g. Melissa shook her doubtful curls). More specifically, hypallage characterizes phrases in which the (apparent) syntactic scope of a qualifying term does not coincide with its (real) semantic scope. As the difference between the two examples above suggests, this can apply to several types of syntactic structure. For the sake of clarity we shall distinguish between three of them. Type 1 involves a syntactic shift (of one) or inversion of (two) elements: (1) Ce marchand accoudé sur son comp- The greedy shopkeeper, resting his elbows on his counter2 toir avide (Victor Hugo) for: “Ce marchand avide accoudé sur son comptoir” (2) besmirched / With rainy marching in salies par des marches pluvieuses à the painful field (Henry V) for: “With travers la plaine ardue [Translation painful marching in the rainy field” by François-Victor Hugo]

These examples (quoted in Suhamy 1981: 54) clearly belong to dated poetic style. The translation of (1) restores the real syntactic scope of the adjective. In the translation of (2), the marked effect of hypallage is fully maintained in marches pluvieuses, but only partly so in the second phrase as ardu(e) commonly collocates if not with plaine (which would be felt as a contradiction) at least with other topographic terms such as chemin (chemin ardu : steep/difficult path). The trope still routinely appears in modern literature, creating an impressionistic effect, as highlighted by Fromilhague (1995:43) with regard to example (3): D’où la présence marquée de l’hypallage dans les textes qui visent à restituer des associations impressionnistes étrangères à la logique: l’écriture artiste et l’esthétique fin de siècle en fournissent des exemples nombreux.3 (3) les fleurs de paulownias, d’un mauve the paulownia flowers, a rainy mauve against the sky of Paris pluvieux du ciel parisien (Colette) (4) des cocktails d’une écoeurante et inutile complication (Modiano)

sickening and uselessly sophisticated cocktails


Type 2, illustrated in examples (5–6), adds a change in syntactic category, typically Adjective for Adverb or vice versa:

(5) I picked a careful way through the lobby to the privies on the starboard quarter. (W. Golding, Close Quarters)
    Je me frayai avec précaution un chemin jusqu’aux toilettes côté tribord

(6) Still and quiet and almost looking flimsily aged at ten years old (J. Gardam, God on the Rocks)
    Sage et tranquille, l’air fragile et presqu’âgé à dix ans⁴

Such data can be related to the well-known syntactic versatility of English adverbs in -ly (from manner to sentence adverbs). As argued by Larreya & Méry (1992), hypallage is part of the more general flexibility of movement (including raising) which characterizes English syntax. In the fields of qualification and modality, for instance, both languages have raisings such as She is easy to please (Elle est facile à contenter) instead of It is easy to please her. French, however, does not have She is likely/certain to win or the epistemic interpretation of You’re sure to like the film.

Type 3 further involves ellipsis of one term, as in the following passages, where the semantically implied but syntactically omitted element has to be reintroduced in the French version:

(7) They thought in the winter it must be damn cold. They thought of the ten drizzling miles to Handleyford. (V.S. Pritchett, Many are Disappointed)
    Ils pensaient aux dix miles à parcourir sous la pluie pour gagner Handleyford.

(8) Did you imagine that in the vicinity of Noah’s palace (oh, he wasn’t poor, that Noah) there dwelt a convenient example of every species on earth? (J. Barnes, A History of the World in 10 1/2 Chapters)
    On aurait pu penser qu’à proximité du palais de Noé (il n’était pas à plaindre, allez, ce Noé), par un heureux hasard, résidait un exemplaire de chaque espèce vivant sur Terre.

(9) Ancien président de NBC, M. Joseph Angotti suggère: “Pour l’essentiel, cela ne doit rien à un choix rédactionnel. Il s’agit d’une logique économique. C’est ce qu’il y a de plus facile, de plus paresseux et de moins cher à couvrir”. (Le Monde diplomatique, August 1998)
    A former vice-president of NBC, Joseph Angotti suggests that “most of that crime coverage is not editorially driven, it’s economically driven. It’s the easiest, cheapest, laziest news to cover.” (Le Monde diplomatique, August 1998, English Edition)






Example (9), from the parallel corpus mentioned above, is more complex for the following reasons:

– The French text includes a quotation which was originally in English.
– In the English version, the first two adjectives pave the way for a tolerable if semantically distorted third superlative + infinitive construction. Clearly, while easy and cheap apply to the coverage, lazy can only qualify the journalist.
– The French adjective paresseux is in second position and therefore does not directly take the infinitive as its complement. The sequence plus paresseux à couvrir would definitely be unacceptable.

Hypallage in lexicalized phrases

The pattern of ellipsis and the contrast between English and French in this respect are clearest in cases of lexicalized hypallage such as the following:

(10) Foreign Office ‘Foreign Affairs Office’
     Ministère des Affaires étrangères

(11) art theft ‘theft of works of art’
     vol d’oeuvres d’art

(12) lucid interval ‘an interval during which a mentally ill person is lucid’
     moment de lucidité

(13) restricted area ‘area where speed is restricted’
     zone à vitesse limitée
     restricted area ‘area where access is restricted’
     zone à accès réglementé

(14) white wedding ‘a wedding at which the bride wears a white dress’
     mariage en blanc⁵

(15) extreme sports ‘Designating sports performed in a hazardous environment, involving a high physical risk.’ (The Oxford Dictionary of New Words, 1997)
     sports de l’extrême

(16) compassionate fare ‘A significantly reduced airline fare made available to people who are travelling to attend a funeral or visit someone very ill.’ (A Dictionary of Today’s Words)⁶
     ? tarif accordé pour motif familial grave

Similarly elliptical compounds such as happy hour, topless bar, sick bag doggedly resist translation into French.⁷ They require some form of transposition, and in some cases are simply borrowed (e.g. standing ovation). Variants of example (15) seem to be creeping in (ski extrême, skieurs de l’extrême) and one recently found its way into a report by Le Monde on the rescue of two British mountaineers, following due contextual preparation:

(17) C’est une véritable “opération commando” qui a été menée, dimanche 31 janvier, dans des conditions extrêmement périlleuses, deux alpinistes britanniques bloqués depuis quatre jours à près de 4000 mètres d’altitude. (…) “Il fallait faire vite et être précis. C’était une course contre le temps et la montre”, raconte Pascal Brun (…) Ce sauvetage extrême exigeait un appareil puissant et disposant d’une moindre prise au vent qui continuait de souffler en rafales. (Le Monde, 2 February 1999, p. 10)

A few phrases of this type have come to be common to the two languages:

(18) fast lane          voie rapide
(19) happy days         jours heureux
(20) masked ball        bal masqué
(21) musical chairs     chaises musicales

But fully lexicalized cases of hypallage are few and far between in French:

(22) (tomber en) panne sèche “panne lors de laquelle le réservoir d’essence est à sec”
     also: (tomber en) panne d’essence
     run out of petrol/gas

(23) de guerre lasse (il finit par accepter) “las de résister, il finit par accepter”
     he grew tired of resisting and finally accepted [Robert & Collins Dictionary]

In their outstanding studies of English word formation, both Adams (1973: 87) and Tournier (1985: 212) emphasize the role of ellipsis in such patterns. In the words of Adams (1973: 87):

    Some of these could be seen as three-word structures with an ellipted second element; confidential secretary might be explained by a phrase like ‘confidential work secretary’.

Tournier’s treatment of “l’hypallage lexicalisée” as a shift of meaning (alongside the countable/uncountable or transitivity parameters of polysemy) can be questioned insofar as the admittedly problematic syntax of such phrases leaves the meanings of the components unchanged. They are best dealt with, as Adams chooses to do, as a special type of Adjective + Noun compound. Both authors group the type illustrated above with arguably different structures: plastic surgeon quoted by Tournier or criminal lawyer quoted by Adams do raise problems of analysis and translation but they should in fact be treated as derivationally related to plastic surgery and criminal law respectively (cf. Coates 1971).

. Metonymy . Metonymy in exocentric and endocentric compounds The contrastive picture is quite different, and in some respects reversed, where metonymy is concerned. Metonymies in which the ‘vehicle’, in the terms of Leech (1969: 151), names a distinctive part or concrete characteristic of the entity referred to, or ‘tenor’, are not uncommon in French: (24) des gros bras

rednecks, musclemen

(25) le rouge-gorge

the (redbreast) robin

(26) le petit écran

the small screen

(27) une ceinture noire

a black belt

These are exocentric compounds (rednecks are not necks), also called bahuvrihi compounds. Both languages have cols blancs (white-collar workers), even though the French phrase is labelled as “traduction de l’anglais” in Le Petit Robert, but only American English has wetbacks, ‘ouvrier agricole mexicain entré illégalement aux Etats-Unis’ (Robert & Collins English-French Dictionary). The pattern is indeed even more widespread in English in endocentric compounds (a bag lady is a lady) such as the following. The distinctive feature selected tends to be very specific and the semantic shortcut from vehicle to tenor can be spectacular. Translation into French is often problematic: (28) bag lady

clocharde [Robert & Collins Dictionary, which conversely gives tramp as a translation for “clochard(e)”]

(29) lollipop lady / man

(Brit) contractuel(le) qui fait traverser la rue aux enfants [Robert & Collins Dictionary]

(30) red-brick university

(Brit: often pej) université de fondation récente [Robert & Collins Dictionary]

From figures of speech to lexical units

(31) Ivy League

(US) les huit grandes universités privées du nord-est [Robert & Collins Dictionary]

(32) latchkey child (‘a child who is ?? “enfant à la clé” [Robert & Collins Dictionary] alone at home after school until a parent returns from work’, Concise Oxford Dictionary) (33) jet set

le ou la “jet set”

Nominalization and discreteness

English on the other hand seems to resist some types of abstract-for-concrete metonymy. Although there are many examples of long-standing, fully lexicalized state or action nouns in either English or French (an administration, an introduction, a building, a facility, etc.) English less readily allows a nominalized predicate to refer to a specific occurrence, or to the agent, place or instrument of the process. The following examples are from Chuquet & Paillard (1987), Astington (1983), Guillemin-Flescher (1981) and the parallel corpus mentioned above:

(34) société de consommation
     consumer society

(35) à la réception
     at the reception desk⁸

(36) l’allongement de la scolarité
     the raising of the school-leaving age

(37) Une signalisation totalement différente sera alors indispensable, pour permettre en particulier le freinage automatique des rames.
     An entirely different signalling system will be essential, particularly to allow automatic braking of trains.

(38) Mais les régions dominées par la guérilla sont aussi les zones où s’est développée la culture de la coca. (Le Monde diplomatique, July 1998)
     However, the regions dominated by the guerrilla movements also happen to be the areas in which the growing of coca is particularly widespread. (Le Monde diplomatique, English Edition)

(39) La confusion entre information et divertissement, désormais réunis par le lien sacré de l’audience, a parfois des effets politiques et sociaux dévastateurs. (Le Monde diplomatique, August 1998)
     The blurring of the dividing line between information and entertainment, both of which are now governed by the iron law of audience ratings, can have dangerous political and social effects. (Le Monde diplomatique, English Edition)






Such differences are relevant to at least two areas of linguistic analysis:

– the crucial question of nominalization and the related issues of concretization and discreteness: Langacker (1987) offers different cognitive representations of the verb and of the noun in pairs such as explode/explosion. Defrancq & Willems (1996) examine the polysemy of nominalizations on a scale of concreteness. For example, French construction can refer either to the process of building or to the resulting edifice whereas édification only refers to a process.
– the diverging strategies of English and French in sentence orientation and semantic compatibility in argument structure. Detailed contrastive work carried out within the theoretical framework of Culioli’s “Théorie des opérations énonciatives” (Guillemin-Flescher 1981, Celle 1997) shows that while French routinely associates heterogeneous predicates and arguments in terms of their degree of animacy or abstractness, English requires a higher degree of homogeneity. Guillemin-Flescher (1981) uses a large corpus of works of fiction and their published translations to show, for instance, that the English versions will tend to avoid associating nouns referring to inanimate entities with verbs normally taking animate subjects:

(40) sa conscience le taraudait (M. Tournier, Vendredi)
     he suffered pangs of conscience (Translation by N. Denny)

This explains the frequent need in English translations of this type to fall back on concrete nouns as the syntactic heads of arguments. Straightforward examples are provided by lexicalized phrases such as (34–39). The structure is more complex in textual material such as (41–44), where various grammatical factors are involved: quantification (41), collocation and metaphor (43), coordination of arguments (44). Rearrangements are then required.⁹

(41) Ses plus extrêmes audaces, par certains côtés, sont des naïvetés.
     His most audacious tricks, in some respects, are mere lack of experience.

(42) Il est vrai que, mieux que tous les sondages, les banques connaissent l’intimité économique des Français.
     It is true that, better than any opinion poll, the banks know the intimate details of French people’s economic life.

(43) Des héritiers, dont les ancêtres ont immigré depuis déjà deux siècles, persistent à y cultiver la citoyenneté britannique.
     The descendants of the first migrants who landed some two centuries ago still cultivate the art of being true British citizens.

(44) Une route buissonnière un peu déglinguée rejoint Ermelo et la fraîcheur de ses cascades.
     A rough cross-country road leads to Ermelo and its cooling waterfalls.

Failure by non-native speakers to recognize and respect such differences can lead to grammatically well-formed but non-idiomatic expressions. Celle (1997: 148) notes that literal translations of the phrases highlighted in (43) and (44) would not be acceptable in English:

– Their descendants still cultivate British citizenship.
– A rough country road leads to Ermelo and the coolness of its waterfalls.

Conclusion

Hypallage and metonymy are found on a cline from complete lexicalization to literary creativity. On the basis of the data examined in this paper, which should be supplemented by quantitative corpus-based analysis, the limits imposed on the use of these patterns in English and in French appear to be diametrically opposed: in the type of metonymy just examined, French characteristically tolerates a greater degree of semantic heterogeneity between argument and predicate. Through hypallage, English characteristically allows greater syntactic flexibility in the form of movement and ellipsis.

Notes . The sample examined is part of a 500,000-word journalistic corpus now being created at the University of Poitiers for concordance processing. It consists of articles from Le Monde diplomatique published over a three-year period (1998–2000) and their English translations made available to subscribers in electronic form. It will be matched by a multilingual fiction corpus as part of a joint research project named PLECI (Poitiers-Louvain Echange de Corpus Informatisés). .

The French versions of examples (1) to (7) are my translations unless otherwise stated.

. Hence the marked presence of hypallage in texts aiming to conjure up impressionistic associations alien to logic: over-elaborate writing and fin de siècle aesthetic standards offer many examples of it. (My translation.) . Examples (6) and (8) are from Khalifa, J.C., Fryd, M. & Paillard, M. 1998. La version anglaise aux concours. Paris: Colin.





Michel Paillard .

Not to be confused with mariage blanc, which is a metaphor (‘unconsummated marriage’).

.

Cf. Lerner and Belkin (1993).

. I am grateful to the colleagues who offered suggestions on this point during and after the Symposium in Louvain, particularly François Maniez from the University of Lyon 2. . An illustration of this problem is to be found in Van Roey et al. (1988: 583): Veuillez passer à la réception can be translated by either Please go to reception or Please go to the reception desk. . Examples (41) and (42) are borrowed from Astington (1983); (43) and (44) from Celle (1997).

References Adams, V. 1973. An Introduction to Modern English Word-Formation. London: Longman. Astington, E. 1983. Equivalences. Translation Difficulties and Devices, French-English, English-French. Cambridge University Press. Celle, A. 1997. “Quand l’objet est un nom de procès”. In “La transitivité”, M.L Groussier (ed.), Cahiers Charles V 23: 139–172. Université de Paris 7. Chuquet, H. et Paillard, M. 1987. Approche linguistique des problèmes de traduction, anglais <> français. Paris et Gap: Ophrys. Coates, J. 1971. “Denominal Adjectives: A Study in Syntactic Relationships between Modifier and Head”. Lingua 27: 160–169. Culioli, A. 1990. Pour une linguistique de l’énonciation. Opérations et représentations. Paris et Gap: Ophrys. Defrancq, B. and Willems, D. 1996. “De l’abstrait au concret. Une réflexion sur la polysémie des noms déverbaux”. In Les noms abstraits. Histoire et théories, N. Flaux, M. Glatigny and D. Samain (eds), 221–230. Lille: Presses Universitaires du Septentrion. Dupriez, B. 1984. Gradus. Les procédés littéraires (Dictionnaire). Paris: Union Générale d’Editions [Collection 10/18]. Fromilhague, C. 1995. Les figures de style. Paris: Nathan [Collection 128]. Guillemin-Flescher, J. 1981. Syntaxe comparée du français et de l’anglais. Paris et Gap: Ophrys. Kleiber, G. 1994. Nominales. Essais de sémantique référentielle. Paris: Colin. Lakoff, G. and Johnson, M. 1980. Metaphors We Live By. Chicago: The University of Chicago Press. Langacker, R. 1987. “Nouns and Verbs”, in Communications 53 (1991): 103–153. Paris: Editions du Seuil. Larreya, P. et Méry, R. 1992. “On the Syntactic Productiveness of Hypallage”. Travaux du CIEREC 76: 143–160. Université de Saint-Etienne. Leech, G. 1969. A Linguistic Guide to English Poetry. London: Longman. Rainer, F. 1996. “La polysémie des noms abstraits”. In Les noms abstraits. Histoire et théories, N. Flaux, M. Glatigny and D. Samain (eds), 117–128. Lille: Presses Universitaires du Septentrion.

From figures of speech to lexical units

Suhamy, H. 1981. Les figures de style. Paris: Presses Universitaires de France. [Collection “Que sais-je?”]
Tournier, J. 1985. Introduction descriptive à la lexicogénétique de l’anglais contemporain. Paris: Champion-Slatkine.
Tournier, J. 1988. Précis de lexicologie anglaise. Paris: Nathan.
Ullmann, S. 1967. Semantics. An Introduction to the Science of Meaning. Oxford: Blackwell.
Van Hoof, H. 1989. Traduire l’anglais. Théorie et pratique. Paris et Louvain: Duculot.

Dictionaries

The Concise Oxford Dictionary, 1995.
Le Nouveau Petit Robert, Dictionnaire de la langue française, 1993.
Robert & Collins, French-English, English-French Dictionary, 1995.
Webster’s New Collegiate Dictionary, 1979.
The Oxford Dictionary of New Words, 1997.
Lerner, S. and Belkin, G. S. 1993. Trash Cash, Fizzbos, and Flatliners. A Dictionary of Today’s Words. Boston and New York: Houghton Mifflin.
Van Roey, J., Granger, S. and Swallow, H. 1988. Dictionnaire des faux amis anglais-français. Paris & Gembloux: Duculot.



P IV

Corpus-based Bilingual Lexicography

The role of parallel corpora in translation and multilingual lexicography

Wolfgang Teubert

.

The need for translation

Globalisation has led to an increased demand for translation. Twenty years ago, when people in Europe who had bought a satellite dish were given the option of choosing between TV programmes broadcast in many languages, it was believed that this would lead to an increase in the learning of foreign languages, not only English as the global interlingua, but also European languages of some regional importance such as French and German. But these expectations were not fulfilled. While, in their professional lives, more and more people are learning to function in a bilingual or multilingual environment, it seems that, apart from a traditionally small polyglot elite, in their private lives they tend to cling to the language they grew up with.

Suddenly we find not only periodicals but also daily newspapers being translated. There is a daily German edition of the Financial Times, and in France, Germany, and Italy the International Herald Tribune now comes together with an English language edition of a prominent local newspaper. The EuroNews TV channel is transmitted in several languages, and other channels will follow suit shortly. Globalisation of the media has opened up a new market for instant translation.

Alongside the need for instant translation of texts, most of which will soon be forgotten (the common fate of most media coverage), there is also the necessity to translate agreements, contracts and all other documents that could have a legal impact (such as product descriptions and user instructions) into other languages. Whatever their source language (increasingly English, as we are all aware), these texts have to be localised, translated into the language(s) of the country whose jurisdiction is involved. As long as legal systems are not globalised, courts will accept documents as evidence only if they exist in the official language(s) of the country in which the court is situated. It is one of the central principles of the European Union that only those texts issued by the European authorities which have been translated into the official EU languages become legally binding in the member states. Therefore, all the new countries which have applied for membership, the so-called newly associated countries (NACs), will have to translate a corpus of (ultimately) 12 million words of EU documents into their languages before they can join.

For the existing EU, the Commission’s Translation Service, the largest translation agency in the world, produces translations of all relevant new documents in the official EU languages; and here again it is questionable whether these translations will ever be referred to. Kaisa Koskinen tells us that “often, all the Finnish participants will have already read the original non-Finnish document (or even taken part in drafting it) and the Finnish translation arriving two months later contains no new information for them. The Finnish authorities have also been notoriously reluctant to rely (or admit to relying) on the Finnish versions, preferring to use English or occasionally French translations which are perceived as ‘more reliable’ or even judicially more valid than the Finnish ones” (Koskinen 2000: 51–52). This certainly does not mean the Finns will give up their right to have Finnish translations. Rather, it is another indication that our complex modern environment demands the production and translation of texts not to be read but to be there in case a need for them arises. At the European Parliament, we encounter the paradoxical situation that less than 10% of its budget is spent on parliamentary work proper, while more than 90% is spent on interpretation and translation.
Translation, together with the necessity to write texts in a foreign language, is the most remarkable challenge linguistics has ever faced. To prepare people to cope with multilingual situations it is not enough to teach them foreign languages; there is also a need to give them tools: printed and electronic bilingual dictionaries that actually serve their purpose. It is time to develop a new generation of dictionaries, dictionaries suitable for assisting translation not only into the translator’s native language but also into a foreign language, dictionaries that give their users the proper translation equivalent for each semantic unit they have to deal with.

Cognitive linguistics: a model for cross-linguistic lexicology?

Meaning is the core issue of translation. A translator produces a paraphrase of a
text in another language. Meaning and meaning alone links a paraphrase to the original text. The more similar text and paraphrase are in their meanings, the more satisfactory the paraphrase. But many linguists are rather coy on the issue of translation. They are much more interested in contrasting languages and their vocabularies from a typological point of view. But how can one contrast vocabularies without using texts and their translations as a tertium comparationis? The history of Machine Translation (MT) is closely related to the history of cognitive linguistics. But for mainstream cognitive linguistics, meaning is not a discourse feature but a feature of the mind, in the form of mental representations of concepts. Concepts are, in this model, universal and part of the language of thought or ‘mentalese’, and they can be mapped onto the speaker’s native-language vocabulary, even though we cannot assume a one-to-one relationship between universal concepts and the words of a given natural language. Is this an approach that could be adopted by cross-linguistic lexicology? Is a word in one language the equivalent of a word in another language because they can both be mapped, somehow, to the same concept? Words are language signs, are symbols; they can be studied from the two points of view: content (or meaning) and form. Content or meaning cannot be separated from form. What then is the form of the universal concepts cognitive linguists talk about? How can we describe their meaning without using natural language? If there really are universal concepts, if we are told what they look like and what exactly they mean, cross-linguistic lexicography could and should use them. Universal concepts could help us define what translation equivalence is. 
As long as the conceptual ontologies used in MT describe their concepts in pretty much the same way as dictionaries do, using English or any other language for their definitions, it is hard to see how cross-linguistic lexicography can profit from this approach. The core issue of translation is meaning. For each semantic unit of the source text, there has to be an equivalent in the target text. Therefore cross-linguistic lexicography in quest of meaning must pay close attention to the practice of translators. It is they who invent the translation equivalents for lexical expressions. For these translation equivalents are not discovered; they are invented. Translators deal in texts, and they undertake to paraphrase a text in a different language so that the paraphrase will mean almost the same as the original text. In order to carry out their task, they have to understand the text. This means that they interpret the text. Text interpretation, however, is an action, not a process. Only human beings can do it. All computers can do is carry out processes. Therefore computers cannot translate in the sense in which translation is generally understood. This is why the classical approach to MT will necessarily fail whenever the goal is to translate general language texts without any need for post-editing. Using concepts does not help as long as we have to treat concepts just as we have to treat natural language words.

This is really nothing new. In his book The Possibility of Language (1996) Alan K. Melby, one of the founders of the discipline of Machine Translation, has given us a thorough account of why machine translation based on conceptual ontologies cannot work. But the MT community has paid little attention. So at present cross-linguistic lexicology can learn but little from Machine Translation.

The quest for the perfect language

Ever since the Tower of Babel, the Ursprache has been replaced by a multitude of mutually incomprehensible vernaculars, and the complaints about the corruption of our (natural) languages have not abated. In the language of Adam and Eve, we are told, words still had their proper meaning; they represented the Platonic ideas, concepts like ‘apple’ and ‘snake’ in metaphysical purity, and not yet subject to the distortions that came to pass once the Golden Age was over. Since then there have been a multitude of attempts to recreate this original perfect language as a symbolic algorithm guaranteeing instant communication and perfect understanding. It is a belief cherished in their hearts by many members of the AI and MT communities. In a newspaper article written by Chris Partridge which appeared in the London Times in July 1997, reiterating how close science is to complete success in Machine Translation, we find the revealing heading “Language is the last barrier to global communication”. The message of this article is that once we replace our deficient, decayed and corrupt native languages by a linguistic system free of the contingencies, idiosyncrasies and anomalies so typical of natural language, the problem of global communication will be solved. It is a way of looking at meaning that we also find in Hilary Putnam’s article ‘The Meaning of Meaning’ (1975). In it he suggests that the closest we can hope to get to distinguishing the meaning of the word elm from the meaning of the word beech is to ask not the language community but the expert. But even what the experts tell us is no more than an approximation. For Putnam, the true meaning of the word elm is what elms are in reality and what sets them apart from other trees, for instance beeches. In his view, the category ‘elm’ would exist even if there was no one to be aware of it.

Among cognitive linguists, we find a similar desire to posit an ideal language, common to all human beings. Since Noam Chomsky is primarily interested in finding and formulating the universal and innate laws which make our language work, he is more concerned with syntax (regarding language as some kind of formal algorithm) than with semantics, which he regards as a secondary and contingent phenomenon. But scholars originally close to him, such as the language philosopher Jerry Fodor, include semantics in their search for language universals. In 1975, Fodor published the seminal Language of Thought in which he discusses the universal nature of concepts as cognitive phenomena. Since then, cognitive linguists have been busy exploring the world of concepts, and there seem to be few who would agree on their nature, their number, on what they mean and how they differ from words of natural languages. An interesting selection of competing ideas can be found in the anthology Language and Thought. Indeed it seems that each author represented in this collection has his or her unique definition of ‘concept’ (Carruthers/Boucher 1998). In the meantime, Jerry Fodor’s Language of Thought has been renamed Mentalese, and Steven Pinker apparently thinks that this is the language people all over the world think in, regardless of which language they speak (Pinker 1996).

The quest for a perfect language that today unites many cognitive linguists, philosophers of the mind and experts in the field of MT has a long tradition. Umberto Eco, in his La ricerca della lingua perfetta nella cultura europea (1993), has described the endless attempts to either reconstruct the Ursprache or create a perfect language in which every correct sentence is a true sentence, in which every expression has only one meaning and in which every word is forever linked with the metaphysical reality it designates. All of these attempts were doomed to failure. But the fascination emanating from this idea seems to be inexhaustible, to this very day.

The interlingua approach to multilingual lexicography

In the remaining sections of this article, I want to show that

1. using a conceptual ontology as an interlingua does not help us with translation;
2. it is not words that are translated but translation units (units of meaning) in the form of compounds, multi-word units, collocations or set phrases;
3. parallel corpora are repositories of translation units and their equivalents in the target language, and that these translation units and their equivalents can be processed and re-used in subsequent translations.
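The second and third points can be made concrete with a small sketch. It treats a handful of translation units as a lookup table and segments a French token sequence by preferring the longest stored unit. The two multi-word pairs are the stable collocations reported in Table 3 of this chapter, and travail/Arbeit is the standard equivalent named in the text. The table name, the function name and the greedy longest-match strategy are assumptions made for illustration, not a method described by the author.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical store of French -> German translation units.
# The multi-word pairs are taken from Table 3 of this chapter.
TRANSLATION_UNITS: Dict[Tuple[str, ...], str] = {
    ("travaux", "des", "champs"): "Ackerbau",
    ("travaux", "de", "la", "guerre"): "Kriegshandwerk",
    ("travail",): "Arbeit",
}

MAX_UNIT_LENGTH = max(len(unit) for unit in TRANSLATION_UNITS)


def segment(tokens: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Split tokens into translation units, preferring the longest stored match."""
    result: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible unit first, then shorter ones.
        for length in range(min(MAX_UNIT_LENGTH, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + length])
            if candidate in TRANSLATION_UNITS:
                result.append((" ".join(candidate), TRANSLATION_UNITS[candidate]))
                i += length
                break
        else:
            # No stored unit starts here: keep the token without an equivalent.
            result.append((tokens[i], None))
            i += 1
    return result


print(segment("les travaux des champs".split()))
```

In this sketch the collocation is returned as one unit with one equivalent, rather than as three separately translated words.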





I will look at the words work, travail and Arbeit. For the interlingua model, I will use the entry for work in the current Internet version of the Princeton WordNet (www.cogsci.princeton.edu), a set of seven English translations of Plato’s Republic, and the French and German translations of the same book. I intend to show that conceptual ontologies representing general language cannot be language-neutral, and that while the concepts they feature may correspond to the word senses of one language, they do not match word senses in another language. I will then show that conceptual ontologies, even if they were language-neutral, would still not facilitate translation since the concepts they feature correspond in principle to single words, whereas texts are translated by translation units often larger than a single word. Finally I intend to demonstrate how the translational knowledge contained in parallel corpora can be used to increase the productivity and quality of human translation.

Table 1. WordNet entries for work

1. work -- (activity directed toward making or doing something; “she checked several points needing further work”)
2. work, piece of work -- (something produced or accomplished through the effort or activity or agency of a person or thing; “it is not regarded as one of his more memorable works”; “the symphony was hailed as an ingenious work”; “he was indebted to the pioneering work of John Dewey”; “the work of an active imagination”; “erosion is the work of wind or water over time”)
3. job, employment, work -- (the occupation for which you are paid; “he is looking for a job”; “a lot of people are out of work”)
4. study, work -- (applying the mind to learning and understanding a subject (especially by reading); “mastering a second language requires a lot of work”; “no schools offer graduate study in interior design”)
5. oeuvre, work, body of work -- (the total output of a writer or artist (or a substantial part of it); “he studied the entire Wagnerian oeuvre”; “Picasso’s work can be divided into periods”)
6. workplace, work -- (a place where work is done; “he arrived at work early today”)
7. work -- ((physics) a manifestation of energy; the transfer of energy from one physical system to another expressed as the product of a force and the distance through which it moves a body in the direction of that force; “work equals force times distance”)

Table 1 presents the WordNet entry for the noun work. It lists seven word senses, called synsets (sets of synonyms). Synsets may consist of only one word (if there is no other word having a word sense synonymous with it) (synsets 1 and 7), or of several words, if all those words have the word sense in common (synsets 2 to 6). Each entry gives a definition in brackets, but it is not clear if the
language used in the definitions is thought to be a controlled language in the sense that the definitions are unambiguous, or if it is thought to be plain English, with all its fuzziness and polysemy. In addition, we find general language sentences illustrating the particular word sense. The idea is that within these examples, each element of the relevant synset can be used without a change of meaning. It must be borne in mind that WordNet was not set up as a conceptual ontology. George A. Miller, its designer and original creator, is first of all a psychologist, and he set it up because he was interested, among many other things, in how people associate ideas. For him, it was of no importance whether these ideas correspond to language-independent, universal cognitive concepts or just to words of the English language. When linguists first became interested in WordNet, they interpreted it as a thesaurus of the English language, and the relationships accounted for were not relationships between concepts but relationships between different senses of different words. Thus, sense 4 of work and one of the senses of study are identical, and work and study are therefore synonymous in this sense. WordNet provides other relationships as well, such as hypernyms, hyponyms and meronyms for the different word senses of a word. WordNet differs from a traditional thesaurus in that its basic unit is explicitly not the word, but the word’s senses. (Implicitly, this is also true of traditional thesauri, but there the senses are often not properly identified.) The word sense, however, is an abstraction; it is, strictly speaking, not inherent to language, but part of the lexicographer’s interpretation of the meaning of a word. Otherwise it would be possible to decide on the basis of linguistic evidence how many senses a given word has. But this is a crucial point on which dictionaries, even if they are comparable in size, tend to differ. 
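The organisation just described, with the word sense rather than the word as basic unit, can be sketched as a small data structure. The synset memberships below are copied from Table 1; definitions and example sentences are left out. The names WORK_SYNSETS, build_sense_index and synonyms are illustrative assumptions, not part of WordNet’s own interface, and the index covers only the senses listed under work (study, for instance, has further senses in WordNet that are not shown here).

```python
from collections import defaultdict

# The seven synsets for the noun "work", copied from Table 1.
WORK_SYNSETS = {
    1: ["work"],
    2: ["work", "piece of work"],
    3: ["job", "employment", "work"],
    4: ["study", "work"],
    5: ["oeuvre", "work", "body of work"],
    6: ["workplace", "work"],
    7: ["work"],
}


def build_sense_index(synsets):
    """Map each word to the sense numbers whose synset contains it."""
    index = defaultdict(list)
    for sense, words in synsets.items():
        for word in words:
            index[word].append(sense)
    return dict(index)


def synonyms(word, sense, synsets):
    """Return the other members of a synset, i.e. the synonyms in that sense."""
    members = synsets[sense]
    if word not in members:
        raise ValueError(f"{word!r} is not a member of synset {sense}")
    return [w for w in members if w != word]


index = build_sense_index(WORK_SYNSETS)
print(index["work"])                       # every sense number in the entry
print(synonyms("work", 4, WORK_SYNSETS))   # the synonym shared in sense 4
```

Looking up sense 4 returns study, which is the synonymy relation the text mentions; synsets 1 and 7 return an empty list because they contain only work.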
What we would like to know is whether word senses, in WordNet, are thought to be concepts or not. If concepts are universal mental entities (or composed of universal mental entities) or if they are categories corresponding to entities of some language-external reality, we would expect them to be of a different nature from word senses. Yet in WordNet the question is left open whether the featured word senses are simply lexicographers’ hypotheses concerning English vocabulary or if they are thought to be universal (and therefore conceptually identical with the concepts of cognitive linguistics) at least to the degree that in principle, for each word sense, there is a corresponding word for it in other languages, or a lexical gap, or a distinction not present in the English unit of meaning so that there are two (or more) equivalents of this one unit. The possibility of identifying WordNet word senses with the concepts of cognitive linguistics proved too attractive to be ignored. It became the underlying idea of the major project EuroWordNet, funded by the European Commission and involving many EU languages (www.hum.uva.nl/~ewn/index.html). For each language, local WordNets have been established, and a ‘language-independent’ ontology provides the framework in which to match word senses. Thus, WordNet has indeed come to be seen as a universal conceptual ontology, as a model of a universal interlingua, with a clear-cut, finite set of senses or concepts, onto which the words of any language can be mapped so that, ideally, for each natural language expression, there would be a ‘language-independent’ conceptual representation. WordNet would be the answer to the quest for a perfect language.

But does it work? In this experiment, I use the Internet version of the Princeton WordNet, and I am interested in how far the synsets for each sense of the noun work, that is the set of words corresponding to each unit of meaning, can be matched to seven different English translations of Plato’s Republic. These translations constitute a set of paraphrases of the original Greek text. Each translator, we can assume, strives to render the text as closely to the original as possible, so that the paraphrases should turn out to be largely synonymous. Therefore, if we look at all occurrences of the noun work in one translation (our master version), we would expect to find in the other translations either also the word work or, in view of the word sense in which work is used in a given occurrence, one of the synonyms of the pertaining synset. I have worked only with the Princeton WordNet, because it is more detailed and more elaborate than the localised versions for other languages. This is why I have had to choose an indirect way of assessing the value of the EuroWordNet approach for translations.
Instead of comparing the original text with its paraphrase in another language, I compare different paraphrases (in the same language) of the original Greek text. If the word senses of WordNet reflect universal concepts, one concept could then be assigned to each occurrence of the noun work, and, assuming that the translators interpret the Greek text in the same way, they would all assign the same concept to a given occurrence. If this is the case, the WordNet approach indeed seems a viable method for computer-assisted translation. If the lexical variation displayed in our set of translations does not map onto the synsets of WordNet, then either WordNet is deficient (and what thesaurus isn’t?) or we would have to assume that the meaning of the underlying Greek text cannot be mapped so easily onto the concepts or word senses featured by WordNet. In this case the translator using the word work
would use it in a sense not identified by WordNet, but perhaps slightly differently. This would be an indication that the meaning of words is generally more fuzzy than the tradition of displaying word senses in dictionaries (or concepts in ontologies) would have us believe. If this is the case, then it is not possible to assign in a procedural, algorithmic, controllable way the proper word sense to a given occurrence of a word in a text. In this experiment I use seven different translations, the first of which is available electronically and therefore used as master version.1 Table 2 displays the evidence. We have, in Translation 1, i.e. in the translation chosen as the master version, 24 occurrences of the noun work. The first column identifies the citation in standardized form. The second column gives the word we find in the original Greek text. The third to eighth columns give the word we find in the six translations with which the master version is compared. The ninth column gives the synset that I would assign to work in the given citation. Words in bold face are the synonyms found in the relevant synset. Empty spaces indicate that in these cases there was no corresponding word (usually as a result of the fact that the original text was not translated word by word but in larger units of meaning).

Table 2. Work and its equivalents in the Plato translations

Origin | Original | Transl. 2 | Transl. 3 | Transl. 4 | Transl. 5 | Transl. 6 | Transl. 7 | Synset
332e | érgon | occupation | – | work | result | action | – | ?
352e | érgon | – | work | work | end | – | function | ?
352e | érgon | function | function | work | end | work | function | ?
353a | érgon | function | function | work | end | function | function | ?
353a | érgon | function | function | work | end | function | function | ?
353b | érgon | – | work vb | work | end | – | – | ?
353c | ergázomai | function | work vb | work | end | function | function | ?
353c | to érgon apergázomai | function | – | work | end | function | function | ?
353d | érgon | function | function | work | end | function | function | ?
353e | ta érga apergázomai | function | work | work | ends | functions | function | 1
369e | érgon | product of his labour | product of his work | work | result of his labour | work | work | 2
370b | érgon | job [= task] | work | labour | work | work | work | 1
370c | parérgon | – | do vb | – | business | doing | – | 1, 3
371c | demiourgía | job | work | work | – | work vb | occupation | 3
372a | érgazomai | work vb | work vb | work vb | work vb | work vb | work vb | 1, 3
374a | ergazomai | job or profession | practice vb | exercise vb | practise vb | work vb | professions | 1, 3
375a | phulaké | work | quality of guarding | quality of guarding | guarding | guarding | keeping guard | 3
380a | érga | work | acts | work | works | deeds | work | 1
416c | phulaké | – | perfect guardians | being guardians | virtue | – | being guardians | 3
421c | érgon | job | craft | – | work | business | – | 3
442b | prátto | business | work | business | sphere | work | work | 3
501b | apergázomai | work vb | work | work out vb | work | work | [in] doing [this] | 1
535d | misoponéo | intellectual effort | trouble | work | labour | work | work | 1
553c | ergázomai | work | work | work | work | work | work | 1

The table shows that in the first nine cases we find the Greek word érgon, and hence the English word work in our master version (Translation 1, see above), being used in the sense of ‘function’ or ‘end’. This is not a word sense featured in WordNet for work, and actually work, even though it is the standard translation, does not really fit in well, as illustrated in example 352e in the master translation: Would you be willing to define the work of a horse or of anything else to be that which one can do only with it or best with it?

Some of the other word sense assignments can be disputed as well. It is not surprising that the only synonym listed in WordNet that occurs in the translations is job, belonging to synset 3. There are others, such as labour, occupation, profession, practice and business, that, in the contexts of our citations, are certainly synonymous with work but could not easily be subsumed under synsets 1 or 3. In the example given in WordNet for word sense 3, he is looking for a job, it is not possible to substitute profession or business for job (and I am not entirely sure that looking for a job is always the same as looking for work). In a similar way, this would also be true of practice and labour as synonyms of work in synset 1.

However, the results of this experiment are far from conclusive. Can WordNet really be improved so that it reliably indicates synonyms for the word senses it posits? Does it make sense to expect multiple translations of the same text to be synonymous? Is the whole experiment not seriously flawed because it focuses on a single word, which is not necessarily the unit of translation? The subsequent sections will consider these questions.
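The check carried out by hand in this experiment can be sketched in a few lines. The sketch takes three rows from Table 2 (380a, 421c and 442b), together with the synset assigned to each, and sorts the words chosen by Translations 2 to 7 into three groups: work itself, other members of the assigned synset, and words outside it. The synset memberships come from Table 1. The function name and the small lemma table are assumptions introduced only for this illustration.

```python
# WordNet synsets for the noun "work", from Table 1 (senses 1 and 3 only).
SYNSETS = {
    1: {"work"},
    3: {"job", "employment", "work"},
}

# Rows transcribed from Table 2: (citation, assigned synset, Transl. 2 to 7).
# None marks a cell with no corresponding word.
ROWS = [
    ("380a", 1, ["work", "acts", "work", "works", "deeds", "work"]),
    ("421c", 3, ["job", "craft", None, "work", "business", None]),
    ("442b", 3, ["business", "work", "business", "sphere", "work", "work"]),
]

# Hand-made lemma table (an assumption): inflected form -> base form.
LEMMAS = {"works": "work"}


def classify(sense, equivalents):
    """Sort the translators' words into the headword itself, other members
    of the assigned synset, and words outside that synset."""
    same, synset_members, outside = [], [], []
    for word in equivalents:
        if word is None:
            continue
        lemma = LEMMAS.get(word, word)
        if lemma == "work":
            same.append(word)
        elif lemma in SYNSETS[sense]:
            synset_members.append(word)
        else:
            outside.append(word)
    return same, synset_members, outside


for citation, sense, equivalents in ROWS:
    same, synset_members, outside = classify(sense, equivalents)
    print(citation, "| work:", same, "| in synset:", synset_members, "| outside:", outside)
```

For these three rows the only synset member found besides work is job; craft, business, sphere, acts and deeds all fall outside the assigned synsets, which is the pattern discussed above.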

Translation practice

In this section I shall set out to demonstrate the practice of translation, using the example of the French and German versions of Plato’s Republic. I shall look for the word travail and the word Arbeit, the standard equivalents of the English noun work. As in the preceding section, I shall not compare the original Greek text with its French and German paraphrase, but compare two paraphrases of the same text. Each paraphrase (or translation) is an interpretation of the original text. Interpretation presupposes text understanding, something that is happening in people’s minds and involves intentionality. However, if text understanding involves intentionality, in the sense in which John Searle has defined intentionality (Searle 1992), then it is an action, not a process. While humans can carry out both actions and processes, computers are good only for processes, and this is why computers cannot paraphrase texts. Actions presuppose intentionality, and differ from processes in that the outcome of an action is not predetermined; it always involves some degree of arbitrariness. This is also one of the reasons why the conceptual ontology approach does not work. It takes a human being to assign a word sense to a text word, because such an assignment requires an understanding of the text. But if it is an action, then it involves arbitrariness.

Paraphrases are not procedural mappings of an original text, they are the results of acts of interpretation. They may be as close in meaning to the original text as possible, but they can never be identical with it. Comparing the French and German paraphrase of a Greek original doubles the semantic difference obtaining between the original text and its paraphrase. Therefore the equivalence between the two paraphrases is looser than that between original and translation. It is this looseness that gives us, in a nutshell, a view of the range of options, the infinite design space translators have in their work.





As we can see in the comparison of the French and German translations of Plato’s Republic, texts are not translated word for word. Quite often the translation units, the text segments that are translated as a whole, are larger than the single word; they are phrases of two, three or many more words. The equivalents of these translation units do not have to be phrases of the same or a similar structure; a collocation can become a clause; a whole clause can be reduced to a single word; singulars become plurals and vice versa. Table 3 shows the citations for travail/travaux and their German equivalents.2

Table 3. Plato’s Republic: travail / travaux and their German equivalents (16 citations)

01 | son travail de cordonnier | seine Schusterei
02 | tout le travail de fabrication des autres objets | alle andere handwerkliche Arbeit
03 | travail mal fait | schlechte Leistungen
04 | travail de la poterie | Töpferei
05 | en supplément du travail sérieux | nur eine Nebenbeschäftigung
06 | un travail considérable | eine grosse Aufgabe
07 | son travail | schmutzig-kleinliche Arbeit
08 | le travail manuel | niedrige Handwerksarbeit
09 | ses travaux seront moins réussis | er wird schlechtere Arbeit leisten
10 | les travaux des artisans | die Betriebsleistungen
11 | les travaux de la guerre | das Kriegshandwerk
12 | les travaux d’artisans | die handwerkliche Arbeit
13 | travaux des champs | Ackerbau
14 | travaux du corps que de ceux de l’âme | körperliche und geistige Arbeit
15 | utiles à nos travaux | beim Gewerbe brauchbar
16 | s’associer avec le sexe mâle dans tous ses travaux | an alle Arbeiten wie ein Mann herangehen
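How often the standard equivalent Arbeit actually appears among the German renderings in Table 3 can be tallied mechanically. The sketch below lists the sixteen German equivalents in citation order and classifies each one by whether it contains Arbeit as a separate word, the plural Arbeiten, a compound ending in -arbeit, or none of these. The category labels and function names are assumptions chosen for this illustration.

```python
import re
from collections import Counter

# German equivalents from Table 3, in citation order (01 to 16).
# Citations 01-08 render French "travail", citations 09-16 render "travaux".
GERMAN = [
    "seine Schusterei",
    "alle andere handwerkliche Arbeit",
    "schlechte Leistungen",
    "Töpferei",
    "nur eine Nebenbeschäftigung",
    "eine grosse Aufgabe",
    "schmutzig-kleinliche Arbeit",
    "niedrige Handwerksarbeit",
    "er wird schlechtere Arbeit leisten",
    "die Betriebsleistungen",
    "das Kriegshandwerk",
    "die handwerkliche Arbeit",
    "Ackerbau",
    "körperliche und geistige Arbeit",
    "beim Gewerbe brauchbar",
    "an alle Arbeiten wie ein Mann herangehen",
]


def kind(equivalent):
    """Classify how, if at all, 'Arbeit' appears in a German equivalent."""
    words = re.findall(r"\w+", equivalent)  # \w matches Unicode letters in Python 3
    if "Arbeiten" in words:
        return "plural Arbeiten"
    if "Arbeit" in words:
        return "singular Arbeit"
    if any(w.lower().endswith("arbeit") for w in words):
        return "compound with -arbeit"
    return "other"


def tally(equivalents):
    """Count the equivalents in each category."""
    return dict(Counter(kind(e) for e in equivalents))


print("travail:", tally(GERMAN[:8]))
print("travaux:", tally(GERMAN[8:]))
```

Splitting the list at citation 08 separates the renderings of the singular travail from those of the plural travaux, so the two tallies can be compared directly.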

There are eight instances of travail and eight instances of travaux. For travail, the German standard equivalent Arbeit is only used in two cases, and in one more case Arbeit has become part of the compound Handwerksarbeit. Following similar patterns, travail de cordonnier becomes Schusterei and travail de la poterie becomes Töpferei. The singular travail mal fait becomes the plural schlechte Leistungen (is this WordNet sense 1 or 4?). Citation 5 needs more context: (philosophy is) une chose à ne pratiquer qu’en supplément du travail sérieux (‘something to be practised only in addition to serious work’), which reads in German: Philosophie sei nur eine Nebenbeschäftigung (‘is only a side job’), in spite of the difference in structure an appropriate equivalent. In citation 7, the German translator added schmutzig-kleinlich, which is not called for in the original text. To a lesser degree, this seems also to be the case for citation 8: travail manuel does not imply ‘menial work’ but niedere Handwerksarbeit does.

The role of parallel corpora in translation and multilingual lexicography

Here we find banausía in the Greek text, implying ‘the praxis of a mere mechanical art’ (cf. Teubert 1996). In three of the eight citations of the plural travaux, the German equivalent is the singular Arbeit, and only in one case is it the plural Arbeiten. We shall see in the next section that this phenomenon is not unique to the Plato translations but also quite common in EU documents. Citations 11 and 13 show stable collocations: travaux de la guerre and travaux des champs, which have their standard German equivalents: Kriegshandwerk and Ackerbau. Citations 10 and 12 give the same collocation travaux des artisans, with the German equivalents Berufsleistungen and handwerkliche Arbeit. The Greek text has the neutral érga (pl., ‘works’) for citation 10, and banausía (see above) for citation 12. The wider context of citation 12 reveals a pejorative condition not found in the French and German equivalents: leurs corps mutilés … par leurs travaux d’artisans, Berufsleistungen and handwerkliche Arbeit (cf. citation 2: here handwerkliche Arbeit is the equivalent of travail de fabrication des autres objets). In citation 15, the equivalent of travaux is Gewerbe, a collective noun for craftsmen’s shops. The Greek text just has the neutral érga. Generally, travaux seems to mean ‘a set of continuous and coherent activities’, while Arbeiten is usually not used to designate activities, but the results of work. Only in citation 16 do we find the plural Arbeiten, because here all the different kinds of work men do are implied.

Table 4. Plato’s Republic: Arbeit(en) and its French equivalents (30 citations)

No. | German | French
01 | ihre Arbeit | leur œuvre propre
02 | sie zu gemeinsamer Arbeit unfähig machen | être incapables d’agir en commun les uns avec les autres
03 | ein Winzermesser, gefertigt für diese Arbeit | la serpette fabriquée à cet effet
04 | jeder leistet seine Arbeit | chacun d’eux destine le produit de son travail
05 | der richtige Zeitpunkt für eine Arbeit | le bon moment pour un travail
06 | schönere Arbeit leisten | réussir mieux
07 | die eigene Arbeit versäumen | laisser en sommeil son activité d’homme
08 | zu keiner anderen Arbeit taugen | être impropres à toute autre fonction
09 | Körperkräfte für Arbeit besitzen | leur force physique les rend aptes aux efforts pénibles
10 | vor allen anderen Tätigkeiten sollte er Ruhe haben und…tüchtige Arbeit leisten | et donnant congé aux autres tâches et…l’accomplir comme il fallait
11 | alle andere handwerkliche Arbeit | encore tout le travail de fabrication des autres objets
12 | seine eigentliche Arbeit vernachlässigen | en négligeant l’ouvrage qui est devant lui





Wolfgang Teubert

Table 4 (continued). Plato’s Republic: Arbeit(en) and its French equivalents (30 citations)

No. | German | French
13 | ihn an einer aufmerksamen Arbeit hindern | en gênant le type d’attention qu’on doit exercer
14 | der geistigen Arbeit schuld geben | accuser la philosophie
15 | für die Arbeit | pour faire une poterie
16 | schlechtere Arbeit | travaux moins réussis
17 | die Arbeit des Schusters machen | accomplir l’ouvrage d’un cordonnier
18 | an alle Arbeiten wie der Mann herangehen | s’associer avec le sexe mâle dans tous ses travaux
19 | dieselben Arbeiten machen | faire les mêmes choses
20 | die geistige Arbeit | la pensée
21 | die handwerkliche Arbeit | les travaux des artisans
22 | eine hervorragende Arbeit | une très belle exécution
23 | unermüdliche Arbeitsfreude | qui ait du goût pour toutes les sortes d’efforts
24 | in seiner Arbeitslust | dans son goût de l’effort
25 | in seiner Arbeitslust | dans l’amour de l’effort
26 | schmutzig-kleinliche Arbeit | travail
27 | zu körperlicher und geistiger Arbeit untauglich | incapables de travaux du corps que de ceux de l’âme
28 | die von ihrer Hände Arbeit leben | qui travaillent
29 | wenn eine solche Arbeit Stückwerk ist | si cet objet lui aussi se trouve être quelque chose de peu net
30 | praktische Arbeit | œuvres d’habilité humaine

Table 4 gives 30 citations for Arbeit(en), and their French equivalents. Only in citations 4, 5 and 11 do we find the standard equivalent travail; in citations 16, 18, 21 and 27 it is travaux (for the reason, see above). Other recurrent equivalents are efforts (citations 9, 23) and effort (citations 24, 25); l’ouvrage (12, 17); œuvre (1) and œuvres (30); equivalents that occur only once are effet (3), activité (7), fonction (8), choses (19), exécution (22) and objet (29). For the collocation geistige Arbeit we find philosophie (14), pensée (20) and travaux de l’âme (27). None of these equivalents seems out of place, but no German-French dictionary offers such a wealth of options. The really interesting finding is that in six instances the equivalent of a nominal phrase is a verbal phrase: zu gemeinsamer Arbeit unfähig (2) becomes incapables d’agir en commun; schönere Arbeit leisten (6) becomes réussir mieux; tüchtige Arbeit leisten (10) becomes l’accomplir comme il fallait; aufmerksame Arbeit (13) becomes l’attention qu’on doit exercer; für die Arbeit (15) (where potters are mentioned in the wider context) becomes faire une poterie; and finally we find qui travaillent (28) as a very loose equivalent of die von ihrer Hände Arbeit leben (‘who make a living from their labour’).

The evidence extracted from the comparison of the French and the German versions of the Republic shows clearly that translators do not translate single, decontextualised words by assigning to them the word senses featuring in a dictionary or an ontology. Rather, translators carry out their task by slicing the text into translation units, semantic conglomerates which are translated as a whole. But even where they translate one single word by another single word, they do this not by assigning a specific word sense to the word but by forming a hypothesis on the basis of the context in which this word occurs. We do not know why a translator paraphrases gefertigt für diese Arbeit by fabriquée à cet effet, but paraphrases zu keiner anderen Arbeit taugen by être impropres à toute autre fonction. Are not fonction and effet used in a very similar way? Are they synonymous? It is impossible to tell from the limited evidence we have. In a large parallel corpus we can expect to find a reasonable number of occurrences of each translation unit, allowing us to generalize. If there is one unit where Arbeit is always translated as effet and never as fonction, then we would have found a reliable equivalent. But if in the same context effet and fonction are about equally common, it would seem that, in the given context, these two words are synonymous. In this section, I have shown that the translator’s design space is much larger than the language-neutral conceptual ontology (or the traditional bilingual dictionary) would lead us to believe. However, the evidence this comparison is based on is too limited and the variation encountered too large to show how information derived from parallel corpora can be incorporated into a bilingual dictionary or a translation platform that would actually help with translating texts.
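The reasoning about effet and fonction can be made concrete with a small Python sketch. It is an illustration only: the function name `classify_equivalents`, the two thresholds `dominance` and `balance`, and the sample counts are assumptions of this example, not figures taken from the Plato corpus or from any larger parallel corpus.

```python
from collections import Counter

def classify_equivalents(equivalents, dominance=0.9, balance=0.3):
    """Classify the equivalents observed for ONE translation unit in ONE context.

    equivalents: list of target-language equivalents, one per citation.
    dominance:   hypothetical share above which one equivalent counts as reliable.
    balance:     hypothetical minimum share for an equivalent to count as a rival.
    """
    if not equivalents:
        raise ValueError("at least one citation is required")
    counts = Counter(equivalents)
    total = sum(counts.values())
    ranked = counts.most_common()
    top, top_n = ranked[0]
    if top_n / total >= dominance:
        return ("reliable equivalent", [top])
    rivals = [word for word, n in ranked if n / total >= balance]
    if len(rivals) >= 2:
        return ("candidate synonyms in this context", rivals)
    return ("no clear pattern", [word for word, _ in ranked])

# Invented counts, for illustration only.
always_effet = ["effet"] * 10
mixed = ["effet"] * 6 + ["fonction"] * 5

print(classify_equivalents(always_effet))
print(classify_equivalents(mixed))
```

With ten citations that all show effet, the sketch reports a reliable equivalent; with effet and fonction about equally common, it reports both as candidate synonyms for that context. Where the thresholds should lie is an empirical question that only a large corpus could answer.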

The parallel corpus: A repository of re-usable translation units

Recently a team at the Multilingual Research Group of the Institut für Deutsche Sprache in Mannheim, headed by Valérie Kervio-Berthou, assembled a 30-million-word French-German Parallel Corpus, with the acronym GeFrePaC. The work was supported by a grant from ELRA (the European Language Resources Agency), and it is now being distributed by this agency. About two thirds of this corpus consists of documents issued by the European Commission (and downloaded from the CELEX database), and one third consists of the German and French versions of the European Parliament’s verbatim record of proceedings. The corpus is part-of-speech tagged, aligned on the sentence level and TEI-encoded. More information on GeFrePaC can be found in Kervio-Berthou (2000).





GeFrePaC will be used in Mannheim for a project in bilingual lexicography. The goal is to test the hypothesis that the translational knowledge implicitly contained in a parallel corpus complements the traditional bilingual dictionary. While, owing to its limited size, it will not cover as many words as the average comprehensive French-German/German-French dictionary, it will contain many relevant translation units and their equivalents that tend to be overlooked by lexicographers not working with a parallel corpus (and that is the majority). Bilingual lexicographers have always been aware of the fact that texts are often translated in units larger than the single word. For a long time they have aimed to include compounds, multi-word units, significant collocations, set phrases and idioms. But until the arrival of corpora it was left to the lexicographers’ skills to sift the evidence and to decide what to enter in the dictionary. Usually they relied on monolingual dictionaries and on their own observations. The results were often arbitrary or even idiosyncratic. With the availability of monolingual corpora the quality of bilingual dictionaries quickly improved. Corpus linguistics provided the methodology to identify semantic conglomerates such as compounds or collocations, using a combination of statistical and grammatical approaches. It was then possible to enter the most relevant of these conglomerates, together with their presumed equivalents, in the dictionaries. Usually, bilingual dictionaries refer to corpora only to validate entry candidates selected on the basis of other principles. In Elena Tognini Bonelli’s (1996) words, this is still only the corpus-based (as opposed to the corpus-driven) approach. But even where bilingual dictionaries record the evidence encountered in monolingual corpora, they still have to rely on the lexicographers’ bilingual competence to determine the translation equivalent of any semantic conglomerate. 
This equivalent will, under normal circumstances, not be wrong. But it will not necessarily reflect the translation practice of the community of French-German and German-French translators. This practice is what parallel corpora record. They are repositories of translation units and their equivalents in the target language, and these translation units may be words within a given context or the semantic conglomerates mentioned above. Since there are, to date, no bilingual dictionaries of general language based on parallel corpora, we still do not know to what extent they can complement, improve and validate existing dictionaries. However, there is reason to believe that the additional evidence a parallel corpus of just 30 million words will provide is enough to take the traditional concept of printed dictionaries to its limits. Instead of a multi-volume printed dictionary there is now the option of a bilingual database of translation units and their target language equivalents. In the more distant future, a translation platform may offer the translator a tentative breakdown of the text to be translated into translation units and possible target equivalents.

Unfortunately the GeFrePaC corpus was finalised only in June 2000, when it was too late to exploit it systematically for this analysis. What I present here are occurrences of Arbeit and travail together with their equivalents as they were extracted by hand from the corpus, amounting to fewer than 40 citations per language pair. This represents only a tiny fraction of the total number of occurrences. All citations were taken from EU documents, none from the European Parliament verbatim record of proceedings.

In the case of the EU documents in our corpus, it is often impossible to say which is the original text and which is the translation. While it is safe to assume that there will be hardly any German original texts, in a number of instances the French text will also be a translation, in this case of an English text. Earlier drafts may well have been written in Spanish or German or other EU languages. Since each language version of the final text is legally binding, we can, in principle, assume that semantically a German and a French version are closer to each other than the translations of Plato’s Republic.

GeFrePaC is not a corpus of general language; it records, rather, the legal and administrative language used in the European Commission. The French version therefore differs from the legal and administrative language used in France, and the same is mutatis mutandis true of the German version. It is a special jargon of its own, and the different language versions this jargon is used in are linked by the continuous practice of translation. This strong continuity in practice is one important reason why the translators of the EU legal documents have less freedom, less design space, than the translators of Plato’s Republic.
In their efforts to link the different language versions of a text as closely as possible they preserve the original structure wherever possible, at text level, at paragraph level, at sentence level, and at the level of the translation unit. From a methodological point of view, the special language used in EU documents facilitates the extraction of translation units and their equivalents, as well as the processing of this data into lexicographical results. This is why they are a good starting point for testing parallel corpora in dictionary making. More balanced parallel corpora, representing general language as it is used in newspapers, magazines, fiction and general-purpose books, will be more difficult to handle. Nevertheless, I believe our citations of Arbeit and travail usefully show how corpus data can be processed. In the tables that follow, the translation units are given in italics, with the keyword in bold face. T1, T2 etc. refer to different texts, which may have been translated by different people. Table 5 presents the citations for Arbeit.

Table 5. EU documents: Arbeit and its French equivalents (20 citations)

No. | Text | German | French
01 | T1 | an der Arbeit der Organisation teilnehmen | participer aux travaux de l’organisation
02 | T1 | die Arbeit der neuen Organisation | les activités de la nouvelle organisation
03 | T1 | die Arbeit der Organisation | les activités de l’organisation
04 | T2 | sich an ihrer Arbeit beteiligen | participer à ses travaux
05 | T2 | seine Arbeit zügig durchführen | s’acquitter promptement de sa tâche
06 | T2 | ein Jahresbericht über die Arbeit der Behörde | un rapport annuel sur l’activité de l’Autorité
07 | T2 | die Arbeit der Behörde | les travaux de l’Autorité
08 | T2 | die Arbeit des Schiedsgerichts | la tâche du tribunal arbitral
09 | T2 | die Arbeit an Bord | le travail à bord
10 | T3 | die Arbeit des weltweiten Netzes stärken | renforcer le fonctionnement du réseau mondial
11 | T3 | die Modalitäten der Arbeit dieser Gruppen | les modalités de fonctionnement de ces groupes
12 | T3 | die Modalitäten der Arbeit ihrer Gruppe | les modalités de fonctionnement de chaque groupe
13 | T4 | die sich auf die Kosten einer solchen Arbeit beziehen | relatives au coût de ce travail
14 | T4 | seine Arbeit abschließen | mener à leur terme ces travaux
15 | T4 | Bericht über die Arbeit des Ausschusses | rapport sur les travaux du Comité
16 | T5 | die Arbeit von Nichtregierungsorganisationen | le travail des organisations non gouvernementales
17 | T5 | die Arbeit der Kommission | les travaux de la commission
18 | T5 | Organisation und Umfang der Arbeit | l’organisation et l’étendue des travaux
19 | T6 | in seine Arbeit einbeziehen | d’associer à ces travaux
20 | T7 | ohne eine andere zusätzliche Arbeit | sans autre main d’œuvre complémentaire

The most surprising result is that the standard equivalent travail occurs only three times, in citations 9, 13 and 16. Other nouns in the singular are tâche (5, 8; same text), fonctionnement (10, 11, 12), activité (6) and main d’œuvre (20). In all the other instances, the singular Arbeit corresponds to a plural noun: travaux (1, 4, 7, 14, 15, 17, 18, 19; spread over most texts) and activités (2, 3; same text). Looking at the collocates, we find that organisation can be a modifier of travaux (1), activités (2, 3) and travail (16). This does not provide a clear picture but may indicate that in this context pattern travail, travaux and activités are synonymous. It seems reasonable to assume that organisation belongs to the same semantic field as autorité (7; collocate of travaux), tribunal arbitral (8; collocate of tâche), groupe, groupes (11, 12; collocates of fonctionnement), comité (15; collocate of travaux), commission (17; collocate of travaux). Travaux, in this context, is not only the most frequent equivalent (1, 7, 15, 17) but also the one spread over most texts, and this can be understood as an indication that travaux is always appropriate if followed by such modifiers. Fonctionnement occurs in only one text, perhaps due to a translator’s whim. We find participer à ses travaux (4) alongside mener à leur terme ces travaux (14) and d’associer à ces travaux (19), but s’acquitter de sa tâche (5). Again, travaux seems to be the best choice for this pattern.

Table 6. EU documents: Arbeiten and its French equivalents (17 citations, all in text T2)

No. | German | French
01 | an den Arbeiten der Behörde teilnehmen | participer aux travaux de l’Autorité
02 | im Zusammenhang mit Arbeiten | au titre des opérations
03 | die Unterbrechung von Arbeiten | de suspendre les opérations
04 | Einzelheiten der dort durchgeführten Arbeiten | détails sur les travaux dont elles font l’objet
05 | Tätigkeiten für Arbeiten | activités connexes au titre des opérations
06 | für Arbeiten nicht mehr benötigt | n’est plus nécessaire au titre des opérations
07 | die mit Arbeiten zusammenhängen | connexes au titre des opérations
08 | im Zusammenhang mit Arbeiten des Vertragsnehmers | à raison des opérations du contractant
09 | die Arbeiten des Unternehmers behindern | gêner les activités de l’exploitant
10 | Arbeiten (in Übersicht) | opérations (en sommaire)
11 | Dauer der Arbeiten (in Übersicht) | durée des opérations (en sommaire)
12 | Fortschritt der Arbeiten (in Übersicht) | avancement des travaux (en sommaire)
13 | Überprüfung der Arbeiten (in Übersicht) | inspection des opérations (en sommaire)
14 | Erträge aus den Arbeiten (in Übersicht) | recettes tirées des opérations (en sommaire)
15 | im Verlauf seiner Arbeiten | dans la conduite des opérations
16 | seine Arbeiten selbständig durchführen | agir de façon autonome
17 | Auskunft über Arbeiten | renseignements au sujet des opérations

Table 6 presents the citations of Arbeiten. Again it comes as a surprise that in 12 out of 17 instances, it is not travaux but opérations that corresponds to the plural noun Arbeiten (2, 3, 5, 6, 7, 8, 10, 11, 13, 14, 15, 17). As all these citations are taken from one text, it is impossible to decide if they can be generalised. Apart from opérations, we find also travaux (1, 4, 12) and activités (9). There is even a case where the noun Arbeiten corresponds to the verb agir (16). Unfortunately, the citations do not indicate a difference in usage between opérations and travaux. Table 7 below presents the citations of travail.

Table 7. EU documents: travail and its German equivalents (20 citations)

No. | Text | French | German
01 | T1 | les conditions de travail | die Arbeitsbedingungen
02 | T1 | plans de travail | Arbeitspläne
03 | T1 | un plan de travail | Arbeitsplan
04 | T1 | les plus hautes qualités de travail | ein Höchstmaß an Leistungsfähigkeit
05 | T1 | le volume de travail du Tribunal | Arbeitsanfall des Gerichtshofs
06 | T1 | au travail à bord | bei der Arbeit an Bord
07 | T2 | un groupe de travail | eine Arbeitsgruppe
08 | T3 | les heures de travail normales | die normale Arbeitszeit
09 | T3 | un groupe de travail d’experts | eine Arbeitsgruppe von Sachverständigen
10 | T3 | la pression de travail | der Betriebsdruck
11 | T4 | le travail du caoutchouc | das Bearbeiten von Kautschuk
12 | T4 | un simple travail de surface | eine einfache Oberflächenbearbeitung
13 | T4 | un travail de tirage de fils | eine Auszieharbeit
14 | T4 | les machines-outils pour le travail des métaux | die metallbearbeitenden Werkzeugmaschinen
15 | T4 | les organes de travail | die Arbeitsgeräte
16 | T4 | les coupeuses pour le travail du papier ou du carton | die Papier- und Pappeschneidemaschinen
17 | T5 | un appel d’offres en vue d’un complément de travail de conception | eine Ausschreibung für weitere Konzeptionen
18 | T5 | les permis de travail | die Arbeitserlaubnis
19 | T5 | les accidents du travail | die Arbeitsunfälle
20 | T5 | les lock-out ou autres conflits du travail | Aussperrungen oder sonstige Betriebsunruhen

It seems that travail corresponds more often to Arbeit (or Arbeits-) than the other way round. Indeed, out of our 20 citations, we find Arbeit/Arbeits- in eleven instances (1, 2, 3, 5, 6, 7, 8, 9, 15, 18, 19), and Arbeit is also part of Auszieharbeit, the equivalent of travail de tirage de fils (13). In these twelve citations there is only one occurrence of Arbeit proper (6), while everywhere else Arbeit is part of a nominal compound. Recurrent are plan(s) de travail (Arbeitsplan, -pläne; 2, 3) and groupe de travail (Arbeitsgruppe; 7, 9). N+de+travail collocations correlate with a compound noun beginning with Arbeits-: conditions de travail (Arbeitsbedingungen; 1), volume de travail (Arbeitsanfall; 5), heures de travail (Arbeitszeit; 8), organes de travail (Arbeitsgeräte; 15), permis de travail (Arbeitserlaubnis; 18). The same French pattern, but a different German word, is found in qualités de travail (Leistungsfähigkeit; 4). (There are also two instances of the correlation Leistung/travail/travaux in Plato’s Republic, see above.) If the work in question is carried out by machines, we find Bearbeiten (11, 14) and Bearbeitung (12) instead of Arbeit. In citation 14, there is also a change in the structure of the collocation: the French prepositional phrase here corresponds to an adjective. In citation 10, pression de travail and Betriebsdruck are standardised terms, and this also seems to be true in the case of citation 16. In citation 17 travail de conception correlates with Konzeptionen, which looks like an idiosyncratic solution not to be generalised. Finally, in citation 20 we find Betriebsunruhen where I would have expected the more common Arbeitsunruhen. But an explanation could be that Arbeitsunruhen is commonly used for trouble caused by the workforce and therefore cannot designate les lock-out caused by managers.

Table 8. EU documents: travaux and its German equivalents (9 citations)

No. | Text | French | German
1 | T6 | travaux effectués à l’aide de fils brodeurs en métal | mit Metallfäden ausgefüllte Sticharbeiten
2 | T7 | les travaux de la Cour | die Arbeiten des Hofes
3 | T7 | lorsque les travaux ont été suffisamment avancés | bei zufriedenstellendem Fortschritt der Arbeiten
4 | T7 | les travaux de réhabilitation urgents | dringende Rehabilitationsmaßnahmen
5 | T7 | l’ensemble des travaux des ONG | die Gesamttätigkeit der NRO
6 | T7 | les travaux de ce groupe | die Arbeiten dieser Gruppe
7 | T7 | les travaux du groupe se sont développés | die Gruppe ist in ihrer Arbeit vorangekommen
8 | T7 | dans le cadre de ces travaux | im Rahmen dieser Arbeit
9 | T8 | les marchés de travaux | Dienstleistungsaufträge

Table 8 shows the citations of travaux. Out of the nine citations of travaux, four correspond to the standard equivalent Arbeiten (or -arbeiten) (1, 2, 3, 6), and only two to Arbeit (7, 8), whereas out of 20 citations for Arbeit, eight corresponded to travaux (see above). In citation 4, travaux de réhabilitation must be rendered as Rehabilitationsmaßnahmen, while in citation 5, Gesamttätigkeit seems to be more elegant than die Gesamtheit der Arbeiten. In 7 there is an attractive rephrasing of the original structure: the French nominal modifier is a subject in the German version, while the French subject is an adverbial phrase modifier in the German text. Finally, in citation 9, travaux is rendered by Dienstleistungs-, which makes sense in this context.

My analysis is based on such a small number of citations that it is very difficult to generalise. It is recurrence as a parameter which allows us to determine whether a correlation between translation unit and translation equivalent is sufficiently established for it to be safely re-used in the translation of new texts. Recurrence is also an important issue for the automatic identification and extraction of translation units and their equivalents. Two other methods are statistical procedures (allowing us to determine the significance of co-occurrence of text elements) and grammatical operations (POS-tagging for determining the syntactic structure of a collocation). But even the very restricted analysis of just a few score citations of the kind presented here yields results that can well be used to complement existing bilingual dictionaries. In Table 9 these results are juxtaposed with the (abbreviated) entry for Arbeit/travail in the PONS Großwörterbuch Französisch-Deutsch (1996).

Table 9. Corpus evidence vs. dictionary evidence

Corpus evidence

Arbeit & equivalents
– travaux (4)
– activités (2)
– fonctionnement (2)
– tâche
– activité
– travail

Arbeiten & equivalents
– opérations (12)
– travaux (3)
– activités
– agir

travail & equivalents
– Arbeits- (10)
– Betriebs- (2)
– Leistungs-
– Arbeit
– Bearbeiten
– -bearbeitung
– -arbeit
– -bearbeitend
– (Papierschneidemaschinen)
– (Konzeptionen)

travaux & equivalents
– Arbeiten (3)
– Arbeit (2)
– -arbeiten
– -maßnahmen
– -tüchtigkeit
– Dienstleistungs-

Dictionary evidence

Arbeit & equivalents
01. (Tätigkeit) travail …
02. (Arbeitsplatz) travail
03. (Produkt) travail
04. (schriftliches Werk) travail, ouvrage
05. SCOL (Klassen~) devoir, contrôle
06. UNIV mémoire, dissertation
07. (Mühe) travail
08. (Aufgabe) travail, tâche

travail & equivalents
01. (activité) Arbeit ...
02. (tâche) Arbeit ...
03. (activité professionnelle) Arbeit... Schwarzarbeit... Nachtarbeit ...Zeitarbeit
04. pl (ensemble de tâches)... die Bauarbeiten... ~aux de champs Feldarbeit...
05. (réalisation) Arbeit... Werk
06. (publication) Arbeit
07. ECON Arbeit; division du ~ Arbeitsteilung
08. (façonnage) Bearbeiten, Bearbeitung... ~de qc Bearbeitung einer S., -bearbeitung...
09. (fonctionnement) Arbeit... Funktion, Tätigkeit...
10. (effet) [Ein]wirkung...
11. (bois/métal) Arbeiten
12. PHYS Arbeit
13. MED... [Geburts]wehen...
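The statistical procedures mentioned before Table 9 are not specified further in the text. One simple association measure that can be computed over aligned segments is the Dice coefficient; the Python sketch below applies it to a handful of aligned citations from Table 5. The choice of Dice, the naive whitespace tokenisation and the function name `dice_scores` are assumptions of this illustration and should not be read as the procedure used for GeFrePaC.

```python
from collections import Counter

# A few aligned German/French citations taken from Table 5 (EU documents).
ALIGNED = [
    ("die Arbeit der Behörde", "les travaux de l'Autorité"),
    ("die Arbeit der Kommission", "les travaux de la commission"),
    ("Bericht über die Arbeit des Ausschusses", "rapport sur les travaux du Comité"),
    ("die Arbeit an Bord", "le travail à bord"),
    ("die Arbeit des Schiedsgerichts", "la tâche du tribunal arbitral"),
]

def dice_scores(pairs, source_word):
    """Dice coefficient between source_word and every co-occurring target word.

    Dice = 2 * (segment pairs containing both words)
           / (segments containing source_word + segments containing the target word).
    Tokenisation is a lower-cased whitespace split (an assumption of this sketch).
    """
    src_freq = 0
    tgt_freq = Counter()
    both = Counter()
    for src, tgt in pairs:
        tgt_types = set(tgt.lower().split())
        tgt_freq.update(tgt_types)
        if source_word.lower() in src.lower().split():
            src_freq += 1
            both.update(tgt_types)
    return {w: 2 * both[w] / (src_freq + tgt_freq[w]) for w in both}

scores = dice_scores(ALIGNED, "Arbeit")
for word, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{word}: {score:.2f}")
```

On this tiny sample the article les scores exactly as high as travaux, since both occur in the same three segments. This is one reason why a purely statistical measure would be combined with grammatical information such as POS tags, as the text suggests.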

Conclusion

My goal has been to show that the evidence of parallel corpora can complement traditional translation aids, such as printed dictionaries, termbanks and even translation memories. This evidence can be used to compile better bilingual dictionaries. However, as parallel corpora keep growing in size, the traditional form of the printed dictionary will give way to a bilingual database. Bilingual databases can cope better with larger translation units, and they can also be used as input for translation platforms which provide translators with translation options among which they can select the appropriate equivalent. What matches a translation unit with its equivalent in a given target language is not some abstract property of the language system in general or the systems of two specific languages but the continuous practice of generations of translators. It is this received practice that bilingual dictionaries have always endeavoured to capture. Now, with the advent of parallel corpora, this goal can finally be approached. Not all the practice of translators is adequate or appropriate. It is up to the community of bilingual speakers to decide whether a translation is adequate or appropriate. Many of the equivalents translators have come up with are questionable or idiosyncratic and should not be re-used if the translation unit occurs in a new text to be translated. Not all the evidence extracted from a parallel corpus should go into a bilingual database or printed dictionary. But how can we distinguish good practice from bad? For an automatic distinction, the only parameter available is recurrence. We can act on the assumption that a successful solution to a recurrent problem will, in due course, outnumber less successful attempts. 
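As a rough illustration of recurrence used as a filter, the following Python sketch tallies the French equivalents of Arbeiten recorded in Table 6 and keeps only those that recur. The function name `recurrent_equivalents` and the `min_count` threshold are assumptions made for this example; the text does not propose a specific cut-off.

```python
from collections import Counter

# Equivalents of "Arbeiten" per citation, read off Table 6 (all from text T2).
TABLE_6 = {
    1: "travaux", 2: "opérations", 3: "opérations", 4: "travaux",
    5: "opérations", 6: "opérations", 7: "opérations", 8: "opérations",
    9: "activités", 10: "opérations", 11: "opérations", 12: "travaux",
    13: "opérations", 14: "opérations", 15: "opérations", 16: "agir",
    17: "opérations",
}

def recurrent_equivalents(citations, min_count=2):
    """Return (equivalent, count, share) for equivalents seen at least min_count times.

    min_count is a hypothetical threshold: equivalents below it are treated as
    one-off solutions and left out of the candidate database entry.
    """
    counts = Counter(citations.values())
    total = sum(counts.values())
    return [(eq, n, round(n / total, 2))
            for eq, n in counts.most_common() if n >= min_count]

for eq, n, share in recurrent_equivalents(TABLE_6):
    print(f"{eq}: {n} of {len(TABLE_6)} citations ({share:.0%})")
```

Because all 17 citations come from a single text, the tally shows the mechanism rather than an established equivalence; recurrence across texts and translators would be needed before an equivalent could count as received practice.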
If, in comparable contexts, a given adjective/noun collocation occurs eight times and is translated five times by the same equivalent and three times by different equivalents, we are justified in assuming that the recurrent equivalent, established by practice, can safely be re-used in a new translation of this collocation.

I do not believe that Machine Translation has a future except for texts written in the controlled language of a narrowly restricted domain. It cannot be the answer to the growing demand for general language translations. Conceptual ontologies as they are customarily used in MT have two major deficiencies: they cannot deal with inherent word meanings but only with externally assigned word senses, and they fail to account for the fact that a large part of the vocabulary of general language consists of words whose meaning becomes concrete only within the context they are used in or as part of a semantic conglomerate. Conceptual ontologies contain decontextualised concepts, concepts in their paradigmatic relationship, but deprived of their syntagmatic relationships. Conceptual ontologies attempt to categorise the reality we encounter independently of language, and this is why they cannot deal with the symbolic nature of language.

Language is discourse; it is the universe of texts that has been produced by a language community. Texts are concatenations of semantic conglomerates, of words, of collocations, of set phrases, of linguistic symbols which can only be studied from the point of view of form and meaning. Whatever the meaning of such a conglomerate, it does not refer to some language-external reality. Meaning is the history of all earlier occurrences of this conglomerate, and it is to these that it refers. In this history of occurrences we find citations where the conglomerate was paraphrased or explained, and other instances where it was used within a specific context. It is these citations, and not the language-external reality, which are symbolised by a linguistic sign. For there is no other way of introducing a unit of meaning into the discourse than by explaining what it means. A word whose meaning has never been explained does not refer to anything.

Corpus linguistics provides the methodology to take linguistics, and lexicology in particular, beyond the single word as the basic semantic unit. Rather than decontextualising words and describing their meanings in the isolation of a lexical entry, corpus linguistics breaks down the border between syntax and the lexicon by identifying semantic conglomerates in corpora, combining the parameters of recurrence, statistical significance and syntactic categorisation. Corpus linguistics elucidates the meanings of these units of meaning by extracting their paraphrases and their usage from the corpus. Multilingual parallel corpora can be understood as repositories of paraphrases of translation units.
The meaning of a translation unit in the source language is its equivalent in the target language. If there are several equivalents of a translation unit and if these equivalents are not synonymous, then this translation unit has several meanings. While traditional bilingual lexicography has often assigned word senses according to the established practice in the source language, it is now possible to define the senses of translation units on the basis of their non-synonymous equivalents in the target language. What is a translation unit in relation to one target language does not have to be one in relation to another. It is the target language that determines the unit of meaning. My analysis of Arbeit and travail in Plato’s Republic and in EU legal documents shows, I hope, that actual translation practice offers a wider choice of options and a larger design space for translation than the traditional bilingual dictionary. The two corpora used in this study are not representative of parallel

corpora in general. In the case of Plato’s Republic, I compared the French and the German translation of a Greek text, not an original text with its translation. The EU documents as a parallel corpus are unique in the sense that it is not possible to distinguish source language from target language. Translation studies have taught us, however, that it is important to know which is the source language, which is the target language. Translation is a unidirectional activity, and there are good reasons why bilingual dictionaries cannot be (and perhaps should not be) reversible. The EU corpus, therefore, does not permit us to reassess the important issue of reversibility. Here we need the evidence of reciprocal parallel corpora, consisting of original texts in all the languages involved, together with translations into all the languages. These corpora can then tell us to what extent the practice of translating from language A into language B can have an effect on translating from language B into language A. These parallel corpora will not lead to fully reversible bilingual dictionaries, but they will provide evidence for a renewed discussion of this issue. Parallel corpora will make faster and better translations possible. Multilingual corpus linguistics will contribute to monolingual and bilingual lexicology.

Notes
1. The following translations of Plato’s Republic were used:
Paul Shorey; Cambridge, Mass.: Harvard University Press (available electronically and therefore used as master version).
Desmond Lee (1974), 2nd ed.; London: Penguin.
Francis MacDonald Cornford (1941); Oxford: Oxford University Press.
W. H. D. Rouse (1984), revised ed.; New York: New American Library.
B. Jowett (1871); New York: Vintage Books.
A. D. Lindsay (1957); New York: E. P. Dutton.
John Llewelyn Davies/David James Vaughan (1997); Ware: Wordsworth Editions.
2. Access to the Plato Parallel Corpus (the Republic in ca. 20 different language versions, many of them aligned on the sentence level) can be obtained from: www.tractor.de.

References
Carruthers, P. and Boucher, J. (eds) 1998. Language and Thought. Interdisciplinary Themes. Cambridge: Cambridge University Press.
Eco, U. 1993. La ricerca della lingua perfetta nella cultura europea. Roma: Laterza.
Fodor, J. 1975. The Language of Thought. New York: Crowell.
Kervio-Berthou, V. 2000. “GeFRePac. Deliverable 3: ELRA Final Report”. Mannheim: IDS.
Koskinen, K. 2000. “Institutional illusions: Translating in the EU Commission”. The Translator 6: 49–66.
Melby, A. K. 1996. The Possibility of Language. Amsterdam: Benjamins.
Pinker, S. 1994. The Language Instinct. New York: William Morrow.
Putnam, H. 1975. “The meaning of ‘meaning’”. Reprinted in H. Putnam, Mind, Language and Reality. Philosophical Papers 2. Cambridge: Cambridge University Press.
Teubert, W. 1996. “The concept of work in Europe”. In Conceiving of Europe: Diversity in Unity, A. Musolff, C. Schäffner and M. Townson (eds), 129–145. Aldershot: Dartmouth.
Tognini Bonelli, E. 1996. Corpus: Theory and Practice. Birmingham: TWC.

Bilingual lexicography, overlapping polysemy, and corpus use
Victòria Alsina and Janet DeCesaris

1. Introduction¹

Lexicographers and researchers interested in improving the quality and usefulness of dictionaries have welcomed the advent and availability of large computerized corpora. Representative bilingual or multilingual corpora are possible in specialized fields because in these well-defined situations set in multilingual environments the subject domains are quite restricted. Bilingual or multilingual corpora consisting of texts based either on translations produced by highly trained professionals or on comparable text production thus play an essential role in ensuring that specialized dictionaries, glossaries and terminologies actually reflect the language used in the workplace. However, the tasks and data facing the general language bilingual lexicographer are rather different in nature from the delimited contexts just mentioned: the kind of corpus which proves most useful in the construction of bilingual dictionaries is not yet well defined. While many modern monolingual dictionaries depend heavily on corpus-based data, bilingual lexicography has yet to determine what type of corpus best serves the needs of general bilingual dictionaries. This would seem to be yet another manifestation of the fact that bilingual lexicography lags behind monolingual lexicography (Hartmann and James 1998: 15).
Many researchers have noted that the typology of potential users of general bilingual dictionaries is quite varied, ranging from advanced learners to experienced translators (Al-Kasimi 1983: 154–157, Tomaszczyk 1983: 46). Bilingual dictionaries of this sort are used for both encoding and decoding by speakers of two different languages with several levels of language skills and thus must incorporate a great deal of grammatical and pragmatic information. In corpus-based bilingual lexicography, the two alternatives previously hinted at are ‘parallel corpora’, which contain one set of texts in two or more languages, and ‘comparable corpora’, which contain texts in several languages with the same or similar composition. Teubert presents a cogent discussion of both types (1996: 245–249), and concludes (rightly, we think) that “ideally, parallel corpora should be viewed as complementary to comparable corpora” (1996: 252).
Parallel corpora run the risk of presenting data produced under the special conditions of translation, which may be significantly different from ‘regular’ native-speaker production. It is a well-known fact in translation theory that “phenomena pertaining to the make-up of the source text tend to be transferred to the target text, whether they manifest themselves in a negative transfer (i.e., deviations from normal, codified practices of the target system), or in the form of positive transfer (i.e., greater likelihood of selecting features which do exist and are used in any case)” (Toury 1995: 275). It is true that interference need not be seen as an undesirable trait in translation. Indeed, “its undesirability is always a function of a host of socio-cultural factors, which may therefore be said to condition our law” and “communities differ in terms of their resistance to interference, especially of the ‘negative’ type” (Toury 1995: 277). Nevertheless, the inevitable presence of interference or transfer in translated texts does bear directly on the data to be found in parallel corpora, and makes us question the reliability of parallel corpora as the primary source of data for a general language bilingual dictionary.
Comparable corpora would seem preferable for this type of dictionary project, but their use will not be addressed in this paper because we know of no such corpus data available for general purpose language in the language combinations we discuss.² The two main problems we have mentioned with bilingual corpora, the presence of interference and unavailability, do not plague monolingual corpora of English. Since reliable, contemporary corpus data is widely available for English, we decided to see how it could be used to improve the information currently provided in English/Spanish and English/Catalan dictionaries. In order to determine the possible role for monolingual corpus data in the preparation of these dictionaries, we must first identify the main problems that beset existing dictionaries. We have therefore chosen three non-derived, polysemous adjectives in English that we were sure to find amply covered in current English/Spanish and English/Catalan dictionaries and in a corpus of English: cold, high and odd. Existing dictionary entries for these words were analyzed to pinpoint what needed improvement, and then the British National Corpus was consulted to see how it might help resolve the issues resulting from the dictionary analysis. We conclude that data from a monolingual corpus proves useful for addressing some of the main problems associated with providing equivalents for adjectives in a general-purpose bilingual dictionary, such as order of presentation, repetition of equivalents due to what we will define as overlapping polysemy, and decisions regarding examples, but has little bearing on the issue of delimiting possible contexts in which the equivalent provided by the dictionary is appropriate.

.

Methodology

We looked up the entries for the three adjectives in three English/Spanish bilingual dictionaries and one Catalan/English dictionary. The bilingual dictionaries consulted in the case of Spanish were The Oxford Spanish Dictionary (OSD), Larousse Gran Diccionario Español-Inglés/English-Spanish (GL), and Simon & Schuster’s International Dictionary English-Spanish/Spanish-English (S&S), which were chosen for the following reasons. First, we were interested in analyzing entries in recently published dictionaries which would reflect contemporary usage. The OSD in particular is noteworthy in this respect, as its first edition was published in 1994. Second, we deliberately included dictionaries produced by both British and American publishers. Third, it has been our personal experience as translators and teachers of translation that all three of these dictionaries are useful, that is to say, we ourselves use them and recommend them to our students. The English/Catalan dictionary analyzed is the Diccionari anglès-català published by Gran Enciclopèdia Catalana (DAC), which is the most comprehensive bilingual dictionary for this language combination currently available. The entries from bilingual dictionaries were compared with those from three monolingual dictionaries, the second edition of the Collins Cobuild English Dictionary (Cobuild), the Cambridge International Dictionary of English (CIDE) and the third edition of the American Heritage Dictionary (AHD3). The choice of these particular dictionaries was also not random: Cobuild is the prime example of a corpus-based dictionary in English, and because it is aimed at advanced learners of the language its target audience coincides to a large extent with the users of the bilingual dictionaries under examination. 
CIDE is addressed to the same target audience, states that a corpus was used in its preparation although it does not purport to be corpus-based in the same way as does Cobuild, and has a very nice way of dealing with polysemy in that senses
are clearly grouped together under differentiated basic concepts. AHD3 covers American English, a variety of the language which is explicitly included in the bilingual dictionaries and not, we feel, well represented in Cobuild (it is somewhat better represented in CIDE). Although AHD3 is not based on a corpus, it does claim to rank the order of senses on the basis of usage, as opposed to the historical order of senses employed by other well-known American dictionaries such as Merriam-Webster’s Collegiate Dictionary.
The corpus consulted, as mentioned above, was the British National Corpus (BNC). The BNC “was designed to characterise the state of contemporary British English in its various social and generic uses” (Aston and Burnard 1998: 28). It includes both informative and imaginative texts, and comprises 90% written texts and 10% spoken texts. In spite of the design features of the BNC that might lead to controversial linguistic generalizations about general purpose English, we believe it provides a sufficiently accurate picture of British English to allow comparison of data culled from it with data from dictionaries. The numbers of examples of the three adjectives in this corpus are as follows:

high    28,698 examples in 3,243 texts
cold     6,438 examples in 1,592 texts
odd      4,478 examples in 1,595 texts

For the purposes of this article, we decided to examine 500 examples of each adjective, randomly chosen by the search function, taken from both written and spoken language, and with only one example from any given text. Although 500 is only about 1.7%, 7.8%, and 11.2% of the totals available for high, cold and odd respectively, this number proved workable from a practical standpoint in terms of downloading and producing a clean set of examples.
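The proportion of the available examples that each 500-item sample represents can be recomputed from the totals listed above. The following is a minimal sketch in Python; the names BNC_TOTALS, SAMPLE_SIZE and sampling_fraction are illustrative and are not part of the original study.

```python
# Totals reported above for each adjective in the BNC.
BNC_TOTALS = {"high": 28_698, "cold": 6_438, "odd": 4_478}
SAMPLE_SIZE = 500  # examples examined per adjective


def sampling_fraction(total: int, sample: int = SAMPLE_SIZE) -> float:
    """Return the sample size as a percentage of the available examples."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * sample / total


if __name__ == "__main__":
    for word, total in BNC_TOTALS.items():
        print(f"{word}: {sampling_fraction(total):.1f}% of {total:,} examples")
```

The rarer the adjective, the larger the share of its occurrences that a fixed-size sample covers, which is why the same 500 examples weigh so differently for high and for odd.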

3. Analysis of the dictionary entries and comparison with information extracted from the corpus sample
The three adjectives, cold, high and odd, exhibit varying degrees of polysemy, as can be seen in the summary of the definitions listed in the monolingual dictionaries in Table 1. AHD3 consistently makes more sense distinctions than the other two monolingual dictionaries. We believe this difference may be attributed to two factors: (1) AHD3 is a more comprehensive dictionary, with more entries than either of the other two dictionaries; and (2) unlike Cobuild and CIDE, AHD3 is not addressed to foreign learners of English, and thus includes less frequent, even uncommon, senses of words which are unlikely to be consulted by advanced learners but nevertheless are not insignificant in the context of a comprehensive dictionary for native speakers. The three dictionaries differ somewhat from one another in the number of senses assigned to each adjective, but in this respect we think the best guide is CIDE, which has made a noteworthy effort to limit itself to only the basic senses (which is particularly important in bilingual dictionaries), whereas the other two, especially AHD3, tend to assign a separate sense or subsense to every nuance of meaning. We will therefore be referring mainly to CIDE when we discuss the number of senses of each word.

Table 1. Definitions in the monolingual dictionaries
        AHD3                                    Cobuild                             CIDE
cold    10 senses, divided into 18 subsenses    8 senses, plus some expressions     2 senses
high    13 senses, divided into 22 subsenses    15 senses, plus some expressions    7 senses
odd     7 senses, divided into 9 subsenses      5 senses, plus some expressions     5 senses

3.1 The adjective cold³
Cold is a polysemous adjective in English with two main senses: (1) ‘having a low temperature’, and (2) the metaphorical sense of ‘unfeeling’ or ‘unfriendly’. Both of these senses correspond almost exactly to the two main senses of frío, in Spanish, and fred, in Catalan. Tables 2 and 3 below show the order of senses in the monolingual and bilingual dictionaries respectively. The data in Tables 2 and 3 show that all the dictionaries consulted, even the one with the simplest structure (DAC), give the sense ‘low temperature’ first, thus reflecting the intuition of the lexicographers that it is the most frequently used sense. The sense ‘unfriendly’ or ‘unfeeling’ is generally, but not always, given in second position. Some dictionaries acknowledge up to five additional different senses, although we feel that these could be included in one of the two main senses (as in CIDE).

Table 2. Treatment of cold in monolingual dictionaries
Senses of cold     AHD3 (10 total senses)   Cobuild (8 total senses)   CIDE (2 total senses)
low temperature    1–2, 8                   1–4                        1
unfriendly         3–5                      6                          2
cold colors        6                        5                          –
trail or scent     7                        7                          –
wrong              –                        8                          –
dead               9                        –                          –

Table 3. Treatment of cold in bilingual dictionaries
Senses of cold     OSD (4 total senses)   GL (6 total senses)   S&S (12 total senses)   DAC (Catalan) (1 total sense)
low temperature    1                      1, 2                  1                       1
unfriendly         2                      4                     2, 3, 9                 1
cold colors        –                      5                     12                      –
trail or scent     1                      6                     5, 6                    –
wrong              1                      –                     7                       –
dead               –                      3                     10                      –

The varying number of senses in the dictionaries reflects two different problems: (1) exactly what constitutes a separate sense is not always clear, even to trained lexicographers; and (2) some dictionaries list highly lexicalized examples as separate senses, even though the meaning could be included as part of an earlier sense. This latter issue is particularly evident in the case of bilingual dictionaries and explains why there may be more senses listed for a word in a bilingual dictionary than in a monolingual dictionary. It is a fact that English cold and Spanish frío / Catalan fred generally coincide in terms of physical reference and possible metaphorical contexts. This can be seen from the entry for cold in the OSD, in which the same equivalent (frío) is provided for cold numerous times.

cold¹ … adj
1 <water/weather/drink> frío; I’m ~ tengo frío; my feet are ~ tengo los pies fríos, tengo frío en los pies; it’s ~ today/in here hoy/aquí hace frío; the soup is ~ la sopa está fría; I’m getting ~ me está entrando frío; it’s getting ~ está empezando a hacer frío; your dinner’s getting ~ se te está enfriando la comida; the water has gone ~ el agua se ha enfriado; the engine starts straight from ~ without fail el motor arranca en frío sin fallar; the trail has gone ~ se han borrado las huellas; the news was already ~ la noticia ya estaba pasada or añeja; no, you’re still ~, getting ~er (in game) no, frío, más frío; ⇒ blow² vi 1(a)
2 (a) (unfriendly, unenthusiastic) frío; I got a very ~ reception me recibieron con mucha frialdad or muy fríamente, la recepción que me dieron fue muy fría; to be ~ TO or WITH sb tratar a alguien con frialdad, estar*/ser* frío con algn; to go ~ on sth: I went ~ on the idea (colloq) la idea dejó de hacerme gracia (fam); to leave sb ~: that leaves me ~ (colloq) (eso) me deja frío or tal cual (fam), (eso) no me da ni frío ni calor (fam)
(b) (impersonal) frío; keeping to the ~ facts … ateniéndose únicamente a los hechos …
3 (unconscious) ⇒ out² 1(b)
4 (without preparation) sin ninguna preparación; I came to the job ~ empecé el trabajo sin ninguna preparación; I was expected to start from ~ esperaban que empezara sin ninguna preparación.

Figure 1. Entry for cold in the OSD

Such repetition is in no way limited to the OSD, but quite commonplace in entries that represent what we call overlapping polysemy. In overlapping polysemy, a word in one language is polysemous, and there exists an equivalent word in the other language that, by and large, exhibits the same polysemy. Overlapping polysemy is a manifestation of what Sinclair (1996: 179) termed “parallels between the textual environment of a word in one language and a word that is used to translate it in another.” At this point we are not as concerned with the causes behind overlapping polysemy as we are with its effects on bilingual lexicography, although we might speculate that a cognitive linguistics approach to metaphor in language could be enlightening. We have found examples of overlapping polysemy in English on the one hand and Spanish and Catalan on the other in all word classes, for example: verbs, Eng. run /Sp. correr, Cat. córrer (‘walk quickly’ and ‘run a risk’); prepositions, Eng. before /Sp. antes, Cat. abans in both spatial and temporal contexts; nouns, Eng. dough /Sp. and Cat. pasta referring both to a mixture of flour and water and to money; and adverbs, Eng. naturally / Sp. naturalmente, Cat. naturalment meaning in a natural (as opposed to unnatural) way or expressing the expectedness of an outcome. The existence of overlapping polysemy has not gone unnoticed in the literature; for example, Tognini-Bonelli (1996: 207–214) discusses a case of overlapping polysemy with reference to English real /Italian reale in some detail from the perspective of using corpora to identify translation equivalents. Given the overlapping polysemy exhibited by cold/frío-fred, we might have expected that the entries in the bilingual dictionaries would have taken advantage of the overlap and would hence turn out to be simpler and shorter than those in the monolingual dictionaries. 
In fact, however, several situations occur: the structure in the DAC is quite simple; the structure in the OSD is somewhat more complex, but is less so than that of either GL or S&S, both of which contain long entries with many senses. Indeed, in these latter two dictionaries it appears that little to no attempt has been made to organize the material. After our initial analysis of the entries for cold in the bilingual dictionaries, we are now in a position to identify areas in which the dictionaries differed from one another and which are, perhaps, potential points for improvement:
– the design of the entry, to take advantage of overlapping polysemy when it exists;
– the criteria for ordering the equivalents in the entry; and
– decisions determining which set phrases or idioms should be afforded equivalent translations in the entry.
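The first of these points can be pictured by grouping the senses of an entry according to the main equivalent they share. The sketch below, in Python, uses the sense labels and equivalents of the OSD entry for cold in Figure 1; the short glosses for each sense are our paraphrase, and the names OSD_COLD, group_by_equivalent and overlapping_equivalents are hypothetical. It is meant only as an illustration, not as a description of how any of the dictionaries was compiled.

```python
from collections import defaultdict

# (sense label, gloss, main equivalent) read off the OSD entry in Figure 1.
# Sense 3 is only a cross-reference (to "out") and is therefore left out.
OSD_COLD = [
    ("1", "low temperature", "frío"),
    ("2a", "unfriendly, unenthusiastic", "frío"),
    ("2b", "impersonal", "frío"),
    ("4", "without preparation", "sin ninguna preparación"),
]


def group_by_equivalent(senses):
    """Map each equivalent to the list of sense labels that share it."""
    groups = defaultdict(list)
    for label, _gloss, equivalent in senses:
        groups[equivalent].append(label)
    return dict(groups)


def overlapping_equivalents(senses):
    """Return only the equivalents that cover more than one sense."""
    return {
        equivalent: labels
        for equivalent, labels in group_by_equivalent(senses).items()
        if len(labels) > 1
    }


if __name__ == "__main__":
    print(overlapping_equivalents(OSD_COLD))
```

An equivalent that recurs under several senses, as frío does here, marks the part of the entry that could in principle be collapsed when polysemy overlaps.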





We now turn to the corpus data relating to cold, to see if it bears on any or all of these issues. We were able to use 489 of the 500 examples containing cold that were downloaded from the BNC; in 11 examples the context provided by the search was not explicit enough to determine the sense of cold being used. As seen in Table 4, the sample showed that the ‘low temperature’ sense is by far the most frequently used in English.

Table 4. Cold in the BNC sample
Senses of cold                                        Number of examples in BNC sample
low temperature                                       358
unfriendly                                            54
giving impression of low temperature                  6
expression: cold war                                  35
expression: in the cold light of morning/day/dawn     10
expression: cold comfort                              6
expression: cold feet                                 6
expression: cold shoulder                             6
other collocations                                    8

The group of 358 includes several collocations in which the sense ‘low temperature’ was clear to us (e.g. cold sweat), and 22 figurative uses such as that exemplified by the expression reality closed its cold hand around her in which the sense of ‘low temperature’ was still evident. The sample included a significant number of lexicalized collocations, which are particularly important for bilingual dictionaries since they can constitute exceptions to the almost perfect equivalence between cold and frío/fred. The most frequent of these collocations was cold war (35 cases) — the equivalent of which is the loan translation guerra fría / guerra freda — and this large number of cases no doubt reflects the fact that much of the textual base of the BNC is journalistic and thus concerned with politics. There is also a third, metaphorical, sense of cold which we identify as ‘giving the impression of low temperature’. This sense is used in contexts in which cold is applied to nouns referring to color, light or appearance (e.g. cold grey/gleam/outlines/full moon). All of the above-mentioned senses exemplified in the corpus data, and others which did not appear in our 500 examples but which no doubt would have turned up in a larger sample from the corpus, such as cold sore, are present in the dictionaries examined. In addition, the dictionaries assigned the most frequent meaning to the first sense in the entry. The only exception to these observations is in the cold light of day, which surprisingly is not present in any of the dictionaries in spite of being the second most frequent lexicalized collocation.

Moreover, the meaning of this expression is opaque and would not be immediately understood by foreign speakers.
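The distribution in Table 4 can be checked and ranked mechanically. The following Python sketch copies the counts from Table 4; the names COLD_SAMPLE and ranked_shares are illustrative, and the code is offered only as one simple way of tabulating such a sample.

```python
# Counts copied from Table 4 (BNC sample for "cold").
COLD_SAMPLE = {
    "low temperature": 358,
    "unfriendly": 54,
    "giving impression of low temperature": 6,
    "expression: cold war": 35,
    "expression: in the cold light of morning/day/dawn": 10,
    "expression: cold comfort": 6,
    "expression: cold feet": 6,
    "expression: cold shoulder": 6,
    "other collocations": 8,
}


def ranked_shares(counts):
    """Return (label, count, percentage) tuples, most frequent first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(label, n, 100.0 * n / total) for label, n in ranked]


if __name__ == "__main__":
    for label, n, share in ranked_shares(COLD_SAMPLE):
        print(f"{n:4d}  {share:5.1f}%  {label}")
```

The counts add up to the 489 usable examples, and restricting the ranking to the rows marked as expressions places in the cold light of morning/day/dawn directly after cold war.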

3.2 The adjective high
Tables 5 and 6 show the order of senses for the polysemous adjective high in the monolingual and bilingual dictionaries consulted. We may first note that, as in the case of cold, AHD3 makes more sense distinctions than the other two dictionaries. The first two or three senses refer to physical height in all three dictionaries; what is perhaps surprising is the variation in the order of the other senses – for example, the sense referring to the foul smell of meat is third out of thirteen in AHD3, last of seven in CIDE, and absent from Cobuild.

Table 5. Treatment of high in monolingual dictionaries
Senses of high

AHD3 (13 total senses, with 22 subsenses)

Cobuild (15 total senses)

CIDE (7 total senses)

distance above average important mental state sound education bad smelling

01 08 02, 6, 7 10 04

01, 2, 3 04, 5, 7 08, 9 16 14

1 2 3 4 5 6 7

03

Table 6. Treatment of high in bilingual dictionaries Senses of high

OSD (6 total senses)

GL (27 total senses)

S&S (18 total senses)

DAC (Catalan) (1 total sense)

distance above average important mental state sound education bad smelling

1 2 1 4 1

1, 2 3, 5, 12, 21, 23, 24 1, 7, 8 27 10, 11

1 4, 5, 9 17 13

1a 1b 1b 1b 1b 1b

6

17, 19

11

Table 6 clearly shows that the treatment of high in the bilingual dictionaries is even more diverse. At one extreme there is GL, which distinguishes 27 different senses, not counting the large number of examples provided; for its part, S&S includes 18 senses. We note that in both these dictionaries the number of senses
is greater than that given by the monolingual dictionaries, although it must be said that neither of these two bilingual dictionaries includes subsenses, which might have lowered the number of main senses substantially. OSD divides its entry into 6 senses, and the English/Catalan dictionary, DAC, as it did with cold, lists only one basic sense divided into two subsenses, one for the physical sense and the other for the metaphorical sense. Although the structure of the entry is much simpler in the DAC than in the other dictionaries, the entry itself is not much shorter because there are many examples under each subsense. And, like the monolingual dictionaries discussed above, the bilingual dictionaries list equivalents for the physical reference of the adjective first and then differ as to the order of the derived senses presented. The study of the 500 examples taken from the BNC (492 of which we were able to use) confirms that high is indeed a highly polysemous word. Table 7 summarizes the distribution of senses we found in our sample. Perhaps the most striking aspect of this information is the fact that the historically original sense of ‘extending (relatively) far upwards’ or ‘placed at a great distance from the bottom’, i.e. the sense giving a physical description that is listed first by all the dictionaries examined, is obviously not the most frequent, although it is by no means rare. The most frequently found sense was that of ‘situated at the top part of the scale’ when applied to nouns describing objectively measurable qualities, such as pressure (13), rate(s) (18), degree (10), price (10), cost(s) (10), level (29), proportion (9). A closely related sense, but with the further component of ‘good’ or ‘positive’, is also frequently found, and occurs when the adjective is paired with nouns describing qualities involving a subjective assessment: quality (13), class (3), standard(s) (17), reputation (4), and performance (3). 
Yet another frequent meaning is that of ‘important’, ‘above others in its class’, which we found in collocations such as high court (16), high school (12), high commissioner (7), and high priest (3). There are a few cases in which high is applied to sound as in high pitch. Finally, high is present in a number of collocations such as high street, high profile, high priority, high time, etc.

Table 7. High in the BNC sample
Senses of high                 Number of examples in BNC sample
top of scale                   198
top of scale + good            76
top of musical scale           7
important, above others        71
physical height                82
expression: high street        25
expression: high profile       6
other collocations             27

To summarize up to this point, we have seen that the bilingual dictionaries examined take little notice of overlapping polysemy, even when it is almost complete (the case of cold/frío/fred). The main exception to this observation is the DAC, which gives one main equivalent and then several expressions. The bilingual dictionaries differ from one another with regard to the order of equivalents, especially in the case of figurative senses, and with regard to which equivalents are included. And, finally, relatively small samples from the corpus contained frequent fixed expressions and collocations that are legitimate candidates for translation equivalents. Not all of these expressions are included in the dictionaries under examination.

3.3 The adjective odd
To judge from the number of senses listed in AHD3, odd is not as polysemous as either high or cold, and examination of the bilingual dictionaries shows us that it exhibits practically no overlapping polysemy with its equivalents. The five descriptors used by CIDE for odd, viz. ‘strange’, ‘separated’, ‘numbers’, ‘not often’, and ‘approximately’ (as an affix), do not correspond to either one or two adjectives in Spanish or Catalan. Since overlapping polysemy does not really come into play with this adjective, we must concentrate on the other two issues that we have identified as needing improvement: order of equivalents and inclusion of fixed expressions.

Table 8. Treatment of odd in monolingual dictionaries
Senses of odd    AHD3 (7 total senses)   Cobuild (5 total senses)   CIDE (5 total senses)   (Historical)
strange          1                       1                          1                       5
separated        3, 4                    5                          2                       1
numbers          5                       4                          3                       2
not often        6                       2                          4                       4
approximately    2                       3                          5                       3

Table 9. Treatment of odd in bilingual dictionaries
Senses of odd    OSD (4 total senses)   GL (19 total senses)   S&S (9 total senses)   DAC (Catalan) (5 total senses)
strange          1                      13                     7, 8                   1
separated        3                      5, 6                   1                      4
numbers          3                      1                      2                      2
not often        2                      8                      6                      5
approximately    4                      3                      3                      3

Table 8 above shows the order of senses in the monolingual dictionaries (the final column gives the historical order of senses, according to the Oxford English Dictionary, as a point of comparison), and Table 9 the order of equivalents in the bilingual dictionaries. The sample from the BNC shows that odd is much more common in the sense of ‘strange’ than in any of the other senses; the mathematical sense and the sense of ‘not matched’ or ‘not part of a pair’, which are historically older and, we believe, still important senses of this word, were quite few in number. The figures from the corpus sample are given in Table 10.

Table 10. Odd in the BNC sample
Senses of odd               Number of examples in BNC sample
strange                     257
occasional                  151
numbers = not even          24
approximately               31
not matched                 8
expression: odd jobs        11
expression: odd man out     8

The expression odd man out, which is present in the bilingual dictionaries and explicitly explained in both Cobuild and CIDE, turns out to be less frequent than the phrase odd jobs. If we compare these findings with the entries in the bilingual dictionaries, we see that OSD and DAC list the ‘strange’ equivalent first, but note that S&S begins with suelto ‘unmatched’, historically the first meaning, while GL begins with impar ‘not even’ and gives the reader the ‘strange’ sense in 13th position. This example brings up an important issue in relation to corpus-based lexicography, namely how to evaluate senses with relatively few occurrences in the corpus. The ‘not matched’ sense of odd was relatively infrequent in our sample, which might lead lexicographers to omit it from a bilingual dictionary, but we believe that would be an error because this meaning cannot be derived from other information and it is perceived by speakers as a basic sense of the word. Frequency data alone are not enough to determine the inclusion of a sense in a general purpose dictionary. By contrast, the same number of occurrences should be interpreted quite differently when dealing with an expression. Our sample shows that the expression odd jobs is quite frequent and always used in the plural (with the meaning of ‘several unrelated, not regular jobs’), and this information can help guide lexicographers in their choice of examples.
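Both observations, the position of the most frequent sense in each bilingual dictionary and the inadequacy of a purely frequency-based cut-off, can be sketched in a few lines of Python. The counts come from Table 10 and the positions from Table 9 (the first position is taken where a dictionary lists a sense more than once). Matching ‘occasional’ with ‘not often’ and ‘not matched’ with ‘separated’ is our assumption, and the threshold of 10 occurrences, like the names ODD_COUNTS, ODD_POSITIONS, position_of_top_sense and senses_to_include, is hypothetical.

```python
# Corpus counts for the senses of "odd" (Table 10). Assumption: 'occasional'
# corresponds to 'not often' and 'not matched' to 'separated' in Table 9.
ODD_COUNTS = {
    "strange": 257,
    "not often": 151,
    "approximately": 31,
    "numbers": 24,
    "separated": 8,
}

# First position at which each bilingual dictionary lists a sense (Table 9).
ODD_POSITIONS = {
    "OSD": {"strange": 1, "separated": 3, "numbers": 3, "not often": 2, "approximately": 4},
    "GL": {"strange": 13, "separated": 5, "numbers": 1, "not often": 8, "approximately": 3},
    "S&S": {"strange": 7, "separated": 1, "numbers": 2, "not often": 6, "approximately": 3},
    "DAC": {"strange": 1, "separated": 4, "numbers": 2, "not often": 5, "approximately": 3},
}


def position_of_top_sense(counts, positions):
    """For each dictionary, return where the most frequent corpus sense is listed."""
    top = max(counts, key=counts.get)
    return {name: order[top] for name, order in positions.items()}


def senses_to_include(counts, min_count, core_senses=frozenset()):
    """Keep senses that reach min_count, plus any sense flagged as basic."""
    return sorted(
        sense for sense, n in counts.items()
        if n >= min_count or sense in core_senses
    )


if __name__ == "__main__":
    print(position_of_top_sense(ODD_COUNTS, ODD_POSITIONS))
    print(senses_to_include(ODD_COUNTS, min_count=10))
    print(senses_to_include(ODD_COUNTS, min_count=10, core_senses={"separated"}))
```

With a cut-off alone the ‘separated’ sense disappears; it survives only when it is flagged as basic, which is a judgement the corpus counts themselves cannot supply.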

.

Conclusions

A major problem with equivalents in bilingual dictionaries is the identification of the range of semantic contexts in which the equivalent provided by the dictionary can be used. Adjectives that have more than one sense are used in a variety of lexical contexts, so it might seem that the starting point for determining which equivalents should be included in a bilingual dictionary is the information, and specifically the sense distinctions, provided by a monolingual dictionary. Several bilingual dictionaries covering the language combinations we have considered here are based on the information from monolingual dictionaries, although this is not openly stated. For instance, it seems that the order of equivalents presented in Simon and Schuster’s International Dictionary corresponds to the order of senses as presented in Merriam Webster’s Collegiate Dictionary of English (which is historical). However, the sense distinctions in a monolingual dictionary, whether it be addressed to native speakers or to advanced-level foreign-language learners, may, in practice, not be right in the context of a bilingual dictionary, precisely because the two languages are not both taken into account from the very beginning. In our opinion, bilingual dictionaries that are not conceived of as bilingual, contrastive works, but instead take as their starting point a monolingual dictionary or, at most, a description of only one of the languages, do not always yield optimal results. Since, by and large, bilingual dictionaries are not written from the perspective of both languages, they do not take into account phenomena such as overlapping polysemy, which can only make sense when there are two languages involved.
The prime role played by both languages in identifying translation equivalents explains why data from a monolingual corpus will not be relevant to this important issue for bilingual lexicography, because a monolingual corpus is simply not constructed from the standpoint of two languages. Although we do not believe that a monolingual corpus can resolve problems resulting from overlapping polysemy, we have seen that it can play an important role in two other, equally important, issues facing the bilingual lexicographer, namely choosing the order in which to present equivalents and determining which fixed expressions to include in a specific entry. In a case like that of odd, the fact that the word is used much more often in the sense of





Victòria Alsina and Janet DeCesaris

‘strange’ than in the mathematical sense of ‘not even’ can be used as an argument for the equivalent of the more frequent sense coming first (this argument seems to have been used by OSD and DAC). We are not arguing explicitly for the order of equivalents always to be based on corpus data, but the GL entry that buries the ‘strange’ sense in the 13th position (out of a total 19) is not, in our opinion, as useful to the reader as it could be. In the case of cold, the expression in the cold light of day/morning/dawn is relatively frequent in a large corpus of English, yet contrastive analysis shows that it is opaque to Spanish and Catalan speakers; here the corpus data helps us to determine which expressions should warrant translations in the dictionary entry. We thus conclude by saying that a monolingual corpus does have a role to play in the preparation of a general purpose bilingual dictionary. Bilingual dictionaries that do not take frequency data into account in the organization of their entries, such as S&S and GL, have been criticized in this article for the way they arrange the information the lexicographers have chosen to include. However, we should like to point out that we ourselves often use these dictionaries because they provide a wealth of information, especially in the form of translation equivalents. The information is there but it is poorly organized, and translators, whose professional obligations require them to search for the right equivalent, are willing to spend the time and effort necessary to plod through the entries. Conversely, a bilingual dictionary with significant gaps in coverage, no matter how well organized the information may be, is going to be found lacking. 
That is precisely our experience with the DAC — a dictionary with well-structured entries, as we have seen in the cases of cold and high; nevertheless, in our opinion the dictionary contains too few translation equivalents for the level of advanced learners, not to mention that of translators. Those of us who teach know that second-language learners, some of the main users of general purpose bilingual dictionaries, often do not read whole entries; that is why the structure of these reference books needs to be the best possible. We hope to have suggested at least two ways in which use of a monolingual corpus can be fruitful in this task.

Notes  Work on this paper was supported by grant PB96–0305 from the Spanish Ministry of Education and Culture, which we hereby acknowledge.  We note that the institute we are affiliated with, the Institut Universitari de Lingüística Aplicada of Pompeu Fabra University, is currently building a multilingual corpus for some

Bilingual lexicography, overlapping polysemy, and corpus use

languages for specific purposes that includes parallel production in English, Spanish and Catalan.  In order to show the range of senses for the three adjectives as clearly as possible, we present the data from the monolingual dictionaries first, followed by that from the bilingual dictionaries, although in carrying out our research we started with the bilingual dictionaries because as teachers of lexicography and translation we knew the entries would prove to be different from one another.

References

Al-Kasimi, A. M. 1983. “The interlingual/translation dictionary. Dictionaries for translation”. In Lexicography: Principles and Practice, R. R. K. Hartmann (ed.), 153–162. London: Academic Press.
American Heritage Dictionary, 3rd ed. 1992. Boston: Houghton Mifflin.
Aston, G. and Burnard, L. 1998. The BNC Handbook. Edinburgh: Edinburgh University Press.
Cambridge International Dictionary of English. 1995. Cambridge: Cambridge University Press.
Collins Cobuild English Dictionary, 2nd ed. 1995. London: HarperCollins.
Diccionari anglès-català. 1983. Barcelona: Gran Enciclopèdia Catalana.
Hartmann, R. R. K. and James, G. 1998. Dictionary of Lexicography. London and New York: Routledge.
Larousse Gran Diccionario Español-Inglés/English-Spanish. 1983. Mexico City: Larousse.
Merriam Webster’s Collegiate Dictionary, 10th ed. 1996. Springfield, Mass.: Merriam Webster.
Oxford English Dictionary, 2nd ed. 1991. Oxford: Oxford University Press.
The Oxford Spanish Dictionary. 1994. Oxford: Oxford University Press.
Simon & Schuster International Dictionary English-Spanish/Spanish-English. 1971. New York: Simon & Schuster.
Sinclair, J. 1996. “An International Project in Multilingual Lexicography”. International Journal of Lexicography 9: 179–196.
Teubert, W. 1996. “Comparable or Parallel Corpora?”. International Journal of Lexicography 9: 238–264.
Tognini Bonelli, E. 1996. “Towards Translation Equivalence from a Corpus Linguistics Perspective”. International Journal of Lexicography 9: 197–217.
Tomaszczyk, J. 1983. “On bilingual dictionaries. The case for bilingual dictionaries for foreign language learners”. In Lexicography: Principles and Practice, R. R. K. Hartmann (ed.), 41–51. London: Academic Press.
Toury, G. 1995. Descriptive Translation Studies and Beyond. Amsterdam and Philadelphia: John Benjamins.



Computerised set expression dictionaries
Analysis and design

Sylviane Cardey and Peter Greenfield

Although machines are useful in advancing and verifying the work of the linguist, there remains much core work for which only the linguist is competent. In the area of lexis, such work is essentially lexicographic in nature (conception, understanding and organisation). To provide evidence for this claim, problems and results arising from the construction of computerised set expression dictionaries are presented, as well as the problems encountered in the automatic recognition of set expressions in texts.

.

Introduction

There is a tendency to forget that linguists, lexicologists and lexicographers play the most important role in the creation of dictionaries, even if these are ‘electronic’, ‘automated’ or ‘computerised’. This is also true of applications of natural language processing such as computer-aided or automatic translation. What we attempt to show in this paper is that the results do indeed depend in large part, if not wholly, upon the analytical power and intuition of the linguist, the computer serving only as a tool for collecting, organising and verifying either what the linguist needs or what the linguist intuitively thinks. We present results arising from the construction of set expression dictionary systems and explain how these systems have been implemented. These systems are concerned with the automatic recognition and translation of set expressions in four languages (English, French, Italian and Spanish). In one dictionary (Limame 1998) the set expressions contain the names of animals (metaphoric use), as illustrated in example (1); in two others (Thomas 1998, Morgadinho 1999) they contain parts of the body, as in example (2).




(1) to kill the goose that lays the golden egg
(2) to breathe down someone’s neck

2. Set expressions in a cross-linguistic perspective

Sociologically, we can postulate universals, that is to say, an identical conception of reality all over the world. But the ‘world’ includes ‘societies’, and ‘societies’ involve ‘languages’, and this is why it is interesting to take account of this aspect, because the language of a linguistic community forges the identity of a people, a fact which is evident in certain expressions. Let us look at the French expression (3) and the English expression (4).
(3) mettre la charrue avant les boeufs
(4) put the plough before the oxen

The word-for-word translations of these expressions are syntactically correct, but semantically they do not refer to the same reality, as the meaning of (4) is not metaphorical. The cultural dimension has to be taken into account in the process of translation. Where a French speaker says
(5) quand les poules auront des dents
an English speaker says
(6) when pigs have wings

Culture obviously plays a role in understanding set expressions; inevitably, nations do not perceive the world in the same way. While the English and French expressions in (7) and (8) are lexically different, their underlying meaning is identical.
(7) blind as a bat, to kill two birds with one stone
(8) aveugle comme une taupe, faire d’une pierre deux coups

Language reflects reality; in consequence, terms such as cat, nose, dog form the basis of numerous locutions. These expressions convey a certain environment and certain habits. For the English, as for the French, the monkey symbolises cleverness, the fox cunning and the mule stubbornness. However, locutions constructed upon the experiences of everyday life differ from one community to another, because what is experienced is similar for each person living within a given culture. Clearly, no Westerner sees an elephant in the way that a Congolese does. Sometimes the translator is faced with two possible English translations of a given French expression (see 9–11), which causes problems for lexicographic implementation.
(9) à vol d’oiseau
(10) as the crow flies
(11) in a bee-line
Certain expressions come from fables:
(12) vendre la peau de l’ours
(13) vendere la pelle dell’orso
(14) sell the bear’s skin
(15) se parer des plumes du paon
(16) coprirsi con le penne del pavone
(17) tuer la poule aux oeufs d’or
(18) kill the goose that lays the golden eggs

Whole communities acquire culture from their literature, both contemporary and past. The Bible, in particular, has bequeathed a great many expressions, as illustrated by examples 19 to 23.
(19) adorer le veau d’or
(20) worship the golden calf
(21) adorare il vitello d’oro
(22) un chien vivant vaut mieux qu’un lion mort
(23) meglio un asino vivo che un dottore morto

3. Set expression dictionaries

Idioms are the ‘exceptions that prove the rule’: they do not get their meaning from the meanings of their syntactic parts. (Katz 1973 in Moon 1998: 15)

It is now generally recognised that set expressions pose problems for natural language processing. Furthermore, the existence of set expressions is one of the universal characteristics of natural languages. The importance of this phenomenon became fully apparent as a result of attempts to construct automatic translation systems. For example, expressions (24) and (25) have nothing in common, either at the lexical level or at the level of syntax, but they have the same meaning.






(24) se payer la tête de quelqu’un
(25) to make fun of somebody

Set sequences can be subcategorised. Fraser (in Moon 1998: 15) establishes a hierarchy of seven degrees of idiom frozenness from L6 (completely free) to L0 (completely frozen) and argues that no true idiom can belong to level 6. Whatever the classification, what really matters is to make apparent the problems posed by set sequences. One can distinguish between the following types of set expression:
a. pragmatagms (Mel’čuk 1995: 176–177), which are the result of usage and pragmatic conventions, as illustrated in (26) and (27);
b. set sequences which are semantically blocked; they belong to one of the following three subtypes:
   – idiomatic expressions, which are composed of non-compositional elements, as in (28) and (29);
   – semi-idiomatic expressions, where one of the elements is non-compositional and the others are transparent, as in (30);
   – quasi-idiomatic expressions, which have both a compositional and an idiomatic meaning, as in (31);
c. non-compositional syntactic sequences such as (32), whose syntactic structure is malformed.
(26) à consommer avant (best before)
(27) c’est la vie (that’s life)
(28) couper les cheveux en quatre (literally: to cut the hair in four; to be a perfectionist)
(29) lever le coude (literally: to lift the elbow; to like a drop)
(30) coûter les yeux de la tête (literally: to cost the eyes of the head; to be terribly expensive)
(31) ouvrir la bouche (literally: to open one’s mouth; to say something)
(32) mettre pied à terre (literally: to put foot to earth; to dismount (from a horse))

The first stage in compiling a computerised dictionary of set expressions is to consult existing general-purpose paper dictionaries (mono- or bilingual) and then to carry out corpus searches on newspapers such as Le Monde, for example. In order to carry out these searches, it is necessary to define one or several search items. We searched for set expressions which contained names of animals and parts of the body. Our initial search enabled us to retrieve a considerable number of set expressions, some of which were ‘new’ in the sense that they


are not listed in current dictionaries, but also yielded many irrelevant sequences which did not contain set expressions because the computer was unable to distinguish between set expressions and fully compositional, free sequences (see Sections 7 and 8). Many of the ‘new’ expressions yielded by our corpus search resulted from the high degree of variability involved in set expressions. Examples (33) and (34) illustrate the modification of a set expression by the “insertion of spurious or non-canonical words” (Moon 1998: 174). In (33) two prepositional phrases are added to the standard set phrase prendre le taureau par les cornes, while in (34) the adjective noirs is added to chats in the expression avoir d’autres chats à fouetter.
(33) Le président a pris le taureau du chômage par les cornes de la fiscalité.
(34) Le premier ministre a d’autres chats noirs à fouetter avant d’arriver à ses fins.

This technique is often used by the press to draw the reader’s attention. Given that the degree of frozenness is entirely arbitrary, all expressions possess a degree of liberty and in consequence every expression can be unfrozen according to the speaker’s or writer’s needs. It is always possible to unfreeze an expression, even the most frozen, in order to produce an amusing effect or to surprise. However, unfrozenness is impalpable and arbitrary: anybody can add, delete or substitute components of set expressions at will. This variability makes corpus searches extremely time-consuming.

4. Building a set expression inventory

En terminologie, un corpus est un ensemble de textes homogènes, c’est-à-dire traitant du même domaine, rédigés et utilisés par le même type de personnes et dans des conditions semblables.1 (Dauphin 1997: 17)

One of the aims of our work was the development of a system able to recognise set expressions in context. To do this, we first had to draw up an inventory of set expressions which would subsequently be used for designing our recognition rules. Even if we had an exhaustive inventory of the set expressions, this would not mean that they would easily be recognised in context by a machine. As illustrated in the preceding section, set expressions are not always stable. We first examined more than 50 specialised and non-specialised mono- and bilingual dictionaries, including current dictionaries and dictionaries of idioms, in a variety of forms (paper, CD-ROM and on-line). These dictionaries






are too numerous to be cited in this paper, but the major ones are listed in the Reference section (Chan 1999, Limame 1998, Morgadinho 1999, Thomas 1998). The sequences that we found in these dictionaries met the first condition of frozenness, that is to say, a conventional usage. They are recognised as set expressions by users of the language, and any other sequences would look strange to a native speaker. For example, to indicate the date of validity of a product, French says A consommer avant… whereas English can only use Best before… (‘meilleur avant’), whilst Polish says Data spozycia... (‘date de consommation’) and German Mindestens haltbar bis… (‘utilisable au moins jusqu’à…’). The meaning of these expressions is transparent for a native speaker, and they cannot be replaced by equivalent sequences, even if we could say in French A garder jusqu’à…, Ne pas manger après…, Date limite d’utilisation. We can thus be certain that a given sequence is not an ad hoc metaphor, but that it possesses the same meaning for all speakers of the language. We then looked at journalistic text corpora such as Le Monde on CD-ROM in order to see if there were other set expressions in current use which do not appear in the dictionaries. We also integrated some well-known expressions frequently used in spoken language which had not been found in the corpora that we had examined. The starting point in the search for set expressions in electronic corpora was the individual words (names of animals or parts of the body). As one might expect, we had to deal with a very large number of non-idiomatic uses of the words. It took a considerable length of time to sort the relevant sequences manually in the search for set expressions, but there was no other solution. As we had expected, we found ‘new’ expressions (33–36), many of which are metaphorical extensions of existing set expressions such as se jeter dans les bras (de quelqu’un) in (35).
(35) A ceux qui désespèrent et qui voudraient se jeter dans les bras tendus du parti de…
(36) En stigmatisant -l’Europe des technocrates-, il pointe du doigt les carences futures du traité de…
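The word-based corpus search described above can be sketched as follows. This is a minimal illustration, not the tool actually used for the Le Monde searches: the seed list, the sample sentences and the function name are all assumptions, and the third sample sentence is invented to show a literal use. The sketch returns every sentence containing a seed word, which is why the hits still have to be sorted manually.

```python
import re

# Hypothetical seed words; the study used names of animals and parts of the body.
SEEDS = ["bras", "doigt", "taureau"]

# Two fragments taken from examples (35) and (36), one invented literal use,
# and one invented sentence that contains no seed word.
SAMPLE = [
    "A ceux qui désespèrent et qui voudraient se jeter dans les bras tendus du parti de…",
    "En stigmatisant l'Europe des technocrates, il pointe du doigt les carences futures du traité de…",
    "Il s'est cassé le bras en tombant.",
    "Le ministre a parlé hier.",
]


def concordance(sentences, seeds):
    """Yield (seed, sentence) for every sentence containing a seed word."""
    patterns = {
        seed: re.compile(r"\b" + re.escape(seed) + r"s?\b", re.IGNORECASE)
        for seed in seeds
    }
    for sentence in sentences:
        for seed, pattern in patterns.items():
            if pattern.search(sentence):
                yield seed, sentence


hits = list(concordance(SAMPLE, SEEDS))
for seed, sentence in hits:
    print(seed, "|", sentence)
```

The literal sentence about a broken arm is retrieved alongside the two set expressions, which illustrates why a keyword search alone cannot separate idiomatic from free uses.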

5. Criteria for delimiting and selecting set expressions

The first problem is determining where the set expression starts and ends. The difficulty with sequences such as (37) and (38) is that of deciding whether the whole sequence needs to be included or only part of it. In this case as well as in examples (39–44) only the non-verbal part was included (par cœur, à gorge déployée, etc.). The same decision was taken in the case of examples (45–47) in spite of the fact that à cœur, unlike par cœur, is not listed independently in dictionaries. However, in other cases, such as (48), it seemed justified to include the verb (avoir du cœur).
(37) apprendre par cœur
(38) connaître par cœur
(39) verb + à gorge déployée (sing, laugh…)
(40) verb + au doigt et à l’oeil (obey, listen…)
(41) (être) au cœur de quelque chose
(42) (arriver) comme un cheveu sur la soupe
(43) (être) dans les bras de Morphée
(44) (avoir) quelque chose sur les bras (Avec cette affaire sur les bras, je ne finirai pas avant demain)
(45) avoir quelque chose à cœur
(46) prendre à cœur de faire
(47) tenir à cœur
(48) avoir du cœur

The other important decision to be made in selecting set expressions for inclusion in a multilingual dictionary is whether to include only expressions that belong to the same register or to list all translations, whatever the register. We opted for the latter. Our dictionary therefore contains set expressions belonging to the following registers: literary (49), colloquial (50–51) and slang (52).
(49) nourrir un serpent sur son sein
(50) se faire du mauvais sang
(51) donner sa langue au chat
(52) l’avoir dans l’os

The number of expressions involving parts of the body retrieved using these selection criteria was 595.

6. Automatic recognition of set expressions

Having established our list of set expressions, we then had to deal with the problem of recognising set expressions in the source text (to be translated), identifying translation equivalents for automatic translation and building a bilingual dictionary. A close scrutiny of the set expressions shows that certain






parts are fixed whilst others are not. Examples (53–56) show that certain elements can be inserted or omitted, while examples (57–58) illustrate variations in word order.
(53) enlever une (belle) épine du pied
(54) aller (droit) au cœur
(55) sell the bear’s skin (before one has caught the bear)
(56) vendere la pelle dell’orso
(57) mettre quelqu’un au pied du mur
(58) il a été mis au pied du mur

In order to recognise set expressions automatically, it is crucial to identify the fixed part of a set expression, which is composed of those elements that are indispensable to the existence of the expression, be they variable or not. It is also necessary to take into account all the possible variants, as illustrated in (59–61).
(59) se taper/se cogner la tête contre les murs
(60) All is fish that comes to/in his/her net.
(61) briser/déchirer le coeur

For each expression, all the alterations have to be listed. These alterations are found at several levels. At the morphological level, variations in verbal and nominal inflexion are found, as in (62) and (63). At the lexical level, there are cases of omission and addition of elements (33–35) as well as synonymous paradigms (59–61).
(62) Porter la main sur Pierre, ça ne me serait jamais venu à l’idée.
(63) Tu as porté la main sur Pierre.

In addition, there are also alterations at the structural level since many set expressions can undergo syntactic transformations such as passivisation, nominalisation or pronominalisation (see Moon 1998:104–119).
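One way to picture the listing of fixed parts, variants and optional elements is the sketch below. It is an illustrative assumption, not the recognition rules actually implemented: each expression is described as a sequence of slots, a slot holds its listed alternatives and may be optional, and a small number of inserted words is tolerated between slots. Inflected forms are simply enumerated by hand here, and the slot notation and the name compile_expression are hypothetical.

```python
import re


def compile_expression(slots, max_gap=2):
    """Build a regular expression from a list of (alternatives, optional) slots.

    Up to max_gap extra words may occur between two slots, so that inserted
    material such as 'du chômage' in example (33) does not block recognition.
    """
    gap = r"(?:\s+\w+){0,%d}?" % max_gap
    parts = []
    for position, (alternatives, optional) in enumerate(slots):
        piece = "(?:" + "|".join(re.escape(a) for a in alternatives) + ")"
        if position == 0:
            piece = r"\b" + piece
        else:
            piece = gap + r"\s+" + piece
        if optional:
            piece = "(?:" + piece + ")?"
        parts.append(piece)
    return re.compile("".join(parts) + r"\b", re.IGNORECASE)


# (59): two synonymous verbs, then the fixed part.
TETE = compile_expression([
    (["se taper", "se cogner"], False),
    (["la tête"], False),
    (["contre les murs"], False),
])

# (53): an optional adjective inside the expression.
EPINE = compile_expression([
    (["enlever"], False),
    (["une"], False),
    (["belle"], True),
    (["épine"], False),
    (["du pied"], False),
])

# (33): hand-listed verb forms plus tolerated insertions.
TAUREAU = compile_expression([
    (["prendre", "prend", "pris"], False),
    (["le taureau"], False),
    (["par les cornes"], False),
])

print(bool(TETE.search("Il va se cogner la tête contre les murs")))
print(bool(TETE.search("Il va s'appuyer la tête contre les murs")))
print(bool(TAUREAU.search(
    "Le président a pris le taureau du chômage par les cornes de la fiscalité.")))
```

Such a pattern only finds candidate sequences; it does not decide between a literal and an idiomatic reading, and syntactic transformations such as passivisation would need further variants to be listed.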

.

Set expressions and ambiguity

The great majority of set expressions are also susceptible of a literal interpretation. There are several reasons for this. Firstly, at the lexical level, there are few words that only exist in set sequences. Most exceptions to this tendency involve the use of archaic terms, as in (64). Secondly, the syntactic structure is usually perfectly normal, although there are some exceptions such as (65). Finally,


there is a semantic reason: the idiomatic sense of a set expression is usually based on its original, literal, sense, as illustrated by example (66).
(64) chercher noise
(65) il y a anguille sous roche
(66) se mettre la corde au cou

An expression is ambiguous when it possesses several senses, which can be:
1. literal and frozen
   (67) se faire taper sur les doigts
   literal sense: to rap sb’s fingers
   frozen sense: to rap sb’s knuckles
2. frozen and accidental
   (68) jouer sa peau
   frozen sense: to risk one’s neck
   accidental sense: il a tué un ours ce matin, il a joué sa peau contre une bouteille de whisky
3. several frozen senses
   (69) prendre le mors aux dents
   a. of a horse: to bolt
   b. of a human being: to get carried away by passion, anger

8. Resolving ambiguities

In spite of all the difficulties involved, recognition and translation of a number of set expressions are possible even when these include parts which are not fixed. Strangely enough, it seems that the ambiguity problem can be resolved by examining precisely those elements within idiomatic expressions which can be free. As we have seen, both se taper la tête contre les murs and se cogner la tête contre les murs are possible, whilst s’appuyer la tête contre les murs is not. There is widespread research underway establishing and examining the different possibilities for insertion and transformation. The question is whether this work, which is likely to require much time and effort, will make it possible to distinguish between morphologically identical free sequences and set expressions. In other words, will computers be able to distinguish between the literal and the idiomatic use of sequences such as (70–75)?






(70) cook one’s goose
(71) see the elephant
(72) on lui a mis l’affaire sur le dos
(73) on lui a mis le fagot sur le dos
(74) il prend la mouche
(75) Jean a le cafard

Some syntactic criteria could be used, such as the fact that set expressions cannot be relativised (*Je lui ai dévoilé mon coeur qui était triste) or pronominalised. However, there are many exceptions to these rules, as illustrated by examples (76–78).
(76) Je me suis payé sa tête qui s’y est franchement prêtée.
(77) Le peuple a réclamé sa tête qui ne valait pas grand chose.
(78) Il donne sa langue au chat, je donne la mienne aussi.

In reality, there is a great deal of variability in acceptance of these structures; some native speakers find them acceptable, others reject them. This is partly due to the fact that set expressions are often part of informal language, which is less codified than standard language. The following are some of the cues and strategies that can be used to resolve ambiguities:
1. using semantic information, whereby the interpretation of a set expression can be based on a free element. Thus, in example (79), if the subject of prendre is human, then the expression means ‘to become angry’, whilst if the subject is an animal it means ‘to bolt’;
2. using contextual cues, for example in specifying to which domain the text belongs. This enables certain ambiguous readings to be excluded. For example, avocat in example (80) cannot be ‘an avocado’; the context will probably serve to indicate that the referent could not be a fruit;
3. using interactive systems by means of which the computer consults the user in order to obtain pragmatic information capable of aiding the disambiguation process;
4. giving priority to statistically dominant structures. Statistical analysis of corpora could indicate that a given set expression is more likely to have one meaning than another;
5. preserving the ambiguity in the hope that it is also present in the target language, as, for example, in (81).
(79) prendre le mors aux dents
(80) se faire l’avocat du diable (to play the devil’s advocate)
(81) to take the bull by the horns (prendre le taureau par les cornes)

Although we are aware that not all ambiguities can be resolved, we have applied a set of rules which give reasonably good results. These rules involve:
a. applying restrictions to the free elements of frozen sequences; for example, in the sequence noun1 avoir noun2 à l’oeil, if noun2 is human, the meaning will be ‘to keep an eye on…’; if noun2 is an object, the meaning is ‘to get something for nothing’;
b. examining the type of prepositional group, for example, garder la main de la fillette (literal sense) vs garder la main au jeu (to play first);
c. specifying the domain of use: ouvrir l’oeil has a literal meaning in the medical field, but otherwise means ‘to be watchful’;
d. analysing the syntactic structure of a given sentence, since some transformations are not applicable to frozen sequences; for example, sa langue a été donnée au chat can only have a literal sense while donner sa langue au chat has two meanings (literal and metaphorical ‘to give in or up’).
In cases where disambiguation of frozen sequences is not possible, the best general strategy seems to be to give preference to the frozen interpretation. However, we are not totally satisfied with this method as it does not always yield good results.
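Rules of this kind can be pictured as a small decision procedure. The sketch below is an assumption-laden illustration rather than the system's rule base: the Occurrence record, its field names and the function interpret are hypothetical, and the semantic class of the free element, the text domain and the passive flag are taken as given rather than computed. It covers rules (a), (c) and (d) and falls back on the frozen reading when no rule applies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Occurrence:
    """A candidate set expression found in a text (hypothetical record)."""
    expression: str                          # canonical form of the candidate
    object_is_human: Optional[bool] = None   # class of the free element, if known
    domain: Optional[str] = None             # domain of the text, if known
    passive: bool = False                    # True if the clause is passivised


def interpret(occ):
    """Return a reading for the occurrence, applying rules (a), (c) and (d)."""
    # Rule (a): restriction on the free element noun2.
    if occ.expression == "avoir noun2 à l'oeil":
        if occ.object_is_human is True:
            return "to keep an eye on"
        if occ.object_is_human is False:
            return "to get something for nothing"
    # Rule (c): domain of use.
    if occ.expression == "ouvrir l'oeil":
        return "literal" if occ.domain == "medical" else "to be watchful"
    # Rule (d): a passivised sequence can only be literal.
    if occ.expression == "donner sa langue au chat" and occ.passive:
        return "literal"
    # Fallback: give preference to the frozen interpretation.
    return "frozen (default)"


print(interpret(Occurrence("avoir noun2 à l'oeil", object_is_human=True)))
print(interpret(Occurrence("ouvrir l'oeil", domain="medical")))
print(interpret(Occurrence("donner sa langue au chat", passive=True)))
print(interpret(Occurrence("donner sa langue au chat")))
```

The fallback branch reproduces the general strategy of preferring the frozen reading, with the limitation noted above that this does not always give good results.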

9. The MultiCoDiCT dictionary system

In this section we aim to show how set expressions are represented in the MultiCoDiCT dictionary system (Multilingual Collocation Dictionary Centre Tesnière) (Greenfield et al. 1999). As stated earlier, set expressions raise problems as regards their formal recognition, but they also present semantic problems. This facilitates neither their access nor their representation in dictionaries. A first step is to choose a canonical representation for set expressions and to organise them by meaning equivalences within and across languages, so that the set expressions can be subsequently accessed in different but prescribed manners (Greenfield 1998). The MultiCoDiCT dictionary system includes the following:
1. a specialised language for encoding set expressions (and, indeed, collocations in general) in the context of meaning equivalences in a given domain and across a given set of languages;






2. a kernel which is both language- and domain-independent and which provides access facilities to the set expressions in their encoded form;
3. domain-dependent dictionaries, where a dictionary is a structure comprising a specific set of set expressions for a given domain and across a given set of languages.

The specialised language includes a natural language-independent component allowing the definition of the meaning equivalence relations that can occur between set expressions within and across languages, and language-specific components where a given language’s typology is defined, this being used in the canonical descriptions. We have opted for a coding based on canonical forms, for both word forms and collocations. This decision has been taken in order to ensure maximum re-usability of the data for possible different ends (manual and automatic). The system as currently implemented is for manual access.

Dictionary headwords

Many specialised dictionaries, whether mono- or multilingual, are restricted to a special area of knowledge drawn, for example, from the sciences, technology or skills. This restriction allows them to include numerous words and collocations excluded from general dictionaries and to incorporate everyday words exclusively in their specialised meanings within a particular field. In the case of a manually accessed dictionary, the following questions arise: how are the dictionary’s headwords chosen? As idiomatic collocations and expressions contain more than a single word, under which word should they be included? Ideally, from the human user’s point of view, it should be possible to find the collocations and idiomatic expressions using any of the relevant constituent words as search items. If this is the case, the user does not have to try two or three possible headwords before finding the right one. However, as long as dictionaries are published in book form, space will restrict the duplication of information. The lexicographer thus has to choose the headword to access the collocation (Roberts 1996: 189). It could be argued that the expression should be located under the first important headword that it contains. But what is an important word for a nonspecialist in the area? Furthermore, what is the status of candidate headwords in synonymous expressions in the same or other languages represented in the dictionary? In view of these difficulties, we have adopted a multiple headword system. The direct headwords are the canonical forms of those terms present in

Computerised set expression dictionaries

the expression that have been chosen as headwords by the lexicographer, whilst indirect headwords are those in synonymous expressions in the same or another language (see examples below). The expression can only be accessed via these headwords, whether direct or indirect.

MultiCoDiCT dictionary examples

In the following sections we give examples of expressions drawn from two of the MultiCoDiCT dictionaries and the problems that we have had to solve. Both dictionaries are bilingual and reversible and the languages involved are French and Spanish. One dictionary, [Parts of the body], is concerned with the metaphorical usage of parts of the body (Morgadinho 1999); the other, [Tourism], focuses on collocations in the field of tourism (Chan Ng 1999).

Expressions with multiple headwords

In a MultiCoDiCT dictionary an expression can have several headwords. For example, in the bi-directional French-Spanish [Parts of the body] dictionary, the French expression

(82) donner les yeux de la tête pour quelque chose

appears under four different headwords. It appears under the direct headwords oeil and tête but also indirectly under dedo and mano, as it can be translated by two different expressions in Spanish:

(83) dar un dedo de la mano por algo
(84) dar una mano por una cosa
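The multiple headword system can be pictured as an index from headwords to expressions. The following is only a minimal sketch, not the MultiCoDiCT implementation: the record layout, the function names and the assignment of dedo and mano to the individual Spanish expressions are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical record layout (an assumption, not the MultiCoDiCT format):
# each expression lists the direct headwords chosen by the lexicographer
# and its equivalents, each with its own direct headwords.
ENTRIES = [
    {
        "lang": "fr",
        "expression": "donner les yeux de la tête pour quelque chose",
        "direct_headwords": ["oeil", "tête"],
        "equivalents": [
            {"lang": "es",
             "expression": "dar un dedo de la mano por algo",
             "direct_headwords": ["dedo", "mano"]},   # assumed assignment
            {"lang": "es",
             "expression": "dar una mano por una cosa",
             "direct_headwords": ["mano"]},           # assumed assignment
        ],
    },
]


def build_headword_index(entries):
    """Map each headword to a set of (expression, access type) pairs."""
    index = defaultdict(set)
    for entry in entries:
        # Direct access: headwords taken from the expression itself.
        for headword in entry["direct_headwords"]:
            index[headword].add((entry["expression"], "direct"))
        # Indirect access: headwords of the synonymous or translated expressions.
        for equivalent in entry["equivalents"]:
            for headword in equivalent["direct_headwords"]:
                index[headword].add((entry["expression"], "indirect"))
    return index


def lookup(index, headword):
    """Return the expressions reachable from a headword, sorted for display."""
    return sorted(index.get(headword, ()))


index = build_headword_index(ENTRIES)
for headword in sorted(index):
    print(headword, lookup(index, headword))
```

Under these assumptions the French expression is reachable from oeil and tête directly and from dedo and mano indirectly, which matches the four headwords mentioned above.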

On the other hand, one and the same headword can appear in several expressions. The following example is taken from the same dictionary, the headword being the French noun main.

Source language: French
Headword: main
Translations: Spanish: mano
Expressions and translations:

French | Spanish
avoir la main malheureuse (jeu\casser tout) | tener mala mano (jeu)
avoir la main malheureuse (jeu\casser tout) | tener las manos de trapo (casser tout)
avoir les mains nettes (fig) | tener las manos limpias
à pleines mains | a manos llenas
à portée de la main (fig) | a la mano
tenir quelqu’un entre ses mains | tener uno a otro en su mano (fig)
forcer la main | forzar, obligar
haut les mains ! | ¡ arriba las manos !
avoir sous la main | tener a mano
de main de maître | de mano maestra
être comme deux doigts de la main | ser uña y carne
se laver les mains de quelque chose (fig) | lavarse uno las manos en algo

This entry shows the format in which entries are presented.

Expressions which have several possible meaning equivalents

The translation of expressions is divided between those which have the same meaning in the source and target languages and those where there is a different meaning. The computerised presentation distinguishes between the two cases.

Synonymous equivalents

The following example illustrates equivalents in the French-Spanish [Parts of the body] dictionary:

French | Spanish
être coude à coude; être côte à côte | estar hombro a hombro

Under the headword coude in the French to Spanish direction, alongside the Spanish translation of the French expression containing coude, the dictionary also presents the synonymous French expression in order to draw attention to cases of synonymy in the source language. The same holds true in the case of the French headword côte. In some cases no fewer than three synonymous expressions are found:

Spanish | French
estar uno hasta los pelos | en avoir par-dessus la tête; avoir les oreilles rebattues; en avoir par-dessus les yeux

Note that synonymy can involve single words. The following example shows an equivalence between a semi-frozen French expression and two single words in Spanish.

French | Spanish
forcer la main | forzar, obligar

Polysemous equivalents

The French-Spanish [Parts of the body] dictionary contains the following example of polysemous equivalents:

French | Spanish
avoir la main malheureuse (jeu\tout casser) | tener mala mano (jeu)
avoir la main malheureuse (jeu\tout casser) | tener las manos de trapo (tout casser)

In order to indicate the difference in meaning between the two languages, the ambiguous source expression in French is repeated. Beside each instance of the expression the lexicographer has indicated the particularities in terms of sense. Each of these particularities is also placed beside each of the Spanish expressions to indicate which French sense the Spanish expression corresponds to. The particularities are written in French so as to reinforce the message that it is the French expression that is polysemous. The inverse direction is also handled in this dictionary:

Spanish | French
arrimar el hombro (propio\figurado) | donner un coup d’épaule (propio)
arrimar el hombro (propio\figurado) | travailler activement (figurado)
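One way to picture the sense labels is as a filter on the equivalents of a polysemous source expression. The sketch below is illustrative only: the row layout and the function names are assumptions, while the expressions and labels are those of the two tables above.

```python
# Each row: (source expression, target equivalent, sense labels attached
# to that equivalent). The tuple layout is an assumption for illustration.
ROWS = [
    ("avoir la main malheureuse", "tener mala mano", {"jeu"}),
    ("avoir la main malheureuse", "tener las manos de trapo", {"tout casser"}),
    ("arrimar el hombro", "donner un coup d'épaule", {"propio"}),
    ("arrimar el hombro", "travailler activement", {"figurado"}),
]


def senses_of(expression, rows=ROWS):
    """Collect every sense label recorded for a source expression."""
    labels = set()
    for source, _target, senses in rows:
        if source == expression:
            labels |= senses
    return labels


def equivalents_for(expression, sense, rows=ROWS):
    """Return the target equivalents that carry the requested sense label."""
    return [target for source, target, senses in rows
            if source == expression and sense in senses]


def is_polysemous(expression, rows=ROWS):
    """A source expression is treated as polysemous if it has several labels."""
    return len(senses_of(expression, rows)) > 1


print(senses_of("avoir la main malheureuse"))
print(equivalents_for("avoir la main malheureuse", "jeu"))
print(equivalents_for("arrimar el hombro", "figurado"))
```

Keeping the labels in the source language, as the dictionary does, means the same label set can be shown beside the repeated source expression and beside each equivalent.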






Language variants

Several of the MultiCoDiCT dictionaries take language variants into account, as in the following example from the [Tourism] dictionary which includes variants in Spanish:

French

Spanish

billet

billete (Espagne) boleto (Américanisme) tiquet (Espagne) billete de avión (Espagne) boleto de avión (Américanisme) tiquete (Américanisme)

billet d’avion billet de bus

10. Conclusion

In this paper, certain problems have been brought to light, particularly those concerning the collection of the data and its representation in computerised dictionaries for translation, as well as problems posed by the recognition of expressions in context and in their translation. We are of the opinion that although machines are useful in advancing and verifying the work of the linguist, there remains much core work which only the linguist is competent to carry out (conception, understanding and organisation), and such work is also essentially manual in nature. Even if it is considered that “hand-collected sets of citations cannot give robust information concerning relative frequencies” (Moon 1998: 47), we think that frequency alone is not sufficient. Indeed, the sequences that a human translator will have difficulty translating are likely to be sequences that he or she has never come across before. We are thus of a different opinion, especially in view of the fact that: ‘Par rapport à la macrostructure d’un dictionnaire bilingue général, celle d’un dictionnaire bilingue spécialisé est beaucoup plus réduite, mais elle contient des termes que les dictionnaires généraux n’ont pas’ (Marello 1996: 39).²

There is a need for specialised dictionaries in all sorts of fields, as general dictionaries are often silent on specific terms. Corpora have proved to be extremely useful in uncovering these expressions but they will always be limited. Exclusive reliance on them might therefore lead lexicographers to miss important expressions which the corpus used does not contain. We think, and this is the line we have followed ourselves, that corpus research, existing dictionaries and human intuition should be used concurrently if we are to succeed in building systems capable of recognising set expressions in context, as well as multilingual dictionaries for automatic and human translation.

Notes . The term ‘corpus’ denotes a set of homogeneous texts, i.e. texts covering the same field, written and used by the same type of people and under similar conditions. . As compared to the macrostructure of a general bilingual dictionary, the macrostructure of a specialised bilingual dictionary is not as large, but it contains terms which are absent from general dictionaries.

References

Cardey, S. and Greenfield, P. 1997. “Ambiguïté et traitement automatique des langues. Que peut faire l’ordinateur?”. In Actes du 16è Congrès International des Linguistes, Paris, 20–25 juillet 1997. Elsevier Sciences, on CD-ROM.
Chan Ng, R. 1999. “Prototype informatisé de dictionnaire du tourisme français-espagnol-français”. Mémoire de DEA, Centre Tesnière, Besançon, France.
Dauphin, E. 1997. “Etude de corpus : un préalable pour l’adaptation de systèmes de traduction automatique aux besoins des utilisateurs”. In TA-TAO : Recherches de pointe et applications immédiates, A. Clas et P. Bouillon (eds), 15–34. Montréal: AUPELF-UREF.
Greenfield, P. 1998. “L’espace de l’état et les invariants de l’état des dictionnaires terminologiques spécialisés de collocations multilingues”. In Actes de la 1ère Rencontre Linguistique Méditerranéenne, Le Figement Lexical, Tunis, les 17–18 et 19 septembre 1998, 271–283.
Greenfield, P., Cardey, S., Achèche, S., Chan Ng, R., Galliot, J., Gavieiro, E., Morgadinho, H., Petit, E. 1999. “Conception de systèmes de dictionnaires de collocations multilingues, le projet MultiCoDiCT”. Colloque international AUPELF Réseau Lexicologie, Terminologie, Traduction, Beirut, November 1999 (forthcoming).
Limame, D. 1998. “Vers une reconnaissance automatique des expressions figées, théorie et application, anglais, français, italien”. Mémoire de DEA, Centre Tesnière, Besançon, France.
Marello, C. 1996. “Les différents types de dictionnaires bilingues”. In Les dictionnaires bilingues, H. Bejoint and P. Thoiron (eds), 35–51. Louvain-la-Neuve: Duculot.
Mel’čuk, I. A. 1995. “Phrasemes in language and phraseology in linguistics”. In Idioms: Structural and psychological Perspectives, M. Everaert, E.-J. van der Linden, A. Schenk and R. Schreuder (eds), 167–232. Hillsdale, N.J.: Lawrence Erlbaum.
Moon, R. 1998. Fixed expressions and idioms in English, a corpus-based approach. Oxford: Clarendon Press.
Morgadinho, H. 1999. “Dictionnaire électronique français-espagnol. Expressions figées”. Mémoire de Maîtrise d’espagnol, mention Industries de la langue, Besançon, France.
Roberts, R.-P. 1996. “Le traitement des collocations et des expressions idiomatiques”. In Les dictionnaires bilingues, H. Bejoint and P. Thoiron (eds), 181–202. Louvain-la-Neuve: Duculot.
Thomas, I. 1998. “Analyse, reconnaissance et traduction des expressions figées: vers un traitement automatique”. Mémoire de DEA, Centre Tesnière, Besançon, France.

Making a workable glossary out of a specialised corpus
Term extraction and expert knowledge

Christine Chodkiewicz, Didier Bourigault and John Humbley

The aim of this paper is to show what remains to be done, especially in terms of lexicographical treatment by subject specialists, in order to turn a glossary obtained through computer-assisted term extraction into a tool that can be used by professional translators. It is argued that the use made of the term extractor gives a high-quality result, but that further treatment is still required, in which linguistic and specialist knowledge are inextricably linked. The example which serves as a demonstration is that of the elaboration of a human rights glossary intended, inter alia, for the translators of the European Court of Human Rights. The corpus used consisted of the complete texts of the European Convention on Human Rights, plus its eleven protocols and 36 court decisions. The term extractor Lexter first identified candidate terms in French, and then a legal expert manually discarded irrelevant sequences and paired the English-language equivalent in the aligned text. This yielded a bilingual glossary of equivalents, which was not considered immediately exploitable, mainly because of a high percentage of multiple correspondences in both languages. An analysis of these multiple equivalences indicates that many are due to purely linguistic phenomena, and can be easily dealt with, but many others require expert knowledge to unravel. The example of procédure/procedure + proceedings is given to illustrate this.

.

The Human Rights terminology project

Automatic or computer-assisted term extraction makes it possible to create dictionaries or terminology tools which can be tailor-made for a specific purpose or a specific type of user. In many fields dictionaries are only found for the most general level of specialised knowledge, and bilingual or multilingual dictionaries are uncommon in restricted domains where the national systems and terminology differ greatly, such as law. It was this situation that caused the European Court of Human Rights to regret the lack of an up-to-date dictionary to help not only its own translators working in the two official languages, English and French, but also those dealing with other languages, notably those of the countries of Eastern and Central Europe. As the Court produces all its official texts in English and French, both versions being considered equally legal and binding, it was possible to produce a corpus based on its own founding texts (the Convention and the protocols added to it) and a number of the decisions made by the Court (36 in the present corpus; at the time of writing over 850 decisions had been handed down). Use could thus be made of decades of painstaking searches for equivalents by translators, and inconsistencies of the past could be pinpointed and corrected. After the initial glossary was obtained in English and French, work was begun on adding Polish and Romanian equivalents, using the results of the term extraction from the French texts.

Term extraction

As the aim of the project was to produce a specific glossary for users with well-defined needs, the choice fell on a strategy of extracting terms from a corpus representing a sizeable proportion of the actual production of the Court itself, rather than of enriching existing dictionaries.

The exceptional nature of the corpus used merits some comment. The Convention, the protocols and the decisions are issued simultaneously in English and French, which enjoy equal official status. Moreover, the quality of the translations is such that it is well-nigh impossible to tell which is the original and which the translation. At the same time, the texts are so densely packed with legal terminology that it was considered more useful in the long run to build a specialised vocabulary, rather than simply use a translation memory, as has been done for parliamentary debates, which are much wider in scope. The comparable nature of the two sub-corpora enables the terminologist to assume that the terms they contain will be equally authoritative, and the objection sometimes made of recycling stale translations can thus be firmly rejected. This does not mean, however, that the extraction will not show inconsistencies; one such area, of which the translators themselves are aware, is that of names of institutions (Cour de cassation becomes, in English, Supreme Court, Court of ‘Cassation’ as well as Cour de cassation), and the results confirm this intuition. Standardisation of the Court’s terminology is a spin-off of the project.

The corpus (the Convention, protocols and the 36 decisions handed down in 1995) was provided by the Court itself in both English and French in ASCII files. The first task was to align the two texts, using the highly structured framework of the two texts (number of decision, sections of decisions, numbers of sections, numbers of paragraphs, identification of quotations). In addition, references to past cases, articles of law, proper names, etc. were deleted. The sentences of both corpora were then aligned, using strong punctuation marks. As may be expected, some discrepancies between English and French punctuation were noted, and so some human intervention was necessary to pair off sentences correctly. The two resulting corpora contained 12 131 sentences each, and around 300 000 words. The preparation of the programs (written in Flex under Linux) and the verification of the part-of-speech tagging took about twenty hours of work.

The next phase was to extract French candidate terms, using Lexter. This term extractor performs a morpho-syntactic analysis of a corpus, identifying boundaries between syntagms in order to produce a list of noun phrases which are likely to be terminological units (candidate terms). These noun phrases are arranged in series so that the terminologist can look up all those in which an element occurs, either as a head or an expansion. The hypertext unit gives access to the contexts of the corpus in which the candidate terms occur (Bourigault 1993, Bourigault et al. 1996). The terminologist first validates the terms (‘yes’, ‘no’, ‘?’), using Lexter’s terminology hypertext interface (see Figure 1).

Figure 1. The Lexter interface. Each candidate term extracted is shown in its context in the left-hand boxes. The terminologist pairs the terms with those in the corresponding English-language contexts in the right-hand boxes.

Then comes the matching phase. Up to the present, Lexter has only been used to extract French candidate terms, but the aligned English corpus made it possible to identify the equivalent English terms after minor modifications to the hypertext interface. The technology used can be described as unsophisticated, as no statistical matching has been attempted. However, it is claimed that this approach bears comparison with more sophisticated techniques which yield matches for the more frequent candidate terms. Nevertheless, research is under way to develop more sophisticated tools. At the time of writing, the glossary is roughly two-thirds finished in its first stage. Its state of development is summarised in Table 1.

Table 1. Number of candidate terms processed, discarded, kept or aligned by specialised terminologists

 | Extracted | Processed | Discarded | Kept | Aligned
Frequency = 1 | 12 193 | 6 375 | 2 720 (43 %) | 3 655 (57 %) | 1 183
Frequency > 1 | 4 283 | 3 185 | 1 058 (33 %) | 2 127 (66 %) | 2 127
Total | 16 476 | 9 560 | 3 778 (40 %) | 5 483 (60 %) | 3 310

This means that at the time of writing the terminologist had dealt with 9 560 terms out of the 16 476 French candidates which Lexter proposed (of which only 4 283 occurred more than once), decided to keep 5 483 of these, and aligned 3 310 with a corresponding English term. It should be noted that the terminologists kept more hapaxes as candidate terms (57%) than they discarded, casting doubt on the generally accepted view that important terms always occur frequently. Before the glossary can be submitted to the Court translators, however, a number of improvements have to be made, the most important being the question of multiple equivalents in both languages.
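The figures in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below simply re-adds the two frequency rows and compares them with the printed totals; the dictionary layout and function names are illustrative assumptions. With the figures as printed, every column adds up except "kept": the rows sum to 5 782, whereas the printed total is 5 483. The printed 60 % agrees with 5 782 out of 9 560, but the text alone does not show which figure is at fault.

```python
# Figures copied from Table 1 as printed; the layout is an assumption.
ROWS = {
    "frequency = 1": {"extracted": 12193, "processed": 6375,
                      "discarded": 2720, "kept": 3655, "aligned": 1183},
    "frequency > 1": {"extracted": 4283, "processed": 3185,
                      "discarded": 1058, "kept": 2127, "aligned": 2127},
}
PRINTED_TOTAL = {"extracted": 16476, "processed": 9560,
                 "discarded": 3778, "kept": 5483, "aligned": 3310}


def column_sums(rows):
    """Add up each column over the frequency rows."""
    columns = next(iter(rows.values())).keys()
    return {col: sum(row[col] for row in rows.values()) for col in columns}


def mismatches(rows, printed):
    """Return {column: (recomputed, printed)} wherever the two differ."""
    sums = column_sums(rows)
    return {col: (sums[col], printed[col])
            for col in sums if sums[col] != printed[col]}


def share(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


print(column_sums(ROWS))
print(mismatches(ROWS, PRINTED_TOTAL))
print(share(3655, 6375))   # hapaxes kept, as a share of hapaxes processed
```

The share of hapaxes kept (3 655 of 6 375 processed) is the 57 % on which the remark about hapaxes rests, and it is unaffected by the discrepancy in the total.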


Multiple equivalents: general problems

In many cases one candidate term in English corresponds to one candidate term in French. For example, friendly settlement is the English equivalent of règlement amiable in the seven different sentences in which it occurs. There remain, however, a large number of candidate terms which have more than one equivalent in the other language: of terms with a frequency of two or more, one in four have more than one equivalent; out of 981 multiword terms with a frequency of two, 168 have two equivalents. This was considered too high a proportion of multiple equivalents, forcing the translator to look up the contexts through the Lexter hypertext interface, and precluding a complete paper version. In order to reduce this proportion, attention was given firstly to systematising textual variants, which amounts to a further step of lemmatisation, and secondly to highlighting some of the differences between the two legal systems which created this multiplicity of equivalents.

Identification of multiword terms is one means of limiting the number of equivalents. The problem of deciding exactly what makes up a multiword term unit has been a bone of contention, and has long been discussed, especially in French-language terminology (Assal and Delavigne 1993, Rondeau 1979). The final decision is made by the subject field expert, who determines, not without difficulty, what is to be considered a significant unit. However, a certain amount of formal regularity can be used to put together phrases that belong together. For example, sous tutelle is in fact always found in the phrase placement sous tutelle or placer sous tutelle, and should be listed accordingly. Single-word terms are often more ambiguous than multiword terms. Thus, lawful corresponds to légal, légitime, licite, régulier, but lawful acts of war always corresponds to actes licites de guerre, and lawful restriction to restriction légitime.

There is in addition a link between formal regularity and subject domain. For example, régulier is found regularly (!) as an equivalent of lawful in texts dealing with the entry into, and presence of, foreigners in a state (e.g. entrer régulièrement [territoire], étranger résidant régulièrement sur le territoire), prompting the terminologist to deal first with such phrasal units rather than with isolated terms.

One aspect of textuality which has to be regularised for lexicographical purposes is anaphora. While it is perfectly normal for épuisement to be used without further qualification if the form épuisement des voies de recours internes has already been used, it is obvious that the lexicon will not have separate entries for the short and the complete form. It should be noted that short forms are also found when the complete form of the term is implied though not actually stated in a text: thus detention is used for detention preceding trial when a time limit is mentioned (The judge extended the detention by four months…). It should not be assumed from this that the short forms are mechanically derived from the complete forms but, knowing the expertise of the target users, it is sufficient for the latter forms to be listed in the glossary.

Another linguistic aspect of lemmatisation concerns syntactic variants which occur in the texts. Thus, liberté provisoire has as English equivalents not only provisional release, but also released on a provisional basis, released provisionally, provisionally released, (a person) provisionally at liberty. This confirms the translator’s rule of thumb that abstract nouns often translate best into English verb forms (comparution usually appears in the English text as to appear (for trial)). Grouping of different morpho-syntactic forms of a term under an appropriate headword, usually a noun form, significantly reduces the number of multiple equivalents, though some forms attested locally may be considered too atypical to merit keeping in a glossary (such as: such an application cannot be a remedy whose exhaustion is required, which will probably not be indicated as a possible form of non épuisement des voies de recours internes).
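The point that multiword terms are less ambiguous than their single-word heads can be illustrated with a longest-match lookup: when a glossary lists both lawful and lawful acts of war, preferring the longer unit returns a single equivalent. This is a minimal sketch under stated assumptions, not the procedure used in the project; the glossary dictionary, the function names and the whitespace tokenisation are all illustrative.

```python
# Toy glossary built from the lawful examples discussed above.
GLOSSARY = {
    "lawful": ["légal", "légitime", "licite", "régulier"],
    "lawful acts of war": ["actes licites de guerre"],
    "lawful restriction": ["restriction légitime"],
}


def longest_match(tokens, start, glossary=GLOSSARY):
    """Return (term, equivalents) for the longest entry starting at `start`."""
    for end in range(len(tokens), start, -1):
        candidate = " ".join(tokens[start:end])
        if candidate in glossary:
            return candidate, glossary[candidate]
    return None


def annotate(sentence, glossary=GLOSSARY):
    """List glossary hits in a sentence, preferring multiword units."""
    tokens = sentence.lower().split()   # naive tokenisation, no punctuation handling
    hits, i = [], 0
    while i < len(tokens):
        match = longest_match(tokens, i, glossary)
        if match:
            term, equivalents = match
            hits.append((term, equivalents))
            i += len(term.split())      # skip past the matched unit
        else:
            i += 1
    return hits


print(annotate("these were lawful acts of war"))
print(annotate("a lawful measure"))
```

In the first sentence only the multiword entry is returned, with its single equivalent; in the second, the isolated adjective falls back to the four-way ambiguity that the text describes.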

Synonymy

Two terms or expressions are said to be synonymous when they have exactly the same meaning and are wholly interchangeable. Dual signifiants and identical signifié are the two criteria of synonymy. Synonymy disturbs the lawyer in so far as he seeks a high degree of precision in the terminology he uses. He resents ‘duplicates’ which appear to him a ‘linguistic waste’ or, at least, a factor of potential misunderstanding.

In legal texts there are very few true synonyms, but many partial or quasi-synonyms. These often result from the use of a generic term for a specific (e.g. in French: convention/contrat; contrat is one of many types of conventions), but there may be other reasons. However, in context this partial or quasi-synonymy seldom generates problems. In fact, it is often of no practical consequence since the text remains understandable despite a lack of accuracy of the terms used. Nevertheless, from a theoretical point of view and in order to prepare a thoroughly reliable glossary, it is important to circumscribe the specific meaning of each term.

Surprising as it may seem, comparing terms in two languages, French and English in this instance, actually facilitates the task of the linguist who has to deal with problems of synonymy. Using specialised and, to a lesser degree, general dictionaries of the two languages suggests interconnections and distinctions which might otherwise never have been made.

An extreme case: procedure

It soon became apparent when investigating those terms which have two or more equivalents in the other language that it is with reference to procedure (procédure) that ‘approximative synonymy’, to use Cornu’s terminology (Cornu 1990: 178), can be most usefully illustrated. In French or in English the terms procédure/procedure, when related to the term proceedings, are particularly problematical. Proceedings has no fewer than twelve equivalents in the French subcorpus, and procédure has six in the English. To complicate matters further, many equivalents correspond to terms other than proceedings/procédure. The purpose of the following illustration is not to undertake a thorough semantic analysis of the terms that we have singled out in both languages but, more modestly, to try to show how such terms relate to one another.

The first step was to consult established dictionaries. In English, the word procedure designates ‘the mode or form of conducting judicial proceedings’ (as distinguished from those branches of the law which define rights or prescribe penalties), as defined by a general language dictionary (Oxford 1994), or ‘the formal manner in which legal proceedings are brought’, as a legal dictionary puts it (Oxford 1990). The term proceedings is thus used to circumscribe the meaning of procedure. However, the word proceedings (which is either too obvious or too obscure to be defined in most legal dictionaries) is itself defined by means of the term procedure: proceedings is defined as ‘the course of procedure in judicial action or in a suit in litigation; legal action’ (Webster 1966) or ‘the instituting or carrying on of an action at law or process; any act done by authority of a court of law; any step taken in a cause by either party’ (Oxford 1994). We shall see below that the words cause and action are themselves defined using the term proceedings.

In French, procédure is defined as “cette branche de la science du droit ayant pour objet de déterminer les règles d’organisation judiciaire, de compétence, d’instruction des procès et d’exécution des décisions” (Cornu 1987). No single reference can be found to any word which might correspond, in our corpora, to proceedings, as will be seen below.

The question is then: would it be legitimate to present procedure as a synonym of proceedings? To answer this question we looked at the various terms used in French by the Court’s translators to render proceedings. The twelve different equivalents already mentioned are: procédure (in a large number of cases), but also procès, litige, affaire, cause, action, instance and (less frequently) poursuites, débats, recours, audience and contentieux. We shall first examine those cases in which procédure corresponds to procedure and other equivalents, before presenting the twelve equivalents for proceedings in the French subcorpus.

Procédure corresponding to procedure and other equivalents

Clearly, procédure and procedure overlap to some degree and are thus at least partly equivalent. Thus, the Convention specifies that La Cour établit son règlement et fixe sa procédure, which translates into English as: The Court shall draw up its own rules and determine its own procedure. Garanties fondamentales de procédure is expressed in English as fundamental guarantees of procedure and the expression garanties procédurales is invariably translated as procedural safeguards. Criminal procedural law or penal procedure render la procédure pénale. Procedural provisions or procedural rules are said to be dispositions ou règles procédurales. The Continental codes are designated as Code of Criminal Procedure or Code of Civil Procedure. Procedure in the Civil Courts is translated as procédure devant les tribunaux civils. In both languages, “procedural” is generally used in opposition to “substantive”: thus, les questions soulevées par ce texte étaient (…) de nature procédurale et non matérielle is translated by the questions raised (…) were of a procedural nature not of a substantive nature.

But procédure corresponds to words or expressions other than procedure (or proceedings, see below). Thus, procédure is sometimes translated as litigation. Procédure orale corresponds to hearing. Procédure judiciaire is sometimes rendered as judicial process. Sans procédure adéquate is translated as without due process. Pendant la procédure corresponds to pending trial. Sans autre forme de procédure is rendered quite simply as without further formality. Procédure de jugement is sometimes simply translated as trial. Procédure d’instruction corresponds to judicial investigation.

There are sentences in which the word procédure does not appear in the French text but procedure is used in English to convey a French expression: Nul ne peut être privé de sa liberté, sauf dans les cas suivants et selon les voies légales corresponds to No one shall be deprived of his liberty save in the following cases in accordance with a procedure prescribed by law.
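The contrast drawn in this subsection, between renderings of procédure that use the cognate procedure/procedural and those that do not, can be made explicit by sorting a handful of the pairs quoted above. The sketch is purely illustrative: the dictionary of renderings is a small hand-copied sample, and the prefix test used to detect the cognate is an assumption, not a method described by the authors.

```python
# French phrase -> English renderings quoted in the text above (a sample).
RENDERINGS = {
    "garanties procédurales": ["procedural safeguards"],
    "procédure pénale": ["criminal procedural law", "penal procedure"],
    "procédure orale": ["hearing"],
    "procédure judiciaire": ["judicial process"],
    "sans procédure adéquate": ["without due process"],
    "pendant la procédure": ["pending trial"],
    "procédure de jugement": ["trial"],
    "procédure d'instruction": ["judicial investigation"],
}


def uses_cognate(english):
    """True if the rendering contains procedure, procedural, etc."""
    return any(word.startswith("procedur") for word in english.lower().split())


def split_by_cognate(renderings=RENDERINGS):
    """Separate renderings that use the cognate from those that do not."""
    cognate, other = {}, {}
    for french, english_list in renderings.items():
        for english in english_list:
            bucket = cognate if uses_cognate(english) else other
            bucket.setdefault(french, []).append(english)
    return cognate, other


cognate, other = split_by_cognate()
print("cognate renderings:", cognate)
print("other renderings:", other)
```

Even on this small sample most phrases fall into the second group, which is why the equivalents are better listed phrase by phrase than under the bare headword procédure.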


Procédure corresponding to proceedings

Procédure very often corresponds to proceedings when an action or process (cf. Oxford 1994) before a specific institution is involved. This correspondence is thus in strict codistribution with names of courts or institutions:

procédure devant les organes de la Convention – proceedings before the Convention institutions
procédure de la Convention – Convention proceedings
procédure devant la Cour de cassation – proceedings before the Supreme Court

Proceedings is also used to circumscribe the notion of procédure within a limited time-span and hence in codistribution with expressions of time:

début de la procédure – beginning of the proceedings
à tout moment (à tout stade) de la procédure – at any stage of the proceedings
suspension de la procédure – stay of the proceedings
conduite (poursuite) de la procédure – conduct of the proceedings
la procédure est toujours pendante – proceedings are still pending

But proceedings is also used in multiword candidate terms corresponding in French to hyponyms of procédure:

procédure pénale – criminal proceedings
procédure interne – national proceedings/domestic proceedings
procédure de cassation – “cassation” proceedings
procédure de jugement – trial proceedings/court proceedings
procédure d’appel – appeal proceedings
procédure de révision – rehearing proceedings
procédure de contrôle judiciaire – judicial review proceedings
procédure en chambre du conseil – review chamber proceedings
procédure non-contentieuse – non-contentious proceedings
procédure en diffamation – libel proceedings
procédure de règlement amiable – friendly settlement proceedings
procédure préliminaire – preliminary proceedings
procédure au principal – principal proceedings

The terminologist notes that these multiword terms have only one equivalent in the corpus and thus pose no problem of multiple equivalence when taken together. Proceedings also marks “the steps taken in a cause” (Oxford 1994):

engager une procédure – to take/bring/institute proceedings
suspendre une procédure – to stay/adjourn proceedings
rouvrir la procédure – to reopen the proceedings

Here, the support verb is an indicator of the equivalence to be sought. In some cases procedure and proceedings are used without distinction for procédure. Thus, procédure d’extradition has two equivalents in the corpus: extradition procedure or extradition proceedings. Similarly, frais de procédure is either procedural cost or costs of the proceedings. Vice de procédure can be expressed either by procedural defect or procedural deficiency but also by defect in the proceedings or irregularity in the proceedings. The hypothesis is that the English equivalents are true synonyms.

Proceedings and its equivalents other than procédure

1. Procès Procès is an equivalent of proceedings in the following: procès pénal – criminal proceedings parties au procès – parties to the proceedings

Procès is, however, the most common equivalent of trial: procès contradictoire – adversarial trial

But trial itself corresponds to many words besides procès: commit the accused for trial – renvoyer l’accusé en jugement criminal trial – audience pénale / to appear for trial – comparaître à l’audience in the course of the trial – au cours des débats pending trial – pendant la procédure

Trial is sometimes used in a multiword term where, again, procès is not to be found: trial judge thus designates what is called in French the juge de première instance/premier juge/juge du fait/juge du fond (typical of French law); similarly, the judge in charge of preparing the case for trial is called juge de la mise en état (unknown in English law). To ensure appearance for trial corresponds to assurer la représentation en justice; pre-trial detention or detention pending trial to détention provisoire; to evade trial to se soustraire à l’action en justice; trial proceedings to procédure de jugement. Conversely, procès corresponds to many words other than trial: réouverture du procès – reopening of the case


droit à un procès – right to a hearing
entamer un procès – to bring an action
procès civil – civil process
droit à un procès équitable has three equivalents: fair hearing/due process/right to a fair trial

2. Affaire

Affaire corresponds to proceedings:
l’état de l’affaire – the state of the proceedings
affaires pénales – criminal proceedings

But affaire is most often the equivalent of case:
la compétence de la Cour s’étend à toutes les affaires concernant l’interprétation et l’application de la présente Convention – the jurisdiction of the Court shall extend to all cases concerning the interpretation and application of the present Convention
renvoi de l’affaire – adjournment of the case
fond de l’affaire – merits of the case
affaires de diffamation – libel/defamation cases

Affaire also corresponds to matter:
fond de l’affaire (see above) – merits of the matter
règlement amiable de l’affaire – friendly settlement of the matter
toutes les affaires concernant l’interprétation et l’application de la présente Convention – all matters concerning the interpretation and application of the present Convention

Finally, affaire can also correspond to many other words or expressions in which various metonymies can be noted:
toutes les affaires dont l’examen n’est pas terminé – any application the examination of which has not been completed
il sollicita un examen rapide de son affaire – he asked for his application to be dealt with speedily
affaire d’une telle envergure – trial on such a scale
une affaire où l’on ignorait les faits – a situation where the true facts were unknown
le juge fut chargé de l’instruction de l’ensemble de l’affaire – the judge (…) was put in charge of the overall investigation
décision de joindre les affaires – decision of joinder






3. Litige

Litige can correspond to proceedings:
litige de nature pénale – proceedings that were of a criminal nature
somme en litige – sum at stake in the proceedings

Litige is also an equivalent of case:
le litige auquel (les Hautes Parties Contractantes) sont parties – the case to which (the High Contracting Parties) are parties
l’examen du litige – examination of the case
fond du litige – merits of the case
l’objet du litige – the scope/the compass of the case

But litige also often corresponds to dispute:
somme en litige – sum in dispute
litige concernant des droits et obligations de caractère civil – dispute concerning civil rights and obligations
litige de nature privée – disputes between private parties
l’issue du litige – the outcome of the dispute
l’objet du litige – the subject-matter of the dispute

Finally, litige has various other equivalents:
le litige a été résolu – the matter has been resolved
contexte du litige – background to litigation
l’enjeu du litige pour l’intéressé – what is at stake for the applicant in the litigation

4. Cause

Cause is an infrequent equivalent of proceedings:
être appelé dans la cause – to join the proceedings
être maintenu dans la cause – to remain party to the proceedings

Instead, cause is more often an equivalent of case:
examiner la présente cause – to examine the instant case
faits/circonstances de la cause – facts/circumstances of the case
il n’a pas pu faire entendre sa cause devant un tribunal – he was not able to bring his case before a tribunal
bien-fondé/fond de la cause – merits of the case

Strictly speaking, bien-fondé de la cause and fond de la cause are not synonyms in French, so this is perhaps the only error of translation detected in our corpus.


However, cause has several other equivalents in the French subcorpus:
les juridictions répressives ne sont pas intervenues dans cette cause – the criminal courts had not been involved in the matter
toute personne a droit à ce que sa cause soit entendue équitablement, publiquement, dans un délai raisonnable – everyone is entitled to a fair and public hearing within a reasonable time
la Cour ajourna l’examen de la cause – the Court adjourned the hearings

It is interesting to note that cause (in our corpus) is never given as an equivalent of cause (though the word exists in legal English) or of trial (though the words procès and cause are generally regarded as synonyms in French, as we shall see below).

5. Action

Action is another occasional equivalent of proceedings:
action pénale/action publique – criminal proceedings
action civile – civil proceedings
action en confiscation – condemnation proceedings/proceedings for forfeiture
engager/intenter/entamer une action – to take/to bring proceedings
action en diffamation – proceedings for/of libel (see also below)

The most common English equivalent of French action is simply action:
intenter/engager/porter une action (contre quelqu’un) – to bring/to commence an action (against someone)
examen de son action – trial of his action
rayer une action du rôle – to strike out an action
action en responsabilité de l’Etat – action to establish the State’s liability
action en dommages-intérêts – action for damages
action en diffamation – libel action/action of/for defamation

However, French action can correspond to other words in English:
X s’est soustrait à l’action – X evaded trial
action en responsabilité civile – civil litigation
action publique – prosecution

Conversely, English action corresponds to other terms in French:
persons against whom action is being taken with a view to deportation or extradition – personne contre laquelle une procédure d’expulsion ou d’extradition est en cours






the Court can decide on the whole merits of the action – la Cour a plénitude de juridiction
to dismiss the applicant’s action – débouter
remedial action – mesures de redressement
the provisions do not permit the action taken – les dispositions n’autorisent pas les mesures prises

6. Instance

Instance corresponds to proceedings in the following collocations:
instance judiciaire – legal/judicial court proceedings
instance pénale – criminal proceedings
instance de renvoi en jugement – committal proceedings
suspendre l’instance – to adjourn the proceedings
l’engagement d’une instance – the institution of proceedings
conduite de l’instance judiciaire – conduct of judicial proceedings

But instance also corresponds to procedure (instance devant la Cour suprême – procedure before the Supreme Court) or to action.

7. Poursuites

Poursuites also corresponds to proceedings:
poursuites judiciaires – legal proceedings
poursuites pénales – criminal proceedings
poursuites en cours – pending proceedings
engager des poursuites contre … – to bring proceedings against …
clôturer les poursuites – to close the proceedings
(être poursuivi is translated by to be subject to legal proceedings)

8. Recours

Recours is an infrequent equivalent of proceedings:
introduire un recours – to take/bring/institute proceedings
droit de recours devant un tribunal – right to take proceedings before a court

But recours more usually corresponds to other terms or expressions:
introduire un recours (see above) – to file/make/lodge an application or to file a complaint
droit à un recours effectif – right to have an effective remedy
épuisement des voies de recours (internes) – exhaustion of domestic remedies


droit de recours individuel – right of individual recourse/right of individual petition
recours gracieux – non-contentious claim
recours au tribunal – appeal to the tribunal
le recours doit être rejeté – the appeal must be dismissed
exercer un droit de recours – to lodge an appeal
juridiction de recours – appellate court/court of appeal
débouter (quelqu’un) de son recours/rejeter un recours – to dismiss (someone’s) application
recours en annulation – application for judicial review/plea of nullity
recours de droit administratif – administrative law action
recours administratif – administrative objection/non-contentious application/administrative appeal

9. Audience

Audience is sometimes an equivalent of proceedings:
publicité de l’audience – publicity of the proceedings

Audience also corresponds to trial:
audience pénale – criminal trial
comparaître à l’audience – to appear for trial

Audience is also often translated as hearing (or trial hearing):
audience d’appel – appeal hearing
audience de réexamen – revision hearing
audience publique – public hearing

But audience also corresponds to other terms:
renvoyer l’audience – to adjourn the case

10. Débats

Débats is an equivalent of proceedings:
débats judiciaires – judicial proceedings

11. Contentieux

Contentieux is an equivalent of proceedings:
le contentieux des droits de l’homme – human rights proceedings

But contentieux is also translated as litigation and dispute.






Discussion

The term proceedings is highly polysemous and has no single equivalent in French (see Figure 2).

[Figure 2: diagram linking the French terms procédure, procès, affaire, litige, cause, action, instance, poursuite, recours, audience, débat and contentieux to the English terms proceedings, procedure, trial, case, hearing, action, process, matter, dispute, litigation and others.]

Figure 2. Cross-linguistic correspondences of French procédure and English proceedings. Frequent equivalents are indicated by an unbroken line, less frequent by a broken line.

If one term is to be highlighted as the most frequent equivalent of proceedings, it is certainly procédure but also to a lesser extent, poursuites, which in our corpus has no equivalent other than proceedings, and instance, which has two equivalents, procedure and action. The least frequent equivalent of proceedings is undoubtedly recours, as this term has itself many different equivalents in English. The equivalents of contentieux and débats are not very significant either, but it is not possible to draw a definite conclusion concerning these two words since they appear infrequently in our corpus. Three terms can be held to have exact equivalents in the other language:


procédure/procedure, action/action and cause/cause. But these terms are not systematically used to translate each other and all three French words often correspond to proceedings in English.

As far as cause is concerned, it is difficult to draw a conclusion as this term does not appear in English in our corpus; but the instances in which it is used are probably fewer in English than in French (which does not necessarily mean that its meaning is more restricted). Cause is defined by the legal dictionary (Oxford 1990) as ‘a court action’. In French it is often used in the expression en tout état de cause, meaning ‘à toute hauteur de la procédure; à tout moment de l’instance (par opposition au seuil de l’instance)’ (Cornu 1987).

However, French action and English action seem to have roughly the same meaning: the term is defined in English as ‘a proceeding in which a party pursues a legal right in a civil court’ (Oxford 1990) or ‘the taking of legal steps to establish a claim or obtain a judicial remedy’ (Oxford 1994) and in French as ‘voie de droit ouverte pour la protection d’un droit ou d’un intérêt légitime’. This is corroborated by the fact that French action has been shown to correspond most usually to English action. But we have seen that they cannot be regarded as strictly equivalent, given other possible equivalents.

As regards procédure/procedure, it is quite clear that these two terms cannot be held to be always equivalent. In many instances, as seen above, proceedings is certainly a more appropriate equivalent of procédure than procedure, even though in some cases proceedings and procedure are used interchangeably.

In French, procès, cause, litige, affaire are generally held to be synonymous (cf. Cornu 1990, Robert 1990). One could easily think that the same is true in English, as these four terms have at least two common denominators: proceedings (again), but also (in many instances) case (which is defined (Oxford 1990) as ‘1. A court action. 2. A legal dispute. 3. The arguments, collectively, put forward by either side in a court action’ and in general language (Oxford 1994) as ‘a cause or suit brought into court for decision’). But these four terms, as seen above, also correspond to other terms. The most common (not to say the most appropriate) equivalent of procès is probably trial but trial is clearly inappropriate as a translation of cause or litige. In fact, even litige, though often regarded as synonymous with procès, affaire or cause has been said to designate “more exactly” (Cornu 1987) a “différend, désaccord, conflit dès le moment où il éclate (…) pouvant faire l’objet d’une de solution indépendamment de tout recours à la justice étatique” (Cornu 1987, also Cornu 1990: 154). Litigation and dispute are probably, therefore, better equivalents for litige than proceedings, case or matter.

Except in relation to the term procédure and perhaps instance (defined by






Cornu 1987 as “la procédure engagée devant une juridiction; phase d’un procès; plus précisément, la suite des actes et délais de cette procédure à partir de la demande introductive d’instance jusqu’au jugement ou autres modes d’extinction de l’instance … On parle en ce sens du déroulement ou de la poursuite de l’instance”) and poursuites (defined as “exercice d’une voie de droit pour contraindre une personne à exécuter ses obligations ou à se soumettre aux ordres de la loi ou de l’autorité publique”), our conclusion is therefore that the term proceedings should be used sparingly (e.g. to translate audience, for which the most appropriate translation is surely hearing), though it cannot be said that the Court’s translators ever used it wrongly.

Again, the high linguistic quality of our corpora, both French and English, should be emphasised. It is impossible to tell whether the texts we have analysed were initially written in French or in English and then translated into the other language. It is precisely because the quality of both texts is so high that we can afford to be so meticulous.

.

Conclusion

Procédure/proceedings represent a complex and hitherto largely uncharted area of legal terminology, which cannot be fully resolved simply by using semiautomatic term extraction. However, automatic processing presents several major advances. Firstly, it enables the terminologist to view the total number of occurrences of the candidate terms and their many equivalents. Secondly, the immediate access given to all the texts in which the term candidates occur facilitates disambiguation, precision of equivalents and harmonisation of terms chosen by the translators. Thirdly, Lexter gives priority to multiword term candidates, where the number of multiple equivalences is very much lower than in the case of single-word terms, and these multiword equivalences can go straight into the proposed glossary. There remains a significant number of equivalences which can only be unravelled with specialist knowledge, especially single-word terms with high frequency. For these, the terminologist is obliged to print out the relevant portion of the corpus and engage in traditional legal analysis. Future developments of Lexter aim at extracting verb group candidates, which, as has been shown in the examples given, significantly reduce the problem of multiple equivalence, and, in the medium term, automatic extraction in bilingual texts. As to the glossary in hand, three stages are projected: a term base using Lexter’s interface, which will be submitted to the Court’s translators


for validation, once the number of multiple equivalents has been significantly reduced; a paper glossary of the essential terms of human rights; finally a database containing both equivalences and translation memory. Only this sort of tool will enable translators to respond to the ever-increasing pressure of work resulting from the growing number of decisions to be translated.
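The first projected stage, a term base submitted for validation once multiple equivalents have been reduced, can be illustrated with a minimal sketch. It is not the Lexter interface: the function names `build_term_base` and `split_for_validation` are hypothetical, and the data are a handful of French–English pairs taken from the examples quoted earlier in this chapter. The sketch only shows how terms with a single equivalent can go straight into a glossary while terms with several equivalents are set aside for the translators.

```python
from collections import defaultdict

# A few French -> English pairs quoted in this chapter (illustrative sample only).
PAIRS = [
    ("procédure pénale", "criminal proceedings"),
    ("procédure d'appel", "appeal proceedings"),
    ("procédure d'extradition", "extradition procedure"),
    ("procédure d'extradition", "extradition proceedings"),
    ("frais de procédure", "procedural cost"),
    ("frais de procédure", "costs of the proceedings"),
]


def build_term_base(pairs):
    """Group the English equivalents observed for each French term."""
    base = defaultdict(set)
    for french, english in pairs:
        base[french].add(english)
    return base


def split_for_validation(base):
    """Separate single-equivalent terms from terms that need validation."""
    ready = {term: next(iter(eqs)) for term, eqs in base.items() if len(eqs) == 1}
    pending = {term: sorted(eqs) for term, eqs in base.items() if len(eqs) > 1}
    return ready, pending


term_base = build_term_base(PAIRS)
ready, pending = split_for_validation(term_base)
```

Here `ready` holds the multiword terms with one equivalent, and `pending` holds procédure d’extradition and frais de procédure, each with two competing equivalents.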

References

A Concise Dictionary of Law. 1990. 2nd ed. Oxford: Oxford University Press.
Assal, A. and Delavigne, V. 1993. “Le découpage des unités terminologiques complexes : limites des critères linguistiques”. In Actes de la quatrième journée ERLA-GLAT, 175–193. Brest.
Bourigault, D. 1993. “Analyse syntaxique locale pour le repérage de termes complexes dans un texte”. In Revue t.a.l., 34/2. 105–117.
Bourigault, D., Gonzales-Muilez, I., Gros, C. 1996. “LEXTER, a natural language tool for terminology extraction”. In Actes du 7ème congrès international EURALEX, 771–779. Göteborg.
Cornu, G. (ed.) 1987. Vocabulaire juridique. Paris: Presses universitaires de France.
Cornu, G. 1990. Linguistique juridique. Paris: Montchrestien.
Dictionnaire des synonymes. 1990. Paris: Le Robert.
Oxford English Dictionary (CD-ROM). 1994. 2nd ed. Oxford: Oxford University Press.
Rondeau, G. (ed.) 1979. Table ronde sur les problèmes du découpage du terme. Montréal: Office de la langue française.
Third New International Dictionary. 1966. Springfield, Mass.: Merriam.



P V

Translation and Parallel Concordancing

Translation alignment and lexical correspondences*
A methodological reflection

Olivier Kraif

.

Introduction

In the last few years much attention has been paid to the results of translation alignment: Isabelle (1992) proposed using bilingual parallel texts, or bi-texts, i.e. segmented and aligned translation corpora, as a Corporate Memory for translators. He argued that “existing translations contain more solutions to more translation problems than any other existing resource”. Such a translation database, organised as a bilingual concordancer (as in the TransSearch Project, cf. Simard et al. 1993) would store all the previously found solutions for a given translation problem and allow the translator to recover them easily.

Other alignment-based tools, such as automatic verification, have a natural place in a translator’s workstation. Error detection can be implemented when translations are provided in aligned format. In the TransCheck system, Macklovitch (1995a) shows how common errors such as “deceptive cognates, calques, illicit borrowings” can be automatically detected in a bi-text framework. Other features, such as exhaustiveness (i.e. omission errors; cf. Isabelle et al. 1993) or terminological consistency (Macklovitch 1995b), can be tested. It is also possible to verify automatically, in a reliable manner, the proper translation of specific phrasal constructions such as dates or numerical expressions. The transduction grammar formalism seems to work very well in this kind of restricted translation task.

In the more ambitious field of Example-Based Machine Translation (Sato & Nagao 1990, Brown et al. 1990), aligned corpora form the cornerstone of the




system. The linguistic knowledge is stored implicitly in the recorded examples of translation. The success of the system depends on the huge quantity of aligned sentences that constitute mutual translations.

Another interesting application is the automatic extraction of bilingual lexicons. Many studies (Dunning 1993, Dagan et al. 1993, Gaussier & Langé 1995) have shown how to use statistical filters to pair lexical units that have a similar distribution in each part of the bi-text. As a large proportion of these similar units are translation equivalents, they can be useful in establishing bilingual (or multilingual) glossaries for empirical observation.

In order to align parallel texts, several techniques have been implemented which have yielded satisfactory results. Even when they take advantage of lexical information, most of the systems work at sentence level (Brown et al. 1991, Simard et al. 1992, Kay & Röscheisen 1993, Gale & Church 1991). Indeed, it is a well-known fact that the hypothesis of parallelism does not hold below sentence level, and ‘lexical alignment’ appears to be a far more complex problem. However, some systems have yielded encouraging results in producing lexical alignment (Brown et al. 1993).

Given the huge variety of algorithms and techniques devoted to alignment, we are now entering an evaluation phase, and some large-scale projects such as Arcade (Langlais et al. 1998) set out to give a coherent framework for the definition and evaluation of the aligning task. In the Arcade project two different tasks have been tested: sentence alignment and lexical spotting (i.e. finding lexical correspondences for a given list of test words). The evaluation task consists of two steps: given a test corpus, we first have to determine a gold standard, i.e. a manually constructed alignment that is considered to be exact. Then we have to implement a metric in order to make a quantitative comparison of any other alignment with the standard. In both the sentence track and the word track, two kinds of difficulty resulted from the definition of a standard alignment: segmentation discrepancy and correspondence problems. Detailed criteria were given to human aligners and annotators in order to cope with inconsistencies, but the lexical spotting task, compared with sentence alignment, rapidly proved problematic.

After giving a precise definition of what bilingual alignment involves, we will go on to describe various problems associated with alignment at word level. We will then show the inconsistency of such a concept, and draw a line between the extraction of lexical correspondences and the alignment task from a general point of view. We believe that only a proper definition of the concepts of alignment and correspondence that takes account of the actual practice of


translation can produce reliable criteria for the creation of a gold standard that can be used for the purpose of evaluation.
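The two evaluation steps described above, a gold standard followed by a metric, can be sketched as follows. The text does not specify which metric is used, so this example assumes the common choice of precision, recall and F-measure over aligned pairs; the function name `precision_recall_f` and the toy alignments are hypothetical. Each pairing is written as a tuple of source unit numbers and a tuple of target unit numbers.

```python
def precision_recall_f(candidate, gold):
    """Compare a candidate alignment with a gold standard.

    Both arguments are collections of pairings; a pairing is a tuple
    (source_units, target_units), each side being a tuple of unit numbers.
    """
    cand, ref = set(candidate), set(gold)
    correct = len(cand & ref)  # pairings found in both alignments
    precision = correct / len(cand) if cand else 0.0
    recall = correct / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy data: the gold standard has three pairings, the candidate proposes two.
gold = {((1,), (1, 2)), ((), (3,)), ((2,), (4,))}
candidate = {((1,), (1, 2)), ((2,), (3, 4))}
p, r, f = precision_recall_f(candidate, gold)
```

Counting only exact matches of whole pairings is a strict convention; a pairing that is partly right, such as the second candidate pairing here, earns no credit, which is one reason why the choice of criteria for the gold standard matters.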

The concept of alignment

The standard concept of alignment can be summed up as follows: Aligning consists in finding correspondences, in bilingual parallel corpora, between textual segments that are translation equivalents.

Translation equivalence is above all a global property of the translation of a text. It is not a linguistic property, but a pragmatic one: the translation arrived at is a result of interpretative choices that are made in a specific situational context. As Sager (1994:186) says:

While the cognitive and linguistic equivalents are mainly established at the level of the sentence or in smaller units during the translation phase, the pragmatic equivalents have to be selected first in the preparation phase and at the level of the text type before being also realised in smaller units at appropriate points in the document.

These extra-linguistic parameters are linked to many factors at the pragmatic level: text typology, text intention, receptors, dynamic equivalence (cf. Nida & Taber 1982), cultural adaptation, conceptual background and so on.1 Translation equivalence is a relationship between messages entrenched in two given contexts and backgrounds: the source and the target context. This global equivalence does not imply equivalence at the level of linguistic units. In the following example, the original advertisement for golf items is not translated at word level (Henry 1991:15):

(1) To make your greens come true
Pour faire putt de velours

The French version includes a pun, as in English: it refers to the expression faire patte de velours, which means ‘to sheathe its claws’ (of a cat). Putt is a particular stroke in golf, and the translation plays on the paronymy between ‘putt’ and patte. This example illustrates the fact that the equivalence holds at a global and an abstract level. The two versions ‘work’ in the same way, although using different linguistic means. In this case the relevant features are the pun and the theme. Depending on the function of the message, some features are more relevant than others, and have to be maintained in translation whatever the cost






(while other features are lost): these may be the conceptual content or rhetorical figures, stylistic devices, formal features such as alliteration, and so on.

Therefore, to segment and to establish correspondence between segments, we have to make a specific assumption about the translation. We might call it translational compositionality. This concept is developed by Isabelle (1992):

For translation to be possible at all, translational equivalence must be compositional in some sense; that is, the translation of a text must be a function of the translation of its parts, down to the level of some finite number of primitive equivalences (say between words and phrase).

I do not completely agree with Isabelle when he presents compositionality as a condition of the possibility of translation. Compositionality may be a characteristic of the process of translation, but remains a relative notion as far as the product of translation is concerned. In fact, the translational compositionality of a bilingual corpus determines exactly the level at which it is possible to align it.

In more formal terms, the compositionality assumption leads to the definition of a specific corpus structure: the bi-text. Generally speaking, a bi-text is a quadruple <T1, T2, Fs, A> where T1 and T2 are mutual translations (the direction of the translation is irrelevant), Fs is a segmentation function which divides the texts into a set of smaller units (e.g. paragraphs, sentences, phrases), and A is the alignment of these units, i.e. a subset of the product Fs(T1) × Fs(T2). This general definition can lead to different kinds of bi-text: Fs can produce either a complete or a fragmentary partition of the texts, or a hierarchical partition where different levels are simultaneously involved (paragraph, sentence, words).

Moreover, we can focus on particular alignments with several restrictions. For instance, Isabelle & Simard (1996) define a monotone alignment in terms of three constraints:
– no crossing correspondences; i.e. the segments must appear in the same order in both texts.
– no partially overlapping segments: two different segments that appear in different pairings cannot share the same portion of text. For instance, the phrase Machine Aided Translation would not yield two segments: Machine Aided and Aided Translation.
– no discontinuous correspondences; i.e. there are no discontinuous segments, such as Machine […] Translation in the previous example.

Most existing alignment systems use this kind of monotone alignment. Indeed, in the current state of the art, the possibility of automatic alignment is strongly


conditioned by the parallelism of the corpora. As Gaussier & Langé (1995: 71) have defined it, parallelism consists in the conjunction of two criteria: one-to-one matching and monotony:
– One-to-one matching means that each segment of one text has a correspondence in the other text. In fact, this condition is never completely realised, because translation induces additions and omissions. Therefore, this criterion is more or less satisfied, depending on the particularities of the translation.
– Monotony, as previously defined, is also a relative property. In general, however, inversion of the sequence of segments is rare.
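The three constraints that define a monotone alignment can be checked mechanically once an alignment is written down. The following is a minimal sketch under stated assumptions: each segment is represented as a tuple of unit indices (for instance word positions), an alignment is a list of (source segment, target segment) pairings with an empty tuple for an omission or addition, and the function names are hypothetical.

```python
def is_contiguous(segment):
    """A segment is continuous if its unit indices are consecutive."""
    return all(b - a == 1 for a, b in zip(segment, segment[1:]))


def has_partial_overlap(segments):
    """True if two different segments on one side share a unit."""
    seen = set()
    for segment in set(segments):  # identical segments count once
        if seen & set(segment):
            return True
        seen |= set(segment)
    return False


def is_monotone(alignment):
    """Check the three constraints on a list of (source, target) pairings."""
    sources = [s for s, _ in alignment if s]
    targets = [t for _, t in alignment if t]
    # No discontinuous segments.
    if not all(is_contiguous(seg) for seg in sources + targets):
        return False
    # No partially overlapping segments on either side.
    if has_partial_overlap(sources) or has_partial_overlap(targets):
        return False
    # No crossing: sorted by source position, target positions never go back.
    linked = sorted((s[0], t[0]) for s, t in alignment if s and t)
    return all(t1 <= t2 for (_, t1), (_, t2) in zip(linked, linked[1:]))


monotone = [((0, 1), (0,)), ((2,), (1, 2))]
crossing = [((0,), (1,)), ((1,), (0,))]
overlapping = [((0, 1), (0,)), ((1, 2), (1,))]  # Machine Aided / Aided Translation
discontinuous = [((0, 2), (0,))]                # Machine [...] Translation
```

With these definitions `is_monotone(monotone)` holds, while the other three alignments each violate one constraint.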

Alignment techniques

As Simard & Plamondon (1996) point out, alignment techniques can produce two different kinds of result:
– alignment involving a parallel segmentation of both texts into smaller logical units (such as paragraphs, sentences or even phrases), in such a way that the nth segment of source text and the nth segment of target text are mutual translations.
– a bi-text map involving a set of points (x,y), called anchor points, where x and y refer to precise locations in the source and the target text that denote portions of text corresponding to one another.

The latter case is very general, because it does not presuppose a previous segmentation. But a bi-text map is not a very useful form of bi-text, as it does not directly indicate correspondences between textual units as in bilingual concordances: it only establishes connections between text areas. We consider the bi-text map as a preliminary and intermediate step for the achievement of a full alignment. In the following discussion, I will give examples of sentence alignment, but the problems are the same for every kind of segmentation compatible with compositionality.

What is alignment?

Bilingual alignment is not a trivial problem, as translation does not preserve unit boundaries. In practice, a sentence can be translated by two or more sentences, or can simply be omitted. At every stage the alignment algorithm has to determine the appropriate clustering of units in order to respect the






translation equivalence property. We can illustrate this by the example in the following table, extracted from an English translation of Jules Verne’s novel De la terre à la lune (which is a part of the BAF corpus, developed at the CITI of Montreal, which has been used as a benchmark in the Arcade Project; cf. Langlais et al. 1998 and Simard 1998:489).

Table 1. Example of sentence alignment

English text | French text
P1 “Here we are at the 10th of August,” exclaimed J.T. Maston one morning, “only four months to the 1st of December. | P’1 ! “ Nous voilà au 10 août, dit un matin J.-T. Maston P’2 Quatre mois à peine nous séparent du premier décembre !
∅ | P’3 Enlever le moule intérieur, calibrer l’âme de la pièce, charger la Columbiad, tout cela est à faire !
P2 We shall never be ready in time!” | P’4 Nous ne serons pas prêts ! ”

We can write this alignment as follows: T=P1P2

T’=P’1P’2P’3P’4

A= {[P1;P’1P’2],[∅;P’3 ],[P2;P’4]}

It is also possible to represent these clusters as a sequence of n-p transitions, called an alignment path: A = (1–2), (0–1), (1–1) Figure 1 gives a two-dimensional representation of this path, with T and T’ on the X and Y axes. The alignment is represented by the surfaces involved in the segment pairings: T’ P’4 P’3 P’2 P’1 0T 00P 1

00P 2

Figure 1. Two-dimensional representation of an alignment
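The passage from clusters to an alignment path, and from a path to the points plotted in a two-dimensional chart, can be sketched in a few lines. The function names `to_path` and `to_points` are hypothetical; the data are the Verne example given above, with plain apostrophes in the sentence labels.

```python
# The alignment A written as a list of (source cluster, target cluster) pairs.
alignment = [
    (["P1"], ["P'1", "P'2"]),
    ([], ["P'3"]),            # P'3 has no counterpart in the English text
    (["P2"], ["P'4"]),
]


def to_path(alignment):
    """Turn each cluster pair into an n-p transition."""
    return [(len(source), len(target)) for source, target in alignment]


def to_points(path):
    """Cumulate transitions into the corner points of the path on the T/T' grid."""
    x, y = 0, 0
    points = [(x, y)]
    for n, p in path:
        x, y = x + n, y + p
        points.append((x, y))
    return points


path = to_path(alignment)    # [(1, 2), (0, 1), (1, 1)]
points = to_points(path)     # [(0, 0), (1, 2), (1, 3), (2, 4)]
```

The last point, (2, 4), is the top right corner of the grid: two English sentences against four French ones.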


If we draw a chart representing the complete translation of Verne’s novel, we get a general view of the path, as shown in Figure 2.

[Figure 2: plot titled “Corpus Verne – Sentence alignment”, with the French version on the X axis (0 to 3500) and the English version on the Y axis (0 to 3000).]

Figure 2. A complete alignment path

The more parallel the translation is, the closer the path is to the diagonal of the square.

General framework

Several methods have been developed to calculate this kind of path automatically. They are usually implemented within a probabilistic framework: by estimating the probability of all possible paths, the algorithm can find the best-scoring one, i.e. the one with the highest probability. Given a function p(A) which estimates the probability of alignment A, the algorithm has to find:

$A^{*} = \arg\max_{A} p(A)$

Naturally, this maximisation task raises serious computational problems: the number of possible paths is in O(n!) (where n represents the number of sentences). A Viterbi algorithm, which simultaneously considers all the subpaths that share the same beginning, can reduce the computation to O(n²), but this is still considerable. A simpler method of reducing the search space is to consider only the paths that are not too far from the diagonal. This is a direct implication of the parallelism hypothesis: if omissions, additions and inversions are marginal, the path cannot diverge too much from the diagonal.
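The search described above can be sketched as a dynamic programme over a grid of (source, target) positions, optionally restricted to a band around the diagonal. This is a minimal illustration, not the implementation of any of the systems cited: the set of transitions, the `log_p` scoring callback and the `band` parameter are all assumptions of the sketch.

```python
# Transition types considered, as (source count, target count); an assumption
# of this sketch.
TRANSITIONS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2), (2, 2)]


def best_path(n_src, n_tgt, log_p, band=None):
    """Return the highest-scoring alignment path as a list of (n, p) steps.

    log_p(i, j, n, p) is a caller-supplied (hypothetical) function giving the
    log-probability of pairing source segments i..i+n-1 with target segments
    j..j+p-1. If band is given, grid points lying more than `band` target
    segments away from the diagonal are skipped.
    """
    ratio = n_tgt / n_src if n_src else 1.0
    # (i, j) -> (best score so far, (previous point, step taken)); origin has no predecessor
    best = {(0, 0): (0.0, None)}
    for i in range(n_src + 1):
        for j in range(n_tgt + 1):
            if (i, j) == (0, 0):
                continue
            if band is not None and abs(j - i * ratio) > band + 1e-9:
                continue
            candidates = []
            for n, p in TRANSITIONS:
                prev = (i - n, j - p)
                if prev in best:
                    score = best[prev][0] + log_p(prev[0], prev[1], n, p)
                    candidates.append((score, (prev, (n, p))))
            if candidates:
                best[(i, j)] = max(candidates, key=lambda c: c[0])
    if (n_src, n_tgt) not in best:
        return None  # no path reaches the end inside the band
    # Trace the best path back from the final point to the origin.
    path, point = [], (n_src, n_tgt)
    while best[point][1] is not None:
        point, step = best[point][1]
        path.append(step)
    return path[::-1]


# Toy scoring function: 1-1 pairings are free, every other transition costs 1.
def toy_score(i, j, n, p):
    return 0.0 if (n, p) == (1, 1) else -1.0


print(best_path(3, 3, toy_score, band=1))  # [(1, 1), (1, 1), (1, 1)]
```

Because each grid point keeps only its best-scoring predecessor, the number of points visited is at most quadratic in the number of sentences, and the band reduces it further to roughly the text length times the band width.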





Olivier Kraif

Prealignment

Another way of reducing the search space is the preliminary extraction of a rough but reliable bi-text map, based on superficial clues. Chapter separators, titles, headers and sometimes paragraph markers can yield very useful information for producing a quick and acceptable pre-alignment (Gale & Church 1991). Other superficial clues are the character strings that remain invariant in translation, such as proper nouns or numbers (Gaussier & Langé 1995). If one had to align a text and its translation manually in a completely unknown language, one would use exactly the same superficial, straightforward information. I have shown elsewhere (Kraif 1999) that such strings can be used to align 20% to 50% of the different texts in the BAF corpus (with an error rate of less than 1%).
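As a rough illustration of how invariant strings can yield anchor points, the sketch below pairs sentences that share a number found in exactly one sentence of each text. It is a deliberately naive heuristic written for this purpose, not the filtering procedure of Kraif (1999); the regular expression, the function name and the sample sentences are all invented.

```python
import re
from collections import Counter

NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def anchor_points(src_sentences, tgt_sentences):
    """Pair sentences sharing a number that occurs in only one sentence of each text."""

    def index_unique(sentences):
        counts, where = Counter(), {}
        for pos, sentence in enumerate(sentences):
            for token in set(NUMBER.findall(sentence)):
                counts[token] += 1
                where[token] = pos
        # Keep only the numbers found in a single sentence.
        return {tok: where[tok] for tok, c in counts.items() if c == 1}

    src, tgt = index_unique(src_sentences), index_unique(tgt_sentences)
    points = sorted({(src[tok], tgt[tok]) for tok in src.keys() & tgt.keys()})
    # Greedy filter: keep a point only if it advances on both axes.
    kept = []
    for x, y in points:
        if not kept or (x > kept[-1][0] and y > kept[-1][1]):
            kept.append((x, y))
    return kept


# Invented sample sentences, for illustration only.
english = ["Chapter 3 begins.", "It was 1865.", "Nothing here.", "They had 40 days."]
french = ["Le chapitre 3 commence.", "C'était en 1865.", "Rien ici.",
          "Encore rien.", "Ils avaient 40 jours."]
print(anchor_points(english, french))  # [(0, 0), (1, 1), (3, 4)]
```

The anchor points split the texts into smaller zones, so that a later full alignment only has to search between consecutive anchors.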

Alignment clues Once the search space has been reduced, we can evaluate the probability of each possible sentence cluster in order to calculate the global probabilities of each path. Different kinds of information are available for this estimation.

Segment length

Gale & Church (1991) and Brown et al. (1991) simultaneously developed a length-based method which yielded good results on the Canadian Hansard Corpus (see note 2). The principle of this method is very simple: a long segment will probably be translated by a long segment in the target language, and a short segment by a short one. Indeed, Gale & Church show empirically that the ratio of the source and target lengths approximately follows a normal distribution. Note that segment lengths can be computed in two ways: as the number of characters or as the number of words in the segment. According to Gale & Church, the length in characters seems to be a little more reliable in the case of translations between English and French (the variance of the ratio is slightly smaller). Using the mean and the variance of this ratio as parameters specific to the language pair involved, they compute the probability of a cluster as a combination of two factors: the probability of the length ratio and the probability of the transition. The transition probabilities were determined empirically on the Gale & Church corpus, considering only the six most frequent types of transition, viz.:

– one sentence to one sentence: p(1–1) = 0.89
– one sentence to zero sentences, or the reverse: p(1–0 or 0–1) = 0.0099
– two sentences to one sentence, or the reverse: p(2–1 or 1–2) = 0.089
– two sentences to two sentences: p(2–2) = 0.011

All the other alignment clues are based on the lexical content of the segments. They rely on a very straightforward heuristic: word pairings can lead to segment pairings. If two segments are translation equivalents, they will probably include more lexical units that are translation equivalents than any two independent segments would. To take the lexical information into account, one just needs to know which units are potential equivalents. This linguistic knowledge can be extracted from various sources, including bilingual dictionaries and bilingual corpora.
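To return briefly to the length-based clue, the combination of a length-ratio probability and a transition probability can be sketched as follows. The transition values are those quoted above, applied here to each member of a symmetric pair. The mean ratio of 1 and the variance of 6.8 per character are values often quoted for character-based English-French alignment, but they are treated here as assumed placeholders to be estimated from data. Normalising by the mean of the two lengths is a variant chosen to avoid division by zero when one side is empty.

```python
import math

# Transition priors; each value is applied to both members of a symmetric
# category (an assumption of this sketch).
PRIOR = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
         (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}

# Assumed language-pair parameters: mean target/source character ratio and
# variance per character. Placeholders, to be estimated on a real corpus.
MEAN_RATIO = 1.0
VARIANCE = 6.8


def length_probability(len_src, len_tgt, c=MEAN_RATIO, s2=VARIANCE):
    """Two-tailed probability of the observed length mismatch under a normal model."""
    if len_src == 0 and len_tgt == 0:
        return 1.0
    mean = (len_src + len_tgt / c) / 2
    delta = (len_tgt - len_src * c) / math.sqrt(mean * s2)
    # erfc(|d| / sqrt(2)) equals 2 * (1 - Phi(|d|)) for a standard normal Phi.
    return math.erfc(abs(delta) / math.sqrt(2))


def cluster_cost(src_segments, tgt_segments):
    """Negative log-probability of a cluster; lower means more plausible."""
    n, p = len(src_segments), len(tgt_segments)
    len_src = sum(len(s) for s in src_segments)
    len_tgt = sum(len(t) for t in tgt_segments)
    prob = PRIOR[(n, p)] * length_probability(len_src, len_tgt)
    return -math.log(max(prob, 1e-300))  # floor avoids log(0)


# A 1-1 pairing of similar lengths costs less than one of very different lengths.
print(cluster_cost(["a" * 100], ["b" * 95]) < cluster_cost(["a" * 100], ["b" * 20]))  # True
```

Summing such costs along a path gives the negative log-probability of the whole alignment, which is the quantity a path search would minimise.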

Bilingual dictionaries

To be usable for this purpose, dictionaries have to be available in electronic format. Moreover, in technical fields, it is not always easy to find a dictionary that is consistent with the corpus concerned.

Bilingual corpora

It is also possible to extract a list of lexical equivalents directly from a bilingual corpus. Indeed, translation equivalents usually have very similar distributions in both texts. These distributions can be converted into a mathematical form and then compared quantitatively. In the K-vec method, developed by Fung & Church (1994), both texts are divided into K equal segments. Then, for each word (here the words are treated as lexical units), it is possible to compute a vector representing its occurrences in the segments: the ith co-ordinate is 1 if the word appears in the ith segment, and 0 otherwise. Thus, when two words have 1 for the same co-ordinate, one can say that they co-occur. This model of co-occurrence (cf. Melamed 1998) makes it possible to calculate the similarity of two distributions by several measures based on probabilities and information theory. In two texts divided into N segments, for two words W1 and W2 occurring in N1 and N2 segments of their respective texts, and co-occurring in N12 segments, one can easily compute their mutual information:

$I = \log \dfrac{N_{12}/N}{(N_1/N) \cdot (N_2/N)}$

If N1 and N2 are not too small (>3), then beyond a certain threshold of mutual information (I>2), it is highly improbable that the N12 co-occurrences are due to chance: one can assume that the two words are linked by a special contrastive relation, which may be translational equivalence. For rarer events (N1 or N2 ≤ 3), other measures, such as the likelihood ratio (Dunning 1993) or the t-score (Fung & Church 1994), are more suitable.

The problem with the K-vec method is that the segments are big (because the system has no knowledge of the real sentence alignment), so the co-occurrence model is very imprecise. The finer the alignment, the more exact the word pairing obtained. As segment pairing and word pairing depend on each other, some systems work in an iterative framework (Kay & Röscheisen 1993, Debili & Sammouda 1992). From a rough prealignment of the corpus they extract a list of word correspondences. From these correspondences they then compute a finer alignment. From this new alignment they extract a new and more complete set of word pairings, and so on, until the alignment has stabilised.
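The K-vec co-occurrence model and the mutual information score can be sketched in a few lines. The function names are invented for the illustration, and the base-2 logarithm is an assumption, since the base is not specified in the discussion above.

```python
import math


def kvec(tokens, word, k):
    """Binary K-vec: 1 in position i if `word` occurs in the i-th of k segments."""
    size = math.ceil(len(tokens) / k)
    return [int(word in tokens[i * size:(i + 1) * size]) for i in range(k)]


def mutual_information(v1, v2):
    """I = log2((N12/N) / ((N1/N) * (N2/N))) for two binary vectors of length N."""
    n = len(v1)
    n1, n2 = sum(v1), sum(v2)
    n12 = sum(a & b for a, b in zip(v1, v2))
    if n12 == 0:
        return float("-inf")  # the two words never co-occur
    return math.log2((n12 / n) / ((n1 / n) * (n2 / n)))


# Two words that each occur in 4 of 32 segments, always in the same segments.
v_src = [1, 0, 0, 0, 0, 0, 0, 0] * 4
v_tgt = [1, 0, 0, 0, 0, 0, 0, 0] * 4
print(mutual_information(v_src, v_tgt))  # 3.0
```

In an iterative framework of the kind described above, the vectors would be recomputed over smaller segments at each pass, which sharpens the co-occurrence counts.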

Formal resemblance

Another way of determining lexical equivalence is to focus on cognate words which share common etymological roots, such as the French word correspondance and the English word correspondence. Simard et al. (1992) define cognates as word pairs which share the same first four characters (4-grams), also including invariant strings such as proper nouns and numbers. Simard et al. show empirically that cognateness is strongly correlated with translation equivalence. On the basis of a probabilistic model, they estimate the probability of a segment cluster given its cognateness. This model, combined with the length-based model, significantly improved on the results achieved by Gale & Church. In previous work, we showed that a special filtering of cognate words can give a very precise and complete prealignment: in the case of the BAF corpus, we obtained 80% of the full alignment, with a very low error rate (about 0.5%). Of course, the exploitation of formal similarities depends on the languages involved. In the case of related languages such as English and French, cognateness is important. In the case of technical texts we can expect to observe cognates even between unrelated languages, because technical and scientific terms usually share common Graeco-Latin roots.
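The four-character criterion can be illustrated by a small sketch. It is a simplified reading of the definition given above, not the exact procedure of Simard et al. (1992): the handling of numbers (exact identity) and the function names are assumptions.

```python
def is_cognate(w1, w2, prefix=4):
    """Simplified cognate test: same first `prefix` characters, or identical numbers."""
    a, b = w1.lower(), w2.lower()
    if any(ch.isdigit() for ch in a + b):
        return a == b  # tokens containing digits must match exactly
    if len(a) < prefix or len(b) < prefix:
        return False  # words shorter than the prefix are ignored
    return a[:prefix] == b[:prefix]


def cognate_pairs(src_tokens, tgt_tokens):
    """List every (source, target) token pair that passes the cognate test."""
    return [(s, t) for s in src_tokens for t in tgt_tokens if is_cognate(s, t)]


src = ["the", "correspondence", "of", "Maston", "in", "1865"]
tgt = ["la", "correspondance", "de", "Maston", "en", "1865"]
print(cognate_pairs(src, tgt))
# [('correspondence', 'correspondance'), ('Maston', 'Maston'), ('1865', '1865')]
```

The number of such pairs found in two candidate segments is the kind of evidence a probabilistic model can then combine with segment length.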

The concept of lexical correspondence

Usually, lexical correspondences are treated as a particular case of alignment. In the Arcade project, for instance, lexical spotting is seen as a simpler subproblem of full alignment. Brown et al. (1990) give the following example of what can be described as word alignment:

(2) The poor don’t have any money
Les pauvres sont démunis
A = {(The ; Les) (poor ; pauvres) (don’t have any money ; sont démunis)}

Even if it is generally admitted that the condition of quasi-monotony does not hold in this case, the supposed one-to-one matching seems to justify the concept of word alignment. Let us examine the problems that are involved here.

Segmentation discrepancy

From a monolingual point of view, a lexical unit is defined in terms of syntactic and semantic autonomy. A compound expression can be characterised by the conjunction of several criteria:

– a certain degree of semantic non-compositionality;
– a more or less syntactically frozen structure;
– a certain recurrence.

We will not discuss the complexity of this problem. The definition of a lexical unit is a difficult problem in linguistics, and no consensus has been reached so far in the linguistic community. In any case, it appears that the units emerging from lexical alignment have no lexical consistency: they depend only on the structural homology between the related segments. For instance, another translation of the previous sentence results in different units:

(3) The poor don’t have any money
Les pauvres n’ont pas d’argent
A = {(The ; Les) (poor ; pauvres) (don’t have ; n’ont pas) (any ; d’) (money ; argent)}

Lexical alignment yields non-lexical compounds, but it can also break up genuine lexical units. For example, we can align the English, French and Italian expressions in different ways:

(4) To be the very devil
Avoir le diable au corps
Avere il diavolo in corpo
French / Italian: A = {(Avoir ; Avere) (le ; il) (diable ; diavolo) (au ; in) (corps ; corpo)}
English / French: A = {(To be the very devil ; Avoir le diable au corps)}






In this case we have word-for-word correspondence inside the lexical unit across Italian and French. The problem is: should the lexical alignment be allowed to break up lexical compounds, when it is possible?

Semantic discrepancies

Another problem is semantic discrepancy, which is common between a text and its translation. The following example is extracted from a European Parliament report (see note 3).

(5) the marking of banknotes for the benefit of the blind and partially sighted
l’émission de billets de banque identifiables par les aveugles et par les personnes à vision réduite
[literally: ‘the issue of banknotes identifiable by the blind and partially sighted persons’]

The phenomenon of semantic discrepancy is frequently found in the practice of translating. This can be explained by the importance of the extra-linguistic level. Translation, as Pergnier notes (1993: 23), is not only an operation between two different languages; it is first a transformation between messages, involving the whole pragmatic and conceptual background (see note 4). As Pergnier (1993: 75) says, “the equivalence at both levels, between two utterances and between the signs that they include, does not exist before the translation, but is a consequence of it” [my translation]. Thus the contrastive level, i.e. the possible equivalence between signs of different systems, is secondary: it is a result of translation as an act of communication, as shown in Figure 3.

[Figure 3: diagram relating Text 1 (source language) and Text 2 (target language) at two levels: the linguistic level (mediated contrastive relation) and the pragmatic and extra-linguistic level (translational equivalence)]

Figure 3. The level of translation equivalence


As a result, lexical alignment based on semantic criteria is very often unclear. In these two sentences

(6) the various policies for access to employment for disabled people
les différentes politiques mises en œuvre pour permettre l’accès des personnes handicapées à l’emploi
[literally: ‘the various policies implemented to allow disabled people to access a job’]

divergent solutions are possible for the following phrases:

A = {(for ; mises en œuvre pour permettre)}

or else, if we take omissions into account:

A = {(∅ ; mises en œuvre) (for ; pour) (∅ ; permettre)}

These semantic discrepancies, combined with segmentation difficulties, create very complex configurations in lexical alignment. Consider the following case:

(7) The assessment of the official cause of death is a piece of information vital to these registers.
Pour la bonne tenue de ces registres, l’évaluation des cas de mortalité constatés par les autorités apporte des informations importantes.
[literally: ‘For the good keeping of these registers, the evaluation of causes of death noted by the authorities gives important information’]

In these sentences we observe correspondences between discontinuous units:

A = {(vital ; importantes […] pour la bonne tenue de ces registres)}

There are thus two possible alignments of the following phrases:

A = {(cause of death ; cas de mortalité) (official ; constatés par les autorités)}

or

A = {(official cause of death ; cas de mortalité constatés par les autorités)}

Since semantic discrepancy and segmentation inconsistency are not discrete phenomena, but vary along a continuum of intensity, it is almost impossible to determine reliable criteria for settling this kind of alignment.

Recently great attention has been given to automatically extracted bilingual glossaries. Indeed, as we have seen before, probabilistic models make it possible to extract lexical correspondences by comparing the distribution of lexical items in a parallel corpus. Large-scale evaluations, as in the Arcade project, have been designed to test these methods and to guide the construction of a gold standard, established on the basis of a test corpus, in order to benchmark the different systems. In order to cope with the problems inherent in the concept of lexical alignment and to delineate more clearly the task of automatic lexical pairing, we propose a redefinition of the concept of lexical correspondence.

Lexical correspondences

We agree with Debili (1997:200) that lexical alignment is “neither one-to-one, nor sequential, nor compact. Correspondences are fuzzy and contextual.” He therefore proposes to distinguish between “lexical correspondence”, where the mutual translation can be validated by a bilingual dictionary, and “contextual correspondence” (1997:203), i.e. translation that depends on a specific context. But we do not subscribe to this point of view. Attestation in a dictionary is a somewhat arbitrary criterion, and it does not reflect the inherent continuity of the phenomena. We prefer to distinguish two different kinds of task: alignment and the determination of correspondences. Indeed, lexical correspondence can be defined in a very restricted sense:

A lexical correspondence is a relation of denotational (conceptual, extra-linguistic) equivalence between two lexical units in the context of two segments that are translation equivalents.

This definition raises the following issues:

– lexical units are linguistically defined, in a monolingual context. By adopting a broad definition of lexical units, including compounds, phraseology and even terms, it is possible to avoid the issue of segmentation inconsistency. If the problem is shifted to a monolingual point of view, its resolution appears to be far more feasible.
– we focus on the contextual sense of the lexical unit (referring to the opposition between “signe type” and “signe occurrence” made by Rastier 1991:96).
– monotony and one-to-one matching are no longer presumed, in accordance with empirical observations.

We feel that lexical alignment is a nebulous notion which inherits most of the misleading assumptions of the first generation of MT systems. For instance, in this case:

(8) the marking of banknotes for the benefit of the blind and partially sighted
l’émission de billets de banque identifiables par les aveugles et par les personnes à vision réduite.

we can draw the following correspondences:

C = {(banknotes ; billets de banque) (blind ; aveugles) (partially sighted ; personnes à vision réduite)}

The rest of each sentence is just a normal translation residue, due to the divergences between the two versions. These divergences may have a linguistic cause (e.g. morphosyntactic or lexical differences) or a non-linguistic one (e.g. conceptual inferences).

Maximal resolution alignment

This kind of lexical correspondence differs from sub-sentence alignment. We define a special kind of alignment that is very often confused with lexical correspondence:

A maximal resolution alignment is a matching of the smallest possible segments in accordance with the principle of translational compositionality.

This kind of alignment does respect the criteria of parallelism, except for monotony below the sentence level. In such an alignment, the syntactic characterisation of the segments is not determined: a segment can be a word, a phrase, a whole sentence, or even a paragraph. This depends on whether the translation is literal or not: if the translation of a sentence cannot be decomposed, the sentence has to be considered as a whole. Translation spotting, as defined in the Arcade project, appears to be a kind of maximal alignment, and yet it is fragmentary: it focuses on segments that contain some specific lexical units. For instance, looking for the correspondence of the French word apporter, it yields the alignment between the boldfaced segments:

(9) A meeting held in Brussels […] went a long way towards meeting the concerns expressed by the Honourable Member.
Une réunion, qui s’est tenue à Bruxelles […] a permis d’accentuer l’effort pour apporter des éléments concrets de réponse aux préoccupations exprimées par l’honorable parlementaire.

The notions of translational compositionality and maximality capture very neatly the criteria of translation spotting. In discussions about the appropriateness of aligning peas with pois in the phrases green peas and petits pois, the noncompositionality of this translation pair gives a very clear solution: petits pois and green peas cannot be decomposed.






The characteristics of lexical correspondence and maximal alignment are summed up in Table 2.

Table 2. Characteristics of Lexical Correspondence and Maximal Alignment

 | Lexical Correspondence | Maximal Alignment
Segmentation criterion | Monolingual, lexical unit level | Segmentation depends on structural homology between texts. It is based on both translational compositionality and maximality: the segments cannot be decomposed further.
Formal characteristics | Usually one-to-one relations between some lexical units, and the rest is residual. Many-to-many relations are also possible. | Quasi-bijection, quasi-monotony below sentence level.
Syntactic nature of the segments | Lexical unit: words, compounds, set phrases, terms. | No syntactic consistency: word, phrase, sentence, paragraph.
Pairing criterion | Denotational identity (in the occurrence context). | Translation equivalence

To illustrate these two concepts, we give another example:

(10) Confidential secret service information on applicants for European civil service posts
Récolte de données à caractère personnel par les services secrets d’un État membre sur les candidats aux concours organisés par les institutions européennes

The maximal alignment could be as follows:

A = {(Confidential ; à caractère personnel) (secret service ; par les services secrets) (∅ ; d’un État membre) (information ; Récolte de données) (on ; sur) (applicants ; les candidats) (for European civil service posts ; aux concours organisés par les institutions européennes)}

And we can extract the following lexical correspondences:

C = {(confidential ; personnel) (secret service ; services secrets) (information ; données) (on ; sur) (applicant ; candidat) (European ; européennes)}

Conclusion

These reflections aim at defining and clarifying the key concepts of alignment and correspondence in the field of bi-text exploitation and evaluation. We make a distinction between two different types of bilingual pairing: the alignment of the smallest segments that are considered as translational equivalents (in accordance with the principle of translational compositionality), and the lexical correspondences which concern stable lexical units (in a broad sense) having the same denotational content. In fact, inside two aligned sentences, there is no need for all lexical units to correspond with each other. Semantic discrepancies between a sentence and its translation can be very substantial, and the assumption of quasi-bijection does not hold at the lexical level. This distinction opens up a number of new possibilities:

– the development of more consistent criteria in order to establish benchmark corpora in the field of evaluation,
– a more accurate interpretation of the meaning of contrastive phenomena which emerge from a bi-text. The sets of textual segments constituting a bi-text are not linked by specific linguistic properties, but by translational equivalence, which is defined at an extra-linguistic level. Of course, contrastive regularities can be observed at different levels: morpho-syntax, lexicology, terminology and phraseology. But these regularities are not rules: they emerge statistically from the recurrence of translation facts.

Notes

* Many thanks to Kim Van den Broecke, Hélène Ledouble and Luc Bardolph for their helpful assistance in the editing of this article.

1. “Dynamic equivalence is therefore to be defined in terms of the degree to which receptors of the message in the receptor language respond to it in substantially the same manner as the receptors in the source language.” (Nida and Taber 1982:24).

2. The Canadian Hansard Corpus consists of the French/English proceedings of the Canadian Parliament, available at http://www.parl.gc.ca/36/1/parlbus/chambus/house/debates/indexe/homepage.html

3. These reports can be found at http://www.europarl.eu.int

4. “Dire que la traduction opère sur des messages, c’est en effet proclamer qu’elle est un acte de communication (ou d’échange linguistique) avant d’être un acte de comparaison inter-linguale.” (Pergnier 1993:23) [‘To say that translation operates on messages is indeed to assert that it is an act of communication (or of linguistic exchange) before being an act of inter-lingual comparison.’]






References

Brown, P., Cocke, J., Della Pietra, S., Jelinek, F., Lafferty, J., Mercer, R. and Roossin, P. 1990. “A statistical approach to machine translation”. Computational Linguistics 16: 79–85.
Brown, P., Della Pietra, S. and Mercer, R. 1993. “The mathematics of statistical machine translation: parameter estimation”. Computational Linguistics 19: 263–311.
Brown, P., Lai, J. and Mercer, R. 1991. “Aligning sentences in parallel corpora”. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, 169–176. Berkeley, CA.
Dagan, I., Church, K. W. and Gale, W. 1993. “Robust bilingual word alignment for machine aided translation”. In Proceedings of the Workshop on Very Large Corpora, Academic and Industrial Perspectives, 1–8.
Debili, F. 1997. “L’appariement: quels problèmes?”. In Actes des 1ère JST FRANCIL de l’AUPELF UREF, 199–206. Avignon.
Debili, F. and Sammouda, E. 1992. “Appariements de phrases de textes bilingues Français-Anglais et Français-Arabes”. In Actes de COLING-92, 528–524. Nantes.
Dunning, T. 1993. “Accurate methods for the statistics of surprise and coincidence”. Computational Linguistics 19: 61–74.
Fung, P. and Church, K. W. 1994. “K-vec: A new approach for aligning parallel texts”. In Proceedings of the 15th International Conference on Computational Linguistics, 1096–1102. Kyoto.
Gale, W. and Church, K. W. 1991. “A program for aligning sentences in bilingual corpora”. In Proceedings of the 29th Annual Meeting of the ACL, 177–184. Berkeley, CA.
Gaussier, E. and Langé, J.-M. 1995. “Modèles statistiques pour l’extraction de lexiques bilingues”. T.A.L. 36 (1–2): 133–155.
Isabelle, P. 1992. “La bi-textualité: vers une nouvelle génération d’aides à la traduction et la terminologie”. Meta XXXVII (4): 721–731.
Isabelle, P., Dymetman, M., Foster, G., Jutras, J. M. and Macklovitch, E. 1993. “Translation analysis and translation automation”. In Proceedings of the 5th International Conference on Theoretical and Methodological Issues in MT. Kyoto.
Israël, F. and Lederer, M. 1991. La liberté en traduction. Actes du colloque international tenu à l’E.S.I.T. les 7, 8 et 9 juin 90. Paris: Didier Erudition, Coll. traductologie.
Kay, M. and Röscheisen, M. 1993. “Text-translation alignment”. Computational Linguistics 19: 121–142.
Kraif, O. 1999. “Identification des cognats et alignement bi-textuel: une étude empirique”. In Actes de la 6ème conférence annuelle sur la Traitement Automatique des Langues Naturelles. TALN 99, 205–214. Cargèse, France.
Langé, J.-M. and Gaussier, E. 1995. “Alignement de corpus multilingues au niveau des phrases”. T.A.L. 36 (1–2): 133–155.
Langlais, Ph., Simard, M. and Véronis, J. 1998. “Methods and practical issues in evaluating alignment techniques”. In Proceedings of 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics. Montréal, Canada.
Macklovitch, E. 1995a. “Can terminological consistency be validated automatically?”. In Proceedings of the IVèmes Journées scientifiques, lexicommatiques et dictionnairiques, organized by Aupelf-Uref. Lyon, France.
Macklovitch, E. 1995b. “The future of MT is now, and Bar-Hillel was (almost entirely) right”. Centre d’innovation en technologies de l’information (CITI). Laval, Canada. [Available at http://www-rali.iro.umontreal.ca.]
Melamed, I. D. 1998. “Models of co-occurrence”. In Technical Report #98–05. Institute for Research in Cognitive Science, University of Pennsylvania, Philadelphia, PA. [Available at http://www.cis.upenn.edu/~melamed/home.html]
Nida, E. A. and Taber, C. R. 1982. The Theory and Practice of Translation. Leiden: Brill.
Pergnier, M. 1993. Les fondements socio-linguistiques de la traduction. Lille: Presses Universitaires de Lille.
Rastier, F. 1989. Sens et textualité. Paris: Hachette, Coll. HU.
Sager, J. C. 1994. Language Engineering and Translation: Consequences of Automation. Amsterdam: John Benjamins.
Sato, S. and Nagao, M. 1990. “Towards memory-based translation”. In Proceedings of COLING’90, 247–252. Helsinki.
Simard, M., Foster, G. and Isabelle, P. 1992. “Using cognates to align sentences”. In Proceedings of the Fourth International Conference on Theoretical and Methodological Issues in Machine Translation, 67–81. Montréal, Canada.
Simard, M., Foster, F. and Perrault, F. 1993. “TransSearch: un concordancier bilingue”. Centre d’innovation en technologies de l’information (CITI), Laval, Canada. [Available at URL http://www-rali.iro.umontreal.ca.]
Simard, M. 1998. “The BAF: a corpus of English-French bitext”. In Proceedings of First International Conference on Language Resources and Evaluation, 489–494. Granada, Spain.



The use of electronic corpora and lexical frequency data in solving translation problems

François Maniez

Polysemy creates difficulties for human translators, and even greater difficulties for automatic translation programs. Starting from an example of such a difficulty (the translation into French of the compound sedimentation rate), a case is made for using computerized databases that include the most frequently used compounds and collocations. Attention is also given to measures of lexical frequency that take into account the various meanings of a polysemous lexical item. The study of a case of syntactic ambiguity demonstrates that taking into account the frequency of the collocation “based on” in two separate corpora (scientific and non-scientific) makes it possible to achieve disambiguation in an automatic translation program if it is made sensitive to “lexical environment”. A third example deals with the failure to detect a non-contiguous collocation. The factors that contribute to the correct interpretation of such lexical constructs by bilingual translators are examined, as well as the ways in which their recognition can be replicated in the creation of automatic translation software.

.

Disambiguation techniques and automatic translation

The first step in automatic language processing with a view to translation from English into a foreign language is automatic part-of-speech (POS) tagging. A number of tagging programs exist and have been used with large corpora, including Brill’s stochastic tagger (see Brill 1993), and their accuracy is improving given that generally they can identify successfully the POS of about 95% of all words that are processed. The CLAWS4 automatic tagger (described in Garside 1987) that was used with the 100-million-word British National Corpus (BNC)



François Maniez

produced erroneous tags for only an estimated 1.7% of all words, with approximately 4.7% of the tags labeled as “ambiguity tags” — cases in which the automatic tagger was unable to decide which was the correct category, for instance between VVD (past tense verb) and VVN (past participle). A trickier problem is that of word-sense disambiguation, for which there is no totally reliable automatic processing to date. Part of the Brown Corpus (about 670 000 words) has been manually tagged at the University of Princeton, using semantic tags that correspond to the word senses of the WordNet data base (Miller 1990), and attempts are in progress to match the WordNet project with other European Languages (Vossen 1996). The DEFI project team at the University of Liège (Michiels 1996) has developed an automatic word-sense disambiguation project that matches words drawn from the context in which an ambiguous word is found with the words contained in the dictionary definitions for its various meanings, using the large Collins-Robert machine-readable dictionary (MRD). In another experiment involving an MRD, the Collins English-German dictionary, Neff & McCord (1990) successfully used monolingual resources comparable to WordNet synsets to match a given polysemous word with one of the collocates provided in entries and thus determine the correct translation in context. Sutcliffe & Slater (1995) also describe a method that uses word association in order to achieve word-sense disambiguation, and point out that higher levels of performance can be obtained in restricted contexts where domain-specific word-sense frequency data can be exploited. Experiments in the field of word-sense disambiguation are also being carried out by the Rank Xerox research team in Grenoble. Encouraging results have been obtained by Dini et al. (1998), using 45 upper-level WordNet semantic tags and merging them with functional tags into single tags in order to achieve disambiguation. 
The current limitations of their unsupervised learning algorithm lie in the fact that it is only able to learn relations among bigrams. The fact that Wordnet does not distinguish between homonymy and polysemy has also proved to be something of a hurdle.

Corpora

It has become commonplace to argue that computer science has now revolutionized the study of language, and the contribution of corpus linguistics to the fields of lexicography and phraseology has been widely demonstrated. My aim, therefore, is not so much to prove the usefulness of computerized tools in the field of translation as to define the human role in the collection of the data that are needed to improve expert systems or to assist in the compilation of reference tools that can be of use to translators.

Considering the huge amount of data that is now available in electronic form, the creation of an electronic corpus is no longer a matter of gathering a sufficient quantity of information, but rather of selecting the data that are relevant to the linguistic mechanisms one wishes to focus on. In the case of bilingual corpora, the issue of choice between parallel and comparable corpora has been raised by many authors, for example Teubert (1996). As regards monolingual corpora, the main questions to be settled before compilation are the desired level of homogeneity and which subdivision of a given language is best suited to one’s research. Also, as has been pointed out by Clear (1996), the issue of corpus size must be addressed if one is concerned with the statistical significance of the data on which linguistic observations are based.

For the present study, I have used two separate electronic corpora: the first consists of articles published in Time Magazine in the past ten years (TIME 20TH CD-ROM: approximately 10 million words); the second is a collection of medical articles taken from a CD-ROM (Internal Medicine 1993), with a total of approximately 18 million words. For the study of high-frequency lexical items, I have also used subsets of these two corpora: all the articles published in Time Magazine in the year 1991 (henceforth referred to as Time91) and a subsection of articles published by the Journal of the American Medical Association in the year 1993, which I called Corpumed. Both subsets were analyzed with John Bradley’s TACT program (Bradley 1989), which was developed at the University of Toronto.

Taking as a starting point some concrete translation problems encountered by our own medical students, I searched our corpora for lexical frequency and co-occurrence data that might help a human translator to solve the problems created by some instances of lexical or syntactic ambiguity, with a view to formalizing some of our findings so that they can be used within an automatic translation program.

Some examples of misinterpretation

Polysemous lexical units

(1) Hematocrit was 0.38, with an elevated white blood cell count of 13.3 × 10⁹/L. The Westergren method for erythrocyte sedimentation rate was 103 mm/h.






L’hématocrite était de 0.38 (38%), et la numération leucocytaire était élevée (13,3 × 10⁹/l). La (VSG) vitesse de sédimentation globulaire (sanguine / des hématies / érythrocytaire), mesurée par la méthode de Westergren, était de 103 mm/h.

The uses of the word rate can be divided into the following five categories (sorted in descending order of frequency in our general corpus). The French equivalents are given for each meaning:

a. standard of reckoning obtained by expressing the quantity or amount of one thing in relation to another (taux, pourcentage)
b. measure of charge or cost (prix, tarif)
c. speed of movement, change, etc.; pace (vitesse, rythme, cadence)
d. measure of value (ordre, classe)
e. case, as in at any rate (cas)

It is worth noting that our distinctions reflect the division into French equivalents. For instance, some monolingual dictionaries will combine meanings (b) and (d) into one category.

Example (1) was given in its original context to a class of graduate students who specialize in medical translation, and they were asked to translate it into French. Although a unit of speed (mm/h) was mentioned, about half of them misunderstood rate, translating it with an equivalent usually reserved for meaning (a), the word taux (it is quite possible that some of them were misled by the mention of hematocrit in the previous sentence, as the result of this test is typically given in the form of a percentage).

As the compound sedimentation rate is very frequently used in medical literature, one may safely assume that its translation would not have posed much of a problem for today’s best translation software, provided a specialized dictionary was included. However, it is worth considering the ways in which lexical statistics might help human translators in their task. First of all, one can safely assume that the number of polysemous words that are used in scientific literature is limited. If one could draw up a list of the most frequently used polysemous words in a given field, one might consider various ways of signaling their presence to the translator, provided the original text was in machine-readable form.
As a test, I examined a list of the words whose frequency was higher than 200 in the Corpumed corpus (the figure 200 was chosen because the frequency of rate was 225 in that corpus). If we set aside function words, the majority of our hits were monosemous nouns (blood, bone, breast, calcium, cancer, data, diagnosis, levels, patients, symptoms, weight), and the only polysemous words of the same order of frequency as rate were the nouns stroke, table and trial. It thus seems feasible to establish a list of the most frequently used polysemous words in scientific English in order to point out possible ambiguities to the translator.

However, if the aim is to assist both human translators and machines in their task, it might prove useful to add some measure of frequency information to such a list. If we once again consider the examples given above, it is more than likely that in a medical publication the meanings of the words stroke, table and trial will be those that would translate into French as accident vasculaire cérébral, tableau and essai, not as coup, table or procès. Bearing this in mind, it might prove useful to provide the translator with statistical information about the relative frequency of a given lexical item in specialized literature as opposed to everyday speech, as well as potential differences in the proportions in which the various meanings of polysemous words are used depending on the type of language. Table 1 shows that the word rate is quite representative of such context-dependent variation.

Table 1. French translations of the word rate in the TIME91 and CORPUMED corpora.

Uses of rate                                        TIME91 (1.8 M)    CORPUMED (306,000)
Absolute frequency of rate                          290               225
Normalized frequency of rate (per 500,000 words)    80.5              367.6
Collocation types                                   112               21
a) pourcentage, taux                                54 (48%)          19 (90%)
b) tarif, prix                                      14 (12%)          0
c) vitesse, rythme                                  11 (10%)          2 (10%)
Other                                               33 (29%)          0

The table shows two things. The first is that the word rate is used almost five times as frequently in the medical corpus as in the general corpus. Another point worthy of note is that meaning (b) is not to be found in the medical corpus; meaning (c) is likewise hardly ever used. It should be noted, however, that searching the general corpus for the plural form (rates) gave us very different results (meaning (a): 187 occurrences, meaning (b): 23 occurrences, meaning (c): 4 occurrences), which makes it worth considering the case for differentiating collocational statistics in the singular and plural forms.
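The normalization behind Table 1 is simple arithmetic, and a short sketch may make it easier to check. The function names are illustrative, and the corpus sizes are the rounded token counts printed in the table header, so the results are approximate.

```python
def normalized_frequency(count: int, corpus_size: int, per: int = 500_000) -> float:
    """Occurrences per `per` running words."""
    return count / corpus_size * per


def sense_shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of the analysed occurrences that falls under each meaning."""
    total = sum(counts.values())
    return {sense: 100 * n / total for sense, n in counts.items()}


# Rounded corpus sizes as given in the header of Table 1 (assumed exact here).
TIME91_SIZE = 1_800_000
CORPUMED_SIZE = 306_000

time91_norm = normalized_frequency(290, TIME91_SIZE)
corpumed_norm = normalized_frequency(225, CORPUMED_SIZE)
ratio = corpumed_norm / time91_norm

# Counts per meaning, copied from the lower half of Table 1.
time91_senses = sense_shares({"a": 54, "b": 14, "c": 11, "other": 33})
corpumed_senses = sense_shares({"a": 19, "b": 0, "c": 2, "other": 0})

print(f"TIME91: {time91_norm:.1f}  CORPUMED: {corpumed_norm:.1f}  ratio: {ratio:.2f}")
```

With these rounded sizes the ratio comes out at roughly 4.6, in line with the "almost five times" stated above. The TIME91 figure lands slightly above the printed 80.5, which may simply reflect a token count a little over 1.8 million.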






In conclusion, we can see that in the medical corpus, meanings (a) and (c) are the most frequent and that in order to determine the exact meaning of the word rate, it is very often sufficient to examine the word that immediately precedes it (its premodifier). One can thus consider including several different kinds of information in a dictionary that lists compounds and collocations:

1. A list of all the premodifiers of a given lexical item.
2. The most frequently found meaning of that particular lexical item, depending on which premodifier is used, or on which type of specialized language is used. In the case of medical literature, for example, one may want to assign meaning (a) as the default value for rate; consequently, one might consider including in the dictionary (or displaying as the result of a given query) only those word combinations with a meaning other than percentage.
3. The most frequently used translation of a given word combination when a choice has to be made between several possible translations of the premodifier, possibly depending on the type of specialized language one is dealing with. Thus, the translator could be told that success rate is more frequently translated as taux de réussite in a medical context than as pourcentage de réussite; however, both forms exist and are acceptable in everyday use.

In the case of the collocates for rate in meaning (a), including this type of information when entering data would allow users to test the acceptability of collocations, leading them to discard pourcentage d’intérêt as a possible translation of interest rate. As suggested in the case of rate and rates, one might also include information concerning the proportion of the meanings of the singular and plural forms, provided it is statistically significant.
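As a rough illustration of points 1 and 2 above, a dictionary entry could store a default equivalent per register and list only the premodifiers that override it. The data structure, the names SenseEntry, LEXICON and equivalent, and the tiny sample entry are all assumptions made for this sketch; only the two French equivalents come from the discussion of example (1).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SenseEntry:
    default: str                                    # default equivalent in this register
    by_premodifier: dict[str, str] = field(default_factory=dict)  # exceptions only


# Hypothetical sample entry: in medical texts rate defaults to meaning (a), taux,
# and only combinations with another meaning are listed explicitly.
LEXICON: dict[tuple[str, str], SenseEntry] = {
    ("rate", "medical"): SenseEntry(
        default="taux",
        by_premodifier={"sedimentation": "vitesse"},
    ),
}


def equivalent(noun: str, premodifier: Optional[str], register: str) -> Optional[str]:
    """Return the stored equivalent of `noun`, or None if the noun is not listed."""
    entry = LEXICON.get((noun.lower(), register))
    if entry is None:
        return None
    if premodifier is None:
        return entry.default
    return entry.by_premodifier.get(premodifier.lower(), entry.default)


print(equivalent("rate", "sedimentation", "medical"))
print(equivalent("rate", "success", "medical"))
```

Storing only the exceptions keeps the entry short, which is the economy suggested in point 2. Frequency information for competing translations (point 3) would need an additional field.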
In the case of a dictionary devoted only to collocations in scientific English, one can easily imagine that the high level of repetition of the same lexical combinations, which has been confirmed by various statistical measurements of lexical richness, will make it unnecessary to resort to any kind of detailed semantic analysis. However, in the case of a reference work which aims to describe the general use of a given language, the model suggested by Fontenelle (1997) using the lexical functions defined by Mel’čuk (1994) seems best suited to a comprehensive description of collocational structures in all their diversity.

Syntactic ambiguity

The extract given below was also misinterpreted by half the students asked to translate it into French. In this case, however, the mistake is likely to be reproduced by the morpho-syntactic analyzers of most of today’s automatic translation programs.

(2) Not infrequently, patients with Gaucher’s disease are initially presumed to have lymphoproliferative disorders or childhood soft tumors based on their “abdominal mass” and suppressed blood cell counts.

Il n’est pas rare que l’hypertrophie abdominale et la cytopénie sanguine des patients induisent (provoquent) initialement le diagnostic de maladie lymphoproliférative ou de tumeur des parties molles chez l’enfant dans les cas de maladie de Gaucher.

In this case, the students who misunderstood the sentence interpreted based as the past participle of a verb form whose auxiliary had been deleted together with the pronoun of an underlying relative clause (soft tumors that are based on their “abdominal mass”…), failing to recognize based on as a complex preposition that is synonymous with because of or due to. The semantic analysis that is necessary in order to rule out the erroneous interpretation is beyond the capacity of today’s automatic translation programs, but the root of the mistake probably also lies in the fact that the reduction of a relative clause in this fashion seems to be more frequent in French than in English. It should also be noted that in many cases the prepositional phrase introduced by based on is placed at the beginning of the sentence, which prevents such ambiguity.

As in the case of example (1), I examined the occurrences of the expression based on in the same corpora (Time91 and Corpumed). The results are shown in Table 2.

Table 2. Frequencies of based and based on used as a complex preposition (CP) in the two corpora.

                                                              TIME91          CORPUMED
Occurrences of based (relative frequency in corpus)           300 (0.016%)    145 (0.047%)
Occurrences of based on (% of all occurrences of based)       181 (60%)       138 (95%)
Occurrences of based on as a CP (start of sentence)           11              13
Occurrences of based on as a CP (other)                       12              18
Total % of occurrences of based on as a CP                    12.7            22.5

Bearing in mind that Time91 has approximately six times as many tokens as Corpumed, it appears that the frequency of based is three times as great in the medical corpus as in the Time corpus, while the frequency of based on is five times as great and the frequency of based on as a complex preposition ten times as high.






It thus seems that including such collocations in a bilingual scientific lexicon (whether it is machine-readable or not), together with their frequency in a corpus in the way that has been suggested above for rate, could be of use to the translator.

Non-contiguous collocations

In example (3), misinterpretation resulted from failure to identify a collocation (mixed results) which is frequently used in a non-scientific context.

(3) Results of trials of selective gut decontamination have been mixed. The general consensus is that although some infections can be avoided, overall mortality is not reproducibly influenced.

The following translation appeared in the French edition of the Journal of the American Medical Association: Les résultats des essais cliniques sur la décontamination digestive sélective ont été analysés. Le consensus général est de dire que, si quelques infections sont évitables, la mortalité globale n’est pas modifiée de façon reproductible.

In this instance, the translator wrongly assumed that the English sentence contained a passive form whose underlying active equivalent was <somebody has mixed the results>, whereas in fact mixed is used as an adjective. Most probably the translator believed that “mixing results” was a way of referring to a common tool of medical statistics, the combining of results from trials in which similar protocols were used. A better French translation would have been: Les résultats des essais cliniques sur la décontamination digestive sélective ont été mitigés.

It seems that human intervention is still required in order to solve such syntactic ambiguities. I submitted example (3) to the CLAWS 4 grammatical tagging program (Garside 1987) created at the Unit for Computer Research on the English Language (UCREL) of Lancaster University (the program has been used for the tagging of the British National Corpus). Even though such tagging generally proves useful in the disambiguation process, the output reproduced in (4) shows that here, too, mixed was interpreted as being part of a passive verb form:

(4) Results_NN2 of_IO trials_NN2 of_IO selective_JJ gut_NN1 decontamination_NN1 have_VH0 been_VBN mixed_VVN ._. The_AT general_JJ consensus_NN1 is_VBZ that_CST although_CS some_DD infections_NN2 can_VM be_VBI avoided_VVN ,_, overall_JJ mortality_NN1 is_VBZ not_XX reproducibly_RR influenced_VVN ._.
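Tagged output of the kind shown in (4) is easy to post-process, so a first safeguard could be to flag every past participle that directly follows a form of be and also appears on a list of words known to function as adjectives. The sketch below is only an illustration under that assumption: the function names and the one-word adjective list are hypothetical, and the tags are simply those visible in (4).

```python
# First sentence of the tagger output reproduced in (4).
TAGGED = (
    "Results_NN2 of_IO trials_NN2 of_IO selective_JJ gut_NN1 "
    "decontamination_NN1 have_VH0 been_VBN mixed_VVN ._."
)


def parse_tagged(text: str) -> list[tuple[str, str]]:
    """Split word_TAG tokens on the last underscore."""
    pairs = []
    for token in text.split():
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs


def flag_participles(pairs, adjectival=frozenset({"mixed"})):
    """Indices of VVN tokens that directly follow a form of 'be' (tags starting
    with VB in the output above) and are listed as possible adjectives."""
    hits = []
    for i in range(1, len(pairs)):
        word, tag = pairs[i]
        previous_tag = pairs[i - 1][1]
        if tag == "VVN" and previous_tag.startswith("VB") and word.lower() in adjectival:
            hits.append(i)
    return hits


pairs = parse_tagged(TAGGED)
for i in flag_participles(pairs):
    print(f"check token {i}: {pairs[i][0]} (tagged {pairs[i][1]})")
```

Flagged tokens would still have to be resolved by a human or by collocational data of the kind discussed in the remainder of this section.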


In order to develop translation software that could solve or reduce such problems, it is worth analyzing which factors play a role in the correct interpretation of such ambiguous forms. I found three possible factors, two of which rely on purely lexical knowledge:

a. Previous knowledge of verbs that are synonymous with mixed and are known to co-occur with results. Searching our corpus for various occurrences of the word results revealed that combine is generally used instead of mix to express the compilation of results known as meta-analysis in medical literature. Example (5) (Reidenberg 1993) actually provides a definition of this procedure:

(5) How best to combine the results of different clinical trials to produce a single valid conclusion has been an issue in clinical pharmacology and the rest of medicine since literature reviews were first conducted. Although formal statistical methodology for combining clinical trial results, or meta analysis, is an improvement over earlier methods of less formal literature review and interpretation, one must not let the rigor and formality of the statistics give the analysis more credibility than the underlying data deserve.

Storing collocations such as combine results in a collocation database in the VERB-NOUN category could be a step towards avoiding misinterpretation of strings like mix results, provided the user had access to the collocates of polysemous words on a semi-automatic basis. A search carried out in our Internal Medicine 93 corpus showed that the collocation appeared in 46 articles in its contiguous form. Conversely, mix is never found to co-occur in the active form with results, and whenever the form mixed co-occurred with results, it was always used as an adjective.

b. Storage of collocations such as mixed results. The storage and automatic retrieval of such collocations seems to be a sine qua non for correct interpretation. In an instance such as the previous one, a computer program with a well-documented database would be able to achieve what humans achieve through their awareness of polysemy. However, the task of a human translator is not quite completed when the correct meaning has been assigned to the adjective mixed, as several French equivalents can be used depending on what the English node word is. Table 3 summarizes the use of collocates for mixed in our corpora and indicates those that are listed in two monolingual dictionaries, Webster’s Encyclopedic Unabridged Dictionary, 1989 (WEUD) and The American Heritage Dictionary, 1998 (AHD). The suggested translation equivalents are taken from the Robert and Collins Senior Dictionary (1995), except those for signals, reviews and messages, which are my own.

Table 3. Frequency of collocations with the word mixed in the two Time corpora and their normalized frequency (number of occurrences per 1 million words).

Collocates     TIME91    TIME91    TIME20th    TIME20th    Suggested translation equivalents
for MIXED      Abs.      Norm.     Abs.        Norm.
signals        11        5.5       26          2.6         signaux, messages contradictoires
race           2         1         25          2.5         race mixte
blessing       4         2         21          2.1         avantage incertain
feelings       1         0.5       21          2.1         sentiments contraires, contradictoires
reviews        2         1         21          2.1         avis partagés
results        2         1         19          1.9         résultats mitigés, bilan contrasté
messages       1         0.5       16          1.6         signaux, messages contradictoires
economy        2         1         10          1           économie mixte
emotions       2         1         10          1           sentiments contradictoires

WEUD / AHD: x x x (three dictionary-listing marks; the rows they belong to cannot be recovered from this copy)

A brief comparison of the figures that were obtained in the larger corpus and its subset (10 million vs. 2 million words) demonstrates that it is necessary to use a large corpus when searching for co-occurrence data that concern infrequently used lexical items, as only two out of ten of the listed collocations occurred more than twice in the smaller corpus, a threshold under which statistical significance may be considered doubtful.

c. Identification of collocations that occur in a non-contiguous form. Most programs that automatically retrieve collocations from computer corpora isolate recurring multi-word strings (as is the case with Collgen, the collocation generator that comes with the TACT software) or provide concordances for two words that have been selected by the user according to certain contiguousness parameters. Generally, the more intervening words there are between the node word and its collocate, the less likely the pair is to be identified as a statistically significant instance of co-occurrence. Needless to say, a large distance between the two components of a collocation is also an obstacle to human understanding. When I asked French students of English to translate example (3), two thirds of those who misunderstood the sentence knew of the collocation, and claimed that they would have had no trouble understanding it in a shorter sentence such as Results have been mixed.

Automatic identification of collocations would no doubt be made easier if their components were stored together with the various grammatical forms in which they co-occur, if possible in descending order of probability of occurrence. I searched for such differences in the TIME91 corpus and in a subset of our large medical corpus (Internal Medicine 93). The figures for the occurrences of mixed are shown in Table 4.

Table 4. Grammatical status of mixed in the TIME91 and Internal Medicine 93 corpora.

Grammatical status of mixed       Time91    % of all grammatical    I.M. 93    % of all grammatical
                                  tokens    forms                   tokens     forms
verb forms (active)               4         6%                      2          0.7%
verb forms (passive)              9         13%                     8          3%
adjectival uses (predicative)     10        15%                     5          2%
adjectival uses (attributive)     44        66%                     261        95%
Total                             67                                276

Such figures could be used as a basis for prioritization in algorithms designed to solve such translation problems. In the case of mixed, the following steps could be followed: if grammatical analysis suggests that a passive form was used, and if the preposition with does not follow the occurrence of mixed, then the preceding context could be scanned for occurrences of the word’s most frequently used collocates (results, response, attitudes, reactions, feelings in medical literature). If one of them is found, the appropriate translation equivalent can be provided. If not, the sentence is translated with the equivalent passive structure in the target language.

However, one issue has yet to be addressed. Of the above-listed methods, which is easiest to formalize and adapt to automatic corpus processing with a view to generating collocations to be used in an automatic contextual retrieval program? If we consider our first possibility, i.e. the suppression of an erroneous interpretation through the prior storage of a collocate with a higher probability of occurring with a given base, we can see that this method is difficult to apply in the case of automatic translation. The subtle difference between mix and combine, if it could be expressed with the help of semantically distinctive features, would need to be weighed in relation to the type of language that is used. In our particular case, the use of mixing results (as opposed to combining results in example (5)) may sound strange in scientific prose, but acceptable in everyday speech. The same is true of the French equivalents (although the noun mélange is much more common than the verb mélanger in medical literature, the verb does occur in the specialized vocabulary of medicine). Actually, trying to reproduce such cognitive processes automatically would most probably prove too costly in terms of computer memory, since it would require:

a. storage of all the collocations that match a given grammatical pattern (in this case, VERB + OBJECT NOUN PHRASE) for all the words of the text to be translated;
b. elimination of some possible choices (such as MIX-RESULTS) based on the existence of synonymous collocates (such as combine) that are more frequently used; establishing a database of potential synonyms would in itself require preliminary work, especially in terms of designing its structure.

The second possibility seems to be better suited to automatic data processing, since generating collocations from computer-encoded texts is a relatively easy task. However, the amount of “noise” in relation to the signal needs to be emphasized. After processing a French medical corpus with the TACT collocation generator, I found that eliminating function words left us with only 6% of all the forms that had initially been retrieved by the program. Most function words are short, but since word length cannot be the sole criterion for paring down the lists of collocations that are generated by the program, it is necessary to eliminate certain lexical combinations.
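The stepwise procedure proposed earlier for mixed (a suspected passive, no following with, then a scan of the preceding context for frequent collocates) can be written down in a few lines. This is a minimal sketch, not a component of any existing translation system: the function name and return values are hypothetical, and the collocate list is the one quoted above for medical literature.

```python
# Most frequent collocates of mixed in medical literature, as listed in the text.
FREQUENT_COLLOCATES = ("results", "response", "attitudes", "reactions", "feelings")


def interpret_mixed(tokens, index, collocates=FREQUENT_COLLOCATES):
    """Decide how to read tokens[index] == 'mixed' once a tagger has analysed it
    as part of a passive. `tokens` are the lowercased words of one sentence.

    Returns ("adjective", collocate) or ("passive", None).
    """
    # Step 1: 'mixed with ...' keeps the verbal (passive) reading.
    if index + 1 < len(tokens) and tokens[index + 1] == "with":
        return ("passive", None)
    # Step 2: scan the preceding context, nearest word first, for a known collocate.
    for word in reversed(tokens[:index]):
        if word in collocates:
            return ("adjective", word)
    # Step 3: no collocate found, so fall back on the passive structure.
    return ("passive", None)


sentence = "results of trials of selective gut decontamination have been mixed".split()
print(interpret_mixed(sentence, sentence.index("mixed")))
```

The collocate that is returned would then be used to select the translation equivalent (for example résultats mitigés for results, as in Table 3). The order of the tests could be adjusted to the proportions observed in Table 4.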
Table 5 provides an example of the collocations that were obtained from a 200 000-word gastroenterology corpus after preparatory work of this kind. The homogeneity in terms of grammatical categories is particularly striking, as 18 word combinations out of 20 are of the NOUN-ADJECTIVE type. The high frequency of this structure is rather typical of medical literature (and perhaps of scientific writing in general). A further look at the table reveals a clear distinction between compounds that belong to the specialized lexicon and collocations proper, with bases (aspect, augmentation) that are frequently used in non-scientific texts.

As to the identification of non-contiguous collocations, the problem is twofold. First, an automatic analysis of the kind that was summarily described above would considerably slow down any automatic translation program because of the sheer number of such collocations. Second, their retrieval from computer-encoded texts would require the setting of a maximum span value for the search (which is possible in most concordance programs) and would have the same effect (in example (3), mixed and results are 8 words apart). In order to fine-tune any search module that uses this span function, it would be necessary to integrate data that list frequencies of occurrence in the non-contiguous form for each collocation (to take an example drawn from Table 5, one can easily predict that such statistics would reveal that aspects observés will be found in a non-contiguous form more frequently than atteinte vasculaire), so as to use such functions only where necessary.

Table 5. Collocations with a frequency of ≥ 4 starting with the letter A in the French gastroenterology corpus.

FREQ    WORD1          WORD2
12      atteintes      inflammatoires
8       anses          grêles
7       aspects        radiologiques
7       atrophie       villositaire
6       abcès          hépatiques
6       anses          intestinales
6       aspects        observés
6       atteintes      vasculaires
5       adénomes       hépatocellulaires
5       aspect         pseudo-tumoral
4       abdominal      aigu
4       anastomose     gastro-jéjunale
4       anatomie       pathologique
4       angiomes       géants
4       antérieur      gauche
4       arcades        dentaires
4       aspect         nodulaire
4       aspect         radiologique
4       augmentation   localisée
4       axes           vasculaires
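The role of the maximum span value discussed above can be illustrated with a small search routine. The sketch assumes that span is measured as the number of intervening words, which is how the figure of 8 for example (3) is obtained. The function name and parameters are hypothetical and do not reproduce the interface of TACT or of any other concordance program.

```python
def find_within_span(tokens, node, collocate, max_gap):
    """Return (node_index, collocate_index, gap) for every co-occurrence in which
    at most `max_gap` words stand between the node and its collocate."""
    hits = []
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - max_gap - 1)
        stop = min(len(tokens), i + max_gap + 2)
        for j in range(start, stop):
            if j != i and tokens[j] == collocate:
                hits.append((i, j, abs(i - j) - 1))
    return hits


example_3 = "results of trials of selective gut decontamination have been mixed".split()

# A span of 8 intervening words is needed to retrieve the pair in example (3).
print(find_within_span(example_3, "mixed", "results", max_gap=8))
# A narrower window misses it.
print(find_within_span(example_3, "mixed", "results", max_gap=4))
```

A per-collocation record of how often the pair occurs non-contiguously, as suggested above, could be used to choose max_gap for each pair instead of applying one wide window everywhere.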






Conclusion

The results obtained in the attempt to solve the various translation problems that have been discussed here seem to demonstrate the benefits that can be derived from automatic processing of machine-readable corpora, but they also show the limits of this approach. Human intervention remains necessary at a number of stages of the data gathering and formatting process. In the examples I have chosen to examine, ambiguity is always a consequence of the polysemous nature of a given lexical item, and word-sense disambiguation cannot be achieved without identifying and analyzing either a syntactic structure that is itself ambiguous or the collocate for that lexical item in a given context.

We seem, therefore, to be confronted with a double task. First, what is needed is a comprehensive description of the syntactic ambiguities that occur in a given language and a corpus that lists examples of such structures, so that lexical co-occurrence phenomena can be examined and studied for disambiguation. Second, and such a task seems achievable in the case of scientific literature, it is necessary to establish a list of the most frequently used polysemous words in order to establish a certain number of translation rules that are based on statistically confirmed lexical co-occurrence data.

References

Ahlswede, T. & Even, M. 1988. “Generating a relational lexicon from a machine-readable dictionary”. International Journal of Lexicography. Special issue edited by F. Frawley & R. Smith.
Benson, M. 1985. “A Combinatory Dictionary of English”. Dictionaries 7: 189–200.
Brill, E. & Marcus, M. 1993. Tagging an unfamiliar text with minimal human supervision. ARPA Technical Report.
Church, K., Gale, W., Hanks, P., Hindle, D. & Moon, R. 1994. “Lexical Substitutability”. In Computational Approaches to the Lexicon, Atkins and Zampolli (eds), 153–177. Oxford: Oxford University Press.
Clear, J. 1996. “Technical implications of multilingual corpus lexicography”. International Journal of Lexicography 9: 265–276.
Cowie, A. P. 1986. “Strategies for dealing with idioms, collocations and routine formulae in dictionaries”. In Workshop on Automating the Lexicon, 15–23 May 1986. Grosseto, Italy.
Dini, L., Di Tomaso, V. & Segond, F. 1998. “Word sense disambiguation with functional relations”. Language Resource and Evaluation Conference, Granada, May 98.
Fontenelle, T. 1994. “Towards the construction of a collocational database for translation students”. META 39: 47–56.


Fontenelle, T. 1997. Turning a Bilingual Dictionary into a Lexical-Semantic Database. Tübingen: Max Niemeyer.
Garside, R. 1987. “The CLAWS word-tagging system”. In The Computational Analysis of English, R. Garside, G. Leech and G. Sampson (eds), 30–41. London: Longman.
Heid, U. 1992. “Décrire les collocations : deux approches lexicographiques et leur application dans un outil informatisé”. Terminologie et Traduction, 2–3.
Heid, U. 1994. “On ways words work together: topics in lexical combinatorics”. In Euralex’94: Proceedings of the Sixth Euralex International Congress, Martin et al. (eds), 226–257. Amsterdam.
Heid, U. 1994. “Relating lexicon and corpus: Computational support for corpus-based lexicon building in DELIS”. In Euralex’94: Proceedings of the Sixth Euralex International Congress, Martin et al. (eds), 459–471. Amsterdam.
Knowles, F. E. 1986. “Computational lexicography and lexical databases”. In Proceedings of the 13th International ALLC Conference, April 1986. Norwich: Association for Literary and Linguistic Computing.
Knowles, F. & Roe, P. 1994. “SP and the notion of distribution as a basis for lexicography”. In Euralex’94: Proceedings of the Sixth Euralex International Congress, Martin et al. (eds), 306–319. Amsterdam.
Lakoff, G. 1993. “The syntax of metaphorical semantic roles”. In Semantics and the Lexicon, J. Pustejovsky (ed.), 27–36. Dordrecht: Kluwer Academic.
Mel’čuk, I. & Wanner, L. 1994. “Towards an efficient representation of restricted lexical cooccurrence”. In Euralex’94: Proceedings of the Sixth Euralex International Congress, Martin et al. (eds), 325–338. Amsterdam.
Michiels, A. 1996. “An experiment in translation selection and word sense discrimination using the metalinguistic apparatus of two computerized dictionaries”. DEFI Technical Report, 24. University of Liège. [Available at http://engdep1.philo.ulg.ac.be/michiels/defi.htm]
Miller, G. A. (ed). 1990. “WordNet: An on-line lexical database”. International Journal of Lexicography 3.
Neff, M. & McCord, M. 1990. “Acquiring lexical data from machine-readable dictionary resources for machine translation”. In Proceedings of the 3rd International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Language, University of Texas at Austin, 85–90.
Reidenberg, M. 1993. “Clinical Pharmacology”. Journal of the American Medical Association 270: 192.
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Smadja, F. A. & McKeown, K. R. 1990. “Automatically extracting and representing collocations for language generation”. In 28th Annual Meeting of the Association for Computational Linguistics, 252–259. New York.
Sutcliffe, R. & Slater, B. 1995. “Disambiguation by association as a practical method: Experiments and findings”. Journal of Quantitative Linguistics 2: 43–52.
Teubert, W. 1996. “Comparable or Parallel Corpora?” International Journal of Lexicography 9: 238–264.
Thoiron, P. & Bejoint, H. 1989. “Pour un index cumulatif et évolutif de co-occurents”. Meta XXXIV, 4.






Vossen, P. 1996. “Right or wrong: combining lexical resources in the EuroWordNet project”. In Euralex’96, 715–728. University of Göteborg.
Zgusta, L. 1967. “Multiword lexical units”. Word 23: 578–587.

Computer software and electronic corpora

Bradley, J. 1989. TACT. Copyright (c) 1989 John Bradley, University of Toronto.
JAMA & ARCHIVES JOURNALS, American Medical Association, 1994 Complete Collection. Ovid Technologies.
Time Almanac 1993. Compact Publishing.

Multiconcord
A computer tool for cross-linguistic research

Patrick Corness

.

Introduction

Substantial monolingual text corpora exist in many languages and the number and size of these increase daily, providing a vast resource for empirical linguistic research. Individual researchers can readily build their own corpora suited to particular needs. The potential for data extraction from corpora is significantly enhanced by their annotation, based on internationally agreed markup conventions and standards. With the advent of parallel concordancing software for PCs (programs of this kind, such as ParaConc, were first developed for the Macintosh platform), the scope for contrastive linguistics research based on translation corpora has expanded considerably.

The growth of translation corpora for contrastive studies is dependent on the additional human and technical resources needed to align the original text with its translation as well as on the availability and suitability of the translations. Many researchers wish to explore texts of their own choice, involving them in the processes of deriving equivalent electronic text in at least two languages and, as a minimum, aligning the texts at paragraph or sentence level. In view of the advances in automatic annotation techniques and of the potentially great additional value given to a corpus by annotation, one might conclude that serious involvement in the field of contrastive analysis requires familiarity with the relevant standards, methodology and state-of-the-art automatic tagging software. However, automated text annotation techniques, impressive as their scope and power are, are still under development. The capability of even the most sophisticated techniques currently available does not cater for all needs that contrastive linguists may have.

At the same time, it should be recognised that parallel corpora with only minimal annotation, or even no annotation at all, are also an extremely valuable resource. For many concordancing purposes, non-annotated texts are perfectly adequate, and this applies also to parallel concordancing. It is likely that contrastive linguistics will continue for some time to grow not only through advances in automatic annotation but also through contributions dependent on manual manipulation of the data, for example by text editing software such as Microsoft Word, in conjunction with a parallel concordancing program.

It is thanks to automated techniques, permitting the treatment of ever larger corpora, that contrastive analysis is presently coming into its own as an empirical science. It is logical, therefore, to pursue the enhancement of tools for automatic corpus analysis in order to enable researchers to work with expanding corpora and to concentrate their effort on those tasks which have to be performed manually. The role of manual procedures, including the checking of the results of automated analysis, remains significant. Some of the cognitive limitations that apply to machine translation (MT), as opposed to computer aided translation (CAT), are found when a high level of automation is attempted in corpus analysis. Contrastive analysis is more akin to CAT than to MT in its methodology, though the findings may contribute to both. As with translation itself, much can be achieved in the contrastive study of languages with the assistance of straightforward tools. A great deal is possible with aligned, though unannotated, corpora.

Multiconcord (Woolls 1998) was designed primarily as a tool to assist in the teaching and learning of translation.
It is a parallel concordancing program for Windows which works with aligned bilingual corpora, enabling a search to be made for all examples of a given expression and, simultaneously, for the respective translations which occur in the target text. The resulting parallel concordance can be viewed on the screen, sorted and saved as a text file (see Figs. 6–9). The pedagogical value of raising awareness of contrastive features, even on a small scale, is self-evident. If it can be demonstrated that a small, unannotated corpus can reveal important contrastive features, providing insights that go beyond what a substantial bilingual desk dictionary can provide, students should come to appreciate better the significance of context. They should then see bilingual dictionaries in their correct perspective and perceive the value of exploring translation strategies further with the help of better translation corpora and corpus analysis tools.


The semantic and pragmatic significance of textual context could be revealed to students through Tim Johns’s Data Driven Learning (DDL) approach to monolingual corpora, via MicroConcord (Johns & Scott 1993). Now Multiconcord takes DDL a step further, exposing learners to authentic translation corpora for similar purposes, facilitating empirical problem solving in a bilingual, contrastive dimension. It is the aim of the present paper to point out how Multiconcord, as one tool on the translator’s workbench, can work effectively in conjunction with Minmark, its associated markup program, and Microsoft Word for the alignment and editing of results. Multiconcord returns up to 250 hits for a given search. Considering the permutations that are possible with the specification of context words, this is adequate for most learning purposes and contrastive data can be readily discovered that suggest a wider range of translation equivalents than even major printed bilingual dictionaries typically include. I will outline procedures and techniques for preparing a bilingual translation corpus for teaching and learning purposes and as a springboard for research. This is based on a small English/French corpus of approximately 3,000 words in each language. Then I will show some sample results of a Multiconcord search of George Orwell’s novel Nineteen Eighty-Four and its translation into Czech and Lithuanian.

Techniques

The first step is to open in Word the source and target texts constituting the required bilingual corpus. In the File menu, Page Setup is selected and, under the Margins tab, the left and right margins are adjusted to permit viewing both files in parallel windows on the screen. For the text which is to appear on the left of the screen, the right margin is set to approximately 11 cm, and for the text which is to appear on the right, a correspondingly wide left margin is set. By selecting Arrange All in the Window menu and then dragging the edges of the panes, the parallel arrangement seen in Figure 1 is achieved. The English/French translation corpus sampled here is taken from TransIt-TIGER English-French (Corness et al. 1997).

Figure 1. Texts in parallel windows

The texts can now be aligned. Multiconcord requires texts aligned at paragraph level; the program uses an algorithm to identify the sentences containing the equivalents of the source language search words in the target text for display in the concordance. The first step in the alignment process is to choose Select All in the Edit menu and click on the Numbering icon so that Word numbers the start of each paragraph. However, a caution is needed here. If there are numerals at the beginning of paragraphs in the text itself, Word replaces these numbers by ‘bullets’ (paragraph markers in the form of small circles, squares or diamonds etc: ● ■ ❒ ◆ ✓ ➣) without warning. In this case, it is vital to click on the Bullets icon first, selecting the option not to replace existing numbers by bullets. Clicking the Numbering icon then replaces bullets by numbering without affecting the ‘real’ numbers (see Figure 2). Editing either text to add or remove paragraph breaks will result in automatic changes in the numbering, enabling alignment to be checked and adjusted. Once the alignment has been completed, the numbering is no longer needed and may be removed. The aligned texts should now be saved as plain text files.

Figure 2. Texts with automatic paragraph numbering

The next stage is to mark up the texts so that they can be recognised and searched by Multiconcord. For this the accompanying Minmark program is used. A text is selected via the Minmark File menu (see Figure 3).

Figure 3. Selecting a text file for markup

By providing a valid filename for the marked-up counterparts of the selected texts at the prompt, pairs of files ready for Multiconcord are generated (see Figure 4). At this stage, it is essential to create pairs of files with names which are identical except for the extensions. English files must have the extension .en, French files .fr, German files .de, Spanish files .es etc. Minmark inserts the following markers needed by Multiconcord. The beginning and end of the text are marked as and respectively.






Figure 4. Saving marked-up files

Paragraph breaks are marked by <p> and sentence breaks by <s>. The marked-up text can be viewed in Word and edited further if required, but any alterations to the paragraph structure at this stage entail the manual addition or removal of <p> markers as appropriate. A quick count of <p> markers in each text can be done in Word via the Replace facility in the Edit menu. ‘Replacing’ <p> with itself will not change the text, but Word will report the number of ‘replacements’ made each time.

Where the translation merges two sentences of the original into one, or splits one sentence of the original into two, Multiconcord detects the difference in the number of sentences within the corresponding paragraphs and the algorithm is usually able to locate the correct place in the target text. The user has the option to select paragraph view to compare whole paragraphs when necessary.

The first step when running Multiconcord (see Figure 5 below) is to select the Search language and Target language. Any filenames in the selected directory valid for the selected language pair will then appear in the Available Files list. Any number of the available files may be selected for searching. The search word or words are then typed in the Search for box and added, followed by a click on the Start Search button. A report on each file searched, giving the number of hits, appears after a few seconds. The search results can then be viewed and sorted alphabetically on the search word itself or on the first, second or third word to the left or right of the search word (see Figure 6 below).
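The checks described above (file pairs that differ only in their language extension, equal numbers of paragraph markers, and differing sentence counts within corresponding paragraphs) can also be made outside Word. The following Python sketch is an illustration only, not part of Multiconcord or Minmark. It assumes that each paragraph is introduced by a <p> marker and each sentence by an <s> marker, and the function names are hypothetical.

```python
from pathlib import Path


def available_pairs(directory, source_ext, target_ext):
    """Return the file stems present with both extensions, e.g. text.en and text.fr."""
    folder = Path(directory)
    source_stems = {path.stem for path in folder.glob(f"*.{source_ext}")}
    target_stems = {path.stem for path in folder.glob(f"*.{target_ext}")}
    return sorted(source_stems & target_stems)


def sentence_counts(marked_up_text):
    """Return one <s> count per paragraph (assumes every paragraph starts with <p>)."""
    paragraphs = marked_up_text.split("<p>")[1:]  # anything before the first <p> is ignored
    return [paragraph.count("<s>") for paragraph in paragraphs]


def compare_markup(source_text, target_text):
    """Report whether paragraph counts agree and which paragraphs differ in sentence count."""
    source_counts = sentence_counts(source_text)
    target_counts = sentence_counts(target_text)
    same_paragraph_count = len(source_counts) == len(target_counts)
    # Paragraph numbers are 1-based, matching the numbering used during alignment.
    differing = [
        number
        for number, (src, tgt) in enumerate(zip(source_counts, target_counts), start=1)
        if src != tgt
    ]
    return same_paragraph_count, differing
```

A paragraph reported as differing is not necessarily misaligned: it may simply be one where the translator merged or split sentences, which is the case the paragraph view is intended for.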


Figure 5. Search options in Multiconcord

Figure 6. Multiconcord search results






All examples are initially marked C1 (category 1). They can be changed to C2, C3 or C4, representing categories determined by the user, and the results re-sorted. Four categories may prove insufficient, however, and more sophisticated categorisation and sorting can be done subsequently in Word (see under Figure 7).
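Sorting concordance lines on the search word, or on a word a fixed number of positions to its left or right, can be sketched as follows. This is a minimal Python illustration of the general idea, not Multiconcord's own routine. It assumes that each hit has already been split into left context, keyword and right context, and the function names are hypothetical.

```python
import string


def context_word(hit, offset):
    """Return the word `offset` positions from the keyword (0 = the keyword itself).

    Positive offsets count to the right of the keyword, negative offsets to the left.
    """
    left, keyword, right = hit
    if offset == 0:
        word = keyword
    elif offset > 0:
        words = right.split()
        word = words[offset - 1] if len(words) >= offset else ""
    else:
        words = left.split()
        word = words[offset] if len(words) >= -offset else ""
    return word.strip(string.punctuation).lower()


def sort_hits(hits, offset=0):
    """Sort hits alphabetically on the chosen context position."""
    return sorted(hits, key=lambda hit: context_word(hit, offset))
```

For example, `sort_hits(hits, 2)` orders the lines by the second word to the right of the search word, and `sort_hits(hits, -1)` by the first word to its left.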

Figure 7. Saving concordance results

Figure 8. Initial results file in Word (unformatted)


Figure 9. Results with some initial formatting

The results as currently sorted are saved in the Test screen, which also permits the creation of cloze tests based on the search words (see Figure 7 above). The results may be saved in a file with search and target language interleaved. For sorting of the data, however, tabular format is more practical. If the saved file is opened in Word, it does not at first appear in a very usable form (see Figure 8 above), but if the text is converted to a table in Word, selecting Tabs as the separator, a parallel results file is created, which can be easily edited and sorted. The first step might be to change to bold typeface (using Edit/Replace) all occurrences of the search word in the search text and the respective translations in the target text (see Figure 9 above). A third column can be added, categorisation descriptions or codes entered here and the file re-sorted on this column.
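The tabular step described above (a source column, a target column and a third column of category codes on which the file is re-sorted) can be sketched in Python as follows. This is an illustration under stated assumptions rather than a description of the saved file format: it assumes one hit per line with the source and target text separated by a tab, and the rule list pairing a target-language stem with a category label is hypothetical.

```python
def load_rows(saved_text):
    """Split tab-separated concordance lines into [source, target] rows."""
    rows = []
    for line in saved_text.splitlines():
        cells = line.split("\t")
        if len(cells) >= 2:  # skip blank or malformed lines
            rows.append([cells[0].strip(), cells[1].strip()])
    return rows


def add_category_column(rows, rules, default="other"):
    """Append a category to each row and return the rows sorted on that column.

    `rules` is a list of (stem, label) pairs; the first stem found in the
    target text decides the label.
    """
    labelled = []
    for source, target in rows:
        lowered = target.lower()
        label = next((name for stem, name in rules if stem in lowered), default)
        labelled.append([source, target, label])
    return sorted(labelled, key=lambda row: row[2])
```

Substring rules of this kind are crude, so the resulting categories would still need to be checked by hand, as with the manual coding in Word.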

Sample parallel concordancing results

The purpose of the limited experiment described here is to show an example of how Multiconcord can be used to explore a corpus consisting of a single novel in English (George Orwell’s Nineteen Eighty-Four) and its translations into Czech and Lithuanian. Bearing in mind that English phrasal verbs typically have a broad semantic range, it was decided to check the variety of translations of a selected phrasal verb suggested by a major English-Czech desk dictionary and an English-Czech dictionary of phrasal verbs and to compare these with the translations found in Nineteen Eighty-Four. The phrasal verb pick up was chosen, more or less at random. The four-volume English-Czech dictionary by Hais and Hodek (1991–1993) gives 58 different translations of pick up, as follows:

dát dohromady; dodat si; dohánět; dostat; chopit se; chytat/chytit/chytnout; chytat za slovo; chytit se; koupit; nabalit si; nabrat; najít; naložit; narazit na; nasbírat; navázat známost; objevit; podat uvazovací lano na (přístavní bóji); pochytit; posbírat; postavit se na nohy; probudit k životu; přibrat; přidat; přijet si pro; rozpálit; sbalit; sbalit své věci; sbírat (se)/sebrat (se); sehnat; seznámit se náhodou; schrastit; splašit; stlouci/stloukat; uklidit; ukořistit; vybrat si (spoluhráče); vytáhnout; vzchopit se; vzít do ruky; vzít s sebou; vzmáhat se; zabrat; zadržet; zachránit; zachytit; zajistit; zajmout; zaměřit se na; zastavit se pro; zatknout; zesílit; zlepšovat se; znovu najít; znovu sledovat; zotavit se; zrychlit; zvedat (se)/zvednout (se)

The English-Czech dictionary of phrasal verbs by Lukáš Vodička (1992) offers 22 additional translations, with some references to usage but without contextualised examples:

být připravený zaplatit; dozvědět se; kárat; mít se k zaplacení; nahodit; nalodit; napojit se; napomínat; naskočit; navázat (na téma); opravovat; oživit se; přijít k; rozjet se; rozkopat; spřátelit se; svézt; vydělávat si; vyzvednout (si); vzít si (e.g. taxi); získat; znovu se chytit

In George Orwell’s Nineteen Eighty-Four there are 25 occurrences of pick up in all, and 12 different translations of it were found in the Czech version (see Table 1). Eleven cases were identified where there was a close semantic match between one of the 58 Czech equivalents of pick up suggested by Hais and Hodek (H&H) and occurrences in the Orwell novel. These are (an approximate general equivalent in English is shown for each expression) vzít do ruky ‘take hold of’; zachycovat/zachytit ‘catch’, ‘detect’ (see Note 1); zastavit se pro ‘call for’, ‘collect’; zvednout ‘raise’:

(1) As Winston wandered towards the table his eye was caught by a round, smooth thing that gleamed softly in the lamplight, and he picked it up.
    Winston přikročil ke stolu a jeho pozornost upoutala okrouhlá hladká věcička, která se jemně leskla ve světle lampy; vzal ji do ruky.

(2) He picked up his pen half-heartedly, wondering whether he could find something more to write in the diary.
    Lhostejně vzal do ruky pero a uvažoval, zda přijde ještě na něco, co by zapsal do deníku.
    cf. H&H: he always picked up your personal stuff and looked at it vzal vždycky do ruky váš osobní materiál a prohlédl si jej

(3) In a place like this the danger that there would be a hidden microphone was very small, and even if there was a microphone it would only pick up sounds.
    Na takovém místě hrozilo minimální nebezpečí, že by tam byl skrytý mikrofon, a i kdyby tam byl, zachycoval by pouze zvuky.

(4) Any sound that Winston made, above the level of a very low whisper, would be picked up by it, moreover, so long as he remained within the field of vision which the metal plaque commanded, he could be seen as well as heard.
    Každý zvuk, který Winston vydal a jenž byl hlasitější než velmi tiché šeptání, obrazovka zachycovala; a co víc, pokud zůstával v zorném poli kovové desky, bylo ho vidět a slyšet.

(5) To keep your face expressionless was not difficult, and even your breathing could be controlled, with an effort: but you could not control the beating of your heart, and the telescreen was quite delicate enough to pick it up.
    Nebylo těžké zachovat bezvýraznou tvář a s jistým úsilím mohl člověk kontrolovat i dech; nikoli však bušení srdce, obrazovka byla natolik citlivá, že je zachycovala.

(6) He and Julia had spoken only in low whispers, and it would not pick up what they had said, but it would pick up the thrush.
    Hovořili s Julií sice jen šeptem a mikrofon by nezachytil, co říkali, ale zachytil by drozda.

(7) He and Julia had spoken only in low whispers, and it would not pick up what they had said, but it would pick up the thrush.
    Hovořili s Julií sice jen šeptem a mikrofon by nezachytil, co říkali, ale zachytil by drozda.

(8) There were no telescreens, of course, but there was always the danger of concealed microphones by which your voice might be picked up and recognized; besides, it was not easy to make a journey by yourself without attracting attention.
    Obrazovky tu samozřejmě nebyly, ale mohly tu být skryté mikrofony, kterými mohli zachytit a dešifrovat váš hlas; kromě toho nebylo snadné vydat se sám na cestu, aniž to vyvolalo pozornost.
    cf. H&H: zachytit enemy planes picked up by our radar installations

(9) Perhaps you could pick it up at my flat at some time that suited you?
    Možná byste se pro něj mohl někdy zastavit u mě doma, až se vám to bude hodit.
    cf. H&H: I’ll pick you up at your house zastavím se pro tebe doma; přijedu si pro tebe domů

(10) O’Brien picked up the cage, and, as he did so, pressed something in it.
    O’Brien zvedl klec a něco na ní stiskl.

(11) She picked the stove up and shook it.
    Zvedla vařič a zatřásla jím.
    cf. H&H: He bent down to pick up his hat

Two cases were found where a potential translation equivalent given in H&H was adopted in the Orwell translation but where there was not a close semantic match, viz. nabrat ‘take up’; posbírat ‘gather up’:

(12) With the tip of his finger he picked up an identifiable grain of whitish dust and deposited it on the corner of the cover, where it was bound to be shaken off if the book was moved.
    Nabral špičkou prstu drobné zrnko bělavého prachu a položil je na roh desek; kdyby s deníkem někdo pohnul, musel by je setřást.
    cf. H&H: nabrat … pletacím drátem [with a knitting needle]; where did you pick up with that queer fellow?; pick up speed nabrat rychlost

(13) Pick up those pieces, he said sharply.
    Posbírejte to, řekl zostra.
    cf. H&H: bits of information, souvenirs he had picked up all over the world

Finally, there were twelve examples of translations of pick up in Nineteen Eighty-Four which were not found in H&H, using the verbs nadzvednout se ‘rise’; popadnout ‘seize’, ‘snatch’; uchopit ‘grasp’; vzít ‘take’; vzít si ‘take with you’; zdvihnout ‘raise’. Only one of these verbs (vzít si) is mentioned in Vodička:

(14) The girl picked herself up and pulled a bluebell out of her hair.
    Dívka se nadzvedla a vytáhla si z vlasů modrý zvonek.

(15) The dark-haired girl behind Winston had begun crying out Swine! Swine! Swine! and suddenly she picked up a heavy Newspeak dictionary and flung it at the screen.
    Tmavovlasá dívka za Winstonem začala vykřikovat Svině! a zničehonic popadla těžký slovník newspeaku a mrštila jím do obrazovky.

(16) He picked up his pen again and wrote:
    Opět uchopil pero a psal:

(17) He drank another mouthful of gin, picked up the white knight and made a tentative move.
    Vlil do sebe další doušek ginu, uchopil bílého jezdce a zkusmo táhl.

(18) He turned back to the chessboard and picked up the white knight again.
    Vrátil se k šachovnici a znovu uchopil bílého jezdce.

(19) He picked up the children’s history book and looked at the portrait of Big Brother which formed its frontispiece.
    Vzal dětskou učebnici dějepisu a zadíval se na portrét Velkého bratra na titulní straně.

(20) Someone had picked up the glass paperweight from the table and smashed it to pieces on the hearth-stone.
    Někdo vzal ze stolu skleněné těžítko a rozbil ho na kusy o krb.

(21) O’Brien picked up the cage and brought it across to the nearer table.
    O’Brien vzal klec a přenesl ji k bližšímu stolu.

(22) He picked up the white knight and moved it across the board.
    Vzal bílého jezdce a táhl jím po šachovnici.

(23) Let’s pick up a gin on the way.
    Cestou si vezmeme gin.
    cf. Vodička: e.g. taxi

(24) He picked up his glass and drained it at a gulp.
    Zdvihl sklenku a naráz ji vypil.

(25) He saw Julia pick up her glass and sniff at it with frank curiosity.
    Viděl, jak Julie zdvihla sklenku a přivoněla k ní s upřímnou zvědavostí.

To summarise, for twelve out of the twenty-five occurrences of a randomly selected lexical unit in the novel, six different plausible translations are found which do not occur in the authoritative bilingual dictionary used as a point of reference (though one of them is mentioned in the dictionary of phrasal verbs). Additionally, on the corpus evidence, two translations given in the dictionary are found to have a broader range of semantic equivalents than the dictionary mentions.

Table 1. Summary of comparison between H&H and translations of pick up in Nineteen Eighty-Four

Eleven translations of pick up in Orwell matching those found in H&H:
    vzít do ruky (2); zachycovat (3) / zachytit (3); zastavit se pro (1); zvednout (2)
Two translations of pick up found in H&H but not matching Orwell:
    nabrat (1); posbírat (1)
Twelve translations of pick up in Orwell not found in H&H:
    nadzvednout se (1); popadnout (1); uchopit (3); vzít (4); vzít si (1) [found in Vodička]; zdvihnout (2)

If the evidence of the Lithuanian translation of Nineteen Eighty-Four is compared with an English-Lithuanian dictionary, a similar discrepancy is found. As equivalents of pick up, the English-Lithuanian/Lithuanian-English Dictionary by Bronius Piesarskas & Bronius Svecevičius (1997) gives the following: surinkti, pakelti, pasitaisyti, pagerėti, pagauti, greit išmokti, pavežti, atsitiktinai susipažinti, išgelbėti (skęstantį), sugauti (bėglį)

Of these, the Lithuanian translation attests one example of surinkti ‘gather up’, four of pakelti ‘raise’ and one of pagauti ‘catch’. Nineteen out of twenty-five occurrences of pick up are thus unaccounted for by the dictionary. Although forms of paimti ‘take’ are found as translations in ten cases and there are also two examples of pasiimti ‘take’/‘take with you’, neither of these verbs is given in the dictionary as an equivalent of pick up. This is a parallel phenomenon to the omission of the Czech verb vzít ‘take’, occurring four times in the translation, from the English-Czech dictionary and raises the question as to whether there is a tendency for dictionary compilers to focus on more specialised, less frequent, meanings of the phrasal verb while omitting more common ones. Other equivalents of pick up found in the Lithuanian translation are užrašyti ‘record’, užfiksuoti ‘note’, stverti ‘seize’, ‘snatch’ and atsisėsti ‘sit up’.
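The figures quoted in the summary can be checked against the counts in Table 1 and in the preceding paragraph. The short Python tally below is only an arithmetic check: the counts are taken from the text, while the group labels and variable names are chosen here for illustration.

```python
# Occurrence counts for Czech translations of "pick up", as listed in Table 1.
czech = {
    "matching H&H": {
        "vzít do ruky": 2, "zachycovat": 3, "zachytit": 3,
        "zastavit se pro": 1, "zvednout": 2,
    },
    "in H&H but not matching": {"nabrat": 1, "posbírat": 1},
    "not in H&H": {
        "nadzvednout se": 1, "popadnout": 1, "uchopit": 3,
        "vzít": 4, "vzít si": 1, "zdvihnout": 2,
    },
}

TOTAL_OCCURRENCES = 25

czech_totals = {group: sum(counts.values()) for group, counts in czech.items()}
czech_sum = sum(czech_totals.values())

# Lithuanian translations that are also listed in the English-Lithuanian dictionary.
lithuanian_in_dictionary = {"surinkti": 1, "pakelti": 4, "pagauti": 1}
lithuanian_unaccounted = TOTAL_OCCURRENCES - sum(lithuanian_in_dictionary.values())

print(czech_totals)            # group totals: 11, 2 and 12
print(czech_sum)               # 25
print(lithuanian_unaccounted)  # 19
```

The three Czech groups add up to the 25 occurrences in the novel, and the six dictionary-attested Lithuanian cases leave the nineteen occurrences described as unaccounted for.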

Translation equivalents

A comparison of the two translations shows something of the translators’ respective strategies. To consider this, the various meanings of pick up in the English text can be categorised and contextualised, so that translation of meaning in context can be assessed and other potential factors then considered. One significant use of pick up is in relation to the concept of the all-pervasive surveillance by Big Brother which is central to the theme of the novel. The Lithuanian version reveals a greater variety of expression here, different semantic components being selected for emphasis. The detection of conversations by hidden microphones is rendered throughout in the Czech translation by the verb zachycovat/zachytit ‘catch’, ‘detect’, whereas there are four different equivalents in the Lithuanian version. Only in one case is the verb pagauti ‘catch’ found, corresponding closely to the Czech zachycovat/zachytit ‘catch’, ‘detect’:

(26) English: Any sound that Winston made, above the level of a very low whisper, would be picked up by it, moreover, so long as he remained within the field of vision which the metal plaque commanded, he could be seen as well as heard.
     Lithuanian: Jis pagaudavo bet kurį Vinstono sukeltą garsą, bent kiek smarkesnį už tylų šnibždesį; dar daugiau — kol jis neišeidavo iš lėkštės apimamo ploto, galėdavo būti ne tik girdimas bet ir matomas.
     Czech: Každý zvuk, který Winston vydal a jenž byl hlasitější než velmi tiché šeptání, obrazovka zachycovala; a co víc, pokud zůstával v zorném poli kovové desky, bylo ho vidět a slyšet.

There are examples in which, by using the verb užrašyti ‘record’, the translator has introduced a semantic component not explicit in pick up but derived from the wider situational context of the novel, indicating that conversations were universally recorded and used as evidence:

(27) English: He and Julia had spoken only in low whispers, and it would not pick up what they had said, but it would pick up the thrush.
     Lithuanian: Jiedu su Džulija kalbasi tiktai pašnibždom, ir mikrofonas nepajėgtų užrašyti jų žodžių, bet strazdą užrašytų.
     Czech: Hovořili s Julií sice jen šeptem a mikrofon by nezachytil, co říkali, ale zachytil by drozda.

(28) English: In a place like this the danger that there would be a hidden microphone was very small, and even if there was a microphone it would only pick up sounds.
     Lithuanian: Tikimybė, kad tokioje vietoje paslėptas mikrofonas, buvo labai maža; net jeigu ir yra mikrofonas, tai užrašys tik garsus.
     Czech: Na takovém místě hrozilo minimální nebezpečí, že by tam byl skrytý mikrofon, a i kdyby tam byl, zachycoval by pouze zvuky.

The example of užfiksuoti ‘note’ is a similar case. The phrase galėjo viską užfiksuoti also emphasises the capability of the all-powerful state to record everything that anybody said:

(29) English: To keep your face expressionless was not difficult, and even your breathing could be controlled, with an effort: but you could not control the beating of your heart, and the telescreen was quite delicate enough to pick it up.
     Lithuanian: Išlaikyti veidą nereikšmingą buvo nesunku, pasistengus galima suvaldyti ir kvėpavimą, bet širdies plakimo taip lengvai nesukontroliuosi, o teleekranas buvo pakankamai jautrus ir galėjo viską užfiksuoti.
     Czech: Nebylo těžké zachovat bezvýraznou tvář a s jistým úsilím mohl člověk kontrolovat i dech; nikoli však bušení srdce, obrazovka byla natolik citlivá, že je zachycovala.

In one case, by contrast, the semantic component ‘detect’ is subsumed under the component ‘recognize’:

(30) English: There were no telescreens, of course, but there was always the danger of concealed microphones by which your voice might be picked up and recognized;
     Lithuanian: Žinoma, čia nėra teleekranų, bet gali būti paslėptų mikrofonų, iš kurių tavo balsas būtų atpažintas.
     Czech: Obrazovky tu samozřejmě nebyly, ale mohly tu být skryté mikrofony, kterými mohli zachytit a dešifrovat váš hlas;

The most common meaning of pick up is ‘take hold of’, with or without the additional semantic component ‘raise’, only mildly inherent in the particle. As already mentioned, there are 12 examples of the use of paimti or pasiimti ‘take’ as the translation of this concept and 4 of pakelti ‘raise’. The context does not always provide clear authority for variation between paimti and pakelti, as can be seen from the following examples of similar contexts, but variety for its own sake can be a valid stylistic decision. Of course, raising one’s glass could have a very different meaning in English from simply picking it up:

(31) English: He picked up his glass and drained it at a gulp.
     Lithuanian: Jis paėmė <- raise> stiklinę ir vienu ypu išgėrė.
     Czech: Zdvihl <+ raise> sklenku a naráz ji vypil.

(32) English: He saw Julia pick up her glass and sniff at it with frank curiosity.
     Lithuanian: Jis matė, kaip Džulija pakėlė <+ raise> taurę ir neslėpdama smalsumo pauostė.
     Czech: Viděl, jak Julie zdvihla <+ raise> sklenku a přivoněla k ní s upřímnou zvědavostí.

The Czech translation of pick up in this general sense also shows greater variation, in that the Czech verbs nabrat ‘gather up’, uchopit ‘grasp’, vzít ‘take’, vzít do ruky ‘take hold of’, vzít si ‘take with you’, zdvihnout ‘raise’ and zvednout ‘raise’ respectively are met where plain paimti is found in Lithuanian. Pick up in the sense of ‘collect from somewhere and take away’ has an equivalent fixed expression in Czech. The Lithuanian version here is rather more descriptive (‘call at my house and take [it] with you’), using the general verb pasiimti just mentioned:

(33) English: Perhaps you could pick it up at my flat at some time that suited you?
     Lithuanian: Gal galėtumėt kokiu patogiu laiku užeiti pas mane namo ir pasiimti?
     Czech: Možná byste se pro něj mohl někdy zastavit u mě doma, až se vám to bude hodit.

Lithuanian stverti ‘seize’, ‘snatch’ is found where a sudden, impulsive action is indicated by the context. This verb incorporates the semantic component ‘suddenly’, explicit in the original English context. The Czech version has the verb popadnout (‘seize’, ‘snatch’), which in itself expresses the impulsiveness of the action, yet a reinforcing adverb zničehonic ‘all of a sudden’ is also included:

(34) English: The dark-haired girl behind Winston had begun crying out Swine! Swine! Swine! and suddenly she picked up a heavy Newspeak dictionary and flung it at the screen.
     Lithuanian: Tamsiaplaukė mergina už Vinstono nugaros pradėjo šaukti “Kiaulė! Kiaulė! Kiaulė!”, paskui stvėrė storą naujakalbės žodyną ir metė jį į ekraną.
     Czech: Tmavovlasá dívka za Winstonem začala vykřikovat Svině! a zničehonic popadla těžký slovník newspeaku a mrštila jím do obrazovky.

Both the Czech and Lithuanian versions render pick up by an unambiguous verb meaning ‘gather up’:

(35) English: Pick up those pieces, he said sharply.
     Lithuanian: Surinkit tas šukes, — griežtai pasakė jis.
     Czech: Posbírejte to, řekl zostra.

The reflexive pick oneself up is also rendered in both versions by an unambiguous verb meaning ‘sit up’:

(36) English: The girl picked herself up and pulled a bluebell out of her hair.
     Lithuanian: Mergina atsisėdo ir išsiėmė iš plaukų katilėlį,
     Czech: Dívka se nadzvedla a vytáhla si z vlasů modrý zvonek.

Conclusion

The outcome of the present experiment suggests that parallel corpora are a resource that cannot be ignored in translation studies. Data extracted from translation corpora offer considerable potential for contrastive analysis of the respective patterns of linguistic forms which express given semantic content. This view is supported by Rūta Marcinkevičienė, whose paper on parallel corpora and bilingual lexicography “starts from the position that parallel corpora (i.e. texts of source language and target language, aligned on the level of sentence) can considerably improve bilingual dictionaries and other tools of translators” (Marcinkevičienė 1998:40).

Bilingual lexicographers may have reservations concerning the validity of translation corpora as a source of empirical evidence for the improvement of bilingual dictionaries, as translators may be subject to interference from the language of the original. Wolfgang Teubert writes that

    it still remains to be seen what [parallel corpora] really can contribute to multilingual lexicography… Translations, however good and near-perfect they may be (but rarely are), cannot but give a distorted picture of the language they represent. (Teubert 1996:247)

An example is given by Martin Gellerstam (1996). Comparing original Swedish novels and English novels translated into Swedish, he has shown that certain linguistic features in Swedish are overused by Swedish translators under the influence of English. However, it does not follow from this that translation studies should ignore the evidence of translation corpora; rather, it means that this evidence of the intuitive knowledge of translators should be considered alongside the evidence of comparable corpora representing writing by native speakers of the respective languages. Translation corpora yield, inter alia, valuable evidence of translation problems and translation strategies, especially if alternative versions are included. Insights into the sources of such problems and into the motivation of strategies for their solution are central to pedagogy and to academic research in this field.

Note .

Zachycovat/zachytit are considered here as different aspects of the same verb.

References Primary Sources Erjavec, T., Lawson, A. & Romary, L. (eds). 1998. East Meets West: a compendium of multilingual resources. Mannheim: TELRI. Orvelas, D¦zord¦zas 1991. 1984-ieji. [Translated into Lithuanian by Virgilijus Cepliejus] Vilnius: Vyturys. Orwell, George 1949. Nineteen Eighty-Four: a novel. Harmondsworth: Penguin Books. Orwell, George 1949. Nineteen Eighty-Four. New York: New American Library. Orwell, George 1991. 1984. Praha: Na¦se vojsko. [Anonymous Czech translation]

Secondary sources Aijmer, K., Altenberg, B. & Johansson, M. (eds). 1996. Languages in contrast: papers from a symposium on text-based cross-linguistic studies, Lund 4–5 March 1994. Lund: Lund University Press. Corness, P. J., Daniels, C. R., Deepwell, F. H., Haydon, D., Holland, M., Read, F., Thompson, D., Thompson, J. 1997. TransIt-TIGER English-French. London: Hodder & Stoughton. Gellerstam, M. 1996. “Translations as a source for cross-linguistic studies”. In Languages in contrast: papers from a symposium on text-based cross-linguistic studies, Lund 4–5 March 1994, K. Aijmer, B. Altenberg & M. Johansson (eds), 53–62. Lund: Lund University Press. Hais, K. & Hodek, B. 1991–1993. English-Czech dictionary (4 vols.). Praha: Academia. Johns, T. F. & Scott, M. 1993. MicroConcord: an introduction to the practices and principles of concordancing in language teaching. Oxford: Oxford University Press. Johns, T.F. & Scott, M. 1993. MicroConcord. Oxford: Oxford Electronic Publishing. Marcinkevi¦cien˙e, R. 1998. “Parallel corpora and bilingual lexicography”. In Germanic and Baltic Linguistic Studies and Translation: proceedings of the international conference held at the University of Vilnius, Lithuania, 22–24 April 1998, A.Usenien˙e (ed.), 40–47.






Piesarskas, B. & Svecevičius. 1997. English-Lithuanian/Lithuanian-English dictionary. Vilnius: Žodynas Publishers.

Teubert, W. 1996. “Comparable or parallel corpora?” International Journal of Lexicography 9 (3): 238–264.

Vodička, L. 1992. Anglicko-český slovník frázových sloves. Praha: Fragment a Práh.

Woolls, D. 1998. Multiconcord [multilingual concordancing program, incorporating Minmark markup program]. Birmingham: CFL Software Development. [http://ourworld.compuserve.com/homepages/davidcfl. Funded by the Lingua Office of the European Union. http://web.bham.ac.uk/johnstf/lingua.htm]

General index

 adjective 32, 33, 98, 103, 112, 123, 138, 145, 157, 175-178, 180, 208, 211, 218, 219, 223-226, 298, 299, 302 adverb 19, 20, 106, 157, 177, 323 aligning 42, 44, 271-273, 285, 288, 307 alignment 10, 11, 13, 39, 40, 45-48, 271-288, 309, 311 ambiguity 29, 35, 36, 120, 133, 149, 238-240, 291-293, 296, 297, 304 anaphora 253 animacy 182 annotation 307, 308 Arbeit 194, 199-203, 205-210, 212 automatic translation 37, 231, 233, 237, 291, 293, 297, 302, 303 avoir 56, 69, 70, 125, 126, 235, 237, 241, 281  back-translation 17, 29 balanced corpus 9 based on 36, 291, 297 bi-text 271, 272, 274, 275, 278, 287 bilingual concordancer 271 bilingual corpus 44, 47, 216, 274, 279, 288, 293, 308, 309 bilingual dictionary 21, 30, 33-35, 43, 52, 54, 55, 81, 190, 203, 204, 210-229, 235, 237, 247, 279, 284, 305, 308, 309, 319, 324 bilingual glossary 249, 283 bilingual lexicon 10 British National Corpus (BNC) 35, 60, 216, 218, 222, 224, 226, 229, 291, 298 Brown Corpus 40, 121, 292

 Canadian Hansard Corpus 11, 278 case 33, 83-92 caso 33, 83-90 CAT 308 Catalan viii, 35, 216, 217, 219-221, 224, 225, 228, 229 causative 19, 41, 97-116, 123, 125, 129, 136, 137, 145, 147 causative construction 19, 105 causative verb 19, 98, 99, 102, 103, 105, 106, 108, 109 Chinese viii, 31, 115, 116, 151, 152, 156, 157, 167, 171-174 co-occurrence 5, 26, 27, 32, 37, 60, 94, 209, 279, 289, 293, 300, 304 co-selection 32, 75, 77, 78, 80, 91, 92, 94 cognate 10, 19, 127, 280 cognitive linguistics 149, 174, 190, 191, 195, 196, 221 cognitive semantics 48, 150-153, 174 cognitive universals 152 cold 35, 145, 171, 216, 218-225, 228 colligation 22, 32 collocate 60, 77, 207, 300, 301, 304 collocation 8, 22, 26, 27, 32, 42, 44, 47, 77, 78, 88, 95, 182, 200-202, 208, 209, 211, 222, 241, 242, 291, 298-303, 305 comparable corpus 7-9, 16, 17, 30, 40, 81, 83, 91, 93, 100, 115, 216, 293, 324 complex preposition 83, 84, 86, 297 compound 36, 180, 200, 208, 281, 291, 294 computational lexicography 7, 305 computational linguistics vii, 38, 42, 44, 45, 94, 288, 305

 conceptual ontology 191-196, 199, 203, 211, 212 concordance 13, 33, 47, 76, 77, 82-85, 89, 93, 95, 183, 303, 305, 308, 309 concordancer 13, 42, 271 connector 19 contain 20, 52-60, 68-71 contextual correspondence 284 contrastive analysis 5, 16, 18, 28, 43, 45, 46, 127, 147, 152, 228, 307, 308, 324 contrastive linguistics vii, 3, 5, 6, 43, 46, 57, 60, 61, 74, 79, 307, 308 corporate memory 271 corpus-based vii, viii, 4, 14, 15, 18, 26, 29, 31, 32, 36-39, 41, 43, 46, 74, 75, 94, 97, 115, 153, 183, 187, 204, 215, 217, 226, 247, 305 corpus-based dictionary 41, 217 corpus-based lexicography 226 corpus-driven 15, 32, 43, 73-78, 81, 94, 204 cross-linguistic lexicology 48, 116, 119, 148, 150, 190-192 Czech viii, 309, 315, 316, 320-325  Danish 127 data-driven learning 95 dictionary entry 35, 216, 218, 228 disambiguation 37, 119, 120, 123, 147, 240, 241, 266, 291, 292, 298, 304, 305 domain-specific corpora 8 down 31, 151-157, 161-173 Dutch 10, 24, 26, 30, 34, 35, 46, 47  electronic lexicon 34, 35 ellipsis 31, 177-179, 183 English viii, ix, x, 5, 6, 9-11, 18-21, 23-26, 29-37, 39-48, 52-55, 57-61, 64, 73, 83-85, 87, 89, 90, 92-94, 97-116, 119, 121, 124, 125, 127-131, 133-137, 139-141, 144, 147-152, 156, 167, 170-175, 177-185, 189-191, 194-196, 198, 199, 205, 216-222, 224, 226-229, 231-233, 236, 247, 249-256, 258, 261, 264-267, 273, 276, 278, 280, 281, 287, 289, 291, 292, 295-299, 301, 304, 305, 309, 311, 315, 316, 319-326

English-Norwegian Parallel Corpus 9, 10, 29, 45 English-Swedish Parallel Corpus 9, 23, 25, 41, 100, 121, 140, 148 equivalence viii, 15-18, 21, 22, 33-35, 40, 46-49, 51, 53, 55-57, 60, 73, 79-81, 85, 88, 91, 95, 191, 199, 222, 229, 242, 245, 257, 258, 259, 266, 273, 274, 276, 279, 280, 282, 284, 287 EU documents 190, 201, 205, 213 European Court of Human Rights 249, 250 European Parliament 190, 203, 205, 282 EuroWordNet 35, 45, 48, 196, 306 experiential grounding 157, 160, 162, 164, 165, 167 experiential realism 151, 160  få 19, 23, 98, 101-104, 107-113, 119, 121-147 faire 125, 137, 202 figure of speech 175, 176 Finnish viii, 10, 23, 24, 121, 124, 125, 127, 128, 130, 136, 137, 147, 190 fixed expression 323 fork 76-78, 94 frame semantics 23, 27, 39, 41 French viii, 10, 11, 18, 20, 21, 23, 30, 31, 33-37, 40, 41, 46, 52-56, 59, 68, 97, 98, 114, 121, 124-128, 130, 136, 137, 147, 175, 177, 178-185, 189, 190, 194, 199-205, 208, 209, 213, 231-233, 236, 243-245, 249-258, 260, 261, 264-266, 273, 278, 280-282, 285, 287, 289, 291, 294-299, 301, 302, 309, 311, 325 functional equivalence 33, 73, 81, 91 functionally complete unit of meaning 33, 73, 74, 76, 79, 81, 85, 90, 91  German viii, 10, 20, 21, 24, 25, 30, 37, 40, 43, 47, 52, 54, 55, 57, 65, 127, 174, 189, 194, 199-205, 208, 209, 213, 292, 311

get 19, 23, 99, 111, 119, 121, 125, 128-131, 137-149 get-passive 146 grammaticalisation 21 grammaticalized meanings 144 Greek 196-199, 201, 213  headword 242-244, 254 high 35, 216, 219, 223-225, 228 homonymy 119, 120, 292 Human Rights terminology 249 hypallage 31, 175-179, 183, 184 hyponomy 29 hyponym 142  idiom 92, 234 idiom principle 92 idiomaticity 5 in case 33, 78, 83, 86-90, 93 in case of 33, 78, 86-89 in caso di 33, 86-88 in the case of 33, 82-84, 86, 93 inchoative 123, 125, 129, 135, 136, 140, 144-147 interference 216, 324 interlanguage 115 interlingua approach 193 International Corpus of Learner English (ICLE) 97, 115, 116 INTERSECT Corpus 52 Italian viii, 30, 33, 37, 73, 83-85, 87, 89, 90, 92, 93, 221, 231, 281, 282  journalistic prose 175  kaum 20, 52-58, 64  landmark 153, 155-157, 173 language system 18, 40, 120, 211

language teaching 5, 6, 14, 95, 116, 147, 325 language use 8, 18, 38 langue 18, 55, 60 legal documents 205, 212 legal terminology 250, 266 lemmatisation 253, 254 lexical alignment 272, 281-284 lexical correspondence 280, 284-286 lexical database 30, 305 lexical decomposition 41 lexical field 32 lexical item 6, 22, 26, 27, 32, 47, 95, 291, 295, 296, 304 lexical relations 14, 28, 29, 35, 38 lexical semantics viii, 28, 29, 43, 47, 48, 95, 117 lexical unit 26, 27, 281, 282, 284, 319 lexico-grammatical 4, 5, 41, 73, 83, 114 lexicology vii, 5, 48, 116, 119, 148, 150, 190-192, 212, 213, 287 literary creativity 183 Lithuanian viii, 309, 315, 319-326  machine translation 32, 36, 37, 191-193, 211, 284, 288, 289, 308 machine-readable dictionary 36, 45, 292, 304, 305 make 19, 30, 97-116, 125, 137, 145 markup 307, 309, 326 metaphor 16, 45, 143, 144, 149, 151, 152, 160, 162, 163, 165, 167, 171, 173, 174, 182, 184, 221, 236 metaphorical 21, 31, 142, 143, 151-153, 156, 157, 159, 160, 165, 167-169, 172, 173, 219, 220, 222, 224, 232, 241, 243, 305 metaphorical extension 21, 157, 173 metaphorical mapping 151, 152 metonymy 175, 180, 181, 183 MicroConcord 309, 325 Microsoft Word 13, 308, 309 Minmark 309, 311, 326 mixed 298, 299, 301, 303 modal auxiliaries 19, 25, 105 modal particles 25, 41



 modality 16, 19, 103, 109, 123, 125, 127, 132, 149, 150, 177 modulation 20, 57-61 monolingual corpus 14, 30-35, 51, 52, 93, 148, 204, 216, 217, 227, 228, 293, 309 monolingual dictionary 40, 204, 215, 217, 218, 220, 221, 224, 226, 227, 229, 294, 299 motion 21, 23, 29, 106, 107, 121, 140-144, 163-165, 170 MultiCoDiCT dictionary system 241 Multiconcord 13, 48, 307-309, 311, 312, 315, 326 multilingual corpus vii, 7, 9, 10, 13, 15, 18, 37, 38, 42, 213, 215, 228, 304 multilingual dictionary 35, 41, 237, 247, 250 multilingual lexicography viii, 14, 33-35, 38, 41, 47, 189, 193, 229, 324 multilingual thesaurus 58, 60 multiple equivalents 23, 54, 252-254, 267 multiword term 253, 257, 258, 266 mutual correspondence 14, 17-19, 23 mutual information 30, 42, 279 mutual translatability 23, 147  natural language processing 30, 35, 36, 38 nel caso di 33, 83, 84, 87 nominalization 181, 182 non-compositional 234 Norwegian 9, 10, 20, 24, 26, 29, 30, 43-45, 116, 127 noun 26, 40, 76, 87, 98, 99, 122, 123, 125, 129, 131, 148, 175, 180, 182, 194, 196, 197, 199, 201, 206-208, 211, 243, 252, 254, 299, 302  obligation 105, 123, 125, 132-135, 144, 147 odd 35, 216, 218, 225-227 order of equivalents 225-228 order of senses 218, 219, 223, 226, 227 Oslo Multilingual Corpus 10 overlapping polysemy 22, 35, 215, 217, 221, 225, 227

 ParaConc 13, 42, 307 paradigmatic relations 5 parallel concordancer 13 parallel concordancing viii, 39, 269, 307, 308, 315 parallel corpus 8-10, 23, 25, 29, 41-43, 45, 46, 48, 61, 81, 100, 115, 121, 140, 148, 178, 181, 189, 193, 194, 203-205, 211-213, 216, 229, 273, 283, 288, 305, 308, 324-326 parallel texts 10, 42-46, 271, 272, 288 parliamentary debates 47, 250 parole 18, 55, 60 paronymy 273 part-of-speech tagging 251 partial overlap 21 particle 25, 26, 30, 123, 137-139, 142-144, 157, 174, 322 periphrastic causative 125, 137, 145 permission 123, 125, 132-135, 147 phrasal verb 77, 78, 315, 316, 320 phraseology 27, 43, 45, 78, 247, 284, 287, 292 pick up 157, 316-324 Polish 236, 250 polysemy 19, 22-24, 28, 34, 35, 41, 48, 97, 119, 120, 127, 140, 148, 150, 179, 182, 195, 215, 217, 218, 221, 225, 227, 245, 291, 292, 299 Portuguese 10 possession 19, 23, 121-123, 125-132, 140-142, 146-149 premodifier 296 primary meaning 120 procédure 23, 249, 255-258, 261, 264-266 proceedings 23, 249, 255-266 pronoun 20, 106, 122, 297 prototype 120, 127, 139, 148, 152, 157, 247 prototypical 19, 25, 31, 34, 99, 103, 113, 114, 120, 125, 126, 128, 141, 152, 157, 167, 173 prototypicality 28, 29, 99, 114 psychotypology 114 pun 273

 quasi-idiomatic 234  rate 36, 224, 293-296, 298 recurrence 209, 211, 212, 281, 287 register 9, 81, 82, 94, 100, 237 restricted domain 211 Romanian 250  saada 125, 127, 128, 130, 137, 147 se per caso 33, 83, 88-90 segmentation 272, 274, 275, 281, 283, 284 selection restrictions 27, 103, 113 semantic extension 133, 146 semantic features 28, 41 semantic field 29, 60, 79, 87, 88, 91, 207 semantic preference 32, 78, 79, 84, 85, 87-89, 91, 92 semantic prosody 27, 32, 78, 79, 84, 85, 87-93 semantic scope 176 semantic unit 190, 191, 212 semi-idiomatic 234 sense distinction 227 sentence alignment 10, 40, 272, 275, 280, 285 set expression 231, 233-236, 238-240 set expression dictionaries 231, 233 set phrase 235 shang 31, 151-157, 159-161, 163-168, 172, 173 shift of meaning 179 source domain 151, 152 Spanish viii, 35, 37, 205, 216, 217, 219-221, 225, 228, 229, 231, 243-246, 311 specialised corpus 249 specialised dictionaries 242, 246 Swedish viii, 9, 10, 19, 21, 23-26, 37, 41, 44, 48, 94, 97-116, 119-121, 124-127, 129, 130, 133-135, 138, 140-142, 144-148, 150, 324 synonymous equivalents 212, 244 synset 195-199

syntactic frame 122, 123, 136, 141, 143, 147 syntactic shift 176 syntagmatic relations 4, 5, 22, 29 synthetic causative 19, 99, 109, 112, 115  TACT program 293 target domain 151, 152, 165 term extraction 249, 250, 266 terminology vii, 7, 14, 34, 42, 85, 94, 249-255, 266, 267, 287 tertium comparationis 15, 16, 20, 28, 191 text alignment 10, 11, 40, 46, 47 textual context 309 thesaurus 58, 60, 195, 196 trajector 153-157, 159, 167, 173 transfer 17, 80, 98, 114-116, 216 translation vii, viii, 6-11, 13-26, 29-48, 51-55, 57, 58, 60-62, 73, 74, 79-85, 89-95, 100, 101, 105, 109, 115, 119-121, 124-128, 130-136, 138, 140-145, 147, 148, 150, 176, 178, 180, 183, 184, 189-194, 196-200, 203-205, 209, 211, 212, 213, 216, 217, 221, 222, 225, 227-229, 231-233, 237, 239, 244, 246, 247, 250, 260, 265-267, 269, 271, 272-282, 284, 285, 287-289, 291-294, 296-299, 301-305, 307-309, 312, 318-325 translation aids 211 translation corpus 7-11, 13, 16-22, 24, 25, 30, 31, 33, 34, 37-41, 43, 45, 47, 48, 51, 52, 54, 55, 57, 58, 60, 61, 81-83, 91, 93, 100, 115, 121, 132, 148, 271, 307-309, 324, 325 translation equivalence 15-18, 34, 40, 46-48, 51, 57, 60, 95, 191, 229, 273, 276, 280 translation equivalent 17, 32, 33, 53, 82-84, 89, 90, 125, 190, 204, 209, 301, 318 translation memory 37, 61, 250, 267 translation platform 203, 205 translation practice 94, 199, 204, 212 translation strategy 17, 54, 58, 308, 325 translation studies vii, 38, 39, 41, 42, 44, 45, 94, 148, 213, 229, 324



 translation unit 203, 205, 209, 211, 212 translational compositionality 274, 285, 287 translational systematicity 55, 57 translational unsystematicity 55 translationese 9, 44, 47, 85, 94 translator training 14 translator's workbench 309 translator's workstation 271 travail 194, 199-202, 205-210, 212 typological 14, 23, 27, 29, 116, 191  underspecification 120, 149 unit of meaning 32, 33, 73, 74, 76-81, 85, 87, 88, 90-92, 95, 193, 195-197, 212 unit of translation 80, 199 universal 14, 17, 21, 23, 27, 28, 31, 40, 41, 97, 116, 146, 149, 150, 153, 173, 174, 191, 193, 195, 196, 233 up 30, 31, 46, 151-157, 167-174, 316-324  valency 27, 34, 43, 47 verb 19, 20, 23, 24, 29, 30, 34, 40, 41, 43, 46, 47, 56, 76-78, 97-99, 102-106, 108-115, 120-123, 125-132, 135-147, 157, 174, 182, 207, 237, 254, 258, 266, 292, 297-299, 302, 315, 316, 320, 321, 323-325 verb of possession 121, 123, 127-129, 131, 141, 142  word alignment 11, 13, 281, 288 word formation 179 work 194, 196-199  xia 31, 151-157, 159-161, 163-168, 172, 173

Author index

 Achèche 247 Adams 179, 180, 184 Ahlswede 304 Ahrenberg 37, 41 Aijmer 8, 9, 19, 24, 25, 41, 44, 46, 48, 100, 101, 115, 116, 121, 148, 150, 325 Akhundov 165, 174 Al-Kasimi 215, 229 Allan 163, 174 Altenberg vii, viii, ix, 3, 9, 18, 19, 41, 97, 105, 115, 116, 121, 124, 136, 148, 150, 325 Alverson 163, 174 Assal 253, 267 Astington 181, 184 Aston 218, 229 Atkins 6, 23, 34, 39, 41-43, 304  Bahns 30, 42 Baker 7, 9, 39, 42, 43, 93-95 Bally 39, 42, 121, 148 Barlow 13, 42 Bejoint 247, 248, 305 Belkin 184, 185 Benson 304 Berlin 119, 148 Biber 5, 42, 51, 61, 94 Bickel 174 Bläser 34, 42 Botley 42, 46, 47 Boucher 193, 213 Bourigault ix, 249, 252, 267 Bradley 293, 306

Bresnan 39, 42 Brill 95, 289, 291, 304 Brown 10, 37, 40, 42, 44, 119, 121, 149, 271, 272, 278, 280, 288 Burnard 218, 229 Butler 39, 41, 42, 46  Calzolari 22, 42, 94 Cardey ix, 35, 231, 247 Carruthers 193, 213 Celle 182-184, 246 Chan 236, 243, 247 Chesterman 6, 15, 16, 18, 42, 60, 61 Chuquet 57, 58, 61, 181, 184 Church 10, 13, 30, 40, 42, 44, 272, 278-280, 288, 304 Clear 39, 42, 93, 293, 304 Coates 180, 184 Cocke 42, 288 Conrad 42, 61 Corness ix, 13, 307, 309, 325 Cornu 255, 265-267 Cowie 43-45, 304 Cruse 26, 27, 29, 41, 43 Culioli 182, 184  Dagan 272, 288 Daniels 325 Darbelnet 48, 57, 58, 62 Dauphin 235, 247 Debili 280, 284, 288 Deepwell 325 Defrancq 182, 184

 Delavigne 253, 267 Della Pietra 42, 288 Devos 34, 43 Di Pietro 6, 43 Di Sciullo 3, 43 Di Tomaso 304 Dickens 34, 43 Dik 39, 43 Dini 292, 304 Dorr 119, 148 Dunning 272, 280, 288 Dupriez 184 Dymetman 45, 288 Dyvik 29, 43  Ebeling 13, 18, 43 Eco 193, 213 Erjavec 40, 43, 325 Even 304  Faber 39, 41, 43, 92, 94 Fabricius-Hansen 30, 31, 43 Fellbaum 39, 46 Filipovic 40, 43 Fillmore 23, 39, 41, 43 Finnegan 42 Firth 4, 5, 32, 42-44, 73, 76, 91, 92, 94 Fisiak 34, 43 Fodor 193, 213 Fontenelle 44, 296, 304, 305 Foster 45, 47, 288, 289 Francis 15, 40, 42-44, 75, 94, 213 Fromilhague 176, 184 Fung 279, 280, 288  Gale 10, 13, 40, 42, 44, 272, 278, 280, 288, 304 Galliot 247 Gärdenfors 48, 132, 150, 151, 174 Garside 291, 298, 305 Gaussier 272, 275, 278, 288 Gavieiro 247 Gazdar 39, 44

Geeraerts 120, 148 Geiger 151, 152, 174 Gellerstam 9, 34, 42-44, 46, 48, 85, 94, 324, 325 Gerardy 44 Goatly 160, 174 Goddard 119, 148 Goldberg 39, 44 Gonzales-Muilez 267 Granger vii, viii, ix, 3, 30, 44, 97, 115, 116, 185 Greenbaum 116, 149 Greenfield ix, 35, 231, 241, 247 Greenstein 45 Grefenstette 30, 44 Gronemeyer 140, 148 Guillemin-Flescher 44, 181, 182, 184 Gumperz 119, 149 Gutt 51, 61  Hais 316, 325 Halliday 3, 4, 39, 44, 47, 73, 75, 83, 92, 94 Hanks 30, 42, 304 Hannan 47 Hartmann 7, 44, 215, 229 Hasselgård 41, 43, 44, 116 Hasselgren 114, 116 Hatch 119, 149 Haydon 325 Heid 34, 41, 44, 305 Heine 128, 149 Heyn 37, 44 Hindle 304 Hodek 316, 325 Hofland 10, 44, 45 Holland 43, 45, 325 Howarth 30, 45 Hudson 39, 45 Hunston 75, 94 Hyltenstam 48, 114, 116  Ide 35-37, 44, 45, 123, 131 Isabelle 37, 45, 47, 271, 274, 288, 289


Israël 288 Ivir 17, 29, 45  Jakobson 120, 149 James 6, 15, 16, 28, 29, 39, 45, 171, 213, 215, 229 Järborg 44 Jelinek 42, 288 Johansson 7-10, 13, 20, 26, 41-45, 47, 48, 100, 101, 115, 116, 121, 140, 148-150, 325 Johns 81, 95, 174, 309, 325 Johnson 23, 46, 128, 149, 160, 162, 163, 173, 174, 184 Johnson-Laird 23, 46, 128, 149 Juffs 99, 115, 116 Jutrac 45  Kay 10, 45, 119, 148, 272, 280, 288 Kellerman 114, 116 Kervio-Berthou 203, 213 Kittay 28, 29, 43, 45 Kleiber 184 Klein 44 Knowles 305 Koskinen 190, 214 Kraif x, 11, 271, 278, 288 Krzeszowski 6, 14-16, 45 Kučera 40, 44  Lafferty 42, 288 Lai 42, 164, 288 Lakoff 143, 149, 151, 152, 160, 162-165, 174, 184, 305 Langacker 39, 45, 133, 149, 151, 174, 182, 184 Langé 272, 275, 278, 288 Langlais 272, 276, 288 LaPolla 39, 48 Larreya 177, 184 Lawson 43, 325 Lederer 288

Leech 42, 85, 116, 149, 174, 180, 184, 305 Lerner 184, 185 Levin 39, 41, 46 Levinson 119, 149 Lewandowska-Tomaszczyk 45, 46 Limame 231, 236, 247 Lindner 157, 174 Løken 26, 45 Louw 78, 87, 90, 93, 94  Macklovitch 45, 47, 271, 288, 289 Mairal Usón 39, 41, 43, 92, 94 Malmgren 44 Marcinkevičienė 324, 325 Marcus 304 Marello 246, 247 Matisoff 147, 149 Mauranen 24, 46 McCord 292, 305 McEnery 40, 42, 46 McKeown 305 Mel’čuk 296, 305 Melamed 279, 289 Melby 192, 214 Melia 45, 46 Mercer 42, 288 Merkel 10, 37, 40, 41, 46 Méry 177, 184 Michiels 292, 305 Miller 23, 39, 46, 128, 149, 195, 292, 305 Montgomery 34, 46 Moon 26, 46, 222, 233-235, 238, 246, 247, 304 Morgadinho 231, 236, 243, 247, 248 Morgan 157, 174  Nagao 271, 289 Neff 292, 305 Neubert 79, 95 Newman 119, 147, 149 Nida 80, 95, 273, 287, 289 Norén 44 Nuyts 24, 46, 174



 Oakes 40, 46 Oksefjell 41, 43-45, 47, 48, 116, 140, 149 Orvelas 325 Orwell 309, 315, 316, 318, 325 Ostler 34, 41, 46

 Paillard x, 31, 57, 58, 61, 175, 181, 183, 184 Paulussen 30, 31, 46 Payne 47 Pérez Hernández 47 Pergnier 282, 287, 289 Perrault 47, 289 Persson 41, 46 Peters 30, 40, 46, 149 Petit 180, 185, 247 Piesarskas 320, 326 Pinker 193, 214 Plamondon 47, 275 Plungian 132, 149 Poesio 120, 149 Pollard 39, 46 Pullum 44 Pustejovsky 120, 149, 305 Putnam 192, 214

 Quirk 99, 116, 139, 149  Rainer 184 Rappaport 39, 46 Rastier 284, 289 Read 67, 70, 156, 190, 228, 325 Reidenberg 299, 305 Ren 45 Reppen 61 Ridings 13, 46 Ringbom 6, 46 Roberts 34, 46, 242, 248 Roe 305 Rogström 44 Röjder Papmehl 44 Romary 43, 325 Rondeau 253, 267 Roos 30, 46 Roossin 42, 288 Rosch 28, 46 Röscheisen 10, 45, 272, 288 Rudzka-Ostyn 149, 151, 152, 174

 Sag 39, 44, 46 Sager 94, 273, 289 Sajavaara 6, 46 Salkie x, 18, 20, 34, 43, 46, 51, 52, 58, 61 Sammouda 280, 288 Sato 271, 289 Schäffler 9, 39, 40, 47 Schmied 9, 30, 39, 40, 47 Schönefeld 152, 174 Schultze 44 Schwarze 29, 47, 119, 149 Scott 309, 325 Segond 304 Simard 40, 45, 47, 271, 272, 274-276, 280, 288, 289 Simon-Vandenbergen 24, 34, 43, 47 Sinclair 3, 4, 22, 26, 27, 32, 33, 36-38, 42, 47, 48, 74, 76-78, 92-95, 221, 229, 305 Singleton 26, 53, 119, 149 Sinha 173, 174 Slater 292, 305 Smadja 305 Smith 116, 152, 174, 304 Song 99, 116 Steffens 36, 42, 44-47 Stibbe 152, 174 Stubbs 78, 92, 95 Suhamy 176, 185 Sutcliffe 292, 305 Svartvik 94, 116, 149 Svecevičius 320, 326 Svensén 21, 47 Svensson 148 Svorou 151, 163, 174 Swallow viii, 185

 Taber 273, 287, 289 Taeldeman 47 Talmy 23, 48, 119, 149 Taylor 28, 48, 120, 149 Teubert vii, x, 8, 9, 21, 26, 32, 34, 36-38, 40, 48, 189, 201, 214, 216, 229, 293, 305, 324, 326 Thoiron 247, 248, 305 Thomas 149, 231, 236, 248 Thompson 325 Tognini Bonelli x, 15, 32, 33, 42, 48, 73-75, 79, 92, 94, 95, 204, 214, 229 Tomaszczyk 45, 46, 215, 229 Tournier 179, 180, 185 Toury 216, 229 Tsohatzidis 120, 149  Ullmann 185  van der Auwera 132, 149 Van Hoof 57, 62, 185 Van Roey 184, 185 Van Valin 39, 48 Véronis 36, 37, 40, 45, 48, 288 Viaggio 79, 95 Viberg x, 19, 21, 23, 29, 39, 40, 48, 97, 104, 116, 119, 120, 129, 148, 150 Vinay 48, 57, 58, 62 Vodička 316, 318, 319, 326 Volz 48 Vossen 35, 45, 48, 292, 306  Wandruszka 39, 48, 121, 150 Wanner 119, 150, 305 Weigand 7, 41, 47, 48, 95 Wierzbicka 43, 119, 148 Willems 47, 182, 184 Williams 3, 43 Wilson 42 Winter 132, 150 Wong 115, 116

Woolls 308, 326  Yu 152, 163, 165, 174  Zampolli 41, 42, 93, 94 Zgusta 306



In the series STUDIES IN CORPUS LINGUISTICS (SCL) the following titles have been published thus far:

1. PEARSON, Jennifer: Terms in Context. 1998.
2. PARTINGTON, Alan: Patterns and Meanings. Using corpora for English language research and teaching. 1998.
3. BOTLEY, Simon and Anthony Mark McENERY (eds.): Corpus-based and Computational Approaches to Discourse Anaphora. 2000.
4. HUNSTON, Susan and Gill FRANCIS: Pattern Grammar. A corpus-driven approach to the lexical grammar of English. 2000.
5. GHADESSY, Mohsen, Alex HENRY and Robert L. ROSEBERRY (eds.): Small Corpus Studies and ELT. Theory and practice. 2001.
6. TOGNINI-BONELLI, Elena: Corpus Linguistics at Work. 2001.
7. ALTENBERG, Bengt and Sylviane GRANGER (eds.): Lexis in Contrast. Corpus-based approaches. 2002.
8. STENSTRÖM, Anna-Brita, Gisle ANDERSEN and Ingrid Kristine HASUND: Trends in Teenage Talk. Corpus compilation, analysis and findings. n.y.p.
