Formulaic Sequences: Acquisition, Processing and Use (Language Learning and Language Teaching)


Author: Schmitt | Norbert


Formulaic Sequences

Language Learning and Language Teaching

The LL&LT monograph series publishes monographs as well as edited volumes on applied and methodological issues in the field of language pedagogy. The focus of the series is on subjects such as classroom discourse and interaction; language diversity in educational settings; bilingual education; language testing and language assessment; teaching methods and teaching performance; learning trajectories in second language acquisition; and written language learning in educational settings.

Series editors Birgit Harley Ontario Institute for Studies in Education, University of Toronto

Jan H. Hulstijn Department of Second Language Acquisition, University of Amsterdam

Volume 9 Formulaic Sequences: Acquisition, processing and use Edited by Norbert Schmitt

Formulaic Sequences Acquisition, processing and use

Edited by

Norbert Schmitt University of Nottingham

John Benjamins Publishing Company Amsterdam/Philadelphia


The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ansi z39.48-1984.

Library of Congress Cataloging-in-Publication Data Formulaic sequences : acquisition, processing and use / edited by Norbert Schmitt. p. cm. (Language Learning and Language Teaching, issn 1569–9471 ; v. 9) Includes bibliographical references and indexes. 1. Language and languages--Study and teaching. 2. Lexicology. 3. Pattern perception. I. Schmitt, Norbert, 1956- II. Series. P53. F654 2004 407-dc22 isbn 90 272 1707 6 (Eur.) / 1 58811 499 6 (US) (Hb; alk. paper) isbn 90 272 1708 4 (Eur.) / 1 58811 500 3 (US) (Pb; alk. paper)

2004041065

© 2004 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microﬁlm, or any other means, without written permission from the publisher. John Benjamins Publishing Co. · P.O. Box 36224 · 1020 me Amsterdam · The Netherlands John Benjamins North America · P.O. Box 27519 · Philadelphia pa 19118-0519 · usa

Contents

Preface

Formulaic sequences in action: An introduction
Norbert Schmitt and Ronald Carter

Measurement of formulaic sequences
John Read and Paul Nation

Formulaic performance in conventionalised varieties of speech
Koenraad Kuiper

Knowledge and acquisition of formulaic sequences: A longitudinal study
Norbert Schmitt, Zoltán Dörnyei, Svenja Adolphs, and Valerie Durow

Individual differences and their effects on formulaic sequence acquisition
Zoltán Dörnyei, Valerie Durow, and Khawla Zahran

Social-cultural integration and the development of formulaic sequences
Svenja Adolphs and Valerie Durow

Are corpus-derived recurrent clusters psycholinguistically valid?
Norbert Schmitt, Sarah Grandage, and Svenja Adolphs

The eyes have it: An eye-movement study into the processing of formulaic sequences
Geoffrey Underwood, Norbert Schmitt, and Adam Galpin

Exploring the processing of formulaic sequences through a self-paced reading task
Norbert Schmitt and Geoffrey Underwood

Comparing knowledge of formulaic sequences across L1, L2, L3, and L4
Carol Spöttl and Michael McCarthy

The effect of typographic salience on the look up and comprehension of unknown formulaic sequences
Hugh Bishop

‘Here’s one I prepared earlier’: Formulaic language learning on television
Alison Wray

Facilitating the acquisition of formulaic sequences: An exploratory study in an EAP context
Martha Jones and Sandra Haywood

Index

To my colleagues at the University of Nottingham

Preface Lexical patterning is an increasingly important issue in applied linguistics as it becomes ever more apparent that such patterning pervades most language use. This is not a new insight, with numerous scholars referring to such patterning over the years. However these scholars have used a wide range of terminology for the phenomenon, and the research has been scattered across various ﬁelds. This led to a quite limited awareness of lexical patterning in the applied linguistics ﬁeld in general, and it was only relatively recently that the eﬀorts of scholars like Nattinger and DeCarrico, Sinclair, Moon, Kuiper, Wray, and Biber have led to it becoming more widely known. A considerable amount of the research has attempted to describe the nature of various lexical patterns (idioms, collocations, sentence stems, etc.), often based on corpus evidence. Other research has looked at the role of formulaic patterns in the acquisition of ﬁrst language. Beyond this, there is little research which has focused on lexical patterns in second language acquisition, or on the whole issue of how lexical patterns are processed in the mind. The time seemed ripe for research addressing these areas. A team at the Centre for Research in Applied Linguistics (CRAL) at the University of Nottingham was able to carry out a cycle of research into lexical patterning, and this volume reports on our ﬁndings. During our investigations, we became aware that other lexically-minded scholars around the world were concurrently carrying out studies in the same area, and some of their work is also included in this book. As a package, we feel that the studies in this volume are not only interesting in terms of their ﬁndings, but also in terms of variety of methodology used. We have included the full research instrumentation wherever possible for the interested reader. I would like to thank several people for making this volume possible. 
Zoltán Dörnyei, my co-director at CRAL, generated the grant that funded the whole process, and was there through all of the ups and downs of the research. Svenja Adolphs, Valerie Durow, Sarah Grandage and Khawla Zahran were the other core team members without whom nothing would have happened. Colleagues at the Centre for English Language Education (CELE) at the University of Nottingham allowed access to their students, and I would like to particularly thank


Rebecca Hughes, Martha Jones, and Sandra Haywood. Geoﬀrey Underwood was a most helpful collaborator who helped open up exciting new methodologies in the study of formulaic sequences. I am grateful to non-CRAL colleagues who have contributed welcome additions to the book: Hugh Bishop, Koenraad Kuiper, Paul Nation, John Read, Carol Spöttl, and Alison Wray. In particular, I would like to thank Alison Wray and Koenraad Kuiper for their very insightful input, which improved the entire project immensely. Jan Hulstijn and Birgit Harley proved to be supportive and insightful series editors and it is a pleasure to have this volume in their series. Kees Vaes was a most friendly and eﬃcient liaison at John Benjamins Publishing. The Economic and Social Research Council supported the research with Grant #R000239294. I have enjoyed being part of this research, and hope that you ﬁnd much of interest in these studies. If you become interested in researching this area yourself, all the better. Many of these studies are innovative now, but it would be wonderful if we could look back in ten years and marvel at how much we had progressed. Norbert Schmitt University of Nottingham November 2003


Formulaic sequences in action

An introduction

Norbert Schmitt and Ronald Carter University of Nottingham

Introduction Formulaic sequences are ubiquitous in language use (Nattinger and DeCarrico, 1992: 66) and they make up a large proportion of any discourse. Erman and Warren (2000) calculated that formulaic sequences of various types constituted 58.6% of the spoken English discourse they analyzed and 52.3% of the written discourse. Using diﬀerent criteria and procedures, Foster’s raters judged that 32.3% of the unplanned native speech they analyzed was made up of formulaic language (Foster, 2001). If formulaic sequences are so widespread in English discourse, it follows that proﬁcient English speakers must have knowledge and mastery of these sequences at some level. A number of scholars claim that this knowledge is extensive. For example, Pawley and Syder (1983: 213) suggest that the number of “sentence-length expressions familiar to the ordinary, mature English speaker probably amounts, at least, to several hundreds of thousands.” Jackendoﬀ (1995) concludes from a small corpus study of spoken language in a TV quiz show that formulaic sequences may be of equal if not greater signiﬁcance than the lexicon of single words, while Mel’čuk (1995), who uses the term ‘phraseology’, claims even greater overall signiﬁcance for such sequences. The idea that proﬁcient language users know numerous formulaic sequences is intuitive, but it must be said that the above claims are made by assertion, as there is little empirical work to substantiate them. However, they do ﬁt well with Sinclair’s (1991) view that language as a whole is organised according to two main structuring principles: an open choice principle and an idiom principle, with the latter involving the widespread use of formulaic stretches of words.1 Furthermore, this store of formulaic sequences is dynamic and is constantly changing to meet the needs of the speaker (Wray, 2002: 101). Even if the above claims prove to be somewhat overstated, it is clear that lexical patterning does exist in


English, and therefore must have some consequences in terms of how English is acquired, processed, and used. Some types of formulaic sequence have always been obvious in the form of idioms, proverbs, and sayings. These sequences noticeably operate as single units at some level, even though their form consists of multiple orthographic words. The fact that these multi-word units express a single meaning made them stand out. In the case of idioms, their meaning could not be derived from the sum of meanings of the component words and they did not always follow the rules of grammar. These multiword units were often relegated to a peripheral category by scholars; acknowledged, but dismissed as having only a minor role in language (see Wray, 2002). The advent of computerized corpus studies made additional patterning evident, and it soon became clear that lexical patterning was not limited to these obvious multiword units (e.g. Biber et al. 1999).2 In fact, formulaic sequences seem to exist in so many forms that it is presently diﬃcult to develop a comprehensive deﬁnition of the phenomenon. This lack of a clear deﬁnition remains one of the foremost problems in the area. Some commonly-used criteria come from the area of corpus linguistics, such as institutionalization, ﬁxedness, and non-compositionality, which Moon (1997: 44) suggests are key characteristics of what she calls multi-word items. Another often-cited criterion is frequency of occurrence, on the assumption that if a sequence is frequent in a corpus, this indicates that it is conventionalised by the speech community, at least to some extent. In general, corpus deﬁnitions are concerned with identifying and describing formulaic sequences as they occur throughout a corpus. These criteria are useful, but are not the only possible way to view formulaic sequences. 
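The frequency criterion mentioned above can be illustrated with a minimal sketch. It assumes a deliberately crude tokenizer and an arbitrary recurrence threshold; the function names `tokenize` and `recurrent_sequences`, the parameter values, and the sample text are hypothetical illustrations rather than a procedure used in any study in this volume.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude, assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def recurrent_sequences(text, n=3, min_count=2):
    """Return contiguous n-word sequences that occur at least min_count times."""
    tokens = tokenize(text)
    # Slide a window of n tokens across the text to build every contiguous n-gram.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    return [(seq, count) for seq, count in counts.most_common() if count >= min_count]


# Invented sample text, for illustration only.
sample = (
    "By the way, did you see the match? I know what you mean. "
    "By the way, I know what you mean about the weather."
)

for sequence, count in recurrent_sequences(sample, n=3):
    print(count, sequence)
```

A count of this kind only shows that a string recurs in a text. It says nothing about whether the string is institutionalized, fixed, or non-compositional, and nothing about whether any individual speaker stores it as a whole.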
Psycholinguists and language acquisition specialists focus on criteria which determine whether sequences are known by individual participants, and whether these sequences are formulaic and stored as wholes in the participant’s mental lexicon. Thus criteria are used such as whether a sequence of words is produced more than once by a participant (indicating that the sequence is known and not just a one-oﬀ imitation of a sequence heard by the participant) and whether it is produced with an intact intonation contour (suggesting the sequence is stored as a whole). Although linguistic and psycholinguistic criteria have been developed for diﬀerent purposes, any satisfying description of formulaic sequences probably needs to draw on both perspectives. Thus the next section will utilize insights from both linguistic and psycholinguistic traditions as it explores some of the characteristics of formulaic sequences.


Selected characteristics of formulaic sequences

One of the reasons it is difficult to define formulaic sequences lies in their diversity. For example, formulaic sequences can be long (You can lead a horse to water, but you can’t make him drink) or short (Oh no!), or anything in between. They are commonly used for different purposes. They can be used to express a message or idea (The early bird gets the worm = do not procrastinate), functions ([I’m] just looking [thanks] = declining an offer of assistance from a shopkeeper), social solidarity (I know what you mean = agreeing with an interlocutor), and to transact specific information in a precise and understandable way (Wind 28 at 7 = in aviation language this formula is used to state that the wind is 7 knots per hour from 280 degrees). They realize many other purposes as well, as formulaic sequences can be used for most things society requires of communication through language. These sequences can be totally fixed (Ladies and Gentlemen) or have a number of ‘slots’ which can be filled with appropriate words or strings of words ([someone/thing, usually with authority] made it plain that [something as yet unrealised was intended or desired]). With this diversity in mind, it is little wonder that different researchers have looked at formulaic sequences and seen different things, resulting in a variety of terminology to express various perspectives. The range of this terminology is evident from the fact that Wray (2002: 9) found over fifty terms to describe the phenomenon of formulaic language. Below is a sample:

chunks               collocations             conventionalised forms
formulaic speech     formulas                 holophrases
multiword units      prefabricated routines   ready-made utterances

The scope of this list made it diﬃcult to even decide on a cover term to use for the notion of formulaic language in this chapter. We have decided to use the term formulaic sequence based on a deﬁnition by Wray (2002: 9): a sequence, continuous or discontinuous, of words or other elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from memory at the time of use, rather than being subject to generation or analysis by the language grammar.

This term covers a wide range of formulaic language, and touches on two key criteria of the emphasis in this book: a) we are concerned with sequences of lexis and b) the mind handles, or appears to handle, these sequences at some level of representation as wholes. However, using this deﬁnition, Wray argues that even


single words and morphemes can be seen as formulaic sequences. In this chapter we wish to focus primarily on multi-word sequences of lexis and so initially searched for other terms. The term formula is often used, but usually to mean a string of formulaic language with idiosyncratic conditions of use, and so is not really suitable for use as a cover term. Similarly, lexical phrase is used by Nattinger and DeCarrico (1992) to emphasize the relationship between formulaic language and functional language use. When we were considering the various possible terms, each with their own particular bias, Koenraad Kuiper was most helpful in pointing out that there are two underlying properties which deﬁne the language phenomenon we are trying to capture: a) the units of formulaic language are not merely any sequence of words, but phrases, and b) they are lexical items exactly like other lexical items such as words, and with the same properties as words would have if they were phrases. This line of reasoning leads to two obvious terms, phrasal lexical item and phrasal lexeme and we considered carefully the adoption of such terms. However, even bearing in mind such distinctions, we settled in the end on formulaic sequence (FS) as the most comprehensive term for our investigations.3 The term formulaic sequence is thus intentionally all-encompassing, covering a wide range of phraseology. Since there is so much diversity, it is diﬃcult to identify absolute criteria which deﬁne formulaic sequences. Rather it is probably more useful to discuss characteristics which are typical of formulaic sequences, even though every example lexeme might not exhibit each characteristic. Wray and Perkins (2000, Figure 2) provide an extensive listing of these characteristics. Also, the interested reader will ﬁnd Wray (2002), a book-length treatment of formulaic language to which much of this chapter is indebted, an excellent resource. 
Assuming that the reader is familiar with the basic conceptual background regarding formulaic sequences, in this section we will overview a few of the characteristics which we find particularly interesting.

Formulaic sequences appear to be stored in the mind as holistic units, but they may not be acquired in an all-or-nothing manner.

There is plenty of evidence to suggest that formulaic sequences are typically stored and processed as unitary wholes, even if this is not true in every case. Perhaps the most obvious evidence lies in semantically-opaque formulaic sequences, such as idioms, where the meaning of the sequence cannot be derived from knowledge of the component words. The only way to know the meaning of the idiom is to have learned it as a sequence. There is also evidence on the phonological front: formulaic sequences are typically spoken more fluently, with a coherent intonation contour, to the extent that this has been accepted as one criterion of formulaicity (e.g. van Lancker, Canter, and Terbeek, 1981; Peters, 1983, p. 10). Moreover, Pawley and Syder (1983) assert that formulaic sequences offer processing efficiency because single memorized units, even if made up of a sequence of words, are processed more quickly and easily than the same sequences of words which are generated creatively. This assertion is supported by evidence from Kuiper (1996, this volume) and his colleagues (Kuiper and Haggo, 1984), who show that ‘smooth talkers’ (auctioneers, sportscasters) use formulaic language a great deal in order to fluently convey large amounts of information under severe time constraints. In addition to this productive advantage, there seems to be a receptive advantage as well. Underwood, Schmitt and Galpin (this volume) demonstrate that words, when they are part of formulaic sequences, are read more quickly than the same words when embedded in non-formulaic text. One might also assume that there is a processing-based reason behind the fact that the preferred realization of many functions (e.g. making apologies, requesting) is one or more formulaic sequences. For example, when shifting a topic, we commonly use a formulaic sequence like by the way, but create novel phrases such as It’s time for a topic change much more rarely. If creatively-generated language were cognitively more efficient, we would not expect to find formulaic sequences realizing functional language usage nearly as frequently as we do in corpus evidence. Formulaic sequences generally appear to be processed as wholes and it is likely that many are also learned as wholes, especially short salient ones like Go Away! However, there are good arguments for why some formulaic sequences are not learned in an ‘all-or-nothing’ manner.
Some first language (L1) acquirers seem to acquire an initial phonological mapping of formulaic sequences proceeding from the whole to the individual parts, but with some elements still incompletely grasped, especially the unstressed phonemic constituents (Peters, 1977; Wray, 2002, Chapter 6). In these cases, the formulaic sequences are learned over time, with the later stages of acquisition consisting of ‘filling in’ the gaps in the initial incomplete rendering of the sequence. Likewise, some of the component words in the formulaic sequence, as well as the syntactic structure, may not be known initially either. Peters (1983) suggests that these elements may be later extracted from the formulaic sequence through a process of segmentation. Another way formulaic sequences are learned over time involves the flexible slots many formulaic sequences have which can be filled with semantically-appropriate words or phrases. If the formulaic sequences are initially acquired with these slots as part of the structure, one might expect that it would take longer to learn the appropriate language insertions for these slots than to learn the fixed elements of the sequence. Alternatively, if the slots are created when paradigmatic variation is noticed at one location in a previously fully-fixed string, then this learning is also incremental in the sense that a fixed formulaic sequence must first be acquired before it is analyzed to form a formulaic sequence with slots. Moreover, shorter formulaic sequences can be combined together into longer and more complex formulaic sequences (Peters, 1983: 73), which means that the component formulaic sequences need to be learned as the initial step to acquiring the subsequent formulaic sequence. The transparency of formulaic sequences might also affect the learning burden. Formulaic sequences lie on a continuum of transparency/opaqueness, with idioms at the obscure end, but with many sequences being quite transparent at the other end (my point (here) is that _____). It may well be that transparent sequences are learned in a somewhat different manner than opaque sequences, perhaps even being generated online in the first instance through knowledge of the individual component words and knowledge of syntactical sequencing. The learning of one kind of lexeme (individual words) is incremental and produces different learning burdens (Schmitt, 2000; Nation, 1990), and there is no reason to believe that other types of lexeme (i.e. formulaic sequences) are any different in this respect. This would suggest that many formulaic sequences are partially known for a number of exposures until the point where they become mastered. The question of complete, holistic acquisition vs. incremental acquisition of formulaic sequences is an interesting one, because the answers may eventually determine which formulaic sequences are practical to teach to second language (L2) learners.
Formulaic sequences can have slots to enable flexibility of use, but the slots typically have semantic constraints.

We have mentioned that some formulaic sequences are completely fixed strings of words, while others have slots in addition to their fixed elements. There is no doubt that in some cases, fixedness is an advantage. For example, Watch Out! is an instantly recognizable warning, precisely because it is fixed, and little processing should be required to understand it. We could shout something like Watch the car coming behind you!, but if milliseconds count, then a shorter,


more conventionalised warning is likely to be most eﬀective. However, it is an advantage in much of language use to allow more ﬂexibility of meaning. For example, if we wish to express the notion that some activity or achievement is unusual, unexpected, or exceptional, then we can use phrases like Diane thinks nothing of running 5 miles before breakfast or He thinks nothing of driving 100 miles per hour on the freeway. The underlying structure to these sentences is ‘_____ thinks nothing of _____’, which allows the ﬂexibility to express the ‘unexpected’ notion in a wide variety of situations. This scaﬀold can aid ﬂuent language because some of the language is already preassembled and can be used in a variety of situations. The slots in this type of formulaic sequence are not always completely open however; there are often semantic constraints which control which word or words can be used in the slots. In the example above, the second slot must capture the idea of something unusual or unexpected, precisely because that is the reason for using this particular formulaic sequence. Note how the sentence She thinks nothing of sleeping 8 hours per night sounds strange because sleeping that amount of time is usual. Conversely, She thinks nothing of sleeping 14 hours per night seems acceptably surprising. Our intuitions say that these ﬂexible formulaic sequences are widely-used in discourse, simply because they are adaptable to a wide range of situations. We would expect this suggested broad usage to be evident in corpora. The evidence may well be in the data, but the problem is that ﬂexible formulaic sequences are diﬃcult to identify using current concordancing packages. Modern concordancers are good at identifying contiguous sequences, but we do not yet have software which can identify ﬂexible formulaic sequences automatically from corpora. Once this software is developed, we may ﬁnd that ﬂexible formulaic sequences are even more prevalent than totally ﬁxed ones. 
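To make the identification problem concrete, the following is a minimal sketch of how one known frame with open slots might be searched for with a hand-written pattern. The pattern, the slot names `subject` and `activity`, and the function `find_frame` are assumptions made for illustration; the sketch handles a single frame specified in advance and does not discover flexible sequences automatically.

```python
import re

# Assumed pattern for the frame '_____ thinks nothing of _____'.
# The first slot is taken to be a single word and the second slot is
# taken to run up to the next sentence-final punctuation mark.
FRAME = re.compile(
    r"\b(?P<subject>\w+) (?:thinks|think|thought) nothing of (?P<activity>[^.!?]+)",
    re.IGNORECASE,
)


def find_frame(text):
    """Return (subject, activity) slot fillers for each match of the frame."""
    return [
        (match.group("subject"), match.group("activity").strip())
        for match in FRAME.finditer(text)
    ]


examples = (
    "Diane thinks nothing of running 5 miles before breakfast. "
    "He thinks nothing of driving 100 miles per hour on the freeway."
)

for subject, activity in find_frame(examples):
    print(subject, "|", activity)
```

A pattern like this matches on form alone. It would accept She thinks nothing of sleeping 8 hours per night just as readily as the more natural examples, so the semantic constraint on the second slot would still have to be judged by a human analyst.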
Formulaic sequences can have semantic prosody.

Individual words (other than technical vocabulary) usually have a relatively wide range of usage. For example, the noun form of the word border can mean a political boundary, a geophysical boundary, the edge of something like a piece of fabric, and the verb form can mean being adjacent to such a boundary. However, once the word border is used syntagmatically with other words (e.g. bordering on), its usage can become constrained. Consider the following concordance lines from the British National Corpus (BNC):


managers with an abandon bordering on carelessness.
demonstrated an intransigence bordering on arrogance.
been consumed, struck me as bordering on the ill-mannered.
class were treated with distrust bordering on disdain.
sat in a state of sullenness bordering on rage or had conspicuously
fundamentally disturbed, and bordering on the deeply neurotic or worse.
area to the south-east of Cumbria, bordering on Lancashire.
drawn up to which all states bordering on its coasts should adhere.
or emerging from property bordering on a road, give way to pedestrians
Choose a good hotel, even bordering on the luxurious if you can.

Of the 100 instances of bordering on in the BNC, 27 do refer to a physical location, but by far the most frequent usage (57 instances) carries the meaning of ‘approaching an undesirable state (of mind)’. This majority usage entails a negative evaluation of the situation which is key to the meaning sense it imparts.4 This type of evaluation has been referred to as semantic prosody (Sinclair, 2004), and is a feature of a number of formulaic sequences.5 Sinclair illustrates how rife behaves similarly:

• Male chauvinism was rife in medicine in those days.
• Fears are now rife that the price could plunge well below 30p by the end of the year.

Proficient language users know that rife is used to express the meaning ‘something undesirable is too common’, and that the formulaic sequence in which rife is embedded typically has the following structure:

SOMETHING UNDESIRABLE is/are rife in LOCATION/TIME.

To project the formulaic sequence’s meaning, one slot has the semantic constraint ‘something nasty or undesirable’. Likewise, the sequence inevitably carries a negative connotation, because that is the primary reason this sequence is used. Knowledge of this allows the correct interpretation of the following as an assertion that there are too many artists in the panel system, even though this is not explicitly stated.

The panel system is rife with artists.
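Concordance lines of the kind shown above for bordering on can be produced with a simple keyword-in-context routine. The sketch below is a minimal illustration under stated assumptions: the function name `kwic`, the `width` parameter, and the bracketed output format are hypothetical, and the short test text merely reuses two of the fragments quoted above rather than querying the BNC.

```python
import re


def kwic(text, phrase, width=30):
    """Return keyword-in-context lines for each occurrence of phrase in text."""
    lines = []
    for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        # Take up to `width` characters of context on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the search phrase lines up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Two fragments reused from the concordance lines quoted in the text.
fragments = (
    "demonstrated an intransigence bordering on arrogance. "
    "area to the south-east of Cumbria, bordering on Lancashire."
)

for line in kwic(fragments, "bordering on"):
    print(line)
```

Sorting such lines into ‘physical location’ and ‘undesirable state’ senses, as in the counts reported above, remains a matter of manual judgment; the routine only retrieves and aligns the contexts.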
Thus, just as single words can carry register/appropriacy marking (skinny has a more pejorative marking than thin), formulaic sequences can carry semantic prosody, and it often is a key element of the sequence’s meaning. So it seems clear


that formulaic sequences can carry semantic prosody, but to our knowledge no one has done research into how many do and how many do not. This merely reinforces our impression that there is still a lack of research into many important aspects of formulaic sequences. Formulaic sequences are often tied to particular conditions of use. The term formulaic sequence is deliberately inclusive, and contains a number of diﬀerent kinds of patterned language. As mentioned earlier, some formulaic sequences are relatively obvious in terms of opacity of meaning and/or ﬁxedness of form and so have been deﬁned and discussed for quite some time: e.g. phrasal verbs, idiom, proverbs, and ﬁxed binomials/trinomials. However, even with these established categories of patterned language, deﬁnitions depending solely on descriptions of form and meaning are sometimes not completely clear. For example, most proverbs are semantically opaque, and would be classiﬁed as idioms on the basis of that, so what is the diﬀerence between them? One way of diﬀerentiating the two is their conditions of use. Idioms are typically used to express a concept (put someone out to pasture = retire someone because they are getting old), while proverbs typically state some commonly believed truth or advice (The longest journey begins with the ﬁrst step = a suggestion not to procrastinate, but to begin a long process by taking the ﬁrst necessary steps). In addition to these ‘traditionally-recognized’ categories, we would argue that conditions of use can also be used to fruitfully discuss a broader range of formulaic sequences. Wray (2002, Chapters 4–7) oﬀers a comprehensive exploration of the roles that formulaic sequences have in children and adults, but here we can highlight only a few key reasons why formulaic sequences are used in communication. It has been found that recurring situations in the social world require certain responses from people. 
These are often described as functions, and include such (speech) acts as apologizing, making requests, giving directions, and complaining. These functions typically have conventionalized language attached to them, such as I’m (very) sorry to hear about ____ to express sympathy and I’d be happy/glad to _______ to comply with a request (Nattinger and DeCarrico, 1992: 62–63). Because members of a speech community know these expressions, they serve as a quick and reliable way to achieve the related speech act. Nattinger and DeCarrico suggest that the use of formulaic sequences for functional purposes is widespread, and we are inclined to agree, but believe that the research is too thin on the ground to truly know the extent of their use. One common type of function which is often realized by formulaic sequences


is maintaining social interaction. People the world over engage in ‘light’ conversation for pleasure or to pass the time of day. In these cases, the purpose of communication is unlikely to be serious attempts to exchange information or to get someone to do something. Rather, the content is less important than the fact that there is a semblance of communication. In these cases, people rely on a set of conventionalised phatic phrases which are non-threatening and help keep the conversation ﬂowing. Examples include comments about the weather (Nice weather today; Cold isn’t it), agreeing with your interlocutor (Oh, I see what you mean; OK, I’ve got it), providing backchannels and positive feedback to another speaker (Did you really?; How interesting). As Kecskes (2003) points out in a study of what he terms ‘situation-bound utterances’, such sequences have the purpose of acting both as a social lubrication and of actively co-constructing interpersonal communication. Another speciﬁc function formulaic sequences realize is that of discourse organization. This is well known to EAP specialists, who commonly teach various discourse markers in writing classes (in other words, in conclusion). Spoken discourse is also rich in these organizing phrases, for example: on the other hand (expressing an alternative viewpoint), to put it another way (re-phrasing), as I was saying, speaking of which (providing links to previous utterances). Sometimes the purpose of using formulaic sequences is to transact information in a precise and eﬃcient manner. Technical words in a ﬁeld realize this purpose (scalpel is a speciﬁc type of knife used in medicine), but technical vocabulary does not have to be limited to single words. Indeed, in many ﬁelds exact phraseology is stipulated to avoid any possible misunderstanding. In aviation language, the phrase Cleared to land gives the pilot very speciﬁc rights and responsibilities. 
Likewise, the conventionalised way of reporting blood pressure is blood pressure is 140 over 60 and everyone in the medical ﬁeld knows to place the higher pressure ﬁgure ﬁrst. This speciﬁc type of ‘technical’ formulaic sequence is likely to be quite prevalent in technically-based discourse, but again, nobody has yet researched its true extent. There are other purposes which formulaic sequences carry out as well, as illustrated in Wray (2002). Additional ones are likely to emerge with further research. Because formulaic sequences have so many important and frequent uses in language, it should not be surprising that such patterns are frequent in language. Moreover, because particular sequences are tightly linked to particular language functions or information, our interlocutors expect them, and they are the preferred choice. Thus formulaic sequences are not only useful for eﬃcient language usage; they are essential for appropriate language use.

Formulaic sequences in action

The acquisition of formulaic sequences

For about two decades, there has been a steadily increasing amount of research on vocabulary in general (see Meara, 1987, 1992, 2003), and with it we are also starting to see more interest in formulaic language. Corpus-based research has informed the field by identifying formulaic language and describing how it is used in discourse. The body of continental work has largely focused on such issues as lexicography, the phraseology of regional dialects, and text linguistics (Kon Kuiper, personal communication). However, it is probably fair to say that the amount of research into the acquisition of formulaic sequences has been fairly modest in comparison (see Wray, 2002, for the most comprehensive overview; also Weinert, 1995).

There is a consensus that some L1 acquirers do learn and use formulaic sequences before they have mastered the sequences’ internal makeup. Moreover, the acquisition of formulaic sequences might depend to some extent on whether children are referential or expressive learners, that is, whether they are ‘system learners’ more than they are ‘item-learners’ (Cruttenden, 1981) (see also Brown, 1973 and Peters, 1983). Nelson (1973) found that children who had referential preferences (naming things or activities and dealing with individual word items) usually learned more single words, particularly nouns. Conversely, children who had more expressive tendencies (having interactional goals; focusing on the social domain) were more likely to learn whole expressions which were not segmented. The reason for these preferences may be psycholinguistic in nature (Bates and MacWhinney, 1987), or may only reflect what the child “supposes the language to be useful for”: predominantly naming things in the world or engaging in social interaction (Nelson, 1981: 186). It may also reflect the input a child receives: games for naming things in the world or social control clumps such as ‘D’ya wanna go out?’ (Nelson, 1981).
Regardless of the underlying reason, there seems to be a link between the need and desire to interact and the use of formulaic sequences. In L2 acquisition, formulaic sequences are also relied on initially as a quick means to be communicative, albeit in a limited way. This can lead to quicker integration into a peer group, which can result in increased language input. Wong Fillmore (1976) found this was the case with ﬁve young Mexican children trying to integrate into an English-medium school environment. She identiﬁed eight strategies the children used, and at least three of them directly involved formulaic language:



2

Norbert Schmitt and Ronald Carter

• Give the impression, with a few well-chosen words (phrases), that you speak the language
• Get some expressions you understand, and start talking
• Look for recurring parts in the formulas you know.

The use of formulaic sequences enabled the realization of these strategies even though the children’s language capabilities were quite limited. Furthermore, the use of formulaic sequences to facilitate language production is not restricted to L2 children. Schmidt’s (1983) study of Wes is a good example of the phenomenon in L2 adults; Wes’s speech is filled with formulaic language as a means of fulfilling his desire to be communicative, but not necessarily accurate.

But formulaic sequences may provide language learners with more than an expedient way to communicate; they might also facilitate further language learning. For L1 learners, it has been proposed that unanalysed sequences provide the raw material for language development, as they are segmented into smaller components and grammar (see Peters, 1983). If so, it is possible that they serve the same purpose for L2 learners (e.g. Bardovi-Harlig, 2002). However, even if this proves not to be the case, there is little doubt that the automatic use of acquired formulaic sequences allows chunking, freeing up memory and processing resources (Kuiper, 1996, and Ellis, 1996 who explores the interaction between short-term and long-term phonological memory systems). These can then be utilized to deal with conceptualising and meaning, which must surely aid language learning. Wood (2002: 5) nicely summarizes the possible double role of formulaic sequences in language acquisition:

“They are acquired and retained in and of themselves, linked to pragmatic competence and expanded as this aspect of communicative ability and awareness develops. At the same time, they are segmented and analyzed, broken down, and combined as cognitive skills of analysis and synthesis grow. Both the original formulas and the pieces and rules that come from analysis are retained.”

So sequence-based learning seems to have a part to play in language acquisition. A key question is how large a part it plays compared to grammar-based acquisition. Wray and Perkins (2000) and Wray (2002) argue that the balance of sequence-based versus grammatically-generated language varies during an L1 child’s development. During Phase 1 (birth to around 20 months), the child will mainly use memorized vocabulary for communication, largely learned through imitation. Some of this vocabulary will be single words, and some will consist of sequences. At the start of Phase 2 (until about age 8), the child’s grammatical awareness begins, and the proportion of analytic language compared to holistic language increases, although with overall language developing quickly in this phase, the amount of holistically-processed language is still increasing in real terms. During Phase 3 (until about age 18), the analytic grammar is fully in place, but formulaic language again becomes more prominent. “During this phase, language production increasingly becomes a top-down process of formula blending as opposed to a bottom-up process of combining single lexical items in accordance with the specification of the grammar” (Wray and Perkins, 2000: 21). By Phase 4 (age 18 and above), the balance of holistic to analytic language has developed into adult patterns.

The course of formulaic sequence development is more difficult to chart in L2 learners. Typically there is early use of formulaic sequences, often after a silent period. As learners’ proficiency improves, there is the reasonable expectation of language which is more accurate and appropriate. In natives, this is achieved to a large extent through the use of formulaic sequences. Unfortunately, the formulaic language of L2 learners tends to lag behind other linguistic aspects (Irujo, 1993). This may be partly due to a lack of rich input: Irujo (1986) suggests that idioms are often left out of speech addressed to L2 learners. Learners also seem to avoid the use of idiomatic language (Kellerman, 1978), although this may have more to do with the degree of L1–L2 similarity than any intrinsic difficulty (Laufer and Eliasson, 1993; Laufer, 2000; Vihman, 1982: 272). There is also the tendency to stick with familiar and ‘safe’ sequences which the learners feel confident in using (Granger, 1998), although De Cock (2000) found that some formulaic sequences were overused, some underused, and others simply misused by nonnatives when compared to native norms.
These tendencies have been noted by researchers, but overshadowing all of these results is the great variation in L2 use of formulaic sequences, which must at least partially stem from the fact that L2 learners are a diverse group in terms of age, manner of acquisition, L1, social environment, etc. (Wray, 2002: 144ﬀ) . There may well be an underlying systematicity to the acquisition and use of L2 formulaic language, but there is simply not enough focused research at present to say very much with conviction. One interesting development is the emergence of pattern-based models of acquisition, which posit that the human facility for language learning is based on the ability to extract patterns from input, rather than being under the guidance of innate principles and parameters which determine what aspects of grammar can and cannot be acquired (see Ellis, 1996, 2002, SSLA 24). This line of thinking suggests that we learn the letter sequences which are acceptable in a language (the consonant cluster sp can be word-initial in English, but hg cannot) simply by repeatedly seeing sp at the beginning of words, but not hg. This

3

4

Norbert Schmitt and Ronald Carter

learning is implicit, and may not be amenable to conscious metalinguistic explanation. Of course, learners may eventually reach the point where they can declare a ‘rule’ for this consonant clustering, but the rule is an artefact of the pattern-based learning, rather than the underlying source of learning. This pattern-based learning also works for larger linguistic units, such as how sequences of morphemes can combine to form words (un-question-able, un-reli-able, un-fathom-able). Moving to words, we gain intuitions about which words collocate together and which do not (blonde hair, *blonde paint; auburn hair but only for women, not men). Many of these collocations must be based solely on pattern recognition, because there is often no semantic reasoning behind acceptable/nonacceptable pairings (*blonde paint makes perfect logical sense). Neither are collocations likely to be learned explicitly, because they are not normally taught, and even if they are, only possible cases are illustrated, not inappropriate combinations. Longer formulaic strings, which are also based on patterns rather than rules, seem to fit very nicely with such sequence-based models of acquisition as well. Time will tell whether this kind of model best captures the mechanics of formulaic sequence acquisition (and that of language in general), but one thing seems certain: given the increasingly evident importance of formulaic sequences in language use, convincing explanations of the mechanics of their acquisition must become an essential feature of any model of language acquisition.
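As a rough illustration of the pattern-extraction idea discussed above (sp versus hg in word-initial position), the following minimal Python sketch simply counts the first two letters of the words it is exposed to. The word list, the function names and the threshold are all hypothetical and chosen only for illustration; this is not an implementation of any model proposed in the literature cited here.

```python
from collections import Counter

def initial_cluster_counts(words):
    """Count the first two letters of every word with at least two letters."""
    return Counter(w[:2].lower() for w in words if len(w) >= 2)

def seems_acceptable(cluster, counts, threshold=1):
    """Treat a word-initial cluster as acceptable once it has been
    encountered at least `threshold` times (an assumed, arbitrary cutoff)."""
    return counts[cluster] >= threshold

# Toy input invented for illustration; a real learner meets vastly more words.
toy_input = ["spot", "spin", "speak", "space", "hair", "paint", "high", "spoon"]
counts = initial_cluster_counts(toy_input)

print(counts["sp"], counts["hg"])
print(seems_acceptable("sp", counts), seems_acceptable("hg", counts))
```

Note that no rule about consonant clusters is stated anywhere in the sketch: the judgement falls out of the tallies alone, which is the sense in which a later ‘rule’ would be an artefact of the counts.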

Issues explored in this volume

This volume has two main purposes. It reports on some of the first sustained research into the acquisition, processing, and use of formulaic sequences. Equally important, it utilizes a wide range of methodologies to explore formulaic sequences, some of them used for the first time. As such, the volume models methodological directions for future research in this area, and illustrates how innovative research methods can be fruitfully applied. It is difficult to fit the chapters in this volume into neat categories, but some logical grouping was possible. The first three chapters provide backgrounding for the studies to follow. Chapters 4–6 report on the acquisition-based CRAL studies. Chapters 7–9 report on the CRAL studies focusing on the processing of formulaic sequences. The next two chapters do not fit into any particular category, but Chapters 12 and 13 have a definite pedagogic element. The rest of this section provides brief overviews of the volume chapters.


It should be clear from the brief overview in this chapter that numerous issues need to be explored concerning how formulaic sequences are acquired, processed, and used. This requires research, and most of this research will be empirical. This means that valid and reliable measures of formulaic sequences need to be developed or refined. Read and Nation consider measurement methodology in Chapter 2, providing an overview of issues which need to be considered when tapping formulaic sequence knowledge.

Much of everyday language is conventionalized, and this conventionalization is realized by various types of formulaic sequence. However, there are some kinds of language which are exceptionally conventionalized. Some examples of this are language which routinely covers the same topics over and over again (weather reporting, oral heroic poems), language where speed is important (auctioneering, sports reporting), and language where very precise formulations are required (air traffic control). Exploration of how formulaicity is involved in this kind of language use can provide insights into how it is used in more general circumstances. In Chapter 3, Kuiper reviews his and other research into highly conventionalized language and highlights the advantages of formulaic sequences in this language, as well as showing how the acquisition of situation-specific formulaic sequences (and the attending cultural knowledge) requires a long-term learning process. The reader should be aware, however, that Kuiper uses somewhat different terminology and definitions concerning formulaic language than most of the other chapters in this volume.

Corpus evidence shows that formulaic sequences are widespread in native language. However, some research indicates that nonnatives have limited mastery of a limited number of formulaic sequences. Schmitt et al. address this issue directly in Chapter 4.
The research team measured the productive and receptive knowledge of academically-based formulaic sequences in EAP students studying to enter British universities. They found that the students knew a surprising number of the formulaic sequences even before they entered the program, and knew most of them after the program ﬁnished, indicating that learning had taken place. Somewhat surprising though, the attitude/motivation and aptitude factors measured as part of the study did not predict this improvement. Even though the participants in the above study were able to improve their knowledge of formulaic sequences as a group, obviously some learners improved more than others. Using the classic ‘good learner/poor learner’ design, in Chapter 5 Dörnyei, Durow, and Zahran explore four successful and three unsuccessful learners in detail using a series of extended interviews. From this rich one-on-one data, they ﬁnd that success in acquiring formulaic sequences

5

6

Norbert Schmitt and Ronald Carter

seems to be strongly related to the participants’ active involvement in the English-speaking social community. Unfortunately, some of the international students in this study found it extremely diﬃcult to join ‘host-national networks’. The study suggests that if sociocultural adaptation is absent, only a combination of particularly high levels of language aptitude and motivation can compensate for this lack. The theme of socio-cultural integration is investigated in depth in Chapter 6. Adolphs and Durow analyze the spoken output of one high-integration student and one low-integration student to track their use of formulaic sequences over seven months at a British university. In the ﬁrst analysis, the participants’ production of 3-word formulaic sequences is tallied, and only the high-integration student seems to show any real progress. However, this tally only shows the number of sequences produced, but not their quality. The authors carry out a second analysis in which they ﬁrst compile a list of the most frequent 15 words in the participants’ output, and then run a sequence analysis to identify the sequences which form around these words (e.g. know → I don’t know). The sequences from the participants’ production are subsequently compared to CANCODE norming data. Based on this analysis, the high-integration student clearly outperforms the low-integration student, providing additional evidence for the importance of socio-cultural integration in the acquisition and use of formulaic sequences. Corpus analysis has shown that there are a great number of word clusters which recur at varying degrees of frequency within a corpus. However, what does the existence of recurrent clusters in corpora tell us about how those clusters are stored and processed by the human mind? 
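The kind of tally described for Chapter 6 (counting 3-word sequences, then finding the sequences that form around the most frequent words) and the general notion of a recurrent cluster can be sketched in a few lines of Python. This is only an assumed approximation, not the procedure Adolphs and Durow actually used: the tokenizer, the function names, the sample text and the min_count threshold are hypothetical, and the comparison with CANCODE norming data is not modelled.

```python
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace and strip punctuation from token edges."""
    tokens = (t.strip('.,?!;:"()') for t in text.lower().split())
    return [t for t in tokens if t]

def ngrams(tokens, n=3):
    """Return every run of n consecutive tokens (sentence breaks are ignored)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sequences_around_frequent_words(tokens, top_k=15, n=3, min_count=2):
    """For each of the top_k most frequent words, collect the n-word
    sequences containing it that recur at least min_count times."""
    key_words = [w for w, _ in Counter(tokens).most_common(top_k)]
    gram_freq = Counter(ngrams(tokens, n))
    result = {}
    for word in key_words:
        hits = {g: c for g, c in gram_freq.items() if word in g and c >= min_count}
        if hits:
            result[word] = hits
    return result

# Invented sample output, for illustration only.
sample = "I don't know. I don't know what you mean. You know what I mean."
found = sequences_around_frequent_words(tokenize(sample))
print(found.get("know"))
```

A tally of this sort only says how often a sequence was produced; judging its quality would still require a comparison against reference data, as the chapter summary notes.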
In Chapter 7, Schmitt, Grandage, and Adolphs embed a variety of recurrent clusters drawn from corpus analysis into a psycholinguistic dictation task to see how natives and nonnatives are able to reproduce those clusters. The results show that, for the natives, although some of those clusters are likely to be stored holistically in the mind, a large number are not. The nonnative performance suggests that very few of the clusters are holistically stored in a way that would facilitate accessible retrieval and ﬂuent use. The authors conclude that it cannot be assumed that recurrent clusters identiﬁed through corpus techniques are necessarily stored in the mind in a holistic manner. The next two chapters explore how formulaic sequences are processed, using techniques borrowed from psychology. In Chapter 8, apparatus is employed which tracks the eye movements of participants as they read passages in which formulaic sequences are embedded. Underwood, Schmitt, and Galpin ﬁnd that


both natives and nonnatives have fewer eye ﬁxations on words which are part of a formulaic sequence, than the same words when they are part of non-formulaic text. The natives also focus on the formulaic sequence words for shorter durations, although the gaze periods for nonnatives do not diﬀer between formulaic and nonformulaic words. The overall results indicate that there is a processing advantage for formulaic sequences, at least in terms of reading. In Chapter 9, Schmitt and Underwood use the same passages with embedded formulaic sequences, but this time the task for participants is to read the passage one word at a time within a self-paced reading paradigm. The participants tap a button to bring up each subsequent word in a passage, and the time between taps measured. In contrast to the above study, this technique shows no diﬀerence in recognition speed between the words in their formulaic vs. nonformulaic environments. However, for the nonnative participants, words appearing in formulaic sequences that were known are recognized faster than words in unknown formulaic sequences. This may well reﬂect the diﬃculty the nonnatives have with the unknown formulaic sequences. Overall, the results are less than clear, and the authors suggest that the self-paced reading technique needs to be reﬁned for further investigations. Formulaic sequences seem to be a common feature across languages. Thus knowing a formulaic sequence in one language may aﬀect the way it is learned in another. Spöttl and McCarthy (Chapter 10) examine participants who knew, or were learning, three or more languages and compare their knowledge of formulaic sequences across those languages. 
A think-aloud protocol analysis found that participants move between formulaic sequences among their various languages in three main ways: 1) the formulaic sequence is translated between languages holistically, without hesitation, repetition, or evaluation, 2) when the initial attempt at translation fails, the formulaic sequence itself is repeated and various possibilities are evaluated, and 3) when the initial attempt at translation fails, the individual words of the formulaic sequence are repeated (but not the whole sequence), and a search process is initiated which focuses on those words or the grammar of the language. The second approach is found to be most common, and a number of strategies are identified within this approach. The authors also find that their participants are not particularly good at assessing their true knowledge of target formulaic sequences.

A perpetual question in pedagogy is how to present target items to learners. Presumably anything that makes those items more salient or noticeable is beneficial for learning. In Chapter 11, Bishop explores whether the use of typographical highlighting (underlining and red font) of words and formulaic

7

8

Norbert Schmitt and Ronald Carter

sequences encourages nonnative learners to click on those items for glosses. Participants look up more glosses for unknown words than unknown formulaic sequences for unhighlighted items, but for highlighted items, this result is reversed. This indicates that such highlighting can make formulaic sequences more noticeable. It has been claimed that formulaic sequences are less easily recognizable as holistic entities than words, because unlike words with spaces around them to indicate their boundaries, it is not clear where the boundaries of unknown formulaic sequences lie. If this is true, then highlighting the form of formulaic sequences can make their ‘wholeness’ apparent, which may facilitate learning. It has often been assumed that formulaic sequences take a long time to acquire. However, what would happen if they were taught intensively over as short a period as ﬁve days? Wray (Chapter 12) reports on a learner taking part in the British television program “Welsh in a Week”. The participant studies formulaic sequences with the purpose of becoming suﬃciently ﬂuent with a limited amount of Welsh in order to meet the challenge of a public presentation. However, although the learner understands that she would be most successful if she simply memorized the material given to her, by ﬁve months after her performance she had introduced typical learner errors into what she remembered of the original material. This suggests that the adult learner’s need to analyze linguistic material is unavoidable, and implies that the teaching of formulaic material to post-pubescent learners may be an uphill struggle. Jones and Haywood also take a pedagogical approach in Chapter 13, but this time in a traditional EAP classroom. They report on their eﬀorts to develop materials for and to teach formulaic sequences to their students over a period of ten weeks. 
The students are initially sceptical about the value of focusing on formulaic sequences, but seem to eventually realize their importance. The authors carefully track their students and ﬁnd some evidence of modest gains in formulaic sequence knowledge on a test by the end of the study, although there is no substantial evidence of this in the students’ writing. However, there is clear evidence that the students had increased their awareness of formulaic sequences in general.

Other lines of research into formulaic sequences

This volume reports on research specifically into the acquisition, processing, and use of formulaic sequences. But in the end it is only one book and cannot


hope to cover the many diverse questions which beg for answers. A few of these questions are listed here as intriguing prompts for any researcher who might want to pursue studies in this important developing area.

1. Once learned, are formulaic sequences overused or underused in terms of the norms of stylistic appropriacy of the speech community, in the same way individual words can be over- or underused?
2. How are formulaic sequences acquired in naturalistic and formal settings? What is the same/different about learning formulaic sequences in these settings? What is the best way to teach formulaic sequences? Can they be taught at all?
3. What is the relationship between knowledge of formulaic sequences and knowledge of their individual component words?
4. How many exposures are necessary to learn formulaic sequences with various kinds of input? Is it the same as for individual words?
5. What is the nature of attrition of formulaic sequences? Are some elements retained better than others, or is the whole chunk either retained or forgotten?
6. Which elements of a formulaic sequence are most salient? Do formulaic sequences cluster around a key word or core collocation?
7. Are formulaic sequences learned in an all or nothing manner?
8. Does giving attention to formulaic sequences increase the chances of their acquisition?

There are numerous other questions and we hope that this volume will be followed by many exploring this area. If it is accepted that formulaic sequences play an important part in language use, then any further research can only add to our knowledge of second language acquisition, linguistic theory, and many other applied linguistic areas.

Notes . Sinclair illustrates how both principles are essential but that attention has, especially within the Chomskyan tradition, normally been devoted mainly to the former principle. 2. It should be noted that continental researchers have treated multiword units as an important feature of language for decades. However, they often published in German and Russian, and so their impact was not as great as it might have been in the Anglophone world. For entry into some of this research, see Zgusta (1971), Aisenstadt (1981), Mel’čuk (1981), Howarth (1996), Cowie (1998), and Burger (2003).

9

20

Norbert Schmitt and Ronald Carter 3. Some authors in this book have chosen to use other terms for various reasons, but formulaic sequence will be the cover term used in most chapters. 4. Bordering on is also used to express positive evaluation, as in the ‘hotel’ example, in a minority of cases (9 instances out of the 100). 5. Stubbs (1995) describes the same phenomenon, referring to it as collocational prosody. Also, see Stubbs (2002) for a range of corpus-based studies of formulaic sequences.

Acknowledgements

Our deepest appreciation goes to Alison Wray and Kon Kuiper who gave us detailed feedback on an earlier draft of this chapter. Their comments were invaluable in helping us to sharpen our thinking and much of what is good in the chapter draws heavily upon those comments.

References

Aisenstadt, E. 1981. Restricted collocations in English lexicology and lexicography. ITL 53: 53–61.
Bardovi-Harlig, K. 2002. A new starting point? Investigating formulaic use and input in future expression. Studies in Second Language Acquisition 24: 189–198.
Bates, E. and MacWhinney, B. 1987. Competition, variation, and language learning. In Mechanisms of Language Acquisition, B. MacWhinney (ed.), 157–193. Hillsdale NJ: Lawrence Erlbaum.
Biber, D., Johansson, S., Leech, G., Conrad, S., and Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Longman.
Brown, R. 1973. A First Language. London: Allen and Unwin.
Burger, H. 2003 (2nd ed.). Phraseologie: Eine Einführung am Beispiel des Deutschen. Berlin: Erich Schmidt Verlag.
Cowie, A. P. 1998. Phraseological dictionaries: Some East-West comparisons. In Phraseology: Theory, Analysis, and Applications, A. P. Cowie (ed.), 209–228. Oxford: OUP.
Cruttenden, A. 1981. Item-learning and system-learning. Journal of Psycholinguistic Research 10: 79–88.
de Cock, S. 2000. Repetitive phrasal chunkiness and advanced EFL speech and writing. In Corpus Linguistics and Linguistic Theory, C. Mair and M. Hundt (eds), 51–68. Amsterdam: Rodopi.
Ellis, N. C. 1996. Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition 18: 91–126.
Ellis, N. C. 2002. Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition 24: 143–188.
Erman, B. and Warren, B. 2000. The idiom principle and the open-choice principle. Text 20: 29–62.
Foster, P. 2001. Rules and routines: A consideration of their role in the task-based language production of native and non-native speakers. In Researching Pedagogic Tasks: Second Language Learning, Teaching, and Testing, M. Bygate, P. Skehan, and M. Swain (eds), 75–93. Harlow: Longman.
Granger, S. 1998. Prefabricated patterns in advanced EFL writing: Collocations and formulae. In Phraseology: Theory, Analysis and Applications, A. P. Cowie (ed.), 145–160. Oxford: OUP.
Howarth, P. 1996. Phraseology in English Academic Writing: Some Implications for Language Learning and Dictionary Making. Tübingen: Max Niemeyer.
Irujo, S. 1986. A piece of cake: Learning and teaching idioms. ELT Journal 40: 236–242.
Irujo, S. 1993. Steering clear: Avoidance in the production of idioms. International Review of Applied Linguistics in Language Teaching 31: 205–219.
Jackendoff, R. 1995. The boundaries of the lexicon. In Idioms: Structural and Psychological Perspectives, M. Everaert, E. van der Linden, A. Schenk, and R. Schreuder (eds), 133–166. Hillsdale NJ: Erlbaum.
Kecskes, I. 2003. Situation-Bound Utterances in L1 and L2. Berlin: Mouton de Gruyter.
Kellerman, E. 1978. Giving learners a break: Native language intuitions as a source of predictions about transferability. Working Papers in Bilingualism 15: 309–315.
Kuiper, K. 1996. Smooth Talkers: The Linguistic Performance of Auctioneers and Sportscasters. Mahwah NJ: Lawrence Erlbaum.
Kuiper, K. and Haggo, D. 1984. Livestock auctions, oral poetry, and ordinary language. Language in Society 13: 205–234.
Laufer, B. 2000. Avoidance of idioms in a second language: The effect of L1-L2 degree of similarity. Studia Linguistica 54: 186–196.
Laufer, B. and Eliasson, S. 1993. What causes avoidance in L2 learning: L1-L2 difference, L1-L2 similarity, or L2 complexity? Studies in Second Language Acquisition 15: 35–48.
Meara, P. 1987. Vocabulary in a Second Language: Vol. 2. London: Centre for Information on Language Teaching and Research (CILT).
Meara, P. 1992. Vocabulary in a second language. Volume III 1986–1990. Reading in a Foreign Language 9: 761–837.
Meara, P. The Vocabulary Acquisition Research Group Archive (VARGA). Internet resource: http://www.swan.ac.uk/cals/calsres/varga/index.htm. Accessed June 21, 2003.
Mel’čuk, I. 1981. Meaning text models: A recent trend in Soviet linguistics. Annual Review of Anthropology 10: 27–62.
Mel’čuk, I. 1995. Phrasemes in language and phraseology in linguistics. In Idioms: Structural and Psychological Perspectives, M. Everaert, E. van der Linden, A. Schenk and R. Schreuder (eds), 167–232. Hillsdale NJ: Erlbaum.
Moon, R. 1997. Vocabulary connections: Multi-word items in English. In Vocabulary: Description, Acquisition and Pedagogy, N. Schmitt and M. McCarthy (eds), 40–63. Cambridge: CUP.
Nation, I. S. P. 1990. Teaching and Learning Vocabulary. New York: Heinle and Heinle.
Nattinger, J. R. and DeCarrico, J. S. 1992. Lexical Phrases and Language Teaching. Oxford: OUP.
Nelson, K. 1973. Structure and Strategy in Learning to Talk. Monographs of the Society for Research in Child Development, Serial no. 149, nos 1–2.
Nelson, K. 1981. Individual differences in language development: Implications for development and language. Developmental Psychology 17: 170–187.
Pawley, A. and Syder, F. H. 1983. Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In Language and Communication, J. C. Richards and R. W. Schmidt (eds), 191–225. London: Longman.
Peters, A. M. 1977. Language learning strategies: Does the whole equal the sum of the parts? Language 53: 560–573.
Peters, A. 1983. The Units of Language Acquisition. Cambridge: CUP.
Schmidt, R. W. 1983. Interaction, acculturation, and the acquisition of communicative competence: A case study of an adult. In Sociolinguistics and Language Acquisition, N. Wolfson and E. Judd (eds), 137–174. Rowley MA: Newbury House.
Schmitt, N. 2000. Vocabulary in Language Teaching. Cambridge: CUP.
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: OUP.
Sinclair, J. 2004. Trust The Text: Lexis, Corpus, Discourse. London: Routledge.
Stubbs, M. 1995. Collocations and semantic profiles: On the cause of trouble with quantitative studies. Functions of Language 2: 1–33.
Stubbs, M. 2002. Words and Phrases: Corpus Studies of Lexical Semantics. Oxford: Blackwell.
van Lancker, D., Canter, G. J., and Terbeek, D. 1981. Disambiguation of diatropic sentences: Acoustic and phonetic cues. Journal of Speech and Hearing Research 24: 330–335.
Vihman, M. M. 1982. Formulas in first and second language acquisition. In Exceptional Language and Linguistics, L. K. Obler and L. Menn (eds), 261–284. New York: Academic Press.
Weinert, R. 1995. The role of formulaic language in second language acquisition: A review. Applied Linguistics 16: 180–205.
Wong Fillmore, L. 1976. The Second Time Around: Cognitive and Social Strategies in Second Language Acquisition. Unpublished PhD thesis, Stanford University.
Wood, D. 2002. Formulaic language in acquisition and production: Implications for teaching. TESL Canada Journal 20: 1–15.
Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: CUP.
Wray, A. and Perkins, M. R. 2000. The functions of formulaic language: An integrated model. Language and Communication 20: 1–28.
Zgusta, L. (ed.). 1971. Manual of Lexicography. The Hague: Mouton.

Measurement of formulaic sequences

John Read and Paul Nation

Victoria University of Wellington

Introduction

Most of the research on formulaic sequences until now (particularly that done before the advent of computers and the field of corpus linguistics) has primarily involved descriptive work to exemplify and classify multiword units which scholars have considered to function lexically rather than grammatically in the language. However, if work in this area is to advance and to move into the mainstream of applied linguistic research, it is necessary to address some important methodological issues that arise in the investigation of these lexical units. This chapter draws on insights from research methodology and language testing to identify particular problems of measurement in dealing with formulaic language and propose how they might be solved. We will illustrate some of our points by reference to the work reported in other chapters of this volume.

One of the exciting developments in recent years is the realisation that formulaic sequences have been of long-standing interest to scholars in a whole variety of disciplines both inside and outside applied linguistics. Thus, in a sense, we are currently in a phase of surveying and attempting to integrate the insights that have been gained by researchers working in different fields all around the world without necessarily being aware of what others were doing. This is well illustrated by Wray’s (2002) excellent book, which draws together work in general linguistics, phraseology, lexicography, corpus linguistics, first and second language acquisition, language teaching, neurolinguistics and other disciplines. It is important to note that scholars in these various fields not only bring their own theoretical perspectives to bear on the study of formulaic language but also have distinctive methodological approaches to their work.
This of course is a familiar situation in an interdisciplinary ﬁeld like applied linguistics, but what it means is that it would be unrealistic for us to attempt to impose a single research paradigm on the study of formulaic sequences. Thus, in this chapter we will attempt to focus on general principles and issues of measurement that need

to be taken into account regardless of the particular research paradigm that the investigator is working within. Use of the term measurement may suggest that we favour quantitative or statistically based methods of investigation rather than qualitative ones. However, we are adopting a broad deﬁnition of measurement which includes criteria for the identiﬁcation of multiword units as formulaic sequences and for classifying them into categories, even if no further counting of relative frequencies or any other form of statistical analysis is then applied. In addition, we argue that an adequate account of formulaic units as they function in language acquisition and language use can come only from a combination of quantitative and qualitative analyses. The same already applies, of course, in word-based vocabulary studies. Although it may seem quite straightforward to the naïve observer to identify and count words, linguists and vocabulary researchers are well aware of the problematic nature of the word as a linguistic concept. A purely formal deﬁnition of a word as word form is of limited value in itself, as illustrated by one of the early computer-based word frequency counts (Carroll, Davies and Richman, 1971), where people, People, people’s, People’s, peopled, peoples and Peoples are all listed as separate items. Thus, vocabulary scholars have developed more meaningful conceptual units, such as the lemma, homonym, word family, lexeme or lexical unit, and the raw output of a frequency count needs to be classiﬁed at least partially by means of human judgement into one or more of these categories in order to be usable for further analysis. Some of these categories already involve units consisting of more than one word form, such as compound nouns, phrasal verbs and idiomatic expressions. 
Once we shift the attention to the whole range of multiword units, the basic elements are rather more diﬃcult to identify than individual word forms are and so both quantitatively and qualitatively, more sophisticated procedures are required to locate and classify them. In this chapter we intend to do the following: We will consider a deﬁnition of formulaic sequences and then look at reliability and validity issues in their identiﬁcation, eventually focusing on the importance of triangulation. Finally, we consider the procedures used in several of the studies included in this volume.
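The point made above about raw word-form counts can be illustrated with a minimal sketch. It counts the seven forms that the text reports as separate entries in Carroll, Davies and Richman (1971) and then groups them into one family. The function names and the small hand-coded table are illustrative assumptions only; the table stands in for the human judgement that, as noted above, the classification step requires.

```python
from collections import Counter

# The seven word forms listed separately in the frequency count cited in the text.
FORMS = ["people", "People", "people's", "People's", "peopled", "peoples", "Peoples"]

# Hypothetical hand-coded decisions: which inflected forms belong to which family.
HAND_CODED = {"peopled": "people", "peoples": "people"}


def family(form):
    """Map a word form to a family label: lowercase, drop possessive 's, then apply hand coding."""
    base = form.lower().replace("\u2019", "'")  # treat curly and straight apostrophes alike
    if base.endswith("'s"):
        base = base[:-2]
    return HAND_CODED.get(base, base)


def count_forms(tokens):
    """Purely formal count: every distinct string is a separate item."""
    return Counter(tokens)


def count_families(tokens):
    """Count after grouping word forms into families."""
    return Counter(family(token) for token in tokens)


if __name__ == "__main__":
    print(len(count_forms(FORMS)), "separate word forms")
    print(dict(count_families(FORMS)))
```

The automatic steps (case folding, possessive stripping) handle only part of the grouping; the remaining decisions are recorded explicitly, which is one way of keeping the human judgement visible and replicable.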

Definition of the construct

In modern validity theory in educational measurement, a crucial step initially is to define the construct at a conceptual level. This then provides a basis for judging the adequacy of operational measures of the construct. In the case of formulaic sequences, Wray (2002: 9) has proposed a definition which is likely to be very influential but it also needs to be subject to critical scrutiny. If her definition is adopted, then the ultimate goal of an analysis will be to identify sequences that are “stored and retrieved whole from memory at the time of use”. This is a challenging goal because the means of storage and retrieval of the same sequence can differ from one individual to another, and can differ from one time to another for the same individual depending on a wide range of factors such as changes in proficiency, changes in processing demands, and changes in communicative purpose.

There is some evidence for this variability from the study of idioms. Grant (2003) did an exhaustive study of what she called core idioms, which are non-compositional (the meaning of the parts does not give the meaning of the whole) and non-figurative (the image created by the unit does not relate to the meaning of the unit). They must also consist of words that can occur in other places. Grant found that English has about 104 core idioms. About 25% are frozen, and only 10 had a literal equivalent in the British National Corpus. Even among such a narrowly defined group of items, where we would expect to find extreme formulaicity, the norm seems to be that there is considerable variation. Here are some of the variants of the core idiom pull someone’s leg: pull my blue leg, somebody’s leg was being pulled, having his leg pulled, leg pulling, a leg pull, a leg puller, tugged my leg, yank somebody’s leg, leg tugged/yanked. There is a similar set of variants for put your foot in your mouth: put your foot in it, putting his foot in his mouth to the kneecap, put his foot well and truly in his mouth, with her foot in his mouth, foot and mouth, foot-in-mouth moments, foot-and-mouth soldiers, put your feet in your mouth.
Most of these are low in frequency but there is a lot of variation, even without considering the numerous versions of the object or verb form. This variability however does not prove that all uses of the idiom are not formulaic. It is clear that some of the variations are deliberate attempts to add humour by playing with something that is typically fixed. The evidence from the study of core idioms suggests that there are probably very few sequences, if any, that are always formulaic, and thus the most valid criteria for deciding formulaicity will be those that take account of features that are present in each particular use of a possible sequence.

Wray’s (2002: 9) definition of formulaic sequences is deliberately inclusive. It goes only a short way towards specifying the form in which a sequence is stored and it states explicitly that the sequence need not be continuous. That is, there may be insertions in it, such as when right bloody is inserted into came a cropper: came a right bloody cropper. The definition also seems to exclude substitution of items within a sequence, such as the following variations within the ‘pull’ and ‘person’ components of pulling my leg:

‘pull’ component: pull, pulled, pulls, pulling, yank etc., tug etc.
‘person’ component: his, her, my, your, our, someone’s, his sister’s
final component: leg

Similarly, transformations of a sequence would not be included: chew the fat, fat-chewing, fat-chewers. These substitutions and transformations would be excluded because they would involve “generation or analysis by the language grammar” (Wray 2002: 9).

The definition does not specify the form of the items in storage. If it is verbatim storage, where the actual words of the sequence are stored without the possibility of substitution or transformation, then Grant’s (2003) research suggests we are dealing with only a small number of sequences that are rather infrequent. This definition of a formulaic sequence is one that Kuiper (this volume) seems to follow. It is relatively easy to identify such sequences because of their fixed form, and most researchers would readily consider them formulaic. However, much further along a possible scale of formulaicity are the numerous examples of collocational prosody such as bordering on, where the formula is at a rather abstract level. These sequences allow insertion, inflection, substitution, deletion, and transformation which all involve “generation or analysis by the language grammar”. The term formulaic sequence could not be sensibly applied to such patterns. Thus, Grant’s (2003) findings challenge the adequacy of Wray’s definition of the construct.

The interest in formulaic sequences is partly a reaction to the lack of description of semantic patterning in previous descriptions of language. However, semantic patterning and formulaic sequences are not the same thing and so the definition needs to take account of this distinction if it is to be comprehensive enough to cover the phenomena to be investigated. Given the variability in formulaic language that we noted above, the definition of these sequences may need to be tailored to some degree to the specific objectives of each research study.
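The difference between verbatim storage and a sequence with substitutable components, as in the pull someone’s leg example above, can be sketched as a search pattern. The pattern below is a rough illustration under stated assumptions: the verb and possessor lists come from the variants quoted in this section, the optional inserted word covers cases like pull my blue leg, and the names VERBS, POSSESSOR and find_variants are hypothetical. It is not a validated operational definition.

```python
import re

# Verb slot: pull, yank and tug with common inflections (taken from the variants in the text).
VERBS = r"(?:pull(?:s|ed|ing)?|yank(?:s|ed|ing)?|tug(?:s|ged|ging)?)"

# Person slot: listed possessives, or a one- or two-word phrase ending in 's (e.g. his sister's).
POSSESSOR = r"(?:his|her|my|your|our|someone's|somebody's|\w+(?:\s\w+)?'s)"

# Verb + possessor + optionally one inserted word + the fixed component "leg".
PATTERN = re.compile(rf"\b{VERBS}\s+{POSSESSOR}\s+(?:\w+\s+)?leg\b", re.IGNORECASE)


def find_variants(text):
    """Return every stretch of text that fits the slot-and-filler pattern."""
    normalised = text.replace("\u2019", "'")  # treat curly apostrophes as straight ones
    return [match.group(0) for match in PATTERN.finditer(normalised)]


if __name__ == "__main__":
    for sample in [
        "He was pulling my leg",
        "pull my blue leg",
        "Stop tugging his sister's leg!",
        "somebody's leg was being pulled",  # passive transformation
    ]:
        print(repr(sample), "->", find_variants(sample))
```

The pattern finds the substituted and inserted variants but misses the passive, which is a transformation rather than a substitution. Each extension of the pattern to cover such cases moves the search further from verbatim storage and closer to what the grammar generates, which is the definitional problem discussed above.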

Sources of evidence

Once conceptual issues have been addressed, an essential requirement for the identification of formulaic sequences is to have a source of examples of multiword units for analysis. From a measurement perspective, the key issue in choosing a suitable source is one of sampling: how to ensure that there are sufficient examples to allow reliable generalisations to be made and, where applicable, that the sample is representative enough to provide the basis for a valid classification system.

There is a long-standing practice among grammarians and linguists of building up a collection of examples of idioms or other formulaic sequences, based on their own introspective knowledge of the language plus instances that they encounter through their reading, conversational interaction and other communicative activities in the language. Some scholars such as Pawley and Syder (1983) and Nattinger and DeCarrico (1992) adopted a more structured approach, drawing on transcriptions of spoken discourse and/or written texts of various kinds but without giving specific details of the scope of the source material. Their work has proved to be very important in applied linguistics in drawing attention to the pervasiveness of formulaic sequences and highlighting the variety in both the forms they take and the functions they perform. However, in sampling terms, this general approach will typically create a “convenience” sample, which is subject to uncontrolled bias. For work in this area to advance, it is necessary to complement such informal collections of examples with more systematic data-gathering procedures that can challenge the perceptions of individual investigators.

The obvious source of more systematic evidence is some kind of text database. These now commonly take the form of computer corpora, providing very large samples of language, which can then be searched in an efficient manner.
Corpus software generates frequency counts and a whole variety of other quantitative measures. In addition, it can supply lists of words and word strings that meet particular speciﬁcations as the basis for qualitative analyses of idiomaticity, semantic transparency, semantic vs. pragmatic meaning, and so on. There are a number of options when it comes to the choice of a corpus for the analysis of formulaic sequences.

Large general corpora

Mega-corpora such as the Bank of English and the British National Corpus lend themselves well to certain kinds of research on formulaic sequences, for similar reasons to the enormous contributions they have made to lexicography, word-based vocabulary studies, and descriptive grammars, among others. However, depending on the particular focus of the research, they also have some limitations.

• There is bias in the sample of texts they include. The most obvious one is that spoken language is underrepresented, but there is also bias in style (overrepresentation of formal, informative prose) and genre (journalistic texts in the Bank of English).
• Even in such large corpora, particular kinds of formulaic sequence may have quite low frequency, as Moon (1998) found in her research on idioms, proverbs and similes.
• Although corpus software is getting more sophisticated all the time, there are still limits on what it can find in a large corpus.
• The particular kinds of text that are of interest (e.g. learner language; storytelling to schoolchildren) may not be in the corpus at all.

Specialized corpora

There is a fast-growing number of more specialized corpora which offer opportunities to investigate formulaic sequences in more particular varieties of language. These include corpora of spoken language (the London Lund Corpus; the Cambridge and Nottingham Corpus of Discourse in English, CANCODE), learner language (the International Corpus of Learner English, ICLE), child language (the Child Language Data Exchange System, CHILDES), regional varieties (the International Corpus of English, or ICE, corpora; the Brown corpus of American English and the various parallel corpora of other national varieties), and discipline-specific corpora.

The issues involved in selecting a particular corpus include considering whether the corpus fits the particular requirements of a proposed formulaic sequence study, whether it is accessible to researchers other than the original compilers, whether the corpus is large enough to satisfy reliability requirements, and whether certain crucial kinds of information about the texts are available in the corpus, for example, the specific sources of written texts or particular phonological notation for oral texts. Given the pragmatic dimension to the meaning of many formulaic sequences, especially in oral language use, the researcher may require richer contextual information than the corpus provides.

A further category includes collections of written or oral texts that may not be thought of as constituting a corpus, such as the reanalysis by Foster (2001) of the transcripts from the Skehan and Foster research on task-based language learning.

Purpose-built databases

If existing corpora do not meet the research requirements, it will be necessary to build a set of data from scratch. This does not necessarily involve compiling a “whole” corpus (whatever the minimum dimensions of that might be). It may simply be the kind of data-gathering that sociolinguists, discourse analysts and others routinely engage in to collect samples of language use, either by unobtrusive recording of “natural” speech events or by elicitation procedures. Kuiper’s studies of race callers, auctioneers and checkout operators are good examples of these (see Chapter 3).

Procedures for identification and classification

As previously indicated, in its present stage of development the study of formulaic sequences still faces fundamental problems in identifying the units of analysis within a database or corpus. Wray (2002: Chap 2) gives a comprehensive discussion of the criteria that have been proposed or applied in previous research. We will summarize the criteria here and explore the measurement issues.

Intuition

The status of the intuition of an individual investigator is dubious from a modern “scientific” perspective. The exercise of this kind of subjective judgement is likely to be more acceptable if one or more of the following conditions apply:

• a definition of what is meant by a formulaic sequence is carefully formulated in advance, as previously discussed.
• the investigator communicates the definition to a second person, who then attempts to replicate the investigator’s identification of the formulaic units.
• instead of relying on the researcher’s judgement, a panel of judges is formed to analyse the database and a multiword unit is accepted as formulaic only when most, if not all, the judges identify it as such.

In other words, what is required is intersubjectivity or, in measurement terms, a high degree of inter-rater reliability.

Nevertheless, as Wray (2002: 20–25) points out, even meeting these basic conditions is not straightforward in the case of formulaic language. First, corpus linguists such as Sinclair (1991) argue that their research reveals intuition to be a very fallible means of investigating the facts of language use, with regard to the relative frequency of linguistic features, typical meanings of lexical items, characteristic patterns of collocation, and so on. Secondly, in the context of second language acquisition research, the native speaker intuitions of the researcher are often brought to bear to account for the language production of learners, who may or may not have an intuitive basis for what they say or write in the second language. This means that the formulaic status of sequences in learner language is even more difficult to establish by means of intuition than in the case of native speaker production. A third difficulty identified by Wray is that recognition of formulaic language may depend on the shared knowledge which comes from membership of a particular speech community rather than being universal among users of the language concerned. This represents just one more limitation on the value of intuition as an investigative procedure.
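The panel-of-judges condition described above can be sketched as a simple acceptance rule. This is a minimal illustration: the function name accepted_by_panel, the 0.75 default threshold for “most, if not all” and the example sequences are assumptions made for the sketch, not values taken from any of the studies discussed.

```python
from collections import Counter


def accepted_by_panel(judgements, threshold=0.75):
    """Return the sequences marked formulaic by at least `threshold` of the judges.

    judgements: a list with one set per judge, each set holding the
    multiword units that judge identified as formulaic.
    """
    if not judgements:
        raise ValueError("at least one judge is required")
    votes = Counter(seq for marked in judgements for seq in marked)
    panel_size = len(judgements)
    return {seq for seq, count in votes.items() if count / panel_size >= threshold}


if __name__ == "__main__":
    # Hypothetical judgements from four judges working independently.
    panel = [
        {"by and large", "at the end of the day"},
        {"by and large", "in the red car"},
        {"by and large", "at the end of the day"},
        {"by and large", "at the end of the day"},
    ]
    print(accepted_by_panel(panel))                 # near consensus
    print(accepted_by_panel(panel, threshold=1.0))  # full consensus only
```

Reporting the threshold alongside the results matters, since a unit accepted under near consensus may be rejected under full consensus.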

Corpus analysis

Computer corpus analysis has added a powerful new tool to the range of procedures available for the study of formulaic sequences. Moving beyond the concept of locating and counting individual word forms, corpus software can search for specified headwords, combinations of words and even discontinuous sequences of words. Thus, if the investigator can specify particular words or word strings that are potentially formulaic (or known to be so on the basis of other evidence), the software can instantly assemble all of the examples in the corpus for inspection and further analysis. An alternative approach is a purely statistical procedure that identifies sequences of two, three or more words that regularly co-occur throughout the corpus beyond a threshold level of probability. This second approach has produced a great deal of data that turns out not to be formulaic, depending on the definition of formulaic language adopted, but on the other hand it has shown its potential to give new insights into multiword units that are not available through intuition. In both cases, the quantitative evidence supplied by the software needs to be evaluated by the application of human judgement to determine which of the word sequences are formulaic and, if a classification system is involved, which ones fit in which categories.

Concordance software such as that included in Wordsmith Tools and SARA can be used to find collocational clusters in corpus data. The most flexible software allows the researcher to specify a search word or words and to gather and count the occurrences of collocates for several positions on either side of the search node. Such software is an extremely valuable tool for research on formulaic language. However, it is essential for the researcher to examine each instance of the data to make sure that it is relevant.

One way to demonstrate this point is by means of a training exercise employing the SARA software on the British National Corpus. The task is to use corpus data to answer the question, “Are men beautiful?”. That is, do men and beautiful collocate? A corpus search with men as the node and beautiful as the collocate, using a span of 6 to the left and 6 to the right, found 38 instances. In only five of these were they really collocates. A more limited search of the same corpus using 3 to the left and right produced ten instances of which only four were collocates. Excluding right hand occurrences of beautiful would not change the result substantially. Here are the ten instances.

               to see if she were as  beautiful as men told
who felt the need to dress up and be  beautiful for their men
 made love to the most brilliant and  beautiful men of your generation
 Next to him were two brothers, tall  beautiful men with liquid eyes
                      There are some  beautiful men’s clothes around
               You are so stunningly  beautiful that men would die for you
                                      beautiful to boot. Men would
                             Men and  beautiful women also join in.
          If you were in Prague, two  beautiful men like you,
                 There are some very  beautiful young men there.
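The kind of span search used in this exercise can be sketched in a few lines. The sketch is not a reimplementation of SARA: the tokenisation rule and the function names tokenize and span_hits are assumptions for illustration, and the two sample sentences are taken from the instances listed above.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens; forms such as men's stay one token."""
    normalised = text.lower().replace("\u2019", "'")
    return re.findall(r"[a-z]+(?:'[a-z]+)?", normalised)


def span_hits(tokens, node, collocate, span):
    """Return the context window for each occurrence of `node` that has
    `collocate` within `span` tokens to the left or right."""
    hits = []
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left = tokens[max(0, i - span):i]
        right = tokens[i + 1:i + 1 + span]
        if collocate in left or collocate in right:
            hits.append(" ".join(left + [token] + right))
    return hits


if __name__ == "__main__":
    sentence = "You are so stunningly beautiful that men would die for you"
    print(span_hits(tokenize(sentence), "men", "beautiful", span=3))
    print(span_hits(tokenize(sentence), "men", "beautiful", span=1))
```

With a span of 3 the sentence is returned as a hit even though beautiful does not describe men in it, which is the kind of false positive that only inspection of each instance can remove. Tokenisation decisions also change the counts: under the rule assumed here, men’s is a separate token and is not matched by the node men.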

Clearly, valid cluster analysis requires manual checking of the data. Another limitation of concordance software is that it can automatically locate only contiguous sequences. In order to locate non-contiguous ones, it is necessary for the researcher to enter in the search request either a contiguous subpart of the whole sequence or at least one key lexical component of it. This of course assumes the whole sequence is already known to be formulaic. It is very likely

that a substantial proportion of the formulaic language in English remains to be discovered; the non-contiguous nature of the sequences involved means that they fall below the threshold of recognition, whether it be by human intuition or automated computer search. In addition to the limitations of corpus analysis we have already noted, Wray (2002: 28–30) discusses two others. One is the big discrepancy in the estimates by diﬀerent researchers of the proportion of the corpus they analysed which could be considered to consist of formulaic sequences. Leaving aside any problems with the reliability of the individual analyses, there are clearly validity issues here related to diﬀering theoretical and operational deﬁnitions of formulaicity. Secondly, Moon (1998) among others has found that numerous formulaic expressions that are very familiar to native speakers do not occur at all even in the mega-corpora.
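The purely statistical procedure mentioned earlier in this section, which extracts word sequences that recur throughout a corpus, can be sketched as follows. As a simplifying assumption, the sketch uses a raw frequency threshold rather than a probability-based one, and the function name recurrent_ngrams and the toy sentence are hypothetical.

```python
from collections import Counter


def recurrent_ngrams(tokens, n, min_count):
    """Count every contiguous n-word sequence and keep those occurring at least min_count times."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: count for gram, count in grams.items() if count >= min_count}


if __name__ == "__main__":
    tokens = "at the end of the day we met at the end of the road".split()
    for gram, count in sorted(recurrent_ngrams(tokens, n=3, min_count=2).items()):
        print(" ".join(gram), count)
```

Even on this toy input the output includes strings such as end of the, which recur only because they are fragments of a longer sequence. Deciding which recurrent strings count as formulaic remains a matter for human judgement, and only contiguous sequences are found.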

Structural analysis

A variety of formal criteria have been proposed to assist in the identification of formulaic sequences. The two most widely recognised ones are non-compositionality and fixedness, which are characteristics of some idioms and other formulaic expressions to a lesser degree. Non-compositionality means that the sequence is not interpretable as a literal statement. It may contain individual words that never occur except as part of that expression. Fixedness refers to the degree to which either the order of the words in the sequence can be changed, individual words can be replaced by others, items can be inserted, or items can be inflected. The fact that these criteria turn out to be continua contributes to the difficulty in drawing the line between formulaic and non-formulaic expressions.

Phonological analysis

In the case of spoken language, certain phonological features have been investigated as possible indicators of formulaic sequences. These include speech rate, pausing, stress patterns and clarity of articulation. The investigation of phonological criteria is likely to involve elicitation of data by means of a structured research design rather than analysis of an existing corpus. Apart from the relatively limited size of spoken corpora, the transcription of the oral texts in a general corpus may not meet the specific requirements of a phonological analysis. In addition, there are certain variables that need to be controlled in the interests of internal validity, such as whether the talk is spontaneous or prepared, what the topic is and the nature of the speaking task to be performed. As with other kinds of research involving the elicitation of spoken language data, there is tension between the control and manipulation of key variables needed to obtain interpretable results and the desirability, in the interests of external validity, of recording speech which is as natural and unmonitored as possible.

Pragmatic/functional analysis

Another analytical criterion recognises that formulaic sequences have important roles in the performance of speech acts and are commonly associated with particular speech events. This provides an alternative approach to identifying them when data-gathering focuses on the particular social setting in which they typically occur (see Kuiper, Chapter 3). It also gives another perspective on the lack of transparency that the more fixed formulaic sequences tend to exhibit. Idioms are said to lack semantic transparency because their meaning is not interpretable from knowledge of the individual lexical components. To this we can add pragmatic transparency, which refers to the need for knowledge of the social context in which particular formulaic expressions are used in order to be able to understand their role in the discourse.

The need for an eclectic approach

Overall none of the criteria outlined in the preceding section is adequate by itself for the identification of formulaic sequences. As Wray (2002) emphasises, researchers will generally need to apply more than one form of analysis in order to obtain valid results. The concept of triangulation, which has come to be an integral part of the qualitative research paradigm, is very relevant here.

Let us now look at some of the studies in this volume to see how this triangulation might be done. Wray’s (Chapter 12) fascinating study of a beginner’s memorisation of sequences in Welsh uses evidence from pausing, errors, and changes to items in strings to examine the effect of the memorisation of sequences and analysis on the retention of immediately useable language items. This use of both quantitative and qualitative evidence provides interesting insights into the way language data is stored and changed.

In two innovative studies, Underwood, Schmitt and Galpin (Chapter 8), and Schmitt and Underwood (Chapter 9) used eye movement and self-paced reading methodologies to see if formulaic phrases embedded in a text were read any diﬀerently from other non-formulaic parts of the text. Considerable triangulation was used to ensure that the items being investigated were formulaic sequences. First a number of items were selected using intuition. Then their frequency was checked in a corpus (presumably the frequency of a ﬁxed unchanging sequence), and then these were tested in a cloze text with initial letter cues to check that the items were indeed predictable. The Schmitt, Dörnyei, Adolphs and Durow study (Chapter 4) uses a range of criteria including previous identiﬁcation by other researchers, corpus frequency, and occurrence in language teaching texts to come up with a list of target sequences. These examples illustrate the way forward in establishing a sound empirical basis from a measurement perspective for research in this rapidly developing area of vocabulary studies.

Reliability and Validity

As a summary of some of the main points of this chapter, let us consider the measurement of formulaic sequences in terms of the classic criteria of reliability and validity.

To satisfy the internal reliability requirement, any measures need to be consistently applied. This means that the criteria for identification and classification should be clear and there should be a high level of agreement among at least two analysts (or raters) working independently through a substantial sample of the data, if not the whole data set. In some studies (Foster, 2001; Jones and Haywood, Chapter 13) several expert raters have been used and the identification of sequences as formulaic has relied on achieving consensus or near consensus among the raters. In other cases, where formulaic sequences are to be classified into a number of categories, the percentage of exact agreement in the classifications serves as the estimate of internal reliability.

External reliability requires the clear description of procedures so that the study could be replicated. For a corpus search, for instance, the necessary information includes a description of the corpus, the kind of search, search parameters (what span each side of the node was used), whether there was manual checking of the results of the search, and what criteria were applied when checking.
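The percentage of exact agreement mentioned above as an estimate of internal reliability is straightforward to compute. In the sketch below the function name percent_agreement and the category labels are illustrative assumptions; the two lists represent classifications of the same sequences, in the same order, by two raters working independently.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items that two raters placed in exactly the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must classify the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


if __name__ == "__main__":
    # Hypothetical classifications of four sequences by two raters.
    rater_a = ["idiom", "collocation", "idiom", "formula"]
    rater_b = ["idiom", "collocation", "formula", "formula"]
    print(percent_agreement(rater_a, rater_b))
```

For the figure to be interpretable, a report would also need to state how many items were classified and how many categories were available to the raters.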

Validity issues are particularly problematic with formulaic strings, as the essential criterion — storage as a whole unit — is a diﬃcult one to operationalise. For internal validity, there is a need for a clear deﬁnition of what a formulaic string is, both at the conceptual level and in operational terms. Research indicates that this may need to take account of the function of formulaic strings (Wray, 2002: Chaps 4 and 5). Where possible, there should be methodological triangulation: two or more methods should be employed to identify what is formulaic. For external validity, the corpus — or whatever other data source is used — should represent target language use and be large enough to contain an adequate number of examples. This means that very large corpora are likely to be needed, which makes the problem of representativeness more diﬃcult to solve.

References

Carroll, J. B., Davies, P., and Richman, B. 1971. The American Heritage Word Frequency Book. Boston MA: Houghton Mifflin.
Foster, P. 2001. Rules and routines: A consideration of their role in the task-based language production of native and non-native speakers. In Researching Pedagogic Tasks: Second Language Learning, Teaching and Testing, M. Bygate, P. Skehan, and M. Swain (eds), 75–93. Harlow: Longman.
Grant, L. 2003. A Corpus-based Investigation of Idiomatic Multi-word Units. Unpublished PhD thesis. Victoria University of Wellington.
Moon, R. 1998. Fixed Expressions and Idioms in English. Oxford: Clarendon Press.
Nattinger, J. R. and DeCarrico, J. S. 1992. Lexical Phrases and Language Teaching. Oxford: OUP.
Pawley, A. and Syder, F. H. 1983. Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In Language and Communication, J. C. Richards and R. W. Schmidt (eds), 191–225. London: Longman.
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: OUP.
Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: CUP.

Formulaic performance in conventionalised varieties of speech*

Koenraad Kuiper

University of Canterbury

Singers of tales and the linguistics of formulaic performance

Formulaic speech traditions may well be as old as story telling and ‘doing’ politeness (Brown and Levinson, 1987; Ferguson, 1976). The most significant step in uncovering their nature was research by Milman Parry and Albert Lord in the 1930s and 1940s as they searched for explanations as to how Homer, blind and illiterate, could have created two of the great founding texts of Western literature. They went to what was then Yugoslavia with wire recording equipment and recorded illiterate bards singing of the heroic exploits of their traditional Christian and Muslim heroes to the accompaniment of the plaintive sound of their one-stringed gusls. Lord’s (1960) book Singer of Tales, the result of this pioneering field work, made a considerable impact in literary scholarship because it opened a new way of looking at oral traditional literature. It was even suggested that whole cultures might be influenced by the ways in which linguistic traditions are carried: either orally, or both orally and by means of writing (Ong, 1982). This way of thinking has been influential in many areas of research such as folklore (Foley, 1990; Jackson et al., 1988), cultural anthropology (Edwards and Sienkewicz, 1990), and literary studies (Foley, 1995), but it has had little impact on linguistics.

As it happens, Lord’s explanations of the formulaic performance skills of oral heroic poets constitute an embryonic theory of linguistic performance in the psycholinguistic and sociolinguistic senses, i.e. a theory of what people do when they use their internalised knowledge of language in social contexts. Lord proposes that an oral poet must compose his poems in real time, maintaining fluency in the face of a mobile audience and its reaction to the performance. Poet bards do this by using formulaic phrases which are traditionally keyed to specific episodes in the narrative (Lord, 1960: 34).
For example, when Homer’s heroes in the Iliad arm themselves for battle, the phrases that are used to describe
this are the same phrases that are used throughout much of the poem to describe this activity. Such formulae are acquired rather as other lexical items are acquired, through exposure to them. Like words, formulae are not taught but absorbed along with all the other aspects of the tradition within which the poet is performing. A formula is therefore a more or less ﬁxed phrase used by oral poets to do a particular ‘job’ such as describing how a hero puts on his helmet. There are other important elements to maintaining ﬂuency in the face of the pressure to perform. The poems which the oral bards perform have been performed before. The general plot outlines are known. This helps the performer since, at the highest level of the discourse, he knows what is going to happen. He also knows a little further ‘down’ what is going to happen since the plot as a whole consists of a sequence of episodes, what Lord calls themes, each of which relates events in a predetermined way (Lord, 1960: 68). In South Slavic epics the Counsel Meeting episode has the participants introduced in the same sequence each time there is such a meeting (Lord, 1960: 92). Again this makes for an easier performance since, once embarked on relating a Counsel Meeting, the singer knows what comes next (Lord, 1960: 94). But although the songs have been performed before, they are never performed in exactly the same way twice since that is not possible. The singer does not recall them verbatim. Each performance text is unique but constructed from established elements in the tradition. This can be termed ‘composition in performance’ (Lord, 1960: 25). Since the songs are sung, there is also a musical and metrical tradition which goes with the language. Formulae must fall within the metrical grid of the singer’s tradition and the chant must be within the time-honoured bounds required by the tradition. This takes time to learn. Lord describes how the tradition is acquired. 
He discovered that becoming a singer is a long process in which there is ﬁrst, a long apprenticeship period when the apprentice sits near a mature singer or singers and, as it were, absorbs the traditional way of singing (Lord, 1960: 21). That involves absorbing the plots, the episodes and their structure, and a very large number of formulae. In the journeyman period the singer is able to produce a song or two in stereotypical form (Lord, 1960: 24). Once mastery is reached, years later, the singer is able to perform a larger number of songs and to innovate and embellish, even constructing his own new formulae which then, in turn, can become part of the tradition. He is also able to learn and to perform new songs quickly because he has the traditional resources out of which the songs are all constructed (Lord, 1960: 25–28). In eﬀect, this is a theory of language acquisition for formulaic varieties of speech (Lord, 1960: 36).

But the acquisition can only occur when there is the chance for exposure. Traditions are local, and because they are passed down orally over generations, they will only persist where there is a chance for neophytes to learn from masters. In the case of the South Slavic singers, they sing in coﬀee houses, particularly during Ramadan and at weddings. So a neophyte must be able to gain access on a frequent basis to these locations. Performers often sing in one district and not elsewhere and so one can only learn from them in that district. The tales are also either Muslim or Christian. An aspiring singer may not be able to gain access to both of these traditions (Lord, 1960: 49). There are signiﬁcant conclusions to be drawn from Lord and Parry’s work. Formulaic performance takes place where speakers are under pressure from tasks other than (or perhaps directly relating to) speaking (Lord, 1960: 65), speciﬁcally pressures on their working memory (Lord, 1960: 54). The stories which bards sing are long and involved. So keeping track of what has gone before and what is still to come taxes the memory resources of the bard. Utilising an oral formulaic performance technique lowers this source of pressure. Because it relies on the resources of the tradition, formulaic performance is only possible in routine contexts, that is in situations where there is an expectation that things will happen in much the same way that they have happened before. The resources of a formulaic tradition can only operate appropriately in such a context. If one has something totally new to say, then these resources will not suﬃce. We would also expect the tradition itself to have various characteristics. Formulaic traditions would possess discourse structure rules which govern the order, both hierarchical and linear, in which ‘texts’ are constructed. Traditions would make available a range of formulae to cover every eventuality which a performer will come across. 
Performers are likely to have specialised prosodic modes of speaking such as chanting or droning their speech. We would also expect formulaic modes of performance to typically be restricted to particular times and places, to ritual events rather than to more unstructured situations.

Psychological determinants of formulaic speech varieties

We can suppose that such performance situations and styles are not unique to the performance of heroic narratives. If so, we might expect to corroborate and extend Parry and Lord’s findings. It happens that vernacular performance traditions are much more common than we might think. There are early studies by Rosenberg on black preaching traditions (Rosenberg, 1970) and of black
and Turkish verbal duelling (Dundes et al., 1972; Labov, 1972), for example. These studies are mainly descriptive and address vernacular traditions which are not particularly suitable to test and possibly extend Lord and Parry’s ﬁndings. For that to happen, the vernacular formulaic traditions under study must be those of speakers who are psychologically under some measure of working memory pressure from both the speech tasks in which they are engaged and other cognitive tasks they must simultaneously perform. The speech tasks must also be, sociologically speaking, routine such that high degrees of novelty of output are not required. That being the case, formulaic performance should also have predictable properties which are invariant across diﬀerent traditions. Such traditions should also evolve in predictable ways given the linguistic properties which they have. My own psycholinguistic investigations into speech production in such oral traditions initially relied on two professional groups for evidence. The ﬁrst were auctioneers since auctions take place in many diﬀerent parts of the world and auctioneers are under a range of pressures. These pressures range from light in the case of a house auction where only one house is being sold and the auctioneer has in the order of ﬁve to ten minutes to sell it, through to heavy in the case of tobacco and wool auctions where thousands of lots are sold at a rate of one every three to ﬁve seconds. The second group consisted of sports commentators since some sports, such as ice hockey and horse racing, are very fast paced and place great pressure on speakers to follow all that is going on while at the same time relating it in real time to an audience. Other sports, such as cricket, are slow paced, placing lower processing pressure on speakers, which allows them more time for improvising speech. 
If Parry and Lord are correct, we would expect to ﬁnd the full range of formulaic properties more in evidence in the more pressured situations. Like Parry and Lord’s work, my investigations relied on recording naturally occurring speech. This was done either oﬀ the air in the case of radio sports commentary or with high quality ﬁeld recording equipment in the case of the auctions. In many auction situations a directional microphone was needed, particularly where livestock provided accompaniment to the auctioneer. The tapes were then transcribed. For prosodic purposes this included a full prosodic transcription (Kuiper and Haggo, 1984). However this is not enough. To understand what is happening in the ﬁeld one must become a participant observer. To know which are formulae in a tradition one must also become a (perhaps partial) native of the tradition oneself. This involves a lot of listening very like that of the apprentice performer. It takes

time, specifically years of attending auctions and listening to sports commentary. A suitable test for native passive proficiency is being considered part of the situation by the professionals involved, such as the senior auctioneers at a livestock market and being able to talk knowledgeably with sports commentators about their work. At that point one can take the transcripts and extract the formulae, placing them into a database and checking each for its function in the discourse by formalising the discourse structure rules and noting where each formula is used in the discourse. This is a mutually defining activity. Since each formula is indexed to a role in the discourse, formulae with the same function will appear at the same point in the discourse thus defining a discourse constituent. (Substitution tests can then be used to test for functional equivalence.) Card files with index tabs are a useful way to do this. Knowledge of the tradition will also allow a reasonable estimate to be made of the kinds of working memory pressure the performer is under.

A good example is the situation of the race caller who provides live commentary on horse races (Kuiper and Austin, 1990). His tasks are as follows. He must first memorise the following details for each race.

1. A list of the names of all the horses which are running.

Linked with this list and for each horse:

2. its colours, i.e. the colours of the owner or trainer,
3. the jockey/driver of each horse,
4. the name(s) of the owner(s),
5. the name(s) of the trainer(s),
6. a list of the favourite(s) for the race.

Other details which must be memorised include:

7. the length of the race,
8. the physical nature of the track and the names of its topographical features,
9. the current state of the track.

Then, during the race and as he is providing its commentary, he must have the following things at hand in working memory.

1. On the basis of the colours and possibly physical features of each horse, recalling the name of the horse and when required the name of its jockey, trainer and owner which he has previously memorised.
2. Discerning for each horse its relative and absolute position both in a linear sequence if horses are one behind the other and in two dimensions if horses are travelling one outside the other.
3. Discerning the current location of each horse. Note that position is both position on the track as a physical (elliptical) entity and position in the race as a linear entity with a start at its beginning and finish at its end.
4. Discerning changes in relative and absolute position.
5. Noting any unusual happenings such as horses or riders falling, infringements of good racing behaviour and the like.

Clearly there is significant pressure on memory resources, but fluent commentary is maintained through the utilisation of a totally formulaic speech tradition. Race commentaries are subject to a fixed discourse structure and virtually everything which is said is said formulaically. Race calls are droned or chanted. The result is a highly fluent commentary without any hesitation phenomena such as false starts or pauses voiced or unvoiced. By contrast, in a five-day cricket test match, when often little of note is happening, the time can be filled with non-formulaic conversation between the person who does the play-by-play commentary and the colour commentator. Yet during the short periods of intense activity when play-by-play commentary is provided, formulaic speech is again in evidence (Pawley, 1991). During the former kind of talk not much pressure is placed on the speaker’s memory resources, whereas during the latter the business of relating what is happening while it happens does place the speaker under pressure. That is what Lord and Parry would have predicted (if they had been psycholinguists). The same observations and conclusions have been made with regard to the speech of auctioneers (Haggo and Kuiper, 1985; Kuiper and Haggo, 1984; Kuiper, 1996).

What of the developmental aspects of Lord’s account?
Do professional auctioneers and sports commentators learn their craft in the way that South Slavic bards do? The answer is yes. Following auctioneers for a number of years, one can observe them growing into their craft, starting as recruits, in the case of livestock auctioneers, moving through a period of attending sale days, doing various associated chores, through selling small lots of calves and then gradually becoming fully ﬂedged fat cattle auctioneers. At the age of forty or older, they become masters, acknowledged by all who know the craft as fully in control of all aspects of it, and innovators too. Such men become the models for the younger auctioneers.

Much of this acquisition is unconscious and relatively fast. But not all. As Gleason et al. (1996) show, parents do explicitly instruct children in some cultures as to what they may and may not say by way of formulae. Auctioneers, however, acquire their oral tradition just by being exposed to it, as we all acquire vocabulary. The rate of acquisition is dependent on how much there is to learn and the frequency of opportunities to learn it. The formulaic inventory required to sing tales is very large while thank you formulae are a small set. Many heroic poems are not performed often in the life of an aspirant singer of tales whereas opportunities to learn to say thank you come up frequently in a child’s life.
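
The database step described earlier in this section, in which each extracted formula is indexed to its role in the discourse and substitution tests are used to check functional equivalence, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slot labels (`bid_calling`, `pre_close`, `vendor_check`) and the assignment of formulae to them are hypothetical and are not taken from the published analyses; only the formulae themselves are quoted from this chapter.

```python
from collections import defaultdict

# Hypothetical transcript records: (formula, discourse slot where it was observed).
# Slot names and groupings are illustrative assumptions, not published findings.
observations = [
    ("Any more bids?", "bid_calling"),
    ("Last call.", "bid_calling"),
    ("Are you all done?", "pre_close"),
    ("Do I sell?", "vendor_check"),
    ("Are you all done?", "pre_close"),
]


def build_index(records):
    """Group formulae by the discourse slot in which they occur."""
    index = defaultdict(set)
    for formula, slot in records:
        index[slot].add(formula)
    return index


def substitutable(index, a, b):
    """Crude substitution test: do two formulae share at least one slot?"""
    return any(a in forms and b in forms for forms in index.values())


index = build_index(observations)
```

Formulae that share a slot define a candidate discourse constituent; a fuller treatment would also record who is addressed and the conditions of use for each formula.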

Socio-cultural determinants of formulaic performance

The linguistic skills which follow from the acquisition of the oral traditions of both auctioneer and sports commentator as professions play a significant role in defining what it means to be a member of both. Indeed, it is often possible to identify members of such professions by these linguistic skills alone. However, social knowledge is required as well. In fact, the social knowledge involved in being able, appropriately and as a native, to sell at auction or relate a sport on radio or TV is wide-ranging. The auctioneer must be able to differentiate the roles of vendor, vendor’s agent, bidder, buyer, and by-stander, and then appropriately address speech to each of these parties to the auction (Smith, 1989). Likewise, all the various field positions, types of shot selection and bowling action must be known in order to be a cricket commentator, not to mention previous scores of games long past, batting averages of long dead players, etc. (Pawley, 1991). All this social knowledge not only manifests itself in speech, but is coded in the formulae which auctioneers and sports commentators have memorised and which enable them to produce fluent output. For example, the sequence Do I sell? is addressed by the auctioneer to the vendor at a point in the auction when the auctioneer believes the lot has fetched a bid of sufficient value for a sale to be able to be made from that point on. It also shows itself in humour based in this knowledge. At a stud ram sale in Australia when a ram had reached a world record price, the auctioneer asked Do I sell? The knowledgeable audience appreciated the joke and the auctioneer got more bids, perhaps in appreciation. If we think of social identities as being negotiated in interaction as the professional identities of race callers and auctioneers are and as evolving over time, then we are close to a conception of humans as social beings which would be
in line with that of Goffman (1969). We play parts, and a good deal of what it means to play a part is learning the lines. The stereotypical socially sanctioned lines are often provided by the oral tradition of the particular role one is acquiring. In many real-life parts, the script (the discourse structure rules in our terms) plays a significant role. Without it there would be no part to play. We must learn the scripts for greeting and leave taking and the various formulae which implement these rituals. Oral tradition is also often embedded in social action. The auctioneer says, ‘Any more bids? Last call.’ and raises his hand, looks intently round the gallery of buyers one last time and then lowers his hand to sell the lot. But learning the part and its associated actions is not the whole story. The part must be played legitimately, in the social context set down for it. I can now mimic an auctioneer quite well. But I am not an auctioneer and I could not be. I don’t hold a licence. I can imitate a cricket commentator, but I could not do a live commentary since that involves perceptual tasks which I can’t perform as well as social knowledge I do not have as to individual players’ life stories, and significant events from the past. I have not been initiated. Notwithstanding the problems associated with gaining a rounded view of vernacular oral traditions, their users, and their context, some of the most interesting and revealing social data relating to the construction of the social self are to be found in the detailed study of such varieties. They have the capacity to yield large amounts of data which can be transcribed and then systematised. Then they can reveal what it means to be an enculturated human being.

Extensions to Lord’s theories

These earlier studies of formulaic performance skills led to further studies, such as one on supermarket checkout operator speech (Kuiper and Flindall, 2000). The aim of this study was to show how the social skills of the checkout operator, while governed by a common oral tradition are, notwithstanding, able to be executed differently by different checkout operators. That being so, each checkout operator can evolve a unique persona within the tradition, with the tradition thus providing an avenue for individual identity. The study was conducted by recording a number of checkout operators in two different supermarkets. Their common tradition was extracted from these recordings. That involved writing a discourse grammar and constructing a dictionary of checkout operator formulae. Then the personal implementations of these traditional resources were
analysed through noting the preferential use of particular formulae, and not others, by a set of operators. For example, of the seven operators studied, two favoured a greeting which began with Gidday. They were not the only operators who used it, but they favoured it. They were also male. The conclusion was that, even in such a highly routine environment, there are avenues through which individuals can express their individuality within an oral formulaic tradition. Again, becoming a checkout operator involves acquiring the tradition, and then making one’s own way within it. Hickey and Kuiper (2000) showed how formulaic traditions can also be written ones in the case of weather forecasters, while Hickey (1991) showed that these traditions can be sensitively styled for different audiences by different media. Again recording was followed by transcription, discourse structure analysis and formula dictionary construction. Style shifting for particular audiences was shown to be rule governed by noting that the discourse structure rules of the source forecast were subject to systematic changes in rewriting the forecast for a target audience (Bell, 1984). Thus, again, a formulaic tradition allows both for socio-cultural continuity and local contextual sensitivity. Smith (1991) showed that script writers of television soap operas are well aware of the social value of formulaic traditions of greeting and parting rituals and she builds a model of these rituals from transcriptions of such scripted rituals. This model closely resembles the models of Sacks et al. (1974) on such rituals in unscripted spontaneous utterances. The difference between these models and Smith’s work is that she shows the degree of their formulaicity and that each formula is socially ‘licensed’ to perform a particular role in the ritual. So greeting and parting formulae are, in that regard, no different from Homeric formulae.
Since formulae are keyed to particular contexts and roles within those contexts, they are cultural as well as linguistic artefacts. They act as greetings, apologies and so forth. This was first noticed by Austin (1976), although he did not notice that the utterances which he saw acting as speech acts were, in fact, relatively fixed formulae. Since such functionally-based formulae have relatively fixed conditions of use, a number of things should follow. If there is a major social upheaval, one would expect the formulae which existed before the upheaval to change in various ways. They might change their form to indicate that they are different from those which existed before the upheaval; some formulae may disappear altogether; others may undergo changes to their conditions of use. I investigated this prediction in a paper on routine formulae before and during the Great Proletarian Cultural Revolution (Ji et al., 1990). The study was written with two co-authors who were both linguists and had lived through the Cultural Revolution. Its findings were that, indeed, major changes in the formulaic inventory of speakers took place. Old formulae which were keyed to old ways were either proscribed or altered to represent the new order. For example, an old formula to begin school classes at the beginning of the day was proscribed and in its place an imperial greeting and homage formula was adapted to pay homage to Mao Tse Tung at the beginning of the school day. Ji (1998) carried on this study showing in detail how each twist and turn of ideological and political direction during the Cultural Revolution had consequences for the formulaic inventory. Linguistic engineering through young people’s desire for conformity in being like their peers came to be exploited for socio-political ends. This work shows that formulaic speech is not only sensitive to socio-cultural change but can be manipulated by the powerful for socio-political ends. If formulaic speech is socially sensitive it also follows that in a relatively uniform but bilingual culture, the formulaic inventory in two different languages could have similar cultural underpinnings in its conditions of use. That prediction is explored and corroborated in a study I made of sections of the formulaic inventory of Hokkien (Chinese)-English bilinguals in Singapore with a multilingually fluent Singapore Hokkien speaker (Kuiper and Tan, 1989). Singapore has a lectal continuum in both English and Hokkien ranging from a pidgin-like dialect at one end to an educated dialect at the other. In all the cases we explored, a formula for, say, greeting someone in one language at one level of the lectal continuum was matched with an equivalent formula in the other language at the same point of the continuum. For example, the formulae for beginning a meal echoed each other in the different languages.
However the loan translation clearly went from Chinese to English since the cultural values that underpinned the formulae and their use were Chinese and not English. It is also possible to use the formulaic inventory to explore and critique sociocultural practices and assumptions since the formulaic inventory is a cultural artefact and each formula thus has things to say about the culture in which it functions. In Kuiper (1990), I show that rugby locker room vocatives are a coercive means of maintaining group solidarity by acting as weapons to create a warrior elite. They do this by attacking players’ positive face (their desire to be thought well of by their mates) (Brown and Levinson, 1978) by indicating that they may not be men, but women (or parts of the female sexual anatomy). The message is that one cannot be sure of being a man save by undergoing trial by ordeal, speciﬁcally the verbal humiliation of the locker room banter and the game of rugby with its opportunities for physical injury and humiliation. Since

these gender-based formulae are transmitted only within the conﬁnes of the group, they have a strongly coercive potential to maintain group solidarity. I contrasted this gendering practice with another where a group of men actively maintain one another’s face through the use of formulae which support people even when they are potentially letting the side down, making mistakes and the like. This second group were men from many parts of a large organisation who played volleyball together intramurally. Many were not adept at the game and errors were frequent. The making of an error was normally followed by fellow team members producing a formula whose value was face-saving. For example, if a player served the ball out, his team mates would shout ‘On the line.’ Here the aim was not to dispute the point but to support the player who had just served out. An elderly player’s gentle serves would be accompanied by the formula, ‘They drop quickly’, possibly a reference to top-spun serves of very gifted players. Since this player often served into the net, any embarrassment was laid aside beforehand. All the conventional formulae of this group of men were face-saving.

Diachronic change in formulaic performance

Since oral performance traditions are cultural artefacts they also have a history. In most cases this is not amenable to research since data cannot be obtained. However, two avenues are possible. Historical linguistics has shown that historical reconstruction is possible in cases where languages have a shared history, because their ‘parent’ can be reconstructed on the basis of the shared features of the ‘offspring’. I explored that avenue in Kuiper and Tillis (1986) and Kuiper (1991). In the first of these studies, I recorded and transcribed the chants of American tobacco auctioneers. Each of these chants is relatively uniquely that of the particular auctioneer but the tradition is a musical one. For help with that I worked with Frederick Tillis who is a musicologist specialising in nineteenth century Black musical traditions. He was important because the music of tobacco auctioneers is clearly Black and overlaid on an English discourse structure and English formulae. Some of these sequences are the same as those used elsewhere in the world in the English auctioning tradition, while others are home grown Southern ones. We surmised that two oral traditions came together in the chants of tobacco auctioneers: the monotonous chanting of the English auctioneering tradition and the Black African American tradition with its pentatonic scales, blues notes and syncopation. The creation of this hybrid tradition is thus a creole one.

The second study used comparative data from England, Canada, the U. S. A., and New Zealand as a basis for attempting to reconstruct the common underlying oral tradition which was exported from England in the 18th and 19th centuries. This study (Kuiper, 1991) shows that it is possible to reconstruct aspects of the discourse structure, formulaic inventory and prosodics (intonation, stress, and the like) of an English tradition carried orally for three hundred years by using the tools of historical reconstruction. That tradition contains formulae such as Are you all done?, which is said in order to mark the near conclusion of the selling of a lot. It is found in many auction traditions which derive from the English tradition. English-derived traditions contain the basic discourse structure rule of having a description of the lot, followed by the search for an opening bid, followed by bid calling, followed by an optional conclusion or coda, with the sequence Sold to . . . often realising this optional conclusion or coda.1 The second way to investigate oral traditions is by means of recordings of the tradition from earlier periods. That is only possible as far back as recording equipment existed. Based on recordings of race calls in Christchurch from the 1930s onwards, I (Kuiper, 1991) show that major features of the tradition of current South Island callers documented in Kuiper and Austin (1990) were created by one caller, Dave Clarkson, who became the model for subsequent callers. This includes, for example, the convention that the only horse which is mentioned twice is the leader of the race. Thus, a caller who says Smoking Joe, he puts his nose in front can only do so if the horse is the leader. This only became mandatory after it became part of the oral tradition created by Clarkson. In this study I therefore show how an oral tradition can emerge where none existed before. That tradition then becomes the frame within which its successors must make their way.
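
The basic discourse structure rule described above for English-derived auction traditions (description of the lot, search for an opening bid, bid calling, optional coda) can be written as a single rewrite rule and checked mechanically. The following is a minimal sketch, not the formalism used in the studies cited; the constituent labels and the function name `conforms` are hypothetical names chosen for illustration.

```python
# The rule from the text as an ordered list of (constituent, is_optional) pairs.
# Labels are illustrative; only the order and the optional coda come from the text.
AUCTION_RULE = [
    ("description", False),
    ("opening_bid_search", False),
    ("bid_calling", False),
    ("coda", True),
]


def conforms(sequence, rule=AUCTION_RULE):
    """Return True if a list of constituent labels matches the rule in order."""
    i = 0
    for name, optional in rule:
        if i < len(sequence) and sequence[i] == name:
            i += 1
        elif not optional:
            return False
    # Anything left over in the sequence is not licensed by the rule.
    return i == len(sequence)
```

A transcript segmented into labelled constituents could be passed to `conforms` to flag sales that depart from the reconstructed structure, for instance one in which bid calling begins without a search for an opening bid.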

Methodological issues

Two methodological practices are particularly worthy of note in the study of oral traditions. From the beginning, the empirical studies of formulaic performance reported above have been conducted within a research tradition of ethnography of speaking (Saville-Troike, 1982). Since my aim was always to see formulaic performance traditions within their cultural setting, that was the obvious methodology to choose. It has a major disadvantage. It takes time to conduct research studies since one must become as thoroughly conversant with the situation and its cultural locus as possible. In the case of the many auctioning traditions that I
have looked at, recorded and described, one must attend a good many auctions to get to know exactly what is going on. This is not just a matter of the physical things that are happening, but the cultural values of the people concerned and what they are doing. Frequently that is where my co-authors and research associates come to be important. Most of the studies outlined above depended on the work and knowledge of my informants and co-researchers. I could not have written about Singapore English without Daphne Tan and her family’s large collection of formulae and their knowledge of how those formulae are used. Paddy Austin’s family owned racehorses and went to the races frequently. Doug Haggo and I had both watched a lot of ‘Hockey Night in Canada’ on the CBC. Marie Flindall was an experienced checkout operator.

A second methodological area is that of formalisation and quantification. The formalised systems I have developed to explain both the discourse structure and the formulaic syntax of formulaic traditions are documented most explicitly in Kuiper (2000). There I propose that the structural properties of formulae can be modelled as finite state systems subject to particular constraints, and that the discourse grammars can be modelled by context-free re-write rules which are also subject to further constraints. For example, discourse grammars are potentially fully recursive, that is, embedded structure could be infinitely deep. However, formulaic varieties utilise only relatively shallow structures. There appears to be no full recursion in any formulaic varieties such as one gets with clauses embedded within other clauses or phrases within phrases. A particular episode or theme cannot occur within itself. Also, sub-episodes go no further than four or five deep. Even in The Iliad, structure tends to be serial, with each major battle having its origins, then a central fight sequence followed by an aftermath. At the commencement of individual combat, warriors arm themselves, and this sequence again has sub-episodes, but that is as far as it goes.

Because these models are explicit they allow quantificational work to be parasitic on them. Two of the major approaches to the social analysis of the contexts of language use are variationist studies, which are quantificational, and ethnography of speaking studies, which generally are not. I have shown that there is value in using quantificational approaches when one has a formal theory as to the parameters of variation that are available within an oral tradition. The study of race calling shows how valuable quantificational data can be by showing how a loose tradition comes to be fixed over time. Such studies are also useful in showing how individual variation is possible within a formulaic tradition (Kuiper and Flindall, 2000). To my knowledge, these studies are unique in this regard.


Koenraad Kuiper

Figure 1. Formality in engagement notices
[Bar chart. Vertical axis: Frequency (0–100). Horizontal axis: Times, NZ, Aus. Series: Title, initials/given name, surname; Given names and surnames; Given names only.]

In a study of engagement notices in newspapers in the UK, Australia and New Zealand (Kuiper and Geisser, to appear), the fact that a formula was central to these notices could be used quantificationally. There were six formulae in the data. Engagement notices in The Times used only two of these. There were very few linguistic markers in these Times formulae, but in the New Zealand and Australian notices there were a number. Taking 100 notices from each of the Australian and New Zealand data, it was possible to show that a number of marker variables were indicative of formality. For example, the names of the parents of the couple could be given in the following forms: title, given name, surname; title, initials, surname; given name(s), surname; given names only. The Times used the first two possibilities almost exclusively, with a small number of the third option. The New Zealand notices were midway, while the Australian notices had the highest frequency of given names only. Three other variables patterned in harmony with this hierarchy: (1) the order in which the couple are mentioned, (2) mention of parents’ domicile, and (3) choice of passive formulae such as The engagement is announced between . . . Quantifying on the basis of a formal theory of formula structure thus shows that there is a formality hierarchy. The Times is very high on the formality scale and New Zealand mid-way, whereas Australians prefer high levels of informality.
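The ways of naming a parent described above (title plus given name or initials plus surname; given names plus surname; given names only) can be pictured as paths through a small network of the finite state kind discussed earlier in the chapter. The Python sketch below is a toy illustration only and not Kuiper's own formalism: the category labels TITLE, INITIALS, GIVEN and SURNAME, the state names, and the function accepts are all assumptions introduced here.

```python
# Toy finite-state network for parents' name forms in engagement notices.
# States and category labels are hypothetical, chosen for illustration.
TRANSITIONS = {
    "start":  {"TITLE": "titled", "GIVEN": "given"},
    "titled": {"GIVEN": "t_name", "INITIALS": "t_name"},
    "t_name": {"SURNAME": "full"},
    "given":  {"GIVEN": "given", "SURNAME": "full"},
}
# "given" accepts the 'given names only' form; "full" accepts forms ending in a surname.
ACCEPTING = {"given", "full"}


def accepts(categories):
    """Return True if the sequence of categories follows a path to an accepting state."""
    state = "start"
    for category in categories:
        state = TRANSITIONS.get(state, {}).get(category)
        if state is None:  # no transition defined: the form is not licensed
            return False
    return state in ACCEPTING


for form in (["TITLE", "GIVEN", "SURNAME"],
             ["TITLE", "INITIALS", "SURNAME"],
             ["GIVEN", "GIVEN", "SURNAME"],
             ["GIVEN", "GIVEN"],
             ["TITLE", "SURNAME"]):
    print(form, accepts(form))
```

The network has loops but no embedding of a structure within itself, which mirrors the claim that formulaic varieties show no full recursion. Counting how often each accepting path is taken in a sample of notices would yield frequency data of the kind summarised in Figure 1.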


Formulae elsewhere?

Many of the studies highlighted in this chapter concentrate on quite small domains. What of the more general domains? Speech act theory already suggests that a search for formulae is not going to yield much in the way of general domains because there may be no general domains for such formulae. Since all formulae are indexed for particular conditions of use, they will appear only in situations where such conditions are appropriate. Apologies, farewells, condolences will be used when speakers need to say sorry, goodbye, or express sympathy to someone who has lost a relative or friend.

However, there are phrasal lexical items other than formulae. Restricted collocations appear in all kinds of speech and do not have specific speech functions. Restricted collocations are pairs of words which occur together in ways that are more restrictive than the grammar of the language requires. For example, we give offence and take offence. As far as English goes these are the only verbs that will do. We cannot donate offence or accept offence. These are not formulae because their use is not restricted by anything other than their meaning. We use them when that is what we want to say, not when the non-linguistic context dictates it. Idioms, semantically non-compositional phrasal lexical items, are also common. One has only to read newspaper coverage of political events to see how sports metaphors, often in phrasal form, play a role in that discourse variety.

Conclusion

In this chapter I have suggested that human linguistic knowledge and practice includes a great deal of lexical and cultural knowledge which makes it possible to speak in a native-like way in particular oral traditions (Pawley and Syder, 1983). Formulaic varieties of language abound in all societies. They are often to be found in their own small speech-ecological niches known only to the subtribe who are members of the speech community which inhabits such a niche. These include operating theatres, law courts, auction rooms, radio sports broadcasting studios, surf lifesaving clubrooms, army mess halls, and even academic common rooms. The non-linguistic and attendant linguistic knowledge that comes with growing to maturity in each of these situations is acquired slowly. Full mastery often takes years. It is also circumscribed by the constraints on our ability to produce and understand language caused by the limitations of our memory resources. We have an immense capacity to remember, and to retrieve very quickly from memory what we need. But we have relatively restricted processing capacities because our working memory is quite small. Formulaic speech enables us to harness these resources in an efficient way so long as what we wish to say does not need to be radically novel. Much of what we say in the normal course of social events is not.

Acknowledgements

Parts of this chapter appeared previously as Kuiper (2001). I am grateful to Norbert Schmitt and Alison Wray for useful discussions on the topics covered in this chapter, to the University of Canterbury for a period of study leave, and to the Netherlands Institute for Advanced Study for a Fellowship during the holding of which this chapter was written.

Note . Many auction traditions also share a shout mode of delivery during the bid calling where the auction is in the open air and the auctioneer enunciates particular words with a very high volume. For example the bid calling formulae At X dollars would have the word at delivered with ‘shout’ prosodics. (Shout is a term developed by Douglas Haggo and me, and described in detail in (Kuiper and Haggo, 1984).)

References

Austin, J. L. 1976. How to Do Things with Words. Oxford: OUP.
Bell, A. 1984. Language style as audience design. Language in Society 13: 145–204.
Brown, P. and Levinson, S. C. 1978. Universals in language usage: Politeness phenomena. In Questions and Politeness: Strategies in Social Interaction, E. N. Goody (ed.). Cambridge: CUP.
Brown, P. and Levinson, S. C. 1987. Politeness: Some Universals of Language Usage. Cambridge: CUP.
Dundes, A., Leach, J. W. and Özkök, B. 1972. The strategy of Turkish boys’ verbal duelling rhymes. In Directions in Sociolinguistics: The Ethnography of Communication, J. J. Gumperz and D. Hymes (eds), 130–160. New York: Holt, Rinehart and Winston.
Edwards, V. and Sienkewicz, T. J. 1990. Oral Cultures Past and Present: Rappin and Homer. Oxford: Basil Blackwell.

Ferguson, C. 1976. The structure and use of politeness formulas. Language in Society 5: 137–151.
Foley, J. M. 1990. Oral-formulaic Theory: A Folklore Casebook. New York: Garland.
Foley, J. M. 1995. The Singer of Tales in Performance. Bloomington IN: University of Indiana Press.
Gleason, J. B., Ely, R., Perlmann, R. Y. and Narasimhan, B. 1996. Patterns of prohibition in parent-child discourse. In Social Interactions, Social Context and Language: Essays in Honor of Susan Ervin-Tripp, D. I. Slobin, J. Gerhardt, A. Kyratzis and J. Guo (eds), 205–218. Hillsdale NJ: Lawrence Erlbaum Associates.
Goffman, E. 1969. The Presentation of Self in Everyday Life. Harmondsworth: Penguin.
Haggo, D. C. and Kuiper, K. 1985. Stock auction speech in Canada and New Zealand. In Regionalism and National Identity: Multidisciplinary essays on Canada, Australia and New Zealand, R. Berry and J. Acheson (eds), 189–197. Christchurch: Association for Canadian Studies in Australia and New Zealand.
Hickey, F. 1991. What Penelope Said: Styling the Weather Forecast. University of Canterbury: M. A.
Hickey, F. and Kuiper, K. 2000. A deep depression covers the South Tasman Sea: New Zealand Meteorological Office weather forecasts. In New Zealand English, A. Bell and K. Kuiper (eds), 279–296. Wellington and Amsterdam: Victoria University Press and John Benjamins.
Jackson, B., Taft, M., and Axlerod, H. S. (eds) 1988. The Centennial Index: One Hundred Years of the Journal of American Folklore. Washington DC: American Folklore Society.
Ji, F. Y. 1998. Language and Politics during the Chinese Cultural Revolution: A Study in Linguistic Engineering, Unpublished PhD, Linguistics Department, University of Canterbury.
Ji, F. Y., Kuiper, K., and Shu, S-G. 1990. Language and revolution: Formulae of the Chinese cultural revolution. Language in Society 19: 61–79.
Kuiper, K. 1990. New Zealand sporting formulae: Two models of male socialisation. In English around the World: Sociolinguistic Perspectives, J. Cheshire (ed.), 200–209. Cambridge: CUP.
Kuiper, K. 1991. The evolution of an oral tradition: Racecalling in Canterbury, New Zealand. Oral Tradition 6: 19–34.
Kuiper, K. 1996. Smooth Talkers. Vol. 1: Studies in Everyday Communication. Hillsdale NJ: Lawrence Erlbaum Associates.
Kuiper, K. 2000. On the linguistic properties of formulaic speech. Oral Tradition 15: 279–305.
Kuiper, K. 2001. Linguistic registers and formulaic performance. NZ Journal of Sociology 16: 261–273.
Kuiper, K. and Austin, J. P. M. 1990. They’re off and racing now: The speech of the New Zealand race caller. In New Zealand Ways of Speaking English, A. Bell and J. Holmes (eds), 195–220. Clevedon: Multilingual Matters.
Kuiper, K. and Flindall, M. 2000. Social rituals, formulaic speech and small talk at the supermarket checkout. In Small Talk, J. Coupland (ed.), 183–207. Harlow: Longman.
Kuiper, K. and Geisser, C. To appear. Towards a variationist dialectology of formulaic genres: An engaging syntactic case study. Journal of Pragmatics.
Kuiper, K. and Haggo, D. C. 1984. Livestock auctions, oral poetry and ordinary language. Language in Society 13: 205–234.

Kuiper, K. and Tan, D. G. L. 1989. Cultural congruence and conflict in the acquisition of formulae in a second language. In English across Cultures: Cultures across English, O. Garcia and R. Otheguy (eds), 281–304. Berlin: Mouton de Gruyter.
Kuiper, K. and Tillis, F. 1986. The chant of the tobacco auctioneer. American Speech 60: 141–149.
Labov, W. 1972. Rules for ritual insults. In Language in the Inner City: Studies in the Black English Vernacular, W. Labov (ed.), 297–353. Philadelphia PA and Oxford: Pennsylvania University Press and Basil Blackwell.
Lord, A. B. 1960. The Singer of Tales. Cambridge MA: Harvard University Press.
Ong, W. J. 1982. Orality and Literacy: The Technologizing of the Word: New Accents. London: Methuen.
Pawley, A. 1991. How to talk cricket. In Currents in Pacific Linguistics: Papers in Austronesian Languages and Ethnolinguistics in Honour of George W. Grace, R. Blust (ed.), 339–368. Honolulu: Pacific Linguistics.
Pawley, A. and Syder, F. 1983. Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In Language and Communication, J. Richards and R. Schmidt (eds), 191–226. London: Longman.
Rosenberg, B. A. 1970. The formulaic quality of spontaneous sermons. Journal of American Folklore 83: 3–20.
Sacks, H., Schegloff, E. A. and Jefferson, G. 1974. A simplest systematics for the organisation of turn taking. Language 50: 696–735.
Saville-Troike, M. 1982. The Ethnography of Communication. Oxford: Basil Blackwell.
Smith, C. W. 1989. Auctions: the Social Construction of Value. London: Harvester Wheatsheaf.
Smith, J. 1991. Salutations, Felicitations, and Terminations: A Study in Communicative Performance. University of Canterbury: M. A.

Knowledge and acquisition of formulaic sequences
A longitudinal study

Norbert Schmitt, Zoltán Dörnyei, Svenja Adolphs, and Valerie Durow
University of Nottingham

Introduction

Formulaic language has become one of the major issues in applied linguistics in the new millennium. Although relatively new to many scholars, it has been an important topic for decades in Russian and German academic circles (see Burger, 2003; Cowie, 1998; Howarth, 1996), and has been steadily developing for over 20 years in Anglophone research literature. Pawley and Syder (1983) were among the first English-based researchers to recognize the importance of conventionalized language, and Sinclair followed up in 1991 with his ‘idiom principle’. Nattinger and DeCarrico (1992) expanded on this and explored the relationship between lexical phrases and functional language. Now there is a growing awareness that much of the systematicity of language is lexically-driven, with the resultant concept of ‘lexico-grammar’ (e.g. Biber et al., 1999; DeCarrico and Larsen-Freeman, 2002).

This work has been instrumental in establishing the ubiquity of formulaic language and its importance in the usage of language in general. However, much of this research has been descriptive in nature, often utilizing corpus analysis. There has been less research into the acquisition of formulaic sequences, and what there is has mainly focused on the L1 acquisition of young children. Research into L2 acquisition is relatively scarce (see Wray, 2002 for an overview) and given the importance of formulaic sequences in language use, it seems an opportune time to give this area further attention. (See Schmitt and Carter, this volume, for a more detailed overview of formulaic sequences and their acquisition.)

This study is one step in that direction. It will attempt to describe the acquisition of a set of target formulaic sequences under semi-controlled conditions. In addition, because individual difference factors have been shown to have an important influence on language learning in general (Dörnyei and Skehan, 2003; Sawyer and Ranta, 2001), it is logical to suspect that they also influence the acquisition of formulaic sequences. Thus we will measure several of these factors (i.e. the learners’ age, gender, language aptitude, and motivation) in order to determine their effect on formulaic sequence acquisition.

Methodology

Selecting the target formulaic sequences

The target formulaic sequences for this longitudinal study were chosen with three main criteria in mind. First, we needed to make sure that target formulaic sequences occurred with some degree of frequency in language use. Second, the target sequences would be incorporated into an EAP teaching environment, and so they needed to be connected with academic discourse. Third, in order to secure the cooperation of the language instructors at the Centre for English Language Education (CELE) at the University of Nottingham, the sequences also needed to be seen as useful to students and worthwhile to teach. Based on these criteria, the following procedure was carried out to identify and select appropriate formulaic sequences for the study.

Our initial step was to consult reference materials which listed and discussed formulaic sequences of various kinds. We extracted 97 candidate formulaic sequences of an academic nature from Biber et al.’s (1999) analysis of lexical bundles, and 59 candidate formulaic sequences from Nattinger and DeCarrico’s (1992) functional analysis of lexical phrases. We then took words from Hyland’s (2000) list which are used to express doubt and certainty (e.g. clearly and approximately) and which are used as discourse markers (e.g. therefore and finally) and submitted them to a corpus analysis to see if they formed the core of a formulaic sequence (e.g. clearly the best). If so, they were added to our candidate list.

Once the list of candidate formulaic sequences was compiled, we determined how frequently they occurred in each of three corpora. Frequency figures from the British National Corpus (BNC) gave an indication of how often the sequences occurred in general English, figures from the CANCODE corpus indicated how frequent they were in spoken discourse, and figures from the MICASE corpus showed their frequency in academic spoken discourse. Based on these frequency figures, we were able to identify the formulaic sequence candidates with the highest frequencies in written, spoken, and academic contexts.


The next step was to identify formulaic sequences which occurred in the CELE textbooks. We examined seven textbooks which would be used in the CELE summer presessional program:

Upper Intermediate Matters (Bell and Gower, 1992)
Lexis: Academic Vocabulary Study (Burgmeier, Eldred, and Boyd Zimmerman, 1991)
Functions of English (Jones, 1981)
Academic Writing Course (Jordan, 1992)
Writing Academic English (Oshima and Hogue, 1999)
A Way with Words Book 3 (Redman, 1991)
Traveling the World through Idioms (Kadden, 1998)

In addition, we looked through the CELE teaching materials for possible sequences. This search of textbooks and materials yielded 74 potential target formulaic sequences. After comparing the CELE list with the candidate list from our literature review, we compiled a short list of 45 candidate formulaic sequences which occurred in the CELE materials and which also had relatively high frequency figures in one or more of the corpora consulted. We put these sequences into a questionnaire and surveyed the CELE language instructors for their opinions about the relative usefulness of the formulaic sequences on the list. Based on this survey and on further discussions with the instructors, the final list of 20 formulaic sequences was agreed upon. Thus, the selection of the final target formulaic sequences was based on a combination of criteria including appearance in the literature, appearance in CELE materials, frequency, and instructors’ intuitions of usefulness.
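The shortlisting step described above (a candidate must occur in the CELE materials and be relatively frequent in at least one corpus) amounts to a simple filter, sketched here in Python. The sketch is illustrative only: the function name shortlist, every frequency count, and the cut-off of 100 are invented, since the chapter reports neither the raw corpus counts nor a numerical threshold.

```python
def shortlist(candidates, in_materials, freq_by_corpus, min_freq):
    """Keep candidates that occur in the teaching materials and reach
    min_freq in at least one of the corpora consulted."""
    kept = []
    for sequence in candidates:
        if sequence not in in_materials:
            continue
        counts = freq_by_corpus.get(sequence, {})
        if any(count >= min_freq for count in counts.values()):
            kept.append(sequence)
    return kept


# Hypothetical inputs: all counts below are made up for illustration.
candidates = ["first of all", "on the contrary", "in spite of", "clearly the best"]
in_materials = {"first of all", "in spite of", "clearly the best"}
freq_by_corpus = {
    "first of all":     {"BNC": 900, "CANCODE": 400, "MICASE": 150},
    "on the contrary":  {"BNC": 500, "CANCODE": 20,  "MICASE": 10},
    "in spite of":      {"BNC": 700, "CANCODE": 30,  "MICASE": 40},
    "clearly the best": {"BNC": 15,  "CANCODE": 2,   "MICASE": 1},
}

print(shortlist(candidates, in_materials, freq_by_corpus, min_freq=100))
```

In the study itself the shortlist was then narrowed further by instructor judgement, a step that a frequency filter of this kind cannot reproduce.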

Developing the measurement battery

Once the final target formulaic sequences were decided upon, the next task was to develop elicitation instruments for productive and receptive measures of the target sequences, as well as language aptitude and motivation. Starting with the formulaic sequence instruments, we wished to incorporate the sequences in as natural a context as possible. Therefore instead of using separate short contexts for each of the formulaic sequences, we wrote two extended contexts into which we were able to embed all of the target sequences. We controlled the vocabulary load of the context stories by analyzing them with the vocabulary frequency profile tool available on Tom Cobb’s website (http://132.208.224.131/) and eliminating most of the lower-frequency lexis.


For the productive measurement instrument, we adopted a test format which blended elements of cloze and C-test techniques. In it, the context was left intact, but most or all of the content words in each formulaic sequence were deleted and a blank inserted. To constrain the choice of words possible in each blank, the initial letter(s) of each word were given. Learners were instructed to complete the words in the blanks. The instrument was designed to measure whether the participants could produce the formulaic sequence appropriate for the surrounding context, and not whether they could discern the appropriate meaning for that context. Therefore, the meaning realized by the target formulaic sequence was given to the participants in the right margin as part of the item, and it was their task to produce the proper target form given that meaning and context. To highlight the fact that the blanks were part of a larger formulaic sequence, the whole sequence was put in bold font. An example of the resulting format is:

Learning English as a second language is a difficult challenge, but we do know several ways to make learning more efficient. Fi_____ of a_____, almost every research study shows that you need to use English as much as possible.    (the initial one)

[Answer: First of all]

For the receptive version of this test, the same contextualization stories were used. A single line was inserted in place of the target formulaic sequences, and participants were to choose from four options in a multiple-choice test format. The distractors were written to be semantically similar to the correct option, and as similar in form and length as possible. Since all options were grammatically possible, the BNC was checked to ensure that each correct option was by far the most frequent and natural option for the particular context. A fifth option (I DON’T KNOW) was included so that participants were not forced to guess if they did not know the answer. Because the options for each item all had a similar meaning and form, it should be relatively difficult to guess correctly unless a participant has some intuition about the correct form of the formulaic sequence targeted. An example of the receptive format is as follows:

International debt
Speaker A: I’ve been watching the news report and they say that (11) _____ the international debts of poorer countries might be cancelled.

11. a. there’s a good chance that
    b. it seems to be happening that
    c. the evidence is increasing that
    d. people are thinking that
    e. I DON’T KNOW

[Answer: a]

As part of the study, we wished to compare the knowledge and acquisition of formulaic sequences with a measure of the learners’ vocabulary in general. In order to do this, the learners were also given a vocabulary size measure. The measure chosen was the Vocabulary Levels Test (Schmitt, Schmitt, and Clapham, 2001). After consultation with CELE colleagues on the anticipated proficiency level of the participants, sections of the Vocabulary Levels Test focusing on the 3,000 and 5,000 frequency levels were selected as being the most appropriate. The 2,000 level was deemed too basic for the relatively advanced EAP students, while the 10,000 level was still considered quite difficult.

We also wished to get some indication of how a learner’s language aptitude and attitudinal/motivational profile affect the acquisition of formulaic sequences, so our test battery included measurements of these attributes. A 14-item aptitude test was adapted from a recently developed aptitude battery that contains a number of tasks based on an artificial language (Ottó, 2002). The attitude/motivation survey could potentially have covered many aspects, but because it was to be part of the pre- and post-test packages, its overall length had to be limited. Therefore, rather than aiming for comprehensiveness, the content of the questionnaire was designed to cover a few selected attitudinal/motivational variables that were particularly relevant to the project and which have been found to play a central role in determining L2 learning behaviours and effort (cf. Dörnyei, 2001; Dörnyei and Kormos, 2000). In line with the principles of questionnaire theory (Dörnyei, 2003), all the variables were made up of multiple items; the only exception was a self-report behavioural measure, Intended effort, which was defined by a single item. Table 1 presents the final variables, a short description, and the number of items they contained.

Table 1. List and description of the attitudinal/motivational variables used in the study

Attitudinal/motivational variable   No. of items   Description
Attitudes toward L2 learning        3              Subjective appraisal of the enjoyment of learning L2s and English in particular
Integrativeness                     6              A broad positive disposition towards the L2 speaker community, including an interest in their life and culture
Instrumentality                     3              Perceived job- and career-related benefits of proficiency in English
Language use anxiety                3              Anxiety experienced while using the L2
Commitment to learn English         2              The importance attached to mastering a high level of English
Intended effort                     1              The amount of effort the student is willing to put into learning English

Draft items for the productive and receptive formulaic sequence measures were first piloted on four native speakers, who all completed both instruments 100% correctly. This indicated that the measurements would not pose problems for proficient English speakers. The complete test battery was then piloted on 21 international students attending a summer presessional course at Nottingham Trent University. They were similar in kind to the eventual participants in the main study, although slightly weaker overall. The instruments were shown to be informative and have acceptable test characteristics, except for the aptitude test and the receptive formulaic sequence measure, which item analyses using ITEMAN (1989) showed to be slightly too easy. The receptive sequence measure was revised to increase the plausibility of the distractors. A native speaker respondent confirmed that the revised items were still clear in terms of the key being correct and the distractors being inappropriate. The language aptitude measure was revised by deleting a number of the easier items and replacing them with more challenging items. Unfortunately there was no suitable group of international students to carry out a second pilot study to confirm the test battery changes.

The order of the test battery was:

1. productive formulaic sequences
2. aptitude
3. attitude/motivation
4. Vocabulary Levels Test 3000
5. Vocabulary Levels Test 5000
6. receptive formulaic sequences.

This order was adopted so that the participants would have to produce the formulaic sequences first without any chance of contamination from the other test components, and then have to work through those other components before they came to the receptive formulaic sequence measure. By this time, any direct memory of the productive measure and any clues from that measure should have been minimized. The aptitude measure was considered relatively challenging, so it was placed second in the battery. The motivation survey was relatively easy to complete, and so seemed a good ‘break’ before the last three test components.

The piloting indicated that the test battery would have a satisfactory level of reliability, and this was confirmed by reliability estimates produced in the main study (Table 2). Given that the attitude/motivation scales were particularly short, the reliability coefficients are acceptable, particularly for the posttest. The Vocabulary Levels Test sections were analyzed previously for reliability in a validation study which found reliability figures of about .93 for the 3,000 and 5,000 levels (Schmitt, Schmitt, and Clapham, 2001).

Table 2. Reliability estimates for the test battery component measures (Cronbach alpha)

Measure                            T1 (a)    T2 (b)
Productive formulaic sequences     .65       .72
Receptive formulaic sequences      .65       .67
Language aptitude                  .78       – (c)
Attitudes/motivation
  Attitudes toward L2 learning     .78       .76
  Integrativeness                  .63       .73
  Instrumentality                  .52       .64
  Language use anxiety             .65       .73
  Commitment to learn English      .55       .56
  Intended effort                  – (d)     – (d)

(a) N=94   (b) N=70   (c) not given as part of T2 battery   (d) no reliability figures possible
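For readers unfamiliar with the statistic, Cronbach's alpha for a scale of k items is conventionally computed as alpha = k/(k−1) × (1 − sum of item variances / variance of total scores). The Python sketch below applies that standard formula to invented responses, since the study's item-level data are not reported; the function name cronbach_alpha and the sample scores are assumptions made for illustration. It also shows why no figure can be given for a single-item measure such as Intended effort: with k = 1 the factor k/(k−1) is undefined.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        # A single-item scale has no internal consistency to estimate.
        raise ValueError("Cronbach's alpha needs at least two items")
    item_columns = list(zip(*scores))                      # one tuple per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses from four learners on a two-item scale.
sample = [[2, 3], [4, 4], [3, 5], [5, 4]]
print(round(cronbach_alpha(sample), 2))
```

Because alpha tends to rise with the number of items, short scales such as the two- and three-item attitude/motivation measures would be expected to yield lower coefficients than the longer lexical measures.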

Participants

The participants in this study consisted of students attending presessional courses at the University of Nottingham’s EAP program, based at the Centre for English Language Education (CELE). The students intended to enter the University of Nottingham in the autumn semester, and most were of a proficiency level above or near the minimum university entrance requirement of CBT TOEFL 213 (Paper TOEFL 550) or IELTS 6.0. Of the 94 total participants, 20 submitted CBT TOEFL scores (M=216.90, range=173–297), 20 submitted paper TOEFL scores (M=576.90, range=500–637), and 64 submitted IELTS scores (M=5.55, range=3.5–6.5). Several students submitted scores from more than one test. The participants were mainly 22–26 years old (M=25.58, range=18–43), and intended to study a broad range of disciplines. Sixty-seven participants were female and 27 male. The majority (63) of the participants spoke Chinese as their L1, 10 spoke Japanese, while the remaining 21 participants spoke 12 different mother tongues. Involvement in the study was voluntary, with 94 out of a possible 400 students participating in the first test battery, and 70 of these students continuing on to take the follow-up test battery.

Procedure

The participants were enrolled in either a two-month (N=62) or three-month (N=32) presessional EAP course. (Ten of the 3-month students were continuing on as part of a 6-month course.) The complete battery of measures was given as a pre-test (T1) within the first week of their studies, and the same measures were given again (with the exception of the language aptitude instrument) as a post-test (T2) near the end of the course. The treatment consisted of exposure to the target formulaic sequences as part of the normal EAP instruction. It would have been ideal to control both the amount and type of exposure which the participants received, but since they were enrolled in a live EAP presessional course and spread across a number of class groups, this was not thought feasible. However, we ensured that each participant received a minimum amount of exposure through the following means:

• The target formulaic sequences were present at least once in the materials each student used during the course.
• The teachers drew the attention of their groups to each of the formulaic sequences at some point in the course, but without giving away that they were the target of the research. The teachers were free to introduce the sequences in any way and in any order they thought suitable.

The variability of presentation and length of course means that the study will not be informative about the type of instruction or number of exposures necessary to acquire formulaic sequences. The analysis consisted of determining the amount of change in lexical knowledge between T1 and T2, and then investigating statistically whether any of the explored variables was able to account for this change.


Results and discussion

Pre-existing knowledge of formulaic sequences

The first issue to be addressed is the number of formulaic sequences the learners knew at the beginning of the CELE course. Several scholars have suggested that proficient language users know a large number of formulaic sequences (e.g. Fillmore, Kay, and O’Connor, 1988; Wray, 2002), with Pawley and Syder (1983) suggesting that speakers know several hundred thousand of these sequences. Although these scholars’ arguments are persuasive, the claim of a large phrasal vocabulary has seldom been put to empirical test. The design of this study allows us to address this claim to a limited extent. Although it is impossible to fully generalize from the small numbers of participants and items in this study, the participants were typical of the type of international student seeking to do postgraduate studies in the UK, coming from a variety of countries and different education systems. Likewise the target formulaic sequences should reflect the useful sequences occurring in academic texts, as they were chosen on the basis of previous research (e.g. Biber et al., 1999), judgments of academic value, and relatively high frequency. With this in mind, the participants’ knowledge of the target formulaic sequences in the T1 administration should give a useful indication of this type of advanced learners’ knowledge of this type of formulaic sequence.

The mean scores of the measures of both productive and receptive mastery of the target formulaic sequences indicate that the participants had considerable knowledge of these sequences before they started the CELE course (Table 3). In terms of receptive mastery, the participants correctly recognized an average of nearly 17 out of the 20 sequences tested.
Even if a limited degree of successful guessing occurred on the receptive measure, the productive mean score was nearly 13 correct out of 20, with this score being derived from a cloze-like measurement where it would be relatively difficult to guess correctly. These scores are relatively high, so it appears that international students can achieve a considerable knowledge of formulaic sequences by the time they reach this level of proficiency. We can compare this with the participants’ vocabulary size. On average, they knew 87% (26.13/30) of the words in the 3000 frequency band and 66% (19.77/30) of the words in the 5000 frequency band. They almost certainly knew even higher percentages of words in the 2000 frequency band. This suggests that once students reach this order of vocabulary size, they are likely to also know a large range of formulaic sequences in addition to individual words. However, this conclusion must be tempered by the fact that correlations between the vocabulary size measures (T1 and T2) and the formulaic sequence measures (T1 and T2) were of only modest strength (3000 level and productive formulaic sequences = .42–.54, 5000 level and productive formulaic sequences = .31–.36, 3000 level and T2 receptive formulaic sequences = .26–.37, 5000 level and T2 receptive formulaic sequences = .28–.29 [all receptive formulaic sequence T1 correlations nonsignificant]). Furthermore, the formulaic sequence gain scores (both productive and receptive) showed no significant correlation with any of the vocabulary size scores (3000 T1 and T2, 5000 T1 and T2, 3000 gain score, 5000 gain score). Thus, although there seems to be a moderate connection between vocabulary size and formulaic sequence knowledge, this study showed no connection between the ability to learn formulaic sequences and the vocabulary size of individual words, at least in the 3000 and 5000 frequency bands. It seems that the relationship between size of the ‘individual word lexicon’ and the ‘formulaic sequence lexicon’ is not straightforward.

It is possible that some of the students may have been explicitly taught some of the more transparent formulaic sequences (such as first of all or on the contrary) before coming to CELE, but it is probable that most of the less transparent formulaic sequences were acquired through exposure, because sequences like in spite of are unlikely to be given much explicit attention by teachers or textbooks. It is therefore likely that the relatively substantial formulaic sequence scores in the T1 largely reflect incidental learning, and concomitantly, the extended period of study it took most of the students to achieve their relatively advanced levels of proficiency.

Norbert Schmitt, Zoltán Dörnyei, Svenja Adolphs, and Valerie Durow

Table 3. Mean scores on lexical measurements

Measure                              T1       T2         Gain    % Gain
Productive formulaic sequences(a)    12.83    16.03(c)   3.20    24.9%
Receptive formulaic sequences(a)     16.84    18.87(c)   2.03    12.0%
Vocabulary Levels Test 3000(b)       26.13    26.93(c)   .80     3.1%
Vocabulary Levels Test 5000(b)       19.77    22.12(c)   2.35    11.9%

(a) Max score = 20
(b) Max score = 30
(c) p ≤ .001, matched pairs t-test

Improvement in knowledge of formulaic sequences over the course

Two or three months is a relatively short period of time in SLA terms, and many language studies have found it difficult to show gains in such a time scale. However, the CELE course is an intensive program averaging 5 hours per day and 3–4 hours of homework, and it is filled with highly motivated students. Moreover, vocabulary is one aspect of language where tangible gains can be demonstrated in a short time period (e.g. the Keyword studies, see Hulstijn, 1997 for an overview). The Vocabulary Levels Test figures in Table 3 show that the participants did indeed increase their receptive vocabulary size at both levels, even in such a short course. The gain at the 3,000 frequency level does not seem particularly impressive, even though it is statistically reliable, with the 3.1% increase translating into something like 27 new words learned. However, this limited increase can mostly be explained by a ceiling effect, since the participants already knew over 26 out of the 30 target words on average at the time of the T1 administration. This is not surprising, as students wishing to enter an English-medium university can be expected to know the majority of words at the 3,000 level. It is at the 5,000 level where the real improvement occurred, with an 11.9% increase indicating that something like 157 new words were learned. Again, this may not seem like a large number of new words, but it must be remembered that CELE concentrates on academic vocabulary, such as that on the Academic Word List (Coxhead, 2000), and also helps students to improve their mastery over the vocabulary they already know. Thus the increase in general vocabulary represents meaningful learning.

The next question is whether the gains in knowledge of formulaic sequences mirrored the increase in vocabulary size. Table 3 shows that the participants did indeed increase their knowledge of the target formulaic sequences, both receptively and productively. This increase was statistically reliable at the p≤.001 level (matched pairs t-test).
(Many of the distributions were not normal, so non-parametric Wilcoxon Signed-Ranks tests were also run, with all results signiﬁcant at p < .001.) In terms of receptive knowledge of the target formulaic sequences, the participants moved from a score of almost 17 out of 20 on the T1 to nearly 19/20 on the T2. Thus, even though the T1 scores were quite high, the participants were able to show an improvement, to the point of being able to recognize nearly all of the target formulaic sequences by the end of the course. In fact, nearly half of the participants (34/70) received full marks in the T2 administration. The productive scores showed the greatest improvement in terms of percentage gain of all of the lexical measures. The advantage of productive gain over receptive gain may be partially due to the absence of a ceiling eﬀect with the productive scores, but in addition, a number of the formulaic sequences known to a receptive degree in the T1 had been mastered productively in the T2 (see discussion below).
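The arithmetic behind the gain figures in Table 3 can be checked with a short Python sketch. The means are those reported in the table; the function names are illustrative, and the band sizes used to turn item gains into word estimates (1,000 words for the 3,000 level, 2,000 words for the 5,000 level) are an assumption that is not stated in the chapter but happens to reproduce the figures of roughly 27 and 157 words quoted above.

```python
# Mean scores reported in Table 3: (T1 mean, T2 mean, number of test items).
TABLE_3 = {
    "productive_fs": (12.83, 16.03, 20),
    "receptive_fs": (16.84, 18.87, 20),
    "vlt_3000": (26.13, 26.93, 30),
    "vlt_5000": (19.77, 22.12, 30),
}

# ASSUMPTION (not stated in the chapter): each Levels Test section samples
# a frequency band containing this many words.
BAND_SIZE = {"vlt_3000": 1000, "vlt_5000": 2000}


def gain(t1, t2):
    """Raw gain: T2 mean minus T1 mean."""
    return t2 - t1


def percent_gain(t1, t2):
    """Gain expressed as a percentage of the T1 mean."""
    return 100 * (t2 - t1) / t1


def estimated_new_words(t1, t2, items, band_size):
    """Scale the proportion of additional items known up to the whole band."""
    return round((t2 - t1) / items * band_size)


for name, (t1, t2, items) in TABLE_3.items():
    line = f"{name}: gain {gain(t1, t2):.2f}, % gain {percent_gain(t1, t2):.1f}"
    if name in BAND_SIZE:
        words = estimated_new_words(t1, t2, items, BAND_SIZE[name])
        line += f", about {words} new words"
    print(line)
```

Because the sketch starts from the rounded means printed in Table 3, a recomputed percentage can differ from the published one in the last decimal place (the receptive figure comes out at about 12.05% against the reported 12.0%).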


Difference in gains between learners studying for 2 and 3 months

The participants had exposure for either two or three months at CELE. It is worth checking whether the additional month of exposure enjoyed by some participants translated into bigger lexical gains by those participants. Independent sample t-tests were performed and no statistically reliable differences in gains were found for productive formulaic sequence knowledge, receptive formulaic sequence knowledge, 3,000 level vocabulary, or 5,000 level vocabulary (all p>.05), although the 5,000 level vocabulary approached significance (p=.055). It seems that for this level of student and this type of course, the additional month was not sufficient to lead to increased lexical knowledge, at least as demonstrated on these instruments.

Change in degree of mastery of the formulaic sequences over time

The results so far indicate that the learners had considerable knowledge of the target formulaic sequences before they entered the CELE course, and they advanced this knowledge during their course of study. The mean scores in Table 3 illustrate this improvement, but such scores often hide a great deal of variation. To explore the acquisition of the target formulaic sequences in more detail, each participant’s responses from T1 to T2 were checked and placed into one of nine possible categories. Each category was then tallied (see Table 4). The total number of cases was 1400 (70 participants who took both T1 and T2 measures × 20 formulaic sequences).

Table 4. Change in degree of mastery of formulaic sequences over the CELE course

T1 knowledge state   →   T2 knowledge state   Description                     Number of cases (1400 total)
Unknown              →   Unknown              No learning                     22
Unknown              →   Receptive            Learning to receptive state     53
Unknown              →   Productive           Learning to productive state    59
Receptive            →   Unknown              Attrition                       11
Receptive            →   Receptive            Stable receptive knowledge      129
Receptive            →   Productive           Enhancement of knowledge        233
Productive           →   Unknown              Attrition                       5
Productive           →   Receptive            Attrition                       55
Productive           →   Productive           Durable productive knowledge    833


The category with the greatest number of cases (nearly 60%) was Productive →Productive, reﬂecting the learners’ relatively strong performance on the formulaic sequences measure. Although the ability to complete a cloze item successfully does not demonstrate the ability to use the formulaic sequence at will in discourse, it does give persuasive evidence of at least some degree of productive knowledge, especially since the items were completed successfully twice in a two to three month period. This result gives additional support to the ﬁnding that the learners knew a large number of the target formulaic sequences, and it also shows that the productive mastery was maintained over the two or three months of study. In about 9% of the cases, existing receptive knowledge was maintained (Receptive → Receptive). From this, we see that the learners were more likely to know the sequences to a productive, rather than receptive, level of mastery. The cases where the formulaic sequences were unknown in the T1 are interesting because an analysis can illustrate if acquisition took place during the CELE course, and to what degree. In only about 16% of the cases did unfamiliar formulaic sequences remain unknown at the T2 (Unknown → Unknown). In around 40% of the cases the learners gained receptive mastery of the sequences (Unknown → Receptive), and in around 44% of the cases the learners gained productive mastery (Unknown → Productive). This is encouraging, as the cases of learning in this study outnumbered the cases of non-learning by a 5–1 ratio. It seems that the elements were in place in the CELE program for the learning of unknown formulaic sequences to take place. In particular, the students were exposed to each formulaic sequence at least once in their pedagogic materials, and their teachers also explicitly drew their attention to each sequence at least once during the course. 
These results show that this level of exposure was sufficient for meaningful learning of formulaic sequences to occur. Of course, many students undoubtedly received more than the minimum exposure, and the exposure certainly came in different forms from the different teachers. The point remains, however, that the instruction and enhanced exposure involved in this study did seem to facilitate the acquisition of formulaic sequences. Unfortunately, it is impossible to determine whether this facilitation derived mainly from the explicit presentation of the formulaic sequences, or whether the vastly increased language exposure inherent in an intensive language program is sufficient by itself. Future studies should include a control group if possible to parcel out the relative effects of these two variables.

If we disregard the cases where the level of mastery remained the same (Unknown → Unknown, Receptive → Receptive, Productive → Productive), we find that in 83% of the remaining cases the knowledge state advanced, and in 17% the state of knowledge deteriorated. Thus, gains outnumbered the losses in a ratio approaching 5–1. This is largely due to the cases where the state of knowledge was enhanced from a partial receptive mastery to a more complete productive mastery. This shows that with formulaic sequences, as with individual words, the value of study is as much for the consolidation and enhancement of partially known vocabulary as it is for learning new vocabulary.

Readers might be surprised at the 71 cases of attrition, especially in such an intensive EAP environment. However, lexical studies involving single words typically reveal some attrition (e.g. Weltens and Grendel, 1993; Schmitt, 1998), and it would be surprising if no cases of attrition occurred in a study focusing on formulaic sequences. The processes of lexical acquisition, maintenance, and attrition are not yet well understood, but Meara (e.g. 1999) is probably right to view the mental lexicon as a dynamic network, with words constantly becoming more and less available depending on factors such as recency of last use and the existence of associations in the context. It seems that mastery of formulaic sequences is also subject to this same ebb and flow.

A word of caution needs to be inserted here about the receptive vs. productive comparisons. They rely on valid measures of receptive and productive mastery, but it is actually not that easy to get unambiguous indications of either. Waring (1999: Chapter 2) found that results can depend as much on the relative difficulty of receptive and productive measurement instruments as on the true underlying mastery of learners. Although the common assumption is that receptive mastery typically precedes productive mastery, he often obtained higher productive scores than receptive scores from learners if the receptive instruments were relatively difficult and the productive ones relatively easy.
We feel that the cloze and multiple-choice tests are reasonable measures of productive and receptive mastery of the target formulaic sequences, and so the scores reﬂect this mastery rather than being artefacts of the diﬃculty of the tests themselves. Thus, overall we feel the results obtained are valid, but any interpretations must be made with the above caveat in mind.
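The percentages quoted in this section can be recomputed directly from the case counts in Table 4. The following Python sketch is only an illustration: the counts come from the table, while the variable names and the ordering Unknown < Receptive < Productive used to classify a transition as an advance or a decline are assumptions made for the example.

```python
# Case counts from Table 4, keyed by (T1 state, T2 state).
TABLE_4 = {
    ("Unknown", "Unknown"): 22,
    ("Unknown", "Receptive"): 53,
    ("Unknown", "Productive"): 59,
    ("Receptive", "Unknown"): 11,
    ("Receptive", "Receptive"): 129,
    ("Receptive", "Productive"): 233,
    ("Productive", "Unknown"): 5,
    ("Productive", "Receptive"): 55,
    ("Productive", "Productive"): 833,
}

# Assumed ordering of knowledge states, lowest to highest.
LEVEL = {"Unknown": 0, "Receptive": 1, "Productive": 2}


def pct(part, whole):
    """Percentage of `whole` made up by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


total = sum(TABLE_4.values())
unknown_at_t1 = sum(n for (t1, _), n in TABLE_4.items() if t1 == "Unknown")
advanced = sum(n for (t1, t2), n in TABLE_4.items() if LEVEL[t2] > LEVEL[t1])
declined = sum(n for (t1, t2), n in TABLE_4.items() if LEVEL[t2] < LEVEL[t1])
changed = advanced + declined

print("Productive -> Productive:", pct(TABLE_4[("Productive", "Productive")], total))
print("Receptive -> Receptive:", pct(TABLE_4[("Receptive", "Receptive")], total))
for t2 in ("Unknown", "Receptive", "Productive"):
    # Shares are taken over the cases that were unknown at T1.
    print("Unknown ->", t2, pct(TABLE_4[("Unknown", t2)], unknown_at_t1))
print("Advanced:", pct(advanced, changed), "Declined:", pct(declined, changed))
```

Note that the Unknown → Unknown, Unknown → Receptive and Unknown → Productive percentages in the text are shares of the 134 cases that were unknown at T1, whereas the Productive → Productive and Receptive → Receptive percentages are shares of all 1400 cases.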

Relationship of knowledge of formulaic sequences and other variables

Since the focus of the study was the acquisition of formulaic language, we computed gain scores by simply subtracting the T1 scores from the T2 scores. We then correlated these gain scores with the individual difference measures.

We obtained a very noteworthy result pattern: none of the correlations reached statistical signiﬁcance. Thus neither the aptitude measure nor the attitude/motivation items (both T1 and T2 were included in the analysis) correlated with the gain scores. In other words, the individual diﬀerence variables that we have included in our research paradigm did not directly aﬀect the acquisition of formulaic phrases. This result is rather surprising, given that such variables have been shown to aﬀect other aspects of language (Dörnyei 2002; Dörnyei & Csizér, 2002). It suggests that the relationship between the acquisition of formulaic sequences and learner attributes is not direct/linear; in other words, although learner characteristics might well aﬀect formulaic language development, their impact may be modiﬁed by other factors related to the learning context. Such a relationship could be identiﬁed by a longer, more focused study of individual diﬀerence and contextual variables/processes, and for this reason we carried out a parallel longitudinal qualitative study of various situated determinants of the language development of selected participants which did indeed reveal a complex interrelationship between situated learning and formulaic language gains (see Dörnyei, Durow and Zahran, this volume).
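The gain-score procedure described in this section (T2 minus T1 for each learner, then a correlation with an individual difference measure) can be sketched in a few lines of Python. This is a minimal illustration, not the analysis actually run: the chapter does not say which correlation coefficient or software was used, so Pearson's r is an assumption, and the scores below are hypothetical values invented for the example, not data from the study.

```python
from math import sqrt


def gain_scores(t1, t2):
    """Per-learner gain: T2 score minus T1 score."""
    return [post - pre for pre, post in zip(t1, t2)]


def pearson_r(x, y):
    """Pearson product-moment correlation (assumed here; the chapter
    does not name the coefficient that was used)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL scores for five learners (not data from the study).
t1_scores = [12, 15, 9, 14, 11]
t2_scores = [16, 17, 14, 15, 15]
aptitude = [8, 11, 7, 12, 9]

gains = gain_scores(t1_scores, t2_scores)
print("gains:", gains)
print("r(gain, aptitude):", round(pearson_r(gains, aptitude), 2))
```

A significance test on r would be the next step in a real analysis; it is omitted here to keep the sketch within the standard library.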

Conclusion

Formulaic language is becoming an increasingly important topic in applied linguistics, but one which raises many questions concerning the acquisition of such language. The present study was designed to explore some of the issues revolving around the learning of academically-based formulaic sequences. It found that relatively proficient EAP learners knew a considerable number of these formulaic sequences, and that they enhanced this knowledge over the course of the 2–3 month EAP program. This enhancement took the form of both learning new formulaic sequences, and of improving mastery of receptively-known sequences to a productive level, although the aptitude/attitude/motivation factors explored did not account for this enhancement. Future studies could usefully build on these results by controlling for input to discover whether such enhancement stems from explicit instruction, or whether exposure to a rich ESL environment is sufficient in itself. They could also explore whether other individual difference factors might have an effect on the learning of formulaic sequences.


References

Bell, J. and Gower, R. 1992. Upper Intermediate Matters. London: Longman.
Biber, D., Johansson, S., Leech, G., Conrad, S., and Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Longman.
Burger, H. 2003 (2nd ed). Phraseologie: Eine Einführung am Beispiel des Deutschen. Berlin: Eric Schmidt Verlag.
Burgmeier, A., Eldred, G., and Zimmerman, C. B. 1991. Lexis: Academic Vocabulary Study. Englewood Cliffs NJ: Prentice Hall Regents.
Cowie, A. P. 1998. Phraseological dictionaries: Some East-West comparisons. In Phraseology: Theory, Analysis, and Applications, A. P. Cowie (ed.), 209–228. Oxford: Clarendon Press.
Coxhead, A. 2000. A new academic word list. TESOL Quarterly 34: 213–238.
DeCarrico, J. and Larsen-Freeman, D. 2002. Grammar. In An Introduction to Applied Linguistics, N. Schmitt (ed.), 19–34. London: Arnold.
Dörnyei, Z. 2001. Teaching and Researching Motivation. Harlow: Longman.
Dörnyei, Z. 2002. The motivational basis of language learning tasks. In Individual Differences and Instructed Language Learning, P. Robinson (ed.), 137–158. Amsterdam: John Benjamins.
Dörnyei, Z. 2003. Questionnaires in Second Language Research. Mahwah NJ: Lawrence Erlbaum.
Dörnyei, Z. and Csizér, K. 2002. Some dynamics of language attitudes and motivation: Results of a longitudinal nationwide survey. Applied Linguistics 23: 421–462.
Dörnyei, Z. and Kormos, J. 2000. The role of individual and social variables in oral task performance. Language Teaching Research 4: 275–300.
Dörnyei, Z. and Skehan, P. 2003. Individual differences in second language learning. In The Handbook of Second Language Acquisition, C. J. Doughty and M. H. Long (eds), 589–630. Oxford: Blackwell.
Erman, B. and Warren, B. 2000. The idiom principle and the open choice principle. Text 20: 29–62.
Fillmore, C. J., Kay, P., and O’Connor, M. C. 1988. Regularity and idiomaticity in grammatical constructions: The case of LET ALONE. Language 64: 501–538.
Howarth, P. 1996. Phraseology in English Academic Writing: Some Implications for Language Learning and Dictionary Making. Tübingen: Max Niemeyer.
Hulstijn, J. H. 1997. Mnemonic methods in foreign language vocabulary learning. In Second Language Vocabulary Acquisition, J. Coady and T. Huckin (eds), 203–224. Cambridge: CUP.
Hyland, K. 2000. Disciplinary Discourses: Social Interactions in Academic Writing. Harlow: Longman.
ITEMAN. 1989. St. Paul MN: Assessment Systems Corporation.
Jones, L. 1981. Functions of English. Cambridge: CUP.
Jordan, R. R. 1992. Academic Writing Course. London: Nelson.
Kadden, J. 1998. Traveling the World through Idioms. Ann Arbor MI: University of Michigan Press.
Meara, P. 1999. Self organization in bilingual lexicons. In Language and Thought in Development, P. Broeder and J. Murre (eds), 127–144. Tübingen: Gunter Narr Verlag.
Nattinger, J. R. and DeCarrico, J. S. 1992. Lexical Phrases and Language Teaching. Oxford: OUP.
Oshima, A. and Hogue, A. 1999. Writing Academic English (3rd edition). New York: Addison Wesley.
Ottó, I. 2002. Magyar Egységes Nyelvérzékmérő Teszt. Unpublished material.
Pawley, A. and Syder, F. H. 1983. Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In Language and Communication, J. C. Richards and R. W. Schmidt (eds), 191–225. London: Longman.
Redman, S. 1991. A Way with Words. Cambridge: CUP.
Sawyer, M. and Ranta, L. 2001. Aptitude, individual differences, and instructional design. In Cognition and Second Language Instruction, P. Robinson (ed.), 319–353. Cambridge: CUP.
Schmitt, N. 1998. Tracking the incremental acquisition of second language vocabulary: A longitudinal study. Language Learning 48: 281–317.
Schmitt, N., Schmitt, D., and Clapham, C. 2001. Developing and exploring the behaviour of two new versions of the Vocabulary Levels Test. Language Testing 18: 55–88.
Sinclair, J. M. 1991. Corpus, Concordance, Collocation. Oxford: OUP.
Waring, R. 1999. Tasks for Assessing Second Language Receptive and Productive Vocabulary. Unpublished PhD thesis: University of Wales, Swansea. Available at .
Weltens, B. and Grendel, M. 1993. Attrition of vocabulary knowledge. In The Bilingual Lexicon, R. Schreuder and B. Weltens (eds), 135–156. Amsterdam: John Benjamins.
Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: CUP.


Appendix 1

Vocabulary phrase completion

NAME ____________________

In the two following passages, there are a number of vocabulary phrases in bold. Some words in these phrases have their second half cut off. Look at the context and fill in the blanks with the missing half of the words. Sometimes only one letter is missing and sometimes several letters are missing. For example:

The economy is sure to improve in the long r_un_. (over a long period of time)

Learning English

Learning English as a second language is a difficult challenge, but we do know several ways to make learning more efficient. Fi___ of a___ (to begin with), almost every research study shows that you need to use English as much as possible. I___ is cl___ (this is obvious) that the more you use English, the better you will learn it. There is no disagreement about this.

Second, learning English from textbooks seems to help in most cases. Most evidence shows that studying grammar rules with textbooks can help you memorise those rules. Therefore, i___ seems lik___ (this is probably true) that studying with textbooks can help you learn something about grammar. However, if you only use a book but do not practice speaking, you will probably not be able to use the grammar rules when you speak. If we look at language learning research, there i___ no evid___ (nothing to show that this is true) that just learning from textbooks can make you a good speaker. Of course, studying grammar will help your speaking t___ a certain ext___ (some amount, but not all), but it is not the only thing you need to know. You must also know how to do things like ask questions and give directions. With reg___ to (concerning this certain thing) giving directions, you must know phrases like Turn right at the corner.

Third, beginning learners and advanced learners need to study differently. For example, beginners have little language proficiency to build upon. At this st___ (at this point of development), it is probably best to focus on building vocabulary and learning grammar. In te___ of (concerning this certain thing) vocabulary, beginning learners should try to learn the most common 2,000 words of English. Then, as intermediate students, they should try to build a vocabulary size of 5,000 words. It may take some time to learn this many words, but as a res___ (something happening because of another thing), learners should be able to read natural English texts, like newspapers and magazines. Advanced learners should learn even more vocabulary. If a learner continues to study over many years, in t___ long te___ (over a long period of time) they can reach a vocabulary size of 10,000 words or more.

International debt

Speaker A: I’ve been watching the news report and they say that there’s a go___ cha___ (this will probably happen) that the international debts of poorer countries might be cancelled.

Speaker B: Really? I don’t think so. As f___ as I kn___ (I think this is true), the international banks do not want to cancel the debt because it would cost them too much money.

A: On the cont___ (the opposite is true), the banks would have more money because they would get some money from the government instead. They may not get the loans back from the poorer countries for a long time anyway.

B: That’s a good po___ (the idea in your argument is a good one). I guess it’s better for the banks to get some money now and just forget the loans, particularly when they take into acc___ (consider this issue) the fact that some countries may never be able to pay the loans back.

A: The problem is that many people do not want the government to pay the banks. They feel that the banks caused their own problems by lending money too easily.

B: I s___ what you me___ (I understand your argument). Many specialists told the banks that some countries had very weak economies and could not repay the loans. In sp___ of this (doing something even though there is a good reason not to), the banks loaned the money anyway.

A: Yes, some loans were too dangerous. On the other ha___ (looking at the opposite argument), some countries used the money wisely to improve their economies and their people’s living conditions.

B: That’s true. But the po___ is (the main issue in the argument) that many countries cannot pay back their loans and it is damaging their economies. There are many arguments for and against cancelling the loans, but o___ the wh___ (considering the complete situation), I think it would be best to cancel them.

Appendix 2

Language Analysis

Name: ______________

The list in the box below contains words/phrases from an imaginary language along with their English translation. Following this, there will be 14 short English sentences, each with four possible translations into the imaginary language. Based on the examples given in the box, we would like to ask you to try and work out which of the four options is the correct translation of each sentence. Thank you very much.

kau               dog
meu               cat
kau meud bo       The dog is chasing the cat.
kau meud bi       The dog was chasing the cat.
so                watch
ciu               mouse
pa                we, us
xa                you
pasau meud bo     Our dog is chasing the cat.
pa meud bo        We are chasing the cat.
paxbo             We are chasing you.
pa meud bor       We aren’t chasing the cat.

1. The dog is watching the cat.
   a. kau meud so   b. kau meud si   c. meu kaud so   d. meu kaud si
2. The cat was watching the mouse.
   a. meud ciu so   b. meu ciud so   c. meud ciu si   d. meu ciud si
3. You are watching us.
   a. paxbo   b. paxso   c. xapbo   d. xapso
4. You were chasing the dog.
   a. xa kaud bo   b. pa kaud bo   c. pa kaud bi   d. xa kaud bi
5. We were watching you.
   a. xapsi   b. paxso   c. paxsi   d. paxbi
6. You are not watching the cat.
   a. xa meud bor   b. xa meud sor   c. xa meud sir   d. xa meu sor
7. You are not chasing us.
   a. paxbor   b. xapbo   c. xapabor   d. xapbor
8. We were not watching the dog.
   a. pa kaud sir   b. pa kau sir   c. pa kaud sor   d. pa kaud bir
9. We were not chasing you.
   a. xapbir   b. paxbir   c. paxbor   d. xapbor
10. Your cat is chasing the mouse.
   a. xacu meud bo   b. xaseu ciud bo   c. meuxa ciud bo   d. ciuxa meud bo
11. You are not watching our dog.
   a. xa paseud bor   b. xa pasaud sor   c. xa pasaud so   d. xa pasaud bor
12. Our mouse was not chasing the dog.
   a. oasiu kaud bi   b. xasiu kaud sir   c. xasiu kaud bi   d. pasiu kaud bir
13. Your mouse is chasing us.
   a. xa ciu pabo   b. xasiu pbo   c. xaciu pa bo   d. xasiu pabo
14. Our cat was not chasing your dog.
   a. pseu xasaud bir   b. pseu xsaud bir   c. paseu xasaud bir   d. paseu xsaud bir

Appendix 3

Language attitudes

Name: ______

Following are a number of statements with which some people agree and others disagree. We would like you to indicate your opinion after each statement by putting an ‘X’ in the box that best describes the extent to which you agree or disagree with the statement. Thank you very much for your help!

For example:

Hamburgers are unhealthy.
☐ Strongly disagree   ☐ Disagree   ☐ Slightly disagree   ☐ Partly agree   ☐ Agree   ☐ Strongly agree

If you think, for example, that this statement is absolutely false, you can put an ‘X’ in the first box.

Response boxes for each statement: Strongly disagree / Disagree / Slightly disagree / Partly agree / Agree / Strongly agree

1. Learning foreign languages is a lot of fun.
2. I get nervous and confused when I have to speak with native speakers of English.
3. Learning English is important for me to learn more about the English culture.
4. If I learn to speak fluent English I will be able to get a very good job.
5. Making good friends with British people is very important for me.
6. Learning English is often boring.
7. I am confident that I will be able to understand English films and videos.
8. I like the way English people live.
9. I would like to acquire native-like proficiency during my stay in Britain.
10. I find it difficult to use my English in real-life situations.
11. The more I learn about British people, the more I like them.
12. I won’t be able to do my job unless I speak very good English.
13. I really enjoy studying English.
14. Learning English is one of the most important things in my life now.
15. I find some aspects of living in England difficult.
16. Learning English is important for me to be able to become similar to English people.
17. I think I am good at learning foreign languages.
18. I would be very disappointed if I didn’t learn excellent English while I am here in England.
19. British people are often ‘cool’ and ‘distant’.
20. I usually get uneasy when I have to speak in English.
21. English proficiency is extremely important for my future career.
22. I really like the English culture.
23. I am planning to work very hard improving my English.
24. I would like to get to know as many British people as possible.
25. I don’t mind if I don’t become perfect in English; I would only like to learn enough to be able to do my academic studies.

Appendix 4

Levels test

Name ___________________

This is a vocabulary test. You must choose the right word to go with each meaning. Write the number of that word next to its meaning. Here is an example.

1 business   2 clock   3 horse   4 pencil   5 shoe   6 wall
_______ part of a house
_______ animal with four legs
_______ something used for writing

You answer it in the following way.

1 business   2 clock   3 horse   4 pencil   5 shoe   6 wall
___6___ part of a house
___3___ animal with four legs
___4___ something used for writing

Some words are in the test to make it more difficult. You do not have to find a meaning for these words. In the example above, these words are business, clock, and shoe. If you have no idea about the meaning of a word, do not guess. But if you think you might know the meaning, then you should try to find the answer.
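The matching format described in these instructions can be scored mechanically. The Python sketch below uses only the worked example from the instructions; the function name and the one-point-per-definition scoring rule are assumptions for illustration (the rule is at least consistent with the maximum score of 30 reported for each level in Table 3, given ten clusters of three definitions).

```python
# The worked example from the test instructions.
EXAMPLE_WORDS = {
    1: "business", 2: "clock", 3: "horse",
    4: "pencil", 5: "shoe", 6: "wall",
}
EXAMPLE_KEY = {
    "part of a house": 6,
    "animal with four legs": 3,
    "something used for writing": 4,
}


def score_cluster(responses, key):
    """Count the definitions whose written number matches the key.

    A definition left blank (missing or None) earns no point.
    Scoring rule assumed: one point per correctly matched definition.
    """
    return sum(
        1 for definition, correct in key.items()
        if responses.get(definition) == correct
    )


# Words with no matching definition act as distractors.
distractors = [
    word for number, word in EXAMPLE_WORDS.items()
    if number not in EXAMPLE_KEY.values()
]

# A hypothetical learner who gets two right and leaves one blank.
learner = {"part of a house": 6, "animal with four legs": 3}
print(score_cluster(learner, EXAMPLE_KEY), "of", len(EXAMPLE_KEY))
print("distractors:", distractors)
```

Leaving an item blank and answering it wrongly are scored identically here, which matches the instruction not to guess only in the sense that guessing carries no penalty in this sketch.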

Version 2 the 3,000 word level 1 2 3 4 5 6

bull champion dignity hell museum solution

formal and serious manner winner of a sporting event building where valuable objects are shown

1 blanket
2 contest      _____ holiday
3 generation   _____ good quality
4 merit        _____ wool covering used on beds
5 plot
6 vacation

1 comment
2 gown         _____ long formal dress
3 import       _____ goods from a foreign country
4 nerve        _____ part of the body which carries feeling
5 pasture
6 tradition

1 administration
2 angel        _____ group of animals
3 frost        _____ spirit who serves God
4 herd         _____ managing business and affairs
5 fort
6 pond

1 atmosphere
2 counsel      _____ advice
3 factor       _____ a place covered with grass
4 hen          _____ female chicken
5 lawn
6 muscle

1 abandon
2 dwell        _____ live in a place
3 oblige       _____ follow in order to catch
4 pursue       _____ leave something permanently
5 quote
6 resolve

1 assemble
2 attach       _____ look closely
3 peer         _____ stop doing something
4 quit         _____ cry out loudly in fear
5 scream
6 toss

1 drift
2 endure       _____ suffer patiently
3 grasp        _____ join wool threads together
4 knit         _____ hold firmly with your hands
5 register
6 tumble

1 brilliant
2 distinct     _____ thin
3 magic        _____ steady
4 naked        _____ without clothes
5 slender
6 stable

1 aware
2 blank        _____ usual
3 desperate    _____ best or most important
4 normal       _____ knowing what is happening
5 striking
6 supreme

Version 2: The 5,000 word level

1 analysis
2 curb         _____ eagerness
3 gravel       _____ loan to buy a house
4 mortgage     _____ small stones mixed with sand
5 scar
6 zeal

1 cavalry
2 eve          _____ small hill
3 ham          _____ day or night before a holiday
4 mound        _____ soldiers who fight from horses
5 steak
6 switch

1 circus
2 jungle       _____ musical instrument
3 nomination   _____ seat without a back or arms
4 sermon       _____ speech given by a priest in a church
5 stool
6 trumpet

1 artillery
2 creed        _____ a kind of tree
3 hydrogen     _____ system of belief
4 maple        _____ large gun on wheels
5 pork
6 streak

1 chart
2 forge        _____ map
3 mansion      _____ large beautiful house
4 outfit       _____ place where metals are made and shaped
5 sample
6 volunteer

1 contemplate
2 extract      _____ think about deeply
3 gamble       _____ bring back to health
4 launch       _____ make someone angry
5 provoke
6 revive

1 demonstrate
2 embarrass    _____ have a rest
3 heave        _____ break suddenly into small pieces
4 obscure      _____ make someone feel shy or nervous
5 relax
6 shatter

1 correspond
2 embroider    _____ exchange letters
3 lurk         _____ hide and wait for someone
4 penetrate    _____ feel angry about something
5 prescribe
6 resent

1 decent
2 frail        _____ weak
3 harsh        _____ concerning a city
4 incredible   _____ difficult to believe
5 municipal
6 specific

1 adequate
2 internal     _____ enough
3 mature       _____ fully grown
4 profound     _____ alone away from other things
5 solitary
6 tragic

Appendix 5 Vocabulary phrase multiple-choice

In the two following passages, there are a number of blanks where vocabulary phrases should be. Look at the context and decide which phrase is most natural in the blank. Circle the letter of that answer. If you don’t know the answer and can only guess, circle “I DON’T KNOW”.

For example: The economy is sure to improve (1) ___c____

1. a. in the long period
   b. over a long time
   c. in the long run
   d. over a long space
   e. I DON’T KNOW

1. a. The first one
   b. First of all
   c. The first thing
   d. First in line
   e. I DON’T KNOW

2. a. It is clear to all that
   b. It is a clear case that
   c. It is clear that
   d. It is clear to know that
   e. I DON’T KNOW

3. a. it seems likely that
   b. it looks likely that
   c. the likely thing is that
   d. the likely case is that
   e. I DON’T KNOW

4. a. the evidence is nonexistent that
   b. the evidence does not exist that
   c. no evidence is available that
   d. there is no evidence that
   e. I DON’T KNOW

Learning English

Learning English as a second language is a difficult challenge, but we do know several ways to make learning more efficient. (1) ______, almost every research study shows that you need to use English as much as possible. (2) ______ the more you use English, the better you will learn it. There is no disagreement about this. Second, learning English from textbooks seems to help in most cases. Most evidence shows that studying grammar rules with textbooks can help you memorise those rules. Therefore, (3) ______ studying with textbooks can help you learn something about grammar. However, if you only use a book but do not practice speaking, you will probably not be able to use the grammar rules when you speak. If we look at language learning research, (4) ______ just learning from textbooks can make you a good speaker. Of course, studying grammar will help your speaking (5) ______, but it is not the only thing you need to know. You must also know how to do things like ask questions and give directions. (6) ______ giving directions, you must know phrases like “Turn right at the corner”.

Third, beginning learners and advanced learners need to study differently. For example, beginners have little language proficiency to build upon. (7) ______, it is probably best to focus on building vocabulary and learning grammar. (8) ______ vocabulary, beginning learners should try to learn the most common 2,000 words of English. Then, as intermediate students, they should try to build a vocabulary size of 5,000 words. Learning this many words can be hard, but (9) ______, learners should be able to read natural English texts, like newspapers and magazines. Advanced learners should learn even more vocabulary. If a learner continues to study over many years, (10) ______ they can reach a vocabulary size of 10,000 words or more.

5. a. to a minor degree
   b. to a certain extent
   c. to an incomplete degree
   d. to a partial extent
   e. I DON’T KNOW

6. a. Concerning the issue of
   b. With regard to
   c. On the idea of
   d. Discussing the issue of
   e. I DON’T KNOW

7. a. at this phase
   b. at this period in time
   c. at this stage
   d. at this time period
   e. I DON’T KNOW

8. a. In terms of
   b. Concerning the issue of
   c. As concerns
   d. Focusing on the issue of
   e. I DON’T KNOW

9. a. as a purpose
   b. as an outcome
   c. as a reward
   d. as a result
   e. I DON’T KNOW

10. a. over a long period
    b. over a long time
    c. in the long while
    d. in the long term
    e. I DON’T KNOW

11. a. there’s a good chance that
    b. there’s a great chance that
    c. there’s a good likelihood that
    d. there’s a great likelihood that
    e. I DON’T KNOW

International debt

Speaker A: I’ve been watching the news report and they say that (11) ______ the international debts of poorer countries might be cancelled.

Speaker B: Really? I don’t think so. (12) ______, the international banks do not want to cancel the debt because it would cost them too much money.

12. a. As far as I know
    b. As far as my information
    c. By my knowledge
    d. By my information
    e. I DON’T KNOW

A: (13) ______, the banks would have more money because they would get some money from the government instead. They may not get the loans back from the poorer countries for a long time anyway.

13. a. On the contrary
    b. In a contradiction
    c. By a contrast
    d. For a contrast
    e. I DON’T KNOW

B: (14) ______. I guess it’s better for the banks to get some money now and just forget the loans, particularly when they (15) ______ the fact that some countries may never be able to pay the loans back.

14. a. That’s a solid point
    b. That’s a good point
    c. That’s a convincing point
    d. That’s a strong point
    e. I DON’T KNOW

15. a. take into account
    b. factor into account
    c. allow into account
    d. put into account
    e. I DON’T KNOW

A: The problem is that many people do not want the government to pay the banks. They feel that the banks caused their own problems by lending money too easily.

B: (16) ______. Many specialists told the banks that some countries had very weak economies and could not repay the loans. (17) ______ this, the banks loaned the money anyway.

16. a. I follow what you mean
    b. I understand what you mean
    c. I see what you mean
    d. I catch what you mean
    e. I DON’T KNOW

17. a. On spite of
    b. By spite of
    c. With spite of
    d. In spite of
    e. I DON’T KNOW

A: Yes, some loans were too dangerous. (18) ______, some countries used the money wisely to improve their economies and their people’s living conditions.

18. a. On the other view
    b. On the other part
    c. On the other standpoint
    d. On the other hand
    e. I DON’T KNOW

B: That’s true. But (19) ______ that many countries cannot pay back their loans and it is damaging their economies. There are many arguments for and against cancelling the loans, but (20) ______, I think it would be best to cancel them.

19. a. the point is
    b. the key is
    c. the idea is
    d. the statement is
    e. I DON’T KNOW

20. a. with the whole
    b. considering the whole
    c. taking the whole into account
    d. on the whole
    e. I DON’T KNOW

Individual differences and their effects on formulaic sequence acquisition

Zoltán Dörnyei, Valerie Durow, and Khawla Zahran
University of Nottingham

Introduction

Anecdotal evidence abounds that language learners show considerable variation in their acquisition of formulaic sequences. This variation does not appear to be directly related to their overall rate of language learning success (i.e. ‘good’ learners may not be better than ‘slower’ learners at mastering a range of colloquial phrases); and the variation also applies to more natural language learning situations embedded in the host environment, with the learners being exposed to natural L2 input. What causes this variation? Why do we find that many international students who spend several years studying at a British university still maintain their artificial, ‘textbook-like’ proficiency, whereas some others readily master a wide range of formulaic phrases and colloquialisms which in turn lend their language use a native-like character?

Our initial assumption was that the acquisition of a formulaic, phraseological competence is somewhat different from the mastery of other components of communicative language proficiency in that formulaic language is so closely linked to the everyday reality of the target language culture that it cannot be learnt effectively unless the learner integrates, at least partly, into the particular culture. For example, the context-appropriate application of colloquial phrases cannot be learned from textbooks, but only through participation in real-life communicative events. Thus, we assumed that the acquisition of a formulaic repertoire is a socially-loaded process that goes beyond mastering elements of the target language code as it also requires ‘tapping into’ the sociocultural reality of the L2 community and incorporating elements of it into the learners’ own language behavioural repertoire.

This hypothesis was indirectly confirmed by the quantitative analyses of the data gathered in the acquisition component of our project, reported by Schmitt, Dörnyei, Adolphs and Durow (this volume): The lack of any statistically significant correlations between the participating students’ formulaic language gain scores and the individual difference measures pointed to the fact that the inter-learner variation was not simply a function of the existing differences between the learners’ basic attributes, but was the outcome of a more complex process such as the sociocultural integration of the learners. The current study intends to explore this ‘more complex process’ by analysing qualitative data collected from a subsample of the participants in the Schmitt et al. study.

As far as we know, there have been no focused investigations analysing the relationship between sociocultural integration and SLA in the past; the design of our study was therefore exploratory in nature. Based on the theoretical considerations outlined above, our own past experience, and extensive discussions with fellow teachers and researchers, we decided to look for the decisive factors explaining student success or failure in relation to the degree of the students’ acculturation, that is, the extent to which learners succeeded in settling in and engaging with the host community, thereby taking advantage of the social contact opportunities available. Thus, the qualitative strand of our project was aimed at examining how the participants coped with this sociocultural aspect of their learning process.

Background

Schumann (1986) defines acculturation as “the social and psychological integration of the learner with the target language group” (p. 379) and sees it as a prerequisite to mastering the target language. His theory was originally developed for multiethnic settings from a minority group perspective and this situation has obvious similarities to the mastering of the dominant language of the host environment by international students. The bulk of Schumann’s theory concerns factors that may create a social or psychological distance between the L2 learners and the target language speakers, which is seen as detrimental to the attainment of the target language. Three areas highlighted in the theory seem to be particularly pertinent to our study: (a) culture shock and cultural adaptation; (b) language attitudes and motivation; and (c) social networks and enclosures.

Culture shock and cultural adaptation

Schumann (1986) defines ‘culture shock’ as the anxiety and disorientation experienced upon entering a new culture due to the recognition that established mechanisms to cope with routine activities do not work in the new environment. Thus, the term denotes a complex notion covering a broad range of negative psychological and social reactions to immersion in another culture (Furnham, 1993). Culture shock is assumed to happen to everybody new to a culture: it is a normal and expected reaction as part of the adaptation to the existing cultural differences. The concept was first introduced in a study by Oberg (1960), which identified six main sources of culture shock: (a) strain due to the effort required to make necessary psychological adaptations; (b) a sense of loss and feelings of deprivation in regard to friends, status, profession, and possessions; (c) being rejected by and/or rejecting members of a new culture; (d) confusion in role, role expectations, values, feelings and self-identity; (e) surprise, anxiety, even disgust and indignation after becoming aware of cultural differences; and (f) feelings of impotence due to not being able to cope with the new environment.

The international students in our sample came from cultural backgrounds that were rather dissimilar to the host environment. Our pilot investigations suggested that not only did they find the sociocultural norms different and often strange but even basic issues such as the local food caused them difficulties and stress. We therefore assumed that many of them would experience severe forms of culture shock and the process of cultural adaptation would not be smooth for most.

Language attitudes and motivation

A key aspect in any learning situation is the learner’s motivation. Because of the complex, socially-loaded nature of language, the motivation to learn a second language is a multi-faceted construct, involving a range of components such as attitudes towards the L2 speakers and their culture; various pragmatic benefits of L2 proficiency; issues related to the learner’s personality/identity; and a host of factors rooted in the actual context of the learning (cf. Gardner, 2001; Dörnyei, 2001). Therefore, the learners’ appraisal of the host environment and the L2 community is a key determinant of their willingness and eagerness to actively engage with the locals. Furthermore, as Aston (1988) emphasises, the development of an interactional ability requires the acquisition of the interactional rituals of the L2 culture and having favourable attitudes towards their use; this again points to the significance of a positive evaluation of the target culture.

Because our study involved a longitudinal investigation covering a period of several months, of particular importance for us was how initial attitudes and motivation changed over time. The temporal dimension of motivation and the question of motivational evolution have received some attention in L2 studies during the past few years (e.g. Dörnyei, 2000, 2001; Ushioda, 2001), and conceptualising motivation as a dynamic process rather than a relatively stable learner attribute offered us the opportunity to explore the fluctuation of the learners’ motivation and the relationship between motivational development and changes in the learners’ sociocultural perceptions and social situation. Our research design has, therefore, involved periodically revisiting the participants’ attitudinal disposition in order to identify possible trends that may support or hinder their learning process. Taking such a process-oriented approach seemed all the more necessary in the light of the failure of the motivation test administered to the whole student sample to produce results that explained a significant amount of variance in the formulaic language gains (Schmitt et al., this volume), pointing to a more complex motivation-learning achievement relationship.

Social networks and enclosures

One of Schumann’s (1986) most influential insights into the process of acculturation involved the significant role he attributed to the concept of enclosure. By this he meant the extent to which the learner’s group shared the same social facilities (e.g. churches, schools, recreational facilities, professions) as the target language group. Obviously, if learners find themselves in an ‘international ghetto’ situation, this will reduce their opportunities for contact with the host community and hinder any subsequent sociocultural integration. Thus, the issue of ‘enclosure’ raises two broader questions: the role of social networks and that of interethnic contact. Both are well-researched issues in the social sciences with solid bodies of literature and therefore the current discussion can only outline the scope of these issues and their relevance to our current study, without offering a systematic overview.

In a study that was similar to ours both in its aims and conditions, Geoghegan (1983) analysed the difficulties experienced by non-native students at Cambridge University. She concluded that the most important factor that contributed to the students’ sense of alienation was the poor contact they had with the host population. The participants of her study explained the insufficient quantity of contact largely as a result of restrictions within the British culture where privacy and individualism are highly valued. While putting the blame, and therefore the responsibility, on the other party is clearly a simplistic and one-sided perception, it illustrates well that the success of interethnic communication is dependent on the extent of cross-cultural understanding. Indeed, despite the realisation of the importance of contact within the learning process, many international students fail to achieve it because they treat learning the L2 and learning the L2 culture as two separate things and try to focus only on the former. Stangor, Jonas, Stroebe and Hewstone’s (1996) findings indicate that this phenomenon is not restricted to specific ethnic groups. The researchers investigated British exchange students and found that not a single one of them reported having had too much contact with host country members, whereas 55 per cent reported having had too little contact with them.

The amount of interethnic contact one experiences is also a function of the nature of the learners’ social network, which consists of all the people they have links with such as family, friends, acquaintances and even the strangers they meet (McMahon, 1994). The quality of such social/friendship networks has a strong impact on the ultimate success of the language acquisition process. In a study of Chinese teaching assistants in America, Jenkins (2001) found, for example, that her participants, who lived together in apartments that they themselves referred to as the ‘Chinese ghetto’ and operated under a system of interdependence and group obligations, attributed their cultural isolation partly to their situational circumstances.

The effects of interethnic contact have also been the subject of a vigorous line of research in social psychology investigating the “Contact Hypothesis”. In a comprehensive review of the relevant literature, Pettigrew (1998) summarises that, according to the theory, contact leads to positive interethnic outcomes only if the following five conditions are met: equal group status of the two groups, common goals, intergroup cooperation, authority support and friendship potential.
From our perspective, the last condition is particularly important because this is exactly the kind of quality that is so often missing from the relationship between international students and the locals. Furnham and Bochner (1989) provide a survey of findings concerning the friendship networks of international students and conclude that although friendship relationships with host nationals are seen as important and necessary, these relationships are seldom forged. They go on to argue that “foreign students have limited contact with host nationals [which] may explain why many overseas students return home disgruntled with the society in which they have studied” (p. 129). The well-being of international students from a social network perspective would require, as the scholars maintain, that international students belong to both a host-national network through which they could learn the social skills of the host culture and a co-national network so that they could maintain their culture of origin. However, the available evidence suggests that most foreign students “do not belong to a viable host-national network” (p. 129).

In a recent overview, Ward, Bochner and Furnham (2001) confirm the validity of the earlier findings. Although many students would like, and feel that it would be beneficial, to form friendship relationships with members of the host community, in practice this very rarely happens. Investigating an Oxford student residence, for example, Bochner, Hutnik and Furnham (1985) reported that as many as 70 per cent of their sample of foreign students did not have any English friends at all after at least one year in the country.

Research question

The brief overview above suggests that international students arriving at British universities may not necessarily find what so many of them expect, namely that they will be able to immerse themselves in the host culture and develop their language proficiency through ongoing participatory experience of L2 communication. Past research suggests that their acculturation process is likely to be an uphill struggle, hindered by serious culture shock, motivational fluctuation and inadequate membership opportunities in host-national networks. We have also argued that the acquisition of a formulaic/phraseological competence is to a large extent a function of the learners’ sociocultural adaptation and integration, and in the light of the acculturation difficulties outlined above we can see why formulaic language learning is so often unsuccessful.

The good news, however, is that some learners do manage to make considerable progress in this area, and this observation prompts our main research question: What learner characteristics and learning conditions/processes facilitate the successful mastery of formulaic sequences, thereby empowering learners to ‘beat the odds’? In order to pinpoint the patterns that cause the differences in this area, our research design contrasts some of the most successful formulaic language learners in our sample with some of the least successful ones.

Methodology

Participants

Participants included seven international postgraduate students at the University of Nottingham, enrolled in a pre-sessional intensive language course offered by the Centre for English Language Education (CELE). All of them were of Asian origin (Chinese and Japanese) and none of them had visited the UK before. They were selected from a pool of 24 students who had participated in the longitudinal interview study strand of our larger-scale project (cf. Schmitt et al. [Chapter 4], this volume). All 24 students took part in regular interviews for a period of approximately six months and they all took a number of different paper-and-pencil tests. The reason for selecting the current seven learners for our study was that they each obtained extreme gain scores on the two types of formulaic sequence tests we applied in the study (cf. Schmitt et al., this volume): while three of them showed virtually no improvement between the pre- and post-tests, the other four showed considerable gains in their formulaic sequence repertoire during the examined period. Table 1 presents some basic descriptive data about the participants; as can be seen, the ‘good’ formulaic learners all obtained a total gain score of 10 or above, whereas the ‘slow’ ones obtained only 1 or below. Given that the mean gain score was 5.66 for the whole sample (N=70) and the standard deviation was 5.16, these learners were at least one standard deviation above or below the sample mean.

Table 1. Descriptive data about the seven participants

Name     Age   Nationality   Study area   Formulaic gain   Course length (months)   L2 proficiency
Mike     29    Japanese      IT           20               3                        TOEFL (comp): 190
Daniel   26    Chinese       IT           16               3                        IELTS: 5.5
Faith    26    Chinese       Law          15               3                        TOEFL: 637
Beth     23    Chinese       Education    10               3                        IELTS: 5
Jill     23    Chinese       Law          1                2                        IELTS: 6.5
June     34    Chinese       Business     −1               2                        IELTS: 6
Ann      32    Chinese       Law          −5               3                        IELTS: 5.5
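The selection criterion can be checked with a little arithmetic. The following Python sketch converts each participant’s gain score from Table 1 into a standard score, using the sample mean (5.66) and standard deviation (5.16) quoted above. The function and variable names are illustrative only and are not part of the original study. With these rounded figures, the two borderline learners (Beth and Jill) come out slightly under one standard deviation from the mean, so the "one standard deviation" criterion is best read as approximate.

```python
# Gain scores from Table 1; mean and SD as reported for the whole sample (N=70).
SAMPLE_MEAN = 5.66
SAMPLE_SD = 5.16

gains = {"Mike": 20, "Daniel": 16, "Faith": 15, "Beth": 10,
         "Jill": 1, "June": -1, "Ann": -5}


def z_score(raw, mean=SAMPLE_MEAN, sd=SAMPLE_SD):
    """Distance of a raw score from the sample mean, in SD units."""
    return (raw - mean) / sd


# Standard score for every participant, rounded to two decimals.
z_scores = {name: round(z_score(gain), 2) for name, gain in gains.items()}

for name, z in z_scores.items():
    print(f"{name:<7}{z:+.2f}")
```

Positive values mark learners above the sample mean and negative values those below it, which makes the contrast between the ‘good’ and ‘slow’ groups easy to see at a glance.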

Data collection

Besides taking the paper-and-pencil tests in the same way as the rest of the sample (cf. Schmitt et al., this volume), the participants also took part in a series of regular long interviews. At the time of the interviews, all the students were studying English in an intensive language course of either two or three months’ duration; following this preparatory course, they intended to proceed to postgraduate study. Students on the three-month pre-sessional course were interviewed at the beginning, middle and end of the course, while the two-month students were only interviewed at the beginning and the end. The interviews were conducted by the authors and a research assistant in such a way that each student was always interviewed by the same ‘caseworker’. The interviews were recorded and the tapes subsequently transcribed. At the end of the language course, the personal tutors who had been assigned to each student by CELE were also interviewed. Again, the interviews were taped and transcribed.

The interviews

As summarised earlier, the aim of our study was to supplement and clarify the quantitative findings (cf. Schmitt et al., this volume) with in-depth qualitative data obtained from a series of semi-structured interviews with both the participants and their personal tutors. We were hoping to explain the variation observed in the formulaic gain scores by identifying possible reasons rooted in the participants’ motivation, attitudes and beliefs, as well as their personal experiences related to interethnic contact and cultural adaptation.

In order to make the interview data comparable across the participants, we developed interview guides for each session, which were first piloted with five students at Nottingham University. These interview guides included questions concerning factual information about the interviewees’ background and a set of topics to be explored with the interviewees during the course of the interviews. These were selected as a result of consulting the relevant literature and conducting in-depth informal discussions with a variety of people who had relevant expertise (e.g. course tutors) or personal experience (international students). The final list involved issues such as the students’ reaction to the host country; their attitudes and beliefs about language learning; their language learning motivation and any possible changes in it; their perceived progress and any factors they thought might have facilitated or hindered it; and finally their social well-being, including social networking and contact opportunities with native speakers of the target language. The interviewers were given freedom in how they sequenced the questions and finalised their wording, and how much time they devoted to each individual topic as long as the interview contained some coverage of all the areas.

The series of interviews created prolonged engagement with the interviewees and, as a result, good rapport was built between each interviewer and interviewee.
The interviewees found it increasingly comfortable to express their opinions in a conversational manner and the fact that they were interviewed more than once allowed the interviewers to pursue and deeply understand any emergent topics, responses and motives. Importantly, the interviewees were seen as ‘participants’ not ‘subjects’ and they actively shaped the course of the interview. The interviewers showed interest, gave support and sometimes even took part in the participant’s social activities. Their focus was on exploring the participants’ own perspectives and interpretations.

Data analysis

Data analysis took place in an ongoing manner throughout the longitudinal data collection phase. For the purpose of this study (as this was just one subset of the larger-scale project) we employed content analysis of the transcribed interviews, trying to identify any themes that would explain the individual participants’ observed success or failure in acquiring formulaic language.

Results

Quantitative results

Table 2 presents the aptitude and motivational test scores obtained by the participants, standardised for the whole sample. That is, the table shows how much each individual score differs from the sample mean, and this difference is presented in standard deviations. Thus, for example, Mike’s aptitude score is .80 standard deviations higher than the sample mean. (All participants’ names are pseudonyms.) As can be seen in the table, there is no straightforward pattern of results that would explain the differences between the two groups of learners. For example, although Mike, the most effective formulaic learner, had the highest level of language aptitude, Beth had the lowest and still qualified for the ‘good’ group. And although Ann, the worst formulaic learner, reported the highest level of language use anxiety, the second highest level was displayed by Mike, the top learner. This inconclusive pattern, in fact, corresponds with the results calculated for the whole sample (Schmitt et al., this volume), where we did not find any significant relationships between individual difference variables and the degree of acquisition of formulaic sequences.
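The standardisation described here can be sketched in a few lines of Python. The raw scores below are invented purely for illustration (the chapter reports only standardised values), and the function name is hypothetical. Whether the authors used the sample or the population standard deviation is not stated, so the sample version (n − 1) is an assumption.

```python
from statistics import mean, stdev


def standardise(scores):
    """Return z-scores: (score - sample mean) / sample SD."""
    m = mean(scores)
    sd = stdev(scores)  # sample SD (n - 1); assumed, not stated in the chapter
    return [(s - m) / sd for s in scores]


# Hypothetical raw scores for five learners (NOT data from the study).
initial = [4.0, 5.0, 3.5, 4.5, 3.0]
final = [4.5, 5.0, 4.5, 4.0, 3.5]

initial_z = standardise(initial)

# Change scores are standardised against everyone else's changes, so a learner
# whose raw score did not move can still end up below zero.
raw_change = [f - i for i, f in zip(initial, final)]
change_z = standardise(raw_change)

for n, (raw, z) in enumerate(zip(raw_change, change_z), start=1):
    print(f"learner {n}: raw change {raw:+.1f}, standardised {z:+.2f}")
```

In this made-up data the second learner’s raw score does not change at all, yet the standardised change is negative because most of the other learners improved. This is why a standardised change score indicates relative rather than absolute change.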

Table 2. Various standardised test scores obtained by the participants (a)

Measure                        Sample mean   Mike    Daniel   Faith   Beth    Jill    June    Ann
Aptitude                       .67           .80     .49      .18     −1.07   −.76    −.13    −.76
Attitudes toward L2 learning
  Initial                      4.43          .90     −1.08    −.75    −.09    −.75    −.09    −.42
  Change                       .14           .29     1.77     .29     .29     .66     −.46    −.46
  Final                        4.57          1.21    .47      −.63    .11     −.26    −.63    −1.00
Integrativeness
  Initial                      4.27          1.59    −1.40    −.41    −.91    −.16    .34     −.16
  Change                       −.11          −.72    .31      −.49    1.62    .22     .92     .92
  Final                        4.16          .69     −1.03    −.89    .69     .01     1.15    .69
Instrumentality
  Initial                      4.20          −.22    .15      .15     −.59    .13     .52     .88
  Change                       .15           −.50    −1.47    −.82    .15     −.82    −.17    −.50
  Final                        4.35          −.78    −1.55    −.78    −.40    .36     .36     .36
Language use anxiety
  Initial                      3.35          .63     −.02     −.34    −.02    −.02    −.67    1.27
  Change                       −.29          .21     −.38     −.68    −.68    −.09    .21     .21
  Final                        3.06          .92     −.38     −1.03   −.71    −.06    −.38    1.57
Commitment to learn English
  Initial                      4.90          1.17    .10      −1.50   .64     .10     .10     −.97
  Change                       −.21          −.63    −.63     .62     −.21    .20     −1.04   .62
  Final                        4.69          .30     −.66     −.66    .30     .30     −1.15   −.18
Intended effort
  Initial                      5.40          .86     .86      .86     −2.03   −.58    .86     −.58
  Change                       −.14          −1.91   −.87     −1.91   1.23    .18     .18     .18
  Final                        5.26          −1.75   −.36     −1.75   −.36    −.36    1.03    −.36

(a) All participant scores have been standardised (i.e. the sample means have been converted to 0 and the standard deviations to 1). Please note that this also applies to the change scores: they do not represent the actual changes but rather how these changes were related to the changes that occurred to the other participants in the whole sample.

Qualitative results

Why did the good learners excel and the slow learners fail? In the following section we will present the qualitative data by looking at what we have found out about the participants from the interviews. First we examine the participants one by one, in descending order of their formulaic gain scores (i.e. we start with the best formulaic language learner), and in the subsequent Discussion section we analyse any emerging broader themes.

Mike In Japan, Mike did not use to like learning English and did not see the point of doing so because he did not need the language for his daily life. All this changed
when he made friends with a foreign student whose English was excellent, and through him he got to know several other international students as well. As he explains, In university I changed my mind because I met an overseas student. He is Argentinean but can speak English very, very well. We got on a very very nice relationship, so I tried to speak English. He lived in international dormitory, and I had some chance to talk to other overseas students. It was a very nice experience for me and there I wanted to learn English.

There were three further contributing factors to his motivation: First of all, the role modelling of his father, who had a high level of proﬁciency in English since he edited an English-speaking paper and whom Mike admired. Second, the fact that he had to work for two years to be able to aﬀord his trip to England, which considerably increased the value of this opportunity to learn and also made him keen to make the most of it. Third, his general positive attitude towards British people, whom he thought shared a lot in common with the Japanese in being polite, gentle and shy. Mike therefore started oﬀ with a high level of commitment and he appeared to enjoy the English course: As Table 2 shows, his initial attitude toward L2 learning was higher than average and it further increased during the three month period. In his private life he also made an eﬀort to use as much English as possible, although he did not seem to manage to establish any real contact with native speakers. To compensate for this, he tried to speak to his friends and even to his wife in English, but as he admits this was not always possible. In an interview at the end of Mike’s studies in CELE, his tutor told us how pleased he was with Mike’s progress and how much he admired the positive part Mike took in his learning process. He described Mike as an organised, bright, humorous, well-balanced, happy, lovely and sensitive person. The tutor also noticed that Mike mixed well with other nationalities in the group and was also realistic about the pace of his learning, which he believed had paid oﬀ. Thus, Mike appears to be a straightforward case of a particularly highly motivated and talented learner (Table 2 shows that he had the highest aptitude score in our sample) who wanted to make the most of his studies. 
However, even in his case the picture is not entirely clear-cut because he also displayed a greater than average level of anxiety about using English and by the end of the three-month period his commitment to learn English decreased somewhat along with the amount of eﬀort he wanted to exert on his language studies.

Daniel Looking at Daniel’s quantitative profile in Table 2, the only notable thing is that he came to like the language learning process during his stay in Nottingham: starting off below the average, by the end of the three months he was well above his group in terms of his positive disposition. His qualitative data reveals a factor that must have been highly instrumental in his success: his desire to use English in real-life situations, that is, “just to talk with a native speaker”. Right from the start he decided that sharing his accommodation with native speakers would be useful, and he indeed managed to move into a rented house with some young British people. In our experience with international students, this was almost unprecedented, as most students stay in the safety of the university student halls, where they are housed with other foreign students. The same willingness to apply English in real-life communication was also displayed in his eagerness to use the language to meet people of other nationalities and even to find a foreign girlfriend: With English we can talk to Koreans, Japanese, we can talk and get to know each other, make a girlfriend from another country, it’s very useful.

Although he admits that communication with his British house-mates was not always easy, and unfortunately after a while they had to leave and Daniel moved back to campus, he kept watching TV and listening to the radio purposefully to remain in touch with spoken English. In a follow-up interview, his tutor described Daniel as chatty and talkative, and he also noticed Daniel’s interest in using the language appropriately in social situations. The tutor thought that Daniel found it relatively easy to adapt to life in the UK but noted that Daniel had one main language problem, his poor pronunciation. A good illustration of Daniel’s intercultural social skills came from Daniel himself, who described in an interview how he started to talk to a British lady in the street and ended up being invited to her place for a cup of coffee. This was particularly noteworthy given that this was not something usually done in China: And also I met a lady, she welcome with me to her room and have a cup of coffee with her. Is very good I think but in China is very strange but in UK I think very common.

Faith Faith is similar to Daniel in that the quantitative details do not reveal anything remarkable about her with regard to her L2 learning success. The only area
where she was better rather than worse than the average was language use anxiety — she did not seem to worry about communicating in English. This, as we ﬁnd out from her personal account, was due to the fact that her mother was a teacher of English, who encouraged her to learn the language from an early age. Furthermore, before coming to England she had been working for joint venture companies where a good level of English was a requirement. In Nottingham, she did not ﬁnd the English course suﬃciently challenging. As her personal tutor described, She’s probably the best person in that group in all her skills and she is very focused, very self-motivated, wants to get on faster. You know, I think she has found it a bit slow.

Faith’s personal accounts reveal that she ‘underrepresented’ herself in the motivation questionnaire: she came to Nottingham with a very high level of motivation and expectations, and she spent a lot of her free time studying (at least four hours a day!). She set out to acquire a native-like level of proficiency in English, and it is this ambition and the accompanying determination to achieve it that, we believe, made her stand out from the others. She was well aware of the importance of contact with native speakers of English (“. . . it can improve English and it can also make you understand the culture . . . ”) but, interestingly, even her advanced proficiency and her motivation were insufficient to get her to really integrate into the local community. This is partly because she spent most of her time on her “project and on playing on the computer”, and partly because she had certain basic problems with cross-cultural communication, most notably with the choice of non-academic topics that were appropriate for British people. I have created some chance to talk with native speakers but the topic is hard to choose . . . what kind of topic is proper to discuss. Sometimes when I try to find some new topics, maybe the person I have talked with feels embarrassed or not at ease. For Chinese people, we all like to talk about family life but I found that some British people didn’t like this topic since it’s a little bit private to talk about . . . Sometimes I want to ask some questions to local people but I am always afraid that maybe they think I’m rude. I don’t want to make people upset.

Beth We have included Beth in this study partly because she qualified as a successful learner of formulaic sequences (cf. Table 1) and partly because her story serves as the perfect illustration for the significance of an ongoing social engagement
with the target language community: All the indications suggest that without the successful sociocultural integration that she displayed she would have probably failed her course, let alone become one of the success stories. So far all the three good students we have described had a higher-than-average language aptitude, whereas Beth was just the opposite: her aptitude score was more than one standard deviation below the sample mean. In addition to this handicap, her initial motivation was also below average in every respect except ‘commitment to learn English’, and the amount of intended eﬀort she was going to put in her learning was over 2 (!) standard deviations below average. And, given that her proﬁciency level was also worse than most of her peers’, it does not come as a great surprise that initially Beth was struggling: she was depressed, nervous and intensely homesick. In fact, both her personal tutor and her research caseworker expressed serious worries at that time that Beth would break down and go home. Two quotes, one from her and one from her tutor, illustrate this situation well: These days I am not very well. I am so tired. I feel that I cannot arrange the timetable with my daily life. I miss my mother very much, every night I bring my mother’s photo into my dreams. I feel I am very alone. [Personal tutor:] When she ﬁrst arrived, she was very nervous, very insecure, completely out of her depth. She was a long way from home and probably for the ﬁrst time she was very lost. She is very intense, you know the workaholic type who works a lot and doesn’t make friends very easily, so she was isolated for a long time.

Yet by the end of the three-month period, her integrative/cultural disposition towards England improved by more than 1.5 standard deviations above the average, her anxiety decreased and her attitude toward learning became more positive, exceeding the sample mean! What happened? If we look at her quantitative proﬁle in Table 2, there is one aspect in which Beth stands out: her desire to achieve a high level of English. And as soon as her immediate culture shock was over, she started to adjust to life in Britain and to cope with her challenges: I feel I have become, from time to time . . . I have joined in the British culture and British life, and it’s not very quick . . . When I came here, I always worried about everything, food shopping and study, and I always felt I couldn’t enjoy myself. Sometimes I didn’t want to communicate with people . . . but this week I feel I have some experience about how to arrange my life.

Beth’s characteristic feature was that she proactively sought out opportunities to interact with native speakers. As her personal tutor described, she “latched onto anybody she could”, but did this in an amiable way. Looking back, it is noteworthy that already in China she succeeded in developing a relationship with a
British teacher of English, whom she met at an exhibition and, as she described, “every week we write a letter on the internet”. Here in England she joined a local church and became very involved in church life. This resulted in a lot of contact with native English speakers and by the end of the three-month period she had made, by her own account, several friends both in church and in her language course. Let us conclude this summary with two telling quotes, one from her, the other from her personal tutor: I think in these three months I have made progress about my study. I have acquired academic skills and have got some friends in the language course and in my church and this is a great foundation to support my future study. So I’ve enjoyed these three months. [Personal tutor:] To begin with everybody was very worried about her because she was so depressed. She’s very thin and she seemed like a little sort of waif, wasting away, and she was always on her own. She looked unwell physically. She looked unwell emotionally. She just looked unhappy all the time but that’s changed now. She seems to be quite diﬀerent, very smiley, happy, relaxed, most of the time. She still has moments when you see the brow furrow and the panic start but that goes very quickly once she remembers what to do.

Jill Let us start looking at the group of low formulaic achievers by introducing Jill, a 23-year-old Chinese student. Her test profile in Table 2 shows that she was not dissimilar to the sample average, perhaps slightly on the negative side, particularly in her language aptitude and her attitude toward language learning. When we analysed her personal accounts, one pattern in her behaviour became striking: her inability to integrate into the local culture and community. For example, just like Daniel, she also left university accommodation where international groups were housed in ethnic clusters, but she moved into a rented house with Chinese friends rather than British or other international students. The following extract illustrates her cross-cultural difficulties: I try to understand the English culture through the media such as newspapers and TV but I think it is very difficult to be a part of English culture. You know, we are from different countries, we have different . . . maybe there is a cultural gap between us, so very difficult, and nobody will look on us as a native. . . . I think the biggest problem is that I cannot meet many native speakers . . . So you know, there are Japanese just together, Chinese together, and people from Europe together . . .

Her diﬃculty in ﬁnding opportunities to communicate with native speakers of English might have also been caused by her beliefs about language learning.

0

02

Zoltán Dörnyei, Valerie Durow, and Khawla Zahran

Like many Asian learners, she believed in the supreme importance of studying grammar and memorising vocabulary items, especially law-related ones. She would thus spend hours studying the language in her room rather than using it. As she explained, in China she used to watch English-speaking TV, but she stopped doing so in Britain because of her studies and because she could not see the point of many “silly programmes” on TV, especially the comic ones. This critical attitude was also reﬂected in her attitude about British people: I heard from my friend they said although they act very polite indeed they are very indiﬀerent and I think I have not many relationship with them.

Indeed, her tutor described her as an articulate person who was conﬁdent enough to make a complaint, something the Chinese students rarely did. He also noticed that she seemed to get dispirited easily and needed some sort of external stimulus to get motivated.

June Two things that stand out in June’s quantitative profile (Table 2) are her above-average level of integrativeness and intended effort. However, the latter is somewhat ‘pulled down’ by her lower-than-average commitment to learn English. This is also expressed in her personal account: In the long run I really want to be like a native speaker but in the two months [the duration of her language course before joining a department] I hope I can improve my English to achieve the requirement of academic studies.

Before June came to England she thought she would “make a lot of new friends and speak English every day” but she found England “much quieter than China” and she “cannot see so many people and cannot find many opportunities, activities [to communicate in English]”. Just like Beth, she tried to join a church because “many native speakers when they talk about something they always use stories or something related to the Bible” and therefore she thought that learning about the Bible would be useful, but her first impression was that she could “understand very little”. What a different attitude this is from Beth’s, who didn’t just want to learn about things but wanted to be part of things. This lack of commitment to make the most of her stay was also obvious when she admitted, “I think I am not a very hard-working pupil. When I go to the shopping centre with my Chinese friends we generally speak Chinese along the road.” Thus, June could not really make contact with the local people and we believe that one main reason for this was her general inability to relate to the English way of life. This is well reflected in the following extract talking about pubs and free time: I don’t like the pub. I don’t know what the British people do every day, every weekend, every holiday; I don’t know where they go beside the pub. . . . I think in a pub it’s very . . . I don’t know how to say it . . . you know, people dress a little and drink a lot of beer, alcohol . . . they dance, they speak aloud, something like this, but I think that beside this nothing is very exciting, just very quiet.

We must note that this ‘inability to relate’ is not the same as a ‘negative attitude’. As was already mentioned, June had an above-average level of integrativeness and this was also reﬂected in her personal accounts: she found Britain an advanced, modern country and the British gentle, polite, traditional and patient. Furthermore, as her tutor has remarked, “from the way she is dressed she really wants to be European more than Asian”. When asked about the best way of learning English, she said, “I think the best is the English environment, but I cannot ﬁnd that environment. I think the most I use English here is to ask for directions [laughs]”. It is noteworthy that her tutor considered her rather quiet, lacking conﬁdence and not revealing much about herself except that she missed her country and her family a lot.

Ann Ann had been working for over ten years as a lawyer in China, a profession that did not require her to use English. Her quantitative proﬁle (Table 2) reveals two things: her level of language use anxiety was over a standard deviation above the sample mean and her attitudes toward L2 learning were below the sample average. She also openly admitted the latter in one of the interviews, “I admit that I can’t ﬁnd a lot of fun in learning English, sorry, because I still can’t ﬁnd a good way of improving my level.” She also did not have high expectations about her success: “I am not so conﬁdent about achieving a high level [of English] in the short time. Maybe in ten years I can [laughs].” Ann’s personal tutor described her at the time of her arrival as “not particularly conﬁdent but no less conﬁdent than anyone else”. She claimed to know very little of Ann outside the course but what she said implies that Ann avoided socialising: To be honest, I don’t actually see her very much. I never see her having dinner with the other students. . . . I would imagine that she has very little contact with native speakers.

03

04

Zoltán Dörnyei, Valerie Durow, and Khawla Zahran

Going through all the interview data, it becomes clear that Ann’s main problem was the tiredness and stress she experienced ever since her arrival in Britain. This, we believe, explains her high anxiety scores in Table 2. As she explained, this nervous state had in fact been a feature of hers even before coming to the UK: I need a little rest. I’m a nervous person — my colleagues always say that you are too nervous on the job. I think it’s because I have so much pressure in my life, for the future, for my work and for my dream. I try to learn how to relax. This is diﬃcult.

This stress caused tiredness, which was further augmented by her language use anxiety: I also feel exhausted in class because I still have not got used to get a message, get knowledge in a diﬀerent language. I translate it into my own language, and I reﬂect and I react, so I feel tired in class and after that I need a little rest. That’s why I go back home in lunchtime, and then in the afternoon, most of the times, I don’t think I learned much in the afternoons.

At the end of the three-month course she still experienced problems of tiredness which led to difficulties in learning: “At the final study, I felt tired. I can’t learn. I feel tired even in class. And my reactions slowed down. I just feel tired and physically I have some problems.” Thus, the stress and exhaustion held Ann back considerably from learning and also from socialising. She admitted that she withdrew from any form of social contact when she was tired and depressed. In general she spent a lot of time on her own: I just go to Beeston one time a week to shop, and most of the time I just stay in my room and study or sometimes I listen to the BBC radio broadcasting. Yes, not too much contact outside works.

Discussion Although the above descriptions could only provide a crude and superficial representation of the specific issues the seven learners had experienced, one thing becomes clear when we read through them: success in acquiring formulaic sequences is strongly related to the learners’ active involvement in some English-speaking social community. The problem is, as we have generally found in our research project, that international students, and particularly those who come from a very different cultural background, find it extremely hard to join such ‘host-national networks’. In fact, apart from superficial service encounters, most of them hardly ever come into meaningful contact with English speakers outside their academic environment. Therefore, their success in acquiring formulaic sequences (and phraseological competence in general) will depend on whether they can ‘beat the odds’, that is, whether they can break out of the ‘international ghetto’ they find themselves in. Two of the four successful students we have described, Daniel and Beth, managed to do so and their efforts paid off. It is particularly interesting to see that Beth, who was originally ‘destined’ to be a failure, managed to turn the tide completely through her most determined efforts at social integration. The other two successful students and all three unsuccessful ones failed to break the social barrier. What distinguished the former from the latter, it seems, is their level of language aptitude, which was in both cases above average, and more importantly their extraordinary motivation. Mike was so motivated that he tried to speak English even with his Japanese friends and wife, and Faith tried to improve her language proficiency, which was quite developed to start with, by putting in an amazing amount of work every day. Jill, June and Ann were not bad or unmotivated students; if they had been, they would not have been in Nottingham. But for various reasons they did not ‘go the extra mile’ that was necessary for success: none of them had a particularly high level of aptitude to start with, and each of them had some further personal ‘handicaps’: Jill could not get over the cultural gap that she felt divided her from British people; June did not have enough perseverance and she also had serious problems relating to British people; and Ann suffered from ongoing stress-related tiredness.
It is dangerous to generalise from the findings of a qualitative study, but the tendency that our data has revealed is so strong, and coincides so well with the general impression that we have developed during the two years of our investigation, that we feel it is justified to formulate the following conclusion: Success in the acquisition of formulaic sequences appears to be a function of the interplay of three main factors: language aptitude, motivation and sociocultural adaptation. Our study shows that if the last of these is absent, only a combination of particularly high levels of the other two learner traits can compensate for it, whereas successful sociocultural adaptation can override below-average initial learner characteristics. Thus, sociocultural adaptation, or acculturation, turned out to be a central modifying factor in the learning of the international students under investigation, which explains why the whole-sample statistics (Schmitt et al., this volume), which did not address the issue of sociocultural adaptation, failed to produce significant results.

05

06

Zoltán Dörnyei, Valerie Durow, and Khawla Zahran

References

Aston, G. 1988. Learning Comity: An Approach to the Description and Pedagogy of Interactional Speech. Bologna: Editrice Club Bologna.
Bochner, S., Hutnik, N., and Furnham, A. 1985. The friendship patterns of overseas and host students in an Oxford student residence. The Journal of Social Psychology 125: 689–694.
Dörnyei, Z. 2000. Motivation in action: Towards a process-oriented conceptualisation of student motivation. British Journal of Educational Psychology 70: 519–538.
Dörnyei, Z. 2001. Teaching and Researching Motivation. Harlow: Longman.
Furnham, A. 1993. Communicating in foreign lands: The cause, consequences and cures of culture shock. Language, Culture and Curriculum 6: 91–109.
Furnham, A., and Bochner, S. 1989. Culture Shock: Psychological Reactions to Unfamiliar Environments. London: Routledge.
Gardner, R. C. 2001. Integrative motivation and second language acquisition. In Motivation and Second Language Learning, Z. Dörnyei and R. Schmidt (eds), 1–20. Honolulu HI: University of Hawaii Press.
Geoghegan, G. 1983. Non-native Speakers of English at Cambridge University. Cambridge: Bell Educational Trust in association with Wolfson College.
Jenkins, S. 2001. Cultural and linguistic miscues: A case study of international teaching assistant and academic faculty miscommunication [Electronic version]. International Journal of Intercultural Relations 24: 477–501.
McMahon, A. M. S. 1994. Understanding Language Change. Cambridge: CUP.
Oberg, K. 1960. Culture shock: Adjustment to new cultural environments. Practical Anthropology 7: 177–182.
Pettigrew, T. 1998. Intergroup contact theory. Annual Review of Psychology 49: 65–85.
Schumann, J. H. 1986. Research on the acculturation model for second language acquisition. Journal of Multilingual and Multicultural Development 7: 379–392.
Stangor, C., Jonas, K., Stroebe, W., and Hewstone, M. 1996. Influence of student exchange on national stereotypes, attitudes and perceived group variability. European Journal of Social Psychology 26: 663–675.
Ushioda, E. 2001. Language learning at university: Exploring the role of motivational thinking. In Motivation and Second Language Learning, Z. Dörnyei and R. Schmidt (eds), 91–124. Honolulu HI: University of Hawaii Press.
Ward, C., Bochner, S., and Furnham, A. 2001 (2nd ed). The Psychology of Culture Shock. London: Routledge.

Social-cultural integration and the development of formulaic sequences

Svenja Adolphs and Valerie Durow

University of Nottingham

Background It is widely accepted that exposure to language plays a significant part in the acquisition process (Vygotsky, 1987; Krashen, 1982; Swain, 2000; Ellis, 1994). This has been documented with regard to the acquisition of individual lexical items, grammatical structures, and discourse competencies. However, the influence that exposure has on the acquisition of formulaic sequences in language use has been less well demonstrated. This is surprising since there has been an increasing body of research into the nature and occurrence of formulaic sequences in language use over the last three decades (Biber et al., 1999; Coulmas, 1979; Cowie, 1988; Nattinger and DeCarrico, 1992). Exposure to a language tends to be affected by a variety of factors such as classroom focus, the time spent in a country where the language is spoken as a native language, and the amount of reading a student does. However, for students who spend an extended period of time at a university abroad, one of the most important factors affecting levels of exposure is the social and cultural adaptation to the target-language environment (see Dörnyei, Durow and Zahran, this volume). This adaptation is generally facilitated through on-going contact with native speakers. Yet, one of the problems that international students at a British university can face is the lack of interaction with native speakers. Furnham and Bochner (1989) provide a survey of findings concerning the friendship networks of international students (sojourners). This has relevance for our study in that, overall, it would appear that relationships with host nationals are seen as important and necessary: “the degree of social interaction between the host national and the sojourner is related to the latter’s adjustment” (p.128). Nevertheless, it is also suggested that these relationships are seldom forged and that “foreign students have limited contact with host nationals [which] may explain why many
overseas students return home disgruntled with the society in which they have studied” (p.129). Findings suggest that students belong to three separate and distinct social networks: monocultural (with co-nationals), bicultural (with signiﬁcant host nationals) and multicultural (with other international students). To some extent, each of these networks, described as the ‘functional friendship model’ is signiﬁcant to the psychological well-being of the student. In their research on the friendship pattern of overseas and host students in an Oxford student residence, Bochner, Hutnik and Furnham (1985: 693) found that An internal analysis revealed that 16 (70%) of the foreigners did not have any English friends at all after at least one year in the country, further conﬁrming how socially isolated from the host society these students were. This separation created a vicious circle because the lack of English friends reduced the sojourner’s opportunities for learning those cultural skills that might facilitate entry into local society, thus rendering it even more inaccessible.

Similarly, Furnham and Alibai (1985: 719), when investigating the friendship networks of foreign students, found that 56% of all the foreign students had no British friends at all which means that they probably have very limited contacts with host nationals. This tends to conﬁrm the view that foreign students have limited (functional and utilitarian) contact with host nationals.

The results also suggest that members of the host community were preferred by international students for linguistic and academic help, while “co-nationals were chosen for emotional help, shopping, cinema and party attendance” (p. 720). The research reported so far gives rise to the question of whether there is a relationship between the level of sociocultural adaptation and the acquisition of certain sequences in language use. Based on the assumption that social integration provides more exposure to a language, we set out to explore the development of usage of formulaic sequences by non-native speakers over time. In order to do this, we needed to develop not only a working definition of such sequences, but also a framework for measuring their development longitudinally. Wray (2002) lists a range of terms that have been introduced to refer to the phenomenon of relatively fixed sequences in language use, ranging from ‘chunks’ to ‘multi-word units’ to ‘formulas’. A range of approaches have been introduced which aim at defining the form and function of such sequences (Aijmer, 1996; Manes and Wolfson, 1981; Moon, 1998). All of these frameworks recognise that in language use certain lexical patterning can be observed or that much of language use relies heavily on what Sinclair (1987) calls the ‘idiom principle’. According to this principle, words are not always selected one at a time but instead are often part of a co-selection process which leads to a strong syntagmatic relationship between individual lexical and grammatical items. The frameworks that have been developed to describe the nature of such sequences vary widely, both in the definition of a sequence and in the methodology used to identify a specific sequence (Read and Nation, this volume). Some studies have relied heavily on intuition in this process (Weinreich, 1980), sometimes accompanied by prior or subsequent corpus research (Nattinger and DeCarrico, 1992). Others have used the criterion of frequency as the main starting point and produced lists of sequences according to their frequency ranks in a given corpus (Biber et al., 1999). The latter approach has the benefit of being more systematic in its identification of sequences and somewhat less subjective than other approaches, but it is also associated with certain problems. For example, the cut-off point for how frequent a particular sequence needs to be to qualify as a formulaic sequence is often rather arbitrary. Furthermore, this approach is not necessarily compatible with more traditional studies which have used the psycholinguistic criterion of ‘pre-formulaticity’ in their description of formulaic sequences. Thus, using the frequency approach, we find a number of very frequent sequences that would not be recognised as pre-formulated if we were to use our intuition. There are several possible explanations for why this may be the case. The high frequency of sequences such as ‘the the the’ could be a simple artefact of the way the frequency search is conducted and may say little about the fixedness and institutionalisation of the sequence. However, it may be the case that they are used in a similar way to other multi-word discourse markers that signal hesitation.
For example, Beth (one of our participants, see below) uses the sequence ‘I I I’ 12 times in her first interview, one instance of which is shown in the following extract:

Interviewer: Yeah?
Beth: but but I love in where= in in the beautiful environment, I like the beautiful trees and
Interviewer: Uh . . . uhh
Beth: a house, a house is very, very interesting, very, very nice, yeah, and um the other things I I I
Interviewer: You find it strange?
Beth: Yeah.

09

0

Svenja Adolphs and Valerie Durow

It is clear that the use of ‘I I I’ in this example does not mark ﬂuency, something that is assumed to be one of the properties of meaningful sequences in language. Yet, the recurrent use of this sequence seems to occupy a particular function in the student’s discourse, namely that of a hesitation and turn-keeping device. Although in this example the student is not successful in holding the turn she does achieve this in other places in the conversation by using the same sequence. Because of the consistent nature with which this sequence and others like it are utilised by the student, we have included such sequences in our analysis. However, we do distinguish between such sequences and those that have a more tangible lexical core in our discussion as it would appear that sequences that are not readily recognised as being semantically meaningful clearly fall into a different category. Another issue with the frequency approach is that a range of phrases that we would intuitively name as some sort of meaningful chunk may not occur at all, or only occur with a very low frequency, in any given corpus (Moon, 1998). While these are valid concerns with the frequency approach they are also characteristic of the uncertainty which still surrounds the deﬁnition of this linguistic phenomenon. Despite the problems associated with the frequency approach discussed above this approach seems to have distinct advantages when it comes to analysing the development of student output over time. A comparison between frequently used sequences in a native speaker corpus and the language output of non-native speakers allows for a measurement of approximation to those sequences displayed in the native speaker corpus. This approach also allows for the integration of a longitudinal perspective. The student output can be studied in terms of frequently-used sequences at diﬀerent points during their stay in Britain.

Participants

We wish to explore whether the degree of social-cultural integration affects the acquisition of formulaic sequences. Therefore we selected participants on the basis of their substantial difference in quantity and quality of interaction with native speakers. Several participants were interviewed as part of our larger project (see Schmitt et al. [Chapter 4], this volume), and from this participant pool, we looked for two students who demonstrated a contrast of high integration versus low integration. Specific questions included as part of the interviews made it possible to assess the level of the participants’ cultural integration over time. Of the two eventual participants, Beth was chosen as a good example of high integration and Ann (both pseudonyms) as a good example of low integration. The participants’ interview transcripts were then retroactively analyzed for this study. Their interviews were held over a period of seven months. Initially, they were studying on a three-month pre-sessional English course and then Beth went on to study for an MA in Continuing Education, while Ann’s course was a Master’s in Law (LLM). Three interviews took place during the pre-sessional course and two while they were studying in their respective Schools.

Both participants were female, Chinese, and for both, it was their first visit to the UK. Beth, who was twenty-three when the interviews commenced, had just graduated from her first degree, while Ann, who at 32 was considerably older, had been working as a lawyer in China for ten years, a profession that did not require her to use English. Beth’s first degrees were in music education and journalism, Ann’s was in law. We undertook a qualitative study of their interviews paying particular attention to the aspects of social interaction and contact/lack of contact with native speakers. These were not the only issues that were covered in the interviews, but they did appear particularly relevant to the present discussion of their social-cultural orientation with the British environment.

Social interaction

After arrival, Beth quickly became very depressed and homesick. In her second interview, she claimed that “I couldn’t um enjoy myself everyday. Sometimes I don’t want communicate with other people.” Similarly, Ann was depressed early on; however, her depression was not, we believe, as noticeable as that of Beth. On arrival, Ann’s life appeared to be very solitary as she spent much of the time studying: “Study. Er yeah [laughs] Yeah most time I study, yeah. Erm I’m not s= maybe I’m not the clever student, but I think I can work hard.”

Beth exhibited great determination and later was able to state that after four months many things had changed: “my body, my study, my language, my thinking, my future plan got a lot of change so I just couldn’t believe it, just four months, yeah.” Initially her leisure time was spent mainly on her own in her room, going on trips organised by the university or chatting with other international students. However, during her English language course, she joined a local church and through this was able to make a wide network of friends and acquaintances.

Unlike Beth, Ann’s situation did not improve. She was afflicted by constant tiredness, needing to return to her room at lunchtimes to rest. Throughout the pre-sessional course, much of her leisure time was spent studying and she admitted that “Yes, other times I think I have less communication with others.” Her friends were co-nationals, mainly students studying on the language course or on the current law course, and she found it difficult to communicate with other international students. By the end of the three-month course, she was still admitting to periods of depression, which led her to avoid contact with others: “I I I still have some difficulties in er communication with others when I am depressed at that time or sometimes I just want to relax, rest, rest and rest. I don’t want to do= I didn’t want to do anything at that time.”

Beth appeared to have a much more varied social life and once on the MA course, it improved even more dramatically. She continued to attend church and social events associated with this, but also joined the university’s gliding club, obtained a part-time job in one of the university libraries, and began to give piano lessons. By contrast, at the beginning of her MA course, Ann’s social interaction was even less than before, although she did claim that she had more communication with her fellow students. Friendship seemed to be restricted to co-nationals; the majority of her flatmates were Chinese, and there was a large Chinese presence on the law course:

Yeah but er tch it’s a little bit, I think, difficult for us to communicate er, yes, it’s the tch means is not so the easier because I find now that er Chinese always get together and Japanese get together and er er maybe the two months or one month European students they just want to speak their, yeah with their own country’s or another’s . . .

In the final interview, although Beth was slightly disenchanted by her course, her free time appeared to be spent in a number of wide and varied activities: “every weekend, we just got social life, social events, you know, some party and I like it very much, you know, because tch it’s very nice.” She seemed to have a wide network of friends, co-nationals, other international students and native speakers:

. . . I go to church and in that church, I meet a lot of the international and native speaker, yeah, yeah. It’s it’s it’s more than the Chinese friend I think, but I also got some Chinese friend, in my flat, you know, in my flat live the seven= five Chinese cl= flatmates. It’s nice, we also go shopping together and, yeah, do do a lot of things together. I think about= no, I think I got more British friends, yeah, and international friends.

Unlike Beth, by the last interview Ann’s friends are still restricted to fellow Chinese co-nationals. She admits that, “yes, they’re all Chinese friends um though, so I still haven’t er, you know, um the the friends other countries except the flatmate in our yeah, in our flat, so that’s the problem because the study is still busy. I still have no time to make friends.” Even though socially her life seemed to have marginally improved, she was still suffering from some depression: “Sometimes I just guess maybe high pressure is not the best thing because example, example I feel there is no meaning for for life, no interesting for life [laughs].”

Contact with native speakers

Beth had had some prior contact with native speakers before arriving in the UK. She had met an English lecturer at an exhibition in China and had been in correspondence with him for over a year. Ann had had minimal contact with native speakers. As previously mentioned, she had been working as a lawyer and had not been required to use English.

During the pre-sessional course, Beth made contact with a variety of native speakers through her church activities. One girl appears to have become a close friend, inviting her to stay over the Christmas holidays. Ann’s contact with native speakers was limited to course tutors and encounters in shops. The MA course also provided little opportunity for her, as it mainly attracted international students. On the other hand, the nature of Beth’s MA course allowed contact with many native speakers and Beth was able to achieve a wide network of friends and acquaintances in the native-speaking community, both inside and outside the university.

After seven months, Ann claimed that she would like to join a university scheme which allows international students to visit British families but admits, “actually I have no other spare energy to to to do that, yes.” All in all, Ann appears to have had little contact with native speakers and most definitely, no ‘deep’ contact. When asked, in the final interview, if she had met many more English people and had been in contact with more native speakers, her response is as follows:

No, only study, even the students from other country t= didn’t, yeah, make the friends or I don’t know um . . . Part of it= them= er part of the reason is maybe the high pressure of the studies . . . The accessment[sic] still be a problem er still is= er is still a problem and er just um mainly because the high pressure from the study I think that fault.

By contrast, Beth was able to state that:

3

4

Svenja Adolphs and Valerie Durow Yeah, I think most of my friend is the English peoples. Yeah, yeah. I think because the church friends um, you know, in that= in my birthday party and I invite= I and my friend, because that girl, she come from Reading, we got very birthday= close birthday, so we just organise this party together, so we invite thirty= twenty-ﬁve people, I think twenty= twenty people is the British people. So mo= most of friend is the British people but also I got some friend is the Italian and Holland and whatever. So it’s very nice.

By virtue of her dogged determination, it would seem that Beth had achieved social integration, both with native and non-native speakers. However, it is debatable how far this “integration” goes. It may be that it is merely a means of furthering her ambitions. It has been noted elsewhere in this volume (Dörnyei, Durow, and Zahran) that she possesses the ability to ‘fix’ upon anyone who is able to assist her, and we believe that this ability becomes more marked in later interviews. For instance, she ‘fixes’ upon fellow students who can advise her:

I got the two classmates, they’re sixty years old. They’re quite they’re quite [laughs] old but they work hard so if you study with these kind of people, you got more and more experience because they also speak their own experience, their life experience their teaching experience, their study experience, so you can study from them, I thinking, yeah. Yeah, because we got the= one one um one man he is sixty years old. I always chat with him because he always give me a lot of the good good advice about my future development or something else.

She also socialises with lecturers, doctoral and post-doctoral students, who are able to “give me some information from their life and their work.” It is difficult to assess how ‘deep’ Beth’s friendships with native speakers are; possibly some are acquaintances rather than friends. Nevertheless, it is undeniable that she achieved more contact with native speakers and is more socially active than other participants interviewed, notably Ann. Beth was therefore chosen as a socially active participant, whereas Ann was chosen because she was much more solitary.

Study 1. Three-word sequences

Methodology

Interview procedure

The interviews were all conducted by the same interviewer and varied in length. The early interviews (one, two and three) mostly lasted over one hour; however, the final two lasted only between thirty and forty-five minutes. Both students were willing participants and were extremely interested in expressing opinions, both on their private lives and on their progress in their studies. The first interviews were generally the longest as details of background and previous experiences of learning English were gathered. Beth’s first interview was a total of approximately 9,500 words, while Ann’s was 11,500. The fifth and final interview was shorter as participants were only questioned about their progress on their Masters’ courses, social activities, and contact with native speakers, with the word count for both participants being around 5,000 words. The data analysed for this study were drawn from four interview sessions: the initial interviews with both participants and their fifth interviews, which took place seven months after the first one. The transcripts of these interviews were divided into interviewer utterances and student utterances and stored in two different files per interview.

Transcript analysis The ten most frequent 3-word sequences were derived from the participants’ ﬁles using the program Wordsmith Tools. The unit of 3 words per sequence was chosen for two reasons. A smaller unit of only 2 words per sequence would have included a range of phrasal verbs and habitual grammatical colligations which we were not interested in for the purposes of this study. A larger unit involving 4 or more items in the sequence would have resulted in too few examples for a meaningful study since an increase in the number of items per sequence necessarily means a decrease in the number of sequences found. The results provided an overview of frequently-used formulaic sequences in the student data and their development over time. We compared these with the 10 most frequent sequences in CANCODE, a 5 million word corpus of spoken English.1 The informal nature of the conversations in the CANCODE corpus resembled the relaxed and chatty atmosphere during the interviews that took place with the students and thus made this corpus a suitable resource for comparative study. The main aim of the following comparison was to see whether the overall percentage of recurrently used phrases would rise or fall over time which would indicate a degree of reliance on certain sequences in the students’ discourse. We have concentrated only on the 10 most frequent phrases to increase the likelihood that the recurrence of those sequences is indeed not accidental but that they represent some sort of linguistic unit that is produced to express a particular idea or discourse concept.
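The counting step described above can be illustrated with a short script. This is only a minimal sketch, not the Wordsmith Tools procedure itself: the function names `tokenize` and `top_sequences` are hypothetical, the tokenisation rule is an assumption, and the percentage is computed as sequence frequency divided by the total number of word tokens, which appears to match the scaling in Table 1 (for example, 20 occurrences in 5,553 words gives 0.36%).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())


def top_sequences(text, n=3, top=10):
    """Return (sequence, frequency, percentage) for the most frequent n-word sequences.

    Assumption: the percentage is the sequence frequency divided by the
    total number of word tokens in the text, times 100.
    """
    tokens = tokenize(text)
    counts = Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    total = len(tokens)
    return [
        (seq, freq, round(100 * freq / total, 2))
        for seq, freq in counts.most_common(top)
    ]


# Toy input, not taken from the interview transcripts.
sample = "I think so and I think this is a lot of work and I think so"
for seq, freq, pct in top_sequences(sample, top=3):
    print(seq, freq, pct)
```

On transcripts of only a few thousand words, many sequences will tie at low frequencies, so a ranking of this kind is best read as indicative rather than exact.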

5

6

Svenja Adolphs and Valerie Durow

In our study we then compared the usage of formulaic sequences in Interview 1 and Interview 5, i.e. progress over time, as well as the difference in this progress between Beth and Ann. The latter comparison is based on the degree of cultural integration, which is markedly different between the two participants.

Results Table 1 shows the ten most frequent 3-word sequences in Interviews 1 and 5 for both participants. It also shows the percentage that these sequences account for in all the utterances made by the participants in the respective interviews. Furthermore, the table shows a breakdown of the most frequently used three word sequences found in the CANCODE corpus. Table 2 shows the tally of all of the 3-word sequences used in the diﬀerent interviews by both participants.

Discussion

It is interesting to note that when looking at the ten most frequent 3-word sequences, the students differed in their use of these sequences between their first and fifth interview. The cumulative percentage of the ten most frequent 3-word sequences in the interviews with Beth increased from 2.38% to 3.53%. This percentage is lower for Ann, starting at 1.34%, and there is only a very marginal increase to 1.48% in Ann’s fifth interview.2 These results suggest that Beth, who was relatively well-integrated, did increase her use of formulaic sequences over time, but that Ann, who was not well-integrated, did not increase her use of formulaic sequences to any real degree.

However, for both students there is a change in the types of formulaic sequences used between Interviews 1 and 5. This is particularly obvious when we consider the development of usage by Beth. In Interview 1, half of the sequences are hesitation markers which signal disfluency in the language flow. But in Interview 5, the hesitation sequences have been substituted with those that have a clear lexical core, i.e. that are more phrasal, and add to the fluency of the discourse. Although Ann starts out with a lower frequency of hesitation markers, there is nevertheless a similar trend towards a reduction of these in Interview 5.

When we compare the types of formulaic sequences used by the students with the most frequent sequences we find in the native speaker corpus there is only a little overlap. Both Beth and Ann use the sequence ‘a lot of’, which is the second most frequently used three-word sequence in the CANCODE corpus, and Ann uses ‘I don’t know’ and ‘I don’t think’, which are at frequency ranks 1 and 4 respectively in CANCODE. Larger non-native speaker corpora are needed to carry out a more comprehensive comparison with native speaker corpora in terms of frequently used sequences. This may make it possible to identify more representative patterns in non-native speaker data, such as the hesitation sequences used by the two participants in this study.

Table 1. The ten most frequent three-word sequences produced by Beth and Ann

BETH
Interview 1                      Interview 5
Sequence          Freq.  %       Sequence          Freq.  %
I WANT TO            20  0.36    A LOT OF             33  0.85
JUST ER I            17  0.31    IT’S VERY NICE       23  0.59
I THINK SO           15  0.27    SO IT’S VERY         15  0.38
AND I THINK          12  0.22    SO IT’S NICE         14  0.36
I I I                12  0.22    LOT OF THINGS        12  0.31
YEAH JUST ER         12  0.22    I GOT VERY            9  0.23
YOU MUST ER          12  0.22    GOT A LOT             8  0.21
I THINK THIS         11  0.20    JOIN THE LECTURE      8  0.21
UM JUST ER           11  0.20    THE LECTURE AND       8  0.21
A LOT OF              9  0.16    I GOT SOME            7  0.18

ANN
Interview 1                      Interview 5
Sequence          Freq.  %       Sequence          Freq.  %
A LOT OF             23  0.32    A LITTLE BIT          6  0.20
ER I THINK           11  0.15    I DON’T KNOW          6  0.20
I THINK I            11  0.15    I DON’T THINK         6  0.20
I WANT TO             9  0.13    A LOT OF              4  0.13
I I THINK             8  0.11    HOW CAN I             4  0.13
ER I I                7  0.10    I HAVE NO             4  0.13
I DON’T KNOW          7  0.10    I HAVE THE            4  0.13
I DON’T THINK         7  0.10    PREPARE FOR THE       4  0.13
I TRY TO              7  0.10    WE HAVE THE           4  0.13
BECAUSE I THINK       6  0.08    AND ER I              3  0.10

CANCODE
Sequence          Freq.  %
I DON’T KNOW      5,274  0.11
A LOT OF          2,851  0.06
I MEAN I          2,186  0.05
I DON’T THINK     2,142  0.04
DO YOU THINK      1,503  0.03
DO YOU WANT       1,417  0.03
ONE OF THE        1,311  0.03
YOU HAVE TO       1,297  0.03
IT WAS A          1,271  0.03
YOU KNOW I        1,231  0.03

7

8

Svenja Adolphs and Valerie Durow Table 2. All three-word sequences produced by Beth and Ann

Number of words per interview/ student only Number of 3-word sequences occurring at least twice Percentage of 3-word sequences in text

BETH

ANN

Interview 1 Interview 5

Interview 1

Interview 5

5,553

3,899

7,162

3,046

394

250

345

126

20.98%

18.93%

12.66%

9.55%

There are a number of important aspects of formulaic sequence development which this data does not show us. Because we have only taken into account the ten most frequent 3-word sequences, we do not get an indication of the extent to which the students are using a variety of diﬀerent sequences. We deliberately concentrated on the most frequent sequences since we can expect to ﬁnd more sequences that are not characteristic of the participant’s typical usage towards the lower frequency end of the proﬁle. The question about the ‘cut-oﬀ ’ point for formulaic sequences is important in this context. With corpora of only a few thousand words it becomes much more diﬃcult to determine a suitable cut-oﬀ point since there are fewer sequences overall and since we cannot be sure that sequences which occur only twice or three times are representative of the participant’s usual discourse repertoire. It is interesting to note that Table 2, showing a tally of all recurring 3-word sequences produced by the participants, seems to suggest the opposite of the analysis of the ten most frequently occurring 3-word sequences, i.e. a decline in the use of such sequences over time. This may be due to the increase in variation in the students’ lexical and grammatical choices which means that some of the lower frequency sequences are avoided. It may also be a result of the decrease of ‘hesitation sequences’. The decrease, however, is less noticeable in Beth’s data while it is more obvious in Ann’s. Table 2 includes formulaic sequences of diﬀerent frequencies. After looking at these in more detail, it became clear that Ann’s results are made up of a high percentage of 2-word sequences, while Beth uses more 3- and 4-word sequences compared to Ann. 
Because of the limited size of the interview corpora, it is difficult to make any generalisations about the use of low frequency formulaic sequences and about those that may only occur once in the student corpus and therefore do not qualify as a recurring sequence in the first place. Yet, it may be that there are sequences that are used by the students as single instances which nevertheless feature as a frequently used sequence in the native-speaker corpus. To capture this ‘overlap’ we have devised a different analysis, which is outlined in the next section.

Study 2. Formulaic sequences surrounding frequently-used lexical items

Methodology

In this study we concentrate on the most frequent lexical items used by the students and on the formulaic sequences that form around those. Corpus research suggests that a considerable number of formulaic sequences are formed around the most frequent lexical items in English. An initial frequency count was carried out for all four interviews and the 15 most frequent lexical items in each interview were selected. These are listed in the table below. Auxiliary verbs have been included in this list since a corpus-based frequency analysis makes it difficult to discriminate between the different meanings and functions of these in all of the instances that occur. We then carried out a sequence analysis of these items in CANCODE using Wordsmith Tools. The cut-off point was set at a minimum of five occurrences in the 5 million word corpus (roughly 1 occurrence per million words).

Beth                                  Ann
Interview 1        Interview 5        Interview 1        Interview 5
Just        158    Very         89    Think        80    Just         40
Very        106    Know         79    Some         69    Maybe        39
Think        89    Nice         78    Can          68    Have         37
Some         81    Think        72    Have         68    Can          30
English      60    Just         64    Maybe        62    Because      28
Can          53    Got          63    English      55    Better       23
Study        50    But          53    Because      52    Are          22
About        43    Because      51    Just         52    Some         22
People       38    Some         43    Know         49    That         22
Have         36    Lot          37    Will         44    Actually     19
Know         35    Friend       30    Very         36    Know         19
University   33    Maybe        27    Here         32    Cannot       18
Time         30    About        25    Lot          31    Don’t        18
Like         29    Whatever     24    Be           30    With         17
Must         29    Things       22    Can’t        30    Get          16

All 3-word sequences that included one of the words in the table above and occurred five times or more in the CANCODE corpus were generated. A special program was then written which compared the sequences found in the CANCODE corpus with the environment of the particular word in the student interview.3 An overlap between the construction used by the student and an identified sequence found in the corpus suggests that the student is using a particular word as part of a sequence that is highly frequent in native speaker English. While this method does not measure the correct use of a particular phrase nor its contextual appropriateness, it seems a useful procedure to assess the level of approximation to the use of formulaic sequences by native speakers of English.

Our choice of lexical items in this study is deliberately based on frequency measures in the interview data rather than on identical lexical items that occur in both interviews. Yet, among the 15 most frequent lexical items, there are at least five in the first interview of each student that are used again in the fifth interview. This enables us to discuss results not only by frequency but also, in certain cases, longitudinally by individual words.

To illustrate the methodology outlined above we will describe the sequence profile of the lexical item ‘know’. This item occurs at frequency rank 11 in the first interview with Beth and at frequency rank 2 in the fifth interview with the same student. Thus the item ‘know’ falls among the 15 most frequent lexical items in both interviews. A frequency count of the whole CANCODE corpus shows that this item is at number 14 in the overall ranking with a frequency of 43,709, which accounts for 0.91 percent of the whole corpus. The sequence analysis of the lexical item ‘know’ generated an overall number of 894 different sequences in the CANCODE corpus.
We divided the list of sequences found in CANCODE into three frequency ranges in order to be able to study which frequency range the students drew on most. To this end we divided the overall number of sequences (894) by 3, which resulted in the following ranks: Rank 1 = 1–298, Rank 2 = 298–596, and Rank 3 = 596–894. The sequences in the corpus were then compared to the student interview data and any matches were counted. Of the 35 instances of the word ‘know’ that Beth used in her first interview, 30 instances overlapped with frequently used sequences in the CANCODE corpus (see below). In the fifth interview Beth uses the word ‘know’ 79 times overall, of which 69 overlap with the sequences found in CANCODE. This takes the overlap percentage from 85.71% to 87.34%. While this is admittedly not a large difference, it illustrates the procedure for this study.
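The comparison procedure can be sketched in a few lines of code. This is not the program written for the study, whose details are not given; it is a hypothetical reconstruction under two stated assumptions. First, because the three ranges are given with shared endpoints (298 and 596), a sequence sitting on a boundary is assigned here to the more frequent band. Second, each instance of the target word is counted once, under the most frequent reference sequence found among the 3-word windows that contain it. The names `rank_band`, `windows` and `overlap`, and the toy data, are illustrative only.

```python
from collections import Counter


def rank_band(position, total, bands=3):
    """Map a 1-based position in a frequency-ordered list to a band (1 = most frequent).

    Assumption: boundary positions (e.g. 298 of 894) fall in the more frequent band.
    """
    size = total / bands
    return min(bands, int((position - 1) // size) + 1)


def windows(tokens, index, n=3):
    """Return every n-word window of `tokens` that contains the token at `index`."""
    starts = range(max(0, index - n + 1), min(index, len(tokens) - n) + 1)
    return [" ".join(tokens[s:s + n]) for s in starts]


def overlap(tokens, word, reference):
    """Count instances of `word` and tally, per band, those matching a reference sequence.

    `reference` lists the corpus sequences from most to least frequent.
    Assumption: an instance is counted once, under its best-ranked match.
    """
    position = {seq: i + 1 for i, seq in enumerate(reference)}
    per_band = Counter()
    instances = 0
    for i, tok in enumerate(tokens):
        if tok != word:
            continue
        instances += 1
        hits = [position[w] for w in windows(tokens, i) if w in position]
        if hits:
            per_band[rank_band(min(hits), len(reference))] += 1
    return instances, per_band


# Toy reference list (most frequent first) and toy learner utterance.
reference = ["i don't know", "you know i", "you know the",
             "i know him", "know most of", "to know the"]
tokens = "you know i don't know him but i know most of them and they know nothing".split()

instances, per_band = overlap(tokens, "know", reference)
matched = sum(per_band.values())
print(instances, dict(per_band), round(100 * matched / instances, 2))
```

Applied to the figures reported above, the same percentage calculation gives 30 / 35 = 85.71% for Interview 1 and 69 / 79 = 87.34% for Interview 5.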

Results

The output of the sequence comparison is given in Table 3. It shows the overlap of student data with sequences found in CANCODE split into three frequency ranks. Numbers in parentheses indicate that sequences have been used more than once. The table below shows the overlap between the sequences of the word ‘know’ used by student Beth in Interviews 1 and 5 and the sequences found in the CANCODE corpus. The frequency ranks apply to the sequences found in the CANCODE corpus, i.e. to the ranges outlined above.

Table 3. Overlap of Beth’s 3-word sequences with CANCODE sequences formed around the word ‘know’

Freq. Rank 1
Interview 1: “I don’t know”; “you know I” (4); “you know the”; “er you know” (6); “yeah I know”; “you know yeah” (2); “you know er”; “you know in” (6); “you know so”; “you know just”; “just you know” (2); “you know erm”; “yeah you know”
Interview 5: “I don’t know” (4); “you know I” (9); “you know the” (13); “and you know” (3); “don’t know what”; “you know you” (2); “you know that” (2); “that you know”; “but you know” (3); “you know when”; “you know they” (2); “so you know”; “you know in” (3); “you know so”; “and I know”; “you know but”; “just you know”; “you know because” (5); “you know they’re” (2); “people you know”; “things you know”; “you know my”; “got you know”; “you know some” (2); “really don’t know” (2); “to know the”; “know a lot”

Freq. Rank 2
Interview 1: “I know him”
Interview 5: “you know from”; “you know keep”

Freq. Rank 3
Interview 1: “know most of”
Interview 5: “you know now”

The first sequence in the table above, ‘I don’t know’, for example, occurs in the CANCODE corpus as one of the more frequent sequences of this word, which means that it is located in Frequency Rank 1. This range spans from the first to the 298th most frequent sequence. The comparison above allows us to express the sequence overlap in terms of a percentage:

BETH           Rank 1    Rank 2    Rank 3    Total
Interview 1    80.00%    2.86%     2.86%     85.71%
Interview 5    83.54%    2.53%     1.27%     87.34%

This comparison shows that the main overlap between Beth’s usage and CANCODE usage is at Frequency Rank 1, i.e. the most common 3-word sequences in the CANCODE. The greatest increase in sequence usage also occurs there. At the same time there is a slight decrease in Ranks 2 and 3. Overall, the comparison shows a progressive development of the use of formulaic sequences based around the word ‘know’, with Beth using a higher percentage of the most frequently used sequences in the second interview compared with the ﬁrst one. We carried out this type of analysis for all 15 lexical items in Interviews 1 and 5 of both students. We calculated the mean percentages of all rank categories. The results of this study are presented in Table 4. This table illustrates the difference between Interview 1 and 5, i.e. the development of sequence usage over time, as well as the diﬀerence between Beth and Ann, i.e. the student who integrated well into the native speaker community and the one who did not. While the number of cases chosen for this comparison is too small to permit a valid statistical analysis, a holistic perusal shows that there is a noticeable diﬀerence

Table 4. Overlap of the participants’ 3-word sequences with CANCODE sequences (all 15 words)

                      Rank 1    Rank 2    Rank 3    Total
Beth   Interview 1    33.73%    5.30%     3.25%     42.28%
       Interview 5    47.46%    6.26%     5.41%     59.13%
Ann    Interview 1    48.10%    4.29%     3.30%     55.72%
       Interview 5    44.31%    5.54%     3.14%     52.99%

Table 5. Overlap of participants’ 3-word sequences with CANCODE sequences using the same cluster cores in Interview 1 and Interview 5

       Core of cluster    Interview 1    Interview 5
Beth   Just               26.58%         46.88%
       Very               45.28%         57.30%
       Think              91.01%         91.67%
       Know               85.71%         87.34%
       About              9.30%          32.00%
Ann    Some               23.19%         27.27%
       Can                66.18%         86.67%
       Have               70.59%         86.49%
       Maybe              4.84%          2.56%
       Just               44.23%         45%

between the two participants, with Beth improving her percentage but Ann remaining about the same. Table 5 shows the longitudinal development of clusters formed around the ﬁve words common to Interviews 1 and 5 for each participant.
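The overlap percentages reported above can be illustrated with a minimal sketch. It assumes that each learner sequence is counted by token and looked up by its frequency position in the corpus list; only the Rank 1 range (1st to 298th) is stated in the text, so the limits used here for Ranks 2 and 3, the function name and the toy data are hypothetical and serve illustration only.

```python
from collections import Counter

# Only the Rank 1 range (1-298) comes from the text; the Rank 2 and Rank 3
# limits below are placeholders chosen for illustration.
RANK_BANDS = {"Rank 1": (1, 298), "Rank 2": (299, 600), "Rank 3": (601, 900)}


def overlap_by_rank(learner_counts, corpus_position, bands=RANK_BANDS):
    """Return the percentage of learner sequence tokens that fall in each band."""
    total = sum(learner_counts.values())
    if total == 0:
        raise ValueError("no learner sequences to compare")
    hits = Counter()
    for sequence, count in learner_counts.items():
        position = corpus_position.get(sequence)  # None if absent from the corpus list
        if position is None:
            continue
        for band, (low, high) in bands.items():
            if low <= position <= high:
                hits[band] += count
                break
    result = {band: 100 * hits[band] / total for band in bands}
    result["Total"] = sum(result.values())
    return result


# Toy data, invented for illustration (not taken from the interviews).
learner = {"i don't know": 4, "you know the": 3, "know a lot": 2, "you know keep": 1}
corpus = {"i don't know": 1, "you know the": 12, "know a lot": 450}

shares = overlap_by_rank(learner, corpus)
print({band: round(value, 2) for band, value in shares.items()})
```

Sequences that do not appear in the corpus list at all contribute to the denominator but to none of the bands, which is why the total can fall below 100%.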

Discussion

The results presented in Table 4 allow us to make two kinds of comparison. One relates to the progression of the usage of frequent sequences over time. The other relates to the difference in progression between Beth and Ann. The percentage figures show an increase of 16.85 percentage points between Interview 1 and Interview 5 by Beth. The main increase occurred in the Frequency Rank 1, i.e. in the most frequently used sequences in the native speaker corpus. The interview data of Ann on the other hand shows a slight decrease of overlap with the native speaker sequence results. This decrease is most visible in the Frequency Rank 1. Moreover, when we examine only the words used in both Interview 1 and Interview 5 (Table 5), we find that both Beth and Ann increase their use of native speaker recurrent sequences surrounding these words in three out of five cases (if we count an increase as being more than two percentage points). While this trend fits well with Beth’s positive growth figures in Table 4, it also shows that Ann, although showing overall stagnation in Table 4, still managed to improve her usage of certain sequences.

23

24

Svenja Adolphs and Valerie Durow

This seems to suggest that Ann builds up her usage of native-speaker sequences around items that she knows and uses recurrently. However, when we consider her development of high frequency items and the clusters surrounding them as a whole, she adopts less native-speaker sequences. This seems to indicate that her ability to acquire clusters surrounding less familiar lexical items is low compared to her ability to acquire clusters surrounding known items. Ann may be over-reliant on a relatively small number of sequences, which she is able to improve upon, but the price is the inability to gain improved mastery over a wider range of sequences. These results suggest that while Beth has achieved a more substantial approximation to the patterns derived from the native speaker corpus, Ann has not increased her use of such patterns in the same way. Although Ann starts out with a higher percentage of overlap, this percentage decreases slightly over time. The diﬀerence between the two students in this context suggests a relationship between social integration and the acquisition and usage of formulaic sequences as derived from a native speaker corpus. As such, the approach developed in this paper was able to show the diﬀerence between the two participants, and to identify the longitudinal development of each of the two students. The focus on naturally occurring student output is another advantage of this methodology as it diminishes the eﬀects of the artiﬁcial contexts that are often created in language testing environments.
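The two comparisons drawn in this discussion can be checked with a short calculation. The sketch below uses the totals from Table 4 and the per-word figures from Table 5, and applies the threshold of more than two percentage points mentioned in the text; the variable and function names are our own and purely illustrative.

```python
# Total overlap (Table 4) and per-word overlap (Table 5), in percent,
# given as (Interview 1, Interview 5).
TABLE4_TOTAL = {"Beth": (42.28, 59.13), "Ann": (55.72, 52.99)}
TABLE5 = {
    "Beth": {"just": (26.58, 46.88), "very": (45.28, 57.30), "think": (91.01, 91.67),
             "know": (85.71, 87.34), "about": (9.30, 32.00)},
    "Ann": {"some": (23.19, 27.27), "can": (66.18, 86.67), "have": (70.59, 86.49),
            "maybe": (4.84, 2.56), "just": (44.23, 45.00)},
}


def change(pair):
    """Difference in percentage points between Interview 5 and Interview 1."""
    first, last = pair
    return round(last - first, 2)


# Overall change per participant.
total_change = {name: change(pair) for name, pair in TABLE4_TOTAL.items()}

# Number of words whose overlap rose by more than two percentage points.
increases = {
    name: sum(1 for pair in words.values() if change(pair) > 2.0)
    for name, words in TABLE5.items()
}

print(total_change)  # {'Beth': 16.85, 'Ann': -2.73}
print(increases)     # {'Beth': 3, 'Ann': 3}
```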

Conclusion

In this study we set out to explore the question of what effect the level of social integration has on the acquisition of formulaic sequences in language use. To this end we contrasted the development of the use of formulaic sequences over time by studying the spoken output of two international students enrolled in degree programmes at a British university. The two students were chosen based on their markedly different levels of social integration into the native speaker community. Due to the lack of research in the area of the acquisition of formulaic sequences that includes a longitudinal dimension, it was necessary to develop a new type of methodology to describe the development in the student output over time. The two studies that have been reported in this paper both illustrate a change in the use of formulaic sequences over time. They also suggest a relationship between the quality of cultural and social integration of the students and the adoption of formulaic sequences as displayed in a corpus of native speaker English. We acknowledge that our analysis is based on a very small set of data and that too much should not be claimed for it until our results have been replicated on a larger sample of data. However, we hope that we have shown in this study how corpus-based techniques can be used to assess some aspects of the development of usage of formulaic sequences.

Notes . CANCODE stands for Cambridge and Nottingham Corpus of Discourse in English, a ﬁve million word corpus of mainly informal spoken English. The corpus was developed as a joint project between the University of Nottingham and Cambridge University Press with whom sole copyright resides. 2. We have not used statistical analysis on this data as the number of sequences investigated per participant are too small to make this type of analysis appropriate. 3. We would like to thank Nicholas Cochrane for writing the software that has allowed us to carry out the quantitative analysis for this study.

References

Aijmer, K. 1996. Conversational Routines in English. London: Longman.
Biber, D., Johansson, S., Leech, G., Conrad, S., and Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Longman.
Bochner, S., Hutnik, N., and Furnham, A. 1985. The friendship patterns of overseas and host students in an Oxford student residence. The Journal of Social Psychology 125: 689–694.
Coulmas, F. 1979. On the sociolinguistic relevance of routine formulae. Journal of Pragmatics 3: 239–266.
Cowie, A. P. 1988. Stable and creative aspects of vocabulary use. In Vocabulary and Language Teaching, R. Carter and M. McCarthy (eds), 126–139. London: Longman.
Ellis, R. 1994. The Study of Second Language Acquisition. Oxford: OUP.
Furnham, A. and Alibhai, N. 1985. The friendship networks of foreign students: A replication and extension of the Functional Model. International Journal of Psychology 20: 709–722.
Furnham, A. and Bochner, S. 1989. Culture Shock: Psychological Reactions to Unfamiliar Environments. London: Routledge.
Krashen, S. 1982. Principles and Practice in Second Language Acquisition. Oxford: Pergamon.
Manes, J. and Wolfson, N. 1981. The compliment formula. In Conversational Routines: Exploration in Standardised Communication Situations and Prepatterned Speech, F. Coulmas (ed.). The Hague: Mouton.
Moon, R. 1998. Fixed Expressions and Idioms in English: A Corpus-based Approach. Oxford: Clarendon Press.
Nattinger, J. R. and DeCarrico, J. S. 1992. Lexical Phrases and Language Teaching. Oxford: OUP.
Pawley, A. and Syder, F. H. 1983. Two puzzles for linguistic theory: Native-like selection and native-like fluency. In Language and Communication, J. C. Richards and R. W. Schmitt (eds), 191–226. Harlow: Longman.
Sinclair, J. McH. 1987. Collocation: A progress report. In Language Topics: Essays in Honour of Michael Halliday, R. Steele and T. Threadgold (eds), 319–332. Amsterdam: John Benjamins.
Swain, M. 2000. The output hypothesis and beyond: Mediating acquisition through collaborative dialogue. In Sociocultural Theory and Second Language Learning, J. P. Lantolf (ed.), 97–114. Oxford: OUP.
Vygotsky, L. S. 1987. The Collected Works of L. S. Vygotsky. Volume 1. Thinking and Speaking. New York: Plenum Press.
Ward, C., Bochner, S., and Furnham, A. 2001. The Psychology of Culture Shock (2nd edition). London: Routledge.
Weinreich, U. 1980. Problems in the analysis of idioms. In On Semantics, W. Labov and U. Weinreich (eds), 208–264. Philadelphia: University of Philadelphia Press.
Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: CUP.

Are corpus-derived recurrent clusters psycholinguistically valid?

Norbert Schmitt, Sarah Grandage, and Svenja Adolphs
University of Nottingham

Introduction

Corpus research has been immensely useful in applied linguistics in numerous ways. It has allowed the compilation of dictionaries which better represent the way words are used, and all of the major international ESL dictionaries are now corpus-based. Corpora have been consulted to provide descriptive rather than prescriptive grammars of English (Biber et al., 1999; DeCarrico and Larsen-Freeman, 2002; Carter and McCarthy, in press). Corpus analysis has also done much to increase our understanding of the phenomenon that, in English (and perhaps most/all languages?), speakers tend to use the same clusters of words over and over again (e.g. Sinclair, 1991; Cowie, 1998; Moon, 1998). This is no marginal phenomenon, with Erman and Warren (2000) calculating that word clusters of various types constituted 58.6% of the spoken English discourse they analyzed and 52.3% of the written discourse. These recurrent clusters of words range from strings that intuitively appear to be single units (idioms, proverbs: a stitch in time saves nine) through strings which are used to realize functional language use (would you please . . . [requesting]) to strings which are recurrent in a corpus, but which do not intuitively seem to be ‘whole units’, such as many of the ‘lexical bundles’ identified by Biber et al. (in addition to the, in the number of). At the same time, scholars working in the areas of psycholinguistics and language acquisition have focused on the same phenomenon. These clusters of words appear as an important feature of both first (e.g. Vihman, 1982; Peters, 1983; Pine and Lieven, 1993) and second (e.g. Hakuta, 1976; Wong Fillmore, 1976; Ellis, 2003) language acquisition. The explanation offered by Pawley and Syder (1983) about why these word clusters appear to hold such a prominent place in language usage has found general acceptance and in essence states that the mind stores useful word clusters as preformulated holistic units which can be more easily retrieved and processed than the same word sequences if they were generated through the use of syntax and vocabulary. Since these ‘formulaic sequences’ are already ‘prepackaged’ in the memory, they are easier to process, and allow the language user to be more fluent while at the same time freeing up cognitive resources for other language processes. (See Schmitt and Carter, this volume, for a more detailed background on formulaic language and its acquisition.) The corpus and psycholinguistic/acquisition approaches complement each other, and indeed there are clear links between the two modes of research. In particular, psycholinguistic studies often draw upon corpus data to select and control target lexical items (e.g. Underwood, Schmitt, and Galpin, this volume). It is not unnatural then to assume that the data drawn from corpus analyses reflects the psycholinguistic reality of how language is processed and produced. After all, nearly all corpora are compiled from authentic language of various types, which real people have produced. In some cases, corpus evidence can be directly interpreted as reflecting the true underlying mental state of the people contributing to the corpus. For example, in L1 research, a corpus of a young child’s utterances can accurately reflect the productive vocabulary of that child. But in other cases, the link is not so straightforward. An example of this is the unspoken assumption many people seem to have that recurrent clusters identified by corpus analysis are also stored as holistic formulaic sequences in the mind. Intuitively, this seems reasonable for clusters which are somehow ‘self-contained’, like idioms, but we suspect most people would be much more unsure about lexical bundles like in a variety of. To our knowledge, this assumption has never been empirically put to the test, and so the extent to which recurrent clusters are psycholinguistically valid in terms of holistic storage is an open question.
This study will use research methodologies from both approaches to seek enlightenment on this issue. Corpus analysis will be used to identify a number of target recurrent clusters, which will then be embedded in a psycholinguistic language task that can provide insights into whether they are stored holistically or not. To pursue this line of enquiry, we must make a distinction between word strings which come from corpus analysis (but which may or may not be stored holistically in the mind) and word strings which are stored in the mind as whole units (but which may or may not be identiﬁable through corpus analysis). We shall use the term recurrent clusters to refer to the ﬁrst type of word string and formulaic sequence (Wray, 2002) to refer to the latter. Thus the term recurrent clusters is solely corpus-based, and carries no psycholinguistic assumptions.

Methodology

Selection of the target recurrent strings

As the purpose of this study is to assess the psycholinguistic validity of recurrent clusters extracted from corpus analysis, the initial step was to create a list of corpus-derived clusters. We turned to the literature and extracted recurrent clusters identified in two of the best-known publications on the topic. First, we consulted the section on formulaic language (Chapter 13) in the Longman Grammar of Spoken and Written English (Biber et al., 1999), and derived a list of 97 three-word and four-word clusters. Then we extracted 59 clusters from Lexical Phrases and Language Teaching (Nattinger and DeCarrico, 1992). Next, we took words from Hyland’s (2000) list which are used to express doubt and certainty (e.g. clearly and approximately) and which are used as discourse markers (e.g. therefore and finally) and submitted them to a corpus analysis to see if they formed the core of a formulaic sequence (clearly the best). If so, they were added to our candidate list. Once the list of candidate recurrent clusters was compiled, we determined how frequently they occurred in each of three corpora. Frequency figures from the British National Corpus (BNC) gave an indication of how often the clusters occurred in general English, figures from the CANCODE corpus indicated how frequent they were in spoken discourse, and figures from the MICASE corpus showed their frequency in academic spoken discourse. Based on these frequency figures, we were able to identify a range of recurrent clusters, varying from relatively frequent to relatively infrequent. From this list we selected target recurrent clusters which varied along a number of attributes, including length, frequency, transparency of meaning, and type of cluster. The length of the clusters ranged from two words to six.
The most frequent cluster (you know) occurred 42,477 times (424.8 per million running words) in the BNC and the least frequent (to make a long story short) twice, with the majority of clusters falling within a band of 100–1600 occurrences (1–16 p.m.). The clusters ranged in frequency in the CANCODE from 0–669 occurrences (0–133.8 p.m.). We chose some clusters which are relatively ‘self-contained’, expressing readily-accessible meanings that do not need additional context in order to be understood. For example, Go away is often used as a brusque phrase indicating that a person should leave. Some of the selected clusters are closely connected with functional language use; for instance, to make a long story short realizes the function of coming directly to the conclusion or punchline of a story or anecdote. Other selected clusters do not have this transparency of meaning or function, and have often been referred to as ‘sentence stems’ or ‘sentence builders’ (Granger, 1998). Examples of these include what I want to and is one of the most. Finally, since the main purpose of the study was to explore whether corpus-derived clusters are also stored as holistic units, we wished to have a range of clusters which varied according to our intuitions about whether they were likely to be stored as formulaic sequences or not. Thus we wanted some clusters which seemed likely to be stored holistically by proficient speakers (as a matter of fact) and some which were quite questionable in this regard (in the number of). After balancing all of the above issues, 25 recurrent clusters were chosen for the study:

aim of this study
as a consequence of
as a matter of fact
as shown in figure
for example
from the point of view
go away
I don’t know what to do
I see what you
in a variety of
in addition to the
in the middle of the
in the number of
in the same way as
is one of the most
it was going to
it’s not too bad
night and day
on and off
something like that
to give you an example
to make a long story short
what I want to
you know
you’ve got to have
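The per-million figures quoted above follow from dividing the raw count by the corpus size. A minimal sketch, assuming a BNC of roughly 100 million running words (the size implied by 42,477 occurrences corresponding to 424.8 p.m.) and the five million words given for CANCODE in the notes to the preceding chapter; the function name is our own:

```python
BNC_WORDS = 100_000_000    # approximate size, implied by 42,477 -> 424.8 p.m.
CANCODE_WORDS = 5_000_000  # five million words, as described for CANCODE


def per_million(occurrences, corpus_size_words):
    """Normalise a raw frequency to occurrences per million running words."""
    return occurrences * 1_000_000 / corpus_size_words


you_know_bnc = round(per_million(42_477, BNC_WORDS), 1)
cancode_max = round(per_million(669, CANCODE_WORDS), 1)

print(you_know_bnc)  # 424.8
print(cancode_max)   # 133.8
```

Normalising in this way is what allows the counts from corpora of different sizes to be compared on a single scale.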

Developing the instrument and dictation methodology

Once the target recurrent clusters were selected, we needed to find a methodology which could indicate whether these clusters were stored holistically or not. Direct measurement of this kind is physiologically impossible, so any measurement must inevitably be indirect. We took our cue from the field of second language measurement, where dictation tests are used as measures of integrated language ability (e.g. Bailey, 1998; Fountain and Nation, 2000). The basic idea is that if the stretches (‘bursts’) of dictation are long enough, they overload working memory, and the person is forced to reconstruct the content of the dictation burst via their language resources, rather than just repeating the dictation back from rote memory. One of those language resources is the inventory of formulaic sequences stored in memory. The object of the dictation task is to reproduce the bursts as closely to the original stimuli as possible, and so if the formulaic sequences were available for use, we presume there is a high likelihood that they would be produced as part of the participants’ responses. Of course, if a participant reproduces a cluster correctly, this in itself does not mean that the cluster was stored as a formulaic sequence; it could have been generated via syntactic rules and lexical knowledge of the component words. This is particularly true if the dictation task requires written responses, with minimal time pressure on a participant’s cognitive resources. To overcome this problem, we chose to use an oral-response task, where the participant repeated the dictation into a tape recorder. We did this for two reasons. First, it served to put an element of time pressure on the participants, which should lead to a preference for the presumably quicker route of retrieving a formulaic sequence (if it is stored and available), rather than creating it from scratch. More importantly, it has been noted that formulaic sequences are typically articulated in a fluent manner (e.g. van Lancker, Canter, and Terbeek, 1981), with a ‘normal’ intonation contour, that is, with a natural pitch, stress, and juncture profile. This has been accepted as one of the criteria of formulaicity (e.g. Pawley and Syder, 1983; Peters, 1983), and any deviation from this profile (e.g. a hesitation between words within a cluster: as a matter (1 second pause) of fact) suggests that the cluster is not stored holistically (although note that other explanations are possible: see Rosenberg, 1977). Thus, although it is admittedly not a direct measure of holistic storage, in this study we take fluently-articulated reproduction of the recurrent clusters embedded in the dictation contexts as evidence that they are likely to be holistically-stored formulaic sequences. In order to use the dictation methodology, we needed to place the target recurrent clusters into discourse.
It was felt desirable to have the dictation bursts form a coherent text, rather than be a series of unrelated bursts, and so the 25 clusters were embedded into a story about a hitchhiker. The story was controlled for low frequency vocabulary and more complex syntax to the extent that was possible without making it sound unnatural. We piloted the story several times, both to refine the story itself, and to fine-tune the dictation procedure. We experimented with different lengths of burst (9–36 words) and whether having a long or short pause after the dictation burst made a difference in the participants’ responses (it did not). The dictation task seemed viable, but suffered from one critical problem: the native-speaker pilot participants proved amazingly good at it. Even with bursts approaching 36 words, the natives were able to repeat them back virtually verbatim. Although the nonnative pilot participants were sufficiently challenged by the dictation task, we wanted a task which we could use with both nonnative and native participants. Clearly it was not feasible to increase the burst length further and expect the nonnatives to do the task, as they were already struggling with the medium-length bursts. It was therefore necessary to insert an extra task which would pressure the natives’ cognitive resources, because we needed them to reconstruct the language bursts rather than just repeat them from memory. The dual performance task we settled upon was a basic addition task, where natives did a calculation (e.g. 52 + 29 = ?) before they repeated back the dictation burst. This dual performance task required additional piloting to come to an appropriate level of difficulty for the addition calculations, although the final task was still found somewhat challenging by some native speakers. With this dual performance task, we were able to cut the length of the bursts to around 20–24 words for the native speakers. This length proved appropriate for the nonnatives as well, since they were not required to do the dual performance task. In some psychological experiments, placement of the target in the stimulus is important, as there is sometimes an advantage for targets placed towards the beginning or end. To confirm that this was not a confounding factor in this study, a Pearson correlation analysis was run on the eventual main study data between the participant performance and placement of the recurrent clusters in the dictation bursts (towards the beginning, middle, or end of burst). There was no significant correlation for either the native speakers (p=.952) or the nonnatives (p=.409). Thus the performance scores do not appear to be affected by where the clusters appeared in the burst. After the piloting process, the final version of the story had 39 bursts in total, with 25 bursts containing target clusters. The discrepancy is due to the necessity of including several non-cluster-bearing bursts in order to keep the story coherent.
See the appendix for the ﬁnal dictation bursts and their related dual performance tasks.
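The placement check described above can be sketched as follows. This is only an illustration of a Pearson correlation between cluster placement and performance: the numeric coding of placement (1 = beginning, 2 = middle, 3 = end) and the scores are hypothetical, since the text does not report how placement was coded, and the sketch computes the coefficient r only, not the p-values reported in the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical coding: 1 = beginning, 2 = middle, 3 = end of the burst.
placement = [1, 2, 3, 1, 2, 3]
# Invented mean scores (0-2 scale) for the clusters in those positions.
mean_scores = [1.2, 1.5, 1.1, 1.6, 0.9, 1.4]

r = pearson_r(placement, mean_scores)
print(round(r, 3))
```

A coefficient close to zero would be in line with the conclusion that placement in the burst did not affect performance.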

Procedure

The story was recorded onto a master tape, with 30-second pauses between bursts to allow for task completion but under a time constraint (the pilot showed that the anticipated 20 seconds was not long enough for either participant set to successfully complete their tasks). The participants were divided into groups of up to 18 (the capacity of the language laboratory) with native and non-native participants in separate groups to accommodate the difference in task type for the native speakers.

At the beginning of the session the basic task was outlined to the participants. This included a brief explanation of the text for the native speakers and a more detailed recounting of it for the non-native speakers, to facilitate the recall of the linguistic content of each burst without the added cognitive load involved in the comprehension of various propositions and topic shifts inherent in the narrative’s structure. In addition, during the non-native speaker sessions, pronunciation issues were pre-empted in relation to certain proper nouns within the story (Cosmopolitan/Sheﬃeld/Australia), following problems encountered during the piloting stage, where participants spent so long attempting to pronounce these words correctly that the time allotted for the repetition of the burst expired. For the native speakers, there was also the explanation of the addition task, which had to be carried out after hearing each burst of the story. The sequence of the dual performance task involved the participants listening to each burst which was then followed by a visual stimulus for the addition task. Reading the two numbers from a card displayed immediately after hearing the burst, the participants did the sum mentally and then recorded the answer onto the tape before attempting to reconstruct the burst. The task proved fairly challenging for several of the participants, some of whom resorted to approximating the answer after a short period. Although all the sums required a degree of ‘carrying over’ to make them more challenging, some seemed to cause fewer problems than others (e.g. Burst 10: 7 + 17 or Burst 14: 9 + 14 presented fewer problems than Burst 23: 28 + 45 or Burst 33: 37 + 85). For all of the participants, it appears that the demands of the extra processing occupied their working memory for suﬃcient time to force recourse to linguistic resources (as discussed earlier) to enable reconstruction of the story bursts and to avoid simple repetition. 
In addition, four of the native participants were given the dictation task without the dual performance task, in order to compare the above non-rote performances to a condition where memory resources were not put under pressure (control condition). Finally, technical points related to the recording procedure were covered. The recording process was controlled by the researcher from the master console in the language laboratory. The participants had no control over the recording, except to adjust the volume if necessary. They were not able to rewind to listen to bursts again or to rerecord their contribution. Each participant was recorded onto an individual tape, alongside the ‘guide track’ of the original story, which allowed for ease of comparison during transcription. The transcription itself noted participant performance in terms of both lexico-grammatical accuracy (including changes, additions to and omissions from the original text) and prosodic features related to fluency (i.e. intonation, hesitations, pausing, false starts, stumbles or repetitions). The analysis of the data was carried out both quantitatively and qualitatively. To quantify the participants’ performance, we devised a three-part scoring system: a) reproduction of a recurrent cluster fully intact in terms of lexis and intonation contour = 2 points, b) attempted reproduction of a cluster, but with missing/other lexis and/or a not fully intact intonation contour = 1 point, and c) reproduction of a recurrent cluster was completely missing from the participant’s response = 0 points. For each recurrent cluster, we also noted the number of participants falling into each of the above performance categories. For the qualitative analysis, we examined the responses for each cluster, giving special attention to Category B, because from our pilot experience, the ‘incorrect’ responses often gave the best insights into how the clusters were being processed.
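The three-part scoring system can be expressed as a small calculation. The sketch below derives a cluster’s mean score from the number of participants in each category; the function name is our own, and the two worked cases use the native-speaker counts reported in the results for for example and in a variety of.

```python
# Points awarded per response category, as defined in the scoring system.
POINTS = {"intact": 2, "partial": 1, "missing": 0}


def mean_score(intact, partial, missing):
    """Mean performance score for one cluster, given participant counts per category."""
    n = intact + partial + missing
    if n == 0:
        raise ValueError("no participants")
    total = (POINTS["intact"] * intact
             + POINTS["partial"] * partial
             + POINTS["missing"] * missing)
    return total / n


# Native-speaker counts (N=30): intact, partially incorrect, not produced.
for_example = round(mean_score(18, 0, 12), 3)
in_a_variety_of = round(mean_score(15, 11, 4), 3)

print(for_example)      # 1.2
print(in_a_variety_of)  # 1.367
```

The two cases show why the category counts matter alongside the mean: similar means can arise from quite different distributions of responses.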

Participants

The participants consisted of two groups: 34 native speakers (4 male/30 female) and 45 non-native speakers (12 male/33 female). All the participants, both native and non-native, were taken from within the university community. All of the natives were undergraduates at the University of Nottingham except for two postgraduates. The non-natives were a mixture of international postgraduates and visiting scholars at the university. Over half of the non-natives spoke Chinese as their L1 (20 visiting Chinese teachers of English, 1 visiting scholar, 2 undergraduates, and 7 postgraduates including 2 from Taiwan), while the rest spoke a variety of mother tongues, including German (4), Spanish (3), French (2), Flemish, Japanese, Korean, Malay, Akan, and Arabic (1 each). While seemingly heavily biased in terms of numbers of Asian L1 speakers, this is in fact representative of the non-native student population at the university, which is dominated by such students.

Results and Discussion

Quantitative analysis

How well were the recurrent clusters reproduced overall? (Native speakers)

The first thing to note in our analysis is that the meaning of the non-cluster text was nearly always reproduced faithfully; the memory task was therefore not so difficult as to inhibit the retention of the semantic content of the bursts. The question is thus whether the form used to instantiate this meaning consisted of the target recurrent clusters. The results show that the various clusters elicited a variety of response behavior. The overall performance mean using our scoring system (see above) was 1.344, clearly indicating that not all of the clusters were reproduced in a manner which would suggest they were holistically stored in the mind (see Table 1). The clusters at the low end of the range were below 1.00, suggesting that they are either not stored as single units, or that they are stored but for some reason were not available in this dictation task. At the other end of the range, a number of clusters are at or above 1.60, which indicates that most of the participants were reproducing the clusters accurately, implying that they may well be formulaic sequences. The clusters with scores in the middle of the range are more difficult to interpret, although the following analysis will have more to say about these. Based on mean scores, it seems that the recurrent clusters are not a homogeneous set, with the natives varying widely in how well they were able to reproduce clusters. Perhaps a better type of evidence is the number of natives who reproduced the clusters correctly, the number who reproduced them incorrectly or disfluently, and the number who did not produce them at all. Some clusters were reproduced intact by almost all of the participants (e.g. go away, I don’t know what to do), while others were reproduced intact by almost no participants (e.g. in the same way as, aim of this study). This data supports the observations made in the above paragraph. Moreover, the response category can be more illuminating about the midrange scores than the mean score. Although the mean scores of two clusters might be similar, this might hide quite different response behavior by the participants.
Let us consider the clusters for example and in a variety of. For example has a mean score of 1.20, with 18 natives reproducing it intact and 12 not producing it at all. Crucially, no participants attempted it but produced some other form (or articulated it disfluently), which would have given the clearest indication that the cluster is not stored holistically. Conversely, even though in a variety of had a higher mean score, it was reproduced intact by only 15 natives, with 11 participants producing a variation of the cluster. Because a large number of natives did not produce the cluster intact, but rather some similar word string, it seems unlikely that this recurrent cluster is a formulaic sequence for most natives. In fact, the ‘Partially Incorrect’ category is probably the most telling in this study.

We argue that clusters which were produced intact provide evidence that those clusters were easily accessible and thus may well be stored as wholes. This depends on the argument that holistically-stored lexical items are more easily deployed than strings produced through syntactic construction. This has often been asserted (e.g. Pawley and Syder, 1983), but it must be admitted that the underlying mechanisms are not well understood. Similarly, if clusters were not produced intact, when the dictation task was to reproduce them exactly, this indicates that they were not readily available, which would argue against their being stored in the lexicon. But the fact that a cluster was not produced does not give direct evidence that it was not stored in the lexicon. For instance, it could have been ‘blocked’ for some unknown reason. On the other hand, clusters which were attempted, but not reproduced intact, give the clearest indication that those clusters were somehow not prominent in the mind, because if they were, they should have been reproduced intact when the participant was engaging with that part of the stimulus. In other words, we know that the participant was producing word strings similar to the cluster, and with the same semantic content, but not actually reproducing the cluster in the dictation. If the cluster was a formulaic sequence, we assume that in most cases it would be reproduced intact.

Looking at the recurrent clusters in terms of the number of participants reproducing them partially incorrectly or disfluently, we again find a range. For some clusters this happens not at all or very little (e.g. go away, for example, is one of the most), further enhancing the evidence for their formulaic sequence status. For other clusters, it happens with a majority of the participants (e.g. I see what you, as shown in figure, aim of this study), supporting the argument that they are unlikely to be stored holistically in the mind.

Table 1. Native performance of recurrent clusters

                            Participants (N=30)                 Control participants (N=4)
Recurrent cluster           Mean a   Corr. b  Part. b  Not b    Mean a   Corr. c  Part. c  Not c
in the same way as          0.567    3        11       16       1.000    1        2        1
aim of this study           0.667    2        16       12       1.250    2        1        1
as shown in figure          0.767    3        17       10       0.750    1        1        2
to give you an example      0.867    8        10       12       2.000    4        0        0
I see what you              1.033    3        25       2        1.500    2        2        0
as a consequence of         1.067    13       6        11       1.250    2        1        1
night and day               1.100    16       1        13       2.000    4        0        0
for example                 1.200    18       0        12       1.500    3        0        1
in the middle of the        1.200    17       2        11       1.750    3        1        0
something like that         1.233    16       5        9        2.000    4        0        0
in a variety of             1.367    15       11       4        2.000    4        0        0
you’ve got to have          1.400    16       10       4        1.500    2        2        0
it’s not too bad            1.433    16       11       3        1.750    3        1        0
From the point of view      1.433    19       5        6        2.000    4        0        0
in the number of            1.500    18       9        3        2.000    4        0        0
as a matter of fact         1.533    21       4        5        2.000    4        0        0
in addition to the          1.533    18       10       2        2.000    4        0        0
you know                    1.600    24       0        6        2.000    4        0        0
What I want to              1.600    21       6        3        2.000    4        0        0
it was going to             1.600    21       6        3        2.000    4        0        0
to make a long story short  1.633    23       3        4        1.750    3        1        0
on and off                  1.667    25       0        5        2.000    4        0        0
is one of the most          1.867    27       2        1        2.000    4        0        0
I don’t know what to do     1.867    27       2        1        2.000    4        0        0
go away                     1.867    28       0        2        1.500    3        0        1

Mean                        1.344    16.720   6.880    6.160    1.740    3.240    0.480    0.280
Std                         .367     7.749    6.214    4.488    .364     1.012    .714     .542

Mean = mean performance; Corr. = produced correctly; Part. = partially incorrect; Not = not produced.
a Max = 2   b Max = 30   c Max = 4
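The relationship between the mean performance score and the three response categories can be made concrete with a short sketch. It assumes that the scoring system awards 2 points for a cluster produced correctly, 1 point for a partially incorrect or disfluent attempt, and 0 points for a cluster not produced. This weighting is an assumption, although it is consistent with the maximum of 2 and with the figures in Table 1; the function name is purely illustrative.

```python
def mean_performance(correct: int, partial: int, not_produced: int) -> float:
    """Mean score per participant, assuming 2/1/0 points per response category."""
    n = correct + partial + not_produced
    if n == 0:
        raise ValueError("no responses")
    return (2 * correct + 1 * partial) / n


# Response counts for the 30 native participants, taken from Table 1.
for_example = mean_performance(correct=18, partial=0, not_produced=12)
in_a_variety_of = mean_performance(correct=15, partial=11, not_produced=4)

print(f"for example:     {for_example:.3f}")      # 1.200
print(f"in a variety of: {in_a_variety_of:.3f}")  # 1.367
```

The two clusters end up with similar means even though one has no partially incorrect responses and the other has eleven, which is why the category counts can be more informative than the mean alone.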
Again the midrange clusters are less clear to interpret, but it must be questionable whether any cluster with a substantial number of participants falling into this category is a formulaic sequence, although it is impossible to state how substantial the percentage needs to be to disqualify a sequence.

The above results were for native speakers when they were carrying out the dual performance task designed to overload their ability to repeat back the dictation bursts in a rote manner. As expected, the four control condition native participants performed much better than the non-rote participants (19 clusters were reproduced intact by three or four of the natives), but this tells us little about lexical storage because the clusters could easily have been reproduced by rote. The interesting data lies in the ‘Partially Incorrect’ and ‘Not Produced’ categories. Even where there was no pressure on memory (at least in terms of the dual performance task), two to three of these natives either did not produce or produced another form of the following recurrent clusters: in the same way as, I see what you, as shown in figure, and aim of this study. Whereas piloting showed that natives are very good at doing this dictation task (which is why we needed the dual performance task in the first place), it is telling that these four clusters were poorly reproduced. The fact that even these natives, with their memory unhindered, did not reproduce these particular clusters well seems to argue for the conclusion that the clusters do not hold any of the advantages ascribed to being formulaic sequences. Thus we have additional evidence for the argument that not all recurrent clusters are holistically stored.

In sum, these results suggest that not all recurrent clusters identified on the basis of corpus analysis are psycholinguistically valid, that is, stored as holistic units in the minds of proficient speakers. Recurrent clusters vary, with some highly likely to be formulaic sequences on the basis of this evidence, but others quite unlikely to be holistically stored. There are also a number of clusters which are ‘in the middle’, exhibiting mixed evidence.

One way to interpret these results is that recurrent clusters fall on a cline of probability as to whether proficient speakers will have them stored as formulaic sequences. On one end, some clusters have a high probability of being holistically stored by most speakers, while at the other, some clusters are likely to be stored in this way by very few if any speakers. In the middle we would expect to find clusters that some speakers, but not others, have stored as formulaic sequences. In other words, it is idiosyncratic to the individual speaker whether they have stored these clusters or not. Every person has their own unique idiolect made up of their personal repertoire of language, and as part of that idiolect, it seems reasonable to assume that they will also have their own unique store of formulaic sequences based on their own experience and language exposure.
This ‘formulalect’ or ‘phrasalect’ would include most of the formulaic sequences which the average member of a speech community stored holistically, but also a number of formulaic sequences which were not so typically stored by other speech community members. People will obviously vary in their levels of ﬂuency and powers of expression depending on the topic and discourse situation, and this may well be substantially dependent upon one’s ‘phrasalect’ given the close connection of formulaic language with ﬂuent and appropriate language use. Thus, the bottom line is that just as a person’s mental lexicon contains a unique inventory of words, it is likely to also contain a unique inventory of formulaic sequences.

The effect of recurrent cluster attributes on dictation performance

We have argued that not all recurrent clusters are also formulaic sequences. But if some are and some are not, are there any attributes of the clusters themselves which might affect whether they are taken into the mind and stored as wholes? We explored three features: frequency of the cluster, length of the cluster, and the transparency of clusters’ meaning/function.

Frequency of occurrence is a key attribute in corpus analysis, and one might speculate that the most frequent clusters would be more likely to be stored as formulaic sequences, and so be connected with higher performance scores on the dictation test. We find this is not the case: a Pearson correlation test indicated no reliable relationship between frequency of occurrence in the BNC and native performance on the dictation task (p = .315). Likewise, there was no relationship between frequency of occurrence in the CANCODE and native performance on the dictation test (p = .961). Thus, frequency of occurrence does not seem closely related to whether a cluster is stored in the mind as a whole or not. Furthermore, there was no significant correlation between length of cluster and mean performance score (p = .839).

We next looked at the meaning and function of the target recurrent clusters. It seems possible to discern a trend of the clusters with higher performance scores being relatively transparent in terms of meaning (go away, I don’t know what to do) or function (to make a long story short). Likewise, most of the clusters with lower performance scores appear to be sentence stems (in the same way as, aim of this study). However, this trend is far from clear, as some of the clusters with higher scores are sentence stems (is one of the most, it was going to). After dividing the recurrent clusters into (admittedly somewhat subjective) categories of sentence stem vs. semantically- or functionally-transparent clusters, a point-biserial correlation came out at .267. Although this figure is modest, the factor of semantic/functional transparency at least does have a stronger relationship with the performance scores than frequency or length.
On the basis of this, we would tentatively suggest that semantic and functional transparency does have a role to play in determining whether a recurrent cluster becomes stored in the mind. This sounds intuitively plausible, but any stronger conclusion must await further evidence.
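To make the statistics in this section concrete, the following is a minimal sketch of the two correlation measures mentioned: the Pearson correlation and the point-biserial correlation, which is the Pearson correlation with one variable coded 0 or 1. It pairs the native mean scores from Table 1 with cluster length, assuming that length is counted in orthographic words (the unit of length is an assumption here). Because the study's actual transparency coding is not listed, the point-biserial call uses a small made-up coding for illustration only. All function and variable names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(ss_x * ss_y)


def point_biserial(groups, scores):
    """Point-biserial correlation: Pearson r with one variable coded 0 or 1."""
    if any(g not in (0, 1) for g in groups):
        raise ValueError("groups must be coded 0 or 1")
    return pearson_r(groups, scores)


# Native mean performance scores (N=30) from Table 1.
NATIVE_MEANS = {
    "in the same way as": 0.567, "aim of this study": 0.667,
    "as shown in figure": 0.767, "to give you an example": 0.867,
    "I see what you": 1.033, "as a consequence of": 1.067,
    "night and day": 1.100, "for example": 1.200,
    "in the middle of the": 1.200, "something like that": 1.233,
    "in a variety of": 1.367, "you've got to have": 1.400,
    "it's not too bad": 1.433, "from the point of view": 1.433,
    "in the number of": 1.500, "as a matter of fact": 1.533,
    "in addition to the": 1.533, "you know": 1.600,
    "what I want to": 1.600, "it was going to": 1.600,
    "to make a long story short": 1.633, "on and off": 1.667,
    "is one of the most": 1.867, "I don't know what to do": 1.867,
    "go away": 1.867,
}

# Assumption: cluster length is measured in orthographic words.
lengths = [len(cluster.split()) for cluster in NATIVE_MEANS]
scores = list(NATIVE_MEANS.values())
length_r = pearson_r(lengths, scores)
print(f"length vs. native mean score: r = {length_r:.3f}")

# Hypothetical coding for illustration only (1 = transparent, 0 = sentence stem).
toy_groups = [0, 0, 1, 1]
toy_scores = [1.0, 2.0, 3.0, 4.0]
print(f"toy point-biserial: r = {point_biserial(toy_groups, toy_scores):.3f}")
```

The sketch reports only the coefficients; the p-values quoted in the text would additionally require a significance test.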

Nonnative speaker performance on the dictation task

Because native speakers are assumed to know their L1 well, and have a large inventory of formulaic sequences, their results gave an indication of the likelihood that recurrent clusters are also formulaic sequences in a proficient speaker’s mind. However, such assumptions cannot be made about nonnative learners; in fact most research indicates that nonnatives often have relatively weak mastery over formulaic language, resulting in under-use (Dagut and Laufer,
1985), overuse (Granger, 1998; de Cock, 2000), or misuse (Yorio, 1989; Howarth, 1998), which can lead to inappropriate or awkward language. What can the nonnative results tell us about recurrent sequences, formulaic sequences, and second language learners? The native speakers scored only strongly enough to argue for holistic storage for a minority of the target recurrent clusters. We would expect the nonnatives to have even lower scores overall, and this is exactly what we ﬁnd (Table 2). Where the mean of the native performance scores was 1.344, the nonnative mean was only .902. Most of the performance scores were under 1.10, with the highest score only 1.489. Looking at the three performance categories, we ﬁnd that only four clusters were reproduced intact by half or more of the nonnatives (as a matter of fact, in the middle of the, you know, on and oﬀ ), with the percentage of nonnatives performing in the category ‘Produced Correctly’ roughly half that of the natives. Conversely, the vast majority of nonnative performances fell into the ‘Partially Incorrect or Disﬂuent’ or ‘Did not Produce’ categories. Overall, the nonnatives did not reproduce the target clusters very well. This supports the general observation that nonnatives have diﬃculty with mastery of formulaic language, and also suggests that they have relatively few formulaic sequences stored in their minds ready to be used in ﬂuent and appropriate language use. At the very least, the recurrent clusters in this task did not seem very salient for the nonnatives. With limited memory capacity in their L2 and language competence which inevitably had some limitations, the nonnative participants seemed to ‘latch onto’ key content words and then try to reproduce the dictation language around them. They did not seem to have the recurrent clusters available as formulaic sequences, and so tried to generate a sensible reconstruction based on these key words. 
This is reﬂected in the relatively high number of participants falling into the ‘Partially Incorrect or Disﬂuent’ category, where elements of a target cluster (usually one or two words) were reproduced, but in a form quite diﬀerent from the cluster. The possible exceptions to this are the clusters as a matter of fact, in the middle of the, and you know. It could be argued that the nonnative performance was strong enough to suggest that these are formulaic sequences for most of the nonnatives, but even with these best-performed clusters, the total performance is not nearly as conclusive as the native data. It is probably safest to conclude that these three clusters are among the best mastered by the nonnatives, but not construe that they are necessarily stored holistically. In the native speaker data, we found no correlation between frequency and

performance score, or length and performance score, but did find a modest correlation between transparency of meaning/function and performance score. With the nonnatives, there were no significant correlations between performance on the dictation task and the factors of frequency (BNC: p = .568, CANCODE: p = .226) or length (p = .666). When the recurrent clusters were divided between semantically or functionally-transparent clusters vs. sentence stems, the point-biserial correlation came out at .476, which is considerably higher than for the native speakers. The natives’ performance suggests that some, though certainly not all, recurrent clusters were likely to be holistically stored, and given the small number of clusters in this study, that would extrapolate to what is probably a large and diverse inventory of formulaic sequences. We do not know much about how these sequences are acquired, but perhaps natives do not need clusters to have such a high saliency in terms of meaning or function in order to be acquired. On the other hand, the nonnatives have more limited language resources, and perhaps because of this, recurrent clusters which have higher saliency in terms of meaning or function seem to be handled better. Whether this leads to acquisition in nonnatives is an interesting question worth further study.

Table 2. Nonnative performance of recurrent clusters (n=45)

Recurrent cluster           Mean a   Corr. b  Part. b  Not b
as shown in figure          0.244    2        7        36
in the same way as          0.360    1        14       30
in the number of            0.400    3        12       30
as a consequence of         0.400    6        6        33
you’ve got to have          0.444    1        18       26
aim of this study           0.578    3        20       22
in a variety of             0.622    9        10       26
night and day               0.644    9        11       25
it was going to             0.667    5        20       20
what I want to              0.978    12       20       13
it’s not too bad            1.022    13       20       12
for example                 1.022    22       2        21
on and off                  1.044    23       1        21
from the point of view      1.044    13       21       11
go away                     1.067    22       4        19
in addition to the          1.089    17       15       13
to make a long story short  1.090    9        30       6
something like that         1.111    22       6        17
to give you an example      1.133    16       19       10
you know                    1.178    26       1        18
I don’t know what to do     1.178    16       21       8
I see what you              1.200    14       26       5
is one of the most          1.244    18       20       7
in the middle of the        1.311    26       7        12
as a matter of fact         1.489    29       9        7

Mean                        .902     13.600   13.600   17.800
Std                         .347     8.515    8.000    8.851

Mean = mean performance; Corr. = produced correctly; Part. = partially incorrect; Not = not produced.
a Max = 2   b Max = 45
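The count of clusters reproduced intact by at least half of the nonnative participants can be checked directly against Table 2. The sketch below is illustrative only: the dictionary copies the ‘Produced correctly’ column of Table 2, and the variable names are not taken from the study.

```python
N_NONNATIVE = 45  # number of nonnative participants (Table 2)

# 'Produced correctly' counts from Table 2.
PRODUCED_CORRECTLY = {
    "as shown in figure": 2, "in the same way as": 1,
    "in the number of": 3, "as a consequence of": 6,
    "you've got to have": 1, "aim of this study": 3,
    "in a variety of": 9, "night and day": 9,
    "it was going to": 5, "what I want to": 12,
    "it's not too bad": 13, "for example": 22,
    "on and off": 23, "from the point of view": 13,
    "go away": 22, "in addition to the": 17,
    "to make a long story short": 9, "something like that": 22,
    "to give you an example": 16, "you know": 26,
    "I don't know what to do": 16, "I see what you": 14,
    "is one of the most": 18, "in the middle of the": 26,
    "as a matter of fact": 29,
}

# Clusters reproduced intact by half or more of the participants,
# ordered from the highest count to the lowest.
majority = sorted(
    (c for c, n in PRODUCED_CORRECTLY.items() if n >= N_NONNATIVE / 2),
    key=PRODUCED_CORRECTLY.get,
    reverse=True,
)
print(majority)
```

With a threshold of 22.5 participants, only the four clusters named in the text qualify; for example, go away, and something like that fall just short with 22 each.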

Qualitative analysis

Overall trends

From the initial examination of the data, several of the outcomes predicted prior to the test were borne out. For example, it came as no surprise that, on the whole, the native speakers performed better than the non-native speakers in terms of accuracy of reproduction and number of accurately reproduced strings. In general terms, the trend for the native speakers was to either reproduce the string accurately or to not retrieve it or attempt it at all (it was not possible to tell which was the case). There were fewer partially reproduced strings in the native speaker data whereas the non-natives were more inclined to partially reproduce many of the strings or produce them disfluently or inaccurately. This seems to confirm the pre-test conjecture that for native speakers the strings are either a) easily retrieved single units or b) easily reconstructed groups of grammatical and lexical items, while for the non-native speakers the strings have much less coherence as whole units and therefore have to be reconstructed word by word, resulting in errors.

The overall linguistic proficiency of the participants (see Note 1) was reflected in the task outcome. The highest level non-native speakers in the study (almost exclusively European, particularly German) mirrored the native speaker performance closely, as they reproduced the majority of the strings accurately, with fewer reformulations or disfluent attempts. The intermediate to low level non-native speakers produced the highest number of inaccurate/totally absent string repetitions. (See Spöttl and McCarthy (this volume) for more on the connection between proficiency and formulaic sequence performance.)

One interesting trend that could be seen in the data of both native and nonnative speakers was that they performed better in the earlier stages of the dictation as a whole, producing more Partially Incorrect responses in the second half of the test. Perhaps this feature is due to factors as simple as fatigue or boredom affecting concentration in the latter stages of the test, as the construct of the strings does not vary significantly from those in the earlier part of the dictation.

Effect of string attribute

The strings that were more consistently recalled, not only by the native speakers but also the non-natives, were the short, self-contained or semantically transparent units (you know, go away, to make a long story short, I don’t know what to do). The sentence stems produced most difficulties, particularly for the nonnative speakers who, if able to reconstruct these strings, often seemed to be attempting it by fitting them into previously known lexical or, more commonly, syntactic patterns. A particularly clear example of this process of attempting to ‘normalise’ the language in order to produce a coherent response can be seen in several of the non-native attempts to reproduce the final sequence to make a long story short. Several students replaced the indefinite article with the definite article, perhaps working on the understanding that the story in question in this string was a definite reference and literally referred to the dictation story. This would suggest that they were reconstructing the burst along known tracks, using grammatical and lexical clues rather than retrieving the string as a holistic whole.

Hesitation and other forms of disfluency

This feature cannot be specifically identified in the quantitative data, but was evident in the transcription process and played a vital part in the evaluation process, as the ability to reproduce the strings fluently was one of the key points under consideration. During the transcription, a note was made of features such as hesitations (anything over approximately 0.5 of a second), false starts and stumbles and repetitions of parts or whole words. As a feature of the candidates’ ability to reproduce the strings fluently, it is worth noting that the non-native speakers displayed hesitations, stutters and false starts in twice as many strings as the native speaker participants.
The native speakers displayed the disﬂuent features in only six of the strings, ﬁve of which are sentence stems (from the point of view, in addition to the, aim of this study, in the number of, as shown in ﬁgure). It is possible that these strings are more diﬃcult to recall easily, not only because they are not syntactically whole, ‘stand alone’ units of meaning, but also because as a group they seem to point towards a more formal and academic register, which the native speakers may have subconsciously found more diﬃcult to reconcile with the more informal

43

44

Norbert Schmitt, Sarah Grandage, and Svenja Adolphs

tone of the narrative. Interestingly, the sixth stem that produced disﬂuent features (to make a long story short) seems to counter this supposition, being both semantically self-contained and more informal in register. However, it could be argued that this string caused problems for the participants as they attempted to retrieve it due to the proximity, in form and meaning, of a similar string to cut a long story short. Focusing on the non-native speakers, the above-mentioned string, to make a long story short, also caused problems. In addition to the rephrasing mentioned above, this string produced a series of hesitations, repetitions and false starts, again suggesting that the participants were struggling to reproduce the burst accurately. The string from the point of view of also contained hesitations and false starts in a third of the non-native attempts to reproduce it. In comparison with the native speakers, the pauses were of a consistently longer length, usually 1 or 2 seconds in length. Furthermore, the attempts to reproduce the string sometimes resulted in a selection of meaningless bursts e.g. ‘from the (1) er point of economy’. The phrase in addition to the also produced noteworthy results. Almost half of the non-native attempts to reproduce this string (6 out of 14 attempts) showed hesitations. The remaining attempts displayed an attempt to reproduce a version of the string with a similar lexical and syntactic make-up. What seems to be of particular interest here in terms of ‘lexical units’, is the fact that four of the attempted re-phrasings resulted in the phrase in addition, and the remaining attempts display the hesitations and stumblings after the initial two words; i.e. in addition seems to be a stronger contender for the ‘formulaic sequence’ label than the more complex and opaque in addition to the. 
Interestingly, the phrase aim of the study, whilst problematic, produced fewer problems for the non-native speakers than the native speakers, in terms of hesitations. In essence, where they attempted it, many of the native speakers struggled to reproduce the string in its given form, which resulted in some hesitations and false starts. This may have been due to the native speakers’ perception that the register of this string is more typical of academic discourse, and their subsequent hesitation may have less to do with the string not being pre-formed and more to do with a momentary query as to whether this string was congruent with the others found in the narrative. The non-native participants on the other hand were more likely to rephrase the string along pre-learned rules, rather than attempt to reproduce it in its given form, substituting diﬀerent nouns, including exercise, act, subject, topic, injury or journey, for study. However, these rephrased strings were produced with few hesitations.

Two of the strings that caused particular problems for the non-native speakers were I see what you and in the middle of the. The former resulted in 11 examples of hesitations out of 26 attempts to reproduce the phrase. Of particular interest in this context is that most of these attempted reproductions contained two or three hesitations, suggesting that I see what you is not an easily retrieved string. The sentence stem in the middle of the produced a similar eﬀect as in addition to the commented upon above. The non-native speakers displayed several hesitations; however these all occurred in the second half of the string, suggesting that in the middle may be more of a formulaic sequence than in the middle of the.

Meaning versus form

Some of the clusters were reproduced more or less equally well or poorly by both participant groups, seeming to give a fairly strong indication of how formulaic they are. Taking a series of examples, it is possible to see certain patterns. For instance, both native and non-native participants performed more or less equally well in terms of accuracy when the short strings you know, on and off and go away are considered. This suggests a strong degree of formulaicity, despite the various functions and types of phrase (you know is a well known and frequently used discourse marker; on and off is a more idiomatic phrase which is not immediately clear in meaning without a context; and go away, as was discussed earlier, is a common verb phrase, simple enough to understand when used in the imperative as it is in this context). Slightly less accurate overall, but still showing similar levels of accuracy across the two groups, are the phrases as a matter of fact and something like that. This came as something of a surprise in the analysis because, whilst both are easily recognisable to both native and non-native speakers, it had been expected that the latter expression would cause more problems for the non-native speakers due to its use in native speaker spoken discourse as a hedging vague term, which is not usually considered a feature of low to intermediate non-native users’ discourse.

The string in the middle of the produced different results between the two participant groups. Although it appears to have few stand-alone qualities, of the native speakers who attempted to produce it, over 50% did so accurately, with only 3 trying to reconstruct the string inaccurately. This suggests it was largely formulaic for the native speaker group.
The majority of the non-natives were able to produce the sequence as well, but also had problems in terms of fluency: hesitation occurred in very short fragments in several of the attempts before the string was finally produced. This makes it harder to decide whether this string
was formulaic or not for those nonnatives: although they were ﬁnally successful at producing the sequence, they produced it in a disﬂuent manner. Only further research will tell if such sequences are actually formulaic in nature but not readily accessed, or whether they are compiled online in a halting manner. There were other strings which produced variations in accuracy and ease of recall between the two groups studied. For example, the native speakers found it easy to recall the sequence I don’t know what to do, which seems to stand as a whole unit of meaning even without the further contextualising of the rest of the burst (I don’t know what to do about my boss). For the non-natives this was much more diﬃcult to recall with over 25% trying to reformat the sequence to ﬁt previously learnt patterns (9 candidates reformulated as I don’t know how to do . . . and 2 completely reformulated both semantic and form: I don’t like my boss). Finally, there were often cases where a semantically-similar string was produced (as a consequence → as a result, in the number of → in the amount of ). This might be caused by the target string being partially triggered, but with the noun being replaced. Alternatively, the participants may have retrieved another similar, perhaps more frequent formulaic sequence (more frequent in their idiolect at least) within the same semantic and lexical framework. Unfortunately, the data does not provide a basis on which to speculate between these possibilities.

Limitations

Exploring the inner workings of the mind is always a fraught proposition, especially with non-laboratory methodologies where variation is not easily controlled. We acknowledge the limitations of our assumption that reproduction of recurrent clusters in a dictation task indicates the probability of holistic storage of those clusters. It is not a direct measurement, but it is difficult to envision a nonlaboratory technique which could measure this conclusively. However, we believe that our methodology has usefully questioned whether recurrent clusters are holistically stored, and look forward to exploring this question with other research techniques and with larger numbers of clusters and participants.

Conclusion

Corpus data is very useful in identifying recurrent clusters in language production. This will continue to be of considerable use in applied linguistic applications. However, this study suggests that corpus data on its own is a poor indicator of whether those clusters are actually stored in the mind as wholes. There seems to have been an unspoken assumption that corpus data is somehow psycholinguistically valid, and in many senses this must be true because the language in corpora has been produced by people using language and so must reflect language competence to some extent. However, this study suggests that it is unwise to take recurrence of clusters in a corpus as evidence that those clusters are also stored as formulaic sequences in the mind. Corpus and psycholinguistic approaches complement each other, and unsurprisingly it seems we need both in order to explain how language is processed and used.

Notes . Proﬁciency levels had been noted by the researcher either in terms of recognised language qualiﬁcations and/or through personal judgement based on experience of foreign language speakers of English and regular contact with many of the participants.

References

Bailey, K. M. 1998. Learning about Language Assessment. Cambridge MA: Heinle and Heinle.
Biber, D., Johansson, S., Leech, G., Conrad, S., and Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Longman.
Carter, R. and McCarthy, M. In press. The Cambridge Advanced Grammar of English. Cambridge: CUP.
Cowie, A. P. 1998. Phraseology: Theory, Analysis, and Applications. Oxford: OUP.
Dagut, M. and Laufer, B. 1985. Avoidance of phrasal verbs: a case for contrastive analysis. Studies in Second Language Acquisition 7: 73–80.
DeCarrico, J. and Larsen-Freeman, D. 2002. Grammar. In An Introduction to Applied Linguistics, N. Schmitt (ed.), 19–34. London: Arnold.
De Cock, S. 2000. Repetitive phrasal chunkiness and advanced EFL speech and writing. In Corpus Linguistics and Linguistic Theory, C. Mair and M. Hundt (eds), 51–68. Amsterdam: Rodopi.
Ellis, N. C. 2003. Constructions, chunking, and connectionism: The emergence of second language structure. In The Handbook of Second Language Acquisition, C. J. Doughty and M. H. Long (eds), 63–103. Malden MA: Blackwell.
Erman, B. and Warren, B. 2000. The idiom principle and the open-choice principle. Text 20: 29–62.
Fountain, R. L. and Nation, I. S. P. 2000. A vocabulary-based graded dictation test. RELC Journal 31: 29–44.
Granger, S. 1998. Prefabricated patterns in advanced EFL writing: Collocations and formulae. In Phraseology: Theory, Analysis, and Applications, A. P. Cowie (ed.), 145–160. Oxford: OUP.
Hakuta, K. 1976. A case study of a Japanese child learning ESL. Language Learning 26: 321–352.
Howarth, P. 1998. The phraseology of learners’ academic writing. In Phraseology: Theory, Analysis, and Applications, A. P. Cowie (ed.), 161–186. Oxford: OUP.
Hyland, K. 2000. Disciplinary Discourses: Social Interactions in Academic Writing. Harlow: Longman.
Moon, R. 1998. Fixed Expressions and Idioms in English: A Corpus-based Approach. Oxford: Clarendon Press.
Nattinger, J. R. and DeCarrico, J. S. 1992. Lexical Phrases and Language Teaching. Oxford: OUP.
Pawley, A. and Syder, F. H. 1983. Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In Language and Communication, J. C. Richards and R. W. Schmidt (eds), 191–225. London: Longman.
Peters, A. 1983. The Units of Language Acquisition. Cambridge: CUP.
Pine, J. M. and Lieven, E. V. M. 1993. Reanalysing rote-learned phrases: Individual differences in the transition to multi-word speech. Journal of Child Language 20: 551–571.
Rosenberg, S. 1977. Semantic constraints on sentence production: An experimental approach. In Sentence Production, S. Rosenberg (ed.), 195–228. New York: John Wiley.
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: OUP.
Van Lancker, D., Canter, G. J. and Terbeek, D. 1981. Disambiguation of ditropic sentences: Acoustic and phonetic cues. Journal of Speech and Hearing Research 24: 330–335.
Vihman, M. M. 1982. Formulas in first and second language acquisition. In Exceptional Language and Linguistics, L. Obler and L. Menn (eds), 261–284. New York: Academic Press.
Wong Fillmore, L. 1976. The Second Time Around: Cognitive and Social Strategies in Second Language Acquisition. Unpublished PhD thesis, Stanford University.
Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: CUP.
Yorio, C. A. 1989. Idiomaticity as an indicator of second language proficiency. In Bilingualism across the Lifespan, K. Hyltenstam and L. K. Obler (eds), 55–72. Cambridge: CUP.

49

Appendix

Note: The target clusters are indicated in bold underlined script. The target clusters are numbered with bold numbers. The numerical dual performance task is illustrated for each cluster.

The boring hitchhiker who wouldn’t stop talking

1. I’m going to tell you about the worst car journey I’ve ever had.

9+14

2. It happened one cold day last winter, when I was driving up to Scotland to spend the Christmas holidays with my friends.

16+28

3. I’d seen the hitchhiker by a roundabout near Sheﬃeld and stopped and gave him a lift. I thought I could use the company.

17+6

4/1. I never should have picked him up you know. But I was bored and had another 200 miles to go on my journey.

52+29

5. Although he eventually rode with me forever, as far as I knew he was originally going to get out at Leeds.

75+26

6/2. He said he was going to visit an aunt there, who was a taxi driver or something like that.

44+36

7. But as we drove along the road and I passed a few cars, things rather quickly began to change.

9+13

8/3. As a matter of fact, by the time we approached Leeds, I realised he had no intention of getting out of the car.

55+47

9. He wasn’t going to let me stop, as he kept talking about any subject that happened to pop into his mind.

8+28

10/4. ‘Do you like reading?’ he asked me rather suddenly. ‘It is one of the most relaxing things in the world, isn’t it?’

7+17

11. I made an eﬀort to be polite, but it was diﬃcult to smile and join in the conversation because he didn’t stop talking.

23+9

12/5. ‘On and oﬀ,’ I replied. I don’t have time to read novels at the moment because of work, although I like to usually.

49+63

13/6. He started looking through my Cosmopolitan magazine and said, ‘It’s not too bad this one, although I don’t usually read women’s magazines you understand.’

37+84

14/7. ‘Most of them have too many pictures and no stories. And there are far too many advertisements for me. Look at this for example.’

9+14

50

Norbert Schmitt, Sarah Grandage, and Svenja Adolphs 15/8. ‘Women’s sweaters in a variety of lengths and colours and they are asking you to pay a hundred and ﬁfty pounds for them!’

74+27

16/9. ‘Would you pay that? Look. This one, as shown in Figure 1 opposite.’ I glanced over at the page he was holding up.

66+17

17/10. Suddenly, there was a loud beep from behind. I was in the middle of the road, heading for the opposite wall.

49+37

18. I moved back to the left side, and made a mental note to myself not to be distracted like that again.

15+9

19/11. The hitchhiker kept talking. ‘Did you know there has been a sharp increase in the number of teenager drivers caught driving drunk?’

36+45

20/12. ‘I mean, what I want to ask these people is why do they do something so dangerous to themselves and other people?’

75+48

21/13. I didn’t answer, letting his voice drift over me in the same way as the snow drifted over the hills in the distance.

93+36

22/14. ‘It says here in the magazine that as a consequence of social problems, drink driving has increased. I mean that’s nonsense isn’t it?’

16+27

23/15. ‘Certainly not everyone who has social problems ends up drink driving night and day like some of these youngsters seem to nowadays.’

28+45

24/16. He picked up a travel magazine and began looking at camping adventures. I could it was going to be a long journey.

45+37

25. ‘I don’t like camping. If I went to Australia, I would have to stay in a cheap hotel with a bed at least.’

18+28

26/17. ‘I mean, to give you an example, listen to this: ‘After the ﬁrst week we ate mainly wild fruit and ants.’

74+47

27. ‘But the way in which the local people cooked them, over the campﬁre, made them actually taste a bit like peanuts.’

66+45

28/18. ‘But it says here you’ve got to have the ants fresh and have plenty of them. Let the locals have them I say.’

36+83

29. He then looked at an article on survival. ‘Five people tried to survive in the wild for a fortnight, and only one made it.’

4+49

30/19. ‘It says the aim of this study was to test human endurance.’ The hitchhiker was testing mine as he jumped from topic to topic.

17+76

31/20. Then he read out from an advice column. ‘Listen to this,’ he said, ‘I don’t know what to do about my boss.’

46+55

32. ‘Honestly, I would love to meet some of these people who complain. Why don’t they talk to their friends like we’re doing now?’

62+29

33/21. ‘I mean, this boss couldn’t be more clear unless he presented her with a card saying Go Away in big letters on the front.’

37+85

34/22. ‘In addition to the embarrassment if anyone recognised their letter, don’t they think that there are more important things to worry about?’

49+27

35. ‘Like this story on the next page. They’re going to build a dam in India which means thousands of people will lose their homes.’

16+5

36/23. ‘I see what you would want a dam for though, so maybe they could just build a smaller one in its place.’

47+56

37/24. ‘I suppose from the point of view of the economy, it might be useful to build a dam like that, but who’s to say?’

16+28

38. By this time I had to get rid of the hitchhiker. I stopped for petrol and he went to the toilet.

9+14

39/25. To make a long story short, I threw out his pack and drove oﬀ without him. I’ll never pick up a hitchhiker again!

26+77

5

The eyes have it
An eye-movement study into the processing of formulaic sequences

Geoffrey Underwood, Norbert Schmitt, and Adam Galpin

University of Nottingham

Introduction

There is a consensus among applied linguistic scholars that the use of formulaic sequences contributes to fluent, well-formed, and appropriate language (e.g. Pawley and Syder, 1983; Nattinger and DeCarrico, 1992; Wray, 2002; Schmitt and Carter, this volume). The underlying belief is that preformulated sequences of language, which are stored in the mind as wholes, can be recognized and retrieved with a minimum amount of processing effort, which facilitates quick and accurate language use. However, the actual mechanics of the processing of formulaic sequences have been inadequately researched. Much of the research into formulaic sequences has either been corpus-based and descriptive, or acquisition-based, which focuses on sequences which have been produced by language novices, either L1 or L2. There has been relatively little use of the rigorous experimental paradigms from the field of psychology which could shed light on the underlying processing mechanisms. This study will take advantage of one such methodology, the study of eye-movement during the reading of texts, to explore how formulaic sequences are processed, by investigating how they are read in context.

The eye-movement paradigm

When reading a page of text our eyes do not move in a continuous sweep across the page but rather the movement tends to be noticeably jerky, stopping several times a second to inspect a word. Occasionally the reader may choose to move back to a part of the text that they have previously fixated upon or they may jump over several words to land in a previously uninspected part of the page. After the jerky movement or saccade has been completed, the eyes come to rest, and this resting time is known as a fixation. Typically, readers fixate for 200–250 msec between saccadic movements that last 20–30 msec, and information is extracted from the page only while their eyes are stationary. Although readers of alphabetic scripts such as English move their eyes in a regular left-to-right fashion, they occasionally do go back to a point in the text that may have been previously fixated, or to text that may have been passed over during a saccade. These return fixations are known as regressions and they occur on average 10–15% of the time for a normal adult reader. Regressive fixations are usually launched to areas of the text that have caused linguistic confusion, or contain particularly difficult words. More detailed descriptions of the characteristics of readers’ eye movements can be found in Rayner (1998), Underwood and Batt (1996) and Underwood (1998).

The appeal of measuring eye movements is that they give an indication of what processes are occurring in the reader’s mind. This assumption is based on the reports that the numbers of regressions and forward fixations increase with text difficulty, and that these fixations tend to be of a longer duration than those associated with less complex text. Furthermore, poor readers tend to make more regressive fixations on a piece of text than good readers (Tinker, 1958).

There is a considerable body of evidence which supports Just and Carpenter’s (1980) theory that fixations provide an “on-line” indication of reading difficulty that also involves moment-to-moment control of the dynamics of reading. When fixating a relatively important part of the field, our eyes will remain stationary for a duration that is indicative of the increased amount of processing that is being performed. The extreme version of this theory proposes that words that are not fixated are not processed. Just and Carpenter’s theory is in fact based on two assumptions, the immediacy assumption which states that “the reader tries to interpret each content word of a text as it is encountered”, and the eye-mind assumption that “the eye remains fixated on a word as long as the word is being processed. So the time it takes to process a newly fixated word is directly indicated by the gaze duration” (Just and Carpenter, 1980: 330).

Support for this model comes from a variety of sources in which high and low frequency words are embedded in sentences that are to be read for comprehension. Word frequency is a potent determiner of fixation duration. For example, Inhoff and Rayner (1986) and Rayner and Duffy (1986) compared the fixations on sentences such as The heavy rain damaged the crops with those on The heavy hail damaged the crops. The word hail has a lower frequency of occurrence than the word rain, and the fixation durations on these two target words were 262 msec and 225 msec respectively. As frequency decreases, so the amount of time required to extract the necessary information from the word increases. This effect is not a product of the relationship between frequency and length. Low frequency words do tend to contain more letters than high frequency words, but when words of similar length are compared, high frequency words gain shorter fixations, and this holds for short words as well as for longer words (e.g., Underwood, Binns & Walker, 2000). Words that need more visual processing receive longer fixations, and explaining the frequency effect is a primary goal of theoretical models of eye movement control in reading (Reichle, Pollatsek, Fisher & Rayner, 1998).

Just and Carpenter (1980) have provided data which showed that during the reading of paragraphs taken from scientific text, the length of the inspection was directly related to the difficulty of processing. For example, one participant looked at the word question for 300 msec, whilst they looked at the equally long but less frequent word transfer for 633 msec. Additional support for the on-line approach comes again from Carpenter and Just (1983) who showed that gaze duration on a target was not influenced by the length or frequency of the preceding word. They concluded that cognition is locked on to fixation and that there is no influence of material prior to or ahead of fixation; thus fixation durations are indicative of the processing of the word that is being fixated.

A major source of evidence that suggests that our eyes are under the control of the cognitive processes involved in sentence comprehension comes from studies of the sensitivity to sentence contexts. Ehrlich and Rayner (1981) showed that during the reading of passages participants fixated words that were predicted by the preceding context less often (51% of the time) than words appearing in neutral contexts (fixated 62% of the time). If the target was predictable and it was fixated, the fixation duration was shorter than if the same target had been fixated but appeared in neutral context (221 vs 254 msec). Words that are to some extent predictable by their preceding contexts can be thought of as being easier to recognize, and this ease of processing is again indicated by shorter fixation durations.

In the present study we asked whether the short contexts available in familiar idioms and other formulaic sequences such as on the other hand and as a matter of fact can also provide sufficient context to facilitate the processing of their terminating word, and also whether this facilitation effect would be seen in a group of readers less familiar with these English expressions.
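The distinction drawn above between forward movements, regressions, and skipped words can be made concrete with a minimal sketch. It assumes that each fixation has already been assigned to a word position in the text; the scanpath values are invented for illustration and are not data from this study.

```python
from typing import List, Tuple


def classify_moves(fixated_words: List[int]) -> Tuple[int, int, int]:
    """Count forward moves, refixations and regressions between successive fixations.

    fixated_words holds the index of the word under each fixation, in the
    order in which the fixations occurred (hypothetical input format).
    """
    forward = refixation = regression = 0
    for prev, curr in zip(fixated_words, fixated_words[1:]):
        if curr > prev:
            forward += 1        # eye moved on to later text
        elif curr == prev:
            refixation += 1     # eye stayed on the same word
        else:
            regression += 1     # eye returned to earlier text
    return forward, refixation, regression


# Invented scanpath over a ten-word sentence (word indices 0-9).
scanpath = [0, 1, 3, 4, 2, 4, 5, 7, 8, 8, 9]
forward, refixation, regression = classify_moves(scanpath)
regression_rate = regression / (forward + refixation + regression)
skipped_words = sorted(set(range(max(scanpath) + 1)) - set(scanpath))

print(forward, refixation, regression)  # 8 1 1
print(regression_rate)                  # 0.1
print(skipped_words)                    # [6]
```

In this toy scanpath one move in ten is a regression, which happens to fall within the 10–15% range quoted above for adult readers, and word 6 is never fixated.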

55

56

Geoﬀrey Underwood, Norbert Schmitt, and Adam Galpin

Methodology

Selection of the target formulaic sequences

This study is focused on the processing, rather than identification, of formulaic sequences, and so we wished to use unambiguous cases as our targets. We also wished to include a range of formulaic sequence types, including lexical phrases (Nattinger and DeCarrico, 1992), transparent metaphors, sayings/proverbs, and idioms. To compile a list of potential formulaic sequences for this study, the lists used in the Schmitt, Dörnyei, Adolphs, and Durow (this volume) acquisition study were first consulted and 45 potential candidates were identified for the lexical phrases category. In order to obtain clear cases of the other categories, the Oxford Learner’s Dictionary of English Idioms (1994) was consulted and an additional 40 candidates were extracted. The 85 candidate phrases were then subjected to a frequency analysis in two corpora: the British National Corpus and the CANCODE. Candidates with relatively low frequencies were deleted from the list.

Beyond frequency, the technicalities of the eye-movement methodology (see Procedure below) meant that a sequence had to meet certain additional criteria to remain a candidate:

• the sequence had a relatively obvious beginning, i.e. it did not begin with several function words
• the sequence did not finish with a function word
• the sequence was 4–8 words long
• the sequence was relatively predictable from its initial components.

The assumption was that the more frequent sequences were also more likely to be well-known. To confirm this assumption, the remaining 21 formulaic sequences were embedded in a modified cloze test with short contexts, such as the following example:

Steve thinks Sue is quite pretty, but I don’t think so at all. But as they say, “Beauty is in the e___ o___ t___ b___.”

This instrument was given to 30 native ﬁrst-year undergraduates. One sequence was produced by only four participants and was eliminated. The remaining twenty sequences were all well-known, being produced by 28–30 participants. The two exceptions were the straw that broke the camel’s back (19) and keep your nose to the grindstone (17), which were still known by the majority of participants.
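The modified cloze format, in which only the first letter of each final word is given, can be sketched as follows. This is an illustrative reconstruction rather than the authors’ procedure: the function name make_cloze and the choice of how many words to blank are assumptions, and the full proverb is supplied from general knowledge.

```python
def make_cloze(sequence: str, n_blank: int) -> str:
    """Blank the last n_blank words of a sequence, keeping only their first letters."""
    words = sequence.split()
    if not 0 < n_blank <= len(words):
        raise ValueError("n_blank must be between 1 and the number of words")
    kept = words[:-n_blank]
    blanked = [word[0] + "___" for word in words[-n_blank:]]
    return " ".join(kept + blanked)


# Reproduces the item shown in the example above.
print(make_cloze("Beauty is in the eye of the beholder", 4))
# Beauty is in the e___ o___ t___ b___
```

A sequence would then count as well-known if most respondents could supply the blanked words, as with the 28–30 out of 30 participants reported above.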


The twenty sequences were then embedded in twenty extended contexts, with each context story containing one target formulaic sequence. In addition each context contained the terminal word from a formulaic sequence from another passage. By comparing terminal words when they appear in a formulaic sequence and when they appear in non-formulaic text, we are able to control for any individual characteristics of the words that may prompt variability in fixation behaviour, such as word length, word frequency or part of speech. In the example below, the target sequence is beat around the bush and the non-formulaic terminal word is basket, from the idiom put all your eggs in one basket in another context story.

You’ve been talking in circles for 30 minutes trying to tell me something. Please don’t beat around the bush for another half an hour, but just get to the point and tell me! If it was you who dropped my flower basket, don’t worry because I won’t be angry with you.

The contexts were subjected to frequency analysis using The Compleat Lexical Tutor (v.2) (Cobb, 2003) to ensure that low frequency vocabulary was kept to a minimum, so that non-native speakers would have no problems reading the context stories. Finally, a simple comprehension question for each context was devised to ensure participants read the contexts conscientiously. The question for the passage above is:

Did someone drop the flower basket? (Answer: Yes)

The vocabulary-controlled contexts were then formatted in Word using black Helvetica font, size 8, spacing = 0.5, with line spacing set to double. Care was taken to ensure that each target sequence appeared near the middle of its line in the passage, and was not split between lines. Finally, this text was pasted and centred onto plain white bitmaps of dimension 1024 × 768 pixels for display on the apparatus computer monitor. (See appendix for the complete passages.)

Apparatus

An SMI Eyelink system was used to take eye-movement measures. In the Eyelink system, a head-mounted high-speed camera takes an image of the right pupil every 4 ms, and an on-line parser uses a velocity threshold of 30°/second to allocate samples into saccades with the resting point between them defined as fixations or blinks. A chin rest was used to minimise head movements. Passages were displayed on a 36 × 27 cm monitor with a resolution of 1024 × 768 pixels.
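The idea of a velocity-threshold parser can be illustrated with a minimal sketch. It is not the Eyelink parser itself, whose internals are not described here: it assumes gaze position is already expressed in degrees, uses horizontal position only, and ignores blinks. Only the 4 ms sampling interval and the 30°/second threshold are taken from the text; the function name and the sample trace are invented.

```python
from typing import List, Tuple

SAMPLE_INTERVAL_MS = 4            # one pupil image every 4 ms (from the text)
VELOCITY_THRESHOLD_DEG_S = 30.0   # saccade velocity threshold (from the text)


def detect_fixations(x_deg: List[float]) -> List[Tuple[int, int]]:
    """Return (onset_ms, duration_ms) for each run of below-threshold samples."""
    fixations: List[Tuple[int, int]] = []
    run_start = None  # index of the first sample in the current slow run
    for i in range(1, len(x_deg)):
        velocity = abs(x_deg[i] - x_deg[i - 1]) / (SAMPLE_INTERVAL_MS / 1000)
        if velocity < VELOCITY_THRESHOLD_DEG_S:
            if run_start is None:
                run_start = i - 1
        elif run_start is not None:
            # Velocity crossed the threshold: the slow run ended at sample i - 1.
            fixations.append((run_start * SAMPLE_INTERVAL_MS,
                              (i - 1 - run_start) * SAMPLE_INTERVAL_MS))
            run_start = None
    if run_start is not None:
        last = len(x_deg) - 1
        fixations.append((run_start * SAMPLE_INTERVAL_MS,
                          (last - run_start) * SAMPLE_INTERVAL_MS))
    return fixations


# Invented trace: rest at 2 degrees, a fast 4-degree saccade, rest at 6 degrees.
trace = [2.0] * 50 + [2.8, 3.6, 4.4, 5.2] + [6.0] * 60
print(detect_fixations(trace))  # [(0, 196), (216, 236)]
```

The saccade in this trace moves 0.8° per sample, i.e. 200°/second, well above the threshold, so the two stationary periods on either side of it are returned as separate fixations.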

57

58

Geoﬀrey Underwood, Norbert Schmitt, and Adam Galpin

Procedure

Participants were seated at a fixed viewing distance of 70 cm from the computer monitor with their head resting on a chin rest. The SMI eye-tracking device was then placed on the participant’s head, and the camera positioned at an optimal viewing point to record the activity of the right pupil. A 9-point calibration procedure was then applied, and when successful, the experiment began.

Each trial of the experiment began with a drift-correct display consisting of a centrally presented circle on which the participant needed to maintain a stable fixation. This procedure helps re-align the system with eye position in the event of small head-movements. The experimenter terminated the drift-correct procedure when a satisfactory fixation was achieved. A fixation cross followed in the top left of the screen for 1 second, to allow the participant to position their eyes at the beginning of the text. The fixation cross was then replaced by a passage of text. The participant was able to read each passage freely with no time constraints until they felt able to answer the simple comprehension question, upon which they pressed either the left or right arrow key and the passage was replaced by the question. Each question required either a yes response (right arrow key) or a no response (left arrow key). Participants were told to guess if uncertain. Once a response had been made, the drift-correct screen appeared, marking the onset of the following trial. Each participant read each passage (20 in total) before being debriefed as to the nature of the experiment and paid for participation.
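For readers who want to relate the display geometry to visual angle, the following sketch works through the arithmetic. It assumes the 36 cm dimension is the horizontal width of the screen, that the 1024 pixels span that width, and that the eye is centred on the screen at 70 cm; these are assumptions, since the text gives only the raw dimensions.

```python
import math

VIEWING_DISTANCE_CM = 70.0   # from the Procedure
SCREEN_WIDTH_CM = 36.0       # from the Apparatus (assumed to be the horizontal dimension)
SCREEN_WIDTH_PX = 1024       # from the Apparatus


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle subtended by an object of size_cm centred on the line of sight."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


screen_angle = visual_angle_deg(SCREEN_WIDTH_CM, VIEWING_DISTANCE_CM)
pixel_angle = visual_angle_deg(SCREEN_WIDTH_CM / SCREEN_WIDTH_PX, VIEWING_DISTANCE_CM)

print(f"screen width: {screen_angle:.1f} degrees")
print(f"one pixel:    {pixel_angle:.4f} degrees")
```

Under these assumptions the full screen width subtends a little under 29° and a single pixel roughly 0.03°.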

Participants

Two groups of participants were tested: native and non-native speakers. Each group consisted of 20 mainly postgraduate students studying at the University of Nottingham. Thirteen of the nonnatives had Chinese as their mother tongue, and the rest spoke a variety of L1s. Their degree of L2 competence was not controlled for, but it can be assumed to be relatively high, because they were all studying at an English-medium university, with a minimum undergraduate entrance requirement of CBT TOEFL 213 (Paper TOEFL 550) or IELTS 6.0. All participants had normal or corrected-to-normal vision.

Results

Analysis was conducted on the fixations only. All fixations shorter than 100 ms were removed from analysis, as it is assumed that on-line cognitive processes do not influence such short fixations. The measures that were collected included the mean number of fixations made on all words in the passages, the durations of those fixations, the number of fixations on the terminal words (when in a formulaic sequence and when in a non-formulaic context), and the durations of the fixations on those terminal words. These measures are shown in Table 1.

Table 1. Eye movement measures recorded during the reading of the passages (standard deviations are shown in parentheses)

| Measure | Native speakers | Non-native speakers |
| --- | --- | --- |
| Mean number of fixations on all words in all passages | 0.92 (1.94) | 1.40 (3.14) |
| Mean fixation duration on all words in all passages (msec) | 201 (25.6) | 228 (29.2) |
| Mean number of fixations on terminal words in formulaic sequences | 0.71 (0.24) | 1.37 (0.56) |
| Mean number of fixations on terminal words in non-formulaic contexts | 0.86 (0.30) | 1.46 (0.43) |
| Mean fixation duration on terminal words in formulaic sequences (msec) | 179 (31.5) | 247 (62.2) |
| Mean fixation duration on terminal words in non-formulaic contexts (msec) | 210 (54.3) | 249 (41.3) |

The total number of fixations made on all passages and the durations of these fixations provide an overall indication of differences in the reading dynamics of the two groups of readers. These differences are indicated when informally comparing Figures 1 and 2. Figure 1 shows the pattern of fixations, and their durations, made by one of the native English speakers while reading one of the passages. This contrasts with the pattern in Figure 2 (a non-native speaker reading the same passage), where there are more fixations and the durations are more variable. There is also more variability among participants in the non-native speaker group, relative to the native speakers, and this is indicated in the larger standard deviations shown in Table 1.

Figure 1. Fixations on a passage read by a native English speaker. Each fixation is indicated by a circle, with larger circles indicating longer fixations. The lines joining the fixation-circles are representative of the reader’s saccadic eye movements. In this particular passage, the formulaic sequence is by the skin of his teeth (line 5), and the terminal word from a formulaic sequence that formed part of another passage is the word nine (line 6). Note the regular left-to-right sequence of fixations along each line of text, the high proportion of words that are not fixated at all, and the consistency of fixation durations indicated by the sizes of the circles superimposed on the text.

Figure 2. Fixations on a passage read by a non-native English speaker. Note the greater number of fixations (following forward and regressive movements) and the greater variability of their durations.

Comparisons between readers were made for the two measures using unrelated t-tests. Native speakers made fewer fixations overall than non-native speakers (t(38) = 4.76, p < 0.001), averaging less than one fixation per word in contrast with the non-native speakers’ average of almost one and a half fixations per word. The duration of those fixations also varied, with native speakers dwelling upon each word for reliably less time (t(38) = 3.11, p < 0.01).

The number of fixations on each terminal word (in sequence and out of sequence) was compared for the two groups of readers using a mixed-design analysis of variance. Native speakers fixated the terminal words less often than the non-native speakers (F(1, 38) = 27.7, p < 0.001), and terminal words in the final position in formulaic sequences gained fewer fixations than the same words in non-formulaic contexts (F(1, 38) = 7.24, p < 0.05). The interaction between these factors was not reliable.

The fixation durations on the terminal words were also inspected with a mixed-design analysis of variance. Native speakers had shorter fixations on target words than did non-native speakers (F(1, 38) = 14.55, p < 0.001), and words in formulaic sequences gained shorter fixations (F(1, 38) = 6.37, p < 0.05). These differences were qualified by an interaction (F(1, 38) = 5.50, p < 0.05), which was further inspected with an analysis of simple main effects. For the native speakers, there was a reliable difference between terminal words in and out of formulaic sequences (F(1, 38) = 11.86, p < 0.01), but for non-native speakers the difference between words was not reliable (F < 1). Table 1 indicates a 31 msec difference between formulaic and non-formulaic terminal words for the native speakers, but a difference of only 2 msec for the non-native speakers.
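As a worked check on the reported statistics, the unrelated t-test on overall fixation duration can be recomputed from the summary values in Table 1. The sketch assumes that the parenthesised values are between-participant standard deviations and that each group contains 20 participants, as stated under Participants; the function name is illustrative.

```python
import math


def unrelated_t(mean_a: float, sd_a: float, mean_b: float, sd_b: float, n: int) -> float:
    """Independent-samples t for two groups of equal size n, from summary statistics."""
    pooled_variance = (sd_a ** 2 + sd_b ** 2) / 2
    standard_error = math.sqrt(pooled_variance * 2 / n)
    return (mean_b - mean_a) / standard_error


# Mean fixation duration on all words (msec), from Table 1.
t_duration = unrelated_t(201, 25.6, 228, 29.2, n=20)
degrees_of_freedom = 2 * 20 - 2

# Formulaic advantage in fixation duration on terminal words (msec), from Table 1.
native_advantage = 210 - 179
nonnative_advantage = 249 - 247

print(round(t_duration, 2), degrees_of_freedom)  # 3.11 38
print(native_advantage, nonnative_advantage)     # 31 2
```

Under these assumptions the recomputed value agrees with the reported t(38) = 3.11, and the simple subtractions give the 31 msec and 2 msec differences cited in the text.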

Discussion

The results show that the native speakers were more fluent readers than the nonnative participants. The advantage for the natives was consistent across the various measures, including fewer and shorter fixations on all words in the twenty contexts, and fewer and shorter fixations on the terminal words. Although it is unsurprising that the natives would be more proficient readers, the nonnatives were relatively advanced in their English, studying at the same university as the natives and having passed the university’s language entrance requirements. Thus, the short context stories, in which low frequency vocabulary had been controlled, should not have proved overly challenging, but it still seems that even relatively proficient nonnatives process written text less efficiently than educated natives. This is indicated by the fact that nonnatives fixated on each word 1.4 times on average, and is particularly obvious when we observe the actual tracking during reading, as illustrated in Figure 2. The nonnatives tended to have many regressions, and most of the words were fixated, often more than once. Conversely, the natives had relatively uniform fixations, evenly spaced through the text (Figure 1). Natives apparently need to sample less of the text than nonnatives, mainly the content words, with many function words remaining unsampled. This efficiency of sampling also held true for the terminal words where natives fixated less than the nonnatives in both formulaic/nonformulaic conditions. This result reflects the general reading advantage of the natives.

But the key comparison is between terminal words within and outside of formulaic sequences. Both participant groups fixated words less often when those words were part of a formulaic sequence than when those words were embedded in non-formulaic text. This means that the participants had less need to sample those words when they were in formulaic sequences. The obvious explanation is that the participants were better able to predict these terminal words based on the earlier part of the formulaic sequences. Ehrlich and Rayner’s (1981) participants fixated words that were predicted by the preceding context less often and more quickly than words appearing in neutral contexts, and it seems that the context provided by a formulaic sequence itself is enough to facilitate the processing of the terminating word of that sequence. This is largely consistent with the view that such sequences are stored and processed as wholes. Once a sequence is recognized, there should be less need to sample the end of the sequence, simply because the person already knows what that ending is.

It could be argued, however, that if formulaic sequences are processed as wholes, there would be no need to sample the ends, leading to the expectation that subsequent fixations would be beyond the terminal word. We did not find this, with terminal words drawing fixations in the majority of cases. It may be that the mind still fixates on terminal words, albeit briefly, as a kind of check in case the word string appears to be a formulaic sequence, but is in fact not. Let’s take the sequence black sheep of the family for example. It occurs eight times in the British National Corpus, but similar strings black sheep of the financial world, black sheep of the independent sector, black sheep of the industry each appear once. Thus, the string black sheep of usually predicts the family, but the mind must allow for the creative use of language, where these formulaic sequences are manipulated for effect, precisely because it can be assumed that people know the original form. It should be stressed, however, that when we looked at the corpus evidence for the formulaic sequences in this study, they were almost exclusively used in their original forms, which reinforces the predictive power of the beginning segments of formulaic sequences.
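The predictive power of black sheep of can be put in rough numerical terms using only the counts quoted above. This is a sketch under a stated assumption: the four completions listed in the text are treated as the complete set, although the corpus may contain others that are not mentioned.

```python
from collections import Counter

# Completions of "black sheep of ..." with the corpus counts quoted in the text.
completions = Counter({
    "the family": 8,
    "the financial world": 1,
    "the independent sector": 1,
    "the industry": 1,
})

total = sum(completions.values())
p_family = completions["the family"] / total

print(total)               # 11
print(round(p_family, 2))  # 0.73
```

On these figures the opening black sheep of is followed by the family in 8 of 11 cases, about 73% of the time, which is high but leaves room for the creative variants that a brief check on the terminal word would catch.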
Another possible explanation for why the terminal words were ﬁxated is that the mechanism controlling the reader’s ﬁxations is unable to advance the saccade accurately enough to skip the complete sequence, even though this would be most eﬃcient. This would suggest that skipping is not determined by contextual predictability. Current models of eye guidance during reading also propose that decisions to skip words are informed by the extraction of visual information about words that are not currently ﬁxated. The E-Z Reader model of eye guidance will be discussed in more detail later in this section. It is interesting to note that the nonnatives also had fewer ﬁxations on the terminal words when in a sequence. Although they needed more ﬁxations than the natives on average, they still seemed to require fewer ﬁxations at the end of a formulaic sequence than in the middle of a nonformulaic text. In other words, even though nonnatives were not as proﬁcient at reading as the natives, the nonnatives still demonstrated the same type of processing advantage when it came

An eye-movement study into the processing of formulaic sequences

to terminal words. However, this advantage only held in terms of number of ﬁxations, not in terms of the duration of ﬁxation. Although they needed fewer ﬁxations of the terminal words in sequences, they needed to look at these words just as long when in sequences as when not in sequences. The natives, on the other hand, required a much shorter gaze when the terminal words were in sequence than when not. Given the current state of knowledge regarding the processing of formulaic sequences, it is diﬃcult to explain why the nonnatives required fewer ﬁxations but an equally long gaze time for terminal words in sequences. We could speculate that mastering the recognition of formulaic sequences in written texts is an incremental process, and early partial mastery is rewarded mainly by not needing to ﬁxate on the vocabulary in a text as much, but it is only with fuller mastery that the requirement for a “full duration” ﬁxation lessens. This problem of a dissociation between the number of ﬁxations and the duration of those ﬁxations, seen in the reading of the non-native speakers, can be resolved by considering a current theory of eye movement control in skilled readers. The E-Z Reader model proposed by Reichle, Pollatsek, Fisher & Rayner (1998) is an account of where readers look, and for how long, and takes account of a range of behaviours (see also updated versions of the model by Reichle, Rayner & Pollatsek, 1999, 2003, and by Rayner, Reichle & Pollatsek, 2000). For example, the longer ﬁxations on uncommon words, the skipping of highly predictable words, the ‘spill-over’ of processing from one word to the next, and longer saccades into longer words, are all predicted by the E-Z Reader model, which has been tested against the recorded eye movements of adult readers. 
To see how the model can account for our non-natives showing sensitivity to the appearance of a terminal word in a formulaic sequence in their fixation probabilities but not in their fixation durations, we need to describe the model in a little detail. The E-Z Reader model proposes that eye movement control is achieved through a series of processing stages, some of which influence the decision about where to move next, and some of which influence the decision about when to move our eyes. These processes are as follows:

1. Familiarity Check. In this stage a newly fixated word is assessed for its familiarity, determined mainly by the word’s frequency of occurrence in the language. Unfamiliar words will take longer here, and this is the first point at which word frequency will influence fixation duration. This is the frequency given by a word corpus such as the one used in the present experiment, although it must be recognised that a word corpus is an estimate and is an average for a population. Individuals within the population will have their own lexicon, in which each word will have its own frequency. This frequency will reflect the reader’s own personal interests and domain of expertise. It will also change on a daily basis, as words are encountered. (Consider, for example, the subjective frequency of the generally infrequent word metatarsal for a fan of English football at the start of the 2002 World Cup, when one of their favourite players broke this bone.) The predictability of the word, as determined by its context, will also influence this assessment.

2. Lexical Access. In this stage the word is recognised in that its lexical representation is contacted by the visual input, and the word becomes available for whatever syntactic and semantic processing the reader requires. The word’s frequency and context will influence the ease of lexical access, as with the Familiarity Check. These two processes together constitute the word recognition system, and are separated so that once familiarity is determined and indicative of imminent recognition (but before the full lexical access is achieved), a signal can be sent to the oculomotor system to start programming the next saccadic eye movement. The major advantage of separating the Familiarity Check from Lexical Access is that this decouples the signal to program a saccade from the signal to shift attention. In turn, this allows the model to explain ‘spill-over effects’ whereby processing of a difficult word continues to have an influence when the reader’s eyes have moved to the next word (WordN+1). If the reader’s eyes can move before lexical access is completed, then any residual lexical activity would be apparent when the next word was being fixated.

3. Early Saccadic Programming.
The first stage of saccadic programming is said to be labile, in that it can be modified by information that is collected before this stage is completed. During this labile stage of processing the following sequence is possible. A decision can be made to move to the next word following completion of the Familiarity Check; attention then moves to the next word and a Familiarity Check on that word establishes that it is very familiar; and at this point the saccade to that word can be cancelled. This early extraction of visual information from the next word can result in skipping, but only if this stage of saccadic programming is labile can the movement be cancelled, to allow the reader to skip the fixation on the next word (WordN+1).

4. Late Saccadic Programming. During the course of programming a saccadic movement a threshold is reached after which programming is no longer labile, and the saccade will be executed. At this point the saccadic movement is obligatory, and will be executed upon completion of programming.

5. Saccadic Movement. Saccades are usually regarded as ballistic movements, in that once initiated they cannot be modified. The eyes are projected towards a target just as a ball is thrown from one player on a sports field to another: once it leaves the thrower’s hand the trajectory can no longer be modified by the thrower. The characteristics of saccadic movements are no longer influenced by linguistic factors once the non-labile programming stage is reached.

This powerful model of eye movement control accounts for the major phenomena observed when adults read sentences. High word frequency and high word predictability have their effects at the first two stages, by allowing them to be completed early, which in turn allows saccadic programming to start early. Word skipping is explained by recognition of the familiarity of the next word before it is fixated, at a point when saccadic programming is labile.

We can now look at the processing of formulaic sequences with the E-Z Reader model, and speculate on differences between native and non-native speakers. When a native speaker reads a formulaic sequence of words such as I can see what you mean or the black sheep of the family, the words become more predictable as they progress through the sequence, and the final word (WordN) is almost redundant. The Familiarity Check would be completed earlier than for the equivalent terminal word placed in a non-formulaic text, thereby allowing faster word recognition overall. This has two consequences. Because the final word in the sequence is recognised early, the signal to begin the saccadic programme is started early, and so the reader’s eyes fixate the target word for less time than otherwise.
This is manifest in the reduced fixation duration on final words in formulaic sentences compared to the same words in non-formulaic sequences (0.71 fixations per word vs. 0.86 fixations per word). When looking at the penultimate word in a sequence (WordN−1) the Familiarity Check would allow the reader to ascertain that the sequence is predictive and the words familiar, and attention would move to the final word (WordN). The Familiarity Check on this word, performed while the reader’s eyes remain on WordN−1, would also conclude that the word is familiar. If this Check on WordN (the target word) is completed during the labile Early Saccadic Programming stage, then a decision can be reached to skip the target word. Not all target words were fixated by the native speakers, and so we can conclude that the Familiarity Check did indeed enable the skipping decision to be made.
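The skipping decision described above can be illustrated with a deliberately simplified sketch. It is not the E-Z Reader model itself: the timing constants, the linear "ease" formula, and the names `Word`, `familiarity_check_ms` and `is_skipped` are all hypothetical, chosen only to show how a faster Familiarity Check on a predictable word could finish inside the labile programming window and cancel the saccade to that word.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    frequency: float       # hypothetical normalised frequency, 0..1
    predictability: float  # hypothetical contextual predictability, 0..1


# Illustrative timings in msec; these are assumptions, not published parameters.
BASE_CHECK_MS = 250.0
LABILE_WINDOW_MS = 120.0


def familiarity_check_ms(word: Word) -> float:
    """Toy Familiarity Check: quicker for frequent and predictable words."""
    ease = 0.5 * word.frequency + 0.5 * word.predictability
    return BASE_CHECK_MS * (1.0 - ease)


def is_skipped(next_word: Word) -> bool:
    """The next word is skipped if its Familiarity Check completes while
    saccadic programming is still labile (so the saccade can be cancelled)."""
    return familiarity_check_ms(next_word) <= LABILE_WINDOW_MS


# Same terminal word, once inside a formulaic sequence and once in a neutral context.
in_sequence = Word("family", frequency=0.6, predictability=0.9)
in_control = Word("family", frequency=0.6, predictability=0.1)

print(is_skipped(in_sequence), is_skipped(in_control))
```

Under these assumed values only the highly predictable terminal word is skipped; changing the constants changes the outcome, which is the point of treating this as a sketch rather than a fitted model.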

65

66

Geoﬀrey Underwood, Norbert Schmitt, and Adam Galpin

The non-native speakers had longer fixation durations overall when reading the passages, suggesting that their personal frequencies of the words being shown were not as high as those of the native speakers. This is a product of their lifetime’s exposure to these words. When a non-native speaker encountered a formulaic sequence, the pattern of fixations was slightly different to the pattern seen in native speakers. Whereas they did not show an effect of predictability on fixation duration, they did show an effect on the number of fixations. The effect of predictability upon fixation probability may be a product of a relatively slow Familiarity Check resulting in an intra-word saccade (see Rayner, Reichle & Pollatsek, 2000). There were 1.46 fixations on each target word not in a sequence, in contrast with 1.37 fixations for the same words in formulaic sequences, but note that in both cases there is more than one fixation per word. These words received multiple fixations, and the average over all words was 1.40 fixations per word for non-native speakers. Their tendency was to fixate, and then sometimes refixate. There was moderation of this decision, with a greater probability of refixation when the final word (WordN) was not read in the predictive context of a formulaic sequence. Information about the word, collected during the word recognition stages, was used to influence the decision as to whether to make an intra-word or inter-word saccade, but this did not influence the duration of fixation on WordN. For the non-native speakers, each word was fixated, on average, more than once, but, as can be seen in Figure 2, these multiple fixations consisted of refixations (i.e., making another fixation on the word before making a movement to the next word) and regressive fixations (i.e., returning to a word after reading other words).
The multiple fixations of non-native speakers varied according to whether the target word completed a formulaic sequence or appeared in a neutral context, but their durations did not vary. The E-Z Reader model does not offer a specific explanation for this pattern, which suggests that the processing of formulaic sequences varies as a result of post-recognition processes. Recognition of the target word did not vary, but decisions about where to fixate next did show sensitivity to these phrases. Why might the non-native speakers choose to refixate or regress to the final word of a formulaic sequence? The present study does not provide an answer, but does suggest that the decision to do so occurs after all of the words in the sequence have been recognised, and therefore may result from uncertainty or lack of confidence in their comprehension.

An issue we did not directly address in this study is whether the nonnatives actually knew the formulaic sequences or not. We know from the selection criteria that the natives were very likely to know all or almost all of the formulaic sequences, but this may not hold true for all of the nonnatives. When we tried to follow this up, only six of the nonnative participants were available. A Dutch and a German participant knew 17 and 20 sequences respectively, but three Chinese speakers (more representative of the nonnative group) knew 9, 10, and 12. A Japanese participant knew 10 out of the 20 sequences. Overall, it appears that for many of the nonnatives, a considerable number of the formulaic sequences were unknown. Still, the nonnatives did show an advantage for the terminal words in the formulaic sequences, which may indicate partial knowledge of the target sequences, but knowledge which had not reached the level where participants could consciously define the sequences. The issue of partial knowledge is an intriguing one which could be usefully explored in future research.

Conclusion

This study has applied the eye movement research paradigm from psychology to explore the question of how formulaic sequences are processed. We now have evidence that the terminal words in formulaic sequences are processed more quickly than the same words when in nonformulaic contexts. This provides evidence for the position that formulaic sequences are stored and processed holistically. But there are still many questions regarding the exact nature of the processing, for example, why the nonnatives were found to use fewer, but not shorter, fixations of the terminal words. Another issue is how the words in a formulaic sequence relate to each other in terms of processing. This study showed the value of eye-movement methodology in exploring the terminal word of formulaic sequences, but could not investigate each word in a sequence, simply because natives in particular do not sample all words in a text. Another methodology is required which can explore the processing of formulaic sequences in a word-by-word manner. Schmitt and Underwood (this volume) use a self-paced reading methodology in an attempt to do this. Given the widely recognized importance of formulaic sequences, it is now time to use all of the tools available in the psycholinguistic toolkit to investigate these items.

References

Carpenter, P. A. and Just, M. A. 1983. What your eyes do while your mind is reading. In Eye Movements in Reading: Perceptual and Language Processes, K. Rayner (ed.), 275–307. New York: Academic Press.
Cobb, T. The Compleat Lexical Tutor (v.2). Internet resource available at . Accessed February 2003.
Ehrlich, S. F. and Rayner, K. 1981. Contextual effects on word perception and eye movements during reading. Journal of Verbal Learning and Verbal Behavior 20: 641–655.
Inhoff, A. W. and Rayner, K. 1986. Parafoveal word processing during eye fixations in reading: Effects of word frequency. Perception & Psychophysics 40: 431–439.
Just, M. A. and Carpenter, P. A. 1980. A theory of reading: From eye fixations to comprehension. Psychological Review 87: 329–354.
Nattinger, J. R. and DeCarrico, J. S. 1992. Lexical Phrases and Language Teaching. Oxford: OUP.
Pawley, A. and Syder, F. H. 1983. Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In Language and Communication, J. C. Richards and R. W. Schmidt (eds), 191–225. London: Longman.
Rayner, K. 1998. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin 124: 372–422.
Rayner, K. and Duffy, S. A. 1986. Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical ambiguity. Memory & Cognition 14: 191–201.
Rayner, K., Reichle, E. D., and Pollatsek, A. 2000. Eye movement control in reading: Updating the E-Z Reader model to account for initial fixation locations and refixations. In Reading as a Perceptual Process, A. Kennedy, R. Radach, D. Heller, and J. Pynte (eds), 701–719. Oxford: Elsevier.
Reichle, E. D., Pollatsek, A., Fisher, D. L., and Rayner, K. 1998. Toward a model of eye movement control in reading. Psychological Review 105: 125–157.
Reichle, E. D., Rayner, K., and Pollatsek, A. 1999. Eye movement control in reading: Accounting for initial fixation locations and refixations within the E-Z Reader model. Vision Research 39: 4403–4411.
Reichle, E. D., Rayner, K., and Pollatsek, A. 2003. The E-Z Reader model of eye movement control in reading: Comparisons to other models. Behavioral and Brain Sciences, in press.
Tinker, M. A. 1958. Recent studies of eye movements in reading. Psychological Bulletin 55: 215–231.
Underwood, G. (ed.). 1998. Eye Guidance in Reading and Scene Perception. Oxford: Elsevier.
Underwood, G. and Batt, V. 1996. Reading and Understanding. Oxford: Blackwells.
Underwood, G., Binns, A., and Walker, S. 2000. Attentional demands on the processing of neighbouring words. In Reading as a Perceptual Process, A. Kennedy, R. Radach, D. Heller, and J. Pynte (eds), 247–268. Oxford: Elsevier.
Warren, H. (ed.). 1994. Oxford Learner’s Dictionary of English Idioms. Oxford: OUP.
Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: CUP.

Appendix

Experimental stimuli

Note: In the passages below, the target formulaic sequences are in italics and the control words in bold for the reader’s convenience. In the actual experiment, these lexical items were unmarked.

Welcome to the experiment! In this experiment you will be required to read brief passages of text in preparation for a simple comprehension question. When you have finished reading each passage, press either the left or right arrow key. A question will then appear about the previous passage. The answer will either be ‘yes’ or ‘no’. Press the right arrow key ⇒ for ‘yes’. Press the left arrow key ⇐ for ‘no’. First there is a practice passage to get you used to the experiment. Press either key when you are ready to start.

My friend Peter always insists that I go out with him to the pub at lunchtime, but I prefer to stay at my desk working. It’s a real problem to me, because I value his friendship and I don’t want to upset him by refusing to go out. If only he would ask my opinion occasionally then I would be able to say what I really think.
Q: Am I happy to go along with Peter’s plans? Press ⇐ for No Press ⇒ for Yes

Press either arrow key when you are ready to start the real experiment.

Dave had been out at parties all weekend and did no work at all on his course assignment, even though it was due at the beginning of the week. But then he worked really hard on Monday and met the deadline by the skin of his teeth before the office closed on Tuesday afternoon. Dave had almost nine days to write the essay but as usual he did it all at the last moment.
Question 1: Did Dave hand his essay in on time? (Answer Yes)

Sam always seemed to leave things until he couldn’t put them off any longer. Sometimes this got him into real trouble. His dentist had warned him about having his teeth looked at regularly but Sam did not visit him again until he had a bad toothache. After that terrible experience, Sam realized that a stitch in time saves nine and decided to visit his dentist every six months.
Question 2: Has Sam always visited the dentist regularly? (Answer No)

70

Geoﬀrey Underwood, Norbert Schmitt, and Adam Galpin You’ve been talking in circles for 30 minutes trying to tell me something. Please don’t beat around the bush for another half an hour, but just get to the point and tell me! If it was you who dropped my ﬂower basket, don’t worry because I won’t be angry with you. Question 3: Did someone drop the ﬂower basket? (Answer Yes) Dave had been out having a good time all semester and now exams are coming and he is not prepared. He’ll have to keep his nose to the grindstone in order to pass them. It seems that his policy of leaving things until the last minute means that he’s going to have to miss Sam’s party. Question 4: Is Dave prepared for his exams? (Answer No) Your ﬁnancial adviser gave you bad advice when he insisted that you put all of your money into high-technology stocks, and now they are worth nothing. I told you not to put all your eggs in one basket. You should have spread your money into many diﬀerent kinds of investment. And now that you have hurt your back in the car accident, you will need all of the money you can get. Question 5: Were you advised to invest in high-technology stocks? (Answer Yes) You’ve been putting oﬀ taking your driving test for weeks because you are afraid. You need to just take the bull by the horns and do it anyway. I’m sure you’ll pass it easily and in a short time you’ll be driving yourself all over town. Question 6: Have you taken your driving test? (Answer No) Joe said that there are a lot of factors that cause unemployment in the UK that you should be clear about. The cost of factories and equipment is high here, labour costs less overseas, and the pound is currently very strong in comparison to other currencies. 
Jill said “Okay, I see what you mean that unemployment is so complex that it can’t be blamed on one thing, but you’re missing the human element, because the eﬀect of unemployment on a family can be tragic.” Question 7: Does Joe think unemployment is caused by many factors? (Answer Yes) Cindy was always getting herself into trouble, and was back into diﬃculties again. It was going to be hard to tell her mother that that she had lied to her about throwing a stone through the kitchen window, but she knew that her mother would ﬁnd out eventually. So as usual honesty is the best policy and Cindy was just going to have to tell the truth. Question 8: Was Cindy going to lie again? (Answer No)

Kate checked whether Alice had a lot of things to do at the garden centre today. Alice replied that she did, mainly buying some flowers and a new bush for the front garden and said that as a matter of fact, she was leaving for the garden centre right that minute.
Question 9: Was Alice going to the garden centre? (Answer Yes)

I can’t make up my mind what to do, but it’s a well-known fact that I’m indecisive. I’d like to buy a new coat for the winter but on the other hand I need to save money for the rent. I can’t decide which is more important.
Question 10: Would this person like to buy new boots for winter? (Answer No)

I went home last weekend and went out dancing with my old friends. We had a great time, but spent all of our money, and had to walk 10 miles in the rain to get home. Carrying one of my friends home almost broke my back but it was fun to be with them again. The roads were getting flooded and at one time I thought that we were going to sink into the mud. But to cut a long story short we eventually got home soaking wet at 3:30 am.
Question 11: Did they have to walk home? (Answer Yes)

Bob and Jane were loading the car for a camping trip. They love being out in the clean air away from the dirt in the city. Bob wanted to take just a tent and sleeping bags, but Jane wanted to be more comfortable. So she put extra blankets, pots, pans, books, lights, extra clothes, a chair, and many other things into the car. When Bob saw this he was surprised and said, “You’re taking too much stuff! You have everything but the kitchen sink packed into the car and I’m not sure that there’s room for me!”
Question 12: Did Jane pack the bare minimum? (Answer No)

It was bad enough that Helen was always late for basketball practice and that she was always complaining. But when she missed the big game without telling us, that was the straw that broke the camel’s back and so we all agreed she had to be dropped from the team. But it turned out that Helen had burnt her hand badly on a pot and couldn’t have come to the game, so we all felt very bad about wanting to drop her.
Question 13: Did Helen burn her hand? (Answer Yes)

Bob thought that their camping trip really was in trouble. It was the middle of the night, the rainstorm was getting worse, the lining of the tent was ripped, and the car was out of petrol. To top it off, their cell phone batteries were dead! Jane agreed and said that she thought that they were really up the creek without a paddle and in desperate need of help.
Question 14: Was the camping trip going well? (Answer No)

7

72

Geoﬀrey Underwood, Norbert Schmitt, and Adam Galpin I really enjoyed my holiday at the old windmill. It was in full working order with the original grindstone and everything. It was such a change after the hard work at the university. After reading only really diﬃcult books all semester, it was like a breath of fresh air to relax at the mill and enjoy reading an easy novel. Question 15: Did this person enjoy their holiday? (Answer Yes) I was sick all last week and couldn’t go out. I was due to visit my friends and that had to be called oﬀ. But at least I caught up with all of my homework, so every cloud has a silver lining and now I’m back on schedule. I was so ill that I couldn’t do any cleaning and I had to leave the dirty pots in the sink for ﬁve days, but I think the kitchen’s looking organised now. Question 16: Did this person do the cleaning while they were ill? (Answer No) My brother Ron is always doing crazy things. First he went to Africa and lived in the jungle for a year with some hunters. He got deported when he was found with some horns from rare animals. Then he got in trouble for stealing clothes from a store. Now he’s decided to hug every tree in the UK. But my parents and I have come to expect odd behaviour. He deﬁnitely is the black sheep of the family although we still love to have him around. Question 17: Did Ron live in Africa? (Answer Yes) I wouldn’t worry about your canoe race with James. He’s slow and not very strong. In fact, I’m sure you could beat him with one hand tied behind your back if you just remember to concentrate on your paddle technique and work hard over the whole course. Question 18: Is James a fast canoeist? (Answer No) Jim thought that the main reason they received a poor mark on their group project was that they didn’t allow enough time to do it properly. Being short on time will always mean that the ﬁnal product is less than satisfactory. 
Sally agreed and thought that he’d hit the nail on the head; if they had another week, they could have done much better. They missed a lot of important information by having to rush it. Question 19: Did Jim and Sally rush their project? (Answer Yes) By going home to London last weekend, John was able to both study at the British Library and to attend his mother’s 50th birthday party. He was quite pleased with himself, as he always liked to kill two birds with one stone when he could. The only downside to his visit was that he picked up a terrible cold in his head and had to take time oﬀ work to recover. Question 20: Did John miss his mother’s birthday party? (Answer No).

Exploring the processing of formulaic sequences through a self-paced reading task

Norbert Schmitt and Geoffrey Underwood
University of Nottingham

Introduction

The importance of formulaic sequences in language use is becoming ever more apparent, but no one is quite sure how the mind manages them. There is a consensus that they are processed as whole units, or at least appear to be (e.g. Pawley and Syder, 1983; Vihman, 1982; Weinert, 1995; Wray, 2002), but the actual mechanics of how they are processed are not clear. Corpus studies can help in identification and description of the formulaic sequences (e.g. Biber et al., 1999; Nattinger and DeCarrico, 1992; Adolphs and Durow, this volume) and performance data from L1 child and L2 learner studies can go some way towards illuminating their acquisition (e.g. Nelson, 1975; Wong Fillmore, 1976; Schmitt et al., this volume). However, it will probably take more tightly-controlled laboratory methodologies to truly understand the underlying mental processes involved with formulaic sequences, simply because many of the processes are too automatic and too fast to be observed by more naturalistic investigative techniques.

Underwood, Schmitt, and Galpin (this volume) [hereafter Experiment 1 (E1)] used an eye-movement technique to explore how the eye samples the words in formulaic sequences as compared to nonformulaic words in a text, on the assumption that this indicates the underlying mental processing controlling the eye movements. We found that the words are recognized with fewer fixations when they are the terminal word in a formulaic sequence than when they are embedded in nonformulaic contexts. This supports the notion that formulaic sequences are processed holistically, or at least that the mind is able to predict the end words of the sequence from the previous words in the sequence. We also found that the words were recognized faster (shorter fixation times) when they were terminal words in a sequence. But this faster recognition applied only to the native participants, with the nonnatives taking as much time for words in formulaic sequences as the same words in nonformulaic contexts. We speculated that when formulaic sequences are being learned, the first level of mastery allows fewer fixations, but that shorter fixation times require higher levels of mastery.

The methodology in Experiment 1 allowed us to study the terminal word in formulaic sequences, but it did not permit us to look at all of the individual words in the sequences, because proficient readers typically do not fixate on all of the words in a text, usually skipping over many of the function words. If we were able to analyze the processing of the sequences word-by-word, we might be able to establish the pattern of (presumably quicker) recognition as the sequence proceeds, rather than just focusing on the terminal word. Beyond this, we might be able to identify the recognition point for each sequence, after which the recognition times for subsequent words would presumably quicken, and thus identify what factors trigger recognition. Clearly an important factor is awareness that a string of words in a text forms a formulaic sequence, but how many of the component words it takes to recognize the sequence, and whether certain words (the initial content words?) play a greater role in the recognition are open questions.

In order to explore these questions, we looked for a methodology which would be able to analyze the processing of the component words in formulaic sequences. We decided on Self-Paced Reading, a technique where the words in a text are flashed on a monitor one-by-one, with the participant pressing a button to bring up each new word. The computer times each push of the button, which in effect measures the time required to recognize and process the word. The self-paced reading task has been used in studies of reading processes for some time now.
For example, Aaronson and Scarborough (1976) displayed words one by one in the centre of a computer screen, with a key-press from the reader being used to display the next word. Reading times were shown to reﬂect the semantic content of the sentences being read, with longer reading times for important content words, and decreasing reading times as contextual redundancy increased. Aaronson and Ferres (1983, 1984) have also used the task to demonstrate reading time diﬀerences for words in diﬀerent linguistic categories, and to investigate diﬀerences between individuals who diﬀered in their reading skill. Thus the task is well-established, at least for the study of individual words in reading. It seemed a reasonable research tool to use in a follow-up study to E1, where we wished to explore the component words in formulaic sequences.
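The timing logic of a self-paced reading task can be sketched in a few lines. This is a minimal illustration, not the software used in the study: the function name `reading_times` and the key-press timestamps are hypothetical, and the only assumption carried over from the text is that a word's reading time is the interval between the key-press that displays it and the key-press that replaces it.

```python
from typing import List, Tuple


def reading_times(words: List[str], press_times_ms: List[float]) -> List[Tuple[str, float]]:
    """Pair each word with the interval between the key-press that showed it
    and the next key-press. press_times_ms needs one more entry than words:
    the final press ends the timing of the last word."""
    if len(press_times_ms) != len(words) + 1:
        raise ValueError("expected one onset per word plus one final key-press")
    return [
        (word, press_times_ms[i + 1] - press_times_ms[i])
        for i, word in enumerate(words)
    ]


# Hypothetical timestamps (msec) for one formulaic sequence; invented for illustration.
words = ["by", "the", "skin", "of", "his", "teeth"]
presses = [0, 310, 600, 950, 1230, 1520, 1850]
times = reading_times(words, presses)

for word, ms in times:
    print(f"{word:<6} {ms} msec")
```

Because every word receives its own interval, function words that a proficient reader would normally skip in free reading still yield a measurement, which is what makes the word-by-word analysis possible.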


Methodology

Procedure

The target formulaic sequences and story contexts from E1 were used in this study (see the Appendix in Underwood, Schmitt, and Galpin, this volume). The context stories were input into a laptop computer using E-Prime software. The order of the context stories was randomised. The computer was programmed to present a 3-option multiple-choice question after each story to ensure that the participants were reading for meaning and not just pressing the spacebar as fast as possible. Participants were individually scheduled to participate and, on arriving at the research assistant’s office, were seated at the laptop and made comfortable. The assistant instructed the participant in how the software worked before each experiment began. She observed as the participant worked their way through the example trial, to make sure he or she understood the procedure and was proceeding appropriately. After this initiation, the participant was free to work their way through the twenty story contexts and questions at their own speed. The native participants took between 10–15 minutes, while the nonnatives took 20–25 minutes on average. The nonnatives were then orally asked to describe the meaning of the target formulaic sequences to determine if they knew them or not. Once the administration was completed, the participant was debriefed and paid for participation.

Participants

Twenty native speakers and twenty nonnatives participated in the study. The natives were students at the University of Nottingham (15 undergraduates, 5 postgraduates; 7 male, 13 female). The nonnatives were a mixture of students and visiting scholars from the same university: 2 on a pre-university foundation course, 2 undergraduates, 7 postgraduates, 6 on a Chinese ELT teacher’s course, and 3 visiting scholars. Six of the nonnatives were male and 14 female, with an average age of 28.7 years. With the exception of three European participants, who had spent several years in the UK, the participants averaged 6.97 months in the country. The L1s included 12 Chinese, 3 French, 2 German, and one each of Akan, Japanese, and Gujarati. The nonnative students had passed the university English language entrance requirement of CBT TOEFL 213 (Paper TOEFL 550) or IELTS 6.0, but no proficiency measure was available for the other nonnatives, whose abilities varied considerably.

Results

Each press of the spacebar presented the next word of the text, and the time between presses was recorded as the reading time for the word. Of primary interest in the first analysis was the reading time for the final word in each formulaic sequence, and the average reading times are presented in Table 1 for the native and non-native speakers. These target words acted as their own controls and appeared as part of another passage for comparison. A two-factor, mixed-design analysis of variance was used to inspect these differences, with speaker-origin and word-type (target vs control) as the factors. Only speaker-origin appeared as a reliable effect in the ANOVA (F(1,38) = 30.86, p < 0.001), with native speakers (329 msec) having shorter reading times than non-native speakers (445 msec). The word factor (F < 1) and the interaction between speaker and word (F < 1) suggested no sensitivity of either group of readers to the placement of words in formulaic sequences vs. nonformulaic contexts.

Table 1. Reading times (msec) for terminal words in formulaic sequences (target words) and when the same words appeared in non-formulaic contexts (control words). Standard deviations are in parentheses.

                 Native speakers    Non-native speakers
Target Words     322 (64.0)         440 (45.9)
Control Words    335 (65.1)         452 (124.8)
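The timing procedure described at the start of the Results (each spacebar press reveals the next word, and the interval between presses counts as that word's reading time) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `reading_times`, the example timestamps, and the convention that one extra press dismisses the final word are all hypothetical and are not taken from the software used in the study.

```python
def reading_times(words, press_times_ms):
    """Pair each word with the interval until the next spacebar press.

    press_times_ms[i] is the (hypothetical) timestamp of the press that
    revealed words[i]; one extra final press dismisses the last word.
    """
    if len(press_times_ms) != len(words) + 1:
        raise ValueError("need exactly one more press than words")
    return [
        (word, press_times_ms[i + 1] - press_times_ms[i])
        for i, word in enumerate(words)
    ]


# Invented timestamps (msec) for one of the target sequences.
words = ["honesty", "is", "the", "best", "policy"]
presses = [0, 410, 700, 980, 1290, 1620]

times = reading_times(words, presses)
target_word, target_rt = times[-1]  # sequence-final (target) word
print(times)
print(target_word, target_rt)
```

In a layout like this, the last pair corresponds to the terminal (target) word whose reading time is compared with the control condition in Table 1.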

The second analysis looked at the processing of the component words in the formulaic sequences. The reading time of each word in a sequence was recorded separately according to the length of the sequence (4–7 words) and according to the position of the word in the sequence. In order to enter as many phrases as possible into this analysis only the final four words in the sequence were entered, thereby allowing comparison of the final four words of 4, 5, 6, and 7 word sequences. The means of these reading times are presented in Table 2. A three-factor, mixed-design analysis of variance was used, with speaker, sequence length, and word position as the factors. Native speakers (324 msec) had shorter reading times than non-native speakers (456 msec) according to this ANOVA (F(1,38) = 21.15, p < 0.001), and the two factors of sequence length (F(3,114) = 32.57, p < 0.001) and word position (F(3,114) = 2.80, p < 0.05) were also reliable. Inspection of the means in Table 2 suggests that as sequence length decreases, so does reading time per word, at least for non-native speakers. The small effect of word position is less clear, but slightly longer times were recorded for later words for the nonnatives. The interaction between speaker and sequence length was reliable (F(3,114) = 35.65, p < 0.001), and an analysis of simple main effects indicated an effect of length for the non-native speakers (F(3,114) = 68.19, p < 0.001), but not for the native speakers (F < 1). The interaction between speaker and word position was also reliable (F(3,114) = 4.46, p < 0.01), and again the effect of word position held for non-natives (F(3,114) = 4.76, p < 0.01) but not for native speakers (F(3,114) = 2.50). In both of these interactions, non-natives showed more sensitivity to the formulaic sequences than did the native speakers. The three-way interaction was also reliable (F(9,342) = 1.97, p < 0.05). An analysis of simple main effects for native speakers found no effect of word position for any sequence length, but there were effects for non-native speakers for 7, 6, and 4 word sequences only.

Table 2. Reading times (msec) for words in formulaic sequences according to the length of the sequence and the position of the word in the sequence. Only the last four words of each sequence are presented. The target word was the final word in the formulaic sequence, word T-1 was the word prior to the target, word T-2 was the word prior to word T-1, and word T-3 was the word prior to word T-2. Standard deviations are in parentheses.

Native speakers
                Length of formulaic sequence (words)
Word position   7             6             5             4
T-3             328 (81.7)    334 (57.5)    336 (79.8)    331 (104.6)
T-2             321 (67.9)    343 (21.2)    324 (72.7)    336 (92.5)
T-1             312 (71.3)    306 (58.8)    311 (76.1)    315 (101.4)
Target          322 (61.5)    318 (71.6)    327 (74.1)    320 (97.2)

Non-native speakers
                Length of formulaic sequence (words)
Word position   7             6             5             4
T-3             566 (146.2)   424 (122.1)   428 (112.4)   328 (88.8)
T-2             621 (190.2)   455 (143.9)   435 (123.7)   352 (81.2)
T-1             536 (150.1)   476 (140.4)   422 (118)     387 (167)
Target          616 (226)     454 (131.9)   430 (152)     358 (85.3)

The high variances (see the standard deviations in Table 2) and irregular pattern of results in the reading times of the non-native speakers prompted a further analysis, in an attempt to identify the source of this variance. If a reader is unfamiliar with a formulaic sequence, there is no reason to expect any pattern of behaviour that would indicate sensitivity to the increasing predictability of the words in the sequence, and so reading times were analysed as a function of familiarity. Responses to each sequence were analysed according to whether the individual knew the sequence or not, and these data are presented in Table 3. A two-factor analysis of variance was performed on these data, with word position and sequence knowledge as the two factors. Word position was reliable (F(3,57) = 3.28, p < 0.05), as was knowledge of the sequence (F(1,19) = 9.21, p < 0.01) and the interaction between these factors (F(3,57) = 3.70, p < 0.05). An analysis of simple main effects was used to inspect the interaction, and this found an effect of word position for unknown sequences (F(3,57) = 6.01, p < 0.01) but not for familiar sequences (F < 1). The means in Table 3 indicate that for known sequences the reading times were relatively constant (as they were for native speakers; see Table 2), averaging 419 msec. However, for unfamiliar sequences the reading time varied according to the position of the word in the sequence. As the reader progressed through an unfamiliar sequence their reading times increased, with simple contrasts indicating reliable increases in reading time between words T-3 and T-2, and again between T-1 and the target word.

Table 3. Reading times (msec) of non-native speakers for words in formulaic sequences according to position of the word in the sequence. Only the last four words of each sequence are presented here. The target word was the final word in the formulaic sequence, word T-1 was the word prior to the target, word T-2 was the word prior to word T-1, and word T-3 was the word prior to word T-2. Standard deviations are in parentheses.

                Knowledge of formulaic sequence
Word position   Known          Unknown
T-3             421 (101.2)    415 (132.2)
T-2             423 (105.5)    482 (130.3)
T-1             410 (105.1)    445 (136.3)
Target          419 (113.9)    489 (167.7)

While Table 3 focuses on the position of words in a sequence, Table 4 addresses the length of the sequences. Inspection of the reading times suggests a difference in the reading times of words in sequences of different length, according to whether or not the non-native speaker knew the sequence. A two-factor analysis of variance was used to inspect this possibility further, with sequence length and sequence knowledge as the two factors. Both factors had reliable effects. Sequence length (F(3,57) = 7.75, p < 0.001) suggests a trend towards longer reading times with longer sequences, and paired comparisons confirmed that four-word sequences were read faster than both five-word and six-word sequences (both comparisons at p < 0.01). No other comparisons were reliable. Familiarity again had an effect (F(1,19) = 8.47, p < 0.01), but the interaction between the two factors was not reliable (F(3,57) = 1.34).

Table 4. Reading times (msec) of non-native speakers for words in formulaic sequences according to the length of the sequence. The average reading time for a word is presented here, regardless of the position of the word in the sequence, according to whether the speaker knew the sequence or not. Standard deviations are in parentheses.

                  Knowledge of formulaic sequence
Sequence length   Known          Unknown
4 words           374 (108.6)    397 (53.5)
5 words           439 (125.6)    439 (114.8)
6 words           435 (121.0)    461 (116.0)
7 words           393 (98.1)     453 (124.4)
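The contrast between known and unknown sequences in Table 3 can be made concrete with a little arithmetic on the published cell means. The sketch below is illustrative only: the helper name `summarise` is hypothetical, and it takes unweighted means over the four cells, so its output need not coincide exactly with averages computed over individual observations (it gives about 418 msec for known sequences, against the 419 msec reported in the text).

```python
from statistics import mean

# Cell means (msec) from Table 3: nonnative reading times by word position,
# in the order T-3, T-2, T-1, Target.
table3 = {
    "known": [421, 423, 410, 419],
    "unknown": [415, 482, 445, 489],
}


def summarise(cells):
    """Return (unweighted mean, max-min spread, position-to-position steps)."""
    steps = [later - earlier for earlier, later in zip(cells, cells[1:])]
    return mean(cells), max(cells) - min(cells), steps


for knowledge, cells in table3.items():
    overall, spread, steps = summarise(cells)
    print(f"{knowledge}: mean {overall:.2f}, spread {spread}, steps {steps}")
```

On these figures the known sequences vary by only 13 msec across positions, whereas the unknown sequences vary by 74 msec, with the two largest rises falling between T-3 and T-2 and between T-1 and the target word, the same two steps that the simple contrasts identified as reliable.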

Discussion

Recognition times of natives vs nonnatives for words in the formulaic sequences

We find that the natives had lower recognition times for words in the target sequences than the nonnatives. This means that words were recognized and processed faster by the native participants. This is unsurprising and mirrors the recognition results obtained through eye-movement methodology in Experiment 1.

Recognition times for terminal words vs control words

One of the fundamental questions we would like to answer is whether words in formulaic sequences are processed faster than words in nonformulaic text. The way we addressed this question in E1 was a comparison between the sequence-final target words and control words, and we found significant differences in favor of the target words for number and duration of fixation in every condition except duration in nonnatives. In this study, the target words were processed slightly faster than control words for both natives and nonnatives, but not to a reliable degree. For nonnatives, this non-significant result is congruent with the non-significant E1 result for duration between target and control words. However, we expected the natives to have shorter recognition times towards the end of the formulaic sequences, but this didn’t happen. It is impossible to determine the reason why from our data, but we can speculate on several possible explanations.

The consensus is that formulaic sequences are processed holistically, and it may be that a person needs to be able to view the sequence or parts of a sequence together, rather than word-by-word, in order to trigger recognition of the sequence. The self-paced reading methodology has the advantage of allowing data to be collected on each component word, but it might well be that reading a formulaic sequence word-by-word makes it very difficult to recognize the sequence as a unit. A follow-up study is planned in which each press of the spacebar brings up either a formulaic sequence or a length-matched string of nonformulaic words to explore this ‘holistic processing’ possibility.

Another possibility is that the advantage of using formulaic sequences lies more in comprehension than in an improvement of reading speed. In other words, formulaic sequences exist in language not because they provide a benefit in terms of faster recognition and processing, but because they facilitate better understanding of the message. Our study included only very brief general questions connected with comprehension of the general story contexts, mainly to ensure that the participants took the task seriously and read for meaning. This was the case as almost all participants answered all of the questions correctly. However, it would be possible in future research to focus on comprehension by writing more detailed questions which centered on specific information which the formulaic sequences added to the story. This could help inform on the relative speed vs. understanding benefits of formulaic sequences in reading.
Another possibility is that the terminal word in a formulaic sequence is not the best word to focus upon in a self-paced reading task. There might be an ‘overrun’ effect, where participants slow down close to the end of a formulaic sequence so that they won’t overrun the sequence and get into a part of the story where they could not predict what was coming up.

A final explanation is that the self-paced reading task is not as well-suited to this research area as we initially thought. The task, with its manual manipulation of the keyboard, might not be sensitive enough to measure the true differences in processing time inherent in the use of formulaic sequences. Alternatively, participants might use varying strategies to read the text (hit the spacebar as quickly as possible, get to the end of a sentence, and then work out what it means; carefully compose the textual meaning word-by-word; hit the spacebar quickly except when coming across difficult words or constructions and then pause on that word, or more likely on subsequent words on the realization that something has been missed), which could have inexplicable consequences for the measured recognition times. There will be more on this later in the discussion.

Recognition times of nonnatives who knew the formulaic sequences vs those who did not

In E1, we were only able to compare native and nonnative performance. From the piloting done as part of the E1 selection procedure, we could assume that the native participants knew all or nearly all of the target formulaic sequences. Clearly this assumption could not be made for the nonnative participants. It is of interest to compare the cases where nonnatives knew the target sequences against cases where they did not, since this would presumably lead to different levels of performance. In this study, we measured this factor, and found that, on average, the nonnatives knew 11.95 of the 20 target sequences. The sequences themselves ranged from being virtually unknown, with only three nonnatives knowing them (hit the nail on the head), to being known by all the nonnatives (I see what you mean, honesty is the best policy, and on the other hand).

When we compared recognition times for words embedded in formulaic sequences which were known against words in unknown sequences, we found that those times were significantly longer for the unknown sequences (458 msec) than for known sequences (419 msec). Most of the component words in the target sequences were of relatively high frequency, and the nonnatives were likely to know them. But these (presumably known) individual words were recognized faster by the nonnatives when they were part of a known formulaic sequence than when they were part of an unknown one. This could indicate some facilitation as a result of these words being ‘packaged’ in unitary sequences. Alternatively, the difference in recognition time could be more connected with the problems associated with trying to interpret unknown formulaic sequences, particularly since many of the sequences used in this study had meanings which are opaque and idiosyncratic. We will come back to this again below.

Effect of length of the formulaic sequences

We included a variety of formulaic sequences in this study, and one of the ways in which they varied was length. The ANOVA shows that words in the shorter formulaic sequences were recognized reliably faster than words in longer sequences, but only for the nonnatives. One might have supposed that longer sequences would be more beneficial in this study, simply because the longer the string, the more clues there are to its ending. For example, the four words the straw that broke provide good evidence that the remaining words will be the camel’s back. But this was not true for the proficient native speakers, where the length of the formulaic sequence made virtually no difference in recognition times. This means that the shorter sequences were recognized and processed as quickly per word as the longer sequences. This would seem to indicate that natives do not need formulaic sequences to be long in order to process them economically; shorter sequences are handled just as efficiently.

On the other hand, with the nonnatives we found a surprising inverse relationship: shorter sequences were recognized more quickly per word. So not only do we fail to find an advantage for the longer sequences, but it turns out that shorter sequences were easier to recognize and process for the nonnatives. This is not intuitive, but there may be factors other than length itself which might explain these results. Perhaps length itself is not the real factor, but co-occurs with some other factor which was not addressed in this study. For instance, we found that the shorter sequences in this study were known by the nonnatives somewhat better than the longer sequences. The two 4-word sequences were known by 8 and 20 participants respectively (M = 14), the five 5-word sequences by 6, 11, 19, 20, and 20 participants (M = 15.2), the ten 6-word sequences by 3, 4, 4, 9, 10, 10, 11, 17, 18, and 18 participants (M = 10.4) and the three 7-word sequences by 10, 10, and 11 participants (M = 10.3). Thus the shorter recognition times are probably at least partially caused by the nonnatives knowing the shorter sequences slightly better than the longer ones.
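The familiarity figures just listed can be checked with a short calculation. The following Python sketch simply re-derives the per-length means from the participant counts given in the text; the variable names are illustrative. It also shows that the same counts reproduce the figure reported earlier, that the nonnatives knew 11.95 of the 20 target sequences on average.

```python
from statistics import mean

# Number of nonnative participants (out of 20) who knew each target
# sequence, grouped by sequence length in words (counts from the text).
known_counts = {
    4: [8, 20],
    5: [6, 11, 19, 20, 20],
    6: [3, 4, 4, 9, 10, 10, 11, 17, 18, 18],
    7: [10, 10, 11],
}
N_PARTICIPANTS = 20

# Mean number of participants who knew a sequence of each length.
mean_known = {length: mean(counts) for length, counts in known_counts.items()}

# Total sequences, and average number of sequences known per participant.
n_sequences = sum(len(counts) for counts in known_counts.values())
total_known = sum(sum(counts) for counts in known_counts.values())
per_participant = total_known / N_PARTICIPANTS

for length, m in mean_known.items():
    print(f"{length}-word sequences: known by {m:.1f} participants on average")
print(f"{n_sequences} sequences; {per_participant:.2f} known per participant")
```

The 4- and 5-word sequences come out at 14 and 15.2 participants, against roughly 10.4 and 10.3 for the 6- and 7-word sequences, which is the familiarity difference the text appeals to.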
Of course frequency of occurrence is another factor which might affect how well lexical items are known. This is certainly true of individual words, and there is ample evidence that higher frequency words are generally learned before lower frequency words (see Schmitt, Schmitt, and Clapham, 2001, for one example of this). One would suppose that the same situation would apply to formulaic sequences (although see Schmitt, Grandage, and Adolphs, this volume for potential counterevidence). We reviewed the frequency figures for the target sequences in each of three corpora: the British National Corpus (BNC), a corpus focusing on mainly written English; the CANCODE corpus, which is made up of unscripted spoken English; and the MICASE corpus, which consists of spoken academic discourse. There was a frequency advantage for shorter sequences, but this almost completely stemmed from three sequences which were much more frequent than the rest of the target sequences (on the other hand, as a matter of fact, I see what you mean). If these three were disregarded, there was little to differentiate the remaining sequences in terms of frequency, regardless of their length. For nonnatives, the mean recognition time for words in the three frequent sequences was 402 msec, and the mean recognition time for the rest of the formulaic sequences was 423 msec. A t-test showed that the difference was not reliable (p = .072). This means that the nonnatives did not recognize words in the frequent formulaic sequences reliably faster than words in the relatively nonfrequent sequences. So the advantage for shorter sequences cannot be explained by the shorter sequences merely being more frequent.

The length effect may also have something to do with whether the sequences are known or not. If the formulaic sequences are known, then perhaps it does not matter much to nonnatives how long the sequences are, just as with the natives, but if they are not known, then longer sequences may well be more confusing and difficult to work out than shorter sequences, if for no other reason than having more words to deal with. This explanation seems reasonable; however, the study found no support for it, as there was no interaction between length and sequence knowledge in the analysis of the data in Table 4.

Another possibility is that shorter formulaic sequences are easier to process, which would go against the assumption that longer strings are more obvious and more predictable. But perhaps it doesn’t work like this. Just as individual words have different learning/processing burdens, it may be that formulaic sequences do as well. This would mean that some formulaic sequences are easier to process than others. Possible factors in sequence difficulty might include saliency of the sequence, transparency of the sequence’s meaning, usefulness for particular speakers and contexts, and, despite the preliminary result above, frequency of occurrence. Our results are only suggestive, but it might be that this type of factor is more strongly associated with shorter sequences than with longer ones.

Effect of word position in the formulaic sequence

Clearly for each formulaic sequence, there is a point of recognition, just as there is for individual words. The target vs. control analysis above looked at only the single terminal word in a formulaic sequence, but here we look at the final four words of a sequence. Presumably the later words in a sequence should be recognized faster. Until a string of words is recognized as a formulaic sequence, the language user does not know what is coming next in the discourse, and so must be open to a whole range of linguistic possibilities. However, once the sequence is recognized, the person should have a very good idea of the subsequent words in the sequence.

Surprisingly, however, we found no such pattern in our statistical analyses. The native speakers showed no pattern connected with position of words in a sequence. The sequences which the nonnatives knew similarly produced no position effect. However, the sequences which were unknown to the nonnatives did produce a position effect, but an unexpected one: later words took more time to read than the earlier words. Moreover, this trend was not across the board. The terminal word took longer than the penultimate word (T-1) (see also discussion below), and the T-2 word took longer than the T-3 word. Additionally, this effect only applied to the 7-word, 6-word, and 4-word sequences, but not the 5-word ones. In sum, there was no evidence of a facilitation effect for later words, and the patterning of words within the sequence defies explanation due to its complexity.

Because we were surprised that later-position words did not seem to be facilitated, we looked at the native data in a more holistic manner to see if we could identify any trends which the statistical treatment did not bring to light. The natives were the most proficient participants, and if any facilitative trends were to surface, they would most likely be for these participants. First, we suspected that the initial content words might be key to the recognition of the target sequences. We looked at the data and position of the initial content word, but found that there was no pattern of decreasing recognition times after that word. There seemed to be a hint of facilitation after the first two content words, particularly if they were contiguous (the black sheep of the family, with one hand tied behind your back). However, it must be said that the data were generally chaotic and these observations can only be seen as speculative.
However, one stronger pattern did emerge: in 18 out of the 20 formulaic sequences, the mean recognition time for the terminal word was higher than for the penultimate word. This means that for the vast majority of sequences, regardless of what had happened at earlier stages of the sequence, the participants took more time on the terminal word than the word before it. What this means is open to debate, but it clearly goes against the assumption that the terminal word in a known formulaic sequence is very predictable, and thus largely redundant. To illustrate the diﬃculty in identifying patterns in the data, let us explore three seven-word sequences. As mentioned before, these longer sequences would arguably be among the better candidates to show facilitation towards their later stages. In Figure 1 (natives), the sequence with one hand tied behind your back shows the kind of trend we were expecting with all of the sequences.

Figure 1. Reaction times for native speakers. Seven-word formulaic sequences plotted by word position (1 to 7): Put all your eggs in one basket; The straw that broke the camel’s back; With one hand tied behind your back.

However, the sequence the straw that broke the camel’s back does not show this trend; if anything, the latter stages have slightly higher latencies than the initial stages. The phrase put all your eggs in one basket forms a zig-zag, with no pattern evident. These sequences are representative of the others in that little patterning was discernible. This is with the native speaker data, where the facilitation should have been the strongest.

Figure 2 shows the results for the same target sequences for nonnatives who knew the sequences. The graphs are essentially ‘flat’ with not much change in the recognition times from the beginning of the sequence to the end, although there is considerable variation within each sequence. This is particularly true for with one hand tied behind your back, which seems to be increasingly difficult to recognize, especially the word behind.

Figure 2. Reaction times for non-natives (known formulaic sequences). Seven-word formulaic sequences plotted by word position (1 to 7): With one hand tied behind your back; The straw that broke the camel’s back; Put all your eggs in one basket.

As for these sequences when nonnatives did not know them, Figure 3 shows that the recognition times are longer than when the sequences were known, but the results according to word position are chaotic. However, there is a trend of increasing recognition times towards the end of the sequences. It seems from these data that nonnatives are actually being harmed by the appearance of unfamiliar formulaic sequences, as they attempt to work out the generally opaque meanings (eggs in baskets? straw and camels?). The nonnatives are likely to be more comfortable working out the meaning of nonformulaic strings of words where the component words ‘make sense’ in sequence. In E1, nonnatives were able to process the terminal words of formulaic sequences with fewer fixations, but not more quickly. Here we find that sequences which are not known by nonnatives cause processing problems. In terms of processing time, nonnatives (and perhaps natives as well) do better with nonformulaic language than with formulaic sequences which they do not know. This is common sense and is probably widely assumed, but these data are some of the first laboratory data to support this assumption.

Figure 3. Reaction times for non-natives (unknown formulaic sequences). Seven-word formulaic sequences plotted by word position (1 to 7): The straw that broke the camel’s back; Put all your eggs in one basket; With one hand tied behind your back.

Summary

The overall results are somewhat confusing but some trends did emerge. The native speakers read the words in the formulaic sequences faster than the nonnatives, and this is unsurprising. There was no difference in how long it took either natives or nonnatives to read the target terminal words vs. the control words, which was unexpected and which argues against formulaic sequences being processed faster than nonformulaic text. Words in known formulaic sequences were recognized faster by nonnatives than words in unknown sequences, but this may well be more of a difficulty effect for the unknown sequences than a facilitative effect for the known sequences. Taken together, these results surprisingly suggest that words in formulaic sequences are not processed any more quickly than nonformulaic words, at least for native speakers. This is in direct contradiction to the native results from E1, where natives processed the target words faster than the control words. However, these results do mirror the E1 nonnative results where there was no advantage in duration of fixation for the target words.

The main purpose for using the self-paced reading task was to look at the component words in a sequence and explore their behavior. When the final four words within sequences were compared to one another, the later words were not recognized faster than the earlier words, with natives showing no difference in recognition time. Moreover, the nonnatives took longer with the later words. These results were perplexing, and any hope of identifying a point within sequences where recognition times drop, thus indicating the recognition point for the sequences as a whole, proved unfounded. However, these results add to the evidence for the problems involved in the processing of unknown sequences.

Longer sequences took longer to recognize per word than shorter sequences, but only for nonnatives. Only future research can determine whether this effect is truly due to length, or other related factors such as familiarity or frequency.

On the basis of these results, it must be questioned whether the self-paced reading task is the best methodology to research formulaic sequences, at least as it was used in this study. It is a well-established methodology which held the promise of illuminating the mechanisms surrounding the recognition of formulaic sequences.
However, it may be that time taken to manually press the space bar masks the small diﬀerences inherent in the processing of formulaic sequences. Or it may be that the word-by-word nature of the task disrupts the holistic processing of formulaic sequences. A study now in preparation where strings of words (rather than individual words) are revealed by pressing the spacebar may go some way towards determining the viability of self-paced reading in this type of research.

Conclusion

This chapter reported an exploratory study aimed at illuminating the processing of formulaic sequences by looking at the recognition times of their component words. As this was the first time a new methodology (self-paced reading) was used to study formulaic sequences, the study inevitably raised more questions than it answered. Some of the more intriguing include the following:

1. Do formulaic sequences actually offer an advantage in terms of speed of processing?
2. Is the advantage of formulaic sequences mainly to do with speed of processing or improved comprehension or both?
3. Is the recognition of formulaic sequences related to a cluster of content words near the beginning of a sequence?
4. In order to recognize a string of words as a formulaic sequence, is it necessary to see those words in sequence, as opposed to word-by-word? What implications does this have for slower L2 readers who essentially decode reading text in a word-by-word manner?
5. If a formulaic sequence is unknown, what does a reader do to form an interpretation?

These questions are interesting in their own right, and answers to them could have a strong impact on the theoretical understanding of lexical processing, and may have pedagogical implications as well. We look forward to pursuing them in the future.

References

Aaronson, D. and Scarborough, H. S. 1976. Performance theories for sentence coding: Some quantitative evidence. Journal of Experimental Psychology: Human Perception and Performance 2: 56–70.
Aaronson, D. and Ferres, S. 1983. Lexical categories and reading tasks. Journal of Experimental Psychology: Human Perception and Performance 9: 675–699.
Aaronson, D. and Ferres, S. 1984. Reading strategies for children and adults: Some empirical evidence. Journal of Verbal Learning and Verbal Behavior 23: 189–220.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Longman.
Nattinger, J. R. and DeCarrico, J. S. 1992. Lexical Phrases and Language Teaching. Oxford: OUP.
Nelson, K. 1975. The nominal shift in semantic-syntactic development. Cognitive Psychology 7: 461–479.
Pawley, A. and Syder, F. H. 1983. Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In Language and Communication, J. C. Richards and R. W. Schmidt (eds), 191–225. London: Longman.
Schmitt, N., Schmitt, D., and Clapham, C. 2001. Developing and exploring the behaviour of two new versions of the Vocabulary Levels Test. Language Testing 18: 55–88.
Vihman, M. M. 1982. Formulas in first and second language acquisition. In Exceptional Language and Linguistics, L. K. Obler and L. Menn (eds), 261–284. New York: Academic Press.
Weinert, R. 1995. The role of formulaic language in second language acquisition: A review. Applied Linguistics 16: 180–205.
Wong Fillmore, L. 1976. The second time around: Cognitive and social strategies in second language acquisition. Unpublished PhD thesis, Stanford University.
Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: CUP.

Comparing knowledge of formulaic sequences across L1, L2, L3, and L4

Carol Spöttl and Michael McCarthy

University of Innsbruck and University of Nottingham/ Pennsylvania State University

Introduction

Everyday language contains significant proportions of routine elements. Native speakers (of any language) seem to easily recognize their form and keep them in their memory as prefabricated units. Learners (of any language) at various stages have these formulaic sequences as their target goal. Wray (1999) questions a common assumption often made in the second language learning literature, namely that, because formulaic sequences are a feature of first language learning, they must be easy for the learner to adopt in acquiring further languages. Learners have been shown to be relatively successful in learning these sequences in the initial stages of learning an L2, and this learning can lead to initial communicative success (Nattinger and DeCarrico, 1992: 183). Bolander (1989: 73) argues that, right from the early stages of learning, memorisation of strings and formulaic speech are important as a means of facilitating conversation. However, Pawley and Syder (1983) have illuminated the problem at the other end of the scale with advanced non-native speakers, and shown it to be the last and most challenging hurdle in attaining near native-like performance. In her preface, Wray (2002) asks how something so easy and useful for both beginners and very proficient users can be so difficult in between. We will investigate this ‘in-between’ stage by investigating multilingual advanced learners’ and users’ perception and knowledge of formulaic sequences across their various languages.

92

Carol Spöttl and Michael McCarthy

Background

Formulaic sequences

The study adopts Wray’s working definition of the formulaic sequence:

A sequence, continuous or discontinuous, of words or other meaning elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from the memory at the time of use, rather than being subject to generation or analysis by the language grammar (Wray 2000: 465).

This definition highlights the research concern behind the present study. If words in a formulaic sequence are “glued together” and stored as a single “big word” (Ellis 1996: 111), can we observe learners perceiving these sequences as holistic units of meaning, or observe operations in which the sequences appear to be retrieved whole from memory rather than generated? Or would observation reveal analysis of such sequences by the language grammar? What hurdles does the learner meet in decoding the sequences?

Formulaic utterances: lexical representation, semantic access and proficiency

The nature of lexical representation and semantic access of formulaic utterances has yet to be fully addressed. In presenting his Homogeneity Hypothesis, Libben (2000) states that lexical knowledge is at the heart of the language system and argues that the key to understanding the overall organisation of language in the mind lies in understanding how lexical knowledge is organised. His hypothesis is that in the proficient bilingual lexicon there are no separate stores of lexical knowledge, just “homogenous lexical architecture” (p. 230). Twenty-three years earlier, in his discussion of the Independence-Interdependence issue, McCormack (1977: 64) also concluded that the single store position made most sense. This view may even help reconcile the split in vocabulary studies between research based on single words and research advocating that the relevant unit of analysis should be the multiword unit (Bolinger, 1976; DeCarrico, 1998; Pawley and Syder, 1983). If the store is a single piece of architecture, then the issue of single word vs formulaic sequence, at least as regards representation and processing, would appear less controversial. Drawing on the BIMOLA model (The Bilingual Model of Lexical Access), an interactive model of spoken word recognition, De Groot (2002) proposes that initial access to the mental lexicon is independent of the language being processed. De Groot and Hoeks (1995) investigated Dutch trilinguals (L1 Dutch, L2 English and L3 French), for whom L2 was the stronger language and L3 the weaker, and found that concept mediation translation was used at a high level of foreign-language proficiency and word-association translation at a low level. They concluded that foreign language proficiency affects memory representation not only across participant populations but also within individual participants. “The important conclusion is that within one and the same multilingual mind different types of bilingual memory representations – word-association representations and concept mediation representations – may co-exist” (De Groot, 2002: 42). This may well hold true for single word items, but do the same conditions apply to multi-word items? Can evidence be found for a similar pattern for the lexical representation and access of formulaic sequences? Bialystok (2001) warns that experimental results aimed at determining the nature of representation depend on whether the task requires access to the conceptual representations or can be solved by more shallow access to the lexicon:

Moreover, the nature of the relation between words and their meanings changes as a function of fluency in each language. As language proficiency increases, the connection between a word and its meaning becomes more direct, relying less on a mediating connection through the first-language connection (Bialystok 2001: 103).

Problems with formulaic sequences in L3 and L4

In other studies (Hinger and Spöttl, 2002; Spöttl and McCarthy, 2003), vocabulary size and general language proficiency proved the greatest stumbling blocks in investigating cross-linguistic lexical operations. For formulaic sequences in particular, saliency and opaqueness of meaning were observed as additional factors affecting the level of task difficulty in a multilingual context. Without a certain level of general language proficiency, noticing did not even take place, and participants were found to ignore completely those sections of L2 dialogues containing multiword items. No attempt was made even to read the sequence aloud, either in full or in part. Schmidt (2001) argues that attention is necessary for input to become available for further processing. Additionally, Gass (1988), van Lier (1991, 1994) and Schmidt (1995) put forward convincing arguments that attention, noticing and awareness play an important role in any language acquisition. Schmidt (2001) reports Leow’s (1997: 19) study, in which learners of Spanish gave think aloud protocols while completing an L2 crossword puzzle, and which distinguished between two levels of awareness: simple noticing and noticing with metalinguistic awareness. Part of our research concern was to observe any link between general language proficiency and attention to formulaic sequences across the languages.

Lexical links between formulaic sequences in the L2 and L1/L3/L4

Numerous studies have shown that processing of single L2 words partly proceeds phonologically. Some research has centred on the premise that, unlike in the L1 lexicon, much L2 processing is not semantically focused but phonologically driven (Laufer 1989: 17; Gass and Selinker 1994; Harley 1995: 7). Levenston (1979), Laufer (1991), and Celce-Murcia (1978) have shown that learners avoid words they cannot pronounce. Söderman (1994) proposes that each lexical item has its own processing programme, starting with a phonological profile and then moving to a semantic profile. Although extensive research has been carried out into cross-linguistic influence between L1 and L2, particularly in the field of bilingualism, to the best of our knowledge no single study adequately covers lexical links in spoken formulaic utterances in L3, L4 or more. This study hopes to illuminate the above issues by investigating semantic access and the processing paths used by advanced proficiency users for corpus-derived, context-embedded formulaic sequences across L1, L3, and L4.

Data collection/instruments

This study explores knowledge of formulaic sequences across several languages. To do this, it was necessary to investigate a limited number of participants in considerable depth, using a combination of qualitative and quantitative data. Data on the quantitative side of the continuum was collected from receptive and productive measurements of participants’ knowledge of target formulaic sequences. More qualitative data was collected from think aloud protocols, as well as from a questionnaire which investigated how well participants thought they knew those sequences and what problems they perceived in using them across the various languages they knew.


Think aloud protocols

Think aloud protocols (TAPs) were chosen as a way to probe in depth the participants’ knowledge of formulaic sequences across their L1 and additional languages. This technique is now well established in applied linguistics research, with Ericsson and Simon (1987, 1993) arguing that this type of data is admissible, interesting and usable. In particular, we wished to use this technique to explore the following questions:

• How easily transferable are formulaic sequences between languages?
• What strategic routes do participants take when handling formulaic sequences?
• How do participants process formulaic sequences?
• Are psychologically-salient formulaic sequences easier to recall?

These questions are potentially difficult to operationalize in a purely quantitative fashion, but we feel that they may be more amenable to the more detailed probing possible through think aloud protocols.

In order to use TAPs, it was necessary to develop a task for the participants to work through. We wished to provide a number of authentic formulaic sequences for the multilingual participants to consider in their various languages. In order to select these target sequences, we turned to a spoken corpus for guidance. Corpus linguistics has yet to provide reliable statistics for the distribution of multi-word items in adult native speaker usage, partly because of the difficulty of automatically retrieving such units. Computers cannot reliably distinguish between strings which simply recur but which have no psychological status as units of meaning (e.g. the syntactically dependent string to me and occurs over 100 times in the corpus used in the present study) and those units which have a semantic unity and syntactic integrity, even though they may be less frequent (e.g. the unitary discourse-marker phrase as far as I know occurs less than half as often as to me and in the corpus).

This has led some linguists to broaden the scope of chunking to incorporate syntactically incomplete strings (e.g. Altenberg, 1998; De Cock, 2000) simply on the basis of recurrent word combinations, which might include phrasal and clausal fragments (e.g. De Cock gives the examples of in the and that the), as well as intuitively meaningful but syntactically incomplete stems such as it is true that. In the present chapter, we have confined ourselves to those items in automatically extracted strings which display syntactic and semantic integrity, which has necessarily involved manual sifting of the automatically generated data (see below).

95

96

Carol Spöttl and Michael McCarthy

The present study uses the five-million word CANCODE spoken corpus of British English. CANCODE stands for ‘Cambridge and Nottingham Corpus of Discourse in English’. The corpus was established at the Department of English Studies, University of Nottingham, and is funded by Cambridge University Press. It consists of transcribed conversations. The recordings were made non-surreptitiously in a variety of settings, including private homes, shops, offices and other public places, and educational institutions in non-formal settings, across the islands of Britain and Ireland, with a wide demographic spread. The CANCODE corpus forms part of the much larger Cambridge International Corpus. For further details of the CANCODE corpus and its construction, see McCarthy (1998).

The procedure for extracting the data samples used in the present study was the following:

1. A rank-order frequency list was generated for the entire five-million word corpus. The list consisted of non-lemmatised word-forms (get and got, for example, are two separate entries). Non-lemmatisation was purposeful: Sinclair (1991) argues convincingly that inflected and derived forms of words may enter into different syntactic patterns with different semantic/pragmatic meanings.
2. Working through the list of the 100 most frequent tokens, and ignoring articles, pronouns and other high-frequency non-lexical tokens such as yes and and, but including prepositions, other conjunctions, and basic adverbs such as here/now, concordances were generated for 20 high-frequency tokens ranging across the word-classes. These were: on, all, think, with, now, out, good, time, just, way, comes, get, go, when, had, see, sort, here, down, thing.
3. Using software able to produce all recurring strings of variable lengths based on each key word, lists of all 3-word strings with a frequency greater than 10 were generated for each item. This produced clusters such as out in the (for out), which were rejected (see the discussion above), but also more integrated sentence-stem clusters such as do you remember when [x]? for when, which are certainly useful, frequent chunks in terms of fluent production. For the present purposes, and given our research questions, we were more interested in opaque or semi-opaque chunks which might present a processing challenge to the learner and which might or might not have accessible equivalents in L1 or L3, such as it’s a good job (in the sense of “thank goodness”), i.e. strings displaying at least a degree of semantic opacity.
4. The next step, therefore, was to identify and agree upon the most frequent non-transparent 3-word string in each list. In the case of out, this was out of place, which occurred 11 times in five million words (a frequency considerably greater than that of a string such as kick the bucket, which occurs only twice in five million words). The eleven target sequences selected for use in this study were:

On: on and off
All: all over the place
Think: now that I come to think of it
With: what with
Now: every now and then
Out: out of place
Good: it’s a good job
Time: time after time
Just: I’m just looking around
Way: there’s no way
Comes: if the worst comes to the worst

In the choice of sequences no consideration was given as to whether the English formulaic sequences had any counterparts in the other languages.
5. The final step was to extract from the concordances a typical context for each selected string, which could either be used verbatim or edited to provide a suitable stimulus for the experiment. Numerous contexts were considered, and the final extracts are reproduced as edited in Appendix 1. Editing was necessary because unedited concordance lines often do not provide enough context for an item to be used successfully as an experimental prompt. Learners are most likely to meet multi-word items in the real world in adequate contexts, not out of context or in impoverished or (to the outside observer) impenetrable contexts.

An example of one of these extracts is the context for it’s a good job:

A: Owen always says what he wants. He has ideas.
B: Mm. Well it’s a good job Sarah married somebody like that isn’t it or they’d just be sitting there on the sofa because neither of them would say what they wanted to do
A: Yeah.

The task of the participants was to translate the selected extracts from L2 (English) to L1 (German) and then into L3 and/or L4 (French and/or Spanish) while giving a think aloud protocol commentary.
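
The automatic part of the extraction procedure (the frequency list of step 1 and the recurring 3-word strings of step 3) can be illustrated with a minimal sketch. It is not the software used for the study, which the chapter does not name; the function names, the toy token list and the lowered threshold are all assumptions made for illustration only. The manual sifting for semantic and syntactic integrity described in steps 3 and 4 is not automated here.

```python
from collections import Counter


def word_frequencies(tokens):
    """Step 1: rank-order frequency list of non-lemmatised word-forms."""
    return Counter(tokens).most_common()


def recurrent_strings(tokens, keyword, n=3, min_freq=10):
    """Step 3: n-word strings containing `keyword` occurring more than `min_freq` times."""
    counts = Counter()
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if keyword in window:
            counts[" ".join(window)] += 1
    return {string: freq for string, freq in counts.items() if freq > min_freq}


# Invented toy data standing in for a corpus (hypothetical, not CANCODE).
tokens = ("it was out of place here " * 3
          + "we went out in the rain " * 3
          + "out of luck").split()

frequency_list = word_frequencies(tokens)
# A threshold of 2 replaces the chapter's 10 only because the toy data is tiny.
strings = recurrent_strings(tokens, "out", n=3, min_freq=2)
```

On this toy data both out of place and out in the survive the frequency filter, which shows why the automatically generated lists still had to be sifted by hand: frequency alone does not separate an integrated unit from a syntactically dependent fragment.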

97

98

Carol Spöttl and Michael McCarthy

As discussed earlier, numerous studies have shown that processing of single L2 words partly proceeds phonologically, but the same cannot simply be assumed for formulaic sequences. This study tries to establish the cognate processing routes of formulaic sequences. TAPs, it was hoped, would provide better conditions for gathering such data for analysis. By embedding the sequences in sufficient context and observing learner behaviour in transferring that language in its social context, it was hoped to get as near to naturally authentic processing as possible.

Multiple-choice test of formulaic sequence knowledge

In order to elicit receptive knowledge of the target sequences, an eleven-item multiple-choice test was developed. Participants were given the context from the authentic spoken dialogues used in the TAP experiment (see above) and asked to decide whether the choices provided were appropriate in the context. The test offered five alternatives: a, b, c, d, and e) I don’t know. The distracters on the test were based on L1 linguistic items, in order to observe any possible cross-linguistic misunderstandings on a semantic level (see Extract 11 below), on other similar authentic L2 sequences that were not context appropriate (Extract 6), or on other L2 near synonyms (Extract 8). The instrument format was one with which the participants were very familiar and comfortable, and which they afterwards stated they had enjoyed because of the challenge set. See Appendix 2 for the complete test.

Multiple-choice test

Read the extracts below taken from authentic spoken dialogues. Decide whether the phrases below them are appropriate in that context. More than one answer is possible. Choose e) I don’t know if you simply just don’t know.

Extract 11 (L1 interference)
A: Let’s go up the side of the lake and see where there is to camp.
B: Yeah. You usually find somewhere don’t you.
A: Oh yes. I’ve done it for years you know.
B: But, well we have the car so .......... we can sleep in that.
a. in the worst case (L1: im schlimmsten Fall)
b. if the best comes to the worst
c. if the worst comes to the worst
d. if it comes hard on hard (L1: wenn es hart auf hart kommt)
e. I don’t know
[answer: c]

Extract 6 (authentic formulaic sequences but not context appropriate)
A: It would be perfect. His Miami tee shirt on. Oh he’d love it.
B: Yes he would.
A: Florida would be just right for him. He wouldn’t look .......... He’s got an American body. He’s got the shape for it, completely.
a. in shape
b. in form
c. out of place
d. out of touch
e. I don’t know
[answer: c]

Extract 8 (L2 near synonyms)
A: I don’t think they’d like to go to too many concerts. It’s like Victoria says. I mean you don’t go and see the same band .......... do you?
B: No not unless you really like them.
a. after a time
b. time after time
c. time before time
d. time and again
e. I don’t know
[answers: b and d]

A second purpose of the multiple-choice test was to assess the depth of participants’ knowledge in L2 (English) of potential near-synonym sequences for some of the target sequences. To this end, the multiple-choice test also asked participants to state whenever they thought there was more than one possible answer. This was the case with Extract 1 (on and off, off and on), Extract 7 (it’s a good job, it’s just as well) and Extract 8 (time after time, time and again). These responses were recorded and analysed independently of the context-appropriate answers. It was hoped this would be an indicator of the participants’ receptive depth of knowledge of the selected formulaic sequences in specific given contexts.
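
Because some items allow more than one appropriate answer, scoring has to keep recognition of a target sequence separate from recognition of its near synonyms. The sketch below shows one way this could be recorded. The answer key is taken from the three sample extracts above; the scoring scheme, the field names and the function name are assumptions for illustration and are not the authors’ procedure.

```python
# Answer key for the three sample items shown above (extract number -> appropriate options).
ANSWER_KEY = {11: {"c"}, 6: {"c"}, 8: {"b", "d"}}


def score_item(item, chosen):
    """Score one response; `chosen` is the set of option letters a participant ticked."""
    chosen = set(chosen)
    if "e" in chosen:  # e) I don't know
        return {"known": False, "appropriate": 0, "inappropriate": 0,
                "all_alternatives_found": False}
    correct = ANSWER_KEY[item]
    hits = chosen & correct
    return {
        "known": bool(hits),                    # at least one appropriate option chosen
        "appropriate": len(hits),
        "inappropriate": len(chosen - correct),
        # Recorded separately: were all near-synonym options recognised?
        "all_alternatives_found": len(correct) > 1 and hits == correct,
    }
```

For example, a hypothetical participant who ticks only b on Extract 8 is credited with knowing the sequence, while one who ticks both b and d is additionally recorded as recognising the near synonym.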

Reflective self-assessment questionnaire

The research project asked participants to move from L2 to L1 on relatively sophisticated linguistic items. Success or failure in task completion would logically be dependent on the participants’ clear understanding of the L2 item. To investigate this, a learner questionnaire was designed to ascertain the participants’ perspective on their receptive knowledge of the sequences in the L2 (English). The questionnaire consisted of a hard copy of the eleven extracts with the target sequences printed in bold type. The participants were asked to study the extracts and then do two tasks. First, they were asked whether they knew the individual sequence or not. Second, they were asked to tick yes or no to indicate whether they felt they could translate it into their L1 (German) (see Appendix 3). The translation question was included on the questionnaire because translation is a task with which the participants were very familiar. It was hoped that by selecting the direction L2 to L1, insights into the transparency of meaning of the selected sequences could be gained, because participants would be able to explain the meaning in their L1, which would provide ease and variety of expression not otherwise available in L2, L3, or L4.

Piloting

The questionnaire and multiple-choice test were piloted with three language staff members, one English native speaker and two German native speakers. These members were not involved in the project itself. The piloting revealed that the tasks set were clear, the rubrics comprehensible and the time estimated to complete them accurate. The results of the multiple-choice test pilot did lead to one change. The original version contained four alternatives (a–d) plus a fifth (e) other) to draw out any other knowledge participants might have of related sequences. It quickly became clear that the answers to this alternative could not easily be quantified, nor would they establish whether the participants knew the formulaic sequence or not. For this reason alternative e) was changed to e) I don’t know. The first author had considerable experience with TAPs and was training the participants in their use; piloting of this procedure was therefore considered unnecessary.


Procedure

A timetable outside classroom time was set up for recordings, and participants chose a time convenient to their own personal schedule that would ensure no time pressure on the research task. Two adjoining non-teaching rooms were used to allow for two candidates per session while ensuring privacy for each during the introspection. Data from TAPs has been shown to be affected when participants feel they are being observed. Participants were first given a biodata form to complete and then instructed on the use of the mini-disc player. Participants were given a demonstration of what an introspective verbal report involves, on a subject not related to the research topic. Two further training tasks were administered: one involved performing simple mental arithmetic tasks aloud; the other was a multiple-choice vocabulary test in which students were required to select an answer by thinking aloud about the four alternative responses. All students did both training tasks immediately prior to their own think aloud protocols. The participants’ responses were recorded on mini-disc and subsequently transcribed. This resulted in a 26,000-word learner corpus from the think aloud protocols. Permission was gained from each participant to use their data. As part of the triangulation of data gathering, two months later the participants were given first the formulaic sequence multiple-choice test and then the reflective self-assessment questionnaire on receptive knowledge of the sequences. The delay was intended to avoid any direct influence of the TAP procedure on the subsequent instruments. Respondents were not reminded of the TAPs, but simply asked if they would provide the information required for a research project. Again, permission to use the data was obtained.

Participants

The main purpose of this study was to investigate the formulaic sequence knowledge of advanced learners, because this knowledge is a challenging hurdle in attaining near native-like proficiency. However, there was another reason for choosing participants at this particular proficiency level. In a previous study (Spöttl and McCarthy, 2003), proficiency was found to be a significant factor in learners’ ability to decode multi-word strings. At lower levels of proficiency, it was found that learners simply avoided formulaic sequences they did not know. Thus participants with advanced proficiency levels in L2 (English) and L3 (French or Spanish) were chosen for this study. Participants were selected from both staff and students in upper intermediate and advanced courses in the Department for English Studies and the Department of Romance Languages at the University of Innsbruck. Seventeen participants in all took part: 15 undergraduates aged 21–26 (2 male and 13 female) and two male members of staff aged 40–45. Two biodata forms were used to gather information on the participants: one to establish such factors as number of languages, chronology and time spent studying these languages, and the other to elicit participants’ perspectives on the dynamic nature of their abilities in the individual skills in each of their languages.

Data from these instruments showed the test population to be predominantly German L1 users (14 German L1; 2 bilinguals, German/English and German/Italian; and 1 Spanish L1). Further, the vast majority of the group gave English as their L2 (15 English, 1 German, 1 Italian). French was the dominant L3, given by twelve participants (L3 English 2, L3 Spanish 2). Spanish, Italian, French and Russian (5, 4, 2 and 1 respectively) were the stated L4s. Five participants gave an L5 (Spanish 3, Russian 2), five an L6 (Portuguese 3, Japanese 1, Norwegian 1) and two gave Greek as their L7. After English, the group’s linguistic background is heavily weighted towards the Romance languages, a factor expected to play a role in cross-linguistic influence (hereafter CLI) (Sharwood-Smith and Kellerman 1986) and cross-linguistic interaction (hereafter CLIN) (Herdina and Jessner 2002: 29) in performing productive lexical operations. Seven of the group had spent 7–9 years learning L2 English, the rest over 10 years (a factor crucial to the understanding of the target sequences). Eight of the participants had spent 4–6 years studying L3 French, three had spent 7–9 years and one over ten (a factor expected to influence level of proficiency in the L3). Five of the group had spent over six months in one or more of the countries where the languages in question are used (a factor expected to come into play in the recognition of formulaic sequences not necessarily focused on in classroom teaching).

Results and discussion

Think aloud protocols

Overview

Analysis of the TAP transcriptions revealed three main patterns of response. We will use the following language code conventions: English (regular font) / German (bold) / French (italics) / Spanish (bold italics) / Translation (in parentheses).

Response Type 1
The L2 extract was read aloud and transferred into the L1 and L3 and/or L4 holistically, without hesitation, repetition or evaluation. We choose to call this automatic processing.

Extract 3: . . . now that I come to think about it . . .
Participant A: L1 German / L2 English / L3 French
into German: (reads English text first) Also ich meine ich war die einzige (So I mean I was the only woman) . . . jetzt wo ich darüber nachdenke, war ich die einzige Frau (now that I come to think about it, I was the only woman)
into French: Ce que je veux dire c’est que . . . maintenant où j’y pense, j’étais la seule femme . . .

Response Type 2
The L2 extract was read; a start at translation was made and then aborted. Only the L2 formulaic sequence was repeated, and a search process began, producing various responses that were then evaluated. This we see as synthetic evaluative processing.

Extract 2: . . . all over the place . . .
Participant B: L1 German / L2 English / L3 French
into German: . . . dass ich da Leute von überall treffe (. . . that I meet people there from all over the place) oder Leute von der ganzen . . . (. . . or people from all over the . . .) nein, vielleicht nicht von der ganzen Welt (no, maybe not from all over the world) dass ich da Leute von überall her treffe (. . . that I meet people there from all over the place) . . . ja (yes) . . . ok, das ist es, ja (that’s it, yes) . . .
into French: je rencontre des gens de tout le monde (I meet people from all over the world) . . . bien, de . . . de partout (from all over) . . . un peu de partout . . . eh yeah, all over the place . . . I’m trying to . . . I keep trying to say all over the world . . . that’s probably where it comes from and I wanna say that in French . . . de tout le monde (from all over the world) . . . no, that’s stupid, that means everybody . . . bien, c’est de rencontrer des gens de . . . de un peu de partout.

Response Type 3
The L2 extract was read, a start at translation was made and then aborted. The individual words of the formulaic sequence were repeated, but not the sequence in its entirety, and a search process was begun following individual words or the grammar of the language. This we refer to as analytic evaluative processing.

Extract 9: . . . I’m just looking round . . .
Participant C: L1 German / L2 English / L3 French
into German: Kann ich Ihnen helfen, Madam? (Can I help you Madam?) . . . äh . . . Nein, danke, ich sehe mich nur um. (No thanks, I’m just looking around)
into French: Avez-vous besoin de l’aide, Madame? (Do you need help?) Non, merci beaucoup, (No, thank you) hm . . . hm . . . just looking around . . . hm . . . je suis seulement . . . (I am only . . .) . . . hm . . . je suis seulement en train de . . . (I’m only in the middle of . . .) non, (no) je . . . je ne voudrais que . . . hm . . . was heißt denn umsehen (what does umsehen [look around] mean) . . . je ne voudrais que . . . hm . . . voir . . . ok, mach ma voir (let’s go for voir [see]) . . . je ne voudrais que voir un peu (I would like to look a little)

Most participants produced Type 2 responses. Only three participants out of seventeen followed Response Type 3 in several of the extracts, and five participants followed Response Type 1. Participants following Response Type 1 had either native or near-native competence in L2. On triangulation with the biodata form, these five participants had one or a combination of the following: lengthy stays in English-speaking countries (six months to a year); intensive daily contact with English outside the teaching classroom, either through the family or through partnerships; and a high proficiency level in the L3 coupled with lengthy stays in that country (six months to a year). This may indicate that, in order to deal effectively and accurately with the opaque meaning in formulaic sequences, learners need not only a high level of proficiency in the measurable classroom sense, but also repeated experience of these sequences in their natural and authentic contexts and social interaction functions.
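
The three response types are distinguished by a small number of observable features of a protocol: whether the first translation attempt was aborted, and whether the participant then repeated the whole L2 sequence or only its individual words. The sketch below restates those criteria as a decision rule. It is a simplification for illustration, not the authors’ coding scheme; the class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProtocolSegment:
    """Hypothetical coding of one participant's handling of one extract."""
    translation_aborted: bool        # a start at translation was made, then abandoned
    repeated_whole_sequence: bool    # the L2 sequence was repeated as a whole
    repeated_individual_words: bool  # only individual words of the sequence were repeated


def response_type(segment):
    """Return 1, 2 or 3 for the response types described above, or None if none fits."""
    if not segment.translation_aborted:
        if not (segment.repeated_whole_sequence or segment.repeated_individual_words):
            return 1  # holistic transfer without repetition: automatic processing
        return None
    if segment.repeated_whole_sequence:
        return 2      # synthetic evaluative processing
    if segment.repeated_individual_words:
        return 3      # analytic evaluative processing
    return None
```

A real coding scheme would also need to record hesitation and evaluation, which are judged from the transcript as a whole and are omitted here.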


Comparing Type 1 with Type 3 responses also illustrates the automaticity involved when formulaic sequences have been solidly mastered, leading to the often-cited speed and ease in real-time processing. However, until the amount of exposure in the respective languages and real-situational use of formulaic sequences allows people to reach a very good mastery of those sequences, their usage is unlikely to be automatic, and the advantages of ease and speed in real-time processing will remain elusive. Response Type 2 raised a number of interesting aspects, which will now be discussed in detail.

Strategic approaches to formulaic sequences

Howarth (1998) attempts a categorisation of the cognitive strategies used by advanced learners, based on empirical work (Gabrys-Biskup, 1992; Bahns, 1993; Bahns and Eldaw, 1993; Granger, 1998; Cowie and Howarth, 1996), and identifies five strategy types in learners’ conscious attention to collocational knowledge in phraseology: avoidance, experimentation, transfer, analogy, and repetition. Supporting evidence for these was found in this study. Of the seventeen participants, none produced an acceptable L1 version for Sequence 4, “what with”. Four ignored the sequence completely (Howarth’s avoidance strategy); the other thirteen produced seven different inappropriate versions in German, revealing experimentation, transfer and analogy strategies (Howarth, 1998).

German response (what with)   Transliteration      Number of participants   Howarth’s strategies
no response                                        4                        Avoidance
da ich auch                   because I also       2                        Analogy
Weil                          Because              5                        Analogy
um auch nach                  in order also to     2                        Experimentation
aber wenn ich auch            but if I also        1                        Analogy
Was wäre wenn                 what would be if     1                        Analogy
Was ist mit                   what is with         1                        Transfer
Was damit                     what with it         1                        Transfer
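
The counts in the table above can be cross-checked against the figures in the text (seventeen participants, four of whom gave no response) and summed by strategy. The short script below is only an arithmetic check; the row pairings follow our reading of the table as printed, which is an assumption, and the variable names are illustrative.

```python
from collections import Counter

# (German response, number of participants, Howarth's strategy), as read from the table.
rows = [
    ("no response", 4, "Avoidance"),
    ("da ich auch", 2, "Analogy"),
    ("Weil", 5, "Analogy"),
    ("um auch nach", 2, "Experimentation"),
    ("aber wenn ich auch", 1, "Analogy"),
    ("Was wäre wenn", 1, "Analogy"),
    ("Was ist mit", 1, "Transfer"),
    ("Was damit", 1, "Transfer"),
]

total_participants = sum(count for _, count, _ in rows)

# Number of participants per strategy.
by_strategy = Counter()
for _, count, strategy in rows:
    by_strategy[strategy] += count

# Distinct German versions actually produced (excluding "no response").
versions_produced = sum(1 for response, _, _ in rows if response != "no response")
```

On this reading, analogy accounts for nine of the thirteen attempted translations, with experimentation and transfer accounting for two each.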

Results seemed to show that where meaning can be processed holistically, under certain conditions it can also be transferred holistically, making for ease in translation. For Sequence 2, all over the place, six versions were produced, five of which were acceptable and given by sixteen of the seventeen participants. Here it would seem the majority applied Howarth’s transfer strategy, taking the adverbial meaning in L2 English and copying it into the various adverbial alternatives in German. For this sequence, there were neither pauses nor repetitions on the TAPs, a further possible indication of holistic processing with a high degree of automaticity.

German response            Transliteration               Number of participants
Von überall                from everywhere               4
Von überall her            from everywhere               6
Von der ganzen Welt        from all over the world       3
Aus aller Welt             out of all the world          2
Von überall auf der Welt   from everywhere in the world  1
Überall                    Everywhere                    1

Not all sequences produced this homogeneous strategy choice. It would be oversimplistic to suggest that similarity between L2 and L1 word class dominates the choice of strategy and productive response. But where meaning is processed holistically but cannot be transferred holistically, then translations reveal greater variation and may draw on other word classes or searches for idiom-equivalents. This was the case with both Sequence 6 out of place and Sequence 7 it’s a good job. In cases where a wide range of strategies was observed, a variety of linguistic paths was also recorded. Sequence 7 it’s a good job has a linguistically heavy noun “job” which might be expected to function as a key processing item. Here data suggested that for some participants the noun in fact inhibited holistic transfer of meaning into L1. The sequence generated the following linguistic paths:

a. Literal translation based on the noun
b. Idiomatic translation based on the noun
c. Super-ordinate translation based on the noun
d. Rejection of noun inhibitor in favour of holistic translation based on adjectival meaning
e. Rejection of noun inhibitor in favour of holistic translation based on idiomatic chunk

| German response | Transliteration | Processing choice | Lexical paths L2 → L1 |
|---|---|---|---|
| es ist eine gute Arbeit | It is a good job* | A | job → work → Arbeit |
| es ist ein Glück | It is a piece of luck | B | good job → Glück |
| es ist eine gute Sache | It is a good thing | C | job → thing → Sache |
| es war good | It was good | D | Processed holistically |
| es ist günstig | It is favourable | D | Processed holistically |
| das ist gut | that is good | D | Processed holistically |
| es ist wirklich super | It is really super | D | Processed holistically |
| das ist richtig so | That is right like that | D | Processed holistically |
| Gott sei dank | Thank God | E | Processed holistically |

*in the sense it is a job well done or a good piece of work


Sequence 6 out of place might be expected to produce a similar range of strategies and linguistic paths starting from the noun place. German has this route to offer with the noun Platz (place) and the idiom fehl am Platz. This expectation was not borne out. Only one participant took this route but added the verb sein and a negation nicht, which linguistically would not be necessary. This sequence generated predominantly verb-based translations and the following linguistic paths:

a. verb-based translation
b. idiomatic translation based on verb
c. holistic translation based on idiomatic chunk
d. holistic translation based on noun phrase
e. concept mediation “belonging”

This might suggest that with this particular sequence there was a shared underlying meaning between L1 and L2 or “concept mediation” (de Groot, 2002: 37). The final four versions on the list were inappropriate responses revealing the sequence clearly had not been understood.

| German response | Transliteration | Processing choice |
|---|---|---|
| überhaupt nicht heraus stehen | wouldn’t stand out | A |
| überhaupt nicht heraus stechen | wouldn’t stick out | A |
| nicht so auffallen | wouldn’t be noticed | A |
| nicht fehl am platz sein | wouldn’t be out of place | B |
| nicht aus der Reihe fallen | wouldn’t fall out of line | C |
| er würde genau dorthin passen | would fit in there perfectly | E |
| er würde nicht aus der Reihe tanzen | wouldn’t dance out of line | C |
| nicht einmal merken dass er von dort ist | wouldn’t notice he was from there | A |
| würde aussehen als gehörte er dorthin | looks as though he belongs there | E |
| als wäre er ein Einheimischer | looks like a local | D |
| auf nichts anderes schauen | wouldn’t look at anything else | A |
| würde sehr überrascht darüber sein | wouldn’t be very surprised | A |
| ganz genau in der richtigen Stelle sein | Be right in the correct spot | D |
| würde nicht weg gehen wollen | wouldn’t want to go away | A |

Formulaic sequences and functional use

The formulaic sequence that produced the lowest degree of response variance was Sequence 9 I’m just looking. All seventeen participants produced an acceptable and appropriate German version with no hesitations, repetitions or unsuccessful search beginnings. All five versions were in themselves formulaic in German and all meant I’m just looking. An explanation may lie in the pragmatically specialized nature of this sequence, which sets it apart from the other ten. All approaches to the study of formulaic language stress the importance of its functional aspect, i.e. the fact that certain language sequences have conventionalized meanings which are used in certain predictable situations. Weinert (1995: 195) further argues for research into the relation between formulaic language and the development of pragmatic competence as a central learning task. Sequence 9, which would be most likely found in service encounters, falls under this category. Furthermore, it is an area where participants can be expected to have high degrees of experience, practice, and hence again, increasing automaticity. This may allow the speculation that multiword items which are lexically frozen and have a limited or specialized pragmatic function are more easily transferred into L1 and therefore would seem to be processed holistically.

| German response | No. of participants |
|---|---|
| ich schaue nur | 4 |
| ich schaue mich nur ein wenig um | 7 |
| ich sehe mich nur um | 3 |
| ich sehe mich nur ein bischen um | 1 |
| ich möchte mich nur umsehen | 2 |

Insights into production and processing were gained not only from the degree of response variation but also from errors. Sequence 4 what with recorded the following errors in production. Two of the inappropriate German responses were:

| German response | Transliteration | Functional formulaic sequence equivalent |
|---|---|---|
| was wäre wenn | what would be if | what if |
| was ist mit | what is with | what about |

These responses seem to reveal searches for stored sequences containing “what”, showing that the participants appear to be aware of the functional uses of the formulaic sequences what if (condition) and what about (suggestion) but not what with (explanation). That almost half of the group took the formulaic sequence to mean the same as the conjunction “because” and went for a causal solution, weil (because) and da ich auch (because I also . . . ), would appear to be evidence of holistic processing rather than analytical processing. The responses may also be illustrating paths taken in the development of depth of meaning for formulaic sequences of the pattern what + prep, thus demonstrating that to “know” a FS begins by first clarifying its meaning-level representation or establishing its conceptual representations (as in de Groot’s concept mediation) before it can be mapped on to the appropriate equivalent in the L2.


Evidence from sequences with lexically light components

The widest range of versions was produced with Sequence 1 on and off. There was no evidence here of avoidance but the seventeen participants produced thirteen different versions of which only five were acceptable. On and off is similar to Sequence 4 what with in that its components are lexically light items. Whereas with Sequence 2 all over the place Howarth’s transfer strategy was applied to prepositional equivalents in L1, in this case the literal L1 prepositional meaning ein und aus (as in: to switch something on and off / etwas ein und aus schalten) was never used. Direct transfer appears to have been inhibited by contextual clues that proved too strong and presented problems on a more idiomatic level. Unlike the search for conceptual mediation in the sequence what with, this lexically light sequence is a binomial and appears to have triggered a phonologically linked search for many respondents. Research into the processing of binomials has reported that L2 learners are sensitive to phonologically-based constraints on the order of elements in such sequences (Birdsong, 1979; Birdsong and Pinker, 1979). The responses in German seem to posit a role of phonological determinants in selecting an appropriate binomial L1 translation. For Sequence 1, eleven of the seventeen produced immer wieder / kam und ging / einmal besser einmal schlechter and for Sequence 5 (discussed below) hin und wieder / immer wieder / ab und an.

Sequence 1: on and off

| German response | Transliteration | Number |
|---|---|---|
| Wochenlang immer wieder | again and again for weeks | 5 |
| Immer wieder | again and again | 3 |
| Dieser Schmerz kam und ging | this pain came and went | 2 |
| Einmal besser einmal schlechter | sometimes better / worse | 1 |
| über Wochen hinweg | for weeks on end | 1 |
| weil ich es für Wochen nicht los wurde | I couldn’t get rid of it for weeks | 1 |
| weil es über Wochen gedauert hat | because it lasted for weeks | 1 |
| ich hatte es immer wieder Wochen lang | I had it again and again for weeks | 1 |
| weil es über Wochen gedauert hat | because it lasted for weeks | 1 |
| Einmal hatte ich sie und einmal nicht | once I had it and once not | 1 |

Sequence 5 every now and then was another formulaic sequence with lexically light items. It generated seven different L1 responses, revealing how opaque participants found its meaning. However, none of these responses showed evidence of lexical processing at individual word level. The sequence seems to have reached a level of shared semantic meaning between L2 and L1 lexicons (concept mediation again), as illustrated by the idiomatic nature of some of the search results and the non-literalness of the translations given.

20

Carol Spöttl and Michael McCarthy German response

Transliteration

No. of participants

Alle heiligen Zeiten hin und wieder Immer wieder ab und an Manchmal Jedesmal Ständig

At all holy times every now and then again and again oﬀ and on Sometimes every time Always

1 3 8 1 2 1 1

Evaluating meaning-based searches for formulaic sequence equivalents across the languages for suitability and accuracy

Unlike the previous study (Spöttl and McCarthy 2003), the present investigation of this group with high-level foreign language proficiency yielded TAPs with much more evidence of concept mediation translations (de Groot 2002), with word association translations appearing only in the Type 3 processing presented above. Investigating TAPs from L2 into spoken L1 allows a further insight. In a spoken protocol, changes are easier and faster to make than in a formal written exam performed under time pressure (Granger, 1998). Further, given that learners clearly have their greater proficiency in their L1, their considerations and choices when asked to move from L2 to L1 can reveal an awareness of collocational and connotational nuances in L2 formulaic sequences, thus providing an additional insight beyond accurate or inaccurate understanding of the sequences. Such responses may help to reveal the degree of depth of knowledge of the formulaic sequences. Participants often considered the suitability of their translations, and we call this further strategy “equivalence evaluation”.

Transcription extracts from a TAP

Extract 1: “On and off”: equivalence evaluation into French
Participant B: L1 German / L2 English / L3 French

. . . on and off . . . God . . . je l’avais eu de temps en temps . . . (I’ve had it from time to time) but that’s not really the same . . . from time to time, that’s from time to time . . . de temps en temps . . . pendant des semaines et des semaines . . . (on and off) mmmh . . . I should go back to that . . . that’s not really what it is . . . on and off . . . what could that be in French . . . . bien, I’ll just use the . . . I’ll just use de temps en temps . . . je l’avais eu et pas eu, ben, je l’avais eu de temps en temps pendant des semaines et des semaines

Here the participant meets the FS on and off, indicates a problem but produces one L3 FS version which she then evaluates with a further L2 FS version. None of the evaluation goes through the L1 phrasicon. This example demonstrates that learners are not only able to transfer FS across languages but that advanced learners, capable of automatic holistic processing, can do so without necessarily activating L1 knowledge.

Extract 4: “what with”: equivalence evaluation into German
Participant B: L1 German / L2 English / L3 French

well that . . . what with . . . is a bit of a . . . hmmm . . . what with going to London as well . . . I would say it with . . . weil . . . but that’s not exactly the expression . . . . . . . . äh . . . there is no other way in German of saying this “what with” . . . was mit . . . no you can’t say that . . . . but Tage ware eigentlich . . . eigentlich nur ein paar Tage . . . that really would better be eigentlich . . . aber ich habe ja eigentlich nur ein paar Tage . . . äh . . . mit dem Ganzen nach London fahren und so . . . . you could say that . . . mit mit mit der ganzen Fahrt nach London noch und mit dem Unterkunft schauen und so . . . that’s not a grammatically correct sentence but well! . . .

Here, although the same participant demonstrates clear and firm conceptual representation, the processing type is synthetic evaluative. Three L1 versions are produced and evaluated, and two are discarded as not being suitable equivalents. Two of the error versions discussed previously are tried. One is discarded because the L1 (weil) doesn’t map on to the meaning representation existing for the L2 and the second because the L1 literal lexical translation does not exist. The what with / was mit section is the nearest the participant came to word association translation in all of her protocols. The processing in this case is not of Type 1 and is by the same participant as the extract above, supporting de Groot’s findings (2002: 42) that language proficiency affects memory representations not only within individual group members but within individual participants.

Extract 6: “out of place”: equivalence evaluation into German
Participant B: L1 German / L2 English / L3 French

er würde überhaupt nicht (he wouldn’t at all) . . . äh . . . . . . . äh . . . . out of place . . . I’ve come across this problem before translating it into German . . . äh . . . er würde er wäre überhaupt nicht . . . (he wouldn’t at all) . . . äh . . . you can’t translate it literally . . . überhaupt nicht (not at all) . . . äh . . . er würde überhaupt nicht herausstehen (he wouldn’t stand out at all) . . . herausstechen (stick out) oder so . . . äh . . . er hat einen amerikanischen Körper (he has an American body) . . . er hat die, die Form dafür, total . . . ja . . . he wouldn’t look out of place . . . that’s still not . . . komisch (strange) . . . that’s the same as . . . er würde (he would) . . . yeah, you could say . . . er wäre . . . er würde überhaupt nicht auffallen (he wouldn’t be out of place) . . . that’s really what it means, isn’t it . . . I think about what it really means if you picture it . . . a lot of Americans walking along there . . . er würde nicht herausstechen (he wouldn’t stick out at all) . . . er würde nicht auffallen (he wouldn’t be out of place) . . . yeah, we would say it in a positive way there.


This extract provides evidence of a learner’s ability to evaluate the appropriacy of FS between L1 and L2. Here the participant states she has met this sequence before and found no solution. Her attempts to render a suitable equivalent L1 FS include a literal translation which she rejects, a concept mediation version and a visualisation of the situation. However, her equivalence evaluation this time progresses to include an appropriacy evaluation of her L1 versions, comparing positive and negative connotations of herausstechen with auffallen and indirectly revealing her knowledge of the positive connotations of the L2 FS.

Extract 9: “I’m just looking round”: equivalence evaluation into Spanish
Participant D: L1 German / L2 English / L3 French / L4 Italian / L5 Spanish / L6 Russian / L7 Portuguese

En la tienda (In the shop) ¿En qué puedo servirle, señora? (Can I help you, Madam?) ¿No . . . puedo . . . (Can I) puedes ayudar . . . (can you help?) puedo ayudarle, señora? (Can I help you, Madam?) No, gracias. (No, thank you) Quiero sólo . . . (I just want) quiero ver lo que hay. (to have a look) No . . . estoy mirando (I’m just looking) a lo mejor. (maybe) Sí. . . . (yes) está mirando . . . quiero mirar.

This extract provides evidence of equivalence evaluation of an L2 FS in an L5 (Spanish). Again here the participant demonstrates automatic holistic processing in both the service encounter formulaic sequences Can I help you? and I’m just looking. The evidence further supports Bialystok’s view, discussed previously, that with increased language proficiency the link between a word and its meaning representation requires less reliance on L1 mediation. This data shows this also to be true for formulaic sequences. As with the examples above, there is no hesitancy, no repetition of individual words or syllables and no word association translation in a search for equivalents. Again, two versions are produced and evaluated for their equivalence. Interestingly, here the equivalence evaluation of the L5 (Spanish) takes place in Spanish and not in L1 (German) or L2 (English), evidence perhaps of Wray’s description of gaining a solid knowledge and full control of a new language and its sequences, and a sensitivity to collocational nuances.

Gaining full command of a new language requires the learner to become sensitive to the native speakers’ preferences for sequences of words. (Wray, 2000: 463)

Multiple-choice test

From the group of seventeen who took part in the TAPs, fourteen completed the multiple-choice test and self-assessment questionnaires. The multiple-choice test was designed to elicit receptive knowledge of the target sequences. Overall, response success was higher than expected, with only Extract 4 eliciting a score of below 50% accuracy (Figure 1). A previous study of formulaic sequences (Spöttl and McCarthy, 2003) found even the noticing of context-embedded sequences to be a problem; here participants were scoring over 70% on over half of the items (1, 2, 5, 6, 8, 9, and 10) and over 90% on two items (8 and 9). Further examination of the data revealed that the scores as a group would have been even higher but were negatively affected by three weaker members. In any vocabulary learning situation this would be greeted as success. This may be attributed to two factors: first, the choice of high frequency authentic sequences themselves and second, the linguistic background of the participants in this group. In contrast to the previous study, these participants show a high number of other languages, a longer period of study and more outside class experience of the languages. This may allow us to assume that both levels of metalinguistic awareness and general language proficiency were higher. These results, even given the low number of items investigated, may suggest a surprisingly high overall degree of receptive knowledge, at least of such high frequency formulaic sequences.

Figure 1. Multiple-choice test of receptive knowledge of the target formulaic sequences (y-axis: 0–100%; x-axis: Extracts 1–11; series: Correct, Incorrect)

To observe true understanding of context appropriacy, as opposed to simple recognition of a genuine formulaic sequence, some distracters on the test were based on similar authentic L2 sequences that were however not contextually appropriate. Extracts 6 and 10 contained such L2 sequences. Scores on these extracts were around the 80% mark, indicating that participants were relatively successful at discriminating between authentic and appropriate formulaic sequences and authentic but contextually inappropriate sequences. From this we might speculate that contextual appropriacy is in fact stored well with sequence meaning. To observe any possible cross-linguistic misunderstandings on a semantic level, other distracters were based on L1 linguistic items (Extracts 3, 9, and 11). Distracters for Extract 11 were constructed on L1 lexical items and Figure 1 shows receptive knowledge at 50% for this item. Here the accuracy of this receptive knowledge would appear to be questionable, as in the multiple-choice test almost 30% of the group gave the two versions (in the worst case and if it comes hard on hard) as acceptable synonyms, signalling at least a lack of true mastery of the formulaic sequence. Extracts 3 and 9 contained L1 tense system variants. The distracters based on grammatical variation between the L1 and L2 tense systems in Extract 3 led to a relatively low score, whereas Extract 9’s clear pragmatic function seems to have overridden any grammatical doubts, with a resultant very high score. The success of the grammatical distracters would appear to be evidence that those respondents viewed the sequences analytically, and evidence against holistic processing. Conversely, the service encounter in Extract 9 appears to have achieved full automaticity and been processed holistically. The multiple-choice test also asked participants to recognise where more than one answer was possible. This was the case with Extract 1 (on and off, off and on), Extract 7 (it’s a good job, it’s just as well) and Extract 8 (time after time, time and again).
Figure 2 seems to show that, compared to data in Figure 1, the depth of knowledge of the target formulaic sequences involving context decisions is significantly lower. Correct responses to Extracts 1 and 7 were below 10%, Extract 8 being slightly higher at 21%. (Responses to Extracts 3 and 5 show correct synonyms given because some respondents wrote extra acceptable examples in addition to the given alternatives.) These low scores may indicate something about the treatment of formulaic sequences in foreign language pedagogy. Whereas single word vocabulary items are frequently taught through synonyms, are supported with dictionaries of synonyms, and are tested by asking for synonyms, these results seem to suggest that the case with formulaic sequences, insofar as they are in fact taught, would not seem to be the same. It may also suggest that where learners encounter formulaic sequences in context, they may become easily satisfied with “knowing” the L1 meaning and their training in noticing does not yet extend to synonyms of those sequences. In short, the depth of knowledge learners have about formulaic sequences would appear to lag behind that of single word items.

Figure 2. Knowledge of multiple contextually correct formulaic sequences (y-axis: 0–100%; x-axis: Extracts 1–11; series: Identified additional correct sequence, Identified additional incorrect sequence, Indicated no additional sequence was appropriate; * = multiple correct answers, marked on Extracts 1, 7 and 8)

Reflective self-assessment questionnaire

The questionnaire asked participants whether they felt they knew the sequences and whether they had any problems translating the sequence into their L1 (German). Data from these two questions were then compared with the productive knowledge of the target sequences demonstrated in the TAP responses and the receptive knowledge the participants demonstrated in choosing the appropriate sequence for the context provided on the multiple-choice test. First, with the exception of Extract 4 (what with) and 7 (it’s a good job), over seventy percent of the group consistently stated they knew the sequences, generally indicating a high confidence level of receptive knowledge. If the participants’ perceptions were accurate and consistent, then all four bars for each string would be equal, i.e. if they said they knew the sequence and thought they had no problems translating it, then this should correspond with the number of actual correct responses given in German on the TAPs, and answers on the multiple-choice test. Sequence 9 (I’m just looking round) was the only item that produced this equality of response, with Sequence 2 (all over the place) almost reaching it. Yet given the limited number of participants and items researched, Sequences 1 and 7 also show an acceptable pattern.

Figure 3. Comparison of participants’ perception of formulaic sequences and demonstrated knowledge (y-axis: 0–100%; x-axis: Sequences 1–11; series: Perception of knowledge of formulaic sequences, Perception of ability to translate formulaic sequences, Demonstrated receptive knowledge from multiple-choice test, Demonstrated productive knowledge from TAP)

Figure 3 demonstrates that although these four sequences show agreement between perception and production, this was not always the case. Sequences 5, 6 and 8 reveal a pattern of perceived receptive knowledge that is not matched with productive performance. This may simply indicate a natural order of acquisition (i.e. receptive knowledge before productive knowledge), showing that some of the participants had not yet reached full mastery of the sequences. In Sequences 2, 3, 7, 9 and 11, however, productive scores are higher than receptive scores. These responses would appear to raise an interesting question regarding the modes of testing formulaic sequences themselves and the ensuing effect on results and processing styles. In this case, the multiple-choice test mode itself may in fact be a distracter. The nature of each discrete test item in a multiple-choice test demands analytic processing and may lure less secure students away from sequences which, under real-time production conditions, they would otherwise have approached holistically and hence accurately.


Overall, the results indicate that the participants had reasonable perceptions regarding their knowledge of many of the target formulaic sequences, but there was also a trend towards overestimation of knowledge for several of the sequences. Clearly one has to be hesitant about generalising from such a small sample and such a limited selection of formulaic sequences, but on the whole these results would appear to demonstrate the difficulty involved for learners in consistently perceiving and assessing their true knowledge of formulaic sequences.

Conclusion

The evidence presented shows that research into spoken formulaic sequences using high frequency sequences in enriched authentic contexts produces useful results. As corpus data provides real evidence for the prevalence of formulaic sequences, we further stress the importance of investigating these items in attested contexts of use. We conclude that where formulaic sequences are processed holistically (at least based on TAP evidence), it seems that they can be transferred holistically across L1, L3 and L4, albeit by individually determined strategic and linguistic routes. The evidence gathered here suggests three basic response types in transferring meaning across languages and that these are closely linked not only to general language proficiency in the L2, L3 and/or L4, but also to exposure to the sequences in authentic non-classroom-based situations. The quantitative data highlighted learners’ difficulty in accurately perceiving and assessing their true knowledge of formulaic sequences and indicated little depth of knowledge of individual sequences.

References Altenberg, B. 1998. On the phraseology of spoken English: The evidence of recurrent word combinations. In Phraseology: Theory, Analysis, and Applications, A. P. Cowie (ed.), 101– 122. Oxford: OUP. Bahns, J. 1993. Lexical collocations: A contrastive view. English Language Teaching Journal 47: 56–63. Bahns, J. and Eldaw, M. 1993. Should we teach EFL students collocations? System 21: 101– 114. Bialystok, E. 2001. Bilingualism in Development: Language, Literacy, and Cognition. Cambridge: CUP. Birdsong, D. and Pinker, S. 1979. Speakers’ sensitivity to rules of frozen word order. Journal of Verbal Learning and Verbal Behavior 18: 497–508.

27

28

Carol Spöttl and Michael McCarthy Birdsong, D. 1979. Psycolinguistic Perspectives on the Phonology of Frozen Word Order. Harvard University: Unpublished doctoral dissertation. Birdsong, D. 1995. Iconicity, markedness, and processing constraints in frozen locutions. In Syntactic Iconicity and Linguistic Freezes: The Human Dimension, M. E. Landsberg (ed.), 31–45. Berlin:Mouton de Gruyter. Bolander, M. 1989. Prefabs, patterns and rules in interaction? Formulaic speech in adult learners’ L2 Swedish. In Bilingualisn across the Lifespan, K. Hyltenstam and L. K. Obler (eds), 73–86. Cambridge: CUP. Bolinger, D. 1976. Meaning and memory. Forum Linguisticum 1: 1–14. Celce-Murcia, M. 1978. The simultaneous acquisition of English and French in a two-yearold child. In Second Language Acquisition: A Book of Readings, E. Hatch (ed.), 38–53. Rowley MA: Newbury House. Cowie, A. P. and Howarth, P. 1996. Phraselogical competence and written proﬁciency. In Language Education, G. M. Blue and R. Mitchell (eds), 80–93. British Studies in Applied Linguistics 11. Clevedon: Multilingual Matters. De Groot, A. M. B. 2002. Lexical representation and lexical processing in the L2 user. In Portraits of the L2 User, V. Cook (ed.), 32–63. Clevedon: Mulitilingual Matters. De Groot, A. M. B. and Hoeks, J. K. J. 1995. The development of bilingual memory: Evidence from word translation by trilinguals. Language Learning 45: 683–724. DeCarrico, J. 1998. Syntax, lexis and discourse: Issues in redeﬁning the boundaries. In Perspectives on Lexical Acquisition in a Second Language. K. Haastrup and Å. Viberg (eds), 127–147. Lund: Lund University press Ellis, N. C. 1996. Sequencing in SLA: Phonological memory, chunking and points of order. Studies in Second Language Acquisition 18: 91–126. Ericsson, K. A. and Simon, H. A. 1987. Verbal reports on thinking. In Introspection in Second Language Research, C. Faerch and G. Kasper (eds), 24–53. Clevedon: Multilingual Matters. Ericsson, K. A. and Simon, H.A 1993. 
Protocol Analysis: Verbal Reports on Data. Cambridge MA: MIT Press (revised edition). Gabrys-Biskup, D. 1992. L1 inﬂuence on learners rendering of English collocations. A Polish/ German empirical study. In Vocabulary and Applied Linguistics, P. J.L Arnaud and H. Béjoint (eds), 85–93. London: Macmillan. Gass, S. 1988. Integrating research areas: A framework for second language studies. Applied Linguistics 9: 198–217. Gass, S. M. and Selinker L. 1994. The lexicon. Second Language Acquisition: An Introductory Course. Hillsdale NJ: Lawrence Erlbaum. Granger, S. 1998. Prefabricated writing patterns in advanced EFL writing: Collocations and Formulae. In Phraseology: Theory, Analysis and Applications, A. P. Cowie (ed.), 145–160. Oxford: Clarendon Press. Harley, B. 1995. The lexicon in second language research. In Lexical Issues in Language Learning, B. Harley (ed.), 1–28. Ann Arbor/Amsterdam: Language Learning/John Benjamins. Hinger, B. and Spöttl, C. 2001. A Multilingual Approach to Vocabulary Acquisition. Interactive CD-Rom: L3 Conference, Fryske Akademy, 2002, see: http://www.inomedia.at/papers/ spoettl_hinger/index.html. Herdina, P. and Jessner U. 2002. A Dynamic Model of Multilingualism: Perspectives of Change in Psycholinguistics. Clevedon: Multilingual Matters.

Comparing knowledge of formulaic sequences across L1, L2, L3, and L4 Howarth, P. 1998. Phraseology and second language proﬁciency. Applied Linguistics 19: 24– 44. Laufer, B. 1989. A factor of diﬃculty in vocabulary learning: Deceptive transparency. In Vocabulary Acquisition, P. Nation and R. Carter (eds), 10–20. Amsterdam: Free University Press. Laufer, B. 1991. Why are some words more diﬃcult than others? Some intralexical factors which aﬀect the learning of words. IRAL 28: 293–307. Levenston, E. 1979. Second language acquisition: Issues and problems. Interlanguage Studies Bulletin 4: 147–160. Leow, R. 1997. Attention, awareness and foreign language learning behaviour. Language Learning 47: 467–505. Libben, G. 2000. Representation and processing in the second language lexicon: The homogeneity hypothesis. In Second Language Acquisition and Linguistic Theory, J. Archibald (ed.), 228–248. Oxford: Blackwell. McCarthy, M. J. 1998. Spoken Language and Applied Linguistics. Cambridge: CUP. McCormack, P. 1977. Bilingual linguistic memory: The independence-interdependence issue revisited. In Bilingualism: Social, Psychological, and Educational Implications, P. Hornby (ed.), 57–67. New York: Academic Press. Nattinger, J. R. and DeCarrico, J. S. 1992. Lexical Phrases in Language Teaching Oxford: OUP. Pawley, A. and Syder, F. H. 1983. Two puzzles for linguistic theory: Nativelike selection and nativelike ﬂuency. In Language and Communication, J. C. Richards and R.W. Schmidt (eds), 191–226. New York: Longman. Schmidt, R. 1995. Attention and awareness in foreign language learning. Honolulu HI: University of Hawii Press. Schmidt, R. 2001. Attention. In Cognition and Second Language Instruction, P. Robinson (ed.), 3–32. Cambridge: CUP. Sharwood-Smith, M. and Kellerman, E. 1986. Crosslinguistic inﬂuence in second language acquisition: An introduction. In Crosslinguistic Inﬂuence in Second Language Acquisition, E. Kellerman and M. Sharwood Smith (eds), 1–9. New York: Pergamon Press. 
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: OUP.
Söderman, T. 1993. Word associations of foreign language learners and native speakers: The phenomenon of shift response type and its relevance for lexical development. In Near-Native Proficiency in English, H. Ringbom (ed.), 91–182. Åbo: Åbo Akademi, English Department Publications.
Spöttl, C. and McCarthy, M. 2003. Formulaic utterances in the multi-lingual context. In The Multilingual Lexicon, J. Cenoz, B. Hufeisen, and U. Jessner (eds), 133–151. Dordrecht: Kluwer.
van Lier, L. 1991. Inside the classroom: Learning processes and teaching procedures. Applied Language Learning 2: 29–68.
van Lier, L. 1994. Language awareness, contingency, and interaction. AILA Review 11: 69–82.
Weinert, R. 1995. The role of formulaic language in second language acquisition. Applied Linguistics 16: 180–205.
Wray, A. 1999. Formulaic language in learners and native speakers. Language Teaching 32: 213–231.
Wray, A. 2000. Formulaic sequences in second language teaching: Principle and practice. Applied Linguistics 21: 463–489.
Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: CUP.

29

Appendix 1 Context embedded formulaic sequences

Extract 1
A: I had a bad throat. The doctor just gave me these tablets, er capsules, to take and he waited till I went back again. Cos I’d had it on and off for weeks and weeks before he decided to send me to the specialist.
B: Mm.

Extract 2
A: What I really like about my job is meeting people from all over the place.
B: Yeah.
A: plus you know really different backgrounds and cultures. I love it you know.

Extract 3
A: Er I mean I was the only woman now that I come to think of it. I was the only woman on the staff of the New Town. All the other officers were men but we worked as a team.
B: Mhm.

Extract 4
A: I’m going to Wales on around the eighth of August. I’ll only have a few days really, what with going to London as well, to look for accommodation.
B: Yeah, you said you didn’t get back till the 18th.

Extract 5
A: He’s not quite as motivated towards teaching is he? He wonders every now and then whether he’s in the right job.
B: Yeah.

Extract 6
A: It would be perfect. His Miami tee shirt on. Oh he’d love it.
B: Yes he would.
A: Florida would be just right for him. He wouldn’t look out of place. He’s got an American body. He’s got the shape for it, completely.

Extract 7
A: Owen always says what he wants. He has ideas.
B: Mm. Well it’s a good job Sarah married somebody like that isn’t it or they’d just be sitting there on the sofa because neither of them would say what they wanted to do.
A: Yeah.

Extract 8
A: I don’t think they’d like to go to too many concerts. It’s like Victoria says. I mean you don’t go and see the same band time after time do you?
B: No not unless you really like them.

Extract 9
[In a shop]
A: Can I help you, Madam?
B: No, thank you, I’m just looking round.

Extract 10
A: We’ve got so much food there there’s no way we’re gonna eat it all.
B: Well you’ve got some pudding to eat haven’t you?
A: Yeah I know.

Extract 11
A: Let’s go up the side of the lake and see where there is to camp.
B: Yeah. You usually find somewhere don’t you?
A: Oh yes. I’ve done it for years you know.
B: But, well we have the car so if the worst comes to the worst we can sleep in that.


Appendix 2 Formulaic sequences MC TEST

Read the extracts below taken from authentic spoken dialogues. Decide whether the phrases below them are appropriate in that context. More than one answer is possible. Choose e. I don’t know if you simply just don’t know.

Example
A: How’s Audrey then? How is she keeping?
B: Fine .......... I haven’t really heard from her since the last time.
a. as wide as I know
b. as far as I know
c. to my knowledge
d. to my knowledges
e. I don’t know

Extract 1
A: I had a bad throat. The doctor just gave me these tablets, er capsules, to take and he waited till I went back again. Cos I’d had it .......... for weeks and weeks before he decided to send me to the specialist.
B: Mm.
a. in and out
b. on and under
c. on and off
d. off and on
e. I don’t know

Extract 2
A: What I really like about my job is meeting people from ..........
B: Yeah.
A: plus you know really different backgrounds and cultures. I love it you know.
a. all over the place
b. all under the place
c. all in the place
d. all about the place
e. I don’t know

Extract 3
A: Er I mean I was the only woman .......... I was the only woman on the staff of the New Town. All the other officers were men but we worked as a team.
B: Mhm.

a. now that I think on it
b. now that I come to think of it
c. now that I go to think about it
d. now that I’m thinking about it
e. I don’t know

Extract 4
A: I’m going to Wales on around the eighth of August. I’ll only have a few days really, .......... going to London as well, to look for accommodation.
B: Yeah, you said you didn’t get back till the 18th.
a. how with
b. where with
c. why with
d. what with
e. I don’t know

Extract 5
A: He’s not quite as motivated towards teaching, is he. He wonders .......... whether he’s in the right job.
B: Yeah.
a. every then and now
b. every now and then
c. every here and there
d. every there and here
e. I don’t know

Extract 6
A: It would be perfect. His Miami tee shirt on. Oh he’d love it.
B: Yes he would.
A: Florida would be just right for him. He wouldn’t look .......... He’s got an American body. He’s got the shape for it, completely.
a. in shape
b. in form
c. out of place
d. out of touch
e. I don’t know

Extract 7
A: Owen always says what he wants. He has ideas.
B: Mm. Well, .......... Sarah married somebody like that, isn’t it or they’d just be sitting there on the sofa because neither of them would say what they wanted to do.
A: Yeah.
a. it’s a good job
b. it’s a fine job
c. it’s a good decision
d. it’s just as well
e. I don’t know


Carol Spöttl and Michael McCarthy

Extract 8
A: I don’t think they’d like to go to too many concerts. It’s like Victoria says. I mean you don’t go and see the same band .......... do you?
B: No not unless you really like them.
a. after a time
b. time after time
c. time before time
d. time and again
e. I don’t know

Extract 9
[In a shop]
A: Can I help you, Madam?
B: No, thank you, ..........
a. I just look around
b. I’m just looking around
c. I’m just looking about
d. I just look about
e. I don’t know

Extract 10
A: We’ve got so much food there .......... we’re gonna eat it all.
B: Well you’ve got some pudding to eat haven’t you?
A: Yeah I know.
a. there’s no room
b. there’s no space
c. there’s no way
d. there’s no time
e. I don’t know

Extract 11
A: Let’s go up the side of the lake and see where there is to camp.
B: Yeah. You usually find somewhere don’t you.
A: Oh yes. I’ve done it for years you know.
B: But, well we have the car so .......... we can sleep in that.
a. in the worst case
b. if the best comes to the worst
c. if the worst comes to the worst
d. if it comes hard on hard
e. I don’t know

Appendix 3 Questionnaire

Look at the Extracts on the sheet provided. Can you take the time to think carefully about your responses to the phrases in bold print.

Do you know the phrase in bold print? Please tick Yes or No.

Phr no:   Yes   No
1.        ☐     ☐
2.        ☐     ☐
3.        ☐     ☐
4.        ☐     ☐
5.        ☐     ☐
6.        ☐     ☐
7.        ☐     ☐
8.        ☐     ☐
9.        ☐     ☐
10.       ☐     ☐
11.       ☐     ☐

Did you have any trouble translating the phrase in bold print into your L1 (German or other)?

Phr no:   Yes   No
1.        ☐     ☐
2.        ☐     ☐
3.        ☐     ☐
4.        ☐     ☐
5.        ☐     ☐
6.        ☐     ☐
7.        ☐     ☐
8.        ☐     ☐
9.        ☐     ☐
10.       ☐     ☐
11.       ☐     ☐

The eﬀect of typographic salience on the look up and comprehension of unknown formulaic sequences Hugh Bishop

University of Wisconsin-Madison

Introduction

A number of researchers have suggested that word strings in the form of holistic pre-fabricated chunks of language play an important part in both L1 and L2 language development (e.g., Nattinger and DeCarrico, 1992; Lewis, 1993; Wray, 2000). However, Wray (2001) notes that, although L2 beginners appear to rely heavily on limited quantities of formulaic language, intermediate and advanced learners appear to be stymied by formulaic sequences as they try to emulate native speakers. For example, Yorio (1989) claims that L2 adults have great problems acquiring and using conventional sequences. Irujo (1993) describes how very advanced learners of English, who exhibit few grammatical or lexical errors, display a relatively poorer knowledge of idioms. Farghal and Obiedat (1995) report that English majors whose L1 is Arabic exhibit poor knowledge of collocations in common areas of discourse (e.g. food or weather). Howarth (1998) asserts that students who can competently use the combinatorial rules of English to produce apparently grammatical sentences are betrayed by the non-native lexical combinations they employ. Peters (1983) suggests that novel, non-transparent formulaic sequences are unlikely to be understood by learners.

This struggle with formulaic sequences may stem from L2 users having problems identifying unknown formulaic sequences (Pawley and Syder, 1983). English words consist of a sequence of symbols “bounded on either side by a space or a punctuation mark” (Carter, 1998: 4), whereas formulaic sequences vary in degree of compositionality and have no clearly delineated boundaries. Howarth (1996: 186) argues that L2 learners’ problems with formulaic sequences are attributable to “a lack of awareness of the phenomenon”. Wray (2001: 206) suggests that learners are at a disadvantage when “trying to express ideas idiomatically” because of the difficulty of distinguishing formulaic sequences from the plethora of nonformulaic word combinations which can be generated from individual words.

Are formulaic sequences simply not noticed?

Could the idea of noticing afford some clues to the difficulties that L2 learners have with formulaic sequences? Schmidt (1990, 1992) claims that conscious awareness at the level of noticing (defined as ‘availability for verbal report’) is a necessary and sufficient condition for converting input to intake, and that the requirement of noticing applies to vocabulary as well as syntax, phonology, and pragmatics. Most studies of noticing have focused on grammar (e.g., Schmidt 1995; Doughty and Williams, 1998). However, if noticing applies to vocabulary, it should also apply to formulaic sequences. Not noticing the holistic form of a formulaic sequence should interfere with its processing and subsequent learning as a unitary whole. A possible mechanism by which this happens is illustrated using de Bot, Paribakht, and Wesche’s (1997) adaptation of Levelt’s (1989, 1993) L1 lexical processing model.

A processing model for L2 vocabulary

Levelt (1989) claims that certain information about a word is represented at two levels in the mental lexicon: phonological and orthographic information is represented at the lexeme level, and semantic and syntactic information is represented at the lemma level. According to de Bot et al., when a known word is processed during reading, the orthographic pattern must first be recognized and matched with a lexeme, which activates the lemma, thus accessing the syntactic and semantic properties of the word. With an unknown word, however, the L2 reader must first focus on the unknown form and attend to it as being of sufficient interest to attempt to find the meaning. The attempt to ascertain the meaning of an unknown lexeme leads to the setting up of an empty knowledge structure for a new lemma (Paribakht and Wesche 1996), which can be filled using a variety of knowledge sources (e.g., inferring or looking up words). Filling the empty lemma implies comprehension, and this is a necessary condition for learning.


Are empty lemma structures set up for formulaic sequences?

For this to happen, the strings of words must first be perceived as lexemes, i.e. as wholes with unitary meanings. For example, the word receive (as in receive criticism) can be replaced by the formulaic sequence come in for. The formulaic sequence come in for is noncompositional (none of its components individually matches receive), and these three words in combination must be recognized as a lexeme before a lemma can be set up, which can then be filled with the information that its syntactic and semantic properties overlap with those of receive. Only then can come in for begin to be substituted for receive. According to this account, recognition of the form of a formulaic sequence is an essential step towards its being learned. But whereas unknown word forms are perceptually salient and clearly delineated (i.e., bounded by spaces), formulaic sequences have no consistent form. Wray (2001: 208) describes the consequences:

While native speakers have access to a large store of formulaic sequences, non-native speakers don’t. They don’t know which sequences are grammatically constructed and which are idiomatic. Facing the tricky problem of sorting formulaic sequences from grammatically-generated combinations, non-native speakers may be unaware that they don’t know that a given sequence is formulaic.

Typographic salience

If recognizing the form of formulaic sequences is a significant source of L2 difficulty, then this problem might be ameliorated by making the form perceptually salient (i.e., highlighting it). Examples of techniques for highlighting include using bold face, underlining, using color, and using asterisks (Tullis, 1988). Christ (1975), in a review of the role of visual cues on visual displays, found that color could often be useful in search tasks, and sometimes be useful in identification tasks. However, although highlighting text can lead to increased attention, there is always the danger of attention being directed wrongly to items such as distractors (Fisher & Tan 1989). Indeed de Ridder (2002) suggests that highlighted text items might draw L2 readers’ attention to specific lexical items at the expense of awareness of larger text organization structure (with a consequent adverse effect on global comprehension). Instead of the term highlighting, Grabinger & Osman-Jouchoux (1996: 191) use the term directive cues (e.g., underlining, color, bold type), which they define as “format changes in the text designed to capture and focus the readers’ attention on a particular portion of that text”. They support their claims for the efficacy of directive cues by arguing three things: First, perception is selective (Fleming & Levie, 1978). Second, there is usually too much information in the environment for the perceiver to handle (Kolers et al., 1981). Third, directive cues foreground the target item (Butler, 1980), making the target more conspicuous.

But is conspicuousness alone enough? Hegelheimer and Chapelle (2000) describe a problem associated with making target items conspicuous. The construction of experimental materials by highlighting particular linguistic features embodies the assumption that participants will notice highlighting, but this is not necessarily warranted. When relative pronouns (for example) appear in bold typeface, it does not follow that participants will notice them. For example, Jourdenais et al. (1995: 206) found little reference was made to input enhancement (highlighting) of linguistic features by participants in retrospective think-alouds. The uncertainty about which typographic features are noticed is compounded in some vocabulary studies. Unlike grammar learning, where researchers typically supply a number of instances of a particular form, vocabulary items in reading investigations more typically appear in the input only once (Hegelheimer & Chapelle, 2000). Therefore, investigators need to pinpoint accurately and reliably what action, if any, is taken with unknown lexical forms. Hegelheimer and Chapelle’s solution (which is also adopted in this study) was to use computer technology to implement a concurrent noticing approach, involving direct observation of interactions (mouse clicks). Using computers in turn leads to other problems.
Although considerable research has been done on modifying the typographic appearance of words in paper media, the use of computers rules out many studies of typographical salience, since there is uncertainty about generalizing from paper media to electronic media (Muter, 1996; Grabinger & Osman-Jouchoux, 1996). Let us therefore focus on studies of typographic salience in computer presentations. A number of studies have employed typographic salience to highlight linguistic form. For example, Doughty (1991) used typographic salience to draw attention to relative clause formation, although there was no effect solely attributable to salience. Chun and Plass (1996) used diacritics (o) to draw attention to lexical items available for gloss during online reading, and it had the effect of making them more memorable. But typically, lexical studies that utilize typographic salience don’t focus on the specific effects of salience vs. non-salience. For example, Hegelheimer and Chapelle (2000) used bold face type to indicate target lexical items that could be glossed by clicking on a hyperlink. Their interest was not in whether target items in bold face type increased clicking behavior relative to comparable non-bold face items; rather it was in whether clicking on a target item led to other measurable learning effects.

Even amongst the few studies relating typographic salience to vocabulary, caution must be exercised. Black et al. (1992) found that a small superscript black spot following a word indicating a gloss increased the students’ willingness to consult it. Unfortunately for L2 generalization, the participants were first language learners of technical vocabulary. In a small scale study, de Ridder (1999, 2000) found that L2 readers clicked significantly more on a text with highlighted hyperlinks than on the text without highlights. She also reports that increased clicking in the salient condition did not affect reading speed or diminish comprehension. After observing that “no study has looked into the effects of visible links on text comprehension”, de Ridder (2002: 127) carried out a larger study (N=60) which investigated (amongst other things) the effect of typographic salience (underlined blue font) on clicking behavior, and on text comprehension during general and specific reading tasks. She found that L2 readers were significantly more likely to consult glosses which had highlighted hyperlinks than hyperlinks that were invisible (i.e., typographically indistinguishable from the surrounding text). In an accompanying specific reading task, which included a focus on individual lexical items, the highlighted condition comprehension scores (62.91%) were higher than in the non-highlighted condition (57.58%), but these differences were not significant. In a general reading task the findings were reversed. The condition without marked hyperlinks had higher scores (61.93% vs. 57.13%), but again the difference was not significant. De Ridder made no distinction in her glosses between words and phrases. Some hyperlinks were words and some were phrases.
No specific attempt was made to study the effects of salience on formulaic sequences or to differentiate between words and formulaic sequences. But since typographic salience is associated with increased attempts to gloss hyperlinked lexical items, and possibly with some increase in comprehension, it promises to be a profitable line of investigation with respect to formulaic sequences, which are not easily recognized by L2 readers.

There is little disagreement that formulaic sequences are a source of problems for L2 learners. In this experiment it is proposed to construct a text that has a number of difficult vocabulary items that have been glossed (i.e., dictionary type definitions are available for those items). If it can be shown that typographically salient formulaic sequences are more frequently looked up (i.e., participants attempt to access the dictionary definitions) compared to corresponding non-salient sequences, and that they are better comprehended, this would suggest that part of the problem associated with formulaic sequences is one of lexeme visibility.

Research questions

Noticing the orthographic form of an unknown word appears undemanding. The lexeme is perceptually distinct (bounded by spaces). In the case of formulaic sequences, if the reader does not realize that the components are actually part of a larger single conceptual unit, it seems likely that no attempt will be made to find the meaning of the whole. Therefore unknown formulaic sequences, being less perceptually salient than unknown words, would be looked up in the glossary fewer times. On the other hand, if unknown formulaic sequences are made perceptually salient, could this increase the number of times they are looked up in the glossary? The following questions will be addressed:

1. Will unknown words be clicked on (to access glosses of their meaning) more frequently than unknown formulaic sequences in a computerized reading task?
2. In light of de Ridder’s (2002) finding that perceptually salient hyperlink glosses are more likely to be clicked on, are perceptually salient formulaic sequences more likely to be clicked on for glosses than non-salient formulaic sequences?
3. Will any increase in glossing of unknown salient formulaic sequences lead to an increase in the number of items comprehended compared to an identical text containing no salient formulaic sequences?

Methodology

Selection of target words and formulaic sequences

The formulaic sequences described in this study constitute a simpler, more easily identifiable subset of Wray’s formulaic sequences (Wray 2001: 74) which “involve the use of two or more words to express a simple idea.” The formulaic sequences chosen have one-word synonyms and a straightforward conceptual meaning. The ten target sequences and ten target words can be looked up in the same manner with contextually appropriate denotative meanings. Since it was intended that target items would be less well known than the text they were embedded in, no target items belonged to the 2000 most frequent English words or their derivations. (See Appendix 1 for the target items.)
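One way to picture this frequency criterion is as a screening pass over a draft text: every word outside the target items should be on a high-frequency list, and no single-word target should be. The sketch below is only an illustration under stated assumptions. `HIGH_FREQ` is a tiny made-up stand-in for a 2000-word list, `screen_text` is a hypothetical helper, and derived forms (word families) are not handled; it does not reproduce any particular vocabulary-profiling program.

```python
import re

# Hypothetical stand-in for a 2000-word frequency list (illustration only).
HIGH_FREQ = {"a", "lot", "of", "scientific", "information", "is", "starting", "to"}

def screen_text(text, targets, high_freq=HIGH_FREQ):
    """Return (off_list_words, on_list_targets) for a draft text.

    off_list_words: non-target words missing from the frequency list.
    on_list_targets: targets that appear on the list (too frequent to use).
    """
    remaining = text.lower()
    for target in targets:
        # Blank out each target so it is exempt from the check on running text.
        pattern = r"\b" + re.escape(target.lower()) + r"\b"
        remaining = re.sub(pattern, " ", remaining)
    words = re.findall(r"[a-z']+", remaining)
    off_list_words = sorted({w for w in words if w not in high_freq})
    on_list_targets = sorted(t for t in targets if t.lower() in high_freq)
    return off_list_words, on_list_targets

draft = "A lot of scientific information is starting to pile up."
print(screen_text(draft, ["pile up"]))
```

A draft passes the screen when both returned lists are empty; anything in the first list would need rewording, and anything in the second would be too frequent to serve as a target.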

Development of the instruments

A 750-word text on Global Warming was written, with embedded target words and formulaic sequences. Based on a 1988 PBS video interview between Bill Moyers and the biophysicist Jessica Tuchman Mathews, this text represented the modification of pre-existing advanced ESL learning material recently developed at the University of Wisconsin, but not yet used in classes (so students wouldn’t have been exposed to it). The text was then further modified using the Range software application (retrieved from http://www.vuw.ac.nz/lals/software.htm) to ensure that all lexical items, except for target lexical items, were in the first 2000 most frequent English words. The Collins’ CoBuildDirect Corpus Sampler (http://titania.cobuild.collins.co.uk/form.html) was consulted to ensure that all target items appeared in semantically appropriate contexts. Finally, the resulting text was checked for consistency and naturalness by two professors at UW-Madison whose dialect was standard American English. Some minor lexical modifications were made as a result.

Two parallel versions of this text were created, which differed only in the typographical salience of target formulaic sequences. In the treatment text, the ten formulaic sequences appeared in red and were underlined. Otherwise the fonts were identical in both texts (see Appendix 2).

Twenty true or false comprehension items were written based on the text. Each true or false item focused on a sentence in the text containing a target lexical item. For example, the true/false item The quantity of scientific evidence supporting global warming is growing focuses on the target sequence pile up, which occurs in the following context: Another problem is that it is not easy to see that global warming is happening, even though a lot of scientific information is starting to pile up.
It was intended that participants needed to know a specific target word/formulaic sequence in order to correctly answer a given item, although guessing was always possible. Overall, ten of the items pointed to target words, and ten to target formulaic sequences. The texts and questions were then imported into a customised computer program, and glosses (i.e. definitions using simpler language) were written for the target words and formulaic sequences. Participants could access these glosses by clicking on the target items in the text. We needed to differentiate between lookups of formulaic sequences and words, and so target formulaic sequence meanings were accessed with double clicks and target word meanings with single clicks. If the participant clicked anywhere else (i.e., on non-target items) a pop-up message appeared that no gloss was available. Macromedia Authorware 6 was used to create the program, which tracked all mouse-click requests for target and non-target lexical glosses. It also enabled web access to the reading program at a university computer lab.

Thus this study focuses on glossing behavior in the form of mouse clicks. The noticing of the typographical enhancement of the target items (in red and underlined) is potentially quite subjective and therefore difficult to measure. By operationalizing it in terms of a conscious request for additional information (by clicking on a target item), a relatively unambiguous measurement can be made. When an unknown item is glossed, it is reasonable to infer that the participant has noticed the unknown item and is attempting to find the meaning. This is consistent with Swain’s (1998: 80) view: “It seems essential in research to test what learners actually do, not what the research assumes instructions and task demands will lead learners to focus on.”
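The click-tracking logic described above can be sketched in a few lines. This is a minimal illustration, not the Authorware program used in the study: the class `GlossLog`, its method names, and the example target word emissions are hypothetical, and the sketch assumes that a single click on a formulaic sequence (or a double click on a word) yields the "no gloss" message, which the text does not spell out.

```python
from dataclasses import dataclass, field

@dataclass
class GlossLog:
    words: set        # single-word targets (gloss on single click)
    sequences: set    # formulaic-sequence targets (gloss on double click)
    events: list = field(default_factory=list)

    def click(self, item, double=False):
        """Record one click and return what the reader would see."""
        if double and item in self.sequences:
            outcome = "sequence_gloss"
        elif not double and item in self.words:
            outcome = "word_gloss"
        else:
            # Assumed behavior: any other click shows the "no gloss" pop-up.
            outcome = "no_gloss"
        self.events.append((item, double, outcome))
        return outcome

    def lookups(self, outcome):
        """Count logged events with the given outcome."""
        return sum(1 for _, _, logged in self.events if logged == outcome)

# Hypothetical session: "emissions" is an invented target word.
log = GlossLog(words={"emissions"}, sequences={"pile up"})
log.click("pile up", double=True)   # gloss for a formulaic sequence
log.click("emissions")              # gloss for a word
log.click("warming")                # non-target item, no gloss
print(log.lookups("sequence_gloss"), log.lookups("word_gloss"), log.lookups("no_gloss"))
```

Keeping every event, including clicks that return no gloss, mirrors the study's decision to track requests for both target and non-target items.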

Participants

The 44 participants were volunteers studying English full time in intermediate level classes, or from upper intermediate level classes studying English part-time and concurrently taking other university classes at the University of Wisconsin-Madison. Relative numbers of students from intermediate or upper intermediate classes were not tracked, but two thirds were female, and participants were typically in their early twenties. The volunteers received a small amount of compensation for their time. L1s represented were Chinese 16, Korean 10, Spanish 6, Indonesian 4, Japanese 2, Russian 1, Arabic 1, Dutch 1, Thai 1, and two east Asian students who did not specify their mother tongue. All students performed the experiment in one of two University computer labs while the experimenter was present. Twenty-one participants were randomly assigned to the treatment group, which read the text containing 10 plain target words and 10 typographically salient formulaic sequences, and 23 participants were assigned to the control group, which read the identical text, containing both plain target words and plain target formulaic sequences.


Procedure

Pre-test

Because the study concerned participants’ glossing of unknown lexical items while reading, it was necessary to ensure homogeneity of groups in terms of 1) reading skill and 2) lexical knowledge of target items. To determine reading levels a TOEFL reading sub-test was administered to all participants at the start of the study. Mean Z score of the control group was 36.76 (SD 9.55) and mean Z score of the treatment group was 36.68 (SD 6.33). No significant differences were found between groups (Mann-Whitney p = 0.597). The TOEFL reading subtest was administered immediately before the vocabulary pre-test.

To determine lexical knowledge, a computerized variation of Wesche and Paribakht’s (1996) vocabulary knowledge scale (VKS) was used. The VKS elicits self-perceived knowledge (self-report) and demonstrated knowledge of written words. Wesche and Paribakht (1996) cite high correlations between self-report knowledge and demonstrated knowledge (.95), and a Pearson test-retest reliability of .89. The online vocabulary pre-test consisted of 40 self-report items including the 10 target words and 10 target formulaic sequences. The other 20 (non-target) items consisted of single word synonyms of target formulaic sequences, and formulaic sequence synonyms of target words. The choice of non-target words was made to enable a second experiment to be run in which all target formulaic sequences in the text were replaced by synonymous words, and all target words were replaced by synonymous formulaic sequences (Bishop, in preparation). For each vocabulary item, participants were required to choose one of five self-report options:

1. I don’t remember having seen this item before
2. I have seen this item before, but I don’t know what it means
3. I have seen this item before and I think it means . . . . (write synonym)
4. I know this item. It means . . . . (write synonym)
5. I can use this item in a sentence . . . . (write sentence).

For each question, when participants clicked on response numbers 1 or 2 the next vocabulary item appeared on screen. If the participants clicked on choice 3 or 4, a small text entry box appeared in which the meaning of the item had to be typed. Hitting enter brought up the next question. If Response 5 was chosen, a larger text entry box appeared. Since this was a multi-line text box, and enter moved the cursor to the next line of the box, they were instructed to use tab to move to the next question. When participants exited the pretest, results were automatically emailed as a text file to the experimenter.

In most cases it was clear to the experimenter when an answer was correct. Where the meaning was unambiguous, spelling mistakes were not penalized. For example, when pile up was given the synonym accumalate it was accepted. Morphological errors were not penalized if the word appeared to be known and its grammatical class was appropriate (for example, when effect was offered as a synonym for consequences it was accepted even though the plural morpheme was missing). Alternative senses of polysemous words were marked as unknown. It was not assumed that knowing one sense of a word implied knowledge of another (Bogaards, 2001). Words which were not known scored 0, and known words scored 1. This 0/1 scoring system sufficed for the current glossing study, although the actual data collected is richer than was immediately needed.

Mean unknown target items for the treatment group were 6.95 (SD 1.75) words and 7.86 (SD 1.68) formulaic sequences. Means for the control group were 6.96 (SD 1.69) words and 7.96 (SD 1.36) formulaic sequences. Knowledge of words (Mann-Whitney p = 0.905) and formulaic sequences (Mann-Whitney p = 0.952) was not significantly different between the two groups. Only unknown words and formulaic sequences from the pre-test were of interest in the current study of look up behavior. Lexical items that participants knew were identified from the pre-test and they were excluded from consideration by the experimenter. To allow sufficient time for forgetting (Hulstijn 2003), the pre-test was completed one week before the treatment. To cancel out primacy or recency effects, the items in the pretest were randomized automatically for each participant.
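The 0/1 scoring described above can be sketched as follows. This is an illustrative reading of the procedure, not the study's own software: the function names are hypothetical, the `judged_correct` flag stands for the experimenter's manual judgment of the typed synonym or sentence, and the example responses are invented for demonstration.

```python
def vks_score(option, judged_correct=False):
    """Collapse a VKS self-report (options 1-5) to 0 = unknown, 1 = known."""
    if option not in (1, 2, 3, 4, 5):
        raise ValueError("VKS option must be between 1 and 5")
    if option <= 2:
        return 0  # options 1 and 2 report the item as unknown
    # Options 3-5 count as known only if the typed answer was judged correct
    # (e.g. a different sense of a polysemous word is marked unknown).
    return 1 if judged_correct else 0

def unknown_items(responses):
    """responses maps item -> (option, judged_correct); return unknown items."""
    return sorted(
        item for item, (option, ok) in responses.items()
        if vks_score(option, ok) == 0
    )

# Invented responses for one hypothetical participant.
responses = {
    "pile up": (4, True),         # synonym accepted despite a spelling slip
    "come in for": (2, False),    # seen before, meaning not known
    "consequences": (3, False),   # synonym offered but judged incorrect
}
print(unknown_items(responses))
```

Only the items returned by `unknown_items` would then be carried forward when counting look-ups, since known items were excluded from the analysis.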

Training session

This took place a week after the pretest and immediately before the treatment. All participants did the same training session, in which they learned how to access glossary help while working through three practice pages on screen. The main aim of the training session was to make participants familiar with the following glossary help features:

• Single clicks on some words brought up a gloss.
• Double clicks on some formulaic sequences brought up a gloss.
• Most of the time no gloss would be forthcoming. Instead a pop-up message appeared indicating that no gloss was available for this item.


On the first practice page, participants were instructed to access glosses of isolated lexical items by single-clicking on specified words and double-clicking on specified formulaic sequences, and also to click on other items, which were not actually glossed. Whenever a word or a formulaic sequence was clicked on, a pop-up box appeared with either a gloss or an indication that no gloss was available. On the second practice page, participants practiced clicking on words and formulaic sequences while they read a very short text. No lexical items were ever obscured by the pop-up box, which always appeared near the item clicked on. On the third page participants answered six true/false questions. The procedure was identical to that of the main experiment, but much shorter, and participants typically completed it in about 5–10 minutes.

When participants had finished the training session, they were prompted to start the experiment. The first screen they then met was a splash screen reminding them again that they could access help for words and formulaic sequences by using clicks and double clicks respectively, but that not every word was glossed.
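The glossing behaviour practised in the training session (single click for words, double click for formulaic sequences, a "no gloss" message otherwise) might be modelled roughly as follows. The glossary entries, the message text and all names here are hypothetical, and the sketch assumes that clicking a glossed item with the wrong click type simply produces the "no gloss" message.

```python
from collections import Counter

NO_GLOSS = "No gloss is available for this item."

# Hypothetical glossary: words respond to a single click, formulaic sequences
# to a double click. The glosses themselves are invented for illustration.
GLOSSARY = {
    ("disrupt", "single"): "to stop something from continuing normally",
    ("put up with", "double"): "to tolerate",
}

lookups = Counter()  # how often each glossed item was successfully looked up


def on_click(item, click_type):
    """Return the pop-up text for a click and record successful lookups."""
    gloss = GLOSSARY.get((item, click_type))
    if gloss is None:
        return NO_GLOSS
    lookups[item] += 1
    return gloss


print(on_click("disrupt", "single"))
print(on_click("put up with", "single"))   # wrong click type: no gloss
print(on_click("put up with", "double"))
print(on_click("greenhouse", "single"))    # not a glossed item
```

A per-item counter of this kind is one simple way to obtain the look-up frequencies that the analysis relies on.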

Experiment

Participants read a two-page (non-scrolling) text about global warming containing the 20 embedded target lexical items. The two pages of text were followed by a third page with 20 true/false items. Participants could page backwards and forwards between text and questions, and they answered the questions by clicking in a true or a false checkbox. Each true/false question was answerable by referring to a sentence in the text containing a target item, and no two target items occurred in the same sentence. Answers were automatically tracked by the program and included in the data sent to the experimenter.

Results

Table 1 shows the mean number of times the various unknown lexical items were clicked upon for glosses (known items were controlled for by being excluded after the pretest). The number of lookups for words in the treatment and control conditions does not vary significantly (2.05 vs. 2.22; Mann-Whitney p = 0.73; see Note 1), which is to be expected because the words were not altered typographically in either condition, and so were presented identically. Between groups it can also be seen that salient formulaic sequences were glossed significantly more frequently (5.00 vs. 1.43) than non-salient sequences (Mann-Whitney p = .0005).

In the control condition, the target words (plain) were clicked upon significantly more often than the non-salient formulaic sequences (2.22 vs. 1.43; Wilcoxon Signed-Rank p = .01; see Note 2). In the treatment condition, however, the power of typographic salience is demonstrated: unknown salient formulaic sequences were looked up significantly more frequently than unknown words (5.00 vs. 2.05; Wilcoxon Signed-Rank p = .0022).

Table 1. Number of unknown lexical items glossed

             Treatment                     Control
             Mean    SD     Min   Max      Mean    SD     Min   Max
Words        2.05    2.01   0     6        2.22    1.68   0     5
FS           5.00    3.33   0     10       1.43    1.44   0     4

In Table 2 it can be seen that the treatment group achieved higher total scores on the reading comprehension task (including all twenty questions) than the control group (17.24 vs. 16.26), and this difference is significant (Mann-Whitney p = .04). Table 3 shows the scores for the ten questions associated with formulaic sequences only. Participants exposed to the salient formulaic sequences scored higher than participants exposed to the non-salient sequences (8.71 vs. 8.00), but this difference was not significant (Mann-Whitney p = 0.064).

Table 2. Total reading scores for all lexical items (word and formulaic sequence questions)

             Treatment                     Control
             Mean    SD     Min   Max      Mean    SD     Min   Max
All items    17.24   2.34   9     20       16.26   2.03   14    20

Table 3. Reading scores associated with formulaic sequences (formulaic sequence questions only)

             Treatment                     Control
             Mean    SD     Min   Max      Mean    SD     Min   Max
FS items     8.71    1.42   6     10       8.00    1.28   6     10
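For readers unfamiliar with the between-group test reported here, the following is a minimal sketch of a two-sided Mann-Whitney U test in plain Python. It uses the large-sample normal approximation with a tie correction and no continuity correction, so its p-values will not exactly match those from a statistics package (for example SciPy's `scipy.stats.mannwhitneyu`), especially for small samples. The look-up counts are invented for illustration and are not the study's data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over each group of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the normal curve
    return u1, p


# Hypothetical look-up counts per participant (NOT the study's data).
treatment_fs = [0, 2, 4, 5, 6, 7, 8, 10]
control_fs = [0, 0, 1, 1, 2, 2, 3, 4]
u, p = mann_whitney_u(treatment_fs, control_fs)
print(f"U = {u}, p = {p:.4f}")
```

The within-group comparisons (words vs. formulaic sequences for the same participants) call for a paired test such as the Wilcoxon signed-rank test instead, which this sketch does not cover.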


Discussion

Effect of salience on frequency of gloss

Were unknown non-salient formulaic sequences glossed less frequently than unknown words? The answer is yes. The (plain) formulaic sequences in the control group were looked up significantly less frequently (mean 1.43) than unknown words in the control group (mean 2.22). (Note that this condition is analogous to ‘real world’ reading.) This is consistent with the claim that unknown formulaic sequences are less easily recognizable as holistic entities than words, because unlike words, it is not clear, a priori, where the boundaries of unknown formulaic sequences lie. The results are also consistent with de Bot et al.’s (1997) modification of Levelt’s lexical model when applied to L2 reading. In order for a lexical item to be looked up (representing an attempt to fill the empty lemma structure), the form of the lexeme must first be recognized as unknown. Lexeme forms that are unpredictable, and therefore less easily recognized (i.e. formulaic sequences), would thus be expected to be looked up less often.

The second outcome was that de Ridder’s (2002) finding that perceptually salient hyperlink glosses are more likely to be clicked upon was found to apply to perceptually salient formulaic sequences. In fact, making the target sequences salient had a dramatic effect. Whereas only 1.43 unknown sequences on average were clicked on in the control condition, that number increased significantly to 5.00 when they were underlined and presented in red in the treatment condition. In addition, salient formulaic sequences were clicked on significantly more often than plain single words in the same condition. One reason for this might be that white space around a word is a much weaker ‘attention getter’ than typographic salience. This can be tested (and will be in an upcoming experiment) by adding a salient word condition and ascertaining whether making unknown words typographically salient has the same effect as making formulaic sequences salient.
Adding typographical salience to a formulaic sequence is in a sense making the (multi word) lexeme visible to the reader. The color and the underlining, it is argued, signal the holistic nature of the FS. It is assumed that once a form is focused on as unknown, the possibility increases (all other things being equal) of the participant actively trying to ﬁnd a meaning from a gloss if it is available.

Effect of saliency on comprehension

Increased clicking on glosses seems to be associated with improved comprehension on the specific task in this study. The treatment group, answering twenty true/false questions which required understanding of target items, scored significantly higher than the control group (17.24 vs. 16.26). However, when just the questions associated with formulaic sequences are considered (Table 3), a slightly different picture emerges. Although the mean treatment score of 8.71 items correct out of 10 was higher than the control score of 8.00, the difference was not significant (Mann-Whitney p = .064).

It appears that making formulaic sequences typographically salient increases the frequency with which they are looked up. This, in turn, is associated (curiously) with significantly better overall performance on the true/false items for words and formulaic sequences taken together. However, only a smaller (non-significant) effect was found for the items associated with salient formulaic sequences alone. This may arise from weaknesses in the procedure.

Weaknesses of the study

One source of weakness in the study, which could have confounded the measurement of comprehension, was that the true/false items were not controlled for difficulty. Some items were answered correctly by almost all participants, whereas two items were answered wrongly by almost half the participants in both conditions. On closer examination some of the questions themselves appeared problematic. For example, true/false item 1 (see Appendix 3), which referred to the opening words of the reading text, appeared to exert a strong garden path effect (the correct answer is true) such that even if the lexical item (do away with) was correctly understood, participants could be enticed into giving the wrong answer, possibly as a result of the rhetorical question it was embedded in and the awkward negative phrasing of the item. Because uncontrolled item difficulty diminishes validity, it needs to be controlled.

Although there is a non-significant but suggestive (p = .064) association of formulaic sequence typographic salience with comprehension of true/false items, the potential weakness of the connection between the true/false items and target item comprehension needs to be addressed before more confident assertions can be made. Question items should be made of uniform difficulty, and the sentences that target items are embedded in should be equally dependent on comprehension of target items for their interpretation. They should be (as far as possible) equally easy to interpret. For example, a target item embedded in a subordinate clause in a long complex sentence may contribute less to the meaning of the sentence than a target item embedded in a short simple sentence, so that even if both lexical items are known, they are likely to contribute different weights to any comprehension test items.

Only one reading task was carried out in this study, and the outcome of the experiment may have been influenced by the specific reading task required of participants. It remains to be seen whether varying the task demand will alter glossing behavior for formulaic sequences. The L1 literature is mixed. Schmalhofer & Glavanov (1986) found that varying the readers’ purpose influenced the way a text was read, whereas Black et al. (1992) could not establish that reading for gist vs. reading for detail discriminated readers’ clicking behavior. With L2 readers, de Ridder (2002) found that varying the L2 reading task (specific vs. general) appeared to change the frequency with which salient hyperlinked glosses were sought (with the general reading condition attracting more attempts to look up glossary items), but she was unable to establish that increased glossing led to increased understanding. It may be that de Ridder’s study and this one indicate that the link between looking up highlighted formulaic sequences and comprehension is weak, or it may be that better research designs will produce a stronger effect. The problem of the relationship between clicking on glosses and the understanding of formulaic sequences is currently terra nova et incognita.
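One common way to check whether comprehension items are of comparable difficulty is to compute each item's facility (the proportion of participants who answered it correctly) and flag outliers. The sketch below assumes 0/1 scores per participant; the scores and the cut-off values are invented for illustration and do not come from the study.

```python
def item_facility(responses):
    """Proportion of participants answering each item correctly.

    `responses` maps an item number to a list of 0/1 scores, one per
    participant.
    """
    return {item: sum(scores) / len(scores)
            for item, scores in responses.items()}


def flag_items(facility, low=0.6, high=0.95):
    """Return items that look too hard (< low) or too easy (> high).

    The cut-offs are illustrative assumptions. For true/false items, a
    facility close to 0.5 is no better than guessing.
    """
    return sorted(item for item, p in facility.items()
                  if p < low or p > high)


# Hypothetical scores for three true/false items and ten participants.
responses = {
    1: [1, 0, 0, 1, 0, 1, 0, 1, 0, 1],   # about half wrong, near chance
    2: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],   # answered correctly by everyone
    3: [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
}
facility = item_facility(responses)
print(facility)
print(flag_items(facility))
```

Items flagged in this way could then be revised or dropped before comprehension scores are compared across conditions.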

Conclusion

The findings of this study are consistent with two claims: that making unknown formulaic sequences typographically salient increases readers’ willingness to seek glosses, and that this glossing leads to some increased comprehension of lexical items. It is risky to generalize too strongly from such a small-scale study, but nonetheless a number of interesting questions are raised. Is typographical salience implicated just in local comprehension of formulaic sequences, or is there an effect on global comprehension? Does increased attention to local meanings of formulaic sequences in terms of clicking on glosses (“click happy behavior” (Roby, 1999: 98)) detract from construction of larger text-based meanings, i.e., might constant interruptions to the reading process lead to “the construction of a less coherent text base” (de Ridder, 2002: 126)? How does the impact of typographic salience vary with task demand? Does increased glossing lead to increased incidental learning of formulaic sequences? What effect does it have on intentional learning? Answers to these questions are of considerable practical and theoretical interest, and therefore further study of these interesting entities is certainly warranted.


Notes . Since it was not possible (Anderson Darling p < .05) to make the assumption that glossing data or reading data were normally distributed, non-parametric tests were used to test for signiﬁcance. 2. A one tailed test was used given the directional nature of the research question (Will words be clicked on more frequently than formulaic sequences?)

References

Bishop, H. In preparation. Do Second Language Readers Notice Formulaic Sequences? University of Wisconsin-Madison: PhD thesis.
Black, A., Wright, P., Black, D., and Norman, K. 1992. Consulting on-line dictionary information while reading. Hypermedia 4: 145–169.
Bogaards, P. 2001. Lexical units and the learning of foreign language vocabulary. Studies in Second Language Acquisition 23: 321–343.
Butler, B. E. 1980. The category effect in visual search: Identification versus localization factors. Canadian Journal of Psychology 34: 238–247.
Carter, R. 1998 (2nd edition). Vocabulary: Applied Linguistic Perspectives. London: Allen and Unwin.
Christ, R. E. 1975. Review and analysis of color coding research for visual displays. Human Factors 17: 542–570.
Chun, D. and Plass, J. 1996. Effects of multimedia annotations on vocabulary acquisition. The Modern Language Journal 80: 183–198.
de Bot, K., Paribakht, T. S., and Wesche, M. 1997. Towards a lexical processing model for the study of second language vocabulary acquisition. Studies in Second Language Acquisition 19: 309–29.
de Ridder, I. 1999. Are we conditioned to follow links? In CALL and the Learning Community, K. Cameron (ed.), 195–116. Exeter: ELM Bank Publications.
de Ridder, I. 2000. Are we still reading or just following links? Highlights in CALL materials and their impact on the reading process. Computer Assisted Language Learning 13: 183–195.
de Ridder, I. 2002. Visible or invisible links: Does the highlighting of hyperlinks affect incidental vocabulary learning, text comprehension, and the reading process? Language Learning and Technology 6: 123–146.
Doughty, C. 1991. Second language instruction does make a difference: Evidence from an empirical study of relativization. Studies in Second Language Acquisition 13: 431–469.
Doughty, C. and Williams, J. 1998. Focus on Form in Classroom Second Language Acquisition. Cambridge: CUP.
Farghal, M. and Obiedat, H. 1995. Collocations: A neglected variable in EFL. International Review of Applied Linguistics in Language Teaching 33: 315–331.
Fisher, D. L. and Tan, K. C. 1989. Visual displays: The highlighting paradox. Human Factors 31: 17–30.
Fleming, M. and Levie, H. 1978. Instructional Message Design: Principles from the Behavioral Sciences. Englewood Cliffs NJ: Educational Technology Publications.
Grabinger, R. S. and Osman-Jouchoux, R. 1996. Designing Screens for Learning. In Cognitive Aspects of Electronic Text Processing. Advances in Discourse Processes Vol. LVIII, H. van Oostendorp and S. de Mul (eds), 181–212. Norwood NJ: Ablex.
Hegelheimer, V. and Chapelle, C. 2000. Methodological issues in research on learner–computer interactions in CALL. Language Learning and Technology 4: 41–59.
Howarth, P. 1996. Phraseology in English Academic Writing: Some Implications for Language Learning and Dictionary Making [Lexicographica, Series maior 75]. Tübingen: Max Niemeyer.
Howarth, P. 1998. The phraseology of learners’ academic writing. In Phraseology: Theory Analysis and Applications, A. Cowie (ed.), 161–186. Oxford: OUP.
Hulstijn, J. H. 2003. Incidental and intentional learning. In Handbook of Second Language Acquisition, C. Doughty and M. Long (eds), 349–381. Oxford: Blackwell.
Irujo, S. 1993. Steering clear: Avoidance on the production of idioms. International Review of Applied Linguistics in Language Teaching 31: 205–219.
Jourdenais, R., Ota, M., Stauffer, S., Boyson, B., and Doughty, C. 1995. Does textual enhancement promote noticing? A think-aloud protocol analysis. In Attention and Awareness in Foreign Language Learning [Technical Report #9], R. Schmidt (ed.), 183–213. Honolulu HI: University of Hawai’i, Second Language Teaching and Curriculum Center.
Kolers, P. A., Duchnicky, R. L., and Ferguson, D. C. 1981. Eye movement measurement of readability of CRT displays. Human Factors 23: 517–527.
Levelt, W. 1989. Speaking: From Intention to Articulation. Cambridge MA: MIT Press.
Levelt, W. 1993. Lexical Access in Speech Production. Oxford: Blackwell.
Lewis, M. 1993. The Lexical Approach. Hove, England: LTP Publications.
Muter, P. 1996. Interface design and optimization of reading of continuous text. In Cognitive Aspects of Electronic Text Processing. Advances in Discourse Processes Vol. LVIII, H. van Oostendorp and S. de Mul (eds), 161–180. Norwood NJ: Ablex.
Nattinger, J. R. and DeCarrico, J. S. 1992. Lexical Phrases and Language Teaching. Oxford: OUP.
Paribakht, T. S. and Wesche, M. 1996. Enhancing vocabulary acquisition through reading: A hierarchy of text-related exercise types. The Canadian Modern Language Review 52: 155–78.
Pawley, A. and Syder, A. H. 1983. Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In Language and Communication, J. C. Richards and R. W. Schmidt (eds), 191–227. London: Longman.
Peters, A. M. 1983. The Units of Language Acquisition. Cambridge: CUP.
Roby, W. B. 1999. What’s in a gloss? Language Learning and Technology 2: 94–101.
Schmidt, R. 1990. The role of consciousness in second language learning. Applied Linguistics 11: 129–158.
Schmidt, R. 1992. Awareness and second language acquisition. Annual Review of Applied Linguistics 13: 206–226.
Schmidt, R. 1995. Attention and Awareness in Foreign Language Learning [Technical Report #9]. Honolulu HI: University of Hawai’i, Second Language Teaching and Curriculum Center.
Schmalhofer, F. and Glavanov, D. 1986. Three components of understanding a programmers’ manual: Verbatim, propositional, and situational representations. Journal of Memory and Language 25: 279–294.
Swain, M. 1998. Focus on form through conscious reflection. In Focus on Form in Classroom Second Language Acquisition, C. Doughty and J. Williams (eds), 64–81. Cambridge: CUP.
Tullis, T. S. 1988. Screen design. In Handbook of Human-computer Interaction, M. Helander (ed.), 377–411. Amsterdam: Elsevier.
Wesche, M. and Paribakht, T. S. 1996. Assessing second language vocabulary knowledge: Depth versus breadth. Canadian Modern Language Review 53: 13–40.
Wray, A. 2000. Formulaic sequences in second language teaching: Principle and practice. Applied Linguistics 21: 463–489.
Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: CUP.
Yorio, C. 1989. Idiomaticity as an indicator of second language proficiency. In Bilingualism across the Lifespan, K. Hyltenstam and L. Obler (eds), 55–72. Cambridge: CUP.

Appendix 1

Target lexical items

1. do away with
2. disrupt
3. put up with
4. expatiate
5. catch on to
6. consequences
7. have an inkling of
8. outweigh
9. carry out
10. moderate
11. put off
12. obviate
13. cut out
14. concede
15. pile up
16. ongoing
17. silver tongued
18. perspicacity
19. determine
20. over the top

Appendix 2 Treatment condition text (The underlined formulaic sequences were presented in red font in the experiment. The control condition text was identical except the formulaic sequences were not typographically enhanced.) Could global warming do away with life on earth by making the world so hot that life cannot be supported? Perhaps not, but the ice at the north and south poles could melt. This melting will ﬂood coastal parts of countries where many millions of people live. Countless thousands could be driven from their homes to become refugees, and this large number of people who have lost their homes would disrupt the activities of everyday life. It could create great political problems. Also, global warming will change the climate, and this will not improve food production. Dry areas will become wetter and wet areas will have to put up with drought. Less food will be produced and there will be increased danger of starvation. Thus global warming is seen by some to be a very real threat, but how does it happen? When you walk into a greenhouse on a sunny day, it is hot because the glass traps the sun’s heat. The earth’s atmosphere can do the same thing. Greenhouse gases, such as carbon dioxide, in the atmosphere trap the sun’s heat and stop the heat from being radiated back into space. These gases, which come from cars and factories make the earth warmer — in other words they cause global warming. Writers in newspapers and magazines expatiate on the seriousness of global warming. It is starting so slowly so that we may not catch on to what is happening until it is too late. However if the problem is not noticed, it may get so bad that there will be serious consequences that cannot be undone. Global warming is a slow development so it is very hard to have an inkling of what change might be catastrophic and what change won’t. Given the slowness of the change, what are politicians more likely to be concerned with? 
In the short term, unchecked production of greenhouse gases leads to increased economic production. Any serious reduction of greenhouse gases will make energy more expensive and reduce economic growth. Politicians need votes in the short term, so which choice will be seen as the most important — short term economic gain or safety of the environment? It is most likely that, in politicians’ minds, short-term economic gain will outweigh long term safety for the environment. . If governments start to carry out sensible environmental strategies, at the very least the arrival of the coming environmental disaster can be delayed. Is it enough, however, just to moderate greenhouse gas emissions?. When we reduce emissions, we don’t solve the problem. We just put oﬀ the oncoming global crisis until a later time. In the long term, we need to stop greenhouse emissions completely. But how do we go about it? How are we to completely obviate the production of greenhouse gases when they are a result of demand for cheap energy? Cheap energy means industry can make more money. What politician will tell companies to cut out the production of greenhouse gases if it means more expensive power for factories, and it results in a slowdown for the economy. Politicians are going to

have to concede the strong probability that they will lose their jobs if they take this position. This is a serious problem. Another problem is that it is not yet clear to all that global warming is actually happening. Although scientific information is starting to pile up supporting the argument that global warming is ongoing, its effects are not clear enough so that everyone can see them. Who can help people to wake up to the danger before it is too late? In this respect both scientists and politicians have important parts to play. Politicians need to be able to make people do things they don’t want to. In order to do this, they must speak clearly and strongly about the problem so that average people will understand what is happening, and that they will also understand what will happen if they do nothing. So these politicians must be silver tongued speakers to do this. Scientists, too, need to show great perspicacity, since they are the ones who must ignore all the passion and find solutions to these very complicated problems. These scientists must determine those practical solutions that the politicians will then have to get people to accept. Is this really too much to ask? Is it so over the top to tell people they must reduce their standard of living? Maybe it is, but we really don’t have any choice.

Appendix 3

True/false reading task items

1. Global warming will not destroy life on Earth.
2. Global warming will harm people’s daily activities.
3. The droughts will mean some people have too much water.
4. Newspapers talk to us all the time about global warming.
5. It is easy to see that global warming is happening.
6. Bad things will happen as a result of global warming.
7. Good greenhouse change and bad greenhouse change are easy to see.
8. Economic gain at this moment is more important to politicians than the environment.
9. Sensible government policy can stop the environmental problem getting worse.
10. Producing fewer greenhouse gases will solve the problem of global warming.
11. Reducing the amount of greenhouse gases only delays facing the real problem.
12. If we stop all greenhouse gas production energy will be cheaper.
13. Politicians must tell people to stop producing greenhouse gases.
14. People will be happy if politicians raise the price of energy and save the environment.
15. The quantity of scientific evidence supporting global warming is growing.
16. Global warming is not actually happening at this moment.
17. Politicians do not need to be good speakers to make people to give up cheap energy.
18. Scientists have to communicate clearly to everyone the problems of global warming.
19. Politicians alone must find solutions to the problem of global warming.
20. It is unreasonable to tell people to reduce their standard of living.

‘Here’s one I prepared earlier’
Formulaic language learning on television

Alison Wray
Cardiff University

The challenge to the language learner All of us, as post-childhood foreign or second language learners, encounter the challenge of how to balance speed and ﬂuency on the one hand with accuracy on the other, when engaging in spontaneous conversation. Our preferred strategies vary, according to situation, personality and perhaps our recent alcohol consumption. In an L2 context with little at stake — such as when we are the least linguistically competent person at a dinner table — we may shut up altogether, preferring not to ﬁght for conversational space against our more proﬁcient adversaries. On the other hand, faced with a medical emergency in a foreign country we will hang the little we know onto gesture, pragmatics and a look of desperation, in order to encourage our addressees to share the burden of our communicative limitations. One way to increase our ﬂuency and accuracy is to rely on prefabricated linguistic material. If we can simply reproduce intact — as if it were a single, long, unit — the exact formulation that a native speaker would use, we can avoid the diﬃculties associated with stringing words and morphemes together by rule, and the risk of instilling our message with inappropriate pragmatic overtones. Such prefabricated multiword strings are empowering, because they ease the expression of otherwise complex messages. On the other hand, they are an act of faith. The user launches a pre-packaged idea, without necessarily being aware of its exact composition. It’s like handing the hearer a sealed box, in the hope that it really will convey the intended message. Thus the visitor to Russia can entirely eﬀectively bid farewell with ‘dozvidaniya’, knowing only that it is what one says for ‘goodbye’, not that, literally, it expresses the same expectation of a future meeting as ‘auf Wiedersehen’ in German or ‘au revoir’ in French.


At the other extreme, the learner with little faith in formulaic language will, presumably, feel compelled to control every morpheme of a painstakingly constructed utterance, relying on grammatical rules to compose the message out of its smallest components. Behaviourally, this can give rise to varying degrees of conversational constipation — for even native speakers, let alone non-native speakers, seem unable to maintain a ﬂow in interaction without recourse to formulaic language (Wray 2002a,b). Linguistically, it can be characterised by an absence of not only idiom, but also common pragmatic implicature, and by a curious over-application of rule patterns, a literal use of lexical forms, and too wide a range of collocational associations. The eﬀect may be variously construed as non-native, abnormal, or creatively poetic. The language learner is challenged to ﬁnd a happy medium, in which ﬂuency and accuracy are supported by the use of prefabricated material, while online construction supplies the ﬂexibility to express any desired message. It follows that researchers are interested in observing the dynamics through which this balance is achieved. However, diﬃculties immediately arise with respect to identifying when a word string is, in fact, formulaic. Some strings — opaque idioms — are quite clearly so, for they cannot be constructed nor decoded using the grammar of the language. But most linguists want to be more inclusive, and allow for the possibility that a wordstring can be semantically transparent and grammatically regular, but still be formulaic. This position permits expressions like have a nice day and it’s been great talking to you. One criterion for identifying such strings is some layer of meaning that transcends the individual words and belongs to the string as a whole (Erman & Warren 2000). A third position on formulaicity is more inclusive still. 
It holds that there is nothing to prevent any wordstring from being treated formulaically, even if there is no additional layer of meaning. (However, once it is formulaic, it is likely to drift away from semantic transparency and grammatical regularity (Wray 2002a: 49). The deﬁning features of more conservative deﬁnitions of formulaicity are, thus, viewed as a consequence of holistic processing rather than a cause of it). If any wordstring can become formulaic, it follows that one can neither guarantee to spot formulaic strings by looking at their form, meaning or usage, nor compile a complete list of them. The identiﬁcation of formulaicity in this deﬁnition is not, however, impossible. Various techniques have been applied, including the tracking of pauses, eye-gaze, intonation and, for written text, ﬂuency in typing (see Wray 2002a: chap 2 for a review).


Practical approaches to exploring formulaicity in action But suppose one could create a situation in which one knew for sure which wordstrings were formulaic. Many useful insights might be gained into how we handle formulaic and non-formulaic material, and these insights could feed back into the interpretation of less controlled situations. A suite of recent and current research projects at Cardiﬀ is exploring the potential for imposed formulaicity to oﬀer a valuable baseline for comparison in linguistic analysis. The common approach in these projects is to provide the speaker with an exact nativelike formulation of the message that s/he wishes to express (e.g. Wray 2002b; Wray et al., 2004). The project reported here is founded on a simple premise: in a fully predictable situation, it should be easy for a language learner to pass herself oﬀ as nativelike (apart from pronunciation), because all she has to do is memorise a set of nativelike sentences. This premise, while logical in its own terms, is designed to directly challenge decades of research observations to the eﬀect that adult taught language learners are poor users of formulaic language, relative to adult naturalistic learners, and child learners. These observations, fully reviewed in Wray (2002a: chaps 8–10), are typiﬁed by Yorio’s (1989) ﬁnding, that expressions learned as formulas subsequently developed errors consistent with the interlanguage. Observations like this suggest that adult learners may ﬁnd it diﬃcult to suppress the tendency to break down linguistic input, even when they know that it is not necessary or advisable to do so. If, as Wray (2002a: 200ﬀ) proposes, they engage in this analysis in order to identify lexical elements, they may simply discard the grammatical detail, creating a problem for themselves when they later need to reconstruct the string. These hypotheses are directly tested in the present study, by virtue of a rather unusual learning situation. 
In standard investigations, one can never be sure (a) what the learner actually internalised in the ﬁrst place, (b) what learning events might have subsequently aﬀected the handling of that original input, or (c) what immediate communicational or other circumstances might have contributed to some sort of on-line editing of what would otherwise have been a more nativelike string. In this study, these factors are controlled. By watching the entire learning process it is possible to gain a fairly accurate impression of (a) and (b), while the performance situation excludes one of the two major motivations for editing — unanticipated input from interlocutors, while retaining the other — high-level anxiety.

Alison Wray

The context of the study: Welsh in a week

These tightly controlled conditions were made possible by observing the filming of an episode of the BAFTA-winning television programme Welsh in a Week. The major function of this half-hour programme on S4C (Welsh Channel 4) is to encourage viewers to take up the challenge of learning Welsh. Presented through the medium of English, it focuses upon the use of Welsh in the workplace, and particularly emphasises the domains of healthcare and leisure/tourism. The primary focus of a Welsh in a Week programme is the progress of an individual learner in mastering, in the space of four days (a rather short ‘week’), sufficient Welsh to achieve their ‘Challenge’. In preparation for the Challenge, Nia Parry, the tutor/presenter, gives three tutorials in which she introduces the language necessary for achieving the Challenge task. The learner has a great deal to learn and practise, before he or she is plunged into the real life situation, all captured on film. The views and feelings of the learner are chronicled in interviews, supplemented by a video diary.

The approach taken in Welsh in a Week is of immense interest to the linguist, because it relies predominantly on formulaic material. That is, the phrases and sentences that are introduced to the learner are presented as holistic units, with no, or only partial, indication as to how they are constructed. The reason for this practice is pure expediency: the format of the programme is not compatible with taking time to explain why a particular sequence means what it does. In addition, the formulaic material used in Welsh in a Week is mostly of a highly specific nature. Although the first tutorial may introduce a few generic expressions, the Challenge can only be achieved if the learner is supplied with language that is highly tuned to the situation. For example, a learner challenged to run a bingo session in a Welsh-medium residential home for the elderly was taught Oes gynnoch chi gerdyn? ‘Have you got a card?’, Dach chi wedi ennill? ‘Have you won?’, and Gêm arall? ‘Another game?’. A doctor challenged to administer an anaesthetic was provided with Dach chi wedi bod yn yr ysbyty o’r blaen? ‘Have you been in hospital before?’, Oes gynnoch chi ddannedd gosod? ‘Have you got false teeth?’ and Dw i’n mynd i roi mwgwd ar eich wyneb ‘I’m going to put a mask on your face’.

Such highly specific expressions do, of course, offer the potential for generalisation if the learner is able to segment out the primary lexical material to leave frames like Oes gynnoch chi __? ‘Have you got __?’ and Dw i’n mynd i roi __ ‘I’m going to put/give __’. To some extent this is encouraged by the tutor (see later) though, significantly, it is deliberately not systematic and it certainly does not extend to a full grammatical and lexical analysis of all the input material.

The learner whose experience we follow below was challenged to present a cookery demonstration to a group of ladies from the local Welsh chapel social group.² Because the demonstration proceeds on the basis of an uninterrupted scripted monologue, this Challenge, exceptional in the series in not entailing uncontrolled interaction from interlocutors, offers precisely the situation necessary for testing the ability of an adult learner simply to memorise and repeat linguistic material.

Case study: Margaret Owen

Margaret Owen is a retired home economics teacher from south-west Wales. Her pre-filming knowledge of Welsh was limited to the comprehension of a few phrases picked up from Welsh-speaking friends and relations. She had sung hymns in Welsh for many years, and could read it aloud accurately (though without much comprehension). She said that she could sometimes follow a contextualised conversation, but that she was not sufficiently proficient to participate at even a basic level.

The primary data was collected on location with the film crew for a week, and consists of field notes relating to observations and interviews, the complete set of the flashcards prepared for the lessons, and three hours of unedited film, including all the takes, some practice time, and discussions about the language material that took place between takes with the camera still running. In addition, there are two audio recordings, made five and nine months later, for follow-up purposes.

In the filming week, the learning experience occurred in real time, and though the final half-hour programme depicts barely 15 minutes of the process, there was a genuine and substantial language-learning iceberg beneath the visible tip. The three tutorials took place on Monday, Tuesday and Wednesday. They were clearly structured, focussed, and covered a considerable amount of input. The tutorials were filmed three times (itself an opportunity for learning), and Margaret was able to take away the materials used, plus supplementary matter, presented on flashcards. It was her responsibility to memorise the material by the next day, fitting it in around the filming schedule. The Challenge took place on the Friday.

Margaret’s Challenge was to demonstrate the preparation of two recipes, pork and mushroom casserole and lemon pudding. In the event, only the former was broadcast. This provides an unexpected bonus for us. The first follow-up recording was made about two weeks before the programme was broadcast for the first time. The second follow-up was made some three months after the broadcast, during which period Margaret had, inevitably, viewed it a number of times on video with friends. It was therefore possible to investigate the effect of additional rehearsal on her recollection of the casserole-related language, as compared with that of the pudding, which she had never viewed.

Chronicle of the process

Before the filming, Margaret received a preparation sheet and tape, providing some pronunciation exercises based on minimal pairs (the examples were not translated), thirty common phrases for greeting and leave-taking, introducing oneself, and so on, and the numbers one to ten. Margaret reported that she already knew most of this vocabulary. As part of the week’s teaching, Nia provided around 50 flashcards per day, though not all were used in the tutorials. The flashcards contained in total 341 words, of which 56 (16.4%), mostly the names of ingredients and cooking utensils, had a flashcard to themselves. The remainder appeared in multiword strings, ranging in length from two to eleven words. More than half the material appeared in strings of four words or more.

In the tutorials, Nia suggested messages that Margaret might need for her Challenge, and gave her the appropriate Welsh phrase or sentence. She presented some long sentences in smaller chunks on flashcards (Figure 1), showing how they could be built up. These chunks were themselves formulaic, that is, internally complex but presented whole. Using multiword strings offered the potential to bypass at least some of the impact of ‘front mutation’, a particular problem for learners of Welsh (see later). Nevertheless, Nia did give some instruction regarding mutations if they occurred at boundaries between chunked constituents. As we shall see, this strategy may be responsible for some of the errors in Margaret’s recall.

Dw i’n    mynd i      ddangos i chi    sut i     wneud . . .
I’m       going to    show you         how to    make . . .

Figure 1. Flashcard sequence for a complex sentence

Rehearsal was in part an inevitable consequence of the filming context, since each tutorial was filmed three times for television production purposes. Although the content, including Margaret’s reactions to new information, had to be broadly reproduced intact for each take, it was noticeable that learning achieved in one take was transferred into the next take. When the cameras were not rolling, Nia would often give additional tuition and rehearsal support. Margaret practised her material in the evenings both alone and with friends and relations, and was word-perfect by the time the next set was introduced. On the day before the Challenge, Nia spent some time off camera helping her rehearse the material she would specifically need.

Margaret presented her cookery demonstration twice to her live audience. The language she used was not identical across the two events, nor entirely identical to that which she had prepared. Her delivery was fluent (allowing for the natural gaps entailed in cooking something in real time) and evidently entirely comprehensible to the audience, who listened attentively, laughed at her jokes, and even made notes.

Five months after filming (two weeks before the broadcast), Margaret recorded onto audiocassette as much of the material as she could now recall without rehearsal. She also provided a written commentary on her experience and recall. Nine months after filming (four months after the first broadcast of the programme) the researcher visited her and recorded, once more, her recall of the material, as well as interviewing her.

Findings

The data from the study is very rich, and no single paper can do it full justice. Material introduced in the tutorials but not used in the Challenge is excluded from consideration here; we shall deal only with the various manifestations of the sentences that constituted the ‘script’ of the cookery demonstration. The script was made up of 60 different utterances (henceforth ‘items’), numbered 1–63 in Figure 2,¹ since some occurred more than once. Using the script as a reference point, it was possible to compare the exact forms in which any given item appeared, from its first introduction in a tutorial to the final audio recall nine months after filming. A typical item would be attempted twice in one tutorial (providing six renderings over the three takes), once in each take of the Challenge, and once in each of the two recalls: ten attempts in all. The lowest number of attempts at an item was three, the greatest, forty-six. High numbers of attempts normally reflected many repetitions in a tutorial, because of difficulty in remembering the sequence, or in pronouncing one or more of the words. Some renderings were simple repetitions of what the tutor said. Others were the result of translation or free recall.

In interrogating the data, our key interest is the extent to which Margaret was able successfully to reproduce the material that she had memorised. We shall examine the data in relation to a set of questions that, between them, explore the reasons for her successes and failures. For convenience, the performances will be referred to as Ch1 and Ch2 (for Challenge) and the five- and nine-month audio recalls as R+5 and R+9 respectively.

The script, with translation

1–9. Prynhawn da. Margaret Owen dw i. Mae’n braf eich gweld chi i gyd. Prynhawn ’ma dw i’n mynd i ddangos i chi sut i wneud caserol porc a madarch a pwdin lemon. Yn gyntaf, rysait y caserol. Bydd angen y cynhwysion yma: Pwys a hanner o borc, chwe owns neu hanner pwys o fadarch, ‘Good afternoon. I’m Margaret Owen. It’s nice to see you all. This afternoon I’m going to show you how to make pork and mushroom casserole and lemon pudding. First, the casserole recipe. You will need these ingredients: A pound and a half of pork, six ounces or half a pound of mushrooms,’
10. dwy owns o fenyn, ‘two ounces of butter,’
11. dwy lond llwy fwrdd o flawd, ‘two tablespoonfuls of flour,’
12. llwyaid o sieri, ‘a spoonful of sherry,’
13. hanner peint o stoc cyw iâr, ‘half a pint of chicken stock,’
14. chwarter peint o hufen dwbl, ‘a quarter of a pint of double cream,’
15. llwyaid o sudd lemon, ‘a spoonful of lemon juice,’
16. halen a phupur i flasu, ‘salt and pepper to taste,’
17. pwdr cyri. ‘curry powder.’
18. Rhowch y menyn yn y badell ffrio. ‘Put the butter in the frying pan.’
19. Torrwch y cig yn ddarnau bach. ‘Cut the meat into small pieces.’
20. Rhowch y cig a dwy lond llwy fwrdd o flawd mewn cwdyn plastig. ‘Put the meat and two tablespoonfuls of flour into a plastic bag.’
21. Siglwch fel hyn. ‘Shake it like this.’
22. Rhowch y porc yn y badell ffrio fel hyn. ‘Put the pork into the frying pan like this.’
23. Coginiwch y ddwy ochr yn gyflym. ‘Cook quickly on both sides.’
24. Ychwanegwch y madarch. ‘Add the mushrooms.’
25. Coginiwch am bum munud. ‘Cook for five minutes.’
26. Ychwanegwch ddwy lond llwy fwrdd o sieri, ‘Add two tablespoonfuls of sherry,’
27. a chymysgwch yn dda. ‘and mix well.’
28. Ychwanegwch yr hufen dwbl, ‘Add the double cream,’
29. a sudd lemon. ‘and the lemon juice.’
30. Cymysgwch yn dda. ‘Mix well.’
31. Ychwanegwch halen a phupur i flasu, ‘Add salt and pepper to taste,’
32. pwdr cyri, ‘curry powder,’
33. a stoc cyw iâr. ‘and the chicken stock.’
34. Coginiwch yn araf. ‘Cook slowly.’
35. Mae’n well ’da fi cig wedi coginio yn dda, ‘I prefer meat well done,’
36. felly dw i’n rhoi y caserol mewn ffwrn: ‘so I put the casserole in an oven:’
37. ffwrn nwy pedwar, ‘gas mark four,’
38. trydan: cant wyth deg. ‘electric: one hundred and eighty.’
39. Rhowch yn y caserol. ‘Put it in the casserole [dish].’
40. Coginiwch yn araf am dri chwarter awr. ‘Cook it slowly for three quarters of an hour.’
41. A dyma un dw i wedi’i baratoi yn gynharach. ‘And here’s one I prepared earlier.’
42. Pwdin lemon. ‘Lemon pudding.’
43. Bydd angen: ‘You will need:’
44. hanner peint o hufen dwbl, ‘half a pint of double cream,’
45. llond llwy fwrdd o siwgr caster, ‘a tablespoonful of caster sugar,’
46. dwy owns o fenyn, ‘two ounces of butter,’
47. paced bach o bisgedi digestive wedi malu, ‘a small packet of digestive biscuits, crushed,’
48. sudd tri lemon, ‘the juice of three lemons,’
49. a un tun o laeth cyddwysedig. ‘and one tin of condensed milk.’
50. Toddwch y menyn mewn sosban ‘Melt the butter in a saucepan.’
51. Rhowch y bisgedi ac un llond llwy fwrdd o siwgr caster yn y sosban, ‘Put the biscuits and a tablespoonful of caster sugar into the saucepan,’
52. a chymysgwch yn dda. ‘and mix well.’
53. Rhowch y bisgedi mewn dysgl fel hyn. ‘Put the biscuits into a dish like this.’
54. Rhowch y sudd lemon, llaeth cyddwysedig a hufen dwbl mewn powlen, ‘Put the lemon juice, condensed milk and double cream in a bowl,’
55. a chymysgwch yn dda. ‘and mix well.’
56. Arllwyswch ar ben y bisgedi fel hyn. ‘Pour over the biscuits like this.’
57. Rhowch yn yr oergell am ddwy awr ‘Put in the fridge for two hours.’
58. Os dych chi eisiau, ‘If you like,’
59. rhowch siocled neu ffrwythau ar ben y pwdin lemon. ‘put chocolate or fruit on top of the lemon pudding.’
60. Diolch yn fawr iawn am ddod. ‘Thank you very much for coming.’
61. Mwynhewch y rysaitau. ‘Enjoy the recipes.’
62. Pwy sy eisiau blasu? ‘Who would like to taste?’
63. Dewch yma. ‘Come here.’

Figure 2. The ‘script’ of the cookery demonstration

To what extent were memorised strings successfully recalled?

Table 1 summarises the successful delivery of the script items in Ch1, Ch2, R+5 and R+9. Scoring for this calculation was draconian: one mark for an entirely correct recollection, and zero for one or more errors. Errors could be as minor as the loss of an unstressed phoneme or as major as a complete breakdown of the form. However, the entire omission of the item was not counted as an error, and the percentages are based only on those items attempted. Partial omissions are counted as errors. Using χ² as a test of independence, the differences in performance shown in Table 1 are highly significant (χ² = 29.24, df 3, p < 0.01).

It is not surprising to find that R+5 and R+9 revealed a relatively poorer recall both of the messages and their forms, since the Challenge events were the climax of considerable rehearsal, while the five and nine month recall attempts expressly were not. However, it is more surprising to find a sizeable increase in the number of errors from Ch1 to Ch2, since the two events took place within a few minutes of each other. The difference is significant (χ² = 4.12, df 1, p < 0.05). The film of the two performances shows a clear difference between them. In both cases Margaret was very nervous, but in the first, her presentation was much more up-beat, and she had a rather better rapport with the audience. Whether because of tiredness, or relief at having the first performance ‘in the can’, she seemed less engaged second time round, and this may explain her reduced success in recalling the memorised material accurately.

The difference in errors between the less successful Challenge (Ch2) and the more successful recall (R+9) is also significant (χ² = 6.15, df 1, p < 0.05). However, the difference between R+5 and R+9 is not significant (χ² = 0.1159, df 1), even though one might have predicted that the accuracy of recall would reduce over the four intervening months. The similarity between them could indicate that the loss of accuracy had already reached a plateau by R+5, with the best-learned formulas firmly ensconced in memory and the rest already forgotten. However, another explanation is also possible.

Table 1. Strings delivered entirely correctly (of those attempted)

Ch1            Ch2            R+5            R+9
50/63 (79%)    37/59 (63%)    17/48 (35%)    19/49 (39%)
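The percentages in Table 1 and the tests of independence reported for it can be checked with a few lines of code. The sketch below is an illustration only, not the procedure used in the study: it assumes a plain Pearson χ² with no continuity correction, and the names `TABLE_1`, `chi_square`, `as_rows` and `percent_correct` are hypothetical ones introduced here. Under that assumption the statistics come out close to the reported values.

```python
# Counts from Table 1: (entirely correct, attempted) for each performance.
TABLE_1 = {"Ch1": (50, 63), "Ch2": (37, 59), "R+5": (17, 48), "R+9": (19, 49)}


def chi_square(rows):
    """Pearson chi-square for a table of counts (assumed: no continuity correction).

    `rows` is a list of [correct, incorrect] pairs, one per performance.
    Returns (statistic, degrees of freedom).
    """
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, df


def as_rows(*labels):
    """Convert (correct, attempted) entries into [correct, incorrect] rows."""
    return [[TABLE_1[k][0], TABLE_1[k][1] - TABLE_1[k][0]] for k in labels]


def percent_correct(label):
    """Whole-number percentage of attempted items delivered entirely correctly."""
    correct, attempted = TABLE_1[label]
    return round(100 * correct / attempted)


if __name__ == "__main__":
    for label in TABLE_1:
        print(label, f"{percent_correct(label)}%")
    print("all four performances:", chi_square(as_rows("Ch1", "Ch2", "R+5", "R+9")))
    print("Ch1 vs Ch2:", chi_square(as_rows("Ch1", "Ch2")))
    print("Ch2 vs R+9:", chi_square(as_rows("Ch2", "R+9")))
```

Because omitted items are excluded, each row uses the number of items attempted as its total, which is why the four rows have different sizes.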

Table 2. Strings delivered entirely correctly in broadcast and unbroadcast parts of the script

                 Ch1   Ch2   R+5   R+9
Broadcast        35    26    11    12
Not broadcast    15    11    6     7

The television programme was broadcast soon after R+5. If there was an underlying reduction in accuracy over time, it could have been counterbalanced by the effects of repeatedly watching the video of the programme, as Margaret inevitably did with friends and family. We can test this hypothesis by comparing her performance on the casserole and pudding recipes, since only the former made it onto the TV programme. In other words, if Margaret’s recall at R+9 was enhanced by her having watched the video several times, this should have improved her recall of the casserole recipe but not the pudding recipe. Table 2 summarises the separate totals for the parts of the script that were broadcast (greeting, casserole recipe, closing) and were not (lemon pudding). While Margaret’s recall of the broadcast material was slightly greater at R+9 than at R+5, the same was true for the unbroadcast material. There is no significant change in the accuracy of the broadcast and unbroadcast material between the pre-broadcast recall (R+5) and the post-broadcast recall (R+9) (χ² = 0.033, df 1). Thus, it seems that watching the video was not responsible for the retention of accuracy between R+5 and R+9, and that the attrition in her accuracy had indeed reached a plateau by R+5.

These stark figures only evaluate the extent to which Margaret was simply able to reproduce what she had memorised without any error at all. As such, they hide a considerable amount of information, since no differentiation has been made between error types. One could ‘score’ the errors according to type, so that a minor slip in morphology was worth less (or more) than the replacement of a word or the breakdown of a grammatical structure. However, it is difficult to imagine justifying any particular scoring system without taking a theoretical position on why it was more difficult to memorise some kinds of linguistic feature than others. Qualitative analyses are ultimately more useful than ad hoc quantitative ones, and the former are presented later. First, however, we explore the location of errors and pauses.

Where were errors and pauses located?

If a formulaic string is treated as a single, holistic unit, it ought to be relatively
resistant to internal dysﬂuency and inaccuracy (Wray 2002a: 35–37; 219–222). Therefore, we can make the prediction that there would be far fewer pauses and errors within formulaic strings than between them. But what, precisely, should we view as a formulaic string in this material? Is each of the 60 items in the script a formulaic string? In keeping with Wray’s (2002a: 9) formal deﬁnition of the ‘formulaic sequence’, the test would be: was the item introduced and memorised whole? Some were, but some were not. Those that were not, were presented to Margaret as two or more smaller, though often still internally complex, parts. It is these smaller units that we shall view as the formulaic strings. There are, in other words, boundaries within some of the 60 script items, which separate two or more formulaic sequences. The boundaries are identiﬁed not on the basis of any external criteria, such as componential analysis or a language processing theory. Rather, they purely reﬂect the input that Margaret received. The procedure is best demonstrated with an example. Item 20 in the script was Rhowch y cig a dwy lond llwy fwrdd o ﬂawd mewn cwdyn plastig ‘Put the meat and two tablespoonfuls of ﬂour into a plastic bag’. Three internal boundaries were identiﬁed as likely to occur within this string for Margaret. One was after rhowch ‘put’, since nine diﬀerent instructions in the script began with this word. The second and third boundaries enclose the phrase dwy lond llwy fwrdd o ﬂawd ‘two tablespoonfuls of ﬂour’, since this also appeared separately in the script (item 11). As a result, this script item was viewed as consisting of four formulaic sequences: rhowch; y cig a; dwy lond llwy fwrdd o ﬂawd; mewn cwdyn plastig. As y cig a ‘the meat and’ and mewn cwdyn plastig ‘in a plastic bag’ clearly indicate, this boundary allocation does not attempt a mapping onto grammatical constituents, and it has no status beyond the individual learner’s experience. 
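The boundary allocation just described for item 20 can be sketched in code. This is only an illustration of the idea: in the study the boundaries were identified by hand from Margaret’s actual input, and the function `segment` and the list `SEPARATELY_TAUGHT` are hypothetical names introduced here. The sketch assumes that any string known to occur separately in the input is kept whole, and that each run of words between such strings forms a sequence of its own.

```python
# Strings assumed, for illustration, to occur separately in the learner's input.
SEPARATELY_TAUGHT = [
    "rhowch",                       # opens nine different instructions
    "dwy lond llwy fwrdd o flawd",  # also appears on its own as item 11
]


def segment(item, known_strings):
    """Split a script item into formulaic sequences.

    Known strings are kept whole; each run of words between them becomes
    one further sequence. Longer known strings are tried first.
    """
    words = item.lower().split()
    known = sorted((k.lower().split() for k in known_strings), key=len, reverse=True)
    sequences, pending, i = [], [], 0
    while i < len(words):
        match = next((k for k in known if words[i:i + len(k)] == k), None)
        if match is not None:
            if pending:  # close the run of words collected so far
                sequences.append(" ".join(pending))
                pending = []
            sequences.append(" ".join(match))
            i += len(match)
        else:
            pending.append(words[i])
            i += 1
    if pending:
        sequences.append(" ".join(pending))
    return sequences


item_20 = "Rhowch y cig a dwy lond llwy fwrdd o flawd mewn cwdyn plastig"
parts = segment(item_20, SEPARATELY_TAUGHT)
print(parts)
print("internal boundaries:", len(parts) - 1)
```

With these two known strings the sketch yields the four sequences and three internal boundaries given in the text. It does not model the judgement involved in deciding which strings count as separately encountered.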
Considerable care was taken to base the boundaries on the actual occurrence of items in Margaret’s input, so that they represented as accurately as possible the places at which she was likely to register a potential for choice. The matter in hand, then, is: were there, as the hypothesis predicts, more pauses (we shall deal with errors later) at these boundaries than in other places within the script item? In order to investigate this, the data from the four attempts at the script, Ch1, Ch2, R+5 and R+9, was amalgamated.³ For each item, the pauses were allocated to one of two categories: at a boundary or not at a boundary. Non-boundary pauses were considered able to appear either between words or within any word longer than one syllable. Since there were inevitably many more possible non-boundary places for a pause to occur, a random distribution of pauses would appear to favour non-boundary locations, if raw figures were used. To compensate for this, the frequencies of the boundary and non-boundary pauses were calculated as percentages of the respective locations available.

Figure 3 illustrates the procedure for a short item, Halen a phupur i flasu ‘salt and pepper to taste’. Because halen had been separately taught, while phupur, a mutated form, never occurred out of the context a __ i flasu, this script item was judged to contain one formulaic boundary, between halen and a (marked with the double line). If there was a pause at this location in, say, two of the four performances, this would render a score of 50%. A pause here in all four performances would give a score of 100%. The single lines in Figure 3 indicate the potential locations for non-boundary pauses: that is, either between words that were not separated by a formulaic boundary, or (anywhere) within a word of more than one syllable. There are, for this script item, six possible pause locations that are not at a formulaic boundary. If Margaret paused, say, only between phupur and i, and only on one occasion out of the four, the score would be one twenty-fourth, or 4.2%; that is, of the twenty-four possible pause locations (six in each performance), she inserted a pause in one. If she paused in that location in two performances, and once each within phupur and within halen, the score would be four twenty-fourths, or 16.7%, because four of the twenty-four possible locations had been used. Pauses occurring at the very start and end of script items were not included, since the items characteristically conveyed a complete instruction, followed by a demonstration of it, creating a natural pause.

[Figure 3: the script item HALEN A PHUPUR I FLASU, with a double line marking the formulaic boundary between HALEN and A, and single lines marking the non-boundary locations]

Figure 3. The potential locations of script-item-internal boundaries between formulaic sequences (double line) and within formulaic sequences (single lines)

Using this procedure, it was calculated that there were a total of 245 pauses, distributed across 2178 possible locations (292 boundary locations, 1886 non-boundary locations). Pauses occurred at 106/292 (36.3%) possible boundary locations, and at 139/1886 (7.4%) possible non-boundary locations. In short, fluency within formulaic sequences was considerably greater than fluency between them. This difference is highly significant (χ² = 211.98, df 1, p < 0.01).

The same calculations were performed for the locations of errors. There were a total of 187 errors, distributed across the 2178 possible locations. Errors occurred at 78/292 (26.7%) possible boundary locations, and 109/1886 (5.8%) possible non-boundary locations. Thus, errors were considerably more likely to occur at boundaries between formulaic sequences than within the formulaic sequences. This difference, too, is highly significant (χ² = 39.797, df 1, p < 0.01).

Errors come in many shapes and forms. In order to establish patterns in Margaret’s performance, we turn now to qualitative analysis.
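The pause scoring described above, in which pauses are expressed as a percentage of the locations available, can be sketched as follows. The function names `pause_score` and `chi_square_2x2` are hypothetical, and the χ² is assumed to be a plain Pearson statistic without continuity correction; neither is taken from the study itself. The figures used are the worked example for Halen a phupur i flasu and the aggregate pause counts reported in the text.

```python
def pause_score(pauses_observed, locations_per_performance, performances=4):
    """Percentage of the available locations at which a pause occurred."""
    available = locations_per_performance * performances
    return 100 * pauses_observed / available


def chi_square_2x2(hits_a, total_a, hits_b, total_b):
    """Pearson chi-square comparing two rates (assumed: no continuity correction)."""
    table = [[hits_a, total_a - hits_a], [hits_b, total_b - hits_b]]
    grand_total = total_a + total_b
    col_totals = [hits_a + hits_b, grand_total - hits_a - hits_b]
    statistic = 0.0
    for row, row_total in zip(table, (total_a, total_b)):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Worked example: one boundary location (halen | a), six non-boundary locations.
print(round(pause_score(2, 1), 1))  # boundary pause in two of four performances
print(round(pause_score(1, 6), 1))  # one pause across 24 non-boundary locations
print(round(pause_score(4, 6), 1))  # four pauses across 24 non-boundary locations

# Aggregate pause counts: boundary versus non-boundary locations.
print(round(100 * 106 / 292, 1), round(100 * 139 / 1886, 1))
print(round(chi_square_2x2(106, 292, 139, 1886), 2))
```

Dividing by the locations available, rather than comparing raw counts, is what prevents the much larger number of non-boundary locations from swamping the comparison.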

What errors were made?

The key advantage of memorising linguistic material is that you do not have to know why it has the form it has. You just need to remember it. You don’t need to make any choices, only use what you have learned. However, as we shall see below, the adult learner’s mind appears intent on interfering with this apparently simple business, by editing forms unnecessarily. On the other hand, when a string of words becomes familiar, there is a danger that it will ‘run away with you’, so that you end up saying something different from what you intended. This is where we begin.

Too much of a good thing

The examples described here are indicative of two interacting features of Margaret’s learning: successful automatisation, and the absence of analysis. The script contained the item: ‘I prefer meat well done’, mae’n well ’da fi cig wedi coginio yn dda. In Ch1, she began this item with ond ‘but’, which is semantically inappropriate. Why? A possible explanation is found in her extensive practice, in Tutorial 1, of the paired formulas dw i’n hoffi N1 ‘I like N1’ and ond mae’n well ’da fi N2 ‘but I prefer N2’. In memorising the latter, she learned strongly to associate the generic ‘prefer’ formula with the initial word ond. This may have interfered when she sought her specific ‘prefer’ formula during the Challenge.

In Tutorial 1, Margaret correctly repeated dw i’n mynd i wneud ‘I am going to make’ six times. She was then taught to replace wneud ‘make’ with other verbs, including goginio, ‘cook’, and also with place names. Subsequently, she overshot, retaining the wneud when attempting to construct dw i’n mynd i goginio ‘I am going to cook’ and dw i’n mynd i Waelod y Garth ‘I am going to Gwaelod y Garth’. This is indicative of automatisation of the entire original string, and suggests that imposing a formulaic boundary after memorisation can lead to errors.

In Ch2, aiming for dwy lond llwy fwrdd ‘two tablespoonfuls’ Margaret said dwy lond llwy bren ‘two woodenspoonfuls’. Although ‘two woodenspoonfuls’ is not formulaic in itself, it contains the string llwy bren ‘wooden spoon’ which Margaret had said eight times in Tutorial 2. Also in Tutorial 2, Margaret rendered padell ffrio ‘frying pan’ as padell wedi ffrio ‘fried pan’. Again, padell wedi ffrio is not itself a formula, but wedi ffrio ‘fried’ certainly is. Although not introduced during her Welsh in a Week experience, it is sufficiently common for Margaret to have heard, even used, it before.

Correct and incorrect mutations

The front-mutation system of Welsh is notoriously troublesome for learners. There are three mutations: soft, nasal and aspirate. They apply word-initially, and affect partly overlapping subsets of consonants, changing them to another consonant that contrasts in voicing, nasalisation or aspiration (Table 3). Even after the learner works out that mutated words need unmutating before they can be found in the dictionary, a major problem persists: the rules governing their occurrence are highly complex. Margaret knew that mutations existed in Welsh and that they were difficult. However, she did not appear to know how they worked or when they applied.

Table 3. Patterns of mutation in Welsh

            p     t     c      b     d     g            m     ll    rh
Soft        b     d     g      f     dd    null form    f     l     r
Nasal       mh    nh    ngh    m     n     ng
Aspirate    ph    th    ch
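The consonant changes in Table 3 can be written down as a small lookup. The sketch below is illustrative only: the names `MUTATIONS`, `UNAFFECTED_DIGRAPHS` and `mutate` are hypothetical, and the code applies the sound change alone. It makes no attempt to model the rules that determine when a mutation is required, which are the part that learners find so difficult.

```python
# Word-initial consonant changes, following Table 3. The empty string stands
# for the 'null form' (soft mutation deletes initial g).
MUTATIONS = {
    "soft": {"p": "b", "t": "d", "c": "g", "b": "f", "d": "dd",
             "g": "", "m": "f", "ll": "l", "rh": "r"},
    "nasal": {"p": "mh", "t": "nh", "c": "ngh", "b": "m", "d": "n", "g": "ng"},
    "aspirate": {"p": "ph", "t": "th", "c": "ch"},
}

# Assumption: words already beginning with these digraphs are left untouched,
# so that e.g. initial 'ch' is not treated as 'c'.
UNAFFECTED_DIGRAPHS = ("ch", "dd", "ff", "ng", "ph", "th")


def mutate(word, kind):
    """Apply the named mutation ('soft', 'nasal' or 'aspirate') to a word."""
    lower = word.lower()
    table = MUTATIONS[kind]
    if lower.startswith(UNAFFECTED_DIGRAPHS):
        return word
    # Try the two-letter initials (ll, rh) before single letters.
    for size in (2, 1):
        initial = lower[:size]
        if initial in table:
            mutated = table[initial] + word[size:]
            if word[:1].isupper():  # keep the capital of a proper noun
                mutated = mutated[:1].upper() + mutated[1:]
            return mutated
    return word


# Forms that occur in the script and tutorials.
for base, kind in [("menyn", "soft"), ("blawd", "soft"), ("pen", "soft"),
                   ("gwneud", "soft"), ("coginio", "soft"), ("dangos", "soft"),
                   ("pupur", "aspirate"), ("Gwaelod", "soft")]:
    print(base, "->", mutate(base, kind))
```

The lookup makes plain why a learner who has only met o flawd or ar ben has no way of recovering blawd or pen without being told that a mutation is present.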

The mutations occurring in the script can be categorised into three types on the basis of how they were introduced in the tutorials. Some were specifically taught. For instance, Nia used flashcards to demonstrate the soft mutation of gwneud ‘make’, coginio ‘cook’ and dangos ‘show’ after dw i’n mynd i ‘I am going to’. A second group were not taught as such, but the existence of the mutated form was evident because the unmutated form had previously been introduced. For example, menyn ‘butter’ was introduced in Tutorial 2 as a separate item of vocabulary. In the script it occurred in both unmutated and mutated form (e.g. items 18, 46). The third set occurred only as mutations and no mention was made of the fact that they were mutated. They included o flawd (> blawd, ‘of flour’), ar ben (> pen, ‘on top of’), and am bum munud (> pum(p), ‘for five minutes’).

Table 4a summarises the performances for each type. It is clearly the case that the co-existence of both mutated and unmutated forms within the material was detrimental to accuracy. Furthermore, as Table 4b indicates, the inaccuracy increased over time (ticks and crosses represent ‘correct’ and ‘incorrect’ respectively). The deterioration in accuracy in the ‘both used’ category was strongly characterised by the replacement of the correct mutated form by the incorrect base form. There were no spurious phonological alterations that might suggest the internalisation of an incorrect rule.

Table 4a. Accuracy in reproducing mutated forms (combined data)

                       Correct   Incorrect
Specifically taught    56        0
Both forms in use      79        44
Introduced intact      80        1

Table 4b. Accuracy in reproducing mutated forms (by data type)

Items                  Tutorials   Practice    Ch1         Ch2         R+5         R+9
                       ✓     ×     ✓     ×     ✓     ×     ✓     ×     ✓     ×     ✓     ×
Specifically taught    31    0     17    0     2     0     2     0     2     0     2     0
Both forms in use      18    4     0     0     19    6     18    7     12    15    12    12
Introduced intact      29    1     4     0     16    0     13    0     8     0     10    0

Three further features of mutations can be identified in the data. The first entailed the persistence of a mutation when the grammatical environment causing it had been altered, or the failure to apply a mutation when a change in the grammatical environment rendered it necessary. The second was hypercorrection: the application of a mutation in a situation that did not require it. The third was a lack of confidence about which form was correct. As before, this was most in evidence in the ‘both forms’ category. At various points, Margaret hedged in her pronunciation of initial /p/ and /t/ when they might need to be mutated to /b/ and /d/ respectively, producing an initial consonant that was neither voiced nor aspirated.

These observations suggest that the best way to ensure accuracy with morphological forms beyond one’s generative capacity is either to repeatedly practise the sequence in which they occur (as with the tutorial drilling of the specifically taught forms) or to be unaware of their presence. In the former case the learner is, in effect, creating a new formula through ‘fusion’ (Peters 1983: 82), with the mutation safely tucked inside and made familiar through repetition. In the latter, the formula originally accepted by the learner conceals the mutation. Accuracy is at risk where both the mutated and unmutated forms are in use, and therefore ‘sound right’.
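The trend over time in Table 4b is easier to see as error rates. The sketch below is an illustration under stated assumptions: the cell values are those read from Table 4b (their row sums agree with the totals in Table 4a), and the names `STAGES`, `TABLE_4B`, `totals` and `error_rates` are hypothetical ones introduced here.

```python
STAGES = ["Tutorials", "Practice", "Ch1", "Ch2", "R+5", "R+9"]

# (correct, incorrect) counts per stage, as read from Table 4b.
TABLE_4B = {
    "Specifically taught": [(31, 0), (17, 0), (2, 0), (2, 0), (2, 0), (2, 0)],
    "Both forms in use": [(18, 4), (0, 0), (19, 6), (18, 7), (12, 15), (12, 12)],
    "Introduced intact": [(29, 1), (4, 0), (16, 0), (13, 0), (8, 0), (10, 0)],
}


def totals(category):
    """Combined (correct, incorrect) counts, for comparison with Table 4a."""
    cells = TABLE_4B[category]
    return sum(c for c, _ in cells), sum(i for _, i in cells)


def error_rates(category):
    """Percentage incorrect at each stage; None where nothing was attempted."""
    rates = {}
    for stage, (correct, incorrect) in zip(STAGES, TABLE_4B[category]):
        attempts = correct + incorrect
        rates[stage] = round(100 * incorrect / attempts, 1) if attempts else None
    return rates


for category in TABLE_4B:
    print(category, totals(category), error_rates(category))
```

Only the ‘both forms in use’ row shows any real movement, which is consistent with the observation that accuracy is at risk when the mutated and unmutated forms are both familiar.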


Internal editing

Formulaic strings are much more useful if they are flexible enough to permit the swapping, where appropriate, of both open class items (e.g. nouns) and closed class ones (e.g. tense markers). The loosening of fixedness can be achieved through a process of segmentation (Peters 1983: 35ff), that is, the isolation and separate storage of salient elements, which entails inserting internal boundaries. The location of such boundaries, and hence the nature of the isolated unit, depends on evidence from input to the effect that the element (be it morpheme, word or wordstring) is one of several able to occur as a paradigmatic variant in that context.

In Margaret’s case, the production of paradigmatic variations on her script material was not required. It is interesting to note, therefore, that although it was expressly not in her interests to engage in segmentation, she appears to have done so anyway, for her memorised strings were subject to accidental editing. In some instances, Margaret replaced a word with a Welsh synonym. For instance, in R+9, aiming for toddwch y menyn mewn sosban ‘melt the butter in a saucepan’ she said rhowch y menyn mewn padell ‘put the butter in a pan’. Also in R+9, Margaret once replaced chi ‘you (plural/polite)’ with ti ‘you (singular/familiar)’. There was no obvious pragmatic motivation for this. In other cases, she replaced the Welsh word with its English translation: llwyaid o juice for llwyaid o sudd ‘a spoonful of juice’, salt a phupur for halen a phupur ‘salt and pepper’, and ychwanegwch y cream for ychwanegwch yr hufen ‘add the cream’. While lexical interference from the L1 is common in foreign language learners, usually it can be explained in terms of production pressures when expressing novel messages in real time. There is one example that must be so explained: rhowch y mixture ‘put the mixture’. Here, Margaret diverged from her script and had to use an English word to make good a sentence for which she did not have the necessary vocabulary. However, in all other cases, Margaret’s selection of an alternative to the word she had memorised was gratuitous. It can only be explained in terms of analytic activity interfering with what ought to have been a very straightforward process of faithful reproduction.

Another level of internal editing was seen in the loss of the grammatical particle yn (reduced to ’n after a vowel), which is required between the auxiliary and main verb in the present tense. After 32 successful renderings of dw i’n mynd ‘I am going’ in Tutorial 1, Margaret began to say dw i mynd, using this version four times and the correct one five times. She later used dw i’n mynd in Ch1 and R+9, but dw i mynd in Ch2 and R+5. The same particle also disappeared from mae’n well ’da fi ‘I prefer’ and dw i’n rhoi ‘I put’. There are two possible explanations for this loss. One is in line with the general observation that formulaic material tends to be phonologically underspecified, such that the formula as a whole can be reproduced quite convincingly without all of the (particularly unstressed) syllables being fully accurate (Wray 2002a: 37ff, 107ff). Certainly, this clitic form is precisely the sort of item that would be vulnerable. However, the fact that it was present at first, and lost later, suggests that Margaret was aware of it phonologically. The alternative explanation is that it was edited out because it did not appear to have a semantic function. This need not imply that Margaret had, consciously or unconsciously, assigned a meaning to everything else in those strings. A more likely scenario would be that even if she had engaged in some segmentation of the lexical material, a rump of phonological forms remained unassigned, representing, between them, the rest of the meaning (Wray 1998: 57). Of these unassigned elements, an unstressed form, especially a clitic, would be particularly susceptible to omission, being neither semantically nor phonologically strong.

Conclusion

Margaret Owen’s experience of Welsh in a Week has been used to investigate some current hypotheses about formulaic language. Because her situation, exceptionally, presented an opportunity for maximum success through memorising wordstrings and not altering them, it was possible to examine the extent to which an adult learner is actually able to keep analysis at bay. We have seen that, in the event, Margaret introduced many errors typical of an early stage learner of Welsh, suggesting that she did not have the capability to bypass linguistic analysis, even when it was in her interests to do so.

If her case is representative, which seems likely, then there are important implications for language teaching. Recently, there has been increasing interest in how adult classroom learners might use formulaic language to become more nativelike, and one possibility is to introduce fait accompli multiword strings in the classroom without formal examination of their construction. Even leaving aside formulaic routines like How do you do, which have always been so handled, the introduction of collocational pairs in this way would mean that the learner is able to learn, for example, under control and virtually all, without the teacher having to ‘explain’ how the two components contribute to the meaning of the whole. Since such explanations may result in the learner dividing the pairs for separate storage, and thus being less likely to remember the pairing, such an approach could, in theory, be highly successful. However, if learners are going to engage in the analysis anyway, the teacher’s efforts at being ‘hands off’ will be somewhat undermined.

Nevertheless, what we see in this case study is a level of linguistic achievement that would be unattainable using conventional teaching methods. After only very minimal tuition, a virtual beginner in Welsh was able competently to deliver a comprehensible cookery demonstration to native speakers. Furthermore, nine months after filming, she still knew a considerable amount of the material. If, as many have proposed (see Wray 2002a: 191ff for a review), an important function of memorised wordstrings is as a long term reference resource for the learner (language on tap, so to speak), then she had successfully acquired and maintained that resource, albeit not entirely accurately. Of course, the cookery script was all she knew, and so she was no better prepared than before for getting her car fixed by a Welsh-speaking mechanic. Yet, to the extent that our everyday lives do feature a small set of recurring social ‘scripts’, one can imagine that, armed with a couple of dozen, she might actually be able to pass herself off as linguistically competent quite a lot of the time. No-one is suggesting that such a strategy would be a replacement for the development of a facility with words and rules. On the other hand, given the underlying propensity to engage in analysis anyway, it is interesting to speculate on the extent to which the repeated use of such scripts might ultimately bootstrap the learner into a kind of extrapolated knowledge that was both flexible and rather more nativelike than usual, being based, as the young child’s is, exclusively on the delivery of real language in use.

Acknowledgements

The author gratefully acknowledges the extensive help and co-operation of Margaret Owen, Nia Parry, the production team of Fflic, and Nefydd Thomas at Acen, and the advice of Dr Gwen Awbery. The fieldwork was funded through an AHRB Innovation Award.

Notes . Items 4, 35 and 47 contain non-standard features common in colloquial Welsh. In the analyses, instances of ‘(and) mix well’ (items 27, 30, 52, 55) were accepted interchangeably, but hybrid forms (a and no mutation) counted as errors.

267

268 Alison Wray 2. The programme was ﬁrst broadcast on Monday 30 September 2002, as Part 4 of Series 2. 3. That is, the tutorial material was not used in this analysis.

References

Erman, B. and Warren, B. 2000. The idiom principle and the open choice principle. Text 20: 29–62.
Peters, A. M. 1983. Units of Language Acquisition. Cambridge: CUP.
Wray, A. 1998. Protolanguage as a holistic system for social interaction. Language and Communication 18: 47–67.
Wray, A. 2002a. Formulaic Language and the Lexicon. Cambridge: CUP.
Wray, A. 2002b. Formulaic language in computer-supported communication: Theory meets reality. Language Awareness 11: 114–131.
Wray, A., Cox, S., Lincoln, M. and Tryggvason, J. 2004. A formulaic approach to translation at the post office: reading the signs. Language and Communication 24: 59–75.
Yorio, C. A. 1989. Idiomaticity as an indicator of second language proficiency. In Bilingualism across the Lifespan, K. Hyltenstam and L. K. Obler (eds), 55–72. Cambridge: CUP.

Facilitating the acquisition of formulaic sequences
An exploratory study in an EAP context

Martha Jones and Sandra Haywood
University of Nottingham

Introduction

There is a growing awareness that a significant proportion of the language that we produce is composed of formulaic sequences and that, as listeners and readers, we do not always decode and encode word by word, but make use of those sequences (e.g. Pawley and Syder, 1983; Nattinger and DeCarrico, 1992; Wray, 2002; Schmitt and Carter, this volume). Sinclair (1996: 82) declares that “units of meaning are expected to be largely phrasal”, proposing that when constructing meaning we operate under two principles: the open-choice principle and the idiom principle (Sinclair, 1991: 109–110). The former conforms to the traditional slot-and-filler view, which assumes free choice of individual lexical items, with the main restraint observed being that of grammaticalness. Contrary to this, the idiom principle asserts that in many instances ‘semi-preconstructed phrases’ are used. In this case, initial choice of meaning leads to the selection of a phrase rather than a series of discrete words. Sinclair hypothesizes that we use both principles, switching from one to the other as necessary.

If this is so for native speakers, it should follow that learners of English will find formulaic sequences very useful. Cowie (1992: 10) points out that “It is impossible to perform at a level acceptable to native users, in writing or in speech, without controlling an appropriate range of multiword units.” One might expect this to be especially true of learners in English for Academic Purposes (EAP) situations (Granger, 1998; Howarth, 1998). Given this, the obvious questions are how learners should acquire these sequences and what teaching materials would be useful for this purpose. This chapter explores these questions by reporting on an exploratory study of the teaching of formulaic sequences to a group of non-native EAP students. It begins by examining examples of existing EAP writing textbooks to assess to what extent they deal with formulaic sequences, reviewing some important trends in vocabulary teaching, and then discussing formulaic sequences specifically in an EAP context. Because this chapter takes a pedagogical rather than a theoretical perspective on phraseology, we will not discriminate between the numerous terms used for this phenomenon. Although terms such as collocation, lexical bundle, and phrase will be mentioned, the preferred term for multi-word strings of language throughout the chapter will be formulaic sequence (Wray, 2002).

Review of academic writing textbooks

On EAP courses, different textbooks focusing on academic writing are used. The content of these books consists of tasks based on how information is organised in specific text-types, e.g. Comparison and Contrast, Description of Process, etc. Some attention is devoted to language work, which may include aspects of phraseology, although this may not be made explicit in the tasks. Four well-known and widely-used academic writing textbooks were examined to see how much attention is devoted to phraseology: Skills in Action (Sellen, 1982), Academic Writing Course (Jordan, 1990), Writing (White and McGovern, 1994), and Writing Academic English (3rd ed.) (Oshima and Hogue, 1999). All books include a Reference Page or Structure and Vocabulary Aid at the end of each chapter with words, linking expressions, and academic phrases, some of which could be considered to be formulaic sequences. The table in Appendix 1 summarises the phrasal language portrayed in these coursebooks.

There are a number of reasons why such structure and vocabulary reference pages are not very useful if the aim is for the students to acquire formulaic sequences. Firstly, the sheer number of words and phrases is likely to confuse rather than guide the student, as there are few example sentences given and there is no indication of differences in meaning. The examples available are often decontextualised, which makes it very difficult for the student to learn how to use particular phrases. Secondly, the long lists given include single words as well as collocations, and the phraseological nature of the language may not be obvious to the student. Thirdly, there is no information regarding the frequency of the words or phrases in real language. Therefore, students may use expressions which are rare in academic prose, and this is a materials-writing problem as well as a learning problem. If students are investing time and effort in learning formulaic sequences used in academic writing, they should be learning those which are more frequently used. Fourthly, the books do not include exploratory tasks to help learners understand how phrases are used. Lastly, there is no apparent attempt to teach learning strategies for the acquisition of formulaic sequences.

If coursebooks fail to give due attention to the teaching of formulaic sequences in academic discourse, then it is up to the teacher to do so. The following section reviews specific approaches to vocabulary teaching in general as well as multi-word units such as collocations.

How to teach formulaic sequences

In spite of the increasing interest in and knowledge about phraseological development amongst L1 and L2 speakers, little progress has been made when it comes to applying the new insights to the EFL classroom. This being so, we drew on the wider field of vocabulary teaching in general for guidance on teaching methodology. Current trends in vocabulary teaching suggest the importance of several factors, including: the benefits of explicit vocabulary instruction, the advantages of encouraging a deep level of processing, and the necessity of ensuring students are aware of vocabulary learning strategies (Nation, 1990, 2001; Sökmen, 1997; Schmitt, 2000). Another feature apparent in many teachers’ approach to vocabulary teaching is the attention paid to collocation, and a smaller number of teachers have started to use concordance lines as a way of investigating vocabulary.

Explicit instruction, alongside incidental implicit learning, is considered to be the optimum approach for classroom-based courses (Hulstijn, 2001). If this is carried out in a way which encourages a deep level of processing, acquisition will be enhanced. Suggestions include: classifying lexical items, establishing vocabulary networks, working on synonyms and antonyms, and completing componential analysis grids (McCarthy, 1990).

Nation (2001) claims that three psychological processes are necessary for successful vocabulary learning: noticing, retrieving and generating. Noticing can occur when a word is highlighted as being salient in text input or in discussion of the text. Looking up a word in the dictionary, guessing from context, deliberately studying a word, or having a word explained are all possible factors leading to noticing. Two important conditions for words to be noticed are motivation and interest. For example, if the content of a text is perceived to be interesting or stimulating, the learners are likely to become more engaged in the learning activity. Once a word has been noticed and understood, the next stage in the process is retrieval, which can be receptive (i.e. perceiving the form of the word and retrieving its meaning when learners encounter the word in listening or reading) or productive (i.e. having the need to communicate the meaning of a word and retrieving the spoken or written form actively). The last stage in the process is creative/generative use, which takes place when a previously encountered word is met or used again in a slightly different way. Discussion can be useful in this stage: Stahl and Vancil (1986, cited in Nation, 2001) found that the discussion involved in building a semantic map was a key factor in vocabulary learning. Negotiation is also beneficial, as a wide range of grammatical contexts of a particular word or phrase can be generated. Likewise, the use of concordance texts could be extremely helpful since they allow multiple encounters with a lexical item in a variety of contexts.

The use of concordance lines as a way of studying lexis, as well as other linguistic features, is advantageous in that it requires a deep and thoughtful level of mental processing as students become involved in investigating for themselves the typical patterns of use of the target items. They do this via the use of authentic data, which in itself can be a motivating factor. A set of concordance lines provides the opportunity for generative use, revealing many aspects of a lexical item. Meeting the item in multiple contexts can, for example, illustrate and refine its meaning, can reveal the grammatical structures in which it is typically used, and can give information about collocations and semantic prosody (Stubbs, 1995).
Although little teaching material focusing on this type of study has been published (but see Tribble and Jones, 1990), perhaps because by its very nature it is used to respond to locally-perceived needs, there are several reports of interesting work carried out by individual teachers with their classes (see Stevens, 1995 for a review).

Another important aspect of the methodology of vocabulary teaching is the teaching of learning strategies (e.g. Schmitt, 1997). Teaching materials include, for example, suggestions on planning vocabulary study (Ellis and Sinclair, 1989) and different ways of organising a vocabulary notebook (McCarthy and O’Dell, 1994). It is of course impossible to teach all the vocabulary that a learner will need within the constraints of a timetabled course. Learners must therefore be prepared to continue vocabulary study outside class. This is especially so on a pre-sessional EAP course where the ultimate aim is to give the learners the confidence and knowledge that they need to, as it were, fly the nest and enter their departments ready to engage with their studies and continue improving their English language independently. As Conzett (2000: 87) argues, whatever approach is used to teach vocabulary, the teachers’ goal should be to “empower [our] students as language learners”.

Formulaic sequences in an EAP context

According to Wray (2002), formulaic sequences have a number of important functions: firstly, their use enables an individual to express identity with a group, for example a social or academic community; secondly, their use reduces the processing effort for the listener or reader; and thirdly, it allows the speaker or writer to express individual identity. The first two functions are very pertinent in an EAP context. Both undergraduates and postgraduates serve a kind of apprenticeship in their chosen discipline, gradually familiarising themselves not only with the knowledge and skills of their field, but also with the language of that field, so that they become capable of expressing their ideas in the form that is expected. As they do this, their use of formulaic sequences enables them, for example, to express technical ideas economically, to signal stages in their discourse and to display the necessary level of formality. The absence of such features may result in a student’s writing being judged as inadequate. Commenting on the work of a Jordanian student, one lecturer wrote: “the use of English . . . is a problem throughout the essay. By this I do not mean your English is poor or unintelligible but it is too colloquial and the phraseology is poor” (cited in Green, 2000: 141). On the other hand, familiarity with and control of the language of their field indicates their membership of the group, in this case, the community of their chosen academic discipline. In addition, when the writing style is conventional, it attracts little attention. This lightens the processing load for the reader and allows the writer’s message to be more easily perceived. The third function noted above comes into play once the basic apprenticeship has been completed and the student is ready to adapt or even reject the conventions to serve a particular purpose.
Wray’s model suggests that the overriding purpose of the use of formulaic sequences is “the promotion of the [user’s] interests” (2002: 95). The prime interest of learners of English on pre-sessional EAP courses is to graduate successfully from their chosen university course. Where improvement in phraseological competence is likely to contribute to an increase in a student’s grades, it seems very much in the students’ interest to seek this improvement. Thus a focus on formulaic sequences in academic writing in the EAP classroom seems fully justified as it can help the students reach their academic goals.

Methodology

Participants

A study was carried out with 21 learners from two intact classes at the Centre for English Language Education at the University of Nottingham. They were beginning the first three months of a six-month, intensive, pre-sessional EAP course. The study followed them over one term, i.e. ten teaching weeks. The participants were from a number of different countries, preparing to study in different fields, some at undergraduate level but most at postgraduate level. The minimum level of English at entry was IELTS Band 4.5. The treatment group of ten participants received training in formulaic sequences, while the control group of 11 participants did not. Although both groups followed the same syllabus, they may have been exposed to slightly different materials and teaching styles.

Selection of the target formulaic sequences

Our first task in designing the study was to decide which formulaic sequences to teach. This is far from straightforward since, as Read and Nation (this volume) point out, determining what a formulaic sequence is or what criteria are required for a phrase to be regarded as a formulaic sequence is problematic. We had several options: we could use our intuition as experienced EAP tutors; we could use the formulaic sequences previously identified for the purpose of teaching, for example in the textbooks previously reviewed (although it appears the sequences were selected in these books on the basis of the authors’ intuitions); or we could use sequences already identified for other purposes. We explored these options keeping in mind our prime concern, which was to produce materials which would be of use to each student in our mixed-discipline class. Our students particularly needed to work with formulaic sequences frequent in writing rather than speech; frequent in academic writing rather than fiction; and not specific to any one genre, but useful across disciplines.

With these pragmatic factors in mind we decided to base our selection on the work of Biber et al. (1999). In their book, Chapter 13, ‘Lexical expressions in speech and writing’, focuses on what they call lexical bundles, i.e. “bundles of words that show a statistical tendency to co-occur” (p. 989). Using a corpus of academic writing of over 5 million words, covering a wide range of disciplines, the authors selected the most frequent lexical bundles in academic prose. Although the bundles do not always represent complete structural units, and were selected solely on the basis of frequency, this source seemed to suit our purpose well in that it identifies sequences of words which occur commonly in academic writing in general. Biber et al. defined lexical bundles as recurring sequences of three or more words (p. 990), and found that three-word sequences were at least ten times more common than longer sequences. In fact, the 3-word bundles were too numerous to list in the book; only the most frequent are given. These are:

in order to
the number of
one of the
the presence of
part of the
the use of
the fact that
there is a
there is no

Using the listings of four, five, and six word bundles as source material, we selected a number of bundles of each length, keeping in mind the criteria of usefulness and relevance to the specific language functions we intended to teach. As part of the process, we extrapolated from the longer-sequence lists to make assumptions about the unlisted 3-word sequences, in order to compile additional 3-word bundles. Appendix 2 includes the full list of formulaic sequences used in the study, along with their Biber et al. grammatical classifications.
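The frequency-based notion of a lexical bundle described above can be illustrated with a minimal sketch. It simply counts recurring n-word sequences in a text and keeps those that reach a chosen count. It is not the procedure used by Biber et al.: the function names, the threshold `min_count` and the miniature sample text are all assumptions made for illustration, and a real study would work with a large corpus and normalised frequencies.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_bundles(text, n=3, min_count=2):
    """Return (bundle, count) pairs for n-word sequences seen at least min_count times."""
    tokens = tokenize(text)
    # Slide a window of n tokens over the text; sentence boundaries are ignored here.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    return [(bundle, c) for bundle, c in counts.most_common() if c >= min_count]


# Hypothetical miniature "corpus", invented for illustration only.
sample = (
    "In order to measure the number of errors, we counted them. "
    "The number of errors fell in order to meet the target. "
    "There is no doubt that the number of errors matters."
)

bundles = lexical_bundles(sample, n=3, min_count=2)
for bundle, count in bundles:
    print(f"{count:>3}  {bundle}")
```

On this toy text the recurring three-word sequences include both a structurally incomplete string (the number of) and a fuller one (in order to), which echoes the point that bundles selected purely by frequency need not be complete structural units.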

Training in formulaic sequences

The purpose of the study was to research a possible approach to the teaching of formulaic sequences which would raise awareness of the sequences, increase accurate and appropriate production of the sequences, and develop the students’ learning strategies. We decided to focus on the use of formulaic sequences in academic reading and writing. Written texts are the main form of assessment in most university departments, so proficiency in writing is particularly crucial to a student’s success.

The pre-sessional course covered several components including Reading & Summarising and Academic Writing. The plan was to use some of the reading classes to raise awareness of the importance of formulaic sequences in academic texts and explore the use of a selected few in depth. This would then feed into the writing classes where students would review the awareness raising work and be encouraged to try incorporating formulaic sequences into their own written production of discursive essays. Overall, approximately two hours per week were devoted to the study of formulaic sequences. Since there is as yet no proven methodology for the teaching of formulaic sequences, or as Granger (1998: 159) puts it: we do not know “what to teach, how much to teach, and least of all, how to teach”, we planned a cautious approach, introducing the work on formulaic sequences in a small yet systematic way, whilst continuing to use methods and materials familiar to the students in the majority of the classes.

The reading component

The reading classes were used to explain the phenomenon of formulaic sequences, raise awareness of their importance, study and practise usage, and to model learning strategies. Four texts of a general nature were selected (so that they would not be limited to a single academic discipline); they tended to be journalistic in style. They corresponded to certain text types, specifically problem–solution and cause–effect, since these were the kind of texts the students were expected to produce in this part of their course. The texts were then adapted to increase the density of formulaic sequences typically found in academic writing (see Appendix 3 for an example of an adapted text).

The selected texts were used in the following way. Firstly, normal meaning-focused reading class activities were carried out: for example, identifying main points, scanning for specific details. It was only once the students had become familiar with the text that the focus switched to look at the form in which the meaning was expressed, specifically the use of formulaic sequences. To raise students’ awareness of formulaic sequences, the text was re-presented to them, this time with selected sequences highlighted in bold italics in order to increase their salience and thus encourage noticing (see Bishop, this volume). This simple step drew students’ attention to the sequences and, since each text included numerous sequences, indicated their importance.

With the first adapted text, a clear explanation of our purpose was given: to help students move towards a more academic style in their writing, by studying the kinds of phrases often found in academic texts. We chose to use the concept of academic style since this is a familiar concept for pre-sessional students and all are aware of the need to develop this aspect of their writing. An additional concern, following Wray’s advice (2002: 191), was to sanction the holistic use of formulaic sequences. Most of the students’ previous training had encouraged them to compose sentences word by word. We were now asking them to notice and remember sequences of words, and indicating that it was acceptable, even desirable, to use sequences, as well as single words, as building blocks in the creation of their own texts.

Finally, our aim was to equip students with the strategies which would enable them to acquire the knowledge needed to use formulaic sequences accurately and appropriately in their own work. This meant, as with the learning of single word lexical items, getting to know more about the sequence than simply its meaning. Activities were designed to encourage students to spend some time studying the sequences, thus fostering a deeper processing than might otherwise occur. Some activities were based on the way the sequences were used in the text being studied, for example: classifying sequences into meaning-based groups, analysing and classifying the sequences according to their structure, finding in the text academic equivalents of less formally written sentences and comparing style.

Other activities used concordance lines and corpus extracts since this allowed the students to study a selected sequence in several different contexts and learn more about its typical usage. This type of activity, especially, was designed to also provide the opportunity to develop the learning skills of the students by modelling the process of engaging with a formulaic sequence in order to understand how to use the sequence appropriately. This meant analysing the patterns in which the sequence was typically used in a set of concordance lines or longer extracts of discourse. This involved considerations of grammar, noticing for example that the sequence the number of is almost always followed by a plural noun phrase. It involved considerations of meaning, noticing for example that the sequence the spread of is usually followed by a noun phrase with a negative connotation. When studying corpus extracts for the sequence to what extent, students were guided, through the use of questions prompting discussion, to an understanding that this sequence is followed by a clause and used in a context of uncertainty (see Appendix 4).
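As an illustration of what a set of concordance lines looks like, the following minimal sketch prints keyword-in-context lines for a target sequence and lists the word that follows each occurrence, which is the kind of evidence students used to notice that the number of is typically followed by a plural noun. The function names and the sample sentences are hypothetical; the classroom materials themselves drew on authentic corpus data.

```python
import re


def concordance(text, phrase, width=30):
    """Return one keyword-in-context line per occurrence of phrase in text."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so that the target phrase lines up.
        lines.append(f"{left:>{width}}[{match.group(0)}]{right}")
    return lines


def following_words(text, phrase):
    """Return the word that immediately follows each occurrence of phrase."""
    flat = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\s+([A-Za-z'-]+)", re.IGNORECASE)
    return [match.group(1).lower() for match in pattern.finditer(flat)]


# Invented example sentences, not taken from any real corpus.
sample = (
    "The number of students has risen sharply. "
    "We recorded the number of errors in each essay. "
    "In both groups the number of participants was small."
)

kwic = concordance(sample, "the number of")
for line in kwic:
    print(line)

next_words = following_words(sample, "the number of")
print(next_words)
```

Lining up the target sequence in a central column is what makes the recurring right-hand pattern easy to see at a glance.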

The writing component

The writing classes were used to review and expand work carried out in the reading classes and to support productive use. The students were asked to write four essays of the cause–effect and problem–solution types in the writing classes. At the beginning of the lessons and as part of the essay planning process, the students revised specific formulaic sequences which would be appropriate to use in a particular essay. In some cases, the students were shown lists of frequent formulaic sequences used in academic discourse, according to Biber et al. (1999). These were classified according to their lexico-grammatical patterns. It was important to remind students of the grammatical structure of these formulaic sequences so that they would be able to use them accurately in their essays.

The students were also asked to analyse the functions of formulaic sequences in context. They were shown short texts with specific formulaic sequences in bold type and their functions were analysed. Gap-fill exercises were used as well in order to elicit formulaic sequences which had been explored previously. Appendix 5 illustrates one of these texts and some exercises focusing on ‘contrast’ formulaic sequences.

Once the students had become more proficient in the analysis and use of formulaic sequences, they were asked to produce concordance texts in one of the lessons, using ‘Word Pilot’ (2003), a concordancing program. By this time, the students were already familiar with the analysis of specific formulaic sequences in concordance texts, as this type of approach had been used in the reading classes. Through guided tasks, the students were asked to investigate the frequency of the sequences, i.e. the number of occurrences, and to gather information on collocates. The students also noted down their observations for later discussion with other members of the group. Again, this exercise was considered to be fruitful as it generated a number of questions and discussion.

The assessment component

The exercises in the reading and writing components were designed to raise the students’ awareness of formulaic sequences and their ability to use them in their essays. We included several types of assessment in the research design to determine whether the students had improved in these areas.

Raising awareness

Tests were carried out at the beginning and end of the study to ascertain whether a change had occurred in the students’ awareness. A short academic text was selected for each test, based on a topic accessible to all the students. The two texts were adapted where necessary to facilitate understanding and to ensure a high density of the formulaic sequences typical of academic writing. For each test, the students were initially given a comprehension task, the answers to which were discussed in class. This was to ensure students had a good understanding of the ideas in the text; we were interested in their awareness of formulaic sequences and we did not want unknown vocabulary to cloud the issue. The second task was the actual test. Students were given five minutes to respond to the instruction:

Facilitating the acquisition of formulaic sequences

Imagine you are asked to give advice to Stage 1 students [students at a level below them] who want to improve their academic writing. Underline the words/phrases which would be useful for them to learn.

The ratio of phrases to single words highlighted was then examined. It was considered that this ratio would give an indication of each student’s awareness of the importance of paying attention to sequences of words. It was not intended to measure their knowledge of formulaic sequences but simply to indicate whether and to what extent they paid attention to phrases rather than single words only when studying a text. For this reason any underlined sequence of two or more words was counted. This included for example: adjective-noun combinations such as signiﬁcant costs; noun-noun combinations such as research project; phrasal verbs, such as to soak up; linking phrases such as as a result of; noun phrases such as a lack of.
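The counting rule above (any underlined item of two or more words is a sequence) can be sketched in a few lines. The multi-word items below are the chapter's own examples; the two single words, the function names and the choice to express the ratio as a share of all underlined items are assumptions made for illustration. The final lines use the counts reported for Student 10 in the results (nine words and one sequence at pretest, two words and twelve sequences at posttest).

```python
def split_underlined(items):
    """Separate underlined items into single words and multi-word sequences."""
    words = [item for item in items if len(item.split()) == 1]
    sequences = [item for item in items if len(item.split()) >= 2]
    return words, sequences

def sequence_share(n_words, n_sequences):
    """Proportion of underlined items that are sequences (None if nothing was underlined)."""
    total = n_words + n_sequences
    return n_sequences / total if total else None

# Multi-word items come from the chapter; "costs" and "embryo" are invented single words.
underlined = ["significant costs", "research project", "to soak up",
              "as a result of", "a lack of", "costs", "embryo"]
words, sequences = split_underlined(underlined)
print(len(words), len(sequences))

# Student 10: (words, sequences) at pretest and posttest, as reported in the results.
pre_share = sequence_share(9, 1)
post_share = sequence_share(2, 12)
print(round(pre_share, 2), round(post_share, 2))
```

Expressing the measure as a share rather than a raw ratio avoids division by zero when a student underlines no single words, which is a design choice of this sketch rather than something stated in the study.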

Producing formulaic sequences on a c-test

There was also a pretest and posttest which measured the students’ ability to produce the target formulaic sequences. The pretest text was an adapted version of an article found in New Scientist, 12 January 2002 entitled ‘Immune to Pregnancy’. The modified C-test portions of the text elicited the following formulaic sequences: the presence of, the levels of, this kind of, the development of, and the relationship between. The posttest text was an adaptation of an article on Qatar from National Geographic, March 2003 and its C-test elements focused on the following phrases: to what extent, as a result, the kind of, the number of, and the size of. The following extract from the ‘Immune to Pregnancy’ text illustrates the C-test format:

Beer suspected that too much of th___ ki___ o__ chemical might encourage the immune system to stop t__ deve________ o__ the embryo so he gave drugs that reduce levels of the chemical to 100 women with fertility problems.

The texts were similar in length and number of formulaic sequences, averaging 323 words, which was considered to be of a reasonable length for the students to cope with within an approximately 30-minute period in the Academic Writing class. Some words or phrases which were regarded as difficult were put in italics in the text and a definition provided in a glossary at the end of the texts to avoid vocabulary overload which could distract the students from the main task of producing the target formulaic sequences. The scoring scale below was used to assess accurate use of formulaic sequences in the two tests and in the students’ essays (see below).
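For readers unfamiliar with the format, the following sketch shows one way C-test style gaps for a target sequence could be generated. It assumes a simple rule (keep the first half of each word, rounded down, and print one underscore per deleted letter); the extract quoted above does not follow exactly this rule (development is cut to deve), so the rule, the function names and the shortened sentence are illustrative assumptions, not the authors' procedure.

```python
def c_test_blank(word):
    """Keep the first half of a word (rounded down, at least one letter) and blank the rest."""
    keep = max(1, len(word) // 2)
    return word[:keep] + "_" * (len(word) - keep)

def blank_phrase(text, phrase):
    """Replace the first occurrence of `phrase` in `text` with its gapped form."""
    gapped = " ".join(c_test_blank(word) for word in phrase.split())
    return text.replace(phrase, gapped, 1)

# Shortened from the chapter's extract, for illustration only.
sentence = "too much of this kind of chemical might stop the development of the embryo"
for target in ("this kind of", "the development of"):
    sentence = blank_phrase(sentence, target)
print(sentence)
```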

Measuring production of formulaic sequences: Key:

3 = Correct phrase
2 = Correct phrase but problems with morphology, e.g. the relation between instead of the relationship between
1 = Some idea of phraseology but could not get the correct phrase, e.g. the preparation of instead of the presence of
0 = No idea of phraseology

Producing formulaic sequences in essays

To further assess whether students’ production of formulaic sequences had developed over the study, two essays from the students in the treatment group were compared. The first essay was written on the topic of homelessness and the second on teaching disruptive children. Unfortunately, due to curriculum constraints the first essay was written in Week 7 and the second in Week 9, allowing only two weeks between the two assignments.

We asked a panel of five experienced EAP teachers to look at the two essays from each student. To prepare the panel, we gave them a brief description of the study and its aims. They were asked to familiarise themselves with extracts from Biber et al. (1999: Unit 13) and read through the adapted texts we had used. Care was taken to give the panel an understanding of our interpretation of the term formulaic sequence without giving them a list of the sequences used. The panel was then instructed to read the essays and identify any formulaic sequences used by highlighting them. They were not asked to judge the accuracy or appropriacy of the way the sequences were used, but simply to highlight any used, including incorrect or inappropriate uses. It was expected that using a panel of five would result in a large degree of conformity. The sets of essays were then collated and those sequences highlighted by at least four members of the panel were noted. Each essay was then given a score, using the same system as in the C-test measurement.

Classroom observation and student interviews

To get an insight into the learning process and the reactions of the students to the new materials we used daily classroom observation and interviews with three of the students. The daily observations were kept by each author and were discussed at weekly meetings. This included for example, comments on individual students’ reactions and progress. The interviews were carried out during Week 2 by the authors and lasted approximately 30 minutes each.
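The panel procedure used for the essays (keeping only those sequences highlighted by at least four of the five teachers) amounts to a simple agreement threshold. The sketch below shows that step under stated assumptions: the highlights are hypothetical, not data from the study, and the function name and data layout are invented for illustration.

```python
from collections import Counter

def agreed_sequences(highlights_by_rater, min_raters=4):
    """Return sequences highlighted by at least `min_raters` raters, sorted alphabetically."""
    counts = Counter()
    for highlights in highlights_by_rater:
        counts.update(set(highlights))  # each rater counts once per sequence
    return sorted(seq for seq, n in counts.items() if n >= min_raters)

# Hypothetical highlights from five raters for one essay (not data from the study).
panel = [
    {"as a result of", "the number of", "on the basis of"},
    {"as a result of", "the number of"},
    {"as a result of", "the number of", "in terms of"},
    {"as a result of", "on the basis of"},
    {"as a result of", "the number of", "on the basis of"},
]
print(agreed_sequences(panel))
```

Each agreed sequence would then be scored on the 0 to 3 scale and the scores summed per essay.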


Results

Raising awareness

Our hope was that over the ten weeks of the study, students’ awareness of the importance of paying attention to phrases would increase. The results, as shown in Table 1, indicate that an increase in awareness did indeed occur for the majority of students. Six students showed a very marked increase in the total number of formulaic sequences identified from pretest to posttest (Students 1, 2, 6, 7, 8, 10). This is encouraging, but may have been partially caused by it somehow being easier to find sequences in the second text. However, the ratio of words to sequences should control for this to some extent since both variables relate to the same test. Using this measure, six students highlighted more sequences than words in the posttest, even though they had highlighted more words than sequences in the pretest (1, 3, 4, 6, 8, 10). For example, Student 10 underlined nine single words but only one sequence in the pretest. In the posttest, however, her awareness of the importance of phraseology had increased: she underlined two single words and twelve sequences. Another two students (2, 5) increased the ratio of sequences to single words, although they still highlighted more words. It is interesting to note that Student 7 was the only subject to highlight more sequences than words on the pretest. At the posttest this had increased slightly to a ratio of 14 single words to 19 sequences. Her background differed from the other students as we discovered in the interviews (see below).

Table 1. Results of awareness pre- and posttests

           Pretest                Posttest
Student    Words     Sequences    Words     Sequences
1          18        4            1         19
2          26        9            24        21
3          7         6            1         8
4          6         5            1         9
5          24        3            20        8
6          7         1            9         12
7          7         8            14        19
8          14        3            10        12
9          absent    absent       16        1
10         9         1            2         12

Interestingly, the mean length of the formulaic sequences underlined also increased (pretest mean = 2.6 words per sequence, posttest mean = 3.7 words per sequence, discounting proper nouns such as the School of Chemical, Environmental and Mining Engineering), with fewer two-word collocations and a notably higher number of noun phrases with of in the posttest.

Producing formulaic sequences on a c-test

Pretests

Table 2 shows the results of the C-test pretest of the students in the treatment group. Overall, the participants showed considerable ability to complete the C-test items with a mean score of 1.7 on a scale of 3. With the exception of the presence of, there are few 0 scores and a preponderance of 3 scores. Clearly, on this small sample of five target phrases, the students had some knowledge of these sequences. The scores did vary considerably according to the sequence, however. Only two students were able to complete the phrase the presence of even partially on the C-test, while almost all students were able to produce the levels of. Student 1, who had had considerable exposure to academic phrases in her country, as she had studied her subject in English, was able to achieve the highest mean score in the group (2.2). It is interesting to note that although Students 2, 3, 4, and 6 were considered to have problems with accuracy in essay writing in general, they scored reasonably well on the sequence C-test.

Table 2. Results of C-test pretest (Treatment Group)

Student   the presence of   the levels of   this kind of   the development of   a relationship between   Mean
1         1                 3               2              3                    2                        2.2
2         0                 3               2              3                    2                        2
3         0                 3               2              3                    2                        2
4         0                 3               2              2                    2                        1.8
5         0                 3               0              0                    3                        1.2
6         0                 2               2              1                    3                        1.6
7         1                 3               2              0                    0                        1.2
8         0                 3               0              2                    1                        1.2
9         0                 3               2              2                    3                        2
Mean      0.2               2.8             1.5            1.7                  2                        1.7
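The per-student means in Table 2 are simply the average of the five item scores on the 0 to 3 scale. A minimal sketch of that arithmetic follows, with the scores transcribed from Table 2; the variable and function names are invented for illustration. Item means computed this way may differ in the last digit from the printed ones, depending on how the originals were rounded.

```python
ITEMS = ["the presence of", "the levels of", "this kind of",
         "the development of", "a relationship between"]

# Scores (0-3) per student, in ITEMS order, transcribed from Table 2.
SCORES = {
    1: [1, 3, 2, 3, 2],
    2: [0, 3, 2, 3, 2],
    3: [0, 3, 2, 3, 2],
    4: [0, 3, 2, 2, 2],
    5: [0, 3, 0, 0, 3],
    6: [0, 2, 2, 1, 3],
    7: [1, 3, 2, 0, 0],
    8: [0, 3, 0, 2, 1],
    9: [0, 3, 2, 2, 3],
}

def average(values):
    """Arithmetic mean rounded to one decimal place."""
    return round(sum(values) / len(values), 1)

# Mean score per student (row means).
student_means = {student: average(scores) for student, scores in SCORES.items()}

# Mean score per target sequence (column means).
item_means = {
    item: average([scores[i] for scores in SCORES.values()])
    for i, item in enumerate(ITEMS)
}

print(student_means)
print(item_means)
```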

Table 3. Results of C-test pretest (Control Group)

Student   the presence of   the levels of   this kind of   the development of   a relationship between   Mean
1         0                 0               0              3                    3                        1.2
2         0                 2               2              0                    2                        1.2
3         0                 2               0              3                    2                        1.4
4         0                 3               0              3                    3                        1.8
5         0                 1               1              2                    3                        1.4
6         0                 1               0              3                    3                        1.4
7         0                 3               2              0                    2                        1.4
8         0                 2               0              0                    2                        0.8
9         0                 3               0              3                    3                        1.8
10        0                 3               0              3                    2                        1.6
11        0                 2               3              3                    3                        2.2
Mean      0                 1.8             0.7            2.09                 2.5                      1.5

Table 3 shows the scores of the control group on the C-test pretest. The scores for the control group were lower than the treatment group, which is somewhat surprising in that the treatment group was considered to be weaker in terms of language proﬁciency. This would indicate that knowledge of this kind of sequence is not tightly linked with general language proﬁciency. However, given the small number of participants and items in this exploratory study, such a conclusion would be highly speculative. In any case, the purpose of the control pretest is simply to provide a baseline from which to compare the control posttest results.

Posttests

Two students from the treatment group were absent (2 and 3) when the posttest was administered at the end of the study and Student 10 had not taken the pretest, so these students cannot be compared longitudinally. This reflects the difficulty of carrying out in-depth longitudinal studies with small numbers of students.

Table 4 presents the C-test posttest scores for the treatment group. Of the seven students who took both the pre- and posttest, five increased their mean scores. This is reflected in an increase in the overall mean score from 1.7 on the pretest to 1.85 on the posttest. This is a small gain, and is impossible to substantiate with statistical tests of reliability due to the small number of participants, but it is nevertheless suggestive. This is particularly true given the relatively short time period of the treatment (only 8 weeks). Also, of the three weakest students getting a mean of 1.2 on the pretest, Students 5 and 7 made considerable improvement in their scores. However, Student 8 made minimal progress. Student 6 actually had a lower score on the posttest, but during the time the study was being conducted, she had not attended the Academic Writing class regularly and so only received part of the training.

Table 4. Results of C-test posttest (Treatment Group)

Student   to what extent   as a result   kind of   the number of   the size of   Mean
1         3                3             3         3               1             2.6
2         Abs              Abs           Abs       Abs             Abs           Abs
3         Abs              Abs           Abs       Abs             Abs           Abs
4         3                3             3         1               0             2
5         3                3             3         1               0             2
6         3                1             0         1               0             1
7         2                3             3         3               0             2.2
8         1                3             1         2               0             1.4
9         3                3             3         3               0             2.4
10a       1                3             0         2               0             1.2
Mean      2.4              2.7           2         2               0.1           1.85

a This student did not take the pretest

Table 5. Results of C-test posttest (Control Group)

Student   to what extent   as a result   kind of   the number of   the size of   Mean
1         2                1             2         1               0             1.2
2         3                1             0         1               0             1
3         Abs.             Abs.          Abs.      Abs.            Abs.          Abs.
4         Abs.             Abs.          Abs.      Abs.            Abs.          Abs.
5         Abs.             Abs.          Abs.      Abs.            Abs.          Abs.
6         2                3             0         1               0             1.2
7         Abs.             Abs.          Abs.      Abs.            Abs.          Abs.
8         0                3             0         0               0             0.6
9         0                1             0         0               0             0.2
10        Abs.             Abs.          Abs.      Abs.            Abs.          Abs.
11        2                1             0         1               0             0.8
Mean      1.5              1.6           0.3       0.6             0             0.8


It must be remembered, however, that the pretest and posttest used different formulaic sequences, and so the pretest and posttest scores are not directly comparable. The better results on the posttest may have resulted from the posttest containing sequences that were somehow ‘easier’. Thus, the most telling contrast is the treatment group vs. control group comparison where this is controlled for. We were disappointed with the number of absences in the control group on the posttest day, but the schedule allowed no time for a makeup session. Nevertheless, the results seem clear: no student in the control group improved their score in the posttest (Table 5) and all but one had lower scores. This indicates that the posttest is unlikely to have been ‘easier’, even though, anecdotally, its text was seen to be simpler than the text used for the pretest because it was not scientific. Taken together, these results suggest that the modest improvements in the treatment group reflect increased knowledge resulting from the training.
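The control group comparison described above can be reproduced from the mean scores in Tables 3 and 5. The sketch below pairs each student's pretest and posttest mean and skips absentees; the dictionary layout, the use of None for absences and the function name are assumptions made for illustration.

```python
# Mean C-test scores transcribed from Tables 3 and 5; None marks an absence.
control_pre = {1: 1.2, 2: 1.2, 3: 1.4, 4: 1.8, 5: 1.4, 6: 1.4,
               7: 1.4, 8: 0.8, 9: 1.8, 10: 1.6, 11: 2.2}
control_post = {1: 1.2, 2: 1.0, 3: None, 4: None, 5: None, 6: 1.2,
                7: None, 8: 0.6, 9: 0.2, 10: None, 11: 0.8}

def compare(pre, post):
    """Group students who took both tests by the direction of change in their mean score."""
    result = {"higher": [], "same": [], "lower": []}
    for student, before in pre.items():
        after = post.get(student)
        if before is None or after is None:
            continue  # only students with both scores can be compared
        if after > before:
            result["higher"].append(student)
        elif after < before:
            result["lower"].append(student)
        else:
            result["same"].append(student)
    return result

print(compare(control_pre, control_post))
```

With the control group data, no student falls in the higher group and only Student 1 is unchanged, which matches the description in the text.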

Producing formulaic sequences in essays

In Table 6, the number of formulaic sequences in each student’s essay is shown, as agreed upon by the rating panel, as well as the total score for each essay according to the 0–3 rating scale. As can be seen, the results were inconclusive. In most cases the scores suggested no improvement had been achieved. Only in the essays of Student 8 do we see noticeable progress in the number of phrases used and in the total score. This is interesting since this particular student was generally fairly weak at grammar. It could be that, because of this, she relied more heavily on reproducing phrases than a learner with a more analytic approach. We could thus speculate that the method of learning and storing formulaic sequences may be influenced by individual learning style, although a larger longitudinal study found no effect for motivation, aptitude, or attitude variables on the learning of formulaic sequences (Schmitt et al., this volume). The results are also inconclusive if we analyze the mean score per sequence. With this measure, three participants improved and three gained lower scores. Overall, there is a disappointing lack of apparent improvement in terms of the use of phraseology in the students’ essays.

Several factors may have contributed to this lack of evidence of progress. Most importantly, because of curriculum constraints the genre of discussion essays was only dealt with from Weeks 6 to 10, which meant that the gap between Essays 1 and 2 was a mere two weeks (allowing for preparation in week 6 and a test in week 10). In addition, the teaching input differed. For Essay 1, the essay on homelessness, there was significant teacher support: texts on the topic were given to the students and studied in class. For Essay 2, there was little textual support so the students had to rely more on their own ideas, and thus on their own range of lexis and grammar. As a result, in general, they used a lower number of sequences.

Table 6. Number and quality of phrases in student essays

          Essay 1                                                   Essay 2
Student   Number of phrases used   Total score   Mean score per phrase   Number of phrases used   Total score   Mean score per phrase
1         9                        24            2.67                    8                        20            2.50
2         6                        17            2.83                    7                        16            2.29
3         13                       39            3.00                    Abs.                     Abs.          –
4         14                       38            2.71                    7                        19            2.71
5         18                       50            2.78                    6                        18            3.00
6         15                       48            3.20                    Abs.                     Abs.          –
7         Abs.                     Abs.          –                       10                       29            2.90
8         5                        13            2.60                    9                        25            2.78
9         11                       29            2.64                    7                        20            2.86

Student interviews

Three students (1, 6 and 7) were selected to be interviewed after the pretest had been administered and at the end of the study. Unfortunately, there was no time to conduct the second interviews at the end of the study. However, the students were asked to give their views in writing instead. We were interested in finding out about their background, in terms of academic studies and English language training, and also about their vocabulary learning strategies.

Student 1, a PhD student in Toxicology, had the highest mean score in the group. Although she had had few English language classes since school, during her undergraduate studies she had attended lectures in English, had read books in English and taken examinations in English, thus had had considerable exposure to academic English in her subject area. The strategy she employed at this time was to look at the words and phrases she needed to learn in their context and to repeat them to herself. This had clearly been to some degree successful. As she said: “Before I used phrases but I didn’t know it was phrases.” During the study she began collecting formulaic sequences in a vocabulary workbook. Her awareness increased markedly and production also showed an increase.

Student 7, a Masters student in Journalism, was the only student in the group who had followed an English for Academic Purposes course previously. During this course she had been given a list of phrases useful for academic writing to learn. She had a higher awareness of the importance of phrases than the others at the pretest, perhaps because of this. In addition, she had spent some time working as an accountant in her home country, using English to communicate with overseas visitors occasionally. As a learner, she was quite self-aware; she had tried several different strategies for learning vocabulary but felt that none had been very successful. During the study, she focused on noting and learning “collocations” and “connecting words”. She commented that she found the large number of phrases that she met confusing since some had “almost same meaning”. She indicated that she would prefer a limited list of the most useful phrases. In spite of this, her awareness and production scores both increased.

Student 6, planning to follow a Masters course in Marketing, was one of the weaker students in the group. Her previous vocabulary learning strategy had been to record new words with a translation or explanation in her first language and she had paid little attention to phrases. She was aware that she needed to make a considerable improvement in her academic writing and during the study she started to note down phrases, and create her own sentences with them. She commented on the difficulty of phrases with similar meanings being used in different grammatical structures. Her awareness increased, but unfortunately, in the time allowed for the study, her production score decreased.

In spite of differences in their progress, as shown by our measurement tools, and a slight feeling of being overwhelmed by the range of meanings and structures in the phrases brought to their attention, all three students felt that the approach to formulaic sequences in the Reading and Academic Writing classes had helped them to improve their essay writing. Some of their comments are given below:

I think that it is very useful to use such phrases in academic writing. These phrases help to explain some points or ideas. (Student 1)

The phrases can help me to get some ideas. Also, the concordance can give me some ideas about how to use a linking word in a correct way. (Student 6)

It seems to me that the phrases you gave us in class are useful when I write an essay. (Student 7)

The above comments were encouraging in that the students had understood that paying attention to phrases as whole units could be helpful to them. They also found some aspects of the teaching methodology useful, for example, the use of concordance lines. In addition, when these students were shown their scores in the pretest and the posttest, with the exception of Student 6, they were pleased about their results and the fact that there had been some improvement.

Classroom observation

The students’ reactions to the new approach and materials were interesting. Initially they seemed rather uncertain about the value of the work. This was perhaps because we were asking them to pay attention not to the difficult new vocabulary in the text but to words which in many cases they had met before and thought they knew, words such as cause, development, way. Hill, Lewis, and Lewis (2000) also comment on the fact that their students found the phrasal nature of language strange initially. A discussion about academic style and what constitutes it, dealing especially with one of the most common structural patterns found in academic writing, the noun phrase with of-phrase fragment (Biber et al. 1999), led the students to a better appreciation of the usefulness of formulaic sequences. This discussion was reinforced by text-based activities where students were asked to find in the text equivalents of less formal expressions. For example, students were given: “Some people have suggested that you could select workers according to the information about this”. In the text they then found “It has been suggested that workers could be selected on the basis of this.”

Observations of the students’ vocabulary notes indicated that they were beginning to use the strategy of paying attention to and noting down the unit of the sequence rather than the single word when possible. The strategy of thinking about typical usage, introduced through the concordance line study, was also sometimes made evident through their questions to the teacher, for example “Do you usually say the reason of?”, and they seemed more willing to accept that a certain phrase, although grammatically possible, would not usually be chosen by a native speaker.

Discussion

Did we achieve our aims? One aim was to raise awareness of the importance of phrases in academic written texts. The results of our pre- and posttests indicate clearly that in this we were successful. However, we have no way of knowing whether our students will transfer this heightened level of awareness to contexts outside the classroom, how long the awareness will last or whether it will be helpful to the learners once they are in their departments. Another aim was to help students produce phrases. The results showed a slight improvement in the students’ production of phrases in controlled situations, that is the C-test task. However, improvement in the use of phrases in their essays was less noticeable. But while there was no indication of a definite improvement in the group performance, there were instances where individual students used phrases accurately and appropriately in their own unsupported writing, for example:

the rate of illiteracy among the people (Student 1)
the best way of spreading knowledge from other countries (Student 6)
the way in which to help the homeless (Student 8).

Of course, there were also inaccuracies, for example, missing articles:

oppressive regimes have often used system of national registration (Student 2).

Many of the sequences that we included in our materials had at their core a lexical word which had already been encountered by the students, perhaps many times, for example: way, cause, rate, size, system. Under these circumstances it is perhaps unsurprising that this single lexical word appeared more salient and that the other important grammatical elements of the sequence would be paid less attention. However, given the way in which we presented the sequences in the texts, highlighted in bold italics, it is likely that students did perceive the sequences as such (Bishop, this volume). It seems therefore, that they did not always memorise them as chunks, or did not remember the chunks with suﬃcient accuracy. As teachers, we became aware that some of the sequences were more diﬃcult than others both to understand and to use appropriately. The noun phrase with of-phrase fragment (e.g. the purpose of ) seemed to be the easiest category. Generally its function in the sentence is clear as it behaves in the same way as a simple noun. Other sequences, whilst perhaps no more diﬃcult in themselves, were clearly trickier to use appropriately. This suggests that, for teaching purposes, it would be advantageous to attempt to establish a cline of diﬃculty for formulaic sequences.

Limitations

This exploratory study followed a group of 10 students over one ten-week period within the setting of a full-time EAP course. Inevitably, such a study suffered from several limitations.

Firstly, we were working with students at an intermediate level of English who had already been studying English for a number of years. Their previous years of learning English had engendered certain habits, both in the way in which they perceived written texts (paying attention to unknown vocabulary, usually one-word lexical items) and the way in which they composed their own writing (using single words as building blocks). For our students, the unit of the word is the most salient. The fact that the study in total was only ten weeks long gave little time for evidence of progress to be seen. Curriculum constraints meant that within that period only one or two hours a week were spent focussing on the use of formulaic sequences. Also the tight teaching schedule meant that there was no time to give students absent from the tests a second chance to take them.

Conclusion

This chapter has described the different stages of an exploratory, in-depth study among students on an EAP course to promote and assess their progress in the recognition and production of frequent formulaic sequences used in academic discourse. A combination of quantitative and qualitative research methodology was used to evaluate different aspects of the students’ passive and active knowledge of specific formulaic sequences. Despite time and curriculum constraints, it seems that by the end of the study, through repeated exposure and discussion, i.e. Noticing and Retrieval in Nation’s (2001) terms, most students had shown greater awareness of formulaic sequences used as whole units, and a few students were able to use certain formulaic sequences accurately and appropriately in their essays. However, there was not enough time to assess full generative use of formulaic sequences. Future research should therefore concentrate on the investigation of different approaches to the teaching of formulaic sequences for longer periods of time to determine how and to what extent we can help our students master the important element of phraseology in academic contexts.

References

Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Pearson Education.
Conzett, J. 2000. Integrating collocation into a reading and writing course. In Teaching Collocation, M. Lewis (ed.), 70–87. Hove: Language Teaching Publications.
Cowie, A. P. 1992. Multiword lexical units and communicative language teaching. In Vocabulary and Applied Linguistics, P. Arnaud and H. Bejoint (eds), 1–12. Basingstoke: Macmillan.
Ellis, G. and Sinclair, B. 1989. Learning to Learn English. Cambridge: CUP.
Granger, S. 1998. Prefabricated patterns in advanced EFL writing: Collocations and formulae. In Phraseology: Theory, Analysis and Applications, A. P. Cowie (ed.), 145–160. Oxford: OUP.
Green, R. 2000. Life After the Pre-Sessional Course. In Assessing English for Academic Purposes, G. M. Blue, J. Milton and J. Saville (eds), 131–145. Bern: Lang.
Hill, J., Lewis, M. and Lewis, M. 2000. Classroom strategies, activities and exercises. In Teaching Collocation, M. Lewis (ed.), 88–117. Hove: Language Teaching Publications.
Howarth, P. 1998. The phraseology of learners’ academic writing. In Phraseology: Theory, Analysis and Applications, A. P. Cowie (ed.), 161–186. Oxford: OUP.
Hulstijn, J. H. 2001. Intentional and incidental second language vocabulary learning: A reappraisal of elaboration, rehearsal, and automaticity. In Cognition and Second Language Instruction, P. Robinson (ed.), 258–286. Cambridge: CUP.
Jordan, R. R. 1990 (2nd ed). Academic Writing Course. London: Collins ELT.
McCarthy, M. 1990. Vocabulary. Oxford: OUP.
McCarthy, M. and O’Dell, F. 1994. English Vocabulary in Use. Cambridge: CUP.
Nation, I. S. P. 1990. Teaching and Learning Vocabulary. New York: Heinle and Heinle.
Nation, I. S. P. 2001. Learning Vocabulary in Another Language. Cambridge: CUP.
Nattinger, J. R. and DeCarrico, J. S. 1992. Lexical Phrases and Language Teaching. Oxford: OUP.
O’Connell, S. 2002. Focus on IELTS. Harlow: Pearson Education.
Oshima, A. and Hogue, A. 1999 (3rd ed). Writing Academic English. London: Longman.
Pawley, A. and Syder, F. H. 1983. Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In Language and Communication, J. C. Richards and R. W. Schmidt (eds), 191–225. London: Longman.
Schmitt, N. 2000. Vocabulary in Language Teaching. Cambridge: CUP.
Schmitt, N. 1997. Vocabulary Learning Strategies. In Vocabulary: Description, Acquisition and Pedagogy, N. Schmitt and M. McCarthy (eds), 199–227. Cambridge: CUP.
Sellen, D. 1982. Skills in Action. Cheltenham: Hulton Educational.
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: OUP.
Sinclair, J. 1996. The search for units of meaning. Textus IX: 75–106.
Sökmen, A. 1997. Current trends in teaching second language vocabulary. In Vocabulary: Description, Acquisition, and Pedagogy, N. Schmitt and M. McCarthy (eds), 237–257. Cambridge: CUP.
Stahl, S. A. and Vancil, S. J. 1986. Discussion is what makes semantic maps work in vocabulary instruction. The Reading Teacher 40: 62–67. Cited in Nation, I. S. P. 2001. Learning Vocabulary in Another Language. Cambridge: CUP.
Stevens, V. 1995. Concordancing with language learners: Why? When? What? CAELL Journal 6: 2–10.
Stubbs, M. 1995. Corpus evidence for norms of lexical collocation. In Principle and Practice in Applied Linguistics, G. Cook and B. Seidlhofer (eds), 245–256. Oxford: OUP.
Tribble, C. and Jones, G. 1990. Concordances in the Classroom. Harlow: Longman.
White, R. and McGovern, D. 1994. Writing. Mahwah NJ: Prentice Hall.
Wilkinson, R. 1966. Sleep and dreams. In New Horizons in Psychology, B. Foss (ed.). Harmondsworth: Penguin. Extracted in Swan, M. 1975. Inside Meaning: Proficiency Reading Comprehension. Cambridge: CUP.
Word Pilot. Internet resource: . Accessed June 30, 2003.
Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: CUP.

Appendix 1
Examples of formulaic sequences in four academic writing coursebooks

Each coursebook is followed by its entries in the form "Language area: Examples".

Skills in Action
  Suggesting instructions: Prevent X from happening; Under no circumstances
  Connectives (Sequence & Instructions): First of all; In order to
  Describing location: At the front; In the north of
  Connectives (Cause and Effect): As a consequence of which
  Expressing comparison: To be the same as
  Expressing contrast: As distinct from
  To introduce a new aspect: As far as X is concerned
  Ways of referring to the criteria being used: On the basis of
  Connectives to express purpose: In order to / that

Academic Writing Course
  Composition of a country: X is composed of
  Commonly used verb forms: The following are examples of X
  Qualification and Comparison: X is considerably smaller than; X is totally different from
  Connectives and Markers (Cause-Effect): The effect of X is; An increase in X often leads to
  Qualified generalisations: It is fairly likely that; It is almost certain; In the majority of cases; In some cases
  Impersonal verb phrases used in conclusions: It appears that; Some of the evidence shows that; It has been suggested that
  Interpretation of data: As can be seen from the chart; According to Figure
  Introduction, Development & Conclusion: To sum up; On the whole

Writing
  Cohesive markers ‘And’ type: Apart from this; What is more
  Cohesive markers ‘Or’ type: In other words
  Cause and Effect: Because of this; As a result
  Transition: As far as X is concerned
  Comparison within sentences: X and Y are quite similar in terms of
  Ways of referring to data in tables and diagrams: As can be seen in Table X; According to Table X

Writing Academic English
  To introduce examples: An example of
  To indicate order of importance: First and foremost
  To introduce cause or reason: The consequence of; As a result of
  To conclude: All in all; The evidence suggests that . . .

Appendix 2
Formulaic sequences encountered by students in the study
(Categorizations following Biber et al. 1999)

Noun phrase with of-phrase fragment
the development of
the report of
the effect(s) of
the kind(s) of
the number(s) of
the study of
the work of
the existence of
the presence of
the absence of
the nature of
the size of
the purpose of
the levels of
the parts of
one of the main (noun)
one of the most (noun)
the value of the
the use of
the importance of
the origin of
the point of view
the needs of
the area of
the group(s) of
the spread of
the symbol of
[the] species of
the cycle of
the hours of
a question of
a study of
a high incidence of
the changes of
the temperature of
the rate of
the adaptation of
the period of
[this] form of
the result of
the accuracy of
the aim of this study was

Prepositional phrase with embedded of-phrase
as a result of
as a consequence of
in the case of
in terms of
on the basis of

Other prepositional phrase (fragment)
in contrast to
in order to
on the one hand
on the other hand
to what extent

Noun phrase with other post-modifier fragments
the relationship between
the reason for
there were no significant differences
an increase in
the way in which
the extent to which
the fact that
due to the fact that
studies have shown that

(Verb phrase +) that-clause fragment
has been suggested that . . .
has been shown that . . .
can be seen that . . .

Anticipatory it + verb phrase/adjective phrase
it is (not) possible to . . .
it is possible that . . .
it is likely / unlikely that . . .
it is clear that
it is necessary to
it may be necessary to

(Verb/adjective +) to-clause fragment
is/are likely to be
may be able to
should be able to
will be able to

Appendix 3
An adapted text from the reading component

SLEEP

Section 1

We all know that the normal human daily cycle of activity is of some 7–8 hours’ sleep alternating with some 16–17 hours’ wakefulness and that, broadly speaking, the sleep normally coincides with the hours of darkness. Our present concern is with how easily and to what extent this cycle can be modified.

The question is not only an academic one. The ease, for example, with which people can change from working in the day to working at night is a question of growing importance in industry where automation calls insistently for round-the-clock working of machines. It normally takes from five days to one week for a person to adapt to a reversed routine of sleep and wakefulness, sleeping during the day and working at night. Unfortunately, in industry shifts are often changed every week; a person may work from 12 midnight to 8a.m. one week, 8a.m. to 4p.m. the next, and 4p.m. to 12 midnight the third and so on. This means that no sooner has he got used to one routine than he has to change to another and, as a result of these changes, much of his time is spent neither working nor sleeping very efficiently.

Section 2

One answer would seem to be longer periods on each shift, a month or even three months. However, it has been shown that people on such systems will revert to their normal habits of sleep and wakefulness during the weekend and that this is quite enough to destroy any adaptation to night work built up during the week (Bonjer 1960). The only real solution appears to be to hand over the night shift to a group of permanent night workers whose nocturnal wakefulness may persist through all weekends and holidays. An interesting study of the domestic life and health of night-shift workers was carried out by Brown in 1957. She found a high incidence of disturbed sleep, digestive disorders and domestic disruption among those on alternating day and night shifts, but in the case of permanent night workers the presence of such symptoms was found to be normal.

This latter system then is likely to be the best long-term policy, but meanwhile something may be done to relieve the demands of alternate day and night work by selecting those people who can adapt most quickly to the changes of routine. One way of knowing when a person has adapted is by measuring his performance, but this can be laborious. Fortunately, we have a physical measure which correlates reasonably well with the behavioural one, in terms of performance at various times of the day or night, and which is easier to take. The temperature of the body, which can be determined by the use of an ordinary clinical thermometer, alters throughout 24 hours. People engaged in normal daytime work will have a high temperature during the hours of wakefulness and a low one at night; when they change to night work the pattern will only gradually reverse to match the new routine and the rate of change of the body temperature parallels, broadly speaking, the adaptation of the body as a whole, particularly in terms of performance and general alertness. Therefore by taking body temperature at intervals of two hours throughout the period of wakefulness researchers can tell how quickly a person can adapt to a reversed routine. It has been suggested that workers could be selected on the basis of this. So far, however, this form of selection does not seem to have been applied in practice.

From Sleep and Dreams by Robert Wilkinson (1966)
Adapted by Sandra Haywood

Appendix 4
Vocabulary exercises for to what extent

VOCABULARY STUDY: to what extent

Task 1
Study the extracts from academic papers below. Link each extract to a subject area: biology, civil engineering, economics, education, law, linguistics, management, politics

1. We do not yet know to what extent chimps use their potential in the wild.
2. It is not, for instance, possible to say to what extent the differences in construction relate to their function.
3. It is less clear, however, to what extent and in what ways this broad division has actually manifested itself in the classroom and in internal school debates and policies, and in local authority policies.
4. Where an unconstitutional change of regime takes place in a recognised state, governments of other states must necessarily consider what dealings, if any, they should have with the new regime, and whether and to what extent it qualifies to be treated as the government of the state concerned.
5. The object of this project is to test under what circumstances and to what extent people in post-Communist societies are developing values and patterns of behaviour consistent with market economies and social welfare as these terms are understood in Western Europe.
6. There is some disagreement whether and to what extent pressure groups should be allowed to use the courts to achieve their desired ends.
7. The question, then, is to what extent can these efficiencies be improved and to what extent can the wastage be reduced?
8. To what extent can children be said to apply ‘rules’ in word-formation?

Extracts from the BNC available at http://thetis.bl.uk/lookup.html

Task 2
1. Find and underline the phrase to what extent in each extract. What follows to what extent?
2. The word order in Extracts 7 and 8 is different. How? Why?
3. You use this phrase when you think something may be true but you are uncertain how true. Can you find evidence for this idea of uncertainty in each extract?

Task 3
Imagine that you are conducting a survey or an experiment to collect data about something that you think may be true but you are uncertain how true. Describe the aim of your survey or experiment. For example:
The aim of this survey was to discover to what extent students are unhappy with their accommodation.

Appendix 5
Gap-fill and analysis exercises in the writing component

Academic writing (Stage Two)
Presenting supporting points

Read the passage below and fill the blank spaces with one of the expressions in the box.

Stress is __________ one of the most serious modern diseases. __________ the Institute of Management, approximately 270,000 UK workers take time off work every year because of work-related stress, at a cost to the nation in sick pay, lost production and medical bills of about £7 billion. __________ stress is less of a problem for bosses than for their subordinates, and __________ the survey, __________ only 9 per cent of junior managers looked forward to going to work. __________, only 7 per cent felt they were in control of their jobs.

which found that
arguably
this view is confirmed by
According to a survey carried out by
Furthermore
Experts have often suggested

From O’Connell (2002), Focus on IELTS.

Academic writing (Stage Two)
Contrast/Concession

Read the examples below and analyse how the expressions of Contrast/Concession are used in context.

• Administration officials, notably the White House Chief of Staff and Deputy Treasury Secretary, were irked (irritated) by his independence. On the other hand, Taylor reportedly is well-regarded by Treasury officials for his low-key out-of-the-limelight style. (NEWS)
• Many statutory water companies are already saddled with (put in a position where they have to deal with) high borrowings. In contrast, the water authorities are going into the private sector flush with cash. (NEWS)
• Potassium ions might be more readily translocated from zones of high concentration within the root system although there is no evidence for this. Alternatively, there might be a threshold concentration of all nutrients. (ACADEMIC)
• The elements of design and their interconnection into the process network are relatively easy to recognize and generalize, and so produce a common basis for all design activities. It is however the subtler aspects of weight, control and role which ‘colour’ the process. (ACADEMIC)
• These experiments do not support the notion that poor readers are unlikely to use context when reading and go some way to suggesting that it is the poor readers who rely on context to aid their weak word-recognition skills. The good readers, in contrast, seem to recognise words so quickly that the beneficial (or harmful) effects of context do not have time to take effect. (ACADEMIC)
• For well-watered crops of pigeonpea dry matter production and the amount of radiation intercepted were linearly related. In contrast, dry matter production by monocropped and intercropped groundnuts was not linearly related to the amount of intercepted radiation. (ACADEMIC)

From Biber et al. (1999), Longman Grammar of Spoken and Written English.

Index

A
absence of analysis 262
Academic Word List 65
academic writing 270
academic writing textbooks 270
acculturation 88
accuracy 249, 258–260, 263
accuracy of morphological forms 264
acquisition of formulaic sequences 4, 107
amount of formulaic sequences in language 1
anxiety 95, 99, 103
aptitude 95, 101
attention 193, 229
attitude 59
attitudes toward L2 learning 103
attrition of formulaic sequences 68, 259
auctioneer 40
automaticity 208
automatisation 262
avoidance strategy 205
awareness raising of formulaic sequences 281

B
Bilingual Model of Lexical Access 192
boundaries between formulaic sequences 260
British National Corpus (BNC) 7, 25, 28, 31, 56, 129, 139, 156

C
CANCODE 28, 56, 115, 129, 139, 156, 196
Centre for English Language Education (CELE) 56, 61, 274
child learners 251
classroom observation 288
clitics 266
closed class items 265
cloze 58
CoBuildDirect Corpus Sampler 233
collocation 31
collocational associations 250
collocational pairs 266
collocational prosody 20
composition in performance 38
comprehension 239
concordance lines 272
concordancers 7, 31
concordances 197
concurrent noticing 230
conditions of use of formulaic sequences 9
Contact Hypothesis 91
content words 184
Continental research into formulaic sequences 11, 19
core idioms 25
corpora 28, 30, 110, 127
creative / generative use of lexemes 272
criteria of formulaic sequences 2
C-test 58, 279, 282
cultural adaptation 88
culture shock 88

D
definition of formulaic sequences 2–3, 24, 192
dictation 130, 149–151
directive cues 230
discourse markers 10
dual performance task 132

E
EAP 56, 59, 269, 273
enclosure 90
error types 259
errors in formulaic sequences 261
exposure 107
eye-mind assumption 154
eye-movement 154, 173
E-Z Reader model of eye guidance 162–163

F
fixation 154, 159
fixedness of formulaic sequences 32, 265
flashcards 253–254, 263
flexible formulaic sequences 6
fluency 37, 143, 249, 255, 260
formalisation of formulae 49
formula 4
formulae 38
formulaic performance 37
formulaic speech traditions 37
frequency of occurrence 2, 24, 182
friendship networks 91
functions 3, 9, 129, 207
fusion 264

G
gap-fill exercises 278, 300–301
glosses 231, 239

H
hesitations in speech 143
highlighting of linguistic features 229
Homer 37
Homogeneity Hypothesis 192
humour 43
hypercorrection 264

I
identifying formulaic sequences 250
idiom principle 1, 55, 269
idioms 2, 250
IELTS 61, 158, 274
integrativeness 102
interethnic contact 90
interlanguage 251
interviews 94, 114, 286
intuition 29
ITEMAN 60

L
L1 acquisition of formulaic sequences 11
L2 acquisition of formulaic sequences 11
language aptitude 59
learning burden 6
lemma 24, 228
length of formulaic sequence 181
lexeme 24
lexical access 164
lexical bundles 274
lexical interference 265
lexical phrase 4, 55
lexical representation 164
lexical representation of formulaic sequences 192
lexically light formulaic sequences 209
lexico-grammar 55

M
memorising formulaic sequences 262
methodology in researching formulaic sequences 48
MICASE 56, 129
motivation 59, 89, 97, 99
multiple-choice test of formulaic sequence knowledge 198, 212, 222–224
multi-word items 2
mutation 254, 263

N
native speakers 113
naturalistic learners 251
naturally occurring speech 40
noncompositionality of formulaic sequences 32
noticing 228, 271

O
open class items 265
open-choice principle 269

P
paradigmatic variant 265
participant observer 40
pattern-based models of acquisition 13
pauses in formulaic sequences 260–261
phatic phrases 10
phonological analysis 32
phonological profile of formulaic sequences 194
phrasal lexeme 4
phrasal lexical item 4, 51
phrasalect 138
politeness 37
practise 264
pragmatic implicature 250
pragmatic transparency of formulaic sequences 33
pragmatic/functional analysis 33
pragmatics 249
prefabricated linguistic material 249
processing problems for formulaic sequences 186
process-oriented approach 90
proficiency in English 97
pronunciation 264
prosodic modes 39
psycholinguistics 127

Q
quantification of formulae 49
questionnaire 200

R
Range software 233
recall 258
receptive vs. productive comparisons 68
recognition times of formulaic sequences 179
recurrent clusters 128
reflective self-assessment 200

regression 154
rehearsal 254, 258
reliability 34, 61
repetition 264
research questions into formulaic sequences 19
restricted collocation 51
retrieval of lexemes 272
ritual events 39
routine 40

S
saccade 154
saccadic programming 164
saliency 239
segmentation 265–266
selection of formulaic sequences 56, 156, 197, 232, 274
self-identity 89
self-paced reading 174, 180, 187
semantic prosody 7
semantic transparency of formulaic sequences 33
sentence context 155
sequences in language 108
social identity 43
social interaction 10, 107
social networks 90
social scripts 267
sociocultural adaptation 108
sociocultural integration 88, 100
solidarity 46
speech act 45
sports commentator 40
strategies 205, 272, 277
style shifting 45
synonyms 265

T
teaching formulaic sequences 271, 276
technical formulaic sequences 10
technical vocabulary 10
terminal words 157, 161
terminology 3
tests of formulaic sequences 57, 72–74, 86
think aloud protocols 194–195
TOEFL 61, 158
tradition 38
translation 205, 265
transparency of formulaic sequences 6
triangulation 33
typographic salience 229

V
validity 34
variation within formulaic sequences 25
Vocabulary Levels Test 59, 61, 79–82

W
Welsh 252, 263
word frequency 154
word position in formulaic sequence 183
word recognition 164
Wordsmith 31
working memory 40, 133
writing classes 277

In the series LANGUAGE LEARNING & LANGUAGE TEACHING (LL&LT) the following titles have been published thus far, or are scheduled for publication:

1. CHUN, Dorothy M.: Discourse Intonation in L2. From theory and research to practice. 2002.
2. ROBINSON, Peter (ed.): Individual Differences and Instructed Language Learning. 2002.
3. PORTE, Graeme Keith: Appraising Research in Second Language Learning. A practical approach to critical analysis of quantitative research. 2002.
4. TRAPPES-LOMAX, Hugh and Gibson FERGUSON: Language in Language Teacher Education. 2002.
5. GASS, Susan, Kathleen BARDOVI-HARLIG, Sally Sieloff MAGNAN and Joel WALZ (eds.): Pedagogical Norms for Second and Foreign Language Learning and Teaching. 2002.
6. GRANGER, Sylviane, Joseph HUNG and Stephanie PETCH-TYSON (eds.): Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. 2002.
7. CHAPELLE, Carol A.: English Language Learning and Technology. Lectures on applied linguistics in the age of information and communication technology. 2003.
8. JORDAN, Geoff: Theory Construction in Second Language Acquisition. 2004.
9. SCHMITT, Norbert (ed.): Formulaic Sequences. Acquisition, processing and use. 2004.
