OUTSTANDING DISSERTATIONS IN LINGUISTICS
Edited by Laurence Horn, Yale University
A ROUTLEDGE SERIES

OTHER BOOKS IN THIS SERIES:
MINIMAL INDIRECT REFERENCE: A Theory of the Syntax-Phonology Interface (Amanda Seidl)
AN EFFORT BASED APPROACH TO CONSONANT LENITION (Robert Kirchner)
PHONETIC AND PHONOLOGICAL ASPECTS OF GEMINATE TIMING (William H. Ham)
GRAMMATICAL FEATURES AND THE ACQUISITION OF REFERENCE: A Comparative Study of Dutch and Spanish (Sergio Baauw)
AUDITORY REPRESENTATIONS IN PHONOLOGY (Edward S. Flemming)
THE SYNCHRONIC AND DIACHRONIC PHONOLOGY OF EJECTIVES (Paul D. Fallon)
THE TYPOLOGY OF PARTS OF SPEECH SYSTEMS: The Markedness of Adjectives (David Beck)
THE EFFECTS OF PROSODY ON ARTICULATION IN ENGLISH (Taehong Cho)
PARALLELISM AND PROSODY IN THE PROCESSING OF ELLIPSIS SENTENCES (Katy Carlson)
PRODUCTION, PERCEPTION, AND EMERGENT PHONOTACTIC PATTERNS: A Case of Contrastive Palatalization (Alexei Kochetov)
RADDOPPIAMENTO SINTATTICO IN ITALIAN: A Synchronic and Diachronic Cross-Dialectical Study (Doris Borrelli)
PRESUPPOSITION AND DISCOURSE FUNCTIONS OF THE JAPANESE PARTICLE MO (Sachiko Shudo)
THE SYNTAX OF POSSESSION IN JAPANESE (Takae Tsujioka)
COMPENSATORY LENGTHENING: Phonetics, Phonology, Diachrony (Darya Kavitskaya)
THE EFFECTS OF DURATION AND SONORITY ON CONTOUR TONE DISTRIBUTION: A Typological Survey and Formal Analysis (Jie Zhang)
EXISTENTIAL FAITHFULNESS: A Study of Reduplicative TETU, Feature Movement, and Dissimilation (Caro Struijke)
PRONOUNS AND WORD ORDER IN OLD ENGLISH: With Particular Reference to the Indefinite Pronoun Man (Linda van Bergen)
ELLIPSIS AND WA-MARKING IN JAPANESE CONVERSATION (John Fry)
WORKING MEMORY IN SENTENCE COMPREHENSION: Processing Hindi Center Embeddings (Shravan Vasishth)
INPUT-BASED PHONOLOGICAL ACQUISITION (Tania S. Zamuner)
VIETNAMESE TONE: A New Analysis (Andrea Hoa Pham)
ORIGINS OF PREDICATES: Evidence from Plains Cree (Tomio Hirose)
CAUSES AND CONSEQUENCES OF WORD STRUCTURE by
Jennifer Hay
Routledge New York & London
Published in 2003 by Routledge, 29 West 35th Street, New York, NY 10001
Published in Great Britain by Routledge, 11 New Fetter Lane, London EC4P 4EE

Copyright © 2003 by Taylor & Francis Books, Inc. Routledge is an imprint of the Taylor & Francis Group.

This edition published in the Taylor & Francis e-Library, 2006. To purchase your own copy of this or any of Taylor & Francis or Routledge’s collection of thousands of eBooks please go to http://www.ebookstore.tandf.co.uk/.

All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publisher.

10 9 8 7 6 5 4 3 2

Library of Congress Cataloging-in-Publication Data for this book is available from the Library of Congress.

ISBN 0-203-49513-6 (Master e-book ISBN)
ISBN 0-203-57912-7 (Adobe eReader Format)
ISBN 0-415-96788-0 (Print Edition)
Contents

List of Figures viii
List of Tables xi
Preface xiv
Acknowledgments xvi
1. Introduction 1
2. Phonotactics and Morphology in Speech Perception 17
3. Phonotactics and the Lexicon 32
4. Relative Frequency and Morphological Decomposition 60
5. Relative Frequency and the Lexicon 84
6. Relative Frequency and Phonetic Implementation 108
7. Morphological Productivity 122
8. Affix Ordering 134
9. Conclusion 162
Appendix A. Segmentation and Statistics 168
References 192
Index 201
List of Figures

1.1 Schematized dual route model 5
1.2 Schematized dual route model, indicating resting activation levels 8
1.3 Schematized dual route model, with fast phonological processor 10
2.1 Architecture of a simple recurrent network designed to identify word onsets 20
2.2 Average well-formedness ratings as predicted by the log probability of the best morphological parse 23
3.1 Phonotactics and prefixedness 40
3.2 Phonotactics and semantic transparency 42
3.3 Phonotactics and semantic transparency, with line fit only through relatively transparent words 44
3.4 Phonotactics and polysemy 49
3.5 Phonotactics and relative frequency 54
4.1 Schematized dual route model 63
4.2 Waveform and pitchtrack for “But today they seemed imperfect.” 79
4.3 Waveform and pitchtrack for “But today they seemed impatient.” 80
4.4 Pitch accent placement and relative frequency 81
5.1 Suffixed forms: derived and base frequency 86
5.2 Prefixed forms: derived and base frequency 87
5.3 Prefixed forms: relative frequency and polysemy 92
5.4 Prefixed forms: frequency and polysemy 94
5.5 Prefixed forms: frequency, relative frequency, and polysemy 97
5.6 Suffixed forms: frequency and polysemy 102
6.1 Waveform and spectrogram for “daftly” 112
6.2 Waveform and spectrogram for “softly” 113
6.3 Waveform and spectrogram for “swiftly” 114
6.4 Waveform and spectrogram for “briefly” 115
6.5 Boxplots of average /t/-ness ratings 119
7.1 Type frequency and productivity 128
7.2 Relative frequency and phonotactics 129
7.3 Relative frequency and productivity 130
7.4 Phonotactics and productivity 131
7.5 Productivity, relative frequency, and phonotactics 133
A.1 Probability of nasal-obstruent clusters morpheme internally, vs. across a morpheme boundary 170
A.2 Reanalysis of data from Hay et al. (2000) 172
A.3 Intra- and inter-word co-occurrence probabilities for transitions occurring in 515 prefixed words 174
A.4 Log(intra-word co-occurrence probabilities) vs. Log(intra-word/inter-word probabilities) for transitions occurring in 515 prefixed words 174
A.5 Experiment 2: Intra- and inter-word co-occurrence probabilities for stimuli 177
A.6 Experiment 2: Log expected probability across a word boundary, vs. number of subjects rating the stimuli as “complex” 178
A.7 Experiment 2: Log expected value across a word boundary of the worst member of each pair, against the difference in judgments between pair members 180
A.8 Predictions made by an “absolute probability” decomposer, relative to a decomposer based on over- or under-representation in the lexicon 184
A.9 Intra- and inter-word co-occurrence probabilities for transitions occurring in 515 prefixed words 189
List of Tables

2.1 Experiment 2: Stimuli 26
3.1 Experiment 3: Suffixed Stimuli 33
3.2 Experiment 3: Prefixed Stimuli 34
3.3 Prefixed Forms: Junctural Phonotactics and Polysemy 46
3.4 Prefixed Forms: Junctural Phonotactics and Polysemy 47
3.5 Prefixed Forms: Junctural Phonotactics and Semantic Transparency 50
3.6 Prefixed Forms: Relative Frequency and Junctural Phonotactics 52
3.7 Suffixed Forms: Junctural Phonotactics and Semantic Transparency 55
3.8 Suffixed Forms: Junctural Phonotactics and Polysemy 56
3.9 Suffixed Forms: Relative Frequency and Junctural Phonotactics 56
4.1 Experiment 4: Prefixed Stimuli 74
4.2 Experiment 4: Suffixed Stimuli 74
4.3 Experiment 4: Stimuli 77
5.1 Prefixed Forms: Frequency and Relative Frequency 87
5.2 Suffixed Forms: Frequency and Relative Frequency 88
5.3 Prefixed Forms: Above/Below Average Frequency and Relative Frequency 89
5.4 Suffixed Forms: Above/Below Average Frequency and Relative Frequency 89
5.5 Prefixed Forms: Frequency and Polysemy 91
5.6 Prefixed Forms: Relative Frequency and Polysemy 93
5.7 Prefixed Forms: Frequency, Relative Frequency, and Polysemy 94
5.8 Prefixed Forms: Relative Frequency and Semantic Transparency 97
5.9 Prefixed Forms: Frequency and Semantic Transparency 98
5.10 Suffixed Forms: Relative Frequency and Semantic Transparency 99
5.11 Suffixed Forms: Frequency and Semantic Transparency 99
5.12 Suffixed Forms: Relative Frequency and Polysemy 100
5.13 Suffixed Forms: Frequency and Polysemy 100
5.14 Suffixed Forms: Frequency, Relative Frequency, and Polysemy 102
6.1 Experiment 6: Stimuli 110
6.2 Experiment 6: Average “/t/-ness” Rankings 117
7.1 Frequency of the 14 Irregular Plurals in -eren in Modern Dutch 126
8.1 -ionist Forms 144
8.2 -ionary Forms 145
8.3 -ioner Forms 146
8.4 Frequency Counts for Latinate Bases Suffixed with -ment 151
8.5 Experiment 7a: Stimuli 153
8.6 Experiment 7b: Stimuli 156
Preface
The work reported here originally appeared as my 2000 Northwestern University Ph.D. dissertation. This version of the text has undergone many minor revisions—and some of the statistics have been redone using more appropriate tests than those which were originally reported. The only major revision of the text was a folding together of the original chapters 8 (“Level-ordering”) and 9 (“The Affix Ordering Generalization”) into a single chapter, which appears here as chapter 8 (“Affix Ordering”). This does not reflect substantial revisions to the content or argumentation, but the result is a more concise and more coherently presented account of the proposed theory.

No attempt has been made to update the text to reflect literature which has appeared since the completion of the dissertation, nor to incorporate discussion of research which has built on the ideas presented in the dissertation, nor to respond to critiques of this work which have appeared in the literature. For extensive discussion and critique of the dissertation the reader is referred in particular to Baayen (2002) and Plag (2002).

Plag (2002) has dubbed the affix-ordering account developed in chapter 8 “Complexity Based Ordering” (CBO), a term which I have subsequently adopted (Hay 2002, Hay and Plag to appear). Hay and Plag (to appear) have developed the CBO account further by examining co-occurrence restriction patterns of a set of English affixes. We find that these patterns provide strong support for a CBO account.

Harald Baayen and I have devoted considerable energy to modeling the effects described in chapter 7 on a much larger scale. Hay and Baayen (2002) and Hay and Baayen (to appear) describe a large scale investigation into the relationship between relative frequency, phonotactics, and morphological productivity. We find strong evidence for links between the phonotactics and frequency profile of an individual affix, and that affix’s productivity.
One significant development is the motivation of a ‘parsing threshold’. The division in the dissertation between derived forms which are more frequent than the bases they contain and derived forms which are less frequent than their bases was a first approximation, and clearly overly simplistic. Hay and Baayen (2002) refine the notion of relative frequency, investigating how frequent a base form needs to be, relative to the derived form, in order to facilitate parsing. Also in joint work with Harald Baayen, the results of experiment 4 were replicated in Dutch in 2001. The details of this replication have yet to be published.
Since the completion of the dissertation, text from chapters 4 and 5 has appeared as Hay (2001), and much of chapter 8 has been published as Hay (2002). Thanks are due to Linguistics and Language for permission to reprint that material in this volume.
Acknowledgments
First and foremost, this dissertation would not have been possible without the relentless, multi-faceted support of my supervisor, Janet Pierrehumbert. Her invaluable contributions to this enterprise constituted a magical combination of friendship, advice, argument, opera, inspiration, and fish. Her constant enthusiasm and her penetrating mind made this experience something a dissertation is not supposed to be—enormous fun. Janet will forever be a role-model for me, and I simply cannot thank her enough.

The rest of my committee was remarkably patient with my sometimes long silences, yet always forthcoming when I turned up unannounced with questions. Mary Beckman generously hosted me in the lab at Ohio State for two consecutive summers—where early plans for this dissertation were hatched—and was a rich source of advice and ideas. Chris Kennedy is to be thanked for his strong encouragement of my interest in lexical semantics, and for the good-natured way in which he managed to ask the really hard questions, and force me to be precise in my claims. Lance Rips provided an invaluable psychology perspective, and an extremely thorough reading of the final draft. His sharp eye caught several contentful errors and omissions, for which I am extraordinarily grateful.

In addition to my committee members, several people played a substantial role in capturing my imagination, and/or shaping my subsequent thinking about phonetics, phonology, morphology and lexical semantics. They are Laurie Bauer, Ann Bradlow, Mike Broe, Fred Cummins, Stefan Frisch, Stefanie Jannedy, Beth Levin, Will Thompson, J.D. Trout, Paul Warren and Yi Xu. Beth deserves special thanks for her ceaseless encouragement, and the constant flow of relevant references. Stefanie Jannedy, Scott Sowers, J.D. Trout and Saundra Wright are to be credited as excellent sounding boards, and were good-natured recipients of many graphs on napkins.
And Heidi Frank, Christina Nystrom, and Scott Sowers provided comments on an early draft; and beer.

Many thanks to everyone in the linguistics department at Northwestern, and particularly Betty Birner and Mike Dickey, for allowing me access to their students; Tomeka White, for keeping everything running smoothly; Ann Bradlow for establishing the linguistics department subject pool; and Chris Kennedy and Lyla Miller for lending their voices as stimuli. Special thanks are also due to everyone in the lab at Ohio State for their hospitality and help, particularly Jignesh Patel, Jennifer Vannest and Jennifer Venditti.

Much of the work in this dissertation would have been excruciatingly time-consuming if it weren’t for my ability to program. For this, and for teaching me to think about text in an entirely new way, I am indebted to Bran Boguraev, Roy Byrd and everyone in the Advanced Text Analysis and Information Retrieval Group at IBM T.J. Watson Research. My stint at IBM was invaluable in a second way—by providing the financial freedom to concentrate fully on this dissertation during the all-important home straight. The Fulbright Foundation and Northwestern University also provided crucial financial support.

The following people are to be credited with keeping me generally sane during my graduate career: Jennifer Binney, Lizzie Burslem, Heidi Frank, Stefanie Jannedy, Karin Jeschke, Mark Lauer, Norma Mendoza-Denton, Kathryn Murrell, Janice Nadler, Bernhard Rohrbacher, Scott Sowers, J.D. Trout, Lynn Whitcomb and Saundra Wright.

Finally, I’d like to thank Janet Holmes for fostering my initial interest in linguistics, and then for sending me away. All my family and friends in NZ for their patience, love and support. And Aaron.

Evanston, Illinois
May 2000

I would like to take this opportunity to thank Harald Baayen and Ingo Plag for their extensive feedback on the work reported in this dissertation, and for the joint work which has grown out of our discussions. I should also (belatedly) acknowledge Cafe Express, in Evanston, where much of the original text was written. This version of the document has benefited from the outstanding latex support of Ash Asudeh, and the careful proofreading of Therese Aitchison and Brynmor Thomas.

Christchurch, New Zealand
May 2003
CHAPTER 1 Introduction
This is a book about morphology. It tackles questions with which all morphologists are intimately familiar, such as the degree to which we can use affixes to create new words, and the possible orderings in which affixes can appear. However, it takes as its starting point questions which are generally considered well outside the domain of morphology, and even considered by many to be outside the domain of linguistics. How do listeners process an incoming speech signal? How do infants learn to spot the boundaries between words, and begin to build a lexicon? I demonstrate that fundamentals of speech processing are responsible for determining the likelihood that a morphologically complex form will be decomposed during access. Some morphologically complex forms are inherently highly decomposable, others are not.

The manner in which we tend to access a morphologically complex form is not simply a matter of prelinguistic speech processing. It affects almost every aspect of that form’s representation and behavior, ranging from its semantics and its grammaticality as a base of further affixation, to the implementation of fine phonetic details, such as the production of individual phonemes and pitch accent placement.

Linguistic morphology has tended to focus on affixes, and on seeking explanations for unexpected differences in their behavior. I argue that a different level of abstraction is necessary. Predictions about the behavior of specific affixes naturally follow when we focus on the behavior of individual words. In order to properly account for classically morphological problems such as productivity, stacking restrictions and cyclicity phenomena, we need to understand factors which lead individual forms to become, and remain, decomposed. And in order to understand decomposition, we need to start at the very beginning.
1.1 Modeling Speech Perception

When listeners process an incoming speech signal, one primary goal is the recognition of the words that the signal is intended to represent. Recognizing the words is a general prerequisite to the higher level goal of reconstructing the message that the speaker intended to convey.
Causes and consequences of word structure
2
The processing required to map incoming speech to stored lexical items can be broadly divided into two levels: prelexical processing, and lexical processing. Lexical processing consists of the selection of appropriate lexical entries in the lexicon. Prelexical processing involves strategies exploited by listeners in order to facilitate access to these lexical entries. One primary role of such prelexical processing is the segmentation of words from the speech stream. Speech, unlike the written word, does not come in a form in which word boundaries are clearly marked. This fact makes the acquisition of language an impressive feat, as infants must learn to spot word boundaries in running speech in order to begin the task of acquiring a lexicon.

Recent work has shed considerable light on the strategies used by adults and infants to segment words from the speech stream. Multiple cues appear to be simultaneously exploited. These include stress patterns (see e.g. Cutler 1994), acoustic phonetic cues (Lehiste 1972), attention to utterance boundaries (Brent and Cartwright 1996), and knowledge of distributional patterns (Saffran, Aslin and Newport 1996; Saffran, Newport and Aslin 1996). Many of these strategies may be associated with the prelexical level in speech perception, serving as a filter which hypothesizes word boundaries, facilitating access to lexical entries which are aligned with those boundaries (Pitt and McQueen 1998, van der Lugt 1999, Norris, McQueen and Cutler 2000). Of course, for adult listeners, segmentation may result as a by-product of recognition of words embedded within the signal. As van der Lugt (1999:24) eloquently observes: “if you know the word, you know the word boundaries.”

Lexical entries are organized into a network, and compete with each other in access. There is now a large body of evidence supporting the claim that words compete, resulting from a variety of experimental tasks (see e.g. McQueen, Norris and Cutler 1994, Norris et al.
1995, Vitevitch and Luce 1998). Lexical competition is therefore incorporated into most current models of speech perception, including MERGE (Norris et al. 2000), TRACE (McClelland and Elman 1986), NAM (Luce and Pisoni 1998), SHORTLIST (Norris 1994) and ART (Grossberg 1986).

One factor which is highly relevant to lexical competition is lexical frequency. In speech perception, ambiguous stimuli tend to be identified as high frequency words (Connine, Titone and Wang 1993), less acoustic information is required to identify high frequency words than low frequency words (Grosjean 1980), and lexical decision times are negatively correlated with lexical frequency (Balota and Chumbley 1984). Also, a highly schematized (sine-wave) replica of a list of sentences is recognized as speech earlier by naive listeners if the sentences contain only high-frequency words, and once this speech mode of listening is triggered, word-identification rates are significantly more accurate for high frequency words (Mast-Finn 1999). Frequency also affects speech production, with high frequency words accessed more quickly, produced more fluently, undergoing greater reduction, and being less prone to error (see, e.g. Whalen 1991, Levelt 1983, Dell 1990, Wright 1997, Hay, Pierrehumbert, Beckman and Faust West 1999; Hay, Jannedy and Mendoza-Denton 1999).

Lexical and prelexical speech perception processes such as those described above have important consequences for the long-term representation of lexical items. Recognizing an item affects that item’s representation, and increases the probability that it will be successfully recognized in the future. Some models capture this process by raising the resting activation level of the relevant lexical entry—such a mechanism is implicit in
most of the models outlined above. Other models, based on exemplars, assume that identifying a word involves adding a new exemplar to the appropriate exemplar cloud (Johnson 1997a,b). However, in order to capture the fact that words encountered frequently have different properties from words encountered relatively infrequently, all models must assume that accessing a word in some way affects the representation of that word.

All of the models discussed above have been primarily concerned with modeling the access of simple words. Central to this book is the argument that they also make extremely important predictions for the processing of affixed words. Assume that an affixed word can be accessed in two ways—a direct route, in which the entire lexical entry is accessed directly, or a decomposed route in which the word is accessed via its component parts (models which make this assumption are outlined in section 1.2). If accessing a simple word affects its representation, it follows that accessing a complex word also affects its representation. And, if there are two ways to access an affixed word, then there are two ways to affect its representation. Accessing it via the decomposed route reinforces its status as an affixed word made up of multiple parts. Accessing it via the direct route reinforces its status as an independent entity.

Importantly, not all affixed words are equally likely to be accessed via a decomposed route. As outlined above, prelexical information is used to segment the speech stream in speech perception. This can be modeled by assigning lexical entries which are not well aligned with hypothesized boundaries less activation than lexical entries which are well aligned (cf. Norris et al. 1997).
It follows that if an affixed word possesses properties leading to the hypothesis of a boundary at the morpheme boundary, this would significantly facilitate a decomposed route—and impede the direct route, which would not be aligned with the hypothesized boundary. If an affixed word possesses no such properties, however, no boundary will be hypothesized. This book examines this hypothesis in the context of two specific factors which the speech processing literature predicts to be relevant to the segmentation of words from the speech stream: one lexical, and one prelexical. At the prelexical level, we will investigate the role of distributional information, in the form of probabilistic phonotactics. At the level of lexical processing, we will investigate the role of frequency-based lexical competition. I demonstrate that these factors both exert an important influence on the processing of affixed words, and, consequently, many aspects of their representation. This has profound consequences for the semantic transparency and decomposability of individual affixed forms, and for the predicted behavior of specific affixes. By recognizing that decomposability is a continuum, and that it can be directly related to factors influencing segmentation in speech perception, we will acquire tremendous explanatory power in domains which have proven classically problematic in linguistic morphology, including morphological productivity, level-ordering phenomena, and affix ordering restrictions.
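The distributional strategy invoked above (Saffran, Aslin and Newport 1996) can be illustrated with a toy sketch. This is not a model proposed in this book: the nonce syllable stream and the local-minimum boundary rule are illustrative assumptions only, chosen so that transitions within a “word” are perfectly predictable while transitions across word boundaries are not.

```python
from collections import Counter

def transition_probs(stream):
    """Forward transitional probability P(next | current) over adjacent units."""
    pairs = Counter(zip(stream, stream[1:]))
    firsts = Counter(stream[:-1])
    return {(x, y): n / firsts[x] for (x, y), n in pairs.items()}

def segment(stream, probs):
    """Posit a word boundary wherever transitional probability dips to a local minimum."""
    words, current = [], [stream[0]]
    for i in range(1, len(stream)):
        p_here = probs[(stream[i - 1], stream[i])]
        p_prev = probs[(stream[i - 2], stream[i - 1])] if i >= 2 else 1.0
        p_next = probs[(stream[i], stream[i + 1])] if i + 1 < len(stream) else 1.0
        if p_here < p_prev and p_here < p_next:  # low-probability transition: boundary
            words.append(current)
            current = []
        current.append(stream[i])
    words.append(current)
    return words

# Three hypothetical "words" (golabu, pabiku, tidaro) concatenated without breaks:
stream = "go la bu pa bi ku go la bu ti da ro pa bi ku ti da ro go la bu".split()
parse = segment(stream, transition_probs(stream))
# Within-word transitions here have probability 1.0 and cross-word transitions 0.5,
# so every word boundary surfaces as a probability dip.
```

On this stream the parse recovers all three nonce words exactly, with no lexicon at all: the boundaries fall out of the distributional statistics, which is the sense in which prelexical information can hypothesize boundaries before any lexical entry is consulted.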
1.2 Modeling Morphological Processing

While morphological processing is clearly part of speech processing, the two have not generally been treated together. Discussion in the context of models of speech processing
tends to deal exclusively with the treatment of simple words. Because affixed words present special problems, researchers interested in morphological processing have developed their own models in order to account for observed phenomena in this particular sub-domain. Recent models of morphological processing share many components with more general speech processing models, including concepts such as the arrangement of words in a lexical network, and frequency-based resting activation levels of lexical items (see, e.g. Frauenfelder and Schreuder 1992, Baayen and Schreuder 1999).

Models of morphological processing must make some fundamental assumptions about the role of decomposition. Do we decompose affixed words upon encountering them, breaking them down into their parts in order to access lexical entries associated with the component morphemes? Or do we access affixed words as wholes, accessing an independent, holistic, lexical entry? Some researchers have argued that there is no decomposition during access (e.g. Butterworth 1983), and others have claimed there is a prelexical stage of compulsory morphological decomposition (e.g. Taft 1985). Laudanna, Burani and Cermele (1994) and Schreuder and Baayen (1994) argue that affixes are not a homogeneous set, and so it is hazardous to generalize over them as an undifferentiated category. Indeed, I will argue that it is hazardous to generalize even over different words which share the same affix.

Most current models are mixed—allowing for both a decomposed access route, and a direct access, non-decomposed route. In many models the two routes explicitly compete, and, in any given encounter with a word, either the decomposed or the direct route will win (Wurm 1997, Frauenfelder and Schreuder 1992, Baayen 1992, Caramazza, Laudanna and Romani 1988). Direct competition does not necessarily take place, however.
Baayen and Schreuder have recently argued that the two routes may interactively converge on the correct meaning representation (Schreuder and Baayen 1995, Baayen and Schreuder 1999). In this model too, however, there will necessarily be some forms for which the decomposed route dominates access, and others in which the direct, whole word representation is primarily responsible for access. As pointed out by McQueen and Cutler (1998), dual-route models appear to have met the most success in accounting for the range of empirical facts.

For ease of explication, I will adopt a simple dual route “to the death” model throughout—assuming that, in any given encounter, a form was accessed via either a fully decomposed route or a holistic direct route. I do this not because I am committed to its truth, but rather because it offers the best prospects for a straightforward illustration of the sometimes complex interaction of the various factors involved. It is important to note that the predictions made in this book regarding the interaction of speech processing strategies with morphological decomposition do not rely on this specific choice of model. The same predictions will follow from any model in which both decomposition and whole route access are available options, or in which the presence of the base word can be variably salient.

Following the speech processing literature outlined in the previous section, we will assume that accessing a word via a whole word route affects the representation of that word. This occurs either by a raising of the resting activation level or, in exemplar models, by increasing the number of relevant exemplars. We will also assume that complex words, even if accessed via a decomposed route, are subsequently stored in
memory. In an exemplar model, we would assume that such exemplars are stored in parsed form. In our more abstract dual route network, we can assume that the form is stored with strong links to the parts that were used to compose it. Figure 1.1 shows an idealization of the two routes which race to access the lexical entry for insane. The dashed line indicates the direct access route—on encountering insane, this represents the possibility of accessing it directly.
Figure 1.1: Schematized dual route model. The solid lines indicate the decomposed route. The dashed line indicates the direct route.

The solid lines indicate the decomposed route. The component parts are activated, and used to access the lexical item. If this route wins, the connection between the parts and
the derived form is reinforced. As such, any access via the whole word route will serve to reinforce the independent status of insane, whereas any access via the decomposed route will reinforce its decomposability—and its relation to in- and -sane. With this basic framework in place, we now introduce some factors known to be relevant to speech processing, in order to see how they are likely to impact the processing of morphologically complex words.
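The self-reinforcing character of this setup can be sketched with a deliberately minimal simulation of the “to the death” race. The activation values and the fixed reinforcement increment are hypothetical placeholders, not parameters estimated anywhere in this book; the point is only that whichever route wins an encounter becomes more likely to win the next one.

```python
def encounter(whole_act, parsed_act, boost=0.1):
    """One access event: the route with higher resting activation wins the race,
    and winning reinforces that route's representation."""
    if whole_act >= parsed_act:
        return whole_act + boost, parsed_act
    return whole_act, parsed_act + boost

# Suppose the whole word route for some form starts marginally ahead
# (hypothetical numbers). Each win widens its lead, entrenching whole word access.
whole, parsed = 1.1, 1.0
for _ in range(100):
    whole, parsed = encounter(whole, parsed)
# After 100 encounters the whole word route has absorbed every reinforcement:
# whole is approximately 11.1, while parsed remains 1.0.
```

The same dynamic run with the decomposed route slightly ahead would entrench decomposition instead, which is why an initial bias from segmentation or frequency can have long-term representational consequences.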
1.3 Lexical Effects

1.3.1 Phonological Transparency

One primary goal of speech processing is to map the incoming speech signal to appropriate lexical entries. In the example given above, the speech signal associated with insane is sufficiently similar to at least two entries in the lexicon—insane and sane—that both are activated. However, if the speech signal matched one candidate more poorly than another, then that candidate would not fare well in the competition. Insane contains sane (phonologically), and so both are well activated by the speech signal. Sanity, on the other hand, does not contain sane. As such, the contents of the speech stream will not map well to the access representation for sane. The whole word route has a high chance of winning whenever the base word is not transparently phonologically contained in the derived form.

In keeping with this prediction, Cutler (1980, 1981) demonstrates that the acceptability of neologisms relies crucially on the degree to which they are phonologically transparent. Bybee (1985:88) has claimed that derived forms with low phonological transparency are more likely to become autonomous than forms which are phonologically close to the base form, and Frauenfelder and Schreuder (1992) explicitly build this into their dual route model of morphological access, with the parsing route taking more time for phonologically less transparent forms. The claim that phonological transparency facilitates access is not uncontroversial, however. For example, Marslen-Wilson and his colleagues (e.g. Marslen-Wilson et al. 1994, Marslen-Wilson et al. 1997, Marslen-Wilson and Zhou 1999) find that derived words prime their bases equally, regardless of phonological transparency. They argue that speech is directly mapped onto an abstract representation of a morpheme, which is underspecified for the properties of the base word which display variation.
While Marslen-Wilson et al.’s results demonstrate that equivalent priming takes place once a lexical item is accessed, the results do not explicitly exclude the possibility that access to that lexical item may vary with phonological transparency, either in manner or speed.

1.3.2 Temporality

Speech inherently exists in time; we encounter word beginnings before we encounter word ends. Consider this with reference to the model in figure 1.1. The acoustic signal which will activate the whole word insane precedes the acoustic signal which will activate the base, sane. If we assume that, in general, affixes tend to have relatively lower
resting activation levels than free words, then this temporal asymmetry is likely to favor the whole word route for prefixed words. Conversely, in suffixed words, the temporal onset of the derived form and the base is simultaneous. Any bias we see towards whole word access in prefixes should therefore be reduced for suffixed forms. Cutler et al. (1985) argue that this temporal asymmetry—together with the fact that language users prefer to process stems before affixes—is responsible for the generalization that suffixes are much more frequent across the world's languages than prefixes (Greenberg 1966). Indeed, there is good evidence that prefixed words tend to be treated differently in processing than suffixed words (Beauvillain and Segui 1992, Cole, Beauvillain and Segui 1989, Marslen-Wilson et al. 1994). In chapter 3 we will see evidence that this temporal asymmetry has long-term consequences for lexical representations.

1.3.3 Relative Frequency

As outlined in section 1.1, lexical frequency affects speed of access. We must therefore assume that each of the nodes in figure 1.1 has a resting activation level which is a function of its frequency of access. Nodes associated with frequent words (or morphemes) will be accessed more quickly than nodes associated with infrequent words. We will display nodes with different line widths, with thick lines indicating high frequency, and so high resting activation levels. We can approximate the relative frequencies of the derived form and the base by consulting their entries in the CELEX Lexical Database (Baayen et al. 1995). This reveals that sane occurs at a rate of around 149/17.4 million, whereas insane occurs at a rate of around 258/17.4 million. This is shown graphically in figure 1.2. We will leave aside the resting activation level of the prefix, as this requires some knowledge of how many words containing this affix are routinely accessed via decomposition.
In chapter 7, I will return to this issue, demonstrating that the resting activation level of an affix (as estimated by the proportion of forms which are accessed via decomposition) is highly predictive of that affix's productivity level. When we now consider the two routes in figure 1.2, it is clear that the whole word route has an advantage. The higher relative frequency of insane speeds the whole word route, relative to the decomposed route. Insane can be compared with a word like infirm. Infirm is fairly infrequent (27/17.4 million), and, importantly, its base firm is highly frequent (715/17.4 million). As such, we predict the decomposed route should have a strong advantage
Figure 1.2: Schematized dual route model. The solid lines indicate the decomposed route. The dashed line indicates the direct route. The line width of each node indicates the resting activation level—insane is more frequent than sane.

over the whole word access route. Importantly, because words compete, the absolute frequency of the derived form is not so important as its frequency relative to the base form with which it is competing. Many researchers have posited a relationship between lexical frequency and the decomposition of complex forms. However, this body of work has nearly exclusively concentrated on
the absolute frequency of the derived form—arguing that high frequency forms are not decomposed. The argument here is different. Insane is not a high frequency word. Chapter 4 steps through current models of morphological processing in some detail, arguing that all models which predict a frequency effect in fact predict an effect of relative, not absolute, frequency.
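The relative-frequency prediction above can be sketched in a few lines. This is an illustrative sketch, not the author's model: the counts are the per-17.4-million CELEX figures cited in the text, and the `predict_route` rule is a hypothetical simplification of the competition between the two access routes.

```python
# Sketch of the relative-frequency prediction. The counts are the
# CELEX figures cited in the text (occurrences per 17.4 million
# tokens); the decision rule is a simplification for illustration.

counts = {
    "sane": 149, "insane": 258,   # derived form more frequent than base
    "firm": 715, "infirm": 27,    # base more frequent than derived form
}

def predict_route(derived, base, counts):
    """Favor whole-word access when the derived form is more frequent
    than the base it contains, and decomposition otherwise."""
    relative = counts[derived] / counts[base]
    return "whole-word" if relative > 1 else "decomposed"

print(predict_route("insane", "sane", counts))   # whole-word
print(predict_route("infirm", "firm", counts))   # decomposed
```

The point the sketch makes concrete is that the decision depends on the ratio of derived to base frequency, not on the absolute frequency of the derived form.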
1.4 Prelexical Effects

As outlined above, there is good evidence that some aspects of speech processing take place at a prelexical level—incoming speech undergoes some preprocessing in order to facilitate mapping to appropriate lexical items. Many such processing strategies also appear to be prelexical in a second sense—they are acquired by infants before they have the beginnings of a lexicon. Morphologically complex words must be subject to the same types of preprocessing as morphologically simple words. Precisely because prelexical effects are prelexical, they cannot filter out morphologically complex words and treat them as special cases: prelexical processing affects complex words too. We now therefore consider the implications of adding a Fast Phonological Preprocessor to our idealized dual route model, as schematized in figure 1.3. Information flows from the phonological preprocessor in a strictly bottom-up fashion (see Norris et al. 2000). One primary role of the preprocessor is to entertain hypotheses about possible segmentations of the incoming speech stream. These hypotheses are formed on the basis of knowledge of lexical patterns and regularities. There are at least three specific hypothesis-forming strategies which could impact the processing of morphologically complex words—the use of metrical structure, restrictions on what can be a possible word, and the use of local distributional cues.

1.4.1 Metrical Structure

English speaking adults and infants exploit the stress patterns of words in order to aid segmentation: over 90% of English content words begin with stressed syllables (Cutler and Carter 1987), and there is strong evidence that English speaking adults and infants
Figure 1.3: Schematized dual route model. The solid lines indicate the decomposed route. The dashed line indicates the direct route. A fast phonological preprocessor operates prelexically, facilitating access to lexical entries which are aligned with hypothesized word boundaries.
take advantage of this during speech processing—positing word boundaries before strong syllables (Cutler 1990, Cutler and Butterfield 1992, Cutler and Norris 1988, Jusczyk, Cutler and Redanz 1993, Jusczyk, Houston and Newsome 1999). This Metrical Segmentation Strategy is evident in infants as young as 7.5 months (Jusczyk et al. 1999), and is clearly prelexical. In our dual-route model of morphology we attribute such prelexical effects to a phonological preprocessor—the preprocessor hypothesizes boundaries at the beginning of strong syllables and facilitates access to candidate lexical entries which are aligned with those boundaries. What implications does this have for morphologically complex words? Words in which a strong syllable directly follows the morpheme boundary will be more likely to be decomposed than words in which that syllable is weak. To see this, compare the processing of inhuman and inhumane. The first syllable of both derived words is unstressed, so, when encountered in running speech, the metrical segmentation strategy will do nothing to facilitate access via the direct route. However, the words differ in an important way. The first syllable following the morpheme boundary is stressed in inhuman, but not in inhumane. As such, the preprocessor should facilitate access to the base in the former, but not the latter. We expect words like inhuman to be more likely to be decomposed during processing than words like inhumane. The potential relevance of the Metrical Segmentation Strategy for the processing of complex words has been pointed out by Schreuder and Baayen (1994). The prediction being put forward here is somewhat more precise—because the Metrical Segmentation Strategy will affect only some prefixed words, we expect these words to remain more robustly decomposed than their non-optimally stressed counterparts. 
Words in the latter group should more often favor the direct access route, be more likely to become liberated from their bases, and undergo semantic drift. Some preliminary calculations over lexica lead me to believe that this will prove to be the case. However, no results on this point are reported in this book—it remains an empirical, testable question.

1.4.2 Possible Word Constraint

Norris et al. (1997) claim that there is a Possible Word Constraint operative in the segmentation of speech. This constraint effectively suppresses activation of candidate forms which would lead to a segmentation resulting in impossible words. Their experimental results, for example, show that subjects have a harder time spotting apple inside fapple than inside vuffapple (because f is not a possible word). Similarly, sea is easier to spot in seashub than in seash. We predict that this will have implications for the processing of affixes which themselves cannot be words. For example, Bauer (1988) points to the abstract-noun-forming suffix -th as a textbook case of a non-productive affix. Example nouns with this affix include warmth, growth and truth. The affix -th is not itself a syllable, and is shorter than the minimal possible word in English. Thus, the processor is unlikely to posit a boundary before -th in truth, and so the direct route is likely to be advantaged. This can be contrasted with the processing of a word like trueness, where the Possible Word Constraint would not disadvantage access of the base. Thus, we predict that more words
containing word-like affixes will be decomposed during online processing than words containing affixes which themselves could not be phonological words of English.

1.4.3 Probabilistic Phonotactics

Language specific phonotactic patterns affect multiple aspects of speech perception. Important for the discussion here, they appear to be one cue which is used in the segmentation of speech. Saffran, Aslin and Newport (1996) show that, when presented with a string of nonsense words, eight-month-old infants are sensitive to transitional probabilities in the speech stream. This is also true of adults (Saffran, Newport and Aslin 1996). This result suggests that sensitivity to probabilistic phonotactics plays a role in the segmentation of speech. McQueen (1998) and van der Lugt (1999) provide further evidence that phonotactics are exploited for the task of locating word boundaries. Mattys et al. (1999) demonstrate that English learning infants are sensitive to differences between inter- and intra-word phonotactics. Computer models demonstrate that probabilistic phonotactics can significantly facilitate the task of speech segmentation (Brent and Cartwright 1996, Cairns et al. 1997, Christiansen et al. 1998). Recent results reported by Pitt and McQueen (1998) and Vitevitch and Luce (1998) suggest that knowledge of probabilistic phonotactics must be represented independently of specific lexical entries. In brief, our phonological preprocessor depicted in figure 1.3 is likely to be sensitive to distributional cues—positing boundaries at phoneme transitions which are unlikely to occur word-internally. This has important implications for morphologically complex words. Namely, if the phonology across the morpheme boundary is highly unlikely to occur morpheme internally, then the preprocessor is likely to posit a boundary, and so advantage the decomposed route. Inhumane is an example of such a word.
The /nh/ transition is highly unlikely to be found within a simple word, and so the processor will hypothesize the presence of a boundary. As such, the direct route will be disadvantaged in comparison with the decomposed route, because it does not align with hypothesized boundaries. We expect words like inhumane to be more likely to be decomposed than words like insincere where the transition across the morpheme boundary is well attested morpheme internally (e.g. fancy, tinsel). Similarly, pipeful should be more likely to be decomposed than bowlful (cf. dolphin).
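The junctural-phonotactics logic can be made concrete with a toy calculation. This is an illustrative sketch only, not any implementation from the text: the word list is invented for the example, and orthography stands in for phonemic transcription.

```python
# Illustrative sketch: estimate how often a two-segment transition
# occurs inside monomorphemic words, and posit a boundary where the
# transition is unattested morpheme-internally. The tiny word list is
# invented, and letters stand in for phonemes.

from collections import Counter

monomorphemes = ["fancy", "tinsel", "dolphin", "candle", "winter", "human"]

transitions = Counter()
for w in monomorphemes:
    for a, b in zip(w, w[1:]):
        transitions[a + b] += 1

def boundary_likely(word, i, transitions):
    """Posit a boundary before position i if the transition spanning
    that position never occurs inside a training monomorpheme."""
    return transitions[word[i - 1] + word[i]] == 0

print(boundary_likely("inhumane", 2, transitions))   # True: 'nh' is unattested
print(boundary_likely("insincere", 2, transitions))  # False: 'ns' occurs in tinsel
```

A realistic model would use probabilities over phonemic transcriptions rather than a zero/non-zero test over spellings, but the contrast between inhumane and insincere comes out the same way.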
1.5 Consequences

1.5.1 Words

In this book I concentrate on two of the predictions outlined above—one lexical (lexical frequency) and one prelexical (probabilistic phonotactics). I demonstrate that both factors are related to morphological decomposition. Because lexical access leaves its traces on the lexicon, this has profound implications, leading words which tend to be decomposed in access to acquire different properties than words which are more likely to be accessed via a direct route. Words which are more
prone to whole word access appear less affixed, undergo semantic drift, proliferate in meaning, and are implemented differently in the phonetics. They are effectively free to become phonologically and semantically liberated from their bases and acquire idiosyncrasies of their own. Words which are more prone to decomposition during access, on the other hand, remain more robustly decomposed. They tend to bear a regular and predictable semantic relation to their base word, and are highly decomposable, both impressionistically and in the phonetics. Close investigation of lexical frequency and phonotactics in morphologically complex words leads us to discover a number of lexicon-internal syndromes—remarkable co-occurrences of frequency-based, phonologically-based, and semantic factors.

1.5.2 Affixes

As outlined above, I argue that affixed words display different levels of decomposability, and that the degree to which a word is decomposable can be well predicted by general facts about speech processing. This variation in decomposability has consequences not only for individual affixed forms, but also for the affixes they contain. Affixes represented by many highly decomposed forms will have higher activation levels than affixes which are represented by many forms which are accessed via a direct, non-decomposed route. Some properties inherent to the affixes themselves predict the degree to which words that contain them will be decomposable. One such property is the degree to which the affix has the form of a possible word. As outlined above, the Possible Word Constraint will make words containing the suffix -th less likely to be decomposed than words containing a more wordlike suffix like -ness. Similarly, the type of phonotactics an affix tends to create across the boundary is highly relevant.
An affix which creates consistently bad phonotactics across the morpheme boundary is likely to be represented by a higher proportion of decomposed forms than an affix which consistently creates words with the phonological characteristics of monomorphemic words. I demonstrate that the productivity of an affix can be directly related to the likelihood that it will be parsed out during processing. Previous work has documented the remarkable co-occurrence patterns involving phonological patterns, phonological transparency, semantic transparency, productivity, and selectional restrictions of affixes in English. These co-occurrences have been used to motivate two distinct levels of affixation. In this book I abandon the idea that affixes can be neatly divided into two distinct classes, and demonstrate that the variable parsability of affixes can be linked to the variable decomposition of individual words. This affords us enormous explanatory power in arenas which have proven classically problematic in linguistic morphology—such as restrictions on affix ordering. While early accounts of affix ordering were overly restrictive, recent work which has discarded the idea that there are any restrictions on ordering (beyond selectional restrictions) misses a number of important generalizations about stacking restrictions in English. Approaching the problem from a different level of abstraction allows us to recover these generalizations.
The problem of restrictions on affix ordering in English can be essentially reduced to one of parsability: an affix which can be easily parsed out should not occur inside an affix which cannot. This has the overall result that the less phonologically segmentable, the less transparent, and the less productive an affix is, the more resistant it will be to attaching to already affixed words. This prediction accounts for the patterns the original affix-ordering generalization was intended to explain, as well as the range of exceptions which have been observed in the literature. Importantly, the prediction extends to the parsability of specific affixes as they occur in specific words. This accounts for the so-called dual-level behavior of many affixes. I demonstrate that an affix may resist attaching to a complex word which is highly decomposable, but be acceptable when it attaches to a comparable complex word which favors the direct route in access. Armed with results concerning the decomposability of words with certain frequency and phonology-based characteristics, we are able to systematically account for the range of allowable affix combinations in English. The temporal nature of speech perception also leads to powerful predictions regarding allowable bracketing paradoxes—unexpected combinations of prefixes and suffixes.
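The parsability-based ordering constraint can be stated as a one-line rule. The sketch below is hypothetical: the numeric "parsability" scores are invented for illustration (the text estimates parsability from decomposition rates), and the rule is a deliberately crude rendering of "an easily parsed affix should not occur inside a less easily parsed one."

```python
# Hedged sketch of the parsability-based ordering constraint. The
# parsability scores are invented for illustration; in the text they
# would be estimated from the proportion of decomposed forms.

parsability = {"-ness": 0.9, "-less": 0.8, "-ity": 0.3, "-th": 0.1}

def ordering_ok(inner, outer, parsability):
    """Allow the outer affix to attach outside the inner affix only if
    the outer affix is at least as parsable as the inner one."""
    return parsability[outer] >= parsability[inner]

print(ordering_ok("-less", "-ness", parsability))  # True, e.g. hopelessness
print(ordering_ok("-ness", "-ity", parsability))   # False
```

Because the scores attach to affixes as they occur in specific words, the same rule can license a combination on one base and block it on another, which is the dual-level behavior discussed above.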
1.6 Some Disclaimers

All of the experiments and the claims put forward here are based on English derivational morphology. I set inflectional morphology aside. I do this not because I believe that all inflectional morphology is inherently different from derivational morphology—on the contrary. However, most inflectional morphology in English is of a much higher type frequency than typically observed with derivational morphology. Thus, the models outlined above would predict it to acquire markedly different characteristics. One effect of high type frequency, for example, would be a high resting activation level of the affix, and so a high level of productivity. Because the incorporation of inflectional morphology in this study would require close attention to (and experimental control of) affixal type frequency, I have omitted it from the experiments and the study. The degree to which the results presented here are relevant to inflection remains an empirical question. The use of the terminology dual-route should not be viewed as implicit support for dual-systems models in which rule-based morphology is fundamentally distinguished from memory-based morphology (see, e.g. Pinker and Prince 1988, Prasada and Pinker 1993, Ullman 1999). As the ‘dual-systems’ debate has been waged primarily on the turf of inflectional morphology, it should be clear that the phenomena discussed in this book do not speak directly to that debate. As the use of similar terminology opens the door for confusion, however, it is worth pointing out explicitly that the ‘dual routes’ in the simple model outlined in figure 1.3 (as well as in the models on which it is loosely based) are intended to refer to different access routes to a single representation. This book is also not a crosslinguistic study. I argue that the speech segmentation strategies used by English speakers exert an important influence on English morphology. The consequences of this claim are not examined for other languages.
There are clearly
cross-linguistic differences both in speech segmentation processes (see, e.g. Vroomen, Tuomainen and de Gelder 1998), and in morphological patterns (see e.g. Spencer 1991). The specific organization of a given language—that is, the factors which facilitate speech segmentation, and the degree to which these are present in morphologically complex words—will affect the manner in which the effects described here play out in that language. While the range of results presented here certainly predicts that there will be an interaction between speech perception and morphological structure cross-linguistically, detailed cross-linguistic analysis remains for future research.
1.7 Organization of the Book

The book is organized as follows. Chapter 2 outlines the evidence for the role of phonotactics in the segmentation of speech. I present a simple recurrent network which is trained to use this information to segment words, and demonstrate that this learning automatically transfers to hypothesizing (certain) morpheme boundaries. If phonotactic-based segmentation is prelexical, morphology cannot escape its effects. Results are then presented from an experiment involving nonsense words, demonstrating that listeners do, indeed, use phonotactics to segment words into morphemes. Chapter 3 investigates the consequences of this result for real words. Experimental results presented in this chapter demonstrate that listeners perceive real words containing low probability junctural phonotactics to be more easily decomposable than matched counterparts which contain higher probability junctural phonotactics. Calculations over lexica demonstrate that prefixed words with legal phonotactics across the morpheme boundary are more prone to semantic drift, more polysemous, and more likely to be more frequent than the bases they contain. These results are not replicated for suffixed words—a difference which can be ascribed to the inherently temporal nature of speech processing. In chapter 4 we turn to lexical aspects of speech processing, investigating the role of the relative frequency of the derived form and the base. I argue that, despite previous claims involving the absolute frequency of the derived form, all models which predict a relation between frequency and decomposability predict a relation involving relative frequency. Two experimental results are presented which confirm this prediction. First, subjects rate words which are more frequent than their bases as appearing less morphologically complex than matched counterparts which are less frequent than their bases. And second, words which are less frequent than the bases they contain (e.g.
illiberal) are more likely to attract a contrastive pitch accent to the prefix than matched counterparts which are more frequent than their bases (e.g. illegible). Chapter 5 further demonstrates the lexicon-internal effects of relative frequency, by presenting calculations over lexica. Derived forms which are more frequent than their bases are less semantically transparent and more polysemous than derived forms which are less frequent than their bases. This result proves robust both for prefixes and suffixes. In addition to being cognitively important, this result has methodological consequences for current work on morphological processing.
Some phonetic consequences of decomposability are examined in chapter 6, which presents experimental results involving /t/-deletion. Decomposable forms (such as softly, which is less frequent than soft) are characterized by less reduction in the implementation of the stop than forms which are not highly decomposable (e.g. swiftly, which is more frequent than swift). The remainder of the book examines the linguistic consequences of the reported results. The two factors we have examined provide us with powerful tools for estimating the decomposability of an affixed word. Armed with these tools, I demonstrate the central role of decomposability in morphological productivity (chapter 7) and the affix ordering generalization (chapter 8). The results are drawn together and discussed in chapter 9, the conclusion. Taken as a whole, the results in this book provide powerful evidence for the tight connection between speech processing, lexical representations, and aspects of linguistic competence. The likelihood that a form will be parsed during speech perception has profound consequences, from its grammaticality as a base of affixation through to fine details of its implementation in the phonetics.
CHAPTER 2
Phonotactics and Morphology in Speech Perception
English speaking adults and infants use phonotactics to segment words from the speech stream. The goal of this chapter is to demonstrate that this strategy necessarily affects morphological processing. After briefly reviewing the evidence for the role of phonotactics in speech perception (2.1), I discuss the results of Experiment 1—the implementation of a simple recurrent network (2.3). This network is trained to use phonotactics to spot word boundaries, and then tested on a corpus of multimorphemic words. The learning transfers automatically to the spotting of (certain) morpheme boundaries. Morphologically complex words in English cannot escape the effects of a prelexical, phonotactics-based segmentation strategy. Having illustrated that segmentation at the word and morpheme level cannot be independent, I present the results of Experiment 2, which show that listeners can, indeed, use phonotactics to segment nonsense words into component “morphemes.”
2.1 Phonotactics in Speech Perception

There is a rapidly accumulating body of evidence that language specific phonotactic patterns affect speech perception. Phonotactics have been shown to affect the placement of phoneme category boundaries (Elman and McClelland 1988), performance in phoneme monitoring tasks (Otake et al. 1996), segmentation of nonce forms (Suomi et al. 1997), and perceived well-formedness of nonsense forms (Pierrehumbert 1994, Coleman 1996, Vitevitch et al. 1997, Treiman et al. 2000, Frisch et al. 2000, Hay et al. to appear, and others). Several of these results are gradient, indicating that speakers are aware of, and exploit, the statistics of their lexicon. Such statistics also appear to play a vital role in the acquisition process. Jusczyk et al. (1994) show that nine-month-old infants prefer frequent phonotactic patterns in their language to infrequent ones. Saffran, Aslin and Newport (1996) show that, when presented with a string of nonsense words, eight-month-old infants are sensitive to transitional probabilities in the speech stream. This is also true of adults (Saffran, Newport and Aslin 1996). This result is important because it suggests that sensitivity to probabilistic phonotactics plays a role in the segmentation of speech.
McQueen (1998) and van der Lugt (1999) provide further evidence that phonotactics are exploited for the task of locating word boundaries. While knowledge of segment-level distributional information appears to be important, it is certainly not the only cue which plays a role in segmentation (Bates and MacWhinney 1987, Christiansen, Allen and Seidenberg 1998). Other cues include the stress pattern (Cutler and Norris 1988, Jusczyk, Cutler and Redanz 1993), acoustic-phonetic cues (Lehiste 1972), prosody (Gleitman, Gleitman, Landau and Wanner 1988), knowledge of a substring (Dahan and Brent 1999), and attention to patterns at utterance boundaries (Brent and Cartwright 1996). In this chapter we concentrate on the role of junctural phonotactics.
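The transitional-probability sensitivity discussed above can be illustrated with a toy calculation of the kind the Saffran et al. stimuli involve: boundaries fall where P(next syllable | current syllable) dips. This sketch is not any experiment's actual stimuli or analysis; the miniature stream of nonsense "words" is invented for the example.

```python
# Sketch of transitional-probability segmentation: word-internal
# syllable transitions are high-probability, transitions that span a
# word boundary are lower. The nonsense words "bida" and "kupa" and
# the stream built from them are invented for illustration.

from collections import Counter

stream = ["bi", "da", "ku", "pa", "bi", "da", "bi", "da", "ku", "pa"]

pair = Counter(zip(stream, stream[1:]))
first = Counter(stream[:-1])

def tp(a, b):
    """Transitional probability P(b | a) estimated over the stream."""
    return pair[(a, b)] / first[a]

print(tp("bi", "da"))  # 1.0: always word-internal
print(tp("da", "ku"))  # lower: spans a word boundary
```

Positing boundaries at local minima of this statistic recovers the word boundaries of the artificial language, which is the inference the infant and adult listeners appear to be making.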
2.2 Neural Networks and Segmentation

Neural network models have been used to demonstrate that distributional information related to phonotactics can inform the word segmentation task in language acquisition. Elman (1990) demonstrates that a network trained on a phoneme prediction task can indirectly predict word boundaries. Error is high word-initially, and declines during the presentation of the word. Based on this result, he claims that error distribution in phoneme prediction could aid in the acquisition process, with high error rates indicating word onsets. In keeping with this hypothesis, Christiansen et al. (1998) and Allen and Christiansen (1996) deploy a phoneme prediction task in which a network is trained to predict the next phoneme in a sequence, where one possible phoneme is a boundary unit. The boundary unit is activated at utterance boundaries during training. The input at any given time is a phoneme—how this is represented varies across different implementations, but most modelers use a distributed representation, with each node representing certain features. The correct output is the activation of the phoneme which will be presented next in the sequence. The sequences preceding the utterance boundary will always be sequences which are legal word endings. Thus, after training, the utterance boundary is more likely to be activated after phoneme sequences which tend to end words. Allen and Christiansen (1996) train their network on 15 tri-syllabic nonsense words, concatenated together into utterances ranging between 2 and 6 words. They demonstrate that the network is successful at learning this task when the word-internal probabilities are varied, so as to provide potential boundary information. However, when the word-internal probability distributions are flat, the network fails to learn the task. Christiansen et al.
(1998) scale the training data up, demonstrating that when the network is faced with input typical of that received by infants, phonotactics can also be exploited in the segmentation task. They train the model on utterances from the CHILDES database, and demonstrate that stress patterns, utterance boundaries and phonotactics all facilitate the phoneme prediction task, and so, indirectly, the identification of word boundaries. In a phoneme prediction task such as that deployed by Christiansen et al.’s network, the activation of a boundary depends on the phoneme(s) that precede it. Consider the trained network’s behavior on presentation of the word man. Each phoneme is presented to the network, and on the presentation of each phoneme, the network attempts to predict the next phoneme. On presentation of the /n/, then, a certain prediction will occur, and the
boundary marker will receive some degree of activation. The degree of activation of the boundary unit is unrelated to whatever phoneme actually occurs next. At the time of the presentation of /n/, the network does not know whether the next phoneme is /t/, /p/, or something else. As such, a network trained on a prediction task would make the same predictions regarding the presence of a boundary in strinty and strinpy. Given that adults do make different predictions regarding the decomposability of these strings (see section 2.4), we should assume that, at least in a mature listener, the phoneme following the potential juncture is also relevant to the task of segmentation. In the next section I describe a simple recurrent network designed to demonstrate that if phonotactics is deployed for the task of speech segmentation, this strategy must have direct consequences for morphological decomposition.
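The limitation just described can be made concrete with a count-based stand-in for the prediction network (not Christiansen et al.'s actual connectionist model): because the boundary estimate is conditioned only on the preceding context, it is identical whatever segment actually follows. The training "words" and the '#' boundary symbol are invented for this example.

```python
# Sketch of the limitation of prediction-based boundary detection: a
# model conditioned only on the preceding phoneme assigns the same
# boundary probability at the /n/ of "strinty" and "strinpy", since
# the following segment has not yet been seen. Training words and the
# '#' boundary marker are invented for illustration.

from collections import Counter

training = ["man#", "ran#", "band#", "sand#"]

after = Counter()  # counts of (preceding char, following symbol)
for w in training:
    for a, b in zip(w, w[1:]):
        after[(a, b)] += 1

def p_boundary(prev):
    """P(boundary | preceding phoneme): the next phoneme is unavailable
    at prediction time, so it cannot influence the estimate."""
    total = sum(c for (a, _), c in after.items() if a == prev)
    return after[(prev, "#")] / total

# One number for both strinty and strinpy: the /t/ vs /p/ difference
# after the /n/ cannot be used.
print(p_boundary("n"))
```

A segmenter sensitive to both sides of the juncture, like the network described in the next section's text, would instead condition on the following segment as well.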
2.3 Experiment 1: A Simple Recurrent Network

Because I was interested in demonstrating the inevitability of transfer from phonologically based segmentation to morphological decomposition, I decided to train a simple network on an explicit segmentation task, and to provide it with the phonotactic information believed to be exploited in this task by adults—that is, information about the phonology on both sides of the potential juncture. The network was also provided with syllabification information, to allow it to make positionally sensitive generalizations. These strategies give the network the maximum chance of success at learning word boundaries. The network was trained on monomorphemic words of English, and then tested on multimorphemic words, to establish the degree of the transfer of learning to the task of morpheme spotting.

2.3.1 Network Architecture

A simple recurrent network was designed, with a single output node which should be activated if the current unit is the first unit of a word, and should remain unactivated elsewhere. The network was implemented using Tlearn (Plunkett and Elman 1997). The architecture is shown in figure 2.1. There are 27 input nodes, which are used for an essentially localist representation of units. The first three input nodes represent syllabic position—node one is on for onsets, node two for nuclei, and node three for codas. The remaining 24 nodes represent the different consonants of English. The network was not given information about distinctions between vowels: on every presentation of a vowel, node two was activated and no other. If the unit consists of a single consonant, just one of nodes 4–27 is activated. If the unit is a consonant cluster, the relevant component nodes are activated. This input strategy has the side effect that there are a small number of clusters that the network cannot distinguish between. The codas /ts/ and /st/, for example, have the same representation.
However, sonority constraints on sequencing ensure that the number of such “ambiguous” clusters in this representation is very small. The output layer consists of a single node, which is on if the current unit is the first unit in a word. In addition, the network has 6 hidden units and 6 context units. The small number of hidden units forces an intermediate distributed transformation of the localist
input. All links depicted in figure 2.1 are trainable, except the links between the hidden nodes and the context nodes, which are fixed at one-to-one in order to capture a copy of the activation pattern of the previous unit.

2.3.2 Training Data

The training set was a subset of the CELEX lexical database (Baayen et al. 1995), identified as being monomorphemic. This set includes all words which CELEX tags as “monomorphemic.” CELEX also includes many words tagged as having “obscure morphology” or as “possibly containing a root.” These two classes were examined independently by three linguists, and all forms identified as multimorphemic by any of the linguists were omitted from the corpus. After examining the resulting database, a number of forms identified as “monomorphemic” in CELEX were also subsequently excluded—these included reduplicative forms such as tomtom, and adjectives relating to places and ethnic groups, such as Mayan. The resulting database is a set of 11383 English monomorphemes. Calculations based
on this subset of CELEX have been reported in Hay et al. (in press) and Pierrehumbert (2001). Each word was parsed into a sequence of phonological units, consisting of onsets, nuclei and codas. The network was then trained on a single long utterance, consisting of one pass through this set of 11383 randomized monomorphemic English words. It therefore saw each word only once, and so word frequency information is not encoded in the input at all. The total number of phonological units presented was 53454, with the output node turned on for the 11383 which represented word onsets. The weights were initially randomized within the interval (−0.5, 0.5). The network was trained with a learning rate of .3 and a momentum of .9. After the presentation of each unit, the error was calculated and used to adjust the weights by back-propagation of error (Rumelhart et al. 1986).

Figure 2.1: Architecture of a simple recurrent network designed to identify word onsets.
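The input coding and target construction described above can be sketched as follows. This is a minimal illustration only: the consonant inventory, its ordering over nodes 4–27, and the toy hand-parsed words are my own assumptions, not the original Tlearn configuration.

```python
# 27 input nodes: 3 syllabic-position nodes + 24 consonant nodes.
# The consonant labels and their ordering are assumed for illustration.
CONSONANTS = list("pbtdkgfvszmnlrwh") + ["j", "th", "dh", "sh", "zh", "ch", "dg", "ng"]

POSITIONS = {"onset": 0, "nucleus": 1, "coda": 2}

def encode(position, consonants=()):
    """Localist 27-unit vector: one position node, plus one node per consonant.
    Vowels activate only the nucleus node; clusters activate several consonant
    nodes at once (so e.g. coda /ts/ and /st/ receive the same representation)."""
    vec = [0.0] * 27
    vec[POSITIONS[position]] = 1.0
    for c in consonants:
        vec[3 + CONSONANTS.index(c)] = 1.0
    return vec

def training_sequence(parsed_words):
    """One long utterance: input vectors plus a boundary target that is 1
    only at each word's first unit."""
    inputs, targets = [], []
    for units in parsed_words:
        for i, (position, consonants) in enumerate(units):
            inputs.append(encode(position, consonants))
            targets.append(1 if i == 0 else 0)
    return inputs, targets

# Two toy hand-parsed words ("print"-like and a bare vowel):
words = [
    [("onset", ("p", "r")), ("nucleus", ()), ("coda", ("n", "t"))],
    [("nucleus", ())],
]
X, y = training_sequence(words)
assert len(X) == 4 and y == [1, 0, 0, 1]
# The representational ambiguity mentioned in the text:
assert encode("coda", ("t", "s")) == encode("coda", ("s", "t"))
```

The final assertion illustrates the point made above: because clusters are coded by activating their component consonant nodes, order within a cluster is lost.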
Phonotactics and Morphology in Speech Perception
2.3.3 Results and Discussion

After training on monomorphemes, the network was tested on 515 prefixed words. The corpus comprises all words formed with one of 9 consonant-final English prefixes and containing monomorphemic bases. This is the same dataset used for the analysis in chapters 3 and 5, and described in detail in section 3.3.2. The words were concatenated into a long utterance consisting of 3462 units. During testing, the output node displayed a range of activation from .000 to .975. In the statistics reported here, I assume a threshold of .5: in all cases in which the output node has an activation level above .5, it is assumed to have located a boundary. An examination of the word boundaries in the test corpus gives an indication of how well the network learned its task. Of 515 word-initial units, the network identified a boundary 458 times; that is, it correctly identified 89% of the word boundaries. The network therefore learned the task quite successfully. As predicted, this learning transferred to the spotting of morpheme boundaries. Of the 515 morpheme boundaries present in the test set, the network hypothesized a juncture at 314 (60%). In contrast, of the 2306 units which represented neither morpheme nor word onsets, the output node fired for only 127—that is, a false-alarm rate of roughly 5%. The 60% identification rate for morpheme boundaries reflects the considerable variation in the phonotactics occurring across morpheme boundaries. Some transitions are fully legal word-internally, whereas others are more typical of inter-word than intra-word transitions. The variation in the behavior of the output node across the morpheme boundaries is significantly correlated with the phonotactics. The activation level of the output node is significantly predicted by the probability of the transition occurring morpheme-internally, and the probability of it occurring across a word boundary (r2=.25, p<.0001).
Of these, the morpheme-internal probability is the stronger and more significant predictor (coefficient=−.089, p<.0001); the inter-word probability is weaker (coefficient=.033, p<.002).1 The network was so successful at learning this task for a number of reasons. First, it was explicitly trained on the task of segmentation. Researchers interested in the acquisition of segmentation strategies have generally attempted to model how simple recurrent networks might succeed at this task with no explicit training on it. When the answer is not in the training data, the task is obviously much more difficult. In this simulation, I effectively gave away the answer in the monomorphemic training set, in order to see how the generalizations formed extended to the analysis of our multimorphemic test set. Second, the network was given information about syllabification. This greatly facilitated the learning task, by allowing the network to make generalizations at multiple levels of abstraction, and facilitating the abstraction of position-specific phonotactic restrictions. The current network also had the advantage of having at its disposal information not only about the phonotactics prior to the juncture, but also about the phonology immediately following it. Models which use a prediction task are not able to exploit this information, which limits how informative the phonotactics can be.
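The thresholding procedure used in scoring the results amounts to a simple counting function over the output node's activations. A minimal sketch (the activation levels below are invented for illustration, not the actual network output):

```python
def boundary_stats(activations, is_boundary, threshold=0.5):
    """Count hits (true boundary units whose activation exceeds the threshold)
    and false alarms (non-boundary units that exceed it)."""
    hits = sum(1 for a, b in zip(activations, is_boundary) if b and a > threshold)
    false_alarms = sum(1 for a, b in zip(activations, is_boundary) if not b and a > threshold)
    return hits, false_alarms

# Toy illustration with made-up activation levels for five units:
acts = [0.97, 0.08, 0.61, 0.02, 0.40]
bnds = [True, False, True, False, False]
assert boundary_stats(acts, bnds) == (2, 0)
```

With the .5 threshold assumed in the text, an activation of .40 at a true boundary would count as a miss, and any non-boundary activation above .5 as a false alarm.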
However, regardless of the type of task a network is trained on, and regardless of the specific nature of the input, if a network learns to spot word boundaries (whether directly or indirectly), this learning must have consequences for certain types of morpheme boundaries. Some morpheme boundaries are more boundary-like than others. For example, while the current network fails to detect a boundary in untangle (activation level: 0.082), it hypothesizes an extremely strong boundary in disquiet (activation level: 0.975). Words like uncouth were associated with juncture, but to a lesser degree (0.613). A phonotactics-based segmentation strategy for word-spotting, then, has direct consequences for morphology. We predict that affixed words behave differently according to their internal phonotactics. Those containing phonotactics which indicate a boundary are more likely to remain highly decomposed than those which do not.

1. The probabilities are calculated in a position-sensitive manner. For example, for a word like insincere, the probability within the set of monomorphemes is calculated as the proportion of all monomorphemes which contain an /n/ coda followed by an /s/ onset. Full details are given in 3.2. The probability across a word boundary is calculated using word ends. Thus, the expected probability of an /ns/ transition is calculated as the product of the probability of an /n/-final monomorpheme and the probability of an /s/-initial monomorpheme.
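The statistic described in footnote 1 can be sketched directly. The toy word representation below (sequences of position-labeled units) is my own illustration, not the CELEX encoding:

```python
def internal_prob(words, coda, onset):
    # Proportion of monomorphemes containing the given coda immediately
    # followed by the given onset (the position-specific statistic of footnote 1).
    hits = 0
    for units in words:
        if any(u == ("coda", coda) and v == ("onset", onset)
               for u, v in zip(units, units[1:])):
            hits += 1
    return hits / len(words)

def cross_word_prob(words, coda, onset):
    # Expected probability of the transition across a word boundary:
    # P(word ends in this coda) * P(word begins with this onset).
    p_final = sum(w[-1] == ("coda", coda) for w in words) / len(words)
    p_initial = sum(w[0] == ("onset", onset) for w in words) / len(words)
    return p_final * p_initial

# Four toy "monomorphemes", each a sequence of (position, segment) units:
lexicon = [
    [("onset", "f"), ("nucleus", "a"), ("coda", "n"), ("onset", "s"), ("nucleus", "i")],
    [("onset", "s"), ("nucleus", "i"), ("coda", "n")],
    [("onset", "k"), ("nucleus", "a"), ("coda", "t")],
    [("onset", "s"), ("nucleus", "o"), ("coda", "p")],
]
assert internal_prob(lexicon, "n", "s") == 0.25          # 1 of 4 words
assert cross_word_prob(lexicon, "n", "s") == 0.25 * 0.5  # n-final * s-initial
```

A transition like /ns/ in insincere thus receives both a morpheme-internal probability and an expected cross-word probability, and the two can be compared.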
English is configured in such a way that many morpheme boundaries resemble word boundaries. If listeners are able to exploit phonotactics to segment words from the speech stream, this must inevitably carry over to the task of morphological decomposition. In the next section we discuss some experimental results which bear on this issue.
2.4 Phonotactics and Morphological Decomposition

In Hay, Pierrehumbert and Beckman (in press) we provided some initial evidence that probabilistic phonotactics are used to hypothesize morpheme boundaries. We reported the results of a series of well-formedness rating tasks, in which subjects were asked to rate the well-formedness of nonsense words. The nonsense words were created by cross-splicing, and differed in the probability of a word-medial nasal-obstruent cluster. One set of stimuli, for example, embedded nasal-obstruent clusters in a striNOy template, as in strimpy, strinsy, strimfy, strinpy, etc. In the first experiment, Hay et al. demonstrate that the log probability of the medial cluster was a reasonable predictor of subjects’ well-formedness judgments. However, the results displayed a marked anomaly—impossible clusters, such as those found in strimty or zamtSer, received anomalously high well-formedness ratings, even when perceived accurately (as indicated by subjects’ spellings of the nonsense words, which they transcribed after rating). Thus, when well-formedness ratings were plotted against the probability of the phoneme transition occurring in a monomorphemic word, a boomerang shape resulted—the lowest-rated “impossible” cluster received a higher rating than clusters which were low-probability yet still possible. Hay et al. hypothesized that the strings containing impossible clusters were automatically parsed as multimorphemic, and hence were rated as such. Thus, while strimty would be an ill-formed monomorphemic word of English, it would be a perfectly acceptable compound. In the final experiment, Hay et al. provide results for the full set of
nasal-voiceless obstruent clusters in English, demonstrating that subjects’ well-formedness judgments are well predicted by the log probability of the most probable parse. When the phonology of a word makes it a highly probable monomorpheme, subjects perceive and rate it as such. When the phonology makes a multimorphemic parse more probable (either as an affixed or a compound form), subjects perceive and rate the form as multimorphemic. Figure 2.2, taken from Hay et al. (in press), demonstrates the linear relationship between the log probability of the best parse and subjects’ well-formedness judgments. The center line shows the overall fit of the data (r2=.65, p<.0005). The dashed lines on either side show separate fits for strident and non-strident obstruents. Two thirds of Hay et al.’s nonsense words began with a strident, either voiced (as in zamper) or unvoiced (as in strimpy). Thus, a long-distance effect of the Obligatory Contour Principle appears to have affected the well-formedness judgments. Stridents in parallel positions in adjacent syllables are dispreferred, lowering the overall well-formedness judgments for the affected nonsense words containing stridents. When we fit regression lines separately through the stridents and non-stridents, very high r2 values were obtained (r2=.8, p<.02 for stridents; r2=.94, p<.0001 for non-stridents). The overall well-formedness reflects a cumulative effect of the local probability of the parse, and the long-distance factor of the Obligatory Contour Principle.
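The best-parse predictor can be sketched in a single line: well-formedness tracks the log probability of whichever parse is more probable. The probabilities below are invented purely to illustrate the "boomerang":

```python
import math

def best_parse_log_prob(p_monomorphemic, p_multimorphemic):
    # Predictor described above: the log probability of the more probable parse
    # (simple vs. complex). Illustrative sketch, not Hay et al.'s actual model.
    return math.log(max(p_monomorphemic, p_multimorphemic))

# An impossible medial cluster (monomorphemic parse probability near zero)
# can still outscore a rare-but-possible one if its compound parse is likely:
impossible_cluster = best_parse_log_prob(1e-9, 1e-4)   # a "strimty"-like case
rare_but_possible  = best_parse_log_prob(1e-6, 1e-7)
assert impossible_cluster > rare_but_possible
```

This is why plotting ratings against monomorphemic probability alone produces the boomerang shape, while plotting them against the best parse yields a linear relationship.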
Figure 2.2: Figure 6 from Hay et al. (in press). Average well-formedness ratings as predicted by the log probability of the best morphological parse. The center line shows the overall fit of the data. The dashed lines
on either side show separate fits for strident and non-strident obstruents.

Hay, Pierrehumbert and Beckman’s results therefore provide some initial evidence that subjects are sensitive to phonotactic probabilities, not only across word boundaries, but also across morpheme boundaries. The experimental task does not directly invoke the notion of morphemehood, however. Subjects’ parsing strategies are inferred from an examination of the statistics governing their well-formedness judgments, in a task in which they are asked to rate overall well-formedness. Subjects are given no external reason to believe that any of the nonsense words might be morphologically complex, and it is not clear to what degree the subjects in this experiment were consciously decomposing the nonsense words. That is, Hay et al. have given us good reason to believe that phonotactics are involved in morphological parsing. Does it follow that subjects can consciously exploit this variable in a task which invokes the notion of morphemehood explicitly? Section 2.5 will describe an experiment designed to address this question directly.
2.5 Experiment 2: Phonotactic Decomposition in Morphology

As discussed above, Hay et al. (in press) provide some indirect evidence that listeners do, indeed, use phonotactic cues when assessing the degree to which a word is multimorphemic. This section describes an experiment designed to test this claim directly. When subjects are explicitly asked to judge the affixedness of specific words, do they exploit phonotactics to facilitate this task? As a first step towards creating stimuli for such a task, matched pairs of words containing the same affix were selected. The base frequency and the frequency of the derived form were controlled. Members of each pair differed in the probability of the phonotactic transition across the morpheme boundary. For example, the word bowlful has a listed lexical frequency of 0 in CELEX, and bowl has a frequency of 591. Pipeful displays similar frequency characteristics (pipeful=0, pipe=558). However, the /lf/ phonotactic transition across the morpheme boundary in bowlful is attested morpheme-internally in English (cf. dolphin, alpha, sulphur, etc.). The /pf/ transition in pipeful is not attested morpheme-internally. The prediction, therefore, is that pipeful will be more decomposable than bowlful: the /pf/ transition provides a strong cue to the morpheme juncture. We predict that this cue will be exploited in speech perception, and will facilitate decomposition. However, the affix -ful enters into a large number of low-probability phoneme transitions. Indeed, almost any affix which enters into some low-probability transitions is likely to enter into many. Thus, the affixes available for such a matched-pairs design are (as will be shown in chapter 7) extremely productive and robust. There are many highly decomposable forms containing such an affix in the lexicon, and so the affix emerges as productive and extremely “affix”-like.
Thus, even in cases in which the phonotactics suggest analysis as monomorphemic, the cumulative force of many illegal transitions in the lexicon (and thus many robustly decomposed forms containing the affix) should facilitate ready decomposition. That is, -ful may be so decomposable that a
ceiling effect ensues, and all forms containing the affix are quickly and readily decomposed. In order to avoid an experimental result in which all forms are readily decomposed, nonsense words were created which were modeled on the existing forms. Such words are, by definition, not semantically transparent. Using nonsense words also completely eliminates any effect of the frequency or semantic transparency of existing words. Basing the nonsense words on existing words allowed for the possibility of later running a parallel experiment on the existing words (cf. experiment 3). It also ensured that the comparisons subjects were asked to make reflect actual transitions occurring in the language. So rather than making a judgment between bowlful and pipeful, subjects assessed two pairs which were based on the relevant junctures: vilfim/vipfim, and jolfet/jopfet. Subjects were asked to imagine they were helping in the construction of a nonsense language, which needed to be sufficiently complex so as to have morphology. It is assumed that speakers who speak (only) English natively will bring their knowledge of English phonotactics to bear on the task of decomposing a novel string, regardless of what language the string is said to be in.

2.5.1 Materials

All nonsense words were chosen such that each first (strong) syllable is not an existing syllable of English, although both its onset and rhyme are attested. The second syllable of each word was identical across matched pairs. The first syllable and the second syllable therefore have exactly the same probability for both members of each matched pair. What differs is the probability of the transition across the two syllables. The nonsense words used are shown in table 2.1, together with the number of words containing the relevant coda-onset transition which appear in our corpus of monomorphemes. This is the same subset of CELEX which was used for the calculations reported by Hay et al.
(in press), Pierrehumbert (2001), and for the training set described in section 2.3. The probability of the phoneme transition across the morpheme boundary was calculated with reference to this set of monomorphemes. I assume the syllabification provided by CELEX, and calculate the probabilities in a position-specific manner. Codas and onsets are treated as holistic entities. Thus, for a word like zantlid, the probability within the set of monomorphemes is calculated as the proportion of all monomorphemes which contain an /nt/ coda followed by an /l/ onset. The calculation therefore reflects a simple bigram model, where the “grams” consist of holistic onsets and codas. The inter-word probability was also calculated, as the product of the probabilities of, in this example, an /nt/-final word and an /l/-initial word.2 For all of the words in the data set, the same member of each pair emerges as the more decomposable, regardless of whether decomposability is calculated by absolute (i.e. word-internal) or relative (intra-/inter-word) probabilities. All stimuli were created by cross-splicing, in order to ensure phonetic comparability across the matched pairs, and to avoid the possible disfluency which might result from having a speaker produce low-probability, or non-occurring, phoneme transitions. A female speaker of Standard American English recorded the stimuli from an IPA transcription. Initial syllables were recorded in the context _-hin. Final syllables were
recorded in the context bla-_. The stimulus vilfim, for example, was therefore created by cross-splicing from vil-hin and bla-fim. The only exceptions are the four “suffixes” which begin with /d/. In order to avoid a flapped token, these were recorded following a lateral, in the context bal-. Every matched pair therefore contains identical phonetic material in the suffix, and the first half of each member of the pair was recorded in the same environment. In all cases, the speaker recorded the stimuli with a strongly trochaic stress pattern. It follows that the resulting cross-spliced stimuli are all trochaic.

2. This assumes that the probability of one given phoneme ending one word is independent of the probability of any other phoneme beginning the next. Because words have arbitrary meanings, and coherence of meaning is the dominant factor in how words are combined, combinations across word boundaries are closer to being statistically independent than combinations within words.

Table 2.1: Experiment 2: Stimuli. [The two-column layout of this table did not survive extraction. The recoverable Word A/Word B pairs are: vilfim/vipfim, jolfet/jopfet; krolfek/krothfek, prilfis/prithfis; nizmip/nistmip, glazmig/glastmig; pruzmik/prunmik, gezmit/genmik; kredlif/krenlif, gidlu/ginlu; hothlik/honglik, blethlin/blenglin; kedlip/kendlip, fladlib/flandlib; ketlith/kelslith, protlig/prolslig; kwemnit/kweftnit, jomnish/joftnish; voknip/voshnip, beknim/beshnim; plekda/plefda, swakdib/swafdib; twekdit/twensdit, zakdip/zansdip; zamlid/zantlid, shemlidge/shentlidge; chodlish/chonlish, zidla/zinla; klimstil/klinstil, vonstig/vomstig; volstim/vonkstim, prilstit/prinkstit. The frequency-of-transition counts accompanying each word can no longer be reliably aligned with individual items.]

2.5.2 Methodology

The stimuli were tested under four conditions, of which each subject completed just two. The experiment was run at Northwestern University, in an introductory undergraduate linguistics class. All subjects were tested at the same time, with the stimuli played over a loudspeaker. Anticipating the possibility that under these conditions the acoustic stimuli might not be sufficiently distinct to produce a reliable effect, one group of subjects was given answer sheets on which an orthographic version of the stimuli appeared. These subjects both saw and heard all stimuli. A larger group of subjects made judgments based solely on the acoustic stimuli. The presentation of stimuli was divided into two sections. In the first section, subjects were asked to judge the complexity of nonsense words presented in isolation. For each word they were asked to judge it simple or complex, and then to rate the confidence of their answer on a four-point scale. Words were randomized in two blocks, where each block contained only one member of each word-pair. Half of the members with low-probability transitions appeared in block one, and half appeared in block two. Across the two blocks, there was a minimum of 13 words intervening between members of the same matched pair. In a second section, the words were presented in their matched pairs, and subjects were asked to indicate which member of the pair appeared more complex. Half the stimuli were presented with the low-probability-transition word first, and half with it second. The order of presentation was randomized. The actual instructions were recorded, and worded as below:

In this experiment, you will assist with the design of a new language, smeg, to be used in an upcoming science fiction movie. The movie will contain a number of scenes in which characters communicate using smeg. Smeg therefore needs to include basic characteristics of human language such as the ability to combine words to make meaningful sentences, and the ability to add affixes to existing words to make new, more complex words.
In this experiment, we will be interested in the structure of smeg words. You will hear a number of words, and will be asked to decide whether they are more likely to be simple or complex words of smeg. In English, for example, the word “writer” can be broken down into two units: “write,” and “er.” “er” is a unit which occurs at the end of many English words. In “writer,” “er” has been added to the word “write” to make a new, more complex word “writer.” We will call a word which has been made out of smaller units in this way, a complex word. “Reddish” is another example of a complex word in English. It can be broken down into “red” and “ish.” Words which are not complex are called simple words. Here are some examples of simple words in English: yellow, sing, table. It is impossible to break down the word “table” into smaller units. “Table” is a simple word. In the first part of the experiment you will hear some words which we are considering including in the vocabulary of smeg. We are interested in whether you think each word is more likely to be a simple word of this language, or a complex word. This is so we can identify which half of the words would make the best complex words, and which half would make the best simple words. For each word, first indicate whether you think it is more likely to be a simple word, or a complex word. Then rate from 1 to 4 how certain you are of your answer. If you feel very certain of your answer, you should circle 4. If you feel very uncertain of your answer, you should circle 1. Remember that there are no right or wrong answers, we are only interested in your intuitions. (stop tape and check they understand, play stimuli for part 1). In the next part of the experiment, you will hear the same words again. This time the words will be presented in pairs. Please indicate which of the two words you think is more likely to be complex. If you think word a is more likely to be complex, then circle a. 
If you think word b is more likely to be complex, circle b. Remember, there are no right or wrong answers, we are only interested in your intuitions. (stop tape and check they understand, play stimuli for part 2).

Each word pair occurred once in the first part of the experiment, and twice in the second part (the paired forced-choice section). In the paired presentation, the pairs were block-randomized, and counterbalanced for order of presentation. Incomplete answer sheets were excluded, along with those of subjects who claimed native competence in a language other than English. This resulted in a total of 10 analyzed subjects in group A (written and auditory stimuli), and 17 in group B (auditory stimuli only).

2.5.3 Results and Discussion

While subjects were asked to complete both a simple/complex forced-choice decision and a confidence rating, the response rate on the confidence-rating portion of the answer sheet was not consistent. Some subjects did not complete this portion at all, some consistently gave the same confidence rating for all of the stimuli, and many omitted confidence ratings for some small portion of the data. It is possible that the rate of presentation of stimuli was sufficiently rapid as to make this portion of the task unmanageable. And
given that the subjects completed the experiment within the context of a classroom setting, it is possible that some simply deemed this second portion of the answer sheet to be beyond the call of duty. The confidence ratings were therefore not analyzed in detail, and only the simple/complex answers will be presented here. When words were presented in pairs, and subjects were asked to indicate which word appeared more complex, subjects displayed a significant tendency to choose the word with low-probability phonotactics. This was true both for subjects who both saw and heard the stimuli (Wilcoxon: by items p<.005; by subjects p<.05), and for subjects who just heard the stimuli (Wilcoxon: by items p<.002; by subjects p<.05). In the condition involving both auditory and visual stimuli, 62% of responses judged the word with low-probability junctural phonotactics to be more complex; 38% judged the word with higher-probability junctural phonotactics to be the more complex member of the pair. In the condition involving auditory stimuli alone, the word with low-probability junctural phonotactics was judged more complex in 56% of responses, as opposed to 44% for the word with higher-probability junctural phonotactics. When the words were presented in isolation, subjects who both heard and saw the stimuli judged words with low-probability phonotactics to be complex at a rate of 64%. The words with higher-probability junctural phonotactics were judged complex in 53% of responses. This pattern was highly significant by items (Wilcoxon: p<.01). However, it did not reach significance by subjects, due to the relatively small number of subjects participating in this condition (10). For the subjects who just heard the stimuli, the effect was absent when the stimuli were rated in isolation. The tendency was not even in the predicted direction (64% complex responses for low-probability junctural words; 66% complex responses for higher-probability junctural words).
Of the four conditions, this last condition is the most difficult, and so it is perhaps not surprising that the effect failed to show up here. Indeed, if any of the conditions were to fail to reach significance, this is the one we would expect. Given that subjects who could also read the stimuli were able to rely on the phonotactics even when stimuli were presented in isolation, this last condition would be likely to succeed if subjects were brought into the lab. If stimuli were presented auditorily, in a context in which the clarity of the signal could be better controlled, we expect that this final condition would return a significant result. Based on the significant results in the other three conditions, however, this experiment has yielded convincing evidence that speakers do (or at least, that they can) use phonotactics as a basis for positing a decomposed analysis. The results of this experiment are analyzed in slightly more detail in Appendix A, in the context of a discussion of the specific type of statistic subjects may be using for segmentation. There I demonstrate that normalized coda-onset probabilities (specifically, mutual information statistics) are approximated fairly well by non-normalized probabilities, and that subjects appear to be using a segmentation strategy which exploits the latter, much simpler, statistic. This discussion has been relegated to an appendix, because it takes us too far away from the main point of this chapter: that the phonotactics of a morphologically complex word will affect the likelihood that the word will be decomposed during access.
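The by-items comparisons above use the Wilcoxon signed-rank test. A bare-bones version of the statistic can be sketched as follows; this uses a normal approximation without tie or continuity corrections, and the per-item response proportions are invented for illustration:

```python
import math

def wilcoxon_signed_rank(x, y):
    """Signed-rank statistic W+ and its normal-approximation z score.
    Zero differences are dropped and tied |differences| are ranked in sort
    order; a real analysis would use a tie-corrected implementation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    by_magnitude = sorted(diffs, key=abs)
    w_plus = sum(rank for rank, d in enumerate(by_magnitude, start=1) if d > 0)
    n = len(diffs)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, (w_plus - mean) / sd

# Made-up per-item proportions of "complex" responses for the low- vs.
# high-probability member of six hypothetical pairs:
low  = [0.70, 0.65, 0.80, 0.55, 0.60, 0.75]
high = [0.40, 0.50, 0.45, 0.60, 0.35, 0.50]
w, z = wilcoxon_signed_rank(low, high)
assert w == 20 and z > 0  # low-probability members rated complex more often
```

A positive z here corresponds to the reported pattern: across items, the low-probability member of each pair attracts more "complex" judgments.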
2.6 Summary In this chapter we have seen that the use of phonotactics in segmentation has direct consequences for morphological decomposition. First, we trained a simple recurrent network on the task of spotting word boundaries. The network was trained with monomorphemic words of English. When tested on prefixed words, the segmentation strategy generalized. The boundary unit fired at 60% of the morpheme boundaries in our corpus of prefixed words. The behavior of the network, then, strongly suggests that the use of phonotactics to segment words from the speech stream will not leave morphology untouched. We then examined the results of an experiment designed to investigate the degree to which subjects could exploit phonotactics in the task of deciding how complex nonsense words sounded. This experiment, together with the results reported in Hay et al. (in press) provides conclusive evidence that subjects exploit phonotactic probabilities in the decomposition of nonsense words into morphemes. This result is consistent with the mounting literature which demonstrates that both children and adults exploit phonotactic probabilities in the extraction of words from the speech stream. If there exists a mechanism which monitors phonotactic probabilities for the purpose of segmenting words from the speech stream, it is a small leap to assume that this mechanism is involved in morphological decomposition as well. Indeed, as demonstrated by the neural network simulation, it would be extremely surprising if it were involved at one level but not another. Taken together, the results presented in this chapter strongly suggest that the use of phonotactics as a speech segmentation strategy carries over to morphological processing. Recent results reported by Vitevitch and Luce, however, demonstrate that the effects of probabilistic phonotactics can differ markedly across word and non-word stimuli (Vitevitch and Luce 1998, 1999). 
In chapter 3, then, we turn to an investigation of real words. What consequences, if any, do the results reported in this chapter have for the morphological decomposition of real words?
CHAPTER 3 Phonotactics and the Lexicon
In chapter 2 we saw that learning to segment words based on phonotactics is likely to have consequences for morphological decomposition. Furthermore, we saw experimental evidence that phonotactics can be used, online, to decompose nonsense words into morphemes. In this chapter we examine the consequences of these results for the processing and representation of real words. If phonotactics affects online morphological decomposition, then we expect to see reflexes of this in the lexicon. Words with fully legal junctural phonotactics are in a good position to undergo semantic drift. Their phonotactics do not contain a cue to the existence of a juncture, and so, if other circumstances permit, the juncture itself may gradually disappear, and the relationship between the derived form and its base may become increasingly opaque. Thus, we expect words like insincere to display different tendencies in the lexicon than words like inhumane. In this chapter I present the results of an experiment in which subjects were presented with pairs of real words, and asked to indicate which appeared to them to be more “complex.” Subjects displayed a significant preference for words with low-probability junctural phonotactics. I then present several different types of evidence which demonstrate the long-term effects of this in the lexicon. Prefixed words with fully legal transitions (e.g. insincere) are less semantically transparent, more polysemous, less “prefixed,” and display different frequency characteristics than words with illegal transitions (e.g. inhumane). These results are absent for suffixed words—a result which reflects the left-to-right nature of lexical access.
3.1 Experiment 3: Phonotactics and Morphological Complexity Experiment 2 demonstrated that subjects can use phonotactics to segment nonsense words into “morphemes.” We used nonsense words in that experiment to avoid a possible ceiling effect that might ensue with actual existing words. For example, -ful is a fairly productive affix, and it may be the case that nearly all words containing it are regarded as highly complex, resulting in a ceiling effect, such that degrees of complexity are difficult to discern. Encouraged by the results of the experiment with nonsense words, however, we decided to run a simple experiment to test whether subjects behaved similarly when presented with pairs of existing English words. Does pipeful, for example, appear to be
more complex than bowlful? If so, this would ratify the results of experiment 2, and considerably strengthen our prediction that the lexicon should display consequences of this phonotactics-based segmentation strategy. This question is also important in light of recent evidence that the effects of probabilistic phonotactics may differ across word and non-word stimuli (Vitevitch and Luce 1998, 1999). 3.1.1 Materials and Methodology Forty-six pairs of affixed words were constructed—23 prefixed and 23 suffixed pairs. The suffixed pairs included all examples underlying the construction of nonsense stimuli in experiment 2, together with seven additional pairs. Members of each pair were matched for stress pattern, syllable count, affix identity, surface frequency and base frequency. They differed in the probability of the phoneme transition across the boundary, as calculated by the same method described for experiment 2. The suffixed and prefixed stimuli are listed in tables 3.1 and 3.2 respectively. The “A” members of each pair contain junctural phonotactics which are more likely to occur morpheme-internally than those contained in the “B” member. The listed frequencies are lemma frequencies from CELEX—the number of times that word, together with its inflectional variants, occurred in a corpus of 17.9 million words. The number of words containing the relevant coda-onset transition in the corpus of 11383 monomorphemes is given in the column labeled phon.1 These pairs of words were randomized, counterbalanced, and presented together with 30 filler pairs. The fillers paired together pseudo-affixed and affixed words. Example filler pairs include defend—dethrone, indignant—inexact, family—busily and adjective—protective.

1. Vowel-vowel counts were calculated using the specific vowel onset, rather than a more general no-onset calculation.

Word A        freq  base freq  phon. | Word B        freq  base freq  phon.
bowlful          0        591     11 | pipeful          0        558      0
skilful        124       1456      9 | youthful       144       1314      0
amusement      197        470     14 | adjustment     234        468      0
amazement      179        300     14 | attainment     153        234      1
lidless          1        331      9 | hornless         1        323      1
pathless         3       1095      3 | wingless         4       1036      0
crudely         55        378      9 | fondly          54        416      1
sweetly         55        828      9 | falsely         39        785      0
dimness         19        296     12 | swiftness       19        221      0
meekness         7         41     11 | brashness        7         32      2
sheikdom         5         41      2 | serfdom          4         23      0
dukedom          7        727      2 | princedom        2        696      0
lamblike         0        374      8 | saintlike        0        320      2
threadlike       3        287      9 | hornlike         1        323      1
punster          0         33     15 | rhymester        0         86      2
pollster         4         45      5 | prankster        4         22      0
jobless         17       5966      5 | friendless      17       6386      1
endorsement     45        176      7 | enchantment     46        115      1
loudly         330        531      9 | vaguely        354        454      2
neatness        33        531      5 | smoothness      34        613      0
blissful        23        146      7 | lustful         22        184      0
rudeness        40        234      4 | frankness       44        172      0
glumness         2         34     12 | glibness         2         33      2

Table 3.1: Experiment 3: Suffixed Stimuli

Word A        freq  base freq  phon. | Word B         freq  base freq  phon.
entomb          11        456     37 | entrap           16        533     20
insincere       22        154     96 | inhumane         28        233      3
uncork           7         98     19 | unhinge          12         64      3
mismanage        9       2289      7 | mishandle         8       1193      0
miscast          9        737     16 | misrule           7        658      3
substandard      7       1846      8 | subnormal        10       1649      2
uptown           7       4005     19 | upstage           5       2933      1
upturn          14      10372     19 | upstart          17       8092      1
de-populate     10        100    292 | de-escalate       6        100     20
regenerate      47        662    148 | reorganize       61       1118      4
unhook          23        299      3 | unwind           30        293      0
expope           0        106     17 | exnun             0        187      0
imperfect       50         94     98 | infrequent       44         49      4
unfit           77        373     27 | unkind           72        390     19
mislay          34       2167      2 | misuse           63       2216      0
substation       3       2093      8 | substructure      8       1879      3
transpose       24        496      7 | transfix         28        725      0
uphill          33       2127      2 | upstream         36        863      0
de-salt          1        666    356 | de-ice            0        944      0
remold           5        204    555 | rescript          0        318      9
retouch         12       1967    640 | reuse            15       2216      5
unfold         122        685     27 | unload          122        493      1
expriest         0        873      3 | exqueen           0        951      1

Table 3.2: Experiment 3: Prefixed Stimuli

The complete set of stimuli consisted of 76 word pairs. For each pair, subjects were asked to indicate which member of the pair they considered more “complex.” The exact instructions were as follows.

This is an experiment about complex words. A complex word is a word which can be broken down into smaller, meaningful, units. In English, for example, the word writer can be broken down into two units: write and -er. -er is a unit which occurs at the end of many English words. In writer, -er has been added to the word write to make a new, more complex word writer. We call a word which has been made out of smaller units in this way a complex word. Rewrite is another example of a complex word in English. It can be broken down into re- and write. Words which are not complex are called simple words. Here are some examples of simple words in English: yellow, sing, table. It is impossible to break down the word table into smaller units. Table is not complex. In this experiment, you will be presented with pairs of complex words, and asked to decide which one you think is more complex. For example, happiness is very complex—it can be easily broken down into happy and -ness. Business, however, is not quite so complex. While it is possible to break business down into busy and -ness, it does not seem completely natural to do so. Business is complex, but not as complex as happiness. Another example of a complex word is dishorn. Even though you may never have heard the word dishorn before, you can understand its meaning, because it can be broken down into dis- and horn. Discard is also complex—it can be broken down into dis- and card. But discard does not seem as complex as dishorn. We do not need to break discard into its parts in order to understand its meaning, and, in fact, it seems slightly unnatural to do so. For each pair of words below, please read both words silently to yourself, and then circle the word you think is more complex. It is very important that you provide an answer for every pair, even if you are not certain of your answer. Just follow your intuition, and provide your best guess.

The experiment was completed using pen and paper, and subjects worked at their own pace. 30 Northwestern University undergraduates completed the task, in fulfilment of a course experimental requirement.
3.1.2 Results and Discussion Any subject who did not provide the same answer (in either direction) for at least twenty of the thirty fillers was excluded from the analysis. Five subjects were excluded on these grounds. Twenty-five subjects were therefore analyzed. Of these, three interpreted “complex” in the opposite manner from that intended. This could be seen from their consistent behavior on the filler items (i.e., they rated family more complex than busily, adjective more complex than protective, and so on). This consistent behavior indicates that their confusion was a terminological one, rather than a conceptual one, and so their data were included, with their answers reversed. Subjects displayed a significant tendency to rate the word with the less probable phonotactics as more complex (Wilcoxon, by subjects p<.05). This confirms the results of experiment 2, which demonstrated the same effect with nonsense words. When we break the results of the prefixed and suffixed forms down separately, the suffix data is highly significant by subjects (Wilcoxon, p<.005), and approaches significance by items (p<.08). However, the prefixed data does not approach significance. While 56% of responses to suffixed forms favored the form with low probability junctural phonotactics (cf. 44% favoring the form with more probable phonotactics), the responses were exactly evenly split at 50% for the prefixed forms. This is unlikely to be due to some deep fact about the role of phonotactics in prefixes and suffixes, but rather is likely to be a reflex of the specific stimuli used. In order to control for base and derived frequency, I was forced to include some rather marginal pairs in the prefixed set. The suffixed set was better controlled, both in terms of the nature of the phonotactic contrasts, and also in terms of the transparency of the semantics.
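The Wilcoxon signed-rank statistic used for these paired comparisons can be sketched as follows. This is an illustrative reimplementation on invented toy differences, not the experiment's response data; in practice the resulting W is referred to a significance table or a normal approximation.

```python
def wilcoxon_w(diffs):
    """Wilcoxon signed-rank statistic for paired data: drop zero
    differences, rank the absolute differences (average ranks for ties),
    and return the smaller of the positive- and negative-rank sums."""
    nonzero = sorted((d for d in diffs if d != 0), key=abs)
    n = len(nonzero)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nonzero[j + 1]) == abs(nonzero[i]):
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average rank for the tie group
        i = j + 1
    w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    return min(w_plus, w_minus)

# Toy per-subject differences: mostly positive differences (a consistent
# preference in one direction) yield a small W.
print(wilcoxon_w([0.2, 0.1, 0.3, -0.1, 0.25, 0.15]))  # → 1.5
```

A small W relative to the number of pairs indicates that the differences are consistently signed in one direction, which is what a by-subjects preference for the low-probability member amounts to.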
Thus, while all of the suffixed pairs have relatively transparent semantics, the prefixed set includes many words which are highly lexicalized (cf. upstream, uphill, upstage, uptown, transpose, transfix, entomb, entrap, etc). The meanings of these words are not a simple function of their parts. Thus, the added factor of the variability in terms of the subtle semantic relationships between the whole and the parts is highly likely to have affected subjects’ responses to the task. It is likely no accident that it was easier to control away from these factors in the suffixed set than in the prefixed set. Because of the left-to-right nature of speech processing, prefixed words favor direct access, and so we predict they are likely to be particularly prone to semantic drift. In addition, difficulties arise with controlling relative frequency while manipulating phonotactics in prefixed words, because the two are not independent (see section 3.3.3). The fact that suffixed words are more likely to be decomposed on access leads them to be more likely to retain transparent semantics. In the suffixed set, then, I was able to wield sufficient control over relevant factors for a significant effect of the phonotactics to emerge. Thus, while this experiment displayed a stronger effect of phonotactics across suffixal boundaries than prefixal boundaries, this should not be interpreted as evidence that phonotactics are more important at the end of the word than at the beginning of the word. On the contrary, one particular difficulty associated with finding appropriate prefixed words for this experiment stems from the fact that phonotactics are very important across
prefix boundaries, leading prefixed words with legal phonotactics to take on different properties than prefixed words with illegal phonotactics. Once we start to try to manipulate phonotactics in prefixed words, it is difficult to hold other factors constant. To see this, we now turn to an exploratory investigation of the role of phonotactics in lexical organization. Most of our evidence will come from three disparate sources: the CELEX lexical database, Webster’s 1913 Unabridged Dictionary, and an appendix published in Wurm (1997), with prefixedness and semantic transparency ratings for a range of prefixed words. We will jump backwards and forwards between these sources a little, as we examine the various phenomena one at a time. That is, the discussion below is organized, not according to the source of the information, but rather according to its type. Before moving to the evidence itself, the nature of the calculations is described.
3.2 Calculating Juncturehood The discussion below involves an investigation of the relationship between phonotactics and various measures hypothesized to be related to decomposability. As described in the previous chapter, the extent of juncturehood of a phoneme transition is established by running calculations over a subset of the CELEX lexical database—a subset which has been identified as monomorphemic. For affixed words, the probability of the phoneme transition across the morpheme boundary was calculated, with reference to the set of monomorphemes. I assume the syllabification provided by CELEX, and calculate the probabilities in a position-specific manner. Codas and onsets are treated as individual entities. Thus, for a word like insincere, the probability within the set of monomorphemes is calculated as the proportion of all monomorphemes which contain an /n/ coda, followed by an /s/ onset. And for transfix, the probability is assessed as the proportion of monomorphemes which contain an /ns/ coda followed by an /f/ onset. Coda-less syllables (and likewise, onsetless syllables) are treated together. Thus the probability of the transition in prefix is calculated as the probability of no coda followed by an /f/ onset. As such, the probability of the transition in prefix would be considered identical to the probability in pro-ferret. Clearly this over-simplifies the situation slightly, particularly by collapsing together short and long vowels, but it suffices as a reasonable approximation. A further simplification is made by always running the calculation across a syllable boundary. This ensures that the same universe of comparison is relevant to all calculations: they are all coda-onset calculations. This is true even when the morpheme boundary is blurred, or is actually syllable-internal. The word transalpine, for example, is syllabified by CELEX as tran][salpine. Here I calculate the probability of an /n/ coda followed by an /s/ onset.
And for i][legible, the probability is calculated as the probability of no coda, followed by an /l/ onset. Thus, while the probability of words involving resyllabification does not necessarily involve the transition over the morpheme boundary per se, the probability is always fairly high, relative to other words containing the same affix. They therefore always appear among the words displaying the most probable phonotactics—a ranking similar to that which would have resulted by calculating the syllable-internal probabilities. For the vast majority of the words we will be considering,
the morpheme boundary does occur at a syllable break, and so this seemed a reasonable compromise position. These word-internal probabilities are not normalized for expected probability of occurrence. I follow Cairns et al. (1997) in assuming a strictly bottom-up segmentation strategy. The reasons for this, a re-analysis of the data reported in this chapter with normalized probabilities, and a discussion of the implications are given in Appendix A. For convenience, many of the calculations presented below crudely divide words into two groups—those for which the junctural phonotactics could never be found in a monomorphemic word, and those for which the junctural phonotactics are attested morpheme internally. Any probability calculation, including n-gram models, conditional probabilities, mutual information, and any calculation of over- or under-representation, would draw the line between “legal” and “illegal” in the same place. My specific choice of probability statistic therefore does not affect the majority of evidence put forward in this chapter, nor the main argument. The phonotactics of a morphologically complex word can affect its lexical representation.
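The coda-onset calculation described in this section can be sketched in a few lines. The sketch below is a minimal illustration, not the code actually used over CELEX: the toy lexicon and the representation of syllables as (onset, nucleus, coda) triples are invented for the example.

```python
from collections import Counter

def transition_word_counts(lexicon):
    """For each coda-onset pair, count how many monomorphemes contain it.
    Each word is a list of syllables; each syllable is an (onset, nucleus,
    coda) triple, with "" for a missing onset or coda, so that prefix-like
    and pro-ferret-like words receive the same no-coda + /f/ transition."""
    counts = Counter()
    for word in lexicon:
        pairs = {(coda, onset)
                 for (_, _, coda), (onset, _, _) in zip(word, word[1:])}
        counts.update(pairs)  # count each transition once per word
    return counts

# Toy "monomorpheme" lexicon (invented forms, not real CELEX transcriptions):
lexicon = [
    [("", "a", "n"), ("s", "@", "")],   # contains an /n/ coda + /s/ onset
    [("p", "i", ""), ("t", "@", "")],   # contains only a no-coda + /t/ onset
]
counts = transition_word_counts(lexicon)
# Proportion of monomorphemes containing the /n/-coda + /s/-onset juncture:
prob_ns = counts[("n", "s")] / len(lexicon)
print(prob_ns)  # → 0.5
```

Under this scheme, a junctural probability of zero is exactly the "illegal" case used in the two-way split above: the transition never occurs inside any monomorpheme.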
3.3 Prefixes 3.3.1 Prefixedness Wurm (1997) reports on two experiments designed to investigate the nature of processing of prefixed words. He uses his results to argue that compositional and non-compositional processes are active in processing. One of the explanatory variables in Wurm’s model is the “prefixedness” of prefixed words. Wurm’s words are assigned prefixedness ratings, based on a pre-experiment rating task. The ratings were collected in the following way: A Likert scale with anchor points “Not at all prefixed” (1) and “Very prefixed” (8) was used. Participants were free to decide for themselves what “prefixed” means; no example or further instructions were given. (Wurm 1997:441) While Wurm uses the resulting ratings as an explanatory variable, he does not reflect upon what subjects might be doing when rating “prefixedness.” Prefixedness ratings are not simply semantic transparency ratings. We know this because Wurm also collected semantic transparency ratings, and they were not the same. The words concave, disfigure and engulf, for example, are rated as highly prefixed, but have low semantic transparency. I conducted a post-hoc analysis of Wurm’s stimuli, which are provided in the appendix of his paper, in order to determine whether there was any correlation between the probability of the phoneme transition over the morpheme boundary and subjects’ prefixedness ratings. That is, while disfigure is not highly semantically transparent, perhaps the low probability /sf/ transition plays an active role in maintaining the apparent prefixedness of this form. Its phonotactics indicate that it is highly decomposable. Wurm’s list contains a few pseudo-prefixed words—words which begin with segmental material which could conceivably be a prefix, but which are not actually
prefixed. Examples include increase and disgust. Words which are not truly prefixed are unlikely to contain illegal phonological transitions. Their inclusion in the correlation reported here may artificially enhance the strength of the correlation—they are both not prefixed, and have high probability phonotactics. I therefore included in this analysis only those forms which gained a score of above 4 on either prefixedness or semantic transparency. Thus, while antibody is not regarded as semantically transparent, it is regarded as fully prefixed, and so is included in the calculation. And while twilight is not regarded as particularly prefixed, it is rated as reasonably semantically transparent, so this, too, is included in the calculation. There are, however, seven words from Wurm’s total of 74 which were rated four or below on both scales. The words are perfume, increase, disgust, belong, aspire, acquire and abridge. These forms were therefore omitted on the basis of pseudo-prefixedness. We now turn to an investigation of the role of the phonotactics across the morpheme boundary in predicting perceived prefixedness. Indeed, the log probability of the phoneme transition morpheme internally is weakly negatively correlated with perceived prefixedness (Spearman’s rho=−.26, p<.05). The less likely a phoneme transition is, the more likely the word containing it is to appear prefixed. As can be seen from figure 3.1, the correlation involving intra-word probability is weak, and accounts for only a small amount of the overall variation. While this result is certainly indicative, one should interpret it with some caution, as the correlation is strongly influenced by one point—the word twilight, which appears at the bottom right of the graph. When we remove this point, the correlation drops below significance (p<.07). We can preliminarily conclude that phonotactics are weakly correlated with prefixedness ratings. There are two possible interpretations of this result.
One is that the processor facilitated decomposition of those words which had low probability transitions, and so, in this particular encounter, those words appeared more prefixed than words with legal transitions. That is, this result could be telling us something about the access strategy, akin to the experimental result on nonsense words presented in chapter 2. Just as low probability transitions led subjects to rate nonsense words as more affixed in the nonsense words experiment, so too did low probability transitions lead subjects to rate actual words as more prefixed in the rating task reported here. If this interpretation of the results were an accurate explanation of the subjects’ behaviour, then this post-hoc analysis would provide further evidence in favor of the claim that phonotactic probabilities are used in the online parsing of words. There is a second interpretation of this correlation. This interpretation is related, but focuses more on the way that words have been lexicalized. The representation of prefixed words which contain low probability transitions is likely to be highly decomposable. Thus, even if words were presented visually, we would expect phonologically well-formed words to be rated less prefixed than words with violations across the morpheme boundary. The result of a mechanism that facilitates a decomposed parse is a highly decomposed representation. Of course, because Wurm’s stimuli were presented auditorily, this result is likely a combination of the two explanations posited above. We know from the experiment with nonsense words that a parsing strategy involving phonotactics exists. And we will see from the results in the next section that this has a lasting effect upon representation.
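The Spearman rank correlations reported in this section are straightforward to compute: rank both variables, then take Pearson's correlation of the ranks. The sketch below is an illustrative implementation on invented toy numbers, not Wurm's data.

```python
def ranks(xs):
    """Average ranks (1-based), assigning tied values their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the tied positions
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed over the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# A perfectly inverse ranking gives rho = -1: the lower the log
# probability, the higher the (toy) prefixedness rating.
print(round(spearman_rho([-6, -4, -2, -1], [7.5, 6.0, 3.2, 2.1]), 2))  # → -1.0
```

Because only ranks enter the calculation, the statistic is insensitive to the highly skewed distribution of raw transition probabilities, which is one reason a rank correlation is a natural choice here.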
Phonotactics affects not only “prefixedness,” but also semantic transparency, polysemy, and semantic drift.
Figure 3.1: Log probability of the transition, with average prefixedness scores. The line shows a nonparametric scatterplot smoother. 3.3.2 Semantics In this section we will see three different types of evidence that phonotactics affects the semantic representation of complex words. The first type of evidence comes from post-hoc analysis of semantic transparency ratings, collected by Wurm (1997). The second and third types of evidence come from correlations computed over Webster’s 1913 Unabridged Dictionary, regarding the number and the nature of definitions listed for complex words. Semantic Transparency Ratings: Wurm (1997) As discussed in the previous section, Wurm (1997) conveniently lists in his appendix a set of numbers which are very useful for our purposes. One of the types of information
Wurm collected from his subjects was semantic transparency ratings for a set of 74 prefixed verbs. The procedure is described as follows. …on each trial, participants heard a full-form and its corresponding root and were asked to rate “…how related in meaning the two words are” on an 8-point Likert scale. Anchor points were labeled “Not at all related” (1) and “Very related” (8). On half of the trials, the full-form was presented first, and on half the root was presented first. (Wurm 1997:441) In this task subjects were given longer to respond than in the prefixedness rating task—4000 ms as opposed to 2500 ms. The nature of the question also requires complete lexical access of both the full-form and the root, and contemplation of their relation in meaning. Thus, semantic transparency ratings are less likely to be the direct result of a parsing strategy than prefixedness ratings. We do, however, still expect to see an effect of the phonotactics—the effect of phonotactics upon the representation. Words with low probability transitions across the morpheme boundary are highly likely to be parsed as decomposed, and so the relation between the derived form and its base is likely to remain very robust. Words with high probability transitions, however, are likely to be accessed whole, and so are not tied to the representation of the base. As a result, we might expect semantic drift to occur, and the relationship between the derived form and the base to become increasingly opaque. Thus, while we may not expect semantic transparency ratings to directly reflect the nature of a specific parse of a word, we do expect them to reflect a lexical representation which has been shaped by an accumulation of previous parses. As such, we expect to find a correlation between semantic transparency and the probability of the phoneme transition across the morpheme boundary.
If this correlation holds true, then this will provide more direct evidence than the prefixedness ratings that phonotactics have a lasting effect on lexical representation. The effect of phonotactics upon semantic transparency was investigated, omitting the same seven pseudo-prefixed forms described in section 3.3.1. The log probability of the phoneme transition occurring morpheme-internally was a significant predictor of semantic transparency ratings (Spearman’s rho=−.35, p<.005), as shown in figure 3.2. More probable phoneme transitions were associated with lower transparency ratings.
Figure 3.2: Log probability of the transition, with average semantic transparency scores. Spearman’s rho=−.35, p<.005.

Closer examination of figure 3.2 reveals that the predictable variation in this graph is among words with high semantic transparency ratings. If a word is thought to be semantically transparent, then the phonotactics will affect the degree of semantic transparency. Among non-transparent words, however, phonotactics have no effect. The graph is repeated in figure 3.3, this time with a regression line fit only through the words scoring above 4 in semantic transparency. The log probability of the phoneme transition occurring morpheme-internally is a significant predictor of semantic transparency ratings (Spearman’s rho=−.50, p<.002). There is no correlation at all between the phonotactics and the semantic transparency rating for low transparency words—that is, for words with a semantic transparency rating of 4 or less (Spearman’s rho=.02, p<.9). Semantic transparency ratings and phonotactics are related. Of course, even this result may tell us more about processes of speech perception than anything about lasting effects on lexical representation. The stimuli were presented auditorily, and so, according to our predictions, the decomposed route was advantaged for those words containing low probability phoneme transitions. Once a decomposed route has been accessed, presumably the base and the derived form are perceived as very related—the connection
between them is very salient. Semantic commonalities are likely also to be salient. As such, we might expect semantic transparency ratings to be higher for words with phonotactic violations simply because of the access strategy. That is, there is some possibility that this result, too, is a function of processes of speech perception, and not a result about the nature of semantic representations. In order to more directly probe the relationship between phonotactics and semantics, then, we move away from experimental tasks which involve rating on various dimensions, and instead turn to an examination of the distribution of properties across the lexicon. Polysemy If it is the case that phonologically well-formed complex words tend towards lower semantic transparency than phonologically ill-formed words, then a prediction about polysemy follows. If a word can drift in meaning, it can presumably also proliferate in meaning. So, while inhumane-type words (with low probability phonotactics) are presumably restricted to the meanings associated with the base humane, words like insincere contain fully legal phonotactics, and so are free to acquire additional meanings, which are not directly associated with the base. The most accurate measure of such an effect would be to compare the number of meanings a derived form has, relative to its base, for inhumane-type words versus insincere-type words. This would be an extremely time-consuming task to do by hand, and slightly awkward to automate. The effect described above, however, should have an overall result: insincere-type words are associated with more meanings, period.

Figure 3.3: Log probability of the transition, with average semantic transparency scores. Scatterplot smoother is fit only through words with semantic transparency ratings greater than 4. Spearman’s rho=−.50, p<.002.

Here I will assume that there is no bias in the lexicon such that monomorphemic words beginning with certain phonemes have, on average, more meanings than words beginning with any others. Assuming no effect upon polysemy of the identity of an initial phoneme, we do not need to compare meanings of derived forms with the meanings of their bases. Rather, we will just concentrate our calculations upon the derived forms themselves. As an initial investigation of this question, a set of matched pairs was chosen, which were matched for prefix identity and for the relative frequency of the derived form and the base. For each pair, the phonological transition was more probable in one member than the other. The list of words, with the frequency of the derived form and the base, is given in table 3.3, together with the number of definitions. The words in the leftmost
column contain a phoneme transition across the morpheme boundary which is more probable, word-internally, than their matched pairs occurring in the right-hand column. Each of the derived forms was located in the online Oxford English Dictionary, and the number of meanings listed was counted. Meanings listed as “obscure” were excluded from the count. As predicted, words with legal phonology across the morpheme boundary had significantly more meanings associated with them than their counterparts containing less probable phonotactics (Wilcoxon, p<.01). While the words containing probable junctural phonotactics averaged 3.08 listed definitions, the words containing relatively less probable phonotactics averaged 1.96 listed definitions. This controlled set therefore provides a clear initial indication that the expected effect exists. The effect reported here is, however, based only on 24 pairs of words, leaving open some possibility that the result is a coincidence, and not a genuine reflection of an overall trend in the lexicon. I therefore moved away from the online OED to a downloadable ASCII dictionary: Webster’s 1913 Unabridged English Dictionary. The existence of an ASCII dictionary allows for large scale calculations over large sets of words. Obviously there is not a one-to-one relationship between mental representations and the content of dictionary entries. However we do expect that, overall, words which are truly highly polysemous will be associated with high numbers of definitions in the dictionary. In this sense, there may be some serendipitous advantage to using a dictionary which was compiled in 1913. The entries are likely to represent a closer reflection of the dictionary writer’s mental representations than more modern-day dictionaries which may be subject to stricter conventions and consistency of content. Clearly phonotactics will not be the only factor related to the polysemy of complex words.
In particular, as will be demonstrated in chapter 5, lexical frequency issues are also relevant. However, if we run our calculations over a large enough set of data, we expect there to be an overall trend. Phonotactically ill-formed prefixed words should be associated with fewer meanings than phonotactically well-formed prefixed words.

Word A         freq  base freq  defs | Word B          freq  base freq  defs
desalt            1        666     1 | deice              0        944     1
entomb            5        337     3 | entrap            11        456     2
entrench         14        181     3 | enthrone           6        189     2
imperfect        50         94    12 | infrequent        44         49     2
insincere        22        154     2 | inhumane          28        223     1
miscast           9        737     1 | misrule            7        658     2
mismanage         9       2289     1 | mishandle          8       1193     1
regenerate       47        662     5 | reorganize        61       1118     1
retouch          12       1967     2 | reuse             15       2216     1
substandard      17       1846     3 | subnormal         10       1649     1
substation        3       2093     3 | substructure       8       1879     1
uncork            7         98     2 | unhinge          112         64     4
unfold          137        685     5 | unload           122        493     6
uptown            7       4005     3 | upstage            5       2933     2
transpose        24        496     2 | transfix          28        725     1
mistrust         35        858     4 | misjudge          30        718     1
incompatible     94        102     4 | inhospitable      36         44     2
indignity        56        453     2 | ingratitude       13        197     1
untangle         15        155     2 | unscramble         8        263     2
uncivil           3       1209     4 | unquiet            6       1602     3
ungenerous       17        457     1 | ungrateful        36        431     2
unveil           52       1656     4 | unmask            24        354     5
unsaddle          6        177     3 | unbuckle          12         75     2
misconceive      20        361     2 | misdirect         19       1472     1

Table 3.3: Matched pairs which differ in junctural phonotactics, with the number of definitions listed for each word in the OED.

First, nine prefixes were chosen from CELEX, and all forms listed in CELEX as containing these prefixes were extracted. CELEX codes words as prefixed only when there is a corresponding root entry. That is, words like transfer are not coded as prefixed, but rather as containing “obscure morphology.” The CELEX entries provide frequency information for the entire form, as well as a morphological decomposition, which allowed for the automatic retrieval of the frequency of the base, as will become important in chapter 4. CELEX also provides a syllabified phonological transcription for each form, which could be used (in connection with the corpus of monomorphemes described in section 3.3.1) to tag each word with the observed and expected probability of the phoneme transition across the morpheme boundary. Only prefixed forms with monomorphemic bases were included, and duplicate headwords were deleted. In cases of multiple entries, only the most frequent was included. Second, for each of the 515 prefixed words in this set, the Webster’s dictionary was consulted, and the number of definitions listed was retrieved. Two example entries from the Webster’s dictionary are given below. Some tags relating to pronunciation are removed, to make the entry easier to read. Disk <def>A discus; a quoit. <def>A flat, circular plate;
as, a <ex>disk of metal or paper. <def>The circular figure of a celestial body, as seen projected on the heavens. <def>A circular structure either in plants or animals;
as, a blood <ex>disk; germinal <ex>disk, etc. <def>The whole surface of a leaf. <sd>(b) <def>The central part of a radiate compound
Phonotactics and the Lexicon
47
flower, as in sunflower. <sd>(c) <def>A part of the receptacle enlarged or expanded under, or around, or even on top of, the pistil. <def>The anterior surface or oral area of coelenterate animals, as of sea anemones. <sd>(b) <def>The lower side of the body of some invertebrates, especially when used for locomotion, when it is often called a <xex>creeping disk. <sd>(c) <def>In owls, the space around the eyes.

Diskless <def>Having no disk; appearing as a point and not expanded into a disk, as the image of a faint star in a telescope.

Completely distinct definitions are listed between distinct def tags. Within each set of def tags, different shades of the same meaning are separated by semicolons. Note that the entry for disk contains a large number of meanings. The multimorphemic diskless, however, contains relatively few. This is because it is semantically transparent, and so can simply refer to the meaning of disk. It has not acquired additional meanings which are not related to the base form.

           LEGAL:         LEGAL:         ILLEGAL:       ILLEGAL:
           above average  below average  above average  below average
dis        51             44             4              7
un         26             41             7              14
in(sane)   58             46             1              2
em         23             32             1              8
up         3              17             2              6
mis        2              41             1              8
in(doors)  11             24             1              7
ex         2              4              0              2
trans      3              4              2              10
TOTAL      179 (41%)      253 (59%)      19 (23%)       64 (77%)

Table 3.4: Number of forms with above/below average number of listed definitions, for prefixed forms containing legal vs illegal phoneme transitions. Chi-Square on Totals line: Chi-Square=9.35, df=1, p<.005.

In general, then, we expect transparent multimorphemic words to contain relatively few definitions. Only if a prefixed word has become relexicalized and lost semantic transparency do we expect it to contain a large number of definitions, on par with the entry for disk.
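The Chi-Square on the totals line of Table 3.4 can be reproduced as a 2×2 test with Yates' continuity correction. The sketch below is a reconstruction, not the original analysis code; the function name is mine, and the use of the correction is inferred from the fact that it reproduces the reported figure:

```python
def chi_square_2x2(table):
    """Chi-square for a 2x2 contingency table, with Yates' continuity
    correction (df=1). Reconstructs the totals-line statistic."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += max(abs(observed - expected) - 0.5, 0) ** 2 / expected
    return chi2

# Totals line of Table 3.4: above/below average definitions,
# for legal vs illegal junctural transitions.
print(round(chi_square_2x2([[179, 253], [19, 64]]), 2))  # 9.35
```

The same computation also reproduces the 2.34 reported on the totals line of Table 3.5, below.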
All headwords were de-capitalized, and pronunciation tags were removed, so as to ensure equivalence between the headwords in CELEX and in the ASCII dictionary. A script counted the definitions for each entry in the dictionary, counting definitions as separated by semi-colons and occurring between <def> flags. It was assumed that prefixed forms which were listed in CELEX but did not appear in the Webster’s dictionary were not associated with semantic drift, and they were tagged as being associated with just one meaning. The average number of definitions over all 515 prefixed forms in the database is 5.2. Table 3.4 shows the numbers of prefixed forms with an above-average number of listed meanings, and those with a below-average number, for prefixed forms with legal and illegal transitions across the morpheme boundary. The tendency is in the right direction for 7 of the 9 prefixes investigated, and the overall distribution is highly significant (Chi-Square=9.35, df=1, p<.005). Prefixed forms containing illegal transitions have significantly fewer meanings associated with them than prefixed forms which contain phonologically legal transitions. We saw this first with a carefully selected set of matched pairs, where phoneme identity and frequency factors were tightly controlled. There we saw that pairs with low probability or illegal transitions were associated with significantly fewer meanings than their matched counterparts. Here we see this pattern replicated over a calculation of 515 forms, with phonology information calculated from the CELEX lexical database, and definitions extracted from the Webster’s 1913 database. This relationship between phonotactics and degree of polysemy appears to be robust. Is it also gradient? When we examine the overall relationship between polysemy and phonotactics, we find a very weak, but very significant correlation.
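The definition-counting procedure described above can be sketched as follows. This is a Python reconstruction rather than the original script, and the markup conventions (unclosed <def> fields, semicolon-separated shades of meaning) are inferred from the sample entries:

```python
import re

def count_definitions(entry):
    """Count definitions in a Webster's-style entry: senses are
    separated by semicolons and occur between <def> flags.
    (A sketch; the real 1913 dictionary markup is messier than this.)"""
    total = 0
    # Everything after each <def> tag, up to the next tag, is one field.
    for field in re.findall(r"<def>([^<]*)", entry):
        # Each non-empty semicolon-separated chunk counts as one sense.
        senses = [s for s in field.split(";") if s.strip()]
        total += max(len(senses), 1)
    return total

entry = ("Dislocate <def>To displace; to put out of its proper place. "
         "Especially, of a bone: To remove from its normal connections "
         "with a neighboring bone; to put out of joint; to move from its "
         "socket; to disjoint;")
print(count_definitions(entry))  # 5
```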
Again, it would be a curious situation if phonotactics were the main predictor of how many meanings a word was able to accumulate. If a word is at least possible phonotactically, then a host of phonology-external factors will influence the degree to which it proliferates meaning. However, we do see a fairly weak relationship, such that the better the phonotactics, the more meanings are associated with the word (Spearman’s rho=.20, p<.0001). The graph of the log probability against the log number of meanings listed in Webster’s dictionary is given in figure 3.4, with a non-parametric scatterplot smoother fit through the points. For the purposes of this correlation, non-occurring observed forms were treated as occurring just once in the corpus of monomorphemes, so as to avoid taking the log of zero. The highly polysemous word with marginal phonotactics which can be seen on the graph is the word discharge. The log number of meanings, rather than the absolute number of meanings, is used because meaning proliferation is likely to be exponential. If every meaning of a word has some possibility of spawning new meanings, then the more meanings a word has, the more likely it is to acquire still further meanings. As new meanings are acquired, the possibilities for further meanings grow exponentially. Incidentally, the log number of meanings associated with a form is also partially predictable from the relative frequency of the derived form and the base—another factor we will argue is relevant to decomposition. We will return to discussion of this point in chapter 5, but for now it is worth noting, as corroborative evidence, that the number of meanings associated with a word can indeed be tied to degree of decomposability.

Degree of Semantic Drift
Words with illegal phoneme transitions are less polysemous than words containing legal phoneme transitions. This is evidence that well-formed words tend to acquire additional meanings. Can we find even more direct evidence that they actually drift away from the meaning of their base? That is, can we be sure that well-formed words tend to lose semantic transparency, rather than, say, simply acquiring new meanings on top of a robust semantically transparent meaning? In an attempt to eliminate this possibility, a second calculation was performed using the set of 515 prefixed words described above. Two entries for words taken from the Webster’s dictionary are shown below. Some tags relating to pronunciation have been removed.

Figure 3.4: Log probability of the transition occurring morpheme internally, with log number of meanings. Spearman’s rho=.20, p<.0001.

Dishorn <def>To deprive of horns;
as, to <ex>dishorn cattle.
Dislocate <def>To displace; to put out of its proper place. Especially, of a bone: To remove from its normal connections with a neighboring bone; to put out of joint; to move from its socket; to disjoint;
as, to <ex>dislocate your bones. The word dishorn is maximally semantically transparent. It has neither shifted nor proliferated in meaning. That it has not proliferated is evidenced by the small number of definitions associated with it. One index of the fact that it has not shifted is the explicit presence of the base word horn in the definition. Dislocate, on the other hand, is not quite so transparent in meaning. Thus, not only is it associated with more meanings than dishorn, but the base word locate is not explicitly invoked in the definition. If a prefixed word is highly transparent, then it should be easily defined by referring to the base word. Indeed, the majority of the 515 prefixed words do mention the base word in their definition. We can therefore take the absence of the base word from the definition of a prefixed word to be meaningful—a clear sign of semantic drift. So if we examine the 515 prefixed words described above, do words with illegal transitions tend to have definitions which explicitly mention their bases more often than words with legal transitions do? A short Perl script was written to test this hypothesis. For each word in the data-set, the Webster’s dictionary was consulted and the relevant entry was retrieved. The prefix was stripped from the head word, and then the definitions were consulted to see if they contained the base word. If the word was found, a match was returned; if not, some simple transformations were performed on the definition words (such as the removal of “s” endings, to catch plurals), and the definition was consulted again. If no entry was found for a CELEX prefixed word in the Webster’s dictionary, it was assumed to be transparent, and tagged as if the base was present in its definition. As mentioned above, this set constituted a small proportion of the words. All forms which were tagged as not listing their base in the definition were then checked by hand.
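The base-matching heuristic just described can be sketched in Python (a reconstruction of the logic, not the original Perl; the function name and tokenization are mine):

```python
import re

def base_in_definition(headword, prefix, definition_text):
    """Check whether a prefixed headword's base appears in its definition.
    Exact match first, then a crude inflectional step: stripping a final
    -s from definition words, to catch plurals."""
    base = headword[len(prefix):].lower()
    words = re.findall(r"[a-z]+", definition_text.lower())
    if base in words:
        return True
    # Simple transformation on the definition words, as described above.
    return base in {w[:-1] for w in words if w.endswith("s")}

# "horns" in the definition matches the base "horn" after stripping -s.
print(base_in_definition("dishorn", "dis",
                         "To deprive of horns; as, to dishorn cattle."))  # True
```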
This process caught some bases which were not identified by the simple heuristics used in the Perl script. The script did not attempt any modifications on the base words. The word dissatisfy, for example, included satisfied in the definition, which was not caught by the script, but was tagged by hand as a match. In addition, some definitions referred to other definitions which contained the base word. The definition for encrust, for example, was simply “To incrust. see Incrust.” And the definition for incrust does contain the base word crust. Only inflectional variants of the base words were accepted. Two exceptions involve definitions for the prefixes dis- and un-. Bases prefixed with un- were admitted for head words prefixed with dis-, and vice versa. (1), for example, contains the entry for unjoin.

(1) unjoin: To disjoin.

This was coded as if the base word was mentioned in the definition. Words defined in this way were clearly semantically transparent.

Table 3.5 shows the distribution of definitions which explicitly invoke their base, versus those that do not, across nine sets of prefixed forms, with legal versus illegal phoneme transitions. Taking the nine prefixes together as a set, forms with illegal phonological transitions across the morpheme boundary are more likely to explicitly invoke their base in their definitions than forms containing transitions which are attested morpheme-internally, although this tendency does not reach significance. Of the 9 prefixes, 7 of them display higher rates of non-transparency among forms with legal phonotactics.

           LEGAL:        LEGAL:        ILLEGAL:     ILLEGAL:
           base absent   base present  base absent  base present
dis        29            66            1            10
un         5             62            1            20
in(sane)   21            83            0            3
em         8             47            0            9
up         2             18            1            7
mis        6             37            1            8
in(doors)  6             29            2            6
ex         4             2             0            2
trans      5             2             4            8
TOTAL      86 (20%)      346 (80%)     10 (12%)     73 (88%)

Table 3.5: Number of definitions which do/do not explicitly invoke the base, for prefixed forms containing legal vs illegal phoneme transitions. Chi-Square on totals line=2.34, n.s.

Thus, all forms of evidence we have investigated point to a situation in which phonotactics and semantic transparency are intimately related. Words containing phonological material indicating juncture appear more prefixed, receive higher semantic transparency ratings, are less polysemous, and display a (non-significant) tendency to be associated with semantically transparent definitions in the dictionary. In the next section, we briefly examine the frequency characteristics of prefixed words, demonstrating that phonotactically ill-formed words are more often associated with the frequency characteristics of highly decomposable forms than are words containing morpheme-internal phonotactics.
3.3.3 Lexical Frequency As will be discussed in detail in upcoming chapters, the frequency of a derived form relative to its base is related to that derived form’s decomposability. Derived forms which are more frequent than their base are more independent of their bases, less semantically transparent and more polysemous. Indeed, many of the phenomena discussed in this chapter will also be shown to be true for relative frequency of the derived form and the base. In addition, the phonetic implementation of such forms also tends to reflect monomorphemic structure—cues to juncture are absent or reduced.
Since relative frequency is related to decomposition, we might expect to see a correlation with phonotactics. Forces from two directions conspire to suggest that such a correlation might result. First, as will be demonstrated in chapter 6, and mentioned above, the phonetic implementation of derived forms which are more frequent than their base tends to minimize cues to juncture. The /t/ in listless, for example, is more often deleted than the /t/ in tasteless. This fact might lead us to expect a process of relexicalization, over time, such that forms for which the derived form is more frequent than the base acquire phonologically well-formed representations. Second, as demonstrated above, phonotactically well-formed words are more easily liberated from their bases than words which contain a phonotactic cue to juncture. As the relationship between the representation and semantics of the two forms weakens, we might expect the frequency dependence to also weaken. Highly decomposable forms are much less frequent than their bases. As words become increasingly independent, acquire new semantics, and so on, we might also expect them to become more frequent. Thus, we expect to find a conspiracy in the lexicon such that words which display optimal decomposability phonology-wise also display optimal decomposability frequency-wise. And, similarly, words which display word-internal phonotactics across a morpheme boundary should be more likely to be liberated from (and so more frequent than) their bases. In this section we investigate whether there is a correlation between phonotactics and lexical frequency in the corpus of 515 prefixed forms described in previous sections.

           LEGAL:       LEGAL:       ILLEGAL:     ILLEGAL:
           whole>base   base>whole   whole>base   base>whole
dis        8            86           0            11
un         9            58           1            20
in(sane)   22           82           0            3
em         8            47           1            8
up         0            20           0            8
mis        1            43           1            8
in(doors)  3            32           0            8
ex         0            6            0            2
trans      0            7            0            12
TOTAL      51 (12%)     381 (88%)    3 (4%)       80 (96%)

Table 3.6: Relative frequency of derived form and base for prefixed forms containing legal vs illegal phoneme transitions. Chi-Square on Totals line: Chi-Square=3.98, df=1, p<.05
CELEX provides frequency counts for each of its entries—counts per 17.9 million words, taken from the COBUILD corpus. The frequencies reported here are lemma frequencies—the frequency counts for a given word include its inflectional variants. For each prefixed form, the frequency information was retrieved, and a short Perl script located the base using the morphological analysis provided by CELEX. The entry for the base was then located in CELEX, and the frequency information for that form was also retrieved. Table 3.6 shows the numbers of prefixed forms containing legal and illegal transitions, for which the derived form is more frequent than the base, and vice versa. One form, for which the derived form and base were exactly equal in frequency, is not included in the table. Derived forms which are more frequent than their bases are extremely unlikely amongst prefixed forms containing illegal transitions. This fact makes the pattern very clear, but makes obtaining robust statistics on it slightly difficult. Overall, the distribution is significant on a Chi-Square (Chi-Square=3.98, df=1, p<.05), although this figure should be treated with caution, given that the distribution contains one cell with fewer than five items. The three examples populating the sparse cell are mishap, entwine and unravel. It is also worth noting that there are three prefixes for which the derived form is never more frequent than the base, regardless of phonotactics. These prefixes are trans-, up-, and ex-. In fact, these are the same three prefixes in the set which create the highest proportion of forms with illegal junctural phonotactics. Thus, while the inclusion of these prefixes in the distribution perhaps dilutes our effect overall, they seem to provide general confirmation of the hypothesis.
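The quantity behind the gradient analysis is the log ratio of the derived form's frequency over the base's frequency, with counts of zero floored at one so the log is always defined. A minimal sketch of that quantity (the function name is mine):

```python
import math

def log_relative_frequency(derived_freq, base_freq):
    """Log ratio of derived-form frequency over base frequency.
    Counts of zero are treated as one, so the log is always defined
    (the guard described in the text)."""
    return math.log(max(derived_freq, 1) / max(base_freq, 1))

# Positive when the derived form outnumbers its base, negative otherwise.
print(log_relative_frequency(120, 40) > 0)  # True
print(log_relative_frequency(0, 40) > 0)    # False
```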
These are the most “separable” of all our prefixes, in terms of the phonotactics, and for these forms the derived form never appears to become sufficiently liberated from the base so as to overtake it in lexical frequency. The extremes of our distribution, then, appear to confirm a relation between the frequency characteristics of prefixed words and aspects of their phonology indicating juncture. However, this is presumably not a categorical effect, and so we expect it to show up in a more gradient investigation of the relationship. For this calculation, non-existent forms were treated as if they occurred just once in the corpus of monomorphemes, so as to avoid taking the log of zero. Bases and derived forms with listed frequencies of zero were also treated as having a frequency of one. The log probability of the junctural phonotactics occurring morpheme-internally proved an extremely weak, but highly significant, predictor of the log ratio of the derived frequency over the base frequency (Spearman’s rho=−.19, p<.0001). The relationship is shown in figure 3.5. As was demonstrated with meaning, the phonotactics has a weak, but highly significant, effect. Again, it would be almost disconcerting if phonotactics played an extremely dominant role in predicting the meaning and frequency characteristics of words. However, as with meaning, we see a distinct and significant difference between words with illegal and legal transitions, and then a weak but robust pattern across prefixed words in the lexicon as a whole. Figure 3.5 shows the overall pattern, with a regression line fit through the data. This result, that relative lexical frequency and phonotactics are weakly correlated, brings together two factors which take centre-stage in this book. In English, bad phonotactics creates good morphology. And maximally robust prefixed forms are less
frequent than the bases they contain. Together these factors enter into the grand conspiracy which is the English lexicon, working together to make morphology perceivable, maintainable and robust. Overall, we have found consistent evidence that phonotactics is related to the decomposition of prefixed forms. We now turn to an investigation of the relationship between phonotactics and suffixes.
Figure 3.5: Log probability of the transition morpheme-internally, vs relative lexical frequency. Spearman’s rho=−.19, p<.0001. The line shows a non-parametric scatterplot smoother fit through the data.

3.4 Suffixes

In contrast to prefixes, suffixes appear at the end of words. If their affixation creates a phonotactic violation, therefore, that violation occurs after much information has become available. That is, the race is already well underway by the time the phonotactic cue to juncture appears, and so the extra boost that this should
provide to the decomposition should be relatively small. We expect the effect of phonotactics in suffixes to be weaker than it was in prefixes. This section examines the relationship between phonotactics and suffixedness using a database similar to that used in our investigation of prefixedness—suffixed forms from CELEX, tagged for the probability of the phonological transition, and analyzed in conjunction with Webster’s 1913 Unabridged English Dictionary.

3.4.1 Semantics

Degree of Semantic Drift

As described in section 3.3.2, we expect semantically transparent forms to mention their base in their definition more often than non-transparent forms. One measure of the role of phonotactics in decomposition, then, is the degree to which semantically transparent definitions correlate with the phonotactics across the juncture. Five suffixes were chosen, all of which have a tendency to create bad phonology across the juncture: -ful, -less, -ly, -ness, -ment. All forms containing one of these affixes, affixed to a monomorphemic base, were extracted from the CELEX lexical database. In cases of multiple entries, only the entry with the highest lexical frequency was included. This led to a corpus of 2028 words. For each of the words, the morphological analysis provided by CELEX was used to retrieve the lexical frequency of the base form. And each form was tagged with the probability of the phoneme transition across the morpheme boundary occurring morpheme-internally, using the syllabification provided by CELEX. The exact nature of this calculation is described in section 3.2. A simple Perl script then identified the relevant entry for each word in the Webster’s dictionary, stripped the suffix off, and attempted to locate the base word in the set of definitions provided. The script tested for an exact match with the base, as well as attempting to identify simple inflectional modifications.
For those words not listed in the dictionary, I assumed the words to be either extremely recent innovations, or highly transparent, and so the script coded them as if it had located both the word and the base within the definition of the word. All words which were not identified as having transparent definitions were then double-checked manually, to check for unpredictable allomorphy which the script might have missed. The collapsed distribution is given in table 3.7. The distribution shows a slight trend, but the trend does not even approach significance. In this set of suffixed forms, the proportion of forms with illegal transitions which omit the base from the definition is slightly smaller than the proportion of forms with legal transitions which omit the base. Based on the set of data here, then, we can conclude that either the effect of phonotactics upon the decomposition of suffixed forms is non-existent, or it is much weaker than the effect upon prefixed forms, and so would need a bigger data-set than the one analysed here to emerge as a significant tendency.

                    root absent   root present
illegal transition  27 (14%)      283 (15%)
legal transition    169 (86%)     1549 (85%)

Table 3.7: Number of forms with base word absent or present in the definition, for suffixed forms with (il)legal junctural phonotactics.

Polysemy

Does an effect of phonotactics show up in an investigation of polysemy? The same corpus of 2028 forms was analyzed in conjunction with definition counts in Webster’s dictionary. Definitions were counted using the technique described in 3.3.2. The average number of definitions associated with this data set is 2.1. The data was therefore divided into those listing an above-average number of definitions (3 or more), and those listing a below-average number of definitions (2 or fewer). Table 3.8 breaks down the distribution according to the legality of the phoneme transition.

                    above-average  below-average
illegal transition  112 (19%)      198 (14%)
legal transition    488 (81%)      1230 (86%)

Table 3.8: Number of forms with above/below average number of definitions, for suffixed forms with (il)legal junctural phonotactics.

This distribution provides no evidence that phonotactic well-formedness is associated with polysemy for suffixed words. In fact, the distribution suggests, at least for this dataset, that the opposite holds true. Forms with illegal transitions in this data set are associated with higher rates of polysemy than forms with legal transitions. Given that this is the opposite of the tendency observed with the semantic transparency data above, and given that there is no conceivable reason why such a relationship should hold, the strength of this tendency is likely due to chance. There is, in any case, no evidence here to lead us to believe that low probability transitions lead to increased decomposability for suffixed forms. This conclusion is reinforced by investigating the relative frequency effects.

                    derived form more frequent  base more frequent
illegal transition  26 (17%)                    281 (15%)
legal transition    131 (83%)                   1564 (85%)

Table 3.9: Number of forms with derived form more/less frequent than base, for suffixed forms with (il)legal junctural phonotactics.
3.4.2 Lexical Frequency

In our investigation of prefixed forms, we noted a significant correlation between phonotactics and relative frequency. Prefixed forms containing legal transitions were more likely to be more frequent than their bases than prefixed forms containing illegal transitions. The relevant distribution for suffixed forms is given in table 3.9. Note that there are 26 forms for which derived frequency and base frequency are identical—these are not included in this table. Again, there is no evidence here for an effect of junctural phonotactics upon decomposability, and, in fact, again the trend goes slightly in the wrong direction.

3.4.3 Summary: Suffixes and Junctural Phonotactics

Our investigation into the relationship between junctural phonotactics and suffixed forms in the lexicon has failed to reveal any reliable effect. There are at least three likely explanations for the absence of this effect. First, speech is inherently temporal; listeners perceive bases before suffixes. Cutler et al. (1985) argue that this, together with the fact that language users prefer to process stems before affixes, is responsible for the generalization that suffixes are much more frequent across the world’s languages than prefixes (Greenberg 1963). In terms of the discussion here, (transparent) suffixed forms have a natural bias towards decomposition, because the full form is not uniquely identifiable until after the offset of the base is encountered. This bias may be sufficiently strong that the late phonotactic cue might not provide any additional advantage. Indeed, a range of experimental paradigms has yielded evidence that prefixed words are treated differently in processing than suffixed words (Segui and Zubizarreta 1985, Beauvillain and Segui 1992, Cole, Beauvillain and Segui 1989, Marslen-Wilson et al. 1994). The junctural phonotactics tend to come at the uniqueness point of the word for all of the affixes analyzed here.
None of the five affixes investigated cause phonological changes to the base. Indeed, such affixes were purposely avoided for this investigation, in an attempt to abstract away from factors which make an independent contribution to (non)decomposability. Thus, any phoneme following the base is likely to be posited as beginning a new morpheme. Suffixes differ importantly from prefixes, then, in the relative positioning of the base and the phonotactic cue to juncture. In prefixed words, the phonotactic cue comes first. In suffixed words, the base does. Second, the five suffixes included in this set all contain phonology which is likely to create low probability phonotactics. Experiment 3 indicated that there does exist some effect of phonotactics upon the decomposition of suffixed forms. For many forms containing the suffixes investigated above, the fast phonological preprocessor would therefore bias the decomposed route. For each of these affixes, then, the decomposed route should win fairly often, making the affix nodes frequently accessed, and giving them a very high resting activation level. Once the resting activation level of an affix is sufficiently high, it is easily activated, and a bias arises for the parsing route in all forms containing that affix, regardless of the phonotactics. This would lead to a synchronic situation in which the decomposed route was generally favored for all forms containing an affix which created low probability phonotactics with a sufficiently large number of bases. Chapter 7 will
present evidence that large numbers of affixed forms containing illegal phonotactics lead to high levels of productivity. Third, phonemes at the beginnings of words carry a higher burden in the word recognition process than phonemes towards the ends of words. Phonetic alterations or imprecisions word-initially are likely to detrimentally affect word recognition. However, alterations word-finally are less likely to affect the word recognition process, and so are easier to “get away with.” For example, Bagley (1900, as cited in Cutler et al. 1985) demonstrated that mispronunciation of an initial consonant was much more disruptive to word recognition than mispronunciation of a final consonant. Speakers appear to attempt not to distort word onsets (Cooper and Paccia-Cooper 1980). Frisch (1996, 2000) demonstrates that word onsets display special behavior in speech error tasks, and that OCP-Place restrictions are more strictly enforced at word onsets. And importantly, subjects are less likely to notice mispronunciations if they occur late in a word (Marslen-Wilson and Welsh 1978). As such, the phonetics at suffixal junctures is highly likely to be more malleable than the phonetics associated with prefixes. A large number of phonological violations across suffixal boundaries may, in fact, be resolved in phonetic implementation. The words listless and swiftly, for example, both contain phonological cues to juncture. However, both are more frequent than the bases they contain, suggesting that the whole-word route should be relatively fast. As a whole-word representation becomes increasingly robust, we might expect the phonological cues to juncture to disappear. Indeed, in chapter 6, I present data demonstrating that rates of /t/ reduction in the above words are anomalously high, in comparison with words like tasteless or softly.
The phonological processor, then, may never receive some of the phonological violations which would otherwise indicate a suffixal morpheme boundary. Overall, the results in this chapter indicate that the effect of phonotactics is much stronger upon prefixes than suffixes. However, it is important that the lack of an observed effect in this limited subset of suffixes not be used to conclude that phonotactics is completely irrelevant to suffixation. Indeed, we found in experiment 3 that subjects rate suffixed words containing low probability transitions as more complex than matched counterparts containing higher probability transitions. We should expect that if we were to move to less subtle contrasts, this would manifest itself in corpora, in similar ways to those observed with prefixes. Vowel-initial versus consonant-initial suffixes, for example, create radically different junctural phonotactics. In particular, consonant-initial suffixes tend to maintain the syllable structure of the base, whereas vowel-initial suffixes will invoke some resyllabification. As such, we should be very surprised if words containing consonant-initial suffixes were not in general more decomposable than words containing vowel-initial suffixes. The neural network described in chapter 2 was extremely successful at learning syllabic restrictions on the placement of word boundaries. Based on the junctural phonotactics alone, then, we expect a network trained to find word boundaries to automatically hypothesize juncture at at least some boundaries involving consonant-initial suffixes, but practically never at boundaries involving vowel-initial suffixes. Minimally, then, consonant-initial suffixes should tend to be activated in the parsing route more often than vowel-initial suffixes. That this is the case is strongly indicated by patterns of phonological productivity (see chapter 7), and by the range of effects associated with theories of level-ordering (chapter 8).
3.5 Summary

In this chapter we have investigated the effects of phonotactics upon the decomposition of real words. As demonstrated in chapter 2, subjects can exploit information provided by probabilistic phonotactics for the task of morphological parsing. In this chapter we have seen that it is not just the case that subjects can rally such information in tasks involving nonsense words, but also that they do use such information. In addition to replicating the results of chapter 2 with real words, we have seen evidence of a phonotactically-based segmentation strategy throughout the prefixed lexicon. Numerous effects thought to be related to decomposition (or demonstrated to be, in other chapters of this book) have been shown to correlate with the nature of the phonotactics across the morpheme boundary of prefixed words. Low probability phonotactics are associated with robust morphology. We have seen evidence that when prefixed words do not include phonotactic information indicating a boundary, they are prone to a reduced perception of “prefixedness,” a loss of semantic transparency, proliferation of meaning, and an overtaking of their base’s lexical frequency. Such forms are readily relexicalized, and liberated from their base. Finally, we observed that, while evidence of the role of phonotactics upon the decomposition of prefixed forms was clearly detectable in the lexicon, no effect with suffixes emerged. Such a difference between prefixes and suffixes is predictable from the inherently temporal nature of speech perception.
CHAPTER 4
Relative Frequency and Morphological Decomposition
In this chapter we move away from prelexical processing, to a lexical factor relevant to speech processing. Lexical frequency affects the speed of lexical access—with frequent words accessed more quickly. Many researchers have argued that lexical frequency affects morphological decomposition. The experimental work in this area has tended to focus on the extent of decomposition in word recognition (e.g. Taft 1979, Burani and Caramazza 1987, Cole et al. 1989), and has provided mixed results. Despite these mixed experimental results, however, models of morphological access and productivity tend to regard lexical frequency as important. High frequency forms, it is claimed, tend to be accessed whole, are not easily decomposed, and so do not contribute to the productivity of the affixes they contain (Moder 1992, Baayen 1992, 1994, Baayen et al. 1997, Bybee 1988, 1995b and others). In this chapter I explore the role of lexical frequency in morphology, arguing that this emphasis on the role of absolute frequency is too strong. Section 4.2 outlines the general assumptions about the role of surface frequency which are found in the literature, and section 4.3 outlines some results relating to the role of the frequency of the base. Section 4.4 steps through a range of current models of morphological access, demonstrating that, where such models predict a role of lexical frequency upon decomposition, the role is one of relative lexical frequency. While the proponents of many of these models emphasize the role of surface frequency, examination of the models themselves reveals that they predict an interaction between the surface frequency of the complex form and the frequency of the parts. Maximally decomposable forms should be those which are much less frequent than the parts they contain. Non-decomposable forms should be those which are more frequent than the parts they contain. Two experiments are presented which provide evidence in favor of this interpretation.
First, an experiment described in section 4.5 demonstrates that derived forms which are more frequent than the bases they contain are rated less complex than matched words which are less frequent than their bases. Second, in section 4.6 I demonstrate that this has consequences beyond the simple meta-linguistic rating task. Relative frequency affects the degree to which a prefixed form can attract a contrastive pitch accent to the prefix.
4.1 Relative Frequency in Morphology
Anshen and Aronoff (1988) have pointed to the importance of relative frequency for irregular forms. Irregular derived forms (such as irregular plurals), they note, tend to be more frequent than the bases they contain. They claim it is this which facilitates fast access to a form like feet, effectively blocking formation of a regular plural on the relatively lower frequency foot. This analysis implicitly assumes a dual-route model—forms may be accessed via whole word access or online composition. A higher frequency “whole word” facilitates whole word access, whereas a higher frequency base facilitates online composition. Baayen and Lieber (1991) claim that such effects arise only in extreme cases, typically involving inflectional affixes, where the distinction between productive and unproductive is extremely clear. In fact, in their discussion of morphological productivity, Baayen and Lieber (1991) explicitly dismiss any relevance of relative frequency, citing evidence presented in Thorndike (1943). Thorndike calculates sets of “derivation ratios” for a number of word formation rules. The derivation ratio of a complex word is the frequency of that word, over the frequency of its base. Baayen and Lieber explain: When such derivation ratios are calculated for productive WFRs, distributions of derivation ratios are obtained that show a wide range of possible shapes, scarcely narrower than the theoretically possible maximum range. Moreover, the distributions obtained for unproductive WFRs fall within the same range. Hence it is impossible to distinguish between productive and unproductive WFRs on the basis of these derivation ratios. (Baayen and Lieber 1991:807) The thrust of their argument is based on Thorndike’s report that derivation ratios appear to be fairly continuous. However Thorndike only reports ratios for two cases: -ly, and the negative prefixes in-, un-, non- and dis-, which are collapsed and considered together.
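Thorndike’s measure is simple to state concretely. The sketch below (my illustration in Python; the counts are hypothetical, not Thorndike’s data) computes the derivation ratio of a complex word as its frequency divided by the frequency of its base:

```python
# Thorndike's (1943) "derivation ratio": the frequency of a complex word
# divided by the frequency of its base. All counts here are hypothetical,
# purely for illustration.
def derivation_ratio(derived_freq, base_freq):
    if base_freq == 0:
        # A form whose base is effectively unattested (Thorndike's example:
        # hermetically) has a ratio of infinity.
        return float("inf")
    return derived_freq / base_freq

print(derivation_ratio(10, 500))  # derived form much rarer than its base
print(derivation_ratio(300, 20))  # derived form has overtaken its base
```

On this measure, a ratio above 1 means the complex word is more frequent than the base it contains.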
Thorndike (1943:27) reports: …the range of variation is very great, from 0 or near 0 to 10 or over. […] In all, there is absence of any wide gaps; and the variation would probably be continuous if the size of the sample taken was increased to include all nouns and all adjectives. Given that Thorndike does not investigate the differences between many different affixes, it seems premature to dismiss any relationship between derivation ratios and productivity. In addition, the supposed continuity of the distribution of derivation ratios does not rule out the possibility that words with high ratios display different properties than those with low ratios. Indeed, Thorndike’s discussion suggests that words with high derivation ratios may display special properties: …many [-ly forms with high derivation ratios] can be accounted for only by specialized habits, so specialized that the ratios for synonyms and
antonyms of the words in question are in most cases much lower […] an extreme case of a specialized habit is hermetically, which has a ratio of infinity in our counts. It is learned and used in the phrase hermetically sealed by probably ninety nine out of a hundred users, with nothing thought about hermetical and with nothing known about it save what can be inferred from the meaning of hermetically sealed. (Thorndike 1943:37) Thus, it seems at least possible that words which are more frequent than the bases they contain may display special properties, and be associated with lower levels of decomposability. This is certainly predicted by our schematic dual-route model, introduced in chapter 1, and repeated for convenience in figure 4.1. Recall that the two routes are racing, and whichever finishes first, wins. A defining characteristic of a race is that there are at least two participants. It does not matter how fast participant one is: if participant two is faster, participant two will win. In this example, the derived word insane is more frequent than the base, sane. This gives the direct route a considerable advantage during processing. The direct route is likely to win, not because insane is particularly frequent, but because it is more frequent than sane. Thus, we predict that even low frequency words can display low levels of decomposability, if they are more frequent than their base words. And even high frequency words can display high levels of decomposability, if their base words are more frequent still. This prediction does not rely on the dual-route model laid out here. It follows from any model in which both decomposition and whole word access are available options, or in which the presence of the base word can be variably salient.
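The race logic can be sketched in a few lines. In this toy simulation (my illustrative assumption, not an implementation from this book), each route’s finishing time shrinks as the frequency of the representation it accesses grows, so the winner is determined purely by relative frequency:

```python
import math

# Toy dual-route race. The inverse-log timing function and the counts are
# illustrative assumptions; only the comparison between routes matters.
def winning_route(surface_freq, base_freq):
    direct_time = 1.0 / math.log(surface_freq + 2)   # whole-word access
    decomposed_time = 1.0 / math.log(base_freq + 2)  # access via the base
    return "direct" if direct_time < decomposed_time else "decomposed"

# insane (hypothetical counts) is more frequent than sane, so the direct
# route wins even though insane is not especially frequent overall.
print(winning_route(120, 40))    # -> direct
# A high frequency derived form is still decomposed if its base is more
# frequent still.
print(winning_route(500, 2000))  # -> decomposed
```

The point of the sketch is that absolute surface frequency never decides the race on its own; only the comparison of the two frequencies does.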
Figure 4.1: Schematized dual-route model. The dotted lines indicate the decomposed route. The dashed line indicates the direct route. A fast phonological preprocessor operates prelexically, facilitating access to lexical entries which are aligned with hypothesized word boundaries. The line width of each node indicates the resting activation level—insane is more frequent than sane.
An analysis presented by Tiersma (1982) provides some indirect evidence for this hypothesis. Tiersma is concerned with a phenomenon he terms local markedness. He investigates paradigm leveling involving singular/plural pairs across a range of languages, and notes that, while paradigm leveling generally favors the unmarked singular noun, there is a set of systematic exceptions. These exceptions all involve nouns denoting objects which tend to occur in pairs or groups—objects he argues are locally unmarked in the plural. He outlines several different types of evidence that nouns which are unmarked in the plural (such as eggs, wheels, and shoes) display properties generally associated with basic forms. Frequency is tightly connected with markedness, and Tiersma notes: …nouns which are locally unmarked in the plural are also more frequent in the plural; and indications exist that these trends may hold for several languages. In a footnote discussing how this may play out in languages with a dual marker, Tiersma emphasizes that the observed effects arise from a pattern involving relative frequency, and not the frequency of the derived form. In languages with a dual, physical objects which come in pairs may be unmarked in the dual. A nice example of this (Behaghel, 320) is that the chief remains of the Indo-European dual for Proto-Germanic nouns were breusto ‘breasts’ and noso ‘nostrils’. Note that it was not the most frequent nouns for which the dual was preserved, but rather those nouns in which the dual was probably the most frequent form. (Tiersma 1982:842 (footnote 9)) Thus, the analysis presented by Tiersma is highly consistent with the prediction that derived forms which are more frequent than their bases are prone to opacity and autonomy. Such effects cannot be captured by models which privilege the frequency of the derived form.
Nonetheless, the frequency of the derived form has been the focus of discussions of frequency and decomposability, as will be outlined below.
4.2 Surface Frequency and Decomposition
In experimental work, the surface frequency of the derived form has been shown to affect the speed of processing of (at least some) complex words (e.g. Burani and Caramazza 1987, Sereno and Jongman 1997). When this surface frequency effect is absent, its absence is interpreted as evidence for morphological parsing (e.g. Vannest and Boland
1999). Meunier and Segui (1999a, 1999b) argue that high frequency words can be recognized by their own representation, whereas low frequency words must be recognized via a morphemic representation. Surface frequency effects are believed to be symptomatic of (non)decomposition; moreover, the two are believed to be causally linked. Bybee, for example, makes strong claims about the relationship between the frequency of the derived form and degree of compositionality. There is a universal tendency for morphological irregularity to be restricted to the highest frequency forms of a language. (Bybee 1995a:235) This, claims Bybee, is true both of phonological irregularity and semantic irregularity. She claims (1985, 1995a, 1995b) that frequent derived forms diverge both phonologically and semantically from their bases, and have a tendency to become autonomous. The vast majority of work bearing on this issue actually deals with inflectional morphology. High frequency inflected forms have been shown to be stored, while low frequency forms are believed (by some) to be derived by rule (see, e.g., Stemberger and MacWhinney 1986, 1988, Losiewicz 1992, Alegre and Gordon 1999a). One piece of evidence relating the claim to derivational affixation comes from Pagliuca (1976, as cited in Bybee 1995a), who shows that both phonological and semantic transparency in pre- affixation can be related to lexical frequency. The belief that semantic transparency and the frequency of the derived form are linked is so widely held that it is sometimes stated as fact, without examples or references. Baayen (1993), for example, when justifying a methodological choice, starts out: “Since transparency is inversely correlated with frequency…” (Baayen 1993:203).
Because Baayen (1992, 1993) assumes a link between frequency and opacity, he argues that there is a connection between high token frequency and lack of productivity—frequent words are not accessed via the components they contain, and so do not contribute to the productivity of the related affix. Any word-formation process which is characterized by only high-frequency tokens will therefore be an unproductive one. Section 4.4 discusses how this link between frequency and autonomy is thought to emerge from a variety of current models of morphological processing. First, though, I briefly outline the literature on the role of base frequency.
4.3 Base frequency and decomposition
Unlike surface frequency, the frequency of the base (or the “cumulative stem frequency”—the frequency of the base and its inflectional variants) is not often claimed to play an active role in creating or maintaining decomposability. The word recognition literature does report an effect of base frequency upon the recognition of complex words, although the interpretation of this effect is mixed. The literature reports a facilitatory effect of stem frequency upon recognition (see, e.g. Burani and Caramazza 1987). Some researchers assume that this effect only appears if morphological parsing takes place, and so treat it as symptomatic of parsing (e.g. Vannest
and Boland 1999, Bertram et al. 2000). In keeping with this assumption (and coupled with the assumption about the role of surface frequency, above), Gordon and Alegre (1999) claim a stem frequency effect is present only for words with low surface frequency. Other researchers assume that the stem frequency effect always plays a role in morphological access, and facilitates the whole word access route (e.g. Meunier and Segui 1999a, 1999b). These claims are hard to interpret, as the experimental work generally compares conditions containing large numbers of words, making it impossible to interpret the overall consistency of the effect across different words within the same condition. Thus, while the stem frequency does play some role in the perception of (at least some) complex words, no experimental work has associated it with an active role in decomposition, or explicitly linked it to factors associated with decomposition, such as semantic transparency. In the following section I turn to a number of current models of morphological processing, and demonstrate that a role of the base frequency upon decomposability is, in fact, predicted.
4.4 Models of Morphological Processing
In this section I outline a number of current models of morphological processing. The models all deal with the role of morphology in perception—much less attention has been directed towards the role of morphological structure in production. Nonetheless, these models also make predictions about the types of lexical representations likely to be most stable, and so (assuming a shared lexicon) are by no means irrelevant for work on speech production. I will not explain each model in depth, but rather attempt to give a broad outline of the salient features, and discuss how lexical frequency is said to figure in the model. While most of the researchers explicitly assume an active role for the frequency of the derived form, we will see that almost all of the models implicitly predict an interaction between the frequency of the base and the frequency of the derived form.
4.4.1 Bybee’s “Morphology as Connections” model
Bybee (1985, 1988, 1995a, 1995b) argues for a model of morphology in which morphological relationships are represented as connections between words in the lexicon. Complex words have multiple connections to related words, and lexical connections may vary in strength. She claims that high frequency forms tend to be less transparent than low frequency forms. …low frequency items are analyzed and understood in terms of other items, while high frequency words, complex or not, may be autonomous, and processed unanalyzed. (Bybee 1985:123–124) Bybee models this phenomenon by assuming a link between token frequency and the strength of connection between a complex form and its base. Low frequency complex words are stored in terms of more basic words, which already have lexical
representations. As such, low frequency words have many connections, and these connections are strong. High frequency words, on the other hand, are acquired independently, and so display fewer, and weaker links. Bybee, then, assigns the frequency of the derived form a primary role in determining the strength of its relation to its base. High frequency words “undergo less analysis, and are less dependent on their related base words than low-frequency words” (1985:117). Other factors, of course, also play a role in determining the strength of a connection: ...lexical connections weaken because of lessened phonological or semantic similarity, or the high frequency of derived forms. (Bybee 1995a:239) The frequency of the derived form is invoked throughout Bybee’s work, as a primary factor in determining the strength of lexical connections. However, as noted above, her argument is primarily one about the nature of acquisition and storage. It is assumed that a low frequency derived form is more easily learned and retrieved if it is stored in terms of the parts it contains. High frequency forms, on the other hand, are acquired more or less independently, and there is no learning or processing advantage involved in storing or accessing them via their parts. It appears to follow from this line of argumentation that the frequency of the base word should also affect the strength of the connection. Learning and storing a low frequency derived form in relation to its parts presumably only affords an advantage if the parts themselves are fairly frequent. If the base word is even less frequent than an infrequent derived form, strong connections are unlikely to result in a processing advantage. Likewise, frequent forms are likely to be learned independently if they are more frequent than the bases they contain. But if the bases are even more frequent, surely we expect this to affect the desired processing strategy. 
Indeed, while Bybee is most concerned with the frequency of the derived form, she does, in one paper, invoke relative frequency in an explanatory aside: …awe and awful are phonologically transparent and not radically semantically divergent, but their frequency disparity weakens their connectedness (awful is three times as frequent as awe according to Francis and Kucera 1982). (Bybee 1995a:239) This statement suggests that Bybee might agree that, while high frequency forms appear to be less transparent than low frequency forms, this is likely a result of the nature of the frequency relations, rather than the high frequency of the forms alone.
4.4.2 Caramazza’s “Augmented Addressed Morphology”
Caramazza and colleagues have proposed a model of “Augmented Addressed Morphology” (Caramazza et al. 1988; Chialant and Caramazza 1995, and elsewhere). This model is generally interpreted as a direct access model—known words are not accessed through any kind of prelexical parsing—although the lexical representations themselves may be represented with morphological structure. Morphemic subparts of complex words tend to be activated during access, but the direct route for recognition will
always be faster than the decomposed route. The decomposed route, then, can result in lexical access only for words which have not been encountered before. Thus, the model predicts effects of morphological structure only for non-words. Augmented Addressed Morphology (AAM), then, appears to make no explicit predictions about the role of absolute frequency versus relative frequency in terms of influencing access procedures or degree of decompositionality. Because all known forms are accessed the same way, while frequency may affect speed of access, it should not affect access type. However, while most papers based on this model distinguish between known and novel words, Chialant and Caramazza (1995) make a slightly different division. The model also makes the assumption that lexical access to morphologically complex words takes place through whole-word access units for known words and through morpheme-sized access words for unfamiliar morphologically regular words (that is, those cases for which the frequency of the stem is much higher than the frequency of the surface form) or novel words. It follows that for all orthographically transparent forms both whole-word and morpheme-sized access units will be activated, to an extent which is directly proportional to the frequency of the access unit. (Chialant and Caramazza 1995) This paragraph suggests that proponents of AAM may mean something different by unfamiliar words than the noun phrase first suggests, and something different from what is often attributed to them. Rather, here they explicitly invoke the relative frequency of the whole and the parts, suggesting that some active competition occurs between the two access routes, and that the decomposed route may, in fact, have a chance at winning in cases in which the parts are much more frequent than the whole. In this case, AAM makes a clear prediction about the role of frequency in decomposition.
While the absolute frequency of the derived form should itself make no difference to the likelihood of decomposition (except in the case of extremely low frequency “novel” words), the relative frequency of the derived form and the stem may influence decomposition.
4.4.3 Marslen-Wilson’s “Direct Access Model”
Marslen-Wilson and Zhou (1999) argue that all morphologically complex forms which are related to each other via regular phonological relations share a single underlying abstract representation. The form sanity is processed as the combination of sane+-ity in the same way that madness is processed as the combination of mad+-ness, without any recourse to whole-word access or lemma representations for either type of word. (Marslen-Wilson and Zhou 1999:349) As such, they would presumably predict that all regular morphologically complex forms are equally transparent and decomposable. The model does not seem to make any predictions about the role of token frequency—either whole word frequency, or base
word frequency—in influencing decomposition or productivity. The only clear-cut prediction involving frequency is one of access speed. Words with high base frequency would be accessed faster than words with low base frequency. Any effects of frequency upon decomposability, and particularly any effects of surface frequency, would cause problems for this model.
4.4.4 Baayen (1992)
Baayen proposes a model in which there are two strategies available for retrieval of forms from memory, and these operate in parallel. They are a rule-based access procedure, and a memory-based procedure, which is much faster. This discrepancy in speed between the two routes predicts different behavior with respect to high-frequency forms and low-frequency forms. High frequency types, irrespective of whether the corresponding word formation rule is productive or not, are efficiently accessed by the memory based address procedure. For such types, no benefit from the rule-based address procedure is to be expected, since access by memory will have been completed before access by rule. In the case of low-frequency items, the speed of retrieval might well benefit from statistical facilitation. For such items, the memory-based access procedure operates more slowly than for high-frequency words. (Baayen 1992:127) For Baayen, then, frequency is relevant to memory-retrieval, with high frequency forms effectively speeding up the memory-based retrieval, making them much less likely to be accessed by rule. Baayen does not discuss the types of factors which might speed the rule-based access procedure. Presumably, because the rule-based access procedure involves accessing the base word, high frequency base words should speed this procedure. Baayen does discuss the role of the base word, but the discussion relates to how it influences the memory-based procedure.
He points to results indicating that the frequency of the base word facilitates the lexical decision task (see, e.g., Taft 1979, Laudanna and Burani 1985), using these to argue that high frequency base words must also facilitate the memory based procedure. This would help to explain the non-productivity of -ity as compared to -ness, because, as demonstrated by Anshen and Aronoff (1988), the base word frequencies of -ity tend to be higher. This suggests that formations in -ity have two processing advantages with respect to formations in -ness: their high frequencies guarantee storage in the mental lexicon, and the relatively high frequencies of their base words guarantee rapid lexical access. (Baayen 1992:135) The exact procedure through which the frequency of the base is able to facilitate whole word access remains unclear. The model itself, as I understand it, makes no specific predictions about the role of the base word frequency in the memory-based address procedure. Assuming that the rule-based procedure varies in speed, we should predict that
high base frequency would facilitate this route. Thus, my reading of Baayen’s model is that it predicts an effect of the relative frequency of the derived form and the base upon likelihood of decomposition, although it is unclear whether he would agree with this assessment.
4.4.5 Frauenfelder and Schreuder (1992)
Frauenfelder and Schreuder (1992) present a model which is essentially a refinement of that outlined by Baayen (see above)—a model they dub the Morphological Race Model (MRM). Like Baayen, they assume two routes—one of which is direct, and one of which involves parsing—and the two routes race in parallel. Unlike Baayen, they are explicit about the factors which facilitate morphological parsing. They assume that both phonological and semantic transparency facilitate the parsing route. They also assume a role of the frequency of the components. …we assume that the time taken to parse a word depends on the resting activation levels of its stem and affix(es). A word whose stem and affixes have a high resting level will have a ‘headstart’ in the parsing against other words. The resting activation levels of the access representations of the stem and affix will be increased only when the parsing route wins the race and produces a successful parse. (Frauenfelder and Schreuder 1992:176) They claim that word token frequency is the most important factor for determining which route wins the race. High frequency word forms will therefore be recognized via the direct route “irrespective of their morphological structure,” because frequent words have a headstart given their increased resting activation level. This does not follow directly from the suggested model—presumably even for high frequency word forms, there is a chance the parsing route will win if the components are of sufficiently higher frequency.
Frauenfelder and Schreuder appear to assume that there is a ceiling effect for both routes, and that the fastest possible access via parsing is slower than the fastest possible direct route access. For words of medium to low frequency they explicitly assume that both routes have a chance of winning, and that the relative frequency of the whole word and its parts will bias one route over another. Thus, Frauenfelder and Schreuder’s model, as stated, predicts an effect of relative frequency—the more frequent a derived form is relative to its parts, the more likely it will be accessed via the direct route. And Frauenfelder and Schreuder themselves assume this is true, at least for medium to low frequency words.
4.4.6 Schreuder and Baayen’s Morphological Meta-Model
Schreuder and Baayen (1995) and Baayen and Schreuder (1999) introduce a “metamodel” of morphological processing, said to characterize the central properties that language-specific models should possess. Departing from a dual-route model in which access mechanisms compete, in their model the two routes may interactively converge on
the correct meaning representation. The proposed model is a spreading activation model with three levels: segmentation, licensing, and combination. During the segmentation stage, words are mapped (via an intermediate access representation) to modality-specific access representations which are normalized with respect to the inherent variability in the speech signal. Complex forms, affixes and stems may all have their own access representations, and each representation is associated with its own resting activation level. Each access representation is associated with at least one lexical representation, in the form of a “concept node.” It is assumed that concept nodes are only added to the lexicon when the meaning of the complex word cannot be obtained compositionally. Complex words which are not semantically transparent, then, have their own concept nodes. During the licensing stage, the concept nodes are accessed, and checked to see whether the subcategorization properties of co-activated concept nodes are compatible. Finally, during combination, the lexical representations of complex words are computed. Both concept nodes and access representations can receive feedback from higher levels. In this way, Schreuder and Baayen predict direct access of highly frequent words. The feedback from the concept nodes to their access representations ensures that frequently processed complex words will eventually be recognized on the basis of their full forms rather than on the basis of their constituents. (Schreuder and Baayen 1995:135) This relies on the assumption that only those access representations which lead to the activation of the successful concept node will benefit from feedback; that is, only nodes which are already activated are able to receive further activation through feedback.
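This feedback loop can be caricatured computationally. The sketch below is a drastic simplification: the activation values and the update rule are my illustrative assumptions, not Schreuder and Baayen’s parameters. It shows how a semantically transparent full form, linked to several concept nodes, collects proportionally more feedback than either of its parts:

```python
# Toy spreading-activation sketch of the feedback loop. All numbers and
# the update rule are illustrative assumptions only.
initial = {"happiness": 0.2, "happy": 0.6, "-ness": 0.5}
access = dict(initial)

# happiness is semantically transparent, so it has no concept node of its
# own: it links to the concept nodes of its parts.
links = {"happiness": ["HAPPY", "NESS"], "happy": ["HAPPY"], "-ness": ["NESS"]}

# Concept nodes accumulate activation from every access node linked to them.
concepts = {"HAPPY": 0.0, "NESS": 0.0}
for node, cs in links.items():
    for c in cs:
        concepts[c] += access[node]

# Feedback flows back only to already-active access nodes, in proportion
# to their current activation.
for node, cs in links.items():
    access[node] += 0.1 * access[node] * sum(concepts[c] for c in cs)

# Linked to two concept nodes, the full form grows proportionally fastest:
# repeated processing strengthens whole-word access.
growth = {n: access[n] / initial[n] for n in access}
print(growth)
```

Under these toy settings the relative gain for happiness exceeds that of happy and -ness, which is the mechanism by which frequently processed forms drift towards full-form recognition.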
In sum, “the activation level of an access representation is a function of the frequency of occurrence of its activator, and of the amount of feedback it receives from the concept nodes with which it is associated” (135). Consider, for example, the access procedure for the word happiness. The access nodes for happiness, happy and -ness would all receive some level of activation. Assuming happiness is fully semantically transparent, it is not associated with an independent concept node, but will contain links to the concept nodes for happy and -ness. The concept nodes, then, will receive activation from multiple sources. Feedback will flow back from the concept nodes to the access nodes, in proportion to the initial degree of activation of the access nodes. Crucially, because happiness is associated with multiple concept nodes, it will receive more reinforcement through spreading activation than either of the components individually. This mechanism ensures that high frequency forms will eventually come to be accessed via independent access nodes. Word frequency effects are located in the activation levels of the access nodes. The cumulative stem frequency effect—the effect that the number of words sharing a single stem affects access speeds for words containing that stem—is explained by the frequency sensitivity of the concept nodes. All semantically transparent forms sharing a stem will activate the same concept node. Now, as with the previous models, nothing explicit really predicts that high frequency forms will come to be primarily accessed via their own access node. If the access nodes for the component parts have sufficiently high resting activation levels, then the
activation via the component parts may still be more primary than the activation via the whole-word access node. In this model, however, this distinction becomes less crucial, because the two types of information do not directly compete, but rather can converge on a single representation. This model does, however, make some interesting predictions about the conditions under which non-transparency should be possible. Schreuder and Baayen do not discuss this aspect of their model explicitly, but the predictions become clear if we stop to consider the model’s behavior under various conditions of non-compositionality. Recall that non-compositional derived forms are associated with their own concept node. In the case of a phonologically transparent, but semantically non-transparent derived word, then, a direct competition occurs. If the access node for the derived word has a sufficiently higher resting activation level than the access nodes for the parts, then the correct concept node will be activated first, and the appropriate meaning retrieved. However, if the access node for the derived word has a much lower resting activation level, then the concept nodes associated with the parts may be activated and combined before the concept node for the whole form reaches threshold. Such a configuration would be non-optimal and unstable, and non-compositionality is unlikely to remain robust under such circumstances. The prediction, then, contrary to the general assumption in the literature, is not that non-compositionality will be associated with forms of high surface frequency, but rather that it will be associated with forms with high frequency relative to their parts. In a paper unrelated to the explication of this model, Baayen assumes that non-compositionality is related to high frequency, and that this is related to memory constraints. A high frequency of use guarantees that their opaque reading can be retained in memory.
Thus it is only to be expected that opaque formations show a strong tendency to appear in the highest ranges of the frequency spectrum. (Baayen and Lieber 1997:283)
Note, however, that monomorphemic words are inherently opaque, and that we are able to retain the meanings of relatively low frequency monomorphemic words in memory without apparent difficulty. The problem with retaining opaque complex words in memory relates to the fact that there is a competing analysis. If the components of that competing analysis are of relatively lower frequency, then retaining an opaque meaning for a complex word in memory should pose no obstacle. It is only when the competing analysis is a strong and competitive one that problems start to arise. Baayen and Schreuder’s model, then, makes the same prediction as would any model allowing for competition between access procedures. Non-compositionality should be a viable option for complex words which are more frequent than their parts, regardless of the absolute frequency of the derived forms.
4.4.7 Summary
Despite widespread assumptions about the role of the frequency of the derived form in morphological decomposition, most models of morphological processing actually predict
Relative Frequency and Morphological Decomposition
73
an interaction between the lexical frequency of the base and derived form. When the former is more frequent, forms should be readily decomposable. When the latter is more frequent, a whole word access route is dominant, and decomposition should appear less natural. In the next two sections we examine this prediction experimentally.
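The competition just summarized can be caricatured as a race between a whole-word access route and a decomposed route. The sketch below illustrates the prediction only, and is not Schreuder and Baayen's model: the linking assumption that time to threshold shrinks with log frequency is mine, and the decomposed route is approximated by the base alone (affixes being far more frequent than any base). The frequencies are the CELEX counts for illegible and illiberal used later in this chapter.

```python
import math

def time_to_threshold(freq):
    # Assumed linking function: more frequent access representations rest
    # closer to threshold, so they reach it sooner.
    return 1.0 / math.log(freq + 2)

def winning_route(derived_freq, base_freq):
    # The decomposed route is approximated by its slowest component, the base.
    whole = time_to_threshold(derived_freq)
    decomposed = time_to_threshold(base_freq)
    return "whole" if whole < decomposed else "decomposed"

print(winning_route(14, 10))  # illegible: derived more frequent than its base
print(winning_route(11, 55))  # illiberal: base more frequent than the derived form
```

Under these assumptions, any form whose derived frequency exceeds its base frequency is won by the whole-word route, regardless of absolute frequency.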
4.5 Experiment 4: Relative Frequency and Morphological Complexity
Do words which are more frequent than their embedded bases appear more easily decomposable than words which are not more frequent than their embedded bases? In this section I describe a simple experiment, which asked subjects this question directly.
4.5.1 Materials and Methodology
Thirty-four pairs of words were constructed—17 prefixed pairs, and 17 suffixed pairs. Members of word-pairs shared affixes, and were matched for the morpheme-internal probability of the junctural phonotactics, the stress pattern and syllable count, and for the surface frequency of the derived form. One member of each pair was more frequent than its base, and one member was less frequent. Frequency information was obtained from the CELEX lexical database. The prefixed and suffixed word pairs are shown in tables 4.1 and 4.2 respectively. The A members of each pair are more frequent than the bases they contain, and so are predicted to be rated less complex than the B members of each pair, which are less frequent than the bases they contain. Note that in the suffixed stimuli the pair agility-fragility is included. Unlike the other words in column A, agility is not more frequent than agile—they are of roughly equal frequency. However, fragile is considerably more frequent than fragility, and so the pair should still exhibit the expected contrast. The pair was included so as to obtain a reasonably sized dataset, while still meeting the considerable restrictions imposed by controlling for various phonological factors. The pairs were counterbalanced for order of presentation, and randomized together with 30 filler word pairs. The fillers paired together pseudo-affixed and affixed words. Example filler pairs include defend-dethrone, indignant-inexact, family-busily and adjective-protective. The complete set of stimuli consisted of 64 word pairs.
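The A/B split in tables 4.1 and 4.2 reduces to a comparison of derived and base frequency. A minimal sketch, using a few of the CELEX counts from table 4.1:

```python
# (word, derived frequency, base frequency) -- CELEX counts from table 4.1.
stimuli = [
    ("refurbish", 33, 1),
    ("rekindle", 22, 41),
    ("uncanny", 89, 20),
    ("uncommon", 114, 3376),
]

def column(derived, base):
    # A members are more frequent than their bases (predicted to be rated
    # less complex); B members are less frequent than their bases.
    return "A" if derived > base else "B"

for word, derived, base in stimuli:
    print(word, column(derived, base))
```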
word A         freq   base freq   word B         freq   base freq
refurbish        33        1      rekindle         22       41
inaudible       292      100      inadequate      399      540
incongruous      55        3      invulnerable     23      400
uncanny          89       20      uncommon        114     3376
unleash          65       16      unscrew          44      187
immutable        40        4      immoderate        6      223
unobtrusive      42       17      unaffected       54      169
entwine          32       27      enshrine         44       98
immortal        112       53      immoral          94      143
illegible        14       10      illiberal        11       55
intractable      45       12      impractical      47     1228
uncouth          34        2      unkind           72      390
impatient       227      114      imperfect        50     1131
revamp           13        4      retool           10      800
inanimate        34        4      inaccurate       53      377
reiterate        47        0      reorganise       61     1118
immobile         55       11      immodest         13      521
Table 4.1: Experiment 4: Prefixed Stimuli

word A         freq   base freq   word B         freq   base freq
diagonally       36       29      eternally        58      355
abasement         6        2      enticement        3       64
meekly           47       41      bleakly          22      196
swiftly         268      221      softly          440     1464
diligently       35       31      arrogantly       17      116
rueful           14        9      woeful           14       68
respiration      39        4      adoration        49      218
alignment        57       44      adornment        41       75
equally        1303     1084      generally      1663     4624
hapless          22       13      topless          27     3089
listless         42       19      tasteless        30      402
frequently     1036      396      recently       1676     1814
exactly        2535      532      directly       1278     1472
agility          34       38      fragility        36      207
slimy            61       35      creamy           74      540
virility         41       31      sterility        36      121
scruffy          42        7      puffy            48      159
Table 4.2: Experiment 4: Suffixed Stimuli

For each pair, subjects were asked to indicate which member of the pair they considered more “complex.” The exact instructions were as follows:
This is an experiment about complex words.
A complex word is a word which can be broken down into smaller, meaningful, units. In English, for example, the word writer can be broken down into two units: write and -er. -er is a unit which occurs at the end of many English words. In writer, -er has been added to the word write to make a new, more complex word writer. We call a word which has been made out of smaller units in this way a complex word. Rewrite is another example of a complex word in English. It can be broken down into re- and write. Words which are not complex are called simple words. Here are some examples of simple words in English: yellow, sing, table. It is impossible to break down the word table into smaller units. Table is not complex. In this experiment, you will be presented with pairs of complex words, and asked to decide which one you think is more complex. For example happiness is very complex—it can be easily broken down into happy and -ness. Business, however, is not quite so complex. While it is possible to break business down into busy and -ness, it does not seem completely natural to do so. Business is complex, but not as complex as happiness. Another example of a complex word is dishorn. Even though you may never have heard the word dishorn before, you can understand its meaning, because it can be broken down into dis- and horn. Discard is also complex—it can be broken down into dis- and card. But discard does not seem as complex as dishorn. We do not need to break discard into its parts in order to understand its meaning, and, in fact, it seems slightly unnatural to do so. For each pair of words below, please read both words silently to yourself, and then circle the word you think is more complex. It is very important that you provide an answer for every pair, even if you are not certain of your answer. Just follow your intuition, and provide your best guess. The experiment was completed using pen and paper, and subjects worked at their own pace. 
Twenty Northwestern University undergraduate students completed the task, in fulfilment of a course experimental requirement.
4.5.2 Results and Discussion
Subjects who did not consistently distinguish between the pseudo-affixed and affixed filler pairs were not included in the analysis. Any subject who did not provide the same answer (in either direction) for at least twenty of the thirty fillers was excluded. Sixteen subjects were therefore analysed. Of these, two interpreted “complex” in the opposite manner from that intended. This could be seen from their consistent behaviour on the filler items (i.e. they rated family more complex than busily, adjective more complex than protective, and so on). This consistent behaviour indicates that their confusion was a
terminological one, rather than a conceptual one, and so their data were included, with their answers reversed. Forms which were more frequent than the bases they contained were consistently rated less complex than their counterparts, which were less frequent than the bases they contained. This is true both for suffixed forms (Wilcoxon, by subjects: p<.005, by items: p<.002), and prefixed forms (Wilcoxon, by subjects: p<.002, by items: p<.01). Among prefixed pairs, 65% of responses favored the form for which the base was more frequent than the whole. Only 35% of responses judged forms which were more frequent than their bases to be more complex than their matched counterparts. The tendency for suffixed forms was of roughly equal strength (66% to 34%). This provides strong evidence that the frequency of the base form is involved in facilitating decomposability. When the base is more frequent than the whole, the word is easily and readily decomposable. However, when the derived form is more frequent than the base it contains, it is more difficult to decompose, and appears to be less complex. Subjects share strong judgments about decomposability when asked directly. However, a sceptic might argue that this is an artificial meta-linguistic task. Subjects may find a way to complete the task when faced with a forced choice decision, but these judgments of “complexity” may not relate to any aspect of linguistic competence. It would therefore be desirable to see evidence of this effect at work in a context in which subjects are not forced to make a possibly artificial distinction. In section 4.6, we therefore move away from the forced-choice task, in order to seek evidence of a relative frequency effect which can be more directly related to subjects’ linguistic competence.
4.6 Experiment 5: Relative Frequency and Pitch Accent Placement
One way to probe whether the relative frequency effect found in experiment 4 is a deep fact about lexical organization might be to use a word recognition task. Indeed, most experimental work on the nature of morphological representations is conducted using the word recognition paradigm. However, results from word recognition may tell us something either about the access strategy or about the nature of the representation itself. And when we find different frequency effects across conditions, this may reflect different access strategies, or different resting activation levels of representations accessed via the same strategy. Thus, the results in this area are not only often conflicting, but they are also difficult to interpret. In production, however, if we see systematic differences in implementation, it is perhaps safer to assume that these differences stem from some underlying difference in representation. Is it possible to find some measurable correlate of decomposability in subjects’ production of complex forms? The experiment described in this section is based on the hypothesis that the decomposability of a prefixed form can be gauged by the degree to which subjects are prepared to place a contrastive pitch accent on the prefix. Bolinger (1961) points to the high probability of stress shift in cases involving prefixed forms in certain contexts.
It is obvious that in pairs where everything is identical except a prefix, a shift to the left is necessary if anything like a balanced contrast is to be achieved. (Bolinger 1961:89)
He cites the example in (2), in which a form prefixed in un- is contrasted with its morphological affirmative, pointing out that a leftward stress shift takes place to mark contrast—“the most conspicuous of all of the occurrences of phonetic highlighting.” (Bolinger 1961:83)
(2) The phenomenon we are noting may be called the relationship between length and unfamiliarity, or between condensation and familiarity. (Bolinger 1961:89)
If a form is highly decomposable, then the prefix is a meaning-bearing unit, and should easily attract a pitch accent. If, however, subjects avoid placing a pitch accent on a prefix under contrastive focus, this would provide evidence for the relative robustness of a whole-word representation, which is resistant to a decomposed parse. Working under these assumptions, I constructed some experimental materials to test the idea that the relative frequency of a derived form and its base was relevant to morphological decomposition.
4.6.1 Materials and Methodology
Fourteen pairs of prefixed, semantically transparent English words were constructed. Each pair was matched according to the identity of the prefix, the syllable count, and the probability of the phoneme transition across the morpheme boundary. All pairs were matched according to the approximate lexical frequency of the derived form. The words differed in that, for one member of each pair, the base was less frequent than the derived form, and for the other member, the base was more frequent than the derived form. The matched pairs are listed in table 4.3, together with the frequency of the words, and the bases they contain.

DER>BASE        DER   BASE   BASE>DER        DER   BASE
refurbish        33      1   rekindle         22     41
immortal        112    110   immoral          94   1033
inaudible       292    100   inadequate      399    540
illegible        14     10   illiberal        11    266
incongruous      33      3   invulnerable     23    400
intractable      45     12   impractical      47   1228
uncanny          89     20   uncommon        114   3376
uncouth          34      2   unkind           72    390
unleash          65     16   unscrew          44    187
inanimate        34      4   inaccurate       53    377
impatient       227    114   imperfect        50   1131
immutable        40      4   immoderate        6    223
revamp           13      4   retool           10    800
unobtrusive      42     17   unaffected       54    169
Table 4.3: Experiment 5: Stimuli

The words were embedded in sentences designed to attract a contrastive pitch accent to the prefix. Matched pairs were presented in strictly parallel syntactic constructions, as exemplified in (3).
(3) (a) Sarah thought the document was legible, but I found it completely illegible.
    (b) Sarah thought her cousin was liberal, but I found him completely illiberal.
The full data set consisted of 84 sentences—6 sentences for each word pair. The full six sentences for the illegible/illiberal pair are given in (4). In each set, the matched pair was placed both under strong contrastive focus (4a, b) and weak contrastive focus (4c, d). Two filler sentences were also used: one in which the word is non-decomposable, or relatively so (4e), and one in which the word is prefixed but no contrast is invoked (4f). All sentences were randomized, together with filler sentences, and presented on cards to six subjects, each of whom produced each sentence twice.
(4) (a) Sarah thought the document was legible, but I found it completely illegible.
    (b) Sarah thought her cousin was liberal, but I found him completely illiberal.
    (c) Sarah thought the document was readable, but I found it completely illegible.
    (d) Sarah thought her cousin was open-minded, but I found him completely illiberal.
    (e) Sarah had never heard of him before, but he appears very illustrious.
    (f) Sarah would love to try that, but it is completely illegal.
Extremely few tokens occurring under weak contrast or non-contrast conditions attracted a pitch accent to the prefix (or pseudo-prefix), although this was not quantified. Only the tokens occurring in the test condition—the strong focus condition, were carefully analyzed. The test sentences were examined using esps-xwaves, and prefixes were coded as bearing a pitch accent only in cases in which this was both clearly audible, and visible in the pitch-track. Clear examples of the hypothesized difference are given in figures 4.2 and 4.3. Imperfect is less frequent than perfect, and so we predict that it should be highly decomposable, and easily attract a contrastive pitch accent to the prefix. Figure 4.2 shows an example—vertical lines segment the waveform and pitch-track into component phonemes. The pitch-track shows a clear peak aligned with the prefix. This can be contrasted with the same speaker’s production of impatient, in figure 4.3, in which the peak aligns with the lexically stressed syllable—the second syllable of the word. 4.6.2 Results As predicted, prefixes on words with the derived form more frequent than the base were significantly less likely to attract a pitch accent than their counterparts (by items sign test: p<.02). A derived form’s relationship to its base is strong and salient when that base is relatively frequent. If the derived form is highly frequent relative to its base, however, the morphological relationship becomes weaker, and a decomposed parse is less available.
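The by-items sign test reported for these data can be computed exactly from a split of matched pairs. The 12-of-14 split below is hypothetical (the per-pair counts are plotted in figure 4.4, not tabulated here); it simply illustrates that a split of that size over 14 pairs yields p < .02.

```python
from math import comb

def sign_test_p(successes, n):
    # Exact two-sided sign test: probability, under a fair coin, of a split
    # at least as extreme as the one observed.
    k = min(successes, n - successes)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical: 12 of the 14 matched pairs pattern in the predicted direction.
p = sign_test_p(12, 14)
print(round(p, 4))  # 0.0129
```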
Figure 4.2: Subject 6: Waveform and pitch-track for “But today they seemed imperfect.” In this example, the prefix attracts a pitch accent.

Figure 4.3: Subject 6: Waveform and pitch-track for “But today they seemed impatient.” In this example, the prefix does not attract a pitch accent.

The trend was in the predicted direction for 5 of the 6 subjects. The sixth subject displayed a ceiling effect—she placed a contrastive pitch accent on the prefix for all sentences involving strong focus. For individual pairs, items displaying the largest difference in relative frequency showed the largest difference in behavior. Figure 4.4 shows a point for each matched pair. The x axis shows the difference between the log(base frequency/derived frequency) for each member of the pair. The y axis shows the difference between the two members of each pair, in terms of the number of tokens for which a pitch accent appeared on the prefix. Examination of the graph shows that the pair with the largest difference in relative frequency (immoderate/immutable) showed a marked difference in terms of pitch accent placement. The pair with the smallest difference actually displayed a trend in the wrong direction (inadequate/inaudible). Overall, the difference in number of tokens attracting a pitch accent on the prefix for each pair is well predicted by the magnitude of the difference in relative frequency (r2=.53, p<.005). The regression line is shown in figure 4.4.
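The x-axis values in figure 4.4 can be recomputed from the counts in table 4.3. The sketch below does this for four of the matched pairs (natural logarithms assumed); consistent with the discussion, immutable/immoderate shows by far the largest difference in relative frequency, and inaudible/inadequate the smallest of these.

```python
import math

# ((derived freq, base freq), (derived freq, base freq)) per matched pair,
# CELEX counts from table 4.3.
pairs = {
    ("immutable", "immoderate"): ((40, 4), (6, 223)),
    ("refurbish", "rekindle"): ((33, 1), (22, 41)),
    ("impatient", "imperfect"): ((227, 114), (50, 1131)),
    ("inaudible", "inadequate"): ((292, 100), (399, 540)),
}

def relfreq_difference(pair):
    # Difference between the members' log(base freq / derived freq) values:
    # the x-axis of figure 4.4.
    (der_a, base_a), (der_b, base_b) = pair
    return math.log(base_b / der_b) - math.log(base_a / der_a)

for words, pair in sorted(pairs.items(), key=lambda kv: -relfreq_difference(kv[1])):
    print(words, round(relfreq_difference(pair), 2))
```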
Figure 4.4: Difference in number of tokens attracting a pitch accent to the prefix, as a function of the difference between the log (base freq/derived freq) for each matched pair. (r2=.53, p<.005)
4.6.3 Discussion
This experiment has illustrated the potential for using production data to probe underlying morphological structure, and points to new possibilities for gathering evidence about the multitude of issues of morphological storage for which the word recognition literature is ambiguous. In particular, these results demonstrate that one factor which is highly relevant to the representation of prefixed forms is the relative frequency of the derived form and the base. Words like illegible, for which the derived form is more frequent than the base, tend to be less decomposable than words for which the opposite is true (e.g. illiberal). This
provides evidence in support of the hypothesis that relative frequency of the derived form and the base is relevant to the decomposition of complex words, and so supports models of morphological access in which multiple analyses are processed in parallel, and in which the speed of processing of a given analysis relates to the frequency of the components.
4.7 Summary
Many researchers have claimed a link between the frequency of a derived form and the degree to which it is accessed and represented independently of its parts. I have argued that the more relevant frequency effect is the relative frequency of the derived form and its base. Two experimental results were presented: first, subjects are much more likely to rate forms with higher frequency bases as complex than matched counterparts with relatively lower frequency bases. And second, prefixed forms with high frequency bases were more likely to attract a pitch accent to the prefix. Both of these experimental tasks were chosen because they tap fairly directly into the degree to which a form is decomposed. Thus, these results can be taken as more direct evidence for the relationship between frequency and decomposability than work which infers (lack of) decomposition from response times in a word recognition paradigm. While the results presented here do not directly counter the claim that the absolute frequency of the derived form is linked to opacity, they do show that it is not the only frequency effect related to opacity, and they suggest that results leading to this claim may in fact have arisen through an interaction between base and surface frequency. In the next chapter we therefore turn to an investigation of the relationship between lexical frequency and semantic transparency, for both prefixed and suffixed forms, in order to investigate this possibility more closely.
CHAPTER 5 Relative Frequency and the Lexicon
In the previous chapter we discussed a variety of evidence pointing to the fact that frequency affects decomposition. In particular, it was argued that the relative frequency of the derived form and the base plays an important role—derived forms which are more frequent than their base are less decomposable than derived forms which are less frequent than their base. We have seen how this prediction follows from most current models of speech perception, and have found some supporting experimental evidence. In this chapter we turn to an examination of the English lexicon today. Can we find evidence of such an effect in the lexicon? If mechanisms of speech perception and production tend to make some complex words more robustly decomposed than others, then we should be able to find evidence of this in the lexicon. If derived forms which are more frequent than their bases are less easily decomposed online, they should display signs of semantic drift and polysemy. Other factors known to be associated with decomposition should be negatively correlated with the relative frequency of the derived form and the base. In this chapter, we will explore similar materials to those used in our discussion of affixedness in chapter 3. Wurm’s (1997) words only contain four examples for which the derived form is more frequent than the base, so they do not help us to disentangle the roles of absolute and relative frequency. We concentrate, instead, on investigating the numbers and types of definitions appearing in Webster’s 1913 Unabridged Dictionary. We begin with a discussion of the overall frequency distributions of prefixed and suffixed forms, and then move on to an investigation of the role of relative frequency in prefixed forms (in section 5.2), and suffixed forms (5.3). Finally, section 5.5 discusses some methodological consequences of the results presented in this chapter for current experimental work on morphological access.
5.1 Relative Frequency Distributions in Affixed Words Harwood and Wright (1956) have pointed out that in general derived forms (or in their terminology resultants) tend to be less frequent than their bases (the underlying forms). This can be confirmed by examining the data in figures 5.1 and 5.2, which show the relationship of derived frequency to base frequency, for the set of prefixed and suffixed words, respectively.
Relative Frequency and the Lexicon
85
The dotted lines through the graphs indicate the line at which derived frequency and base frequency are equal. Points falling below this line represent forms for which the derived frequency is greater than the frequency of the base. Points falling above it represent forms for which the derived form is lower frequency than the base it contains. The solid line represents a regression line fit through the data, and the dashed line represents a non-parametric scatterplot smoother (Cleveland 1979). The graphs are based on 515 prefixed forms (described in section 3.3.2), and the 2028 suffixed forms (section 3.4.1), respectively. These graphs reinforce Harwood and Wright’s (1956) argument, that derived forms tend to be less frequent than the bases—many more points fall above the x=y line, than below it. The points below the x=y line represent forms which are more frequent than their bases. There is a sense, then, in which these forms have escaped—and become liberated from the properties of the base. It is these forms which, we argued in chapter 4, tend to be accessed via a whole-word representation, and are not robustly decomposed. In theory, any form could fall below the line, although some are much more likely to fall than others. In chapter 3 for example, we saw that prefixed forms with legal junctural phonotactics are significantly more likely to fall below this x=y line than forms which contain illegal junctural phonotactics. In this chapter we will further explore the hypothesis that these forms are associated with different properties than the forms which fall above the line. Examination of 5.1 and 5.2 reveals that for both prefixes and suffixes, a positive and significant correlation holds between the frequency of a base form, and the frequency of a derived form which contains it. More frequent bases tend to be associated with more frequent derivatives. 
In general, how often a (transparent) derived form is deployed in speech is likely to be a partial function of the frequency of the form upon which it is based. Importantly, for both suffixes and prefixes the frequency of the derived form and the base are positively correlated on a slope that falls consistently above the line at which the derived form and the base are equal. Given the frequency of any base, then, the frequency of a derived form containing that base is partially predictable, and, in all cases, the prediction is that the derived form will be somewhat less frequent than the base.
Figure 5.1: Log derived frequency and base frequency for 515 English prefixed forms. The solid line shows the result of a linear regression (r2=.04, p<.001). The dashed line shows a nonparametric scatterplot smoother (Cleveland 1979) fit through the data. The dotted line shows the x=y line. The correlation is significant both by parametric (Pearson’s r=.2, p<.001) and non-parametric (Spearman’s r=.21, p<.001) measures.
Figure 5.2: Log derived frequency and base frequency for 2028 English suffixed forms. The solid line shows the result of a linear regression (r2=.28, p<.001). The dashed line shows a nonparametric scatterplot smoother (Cleveland 1979) fit through the data. The dotted line shows the x=y line. The correlation is significant both by parametric (Pearson r=.53, p<.001) and non-parametric (Spearman’s r=.56, p<.001) measures.

freq range        derived>base   total forms   percentage
log(der)<2              5             167           3%
2<log(der)<4           24             196          12%
4<log(der)<6           21             133          16%
log(der)>6              3              19          16%
Table 5.1: Number and percentage of prefixed forms with derived form more frequent than the base, broken down by frequency range of derived form.

freq range        derived>base   total forms   percentage
log(der)<2             13             914           1%
2<log(der)<4           53             633           8%
4<log(der)<6           39             355          11%
6<log(der)<8           43             115          37%
log(der)>8              9              11          82%
Table 5.2: Number and percentage of suffixed forms with derived form more frequent than the base, broken down by frequency range of derived form.

Examination of the data represented in these figures provides some insight into the degree to which the absolute frequency of the derived form is correlated with the relative frequency of the derived form and the base. The chances of a high frequency derived form (towards the right of each graph) being more frequent than its base are much higher than the chances of a low frequency derived form being more frequent than its base. The forms occurring at the very left of each graph have a listed lexical frequency of 0 (treated here as 1, so as to facilitate taking the log). For such forms, it is obviously impossible for the base to be less frequent. The degree of opportunity for the derived form to be more frequent than the base increases as we move to the right of the graph. To see that this is so, tables 5.1 (for prefixes) and 5.2 (suffixes) list the percentage of forms falling below the x=y line for different values of the derived frequency. When the derived frequency is high, the proportion of forms for which the derived form is more frequent than the base is substantially higher than when the derived frequency is low. This is particularly true of the suffixed forms (table 5.2). The result of this tendency is that absolute frequency and relative frequency are not independent of one another. If relative frequency is an important factor in morphological decomposition, this correlation raises the possibility that previous observations regarding correlations between the absolute frequency of the derived form and decomposability may in fact be artifactual.

                above-average freq.   below-average freq.
derived>base         19 (36%)              88 (19%)
derived<base         34 (64%)             373 (81%)
Table 5.3: Number of prefixed forms of above/below average frequency, with derived form more/less frequent than the base. Chi-Square=7.12, df=1, p<.01

                above-average freq.   below-average freq.
derived>base         74 (46%)             222 (12%)
derived<base         87 (54%)            1619 (88%)
Table 5.4: Number of suffixed forms of above/below average frequency, with derived form more/less frequent than the base. Chi-Square=132.4, df=1, p<.00001

Table 5.3 breaks the prefixed forms into above- and below-average frequency forms. The above-average forms are significantly more likely to be more frequent than their base than the below-average forms (Chi-Square=7.12, df=1, p<.01). This tendency is stronger still in suffixed forms, as shown in table 5.4 (Chi-Square=132.4, df=1, p<.00001). As has already been discussed, suffixed forms differ from prefixed forms in an important way. The onset of the derived form and the base are simultaneous in suffixed forms, whereas for prefixed forms the onset of the derived form is temporally prior to the onset of the base. We have seen (in chapter 3) that this leads to different predictions regarding the role of phonotactics in the decomposition of prefixed and suffixed forms. Different predictions about the role of relative frequency also follow independently. Because the onset of the derived form precedes the onset of the base in prefixed forms, the derived form has a natural advantage in perception. If it is more frequent than the base, this reinforces the advantage, making a decomposed route unlikely. In suffixed forms, however, the temporal onset of the derived form and the base is identical. Furthermore, the offset of the base is reached before the offset of the entire derived form. Suffixed forms, then, do not afford the whole form access route the same natural advantage over the decomposed route that prefixed forms do. For both prefixed and suffixed forms, decomposition and whole route access are possible, and so in both cases we expect to see an interaction between the frequency of the derived form and the base: the more frequent the derived form is relative to the base, the more likely a whole word representation and access strategy will be.
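The chi-square statistics reported in tables 5.3 and 5.4 can be reproduced from the cell counts. Applying Yates' continuity correction (an assumption on my part; the text does not state which correction was used, but it recovers the reported values):

```python
def yates_chi_square(table):
    # 2x2 chi-square with Yates' continuity correction.
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for obs, i, j in ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1)):
        expected = rows[i] * cols[j] / n
        chi2 += (abs(obs - expected) - 0.5) ** 2 / expected
    return chi2

print(round(yates_chi_square(((19, 88), (34, 373))), 2))    # 7.12 (table 5.3)
print(round(yates_chi_square(((74, 222), (87, 1619))), 1))  # 132.4 (table 5.4)
```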
However, because the whole word route does not have quite the same advantage in suffixed forms, one might reasonably expect the derived form to need to be relatively more frequent than the base in suffixed forms than in prefixed forms before signs of non-compositionality can be detected. Furthermore, we should expect prefixed words to become liberated from their bases at a higher rate, and so be more likely to drop below the x=y line than suffixed words. In light of these differences between prefixes and suffixes, it is worth carefully comparing the two graphs in figures 5.1 and 5.2. Comparing the two figures reveals some interesting differences. Recall that the temporal nature of speech predicts forms containing prefixes to generally be less decomposed than those containing suffixes. Some
evidence for this comes from comparing the proportion of forms falling below the x=y line for the two graphs. Assuming briefly the point this chapter is trying to show (that relative frequency and decompositionality are linked), a larger proportion of points below this line would indicate a higher proportion of non-compositionality. In the case of a fully productive affix creating highly decomposable forms, we should expect points to gather fairly tightly around the regression line, and uniformly above the point at which base frequency and derived frequency are equal. Comparing figures 5.1 and 5.2, note that a larger proportion of points fall below the x=y line for prefixes than suffixes. For prefixes, 53 of 515 forms fall below the line—10%. For suffixes, a slightly smaller proportion of forms falls below the line—158 of 2028 forms, or 8%. This lends support to the hypothesis that prefixed forms, in general, tend to be less decomposed. The correlation between base frequency and derived frequency is also much stronger for suffixes. This too, can be taken as indicative of the higher rates of decompositionality among suffixes than prefixes. As argued above, we might expect the derived form to need to be relatively more frequent than the base in suffixed forms than in prefixed forms before signs of noncompositionality can be detected. That is, the relevant threshold of relative frequency should differ for prefixes and suffixes. In absolute terms, a derived prefixed form may not necessarily be more frequent than its base, before it is frequent enough to win out in access. And likewise, a derived suffix form which is slightly more frequent than the base it contains may still not be quite frequent enough to escape the bias afforded the parsing route. That is, the correct threshold for prefixes may be slightly higher than the line which appears in figure 5.1. And the correct threshold for suffixes may be slightly lower than the line which appears in figure 5.2. 
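The proportions cited above follow directly from the counts of forms below the x=y line; a quick check:

```python
# Forms falling below the x=y line (derived form more frequent than its base),
# out of the totals plotted in figures 5.1 and 5.2.
prefixed = (53, 515)
suffixed = (158, 2028)

def percent_below(n_below, total):
    return 100 * n_below / total

print(f"prefixed: {percent_below(*prefixed):.0f}%")   # prefixed: 10%
print(f"suffixed: {percent_below(*suffixed):.0f}%")   # suffixed: 8%
```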
The degree to which this is the case remains a matter for empirical investigation. Despite the fact that the relevant threshold is likely to be slightly different for prefixes and suffixes, we still expect relative frequency to be related to decompositionality in both prefixed and suffixed forms. And, for the purposes of investigating this hypothesis, drawing a line at derived frequency=base frequency suffices as a first approximation for dividing the data. We turn now to a closer investigation of affixed forms, in order to determine whether the forms which fall below the x=y line in figures 5.1 and 5.2 display different properties from forms which fall above the line. We begin with prefixes.
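The first-approximation split just described is easy to state procedurally. The following sketch (the function name is ours, not from the text) classifies a form by whether it falls below the x=y line, and recovers the two proportions reported above:

```python
def below_xy_line(base_freq, derived_freq):
    """True when a form falls below the x=y line of figures 5.1/5.2,
    i.e. when the derived form is more frequent than the base it contains."""
    return derived_freq > base_freq

# Proportions reported for the two data sets:
print(round(100 * 53 / 515))    # prefixes: 10 (% of forms below the line)
print(round(100 * 158 / 2028))  # suffixes: 8
```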
5.2 Relative Frequency in Prefixed Forms

We have already seen some preliminary evidence that relative frequency is related to decomposition in the lexicon. The investigation of phonotactics in prefixed words (in chapter 3) revealed a significant correlation between phonotactics and relative frequency. Derived forms which are more frequent than the bases they contain are significantly less likely to contain illegal phonotactics across the juncture than derived forms which are less frequent than their bases (Chi-Square=3.98, p<.05). This relationship is weak but significant. The log ratio of the base and derived form frequency proved significantly correlated with the probability that the phoneme transition across the morpheme boundary would occur morpheme-internally (Spearman’s rho=.19, p<.0001). This initial
Relative Frequency and the Lexicon
91
evidence points to the relevance of relative frequency to decomposability. We turn now to investigate the consequences for polysemy.

5.2.1 Relative Frequency and Polysemy in Prefixed Forms

Do words which are more frequent than their bases tend to proliferate in meaning? Presumably, because they are less tied to the semantics of their bases, they are free to acquire additional meanings. A set of prefixed forms was used to investigate this question. The set contains 9 different affixes and 515 words, and is described in full in 3.3.2. For each of these forms, the frequency of the derived form and the base was retrieved from CELEX, and the number of definitions listed in Webster’s 1913 Unabridged Dictionary was counted, in the manner described in 3.3.2. The effect of relative frequency was investigated in two ways. First, the words were divided into two sets—those for which the derived form is more frequent than the base, and those for which the opposite holds true. The two sets were then investigated to establish what proportion of words in each set were associated with an above-average number of meanings. The average number of meanings across the entire set of words is 5.2, and so a distinction is drawn between words with an above-average number of definitions (6 or more), and those with a below-average number (5 or fewer).

               DERIVED MORE FREQ          BASE MORE FREQ
            above avg.   below avg.    above avg.   below avg.
dis              5            3            49           48
un               5            5            28           50
in(sane)        14            8            45           40
em               5            4            19           36
up               0            0             5           23
mis              0            1             3           48
in(doors)        1            2            11           29
ex               0            0             2            6
trans            0            0             5           14
TOTAL           30           23           167          294
               (57%)        (43%)         (36%)        (64%)

Table 5.5: Number of forms with above/below average number of listed definitions, for prefixed forms with the derived form more frequent than vs less frequent than the base. Chi-Square on Totals line: Chi-Square=7.51, df=1, p<.01
The distribution across the two sets is shown in table 5.5, broken down by affix.1 Overall, prefixed forms which are more frequent than their base are significantly more likely to list an above-average number of definitions than prefixed forms which are less frequent than the bases they contain. Relative frequency is related to polysemy. Prefixed forms which are more frequent than the bases they contain are prone to semantic liberation. They are not intimately tied to the meaning of the base, and are free to proliferate in meaning. The relationship between relative lexical frequency and polysemy is gradient. Figure 5.3 contains a point for each of the 515 words in the corpus. The y axis shows the log number of definitions listed for that word, and the x axis shows the difference between the log lexical frequencies of the base and the derived form. A significant negative correlation holds (rho=−.32, p<.0001). The more frequent the derived form is, relative to the base it contains, the more meanings it is associated with. The data therefore provides good evidence that there is a relationship between relative frequency and polysemy, and that the relationship is gradient. But what is the exact nature of the relationship, and, in particular, can it be shown to be distinct from any effect of the absolute frequency of the derived form? That is, is this correlation simply an artifact of a relationship between the lexical frequency of a derived form and the number of definitions it is associated with?

1. One word is omitted from this table because the derived form and the base have equal lexical frequency.

Figure 5.3: Log number of meanings, as predicted by relative lexical frequency (Spearman’s rho=−.32, p<.0001). The line shows a non-parametric scatterplot smoother fit through the data.

                          above-average freq.   below-average freq.
above-average meanings        70 (65%)              128 (31%)
below-average meanings        38 (35%)              279 (69%)

Table 5.6: Number of forms with above/below average number of listed definitions, for prefixed forms with the derived form having above/below average lexical frequency. Chi-Square=38.75, df=1, p<.001

Table 5.6 shows the collapsed distribution of prefixed forms with above-average/below-average lexical frequency, which are associated with an above-average/below-average number of meanings. This distribution is significant (Chi-Square=38.75, df=1, p<.001). Frequent derived forms are associated with an above-average number of meanings. Indeed, when we examine the distribution in a more gradient manner, the log lexical frequency of the derived form is a significant predictor of the log number of meanings associated with that form, as can be seen in figure 5.4 (rho=.39, p<.0001). All evidence points to this correlation being a stronger effect than the one we have observed involving the relative frequency of the derived form and the base. Indeed, the correlation between the absolute frequency of the word and the number of meanings is independently predicted. Previous work has demonstrated that high frequency words tend to be associated with more meanings than lower frequency words (Paivio et al. 1968). Is the effect of relative frequency upon polysemy actually an artifact of this relationship? No, it is not. When we separate out the data, we find evidence of both effects at work. Table 5.7 shows the distribution of the data. The data are separated into two groups, according to whether the derived form exhibits above-average or below-average lexical frequency. For each of these two groups, the number of words associated with an above-average or below-average number of definitions is shown, for the subsets for which the derived form is more frequent than the base, and vice-versa. There is a lot of information in this table. I approach the explication backwards: by stating the conclusions, and then stepping through the relevant evidence.
Overall the distribution supports an interpretation in which both absolute and relative frequency are important, and are actively and independently involved in polysemy. When a derived form becomes sufficiently frequent, it is associated with multiple meanings, regardless of the relative frequency of the base. That is, absolute frequency is
important. Once a form is sufficiently frequent, then meaning proliferates. If a high frequency form overtakes its base, this has no further effect.
Figure 5.4: Log number of meanings as predicted by the log lexical frequency of the derived form (Spearman’s rho=.39, p<.0001). The line shows a non-parametric scatterplot smoother fit through the data.

             above-av. freq.            below-av. freq.
           above-av.   below-av.     above-av.   below-av.
           meanings    meanings      meanings    meanings
base>der      56          32            111         262
der>base      13           6             17          17

Table 5.7: Number of forms with above/below average number of listed definitions for prefixed forms with the derived form having above/below average lexical frequency, for forms for which the derived form is more or less frequent than the base.

Relative frequency has no significant effect on polysemy amongst high frequency forms. Relative frequency, however, is also important. Derived forms which are more frequent than their base tend to be associated with more meanings than derived forms which are less frequent than their base. Once a derived form overtakes its base in frequency, it tends to proliferate in meaning. As such, amongst derived forms which are more frequent than their bases, the absolute frequency of the derived form has little further effect upon polysemy. We begin by looking at the facts associated with the relative frequency of the derived form and the base, as this is the factor in which we are most interested. If we restrict our attention to forms with below-average lexical frequency, then a strong effect of relative frequency emerges. Amongst prefixed forms of below-average frequency, forms which are more frequent than the bases they contain are significantly more likely to be associated with an above-average number of meanings than are forms which are less frequent than the bases they contain (Chi-Square=5.02, p<.025). In addition, amongst the forms of below-average lexical frequency, the log ratio of the derived form and the base remains a significant predictor of the variation (Spearman’s rho=−.3, p<.0001). However, this correlation does not hold amongst forms of above-average lexical frequency. Recall that forms of above-average lexical frequency are independently expected to be associated with multiple meanings. Relative frequency has no additional effect (Chi-Square=.02, n.s.; Spearman’s rho=.16, p<.11). Thus, amongst high frequency forms, the lexical frequency of the derived form appears to be dominant, and relative frequency plays no additional role.
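The gradient effects in this chapter are reported as Spearman rank correlations. As a point of reference, a minimal pure-Python version of the statistic (with the standard average-rank treatment of ties) can be sketched as follows:

```python
def ranks(xs):
    """Ranks of xs (1-based); tied values receive their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average rank of the tie group
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

print(spearman_rho([1, 2, 3], [3, 2, 1]))  # -1.0 (perfect negative)
```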
Amongst low frequency forms, however, the role of the frequency of the derived form is minimized, and the relative frequency of the derived form and the base is significantly correlated with the degree of observed polysemy. And, as expected, the converse holds true for the effect of the absolute frequency of the derived form. We expect this effect to be strongest when the base is more frequent than the derived form. This is because forms for which the derived form is more frequent than the base are likely to be already associated with multiple meanings, and so the frequency of the derived form is likely to have minimal additional effect. As predicted, for forms for which the base is more frequent than the derived form, derived forms of above-average lexical frequency are more often associated with an above-average number of meanings than are derived forms of below-average frequency (Chi-Square=33.92, p<.001). Moreover, the lexical frequency of the derived form is a significant predictor of the observed variation in this set (Spearman’s rho=.38, p<.0001). For that subset for which the derived form is more frequent than the base, the two statistics provide mixed evidence regarding the role of the frequency of the derived form. When we divide the data categorically into “above-average” and “below-average” subsets, no effect emerges. Thus, for forms for which the derived form is more frequent than the base, forms of above-average lexical frequency are not significantly more likely to be associated with an above-average number of definitions than forms of below-average frequency are (Chi-Square=1.02, n.s.). A gradient relationship does, however, still hold (Spearman’s rho=.37, p<.01). This, together with the overall size of the correlations
reported above, suggests that the lexical frequency of the derived form is the single best predictor of degree of polysemy. The overall picture that emerges is that one set of prefixed words resists acquiring new meanings: prefixed words of below-average frequency, which are also less frequent than the bases they contain. If a word reaches a certain threshold of frequency, or if it overtakes the frequency of the base it contains, then it is likely to acquire new meanings. If neither of these events occurs, a word is likely to resist polysemy. Final evidence that both relative and absolute frequency are separately related to polysemy comes from an examination of their behavior in a multiple regression model of the data. Both the log ratio of the base frequency and the derived frequency, and the derived frequency alone, are independent and significant predictors of the variation in polysemy, together accounting for 15% of the variation (r2=.15, p<.00001). Figure 5.5 shows the log number of meanings, plotted against the sum of these two factors, each weighted by the value of their respective coefficients. We will return to further discussion of these issues later in the chapter, after an examination of results involving semantic drift. We will see that proliferation of meaning and semantic drift are not necessarily directly linked, and that while (as we have seen) the former is associated with two distinct frequency effects, the latter is associated with only one.

5.2.2 Relative Frequency and Semantic Drift of Prefixed Forms

This section investigates degree of semantic drift as a function of the frequency characteristics of the derived form and the base. Semantic drift can be assessed by locating the base of the derived form in the definition of the derived form. If a prefixed form contains its base in its definition, it is assumed that it is reasonably semantically transparent.
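The multiple regression reported above combines the two log-frequency predictors of the log meaning count. The sketch below is a generic ordinary-least-squares fit of that model form; the data and function name are illustrative assumptions of ours, not the book's CELEX material or software:

```python
import numpy as np

def fit_polysemy_model(derived_freq, base_freq, n_meanings):
    """OLS fit of log(meanings) on log(derived frequency) and the
    log (base/derived) frequency ratio, with an intercept."""
    derived = np.asarray(derived_freq, dtype=float)
    base = np.asarray(base_freq, dtype=float)
    X = np.column_stack([
        np.ones(len(derived)),   # intercept
        np.log(derived),         # absolute frequency predictor
        np.log(base / derived),  # relative frequency predictor
    ])
    y = np.log(np.asarray(n_meanings, dtype=float))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coefs
    r2 = 1 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return coefs, r2
```

The weighted sum plotted on the x-axis of figure 5.5 corresponds to the non-intercept part of `fitted` here; the chapter reports r2=.15 for the real data.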
Words which have drifted away from the meaning of the base are less likely to invoke the base explicitly in their definitions. The rationale for this methodology is discussed in slightly more detail in section 3.3.2, where a similar calculation was done in an
investigation of the relationship between phonotactics and semantic drift. The definitions were examined in the manner described in 3.3.2. Table 5.8 shows the number of words for which the base was mentioned in the definition, for words with bases more/less frequent than their derived forms. Relative frequency is related to semantic drift. Words for which the derived form is more frequent than the base are significantly less likely to mention their base in their definition than words for which the derived form is less frequent than the base (Chi-Square=11.68, p<.001).

Figure 5.5: Log number of meanings, as predicted by the log derived frequency, and the log (base/derived) frequency, each weighted by their respective coefficients as returned by a multiple regression. The line shows a non-parametric scatterplot smoother fit through the data.

               DERIVED MORE FREQ            BASE MORE FREQ
            base absent  base present   base absent  base present
dis              4            4             28           69
un               2            8              5           73
in(sane)         8           14             13           72
em               4            5              4           51
up               0            0              3           25
mis              1            0              6           45
in(doors)        1            2              7           33
ex               0            0              4            4
trans            0            0              9           10
TOTAL           20           33             79          382
               (38%)        (62%)          (17%)        (83%)

Table 5.8: Number of definitions which do/do not explicitly invoke the base, for prefixed forms with the derived form more frequent than vs less frequent than the base. Chi-Square on totals line: Chi-Square=11.68, df=1, p<.001

We saw above that, in addition to relative frequency, the absolute frequency of the derived form was also relevant to polysemy. The data under investigation here provides an interesting contrast. The absolute frequency of the derived form appears to be unrelated to the likelihood that a base will appear in that word’s definition. Table 5.9 shows the collapsed distribution of above-average and below-average frequency derived forms for which the base is present or absent in the definition.

                above-average freq.   below-average freq.
base absent         23 (21%)              76 (19%)
base present        85 (79%)             331 (81%)

Table 5.9: Number of forms with base absent/present in definition, for prefixed forms with the derived form having above/below average lexical frequency. Chi-Square=.23, n.s.

Above-average frequency prefixed forms are no more likely than below-average frequency forms to mention their bases explicitly in their definitions. The absolute frequency of the derived form does not appear to be relevant to semantic drift (Chi-Square=.23, n.s.). Attempting to factor out relative frequency does nothing to facilitate the emergence of an effect of absolute frequency. Thus, amongst derived forms which are more frequent than their bases, above-average frequency prefixed forms are not more likely than below-average forms to mention their base in their definitions (Chi-Square=.5, n.s.). The effect is also absent amongst derived forms which are less frequent than the base they contain (Chi-Square=.05, n.s.). The effect of relative frequency, by contrast, appears relatively robust across the full spectrum of derived forms. Amongst derived forms of above-average frequency, forms which are more frequent than their bases are significantly less likely to invoke their base in the definition than forms which are less frequent than their base (Chi-Square=7.23, p<.01). The same trend is clearly in the right direction for derived forms of below-average frequency, though it falls marginally short of significance (Chi-Square=3.74, p<.06). Here, then, we have clear evidence that the relative frequency of the derived form and its base is relevant to the semantic transparency of prefixed forms, and so (by inference) to decomposition. The effect observed here cannot be an artifact of the absolute frequency of the derived form—the absolute frequency of the derived form appears to have no effect at all on the observed phenomena. The evidence involving prefixed forms is clear. In the following sections we turn to an investigation of the relationship of relative lexical frequency to suffixed forms in English.
5.3 Relative Frequency in Suffixed Forms

5.3.1 Relative Frequency and Semantic Drift in Suffixed Forms

The corpus described in section 3.4.1 was examined in order to assess the influence of relative frequency upon semantic drift in suffixed forms. The hypothesis was that derived forms which are more frequent than their bases would be less likely to explicitly invoke their base in their definition than derived forms which are less frequent than their bases. The relevant distribution is shown in table 5.10. The predicted pattern is present. Forms for which the derived form is more frequent than the base are significantly less likely to mention their base in their definition than forms for which the base is more frequent (Chi-Square=6.63, df=1, p<.01). This indicates that the pattern observed with prefixed forms is also present for suffixed words.

                der>base      base>der
base absent     25 (16%)     169 (9%)
base present   133 (84%)    1675 (91%)

Table 5.10: Number of forms with base absent/present in definition, for suffixed forms with the derived form more/less frequent than the base it contains. Chi-Square=6.63, df=1, p<.01

The same data-set is given in table 5.11, this time broken down by absolute frequency. Numbers of forms for which the base was present or absent in the definition are given, for forms which are of above- or below-average lexical frequency. The average lexical frequency for derived forms in the data-set was 122.2 (per 17.9 million).

                above-average freq.   below-average freq.
base absent         38 (13%)             158 (9%)
base present       258 (87%)            1574 (91%)

Table 5.11: Number of forms with base absent/present in definition, for suffixed forms with the derived form having above/below average lexical frequency. Chi-Square=3.58, df=1, n.s. (p<.06)

There is a tendency for frequent derived forms to mention their bases in their definitions less often than infrequent derived forms. However this tendency does not reach significance. Thus the significant relative frequency effect observed above is not an artifact of a strong effect of absolute frequency. This reinforces the claim that there is a relationship between relative frequency and semantic transparency in suffixed forms in English, and that relative frequency is more directly relevant to decomposition than absolute lexical frequency.

5.3.2 Relative Frequency and Polysemy in Suffixed Forms

As has already been noted, there is a link between decomposability and polysemy. Forms which are relexicalized, and accessed as wholes, are more prone to the proliferation of meaning than are forms which favor the parsing route, and are highly decomposed. Recall that, in our investigation of prefixed forms, we found effects of both relative frequency and absolute frequency upon polysemy. Neither of these effects was artifactual. Both frequent forms, and infrequent forms which are more frequent than their bases, are prone to the proliferation of meaning. However, as we have seen, it is only forms which are more frequent than their bases that are likely to drift away from their transparent meanings. Hence, frequent words which are less frequent than their bases may acquire multiple meanings, but are unlikely to become dissociated from the meaning which explicitly invokes the base. Given no reason to assume that prefixes and suffixes work differently in this regard, we therefore predict that both relative and absolute frequency can affect the number of meanings associated with suffixed forms. The average number of definitions in the corpus of 2028 suffixed words is 2.2. Table 5.12 shows the number of forms with above- and below-average definition counts, for derived forms which are more/less frequent than the bases they contain. Note that there are 26 forms for which derived frequency is equal to base frequency. These are not included in the table.

                           der>base      base>der
above-average meanings     67 (42%)     532 (29%)
below-average meanings     91 (58%)    1312 (71%)

Table 5.12: Number of forms with above/below average number of listed definitions, for suffixed forms with the derived form more/less frequent than the base. Chi-Square=12.11, df=1, p<.001

As predicted, derived forms which are more frequent than their bases are significantly more likely to be associated with an above-average number of definitions (Chi-Square=12.11, df=1, p<.001). Analyzing the data according to absolute frequency also yields a highly significant result, as can be seen in table 5.13.

                     above-average freq.   below-average freq.
above-av. meanings       157 (53%)             443 (26%)
below-av. meanings       139 (47%)            1289 (74%)

Table 5.13: Number of suffixed forms with above/below average number of meanings, with the derived form having above/below average lexical frequency. Chi-Square=90.21, p<.001

There is an extremely significant relationship between absolute frequency and polysemy. Frequent forms tend to be associated with more meanings. There is also a significant gradient relationship between the two factors. The log number of meanings is significantly predicted by the log lexical frequency of the derived form (Spearman’s rho=.34, p<.0001). This correlation is shown in figure 5.6. Table 5.14 breaks down the data by both absolute and relative frequency. Note that the 26 forms for which derived frequency is equal to base frequency are not included in the table. Examination of the table reveals that the effect of relative frequency is absent
Figure 5.6: Log number of meanings, as predicted by the log lexical frequency of the derived form (Spearman’s rho=.34, p<.0001). The line shows a non-parametric scatterplot smoother fit through the data.

             above-av. freq.            below-av. freq.
           above-av.   below-av.     above-av.   below-av.
           meanings    meanings      meanings    meanings
base>der     121         101            411        1211
der>base      36          38             31          53

Table 5.14: Number of forms with above/below average number of listed definitions for suffixed forms with the derived form having above/below average lexical frequency, for forms for which the derived form is more or less frequent than the base.
when we restrict our attention to high frequency forms (Chi-Square=.55, n.s.). Amongst low frequency forms, however, there is a significant effect of relative frequency. Low frequency forms which are more frequent than their bases are associated with more meanings than low frequency forms which are less frequent than their bases (Chi-Square=4.98, p<.05). Similarly, when the derived form is more frequent than the base, there is no effect of absolute frequency (Chi-Square=1.77, n.s.). However, amongst derived forms which are less frequent than their bases, there is a highly significant tendency for frequent forms to be more polysemous than infrequent forms (Chi-Square=79.51, p<.001). Thus, the overall distribution in this table is identical in pattern to that seen in table 5.7, for prefixed forms. The two factors appear to interact in the same manner for prefixed forms and suffixed forms. However the distribution of suffixed forms departs from that of prefixed forms in one important respect. For suffixes, while the log lexical frequency of the derived form is a significant predictor of the gradience in the data (see figure 5.6), the log (base/derived) frequency was not a significant predictor (p<.19). While the results discussed above do seem to suggest that relative frequency is operative in this data, this lack of a gradient correlation, together with the large effect of absolute frequency seen in table 5.13, leads us to conclude that the absolute frequency of the derived form plays the more dominant role.
5.4 Summary

The relative frequency of the derived form and the base it contains has been shown to have a significant effect upon several factors related to decomposition. This provides additional evidence, supporting the experimental results in chapter 4, that derived forms which are more frequent than their bases are significantly less decomposable than derived forms which are less frequent than their bases. Relative frequency affects semantic transparency—derived forms which are more frequent than their bases tend to drift away from the meaning of their base. They also proliferate in meaning, and so tend to be more polysemous than derived forms which are less frequent than their bases. Polysemy is also related to absolute frequency—derived forms which are highly frequent tend to be associated with more meanings than lower frequency derived forms. As long as the base is of higher frequency than the derived form, however, these forms do not lose their semantically transparent meanings. The semantically transparent meaning remains, along with any additional meanings the form may acquire by virtue of being lexically frequent. Only when the derived form is more frequent than the base is there a tendency for the semantically transparent meaning to disappear.
5.5 Consequences

The results of this chapter, together with those of the previous chapter, provide convincing evidence that the relative frequency of the derived form and the base is related to
decomposability, for both prefixed and suffixed forms. This result has consequences not only for models of morphological access and representation, but also for the methodology used to investigate these issues. There is a large experimental literature dedicated to investigating whether affixed words are accessed whole, or in decomposed form. Comparisons are made between regular and irregular affixes, inflections versus derivations, level one versus level two derivations, phonologically transparent versus non-transparent affixes, prefixes versus suffixes, and various other contrasts, in an attempt to establish in which cases decomposition takes place. Much of the literature deals with orthographic input. And almost all of it either controls for, or manipulates, lexical frequency. There are numerous paradigms used to test for decomposition, but two of them dominate the literature—repetition priming and response latency measurement. In the priming literature, an attempt is generally made to control for lexical frequency, and priming effects are investigated. Does a derived form prime its base? Does a base prime related derived forms? And if so, to what degree? Results are mixed, but as a whole the body of literature certainly demonstrates that priming takes place in some cases. Delimiting those cases has proved to be a difficult task. I will not give an exhaustive review of the literature here, but will describe some typical results, so as to give its general flavor. Many affixed forms do tend to prime the bases they contain. Some argue that this priming effect is stronger for inflectional affixes than derivational affixes (e.g. Stanners et al. 1979 for English, Feldman 1994 for Serbian). Others have demonstrated equivalent priming for inflections and derivations (Burani and Laudanna 1992 for Italian, Napps 1989 and Raveh and Rueckl 2000 for English).
Amongst derivational affixes, some experiments tend to suggest that priming is the same regardless of phonological transparency (Marslen-Wilson and Zhou 1999), whereas others suggest that more priming occurs when the base is phonologically transparent (Tsapkini et al. 1999). Auditory presentation of derived suffixed forms appears to inhibit recognition of other suffixed forms sharing the same base. Presentation of sanely, for example, inhibits sanity (Marslen-Wilson et al. 1994, Feldman and Soltano 1999, Marslen-Wilson and Zhou 1999). There may be a modality difference here, with priming occurring between derived suffixed forms when presented visually, and no apparent effect cross-modally (Feldman and Soltano 1999). However priming appears to occur between prefixed and suffixed forms in all modalities (Feldman and Soltano 1999). Productive derivational affixes also appear to prime themselves—darkness, for example, facilitates recognition of toughness (Marslen-Wilson, Ford, and Zhou 1997, as cited in Marslen-Wilson and Zhou 1999). Finally, low frequency derived forms appear to prime their stems more than high frequency derived forms do (Meunier and Segui 1999b). Amongst inflected forms, regular forms appear to display full priming (Stanners et al. 1979, Napps 1989, Marslen-Wilson et al. 1993). Experimenters have variously claimed that English irregular inflections display no priming (Kempley and Morton 1982), reduced priming (Stanners et al. 1979), and full priming (Fowler et al. 1985). Marslen-Wilson et al. (1993) argue that while no priming occurs for irregulars such as burn-burnt, inhibition actually occurs when the inflection involves a vowel change in the stem. That is, gave inhibits recognition of give.
Relative Frequency and the Lexicon
105
In the literature on response latencies, lexical frequency is typically manipulated in a lexical decision task. If, for example, derived forms with high frequency bases are recognized more quickly than derived forms with low frequency bases, this would provide some evidence for the role of the base in access, i.e., for decomposition. If, on the other hand, high frequency derived forms are recognized faster than low frequency derived forms, this is thought to provide evidence for a whole-word access strategy. When base frequencies are kept constant, some regular inflections show effects of surface frequency upon response latencies. Sereno and Jongman (1997), for example, demonstrate an effect of surface frequency upon recognition of English regular plurals. Bertram et al. (2000) suggest that this may be due to the fact that -s is homonymal in English, which would slow down processing in decomposition. They present similar homonymal examples from Dutch and Finnish, which display surface frequency effects. Burani and Caramazza (1987) argue that reaction times for suffixed derived words in Italian are related to surface frequency. Meunier and Segui (1999a, 1999b) demonstrate that some suffixed forms in English display surface frequency effects and some do not, and argue that this depends on the word’s position in the morphological family—forms compete with one another based on surface frequency. Vannest and Boland (1999) find a surface frequency effect for forms in -ity and -ation, but none for forms suffixed in -less. They argue that this may reflect a difference between level 1 and level 2 affixes, but, in a second experiment involving more level 2 affixes, they find little evidence to support this claim. When stem frequency is manipulated, and surface frequency is kept constant, Bertram et al. (2000) argue that stem frequency effects are absent for affixes which are homonymal, but may otherwise be present. 
Burani and Caramazza (1987) show that they are present for suffixed derived words in Italian. Gordon and Alegre (1999) argue that there is a frequency threshold effect, with stem frequency effects emerging when the surface frequency is below 6 occurrences per million. However, Meunier and Segui (1999b) demonstrate that there is an effect of stem frequency for suffixed words with both high and low surface frequency. Vannest and Boland (1999) show that the effect is present for some affixes but not others, and suggest it will always be absent for level 1 affixes. Bradley (1979) found similarly variable results. The results of Cole, Beauvillain and Segui (1989) indicate that the cumulative root frequency is relevant for the processing of suffixed forms, but not for prefixed forms. This disjunction is supported by the results of Beauvillain and Segui (1992), who demonstrate an effect of surface frequency for prefixed forms but not suffixed forms. (However, note the results outlined above which claim there is a surface frequency effect for at least some suffixed forms.) And Taft and Forster (1975) and Taft (1979) argue that there is a stem frequency effect for prefixed forms, even when they have bound stems. As should be apparent, there are conflicting and confusing results in both the repetition priming and the response latency literatures. Many of the experiments are conducted with an implicit assumption that words containing the same affix are a largely homogeneous set. This makes the results difficult to interpret in relation to the word-specific effects described in this book, and may go some way towards explaining the variable results. While it is apparent that the base sometimes plays a role in the perception of derived forms, exactly what that role is, and in what cases it is relevant, is not at all clear. And the
Causes and consequences of word structure
106
suggestion that the relative frequency of the derived form and the base may affect the access strategy makes the results of many of the studies reported above particularly difficult to interpret. Most of the studies either control or manipulate surface and/or base frequency, because there is an implicit assumption that these factors may affect response times. However, if the relationship of these factors to each other affects the access strategy, then controlling one or both factors can have radically different consequences depending on the level at which they are controlled relative to one another. Consider, for example, a hypothetical response latency experiment where we are controlling base frequency and manipulating surface frequency, in order to determine whether there are whole word representations for words derived with a given affix. Assume that we have (miraculously) managed to control all of the base words such that they all have frequencies of 300/million in English. Now, what do we expect to happen when we manipulate surface frequency? Well, if the relative frequency of the base and the derived form has an important effect on representation and access, then it will depend on the frequency range within which we manipulate surface frequency. If we manipulate our derived forms such that they all range between 400 and 1000/million in English, then we expect that all should have robust whole word representations (as they are more frequent than the bases they contain). Then the more frequent derived forms will be accessed more quickly than the less frequent derived forms, and a significant effect of surface frequency should arise. Now, what if we manipulate our derived forms such that they all range between 50 and 200/million? In this case, all the derived forms are less frequent than the bases they contain. If the affix is phonologically transparent, and is productive, then such forms may well be accessed via their component parts.
As such, because the bases are of roughly the same frequency, there should be little difference in the response latencies for these forms. In a third possible scenario, we might manipulate the surface frequencies, such that they range from 100 per million to 600 per million, thus straddling the level at which base frequency is controlled. In such a case we expect variable behavior. If the whole word route is significantly faster than the decomposed route, then perhaps the more frequent words in our set will be accessed faster. If so, a misleading frequency effect will arise. Words with high surface frequency will be accessed faster than words with low surface frequency, but it would be incorrect to conclude that words with this affix are all accessed as whole words, and that it is the frequency of these words which is directly responsible for the observed difference. It should be clear that, if relative frequency is relevant to access, controlling both factors, or manipulating one of them, should have radically different effects depending on the relative frequency ranges in which they are controlled. However, such information is rarely reported in the literature. When frequency characteristics are reported, they tend to take the form of means, or ranges. Vannest and Boland (1999), for example, report that they matched words for surface frequency and manipulated base frequency, and report means for each of the two sets, making it impossible to deduce their overall profile in terms of relative frequency. Other experimenters also report the range within which their stimuli were controlled. For example, in their priming studies, Raveh and Rueckl (2000) report that their base form targets had a mean frequency of 27 (median 12, range 1–200); their inflected primes had a mean frequency of 27 (median 14, range 0–172); and the derivational primes had a mean frequency of 18 (median 8, range 0–89) (Raveh and Rueckl 2000:112). 
As such, the relative frequency of the affixed form and the base must
have displayed variation within each condition, but we cannot know whether the derived form was more often more frequent than the base in one condition than in the other. In some cases, the base frequencies differ markedly across conditions. For example, Sonnenstuhl et al. (1999) investigate cross-modal priming of German regular and irregular inflections. They controlled stem frequency within conditions but were not able to do this across conditions.

…since nouns which take -s plurals tend to have lower frequencies in German than nouns that take -er plurals (CELEX mean lemma frequencies are 52 for -s plurals and 563 for -er plurals), the lemma frequencies of the nouns we used for -er plurals were higher (CELEX mean frequency 538) than those of the nouns with -s plurals (CELEX mean lemma frequency 43). We would expect this difference to lead to shorter lexical decision times for nouns that take -er plurals in each of the three experimental conditions, especially in the unprimed control condition. Morphological priming effects, however, are not determined by directly comparing -s and -er plurals, but are measured within target sets, that is by comparing the same targets in the experimental, identity and control condition. As this is done separately for -s plurals and for -er plurals, the different lemma frequencies mentioned above should not affect the priming results. (Sonnenstuhl et al. 1999:215–216)

The argument they put forward stands if lexical frequency's only role is to affect speed of access. However, if lexical frequency, and in particular, the relative frequency of the derived form and the base, affects the actual form of the representation, and the preferred access strategy, then such an experimental approach is problematic.
The difference between the two conditions described above presumably means that the relative frequency relationships in the two conditions also differ, though the frequency of the inflected forms was not controlled, so we cannot know this for sure. In short, relative lexical frequency is not taken into account in current experimental work on this topic. While we are often told that experimenters “controlled for lexical frequency,” the exact relationships that hold vary across conditions, experiments, and experimenters, and are often impossible to deduce. This failure to take account of relative frequency relations may well account for some of the conflicting results reported in the literature.
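The logic of the hypothetical experiment described above can be made concrete with a toy simulation. Everything in the sketch below is invented for illustration: the dual-route rule and the particular latency function stand in for whatever real mechanism links frequency to access speed. The point is only that, with base frequency fixed at 300/million, the presence or absence of an apparent surface frequency effect depends entirely on where the surface frequencies sit relative to the base.

```python
def predicted_latency(surface_freq, base_freq):
    """Toy dual-route assumption: a derived form more frequent than its base
    is accessed as a whole word (latency driven by surface frequency);
    otherwise it is accessed via its parts (latency driven by base
    frequency). The latency function itself is arbitrary."""
    dominant = surface_freq if surface_freq > base_freq else base_freq
    return 1000 / dominant ** 0.5  # higher frequency -> faster access

BASE = 300  # all bases "controlled" at 300 occurrences per million

# Scenario 1: surface frequencies 400-1000/million, all above the base.
# Latencies fall as surface frequency rises: a robust surface frequency effect.
scenario1 = [predicted_latency(f, BASE) for f in (400, 600, 800, 1000)]

# Scenario 2: surface frequencies 50-200/million, all below the base.
# Every form is accessed via its base (300/million), so latencies are flat.
scenario2 = [predicted_latency(f, BASE) for f in (50, 100, 150, 200)]

# Scenario 3: surface frequencies straddle the base frequency.
# Only the high-frequency members switch route, producing a partial,
# potentially misleading "surface frequency effect".
scenario3 = [predicted_latency(f, BASE) for f in (100, 200, 400, 600)]
```

Under these (purely illustrative) assumptions, the same manipulation of surface frequency yields a significant effect, no effect, or a mixed effect, depending solely on the range chosen relative to the controlled base frequency.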
CHAPTER 6 Relative Frequency and Phonetic Implementation
In our calculations over lexica in chapter 3, we observed a consistent effect of phonotactics upon English prefixed words, but no effect upon suffixed words. Despite the fact that listeners rated words which contained illegal transitions across a suffixal morpheme boundary as being highly decomposable (cf. experiment 3), these words appeared no more likely to display characteristics of decomposability (such as semantic transparency) than words which contained legal transitions. We hypothesized that this is partially due to the left-to-right nature of speech. We also speculated that many phonological violations across suffixal boundaries may, in fact, be resolved in the phonetic implementation. Phoneme transitions across suffix boundaries are likely to be more malleable than phoneme transitions toward the beginning of the word, as they are less vital to the word recognition process. Subjects are less likely to notice mispronunciations if they occur late in a word (Marslen-Wilson and Welsh 1978). If other factors lead a whole-word representation to become increasingly robust, we might expect phonological cues to suffixal juncture to disappear. This chapter describes an experiment designed to test the effect of the decomposability of suffixed words upon their phonetic implementation. We find further evidence that the relative frequency of the derived form and the base is relevant to decomposition, and demonstrate that the strength of a morpheme boundary is relevant in the phonetics.
6.1 Experiment 6: Relative Frequency and /t/-deletion This experiment investigates the implementation of /t/ when it occurs in a consonant cluster which straddles a morpheme boundary. We hypothesize that the strength of the boundary will have a direct effect upon the strength of the /t/. Following results of previous chapters, we manipulate the relative frequency of the derived form and the base, predicting that derived forms which are more frequent than the bases they contain will be less decomposable than derived forms which are less frequent than the bases they contain. Consider, for example, the pair of words swiftly and softly. Softly is slightly more frequent than swiftly but this difference is not large—they occur with roughly equivalent frequency.
They differ, however, in the frequency of their bases. Swiftly is more frequent than swift. Softly is much less frequent than soft. Following the discussion and results of previous chapters, we therefore expect swiftly to be less decomposable than softly. This difference results from the fact that softly is highly likely to be parsed in perception, whereas swiftly is prone to whole-word access. What consequences might this difference have for their phonetic implementation? The phonetic context of the /t/ is highly unstable in both words. The consonant cluster is prone to simplification—gestural overlap from the enclosing phonemes may obliterate any trace of the /t/ entirely, or at least reduce it. We predict the degree of reduction to be greater in swiftly than softly for three inter-linked reasons. First, if swiftly has a robust whole-word representation which is not very decomposed, then the /t/ is enclosed in a true consonant cluster, and highly prone to simplification. Softly, on the other hand, if it is more decomposed, has a boundary inside the consonant cluster (soft#ly). As such, the degree of gestural overlap may be somewhat reduced. This can be compared further with a form like daftly, which is so infrequent as to be very unlikely to have a whole-word representation. When a speaker produces daftly, then, it is composed on the fly, of daft plus -ly, and so there is a robust boundary which is likely to prevent consonant cluster simplification. Second, the -ftl transition is unattested morpheme-internally in English, and so is a strong cue to decomposition. Upon encountering such a transition, we predict the Fast Phonological Preprocessor to hypothesize a boundary. Such a hypothesis will facilitate recognition of any form which is parsing-route dominant, but hinder recognition of any form in which the whole-word representation is more robust. 
The presence of the /t/ may therefore facilitate recognition of softly, by clearly marking the morpheme boundary, but will do nothing to facilitate recognition of swiftly, which should be more quickly accessed via a whole-word access route. This may lead the speaker to be more likely to reduce the /t/ in swiftly than softly. And third, the /t/ is an important part of the base word. It will obviously be easiest to recognize soft in softly, if soft is fully contained therein. The presence of the /t/ is therefore important for any word in which the identity of the base word is important. If, on the other hand, a derived word tends towards whole-word access, then the identity of the base word is not important for recognition. For this reason, too, we expect the /t/ to be more likely to be produced in softly than swiftly. And, of course, in a word like daftly where there is unlikely to be any whole-word representation, deletion of the /t/ is likely to considerably hinder recognition. Note that the predictions put forward here run directly counter to independent predictions about the degree of reduction of the base words, when not suffixed. Frequent words tend to be more reduced in production (Whalen 1991, Wright 1997). It follows, then, that the implementation of soft should be more reduced than the implementation of swift, when produced as simple words. The prediction we are making is therefore a particularly powerful one, because it predicts that this trend will reverse when these words are suffixed with -ly. If this is so, it will provide powerful new evidence about the nature of complex words, and the consequences of decomposability for production.

6.1.1 Materials
The experimental stimuli consisted of five sets of four matched words. The target words are given in table 6.1. Word A of each paradigm is more frequent than the base it contains. The relevant frequencies are shown in the table. Swiftly, for example, is more frequent than its base, swift. Word B of each paradigm is of approximately the same frequency as Word A. In Word B, however, the base is of much higher frequency. Word B of each word set, then, is predicted to be more decomposable than Word A, and so less likely to display /t/-deletion. Word C of each word set is also less frequent than the base it contains. In addition, Word C has an extremely low frequency of occurrence. We predict Word C is extremely unlikely to display properties of whole-word access, and so should display evidence of a /t/. Word D is a control, in which no /t/ is present. If swiftly undergoes complete /t/-deletion, then, the transition between the syllables should be comparable to the transition observed in briefly. The words were embedded in sentential contexts, such that all target words occurred phrase finally. If a modifier was present, the same modifier was used for all four words in the word set. The sentences used for the softly/swiftly paradigm are given in (5).

(5)
(a) Chris dropped by very briefly.
(b) Sam cleaned up the mess very swiftly.
Word A       freq.  base freq.   Word B       freq.  base freq.   Word C       freq.  base freq.   Word D        freq.
diligently     35      31        arrogantly     17      116       decadently      0      38        gentlemanly     31
frequently   1036     396        recently     1676     1814       potently        5     119        suddenly       336
swiftly       268     221        softly        440     1464       daftly          0      56        briefly        584
exactly      2535     532        directly     1278     1472       erectly         5     218        quickly       2677
listless       42      19        tasteless      30     1027       dustless        1     762        classless       38
Table 6.1: Experiment 6: Stimuli

(c) Fran tapped Sue’s arm very softly.
(d) John toppled onto stage very daftly.
In addition to the 20 sentences involving target words, 20 filler sentences were used. Example filler sentences are shown in (6).

(6)
(a) Spring came early this year.
(b) The stapler and tape are kept on the top shelf.
(c) Sue’s answering machine picks up after just three rings.
(d) Ellen hates getting out of bed in the morning.
The 40 sentences were then pseudo-randomized such that four or more sentences intervened between any two sentences belonging to the same word set. The sentences were printed on index cards. Subjects were seated in a soundproof booth, and went through the cards one at a time, reading each sentence twice. After going through all the cards, the task was repeated. Each subject therefore produced each sentence a total of four times. Six Northwestern University undergraduate students completed the task, for course credit in an introductory linguistics class.

6.1.2 Measurement and Analysis

For all subjects, all four tokens of each target word were excised and analyzed using esps/xwaves. For each subject, four rankings were collected for each word set, and then average rankings were calculated. Consider the swiftly/softly word set, for example. Each subject produced four tokens of all four words in this word set. The first production of each word was ranked with respect to the first production of the other words in the set, the second was ranked with respect to the second production of the other words, and so on. Rankings were used rather than direct measurements, so as to allow for statistical comparability across the different word sets, for which slightly different measurement techniques were used, as outlined below. For the swiftly/softly set, the ranking was based solely on the duration of any characteristics associated with the /t/. That is, the period from the offset of the fricative to the onset of the lateral was measured. If the stop was released, then the period of release was included in the measurement. Figures 6.1 to 6.4 show waveforms and spectrograms for one subject’s fourth repetition of each target type, indicating the period which was measured. Once the duration was measured, the durations were transformed into ranks. In these particular examples, the resulting ranking is exactly as predicted by our hypothesis: 1. daftly, 2. softly, 3. swiftly, 4. briefly. There were several cases in which there was no stop present, but the fricative was clearly geminated (as in swif-fly). Such cases were ranked below any tokens which contained stops, but above cases which contained a simple, non-geminated fricative. This process resulted in four rankings for each word-set, for each speaker. The rankings for each word-set were then averaged together. For example, if the speaker shown in figures 6.1 to 6.4 consistently displayed these rankings, then the resulting average ranking would be: 1. daftly, 2. softly, 3. swiftly, 4. briefly.
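The rank-and-average procedure, including the halfway rank assigned to tied tokens (described below), can be sketched as follows. The first repetition uses the durations shown in figures 6.1 to 6.4; the remaining repetitions are invented for illustration.

```python
def rank_with_ties(durations):
    """Rank one repetition's /t/ durations, longest first (rank 1 = most
    /t/-ness). Tied tokens share the mean of the ranks they would occupy,
    e.g. a two-way tie for last place receives rank 3.5."""
    ordered = sorted(durations, reverse=True)
    return [sum(i + 1 for i, d in enumerate(ordered) if d == x) / ordered.count(x)
            for x in durations]

# Four repetitions of daftly, softly, swiftly, briefly for one hypothetical
# speaker (seconds); repetition 1 matches the durations in figures 6.1-6.4.
reps = [
    [0.119, 0.083, 0.027, 0.000],
    [0.110, 0.075, 0.000, 0.000],  # swiftly and briefly tie for last: 3.5 each
    [0.100, 0.060, 0.010, 0.000],
    [0.090, 0.000, 0.000, 0.000],  # three-way tie: ranks 2, 3, 4 average to 3
]

per_rep = [rank_with_ties(r) for r in reps]
average_ranking = [sum(col) / len(col) for col in zip(*per_rep)]
# daftly 1.0, softly 2.25, swiftly 3.125, briefly 3.625
```

The averaged ranks, one per word, are what appear in table 6.2; a word whose rank varied at random across repetitions would average 2.5.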
Figure 6.1: Subject 3: Fourth repetition of daftly. Waveform and spectrogram, with labels indicating measurement of stop duration. Total stop duration=.119 secs
Figure 6.2: Subject 3: Fourth repetition of softly. Waveform and spectrogram, with labels indicating stop duration. Total stop duration=.083 secs
Figure 6.3: Subject 3: Fourth repetition of swiftly. Waveform and spectrogram, with labels indicating stop duration. Total stop duration=.027 secs
Figure 6.4: Subject 3: Fourth repetition of briefly. Waveform and spectrogram, with labels indicating stop duration. Total stop duration=0 secs

However, if the speaker displayed entirely random behavior, with every word occupying a different rank each time, then this would result in every word in the set receiving an average ranking of 2.5. Words in the listless/tasteless word-set were ranked in an exactly analogous manner. Measurements were made from the offset of the fricative to the onset of the lateral. For each word-set, four individual rankings were obtained for each speaker, and then these four rankings were averaged together to obtain an averaged ranking for each speaker. Measurement in the exactly/directly word-set included the complete duration from the offset of the vowel to the onset of the lateral, to avoid the difficulty associated with discerning any boundary between the two adjacent stops. The diligently/arrogantly and frequently/recently word-sets did not lend themselves well to duration measurements. Because the /t/ is embedded between a nasal and a lateral in these words, a complete stop is usually absent. Instead, there is often a region of glottalization associated with the presence of a /t/ in the lexical items. Items in these word-sets were examined in pairs. The presence of a complete stop was ranked above glottalization, and, in cases in which multiple words in the same set contained stops, the duration of the stop was measured, in order to obtain a ranking. Glottalized tokens were ranked above non-glottalized tokens. To rank two tokens which contained glottalization,
the waveforms were carefully examined together. If one token displayed a greater degree of glottalization, which was both audible and visible in the degree and length of disturbance in the waveform, then the two tokens were ranked accordingly (cf. Pierrehumbert 1995). In cases in which the distinction was not clear, the tokens were left unranked with respect to each other. These word-sets therefore contained a greater proportion of tokens which were not ranked with respect to each other than did the word-sets in which the stops could be easily measured. Tokens which were equally ranked were assigned a rank halfway between the two positions they occupied. For example, if recently and frequently are tied for top ranking in a speaker’s first tokens of these words, then the two words are assigned a rank of 1.5 for this particular set of utterances. The analysis resulted in a ranking in terms of the presence of /t/, for each of the word-sets in table 6.1, for each of the six speakers.

6.1.3 Results and Discussion

The averaged rankings for all five word-sets and all six speakers are given in table 6.2.¹ The words are arranged from left to right, in terms of the rankings predicted by our hypothesis. We expect a fairly clear and reliable /t/ in the leftmost word, as it is very low frequency, and so almost certainly accessed via a parsing route. The second word is of somewhat higher frequency, but is still less frequent than the base it contains. Thus, the relative frequency facts suggest it should be reasonably decomposable. The third word is more frequent than the base it contains, and so we predict that it is less decomposable, and more likely to reduce or delete the /t/. Finally, the rightmost word is a control, which did not contain a /t/ to begin with. It therefore represents total /t/-deletion, and so serves as a benchmark against which to compare the other words. Subjects 1, 2 and 4, for example, have equal averaged rankings for briefly and swiftly, indicating that the /t/ is completely absent in swiftly for these speakers. The words in the middle two columns represent the crucial relative frequency manipulation. For all five items, more subjects ranked them in the predicted order than vice versa. Similarly, all six subjects ranked more pairs in the predicted order than vice versa (sign test, p<.05). Words which are more frequent than the bases they contain are more likely to display /t/-deletion (or reduction) than words which are less frequent than their bases. There is little evidence that this reduction is due to complete relexicalization of such words, as they still contain more evidence of a /t/ than the controls (sign test: by subjects, p<.05). The difference between the first two columns goes in the predicted direction for all items but potently. And all subjects display the predicted contrast between columns one and two except subject 6. Necessarily decomposable words display more of a /t/ than words which at least have some level of frequency, but are prone to decomposition because of the relatively higher frequency of their base.

¹ Several of these scores differ marginally from the results published in the original dissertation. This is due to addition errors, which have been corrected in this version.
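The sign tests reported here can be computed directly from the binomial distribution. The function below is a generic one-sided sign test; the example counts are illustrative (six subjects all patterning in the predicted direction), not the exact comparisons used in the analysis.

```python
from math import comb

def sign_test_p(successes, n):
    """One-sided sign test: probability of observing at least `successes`
    comparisons in the predicted direction out of n, under the null
    hypothesis that either direction is equally likely (p = .5).
    Ties are assumed to have been dropped before counting."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n

# If all six subjects pattern in the predicted direction:
p = sign_test_p(6, 6)  # 1/64, i.e. about .016, which is below .05
```

With only five or six binary observations, significance at the .05 level requires near-unanimity, which is why the by-items and by-subjects patterns reported above are so striking.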
The numbers in table 6.2 are summarized graphically in the boxplots in figure 6.5. The labels for each boxplot refer to the word-type represented, rather than the specific word itself. That is, the leftmost boxplot in figure 6.5 represents all of the numbers appearing in column one of table 6.2. The word-types are ordered left to right in order of predicted ranking.
6.2 Discussion

Extremely low frequency words (such as dustless and daftly) tend to show the most robust implementation of a coronal stop. One exception to this tendency is the implementation of potently. A likely explanation for this exception is the presence of a coronal stop earlier in the word. English contains a number of Obligatory Contour Principle-type effects (Berkley 1994), in which similar segments tend to be dispreferred in close proximity. Thus, the presence of the first /t/ in this word may have some effect upon the implementation of the second.

subj.   daftly   softly   swiftly   briefly
1        1.25     2.25     3.25      3.25
2        1        3        3         3
3        1.5      2        2.75      3.75
4        1.25     2.5      3.125     3.125
5        1.5      2.875    2.25      3.375
6        2.25     2.125    2.375     3.25

subj.   dustless   tasteless   listless   classless
1        1.5        1.875       3          3.625
2        1.5        2.125       2.625      3.75
3        1.5        1.5         3          4
4        1          2.75        3          3.25
5        1.375      2.625       2.875      3.125
6        2.75       2.375       1.625      3.25

subj.   erectly   directly   exactly   quickly
1        1.75      2          2.25      4
2        2         2.5        3         2.5
3        1.5       2.75       2.25      3.5
4        1.25      2.75       2.25      3.75
5        1.5       1.75       3.75      3
6        1         2.75       2.75      3.5

subj.   decadently   arrogantly   diligently   gentlemanly
1        1.875        2.125        2            4
2        1.25         2.25         2.5          4
3        2.25         2.5          1.5          4
4        1.875        1.625        2.5          4
5        1.375        2            2.625        4
6        1.5          2.375        2.375        3.75

subj.   potently   recently   frequently   suddenly
1        1.625      2          2.375        4
2        2.625      1.25       2.125        4
3        2.5        1.75       1.75         4
4        1.25       2.25       2.5          4
5        2.375      2          1.875        3.75
6        2.625      1.5        2.125        3.75

Table 6.2: Experiment 6: Average “/t/-ness” rankings
Figure 6.5: Boxplots of the average “/t/-ness” rankings in table 6.2. Each boxplot represents an entire column. The “swiftly” boxplot, for example, represents averaged rankings for all 5 of the words which were more frequent than their bases, for all 6 subjects.

For the rest of the word-sets, the very low frequency derived form displayed more /t/-ness than the somewhat more frequent derived form which was nonetheless less frequent than its base. As Whalen (1991) and Wright (1997) have shown, frequent words tend to be more reduced in implementation, and so this result could follow either from facts about decomposability, or simply from facts about the frequency of the word as a whole. Note, though, that in order to argue that the frequency of the word as a whole is having an effect, words like tasteless must have a whole-word representation. If the fact that tasteless is more frequent than dustless is encoded somewhere, then the fact that tasteless exists must also be encoded. This would argue against theorists who claim that most words are stored in decomposed form, with the occasional high frequency form stored. It would also argue against any theory which assumed a binary division between forms which are decomposed versus stored. This result suggests tasteless must be stored in some sense, but the difference between tasteless and listless also suggests that stored
forms must display varying levels of decomposability. We should be cautious about reading too much into the dustless-tasteless comparison, however, because the lack of control for base frequency makes the results difficult to interpret. That is, dustless may contain more of a /t/ than tasteless, simply because dust contains more of a /t/ than taste. The more important, and more carefully controlled, manipulation is the role of the relative frequency of the derived form and the base. When we control for the frequency of the derived form, the relative frequency of the base has an important effect. Words which are more frequent than the bases they contain display a significant reduction in the implementation of /t/ compared to words which are less frequent than the bases they contain. This therefore provides strong support for the evidence presented in previous chapters that the relative frequency of the derived form and the base affects the decomposability of a word. This result is particularly powerful because known effects of frequency upon the implementation of simple words predict the opposite of what we observe here. Frequent words tend to display a greater degree of reduction than infrequent words (Whalen 1991, Wright 1997). This would predict that (holding the frequency of the derived form constant) complex words with frequent bases should show a greater degree of reduction than words with infrequent bases. Here, we have the opposite effect. This reversal is predicted only when we take varying degrees of morphological decomposition into account. Finally, while some individual speakers showed no difference in pronunciation between swiftly and briefly, overall there was a significant difference between forms containing no underlying /t/, and those words which were more frequent than their bases. For the majority of speakers in this study, it is not the case that, for example, listless has been relexicalized to be just like classless.
It is the case, however, that for the majority of speakers, listless is more like classless than tasteless is. Overall, the results of this study provide us with evidence for three crucial points. First, they provide further evidence that relative frequency plays an important role in morphological decomposition. Second, they demonstrate that the degree of decomposability of a morphologically complex form has an important effect on at least some aspects of its phonetic implementation. And finally, they provide some evidence to explain the lack of observed effect of subtle phonotactic differences upon the semantics of suffixed forms (see chapter 3). Because the relevant juncture comes late in the word for suffixed words, the phonetics is malleable. “Illegal” phonotactics across a morpheme boundary can be easily resolved in the phonetics, and so the Fast Phonological Preprocessor does not necessarily receive evidence that there is a cue to juncture. Suffixed forms with illegal phonotactics across the boundary, then, are more easily able to acquire properties of whole-word access than comparable prefixed forms—they may drift in meaning, and become polysemous. However, when they do so, they may also tend to acquire phonetic properties appropriate to whole-word access. The Fast Phonological Preprocessor is not overly sensitive to the “illegal” transition in listless, because this conflicting cue is (at least partially) resolved in the phonetics.
CHAPTER 7 Morphological Productivity
Schultink (1961) defines morphological productivity as follows:

By productivity as a morphological phenomenon we understand the possibility of language users to coin, unintentionally, a number of formations which are in principle uncountable. (Schultink 1961, as cited and translated by van Marle 1985:45)

The productivity of an affix, then, is the degree to which speakers can use it to unintentionally coin new words. Bauer (1992, 1995) distinguishes between two current views of morphological productivity. Proponents of the scalar view would claim that productivity is a continuum, and that affixes can be more or less productive. Proponents of the absolute view of productivity would claim that productivity can be defined with respect to a set of restrictions—all affixes are equally productive, but some simply place tighter restrictions on possible bases. In this chapter I will argue for the scalar view of productivity. I argue that productivity is a continuum, and arises as a function of decomposed words in the lexicon.¹ Affixes display massive variability in productivity. Some affixes (such as English -ness) are highly productive, and regularly used to create new words. Other affixes are completely non-productive (e.g., -th). Individual affixes can be differently productive with different kinds of bases (see, e.g., Baayen and Lieber 1991), and perhaps even across different registers (Plag et al. 1999). This type of variable behavior makes the phenomenon very complex to model, and has even led some linguists to dismiss it as linguistically uninteresting.

¹ The arguments put forward in this chapter have since been considerably developed in joint work with Harald Baayen (Hay and Baayen 2002, Hay and Baayen forthcoming). The calculations have been repeated with a much larger set of affixes, and a more sophisticated measure of phonotactic juncturehood. That work demonstrates that both of the findings reported in this chapter (that relative frequency is related to productivity, and that phonotactics are predictive of productivity) are robust when tested on a much larger data-set than discussed here.
In short, productivity is a continuum and belongs to a theory of performance which answers questions about how linguistic knowledge is used rather than a theory of competence which answers questions about the nature of linguistic knowledge. (Mohanan 1986:57) However, members of a speech community display remarkable agreement about which affixes can be used in which contexts to create new words (Aronoff 1980). It is not the case, for example, that -th is fully productive for some speakers, and nonproductive for others. Surely knowledge of possible words in one’s language, then, is an aspect of the “nature of linguistic knowledge”. The question of where this knowledge comes from is an important one, and answers have proved elusive. Aronoff (1976:35) dubs productivity “one of the central mysteries of derivational morphology”. In previous chapters I have argued that junctural phonotactics and the relative frequency of the derived form and the base are causally related to the decomposability of affixed forms. In this chapter, I argue that decomposability and productivity are closely related. Affixes which consistently create highly decomposable forms are much more likely to be productive than affixes which create less decomposable words. This result strongly supports models in which morphological productivity is projected from the lexicon.
7.1 Measuring Productivity

Baayen and his colleagues (Baayen 1992, 1993, Baayen and Lieber 1991, Baayen and Renouf 1996, and elsewhere) have discussed a number of possible metrics for measuring the productivity of affixes. The most widely used and cited metric arising from this body of work is the metric P, which measures the category-conditioned degree of productivity, or “productivity in the narrow sense”. P is calculated as shown in (7).

(7) P = n1/N

For any given affix, n1 is the number of forms containing that affix occurring exactly once in a large corpus—the so-called hapax legomena. N is the total number of tokens observed in the corpus containing that affix. Baayen assumes that the number of hapaxes observed for a given affix should be highly related to the number of true neologisms. For a non-productive affix, there will be no true neologisms, and so, as the corpus size increases, the number of words encountered just once should be minimal. For a productive affix, however, we expect to find neologisms, even in a large corpus. Large numbers of hapaxes, then, are “a sure sign that an affix is productive” (Baayen and Renouf 1996:74). One reported advantage of this measure is that high frequency forms decrease overall productivity. Plag et al. (1999) explain as follows: A large number of hapaxes leads to a high value of P, thus indicating a productive morphological process. Conversely, larger numbers of high frequency items lead to a high value of N, hence to a decrease of P, indicating low productivity. These results seem to be exactly in
accordance with our intuitive notion of productivity, since high frequencies are indicative of the less productive word-formation processes. The assumed lack of contribution of high frequency forms to productivity is generally regarded as a consequence of the fact that they are not decomposed (see, e.g. Bybee 1985, 1995a). However the relationship between decomposition and lexical frequency is not a straightforward one, as has been demonstrated elsewhere in this book. Indeed, the use of token frequency as the denominator of P is not uncontroversial. Van Marle (1992), for example, considers the notion of hapax to be in need of further sophistication, and is concerned that the metric leads to assigning equal productivity to affixes with quite distinct distributions. He takes issue with the use of token frequency as the denominator, noting “once a word is coined, the frequency of use of that word, it seems to me, is more or less irrelevant to the degree of productivity of that rule” (Van Marle 1992:156). He suggests that perhaps the frequency of types is more relevant, although he does not examine the consequences of this suggestion closely. P is not designed to have any kind of explanatory power in terms of the source of morphological productivity. While the metric has some problems (see Van Marle 1992 for further discussion), it does seem to extract relevant aspects of the usage of certain terms, and for the most part does a decent job of ordering affixes in a way that is in accord with linguists’ intuitions. It has therefore been used extensively to study the behavior of a range of affixes. Because it ranks affixes in generally the right way, and because it has been widely applied in the last 10 years or so, we will use this metric as an estimation of the degree of productivity of the affixes we are investigating.
This provides an independent measure of productivity, which enables us to examine the relation between productivity and decomposability, and also allows comparison with previous literature.
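To make the calculation in (7) concrete, the hapax-based metric can be sketched in a few lines of Python. The word counts below are invented purely for illustration; they are not drawn from CELEX or COBUILD.

```python
from collections import Counter

def P(token_counts):
    """Category-conditioned productivity: hapaxes over total tokens."""
    n1 = sum(1 for c in token_counts.values() if c == 1)  # hapax legomena
    N = sum(token_counts.values())                        # total affix tokens
    return n1 / N if N else 0.0

# Invented toy counts for words containing two affixes:
ness = Counter({"sadness": 40, "oddness": 1, "aptness": 1, "dimness": 1})
th = Counter({"warmth": 300, "health": 500, "growth": 400})

# -ness shows hapaxes (P = 3/43); -th shows none (P = 0)
assert P(ness) > P(th)
```

Note how a single very frequent form (sadness) inflates N and thereby depresses P, mirroring the intuition in the Plag et al. quotation above.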
7.2 Modeling Productivity

While much effort has been put into measuring productivity, relatively less has been put into modeling it. Some who model morphological processing more generally have built some mechanism for productivity into their model. Frauenfelder and Schreuder (1992), for example, assume differences in productivity to be coded into the resting activation levels of the affix and base word. A high degree of productivity is then modeled by a high resting activation level for the affix. Such an approach presumably assumes that the resting activation level is tied to the frequency of activation, and so implicitly ties decomposition to productivity. The more affixed words that are decomposed on access, the more productive that affix will be. Baayen (1993) also argues for a dual processing race model, in which there are two routes for processing—the direct route, and the parsing route. Whether or not a complex form is parsed is determined by the frequency of that form—if it is above a certain threshold of frequency, then the direct route will win. In order for an affix to remain productive, the parsing route has to win often enough to keep the activation level of that
affix high enough that it can still be accessed. Baayen too, then, ties productivity to decomposition. He defines a new measure of productivity, in which an activation level is computed relative to an arbitrarily chosen frequency threshold, θ. This measure is the number of types of a certain category occurring below this frequency threshold, each weighted by its frequency. He claims this is the only productivity statistic that is psychologically motivated, and that it also seems to be the most reliable. This measure is distinct from the others he proposes, in that it attempts not only to measure the degree of productivity, but also to explain it. The general idea behind the approach is that low frequency types require parsing, and so protect the activation levels of the affixes against decay. The choice of frequency threshold is not straightforward, as Baayen himself notes. He chooses a fairly low threshold, explaining: The low choice of θ in the present paper is motivated by the constraint that only semantically transparent complex words contribute to the activation level A. Since transparency is inversely correlated with frequency, higher values of θ would lead to the inclusion of opaque and less transparent forms in the frequency counts. In the absence of indications in the CELEX database of the (degree of) semantic transparency of complex words, and in the absence of principled methods by means of which degrees of transparency and their effect on processing can be properly evaluated, the research strategy adopted here is to concentrate on that frequency range where complex words are most likely to be transparent. (Baayen 1993:203) Chapter 5 of this book, having established a principled means to evaluate degrees of transparency, demonstrated that frequency by itself is, in fact, not a good predictor. High frequency forms are not significantly more likely to display signs of semantic drift than low frequency forms.
The relative frequency of the derived form and the base, however, was a good predictor, both for prefixes and suffixes. Thus, while drawing a line at a given frequency threshold may have been a good approximation, it does not reliably distinguish between opaque and transparent forms, as Baayen had intended it to. As should be clear from the above discussion, several recent attempts to model the source of productivity rely on the notion of decomposition. High rates of decomposition should ensure the productivity of an affix. Conversely, an affix which is represented by many words which are characterized by whole-word access is unlikely to be productive. In chapters 2 through 5 we established two factors which are fairly robustly related to the morphological decomposability of an affixed word—the phonotactics across the morpheme boundary, and the relative frequency of the derived form and the base. We are therefore in an excellent position to explore the hypothesis that morphological productivity is related to morphological decomposition. Affixes which are represented by a large proportion of forms which are more frequent than the bases they contain are not often parsed out, and so should display low levels of productivity. And affixes which consistently create juncture-like phonotactics across the morpheme boundary are likely to be represented by more decomposed forms, and so should be highly productive.
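Baayen's threshold-based activation measure, as verbally described above, can be sketched as follows. This is only a rendering of the verbal definition, not Baayen's exact published formula, and the frequencies are invented.

```python
def low_freq_activation(freqs, theta):
    """Types of a category occurring below the frequency threshold theta,
    each weighted by its frequency (a sketch of the verbal definition)."""
    return sum(f for f in freqs.values() if f < theta)

# Invented frequencies: a low theta keeps only the low-frequency
# (presumably transparent, parsed) forms in the count.
freqs = {"discernment": 3, "government": 417, "treatment": 120}
assert low_freq_activation(freqs, theta=10) == 3
```

With a low θ, only forms such as discernment contribute; the high-frequency government, which is prone to whole-word access, is excluded from the activation count.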
An indication that this second prediction may be true comes from the literature on level ordering. As will be discussed in the next chapter, level ordering divides English affixation into two strata—one of which is said to be more productive than the other. Consonant-initial suffixes are overwhelmingly more likely to occur on the more productive “level two”. As consonant-initial suffixes are much more likely to create juncture-like phonotactics than vowel-initial suffixes, this bodes well for the prediction. Promising evidence suggesting that relative frequency may also be at play comes from an example raised by van Marle (1992) in his critique of Baayen’s hapax-based measure of productivity. Van Marle translates Baayen’s (1990) characterization as follows: In general productive categories contain relatively few types with a high frequency. […] Conversely, categories with a low degree of productivity are characterized by a low number of hapaxes, and a large number of types with a high frequency […] (Baayen 1990:218, as cited in van Marle 1992:156)

I    high frequency:       1 kinderen    ’children’  437
II   moderate frequency:   2 eieren      ’eggs’       49
                           3 goederen    ’goods’      20
                           4 bladeren    ’leaves’     15
III  low frequency:        5 volkeren    ’peoples’     9
                           6 liederen    ’songs’       7
                           7 gemoederen  ’minds’       4
                           8 gelederen   ’ranks’       4
                           9 runderen    ’cows’        1
IV   not in corpus:       10 kalveren    ’calves’      0
                          11 lammeren    ’lambs’       0
                          12 raderen     ’wheels’      0
                          13 beenderen   ’bones’       0
                          14 hoenderen   ’hens’        0

Table 7.1: Frequency of the 14 irregular plurals in -eren in Modern Dutch, based on counts from Uit den Boogaart 1975. (Table II from van Marle 1992)

Van Marle is skeptical of this characterization, and offers an interesting counterexample—the non-productive plural -eren in modern Dutch. Table 7.1 presents the relevant data (table II from van Marle 1992). Van Marle’s argument runs as follows. Dutch -eren is a classically non-productive affix. Yet it is represented by more low frequency forms than high frequency forms. Moreover, in the corpus studied, five forms did not appear (listed in IV in table 7.1).
Thus, in a slightly larger corpus, we might expect them to turn up as hapaxes. This unproductive affix therefore appears to have the characteristics of a productive affix, and certainly not the unproductive profile described by Baayen, above. This example, claims van Marle, is therefore problematic for Baayen’s approach. When we examine the forms in table 7.1, the semantics of the words listed look anything but random. In fact, they are strongly reminiscent of the types of examples offered by Tiersma (1982) in his discussion of local and general markedness. Tiersma’s argument is that nouns denoting objects which tend to occur in pairs or groups are locally unmarked in the plural. This leads them to have certain profiles, one aspect of which is that the plural is more frequent than the singular. Here, then, we have an affix characterized by words which are not particularly frequent. Baayen would predict these words should be transparently related to their base forms, and that this affix should be productive. However, the words containing this affix tend to be more frequent than their corresponding bases. We therefore argue that the words’ connection with their bases is not strong, and that the affix should not be productive. Note that Baayen and Lieber (1991) have explicitly ruled out any role of relative lexical frequency upon morphological productivity, claiming “derivation ratios are irrelevant to the issue of productivity” (Baayen and Lieber 1991:807). However, our results demonstrate a clear link between relative frequency and decomposability. This, together with Baayen’s own (1993) intuition that decomposability and productivity are linked, predicts that derivation ratios should, in fact, be very relevant. Section 7.2.1 explores this possibility.

7.2.1 Productivity and Morphological Decomposition

In this section I examine the relationship of phonotactics and relative lexical frequency to morphological productivity.
The analysis includes all the affixes analyzed in chapters 2 and 4 which passed Baayen and Lieber’s (1991) test for productivity. That is, all the affixes presented here displayed at least one nonce form in the COBUILD corpus (from which the CELEX frequency counts are taken). Omitting non-productive affixes allowed for a straightforward log transformation of the productivity metric. Frequency counts were taken from the CELEX lexical database, which, as stated above, includes counts from the COBUILD corpus. This is a later version of the corpus used in Baayen and Lieber’s (1991) analysis of English affixes. Thus, the values are roughly comparable with those reported there. One important difference is that the analysis here is restricted to forms with monomorphemic bases, and does not include duplicate headwords. To start with, it is worth noting that there is no apparent relationship in this set of affixes between type frequency and productivity. This (lack of) relationship is shown in figure 7.1. Low type frequency affixes appear toward the left of the graph. Affixes which score highly on the productivity metric appear toward the top of the graph. Indeed, many researchers have explicitly excluded any role of type frequency in productivity (see, e.g. Dressler 1997, Anshen and Aronoff 1999). Anshen and Aronoff note:
Apparently, a language can have in its lexicon a fairly large number of words from which one could potentially analogize to a productive pattern without any consequent productivity. (Anshen and Aronoff 1999:25) They see this as a “formidable obstacle” to those who argue for quantitative analogy over rules. It certainly appears to be the case that type frequency alone cannot predict productivity.

Figure 7.1: Log type frequency vs log productivity—measured as the ratio of hapaxes to total tokens occurring in the corpus (cf. Baayen and Lieber 1991).

What is crucially missing from any analysis focusing on type frequency alone is any information about how decomposable the types are. An affix which has high type frequency is unlikely to be productive if all of the words containing it are relatively opaque. Before turning to an investigation of the relationship between decomposability and productivity, we should note again that the two factors we will be using to index decomposability are not independent. In chapter 3 we saw that individual forms which contain legal junctural phonotactics are significantly more likely to be more frequent than their bases than forms which contain illegal junctural phonotactics. Thus, there is a link between relative frequency and phonotactics at the level of words. This link carries over to affect the overall behavior of affixes. The relationship is shown in figure 7.2. The x axis shows the proportion of forms for which the derived form is more frequent than the base. The y axis shows the proportion of forms which contain illegal junctural phonotactics. As we have already seen that forms containing illegal junctural phonotactics are unlikely to be more frequent than the bases they contain, it is perhaps not surprising that affixes which are unlikely to create forms with illegal junctural phonotactics (toward the bottom of the graph) are associated with higher
proportions of forms which are more frequent than their bases. Taking the log of the proportion of forms for which the derived form is more frequent than the base makes the relationship more linear, and allows for the fitting of a linear regression model. The log proportion of forms for which the derived form is more frequent than the base is well predicted by the proportion of forms creating illegal junctural phonotactics (r2=.68, p<.001). This correlation between relative frequency and phonotactics in individual affixes can be predicted from the fact that they are correlated in individual words. This serves as a first illustration of the degree to which the properties of affixes are shaped by the properties of the words which represent them. Relative frequency and phonotactics are related to each other. Are they also related to morphological productivity? Figure 7.3 shows the relationship between productivity and relative lexical frequency. The x axis shows the proportion of forms, for each affix, which are more frequent than the bases they contain. Thus, roughly 23% of -ment forms are more frequent than the bases they contain, whereas there are no examples of trans- or ex- which are more frequent than the bases they contain. The y axis shows the log of the category-conditioned degree of productivity, P. There is a relationship between the degree to which an affix is represented by derived forms containing less frequent bases, and the productivity measure P. Figure 7.4 displays the relationship between junctural phonotactics and P. The x axis displays the log proportion of words which contain an illegal transition across the morpheme boundary.

Figure 7.2: Proportion of forms for which the derived form is more frequent than the base, vs proportion of forms containing illegal junctural phonotactics.

Figure 7.3: Proportion of forms for which the derived form is more frequent than the base, vs log productivity—measured as the ratio of hapaxes to total tokens occurring in the corpus (cf. Baayen and Lieber 1991).
Figure 7.4: Log proportion of forms containing illegal junctural phonotactics vs log productivity—measured as the ratio of hapaxes to total tokens occurring in the corpus (cf. Baayen and Lieber 1991).

As with relative frequency, we see a relationship between phonotactics and P. Affixes to the right of the graph (characterized by many forms with illegal junctural phonotactics) display higher values of P than affixes toward the left of the graph, which tend to create legal phonotactics. Examination of these two factors, then, indicates that there is, indeed, some relation between decomposability and productivity. The strength of this relation can be estimated by fitting a multiple regression model to the data—how well can P be predicted given phonotactics and relative frequency? The relationship in figure 7.3 is clearly non-linear, and so a log transform was taken of the proportion of forms with the derived form more frequent than the base, so as to facilitate multiple regression modeling. Multiple regression was used to fit a model of P based on its relation to phonotactics and relative frequency. Together, phonotactics and relative frequency account for a large proportion of the observed variability in productivity. A multiple regression returns an r2 of .8, with p<.001. Both the log proportion of forms with the derived form more frequent than the base (coefficient=−1.3, t=−5.3, p<.001) and the proportion of forms containing illegal junctural phonotactics (coefficient=−7.9, t=−2.9, p<.02) make a significant contribution to the model. Figure 7.5 provides a graphical display of the degree to which the resulting model predicts the observed data. The x axis shows the two predicting factors, each weighted by
the values of their coefficients. The y axis shows the observed productivity value. Affixes which have a small proportion of forms which are more frequent than their base, and a large proportion of forms with illegal juncture, are predicted to be highly productive. A possible objection to modeling productivity in this way is that it is based on proportions of forms containing a given affix, without any indication of how many forms are involved. Bybee and Pardo (1981) and Bybee and Newman (1995) have demonstrated that people are unlikely to generalize a pattern which only occurs in a very small number of forms. However, the model shown in figure 7.5 would predict that, if an affix occurred in only one word, and that word happened to be less frequent than its base and contained junctural phonotactics, then the affix should be fully productive. Type frequency surely plays a role too. The current set of affixes does not suffer under an analysis which excludes type frequency, because they are preselected to display at least a minimum of productivity. If the set of affixes were widened, or the analysis were extended to include completely unproductive affixes, type frequency may well be needed in order to account for the observed patterns. Indeed, there are many additional factors which could potentially be relevant. I do not want to claim that the factors examined here are the only factors which are operative in the emergence of productivity. When the set of affixes is widened still further, factors such as the phonological transparency of the base (controlled for in the current data set), and the overall transparency of the semantics, are certain to play a role. Minimally, we predict that any factor which facilitates decomposition of complex forms should also facilitate the emergence of productivity. This brief discussion of productivity has demonstrated two important points.
First, I have further illustrated the importance of the two factors which are the focus of this thesis, demonstrating that their influence extends beyond the decomposition of individual forms. And second, I have provided evidence in support of the hypothesis that morphological productivity is emergent from the lexicon. The more an affix is represented by highly decomposable forms, the more likely it is to be productive.
Figure 7.5: Predicted productivity values based on multiple regression model, plotted against observed productivity values. The x axis shows the proportion of affixed forms containing illegal junctural phonotactics, and the log proportion of forms with derived form more frequent than base, each weighted by their coefficient values. The y axis represents log productivity, as measured by the ratio of hapaxes to total tokens occurring in the corpus (cf. Baayen and Lieber 1991).
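The shape of the multiple regression analysis above can be sketched in Python with NumPy. The five data points below are invented for illustration; they do not reproduce the affix set, the coefficients, or the r2 reported in the text.

```python
import numpy as np

# x1: log proportion of forms with derived form more frequent than base
# x2: proportion of forms with illegal junctural phonotactics
# y:  log productivity (all values illustrative only)
x1 = np.array([-0.6, -1.1, -2.3, -0.4, -1.7])
x2 = np.array([0.05, 0.20, 0.60, 0.02, 0.45])
y = np.array([-4.8, -3.9, -2.1, -5.2, -2.9])

# Design matrix: intercept plus the two predictors
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit

# Variance explained by the fitted model
pred = X @ beta
r2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

Plotting `pred` against `y` would give a toy analogue of figure 7.5: each affix's productivity against the coefficient-weighted sum of its two predictors.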
CHAPTER 8 Affix Ordering
One of the most debated problems of English morphology is that of stacking restrictions amongst derivational affixes. Of the many potential combinations of affixes, only a very small proportion are actually attested. Many attempts to account for apparent restrictions on affix ordering have invoked some form of the Affix-Ordering Generalization (Siegel 1979). The basic claim of the Affix-Ordering Generalization is that affixes can be divided into two sets—level 1 affixes, and level 2 affixes. Level 1 affixation occurs prior to level 2 affixation, and so no level 1 affix can attach outside of any level 2 affix (hence *-ness-ic, *-less-ity, etc.). In this chapter I argue that early accounts of affix ordering were overly restrictive, and drew the line at the wrong level of abstraction. However, more recent work which has discarded the idea that there are any restrictions on ordering (beyond selectional restrictions) misses a number of important generalizations. Many facts about English stacking restrictions can be predicted if we reduce the problem to one of parsability. In this way, we can capture not only the range of generalizations about English stacking restrictions, but also a large number of systematic, word-based exceptions to these generalizations. In a paper widely cited as responsible for disproving the affix-ordering generalization, Fabb (1988) argues against a stratificational approach to affix ordering. He demonstrates that the Affix-Ordering Generalization fails to rule out a large number of affix combinations, which nonetheless do not occur. He argues that affix ordering is constrained only by selectional restrictions: …all but a few non-occurring pairs of suffixes can be ruled out solely by a combination of selectional restrictions, of which one of the most extensive in its effects is a restriction against attachment to an already suffixed word.
(Fabb 1988:538) One of Fabb’s observations, then, is that there seem to be a large number of affixes which do not attach to already affixed words. This observation, I argue, holds the key to understanding restrictions on affix ordering in English. Many affixes are sensitive to internal structure in potential bases. While some affixes basically tolerate no internal structure, others will tolerate structure to some minimum degree. The degree of internal structure tolerated by an affix is not determined by selectional restrictions, however. Rather, it is determined by how much structure that affix, itself, creates. Phrased in terms
of processing, an affix which can be easily parsed out should not occur inside an affix which cannot. I demonstrate that this maxim, when combined with an understanding of the role of frequency and phonology in morphological processing, accounts neatly for restrictions on affix ordering in English. We do not find any cases in which an affix attaches only to forms which are maximally decomposable, and not to forms which are relatively opaque. However, we find large numbers of cases where the opposite holds true. The range of results put forward here cannot be accounted for by the classical affix-ordering account, nor by any account involving selectional restrictions.
8.1 A Parsing Account

This chapter explores the hypothesis that affix-ordering constraints are related to the perception and storage of morphologically complex forms. I argue that affixes which are likely to be parsed out during perception are restricted from occurring inside affixes which are less likely to be parsed. This hypothesis is explored in the context of the two factors which have been the focus of this book, which together provide an index of the inherent decomposability of any given morphologically complex form—the phonotactics across the morpheme boundary, and the relative frequency of the derived form and the base. Such an account incorporates one of the main insights of Lexical Phonology—that affixes create different boundary strengths, and that boundary strength is related to ordering. But it extends this by moving to an account of boundary strength which is gradient, and which is closely tied to decomposability in speech perception. The account relies on the assumption that degrees of parsability exist, such that even different words containing the same affix may exhibit different degrees of decomposability (e.g. discernment vs government). We can view words containing a particular affix as together forming a distribution of varying juncture strengths. Tasteless, for example, has a stronger juncture (i.e. is more easily decomposable) than listless, and these words probably occupy opposite extremes of the -less distribution. Similarly, warmth is more decomposable than health, and these two words tend towards opposite extremes of the -th distribution. However, there is (probably) no word affixed in -th that is more decomposable than some word affixed in -less. That is, these two distributions do not overlap. While -less words occupy a range of juncture strengths, and -th words occupy a range of juncture strengths, the entire -less distribution represents higher levels of decomposability than the entire -th distribution.
Other affix pairs, however, (e.g. -less and -ness), may exhibit overlapping distributions. Thus, while the average juncture strength of one affix may be lower than a second, the most decomposable word containing the first affix may still be more decomposable than the least decomposable word containing the second. The overall separability of an affix has an effect on any individual word, and is a cumulative effect of its level of separability in all words. This leads to a number of predictions. Consonant-initial suffixes, for example, should tend to be more separable than vowel-initial suffixes, because they tend to enter into more illegal phonotactic combinations, which leads to increased rates of parsing, and so increases the overall resting activation level of the affix.
In addition, consonant-initial affixes in words actually containing illegal phonotactic transitions should be more decomposable than in words without such transitions. The hypothesis that decomposability is related to ordering, together with the above view of decomposability, and the results reported in previous chapters regarding phonotactics and lexical frequency, lead to five specific predictions which are articulated below.
8.2 Hypotheses

8.2.1 The same suffix will be differently separable in individual words depending on the phonotactics.

Individual words containing the same suffix will tend to be more decomposable if they contain a low probability phonotactic transition than if they do not (e.g. pipeful is more decomposable than bowlful). If a prelexical processor posits boundaries based on phonotactics, and speeds access to entries which are well aligned with those boundaries, then words containing low probability or illegal phoneme transitions will be more likely to be decomposed than words containing the same affix which exhibit fully legal phonotactics. Evidence that this is true was presented in chapters 2 and 3. Evidence that it affects ordering is given in sections 8.3 and 8.7.

8.2.2 The same suffix will be differently separable in individual words depending on the frequency.

Individual words containing the same suffix will tend to be more decomposable if they are less frequent than their base than if they are more frequent than their base (e.g. discernment is more decomposable than government). Words which are frequent relative to their bases are prone to whole-word access. Affixes which appear in such words (e.g. government—more frequent than govern) are likely to be less separable than the same affix in words which are less frequent than the bases they contain (e.g. discernment—less frequent than discern). Evidence suggesting that this is true was summarised in chapter 4. Evidence that it affects ordering is given in sections 8.4 and 8.6. Note that the theory also predicts that a word for which the base is much more frequent than the derived form should be more decomposable than a word for which the base is just slightly more frequent than the derived form. What we are really dealing with here is not two classes of words (parsed and not parsed) but a continuum, on which we have placed a relatively arbitrary dividing point for reasons of convenience.
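The phonotactic side of hypothesis 8.2.1 can be given a minimal sketch, assuming a hypothetical inventory of phoneme bigrams attested morpheme-internally; real work would estimate transition probabilities from a transcribed corpus rather than use a hand-listed set.

```python
# Hypothetical set of phoneme bigrams attested morpheme-internally
# (illustrative only; phones are written in a SAMPA-like notation).
INTERNAL_BIGRAMS = {"st", "nt", "mp", "lf", "rn", "ft"}

def illegal_juncture(base_phones, suffix_phones):
    """True if the transition across the morpheme boundary does not occur
    morpheme-internally, i.e. the juncture itself cues a boundary."""
    return base_phones[-1] + suffix_phones[0] not in INTERNAL_BIGRAMS

# pipeful: /p/ + /f/ is not a morpheme-internal transition -> strong boundary cue
assert illegal_juncture("paIp", "fUl")
# bowlful: /l/ + /f/ does occur morpheme-internally (cf. golf) -> weak cue
assert not illegal_juncture("b@Ul", "fUl")
```

On this sketch, pipeful carries a juncture cue that bowlful lacks, and so is predicted to be more prone to decomposition in perception.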
8.2.3 Suffixes beginning with consonants will tend to be more separable than suffixes beginning with vowels (e.g. -ness tends to be more separable than -ess)
The resting activation level of an affix is a function of how often it has been accessed. Highly frequent, often-utilised affixes will have high resting activation levels. It follows that affixes which are represented by a large number of words which tend towards
Affix Ordering
137
decomposition will have higher activation levels than affixes which are represented primarily by words which are prone to whole-word access. That is, the more an affix is used in general, the more likely it is to be used for the access of any particular word. Another way of viewing this hypothesis is that frequency facilitates access. Affixes which are more often used in access are, from the point of view of the processor, more frequent affixes. This, together with hypothesis 8.2.1, predicts that suffixes beginning with consonants will tend to be more separable than suffixes beginning with vowels (all other things being equal). This is because suffixes beginning with consonants more often form illegal phonotactics across the morpheme boundary, and so are likely to be represented by a greater number of individual words which are highly prone to decomposition than suffixes beginning with vowels. The result of this is that, even in individual cases where there is no phonotactic violation, consonant-initial affixes are likely to be more readily activated than vowel-initial affixes. As with prediction 8.2.2, this prediction takes a continuum and chops it in two, purely for reasons of convenience. Affixes vary in terms of the degree of phonotactic juncture they tend to create. Some consonant-initial suffixes create more consistently low-probability junctural phonotactics than others. The same holds for vowel-initial suffixes. And these differences, too, are predicted to be relevant to ordering.1 The division between consonant-initial and vowel-initial suffixes is simply a convenient place to draw the line, and divide the affixes into two sets for the purposes of hypothesis testing.

8.2.4 Suffixes represented by a relatively high proportion of words which are less frequent than their bases will tend to be more separable than suffixes represented by a relatively low proportion of words which are less frequent than their bases (e.g. -ish tends to be more separable than -ic)
The logic here follows the same path as above. If an affix is represented by a substantial proportion of words which are more frequent than the bases they contain, then the affix itself is not often activated. This should make it, overall, a less separable affix than a comparable affix which is represented by a strong majority of words which are less frequent than the bases they contain. 93% of the monomorphemic bases which are listed in the CELEX lexical database (Baayen et al. 1995) as affixed with -ish, for example, are more frequent than the corresponding derived -ish words. 73% of bases of -ic are more frequent than the derived -ic words. Thus if we compare two words which are otherwise matched for frequency, we expect a word containing -ish to be more decomposable than a word containing -ic. Grayish and scenic, for example, have roughly similar frequency profiles (per 17.4 million: gray: 32, grayish: 1542, scene: 30, scenic: 1995). However, because the affix in the first word has a higher resting activation level than the affix in the second word, we expect grayish to be more decomposable than scenic.

1. Hay and Baayen (to appear) calculate the average probability of the phonotactic junctures created by 52 English suffixes—high probability junctures are phoneme transitions that are highly likely to occur inside monomorphemic words, low probability junctures are unlikely to occur inside monomorphemic words. The probabilities for average junctures created by the consonant-initial suffixes ranged from .000016 (-hood) to .000926 (-some). The probabilities for the vowel-initial suffixes ranged from .000155 (-oid) to .009004 (-ive). There were two vowel-initial affixes which tended to create worse (i.e. lower probability) phonotactics across the morpheme boundary than the consonant-initial affix which created the best phonotactics (-some). These were -oid and -eer. This prediction, then, is a crude generalization which follows from a more complex set of overlapping probability distributions, and the generalization as stated has several exceptions. However the division between consonant-initial and vowel-initial suffixes is a very convenient place to draw the line, and divide the affixes into two sets for the purposes of hypothesis testing.

Causes and consequences of word structure
138

8.2.5 More separable affixes will occur outside less separable affixes
This is the primary hypothesis under investigation in this chapter. Together with the above four hypotheses, it predicts that we should find effects of complexity-based ordering at both the affix-level and the word-level. At the affix level, I argue below, the combined hypotheses can account for the set of phenomena which motivated level-ordering. At the word-level, I argue in sections 8.4–8.6, it accounts for a large number of word-based “exceptions” to level-ordering phenomena—cases in which an affix may attach to just a subset of words containing a second affix—namely, those with low levels of decomposability. It is important to note that I do not wish to argue that phonotactics and frequency are the only factors which influence decomposition. There are likely to be many others, including stress pattern, semantic and phonological transparency, and syntactic context. Hypothesis 8.2.5 predicts that any factor which influences decomposition will also influence ordering. In this book I investigate just two factors. They are not necessarily the most important, but they are sufficiently well understood that we can use them to begin to untangle the unarguably intricate relationship between complexity and ordering. The most common account of restrictions on affix-level co-occurrences is level-ordering, which claims that affixes attached at level 1 cannot attach outside affixes attached at level 2. If the account proposed here is consistent with the set of facts generally attributed to level ordering, then hypotheses 8.2.3 and 8.2.4, together with hypothesis 8.2.5, suggest that affixes which are generally analyzed as level 1 and level 2 affixes should show different characteristics from one another. Namely, we expect level 2 affixes to more often enter into phonotactic violations (i.e.
to begin with consonants), and we expect them to be represented by many words which have frequency profiles which facilitate decomposition. This would lead “level 2” affixes to be generally more separable than “level 1” affixes, and so (by hypothesis 8.2.5) predicts that they should occur outside level 1 affixes. If this is the case, then no explicit stratification is required. I therefore now turn to an exploration of the phonotactic and frequency characteristics of English affixes.
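The junctural probability measure described in footnote 1 can be sketched computationally. The code below is a simplified stand-in for Hay and Baayen's procedure: it uses a toy lexicon and orthographic bigrams in place of phoneme transitions, and crude phonemic respellings ("bol", "pajp") of bowl and pipe; the words and probabilities are invented for illustration only.

```python
from collections import Counter

# Toy lexicon of monomorphemic words in a crude phonemic respelling.
# Hay and Baayen's actual measure is computed over phoneme transitions
# in a real lexicon; everything here is illustrative.
monomorphemic = ["golf", "wolf", "self", "help", "lamp", "pulp"]

bigrams = Counter()
for w in monomorphemic:
    for a, b in zip(w, w[1:]):
        bigrams[a + b] += 1
total = sum(bigrams.values())

def juncture_prob(base, suffix):
    """Relative frequency, inside monomorphemic words, of the transition
    that the suffix creates across the morpheme boundary."""
    return bigrams[base[-1] + suffix[0]] / total

# bowl + -ful creates [lf], attested word-internally in the toy lexicon;
# pipe (respelled "pajp") + -ful creates [pf], which is not:
print(juncture_prob("bol", "ful"))   # > 0: a plausible word-internal transition
print(juncture_prob("pajp", "ful"))  # 0.0: low-probability juncture, favours parsing
```

A low (here, zero) juncture probability corresponds to the illegal transitions that the prelexical processor is hypothesised to treat as boundary cues.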
8.3 Frequency and Phonotactic Profiles of Level 1 and 2 Affixes
In this section I demonstrate that affixes commonly categorized as level 1 tend to have attributes which would encourage whole word access, whereas affixes categorized as level 2 tend to have attributes which would facilitate parsing. Note that this does not constitute an endorsement of the division into two levels. The division into two levels is
stipulative and non-explanatory, and is in any case highly questionable (see, e.g. Fabb 1988, Plag 1996). I will argue that this is not a necessary or desirable division. It is, however, useful to explore the affixes which have traditionally been assigned to each of these classes as a first step towards investigating the plausibility of a parsing account. Minimally, we expect such an account to be able to account for the set of facts which are generally attributed to level-ordering. Listed in (8) and (9) is a set of suffixes for which a relatively clear consensus as to their level has emerged in the literature (see, e.g. Siegel 1979, Aronoff 1976, Kiparsky 1982, Selkirk 1982, Fabb 1988, Szpyra 1989). Giegerich (1999) provides a thorough overview discussion of the level status of individual affixes, and highlights the apparently dual nature of many affixes. Affixes which are not much discussed in the literature, are of unclear status, or are commonly regarded as having dual membership are omitted from the list.

(8) Level 1: -al, -an, -ary, -ate, -ese, -ette, -ian, -ic, -ify, -ity, -or, -ory, -ous, -th.
(9) Level 2: -age, -dom, -en, -er, -ful, -hood, -ish, -less, -let, -like, -ling, -ly, -most, -ness, -ship, -some.

The first point to make about these two lists is that they reveal that level 1 affixes more often tend to begin with vowels than level 2 affixes. Significantly more consonant-initial suffixes occur at level 2 (Fisher exact test, p<.001). The difference is so marked, in fact, that Raffelsiefen (1999) argues that the true distinction amongst English suffixes should be drawn between vowel-initial and consonant-initial suffixes. As outlined above, phonological transitions appear to be used in online morphological parsing, and so affect a word’s compositionality. Words which contain phonotactic violations are more likely to favour the parsing route and remain robustly decomposed than words which contain no such violations.
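The Fisher exact comparison of lists (8) and (9) can be reproduced with only the standard library. Counting initial segments in the lists gives one consonant-initial suffix at level 1 (-th) against twelve at level 2; the function below is a generic two-sided Fisher exact test for a 2x2 table, written from the hypergeometric definition.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    def p_table(k):
        # hypergeometric probability of k in the top-left cell
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)
    p_obs = p_table(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # sum the probabilities of all tables at least as extreme as observed
    return sum(p_table(k) for k in range(lo, hi + 1)
               if p_table(k) <= p_obs * (1 + 1e-9))

# Initial segments counted from lists (8) and (9):
#             C-initial  V-initial
#   level 1       1         13
#   level 2      12          4
p = fisher_exact_two_sided(1, 13, 12, 4)
print(p)  # well below .001
```

The resulting p-value is consistent with the p<.001 reported in the text.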
The nature of English is such that most low probability or illegal junctures involve clusters of consonants. As such, the likelihood of a word with a consonant-initial suffix containing a phonological violation across the morpheme boundary in English is much higher than the likelihood of a word with a vowel-initial suffix containing a violation. Words with consonant-initial suffixes should therefore tend to be more decomposed. Consider now the following excerpt from Plag (1996), in which he argues that general semantic factors may rule out a large number of affix combinations in English.

It is the oddness of the denotation and not of the morphological form that makes kafkaesquism a presumably unacceptable derivative. Parallel arguments hold for putative derivatives involving other adjectival suffixes, consider ?girlishism, ?peacefulism, ?wholesomeism. Factors like blocking may additionally be involved (as always), as can be seen with ?helpfulism, which is probably blocked by altruism. (Plag 1996:794)

While blocking and semantic unlikelihood may well be at play in the examples Plag lists (on the importance of blocking see also Aronoff 1976, Kiparsky 1983), there is also an independent reason why such forms should tend to be dispreferred—the phonotactics lead to an unhelpful parse. Consider the behavior of the processor upon encountering the form helpfulism. The phonotactics across the first boundary (together with an overall high rate of parsing of -ful) will lead to a decomposition hypothesis: help#ful. The suffix -ism,
however, does not have properties which would lead it to be as easily parsed out. A prelexical processor, then, is likely to parse the word with a single boundary: help#fulism. Hardly a helpful parse in terms of recovering the semantics. The optimal arrangement of suffixes would be one in which affixes favoring whole-word access occurred inside of affixes favoring decomposition. Or, even more generally, affixes favoring decomposition should not be particularly sensitive to the internal structure of a word, but affixes favoring whole-word access should disfavor words with internal structure.

Phonotactics, however, is not the only factor which influences the likelihood of parsing. A second factor is frequency. We have seen in chapters 4 to 6 that relative frequency is relevant to decomposition. Thus, if, as is commonly argued, level 2 affixes tend to be more separable than level 1 affixes, we expect that level 1 affixes should tend to have a greater proportion of forms which are more frequent than the bases they contain. When we examine the relative frequency profiles of these affixes, this hypothesis is confirmed. The level 1 affixes range from having 4% of forms more frequent than the base they contain (-ify) to 32% (-ic). The average proportion of forms which are more frequent than the bases they contain for the level 1 affixes is 17%. There are four level 2 affixes which have no forms which are more frequent than their bases (-dom, -hood, -let, and -ship). The level 2 affix represented by the greatest proportion of words which are more frequent than their bases is -age (12%). The average for level 2 affixes is 5%. Thus, level 1 and level 2 affixes tend to have quite different relative frequency profiles (t=4.8, df=18.7, p<.0001). These relative frequency facts alone should lead level 2 affixes to be more parsable/separable than level 1 affixes.
Thus, both the phonotactic and the frequency profiles of level 1 and level 2 affixes predict that level 2 affixes tend to be prone to parsing, whereas level 1 affixes do not. Note, also, that one characteristic which is often claimed to be relevant to level-ordering is morphological productivity—level 2 affixes tend to be much more productive. Indeed this should follow directly from the observation that level 2 affixes tend to be represented by more words which are not more frequent than their bases, and which have phonotactics which facilitate parsing. Chapter 7 demonstrated that both of these characteristics are predictive of a high degree of morphological productivity. Indeed, when we compare the productivity levels of the affixes in (8) and (9) using Baayen’s (1992) category-conditioned measure of productivity P (see chapter 7 for discussion), we find that the level 1 affixes average P=.002 whereas the level 2 affixes average P=.030.

In sum, affixes generally classified as level 1 affixes contain properties which facilitate whole-word access, and those generally classified as level 2 affixes contain properties which lead to parsing. This suggests that the affix-ordering generalization can be largely reduced to a perceptually grounded maxim: an affix which can be easily parsed out should not occur inside an affix which cannot. This has the result that the less phonologically segmentable, the less transparent, and the less productive an affix is, the more sensitive it will be to internal structure. Highly parsable affixes, by contrast, are predicted to carry predictable meaning and to be easily parsed out. Such affixes can pile up at the ends of words, and should display many syntax-like properties. This captures the set of facts that the Affix-Ordering Generalization set out to explain. No explicit stratification is necessary. The generalization correctly rules out practically
every combination of non-occurring affixes cited as evidence for the Affix-Ordering Generalization, providing a straightforward explanation for why the highly productive, highly decomposable, mostly consonant-initial suffixes of level 2 do not occur inside the less productive, less decomposable, mostly vowel-initial suffixes of level 1 (e.g. *homelessity, *helpfulism, *sadnessic, *wistfulity, *gratefulize etc). Not only is -less much more frequent and represented by a smaller proportion of words with frequency profiles which facilitate whole-word access than -ity (2% more frequent than base vs 10%), it is also much more productive (P=.017 vs .001), and begins with a consonant so is much more likely to create a low probability juncture. The same type of relationship holds between the other pairs of non-attested affix combinations cited above. Note I am not arguing that this is the only reason these specific examples are unacceptable—but I am arguing that if all other restrictions such as Latinateness, semantic anomaly, blocking etc. were removed, these should still be independently dispreferred for processing reasons. In addition to ordering consonant-initial and vowel-initial suffixes correctly, the parsing account explains a large number of facts relating to the interaction of decomposability with stacking restrictions, which have previously gone unnoticed. These are word-based effects, which speak to hypothesis 8.2.1, above. They are discussed in the following sections, in the context of Fabb’s (1988) critique of level-ordering approaches.
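Baayen's (1992) category-conditioned measure used in the comparisons above is the proportion of hapax legomena among an affix's corpus tokens, P = n1/N. A minimal sketch follows; the frequency lists are invented for illustration and are not the CELEX-based values reported in the text.

```python
# Baayen's (1992) category-conditioned productivity, P = n1/N: the number
# of hapax legomena containing the affix (n1) divided by the affix's total
# token count (N). The frequency lists below are hypothetical.

def productivity(freqs):
    """freqs: corpus token frequency of each distinct word with the affix."""
    n1 = sum(1 for f in freqs if f == 1)  # hapax legomena
    N = sum(freqs)                        # total tokens with the affix
    return n1 / N

less_freqs = [1, 1, 1, 2, 5, 40]    # many hapaxes: behaves productively
ity_freqs = [300, 150, 90, 20, 1]   # dominated by high-frequency forms
print(productivity(less_freqs), productivity(ity_freqs))
```

Because hapaxes are exactly the forms least likely to be stored whole, a high P goes hand in hand with the frequency profiles that favour parsing.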
8.4 Fabb’s (1988) Affix Classes
Fabb (1988) divides English suffixes into four classes, according to the types of selectional restrictions they display. Plag (1996, 1999) argues against affix-driven selectional restrictions, arguing instead that much of the data can be explained by base-driven selectional restrictions. He steps through Fabb’s four classes individually, reanalyzing the data. In the next four sections I, too, step through these four classes, one at a time. By discussing Fabb’s analysis, and Plag’s (1996, 1999) objections to Fabb’s analysis, I demonstrate that the four classes are, in fact, variants on a single theme. The facts at hand can be explained by the general processing account outlined above. In addition, many previously unnoticed and unexplained patterns receive a natural explanation when processing is taken into account.

8.4.1 Suffixes which never attach to an already suffixed word
Fabb lists 28 suffixes which he claims never attach to an already suffixed word. He argues that a selectional restriction is at work, which has the result that these particular suffixes cannot attach to any word which has already undergone suffixation. He argues that this explanation goes beyond level-ordering, by having increased predictive power:

Taking the level 1 suffix -ify as an example, level-ordering predicts that it does not attach outside any level 2 suffix (hence *derivable-ify); this is alternatively predicted if the suffix is prevented by the above-mentioned selectional restriction from attaching outside any suffix, and this second approach also makes a further prediction not made by level-ordering,
which is that -ify does not attach outside any other level 1 suffix either (e.g. *personalify, *destructivify). The restriction against suffixation to an already-suffixed word cuts down the number of potential suffix pairs considerably, and at the same time does a large part of the work of level-ordering of suffixes. (Fabb 1988:533)

From the point of view of a parsing account, the above generalization would seem sensible—the natural behaviour of a set of affixes which tend to strongly favor the whole-word access route. For affixes which are represented predominantly by words which favor whole-word access, any degree of internal structure in potential bases is likely to be dispreferred. That is, if affixes may not attach to other affixes which are more separable than they themselves are, then it follows that there will be a set of affixes which display minimum decomposability, and so are not able to attach to any other affix. Plag (1996), in his reply to Fabb, argues that Fabb’s claim is hugely over-simplistic, presenting evidence that a large number of the affixes do, in fact, occur on suffixed bases, and that many combinations are ruled out by independent constraints such as type blocking, the Latinate constraint, or phonological constraints on bases. The most relevant part of this discussion is the discussion of those affixes about which Plag claims Fabb was wrong—these affixes do, claims Plag, occur with suffixed bases. Some of the examples he provides involve suffixed bases which contain bound roots.

With derivatives of the form V-ory we find lots of counterexamples to Fabb’s claim that deverbal -ory does not attach to already suffixed verbs. In fact, verbs ending in -ate may take -ory as an adjectival suffix productively, for example assimilatory, emancipatory, stimulatory. (Plag 1996:785)

While it is not clear whether exceptions involving bound roots are problematic for Fabb’s account, they are certainly predicted under an account in which parsing is central.
A form with a bound root is a clear example of a case in which whole-word access dominates, and parsing is highly unlikely. As such, bases containing bound roots have a very low level of decomposability. If -ory disprefers affixation with decomposable bases, it should come as no surprise that the examples Plag cites are well-formed. They stand in strong contrast to the unacceptability of forms containing bases with a higher degree of decomposability, such as *pollinatory, *alienatory, *activatory, *validatory and *orientatory. Not all of the exceptions cited by Plag involve bound roots, however. For example, he takes issue with Fabb’s characterization of -ist as an affix which cannot attach to already suffixed words.

…we find numerous examples of already suffixed nouns that take -ist as a suffix, such as abortionist, expansionist, consumerist, conventionalist. (Plag 1996:783)

I first discuss consumerist and conventionalist, and then turn to discussion of the -ionist combination below. While the affix -er does not create a strong phonotactic juncture, it is highly frequent and highly regular, and so more likely to be parsed out than its
counterpart -ist. We might therefore predict this to be a non-optimal combination of affixes. In fact, a search through the CELEX lexical database (Baayen et al. 1995) turns up no words displaying this combination of affixes (not even consumerist). Moreover a moment’s reflection reveals that this is generally a dispreferred combination (cf. *writerist, *defenderist, *climberist). What, then, makes consumerist acceptable? Here we see further evidence that affix ordering is intricately linked to decomposability—relative frequency matters. Consumer is more frequent than consume (CELEX listed frequency of 627 per 17.4 million as opposed to 423). As argued above, this fact reflects a low level of decomposability. Because consumer favors whole-word access, it contains a minimum of structure. -er is therefore unlikely to be parsed out, and so -ist is able to attach unproblematically. Conventional has a listed frequency of 834, and so is more frequent than its base, convention (470). Relative frequency facts, then, predict that both conventional and consumer should favor whole word access, making them more acceptable bases of -ist affixation than base forms which are more prone to decomposition. However, note that the existence of conventionalist is not a good counter-example to Fabb’s claim that denominal -ist cannot attach to suffixed forms—conventional is an adjective, so more properly belongs in Fabb’s “problematic suffixes” category, in which he includes deadjectival -ist (see discussion in section 8.4.4). The other two examples listed by Plag end in -ionist. The affix combination -ionist occurs more frequently than either -erist or -alist. However we should still predict that, if -ist is only able to attach to a restricted range of -ion final forms, it will display a preference for forms which are minimally decomposable. Indeed, CELEX contains no shortage of forms in which the base of -ion is bound, such as abolitionist.
It also, however, lists a number of forms which do not contain bound roots. Table 8.1 lists all such forms, together with their frequency, the frequency of the innermost base, and the -ion final base. Of the 20 -ion final bases, 14 are more frequent than the roots they contain—indicating low levels of decomposability. The others are less frequent than their roots, although not by much. Of the forms affixed in -ion that -ist appears with, then, 70% are more frequent than the bases they contain. This is a very high percentage compared to the percentages we dealt with in our discussions of relative frequency in chapter 5. There we saw that the vast majority of affixed forms were less frequent than the bases they contained. However, that analysis was restricted to consonant-initial suffixes. The fact that we are dealing with a vowel-initial affix here is likely to increase the overall likelihood that forms will become liberated from their bases, and overtake them in frequency. To compare the frequency distribution of these forms, as opposed to other forms suffixed with -ion, I extracted all forms ending with this morpheme from CELEX. In order to simplify the calculation, I restricted the set to only those with a monomorphemic base (excluding forms such as acclimatisation)—so as to avoid taking into consideration the separate frequency contributions from the root, and the base of affixation. CELEX lists 741 entries representing monomorphemic bases affixed with -ion. Of these, there are 262 cases in which the derived form is more frequent than the base—35%.

base               -ion form           -ist form
abort 35           abortion 351        abortionist 18
collaborate 72     collaboration 102   collaborationist 1
conserve 116       conservation 554    conservationist 119
contort 55         contortion 19       contortionist 3
deviate 33         deviation 51        deviationist 6
divert 216         diversion 108       diversionist 0
exhibit 218        exhibition 410      exhibitionist 19
express 1874       expression 1669     expressionist 32
extort 0           extortion 11        extortionist 1
impress 689        impression 1128     impressionist 16
isolate 212        isolation 312       isolationist 8
obstruct 66        obstruction 79      obstructionist 0
perfect 94         perfection 204      perfectionist 17
prohibit 134       prohibition 114     prohibitionist 2
project 399        projection 236      projectionist 5
protect 1534       protection 978      protectionist 22
revise 154         revision 93         revisionist 25
secede 14          secession 19        secessionist 3
vacate 48          vacation 270        vacationist 2
vivisect 1         vivisection 42      vivisectionist 3

Table 8.1: -ionist forms listed in CELEX: frequency of the base, -ion form and -ist form.

Therefore -ion final forms which can be bases of -ist affixation are more frequent than the bases they contain twice as often as would be expected by chance. A Chi-Square comparing the number of derived -ion forms which are more/less frequent than their base in this restricted set, versus the larger set from CELEX, reveals this difference to be highly significant (Chi-Square=7.36, df=1, p<.01). Note that this is not an artifact of absolute frequency. The average listed frequency of -ion forms in CELEX is 300.2. Of the 20 -ion forms listed in table 8.1, 13 are associated with below-average frequency. This pattern is also present in other affixes discussed by Plag. For example:

The denominal suffix -ize seems to attach quite often, and naturally, to suffixed nouns of various types. Consider for example computerize, christianize, preacherize, protestantize (if one assumes that the stem is a noun and not an adjective). (Plag 1996:787)
The forms listed by Plag are relatively opaque, with the exception of preacher, and preacherize seems to me to be marginally acceptable in any case. Again, note it would not be particularly problematic for us if -ize could freely attach to forms which are affixed, as long as those forms were not highly decomposable (as would be the case with consonant-initial suffixes). To the degree that we do find it occurring with forms with internal structure, we expect it to be preferred on forms which display minimal decomposability. That is, following hypothesis 8.2.2 we do not expect to find any cases in which an affix attaches only to forms which are maximally decomposable and not to forms which are relatively opaque. However, we expect to find large numbers of cases where the opposite is true.

8.4.2 Suffixes which Attach Outside One Other Suffix
In his second category of affixes, Fabb lists a number of affixes which each appear to be licensed to attach outside one specific affix. For example -ary can attach outside -ion as in revolutionary, and deadjectival -y can attach outside -ent as in residency. If it is the case that the affixes in this category really do not occur outside any affixes other than that listed, then these affixes display a general dispreference for attaching to forms with internal structure. It is no problem to assume that they can occur outside specific affixes, if those affixes are not highly decomposable or parsable. However we might predict some variability, even with affixation to the licensed affix. If, for example, -ary displays a general dispreference for affixing to decomposable words, then we might expect it to appear more on -ion final bases which display minimal structure. More highly decomposable -ion forms may avoid affixation with -ary. Since we have already calculated some general properties of forms affixed in -ion for the previous section, this is not difficult to test.

base              -ion form          -ary form
deflate 39        deflation 29       deflationary 15
discrete 47       discretion 157     discretionary 31
divert 216        diversion 108      diversionary 25
evolve 438        evolution 455      evolutionary 180
expedite 8        expedition 326     expeditionary 8
inflate 59        inflation 677      inflationary 80
probate 8         probation 83       probationary 17
react 510         reaction 1288      reactionary 96
revert 0          reversion 18       reversionary 1
revolve 123       revolution 1596    revolutionary 802

Table 8.2: -ionary forms listed in CELEX: frequency of base, -ion form and -ary form.

Fabb lists three suffixes
which can attach only to -ion bases—noun-forming -ary (as in the noun revolutionary), adjective-forming -ary (as in the adjective revolutionary), and denominal -er (vacationer). Let’s first deal with the first two examples together. Table 8.2 lists all the examples appearing in CELEX with a potentially free root, and -ionary affixation. The root is listed, with its frequency, together with the -ion and -ary final forms. Note first that absolute frequency does not appear to be playing a role here. The average frequency of -ion forms is 300.2 occurrences per 17.4 million. Of the ten examples in table 8.2, five are below average frequency, and five are above average. However, for eight out of the ten examples listed, the derived form is more frequent than the base it contains—and for the remaining two the difference is minimal. Recall that, of -ion final forms as a group, 35% are more frequent than the bases they contain. That 80% of forms which can be further affixed by -ary are more frequent than the bases they contain indicates that -ary disprefers bases which are highly decomposable (Fisher Exact Test, p<.005). Only five vacationer-type forms are listed in CELEX. These are given in table 8.3. Four out of the five -ion forms are more frequent than the bases they contain. Again—a rate of 80%, as opposed to the 35% which would be expected by chance (Fisher Exact Test, p<.06). Fabb’s explanation for affixes which can occur outside just one other affix is as follows, using the specific example of -ionary.

-ary selects for a non-complex host. However, it is unusual in that while the unmarked case for most suffixes is that they select for words, -ary has the option of also selecting for one specific suffix, -ion. (Fabb 1988:534)

root             -ion form         -er form
execute 231      execution 206     executioner 21
exhibit 218      exhibition 410    exhibitioner 0
extort 0         extortion 11      extortioner 31
probate 8        probation 83      probationer 5
vacate 48        vacation 270      vacationer 4

Table 8.3: -ioner forms listed in CELEX: frequency of root, base, and derived form.

Fabb’s generalization, while clearly on the right track, misses the fact that the two options he describes are not insignificantly related. -ary is not a highly decomposable affix, and so we predict that it will be ill-formed when attaching to highly decomposable bases. Affixes which are unlikely to be parsed out do not occur outside of affixes which are likely to be parsed out. As such, many affixed words are ill-formed as bases of -ary. Now if -ary simply selected for the affix -ion, there would be no clear explanation for the fact that not all -ion forms are equally well-formed. But the fact that -ary preferentially attaches to -ion bases with low levels of decomposability follows naturally from the analysis presented here.
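The 2x2 significance tests used in this section can also be re-derived with the standard library. The sketch below recomputes the Chi-Square comparison from section 8.4.1 (14 of 20 -ionist bases more frequent than their base, against 262 of 741 -ion forms generally). It applies Yates' continuity correction, so the statistic need not match the reported 7.36 exactly, but the comparison remains significant at the .01 level.

```python
from math import erfc, sqrt

def chi_square_yates(a, b, c, d):
    """Chi-square statistic and p-value for the 2x2 table [[a, b], [c, d]],
    with Yates' continuity correction (df=1)."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    stat = sum((abs(o - e) - 0.5) ** 2 / e
               for o, e in zip(observed, expected))
    p = erfc(sqrt(stat / 2))  # upper tail of a 1-df chi-square
    return stat, p

# -ionist bases vs monomorphemic-base -ion forms generally:
#                more frequent   less frequent
#   -ionist set       14               6
#   all -ion         262             479
stat, p = chi_square_yates(14, 6, 262, 479)
print(round(stat, 2), round(p, 4))
```

The same helper can be pointed at the smaller tables in this section, though for counts as small as those in tables 8.2 and 8.3 the Fisher exact test used in the text is the appropriate choice.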
Affix Ordering
147
Plag’s critique of Fabb’s analysis of these types of suffixes consists largely of pointing out that the “suffixes which occur outside just one other suffix” can in fact attach to a wider range of bases than Fabb suggests. He offers -ate and -ment as suffixes which can also occur before -ary. Examples given are commendatory, complementary, sacramentary, sedimentary, and supplementary. As all of these examples involve bound bases, they fit nicely with our analysis.

8.4.3 Freely attaching suffixes
Fabb lists three suffixes which he claims attach freely outside other affixes: -able, -ness and deverbal -er. As -ness is consonant-initial, and highly parsable, we predict that it should be able to stack up after other parsable affixes, and so the fact that it attaches freely is expected on our approach. We would not, however, predict that -able and -er can occur outside affixes which are more likely to be parsed out than they themselves are, and in particular we should be concerned if these affixes attached freely to consonant-initial affixes. Investigation of the possibilities, however, reveals that this claim is not problematic for us, because both -able and -er attach to verbal bases, and English contains no verbalizing suffix which creates marked phonotactic junctures. All of the suffixes which could occur before these two, then, are suffixes which are not inherently highly decomposable. Plag notes that the three suffixes do not attach as freely as Fabb suggests. Rather, there are some restrictions which “involve morphological, semantic, and phonological properties of the base, and additional, more general mechanisms like type blocking or token blocking” (Plag 1996:790). None of the observations concerning the behavior of these three affixes contradicts an account in which highly parsable affixes are restricted from occurring before less parsable ones.

8.4.4 Problematic suffixes
Fabb lists six affixes which are problematic for his approach.
They attach to more than one affix, but are not completely unrestricted. These affixes are problematic in Fabb’s affix-driven account, but are less problematic for Plag, who sees them as “not more problematic than any others, since they are subject to the same kind of idiosyncratic, paradigmatic, and semantic-pragmatic constraints as are all the supposedly nonproblematic ones” (Plag 1996:793). From the point of view of the analysis being presented here, we predict that there will be some suffixes which can attach to some suffixes (i.e. those which are not highly parsable) and not to others (i.e. highly parsable affixes). Thus the fact that such a set exists is unproblematic. Moreover, we expect that, to the degree that some affix combinations are not completely productive, they are restricted in a way which reflects the decomposability of the base. This does, indeed, appear to be the case. For example, Fabb lists adjective-selecting -ist as a “problematic suffix,” because there are four affixes to which it can attach: -ive, -ic, -an and -al. An examination of entries listed in CELEX, however, reveals that -ist does not attach unrestrictedly to these affixes. Only five -ivist forms are listed. Of these, four of the -ive bases contain bound roots (archivist, positivist, prescriptivist, recidivist). The fifth form is activist—a form which has itself
undergone some semantic drift. -ist, then, does not affix to -ive in a fully unrestricted manner (cf. the ill-formedness of *combativist, *addictivist, *protectivist). Seven forms are listed in -icist: classicist, empiricist, geneticist, lyricist, physicist, publicist, romanticist. Of these, six have bound roots (assuming no relation between empire and empiric). The seventh, romantic, is not highly decomposable, as evidenced by the fact that it is more frequent than the root romance (545 vs 207). All listed -anist forms contain bound roots (e.g. botanist, humanist, organist). The combination -alist is less restricted. There are many bases with bound roots (e.g. fatalist, pluralist, vitalist), many which are not highly decomposable (e.g. rationalist, colonialist, formalist), and a reasonable number which contain roots which are frequent relative to the -al final base (e.g. conversationalist, sentimentalist, herbalist). The affix -al is able to occur relatively unrestrictedly inside -ist because it is not a highly parsable affix. Baayen and Lieber (1991:830), for example, demonstrate that it is on the borderline of productivity—scoring the same on their measure of productivity as simplex nouns. Plag claims that -ist is restricted only by semantics.

…in addition to the four suffixes Fabb finds, the adjectival suffixes -ile, -able, and -ar are also attested to precede -ism (as in infantilism, probabilism, particularism), and it seems that only semantic-pragmatic factors speak against forms involving other more picturesque adjectival suffixes like -esque preceding -ism/-ist. Consider the putative kafkaesquism, which, if not ruled out for reasons of euphony, could certainly denote a theoretical framework developed by a circle of literary critics who try to find kafkaesque traits in any piece of fictional writing (with a kafkaesquist being a member of this circle). (Plag 1996:793)

The three examples given by Plag all involve bases with bound roots, further supporting a model in which the decomposability of the base is central. He then argues that affixes like -esque could even occur inside -ist or -ism, given the right context. Indeed, he is arguing that, if we construct the meaning such that the base is no longer entirely decomposable, then the word increases in wellformedness. That restricting the reference of kafkaesquism to a particular literary framework increases its potential wellformedness supports models in which there is an explicit interaction between decomposability and wellformedness. Fabb lists denominal -al as another problematic affix, citing its attachment to -ment, -ion and -or. The claim that -al can freely attach to -ment forms is an oversimplification, as attested by the large literature dealing with this particular combination of affixes. In the following section I investigate denominal -al in depth.
8.5 A Case Study: Denominal -al
The affixation of -al to forms in -ment is widely cited as evidence that -ment can attach at both level 1 and level 2. -ment can attach both to bound roots and to Latinate verbs, as shown in (10a) and (10b) respectively (examples from Giegerich 1999:47).

(10) (a) ornament, increment, regiment, fragment, sentiment, tenement, experiment
     (b) employment, discernment, containment, derangement, government, development, judgment

The level 1 suffix -al can attach to the bound forms affixed in -ment (in (10a)), but not to the Latinate forms (in (10b)). Thus, Aronoff (1976) and Giegerich (1999) argue that -ment affixation is able to occur at two distinct levels, and that the level 2 affixed forms can not be followed by -al. There are three exceptions to the rule that -al can not affix to -ment forms with Latinate bases: governmental, developmental, and judgmental are possible forms. Recall that one property associated with level 1 affixation is a relative lack of semantic transparency. Thus, the fact that at least two of these three forms are not particularly semantically transparent is taken as support for this interpretation. Departmental is also well-formed. It plays a lesser role in the relevant discussions, due to general agreement that department bears no synchronic relationship to the base depart. This form too, then, must be formed at level 1. Mohanan’s (1986) analysis is representative of the level-ordering approach.

…dis- and -ment are affixed at stratum 2 in a productive fashion, creating semantically transparent results, while the same affixes attach unproductively at stratum 1, creating semantically opaque forms. We shall see in chapter III that a similar contrast appears in Malayalam, in which compounding at an ‘early’ stratum creates more semantically opaque results. My guess is that if a language has two strata of affixation or two strata of compounding, the earlier stratum would be the one that yields more opaque forms. If this is a correct observation, it merits further study. (Mohanan 1986:57)

Mohanan’s guess is almost guaranteed to be confirmed by any work which assumes two levels of affixation, as lack of transparency is commonly used as a diagnostic of a process’s location at level 1.
Giegerich (1999) uses affixes’ tendency to recur at both levels to argue for a base-driven stratificational model. In this type of model, the bases govern, develop and judge (together with all the relevant bound roots) are marked in the lexicon as possessing the potential for -ment affixation in the first stratum. As such, this set of -ment forms may feed into the cyclic rules of level one. One of these rules allows for -al affixation to -ment final forms. The fact that these three Latinate bases may be affixed in -ment-al, then, is due to an idiosyncratic marking on the bases in the lexicon. A consequence of the fact that they are allowed to occur at level 1 is that they are not particularly transparent. Other -ment forms, such as employment and discernment, are not prone to -al affixation, because -ment affixation generally occurs at level 2. This level-ordering approach is not very satisfying, because it essentially reduces the problem to a diacritic marking which bases take -ment at level 1, and which at level 2. But Fabb’s alternative (that -al simply selects for -ment final bases) fails to distinguish between occurring forms (like governmental) and non-occurring words (like discernmental). Plag (1996) seeks to explain the range of affixes that -al can attach to with reference to a Latinateness constraint. This, too, loses the generalization that level-ordering was attempting to capture: -al only attaches to a specific limited range of -ment final bases. Level-ordering approaches at least allow for such limitations, without necessarily explaining them. Goldsmith (1990) develops an account of -mental affixation which draws on patterns of stress assignment in English. The basic claim is that stress clash—a pattern of two adjacent stress-bearing syllables—is avoided across a level 2, or “open,” juncture in English. While such clashes are possible in monomorphemic words (e.g. nylon), and in words with level 1 affixation (e.g. abnormality), they are not permitted across open junctures. This is the principle, claims Goldsmith, which rules out words like racistic and careeristic—they contain a stress clash across an open juncture, unlike communistic and regalistic, for which the stress of the stem is not stem-final. This analysis, argues Goldsmith, extends straightforwardly to the problem of -mental affixation. -ment affixation to bound roots does not involve an open juncture, and so a form like fragmental, while it contains a stress clash, is legal, because that stress clash does not straddle an open juncture. Forms like developmental and governmental, according to Goldsmith, are also well-formed because, while they contain an open juncture, there is no stress clash across that juncture. Forms like employmental and recruitmental, however, are ruled out, because they contain a stress clash which straddles an open boundary. While stress patterns may play some role in ruling out some -mental combinations, this cannot be the full story. Goldsmith’s analysis fails to account for the well-formedness of judgmental and departmental, and incorrectly predicts that -ment forms with bases which don’t carry stem-final stress should have well-formed counterparts in -al. Words like discouragemental, nourishmental and managemental, however, are not well-formed. The approach advocated here potentially provides a clear explanation of the facts. I predict that those -ment forms which -al attaches to are ones that display low levels of decomposability. With this prediction in mind, I turn to a closer examination of the data set.

base word    base frequency    derived word    derived frequency    ln(base/der)
discern      84                discernment     4                    3.04
contain      2244              containment     17                   2.83
derange      43                derangement     12                   1.2
develop      4492              development     3707                 .19
employ       1110              employment      967                  .14
judge        718               judgment        1053                 −.38
govern       340               government      7693                 −3.12

Table 8.4: Frequency counts for Latinate bases suffixed with -ment, ordered with respect to relative lexical frequency

Table 8.4 shows the lexical frequency of the bases and the derived -ment forms discussed by Aronoff and Giegerich, with frequencies listed in the CELEX lexical database. I have listed the forms in terms of relative frequency, from discernment—which is much less frequent than the base it contains—to government, which is much more frequent than the base it contains. A fourth form mentioned by both Aronoff and Giegerich is departmental. Both argue that department bears no synchronic relationship to the base depart, and so this form, too, must be formed at level 1. It is worth noting, in passing, that like judgment and government, department is more frequent than the base it contains. Now, if the suffix -al shows a dispreference for affixation to words with internal structure, this would account straightforwardly for the fact that it can apply to -ment forms with bound roots (cf. ornamental, regimental). The forms in table 8.4, then, should show increasing acceptability with -al as we move down the table. That is, discernmental should be very bad, whereas governmental should be perfectly acceptable. This explanation accounts nicely for the extremes of the table, but the relative ordering of developmental and employmental appears wrong: the ordering in the table suggests that employmental should be more acceptable than developmental. However, a glance at the relative frequency numbers illustrates that these two words are extremely close in relative frequency—both around the borderline, with the base and derived form roughly equally frequent. This is exactly the part of the distribution where we expect some instability—and where we might expect some arbitrariness in terms of degree of internal structure.
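The ln(base/derived) column of table 8.4 can be recomputed directly from the frequency columns. A minimal sketch, over a subset of the rows:

```python
from math import log

# (base, base frequency, derived form, derived frequency) for a subset
# of the rows of table 8.4
rows = [
    ("discern", 84, "discernment", 4),
    ("develop", 4492, "development", 3707),
    ("employ", 1110, "employment", 967),
    ("judge", 718, "judgment", 1053),
    ("govern", 340, "government", 7693),
]

# ln(base/derived) is positive when the base is the more frequent form
# (favoring decomposition in access) and negative when the derived form
# is more frequent (favoring whole-word access).
ratios = {derived: round(log(bf / df), 2) for _, bf, derived, df in rows}
print(ratios)
# e.g. ratios["discernment"] == 3.04 and ratios["government"] == -3.12
```

Values near zero, as for development and employment, are exactly the borderline region where the text predicts instability in the degree of internal structure.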
Thus, while employmental is not particularly well-formed, and has not been lexicalized, the relative frequency facts predict that it should be a more likely word of English than discernmental or containmental. We expect -al to display this sensitivity with affixes other than -ment as well, and so we can look elsewhere in the lexicon for independent evidence that the suffix -al disprefers bases which are highly decomposable. While Fabb’s (1988) claim is that denominal -al can occur after -ion, -ment and -or, Plag (1996) claims that this set of affixes is too small, and that -al can actually attach to a large set of derivatives.
Contrary to Fabb’s claim, denominal -al attaches not only to -ion, -ment, and -or, but also to derivatives involving nominal bases in -ure (aperturial, cultural), -ent/-ant (presidential, componential, consonantal), -ance/-ence (concordantial, conferential), -cide (insecticidal, suicidal), -ory (laboratorial), -ary (secretarial), -ive (relatival, substantival). Most of these nominal suffixes take the adjectival suffix -al regularly, but rival processes, especially -ous, may intervene. (Plag 1996:791)

Most of the examples cited by Plag involve bound roots, which have the minimum possible internal structure. And, indeed, -al affixation to most of the affixes he cites does appear to be sensitive to internal structure. For example, while -al affixes readily to -or final bases (pastoral, mayoral, ambassadorial, tutorial, manorial), it is dispreferred in cases in which -or represents a salient parse within the word (cf. *conductorial, *directorial, *sailorial, *ejectorial). CELEX lists only two -orial forms for which the root stands in a semantically transparent relation to the whole: editorial and senatorial. In both cases the relative frequency facts strongly favor the whole-word route in access: editor is more frequent than edit, and senator is more frequent than senate. The parsing route is unlikely to win in access for these two words, minimizing the salience of their internal structure. That is, the relative frequency facts lead editor to be less decomposable than conductor. If -al disprefers bases with internal structure, it should be preferred on the former rather than the latter. Similarly, while -al is possible on -ive final bases, the attested bases tend to be nondecomposable. Plag lists substantival and relatival. Neither substantive nor relative stands in a phonologically transparent relation with its base, and so the whole-word route is likely to be strongly favored, and the internal structure eroded. The only additional example listed in CELEX is adjectival—with a clearly bound root. Note the impossibility of -al affixation to -ive final bases which are semantically and phonologically transparent: *collectival, *conservatival, *actival, *digestival, *preventival, *protectival, *selectival. Again, while -al can attach to suffixed forms, it displays a strong preference for forms in which the internal structure is absent or eroded. The only affix to which -al can attach in a relatively unrestricted manner is the nominalizing -ion. CELEX lists many words in which these affixes occur together, including educational, congregational, conversational, etc. It is unlikely to be accidental that -al affixes freely to this particular affix, as -ion displays properties which heavily bias the whole-word route in access. That is, regardless of the relative frequency facts, the parsing route is disfavored for words affixed in -ion. For -ate final forms (such as educate), -ion both changes the stress placement and alters the identity of the final phoneme. For forms which do not end in -ate (such as converse), -ion affixation shifts the primary stress, so that it no longer appears on the base. Thus, in all cases, the derived form is uniquely identifiable over the base before the morpheme boundary is reached, heavily biasing the whole-word access route. As such, the degree of internal structure present in the representation of -ion final forms is likely to be heavily reduced as compared to forms affixed in -or or -ment. It appears that denominal -al affixation is tolerant of some degree of internal structure, but not much. The sensitivity of -al to internal structure is predictable from the fact that the suffix is associated with a low level of productivity, and so is not readily parsed. As
noted above, Baayen and Lieber (1991:830) argue that -al is on the borderline of productivity, scoring the same on their measure of productivity as simplex nouns. It is not the case that -ment affixation occurs at two separate strata of affixation in English. Rather, as argued in previous chapters of this book, factors such as the relative frequency of the derived form and the base, and the degree of phonological transparency, lead some affixed words to be more decomposable than others. -al affixation is sensitive to this aspect of a word’s representation. An affix-driven level-ordering account of this phenomenon is therefore not only unnecessary; it is also unable to make the correct predictions. The next section describes an experiment designed to test subjects’ intuitions about the likelihood of -al affixation to a range of -ment final forms. Do subjects’ preferences about non-words reflect the decomposability of the base?
8.6 Experiment 7a: -al Affixation and Relative Frequency
The facts presented above strongly suggest that -al is most likely to attach to simple bases, and should decrease in acceptability with the increasing decomposability of potential bases. An experiment was designed to test the psychological reality of this generalization.

8.6.1 Methodology and Materials
The stimuli consisted of sixteen pairs of words which were matched for syllable count, stress pattern,2 and lexical frequency. An attempt was also made to match for the approximate probability of the phoneme transition across the word boundary. The members of each pair differed in the frequency of the embedded base. The stimuli are listed in table 8.5. Word A of each pair is less frequent than the base it contains, whereas Word B of each pair is more frequent than its base. We therefore predict that Word B of each pair should be more acceptable when suffixed with -al. Note that while, in some cases, the frequency of the base and the frequency of the derived form are fairly close, in all cases the difference in the log ratio of base and surface frequency is fairly sizeable across the two conditions.

2. With the exception of embarrassment and advertisement, which are not matched for stress in American English. This was an oversight on the part of the New Zealand English-speaking author, using British-English-based CELEX for stimuli construction.

Word A          freq.   base freq.   Word B          freq.   base freq.
arrangement     1111    1587         investment      1294    525
attachment      196     878          detachment      133     99
attainment      153     234          amendment       163     83
curtailment     17      63           bereavement     21      0
containment     17      2244         atonement       27      15
appeasement     32      70           infringement    31      22
enchantment     26      115          impeachment     34      10
embarrassment   344     461          advertisement   346     277
involvement     418     3779         appointment     561     465
settlement      471     1428         punishment      629     421
commitment      743     958          excitement      690     207
achievement     713     2121         equipment       1352    179
adornment       41      75           alignment       57      44
advancement     77      944          bombardment     65      48
commandment     103     508          recruitment     103     102
effacement      1       13           abasement       6       2

Table 8.5: Experiment 7a: Stimuli

These pairs of words were all suffixed with -al (e.g. arrangemental, investmental, etc.). They were counterbalanced for order of presentation, and then presented to subjects, who were asked to indicate which member of each pair sounds more like a possible English word. The exact instructions were as follows.

This is an experiment about possible words. You will be presented with pairs of words, neither of which are actual words of English. Your task is to decide which sounds more like it could be a word of English. Read the two words silently to yourself, and then circle the word you think is more likely to enter the English language. There are no right or wrong answers, we are just interested in your intuition. It is very important that you provide an answer for every pair. Don’t worry if you are not sure of your answer, just provide your best guess.

Thirty-five Northwestern University undergraduate students completed the task for class credit in an introductory linguistics class. The stimuli were presented in written form, and subjects indicated their preference on an answer sheet.
8.6.2 Results
Overall, 56% of all responses favored affixation to derived forms that were more frequent than their bases, while 44% favored their matched counterparts. Subjects displayed a significant preference for -al affixation to -ment forms which were more frequent than their bases (by subjects, Wilcoxon: p<.05).
The by-items results fall slightly short of reaching significance on a Wilcoxon test (by items, p<.08). They are, however, significant on a t-test—a test with greater statistical power (t=2.158, df=15, p<.05). Examination of the behavior of the individual items reveals that two items went strongly in the wrong direction: subjects preferred adornmental over alignmental (22 to 13), and advancemental over bombardmental (21 to 14). The first pair shows the lowest difference in the log ratio of the base word to the derived word, and so we should perhaps expect the predicted preference to be weaker for this pair than for most of the others. This pair was also included in the stimulus set for experiment 4, in which subjects were asked to indicate which word appeared more complex. This enables us to refer back to subjects’ impressions of the morphological complexity of the words in this pair. Indeed, the word pair is unusual, in that ratings in experiment 4 went the wrong way for this item: more subjects rated alignment as more complex than adornment than vice versa. Thus, while judgments of complexity for this specific word pair go in the opposite direction from what we predict based on relative frequency, their corresponding behavior in this rating task in fact provides evidence in support of the hypothesis that base decomposability is relevant to -al affixation. If we omit the alignment-adornment pair from our by-items analysis, then it reaches significance (Wilcoxon: p<.05). Note that this result is in the opposite direction from that predicted by frequency and familiarity facts alone. Subjects display a preference for the stimuli containing less familiar bases. All things being equal, we should expect subjects to rate derivations based on frequent words as more likely to enter English than derivations based on infrequent words.
Here we see the opposite trend, which can be explained only if we take into account the role of base frequency in maintaining morphological complexity. In the next section, we investigate whether -al prefers to attach to complex words which lack a phonotactic cue to juncture.
8.7 Experiment 7b: -al Affixation and Phonotactics
In the previous section, we found evidence confirming earlier observations about the relative ordering of affixes. We have argued that -al is not a highly parsable affix, and, as such, displays a dispreference for attaching to forms which are readily decomposable. Elsewhere in this book we have seen evidence that the phonotactics across a morpheme boundary can provide a strong cue to juncture. As such, we predict that -al should prefer to attach to -ment forms which display legal phonology over -ment forms which contain a phonotactic transition which may facilitate decomposition. This section describes an experiment designed to test this hypothesis.

8.7.1 Methodology and Materials
The stimuli consisted of fifteen pairs of affixed words, as shown in table 8.6. The members of each pair were matched for syllable count, stress pattern, and the frequency of both the whole word and the base. Word A of each pair contains an illegal or low-probability transition across the morpheme boundary. Word B, on the other hand, contains a highly probable V-C transition. As such, the phonology of Word A is much more likely to
induce parsing than the phonology of Word B. We predict that Word B should appear less complex, and so should be more acceptable when suffixed with -al. The word pairs in table 8.6 were suffixed with -al and presented to subjects in written form. The word pair improvemental/requiremental was mistakenly included twice on the answer sheet; subjects’ responses to the second presentation of this pair were discarded. The order of presentation of pairs was counterbalanced, and the task was presented in a manner analogous to that described in section 8.6. The instructions for this task were identical. Twenty different subjects completed the task. The subjects were students in a Northwestern University undergraduate linguistics class, and received course credit for their participation.

Word A         freq.   base freq.   Word B         freq.   base freq.
improvement    756     1685         requirement    944     3391
recruitment    103     257          deployment     96      203
annulment      3       12           conferment     9       9
enforcement    66      294          endowment      66      112
escapement     1       1079         acquirement    1       1035
settlement     471     1428         measurement    140     1308
curtailment    17      63           impairment     18      93
management     1462    2289         argument       233     2130
arrangement    1111    1587         employment     967     1110
assessment     404     417          retirement     400     561
defacement     0       18           allurement     1       15
abridgement    3       15           deferment      5       5
bafflement     12      131          wonderment     10      668
announcement   271     1334         enjoyment      215     2656
discernment    4       84           interment      4       46

Table 8.6: Experiment 7b: Stimuli

8.7.2 Results
As predicted, subjects displayed a strong preference for -al affixation to forms which contained legal phonotactics. This result is extremely robust: of the twenty subjects, nineteen displayed this preference. The result is highly significant both by items (Wilcoxon, p<.005) and by subjects (Wilcoxon, p<.0001). 67% of all responses favored -al affixation to forms with vowel-consonant transitions across the morpheme boundary (e.g. requiremental), with only 33% of responses displaying a preference for affixation to matched counterparts containing consonant-consonant transitions (e.g. improvemental).
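The by-subjects result can be sanity-checked with a much cruder test than the Wilcoxon reported above: with nineteen of twenty subjects patterning the same way, even an exact two-sided sign test lands well below the .0001 level. A minimal sketch:

```python
from math import comb

def sign_test_two_sided(successes, n):
    """Two-sided exact sign test: the probability, under a fair coin,
    of a split at least as extreme as the one observed."""
    k = max(successes, n - successes)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# 19 of the 20 subjects preferred -al on the phonotactically legal forms.
p = sign_test_two_sided(19, 20)
print(p)  # about 4e-05, i.e. p < .0001
```

The sign test discards the magnitude information that the Wilcoxon uses, so agreement between the two here simply reflects how lopsided the subject-level pattern is.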
Together with the results of Experiment 7a, and the preceding examination of the stacking behavior of a range of affixes, this provides strong evidence in favor of an account of affix-ordering which involves parsability. Affixes which are likely to be parsed out can not occur inside affixes which are unlikely to be parsed out. This perceptually grounded restriction accounts for the range of affix-ordering behavior commonly associated with Level-Ordering, in addition to predicting differential behavior of individual forms. We have seen experimental evidence that such differences do, indeed, occur. Only an account involving parsability affords these phenomena a unified explanation.
8.8 Discussion
The results of these experiments cast considerable doubt on an affix-driven level-ordering account of denominal -al affixation. Recall that the standard level-ordering account of the -al affixation facts states that -ment affixation can take place at level 1 and level 2, whereas denominal -al affixation is a strictly level 1 phenomenon. Under this account, -ment affixation to bound roots occurs at level 1, and a few select non-transparent words (e.g. government) may also be formed there. This accounts for the fact that -al may attach to some forms in -ment but not others. Most words affixed with -ment are formed at level 2, and so can not take an -al affix. Level-ordering accounts require that all -ment forms which do not take -al affixation were created at level 2, and so do not allow for any distinction between them. That is, neither requirement nor improvement takes -al, and so both must be created at level 2. Such an account can not predict that improvement would be much less likely to take -al than requirement. Rather, -al (like other affixes discussed above) is sensitive to the presence of internal structure. Some forms affixed with -ment favor the whole-word route in access, are minimally decomposable, and so may take -al. As such, governmental, judgmental, etc. are possible, and are listed in the lexicon. Other forms are highly decomposable, and, as such, are bad with -al. These forms include improvemental and effacemental. Still others do not have -al forms in the lexicon, but there is no particular reason why they should be excluded; indeed, if we started to develop a need for these words, we might expect them to enter the language. These include requiremental and abasemental. The degree to which -al suffixation is bad with -ment forms is related to the degree to which these words are decomposed.
The division of affixation into multiple strata was one way of accounting for the fact that “level 1 affixes”—non-transparent, non-productive, non-neutral affixes—tend to occur before “level 2 affixes,” which are more transparent, more productive, and leave the phonology of the base untouched. This division appeared to be a natural one because of a perceptually-driven restriction against placing affixes which are likely to be parsed out inside affixes which are not. This restriction not only explains the ordering facts that level-ordering accounted for, but predicts that there will be a large number of cases in which one affix can attach only to a restricted set of words containing a second—specifically, those words which have a minimal degree of internal structure. Indeed, we have found a large number of cases in which one affix displays restricted behavior in terms of possible bases containing another. In all such cases, non-decomposable bases were favored. We did not find any cases in which an affix attaches only to forms which
Causes and consequences of word structure
158
are maximally decomposable and not to forms which are relatively opaque. The proposal presented here resembles a principle offered by Burzio (1994). A structure with a degree of compositionality n may not contain a structure with a degree of compositionality greater than n. (Burzio 1994:354) Burzio offers this “Structure-transparency principle” as an explanation for the fact that Latinate affixes cannot occur inside Germanic affixes. He assumes that Latinate affixes all give rise to the same degree of compositionality, and that Germanic affixes, too, are all equally decomposable. In this way he accounts for the relatively free attachment of affixes within each of the classes—because, if two affixes are in the same class “whatever internal structure one tolerates, the other will too” (355). The discussion above reveals that the Structure-transparency principle was on the right track, and in fact, turns out to be far more powerful, and with far more nuanced consequences than conceived of by Burzio. In particular, the generalization extends well beyond a description of abstract structures, to have concrete and demonstrable consequences for individual word forms in the lexicon.
8.9 Exceptions to Level-ordering
Aronoff and Sridhar (1983), and many others following them, discuss three pairs of affixes which appear to be major exceptions to the affix-ordering generalization in English. The exceptions are the pairs of affixes which appear in the words patentability, standardization, and governmental, in which level 2 affixes appear closer to the root than level 1 affixes. Fabb (1988) also discusses the common pairing -istic. The combination -mental was discussed in depth above. The other three combinations are consistent with the phonotactic aspect of the hypothesis under investigation here. Because the phonotactics do not make -able, -ist, and -ize highly parsable affixes, nothing phonotactic makes forms such as patentability, legalistic, and standardization particularly ill-formed. However, these pairings are still rather surprising under the account advocated here, due to the frequency characteristics of the affixes. In each of these three cases, the productivity and relative frequency profiles suggest that, if decomposability were a function of frequency alone, the outermost affix in these combinations should be less separable than the innermost. -able, for example, is much more productive than -ity, and its frequency profile suggests that it should be much more prone to parsing in speech perception. It remains for future work to determine whether these combinations are truly anomalous, or whether their well-formedness is explainable within the context of an enriched understanding of the relative contributions of phonotactics, frequency, and other characteristics towards determining the decomposability of an affix. These cases point to the danger of interpreting each of the hypotheses in section 8.2 as absolutes. The hypotheses will sometimes make contradictory predictions, and untangling the exact manner in which they interact in such cases is a complex matter.
It is a tantalizing fact, however, that the number of cases in which contradictory predictions are
made is minimized by the non-independence of phonotactic patterns and frequency in the lexicon. At the word level, complex words which do not contain any phonotactic cue to juncture are significantly more likely to be more frequent than the bases they contain than are words which contain a low-probability phonotactic transition (chapter 3). And at the affix level, consonant-initial affixes tend to display frequency profiles which make them still more prone to decomposition (see section 8.3). It is certainly not my claim that this account is the complete story of English affixation. However, the evidence presented here demonstrates that, when combined with other pertinent linguistic information, a parsing-based account has the potential to significantly advance our understanding of the constraints on possible affix orderings. No single factor alone (such as relative frequency effects, or phonotactic profiles) can completely predict the parsability of an affix. Rather, what we are dealing with here is a syndrome: a set of complex statistical generalizations which provide insight into the representation and organization of affixes, and place constraints on their possible co-occurrence.
8.10 Bracketing Paradoxes
The discussion has thus far focused on suffixes, because stacking restrictions on English suffixes have formed the core of the debate about affix-ordering. Prefixes in English are much less likely to co-occur. However, they are involved in one particular type of exception to level-ordering: bracketing paradoxes. Bracketing paradoxes are words in which selectional restrictions suggest that a level 1 suffix has attached before a level 2 prefix. Such cases have been widely debated in the literature (e.g. Williams 1981, Strauss 1982, Kiparsky 1983, Pesetsky 1985, Sproat 1985, and others). Ungrammaticality is a textbook case of a bracketing paradox: the level 2 un- attaches to adjectives, and so must attach before the nominalizing level 1 -ity. Other examples (from Kiparsky 1983) include decongestant, arch-ducal, vice-presidential, and underestimation. Having dispensed with the Affix-Ordering generalization, the above forms are not explicitly ruled out as possible forms. However, we should still predict that they are difficult to parse—we are likely to posit a strong boundary after arch-, for example, and so may have to do some back-construction in order to achieve the right parse for arch-ducal. In order to understand why this type of “paradox” does not lead to rampant mis-parsing, we can compare such cases to a second, theoretically possible yet completely non-occurring, type of bracketing paradox. There are no attested cases of words in which a level 1 prefix attaches after a level 2 suffix. Where does this asymmetry come from? The asymmetry receives a natural explanation when we consider the left-to-right nature of the speech signal, and, consequently, the left-to-right nature of parsing. I have argued that suffixed words are biased towards decomposition. The base is encountered before the full form, and so is likely to receive a high level of activation, biasing decomposition over a whole-word, non-decomposed analysis.
With prefixed words, however, the whole-word route is naturally favored. The onset of the base comes later than the onset of the whole word, affording the whole word an advantage in access. As such, prefixed words are, on the whole, less decomposable than suffixed words (see e.g. Cole et al. 1989).
Now, consider the non-occurring bracketing paradoxes, which would involve level 1 prefixes outside level 2 suffixes. Insuccessful is an example given by Spencer (1991). Prefixes displaying characteristics traditionally associated with level 1 are highly unlikely to be parsed out: first, because they do not create salient junctural phonotactics; second, because they tend to be non-productive in any case; and third, because they are prefixes, and so are naturally biased towards whole-word access. Therefore it is likely that the processor will fail to posit a boundary at the first morpheme boundary in insuccessful. The second morpheme boundary is associated with a suffix which displays “level 2” characteristics, such as high productivity, and so is highly likely to be parsed out, leading to the parse insuccess#ful, which is inconsistent with the intended meaning, and requires a fair amount of backtracking to repair. Because we are naturally inclined to treat the first two morphemes as a unit, the attempt to more closely associate the second two morphemes fails. By comparison, the existing bracketing paradoxes are cases in which a level 2 prefix attaches before a level 1 suffix, as in reburial. Because of left-to-right parsing, the base rebury is encountered before the suffix. Even if there is some evidence to posit a boundary between the first two morphemes, listeners will almost certainly entertain the possibility that they are intended to be associated as a unit; rebury will receive some level of activation. Encountering a suffix which semantically modifies this unit, then, is not going to throw the processor completely. In sum, bracketing paradoxes are cases in which a highly parsable prefix appears to have attached before a marginally parsable suffix.
This is one of four theoretically possible configurations of prefix-suffix pairs differing in parsability:

(11) non-paradoxical configurations:
(a) [(marginally parsable affix) (base)] (highly parsable affix)
(b) (highly parsable affix) [(base) (marginally parsable affix)]

(12) paradoxical configurations:
(a) [(highly parsable affix) (base)] (marginally parsable affix)
(b) (marginally parsable affix) [(base) (highly parsable affix)]

The two configurations in (11) are by far the most commonly attested patterns. This fact has been viewed as evidence in favor of level-ordering. It follows straightforwardly, however, from the same principles used to motivate the ordering of English suffixes in the previous sections. The configurations in (11) are maximally well-formed in terms of maximizing the probability of successful online parsing: the affixes which are more marginally parsable are also grammatically closest to the base. The configurations in (12) are non-optimal in terms of parsing, but (12b) is much less optimal than (12a). Because we process speech in a temporal manner, the prefix-base sequence is available as input to suffixation in (12a), even if it is not the preferred input. The configuration in (12b), however, is essentially non-parsable. Both the left-to-right nature of speech and the marginal parsability of the prefix will lead to an overwhelming tendency to parse the first two elements as a constituent, contrary to the intended analysis, in which the last two elements form a constituent.
We also predict that configurations of type (12a) (traditional bracketing paradoxes) should be most acceptable when the prefix-base constituent is minimally decomposable. We might expect arch-ducal, for example, to be less acceptable than reburial (due to the difference in strength of the phonotactic cue to decomposition). While this prediction seems intuitively plausible, it remains to be empirically tested.
8.11 Conclusion
The problem of restrictions on affix-ordering in English can be largely reduced to one of parsability: an affix which can be easily parsed out should not occur inside an affix which cannot. This has the overall result that the less phonologically segmentable, the less transparent, the less frequent, and the less productive an affix is, the more resistant it will be to attaching to already affixed words. This prediction accounts for the patterns the original affix-ordering generalization was intended to explain. Importantly, the prediction also extends to the parsability of affixes as they occur in specific words. This accounts for the so-called dual-level behavior of some affixes. An affix may resist attaching to a complex word which is highly decomposable, but be acceptable when it attaches to a comparable complex word which favors the direct route in access. Understanding affix-ordering, then, requires a full understanding of the factors influencing the parsing and storage of individual words.
CHAPTER 9 Conclusion
9.1 Summary of Results
This book set out to explore possible effects of speech perception strategies upon morphological structure. We chose to explore two factors known to be relevant to speech perception: probabilistic phonotactics and lexical frequency.
9.1.1 Probabilistic Phonotactics
Infants and adults use probabilistic phonotactics to segment words from the speech stream. Experiment 1 implemented a simple recurrent network in order to simulate this strategy. The network was trained to spot boundaries between monomorphemic words. By exploiting the available phonotactic information, the network learned to perform this task well. Testing the network on multimorphemic words revealed that English is configured in such a way that a phonotactics-based segmentation strategy must have consequences for the processing of morphologically complex words. When presented with multimorphemic words, the network hypothesized a boundary at 60% of word-internal morpheme boundaries. Preliminary evidence from Hay et al. (in press), and more direct evidence from Experiment 2, demonstrated that English speakers do indeed use phonotactic information to segment nonsense words into component “morphemes.” This result was replicated in Experiment 3 with real words. The word pipeful, for example, is judged by subjects to be more complex than the word bowlful. When other factors are controlled, low-probability transitions provide a stronger cue to juncture than transitions which are relatively more likely to be found morpheme-internally. Together, the results of experiments 1–3 provide strong evidence that phonotactics are involved in the process of morphological decomposition. A series of calculations in chapter 3 revealed that this is reflected in the contents of the lexicon. Prefixed words with legal junctural phonotactics display different characteristics than prefixed words with junctural phonotactics which are not attested morpheme-internally.
Prefixed words with legal junctural phonotactics appear less prefixed, are less semantically transparent, are more polysemous, and are more likely to overtake their base word in lexical frequency. These effects are absent for suffixed words, a difference which was ascribed to the left-to-right nature of speech perception. We also hypothesized that
suffixed words which contained low-probability junctural phonotactics, but nonetheless displayed other symptoms of non-decomposability, may resolve this conflict in the phonetics—a hypothesis which was later explored in experiment 6 (summarized below).
9.1.2 Lexical Frequency
Experiment 4 demonstrated that lexical frequency is also relevant to morphological decomposition. Words were matched for the surface frequency of the derived form, and the frequency of the base was manipulated. Words which were less frequent than the bases they contained were rated consistently more complex than matched counterparts which were more frequent than their bases. This provides evidence that the relative frequency of the derived form and the base affects the decomposability of a complex form. It contrasts with widespread assumptions involving the role of the absolute frequency of the derived form in decomposition. While many have claimed that high frequency forms do not tend to be decomposed, I argue that this follows only when such forms are more frequent than the bases they contain. Further evidence for the importance of relative frequency is found in Experiment 5. Subjects are more likely to place a contrastive pitch accent on the prefix in words like imperfect (less frequent than perfect) than in matched counterparts such as impatient (more frequent than patient). Importantly, impatient is not a particularly high frequency word. Nonetheless, it displays characteristics of non-decomposability, because it is more frequent than the base it contains. Calculations presented in chapter 5 reveal that relative frequency is a better predictor of semantic drift than the absolute frequency of the derived form. Using the mention of the base word in a word’s definition as an indication of transparency, we found that relative frequency was a significant predictor of semantic drift, both for prefixed and for suffixed forms.
In contrast, no significant correlation between the absolute frequency of the derived form and semantic transparency emerged, for either prefixed or suffixed words. An investigation of polysemy revealed that both relative frequency and absolute frequency are separately correlated with the number of meanings associated with complex forms. Of these, absolute frequency is the better predictor.
9.1.3 Phonetic Consequences
Having established that relative frequency and morphological decomposition are linked, experiment 6 returned to a possibility raised during the investigation of phonotactics. Chapter 3 found evidence that some suffixed forms display phonotactic cues to juncture, and yet are associated with characteristics of non-decomposability. We hypothesized that in such cases the low-probability phonotactic transition may, in fact, be resolved in the phonetics. The results outlined above about the role of relative frequency in decomposition provide us with a way to manipulate decomposability while holding phonotactics constant. Experiment 6 demonstrates that forms which are more frequent than their bases (e.g. swiftly) display greater reduction in the implementation of the (offending) /t/ than
forms which the relative frequency relations predict to be highly decomposable (e.g. softly). This result is particularly powerful because it goes in the opposite direction from independent evidence that high frequency forms tend to be more reduced (see, e.g., Wright 1997). It can only be explained with reference to variation in morpheme boundary strength.
9.1.4 Morphological Productivity
The evidence outlined above provides us with two invaluable diagnostics for gauging the decomposability of a complex word: does it contain a phonotactic cue to juncture, and is it less frequent than the base it contains? Once we can gauge the approximate decomposability of a complex word, we can gauge, for any given affix, the proportion of words containing that affix which are likely to be decomposable, as opposed to the proportion which are likely to have become relatively independent and opaque. With these tools in hand, chapter 7 turns to a discussion of the relation between decomposability and morphological productivity. I demonstrate that the productivity of an affix can be predicted by a joint function of the proportion of forms representing it which contain phonotactic cues to juncture, and the proportion of forms which are less frequent than their base words. Adding the type frequency of the affix to the model increases the predictive power still further. Affixes which consistently create highly decomposable forms are much more likely to be productive than affixes which create less decomposable words. This result strongly supports models in which morphological productivity is projected from the lexicon.
9.1.5 Level-Ordering and Stacking Restrictions
Finally, in chapter 8, we examined in detail a phenomenon historically associated with level-ordering: stacking restrictions among English affixes. I argued that early accounts of affix-ordering were overly restrictive, and drew the line at the wrong level of abstraction.
However, recent work which has discarded the idea that there are any restrictions on ordering (beyond selectional restrictions) misses a number of important generalizations about stacking restrictions in English, and cannot account for the full range of facts. The affix-ordering generalization can be reduced to a perceptually grounded one: an affix which can be easily parsed out should not occur inside an affix which cannot. I demonstrate that this maxim, when combined with our increased understanding of the role of frequency and phonology in morphological decomposition, captures not only the full range of generalizations about which affixes can occur in which order, but also a large number of systematic, word-based exceptions to these generalizations. I demonstrate that we do not find any cases in which an affix attaches only to forms which are maximally decomposable and not to forms which are relatively opaque. However, we find large numbers of cases where the opposite holds true. This range of results cannot be accounted for by the classical affix-ordering account, nor by any account involving selectional restrictions. The range of empirical facts discussed in chapter 8 makes a convincing case that affix-ordering and decomposability are intricately linked. However, because the claim
departs rather radically from previous suggestions in the literature, and because important consequences follow from it, two experiments were conducted to test the claim experimentally. Experiments 7a and 7b investigate the relationship of decomposability to affix-ordering, in the context of the combination of -ment and -al. In general, -al cannot attach to forms affixed with -ment. This follows from our prediction about parsability: -ment is consonant-initial, and so can be more easily parsed out than -al. However, there is a range of exceptions to this restriction. One set involves forms with bound roots (such as ornamental). Such forms are acceptable because the base word contains the minimum of internal structure—it is not decomposable. A second set involves a number of exceptions which we argue are systematic. Governmental, for example, is acceptable because government is more frequent than govern, reducing the decomposability of the base form. Experiment 7a presented subjects with pairs of matched non-words, such as arrangemental and investmental. Both words were matched for surface frequency, but—in this case—arrange is more frequent than arrangement, while invest is less frequent than investment. This should lead investment to be less decomposable, and investmental correspondingly more acceptable as a base for -al affixation. The results supported this prediction. The results of experiment 7b provide further evidence in favor of a parsing account of affix-ordering. Subjects displayed a highly significant preference for forms such as requiremental (in which the base word contains no phonotactic cue to juncture) over matched forms such as improvemental (in which the base word is more likely to be decomposed, based on its phonotactics). No current proposal regarding restrictions on affix-ordering predicts that subjects should display such strong intuitions about the relative well-formedness of affix combinations in non-words.
However, the result falls out naturally from an account of affix combination which focuses on parsability. Chapter 8 concludes with a demonstration that the parsing account extends naturally to the attested restrictions on possible orders of attachment of differently parsable prefixes and suffixes.
9.2 Discussion
This book started with a simple question. How do processes of speech perception affect the online decomposition of morphologically complex forms? We took two factors known to be relevant to the segmentation of words from the speech stream, and demonstrated that they exert a strong influence on the process of morphological decomposition. The resulting understanding of morphological decomposition has enabled us to make significant progress on a number of linguistic problems which have not generally been studied together. I have demonstrated that this increased understanding of morphological decomposability yields enormous explanatory power, from elements of fine phonetic detail, through to predicting the degree to which we can use an affix to create new words, and the order in which affixes can occur with respect to one another. Importantly, we have seen that a full understanding of these phenomena requires a sophisticated understanding of the morphological decomposability of individual words.
Laudanna et al. (1994) and Schreuder and Baayen (1994) have warned that affixes are not a homogeneous set, and that generalizing over them as an undifferentiated category can be hazardous. The results of this book certainly support this position. Further, the results presented here demonstrate that different words containing the same affix are not a homogeneous set either, and that the widespread practice of generalizing over them as an undifferentiated category has resulted in a great loss of explanatory power. The properties of specific affixes cannot be sensibly detached from the properties of the specific words in which they appear.
APPENDIX A Segmentation and Statistics
Calculations of phonotactic cues to juncture reported in this book use simple n-gram probabilities—usually probabilities of coda-onset transitions. This may seem a surprising choice to those with some statistics background, as it does not normalize for the frequency of the component phonemes. The evidence that phonotactics is relevant to morphological decomposition does not rely on this specific choice of statistic—the experimental materials were chosen such that both normalized and non-normalized probabilities made the same predictions, and in my analysis of the co-occurrence syndromes reported in chapter 3, a division is made between occurring and non-occurring transitions—a line which both types of probabilities would draw in the same place. I do, however, report the occasional correlation, which is based on non-normalized probabilities. In this appendix I justify this choice of statistic, and also report corresponding normalized probabilities for the relevant correlations. It transpires that, at least for the types of calculations relevant to this book, English is arranged in such a way that non-normalized statistics approximate the normalized probabilities fairly well. Moreover, the non-normalized statistics are consistently better predictors of the data. This raises the interesting possibility that English is arranged in a way that strongly facilitates a strictly bottom-up approach to segmentation.
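The difference between the two statistics can be made concrete with a small sketch. Everything here is illustrative: the transition counts are hypothetical, and the normalized measure shown (log of observed probability over the probability expected under independence of coda and onset) is just one reasonable normalization, not necessarily the one the reader would choose.

```python
import math
from collections import Counter

# Hypothetical counts of coda-onset transitions observed morpheme-internally.
transitions = Counter({
    ("m", "p"): 120, ("n", "t"): 300, ("p", "t"): 200,
    ("l", "f"): 150, ("p", "f"): 2,
})
total = sum(transitions.values())
codas, onsets = Counter(), Counter()
for (coda, onset), n in transitions.items():
    codas[coda] += n
    onsets[onset] += n


def raw_log_prob(coda, onset):
    """Non-normalized statistic: log joint probability of the transition."""
    return math.log(transitions[(coda, onset)] / total)


def normalized_log_prob(coda, onset):
    """Normalized statistic: log(observed / expected), where the expected
    probability assumes the coda and onset occur independently."""
    expected = (codas[coda] / total) * (onsets[onset] / total)
    return math.log((transitions[(coda, onset)] / total) / expected)


# With these counts the rare /pf/ transition scores low on both measures,
# so both statistics flag it as a likely cue to juncture (cf. pipeful).
print(raw_log_prob("p", "f"), normalized_log_prob("p", "f"))
```

For materials chosen as described above, the two measures draw the occurring/non-occurring line in the same place, which is why the choice between them does not affect the argument.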
A.1 Interactionist vs Bottom-up Models of Segmentation
Cairns, Shillcock, Chater and Levy (1997) distinguish between interactionist, weakly bottom-up, and strongly bottom-up models of segmentation. In interactionist accounts, the information which facilitates segmentation flows top-down from the lexicon, and lexical segmentation is simply a by-product of word recognition (cf. McClelland and Elman 1986). Cairns et al. argue that such an approach is difficult to reconcile with the language acquisition process, as a lexicon is presupposed. A weakly bottom-up model does not make use of higher-level information during processing, “although it can be used to teach the system, or can be implicit in its construction” (Cairns et al. 1997:116). Cairns et al. cite Norris’s connectionist model as an example of a weakly bottom-up system (Norris 1990, 1992, 1993). The recent MERGE model also argues for a bottom-up approach (Norris et al. 2000). A strongly bottom-up system would be one in which higher-level lexical information is not used
either during processing, or during training. Such a system would be highly tractable in terms of the development of an early segmentation strategy in infancy—before a lexicon is formed.
A.2 Reanalysis of Hay et al. (in press)
In Hay et al. (in press), we model our results as follows: the most likely parse is the winning parse, and well-formedness judgments are predicted by the log probability of the winning parse. The comparison of the observed probability to the expected value plays a role in establishing the degree to which a nonsense word is decomposed, but no role in establishing the overall well-formedness of a given parse once it is chosen. This system is not strictly bottom-up—it relies crucially on the contents of the lexicon. There is an implicit comparison between inter- and intra-word probabilities, and in order to make such a comparison, a listener needs knowledge of items in the lexicon. However, the result could be equally well modeled using a strongly bottom-up approach, such as that advocated by Cairns et al. (1997), who present a strictly bottom-up simple n-gram model which places a boundary at all transitions occurring below a certain threshold. To see why this is so, consider figure A.1 (figure 5 from Hay et al. in press), which shows the log probability of nasal-obstruent clusters word-internally versus their probability across a boundary. The open squares represent the case of words with a boundary after the cluster (as in camp#er), and the filled squares represent the case in which the boundary splits the cluster, as in drum#pad. The distribution of the filled squares is of most interest here, as they represent the relation between probabilities word-internally (the x axis) and across word boundaries (the y axis). There is considerable variation in the probability of nasal-obstruent clusters word-internally, i.e. along the x axis. Consider, however, the probability of nasal-obstruent clusters across word boundaries: there is very little variation in the placement of the filled squares along the y axis.
Considering the variation on the y axis alone, we would have little information to help distinguish which transitions are more likely across a word boundary than others. The variation on the x axis alone, however, is great enough that we can hypothesize that the low-probability transitions are likely to indicate juncture.
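The contrast between the two bottom-up strategies can be sketched directly. The log probabilities below are invented for illustration, and the threshold of −8 follows the discussion in the text; with values arranged like those in figure A.1, the two strategies classify the transitions identically.

```python
# Hypothetical word-internal and cross-boundary log probabilities for a
# few coda-onset transitions (values invented for illustration).
intra = {"mp": -4.0, "nt": -3.5, "np": -9.5, "mk": -8.7}
inter = {"mp": -6.0, "nt": -6.1, "np": -6.3, "mk": -6.2}

THRESHOLD = -8.0  # the absolute cutoff discussed in the text


def absolute_strategy(t):
    """Strongly bottom-up: posit a boundary iff the word-internal
    probability falls below a fixed threshold; no lexicon needed."""
    return intra[t] < THRESHOLD


def relative_strategy(t):
    """Weakly bottom-up: posit a boundary iff the transition is more
    probable across a word boundary than word-internally."""
    return inter[t] > intra[t]


# With these values, the two strategies agree on every transition.
agreement = all(absolute_strategy(t) == relative_strategy(t) for t in intra)
print(agreement)  # → True
```

The cross-boundary probabilities vary little, so the classification is driven almost entirely by the word-internal values, which is the point the figure makes.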
Figure A.1: Figure 5 from Hay et al. (in press). Probability of Nasal-Obstruent Clusters Morpheme-Internally vs. across a Morpheme Boundary.
The dotted line in the graph indicates how the points would fall if the probabilities word-internally were the same as those across words. In Hay et al. (in press), we argued that nasal-obstruent clusters falling below this line were more probable word-internally than across words, and so stimuli containing them would be perceived as monomorphemic. Clusters falling above the line were more probable across words, and so stimuli containing them should be perceived as compounds. Thus, we argued for an active role for the expected probability under decomposition. Note, however, that the competition between the monomorphemic and compound analyses (represented by the filled squares in the graph) is modulated entirely by the variation along the x axis. We would have got the same result in our paper if we had simply drawn a vertical line on this graph at a threshold of observed word-internal probability. If we nominate a threshold of around log(probability) = −8, and hypothesize everything below this threshold to indicate juncture, then this divides the points in almost exactly the same
manner. The results of analyzing the data in this way are shown in figure A.2. Comparison with the original analysis, in figure 2.2, reveals that only one point has shifted (ntS), and the overall significance level remains exactly the same (r2=.65, p<.0005). In fact, the one point which shifted in this reanalysis shifted towards the regression line. The distribution of English nasal-obstruent clusters, then, is arranged in such a way that a simple calculation of co-occurrence probabilities will approximate the information required for successful segmentation. In fact, this appears to be true for English coda-onset transitions in general. Figure A.3 plots the transitions across the morpheme boundaries of 515 prefixed words in English. The figure contains a point for each word with a monomorphemic base, prefixed with one of 9 consonant-final prefixes of English—the test corpus for the neural network described in section 2.3, and the object of extensive calculations throughout this book. Each point in figure A.3 relates to the transition between a coda and an onset. Complex codas and onsets are included, as well as simple ones, and syllables beginning with vowels are assigned the probability of a syllable being onsetless. The x axis shows the probability of the transition occurring morpheme-internally, and the y axis shows the expected probability across a word boundary. Because the range of transition types is more diverse than in the nasal-obstruent cluster data set shown in figure A.1, the variation in the expected value across a word boundary is much greater. This variation is weakly positively correlated with the probability word-internally (r2=.08, p<.0001—regression line not shown). Transitions which occur with high probability within words also have a high probability of occurrence between words.
The greatest variation in expected probability between words is for those transitions which are non-existent word-internally (towards the very left of the graph; these are treated here as if they occurred just once in the corpus, to avoid taking the log of zero). Two dotted lines are shown on the graph. The diagonal line shows the points on the graph at which the observed (word-internal) and expected (inter-word) probabilities are equal. Points which fall below this diagonal line represent transitions which are more probable word-internally than across words.

Figure A.2: Reanalysis of data from Hay et al. (2000). The y axis shows the average wellformedness ratings for the cluster perceived. The x axis shows the probability of the parse: a monomorphemic analysis for all clusters which have a monomorphemic probability > log(prob) = −8, and a morpheme boundary analysis for all clusters which have a monomorphemic probability < log(prob) = −8.

The vertical line represents the strategy which posits a boundary for all transitions with a word-internal probability of log(prob) < −8, and for no transitions with a word-internal probability of log(prob) > −8. Together, the diagonal line, representing the weakly bottom-up "relative probability" segmentation strategy, and the vertical line, representing the strongly bottom-up "absolute probability" segmentation strategy, divide the graph into four quadrants. The upper-left and lower-right quadrants of the graph represent those cases in which both strategies make the same prediction. These are the cases in which the segmentation strategy based on absolute frequencies would make a "correct" prediction as to the extent of juncture. The lower-left quadrant of the graph represents cases in which the absolute probability strategy would posit a boundary, and the relative probability strategy would not. Recall, though, that the points occurring at the very left of the graph are actually unattested morpheme-internally (but treated here as occurring just once in the corpus). So the lower-left quadrant contains only one transition which truly distinguishes the behavior of these two strategies. The triangular segment at the top of the graph represents those transitions for which the absolute frequency strategy would not posit a boundary, but which nonetheless occur more frequently across words than within words. In this particular corpus, these are the points at which the two segmentation strategies would make different predictions. Note, crucially, that the points falling within the two parts of the graph in which the absolute frequency strategy makes the correct prediction form a distinct majority. The degree to which the two segmentation strategies make similar predictions is further illustrated in figure A.4, which plots the ratio of intra-word probability to inter-word probability, as predicted by the intra-word probability alone. The two are extremely well correlated (r2=.60, p<.00001).
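The four quadrants can be expressed as a small decision function. This is an illustrative sketch, not code from the book, and the probability values in the example are invented:

```python
def classify(log_intra, log_inter, threshold=-8.0):
    """Compare the two bottom-up strategies for one transition.
    Relative strategy: boundary iff the transition is more probable
    across words than within words (point above the x = y diagonal).
    Absolute strategy: boundary iff intra-word log prob < threshold."""
    relative = log_inter > log_intra
    absolute = log_intra < threshold
    if relative == absolute:
        return "correct"            # upper-left or lower-right quadrant
    if absolute:
        return "false positive"     # lower-left quadrant
    return "missed boundary"        # triangular segment at the top

# A transition common within words and rare across them: both strategies
# agree that no boundary should be posited.
print(classify(log_intra=-3.0, log_inter=-6.0))  # "correct"
```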
There is at least one type of case in which relying on the statistics of the lexicon (or the speech stream), without regard to explicit intra- and inter-word comparison, could potentially run into trouble. This case involves rare phonemes, or rare onsets or codas.

Figure A.3: Intra- and inter-word co-occurrence probabilities for transitions occurring in 515 prefixed words.

Figure A.4: Log(intra-word co-occurrence probabilities) vs log(intra-word/inter-word) probabilities for transitions occurring in 515 prefixed words. (r2=.60, p<.00001)

If a phoneme is itself fairly rare, then the probability of its co-occurrence with any other phoneme will be low. A reliance on absolute probability for segmentation will then lead to the positing of a boundary. Thus even if a phoneme transition involving a rare phoneme is more probable word-internally than across words, reliance on absolute co-occurrence probability will tend to indicate juncture, and lead the listener into trouble. This is the case of the lower-left quadrant in figure A.3: relatively empty in that particular corpus, but presumably populated at least some of the time. One thing to say about this possibility is that rare items are rare. The trouble caused by them may thus be sufficiently minimal that a segmentation strategy based on absolute probabilities is still worthwhile. When we look at the design of words in English, however, we notice that this problem may be even less severe. The segmentation strategy will only run into trouble if it encounters the problematic item word-internally. Rare clusters or phonemes will therefore not be problematic either word-finally or word-initially: in such locations, the strategy will lead to a boundary hypothesis, and will be right. The preponderance of complex consonant clusters at word edges in English will significantly aid such a segmentation strategy. English permits extra consonants word-finally which are rare or unattested in word-internal syllables (see, e.g., Greenberg 1978, Kenstowicz 1994, Coleman and Pierrehumbert 1997). We might also predict that rare phonemes are more likely to occur at word edges than word-internally. Note, of course, that the calculations from Hay et al., and the calculations provided here for prefixes, all involve transitions across boundaries. I would not want to suggest that the processor makes no comparisons at all between inter- and intra-word transitions.
The knowledge that onset-nucleus transitions are many times more probable word-internally than across words, for example, is undoubtedly available to the processor. However, this may well reflect some higher-level generalization, rather than the calculation of specific phoneme transitions within and across words. For the gradient results reported in this book I calculated both the ratio of the intra- to inter-word probabilities (a type of mutual information statistic), and the intra-word probabilities alone. The intra-word probabilities were calculated as described in section 3.2. The expected probability across a word boundary is calculated using word ends: the expected probability of an /ns/ transition, for example, is calculated as the product of the probability of an /n/-final monomorpheme and the probability of an /s/-initial monomorpheme. When the inter-word probabilities contributed significantly to the variation in the data, it was in the same direction as the intra-word probabilities. That is, transitions which occur with low probability across a word boundary were more likely to be associated with juncture than transitions which occur with high probability across a word boundary. The relevant results are outlined below.
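The inter-word calculation just described can be sketched as follows. The edge probabilities are invented for illustration; the ratio function shows the mutual-information-style statistic also mentioned above.

```python
import math

# Invented probabilities over a corpus of monomorphemes.
p_final = {"n": 0.08, "t": 0.12}    # P(monomorpheme ends in segment)
p_initial = {"s": 0.10, "d": 0.03}  # P(monomorpheme begins with segment)

def log_expected_inter(coda, onset):
    """Expected log probability of a coda-onset transition across a
    word boundary: the product of the two edge probabilities."""
    return math.log(p_final[coda] * p_initial[onset])

def log_ratio(log_intra, coda, onset):
    """Mutual-information-style statistic: intra-word log probability
    minus expected inter-word log probability."""
    return log_intra - log_expected_inter(coda, onset)

# /ns/: expected inter-word log prob = log(0.08 * 0.10) = log(0.008)
print(round(log_expected_inter("n", "s"), 3))  # -4.828
```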
A.3 Experiment 2 (2.5)

Experiment 2 reported that subjects can use phonotactics to segment nonsense words into component "morphemes." We can seek further evidence that phonotactics are behind the observed results by examining the robustness of the complexity judgments, in relation to the degree of juncture indicated by the phonotactics. Pairs for which there is a big difference in the phonotactics should show a stronger bias in the judgments than pairs for which the difference is relatively smaller. This calculation is also likely to shed light on the type of segmentation strategy being used. I will examine a number of different calculations of juncturehood, in an attempt to establish the precise type of information listeners are exploiting for this task. There are at least three possibilities in terms of what could be the best predictor of the results. The degree of decomposability could be related to word-internal statistics (the statistics of the lexicon), it could be a function of the ratio of inter- to intra-word probabilities (the degree of under- or over-representation in the lexicon), or it could be a function of both inter- and intra-word statistics summed together (an approximation of the statistics of the speech stream—though clearly a closer approximation would also incorporate token frequency). We will begin by examining the condition where the words were presented in isolation, and speakers rated them as either simple or complex. Recall that this condition was significant only in the sub-condition in which subjects both heard and read the stimuli. The analysis here involves only the data from subjects in that condition. Before trying to understand the relative roles of the inter- and intra-word statistics in this task, it is important to take note of how they relate to each other.
As with the corpora examined earlier in this section, the log probability of the relevant transitions occurring morpheme-internally is positively correlated with their log expected probability across words (r2=.23, p<.0001). This relation is shown in figure A.5. Transitions which are common word-internally also tend to be common across word boundaries.

Figure A.5: Experiment 2: Intra- and inter-word co-occurrence probabilities for stimuli.

Let's first assume that subjects are guided primarily by the word-internal statistics, with transitions that occur with low probability word-internally inducing decomposition. Then we expect the number of subjects judging a word to be complex to be well predicted by the probability of the medial consonant cluster occurring morpheme-internally. The less likely the transition is to occur within a morpheme, the more complex the nonsense word containing it should appear. Indeed, there is a negative correlation between the probability of the phoneme transition morpheme-internally and the complexity of the stimuli (r2=.07, p<.05).1 The less likely the relevant phoneme transition is morpheme-internally, the more likely subjects are to perceive the stimuli as complex. Thus, we have some evidence that word-internal statistics are recruited for the task of word segmentation. Can any of the other potential factors better predict the results? If subjects explicitly compare the probability of a phoneme transition occurring word-internally to the probability of it occurring across a word boundary, then the degree to which it is over- or under-represented word-internally should be a good indicator of subjects' choices. That is, we expect the above, fairly weak, correlation to improve significantly when we introduce information about the expected probability of occurrence across a word boundary. In particular, the log ratio of inter- to intra-word probability should indicate the degree of decomposability. However, there is no evidence that subjects are making this comparison. The log ratio of morpheme-internal probability to expected probability across a word boundary is not a significant predictor of the likelihood that a subject will rate a word as complex (r2=.037, p<.12). Thus, the absolute probability of a phoneme transition morpheme-internally appears to be a better predictor of decomposability than the degree to which it is over-represented morpheme-internally. One might reasonably conclude from the above, then, that the expected probability of occurrence across a word boundary is not relevant to this task. This conclusion, however, would be premature. The log expected probability of occurrence across a word boundary
is, by itself, a significant predictor of the degree to which a word is perceived as complex (r2=.167, p<.001). Moreover, this inter-word probability accounts for a larger proportion of the variation than the intra-word probability reported above. If the intra- and inter-word probabilities are both significant predictors of the data, why, then, is the ratio between them not? The reason is that the correlations both go in the same direction. The expected value of a transition across a word boundary is negatively correlated with the probability that the word containing it will be rated as complex. That is, the less likely a transition is to occur across words, the more complex a word will sound. And the more likely a transition is to occur across words, the less it will indicate juncture. The correlation is given in figure A.6.

Figure A.6: Experiment 2: Log expected probability across a word boundary, vs the number of subjects rating the stimuli as "complex." (r2=.167, p<.001)

1. To avoid taking log(zero), non-attested clusters were treated as if they occurred just once in our corpus of monomorphemes.

The expected value across a word boundary is involved, then, but not in the way one might first expect. That both the expected value across word boundaries and the morpheme-internal probabilities are negatively correlated with the degree of decomposability suggests that a strongly bottom-up model may be correct. The segmentation strategy appears to be based on the overall statistics of the speech stream. Just as we use low
probability transitions in acquisition to discover word boundaries, we use the same statistics to gauge decomposability. A rough estimate of the overall probability of encountering a transition in the speech stream can be had by summing together the probability of occurrence word-internally and the expected probability across a word boundary. Such a heuristic is crude, of course, because it does not pay attention to the overall proportion of word-internal vs cross-word transitions encountered in speech, but it will do as a first approximation. Does summing together these two statistics improve or damage the correlation with subjects' ratings? The correlation improves very slightly (r2=.169, p<.0001). In this task, it is clear that most of the variability is predicted by the expected value across the boundary. Why is the expected value across a boundary a better predictor of decomposability than word-internal probability in this data? Earlier in this appendix I argued that word-internal probabilities carry more information than inter-word probabilities, and so should be a better predictor of decomposition. Yet here we have the opposite result. The reason for this lies with the design of the stimuli for this experiment. The stimuli were explicitly designed in matched pairs, with one member of the pair containing a low probability transition, and one member containing a high probability transition. Variation in the probability of the low probability words was explicitly avoided. Note, in table 2.1, that the transitions in the low probability members of the pairs range from unattested to attested just twice in our corpus of monomorphemes. As such, there is basically no variability in the morpheme-internal probability of half of the data-set. Moreover, for the task at hand, this is the more important half of the data-set.
In light of such restricted variability in the morpheme-internal probability of the data-set, it is perhaps not so surprising that this factor is only a weak predictor of the results. If decomposition is based on the statistics of the speech stream, then controlling the word-internal statistics in this way means that almost all of the variability will be carried by the inter-word statistics. In this particular data-set, then, the statistics across word boundaries should be the better predictor of the results. In a less controlled data-set, however, we should still expect the word-internal statistics to be the better predictor of decomposability. This is because, as we saw earlier in this section, the word-internal statistics, in general, carry more information than the inter-word statistics. This analysis of the single-word presentation condition, then, provides evidence that both inter- and intra-word probabilities are involved in the process of segmentation. Low probabilities in both cases lead to high levels of decomposability. Now let us turn to an examination of the condition in which the words were presented in matched pairs. Do the results there confirm the analysis provided above? As with the above discussion, I will limit the investigation here to the condition in which subjects both heard and saw the stimuli, as this is likely to provide the cleanest data, and so best enable us to disentangle the contributions of the different types of probabilities. Here, then, we are focusing on a condition in which subjects both heard and saw the matched pairs given in table 2.1, and were asked to indicate which word they thought was more likely to be the complex word. Examining the various factors in the same order as above, I turn first to the role of the probability of the transition word-internally. Can the difference between the times each member of a pair was chosen as simple or complex be predicted by the difference in the
morpheme-internal log probability of the transitions? In this case, the answer is no (r2=.01, p<.62). The difference between the log ratios, however, is significant: the difference between the log(intra-word/inter-word) probabilities is a good predictor of the difference between their scores (r2=.27, p<.005). This correlation is carried entirely by the information in the inter-word statistics. As noted above, the intra-word statistics by themselves were not significant. The difference in expected value across a word boundary, by contrast, was a significant predictor of the variation (r2=.26, p<.005). This statistic, in turn, is driven by the expected value of the lower probability member of the pair. The expected value of the "good" transition played no role (r2=0, p<.95), whereas the expected value of the "bad" transition in the pair was a highly significant predictor (r2=.4, p<.001). Figure A.7 shows the expected value across a word boundary of the worst member of each matched pair, against the difference in judgments for that pair. Subjects' behavior in this task, then, appears to be primarily driven by how bad the bad member of the pair is. Variation in the goodness of the good member of the pair seems to play little role. As with the single-word condition, the variation in the morpheme-internal probability of the bad member of the pair is fairly tightly controlled, and so the expected probability across a word boundary is the better predictor of the variation in how decomposable a word is. Also, as in the previous condition, the correlation goes in the opposite direction from what would be expected if the expected value were used as a point of comparison, to see how over- or under-represented a transition was morpheme-internally. Rather, low probability of occurrence, even across words, tends to indicate decomposition. In sum, it appears that the strategy used in this segmentation task is strictly bottom-up.
Boundaries are posited at low probability transitions in the speech stream.

Figure A.7: Experiment 2: Log expected value across a word boundary of the worst member of each pair, against the difference in judgments between pair members. (r2=.4, p<.001)

Because there are more inter-word transitions than intra-word transitions in language, transitions which occur across words will tend to occur less frequently in the speech stream than transitions which can also occur within words. In addition, the preponderance of consonant clusters at word edges in English should also facilitate this task. We might also expect rare phonemes to be over-represented at word edges. Because there tends to be more word-internal variation in probabilities than inter-word variation, the word-internal statistics carry more information, and so should dominate the segmentation task. The stimuli in this experiment, however, restricted the variation in intra-word likelihood, leading to a situation in which the inter-word probabilities displayed more variation, and so could more accurately predict subjects' performance.
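The speech-stream approximation used in this section, summing the intra-word probability and the expected inter-word probability, can be sketched as follows (the example probabilities are invented):

```python
import math

def log_speech_stream_prob(p_intra, p_inter):
    """Crude first approximation of the probability of encountering a
    transition in running speech: the sum of its word-internal
    probability and its expected probability across a word boundary.
    (A closer approximation would weight the two by the proportion of
    word-internal vs cross-word transitions actually heard in speech.)"""
    return math.log(p_intra + p_inter)

# A transition rare both within and across words has a low overall
# probability, and so indicates juncture under a strongly bottom-up
# strategy; a transition common in both settings does not.
rare = log_speech_stream_prob(0.0001, 0.0002)
common = log_speech_stream_prob(0.01, 0.02)
print(rare < common)  # True
```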
A.4 Wurm: Prefixedness (3.3.1)

As reported in the text, the log probability of a transition occurring morpheme-internally is weakly negatively correlated with perceived prefixedness (Spearman's rho=.26, p<.05). The less likely a phoneme transition is, the more likely the word containing it is to appear prefixed. The log expected probability across words, however, is not a significant predictor of prefixedness (Spearman's rho=.06, p<.7).
A.5 Wurm: Semantic Transparency (3.3.2)

The log probability of the phoneme transitions occurring morpheme-internally was a significant predictor of Wurm's semantic transparency ratings (Spearman's rho=.35, p<.005). More probable phoneme transitions were associated with lower transparency ratings. The log expected probability across a word boundary was not a significant predictor (Spearman's rho=.17, p<.2).
A.6 Polysemy (3.3.2)

I reported in chapter 3 that phonotactics are a weak predictor of polysemy (Spearman's rho=.2, p<.0001). In this case, the log expected probability across the word boundary is also a weak but significant predictor of the number of meanings (Spearman's rho=.15, p<.001). This correlation is weak, and is in the same direction as the correlation with word-internal probabilities. That is, for both inter- and intra-word probabilities, higher transitional probabilities are associated with more meanings. Low probabilities provide evidence for juncture, and so words containing them are less likely to proliferate in meaning.
That both factors contribute significantly, and positively, supports the argument that the use of phonotactics for segmentation is based on the statistics of the speech stream, with low probability transitions indicating juncture.
A.7 Relative Frequency (3.3.3)

The log probability of the junctural phonotactics occurring morpheme-internally proved an extremely weak, but highly significant, predictor of the log ratio of the derived frequency over the base frequency (Spearman's rho=−.19, p<.0001). The log expected probability of the junctural phonotactics occurring across a word boundary is a weaker, but still significant, predictor of the log relative frequency of the base and the derived form (Spearman's rho=−.13, p<.005). It is interesting to note that the difference in magnitude between the strength of the inter- and intra-word probabilities here is close to that observed in section 3.3.2, in the analysis of polysemy. As was the case with polysemy, the direction of the effect of each factor is the same: co-occurrence probability is negatively correlated with the relative frequency of the base to the derived form. The lower the probability, the more frequent the base is, relative to the derived form.
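The rank correlations reported in sections A.4 through A.7 (Spearman's rho) can be computed without a statistics library: rank both variables, then take the Pearson correlation of the ranks. A minimal sketch, with no correction for ties, is:

```python
def rank(xs):
    """0-based rank of each value in ascending order (ties broken
    arbitrarily; no rank averaging)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    for r, i in enumerate(order):
        ranks[i] = float(r)
    return ranks

def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# A perfectly monotonic relation gives rho = 1, however nonlinear.
print(spearman([1, 2, 3, 4], [10, 100, 1000, 10000]))  # 1.0
```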
A.8 Segmentation and Language Design

There are a number of theoretically possible relationships that could hold between probabilities word-internally and probabilities across word boundaries. Coda-onset transitions in English appear to be arranged in such a way that a simple, strictly bottom-up heuristic will approximate a more sophisticated statistic quite closely. To understand how remarkable this is, it is worth spending some time stepping through the theoretically possible relationships which could hold between word-internal and inter-word probabilities. Consider a vastly over-simplified family of fake languages—ZED. ZED languages are characterized by uniform syllable structure—all syllables must have both an onset and a coda. Words in ZED languages may be one, two, or three syllables long, and all ZED lexicons contain equal numbers of each type. The speech stream therefore consists of a string of CVC syllables, and (all other things being equal) any given transition between syllables is equally likely to be a word-internal transition or a transition across a word boundary. Figure A.8 shows a graph containing no points. The x axis represents the probability of a transition occurring within a morpheme, and the y axis represents the probability of it occurring across a morpheme boundary. There are many ways in which a ZED language could logically populate this space. As we step through our discussion of the ZED family, we will discuss possible distributions of points across this space, and the implications of these distributions for language learning and processing. As with figure A.3, figure A.8 is divided by two lines. The diagonal line divides the graph at x=y. Any points appearing above this line would represent transitions which are under-represented word-internally. Any points appearing below the line would represent transitions which are over-represented word-internally.
The vertical line is placed (arbitrarily) at log(probability)=−8. If the decomposer posited a boundary for all cases in which the log probability word-internally was less than −8, points to the left of the line would represent transitions which were decomposed, and points to the right would represent transitions which were not. The quadrants on the graph are labeled with how such a decomposer would fare, compared to an analysis according to over- and under-representation. Points falling into the top-left and bottom-right quadrants would receive a similar analysis under either strategy. Points in the bottom-left quadrant would represent "false positives" (hypothesizing a boundary in cases in which a boundary is not likely), and points in the top-right quadrant would represent missed boundaries (transitions which were likely to be boundaries but for which no boundary was hypothesized). Given this logical space, then, how might a ZED language populate it? Let's begin by imagining a member of the ZED family, ZED-A, in which any given onset or coda is equiprobable in any position in a word. That is, while some onsets might be more common than other onsets, there is no single onset that occurs with different probabilities at different positions within a word. In ZED-A, then, inter- and intra-word probabilities are highly correlated. Transitions in ZED-A would cluster tightly along the x=y line in figure A.8. When we fit a regression line through the points in ZED-A, we expect the regression line to fall at x=y, with the residuals from the line falling in equal numbers above and below the line. In ZED-A, inter-syllable phonotactics will not help the language learner to find word boundaries, nor will they help the listener with a mature lexicon in the task of segmentation. They will not help the language learner because low probability coda-onset transitions in the speech stream are no more likely to indicate word boundaries than high probability transitions.
In terms of figure A.8, because points fall in equal numbers on each side of the x=y line, points which fall to the left of the vertical line of the graph are no more likely to indicate a boundary than points which fall to the right of the vertical line. They will also not help the listener with a mature lexicon.

Figure A.8: Predictions made by an "absolute probability" decomposer, relative to a decomposer based on over- or under-representation in the lexicon.

That is, we might imagine that the mature speaker of ZED-A, having managed the difficult task of acquiring a ZED-A lexicon despite its considerable obstacles, might then develop a phonotactic strategy to aid in parsing incoming speech. However, knowledge of the lexicon will not help in this task. There are no patterns of over- or under-representation in the lexicon of ZED-A. The patterns of coda-onset occurrence in ZED-A contain no cues to juncture. To the extent that junctural phonotactics play an important role in acquisition and processing, then, we do not expect to find languages with the properties of ZED-A. A second logical possibility can be exemplified by ZED-B. ZED-B contains absolutely no correlation between inter- and intra-word coda-onset probabilities. It scatters points liberally, and without order, across the space in figure A.8. Some transitions are highly likely both word-internally and between words. Other transitions are highly unlikely both within and between words. Some transitions are highly likely word-internally but occur very seldom across word boundaries, and other transitions occur with great frequency across word boundaries, but are illegal word-internally. Consider first the task of the infant learner of ZED-B. Because points are scattered randomly throughout figure A.8, roughly as many points fall in the two quadrants marked "correct" as in the other two quadrants. That is, hypothesizing that low probability transitions are boundaries and high probability ones are not will not have a high success rate. Coda-onset probabilities, then, will not help the lexicon-less learner of ZED-B with the task of locating word boundaries.
ZED-B does, however, contain some patterns of over- and under-representation, and so contains potential phonotactic cues to juncture for listeners who have a ZED-B lexicon to generalize over. Some transitions are much more probable word-internally, and some transitions are much more probable across word boundaries. ZED-B speakers are likely aware of these generalizations, and may use them to facilitate the segmentation of the speech stream. Also, the acquirer of ZED-B, once they have started to form a lexicon, may be able to generalize over its contents to extract generalizations which will help with the segmentation task, and so bootstrap the acquisition of further words. ZED-B, then, is not a language in which the statistics of the speech stream facilitate segmentation, but it may provide some information which can be exploited at later stages of learning. ZED-C is a third possible member of the ZED language family. In ZED-C a negative correlation holds between word-internal and inter-word coda-onset probabilities. The more likely a transition is to occur within words, the less likely it is to occur between words. At first glance, ZED-C might seem like a very sensibly designed language: most transitions carry a lot of information, being either more likely to be word-internal, or more likely to occur between words. Furthermore, examining the space in figure A.8 indicates that a negative correlation would position the points primarily in the two quadrants marked "correct." The statistics of the lexicon, then, provide an excellent cue to decomposition, and relying on word-internal probabilities alone will make the right predictions in ZED-C—the word-internal probabilities do not need to be compared with expected probabilities across words. For someone with a mature ZED-C lexicon, then, the phonotactic information carried by ZED-C syllable boundaries could potentially be exploited in a reliable and robust manner. Consider, however, the task of the ZED-C learning infant.
Because there is a strong negative correlation between word-internal and inter-word transitions, the statistics of the speech stream display no clear patterns. High probability word-internal transitions occur infrequently across words, and therefore occur in the speech stream at the same rate as low probability word-internal transitions which occur frequently across words. There are no transitions which occur with distinctively low probability in the speech stream, and so there is nothing for the infant to grasp onto to bootstrap the learning process. It is possible that ZED-C is easier to process for mature listeners than either ZED-A or ZED-B. But it has no advantage in acquisition. In terms of the adult listener, ZED-C would be an ideal language if Pitt and McQueen's (1998) Fast Phonological Preprocessor were a generalization over lexical entries, as they claim, and if it were this preprocessor which was used to segment running speech. However, some other mechanism would be required to bootstrap the initial stages of segmentation for ZED-C learning infants. ZED-D is a particularly frustrating language for the language learner. In ZED-D the correlation between word-internal and inter-word probabilities is positive, and the slope of the correlation is steep. ZED-D is similar to ZED-A, then, but would display a steeper regression line. While ZED-A would scatter points around the x=y line shown in figure A.8, ZED-D would scatter points around a steeper line. Towards the right of the graph, most points would fall above the x=y line, whereas towards the left of the graph, most points would fall below it. Examination of figure A.8, then, shows that this is exactly the case in which reliance on word-internal probability alone would make incorrect predictions about degree of juncture. Transitions which are low probability within words are even lower probability between words. And transitions
Appendix A
186
which are high probability within words occur with even higher probability between words. Unlike in ZED-A, ZED-B and ZED-C, ZED-D’s speech stream provides the learning infant with variability in the probabilities of coda-onset transitions. For the learner, then, the speech stream contains an extremely high amount of variability. In order to exploit this variability, however, the learner needs to treat it in the opposite way from that which has been documented for infant learners: high probability transitions indicate word boundaries, and low probability transitions do not. In theory, then, this language is more learnable than the previous three, because there is a cue to juncture in the phonotactics. However, while ZED-D may in theory be learnable, it is an extremely unlikely language. In Hay et al. (in press), we argued:

It would be logically possible to design a language in which phonological restrictions across word boundaries were strong and pervasive. But human language is not like this. Because words have arbitrary meanings, and coherence of meaning dominates how words are combined, phoneme transitions across word boundaries are closer to being statistically independent than transitions within morphemes.

That is, across words, coda-onset probabilities are constrained only by the independent probabilities of the two parts. Within words, there may be some additional constraint on the transition itself. ZED-D, then, does provide a cue to juncture in its phonotactics, but the cue is an abnormal one, and we should be surprised to find an existing language which displayed this characteristic.

By this stage the reader may be wondering whether there is any possible member of the ZED language family which is both learnable and likely. It may, then, come as some relief to learn that the next language, ZED-E, is much better designed. In ZED-E, most codas and onsets are equiprobable at word-edges.
Because the rates of occurrence of specific segments at word-edges are equal, and because words combine at chance, there is very little variation in the inter-word probabilities in ZED-E. All transitions occur across word boundaries with roughly equal probability. However, ZED-E does display variation in the types of codas and onsets which can occur together word-internally. Some coda-onset transitions are well-attested morpheme-internally, and some are highly unlikely. ZED-E would place points in a narrow, horizontal band across the space in figure A.8. The band would span the entire x axis, but display very little variation along the y axis. Because both the x and y axes represent probability spaces, the horizontal band must fall roughly around the value of y that is equivalent to the mean value of x. High probability transitions, then, will be more frequent within words than between words, but low probability transitions will be less frequent within words than between words. In ZED-E, the predictions made by a decomposer based on over- or under-representation in the lexicon, on word-internal statistics in the lexicon, and on the statistics of the speech stream would be very similar. Because there is almost no variation in inter-word probabilities, the degree of over- or under-representation in the lexicon can be well indexed by word-internal statistics alone. And, likewise, the variation in the statistics of the speech stream will be carried by the word-internal statistics.
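This convergence can be illustrated with a toy calculation (all numbers are assumed, not drawn from the text): if every transition has the same inter-word probability, the raw speech-stream probability of a transition is a monotone function of its word-internal probability, so the different predictors rank transitions identically.

```python
# Toy numeric check (assumed numbers): when inter-word probabilities are
# constant, the stream statistics of a transition are driven by its
# word-internal probability alone.
# stream_prob = p_boundary * inter_word + (1 - p_boundary) * intra_word
p_boundary = 0.3    # assumed fraction of transitions straddling a word boundary
inter_word = 0.05   # ZED-E: the same for every transition

def stream_prob(intra_word):
    return p_boundary * inter_word + (1 - p_boundary) * intra_word

# Three transitions differing only in word-internal probability:
intra = {"t1": 0.01, "t2": 0.05, "t3": 0.20}
stream = {k: stream_prob(v) for k, v in intra.items()}

# The orderings agree, so a learner thresholding raw stream probabilities
# recovers the same juncture cue as one using lexicon-internal statistics.
assert sorted(intra, key=intra.get) == sorted(stream, key=stream.get)
```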
ZED-E is the first ZED language we have encountered in which the organization of the phonotactics facilitates learning. A ZED-E learning infant can use distributional information to bootstrap the learning task, well before starting to acquire a lexicon or the concept of reference. Low probability transitions in the speech stream are likely to represent word boundaries; high probability transitions are not. ZED-E is a well-designed language.

Readers may by now have recognized that we encountered a system with ZED-E-type properties earlier in this appendix. English nasal-obstruent clusters are organized in just this way. Refer back to figure A.1, from Hay et al. (in press). The filled points on this graph represent the relationship between word-internal and inter-word probabilities in English. They fall in a narrow band, just like in ZED-E, and it was this fact which enabled a reanalysis of Hay et al.’s data based on an absolute probability threshold, as discussed earlier in this appendix.

Since ZED-E contains a number of design characteristics which would appear optimal for language learning and processing, it is worth shifting slightly away from the idealized abstractions of ZED-E, and examining the consequences of endowing it with some more language-like properties. Recall that all codas and onsets are roughly equiprobable in ZED-E. Human languages, however, are likely to have some phonemes which are more common than others. And if the language contains clusters as syllable onsets or codas, these are unlikely to all occur equiprobably, and with the same probability as simple onsets and codas.

Let’s examine three dialects of ZED-E, then, which display some variation in the probability of occurrence of individual segments. In all three dialects, onsets are either simple consonants or consonant clusters. And in all three dialects consonant cluster onsets occur less frequently than simple consonant onsets.
Consider how coda-onset transitions in these dialects will be arranged in figure A.8. Because consonant cluster onsets occur less frequently overall than simple onsets, coda-onset transitions involving cluster onsets will be less frequent than coda-onset transitions involving simple onsets. Transitions involving onset clusters, then, are unlikely both within and across words, and so will be positioned toward the lower left hand corner of the graph. Assume, then, that in the ZED-E dialects we are considering, all transitions involving clusters fall to the left of the vertical line on the graph. The position of these transitions with respect to the diagonal x=y line is what sets the three dialects apart.

In ZED-E1, consonant cluster onsets occur with high probability word-internally, but are unlikely word-initially. It follows, then, that coda-onset transitions which involve consonant cluster onsets are much more likely to occur within words than between words. These transitions, then, all fall to the left of the graph (because consonant clusters in general are infrequent), and below the diagonal x=y line (because they are more frequent within words than across words). In ZED-E1, then, the undesirable portion of figure A.8 labeled “false” will be highly populated. This configuration is not helpful to the language learner. Hypothesizing that low probability transitions in the speech stream represent boundaries will not lead to a high level of success, because of the large number of transitions in ZED-E1 which are low probability, and yet characterize word-internal phonology. ZED-E1, then, is not optimal.

In a second dialect—ZED-E2—consonant cluster onsets, while infrequent, occur with roughly equal probability word-internally and word-initially. Transitions involving consonant clusters, then, tend to fall to the left of the vertical line in figure A.8, but fall in equal numbers above and below the diagonal x=y line. Such an organization will lead a
learner relying on the statistics of the speech stream to hypothesize wrongly in a substantial proportion of cases involving consonant clusters. This is better than ZED-E1, but still not optimal.

ZED-E3 displays the most learnable configuration of the ZED-E dialects. In ZED-E3, like the previous two dialects, consonant cluster onsets are rarer than simple onsets. In ZED-E3, however, consonant clusters are much more likely to occur word-initially than word-internally. Transitions involving consonant clusters, then, tend to fall to the left of the vertical line in figure A.8, but above the diagonal x=y line. The language learner will not run into trouble when encountering transitions involving clusters in ZED-E3. Such transitions will be fairly low probability, so the learner will posit a boundary, and will be right.

Because ZED-E3 is so well configured, we expect to find many human languages which display this property—rare phonemes, codas, or onsets should be over-represented at word edges. Overall, ZED-E3-type languages will display a weak positive correlation between word-internal and inter-word probabilities. The variation in inter-word probabilities will be less than the variation in intra-word probabilities. A regression line fit through the points of a ZED-E3 language on figure A.8, then, should display a shallow positive slope. The line will start, on the left of the graph, above the x=y line, and display a shallow slope, so that it ends, on the right of the graph, well below the x=y line. If we assume that the idealized ZED-E, which displayed no variation in the probabilities of individual units, is unlikely (at least in a system which includes more variability than the restricted set of nasal-obstruent transitions), then ZED-E3 displays the optimal design characteristics for language.
It is the only configuration which allows some variability in expected value, is segmentable bottom-up from the speech stream, and contains tighter restrictions word-internally than across words. Inter-syllable transitions in English are configured like ZED-E3. Figure A.9 repeats figure A.3, with a solid line indicating the regression line fit through the data. When we compare inter- and intra-word probabilities for transitions found across the morpheme boundaries of 515 prefixed words (all affixed with consonant-final prefixes), the correlation is weak but positive. Our journey through various
Figure A.9: Intra- and inter-word cooccurrence probabilities for transitions occurring in 515 prefixed words. Repeats figure A.3, with regression line fit through the data.

possible configurations of this space, then, has revealed that the configuration displayed by English may reflect an optimal configuration for language learning and processing.

Aslin et al. (1998), arguing for the importance of conditional probabilities in speech segmentation, write:

…in language as in many other patterned domains, relative frequency (even complex frequency, such as the frequency of co-occurrence of pairs or triples of items) is not the best indicator of structure. Instead, significant structure is typically most sharply revealed by the statistical predictiveness among items (i.e. frequency of co-occurrence normalized for frequency of the individual components). (Aslin et al. 1998:323)

The results reported above, together with the reanalysis of Hay et al.’s (in press) results, suggest that, at least in the context of coda-onset transitions in English, such normalization may not be necessary. The degree to which this holds true of other types of cooccurrence patterns in English, and in other languages, remains to be seen. As noted earlier, probabilistic phonotactics are far from the only cue which can be used to segment speech, and we may well expect that different languages exploit various cues to different degrees. We therefore do not necessarily expect the phonotactics of all
languages to reflect the design characteristics of ZED-E3. We should, however, predict that languages not designed in this way carry other types of information which facilitate the segmentation process. And to the extent that the use of probabilistic phonotactics is a valuable and widely exploited strategy for language acquisition, we predict that a disproportionately high number of languages might display a weak positive correlation between inter- and intra-word probabilities. The degree to which this is the case remains an empirical, testable question.
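That prediction can be checked, for any language with suitable lexical statistics, by fitting a simple regression. The sketch below uses made-up (intra-word, inter-word) probability pairs, not data from this study, and tests for the ZED-E3 signature: a weak positive slope, with less variance along the inter-word axis.

```python
# A sketch (with hypothetical data points) of the diagnostic suggested
# above: regress inter-word on intra-word probability and check for a
# weak positive slope, with less spread on the inter-word axis.
from statistics import mean, pvariance

# Hypothetical (intra-word, inter-word) probabilities for coda-onset transitions.
points = [(0.02, 0.04), (0.05, 0.05), (0.10, 0.06), (0.20, 0.07), (0.40, 0.10)]
xs, ys = zip(*points)

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

slope = ols_slope(xs, ys)
# ZED-E3-like: weakly positive slope (0 < slope < 1), and inter-word
# probabilities varying much less than intra-word ones.
print(round(slope, 3), pvariance(xs) > pvariance(ys))
```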
References
Alegre, M. and Gordon, P.: 1999a, Frequency effects and the representational status of regular inflections, Journal of Memory and Language 40, 41–61. Alegre, M. and Gordon, P.: 1999b, Rule-based versus associative processes in derivational morphology, Brain and Language 68, 347–354. Allen, J. and Christiansen, M.: 1996, Integrating multiple cues in word segmentation: A connectionist model using hints, Proceedings of the 18th annual Cognitive Science Society conference, Lawrence Erlbaum Associates Inc., Mahwah, NJ, pp. 370–375. Anshen, F. and Aronoff, M.: 1981, Morphological Productivity and Transparency, Canadian Journal of Linguistics 26 (1), 63–72. Anshen, F. and Aronoff, M.: 1988, Producing morphologically complex words, Linguistics pp. 641–655. Anshen, F. and Aronoff, M.: 1999, Using dictionaries to study the mental lexicon, Brain and Language 68, 16–26. Aronoff, M.: 1976, Word Formation in Generative Grammar, MIT Press, Cambridge, MA. Aronoff, M.: 1980, The relevance of productivity in a synchronic description of word formation, in J.Fisiak (ed.), Historical Morphology, Mouton, The Hague, pp. 71–82. Aronoff, M. and Sridhar, S.: 1983, Morphological levels in English and Kannada or Atarizing Reagan, Papers from the Parasession on the Interplay of Phonology, Morphology and Syntax, Chicago Linguistic Society, Chicago, pp. 3–16. Aslin, R.N., Saffran, J.R., and Newport, E.L.: 1998, Computation of conditional probability statistics by 8-month-old infants, Psychological Science 9, 321–324. Baayen, R.H.: 1990, Corpusgebaseerd onderzoek naar morfologische produktiviteit, Spektator 19, 213–217. Baayen, R.H.: 1992, Quantitative Aspects of Morphological Productivity, in G.Booij and J.van Marle (eds), Yearbook of Morphology 1991, Kluwer Academic Publishers, Dordrecht, pp. 109–150. Baayen, R.H.: 1993, On frequency, transparency and productivity, in G.Booij and J.van Marle (eds), Yearbook of Morphology 1992, Kluwer Academic Publishers, pp. 181–208.
Baayen, R.H.: 1994, Productivity in Language Production, Language and Cognitive Processes 9 (3), 447–469. Baayen, R.H.: 2002, Review of Causes and Consequences of Word Structure, Glot International 6 (2), 58–64. Baayen, R.H. and Lieber, R.: 1991, Productivity and English Derivation: A Corpus Based Study, Linguistics 29, 801–843. Baayen, R.H. and Lieber, R.: 1997, Word Frequency Distributions and Lexical Semantics, Computers and the Humanities 30, 281–291. Baayen, R.H. and Renouf, A.: 1996, Chronicling the Times: productive innovations in an English newspaper, Language 72 (1), 69–96.
Baayen, R.H. and Schreuder, R.: 1999, War and Peace: Morphemes and Full Forms in a Noninteractive Activation Parallel Dual-Route Model, Brain and Language 68, 27–32. Baayen, R.H., Lieber, R. and Schreuder, R.: 1997, The morphological complexity of simplex nouns, Linguistics 35, 861–877. Baayen, R.H., Piepenbrock, R. and Gulikers, L.: 1995, The CELEX lexical database (release 2) CD-ROM, Philadelphia, PA: Linguistic Data Consortium, University of Pennsylvania (Distributor). Bagley, W.A.: 1900, The apperception of the spoken sentence: a study in the psychology of language, American Journal of Psychology 12, 80–130. Balota, D. and Chumbley, J.: 1984, Are lexical decisions good measures of lexical access? The role of word frequency in the neglected decision stage, Journal of Experimental Psychology: Human Perception and Performance 10, 340–357. Bates, E. and MacWhinney, B.: 1987, Competition, variation and language learning, in B.MacWhinney (ed.), Mechanisms of Language Acquisition, Erlbaum, Hillsdale, New Jersey, pp. 173–218. Bauer, L.: 1988, Introducing Linguistic Morphology, Edinburgh University Press, Edinburgh. Bauer, L.: 1992, Scalar productivity and -lily adverbs, in G.Booij and J.van Marle (eds), Yearbook of Morphology 1991, Kluwer Academic Publishers, Dordrecht, pp. 185–191. Bauer, L.: 1995, Is morphological productivity non-linguistic?, Acta Linguistica Hungarica 43 (1–2), 19–34. Beauvillain, C. and Segui, J.: 1992, Representation and Processing of Morphological Information, in R.Frost and L.Katz (eds), Orthography, Phonology, Morphology, and Meaning, Elsevier Science Publishers, B.V., chapter 19, pp. 377–388. Behaghel, O.: 1916, Geschichte der deutschen Sprache, Trübner, Strassburg. Berkley, D.: 1994, Variability in Obligatory Contour Principle effects, Papers from the 30th Regional Meeting of the Chicago Linguistic Society, Part 1, pp. 1–12. Bertram, R., Laine, M., Baayen, R.H., Schreuder, R.
and Hyöna, J.: 2000, Affixal Homonymy triggers full-form storage, even with inflected words, even in a morphologically rich language, Cognition 74, B13-B25. Bolinger, D.L.: 1961, Contrastive accent and contrastive stress, Language 37 (1), 83–96. Boogaart, P.C. Uit den: 1975, Woordfrequenties in geschreven en gesproken Nederlands, Oosthoek, Utrecht. Bradley, D.: 1979, Lexical representation of derivational relations, in M.Aronoff and M.Kean (eds), Juncture, MIT Press, Cambridge, MA., pp. 37–55. Brent, M.R. and Cartwright, T.A.: 1996, Distributional regularity and phonotactic constraints are useful for segmentation, Cognition 61, 93–125. Burani, C. and Caramazza, A.: 1987, Representation and processing of derived words, Language and Cognitive Processes 2, 217–227. Burani, C. and Laudanna, A.: 1992, Units of Representation for Derived Words in the Lexicon, in R.Frost and L.Katz (eds), Orthography, Phonology, Morphology, and Meaning, Elsevier Science Publishers, B.V., chapter 18, pp. 361–376. Burzio, L.: 1994, Principles of English Stress, Cambridge University Press, Cambridge. Butterworth, B.: 1983, Lexical representation, in B.Butterworth (ed.), Language Production, Vol. 2, Academic Press, London, pp. 257–294. Bybee, J.: 1985, Morphology: A study of the relation between meaning and form, John Benjamins Publishing Company, Amsterdam. Bybee, J.: 1988, Morphology as Lexical organization, in M.Hammond and M.Noonan (eds), Theoretical morphology: approaches in modern linguistics, Academic Press, Inc., San Diego, pp. 119–142. Bybee, J.: 1995a, Diachronic and Typological Properties of Morphology and their Implications for Representation, in L.B.Feldman (ed.), Morphological Aspects of Language Processing, Lawrence Erlbaum Associates, Hillsdale, New Jersey, pp. 225–246.
Bybee, J.: 1995b, Regular Morphology and the Lexicon, Language and Cognitive Processes 10 (5), 425–455. Bybee, J. and Newman, J.E.: 1995, Are stem changes as natural as affixes?, Linguistics pp. 633–654. Bybee, J. and Pardo, E.: 1981, On lexical and morphological conditioning of alternations: a nonce-probe experiment with Spanish verbs, Linguistics 19, 931. Cairns, P., Shillcock, R., Chater, N. and Levy, J.: 1997, Bootstrapping word boundaries: A bottom-up corpus-based approach to speech segmentation, Cognitive Psychology 33, 111–153. Caramazza, A., Laudanna, A., and Romani, C.: 1988, Lexical access and inflectional morphology, Cognition 28, 297–332. Chialant, D. and Caramazza, A.: 1995, Where is Morphology and How is it Processed?, in L.B.Feldman (ed.), Morphological Aspects of Language Processing, Lawrence Erlbaum Associates, Hillsdale, New Jersey, pp. 55–78. Christiansen, M.H., Allen, J. and Seidenberg, M.S.: 1998, Learning to segment speech using multiple cues: A connectionist model, Language and Cognitive Processes 13, 221–268. Cleveland, W.S.: 1979, Robust locally weighted regression and smoothing scatterplots, Journal of the American Statistical Association 74, 829–36. Cole, P., Beauvillain, C., and Segui, J.: 1989, On the Representation and Processing of Prefixed and Suffixed Derived Words: A Differential Frequency Effect, Journal of Memory and Language 28, 1–13. Coleman, J.: 1996, The psychological reality of language-specific constraints, Paper presented at the Fourth Phonology Meeting, University of Manchester, May 1996. Coleman, J. and Pierrehumbert, J.: 1997, Stochastic Phonological Grammars and Acceptability, Computational Phonology: Proceedings of the Workshop, 12 July 1997, Association for Computational Linguistics, Somerset, NJ, pp. 49–56. Connine, C.M., Titone, D., and Wang, J.: 1993, Auditory word recognition: extrinsic and intrinsic effects of word frequency, Journal of Experimental Psychology: Learning, Memory and Cognition 19 (1), 81–94.
Cooper, W. and Paccia-Cooper, J.: 1980, Syntax and Speech, Harvard University Press, Cambridge, Mass. Cutler, A.: 1980, Productivity in word formation, in J.Kreiman and A.E.Ojeda (eds), Papers from the 16th regional meeting of the Chicago Linguistic Society, pp. 45–51. Cutler, A.: 1981, Degrees of Transparency in Word Formation, Canadian Journal of Linguistics 26 (1), 73–77. Cutler, A.: 1990, Exploiting prosodic probabilities in speech segmentation, in G.Altmann (ed.), Cognitive models of speech processing: Psycholinguistic and computational perspectives, MIT Press, Cambridge, pp. 105–121. Cutler, A.: 1994, Segmentation problems, rhythmic solutions, Lingua 92, 81–104. Cutler, A. and Butterfield, S.: 1992, Rhythmic Cues to Speech Segmentation: Evidence from Juncture Misperception, Journal of Memory and Language 31, 218–236. Cutler, A. and Carter, D.: 1987, The predominance of strong initial syllables in the English vocabulary, Computer Speech and Language 2, 133–142. Cutler, A. and Norris, D.: 1988, The role of strong syllables in segmentation for lexical access, Journal of Experimental Psychology: Human Perception and Performance 14, 113–121. Cutler, A., Hawkins, J.A., and Gilligan, G.: 1985, The suffixing preference: a processing explanation, Linguistics 23, 723–758. Dahan, D. and Brent, M.R.: 1999, On the discovery of novel word-like units from utterances: An artificial-language study with implications for native-language acquisition, Journal of Experimental Psychology: General 128, 165–185. Dell, G.: 1990, Effects of frequency and vocabulary type on phonological speech errors, Language and Cognitive Processes 5 (4), 313–349.
Dressler, W.U.: 1997, On productivity and potentiality in inflectional morphology, CLAS-NET Working Papers 7, 2–22. Elman, J. and McClelland, J.: 1988, Cognitive penetration of the mechanisms of perception: compensation for coarticulation of lexically restored phonemes, Journal of Memory and Language 27, 143–165. Elman, J.L.: 1990, Finding structure in time, Cognitive Science 14, 179–211. Fabb, N.: 1988, English suffixation is constrained only by selectional restrictions, Natural Language and Linguistic Theory 6, 527–539. Feldman, L.B.: 1994, Beyond Orthography and Phonology: Differences between Inflections and Derivations, Journal of Memory and Language 33, 442–470. Feldman, L.B. and Soltano, E.G.: 1999, Morphological Priming: The Role of Prime Duration, Semantic Transparency, and Affix Position, Brain and Language 68, 33–39. Fowler, C.A., Napps, S.E., and Feldman, L.B.: 1985, Relations among regular and irregular morphologically related words in the lexicon as revealed by repetition priming, Memory and Cognition 13, 241–255. Francis, W.N. and Kucera, H.: 1982, Frequency Analysis of English Usage, Lexicon and Grammar, Houghton Mifflin Co., Boston. Frauenfelder, U.H. and Schreuder, R.: 1992, Constraining Psycholinguistic Models of Morphological Processing and Representation: The Role of Productivity, in G.Booij and J.van Marle (eds), Yearbook of Morphology 1991, Kluwer Academic Publishers, Dordrecht, pp. 165–185. Frisch, S.: 1996, Similarity and Frequency in Phonology, PhD thesis, Northwestern University. Frisch, S.: 2000, Temporally organized lexical representations as phonological units, in M.Broe and J.Pierrehumbert (eds), Papers in Laboratory Phonology V: Acquisition and the lexicon, Cambridge University Press, Cambridge, pp. 283–298. Frisch, S., Large, N. and Pisoni, D.B.: 2000, Perception of wordlikeness: Effects of segment probability and length on the processing of non-words, Journal of Memory and Language 42, 481–496.
Giegerich, H.J.: 1999, Lexical Strata in English: morphological causes, phonological effects, Cambridge University Press, Cambridge, UK. Gleitman, L.R., Gleitman, H., Landau, B. and Wanner, E.: 1988, Where learning begins: Initial representations for language learning, in F.Newmeyer (ed.), Linguistics: The Cambridge Survey, Vol. 3, Cambridge University Press, Cambridge, U.K., pp. 150–193. Goldsmith, J.: 1990, Autosegmental and Metrical Phonology, Basil Blackwell Ltd., Oxford, U.K. Gordon, P. and Alegre, M.: 1999, Is There a Dual System for Regular Inflections?, Brain and Language 68, 212–217. Greenberg, J.: 1963, Some universals of grammar with particular reference to the order of meaningful elements, in J.Greenberg (ed.), Universals of Language, MIT Press, Cambridge, Mass. Greenberg, J.H.: 1978, Some generalizations regarding initial and final consonant clusters, in J.H.Greenberg, C.A.Ferguson and E.A.Moravcsik (eds), Universals of Human Language Volume II: Phonology, Stanford University Press, Stanford, CA, pp. 243–280. Grosjean, F.: 1980, Spoken word recognition processes and the gating paradigm, Perception and Psychophysics 45, 189–195. Grossberg, S.: 1986, The adaptive self-organization of serial order in behavior: Speech, language, and motor control, in E.Schwab and H.C.Nusbaum (eds), Pattern recognition by humans and machines: Vol 1. Speech perception, Academic Press, New York, pp. 187–294. Harwood, F. and Wright, A.M.: 1956, Statistical study of English word formation, Language 32 (2), 260–273. Hay, J.: 2001, Lexical frequency in morphology: Is everything relative?, Linguistics 39 (6), 1041–1070.
Hay, J.: 2002, From speech perception to morphology: Affix-ordering revisited, Language 78 (3), 527–555. Hay, J. and Baayen, R.: 2002, Parsing and productivity, in G.Booij and J.van Marle (eds), Yearbook of Morphology, 2001, Kluwer Academic Publishers, Dordrecht, pp. 203–235. Hay, J. and Baayen, R.H.: to appear, Phonotactics, parsing and productivity, Italian Journal of Linguistics. Hay, J. and Plag, I.: to appear, What constrains possible suffix combinations? On the interaction of grammatical and processing restrictions in derivational morphology, Natural Language and Linguistic Theory. Hay, J., Jannedy, S., and Mendoza-Denton, N.: 1999, Oprah and /ay/: Lexical frequency, referee design and style, Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco, pp. 1389–1392. Hay, J., Pierrehumbert, J. and Beckman, M.: in press, Speech perception, wellformedness, and the statistics of the lexicon, in J.Local, R.Ogden and R.Temple (eds), Papers in Laboratory Phonology VI: Phonetic Interpretation, Cambridge University Press, Cambridge. Hay, J., Pierrehumbert, J., Beckman, M. and Faust West, J.: 1999, Lexical frequency effects in speech errors, Manuscript of paper presented at the 73rd Annual Meeting of the Linguistic Society of America, LA, January 1999. Johnson, K.: 1997a, The auditory/perceptual basis for speech segmentation, in K.Ainsworth-Darnell and M.D’Imperio (eds), Ohio State University Working Papers in Linguistics: Papers from the Linguistics Laboratory, Vol. 50, Ohio State University, pp. 101–113. Johnson, K.: 1997b, Speech perception without speaker normalization, in K.Johnson and J.Mullenix (eds), Talker variability in speech processing, Academic Press, San Diego, pp. 145–165. Jusczyk, P.W., Cutler, A. and Redanz, N.: 1993, Preference for the predominant stress patterns of English words, Child Development 64, 675–687.
Jusczyk, P.W., Houston, D.M., and Newsome, M.: 1999, The beginnings of word segmentation in English-learning infants, Cognitive Psychology 39, 159–207. Jusczyk, P.W., Luce, P.A. and Charles-Luce, J.: 1994, Infants’ sensitivity to phonotactic patterns in the native language, Journal of Memory and Language 33, 630–645. Kempley, S. and Morton, J.: 1982, The effects of priming with regularly and irregularly related words in auditory word recognition, British Journal of Psychology 73, 441–454. Kenstowicz, M.: 1994, Phonology in Generative Grammar, Oxford: Blackwell Publications. Kiparsky, P.: 1982, Lexical Morphology and Phonology, in I.S.Yang (ed.), Linguistics in the Morning Calm, Hanshin, Seoul, pp. 1–92. Kiparsky, P.: 1983, Word formation and the lexicon, in F.Ingemann (ed.), Proceedings of the 1982 Mid-America Linguistics Conference, University of Kansas, pp. 3–22. Laudanna, A. and Burani, C.: 1985, Address mechanisms to decomposed lexical entries, Linguistics 23, 775–792. Laudanna, A., Burani, C. and Cermele, A.: 1994, Prefixes as processing units, Language and Cognitive Processes 9 (3), 295–316. Lehiste, I.: 1972, The timing of utterances and linguistic boundaries, Journal of the Acoustical Society of America 51, 2018–2024. Levelt, W.: 1983, Monitoring and self-repair in speech, Cognition 14, 41–104. Losiewicz, B.L.: 1992, The effect of frequency on linguistic morphology, PhD thesis, University of Texas at Austin. Luce, P.A. and Pisoni, D.: 1998, Recognizing spoken words: The neighborhood activation model, Ear and Hearing 19, 1–36. Marslen-Wilson, W. and Welsh, A.: 1978, Processing interactions and lexical access during word recognition in continuous speech, Cognitive Psychology 10, 29–63. Marslen-Wilson, W. and Zhou, X.: 1999, Abstractness, Allomorphy, and Lexical Architecture, Language and Cognitive Processes 14 (4), 321–352.
Marslen-Wilson, W., Ford, M., and Zhou, X.: 1997, The combinatorial lexicon: Priming derivational affixes, Proceedings of the 18th Annual Conference of the Cognitive Science Society, Erlbaum, Mahwah, NJ. Marslen-Wilson, W., Hare, M., and Older, L.: 1993, Inflectional morphology and phonological regularity in the English mental lexicon, Proceedings of the 15th Annual Conference of the Cognitive Science Society, Erlbaum, Princeton, NJ. Marslen-Wilson, W., Tyler, L.K., Waksler, R., and Older, L.: 1994, Morphology and Meaning in the English Mental Lexicon, Psychological Review 101 (1), 3–33. Marslen-Wilson, W., Zhou, X. and Ford, M.: 1997, Morphology, modality, and lexical architecture, in G.Booij and J.van Marle (eds), Yearbook of Morphology 1996, Kluwer Academic Publishers, Dordrecht, pp. 117–134. Mast-Finn, J.: 1999, Triggering the speech mode of perception: Does word frequency matter?, Proceedings of the 1999 National Conference on Undergraduate Research. Mattys, S.L., Jusczyk, P.W., Luce, P.A. and Morgan, J.L.: 1999, Word segmentation in infants: How phonotactics and prosody combine, Cognitive Psychology 38, 465–494. McClelland, J. and Elman, J.: 1986, The TRACE model of speech perception, Cognitive Psychology 18, 1–86. McQueen, J. and Cutler, A.: 1998, Morphology in word recognition, in A.Spencer and A.Zwicky (eds), The Handbook of morphology, Blackwell, Oxford, pp. 406–427. McQueen, J.M.: 1998, Segmentation of Continuous Speech using Phonotactics, Journal of Memory and Language 39, 21–46. McQueen, J.M. and Cox, E.: 1995, The use of phonotactic constraints in the segmentation of Dutch, EUROSPEECH’95, ESCA, Madrid, pp. 1707–1709. McQueen, J.M., Norris, D. and Cutler, A.: 1994, Competition in spoken word recognition: Spotting words in other words, Journal of Experimental Psychology: Learning, Memory and Cognition 20, 621–638. Meunier, F.
and Segui, J.: 1999a, Frequency Effects in Auditory Word Recognition: The Case of Suffixed Words, Journal of Memory and Language 41, 327–344. Meunier, F. and Segui, J.: 1999b, Morphological Priming Effect: The Role of Surface Frequency, Brain and Language 68, 54–60. Moder, C.: 1992, Productivity and categorization in morphological classes, PhD thesis, SUNY at Buffalo, NY. Mohanan, K.: 1986, The Theory of Lexical Phonology, D.Reidel Publishing Company, Dordrecht, Holland. Napps, S.E.: 1989, Morphemic relationships in the lexicon: Are they distinct from semantic and formal relationships?, Memory and Cognition 17 (6), 729–739. Norris, D.: 1990, A dynamic-net model of human speech recognition, in G.Altmann (ed.), Cognitive Models of Speech Processing: Psycholinguistic and Computational Perspectives, MIT Press, Cambridge, MA, pp. 87–105. Norris, D.: 1992, Connectionism: A new class of bottom up model, in R.G.Reilly and N.E.Sharkey (eds), Connectionist Approaches to Language Processing, Lawrence Erlbaum Associates, Hove, Sussex, pp. 351–371. Norris, D.: 1993, Bottom-up connectionist models of interaction, in G.Altmann and R.Shillcock (eds), Connectionist Models of Speech Processing: The Sperlonga Meeting II, Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 211–234. Norris, D.: 1994, Shortlist: A connectionist model of continuous speech recognition, Cognition 52, 189–234. Norris, D., McQueen, J. and Cutler, A.: 1995, Competition and segmentation in spoken-word recognition, Journal of Experimental Psychology: Learning, Memory and Cognition 21, 1209–1228. Norris, D., McQueen, J.M., and Cutler, A.: 2000, Merging information in speech recognition: Feedback is never necessary, Behavioral and Brain Sciences 23 (3), 299–325.
References
Norris, D., McQueen, J.M., Cutler, A. and Butterfield, S.: 1997, The Possible-Word Constraint in the Segmentation of Continuous Speech, Cognitive Psychology 34, 191–243.
Otake, T., Yoneyama, K., Cutler, A. and van der Lugt, A.: 1996, The representation of Japanese moraic nasals, Journal of the Acoustical Society of America 100, 3831–3842.
Pagliuca, W.: 1976, Pre-fixing, Unpublished Manuscript, SUNY/Buffalo.
Paivio, A., Yuille, J.C. and Madigan, S.: 1968, Concreteness, imagery, and meaningfulness values for 925 nouns, Journal of Experimental Psychology Monograph 76.
Pesetsky, D.: 1985, Morphology and logical form, Linguistic Inquiry 16, 193–246.
Pierrehumbert, J.: 1994, Syllable structure and word structure: a study of triconsonantal clusters in English, in P.A.Keating (ed.), Phonological Structure and Phonetic Form. Papers in Laboratory Phonology 3, Cambridge University Press, Cambridge, pp. 168–190.
Pierrehumbert, J.: 1995, Prosodic effects on glottal allophones, in O.Fujimura (ed.), Vocal Fold Physiology 8: Voice Quality Control (Proceedings of the VIIIth Vocal Fold Physiology Conference), Singular Press, San Diego, pp. 39–60.
Pierrehumbert, J.: 2001, Why phonological constraints are so coarse-grained, in J.McQueen and A.Cutler (eds), SWAP special issue, Language and Cognitive Processes 16 (5/6), pp. 691–698.
Pinker, S. and Prince, A.: 1988, On language and connectionism: Analysis of a parallel distributed processing model of language acquisition, Cognition 28, 73–193.
Pitt, M.A. and McQueen, J.M.: 1998, Is compensation for coarticulation mediated by the lexicon?, Journal of Memory and Language 39, 347–370.
Plag, I.: 1996, Selectional restrictions in English suffixation revisited: a reply to Fabb (1988), Linguistics, pp. 769–798.
Plag, I.: 1999, Morphological Productivity: Structural Constraints in English Derivation, Mouton de Gruyter, Berlin/New York.
Plag, I.: 2002, The role of selectional restrictions, phonotactics and parsing in constraining suffix ordering in English, in G.Booij and J.van Marle (eds), Yearbook of Morphology 2001, Kluwer Academic Publishers, Dordrecht, pp. 285–314.
Plag, I., Dalton-Puffer, C. and Baayen, R.H.: 1999, Morphological productivity across speech and writing, English Language and Linguistics 3 (2), 209–228.
Plunkett, K. and Elman, J.L.: 1997, Exercises in Rethinking Innateness: A Handbook for Connectionist Simulations, MIT Press, Cambridge, MA.
Prasada, S. and Pinker, S.: 1993, Generalisation of regular and irregular morphological patterns, Language and Cognitive Processes 8, 1–56.
Raffelsiefen, R.: 1999, Phonological constraints on English word formation, in G.Booij and J.van Marle (eds), Yearbook of Morphology 1998, Kluwer Academic Publishers, Dordrecht, pp. 225–287.
Raveh, M. and Rueckl, J.G.: 2000, Equivalent Effects of Inflected and Derived Primes: Long-Term Morphological Priming in Fragment Completion and Lexical Decision, Journal of Memory and Language 42, 103–119.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J.: 1986, Learning internal representations by error propagation, in D.E.Rumelhart and J.L.McClelland (eds), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1, MIT Press, Cambridge, pp. 318–362.
Saffran, J.R., Newport, E.L. and Aslin, R.N.: 1996, Word Segmentation: The Role of Distributional Cues, Journal of Memory and Language 35, 606–621.
Saffran, J.R., Aslin, R.N. and Newport, E.L.: 1996, Statistical learning by 8-month-old infants, Science 274, 1926–1928.
Schreuder, R. and Baayen, R.H.: 1994, Prefix stripping re-revisited, Journal of Memory and Language 33, 357–375.
Schreuder, R. and Baayen, R.H.: 1995, Modeling Morphological Processing, in L.B.Feldman (ed.), Morphological Aspects of Language Processing, Lawrence Erlbaum Associates, Hillsdale, New Jersey, pp. 131–156.
Schultink, H.: 1961, Produktiviteit als morfologisch fenomeen, Forum der Letteren 2, 110–125.
Segui, J. and Zubizarreta, M.-L.: 1985, Mental representation of morphologically complex words and lexical access, Linguistics 23, 759–774.
Selkirk, E.O.: 1982, The syntax of words, number 7 in Linguistic Inquiry Monograph Series, MIT Press, Cambridge, Mass.
Sereno, J. and Jongman, A.: 1997, Processing of English inflectional morphology, Memory and Cognition 25, 425–437.
Siegel, D.: 1979, Topics in English Morphology, Garland, New York.
Sonnenstuhl, I., Eisenbeiss, S. and Clahsen, H.: 1999, Morphological priming in the German mental lexicon, Cognition 72, 203–236.
Spencer, A.: 1991, Morphological Theory: An Introduction to Word Structure in Generative Grammar, Basil Blackwell, Oxford.
Sproat, R.: 1985, On deriving the lexicon, PhD thesis, MIT.
Stanners, R.F., Neiser, J.J., Hernon, W.P. and Hall, R.: 1979, Memory representations for morphologically related words, Journal of Verbal Learning and Verbal Behavior 18, 399–412.
Stemberger, J. and MacWhinney, B.: 1986, Frequency and the storage of regularly inflected forms, Memory and Cognition 14, 17–26.
Stemberger, J.P. and MacWhinney, B.: 1988, Are inflected forms stored in the lexicon?, in M.Hammond and M.Noonan (eds), Theoretical Morphology: Approaches in Modern Linguistics, Academic Press, San Diego, California, pp. 101–116.
Strauss, S.: 1982, On ‘relatedness paradoxes’ and related paradoxes, Linguistic Inquiry 13, 695–700.
Suomi, K., McQueen, J.M. and Cutler, A.: 1997, Vowel harmony and speech segmentation in Finnish, Journal of Memory and Language 36, 422–444.
Szpyra, J.: 1989, The Phonology-Morphology Interface: Cycles, levels and words, Routledge, London.
Index
access node, 83, 84
access representation, 82, 83
affix ordering, 17, 153–182
Alegre, D., 76, 77, 120
Allen, J., 22, 23
Anshen, F., 72, 81, 145
Aronoff, M., 72, 81, 140, 145, 159, 160, 171, 173, 181
ART, 4
Aslin, R., 4, 22, 217
Augmented Addressed Morphology, 79, 80
Baayen, R.H., 6, 7, 10, 14, 24, 71, 72, 76, 80–84, 139–146, 149, 150, 152, 157, 161, 163, 169, 175, 189
Bagley, W., 68
Bates, E., 22
Bauer, L., 15, 139
Beauvillain, C., 10, 67, 120
Beckman, M., 5, 28, 29
Berkley, D., 133
Bertram, R., 77, 119
Boland, J., 75, 77, 119–121
Bolinger, D., 89
Boogaart, P., 144
bound root, 163, 170, 171, 174, 179
bracketing paradox, 17, 182–184
Bradley, D., 120
Brent, M., 4, 15, 22
Burani, C., 7, 71, 75, 76, 81, 118–120
Burzio, L., 180, 181
Butterfield, S., 14
Butterworth, B., 7
Bybee, J., 9, 71, 75–79, 141, 151
Cairns, P., 15, 45, 191, 192
Caramazza, A., 7, 71, 75, 76, 79, 119, 120
Carter, D., 12
Cartwright, T., 4, 15
CELEX lexical database, 10, 24, 30, 31, 44, 45, 55–57, 59, 62, 65, 85, 104, 145, 157, 163–170, 173, 174
Cermele, A., 7
Chater, N., 191
Chialant, D., 79
CHILDES database, 23
Christiansen, M., 15, 22, 23
COBUILD corpus, 62, 145
Cole, P., 10, 67, 120, 183
Coleman, J., 21, 199
complexity ratings, 30–36, 39–43, 85–88
concept node, 82–84
Connine, C., 5
cross-linguistic differences, 18
cross-splicing, 32
Cutler, A., 4, 7, 9, 10, 12, 14, 22, 67, 68
Dahan, D., 22
de Gelder, B., 18
Dell, G., 5
derivation ratios, 72, 73, 145
Dressler, W., 145
dual-route model, 5, 7–10, 12, 16, 18, 72, 73, 80–82, 142
dual-systems models, 18
Dutch, 119, 144
Elman, J., 4, 21, 22, 24, 191
exemplar theory, 5, 7
Fabb, N., 153, 154, 159, 162–164, 166–170, 172, 174, 181
fast phonological preprocessor, 12, 15, 124, 137, 212
Faust West, J., 5
Feldman, L., 118, 119
Finnish, 119
Ford, M., 119
Forster, K., 120
Fowler, C., 119
Frauenfelder, U., 6, 7, 9, 81, 82, 142
Frisch, S., 21, 68
German, 122
Giegerich, H., 159, 170, 171, 173
Gleitman, H., 22
Gleitman, L., 22
glottalization, 132
Goldsmith, J., 172
Gordon, P., 76, 77, 120
Greenberg, J., 10, 67, 199
Grosjean, F., 5
Grossberg, S., 4
hapax, 140–145
Harwood, F., 98
Hay, J., 5, 21, 26, 28–31, 36, 139, 157, 185, 192, 193, 195, 196, 199, 213, 214, 217
homonyms, 119, 120
Houston, D., 14
inflectional morphology, 18, 72, 76, 118, 119
irregular morphology, 72, 76, 119
Italian, 118–120
Jannedy, S., 5
Johnson, K., 5
Jongman, A., 75, 119
Jusczyk, P., 14, 22
Kempley, S., 119
Kenstowicz, M., 199
Kiparsky, P., 159, 160, 182
Landau, B., 22
Large, N., 141
Laudanna, A., 7, 81, 118, 189
Lehiste, I., 4, 22
level ordering, 119, 120, 143, 153–172, 175–184
Levelt, P., 5
Levy, J., 191
lexical decision task, 119, 120
lexical frequency
  and affix ordering, 154–170, 173–178
  and markedness, 75, 144
  and morphological decomposition, 72–81, 85–88, 118–125, 133, 136, 137, 141, 155–170
  and morphological productivity, 71, 76, 141–147, 149–152
  and phonetic implementation
    pitch-accent placement, 88–95
    t-deletion, 61, 123–137
  and phonological transparency, 76, 78
  and phonotactics, 147, 148
    of prefixed words, 61–63
    of suffixed words, 67
  and polysemy
    of prefixed words, 104–111
    of suffixed words, 114–117
  and semantic transparency, 76, 78, 84, 142
    of prefixed words, 108, 110, 112, 113
    of suffixed words, 113, 114
  base frequency, 76, 77, 81, 84, 120
  derived frequency, 4, 12, 71–73, 75, 76, 80, 82–84, 119, 120
  relative frequency, 10, 12, 61–63, 67, 72–75, 78, 81, 82, 84, 87–95, 97–117, 120–137, 143–149, 173, 175–178, 208
Lieber, R., 72, 84, 140, 145, 146, 149, 150, 152, 169, 175
Losiewicz, B., 76
Luce, J., 40
Luce, P., 4, 37
MacWhinney, B., 22, 76
markedness, 75, 144
Marslen-Wilson, W., 9, 10, 67, 68, 80, 118, 119, 123
Mast-Finn, J., 5
Mattys, S., 15
McClelland, J., 4, 191
McQueen, J., 4, 7, 15, 22, 212
Mendoza-Denton, N., 5
MERGE, 4
metrical segmentation strategy, 12, 14
Meunier, F., 75, 77, 119, 120
modality differences, 118
Modor, C., 71
Mohanan, K., 171
Morphological Meta-Model, 82–84
morphological productivity, 17, 18, 30, 71–73, 76, 139–147, 149–152, 160, 170, 175, 180
Morphological Race Model (MRM), 81, 82
Morton, J., 119
NAM, 4
Napps, S., 118, 119
nasal-obstruent clusters, 28–30, 193, 194, 214
neural networks, 21–28, 36
Newman, J., 151
Newport, E., 4, 22
Newsome, M., 14
Norris, D., 4, 6, 12, 14, 22, 192
Obligatory Contour Principle, 29, 68, 133
onsets, special status of, 68, 123
Otake, T., 21
Oxford English Dictionary, 53
Paccia-Cooper, J., 68
Pagliuca, W., 76
Paivio, A., 107
Pardo, E., 151
Pesetsky, D., 182
phoneme-monitoring, 21
phonological transparency, 9, 17, 76, 81, 118
phonotactics, 15
  and affix ordering, 154–162, 178, 179
  and language acquisition, 22, 192, 208, 209, 211–215, 217
  and lexical frequency, 147, 148
    of prefixed words, 61–63
    of suffixed words, 67
  and morphological productivity, 143, 147, 150–152
  and polysemy
    of prefixed words, 51, 53–58
    of suffixed words, 66
  and prefixedness, 46, 47
  and segmentation
    of English complex words, 15, 17, 21, 24, 26–28, 36, 39–43, 67–70, 155–162, 191–217
    of nonsense words, 21, 28–36
    of sentences, 15, 21–24, 26–28, 36, 191–217
  and semantic transparency
    of prefixed words, 49, 51, 57, 59–61
    of suffixed words, 65, 66
  calculating probability of, 27, 31, 32, 36, 44–46, 191–217
Pierrehumbert, J., 5, 21, 26, 28, 29, 31, 132, 199
Pinker, S., 18
Pisoni, D., 4
pitch-accent placement, 88–95
Pitt, M., 4, 212
Plag, I., 140, 141, 159, 160, 162–164, 166, 168–170, 172, 174
Plunkett, K., 24
polysemy, 39, 51, 53–58, 66, 97, 104–111, 114–117, 207
possible word constraint, 14, 15, 17
Prasada, S., 18
prefix-suffix asymmetry, 10, 39, 43, 44, 65, 67–69, 102–104, 120, 123, 182–184
prefixedness ratings, 44, 46, 47, 207
priming, 9, 118–120
Prince, A., 18
pseudo-prefixes, 46
Raffelsiefen, R., 159
Raveh, M., 118, 121
Redanz, N., 14, 22
relative frequency, 62, 63, 73
Renouf, A., 140, 141
resting activation level, 5–7, 10, 16, 18, 68, 81–84, 89, 142, 156
Romani, C., 7
Rueckl, J., 118, 121
Rumelhart, D., 26
Saffran, J., 4, 15, 22
Schreuder, R., 6, 7, 9, 14, 81–84, 142, 189
Schultink, H., 139
Segui, J., 10, 67, 75, 77, 119, 120
Seidenberg, M., 22
selectional restrictions, 17, 153–170, 182–184
Selkirk, E., 159
semantic transparency, 14, 16, 17, 39, 44, 46, 49, 51, 56, 57, 59–61, 65, 66, 76, 81–83, 97, 110–114, 116, 117, 142, 171, 172, 207
Serbian, 118
Sereno, J., 75, 119
Shillcock, R., 191
SHORTLIST, 4
Siegel, D., 153, 159
Soltano, E., 118, 119
Sonnenstuhl, I., 121, 122
sonority, 24
Spencer, A., 18, 183
Sproat, R., 182
Sridhar, S., 181
Stanners, R., 118, 119
Stemberger, J., 76
Strauss, S., 182
stress clash, 172
Structure-transparency principle, 180, 181
Suomi, K., 21
Szpyra, J., 159
t-deletion, 61, 123–137
Taft, M., 7, 71, 81, 120
temporality, 9, 17, 67, 182–184
Thorndike, E., 72, 73
Titone, D., 5
Tlearn, 24
TRACE, 4
Treiman, R., 21
Tsapkini, K., 118
Tuomainen, J., 18
Ullman, M., 18
van der Lugt, A., 4, 15, 22
van Marle, J., 139, 143, 144
Vannest, J., 75, 77, 119–121
Vitevitch, M., 4, 21, 37, 40
Vroomen, J., 18
Wang, J., 5
Wanner, E., 22
Webster’s dictionary, 44, 49, 53, 55–57, 59–61, 65, 66, 97
well-formedness ratings, 21, 28–30, 192–194
Welsh, A., 68, 123
Whalen, D., 5, 125, 136
Williams, E., 182
Wright, R., 5, 98, 125, 136, 187, 237
Wright, A., 98
Wurm, L., 7, 44, 46, 47, 49, 97, 207
ZED, 208–215, 217
Zhou, X., 9, 80, 118, 119
Zubizarreta, M., 67